Why does the Graph API skip feed posts?


I am trying to implement a Facebook scraper to get insights about the reactions on the feed posts of Facebook pages. The results (posts) for the current day and the last few days are correct, but the further back in time the scraper goes, the more feed posts get skipped, and the number of returned results is very low.

Why is the Graph API skipping so many posts? Sometimes it even skips entire months!

Here is the code I'm using:

import json
from urllib.request import urlopen

import pandas as pd

page_id = "nytimes"

# User token obtained from https://developers.facebook.com/tools/explorer/
token = "my_User_Token_Here"

url = "https://graph.facebook.com/v2.12/" + page_id + "/posts/?fields=id,created_time,message,shares.summary(true).limit(0),comments.summary(true).limit(0),likes.summary(true),reactions.type(LOVE).limit(0).summary(total_count).as(Love),reactions.type(WOW).limit(0).summary(total_count).as(Wow),reactions.type(HAHA).limit(0).summary(total_count).as(Haha),reactions.type(SAD).limit(0).summary(1).as(Sad),reactions.type(ANGRY).limit(0).summary(1).as(Angry)&access_token=" + token + "&limit=100"

# Stop paging once a post created on or before this date is reached.
# Both sides of the comparison are parsed as UTC so they are always comparable.
created = pd.to_datetime('2018-03-01', utc=True)

posts = []
found = False

try:
    while True:
        print(url)
        facebook_connection = urlopen(url)
        data = facebook_connection.read().decode('utf8')
        json_object = json.loads(data)
        allposts = json_object["data"]
        # Iterate over what was actually returned; a page may hold fewer than 100 posts.
        for i, post in enumerate(allposts):
            posts.append(post)
            if pd.to_datetime(post['created_time'], utc=True) <= created:
                print(i, "reached the cutoff date")
                found = True
                break
        # Stop at the cutoff date or when the API reports no further page.
        next_url = json_object.get("paging", {}).get("next")
        if found or not next_url:
            break
        url = next_url

    df = pd.DataFrame(posts)

except Exception as ex:
    print(ex)
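
To narrow down which months are missing, the collected posts can be checked for gaps between the oldest and the newest post. The following is only a minimal sketch and not part of the script above: `missing_months` is a hypothetical helper, and it assumes every post is a dict whose `created_time` looks like `2018-03-05T10:00:00+0000`.

```python
from datetime import datetime


def missing_months(posts):
    """Return the months ('YYYY-MM') between the oldest and newest post that have no posts.

    Hypothetical helper: assumes each post is a dict with a 'created_time'
    string such as '2018-03-05T10:00:00+0000'.
    """
    if not posts:
        return []
    seen = set()
    for post in posts:
        t = datetime.strptime(post["created_time"], "%Y-%m-%dT%H:%M:%S%z")
        seen.add((t.year, t.month))
    year, month = min(seen)
    last = max(seen)
    gaps = []
    # Walk month by month from the oldest to the newest collected post.
    while (year, month) <= last:
        if (year, month) not in seen:
            gaps.append(f"{year:04d}-{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return gaps


sample = [
    {"created_time": "2018-03-05T10:00:00+0000"},
    {"created_time": "2018-01-20T08:30:00+0000"},
]
print(missing_months(sample))
```

Calling `missing_months(posts)` on the list built by the loop would list the months for which nothing came back at all.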

There is 1 answer

Answered by frodik

This is a reported bug. Since it was reported, the rules changed with API v2.12, and only the top 600 posts per year can now be retrieved. This is obviously bad news for developers and researchers.
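
One rough way to check whether a page is hitting this limit is to count the collected posts per year and see which years come close to it. This is a minimal sketch under stated assumptions: `posts_per_year`, `years_near_cap` and `YEARLY_CAP` are illustrative names, the value 600 is simply the figure quoted above rather than something read from the API, and each post is assumed to be a dict whose `created_time` string starts with the four-digit year.

```python
from collections import Counter

# Figure quoted in the answer; an assumption, not a constant exposed by the API.
YEARLY_CAP = 600


def posts_per_year(posts):
    """Count posts per calendar year from their 'created_time' strings."""
    # created_time starts with the year, e.g. '2018-03-05T10:00:00+0000'.
    return Counter(int(post["created_time"][:4]) for post in posts)


def years_near_cap(posts, cap=YEARLY_CAP, tolerance=0.05):
    """Return the years whose post count is within `tolerance` of the cap or above it."""
    counts = posts_per_year(posts)
    threshold = cap * (1 - tolerance)
    return sorted(year for year, n in counts.items() if n >= threshold)


# Synthetic data: 600 posts dated 2017 and 3 posts dated 2018.
sample = [{"created_time": "2017-06-01T12:00:00+0000"}] * 600
sample += [{"created_time": "2018-01-15T09:00:00+0000"}] * 3
print(posts_per_year(sample))
print(years_near_cap(sample))
```

A year that shows up in `years_near_cap` for a page known to publish far more often would be consistent with the limit described here, although it does not prove it.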