I am trying to implement a Facebook scraper to get insights about the reactions on feed posts of Facebook pages. I've noticed that the results (posts) for the current day and the last few days are correct, but the further back in time the query goes, the more feed posts are skipped, and the number of returned results becomes very low.
Why is the Graph API skipping so many posts? Sometimes it skips entire months!
Here is the code I'm using:
import json
import urllib.request

import pandas as pd
page_id = "nytimes"
token = "my_User_Token_Here"  # user token obtained from https://developers.facebook.com/tools/explorer/
url = ("https://graph.facebook.com/v2.12/" + page_id + "/posts/"
       "?fields=id,created_time,message,"
       "shares.summary(true).limit(0),"
       "comments.summary(true).limit(0),"
       "likes.summary(true),"
       "reactions.type(LOVE).limit(0).summary(total_count).as(Love),"
       "reactions.type(WOW).limit(0).summary(total_count).as(Wow),"
       "reactions.type(HAHA).limit(0).summary(total_count).as(Haha),"
       "reactions.type(SAD).limit(0).summary(total_count).as(Sad),"
       "reactions.type(ANGRY).limit(0).summary(total_count).as(Angry)"
       "&access_token=" + token + "&limit=100")
posts = []
cutoff = pd.to_datetime('2018-03-01')  # stop once posts older than this date are reached
found = False
try:
    while True:
        print(url)
        response = urllib.request.urlopen(url)
        data = response.read().decode('utf8')
        json_object = json.loads(data)
        allposts = json_object["data"]
        # iterate over the posts actually returned (a page may hold fewer than 100)
        for post in allposts:
            posts.append(post)
            if pd.to_datetime(post['created_time']) <= cutoff:
                found = True
                break
        # stop when the cutoff was reached or there is no further page
        if found or "next" not in json_object.get("paging", {}):
            break
        url = json_object["paging"]["next"]
    df = pd.DataFrame(posts)
except Exception as ex:
    print(ex)
This is a reported bug. Since it was reported, the rules have changed with API v2.12: only roughly the top 600 posts per page per year can be retrieved. This is obviously bad news for developers and researchers.
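One commonly suggested workaround (not a guaranteed fix, since the v2.12 retrieval cap applies regardless) is to query narrow date windows with the `since`/`until` parameters the feed edge accepts, so pagination never has to walk far into the past in a single pass. A minimal sketch of that idea; `build_feed_url` and `month_windows` are illustrative helpers of my own, not part of any SDK:

```python
import datetime
from urllib.parse import urlencode

GRAPH_BASE = "https://graph.facebook.com/v2.12"

def build_feed_url(page_id, token, since, until, limit=100):
    """Build a /posts URL restricted to one date window.

    `since` and `until` are ISO date strings; the Graph API accepts them
    as Unix timestamps or ISO 8601 dates on the feed edge.
    """
    params = {
        "fields": "id,created_time,message",
        "access_token": token,
        "limit": limit,
        "since": since,
        "until": until,
    }
    return "{}/{}/posts/?{}".format(GRAPH_BASE, page_id, urlencode(params))

def month_windows(start, end):
    """Yield (since, until) ISO-date pairs covering [start, end) month by month."""
    current = start
    while current < end:
        # jump to the first day of the next month
        nxt = (current.replace(day=1) + datetime.timedelta(days=32)).replace(day=1)
        yield current.isoformat(), min(nxt, end).isoformat()
        current = nxt
```

You would then loop over `month_windows(...)`, request each window's URL, and follow `paging.next` inside that window only, accumulating the results into one list before building the DataFrame.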