How to scrape a Facebook page's posts using jsoup?


I'm trying to scrape a Facebook page in Spring Boot with the jsoup library. The method below returns an empty JSON array, and I'm out of ideas.

@GetMapping("/test-json")
public String scrapeFacebookPageJson() throws IOException {
    try {
        String facebookPageUrl = "https://www.facebook.com/abcd"; // your URL 

        Document doc = Jsoup.connect(facebookPageUrl).get();
        Elements posts = doc.select(".post"); // select elements that have the CSS class "post"

        List<String> results = new ArrayList<>();
        for (Element post : posts) {
            results.add(post.text());
        }

        ObjectMapper objectMapper = new ObjectMapper();
        return objectMapper.writeValueAsString(results);
    } catch (IOException e) {
        e.printStackTrace(); // or log the exception
        return "Error";
    }
}

Answer by Janez Kuhar

Your idea was to look for elements with the class post and fetch their text content. It seems reasonable enough at first sight.

Open a sample Facebook page and inspect its DOM with the browser's DevTools (F12). All elements inside body have obfuscated class names (e.g. x78zum5), so a selector that looks for elements with the post class won't match anything.
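To check this against the HTML your program actually receives, you can list the class names it contains and see whether post is among them. The helper below is a rough, regex-based sketch, not a real HTML parser: it assumes class attributes are double-quoted, and the ClassNameCheck name and the sample markup are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Rough diagnostic (hypothetical helper): collects the class tokens found in raw HTML
// so you can see whether a selector such as ".post" could match anything at all.
// Assumption: only double-quoted class="..." attributes are handled.
class ClassNameCheck {
    // The lookbehind keeps attributes such as data-class="..." from matching.
    private static final Pattern CLASS_ATTR =
            Pattern.compile("(?<![\\w-])class\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Returns every whitespace-separated token from all class attributes, in document order.
    static Set<String> classTokens(String html) {
        Set<String> tokens = new LinkedHashSet<>();
        Matcher m = CLASS_ATTR.matcher(html);
        while (m.find()) {
            for (String token : m.group(1).trim().split("\\s+")) {
                if (!token.isEmpty()) { // an empty class="" yields one empty token
                    tokens.add(token);
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Hypothetical markup imitating Facebook's obfuscated class names.
        String html = "<div class=\"x78zum5 xdt5ytf\"><div class=\"x1n2onr6\">Hello</div></div>";
        Set<String> tokens = classTokens(html);
        System.out.println(tokens);                  // [x78zum5, xdt5ytf, x1n2onr6]
        System.out.println(tokens.contains("post")); // false, so ".post" selects nothing
    }
}
```

In the original method you could pass doc.outerHtml() to classTokens. Staying within jsoup, doc.select("[class]") combined with Element.classNames() gives the same information.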

Moreover, on the first request you only get bare-bones HTML with a GDPR cookie consent dialog. It contains very little of the page's actual content, because that content is loaded afterwards by JavaScript. You would have to look into dynamic web page scraping, which jsoup can't do because it does not execute JavaScript.

If you want to obtain posts from a FB page programmatically, I think your best bet is their Graph API. This documentation page on the Page feed endpoint in particular might be of interest to you: https://developers.facebook.com/docs/graph-api/reference/v19.0/page/feed
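A minimal sketch of calling that feed endpoint with the JDK's own HttpClient (Java 11+), with no jsoup or HTML parsing involved. Assumptions: you already have the Page ID and an access token carrying whatever permissions Meta requires for that Page (see the linked reference). The v19.0 version, the message,created_time field list and the PageFeedClient name are illustrative choices, not requirements.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Sketch (hypothetical class): read a Page feed via the Graph API instead of scraping HTML.
// Assumption: pageId and accessToken are valid; the version and fields are examples only.
class PageFeedClient {
    private static final String GRAPH_BASE = "https://graph.facebook.com";

    // Builds GET /{version}/{page-id}/feed?fields=...&access_token=...
    static URI feedUri(String version, String pageId, String fields, String accessToken) {
        String path = "/" + version + "/" + URLEncoder.encode(pageId, StandardCharsets.UTF_8) + "/feed";
        String query = "fields=" + URLEncoder.encode(fields, StandardCharsets.UTF_8)
                + "&access_token=" + URLEncoder.encode(accessToken, StandardCharsets.UTF_8);
        return URI.create(GRAPH_BASE + path + "?" + query);
    }

    // Sends the request and returns the raw JSON body; non-200 responses become exceptions.
    static String fetchFeed(HttpClient client, URI uri) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Graph API returned HTTP " + response.statusCode() + ": " + response.body());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: PageFeedClient <page-id> <access-token>");
            return;
        }
        URI uri = feedUri("v19.0", args[0], "message,created_time", args[1]);
        System.out.println(fetchFeed(HttpClient.newHttpClient(), uri));
    }
}
```

The returned string is JSON, so in the Spring Boot controller it could be parsed with the ObjectMapper already used in the question rather than built from scraped text.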