I have a script that monitors some Facebook pages. Since the Facebook API removed public page access permission on 4-SEP-2019, I need to parse the content with XPath instead.
Each Facebook post is wrapped in div[contains(@class,"userContentWrapper")]. I would like to loop over the posts one by one to find the desired data.
I don't know why $message = $post->findvalue('//div[@data-testid="post_message"]//p'); shows the text of the <p> elements of every post, rather than only the current one.
use LWP::UserAgent;
$ua = new LWP::UserAgent;
$request = new HTTP::Request;
$request->url('https://www.facebook.com/pg/FIFA/posts/');
$request->method('GET');
$request->header('User-Agent' => 'Mozilla/5.0 Chrome/71.0.3578.98 Safari/537.36');
$response = $ua->request($request);
open(HTM, ">zzz.htm");
print HTM $response->content;
close(HTM);
use HTML::TreeBuilder::XPath;
$tree = HTML::TreeBuilder::XPath->new_from_content($response->content);
$posts = $tree->findnodes('//div[contains(@class,"userContentWrapper")]');
for my $post (@{$posts})
{
    $id = $post->findnodes('//div[@data-testid="story-subtitle"]/@id');
    $id = $id->[0]->getValue;
    print "id = $id\n\n";
    $object_id = $post->findnodes('//div[@data-testid="story-subtitle"]//a/@href');
    $object_id = 'https://www.facebook.com' . $object_id->[0]->getValue;
    print "object_id = $object_id\n\n";
    $message = $post->findvalue('//div[@data-testid="post_message"]//p');
    # $message = $message->[0]->getValue;
    print "$message\n\n";
    $ajaxify = $post->findnodes('//div[@class="mtm"]//a/@ajaxify');
    $ajaxify = $ajaxify->[0]->getValue;
    print "ajaxify = $ajaxify\n\n";
    $ploi = $post->findnodes('//div[@class="mtm"]//a/@data-ploi');
    $ploi = $ploi->[0]->getValue;
    print "ploi = $ploi\n\n";
    # $plsi = $post->findnodes('//div[@class="mtm"]//a/@data-plsi');
    # $plsi = $plsi->[0]->getValue;
    # print "plsi = $plsi\n\n";
    $href = $post->findnodes('//div[@class="mtm"]//a/@href');
    $href = 'https://www.facebook.com' . $href->[0]->getValue;
    print "href = $href\n\n";
    print "---------------------------------------------------------\n\n";
}
The post is unclear and seems to contain multiple questions. This needs to be fixed, but in the meantime, I'll address the following:
From HTML::TreeBuilder::XPath,
From Tree::XPathEngine::NodeSet,
So,
or
Any other questions should be posted as separate Questions.
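To illustrate how findvalue behaves when a path matches more than one node, here is a minimal sketch. The two-post HTML sample and the @absolute and @relative arrays are made up for the illustration and are not taken from the real page; it assumes HTML::TreeBuilder::XPath is installed. An XPath that starts with // is evaluated from the document root even when it is called on $post, whereas .// only searches below $post, and findvalue returns the text of all matched nodes joined into one string.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical sample standing in for the real page markup.
my $html = <<'HTML';
<html><body>
<div class="userContentWrapper"><div data-testid="post_message"><p>first post</p></div></div>
<div class="userContentWrapper"><div data-testid="post_message"><p>second post</p></div></div>
</body></html>
HTML

my $tree = HTML::TreeBuilder::XPath->new_from_content($html);

my (@absolute, @relative);
for my $post ($tree->findnodes('//div[contains(@class,"userContentWrapper")]')) {
    # Leading "//" starts again at the document root, so the <p> of
    # every post matches and findvalue joins all of their text.
    push @absolute, '' . $post->findvalue('//div[@data-testid="post_message"]//p');

    # Leading ".//" keeps the search inside the current post.
    push @relative, '' . $post->findvalue('.//div[@data-testid="post_message"]//p');
}

print "absolute: $_\n" for @absolute;
print "relative: $_\n" for @relative;

$tree->delete;    # free the tree explicitly
```

If a post can contain several <p> elements and they should be handled separately, calling findnodes in list context with the same relative path and then as_text on each node may be a better fit than findvalue.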