How to loop over the results from findnodes() with HTML::TreeBuilder::XPath


I have a script that monitors some Facebook pages. Since the Facebook API removed the public page access permission on 4-SEP-2019, I need to parse the page content with XPath instead.

Each Facebook post is wrapped in div[contains(@class,"userContentWrapper")]. I would like to loop over the posts one by one to find the desired data.

I don't know why $message = $post->findvalue('//div[@data-testid="post_message"]//p'); returns the text of the <p> elements of every post instead of only the current one.

use LWP::UserAgent;
$ua       = LWP::UserAgent->new;
$request  = HTTP::Request->new(GET => 'https://www.facebook.com/pg/FIFA/posts/');
$request->header('User-Agent' => 'Mozilla/5.0 Chrome/71.0.3578.98 Safari/537.36');
$response = $ua->request($request);


# Save a copy of the fetched page for inspection.
open(my $htm, '>', 'zzz.htm') or die "Cannot open zzz.htm: $!";
print $htm $response->content;
close($htm);


use HTML::TreeBuilder::XPath;
$tree = HTML::TreeBuilder::XPath->new_from_content($response->content);


$posts = $tree->findnodes('//div[contains(@class,"userContentWrapper")]');


for my $post (@{$posts})
{
    $id =  $post->findnodes('//div[@data-testid="story-subtitle"]/@id');
    $id =  $id->[0]->getValue;
    print "id = $id\n\n";

    $object_id =  $post->findnodes('//div[@data-testid="story-subtitle"]//a/@href');
    $object_id =  'https://www.facebook.com' . $object_id->[0]->getValue;
    print "object_id = $object_id\n\n";

    $message = $post->findvalue('//div[@data-testid="post_message"]//p');
#   $message = $message->[0]->getValue;
    print "$message\n\n";

    $ajaxify =  $post->findnodes('//div[@class="mtm"]//a/@ajaxify');
    $ajaxify =  $ajaxify->[0]->getValue;
    print "ajaxify = $ajaxify\n\n";

    $ploi = $post->findnodes('//div[@class="mtm"]//a/@data-ploi');
    $ploi = $ploi->[0]->getValue;
    print "ploi = $ploi\n\n";

#   $plsi = $post->findnodes('//div[@class="mtm"]//a/@data-plsi');
#   $plsi = $plsi->[0]->getValue;
#   print "plsi = $plsi\n\n";

    $href =  $post->findnodes('//div[@class="mtm"]//a/@href');
    $href =  'https://www.facebook.com' . $href->[0]->getValue;
    print "href = $href\n\n";

    print "---------------------------------------------------------\n\n";
}
There is 1 answer

Answered by ikegami

The post is unclear and seems to contain multiple questions. This needs to be fixed, but in the meantime, I'll address the following:

I would like to loop over the posts one by one to find the desired data.


From HTML::TreeBuilder::XPath,

findnodes ($path)

Returns a list of nodes found by $path. In scalar context returns a Tree::XPathEngine::NodeSet object.

From Tree::XPathEngine::NodeSet,

get_nodelist()

Returns a list of nodes. See Tree::XPathEngine::XMLParser for the format of the nodes.

So,

my @posts = $tree->findnodes('...');
for my $post (@posts) { ... }

or

my $posts = $tree->findnodes('...');
for my $post ($posts->get_nodelist()) { ... }
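
For illustration, here is a minimal, self-contained sketch of both forms. The inline HTML is a made-up stand-in for the real page, and the variable names @from_list and @from_nodeset are only for the example. It also assumes you want each lookup restricted to the current post, so the inner path starts with './/' (relative to $post); in XPath a leading '//' starts from the document root regardless of which node the method is called on.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical stand-in markup; the real Facebook HTML is far larger.
my $html = <<'HTML';
<html><body>
<div class="userContentWrapper"><div data-testid="post_message"><p>first post</p></div></div>
<div class="userContentWrapper"><div data-testid="post_message"><p>second post</p></div></div>
</body></html>
HTML

my $tree  = HTML::TreeBuilder::XPath->new_from_content($html);
my $xpath = '//div[contains(@class,"userContentWrapper")]';

# List context: findnodes returns the matching nodes directly.
my @posts = $tree->findnodes($xpath);
my @from_list;
for my $post (@posts) {
    # './/' keeps the search inside $post; '//' would search the whole document.
    push @from_list, $post->findvalue('.//div[@data-testid="post_message"]//p');
}

# Scalar context: findnodes returns a NodeSet; get_nodelist unpacks it.
my $posts = $tree->findnodes($xpath);
my @from_nodeset;
for my $post ($posts->get_nodelist()) {
    push @from_nodeset, $post->findvalue('.//div[@data-testid="post_message"]//p');
}

print "$_\n" for @from_list;

$tree->delete;    # release the parsed tree
```

Both loops should visit the same nodes in the same order, so which form to use is mostly a matter of taste.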

Any other questions should be posted as separate Questions.