Trying to parse using Mojo::DOM, not getting the tag right


I am using Mojo::UserAgent ($ua) to fetch some HTML from my $url = "http://finance.yahoo.com/quote/MSFT?p=MSFT";

I can fetch the HTML content from the URL just fine. I am then using Mojo::DOM to parse it further; is that the correct step? I want to extract the href attributes of the <a> tags from the HTML returned by get(). This is what I have:

my $ua = Mojo::UserAgent->new( max_redirects => 5, timeout => $timeout );
my $dom = Mojo::DOM->new;

my $content = $ua->get($url)->res->dom->at('div#quoteNewsStream-0-Stream')->content;
my $content2 = $content->$dom->find('a href#');

There is 1 answer

Miller On

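Just use the Mojo::DOM object that Mojo::UserAgent already returns; there is no need to create a separate one with Mojo::DOM->new. Note that ->content returns a string rather than a DOM object, so find cannot be called on it: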

#!/usr/bin/env perl

use strict;
use warnings;
use v5.10;

use Mojo::UserAgent;

my $url = "http://finance.yahoo.com/quote/MSFT?p=MSFT";

my $dom = Mojo::UserAgent->new->get($url)->res->dom;

my $stream = $dom->at('div#quoteNewsStream-0-Stream');

for my $href ( $stream->find('a')->each ) {
    say $href->{href};
}

Outputs:

/news/jeff-bezos-trump-tech-summit-was-very-productive-224326329.html
/news/jeff-bezos-trump-tech-summit-was-very-productive-224326329.html
/news/donald-trump-tech-summit-at-trump-tower-202517070.html
/video/microsoft-surface-sales-surge-disappointment-181934121.html
/news/jeff-bezos-trump-tech-summit-was-very-productive-224326329.html
/news/microsoft-surface-sales-surge-on-disappointment-with-macbook-pro-163819168.html
/news/microsoft-surface-sales-surge-on-disappointment-with-macbook-pro-163819168.html
/m/7f581deb-0089-341a-b637-e1e979e9e210/ss_5-point-checklist-for.html

For an 8-minute tutorial on using these tools, check out Mojocast Episode 5.
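
As the output above shows, the same link can appear several times in the stream. A minimal sketch of one way to restrict the match to anchors that actually carry an href and to drop duplicates, assuming a small inline HTML snippet in place of the live page (the markup and paths below are hypothetical, not Yahoo's real HTML):

```perl
#!/usr/bin/env perl

use strict;
use warnings;
use v5.10;

use Mojo::DOM;

# Hypothetical stand-in for the fetched page; not Yahoo's actual markup.
my $html = <<'HTML';
<div id="quoteNewsStream-0-Stream">
  <a href="/news/first.html">First</a>
  <a href="/news/first.html">First again</a>
  <a name="no-link">Anchor without href</a>
  <a href="/video/second.html">Second</a>
</div>
HTML

my $dom = Mojo::DOM->new($html);

# 'a[href]' matches only <a> elements that have an href attribute.
# map(attr => 'href') calls ->attr('href') on each matched element,
# and uniq removes repeated values from the resulting collection.
my @hrefs = $dom->find('div#quoteNewsStream-0-Stream a[href]')
                ->map( attr => 'href' )
                ->uniq
                ->each;

say for @hrefs;
```

With the live page, the same chain could be applied to the $dom returned by Mojo::UserAgent instead of the inline snippet. Combining the container and the link into one selector also avoids calling find on undef when the div is missing, since find then simply returns an empty collection.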