I am using HTML::TreeBuilder to extract the contents of a URL: I call $tree->look_down and then take the text of the elements it returns. My problem is that when I write that text to a file, it shows up as junk. I have not been able to make any progress on this.
My Sample Code:
use HTML::TreeBuilder;
use HTML::Element;
use utf8;
$url = $ARGV[0];
$page = `wget -qO - "$url"| tee data.txt`;
#print "iam $page\n";
my $tree = HTML::TreeBuilder->new( );
$tree->parse_file('data.txt');
my @story = $tree->look_down(
_tag => 'div',
class => 'storydescription'
);
my @title = $tree->look_down(
_tag => 'title'
);
open(OUT,">","story.txt") or die"Cannot open story.txt:$!\n";
binmode(OUT,":utf8");
foreach my $story(@story) {
print OUT $story->as_text;
}
close(OUT);
I have tried binmode on the output filehandle, but it did not help. Plain ASCII text is written to the file correctly; only the non-ASCII characters come out as junk.
It's documented in HTML::TreeBuilder:
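In short, when parse_file is given a file name, the file is read in binary mode and its bytes are treated as Latin-1, so UTF-8 input ends up double-encoded on output. One way around this is to open the file yourself with a decoding layer and pass the filehandle to parse_file. The following is a minimal sketch of that approach, not a definitive fix: it assumes the saved page is UTF-8, and extract_story is a hypothetical helper name introduced here for illustration.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper (not from the original post): parse $html_file as
# UTF-8 and write the text of every <div class="storydescription"> to
# $out_file. Returns the number of matching elements.
sub extract_story {
    my ($html_file, $out_file) = @_;

    # Assumption: the page is UTF-8. Decoding on read means the parser
    # receives characters rather than raw bytes.
    open(my $in, '<:encoding(UTF-8)', $html_file)
        or die "Cannot open $html_file: $!\n";

    my $tree = HTML::TreeBuilder->new;
    $tree->parse_file($in);    # a filehandle, not a file name
    close($in);

    my @story = $tree->look_down(
        _tag  => 'div',
        class => 'storydescription',
    );

    # Encode back to UTF-8 on the way out.
    open(my $out, '>:encoding(UTF-8)', $out_file)
        or die "Cannot open $out_file: $!\n";
    print {$out} $_->as_text, "\n" for @story;
    close($out);

    $tree->delete;
    return scalar @story;
}

# Usage with the file names from the question (skipped if data.txt is absent).
extract_story('data.txt', 'story.txt') if -e 'data.txt';
```

If the page is not UTF-8, the layer name in the first open would need to match the page's actual encoding. The `use utf8;` line in the question only affects literals in the script's own source, so it has no effect on how data.txt is read.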