I'm trying to delete all table elements from several HTML files.
The following code works perfectly on a single file, but when I try to automate the process for a whole directory it fails with the error
`Can't call method "look_down" on an undefined value`
Does anyone have a solution?
Here is the code:
```perl
use strict;
use warnings;
use Path::Class;
use HTML::TreeBuilder;
use HTML::FormatText;

opendir( DH, "C:/myfiles" );
my @files = readdir(DH);
closedir(DH);

foreach my $file ( @files ) {
    print("Analyzing file $file\n");
    my $tree = HTML::TreeBuilder->new->parse_file("C:/myfiles/$file");
    foreach my $e ( $tree->look_down( _tag => "table" ) ) {
        $e->delete();
    }
    my $formatter = HTML::FormatText->new;
    my $parsed = $formatter->format($tree);
    print $parsed;
}
```
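The problem is that you're feeding `HTML::TreeBuilder` all sorts of junk in addition to the HTML files that you intend. Besides the files in the opened directory, `readdir` returns the names of any subdirectories and of the pseudo-directories `.` and `..`. You should have seen these in the output from your `print` statement. `parse_file` can't open those entries as files, so it returns `undef` instead of a tree, and the following call to `look_down` on that undefined value dies with the error you saw.

One way to fix this is to check that each value in the loop is a file before processing it. Note that `readdir` returns bare names, not paths, so the check has to be made on the full path.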
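A minimal sketch of that check, assuming the same `C:/myfiles` directory as the question. The helper name `plain_files_in` is hypothetical; it uses a lexical directory handle, reports a failed `opendir` instead of silently ignoring it, and builds each full path before testing it with `-f`:

```perl
use strict;
use warnings;

# Return the full paths of the plain files in $dir. readdir also yields
# '.', '..' and subdirectory names, so each entry is tested with -f.
# (plain_files_in is a hypothetical helper name.)
sub plain_files_in {
    my ($dir) = @_;

    opendir( my $dh, $dir ) or do {
        warn "Can't open directory '$dir': $!\n";
        return;
    };
    my @entries = readdir($dh);
    closedir($dh);

    my @paths;
    foreach my $file (@entries) {
        my $path = "$dir/$file";    # readdir returns bare names, not paths
        next unless -f $path;       # skip '.', '..' and subdirectories
        push @paths, $path;
    }
    return @paths;
}

foreach my $path ( plain_files_in('C:/myfiles') ) {
    print "Analyzing file $path\n";
    # ... parse $path with HTML::TreeBuilder as in the question ...
}
```

The `HTML::TreeBuilder` processing from the question then goes inside the loop, using `$path` in place of `"C:/myfiles/$file"`.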
But it would be much cleaner to use `glob`. That way you only get the files that you want, and there is also no need to build the full path to each file, because `glob` returns paths that already include the directory. If your files don't all end with `.html`, you would have to adjust the pattern you pass to `glob`.
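A minimal sketch of the `glob` version, assuming the files live directly in `C:/myfiles` and all end in `.html`. The helper `text_without_tables` is a hypothetical name that wraps the question's parse, delete and format steps; it also checks the return value of `parse_file` and calls `$tree->delete` when finished, as the `HTML::Element` documentation recommends for freeing a tree's memory:

```perl
use strict;
use warnings;
use HTML::TreeBuilder;
use HTML::FormatText;

# Parse one HTML file, delete every <table> element, and return the rest
# as plain text. Returns undef (with a warning) if the file can't be opened.
# (text_without_tables is a hypothetical helper name.)
sub text_without_tables {
    my ($path) = @_;

    my $tree = HTML::TreeBuilder->new;
    unless ( $tree->parse_file($path) ) {    # undef if the open failed
        warn "Can't open '$path': $!\n";
        $tree->delete;
        return;
    }

    $_->delete for $tree->look_down( _tag => 'table' );

    my $text = HTML::FormatText->new->format($tree);
    $tree->delete;    # release the tree's memory explicitly
    return $text;
}

# glob returns full paths, and the pattern only matches *.html names
foreach my $path ( glob 'C:/myfiles/*.html' ) {
    print "Analyzing file $path\n";
    my $text = text_without_tables($path);
    print $text if defined $text;
}
```

One caveat: the core `glob` splits its argument on whitespace, so a directory path containing spaces needs quoting inside the pattern, or `bsd_glob` from `File::Glob` instead.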
Strictly speaking, a directory name may also match `*.html`, so if you don't trust your file structure you should also test that each result of `glob` is a file before processing it. But in normal situations, where you know what's in the directory you're processing, that isn't necessary.
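If you do want that extra safety, filtering the `glob` results with `-f` is a one-line change. A sketch, with `html_files_in` as a hypothetical helper name:

```perl
use strict;
use warnings;

# glob can also match a directory whose name ends in .html, so keep only
# the results that are plain files (-f with no argument tests $_).
# (html_files_in is a hypothetical helper name.)
sub html_files_in {
    my ($dir) = @_;
    return grep { -f } glob "$dir/*.html";
}

print "$_\n" for html_files_in('C:/myfiles');
```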