1

I am trying to parse an HTML file in my Perl script. I am using the HTML::TreeBuilder module.

Here is what I have so far:

use HTML::TreeBuilder;

my $tree = HTML::TreeBuilder->new;
$tree->parse_file("sample.html");

foreach my $anchor ($tree->find("p")) {
  print $anchor->as_text, "\n";
}

It works fine: I get the text inside every <p> tag.

sample.html file:

<td>Release Version:</td><td> 5134</td></tr>
<tr class="d0"><td>Executed By:</td><td>spoddar</td></tr>
<tr class="d1"><td> Duration:</td><td>0 Hrs 0 Mins 0 Secs </td></tr>
<tr class="d0"><td>#TCs Executed:</td><td>1</td></tr>

I want 5134 to be printed when I pass "Release Version". In the same way, I want spoddar to be printed when I pass "Executed By". These labels are cell text, not HTML tags, so I cannot search for them by tag name. Is there any way to look them up?

1
  • Is there any rule you need to apply when deciding what to print? Should 0 Hrs 0 Mins 0 Secs be printed as well? Commented Nov 6, 2012 at 6:21

2 Answers

3

The most straightforward approach is to find the rows you want and compare the text of their cells. The following assumes the format in your sample: a 2-column table where the first cell of each row holds the label and the second holds the value.

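# Return the text of the second cell of the first row whose first
# cell matches $key exactly; return nothing if no row matches.
sub get_value {
    my $key = shift;

    foreach my $tr ($tree->find('tr')) {
        # Search within this row, not the whole tree.
        my @td = $tr->find('td');
        next unless @td >= 2;    # skip rows without a label/value pair
        return $td[1]->as_text if $td[0]->as_text eq $key;
    }
    return;
}

print get_value('Release Version:'), "\n";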
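
If you need several values, it may be simpler to walk the table once and build a lookup hash. The following is only a sketch under some assumptions: the helper name table_to_hash is made up for illustration, the sample rows are wrapped in a <table> element and the first row is given an opening <tr> (neither appears in the question), and the trailing colon is stripped from each label so you can pass "Release Version" as in the question. It uses as_trimmed_text so that cells such as " 5134" and " Duration:" compare cleanly.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Sample rows from the question. The <table> wrapper and the first
# row's opening <tr> are assumptions added to make the snippet
# self-contained.
my $html = <<'HTML';
<table>
<tr><td>Release Version:</td><td> 5134</td></tr>
<tr class="d0"><td>Executed By:</td><td>spoddar</td></tr>
<tr class="d1"><td> Duration:</td><td>0 Hrs 0 Mins 0 Secs </td></tr>
<tr class="d0"><td>#TCs Executed:</td><td>1</td></tr>
</table>
HTML

# Hypothetical helper: map each row's label (first cell, without the
# trailing colon) to its value (second cell).
sub table_to_hash {
    my ($content) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($content);

    my %value_for;
    for my $tr ($tree->look_down(_tag => 'tr')) {
        my @td = $tr->look_down(_tag => 'td');
        next unless @td >= 2;    # skip rows without a label/value pair

        (my $label = $td[0]->as_trimmed_text) =~ s/:\z//;
        $value_for{$label} = $td[1]->as_trimmed_text;
    }

    $tree->delete;               # release the parse tree
    return \%value_for;
}

my $value_for = table_to_hash($html);
print $value_for->{'Release Version'}, "\n";
print $value_for->{'Executed By'}, "\n";
```

If two rows share a label, the later one overwrites the earlier one, so this only suits tables where the labels are unique.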

2

HTML::Parser and HTML::TokeParser may also be of use to you.


UNTESTED

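use strict;
use warnings;
use HTML::TokeParser;

my $p = HTML::TokeParser->new('sample.html')
    or die "Cannot open sample.html: $!";

while (my $token = $p->get_token) {
    my $tokenType = shift @{$token}; # 'S' is start tag, 'E' end tag, etc. (see doc)
    next unless $tokenType eq 'S';

    my ($tag, $attr, $attrseq, $rawtxt) = @{$token};
    my $class = $attr->{class} // ''; # tag class, or '' if the tag has none

    if ($tag eq 'tr' && $class =~ /d0/) {
        # Method calls are not interpolated inside double quotes,
        # so the call has to sit outside the string.
        print $p->get_trimmed_text('/tr'), "\n";
    }
}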
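
The loop above prints whole rows by class. To look up one value by its label with HTML::TokeParser, something along these lines might work. It is a sketch, not a tested solution for your real file: value_after_label is a hypothetical helper name, and the sample rows are inlined as a string (with a <table> wrapper and an opening <tr> on the first row added as assumptions) so the snippet runs on its own.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Sample rows from the question; the <table> wrapper and the first
# row's opening <tr> are assumptions for a self-contained example.
my $html = <<'HTML';
<table>
<tr><td>Release Version:</td><td> 5134</td></tr>
<tr class="d0"><td>Executed By:</td><td>spoddar</td></tr>
<tr class="d1"><td> Duration:</td><td>0 Hrs 0 Mins 0 Secs </td></tr>
<tr class="d0"><td>#TCs Executed:</td><td>1</td></tr>
</table>
HTML

# Hypothetical helper: return the trimmed text of the <td> that
# follows the <td> whose trimmed text equals $label.
sub value_after_label {
    my ($content, $label) = @_;

    # A reference to a scalar makes TokeParser parse the string itself
    # instead of treating it as a file name.
    my $p = HTML::TokeParser->new(\$content)
        or die "Cannot create parser";

    while ($p->get_tag('td')) {
        my $text = $p->get_trimmed_text('/td');
        next unless $text eq $label;

        # Label found: the next <td> holds the value.
        $p->get_tag('td') or return;
        return $p->get_trimmed_text('/td');
    }
    return;    # label not present
}

print value_after_label($html, 'Release Version:'), "\n";
print value_after_label($html, 'Executed By:'), "\n";
```

This streams through the tokens and stops at the first match, so it never builds a full tree. The trade-off is that it relies on the value cell coming directly after the label cell.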
