Parse HTML file using Perl

Question

I am trying to extract the table inside this HTML file using Perl.

I have tried this:

my $te = HTML::TableExtract->new();
$te->parse_file($g_log);
print "=====TE: $te ======\n";

Output:

=====TE: HTML::TableExtract=HASH(0x266f5f) ======

I tried iterating through $te and found nothing. Can anyone tell me what to do next? I am new to this.

This is the HTML file:

    <html xmlns="http://www.w3.org/1999/xhtml" xmlns:math="http://exslt.org/math"
          xmlns:testng="http://testng.org">
       <head xmlns="">
          <title>TestNG Results</title>
          <meta http-equiv="content-type" content="text/html; charset=utf-8"></meta>
          <meta http-equiv="pragma" content="no-cache"></meta>
          <meta http-equiv="cache-control" content="max-age=0"></meta>
          <meta http-equiv="cache-control" content="no-cache"></meta>
          <meta http-equiv="cache-control" content="no-store"></meta>
          <LINK rel="stylesheet" href="style.css"></LINK>
          <script type="text/javascript" src="main.js"></script>
       </head>
       <body>
          <h2>Test suites overview</h2>


<table width="100%">
                 <tr>
                    <td align="center" id="chart-container"><script type="text/javascript">
                                            renderSvgEmbedTag(600, 200);
                                        </script></td>
                 </tr>
              </table>

   </body>
  </html>

Chankey Pathak · Accepted Answer · 2015-09-07 06:05:23Z

#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableExtract;

my $filename = "testfile.html";
my $te = HTML::TableExtract->new();
$te->parse_file($filename);

# Walk every table the parser found and print its rows.
foreach my $ts ($te->tables) {
   print "Table found at ", join(',', $ts->coords), ":\n";
   foreach my $row ($ts->rows) {
      # Empty cells come back as undef; print them as empty strings.
      print "   ", join(',', map { defined $_ ? $_ : '' } @$row), "\n";
   }
}

Note that HTML::TableExtract can also be invoked in 'tree' mode where the resulting HTML and extracted tables are encoded in HTML::Element tree structures.

use HTML::TableExtract 'tree';
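
As a rough illustration of tree mode, the sketch below parses a cut-down table and reads a cell attribute that plain text extraction would not expose. It assumes HTML::TreeBuilder and HTML::ElementTable are installed (tree mode needs both) and that `rows` returns HTML::Element objects in this mode, as the module documentation describes. The sample markup is made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
# Tree mode additionally needs HTML::TreeBuilder and HTML::ElementTable.
use HTML::TableExtract qw(tree);

# Illustrative markup, loosely modelled on the table in the question.
my $html = <<'EOT';
<html><body>
<table width="100%">
  <tr><td align="center" id="chart-container">chart goes here</td></tr>
</table>
</body></html>
EOT

my $te = HTML::TableExtract->new();
$te->parse($html);
$te->eof;    # signal end of input so the tree is finalized

foreach my $ts ($te->tables) {
    my $tree = $ts->tree;           # element tree for this table
    print $tree->as_HTML, "\n";     # table markup, attributes included
    foreach my $row ($ts->rows) {
        foreach my $cell (@$row) {
            next unless ref $cell;  # skip anything that is not an element
            my $id = $cell->attr('id');
            print "cell id: ", (defined $id ? $id : '(none)'), "\n";
        }
    }
}
```

Because each cell is an element rather than a string, attributes such as `id` and any nested markup stay available for inspection.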

2 Comments

Chankey Pathak Over a year ago

I tried the code as written and it worked. Make sure that the path to the HTML file is correct.

User Over a year ago

Plus 1 Good. @ChankeyPathak
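
The path check suggested in the comments can be made explicit. This is a minimal sketch, assuming the `parse_file` behaviour inherited from HTML::Parser, which returns undef and sets `$!` when the file cannot be opened. The helper name `extract_tables` and the temporary sample file are illustrative and not part of the original post.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use HTML::TableExtract;

# Hypothetical helper: parse a file and return its table objects,
# dying with the OS error if the file cannot be opened.
sub extract_tables {
    my ($filename) = @_;
    my $te = HTML::TableExtract->new();
    defined $te->parse_file($filename)
        or die "Cannot read '$filename': $!\n";
    return $te->tables;
}

# Write a small sample document so the sketch runs on its own.
my ($fh, $sample) = tempfile(SUFFIX => '.html', UNLINK => 1);
print {$fh} "<html><body><table><tr><td>a</td><td>b</td></tr></table></body></html>\n";
close $fh;

my @tables = extract_tables($sample);
print "Found ", scalar(@tables), " table(s)\n";

# A wrong path now fails loudly instead of silently yielding no tables.
my @none = eval { extract_tables('/no/such/file.html') };
print "Error: $@" if $@;
```

Checking the return value separates "the file was never read" from "the file was read but contained no matching tables", which otherwise look the same when iterating over `$te->tables`.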

Dan Walmsley · Accepted Answer · 2015-09-07 06:06:01Z

I'm not sure what you want to get out of the table, but I would strongly recommend using Data::Dumper to look inside the hash.

#!/usr/bin/perl

use strict;
use warnings;
use HTML::TableExtract;
use Data::Dumper;

my $html = <<'EOT';
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:math="http://exslt.org/math"
          xmlns:testng="http://testng.org">
       <head xmlns="">
          <title>TestNG Results</title>
          <meta http-equiv="content-type" content="text/html; charset=utf-8"></meta>
          <meta http-equiv="pragma" content="no-cache"></meta>
          <meta http-equiv="cache-control" content="max-age=0"></meta>
          <meta http-equiv="cache-control" content="no-cache"></meta>
          <meta http-equiv="cache-control" content="no-store"></meta>
          <LINK rel="stylesheet" href="style.css"></LINK>
          <script type="text/javascript" src="main.js"></script>
       </head>
       <body>
          <h2>Test suites overview</h2>


<table width="100%">
                 <tr>
                    <td align="center" id="chart-container"><script type="text/javascript">
                                            renderSvgEmbedTag(600, 200);
                                        </script></td>
                 </tr>
              </table>

   </body>
  </html>
EOT

my $te = HTML::TableExtract->new();
$te->parse($html);

print Dumper($te);
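
Dumping the whole parser object prints a lot of internal state. A narrower dump of just the extracted tables may be easier to read. The sketch below assumes only the documented `tables`, `coords` and `rows` methods plus Data::Dumper's `Indent` and `Sortkeys` settings. The sample markup and the `@dump` structure are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableExtract;
use Data::Dumper;

$Data::Dumper::Indent   = 1;   # compact indentation
$Data::Dumper::Sortkeys = 1;   # stable key order between runs

# Illustrative two-by-two table.
my $html = "<table><tr><td>a</td><td>b</td></tr>"
         . "<tr><td>c</td><td>d</td></tr></table>";

my $te = HTML::TableExtract->new();
$te->parse($html);
$te->eof;    # signal end of input

# Collect only the public data: position and cell text of each table.
my @dump = map { +{ coords => [ $_->coords ], rows => [ $_->rows ] } }
           $te->tables;

print Dumper(\@dump);
print "No tables were extracted\n" unless @dump;
```

An empty `@dump` here corresponds to the empty `_tables` hash in a full dump of the parser object.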

Thanks! I saw the data through the dumper and I see this: '_tables' => {}
