
I am trying to extract the table inside this HTML file using Perl.

I have tried this:

my $te = HTML::TableExtract->new();
$te->parse_file($g_log);
print "=====TE: $te ======\n";

Output is :

=====TE: HTML::TableExtract=HASH(0x266f5f) ======

I tried iterating over $te but found nothing. Can anyone advise what to do next? I am new to this.

This is the HTML file:

    <html xmlns="http://www.w3.org/1999/xhtml" xmlns:math="http://exslt.org/math"
          xmlns:testng="http://testng.org">
       <head xmlns="">
          <title>TestNG Results</title>
          <meta http-equiv="content-type" content="text/html; charset=utf-8"></meta>
          <meta http-equiv="pragma" content="no-cache"></meta>
          <meta http-equiv="cache-control" content="max-age=0"></meta>
          <meta http-equiv="cache-control" content="no-cache"></meta>
          <meta http-equiv="cache-control" content="no-store"></meta>
          <LINK rel="stylesheet" href="style.css"></LINK>
          <script type="text/javascript" src="main.js"></script>
       </head>
       <body>
          <h2>Test suites overview</h2>


<table width="100%">
                 <tr>
                    <td align="center" id="chart-container"><script type="text/javascript">
                                            renderSvgEmbedTag(600, 200);
                                        </script></td>
                 </tr>
              </table>

   </body>
  </html>

2 Answers

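#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableExtract;

my $filename = "testfile.html";
die "Cannot read $filename\n" unless -r $filename;

# Parse the file and collect every table found in it.
my $te = HTML::TableExtract->new();
$te->parse_file($filename);

foreach my $ts ($te->tables) {
   print "Table found at ", join(',', $ts->coords), ":\n";
   foreach my $row ($ts->rows) {
      # Empty cells can come back as undef, so print them as ''.
      print "   ", join(',', map { $_ // '' } @$row), "\n";
   }
}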

Note that HTML::TableExtract can also be invoked in 'tree' mode where the resulting HTML and extracted tables are encoded in HTML::Element tree structures.

use HTML::TableExtract 'tree';
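
As a rough sketch of what tree mode looks like in practice, the snippet below parses a small made-up HTML string (not the file from the question) and prints the first table both as HTML and as plain text. It assumes HTML::TreeBuilder and HTML::ElementTable are installed alongside HTML::TableExtract; the sample markup and variable names are only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableExtract qw(tree);   # tree mode; needs HTML::TreeBuilder and HTML::ElementTable

# Made-up sample document, not the file from the question.
my $html = <<'EOT';
<html><body>
<table>
  <tr><th>Suite</th><th>Passed</th></tr>
  <tr><td>Suite A</td><td>12</td></tr>
</table>
</body></html>
EOT

my $te = HTML::TableExtract->new();
$te->parse($html);
$te->eof;    # signal end of input, as parse_file() does internally

my ($table_html, $table_text) = ('', '');
my $table = $te->first_table_found;
if ($table) {
    my $tree = $table->tree;          # element tree for this table
    $table_html = $tree->as_HTML;     # table serialized back to HTML
    $table_text = $tree->as_text;     # text content only
    print "HTML: $table_html\n";
    print "Text: $table_text\n";
}
else {
    print "No tables found\n";
}
```

Because the table comes back as an element tree, the usual HTML::Element methods can be used on it, which may help when a cell holds markup (such as a script tag) rather than plain text.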


2 Comments

I tried the written code and it worked. Make sure that the path to HTML file is correct.
Plus 1 Good. @ChankeyPathak

I am not sure what you want to get out of the table, but I would strongly recommend using Data::Dumper to look inside the hash.

#!/usr/bin/perl

use strict;
use warnings;
use HTML::TableExtract;
use Data::Dumper;

my $html = <<'EOT';
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:math="http://exslt.org/math"
          xmlns:testng="http://testng.org">
       <head xmlns="">
          <title>TestNG Results</title>
          <meta http-equiv="content-type" content="text/html; charset=utf-8"></meta>
          <meta http-equiv="pragma" content="no-cache"></meta>
          <meta http-equiv="cache-control" content="max-age=0"></meta>
          <meta http-equiv="cache-control" content="no-cache"></meta>
          <meta http-equiv="cache-control" content="no-store"></meta>
          <LINK rel="stylesheet" href="style.css"></LINK>
          <script type="text/javascript" src="main.js"></script>
       </head>
       <body>
          <h2>Test suites overview</h2>


<table width="100%">
                 <tr>
                    <td align="center" id="chart-container"><script type="text/javascript">
                                            renderSvgEmbedTag(600, 200);
                                        </script></td>
                 </tr>
              </table>

   </body>
  </html>
EOT

my $te = HTML::TableExtract->new();
$te->parse($html);

print Dumper($te);
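
To illustrate why printing the object directly is not informative, here is a minimal sketch using only core modules. `My::Extractor` is a hypothetical stand-in for any object built on a blessed hash, not part of HTML::TableExtract; its `_tables` key is made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use Scalar::Util qw(blessed reftype);

# Hypothetical stand-in class: a blessed hash, like most Perl objects.
{
    package My::Extractor;
    sub new {
        my ($class) = @_;
        return bless { _tables => {} }, $class;
    }
}

my $obj = My::Extractor->new;

# Interpolating an object into a string yields "Class=HASH(0xADDRESS)",
# which identifies the object but shows none of its contents.
my $as_string = "$obj";
print "Interpolated: $as_string\n";

# The class name and the underlying data type can be queried separately.
print "Class: ", blessed($obj), ", underlying type: ", reftype($obj), "\n";

# Data::Dumper walks the structure and prints what is stored inside.
$Data::Dumper::Sortkeys = 1;
print Dumper($obj);
```

So a line like `HTML::TableExtract=HASH(0x...)` only says that `$te` is an object of that class; it does not by itself say whether any tables were found.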

1 Comment

Thanks! I saw the data through the dumper and I see this: '_tables' => {}
