I'm trying to save the whole web page on my system as a .html file and then parse that file, to find some tags and use them.
I'm able to save/parse http://<url>, but not able to save/parse https://<url>. I'm using Perl.
I'm using the following code to save HTTP and it works fine but doesn't work for HTTPS:
use strict;
use warnings;
use LWP::Simple qw($ua get);
use LWP::UserAgent;
use LWP::Protocol::https;
use HTTP::Cookies;
sub main
{
my $ua = LWP::UserAgent->new();
my $cookies = HTTP::Cookies->new(
file => "cookies.txt",
autosave => 1,
);
$ua->cookie_jar($cookies);
$ua->agent("Google Chrome/30");
#$ua->ssl_opts( SSL_ca_file => 'cert.pfx' );
$ua->proxy('http','http://proxy.com');
my $response = $ua->get('http://google.com');
#$ua->credentials($response, "", "usrname", "password");
unless($response->is_success) {
print "Error: " . $response->status_line;
}
# Let's save the output.
my $save = "save.html";
unless(open SAVE, '>' . $save) {
die "nCannot create save file '$save'n";
}
# Without this line, we may get a
# 'wide characters in print' warning.
binmode(SAVE, ":utf8");
print SAVE $response->decoded_content;
close SAVE;
print "Saved ",
length($response->decoded_content),
" bytes of data to '$save'.";
}
main();
Is it possible to parse an HTTPS page?
perl -MLWP::UserAgent -e '$ua=LWP::UserAgent->new;print $ua->get("https://github.com")->decoded_content();'