I am using a regex to grab the contents of every script tag on an HTML page. The regex and the code I use look like this:

$content = file_get_contents($url, false, stream_context_create(
                    array("http" => array("user_agent" => "any"))
            ));

$pattern = "/<script[^>]*?>([\s\S]*?)<\/script>/";
preg_match_all($pattern, $content, $inside_script_array);

echo "<pre>";
print_r($inside_script_array);
echo "</pre>";

When I use URL 1:

$url = 'http://www.bestylish.com/' ;

it returns all the script tags. But when I use URL 2:

$url = 'http://www.bestylish.com/sale' ;

it does not return many of the script tags, even though the same tags are present on that page and are returned for URL 1. What could be the reason?

1 Answer

The reason is that regular expressions are not a good tool for parsing HTML. If you can still switch to a DOM parser, fetching <script> tags can be as simple as:

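$domd = new DOMDocument();
// Collect libxml warnings about malformed real-world HTML instead of printing them.
libxml_use_internal_errors(true);
$domd->loadHTML(file_get_contents('http://www.google.com'));
libxml_use_internal_errors(false);

$items = $domd->getElementsByTagName('script');
$data = array();

foreach ($items as $item) {
  $data[] = array(
    'src' => $item->getAttribute('src'),
    'outerHTML' => $domd->saveHTML($item),
    // textContent holds the inline code and is '' for external scripts, where
    // firstChild is null and saveHTML(null) would dump the whole document.
    'innerHTML' => $item->textContent,
  );
}

print_r($data);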
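
To try the DOM approach without a network request, the same idea can be wrapped in a function and run against an inline HTML string. This is only a minimal sketch: the function name `extract_scripts` and the sample markup are made up for illustration and are not part of the code above. It reads each script body from `textContent`, which is an empty string for external scripts, and it also picks up the upper-case `<SCRIPT>` tag, which the case-sensitive pattern in the question would not match.

```php
<?php
// Hypothetical helper (illustrative name): collects every <script> element
// from an HTML string using DOMDocument.
function extract_scripts(string $html): array
{
    $doc = new DOMDocument();

    // Collect libxml warnings about malformed markup instead of emitting them,
    // then restore whatever setting was active before.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $scripts = array();
    foreach ($doc->getElementsByTagName('script') as $node) {
        $scripts[] = array(
            'src'  => $node->getAttribute('src'), // '' when the attribute is absent
            'code' => $node->textContent,         // '' for external scripts
        );
    }
    return $scripts;
}

// Made-up sample markup, so the sketch runs without fetching a page.
$html = '<html><head><script src="a.js"></script></head>'
      . '<body><SCRIPT type="text/javascript">var x = 1 < 2;</SCRIPT></body></html>';

print_r(extract_scripts($html));
```

For a real page, the string passed to `extract_scripts` would come from `file_get_contents($url, ...)` as in the question.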