3

I'm trying to scrape a <script> tag from a set of webpages using Simple HTML DOM. At first, I selected it by its numerical position on the page:

$script = $html->find('script', 17); //The tag I need is typically the 18th <script> tag on the page

I've come to realize that the order differs depending on the page (and it's just not a scalable way of doing this since it could change at any time). How can I instead search for a keyword within the tag that I need and then pull back the full tag? For example, the tag I need always contains the string "PRODUCT_METADATA".

Thanks in advance for any ideas!

1
  • Use XPath with SimpleXML or DOMDocument. Commented Aug 3, 2015 at 18:49
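
To illustrate the XPath suggestion in the comment above, here is a minimal sketch using PHP's built-in DOMDocument and DOMXPath rather than Simple HTML DOM. The sample markup in $page is hypothetical and stands in for the fetched page. The XPath function contains() keeps only the <script> elements whose text includes the keyword.

```php
<?php
// Hypothetical sample page; in practice this would be the HTML fetched from the site.
$page = <<<'HTML'
<html><head>
<script>var a = 1;</script>
<script>var PRODUCT_METADATA = {"id": 42};</script>
</head><body></body></html>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true); // real-world HTML is rarely valid; collect warnings quietly
$doc->loadHTML($page);
libxml_clear_errors();

// Select every <script> element whose text contains the keyword.
$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//script[contains(., "PRODUCT_METADATA")]');

// Take the first match, if any.
$script     = $nodes->length > 0 ? $nodes->item(0) : null;
$scriptText = $script !== null ? $script->textContent : null;    // contents of the tag
$scriptHtml = $script !== null ? $doc->saveHTML($script) : null; // the full tag as markup
```

Because the match is on content rather than position, it should keep working if the order of the <script> tags changes.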

2 Answers

7

I ended up using the below code to search all script tags for my keyword:

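$script = null; // stays null if no tag contains the keyword
$scripts = $html->find('script');
foreach ($scripts as $s) {
    // innertext is the tag's contents as a string
    if (strpos($s->innertext, 'PRODUCT_METADATA') !== false) {
        $script = $s;
        break; // stop at the first matching tag
    }
}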

0

It works, but I was trying to find a CSRF token hidden in a script tag, and at first I couldn't get it to work: all I got was NULL.

My solution was to call explode() on the script's contents. It is very important to use ->innertext; otherwise you don't get a string to work with.

I was lucky that the token was in double quotes, so it was easy to extract.

My final code looks like this:

$scripts = $html->find('script');
foreach($scripts as $s) {
    if (strpos($s->innertext, 'csrf_token') !== false) {
        $script_array = explode('"', $s->innertext);
        $token = $script_array[1];
        break;
    }
}
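
One caveat: $script_array[1] is simply the first double-quoted value in the tag, so it only works when the token comes first. As a possible alternative, the sketch below uses preg_match() to anchor on the variable name instead. The script text and the assignment format (csrf_token followed by = or : and a double-quoted value) are assumptions for illustration; in practice the string would come from $s->innertext.

```php
<?php
// Hypothetical script contents; in practice this would be $s->innertext.
$scriptText = 'var config = {"debug": false}; var csrf_token = "abc123";';

$token = null;
// Assumed format: csrf_token, then "=" or ":", then a double-quoted value.
if (preg_match('/csrf_token\s*[:=]\s*"([^"]+)"/', $scriptText, $m)) {
    $token = $m[1]; // the captured value between the quotes
}
```

With this sample input, explode('"', ...)[1] would return debug, while the pattern above captures the value assigned to csrf_token.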
