
I realize HTML cannot be reliably parsed with regex. However, I have a string containing source code from a typical Amazon web page.

            <script type="text/javascript">
                P.when("A", "jQuery").execute(function(A, $) {
                    var pageState = A.state('ftPageState');
                    if (typeof pageState === 'undefined') {
                        pageState = {};
                    }
                    if (pageState["fast-track-message"]) {
                        pageState["fast-track-message"].stopTimer();
                    }

        <li> 48 pages</li>

                    pageState["fast-track-message"] = new fastTrackCountDown(20710,"fast-track-message");
                    A.state('ftPageState', pageState);
                });
            </script>
        
        

I want to grab the 48. Every number will be followed by "pages</li>". How can I match just the number?

Attempt

var string_tester = String(datastuff.html());
var regex_tester = string_tester.match(/\d+ pages<\/li>/);
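For reference, here is a minimal sketch of what the attempt actually returns. It assumes a shortened stand-in string in place of datastuff.html(). Without the g flag, match returns an array whose element 0 is the entire matched text, or null if nothing matched.

```javascript
// Sketch: inspect what the attempt's non-global match returns.
// The sample string is a shortened stand-in for datastuff.html() (assumption).
const string_tester = 'if (pageState["x"]) {}\n<li> 48 pages</li>\nA.state("ftPageState", pageState);';

// Without the g flag, match returns an array (or null if nothing matches).
const regex_tester = string_tester.match(/\d+ pages<\/li>/);

// Element 0 is the whole matched text, not just the digits.
const wholeMatch = regex_tester ? regex_tester[0] : null;
console.log(wholeMatch);
```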

Answer 1

If you know it will always be in the list element, try this: (<li>\s*)([0-9]+)(\s*pages\s*</li>). The 48 will be in $2, the second capture group. (In a JavaScript regex literal, write the closing tag as <\/li>.) However, that won't accommodate number formatting such as "1,024". This should be generic enough: (<li>\s*)([0-9,\.\-\(\)]+)(\s*pages\s*</li>). I should note that Amazon has seller and publisher APIs that might provide a more stable route, depending on your use case.
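Here is a minimal sketch of both patterns as JavaScript regex literals. It uses shortened sample strings rather than a real Amazon page. Capture group 2 corresponds to $2 above. Converting "1,024" to a number by stripping commas is an added assumption about how the count is formatted.

```javascript
// Sketch: apply both patterns and read capture group 2.
// The sample strings are shortened stand-ins for real page source (assumption).
const html = 'pageState = {};\n<li> 48 pages</li>\nA.state("ftPageState", pageState);';

// Plain-digit version: group 2 holds the number.
const simple = /(<li>\s*)([0-9]+)(\s*pages\s*<\/li>)/;
const m1 = html.match(simple);
const pages = m1 ? m1[2] : null;

// Version that also accepts formatting characters such as the comma in "1,024".
const formatted = /(<li>\s*)([0-9,\.\-\(\)]+)(\s*pages\s*<\/li>)/;
const m2 = '<li> 1,024 pages</li>'.match(formatted);
const formattedPages = m2 ? m2[2] : null;

// Assumption: commas are thousands separators, so strip them before converting.
const pageCount = formattedPages ? Number(formattedPages.replace(/,/g, '')) : NaN;
console.log(pages, formattedPages, pageCount);
```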

Edit: I checked a few Amazon pages to see whether there was a better approach to getting what you want, and noticed that on the pages I checked there was no page-count list item at all, just this:

            <script type="text/javascript">
                P.when("A", "jQuery").execute(function(A, $) {
                    var pageState = A.state('ftPageState');
                    if (typeof pageState === 'undefined') {
                        pageState = {};
                    }
                    if (pageState["fast-track-message"]) {
                        pageState["fast-track-message"].stopTimer();
                    }
                    pageState["fast-track-message"] = new fastTrackCountDown(57592,"fast-track-message");
                    A.state('ftPageState', pageState);
                });
            </script>

I don't know what you are doing, but I wanted to mention that in case it invalidates an assumption you have made.
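Because some pages may lack the list item entirely, here is a minimal sketch of a guard that returns null instead of throwing when nothing matches. extractPageCount is a hypothetical helper name, not an Amazon or jQuery API. The pattern is the plain-digit one from above, with a single capture group.

```javascript
// Sketch: return the page count as a number, or null when the page has
// no "<li> N pages</li>" item. extractPageCount is a hypothetical helper.
function extractPageCount(html) {
  const m = /<li>\s*(\d+)\s*pages\s*<\/li>/.exec(html);
  // exec returns null on no match, so guard before reading group 1.
  return m ? Number(m[1]) : null;
}

const withCount = extractPageCount('<li> 48 pages</li>');
const withoutCount = extractPageCount('<script>A.state("ftPageState", pageState);</script>');
console.log(withCount, withoutCount); // 48 null
```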


Comment:

Thanks for the information; you make a good point. I ran across this: docs.aws.amazon.com/AWSECommerceService/latest/DG/…. So obtaining the ISBN and going through their API seems like a good route.
Answer 2

Your attempt was close, but it returned the whole match, "48 pages</li>", instead of just "48".

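  • To match one number per query, use
    string_tester.match(/(\d+) pages<\/li>/)[1];
    Note the parentheses: they create a capture group, and index [1] returns what it captured. Because match returns null when nothing matches, check the result before indexing.
  • To match multiple numbers, add the g flag and pull the digits out of each match: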

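var string_tester = "testing <li> 48 pages</li> now, and also testing <li> 52 pages</li>. see?";
// With the g flag, match returns every full match: ["48 pages</li>", "52 pages</li>"]
var regex_tester = string_tester.match(/\d+ pages<\/li>/g)
               .map(function(m){
                 return m.match(/\d+/)[0]; // keep only the digits; or: return m.replace(/\D/g, "");
               });
// regex_tester is now ["48", "52"]; write it into the snippet's <p> element
document.getElementsByTagName('p')[0].innerHTML = regex_tester;

HTML:
<p></p>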
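As an alternative to the match + map approach, here is a sketch using String.prototype.matchAll. This is a swapped-in technique, not the answer's. It assumes an ES2020 environment such as a modern browser or Node 12+. The capture group already isolates the digits, so no second regex is needed.

```javascript
// Sketch: collect every page count in one pass with String.prototype.matchAll.
const string_tester = "testing <li> 48 pages</li> now, and also testing <li> 52 pages</li>. see?";

const pageCounts = Array.from(
  string_tester.matchAll(/(\d+) pages<\/li>/g), // matchAll throws without the g flag
  (m) => m[1] // m[1] is the captured number as a string
);
// pageCounts is ["48", "52"]
console.log(pageCounts);
```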
