
I have an HTML page which has a very large and very complex chunk of JSON in a script tag.

I want to extract the JSON so that I can decode it in a php script.

The JSON looks something like:

<script type="text/javascript">
    var user_list_data_obj = (
    ({

    ... truncated ...

    })
    );

    ... some more js ...
</script>

The script tags can't be used in the pattern, because there's other JS between them, and there's nothing to make them unique anyway.

I believe I need to match against the variable name, and the first occurrence of '}));' but my attempts to match that have failed.

What I've got so far is:

$pattern = '/var user_list_data_obj = \(\s\(({.*})\)\s\);/';

Which returns nothing.

What am I doing wrong in that pattern? I know it's difficult to match anything that has opening and closing delimiters, like JSON, with a regex, but it should be possible in this case, no?

EDIT:

I'm trying to get the entire "user_list_data_obj" object parsed into my php script. But really, the bits I'm interested in are the several "columns :[] " arrays, so if it's easier to get those out separately, it might make sense to do that.

The columns[] arrays look something like

columns : [
       { display_value : '<input type="checkbox" name="user" value="username">'}, 
       { display_value : 'username', sort_value : 'username'}, 
       { display_value : 'username', sort_value : 'username'}, 
       { display_value : 'Enabled', sort_value : '1' },
       { display_value : '<img class="" src="/enabled.gif">', sort_value : '1' }, 
       { display_value : '<img class="" src="/enabled.gif">', sort_value : '1' },
       { display_value : '<img class="" src="/enabled.gif">', sort_value : '1' }
       ],
  • Did you try using the m modifier for multiline? Commented May 10, 2013 at 17:59
  • The JSON grammar is not regular, so you need to depend on text before and after the JSON to find its boundaries. Can you expand on the context in which it appears? Commented May 10, 2013 at 18:19
  • @MikeSamuel I've added a pastebin that has an actual example, you can see the context in that. Commented May 10, 2013 at 18:39
  • @AmitKriplani: Multiline mode (m) is not relevant here. It changes the behavior of the anchors (^ and $), which he isn't using. It's singleline mode (s) that lets the dot match line separator characters. Commented May 10, 2013 at 19:02

2 Answers

I was able to match the entire json object with the following

/user_list_data_obj\s*=\s*\(\s*\({(.*?)}\)\s*\);/

But in actuality, I ended up using preg_match_all to match each columns[] array in the json by using:

/columns\s*:\s*\[.*?\],/s
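
A minimal sketch of that second approach, assuming the page source is already in a string. The `$html` sample below is made up for illustration; only its shape follows the excerpt in the question.

```php
<?php
// Hypothetical sample standing in for the real page; only the shape matters.
$html = <<<'HTML'
<script type="text/javascript">
    var user_list_data_obj = (
    ({
        rows : [
            { columns : [
                { display_value : 'alice', sort_value : 'alice' },
                { display_value : 'Enabled', sort_value : '1' }
                ],
              id : 1 },
            { columns : [
                { display_value : 'bob', sort_value : 'bob' },
                { display_value : 'Disabled', sort_value : '0' }
                ],
              id : 2 }
        ]
    })
    );
</script>
HTML;

// The s modifier lets "." cross newlines; the lazy ".*?" stops at the first "],".
$count = preg_match_all('/columns\s*:\s*\[.*?\],/s', $html, $matches);

// $matches[0] holds one string per columns array that was found.
foreach ($matches[0] as $i => $block) {
    echo "Match $i is ", strlen($block), " bytes long\n";
}
```

Because the match ends at the first `],`, a display value that itself contains `],` would cut a match short, so this is only safe when the data is known not to contain that sequence.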

2 Comments

You saved my day! I changed it a little and it worked fine. For those whose data is not wrapped in parentheses, the regex is: datasource\s*=\s*\s*{(.*?)}\s*;
And to get actual json: (?<=datasource\s*=\s*){(.*?)}\s*;

The closest I can get is

preg_match('/var user_list_data_obj = \(\s+\(({.*})\)\s+\);/s', $html, $matches);

The s modifier lets the dot match newlines, and \s+ (instead of a single \s) allows for more than one whitespace character between the parentheses.
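
As a rough illustration of those two changes, the sketch below runs the pattern from the question and the pattern above against a small, made-up string shaped like the question's snippet (the `$html` content is an assumption, not the real page).

```php
<?php
// Hypothetical stand-in for the page: a newline plus indentation sits
// between "(" and "({", as in the question's snippet.
$html = "var user_list_data_obj = (\n    ({\n        id : 1\n    })\n    );\n";

// Pattern from the question: a single \s and no s modifier.
$original = '/var user_list_data_obj = \(\s\(({.*})\)\s\);/';
// Pattern from this answer: \s+ and the s modifier.
$fixed = '/var user_list_data_obj = \(\s+\(({.*})\)\s+\);/s';

$originalHit = preg_match($original, $html);
$fixedHit = preg_match($fixed, $html, $m);

var_dump($originalHit); // no match: one \s cannot cover "\n" plus the indent
var_dump($fixedHit);    // match: $m[1] holds the text from "{" to "}"
```

Note that the captured text is a JavaScript object literal; if the real data uses unquoted keys and single-quoted strings like the columns excerpt in the question, it would not be accepted by json_decode as-is.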

This is imperfect as it makes assumptions about the structure: namely that the JSON you need literally starts with

( /* some space */
({

and ends with

}) /* some space */
);

If you can't make those assumptions, a less specific regex will likely match other parts of the script. Also, if you have }) ); at some point in the script that you don't want to match, it will still be matched. Using {.*?} won't work because there can be many nested object literals in the string you want to capture.
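
To make the nesting problem concrete, here is a small sketch on a made-up object literal (the `$js` string is hypothetical). It compares an unanchored lazy `{.*?}` with the greedy, anchored form used above.

```php
<?php
// Hypothetical object literal containing two nested objects.
$js = "var user_list_data_obj = (\n({ a : { b : 1 }, c : { d : 2 } })\n);";

// Lazy and unanchored: stops at the first "}" it can find.
preg_match('/\(({.*?})/s', $js, $lazy);

// Greedy and anchored on the closing "})" and ");" sequence.
preg_match('/\(({.*})\)\s+\);/s', $js, $greedy);

echo "lazy:   ", $lazy[1], "\n";
echo "greedy: ", $greedy[1], "\n";
```

The lazy capture ends inside the first nested object, while the greedy one runs to the last `})` followed by `);`, which is also why a later `}) );` elsewhere in the script would be swallowed.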

1 Comment

Ahh yes, I do see where I went wrong in my expression after looking at yours, but unfortunately, neither works to extract what I'm trying to extract. It looks like the contents of the JSON may be causing some trouble. Here's an actual example of something I'm trying to extract: pastebin.com/DvNx1TF0
