I have an HTML page which has a very large and very complex chunk of JSON in a script tag.
I want to extract the JSON so that I can decode it in a PHP script.
The JSON looks something like:
<script type="text/javascript">
var user_list_data_obj = (
({
... truncated ...
})
);
... some more js ...
</script>
The script tags can't be used in the pattern, because there's other JS between them, and there's nothing to make them unique anyway.
I believe I need to match against the variable name, and the first occurrence of '}));' but my attempts to match that have failed.
What I've got so far is:
$pattern = '/var user_list_data_obj = \(\s\(({.*})\)\s\);/';
Which returns nothing.
What am I doing wrong in that pattern? I know it's difficult to match anything that has opening and closing delimiters, like JSON, with a regex, but it should be possible in this case, no?
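A minimal sketch of a pattern that should match, assuming the page looks like the sample above. Three changes from the original attempt: the "s" modifier so that "." can cross line breaks, "\s*" instead of "\s" so that any amount of whitespace (including "\r\n") is tolerated between the parentheses, and a lazy ".*?" so the match stops at the first "})" followed by ");" rather than the last one. The $html sample and the $objectLiteral variable are hypothetical stand-ins for the real page.

```php
<?php
// Hypothetical stand-in for the page HTML; the real object is far larger.
$html = <<<'HTML'
<script type="text/javascript">
var user_list_data_obj = (
({
  columns : [
    { display_value : 'username', sort_value : 'username'}
  ]
})
);
var other = 1;
</script>
HTML;

// "s": "." also matches newlines.
// "\s*": any whitespace between "(" and "(", and between ")" and ");".
// ".*?": lazy, so the capture ends at the FIRST "})" followed by ");".
$pattern = '/var user_list_data_obj = \(\s*\((\{.*?\})\)\s*\);/s';

$objectLiteral = null;
if (preg_match($pattern, $html, $m)) {
    // Text from the opening "{" to the closing "}" of the object literal.
    $objectLiteral = $m[1];
    echo $objectLiteral, "\n";
}
```

Note that the captured text is a JavaScript object literal, not strict JSON, so it may still need further processing before json_decode() will accept it.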
EDIT:
I'm trying to get the entire "user_list_data_obj" object parsed into my PHP script. But really, the bits I'm interested in are the several "columns : []" arrays, so if it's easier to get those out separately, it might make sense to do that.
The columns[] arrays look something like this:
columns : [
{ display_value : '<input type="checkbox" name="user" value="username">'},
{ display_value : 'username', sort_value : 'username'},
{ display_value : 'username', sort_value : 'username'},
{ display_value : 'Enabled', sort_value : '1' },
{ display_value : '<img class="" src="/enabled.gif">', sort_value : '1' },
{ display_value : '<img class="" src="/enabled.gif">', sort_value : '1' },
{ display_value : '<img class="" src="/enabled.gif">', sort_value : '1' }
],
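Because the columns arrays use unquoted keys and single-quoted strings, they are JavaScript object literals rather than JSON, and json_decode() returns null for them. A minimal sketch of pulling the display_value/sort_value pairs out directly with regexes instead, assuming (1) no value contains a single quote, (2) no "]" appears inside a columns array, and (3) PHP 7.2 or later for PREG_UNMATCHED_AS_NULL. The $js sample, $blocks, and $rows are hypothetical names used only for illustration.

```php
<?php
// Hypothetical fragment modelled on the columns[] array above.
$js = <<<'JS'
columns : [
{ display_value : '<input type="checkbox" name="user" value="username">'},
{ display_value : 'username', sort_value : 'username'},
{ display_value : 'Enabled', sort_value : '1' }
],
JS;

// Not valid JSON (unquoted keys, single quotes), so this yields null.
$decoded = json_decode($js);

// Grab the body of every "columns : [ ... ]" block (assumes no "]" inside).
preg_match_all('/columns\s*:\s*\[(.*?)\]/s', $js, $blocks);

$rows = [];
foreach ($blocks[1] as $block) {
    // One match per "{ display_value : '...' [, sort_value : '...'] }" entry.
    preg_match_all(
        "/\{\s*display_value\s*:\s*'([^']*)'(?:\s*,\s*sort_value\s*:\s*'([^']*)')?\s*\}/",
        $block,
        $cells,
        PREG_SET_ORDER | PREG_UNMATCHED_AS_NULL
    );

    $columns = [];
    foreach ($cells as $c) {
        $columns[] = [
            'display_value' => $c[1],
            // sort_value is optional (missing on the checkbox column).
            'sort_value'    => $c[2] ?? null,
        ];
    }
    $rows[] = $columns;
}

print_r($rows);
```

If the real data contains escaped quotes or nested brackets inside values, this approach will break, and a proper JavaScript parser would be the safer choice.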
Multiline mode (m) is not relevant here. It changes the behavior of the anchors (^ and $), which he isn't using. It's single-line mode (s) that lets the dot match newline characters.
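A small sketch contrasting the two modifiers on a hypothetical multi-line string: adding "m" leaves the match failing, while adding "s" lets ".*" run across the line breaks.

```php
<?php
// Hypothetical input: a tiny object literal spread over several lines.
$text = "({\n  a : 1\n})";
$pattern = '/\(\{.*\}\)/';

// By default "." does not match "\n", so ".*" cannot reach the "})".
$plain = preg_match($pattern, $text);         // 0
// "m" only affects ^ and $, so the dot still stops at the newline.
$withM = preg_match($pattern . 'm', $text);   // 0
// "s" makes "." match newlines too, so the whole literal matches.
$withS = preg_match($pattern . 's', $text);   // 1
```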