0

I need to extract JSON that is embedded inside some HTML tags. How can I extract the name (key) values from the JSON below using a regular expression?

<div id="gwt_products_display_results" class="gwt_products_display_results">
                <span class="JSON" style="display: none;">
{
    "products": [
        {
            "targetURL": "/athena-mineral-fabric-by-the-yard/262682",
            "listIndex": "0",
            "minimumPrice": 20,
            "categoryOnSale": "false",
            "mfPartNumber": "FF010ATM",
            "hasAtLeastOneBuyableAndPublishedItem": "true",
            "attributes": [],
            "partNumber": "b_FF010ATM",
            "itemAsProduct": "true",
            "iapAttribute": "",
            "productDetailTargetURL": "/athena-mineral-fabric-by-the-yard/262682",
            "iapAttributeCode": "",
            "beanType": "bundle",
            "name": "Athena Mineral Fabric by the Yard",
            "maxListPrice": 0,
            "thumbNail": "null",
            "hasSaleSKUs": false,
            "productId": "262682",
            "currencyCode": "USD",
            "hasMoreColors": false,
            "xPriceLabel": "null",
            "minListPrice": 0,
            "maximumPrice": 20,
            "iapAttributeDisplayName": "",
            "shortDescription": "null",
            "listId": "SEARCHRESULTS",
            "categoryId": "null"
        },
        {
            "targetURL": "/athena-slate-fabric-by-the-yard/262683",
            "listIndex": "1",
            "minimumPrice": 20,
            "categoryOnSale": "false",
            "mfPartNumber": "FF010ATS",
            "hasAtLeastOneBuyableAndPublishedItem": "true",
            "attributes": [],
            "partNumber": "b_FF010ATS",
            "itemAsProduct": "true",
            "iapAttribute": "",
            "productDetailTargetURL": "/athena-slate-fabric-by-the-yard/262683",
            "iapAttributeCode": "",
            "beanType": "bundle",
            "name": "Athena Slate Fabric by the Yard",
            "maxListPrice": 0,
            "thumbNail": "null",
            "hasSaleSKUs": false,
            "productId": "262683",
            "currencyCode": "USD",
            "hasMoreColors": false,
            "xPriceLabel": "null",
            "minListPrice": 0,
            "maximumPrice": 20,
            "iapAttributeDisplayName": "",
            "shortDescription": "null",
            "listId": "SEARCHRESULTS",
            "categoryId": "null"
        },
        {
            "targetURL": "/typewriter-keys-giclee/261307",
            "listIndex": "2",
            "minimumPrice": 259,
            "categoryOnSale": "false",
            "mfPartNumber": "WD813",
            "hasAtLeastOneBuyableAndPublishedItem": "true",
            "attributes": [
                {
                    "S7 - Overlay 1": "blank"
                }
            ],
            "partNumber": "p_WD813",
            "itemAsProduct": "true",
            "iapAttribute": "",
            "productDetailTargetURL": "/typewriter-keys-giclee/261307",
            "iapAttributeCode": "",
            "beanType": "product",
            "name": "Typewriter Keys Giclee",
            "maxListPrice": 0,
            "thumbNail": "null",
            "hasSaleSKUs": false,
            "productId": "261307",
            "currencyCode": "USD",
            "hasMoreColors": false,
            "xPriceLabel": "null",
            "minListPrice": 0,
            "maximumPrice": 259,
            "iapAttributeDisplayName": "",
            "shortDescription": "null",
            "listId": "SEARCHRESULTS",
            "categoryId": "null"
        }
    ]
}
</span>
</div>

What I have tried so far is:

<span class="JSON" style="display: none;">([\s\S]+?)<\/span>
3
  • 4
    Why??? Just use json_decode. Commented May 31, 2013 at 12:28
  • 4
    Why for everything-in-the-world-that-might-be-considered-holy’s sake would you want to use regular expressions on a data structure like JSON? Parse it into an object/array, and access the values you want directly or by looping over it. Commented May 31, 2013 at 12:28
  • If you are planning to drop json_decode() and write your own full-fledged JSON parser, you'll probably need much more than regular expressions, because JSON allows arbitrary elements to be nested to any depth. Are you looking for something to do this summer? Commented May 31, 2013 at 12:30

3 Answers

4

You can decode it into an associative array and then get the key names using array_keys():

$array = json_decode($json, true);

$keys = array_keys($array['products'][0]);
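
If the goal is the value stored under each product's "name" key rather than the list of key names, array_column() on the decoded array may be a better fit. A minimal sketch, assuming the JSON has already been isolated in a string; the shortened $json sample is made up for illustration and is not the full payload from the question:

```php
<?php
// Shortened, hypothetical sample; the real payload has many more fields per product.
$json = '{"products":[{"name":"Athena Mineral Fabric by the Yard","productId":"262682"},{"name":"Typewriter Keys Giclee","productId":"261307"}]}';

// Passing true makes json_decode() return associative arrays instead of objects.
$data = json_decode($json, true);

// Key names of the first product.
$keys = array_keys($data['products'][0]);

// Value of the "name" key for every product (array_column() needs PHP 5.5+).
$names = array_column($data['products'], 'name');

print_r($keys);
print_r($names);
```

Note that calling array_keys() on $data['products'] itself would only return the numeric indexes of the list, which is why the first product is used to read the key names.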

1

Why a regular expression? As others here have mentioned, you can use json_decode to parse it into an array and process that.

But if you insist on a regular expression, I would say /"(.+?)":/ will match ALL keys, provided your JSON has exactly the format shown.
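
If only the values of the "name" key are wanted and a regex really is a hard requirement, something along these lines might do. This is a sketch under assumptions, not a general JSON parser: it assumes the values are plain JSON strings, and the shortened $html sample and the $pattern variable are made up for illustration:

```php
<?php
// Shortened, hypothetical sample of the markup from the question.
$html = '<span class="JSON" style="display: none;">
{ "products": [
    { "name": "Athena Mineral Fabric by the Yard", "productId": "262682" },
    { "name": "Typewriter Keys Giclee", "productId": "261307" }
] }
</span>';

// Capture the string value that follows each "name" key.
// The inner group accepts any character except a quote or backslash,
// or a backslash followed by one character (an escape such as \").
$pattern = '/"name"\s*:\s*"((?:[^"\\\\]|\\\\.)*)"/';
preg_match_all($pattern, $html, $matches);

// $matches[1] holds the captured values, in document order.
$names = $matches[1];
print_r($names);
```

The captured values are still JSON-escaped, and the pattern matches a "name" key at any nesting level, so decoding the JSON properly remains the safer route.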

UPDATE

So you are getting it from an HTML string. Assuming the variable is $html, and since you insist on a regular expression, extract the JSON with a regex as follows and then decode it. To get the keys, use array_keys().

preg_match('/<span.*?class="JSON".*?>(.+?)<\/span>/s', $html, $matches);

$decoded_array = json_decode($matches[1], true);

print_r($decoded_array);

$keys = array_keys($decoded_array['products'][0]);

print_r($keys);
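
The snippet above assumes that the span is always present and that its content is valid JSON. A slightly more defensive variant could look like the sketch below; extractEmbeddedJson() is a hypothetical helper name, and the one-line $html sample is made up for illustration:

```php
<?php
// Hypothetical helper: returns the decoded JSON as an array, or null on any failure.
function extractEmbeddedJson($html)
{
    // preg_match() returns 1 only when the span was found.
    if (preg_match('/<span[^>]*class="JSON"[^>]*>(.+?)<\/span>/s', $html, $m) !== 1) {
        return null;
    }

    $decoded = json_decode(trim($m[1]), true);

    // json_decode() returns null for invalid JSON; json_last_error() confirms the failure.
    if ($decoded === null && json_last_error() !== JSON_ERROR_NONE) {
        return null;
    }

    return $decoded;
}

$html = '<div><span class="JSON" style="display: none;">{"products":[{"name":"Typewriter Keys Giclee"}]}</span></div>';

$data    = extractEmbeddedJson($html);
$missing = extractEmbeddedJson('<p>no JSON here</p>');

var_dump($data['products'][0]['name']);
var_dump($missing);
```

Returning null for both failure modes keeps the caller simple; if the two cases need to be told apart, throwing distinct exceptions would be one alternative.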

2 Comments

Actually, I am taking the data from an HTML structure, so it's difficult to use json_decode.
Then extract the JSON from the HTML and decode it afterwards. Updated the answer.
0

You can use DOMDocument and DOMXPath to find the span elements which contain the JSON, and then json_decode their content. Here's a rough example to get you on your way:

<?php
$html = '
<html>
    <head>
        <title>Example</title>
    </head>
    <body>
        <div id="gwt_products_display_results" class="gwt_products_display_results">
            <span class="JSON" style="display: none;">
            {
                "products": [
                    {
                        "targetURL": "/athena-mineral-fabric-by-the-yard/262682",
                        "listIndex": "0",
                        "minimumPrice": 20,
                        "categoryOnSale": "false",
                        "mfPartNumber": "FF010ATM",
                        "hasAtLeastOneBuyableAndPublishedItem": "true",
                        "attributes": [],
                        "partNumber": "b_FF010ATM",
                        "itemAsProduct": "true",
                        "iapAttribute": "",
                        "productDetailTargetURL": "/athena-mineral-fabric-by-the-yard/262682",
                        "iapAttributeCode": "",
                        "beanType": "bundle",
                        "name": "Athena Mineral Fabric by the Yard",
                        "maxListPrice": 0,
                        "thumbNail": "null",
                        "hasSaleSKUs": false,
                        "productId": "262682",
                        "currencyCode": "USD",
                        "hasMoreColors": false,
                        "xPriceLabel": "null",
                        "minListPrice": 0,
                        "maximumPrice": 20,
                        "iapAttributeDisplayName": "",
                        "shortDescription": "null",
                        "listId": "SEARCHRESULTS",
                        "categoryId": "null"
                    },
                    {
                        "targetURL": "/athena-slate-fabric-by-the-yard/262683",
                        "listIndex": "1",
                        "minimumPrice": 20,
                        "categoryOnSale": "false",
                        "mfPartNumber": "FF010ATS",
                        "hasAtLeastOneBuyableAndPublishedItem": "true",
                        "attributes": [],
                        "partNumber": "b_FF010ATS",
                        "itemAsProduct": "true",
                        "iapAttribute": "",
                        "productDetailTargetURL": "/athena-slate-fabric-by-the-yard/262683",
                        "iapAttributeCode": "",
                        "beanType": "bundle",
                        "name": "Athena Slate Fabric by the Yard",
                        "maxListPrice": 0,
                        "thumbNail": "null",
                        "hasSaleSKUs": false,
                        "productId": "262683",
                        "currencyCode": "USD",
                        "hasMoreColors": false,
                        "xPriceLabel": "null",
                        "minListPrice": 0,
                        "maximumPrice": 20,
                        "iapAttributeDisplayName": "",
                        "shortDescription": "null",
                        "listId": "SEARCHRESULTS",
                        "categoryId": "null"
                    }
                ]
            }
            </span>
        </div>
    </body>    
</html>
';

$document   = new DOMDocument();
$document->loadHTML($html);
$xpath      = new DOMXPath($document);
$spans      = $xpath->query('//div/span[@class="JSON"]');

foreach ($spans as $span) {
    $catalog = json_decode($span->nodeValue);
    printf("We found %d products.\n", count($catalog->products));
    foreach ($catalog->products as $index => $product) {
        printf("Product #%d - %s.\n", ++$index, $product->name);
    }
}

/*
    We found 2 products.
    Product #1 - Athena Mineral Fabric by the Yard.
    Product #2 - Athena Slate Fabric by the Yard.
*/

1 Comment

Sorry to say, I need regex instead of DOMXPath.
