How to parse HTML data into a PHP array

HTML Data

<div class="test">
    <strong>ID</strong>
    <a href="a.html" title="a html">123456</a><br>
    <label class='label'>Occupation </label>    
    House wife      <br>
    <label>Language?</label>    
    English     <br>
    <label style="width:50%">Basic Language Knowledge of?</label>   
    Hindi       <br>
    <label>Start date</label>
    Nov 2013        <br>
    <label>Other Info</label>
    yes     <br>
    <label>age</label>
    19      <br>
    <label>Gender</label>   
    Female      <br>
    <strong>Address</strong>
    India       <br><br>
    <p>Hi, <br>
Lorem ipsum doner inut</p>
</div>

I tried this,

<?php
    $html='Let above html to parse';
    preg_match_all('/<label\s(.*)>(.*)<\/label>/U',$html,$m);
    print_r($m);
    // This returns only the label contents, but I need each label's text
    // paired with the value that appears after it.
?>

The output I want looks like this:

Array('ID'=>123456,'link'=>'a.html','Occupation'=>'House wife','Language?'=>'English', 'Basic Language Knowledge of?'=>'Hindi','Start date'=>'Nov 2013','Other Info'=>'yes' ,'age'=>'19','Gender'=>'Female','Address'=>'India','description'=>'Hi, Lorem ipsum doner inut');

I forgot to mention that I am using ganon for scraping.

  • so what is the problem? Commented Nov 11, 2013 at 10:16

3 Answers

Use DOMDocument to parse HTML.

$doc = new DOMDocument();
$doc->loadHTML($html);

And use DOMXPath to get all your labels:

$xpath = new DOMXPath($doc);
$allLabels = $xpath->query('//label');

foreach($allLabels as $label) {
    var_dump($label, $label->nodeValue);

    /* or, to collect the label's inner HTML (child nodes, text included) */
    $labelElmnts = $xpath->query('./node()', $label);

    $innerHTML = '';

    foreach($labelElmnts as $elmnt)
        $innerHTML .= $doc->saveHTML($elmnt);

    var_dump($innerHTML);
}
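
To get each label paired with the value that follows it, one option is to walk the siblings after every label or strong element until the next br. The following is only a sketch under stated assumptions: the markup is a shortened copy of the HTML in the question, valueAfter() is a hypothetical helper name, and the 'link' and 'description' keys are taken from the desired output shown in the question.

```php
<?php
// Shortened copy of the markup from the question.
$html = <<<'HTML'
<div class="test">
    <strong>ID</strong>
    <a href="a.html" title="a html">123456</a><br>
    <label class='label'>Occupation </label>
    House wife      <br>
    <label style="width:50%">Basic Language Knowledge of?</label>
    Hindi       <br>
    <strong>Address</strong>
    India       <br><br>
    <p>Hi, <br>
Lorem ipsum doner inut</p>
</div>
HTML;

// Hypothetical helper: concatenate the text of every sibling that follows
// $node, stopping at the first <br>.
function valueAfter(DOMNode $node): string
{
    $value = '';
    for ($n = $node->nextSibling; $n !== null; $n = $n->nextSibling) {
        if ($n->nodeName === 'br') {
            break;
        }
        $value .= $n->textContent;
    }
    return trim($value);
}

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$div = $xpath->query('//div[@class="test"]')->item(0);

$result = [];
// Every <label> and <strong> directly inside the div is treated as a key.
foreach ($xpath->query('label | strong', $div) as $key) {
    $result[trim($key->textContent)] = valueAfter($key);
}

// The link target and the description live in their own elements.
$result['link'] = $xpath->query('a', $div)->item(0)->getAttribute('href');
$result['description'] = trim(
    preg_replace('/\s+/', ' ', $xpath->query('p', $div)->item(0)->textContent)
);

print_r($result);
```

All values come back as strings, so the ID would need an (int) cast if a number is required.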

2 Comments

It will only give the list of label data. I need the text that comes after each label, like 'occupation'=>'house wife'.
Read the doc: php.net/manual/en/class.domnode.php. ->nodeValue is what you're looking for. Have a look at my edited code in the answer.

Even easier solution.

Use QueryPath:

foreach(qp($html, 'label') as $label){
  echo $label->text();
}

Just like jQuery.


I use ganon, so I did not want to switch to DOMDocument. I tried the following, and it worked:

// for description
echo $desc=$html('div.right_div p',0)->getInnerText();

$s=$html('div.right_div',0)->getInnerText();

// for occupation ([^>]* allows attributes on the label; the "/" in <br> is optional)
$r='/<label[^>]*>\s*Occupation\s*<\/label>\s*(.*?)\s*<br\s*\/?>/i';
if (preg_match($r,$s,$ma))
    echo $occupation=$ma[1];

// for address
$r='/<strong>\s*Address\s*<\/strong>\s*(.*?)\s*<br\s*\/?>/i';
if (preg_match($r,$s,$ma))
    echo $address=$ma[1];

// for id
echo $id=$html('div.right_div a',0)->getInnerText();

And so on ...
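
Rather than writing one pattern per field, the same idea can probably be generalized into a single pattern that captures every label/value pair at once. This is a rough sketch, not a robust HTML parser: it assumes $s already holds the container's inner HTML as a plain string (the sample below is a shortened copy of the markup in the question), that every value ends at a br tag, and that keys are wrapped in label or strong.

```php
<?php
// Assumption: $s holds the inner HTML of the container as a plain string.
$s = <<<'HTML'
<strong>ID</strong>
<a href="a.html" title="a html">123456</a><br>
<label class='label'>Occupation </label>
House wife      <br>
<label>Start date</label>
Nov 2013        <br />
<strong>Address</strong>
India       <br><br>
HTML;

// Group 1: tag name, group 2: key text, group 3: everything up to the next <br>.
$pattern = '/<(label|strong)\b[^>]*>\s*(.*?)\s*<\/\1>\s*(.*?)\s*<br\s*\/?>/is';

$result = [];
if (preg_match_all($pattern, $s, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $m) {
        // strip_tags() removes wrappers such as the <a> around the ID.
        $result[$m[2]] = trim(strip_tags($m[3]));
    }
}

print_r($result);
```

The link and the description are not covered by this pattern; they can still be read with the ganon selectors shown above.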
