
I hope someone can help me, because I'm not familiar with regex.
I need to extract data from a plain HTML page into a PHP array.
The HTML code is as follows:

<html>
...some html code...
<div data-companycounter="9879" data-code="A" data-seatcounter="9783" class="">
...some html code...
<div data-companycounter="9879" data-code="B" data-seatcounter="9784" class="">
...some html code...
<div data-companycounter="11397" data-code="A" data-seatcounter="11509" class="">
...some html code...
</html>

I would like to extract the data into an array like this:

$companycounter = [
    9879 => [
        'A' => 9783,
        'B' => 9784,
    ],
    11397 => [
        'A' => 11509
    ]
];

I hope that's clear enough. Thanks to anyone who can help.

  • You might want to use an actual HTML parser for this job. Commented May 11, 2021 at 22:25
  • php.net/manual/en/class.domdocument Commented May 11, 2021 at 23:36
  • Yes, if you use the DOMDocument class it is easy to extract the data. Could you please check out my answer? Commented May 14, 2021 at 18:44

2 Answers

function custom_parse_html($html)
{
    $company_counter = [];

    // [^"]* stops at the closing quote, unlike a greedy .*, so two divs
    // on the same line cannot be merged into one match.
    preg_match_all(
        '/<div\s+data-companycounter="([^"]*)"\s+data-code="([^"]*)"\s+data-seatcounter="([^"]*)"[^>]*>/i',
        $html,
        $matches,
        PREG_SET_ORDER
    );

    foreach ($matches as $match) {
        //  $match[1] => data-companycounter
        //  $match[2] => data-code
        //  $match[3] => data-seatcounter

        // Nest each code under its company; PHP creates the inner array on first use.
        $company_counter[$match[1]][$match[2]] = (int) $match[3];
    }

    return $company_counter;
}
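
The pattern above only matches when the three attributes appear in exactly that order. A rough sketch of an order-tolerant variant follows: it first grabs each opening div tag, then collects that tag's data-* attributes by name. The function name parse_company_counters is hypothetical, and the sketch assumes double-quoted attribute values that contain no ">" character; an HTML parser remains the more robust choice.

```php
<?php
// Hypothetical helper for illustration; not part of the original answer.
function parse_company_counters(string $html): array
{
    $result = [];

    // Step 1: grab every opening <div ...> tag.
    preg_match_all('/<div\b[^>]*>/i', $html, $tags);

    foreach ($tags[0] as $tag) {
        // Step 2: collect this tag's data-* attributes, in whatever order they appear.
        preg_match_all('/\bdata-([a-z]+)="([^"]*)"/i', $tag, $attrs, PREG_SET_ORDER);

        $data = [];
        foreach ($attrs as $attr) {
            $data[strtolower($attr[1])] = $attr[2];
        }

        // Skip divs that lack any of the three attributes.
        if (!isset($data['companycounter'], $data['code'], $data['seatcounter'])) {
            continue;
        }

        $result[$data['companycounter']][$data['code']] = (int) $data['seatcounter'];
    }

    return $result;
}

// Sample input: an unrelated div, plus one div with its attributes reordered.
$html = '<div class="x"><div data-code="A" data-seatcounter="9783" data-companycounter="9879" class="">'
      . '<div data-companycounter="9879" data-code="B" data-seatcounter="9784" class="">'
      . '<div data-companycounter="11397" data-code="A" data-seatcounter="11509" class="">';

print_r(parse_company_counters($html));
```

Splitting the work into two passes keeps each pattern short and avoids encoding every possible attribute order in a single expression.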

As said in the comments, use an HTML parser instead of regex; it makes extracting data from HTML easy. Create a DOMDocument object $doc and get all the divs with the getElementsByTagName method. Then iterate over them, read each div's data-companycounter, data-code, and data-seatcounter attributes, and store them in the $companycounter array, keyed first by company counter and then by code.

$html =
'<div data-companycounter="9879" data-code="A" data-seatcounter="9783"/>
<div data-companycounter="9879" data-code="B" data-seatcounter="9784"/>
<div data-companycounter="11397" data-code="A" data-seatcounter="11509"/>';
$doc = new DOMDocument();
$doc->loadHTML($html);
$divs = $doc->getElementsByTagName('div');
$companycounter = [];
foreach ($divs as $div) {
  // A real page contains unrelated divs; skip those without the data attributes.
  if (!$div->hasAttribute('data-companycounter')) {
    continue;
  }
  // Read the attributes by name, so their order in the tag does not matter.
  $counter = $div->getAttribute('data-companycounter');
  $code = $div->getAttribute('data-code');
  $seatcounter = $div->getAttribute('data-seatcounter');
  $companycounter[$counter][$code] = $seatcounter;
}
echo "<pre>";
print_r($companycounter);

The output, as expected:

Array
(
    [9879] => Array
        (
            [A] => 9783
            [B] => 9784
        )

    [11397] => Array
        (
            [A] => 11509
        )

)
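
On a full page with many unrelated divs, one possible variant is to let XPath do the filtering, so that the loop only sees divs carrying all three attributes. This is a minimal sketch, not the answer's original code: it uses DOMXPath from the same DOM extension, silences libxml warnings on the assumption that real-world markup is not perfectly valid, and runs against sample markup modeled on the question.

```php
<?php
// Sample markup modeled on the question; the wrapper div is an unrelated element.
$html = '<html><body>
<div class="wrapper">
<div data-companycounter="9879" data-code="A" data-seatcounter="9783" class=""></div>
<div data-companycounter="9879" data-code="B" data-seatcounter="9784" class=""></div>
<div data-companycounter="11397" data-code="A" data-seatcounter="11509" class=""></div>
</div>
</body></html>';

$doc = new DOMDocument();
// Collect libxml parse warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$companycounter = [];

// Select only the divs that carry all three data attributes.
$query = '//div[@data-companycounter and @data-code and @data-seatcounter]';
foreach ($xpath->query($query) as $div) {
    $counter = $div->getAttribute('data-companycounter');
    $code = $div->getAttribute('data-code');
    // Cast to int so the values match the integers in the question's target array.
    $companycounter[$counter][$code] = (int) $div->getAttribute('data-seatcounter');
}

print_r($companycounter);
```

Keeping the selection in the XPath expression means the loop body needs no attribute checks of its own.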
