6

I'm trying to parse an HTML file.

The idea is to find every div with class='thebest' and, inside each one, fetch the text of the spans with the title and desc classes.

Here is my code:

<?php

$example=<<<KFIR
<html>
<head>
<title>test</title>
</head>
<body>
 <div class="a">moshe1
<div class="aa">haim</div>
 </div>
 <div class="a">moshe2</div>
 <div class="b">moshe3</div>

<div class="thebest">
<span class="title">title1</span>
<span class="desc">desc1</span>
</div>
<div class="thebest">
<span class="title">title2</span>
<span class="desc">desc2</span>
</div>

</body>
</html>
KFIR;


$doc = new DOMDocument();
@$doc->loadHTML($example);
$xpath = new DOMXPath($doc);
$expression="//div[@class='thebest']";
$arts = $xpath->query($expression);

foreach ($arts as $art) {
    $arts2=$xpath->query("//span[@class='title']",$art);
    echo $arts2->item(0)->nodeValue;
    $arts2=$xpath->query("//span[@class='desc']",$art);
    echo $arts2->item(0)->nodeValue;
}
echo "done";

The expected output is:

title1desc1title2desc2done

The output I actually get is:

title1desc1title1desc1done

2 Answers

15

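Make the queries relative to the context node by starting them with a dot (e.g. ".//…"). An expression that begins with // is evaluated from the document root even when a context node is passed to query(), which is why every iteration returned the spans of the first div.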

foreach ($arts as $art) {
    // Note: "./span" matches only direct children of $art; use ".//span" for any descendant
    $titles = $xpath->query("./span[@class='title']", $art);
    if ($titles->length > 0) {
        $title = $titles->item(0)->nodeValue;
        echo $title;
    }

    $descs = $xpath->query("./span[@class='desc']", $art);
    if ($descs->length > 0) {
        $desc = $descs->item(0)->nodeValue;
        echo $desc;
    }
}
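
To see the difference in isolation, here is a minimal sketch that runs the same inner query in absolute and relative form against each context node. The trimmed markup and the variable names ($html, $absolute, $relative) are illustrative assumptions, not part of the question's code.

```php
<?php
// Illustrative markup: two "thebest" divs, each with its own title span.
$html = <<<MARKUP
<html><body>
<div class="thebest"><span class="title">title1</span><span class="desc">desc1</span></div>
<div class="thebest"><span class="title">title2</span><span class="desc">desc2</span></div>
</body></html>
MARKUP;

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$absolute = [];
$relative = [];
foreach ($xpath->query("//div[@class='thebest']") as $div) {
    // "//" starts at the document root, so the context node has no effect.
    $absolute[] = $xpath->query("//span[@class='title']", $div)->item(0)->nodeValue;
    // ".//" starts at the context node and searches only its descendants.
    $relative[] = $xpath->query(".//span[@class='title']", $div)->item(0)->nodeValue;
}

echo implode(',', $absolute), "\n"; // title1,title1
echo implode(',', $relative), "\n"; // title1,title2
```

The absolute form always finds the first title in the whole document, which reproduces the behaviour described in the question.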

1

Instead of running the inner queries, try textContent:

foreach ($arts as $art) {
    echo $art->textContent;
}

textContent returns the text content of this node and all of its descendants, including any whitespace between the child elements.
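
Since textContent also carries the line breaks that sit between the spans in the source markup, the raw output may not match title1desc1 character for character. A minimal sketch, assuming the values themselves contain no whitespace worth keeping, strips it with preg_replace; the $html sample and the $out variable are illustrative.

```php
<?php
// Illustrative markup laid out like the question's, with newlines between the spans.
$html = <<<MARKUP
<html><body>
<div class="thebest">
<span class="title">title1</span>
<span class="desc">desc1</span>
</div>
<div class="thebest">
<span class="title">title2</span>
<span class="desc">desc2</span>
</div>
</body></html>
MARKUP;

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$out = '';
foreach ($xpath->query("//div[@class='thebest']") as $div) {
    // Remove every run of whitespace, including the newlines between the spans.
    $out .= preg_replace('/\s+/', '', $div->textContent);
}

echo $out, "done"; // title1desc1title2desc2done
```

If the titles or descriptions can contain spaces, this approach would glue their words together, so a per-span query is the safer choice in that case.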

As an alternative, change the XPath to

$expression="//div[@class='thebest']/span[@class='title' or @class='desc']";
$arts = $xpath->query($expression);

foreach ($arts as $art) {
    echo $art->nodeValue;
}

That fetches the span elements with a class of title or desc that are direct children of a div with the class thebest.
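
The single query returns one flat list of spans, so the pairing of each title with its desc is no longer explicit. If you need that pairing, one possible sketch groups the spans by their parent div; using getNodePath() as the grouping key and the $rows array are my own illustrative choices, not something the question requires.

```php
<?php
// Illustrative markup: two "thebest" divs, each holding a title and a desc span.
$html = <<<MARKUP
<html><body>
<div class="thebest"><span class="title">title1</span><span class="desc">desc1</span></div>
<div class="thebest"><span class="title">title2</span><span class="desc">desc2</span></div>
</body></html>
MARKUP;

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$rows = [];
$spans = $xpath->query("//div[@class='thebest']/span[@class='title' or @class='desc']");
foreach ($spans as $span) {
    // The parent's node path is unique per div, so it works as a grouping key.
    $key = $span->parentNode->getNodePath();
    // Store the text under the span's class name ("title" or "desc").
    $rows[$key][$span->getAttribute('class')] = $span->nodeValue;
}
$rows = array_values($rows); // drop the path keys, keep document order

foreach ($rows as $row) {
    echo $row['title'], $row['desc'];
}
echo "done"; // title1desc1title2desc2done
```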
