In PHP I'm using the Simple HTML DOM Parser class.

I have an HTML file that contains multiple <a> tags.

Now I need to find the tag that contains a certain text.

For example:

$html = "<a id='tag1'>A</a>
         <a id='tag2'>B</a>
         <a id='tag3'>C</a>
        ";

$dom = str_get_html($html);
$tag = $dom->find("a[plaintext=B]");

The above example doesn't work: plaintext is a property of the element object, not an HTML attribute, and attribute selectors only match real attributes.

Any ideas?

  • In normal XPath it would be //a[text()="B"]. The question is whether simplehtmldom supports this. Normal DOM with DOMXPath would... Commented Jun 16, 2012 at 2:00
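
For reference, the DOMXPath route mentioned in this comment could look roughly like the sketch below. It uses PHP's bundled DOM extension rather than Simple HTML DOM, so it assumes that extension is enabled. The helper name findLinksByText is made up for illustration, and the comparison uses normalize-space(.) so that surrounding whitespace and nested markup do not break the match.

```php
<?php
// Hypothetical helper (not part of Simple HTML DOM): returns the id attribute
// of every <a> whose whitespace-normalized text equals $text.
function findLinksByText(string $html, string $text): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of printing them,
    // since HTML fragments often trigger parser notices.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Assumes $text contains no double quote; XPath 1.0 has no escape syntax.
    $nodes = $xpath->query('//a[normalize-space(.)="' . $text . '"]');

    $ids = [];
    foreach ($nodes as $node) {
        $ids[] = $node->getAttribute('id');
    }
    return $ids;
}

$html = "<a id='tag1'>A</a>
         <a id='tag2'>B</a>
         <a id='tag3'>C</a>";

print_r(findLinksByText($html, 'B'));
```

Because the whole condition lives in one XPath string, an expression like this could be stored in a database as a single value, which is what the question asks for, at the cost of switching parsers.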

2 Answers

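<?php
include("simple_html_dom.php");

$html = "<a id='tag1'>A</a>
         <a id='tag2'>B</a>
         <a id='tag3'>C</a>
        ";

$dom = str_get_html($html);
$select = null;

// Walk every <a> and stop at the first one whose content is exactly "B".
// innertext includes any nested markup; compare trim($element->plaintext)
// instead if the link may contain child tags or surrounding whitespace.
foreach ($dom->find('a') as $element) {
    if ($element->innertext === "B") {
        $select = $element;
        break;
    }
}
?>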

2 Comments

This works, but I need to do it in a single find() expression, since I have to build several hundred different scrapers whose expressions come from a database.
I don't think there is any other way. Alternatively, you could modify simple_html_dom.php and add a search by innertext to it. I doubt it would be any more efficient than the code above unless there is some sort of hash on the innertext.
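
One way to keep a single stored expression without patching the library is a thin wrapper around find(). The sketch below is only an illustration: the function name findWithText and the "|text=" suffix are invented conventions, not part of Simple HTML DOM. It assumes $dom is an object returned by str_get_html() or file_get_html(), and that find() called without an index returns an array of elements exposing a plaintext property.

```php
<?php
// Hypothetical wrapper, not part of simple_html_dom.php. It accepts a stored
// expression such as "a|text=B": a normal find() selector, optionally followed
// by "|text=" and the exact text to match. The "|text=" suffix is an invented
// convention for this sketch.
function findWithText($dom, string $expression): array
{
    $parts = explode('|text=', $expression, 2);
    $elements = $dom->find($parts[0]);

    if (count($parts) === 1) {
        return $elements; // plain selector, no text filter requested
    }

    $matches = [];
    foreach ($elements as $element) {
        // Compare the trimmed visible text against the requested text.
        if (trim($element->plaintext) === $parts[1]) {
            $matches[] = $element;
        }
    }
    return $matches;
}

// Usage, assuming simple_html_dom.php has been included:
// $tags = findWithText(str_get_html($html), 'a|text=B');
```

This still loops over the matched elements internally, so it is no faster than the loop in the answer; it only moves the loop out of each scraper so the database can hold one string per rule.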

Assuming each specific text you are looking for maps to only a single link (which sounds like your case), you can build an associative lookup array. I just encountered this need myself, and here is how I handled it. This way you don't need to loop through all the links every time.

function populateOutlines($htmlOutlines)
{
  $marker = "courses";
  $charSlashFwd = "/";

  $outlines = array();

  foreach ($htmlOutlines->find("a") as $element)
  {
    // filter links for ones with certain markers if required
    if (strpos($element->href, $marker) !== false)
    {
      // construct the key the way you need it
      $dir = explode($charSlashFwd, $element->href);
      $code = preg_replace(
        "/[^a-zA-Z0-9 ]/", "", strtoupper(
          $dir[1]." ".$dir[2]));

      // insert the lookup entry
      $outlines[$code] = $element->href;
    }
  }

  return $outlines;
}

// ...stuff...

$htmlOutlines = file_get_html($urlOutlines);
$outlines = populateOutlines($htmlOutlines);

// ...more stuff...

if (array_key_exists($code, $outlines)) {
  $outline = $outlines[$code];
} else {
  $outline = "n/a";
}
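
Applied to the original question, the same lookup idea can be keyed by the link text instead of a code derived from the href. This is a minimal sketch under the same assumptions as the code above ($dom comes from str_get_html() or file_get_html()); the function name indexLinksByText is hypothetical.

```php
<?php
// Hypothetical helper: maps each link's trimmed text to its element so that
// later lookups are a single array access. If two links share the same text,
// the first one wins.
function indexLinksByText($dom): array
{
    $index = [];
    foreach ($dom->find('a') as $element) {
        $text = trim($element->plaintext);
        if ($text !== '' && !isset($index[$text])) {
            $index[$text] = $element;
        }
    }
    return $index;
}

// Usage, assuming simple_html_dom.php has been included:
// $index = indexLinksByText(str_get_html($html));
// $tag   = $index['B'] ?? null;
```

The page is scanned once when the index is built; every later text lookup is then an array access rather than another pass over all links.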
