3

I have opened an HTML file using

file_get_contents('http://www.example.com/file.html')

and want to parse the line including "ParseThis":

 <h1 class=\"header\">ParseThis<\/h1>

As you can see, it's within an h1 tag (the first h1 tag from the file). How can I get the text "ParseThis"?

3 Answers

5

You can use DOM for this.

// Load remote file, suppress parse errors
libxml_use_internal_errors(TRUE);
$dom = new DOMDocument;
$dom->loadHTMLFile('http://www.example.com/file.html');
libxml_clear_errors();

// use XPath to find all h1 nodes whose class attribute is exactly "header"
$xp = new DOMXPath($dom);
$nodes = $xp->query('//h1[@class="header"]');

// output first item's content
echo $nodes->item(0)->nodeValue;
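Note that `@class="header"` only matches when the attribute value is exactly `header`. If the h1 might carry several classes, a token match is a common workaround. Here is a minimal sketch of that idea. It uses an inline sample string (made up for illustration, not the asker's actual file) instead of the remote URL, and it also guards against an empty result:

```php
<?php
// Hypothetical sample markup; the real page is not shown in the question.
$html = '<html><body><h1 class="page header">ParseThis</h1>'
      . '<h1 class="header">Other</h1></body></html>';

libxml_use_internal_errors(true);   // collect parse warnings instead of printing them
$dom = new DOMDocument;
$dom->loadHTML($html);
libxml_clear_errors();

$xp = new DOMXPath($dom);
// Match h1 elements whose class list contains the token "header".
$query = '//h1[contains(concat(" ", normalize-space(@class), " "), " header ")]';
$nodes = $xp->query($query);

// item(0) is null when nothing matched, so check before reading nodeValue.
$first = $nodes->item(0);
$text = $first !== null ? $first->nodeValue : '';
echo $text, "\n";
```

With this sample the first match in document order is the h1 with `class="page header"`, which the exact comparison would have skipped.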

Marking this CW because I have answered this before, but I am too lazy to find the duplicate

4

Use this function.

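<?php
// Returns the text between the first $start and the next $end,
// or "" if either marker is missing.
function get_string_between($string, $start, $end)
{
    $ini = strpos($string, $start);
    if ($ini === false)
        return "";
    $ini += strlen($start);
    $stop = strpos($string, $end, $ini);
    if ($stop === false)
        return "";
    return substr($string, $ini, $stop - $ini);
}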

$data = file_get_contents('http://www.example.com/file.html');

echo get_string_between($data, '<h1 class=\"header\">', '<\/h1>');
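One thing to check before relying on this: in a single-quoted PHP string, `\"` and `\/` are not escape sequences, so the needles above contain literal backslashes. They only match if the fetched file really contains those backslashes (for example, HTML embedded in a JSON string). A rough sketch, assuming that is the situation, which strips the slashes first so that one plain needle works for both forms. The helper name `text_between` and the sample strings are made up for illustration:

```php
<?php
// Hypothetical sample inputs standing in for the fetched file.
$escaped = '<h1 class=\"header\">ParseThis<\/h1>'; // backslashes stay literal in single quotes
$plain   = '<h1 class="header">ParseThis</h1>';

// Hypothetical helper: text between the first $start and the next $end, or "".
function text_between(string $haystack, string $start, string $end): string
{
    $from = strpos($haystack, $start);
    if ($from === false) {
        return '';
    }
    $from += strlen($start);
    $to = strpos($haystack, $end, $from);
    if ($to === false) {
        return '';
    }
    return substr($haystack, $from, $to - $from);
}

// Without unescaping, the plain needle does not match the escaped input.
$rawAttempt = text_between($escaped, '<h1 class="header">', '</h1>');

// stripslashes() removes the backslashes, so both inputs look the same afterwards.
$resultEscaped = text_between(stripslashes($escaped), '<h1 class="header">', '</h1>');
$resultPlain   = text_between(stripslashes($plain), '<h1 class="header">', '</h1>');

var_dump($rawAttempt, $resultEscaped, $resultPlain);
```

If the file is actually a JSON document, decoding it with json_decode() would be the cleaner way to undo the escaping; stripslashes() is just the shortest way to show the idea.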

2 Comments

It may work for this case, but you should be using DOM selectors or XML navigation.
I prefer this because it works faster than DOM, and when the requirements are as simple as this, I use my get_string_between :)
1

Since it is the first h1 tag, getting it should be fairly trivial:

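// $html must hold the markup, e.g. fetched as in the question
$html = file_get_contents('http://www.example.com/file.html');
$doc = new DOMDocument();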
$doc->loadHTML($html);
$h1 = $doc->getElementsByTagName('h1');
echo $h1->item(0)->nodeValue;
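If the document has no h1 at all, `item(0)` returns null and the property access fails. Here is a small sketch of a guarded version. The function name `first_h1_text` and the sample strings are made up for illustration:

```php
<?php
// Hypothetical helper: trimmed text of the first <h1> in $html, or null if there is none.
function first_h1_text(string $html): ?string
{
    // Silence warnings from sloppy markup, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $h1 = $doc->getElementsByTagName('h1')->item(0);
    return $h1 !== null ? trim($h1->textContent) : null;
}

var_dump(first_h1_text('<h1 class="header"> ParseThis </h1><h1>Second</h1>'));
var_dump(first_h1_text('<p>No heading here</p>'));
```

Returning null rather than an empty string lets the caller tell "no h1 found" apart from "h1 found but empty".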

http://php.net/manual/en/class.domdocument.php
