How to Extract Particular String from the HTML Source code using PHP

Question

I'm trying to extract a particular string from the whole HTML source code.

HTML Source: view-source:https://www.instagram.com/p/BUbZXXMjnxY/?taken-by=narentrigger&hl=en

String to extract: https://instagram.fmaa1-2.fna.fbcdn.net/t51.2885-15/e35/18645014_163619900839441_7821159798480568320_n.jpg from the "og:image" meta property.

I have tried some methods, but all of them failed. Is there any way to grab the image link from the og:image meta property in the source code? After extracting it, I need to store the image URL in a variable. Expert help is needed.

You could use PHP DOMDocument to build a scraper. php.net/manual/en/class.domdocument.php
– melkawakibi, Commented May 23, 2017 at 20:27
Why don't you share those "some methods" and what "everything gone wrong" means, i.e. what specific errors do you get?
– ceejayoz, Commented May 23, 2017 at 20:28
So you want to extract the content attribute of the og:image meta?
– BenM, Commented May 23, 2017 at 20:31
Yes, I need to extract the content attribute of the og:image meta from the whole source code @BenM
– Narendhiran vignesh, Commented May 23, 2017 at 20:32

mickmackusa · Accepted Answer · 2017-05-24 07:20:10Z

1

Don't use preg_match_all() if you are only grabbing one substring. Loading a DOMDocument seems like overkill for this task.

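By using \K you can reduce result array bloat: \K discards everything matched before it, so the URL is returned as the full match in $out[0] and no capture group is needed.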

Sample Input:

$input='<meta property="og:title" content="Instagram post by Narendiran blah blah" />
<meta property="og:image" content="https://instagram.fmma1-2.blah.jpg" />
<meta property="og:description" content="8 Likes, 1 Comments - blah" />';

Method:

$url=preg_match('/"og:image"[^"]+"\K[^"]+/',$input,$out)?$out[0]:null;
echo $url;

Output:

https://instagram.fmma1-2.blah.jpg

The regex engine runs more efficiently with a negated character class, [^"], than with a greedy .* because it stops at the closing quote instead of running past it and backtracking.
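As a minimal sketch of how the pattern above might be wrapped for reuse: the helper name extract_og_image() and the sample URL are hypothetical, not part of the answer. It also decodes HTML entities, since attribute values in page source are escaped (for example &amp;amp; for &amp;).

```php
<?php
// Hypothetical helper (not from the answer): returns the og:image URL, or null if absent.
function extract_og_image(string $html): ?string
{
    // \K discards everything matched so far, so $out[0] holds only the URL.
    if (preg_match('/"og:image"[^"]+"\K[^"]+/', $html, $out) !== 1) {
        return null;
    }
    // Attribute values are HTML-escaped in the source, so decode them.
    return html_entity_decode($out[0], ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

// Made-up sample markup for illustration.
$input = '<meta property="og:title" content="Instagram post" />
<meta property="og:image" content="https://example.com/a.jpg?x=1&amp;y=2" />';

$imageUrl = extract_og_image($input);
echo $imageUrl, PHP_EOL;
```

Like the original pattern, this assumes double-quoted attributes with property appearing before content.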

edited May 24, 2017 at 7:20

answered May 24, 2017 at 6:06


BenM · Accepted Answer · 2017-05-23 20:33:18Z

0

Assuming you have the markup inside a string with PHP, what's wrong with a RegEx?

preg_match_all('/<meta.*property="og:image".*content="(.*)".*\/>/', $string, $matches);
echo $matches[1][0];

Disclaimer: more efficient regexes may be available.
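One case where the greedy .* matters is minified HTML, where several meta tags share a line. The sketch below uses made-up sample data, and the tighter pattern is an assumption rather than part of the answer. It compares the two on such input.

```php
<?php
// Two meta tags on ONE line, as minified HTML often is (made-up sample data).
$string = '<meta property="og:image" content="https://example.com/a.jpg" /><meta property="og:description" content="8 Likes" />';

// Pattern from the answer: the greedy .* skips ahead to the LAST content="..." on the line.
preg_match('/<meta.*property="og:image".*content="(.*)".*\/>/', $string, $greedy);

// Tighter variant (an assumption): [^>]* stays inside one tag, [^"]* inside one attribute value.
preg_match('/<meta[^>]*property="og:image"[^>]*content="([^"]*)"/', $string, $tight);

echo $greedy[1], PHP_EOL; // 8 Likes
echo $tight[1], PHP_EOL;  // https://example.com/a.jpg
```

When each meta tag sits on its own line, as in the question's page source, both patterns return the same URL. The tighter one still assumes double quotes and property before content.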

answered May 23, 2017 at 20:33

3 Comments

Narendhiran vignesh Over a year ago

The above method works awesomely. How can I extract the above string from the whole source code without assuming the markup (because it may vary between HTML sources)? Is it possible to extract the "content" attribute of the "og:image" meta from the whole HTML source code? If so, could you provide a way?

BenM Over a year ago

The above will work for that. $string can contain whatever source code you have.

Narendhiran vignesh Over a year ago

Yeah, I have run the code for my requirement, and it works perfectly. Thanks @BenM

melkawakibi · Accepted Answer · 2017-05-23 21:29:18Z

0

In this code snippet I'm using DOMDocument to scrape the content attribute from the meta tag. It stores the values in an array, in case there is more than one, and returns it. Hope it works.

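    function get_img_url($url) {

        // Create a new DOM object
        $html = new DOMDocument();

        // Collect parser warnings from malformed HTML instead of printing them
        libxml_use_internal_errors(true);

        // Load the HTML page
        $html->loadHTMLFile($url);
        libxml_clear_errors();

        // Create an empty array
        $imageArray = array();

        // Loop through each meta tag, keeping only og:image
        foreach ($html->getElementsByTagName('meta') as $meta) {
            if ($meta->getAttribute('property') === 'og:image') {
                $imageArray[] = array('url' => $meta->getAttribute('content'));
            }
        }

        // Return the list
        return $imageArray;
    }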
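If the page source is already in a string, a similar DOM-based lookup can be sketched with DOMXPath, which selects the og:image tag directly instead of looping over every meta element. The helper name og_image_from_html() and the sample markup are hypothetical.

```php
<?php
// Hypothetical helper (not from the answer): first og:image URL in an HTML string, or null.
function og_image_from_html(string $html): ?string
{
    $doc = new DOMDocument();

    // Real-world pages are rarely valid HTML; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Select the content attribute of <meta property="og:image">, wherever it sits.
    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//meta[@property="og:image"]/@content');

    return ($nodes !== false && $nodes->length > 0) ? $nodes->item(0)->nodeValue : null;
}

// Made-up sample markup for illustration.
$html = '<html><head>
<meta property="og:title" content="Sample post" />
<meta property="og:image" content="https://example.com/a.jpg" />
</head><body></body></html>';

$imageUrl = og_image_from_html($html);
echo $imageUrl, PHP_EOL;
```

Unlike the regex approaches, this does not depend on attribute order or quote style, at the cost of parsing the whole document.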

answered May 23, 2017 at 21:29

Azad Bhagat Singh · Accepted Answer · 2017-05-24 09:23:50Z

0

Try this code to scrape the web page. I used simple_html_dom_parser; you can download it from https://sourceforge.net/projects/simplehtmldom/files/

include_once("simple_html_dom.php");

$output_filename = "example_homepage.html";
$fp = fopen($output_filename, 'w');
$url = 'https://www.instagram.com/p/BUbZXXMjnxY/?taken-by=narentrigger&hl=en';
$curl = curl_init();

curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, false);
curl_setopt ($curl, CURLOPT_FILE, $fp);
$result = curl_exec($curl);

curl_close($curl);
fclose($fp);

$html = file_get_html('example_homepage.html');

foreach($html->find('meta[property=og:image]') as $element) 
   echo $element->content . '<br>';
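The temporary file is not strictly required. As a rough sketch, cURL can return the page body as a string instead. The helper name fetch_html() and the timeout value are assumptions, not part of the answer.

```php
<?php
// Hypothetical helper (not from the answer): download a page into a string, or null on failure.
function fetch_html(string $url): ?string
{
    $curl = curl_init($url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($curl, CURLOPT_TIMEOUT, 15);          // arbitrary timeout, in seconds
    $body = curl_exec($curl);

    // curl_exec() returns false on failure when CURLOPT_RETURNTRANSFER is set.
    return is_string($body) ? $body : null;
}
```

The returned string could then be passed to str_get_html() from simple_html_dom in place of file_get_html(), after checking for null.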

answered May 24, 2017 at 9:23
