0

I'm trying to extract a particular string from the full HTML source of a page.

HTML Source: view-source:https://www.instagram.com/p/BUbZXXMjnxY/?taken-by=narentrigger&hl=en

String to extract: https://instagram.fmaa1-2.fna.fbcdn.net/t51.2885-15/e35/18645014_163619900839441_7821159798480568320_n.jpg from the "og:image" meta property.

I have tried several methods, but none of them worked. Is there a way to grab the image link from the og:image meta property in the source code? After extracting it, I need to store the image URL in a variable.

  • You could use PHP DomDocument to build a scraper. php.net/manual/en/class.domdocument.php Commented May 23, 2017 at 20:27
  • Why don't you share those "some methods" and what "everything gone wrong" means, i.e. what specific errors do you get? Commented May 23, 2017 at 20:28
  • So you want to extract the content attribute of the og:image meta? Commented May 23, 2017 at 20:31
  • yes, I need to extract the content attribute of the og:image meta from the whole source code @BenM Commented May 23, 2017 at 20:32
  • @Narendhiranvignesh Please see my answer. Commented May 23, 2017 at 20:35

4 Answers

1

Don't use preg_match_all() if you are only grabbing one substring. Loading a DOMDocument seems like overkill for this task.

By using \K you can reduce result array bloat.

Sample Input:

$input='<meta property="og:title" content="Instagram post by Narendiran blah blah" />
<meta property="og:image" content="https://instagram.fmma1-2.blah.jpg" />
<meta property="og:description" content="8 Likes, 1 Comments - blah" />';

Method (Demo):

$url=preg_match('/"og:image"[^"]+"\K[^"]+/',$input,$out)?$out[0]:null;
echo $url;

Output:

https://instagram.fmma1-2.blah.jpg

Using a negated character class, [^"], lets the regex engine run more efficiently. (Pattern Demo)
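To make the point about \K concrete, here is a small sketch comparing it with an ordinary capture group. The sample markup and the variable names $withGroup and $withK are made up for illustration; only the \K pattern comes from the answer above.

```php
<?php
// Hypothetical sample markup, standing in for the real page source.
$input = '<meta property="og:image" content="https://example.com/photo.jpg" />';

// Capture-group version: $withGroup[0] holds the whole match and
// $withGroup[1] holds the URL, so the array has two elements.
preg_match('/"og:image"[^"]+"([^"]+)/', $input, $withGroup);

// \K version: everything matched before \K is dropped from the reported
// match, so the URL is the only element in the array.
preg_match('/"og:image"[^"]+"\K[^"]+/', $input, $withK);

// Store the URL in a variable, or null if nothing matched.
$url = $withK[0] ?? null;
echo $url;
```

Both patterns find the same URL; the difference is only in how much ends up in the result array.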


0

Assuming you have the markup inside a string with PHP, what's wrong with a RegEx?

preg_match_all('/<meta[^>]*property="og:image"[^>]*content="([^"]*)"[^>]*>/', $string, $matches);
echo $matches[1][0];

Demo

Disclaimer: more efficient regexes may be available.
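As one possible variation, the sketch below inspects each meta tag separately, so it does not depend on property appearing before content or on double quotes being used. The function name find_og_image is hypothetical, and it assumes no attribute value contains a literal > character.

```php
<?php
// Hypothetical helper: returns the og:image URL from an HTML string, or null.
function find_og_image(string $html): ?string
{
    // Collect every <meta ...> tag; assumes attribute values contain no ">".
    if (!preg_match_all('/<meta\b[^>]*>/i', $html, $tags)) {
        return null;
    }

    foreach ($tags[0] as $tag) {
        // Check the two attributes independently so their order does not matter.
        if (preg_match('/\bproperty\s*=\s*(["\'])og:image\1/i', $tag)
            && preg_match('/\bcontent\s*=\s*(["\'])(.*?)\1/i', $tag, $m)) {
            // Decode entities such as &amp; that may appear inside the attribute.
            return html_entity_decode($m[2], ENT_QUOTES | ENT_HTML5);
        }
    }

    return null;
}
```

Calling find_og_image($string) on the page source would give the URL to store in a variable, or null when no og:image tag is present.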

3 Comments

The above method works well. How can I extract the string from the whole source code without assuming a particular markup (because it may vary between HTML sources)? Is it possible to extract the "content" attribute of the "og:image" meta tag from the whole HTML source code? If so, could you show how?
The above will work for that. $string can contain whatever source code you have.
Yes, I ran the code against my requirement and it works perfectly. Thanks @BenM

0

In this code snippet I'm using DOMDocument to scrape the content attribute from the og:image meta tag. It stores the results in an array in case there is more than one and returns it. Hope it works.

   function get_img_url($url) {

        // Suppress warnings caused by malformed real-world markup
        libxml_use_internal_errors(true);

        // Create a new DOM object
        $html = new DOMDocument();

        // Load the HTML page
        $html->loadHTMLFile($url);
        libxml_clear_errors();

        // Create an empty array
        $imageArray = array();

        // Loop through each meta tag, keeping only og:image
        foreach ($html->getElementsByTagName('meta') as $meta) {
            if ($meta->getAttribute('property') === 'og:image') {
                $imageArray[] = array('url' => $meta->getAttribute('content'));
            }
        }

        // Return the list
        return $imageArray;
    }
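
If the page source is already in a string, a similar lookup can be sketched with DOMXPath instead of looping over every meta tag. The function name og_image_from_html is hypothetical, and the sketch assumes a non-empty HTML string (newer PHP versions reject an empty one in loadHTML()).

```php
<?php
// Hypothetical helper: returns the first og:image URL in $html, or null.
function og_image_from_html(string $html): ?string
{
    // Collect libxml warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $loaded = $doc->loadHTML($html);

    // Discard collected warnings and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return null;
    }

    // Select the content attribute of the og:image meta tag directly.
    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//meta[@property="og:image"]/@content');

    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    return $nodes->item(0)->nodeValue;
}
```

The return value can be assigned straight to a variable, for example $imageUrl = og_image_from_html($source);, where $source is whatever string holds the page markup.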


0

Try this code to scrape the webpage. I used simple_html_dom_parser; you can download it from https://sourceforge.net/projects/simplehtmldom/files/

include_once("simple_html_dom.php");

$output_filename = "example_homepage.html";
$fp = fopen($output_filename, 'w');
$url = 'https://www.instagram.com/p/BUbZXXMjnxY/?taken-by=narentrigger&hl=en';
$curl = curl_init();

curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, false);
curl_setopt ($curl, CURLOPT_FILE, $fp);
$result = curl_exec($curl);

curl_close($curl);
fclose($fp);

$html = file_get_html('example_homepage.html');

foreach($html->find('meta[property=og:image]') as $element) 
   echo $element->content . '<br>';
