I have the following two variables, which contain HTML code:

$var1= Profile photo uploaded<div class="comment_attach_image">
<a class="group1 cboxElement" 
   href="http://52.1.47.143/file/attachment/2015/03/a4ea5532b83a56bbbae2fffc80de4fee.png" >
  <img src="http://52.1.47.143/file/attachment/2015/03/a4ea5532b83a56bbbae2fffc80de4fee.png" height="150px" width="150px" />
</a>

<a class="comment_attach_image_link_dwl"  href="http://52.1.47.143/feed/download/year_2015/month_03/file_a4ea5532b83a56bbbae2fffc80de4fee.png" >Download</a>
</div>

$var2 = PDF file added<div class="comment_attach_file">
        <a class="comment_attach_file_link" href="http://52.1.47.143/feed/download/year_2015/month_03/file_1b87d4420c693f2bbdf738cbf2457d89.pdf" >1b87d4420c693f2bbdf738cbf2457d89.pdf</a>

        <a class="comment_attach_file_link_dwl"  href="http://52.1.47.143/feed/download/year_2015/month_03/file_1b87d4420c693f2bbdf738cbf2457d89.pdf" >Download</a>
        </div>

I want to extract only the URLs from these two variables. The result I'm after is:

$new_var1 = http://52.1.47.143/file/attachment/2015/03/a4ea5532b83a56bbbae2fffc80de4fee.png;

$new_var2 = http://52.1.47.143/feed/download/year_2015/month_03/file_1b87d4420c693f2bbdf738cbf2457d89.pdf ;

How can I do this efficiently in PHP?

  • It's unclear what you're asking for. Are you saying you want to use PHP to extract URLs from HTML? Commented Mar 12, 2015 at 5:31
  • @Sildoreth:Yes, you got my point exactly but I don't know how to do it. Can you help me in this regard please? Commented Mar 12, 2015 at 5:33
  • You'll need to provide additional clarification. How is the PHP being given the HTML to process? What code do you have so far? Commented Mar 12, 2015 at 5:34
  • Regex is the solution! Commented Mar 12, 2015 at 5:42
  • @AmitThakur No, an HTML Parser is the solution. Commented Mar 12, 2015 at 5:52

2 Answers

Or do it the PHP way (yeah … j/k):

<?php

$var1 = 'Profile photo uploaded<div class="comment_attach_image">
<a class="group1 cboxElement" 
   href="http://52.1.47.143/file/attachment/2015/03/a4ea5532b83a56bbbae2fffc80de4fee.png" >
  <img src="http://52.1.47.143/file/attachment/2015/03/a4ea5532b83a56bbbae2fffc80de4fee.png" height="150px" width="150px" />
</a>

<a class="comment_attach_image_link_dwl"  href="http://52.1.47.143/feed/download/year_2015/month_03/file_a4ea5532b83a56bbbae2fffc80de4fee.png" >Download</a>
</div>';

$var2 = 'PDF file added<div class="comment_attach_file">
        <a class="comment_attach_file_link" href="http://52.1.47.143/feed/download/year_2015/month_03/file_1b87d4420c693f2bbdf738cbf2457d89.pdf" >1b87d4420c693f2bbdf738cbf2457d89.pdf</a>

        <a class="comment_attach_file_link_dwl"  href="http://52.1.47.143/feed/download/year_2015/month_03/file_1b87d4420c693f2bbdf738cbf2457d89.pdf" >Download</a>
        </div>';

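// Capture the attribute name (href or src) and its double-quoted value.
$url_regex = '/(href|src)="(.*?)"/';

// After each call, $matches[2] holds just the URLs found in that string.
preg_match_all($url_regex, $var1, $matches);
var_dump($matches);

preg_match_all($url_regex, $var2, $matches);
var_dump($matches);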

This yields:

array(3) {
  [0]=>
  array(3) {
    [0]=>
    string(86) "href="http://52.1.47.143/file/attachment/2015/03/a4ea5532b83a56bbbae2fffc80de4fee.png""
    [1]=>
    string(85) "src="http://52.1.47.143/file/attachment/2015/03/a4ea5532b83a56bbbae2fffc80de4fee.png""
    [2]=>
    string(100) "href="http://52.1.47.143/feed/download/year_2015/month_03/file_a4ea5532b83a56bbbae2fffc80de4fee.png""
  }
  [1]=>
  array(3) {
    [0]=>
    string(4) "href"
    [1]=>
    string(3) "src"
    [2]=>
    string(4) "href"
  }
  [2]=>
  array(3) {
    [0]=>
    string(79) "http://52.1.47.143/file/attachment/2015/03/a4ea5532b83a56bbbae2fffc80de4fee.png"
    [1]=>
    string(79) "http://52.1.47.143/file/attachment/2015/03/a4ea5532b83a56bbbae2fffc80de4fee.png"
    [2]=>
    string(93) "http://52.1.47.143/feed/download/year_2015/month_03/file_a4ea5532b83a56bbbae2fffc80de4fee.png"
  }
}
array(3) {
  [0]=>
  array(2) {
    [0]=>
    string(100) "href="http://52.1.47.143/feed/download/year_2015/month_03/file_1b87d4420c693f2bbdf738cbf2457d89.pdf""
    [1]=>
    string(100) "href="http://52.1.47.143/feed/download/year_2015/month_03/file_1b87d4420c693f2bbdf738cbf2457d89.pdf""
  }
  [1]=>
  array(2) {
    [0]=>
    string(4) "href"
    [1]=>
    string(4) "href"
  }
  [2]=>
  array(2) {
    [0]=>
    string(93) "http://52.1.47.143/feed/download/year_2015/month_03/file_1b87d4420c693f2bbdf738cbf2457d89.pdf"
    [1]=>
    string(93) "http://52.1.47.143/feed/download/year_2015/month_03/file_1b87d4420c693f2bbdf738cbf2457d89.pdf"
  }
}

See the preg_match_all documentation for how the $matches array is laid out. If you really only need the first URL that matches, use preg_match instead; it takes the same arguments as preg_match_all.
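As a minimal sketch of the preg_match variant, assuming you only want the first href from a string: the helper name `first_href` is made up for this illustration, the nullable return type needs PHP 7.1 or later, and the pattern still only handles double-quoted attributes.

```php
<?php

// Hypothetical helper: returns the first double-quoted href value
// found in $html, or null if there is none.
function first_href(string $html): ?string
{
    // preg_match stops after the first match; $m[1] is the captured URL.
    if (preg_match('/href="(.*?)"/', $html, $m) === 1) {
        return $m[1];
    }
    return null;
}

$var2 = 'PDF file added<div class="comment_attach_file">
<a class="comment_attach_file_link" href="http://52.1.47.143/feed/download/year_2015/month_03/file_1b87d4420c693f2bbdf738cbf2457d89.pdf" >1b87d4420c693f2bbdf738cbf2457d89.pdf</a>
</div>';

$new_var2 = first_href($var2);
echo $new_var2, PHP_EOL;
```

Because only the first match is returned, the order of the links in the HTML decides which URL you get.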


JavaScript would be a much better option if you're trying to parse a DOM. If you insist on using PHP, though, try an HTML parser called Simple HTML DOM. Its site has good documentation, but for what you're trying to do, I'd use the following:

// Load the library (assumes simple_html_dom.php sits next to this script)
include 'simple_html_dom.php';

// Get the contents of your page
$html = file_get_html('http://linkto.com/yourfile.html');

// Find all links this way
foreach ($html->find('a') as $element) {
    echo $element->href . '<br>';
}

// Target the two particular links by the anchor tag's class name
$new_var1 = $html->find('a[class=group1 cboxElement]', 0)->href;
$new_var2 = $html->find('a[class=comment_attach_file_link_dwl]', 0)->href;
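If you would rather not pull in a third-party library, PHP's bundled DOM extension can work directly on the strings from the question. The following is only a sketch under that assumption; the helper name `extract_hrefs` is invented for this example, and it takes the first link in the fragment as the wanted URL.

```php
<?php

// Hypothetical helper: collects the href of every <a> element
// in an HTML fragment, in document order.
function extract_hrefs(string $html): array
{
    $doc = new DOMDocument();

    // Fragments are not full documents, so libxml may raise warnings;
    // collect them internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $urls = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        if ($anchor->hasAttribute('href')) {
            $urls[] = $anchor->getAttribute('href');
        }
    }
    return $urls;
}

$var1 = 'Profile photo uploaded<div class="comment_attach_image">
<a class="group1 cboxElement"
   href="http://52.1.47.143/file/attachment/2015/03/a4ea5532b83a56bbbae2fffc80de4fee.png" >
  <img src="http://52.1.47.143/file/attachment/2015/03/a4ea5532b83a56bbbae2fffc80de4fee.png" height="150px" width="150px" />
</a>
<a class="comment_attach_image_link_dwl"  href="http://52.1.47.143/feed/download/year_2015/month_03/file_a4ea5532b83a56bbbae2fffc80de4fee.png" >Download</a>
</div>';

$links = extract_hrefs($var1);
$new_var1 = $links[0] ?? null; // first link in the fragment, or null if none
echo $new_var1, PHP_EOL;
```

Unlike the regex approach, this does not depend on how the attributes are quoted or ordered inside the tag.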
