Below is the code I am using.

It reads links from a textarea, fetches the source of each page, and then extracts the title and meta tags. However, it only displays results for the last element in the array.

So if, for example, I put 3 websites into the textarea, it will only read the last one; the others are just shown as blank.

I have spent hours on this, please help.

function file_get_contents_curl($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}

if(isset($_POST['url'])){
    $url = $_POST['url'];
    $url = explode("\n",$url);
    print_r($url);
    for($counter = 0; $counter < count($url); $counter++){
        $html = file_get_contents_curl($url[$counter]); // PASSING LAST VALUE OF ARRAY
        $doc = new DOMDocument();
        @$doc->loadHTML($html);
        $nodes = $doc->getElementsByTagName('title');
        $title = $nodes->item(0)->nodeValue;
        $metas = $doc->getElementsByTagName('meta');
        for ($i = 0; $i < $metas->length; $i++){
            $meta = $metas->item($i);
            if($meta->getAttribute('name') == 'description')
                $description = $meta->getAttribute('content');
            if($meta->getAttribute('name') == 'keywords')
                $keywords = $meta->getAttribute('content');
        }
        print
        ('
        <fieldset>
            <table>
                <legend><b>URL: </b>'.$url[$counter].'</legend>
                <tr>
                    <td><b>Title:</b></td><td>'.$title.'</td>
                </tr>
                <tr>
                    <td><b>Description:</b></td><td>'.$description.'</td>
                </tr>
                <tr>
                    <td><b>Keywords:</b></td><td>'.$keywords.'</td>
                </tr>
            </table>
        </fieldset><br />
        ');
    }
}

4 Comments
  • What is the output of print_r($url);? Commented Dec 8, 2011 at 16:33
  • Array ( [0] => site1.com [1] => site2.com [2] => site3.com ) Commented Dec 8, 2011 at 16:35
  • Simplified your loop - working for me based on your info -> codepad.org/jkZHnfAo Commented Dec 8, 2011 at 16:55
  • The loop is dynamic; I gave site1, site2, site3 as an example. The array is fine; curl, however, is only executing the last element of the array. Commented Dec 8, 2011 at 16:59

1 Answer

This was an annoying little bug to find - but here is the (ridiculously simple) solution:

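Every URL except the last has trailing whitespace attached. This is most likely because the textarea submits line breaks as "\r\n", so explode("\n", ...) leaves a "\r" on the end of every line but the last. Trim each URL before handing it to cURL, for example inside file_get_contents_curl():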

curl_setopt($ch, CURLOPT_URL, trim($url));
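
To make the whitespace issue easier to see, here is a minimal sketch that cleans the list before the loop rather than inside the cURL function. The helper name parse_url_list() and the sample input are assumptions for illustration only; they are not part of the original code.

```php
<?php
// Simulated textarea value; browsers typically submit line breaks as "\r\n".
$raw = "site1.com\r\nsite2.com\r\nsite3.com";

// Splitting on "\n" alone leaves a trailing "\r" on every line except the last.
$naive = explode("\n", $raw);
var_dump($naive[0] === "site1.com"); // bool(false): the value is "site1.com\r"

// Hypothetical helper: split on any line ending, trim each line, drop empty lines.
function parse_url_list(string $raw): array
{
    $lines = preg_split('/\R/', $raw);   // \R matches \r\n, \n or \r
    $lines = array_map('trim', $lines);  // strip stray spaces and control characters
    $lines = array_filter($lines, static function ($line) {
        return $line !== '';             // ignore blank lines in the textarea
    });
    return array_values($lines);         // reindex so a counter-based loop still works
}

print_r(parse_url_list($raw));
```

With something like this, $url = parse_url_list($_POST['url']); could replace the explode() call, and the rest of the loop would stay as it is.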

If allow_url_fopen is enabled on your server, you could probably just use file_get_contents() instead of cURL (you would still need to trim the URL).

The second problem is that if a page has no meta data, the variables still hold the values from the previous iteration and get printed again. To avoid that, add the following just before the end of your main loop, after your print():

unset($title,$description,$keywords);
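
An alternative to unset() is to start every page from empty values, so nothing can carry over between iterations. The sketch below is one way to do that under stated assumptions: the helper extract_meta() and the inline sample pages are hypothetical, and the HTML strings stand in for what file_get_contents_curl() would return.

```php
<?php
// Hypothetical helper: returns title, description and keywords for one HTML string.
function extract_meta(string $html): array
{
    // Fresh defaults on every call, so a page without meta tags yields empty strings.
    $result = ['title' => '', 'description' => '', 'keywords' => ''];

    // A failed fetch would give an empty string; skip parsing in that case.
    if (trim($html) === '') {
        return $result;
    }

    $doc = new DOMDocument();
    @$doc->loadHTML($html); // suppress warnings from malformed markup

    $titles = $doc->getElementsByTagName('title');
    if ($titles->length > 0) {
        $result['title'] = $titles->item(0)->nodeValue;
    }

    foreach ($doc->getElementsByTagName('meta') as $meta) {
        $name = strtolower($meta->getAttribute('name'));
        if ($name === 'description' || $name === 'keywords') {
            $result[$name] = $meta->getAttribute('content');
        }
    }

    return $result;
}

// Sample pages standing in for fetched source; the second one has no meta tags.
$pages = [
    '<html><head><title>One</title><meta name="description" content="First page"></head><body></body></html>',
    '<html><head><title>Two</title></head><body></body></html>',
];

foreach ($pages as $html) {
    print_r(extract_meta($html));
}
```

Because the defaults are set inside the function, the second page reports an empty description instead of repeating the first page's value.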

1 Comment

Just can't believe this simple solution could be that much of a headache! Fantastic!
