I've been scratching my head for days over this stupid one.
I have an array of urls called $url_array pulled from the database like so -
Array (
[id] => 2
[url] => http://example.com
)
I have a foreach loop which runs over $url_array and scrapes each url for data like so -
foreach ($url_array as $row) {
    $data = $this->scrapePage($row["url"]);
    print_r($data);
    return false;
}
Currently $data is outputting nothing. But if I replace $row["url"] with the literal "http://example.com", the scrape happens correctly.
This is also the first time I've hosted this script on DigitalOcean, so I'm not sure if there are any server technicalities that could stop a foreach loop from working.
edit: Here is the scrapePage function -
private function scrapePage($url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_COOKIESESSION, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HTTPHEADER, array('Accept-Charset: utf-8'));
    curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_VERBOSE, true);
    $content = curl_exec($ch);
    $header = curl_getinfo($ch);
    curl_close($ch);
    return array("header" => $header, "content" => $content);
}
Like I said, if I manually enter a url in there, it works fine, just not when in a loop.
As for the $url_array, this is the output when I print it out -
Array
(
[0] => Array
(
[id] => 41
[url] => http://www.example1.com
)
[1] => Array
(
[id] => 85
[url] => http://test-url-2.com
)
)
I've also tried a for loop over the data. If I modify the scrapePage function to return the $url, it returns the $url correctly.
Is $url_array exactly as the array you posted above? Or is that just one subarray from a larger, multidimensional array that you are not showing?
scrapePage or add a log-statement to it, logging $url -- see what's really happening.
$key => $row) and compare (===) the URLs from file and db arrays. If not, then try to add sleep(2) in your loop.
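A minimal sketch of the checks suggested in the comments above, assuming the cause is an invisible difference (such as a trailing newline or space) between the database value and the hand-typed URL. That is only a guess, not a confirmed diagnosis. The compareUrls helper, the sample rows, and the trailing "\n" are all made up for illustration.

```php
<?php
// Hypothetical helper (not from the original post): reports how a URL value
// differs from a known-good literal, so hidden characters become visible.
function compareUrls(string $fromDb, string $expected): array
{
    return [
        'identical'            => $fromDb === $expected,
        'identical_after_trim' => trim($fromDb) === $expected,
        'length'               => strlen($fromDb),
        'hex'                  => bin2hex($fromDb), // shows bytes such as 0a (newline)
    ];
}

// Sample rows shaped like the $url_array printed above. The trailing "\n" on
// the first URL is invented here to simulate stray whitespace from the database.
$url_array = [
    ['id' => 41, 'url' => "http://www.example1.com\n"],
    ['id' => 85, 'url' => 'http://test-url-2.com'],
];

// The hand-typed URLs that are known to work when passed in directly.
$expected = ['http://www.example1.com', 'http://test-url-2.com'];

foreach ($url_array as $key => $row) {
    // Dump the comparison for every row instead of stopping at the first one.
    var_dump(compareUrls($row['url'], $expected[$key]));
}
```

If 'identical' is false but 'identical_after_trim' is true, passing trim($row["url"]) to scrapePage would be worth trying. Separately, calling curl_error($ch) inside scrapePage before curl_close($ch) should show why curl_exec failed, if it did.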