I'm trying to parse an RSS feed and I'm getting what appears to be an empty DOMDocument object. My current code is:

$xml_url = "https://thehockeywriters.com/category/san-jose-sharks/feed/";

$curl = curl_init();
curl_setopt( $curl, CURLOPT_RETURNTRANSFER, 1 );
curl_setopt( $curl, CURLOPT_URL, $xml_url );

$xml = curl_exec( $curl );
curl_close( $curl );

//$xml = iconv('UTF-8', 'UTF-8//IGNORE', $xml);
//$xml = utf8_encode($xml);
$document = new DOMDocument;
$document->loadXML( $xml );
if( ini_get('allow_url_fopen') ) {
    echo "allow url fopen? Yes";
}
echo "<br />";
var_dump($document);

$items = $document->getElementsByTagName("item");

foreach ($items as $item) {
    $title = $item->getElementsByTagName('title');
    echo $title;
}

$url = 'https://thehockeywriters.com/category/san-jose-sharks/feed/';
$xml = simplexml_load_file($url);
foreach ($items as $item) {
    $title = $item->title;
    echo $title;
}
print_r($xml);
echo "<br />";
var_dump($xml);
echo "<br />hello?";

This code is two separate attempts at parsing the same URL, based on answers and suggestions from the following examples found on Stack Overflow:
Example 1
Example 2

Things I have tried or looked up:
1. Checked to make sure that allow_url_fopen is enabled
2. Made sure the feed is UTF-8 encoded
3. Validated the XML
4. Tried the code examples provided in the previously linked Stack Overflow posts

Here is my current output from the var_dumps and echo statements:

allow url fopen? Yes
object(DOMDocument)#2 (34) { ["doctype"]=> NULL ["implementation"]=> string(22) "(object value omitted)" 
["documentElement"]=> NULL ["actualEncoding"]=> NULL ["encoding"]=> NULL 
["xmlEncoding"]=> NULL ["standalone"]=> bool(true) ["xmlStandalone"]=> bool(true) 
["version"]=> string(3) "1.0" ["xmlVersion"]=> string(3) "1.0" 
["strictErrorChecking"]=> bool(true) ["documentURI"]=> NULL ["config"]=> NULL 
["formatOutput"]=> bool(false) ["validateOnParse"]=> bool(false) ["resolveExternals"]=> bool(false) 
["preserveWhiteSpace"]=> bool(true) ["recover"]=> bool(false) ["substituteEntities"]=> bool(false) 
["nodeName"]=> string(9) "#document" ["nodeValue"]=> NULL ["nodeType"]=> int(9) ["parentNode"]=> NULL 
["childNodes"]=> string(22) "(object value omitted)" ["firstChild"]=> NULL ["lastChild"]=> NULL 
["previousSibling"]=> NULL ["attributes"]=> NULL ["ownerDocument"]=> NULL ["namespaceURI"]=> NULL 
["prefix"]=> string(0) "" ["localName"]=> NULL ["baseURI"]=> NULL ["textContent"]=> string(0) "" } 
bool(false) 
hello?
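
Looking at that dump, documentElement is NULL, so loadXML() seems to have failed silently. For completeness, a minimal sketch of how the underlying libxml parse errors could be surfaced (not something in my original attempts):

libxml_use_internal_errors(true);   // collect parse errors instead of emitting warnings
$document = new DOMDocument;
if (!$document->loadXML($xml)) {    // loadXML() returns false when parsing fails
    foreach (libxml_get_errors() as $error) {
        echo trim($error->message), "<br />";
    }
    libxml_clear_errors();
}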
  • Neither of the answers you've looked at previously is using SSL. Take a look at stackoverflow.com/questions/4372710/php-curl-https. I think the issue is the certificate. Commented Mar 28, 2019 at 3:02
  • Hmm, I tried the quick fix curl_setopt( $curl, CURLOPT_SSL_VERIFYPEER, false); just to see if it would work, and it did not. Plus, I guess that is a security issue as well (a safer alternative is sketched below). Commented Mar 28, 2019 at 3:09
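
If the certificate really were the culprit, a safer route than disabling verification is to keep it enabled and point cURL at a CA bundle. A minimal sketch, with the bundle path as an assumption:

curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, true);  // keep peer verification on
// The path below is a placeholder; use a CA bundle available on your system,
// e.g. a cacert.pem extracted from the curl project.
curl_setopt($curl, CURLOPT_CAINFO, '/path/to/cacert.pem');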

2 Answers


The only issue I had with your code was that, without defining a user agent, the request to the feed returned a 403 error.

In the future, you could use curl_getinfo to extract the status code of the request and make sure it didn't fail, matching it against code 200, which means OK.

$httpcode = curl_getinfo($curl, CURLINFO_HTTP_CODE);

Aside from that, there were a few mistakes within your loops.

With SimpleXML:

<?php
$url = "https://thehockeywriters.com/category/san-jose-sharks/feed/";

$curl = curl_init();
curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:17.0) Gecko/20100101 Firefox/17.0");
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_URL, $url);
$data = curl_exec($curl);
$httpcode = curl_getinfo($curl, CURLINFO_HTTP_CODE);
curl_close($curl);

if ($httpcode !== 200)
{
    echo "Failed to retrieve feed... Error code: $httpcode";
    die();
}

$feed = new SimpleXMLElement($data);
// list all titles...
foreach ($feed->channel->item as $item)
{
    echo $item->title, "<br>\n";
}

With DOMDocument:

<?php
$url = "https://thehockeywriters.com/category/san-jose-sharks/feed/";

$curl = curl_init();
curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:17.0) Gecko/20100101 Firefox/17.0");
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_URL, $url);
$data = curl_exec($curl);
$httpcode = curl_getinfo($curl, CURLINFO_HTTP_CODE);
curl_close($curl);

if ($httpcode !== 200)
{
    echo "Failed to retrieve feed... Error code: $httpcode";
    die();
}

$xml = new DOMDocument();
$xml->loadXML($data);
// list all titles...
foreach ($xml->getElementsByTagName("item") as $item)
{
    foreach ($item->getElementsByTagName("title") as $title)
    {
        echo $title->nodeValue, "<br>\n";
    }
}

If you just want to print the title/description of all items:

foreach ($feed->channel->item as $item)
{
    echo $item->title;
    echo $item->description;
    // uncomment the below line to print only the first entry.
    // break;
}

If you want just the first entry, without using a foreach:

echo $feed->channel->item[0]->title;
echo $feed->channel->item[0]->description;

Saving the title and description to an array for later use:

$result = [];
foreach ($feed->channel->item as $item)
{
    $result[] = 
    [
        'title' => (string)$item->title,
        'description' => (string)$item->description
    ];
    // could make a key => value alternatively from the above with 
    // title as key like this: 
    // $result[(string)$item->title] = (string)$item->description;
}

Foreach with MySQLi/PDO prepared statement:

foreach ($feed->channel->item as $item)
{
    // cast the SimpleXML values to plain strings before binding
    $title = (string)$item->title;
    $description = (string)$item->description;
    // MySQLi
    $stmt->bind_param('ss', $title, $description);
    $stmt->execute();
    // PDO
    //$stmt->bindParam(':title', $title, PDO::PARAM_STR);
    //$stmt->bindParam(':description', $description, PDO::PARAM_STR);
    //$stmt->execute();
}
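
The snippet above assumes $stmt has already been prepared. A minimal sketch of that setup, where the connection details and the articles table/column names are placeholders rather than anything from the question:

// MySQLi
$mysqli = new mysqli('localhost', 'user', 'pass', 'database');
$stmt = $mysqli->prepare('INSERT INTO articles (title, description) VALUES (?, ?)');
// PDO
//$pdo = new PDO('mysql:host=localhost;dbname=database;charset=utf8mb4', 'user', 'pass');
//$stmt = $pdo->prepare('INSERT INTO articles (title, description) VALUES (:title, :description)');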

3 Comments

As soon as I added that useragent line, I was able to get it to work. Thank you. Fixed the loop too, like you mentioned.
@KurtLeadley you could further use $httpcode = curl_getinfo($curl, CURLINFO_HTTP_CODE); to verify the code is 200 and make sure you actually got the data; see the updated code.
Ahh, very nice. I need to do more research on the curl options.

I selected Prix's answer for pointing out the missing user agent, but I came up with another way of writing the loop that avoids nesting and makes it easier to access other nodes. Here is what I am using (DOMDocument solution):

$xml_url = "https://thehockeywriters.com/category/san-jose-sharks/feed/";

$curl = curl_init();
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_URL, $xml_url);
curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:17.0) Gecko/20100101 Firefox/17.0");

$xml = curl_exec($curl);
curl_close($curl);

$document = new DOMDocument;
$document->loadXML($xml);

$items = $document->getElementsByTagName("item");
foreach ($items as $item) {
    $title = $item->getElementsByTagName('title')->item(0)->nodeValue;
    echo $title;
    $desc = $item->getElementsByTagName('description')->item(0)->nodeValue;
    echo $desc;
}

5 Comments

I still prefer SimpleXML; it feels more straightforward to use. I've added 3 other examples to show you that.
I see! This is great. I have options next time. I happened to have a working version of code similar to what I just posted myself, so I went with that. I'm interested in trying the array one. Could the array solution reduce SQL insert queries? Right now I insert into my db once per loop iteration.
If you're using a framework like CodeIgniter you could use it for a bulk insert, but it would be pretty much a loop behind the scenes (see the sketch after these comments). Just make sure you're using prepared statements to bind all that data within your foreach to avoid headaches later.
Added an example at the bottom of what it would look like using MySQLi bind_param with the foreach just in case ;)
Thanks, I've been yelled at enough to know better and always use prepared statements haha. Already have the March articles pulling from my website's db : ) sjsharktank.com/index.php
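
For reference, a rough sketch of the bulk-insert idea mentioned above, building one multi-row INSERT from the $result array of the earlier example (table and column names are placeholders; requires PHP 5.6+ for argument unpacking):

if ($result) {
    // one (?, ?) placeholder group per row
    $placeholders = implode(', ', array_fill(0, count($result), '(?, ?)'));
    $stmt = $mysqli->prepare("INSERT INTO articles (title, description) VALUES $placeholders");
    $params = [];
    foreach ($result as $row) {
        $params[] = $row['title'];
        $params[] = $row['description'];
    }
    $stmt->bind_param(str_repeat('s', count($params)), ...$params);
    $stmt->execute();
}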
