I'm trying to parse an RSS feed and I'm getting what appears to be an empty DOMDocument object. My current code is:

$xml_url = "https://thehockeywriters.com/category/san-jose-sharks/feed/";

$curl = curl_init();
curl_setopt( $curl, CURLOPT_RETURNTRANSFER, 1 );
curl_setopt( $curl, CURLOPT_URL, $xml_url );

$xml = curl_exec( $curl );
curl_close( $curl );

//$xml = iconv('UTF-8', 'UTF-8//IGNORE', $xml);
//$xml = utf8_encode($xml);
$document = new DOMDocument;
$document->loadXML( $xml );
if( ini_get('allow_url_fopen') ) {
    echo "allow url fopen? Yes";
}
echo "<br />";
var_dump($document);

$items = $document->getElementsByTagName("item");

foreach ($items as $item) {
    $title = $item->getElementsByTagName('title');
    echo $title;
}

$url = 'https://thehockeywriters.com/category/san-jose-sharks/feed/';
$xml = simplexml_load_file($url);
foreach ($items as $item) {
    $title = $item->title;
    echo $title;
}
print_r($xml);
echo "<br />";
var_dump($xml);
echo "<br />hello?";

This code is two separate attempts at parsing the same URL, based on answers and suggestions from the following examples found on Stack Overflow:
Example 1
Example 2

Things I have tried or looked up:
1. Checked to make sure that allow_url_fopen is enabled
2. Made sure the feed is UTF-8 encoded
3. Validated the XML
4. Tried the code examples provided in the previously linked Stack Overflow posts

Here is my current output from the var_dumps and echo statements:

allow url fopen? Yes
object(DOMDocument)#2 (34) { ["doctype"]=> NULL ["implementation"]=> string(22) "(object value omitted)" 
["documentElement"]=> NULL ["actualEncoding"]=> NULL ["encoding"]=> NULL 
["xmlEncoding"]=> NULL ["standalone"]=> bool(true) ["xmlStandalone"]=> bool(true) 
["version"]=> string(3) "1.0" ["xmlVersion"]=> string(3) "1.0" 
["strictErrorChecking"]=> bool(true) ["documentURI"]=> NULL ["config"]=> NULL 
["formatOutput"]=> bool(false) ["validateOnParse"]=> bool(false) ["resolveExternals"]=> bool(false) 
["preserveWhiteSpace"]=> bool(true) ["recover"]=> bool(false) ["substituteEntities"]=> bool(false) 
["nodeName"]=> string(9) "#document" ["nodeValue"]=> NULL ["nodeType"]=> int(9) ["parentNode"]=> NULL 
["childNodes"]=> string(22) "(object value omitted)" ["firstChild"]=> NULL ["lastChild"]=> NULL 
["previousSibling"]=> NULL ["attributes"]=> NULL ["ownerDocument"]=> NULL ["namespaceURI"]=> NULL 
["prefix"]=> string(0) "" ["localName"]=> NULL ["baseURI"]=> NULL ["textContent"]=> string(0) "" } 
bool(false) 
hello?
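
Looking at that dump, documentElement is NULL, so loadXML() seems to have failed silently. For completeness, a minimal sketch of how the underlying libxml parse errors could be surfaced (not something in my original attempts):

libxml_use_internal_errors(true);   // collect parse errors instead of emitting warnings
$document = new DOMDocument;
if (!$document->loadXML($xml)) {    // loadXML() returns false when parsing fails
    foreach (libxml_get_errors() as $error) {
        echo trim($error->message), "<br />";
    }
    libxml_clear_errors();
}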
  • Neither of the answers you've looked at previously is using SSL. Take a look at stackoverflow.com/questions/4372710/php-curl-https. I think the issue is the certificate. Commented Mar 28, 2019 at 3:02
  • Hmm, I tried the quick fix curl_setopt( $curl, CURLOPT_SSL_VERIFYPEER, false); just to see if it would work, and it did not. Plus, I guess that is a security issue as well (a safer alternative is sketched below). Commented Mar 28, 2019 at 3:09
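
If the certificate really were the culprit, a safer route than disabling verification is to keep it enabled and point cURL at a CA bundle. A minimal sketch, with the bundle path as an assumption:

curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, true);  // keep peer verification on
// The path below is a placeholder; use a CA bundle available on your system,
// e.g. a cacert.pem extracted from the curl project.
curl_setopt($curl, CURLOPT_CAINFO, '/path/to/cacert.pem');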

2 Answers


The only issue I had with your code was that, without defining a user agent, the request to the feed returned a 403 error.

In the future, you could use curl_getinfo to extract the status code of the request and make sure it didn't fail, matching it against code 200, which means OK.

$httpcode = curl_getinfo($curl, CURLINFO_HTTP_CODE);

Aside from that, there were a few mistakes within your loops.

With SimpleXML:

<?php
$url = "https://thehockeywriters.com/category/san-jose-sharks/feed/";

$curl = curl_init();
curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:17.0) Gecko/20100101 Firefox/17.0");
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_URL, $url);
$data = curl_exec($curl);
$httpcode = curl_getinfo($curl, CURLINFO_HTTP_CODE);
curl_close($curl);

if ($httpcode !== 200)
{
    echo "Failed to retrieve feed... Error code: $httpcode";
    die();
}

$feed = new SimpleXMLElement($data);
// list all titles...
foreach ($feed->channel->item as $item)
{
    echo $item->title, "<br>\n";
}

With DOMDocument:

<?php
$url = "https://thehockeywriters.com/category/san-jose-sharks/feed/";

$curl = curl_init();
curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:17.0) Gecko/20100101 Firefox/17.0");
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_URL, $url);
$data = curl_exec($curl);
$httpcode = curl_getinfo($curl, CURLINFO_HTTP_CODE);
curl_close($curl);

if ($httpcode !== 200)
{
    echo "Failed to retrieve feed... Error code: $httpcode";
    die();
}

$xml = new DOMDocument();
$xml->loadXML($data);
// list all titles...
foreach ($xml->getElementsByTagName("item") as $item)
{
    foreach ($item->getElementsByTagName("title") as $title)
    {
        echo $title->nodeValue, "<br>\n";
    }
}

If you just want to print the title/description of all items:

foreach ($feed->channel->item as $item)
{
    echo $item->title;
    echo $item->description;
    // uncomment the below line to print only the first entry.
    // break;
}

If you want just the first entry, without using a foreach:

echo $feed->channel->item[0]->title;
echo $feed->channel->item[0]->description;

Saving the title and description to an array for later use:

$result = [];
foreach ($feed->channel->item as $item)
{
    $result[] = 
    [
        'title' => (string)$item->title,
        'description' => (string)$item->description
    ];
    // could make a key => value alternatively from the above with 
    // title as key like this: 
    // $result[(string)$item->title] = (string)$item->description;
}

Foreach with MySQLi/PDO prepared statement:

foreach ($feed->channel->item as $item)
{
    // cast the SimpleXML values to plain strings before binding
    $title = (string)$item->title;
    $description = (string)$item->description;
    // MySQLi
    $stmt->bind_param('ss', $title, $description);
    $stmt->execute();
    // PDO
    //$stmt->bindParam(':title', $title, PDO::PARAM_STR);
    //$stmt->bindParam(':description', $description, PDO::PARAM_STR);
    //$stmt->execute();
}
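
The snippet above assumes $stmt has already been prepared. A minimal sketch of that setup, where the connection details and the articles table/column names are placeholders rather than anything from the question:

// MySQLi
$mysqli = new mysqli('localhost', 'user', 'pass', 'database');
$stmt = $mysqli->prepare('INSERT INTO articles (title, description) VALUES (?, ?)');
// PDO
//$pdo = new PDO('mysql:host=localhost;dbname=database;charset=utf8mb4', 'user', 'pass');
//$stmt = $pdo->prepare('INSERT INTO articles (title, description) VALUES (:title, :description)');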

3 Comments

As soon as I added that useragent line, I was able to get it to work. Thank you. Fixed the loop too, like you mentioned.
@KurtLeadley you could further use $httpcode = curl_getinfo($curl, CURLINFO_HTTP_CODE); to verify the code is 200 and make sure you actually got the data; see the updated code.
Ahh, very nice. I need to do more research on the curl options.

I selected Prix's answer for pointing out the missing user agent, but I came up with another way of writing the loop that avoids nesting and makes it easier to access other nodes. Here is what I am using (DOMDocument solution):

$xml_url = "https://thehockeywriters.com/category/san-jose-sharks/feed/";

$curl = curl_init();
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_URL, $xml_url);
curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:17.0) Gecko/20100101 Firefox/17.0");

$xml = curl_exec($curl);
curl_close($curl);

$document = new DOMDocument;
$document->loadXML($xml);

$items = $document->getElementsByTagName("item");
foreach ($items as $item) {
    $title = $item->getElementsByTagName('title')->item(0)->nodeValue;
    echo $title;
    $desc = $item->getElementsByTagName('description')->item(0)->nodeValue;
    echo $desc;
}

5 Comments

I still prefer SimpleXML; it feels more straightforward to use. I've added 3 other examples to show you that.
I see! This is great. I have options next time. I happened to have a working version of code similar to what I just posted myself, so I went with that. I'm interested in trying the array one. Could the array solution reduce SQL insert queries? Right now I insert into my db once per loop iteration.
If you're using a framework like CodeIgniter you could use it for a bulk insert, but it would be pretty much a loop behind the scenes (see the sketch after these comments). Just make sure you're using prepared statements to bind all that data within your foreach to avoid headaches later.
Added an example at the bottom of what it would look like using MySQLi bind_param with the foreach just in case ;)
Thanks, I've been yelled at enough to know better and always use prepared statements haha. Already have the March articles pulling from my website's db : ) sjsharktank.com/index.php
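
For reference, a rough sketch of the bulk-insert idea mentioned above, building one multi-row INSERT from the $result array of the earlier example (table and column names are placeholders; requires PHP 5.6+ for argument unpacking):

if ($result) {
    // one (?, ?) placeholder group per row
    $placeholders = implode(', ', array_fill(0, count($result), '(?, ?)'));
    $stmt = $mysqli->prepare("INSERT INTO articles (title, description) VALUES $placeholders");
    $params = [];
    foreach ($result as $row) {
        $params[] = $row['title'];
        $params[] = $row['description'];
    }
    $stmt->bind_param(str_repeat('s', count($params)), ...$params);
    $stmt->execute();
}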
