
I'm new to PHP, MySQL and XML, and have been trying to wrap my head around classes, objects, arrays and loops. I'm working on a parser that extracts data from an XML file and then stores it in a database. A fun and delightfully frustrating challenge to work on during the Christmas holiday.

Before posting this question I went over the PHP 5.x documentation and W3C, and also searched quite a bit around Stack Overflow.

Here's the code...

> XML:

<alliancedata>
    <server>
        <name>irrelevant</name>
    </server>

    <alliances>
        <alliance>
            <alliance id="101">Knock Out</alliance>

            <roles>
                <role>
                    <role id="1">irrelevant</role>
                </role>
            </roles>

            <relationships>
                <relationship>
                    <proposedbyalliance id="102" />
                    <acceptedbyalliance id="101" />
                    <relationshiptype id="4">NAP</relationshiptype>
                    <establishedsince>2014-12-27T18:01:34.130</establishedsince>
                </relationship>
                <relationship>
                    <proposedbyalliance id="101" />
                    <acceptedbyalliance id="103" />
                    <relationshiptype id="4">NAP</relationshiptype>
                    <establishedsince>2014-12-27T18:01:34.130</establishedsince>
                </relationship>
                <relationship>
                    <proposedbyalliance id="104" />
                    <acceptedbyalliance id="101" />
                    <relationshiptype id="4">NAP</relationshiptype>
                    <establishedsince>2014-12-27T18:01:34.130</establishedsince>
                </relationship>
            </relationships>
        </alliance>
</alliancedata>

> PHP:

$xml = simplexml_load_file($alliances_xml); // $alliances_xml = path to file

  // die(var_dump($xml));
  // var_dump prints out the entire unparsed xml file.

  foreach ($xml->alliances as $alliances) {

       // Alliance info 
       $alliance_id = mysqli_real_escape_string($dbconnect, $alliances->alliance->alliance['id']);
       $alliance_name = mysqli_real_escape_string($dbconnect,$alliances->alliance->alliance);

       // Diplomacy info
       $proposed_by_alliance_id = mysqli_real_escape_string($dbconnect,$alliances->alliance->relationships->relationship->proposedbyalliance['id']);
       $accepted_by_alliance_id = mysqli_real_escape_string($dbconnect,$alliances->alliance->relationships->relationship->acceptedbyalliance['id']);
       $relationship_type_id = mysqli_real_escape_string($dbconnect,$alliances->alliance->relationships->relationship->relationshiptype['id']);
       $established_date = mysqli_real_escape_string($dbconnect,$alliances->alliance->relationships->relationship->establishedsince);

// this is my attempt to echo every result
echo "Alliance ID: <b>$alliance_id</b> <br/>";
echo "Alliance NAME: <b>$alliance_name</b> <br/>";
echo "Diplomacy Proposed: <b>$proposed_by_alliance_id</b> <br/>";
echo "Diplomacy Accepted: <b>$accepted_by_alliance_id</b> <br/>";
echo "Diplomacy Type: <b>$relationship_type_id</b> <br/>";
echo "Date Accepted: <b>$established_date</b> <br/>";
echo "<hr/>";
}

> interpreter output:

Alliance ID: 1 
Alliance NAME: Knock Out
Diplomacy Proposed: 102 
Diplomacy Accepted: 101
Diplomacy Type: 4 
Date Accepted: 2011-10-24T05:08:35.830

I don't understand why the loop simply stops after parsing the first row of data. My best guess is that my code is not telling PHP what to do after the first values are parsed.

Honestly I have no idea how to explain this in words, so here's a visual representation.

First row is interpreted as

--->$alliance_id
--->$alliance_name
--->$proposed_by_alliance_id
--->$accepted_by_alliance_id
--->$relationship_type_id
--->$established_date

then for the next <relationship> subnodes the following happens...

---> ?? _(no data)_
---> ?? _(no data)_
--->$proposed_by_alliance_id
--->$accepted_by_alliance_id
--->$relationship_type_id
--->$established_date

Since I'm not telling PHP to add $alliance_id and $alliance_name to every iteration of the <relationship> subnode, the interpreter simply decides to abort the foreach operation. As I mentioned above, I'm new to both PHP and Stack Overflow, and I really appreciate any help or wisdom you can share. Thank you in advance.

  • What's stored in the variable $alliances_xml? A path to the file or the XML content? Also do a var_dump of $xml just after calling the simplexml function, and check whether it's an object or false. Commented Dec 30, 2014 at 14:30
  • It's just the path to the file. Thanks for the prompt reply! (will add that crucial bit of info to the question) Commented Dec 30, 2014 at 14:32
  • Please try to var_dump $xml just after calling the simplexml function, and check what it returns. Keep in mind that simplexml has problems reading huge XML files; yesterday I tried a file of size 2 MB and wasn't able to read and parse it. Commented Dec 30, 2014 at 14:34
  • Instead of $alliance->alliance['id'] use $alliance->alliance->attributes()->id Commented Dec 30, 2014 at 14:35
  • @AleksandarVasić: adding var_dump $xml gives the following result... Parse error: syntax error, unexpected '$xml' (T_VARIABLE). I'm consulting the PHP docs for the proper way to do this. Commented Dec 30, 2014 at 14:40

2 Answers

You write that you have trouble debugging your traversal of an XML document with SimpleXML.

The first puzzle you run into is that your foreach iterates only once:

foreach ($xml->alliances as $alliances) {

That may be hard to accept. However, if we take the XML from your question and actually count how many <alliances> elements the XML document has, we can see that SimpleXML is doing the right thing here:

  • there is exactly one (1) <alliances> element inside the document element.
  • $xml->alliances has one (1) iteration.
  • $xml->alliances->count() gives int(1)
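A quick way to see the difference between the two levels is to count both. This is a minimal sketch on a made-up, trimmed-down sample (the second alliance name is invented for illustration), not on the real data file:

```php
<?php
// Hypothetical sample that mirrors the nesting used in the question.
$xml = simplexml_load_string(<<<'XML'
<alliancedata>
    <alliances>
        <alliance><alliance id="101">Knock Out</alliance></alliance>
        <alliance><alliance id="102">Second</alliance></alliance>
    </alliances>
</alliancedata>
XML
);

// Number of <alliances> wrapper elements under the document element.
$wrapperCount = $xml->alliances->count();

// Number of <alliance> entries inside that wrapper.
$allianceCount = $xml->alliances->alliance->count();

echo "alliances: $wrapperCount, alliance: $allianceCount\n";
```

The first number is how often `foreach ($xml->alliances as ...)` runs; the second is how often a loop over `$xml->alliances->alliance` would run.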

That this matches the XML is easy to verify as well. The commented-out code in your question's example suggests that you used var_dump to see whether or not the XML loads. You don't have to: if simplexml_load_file does not return false, the document was loaded (if you test for a falsy value instead, the document was either not loaded or empty).

So if you want to ensure the document has loaded, just check the return value and throw an exception in case there was a problem.
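As a rough sketch of such a check: the helper name `load_xml_or_fail()` is made up for this example, and it uses simplexml_load_string instead of simplexml_load_file only so that it runs without a file on disk. The libxml functions collect the parser messages so the exception can say what went wrong.

```php
<?php
/**
 * Hypothetical helper (not part of the original code): parse an XML string
 * and throw an exception instead of returning false.
 */
function load_xml_or_fail($source)
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $xml      = simplexml_load_string($source);
    $errors   = libxml_get_errors();
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($xml === false) {
        $first = $errors ? trim($errors[0]->message) : 'unknown error';
        throw new UnexpectedValueException("Unable to load XML: $first");
    }

    return $xml;
}

// A well-formed document loads fine.
$ok = load_xml_or_fail('<alliancedata><alliances/></alliancedata>');
echo $ok->getName(), "\n";

// A document with a missing closing tag triggers the exception.
try {
    load_xml_or_fail('<alliancedata><alliances></alliancedata>');
    $failed = false;
} catch (UnexpectedValueException $e) {
    $failed = true;
    echo $e->getMessage(), "\n";
}
```

The strict `=== false` comparison only catches load failures; a loaded but empty document would pass, which is the difference to the falsy check mentioned above.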

To check which XML a SimpleXMLElement contains, you shouldn't use var_dump either. Instead, output the XML. As the XML can be quite large at this point, take only the first 256 bytes, for example; that normally gives a good picture:

echo substr($xml->alliances->asXML(), 0, 256), "\n";

<alliances>
    <alliance>
    <alliance id="1">Harmless?</alliance>  
    <foundedbyplayerid id="10"/><alliancecapitaltownid id="14646"/>
    <allianceticker>H?</allianceticker>  
    <foundeddatetime>2010-02-25T14:18:07.867</foundeddatetime>  
    <alliancecapitallastmoved>2012-01-19T17:42
 ^^^^^^^^^

This directly shows that you're iterating over the element(s) named alliances, of which there is only one in the document. This fully matches your observation that the foreach runs only once.

With this really basic debugging you can draw the following conclusion:

  • It is observed that the foreach iterates only once (1).
  • The foreach has been told to iterate over elements named alliances.
  • As there is only one (1) iteration, there has to be only one (1) alliances element.
  • Counting the alliances elements, the result is one.
  • Therefore it is confirmed that there is only one (1) alliances element.

So obviously you're iterating over the wrong element(s).

This outline of the error finding is rather extensive. It is meant to show you at how many points you could already have improved both your code and your error checking, and especially where you can start with troubleshooting. The question remains why you weren't able to spot this already. An earlier answer here already pointed out that you were iterating over the wrong element(s). However, it was not spelled out, only hinted at somewhat cryptically in code:

[...] change your for loop from foreach ($xml->alliances->alliance as $alliance) { to foreach ($xml->alliance as $alliance) {

and that's all

Source

Sure, that's weak, as it only gives code but doesn't answer any of your (programming) questions.

After finding the cause, let's cure this step by step

So after finding out that it's the wrong element, it's easy to fix that: iterate over the right elements.

This can be done by applying incremental changes to your code.

First of all the correct element needs to be chosen:

foreach ($xml->alliances->alliance as $alliances) {

This will immediately make your code spit out a lot of errors, several for each iteration. And there are many iterations. So with this little change you can already say that something has moved in the right direction: instead of one iteration, there are now many more.

But before fixing the mess of newly introduced errors and warnings, first take care of the code just changed. The next thing is to rename the variable $alliances to $alliance. Your editor should support you with that, either via search and replace (often CTRL+R) or via a refactoring command named "rename variable" (e.g. SHIFT+F6 in PhpStorm). Afterwards that line looks like this (the following lines change as well, but I don't show them):

 foreach ($xml->alliances->alliance as $alliance) {

And it's still not done. As $xml->alliances->alliance is a bit bulky, let's move it out of the loop header and give it a more descriptive variable name, $alliances:

$alliances = $xml->alliances->alliance;
foreach ($alliances as $alliance) {

The next step is to correct an error you made. For some reason that is not clear to me, you pass all data through mysqli_real_escape_string(). Even if you intend to pass the data to a database later on, this is the wrong place to call that function. First extract the data; that function is called later, in preparation of the database insert operation, which is a different part of your application.

I just replaced all occurrences of "mysqli_real_escape_string($dbconnect," with "trim(" so that finally, after proper indentation, the code has changed to this:

$alliances = $xml->alliances->alliance;
foreach ($alliances as $alliance) {

    // Alliance info
    $alliance_id   = trim($alliance->alliance->alliance['id']);
    $alliance_name = trim($alliance->alliance->alliance);

    // Diplomacy info
    $proposed_by_alliance_id = trim($alliance->alliance->relationships->relationship->proposedbyalliance['id']);
    $accepted_by_alliance_id = trim($alliance->alliance->relationships->relationship->acceptedbyalliance['id']);
    $relationship_type_id    = trim($alliance->alliance->relationships->relationship->relationshiptype['id']);
    $established_date        = trim($alliance->alliance->relationships->relationship->establishedsince);

Thanks to the better-named variables it is now pretty visible where the many

Notice: Trying to get property of non-object

warnings come from: the many calls to $alliance->alliance-> are simply redundant. Remember that originally you iterated over the wrong elements; this is the counterpart. Because you used the wrong elements, you had to repeat the error in every access, otherwise you could not have extracted any data at all. Think about this for a second. It also means that the earlier you verify that the code actually does what you intend, the fewer small problems get introduced.
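To make the redundancy concrete, here is a small sketch on a made-up, trimmed-down sample (the variable names `$outer` and `$inner` are only for illustration). It checks on which level the relationships element actually lives:

```php
<?php
// Hypothetical sample modelled on the XML in the question.
$xml = simplexml_load_string(
    '<alliancedata><alliances><alliance>'
    . '<alliance id="101">Knock Out</alliance>'
    . '<relationships><relationship/></relationships>'
    . '</alliance></alliances></alliancedata>'
);

$outer = $xml->alliances->alliance; // the outer <alliance> wrapper
$inner = $outer->alliance;          // the inner <alliance id="101"> element

// <relationships> is a child of the outer wrapper ...
$outerHasRelationships = isset($outer->relationships);
// ... and not of the inner element that carries the id and the name.
$innerHasRelationships = isset($inner->relationships);

var_dump($outerHasRelationships, $innerHasRelationships);
```

Once the loop variable already is the outer element, the extra ->alliance-> step points at the inner element, where relationships cannot be found.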

Good thing here again is that this is easy to fix by replacing all "$alliance->alliance->" with "$alliance->":

$alliances = $xml->alliances->alliance;
foreach ($alliances as $alliance) {

    // Alliance info
    $alliance_id   = trim($alliance->alliance['id']);
    $alliance_name = trim($alliance->alliance);

    // Diplomacy info
    $proposed_by_alliance_id = trim($alliance->relationships->relationship->proposedbyalliance['id']);
    $accepted_by_alliance_id = trim($alliance->relationships->relationship->acceptedbyalliance['id']);
    $relationship_type_id    = trim($alliance->relationships->relationship->relationshiptype['id']);
    $established_date        = trim($alliance->relationships->relationship->establishedsince);

Running the code again now shows that the iteration works and that extracting the information from each alliance element works perfectly fine as well. There are still errors, because, as you already say in your question, you wonder not only about the iteration but also about traversing the relationships further:

Alliance ID ......: 1
Alliance NAME ....: Harmless?
Diplomacy Proposed: 454
Diplomacy Accepted: 1
Diplomacy Type ...: 4
Date Accepted  ...: 2011-10-24T05:08:35.830
-------------------------------------------------
[4x Notice: Trying to get property of non-object]
Alliance ID ......: 2
Alliance NAME ....: Danger
Diplomacy Proposed: 
Diplomacy Accepted: 
Diplomacy Type ...: 
Date Accepted  ...: 
-------------------------------------------------
...

The error messages correspond to the following four lines:

$proposed_by_alliance_id = trim($alliance->relationships->relationship->proposedbyalliance['id']);
$accepted_by_alliance_id = trim($alliance->relationships->relationship->acceptedbyalliance['id']);
$relationship_type_id    = trim($alliance->relationships->relationship->relationshiptype['id']);
$established_date        = trim($alliance->relationships->relationship->establishedsince);

This means that, again, you need to apply the troubleshooting steps outlined at the very beginning of my answer, this time to this section of your code.

Here is the code example so far:

$xml = simplexml_load_file($alliances_xml); // $alliances_xml = path to file
if (!$xml) {
    throw new UnexpectedValueException(
        sprintf("Unable to load XML or it was empty. Filename given was %s", var_export($alliances_xml, true))
    );
}

$alliances = $xml->alliances->alliance;
// limit to two iterations for debugging
$alliances = new LimitIterator(new IteratorIterator($alliances), 0, 2);

foreach ($alliances as $alliance) {

    // Alliance info
    $alliance_id   = trim($alliance->alliance['id']);
    $alliance_name = trim($alliance->alliance);

    // Diplomacy info

    $proposed_by_alliance_id = trim($alliance->relationships->relationship->proposedbyalliance['id']);
    $accepted_by_alliance_id = trim($alliance->relationships->relationship->acceptedbyalliance['id']);
    $relationship_type_id    = trim($alliance->relationships->relationship->relationshiptype['id']);
    $established_date        = trim($alliance->relationships->relationship->establishedsince);

    // this is my attempt to echo every result
    echo "Alliance ID ......: $alliance_id\n";
    echo "Alliance NAME ....: $alliance_name\n";
    echo "Diplomacy Proposed: $proposed_by_alliance_id\n";
    echo "Diplomacy Accepted: $accepted_by_alliance_id\n";
    echo "Diplomacy Type ...: $relationship_type_id\n";
    echo "Date Accepted  ...: $established_date\n";
    echo "-------------------------------------------------\n";
}

Please note that I'm using the command line to execute the PHP code, as it's much faster than going through a browser and a web server. I also don't need to write HTML just to get nicely formatted output.
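For the remaining part, the relationships, the same reasoning suggests a second loop inside the first. The following is only a sketch under assumptions: the sample XML is made up (the second alliance deliberately has no relationships element), and collecting the values in a `$rows` array is my choice for illustration, not something from the question.

```php
<?php
// Hypothetical sample data modelled on the question.
$xml = simplexml_load_string(<<<'XML'
<alliancedata>
    <alliances>
        <alliance>
            <alliance id="101">Knock Out</alliance>
            <relationships>
                <relationship>
                    <proposedbyalliance id="102" />
                    <acceptedbyalliance id="101" />
                    <relationshiptype id="4">NAP</relationshiptype>
                    <establishedsince>2014-12-27T18:01:34.130</establishedsince>
                </relationship>
                <relationship>
                    <proposedbyalliance id="101" />
                    <acceptedbyalliance id="103" />
                    <relationshiptype id="4">NAP</relationshiptype>
                    <establishedsince>2014-12-27T18:01:34.130</establishedsince>
                </relationship>
            </relationships>
        </alliance>
        <alliance>
            <alliance id="105">No Friends</alliance>
        </alliance>
    </alliances>
</alliancedata>
XML
);

$rows = array();
foreach ($xml->alliances->alliance as $alliance) {
    // Alliance info: extracted once per alliance, reused for each relationship.
    $alliance_id   = trim((string) $alliance->alliance['id']);
    $alliance_name = trim((string) $alliance->alliance);

    // Zero iterations when the alliance has no <relationships> element.
    foreach ($alliance->relationships as $relationships) {
        // One iteration per <relationship> element.
        foreach ($relationships->relationship as $relationship) {
            $rows[] = array(
                'alliance_id'   => $alliance_id,
                'alliance_name' => $alliance_name,
                'proposed_by'   => trim((string) $relationship->proposedbyalliance['id']),
                'accepted_by'   => trim((string) $relationship->acceptedbyalliance['id']),
                'type_id'       => trim((string) $relationship->relationshiptype['id']),
                'established'   => trim((string) $relationship->establishedsince),
            );
        }
    }
}

foreach ($rows as $row) {
    echo implode(' | ', $row), "\n";
}
```

Each row now carries the alliance id and name together with one relationship, so values from one alliance do not end up on another. Escaping or parameter binding would then happen where the rows are written to the database.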

2 Comments

Hey Hakre, thank you so much for your answer. I haven't entirely solved the mystery yet, but I wanted to express my gratitude to you for writing such a thorough reply. I'm studying up on multidimensional arrays, and am using your guidelines to debug as I go. Should have the solution (not to mention an actual understanding of arrays in PHP) by early next week. Thanks again! :)
Half a year later... I tried again and easily figured it out. Your explanation was spot on, Hakre, thanks for the help! (also to @jezrael for the edit)

I made a phpfiddle of your code; it is tested and working.

http://phpfiddle.org/main/code/7agg-si3f

You need to remove

<server>
     <name>Epic1</name>
</server>

and add </alliances> to the end, since the XML is reported as invalid.

after that change your for loop from foreach ($xml->alliances->alliance as $alliance) { to foreach ($xml->alliance as $alliance) {

and that's all

3 Comments

Just checked your fiddle. It's not looping through the 3 relationship nodes. Also, I can't remove <server> <name>Epic1</name> </server> because this is part of an XML file I'm pulling from a data dump on the internet. My existing code is able to parse and write data to the database. The main problem is that it's mixing up all the values, so properties that belong to alliance 1 are being added to other alliances. I think I'm missing something, or just not understanding your explanation.
Please add your whole code to phpfiddle, along with a bigger XML file.
I've made a PHP fiddle but apparently cannot share it without registering, so instead I'll share a direct link to the XML dump. [data.illyriad.co.uk/datafile_alliances.xml]. Please note that I rewrote the entire question to better conform to Stack Overflow standards.
