I have a method which is based on this function: https://github.com/gaarf/XML-string-to-PHP-array/blob/master/xmlstr_to_array.php

I altered it to suit my needs, and it now looks like this:

private function parseXml($xmlString)
{
    $doc = new \DOMDocument;
    $doc->loadXML($xmlString);
    $root = $doc->documentElement;
    $output[$root->tagName] = $this->domNodeToArray($root);

    return $output;
}

/**
 * @param $node
 * @return array|string
 */
private function domNodeToArray($node)
{
    $output = [];
    switch ($node->nodeType)
    {
        case XML_CDATA_SECTION_NODE:
        case XML_TEXT_NODE:
            $output = trim($node->textContent);
            break;
        case XML_ELEMENT_NODE:
            for ($i = 0, $m = $node->childNodes->length; $i < $m; $i++)
            {
                $child = $node->childNodes->item($i);
                $v = $this->domNodeToArray($child);

                if (isset($child->tagName))
                {
                    $t = $child->tagName;

                    if (!isset($output['value'][$t]))
                    {
                        $output['value'][$t] = [];
                    }

                    $output['value'][$t][] = $v;
                }
                else if ($v || $v === '0')
                {
                    $output['value'] = (string)$v;
                }
            }

            if (isset($output['value']) && $node->attributes->length && !is_array($output['value']))
            {
                $output = ['value' => $output['value']];
            }

            if (!$node->attributes->length && isset($output['value']) && !is_array($output['value']))
            {
                $output = ['attributes' => [], 'value' => $output['value']];
            }

            if (isset($output['value']) && is_array($output['value']))
            {
                if ($node->attributes->length)
                {
                    $a = [];
                    foreach ($node->attributes as $attrName => $attrNode)
                    {
                        $a[$attrName] = (string)$attrNode->value;
                    }
                    $output['attributes'] = $a;
                }
                else
                {
                    $output['attributes'] = [];
                }

                foreach ($output['value'] as $t => $v)
                {
                    if (is_array($v) && count($v) == 1 && $t != 'attributes')
                    {
                        $output['value'][$t] = $v[0];
                    }
                }
            }
            break;
    }

    return $output;
}

When I convert an example XML/XSD string to an array with the method above (parseXml), some attributes are lost. This only happens with my altered version; the functions provided in the GitHub repository work properly.

The example XSD string looks like so:

$xsdStr = '<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:element name="book">
        <xs:complexType>

            <xs:sequence>
                <xs:element name="title">
                    <xs:simpleType>
                        <xs:restriction base="xs:string">
                            <xs:maxLength value="40"/>
                        </xs:restriction>
                    </xs:simpleType>
                </xs:element>

                <xs:element name="author">
                    <xs:simpleType>
                        <xs:restriction base="xs:string">
                            <xs:maxLength value="40"/>
                        </xs:restriction>
                    </xs:simpleType>
                </xs:element>

                <xs:element name="character" maxOccurs="unbounded" minOccurs="0">
                    <xs:complexType>
                        <xs:sequence>
                            <xs:element name="name">
                                <xs:simpleType>
                                    <xs:restriction base="xs:string">
                                        <xs:maxLength value="40"/>
                                    </xs:restriction>
                                </xs:simpleType>
                            </xs:element>
                            <xs:element name="friend-of" maxOccurs="unbounded" minOccurs="0">
                                <xs:simpleType>
                                    <xs:restriction base="xs:string">
                                        <xs:maxLength value="40"/>
                                    </xs:restriction>
                                </xs:simpleType>
                            </xs:element>
                            <xs:element name="since" type="xs:date"/>
                            <xs:element name="qualification" type="xs:string"/>
                        </xs:sequence>
                    </xs:complexType>
                </xs:element>
            </xs:sequence>
            <xs:attribute name="isbn" use="required"> 
                <xs:simpleType>
                    <xs:restriction base="xs:integer">
                        <xs:totalDigits value="10"/>
                    </xs:restriction>
                </xs:simpleType>
            </xs:attribute> 

        </xs:complexType>
    </xs:element>

</xs:schema>';

echo '<pre>';
echo print_r($this->parseXml($xsdStr), true);

The resulting array (via print_r) looks like this: https://pastebin.com/sYvf5Z4X (linked because it would exceed the character limit).

In short, every occurrence of the maxLength tag loses its value attribute (which holds 40). I can't see why that happens with my altered version but not with the original code.

1 Answer

I must admit I don't fully understand the ins and outs of the code, but the problem seems to be in this part:

if (isset($output['value']) && is_array($output['value']))
{
    if ($node->attributes->length)

The attribute check only runs if a value is set for the node and that value is an array. I think what happens is that a leaf node such as <xs:maxLength value="40"/> has no child nodes, so $output['value'] is never set and its attributes are skipped.

If you move the check for attributes out of this branch, so that it runs before the value check, it works OK:

if ($node->attributes->length)
{
    // ...
}

if (isset($output['value']) && is_array($output['value']))

The difference is that the original code doesn't check whether a value is set; it only checks that the output is an array (line 48 of the original code):

if(is_array($output)) {
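
The fix described above can be sketched roughly as follows. This is a simplified rewrite rather than the asker's exact code: it uses standalone functions instead of private methods (an assumption made so the snippet runs on its own), and it collects attributes unconditionally before looking at child nodes. As a side effect, elements with both text and attributes also keep their attributes, which differs from the code in the question. Like that code, it does not try to handle mixed content.

```php
<?php
// Sketch only: standalone functions instead of the private methods in the question.

function domNodeToArray(DOMNode $node)
{
    // Text and CDATA nodes collapse to their trimmed content.
    if ($node->nodeType === XML_TEXT_NODE || $node->nodeType === XML_CDATA_SECTION_NODE) {
        return trim($node->textContent);
    }
    // Comments, processing instructions, etc. contribute nothing.
    if ($node->nodeType !== XML_ELEMENT_NODE) {
        return [];
    }

    // Collect attributes first, regardless of whether a value exists,
    // so that leaf elements such as <xs:maxLength value="40"/> keep them.
    $attributes = [];
    foreach ($node->attributes as $attrName => $attrNode) {
        $attributes[$attrName] = (string) $attrNode->value;
    }
    $output = ['attributes' => $attributes];

    foreach ($node->childNodes as $child) {
        $v = domNodeToArray($child);
        if ($child instanceof DOMElement) {
            // Group child elements by tag name under 'value'.
            $output['value'][$child->tagName][] = $v;
        } elseif ($v || $v === '0') {
            // Non-empty text becomes a scalar 'value'.
            $output['value'] = (string) $v;
        }
    }

    // Unwrap tag names that occurred exactly once.
    if (isset($output['value']) && is_array($output['value'])) {
        foreach ($output['value'] as $t => $v) {
            if (is_array($v) && count($v) === 1) {
                $output['value'][$t] = $v[0];
            }
        }
    }

    return $output;
}

function parseXml(string $xmlString): array
{
    $doc = new DOMDocument();
    $doc->loadXML($xmlString);
    $root = $doc->documentElement;

    return [$root->tagName => domNodeToArray($root)];
}

$xml = '<xs:restriction xmlns:xs="http://www.w3.org/2001/XMLSchema" base="xs:string">'
     . '<xs:maxLength value="40"/></xs:restriction>';
$result = parseXml($xml);
$leaf = $result['xs:restriction']['value']['xs:maxLength'];

echo $leaf['attributes']['value']; // 40
```

Note that the leaf entry has an 'attributes' key but no 'value' key, so calling code should check with isset() or ?? before reading 'value'.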

4 Comments

The original code basically stores the values directly on the key, but I need them to be inside value, so I switched the check like above. I know there's definitely something wrong in my altered version overall, but I can't see what exactly. I will test out your solution.
The main difference seems to be that you only ever create the value element of the array if there are some child elements, and <xs:maxLength value="40"/> has none.
Is that appropriate or would that need changing?
You could create it for consistency, but IMHO your code is correct in not creating it in this case. There is no 'value' for leaf nodes, so putting one in can potentially lead to the assumption there should be something there.
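
The point made in these comments, that a self-closing element carries attributes but has no child nodes, can be checked directly with DOMDocument. The snippet below is a small illustration only; the element names are written without the xs: prefix so that no namespace declaration is needed.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML('<restriction base="xs:string"><maxLength value="40"/></restriction>');

// The only child of the root is the self-closing <maxLength/> element.
$leafNode = $doc->documentElement->firstChild;

var_dump($leafNode->childNodes->length);    // int(0): nothing to put under 'value'
var_dump($leafNode->attributes->length);    // int(1): but there is an attribute
var_dump($leafNode->getAttribute('value')); // string(2) "40"
```

Because the child-node loop never runs for such an element, any attribute handling that depends on a 'value' entry being present will be skipped for it.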
