I have an XML string. It has to be converted into a PHP array so it can be processed by other parts of the software my team is working on. For the XML -> array conversion I'm using something like this:

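// get_class() only accepts objects, so test the type with instanceof instead.
if ($xmlString instanceof SimpleXMLElement) {
    $xml = $xmlString;
} else {
    $xml = simplexml_load_string($xmlString);
}
// Compare strictly: an empty but valid element also evaluates to false.
if ($xml === false) {
    return false;
}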

It works fine - most of the time :) The problem arises when my "xmlString" contains something like this:

<Line0 User="-5" ID="7436194"><Node0 Key="<1" Value="0"></Node0></Line0>

Then simplexml_load_string won't do its job (and I know that's because of the "<" character inside the Key attribute). As I can't influence any other part of the code (I can't open up the module that generates the XML string and tell it "encode special characters, please!"), I need your suggestions on how to fix that problem BEFORE calling simplexml_load_string.
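To confirm what the parser objects to, the libxml errors can be collected instead of being emitted as warnings. This is only a minimal sketch: the $messages array and its format are made up for illustration, and the exact message text depends on the installed libxml version.

```php
<?php
// Collect libxml errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$xmlString = '<Line0 User="-5" ID="7436194"><Node0 Key="<1" Value="0"></Node0></Line0>';
$xml = simplexml_load_string($xmlString);

$messages = []; // hypothetical container, for illustration only
if ($xml === false) {
    foreach (libxml_get_errors() as $error) {
        // Each entry is a LibXMLError carrying message, line and column.
        $messages[] = sprintf(
            'line %d, column %d: %s',
            $error->line,
            $error->column,
            trim($error->message)
        );
    }
    libxml_clear_errors();
}

print_r($messages);
```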

Do you have some ideas? I've tried

str_replace("<","&lt;",$xmlString)

but, that simply ruins entire "xmlString"... :(
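A quick sketch of why the blanket replacement fails: str_replace has no notion of attribute values, so it also rewrites the "<" that opens every tag, and nothing is left for the parser to recognise as an element. The variable names $escaped and $result are just for illustration.

```php
<?php
$xmlString = '<Line0 User="-5" ID="7436194"><Node0 Key="<1" Value="0"></Node0></Line0>';

// Replaces EVERY "<", including the ones that start tags.
$escaped = str_replace('<', '&lt;', $xmlString);
echo $escaped, PHP_EOL;

// Keep the parser quiet; only the return value matters here.
libxml_use_internal_errors(true);
$result = simplexml_load_string($escaped);
libxml_clear_errors();

var_dump($result);
```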

  • Using a valid XML string in the first place is the solution, not trying to make it valid when you're processing it.... where does your XML come from in the first place? Commented Jun 18, 2015 at 8:53
  • A horrid part of the software, in Java, written a long time ago by an ex co-worker of mine :( That's why fixing the original XML producer isn't an option (it would take too much time, and my boss won't allow that change) :( Commented Jun 18, 2015 at 8:58
  • how does it ruin $xmlString? Commented Jun 18, 2015 at 9:10
  • @michi - because it also replaces the < that opens every tag. Only the special characters inside an attribute value, like ", <, &, etc., should be written as entities (CDATA sections are not allowed inside attribute values). Commented Jun 18, 2015 at 9:23
  • I think you need to convince your boss that a hack is not a suitable fix for an obvious bug. What happens if the bug manifests itself in subtly different ways? Will you be expected to continue patching your XML parse code? That doesn't seem like a pretty scenario. In any case, the original bug might be a trivial fix - seems worth analysing at least. Commented Jun 18, 2015 at 10:05

1 Answer

Well, then you can replace the special characters inside the quoted attribute values of $xmlString with their entity counterparts, using htmlspecialchars() and preg_replace_callback(). That leaves the tag delimiters themselves untouched.

I know this is not performance friendly, but it does the job :)

<?php
$xmlString = '<Line0 User="-5" ID="7436194"><Node0 Key="<1" Value="0"></Node0></Line0>';

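// Match each double-quoted attribute value (quotes included) and escape
// &, < and > inside it; ENT_NOQUOTES leaves the quotes themselves alone.
$xmlString = preg_replace_callback('~".*?"~',
    function ($matches) {
        return htmlspecialchars($matches[0], ENT_NOQUOTES);
    },
    $xmlString
);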

header('Content-Type: text/plain');
echo $xmlString; // you will see the special characters are converted to HTML entities :)

echo PHP_EOL . PHP_EOL; // tidy :)
$xmlobj = simplexml_load_string($xmlString);
var_dump($xmlobj);
?>
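
A slightly more defensive variant of the same idea, as a sketch rather than a finished fix. The function name escapeAttributeValues is hypothetical. It assumes every attribute value is double-quoted and contains no double quote of its own, and that element text contains no stray double quotes. Passing false as the fourth argument of htmlspecialchars() (double_encode) keeps values that are already escaped, such as &amp;, from being escaped a second time.

```php
<?php
/**
 * Hypothetical helper: escape &, < and > inside double-quoted
 * attribute values only, leaving the tag delimiters untouched.
 */
function escapeAttributeValues(string $xmlString): string
{
    $result = preg_replace_callback(
        '~"[^"]*"~',
        function (array $matches): string {
            // ENT_NOQUOTES keeps the surrounding quotes intact;
            // double_encode = false leaves existing entities alone.
            return htmlspecialchars($matches[0], ENT_NOQUOTES, 'UTF-8', false);
        },
        $xmlString
    );

    // preg_replace_callback returns null on a regex error; fall back to the input.
    return $result ?? $xmlString;
}

$xmlString = '<Line0 User="-5" ID="7436194"><Node0 Key="<1" Value="0"></Node0></Line0>';
$fixed = escapeAttributeValues($xmlString);

$xml = simplexml_load_string($fixed);
$key = (string) $xml->Node0['Key']; // the parser decodes &lt; back to "<"
```

Because the pattern pairs up quotes blindly, input that breaks the assumptions above (single-quoted attributes, an odd number of quotes in text content) would still need the real fix in the code that generates the XML.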

1 Comment

This did the trick :) Nice solution I can use until we start refactoring the old Java code that generated the invalid XML in the first place :)
