I am having a problem parsing an XML document that contains the URL of an intranet site. The following is a simplified example:

XML file

<?xml version="1.0" encoding="utf-8" ?>
<Nodes>
  <Node>
    <Project>Test</Project>
    <Link>https://www.google.com/?gws_rd=ssl#q=&+fails+in+url</Link>
  </Node>
</Nodes>

When I try to load the XML above in my C# code, I get an error at xdoc.Load because of the "&" in the URL. Generally, we can resolve this by using "%26" in place of "&", but I can't in my case because changing the "&" to "%26" breaks the URL. I think the "&" is being used as part of a query string, and replacing it breaks the parameters on the page.

This might not be the most efficient way to do it, but this is the requirement.

protected void Page_Load(object sender, EventArgs e)
{
    XmlDocument xdoc = new XmlDocument();
    xdoc.Load(Server.MapPath("~/Content/XMLFile1.xml"));
    XmlNodeList lNodes = xdoc.SelectNodes("/Nodes/Node");

    int i = 0;
    foreach (XmlElement p in lNodes)
    {
        // InnerText returns the unescaped text of the <Link> element.
        var m = p["Link"].InnerText;
        string s = "window.open('" + m + "', 'popup_window', 'width=300,height=100,left=100,top=100,resizable=yes');";
        //ClientScript.RegisterStartupScript(this.GetType(), "script", s, true);
        // Use a distinct key per script; a repeated type/key pair is treated as a duplicate and skipped.
        ScriptManager.RegisterStartupScript(this, this.GetType(), "script" + i++, s, true);
    }
}
  • If you have control over the XML document, wrap the URL in a CDATA block. That will prevent the parser from breaking on special characters. <Link><![CDATA[url]]></Link>. Otherwise you'll have to manipulate the XML prior to loading it into XmlDocument, and that is a very brittle approach and not a good idea. Commented Sep 15, 2015 at 18:29
  • Well that's simply not valid XML. Where did you get the XML file from? Fix the document, rather than writing code to work around brokenness. Commented Sep 15, 2015 at 18:32
  • I created the xml file. The "Link" was given to me. Commented Sep 15, 2015 at 18:33
  • @CodeNinja - Then fix the XML you created so it is valid :) Commented Sep 15, 2015 at 18:33
  • Use &amp; instead of & in xml. Commented Sep 15, 2015 at 18:37

1 Answer

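There are 5 characters that have special meaning in XML: ", &, ', < and >. A raw & or < anywhere in an attribute or element value makes the document ill-formed, and an XML parser will reject it. The quote characters only need escaping inside an attribute value delimited by the same quote, and > only needs escaping in the sequence ]]>.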
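As a minimal sketch of the failure described in the question, the following loads a fragment with a raw & and the same fragment with &amp;. The class name RawAmpersandDemo, the helper TryLoad, and the sample strings are made up for illustration; only XmlDocument.LoadXml and XmlException come from the framework.

```csharp
using System;
using System.Xml;

public static class RawAmpersandDemo
{
    // Hypothetical helper: returns null when the XML parses,
    // otherwise the message of the XmlException the parser threw.
    public static string TryLoad(string xml)
    {
        try
        {
            new XmlDocument().LoadXml(xml);
            return null;
        }
        catch (XmlException ex)
        {
            return ex.Message;
        }
    }

    public static void Main()
    {
        // Raw ampersand: the parser expects an entity reference and fails.
        Console.WriteLine(TryLoad("<Link>a=1&b=2</Link>") ?? "parsed");
        // Escaped ampersand: the document is well-formed.
        Console.WriteLine(TryLoad("<Link>a=1&amp;b=2</Link>") ?? "parsed");
    }
}
```

The exception message includes the line and position of the offending character, which helps locate the bad value in a larger file.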

Since you control the generation of the XML, it is far better to fix the XML than to try to modify it in the application before parsing it.

A <![CDATA[...]]> section is a good way to do this, as is replacing the special characters with their equivalent entities, like &amp; for &. If you're dealing with XML that has attributes with special characters in the values, you will have to use the character entities, as CDATA sections are not allowed inside attribute values.
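If the file is produced by code rather than typed by hand, one option is to let an XML API write it, since it escapes special characters on output. This is only a sketch assuming System.Xml.Linq is available; the class name NodesFileWriter and the method Build are hypothetical, and the element names are taken from the question.

```csharp
using System;
using System.Xml.Linq;

public static class NodesFileWriter
{
    // Hypothetical helper: builds the <Nodes> document from the question.
    // XElement escapes special characters in text when the tree is serialized.
    public static XDocument Build(string project, string link)
    {
        return new XDocument(
            new XElement("Nodes",
                new XElement("Node",
                    new XElement("Project", project),
                    new XElement("Link", link))));
    }

    public static void Main()
    {
        var doc = Build("Test", "https://www.google.com/?gws_rd=ssl#q=&+fails+in+url");
        // The serialized markup carries &amp; in place of the raw ampersand.
        Console.WriteLine(doc.ToString());
        // Reading the value back gives the original URL with its literal &.
        Console.WriteLine(doc.Root.Element("Node").Element("Link").Value);
    }
}
```

Calling doc.Save with a file path would write the same escaped markup to disk.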

If it's an element's value, you can use either approach.

So, using your example posted above:

 <Link>https://www.google.com/?gws_rd=ssl#q=&+fails+in+url</Link>

Would become:

 <Link><![CDATA[https://www.google.com/?gws_rd=ssl#q=&+fails+in+url]]></Link>
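The entity form of the same element would be <Link>https://www.google.com/?gws_rd=ssl#q=&amp;+fails+in+url</Link>. A small sketch, with the class name LinkEscapingDemo and the helper ReadLink invented for illustration, suggests that both forms hand the application the same URL, so the query string is not altered by either fix:

```csharp
using System;
using System.Xml;

public static class LinkEscapingDemo
{
    // Hypothetical helper: loads the XML and returns the text of the first <Link>.
    public static string ReadLink(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.SelectSingleNode("/Nodes/Node/Link").InnerText;
    }

    public static void Main()
    {
        const string withCData =
            "<Nodes><Node><Link><![CDATA[https://www.google.com/?gws_rd=ssl#q=&+fails+in+url]]></Link></Node></Nodes>";
        const string withEntity =
            "<Nodes><Node><Link>https://www.google.com/?gws_rd=ssl#q=&amp;+fails+in+url</Link></Node></Nodes>";

        // Both values contain a literal & once parsed.
        Console.WriteLine(ReadLink(withCData));
        Console.WriteLine(ReadLink(withEntity));
    }
}
```

The escaping exists only in the file; InnerText returns the unescaped text, so the code in the question needs no change to its URL handling.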

1 Comment

Yippee :-) This works. Thanks! Marked as answer.
