I am trying to load an XML file into a string, as below:

System.IO.StreamReader myFile = new System.IO.StreamReader(@"C:\Users\kuruvilla.philip\Desktop\Files\sample1.xml");
String myString = myFile.ReadToEnd();

However, I need to check whether the string loaded into myString contains illegal variations of &amp;, for example &P (it always needs to be exactly &amp;).

So I need to correct each such case and replace it with &amp; at every occurrence.

How can I accomplish this?

  • Can you show some examples (valid, invalid, corrected)? Also, format it with the code button in the editor; then you don't need to put spaces in between. Commented Sep 9, 2014 at 12:06
  • It might be easier to reverse this and say what is valid with & in it. Then check if the string contains the ampersand and if it does, validate it against the list of valid attributes Commented Sep 9, 2014 at 12:19
  • Then Juneidy Soo's answer is correct Commented Sep 9, 2014 at 12:21
  • So you're reading invalid XML and you're trying to make it valid? That's going to lash back against you soon. For example, what about CDATA sections? Comments? And someone smart is bound to find a way to exploit that behaviour for nefarious purposes! :D TLDR: Make the guy who creates the XML create valid XML and you're done :) Commented Sep 9, 2014 at 12:48
  • @Luaan You are quite right, there can be endless variations of rule violations in the inbound XML. Commented Sep 10, 2014 at 5:35

3 Answers

Use a regex to replace any & followed by non-whitespace characters with &amp;

// requires: using System.Text.RegularExpressions;
Regex r = new Regex("&[^\\s]*");
string result = r.Replace("&p bla & bla &ohnoes", "&amp;");

outputs: &amp; bla &amp; bla &amp;

The regex looks for matches that

  • start with an &
  • are followed by zero or more characters that are not whitespace

Of course, this is a very inclusive regex; you may want to tweak it to ignore legitimate elements that don't need to be replaced.
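One possible tweak, as a rough sketch rather than a complete fix: use a negative lookahead so that an & which already starts a well-formed entity is left alone. The class name AmpersandFixer and the method Fix are made up for illustration, and the list of entities treated as legitimate (the five predefined XML entities plus numeric character references) is an assumption about what your input should allow. Stopping the match at < or another & is also an assumption, made so that markup directly after the bad token is not swallowed.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class AmpersandFixer
{
    // Matches an '&' that does NOT begin a well-formed entity, plus any
    // characters glued to it, stopping at whitespace, '<' or another '&'.
    private static readonly Regex IllegalAmpersand = new Regex(
        @"&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9A-Fa-f]+);)[^\s<&]*");

    // Replaces every illegal variation with the entity "&amp;".
    public static string Fix(string input)
    {
        return IllegalAmpersand.Replace(input, "&amp;");
    }
}
```

You would call it as AmpersandFixer.Fix(myString). Text that already contains &amp; or &lt; passes through unchanged, while &P, a bare &, and &sdfs all become &amp;. This only addresses ampersands; it does nothing for other well-formedness problems.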

4 Comments

This worked for my query. However, going ahead I found that the other input XMLs have other rule violations, like missing end tags and repeated root elements. I wonder if there is any way we can update and correct all possible XML errors through a generic routine.
As @Luaan recommends in their comment, I would encourage you to have a stern word with the producer of this XML in order to explain that you are having problems consuming it. If you really cannot, there are some libraries that are able to try and "guess" correct XML from a badly formatted one: IIRC HtmlAgilityPack in particular can read XML files that are malformed, so that may be a way to handle this problem.
HtmlAgilityPack works rather well. If you're not concerned with outright malicious data, Internet Explorer might help a bit too, it's got a pretty decent fixer. In any case, there's just way too many things that can go wrong and result in a completely unreadable document. Seriously, if at all possible, do not suffer invalid XML. All the problems are going to fall on your head, and you will not be able to fix them.
Eh, the consensus seems to be "go shout at somebody about xml standards" ;) Thanks for the comment, @Luaan

Have you tried

string.Contains();

A simple example is

string someString = "StackOverflow&P";
bool isIllegal = someString.Contains("&P");
// Do something when isIllegal is true

then it's just a matter of using string.Replace()

1 Comment

OK, but we could have any combination of illegal variations, e.g. &P, &a, &sdfs. Each needs to be fixed to & a m p ; and the only thing that is probably constant in the illegal variations is the &.

I would avoid using str.Contains() as it does not return an array of indices, meaning you'll only know whether such an illegal string exists, not where it is located in the string.

I would actually just "brute-force" this:

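// C# sketch (needs: using System.Text;). Builds a new string instead of calling Replace().
String myString = myFile.ReadToEnd();
var result = new StringBuilder(myString.Length);
for (int i = 0; i < myString.Length; i++)
{
    if (myString[i] != '&') { result.Append(myString[i]); continue; }
    // Test: is this '&' already the start of "&amp;"?
    if (String.CompareOrdinal(myString, i, "&amp;", 0, 5) == 0)
        i += 4;  // legal: skip over "amp;"
    else
        // The test failed: drop the non-whitespace characters attached to the '&'.
        while (i + 1 < myString.Length && !Char.IsWhiteSpace(myString[i + 1])) i++;
    result.Append("&amp;");
}
myString = result.ToString();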
