
I apologize if this question has been asked before. The previous posts on this topic were helpful, but I'm still having trouble figuring out the solution to my problem. I'm still very new to programming and regex, so I'm sorry if this comes off as a dumb question.

I need a regex pattern that will match the value between three specific XML tags and replace that value with nothing.

This is what I currently have:

string pattern = @"<A99_01>(.*?)</A99_01>";
string input = "<A99_01>TEST</A99_01><A99_02>TEST</A99_02><A99_03>TEST</A99_03><A99_04>TEST</A99_04>";
string replacement = "";
Regex rgx = new Regex(pattern);
string result = rgx.Replace(input, replacement);

My current regex pattern matches only one of the required tags, and I can't figure out how to select just its value without pulling in the entire line. Is it possible to list multiple patterns?

I want to perform the replace only on the tags <A99_01>, <A99_02>, and <A99_03>, without touching <A99_04> or any later tags.

Thanks in advance for any help!

  • I may have missed something, but why don't you use the System.Xml namespace (like XmlDocument)? Commented Jun 12, 2015 at 16:38
  • You should never use regex to parse XML or HTML. An XML parser is a much better option. Commented Jun 12, 2015 at 16:45
  • Currently the XML has two root nodes, or I'd recommend using XmlDocument as well. As it stands, you wouldn't be able to load it into an XmlDocument. Commented Jun 12, 2015 at 16:46
  • Hi Simon, I'm currently using System.Text.RegularExpressions. Is it easier to use System.Xml for this problem? I will do some research on that namespace. Thanks! Commented Jun 12, 2015 at 16:50
  • Always use XML APIs to process XML. Never use regular expressions to process XML. Use LINQ to XML instead. Commented Jun 12, 2015 at 16:54

2 Answers

You could use a capturing group.

Regex.Replace(str, @"<(A99_0[123]>).*?</\1", "<$1</$1");
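To make the one-liner above easier to try, here is a minimal sketch that wraps it in a method. The class and method names (TagValueStripper, StripValues) are hypothetical and not from the original answer; the pattern and replacement are the ones shown above. The character class [123] limits the match to the three tags, the group captures the tag name together with its closing ">", and \1 requires the closing tag to carry the same name.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class TagValueStripper
{
    // Group 1 captures "A99_01>", "A99_02>" or "A99_03>" (name plus '>').
    // ".*?" lazily matches the value; "</\1" matches the closing tag with the same name.
    private const string Pattern = @"<(A99_0[123]>).*?</\1";

    // "<$1</$1" rebuilds the opening and closing tags with nothing in between.
    private const string Replacement = "<$1</$1";

    public static string StripValues(string input)
    {
        return Regex.Replace(input, Pattern, Replacement);
    }
}
```

With the input from the question, StripValues should return "<A99_01></A99_01><A99_02></A99_02><A99_03></A99_03><A99_04>TEST</A99_04>". Note that "." does not match newlines by default, so a value spanning several lines would need RegexOptions.Singleline.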


2 Comments

Thanks Avinash! What does the "str" do at the end of the pattern?
What if I want to replace the value between the tags with a numeric date, for example 1900-01-01? Is there a way to write the replacement as "<$11900-01-01</$1" without 1900 being read as part of the group reference? I know I can separate it with a space or quotes, but I only want the date as the value. Thanks!
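On the follow-up about a value that starts with a digit: .NET replacement strings also accept the braced form ${1}, which ends the group reference explicitly so the digits that follow are treated as literal text. The sketch below assumes the same pattern as the answer; the names DateFiller and FillDate are hypothetical.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class DateFiller
{
    // Same pattern as the answer: group 1 is the tag name plus '>'.
    private const string Pattern = @"<(A99_0[123]>).*?</\1";

    public static string FillDate(string input, string date)
    {
        // "${1}" is an explicit reference to group 1, so a date such as
        // "1900-01-01" is not read as part of the group number.
        string replacement = "<${1}" + date + "</$1";
        return Regex.Replace(input, Pattern, replacement);
    }
}
```

With the question's input, FillDate(input, "1900-01-01") should give "<A99_01>1900-01-01</A99_01>" for the first three tags and leave <A99_04> unchanged. An alternative that avoids the issue altogether is to keep the ">" outside the group, as in @"<(A99_0[123])>.*?</\1>" with the replacement "<$1>1900-01-01</$1>", since "$1" is then always followed by ">". If the inserted value could itself contain a "$", it would need to be escaped as "$$".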

There's absolutely no way you should be using regular expressions for this. While it could "work", it's going to be incredibly inflexible, hard to change, and a pain to maintain.

I suggest:

  • Load the XML via XmlDocument or XDocument (preferably)
  • Use LINQ to XML to parse the XML
  • Filter out any of the tags you don't want
  • Construct a new XML document/file based upon the filtered version
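The steps above can be sketched with LINQ to XML roughly as follows. This is only an illustration under a few assumptions: the input is a fragment with several top-level elements (as in the question), so it is wrapped in a temporary element to make it parseable; the wrapper name "root" and the names TagCleaner and ClearTags are hypothetical; and the values are blanked in place rather than building a second document.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, for illustration only.
public static class TagCleaner
{
    // The tags whose values should be cleared (taken from the question).
    private static readonly HashSet<string> TagsToClear =
        new HashSet<string> { "A99_01", "A99_02", "A99_03" };

    public static string ClearTags(string fragment)
    {
        // Wrap the fragment so it has a single root element and can be parsed.
        XElement root = XElement.Parse("<root>" + fragment + "</root>");

        // Snapshot the matching elements first, then blank their values.
        List<XElement> targets = root.Elements()
            .Where(e => TagsToClear.Contains(e.Name.LocalName))
            .ToList();
        foreach (XElement element in targets)
        {
            element.Value = string.Empty;
        }

        // Serialize the children only, dropping the temporary wrapper.
        return string.Concat(
            root.Elements().Select(e => e.ToString(SaveOptions.DisableFormatting)));
    }
}
```

Unlike the regex approach, this fails loudly (XElement.Parse throws an XmlException) if the input is not well-formed, and it is not confused by attributes, nested elements, or line breaks inside a value.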
