0

I want to match: <tag>alphabetic characters and space</tag>

I propose this one:

<.*>([A-Za-z]+)</.*>

Is this correct?

2
  • It is almost correct in the narrow sense that, once you add the space to the character group, it will match the exact string in your question. Whether it is correct in the more general, and perhaps more useful, sense depends entirely on where you're going with this. Commented Dec 6, 2012 at 13:33
  • 3
    stackoverflow.com/questions/1732348/… :) Commented Dec 6, 2012 at 13:35

3 Answers

8

Please, for the sake of whatever poor developer will have to deal with your code after you, please do not try to parse XML with regular expressions.

Use a SAX or DOM parser instead. There are plenty of good guides on the web if you search on Google, but here is a quick example using the standard javax.xml package...

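import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xmlFile);
Node node = doc.getElementsByTagName("tag").item(0); // first <tag> element, or null if none
String value = node.getTextContent(); // getNodeValue() returns null for element nodes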

1 Comment

I use SAX; this code is not used to parse XML documents, it's for Pentaho Kettle.
2

What if the input is: <tag> something <inner-tag> some other thing </inner-tag> </tag> ?

I'd suggest using an XML parser library, e.g. Apache Digester.
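As a rough sketch of how a parser copes with the nested input above, here is a version using the standard javax.xml DOM API rather than Apache Digester (that swap is mine, to keep the example dependency-free). The class name NestedTagDemo and the helper innerText are hypothetical names chosen for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class NestedTagDemo {
    // Hypothetical helper: returns the text content of the first element
    // named tagName, including the text of any nested elements.
    static String innerText(String xml, String tagName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Node node = doc.getElementsByTagName(tagName).item(0);
            if (node == null) {
                throw new IllegalArgumentException("No <" + tagName + "> element found");
            }
            return node.getTextContent();
        } catch (IllegalArgumentException e) {
            throw e;
        } catch (Exception e) {
            // Parser configuration, I/O, and malformed-XML errors end up here.
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<tag> something <inner-tag> some other thing </inner-tag> </tag>";
        System.out.println("[" + innerText(xml, "inner-tag").trim() + "]");
        System.out.println("[" + innerText(xml, "tag").trim() + "]");
    }
}
```

The parser tracks nesting for you: asking for inner-tag gives only the inner text, while asking for tag gives the outer text with the inner text included.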

-1

You should add the ? character to make the quantifiers lazy, so that they do not match more than necessary:

    <.*?>[A-Za-z ]*</.*?>
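To see what the ? changes, here is a small Java sketch. Java is assumed only because another answer in this thread uses javax.xml. The class name LazyTagRegexDemo, the helper firstText, and the capture group around the character class are illustrative additions, not part of the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LazyTagRegexDemo {
    // Lazy quantifiers: each .*? stops at the first '>' that lets the match succeed.
    static final Pattern LAZY = Pattern.compile("<.*?>([A-Za-z ]*)</.*?>");
    // Greedy quantifiers: each .* grabs as much as possible, then backtracks.
    static final Pattern GREEDY = Pattern.compile("<.*>([A-Za-z ]*)</.*>");

    // Hypothetical helper: returns the captured text of the first match, or null.
    static String firstText(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String twoTags = "<a>foo</a><b>bar</b>";
        System.out.println(firstText(LAZY, twoTags));
        System.out.println(firstText(GREEDY, twoTags));
    }
}
```

On the two-element input, the lazy pattern captures foo from the first element, while the greedy pattern swallows the first element inside its leading <.*> and captures bar. Note that neither pattern checks that the closing tag name matches the opening one, and neither handles nested elements.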
