I have a Java string containing XML. I want to read through this string and wrap all the text nodes in CDATA, but I'm not sure how to do this. The reason is that there is a text node containing an angle bracket, which causes an exception when I try to parse the string. Can anyone help me out?

<node> this < is text <node> <node2> this is < text <node2>

I would like to know if there is an easy way of reading this text as a string with XMLReader and inserting CDATA around the text.

thanks

Stefan

  • how are you parsing the string? You tagged it as SAX, but can you provide your code? Commented Jan 4, 2013 at 13:24
  • possible duplicate of How to parse XML for ![CDATA[] Commented Jan 4, 2013 at 13:24
  • I am trying to insert a CDATA wrapper around every text node within the XML string (note: using XMLReader and SAXParser). I am not trying to get the character data out; rather, I'm trying to wrap CDATA around the text, and I'm looking for advice on how to do this in any way. Commented Jan 4, 2013 at 13:48
  • Does the user enter the whole XML? If so, your design is broken (or you need to require your users to enter VALID XML). Does the user enter something and you build XML from that? Then make sure you produce valid XML in the first place! Trying to fix the not-really-XML after the fact will be a major pain in the butt! Commented Jan 4, 2013 at 14:40
  • Then escape the necessary characters while building the XML. As I said: build correct XML and you'll be fine (i.e. you'll be able to use any XML parser to get your data back out). Commented Jan 4, 2013 at 16:24
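
The escaping suggested in the comments above could look roughly like the following. This is only a minimal sketch, assuming you control the code that assembles the XML; the class name XmlTextEscaper and the method escapeText are hypothetical and not from the question.

```java
public final class XmlTextEscaper {

    // Replaces the characters that are unsafe in XML text content
    // with their predefined entity references.
    public static String escapeText(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The element name "node" is taken from the question's sample.
        System.out.println("<node>" + escapeText(" this < is text ") + "</node>");
    }
}
```

Escaping at the point where the text is inserted means the result can be read back by any XML parser, so no CDATA wrapping is needed afterwards.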

2 Answers

Perhaps something like this (apologies in advance for any inefficiency):

// Assumes currentNode is an org.w3c.dom.Node
if (currentNode.getNodeType() == Node.TEXT_NODE)
{
     String toWrite = String.format("<![CDATA[%s]]>", currentNode.getNodeValue());
     // or whatever retrieves the text of the node
}

It looks like you need to massage the data into valid XML. The process for this is, of course, highly dependent on your input. Essentially, you receive a big string that you need to convert into valid XML. The advantage here is that you can define a schema that the third party adheres to; agreeing on it requires a conversation with them, so it is outside the scope of this discussion, but it is worth mentioning. Once you have this schema defined, you will know which nodes are considered "text" nodes and need to be wrapped in CDATA blocks.

The basic idea is this:

List<String> textTags = new ArrayList<String>();
textTags.add("NODE");
// other things to add
String bigAwfulString = inputFromThirdParty();
String validXML = "";
for (String currentNode : bigAwfulString.split("yourRegexHere"))
{
    if (textTags.contains(currentNode))
    {
        // currentNode is already a String, so it is used directly
        validXML += String.format("<![CDATA[%s]]>", currentNode);
        continue;
    }
    validXML += currentNode;
}
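
One caveat with the String.format approach: text that itself contains ]]> would end the CDATA section early. A common workaround is to split that sequence across two adjacent CDATA sections. Here is a minimal sketch of that idea; the class name CdataWrapper and the method wrapInCdata are hypothetical helpers, not part of any library.

```java
public final class CdataWrapper {

    // Wraps text in a CDATA section. Any "]]>" inside the text is split so
    // that "]]" closes one section and ">" opens the next one.
    public static String wrapInCdata(String text) {
        return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>";
    }

    public static void main(String[] args) {
        System.out.println(wrapInCdata(" this < is text "));
        // prints <![CDATA[ this < is text ]]>
        System.out.println(wrapInCdata("a]]>b"));
        // prints <![CDATA[a]]]]><![CDATA[>b]]>
    }
}
```

A parser reads the two adjacent sections back as the single string a]]>b, so the original text survives the round trip.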

5 Comments

That won't work. The data (as described in the question, but not in the example) is invalid XML, so it won't parse. This question is about fixing up the broken XML.
@Quentin ah yes, I see now. Perhaps the issue is that OP isn't properly writing XML in the first place?
Someone isn't. We can't tell if it is being written by the OP or if it is broken third party data.
@Quentin true, I will leave this post in the interim until OP changes his sample code.
Generating CDATA with String.format("%s") is broken if the string itself contains the terminator ]]> (two closing brackets followed by >). You have to encode it properly.

Try this, it worked for me.
http://www.java2s.com/Code/Java/XML/AddingaCDATASectiontoaDOMDocument.htm

import java.io.File;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.CDATASection;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class Main {
  public static void main(String[] argv) throws Exception {

    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setValidating(true);

    factory.setExpandEntityReferences(false);

    Document doc = factory.newDocumentBuilder().parse(new File("filename"));
    Element element = doc.getElementById("key1");

    // Add a CDATA section to the root element
    element = doc.getDocumentElement();
    CDATASection cdata = doc.createCDATASection("data");
    element.appendChild(cdata);

  }
}
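
The example above adds the CDATA section to a document parsed from a file but never writes the result out. If you are building the XML yourself, a rough sketch of the same DOM calls plus serialization could look like this. It assumes the standard JAXP identity Transformer; the class name CdataDomExample and the method buildXml are hypothetical.

```java
import java.io.StringWriter;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;
import org.w3c.dom.Element;

public final class CdataDomExample {

    // Builds <node> with the raw text stored in a CDATA section and
    // returns the serialized document as a String.
    public static String buildXml(String rawText) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element node = doc.createElement("node");
        doc.appendChild(node);
        node.appendChild(doc.createCDATASection(rawText));

        // An identity transform serializes the DOM tree to text.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(buildXml(" this < is text "));
    }
}
```

Because the DOM API decides how the text is written, the stray angle bracket from the question should no longer break a parser that reads the output back.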
