5

I'm trying to save a tree (a class that extends JTree) holding an XML document back to a DOM object after changing its structure.

I have created a new document object, traversed the tree to retrieve the contents successfully (including the original encoding of the XML document), and now have a ByteArrayInputStream which has the tree contents (XML document) with the correct encoding.

The problem is that when I parse the ByteArrayInputStream, the encoding in the XML document is automatically changed to UTF-8.

Is there a way to prevent this and keep the encoding provided in the ByteArrayInputStream?

It's also worth adding that I have already used transformer.setOutputProperty(OutputKeys.ENCODING, encoding) to get the right encoding in the serialized output.

Any help would be appreciated.

1
  • Can you share a bit of your code? Commented Aug 26, 2010 at 18:59

4 Answers

5

Here's an updated answer, since OutputFormat is deprecated:

TransformerFactory tf = TransformerFactory.newInstance();
Transformer transformer = tf.newTransformer();
transformer.setOutputProperty(OutputKeys.ENCODING, "ISO-8859-1");

StringWriter writer = new StringWriter();
transformer.transform(new DOMSource(document), new StreamResult(writer));
String output = writer.getBuffer().toString().replaceAll("\n|\r", "");

The second part (from the StringWriter onward) returns the XML document as a String, with line breaks stripped.
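
One caveat: a StringWriter holds characters rather than bytes, so in the snippet above the ENCODING property mainly determines what the XML declaration says. If you need the actual bytes in the target encoding, a minimal sketch is to transform into a ByteArrayOutputStream instead. The class and method names here (EncodedXmlWriter, toBytes) and the sample document are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper, not part of any library.
public class EncodedXmlWriter {

    // Serializes the document to bytes, letting the transformer do the
    // character-to-byte conversion in the requested encoding.
    public static byte[] toBytes(Document document, String encoding) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, encoding);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        transformer.transform(new DOMSource(document), new StreamResult(out));
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        // Build a tiny sample document containing a non-ASCII character.
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element root = document.createElement("name");
        root.setTextContent("caf\u00e9");
        document.appendChild(root);

        byte[] bytes = toBytes(document, "ISO-8859-1");
        // Decode with the same charset just to print the result.
        System.out.println(new String(bytes, Charset.forName("ISO-8859-1")));
    }
}
```

The resulting byte array can be wrapped in a ByteArrayInputStream and parsed again; the parser then reads the encoding from the XML declaration.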

3
// Read XML
String xml = "xml"; // placeholder: put the actual XML text here
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(new InputSource(new StringReader(xml)));

// Append formatting
OutputFormat format = new OutputFormat(document);

if (document.getXmlEncoding() != null) {
  format.setEncoding(document.getXmlEncoding());
}

format.setLineWidth(100);
format.setIndenting(true);
format.setIndent(5);
Writer out = new StringWriter();
XMLSerializer serializer = new XMLSerializer(out, format);
serializer.serialize(document);
String result = out.toString();

2 Comments

Some explanation of this code would be useful for those who come and read this answer later.
It would also be useful to mention which XMLSerializer you use. I assume org.apache.xml.serialize.XMLSerializer? Using the "internal" ones from the sun package would be really bad practice.
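
For readers who want to avoid the XMLSerializer question altogether, here is a rough sketch of an alternative that stays within the standard DOM Level 3 Load and Save API (org.w3c.dom.ls). This is a different technique from the OutputFormat code above, and it assumes the DOM implementation in use supports the "LS" feature. The class and method names (LsEncodingSketch, serialize) are invented for this example.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;

// Hypothetical helper, not part of any library.
public class LsEncodingSketch {

    // Serializes the document to bytes using the encoding set on the LSOutput.
    public static byte[] serialize(Document document, String encoding) {
        DOMImplementationLS ls = (DOMImplementationLS)
                document.getImplementation().getFeature("LS", "3.0");
        if (ls == null) {
            // Assumption: the implementation supports Load and Save.
            throw new IllegalStateException("DOM Load and Save is not supported");
        }

        LSSerializer serializer = ls.createLSSerializer();
        LSOutput output = ls.createLSOutput();
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        output.setByteStream(bytes);
        output.setEncoding(encoding); // encoding used for the bytes and the declaration

        serializer.write(document, output);
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        // Build a tiny sample document containing a non-ASCII character.
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element root = document.createElement("name");
        root.setTextContent("caf\u00e9");
        document.appendChild(root);

        byte[] result = serialize(document, "ISO-8859-1");
        System.out.println(new String(result, Charset.forName("ISO-8859-1")));
    }
}
```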
2

I solved it after a lot of trial and error.

I was using

OutputFormat format = new OutputFormat(document);

but changed it to

OutputFormat format = new OutputFormat(document, encoding, true);

and this solved my problem.

Here encoding is the encoding I want the output to use, and true turns indenting on.

Note to self: read more carefully. I had looked at the javadoc hours earlier; if only I had read it properly then.

1

This worked for me and is very simple. No need for a transformer or output formatter:

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
InputSource is = new InputSource(inputStream);
is.setEncoding("ISO-8859-1"); // set your encoding here
Document document = builder.parse(is);
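
To make the effect concrete, here is a small self-contained sketch of the same idea. It assumes the incoming bytes are ISO-8859-1 and carry no XML declaration, which is the case where the parser would otherwise try UTF-8 and fail on the accented character. ForcedEncodingParse and its parse method are names invented for this example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of any library.
public class ForcedEncodingParse {

    // Parses raw XML bytes, telling the parser which encoding to decode them with.
    public static Document parse(byte[] xmlBytes, String encoding) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        InputSource source = new InputSource(new ByteArrayInputStream(xmlBytes));
        source.setEncoding(encoding); // overrides the parser's own encoding detection
        return builder.parse(source);
    }

    public static void main(String[] args) throws Exception {
        // Sample input: ISO-8859-1 bytes with no XML declaration.
        byte[] xmlBytes = "<name>caf\u00e9</name>".getBytes(StandardCharsets.ISO_8859_1);
        Document document = parse(xmlBytes, "ISO-8859-1");
        System.out.println(document.getDocumentElement().getTextContent());
    }
}
```

Note that this only controls how the input bytes are decoded. When the document is written out again, the output encoding still has to be set on whatever serializer is used.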
