I'm trying to parse and replace values in a large XML file, ~45 MB each. The way I do this is:
```java
private void replaceData(File xmlFile, File out)
        throws ParserConfigurationException, SAXException, IOException, TransformerException {
    DocumentBuilderFactory df = DocumentBuilderFactory.newInstance();
    DocumentBuilder db = df.newDocumentBuilder();
    Document xmlDoc = db.parse(xmlFile);
    xmlDoc.getDocumentElement().normalize();

    Node allData = xmlDoc.getElementsByTagName("Data").item(0);
    Element ctrlData = getSubElement(allData, "ctrlData");
    NodeList subData = ctrlData.getElementsByTagName("SubData");
    int len = subData.getLength();
    for (int logIndex = 0; logIndex < len; logIndex++) {
        Node log = subData.item(logIndex);
        Element info = getSubElement(log, "info");
        Element value = getSubElement(info, "dailyInfo");
        Node valueNode = value.getElementsByTagName("value").item(0);
        valueNode.setTextContent("blah");
    }

    TransformerFactory tf = TransformerFactory.newInstance();
    Transformer t = tf.newTransformer();
    t.transform(new DOMSource(xmlDoc), new StreamResult(out));
}

private static Element getSubElement(Node node, String elementName) {
    return (Element) ((Element) node).getElementsByTagName(elementName).item(0);
}
```
I notice that the further along the for loop I get, the longer each iteration takes. For an average of 100k nodes it takes over 2 hours, while if I just break the file out into smaller chunks of 1k by hand, each chunk takes ~10 s. Is there something inefficient in the way this document is being parsed?
----EDIT----
Based on comments and answers to this, I switched over to using SAX and XMLStreamWriter. Reference/example here: http://www.mkyong.com/java/how-to-read-xml-file-in-java-sax-parser/
After moving to SAX, memory usage for the replaceData function no longer grows with the size of the XML file, and XML file processing time went down to ~18 seconds on average.
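For anyone curious what the streaming version looks like, here is a minimal sketch of the SAX-in / XMLStreamWriter-out approach I ended up with. It is simplified: for brevity it replaces the text of every `<value>` element rather than tracking the full `Data/ctrlData/SubData/info/dailyInfo` path, and `StreamingReplace` is just an illustrative class name; the key point is that events are copied to the writer as they arrive, so nothing close to the whole document is ever held in memory.

```java
import java.io.File;
import java.io.FileOutputStream;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class StreamingReplace {
    public static void replaceData(File xmlFile, File out) throws Exception {
        XMLStreamWriter w = XMLOutputFactory.newFactory()
                .createXMLStreamWriter(new FileOutputStream(out), "UTF-8");
        w.writeStartDocument("UTF-8", "1.0");
        SAXParserFactory.newInstance().newSAXParser().parse(xmlFile, new DefaultHandler() {
            private boolean inValue; // true while inside a <value> element

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts)
                    throws SAXException {
                try {
                    w.writeStartElement(qName);
                    for (int i = 0; i < atts.getLength(); i++) {
                        w.writeAttribute(atts.getQName(i), atts.getValue(i));
                    }
                    inValue = "value".equals(qName);
                    if (inValue) {
                        w.writeCharacters("blah"); // emit the replacement text
                    }
                } catch (XMLStreamException e) {
                    throw new SAXException(e);
                }
            }

            @Override
            public void characters(char[] ch, int start, int len) throws SAXException {
                if (inValue) {
                    return; // drop the original text of <value>
                }
                try {
                    w.writeCharacters(ch, start, len); // copy all other text through
                } catch (XMLStreamException e) {
                    throw new SAXException(e);
                }
            }

            @Override
            public void endElement(String uri, String local, String qName) throws SAXException {
                inValue = false;
                try {
                    w.writeEndElement();
                } catch (XMLStreamException e) {
                    throw new SAXException(e);
                }
            }
        });
        w.writeEndDocument();
        w.close();
    }
}
```

Because each SAX event is forwarded to the writer immediately, the cost per node is constant, which is why the quadratic-looking slowdown from the DOM version disappears.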