I have an XML file, and I use a Python script to add a new node to it. I use the xml.dom.minidom module to process the file. After processing, the XML file looks like this:

<?xml version="1.0" ?><Project DefaultTargets="Build" ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
<PostBuildEvent>
  <Command>xcopy &quot;SourceLoc&quot; &quot;DestLoc&quot;</Command>
</PostBuildEvent>
<ImportGroup Label="ExtensionTargets">
</ImportGroup>
<Import Project="project.targets"/></Project>

What I actually need is shown below. The differences are a newline after the first line, a newline before the last line, and &quot; converted to a literal " character.

<?xml version="1.0" ?>
<Project DefaultTargets="Build" ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
<PostBuildEvent>
  <Command>xcopy "SourceLoc" "DestLoc"</Command>
</PostBuildEvent>
<ImportGroup Label="ExtensionTargets">
</ImportGroup>
<Import Project="project.targets"/>
</Project>

The Python code I use is given below:

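import xml.dom.minidom

xmltree = xml.dom.minidom.parse(xmlFile)
project = xmltree.documentElement  # the <Project> root element
# Create the new <Import> node and append it as the last child of <Project>
newImport = xmltree.createElement("Import")
newImport.setAttribute("Project", "project.targets")
project.appendChild(newImport)
with open(VcxProjFile, 'w') as out:
    xmltree.writexml(out)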

What should I change in my code to get the XML in the desired format?

Thanks,

1 Answer

From the minidom docs:

Node.toprettyxml([indent=""[, newl=""[, encoding=""]]])

Return a pretty-printed version of the document. indent specifies the indentation string and defaults to a tabulator; newl specifies the string emitted at the end of each line and defaults to \n.

That's all the customisation you get from minidom.
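For illustration, here is a minimal sketch of calling toprettyxml on a tiny stand-in document. The SAMPLE string and the pretty helper are hypothetical names, not part of the question's script. Note that on a file that already contains whitespace between elements, minidom may emit extra blank lines, because it indents the existing whitespace text nodes as well.

```python
import xml.dom.minidom

# Hypothetical stand-in for the project file in the question.
SAMPLE = '<Project><Import Project="project.targets"/></Project>'

def pretty(xml_text, indent="  ", newl="\n"):
    """Parse xml_text and return minidom's pretty-printed serialization."""
    doc = xml.dom.minidom.parseString(xml_text)
    return doc.toprettyxml(indent=indent, newl=newl)

print(pretty(SAMPLE))
```

With this input the declaration, the root element, and the closing root tag each end up on their own line.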

I tried inserting a Text node as a sibling of the root element to get the newline, without success. Hope dies last. I recommend post-processing the serialized output with regular expressions from the re module and inserting the newlines manually.
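A rough sketch of that regex post-processing, assuming the document is first serialized to a string (for example with toxml()) and that the root tag name is known. The function name add_root_newlines is made up for this example.

```python
import re

def add_root_newlines(xml_text, root_tag):
    """Put a newline after the XML declaration and before the closing root tag."""
    # Newline after the declaration, unless one is already there.
    xml_text = re.sub(r'(<\?xml[^>]*\?>)(?!\r?\n)', r'\1\n', xml_text, count=1)
    # Newline before the final closing root tag, unless one is already there.
    closing = '</%s>' % root_tag
    pattern = r'(?<!\n)%s\s*$' % re.escape(closing)
    return re.sub(pattern, '\n' + closing, xml_text)

serialized = ('<?xml version="1.0" ?><Project>\n'
              '<Import Project="project.targets"/></Project>')
print(add_root_newlines(serialized, 'Project'))
```

The result can then be written to the file with an ordinary write() call instead of writexml().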

As for replacing the entities, there's apparently an undocumented function for that in the Python 2 standard library:

import HTMLParser
h = HTMLParser.HTMLParser()
unicode_string = h.unescape(string_with_entities)
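On Python 3 the HTMLParser module is named html.parser, and the documented html.unescape function (available since Python 3.4) does the same job. A small sketch, applied to a piece of text content rather than to a whole XML document, since unescaping &amp;amp; or &amp;lt; across an entire file would make it ill-formed:

```python
import html

# Character data only, not a complete XML document.
text_with_entities = 'xcopy &quot;SourceLoc&quot; &quot;DestLoc&quot;'
plain_text = html.unescape(text_with_entities)
print(plain_text)
```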

Alternatively, you can do this manually, again using re, as all named entity names and their corresponding codepoints are in the htmlentitydefs module (html.entities in Python 3).
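The manual approach could look roughly like this on Python 3. It is only a sketch: the helper names are invented here, it touches only text between tags so that quoted attribute values stay valid, and it assumes the file has no CDATA sections, comments, or literal > characters inside attribute values.

```python
import re
from html.entities import name2codepoint

# Entities that must stay escaped for the XML to remain well-formed.
KEEP = {"amp", "lt", "gt"}

def _replace_entity(match):
    name = match.group(1)
    if name in KEEP or name not in name2codepoint:
        return match.group(0)  # leave it untouched
    return chr(name2codepoint[name])

def unescape_text_nodes(xml_text):
    """Replace named entities in character data between tags only."""
    def fix_run(match):
        return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", _replace_entity, match.group(0))
    # A run of characters after a '>' and before the next '<' is text content.
    return re.sub(r"(?<=>)[^<>]+(?=<)", fix_run, xml_text)

print(unescape_text_nodes(
    '<Command>xcopy &quot;SourceLoc&quot; &quot;DestLoc&quot;</Command>'))
```

Keep in mind that &quot; and " mean the same thing inside element text to any XML parser, so this step is purely cosmetic.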
