2

I want to compare these two xml files:

File1.xml:

<ngs_sample id="40332">
  <workflow value="salmonella" version="101_provisional" />
  <results>
  <gastro_prelim_st reason="not novel" success="false">
      <type st="1364" />
      <type st="9999" />
  </gastro_prelim_st>
 </results>
</ngs_sample>

File2.xml:

<ngs_sample id="40332">
  <workflow value="salmonella" version="101_provisional" />
  <results>
  <gastro_prelim_st reason="not novel" success="false">
      <type st="1364" />
   </gastro_prelim_st>
 </results>
</ngs_sample>

I've used xmldiff to compare a.xml with b.xml:

from xmldiff import main, formatting

def compare_xmls(observed, expected):
    # DiffFormatter renders the edit script as plain text
    formatter = formatting.DiffFormatter()
    diff = main.diff_files(observed, expected, formatter=formatter)
    return diff

# File names must be passed as strings
out = compare_xmls("a.xml", "b.xml")
print(out)

OUTPUT:

[delete, /ngs_sample/results/gastro_prelim_st/type[2]]

Does anyone know how to identify the actual difference between the two XML files, i.e. which element has been deleted compared to b.xml? Can anyone recommend another way of comparing XML files in Python?

4
  • For comparing differences in general I use WinMerge, so if you don't need to do it in python, it's a pretty handy tool. But if you must, it seems the output already tells you the difference exactly? (That the second type tag under ngs_sample/...prelim_st/ was deleted). Did you mean you wanted to see the values being deleted? Commented Nov 22, 2018 at 14:33
  • Yes I want to see what has been deleted, i.e. what is the difference between the two xmls. Commented Nov 22, 2018 at 15:45
  • What exactly are you expecting from the output that's missing then? It's already telling you that second type tag has been deleted. As it stands it's not clear, would be helpful if you stated your expected output instead. Commented Nov 22, 2018 at 15:52
  • It would be helpful if the output said that <type st="9999" /> was deleted. Commented Nov 22, 2018 at 16:25

3 Answers

6

Use xmldiff to perform this exact task.

main.py

from xmldiff import main
diff = main.diff_files("file1.xml", "file2.xml")
print(diff)

output

[DeleteNode(node='/ngs_sample/results/gastro_prelim_st/type[2]')]

1 Comment

Not sure if you read the question, but this doesn't answer my query.
4

You can switch to the XMLFormatter and manually filter out the results:

...
# Change formatter:
formatter = formatting.XMLFormatter(normalize=formatting.WS_BOTH)

...

# after `out` has been retrieved:
import re
for i in out.splitlines():
  if re.search(r'\bdiff:\w+', i):
    print(i)

# Result:
#       <type st="9999" diff:delete=""/>
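If you would rather keep the plain diff output, the node path it reports can be looked up in the first file to see what was deleted. The following is only a sketch using the standard library: `resolve_path` is a hypothetical helper (not part of xmldiff), the XML is inlined from File1.xml, and it assumes the path still refers to the original tree, which is only safe for simple edit scripts like the single delete here.

```python
import xml.etree.ElementTree as ET

# Contents of File1.xml from the question, inlined to keep the sketch self-contained
FILE1 = """<ngs_sample id="40332">
  <workflow value="salmonella" version="101_provisional" />
  <results>
  <gastro_prelim_st reason="not novel" success="false">
      <type st="1364" />
      <type st="9999" />
  </gastro_prelim_st>
 </results>
</ngs_sample>"""


def resolve_path(root, path):
    """Hypothetical helper: return the element an absolute path points to, or None."""
    steps = path.strip("/").split("/")
    # ElementTree paths are relative to the root element, so check that the
    # first step names the root and then drop it.
    if steps[0].split("[")[0] != root.tag:
        return None
    if len(steps) == 1:
        return root
    return root.find("./" + "/".join(steps[1:]))


root = ET.fromstring(FILE1)
# Path copied from the diff output in the question
deleted = resolve_path(root, "/ngs_sample/results/gastro_prelim_st/type[2]")
# tostring() includes trailing whitespace (the element's tail), so strip it
deleted_xml = ET.tostring(deleted, encoding="unicode").strip()
print(deleted_xml)
```

When xmldiff returns action objects instead of text, the same helper could be given the `node` field of a delete action, as shown in the `DeleteNode(node=...)` output above.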

0

Another option is to use xml2 (https://github.com/clone/xml2) together with something like bash process substitution:

$ diff --color <(xml2 < File1.xml) <(xml2 < File2.xml)

7,8d6
< /ngs_sample/results/gastro_prelim_st/type
< /ngs_sample/results/gastro_prelim_st/type/@st=9999
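
If you want to stay in Python, the same idea can be approximated with the standard library: flatten each document into one line per element and attribute, then diff the two lists. This is a rough sketch, not a reimplementation of xml2; `flatten` is a hypothetical helper, its output format only imitates xml2's, and the two documents are inlined from the question.

```python
import difflib
import xml.etree.ElementTree as ET

FILE1 = """<ngs_sample id="40332">
  <workflow value="salmonella" version="101_provisional" />
  <results>
  <gastro_prelim_st reason="not novel" success="false">
      <type st="1364" />
      <type st="9999" />
  </gastro_prelim_st>
 </results>
</ngs_sample>"""

FILE2 = """<ngs_sample id="40332">
  <workflow value="salmonella" version="101_provisional" />
  <results>
  <gastro_prelim_st reason="not novel" success="false">
      <type st="1364" />
   </gastro_prelim_st>
 </results>
</ngs_sample>"""


def flatten(elem, prefix=""):
    """Hypothetical helper: one line per element, attribute and text node."""
    path = f"{prefix}/{elem.tag}"
    lines = [path]
    for name, value in elem.attrib.items():
        lines.append(f"{path}/@{name}={value}")
    text = (elem.text or "").strip()
    if text:
        lines.append(f"{path}={text}")
    for child in elem:
        lines.extend(flatten(child, path))
    return lines


flat1 = flatten(ET.fromstring(FILE1))
flat2 = flatten(ET.fromstring(FILE2))

# Keep only lines that are present in one document but not the other
changes = [
    line
    for line in difflib.ndiff(flat1, flat2)
    if line.startswith(("- ", "+ "))
]
for line in changes:
    print(line)
```

Lines starting with `- ` exist only in the first document and lines starting with `+ ` only in the second, so for the two files above this reports the second `type` element and its `st` attribute as removed.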
