
I have two XML files, shown below, and I want to check the order of File B against File A (File B should follow File A's order). I have also written the program below, which does the job of maintaining the order; the only problem is that I cannot correctly write the output to another XML file. Before asking here I researched how to write edited XML files back to the source or to another file, but maybe I am missing something very minor.

File A

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<p1:sample1 xmlns:p1="http://www.example.org/eHorizon">
<p1:time nTimestamp="1">
   <p1:location hours = "1" path = '1'>       
      <p1:feature color="6" type="a">560</p1:feature>
   </p1:location>
</p1:time>
<p1:time nTimestamp="2">
   <p1:location hours = "1" path = '1'>
      <p1:feature color="2" type="a">564</p1:feature>         
   </p1:location>
</p1:time>
<p1:time nTimestamp="3">
   <p1:location hours = "1" path = '1'>       
      <p1:feature color="6" type="a">560</p1:feature>          
   </p1:location>
</p1:time>
</p1:sample1>

File B

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<p1:sample1 xmlns:p1="http://www.example.org/eHorizon">
<p1:time nTimestamp="1">
   <p1:location hours = "1" path = '1'>       
      <p1:feature color="6" type="a">560</p1:feature>     
   </p1:location>
</p1:time>
<p1:time nTimestamp="3">
   <p1:location hours = "1" path = '1'>
      <p1:feature color="6" type="a">560</p1:feature>     
   </p1:location>
</p1:time>
<p1:time nTimestamp="2">
   <p1:location hours = "1" path = '1'>       
      <p1:feature color="2" type="a">564</p1:feature>      
   </p1:location>
</p1:time>
</p1:sample1>

For your information, the only difference here is the order of the entire p1:time elements, which are identified by their nTimestamp attribute, together with their sub-elements such as location and feature. In File A the order is 1, 2, 3, ... and in File B it is 1, 3, 2, ... (I mean the entire p1:time element and everything inside it).

What I have

from lxml import etree



recovering_parser = etree.XMLParser(recover=True)

Reference = etree.parse("C:/Users/your_location/Desktop/sample1.xml", parser=recovering_parser)
Copy = etree.parse("C:/Users/your_location/Desktop/sample2.xml", parser=recovering_parser)


ReferenceTest = Reference.findall("{http://www.example.org/eHorizon}time") #find all time elements in sample1
CopyTest = Copy.findall("{http://www.example.org/eHorizon}time") #find all time elements in sample2

a=[] #list for storing sample1's Time elements
b=[] #list for storing sample2's Time elements
new_list=[] #for storing sorted data

for i,j in zip(ReferenceTest,CopyTest):

    a.append((i, i.attrib.get("nTimestamp"))) # store data in format [(<Element {http://www.example.org/eHorizon}time at 0x213d738>, '1')  
                                              # where 1,2 or 3 is ntimestamp attribute and corresponding parent 'time' element of that attribute
    b.append((j, j.attrib.get("nTimestamp"))) # same as above 

def sortTimestamps(a,b):   #use this function to sort elements in 'b' list in such a manner that they follow sequence of 'a' list 

    for i in a:
        for j in b:
            if i[1]==j[1]:
                s = a.index(i)
                t = b.index(j)
                b[t],b[s]=b[s],b[t]     



sortTimestamps(a, b)  # call sort function 

for i in b:
    new_list.append(i[0]) # store the sorted timestamps in new_list


CopyTest = new_list # assign new sorted list of time elements to old list

Copy.write("C:/Users/your_location/Desktop/output_data.xml") # write data to another file and check results 

The code above does the work of sorting File B according to the order of File A. But when I write the result to another file, it writes File B's data unchanged, that is, in exactly the order shown in File B above. After sorting, I expect File B's data to be reordered and written in the same order as File A.
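For comparison, the same target order can be computed without swapping list items in place: build a lookup from each nTimestamp in File A to its position and pass it to sorted() as the key. This is a minimal sketch, assuming every nTimestamp in File B also appears in File A (a missing one would raise KeyError). The file contents are trimmed inline stand-ins, and the code falls back to the standard library's xml.etree.ElementTree if lxml is not installed. The last line also reproduces the symptom described above: sorting a Python list leaves the parsed tree in File B's original order.

```python
try:
    from lxml import etree  # the library used in the question
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback; the calls below exist in both

NS = "{http://www.example.org/eHorizon}"

# Trimmed stand-ins for File A and File B: only the nTimestamp order matters here.
FILE_A = b"""<p1:sample1 xmlns:p1="http://www.example.org/eHorizon">
<p1:time nTimestamp="1"/><p1:time nTimestamp="2"/><p1:time nTimestamp="3"/>
</p1:sample1>"""
FILE_B = b"""<p1:sample1 xmlns:p1="http://www.example.org/eHorizon">
<p1:time nTimestamp="1"/><p1:time nTimestamp="3"/><p1:time nTimestamp="2"/>
</p1:sample1>"""

reference_root = etree.fromstring(FILE_A)
copy_root = etree.fromstring(FILE_B)

# Position of each nTimestamp in File A, e.g. {'1': 0, '2': 1, '3': 2}.
position = {t.get("nTimestamp"): i
            for i, t in enumerate(reference_root.findall(NS + "time"))}

# File B's time elements, ordered by where their nTimestamp appears in File A.
new_list = sorted(copy_root.findall(NS + "time"),
                  key=lambda t: position[t.get("nTimestamp")])

print([t.get("nTimestamp") for t in new_list])  # ['1', '2', '3']
# The parsed tree itself is untouched by sorting the list.
print([t.get("nTimestamp") for t in copy_root.findall(NS + "time")])  # ['1', '3', '2']
```

Using a dictionary lookup avoids the repeated index() calls and the mutation-while-iterating of the nested loop, and it makes the assumption (every timestamp in B exists in A) explicit.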

What I tried

Apart from the program above, I read more about writing files, but it got me nowhere. I checked the format of my XML, which I believe is fine. Finally, I also followed a tutorial just to see how it handles writing, but that approach did not work either. Maybe you can help me out.

Edit: I moved the code from the external link into the question. I had linked it earlier to avoid a long post.

  • You should include the code which is giving you issue (the code for saving) in the question itself, not link to an external resource. Commented Aug 24, 2015 at 9:31
  • @AnandSKumar Yeah, okay. I wanted to avoid a long post, so I did that. I will add the code here then. Commented Aug 24, 2015 at 9:32
  • Not the full code, just the relevant part, or a short simplified version of your code (minimal reproducible example). Commented Aug 24, 2015 at 9:33
  • @AnandSKumar Yes, I wrote a shorter version of the code; you can refresh the question now. I edited it, and the code is comparatively short. Commented Aug 24, 2015 at 9:35

1 Answer


This would not work. You are just assigning a new list to the old variable CopyTest; that does not change anything within the actual XML tree.
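The reason is that findall() returns an ordinary Python list holding references to the elements. Rebinding or reordering that list never touches the tree, while changing an element through one of those references does. A small sketch with trimmed inline data (it falls back to xml.etree.ElementTree when lxml is not installed):

```python
try:
    from lxml import etree  # the library used in the question
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback; the calls below exist in both

NS = "{http://www.example.org/eHorizon}"
root = etree.fromstring(b"""<p1:sample1 xmlns:p1="http://www.example.org/eHorizon">
<p1:time nTimestamp="1"/><p1:time nTimestamp="3"/><p1:time nTimestamp="2"/>
</p1:sample1>""")

times = root.findall(NS + "time")  # a plain Python list of references to the elements

# Rebinding the name to a reordered list only changes the list object.
times = list(reversed(times))
print([t.get("nTimestamp") for t in root])  # ['1', '3', '2']

# Changing an element through the list changes the tree, because it is the same object.
times[0].set("nTimestamp", "99")
print([t.get("nTimestamp") for t in root])  # ['1', '3', '99']
```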

The easiest way for you would be to create the XML again from the elements in new_list. Example:

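# new root with the same tag, keeping the p1 prefix
root = etree.Element('{http://www.example.org/eHorizon}sample1', nsmap={'p1': 'http://www.example.org/eHorizon'})
for elem in new_list:
    root.append(elem)  # in lxml, appending moves each element out of the Copy tree into the new root

# write data to another file (with an XML declaration, like the input files) and check results
etree.ElementTree(root).write("c.xml", encoding="UTF-8", xml_declaration=True)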

Put these lines in place of CopyTest = new_list and the Copy.write(...) line that follows it.


9 Comments

Thank you very much for solving my doubt. I fully understand what you mean: every time I modify content, I need to create a new XML. At first I thought that in lxml whatever I edit would be applied to the source directly. This works like a charm.
It is also possible to edit the Copy element directly, but that would make the code really messy. I would prefer creating a new XML, since that is also what you want. But what you were doing is not how you edit the contents of the XML.
You mean that, even though your solution writes the file correctly, my logic is not proper? Can you suggest some improvements? Also, is there any link that shows how to edit directly? I just want to know for my understanding.
No, I mean my logic is proper. I meant that what you were trying to do (as I indicated in my answer) would not work. Also, your sentence "Every time I modify content I need to create a new XML" is not completely true. You can also modify existing XML content (elements), but that would require deleting the original nodes, appending new nodes, and so on, which would make the code messier; creating a new XML is the better way here.
Oh OK. Typing that line twice would not work; it would still put only one \n as the text of the parent. If you want two newlines, put two \n.
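The comments above mention editing the existing tree directly instead of building a new root. A minimal sketch of that in-place route, assuming the same namespace and nTimestamp matching as the question: the p1:time elements are detached with remove() and re-attached in File A's order with extend(). The inline file contents and the temporary output path are hypothetical stand-ins, and the code falls back to xml.etree.ElementTree if lxml is not installed. The explicit remove() matters for that fallback: lxml moves an element when it is appended elsewhere, but ElementTree would keep a second reference to it in the root.

```python
import os
import tempfile

try:
    from lxml import etree  # the library used in the question
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback; the calls below exist in both

NS = "{http://www.example.org/eHorizon}"

# Trimmed stand-ins for File A and File B.
reference_root = etree.fromstring(b"""<p1:sample1 xmlns:p1="http://www.example.org/eHorizon">
<p1:time nTimestamp="1"/><p1:time nTimestamp="2"/><p1:time nTimestamp="3"/>
</p1:sample1>""")
copy_root = etree.fromstring(b"""<p1:sample1 xmlns:p1="http://www.example.org/eHorizon">
<p1:time nTimestamp="1"/><p1:time nTimestamp="3"/><p1:time nTimestamp="2"/>
</p1:sample1>""")

position = {t.get("nTimestamp"): i
            for i, t in enumerate(reference_root.findall(NS + "time"))}
times = copy_root.findall(NS + "time")

# Detach every time element from the existing root...
for t in times:
    copy_root.remove(t)
# ...and re-attach them in File A's order, keeping the original root element.
copy_root.extend(sorted(times, key=lambda t: position[t.get("nTimestamp")]))

with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "output_data.xml")  # hypothetical output location
    etree.ElementTree(copy_root).write(out_path, encoding="UTF-8", xml_declaration=True)
    # Read the file back to confirm the new order was actually written.
    written = etree.parse(out_path).getroot()

print([t.get("nTimestamp") for t in written.findall(NS + "time")])  # ['1', '2', '3']
```

Keeping the original root means its tag, attributes, and namespace declarations carry over without being rebuilt by hand, which is the trade-off against the new-root approach in the answer.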
