2

I have an xml file in the following format:

<table name="ecat">

  <tuple>
    <atom name="TaxAccNo1">test</atom>
    <atom name="TaxAccNo2">00005504</atom>
    <atom name="TaxAccNo3">1</atom>
    <atom name="irn">100107</atom>
  </tuple>

  <tuple>
    <atom name="TaxAccNo1">test</atom>
    <atom name="TaxAccNo2">00005505</atom>
    <atom name="TaxAccNo3">2</atom>
    <atom name="irn">100108</atom>
  </tuple>

  <tuple>
    <atom name="TaxAccNo1">test</atom>
    <atom name="TaxAccNo2">00005506</atom>
    <atom name="TaxAccNo3">3</atom>
    <atom name="irn">100109</atom>
  </tuple>

</table>

From this, I want to be able to generate a pandas dataframe that would look like this:

    TaxAccNo1   TaxAccNo2   TaxAccNo3   irn 
    test        00005504    1           100107
    test        00005505    2           100108
    test        00005505    3           100109

How would I do this?

3
  • 1
    check this might help : stackoverflow.com/questions/28259301/… Commented Jan 31, 2019 at 5:53
  • 1
    I surprised there's no read_xml (the read_html isn't as feature rich as i remember) Commented Jan 31, 2019 at 6:10
  • @AndyHayden Exactly, we need it. :) Commented Jan 31, 2019 at 6:17

1 Answer 1

3

You can use python's xml and a list/dict comprehension:

In [11]: import xml

In [12]: e = xml.etree.ElementTree.parse(open("foo.xml"))

In [13]: pd.DataFrame([{a.get('name'): a.text for a in t} for t in e.findall("tuple")])
Out[13]:
  TaxAccNo1 TaxAccNo2 TaxAccNo3     irn
0      test  00005504         1  100107
1      test  00005505         2  100108
2      test  00005506         3  100109
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.