extract text between xml tags in python

Question

I have the XML string below and am trying to print the text between the domain, receive_time, serial and seqno tags for each entry tag.

xml="""
<response status="success" code="19"><result><msg><line>query job enqueued with jobid 19032</line></msg><job>19032</job></result></response>
19032
<response status="success"><result>
  <job>
    <tenq>14:10:09</tenq>
    <tdeq>14:10:09</tdeq>
    <tlast>19:00:00</tlast>
    <status>ACT</status>
    <id>19032</id>
    <cached-logs>64</cached-logs>
  </job>
  <log>
    <logs count="20" progress="29">
      <entry logid="2473601">
        <domain>1</domain>
        <receive_time>2017/11/26 14:10:08</receive_time>
        <serial>007901004140</serial>
        <seqno>10156449120</seqno>
      </entry>
      <entry logid="2473601">
        <domain>1</domain>
        <receive_time>2017/11/26 14:10:08</receive_time>
        <serial>007901004140</serial>
        <seqno>10156449120</seqno>
      </entry>
      </logs>
  </log>
</result></response>
"""

I am using xml.etree.ElementTree. To get what's between the domain tags I was trying node.attrib.get('domain') or node.get('domain'). Please advise.

import xml.etree.ElementTree as ET
tree = ET.fromstring(xml)
for node in tree.iter('entry'):
    print(node)

It can be another Python library too; it does not have to be xml.etree. I do not want to print the text between tags blindly; I need to print the tag name followed by the text, i.e.:

domain: 1
receive_time: 2017/11/26 14:10:08
serial: 007901004140
seqno: 10156449120

etc

Possible duplicate of How do I access text between tags with xml.etree.ElementTree
– sam-pyt, Commented Nov 26, 2017 at 19:30
Not really... the other one prints the text between tags blindly; I want to print the tag name before the text, i.e. domain: 1
– irom, Commented Nov 26, 2017 at 19:36
print node.find('domain').text should do it. By the way, the xml string in your example is not parsable. I had to remove some things before it would work. You might want to look into that if you are getting a ParseError.
– Vivek Kalyanarangan, Commented Nov 26, 2017 at 19:40

Vivek Kalyanarangan · Accepted Answer · 2017-11-26 20:43:40Z

11

You can find a single tag such as domain with the find() method. To print every field, iterate over each entry element instead; the tag attribute and the text attribute of each child give you the details you are looking for -

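import xml.etree.ElementTree as ET

# xml is the string from the question, trimmed so that it parses
tree = ET.fromstring(xml)
for node in tree.iter('entry'):
    print('\n')  # blank lines between entries
    # node.iter() yields the entry itself first, then its descendants
    for elem in node.iter():
        if elem is not node:
            print("{}: {}".format(elem.tag, elem.text))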

Hope this helps!

Output -

domain: 1
receive_time: 2017/11/26 14:10:08
serial: 007901004140
seqno: 10156449120


domain: 1
receive_time: 2017/11/26 14:10:08
serial: 007901004140
seqno: 10156449120
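
As noted in the comments, the string in the question has no single root element (two response documents with a stray job id between them), so ET.fromstring raises a ParseError on it as posted. A minimal sketch of one possible workaround follows, assuming it is acceptable to wrap the text in a made-up root element; the names raw and parse_entries are hypothetical and the sample is a shortened stand-in for the question's data, not the original string.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the question's string: two <response> documents
# with a stray job id between them, so there is no single root element.
raw = """
<response status="success"><result><job>19032</job></result></response>
19032
<response status="success"><result><log><logs count="1">
  <entry logid="2473601">
    <domain>1</domain>
    <receive_time>2017/11/26 14:10:08</receive_time>
    <serial>007901004140</serial>
    <seqno>10156449120</seqno>
  </entry>
</logs></log></result></response>
"""


def parse_entries(text):
    """Return one {tag: text} dict per <entry> element (hypothetical helper)."""
    # Assumption: wrapping in an invented <root> element is enough to make
    # the fragment well-formed; it does not repair other kinds of damage.
    root = ET.fromstring("<root>" + text + "</root>")
    # Iterating an element directly yields only its direct children.
    return [{child.tag: child.text for child in entry}
            for entry in root.iter("entry")]


for fields in parse_entries(raw):
    for tag, value in fields.items():
        print("{}: {}".format(tag, value))
```

Collecting each entry into a dict keeps the tag names next to their values, so you can also look up a single field, e.g. fields["serial"], instead of only printing.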

edited Nov 26, 2017 at 20:43

answered Nov 26, 2017 at 19:43


G.Vitelli · Answer · 2017-11-26 19:43:28Z

2

You can use SAX streams to get the inner text content of the XML elements. SAX parses the XML as a stream of events, so you do not have to read the whole document into memory the way a DOM-style parser does, which matters mostly for large inputs (see Python SAX).
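
A minimal sketch of what this could look like with the standard library's xml.sax module, assuming each entry contains only flat child elements as in the question. The EntryHandler class and the sample data are hypothetical illustrations, not code from the original answer.

```python
import xml.sax


class EntryHandler(xml.sax.ContentHandler):
    """Collects {tag: text} for the children of each <entry> element."""

    def __init__(self):
        super().__init__()
        self.entries = []    # finished entries, one dict each
        self.current = None  # dict for the entry being read, if any
        self.tag = None      # child tag currently open inside an entry
        self.chunks = []     # text pieces of the open child

    def startElement(self, name, attrs):
        if name == "entry":
            self.current = {}
        elif self.current is not None:
            self.tag = name
            self.chunks = []

    def characters(self, content):
        # SAX may deliver a single text node in several pieces.
        if self.tag is not None:
            self.chunks.append(content)

    def endElement(self, name):
        if name == "entry":
            self.entries.append(self.current)
            self.current = None
        elif self.current is not None and name == self.tag:
            self.current[name] = "".join(self.chunks)
            self.tag = None


# Hypothetical, well-formed sample in the shape of the question's data.
sample = b"""<logs>
  <entry logid="2473601">
    <domain>1</domain>
    <receive_time>2017/11/26 14:10:08</receive_time>
  </entry>
</logs>"""

handler = EntryHandler()
xml.sax.parseString(sample, handler)
for fields in handler.entries:
    for tag, value in fields.items():
        print("{}: {}".format(tag, value))
```

Note that parseString still needs the whole string in memory; the streaming benefit shows up when you pass a file name or file object to xml.sax.parse instead.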

answered Nov 26, 2017 at 19:43
