Parse XML in python and add it to database

Question

I'm trying to write a Python script that will go through Rows and put them into my database

This is a structure of my xml:

Root>
    -<SvcNf>
        -<PersonNf>
            -<PersonList>
                -<Row>
                    <SysName>MI6</SysName>
                     <ServerDt>2016-10-28 03:00:12 +03:00</ServerDt>
                     <UID>9457A55E17341AA7ASDEDS057A8BFFF3</UID>
                     <PersID>007</PersID>
                     <Emp_name>James Bond</Emp_name>
                     <EventID>25</EventID>
                     <EventTXT>Drinking alcohol</EventTXT>
                     <CauseEventID>03</CauseEventID>
                     <CauseEventTXT>Martini with vodka</CauseEventTXT>
                     <EventBegda>2017-10-18</EventBegda>
                     <EventEndda>2017-10-18</EventEndda>
                     <AccrualsSum>171.0</AccrualsSum>
                     <AccrualsProz>0.0</AccrualsProz>
                     <AccrualsName>Chinees_</AccrualsName>
                     <OrderNum>P-336</OrderNum>
                     <Perg>0</Persg>
                     <Perk>15</Persk>
                     <Awart/>
                 </Row>
                 -<Row>
                     .....
                 </Row>
                <Row/>
            </PersonList>
        </PersonNf>
    </SvcNf>
</Root>

So, when i use this code to Parse XML:

ResultSet_Py_List = []

root = ET.parse(events)
nodes = root.findall('.//Row')

for node in nodes:
    for child in node:
        ResultSet_Py_List.append(child.text)

The nominal Row value, which I want, is

['MI6', '2016-10-28 03:00:12 +03:00', '9457A55E17341AA7ASDEDS057A8BFFF3', etc]

[Row2]

[Row3]

Now, its like: [Row1,Row2,Row3]

[EDIT] All characters is like &#1060.
To heal it:

tostring(doc.getroot(), 'unicode')

[EDIT]

I used 2nd example but now i have Oracle error: ORA-01704: string literal too long.

@Максим Федоров, so after every Row ends its 'None' that separates rows right ? — Chetan_Vasudevan
– Chetan_Vasudevan, Commented Oct 30, 2017 at 17:43
@МаксимФедоров, then which can be a factor/key of seperation between the rows ? — Chetan_Vasudevan
– Chetan_Vasudevan, Commented Oct 30, 2017 at 17:55
<SysName>MI6</SysName> - This a seperation between Rows, always = constant — chipie
– chipie, Commented Oct 30, 2017 at 18:04

chipie · Accepted Answer · 2017-11-10 11:39:43Z

Maybe someone it will be helpfull. Its begginer answer :) I work with unicode(russian symbols) and I didnt i know why xml didnt normal parse with Oracle function. When I try do ALTERNATIVELY WITH NESTED LIST COMPREHENSION my kernel and python are died.

So, I do CSV from XML and with ALTERNATIVELY WITH NESTED LIST COMPREHENSION insert it into my database

import pandas as pd
import xml.etree.ElementTree as ET
import cx_Oracle
import csv


doc = ET.parse(events)
nodes = doc.findall('.//Row')
#RANGE(num) - COLUMN COUNTS
(SysName, ServerDt, UIDS, EmplID, ....T) = (list() for i in range(18))

for node in nodes:
    for elem in node.findall("*"):
        try:
            if elem.tag == "SysName":
                SysName.append(elem.text)            
            if elem.tag == "UID":
                UIDS.append(elem.text)     
            if elem.tag == "ServerDt":
                ServerDt.append(elem.text)
            if elem.tag == "PersID":
                PersID.append(elem.text)
            ...
            except AttributeError:
                print(elem.tag)

    #ER nonetype' object has no attribute 'text' python & 
    #ER arrays must all be same length
    s1 = pd.Series(SysName)
    s2 = pd.Series(ServerDt)
    s3 = pd.Series(UIDS)
    .....
    #TEMPORY CSV. DONT KNOW WHY, BUT COLUMNS ARE MIXED WITH THIS ITERATION
    df= pd.DataFrame({"SysName": s1,
                    "ServerDt": s2,
                    "UIDS": s3,
                    "PersID": s4,
                    "Emp_name": s5,
                    "EVENTID": s6,
                    ...})
    file_name = 'events.csv'
    df.to_csv(file_name, sep='\t')

    print("File name: ", my_file.name, "created")

    ResultSet_Py_List = []   

    ora_conn = cx_Oracle.connect('login/pass@TNSNAME')
    ora_cursor = ora_conn.cursor()

    my_file = open(file_name, 'r', newline='')
    #PANDA ER: 'utf8' codec can't decode byte 0xe9 in position 10: invalid continuation byte        
    reader = csv.reader(my_file,  dialect='excel', delimiter='\t' )

    for index, row in enumerate(reader):
    #Without header CSV
            if index > 0:
                    try:
            ResultSet_Py_List.append(row)
                    except AttributeError:
            pass

    print(str(len(ResultSet_Py_List)) + ' Records from Source')

    sql_del = """delete from HR.EVENTS_TST"""

    ora_cursor.execute(sql_del)
    ora_cursor.execute("commit")

    print("Table is clean") 
    #COLUMNS ARE MIXED IN CSV ITERATION
    sql_insert = """
            INSERT INTO HR.EVENTS_TST (ROW_NUM
                    ,ACCRUALSNAME
                    , ACCRUALSPROZ
                    , ACCRUALSSUM
                    , AWART
                    , CAUSEEVENTID
                    ...)
                        VALUES         (:1,:2,:3,:4,:5,:6,:7,:8,:9,:10,:11,:12,:13,:14
                        ,:15,:16,:17,:18, :19)
                        """  

    ora_cursor.prepare(sql_insert)

    ora_cursor.executemany(None, ResultSet_Py_List)
    ora_conn.commit()
    ora_cursor.execute("commit")
    print("Data imported")

Parfait · Accepted Answer · 2017-10-31 15:37:07Z

0

Consider using Oracle's XML handlers and avoid any nested looping:

SELECT  e.SysName, e.ServerDt, e."UID", e.PersID, e.Emp_name, e.EventID, e.EventTXT,
        e.CauseEventID, e.CauseEventTXT, e.EventBegda, e.EventEndda, 
        e.AccrualsSum, e.AccrualsProz, e.OrderNum, e.Perg, e.Perk, e.Awart

FROM  XMLTABLE('/Root/SvcNf/PersonNf/PersonList/Row' 
      PASSING XMLTYPE('<Root>
                         <SvcNf>
                            <PersonNf>
                                <PersonList>
                                    <Row>
                                        <SysName>MI6</SysName>
                                         <ServerDt>2016-10-28 03:00:12 +03:00</ServerDt>
                                         <UID>9457A55E17341AA7ASDEDS057A8BFFF3</UID>
                                         <PersID>007</PersID>
                                         <Emp_name>James Bond</Emp_name>
                                         <EventID>25</EventID>
                                         <EventTXT>Drinking alcohol</EventTXT>
                                         <CauseEventID>03</CauseEventID>
                                         <CauseEventTXT>Martini with vodka</CauseEventTXT>
                                         <EventBegda>2017-10-18</EventBegda>
                                         <EventEndda>2017-10-18</EventEndda>
                                         <AccrualsSum>171.0</AccrualsSum>
                                         <AccrualsProz>0.0</AccrualsProz>
                                         <AccrualsName>Chinees_</AccrualsName>
                                         <OrderNum>P-336</OrderNum>
                                         <Perg>0</Perg>
                                         <Perk>15</Perk>
                                         <Awart/>
                                     </Row>
                                </PersonList>
                            </PersonNf>
                         </SvcNf>
                      </Root>')
          COLUMNS  
              SysName   VARCHAR2(25) PATH 'SysName',  
              ServerDt  VARCHAR2(25) PATH 'ServerDt',  
              "UID"     VARCHAR2(25)       PATH 'UID',  
              PersID    VARCHAR2(25)  PATH 'PersID',
              Emp_name  VARCHAR2(25)  PATH 'Emp_name',
              EventID   NUMBER       PATH 'EventID',
              EventTXT  VARCHAR2(25)  PATH 'EventTXT',
              CauseEventID    VARCHAR2(25) PATH 'CauseEventID',
              CauseEventTXT   VARCHAR2(25) PATH 'CauseEventTXT',               
              EventBegda      VARCHAR2(25) PATH 'EventBegda',
              EventEndda      VARCHAR2(25) PATH 'EventEndda',
              AccrualsSum     VARCHAR2(25) PATH 'AccrualsSum',
              AccrualsProz    VARCHAR2(25) PATH 'AccrualsProz',
              OrderNum  VARCHAR2(25) PATH 'OrderNum',
              Perg      NUMBER PATH 'Perg',
              Perk      NUMBER PATH 'Perk',
              Awart     VARCHAR2(25) PATH 'Awart') AS e;

Rextester demo

To implement in a Python database cursor such as with cx_Oracle for an append query:

sql = """INSERT INTO mytable (Col1, Col2, Col3, ...)
         SELECT ...same as above...        
         FROM  XMLTABLE('/Root/SvcNf/PersonNf/PersonList/Row' 
                   PASSING XMLTYPE(:i_param)
                   COLUMNS  
                      ...same as above...) AS e
"""

# PARSE XML FILE
doc = ET.parse(events)
xmlstr = ET.tostring(doc.getroot()).decode('utf-8')

# PASS XML STRING AS PARAMETER
cur.execute(sql, {'i_param':xmlstr})
dbconn.commit()

For very large XML content that exceeds Oracle's 4,000 bytes in SQL, use PL/SQL with varchar2 limit of 32,767 bytes as shown here by @NickS:

sql = """
DECLARE
    xml_value varchar2(32767);
BEGIN
    xml_value := :i_param;

    INSERT INTO mytable (Col1, Col2, Col3, ...)
             SELECT ...same as above...        
             FROM  XMLTABLE('/Root/SvcNf/PersonNf/PersonList/Row' 
                       PASSING XMLTYPE(xml_value)
                       COLUMNS  
                          ...same as above...) AS e
    commit;
END;
"""

# PARSE XML FILE
doc = ET.parse(events)
xmlstr = ET.tostring(doc.getroot()).decode('utf-8')

# PASS XML STRING AS PARAMETER
cur.execute(sql, {'i_param':xmlstr})
dbconn.commit()

edited Oct 31, 2017 at 15:37

answered Oct 30, 2017 at 19:17

Parfait

108k19 gold badges102 silver badges138 bronze badges

11 Comments

chipie Over a year ago

I used 2nd example but now i have Oracle error: ORA-01704: string literal too long.

chipie Over a year ago

thanks for answer, child is large or the /row part? doesnt understand i need to correct with declare my COLUMNS part or this PASSING XMLTYPE('{}') or just do like Emp_name VARCHAR2(32767) PATH 'Emp_name '

Parfait Over a year ago

Entire XML content is large. How large is your file in MB?

chipie Over a year ago

its small, now its only 50 rows about 36.3 KB

Parfait Over a year ago

Did you try my updated answer here? Both now use parameterization. Also, can you run the first SELECT in Oracle (outside Python) with actual XML? Does it error out there as well?

|

Parfait · Accepted Answer · 2017-10-31 21:15:40Z

0

To use original setup, simply add an inner list and then append to larger, ResultSet_Py_List, where each <Row> is saved in a nested list for database insert.

import xml.etree.ElementTree as et

doc = et.parse('Source.xml')
nodes = doc.findall('.//Row')

ResultSet_Py_List = []

for node in nodes:
    inner = []
    for child in node:
        inner.append(child.text)
    ResultSet_Py_List.append(inner)

# ALTERNATIVELY WITH NESTED LIST COMPREHENSION
ResultSet_Py_List = [[child.text for child in node] for node in nodes]

Then run cx_Oracle's executemany where parameter placeholders are equal length to list items:

cursor.prepare("INSERT INTO myTable (Col1, Col2, Col3, ...) VALUES (:1, :2, :3, ...)")
cursor.executemany(None, ResultSet_Py_List)
db.commit()

answered Oct 31, 2017 at 21:15

Parfait

108k19 gold badges102 silver badges138 bronze badges

1 Comment

chipie Over a year ago

When i do

ora_cursor.prepare("INSERT INTO HR.EVENTS_TST (SYSNAME, PORTDATE, UID, TabNum, FIO ....) VALUES (:1,:2,:3,:4,:5,:6,:7,:8,:9,:10,:11,:12,:13,:14,:15,:16,:17,:18)") ora_cursor.executemany(None, ResultSet_Py_List)

- the kernel and python are died. Have any ideas why?

Collectives™ on Stack Overflow

Parse XML in python and add it to database

3 Answers 3

Comments

11 Comments

1 Comment

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

Comments

11 Comments

1 Comment

Your Answer

Sign up or log in

Post as a guest

Linked

Related