I have a file that contains plain text with some XML content dumped into it. It looks something like this:
The authentication details : <id>70016683</id><password>password@123</password>
The next step is to send the request.
The request : <request><id>90016133</id><password>password@3212</password></request>
Additional info includes <Address><line1>House no. 341</line1><line2>A B Street</line2><City>Sample city</City></Address>
I am using a Python program to parse this file. I would like to replace the XML part with a placeholder, xml_obj. The output should look something like this:
The authentication details : xml_obj
The next step is to send the request.
The request : xml_obj
Additional info includes xml_obj
At the same time, I would also like to extract the replaced XML text and store it in a list. The list should contain None for any line that doesn't have an XML object.
I have tried using regex for this purpose:
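import re

xml_tag = re.search(r"<\w*>", line)
if xml_tag:
    start_position = xml_tag.start()
    # Build the closing tag from the opening one, e.g. "<id>" -> "</id>"
    xml_word = xml_tag.group()[:1] + '/' + xml_tag.group()[1:]
    xml_pattern = re.escape(xml_word)
    # End position of the first matching closing tag
    stop_position = re.search(xml_pattern, line).end()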
But this code only finds the start and stop positions of a single XML tag and its content on the first line, and of the entire element on the last line of the input file. I would like to capture all of the XML content regardless of its structure, and replace it with 'xml_obj'.
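To make the goal concrete, here is a minimal sketch of one possible direction, not a definitive solution. It assumes every XML object is a run of adjacent elements of the form <tag>...</tag>, with no attributes and no nested element that reuses its parent's tag name. The names XML_RUN and split_xml are hypothetical, chosen only for this illustration.

```python
import re

# Assumption: an XML object is one or more adjacent <tag>...</tag> elements.
# The backreference \1 requires each closing tag to match its opening tag.
XML_RUN = re.compile(r"(?:<(\w+)>.*?</\1>)+")

def split_xml(line, placeholder="xml_obj"):
    """Return (line with the first XML run replaced, extracted XML or None)."""
    match = XML_RUN.search(line)
    if match is None:
        return line, None
    replaced = line[:match.start()] + placeholder + line[match.end():]
    return replaced, match.group(0)

lines = [
    "The authentication details : <id>70016683</id><password>password@123</password>",
    "The next step is to send the request.",
    "The request : <request><id>90016133</id><password>password@3212</password></request>",
]

cleaned, extracted = [], []
for line in lines:
    text, xml = split_xml(line)
    cleaned.append(text)
    extracted.append(xml)  # None when the line has no XML
```

With the sample lines above, cleaned holds the lines with xml_obj substituted and extracted holds the matching XML strings, with None for the second line. A regex like this is not a real XML parser, so input with attributes, self-closing tags, or same-name nesting would need a different approach.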
Any advice would be helpful. Thanks in advance.
Edit:
I also want to apply the same logic to files that look like this:
The authentication details : ID <id>70016683</id> Password <password>password@123</password> Authentication details complete
The next step is to send the request.
The request : <request><id>90016133</id><password>password@3212</password></request> Request successful
Additional info includes <Address><line1>House no. 341</line1><line2>A B Street</line2><City>Sample city</City></Address>
These files may have more than one XML object per line.
They may also have plain text after the XML part.
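For this second layout, a hedged sketch of what I am after could look like the following. It assumes the same simple <tag>...</tag> shape (no attributes, no same-name nesting), and it assumes that a line with several XML objects should yield a list of strings, which is my own choice of representation. ELEMENT_RUN and replace_all_xml are hypothetical names.

```python
import re

# Assumption: adjacent elements form one object; elements separated by
# plain text are treated as separate objects.
ELEMENT_RUN = re.compile(r"(?:<(\w+)>.*?</\1>)+")

def replace_all_xml(line, placeholder="xml_obj"):
    """Replace every XML run in the line and return (new line, list or None)."""
    found = [m.group(0) for m in ELEMENT_RUN.finditer(line)]
    return ELEMENT_RUN.sub(placeholder, line), (found or None)

line = ("The authentication details : ID <id>70016683</id> "
        "Password <password>password@123</password> "
        "Authentication details complete")
text, xml_parts = replace_all_xml(line)
print(text)
print(xml_parts)
```

Here text keeps the surrounding plain text, including anything after the last XML object, and xml_parts is None for a line without XML.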