I'm trying to parse the following two strings in Python.
Here's the first string:
s1="< one > < two > < three > here's one attribute < six : 10.3 > < seven : 8.5 > < eight : 90.1 > < nine : 8.7 >"
I need a regular expression that splits the string above into a list like the one below, where each line shows the index followed by the element stored at that index:
0 one
1 two
2 three
3 here's one attribute
4 six : 10.3
5 seven : 8.5
6 eight : 90.1
7 nine : 8.7
Here's the second string:
s2="<one><two><three> an.attribute ::"
Similarly, I need the items stored in a list like this:
0 one
1 two
2 three
3 an.attribute
Here's what I've tried so far. The regular expression comes from an answer to another question I posted on Stack Overflow.
import re
from pprint import pprint

# Non-greedy match of whatever sits between "< " and " >"
res = re.findall(r'< (.*?) >', s1)
pprint(res)
for item in res:
    print(item)
But that skips "here's one attribute", because that text is not enclosed in angle brackets.
Output:
['one', 'two', 'three', 'six : 10.3', 'seven : 8.5', 'eight : 90.1', 'nine : 8.7']
one
two
three
six : 10.3
seven : 8.5
eight : 90.1
nine : 8.7
Could anyone help me out? =)
If anyone also knows how to extract the numerical values (10.3, 8.5, 90.1 and 8.7) from the first string, that would be great too =)
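For reference, here is a minimal sketch of one possible approach, not a definitive solution. It uses a single pattern with two alternatives: the contents of a `<...>` pair, or a run of text outside any brackets. The names `TOKEN`, `tokenize` and `numbers` are made up for this illustration. Dropping the trailing `::` from the second string is an assumption based on the expected output above, and the number pattern assumes plain unsigned decimals.

```python
import re

s1 = "< one > < two > < three > here's one attribute < six : 10.3 > < seven : 8.5 > < eight : 90.1 > < nine : 8.7 >"
s2 = "<one><two><three> an.attribute ::"

# Hypothetical helper pattern: either the inside of a <...> pair,
# or a run of characters that contains no angle brackets at all.
TOKEN = re.compile(r'<([^<>]*)>|([^<>]+)')

def tokenize(s):
    """Return bracketed and unbracketed pieces of s, in order."""
    tokens = []
    for inside, outside in TOKEN.findall(s):
        # Exactly one of the two groups is non-empty for each match.
        piece = (inside or outside).strip()
        # Assumption: trailing colons (as in "an.attribute ::") are not data.
        piece = piece.rstrip(':').rstrip()
        if piece:
            tokens.append(piece)
    return tokens

def numbers(s):
    """Return every unsigned decimal number found in s as a float."""
    return [float(n) for n in re.findall(r'\d+(?:\.\d+)?', s)]

print(tokenize(s1))
print(tokenize(s2))
print(numbers(s1))
```

With two capture groups, `re.findall` returns a tuple per match, which is why the loop unpacks `inside` and `outside`. Whitespace-only gaps between brackets are matched too, but they are discarded once stripped.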
EDIT: Duncan, I tried your code, but I'm not getting the output I expected. I assume I've made a mistake somewhere. Could you tell me what it is?
from __future__ import generators
from pprint import pprint

s2 = "<one><two><three> an.attribute ::"
s1 = "< one > < two > < three > here's one attribute < six : 10.3 > < seven : 8.5 > < eight : 90.1 > < nine : 8.7 >"

def parse(s):
    for t in s.split('<'):
        for u in t.strip().split('>', 1):
            if u.strip():
                yield u.strip()

list(parse(s1))
list(parse(s2))
pprint(s1)
pprint(s2)
Here's the output I'm getting:
"< one > < two > < three > here's one attribute < six : 10.3 > < seven : 8.5 > < eight : 90.1 > < nine : 8.7 >"
'<one><two><three> an.attribute ::'
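A likely explanation, sketched here rather than stated as the definitive fix: `list(parse(s1))` builds a new list and leaves `s1` untouched, so when the result is not stored anywhere, `pprint(s1)` just prints the original string. The variable names `tokens1` and `tokens2` below are made up for this illustration, and `parse` is copied from the code above.

```python
from pprint import pprint

s1 = "< one > < two > < three > here's one attribute < six : 10.3 > < seven : 8.5 > < eight : 90.1 > < nine : 8.7 >"
s2 = "<one><two><three> an.attribute ::"

def parse(s):
    # Split on '<', then split each chunk once on '>' so that text
    # following a closing bracket becomes its own item.
    for t in s.split('<'):
        for u in t.strip().split('>', 1):
            if u.strip():
                yield u.strip()

# Keep the lists that parse() produces instead of discarding them.
tokens1 = list(parse(s1))
tokens2 = list(parse(s2))

pprint(tokens1)
pprint(tokens2)
```

Note that with this `parse`, the last item for the second string still carries the trailing `::`, so it would need a separate cleanup step to match the list wanted above.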