Regular expression for multiple occurrences in Python

Question

I need to parse lines that contain multiple language codes, like the one below:

008800002     Bruxelles-Nord$Br�ussel Nord$<deu>$Brussel Noord$<nld>

008800002 is the ID,
Bruxelles-Nord$Br�ussel Nord$ is name one,
deu is language one,
$Brussel Noord$ is name two,
nld is language two.

So the idea is that a name and its language can appear any number of times, and I need to collect them all. The language code inside <> is always 3 characters long, and every name ends with a $ sign.

I tried this, but it does not give the expected output.

x = re.compile('(?P<stop_id>\d{9})\s(?P<authority>[[\x00-\x7F]{3}|\s{3}])\s(?P<stop_name>.*)
    (?P<lang_code>(?:[<]\S{0,4}))',flags=re.UNICODE)

I have no idea how to get repeated elements. It takes

Bruxelles-Nord$Br�ussel Nord$<deu>$Brussel Noord$ as stop_name and <nld> as language.

You might want to fix encoding issues first. It's Brüssel, not Br�ussel.
– georg, Commented Oct 1, 2014 at 9:48

Amadan · Accepted Answer · 2014-10-01 09:37:01Z

3

Do it in two steps. First separate ID from name/language pairs; then use re.finditer on the name/language section to iterate over the pairs and stuff them into a dict.

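import re

line = u"008800002     Bruxelles-Nord$Br�ussel Nord$<deu>$Brussel Noord$<nld>"
# Step 1: split the numeric ID from the name/language section.
m = re.search(r"(\d+)\s+(.*)", line, re.UNICODE)
stop_id = m.group(1)
# Step 2: walk the name/language pairs and key each name by its language code.
names = {}
for pair in re.finditer(r"(.*?)<(.*?)>", m.group(2), re.UNICODE):
    names[pair.group(2)] = pair.group(1)
print(stop_id, names)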
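
The two-step approach could be taken a little further. The following is only a sketch, assuming Python 3, and assuming that each name section may hold several `$`-terminated names that you want as separate list items; the function name `parse_stop` and the sample line (with the encoding corrected) are illustrative, not part of the question. It also uses the stated fact that the language code is exactly three characters long.

```python
import re

def parse_stop(line):
    """Return (stop_id, {lang: [names]}) or None if the line has no leading ID."""
    # Step 1: split the leading numeric ID from the rest of the line.
    head = re.match(r"(\d+)\s+(.*)", line)
    if head is None:
        return None
    stop_id, rest = head.groups()

    names_by_lang = {}
    # Step 2: each chunk is "name$...name$<xxx>" with a three-character code.
    for chunk in re.finditer(r"([^<]*)<(.{3})>", rest):
        # Names end with "$", so splitting on it leaves empty strings to drop.
        names = [name for name in chunk.group(1).split("$") if name]
        names_by_lang[chunk.group(2)] = names
    return stop_id, names_by_lang

sample = "008800002     Bruxelles-Nord$Brüssel Nord$<deu>$Brussel Noord$<nld>"
print(parse_stop(sample))
```

Using `[^<]*` instead of `.*?` keeps a name section from running past the next `<`, and `.{3}` rejects codes of any other length. If the same language code can appear twice on one line, a dict would overwrite the earlier entry, so a list of pairs may be a better fit.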

edited Oct 1, 2014 at 9:37

answered Oct 1, 2014 at 9:31


vks · Accepted Answer · 2014-10-01 09:30:55Z

2

\b(\d+)\b\s*|(.*?)(?=<)<(.*?)>

Try this. Just grab the captures. See the demo:

http://regex101.com/r/hS3dT7/4
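
In Python, the captures from this single pattern can be collected with `re.finditer`. This is a rough sketch assuming Python 3; the helper name `parse_line` and the sample line (with the encoding corrected) are illustrative. Each match fills either group 1 (the ID) or groups 2 and 3 (a name section and its language code), so the loop checks which alternative matched.

```python
import re

# Pattern from the answer: either the leading numeric ID,
# or a name section followed by <lang>.
PATTERN = re.compile(r"\b(\d+)\b\s*|(.*?)(?=<)<(.*?)>")

def parse_line(line):
    """Return (stop_id, [(name_section, lang), ...]) for one input line."""
    stop_id = None
    pairs = []
    for match in PATTERN.finditer(line):
        if match.group(1) is not None:
            # First alternative matched: the numeric ID.
            stop_id = match.group(1)
        else:
            # Second alternative matched: name text and its language code.
            pairs.append((match.group(2), match.group(3)))
    return stop_id, pairs

sample = "008800002     Bruxelles-Nord$Brüssel Nord$<deu>$Brussel Noord$<nld>"
print(parse_line(sample))
```

One caveat: because the ID alternative is tried first at every position, a name section that begins with a standalone number would be captured as an ID rather than as part of the name.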

answered Oct 1, 2014 at 9:30
