I need to parse lines that contain multiple language codes, such as the one below:
008800002 Bruxelles-Nord$Brüssel Nord$<deu>$Brussel Noord$<nld>
Here 008800002 is the id, Bruxelles-Nord$Brüssel Nord$ is name one, deu is language one, Brussel Noord$ is name two, and nld is language two.
So the idea is that a name and its language can appear any number of times, and I need to collect them all.
The language code inside <> is always exactly 3 characters long,
and every name ends with a $ sign.
I tried the following pattern, but it does not give the expected output:
x = re.compile(r'(?P<stop_id>\d{9})\s(?P<authority>[[\x00-\x7F]{3}|\s{3}])\s(?P<stop_name>.*)(?P<lang_code>(?:[<]\S{0,4}))', flags=re.UNICODE)
I have no idea how to capture the repeated elements. The pattern takes
Bruxelles-Nord$Brüssel Nord$<deu>$Brussel Noord$ as stop_name and <nld> as the language.
Note: the second name is Brüssel Nord (with ü).
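
For reference, here is a minimal sketch of the result I am after. A single regex match keeps only the last capture of a repeated group, so this sketch matches the id once and then runs a second pattern over the rest of the line with finditer. The names HEAD, PAIR and parse_stop are made up for illustration. The sketch assumes the line looks exactly like the sample above, so the authority field from my pattern is left out.

```python
import re

# Assumed layout: a 9-digit id, whitespace, then repeated "name$<lang>" groups.
HEAD = re.compile(r"(?P<stop_id>\d{9})\s+(?P<rest>.*)")
# One name/language pair: text up to "$<xxx>", where xxx is exactly 3 characters.
# The optional trailing "$" consumes the separator before the next name.
PAIR = re.compile(r"(?P<name>[^<>]+)\$<(?P<lang>[^<>]{3})>\$?")


def parse_stop(line):
    """Return (stop_id, [(name, lang), ...]), or None if the line has no id."""
    head = HEAD.match(line.strip())
    if head is None:
        return None
    pairs = [
        (m.group("name"), m.group("lang"))
        for m in PAIR.finditer(head.group("rest"))
    ]
    return head.group("stop_id"), pairs


sample = "008800002 Bruxelles-Nord$Brüssel Nord$<deu>$Brussel Noord$<nld>"
print(parse_stop(sample))
# ('008800002', [('Bruxelles-Nord$Brüssel Nord', 'deu'), ('Brussel Noord', 'nld')])
```

Each name keeps any inner $ separators, matching the description above where Bruxelles-Nord$Brüssel Nord counts as name one; calling name.split("$") would break it into the individual spellings if that is preferred.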