I read a text file line by line with a Python script, which gives me a list of strings, one per line. I now need to parse each string into more manageable data (strings and integers).
The strings look similar to this:
- "the description (number)" (e.g. "door (0)")
- "the description (number|number|number)" (e.g. "window (1|22|4)")
- "the description (number|number|number|number)" (e.g. "toilet (2|6|5|10)")
For each line of the text file, I want a list of the split-out parts that I can process further, for instance:
- "window (1|22|4)" -> [ "window", "1", "22", "4" ]
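For reference, the target transformation can also be done without regular expressions. The sketch below is a plain-string alternative, not the regex approach asked about, and it assumes every line ends with a space, "(", one or more "|"-separated numbers and ")". The function name split_item is hypothetical.

```python
def split_item(line):
    """Split 'window (1|22|4)' into ['window', '1', '22', '4'].

    Uses only str methods (no regex). Assumes every line ends with
    ' (' + '|'-separated numbers + ')'.
    """
    # rpartition splits at the LAST " (", so a description that itself
    # contains " (" is kept intact.
    description, sep, rest = line.strip().rpartition(" (")
    if not sep or not rest.endswith(")"):
        raise ValueError(f"unexpected line format: {line!r}")
    # Drop the closing ")" and split the remaining numbers on "|".
    return [description] + rest[:-1].split("|")


print(split_item("window (1|22|4)"))  # ['window', '1', '22', '4']
```

The numbers stay strings here, matching the example output; wrap them in int() if integers are needed.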
I guess regular expressions are the best fit for this, and I have already come up with this:
(.+)\s+\((\d+)\), which correctly captures [ "door", "0" ] from "door (0)"
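In Python the single-number pattern can be checked like this. The key detail is that the opening parenthesis has to be escaped as \( to match a literal "("; an unescaped "(" starts a capturing group instead. The name SINGLE is only for illustration.

```python
import re

# \( and \) match literal parentheses; (.+) and (\d+) are the two groups.
SINGLE = re.compile(r"(.+)\s+\((\d+)\)")

m = SINGLE.fullmatch("door (0)")
print(list(m.groups()))  # ['door', '0']

# The same pattern does not match the extended form at all.
print(SINGLE.fullmatch("window (1|22|4)"))  # None
```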
However, some items have more data to parse:
(.+)\s\((\d+)+\|\), which captures only [ "window", "1" ] from "window (1|22|4)"
How can I repeat the match for the part (\d+)+\| (i.e. "1|") up to the closing parenthesis, for an arbitrary number of repetitions of this pattern? The last item to match would be an integer, which could be caught separately with (\d+)\).
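The sketch below shows why repeating a capturing group does not by itself return every number: with Python's standard re module, a quantified group matches all repetitions but only remembers the last one. A common workaround, shown second, is to capture the whole "1|22|4" part once and then pull out each number with re.findall. (The third-party regex package, not the standard re module, can keep every repetition via Match.captures(); that is outside the standard library and not used here.)

```python
import re

line = "window (1|22|4)"

# (?:(\d+)\|)+ matches "1|" and then "22|", but group 2 keeps only the
# last repetition, so "1" is lost; group 3 is the final number.
m = re.fullmatch(r"(.+)\s\((?:(\d+)\|)+(\d+)\)", line)
print(m.groups())  # ('window', '22', '4')

# Workaround: capture everything between the parentheses once, then let
# re.findall return every non-overlapping run of digits in it.
m = re.fullmatch(r"(.+)\s\(([\d|]+)\)", line)
description, numbers = m.group(1), re.findall(r"\d+", m.group(2))
print([description] + numbers)  # ['window', '1', '22', '4']
```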
Also, is there a way to match both the simple and the extended case with a single regular expression?
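One possible single pattern for both cases is sketched below, assuming the numbers are plain non-negative integers separated by "|". The idea is to require one number and then allow zero or more "|number" parts with (?:\|\d+)*, capture that whole run as one group, and split it on "|" afterwards. The names ITEM and parse_item are hypothetical.

```python
import re

# Description (lazy, so trailing spaces are not included), optional
# whitespace, then "(" + one number + zero or more "|number" + ")".
ITEM = re.compile(r"(.+?)\s*\((\d+(?:\|\d+)*)\)")


def parse_item(line):
    """Return [description, n1, n2, ...] as strings, or None if no match."""
    m = ITEM.fullmatch(line.strip())
    if m is None:
        return None
    return [m.group(1)] + m.group(2).split("|")


for line in ["door (0)", "window (1|22|4)", "toilet (2|6|5|10)"]:
    print(parse_item(line))
# ['door', '0']
# ['window', '1', '22', '4']
# ['toilet', '2', '6', '5', '10']
```

Because (?:\|\d+)* may match zero times, the same pattern covers "door (0)" and the longer forms; use int() on the number parts if integers are wanted.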
Thanks! And have a nice weekend, everybody!