2

I need to parse the numeric values from a string that is not well formatted. Example:

"0    0    .1        .05       .05       0.        0.         .01"

or

"0,0,.1,.05,.05,0.,0.,.01"

As you can see, the delimiter varies from runs of several spaces to bare commas with no spaces. Also, the numbers may be ints or floats. I would like to split on any number of consecutive spaces, tabs, and commas. I thought I could do this with str.split(), but it accepts only a single separator argument, and with no argument it splits on whitespace only, not commas.

Does anyone know a clever way to do this? Possibly with regular expressions?

Thanks in advance.

4 Answers

3

I would like to split on any number of consecutive spaces, tabs, and commas.

You could use re.split() to split by a regular expression.

>>> import re
>>> s = '0    0    .1        .05       .05       0.        0.         .01'
>>> re.split(r'[\s,]+', s)
['0', '0', '.1', '.05', '.05', '0.', '0.', '.01']

Note: \s matches any whitespace character, including newlines, so the pattern above splits on runs of whitespace and commas. If you want to split strictly on spaces, tabs, and commas, change the regular expression to [ \t,]+
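
One edge case worth checking: when the string begins or ends with a delimiter, re.split() returns empty strings at the edges. A minimal sketch of one way to drop them, assuming empty tokens are never meaningful in this data (split_fields is a made-up helper name, not part of the re module):

```python
import re

def split_fields(line):
    """Split on runs of whitespace/commas and drop empty tokens."""
    # Leading or trailing delimiters make re.split() emit '' at the edges,
    # so filter those out.
    return [tok for tok in re.split(r'[\s,]+', line) if tok]

print(re.split(r'[\s,]+', ' 0, .1 ,'))  # includes '' at both ends
print(split_fields(' 0, .1 ,'))
```

Calling strip() on the string first would only help with surrounding whitespace, not with a leading or trailing comma, which is why the sketch filters instead.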

3

Regular expressions would work, but you could also just replace every comma with a space and then call split() with no arguments, which splits on any run of whitespace:

s.replace(',', ' ').split()

Demo:

>>> s = "0    0    .1        .05       .05       0.        0.         .01"
>>> s.replace(',', ' ').split()
['0', '0', '.1', '.05', '.05', '0.', '0.', '.01']

>>> s = "0,0,.1,.05,.05,0.,0.,.01"
>>> s.replace(',', ' ').split()
['0', '0', '.1', '.05', '.05', '0.', '0.', '.01']
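
Since the question asks for numeric values, the tokens can be handed to float(), which accepts forms such as '.1' and '0.'. A small sketch along those lines (parse_numbers is a hypothetical helper name used only for illustration):

```python
def parse_numbers(line):
    """Return the numbers in a space/tab/comma-delimited line as floats."""
    # split() with no argument splits on any run of whitespace, tabs included.
    return [float(tok) for tok in line.replace(',', ' ').split()]

print(parse_numbers("0    0    .1        .05       .05       0.        0.         .01"))
print(parse_numbers("0,0,.1,\t.05"))
```

float() raises ValueError on a token it cannot parse, so malformed input fails loudly rather than being skipped.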

1 Comment

Thanks for thinking outside the regular expressions box (+1).

2

You can use re.split with this pattern:

[ ,]+

It splits on runs of spaces and commas. Use [ \t,]+ if tabs should count as delimiters too.

import re
y="0,0,.1,.05,.05,0.,0.,.01"
print(re.split(r"[ ,]+", y))

Or

You can simply use re.findall. Here the delimiter can be anything, because the pattern matches the numbers themselves.

import re
y="0,0,.1,.05,.05,0.,0.,.01"
# Match digits with an optional fraction ("0", "0.", "0.05") or a bare fraction (".1")
print(re.findall(r"\d+\.?\d*|\.\d+", y))
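
One consequence of the findall approach is that anything the pattern does not match is silently skipped. The sketch below illustrates this with a made-up input row; the pattern is an assumption that covers only unsigned decimals like those in the question (no signs or exponents):

```python
import re

# Assumed number grammar: digits with an optional fraction, or a bare
# fraction such as ".1". Signs and exponents are not handled.
NUMBER = re.compile(r"\d+\.?\d*|\.\d+")

row = "0; 0 | .1 abc .05"  # hypothetical messy input
tokens = NUMBER.findall(row)
print(tokens)
print([float(t) for t in tokens])
```

Whether skipping stray text like "abc" is desirable depends on the data. Splitting and then calling float() would raise an error on that token instead.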

1 Comment

Thanks for the additional note on "findall".

0

You can split with the following regex: [,\s]+

Example:

import re

pattern = r'[,\s]+'

row = "0    0    .1        .05       .05       0.        0.         .01"
re.split(pattern, row)
# > ['0', '0', '.1', '.05', '.05', '0.', '0.', '.01']

row = "0,0,.1,.05,.05,0.,0.,.01"
re.split(pattern, row)
# > ['0', '0', '.1', '.05', '.05', '0.', '0.', '.01']
