Let's review your four initial patterns and cover their syntax, then we can consider a few expressions that match the strings you're looking for (i.e. values of the form 00.0).
Reviewing Patterns
re.findall('\s(.*),', string)
This pattern reads: match a single whitespace character (\s), then capture 0 or more repetitions of any character except a newline ((.*)), followed by a comma (,).
This pattern will capture far more than one value, because repetition qualifiers (+, * and ?) are greedy: each one keeps consuming characters for as long as the preceding expression still matches. Since '.' matches every character except a newline, '.*' runs to the end of the line and then backs up only as far as the last comma. On a single-line string, re.findall therefore returns one result: everything between the first whitespace character and the last comma.
re.findall(' (.*),', string)
Same problem as the previous pattern; the only difference is that a literal space takes the place of \s.
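Here's a quick sketch of the greedy behaviour described above. The string is a shortened copy of the sample used later in this answer, and the lazy qualifier '.*?' is something I'm adding for comparison; it isn't one of your original patterns.

```python
import re

# Shortened version of the sample string, to keep the output readable.
s = ' London:Jan 48.0,Feb 38.9,Mar 39.9'

# Greedy: .* runs to the end of the line, then backs up to the LAST comma.
greedy = re.findall(r'\s(.*),', s)
print(greedy)  # ['London:Jan 48.0,Feb 38.9']

# Lazy: .*? stops at the FIRST comma it can reach.
lazy = re.findall(r'\s(.*?),', s)
print(lazy)  # ['London:Jan 48.0', '38.9']
```

The lazy version is closer, but it still drags 'London:Jan' along with the first value and misses the last value, which has no trailing comma. That's why the patterns further down describe the number itself rather than the text around it.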
re.findall('\s++.+,', string)
Before Python 3.11, re does not accept a repetition qualifier applied directly to another repetition qualifier, so '\s++' fails with a "multiple repeat" error. From Python 3.11 on, '++' is read as a possessive qualifier, so '\s++' compiles but still just means one or more whitespace characters. If you want a literal plus sign, the '+' has to be preceded by a '\' like this: '\++'. That expression reads: match one or more '+' characters. Either way, the expression part '.+' matches one or more repetitions of any character that isn't a newline and falls prey to the same greedy problem.
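If you want to check how your own interpreter treats this pattern, here is a small sketch. The outcome of the first part depends on your Python version, so I'm not showing an expected result for it; the sample text 'a++b+c' is made up for illustration.

```python
import re
import sys

# Compiling '\s++.+,' either succeeds or raises re.error, depending on the
# Python version in use.
try:
    re.compile(r'\s++.+,')
    status = 'compiled'
except re.error as exc:
    status = f'rejected: {exc}'
print(sys.version_info[:2], status)

# An escaped plus is a literal character, so '\++' means "one or more '+'".
plus_runs = re.findall(r'\++', 'a++b+c')
print(plus_runs)  # ['++', '+']
```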
re.findall('\s{2}.{1},', string)
Curly braces are repetition qualifiers that allow a range of repetitions to be specified. They follow the syntax '{m,n}', where m is the minimum number of repetitions and n is the maximum. Write it without a space after the comma: Python's re treats '{m, n}' as literal text rather than a qualifier. For example, a pattern AB{3,4} will not match ABB but it will match ABBB or ABBBB.
The pattern above looks to match: 2 repetitions of any white space character (‘\s{2}’) followed by any one character that is not a newline (‘.{1}’) followed by a comma.
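A short sketch of the curly-brace behaviour, using re.fullmatch so that the whole test string has to match. The last line runs your fourth pattern against a shortened copy of the sample string used later in this answer.

```python
import re

# {3,4} with no space: between 3 and 4 repetitions of B.
range_checks = [bool(re.fullmatch(r'AB{3,4}', t)) for t in ('ABB', 'ABBB', 'ABBBB')]
print(range_checks)  # [False, True, True]

# With a space inside the braces they stop being a qualifier and match literally.
spaced_on_abbb = bool(re.fullmatch(r'AB{3, 4}', 'ABBB'))
spaced_on_literal = bool(re.fullmatch(r'AB{3, 4}', 'AB{3, 4}'))
print(spaced_on_abbb, spaced_on_literal)  # False True

# '\s{2}.{1},' needs two whitespace characters in a row, which this string lacks.
s = ' London:Jan 48.0,Feb 38.9,Mar 39.9'
double_space = re.findall(r'\s{2}.{1},', s)
print(double_space)  # []
```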
Here are a couple of different patterns to try out - I'll touch on the syntax as well.
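import re
# Raw string, so the backslash reaches the regex engine unchanged.
p = r'[0-9][0-9]\.[0-9]'
s = ' London:Jan 48.0,Feb 38.9,Mar 39.9,Apr 42.2,May 47.3,Jun 52.1,Jul 59.5,Aug 57.2,Sep 55.4,Oct 62.0,Nov 59.0,Dec 52.9'
# Only collect the matches if the pattern occurs at least once.
if re.search(p, s):
    m = re.findall(p, s)
    print(m)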
Note: unless you know for certain that every input string contains the pattern you're looking for, it's helpful to test the string before using the result. One way to do that is an if clause that checks re.search(p, s), where p is a variable holding the pattern and s is a variable holding the string. (re.findall simply returns an empty list when nothing matches, so the check matters most when you go on to use a match object, since re.search returns None on failure.)
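To make the "what if there is no match" point concrete, here is a minimal sketch. The helper name extract_readings and the precompiled PATTERN are my own inventions for illustration, and the sample string is shortened.

```python
import re

# Precompiled copy of the pattern from above (the name PATTERN is an assumption).
PATTERN = re.compile(r'[0-9][0-9]\.[0-9]')

def extract_readings(text):
    """Hypothetical helper: return every NN.N value found in text as a float."""
    return [float(value) for value in PATTERN.findall(text)]

found = extract_readings(' London:Jan 48.0,Feb 38.9,Mar 39.9')
print(found)  # [48.0, 38.9, 39.9]

# No match: findall yields an empty list, while search yields None.
nothing = extract_readings('no readings here')
missing = PATTERN.search('no readings here')
print(nothing, missing)  # [] None
```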
p = r'[0-9][0-9]\.[0-9]'
This pattern will match: one digit 0-9 ('[0-9]') followed by one digit 0-9 ('[0-9]') followed by a single literal period ('\.') followed by one digit 0-9 ('[0-9]'). For example, this pattern will match the string 19.9 or 40.0 but not '40.' or '40'.
The string '[0-9]' uses square brackets to define a set in regex. With a set, any one of the characters included in the brackets can be matched for that one spot. For example, [A5] will match A or 5 but not A5. Just like literal characters, sets accept repetition qualifiers, so we can use [A5]{1,2} to also match A5 (as well as AA, 55 and 5A).
Note: The reason this expression treats the period as a literal period is that it is preceded by a backslash (i.e. it is escaped from its special meaning), so it will no longer match 'any character that is not a newline character.'
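The examples in the last few paragraphs can be checked directly. This sketch uses re.fullmatch so that each test string has to match from start to end; the variable names are just for illustration.

```python
import re

p = r'[0-9][0-9]\.[0-9]'

# Two digits, a literal period, one digit.
number_checks = {t: bool(re.fullmatch(p, t)) for t in ('19.9', '40.0', '40.', '40')}
print(number_checks)
# {'19.9': True, '40.0': True, '40.': False, '40': False}

# A set matches exactly one character from the brackets...
single = [bool(re.fullmatch(r'[A5]', t)) for t in ('A', '5', 'A5')]
print(single)  # [True, True, False]

# ...unless a repetition qualifier lets it repeat.
repeated = [bool(re.fullmatch(r'[A5]{1,2}', t)) for t in ('A', 'A5', '5A', 'A55')]
print(repeated)  # [True, True, True, False]
```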
'[0-9]{2}\.[0-9]{1}'
This pattern does the same thing as above but uses curly braces to fix the number of repetitions (rather than repeating the set twice like the previous pattern). The '{1}' is optional, since a set with no qualifier already matches exactly once.
'\d{2}\.\d{1}'
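This pattern uses the special sequence \d to match any decimal digit. For ordinary ASCII input like yours it is equivalent to the set [0-9] used above. With Python 3 string patterns, \d also matches decimal digits from other scripts unless you pass the re.ASCII flag.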
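As a sanity check, this sketch runs the three patterns over a shortened copy of the sample string and compares the results. The second half shows where \d and [0-9] differ; the Arabic-Indic digit is my own test character, not something from your data.

```python
import re

s = ' London:Jan 48.0,Feb 38.9,Mar 39.9'
patterns = [r'[0-9][0-9]\.[0-9]', r'[0-9]{2}\.[0-9]{1}', r'\d{2}\.\d{1}']

# All three patterns should pull out the same values.
results = [re.findall(p, s) for p in patterns]
print(results[0])  # ['48.0', '38.9', '39.9']
print(results[0] == results[1] == results[2])  # True

# U+0663 ARABIC-INDIC DIGIT THREE is a decimal digit, but not in the range 0-9.
arabic_three = '\u0663'
unicode_d = bool(re.fullmatch(r'\d', arabic_three))
ascii_set = bool(re.fullmatch(r'[0-9]', arabic_three))
ascii_d = bool(re.fullmatch(r'\d', arabic_three, re.ASCII))
print(unicode_d, ascii_set, ascii_d)  # True False False
```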
It's worth noting that the pattern would still find your values if the . weren't escaped, since the period character '.' is itself included in the class of 'any character that isn't a newline.' However, that makes the pattern less robust, since it will (inaccurately) match any character that isn't a newline in that spot. For example, it will match 29.9 or 29A9 or 2909, as they all have a non-newline character in the 3rd position.
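Here is that comparison as a runnable sketch, with the three example strings joined into one made-up test string.

```python
import re

text = '29.9 29A9 2909'

# Unescaped: '.' accepts any non-newline character in the 3rd position.
unescaped = re.findall(r'\d{2}.\d', text)
print(unescaped)  # ['29.9', '29A9', '2909']

# Escaped: only a literal period is accepted there.
escaped = re.findall(r'\d{2}\.\d', text)
print(escaped)  # ['29.9']
```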
Hope this helps!
\s is a single whitespace character. There is similarly \d for digits. Have you tried to research further?