Strings:
- "Roaming Calls, 1.5 GB/Day 100 SMS/Day"
- "Unlimited Loc/STD/Roaming Calls, 1GB/Day"
I want to extract "1.5" and "1" with a regex.
I'm using r'.*([0-9.]+)(gb|GB| gb| GB)', but it only matches "5" for the first string.
Use a lookahead after the match to locate the number that comes right before "GB/Day" (case-insensitively), e.g. (?= GB/Day):
    [\d.]+(?= GB/Day|GB/Day| gb/day|gb/day)
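A minimal sketch of the lookahead idea in Python. It assumes re.IGNORECASE is acceptable in place of spelling out every case variant, and uses an optional space (" ?") so one lookahead covers both "1.5 GB/Day" and "1GB/Day". The samples list just holds the two strings from the question.

```python
import re

# The two example strings from the question.
samples = [
    "Roaming Calls, 1.5 GB/Day 100 SMS/Day",
    "Unlimited Loc/STD/Roaming Calls, 1GB/Day",
]

# [\d.]+ grabs digits and dots; the lookahead (?= ?GB/Day) requires an
# optional space plus "GB/Day" to follow, without consuming them.
# re.IGNORECASE (an assumption here) also accepts "gb/day", "Gb/Day", etc.
pattern = re.compile(r"[\d.]+(?= ?GB/Day)", re.IGNORECASE)

results = [pattern.findall(s) for s in samples]
print(results)  # [['1.5'], ['1']]
```

The "100" in the first string is skipped because it is followed by " SMS/Day", not "GB/Day".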
Here is a fix to the immediate problem with your pattern:
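    import re
    text = "Roaming Calls, 1.5 GB/Day 100 SMS/Day"
    m0 = re.match(r'.*?([0-9.]+)(gb|GB| gb| GB)', text)
    if m0:
        print("match:", m0.group(1))
Just make the .* right before the number's capture group lazy (.*?), so it stops consuming characters as soon as the rest of the pattern can match, instead of swallowing all but the last digit.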
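To see the difference, here is a sketch that runs the question's greedy pattern and the lazy version side by side on both strings (samples, greedy and lazy are illustrative names, not from the original answer):

```python
import re

samples = [
    "Roaming Calls, 1.5 GB/Day 100 SMS/Day",
    "Unlimited Loc/STD/Roaming Calls, 1GB/Day",
]

greedy = r'.*([0-9.]+)(gb|GB| gb| GB)'   # pattern from the question
lazy = r'.*?([0-9.]+)(gb|GB| gb| GB)'    # same pattern with .* made lazy

pairs = []
for s in samples:
    g = re.match(greedy, s)   # .* backtracks only as far as it must
    lz = re.match(lazy, s)    # .*? grows only as far as it must
    pairs.append((g.group(1), lz.group(1)))

print(pairs)  # [('5', '1.5'), ('1', '1')]
```

The greedy version happens to work for "1GB" because the number is a single digit; it only breaks when the number has more than one character.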
To match both floats and integers, you can try this:
    import re
    k = "Roaming Calls, 1.5 GB/Day 100 SMS/Day"
    print(re.findall(r"[-+]?\d*\.\d+|\d+", k))
If you only want float values, use this instead:
    import re
    k = "Roaming Calls, 1.5 GB/Day 100 SMS/Day"
    print(re.findall(r"[-+]?\d*\.\d+", k))
It returns a list of the float numbers in the string, like this:
    ['1.5']
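Running both patterns over the two strings from the question shows what each one returns. Note that the float-only pattern finds nothing in the second string, because "1" has no decimal point, and the combined pattern also picks up "100" from "100 SMS/Day". The variable names below are illustrative.

```python
import re

samples = [
    "Roaming Calls, 1.5 GB/Day 100 SMS/Day",
    "Unlimited Loc/STD/Roaming Calls, 1GB/Day",
]

any_number = r"[-+]?\d*\.\d+|\d+"   # try a float first, else a plain integer
float_only = r"[-+]?\d*\.\d+"       # requires a decimal point

all_numbers = [re.findall(any_number, s) for s in samples]
floats = [re.findall(float_only, s) for s in samples]

print(all_numbers)  # [['1.5', '100'], ['1']]
print(floats)       # [['1.5'], []]
```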
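The issue is that .* matches everything and leaves only one character for [0-9.]+. You can replace it with .? (at most one character), so it can no longer swallow the start of the number. Since the pattern then no longer spans from the start of the string, use re.search rather than re.match:
    .?([0-9.]+)(gb|GB| gb| GB)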
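A quick check of the .? pattern on both strings, assuming Python's re module. It shows why re.search is needed: re.match anchors at index 0, where neither string starts with a digit (optionally preceded by one character).

```python
import re

samples = [
    "Roaming Calls, 1.5 GB/Day 100 SMS/Day",
    "Unlimited Loc/STD/Roaming Calls, 1GB/Day",
]
pattern = r'.?([0-9.]+)(gb|GB| gb| GB)'

# re.match only tries position 0, so it finds nothing here.
anchored = [re.match(pattern, s) for s in samples]
# re.search scans forward to the number right before "GB".
found = [re.search(pattern, s).group(1) for s in samples]

print(anchored)  # [None, None]
print(found)     # ['1.5', '1']
```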
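.* is greedy: it first consumes the entire string and then backtracks only until the rest of the pattern can match, which is why [0-9.]+ is left with just the last digit. From what I see in your question, it isn't needed at all. Skip it and use only (\d+(?:\.\d+)?)\s?(gb|GB) with re.search.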
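A minimal sketch of that last pattern wrapped in a small helper. extract_gb is a hypothetical name introduced here for illustration; it returns None when no number precedes "GB". Unlike [0-9.]+, the \d+(?:\.\d+)? form only accepts digits with at most one decimal part.

```python
import re
from typing import Optional

# Pattern from the answer: a number with an optional decimal part,
# an optional whitespace character, then "gb" or "GB".
GB_PATTERN = re.compile(r'(\d+(?:\.\d+)?)\s?(gb|GB)')


def extract_gb(text: str) -> Optional[str]:
    """Hypothetical helper: return the number before 'GB', or None."""
    m = GB_PATTERN.search(text)
    return m.group(1) if m else None


print(extract_gb("Roaming Calls, 1.5 GB/Day 100 SMS/Day"))     # 1.5
print(extract_gb("Unlimited Loc/STD/Roaming Calls, 1GB/Day"))  # 1
print(extract_gb("100 SMS/Day"))                               # None
```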