3

I am using a regex to parse floating-point numbers from a string.

re.findall("[^a-zA-Z:][-+]?\d+[\.]?\d*", t)

is the code that I used. There is a problem with it: a number is not parsed cleanly when there is no space between it and the adjacent character. For example, the expected output from "0|1|2|3|4|5|6|7|8|9" is [0,1,2,3,4,5,6,7,8,9], but it returns [|1,|2,|3,...].

Is there any way to solve this kind of problem?

6
  • Try re.findall(r"[^a-zA-Z:]([-+]?\d*\.?\d+)", t) or re.findall(r"(?<![a-zA-Z:])[-+]?\d*\.?\d+", t) Commented Feb 9, 2017 at 16:53
  • Why not a much simpler (\d+[\.]?)+ ? Commented Feb 9, 2017 at 16:55
  • @WiktorStribiżew It works, but somehow it loses the first digit: if t is 120, it returns 20. Commented Feb 9, 2017 at 16:57
  • Why did you use [^a-zA-Z:]? Commented Feb 9, 2017 at 17:01
  • @WiktorStribiżew The string contains labels such as M1 and M2, and I would like to skip the numbers that are part of those labels. Commented Feb 9, 2017 at 17:04

2 Answers 2

4

Use

re.findall(r"(?<![a-zA-Z:])[-+]?\d*\.?\d+", t)

See the regex demo

It will match integers and floats that are not immediately preceded by an ASCII letter or a colon.

Details:

  • (?<![a-zA-Z:]) - a negative lookbehind that makes sure there is no ASCII letter or colon immediately before the current location
  • [-+]? - an optional + or -
  • \d* - zero or more digits
  • \.? - an optional dot
  • \d+ - 1+ digits
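
The pattern described above can be tried out with a short script. This is only a sketch: the names NUMBER and parse_numbers are made up for illustration, and converting the matches to float is an assumption about what the asker wants to do with them.

```python
import re

# Pattern from the answer: a number that is not immediately
# preceded by an ASCII letter or a colon.
NUMBER = re.compile(r"(?<![a-zA-Z:])[-+]?\d*\.?\d+")

def parse_numbers(text):
    """Return every number matched in text, converted to float (hypothetical helper)."""
    return [float(m) for m in NUMBER.findall(text)]

# Digits separated only by '|' are all found, including the leading 0.
print(parse_numbers("0|1|2|3|4|5|6|7|8|9"))

# The digits in the labels M1 and M2 are skipped; 120 keeps its first digit.
print(parse_numbers("M1 120 -3.5 M2"))
```

One caveat: the lookbehind only inspects the single character before the match, so a multi-digit label such as M12 would still yield 2, because that digit is preceded by 1 rather than by a letter.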

2 Comments

Thank you, and one last question. In the following case, "3monthSummary : month1: 60.5 month2: 60.24 month3: 60.25", it includes 3 in the result. Is there any way to prevent that?
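
This follow-up is not answered in the thread. One possible approach, offered as an untested-in-production sketch rather than the answerer's method, is to add a negative lookahead so that a number directly followed by a letter or another digit is rejected. The name STRICT_NUMBER is hypothetical.

```python
import re

# Assumption: a number glued to a following letter (as in "3monthSummary")
# should be rejected. The \d inside the lookahead stops the engine from
# backtracking to a shorter prefix such as "1" in "12abc".
STRICT_NUMBER = re.compile(r"(?<![a-zA-Z:])[-+]?\d*\.?\d+(?![a-zA-Z\d])")

text = "3monthSummary : month1: 60.5 month2: 60.24 month3: 60.25"
values = STRICT_NUMBER.findall(text)
print(values)  # ['60.5', '60.24', '60.25']
```

Inputs where a decimal is followed by a letter, such as 60.5x, can still produce a partial match, so the pattern would need further tightening for data like that.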
1

The easiest fix here is to wrap the "number" part of your regular expression in a capture group, so that re.findall returns only the captured part.

re.findall("[^a-zA-Z:]([-+]?\d+[\.]?\d*)", t)

I just added parentheses around the "number" part of your search.

2 Comments

Thank you. It works, but somehow it loses the first number. For the example, it returns [1,2,3,4,5,6,7,8,9].
Yeah, the issue here is that you are forcing the first character of the match to be something that is neither a letter nor a colon, whereas the first number in the string has nothing in front of it. You can add a ? to that block, but then you might get some other weird behaviors. I think you will need to provide more samples of what you want to happen in order to get a regex that does exactly what you want. Here is the regex with the change: [^a-zA-Z:]?([-+]?\d+[\.]?\d*)
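
The behavior discussed in these comments can be reproduced with a few calls. This is an illustrative sketch; the variable names and the extra test strings "120" and "M1 7" are assumptions chosen to show the side effects of the optional prefix.

```python
import re

text = "0|1|2|3|4|5|6|7|8|9"

# Required prefix: the first number has no character in front of it,
# so the prefix consumes the 0 itself and that number is lost.
required = re.findall(r"[^a-zA-Z:]([-+]?\d+[\.]?\d*)", text)

# Optional prefix: the leading 0 is found again.
optional = re.findall(r"[^a-zA-Z:]?([-+]?\d+[\.]?\d*)", text)

# Side effects of the optional prefix: it can still swallow a leading
# digit, and the digit in a label like M1 is no longer skipped.
lost_digit = re.findall(r"[^a-zA-Z:]?([-+]?\d+[\.]?\d*)", "120")
label_leak = re.findall(r"[^a-zA-Z:]?([-+]?\d+[\.]?\d*)", "M1 7")

print(required)    # starts at '1'
print(optional)    # starts at '0'
print(lost_digit)  # ['20']
print(label_leak)  # ['1', '7']
```

Because the prefix is a consuming character class, it competes with the number for the same characters. A lookbehind such as (?<![a-zA-Z:]) checks the preceding character without consuming it, which avoids both side effects shown here.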
