0

I have the following string, from which I need to extract the value 14.123456 that comes directly after the keyword airline_freq: (a keyword that is unique in my string).

Please help me find the correct regex (indexing m.group() doesn't work beyond 0).

import re
s =  "DATA:init:     221.000OTHER:airline_freq:  14.123456FEATURE:airline_amp:   0.333887 more text"
m = re.search(r'[airline_freq:\s]?\d*\.\d+|\d+', s)
m.group()

$ result 221.000
  • Try r'airline_freq:\s*(\d+(?:\.\d+)?)' and then print(m.group(1)). Commented Apr 3, 2020 at 20:10
  • And what if the float values were sometimes positive and sometimes negative (using a - to indicate negative)? Commented Apr 3, 2020 at 20:31

3 Answers

1

You can probably use this:

(?<=airline_freq:)\s*(?:-?(?:\d+(?:\.\d*)?|\.\d+))

This uses a lookbehind to enforce that the number is preceded by airline_freq: but it does not make it part of the match.

The number-matching part of the regex matches numbers with or without a decimal point. If there is a decimal point, it may also be leading (as in .5) or trailing (as in 14.), but it cannot come before the - sign. To allow an optional + as well as -, replace the - with [+-].
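As a rough check of which number forms that sub-pattern accepts, here is a minimal sketch that tests it on its own with re.fullmatch. The sample inputs are made up for illustration, and the [+-] variant is used so that both signs are covered.

```python
import re

# The number-matching part of the regex, with [+-] so either sign is allowed.
NUMBER = re.compile(r'[+-]?(?:\d+(?:\.\d*)?|\.\d+)')

# Hypothetical inputs, chosen only to exercise each branch of the pattern.
samples = ['14.123456', '-14.', '.5', '+3', '-', '.']

# True when the whole string is a number according to the pattern.
results = {text: bool(NUMBER.fullmatch(text)) for text in samples}

for text, ok in results.items():
    print(f'{text!r}: {ok}')
```

A bare sign or a bare dot is rejected because both branches of the alternation require at least one digit.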

Unfortunately, Python's re module does not allow variable-length lookbehind, so the \s* cannot go inside it; as a consequence, the spaces between the : and the number are part of the match. This is usually not a problem, since leading whitespace is generally skipped when a number is parsed (Python's float() accepts it, for example).

However, you can remove the first ?: in the regex above to turn the number-matching group into a capturing group, so that the number alone is available as group 1 (\1, or m.group(1) in Python).
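A minimal sketch of that capturing variant, run against the string from the question. The variable names are only illustrative, and the try/except at the end shows how the standard re module (as far as I know, in all current Python 3 versions) reacts to a variable-length lookbehind.

```python
import re

s = "DATA:init:     221.000OTHER:airline_freq:  14.123456FEATURE:airline_amp:   0.333887 more text"

# Same regex as above, with the first ?: removed so the number is group 1.
pattern = re.compile(r'(?<=airline_freq:)\s*(-?(?:\d+(?:\.\d*)?|\.\d+))')

m = pattern.search(s)
whole = m.group(0)      # whole match, including the spaces after the colon
number = m.group(1)     # only the number
value = float(number)

# Putting \s* inside the lookbehind is rejected when the pattern is compiled.
try:
    re.compile(r'(?<=airline_freq:\s*)\d+')
    lookbehind_ok = True
except re.error:
    lookbehind_ok = False

print(repr(whole), repr(number), value, lookbehind_ok)
```

Using group 1 keeps the lookbehind fixed-width while still giving a clean number without the leading spaces.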

The example is here.


2 Comments

And what if the float values were sometimes positive and sometimes negative (using a - to indicate negative)?
@Thediz, that's it.
1

This captures only the float, as group 1.

r'airline_freq:\s+([-0-9.]+)'

"DATA:init:     221.000OTHER:airline_freq:  14.123456FEATURE:airline_amp:   0.333887 more text"
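A small sketch of how this pattern might be used. The helper name extract_freq is hypothetical. Because the character class [-0-9.]+ also accepts strings that are not valid numbers, such as 1.2.3, the conversion to float is guarded.

```python
import re

s = "DATA:init:     221.000OTHER:airline_freq:  14.123456FEATURE:airline_amp:   0.333887 more text"


def extract_freq(text):
    """Return the float after 'airline_freq:', or None if it is absent or malformed."""
    m = re.search(r'airline_freq:\s+([-0-9.]+)', text)
    if m is None:
        return None
    try:
        return float(m.group(1))
    except ValueError:
        # The character class is permissive, e.g. it would capture '1.2.3'.
        return None


print(extract_freq(s))
```

Note that \s+ requires at least one whitespace character after the colon; \s* would also accept none.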


0

I have this:

(?<=airline_freq\:\s\s)(\d+\.\d+)

In [2]: import re
   ...: s =  "DATA:init:     221.000OTHER:airline_freq:  14.123456FEATURE:airline_amp:   0.333887 more text"
   ...: m = re.search(r'(?<=airline_freq\:\s\s)(\d+\.\d+)', s)
   ...: m.group()
Out[2]: '14.123456'

Test: https://regexr.com/51q41

If you're not sure about the number of spaces between airline_freq: and the desired float number, you can use:

(?<=airline_freq\:)\s*(\d+\.\d+)

and call m.group().lstrip() to strip the leading spaces (or read the number directly from the capturing group with m.group(1)).
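For comparison, a short sketch of both ways of dropping the leading spaces with this second pattern, using the string from the question; the variable names are illustrative.

```python
import re

s = "DATA:init:     221.000OTHER:airline_freq:  14.123456FEATURE:airline_amp:   0.333887 more text"

m = re.search(r'(?<=airline_freq\:)\s*(\d+\.\d+)', s)

stripped = m.group().lstrip()   # whole match with the leading spaces removed
captured = m.group(1)           # capturing group, which never includes the spaces

print(repr(m.group()), repr(stripped), repr(captured))
```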

1 Comment

This makes the assumption that there are exactly two spaces between the colon and the number.
