Python regular expression - extracting float pattern

Question

I am trying to extract a particular "float" from a string that contains multiple formatted "integers", "floats" and dates. The particular "float" in question is preceded by some standardized text.

String sample

my_string = """03/14/2019 07:07 AM
💵Soles in mDm : 2864.35⬇
🔶BTC purchase in mdm: 11,202,782.0⬇
"""

I have been able to extract the desired float, 2864.35, from my_string, but if this particular float changes its format, or another float with the same format shows up, my script won't return the desired result.

import re

regex = r"(\d+\.\d+)"
matches = re.findall(regex, my_string)
for match in matches:
    print(match)

It might truncate the desired float because of inconsistent numerical formatting (for example, a comma used as a thousands separator).
It might print two floats, because an undesired float is too similar in format to be filtered out by the current regular expression regex.
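To make both failure modes concrete, here is a small sketch that runs the original pattern over the sample string (emoji included). The commas split `11,202,782.0`, so only its tail `782.0` can match, and that fragment is returned alongside the wanted value:

```python
import re

my_string = """03/14/2019 07:07 AM
💵Soles in mDm : 2864.35⬇
🔶BTC purchase in mdm: 11,202,782.0⬇
"""

# The original pattern: digits, a dot, digits. Commas are not allowed,
# so "11,202,782.0" can only match from its last comma onward.
regex = r"(\d+\.\d+)"
print(re.findall(regex, my_string))  # ['2864.35', '782.0']
```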

Desired return from regular expression `regex`

A float with a flexible integer part: sometimes the comma is omitted, e.g. 45000.50, other times it is present, e.g. 45,000.50
Unique line identifier: Soles, which could be upper or lower case
Line identifier: the float is prefixed by a colon (:)
It should return only one float
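The number part of these requirements can be handled on its own. A minimal sketch (the name `NUMBER` is just for illustration): a digit, then any run of digits and commas, a literal dot, and at least one more digit. The commas are stripped before converting to `float`. The pattern is deliberately permissive and does not check that commas fall every three digits:

```python
import re

# A digit, then digits or commas, a literal dot, then one or more digits.
NUMBER = r"\d[\d,]*\.\d+"

for text in ["45000.50", "45,000.50"]:
    if re.fullmatch(NUMBER, text):
        # Remove thousands separators so float() accepts the string.
        print(text, "->", float(text.replace(",", "")))
```

Both inputs print `45000.5` as the converted value.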

Some variations of the desired float, in the second line of the string only

What you see below are three examples of the same line, the second line in my_string. The regex should return only the float from line two, despite variations such as soles or Soles.

💵Soles in mDm : 2864.35⬇
soles MDM: 2,864.35
Soles in mdm :2,864.355
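Putting these requirements together, one possible pattern (a sketch, not taken from the answers below; `SOLES_RE` and `extract_soles` are hypothetical names) looks for "soles" case-insensitively, skips anything except a colon or newline up to the `:` prefix, and then captures the number. Because `[^:\n]*` cannot cross a line break, the match stays on the soles line, and `re.search` returns only the first match:

```python
import re

# "soles" in any case, then any characters except ":" or newline up to the
# colon, optional whitespace, then a number whose integer part may have commas.
SOLES_RE = re.compile(r"soles[^:\n]*:\s*(\d[\d,]*\.\d+)", re.IGNORECASE)


def extract_soles(text):
    """Return the number on the first 'soles' line as a float, or None."""
    match = SOLES_RE.search(text)  # search() stops at the first match
    if match is None:
        return None
    return float(match.group(1).replace(",", ""))  # drop thousands separators


my_string = """03/14/2019 07:07 AM
💵Soles in mDm : 2864.35⬇
🔶BTC purchase in mdm: 11,202,782.0⬇
"""

print(extract_soles(my_string))  # 2864.35
for line in ["💵Soles in mDm : 2864.35⬇", "soles MDM: 2,864.35", "Soles in mdm :2,864.355"]:
    print(extract_soles(line))
```

If you need the original text rather than a float (for example to keep `2,864.35` as written), return `match.group(1)` instead.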

Any assistance in editing or rewriting the current regular expression regex is greatly appreciated.

Building on Michael's suggestion, try this: [S|s]oles.*?(\d[\d,]*\.\d+) or (?i)soles.*?(\d[\d,]*\.\d+)
– FailSafe, Commented Mar 16, 2019 at 1:31
@FailSafe The line contains the unique identifier 'soles', which could be lowercase 'soles' or capitalized 'Soles'. The float may contain a comma, '2,400.00', or the comma might be omitted, '2400.00'. I hope this helps to clarify.
– Enrique Bruzual, Commented Mar 16, 2019 at 1:32
I think so. I provided 2 edits in this line of comments based on Michael's. Try them, but all the credit goes to him. These: [S|s]oles.*?(\d[\d,]*\.\d+) or (?i)soles.*?(\d[\d,]*\.\d+)
– FailSafe, Commented Mar 16, 2019 at 1:34
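One detail about the first suggestion: inside square brackets `|` is not alternation but a literal character, so `[S|s]` matches `S`, `s` or `|`. `[Ss]`, or the inline `(?i)` flag, expresses the intent. A quick check:

```python
import re

# Inside a character class "|" is an ordinary character.
print(bool(re.fullmatch(r"[S|s]", "|")))  # True
print(bool(re.fullmatch(r"[Ss]", "|")))   # False
# (?i) at the start makes the whole pattern case-insensitive.
print(bool(re.fullmatch(r"(?i)soles", "SOLES")))  # True
```

In practice `[S|s]oles` still works on the sample text, since no line contains `|oles`, but `[Ss]` says what is meant.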

FailSafe · Accepted Answer · 2019-03-16 01:46:08Z

2

EDIT: Hmm... If it has to follow soles, then hopefully this helps.

Try these. My console can't display the extra (emoji) characters, so I left them out, but this is based on your input:

>>> import re
>>> my_string = """03/14/2019 07:07 AM
Soles in mDm : 2864.35
BTC purchase in mdm: 11,202,782.0
Soles in mDm : 2864.35
soles MDM: 2,864.35
Soles in mdm :2,864.355
"""


>>> re.findall(r'(?i)soles[\S\s]*?([\d]+[\d,]*\.[\d]+)', my_string)

#Output
['2864.35', '2864.35', '2,864.35', '2,864.355']



>>> re.findall(r'[Ss]oles[\S\s]*?([\d]+[\d,]*\.[\d]+)', my_string)

#Output
['2864.35', '2864.35', '2,864.35', '2,864.355']
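One caveat with both patterns: `[\S\s]*?` matches any character, including a newline. If a soles line ever lacks a number, the lazy match runs on into the following lines and picks up an unrelated value. A sketch with a made-up input (the `n/a` line is hypothetical) compares it with `.*?`, which stops at the end of the line because `.` does not match `\n` unless `re.DOTALL` is set:

```python
import re

# Hypothetical input: the soles line has no number on it.
text = "Soles in mDm : n/a\nBTC purchase in mdm: 11,202,782.0\n"

# [\S\s]*? can cross the newline and reach the BTC value.
print(re.findall(r"(?i)soles[\S\s]*?([\d]+[\d,]*\.[\d]+)", text))  # ['11,202,782.0']
# .*? cannot cross the newline, so nothing matches.
print(re.findall(r"(?i)soles.*?([\d]+[\d,]*\.[\d]+)", text))       # []
```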

edited Mar 16, 2019 at 1:46

answered Mar 16, 2019 at 0:58


6 Comments

Enrique Bruzual Over a year ago

Sure, but why? I thought I was very diligent in researching and writing the question. I'll be happy to edit or remove an inappropriate question.

FailSafe Over a year ago

I'm sure you were diligent. I edited my solution btw to include the example. The reason is that, if I understood your problem properly, the solution is fairly easy, and many users here are very quick to -1 because they jump to conclusions.

Enrique Bruzual Over a year ago

Thank you for the response, but the only number I want is from the following line: "Soles in mDm : 2864.35".

FailSafe Over a year ago

So none of the other numbers should match?

Enrique Bruzual Over a year ago

It should return only the first number in your list, 2864.35. @Michael Butscher above has it almost right, since it focuses on 'Soles' and there is only one line with this word and a number, but it should recognize lowercase as well.


A l w a y s S u n n y · Accepted Answer · 2019-03-16 01:08:40Z

If you want to match multiple instances, use re.findall() or re.finditer() instead of re.search(). Python has no g flag; re.search() only returns the first match.

REGEX

(?<=:)\s?([\d,]*\.\d+)

With Python:

# coding=utf8
# The line above declares the source encoding; it is only needed for Python 2.x compatibility.

import re

# Capture the first number (optionally containing commas) that follows a colon.
regex = r"(?<=:)\s?([\d,]*\.\d+)"

test_str = ("\n"
    "    💵Soles in mDm : 2864.35⬇\n"
    "    soles MDM: 2,864.35\n"
    "    Soles in mdm :2,864.355\n")

# re.search returns only the first match. (re.IGNORECASE has no effect here,
# because the pattern contains no letters.)
match = re.search(regex, test_str, re.IGNORECASE)

if match:
    print("Match was found at {start}-{end}: {match}".format(
        start=match.start(), end=match.end(), match=match.group()))

    # Groups are numbered from 1; group 0 is the whole match.
    for group_num in range(1, len(match.groups()) + 1):
        print("Group {num} found at {start}-{end}: {group}".format(
            num=group_num, start=match.start(group_num),
            end=match.end(group_num), group=match.group(group_num)))
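Python's `re` module has no `g` flag: `re.search` returns the first match, while `re.findall` and `re.finditer` return all of them. A short sketch with the same pattern on a hypothetical two-line input also shows a caveat: the lookbehind only requires a colon, not the word "soles", so if another colon-prefixed number comes first, `re.search` returns that one instead:

```python
import re

regex = r"(?<=:)\s?([\d,]*\.\d+)"
# Hypothetical input in which the BTC line comes before the soles line.
text = "BTC purchase in mdm: 11,202,782.0\nSoles in mDm : 2864.35\n"

print(re.search(regex, text).group(1))                 # 11,202,782.0
print(re.findall(regex, text))                         # ['11,202,782.0', '2864.35']
print([m.group(1) for m in re.finditer(regex, text)])  # same list as findall
```

Anchoring the pattern on the word soles (case-insensitively) avoids this dependence on line order.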
