I am having trouble creating a function that takes a string containing length values (e.g. '32.0 mm / 1.259"') and returns just the value in mm.

My current function, parse, only handles strings that contain just the mm value; it fails when both mm and inch values are present.

Any help is greatly appreciated!

Regex pattern: re.sub(r"[^0-9.\-]", "", str)

import re

def parse(str):
    if not str:
        return None
    str = str.lower()
    return float(re.sub(r"[^0-9.\-]", "", str))

tests = ['12.3 mm', '12.3mm', '32.0 mm / 1.259"', '32.0mm / 1.259"']
for s in tests: 
    print( parse(s) )

Expected Output

12.3
12.3
32.0
32.0

Actual Output

12.3
12.3
ValueError: could not convert string to float: '32.01.259'
  • Could you please look at the answers and choose the one that worked best for you? The solutions either 1) remove everything starting from mm, or 2) extract the number before mm. My solution is very similar to Daniel's, but because it uses a word boundary it does not extract the number in a case like 5. mmorph, and it also works for integers before mm. Commented Oct 18, 2019 at 7:36

3 Answers

You can tell the regex to capture a float/int value that appears right before mm as a whole word:

re.search(r"([0-9]+(?:\.[0-9]+)?)\s*mm\b", text.lower())

See the regex demo online.

Here,

  • ([0-9]+(?:\.[0-9]+)?) - Group 1: 1+ digits followed by an optional sequence of . and 1+ digits
  • \s* - 0+ whitespaces
  • mm\b - mm and a word boundary.

See the Python demo:

import re

def parse(text):
    if not text:
        return None
    match = re.search(r"([0-9]+(?:\.[0-9]+)?)\s*mm\b", text.lower())
    if match:
        return float(match.group(1))
    return text

tests = ['12.3 mm', '12.3mm', '32.0 mm / 1.259"', '32.0mm / 1.259"']
for s in tests: 
    print( parse(s) )
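
To see how the word boundary and the optional fractional part behave on inputs outside the question's test list, here is a minimal sketch. The pattern is the one from this answer; the name parse_mm, the extra sample strings, and returning None when nothing matches (instead of returning the input text) are assumptions made for illustration only.

```python
import re

# Same pattern as in the answer above, compiled once for reuse.
MM_PATTERN = re.compile(r"([0-9]+(?:\.[0-9]+)?)\s*mm\b")

def parse_mm(text):
    """Return the mm value as a float, or None when no mm value is found.

    Returning None on a failed match is an assumption of this sketch.
    """
    if not text:
        return None
    match = MM_PATTERN.search(text.lower())
    return float(match.group(1)) if match else None

# Hypothetical extra inputs: an integer, upper-case units, and three non-matches.
for sample in ['12 mm', '32.0 MM / 1.259"', '5. mmorph', '7mmx', '1.259"']:
    print(repr(sample), '->', parse_mm(sample))
```

The first two samples yield 12.0 and 32.0. The last three yield None: "5. mmorph" and "7mmx" have a letter directly after mm, so \b fails, and the inch-only string contains no mm at all.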

Just simplify your regex pattern to the following:

re.sub("mm.*", "", str)

... and you'll get the expected output.
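
Dropped into the question's function, that substitution might look like the sketch below. The name parse_strip is hypothetical, and the behaviour on strings without mm is simply what float() does with the unchanged input, not something this answer addresses.

```python
import re

def parse_strip(s):
    """Cut the string at the first 'mm' and convert what is left to a float."""
    if not s:
        return None
    # Everything from the first "mm" to the end of the string is removed.
    # float() ignores surrounding whitespace, so '12.3 ' converts cleanly.
    return float(re.sub(r"mm.*", "", s.lower()))

tests = ['12.3 mm', '12.3mm', '32.0 mm / 1.259"', '32.0mm / 1.259"']
print([parse_strip(t) for t in tests])
```

One caveat: a string with no mm in it, such as '1.259"', passes through re.sub unchanged, so float() raises ValueError. If such inputs are possible, an explicit check is needed.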

You could search for the matching pattern instead of using sub, for example:

import re


def parse(s):
    if not s:
        return None
    s = s.lower()
    # Raw string so \d and \s reach the regex engine without escape warnings
    return float(re.search(r"(\d+\.\d*\s*)mm", s).group(1))


tests = ['12.3 mm', '12.3mm', '32.0 mm / 1.259"', '32.0mm / 1.259"']

print([parse(test) for test in tests])

Output

[12.3, 12.3, 32.0, 32.0]
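
Note that this pattern requires a decimal point, so an integer such as '12 mm' does not match and .group(1) is then called on None. A possible variant, sketched under the assumption that integers should be accepted and that None is an acceptable result for a failed match (parse_relaxed and RELAXED are hypothetical names):

```python
import re

STRICT = r"(\d+\.\d*\s*)mm"          # the pattern from the answer above
RELAXED = r"(\d+(?:\.\d*)?)\s*mm"    # hypothetical variant: fractional part optional

def parse_relaxed(s):
    """Return the number before 'mm' as a float, or None if there is none."""
    if not s:
        return None
    match = re.search(RELAXED, s.lower())
    # Guard against a failed match instead of calling .group(1) on None.
    return float(match.group(1)) if match else None

print(re.search(STRICT, '12 mm'))    # the strict pattern finds nothing here
print(parse_relaxed('12 mm'))
print(parse_relaxed('32.0mm / 1.259"'))
```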
