33

I have strings that each store a number and a unit, for example:

x = '$120'
y = ' 90 Degrees F'
banana = '200 kgm'
orange = '300 gm'
total_weight = banana + orange/1000 

For example, I want to add the weights, which should amount to:

total_weight  = 200 + 300/1000

Thanks!

I'm trying to extract only the numbers so I can do some operations with them. Any idea what the simplest way to do this is? I'm only dealing with these two formats, i.e. the digits are at the beginning or at the end of the string.

6
  • I would suggest you have a look at the re module. Regular expressions are meant for extracting structured data from a corpus. Commented Apr 28, 2012 at 16:07
  • 1
    Your own example shows the issue here. banana is in kgm and orange is in gm, so surely the weights are 200000 and 300, not 200 and 300, which adds more complexity to the problem. Does that matter to you? Commented Apr 28, 2012 at 16:09
  • 1
    He divides orange by 1000 for that exact reason Commented Apr 28, 2012 at 16:10
  • @jamylak Exactly, is that fixed? Or is that something that could change? Commented Apr 28, 2012 at 16:19
  • I presumed that was not part of the question since it was hard-coded in. Commented Apr 28, 2012 at 16:21

5 Answers

69

The simplest way to extract a number from a string is to use regular expressions and findall.

>>> import re
>>> s = '300 gm'
>>> re.findall(r'\d+', s)
['300']
>>> s = '300 gm 200 kgm some more stuff a number: 439843'
>>> re.findall(r'\d+', s)
['300', '200', '439843']

It might be that you need something more complex, but this is a good first step.

Note that you'll still have to call int on each result to get a proper numeric type (rather than another string). In Python 3, map returns an iterator, so wrap it in list to see the values:

>>> list(map(int, re.findall(r'\d+', s)))
[300, 200, 439843]
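
The float question raised in the comments can be handled by widening the pattern to accept an optional fractional part. The following is only a minimal sketch, assuming plain decimals (no signs, exponents, or thousands separators); the names NUMBER_RE and extract_numbers are made up for illustration and are not part of the re module.

```python
import re

# Hypothetical pattern: digits with an optional fractional part, or a bare ".75".
NUMBER_RE = re.compile(r'\d+(?:\.\d+)?|\.\d+')

def extract_numbers(text):
    """Return every number found in text as a float, in order of appearance."""
    # The group is non-capturing, so findall returns the whole match each time.
    return [float(token) for token in NUMBER_RE.findall(text)]

print(extract_numbers('300 gm 2.5 kgm and .75 more'))
```

An empty list comes back when the string contains no digits, so callers can check for that case before indexing into the result.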

5 Comments

Will this work for float numbers? I'm very new to the Python world and I have no clue what the d+ stands for. Thanks for your help :)
You can read up on the different regex expressions here but '\d+' means one or more digits (the + means one or more).
@KaRa '\d' stands for 'any decimal digit', and '+' stands for 'match 1 or more repetitions'. For more details have a look at docs.python.org/2/library/re.html.
@KaRa You should learn about regular expressions. For example, after import re, the call re.findall("[a-z]", "abcccd ff") returns all lowercase letters (a-z) in the string "abcccd ff", one by one.
Another approach, but only if you have one number in your string, is int(''.join(filter(str.isdigit, s))). This gives 90 for ' 90 Degrees F', 120 for '$120', 200 for '200 kgm' and 300 for '300 gm'. It breaks with several numbers, because the digits are simply concatenated: '300 gm 90' gives 30090.
32

Without using regex, you can just do:

def get_num(x):
    return int(''.join(ele for ele in x if ele.isdigit()))

Result:

>>> get_num(x)
120
>>> get_num(y)
90
>>> get_num(banana)
200
>>> get_num(orange)
300

EDIT :

Answering the follow up question.

If we know that the only period in a given string is the decimal point, extracting a float is quite easy:

def get_num(x):
    return float(''.join(ele for ele in x if ele.isdigit() or ele == '.'))

Result:

>>> get_num('dfgd 45.678fjfjf')
45.678
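
The assumption about a single period matters: once a stray period or a second number appears, the joined characters no longer form a valid float. The sketch below repeats get_num from above and adds a hypothetical wrapper, get_num_or_none (a name invented here), that returns None instead of raising in those cases.

```python
def get_num(x):
    # Same approach as above: keep digits and periods, then convert.
    return float(''.join(ch for ch in x if ch.isdigit() or ch == '.'))

def get_num_or_none(x):
    # Hypothetical wrapper: return None when the kept characters
    # do not form a single valid number.
    try:
        return get_num(x)
    except ValueError:
        return None

print(get_num_or_none('dfgd 45.678fjfjf'))   # a single number
print(get_num_or_none('approx. 45.678 kg'))  # kept characters: '.45.678'
print(get_num_or_none('1.5 kg and 2.5 kg'))  # kept characters: '1.52.5'
```

Note that the wrapper cannot catch every problem: two integers in one string, such as '300 gm 90', are still silently joined into a single value.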

2 Comments

Awesome! Is there any way to edit this so that it also works for floats? Is it correct to change the return statement to float?
This won't work if there are multiple numbers
3

This regular expression handles floats as well:

import re
re_float = re.compile(r'\d*\.?\d+')

You could also add a group to the expression that catches your weight units.

re_banana = re.compile(r'(?P<number>\d*\.?\d+)\s?(?P<uni>[a-zA-Z]+)')

You can access the named groups like this: re_banana.match("200 kgm").group('number').

I think this should help you get started.
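
As a rough illustration of how the named groups might be used, here is a small sketch built around the same pattern. The helper parse_weight is a made-up name, and using search instead of match is a choice made here so that a leading space does not prevent a match.

```python
import re

# Same pattern as in the answer: a number, optional whitespace, then letters.
re_banana = re.compile(r'(?P<number>\d*\.?\d+)\s?(?P<uni>[a-zA-Z]+)')

def parse_weight(text):
    """Hypothetical helper: return (value, unit), or None if nothing matches."""
    m = re_banana.search(text)  # search() scans past any leading characters
    if m is None:
        return None
    return float(m.group('number')), m.group('uni')

print(parse_weight('200 kgm'))
print(parse_weight(' 300 gm'))
print(parse_weight('$120'))  # no letters after the number, so no match
```

Because the pattern requires letters after the number, a string like '$120' yields None here; a prefix unit would need a second pattern or an optional leading group.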

2 Comments

This does not handle scientific notation or many other interesting floating point values (e.g. nan, inf)
banana = '200 kgm'
orange = '300 gm'
banana = int(banana.replace('kgm', ' '))
orange = int(orange.replace('gm', ' '))
print(banana)
print(orange)
total_weight = banana + orange / 1000
print(total_weight)
1
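>>> x = '$120'
>>> import string
>>> # Python 3: the third argument of str.maketrans lists the characters to delete
>>> non_digits = ''.join(c for c in map(chr, range(256)) if c not in string.digits)
>>> int(x.translate(str.maketrans('', '', non_digits)))
120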

2 Comments

This doesn't work... try it for yourself
@jaymylak Thanks for pointing it out. Rectified.
0

If you're doing some sort of math with the numbers you might also want to know the units. Given your input restrictions (that the input string contains unit and value only), this should correctly return both (you'll just need to figure out how to convert units into common units for your math).

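import re

def unit_value(text):
    # Split into leading non-digits, the number, and trailing non-digits.
    m = re.match(r'([^\d]*)(\d*\.?\d+)([^\d]*)', text)
    if m is None:
        # No digits at all, so there is nothing to convert.
        raise ValueError('no number found in %r' % (text,))
    prefix, number, suffix = m.groups()
    return ' '.join((prefix, suffix)).strip(), float(number)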
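
The remaining step, converting to a common unit before doing the math, could look something like the sketch below. The conversion table TO_KG and the helper weight_in_kg are assumptions made for illustration; 'kgm' and 'gm' are simply the spellings used in the question, and unit_value is repeated so the block runs on its own.

```python
import re

def unit_value(text):
    # Split into leading non-digits, the number, and trailing non-digits.
    m = re.match(r'([^\d]*)(\d*\.?\d+)([^\d]*)', text)
    if m is None:
        raise ValueError('no number found in %r' % (text,))
    prefix, number, suffix = m.groups()
    return ' '.join((prefix, suffix)).strip(), float(number)

# Assumed conversion factors to kilograms (hypothetical table).
TO_KG = {'kgm': 1.0, 'gm': 0.001}

def weight_in_kg(text):
    unit, value = unit_value(text)
    return value * TO_KG[unit]  # raises KeyError for an unknown unit

total_weight = weight_in_kg('200 kgm') + weight_in_kg('300 gm')
print(total_weight)  # roughly 200.3, subject to floating-point rounding
```

Keeping the factors in a dictionary means a new unit only needs one more entry, and an unrecognised unit fails loudly rather than being added as if it were kilograms.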
