2

I have a string that contains digits and other characters.
I need to find every complete number in that string that contains the digit sequence 467033, e.g. 1.467033777777777

Thanks

2
  • Could you post a few lines of real data, so that we can give you a more relevant answer? Commented Mar 22, 2011 at 23:00
  • Thanks very very much, but I did it! Commented Mar 22, 2011 at 23:03

3 Answers

2

Try this:

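import re

# Match numbers that have a decimal point, e.g. 135.13508 or .5
RE_NUM = re.compile(r'(\d*\.\d+)')

text = 'eghwodugo83o135.13508yegn1.4670337777777773u87208t'
for num in RE_NUM.findall(text):
    # Keep only the numbers that contain the digit sequence
    if '467033' in num:
        print(num)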

Prints:

1.4670337777777773

Generalized / optimized in response to comment:

def find(text, numbers):
    # One alternative per digit sequence, allowing digits or dots on either side
    pattern = '|'.join(r'[\d.]*%s[\d.]*' % re.escape(n) for n in numbers)
    re_num = re.compile(pattern)
    return [m.group() for m in re_num.finditer(text)]

print(find(text, ['467033', '13']))

Prints:

['135.13508', '1.4670337777777773']
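
One caveat with the [\d.]* pattern is that it also accepts runs such as 1..13, which are not really numbers. Below is a minimal sketch of a stricter variant. It assumes that a "number" means digits with an optional fractional part; the names NUMBER and find_numbers_containing are made up for this example and are not part of the answer above.

```python
import re

# Assumption: a number is one or more digits, optionally followed by
# a decimal point and more digits (e.g. 12 or 1.25).
NUMBER = re.compile(r'\d+(?:\.\d+)?')

def find_numbers_containing(text, needles):
    """Return the well-formed numbers in text that contain any digit string in needles."""
    return [m.group() for m in NUMBER.finditer(text)
            if any(n in m.group() for n in needles)]

text = 'eghwodugo83o135.13508yegn1.4670337777777773u87208t'
print(find_numbers_containing(text, ['467033', '13']))
```

This goes back to filtering with "in" after the regex pass, so it trades some speed for a tighter definition of what counts as a number.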

4 Comments

This is a decent example but possibly costly depending on the dataset. Since you're already scanning the string with re, you might as well bake the sentinel value into the pattern.
Yep, I went for simplicity over performance - just updated the answer with something more general, to search for multiple numbers in one pass.
I don't understand jathanism's remark. In the first snippet, I would write: if '467033' in num.replace('.','') so that the decimal point is ignored. The second snippet doesn't allow that. But maybe that isn't needed, depending on what the asker wants.
I meant that it could be costly because of the overhead of (1) parsing the text with the regex and then (2) iterating over the matches to run an "in" membership test on each one. Whether or not this matters depends on the size of the dataset you are working with.
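
The num.replace('.','') idea from the comments could look roughly like the sketch below, which also matches when the digit sequence straddles the decimal point. The helper name find_ignoring_point and the sample string are assumptions made up for illustration, and whether the asker actually wants this behaviour is not clear from the question.

```python
import re

# Digits with an optional decimal point somewhere inside, e.g. 46.7033, .5 or 12
NUMBER = re.compile(r'\d*\.?\d+')

def find_ignoring_point(text, needle):
    """Return numbers whose digits, with the decimal point removed, contain needle."""
    return [num for num in NUMBER.findall(text)
            if needle in num.replace('.', '')]

sample = 'x46.7033y 1.467033z 46703.3'  # made-up test data
print(find_ignoring_point(sample, '467033'))
```

With this sample all three numbers are returned, because 46.7033 and 46703.3 both read 467033 once the point is dropped.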
1

If you're just searching for a substring within another string, you can use in:

>>> sub_num = "467033"
>>> my_num = "1.467033777777777"
>>> sub_num in my_num
True

However, I suspect there's more to your problem than just searching strings, and that doing it this way might not be optimal. Can you be more specific about what you're trying to do?

2 Comments

I have a lot of data from physics computations. I noticed something odd: numbers that contain the digit sequence 467033 might be very informative to me.
So, given a string, I want to find all the numbers that contain a given digit sequence, e.g. 7
1
>>> import re
>>> a = 'e.g. 1.467033777777777\nand also 576575567467033546.90 Thanks '
>>> r = re.compile('[0-9.]*467033[0-9.]*')
>>> r.findall(a)
['1.467033777777777', '576575567467033546.90']

1 Comment

Well, I was assuming that you needed to match it either before or after the decimal point. The easiest thing would be to write several parallel regexes, and Python should compile them optimally anyway.
