1

How can I extract a number from a string in Python without using regex? I have seen isinstance, but the number could be almost any value. Any ideas?

https://www.investopedia.com/articles/retirement/?page=6

  • Well, you're just reading from a query string in that case... parse it out and read the page parameter. Commented Sep 19, 2018 at 23:34
  • urlparse documentation for parsing URLs. Commented Sep 19, 2018 at 23:36

5 Answers

2

It's a bit verbose, but I would use URL parsing for this. The advantages over regex are that you get some input validation for free and more readable code.

>>> from urllib.parse import urlparse, parse_qs
>>> url = 'https://www.investopedia.com/articles/retirement/?page=6'
>>> parsed = urlparse(url)
>>> query = parse_qs(parsed.query)
>>> [page] = query['page']
>>> int(page)
6
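If the `page` parameter may be missing, `query['page']` raises `KeyError`. A minimal sketch that returns a default instead; `get_page` is a hypothetical helper name, and it assumes that when `page` is repeated, the first value is the one you want:

```python
from urllib.parse import urlparse, parse_qs


def get_page(url, default=None):
    """Return the `page` query parameter as an int, or `default` if absent.

    Hypothetical helper: if `page` appears more than once, only the first
    value is used. A non-numeric value still raises ValueError from int().
    """
    query = parse_qs(urlparse(url).query)  # e.g. {'page': ['6']}
    values = query.get('page')
    if not values:
        return default
    return int(values[0])


print(get_page('https://www.investopedia.com/articles/retirement/?page=6'))  # 6
print(get_page('https://www.investopedia.com/articles/retirement/'))         # None
```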

2

You can extract contiguous groups of digits anywhere in the string using itertools.groupby:

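from itertools import groupby

url = 'https://www.investopedia.com/articles/retirement/?page=6&limit=10&offset=15'
# groupby splits the URL into runs of digit / non-digit characters;
# keep only the digit runs and convert each one to an int.
print([int(''.join(group)) for key, group in groupby(url, key=str.isdigit) if key])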

Output

[6, 10, 15]
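One caveat: this collects every run of digits, including any in the host name. A sketch wrapping the same groupby idea in a function makes that visible; `extract_ints` is a hypothetical name, and it uses `str.isdecimal` instead of `str.isdigit` because some characters (such as superscript digits) pass `isdigit` but are rejected by `int()`:

```python
from itertools import groupby


def extract_ints(text):
    """Return every contiguous run of decimal digits in `text` as an int."""
    # groupby yields alternating runs of decimal / non-decimal characters;
    # keep the decimal runs and convert each one to int.
    return [int(''.join(run))
            for is_decimal, run in groupby(text, key=str.isdecimal)
            if is_decimal]


print(extract_ints('www.fake888.com/article/?article=123&page=9&group=8'))
# [888, 123, 9, 8]
```

If only query values matter, parsing the query string first avoids picking up digits from the domain.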


1

This assumes that there aren't multiple blocks of digits in the URL (e.g. www.something212.com/page=?13).

You could try using a list comprehension and str.isdigit():

url = 'https://www.investopedia.com/articles/retirement/?page=6'

# Keep every digit character, then join them into a single number
digits = [d for d in url if d.isdigit()]
number = int(''.join(digits))

print(number)
# 6

Edit: now works with numbers above 9.
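The assumption stated at the top of this answer matters: every digit in the string is concatenated, so digits elsewhere in the URL are merged into the result. A small sketch showing this; `join_all_digits` is a hypothetical name for the same list-comprehension approach:

```python
def join_all_digits(url):
    """Concatenate every digit character in `url` into one string."""
    return ''.join(c for c in url if c.isdigit())


print(join_all_digits('https://www.investopedia.com/articles/retirement/?page=12'))  # 12
print(join_all_digits('www.something212.com/?page=13'))  # 21213, not 13
```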

15 Comments

What happens if that 6 is 12?
digits would produce ['1', '2']. You could join them using number = ''.join(map(str, digits))
I know, I'm saying why not address that in your answer?
You could just ''.join(digits), since you already know what's in there.
@vash_the_stampede I’m gonna have to agree with you. This is a pretty good bit of code.
1

If the URL always has that format, with digits only at the end, you could do this:

s = 'https://www.investopedia.com/articles/retirement/?page=25'
# Collect the digit characters and join them back into a string
new = [c for c in s if c.isdigit()]
print(''.join(new))

$ python3.7 isdigit.py
25
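Since this answer assumes the digits are at the end, a variant that reads only the trailing run of digits will not be thrown off by digits elsewhere in the URL. A sketch under that same assumption; `trailing_int` is a hypothetical helper name:

```python
def trailing_int(s):
    """Return the run of decimal digits at the very end of `s` as an int.

    Raises ValueError if `s` does not end with a digit.
    """
    i = len(s)
    # Walk backwards while the characters are decimal digits.
    while i > 0 and s[i - 1].isdecimal():
        i -= 1
    if i == len(s):
        raise ValueError(f'no trailing digits in {s!r}')
    return int(s[i:])


print(trailing_int('https://www.investopedia.com/articles/retirement/?page=25'))  # 25
print(trailing_int('www.something212.com/?page=13'))  # 13
```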


0

I know you don't want to use re, but it is very powerful, and many libraries rely on it under the hood. Here is my solution for this situation:

import re

url = "www.fake888.com/article/?article=123&page=9&group=8"

# Every run of digits that directly follows an '='
numbers = re.findall(r'(?<==)(\d+)', url)
print(f'Found: {" ".join(numbers)}')

# (name, value) pairs for every name=digits parameter
varval = re.findall(r'(\w+)=(\d+)', url)
urldict = dict(varval)

print(urldict)

The output is

Found: 123 9 8
{'article': '123', 'page': '9', 'group': '8'}
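Since the question asked to avoid regex, the same dictionary can also be built with the standard library's query-string parser. A sketch, assuming the query uses the usual `key=value&...` form; `int_params` is a hypothetical name:

```python
from urllib.parse import urlparse, parse_qsl

url = "www.fake888.com/article/?article=123&page=9&group=8"

# urlparse still splits off the query even without a scheme;
# parse_qsl turns "a=1&b=2" into [('a', '1'), ('b', '2')].
pairs = parse_qsl(urlparse(url).query)
urldict = dict(pairs)
print(urldict)  # {'article': '123', 'page': '9', 'group': '8'}

# Convert only the values that are plain decimal numbers.
int_params = {k: int(v) for k, v in pairs if v.isdecimal()}
print(int_params['page'])  # 9
```

Unlike the regex, this does not match the 888 in the host name, because only the query string is parsed.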

1 Comment

I’m gonna have to check that out! Thanks for the link. I really have wanted to learn regex for a while because of its power. 💪
