My input string contains several entities of this form: conn_type://host:port/schema#login#password

I want to find all of them using regex in Python.

So far, I am able to find the parts one by one, like this:

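import re

# test_string holds the input described above
conn_type = re.search(r'[a-zA-Z]+', test_string)
if conn_type:
    print("conn_type:", conn_type.group())
    # Continue searching from where the previous match ended
    next_substr_len = conn_type.end()
    host = re.search(r'[^:/]+', test_string[next_substr_len:])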

and so on.

Is there a way to do this without a chain of if/else checks? I expect there is, but I have not been able to find it. Please note that the regex for each entity is different.

Please help; I would rather not write such repetitive code.

Could you add a bit of actual input to the question and expected matches? Commented Feb 2, 2017 at 6:05

3 Answers

Why don't you use re.findall?

Here is an example:

import re

s = 'conn_type://host:port/schema#login#password asldasldasldasdasdwawwda conn_type://host:port/schema#login#email'

def get_all_matches(s):
    # Raw string, so the pattern reaches the regex engine unchanged
    matches = re.findall(r'[a-zA-Z]+_[a-zA-Z]+:/+[a-zA-Z]+:+[a-zA-Z]+/+[a-zA-Z]+#+[a-zA-Z]+#[a-zA-Z]+', s)
    return matches

print(get_all_matches(s))

This returns a list of every substring that matches the regex, which in this case is:

['conn_type://host:port/schema#login#password', 'conn_type://host:port/schema#login#email']

If you need help making regex patterns in Python I would recommend using the following website:

A pretty neat online regex tester

Also check the re module's documentation for more on re.findall

Documentation for re.findall

Hope this helps!

>>> import re
>>> uri = "conn_type://host:port/schema#login#password"
>>> res = re.findall(r'(\w+)://(.*?):([A-Za-z0-9]+)/(\w+)#(\w+)#(\w+)', uri)
>>> res
[('conn_type', 'host', 'port', 'schema', 'login', 'password')]

No need for ifs. Each pair of parentheses is a capturing group, so findall returns one tuple of groups per match. Use findall or finditer to search through your collection of connection types. Filter the list of tuples as needed.
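As a rough sketch of the finditer variant: named groups let each match come back as a dict instead of a positional tuple. The pattern below and the names CONN_RE and parse_connections are my own assumptions for illustration; adjust each sub-pattern to whatever your entities really look like.

```python
import re

# Hypothetical pattern: one named group per entity.
CONN_RE = re.compile(
    r'(?P<conn_type>\w+)://'
    r'(?P<host>[^:/\s]+):'
    r'(?P<port>[A-Za-z0-9]+)/'
    r'(?P<schema>\w+)#'
    r'(?P<login>\w+)#'
    r'(?P<password>\w+)'
)

def parse_connections(text):
    """Return one dict per connection string found in text."""
    return [m.groupdict() for m in CONN_RE.finditer(text)]

sample = ('conn_type://host:port/schema#login#password junk '
          'conn_type://host:port/schema#login#email')
for conn in parse_connections(sample):
    print(conn['conn_type'], conn['host'], conn['password'])
```

Filtering then becomes an ordinary list comprehension over the dicts, for example keeping only the entries whose 'host' has a particular value.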

1 Comment

Thanks Gregory. I had not been able to find how to group the parts of a match with (). It worked for me. Really, really helpful.

If you prefer to do it yourself, consider writing a tokenizer. It is a very elegant, Pythonic solution.
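A minimal sketch of what such a tokenizer could look like, assuming the only separators are '://', ':', '/' and '#'. The names TOKEN_SPEC and tokenize are made up for this example, and the token patterns are guesses about your input:

```python
import re

# Hypothetical token definitions: (name, pattern), tried in this order.
TOKEN_SPEC = [
    ('SEP', r'://|[:/#]'),    # separators between the entities
    ('WORD', r'[^:/#\s]+'),   # anything that is not a separator or whitespace
]
TOKEN_RE = re.compile('|'.join('(?P<%s>%s)' % pair for pair in TOKEN_SPEC))

def tokenize(text):
    """Yield (token_name, token_text) pairs in order of appearance."""
    for m in TOKEN_RE.finditer(text):
        yield m.lastgroup, m.group()

tokens = list(tokenize('conn_type://host:port/schema#login#password'))
words = [value for kind, value in tokens if kind == 'WORD']
print(words)
```

The WORD tokens come out in the order conn_type, host, port, schema, login, password, so they can be unpacked directly, while the SEP tokens can be used to check that the input has the expected shape.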

Or use the standard library: https://docs.python.org/3/library/urllib.parse.html. Note, however, that your sample URL is not fully valid: 'conn_type' is not a valid scheme (underscores are not allowed in scheme names), and it contains two '#' fragment delimiters, so urlparse won't work as expected. For real-life URLs, though, I highly recommend this approach.
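For illustration, here is roughly how that looks with a well-formed URL. The URL below is a hypothetical stand-in that carries the same six pieces of information in standard positions; it is not the format from the question.

```python
from urllib.parse import urlsplit

# Hypothetical well-formed URL: scheme://login:password@host:port/schema
url = 'postgres://login:password@host:5432/schema'
parts = urlsplit(url)

print(parts.scheme)             # connection type
print(parts.hostname)           # host
print(parts.port)               # port, parsed as an int
print(parts.username)           # login
print(parts.password)           # password
print(parts.path.lstrip('/'))   # schema
```

This only helps if you control the input format; for strings shaped like the one in the question, a regex remains the simpler option.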

1 Comment

Writing a tokenizer for this task is absolute overkill; a simple re.findall is a more elegant approach.
