1

I have a file with rows like these:

From [email protected] Fri Jan  4 06:08:27 2008
Received: (from apache@localhost)
Return-Path: <[email protected]>
for <[email protected]>;

I am trying to read each line and use a regular expression to extract the domain name, i.e. the portion after the @ sign. Here is the code I wrote:

if re.search('[@]\S+?', line) : org = re.findall('@(\S+)',line)[0]

But it returns the following results:

uct.ac.za
localhost)
collab.sakaiproject.org>
collab.sakaiproject.org>;

Is there a clean way to keep only the domain, without the ')', '>' or '>;' that follows it?

2 Answers

3

Slight correction: an FQDN can also include digits, so the character class needs to allow them as well as letters, dots and hyphens:

[@][a-zA-Z0-9.-]+

The full domain rules are described at https://en.wikipedia.org/wiki/Uniform_Resource_Locator
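As a minimal sketch of how that pattern could be applied, the snippet below wraps it in a capture group so only the part after the @ is returned. The sample lines reuse the domains from the question, but the local parts ("user", "owner") and the helper name extract_domain are made up for illustration.

```python
import re

# Character class from the answer: letters, digits, dots and hyphens.
# The parentheses capture only the part after the '@'.
DOMAIN_RE = re.compile(r"@([a-zA-Z0-9.-]+)")

# Hypothetical sample lines; the local parts are invented.
lines = [
    "From user@uct.ac.za Fri Jan  4 06:08:27 2008",
    "Received: (from apache@localhost)",
    "Return-Path: <owner@collab.sakaiproject.org>",
    "for <owner@collab.sakaiproject.org>;",
]

def extract_domain(line):
    """Return the first domain found after an '@', or None if there is none."""
    m = DOMAIN_RE.search(line)
    return m.group(1) if m else None

domains = [extract_domain(line) for line in lines]
print(domains)
# ['uct.ac.za', 'localhost', 'collab.sakaiproject.org', 'collab.sakaiproject.org']
```

Note that because the class allows a dot, an address at the very end of a sentence would keep the trailing period, so some extra trimming may be needed for free-form text.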

2

Try a negated character class, [^\>\)\s]+, which matches everything after the @ up to the first '>', ')' or whitespace character:

m = re.search(r'@([^\>\)\s]+)', line)  # raw string; search once and reuse the match
if m: org = m.group(1)

Output:

uct.ac.za
localhost
collab.sakaiproject.org
collab.sakaiproject.org
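If a line can contain more than one address, the same negated class could be used with re.findall to collect every domain. This is only a sketch: the helper name find_domains and the sample addresses are hypothetical, and the pattern is the unescaped equivalent of the one above.

```python
import re

# Negated class: everything after '@' up to the first '>', ')' or whitespace.
NOT_DELIM_RE = re.compile(r"@([^>)\s]+)")

def find_domains(line):
    """Return all domains in the line, in order of appearance."""
    return NOT_DELIM_RE.findall(line)

# Hypothetical sample lines; the local parts are invented.
print(find_domains("for <owner@collab.sakaiproject.org>;"))
print(find_domains("Received: (from apache@localhost)"))
print(find_domains("cc: a@uct.ac.za b@collab.sakaiproject.org"))
```

A negated class only stops at the delimiters it lists, so other trailing punctuation such as a comma would still be captured. The explicit allow-list of letters, digits, dots and hyphens is stricter in that respect.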
