Python Regex to parse email URLs but excluding the public email

Question

I am parsing a file that has entries like:

xxx-yy.biz.  39405   A   156.154.66.33
mail.global.com.   3464    A   115.113.9.64
xyx xyx xyx
webmail.xyz.com.  1463    A   115.113.9.64
gmail.com.   3464    A   115.113.9.22

I am trying to extract each URL that contains the string "mail", along with its IP address:

for line in dnsfile:
    match = re.search(r'(.*mail.*?)\s+(.*)\s+A\s+(.*)', line)

match.group(1) then gives me the URL and match.group(3) the IP (match.group(2) is the number in the second column).

I want to extend this search so that it skips public email domains such as gmail, hotmail, and yahoo mail. More generally, I want to exclude a list of words from this search.

Plain regexes can't do this, but a "negative lookahead assertion" might help you. See stackoverflow.com/questions/2078915/… and stackoverflow.com/questions/1395177/…
– DrWatson, Commented Sep 24, 2015 at 21:31

Kasravnd · Accepted Answer · 2015-09-24 21:41:20Z

1

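You can use a negative lookahead. Anchor the pattern with ^ and $ so that the lookahead is tested at the start of the line. When you search one line at a time, as in your loop, no flag is needed; if you search a multi-line string instead, add re.MULTILINE so that the anchors match at the start and end of each line (re.DOTALL only makes . match newlines as well, it does not affect the anchors). You can build the negative lookahead by joining the list of words with |, escaping each word so that its dots are taken literally:

re.search(r'^(?!{})(.*mail.*?)\s+(.*)\s+A\s+(.*)$'.format('|'.join(map(re.escape, list_of_domin))), line)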

See demo https://regex101.com/r/bF5xQ3/1
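
To see the lookahead pattern work end to end, here is a minimal sketch run against the sample lines from the question. The EXCLUDED list and the function name parse_mail_records are assumptions made for illustration, not part of the original code.

```python
import re

# Sample lines copied from the question.
SAMPLE_LINES = [
    "xxx-yy.biz.  39405   A   156.154.66.33",
    "mail.global.com.   3464    A   115.113.9.64",
    "xyx xyx xyx",
    "webmail.xyz.com.  1463    A   115.113.9.64",
    "gmail.com.   3464    A   115.113.9.22",
]

# Assumed exclusion list; replace it with your own words.
EXCLUDED = ["gmail", "hotmail", "yahoo"]


def parse_mail_records(lines, excluded):
    """Return (name, ip) pairs for 'mail' lines that do not start with an excluded word."""
    # Escape each word so characters such as '.' are matched literally.
    alternatives = "|".join(re.escape(word) for word in excluded)
    # An empty lookahead "(?!)" can never succeed, so omit it when nothing is excluded.
    lookahead = "(?!{})".format(alternatives) if alternatives else ""
    pattern = re.compile("^" + lookahead + r"(.*mail.*?)\s+(.*)\s+A\s+(.*)$")
    results = []
    for line in lines:
        match = pattern.search(line.strip())
        if match:
            # group(2) is the number in the second column, so the IP is group(3).
            results.append((match.group(1), match.group(3)))
    return results


print(parse_mail_records(SAMPLE_LINES, EXCLUDED))
```

On the sample data this keeps mail.global.com. and webmail.xyz.com. and drops gmail.com. Because the lookahead sits right after ^, it only rejects lines that start with an excluded word; a name such as mail.yahoo.com. would still be returned.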

edited Sep 24, 2015 at 21:41

answered Sep 24, 2015 at 21:32

Kasravnd Over a year ago

@HasanRamezani Thanks, I think that look-around is a knight in regex world! ;-)

terbolous · Accepted Answer · 2015-09-24 21:45:12Z

0

If the exclusion does not have to be part of the regexp, you could do a simple list lookup instead:

import re

nothanks = ['gmail.com', 'hotmail.com']
for line in dnsfile:
    # The \. keeps the trailing dot out of group(1), so it compares cleanly.
    match = re.search(r'(.*mail.*?)\.\s+(.*)\s+A\s+(.*)', line)
    if match and match.group(1) not in nothanks:
        print(match.group(1))
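
As a variation on the lookup idea, here is a rough sketch that excludes by word instead of by full host name, which may be closer to "exclude a list of words". The names EXCLUDED_WORDS, is_public and mail_hosts are hypothetical, and splitting the host on dots is an assumption about what counts as a word.

```python
import re

# Same record pattern as in the loop above; group(1) has no trailing dot.
RECORD = re.compile(r"(.*mail.*?)\.\s+(.*)\s+A\s+(.*)")

# Assumed list of words to exclude; adjust to your needs.
EXCLUDED_WORDS = {"gmail", "hotmail", "yahoo"}


def is_public(host, excluded_words=EXCLUDED_WORDS):
    """True if any dot-separated label of the host is an excluded word."""
    return not excluded_words.isdisjoint(host.lower().split("."))


def mail_hosts(lines, excluded_words=EXCLUDED_WORDS):
    """Return (host, ip) pairs for 'mail' hosts that contain no excluded word."""
    pairs = []
    for line in lines:
        match = RECORD.search(line.strip())
        if match and not is_public(match.group(1), excluded_words):
            # group(3) is the IP address that follows the A record type.
            pairs.append((match.group(1), match.group(3)))
    return pairs
```

Comparing whole labels means webmail.xyz.com is kept even though it contains "mail", while a host with a yahoo label anywhere in it is dropped.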

answered Sep 24, 2015 at 21:45
