I am parsing a file that has entries like:

xxx-yy.biz.  39405   A   156.154.66.33
mail.global.com.   3464    A   115.113.9.64
xyx xyx xyx
webmail.xyz.com.  1463    A   115.113.9.64
gmail.com.   3464    A   115.113.9.22

I am trying to extract the URL and the IP address from entries that contain the string "mail":

for line in (dnsfile):
    match = re.search(r'(.*mail.*?)\s+(.*)\s+A\s+(.*)', line)

Here match.group(1) and match.group(3) give me the URL and the IP (match.group(2) is the numeric field between them).

I want to extend this search so that it skips public email providers such as gmail, hotmail and yahoo mail. More generally, I want to exclude a list of words from this search.

2 Answers

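You can use a negative lookahead anchored at the start of the line. Because you match one line at a time, no flag is needed for the anchors; re.MULTILINE (not re.DOTALL, which only changes what . matches) is the flag that makes ^ and $ match at each line if you search a multi-line string instead. Build the lookahead by joining the list of words with |, escaping each word so that it is matched literally. Note that this only rejects lines that start with one of the words:

re.search(r'^(?!{})(.*mail.*?)\s+(.*)\s+A\s+(.*)$'.format('|'.join(map(re.escape, list_of_domains))), line)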

See demo https://regex101.com/r/bF5xQ3/1
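For a runnable illustration of the lookahead approach, here is a minimal sketch run against the sample lines from the question. The names EXCLUDED, PATTERN and mail_hosts are hypothetical, and the pattern is tightened slightly (\S and \d instead of .*, and an optional trailing dot left out of the hostname), so treat it as one possible variant rather than the exact expression above.

```python
import re

# Hypothetical exclusion list, for illustration only.
EXCLUDED = ["gmail", "hotmail", "yahoo"]

# The negative lookahead right after ^ rejects lines whose hostname
# starts with an excluded word; re.escape keeps each word literal.
PATTERN = re.compile(
    r"^(?!(?:%s))(\S*mail\S*?)\.?\s+(\d+)\s+A\s+(\S+)\s*$"
    % "|".join(map(re.escape, EXCLUDED))
)


def mail_hosts(lines):
    """Return (hostname, ip) pairs for non-excluded hosts containing 'mail'."""
    results = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            # group(2) is the numeric field and is ignored here.
            results.append((match.group(1), match.group(3)))
    return results


sample = [
    "xxx-yy.biz.  39405   A   156.154.66.33",
    "mail.global.com.   3464    A   115.113.9.64",
    "xyx xyx xyx",
    "webmail.xyz.com.  1463    A   115.113.9.64",
    "gmail.com.   3464    A   115.113.9.22",
]

print(mail_hosts(sample))
# [('mail.global.com', '115.113.9.64'), ('webmail.xyz.com', '115.113.9.64')]
```

Because the lookahead sits directly after ^, a host such as smtp.gmail.com would still pass; excluding those needs a check on the whole hostname rather than on its prefix.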

1 Comment

@HasanRamezani Thanks, I think that look-around is a knight in regex world! ;-)

If it's not a requirement to have the exclusion as part of the regexp, you could match first and then do a simple list membership check:

import re

nothanks = ['gmail.com', 'hotmail.com']
for line in dnsfile:
    # The \. keeps the trailing dot of the name out of group(1)
    match = re.search(r'(.*mail.*?)\.\s+(.*)\s+A\s+(.*)', line)
    if match and match.group(1) not in nothanks:
        print(match.group(1))
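The membership check only rejects names that equal a list entry exactly, so a subdomain of a listed provider would still get through. As a hedged variant, the sketch below compares by domain suffix instead. PUBLIC_DOMAINS, RECORD, is_public and private_mail_hosts are hypothetical names, and the record pattern assumes the whitespace-separated "name TTL A ip" layout shown in the question.

```python
import re

# Hypothetical provider domains to skip, for illustration only.
PUBLIC_DOMAINS = ("gmail.com", "hotmail.com", "yahoo.com")

# Hostname containing "mail" (trailing dot dropped), a numeric field, "A", then the IP.
RECORD = re.compile(r"^(\S*mail\S*?)\.?\s+\d+\s+A\s+(\S+)\s*$")


def is_public(hostname):
    """True if hostname is a listed domain or one of its subdomains."""
    return any(
        hostname == domain or hostname.endswith("." + domain)
        for domain in PUBLIC_DOMAINS
    )


def private_mail_hosts(lines):
    """Return (hostname, ip) pairs for mail hosts outside PUBLIC_DOMAINS."""
    hosts = []
    for line in lines:
        match = RECORD.search(line)
        if match and not is_public(match.group(1)):
            hosts.append((match.group(1), match.group(2)))
    return hosts


records = [
    "webmail.xyz.com.  1463    A   115.113.9.64",
    "gmail.com.   3464    A   115.113.9.22",
    "smtp.gmail.com.   300    A   10.0.0.1",
]
print(private_mail_hosts(records))
```

Keeping the exclusion outside the regexp means the list can grow without rebuilding the pattern, at the cost of one extra check per matched line.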
