
I have a list of regex patterns.

rgx_list = ['pattern_1', 'pattern_2', 'pattern_3']

I am using a function that loops through the list, compiles each regex, and applies findall to grab the matched terms. I would then like a way of deleting those terms from the text.

import re

def clean_text(rgx_list, text):
    matches = []
    for r in rgx_list:
        rgx = re.compile(r)
        # collect the matches for this pattern
        found_matches = rgx.findall(text)
        matches.append(found_matches)

I want to do something like text.delete(matches) so that all of the matches will be deleted from the text and then I can return the cleansed text.

Does anyone know how to do this? My current code will only work for one match of each pattern, but the text may have more than one occurrence of the same pattern and I would like to eliminate all matches.

    Do you need those matches at all? Maybe it is easier to just re.sub the text first thing? Also, the order of patterns matters. You should see to that beforehand. Commented May 12, 2016 at 16:30

2 Answers


Use re.sub to replace matched patterns with an empty string. There is no need to find the matches separately first.

import re

def clean_text(rgx_list, text):
    new_text = text
    for rgx_match in rgx_list:
        # re.sub replaces every match of the pattern, not just the first
        new_text = re.sub(rgx_match, '', new_text)
    return new_text
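
As a quick check, here is a minimal sketch of the function above applied to a text with repeated matches. The sample patterns and text are made up for illustration. re.sub replaces every non-overlapping match by default (its count argument defaults to 0, meaning no limit), so repeated occurrences are all removed.

```python
import re

def clean_text(rgx_list, text):
    new_text = text
    for rgx_match in rgx_list:
        new_text = re.sub(rgx_match, '', new_text)
    return new_text

# Hypothetical patterns and text, for illustration only.
sample_patterns = [r'\d+', r'foo']
sample_text = 'foo 12 bar 345 foo'
cleaned = clean_text(sample_patterns, sample_text)
print(repr(cleaned))  # '  bar  ' (both 'foo's and both numbers are gone)

# Order matters because each pattern sees the output of the previous one.
print(repr(clean_text(['ab', 'abc'], 'abc')))  # 'c': 'ab' is removed first
print(repr(clean_text(['abc', 'ab'], 'abc')))  # '': 'abc' is removed first
```

If a longer pattern should take priority over a shorter one that overlaps with it, put the longer pattern earlier in the list.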


For simple regexes you can OR the expressions together using "|". There are examples of combining regexes with OR on Stack Overflow.

For really complex regexes I would loop through the list of regexes instead. A combined complex regex could lead to timeouts.
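
A minimal sketch of the OR approach, assuming the patterns are simple enough to combine. The name clean_text_combined and the sample data are hypothetical. Each pattern is wrapped in a non-capturing group so that any "|" inside one pattern stays scoped to that pattern.

```python
import re

def clean_text_combined(rgx_list, text):
    # Wrap each pattern in (?:...) so a "|" inside one pattern
    # does not leak into its neighbours.
    combined = re.compile('|'.join('(?:%s)' % p for p in rgx_list))
    # One pass over the text removes every match of any pattern.
    return combined.sub('', text)

# Hypothetical patterns and text, for illustration only.
print(repr(clean_text_combined([r'\d+', r'foo'], 'foo 12 bar 345 foo')))  # '  bar  '
```

Note that a single combined pass is not always equivalent to looping: the loop can match text that only comes together after an earlier deletion. For example, with the patterns 'b' and 'ac' applied to 'abc', the loop leaves an empty string while the combined regex leaves 'ac'. Numbered backreferences inside individual patterns may also need adjusting, since group numbers shift once the patterns are joined.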

1 Comment

Could you share some example or link?
