Finding a substring and deleting it using regex in Python

Question

I have a dataset that looks like this:

"See the new #Gucci 5th Ave NY windows customized by @troubleandrew for the debut of the #GucciGhost collection."
"Before the #GucciGhost collection debuts tomorrow, read about the artist @troubleandrew"

I am trying to get rid of every @ and the word attached to it. My dataset should end up looking something like this:

"See the new #Gucci 5th Ave NY windows customized by for the debut of the #GucciGhost collection."
"Before the #GucciGhost collection debuts tomorrow, read about the artist"

I can use a simple replace statement to get rid of the @, but the adjacent word is a problem.

I am using re to find the occurrences, but I am not able to delete the word.

P.S. If it were a single known word, it would not have been a problem. But there are many different words attached to @ throughout my dataset.

What is the problem you have? What code does not remove the @+word? Did you try re.sub?
– Wiktor Stribiżew, Commented Sep 15, 2016 at 9:07
My problem was that I was not able to remove the entire @+word. I was using re.findall. Anyway, re.sub works. Thanks.
– M PAUL, Commented Sep 15, 2016 at 10:25
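The difference the comments point at can be sketched as follows. This is a minimal illustration using one of the sample strings from the question; the variable names are only for the example. re.findall reports the matches without touching the string, while re.sub returns a new string with the matches replaced.

```python
import re

text = "Before the #GucciGhost collection debuts tomorrow, read about the artist @troubleandrew"

# findall only lists what matched; `text` itself is unchanged.
mentions = re.findall(r"@\S+", text)
print(mentions)

# sub builds a new string in which every match is replaced (here, by nothing).
cleaned = re.sub(r"@\S+", "", text)
print(repr(cleaned))
```

Note that the space before the removed word is still present in the result.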

LycuiD · Accepted Answer · 2016-09-15 09:15:34Z

2

You can use a regex with re.sub:

import re

a = [
    "See the new #Gucci 5th Ave NY windows customized by @troubleandrew for the debut of the #GucciGhost collection.",
    "Before the #GucciGhost collection debuts tomorrow, read about the artist @troubleandrew"
]
pat = re.compile(r"@\S+")  # "@" followed by one or more non-whitespace characters
for i in range(len(a)):
    a[i] = pat.sub("", a[i])  # replace each match with an empty string
print(a)

This removes each @ together with the word attached to it, although the space that preceded the word stays in the string.
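One thing to be aware of with \S+ is that it also consumes punctuation glued to the word. The sketch below uses a made-up sentence (not from the question's dataset) to compare it with \w+, which stops at the first character that is not a letter, digit, or underscore. Which behaviour is preferable depends on the data.

```python
import re

# Hypothetical sample, chosen so that a comma directly follows the mention.
text = "Thanks to @troubleandrew, the #GucciGhost windows are up."

# \S+ runs to the next whitespace, so the comma is removed as well.
greedy = re.sub(r"@\S+", "", text)
print(repr(greedy))

# \w+ stops before the comma, so the punctuation is kept.
wordy = re.sub(r"@\w+", "", text)
print(repr(wordy))
```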

answered Sep 15, 2016 at 9:15

LycuiD

Marcs · Answer · 2016-09-16 00:07:21Z

0

A more idiomatic version, which also removes the extra space before the word:

import re

a = [
    "See the new #Gucci 5th Ave NY windows customized by @troubleandrew for the debut of the #GucciGhost collection.",
    "Before the #GucciGhost collection debuts tomorrow, read about the artist @troubleandrew"
]

rgx = re.compile(r"\s?@\S+")  # optional leading whitespace, "@", then the attached word

b = [rgx.sub("", row) for row in a]  # build a new list of cleaned rows

print(b)

\s?: \s matches any whitespace character (such as ' ') and ? stands for zero or one occurrence, so the space before the @ is removed along with the word.
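As a rough sketch, the same pattern can be wrapped in a small helper. The name strip_mentions is hypothetical, and the final strip() is an added assumption: it handles a mention at the very start of a row, where there is no preceding space for \s? to consume and the following space would otherwise be left behind.

```python
import re

MENTION = re.compile(r"\s?@\S+")

def strip_mentions(text):
    """Remove every @word (and one preceding whitespace character) from text."""
    # strip() tidies up a leading space left when the row starts with a mention.
    return MENTION.sub("", text).strip()

rows = [
    "See the new #Gucci 5th Ave NY windows customized by @troubleandrew for the debut of the #GucciGhost collection.",
    "Before the #GucciGhost collection debuts tomorrow, read about the artist @troubleandrew",
]

for row in rows:
    print(strip_mentions(row))
```

On the two rows from the question this prints the strings the asker listed as the desired result.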

edited Sep 16, 2016 at 0:07

answered Sep 16, 2016 at 0:02

Marcs
