I have a data set that looks like this:

"See the new #Gucci 5th Ave NY windows customized by @troubleandrew for the debut of the #GucciGhost collection."
"Before the #GucciGhost collection debuts tomorrow, read about the artist @troubleandrew"

I am trying to get rid of every @ and the word attached to it. My data set should look something like this:

"See the new #Gucci 5th Ave NY windows customized by for the debut of the #GucciGhost collection."
"Before the #GucciGhost collection debuts tomorrow, read about the artist"

I can use a simple replace statement to get rid of the @ itself, but the adjacent word is the problem.

I am using re to find the occurrences, but I am not able to delete the matched word.

P.S. If it were a single known word, this would not be a problem. But many different words are attached to @ throughout my data set.

  • What is the problem you have? What code does not remove the @+word? Did you try re.sub? Commented Sep 15, 2016 at 9:07
  • My problem was that I was not able to remove the entire @+word. I was using re.findall. Anyway, re.sub works. Thanks. Commented Sep 15, 2016 at 10:25

2 Answers

You can use a regular expression with re.sub:

import re

a = [ 
"See the new #Gucci 5th Ave NY windows customized by @troubleandrew for the debut of the #GucciGhost collection.",
"Before the #GucciGhost collection debuts tomorrow, read about the artist @troubleandrew"
]
pat = re.compile(r"@\S+")  # "@" followed by one or more non-whitespace characters
for i in range(len(a)):
    a[i] = pat.sub("", a[i])  # replace each match with an empty string
print(a)  # print() with parentheses runs on both Python 2 and Python 3

This removes each @ together with the word attached to it, although the space that preceded the mention stays in the string.
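
If the leftover whitespace matters, one option is to collapse it afterwards. The following is only a sketch: it uses a shortened sample string rather than the full data set, and the names text, stripped and cleaned are illustrative.

```python
import re

# Shortened sample taken from the first row of the question's data.
text = "customized by @troubleandrew for the debut"

# Removing the mention leaves the spaces on both sides of it behind.
stripped = re.sub(r"@\S+", "", text)
print(repr(stripped))  # 'customized by  for the debut' (two spaces after "by")

# str.split() with no arguments splits on any run of whitespace,
# so joining with a single space collapses the gap.
cleaned = " ".join(stripped.split())
print(repr(cleaned))  # 'customized by for the debut'
```

Note that this also trims leading and trailing whitespace and flattens newlines, which may or may not be wanted for your data.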

A more idiomatic version, which also removes the space preceding each mention:

import re

a = [
    "See the new #Gucci 5th Ave NY windows customized by @troubleandrew for the debut of the #GucciGhost collection.",
    "Before the #GucciGhost collection debuts tomorrow, read about the artist @troubleandrew"
]

rgx = re.compile(r"\s?@\S+")

b = [rgx.sub("", row) for row in a]  # build a new list; a is left unchanged

print(b)  # print() with parentheses runs on both Python 2 and Python 3

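\s? : \s matches any single whitespace character (a space, tab or newline, not only ' '), and ? means zero or one occurrence, so a mention at the very start of a string is still matched.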
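
One caveat with \S+ is that it also consumes punctuation directly attached to the mention, such as a trailing period. A possible variant, sketched here under the assumption that the handles in your data consist only of letters, digits and underscores, is to match \w+ instead. The helper name strip_mentions and the sample strings are hypothetical.

```python
import re

# Assumption: a handle is "@" plus word characters only (letters, digits, "_").
MENTION = re.compile(r"\s?@\w+")


def strip_mentions(text):
    """Remove each @handle and one preceding whitespace character, if any."""
    return MENTION.sub("", text)


sample = "read about the artist @troubleandrew."

print(strip_mentions(sample))            # read about the artist.
print(re.sub(r"\s?@\S+", "", sample))    # read about the artist
```

The trade-off is that \w+ will also cut into anything else containing an @ followed by word characters, such as an email address, so check it against your data before relying on it.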
