I have a file with millions of retweets – like this:

RT @Username: Text_of_the_tweet

I just need to extract the username from this string. Since I know next to nothing about regex, I was advised here some time ago to use

username = re.findall('@([^:]+)', retweet)

This works great for the most part, but sometimes I get lines like this:

RT @ReutersAero: Further pictures from the #MH17 crash site in  in Grabovo, #Ukraine #MH17 - @reuterspictures (GRAPHIC): http://t.co/4rc7Y4…

I only need "ReutersAero" from the string, but since it contains another "@" and ":" it messes up the regex, and I get this output:

['ReutersAero', 'reuterspictures (GRAPHIC)']

Is there a way to use the regex only for the first instance it finds in the string?

2 Answers

You can use a regex like this:

RT @(\w+):

Match information:

MATCH 1
1.  [4-15]  `ReutersAero`
MATCH 2
1.  [145-156]   `AnotherAero`

You can use this Python code:

import re
# Capture the username that directly follows "RT @" and ends at the next ":".
p = re.compile(r'RT @(\w+):')
test_str = "RT @ReutersAero: Further pictures from the #MH17 crash site in  in Grabovo, #Ukraine #MH17 - @reuterspictures (GRAPHIC): http://t.co/4rc7Y4…\nRT @AnotherAero: Further pictures from the #MH17 crash site in  in Grabovo, #Ukraine #MH17 - @reuterspictures (GRAPHIC): http://t.co/4rc7Y4…\n"

# findall returns the captured group of every match: ['ReutersAero', 'AnotherAero']
print(p.findall(test_str))
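
Since the question mentions a file with millions of retweets, the same pattern could also be applied one line at a time. This is only a rough sketch: the function name extract_usernames and the sample lines are made up for illustration, and it assumes every retweet sits on its own line and starts with "RT @". Using match() instead of findall() means the pattern is only tried at the start of each line, so a later "@...:" in the tweet text is never picked up.

```python
import re

# Same pattern as in the answer; match() only tries it at the start of the string.
RT_PATTERN = re.compile(r'RT @(\w+):')

def extract_usernames(lines):
    """Yield the retweeted username for each line that starts with 'RT @name:'."""
    for line in lines:
        m = RT_PATTERN.match(line)
        if m:  # lines that are not retweets are skipped
            yield m.group(1)

# Hypothetical sample data standing in for the lines of the real file.
sample = [
    "RT @ReutersAero: Further pictures - @reuterspictures (GRAPHIC): link",
    "RT @AnotherAero: some text",
    "not a retweet",
]
print(list(extract_usernames(sample)))
```

Any iterable of strings works here, so an open file object could be passed in place of the sample list.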

Is there a way to use the regex only for the first instance it finds in the string?

Do not use re.findall; use re.search, which stops at the first match it finds in the string.
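
A minimal sketch of the search-based approach, keeping the pattern from the question. The helper name first_mention and the shortened sample tweet are illustrative, not from the original post. re.search returns a match object for the first match only, or None when nothing matches, so the result needs a check before calling group().

```python
import re

def first_mention(retweet):
    """Return the text between the first '@' and the next ':', or None."""
    # search() stops at the first match instead of collecting all of them.
    m = re.search(r'@([^:]+)', retweet)
    return m.group(1) if m else None

# Shortened version of the problematic tweet from the question.
retweet = "RT @ReutersAero: Further pictures - @reuterspictures (GRAPHIC): http://t.co/4rc7Y4…"
print(first_mention(retweet))
```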
