3

I am trying to write a Python program that extracts the artist's name from a Pandora tweet. For example, given this tweet:

I'm listening to "I Can Make It Better" by Luther Vandross on Pandora #pandora http://t.co/ieDbLC393F.

I would like to get only the name Luther Vandross back. I do not know much about regex, so I tried the following code:

print(re.findall(r'".+?" by [\w+]+', text))

But the result was "I Can Make It Better" by Luther.

Do you have any idea how I could write a regular expression in Python to get it?

5 Answers

3

Your regex is close. You can anchor on the delimiters " by and on instead, and wrap the part you want in parentheses so that it becomes a capturing group.

You can use a regex like this:

" by (.+?) on

The idea behind this regex is to capture the content between " by and on, using a simple non-greedy match.

Match information

MATCH 1
1.  [43-58] `Luther Vandross`

Code

import re

p = re.compile(r'" by (.+?) on')
test_str = "I'm listening to \"I Can Make It Better\" by Luther Vandross on Pandora #pandora http://t.co/ieDbLC393F.\n"

# group(1) holds the text captured by the parentheses
m = p.search(test_str)
print(m.group(1))
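
To see what the parentheses change, here is a small sketch that runs the same pattern with and without a capturing group through re.findall, as in the question. The variable names are only illustrative.

```python
import re

text = ('I\'m listening to "I Can Make It Better" by Luther Vandross '
        'on Pandora #pandora http://t.co/ieDbLC393F.')

# Without a group, findall returns the whole match, delimiters included.
whole = re.findall(r'" by .+? on', text)

# With a group, findall returns only what the parentheses captured.
artists = re.findall(r'" by (.+?) on', text)

print(whole)    # ['" by Luther Vandross on']
print(artists)  # ['Luther Vandross']
```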
1 Comment

Thank you for your help =), I was having some difficulties understanding on how regex works, but this answer made it way more clear.
2
>>> s = '''I'm listening to "I Can Make It Better" by Luther Vandross on Pandora #pandora http://t.co/ieDbLC393F.'''

>>> import re
>>> m = re.search('to "?(.*?)"? by (.*?) on #?Pandora', s)
>>> m
<_sre.SRE_Match object; span=(14, 69), match='to "I Can Make It Better" by Luther Vandross on P>
>>> m.groups()
('I Can Make It Better', 'Luther Vandross')

More test cases:

>>> tests = [
    '''I'm listening to "Don't Turn Out The Lights (D.T.O.T.L.)" by NKOTBSB on #Pandora''',
    '''I'm listening to G.O.D. Remix by Canton Jones on #Pandora''',
    '''I'm listening to "It's Been Awhile" by @staindmusic on Pandora #pandora http://pdora.co/R1OdxE''',
    '''I'm listening to "Everlong" by @foofighters on #Pandora http://pdora.co/1eANfI0''',
    '''I'm listening to "El Preso (2000)" by Fruko Y Sus Tesos on #Pandora http://pdora.co/1GtOHC1''',
    '''I'm listening to "Cat Daddy" by Rej3ctz on #Pandora http://pdora.co/1eALNpc''',
    '''I'm listening to "Space Age Pimpin'" by 8 Ball & MJG on Pandora #pandora http://pdora.co/1h8swun'''
]
>>> expr = re.compile('to "?(.*?)"? by (.*?) on #?Pandora')
>>> for s in tests:
        print(expr.search(s).groups())

("Don't Turn Out The Lights (D.T.O.T.L.)", 'NKOTBSB')
('G.O.D. Remix', 'Canton Jones')
("It's Been Awhile", '@staindmusic')
('Everlong', '@foofighters')
('El Preso (2000)', 'Fruko Y Sus Tesos')
('Cat Daddy', 'Rej3ctz')
("Space Age Pimpin'", '8 Ball & MJG')
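
Note that expr.search(s).groups() raises an AttributeError when a tweet does not fit the pattern, because search returns None in that case. A minimal sketch of a safer wrapper follows; the function name parse_tweet and the group names title and artist are my own additions, not part of the expression above.

```python
import re

# Same expression as above, with named groups added for readability.
TWEET_RE = re.compile(r'to "?(?P<title>.*?)"? by (?P<artist>.*?) on #?Pandora')


def parse_tweet(tweet):
    """Return (title, artist), or None if the tweet does not fit the pattern."""
    m = TWEET_RE.search(tweet)
    if m is None:
        return None
    return m.group('title'), m.group('artist')


print(parse_tweet('''I'm listening to "Everlong" by @foofighters on #Pandora'''))
print(parse_tweet('Just had lunch'))
```

The first call prints the title and artist as a tuple; the second prints None instead of failing.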

1 Comment

I scanned the #Pandora hashtag on Twitter for a few more examples and adjusted the expression to make it work with all those patterns.
2

You need to use a capturing group.

print(re.findall(r'"[^"]*" by ([A-Z][a-z]+(?: [A-Z][a-z]+){0,2})', text))

I used a repetition quantifier because the name may consist of a first name only, a first and last name, or a first, middle, and last name.
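
As a rough illustration of how the {0,2} quantifier behaves, the sketch below runs the pattern on a few made-up sample strings (the song title "Song" is a placeholder). It also shows a limitation: a name that does not start with a capital letter followed by lowercase letters, such as a Twitter handle, is not matched.

```python
import re

NAME_RE = re.compile(r'"[^"]*" by ([A-Z][a-z]+(?: [A-Z][a-z]+){0,2})')

# Hypothetical inputs with one-, two-, and three-word names, plus a handle.
samples = [
    '"Song" by Adele on Pandora',
    '"Song" by Luther Vandross on Pandora',
    '"Song" by Stevie Ray Vaughan on Pandora',
    '"Song" by @foofighters on Pandora',
]

results = []
for s in samples:
    m = NAME_RE.search(s)
    results.append(m.group(1) if m else None)

print(results)
# ['Adele', 'Luther Vandross', 'Stevie Ray Vaughan', None]
```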

1
print(re.findall(r'".+?" by ((?:[A-Z][a-z]+ )+)', text))

You can try this. See the demo:

https://regex101.com/r/vH0iN5/5
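
One thing to be aware of: because each repetition of the group ends with a space, the captured name keeps a trailing space. A small sketch, assuming the tweet from the question is stored in a variable called text, that strips it afterwards:

```python
import re

text = ('I\'m listening to "I Can Make It Better" by Luther Vandross '
        'on Pandora #pandora http://t.co/ieDbLC393F.')

found = re.findall(r'".+?" by ((?:[A-Z][a-z]+ )+)', text)
print(found)    # ['Luther Vandross ']  (note the trailing space)

# Remove the surrounding whitespace from each captured name.
cleaned = [name.strip() for name in found]
print(cleaned)  # ['Luther Vandross']
```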

1

You can use this lookaround-based regex:

text = 'I\'m listening to "I Can Make It Better" by Luther Vandross on Pandora #pandora http://t.co/ieDbLC393F.'
print(re.search(r'(?<=by ).+?(?= on)', text).group())
Luther Vandross
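
A caveat worth checking: the lookbehind only requires "by " before the match, so a song title that itself contains "by" can shift the starting point. The sketch below uses a made-up tweet to compare the lookaround pattern with one that also anchors on the closing quote; the variable names are illustrative.

```python
import re

# Hypothetical tweet whose title contains the word "by".
tweet = 'I\'m listening to "Stand by Me" by Ben E. King on Pandora'

# Lookaround version: starts after the first "by ", which is inside the title.
lookaround = re.search(r'(?<=by ).+?(?= on)', tweet).group()

# Version anchored on the closing quote, with a capturing group.
anchored = re.search(r'" by (.+?) on', tweet).group(1)

print(lookaround)  # Me" by Ben E. King
print(anchored)    # Ben E. King
```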
