How to use a regular expression to retrieve data in Python?

Question

I have a string defined as:

content = """f(1, 4, 'red', '/color/down1.html');
f(2, 5, 'green', '/color/colorpanel/down2.html');
f(3, 6, 'blue', '/color/colorpanel/colorlibrary/down3.html');"""

Here is the code I tried but it doesn't work:

results = re.findall(r"f(.*?)", content)
for each in results:
    print each

How can I use a regular expression to retrieve the links within the content? Thanks.

You should show us the code and regexes that you've tried already.
– PM 2Ring, Commented Feb 11, 2017 at 8:11
Here is the code I tried but it doesn't work. results = re.findall(r"f(.*?)", content) for each in results: print each
– dullboy, Commented Feb 11, 2017 at 8:20
You probably want to use re.findall(re_pattern, content), where re_pattern is your regex.
– Hesham Attia, Commented Feb 11, 2017 at 8:21
That is exactly my question. What would be the correct pattern to retrieve the link?
– dullboy, Commented Feb 11, 2017 at 8:24
What links are you referring to? Is it just the last part, such as down3.html, or the whole link?
– Iron Fist, Commented Feb 11, 2017 at 9:58

GoingMyWay · Accepted Answer · 2017-02-12 01:07:37Z

1

You can learn and test basic regexes at https://regex101.com/ and http://regexr.com/

In [4]: import re

In [5]: content = "f(1, 4, 'red', '/color/down1.html');    \
   ...: f(2, 5, 'green', '/color/colorpanel/down2.html');   \
   ...: f(3, 6, 'blue', '/color/colorpanel/colorlibrary/down3.html');"

In [6]: p = re.compile(r'(?=/).*?(?<=.html)')

In [7]: p.findall(content)
Out[7]: 
['/color/down1.html',
 '/color/colorpanel/down2.html',
 '/color/colorpanel/colorlibrary/down3.html']

. matches any character (except for line terminators)

*? Quantifier: matches between zero and unlimited times, as few times as possible, expanding as needed (lazy)
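
To see what "lazy" means in practice, here is a small sketch comparing the quantifiers. It assumes the whole content sits on a single line, and it escapes the dot as \.html (my choice; the pattern above leaves it unescaped). The variable names are only illustrative.

```python
import re

# Assumption: the three calls sit on one line, so "." can run across all of them.
content = ("f(1, 4, 'red', '/color/down1.html');    "
           "f(2, 5, 'green', '/color/colorpanel/down2.html');    "
           "f(3, 6, 'blue', '/color/colorpanel/colorlibrary/down3.html');")

# The pattern from the question: "(" and ")" are unescaped, so they form a
# capture group, and a lazy .*? with nothing after it matches zero characters.
attempt = re.findall(r"f(.*?)", content)

# Lazy: stop at the first position that is preceded by ".html".
lazy = re.findall(r"(?=/).*?(?<=\.html)", content)

# Greedy: run on to the last position that is preceded by ".html".
greedy = re.findall(r"(?=/).*(?<=\.html)", content)

print(attempt)      # three empty strings, one per "f"
print(lazy)         # one path per call
print(len(greedy))  # a single match spanning all three calls
```

So with the greedy .* you would get one long string that starts at the first / and ends at the last .html, which is probably not what you want here.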

You can also get just the part after the last /

In [8]: p2 = re.compile(r'[^/]*.html')

In [9]: p2.findall(content)
Out[9]: ['down1.html', 'down2.html', 'down3.html']

[^/] matches a single character that is not in the list; here the list contains only /, which matches the character / literally (case sensitive)

* Quantifier: matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)

. matches any character (except for line terminators); it is not escaped here, so write \. if you want to match only a literal dot

html matches the characters html literally (case sensitive)
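
If the file names should end in a literal .html, a slightly stricter variant might look like this. It is only a sketch: the escaped dot and the extra ' in the character class are my additions, not part of the pattern above.

```python
import re

content = """f(1, 4, 'red', '/color/down1.html');
f(2, 5, 'green', '/color/colorpanel/down2.html');
f(3, 6, 'blue', '/color/colorpanel/colorlibrary/down3.html');"""

# [^/']+ : one or more characters that are neither "/" nor a single quote
# \.html : a literal dot followed by "html"
names = re.findall(r"[^/']+\.html", content)
print(names)  # ['down1.html', 'down2.html', 'down3.html']
```

Excluding the quote keeps a match from starting in one argument and running into the next.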

Or you can extract each complete f(...) call:

In [15]: p3 = re.compile(r"(?=f\().*?(?<=\);)")

In [16]: p3.findall(content)
Out[16]: 
["f(1, 4, 'red', '/color/down1.html');",
 "f(2, 5, 'green', '/color/colorpanel/down2.html');",
 "f(3, 6, 'blue', '/color/colorpanel/colorlibrary/down3.html');"]

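If you want the individual arguments rather than the whole call, capture groups are one option. A rough sketch, assuming every call has exactly two integers followed by two single-quoted strings (that layout is my assumption, based only on the sample data):

```python
import re

content = """f(1, 4, 'red', '/color/down1.html');
f(2, 5, 'green', '/color/colorpanel/down2.html');
f(3, 6, 'blue', '/color/colorpanel/colorlibrary/down3.html');"""

# One group per argument: two integers, then two quoted strings.
call = re.compile(r"f\((\d+),\s*(\d+),\s*'([^']*)',\s*'([^']*)'\);")

# With several groups, findall returns one tuple of strings per match.
rows = call.findall(content)
for first, second, color, link in rows:
    print(first, second, color, link)
```

The numbers come back as strings, so convert them with int() if you need to do arithmetic on them.
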
edited Feb 12, 2017 at 1:07

answered Feb 11, 2017 at 8:20


2 Comments

dullboy Over a year ago

Regarding p = re.compile(r'(?=/).*?(?<=.html)'), why not simply p = re.compile(r'(?=/).*(?<=.html)')? What is the purpose of adding ? after *? Thanks.

GoingMyWay Over a year ago

@dullboy, I added an explanation to the answer. If you think my answer solved your problem, please consider accepting it. Thanks.

Hesham Attia · Accepted Answer · 2017-02-11 08:30:46Z

0

You could do something like:

re.findall(r"f\(.*,.*,.*, '(.*)'", content)
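
Note that . does not match a newline by default, so this relies on each f(...) call being on its own line. If several calls can share a line, the greedy .* may span more than one call. Negated character classes are one way around that; here is a sketch under that assumption (the single-line content is made up for the demonstration):

```python
import re

# Assumption: all three calls are on one line.
content = ("f(1, 4, 'red', '/color/down1.html');    "
           "f(2, 5, 'green', '/color/colorpanel/down2.html');    "
           "f(3, 6, 'blue', '/color/colorpanel/colorlibrary/down3.html');")

# [^,]* cannot cross a comma and [^']* cannot cross a quote,
# so each match stays inside a single call.
links = re.findall(r"f\([^,]*,[^,]*,[^,]*,\s*'([^']*)'", content)
print(links)
```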

answered Feb 11, 2017 at 8:30


1 Comment

dullboy Over a year ago

That is really a smart one. Thanks.

Mohammad Yusuf · Accepted Answer · 2017-02-11 10:08:15Z

0

You can try this:

import re

content = """f(1, 4, 'red', '/color/down1.html');    
    f(2, 5, 'green', '/color/colorpanel/down2.html');    
    f(3, 6, 'blue', '/color/colorpanel/colorlibrary/down3.html');"""

print(re.findall(r"(\/[^']+?)'", content))

Output:

['/color/down1.html', '/color/colorpanel/down2.html', '/color/colorpanel/colorlibrary/down3.html']

Regex:

(\/[^']+?)' - match / followed by one or more non-' characters up to the first occurrence of ', and capture them in group 1.

edited Feb 11, 2017 at 10:08

answered Feb 11, 2017 at 9:57

