You can learn the basics of regular expressions at https://regex101.com/ and http://regexr.com/
In [4]: import re
In [5]: content = "f(1, 4, 'red', '/color/down1.html'); \
...: f(2, 5, 'green', '/color/colorpanel/down2.html'); \
...: f(3, 6, 'blue', '/color/colorpanel/colorlibrary/down3.html');"
In [6]: p = re.compile(r'(?=/).*?(?<=.html)')
In [7]: p.findall(content)
Out[7]:
['/color/down1.html',
'/color/colorpanel/down2.html',
'/color/colorpanel/colorlibrary/down3.html']
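(?=/) Positive lookahead: the match must start at a /
. matches any character (except for line terminators)
*? Quantifier: matches between zero and unlimited times, as few times as possible, expanding as needed (lazy)
(?<=.html) Positive lookbehind: the match must end right after any character followed by html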
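To see why the lazy quantifier matters here, this small sketch runs the same pattern once with .*? and once with a greedy .* on the content string from above. The names lazy and greedy are only illustrative.

```python
import re

content = ("f(1, 4, 'red', '/color/down1.html'); "
           "f(2, 5, 'green', '/color/colorpanel/down2.html'); "
           "f(3, 6, 'blue', '/color/colorpanel/colorlibrary/down3.html');")

# Lazy: stops at the first position where the lookbehind is satisfied,
# so each link is returned separately.
lazy = re.findall(r'(?=/).*?(?<=.html)', content)

# Greedy: runs to the end of the line and backtracks to the LAST .html,
# so everything from the first / to down3.html comes back as one match.
greedy = re.findall(r'(?=/).*(?<=.html)', content)

print(len(lazy), len(greedy))
```

With the greedy version you get a single long string instead of three links, which is usually not what you want.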
You can also match just the file name after the last /
In [8]: p2 = re.compile(r'[^/]*.html')
In [9]: p2.findall(content)
Out[9]: ['down1.html', 'down2.html', 'down3.html']
[^/] matches a single character that is not in the list (here, anything except /)
* Quantifier: matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
/ matches the character / literally (case sensitive)
. matches any character (except for line terminators); it is unescaped here, so write \. if you want to match only a literal dot
html matches the characters html literally (case sensitive).
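Because the dot in the pattern above is unescaped, it can match more than you intend. The sketch below compares it with an escaped version; the sample string and the names loose and strict are made up for illustration.

```python
import re

content = ("f(1, 4, 'red', '/color/down1.html'); "
           "f(2, 5, 'green', '/color/colorpanel/down2.html'); "
           "f(3, 6, 'blue', '/color/colorpanel/colorlibrary/down3.html');")

# Hypothetical input in which "html" is preceded by something other than a dot.
sample = "/docs/readme_html /docs/page.html"

loose = re.findall(r'[^/]*.html', sample)    # . accepts the underscore too
strict = re.findall(r'[^/]*\.html', sample)  # \. accepts only a literal dot

print(loose)   # ['readme_html', 'page.html']
print(strict)  # ['page.html']

# On the original content both patterns return the same three file names.
names = re.findall(r'[^/]*\.html', content)
```

For the data in the question the two patterns behave the same, so the escaped form is simply the safer habit.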
Or you can extract each complete f(...) call
In [15]: p3 = re.compile(r"(?=f\().*?(?<=\);)")
In [16]: p3.findall(content)
Out[16]:
["f(1, 4, 'red', '/color/down1.html');",
"f(2, 5, 'green', '/color/colorpanel/down2.html');",
"f(3, 6, 'blue', '/color/colorpanel/colorlibrary/down3.html');"]
down3.html or the whole link?
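If you need the individual arguments rather than each f(...) call as one string, capture groups are one option. This is a minimal sketch that assumes every call has exactly the shape shown above (two integers followed by two single-quoted strings); call_re and rows are illustrative names.

```python
import re

content = ("f(1, 4, 'red', '/color/down1.html'); "
           "f(2, 5, 'green', '/color/colorpanel/down2.html'); "
           "f(3, 6, 'blue', '/color/colorpanel/colorlibrary/down3.html');")

# One capture group per argument; [^']* stops at the closing quote.
call_re = re.compile(r"f\((\d+),\s*(\d+),\s*'([^']*)',\s*'([^']*)'\);")

# With several groups, findall returns one tuple of strings per match.
rows = [(int(a), int(b), color, path)
        for a, b, color, path in call_re.findall(content)]

for row in rows:
    print(row)
```

Each row then holds the two numbers, the color name and the link, so you can pick whichever field you need.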