I was looking for a URL regexp to use in Python. After reading Stack Overflow, I decided to take this one: http://daringfireball.net/2010/07/improved_regex_for_matching_urls and use it in my Python code.
I've put in something like this:
reg_url = re.compile(r"""((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))""",
                     re.DOTALL)
(Python 2.7)
When I run my code with that regexp, I get the following error:
SyntaxError: Non-ASCII character '\xe2' in file file.py on line 60, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details
What is the best way to address this issue?
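Your regexp contains the non-ASCII characters «»“”‘’, and the source file has no encoding declaration. Read the PEP linked in the error message: it explains how to declare the file's encoding.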
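As a rough sketch of what the PEP describes, assuming the file is saved as UTF-8: put a coding declaration on the first or second line, and use a unicode literal for the pattern so that Python 2.7 treats each curly quote as one character rather than several bytes. The pattern below is a shortened stand-in for the full regexp, and the names TRAILING and url_tail are made up for illustration.

```python
# -*- coding: utf-8 -*-
# PEP 263 declaration: must be on line 1 or 2 of the file, and the file
# itself must really be saved as UTF-8.
import re

# Simplified stand-in for the last character class of the URL regexp
# (an assumption, not the full pattern): one character that is not
# whitespace or trailing punctuation, including the non-ASCII quotes.
TRAILING = u"[^\\s`!()\\[\\]{};:'\".,<>?«»“”‘’]"

# A u"" literal keeps «»“”‘’ as single characters in Python 2.7.
url_tail = re.compile(u"https?://[^\\s()<>]+" + TRAILING, re.UNICODE)

match = url_tail.search(u"See “http://example.com/page”.")
found = match.group(0)  # the URL without the closing quote and full stop
```

If you would rather not depend on the file's encoding at all, another option is to keep the source pure ASCII and write the characters as escapes in a unicode literal, for example u"\u00ab\u00bb\u201c\u201d\u2018\u2019".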