Regex Remove Markup Python

Question

I have a string:

myString = '<p>Phone Number:</p><p>706-878-8888</p>'

I'm trying to use a regex to strip out all HTML tags, in this case the paragraph tags.

Thanks!

Don't use Regex to parse (X)HTML. Use a parser. BeautifulSoup comes to mind.
– g.d.d.c, Commented Jan 30, 2012 at 19:37
possible duplicate of RegEx match open tags except XHTML self-contained tags
– Hamish, Commented Jan 30, 2012 at 19:37
possible duplicate of stackoverflow.com/questions/8703017/…
– juliomalegria, Commented Jan 30, 2012 at 19:44
I would link directly to the answer of that question @Hamish: stackoverflow.com/a/1732454/147129 :-P
– GaretJax, Commented Jan 30, 2012 at 19:45

jcollado · Accepted Answer · 2012-01-30 19:40:30Z

2

Using BeautifulSoup as pointed out by a comment:

>>> from BeautifulSoup import BeautifulSoup
>>> BeautifulSoup(myString).text
u'Phone Number:706-878-8888'
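
The import above is the BeautifulSoup 3 form; with the newer bs4 package the import is `from bs4 import BeautifulSoup`. If installing a third-party library is not an option, the same parser-based idea can be sketched with the standard library's `html.parser` module. This is only a minimal sketch, not what the answer itself uses: the names `TextExtractor` and `strip_tags_with_parser` are made up for illustration, and it assumes Python 3.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects only the text nodes of a document (hypothetical helper)."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters
        super().__init__(convert_charrefs=True)
        self._chunks = []

    def handle_data(self, data):
        # Called for text between tags; the tags themselves are skipped
        self._chunks.append(data)

    def get_text(self):
        return "".join(self._chunks)


def strip_tags_with_parser(markup):
    """Return the text content of markup with all tags removed."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any text still buffered by the parser
    return parser.get_text()


my_string = '<p>Phone Number:</p><p>706-878-8888</p>'
print(strip_tags_with_parser(my_string))  # Phone Number:706-878-8888
```

Like the BeautifulSoup version, this joins the text of adjacent paragraphs without inserting a separator.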

answered Jan 30, 2012 at 19:40


Hikalea Over a year ago

Perfect! I kept trying attribute 'string' instead of text. Much thanks!

juliomalegria · Accepted Answer · 2012-01-30 19:43:07Z

2

Use re.sub:

>>> import re
>>> re.sub(r'<[^>]+>', '', '<p>Phone Number:</p><p>706-878-8888</p>')
'Phone Number:706-878-8888'

Using re is a good solution if you just want to remove tags. But if you want to do anything more complicated (involving HTML parsing), I suggest looking into BeautifulSoup.
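
To make the trade-off concrete, here is a small sketch of the regex approach wrapped in a function, along with one input where it goes wrong. The function name `strip_tags_regex` and the second test string are assumptions added for illustration; they are not part of the original question.

```python
import re

# Matches "<", then one or more characters that are not ">", then ">"
TAG_RE = re.compile(r'<[^>]+>')


def strip_tags_regex(markup):
    """Remove anything that looks like a tag; no real HTML parsing is done."""
    return TAG_RE.sub('', markup)


# The string from the question works as expected
print(strip_tags_regex('<p>Phone Number:</p><p>706-878-8888</p>'))
# Phone Number:706-878-8888

# A ">" inside an attribute value ends the match too early
print(strip_tags_regex('<a title="a>b">link</a>'))
# b">link
```

For simple, known markup like the paragraph tags in the question the pattern is enough; the second case is the kind of input where a parser is the safer choice.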

edited Jan 30, 2012 at 19:43

answered Jan 30, 2012 at 19:37

