Remove Sub String by using Python

Question

I have already extracted some information from a forum. This is the raw string I have now:

string = 'i think mabe 124 + <font color="black"><font face="Times New Roman">but I don\'t have a big experience it just how I see it in my eyes <font color="green"><font face="Arial">fun stuff'

I want to remove the substrings <font color="black"><font face="Times New Roman"> and <font color="green"><font face="Arial">, and keep the rest of the string. The result should look like this:

resultString = "i think mabe 124 + but I don't have a big experience it just how I see it in my eyes fun stuff"

How can I do this? I used Beautiful Soup to extract the string above from a forum, but I would now prefer to use a regular expression to remove these parts.

This string does not work as written; it has both " and ' inside.
– juliomalegria, Commented Jan 2, 2012 at 16:23
@ThiefMaster Thanks for the support. How could I remove it? It IS a shame for sure.
– Wenhao.SHE, Commented Jan 2, 2012 at 16:23
@julio.alegria Please just treat everything between the opening " and the closing " as the string if you want to test. Thanks a lot.
– Wenhao.SHE, Commented Jan 2, 2012 at 16:24
I don't get it: you extract the text with BeautifulSoup, but you want to stop using it before you're done because ... ?
– Jochen Ritzel, Commented Jan 2, 2012 at 16:36

juliomalegria · Accepted Answer · 2012-01-02 16:26:34Z

192

>>> import re
>>> re.sub('<.*?>', '', string)
"i think mabe 124 + but I don't have a big experience it just how I see it in my eyes fun stuff"

The re.sub function takes a regular expression and replaces every match in the string with the second argument. In this case, we search for all tags ('<.*?>') and replace them with nothing ('').

The ? after .* makes the match non-greedy: each match stops at the first > it finds instead of running on to the last > in the string.
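
To make the greedy versus non-greedy difference concrete, here is a minimal sketch that runs both patterns on the string from the question. The variable names (text, non_greedy, greedy) are only illustrative and are not part of the original answer.

```python
import re

# The string from the question, with two pairs of nested <font> tags.
text = ('i think mabe 124 + <font color="black"><font face="Times New Roman">'
        "but I don't have a big experience it just how I see it in my eyes "
        '<font color="green"><font face="Arial">fun stuff')

# Non-greedy: each match ends at the nearest '>', so only the tags are removed.
non_greedy = re.sub(r'<.*?>', '', text)

# Greedy: a single match runs from the first '<' to the last '>',
# so the text between the tags is deleted as well.
greedy = re.sub(r'<.*>', '', text)

print(non_greedy)
print(greedy)
```

With the greedy pattern, everything from the first tag through the last tag disappears, which is why the ? matters here.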

More about the re module.

answered Jan 2, 2012 at 16:26

juliomalegria


2 Comments

sumanth232 Over a year ago

This is very helpful, thanks. I used it to remove mentions (@s) from tweets for my project: re.sub('@.*? ', '', tweetText)

keshav Over a year ago

I need to remove patterns like size 6.5 from mens tommy hilfiger knot boatshoe midnight uk size 6.5. If I use re.sub('size.*?[0-9]+', '', shoe), I get mens tommy hilfiger knot boatshoe midnight uk .5
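
What happens in the comment above is that the non-greedy .*? matches as little as possible and [0-9]+ then stops at the decimal point, so only "size 6" is removed. One possible fix, sketched below under the assumption that sizes look like "size 6" or "size 6.5", is to match the optional decimal part explicitly. The pattern and the variable names are illustrative, not from the original thread.

```python
import re

shoe = 'mens tommy hilfiger knot boatshoe midnight uk size 6.5'

# Pattern from the comment: [0-9]+ stops at '.', leaving '.5' behind.
broken = re.sub('size.*?[0-9]+', '', shoe)

# Assumed fix: match 'size', optional whitespace, digits, and an optional
# decimal part; the leading \s* also removes the space before 'size'.
fixed = re.sub(r'\s*\bsize\s*[0-9]+(?:\.[0-9]+)?', '', shoe)

print(repr(broken))
print(repr(fixed))
```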

Abhijit · Accepted Answer · 2012-01-02 16:27:59Z

21

>>> import re
>>> st = " i think mabe 124 + <font color=\"black\"><font face=\"Times New Roman\">but I don't have a big experience it just how I see it in my eyes <font color=\"green\"><font face=\"Arial\">fun stuff"
>>> re.sub("<.*?>","",st)
" i think mabe 124 + but I don't have a big experience it just how I see it in my eyes fun stuff"
>>>

answered Jan 2, 2012 at 16:27

Abhijit


Benny Elgazar · Accepted Answer · 2020-12-11 08:42:46Z

-7

from bs4 import BeautifulSoup
BeautifulSoup(text, features="html.parser").text

For those who wanted more detail in my answer, sorry.

I'll explain it.

Beautiful Soup is a widely used Python package that helps developers work with HTML from Python.

The line above takes the HTML text (text) and turns it into a BeautifulSoup object. Behind the scenes, it parses every HTML tag in the given text.

Once that is done, we just ask the parsed object for all of its text.
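
The same parse-then-collect-the-text idea can be sketched with only the standard library's html.parser module, as a stand-in for Beautiful Soup when it is not installed. This is not how Beautiful Soup is implemented; the TextExtractor class and strip_tags function are hypothetical helpers written for this illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: keeps text nodes and ignores every tag."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Called for each run of text between tags.
        self.parts.append(data)


def strip_tags(html):
    """Return the text content of an HTML fragment (illustrative only)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return ''.join(parser.parts)


text = ('i think mabe 124 + <font color="black"><font face="Times New Roman">'
        "but I don't have a big experience it just how I see it in my eyes "
        '<font color="green"><font face="Arial">fun stuff')

print(strip_tags(text))
```

Unlike the regular expression, a parser-based approach does not depend on tags fitting the simple <...> shape, which is the usual argument for it on messier HTML.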

edited Dec 11, 2020 at 8:42

answered Apr 10, 2020 at 10:25

Benny Elgazar


2 Comments

Mark Rotteveel Over a year ago

Please don't post only code as an answer; also explain what your code does and how it solves the problem in the question. Answers with an explanation are usually of higher quality and are more likely to attract upvotes.

Benny Elgazar Over a year ago

Hey, sorry. Sometimes I feel the question is so straightforward that the real answer is the actual implementation.
