I have a string like:
"<p>
<style type=""text/css"">
P { margin-bottom: 0.08in; direction: ltr; widows: 2; orphans: 2; }A:link { color: rgb(0, 0, 255); } </style>
</p>
<p style=""font-variant: normal; font-style: normal; font-weight: normal"">
<font face=""Trebuchet MS, Arial, Verdana, sans-serif""><span style=""font-size: 12px; background-color: rgb(238, 238, 238);"">blablabla. </span></font></p>
<p style=""font-variant: normal; font-style: normal; font-weight: normal"">
<font face=""Trebuchet MS, Arial, Verdana, sans-serif""><span style=""font-size: 12px; background-color: rgb(238, 238, 238);"">tjatjatja</span></font><span style=""font-family: 'Trebuchet MS', Arial, Verdana, sans-serif; font-size: 12px; background-color: rgb(238, 238, 238);"">tjetjetje</span><span style=""font-size: 12px; font-family: 'Trebuchet MS', Arial, Verdana, sans-serif; background-color: rgb(238, 238, 238);"">.</span></p>
<p style=""font-variant: normal; font-style: normal; font-weight: normal"">
<span style=""font-family: 'Trebuchet MS', Arial, Verdana, sans-serif; font-size: 12px; background-color: rgb(238, 238, 238);"">huehuehue</span></p>
"
I want to remove the first style tag and its content. I have a regex like:
([\s\S]*)<style type=""text\/css"">[\s\S]+<\/style>([\s\S]*)
which matches just the first style tag, but when I try to remove it in Python with:
re.sub(r'([\s\S]*)<style type=""text/css"">[\s\S]*</style>([\s\S]*)', r'\1\2', cell_text, flags=re.M)
it doesn't work. I think it has to do either with the groups or with the string being multiline. Any ideas?
Make [\s\S]* non-greedy ([\s\S]*?), in case more style tags are possible.
Does the string actually have "" in it? I'm guessing the string has 2 because that's how you escape quotes in Python, but that shouldn't be necessary in a single-quoted string, or...?
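A minimal sketch of the non-greedy suggestion, in case it helps. It assumes the real string may contain either single or doubled double quotes (the doubling could just be escaping from an export), so the pattern accepts both. The sample text and the helper name remove_first_style are made up for illustration and are not from the question.

```python
import re

# "{1,2} accepts one or two double quotes, since it is unclear which form
# the real data uses. [\s\S]*? is non-greedy, so the match stops at the
# first </style> rather than the last one.
STYLE_RE = re.compile(r'<style type="{1,2}text/css"{1,2}>[\s\S]*?</style>')


def remove_first_style(text):
    """Remove only the first <style type="text/css"> element (hypothetical helper)."""
    return STYLE_RE.sub('', text, count=1)


# Shortened, hypothetical version of the string from the question.
cell_text = (
    '<p>\n'
    '<style type="text/css">\n'
    'P { margin-bottom: 0.08in; }A:link { color: rgb(0, 0, 255); } </style>\n'
    '</p>\n'
    '<p style="font-weight: normal">\n'
    '<span style="font-size: 12px;">blablabla. </span></p>\n'
)

cleaned = remove_first_style(cell_text)
print(cleaned)
```

The capture groups are not needed here: substituting an empty string for the matched tag leaves the rest of the text in place. re.M only changes how ^ and $ behave, so it has no effect on this pattern, and [\s\S] already matches newlines.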