3

What I have is:

from lxml import etree
myscript = "if(0 < 1){alert(\"Hello World!\");}"
html = etree.fromstring("<script></script>")

# iter() also visits the root element; findall('//script') is rejected as an absolute path
for element in html.iter('script'):
    element.text = myscript

result = etree.tostring(html)

What I get is:

>>> result
'<script>if(0 &lt; 1){alert("Hello World!");}</script>'

What I want is unescaped JavaScript:

>>> result
'<script>if(0 < 1){alert("Hello World!");}</script>'

2 Answers

1

Your approach fails because you are changing the text content of the element, and etree escapes that text when it serializes to XML. What you need to do is replace the element itself with one built by the HTML parser. See this sample:

In [1]: from lxml import html

In [2]: myscript = "<script>if(0 < 1){alert(\"Hello World!\");}</script>"

In [3]: template = html.fromstring("<script></script>")

# just a quick hack to get the <script> element without <html> <head>
In [4]: script_element = html.fromstring(myscript).xpath("//script")[0]

# insert new element then remove the old one
In [10]: for element in template.xpath("//script"):
   ....:     element.getparent().insert(0, script_element)
   ....:     element.getparent().remove(element)
   ....:

In [11]: print(html.tostring(template, encoding="unicode"))
<html><head><script>if(0 < 1){alert("Hello World!");}</script></head></html>

So yes, you can technically still use lxml to insert the element. I also suggest using lxml.html rather than etree, since it handles HTML elements more gracefully.
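
As a complementary sketch, separate from the replacement approach in this answer: lxml can also serialize with its HTML serializer, which leaves the text of script elements unescaped. Assuming that behaviour holds for your lxml version, setting .text and passing method="html" to tostring may be enough. The variable names as_xml and as_html are only illustrative.

```python
from lxml import etree

myscript = 'if(0 < 1){alert("Hello World!");}'
root = etree.fromstring("<script></script>")

# Setting .text stores the raw string; escaping only happens on serialization.
root.text = myscript

# Default XML serialization escapes "<" in text nodes.
as_xml = etree.tostring(root, encoding="unicode")

# HTML serialization is expected to write <script> text as-is.
as_html = etree.tostring(root, method="html", encoding="unicode")

print(as_xml)
print(as_html)
```

If the second line printed still contains &lt;, this shortcut does not apply to your setup and replacing the element is the safer route.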

1

You can’t. lxml.etree and ElementTree are XML parsers, so whatever you want to parse or create with them has to be valid XML. And an unescaped < inside some node text is not valid XML. It’s valid HTML but not valid XML.
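
You can check this yourself with a small sketch that feeds both strings to an XML parser. It uses the standard library's xml.etree.ElementTree; the helper name is_well_formed is made up for this example.

```python
import xml.etree.ElementTree as ET

desired = '<script>if(0 < 1){alert("Hello World!");}</script>'
escaped = '<script>if(0 &lt; 1){alert("Hello World!");}</script>'


def is_well_formed(text):
    """Return True if the XML parser accepts the text, False otherwise."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# The unescaped "<" starts what the parser takes for a tag, so parsing fails.
print(is_well_formed(desired))
# The escaped form is ordinary character data.
print(is_well_formed(escaped))
```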

That’s why in XHTML, you usually had to add CDATA blocks inside <script> tags, so you could put whatever in there without having to worry about valid XML structure.
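
If XHTML-style output is acceptable, a minimal sketch of the CDATA route with lxml could look like this. It relies on etree.CDATA, which wraps a string so that it is written out as a CDATA section; whether the consumer of your markup copes with CDATA inside a script tag is an assumption you would need to verify.

```python
from lxml import etree

root = etree.fromstring("<script></script>")

# Wrapping the text in etree.CDATA makes the serializer emit a CDATA
# section instead of escaping the "<".
root.text = etree.CDATA('if(0 < 1){alert("Hello World!");}')

result = etree.tostring(root, encoding="unicode")
print(result)

# The result is still well-formed XML and parses back to the same text.
roundtrip = etree.fromstring(result).text
print(roundtrip)
```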

But in your case, you just want to produce HTML, and for that, you should use an HTML parser. For example BeautifulSoup:

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup('<script></script>', 'html.parser')
>>> soup.find('script').string = 'if(0 < 1){alert("Hello World!");}'
>>> str(soup)
'<script>if(0 < 1){alert("Hello World!");}</script>'

5 Comments

Not to be impolite, but I believe you technically can use lxml to handle the <script> element. It's the method the OP is using that is wrong: changing the text rather than the element itself.
@Anzel You’re using an HTML parser yourself in your answer… And you can easily confirm that you cannot handle the desired output text with an XML parser by just trying to parse the output text.
The OP never said not to use an HTML parser. The main question is how to insert JS into the element. Simply put, it's doable by replacing the element itself. Since you would not normally find a script tag in an XML file, the OP's use of etree is also inappropriate.
@Anzel Please read my answer? It essentially says “you can’t with an XML parser, use an HTML parser”, so I really don’t get what you are trying to tell me.
I'm confused: lxml is not only an XML parser, it's also an HTML parser. BeautifulSoup can use "lxml" as its parser too.
