
The answer to my earlier question, "How do I access an XML node that uses quote marks with xmlstarlet?", shows how to select a node using the namespace and, in that case, delete the entire node.

xmlstarlet edit -N ns="http://www.w3.org/2005/Atom" -d "//ns:content[@type='html']" input.xml > output.xml

But how would I edit the contents of the <content type='html'> node?

Let's say I want to delete all HTML tags in all the <content type='html'> nodes, but leave the text.

Is it possible to use xmlstarlet to edit a node?

input.xml:

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:blogger="http://schemas.google.com/blogger/2018">
  <title>Testv1</title>
<entry>
    <author>
      <name>Author</name>
    </author>
    <title/>
    <content type='html'><p>Test Post 2</p><p></p><p>
Sed ut perspiciatis unde omnis iste natus error sit voluptatem,
eaque ipsa quae voluptas nulla pariatur?</p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/kitten2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="360" data-original-width="360" height="320" src="https://blogger.googleusercontent.com/kitten2.png" width="320" 
/></a></div><br /><p></p><p></p><p>Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor</content>
  </entry>
<entry>
    <author>
      <name>Author</name>
    </author>
    <title/>
<content type='html'>....</content>
  </entry>

Desired output.xml:

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:blogger="http://schemas.google.com/blogger/2018">
  <title>Testv1</title>
<entry>
    <author>
      <name>Author</name>
    </author>
    <title/>
    <content type='html'>Test Post 2 Sed ut perspiciatis unde omnis iste natus error sit voluptatem, eaque ipsa quae voluptas nulla pariatur? Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor</content>
  </entry>
<entry>
    <author>
      <name>Author</name>
    </author>
    <title/>
<content type='html'>....</content>
  </entry>
  • You can try to use xmlstarlet with XSLT transformations for your needs. XSLT is much more capable than just XPath. Commented Sep 7 at 21:05
  • Here's a similar question. You will probably have to parse the HTML itself with xmlstarlet or xmllint. Commented Sep 7 at 22:29
  • @YitzhakKhabinsky Thanks, I'll look into that. I'm just learning XML parsing and the tools involved. Commented Sep 14 at 19:27
  • @LMC thanks, I see your point, but am just learning XML parsing and the tools involved. Commented Sep 14 at 19:29

1 Answer

xmlstarlet edit -N ns="http://www.w3.org/2005/Atom" \
                --update "//ns:content" --expr "normalize-space(string(.))" input.xml

Output:

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:blogger="http://schemas.google.com/blogger/2018">
  <title>Testv1</title>
  <entry>
    <author>
      <name>Author</name>
    </author>
    <title/>
    <content type="html">Test Post 2 Sed ut perspiciatis unde omnis iste natus error sit voluptatem, eaque ipsa quae voluptas nulla pariatur?Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor</content>
  </entry>
  <entry>
    <author>
      <name>Author</name>
    </author>
    <title/>
    <content type="html">....</content>
  </entry>
</feed>

Unlike your desired output, there is no space before the word Lorem.
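
If that missing space matters, one possible workaround is to step outside xmlstarlet and join the text nodes yourself with a separator. This is only a minimal sketch using Python's standard library, not an xmlstarlet feature. The helper name `text_with_spaces` and the shortened sample node are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Shortened, well-formed stand-in for one <content> node (not the real feed).
SAMPLE = (
    '<content xmlns="http://www.w3.org/2005/Atom" type="html">'
    "<p>nulla pariatur?</p><p></p><br /><p>Lorem ipsum</p>"
    "</content>"
)

def text_with_spaces(elem):
    """Join every text node under elem with a space, then collapse whitespace."""
    joined = " ".join(elem.itertext())
    # Collapse runs of space, tab, CR and LF, and trim both ends.
    return re.sub(r"[ \t\r\n]+", " ", joined).strip(" \t\r\n")

content = ET.fromstring(SAMPLE)
flattened = text_with_spaces(content)
print(flattened)  # nulla pariatur? Lorem ipsum
```

The trade-off is that a space is also inserted at every inline element boundary, so something like `<b>bo</b>ld` would come out as "bo ld".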


See: xmlstarlet edit


3 Comments

Thanks! That works great. But I don't understand how normalize-space identifies HTML. Does it look for spaces inside the markup?
The string function drops the HTML markup and keeps only the text inside the ns:content node. You can see this if you replace --expr "normalize-space(string(.))" with --expr "string(.)". The normalize-space function then trims the result and collapses line breaks and other runs of whitespace into single spaces. This might help: en.wikipedia.org/wiki/XPath#String_functions
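
To make the two steps concrete, here is a rough imitation of both XPath functions using only Python's standard library. It is a sketch of the semantics, not how xmlstarlet implements them. The helper names `string_value` and `normalize_space` and the shortened sample node are assumptions for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Shortened, well-formed stand-in for one <content> node (not the real feed).
SAMPLE = """<content xmlns="http://www.w3.org/2005/Atom" type="html"><p>Test Post 2</p><p></p><p>
Sed ut perspiciatis,
eaque ipsa?</p><br /><p>Lorem ipsum</p></content>"""

def string_value(elem):
    """Like XPath string(.): all descendant text nodes concatenated in document order."""
    return "".join(elem.itertext())

def normalize_space(text):
    """Like XPath normalize-space(): trim, and collapse whitespace runs to one space."""
    return re.sub(r"[ \t\r\n]+", " ", text).strip(" \t\r\n")

content = ET.fromstring(SAMPLE)
raw = string_value(content)      # markup is gone, but line breaks remain
clean = normalize_space(raw)     # line breaks become single spaces
print(clean)  # Test Post 2 Sed ut perspiciatis, eaque ipsa?Lorem ipsum
```

Nothing here looks for HTML as such: the tags vanish because only text nodes are collected. That is also why there is no space before "Lorem", since no whitespace existed between those two text nodes to begin with.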
Thanks! That link helps. I'm just learning XML.
