Cannot extract data from XML file using XMLStarlet in the command line (namespace restriction)

Question

I am trying to extract data from an XML file (which I named output.xml) on the command line, and then, once that works, put it in a script.

I've read that the best tool for this is XMLStarlet. However, xmlstarlet sel -t -m "/entry/content" output.xml doesn't work.

Note: I ran xmlstarlet el output.xml to check the XPath structure of the file, and it works. That means the tool sees the elements.

I saw that there are 2 conditions to make XMLStarlet work:

1- The XML file should be well-formed. Stackoverflow related link

So I applied this command to create a well-formed file:

xmlstarlet fo -R output.xml > good-output.xml

2- XMLStarlet is very picky about the default namespace. If the document has one, declare it before selecting elements, or delete all occurrences of "xmlns" in the document. Stackoverflow related link

So I did:

$ cat good-output.xml | sed -e 's/ xmlns.*=".*"//g' >> very-good-output.xml

However, even after performing these two steps I get another error, and I don't know how to fix it. The terminal points to the places where I removed the namespaces and says "Namespace prefix app on collection is not defined". What should I do? With the namespaces it doesn't work, and now it insists that I put them back.

Any help?

Screenshot of the original problem

Screenshot of the final problem

Okay, maybe I understand why the terminal is complaining. The tags that trigger the error have the form word:anotherword, and it seems this notation implies that a matching xmlns attribute must be declared for that tag. But I don't know how to delete these prefixes. I guess I have to use a regex, but I'm not yet comfortable enough with them to do that.
– Héloïse Chauvel, Commented Apr 26, 2017 at 16:19
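
A minimal sketch of why stripping the declarations backfires, assuming a made-up sample (the question never shows the file, so the namespace URIs and the href value here are guesses based on the element names). It needs only sed and grep:

```shell
# Hypothetical sample: a default namespace plus an "app" prefix (URIs are assumptions).
sample='<entry xmlns="http://www.w3.org/2005/Atom" xmlns:app="http://www.w3.org/2007/app">
<app:collection href="posts"/>
</entry>'

# Same sed expression as in the question: it deletes the xmlns declarations...
stripped=$(printf '%s\n' "$sample" | sed -e 's/ xmlns.*=".*"//g')

# ...but the prefixed element name is still there, with nothing left to declare "app".
leftover=$(printf '%s\n' "$stripped" | grep -o '<app:[A-Za-z]*')

printf '%s\n' "$stripped"
echo "still prefixed: $leftover"
```

The parser then meets app:collection with no binding for app, which matches the "Namespace prefix app on collection is not defined" message. Note also that the pattern is greedy, so on a line carrying other attributes after the declaration it would delete those too.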
If your document defines the namespaces, you can use its prefixes in the XPath expressions; see also xmlstar.sourceforge.net/doc/UG/ch05.html.
– npostavs, Commented Apr 26, 2017 at 20:41
Thank you! I used xmlstarlet sel -t -m "//_:content" -c . good-output.xml and it gave me the corresponding tag. The only problem now is that I want only the content of the tag, not the tag itself plus its content. How should I do that?
– Héloïse Chauvel, Commented Apr 27, 2017 at 8:10

Héloïse Chauvel · Accepted Answer · 2017-04-27 12:43:54Z

4

So this is the final solution to retrieve the content of an XML file with multiple namespaces:

xmlstarlet sel -t -m "//_:content" -c . good-output.xml

npostavs, thank you for guiding me.

At first I thought it was a problem that my first attempt returned the tag along with the desired content, but in my case it actually isn't. If it is a problem for someone else, this is how to get only the content:

xmlstarlet sel -t -m "/_:entry/_:content/text()" -c . output.xml

OR

xmlstarlet sel -t -m "/_:entry/_:content" -v . output.xml

Simplified:

xmlstarlet sel -t -v "/_:entry/_:content" output.xml
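
For anyone who wants to try the difference between the variants, here is a small sketch against a throwaway file. The sample document and its namespace URI are assumptions of mine, not the real output.xml; the `_` prefix stands for the default namespace declared on the root element.

```shell
# Hypothetical sample file; element names follow the question, the namespace URI is an assumption.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<?xml version="1.0"?>
<entry xmlns="http://www.w3.org/2005/Atom">
  <content>Hello world</content>
</entry>
EOF

# No prefix: the XPath looks for elements in no namespace, so nothing matches.
plain=$(xmlstarlet sel -t -v "/entry/content" "$tmp")

# "_" refers to the default namespace declared on the root element.
value=$(xmlstarlet sel -t -v "/_:entry/_:content" "$tmp")

# -c copies the whole matched node, tags included; -v prints only its string value.
copy=$(xmlstarlet sel -t -c "/_:entry/_:content" "$tmp")

printf 'plain: [%s]\nvalue: [%s]\ncopy:  [%s]\n' "$plain" "$value" "$copy"
rm -f "$tmp"
```

So -c is the one to use when the markup itself is wanted, and -v when only the text is.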

edited Apr 27, 2017 at 12:43

answered Apr 27, 2017 at 12:29

Héloïse Chauvel


2 Comments

npostavs Over a year ago

You could possibly simplify that to xmlstarlet sel -t -v "/_:entry/_:content" output.xml

Héloïse Chauvel Over a year ago

Tested, it works too, thank you :) I've updated the answer.

daparic · Accepted Answer · 2018-01-12 00:01:34Z

0

Problems like this seem to happen when the XML uses a different namespace. In these cases, one way to overcome the namespace issue is to tell xmlstarlet which namespace the element is expected to be in:

xmlstarlet sel -N x='http://different.namespace.url/XMLSchema' -t -m '//x:YourElemHere' -v . input.xml
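
A slightly fuller sketch of the same idea, assuming a made-up document with two namespaces (the URIs, the href value and the prefixes a and p are all my own choices for illustration). The prefixes given to -N do not have to match the ones used in the file; only the URIs do.

```shell
# Hypothetical sample file with a default namespace and an "app" prefix (URIs are assumptions).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<?xml version="1.0"?>
<entry xmlns="http://www.w3.org/2005/Atom" xmlns:app="http://www.w3.org/2007/app">
  <content>Hello world</content>
  <app:collection href="posts"/>
</entry>
EOF

# Bind a prefix of our own choosing to the default namespace URI.
content=$(xmlstarlet sel -N a='http://www.w3.org/2005/Atom' -t -v '//a:content' "$tmp")

# -N can be repeated, one binding per namespace used in the XPath.
href=$(xmlstarlet sel \
  -N a='http://www.w3.org/2005/Atom' \
  -N p='http://www.w3.org/2007/app' \
  -t -v '/a:entry/p:collection/@href' "$tmp")

printf 'content: %s\nhref: %s\n' "$content" "$href"
rm -f "$tmp"
```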

answered Jan 12, 2018 at 0:01

daparic
