I'm writing a small script that parses an RSS feed using xmllint.

Right now I fetch the list of titles with the following command:

ITEMS=`echo "cat //title" | xmllint --shell rss.xml `
echo $ITEMS > tmpfile

But it returns:

<title>xxx</title> ------- <title>yyy :)</title> ------- <title>zzzzzz</title>

without newlines or spaces. I'm only interested in the text content of the title tags and, if possible, I want to iterate over the titles with a for/while loop, something like:

for  val in $ITEMS 
do
       echo $val
done

How can this be done? Thanks in advance.

  • Don't be a masochist, use a scripting language like Python, Ruby, any other language in the world, or Perl (in that order of preference :P). Commented May 11, 2012 at 13:09
  • @KurzedMetal You can do plenty of parsing and splitting and iterating in bash. Commented May 11, 2012 at 13:26
  • You will find that quoting your variables will help a lot: for val in "$ITEMS"; do echo "$val"; done Commented May 12, 2012 at 2:06
  • Thanks, that could help, but if I try that for loop, "$val" contains the whole string and the loop runs only once, although it prints $val with the correct newlines. I need to read $ITEMS line by line; how can I do that? Commented May 12, 2012 at 11:04
  • Is something wrong with xmllint --xpath '//title/text()' rss.xml? Commented Jun 28, 2020 at 11:09

3 Answers

5

I had the same kind of requirement at some point: parsing XML in bash. I ended up using xmlstarlet (http://xmlstar.sourceforge.net/), which you might be able to install.

If not, something like this will remove the surrounding tags:

echo "cat  //title/text()" | xmllint --shell  rss.xml

Then you will need to clean up the output after piping it. A basic solution would be:

echo "cat  //title/text()" | xmllint --shell  rss.xml  | egrep '^\w'
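
One caveat with the '^\w' filter is that it also drops any title that starts with a non-word character (a quote or a bracket, for example). A minimal sketch of an alternative is to remove only the prompt and separator lines. The helper name strip_shell_noise and the sample text are made up for illustration, and the sample only approximates what xmllint --shell prints, so check it against your own output:

```shell
#!/bin/sh
# Hypothetical helper: drop the prompt lines ("/ > ...") and the
# " -------" separator lines, keep every other line untouched.
strip_shell_noise() {
    grep -v -e '^/ >' -e '^ *-------$'
}

# Stand-in for real `xmllint --shell` output (an assumption about its shape).
sample_output='/ >  -------
xxx
 -------
[yyy] :)
 -------
zzzzzz
/ > '

titles=$(printf '%s\n' "$sample_output" | strip_shell_noise)
printf '%s\n' "$titles"
```

With real data you would pipe the xmllint command above into strip_shell_noise instead of using the sample variable.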

Hope this helps


2

To answer your first question: the unquoted use of $ITEMS with echo is eliminating your newline characters. Try

ITEMS=`echo "cat //title" | xmllint --shell rss.xml `
echo "$ITEMS" > tmpfile
#----^------^--- dbl-quotes only
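
To see the difference without running xmllint at all, here is a small sketch. The items variable is filled with made-up text that imitates the output from the question:

```shell
#!/bin/sh
# Made-up multi-line value imitating the output shown in the question.
items=$(printf '%s\n' '<title>xxx</title>' ' -------' '<title>yyy :)</title>')

# Unquoted: the shell splits the value into words and echo joins
# them with single spaces, so the newlines disappear.
unquoted=$(echo $items)

# Quoted: the value is passed to echo as one argument, newlines intact.
quoted=$(echo "$items")

printf 'unquoted: %s\n' "$unquoted"
printf 'quoted:\n%s\n' "$quoted"
```

The unquoted result is the same kind of single-line string the question shows.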

In general, for loops are best left to items that won't contain unexpected spaces or other non-alphanumeric or non-printable characters, like for i in {1..10} ; do echo $i; done

And you don't really need the variables or the tempfile. Try

  echo "cat //title" | xmllint --shell rss.xml |
  while read line ; do
      echo "$line"
  done
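
One thing to keep in mind with the piped form: in bash the loop on the right side of a pipe normally runs in a subshell, so variables set inside it are gone afterwards. If you need to keep something from the loop (a counter, say), you can feed it from a here-document instead. A rough sketch, where the titles variable is a stand-in for the cleaned-up xmllint output:

```shell
#!/bin/sh
# Stand-in for the cleaned-up list of titles (one per line).
titles='xxx
yyy :)
zzzzzz'

count=0
# The loop reads from a here-document, so it runs in the current
# shell and $count is still set after the loop ends.
while IFS= read -r line; do
    count=$((count + 1))
    printf '%d: %s\n' "$count" "$line"
done <<EOF
$titles
EOF

printf 'total: %d\n' "$count"
```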

Depending on what is in your RSS feed, you may also benefit from changing the default IFS (Internal Field Separator) that is used by the read command. Try

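while IFS= read -r line ....
# or, to split on newlines only
# (a double-quoted "\n" is a literal backslash followed by n, not a newline)
while IFS=$'\n' read -r line
# or
while IFS=$'\r\n' read -r line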
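
If you want to check what read does to a line, this little sketch compares a plain read with IFS= read -r on the same made-up input (two leading spaces and a backslash):

```shell
#!/bin/sh
# Plain read: leading whitespace is stripped and the backslash
# is consumed as an escape character.
read default_line <<'EOF'
  a\tb
EOF

# IFS= keeps the leading whitespace, -r keeps the backslash.
IFS= read -r raw_line <<'EOF'
  a\tb
EOF

printf 'default: [%s]\n' "$default_line"
printf 'raw:     [%s]\n' "$raw_line"
```

For feed titles, the IFS= read -r form is usually the safer one because it hands you the line as it was written.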

I'm not sure what you're trying to achieve with echo "cat //title" | going into xmllint, so I'm leaving it as is. Is that an instruction to xmllint, or is it passed through to create a header for the document? (I don't have xmllint to experiment with right now.)

Also, you might want to look at reading rss feeds with awk, but it is rather low level.

I hope this helps.

2 Comments

Yes, it is an instruction to xmllint. Please note that I updated the question, because I noticed some characters missing in the example I provided. Thanks :D
I don't see anything different between your new posting and what I have used as your main command. Was the change in the command or in your sample output? Also, I am adding an edit to my answer; check back in a minute. Good luck.
1

In addition to Philippe's answer: if you want to get the XML output directly from a command like cURL, you can use another file descriptor to feed it in.

Indeed, STDIN is already taken by the xmllint shell input. Below is a working example (just remember to replace the URL argument with yours).

# Create a temporary file and use it as third fd
exec 3<> "$(mktemp)" &&
# cURL the RSS URL and redirect STDOUT to the 3rd fd
curl https://your-url/to/some/rss.xml >&3 &&
# Then read  fd 3 with xmllint
xmllint --format --shell /dev/fd/3 <<< 'cat //title/text()' | egrep '^\w' &&
# Close the temporary file (remember global warming issues)
exec 3>&-
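
Note that closing the descriptor does not delete the temporary file. If that matters to you, a simpler variant (not the fd approach above) is a plain temp file with a trap that removes it on exit. This is only a sketch: the printf line stands in for the curl download, the sample feed is made up, and the xmllint call is skipped when xmllint is not installed:

```shell
#!/bin/sh
# Create the temp file and make sure it is removed when the script exits.
tmp=$(mktemp) || exit 1
trap 'rm -f "$tmp"' EXIT

# Stand-in for: curl -s https://your-url/to/some/rss.xml > "$tmp"
printf '%s\n' '<rss version="2.0"><channel><title>xxx</title><title>zzzzzz</title></channel></rss>' > "$tmp"

# With --xpath the query is an argument, so STDIN is not needed
# for shell commands and no filtering of prompt lines is required.
if command -v xmllint >/dev/null 2>&1; then
    xmllint --xpath '//title/text()' "$tmp"
fi
```

How the matched text nodes are separated in the output may depend on your libxml2 version, so check that before looping over them.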
