
I'm searching for XML files that have certain properties, for example files that contain the following pattern:

<param-value>
  <name>Hosts</name>
  <description>some description</description>
  <value></value>
</param-value>

For such files, I'd like to parse the value of another tag, such as:

<param-value>
  <name>Roles</name>
  <description>some description</description>
  <value>asdf</value>
</param-value>

And print out the file name along with "asdf". What's the simplest way to accomplish this from the command line?

One approach I was thinking of was just using grep with the -l option to filter the matching files out, and then using xargs grep to extract the value of Roles. However, grep doesn't work well with multi-line regexes. I saw another question that showed it could be done with the -Pzo options, but didn't have any luck getting it to work in my case. Is there a simpler approach?

  • Is there any particular reason you don't want to use a scripting language such as perl? Commented Feb 8, 2012 at 20:07
  • The simplest for me is to use Saxon from the command line. Here's an example of using XPath on the command line. This, combined with a shell script, would do exactly what you're asking. Commented Feb 8, 2012 at 20:10
  • According to the answer to this question, XMLStarlet seems to be very good for this kind of thing. Commented Feb 8, 2012 at 20:12
  • No, a perl solution would be great, preferably a compact one-liner, but I don't know the best way to go about writing it. Commented Feb 8, 2012 at 20:28
  • Possible duplicate of How to parse XML in Bash? Commented Oct 7, 2015 at 10:56

4 Answers

13

The following Linux command uses XPath to read the specified values from each XML file:

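find . -name "*.xml" | while IFS= read -r xml
do
  # xmllint reports a non-matching XPath on stderr, so discard that output
  value=$(xmllint --xpath "/param-value/value/text()" "$xml" 2>/dev/null)
  # print only the files that actually yielded a value
  [ -n "$value" ] && echo "$xml $value"
done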

Example output for matching XML files:

./test1.xml asdf
./test4.xml 1234
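
The XPath above takes whatever <value> sits under the root element, so it does not tell Hosts apart from Roles. Below is a minimal sketch of how the filter from the question could be added with XPath predicates. It assumes xmllint supports --xpath and that the param-value elements may sit anywhere in the document; print_roles is a hypothetical helper name, not something from the original command.

```shell
# Hypothetical helper: print "<file> <Roles value>" for every file that
# contains a param-value whose name is Hosts.
# Assumes xmllint (libxml2) with --xpath support is installed.
print_roles() {
  for xml in "$@"; do
    # boolean() makes xmllint print "true" or "false" instead of a node set
    has_hosts=$(xmllint --xpath "boolean(//param-value[name='Hosts'])" "$xml" 2>/dev/null)
    [ "$has_hosts" = "true" ] || continue
    # string() returns the text of the first matching <value>, or an empty string
    role=$(xmllint --xpath "string(//param-value[name='Roles']/value)" "$xml" 2>/dev/null)
    if [ -n "$role" ]; then
      printf '%s %s\n' "$xml" "$role"
    fi
  done
}
```

It could be called as print_roles ./*.xml. Files without a Hosts parameter, or with an empty Roles value, are skipped.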

1 Comment

I didn't know xmllint could be used to parse XML. To me this is the best answer because xmllint is always installed, as it's a system dependency (at least on CentOS/Red Hat/...).
1

I worked out a couple of solutions using basic perl/awk functionality (basically a poor man's parsing of the tags). If you see any improvements that use only basic perl/awk functionality, let me know. I avoided dealing with multi-line regular expressions by setting a flag when I see a particular tag. It's kind of clumsy, but it works.

perl:

perl -ne '$h = 1 if m/Host/; $r = 1 if m/Role/; if ($h && m/<value>/) { $h = 0; print "hosts: ", $_ =~ /<value>(.*)</, "\n"}; if ($r && m/<value>/) { $r = 0; print "\nrole: ", $_ =~ /<value>(.*)</, "\n" }'

awk (needs GNU awk, because the three-argument form of match() is a gawk extension):

awk '/Host/ {h = 1} /Role/ {r = 1} h && /<value>/ {h = 0; match($0, "<value>(.*)<", a); print "hosts: " a[1]} r && /<value>/ {r = 0; match($0, "<value>(.*)<", a); print "\nrole: " a[1]}'
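
Neither one-liner prints the file name or skips files that have no Hosts parameter. Here is a rough sketch of the same flag-based idea that does both and sticks to POSIX awk. It assumes the one-tag-per-line layout shown in the question and is still not a real XML parser; roles_for_hosts_files is a hypothetical helper name.

```shell
# Hypothetical helper: print "<file> <Roles value>" for each file that has
# a Hosts parameter. Assumes <name> and <value> each sit on their own line.
roles_for_hosts_files() {
  awk '
    # print the result for the file that was just finished
    function report() { if (has_hosts && has_role) print file, role }

    FNR == 1 && NR > 1 { report() }   # a new file starts: flush the previous one
    FNR == 1 { file = FILENAME; has_hosts = 0; has_role = 0; role = ""; name = "" }

    # remember the name of the parameter we are currently inside
    /<name>/ {
      name = $0
      sub(/.*<name>/, "", name); sub(/<\/name>.*/, "", name)
      if (name == "Hosts") has_hosts = 1
    }

    # capture the value only while inside the Roles parameter
    /<value>/ && name == "Roles" {
      role = $0
      sub(/.*<value>/, "", role); sub(/<\/value>.*/, "", role)
      has_role = 1
    }

    /<\/param-value>/ { name = "" }   # leaving the parameter block
    END { report() }                  # flush the last file
  ' "$@"
}
```

Example call: roles_for_hosts_files ./*.xml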


1
$ xmlstarlet ed -u /param-value/name -v Roles -u /param-value/value -v asdf data.xml

<?xml version="1.0"?>
<param-value>
  <name>Roles</name>
  <description>some description</description>
  <value>asdf</value>
</param-value>


0

I usually use Perl's XML::XSH2. You can process XML files interactively in it, or script it. The script would be something like (untested):

for my $file in { glob "*.xml" } {
    open $file ;
    my $param_value = //param-value[name="Hosts"] ;
    if $param_value echo $file $param_value/value ;
}

