I have an XML file where I need to keep the order of the tags, but one tag, media, sometimes appears as duplicate lines in consecutive order. I would like to delete one line of each duplicate media pair while preserving all of the parent tags (which are also consecutive and repeat). Is there an awk solution that deletes a duplicate line only if it matches a pattern? For example:
<story>
<article>
<media>One line</media>
<media>One line</media> <-- Same line as above, want to delete this
<media>Another Line</media>
<media>Another Line</media> <-- Another duplicate, want to delete this
</article>
</story>
<story>
<article>
........ and so on
I want to keep the consecutive story and article tags and just delete duplicates for the media tag. I've tried a number of awk scripts but nothing seems to work without sorting the file and ruining the order of the xml. Any help much appreciated.
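A minimal awk sketch of "delete only if a pattern is matched" might look like the following. It assumes each media element sits on its own line and that duplicates are byte-for-byte identical. The "<-- ..." notes in the example above are assumed to be annotations and not part of the real file. The function name dedupe_media is hypothetical, chosen only for illustration.

```shell
# dedupe_media (hypothetical helper): reads XML on stdin, writes to stdout.
# A line is dropped only when it contains <media> AND is identical to the
# line immediately before it. Every other line (story, article, ...) passes
# through unchanged and in its original order; nothing is sorted.
dedupe_media() {
    awk '
        # Skip (do not print) a <media> line identical to the previous line.
        /<media>/ && $0 == prev { next }
        # Print everything else, then remember it for the next comparison.
        { print; prev = $0 }
    '
}
```

Usage would be along the lines of dedupe_media < input.xml > output.xml (file names are placeholders). Because awk only remembers the previous line, a run of three identical media lines collapses to one, while identical media lines that are not adjacent are all kept. If duplicates can differ in indentation or trailing whitespace, the comparison would need to use a trimmed copy of each line instead of $0.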