I have an XML file that looks something like this:

<Header version= '1.0' timestamp='2017-01-04T07:10:07'>
   <Date>2017-04-18</Date>
   .
   .
   .
</Header>
<Header version= '1.0' timestamp='2017-01-04T07:10:07'>
   <Date>2017-04-18</Date>
   .
   .
   .
</Header>
<Header version= '1.0' timestamp='2017-01-04T07:10:07'>
   <Date>2017-04-18</Date>
   .
   .
   .
</Header>

I would like to delete the opening "Header" lines (but not the closing /Header lines), starting with the 2nd occurrence; don't ask why :-). So the output should look something like this (yes, I know that it is not well-formed, but I am going to perform other processing on it as well):

<Header version= '1.0' timestamp='2017-01-04T07:10:07'>
   <Date>2017-04-18</Date>
   .
   .
   .
</Header>
<Date>2017-04-18</Date>
   .
   .
   .
</Header>
<Date>2017-04-18</Date>
   .
   .
   .
</Header>

I tried:

sed -i '2,${/<Header/d;}' file

but that deleted all the occurrences of Header. Any suggestions?

Thanks

  • What is the XY problem? Commented May 4, 2017 at 5:49
  • @Cyrus It could be an XY problem, true. But what OP wants is not so hard to do that understanding the underlying problem becomes essential. However, OP, the recommendation by Cyrus (thinking of alternate ways to achieve the ultimate goal) is valid. You might spend some thought that way. Commented May 4, 2017 at 5:55
  • It makes no sense: after such a deletion your XML will become invalid. Commented May 4, 2017 at 6:11
  • Use an XML parser (xmlstarlet, e.g.) for your XY problem. Commented May 4, 2017 at 6:19

2 Answers

This might work for you (GNU sed):

sed '/^<\/Header/,${/^<Header/d}' file

From the first closing Header tag to the end of the file, remove any lines beginning with an opening Header tag.
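
To see the range address at work, here is a minimal sketch that runs the command on a shortened sample. The sample content and the names `sample` and `result` are assumptions made for this illustration, and the `;` before `}` is added only so that non-GNU seds accept the script too.

```shell
# Build a small sample file (hypothetical content modeled on the question).
sample=$(mktemp)
cat > "$sample" <<'EOF'
<Header version= '1.0' timestamp='2017-01-04T07:10:07'>
   <Date>2017-04-18</Date>
</Header>
<Header version= '1.0' timestamp='2017-01-04T07:10:07'>
   <Date>2017-04-18</Date>
</Header>
<Header version= '1.0' timestamp='2017-01-04T07:10:07'>
   <Date>2017-04-18</Date>
</Header>
EOF

# The range starts at the first line matching ^</Header and runs to the
# last line ($). Inside that range, lines starting with <Header are deleted.
# The first opening tag comes before the range starts, so it is kept.
result=$(sed '/^<\/Header/,${/^<Header/d;}' "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

Note that the Date lines keep their indentation; only the repeated opening Header lines are dropped.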

sed '/<Header/{p;:a;s/^.*$//;N;s/\n//;/<Header/!p;ba}' input.txt
  • find the first occurrence
  • print it
  • start a loop
    • forget the current line
    • get the next
    • get rid of the unwanted newline
    • print it if it is not a match
  • loop

This assumes that each of your header lines is always a single line. Otherwise it gets tough. In that case, think about whether this might be an XY problem (see comment by Cyrus). I also assume that removing the indentation of the date lines is not actually wanted.

3 Comments

I tried the above, but am getting an error: sed: -e expression #1, char 19: unknown command: `z'
I just used this command from another post and it worked: awk '!/^<Header/ || !f++' file
@user3152289 I avoided the z, which seems not to be supported by your sed. Maybe try my changed answer, just to test whether it works on your sed, too. Use whatever helps you. Consider making your own answer, having three different styles would be cool.
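
For anyone wondering why the awk one-liner from the comment above works, here is a small sketch on a shortened sample. The sample content and the names `sample` and `result` are assumptions for this illustration; `f` is simply an uninitialized awk variable, which starts out as 0.

```shell
# Build a small sample file (hypothetical content modeled on the question).
sample=$(mktemp)
cat > "$sample" <<'EOF'
<Header version= '1.0' timestamp='2017-01-04T07:10:07'>
   <Date>2017-04-18</Date>
</Header>
<Header version= '1.0' timestamp='2017-01-04T07:10:07'>
   <Date>2017-04-18</Date>
</Header>
<Header version= '1.0' timestamp='2017-01-04T07:10:07'>
   <Date>2017-04-18</Date>
</Header>
EOF

# A line is printed when the whole pattern is true:
#   !/^<Header/  true for every line that does not start with <Header
#   !f++         only evaluated for <Header lines (|| short-circuits);
#                f is 0 the first time, so !f++ is true once, and false
#                for every later <Header line because f is then nonzero.
result=$(awk '!/^<Header/ || !f++' "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

Since awk has no label or loop handling to worry about here, this tends to behave the same across awk implementations, which may be why it worked where the sed variant did not.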
