I have a huge file that I want to filter with grep.

Let's say this is my file:

TIME0 random data
TIME1 random data
TIME2 INTERESTING LINE
TIME3 random data
TIME4 random data
TIME5 random data
TIME6 random data
TIME7 INTERESTING LINE
TIME8 random data
TIME9 random data
TIME10 random data
TIME11 INTERESTING LINE
TIME12 random data

I want to display INTERESTING LINEs:

grep "INTERESTING LINE" myfile

This works, but the file is huge and contains millions of INTERESTING LINEs. I only need the last few:

tac myfile | grep -m3 "INTERESTING LINE"

This works, but how can I specify that I need the INTERESTING LINEs only after a certain TIME prefix? (Or with tac until a certain TIME)

So for example with the above sample file, how can I grep all the INTERESTING LINEs from myfile from the end until TIME7 only? (so TIME2's interesting line is not needed):

TIME11 INTERESTING LINE
TIME7 INTERESTING LINE

Ordering is not important, I can live with either ASC or DESC ordering.

What is important is to not scan the whole file, i.e. to work in a line-by-line fashion from the end of the file.

I'm looking for a way to give grep an exit criterion (instead of capping the number of results with -m).

1 Answer

Using sed rather than grep to have more control over the parsing of the input data:

$ tac file | sed -n -e '/^TIME6 /q' -e '/INTERESTING LINE/p'
TIME11 INTERESTING LINE
TIME7 INTERESTING LINE

This reverses the file with tac as you suggest, and passes the reversed data through sed.

The two sed expressions:

  • /^TIME6 /q quits as soon as sed reads a line starting with TIME6. You could also use /^TIME[0-6] /q, or any expression that matches the timestamps that are too old to be interesting.

  • /INTERESTING LINE/p prints every line that matches the given regular expression.
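
To illustrate why a bracket expression can be safer than an exact timestamp, here is a minimal sketch. It assumes GNU tac is available, and the log contents and the variable names (log, exact, range) are made up for the example. The TIME6 line is deliberately missing from the log:

```shell
# Hypothetical log in which the TIME6 line happens to be missing.
log=$(mktemp)
printf '%s\n' \
    'TIME2 INTERESTING LINE' \
    'TIME5 random data' \
    'TIME7 INTERESTING LINE' \
    'TIME8 random data' >"$log"

# The exact pattern never matches, so sed reads all the way back
# to the first line of the file and also prints the TIME2 line.
exact=$(tac "$log" | sed -n -e '/^TIME6 /q' -e '/INTERESTING LINE/p')

# The bracket expression quits at TIME5, the first line that is too old.
range=$(tac "$log" | sed -n -e '/^TIME[0-6] /q' -e '/INTERESTING LINE/p')

printf 'exact pattern:\n%s\n' "$exact"
printf 'bracket expression:\n%s\n' "$range"
rm -f "$log"
```

With the exact pattern, both the TIME7 and TIME2 lines are printed; with the bracket expression, only the TIME7 line is. Note that /^TIME[0-6] / only covers single-digit timestamps, so a real timestamp format would need its own expression.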

The effect is that the reversed file is only read until we reach a timestamp that is too old. Any interesting lines found before that point are printed to standard output.
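
One way to check that sed really stops early is to have it report the line number at which it quits. The sketch below rebuilds the sample file from the question in a temporary file (the variable names sample, lines_read and total are only for illustration) and assumes GNU tac and a sed that accepts the {=;q;} grouping:

```shell
# Rebuild the 13-line sample file from the question in a temporary file.
sample=$(mktemp)
for i in 0 1 2 3 4 5 6 7 8 9 10 11 12; do
    case $i in
        2|7|11) printf 'TIME%s INTERESTING LINE\n' "$i" ;;
        *)      printf 'TIME%s random data\n' "$i" ;;
    esac
done >"$sample"

# "=" prints the current input line number just before "q" quits,
# so this reports how many lines sed had read when it stopped.
lines_read=$(tac "$sample" | sed -n -e '/^TIME6 /{=;q;}')

# Arithmetic expansion strips any padding that wc may add.
total=$(( $(wc -l <"$sample") ))

printf 'sed read %s of %s lines\n' "$lines_read" "$total"
rm -f "$sample"
```

On the sample data this reports that sed read 7 of the 13 lines: TIME12 down to TIME6 in reversed order.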

If you know that TIME7 is the exact timestamp you want to search back to, put the print expression first and quit on TIME7 itself:

$ tac file | sed -n -e '/INTERESTING LINE/p' -e '/^TIME7 /q'
TIME11 INTERESTING LINE
TIME7 INTERESTING LINE

Because the print expression runs before the quit expression, the last line read is still printed if it's interesting, even though it carries exactly the timestamp that we quit at.
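
The order of the two expressions matters here. As a small sketch (assuming GNU tac; the four-line file and the names print_first and quit_first are invented for the comparison), the same pair of expressions gives different results depending on which comes first:

```shell
# Small hypothetical excerpt around the cut-off timestamp.
sample=$(mktemp)
printf '%s\n' \
    'TIME6 random data' \
    'TIME7 INTERESTING LINE' \
    'TIME8 random data' \
    'TIME11 INTERESTING LINE' >"$sample"

# Print first, then quit: the TIME7 line is printed before sed exits.
print_first=$(tac "$sample" | sed -n -e '/INTERESTING LINE/p' -e '/^TIME7 /q')

# Quit first: sed exits on the TIME7 line before the print expression
# runs, and -n suppresses the automatic print, so that line is lost.
quit_first=$(tac "$sample" | sed -n -e '/^TIME7 /q' -e '/INTERESTING LINE/p')

printf 'print first:\n%s\n' "$print_first"
printf 'quit first:\n%s\n' "$quit_first"
rm -f "$sample"
```

With the print expression first, both the TIME11 and TIME7 lines appear; with the quit expression first, only the TIME11 line does.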
