
I've a tab-delimited file, e.g. myfile.tsv:

abc\tfoo
xyz\tbar

but sometimes some of its columns are blank, e.g.

abc\tfoo
xyz\tbar
what\t
\tthe
bleep\tsleep

i.e.

$ printf "abc\tfoo\n" > myfile.tsv
printf "xyz\tbar\n" >> myfile.tsv
printf "what\t\n" >> myfile.tsv
printf "\tthe\n" >> myfile.tsv
printf "bleep\tsleep\n" >> myfile.tsv

$ cat myfile.tsv 
abc foo
xyz bar
what    
    the
bleep   sleep

I could write a Python script to remove the lines where any column is empty, e.g.

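with open('myfile.tsv') as fin:
    for line in fin:
        # Strip only the newline: strip() would also remove a leading or
        # trailing tab, so rows like "what\t" would fail to unpack.
        x, y = line.rstrip('\n').split('\t')
        x = x.strip()
        y = y.strip()
        if x and y:
            print(line, end='')  # line already ends with a newline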
            

But how do I do the same with Unix shell commands, e.g. grep, sed, awk or something similar?


I've also tried something like this in grep:

grep -e ".\t." myfile.tsv 

That seems to work, but it doesn't if a column contains only spaces:

$ printf "abc\tfoo\n" > myfile.tsv
printf "xyz\tbar\n" >> myfile.tsv
printf "what\t  \n" >> myfile.tsv
printf "  \tthe\n" >> myfile.tsv
printf "bleep\tsleep\n" >> myfile.tsv

$ grep -e ".\t." myfile.tsv       
abc foo
xyz bar
what      
    the
bleep   sleep

3 Answers


Using Miller (mlr):

$ cat -t myfile.tsv
abc^Ifoo
xyz^Ibar
^I
what^I
^Ithe
bleep^Isleep
$ mlr --tsv filter 'bool empty=false ; for (k,v in $*) { empty = is_empty(v); empty { break }  } !empty' myfile.tsv
abc     foo
xyz     bar
bleep   sleep

The equivalent thing in awk:

$ awk -F '\t' '{ empty = 1; for (i = 1; i <= NF; ++i) if (empty = (length($i) == 0)) break }; !empty' myfile.tsv
abc     foo
xyz     bar
bleep   sleep
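
The question's second example has columns that contain only spaces, which a `length($i) == 0` test treats as non-empty. A minimal sketch of a space-aware variant, assuming that "blank" means empty or made up of spaces only; the function name `drop_blank_fields` is hypothetical and just wraps the awk call so it can read either files or standard input:

```shell
# Hypothetical helper: keep only rows in which every tab-separated field
# contains at least one character other than a space.
drop_blank_fields() {
    awk -F '\t' '
        NF == 0 { next }              # skip completely empty lines
        {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^ *$/)      # field is empty or spaces only
                    next
            print
        }
    ' "$@"
}

# Sample input modelled on the data in the question.
printf 'abc\tfoo\nwhat\t  \n  \tthe\n\t\nbleep\tsleep\n' | drop_blank_fields
```

Called as `drop_blank_fields myfile.tsv`, this should leave only the `abc`/`foo` and `bleep`/`sleep` rows, and it is not tied to exactly two columns.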

Using sed

$ sed -E '/^\t|\t$/d' myfile.tsv
abc     foo
xyz     bar
bleep   sleep

To remove lines where ALL the fields are empty or contain only spaces or tabs, you can match and exclude lines consisting only of whitespace:

grep -v '^[[:space:]]*$'
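
The command above only drops rows in which every field is blank. If the goal is instead to drop a row as soon as any one field is blank, as in the question, a similar `grep -v` could be sketched as follows. This assumes that "blank" means empty or spaces only; the function name `keep_complete_rows` and the `tab` variable are hypothetical. The tab is passed in as a literal character because `grep -E` itself does not interpret `\t`:

```shell
# A literal tab character, produced by printf.
tab=$(printf '\t')

# Hypothetical helper: delete every row that has a blank field, i.e. a run
# of zero or more spaces bounded by line start or a tab on the left and by
# a tab or line end on the right.
keep_complete_rows() {
    grep -Ev "(^|$tab) *($tab|\$)" "$@"
}

# Sample input modelled on the data in the question.
printf 'abc\tfoo\nwhat\t  \n  \tthe\n\t\nbleep\tsleep\n' | keep_complete_rows
```

Because the pattern also matches a completely empty line, those are removed as well.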
