How to remove lines from TSV file where columns are empty or all whitespace?

Question

I have a tab-delimited file, e.g. myfile.tsv:

abc\tfoo
xyz\tbar

but sometimes it has some blank columns, e.g.

abc\tfoo
xyz\tbar
what\t
\tthe
bleep\tsleep

i.e.

$ printf "abc\tfoo\n" > myfile.tsv
printf "xyz\tbar\n" >> myfile.tsv
printf "what\t\n" >> myfile.tsv
printf "\tthe\n" >> myfile.tsv
printf "bleep\tsleep\n" >> myfile.tsv

$ cat myfile.tsv 
abc foo
xyz bar
what    
    the
bleep   sleep

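I could write a Python script to remove the lines where a column is empty or all whitespace, e.g.

with open('myfile.tsv') as fin:
    for line in fin:
        # Remove only the newline: strip() would also eat the tab in
        # lines like "what\t", and the two-way unpacking would fail.
        x, y = line.rstrip('\n').split('\t')
        # Keep the line only if both columns have non-whitespace content.
        if x.strip() and y.strip():
            print(line, end='')  # line already ends with '\n'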

But how can I do the same with standard Unix shell tools such as grep, sed, or awk?

I've also tried something like this with grep:

grep -e ".\t." myfile.tsv

That seems to work, but not when a column contains only spaces:

$ printf "abc\tfoo\n" > myfile.tsv
printf "xyz\tbar\n" >> myfile.tsv
printf "what\t  \n" >> myfile.tsv
printf "  \tthe\n" >> myfile.tsv
printf "bleep\tsleep\n" >> myfile.tsv

$ grep -e ".\t." myfile.tsv       
abc foo
xyz bar
what      
    the
bleep   sleep
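Why whitespace-only columns slip through: "." matches any character, including a space, so a line like "what\t  " still has a character on each side of its tab. Below is a minimal Python sketch of a stricter test, assuming exactly two tab-separated columns per line. It requires at least one non-whitespace character (\S) in each column. The names STRICT, LOOSE and has_two_nonblank_columns are hypothetical, and Python's re module stands in for grep here.

```python
import re

# Both patterns are applied to one line of a two-column TSV file.
# LOOSE approximates the intent of the grep attempt: any character on
# each side of a tab. '.' also matches a space, so blank columns pass.
LOOSE = re.compile(r'.\t.')

# STRICT (hypothetical) requires a non-whitespace character (\S) in
# each column; [^\t]* keeps each part of the match inside one column.
STRICT = re.compile(r'^[^\t]*\S[^\t]*\t[^\t]*\S')


def has_two_nonblank_columns(line: str) -> bool:
    """Return True if both columns contain a non-whitespace character."""
    return STRICT.search(line.rstrip('\n')) is not None


lines = ["abc\tfoo\n", "xyz\tbar\n", "what\t  \n", "  \tthe\n", "bleep\tsleep\n"]
for line in lines:
    # Compare the loose and strict tests side by side.
    print(repr(line), bool(LOOSE.search(line)), has_two_nonblank_columns(line))
```

The loose pattern accepts all five sample lines. The strict one accepts only abc, xyz and bleep.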

Kusalananda · Accepted Answer · 2022-10-27 20:02:42Z

3

Using Miller (mlr):

$ cat -t myfile.tsv
abc^Ifoo
xyz^Ibar
^I
what^I
^Ithe
bleep^Isleep

$ mlr --tsv filter 'bool empty=false ; for (k,v in $*) { empty = is_empty(v); empty { break }  } !empty' myfile.tsv
abc     foo
xyz     bar
bleep   sleep

The equivalent in awk:

$ awk -F '\t' '{ empty = 1; for (i = 1; i <= NF; ++i) if (empty = (length($i) == 0)) break }; !empty' myfile.tsv
abc     foo
xyz     bar
bleep   sleep
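The mlr and awk commands share one idea: split the line on tabs, walk the fields, and reject the record as soon as one field is empty. Here is a minimal Python sketch of that loop. By default it tests for zero-length fields, like length($i) == 0 in awk and, as far as I know, is_empty in Miller. That means a field holding only spaces is kept. The ignore_whitespace flag is a hypothetical extension, not part of either tool. It also treats whitespace-only fields as empty, which is what the question asks for. The helper names keep_line and filter_tsv are likewise hypothetical.

```python
def keep_line(line: str, ignore_whitespace: bool = False) -> bool:
    """Return True if no tab-separated field on the line is empty.

    With ignore_whitespace=True (a hypothetical extension), a field
    holding only whitespace also counts as empty.
    """
    fields = line.rstrip('\n').split('\t')
    for field in fields:
        value = field.strip() if ignore_whitespace else field
        if len(value) == 0:  # mirrors awk's length($i) == 0
            return False     # like the `break`: one empty field is enough
    return True


def filter_tsv(path: str, ignore_whitespace: bool = False) -> None:
    """Print the lines of a TSV file whose fields are all non-empty."""
    with open(path) as fin:
        for line in fin:
            if keep_line(line, ignore_whitespace):
                print(line, end='')  # line already ends with '\n'
```

For the question's file, filter_tsv('myfile.tsv', ignore_whitespace=True) would drop both the empty and the space-only columns. As with the awk version, a completely blank line counts as having an empty field and is dropped.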

sseLtaH · Accepted Answer · 2022-10-27 18:59:46Z

1

Using sed:

$ sed -E '/^\t|\t$/d' myfile.tsv
abc     foo
xyz     bar
bleep   sleep
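The sed command deletes any line that begins or ends with a tab. In a two-column file, that means the first or the second field is empty. The \t escape inside the pattern is understood by GNU sed; other seds may need a literal tab. Below is a Python sketch of the same test, with Python's re module in place of sed. BLANK_EDGE and sed_like_filter are hypothetical names. BLANK_EDGE is a variant that also treats a whitespace-only first or last field as empty. Both patterns are applied to the line with its newline removed. Neither looks at middle fields, so in a wider file an empty field between two others ("a\t\tb") is not caught.

```python
import re

# Same pattern as the sed answer: a leading or trailing tab means
# the first or last field is empty.
EMPTY_EDGE = re.compile(r'^\t|\t$')

# Hypothetical variant: \s* also absorbs spaces before the first tab
# or after the last one, so whitespace-only edge fields match too.
BLANK_EDGE = re.compile(r'^\s*\t|\t\s*$')


def sed_like_filter(lines, pattern=EMPTY_EDGE):
    """Yield the lines the pattern does NOT match (like sed's /re/d)."""
    for line in lines:
        if not pattern.search(line.rstrip('\n')):
            yield line
```

With the default pattern, "what\t  " is kept, just as with the sed command. Passing BLANK_EDGE removes it.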

Chris Davies · Accepted Answer · 2022-10-27 18:38:08Z

0

To remove lines where ALL fields are either empty or contain only spaces or tabs, exclude the lines that consist only of whitespace:

grep -v '^[[:space:]]*$' myfile.tsv
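This removes a line only when every field on it is blank. A line such as "what\t" from the question, with one empty field next to a non-empty one, is kept. The small Python sketch below contrasts this "all fields blank" test with the question's "any field blank" test. The helper names are hypothetical, and str.strip() stands in for [[:space:]].

```python
def all_fields_blank(line: str) -> bool:
    """True if the whole line is whitespace (what grep -v removes)."""
    return line.strip() == ''


def any_field_blank(line: str) -> bool:
    """True if at least one tab-separated field is whitespace-only."""
    return any(f.strip() == '' for f in line.rstrip('\n').split('\t'))


lines = ["abc\tfoo\n", "what\t  \n", "  \tthe\n", " \t \n", "\n"]
# Lines surviving the grep -v approach vs. the question's requirement.
kept_by_grep = [ln for ln in lines if not all_fields_blank(ln)]
kept_by_question = [ln for ln in lines if not any_field_blank(ln)]
print(kept_by_grep)
print(kept_by_question)
```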
