Checking for duplicate values in specific columns of a pipe-separated CSV file

Question

Hi, I have a CSV file with more than 1 lakh (100,000) rows, separated by pipes. It looks like this:

2|dfdf|er34Sr|afe|&*&|djhgjdsf|jhfgdhj12|dse|dsR|fcdf|erer|ddff|vcdf||||||
3||||dfrg||DFgfg||FDGRFG|FGB|FG|4546|@#$|FGFDG|DGFDFG|||FGfg||DGF |||GF |||
dhgfyukdsf|dfdf|#%||fghfhj|nvcbd,|bhd|cmnbch|vjh|jhfur||mhvjh|mnbvm||||
hjgg|||||gy|fdf|D|||fgfg|gfgf|Fgfg|FGfg|Sf||dfdfbhj|segrhb|zaefef|||
2|dfdf|er34Sr|afe|&*&|djhgjdsf|jhfgdhj12|dse|dsR|fcdf|erer|ddff|vcdf||||||
2|dfdf|er34Sr|afe|&*&|djhgjdsf|jhfgdhj12|dse|dsR|fcdf|erer|ddff|vcdf||||||
2|dfdf|er34Sr|afe|&*&|djhgjdsf|jhfgdhj12|dse|dsR|fcdf|erer|ddff|vcdf||||||
3||||dfrg||DFgfg||FDGRFG|FGB|FG|4546|@#$|FGFDG|DGFDFG|||FGfg||DGF |||GF |||
3||||dfrg||DFgfg||FDGRFG|FGB|FG|4546|@#$|FGFDG|DGFDFG|||FGfg||DGF |||GF |||
3||||dfrg||DFgfg||FDGRFG|FGB|FG|4546|@#$|FGFDG|DGFDFG|||FGfg||DGF |||GF |||

I want to check for repeated data in specific columns each time I execute my script.

For example, I want to check whether columns 1, 7, 12 and 14 contain the same data on more than one line of the CSV file and, if so, display only the data that repeats.

I have tried

awk -F"|" '{
if (x[$'"$ColumnNo1"'$'"$ColumnNo2"'$'"$ColumnNo3"'$'"ColumnNo4"'])
{x_Count[$'"$ColumnNo1"'$'"$ColumnNo2"'$'"$ColumnNo3"'$'"ColumnNo4"']++;
print $0;
if(x_Count[$'"$ColumnNo1"'$'"$ColumnNo2"'$'"$ColumnNo3"'$'"ColumnNo4"']==1)
{
print x[$'"$ColumnNo1"'$'"$ColumnNo2"'$'"$ColumnNo3"'$'"ColumnNo4"']}}
x[$'"$ColumnNo1"'$'"$ColumnNo2"'$'"$ColumnNo3"'$'"ColumnNo4"']=$0}' csvfilename.csv

but I am not getting any output.

$ColumnNo1, $ColumnNo2, $ColumnNo3 are shell script variables.

Please help :)

Costas · Accepted Answer · 2016-11-28 11:32:50Z

1

Besides the few specialized tools for working with CSV (e.g. csvtool), you can do this in awk:

awk -F"|" '
    {
        r = $w SUBSEP $x SUBSEP $y SUBSEP $z #prepare index from 4 fields data
    }
    R[r]{                                    #if index present in array already
        if ( R[r] != 1){                     #if it is a first repetition
            print R[r]                       #print line stored in array
            R[r] = 1                         #mark element «not a first time»
        }
        print                                #print present line
        next                                 #skip rest of code (go to next line)
    }
    {
        R[r] = $0                            #store line in array (first time only)
    }
    ' w="$ColumnNo1" x="$ColumnNo2" y="$ColumnNo3" z="$ColumnNo4" file.csv
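
On the role of SUBSEP in the index above: it is awk's built-in subscript separator, and writing R[a SUBSEP b] is equivalent to R[a,b]. Joining the fields with it keeps different field combinations from collapsing into one key, which plain concatenation (as in the question's attempt) can do. A minimal sketch, assuming a hypothetical two-row input and a hypothetical function name demo_subsep_keys:

```shell
# Hypothetical input: the two rows have different fields, but the fields
# concatenate to the same string ("ab"+"c" and "a"+"bc" both give "abc").
demo_subsep_keys() {
    printf 'ab|c\na|bc\n' |
    awk -F'|' '
        {
            plain[$1 $2]++          # key built by plain concatenation
            safe[$1 SUBSEP $2]++    # key built with SUBSEP between fields
        }
        END {
            np = 0; for (k in plain) np++   # count distinct plain keys
            ns = 0; for (k in safe) ns++    # count distinct SUBSEP keys
            print "plain keys: " np
            print "SUBSEP keys: " ns
        }'
}
demo_subsep_keys
```

With plain concatenation the two rows share a single key and would be reported as duplicates; with SUBSEP they stay distinct.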

edited Nov 28, 2016 at 11:32

answered Nov 28, 2016 at 8:38

Costas


Can you please explain how it works, i.e. what is SUBSEP?
– SinghChan, Nov 28, 2016 at 9:13

I have tried the above code and it's not working.
– SinghChan, Nov 28, 2016 at 9:26

@SinghChan I have tested it on your data and it works well. To find the reason, supply us with the sample data on which the code is not working.
– Costas, Nov 28, 2016 at 11:25

@SinghChan LESS=+"@/^\s*SUBSEP" man awk will help you.
– Costas, Nov 28, 2016 at 11:41


Valentin B. · Accepted Answer · 2016-11-28 10:32:57Z

Try this (note that your shell variables MUST be integers):

awk -v C1="$ColumnNo1" -v C2="$ColumnNo2" -v C3="$ColumnNo3" -v C4="$ColumnNo4" -F'|' '
       { a1[$C1]++; a2[$C2]++; a3[$C3]++; a4[$C4]++}
       END {
       printf "Non-unique entries in column %d\n", C1 
       for (key in a1) {              
         if (a1[key] > 1) print key
       }
       printf "Non-unique entries in column %d\n", C2
       for (key in a2) {              
         if (a2[key] > 1) print key
       }
       printf "Non-unique entries in column %d\n", C3
       for (key in a3) {               
         if (a3[key] > 1) print key
       }
       printf "Non-unique entries in column %d\n", C4
       for (key in a4) {               
         if (a4[key] > 1) print key
       }}' <myfile.csv
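
Note that this approach counts each column independently, so it reports values repeated within one column even when the combination of columns never repeats. A minimal sketch of the difference, assuming a hypothetical three-row sample and a hypothetical function name per_column_vs_combined (the combined key is an addition for comparison, not part of the answer above):

```shell
# Hypothetical sample: "2" repeats in column 1 and "x" repeats in column 2,
# but no (column 1, column 2) pair occurs twice.
per_column_vs_combined() {
    printf '2|x\n2|y\n3|x\n' |
    awk -F'|' -v C1=1 -v C2=2 '
        {
            a1[$C1]++                   # per-column counts, as in the answer
            a2[$C2]++
            both[$C1 SUBSEP $C2]++      # combined key, added for comparison
        }
        END {
            n1 = 0; for (k in a1)   if (a1[k]   > 1) n1++
            n2 = 0; for (k in a2)   if (a2[k]   > 1) n2++
            nb = 0; for (k in both) if (both[k] > 1) nb++
            print "repeated values in column 1: " n1
            print "repeated values in column 2: " n2
            print "repeated (col1,col2) pairs: " nb
        }'
}
per_column_vs_combined
```

Which behaviour is wanted depends on whether the columns should be checked one at a time or as a combined key.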
