0

I have created a script to modify a CSV file, but it does not work correctly:

  1. The awk -F, 'NR==1| does not print the header but the second row.

  2. Even though there are only some columns specified in while read V0 V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V14 V15 V16 it prints out all columns of the csv.

Some sample data:

COUNTRY NAME,YEAR,LIFE LADDER,LOG GDP PER CAPITA,SOCIAL SUPPORT,HEALTHY LIFE EXPECTANCY AT BIRTH,FREEDOM TO MAKE LIFE CHOICES,GENEROSITY,PERCEPTIONS OF CORRUPTION,POSITIVE AFFECT,NEGATIVE AFFECT,CONFIDENCE IN NATIONAL GOVERNMENT,DEMOCRATIC QUALITY,DELIVERY QUALITY,STANDARD DEVIATION OF LADDER BY COUNTRY-YEAR,STANDARD DEVIATION/MEAN OF LADDER BY COUNTRY-YEAR,GINI INDEX (WORLD BANK ESTIMATE),GINI INDEX (WORLD BANK ESTIMATE) AVERAGE 2000-16","GINI OF HOUSEHOLD INCOME REPORTED IN GALLUP BY WP5-YEAR","MOST PEOPLE CAN BE TRUSTED GALLUP","MOST PEOPLE CAN BE TRUSTED WVS ROUND 1981-1984","MOST PEOPLE CAN BE TRUSTED WVS ROUND 1989-1993","MOST PEOPLE CAN BE TRUSTED WVS ROUND 1994-1998","MOST PEOPLE CAN BE TRUSTED WVS ROUND 1999-2004","MOST PEOPLE CAN BE TRUSTED WVS ROUND 2005-2009","MOST PEOPLE CAN BE TRUSTED WVS ROUND 2010-2014"
AFGHANISTAN,2008,3.72,7.16,0.45,50.79,0.71,0.17,0.88,0.51,0.25,0.61,-1.92,-1.65,1.77,0.47,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,
AFGHANISTAN,2009,4.40,7.33,0.55,51.20,0.67,0.20,0.85,0.58,0.23,0.61,-2.04,-1.63,1.72,0.39,NULL,NULL,0.44,0.28,NULL,NULL,NULL,NULL,NULL,
AFGHANISTAN,2010,4.75,7.38,0.53,51.59,0.60,0.13,0.70,0.61,0.27,0.29,-1.99,-1.61,1.87,0.39,NULL,NULL,0.32,0.27,NULL,NULL,NULL,NULL,NULL,
AFGHANISTAN,2011,3.83,7.41,0.52,51.91,0.49,0.17,0.73,0.61,0.26,0.30,-1.91,-1.61,1.78,0.46,NULL,NULL,0.33,NULL,NULL,NULL,NULL,NULL,NULL,
AFGHANISTAN,2012,3.78,7.51,0.52,52.24,0.53,0.24,0.77,0.71,0.26,0.43,-1.84,-1.40,1.79,0.47,NULL,NULL,0.34,NULL,NULL,NULL,NULL,NULL,NULL,
AFGHANISTAN,2013,3.57,7.52,0.48,52.56,0.57,0.07,0.82,0.62,0.27,0.48,-1.87,-1.40,1.22,0.34,NULL,NULL,0.30,NULL,NULL,NULL,NULL,NULL,NULL,
AFGHANISTAN,2014,3.13,7.51,0.52,52.88,0.50,0.11,0.87,0.53,0.37,0.40,-1.77,-1.31,1.39,0.44,NULL,NULL,0.41,NULL,NULL,NULL,NULL,NULL,NULL,
AFGHANISTAN,2015,3.98,7.50,0.52,53.20,0.38,0.08,0.88,0.55,0.33,0.26,-1.84,-1.29,2.16,0.54,NULL,NULL,0.59,NULL,NULL,NULL,NULL,NULL,NULL,

This is my script:

#!/usr/bin/env bash

INPUT=./happiness.csv.1
OLDIFS=$IFS
IFS=','
[ ! -f $INPUT ] && { echo "$INPUT file not found"; exit 99; }


while read V0 V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V14 V15 V16 

do
      awk -F, 'NR==1 || /^[A-Z,a-z]*,2018/' | 
      grep '^.*-.*$' 
                                                
done < $INPUT

And this is the output:

> AFGHANISTAN,2008,3.72,7.16,0.45,50.79,0.71,0.17,0.88,0.51,0.25,0.61,-1.92,-1.65,1.77,0.47,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,
> AFGHANISTAN,2018,2.69,7.49,0.50,52.59,0.37,-0.08,0.92,0.42,0.40,0.36,NULL,NULL,1.40,0.52,NULL,NULL,0.29,NULL,NULL,NULL,NULL,NULL,NULL,
> ALGERIA,2018,5.04,9.55,0.79,65.90,0.58,-0.17,0.75,0.59,0.29,NULL,NULL,NULL,1.97,0.39,NULL,0.27,0.66,NULL,NULL,NULL,NULL,0.10,NULL,0.17
> ARGENTINA,2018,5.79,9.80,0.89,68.80,0.84,-0.20,0.85,0.82,0.32,0.26,NULL,NULL,2.47,0.42,NULL,0.46,0.40,NULL,0.27,0.22,0.17,0.15,0.17,0.19
> ARMENIA,2018,5.06,9.11,0.81,66.90,0.80,-0.14,0.67,0.58,0.45,0.67,NULL,NULL,2.10,0.41,NULL,0.31,0.40,NULL,NULL,NULL,0.23,NULL,NULL,0.10
> AZERBAIJAN,2018,5.16,9.67,0.78,65.5,0.77,-0.25,0.56,0.59,0.19,0.83,NULL,NULL,1.86,0.36,NULL,0.21,0.26,NULL,NULL,NULL,0.19,NULL,NULL,0.14
> BANGLADESH,2018,4.49,8.22,0.70,64.30,0.90,-0.03,0.70,0.54,0.36,0.83,NULL,NULL,2.15,0.47,NULL,0.32,0.36,NULL,NULL,NULL,0.20,0.23,NULL,
> BELARUS,2018,5.23,9.77,0.90,66.09,0.64,-0.18,0.71,0.45,0.23,0.42,NULL,NULL,1.69,0.32,NULL,0.28,0.29,NULL,NULL,0.25,0.22,NULL,NULL,0.32
> BELGIUM,2018,6.89,10.67,0.92,72,0.80,-0.12,0.63,0.74,0.25,0.44,NULL,NULL,1.54,0.22,NULL,0.28,0.29,NULL,NULL,NULL,NULL,NULL,NULL,
> BOLIVIA,2018,5.91,8.86,0.82,63.59,0.86,-0.08,0.78,0.74,0.38,0.39,NULL,NULL,2.20,0.37,NULL,0.52,0.45,NULL,NULL,NULL,NULL,NULL,NULL,
> BOTSWANA,2018,3.46,9.68,0.79,58.90,0.81,-0.25,0.80,0.72,0.26,0.71,NULL,NULL,2.63,0.76,NULL,0.62,0.61,NULL,NULL,NULL,NULL,NULL,NULL,
> BRAZIL,2018,6.19,9.55,0.88,66.40,0.75,-0.12,0.76,0.74,0.34,0.16,NULL,NULL,2.63,0.42,NULL,0.54,0.42,NULL,NULL,0.06,NULL,NULL,0.09,0.07
> BULGARIA,2018,5.09,9.87,0.92,66.80,0.72,-0.17,0.95,0.63,0.18,0.21,NULL,NULL,2.03,0.39,NULL,0.35,0.34,NULL,NULL,NULL,0.23,NULL,0.19,
> BURUNDI,2018,3.77,6.54,0.48,53.40,0.64,-0.01,0.59,0.66,0.36,NULL,NULL,NULL,2.81,0.74,NULL,0.36,0.68,NULL,NULL,NULL,NULL,NULL,NULL,
> CHILE,2018,6.43,10.06,0.89,69.90,0.78,-0.07,0.81,0.83,0.27,0.33,NULL,NULL,2.15,0.33,NULL,0.49,0.43,NULL,NULL,0.22,0.21,0.22,0.12,0.12
> CHINA,2018,5.13,9.69,0.78,69.30,0.89,-0.17,NULL,0.85,0.18,NULL,NULL,NULL,2.26,0.44,NULL,0.42,0.53,NULL,NULL,0.59,0.50,0.54,0.49,0.61

Can anyone see what am I doing wrong?

Could I exchange the awk statement to something like:

while read V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 ....

do
      if [[ "$V2" = "2018" ]];then
7
  • 1
    I think this is the same question earlier where I said it needs a real parser because one of the field values have commas. E.g. Most people can be trusted, Gallup Commented Jan 1, 2022 at 18:17
  • @konsolebox I have solved this and cleaned the sample data, but the issue remains the same. The NR==1 prints the second row, and an incorrect amount of columns is being printed. Any other ideas? Commented Jan 1, 2022 at 19:31
  • 2
    You're mixing processing the file with a while read loop and with awk, in incompatible ways. The read reads the first line, then awk reads the rest of the file starting at line 2. Also, it looks like you're thinking that read ... V10 V14 V15 ... will skip fields 11 through 13, but it won't; it'll just assign field 11 to V14, field 12 to V15, etc. Commented Jan 1, 2022 at 19:51
  • What does grep '^.*-.*$' even do, and why not place it in awk originally. Commented Jan 1, 2022 at 20:01
  • 1
    So awk -F, 'NR==1 || /^[A-Z,a-z]*,2018/ && /-/'? Commented Jan 1, 2022 at 20:04

1 Answer 1

2

Based on the current intent of your code, this is the modification I can suggest:

#!/usr/bin/env bash

INPUT=./happiness.csv.1
[[ -f $INPUT ]] || { echo "$INPUT file not found"; exit 99; }
awk -F, 'NR == 1 || $2 == "2018" && /-/' "$INPUT"
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.