Data:
EMAIL,NAME,KEY,LOCATION
[email protected],Joe,ABC,Denver
[email protected],Jane,EFD,Denver
...
Overall goal: a script that takes the fields I care about and produces one file per unique combination of values in those columns. E.g.:
myScript.sh NAME LOCATION
Produces:
Joe_Denver.csv - contains all lines with "Joe" and "Denver" in the NAME and LOCATION columns
Jane_Denver.csv - contains all lines with "Jane" and "Denver" in the NAME and LOCATION columns
What I have so far:
- Bash script that takes in an arbitrary number of field names and stores them in an array
- Finds the column index numbers of those fields and stores those in a second array (rough sketch below)
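For reference, the bash part is roughly this (simplified; data.csv stands in for the real file):

#!/usr/bin/env bash
# Fields requested on the command line, e.g. ./myScript.sh NAME LOCATION
fields=("$@")

# Map each requested field name to its (1-based) column index in the header row.
IFS=, read -r -a header < data.csv
bashIdxs=()
for f in "${fields[@]}"; do
    for i in "${!header[@]}"; do
        [[ "${header[$i]}" == "$f" ]] && bashIdxs+=($((i + 1)))
    done
done
echo "Column indexes: ${bashIdxs[*]}"   # e.g. "2 4" for NAME LOCATION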
I'm trying to:
- use AWK to take in the array of indexes and spit out all the unique combinations of values in those columns, then store the result in a bash array
- iterate through that array of combinations, writing a file for each combination that contains all lines in the data that have those values in those columns (roughly sketched below)
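For the second step, I'm picturing something along these lines (data.csv is a placeholder; combos is whatever the array from the first step ends up being called, holding comma-separated strings like "Joe,Denver", and bashIdxs holds the matching column indexes, e.g. "2 4"):

for combo in "${combos[@]}"; do
    outfile="${combo//,/_}.csv"          # "Joe,Denver" -> "Joe_Denver.csv"
    awk -F, -v combo="$combo" -v cols="${bashIdxs[*]}" '
        BEGIN { n = split(cols, idx, " "); split(combo, want, ",") }
        {
            ok = 1
            for (i = 1; i <= n; i++)
                if ($idx[i] != want[i]) { ok = 0; break }
            if (ok) print                # keep lines matching every chosen column
        }' data.csv > "$outfile"
done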
My AWK command for the 1st step would look something like:
awk -F, -v colIdxs="${bashIdxs[*]}" '!seen[$colIdxs[*]]++ {print $colIdxs[*]}'
That is, I'm hoping to use the indexes stored in bashIdxs as column indexes inside the awk script (where bashIdxs can be of arbitrary size).
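My guess is that the space-separated list has to be split into an awk array first and then dereferenced as $idx[i], something like this (data.csv is a placeholder; I don't know if this is the right approach):

awk -F, -v colIdxs="${bashIdxs[*]}" '
    BEGIN { n = split(colIdxs, idx, " ") }   # e.g. "2 4" -> idx[1]=2, idx[2]=4
    NR > 1 {                                 # skip the header row
        key = $idx[1]
        for (i = 2; i <= n; i++) key = key "," $idx[i]
        if (!seen[key]++) print key          # print each unique combination once
    }' data.csv

I assume the output could then be captured into the combinations array with something like mapfile -t combos < <(awk ...).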
How would this be done? Also, if there's a better way to accomplish what I'm trying to do (I'm sure there is), I'd be curious to hear it as well.