
Looking for some help turning a CSV into variables. I tried using IFS, but it seems you need to define the number of fields in advance; I need something that can handle a varying number of fields.

*I have modified my original question to include the current code I'm using (taken from the answer provided by hschou), with updated variable names using type instead of row, section, etc.

I'm sure you can tell from my code that I am pretty green with scripting, so I am looking for help determining whether, and how, I should add another loop, or take a different approach, to parsing the typeC data. Although all three types follow the same format, there is only one entry each for the typeA and typeB data, while there can be anywhere from 1 to 15 entries for the typeC data. The goal is to end up with only 3 files, one for each data type.

Data format:

Container: PL[1-100]    
TypeA: [1-20].[1-100].[1-1000].[1-100]-[1-100]                      
TypeB: [1-20].[1-100].[1-1000].[1-100]-[1-100]                          
TypeC (1 to 15 entries):  [1-20].[1-100].[1-1000].[1-100]-[1-100] 

*There is no header in the CSV, but if there were, it would look like this (the Container, typeA, and typeB data are always in positions 1, 2, and 3, with the typeC data being everything that follows): Container,typeA,typeB,typeC,typeC,typeC,typeC,typeC,..

CSV:

PL3,12.1.4.5-77,13.6.4.5-20,17.3.577.9-29,17.3.779.12-33,17.3.802.12-60,17.3.917.12-45,17.3.956.12-63,17.3.993.12-42
PL4,12.1.4.5-78,13.6.4.5-21,17.3.577.9-30,17.3.779.12-34
PL5,12.1.4.5-79,13.6.4.5-22,17.3.577.9-31,17.3.779.12-35,17.3.802.12-62,17.3.917.12-47
PL6,12.1.4.5-80,13.6.4.5-23,17.3.577.9-32,17.3.779.12-36,17.3.802.12-63,17.3.917.12-48,17.3.956.12-66
PL7,12.1.4.5-81,13.6.4.5-24,17.3.577.9-33,17.3.779.12-37,17.3.802.12-64,17.3.917.12-49,17.3.956.12-67,17.3.993.12-46
PL8,12.1.4.5-82,13.6.4.5-25,17.3.577.9-34

Code:

#!/bin/bash
#Set input file
_input="input.csv"
#  Pull variables in from csv
# read file using while loop
while read; do
    declare -a COL=( ${REPLY//,/ } )
    echo -e "container=${COL[0]}\ntypeA=${COL[1]}\ntypeB=${COL[2]}" >/tmp/typelist.txt
    idx=1
    while [ $idx -lt 10 ]; do
        echo "typeC$idx=${COL[$((idx+2))]}" >>/tmp/typelist.txt
        let idx=idx+1
    done
    #whack off empty variables
    sed '/=$/d' /tmp/typelist.txt > /tmp/typelist2.txt && mv /tmp/typelist2.txt /tmp/typelist.txt
    #clear typeC variables left over from the previous row, then set variables from temp file
    unset "${!typeC@}"
    . /tmp/typelist.txt
    sleep 1

#Parse data in this loop.#
echo -e "\n"
echo "Begin Processing for $container"
#echo $typeA
#echo $typeB
#echo $typeC
#echo -e "\n"

#Strip - from sub data for extra parsing  
typeAsub="$(echo "$typeA" | sed 's/\-.*$//')"
typeBsub="$(echo "$typeB" | sed 's/\-.*$//')"
typeCsub1="$(echo "$typeC1" | sed 's/\-.*$//')"

#strip out first two dot-separated fields for extra parsing
typeAprefix="$(echo "$typeA" | cut -d "." -f1-2)"
typeBprefix="$(echo "$typeB" | cut -d "." -f1-2)"
typeCprefix1="$(echo "$typeC1" | cut -d "." -f1-2)"

#echo $typeAsub
#echo $typeBsub
#echo $typeCsub1
#echo -e "\n"

#echo $typeAprefix
#echo $typeBprefix
#echo $typeCprefix1
#echo -e "\n"

echo "Getting typeA dataset for $typeA"
#call api script to pull data ; echo out for test
echo "API-gather -option -b "$typeAsub" -g all > "$container"typeA-dataset"
sleep 1  


echo "Getting typeB dataset for $typeB"
#call api script to pull data ; echo out for test
echo "API-gather -option -b "$typeBsub" -g all > "$container"typeB-dataset"
sleep 1  

echo "Getting typeC dataset for $typeC1"
#call api script to pull data ; echo out for test
echo "API-gather -option -b "$typeCsub1" -g all > "$container"typeC-dataset"
sleep 1  

echo "Getting additional typeC datasets for $typeC2-15"
#call api script to pull data ; echo out for test
echo "API-gather -option -b "$typeCsub2-15" -g all >> "$container"typeC-dataset"
sleep 1  

echo -e "\n"
done < "$_input"

exit 0

Speed isn't a concern, but if I've done anything really stupid up there, feel free to slap me in the right direction. :)

  • What do you expect the variables to contain if there is no content in the field? They are empty, as they should be. Put another way: what do you plan to do with those variables? If we know what your goal is, we can show you what to do. Commented Jul 12, 2017 at 5:48
  • If there is no content in the field, I don't want it in the output. So in the example output above, section2 should be the last variable in the output, instead of the empty ones being printed. Commented Jul 12, 2017 at 6:32
  • I'd use perl rather than bash for this. perl has the split function (perldoc split) and handles variable fields easily. Commented Jul 18, 2021 at 18:01

4 Answers


In this script the line is read into the default variable $REPLY. The commas are then replaced with spaces (${REPLY//,/ }) and the result put into an array (declare -a COL=( )). The section part is then handled with a loop, where the column index is calculated with $((idx+2)):

#! /bin/bash
while read; do
    declare -a COL=( ${REPLY//,/ } )
    echo -e "container=${COL[0]}\nrow=${COL[1]}\nshelf=${COL[2]}"
    idx=1
    while [ $idx -lt 10 ]; do
        echo "section$idx=${COL[$((idx+2))]}"
        let idx=idx+1
    done
done
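If you also want to avoid printing empty section variables (which the asker later stripped out with sed), one way is to iterate only over the columns that actually exist. A sketch using the same naming, with a sample row from the question inlined so it runs as-is:

```shell
#!/bin/bash
# Read with IFS=, and emit only the columns that exist, so no empty
# section variables are printed and no sed cleanup is needed.
while IFS=, read -ra COL; do
    echo -e "container=${COL[0]}\nrow=${COL[1]}\nshelf=${COL[2]}"
    idx=1
    for (( i = 3; i < ${#COL[@]}; i++ )); do
        [[ -n ${COL[i]} ]] && echo "section$idx=${COL[i]}" && (( idx++ ))
    done
done <<'EOF'
PL4,12.1.4.5-78,13.6.4.5-21,17.3.577.9-30,17.3.779.12-34
EOF
```

For the sample row this prints five lines, ending with section2=17.3.779.12-34; a row with more typeC fields simply produces more section lines.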
  • I modified this code using sed to strip out the empties which gets me where I need to be for now! Thank you Hschou! Commented Jul 12, 2017 at 8:26
  • Leaving $REPLY unquoted as you do also exposes you to filename expansion Commented Jul 12, 2017 at 13:17
  • @Jdubyas so you used echo "$REPLY" | sed -e 's/,//g' instead of ${REPLY//,/ }? sed is slower, so I avoid using it when possible. Commented Jul 12, 2017 at 21:11
  • @glennjackman I don't think it would work if "${REPLY//,/ }" has quotes. Then it would be an array with one element. Commented Jul 12, 2017 at 21:12
  • @glennjackman - I wouldn't even consider myself at a novice level for scripting; pardon the rudimentary question, but what does it mean that I'm exposed to filename expansion? Commented Jul 15, 2017 at 2:18

I would use one associative array per CSV record. Assuming your data is in a file called input.csv:

#!/usr/bin/env bash

counter=1          # provides index for each csv record
while read 
do
    IFS=',' a=( $REPLY )               # numeric array containing current row
    eval "declare -A row$counter"      # declare an assoc. array representing
                                       # this row   

    eval "row$counter+=( ['row']=${a[0]} )"
    a=( "${a[@]:1}" )
    eval "row$counter+=( ['shelf']=${a[0]} )"
    a=( "${a[@]:1}" )
    eval "row$counter+=( ['section1']=${a[0]} )"
    a=( "${a[@]:1}" )
    eval "row$counter+=( ['section2']=${a[0]} )"
    a=( "${a[@]:1}" )
    eval "row$counter+=( ['section3']=${a[0]} )"
    a=( "${a[@]:1}" )
    eval "row$counter+=( ['section4']=${a[0]} )"
    a=( "${a[@]:1}" )
    eval "row$counter+=( ['section5']=${a[0]} )"
    a=( "${a[@]:1}" )
    eval "row$counter+=( ['section6']=${a[0]} )"
    a=( "${a[@]:1}" )

    declare -p row$counter

    (( counter = counter + 1 ))
done < <( cat input.csv )

# access arbitrary element
printf "\n---------\n%s\n" ${row3["section4"]}

This gives me output like:

declare -A row1='([section6]="6" [section5]="5" [section4]="4" [section3]="4" [section2]="2" [section1]="1" [shelf]="12" [row]="PL3" )'
declare -A row2='([section6]="" [section5]="" [section4]="" [section3]="2" [section2]="1" [section1]="4" [shelf]="13" [row]="PL4" )'
declare -A row3='([section6]="" [section5]="" [section4]="3" [section3]="2" [section2]="1" [section1]="5" [shelf]="14" [row]="PL5" )'
declare -A row4='([section6]="5" [section5]="4" [section4]="3" [section3]="2" [section2]="1" [section1]="6" [shelf]="15" [row]="PL6" )'
declare -A row5='([section6]="5" [section5]="4" [section4]="3" [section3]="2" [section2]="1" [section1]="7" [shelf]="16" [row]="PL7" )'
declare -A row6='([section6]="5" [section5]="4" [section4]="3" [section3]="2" [section2]="1" [section1]="8" [shelf]="15" [row]="PL8" )'
declare -A row7='([section6]="5" [section5]="4" [section4]="3" [section3]="2" [section2]="1" [section1]="7" [shelf]="16" [row]="PL9" )'

---------
3
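Following up on the eval concern raised in the comments, roughly the same structure can be built without eval by letting declare take a computed name. A sketch, assuming bash 4+, with the key names mirroring the answer above and the variable number of sections handled by a loop:

```shell
#!/usr/bin/env bash
counter=1
while IFS=, read -ra a; do
    declare -A "row$counter"              # one assoc. array per record
    declare "row$counter[row]=${a[0]}"
    declare "row$counter[shelf]=${a[1]}"
    # remaining fields become section1, section2, ... however many exist
    for (( i = 2; i < ${#a[@]}; i++ )); do
        declare "row$counter[section$((i-1))]=${a[i]}"
    done
    declare -p "row$counter"
    (( counter++ ))
done <<'EOF'
PL4,12.1.4.5-78,13.6.4.5-21,17.3.577.9-30
PL5,12.1.4.5-79,13.6.4.5-22,17.3.577.9-31,17.3.779.12-35
EOF
```

Because no row is ever padded out to a fixed width, short records simply produce fewer section keys instead of empty ones.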
  • I would use declare instead of eval here: declare "row$counter['row']=${a[0]}" etc Commented Jul 12, 2017 at 13:23

I would start with this:

while IFS=, read -ra fields; do
    for (( i = ${#fields[@]} - 1; i >= 0; i-- )); do
        [[ -z "${fields[i]}" ]] && unset fields[i] || break
    done
    declare -p fields
done < file
declare -a fields='([0]="PL3" [1]="12" [2]="3" [3]="1" [4]="2" [5]="3" [6]="4" [7]="5" [8]="6")'
declare -a fields='([0]="PL4" [1]="13" [2]="4" [3]="1" [4]="2")'
declare -a fields='([0]="PL5" [1]="14" [2]="5" [3]="1" [4]="2" [5]="3")'
declare -a fields='([0]="PL6" [1]="15" [2]="6" [3]="1" [4]="2" [5]="3" [6]="4" [7]="5" [8]="6" [9]="7" [10]="8")'
declare -a fields='([0]="PL7" [1]="16" [2]="7" [3]="1" [4]="2" [5]="3" [6]="4" [7]="5" [8]="6" [9]="7" [10]="8" [11]="9")'
declare -a fields='([0]="PL8" [1]="15" [2]="8" [3]="1" [4]="2" [5]="3" [6]="4" [7]="5" [8]="6" [9]="7" [10]="8")'
declare -a fields='([0]="PL9" [1]="16" [2]="7" [3]="1" [4]="2" [5]="3" [6]="4" [7]="5" [8]="6" [9]="7" [10]="8" [11]="9")'

Make sure you don't have any trailing whitespace in your file.

I question your need for numerically incrementing variable names. It sounds like you need two-dimensional arrays, which is a data structure that bash does not have. Are you sure bash is the right tool for the job?
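That said, if it must be bash, a sketch of how the trimmed fields array could drive the three output files described in the question (the *-dataset filenames and the echoed API-gather command mirror the question's own test code and are assumptions; two sample rows are inlined in place of input.csv):

```shell
#!/usr/bin/env bash
while IFS=, read -ra fields; do
    container=${fields[0]}
    # ${var%-*} strips the trailing "-NN", like the question's sed 's/-.*$//'
    echo "API-gather -option -b ${fields[1]%-*} -g all" > "${container}typeA-dataset"
    echo "API-gather -option -b ${fields[2]%-*} -g all" > "${container}typeB-dataset"
    : > "${container}typeC-dataset"       # truncate, then append each entry
    # fields 4..N are all typeC, however many there are (1-15)
    for typeC in "${fields[@]:3}"; do
        [[ -n $typeC ]] && echo "API-gather -option -b ${typeC%-*} -g all" >> "${container}typeC-dataset"
    done
done <<'EOF'
PL4,12.1.4.5-78,13.6.4.5-21,17.3.577.9-30,17.3.779.12-34
PL8,12.1.4.5-82,13.6.4.5-25,17.3.577.9-34
EOF
```

Slicing with "${fields[@]:3}" is what removes the need for numbered typeC1..typeC15 variables entirely.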

  • "Are you sure bash is the right tool for the job?" True. It looks more like a Perl or Python job. The job looks like core business production, and I would not recommend bash for that. Commented Jul 15, 2017 at 12:32
  • Sure... if I knew Perl or Python. I can barely limp through getting a bash script together. It's not going to be put into production; it's only being used to prep some data for a migration. Once the files are in the correct format, an actual API is used. Commented Jul 15, 2017 at 20:43
  • bash is actually a pretty tricky language: the syntax is quite unforgiving. You really should check out a python tutorial. Commented Jul 16, 2017 at 1:31

We can relatively easily transform your header-less CSV data into structured JSON, assuming the data is in "simple" CSV format (meaning no special CSV quoting of fields is needed). The following creates a set of independent JSON objects, one for each line of input from your CSV file:

$ jq -R 'split(",") | {container:.[0], typeA:.[1], typeB:.[2], typeC:.[3:]}' file.csv
{
  "container": "PL3",
  "typeA": "12.1.4.5-77",
  "typeB": "13.6.4.5-20",
  "typeC": [
    "17.3.577.9-29",
    "17.3.779.12-33",
    "17.3.802.12-60",
    "17.3.917.12-45",
    "17.3.956.12-63",
    "17.3.993.12-42"
  ]
}
{
  "container": "PL4",
  "typeA": "12.1.4.5-78",
  "typeB": "13.6.4.5-21",
  "typeC": [
    "17.3.577.9-30",
    "17.3.779.12-34"
  ]
}
[...] # output truncated for brevity

Assuming this JSON data is stored in file.json, we may then query it in different ways:

$ jq -r --arg container PL7 --arg type typeA 'select(.container==$container)[$type]' file.json
12.1.4.5-81
$ jq -r --arg container PL8 --arg type typeB 'select(.container==$container)[$type]' file.json
13.6.4.5-25
$ jq -r --arg container PL6 --arg type typeC 'select(.container==$container)[$type][]' file.json
17.3.577.9-32
17.3.779.12-36
17.3.802.12-63
17.3.917.12-48
17.3.956.12-66

(Note that I added [] at the end of the expression above to expand the array into separate elements.)

$ jq -r --arg container PL3 --arg type typeC --arg sub 60 'select(.container==$container)[$type][] | select(endswith("-"+$sub))' file.json
17.3.802.12-60
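If the end goal is still to drive the API-gather calls from the shell, jq's @tsv filter can hand fields back to a read loop. A sketch, with file.json generated here from one sample row and the API-gather command still just echoed, as in the question:

```shell
# Generate file.json from a sample CSV row (same filter as above),
# then pull tab-separated fields back into shell variables.
jq -R 'split(",") | {container:.[0], typeA:.[1], typeB:.[2], typeC:.[3:]}' <<'EOF' > file.json
PL4,12.1.4.5-78,13.6.4.5-21,17.3.577.9-30,17.3.779.12-34
EOF

jq -r '[.container, .typeA, .typeB] | @tsv' file.json |
while IFS=$'\t' read -r container typeA typeB; do
    # ${typeA%-*} drops the trailing "-NN", replacing the sed call
    echo "API-gather -option -b ${typeA%-*} -g all > ${container}typeA-dataset"
done
```

With the sample row above this prints API-gather -option -b 12.1.4.5 -g all > PL4typeA-dataset.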
