How to parse csv file into multiple csv based on row spacing

Question

I'm trying to build an Airflow DAG and need to split seven tables contained in one CSV file into seven separate CSV files.

dataset1

header_a	header_b	header_c
One	Two	Three
One	Two	Three

                         <-Always two spaced rows between data sets

dataset N <-part of csv file giving details on data

header_d	header_e	header_f	header_g
One	Two	Three	Four
One	Two	Three	Four

out: dataset1.csv datasetn.csv

Based on my research, I think my solution might lie in using awk to search for the two blank lines?

EDIT: In plain text as requested.

table1 details1,
table1 details2,
table1 details3,
header_a,header_b,header_c,
1,2,3
1,2,3


tableN details1,
tableN details2,
tableN details3,
header_a, header_b,header_c,header_N,
1,2,3,4
1,2,3,4

Please provide sample input as plain text, not a bunch of markdown tables. Something that can be copied and pasted into a file.
– Shawn, Commented Feb 23, 2022 at 17:46
Apologies, I thought markdown tables would be easier. I've added plain text in a code block, which appears to format correctly and can be pasted directly into a file for reference.
– Pat, Commented Feb 23, 2022 at 18:48

pmf · Accepted Answer · 2022-02-23 18:37:02Z

Always two spaced rows between data sets

If your CSV file contains blank lines, and your goal is to write each chunk of records separated by those blank lines to its own file, then you could use awk with its record separator RS set to the empty string. This puts awk into "paragraph mode", where any run of one or more blank lines ends a record, so each "paragraph" is read as a single record. Each record can then be redirected to a file whose name is based on the record number NR:

awk -vRS= '{print $0 > ("output_" NR ".csv")}' input.csv

This reads from input.csv and writes the chunks to output_1.csv, output_2.csv, output_3.csv and so forth.
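
As a rough end-to-end sketch of the same idea, the script below recreates the plain-text sample from the question and splits it. The directory name split_demo and the awk variables dir and out are only illustrative, and the close() call is an addition that the one-liner above does not have; it is there on the assumption that an input with many chunks could otherwise hit the limit on open files in some awk implementations.

```shell
# Hypothetical scratch directory for the demo; any writable path would do.
demo_dir=split_demo
mkdir -p "$demo_dir"

# Sample input copied from the question: two tables separated by two blank lines.
cat > "$demo_dir/input.csv" <<'EOF'
table1 details1,
table1 details2,
table1 details3,
header_a,header_b,header_c,
1,2,3
1,2,3


tableN details1,
tableN details2,
tableN details3,
header_a, header_b,header_c,header_N,
1,2,3,4
1,2,3,4
EOF

# RS set to the empty string selects paragraph mode: each block of lines
# delimited by blank lines becomes one record, numbered by NR.
awk -v RS= -v dir="$demo_dir" '{
    out = dir "/output_" NR ".csv"   # output_1.csv, output_2.csv, ...
    print > out                      # write the whole record ($0) to its file
    close(out)                       # release the file before the next record
}' "$demo_dir/input.csv"

# List what was produced.
ls "$demo_dir"
```

With this sample, the first table lands in split_demo/output_1.csv and the second in split_demo/output_2.csv. The "details" lines stay at the top of each chunk, since awk treats them as part of the same paragraph as the header and data rows.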

If my interpretation of your input file's structure (or your problem in general) is wrong, please provide more detail to clarify.


1 Comment

Pat Over a year ago

This is exactly what I was hoping for, thanks for the pointer in the direction of the RS variable.
