Python - Panda Split a csv into multiple files at a condition

Question

I have a csv that looks like this:

'-'  'd'  '5'
'-'  'd'  '9'
'-'  'v'  '15'
'-'  's'  '8'
'-'  's'  '10'
'-'  'q'  '3'

I would like to split the data frame every time the number on the last column decreases and save into new file The output would look like this:
File1:

'-'  'd'  '5'
'-'  'd'  '9'
'-'  'v'  '15'

File 2

'-'  's'  '8'
'-'  's'  '10'

File 3

'-'  'q'  '3'

How have you read the csv into a DataFrame? You haven't provided any of your code. — not_speshal
– not_speshal, Commented Jul 9, 2021 at 13:50

zr0gravity7 · Accepted Answer · 2021-07-09 13:52:58Z

I am going to assume that your CSV files look like typical CSV files for simplicity:

-,d,5
-,d,9
...

I am also assuming the numbers in the last column are always positive integers.

prev = 0
accumulatedLines = []
decreasedCount = 0
with open("my_file.txt", "r") as fin:
    for line in fin:
        values = line.split(",")
        if int(values[2]) < prev:
            with open("File{}.txt".format(decreasedCount + 1), "w") as fout:
                fout.writelines(accumulatedLines)
            decreasedCount += 1
            accumulatedLines = []

        accumulatedLines.append(line)
        prev = int(values[2])

Essentially, we iterate over each line in the input file, splitting it on the comma delimiter, and keep track of the value of the last column on the previous line. We also accumulate the lines read up to the current line. If the current line's value in the last column is strictly lesser than that of the previous line, we write the accumulated lines to a new file (named after the number of times we have encountered a decreasing value so far). We then clear the accumulator (and increase the count).

Alex · Accepted Answer · 2021-07-09 13:53:45Z

Using Series.shift you can compare rows that are next to one another.

from io import StringIO
import pandas as pd

s = """'-'  'd'  '5'
'-'  'd'  '9'
'-'  'v'  '15'
'-'  's'  '8'
'-'  's'  '10'
'-'  'q'  '3'""".replace("'", "")
df = pd.read_csv(StringIO(s), sep="\s\s+", engine="python", names=["a", "b", "c"])

# preserve the original column names
cols = df.columns
# create a grouping column
df["d"] = (df["c"].shift(fill_value=0) > df["c"]).cumsum()

# output each group
for name, group in df.groupby("d"):
    group[cols].to_csv(f"output_{name}.csv", index=False)

Which produces these groups:

   a  b   c
0  -  d   5
1  -  d   9
2  -  v  15
-----------
   a  b   c
3  -  s   8
4  -  s  10
-----------
   a  b  c
5  -  q  3
-----------

Andrej Kesely · Accepted Answer · 2021-07-09 13:59:39Z

0

Read the CSV:

df = pd.read_csv(
    "your_file.txt",
    quotechar="'",
    quoting=1,
    sep=r"\s+",
    header=None,
)

Print the groups:

for _, g in df.groupby(df[2].diff().le(0).cumsum()):
    print(g.to_csv(index=None, quotechar="'", quoting=1, sep=" ", header=None))

Note: To write to files, add filename to .to_csv(). Without the filename, it writes groups to screen.

Prints:

'-' 'd' '5'
'-' 'd' '9'
'-' 'v' '15'

'-' 's' '8'
'-' 's' '10'

'-' 'q' '3'

edited Jul 9, 2021 at 13:59

answered Jul 9, 2021 at 13:55

Andrej Kesely

196k15 gold badges60 silver badges105 bronze badges

2 Comments

zr0gravity7 Over a year ago

This does not write to new files.

Andrej Kesely Over a year ago

@zr0gravity7 Yes, just add filename to .to_csv. Without filename it writes to the sceen.

Collectives™ on Stack Overflow

Python - Panda Split a csv into multiple files at a condition

3 Answers 3

Comments

Comments

2 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

Comments

Comments

2 Comments

Your Answer

Sign up or log in

Post as a guest

Related