0

I have a csv that looks like this:

'-'  'd'  '5'
'-'  'd'  '9'
'-'  'v'  '15'
'-'  's'  '8'
'-'  's'  '10'
'-'  'q'  '3'

I would like to split the data frame every time the number on the last column decreases and save into new file The output would look like this:
File1:

'-'  'd'  '5'
'-'  'd'  '9'
'-'  'v'  '15'

File 2

'-'  's'  '8'
'-'  's'  '10'

File 3

'-'  'q'  '3'
1
  • 1
    How have you read the csv into a DataFrame? You haven't provided any of your code. Commented Jul 9, 2021 at 13:50

3 Answers 3

1

I am going to assume that your CSV files look like typical CSV files for simplicity:

-,d,5
-,d,9
...

I am also assuming the numbers in the last column are always positive integers.

prev = 0
accumulatedLines = []
decreasedCount = 0
with open("my_file.txt", "r") as fin:
    for line in fin:
        values = line.split(",")
        if int(values[2]) < prev:
            with open("File{}.txt".format(decreasedCount + 1), "w") as fout:
                fout.writelines(accumulatedLines)
            decreasedCount += 1
            accumulatedLines = []

        accumulatedLines.append(line)
        prev = int(values[2])

Essentially, we iterate over each line in the input file, splitting it on the comma delimiter, and keep track of the value of the last column on the previous line. We also accumulate the lines read up to the current line. If the current line's value in the last column is strictly lesser than that of the previous line, we write the accumulated lines to a new file (named after the number of times we have encountered a decreasing value so far). We then clear the accumulator (and increase the count).

Sign up to request clarification or add additional context in comments.

Comments

0

Using Series.shift you can compare rows that are next to one another.

from io import StringIO
import pandas as pd

s = """'-'  'd'  '5'
'-'  'd'  '9'
'-'  'v'  '15'
'-'  's'  '8'
'-'  's'  '10'
'-'  'q'  '3'""".replace("'", "")
df = pd.read_csv(StringIO(s), sep="\s\s+", engine="python", names=["a", "b", "c"])

# preserve the original column names
cols = df.columns
# create a grouping column
df["d"] = (df["c"].shift(fill_value=0) > df["c"]).cumsum()

# output each group
for name, group in df.groupby("d"):
    group[cols].to_csv(f"output_{name}.csv", index=False)

Which produces these groups:

   a  b   c
0  -  d   5
1  -  d   9
2  -  v  15
-----------
   a  b   c
3  -  s   8
4  -  s  10
-----------
   a  b  c
5  -  q  3
-----------

Comments

0
  1. Read the CSV:
df = pd.read_csv(
    "your_file.txt",
    quotechar="'",
    quoting=1,
    sep=r"\s+",
    header=None,
)
  1. Print the groups:
for _, g in df.groupby(df[2].diff().le(0).cumsum()):
    print(g.to_csv(index=None, quotechar="'", quoting=1, sep=" ", header=None))

Note: To write to files, add filename to .to_csv(). Without the filename, it writes groups to screen.


Prints:

'-' 'd' '5'
'-' 'd' '9'
'-' 'v' '15'

'-' 's' '8'
'-' 's' '10'

'-' 'q' '3'

2 Comments

This does not write to new files.
@zr0gravity7 Yes, just add filename to .to_csv. Without filename it writes to the sceen.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.