
I am facing an issue where I have to load a huge CSV file, split it into multiple files based on the unique values in some of its columns, and output those files as multiple CSVs with a predefined naming pattern.

The example of the original CSV is as below.

date     place  type    product value   zone
09/10/16 NY     Zo      shirt   19       1
09/10/16 NY     Mo      jeans   18       2
09/10/16 CA     Zo      trouser 13       3
09/10/16 CA     Co      tie     17       4
09/10/16 WA     Wo      bat     11       1
09/10/16 FL     Zo      ball    12       2
09/10/16 NC     Mo      belt    13       3
09/10/16 WA     Zo      buckle  15       4
09/10/16 WA     Co      glass   16       1
09/10/16 FL     Zo      cup     19       2

I have to filter this massive pandas DataFrame into multiple pandas DataFrames based on place, type and zone, and each resulting DataFrame should be written to its own CSV file with the naming convention place_type_product_zone.csv.
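
For reference, here is a minimal sketch that rebuilds the sample data above as a DataFrame (in reality the data is loaded with pd.read_csv; the file name in the comment is just a placeholder):

import pandas as pd

# Minimal reconstruction of the sample shown above; in practice the data
# would come from something like pd.read_csv('huge_file.csv') instead.
df = pd.DataFrame({
    'date': ['09/10/16'] * 10,
    'place': ['NY', 'NY', 'CA', 'CA', 'WA', 'FL', 'NC', 'WA', 'WA', 'FL'],
    'type': ['Zo', 'Mo', 'Zo', 'Co', 'Wo', 'Zo', 'Mo', 'Zo', 'Co', 'Zo'],
    'product': ['shirt', 'jeans', 'trouser', 'tie', 'bat', 'ball', 'belt', 'buckle', 'glass', 'cup'],
    'value': [19, 18, 13, 17, 11, 12, 13, 15, 16, 19],
    'zone': [1, 2, 3, 4, 1, 2, 3, 4, 1, 2],
})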

The code I have so far is below.

def list_of_dataframes(df, col_list):
    # Split the DataFrame on each column in turn, keeping the pieces in a list
    df_list = [df]
    names = []
    for col in col_list:
        df_list, names = _split_dataframes(df_list, col)

    file_names = dict(zip(names, df_list))
    for name, frame in file_names.items():
        frame.to_csv("{0}.csv".format(name))

    print("CSV files created")
    return df_list, file_names


def _split_dataframes(df_list, col):
    # Split every DataFrame in df_list by the unique values of col
    names = []
    dfs = []
    for df in df_list:
        for c in df[col].unique():
            dfs.append(df.loc[df[col] == c])
            names.append(c)
    return dfs, names

list_of_dataframes(df, ['place', 'type', 'zone'])

It outputs CSV files named 1.csv, 2.csv, etc. How do I change the loop in the function to get names following the convention, such as NY_Zo_shirt_1.csv, CA_Zo_trouser_3.csv? Should I be creating a dictionary that stores all the keys?

Thanks in advance.

  • Do you have to create a CSV for each unique combination of product, type and place? Commented Nov 6, 2018 at 5:46
  • Yes. I will have to create a separate CSV for every combination, using the above naming convention. Commented Nov 6, 2018 at 5:48

1 Answer


Here it is -

# Part 1 - collect the unique values of each splitting column
places = df['place'].unique()
types = df['type'].unique()
products = df['product'].unique()
zones = df['zone'].unique()

# Part 2 - build every possible (place, type, product, zone) combination
import itertools
combs = list(itertools.product(*[places, types, products, zones]))

# Part 3 - filter the DataFrame for each combination and write the non-empty subsets
for comb in combs:
    place, type_, prod, zone = comb
    df_subset = df[(df['place']==place) & (df['type']==type_) & (df['product']==prod) & (df['zone']==zone)]
    if df_subset.shape[0] > 0:
        df_subset.to_csv('temp1/{}_{}_{}_{}.csv'.format(place, type_, prod, zone), index=False)

Output: one CSV per combination present in the data, written to temp1/ (screenshot omitted).


3 Comments

If you run this, a lot of additional empty files get created. Imagine a tree: the original df is divided into DataFrames for the 4 unique zones, those 4 DataFrames are split on product, and the resulting product DataFrames are split into types. I hope I am being clear here.
@Matt Resolved that with the if statement checking the shape.
@Matt You can also take care of it by applying the filters in a nested way rather than over every combination (see the sketch below). If this helped, you can upvote/accept the answer by clicking the greyed-out tick mark; it helps others searching for something similar.
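
To make that "nested" suggestion concrete, here is one way it could look using groupby instead of building every combination up front (a sketch, not the code from the answer; it keeps the temp1/ output directory used above). groupby only yields combinations that actually occur in the data, so no empty files are written and the shape check becomes unnecessary:

import os

os.makedirs('temp1', exist_ok=True)

# Each group key is a (place, type, product, zone) tuple that exists in the data,
# so every file written contains at least one row.
for (place, type_, prod, zone), group in df.groupby(['place', 'type', 'product', 'zone']):
    group.to_csv('temp1/{}_{}_{}_{}.csv'.format(place, type_, prod, zone), index=False)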
