I am facing an issue where I have to load a huge CSV file, split the file into multiple files based on the unique values in the columns and outputting the files to a multiple Csv's with a predefined name pattern.
The example of the original CSV is as below.
date place type product value zone
09/10/16 NY Zo shirt 19 1
09/10/16 NY Mo jeans 18 2
09/10/16 CA Zo trouser 13 3
09/10/16 CA Co tie 17 4
09/10/16 WA Wo bat 11 1
09/10/16 FL Zo ball 12 2
09/10/16 NC Mo belt 13 3
09/10/16 WA Zo buckle 15 4
09/10/16 WA Co glass 16 1
09/10/16 FL Zo cup 19 2
I have to filer this massive pandas dataframe into multiple pandas dataframes based on place, type and zone and the output dataframes should be converted into multiple csv file with the naming convention place_type_product_zone.csv.
The code I have got till now is as below.
def list_of_dataframes(df, col_list):
df_list = [df]
name_list = []
for _, i in enumerate(col_list):
df_list, names = _split_dataframes(df_list, i)
file_name = zip(name_list, df)
_ = dict(zip(names, df))
for k, v in _:
v.to_csv("{0}.csv".format(k))
Print("CSV files created")
return df, file_name
def _split_dataframes(df_list, col):
names = []
dfs = []
for df in df_list:
for c in df[col].unique():
dfs.append(df.loc[df[col] == c])
names.append(c)
return dfs, names
list_of_dataframes(df,['place','type','zone']
It output csv files with the title 1.csv, 2.csv etc. How do I create a loop in the function to get the naming convention as NY_zo_shirt_1.csv, CA_Zo_trouser_3.csv etc. should I be creating a dictionary where it stores all the keys?
Thanks in advance.
