pandas recursive read_csv while adding column to each

Question

I'm recursively reading many CSVs in multiple directories, and each time I read one in I want to add a column called num, which is just the index of that CSV in the list of files.

path = r'data/'
all_files = glob.glob(os.path.join(path, "**/*.csv"), recursive=True)

After I have the filenames I want to read each in and then add the column, but leave it as a generator to simply concat afterwards. Is it possible to enumerate a generator?

df_from_each_file = (pd.read_csv(f) for f in all_files)
df_from_each_file = (df.insert(0,'num',i,allow_duplicates=True) for i, df in enumerate(df_from_each_file))
concatenated_df   = pd.concat(df_from_each_file, ignore_index=True)

This just yields a bunch of None values instead of DataFrames.

@Goyo even if it does the operation in-place it should still work the way it is, no? – conv3d, Commented Jun 4, 2019 at 19:11
In the second line you are asking for a bunch of Nones so that is what you get. – Stop harming Monica, Commented Jun 4, 2019 at 19:17
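
To make the point in the comments concrete, here is a minimal sketch of what DataFrame.insert evaluates to. The frame contents and the column names a and idx are made up for illustration.

```python
import pandas as pd

# Hypothetical one-column frame, just for the demonstration.
df = pd.DataFrame({"a": [1, 2]})

# insert mutates df in place; the expression itself evaluates to None.
result = df.insert(0, "num", 7)
print(result)
print(list(df.columns))

# assign leaves df untouched and returns a new frame with the extra column.
new_df = df.assign(idx=0)
print(list(new_df.columns))
```

A generator expression yields the value of its expression, so using df.insert(...) there yields None for every file even though each frame was modified.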

Chris Adams · Accepted Answer · 2019-06-04 19:20:51Z

Use enumerate over the file list and DataFrame.assign, which returns a new DataFrame (unlike insert, which returns None), within the generator:

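import glob
import os

import pandas as pd

path = r'data/'
all_files = glob.glob(os.path.join(path, "**/*.csv"), recursive=True)

# assign returns a new DataFrame, so every item the generator yields is a frame with its file index in num
df_from_each_file = (pd.read_csv(f).assign(num=i) for i, f in enumerate(all_files))
concatenated_df = pd.concat(df_from_each_file, ignore_index=True)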
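
If num should stay the first column, as in the question's insert(0, ...) call, one possible variant is to wrap insert in a small helper that returns the frame. The sketch below is only an illustration: the helper name read_with_num, the temporary directory, and the two sample CSVs are assumptions, not part of the original post.

```python
import glob
import os
import tempfile

import pandas as pd


def read_with_num(path, i):
    """Hypothetical helper: read one CSV and put its list index in a leading 'num' column."""
    df = pd.read_csv(path)
    df.insert(0, "num", i)  # modifies df in place and returns None
    return df               # so hand the frame back explicitly


with tempfile.TemporaryDirectory() as root:
    # Two throwaway CSVs, one of them in a nested directory (made-up names and data).
    os.makedirs(os.path.join(root, "sub"))
    pd.DataFrame({"x": [1, 2]}).to_csv(os.path.join(root, "a.csv"), index=False)
    pd.DataFrame({"x": [3]}).to_csv(os.path.join(root, "sub", "b.csv"), index=False)

    # Sorting makes the file-to-index mapping reproducible; glob does not guarantee an order.
    all_files = sorted(glob.glob(os.path.join(root, "**/*.csv"), recursive=True))

    # Still a generator: files are read one at a time as concat consumes it.
    frames = (read_with_num(f, i) for i, f in enumerate(all_files))
    concatenated_df = pd.concat(frames, ignore_index=True)

print(concatenated_df)
```

With assign the new column is appended at the end of each frame; the helper above trades the one-liner for control over the column position.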
