I'm recursively reading many CSVs from multiple directories, and each time I read one in I want to add a column called num, which is just the index of that CSV in the list of files.

path = r'data/'
all_files = glob.glob(os.path.join(path, "**/*.csv"), recursive=True)

After I have the filenames, I want to read each file in and then add the column, but keep everything as a generator so I can simply concat afterwards. Is it possible to enumerate a generator?

df_from_each_file = (pd.read_csv(f) for f in all_files)
df_from_each_file = (df.insert(0,'num',i,allow_duplicates=True) for i, df in enumerate(df_from_each_file))
concatenated_df   = pd.concat(df_from_each_file, ignore_index=True)

This just yields a bunch of None values instead of DataFrames.

  • df.insert() does not return what you think it returns. Commented Jun 4, 2019 at 19:10
  • @Goyo even if it does the operation in-place it should still work the way it is, no? Commented Jun 4, 2019 at 19:11
  • In the second line you are asking for a bunch of Nones so that is what you get. Commented Jun 4, 2019 at 19:17
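To make the point in the comments concrete, here is a minimal sketch of what the second generator in the question is doing. The frames and the column name a are made up for illustration; only the behaviour of DataFrame.insert matters.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2]})

# insert() modifies df in place and hands back None, not the frame.
result = df.insert(0, "num", 7)

print(result)            # None
print(list(df.columns))  # ['num', 'a']

# A generator expression yields the value of its expression, so building
# one from the insert() call yields None once per frame.
frames = [pd.DataFrame({"a": [1]}), pd.DataFrame({"a": [2]})]
yielded = list(f.insert(0, "num", i) for i, f in enumerate(frames))

print(yielded)           # [None, None]
```

So enumerate works fine on a generator, and the column really is added to each frame; the frames are just never passed along, because the expression being yielded is the return value of insert().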

1 Answer

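DataFrame.insert modifies the frame in place and returns None, so your second generator yields None for every file. Use enumerate on the file list together with DataFrame.assign, which returns a new DataFrame, inside the generator: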

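import glob
import os
import pandas as pd

path = r'data/'
all_files = glob.glob(os.path.join(path, "**/*.csv"), recursive=True)

# assign() returns a new DataFrame, so the generator yields frames rather than None
df_from_each_file = (pd.read_csv(f).assign(num=i) for i, f in enumerate(all_files))
concatenated_df = pd.concat(df_from_each_file, ignore_index=True)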
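One difference from the code in the question: assign appends num as the last column, whereas insert(0, ...) puts it first. If the position matters, a generator function lets you keep insert and still yield the frame. The following is only a sketch: read_numbered is a hypothetical helper name, and the temporary directory with the a/ and b/ subfolders and the x column exist only to make the example self-contained.

```python
import glob
import os
import tempfile

import pandas as pd


def read_numbered(files):
    """Lazily yield one DataFrame per file, tagged with its position in files."""
    for i, f in enumerate(files):
        df = pd.read_csv(f)
        df.insert(0, "num", i)  # in place; keeps num as the first column
        yield df                # yield the frame itself, not insert()'s None


with tempfile.TemporaryDirectory() as root:
    # Build a tiny nested tree of CSVs to read back (illustrative data only).
    for sub, value in (("a", 1), ("b", 2)):
        os.makedirs(os.path.join(root, sub))
        pd.DataFrame({"x": [value, value]}).to_csv(
            os.path.join(root, sub, "part.csv"), index=False
        )

    # sorted() makes the numbering reproducible; glob does not promise an order.
    all_files = sorted(glob.glob(os.path.join(root, "**/*.csv"), recursive=True))
    concatenated_df = pd.concat(read_numbered(all_files), ignore_index=True)

print(concatenated_df)
```

Sorting the file list is optional, but without it the value of num for a given file can differ between runs or machines.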