I am trying to concatenate several CSV files by customer group using the code below:

import glob
import pandas as pd

files = glob.glob(file_from + "/*.csv")  # paths of the CSV files found in file_from
df_v0 = pd.concat([pd.read_csv(f) for f in files])  # one DataFrame built from all files listed above

The problem is that the number of columns in the CSV files varies by customer, and the files do not have a header row.

I am trying to see if I could add a dummy header row with labels such as col_1, col_2, ... depending on the number of columns in each CSV.

Could anyone guide me on how to get this done? Thanks.

Update on trying to search for a specific string in the Dataframe:

Sample Dataframe

col_1,col_2,col_3
fruit,grape,green
fruit,watermelon,red
fruit,orange,orange
fruit,apple,red

I want to keep only the rows that contain the word red, and I expect rows 2 and 4 to be returned.

I tried the code below:

df[~df.apply(lambda x: x.astype(str).str.contains('red')).any(axis=1)]

1 Answer

Pass header=None so that pandas assigns the default integer column names 0, 1, 2, ..., and add skiprows=1 only if the files contain an original header line that needs to be dropped:

df_v0 = pd.concat([pd.read_csv(f, header=None, skiprows=1) for f in files])

If you also want to change the column names, add rename:

dfs = [pd.read_csv(f, header=None, skiprows=1).rename(columns = lambda x: f'col_{x + 1}') 
        for f in files]
df_v0 = pd.concat(dfs)
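
As a rough, self-contained illustration of the same idea, the sketch below reads two in-memory CSV texts of different widths instead of real files. The sample data and the helper name read_headerless are made up for the example. It leaves out skiprows=1 on the assumption that, as the question states, the files have no header line, so the first line is data and should be kept.

```python
import io

import pandas as pd

# Hypothetical stand-ins for two headerless customer files with different widths.
csv_a = "fruit,grape,green\nfruit,apple,red\n"
csv_b = "veg,carrot\n"


def read_headerless(source):
    """Read a CSV without a header row and label its columns col_1, col_2, ..."""
    # header=None keeps the first line as data; columns come back as 0, 1, 2, ...
    df = pd.read_csv(source, header=None)
    return df.rename(columns=lambda i: f"col_{i + 1}")


frames = [read_headerless(io.StringIO(text)) for text in (csv_a, csv_b)]

# concat aligns on column names; cells missing from the narrower file become NaN.
df_v0 = pd.concat(frames, ignore_index=True)
print(df_v0)
```

With real files, the list comprehension would loop over the paths in files rather than over StringIO objects.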

10 Comments

One more request. files is a list of filenames. Some of the filenames are written in upper case (e.g. FILE1.CSV) and some are in lower case (e.g. file2.csv). How could we make them all lower case? Could you please assist with that? Thanks.
@darkhorse - Not sure I understand. files returns a list of filenames with both upper-case and lower-case names; the code then loops over them and creates the DataFrames. If you change the filenames to lower case, errors will be raised because the file does not exist. But if you really need it, use pd.read_csv(f.lower(), ...
If I got that right, are filenames case-sensitive when read in pandas? For example, if the file name is FILE1.CSV and I pass in file1.csv, will it fail because filenames are case-sensitive?
I have another question. I am trying to search for a specific text in the entire DataFrame (df_v0), so I need to scan through all rows and columns. I am able to filter by a specific column, but I am not sure how to extend this to the entire DataFrame.
@darkhorse - You are really close; just remove the ~, like df[df.apply(lambda x: x.astype(str).str.contains('red')).any(axis=1)]
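
For reference, here is a minimal sketch of that whole-DataFrame search, run on the sample data from the question. The variable names mask, matches and non_matches are only illustrative, and regex=False is an added assumption that the search term should be treated as a literal substring rather than a regular expression.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "col_1": ["fruit", "fruit", "fruit", "fruit"],
    "col_2": ["grape", "watermelon", "orange", "apple"],
    "col_3": ["green", "red", "orange", "red"],
})

# Check every column for the substring, then flag rows where any column matched.
mask = df.apply(
    lambda col: col.astype(str).str.contains("red", regex=False)
).any(axis=1)

matches = df[mask]       # rows that contain "red" somewhere
non_matches = df[~mask]  # the ~ inverts the mask, which is what the original attempt did

print(matches)
```

Note that pandas labels rows from 0, so the rows the question calls 2 and 4 carry the index labels 1 and 3 here.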