0

I have multiple (more than 100) files like this:

filename: 00.csv

residue, vwd, total  
AAA,0.00, -9.45  
BBB, 0.45, -1.45  
CCC, 0.44, -3    
DDD, 0.1, -10 

filename: 01.csv

residue, vwd, total  
AAA, 2, -0.56  
BBB, -4, -9.32  
CCC, 2.54, -10  
DDD, 3, -6.4  

...

I would like to create a matrix in a new csv file where the first column is "residue", and the others are based on filename (without extension). Below filename, it should be values from "total" column. It would be like this:

residue, 00, 01, ...      
AAA, -9.45, -0,56, ...  
BBB, -1.45, -9.32, ...  
CCC, -3, -10,...  
DDD,  -10, -6.4, ... 

. . .

Thanks in advance!

2 Answers 2

1

This will work:

files = ['00.csv', '01.csv']

dfs = []
for file in files:
    df = pd.read_csv(file)
    df.columns = df.columns.str.strip()
    df = df[['residue', 'total']].rename({'total': os.path.splitext(file)[0]}, axis=1)
    dfs.append(df)

df = cols[0]
for sub_df in cols[1:]:
    df = df.merge(sub_df, on='residue')

Output:

>>> df
  residue     00     01
0     AAA  -9.45  -0.56
1     BBB  -1.45  -9.32
2     CCC  -3.00 -10.00
3     DDD -10.00  -6.40
Sign up to request clarification or add additional context in comments.

Comments

0

Lets try read the files then pivot table

first get all csv files in that dir

import os

#current directory
path = "."
file_names = list(filter(lambda x:x.endswith(".csv"),os.listdir(path)))

print(file_names)
['filename: 00.csv', 'filename: 01.csv']
df = pd.DataFrame()

for file in file_names:
   df = df.append(pd.concat([pd.read_csv(file),pd.DataFrame([file]*4,
   columns= ["filename"])],axis=1))

# to extract digit from filename
df["filename"] =df["filename"].str.extract('(\d+)')

print(df)
  residue vwd  total  filename
0 AAA    0.00  -9.45  00
1 BBB    0.45  -1.45  00
2 CCC    0.44  -3.00  00
3 DDD    0.10  -10.00 00
0 AAA    2.00  -0.56  01
1 BBB   -4.00  -9.32  01
2 CCC    2.54  -10.00 01
3 DDD    3.00  -6.40  01

Then pivot_table will do the job

df.pivot_table(values='total', index='residue', columns='filename')

Final output:

filename  00   01
residue       
AAA          -9.45 -0.56
BBB          -1.45 -9.32
CCC          -3.00 -10.00
DDD          -10.00 -6.40

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.