I prefer to open each file and store its content in a pandas.DataFrame. The clear advantage with respect to numpy.array is that it makes the later statistics much easier to compute. Check this code:
import pandas as pd
import os

pathtodir = r'data'  # name of the subfolder where your files are stored

df = pd.DataFrame()
file_count = 0
for file in os.listdir(pathtodir):
    with open(os.path.join(pathtodir, file), 'r') as of:
        lines = of.readlines()
    for line in lines:
        if ':' not in line:
            continue  # skip blank or malformed lines
        # each line looks like "Header: value"
        header, value = line.split(':')
        value = float(value.strip())
        # initialise new columns with NaN (not '') so the float cast below never fails
        if header not in df.columns:
            df[header] = float('nan')
        df.at[file_count, header] = value
    file_count += 1

# make every column numeric so statistics work out of the box
for column in df.columns:
    df[column] = df[column].astype(float)
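If you prefer not to grow the DataFrame cell by cell, a roughly equivalent sketch (same assumption of "Header: value" lines, one row per file) collects each file into a dict and builds the DataFrame in a single call:

import pandas as pd
import os

pathtodir = r'data'
rows = []
for file in sorted(os.listdir(pathtodir)):
    row = {}
    with open(os.path.join(pathtodir, file), 'r') as of:
        for line in of:
            if ':' not in line:
                continue  # skip blank or malformed lines
            header, value = line.split(':')
            row[header] = float(value.strip())
    rows.append(row)

df = pd.DataFrame(rows)  # one row per file; columns come from the headers

Missing headers simply become NaN and the columns are already numeric, so no final astype pass is needed.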
Either way, with 4 example files I get this DataFrame:
print(df.to_string())
    Jaccard      Dice  Volume Similarity  False Positives  False Negatives
0  0.687111  0.814542          -0.000662         0.185727         0.185189
1  0.345211  0.232542          -0.000455         0.678547         0.156752
2  0.623451  0.813345          -0.000625         0.132257         0.345519
3  0.346111  0.223454          -0.000343         0.453727         0.134586
And some statistics on the fly:
print(df.describe())
        Jaccard      Dice  Volume Similarity  False Positives  False Negatives
count  4.000000  4.000000           4.000000         4.000000         4.000000
mean   0.500471  0.520971          -0.000521         0.362565         0.205511
std    0.180639  0.338316           0.000149         0.253291         0.095609
min    0.345211  0.223454          -0.000662         0.132257         0.134586
25%    0.345886  0.230270          -0.000634         0.172360         0.151210
50%    0.484781  0.522944          -0.000540         0.319727         0.170970
75%    0.639366  0.813644          -0.000427         0.509932         0.225271
max    0.687111  0.814542          -0.000343         0.678547         0.345519
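Since every column is numeric, any further pandas operation works directly; as a small illustration (the column names match the ones above, the output file name is just an example):

# mean Dice score over all files
print(df['Dice'].mean())

# keep only the files with a Jaccard index above 0.5
print(df[df['Jaccard'] > 0.5])

# write the summary statistics to disk
df.describe().to_csv('summary_stats.csv')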