I have a bunch of .txt files containing metrics in the following format:

|Jaccard: 0.6871114980646424 
|Dice: 0.8145418946558747 
|Volume Similarity: -0.0006615037672849326 
|False Positives: 0.18572742753126772 
|False Negatives: 0.185188604940396

I would like to read them all (around 700) and store each value in a numpy array, so I can compute statistics such as the average Jaccard, average Dice, etc.

How could I do that?

  • Do you want this in a single 2D array with name and value? If so, loop as you read each file and append each element. stackoverflow.com/questions/7332841/…. Or do you want each element of the 700 files available for later processing? Commented Jun 3, 2020 at 21:02
  • Read the lines, then split them and parse the numbers. What you show doesn't imply any array structure, so we can't help you there. Commented Jun 3, 2020 at 21:05
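Following the second comment's suggestion (read the lines, split them, parse the numbers), here is a minimal sketch of parsing one file's contents. It assumes every non-blank line has the form "|Name: value"; the helper name parse_metrics is hypothetical.

```python
def parse_metrics(text):
    """Parse lines such as '|Jaccard: 0.68' into {'Jaccard': 0.68}.

    Hypothetical helper; assumes one 'name: value' pair per non-blank line.
    """
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # split on the first ':' only
        name, value = line.split(":", 1)
        # strip the leading '|' and padding from the name; float() ignores spaces
        metrics[name.strip(" |")] = float(value)
    return metrics


sample = """|Jaccard: 0.6871114980646424
|Dice: 0.8145418946558747
|Volume Similarity: -0.0006615037672849326
"""
print(parse_metrics(sample))
```

Calling this once per file and collecting the dictionaries gives the per-file values that the answers below turn into arrays or a DataFrame.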

3 Answers

This would be my approach. The result is a dictionary with all metrics in arrays, e.g.

 {"|Jaccard" : array...,
....}

Code might look like this:

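import numpy as np
import os

pathtodir = "filedir"  # folder that holds the .txt metric files
d = {}
for file in os.listdir(pathtodir):
    if not file.endswith(".txt"):
        continue  # ignore anything that is not a metrics file
    # os.listdir returns bare file names, so join them with the folder
    with open(os.path.join(pathtodir, file), "r") as of:
        lines = of.readlines()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        k, v = line.split(":", 1)
        # store floats, not strings, so NumPy builds numeric arrays
        d.setdefault(k.strip(), []).append(float(v))

# convert each list of values into a NumPy array
for k in d:
    d[k] = np.array(d[k])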
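Once the dictionary is built, the averages asked for in the question come straight from the arrays. A minimal sketch: the dictionary here is filled by hand with the Jaccard and Dice columns of the 4-file pandas example further down (rounded as printed there), standing in for the one loaded from disk.

```python
import numpy as np

# Stand-in for the dictionary built by the loop above; values are the Jaccard
# and Dice columns of the 4-file example in the pandas answer (as printed).
d = {
    "|Jaccard": np.array([0.687111, 0.345211, 0.623451, 0.346111]),
    "|Dice": np.array([0.814542, 0.232542, 0.813345, 0.223454]),
}

stats = {}
for name, values in d.items():
    # ddof=1 gives the sample standard deviation, the same one pandas reports
    stats[name] = (values.mean(), values.std(ddof=1))
    print(f"{name}: mean={stats[name][0]:.6f}, std={stats[name][1]:.6f}")
```

np.std defaults to ddof=0 (population standard deviation); ddof=1 is passed here so the numbers line up with the pandas describe() output.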

You could use genfromtxt() from NumPy (see https://numpy.org/doc/1.18/reference/generated/numpy.genfromtxt.html). Use ':' as the delimiter and read each line as a string followed by a float.

data = np.genfromtxt(path, delimiter=":", dtype='S64,f4')

This parses a single file into the following structured array:

[(b'|Jaccard',  6.8711150e-01) (b'|Dice',  8.1454188e-01)
 (b'|Volume Similarity', -6.6150376e-04)
 (b'|False Positives',  1.8572743e-01)
 (b'|False Negatives',  1.8518861e-01)]
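genfromtxt() reads one file at a time, so for the ~700 files the per-file results still have to be combined. A sketch, assuming every file lists the same metrics in the same order: it keeps only the numeric column (usecols=1), stacks one row per file, and takes the metric names from the first file. load_with_genfromtxt is a hypothetical name, and the demo writes two small sample files to a temporary folder so the snippet runs on its own.

```python
import glob
import os
import tempfile

import numpy as np


def load_with_genfromtxt(pathtodir):
    """Return (metric names, array with one row per file). Hypothetical helper.

    Assumes every .txt file lists the same metrics in the same order.
    """
    paths = sorted(glob.glob(os.path.join(pathtodir, "*.txt")))
    # usecols=1 reads only the value after ':'; ravel() makes each result 1-D
    rows = [np.genfromtxt(p, delimiter=":", usecols=1).ravel() for p in paths]
    values = np.vstack(rows)  # shape: (number of files, number of metrics)
    # metric names come from the first file, without the leading '|'
    with open(paths[0]) as f:
        names = [line.split(":", 1)[0].strip(" |") for line in f if line.strip()]
    return names, values


# Demo: two small files in a throwaway folder (values taken from this thread)
samples = [
    "|Jaccard: 0.6871114980646424 \n|Dice: 0.8145418946558747 \n",
    "|Jaccard: 0.345211\n|Dice: 0.232542\n",
]
with tempfile.TemporaryDirectory() as tmp:
    for i, text in enumerate(samples):
        with open(os.path.join(tmp, f"case{i}.txt"), "w") as f:
            f.write(text)
    names, values = load_with_genfromtxt(tmp)

# column-wise mean = average of each metric over all files
print(dict(zip(names, values.mean(axis=0))))
```

Reading only the float column avoids byte-string handling entirely; if the files could list metrics in different orders, matching by name (as in the dictionary answer above) is the safer choice.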


I prefer to open each file and save its contents in a pandas.DataFrame. The clear advantage over a numpy.array is that statistics are easier to compute afterwards. Check this code:

import pandas as pd
import os

pathtodir = r'data'  # name of the subfolder where your files are stored
rows = []  # one dict of {metric name: value} per file

for file in sorted(os.listdir(pathtodir)):
    if not file.endswith('.txt'):
        continue  # skip anything that is not a metrics file
    record = {}
    with open(os.path.join(pathtodir, file), 'r') as of:
        for line in of:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            header, value = line.split(':', 1)
            # drop the leading '|' so the columns are named 'Jaccard', 'Dice', ...
            record[header.strip(' |')] = float(value)
    rows.append(record)

# one row per file, one float column per metric
df = pd.DataFrame(rows)

With 4 example files, I get this dataframe:

print(df.to_string())

    Jaccard      Dice  Volume Similarity  False Positives  False Negatives
0  0.687111  0.814542          -0.000662         0.185727         0.185189
1  0.345211  0.232542          -0.000455         0.678547         0.156752
2  0.623451  0.813345          -0.000625         0.132257         0.345519
3  0.346111  0.223454          -0.000343         0.453727         0.134586

And some statistics on the fly:

print(df.describe())

        Jaccard      Dice  Volume Similarity  False Positives  False Negatives
count  4.000000  4.000000           4.000000         4.000000         4.000000
mean   0.500471  0.520971          -0.000521         0.362565         0.205511
std    0.180639  0.338316           0.000149         0.253291         0.095609
min    0.345211  0.223454          -0.000662         0.132257         0.134586
25%    0.345886  0.230270          -0.000634         0.172360         0.151210
50%    0.484781  0.522944          -0.000540         0.319727         0.170970
75%    0.639366  0.813644          -0.000427         0.509932         0.225271
max    0.687111  0.814542          -0.000343         0.678547         0.345519
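When only a couple of numbers are needed instead of the full describe() table, DataFrame.agg can pick them out. A sketch that retypes two columns of the 4-file example above (rounded as printed, so the last digit may differ slightly from describe()) so it runs without the files:

```python
import pandas as pd

# Two columns of the 4-file example above, typed in by hand as printed
df = pd.DataFrame({
    "Jaccard": [0.687111, 0.345211, 0.623451, 0.346111],
    "Dice": [0.814542, 0.232542, 0.813345, 0.223454],
})

# one row per statistic, one column per metric; pandas' std uses ddof=1
summary = df.agg(["mean", "std"])
print(summary)

# a single number, e.g. the average Jaccard index over all files
print(summary.loc["mean", "Jaccard"])
```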
