I have a text file like this:

"-3.588920831680E-02","1.601887196302E-01","1.302309112549E+02"
"3.739478886127E-01","1.782759875059E-01","6.490543365479E+01"
"3.298096954823E-01","6.939357519150E-02","2.112392578125E+02"
"-2.319437451661E-02","1.149862855673E-01","2.712340698242E+02"
"-1.015115305781E-01","-1.082316488028E-01","6.532022094727E+01"
"-5.374089814723E-03","1.031072884798E-01","5.510117187500E+02"
"6.748274713755E-02","1.679160743952E-01","4.033969116211E+02"
"1.027429699898E-01","1.379162818193E-02","2.374352874756E+02"
"-1.371455192566E-01","1.483036130667E-01","2.703260498047E+02"
"NULL","NULL","NULL"
"3.968210220337E-01","1.893606968224E-02","2.803018188477E+01"

I tried to read this text file with NumPy:

import numpy as np

dat = np.genfromtxt('data.txt', delimiter=',', dtype='str')
print("dat = {}".format(dat))

# now when I try to convert to float
dat = dat.astype(np.float)  # it fails

# try to make it float
dat = np.char.strip(dat, '"').astype(float)

The first conversion fails with:

File "test.py", line 25, in <module>
    dat = dat.astype(np.float)  # it fails
ValueError: could not convert string to float: '"-3.588920831680E-02"'

How can I fix this error?

Related links:

https://docs.scipy.org/doc/numpy/reference/generated/numpy.genfromtxt.html#numpy.genfromtxt

  • You should provide a summary of what each of those links taught you and why it didn't work. Otherwise, why did you bother posting them? Commented Feb 11, 2018 at 5:43
  • Hi Mad man, I have commented each time why it failed in the updated code. The numpy solution fails; I can do it using pandas or plain Python (somewhat long, many lines), but I cannot do it using the numpy method. Thanks for the suggestions though. Commented Feb 11, 2018 at 5:45
  • How exactly does dat.astype(np.float) fail? Commented Feb 11, 2018 at 5:48
  • It says it can't convert NULL to floats. Commented Feb 11, 2018 at 5:50
  • How? "It fails" is not acceptable as a standalone statement. Commented Feb 11, 2018 at 5:53

2 Answers

2

You can read that file directly using the csv module like:

Code:

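import csv
import numpy as np

# csv.reader strips the surrounding double quotes from each field
with open('data.txt', newline='') as f:
    reader = csv.reader(f, delimiter=",")
    # map the literal string NULL to NaN, everything else to float
    data = np.array([[float(i) if i != 'NULL' else np.nan for i in row]
                     for row in reader])

print(data)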

Results:

[[ -3.58892083e-02   1.60188720e-01   1.30230911e+02]
 [  3.73947889e-01   1.78275988e-01   6.49054337e+01]
 [  3.29809695e-01   6.93935752e-02   2.11239258e+02]
 [ -2.31943745e-02   1.14986286e-01   2.71234070e+02]
 [ -1.01511531e-01  -1.08231649e-01   6.53202209e+01]
 [ -5.37408981e-03   1.03107288e-01   5.51011719e+02]
 [  6.74827471e-02   1.67916074e-01   4.03396912e+02]
 [  1.02742970e-01   1.37916282e-02   2.37435287e+02]
 [ -1.37145519e-01   1.48303613e-01   2.70326050e+02]
 [             nan              nan              nan]
 [  3.96821022e-01   1.89360697e-02   2.80301819e+01]]

-1

The problem is that each value ends up wrapped in two sets of quotes instead of one: the double quotes from the file are read as part of the string itself. NumPy wants your array to contain strings like

'1.45E-02'

Instead you have something like

'"1.45E-02"' (note the extra double quotes at the beginning and end).
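To see the difference in isolation, here is a minimal sketch. The array names are made up for illustration and the value is the toy one used above, not a row from the file.

```python
import numpy as np

# The double-quote characters are part of the string data here
quoted = np.array(['"1.45E-02"'])
# The same value without the embedded quotes
plain = np.array(['1.45E-02'])

# Strings that contain only the number convert without complaint
converted = plain.astype(float)

# Strings that still carry the quote characters raise ValueError
failed = False
try:
    quoted.astype(float)
except ValueError as exc:
    failed = True
    print("conversion failed:", exc)
```
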

So the solution is simply to remove those extra double quotes, which can be done as follows:

dat_new = np.char.replace(dat, '"', '')
# You also need to do something with NULL. Here I am just replacing it with 0.
dat_new = np.char.replace(dat_new, 'NULL', '0')
dat_new = dat_new.astype(float)

np.char.replace(np_array, string_to_replace, replacement) essentially works as 'Find and Replace': in every element of the array, it replaces each occurrence of the second argument with the third.
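Replacing NULL with 0 makes missing rows indistinguishable from real zeros. If that matters, one possible variant maps NULL to nan instead. This is only a sketch: the variable names are made up, and a three-row in-memory sample stands in for data.txt.

```python
import io
import numpy as np

# In-memory stand-in for data.txt (three rows copied from the question)
sample = io.StringIO(
    '"-3.588920831680E-02","1.601887196302E-01","1.302309112549E+02"\n'
    '"NULL","NULL","NULL"\n'
    '"3.968210220337E-01","1.893606968224E-02","2.803018188477E+01"\n'
)

# Read everything as strings; the double quotes are still part of each field
raw = np.genfromtxt(sample, delimiter=',', dtype=str)

# Strip the leading and trailing double quotes from every element
unquoted = np.char.strip(raw, '"')

# Swap the NULL marker for the string 'nan', which float conversion accepts
cleaned = np.where(unquoted == 'NULL', 'nan', unquoted)

values = cleaned.astype(float)
print(values)
```

Rows that were NULL in the file then show up as nan, so they can be located later with np.isnan(values).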
