
I am trying to count 'nan' values in my data file.

For this purpose, I have tried two pieces of code. The first is:

with open(filin, 'r') as f:
    arrays = [map(float, line.split(',')) for line in f]
newa = [x[6] for x in arrays]

The other is:

a = []
with open(filin, 'r') as f:
    for columns in (raw.strip().split(',') for raw in f):
        a.append(columns[6])
newa = np.array(a)

When I used the first approach, I got this error message:

Traceback (most recent call last):
  File "Count_nan.py", line 13, in <module>
    arrays = [map(float, line.split(',')) for line in f]
ValueError: could not convert string to float: 

With the second code I can get the array, but I could not count the NaNs with either

l = np.count_nonzero(np.isnan(newa))

or

v = [len(list(group)) for key, group in groupby(newa, key=np.isnan) if key]

The v line is for counting groups of consecutive 'nan's.

The reason I can't use the two snippets above is that my newa consists of ['1', '2.4', 'nan', ...], not [1, 2.4, nan, ...].
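For reference, here is a minimal sketch of the failure (with a small made-up array standing in for my real data):

import numpy as np

newa = np.array(['1', '2.4', 'nan'])   # strings, as read from the file
print(newa.dtype)                      # a string dtype such as <U3, not float
np.isnan(newa)                         # raises TypeError: ufunc 'isnan' not supported for the input types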

Any ideas or help would be really appreciated.

Best regards,

Isaac

3 Answers


Maybe change this

newa = np.array(a)

to this:

newa = np.array(a).astype(float)

or just:

newa = newa.astype(float)
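A quick end-to-end sketch of how that fits with the counting code from the question (with a made-up list standing in for the real column):

import numpy as np
from itertools import groupby

a = ['1', '2.4', 'nan', 'nan', '3.1', 'nan']   # made-up stand-in for the real column
newa = np.array(a).astype(float)               # float array, so np.isnan works now

total_nans = np.count_nonzero(np.isnan(newa))                         # -> 3
runs = [len(list(g)) for k, g in groupby(newa, key=np.isnan) if k]    # lengths of consecutive-nan runs
longest_run = max(runs)                                               # -> 2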



Since you are already using numpy, it makes a lot of sense to use genfromtxt to read the data instead of doing it manually, and then it should just work:

In [43]:

%%file temp.txt
1,2.4,nan
1,2.4,nan
Overwriting temp.txt
In [44]:

arr=np.genfromtxt('temp.txt',delimiter=',')
arr
Out[44]:
array([[ 1. ,  2.4,  nan],
       [ 1. ,  2.4,  nan]])
In [45]:

np.count_nonzero(np.isnan(arr))
Out[45]:
2

Also, if you are only reading the 7th column from your data file, supply usecols=[6] to genfromtxt (see the combined sketch after the next example).

Finding the longest run of nans is easy:

In [57]:

import itertools
In [58]:

arr
Out[58]:
array([ 1. ,  2.4,  nan,  1. ,  2.4,  nan,  nan,  nan])
In [59]:

max([len(list(v)) for i, v in itertools.groupby(np.isnan(arr)) if i])
Out[59]:
3
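Putting the pieces together, a minimal sketch for the original file (assuming the 7th column is the one containing the nans):

import numpy as np
from itertools import groupby

col = np.genfromtxt(filin, delimiter=',', usecols=[6])   # read only the 7th column; 'nan' becomes np.nan

n_nans = np.count_nonzero(np.isnan(col))                          # total number of nans
runs = [len(list(g)) for k, g in groupby(np.isnan(col)) if k]     # lengths of consecutive-nan runs
longest = max(runs) if runs else 0                                # longest run (0 if there are none)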

4 Comments

Thank you CT Zhu. Do you mean arr = np.genfromtxt('temp.txt', delimiter=',', usecols=[6])?
Yep; also, getting the longest run of nans is easy using itertools.groupby, see edit.
Thank you CT Zhu. However, the problem is that my data is ['1','2','nan',...], not [1,2,nan,...]. I will try it now, but do you think your idea would work for this case as well?
genfromtxt will read the data as floats in this case, and I was thinking [1,2,nan,...] is the desired result, right?

How about just

open(filin,'r').read().count("nan")

if you really just want to count "nan" at least

(as an aside, float("nan") works fine ... so you are obviously passing in something else that cannot be converted to a float)
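If the goal is also the longest consecutive run of "nan" without converting to floats, a sketch along the same lines (assuming the 7th comma-separated field is the one of interest):

from itertools import groupby

with open(filin, 'r') as f:
    vals = [line.strip().split(',')[6] for line in f]   # 7th field of each row, as strings

total = sum(v == 'nan' for v in vals)                                          # count of 'nan' entries
runs = [len(list(g)) for k, g in groupby(vals, key=lambda v: v == 'nan') if k] # consecutive-'nan' run lengths
longest = max(runs) if runs else 0                                             # longest consecutive run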

1 Comment

Thank you Joran, your idea is great. I would also like to count consecutive "nan"s; my purpose is to find the largest run of consecutive "nan" in the list. I usually use "v = [len(list(group)) for key, group in groupby(newa, key=np.isnan) if key]" and print max(v), but I couldn't apply this code in this case. Thank you, Isaac
