
I am trying to count 'nan' values in my data file.

For this purpose, I have tried two pieces of code. The first is:

with open(filin, 'r') as f:
    arrays = [map(float, line.split(',')) for line in f]
newa = [x[6] for x in arrays]

The other is:

a = []
with open(filin, 'r') as f:
    for columns in (raw.strip().split(',') for raw in f):
        a.append(columns[6])
newa = np.array(a)

When I used the first approach, I got this error message:

Traceback (most recent call last):
  File "Count_nan.py", line 13, in <module>
    arrays = [map(float, line.split(',')) for line in f]
ValueError: could not convert string to float: 

With the second code I can get the array, but I could not count the NaNs with either

l = np.count_nonzero(np.isnan(newa))

or

v = [len(list(group)) for key, group in groupby(newa, key=np.isnan) if key]

The v line is for counting groups of consecutive 'nan's.

The reason I can't use the two snippets above is that my newa consists of ['1', '2.4', 'nan', ...], not [1, 2.4, nan, ...].
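For reference, here is a minimal sketch of the failure (with a small made-up array standing in for my real data):

import numpy as np

newa = np.array(['1', '2.4', 'nan'])   # strings, as read from the file
print(newa.dtype)                      # a string dtype such as <U3, not float
np.isnan(newa)                         # raises TypeError: ufunc 'isnan' not supported for the input types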

Any ideas or help would be really appreciated.

Best regards,

Isaac

3 Answers


Maybe change this

newa = np.array(a)

to this:

newa = np.array(a).astype(float)

or just:

newa = newa.astype(float)
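A quick end-to-end sketch of how that fits with the counting code from the question (with a made-up list standing in for the real column):

import numpy as np
from itertools import groupby

a = ['1', '2.4', 'nan', 'nan', '3.1', 'nan']   # made-up stand-in for the real column
newa = np.array(a).astype(float)               # float array, so np.isnan works now

total_nans = np.count_nonzero(np.isnan(newa))                         # -> 3
runs = [len(list(g)) for k, g in groupby(newa, key=np.isnan) if k]    # lengths of consecutive-nan runs
longest_run = max(runs)                                               # -> 2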



Since you are already using numpy, it makes a lot of sense to use genfromtxt to read the data instead of doing it manually, and then it should just work:

In [43]:

%%file temp.txt
1,2.4,nan
1,2.4,nan
Overwriting temp.txt
In [44]:

arr=np.genfromtxt('temp.txt',delimiter=',')
arr
Out[44]:
array([[ 1. ,  2.4,  nan],
       [ 1. ,  2.4,  nan]])
In [45]:

np.count_nonzero(np.isnan(arr))
Out[45]:
2

Also, if you are only reading the 7th column from your data file, supply usecols=[6] to genfromtxt (see the combined sketch after the next example).

Finding the longest run of nans is easy:

In [57]:

import itertools
In [58]:

arr
Out[58]:
array([ 1. ,  2.4,  nan,  1. ,  2.4,  nan,  nan,  nan])
In [59]:

max([len(list(v)) for i, v in itertools.groupby(np.isnan(arr)) if i])
Out[59]:
3
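Putting the pieces together, a minimal sketch for the original file (assuming the 7th column is the one containing the nans):

import numpy as np
from itertools import groupby

col = np.genfromtxt(filin, delimiter=',', usecols=[6])   # read only the 7th column; 'nan' becomes np.nan

n_nans = np.count_nonzero(np.isnan(col))                          # total number of nans
runs = [len(list(g)) for k, g in groupby(np.isnan(col)) if k]     # lengths of consecutive-nan runs
longest = max(runs) if runs else 0                                # longest run (0 if there are none)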

4 Comments

Thank you CT Zhu. Do you mean arr = np.genfromtxt('temp.txt', delimiter=',', usecols=[6])?
Yep; also, getting the longest run of nans is easy using itertools.groupby, see edit.
Thank you CT Zhu. However, the problem is that my data is ['1','2','nan',...], not [1,2,nan,...]. I will try it now, but do you think your idea would work for this case as well?
genfromtxt will read the data as floats in this case, and I was thinking [1,2,nan,...] is the desired result, right?

How about just

open(filin,'r').read().count("nan")

if you really just want to count "nan" at least

(as an aside, float("nan") works fine ... so you are obviously passing in something else that cannot be converted to a float)
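If the goal is also the longest consecutive run of "nan" without converting to floats, a sketch along the same lines (assuming the 7th comma-separated field is the one of interest):

from itertools import groupby

with open(filin, 'r') as f:
    vals = [line.strip().split(',')[6] for line in f]   # 7th field of each row, as strings

total = sum(v == 'nan' for v in vals)                                          # count of 'nan' entries
runs = [len(list(g)) for k, g in groupby(vals, key=lambda v: v == 'nan') if k] # consecutive-'nan' run lengths
longest = max(runs) if runs else 0                                             # longest consecutive run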

1 Comment

Thank you Joran, your idea is great. I would also like to count consecutive "nan"s; my purpose is to find the largest run of consecutive "nan" in the list. I usually use "v = [len(list(group)) for key, group in groupby(newa, key=np.isnan) if key]" and print max(v), but I couldn't apply this code in this case. Thank you, Isaac
