73

Is there a way to store NaN in a NumPy array of integers? I get:

import numpy as np
a = np.array([1], dtype=np.int64)
a[0] = np.nan

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: cannot convert float NaN to integer

2 Answers

65

No, you can't, at least with the current version of NumPy. A nan is a special value for float arrays only.

There has been talk of introducing a special bit that would allow non-float arrays to store what in practice would correspond to a nan, but so far (2012/10), it's only talk.
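If the values fit, one workaround that follows from this is to cast the integer array to a float dtype, which does have a nan. A minimal sketch, assuming the integers are small enough to be represented exactly in float64 (the variable names are illustrative):

```python
import numpy as np

a = np.array([1, 2, 3], dtype=np.int64)

# Casting to float64 makes room for nan. Integer values are preserved
# exactly only while they fit in float64's 53-bit significand.
b = a.astype(np.float64)
b[0] = np.nan

is_missing = np.isnan(b)   # boolean array marking the nan slots
largest = np.nanmax(b)     # nan-aware reduction that skips the nan
```

The cost is that the array is no longer an integer array, so integer-only operations need a cast back after the nan entries have been handled.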

In the meantime, you may want to consider the numpy.ma package: instead of picking an invalid integer like -99999, you could use the special numpy.ma.masked value to represent an invalid value.

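import numpy as np

a = np.ma.array([1, 2, 3, 4, 5], dtype=int)
a[1] = np.ma.masked  # mask the second element instead of storing a sentinel
print(repr(a))
# Output (exact formatting varies by NumPy version):
# masked_array(data = [1 -- 3 4 5],
#              mask = [False  True False False False],
#        fill_value = 999999)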
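To see what the mask buys you, here is a short sketch of how a masked integer array behaves in computations. It only uses standard `numpy.ma` methods; the fill value `-1` is an arbitrary choice for illustration.

```python
import numpy as np

a = np.ma.array([1, 2, 3, 4, 5], dtype=int)
a[1] = np.ma.masked

total = a.sum()          # masked entries are skipped: 1 + 3 + 4 + 5
n_valid = a.count()      # number of unmasked entries
valid = a.compressed()   # plain ndarray holding only the unmasked values
filled = a.filled(-1)    # plain ndarray with masked slots replaced by -1
```

The underlying data stays an integer array throughout; the invalid entries are tracked in a separate boolean mask.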

4 Comments

But be aware that there is a huge performance cost to using masked arrays, as they are implemented in pure Python!
@gaborous Whoa, really? I thought they were the recommended way to do such things?
@endolith Yes, I found the info a long time ago in one of NumPy's GitHub issues, but I don't have the link anymore. However, since that was a long time ago, this might have been optimized (although I doubt it; one would need to compile to Cython or similar first).
Just to be clear, nan and null are not the same thing. Also, while it is not a direct substitute for numpy, cuDF does support nulls.
12

A nan is a floating-point-only thing; there is no representation of it in the integers, so no :)

Pick an invalid value, like -99999
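A minimal sketch of the sentinel approach, assuming `-99999` can never occur as a real value in the data (`SENTINEL` is an illustrative name). Unlike a nan or a masked entry, the sentinel takes part in arithmetic, so it has to be filtered out explicitly:

```python
import numpy as np

SENTINEL = -99999  # assumption: never a legitimate value in this dataset

a = np.array([1, 2, 3, 4, 5], dtype=np.int64)
a[1] = SENTINEL

valid = a != SENTINEL          # boolean mask of the real values
mean_valid = a[valid].mean()   # mean of 1, 3, 4, 5
mean_naive = a.mean()          # silently polluted by the sentinel
```

Forgetting the filter does not raise an error; it just produces a wrong number, which is the main risk of this approach.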

5 Comments

Picking a canonical value as invalid isn't a good solution, because it doesn't replicate the properties of nan, namely that equality and ordering comparisons between nan and any other value, including itself, are false.
Using a sentinel value isn't ideal, but it's sufficient under the condition that you understand your data well enough to know the sentinel will not interfere with your computations. For instance, if you know your values are (not just "should be") always >= 0, then using a negative sentinel is acceptable (unless you're doing an operation where the outcome could have a different sign than the input, such as -1 * -1). If you're writing a framework and end up using sentinels, you should probably allow that value to be chosen by the user on an individual operation basis. Again, not ideal.
If your dataset is not going to change, there are two easy options that come close to ideal: np.amin() - 1 and np.amax() + 1. Either gives a placeholder that cannot occur in the data, except when np.amin() == np.iinfo(np.int32).min or np.amax() == np.iinfo(np.int32).max. In those cases, you can use np.unique(): if the number of unique values equals the number of values the data type can represent, you must raise an error, because no placeholder is possible. Otherwise, search efficiently for the first value not in np.unique(), for example by taking np.diff() and finding the first place where the difference is greater than 1.
Sentinel values are actually used in a lot of real-world databases, especially in the healthcare industry, such as Weight of Newborn Babies, where -1 is used to designate an unsuccessful birth.
@NoName Yes, and that's bad. If I had a dollar for every bug caused by "Sentinel" values being used where a NaN or missing object should have been...
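The placeholder search described in the comments above can be sketched roughly as follows. `find_placeholder` is a hypothetical helper, not a NumPy function, and it assumes a non-empty integer array. It compares neighbouring unique values directly rather than calling `np.diff()`, because the difference itself can overflow when both extremes of the dtype are present.

```python
import numpy as np

def find_placeholder(data):
    """Return an int representable in data.dtype that does not occur in data.

    Hypothetical helper for illustration; assumes a non-empty integer array.
    """
    info = np.iinfo(data.dtype)
    lo, hi = int(data.min()), int(data.max())
    if lo > info.min:
        return lo - 1          # room below the minimum
    if hi < info.max:
        return hi + 1          # room above the maximum
    # Both extremes are taken: look for a gap between sorted unique values.
    uniq = np.unique(data)
    # uniq[:-1] excludes the dtype maximum, so adding 1 cannot overflow.
    gaps = np.nonzero(uniq[1:] != uniq[:-1] + 1)[0]
    if gaps.size == 0:
        raise ValueError("every value of the dtype is in use")
    return int(uniq[gaps[0]]) + 1

small = np.array([3, 7, 5], dtype=np.int32)
edge = np.array([-128, 127, -127], dtype=np.int8)
placeholder_small = find_placeholder(small)
placeholder_edge = find_placeholder(edge)
```

This only stays valid while the dataset does not change, as the comment notes; any new value could collide with the chosen placeholder.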
