Elementwise mean of numpy arrays from pandas dataframe cells

Question

I have a 2x1 pandas dataframe where the 2 cells contain numpy arrays:

>>> import numpy as np
>>> import pandas as pd
>>> a0 = np.array([[1, 2], [2, 2]])
>>> a1 = np.array([[3, 2], [1, 1]])
>>> df = pd.DataFrame([[a0], [a1]])

I can compute the element-wise mean of the two arrays as follows:

>>> np.mean(df[0])
array([[ 2. ,  2. ],
       [ 1.5,  1.5]])

Now I want to consider the case where at least one of the arrays contains NaNs, e.g.

>>> a0 = np.array([[1, 2], [2, np.nan]])
>>> a1 = np.array([[3, 2], [1, 1]])
>>> df = pd.DataFrame([[a0], [a1]])

The mean method used above gives

>>> np.mean(df[0])
array([[ 2. ,  2. ],
       [ 1.5,  nan]])

as expected. However, I want the NaNs to be ignored. I expected the following to work:

>>> np.nanmean(df[0])
array([[ -4.,  -4.],
       [ -3.,  nan]])

but it clearly does not.

So, my question: how can I compute element-wise means, ignoring NaNs, of numpy arrays that are contained in the cells of a pandas dataframe?

Two questions: Are you sure you are not mixing up index 0 and column 0? Naming the dataframe columns might help. And would the expected result from np.nanmean(df[0]) be array([[2., 2.], [1.5, 1.]])?
– Luis, Commented Jan 24, 2018 at 11:31
(i) I get the same result if I use e.g. 'c' as column name. (ii) Yes, expected result is np.array([[2., 2.], [1.5, 1.]]).
– nluckn, Commented Jan 24, 2018 at 12:35

Luis · Accepted Answer · 2018-01-24 12:10:52Z

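I am not sure I understand the question correctly, partly because index 0 and column name 0 are easy to confuse. But here is an idea: collect the arrays from the cells into a single 3-D array and take np.nanmean along axis 0 (see In [9]). Axes 1 and 2 are shown only for comparison.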

In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: a0 = np.array([[1, 2], [2, np.nan]])

In [4]: a1 = np.array([[3, 2], [1, 1]])

In [5]: df = pd.DataFrame([[a0], [a1]])

In [6]: df
Out[6]: 
                          0
0  [[1.0, 2.0], [2.0, nan]]
1          [[3, 2], [1, 1]]

In [7]: df[0].as_matrix()
Out[7]: 
array([array([[  1.,   2.],
       [  2.,  nan]]),
       array([[3, 2],
       [1, 1]])], dtype=object)

In [8]: np.array( [ item for item in df[0].as_matrix() ] )
Out[8]: 
array([[[  1.,   2.],
        [  2.,  nan]],

       [[  3.,   2.],
        [  1.,   1.]]])

In [9]: np.nanmean( np.array( [ item for item in df[0].as_matrix() ]
   ...:  ), axis=0 )
Out[9]: 
array([[ 2. ,  2. ],
       [ 1.5,  1. ]])

In [10]: np.nanmean( np.array( [ item for item in df[0].as_matrix() 
    ...: ] ), axis=1 )
Out[10]: 
array([[ 1.5,  2. ],
       [ 2. ,  1.5]])

In [11]: np.nanmean( np.array( [ item for item in df[0].as_matrix() 
    ...: ] ), axis=2 )
Out[11]: 
array([[ 1.5,  2. ],
       [ 2.5,  1. ]])
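
A note for readers on a recent pandas: as_matrix() was deprecated and later removed (in pandas 1.0), so the calls above will fail there. The following is a minimal sketch of the same idea, not part of the original answer. It assumes Series.to_numpy() is available (pandas 0.24 or later) and that every cell holds an array of the same shape. The names stacked and elementwise_mean are only illustrative.

```python
import numpy as np
import pandas as pd

a0 = np.array([[1, 2], [2, np.nan]])
a1 = np.array([[3, 2], [1, 1]])
df = pd.DataFrame([[a0], [a1]])

# Stack the per-cell arrays into one float array of shape (2, 2, 2).
# Assumption: all cells contain arrays with identical shapes.
stacked = np.stack(df[0].to_numpy())

# Average across the cells (axis 0), ignoring NaNs position by position.
elementwise_mean = np.nanmean(stacked, axis=0)
print(elementwise_mean)
```

If a position is NaN in every cell, np.nanmean returns NaN there and emits a RuntimeWarning, so that case may need separate handling.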
