I have a 2x1 pandas DataFrame whose two cells contain NumPy arrays:

>>> import numpy as np
>>> import pandas as pd
>>> a0 = np.array([[1, 2], [2, 2]])
>>> a1 = np.array([[3, 2], [1, 1]])
>>> df = pd.DataFrame([[a0], [a1]])

I can compute the element-wise mean of the two arrays as follows:

>>> np.mean(df[0])
array([[ 2. ,  2. ],
       [ 1.5,  1.5]])
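np.mean on a Series defers to the Series' own mean method, so the result above comes from pandas reducing an object-dtype column of arrays. A more explicit sketch, assuming every cell holds an array of the same shape, stacks the cells into one numeric 3-D array and averages over axis 0, reproducing the output above. Building the frame cell by cell is an illustrative alternative to pd.DataFrame([[a0], [a1]]), not the question's code.

```python
import numpy as np
import pandas as pd

a0 = np.array([[1, 2], [2, 2]])
a1 = np.array([[3, 2], [1, 1]])

# Build the same 2x1 layout cell by cell: one 2x2 array per DataFrame row.
cells = np.empty(2, dtype=object)
cells[0], cells[1] = a0, a1
df = pd.DataFrame({0: cells})

# Stack the cells into one numeric (2, 2, 2) array; axis 0 indexes the DataFrame rows.
stacked = np.stack(list(df[0]))
mean = stacked.mean(axis=0)  # [[2.0, 2.0], [1.5, 1.5]]
```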

Now consider the case where at least one of the arrays contains NaNs, e.g.:

>>> a0 = np.array([[1, 2], [2, np.nan]])
>>> a1 = np.array([[3, 2], [1, 1]])
>>> df = pd.DataFrame([[a0], [a1]])

The np.mean call used above now gives

>>> np.mean(df[0])
array([[ 2. ,  2. ],
       [ 1.5,  nan]])

as expected. However, I want the NaNs to be ignored. I would have expected the following to work:

>>> np.nanmean(df[0])
array([[ -4.,  -4.],
       [ -3.,  nan]])

but it clearly doesn't: the values are wrong and the NaN is still there.
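A quick inspection shows what df[0] actually holds: a 1-D object array of two array objects, not a numeric (2, 2, 2) block, so a numeric axis running across the two cells only exists after stacking. This is a sketch for illustration; the cell-by-cell construction is an assumption standing in for the question's pd.DataFrame([[a0], [a1]]), and to_numpy() requires pandas 0.24 or later.

```python
import numpy as np
import pandas as pd

a0 = np.array([[1, 2], [2, np.nan]])
a1 = np.array([[3, 2], [1, 1]])

# Same 2x1 layout as in the question, built cell by cell.
cells = np.empty(2, dtype=object)
cells[0], cells[1] = a0, a1
df = pd.DataFrame({0: cells})

col = df[0].to_numpy()
print(col.dtype, col.shape)      # object (2,)   i.e. two array objects, not numbers

block = np.stack(list(col))
print(block.dtype, block.shape)  # float64 (2, 2, 2)
```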

So my question is: how can I compute the element-wise mean, ignoring NaNs, of NumPy arrays stored in the cells of a pandas DataFrame?

  • Two questions: you aren't mixing up index 0 and column 0, are you? Naming the DataFrame columns might help. And is the expected result of np.nanmean(df[0]) array([[2., 2.], [1.5, 1.]])? Commented Jan 24, 2018 at 11:31
  • (i) I get the same result if I use, e.g., 'c' as the column name. (ii) Yes, the expected result is np.array([[2., 2.], [1.5, 1.]]). Commented Jan 24, 2018 at 12:35

1 Answer

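I'm not sure I understand correctly, partly because I get confused between index 0 and column name 0, but here is an idea. The column holds separate array objects (Out[7]), so stack them into one 3-D array of shape (2, 2, 2) and call np.nanmean with axis=0, the axis that runs across the DataFrame rows (In [9]). The axis=1 and axis=2 results are shown only for comparison: they average within each array instead.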

In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: a0 = np.array([[1, 2], [2, np.nan]])

In [4]: a1 = np.array([[3, 2], [1, 1]])

In [5]: df = pd.DataFrame([[a0], [a1]])

In [6]: df
Out[6]: 
                          0
0  [[1.0, 2.0], [2.0, nan]]
1          [[3, 2], [1, 1]]

In [7]: df[0].to_numpy()
Out[7]: 
array([array([[  1.,   2.],
       [  2.,  nan]]),
       array([[3, 2],
       [1, 1]])], dtype=object)

In [8]: np.array([item for item in df[0].to_numpy()])
Out[8]: 
array([[[  1.,   2.],
        [  2.,  nan]],

       [[  3.,   2.],
        [  1.,   1.]]])

In [9]: np.nanmean(np.array([item for item in df[0].to_numpy()]), axis=0)
Out[9]: 
array([[ 2. ,  2. ],
       [ 1.5,  1. ]])

In [10]: np.nanmean(np.array([item for item in df[0].to_numpy()]), axis=1)
Out[10]: 
array([[ 1.5,  2. ],
       [ 2. ,  1.5]])

In [11]: np.nanmean(np.array([item for item in df[0].to_numpy()]), axis=2)
Out[11]: 
array([[ 1.5,  2. ],
       [ 2.5,  1. ]])
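The transcript above can be wrapped into a small reusable function. This is one possible sketch, not code from the answer: the name nanmean_cells is hypothetical, and it assumes every cell holds an array of the same shape. It uses np.stack, which requires equal shapes and raises ValueError otherwise, so that assumption is enforced. It also silences the RuntimeWarning np.nanmean emits for positions that are NaN in every cell (those positions come back as NaN). The last lines check what axis=1 and axis=2 computed in Out[10] and Out[11]: per-cell column means and per-cell row means.

```python
import warnings

import numpy as np
import pandas as pd


def nanmean_cells(cells):
    """Hypothetical helper: element-wise mean of equally shaped arrays, ignoring NaN.

    `cells` can be a pandas Series (e.g. df[0]) or any sequence of arrays.
    np.stack builds an (n_cells, ...) array and raises ValueError if shapes differ.
    """
    stacked = np.stack(list(cells))
    with warnings.catch_warnings():
        # Positions that are NaN in every cell have nothing to average:
        # np.nanmean returns NaN there and warns "Mean of empty slice".
        warnings.simplefilter("ignore", category=RuntimeWarning)
        return np.nanmean(stacked, axis=0)


a0 = np.array([[1, 2], [2, np.nan]])
a1 = np.array([[3, 2], [1, 1]])

# Same 2x1 layout as in the question, built cell by cell.
cells = np.empty(2, dtype=object)
cells[0], cells[1] = a0, a1
df = pd.DataFrame({0: cells})

result = nanmean_cells(df[0])  # [[2.0, 2.0], [1.5, 1.0]], as in Out[9]

# What axis=1 and axis=2 compute on the stacked (cell, row, col) array:
stacked = np.stack([a0, a1])
col_means = np.array([np.nanmean(a, axis=0) for a in (a0, a1)])  # per-cell column means
row_means = np.array([np.nanmean(a, axis=1) for a in (a0, a1)])  # per-cell row means
assert np.allclose(np.nanmean(stacked, axis=1), col_means)  # matches Out[10]
assert np.allclose(np.nanmean(stacked, axis=2), row_means)  # matches Out[11]
```

Reducing over axis 0 is what makes this an element-wise mean across the DataFrame rows; the other two axes never mix values from different cells.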