I have a pandas DataFrame whose cells contain NumPy ndarrays:

import numpy as np, pandas as pd
x = pd.DataFrame(columns=['a', 'b'])
x.loc['t1'] = [np.random.rand(2000, 500), np.random.rand(2000)]
x.loc['t2'] = [np.random.rand(2000, 500), np.random.rand(2000)]
x.loc['t3'] = [np.random.rand(2000, 500), np.random.rand(2000)]
print(x)
#                                                     a                                                  b
# t1  [[0.8613174378493778, 0.5959214775442211, 0.62...  [0.4603835101674928, 0.3552761341266353, 0.949...
# t2  [[0.15792328922236398, 0.4274550633264813, 0.5...  [0.20059737978647396, 0.9445869962005252, 0.38...
# t3  [[0.43047697993868284, 0.7127140849172484, 0.4...  [0.6868215656323862, 0.14146376237438463, 0.51...

This works: it computes the element-wise mean of the arrays in column b across the 3 rows (a mean along the vertical axis):

x.loc[:, 'b'].mean()
# [0.44926749 0.4804423  0.61566989 ... 0.4717142  0.70605732 0.55848075]

But how can I compute the mean along the other axis, i.e. one mean per row? This fails:

x.loc[:, 'b'].mean(axis=1)   # or axis="b"

Expected result:

           b
t1         0.46
t2         0.31
t3         0.79
  • You cannot do this directly; you'd need to loop, which defeats the purpose of using pandas/numpy. You should rather use an ndarray here for efficiency. Commented Jun 24, 2022 at 9:39
  • @mozway Oh really, is that impossible? That's a shame, because yes, it would defeat the purpose of using pandas and numpy together... ndarrays are great, but not so much when we want labeled indexing. This means I should probably use xarray, as seen in stackoverflow.com/questions/72733385/…. BTW, your ideas are welcome for this question! Commented Jun 24, 2022 at 9:43

1 Answer

You can apply a mean function to each array in the column and store the result in a new column of x, like this:

import numpy as np, pandas as pd
x = pd.DataFrame(columns=['a', 'b'])
x.loc['t1'] = [np.random.rand(2000, 500), np.random.rand(2000)]
x.loc['t2'] = [np.random.rand(2000, 500), np.random.rand(2000)]
x.loc['t3'] = [np.random.rand(2000, 500), np.random.rand(2000)]

x["b_mean"] = x["b"].apply(lambda y: np.mean(y))
# or just:
x["b_mean"] = x["b"].apply(np.mean)

Which results in:

t1    0.506371
t2    0.501433
t3    0.493867
Name: b_mean, dtype: float64
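
One of the comments on the question suggests using a plain ndarray for efficiency. As a minimal sketch of that idea, assuming every array in column b has the same length (as in the question), the arrays could be stacked into one 2-D array and averaged along axis 1. The names labels, stacked and row_means, and the seeded generator, are illustrative and not part of the original code:

```python
import numpy as np
import pandas as pd

# Seeded generator, used here only so the sketch is reproducible.
rng = np.random.default_rng(0)

labels = ["t1", "t2", "t3"]

# Object-dtype Series whose elements are 1-D arrays, like column b in the question.
b = pd.Series([rng.random(2000) for _ in labels], index=labels, name="b")

# Stack the per-row arrays into a single (3, 2000) ndarray.
# This only works if all arrays have the same shape.
stacked = np.stack(b.tolist())

# One mean per row, keeping the original labels as the index.
row_means = pd.DataFrame({"b": stacked.mean(axis=1)}, index=b.index)
print(row_means)
```

The per-element apply shown above still loops in Python over the rows; stacking moves the reduction into a single NumPy call, at the cost of requiring equally shaped arrays.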