
Consider the following code:

import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

dog = np.random.rand(10, 10)
frog = pd.DataFrame(dog, columns=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'])
slog = StandardScaler()
mog = slog.fit_transform(frog.values)
frog[frog.columns] = mog

OK, now we should have a DataFrame whose values are the standard-scaled array. But:

frog.describe()

gives:

[screenshot: output of frog.describe()]

Note that the standard deviation of each column is reported as 1.05.

While

np.std(mog, axis=0)

gives the expected:

array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

What gives?


1 Answer


The describe method reports the sample standard deviation, while StandardScaler uses the population standard deviation. The only difference between the two is whether the sum of squared deviations from the mean is divided by n-1 (for the sample std. dev.) or by n (for the population std. dev.).
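As a quick sketch of that difference (the toy array x below is made up for illustration):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])   # a made-up sample of n = 4 values
ss = ((x - x.mean()) ** 2).sum()     # sum of squared deviations from the mean

print(np.sqrt(ss / len(x)))          # divide by n:   population std, 1.118...
print(np.sqrt(ss / (len(x) - 1)))    # divide by n-1: sample std,     1.291...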

numpy.std computes the population std. dev. by default, but you can make it compute the sample std. dev. by passing the argument ddof=1, and the result agrees with the values reported by describe:

In [54]: np.std(mog, axis=0, ddof=1)
Out[54]: 
array([1.05409255, 1.05409255, 1.05409255, 1.05409255, 1.05409255,
       1.05409255, 1.05409255, 1.05409255, 1.05409255, 1.05409255])
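In this example there are n = 10 rows, which is why describe shows 1.05 specifically: the sample estimate exceeds the population estimate by a factor of sqrt(n/(n-1)) = sqrt(10/9). A quick sketch recreating the question's setup (pandas' std also accepts a ddof argument, so ddof=0 recovers the expected 1.0):

import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# recreate the question's setup: 10 rows of standard-scaled data
frog = pd.DataFrame(np.random.rand(10, 10), columns=list('abcdefghij'))
frog[frog.columns] = StandardScaler().fit_transform(frog.values)

print(np.sqrt(10 / 9))     # 1.0540925533894598, the value describe() reports
print(frog.std(ddof=0))    # population std: 1.0 for every column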