
Consider the following code:

import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

dog = np.random.rand(10, 10)
frog = pd.DataFrame(dog, columns=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'])
slog = StandardScaler()
mog = slog.fit_transform(frog.values)
frog[frog.columns] = mog

OK, now we should have a DataFrame whose values are the standard-scaled array. But:

frog.describe()

gives:

[screenshot: output of frog.describe()]

Note that the standard deviation of each column is reported as 1.05.

While

np.std(mog, axis=0)

gives the expected:

array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

What gives?


1 Answer


The describe method reports the sample standard deviation, while StandardScaler uses the population standard deviation. The only difference between the two is whether the sum of squared deviations from the mean is divided by n-1 (for the sample std. dev.) or by n (for the population std. dev.).
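As a quick sketch of that difference (the toy array x below is made up for illustration):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])   # a made-up sample of n = 4 values
ss = ((x - x.mean()) ** 2).sum()     # sum of squared deviations from the mean

print(np.sqrt(ss / len(x)))          # divide by n:   population std, 1.118...
print(np.sqrt(ss / (len(x) - 1)))    # divide by n-1: sample std,     1.291...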

numpy.std computes the population std. dev. by default, but you can make it compute the sample std. dev. by passing the argument ddof=1, and the result agrees with the values reported by describe:

In [54]: np.std(mog, axis=0, ddof=1)
Out[54]: 
array([1.05409255, 1.05409255, 1.05409255, 1.05409255, 1.05409255,
       1.05409255, 1.05409255, 1.05409255, 1.05409255, 1.05409255])
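In this example there are n = 10 rows, which is why describe shows 1.05 specifically: the sample estimate exceeds the population estimate by a factor of sqrt(n/(n-1)) = sqrt(10/9). A quick sketch recreating the question's setup (pandas' std also accepts a ddof argument, so ddof=0 recovers the expected 1.0):

import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# recreate the question's setup: 10 rows of standard-scaled data
frog = pd.DataFrame(np.random.rand(10, 10), columns=list('abcdefghij'))
frog[frog.columns] = StandardScaler().fit_transform(frog.values)

print(np.sqrt(10 / 9))     # 1.0540925533894598, the value describe() reports
print(frog.std(ddof=0))    # population std: 1.0 for every column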