Yes, they are statistically equivalent. As long as you generate numbers for the same normal distribution, they will have the same statistical characteristics. That is, the mean and standard deviation should be very close to those of the distribution they were drawn from. For example,
import numpy as np
mu, sigma = 2.0, 0.5
# A and B are drawn one after the other from the same generator, so they contain different numbers
A = np.random.normal(mu, sigma, size=(1000, 1000))
B = np.random.normal(mu, sigma, size=(1000, 1000))
# But their statistical characteristics should be similar enough
print(mu, np.mean(A), np.mean(B))
print(sigma, np.std(A), np.std(B))
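To put a number on "similar enough", here is a minimal sketch that compares the observed means with the standard error of the mean, sigma / sqrt(N). The seed value 0 and the factor of 6 standard errors are arbitrary choices made for this illustration only; they are not part of the original example.

```python
import numpy as np

mu, sigma = 2.0, 0.5
shape = (1000, 1000)
n = shape[0] * shape[1]

np.random.seed(0)  # arbitrary seed, only so the sketch is reproducible
A = np.random.normal(mu, sigma, size=shape)
B = np.random.normal(mu, sigma, size=shape)

# Standard error of the sample mean for n independent draws
se_mean = sigma / np.sqrt(n)

# How far each sample mean is from mu, measured in standard errors
dev_A = abs(A.mean() - mu) / se_mean
dev_B = abs(B.mean() - mu) / se_mean

print("standard error of the mean:", se_mean)
print("deviation of A in standard errors:", dev_A)
print("deviation of B in standard errors:", dev_B)
```

With a million draws the standard error is 0.0005, so both means should normally sit within a few multiples of that from mu.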
Random numbers generated from the same seed are always the same, provided the same function is used, which is the case here. Your two arrays will therefore contain the same elements, just not in the same order. A small example that fits on screen shows this.
size = (3, 5)

def method1():
    # draw all the numbers in a single call
    np.random.seed(123)
    return np.random.normal(0, 1, size)

def method2():
    # draw the numbers one column at a time
    np.random.seed(123)
    x = np.zeros(size, dtype=float)
    for i in range(size[1]):
        x[:, i] = np.random.normal(0, 1, size[0])
    return x
Output
# method 1
array([[-1.0856306 , 0.99734545, 0.2829785 , -1.50629471, -0.57860025],
[ 1.65143654, -2.42667924, -0.42891263, 1.26593626, -0.8667404 ],
[-0.67888615, -0.09470897, 1.49138963, -0.638902 , -0.44398196]])
# method 2
array([[-1.0856306 , -1.50629471, -2.42667924, -0.8667404 , 1.49138963],
[ 0.99734545, -0.57860025, -0.42891263, -0.67888615, -0.638902 ],
[ 0.2829785 , 1.65143654, 1.26593626, -0.09470897, -0.44398196]])
Both methods generate the same 3*5 numbers. However, the first method puts the first 5 in the first row, the next 5 in the second row, and so on, while the second method puts the first 3 in the first column, the next 3 in the second column, and so on. In fact, if method2() were rewritten as follows, it would lay out the numbers the same way (row by row) as method1().
# same result as `method1()`
def method3():
    # draw the numbers one row at a time
    np.random.seed(123)
    x = np.zeros(size, dtype=float)
    for i in range(size[0]):
        x[i, :] = np.random.normal(0, 1, size[1])
    return x
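The relationship between the three methods can also be checked programmatically instead of by eye. The sketch below restates the three functions so that it runs on its own; the variable names `a`, `b`, `c` and the three boolean flags are introduced here only for illustration.

```python
import numpy as np

size = (3, 5)

def method1():
    np.random.seed(123)
    return np.random.normal(0, 1, size)

def method2():
    np.random.seed(123)
    x = np.zeros(size, dtype=float)
    for i in range(size[1]):
        x[:, i] = np.random.normal(0, 1, size[0])
    return x

def method3():
    np.random.seed(123)
    x = np.zeros(size, dtype=float)
    for i in range(size[0]):
        x[i, :] = np.random.normal(0, 1, size[1])
    return x

a, b, c = method1(), method2(), method3()

# Row-by-row filling reproduces the single call exactly
rows_match = np.array_equal(a, c)

# Column-by-column filling holds the same values in a different layout
same_values = np.array_equal(np.sort(a, axis=None), np.sort(b, axis=None))

# Reading method2's result column by column recovers method1's draw order
same_stream = np.array_equal(a.ravel(), b.ravel(order="F"))

print(rows_match, same_values, same_stream)
```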
It goes without saying that if each method generates the same numbers but in a different order, the statistics of a given row or column will differ between the two arrays, since each row or column holds different numbers. However, if the number of samples in each row or column is large enough, i.e., size is large, each of them should reflect the statistical characteristics of the distribution it was drawn from.
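As a rough illustration of that last point, the sketch below compares per-column statistics for a tiny array and a much larger one. The sample sizes and the variable names are assumptions chosen for this example.

```python
import numpy as np

mu, sigma = 0.0, 1.0
np.random.seed(123)

small = np.random.normal(mu, sigma, size=(3, 5))        # 3 samples per column
large = np.random.normal(mu, sigma, size=(100_000, 5))  # 100,000 samples per column

# With only 3 samples per column the column means scatter widely around mu
small_col_means = small.mean(axis=0)

# With many samples per column the estimates settle near mu and sigma
large_col_means = large.mean(axis=0)
large_col_stds = large.std(axis=0)

print("small column means:", small_col_means)
print("large column means:", large_col_means)
print("large column stds: ", large_col_stds)
```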
Edit: Sampling independence
From the documentation we see that numpy implements the Mersenne Twister algorithm (MT). It creates a container for the generator, which all the distribution functions, such as normal(), exponential(), etc., draw from.
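One way to see this container from Python is to inspect the state of the legacy `np.random` interface, as in the sketch below. It assumes the legacy seeding functions used throughout this answer (`np.random.seed`, `np.random.get_state`, `np.random.RandomState`), and the variable names are illustrative.

```python
import numpy as np

# The first entry of the legacy global state names the underlying algorithm
algorithm = np.random.get_state()[0]
print(algorithm)

# A private container seeded the same way yields the same stream as the global one
np.random.seed(123)
global_draw = np.random.normal(0, 1, 3)

rs = np.random.RandomState(123)
local_draw = rs.normal(0, 1, 3)
print(np.array_equal(global_draw, local_draw))

# Other distributions draw from, and advance, the same underlying state
state_before = rs.get_state()[1].copy()
rs.exponential(1.0, 1000)
state_changed = not np.array_equal(state_before, rs.get_state()[1])
print(state_changed)
```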
The MT is a widely used PRNG and has been studied extensively. Any PRNG worth its salt will perform well against a battery of tests that probe the statistical properties of the algorithm. Read about the Diehard tests and TestU01. Search for the keywords normal and uniform (i.e. independent) on those pages to read about specific tests.
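As a very rough sanity check, and in no way a substitute for the test batteries mentioned above, one can look at the correlation between consecutive draws. The sketch below is an illustration only; the sample size and seed are arbitrary.

```python
import numpy as np

np.random.seed(123)
x = np.random.normal(0, 1, 100_000)

# Correlation between each draw and the one that follows it (lag 1).
# For independent draws this should be close to 0, on the order of 1/sqrt(n).
lag1 = np.corrcoef(x[:-1], x[1:])[0, 1]

# Correlation between two separately drawn blocks of the same length
y = np.random.normal(0, 1, 100_000)
cross = np.corrcoef(x, y)[0, 1]

print("lag-1 autocorrelation:", lag1)
print("cross-correlation:    ", cross)
```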
You can also read this post about the theoretical approach to a truly random number generator.
Bottom line: we use these generators because they have been deemed good enough for what we want to do. If you're worried that MT will not be up to the task for your problem, you may want to choose (and probably implement) something different. However, without knowing what you're trying to do, it's impossible to give any more solid advice.
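If you do want to try a different generator without implementing one, newer numpy versions (1.17 and later) ship alternative bit generators behind the `np.random.Generator` interface. The sketch below uses PCG64 as one possible choice; it is an assumption about what might suit you, not a recommendation for any particular problem, and it will not reproduce the seed-123 values shown earlier.

```python
import numpy as np

mu, sigma = 2.0, 0.5

# A Generator backed by the PCG64 bit generator instead of the Mersenne Twister
rng = np.random.Generator(np.random.PCG64(123))
sample = rng.normal(mu, sigma, size=(1000, 1000))

# The statistical characteristics should still match the distribution
sample_mean = sample.mean()
sample_std = sample.std()
print(mu, sample_mean)
print(sigma, sample_std)
```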
numpy? There's always the source!