numpy random usage validity

Question

EDIT 2: Question reformulated more precisely. EDIT 1: Fixed a typo in the code.

I would like to generate 1000 samples of 10 independent random variables (here Gaussian).

Are the following two approaches mathematically equivalent, in the sense that the 10 random variables are independent in both?

Direct: 1000 samples of 10 random variables in a single call.

import numpy as np
np.random.seed(123)
x = np.random.normal(0, 1, (10, 1000))

With a loop: generate the vector of 10 random variables one sample at a time.

import numpy as np
np.random.seed(123)
x = np.empty((10, 1000))  # preallocate: one column per sample
for i in range(1000):
    x[:, i] = np.random.normal(0, 1, 10)  # one draw of the 10 variables

The reason is to do parallel sampling. Do both methods generate 1000 samples of 10 independent Gaussian random variables? (I am not sure that the samples are independent.)

I looked at other questions, but found no indication of how the Mersenne Twister is implemented in numpy. So the question comes down to whether two successive calls to random.normal produce independent values (i.e. how random.normal is implemented).

(Imagine that I manually reset the seed between the calls: the results are then obviously not independent, and there is no point in doing so.)

Of course, we can check a posteriori by looking at the statistics. That may suggest it is true, but it does not prove it.

Numpy does implement MT. docs.scipy.org/doc/numpy-1.10.1/reference/generated/…
– Reti43, Commented Jan 8, 2016 at 9:23
What do you mean by "there is no indication how the Mersenne Twister is implemented in numpy"? There's always the source!
– Nelewout, Commented Jan 9, 2016 at 17:17

Community · Accepted Answer · 2017-04-12 07:31:17Z

3

Yes, they are statistically equivalent. As long as you draw numbers from the same normal distribution, the samples will have the same statistical characteristics. That is, their mean and standard deviation should be very close to those of the distribution they were drawn from. For example,

import numpy as np

mu, sigma = 2.0, 0.5

# A and B are drawn one after the other from the same generator state, so they hold different numbers
A = np.random.normal(mu, sigma, size=(1000, 1000))
B = np.random.normal(mu, sigma, size=(1000, 1000))

# But their statistical characteristics should be similar enough
print(mu, np.mean(A), np.mean(B))
print(sigma, np.std(A), np.std(B))

Random numbers generated from the same seed will always be the same, provided we call the same function, which we do in this case. This means that your two arrays will contain the same elements, but not in the same order. We can observe this with an example small enough to fit on screen.

size = (3, 5)

def method1():
    np.random.seed(123)
    return np.random.normal(0, 1, size)

def method2():
    np.random.seed(123)
    x = np.zeros(size, dtype=float)
    for i in range(size[1]):
        x[:,i] = np.random.normal(0, 1, size[0])
    return x

Output

# method 1
array([[-1.0856306 ,  0.99734545,  0.2829785 , -1.50629471, -0.57860025],
       [ 1.65143654, -2.42667924, -0.42891263,  1.26593626, -0.8667404 ],
       [-0.67888615, -0.09470897,  1.49138963, -0.638902  , -0.44398196]])

#method 2
array([[-1.0856306 , -1.50629471, -2.42667924, -0.8667404 ,  1.49138963],
       [ 0.99734545, -0.57860025, -0.42891263, -0.67888615, -0.638902  ],
       [ 0.2829785 ,  1.65143654,  1.26593626, -0.09470897, -0.44398196]])

Both methods generate the same 3*5 numbers. However, the first method puts the first 5 in the first row, the next 5 in the second row, etc., while the second method puts the first 3 in the first column, the next 3 in the second column, etc. In fact, if method2() were rewritten as follows, it would place the numbers in the same way (row by row) as method1().

# same result as `method1()`
def method3():
    np.random.seed(123)
    x = np.zeros(size, dtype=float)
    for i in range(size[0]):
        x[i,:] = np.random.normal(0, 1, size[1])
    return x
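
The relationship between these layouts can be checked directly. The following is a minimal sketch, assuming the legacy np.random.seed interface used above; the function names draw_all_at_once and draw_column_by_column are illustrative and stand in for method1() and method2().

```python
import numpy as np

size = (3, 5)

def draw_all_at_once():
    # Equivalent to method1(): a single call fills the array row by row
    np.random.seed(123)
    return np.random.normal(0, 1, size)

def draw_column_by_column():
    # Equivalent to method2(): each call fills one column
    np.random.seed(123)
    x = np.zeros(size, dtype=float)
    for i in range(size[1]):
        x[:, i] = np.random.normal(0, 1, size[0])
    return x

a = draw_all_at_once()
b = draw_column_by_column()

# Both arrays hold the same set of values: sorting the flattened arrays matches
same_values = np.array_equal(np.sort(a, axis=None), np.sort(b, axis=None))

# Reading a in row-major order and b in column-major order gives the same stream
same_stream = np.array_equal(a.ravel(order="C"), b.ravel(order="F"))

print(same_values, same_stream)
```

Both comparisons rely on the generator continuing the same stream across successive calls, which is what the outputs shown above suggest.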

Since each method generates the same numbers but places them in a different order, a given row or column contains different numbers under the two methods, so its sample statistics will not be identical. However, if the number of samples in each row/column is large enough, they should follow the statistical characteristics of the distribution they were drawn from.

Edit: Sampling independence

From the documentation we see that numpy implements the Mersenne Twister algorithm (MT). It creates a container (RandomState) that all the distribution functions, such as normal(), exponential(), etc., draw from.

The MT is a widely used PRNG and has been studied extensively. Any PRNG worth its salt will perform well against a battery of tests that test the statistical properties of the algorithm. Read about the Diehard tests and TestU01. Search for the keywords normal and uniform (i.e. independent) on those pages to read about specific tests.
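
As a much weaker sanity check than those test batteries, one could look at the sample correlations between the 10 variables for both ways of filling the array. This is only a sketch: the names n_vars, n_samples and max_offdiag_corr are illustrative, and small correlations are consistent with independence but do not prove it.

```python
import numpy as np

np.random.seed(123)
n_vars, n_samples = 10, 1000

# Direct: a single call for the whole array
direct = np.random.normal(0, 1, (n_vars, n_samples))

# Loop: one call per sample (column), continuing the same generator stream
looped = np.empty((n_vars, n_samples))
for i in range(n_samples):
    looped[:, i] = np.random.normal(0, 1, n_vars)

def max_offdiag_corr(x):
    """Largest absolute sample correlation between two different rows of x."""
    corr = np.corrcoef(x)              # rows are treated as variables
    off = corr - np.eye(x.shape[0])    # zero out the diagonal of ones
    return np.abs(off).max()

print(max_offdiag_corr(direct), max_offdiag_corr(looped))
```

For independent variables, each off-diagonal correlation should be of the order of 1/sqrt(n_samples), so values far above that would be a warning sign.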

You can also read this post about the theoretical approach to a truly random number generator.

Bottom line: we use these generators because they have been deemed good enough for what we want to do. If you're worried that MT will not be up to the task for your problem, you may want to choose (and probably implement) something different. However, without knowing what you're trying to do, it's impossible to give more specific advice.
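
If the goal is parallel sampling, as the question mentions, one option that postdates this answer is the Generator interface in NumPy 1.17 and later, where SeedSequence.spawn derives a separate stream per worker. The sketch below assumes 4 hypothetical workers that each produce 250 of the 1000 samples, and it runs them sequentially for simplicity. Note that default_rng uses PCG64 by default rather than the Mersenne Twister.

```python
import numpy as np

n_workers = 4                     # hypothetical number of parallel workers
n_vars, n_samples = 10, 1000

# Derive one child seed per worker from a single root seed
root = np.random.SeedSequence(123)
child_seeds = root.spawn(n_workers)

# Each worker would own its own generator; here they run one after another
rngs = [np.random.default_rng(s) for s in child_seeds]
chunks = [
    rng.normal(0.0, 1.0, size=(n_vars, n_samples // n_workers))
    for rng in rngs
]

# Stack the per-worker chunks side by side into the full (10, 1000) array
x = np.concatenate(chunks, axis=1)
print(x.shape)
```

Spawning child seeds avoids reusing the same seed in every worker, which is the "obviously not independent" case described in the question.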

edited Apr 12, 2017 at 7:31

CommunityBot

answered Jan 8, 2016 at 9:32

Reti43

10 Comments

user5497885 Over a year ago

Thank you. I made a typo in the code; I have replaced it with the correct version.

Reti43 Over a year ago

@quantCode I have revised my answer to match your updated question.

user5497885 Over a year ago

Thank you, but are the random numbers independent of each other? (I will reformulate my question in the post.)

Nelewout Over a year ago

@quantCode think about this yourself: does the occurrence of one realisation influence the occurrence of the other?

Reti43 Over a year ago

@quantCode Any widely used PRNG goes through a lot of statistical tests (for example) to ensure it exhibits as many random properties as possible. That includes independence of sampling. You can read more about the MT algorithm here.
