I have searched for this and couldn't find it. Imagine I have a numpy array of size N. Now I want to generate it's subsample of size M. Basically I want M randomly chosen elements from this array. N >= M. How can I do it ?
1 Answer
>>> N = 100; M = 10
>>> a = np.arange(0, N)
>>> np.random.choice(a, M, replace=False)
array([22, 81, 63, 7, 10, 52, 30, 33, 18, 41])
With replace=False you get no repetitions, and in that case M must be <= N.
Edit: 2d case:
>>> a = np.arange(0,120).reshape(10,12)
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
[ 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23],
[ 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35],
[ 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47],
[ 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
[ 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71],
[ 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83],
[ 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95],
[ 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107],
[108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119]])
>>> idx = np.arange(0, 10)
>>> rand_idx = np.random.choice(idx, 5, replace=False)
>>> a[rand_idx]
array([[24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35],
[36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47],
[84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95],
[12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23],
[48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59]])
5 Comments
Giorgi Tsertsvadze
This is almost that but I want this to be possible when Array is 2 dimensional. (I mean consider each row a single element) but it requires only 1D arrays.
fferri
Consider the result of my example as the indices of the rows/column you want to sample.
Reed Richards
The example has the wrong output. You select 5 samples from an array that is 10 values but then when you apply it at the end you get more than 5 values.
fferri
@ReedRichards: I sample 5 elements (rows) from the 10 (rows) by 12 (columns) matrix, and I get a 5 by 12 matrix: my example is correct.
Reed Richards
You forget to assign the a then.