Sampling unique column indexes for each row of a numpy array

Question

I want to generate a fixed number of random column indexes (without replacement) for each row of a numpy array.

A = np.array([[3, 5, 2, 3, 3],
              [1, 3, 3, 4, 5],
              [3, 5, 4, 2, 1],
              [1, 2, 3, 5, 3]])

If I fix the number of column indexes to sample per row at 2, I want something like

np.array([[1,3],
          [0,4],
          [1,4],
          [2,3]])

I am looking for a loop-free, NumPy-based solution. I tried np.random.choice, but with replace=False I get this error:

ValueError: Cannot take a larger sample than population when 'replace=False'

I can't relate your desired result to the original array. What code produced the choice error? Obviously you can't choose 10 items without replacement from a population of 6. Are you trying to select a random 2 items from the 1st row, another random 2 from the 2nd, and so on?
– hpaulj, Commented Jul 11, 2018 at 7:36
@hpaulj If I do np.random.randint(A.shape[1], size=(A.shape[0], 2)) to select 2 random column indexes for each row, I get rows with duplicate entries, and with replace=False I get the error.
– Shew, Commented Jul 11, 2018 at 8:02
OP wants random indices, but it seems that the rows should be unique.
– Kasravnd, Commented Jul 11, 2018 at 8:06
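
The two problems described in the comments above can be reproduced with a short sketch. The code that originally raised the error is not shown in the question, so the calls, the seed, and the row count of 1000 below are assumptions chosen only for illustration.

```python
import numpy as np

np.random.seed(0)               # arbitrary seed, only for reproducibility
n_rows, n_cols, k = 1000, 5, 2  # illustrative sizes, not taken from the question

# Sampling with replacement: some rows end up holding the same column twice.
idx = np.random.randint(n_cols, size=(n_rows, k))
rows_with_duplicates = int((idx[:, 0] == idx[:, 1]).sum())
print("rows with a repeated index:", rows_with_duplicates)

# replace=False treats the whole output as ONE sample, so asking for
# n_rows * k values from only n_cols candidates raises ValueError.
try:
    np.random.choice(n_cols, size=(n_rows, k), replace=False)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print("error:", error_message)
```

In other words, replace=False applies to the flattened output rather than to each row, which is why a per-row technique is needed.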

Divakar · Accepted Answer · 2018-07-11 08:18:13Z

3

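Here's one vectorized approach inspired by this post: argsort a matrix of random numbers along each row, which gives an independent random permutation of the column indexes per row, then keep the first N columns -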

def random_unique_indexes_per_row(A, N=2):
    m,n = A.shape
    return np.random.rand(m,n).argsort(1)[:,:N]

Sample run -

In [146]: A
Out[146]: 
array([[3, 5, 2, 3, 3],
       [1, 3, 3, 4, 5],
       [3, 5, 4, 2, 1],
       [1, 2, 3, 5, 3]])

In [147]: random_unique_indexes_per_row(A, N=2)
Out[147]: 
array([[4, 0],
       [0, 1],
       [3, 2],
       [2, 0]])

In [148]: random_unique_indexes_per_row(A, N=3)
Out[148]: 
array([[2, 0, 1],
       [3, 4, 2],
       [3, 2, 1],
       [4, 3, 0]])
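
One way to gain confidence in the approach above is to check that no row repeats a column index. The sketch below does that and also tries a variant based on np.argpartition, which only partially sorts each row. The variant and the helper names `random_unique_indexes_argpartition` and `rows_are_unique` are assumptions added for illustration; they are not part of the original answer.

```python
import numpy as np

def random_unique_indexes_per_row(A, N=2):
    m, n = A.shape
    # argsort of i.i.d. random keys yields an independent permutation per row
    return np.random.rand(m, n).argsort(1)[:, :N]

def random_unique_indexes_argpartition(A, N=2):
    # Hypothetical variant: argpartition places the indexes of the N smallest
    # keys of each row in the first N positions without a full sort.
    m, n = A.shape
    return np.argpartition(np.random.rand(m, n), N - 1, axis=1)[:, :N]

def rows_are_unique(idx):
    # After sorting each row, equal neighbours would reveal a repeated index.
    s = np.sort(idx, axis=1)
    return bool((s[:, 1:] != s[:, :-1]).all())

A = np.array([[3, 5, 2, 3, 3],
              [1, 3, 3, 4, 5],
              [3, 5, 4, 2, 1],
              [1, 2, 3, 5, 3]])

for sampler in (random_unique_indexes_per_row, random_unique_indexes_argpartition):
    idx = sampler(A, N=3)
    print(sampler.__name__, idx.shape, rows_are_unique(idx))
```

With the argpartition variant the selected set of indexes in each row is random, but the order inside the row is implementation-dependent, so the argsort version is the safer choice if the order matters.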

edited Jul 11, 2018 at 8:18

answered Jul 11, 2018 at 7:58


Ben Dumoulin · Answer · 2018-07-11 07:33:20Z

1

Like this?

B = np.random.randint(5, size=(len(A), 2))

answered Jul 11, 2018 at 7:33


1 Comment

sɐunıɔןɐqɐp Over a year ago

Welcome to Stack Overflow! Please don't answer just with source code. Try to provide a nice description about how your solution works. See: How do I write a good answer?. Thanks

Kasravnd · Answer · 2018-07-11 08:16:58Z

0

You can use np.random.choice() as follows:

def random_indices(arr, n):
    x, y = arr.shape
    return np.random.choice(np.arange(y), (x, n))
    # or return np.random.randint(low=0, high=y, size=(x, n))

Demo:

In [34]: x, y = A.shape

In [35]: np.random.choice(np.arange(y), (x, 2))
Out[35]: 
array([[0, 2],
       [0, 1],
       [0, 1],
       [3, 1]])

As an experimental approach, here is a way that in most cases will give unique rows (though not necessarily unique indices within each row):

In [60]: def random_ind(arr, n):
    ...:     x, y = arr.shape
    ...:     ind = np.random.randint(low=0, high=y, size=(x * 2, n))
    ...:     _, index = np.unique(ind.dot(np.random.rand(ind.shape[1])), return_index=True)
    ...:     return ind[index][:4]
    ...: 

In [61]: random_ind(A, 2)
Out[61]: 
array([[0, 1],
       [1, 0],
       [1, 1],
       [1, 4]])

In [62]: random_ind(A, 2)
Out[62]: 
array([[1, 0],
       [2, 0],
       [2, 1],
       [3, 1]])

In [64]: random_ind(A, 3)
Out[64]: 
array([[0, 0, 0],
       [1, 1, 2],
       [0, 4, 1],
       [2, 3, 1]])

In [65]: random_ind(A, 4)
Out[65]: 
array([[0, 4, 0, 3],
       [1, 0, 1, 4],
       [0, 4, 1, 2],
       [3, 0, 1, 0]])

Note that the slice in return ind[index][:4] does not raise an error when fewer than 4 unique rows were drawn; it simply returns fewer rows. In that case you can call the function again until you get the desired result.
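
The idea of repeating the call can be sketched as a small wrapper that checks how many rows came back. This is only an illustration: the wrapper name `random_ind_with_retry` and its `max_tries` parameter are hypothetical, and the hard-coded 4 from the function above is replaced by the row count of the input, which is an assumption about the intent.

```python
import numpy as np

def random_ind(arr, n):
    # Same logic as the function above, slicing to arr's row count
    # instead of a hard-coded 4 (assumed to be the intent).
    x, y = arr.shape
    ind = np.random.randint(low=0, high=y, size=(x * 2, n))
    _, index = np.unique(ind.dot(np.random.rand(ind.shape[1])), return_index=True)
    return ind[index][:x]

def random_ind_with_retry(arr, n, max_tries=100):
    # Hypothetical wrapper: redraw until enough rows are returned.
    for _ in range(max_tries):
        out = random_ind(arr, n)
        if out.shape[0] == arr.shape[0]:
            return out
    raise RuntimeError("could not draw enough distinct rows")

A = np.zeros((4, 5), dtype=int)  # only the shape of A matters here
result = random_ind_with_retry(A, 2)
print(result)
```

As with the function above, this aims for distinct rows and can still repeat an index inside a single row.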

edited Jul 11, 2018 at 8:16

answered Jul 11, 2018 at 7:28


2 Comments

Divakar Over a year ago

But it seems OP wants without replacement.

Kasravnd Over a year ago

@Divakar It seems so, however I gave a solution for unique rows but not unique items in each row 0_0.
