5

My google-fu has failed me! I have a 10x10 numpy array initialized to 0 as follows:

arr2d = np.zeros((10,10))

For each row in arr2d, I want to assign 3 random columns to 1. I am able to do it using a loop as follows:

for row in arr2d:
    rand_cols = np.random.randint(0,9,3)
    row[rand_cols] = 1

output:

array([[ 0.,  0.,  0.,  0.,  0.,  0.,  1.,  1.,  1.,  0.],
       [ 0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  1.,  0.],
       [ 0.,  0.,  1.,  0.,  1.,  1.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  1.,  1.,  1.,  0.,  0.,  0.],
       [ 1.,  0.,  0.,  1.,  1.,  0.,  0.,  0.,  0.,  0.],
       [ 1.,  0.,  1.,  1.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  0.,  0.,  1.,  0.,  1.,  0.],
       [ 0.,  0.,  1.,  0.,  1.,  0.,  0.,  0.,  1.,  0.],
       [ 1.,  0.,  0.,  0.,  0.,  0.,  1.,  1.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  1.,  0.,  0.,  1.,  0.,  0.]])

Is there a way to exploit numpy or array indexing/slicing to achieve the same result in a more pythonic/elegant way (preferably in 1 or 2 lines of code)?

2
  • 2
    Did you notice that one of your rows has only two 1s? That can happen if randint(0, 9, 3) generates a sample with a repeated value. Is that what you want? Commented Aug 20, 2016 at 4:14
  • So, did any of the solutions work for you? Commented Aug 21, 2016 at 8:06
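
To make the first comment concrete, here is a small sketch (the seed and the number of draws are arbitrary choices for illustration, not from the question). It checks two properties of np.random.randint(0, 9, 3): the upper bound is exclusive, so column 9 is never chosen, and values are drawn with replacement, so a row can end up with fewer than three 1s.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, only to make the run repeatable

# Draw many triples the same way the loop in the question does
samples = np.random.randint(0, 9, size=(10000, 3))

# The upper bound is exclusive: no drawn value can exceed 8
max_col = samples.max()

# Sampling is with replacement: flag triples that contain a repeated column
sorted_samples = np.sort(samples, axis=1)
has_repeat = (np.diff(sorted_samples, axis=1) == 0).any(axis=1)
repeat_fraction = has_repeat.mean()

print(max_col, repeat_fraction)
```

With nine possible values, the chance that a triple contains a repeat is 1 - (9*8*7)/9**3, roughly 31%, so rows with only one or two 1s are common with the original loop.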

3 Answers

2

Once you have the arr2d initialized with arr2d = np.zeros((10,10)), you can use a vectorized approach with a two-liner like so -

# Generate random unique 3 column indices for 10 rows
idx = np.random.rand(10,10).argsort(1)[:,:3]

# Assign them into initialized array
arr2d[np.arange(10)[:,None],idx] = 1

Or cram everything into a one-liner if you like it that way -

arr2d[np.arange(10)[:,None],np.random.rand(10,10).argsort(1)[:,:3]] = 1

Sample run -

In [11]: arr2d = np.zeros((10,10))  # Initialize array

In [12]: idx = np.random.rand(10,10).argsort(1)[:,:3]

In [13]: arr2d[np.arange(10)[:,None],idx] = 1

In [14]: arr2d # Verify by manual inspection
Out[14]: 
array([[ 0.,  1.,  0.,  1.,  0.,  0.,  0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  0.,  1.,  1.,  0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  1.,  0.,  1.,  0.,  1.],
       [ 0.,  1.,  1.,  0.,  0.,  0.,  0.,  0.,  1.,  0.],
       [ 0.,  0.,  1.,  1.,  0.,  0.,  0.,  1.,  0.,  0.],
       [ 1.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  1.,  0.,  0.,  0.,  1.,  0.,  1.],
       [ 1.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  1.,  0.],
       [ 1.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  1.,  0.],
       [ 0.,  1.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  1.]])

In [15]: arr2d.sum(1) # Verify by counting ones in each row
Out[15]: array([ 3.,  3.,  3.,  3.,  3.,  3.,  3.,  3.,  3.,  3.])
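
For readers unpacking the two-liner, the following is a step-by-step sketch of the same idea. The names n_rows, n_cols, k and rows are introduced here for illustration only and do not appear in the answer.

```python
import numpy as np

n_rows, n_cols, k = 10, 10, 3

# Argsorting a row of random keys yields a random permutation of
# 0..n_cols-1, so the first k entries of each row are k distinct columns.
idx = np.random.rand(n_rows, n_cols).argsort(axis=1)[:, :k]   # shape (10, 3)

# A column vector of row numbers broadcasts against idx:
# (10, 1) with (10, 3) gives (10, 3) pairs of (row, column).
rows = np.arange(n_rows)[:, None]                              # shape (10, 1)

arr2d = np.zeros((n_rows, n_cols))
arr2d[rows, idx] = 1

# Number of distinct column indices chosen in each row
distinct_per_row = [len(set(r)) for r in idx.tolist()]
```

Because each row of idx comes from a permutation, no column index repeats within a row, which is why every row sum in the sample run is exactly 3.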

Note: If you are looking for performance, I would suggest an np.argpartition-based approach, as listed in this other post.
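
The linked post is not reproduced here, so the following is only a sketch of what an np.argpartition variant might look like, not necessarily the code from that post. It assumes the same 10x10 setup; the names keys, n_rows, n_cols and k are illustrative.

```python
import numpy as np

n_rows, n_cols, k = 10, 10, 3

keys = np.random.rand(n_rows, n_cols)

# argpartition only guarantees that the k smallest keys of each row occupy
# the first k positions (in no particular order), so it avoids a full sort.
idx = np.argpartition(keys, k, axis=1)[:, :k]

arr2d = np.zeros((n_rows, n_cols))
arr2d[np.arange(n_rows)[:, None], idx] = 1
```

The order of the k indices within a row does not matter for this assignment, which is why a partial sort is enough. Note that k must be smaller than n_cols for this call to be valid.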


2 Comments

Really cool. There's a whole lot of cleverness packed into these two lines of code.
@zarak The original idea came from this post - stackoverflow.com/a/29156976/3293881. The speedups against a loopy approach are listed here : stackoverflow.com/a/31958263/3293881
1

Use answers from this question to generate non-repeating random numbers. You can use random.sample from Python's random module, or np.random.choice.

So, just a small modification to your code:

>>> import numpy as np
>>> for row in arr2d:
...     rand_cols = np.random.choice(range(10), 3, replace=False)
...     # Or the python standard lib alternative (use `import random`)
...     # rand_cols = random.sample(range(10), 3)
...     row[rand_cols] = 1
...
>>> arr2d
array([[ 0.,  0.,  0.,  0.,  0.,  1.,  1.,  1.,  0.,  0.],
       [ 0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  1.,  1.],
       [ 1.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  1.,  0.],
       [ 0.,  0.,  1.,  1.,  0.,  0.,  1.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  1.,  1.],
       [ 0.,  0.,  1.,  1.,  0.,  0.,  1.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  1.,  1.,  1.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  1.,  1.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  1.,  0.,  0.,  0.,  1.,  0.],
       [ 0.,  0.,  1.,  1.,  0.,  0.,  0.,  0.,  1.,  0.]])

I don't think you can really leverage column slicing here to set values to 1, unless you're generating the randomized array from scratch. This is because your column indices are random for each row. You're better off leaving it in the form of a loop for readability.
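
As one possible illustration of "generating the randomized array from scratch", the sketch below builds a template with three 1s per row and then shuffles every row independently. It assumes NumPy 1.20 or later, where Generator.permuted is available; it is an alternative to the loop above, not part of the original answer, and the names rng, template, n_rows, n_cols and k are illustrative.

```python
import numpy as np

rng = np.random.default_rng()

n_rows, n_cols, k = 10, 10, 3

# Start from a template where the first k columns of every row are 1
template = np.zeros((n_rows, n_cols))
template[:, :k] = 1

# permuted(..., axis=1) shuffles each row independently of the others
# and returns a new array, leaving the template untouched
arr2d = rng.permuted(template, axis=1)
```

Whether this reads better than the explicit loop is a matter of taste; the loop remains the more obvious choice if the array already holds other data.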

3 Comments

FYI: You can generate non-repeating random numbers using numpy.random.choice(10, size=3, replace=False). This is described in one of the answers to the question that you linked.
@WarrenWeckesser I did notice, but I didn't include it because it was the second result. I'll add it as an alternative. Thanks!
In fact, in retrospect, it's probably better to just use np.random in order to avoid having two very similar imports, which could get pretty confusing.
0

I'm not sure how good this would be in terms of performance, but it's fairly concise.

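arr2d[:, :3] = 1
# In Python 3, map() is lazy and would do nothing here, so loop explicitly
for row in arr2d:
    np.random.shuffle(row)  # shuffles this row in place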

1 Comment

That picks the same columns for every row.
