Suppose I have two arrays, a = np.array([0, 0, 1, 1, 1, 2]) and b = np.array([1, 2, 4, 2, 6, 5]). Each element of a is the row index where the corresponding element of b should be placed. If several elements share a row, they should fill that row from left to right, in order. The result is a 2D array c:

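import numpy as np

a = np.array([0, 0, 1, 1, 1, 2])   # target row for each value
b = np.array([1, 2, 4, 2, 6, 5])   # values to place

c = np.zeros((3, 4))
counts = {k: 0 for k in range(3)}  # next free column in each row
for i in range(a.shape[0]):
    c[a[i], counts[a[i]]] = b[i]
    counts[a[i]] += 1
print(c)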

Is there a way to use fancy indexing in NumPy to get this result faster (without a for loop) when the arrays are large?

1 Answer

I had to run your code to actually see what it produced. There are limits to what I can 'run' in my head.

In [230]: c
Out[230]: 
array([[1., 2., 0., 0.],
       [4., 2., 6., 0.],
       [5., 0., 0., 0.]])
In [231]: counts
Out[231]: {0: 2, 1: 3, 2: 1}

Omitting this information may be delaying possible answers. 'Vectorization' requires thinking in whole-array terms, which is easiest if I can visualize the result and look for a pattern.

This looks like a padding problem.

In [260]: u, c = np.unique(a, return_counts=True)
In [261]: u
Out[261]: array([0, 1, 2])
In [262]: c
Out[262]: array([2, 3, 1])      # compare with counts above

Related question: Load data with rows of different sizes into Numpy array

Working from previous padding questions, I can construct a mask:

In [263]: mask = np.arange(4)<c[:,None]
In [264]: mask
Out[264]: 
array([[ True,  True, False, False],
       [ True,  True,  True, False],
       [ True, False, False, False]])

and use that mask to assign the b values to c (note that c is rebound here, from the counts array to the result array):

In [265]: c = np.zeros((3,4),int)
In [266]: c[mask] = b
In [267]: c
Out[267]: 
array([[1, 2, 0, 0],
       [4, 2, 6, 0],
       [5, 0, 0, 0]])
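
The steps above can be collected into one function. This is only a sketch: fill_rows is a made-up name, and it assumes that a is sorted, that every row index occurs at least once, and that no row receives more than n_cols values.

```python
import numpy as np

a = np.array([0, 0, 1, 1, 1, 2])   # target row for each value (sorted)
b = np.array([1, 2, 4, 2, 6, 5])   # values to place

def fill_rows(a, b, n_cols):
    # Hypothetical helper; assumes a is sorted and every row occurs at least once.
    _, counts = np.unique(a, return_counts=True)
    # True in the first counts[r] columns of row r
    mask = np.arange(n_cols) < counts[:, None]
    out = np.zeros(mask.shape, dtype=b.dtype)
    # Boolean assignment fills the True cells in row-major order,
    # which matches the order of b only because a is sorted.
    out[mask] = b
    return out

result = fill_rows(a, b, 4)
print(result)
```

Using separate names for the counts and the output avoids the rebinding of c in the session above.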

Since a is already sorted, we might get the counts faster than with np.unique. Also, this approach breaks if a has no values for some row(s): np.unique returns no count for a missing row, so the mask ends up with fewer rows than c.
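
One possible way to address both points, as a sketch rather than a benchmarked solution: np.bincount with minlength gives a count for every row, including rows that never appear, and a stable argsort groups b by row if a is not sorted. The name fill_rows_general and the n_rows and n_cols parameters are illustrative assumptions; a must hold non-negative integers, and no row may receive more than n_cols values.

```python
import numpy as np

def fill_rows_general(a, b, n_rows, n_cols):
    # Hypothetical helper, not part of the answer above.
    # A stable sort groups values by row and keeps their original order within a row.
    order = np.argsort(a, kind="stable")
    # minlength yields a zero count for rows that never appear in a.
    counts = np.bincount(a, minlength=n_rows)
    mask = np.arange(n_cols) < counts[:, None]
    out = np.zeros((n_rows, n_cols), dtype=b.dtype)
    out[mask] = b[order]
    return out

a = np.array([2, 0, 3, 0, 2, 2])   # unsorted, and row 1 is missing
b = np.array([5, 1, 7, 2, 6, 8])
result = fill_rows_general(a, b, n_rows=4, n_cols=4)
print(result)
```

If a is known to be sorted, the argsort step can be dropped and b assigned directly.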

1 Comment

Thanks. Yes, this is a padding problem. If a doesn't have any values for some row, one way around it is to use d = np.zeros(3); d[u] = c; mask = np.arange(4) < d[:, None].
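
Spelled out on a small example, the workaround in this comment might look like the following. The input arrays are invented for illustration (row 1 is deliberately empty), and d is given an integer dtype here, although the float default from np.zeros(3) also works in the comparison.

```python
import numpy as np

a = np.array([0, 0, 2, 2, 2])      # row 1 has no values
b = np.array([1, 2, 4, 2, 6])

u, c = np.unique(a, return_counts=True)   # u = [0, 2], c = [2, 3]
d = np.zeros(3, dtype=int)         # one count per output row, default 0
d[u] = c                           # scatter counts to their rows: [2, 0, 3]
mask = np.arange(4) < d[:, None]   # row 1 of the mask is all False

out = np.zeros((3, 4), dtype=int)
out[mask] = b
print(out)
```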
