Let's say I have two NumPy arrays, arr1 and arr2:

import numpy as np

arr1 = np.random.randint(3, size=100)
arr2 = np.random.randint(3, size=100)

I would like to build a matrix of joint occurrence counts. In other words, for every position where arr1 is 0, I want to check whether arr2 is also 0 at that same position and count those positions, and then do the same for every other pair of values. I would like to get the following matrix:

M = [[p(0,0), p(0,1), p(0,2)],
     [p(1,0), p(1,1), p(1,2)],
     [p(2,0), p(2,1), p(2,2)]]

where p(i, j) is the number of positions at which arr1 is i and arr2 is j; for example, p(0,0) is the number of positions where both arr1 and arr2 are 0.
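To make the definition concrete, here is a small sketch that counts the pairs with a plain loop. The tiny arrays a and b are made up for illustration (standing in for arr1 and arr2), and the row index is the arr1 value while the column index is the arr2 value, as in M above.

```python
import numpy as np

# Hypothetical tiny inputs, chosen so the counts are easy to check by hand
a = np.array([0, 1, 0, 2, 0])   # plays the role of arr1
b = np.array([0, 0, 0, 1, 2])   # plays the role of arr2

N = 3  # values are in {0, 1, 2}
M = np.zeros((N, N), dtype=int)
for i, j in zip(a, b):
    M[i, j] += 1  # row = value in arr1, column = value in arr2

print(M)
# [[2 0 1]
#  [1 0 0]
#  [0 1 0]]
```

Every position contributes to exactly one cell, so the entries of M always sum to the array length.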

First Attempt:

As a first attempt, I tried the following:

[[sum(arr1[arr2 == y] == x) for x in np.arange(0,3)] for y in np.arange(0,3)] 

But Python throws the following error:

NameError: name 'arr1' is not defined

Second Attempt:

To dig into this error, I rewrote it with explicit for loops:

dim = 3  # number of distinct values (0, 1, 2)
M = np.array([])

for x in np.arange(0,dim):
    result = np.array([])

    for y in np.arange(0,dim):
        result_temp = sum(arr1[arr2 == x] == y)
        result = np.append(result, result_temp)

    M = np.append(M,result)

This time Python does not throw the previous error, but instead of a 3x3 array I get a flat array of 9 elements, and I can't figure out how to get the desired 3x3 array.
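The flat result comes from np.append, which flattens its inputs when no axis is given. A minimal sketch of one fix, assuming the values lie in range(dim): preallocate a 2-D array and fill it by index. It keeps the loop order of the attempt above, so rows follow the values of arr2 and columns the values of arr1 (like the list comprehension); use M.T if you want rows indexed by arr1 as in the definition of M.

```python
import numpy as np

arr1 = np.random.randint(3, size=100)
arr2 = np.random.randint(3, size=100)
dim = 3  # number of distinct values (0, 1, 2)

# Preallocate the 2-D result instead of growing a 1-D array with np.append,
# which flattens its inputs when no axis is given.
M = np.zeros((dim, dim), dtype=int)
for x in range(dim):          # x: value in arr2 (row)
    for y in range(dim):      # y: value in arr1 (column)
        M[x, y] = np.sum(arr1[arr2 == x] == y)

print(M.shape)  # (3, 3)
```

If you prefer to keep the original loop unchanged, calling M.reshape(dim, dim) after it also gives a 3x3 array, because np.append adds the values in row order.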

Thanks in advance.

2 Answers


Your first list comprehension works. You won't get a NameError if arr1 is defined:

import numpy as np
np.random.seed(2016)
arr1 = np.random.randint(3, size = 100)
arr2 = np.random.randint(3, size = 100)
result = [[sum(arr1[arr2 == y] == x) for x in np.arange(0,3)] 
          for y in np.arange(0,3)] 
print(result)
# [[10, 9, 10], [8, 13, 15], [18, 8, 9]]

But you could instead use np.histogram2d:

result2, xedges, yedges = np.histogram2d(arr2, arr1, bins=range(4))
print(result2)

yields

[[ 10.   9.  10.]
 [  8.  13.  15.]
 [ 18.   8.   9.]]
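Note that np.histogram2d returns float counts together with the bin edges, and its first argument indexes the rows. A sketch, assuming integer values 0..N-1: use the edges 0..N (the last bin is closed, so the value N-1 is counted) and cast to int; passing arr1 first gives rows indexed by arr1, as in M from the question.

```python
import numpy as np

arr1 = np.random.randint(3, size=100)
arr2 = np.random.randint(3, size=100)
N = 3

# Bin edges 0, 1, 2, 3 -> bins [0,1), [1,2), [2,3]: one bin per integer value
H, xedges, yedges = np.histogram2d(arr1, arr2, bins=np.arange(N + 1))
M = H.astype(int)  # counts come back as floats

# Same counts as the list comprehension, transposed (rows now follow arr1)
ref = np.array([[np.sum(arr1[arr2 == y] == x) for x in range(N)] for y in range(N)])
print(np.array_equal(M, ref.T))  # True
```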

Comment:

Thanks for your answer! I am going to use your second solution because, although I have defined arr1, I still get the error. I do not understand why.
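The comment does not say where the code was run, so this is only one possible explanation: in Python 3 a comprehension has its own scope, and code inside it cannot see names defined directly in an enclosing class body. The sketch below (the class name Counts is hypothetical) reproduces the same NameError even though arr1 is defined one line earlier; running the comprehension at module or function level avoids it.

```python
import numpy as np

class Counts:  # hypothetical class, only to show the scoping rule
    arr1 = np.random.randint(3, size=100)
    arr2 = np.random.randint(3, size=100)
    error = None
    try:
        # The inner expression runs in the comprehension's own scope,
        # which skips the class body, so arr1 is looked up as a global.
        M = [[sum(arr1[arr2 == y] == x) for x in range(3)] for y in range(3)]
    except NameError as err:
        error = str(err)

print(Counts.error)  # name 'arr1' is not defined
```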

For performance, I would suggest np.bincount:

N = 3 # Number of integers to cover
out = np.bincount(arr2*N + arr1, minlength=N*N).reshape(N,N)
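This works because, for values in range(N), the index arr2*N + arr1 is different for every (arr2 value, arr1 value) pair, so counting those indices with np.bincount counts the pairs, and reshaping to (N, N) puts the arr2 value on the rows. A small hand-checkable sketch, using made-up arrays a and b in place of arr1 and arr2:

```python
import numpy as np

N = 3
a = np.array([0, 1, 0, 2, 0])   # plays the role of arr1 (columns)
b = np.array([0, 0, 0, 1, 2])   # plays the role of arr2 (rows)

idx = b * N + a                 # pair (row, col) -> row*N + col
print(idx)                      # [0 1 0 5 6]

# minlength guarantees N*N bins even if the largest pair never occurs
out = np.bincount(idx, minlength=N * N).reshape(N, N)
print(out)
# [[2 1 0]
#  [0 0 1]
#  [1 0 0]]
```

The encoding assumes non-negative integers below N; np.bincount rejects negative values.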

Sample run:

In [50]: arr1 = np.random.randint(3, size = 100)
    ...: arr2 = np.random.randint(3, size = 100)
    ...: 

In [51]: N = 3 # Number of integers to cover

In [52]: np.bincount(arr2*N + arr1, minlength=N*N).reshape(N,N)
Out[52]: 
array([[12, 10, 12],
       [ 7,  6, 20],
       [ 5, 13, 15]])
