I want to count the number of equal matrices that I encounter after splitting a large matrix.

import numpy as np

mat1 = np.zeros((4, 8))

split4x4 = np.split(mat1, 4)

Now I want to know how many equal matrices are in split4x4, but collections.Counter(split4x4) throws an error. Is there a built-in way in numpy to do this?

1
  • I am an amateur, so this may sound silly, but np.split() by default splits the array into the number of equal pieces you specify (e.g. 4 in the example above), and throws an error if it can't. So why do you need to find that out? Wouldn't it just be 4? Commented Aug 23, 2016 at 18:54

2 Answers

1

This can be done in a fully vectorized manner using the numpy_indexed package (disclaimer: I am its author):

import numpy_indexed as npi
unique_rows, row_counts = npi.count(mat1)

This should be substantially faster than using collections.Counter.
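
For comparison, the collections.Counter approach from the question fails because NumPy arrays are not hashable. A minimal sketch of a workaround, assuming it is acceptable to key each piece by its shape, dtype and raw bytes (the helper name count_equal_pieces is hypothetical and not part of NumPy or numpy_indexed):

```python
import collections
import numpy as np

def count_equal_pieces(pieces):
    """Count identical arrays in a list by hashing shape, dtype and raw bytes.

    Hypothetical helper for illustration only. Pieces are compared
    bit-for-bit, so 0.0 and -0.0 (or different dtypes) count as different.
    """
    return collections.Counter(
        (p.shape, p.dtype.str, p.tobytes()) for p in pieces
    )

mat1 = np.zeros((4, 8))
split4x4 = np.split(mat1, 4)           # four pieces of shape (1, 8)
counts = count_equal_pieces(split4x4)
print(list(counts.values()))           # one distinct piece, seen 4 times
```

This is a plain Python loop over the pieces, so it is likely to be slower than a vectorized approach on large inputs, but it needs nothing beyond NumPy and the standard library.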

1

Maybe the easiest way is to use np.unique, flattening each split array into a tuple so that they can be compared:

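import numpy as np
# Generate some sample data:
a = np.random.uniform(size=(8, 3))
# With repetition:
a = np.r_[a, a]
# Split a into 4 arrays and flatten each one into a tuple
s = np.split(a, 4)
s = [tuple(e.flatten()) for e in s]
# axis=0 (NumPy >= 1.13) compares whole rows; without it np.unique
# flattens the input and counts individual values instead
unique_rows, counts = np.unique(s, axis=0, return_counts=True)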

Remark: the return_counts argument of np.unique was added in NumPy 1.9.0.
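
One caveat: np.unique compares values exactly, so sub-matrices that differ only by floating-point round-off are counted as distinct. A minimal sketch of one way around this, assuming that rounding to a fixed number of decimals is an acceptable notion of "equal" for your data (the helper name count_blocks and the decimals default are illustrative choices, and the axis argument needs NumPy 1.13 or later):

```python
import numpy as np

def count_blocks(a, n_blocks, decimals=8):
    """Split `a` into n_blocks along axis 0 and count equal blocks.

    Hypothetical helper: values are rounded first so that tiny
    floating-point differences do not create spurious unique blocks.
    """
    blocks = np.asarray(np.split(a, n_blocks))
    flat = np.round(blocks.reshape(n_blocks, -1), decimals)
    uniq, counts = np.unique(flat, axis=0, return_counts=True)
    # Restore the original block shape for the unique entries
    return uniq.reshape((-1,) + blocks.shape[1:]), counts

a = np.arange(24.0).reshape(8, 3) / 7
a = np.r_[a, a + 1e-12]      # second copy differs only by round-off
uniq, counts = count_blocks(a, 4)
print(uniq.shape, counts)
```

Rounding is a blunt tool: two values that straddle a rounding boundary can still end up in different groups, so treat this as a starting point rather than a robust tolerance test.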

Another pure NumPy solution, inspired by that post:

# Generate some sample data:
In: a = np.random.uniform(size=(8,3))
# With some repetition
In: a = np.r_[a,a]
In: a.shape
Out: (16,3)
# Split a in 4 arrays
In: s = np.asarray(np.split(a, 4))
In: print(s)
Out: [[[ 0.78284847  0.28883662  0.53369866]
       [ 0.48249722  0.02922249  0.0355066 ]
       [ 0.05346797  0.35640319  0.91879326]
       [ 0.1645498   0.15131476  0.1717498 ]]

      [[ 0.98696629  0.8102581   0.84696276]
       [ 0.12612661  0.45144896  0.34802173]
       [ 0.33667377  0.79371788  0.81511075]
       [ 0.81892789  0.41917167  0.81450135]]

      [[ 0.78284847  0.28883662  0.53369866]
       [ 0.48249722  0.02922249  0.0355066 ]
       [ 0.05346797  0.35640319  0.91879326]
       [ 0.1645498   0.15131476  0.1717498 ]]

      [[ 0.98696629  0.8102581   0.84696276]
       [ 0.12612661  0.45144896  0.34802173]
       [ 0.33667377  0.79371788  0.81511075]
       [ 0.81892789  0.41917167  0.81450135]]]
In: s.shape
Out: (4, 4, 3)
# Flatten the array:
In: s = np.asarray([e.flatten() for e in s])
In: s.shape
Out: (4, 12)
# Sort the rows using lexsort:
In: idx = np.lexsort(s.T)
In: s_sorted = s[idx]
# Create a mask to get unique rows
In: row_mask = np.append([True],np.any(np.diff(s_sorted,axis=0),1))
# Get unique rows:
In: out = s_sorted[row_mask]
# and count:
In: for e in out:
        # Count how many flattened blocks match this unique row exactly
        count = (e == s).all(axis=1).sum()
        print(e.reshape(4,3), count)
Out:[[ 0.78284847  0.28883662  0.53369866]
     [ 0.48249722  0.02922249  0.0355066 ]
     [ 0.05346797  0.35640319  0.91879326]
     [ 0.1645498   0.15131476  0.1717498 ]] 2
    [[ 0.98696629  0.8102581   0.84696276]
     [ 0.12612661  0.45144896  0.34802173]
     [ 0.33667377  0.79371788  0.81511075]
     [ 0.81892789  0.41917167  0.81450135]] 2
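
The counting loop above compares every unique row against all rows. A sketch of an alternative that reads the counts directly off the sorted array, assuming the same lexsort-and-diff idea (the function name unique_rows_with_counts is hypothetical):

```python
import numpy as np

def unique_rows_with_counts(s):
    """Return the unique rows of a 2-D array and how often each occurs."""
    idx = np.lexsort(s.T)                  # sort rows so duplicates are adjacent
    s_sorted = s[idx]
    # True at the first row of every run of identical rows
    row_mask = np.append([True], np.any(np.diff(s_sorted, axis=0) != 0, axis=1))
    starts = np.flatnonzero(row_mask)      # index where each run begins
    counts = np.diff(np.append(starts, len(s_sorted)))  # run lengths
    return s_sorted[row_mask], counts

a = np.arange(24.0).reshape(8, 3)
a = np.r_[a, a]                            # every block appears twice
s = np.asarray(np.split(a, 4)).reshape(4, -1)
out, counts = unique_rows_with_counts(s)
for row, c in zip(out, counts):
    print(row.reshape(4, 3), c)
```

Because the rows are already sorted, the length of each run of identical rows is the count, so no second pass over the data is needed.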

2 Comments

Are you using Python 3 in the first example? From a = r_[a,a] I get NameError: name 'r_' is not defined.
@andandandand No, I'm not. It's my fault: I forgot the np before r_, which is NumPy's shorthand for building arrays quickly (see: docs.scipy.org/doc/numpy/reference/generated/numpy.r_.html). I've just corrected my answer.
