Finding identical rows and columns in a NumPy array

Question

I have a boolean array of n×n elements and I want to check whether any row is identical to another. If there are identical rows, I want to check whether the corresponding columns are also identical.

Here is an example:

import numpy as np

A=np.array([[0, 1, 0, 0, 0, 1],
            [0, 0, 0, 1, 0, 1],
            [0, 1, 0, 0, 0, 1],
            [1, 0, 1, 0, 1, 1],
            [1, 1, 1, 0, 0, 0],
            [0, 1, 0, 1, 0, 1]])

I would like the program to find that the first and third rows are identical, and then check whether the first and third columns are also identical, which in this case they are.

Is performance important for you?
– wim, Commented Aug 27, 2014 at 17:04

not too much, since the arrays are small
– cgog, Commented Aug 27, 2014 at 17:07

vvvvv · Accepted Answer · 2021-03-07 11:31:23Z

4

You can use np.array_equal():

for i in range(len(A)):  # generate pairs
    for j in range(i + 1, len(A)):
        if np.array_equal(A[i], A[j]):  # compare rows
            if np.array_equal(A[:, i], A[:, j]):  # compare columns
                print(i, j)

or using combinations():

import itertools

for pair in itertools.combinations(range(len(A)), 2):
    i, j = pair
    if np.array_equal(A[i], A[j]) and np.array_equal(A[:, i], A[:, j]):  # compare rows, then columns
        print(pair)
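
If the matching pairs are needed as data rather than printed, the same check can be wrapped in a function. This is only a minimal sketch, assuming A is square; the name matching_pairs is hypothetical and does not come from the answer.

```python
import itertools

import numpy as np


def matching_pairs(A):
    """Return (i, j) pairs, i < j, whose rows and columns are both identical."""
    A = np.asarray(A)
    pairs = []
    for i, j in itertools.combinations(range(len(A)), 2):
        rows_equal = np.array_equal(A[i], A[j])
        # The column comparison only runs when the rows already match.
        if rows_equal and np.array_equal(A[:, i], A[:, j]):
            pairs.append((i, j))
    return pairs


A = np.array([[0, 1, 0, 0, 0, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 1, 0, 0, 0, 1],
              [1, 0, 1, 0, 1, 1],
              [1, 1, 1, 0, 0, 0],
              [0, 1, 0, 1, 0, 1]])
print(matching_pairs(A))  # [(0, 2)]
```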

edited Mar 7, 2021 at 11:31

vvvvv


answered Aug 27, 2014 at 18:40

Esther Martinez


Daniel · Accepted Answer · 2014-08-27 19:11:29Z

Starting with the typical way to apply np.unique to 2D arrays and have it return unique pairs:

def unique_pairs(arr):
    uview = np.ascontiguousarray(arr).view(np.dtype((np.void, arr.dtype.itemsize * arr.shape[1])))
    uvals, uidx = np.unique(uview, return_inverse=True)
    uidx = uidx.ravel()  # flatten in case the inverse comes back with the (n, 1) shape of uview
    pos = np.where(np.bincount(uidx) == 2)[0]

    pairs = []
    for p in pos:
        pairs.append(np.where(uidx==p)[0])

    return np.array(pairs)

We can then do the following:

row_pairs = unique_pairs(A)
col_pairs = unique_pairs(A.T)

for pair in row_pairs:
    if np.any(np.all(pair==col_pairs, axis=1)):
        print(pair)

>>> [0 2]

Of course there are quite a few optimizations left to do, but the main point is using np.unique. The efficiency of this method compared to others depends heavily on how you define "small" arrays.
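
For NumPy versions where np.unique accepts an axis argument, the void-view trick may not be needed. The sketch below is an assumption-laden variant, not the code from this answer: it stacks A next to A.T so that each stacked row holds a row of A followed by the matching column, then groups equal stacked rows. The name matching_groups is hypothetical. Unlike the `== 2` filter above, it also reports groups of three or more.

```python
import numpy as np


def matching_groups(A):
    """Group indices i whose row A[i] and column A[:, i] both coincide."""
    A = np.asarray(A)  # assumed square, as in the question
    # Row i of `stacked` is row i of A followed by column i of A, so two
    # rows of `stacked` are equal exactly when both the rows and the
    # columns of A match.
    stacked = np.hstack([A, A.T])
    _, inverse = np.unique(stacked, axis=0, return_inverse=True)
    inverse = np.ravel(inverse)  # flatten in case the inverse is not 1-D
    groups = []
    for label in np.flatnonzero(np.bincount(inverse) > 1):
        groups.append(np.flatnonzero(inverse == label).tolist())
    return groups


A = np.array([[0, 1, 0, 0, 0, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 1, 0, 0, 0, 1],
              [1, 0, 1, 0, 1, 1],
              [1, 1, 1, 0, 0, 0],
              [0, 1, 0, 1, 0, 1]])
print(matching_groups(A))  # [[0, 2]]
```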

wim · Accepted Answer · 2014-08-27 18:50:07Z

1

Since you said performance is not critical, here is a not-very-numpythonic brute force solution:

>>> for i1, row1 in enumerate(A):
...     offset = i1 + 1  # skip rows already compared
...     for i2, row2 in enumerate(A[offset:], start=offset):
...         if (row1 == row2).all() and (A.T[i1] == A.T[i2]).all():
...             print(i1, i2)
...
0 2

The loop makes O(n^2) row comparisons, each of which costs up to O(n) element comparisons. The transposed array A.T is used to check that the columns are equal as well.
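
If the number of pairwise comparisons ever became a concern, one possible alternative is to bucket indices by a hashable key instead of comparing every pair. The following is a rough sketch under the assumption that A holds booleans or integers (byte-wise equality can differ from value equality for floats); matching_groups_hashed is a hypothetical name.

```python
from collections import defaultdict

import numpy as np


def matching_groups_hashed(A):
    """Group indices whose row and column are both identical, via dict keys."""
    A = np.ascontiguousarray(A)
    At = np.ascontiguousarray(A.T)
    buckets = defaultdict(list)
    for i in range(len(A)):
        # Key = raw bytes of row i followed by raw bytes of column i.
        # Both parts have a fixed length, so the concatenation is unambiguous.
        key = A[i].tobytes() + At[i].tobytes()
        buckets[key].append(i)
    # Keep only the keys shared by at least two indices.
    return [indices for indices in buckets.values() if len(indices) > 1]


A = np.array([[0, 1, 0, 0, 0, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 1, 0, 0, 0, 1],
              [1, 0, 1, 0, 1, 1],
              [1, 1, 1, 0, 0, 0],
              [0, 1, 0, 1, 0, 1]])
print(matching_groups_hashed(A))  # [[0, 2]]
```

Each key takes O(n) to build, so the whole pass is expected to cost about O(n^2) work, assuming dictionary lookups behave as usual.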

answered Aug 27, 2014 at 18:50

wim


Connor · Accepted Answer · 2021-04-08 07:28:18Z

For small arrays, an alternative approach without relying on Python loops is via NumPy broadcasting.

bool_array = np.logical_not(np.logical_xor(A[:,np.newaxis,:], A[np.newaxis,:,:])) # XNOR for comparison
matches_array = np.sum(bool_array, axis=2)  # count total matches for all elements in a row
row1, row2 = np.where(matches_array == A.shape[1]) # identical row = all elements in a row match
row1, row2 = row1[row2 > row1], row2[row2 > row1]  # filter self & duplicated comparisons
column_match = np.all(A[:,row1] == A[:,row2], axis=0)  # check if the corresponding columns are identical
for r1, r2, c in zip(row1, row2, column_match):
    print("Row %d and row %d : Column identical: %s" % (r1, r2, c))

As noted above, this method does not scale to large A, since it needs O(n^3) memory during the calculation (for bool_array).
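
One way to keep the loop-free style while avoiding the three-dimensional intermediate is to count agreements with matrix products, which only needs n×n temporaries. This is a sketch that assumes the entries of A can be read as 0/1 (nonzero values are treated as 1); identical_row_pairs is a hypothetical name, not part of the answer.

```python
import numpy as np


def identical_row_pairs(A):
    """Return (row1, row2, columns_identical) for every pair of identical rows."""
    B = (np.asarray(A) != 0).astype(np.int64)  # force 0/1 entries
    n_cols = B.shape[1]
    # matches[i, j] = number of positions where row i and row j agree:
    # positions where both are 1 plus positions where both are 0.
    matches = B @ B.T + (1 - B) @ (1 - B).T
    # Keep only the upper triangle to drop self and mirrored comparisons.
    r1, r2 = np.nonzero(np.triu(matches == n_cols, k=1))
    cols_equal = np.all(B[:, r1] == B[:, r2], axis=0)
    return [(int(a), int(b), bool(c)) for a, b, c in zip(r1, r2, cols_equal)]


A = np.array([[0, 1, 0, 0, 0, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 1, 0, 0, 0, 1],
              [1, 0, 1, 0, 1, 1],
              [1, 1, 1, 0, 0, 0],
              [0, 1, 0, 1, 0, 1]])
print(identical_row_pairs(A))  # [(0, 2, True)]
```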
