How to calculate number of duplicates in a list of numpy arrays?

Question

I have a list of arrays (permutations) that I need to clean up. This is how my current list looks like:

>>>permutations
[array([1, 2, 6, 7]),
 array([1, 2, 6, 7]),
 array([1, 2, 6, 7]),
 array([1, 2, 3, 8]),
 array([1, 2, 3, 8]),
 array([1, 2, 3, 8]),
 array([2, 3, 4, 9]),
 array([2, 3, 4, 9]),
 array([2, 3, 4, 9]),
 array([ 3,  4,  5, 10]),
 array([ 3,  4,  5, 10]),
 array([ 3,  4,  5, 10]),
 array([ 4,  5,  6, 11]),
 array([ 4,  5,  6, 11]),
 array([ 4,  5,  6, 11]),
 array([ 1,  5,  6, 12]),
 array([ 1,  5,  6, 12]),
 array([ 1,  5,  6, 12])]

What I want:

>>>neat_perm
[(array([1, 2, 6, 7]),3), 
(array([1, 2, 3, 8]),3),
(array([2, 3, 4, 9]),3)
(array([3, 4, 5, 10]), 3),
(array([4, 5, 6, 11]), 3),
(array([1, 5, 6, 12]), 3)]

What I want to do, is to create a list of tuples, where the first element of the tuple is the array, and the second element of the tuple is the number of times it was repeated in permutations.

The straightforward, brute force method is to do the O(n^2) np.array_equal over the array to ensure there are no repeats. The problem is the algorithmic complexity. list(set(permutations)), permutations.count() does not work as np arrays are not hashable.

I would appreciate any advice you have for me, to get this more efficient, in terms of lines of code required, or time/memory complexity!

sacuL · Accepted Answer · 2021-04-23 17:35:51Z

6

One solution is to use np.unique() with return_counts = True, and zip the resulting unique arrays and their counts:

from numpy import array
import numpy as np

permutations = [array([1, 2, 6, 7]),
         array([1, 2, 6, 7]),
         array([1, 2, 6, 7]),
         array([1, 2, 3, 8]),
         array([1, 2, 3, 8]),
         array([1, 2, 3, 8]),
         array([2, 3, 4, 9]),
         array([2, 3, 4, 9]),
         array([2, 3, 4, 9]),
         array([ 3,  4,  5, 10]),
         array([ 3,  4,  5, 10]),
         array([ 3,  4,  5, 10]),
         array([ 4,  5,  6, 11]),
         array([ 4,  5,  6, 11]),
         array([ 4,  5,  6, 11]),
         array([ 1,  5,  6, 12]),
         array([ 1,  5,  6, 12]),
         array([ 1,  5,  6, 12])]


>>> list(zip(*np.unique(permutations, return_counts = True, axis = 0)))

[(array([1, 2, 3, 8]), 3),
 (array([1, 2, 6, 7]), 3),
 (array([ 1,  5,  6, 12]), 3),
 (array([2, 3, 4, 9]), 3),
 (array([ 3,  4,  5, 10]), 3),
 (array([ 4,  5,  6, 11]), 3)]

edited Apr 23, 2021 at 17:35

answered Apr 23, 2021 at 16:27

sacuL

51.6k9 gold badges88 silver badges115 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

bad_chemist Over a year ago

Thank you for your answer @sacuL! what is the "*" behind np.unique do?

sacuL Over a year ago

np.unique with return_counts = True returns a tuple of the arrays and their counts, so the * operator unpacks it into two separate arrays, which can then be zipped together

Collectives™ on Stack Overflow

How to calculate number of duplicates in a list of numpy arrays?

1 Answer 1

2 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

2 Comments

Your Answer

Sign up or log in

Post as a guest

Related