Clustering without all pairwise distances

I have a set of binarized images containing forms. Each image follows one of N layouts, except for a few outliers that follow no layout and contain random text and images.

A distance between two images can be derived from the number of overlapping black pixels: high overlap means the images are more likely to depict the same form.
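
A minimal NumPy sketch of that measure, assuming the images are equal-shape boolean arrays with `True` for black. A raw count favors pages with a lot of ink, so the sketch also shows a Jaccard-normalized variant (overlap divided by the union of black pixels) and turns it into a distance in [0, 1]. The normalization and the helper names are additions for illustration, not part of the question.

```python
import numpy as np


def overlap_count(a: np.ndarray, b: np.ndarray) -> int:
    """Number of pixels that are black (True) in both images."""
    return int(np.count_nonzero(a & b))


def overlap_jaccard(a: np.ndarray, b: np.ndarray) -> float:
    """Overlap divided by the number of pixels that are black in either image."""
    union = np.count_nonzero(a | b)
    return overlap_count(a, b) / union if union else 1.0


def overlap_distance(a: np.ndarray, b: np.ndarray) -> float:
    """0.0 for identical black-pixel sets, 1.0 for disjoint ones."""
    return 1.0 - overlap_jaccard(a, b)


a = np.zeros((4, 4), dtype=bool)
a[0, :] = True                       # top row black
b = np.zeros((4, 4), dtype=bool)
b[0, :2] = True                      # half of the top row...
b[3, :2] = True                      # ...plus half of the bottom row
print(overlap_count(a, b))               # 2
print(round(overlap_distance(a, b), 3))  # 0.667
```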

Are there any algorithms that can cluster the images without computing all pairwise distances, e.g. iteratively or online? The images should be clustered by the form each one uses, and outliers should be detected rather than assigned to any cluster.

Ideally in Python, using SciPy.
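
One technique that fits these constraints, though it is not mentioned in the question, is single-pass "leader" (sequential threshold) clustering. Each image is compared only to one representative per existing cluster, so the cost is O(n·k) comparisons instead of O(n²), and clusters that stay smaller than `min_size` are reported as outliers (`-1`). A minimal sketch, assuming equal-shape boolean images and the Jaccard-normalized overlap; `threshold=0.6`, `min_size=2`, the helper names and the toy data are hypothetical and would need tuning on real scans.

```python
import numpy as np


def overlap_jaccard(a: np.ndarray, b: np.ndarray) -> float:
    """Shared black pixels divided by black pixels in either image."""
    union = np.count_nonzero(a | b)
    return np.count_nonzero(a & b) / union if union else 1.0


def leader_cluster(images, threshold=0.6, min_size=2):
    """Single-pass leader clustering; returns one label per image, -1 = outlier.

    Each image is compared only to the first image (the "leader") of every
    existing cluster, so the cost is O(n * k) comparisons instead of O(n^2).
    """
    leaders = []  # one representative image per cluster
    members = []  # indices of the images assigned to each cluster
    for i, img in enumerate(images):
        if leaders:
            sims = [overlap_jaccard(img, lead) for lead in leaders]
            best = int(np.argmax(sims))
            if sims[best] >= threshold:
                members[best].append(i)
                continue
        # No leader is similar enough, so this image starts a new cluster.
        leaders.append(img)
        members.append([i])

    labels = np.full(len(images), -1, dtype=int)
    next_label = 0
    for idx in members:
        if len(idx) >= min_size:  # clusters that stay tiny are treated as outliers
            labels[idx] = next_label
            next_label += 1
    return labels


# Toy data: two synthetic "layouts" with 2% pixel noise, plus one random page.
rng = np.random.default_rng(0)
layout_a = np.zeros((20, 20), dtype=bool)
layout_a[:5, :] = True                # header band
layout_b = np.zeros((20, 20), dtype=bool)
layout_b[:, :5] = True                # left column


def noisy(base):
    """Flip roughly 2% of the pixels at random."""
    return base ^ (rng.random(base.shape) < 0.02)


images = [noisy(layout_a), noisy(layout_b), noisy(layout_a),
          noisy(layout_b), noisy(layout_a), rng.random((20, 20)) < 0.5]
labels = leader_cluster(images)
print(labels.tolist())  # [0, 1, 0, 1, 0, -1]
```

The result depends on the order of the images and on the threshold, and a noisy scan that happens to become a leader represents its cluster poorly; comparing against an averaged template per cluster instead of the first member is one possible refinement.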

asked 2 days ago

sebastian

The first thing that comes to mind is k-means. It compares every element only to the cluster centroids, not to every other element, so if the number of clusters is much smaller than the number of elements, it can be a lot faster. SciPy technically has a k-means implementation, but I would really steer you toward sklearn's KMeans implementation.

– Nick ODell, 2025-11-20 02:10:56 +00:00
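
A minimal sketch of that suggestion, assuming scikit-learn is installed and N (`n_layouts` here) is known. It uses `MiniBatchKMeans.partial_fit`, so images can be streamed in batches rather than loaded all at once. k-means itself has no notion of outliers, so the sketch adds one: an image whose nearest centroid is farther than `max_dist` gets label `-1`. For two binary images the squared Euclidean distance equals the number of differing pixels, so `max_dist=8.0` roughly means "about 64 pixels off the typical layout". That value, the helper names and the toy data are hypothetical.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans


def flatten(images):
    """Stack equal-shape binary images into an (n_images, n_pixels) matrix."""
    return np.stack([img.ravel() for img in images]).astype(np.float32)


def fit_online(batches, n_layouts, seed=0):
    """Update the centroids one batch at a time with partial_fit."""
    km = MiniBatchKMeans(n_clusters=n_layouts, n_init=3, random_state=seed)
    for batch in batches:
        km.partial_fit(flatten(batch))  # the first batch needs >= n_layouts images
    return km


def label_with_outliers(km, images, max_dist):
    """Nearest-centroid label, or -1 if even the nearest centroid is too far."""
    dists = km.transform(flatten(images))  # Euclidean distance to each centroid
    labels = dists.argmin(axis=1)
    labels[dists.min(axis=1) > max_dist] = -1
    return labels


# Toy data: two synthetic layouts with 2% pixel noise.
rng = np.random.default_rng(0)
layout_a = np.zeros((20, 20), dtype=bool)
layout_a[:5, :] = True
layout_b = np.zeros((20, 20), dtype=bool)
layout_b[:, :5] = True


def noisy(base):
    """Flip roughly 2% of the pixels at random."""
    return base ^ (rng.random(base.shape) < 0.02)


batches = [[noisy(layout_a) for _ in range(5)] + [noisy(layout_b) for _ in range(5)]
           for _ in range(3)]
km = fit_online(batches, n_layouts=2)
test = [noisy(layout_a), noisy(layout_b), noisy(layout_a), rng.random((20, 20)) < 0.5]
labels = label_with_outliers(km, test, max_dist=8.0)
print(labels.tolist())
```

On this toy data the two layouts should receive different cluster ids (the ids themselves are arbitrary) and the random page `-1`. Outliers included in training can still pull a centroid toward them; fitting a few more clusters than layouts and discarding the tiny ones is one possible workaround.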

The other idea I would suggest is perceptual hashes. If you can reduce the data in each image to a 64-bit hash, then comparing all images against all other images is cheap, because comparing two hashes only means counting the bits in which they differ.

– Nick ODell, 2025-11-20 02:12:55 +00:00
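
A minimal sketch of a simplified "average hash", assuming 2D binarized NumPy images at least 8 pixels on each side: the page is split into an 8x8 grid, each cell's ink density is compared with the mean density, and the 64 resulting bits are packed into one Python integer. Hashes are compared by Hamming distance (number of differing bits). The helper names and test images are hypothetical; the third-party `imagehash` package provides ready-made hashes such as `average_hash` and `phash`.

```python
import numpy as np


def average_hash(img: np.ndarray, hash_size: int = 8) -> int:
    """Pack a hash_size x hash_size grid of 'more ink than average' bits into an int."""
    h, w = img.shape
    # Crop so the image splits evenly into hash_size x hash_size blocks.
    h, w = h - h % hash_size, w - w % hash_size
    blocks = img[:h, :w].astype(np.float64).reshape(
        hash_size, h // hash_size, hash_size, w // hash_size)
    density = blocks.mean(axis=(1, 3))         # ink fraction per block
    bits = (density > density.mean()).ravel()  # row-major, 64 bits for size 8
    value = 0
    for bit in bits:
        value = (value << 1) | int(bit)
    return value


def hamming(x: int, y: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(x ^ y).count("1")


header = np.zeros((64, 64), dtype=bool)
header[:16, :] = True    # ink in the top quarter
speck = header.copy()
speck[40, 3] = True      # same page with one stray pixel
sidebar = np.zeros((64, 64), dtype=bool)
sidebar[:, :16] = True   # ink in the left quarter
print(hamming(average_hash(header), average_hash(speck)))    # 0
print(hamming(average_hash(header), average_hash(sidebar)))  # 24
```

With 64-bit hashes, an all-pairs comparison costs n² XOR-and-popcount operations on integers, and the hashes can then be grouped with any clustering method. An 8x8 grid is coarse, so layouts that differ only in small details may collide; a larger `hash_size` gives `hash_size**2` bits.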
Thank you @NickODell, I will give it a try. Perceptual hashes sound interesting!

– sebastian, 2025-11-20 07:05:04 +00:00
