I'm working on merging company data from three or more providers. I'm exploring an entity resolution approach that uses separate embeddings for name, location, and domain, each stored in a vector index. I'm considering Google Spanner as the database, but I'm not sure whether this is even possible. I know you can run individual searches, e.g. given a name vector, return the 10 closest names. What I ideally want, though, is to take e.g. 400 million firms, run an algorithm over all of them, and end up with 100 million resolved firms.
Can this be done on Google Spanner Graph?
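To make the "separate embeddings per field" idea concrete, here is a minimal in-memory sketch of one way the three vectors could be combined into a single match score: a weighted sum of per-field cosine similarities. The weights, the record layout, and the names `combined_similarity` and `top_k` are hypothetical, and the brute-force scan only stands in for what a vector index would do for a single query.

```python
import heapq
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical per-field weights; in practice these would be tuned on labelled matches.
WEIGHTS = {"name": 0.5, "location": 0.2, "domain": 0.3}


def combined_similarity(a, b, weights=WEIGHTS):
    """Weighted sum of per-field cosine similarities (name, location, domain)."""
    return sum(w * cosine(a[field], b[field]) for field, w in weights.items())


def top_k(query, records, k=10):
    """Brute-force k nearest records to `query`, returned as (score, id), best first.

    This is a single-query search, like one vector-index lookup; it does not
    resolve the whole dataset.
    """
    scored = ((combined_similarity(query, rec), rid) for rid, rec in records.items())
    return heapq.nlargest(k, scored)


# Toy records with 2-dimensional embeddings per field (made up for illustration).
records = {
    "a1": {"name": [1.0, 0.0], "location": [0.0, 1.0], "domain": [1.0, 1.0]},
    "b7": {"name": [0.9, 0.1], "location": [0.0, 1.0], "domain": [1.0, 0.9]},
    "c3": {"name": [0.0, 1.0], "location": [1.0, 0.0], "domain": [0.0, 1.0]},
}
print(top_k(records["a1"], records, k=2))
```

Note that the query record itself comes back first, since it is included in `records`.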
I've tried similarity search as described in https://cloud.google.com/spanner/docs/find-k-nearest-neighbors, but I need some form of KNN plus clustering to get the nearest neighbours for every entity, not just for one query vector at a time.
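As an illustration of what "KNN for every entity, then clustering" means here, the sketch below builds a k-nearest-neighbour graph over all entities, keeps only edges above a similarity threshold, and treats each connected component (found with union-find) as one resolved firm. It assumes one pre-combined vector per entity; the entity IDs, `k`, and `threshold` are hypothetical, and the all-pairs loop is O(n²), so it only shows the logic, not how it would run over 400 million rows.

```python
import heapq
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_graph(vectors, k, threshold):
    """Brute-force k nearest neighbours for every entity.

    Returns (id, neighbour_id) edges whose similarity is at least `threshold`.
    """
    ids = list(vectors)
    edges = []
    for i in ids:
        candidates = ((cosine(vectors[i], vectors[j]), j) for j in ids if j != i)
        for score, j in heapq.nlargest(k, candidates):
            if score >= threshold:
                edges.append((i, j))
    return edges


def cluster(ids, edges):
    """Connected components via union-find; each component is one resolved entity."""
    parent = {i: i for i in ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return sorted(sorted(g) for g in groups.values())


# Toy "provider:firm" records with made-up 3-dimensional embeddings.
vectors = {
    "p1:acme": [1.0, 0.0, 0.1],
    "p2:acme_inc": [0.98, 0.02, 0.12],
    "p3:acme_corp": [0.97, 0.0, 0.15],
    "p1:globex": [0.0, 1.0, 0.0],
    "p2:globex_llc": [0.05, 0.99, 0.0],
    "p3:initech": [0.0, 0.0, 1.0],
}
edges = knn_graph(vectors, k=2, threshold=0.95)
print(cluster(list(vectors), edges))
```

Connected components are transitive, so a single spurious edge can merge two distinct firms; a stricter clustering rule (or a higher threshold) trades recall for precision.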