Finding related items in OpenSearch between 2 datasets

Question

I have a scenario where I have 2 sets of data:

1. All possible products
2. Products carried by a store

Dataset 2 is a subset of dataset 1.

If a user searches for a product that the store doesn't carry, I'd like to return similar products instead. The problem is that product names don't indicate which category or subcategory they belong to, so right now I only get text-based search results.

So let's say a user searches for a self-help book by name, but the store doesn't carry it. I'd like to recommend other self-help books.

Conceptually, I could search dataset 1 to get all of the relevant information about the book and then use that to search dataset 2, but I'm having trouble working out how to do that.
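The two-step idea above could be sketched roughly as follows. This is only an illustration under assumptions: the field names `name`, `category`, and `subcategory` are hypothetical and would have to exist in both datasets. The functions only build OpenSearch query bodies as plain dictionaries, so nothing here talks to a cluster.

```python
from typing import Any, Dict


def build_catalog_lookup(title: str) -> Dict[str, Any]:
    """Step 1: find the searched product in dataset 1 (all products)."""
    # `name` is an assumed text field holding the product title.
    return {"size": 1, "query": {"match": {"name": title}}}


def build_store_query(catalog_hit: Dict[str, Any], size: int = 10) -> Dict[str, Any]:
    """Step 2: use the metadata of the step 1 hit to search dataset 2."""
    source = catalog_hit["_source"]
    return {
        "size": size,
        "query": {
            "bool": {
                # Hard constraint: stay inside the same (assumed) category.
                "filter": [{"term": {"category": source["category"]}}],
                # Soft preferences: same subcategory and similar wording rank higher.
                "should": [
                    {"term": {"subcategory": source["subcategory"]}},
                    {"match": {"name": source["name"]}},
                ],
            }
        },
    }


# Example with a made-up hit, shaped like an entry of response["hits"]["hits"].
example_hit = {
    "_id": "sku-123",
    "_source": {
        "name": "Example Self-Help Title",
        "category": "books",
        "subcategory": "self-help",
    },
}
store_query = build_store_query(example_hit, size=5)
```

With the opensearch-py client, each body would typically be passed to `client.search(index=..., body=...)`, once against the index holding all products and once against the index holding the store's products. The resulting hits could then be handed to the re-ranking step.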

Both datasets are currently in OpenSearch, and I'd like to keep using it because I intend to pass the search results through an Amazon Personalize campaign to re-rank the items for the user who is searching.

A "more like this" query can return similar documents, but it is purely text-based. If you can vectorize your documents with an ML embedding model, you can run a k-NN query, and the chosen model might cluster self-help topics close to each other. But there's no real substitute for data quality: tag all products with metadata that you can use for precise filtering. If you only have a few thousand SKUs, manual tagging is probably the way to go.
– amon, Commented May 30, 2024 at 18:13
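For reference, the two query types mentioned in the comment might look like this as OpenSearch request bodies. This is a sketch, not a tested setup: the field names `name`, `description`, and `embedding` are hypothetical, the k-NN body assumes `embedding` is mapped as a `knn_vector` field with the k-NN plugin enabled, and the query vector would come from whatever embedding model you choose.

```python
from typing import Any, Dict, List


def more_like_this_body(text: str, size: int = 10) -> Dict[str, Any]:
    """Text-based similarity: find documents sharing significant terms with `text`."""
    return {
        "size": size,
        "query": {
            "more_like_this": {
                "fields": ["name", "description"],  # assumed text fields
                "like": text,
                # Lowered from the defaults so short product texts still match.
                "min_term_freq": 1,
                "min_doc_freq": 1,
            }
        },
    }


def knn_body(query_vector: List[float], k: int = 10) -> Dict[str, Any]:
    """Vector similarity: the k nearest neighbours of an embedding."""
    return {
        "size": k,
        "query": {
            "knn": {
                "embedding": {  # assumed knn_vector field
                    "vector": query_vector,
                    "k": k,
                }
            }
        },
    }


mlt = more_like_this_body("habits productivity mindset", size=5)
knn = knn_body([0.12, -0.03, 0.88], k=3)  # toy 3-dimensional vector
```

The vector dimension must match the dimension declared in the index mapping, and the toy three-element vector above is only there to keep the example short.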

J_H · Accepted Answer · 2024-05-30 19:57:43Z

You are describing a need for a recommender system. There is a rich literature on this topic for you to consult, with many implementations.

You didn't mention data volumes, nor how much consumer behavior you're able to view. E-commerce search, browsing, and purchase behavior can inform a model about your demographic of interest.

"say a user searches for a self-help book by name"

Fortunately, the special case of "book" comes with some helpful metadata. For a given SKU we are likely to know both Title and Author. A specific author will tend to write books that are thematically similar, which induces graph edges among their published works. In the same way, two products in the same checkout basket induce an edge suggesting a small distance between them, though that information is harder to collect. The staff involved in producing music or a film provide the same kind of metadata.
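As a rough illustration of the edge idea, here is a minimal sketch that builds a weighted co-occurrence graph from shared authors and shared baskets, then ranks the carried products closest to a searched SKU. The record layout (`sku`, `author`) and all sample data are made up for the example, and counting co-occurrences is only one simple way to turn edges into a similarity score.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def author_edges(products: Iterable[Dict[str, str]]) -> Counter:
    """One edge between every pair of SKUs that share an author."""
    by_author = defaultdict(set)
    for product in products:
        by_author[product["author"]].add(product["sku"])
    edges: Counter = Counter()
    for skus in by_author.values():
        for pair in combinations(sorted(skus), 2):
            edges[pair] += 1
    return edges


def basket_edges(baskets: Iterable[Iterable[str]]) -> Counter:
    """One edge per basket for every pair of SKUs bought together."""
    edges: Counter = Counter()
    for basket in baskets:
        for pair in combinations(sorted(set(basket)), 2):
            edges[pair] += 1
    return edges


def related_carried(sku: str, edges: Counter, carried: Set[str], top_n: int = 5) -> List[str]:
    """Carried SKUs connected to `sku`, strongest edge weight first."""
    scores: Counter = Counter()
    for (a, b), weight in edges.items():
        if a == sku and b in carried:
            scores[b] += weight
        elif b == sku and a in carried:
            scores[a] += weight
    return [other for other, _ in scores.most_common(top_n)]


catalog = [
    {"sku": "A", "author": "Author One"},
    {"sku": "B", "author": "Author One"},
    {"sku": "C", "author": "Author Two"},
]
baskets = [["A", "B"], ["A", "C"], ["A", "B", "C"]]
graph = author_edges(catalog) + basket_edges(baskets)  # Counters add edge weights
suggestions = related_carried("A", graph, carried={"B", "C"})
```

Here `A` stands for the product the user searched for and the store does not carry. `B` scores higher than `C` because it shares both an author and more baskets with `A`.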

Pin down your data and dollar budget, and try out a suitable implementation.
