0

I have a scenario where I have 2 sets of data:

  1. All possible products
  2. Products carried by a store

Dataset 2 is a subset of dataset 1.

If a user searches for a product that the store doesn't carry, I'd like to be able to return similar products from the search. The problem is that product names don't indicate what category/subcategory they belong in so right now I just get text-based search results.

So let's say a user searches for a self-help book by name, but the store doesn't carry it. I'd like to recommend other self-help books.

Conceptually, I'd be able to search dataset 1 to get all of the relevant information about the book and then use that to search dataset 2, but I'm having trouble approaching how I'd go about doing that.

Right now I have the datasets in OpenSearch and I'd like to use that as I intend to pass the search results through an Amazon Personalize campaign to re-rank the items for the user searching.

1
  • 1
    A "more like this" query can return similar documents but is purely text based. If you can vectorize your documents using some ML embeddings model you can do a knn query, and the chosen model might cluster self-help topics close to each other. But there's no real substitute for data quality, tagging all products with metadata that you can use for precise filtering. If you only have a few thousand SKUs, manual tagging is probably the way to go. Commented May 30, 2024 at 18:13

1 Answer 1

1

You are describing a need for a recommender system. There is a rich literature on this topic for you to consult, with many implementations.

You didn't mention data volumes, nor how much consumer behavior you're able to view. E-commerce search, browsing, and purchase behavior can inform a model about your demographic of interest.

say a user searches for a self-help book by name

Fortunately the special case of "book" has some helpful metadata associated with it. For a given SKU we are likely to know both Title and Author. A specific author will tend to write books that are thematically similar, inducing graph edges among their published works. In the same way two products in a checkout basket induce an edge suggesting small distance between the products, though it's harder to collect that information. Staff involved in producing music or a film represent the same kind of metadata.

Pin down your data and dollar budget, and try out a suitable implementation.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.