I'm working on a chatbot that answers based on a department store's SQL database, and I need help.
The database looks like this:
If the user asks something like this:
The chatbot should answer like this:
To get this response, I'm thinking of running a SQL query like:
SELECT *
FROM products
WHERE title LIKE '%guitar%'
ORDER BY average_rating DESC
LIMIT 5;
Then send the resulting rows to the LLM to paraphrase.
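Roughly, I picture that flow as the sketch below. It assumes SQLite and a products table reduced to just id, title, and average_rating; build_prompt is a hypothetical helper, the sample rows are made up, and the actual LLM call is left out.

```python
import sqlite3

def top_rated_by_keyword(conn, keyword, limit=5):
    """Return the highest-rated products whose title contains the keyword."""
    # The keyword and limit are bound as parameters, not spliced into the SQL.
    return conn.execute(
        "SELECT id, title, average_rating FROM products "
        "WHERE title LIKE ? ORDER BY average_rating DESC LIMIT ?",
        (f"%{keyword}%", limit),
    ).fetchall()

def build_prompt(question, rows):
    """Hypothetical helper: format the rows as context for the LLM."""
    lines = [f"- {title} (rating {rating})" for _id, title, rating in rows]
    return f"Question: {question}\nProducts:\n" + "\n".join(lines)

# Made-up in-memory data, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, title TEXT, average_rating REAL)"
)
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [
        (1, "Acoustic Guitar", 4.5),
        (2, "Electric Guitar", 4.8),
        (3, "Drum Kit", 4.9),
        (4, "Guitar Strap", 4.1),
    ],
)

rows = top_rated_by_keyword(conn, "guitar", limit=2)
prompt = build_prompt("What are the best guitars?", rows)
```

The prompt string is what I would hand to the LLM, so it only paraphrases rows the database actually returned.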
And if it's a question like this:
The chatbot's response should be:
I'm thinking of creating a vector database out of the data, then doing a vector search, then sending the results to the LLM to paraphrase.
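To make the vector search step concrete, here is a toy sketch of what I mean. It assumes each product already has an embedding; the three-dimensional vectors and the index dict below are made up, and in practice they would come from a real embedding model and a vector database rather than a plain dict.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def vector_search(query_vec, index, k=5):
    """Return the ids of the k products most similar to the query vector."""
    scored = [(cosine(query_vec, vec), pid) for pid, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [pid for _score, pid in scored[:k]]

# Hypothetical product id -> embedding mapping, for illustration only.
index = {
    1: [0.9, 0.1, 0.0],
    2: [0.1, 0.9, 0.0],
    3: [0.8, 0.2, 0.1],
}

nearest_ids = vector_search([1.0, 0.0, 0.0], index, k=2)
```

The returned ids would then be used to look up the product rows that get sent to the LLM.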
But if it's a question like:
User: "Of all the thin guitar strings, what are the top 5 highest rated?"
The chatbot's response should be something like:
Chatbot:
- Guitar string A is the highest rated. It is thin blah blah...
- Guitar string B is the second highest rated....
etc...
One option is to run a vector search for something like "top 5 highest rated thin guitar strings", take the ids of the top 1000 or so results, then run:
SELECT *
FROM products
WHERE id IN (
    <id 1 from vector search>,
    <id 2 from vector search>,
    ...
    <id 1000 from vector search>
)
ORDER BY average_rating DESC
LIMIT 5;
Then send that to the LLM to paraphrase. The problem is that this doesn't guarantee the true top 5 rated thin guitar strings are among the vector search results from the first step.
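Here is a small sketch of that two-step flow and of the gap I'm worried about. It assumes SQLite and a products table with id, title, and average_rating; the rows are made up, and candidate_ids stands in for whatever ids the vector search returned.

```python
import sqlite3

def rerank_by_rating(conn, candidate_ids, limit=5):
    """Order the vector search candidates by rating and keep the top few."""
    if not candidate_ids:
        return []
    # One bound placeholder per candidate id.
    placeholders = ",".join("?" for _ in candidate_ids)
    sql = (
        "SELECT id, title, average_rating FROM products "
        f"WHERE id IN ({placeholders}) "
        "ORDER BY average_rating DESC LIMIT ?"
    )
    return conn.execute(sql, (*candidate_ids, limit)).fetchall()

# Made-up in-memory data, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, title TEXT, average_rating REAL)"
)
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [
        (1, "Thin strings A", 4.9),
        (2, "Thin strings B", 4.2),
        (3, "Thin strings C", 4.6),
        (4, "Thick strings D", 5.0),
    ],
)

# Suppose the vector search returned ids 2 and 3 but missed id 1.
candidate_ids = [2, 3]
top = rerank_by_rating(conn, candidate_ids)
```

Product 1 is the highest rated thin set, but because it was not among the candidates, the SQL step can never bring it back.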
Alternatively, I can skip the vector search entirely: query SQL first for rows where the title or description is LIKE '%thin guitar%', order by rating, limit 5, and send that to the LLM. But if a description said 'thinnest guitar', the query wouldn't find it.
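A quick sketch of that failure case, assuming SQLite and a products table with id, title, description, and average_rating; the two rows are made up for illustration.

```python
import sqlite3

def keyword_top_rated(conn, phrase, limit=5):
    """Top-rated products whose title or description contains the phrase."""
    pattern = f"%{phrase}%"
    return conn.execute(
        "SELECT id, title, average_rating FROM products "
        "WHERE title LIKE ? OR description LIKE ? "
        "ORDER BY average_rating DESC LIMIT ?",
        (pattern, pattern, limit),
    ).fetchall()

# Made-up in-memory data, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products ("
    "id INTEGER PRIMARY KEY, title TEXT, description TEXT, average_rating REAL)"
)
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?, ?)",
    [
        (1, "Extra light strings", "A thin guitar string set", 4.5),
        (2, "Ultra light strings", "The thinnest guitar strings we sell", 4.9),
    ],
)

matches = keyword_top_rated(conn, "thin guitar")
```

Only product 1 matches, because the literal substring "thin guitar" never appears in "thinnest guitar", so the higher rated product 2 is silently dropped.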
Any suggestions?




