I'm using the OpenAI text-embedding-3-small model to create embeddings for each product category in a file. In total there are about 6,000 product categories, and they look like this:
Vehicles & Parts > Vehicle Parts & Accessories > Vehicle Safety & Security > Off-Road & All-Terrain Vehicle Protective Gear
Vehicles & Parts > Vehicle Parts & Accessories > Vehicle Safety & Security > Off-Road & All-Terrain Vehicle Protective Gear > ATV & UTV Bar Pads
Vehicles & Parts > Vehicle Parts & Accessories > Vehicle Safety & Security > Vehicle Alarms & Locks
Vehicles & Parts > Vehicle Parts & Accessories > Vehicle Safety & Security > Vehicle Alarms & Locks > Automotive Alarm Accessories
Vehicles & Parts > Vehicle Parts & Accessories > Vehicle Safety & Security > Vehicle Alarms & Locks > Automotive Alarm Systems
Vehicles & Parts > Vehicle Parts & Accessories > Vehicle Safety & Security > Vehicle Alarms & Locks > Motorcycle Alarms & Locks
For each line in that file, I'm using the following code to generate an embedding:
from openai import OpenAI

client = OpenAI()
response = client.embeddings.create(
    input="Vehicles & Parts > Vehicle Parts & Accessories > Vehicle Safety & Security > Vehicle Alarms & Locks",
    model="text-embedding-3-small",
    encoding_format="float",
    dimensions=512
)
I'm storing the embeddings in a vector database (Cosmos DB for MongoDB) and running a vector similarity search on it to help customers find the best possible category for the product title they enter. The similarity search works very well, but sometimes I get bad results. For example, when I search for "Pinus Sylvestris", which is the name of a plant, an entirely wrong product category is suggested.
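One reason queries like "Pinus Sylvestris" can go wrong is that a k-nearest-neighbor search always returns k results, however far away they are. A possible guard is to reject matches whose similarity score falls below a cutoff — a minimal plain-Python sketch, where the 0.5 threshold is an illustrative assumption (not a Cosmos DB API), to be tuned against real queries:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def best_match(query_vec, candidates, threshold=0.5):
    """candidates: list of (category_text, vector) pairs; must be non-empty.
    Returns the best-scoring category, or None when even the best
    score is below the threshold (i.e. 'no good category found')."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in candidates]
    score, text = max(scored)
    return text if score >= threshold else None
```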
My question: Is it OK to pass the product category in that hierarchical representation (with the > character) into the model? And is there a way to tell the model that the input is a product category for an e-commerce website, so that it understands it better?
Edit: Adding the query code:
from openai import OpenAI
from pymongo import MongoClient
import sys

MONGODB_CON_STR = "XXXXXX"  # redacted connection string
db = MongoClient(MONGODB_CON_STR)["shop"]
client = OpenAI()

def get_vector_for_text(input: str):
    response = client.embeddings.create(
        input=input,
        model="text-embedding-3-small",
        encoding_format="float",
        dimensions=512
    )
    return response.data[0].embedding

for line in sys.stdin:
    # strip the trailing newline so it doesn't end up in the embedded text
    query_vector = get_vector_for_text(line.strip())
    res = db["product_taxonomy"].aggregate([
        {
            "$search": {
                "cosmosSearch": {
                    "vector": query_vector,
                    "path": "vector",
                    "k": 2
                },
                "returnStoredSource": True
            }
        },
        {
            "$project": {
                "similarityScore": {"$meta": "searchScore"},
                "document": "$$ROOT"
            }
        }
    ])
    for doc in res:
        print(f'\tsimilarityScore: {doc["similarityScore"]} {doc["document"]["text"]}')
    print()
One suggestion: replace the > separator with / and add a tiny "role prefix" (it gives the model context). For example, turn

Vehicles & Parts > Vehicle Parts & Accessories > Vehicle Safety & Security > Vehicle Alarms & Locks > Motorcycle Alarms & Locks

into

E-commerce category path: Vehicles & Parts/Vehicle Parts & Accessories/Vehicle Safety & Security/Vehicle Alarms & Locks/Motorcycle Alarms & Locks

This simple hint might reduce the weird matches.
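That rewrite can be sketched as a small helper to run over each line of the category file before embedding — the prefix wording is just the example above, not anything the model requires:

```python
def to_embedding_text(category_path, prefix="E-commerce category path: "):
    """Convert 'A > B > C' into '<prefix>A/B/C'.

    Splits on the '>' separator, strips surrounding whitespace from each
    level, and rejoins with '/' under the given role prefix.
    """
    parts = [p.strip() for p in category_path.split(">")]
    return prefix + "/".join(parts)
```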