
I am running a small timing test of the OpenAI API from my laptop over my local internet connection, but I get response times that are much larger than expected. With the following code:

import openai
import time
import tiktoken

OPENAI_KEY='xxx'

openai.api_key=OPENAI_KEY
encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")

def get_completion(prompt, model="gpt-3.5-turbo", temperature=0.5):
    messages = [{"role": "user", "content": prompt}]
    response = openai.ChatCompletion.create(  # pre-1.0 openai-python client interface
        model=model,
        messages=messages,
        temperature=temperature, # this is the degree of randomness of the model's output
    )
    return response.choices[0].message["content"]

for i in range(5):
    tic=time.time()
    answer=get_completion('Hello. My name is Bob. I would like to know how can I have a good day')
    delta=(time.time()-tic)*1000  # stop the timer before printing the answer
    print(answer)
    tokens=len(encoding.encode(answer))  # counts completion tokens only
    print('time: ', delta)
    print('tokens: ', tokens)
    print('milliseconds/token: ', delta/tokens)

I get typical times like:

time:  39341.29881858826 (milliseconds)
tokens:  292
milliseconds/token:  134.73047540612416

whereas using the standard ChatGPT web version at https://chat.openai.com/ I get times of around 18 s for the same prompt and a similar output token length (versus a typical 40 s or more with the API), so the web version is at least twice as fast. With ~290 output tokens, that is roughly 16 tokens/s in the web UI versus about 7.4 tokens/s through the API. Both experiments use the same model.

My question is this: should I expect such a difference in response time between the web version and the API, or am I making a mistake somewhere? I also tried Node.js and got similar timings (but the Python code should be simpler to debug).

I searched the internet but have not yet been able to conclude whether or not my test is flawed.

UPDATE: it does not seem to be related to my account, although confirmation from users with different API keys would be welcome.
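
One way to check whether the gap comes from generation speed or from fixed per-request overhead (network round-trip, server-side queuing) is to cap the completion at a single token, so almost none of the measured time is spent generating. A minimal diagnostic sketch, assuming the same pre-1.0 openai client as in the code above; the "Say OK" prompt is just a placeholder:

import time
import openai

openai.api_key = 'xxx'

for i in range(5):
    tic = time.time()
    # max_tokens=1 caps generation at a single token, so the measured
    # time is dominated by network round-trip and server-side queuing
    openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Say OK"}],
        max_tokens=1,
        temperature=0,
    )
    print('round-trip ms:', (time.time() - tic) * 1000)

If these round-trip times are small and stable, the ~40 s is genuinely generation time; if they are large or erratic, the bottleneck is the connection or API-side queuing rather than the model.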

  • If I remember correctly, there is a difference between ChatGPT latency and API latency: ChatGPT begins displaying the answer while it is still being generated, whereas the API waits for the complete answer before sending it (see the streaming sketch below these comments). Some information on the OpenAI community forum: community.openai.com/t/… Commented Nov 13, 2023 at 17:55
  • Mmm... I don't think streaming explains my test: I compare the API times against ChatGPT's time after the whole answer has finished printing. Commented Nov 13, 2023 at 18:47
  • "whereas using the standard chatGPT version in the internet" can you share the link to show what u are using Commented Nov 14, 2023 at 3:03
  • Thanks for the tip, I updated the question with the site link. Commented Nov 14, 2023 at 6:38
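
To make the API measurement comparable to what the first comment describes (ChatGPT starts displaying text while it is still generating), the API response can be streamed and both the time to the first token and the total time recorded. A minimal sketch, again assuming the pre-1.0 openai client, using the same prompt as the question:

import time
import openai

openai.api_key = 'xxx'

tic = time.time()
first_token_ms = None
parts = []

# stream=True delivers tokens as they are generated,
# which is what the chat.openai.com UI does
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user",
               "content": "Hello. My name is Bob. I would like to know how can I have a good day"}],
    temperature=0.5,
    stream=True,
)

for chunk in response:
    piece = chunk.choices[0].delta.get("content", "")  # delta may be empty (e.g. the initial role chunk)
    if piece and first_token_ms is None:
        first_token_ms = (time.time() - tic) * 1000  # perceived latency
    parts.append(piece)

total_ms = (time.time() - tic) * 1000
print('time to first token (ms):', first_token_ms)
print('total time (ms):', total_ms)

If the time to first token is short but the total time is still around 40 s, the gap against ChatGPT lies in generation throughput, not in streaming versus non-streaming delivery.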
