I am running a small timing test of the OpenAI API from my laptop over my local internet connection, but I am getting response times much larger than expected. With the following code:
import openai
import time
import tiktoken

OPENAI_KEY = 'xxx'
openai.api_key = OPENAI_KEY
encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")

def get_completion(prompt, model="gpt-3.5-turbo", temperature=0.5):
    messages = [{"role": "user", "content": prompt}]
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=temperature,  # degree of randomness of the model's output
    )
    return response.choices[0].message["content"]

for i in range(5):
    tic = time.time()
    answer = get_completion('Hello. My name is Bob. I would like to know how can I have a good day')
    print(answer)
    delta = (time.time() - tic) * 1000  # elapsed wall-clock time in milliseconds
    tokens = len(encoding.encode(answer))  # token count of the completion text
    print('time: ', delta)
    print('tokens: ', tokens)
    print('milliseconds/token: ', delta / tokens)
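(As an aside, the API response itself also reports token counts, so the tiktoken count can be cross-checked against them; a minimal sketch, assuming the same pre-1.0 openai SDK as above:)

# Cross-check tiktoken's count against the usage stats returned by the API
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response["usage"]["completion_tokens"])  # tokens in the generated answer
print(response["usage"]["total_tokens"])       # prompt + completion tokens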
I get typical times like:
time: 39341.29881858826 (milliseconds)
tokens: 292
milliseconds/token: 134.73047540612416
whereas using the standard ChatGPT web version at https://chat.openai.com/ I get times of around 18 seconds for the same request/prompt and a similar output token length, compared with a typical value of around 40 seconds or more with the API. The online version is therefore at least twice as fast. Both experiments use the same model.
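One difference I can think of is that the web UI streams tokens as they are generated, so its perceived latency is dominated by time-to-first-token rather than total completion time. A rough sketch of a streaming variant of my test, using the same pre-1.0 SDK (stream=True is a documented parameter of ChatCompletion.create), would be:

import time
import openai

def timed_streaming_completion(prompt, model="gpt-3.5-turbo"):
    tic = time.time()
    first_token_ms = None
    chunks = []
    # stream=True yields partial deltas as the model generates them
    for chunk in openai.ChatCompletion.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    ):
        content = chunk.choices[0].delta.get("content")
        if content:
            if first_token_ms is None:
                first_token_ms = (time.time() - tic) * 1000
            chunks.append(content)
    total_ms = (time.time() - tic) * 1000
    print('time to first token (ms): ', first_token_ms)
    print('total time (ms): ', total_ms)
    return "".join(chunks)

This would at least separate network/queueing delay before the first token from the per-token generation rate.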
My question is this: should I expect such a difference in response time between the model used through the web interface and through the API, or am I making a mistake? I also tried Node.js and got similar timings (but the Python code should be simpler to debug).
I searched online but have not yet been able to conclude whether my test has an issue or not.
UPDATE: it does not seem to be related to my account, though it would be great to have confirmation from users with different API keys.