I am running a small timing test of the OpenAI API from my laptop over my local internet connection, but I am getting response times much larger than expected. With the following code:
import openai
import time
import tiktoken

OPENAI_KEY = 'xxx'
openai.api_key = OPENAI_KEY
encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")

def get_completion(prompt, model="gpt-3.5-turbo", temperature=0.5):
    messages = [{"role": "user", "content": prompt}]
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=temperature,  # degree of randomness of the model's output
    )
    return response.choices[0].message["content"]

for i in range(5):
    tic = time.time()
    answer = get_completion('Hello. My name is Bob. I would like to know how can I have a good day')
    print(answer)
    delta = (time.time() - tic) * 1000  # elapsed wall-clock time in milliseconds
    tokens = len(encoding.encode(answer))  # token count of the completion text
    print('time: ', delta)
    print('tokens: ', tokens)
    print('milliseconds/token: ', delta / tokens)
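(As an aside, the API response itself also reports token counts, so the tiktoken count can be cross-checked against them; a minimal sketch, assuming the same pre-1.0 openai SDK as above:)

# Cross-check tiktoken's count against the usage stats returned by the API
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response["usage"]["completion_tokens"])  # tokens in the generated answer
print(response["usage"]["total_tokens"])       # prompt + completion tokens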
I get typical times like:
time: 39341.29881858826 (milliseconds)
tokens: 292
milliseconds/token: 134.73047540612416
whereas using the standard ChatGPT web version at https://chat.openai.com/ I get times of around 18 seconds for the same request/prompt and a similar output token length, compared with a typical value of around 40 seconds or more with the API. The online version is therefore at least twice as fast. Both experiments use the same model.
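One difference I can think of is that the web UI streams tokens as they are generated, so its perceived latency is dominated by time-to-first-token rather than total completion time. A rough sketch of a streaming variant of my test, using the same pre-1.0 SDK (stream=True is a documented parameter of ChatCompletion.create), would be:

import time
import openai

def timed_streaming_completion(prompt, model="gpt-3.5-turbo"):
    tic = time.time()
    first_token_ms = None
    chunks = []
    # stream=True yields partial deltas as the model generates them
    for chunk in openai.ChatCompletion.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    ):
        content = chunk.choices[0].delta.get("content")
        if content:
            if first_token_ms is None:
                first_token_ms = (time.time() - tic) * 1000
            chunks.append(content)
    total_ms = (time.time() - tic) * 1000
    print('time to first token (ms): ', first_token_ms)
    print('total time (ms): ', total_ms)
    return "".join(chunks)

This would at least separate network/queueing delay before the first token from the per-token generation rate.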
My question is this: should I expect such a difference in response time between the model used through the web interface and through the API, or am I making a mistake? I also tried Node.js and got similar timings (but the Python code should be simpler to debug).
I searched online but have not yet been able to conclude whether my test has an issue or not.
UPDATE: it does not seem to be related to my account, though it would be great to have confirmation from users with different API keys.