I have a Node.js program that connects to OpenAI's Assistants API to create messages. I have followed this documentation from OpenAI to create the steps below:
- I have created an Assistant (gpt-4-1106-preview) and a thread in that Assistant that I'm accessing to interact with.
- Add a message to the thread. The message contains around 1000 tokens, checked via https://platform.openai.com/tokenizer
await openai.beta.threads.messages.create(threadId, {
  role: "user",
  content: createMessage(),
});
- Run the assistant
await openai.beta.threads.runs.create(threadId, {
  assistant_id: assistantId,
  instructions: "Please address the user as Mahesh. The user is an administrator.",
});
- Check the status. I'm running this every 5 seconds until the status is "completed"
await openai.beta.threads.runs.retrieve(threadId, runId);
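The 5-second polling can be sketched as a small loop. This is a sketch rather than my exact code; the retrieve call is injected as a function so the loop can be exercised without hitting the API (in my program it would be `() => openai.beta.threads.runs.retrieve(threadId, runId)`):

```javascript
// Sketch of the polling loop. `retrieve` is passed in so the loop itself can
// be run without the real API.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function waitForRun(retrieve, intervalMs = 5000) {
  // Terminal statuses per the Assistants API run lifecycle.
  const terminal = new Set(["completed", "failed", "cancelled", "expired"]);
  for (;;) {
    const run = await retrieve();
    if (terminal.has(run.status)) return run;
    await sleep(intervalMs);
  }
}
```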
- Get the last response from the Assistant
const messages = await openai.beta.threads.messages.list(threadId, {
  limit: 1,
});
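Extracting the text from that last message looks roughly like this — a sketch assuming the documented Assistants API response shape (newest message first in `data`, text parts under `content[].text.value`):

```javascript
// Pull the plain text out of a messages.list() page.
// data[0] is the newest message; each "text" content part holds its string
// under part.text.value.
function latestText(page) {
  const message = page.data[0];
  if (!message) return "";
  return message.content
    .filter((part) => part.type === "text")
    .map((part) => part.text.value)
    .join("\n");
}
```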
A single run of this code consumes around 250,000 tokens. The image shows today's token usage for three requests.

There could be multiple reasons why your cost of running an assistant is very high.
What OpenAI model do you use?
If you take a look at the official OpenAI documentation, you'll see that they use the gpt-4-1106-preview model. But an older model might be good enough; it depends on what your assistant is used for. You can lower the cost of running the assistant simply by changing the model. Of course, if you see that the assistant's performance is considerably worse, then you need to use the latest model. Just take a look at the table below to see what a difference the model decision can make:
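As a rough illustration of the difference, here is a sketch comparing input-token cost for a 250,000-token day. The per-1K prices are assumptions taken from OpenAI's pricing page at the time of writing; check the current page before relying on them:

```javascript
// Per-1K-input-token prices in USD -- assumed values, verify against
// OpenAI's pricing page before relying on them.
const PRICE_PER_1K_INPUT = {
  "gpt-4-1106-preview": 0.01,
  "gpt-3.5-turbo-1106": 0.001,
};

function inputCostUSD(model, tokens) {
  return (tokens / 1000) * PRICE_PER_1K_INPUT[model];
}
```

With these assumed prices, 250,000 input tokens cost about $2.50 on gpt-4-1106-preview but only about $0.25 on gpt-3.5-turbo-1106, a 10x difference before output tokens are even counted.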
How long have you been using the same thread?
As stated in the official OpenAI documentation, the thread stores the message history. The gpt-4-1106-preview model has a context window of 128,000 tokens. So, if you chat with your assistant on the same thread for long enough, you will fill the thread up to the context window of your chosen model. With gpt-4-1106-preview, this means that after some time chatting on the same thread, a single question you ask your assistant can consume 128,000 tokens. Your most recent question might contain 1,000 tokens, but keep in mind that the hundreds of messages you asked or the assistant answered in the past are also sent to the Assistants API with every run. For example, after 100 exchanges of roughly 1,000 tokens each, one new question resends about 100,000 tokens of history.

In your case, you can see that today you spent 760,564 context tokens. You have probably been using the same thread for quite some time.

How often do you check the run status?
You said that you check the run status every 5 seconds to see if it has moved to completed. Try raising this to 10 seconds: every status check is another API request that counts against your rate limits.
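One way to poll less aggressively without slowing down short runs is exponential backoff. A sketch — the base interval and cap here are my own choices, not anything the API prescribes:

```javascript
// Delays for successive status checks: start fast, double each time, cap.
// Base and cap are arbitrary choices, not API requirements.
function backoffDelays(attempts, baseMs = 1000, capMs = 10000) {
  return Array.from({ length: attempts }, (_, i) =>
    Math.min(baseMs * 2 ** i, capMs)
  );
}
```

The first checks come quickly, so a run that finishes in a couple of seconds is still picked up promptly, while a long run is only polled every `capMs` milliseconds.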