fix: increased max_tokens to avoid truncating responses #87
Conversation
```ts
async _evaluateComments(
  specification: string,
  comments: { id: number; comment: string }[]
): Promise<RelevancesByOpenAi> {
  const prompt = this._generatePrompt(specification, comments);
  const maxTokens = await this._calculateMaxTokens(prompt);
```
You need to make sure to also include a buffer on top, because the number of tokens you send in is always less than the total the transaction ends up using.
What comes back has additional tokens that count against the quota of the whole transaction. Basically, the JSON response itself costs additional tokens.
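For illustration only, here is a minimal sketch of what a buffer on top of the prompt count could look like, assuming a tiktoken-style tokenizer; the constant names and values are hypothetical and not taken from this PR:

```ts
import { get_encoding } from "tiktoken";

// Hypothetical values for illustration; tune them to the real model and payload.
const CONTEXT_WINDOW = 4096;  // shared budget for prompt + completion
const RESPONSE_BUFFER = 256;  // headroom for the JSON structure of the reply

function completionBudget(prompt: string): number {
  const encoder = get_encoding("cl100k_base"); // pick the encoding matching the model
  const promptTokens = encoder.encode(prompt).length;
  encoder.free();

  // The reply counts against the same quota, so keep a buffer on top of the
  // prompt count instead of treating the prompt as the whole cost.
  return Math.max(CONTEXT_WINDOW - promptTokens - RESPONSE_BUFFER, 0);
}
```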
When you say to include a buffer on top, does that mean making `max_tokens` smaller than it currently is?
If I am understanding your code correctly, this reminds me of my first go-around at the implementation. If I recall correctly, I counted the tokens of what I was sending out. However, when you tell ChatGPT that we are using 1000 tokens max, its reply also counts against that token limit. We can estimate how many tokens the response will add, but I don't recall a way to know exactly how many.
Remember that billing covers what comes out as well, not just what goes in, which is why these limits make sense for most ChatGPT use cases. You will need to estimate how many extra tokens will be included in the output, possibly by counting the number of comments and tokenizing a dummy array.
I checked v1 and it seems to be a fixed number that is within the configuration:
https://github.com/ubiquity/ubiquibot/blob/626cae8fcf5b60ec1ccb49df8dd2f250df046ff9/src/types/configuration-types.ts#L117
From what I understand about `max_tokens`, the model supports up to `4096` tokens, which cover the prompt + response, so the bigger the prompt, the fewer tokens remain available for the response. So the calculation is: `4096 - prompt_tokens = max_tokens`.
Technically, this argument can be omitted and it would default to the maximum all the time.
https://community.openai.com/t/can-i-set-max-tokens-for-chatgpt-turbo/81207
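As a rough sketch of that calculation, and of omitting the argument entirely, something along these lines could work; the constants and prompt handling are illustrative, not the plugin's actual code:

```ts
import OpenAI from "openai";
import { get_encoding } from "tiktoken";

const CONTEXT_WINDOW = 4096; // illustrative; depends on the model in use

async function evaluate(prompt: string) {
  const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

  // Count the prompt tokens so the reply can use whatever room is left.
  const encoder = get_encoding("cl100k_base"); // pick the encoding matching the model
  const promptTokens = encoder.encode(prompt).length;
  encoder.free();

  return openai.chat.completions.create({
    model: "gpt-4o-2024-08-06",
    messages: [{ role: "user", content: prompt }],
    // Either cap the reply at what remains of the context window...
    max_tokens: CONTEXT_WINDOW - promptTokens,
    // ...or omit max_tokens entirely and let it default to the maximum available.
  });
}
```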
My concern is briefly discussed here. It has been a long time since I worked on this problem, and the models are always evolving, so I am not confident in my knowledge. However, if we set the max length, perhaps the model will try to pad the reply to fill the full response. I would be most confident if we pre-calculate the token length by encoding a dummy array, unless there is definitive proof that setting the max length will not cause problems.

> However, when you tell ChatGPT that we are using 1000 tokens max, when it replies, its response also counts against the token limit.

I realize I didn't quite finish explaining clearly, but we start with 1000 in this example, and the result might be 1200 (200 of which are in the response).
If we undershoot, the result will be truncated and the error will occur again. Should we just omit this parameter so it is always maxed? I do not know how the padding occurs either, and I do not know what the best solution is. According to ChatGPT itself, this is the way to calculate `max_tokens`.
Generate a dummy response and encode it to estimate the length of the response. I am not sure what the maximum number of significant digits per element in the array is, but you should estimate with a worst-case scenario.
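A hypothetical sketch of that approach, assuming the reply is a JSON map of comment id to relevance score (the shape and digit count are assumptions, not taken from this plugin):

```ts
import { get_encoding } from "tiktoken";

// Estimate the worst-case token cost of the model's reply by encoding a dummy
// response with the same number of entries as the comments being scored.
function estimateResponseTokens(commentIds: number[]): number {
  const dummy: Record<string, number> = {};
  for (const id of commentIds) {
    dummy[id] = 0.987654321; // pessimistic: assume full-precision scores
  }

  const encoder = get_encoding("cl100k_base"); // pick the encoding matching the model
  const tokens = encoder.encode(JSON.stringify(dummy)).length;
  encoder.free();
  return tokens;
}
```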
Definitely use gpt-4o-2024-08-06
Merged commit 31cb560 into ubiquity-os-marketplace:development
I'm certain that the way they operate is that the base model version is always the most up to date, i.e.
@Keyrxng I think this is true only if you update the package as well.
I was going to also suggest using
Resolves #84
QA:
ubiquibot/sandbox#21 (comment)
ubiquity/cloudflare-deploy-action#6 (comment)