fix: increased max_tokens to avoid truncating responses #87

Conversation

@gentlementlegen (Member) commented Aug 20, 2024

@gentlementlegen marked this pull request as ready for review August 20, 2024 01:57
@gentlementlegen marked this pull request as draft August 20, 2024 02:00
@gentlementlegen marked this pull request as ready for review August 20, 2024 02:05
```typescript
async _evaluateComments(
  specification: string,
  comments: { id: number; comment: string }[]
): Promise<RelevancesByOpenAi> {
  const prompt = this._generatePrompt(specification, comments);
  const maxTokens = await this._calculateMaxTokens(prompt);
```

Member commented:

You need to make sure to also include a buffer on top, because the number of tokens you send in is always less than the total that ends up being used.

What comes back has additional tokens that count against the quota of the whole transaction. Basically, the JSON response itself costs additional tokens.
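
Something along these lines, as a rough sketch only (not code from this PR): it assumes the tiktoken npm package is used for counting, and the MODEL_CONTEXT_WINDOW and RESPONSE_BUFFER names and values are illustrative placeholders.

```typescript
import { get_encoding } from "tiktoken";

const MODEL_CONTEXT_WINDOW = 4096; // budget shared by prompt and completion (assumed)
const RESPONSE_BUFFER = 512; // rough allowance for the JSON that comes back (assumed)

function calculateMaxTokensWithBuffer(prompt: string): number {
  const encoder = get_encoding("cl100k_base");
  const promptTokens = encoder.encode(prompt).length;
  encoder.free();
  // Leave headroom so the response itself does not blow past the quota.
  return Math.max(0, MODEL_CONTEXT_WINDOW - promptTokens - RESPONSE_BUFFER);
}
```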

Member Author (gentlementlegen) replied:

When you say "include a buffer on top", does that mean making max_tokens smaller than it currently is?

Member replied:

If I am understanding your code correctly, this reminds me of my first go-around at the implementation. If I recall correctly, I counted the tokens of what I was sending out. However, when you tell ChatGPT that we are using 1000 tokens max, its response also counts against that token limit when it replies. We can estimate how many tokens the response will add, but I don't recall a way to know exactly how many.

Remember that billing covers what comes out as well, not just what goes in, which is why these limits make sense for most ChatGPT use cases. You will need to estimate how many extra tokens will be included in the output, possibly by counting the number of comments and tokenizing a dummy array.

Member Author (gentlementlegen) replied:

I checked v1 and it seems to be a fixed number that lives in the configuration:
https://github.com/ubiquity/ubiquibot/blob/626cae8fcf5b60ec1ccb49df8dd2f250df046ff9/src/types/configuration-types.ts#L117
From what I understand, max_tokens reflects the model supporting up to 4096 tokens, which indeed covers the prompt + response. So the bigger the prompt, the fewer tokens remain available for the response.

So the calculation is: 4096 - prompt_tokens = max_tokens.

Technically, this argument can be omitted, in which case it defaults to the maximum all the time.

https://community.openai.com/t/can-i-set-max-tokens-for-chatgpt-turbo/81207
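
For example, roughly like this; it is only a sketch, assuming the official openai npm package, and the MODEL_TOKEN_LIMIT constant and evaluate helper are placeholders rather than this plugin's actual code.

```typescript
import OpenAI from "openai";

const openai = new OpenAI();
const MODEL_TOKEN_LIMIT = 4096; // assumed context limit from the discussion above

async function evaluate(prompt: string, promptTokens: number) {
  return openai.chat.completions.create({
    model: "gpt-4o-2024-08-06",
    messages: [{ role: "user", content: prompt }],
    // 4096 - prompt_tokens = max_tokens; omitting the field instead lets the
    // API use whatever remains of the context window for the completion.
    max_tokens: MODEL_TOKEN_LIMIT - promptTokens,
    response_format: { type: "json_object" },
  });
}
```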

Member replied:

> I checked v1 and it seems to be a fixed number that lives in the configuration: https://github.com/ubiquity/ubiquibot/blob/626cae8fcf5b60ec1ccb49df8dd2f250df046ff9/src/types/configuration-types.ts#L117 From what I understand, max_tokens reflects the model supporting up to 4096 tokens, which indeed covers the prompt + response. So the bigger the prompt, the fewer tokens remain available for the response.
>
> So the calculation is: 4096 - prompt_tokens = max_tokens.
>
> Technically, this argument can be omitted, in which case it defaults to the maximum all the time.
>
> https://community.openai.com/t/can-i-set-max-tokens-for-chatgpt-turbo/81207

My concern is briefly discussed there. It has been a long time since I worked on this problem, and the models are always evolving, so I am not confident in my knowledge. However, if we set the max length, the model may try to pad its reply to fill the full response. I would be most confident if we pre-calculated the token length by encoding a dummy array, unless there is definitive proof that setting the max length will not cause problems.

> However, when you tell ChatGPT that we are using 1000 tokens max, its response also counts against that token limit when it replies.

I realize I didn't quite finish explaining clearly, but we start with 1000 in this example, and the result might be 1200 (200 in the response).

Member Author (gentlementlegen) replied:

If we undershoot, the result will be truncated and the error will occur again. Should we just omit this parameter so it is always maxed out? I do not know how the padding occurs either, and I do not know what the best solution is. According to ChatGPT itself, this is the way to calculate max_tokens.

@0x4007 (Member) replied Aug 20, 2024:

Generate a dummy response and encode it to estimate the length of the response. I am not sure what the maximum number of significant digits per element in the array is, but you should estimate with a worst-case scenario.
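
Something like this rough sketch, again assuming tiktoken; the dummy shape (comment id keys mapped to relevance scores) and the worst-case precision are assumptions, not taken from this PR.

```typescript
import { get_encoding } from "tiktoken";

// Build a worst-case dummy response (one entry per comment, full-precision
// scores) and tokenize it to estimate how many output tokens to reserve.
function estimateResponseTokens(commentIds: number[]): number {
  const dummy: Record<string, number> = {};
  for (const id of commentIds) {
    // Assume every relevance score uses the longest plausible representation.
    dummy[String(id)] = 0.123456789;
  }
  const encoder = get_encoding("cl100k_base");
  const tokens = encoder.encode(JSON.stringify(dummy)).length;
  encoder.free();
  return tokens;
}
```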

@gentlementlegen marked this pull request as draft August 20, 2024 07:54
@gentlementlegen marked this pull request as ready for review August 20, 2024 15:21

@0x4007 (Member) left a review:

Definitely use gpt-4o-2024-08-06

Two review threads on src/parser/content-evaluator-module.ts were marked outdated and resolved.

@gentlementlegen merged commit 31cb560 into ubiquity-os-marketplace:development Aug 20, 2024
6 checks passed
@gentlementlegen deleted the fix/truncated-response branch August 20, 2024 16:34

@Keyrxng (Member) commented Aug 20, 2024:

> Definitely use gpt-4o-2024-08-06

I'm certain that the way they operate, the base model name always points to the most up-to-date version, i.e. gpt-4o is whatever the latest version is. This was the case when I was working on AI tasks a while back, and the docs confirm it is still the case, but it has been a while, so this may be outdated.

> Continuous model upgrades
> gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-4, and gpt-3.5-turbo point to their respective latest model version.

@gentlementlegen (Member Author) replied:

@Keyrxng I think this is true only if you update the package as well.

@Keyrxng (Member) commented Aug 21, 2024:

> @Keyrxng I think this is true only if you update the package as well.

I was also going to suggest using latest for the package, but that might be too dangerous; I feel they move pretty quickly over there and break things often.

Successfully merging this pull request may close these issues: Unexpected end of JSON input error