This repository has been archived by the owner on Sep 19, 2024. It is now read-only.

/ask Priority - Token Optimization #807

Open

0x4007 opened this issue Sep 23, 2023 · 4 comments

Comments

@0x4007
Member

0x4007 commented Sep 23, 2023

! Error: This model's maximum context length is 16385 tokens. However, you requested 20407 tokens (4023 in the messages, 16384 in the completion). Please reduce the length of the messages or completion.

@Keyrxng time for compression/prioritization? Not a great first real-world attempt lol.

Prioritization order:

  1. Current issue specification
  2. Linked issue specifications (in link order, with the first link taking higher priority than the next)
  3. Current issue conversation
  4. Linked issue conversations (same ordering system)

We should use a tokenization estimator to know how much we should exclude.

Originally posted by @pavlovcik in #787 (comment)


It should also include a warning that some content had to be cut out. Perhaps it could even include the exact token counts, similar to the information presented in the error message above, so the user can approximate how much was cut off.
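A minimal sketch of that prioritized trimming, assuming a hypothetical ContextSection shape and a countTokens helper (e.g. backed by a tiktoken-style encoder); none of these names come from the bot's code:

    // Sketch: drop the lowest-priority context sections until the prompt fits the token budget.
    interface ContextSection {
      label: string;    // e.g. "linked issue #1 conversation" (illustrative only)
      priority: number; // lower number = higher priority, dropped last
      text: string;
    }

    function fitContext(
      sections: ContextSection[],
      budget: number,
      countTokens: (s: string) => number
    ): { prompt: string; warning: string } {
      // Sort so the highest-priority sections come first and the lowest sit at the end.
      const kept = [...sections].sort((a, b) => a.priority - b.priority);
      const dropped: string[] = [];
      while (kept.length > 0 && countTokens(kept.map((s) => s.text).join("\n")) > budget) {
        const removed = kept.pop(); // remove the lowest-priority section remaining
        if (removed) dropped.push(removed.label);
      }
      const warning =
        dropped.length > 0
          ? `Warning: the context exceeded the ${budget}-token budget; omitted: ${dropped.join(", ")}.`
          : "";
      return { prompt: kept.map((s) => s.text).join("\n"), warning };
    }

The returned warning string could be appended to the bot's reply so the user knows roughly what was left out.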

@Keyrxng
Member

Keyrxng commented Sep 23, 2023

you requested 20407 tokens (4023 in the messages, 16384 in the completion)

The token limit corresponds to the GPT output: whatever we set as the token limit is the maximum the model will respond with, but the model's context window also has to fit the input. Since the input is determined by the issue content, we can't really fix that size up front.

The Python package tiktoken is the best tokenization package, and there is a TypeScript wrapper for it. Otherwise it will be a case of using LangChain, creating our own text splitters, and basing our input token count on that, which will be a rough but close estimate.
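For illustration, counting prompt tokens with the @dqbd/tiktoken TypeScript wrapper and deriving the completion budget from the 16,385-token context window could look roughly like this (a sketch under those assumptions, not the bot's actual implementation):

    import { encoding_for_model } from "@dqbd/tiktoken";

    const CONTEXT_WINDOW = 16385; // gpt-3.5-turbo-16k, per the error message above

    function countTokens(text: string): number {
      const enc = encoding_for_model("gpt-3.5-turbo");
      try {
        return enc.encode(text).length;
      } finally {
        enc.free(); // the WASM encoder must be freed explicitly
      }
    }

    // Whatever remains of the context window after the prompt is what we can safely
    // request as the completion's max_tokens.
    function completionBudget(prompt: string): number {
      return Math.max(0, CONTEXT_WINDOW - countTokens(prompt));
    }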

@Keyrxng
Member

Keyrxng commented Sep 23, 2023

This issue is a non-starter really, my friend, as it was user error this time around, but I'll still take the bounty lmao ;))

@0x4007
Member Author

0x4007 commented Sep 23, 2023

I'll wait until we get some real-world use cases functional before we optimize.

@Keyrxng
Member

Keyrxng commented Oct 2, 2023

A crude workaround: if the response from GPT is an error message stating the token count and how much we are over by, we could make an educated guess as to how many characters to strip from the context in order to meet the token limit.
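A rough sketch of that guess, parsing the counts out of the error string and assuming roughly four characters per token (both the message format and the ratio are assumptions):

    // Sketch: parse "maximum context length is X tokens ... you requested Y tokens" style errors
    // and estimate how many characters to strip from the context.
    function charsToStrip(errorMessage: string): number {
      const max = errorMessage.match(/maximum context length is (\d+) tokens/);
      const requested = errorMessage.match(/you requested (\d+) tokens/);
      if (!max || !requested) return 0;
      const excessTokens = Number(requested[1]) - Number(max[1]);
      return Math.max(0, excessTokens * 4); // ~4 chars per token on average
    }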

Another option could be to use LangChain to interact with OpenAI and pass -1 for max_tokens:

    this.llm = new OpenAI({
      openAIApiKey: this.apiKey,
      modelName: 'gpt-3.5-turbo-16k',
      maxTokens: -1,
    });
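If I recall correctly, LangChain's OpenAI wrapper treats maxTokens: -1 as "use whatever completion tokens remain in the context window after the prompt is counted", which would sidestep the manual budgeting above, but that behavior should be verified against the LangChain docs before relying on it.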
