Is your feature request related to a problem? Please describe.
When editing the beginning of a long file, prompt evaluation takes a lot of time.
The reason for this is explained in the Additional context section below.
Currently we send a similar number of lines from the top and the bottom. I believe there are good reasons to make the bottom part smaller:
- It takes a long time to reevaluate the bottom lines.
- Bottom lines often aren't as important (IMO), so shrinking them leaves more of the context window for the top lines.
Describe the solution you'd like
I would like separate Context Length options for 'before' (lines above the cursor) and 'after' (lines below the cursor).
Describe alternatives you've considered
Alternatively, leave the current Twinny: Context Length setting as is, but add an optional override for the bottom lines. A rough sketch of what that could look like follows.
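For illustration only, a hypothetical settings.json sketch; both option keys below are invented for this example (the first is assumed from the 'Twinny: Context Length' UI label, the second does not exist today):

```jsonc
{
  // Assumed key for the existing 'Twinny: Context Length' setting
  // (controls how much context is sent around the cursor).
  "twinny.contextLength": 100,

  // Hypothetical optional override: send fewer lines from below
  // the cursor, leaving more room for the lines above it.
  "twinny.contextLengthAfter": 20
}
```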
Additional context
For context:
AFAIK (this is mostly based on my assumptions), llama.cpp doesn't have to reevaluate the part of the prompt prefix that hasn't changed since the last generation. But the moment it encounters a change, it starts reevaluating everything after that change.
So when we have two requests in a row with prompts like the following:
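(An illustrative sketch; the <|fim▁begin|> and <|fim▁end|> tokens and the file contents are my assumptions to go with the <|fim▁hole|> token mentioned below.)

```
Request 1:
<|fim▁begin|>import numpy
<|fim▁hole|>
... many more lines below the cursor ...
<|fim▁end|>

Request 2 (after typing "np" on the line under the import):
<|fim▁begin|>import numpy
np<|fim▁hole|>
... many more lines below the cursor ...
<|fim▁end|>
```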
It won't have to spend time evaluating import numpy. However, it will still have to reevaluate everything after <|fim▁hole|>, because it only checks for a matching prefix of the prompt.
(Example of llama.cpp output, not for this exact case: Llama.generate: 2978 prefix-match hit, remaining 8 prompt tokens to eval)
On Aug 26, 2024, AndrewRocky changed the title from "Add option to have a different number of lines before and after the current line in FIM prompt" to "Separate options for amount of lines 'before' and 'after' the current line in FIM prompts".