
examples/server: "New UI" chat becomes slower with each subsequent message #7944

Closed
khimaros opened this issue Jun 14, 2024 · 1 comment
Labels
bug-unconfirmed · medium severity (used to report medium severity bugs in llama.cpp, e.g. malfunctioning features but still usable) · stale

Comments

@khimaros
Contributor

What happened?

When using examples/server's "New UI", parts of the chat history seem to be re-evaluated (bypassing the KV cache?) on each new message from the user. This does not happen with llama-cli, nor with examples/server in the old UI mode using the default settings/prompt.

This seems to be a common failure mode for third-party frontends to llama.cpp; perhaps there is an issue with the API layer that makes the problem difficult for frontends to solve? #7185
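For context, a minimal sketch of what a client can send so the server may reuse previously processed prompt tokens, assuming the `cache_prompt` field described in the examples/server README; whether the New UI actually sets this field is not verified here, and the host/port are placeholders:

```python
import json
import urllib.request

# Hypothetical local llama.cpp server address; adjust to your llama-server instance.
SERVER = "http://127.0.0.1:8080"

def complete(prompt: str) -> str:
    """POST to /completion with cache_prompt so the server can try to reuse the
    KV cache for the unchanged prefix of the prompt (per the examples/server README)."""
    body = json.dumps({
        "prompt": prompt,
        "n_predict": 128,
        "cache_prompt": True,  # reuse previously cached prompt tokens if possible
    }).encode("utf-8")
    req = urllib.request.Request(
        f"{SERVER}/completion",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["content"]

# A frontend that resends the full chat transcript each turn should, with
# cache_prompt enabled, only see the newly appended suffix counted in
# n_prompt_tokens_processed rather than the whole history.
history = "User: Hello\nAssistant:"
print(complete(history))
```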

Name and Version

version: 3151 (f8ec887)
built with cc (Debian 13.2.0-25) 13.2.0 for x86_64-linux-gnu

What operating system are you seeing the problem on?

Linux

Relevant log output

INFO [           print_timings] prompt eval time     =     189.41 ms /     1 tokens (  189.41 ms per token,     5.28 tokens per second) | tid="140556433274816" timestamp=1718408696 id_slot=0 id_task=3534 t_prompt_processing=189.405 n_prompt_tokens_processed=1 t_token=189.405 n_tokens_second=5.2796916660067055

INFO [           print_timings] prompt eval time     =    2473.22 ms /    40 tokens (   61.83 ms per token,    16.17 tokens per second) | tid="140556433274816" timestamp=1718408717 id_slot=0 id_task=3564 t_prompt_processing=2473.219 n_prompt_tokens_processed=40 t_token=61.830475 n_tokens_second=16.173254370114414

INFO [           print_timings] prompt eval time     =    5231.45 ms /    83 tokens (   63.03 ms per token,    15.87 tokens per second) | tid="140556433274816" timestamp=1718408745 id_slot=0 id_task=3632 t_prompt_processing=5231.451 n_prompt_tokens_processed=83 t_token=63.02953012048193 n_tokens_second=15.865579167232953

INFO [           print_timings] prompt eval time     =    6692.69 ms /   105 tokens (   63.74 ms per token,    15.69 tokens per second) | tid="140556433274816" timestamp=1718408774 id_slot=0 id_task=3721 t_prompt_processing=6692.691 n_prompt_tokens_processed=105 t_token=63.739914285714285 n_tokens_second=15.688756585355577

INFO [           print_timings] prompt eval time     =    5536.72 ms /    90 tokens (   61.52 ms per token,    16.26 tokens per second) | tid="140556433274816" timestamp=1718408815 id_slot=0 id_task=3797 t_prompt_processing=5536.721 n_prompt_tokens_processed=90 t_token=61.519122222222215 n_tokens_second=16.255108393578077

INFO [           print_timings] prompt eval time     =    6353.86 ms /   106 tokens (   59.94 ms per token,    16.68 tokens per second) | tid="140556433274816" timestamp=1718408885 id_slot=0 id_task=3885 t_prompt_processing=6353.859 n_prompt_tokens_processed=106 t_token=59.942066037735856 n_tokens_second=16.68277498760989

INFO [           print_timings] prompt eval time     =    8704.61 ms /   134 tokens (   64.96 ms per token,    15.39 tokens per second) | tid="140556433274816" timestamp=1718408926 id_slot=0 id_task=4002 t_prompt_processing=8704.613 n_prompt_tokens_processed=134 t_token=64.95979850746268 n_tokens_second=15.3941364193905
khimaros added the bug-unconfirmed and medium severity labels on Jun 14, 2024
github-actions bot added the stale label on Jul 15, 2024
@github-actions (bot)

This issue was closed because it has been inactive for 14 days since being marked as stale.
