
examples/server: "New UI" chat becomes slower with each subsequent message #7944

Closed
khimaros opened this issue Jun 14, 2024 · 1 comment
Labels
bug-unconfirmed · medium severity (used to report medium severity bugs in llama.cpp, e.g. malfunctioning features but still usable) · stale

Comments

@khimaros
Contributor

What happened?

When using examples/server's "New UI", parts of the chat history seem to be re-evaluated (bypassing the KV cache?) on each new message from the user. This does not happen with llama-cli, nor with examples/server in the old UI mode using the default settings/prompt.

This seems to be a common failure mode for third-party frontends to llama.cpp; perhaps there is an issue with the API layer that makes the problem difficult for frontends to solve? #7185
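For context, a minimal sketch of what a client can send so the server may reuse previously processed prompt tokens, assuming the `cache_prompt` field described in the examples/server README; whether the New UI actually sets this field is not verified here, and the host/port are placeholders:

```python
import json
import urllib.request

# Hypothetical local llama.cpp server address; adjust to your llama-server instance.
SERVER = "http://127.0.0.1:8080"

def complete(prompt: str) -> str:
    """POST to /completion with cache_prompt so the server can try to reuse the
    KV cache for the unchanged prefix of the prompt (per the examples/server README)."""
    body = json.dumps({
        "prompt": prompt,
        "n_predict": 128,
        "cache_prompt": True,  # reuse previously cached prompt tokens if possible
    }).encode("utf-8")
    req = urllib.request.Request(
        f"{SERVER}/completion",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["content"]

# A frontend that resends the full chat transcript each turn should, with
# cache_prompt enabled, only see the newly appended suffix counted in
# n_prompt_tokens_processed rather than the whole history.
history = "User: Hello\nAssistant:"
print(complete(history))
```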

Name and Version

version: 3151 (f8ec887)
built with cc (Debian 13.2.0-25) 13.2.0 for x86_64-linux-gnu

What operating system are you seeing the problem on?

Linux

Relevant log output

INFO [           print_timings] prompt eval time     =     189.41 ms /     1 tokens (  189.41 ms per token,     5.28 tokens per second) | tid="140556433274816" timestamp=1718408696 id_slot=0 id_task=3534 t_prompt_processing=189.405 n_prompt_tokens_processed=1 t_token=189.405 n_tokens_second=5.2796916660067055

INFO [           print_timings] prompt eval time     =    2473.22 ms /    40 tokens (   61.83 ms per token,    16.17 tokens per second) | tid="140556433274816" timestamp=1718408717 id_slot=0 id_task=3564 t_prompt_processing=2473.219 n_prompt_tokens_processed=40 t_token=61.830475 n_tokens_second=16.173254370114414

INFO [           print_timings] prompt eval time     =    5231.45 ms /    83 tokens (   63.03 ms per token,    15.87 tokens per second) | tid="140556433274816" timestamp=1718408745 id_slot=0 id_task=3632 t_prompt_processing=5231.451 n_prompt_tokens_processed=83 t_token=63.02953012048193 n_tokens_second=15.865579167232953

INFO [           print_timings] prompt eval time     =    6692.69 ms /   105 tokens (   63.74 ms per token,    15.69 tokens per second) | tid="140556433274816" timestamp=1718408774 id_slot=0 id_task=3721 t_prompt_processing=6692.691 n_prompt_tokens_processed=105 t_token=63.739914285714285 n_tokens_second=15.688756585355577

INFO [           print_timings] prompt eval time     =    5536.72 ms /    90 tokens (   61.52 ms per token,    16.26 tokens per second) | tid="140556433274816" timestamp=1718408815 id_slot=0 id_task=3797 t_prompt_processing=5536.721 n_prompt_tokens_processed=90 t_token=61.519122222222215 n_tokens_second=16.255108393578077

INFO [           print_timings] prompt eval time     =    6353.86 ms /   106 tokens (   59.94 ms per token,    16.68 tokens per second) | tid="140556433274816" timestamp=1718408885 id_slot=0 id_task=3885 t_prompt_processing=6353.859 n_prompt_tokens_processed=106 t_token=59.942066037735856 n_tokens_second=16.68277498760989

INFO [           print_timings] prompt eval time     =    8704.61 ms /   134 tokens (   64.96 ms per token,    15.39 tokens per second) | tid="140556433274816" timestamp=1718408926 id_slot=0 id_task=4002 t_prompt_processing=8704.613 n_prompt_tokens_processed=134 t_token=64.95979850746268 n_tokens_second=15.3941364193905
khimaros added the bug-unconfirmed and medium severity labels on Jun 14, 2024
github-actions bot added the stale label on Jul 15, 2024
@github-actions (bot)

This issue was closed because it has been inactive for 14 days since being marked as stale.
