refact : fix convert script + zero out KV cache to avoid nans #3523

ggerganov · 2023-10-07T08:24:36Z

Copied tokenization from convert-starcoder-hf-to-gguf.py
ALiBi is sensitive to random (uninitialized) KV cache data, so we have to zero out the cache at the start. Since #3228 (llama : custom attention mask + parallel decoding + no context swaps), we can access uninitialized KV cache data because of:

llama.cpp/llama.cpp, line 5024 in bdbe117:

    kv_self.n = std::min((int32_t) cparams.n_ctx, std::max(32, llama_kv_cache_cell_max(kv_self)));
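To make the failure mode concrete, here is a minimal C++ sketch of that window computation. `cell_max` is a hypothetical stand-in for `llama_kv_cache_cell_max` and assumes it returns one past the highest occupied cell. The real cache tracks per-cell positions and sequence ids rather than one `bool` per cell.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for llama_kv_cache_cell_max(): returns one past the
// highest occupied cell, or 0 if the cache is empty (assumed behaviour).
int32_t cell_max(const std::vector<bool> & occupied) {
    for (int32_t i = (int32_t) occupied.size() - 1; i >= 0; --i) {
        if (occupied[i]) {
            return i + 1;
        }
    }
    return 0;
}

// Same expression as the quoted line: the window is padded up to at least
// 32 cells and clamped to the context size.
int32_t kv_window(int32_t n_ctx, int32_t max_cell) {
    assert(n_ctx > 0);
    return std::min(n_ctx, std::max(32, max_cell));
}
```

With a 512-cell cache in which only cells 0 to 4 are occupied, `kv_window(512, cell_max(occupied))` is 32. Cells 5 to 31 therefore fall inside the attention window even though nothing was ever written to them.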

If this data happens to contain NaNs, generation fails.
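A toy version of the attention step shows why garbage in the unused cells breaks generation even though those cells receive -INF from the additive mask. This is a minimal C++ sketch, not the ggml graph. It assumes head size 1, a plain softmax, and a hypothetical `attend` helper. A NaN key turns the masked score into NaN (NaN + -INF is NaN), and a NaN value still contributes 0 * NaN = NaN to the output. So in this toy model both K and V have to be cleared.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Toy single-head attention (head size 1) over the first n cache cells.
// Illustration only, not the ggml graph: score = q*k[i] plus an additive
// mask (0 for used cells, -INF for unused ones), then softmax, then a
// weighted sum of the cached values v[i].
double attend(double q, const std::vector<double> & k, const std::vector<double> & v,
              int n, int n_used) {
    assert(n <= (int) k.size() && n <= (int) v.size());
    const double neg_inf = -std::numeric_limits<double>::infinity();

    std::vector<double> s(static_cast<std::size_t>(n));
    double mx = neg_inf;
    for (int i = 0; i < n; ++i) {
        s[i] = q * k[i] + (i < n_used ? 0.0 : neg_inf); // NaN + -INF is still NaN
        if (s[i] > mx) mx = s[i];                        // NaN never compares greater
    }

    double sum = 0.0;
    for (int i = 0; i < n; ++i) {
        s[i] = std::exp(s[i] - mx); // exp(-INF) == 0, exp(NaN) is NaN
        sum += s[i];
    }

    double out = 0.0;
    for (int i = 0; i < n; ++i) {
        out += (s[i] / sum) * v[i]; // a zero weight times a NaN value is still NaN
    }
    return out;
}
```

With `k = {1, 2, NaN, NaN}`, `v = {10, 20, NaN, NaN}` and two used cells, `attend(1.0, k, v, 4, 2)` returns NaN. Zeroing only the two garbage keys still gives NaN, because of the values. Zeroing both keys and values gives the finite softmax-weighted average of 10 and 20.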

Question: should we first mask the KV tensor and then apply ALiBi?

Lines 3763 to 3771 in bdbe117:

    // KQ_masked = mask_past(KQ_scaled)
    struct ggml_tensor * KQ_scaled_alibi = ggml_alibi(ctx0, KQ_scaled, /*n_past*/ 0, n_head, 8);
    ggml_set_name(KQ_scaled_alibi, "KQ_scaled_alibi");

    struct ggml_tensor * KQ_masked = ggml_add(ctx0, KQ_scaled_alibi, KQ_mask);
    offload_func_kq(KQ_masked);
    ggml_set_name(KQ_masked, "KQ_masked");

If that were the case, the KV cache initialization above wouldn't be needed, since any uninitialized values would be masked with -INF.

slaren · 2023-10-07T10:59:45Z

> If that were the case, then the above KV cache initialization wouldn't be needed since any uninitialized values will be masked with -INF

But NaN - INF is still NaN, so I don't think this would work for removing NaNs before ALiBi.

Commit bdbe117: refact : fix convert script + zero out KV cache to avoid nans

ggerganov mentioned this pull request on Oct 7, 2023 in: add refact model #3329 (Merged)

Commit 42833bc: ggml : silu(-inf) should never happen

martell mentioned this pull request on Oct 8, 2023 in: model: refact-1_6B-fim unable to load model #3531 (Closed)

ggerganov added 2 commits on October 8, 2023 11:04:

Commit 0f8df39: metal : assert various kernel requirements
Commit acead65: Merge branch 'master' into fix-refact

ggerganov added the "need feedback" label (Testing and feedback with results are needed) on Oct 8, 2023

ggerganov merged commit fcca0a7 into master on Oct 9, 2023 (37 checks passed).
