
RMSE-optimized quants for all quantization types #1106

Closed · wants to merge 3 commits

Conversation

@ikawrakow (Contributor) commented Apr 21, 2023

The PR adds a new build option (LLAMA_NO_RMSE), which is off by default. When it is off (i.e., by default), all current quantization types (Q4_0, Q4_1, Q4_2, Q4_3) are performed with RMSE minimization (on master, RMSE minimization is enabled only for Q4_2 and cannot easily be disabled).

This makes generation of quantized models take quite a bit longer, but it is still in the same ballpark as it used to take before quantization was multi-threaded in PR #1075.

With this option enabled, Q4_3 gives a perplexity of 6.0344 for the 7B model, so 0.0273 lower than simple Q4_3 quantization as reported by @ggerganov in #406. If I also enable his trick of not quantizing output tensors, perplexity becomes 6.0085.

Perplexity result for Q4_3 without quantization of output tensors for the 13B model is 5.3117.

Details for these perplexity runs can be found here (issue #406).

As far as I can tell, we are now on par with the best known GPTQ result for 7B, and better for 13B by about 0.05.
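
A minimal sketch of the per-block idea, assuming a 32-value block and a quantized range of [-8, 7] (this is not the PR's actual kquantize_* code; the candidate grid and rounding are illustrative assumptions):

#include <math.h>

#define QK 32  /* assumed block size */

/* squared reconstruction error of quantizing x[0..QK-1] to [-8, 7] with this scale */
static float sq_error_for_scale(const float * x, float scale) {
    float err = 0.0f;
    for (int i = 0; i < QK; ++i) {
        int q = (int)roundf(x[i] / scale);
        if (q < -8) q = -8;
        if (q >  7) q =  7;
        const float d = x[i] - scale * (float)q;
        err += d * d;
    }
    return err;
}

/* pick the scale that minimizes the reconstruction error over a candidate grid */
static float rmse_optimal_scale(const float * x) {
    float amax = 0.0f;
    for (int i = 0; i < QK; ++i) {
        const float ax = fabsf(x[i]);
        if (ax > amax) amax = ax;
    }
    if (amax == 0.0f) return 0.0f;        /* all-zero block: any scale works */

    float best_scale = amax / 7.0f;       /* the "simple" Q4 choice */
    float best_err   = sq_error_for_scale(x, best_scale);
    for (int k = 1; k <= 64; ++k) {       /* candidate divisors from 7 to 11 */
        const float s = amax / (7.0f + 0.0625f * (float)k);
        const float e = sq_error_for_scale(x, s);
        if (e < best_err) {
            best_err   = e;
            best_scale = s;
        }
    }
    return best_scale;
}

A smaller scale reduces the error on the bulk of the values at the cost of clipping the largest ones; minimizing RMSE picks whichever balance is best per block.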

@Green-Sky (Collaborator) commented:

Sounds like a good idea. For me personally, I/O is the bottleneck, since I store them on a NAS.

@sw (Contributor) commented Apr 21, 2023

It might be a good idea to get #953 merged first, which implements unit tests for the quantization. But that requires an improvement to the test samples.

@sw (Contributor) left a review comment:

Please update SHA256SUMS, at the very least remove the files which are now different.

float scale = sumlx/suml2;
return scale;
}
static float kquantize_q4_with_bound_plus(int n, int nmax, const float * restrict X, int nCandidates,
@sw commented on Apr 21, 2023:

What does _plus mean?

Couldn't you re-use kquantize_q4_with_bounds with nmin=0?

@ggerganov added the labels "high priority" (Very important issue) and "generation quality" (Quality of model output) on Apr 22, 2023
@sw (Contributor) commented Apr 22, 2023

I'm still a bit skeptical whether chasing after RMSE is the right thing to do.

Let me explain what I mean: originally, the Q4 methods calculate max(abs()) and divide that by 7. #729 intends to calculate the signed max, then divide by 8. This PR tries to find the divisor that gives the minimum RMS error. But maybe the princess is in another castle?

What if it actually helps perplexity if we clip the largest values somewhat, even if that comes at a higher RMS error?

   ^
p  |
e  |
r  | *
p  |     orig                        *
l  |     *      #729                  
e  |            *              *
x  | - - - - - - - - - - - - - - - - < RMSE optimum #1106
i  |                           
t  |                   *             < perplexity optimum?
y  |
   +-----|------|------|-------------> 
         7      8      ?
             scale factor

So the approach to find that would be: use #729, choose a divisor in the interesting range of maybe [7, 11], quantize the model, do a perplexity run, lather, rinse, repeat.
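
A rough sketch of the per-block step in that experiment, assuming a #729-style signed-max scale with the divisor D as the knob to sweep (illustrative only, not the #729 code; block size and names are assumptions). Each choice of D would then be followed by a full perplexity run:

#include <math.h>
#include <stdint.h>

#define QK 32  /* assumed block size */

/* Quantize one block so that the value with the largest magnitude maps to -D.
   For D > 8 that value (and anything close to it) clips to the edge of [-8, 7]. */
static float quantize_block_clipped(const float * x, float D, int8_t * q) {
    float smax = 0.0f;  /* signed value with the largest magnitude */
    for (int i = 0; i < QK; ++i) {
        if (fabsf(x[i]) > fabsf(smax)) smax = x[i];
    }
    const float scale = (smax != 0.0f) ? -smax / D : 1.0f;
    for (int i = 0; i < QK; ++i) {
        int v = (int)roundf(x[i] / scale);
        if (v < -8) v = -8;   /* clipping happens here when D > 8 */
        if (v >  7) v =  7;
        q[i] = (int8_t)v;
    }
    return scale;  /* stored per block, as usual */
}

Sweeping D and measuring perplexity (rather than RMSE) for each resulting model is the experiment proposed above.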

@ggerganov (Owner) commented Apr 22, 2023

@ikawrakow

Just made a full cuBLAS run on 13B using Q4_3, without RMSE optimization and output in F16 precision and got: 5.3075

main: seed = 1682170268
llama.cpp: loading model from ../models/13B/ggml-model-q4_3-output-f16.bin
llama_model_load_internal: format     = ggjt v1 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 5120
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 40
llama_model_load_internal: n_layer    = 40
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 6 (mostly Q4_3)
llama_model_load_internal: n_ff       = 13824
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size = 9734493.73 KB
llama_model_load_internal: mem required  = 11554.34 MB (+ 1608.00 MB per state)
....................................................................................................
llama_init_from_file: kv self size  =  400.00 MB

system_info: n_threads = 12 / 32 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 | 
perplexity : calculating perplexity over 655 chunks, batch_size=512
2.93 seconds per pass - ETA 32 minutes
[1]3.7052,[2]4.1553,[3]4.9530,[4]5.3817,[5]5.5598,[6]5.4938,[7]5.6338,[8]5.7492,[9]6.0136,[10]6.2525,[11]6.4388,[12]6.4983,[13]6.4590,[14]6.5567,[15]6.7657,[16]6.4420,[17]6.3526,[18]6.3318,[19]6.0375,[20]6.0170,[21]5.9417,[22]5.7639,[23]5.7352,[24]5.6400,[25]5.6548,[26]5.5023,[27]5.3302,[28]5.2330,[29]5.1565,[30]5.0200,[31]4.9747,[32]4.9854,[33]4.9409,[34]4.9796,[35]4.9984,[36]5.0189,[37]5.0113,[38]5.0078,[39]5.0349,[40]5.0774,[41]5.0999,[42]5.1325,[43]5.0970,[44]5.1402,[45]5.1450,[46]5.1202,[47]5.1464,[48]5.1286,[49]5.1304,[50]5.0999,[51]5.1075,[52]5.1012,[53]5.1478,[54]5.1379,[55]5.1200,[56]5.1404,[57]5.1594,[58]5.1818,[59]5.2003,[60]5.2387,[61]5.2315,[62]5.2862,[63]5.3117,[64]5.3227,[65]5.3586,[66]5.3594,[67]5.3771,[68]5.3901,[69]5.4182,[70]5.4484,[71]5.4717,[72]5.5064,[73]5.5534,[74]5.5610,[75]5.5703,[76]5.5838,[77]5.5960,[78]5.5827,[79]5.6087,[80]5.6043,[81]5.6133,[82]5.6107,[83]5.5655,[84]5.5553,[85]5.5483,[86]5.5331,[87]5.4686,[88]5.4265,[89]5.4044,[90]5.3939,[91]5.4152,[92]5.4128,[93]5.4153,[94]5.4153,[95]5.4412,[96]5.4383,[97]5.4336,[98]5.4300,[99]5.4225,[100]5.4204,[101]5.4440,[102]5.4397,[103]5.4550,[104]5.4598,[105]5.4610,[106]5.4753,[107]5.4745,[108]5.4894,[109]5.4882,[110]5.4833,[111]5.5022,[112]5.5191,[113]5.5182,[114]5.5175,[115]5.5215,[116]5.5093,[117]5.5097,[118]5.5330,[119]5.5514,[120]5.5800,[121]5.5945,[122]5.6158,[123]5.6525,[124]5.6684,[125]5.6634,[126]5.6990,[127]5.7300,[128]5.7574,[129]5.7454,[130]5.7539,[131]5.7490,[132]5.7446,[133]5.7318,[134]5.7402,[135]5.7392,[136]5.7311,[137]5.7266,[138]5.7136,[139]5.7058,[140]5.7050,[141]5.6776,[142]5.6734,[143]5.6487,[144]5.6326,[145]5.6238,[146]5.6132,[147]5.6179,[148]5.6202,[149]5.6169,[150]5.6165,[151]5.6212,[152]5.6153,[153]5.6064,[154]5.6005,[155]5.6066,[156]5.6042,[157]5.6202,[158]5.6226,[159]5.6232,[160]5.6268,[161]5.6384,[162]5.6133,[163]5.6034,[164]5.5826,[165]5.5576,[166]5.5342,[167]5.5020,[168]5.4757,[169]5.4622,[170]5.4531,[171]5.4325,[172]5.4202,[173]5.4072,[174]5.3805,[175]5.3599,[176]5.3462,[177]5.3294,[178]5.3096,[179]5.2962,[180]5.2892,[181]5.2729,[182]5.2565,[183]5.2445,[184]5.2435,[185]5.2367,[186]5.2377,[187]5.2436,[188]5.2419,[189]5.2583,[190]5.2585,[191]5.2758,[192]5.2892,[193]5.3032,[194]5.3145,[195]5.3332,[196]5.3447,[197]5.3635,[198]5.3770,[199]5.3788,[200]5.3797,[201]5.3730,[202]5.3862,[203]5.3922,[204]5.3871,[205]5.3960,[206]5.4014,[207]5.3972,[208]5.4033,[209]5.4065,[210]5.4120,[211]5.4227,[212]5.4292,[213]5.4386,[214]5.4415,[215]5.4445,[216]5.4570,[217]5.4734,[218]5.4867,[219]5.4863,[220]5.4836,[221]5.4789,[222]5.4792,[223]5.4732,[224]5.4665,[225]5.4628,[226]5.4829,[227]5.4883,[228]5.4956,[229]5.5025,[230]5.4989,[231]5.5143,[232]5.5036,[233]5.4888,[234]5.4747,[235]5.4525,[236]5.4473,[237]5.4386,[238]5.4417,[239]5.4306,[240]5.4218,[241]5.4251,[242]5.4265,[243]5.4257,[244]5.4163,[245]5.4128,[246]5.4028,[247]5.3930,[248]5.3868,[249]5.3837,[250]5.3874,[251]5.3792,[252]5.3743,[253]5.3653,[254]5.3607,[255]5.3515,[256]5.3350,[257]5.3249,[258]5.3183,[259]5.3173,[260]5.3090,[261]5.3038,[262]5.2997,[263]5.2947,[264]5.2711,[265]5.2707,[266]5.2679,[267]5.2618,[268]5.2684,[269]5.2676,[270]5.2685,[271]5.2749,[272]5.2778,[273]5.2794,[274]5.2802,[275]5.2861,[276]5.2918,[277]5.3039,[278]5.3125,[279]5.3207,[280]5.3244,[281]5.3339,[282]5.3395,[283]5.3517,[284]5.3602,[285]5.3681,[286]5.3805,[287]5.3778,[288]5.3831,[289]5.3770,[290]5.3628,[291]5.3498,[292]5.3364,[293]5.3246,[294]5.3254,[295]5.3256,[296]5.3304,[297]5.3295,[298]5.3317,[299]5.3295,[300]5.3208,[301]5.3211,[302]5.3147,[303]5.3065,[304]5.2992,[305]5.2967,[30
6]5.2864,[307]5.2893,[308]5.2904,[309]5.2772,[310]5.2743,[311]5.2698,[312]5.2711,[313]5.2657,[314]5.2642,[315]5.2510,[316]5.2470,[317]5.2344,[318]5.2184,[319]5.2289,[320]5.2399,[321]5.2447,[322]5.2418,[323]5.2358,[324]5.2339,[325]5.2436,[326]5.2452,[327]5.2460,[328]5.2495,[329]5.2540,[330]5.2561,[331]5.2663,[332]5.2627,[333]5.2701,[334]5.2656,[335]5.2605,[336]5.2629,[337]5.2619,[338]5.2615,[339]5.2571,[340]5.2539,[341]5.2602,[342]5.2634,[343]5.2674,[344]5.2677,[345]5.2692,[346]5.2676,[347]5.2712,[348]5.2750,[349]5.2773,[350]5.2754,[351]5.2767,[352]5.2769,[353]5.2716,[354]5.2725,[355]5.2774,[356]5.2802,[357]5.2774,[358]5.2854,[359]5.2874,[360]5.2843,[361]5.2843,[362]5.2913,[363]5.3020,[364]5.3072,[365]5.3110,[366]5.3126,[367]5.3213,[368]5.3190,[369]5.3204,[370]5.3224,[371]5.3185,[372]5.3231,[373]5.3270,[374]5.3251,[375]5.3248,[376]5.3306,[377]5.3271,[378]5.3296,[379]5.3330,[380]5.3264,[381]5.3235,[382]5.3196,[383]5.3176,[384]5.3176,[385]5.3166,[386]5.3152,[387]5.3152,[388]5.3126,[389]5.3088,[390]5.3036,[391]5.2979,[392]5.2944,[393]5.2939,[394]5.2970,[395]5.2963,[396]5.2909,[397]5.2973,[398]5.3014,[399]5.3083,[400]5.3077,[401]5.3085,[402]5.3097,[403]5.3119,[404]5.3173,[405]5.3023,[406]5.2982,[407]5.2970,[408]5.2980,[409]5.3090,[410]5.3178,[411]5.3271,[412]5.3412,[413]5.3513,[414]5.3571,[415]5.3630,[416]5.3702,[417]5.3798,[418]5.3822,[419]5.3871,[420]5.3947,[421]5.4045,[422]5.4077,[423]5.4134,[424]5.4224,[425]5.4301,[426]5.4360,[427]5.4401,[428]5.4473,[429]5.4509,[430]5.4572,[431]5.4696,[432]5.4727,[433]5.4721,[434]5.4688,[435]5.4701,[436]5.4730,[437]5.4812,[438]5.4887,[439]5.4856,[440]5.4850,[441]5.4808,[442]5.4796,[443]5.4807,[444]5.4824,[445]5.4815,[446]5.4835,[447]5.4859,[448]5.4892,[449]5.4876,[450]5.4888,[451]5.4862,[452]5.4707,[453]5.4614,[454]5.4560,[455]5.4563,[456]5.4601,[457]5.4612,[458]5.4594,[459]5.4592,[460]5.4665,[461]5.4622,[462]5.4588,[463]5.4568,[464]5.4564,[465]5.4542,[466]5.4466,[467]5.4453,[468]5.4435,[469]5.4444,[470]5.4433,[471]5.4383,[472]5.4386,[473]5.4341,[474]5.4329,[475]5.4263,[476]5.4239,[477]5.4154,[478]5.4128,[479]5.4132,[480]5.4156,[481]5.4156,[482]5.4110,[483]5.4068,[484]5.4078,[485]5.4011,[486]5.3950,[487]5.3939,[488]5.3917,[489]5.3865,[490]5.3832,[491]5.3798,[492]5.3734,[493]5.3707,[494]5.3689,[495]5.3670,[496]5.3630,[497]5.3569,[498]5.3544,[499]5.3510,[500]5.3431,[501]5.3361,[502]5.3351,[503]5.3342,[504]5.3265,[505]5.3262,[506]5.3268,[507]5.3214,[508]5.3177,[509]5.3182,[510]5.3203,[511]5.3246,[512]5.3286,[513]5.3311,[514]5.3362,[515]5.3320,[516]5.3310,[517]5.3310,[518]5.3311,[519]5.3332,[520]5.3344,[521]5.3356,[522]5.3370,[523]5.3378,[524]5.3431,[525]5.3457,[526]5.3462,[527]5.3477,[528]5.3425,[529]5.3434,[530]5.3398,[531]5.3392,[532]5.3440,[533]5.3467,[534]5.3451,[535]5.3473,[536]5.3432,[537]5.3414,[538]5.3465,[539]5.3473,[540]5.3487,[541]5.3486,[542]5.3500,[543]5.3521,[544]5.3534,[545]5.3525,[546]5.3526,[547]5.3494,[548]5.3452,[549]5.3454,[550]5.3434,[551]5.3409,[552]5.3389,[553]5.3360,[554]5.3338,[555]5.3318,[556]5.3310,[557]5.3328,[558]5.3294,[559]5.3299,[560]5.3285,[561]5.3285,[562]5.3261,[563]5.3258,[564]5.3299,[565]5.3309,[566]5.3316,[567]5.3295,[568]5.3307,[569]5.3292,[570]5.3318,[571]5.3331,[572]5.3339,[573]5.3342,[574]5.3312,[575]5.3295,[576]5.3288,[577]5.3272,[578]5.3254,[579]5.3252,[580]5.3200,[581]5.3171,[582]5.3170,[583]5.3178,[584]5.3183,[585]5.3126,[586]5.3071,[587]5.3076,[588]5.3120,[589]5.3169,[590]5.3199,[591]5.3216,[592]5.3205,[593]5.3165,[594]5.3180,[595]5.3166,[596]5.3204,[597]5.3183,[598]5.3151,[599]5.3178,[600]5.3169,[601]5.3157,[602]5
.3157,[603]5.3185,[604]5.3191,[605]5.3218,[606]5.3231,[607]5.3217,[608]5.3188,[609]5.3197,[610]5.3238,[611]5.3227,[612]5.3249,[613]5.3220,[614]5.3179,[615]5.3120,[616]5.3148,[617]5.3099,[618]5.3056,[619]5.3012,[620]5.2903,[621]5.2852,[622]5.2833,[623]5.2846,[624]5.2852,[625]5.2859,[626]5.2856,[627]5.2882,[628]5.2890,[629]5.2894,[630]5.2925,[631]5.2970,[632]5.3017,[633]5.3007,[634]5.3036,[635]5.3033,[636]5.2997,[637]5.2961,[638]5.2979,[639]5.2949,[640]5.2957,[641]5.2960,[642]5.3010,[643]5.3026,[644]5.3044,[645]5.3029,[646]5.3063,[647]5.3014,[648]5.3024,[649]5.3027,[650]5.3055,[651]5.3097,[652]5.3100,[653]5.3137,[654]5.3084,[655]5.3075,
llama_print_timings:        load time =  6119.84 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 1858813.21 ms / 335360 tokens (    5.54 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 1889707.90 ms

Will make another run, this time using RMSE optimization (i.e. same as the one in OP) and double-check the reported 5.3117 result. But if it is confirmed, it would indicate that the RMSE optimization in this case is actually making the result worse for some reason.

@ggerganov (Owner) commented:

My result for 13B, using Q4_3 with RMSE opt. + F16 output is: 5.2962

This result, I think, makes more sense, since it is in line with the expectation I described here: #406 (reply in thread)

main: seed = 1682172642
llama.cpp: loading model from ../models/13B/ggml-model-q4_3-output-f16-rmse.bin
llama_model_load_internal: format     = ggjt v1 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 5120
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 40
llama_model_load_internal: n_layer    = 40
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 6 (mostly Q4_3)
llama_model_load_internal: n_ff       = 13824
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size = 9734493.73 KB
llama_model_load_internal: mem required  = 11554.34 MB (+ 1608.00 MB per state)
....................................................................................................
llama_init_from_file: kv self size  =  400.00 MB

system_info: n_threads = 12 / 32 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 | 
perplexity : calculating perplexity over 655 chunks, batch_size=512
2.94 seconds per pass - ETA 32 minutes
[1]3.7362,[2]4.1744,[3]4.9576,[4]5.3621,[5]5.5410,[6]5.4788,[7]5.6392,[8]5.7500,[9]6.0088,[10]6.2366,[11]6.4228,[12]6.4859,[13]6.4491,[14]6.5428,[15]6.7439,[16]6.4225,[17]6.3396,[18]6.3169,[19]6.0233,[20]6.0024,[21]5.9256,[22]5.7530,[23]5.7201,[24]5.6258,[25]5.6327,[26]5.4845,[27]5.3094,[28]5.2083,[29]5.1320,[30]4.9981,[31]4.9567,[32]4.9675,[33]4.9237,[34]4.9636,[35]4.9806,[36]5.0033,[37]4.9960,[38]4.9915,[39]5.0202,[40]5.0616,[41]5.0862,[42]5.1202,[43]5.0861,[44]5.1307,[45]5.1348,[46]5.1096,[47]5.1370,[48]5.1183,[49]5.1225,[50]5.0927,[51]5.0998,[52]5.0920,[53]5.1385,[54]5.1290,[55]5.1113,[56]5.1311,[57]5.1489,[58]5.1710,[59]5.1904,[60]5.2260,[61]5.2188,[62]5.2735,[63]5.2982,[64]5.3100,[65]5.3463,[66]5.3455,[67]5.3634,[68]5.3761,[69]5.4045,[70]5.4349,[71]5.4582,[72]5.4919,[73]5.5385,[74]5.5451,[75]5.5550,[76]5.5687,[77]5.5802,[78]5.5664,[79]5.5933,[80]5.5871,[81]5.5951,[82]5.5919,[83]5.5466,[84]5.5365,[85]5.5301,[86]5.5156,[87]5.4509,[88]5.4070,[89]5.3858,[90]5.3750,[91]5.3960,[92]5.3922,[93]5.3940,[94]5.3927,[95]5.4193,[96]5.4162,[97]5.4128,[98]5.4089,[99]5.4020,[100]5.3994,[101]5.4223,[102]5.4177,[103]5.4330,[104]5.4377,[105]5.4390,[106]5.4531,[107]5.4517,[108]5.4666,[109]5.4659,[110]5.4606,[111]5.4784,[112]5.4950,[113]5.4943,[114]5.4930,[115]5.4972,[116]5.4852,[117]5.4847,[118]5.5081,[119]5.5259,[120]5.5549,[121]5.5701,[122]5.5912,[123]5.6276,[124]5.6452,[125]5.6402,[126]5.6758,[127]5.7086,[128]5.7369,[129]5.7256,[130]5.7341,[131]5.7301,[132]5.7257,[133]5.7132,[134]5.7222,[135]5.7222,[136]5.7139,[137]5.7100,[138]5.6974,[139]5.6896,[140]5.6884,[141]5.6614,[142]5.6575,[143]5.6327,[144]5.6168,[145]5.6083,[146]5.5972,[147]5.6019,[148]5.6050,[149]5.6019,[150]5.6011,[151]5.6057,[152]5.5999,[153]5.5903,[154]5.5846,[155]5.5908,[156]5.5892,[157]5.6045,[158]5.6062,[159]5.6072,[160]5.6110,[161]5.6225,[162]5.5972,[163]5.5878,[164]5.5676,[165]5.5427,[166]5.5196,[167]5.4880,[168]5.4613,[169]5.4483,[170]5.4390,[171]5.4185,[172]5.4062,[173]5.3930,[174]5.3661,[175]5.3457,[176]5.3327,[177]5.3162,[178]5.2963,[179]5.2832,[180]5.2757,[181]5.2597,[182]5.2438,[183]5.2320,[184]5.2312,[185]5.2241,[186]5.2253,[187]5.2309,[188]5.2284,[189]5.2448,[190]5.2451,[191]5.2620,[192]5.2756,[193]5.2900,[194]5.3014,[195]5.3208,[196]5.3325,[197]5.3513,[198]5.3647,[199]5.3667,[200]5.3676,[201]5.3610,[202]5.3735,[203]5.3792,[204]5.3744,[205]5.3834,[206]5.3888,[207]5.3851,[208]5.3906,[209]5.3943,[210]5.3998,[211]5.4100,[212]5.4163,[213]5.4254,[214]5.4288,[215]5.4319,[216]5.4438,[217]5.4603,[218]5.4738,[219]5.4735,[220]5.4706,[221]5.4657,[222]5.4658,[223]5.4597,[224]5.4532,[225]5.4496,[226]5.4696,[227]5.4756,[228]5.4828,[229]5.4899,[230]5.4862,[231]5.5013,[232]5.4910,[233]5.4762,[234]5.4620,[235]5.4403,[236]5.4352,[237]5.4269,[238]5.4304,[239]5.4193,[240]5.4102,[241]5.4136,[242]5.4153,[243]5.4147,[244]5.4049,[245]5.4014,[246]5.3912,[247]5.3816,[248]5.3755,[249]5.3723,[250]5.3757,[251]5.3675,[252]5.3627,[253]5.3537,[254]5.3493,[255]5.3401,[256]5.3237,[257]5.3136,[258]5.3070,[259]5.3062,[260]5.2981,[261]5.2931,[262]5.2891,[263]5.2843,[264]5.2605,[265]5.2605,[266]5.2575,[267]5.2515,[268]5.2580,[269]5.2572,[270]5.2580,[271]5.2640,[272]5.2668,[273]5.2678,[274]5.2685,[275]5.2745,[276]5.2801,[277]5.2921,[278]5.3005,[279]5.3085,[280]5.3122,[281]5.3216,[282]5.3269,[283]5.3391,[284]5.3474,[285]5.3553,[286]5.3679,[287]5.3645,[288]5.3696,[289]5.3634,[290]5.3495,[291]5.3367,[292]5.3234,[293]5.3117,[294]5.3125,[295]5.3125,[296]5.3172,[297]5.3161,[298]5.3181,[299]5.3160,[300]5.3074,[301]5.3077,[302]5.3015,[303]5.2931,[304]5.2860,[305]5.2835,[30
6]5.2733,[307]5.2761,[308]5.2769,[309]5.2637,[310]5.2612,[311]5.2570,[312]5.2585,[313]5.2533,[314]5.2514,[315]5.2387,[316]5.2343,[317]5.2222,[318]5.2060,[319]5.2165,[320]5.2273,[321]5.2322,[322]5.2293,[323]5.2237,[324]5.2220,[325]5.2315,[326]5.2329,[327]5.2335,[328]5.2373,[329]5.2422,[330]5.2444,[331]5.2547,[332]5.2512,[333]5.2586,[334]5.2541,[335]5.2490,[336]5.2513,[337]5.2502,[338]5.2501,[339]5.2458,[340]5.2431,[341]5.2495,[342]5.2528,[343]5.2568,[344]5.2571,[345]5.2586,[346]5.2569,[347]5.2604,[348]5.2641,[349]5.2661,[350]5.2642,[351]5.2655,[352]5.2658,[353]5.2604,[354]5.2612,[355]5.2661,[356]5.2691,[357]5.2663,[358]5.2743,[359]5.2762,[360]5.2728,[361]5.2725,[362]5.2792,[363]5.2900,[364]5.2951,[365]5.2990,[366]5.3007,[367]5.3094,[368]5.3074,[369]5.3089,[370]5.3109,[371]5.3069,[372]5.3116,[373]5.3154,[374]5.3134,[375]5.3129,[376]5.3187,[377]5.3151,[378]5.3176,[379]5.3211,[380]5.3144,[381]5.3114,[382]5.3077,[383]5.3059,[384]5.3061,[385]5.3048,[386]5.3036,[387]5.3034,[388]5.3007,[389]5.2969,[390]5.2918,[391]5.2859,[392]5.2825,[393]5.2821,[394]5.2854,[395]5.2847,[396]5.2796,[397]5.2859,[398]5.2901,[399]5.2971,[400]5.2966,[401]5.2974,[402]5.2986,[403]5.3011,[404]5.3066,[405]5.2918,[406]5.2877,[407]5.2867,[408]5.2875,[409]5.2985,[410]5.3076,[411]5.3169,[412]5.3308,[413]5.3409,[414]5.3470,[415]5.3528,[416]5.3598,[417]5.3696,[418]5.3721,[419]5.3769,[420]5.3844,[421]5.3942,[422]5.3975,[423]5.4033,[424]5.4122,[425]5.4199,[426]5.4259,[427]5.4301,[428]5.4373,[429]5.4410,[430]5.4472,[431]5.4596,[432]5.4627,[433]5.4620,[434]5.4587,[435]5.4601,[436]5.4629,[437]5.4710,[438]5.4782,[439]5.4755,[440]5.4748,[441]5.4704,[442]5.4692,[443]5.4702,[444]5.4721,[445]5.4712,[446]5.4733,[447]5.4756,[448]5.4788,[449]5.4773,[450]5.4784,[451]5.4755,[452]5.4599,[453]5.4503,[454]5.4450,[455]5.4453,[456]5.4495,[457]5.4508,[458]5.4491,[459]5.4490,[460]5.4563,[461]5.4523,[462]5.4489,[463]5.4468,[464]5.4465,[465]5.4443,[466]5.4369,[467]5.4360,[468]5.4340,[469]5.4352,[470]5.4341,[471]5.4292,[472]5.4299,[473]5.4251,[474]5.4240,[475]5.4171,[476]5.4148,[477]5.4065,[478]5.4035,[479]5.4036,[480]5.4061,[481]5.4062,[482]5.4015,[483]5.3973,[484]5.3980,[485]5.3913,[486]5.3849,[487]5.3837,[488]5.3815,[489]5.3761,[490]5.3730,[491]5.3698,[492]5.3630,[493]5.3604,[494]5.3585,[495]5.3561,[496]5.3521,[497]5.3457,[498]5.3431,[499]5.3395,[500]5.3314,[501]5.3245,[502]5.3236,[503]5.3225,[504]5.3149,[505]5.3145,[506]5.3150,[507]5.3097,[508]5.3060,[509]5.3066,[510]5.3088,[511]5.3130,[512]5.3170,[513]5.3194,[514]5.3248,[515]5.3208,[516]5.3198,[517]5.3197,[518]5.3198,[519]5.3219,[520]5.3233,[521]5.3245,[522]5.3258,[523]5.3265,[524]5.3319,[525]5.3347,[526]5.3353,[527]5.3369,[528]5.3314,[529]5.3323,[530]5.3287,[531]5.3282,[532]5.3329,[533]5.3356,[534]5.3337,[535]5.3357,[536]5.3316,[537]5.3298,[538]5.3347,[539]5.3355,[540]5.3371,[541]5.3369,[542]5.3382,[543]5.3404,[544]5.3416,[545]5.3406,[546]5.3408,[547]5.3375,[548]5.3334,[549]5.3334,[550]5.3313,[551]5.3286,[552]5.3266,[553]5.3238,[554]5.3216,[555]5.3197,[556]5.3189,[557]5.3208,[558]5.3175,[559]5.3178,[560]5.3164,[561]5.3166,[562]5.3141,[563]5.3140,[564]5.3182,[565]5.3194,[566]5.3201,[567]5.3182,[568]5.3192,[569]5.3177,[570]5.3204,[571]5.3216,[572]5.3224,[573]5.3228,[574]5.3200,[575]5.3184,[576]5.3177,[577]5.3163,[578]5.3144,[579]5.3144,[580]5.3091,[581]5.3061,[582]5.3061,[583]5.3069,[584]5.3074,[585]5.3016,[586]5.2962,[587]5.2965,[588]5.3008,[589]5.3058,[590]5.3088,[591]5.3105,[592]5.3094,[593]5.3054,[594]5.3068,[595]5.3052,[596]5.3091,[597]5.3071,[598]5.3039,[599]5.3065,[600]5.3056,[601]5.3045,[602]5
.3046,[603]5.3073,[604]5.3079,[605]5.3106,[606]5.3120,[607]5.3105,[608]5.3077,[609]5.3086,[610]5.3126,[611]5.3116,[612]5.3138,[613]5.3109,[614]5.3070,[615]5.3011,[616]5.3036,[617]5.2986,[618]5.2942,[619]5.2898,[620]5.2789,[621]5.2739,[622]5.2721,[623]5.2734,[624]5.2737,[625]5.2745,[626]5.2741,[627]5.2768,[628]5.2776,[629]5.2780,[630]5.2811,[631]5.2854,[632]5.2902,[633]5.2891,[634]5.2920,[635]5.2918,[636]5.2884,[637]5.2848,[638]5.2868,[639]5.2838,[640]5.2845,[641]5.2849,[642]5.2899,[643]5.2915,[644]5.2933,[645]5.2919,[646]5.2953,[647]5.2902,[648]5.2913,[649]5.2914,[650]5.2943,[651]5.2983,[652]5.2987,[653]5.3024,[654]5.2970,[655]5.2962,
llama_print_timings:        load time =  6171.79 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 1846949.17 ms / 335360 tokens (    5.51 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 1878169.61 ms

@ikawrakow (Contributor, Author) commented:

@ggerganov Are these results with or without the changes you made to Q4_3 after I opened this PR (and reported the results)?

Commit messages added to the PR:

By default this new option is ON. One can turn it off by setting LLAMA_NO_RMSE.

With this option enabled, the Q4_3 quantization results in a perplexity of 6.0344, so 0.0273 lower than simple Q4_3 quantization.

The test does not work with RMSE minimization enabled, so the test cases have to be put between ifdefs.
@ggerganov (Owner) commented:

> @ggerganov Are these results with or without the changes you made to Q4_3 after I opened this PR (and reported the results)?

It includes all changes from today related to Q4_3 quantization. Maybe this is the source of the difference, although it's still strange, since the Q4_3 changes should just improve the performance. Of course, we cannot expect exactly the same results, but the difference is rather big, so I'm not 100% sure. I can run one extra Q4_3 13B run with yesterday's build to make sure.

@ikawrakow (Contributor, Author) commented:

@ggerganov Rebased this branch on latest master, re-quantized, and re-ran the perplexity. Now I get the lower result as well with OpenBLAS (5.2961, so actually 0.0001 lower than cuBLAS). So, something else has happened that positively impacts results. Another observation is that the OpenBLAS and cuBLAS results are not identical, unlike for the fp16 model: they are very close, but not exactly the same. See details.

Is it possible this affects this comment you made in #729?

perplexity : calculating perplexity over 655 chunks, batch_size=512 30.59 seconds per pass - ETA 5 hours 33 minutes [1]3.7363,[2]4.1741,[3]4.9573,[4]5.3622,[5]5.5408,[6]5.4786,[7]5.6388,[8]5.7498,[9]6.0085,[10]6.2361,[11]6.4224,[12]6.4857,[13]6.4488,[14]6.5426,[15]6.7435,[16]6.4223,[17]6.3394,[18]6.3167,[19]6.0232,[20]6.0023,[21]5.9256,[22]5.7530,[23]5.7200,[24]5.6257,[25]5.6326,[26]5.4844,[27]5.3093,[28]5.2082,[29]5.1320,[30]4.9981,[31]4.9567,[32]4.9675,[33]4.9237,[34]4.9636,[35]4.9806,[36]5.0033,[37]4.9960,[38]4.9914,[39]5.0201,[40]5.0615,[41]5.0861,[42]5.1202,[43]5.0861,[44]5.1306,[45]5.1347,[46]5.1095,[47]5.1369,[48]5.1183,[49]5.1225,[50]5.0926,[51]5.0997,[52]5.0919,[53]5.1384,[54]5.1290,[55]5.1113,[56]5.1310,[57]5.1488,[58]5.1709,[59]5.1902,[60]5.2259,[61]5.2187,[62]5.2734,[63]5.2981,[64]5.3100,[65]5.3462,[66]5.3454,[67]5.3633,[68]5.3760,[69]5.4044,[70]5.4348,[71]5.4581,[72]5.4918,[73]5.5384,[74]5.5450,[75]5.5549,[76]5.5686,[77]5.5801,[78]5.5663,[79]5.5932,[80]5.5870,[81]5.5950,[82]5.5918,[83]5.5465,[84]5.5364,[85]5.5300,[86]5.5155,[87]5.4508,[88]5.4069,[89]5.3857,[90]5.3749,[91]5.3959,[92]5.3921,[93]5.3939,[94]5.3926,[95]5.4191,[96]5.4161,[97]5.4127,[98]5.4088,[99]5.4019,[100]5.3993,[101]5.4222,[102]5.4176,[103]5.4329,[104]5.4376,[105]5.4389,[106]5.4529,[107]5.4516,[108]5.4665,[109]5.4657,[110]5.4605,[111]5.4783,[112]5.4949,[113]5.4942,[114]5.4929,[115]5.4971,[116]5.4851,[117]5.4846,[118]5.5080,[119]5.5258,[120]5.5548,[121]5.5700,[122]5.5911,[123]5.6275,[124]5.6451,[125]5.6401,[126]5.6757,[127]5.7085,[128]5.7368,[129]5.7255,[130]5.7340,[131]5.7300,[132]5.7256,[133]5.7132,[134]5.7221,[135]5.7221,[136]5.7139,[137]5.7100,[138]5.6973,[139]5.6895,[140]5.6883,[141]5.6613,[142]5.6574,[143]5.6326,[144]5.6167,[145]5.6083,[146]5.5972,[147]5.6019,[148]5.6049,[149]5.6018,[150]5.6011,[151]5.6057,[152]5.5998,[153]5.5902,[154]5.5846,[155]5.5907,[156]5.5891,[157]5.6045,[158]5.6061,[159]5.6071,[160]5.6109,[161]5.6225,[162]5.5971,[163]5.5877,[164]5.5676,[165]5.5426,[166]5.5195,[167]5.4880,[168]5.4612,[169]5.4483,[170]5.4389,[171]5.4184,[172]5.4062,[173]5.3929,[174]5.3660,[175]5.3457,[176]5.3327,[177]5.3161,[178]5.2963,[179]5.2832,[180]5.2757,[181]5.2596,[182]5.2438,[183]5.2319,[184]5.2311,[185]5.2240,[186]5.2252,[187]5.2308,[188]5.2284,[189]5.2447,[190]5.2451,[191]5.2619,[192]5.2755,[193]5.2900,[194]5.3014,[195]5.3208,[196]5.3324,[197]5.3513,[198]5.3647,[199]5.3667,[200]5.3676,[201]5.3610,[202]5.3734,[203]5.3792,[204]5.3744,[205]5.3834,[206]5.3888,[207]5.3851,[208]5.3906,[209]5.3943,[210]5.3998,[211]5.4100,[212]5.4164,[213]5.4254,[214]5.4288,[215]5.4319,[216]5.4438,[217]5.4603,[218]5.4738,[219]5.4735,[220]5.4706,[221]5.4657,[222]5.4658,[223]5.4597,[224]5.4532,[225]5.4496,[226]5.4696,[227]5.4756,[228]5.4828,[229]5.4899,[230]5.4862,[231]5.5013,[232]5.4910,[233]5.4762,[234]5.4620,[235]5.4403,[236]5.4352,[237]5.4269,[238]5.4303,[239]5.4192,[240]5.4102,[241]5.4136,[242]5.4153,[243]5.4147,[244]5.4049,[245]5.4014,[246]5.3912,[247]5.3815,[248]5.3755,[249]5.3722,[250]5.3757,[251]5.3675,[252]5.3626,[253]5.3537,[254]5.3492,[255]5.3401,[256]5.3237,[257]5.3136,[258]5.3070,[259]5.3062,[260]5.2981,[261]5.2931,[262]5.2890,[263]5.2843,[264]5.2605,[265]5.2605,[266]5.2575,[267]5.2514,[268]5.2580,[269]5.2572,[270]5.2580,[271]5.2640,[272]5.2668,[273]5.2678,[274]5.2685,[275]5.2744,[276]5.2801,[277]5.2921,[278]5.3005,[279]5.3085,[280]5.3122,[281]5.3216,[282]5.3269,[283]5.3390,[284]5.3473,[285]5.3553,[286]5.3679,[287]5.3645,[288]5.3696,[289]5.3634,[290]5.3495,[291]5.3367,[292]5.3234,[293]5.3117,[294]5.3125,[295]5.3125,[296]5.
3172,[297]5.3161,[298]5.3181,[299]5.3160,[300]5.3074,[301]5.3077,[302]5.3015,[303]5.2931,[304]5.2860,[305]5.2835,[306]5.2733,[307]5.2761,[308]5.2769,[309]5.2637,[310]5.2612,[311]5.2570,[312]5.2585,[313]5.2533,[314]5.2515,[315]5.2387,[316]5.2343,[317]5.2222,[318]5.2060,[319]5.2165,[320]5.2273,[321]5.2322,[322]5.2293,[323]5.2238,[324]5.2220,[325]5.2316,[326]5.2329,[327]5.2335,[328]5.2373,[329]5.2422,[330]5.2445,[331]5.2547,[332]5.2512,[333]5.2586,[334]5.2541,[335]5.2490,[336]5.2514,[337]5.2502,[338]5.2501,[339]5.2458,[340]5.2431,[341]5.2495,[342]5.2528,[343]5.2568,[344]5.2571,[345]5.2586,[346]5.2569,[347]5.2604,[348]5.2641,[349]5.2661,[350]5.2642,[351]5.2656,[352]5.2658,[353]5.2604,[354]5.2612,[355]5.2661,[356]5.2691,[357]5.2663,[358]5.2744,[359]5.2762,[360]5.2728,[361]5.2725,[362]5.2792,[363]5.2900,[364]5.2951,[365]5.2990,[366]5.3008,[367]5.3094,[368]5.3074,[369]5.3089,[370]5.3109,[371]5.3069,[372]5.3116,[373]5.3154,[374]5.3134,[375]5.3129,[376]5.3187,[377]5.3152,[378]5.3176,[379]5.3211,[380]5.3144,[381]5.3114,[382]5.3077,[383]5.3059,[384]5.3061,[385]5.3048,[386]5.3036,[387]5.3034,[388]5.3007,[389]5.2969,[390]5.2918,[391]5.2859,[392]5.2826,[393]5.2821,[394]5.2854,[395]5.2847,[396]5.2796,[397]5.2859,[398]5.2901,[399]5.2971,[400]5.2966,[401]5.2974,[402]5.2986,[403]5.3011,[404]5.3066,[405]5.2918,[406]5.2876,[407]5.2867,[408]5.2875,[409]5.2985,[410]5.3076,[411]5.3169,[412]5.3308,[413]5.3409,[414]5.3470,[415]5.3528,[416]5.3598,[417]5.3696,[418]5.3721,[419]5.3769,[420]5.3844,[421]5.3942,[422]5.3975,[423]5.4033,[424]5.4122,[425]5.4199,[426]5.4259,[427]5.4301,[428]5.4373,[429]5.4410,[430]5.4472,[431]5.4596,[432]5.4627,[433]5.4620,[434]5.4587,[435]5.4601,[436]5.4629,[437]5.4710,[438]5.4782,[439]5.4755,[440]5.4748,[441]5.4704,[442]5.4692,[443]5.4702,[444]5.4721,[445]5.4712,[446]5.4733,[447]5.4756,[448]5.4788,[449]5.4773,[450]5.4784,[451]5.4755,[452]5.4599,[453]5.4503,[454]5.4450,[455]5.4453,[456]5.4495,[457]5.4508,[458]5.4491,[459]5.4490,[460]5.4563,[461]5.4523,[462]5.4489,[463]5.4468,[464]5.4465,[465]5.4443,[466]5.4369,[467]5.4360,[468]5.4340,[469]5.4352,[470]5.4341,[471]5.4292,[472]5.4299,[473]5.4251,[474]5.4239,[475]5.4171,[476]5.4147,[477]5.4064,[478]5.4035,[479]5.4036,[480]5.4060,[481]5.4062,[482]5.4015,[483]5.3973,[484]5.3980,[485]5.3913,[486]5.3848,[487]5.3836,[488]5.3814,[489]5.3761,[490]5.3730,[491]5.3697,[492]5.3630,[493]5.3603,[494]5.3584,[495]5.3561,[496]5.3521,[497]5.3457,[498]5.3430,[499]5.3394,[500]5.3313,[501]5.3245,[502]5.3235,[503]5.3225,[504]5.3148,[505]5.3145,[506]5.3150,[507]5.3097,[508]5.3060,[509]5.3065,[510]5.3088,[511]5.3130,[512]5.3169,[513]5.3194,[514]5.3247,[515]5.3207,[516]5.3197,[517]5.3197,[518]5.3197,[519]5.3219,[520]5.3233,[521]5.3244,[522]5.3258,[523]5.3265,[524]5.3319,[525]5.3347,[526]5.3352,[527]5.3368,[528]5.3313,[529]5.3323,[530]5.3286,[531]5.3281,[532]5.3329,[533]5.3356,[534]5.3337,[535]5.3356,[536]5.3315,[537]5.3298,[538]5.3346,[539]5.3354,[540]5.3370,[541]5.3368,[542]5.3382,[543]5.3403,[544]5.3415,[545]5.3405,[546]5.3407,[547]5.3375,[548]5.3334,[549]5.3334,[550]5.3312,[551]5.3286,[552]5.3266,[553]5.3238,[554]5.3216,[555]5.3196,[556]5.3189,[557]5.3207,[558]5.3174,[559]5.3177,[560]5.3164,[561]5.3166,[562]5.3141,[563]5.3139,[564]5.3182,[565]5.3194,[566]5.3201,[567]5.3182,[568]5.3192,[569]5.3177,[570]5.3203,[571]5.3216,[572]5.3224,[573]5.3228,[574]5.3200,[575]5.3184,[576]5.3177,[577]5.3162,[578]5.3144,[579]5.3144,[580]5.3090,[581]5.3061,[582]5.3061,[583]5.3068,[584]5.3073,[585]5.3016,[586]5.2962,[587]5.2965,[588]5.3007,[589]5.3057,[590]5.3087,[591]5.3104,[592]5.309
3,[593]5.3053,[594]5.3068,[595]5.3052,[596]5.3090,[597]5.3070,[598]5.3039,[599]5.3065,[600]5.3056,[601]5.3044,[602]5.3045,[603]5.3073,[604]5.3079,[605]5.3105,[606]5.3119,[607]5.3105,[608]5.3077,[609]5.3085,[610]5.3126,[611]5.3115,[612]5.3138,[613]5.3109,[614]5.3070,[615]5.3010,[616]5.3035,[617]5.2985,[618]5.2942,[619]5.2898,[620]5.2789,[621]5.2738,[622]5.2720,[623]5.2733,[624]5.2736,[625]5.2744,[626]5.2741,[627]5.2767,[628]5.2776,[629]5.2779,[630]5.2811,[631]5.2853,[632]5.2901,[633]5.2890,[634]5.2920,[635]5.2917,[636]5.2883,[637]5.2848,[638]5.2868,[639]5.2838,[640]5.2844,[641]5.2849,[642]5.2898,[643]5.2915,[644]5.2932,[645]5.2919,[646]5.2953,[647]5.2901,[648]5.2912,[649]5.2914,[650]5.2942,[651]5.2982,[652]5.2987,[653]5.3024,[654]5.2969,[655]5.2961,

llama_print_timings:        load time = 31077.61 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 9573081.81 ms / 335360 tokens (   28.55 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 9604412.75 ms

@ggerganov (Owner) commented Apr 22, 2023

I think we cannot expect cuBLAS and OpenBLAS to be exactly the same, because cuBLAS dequantizes x to F16, casts y to F16, and performs an F16 mat mul, while OpenBLAS dequantizes x to F32 and performs an F32 mat mul (if I'm not mistaken).

@slaren (Collaborator) commented Apr 22, 2023

> cuBLAS dequantizes x to F16 and casts y to F16 and performs F16 mat mul, while OpenBLAS dequantizes x to F32 and performs F32 mat mul (if I'm not mistaken)

That's not exactly the case: when multiplying q x f32, cuBLAS dequantizes to f32 and does an f32 x f32 mat mul. The only difference from OpenBLAS is when performing an f16 x f32 mat mul (ggml_compute_forward_mul_mat_f16_f32): in this case, src1 is converted to f16 instead of converting src0 to f32, and an f16 x f16 mat mul is done.
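
A toy illustration (not ggml code) of why the two paths need not match exactly: rounding operands to half precision before the multiply-accumulate perturbs each dot product slightly. It assumes a compiler and target that support the _Float16 type:

#include <stdio.h>

int main(void) {
    const float x[4] = { 0.1234567f, -2.7182818f, 3.1415927f, 0.0001234f };
    const float y[4] = { 1.5f,       -0.3333333f, 0.7071068f, 123.456f   };

    float dot_f32 = 0.0f;  /* operands kept in f32 (OpenBLAS-like path) */
    float dot_f16 = 0.0f;  /* operands rounded to f16 first (cuBLAS-like path) */
    for (int i = 0; i < 4; ++i) {
        dot_f32 += x[i] * y[i];
        dot_f16 += (float)(_Float16)x[i] * (float)(_Float16)y[i];
    }
    printf("f32 path: %.9f\nf16 path: %.9f\n", dot_f32, dot_f16);
    return 0;
}

Accumulated over 655 chunks of log-probabilities, such tiny per-dot-product differences can be enough to shift the reported perplexity in the last decimal places.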

@ikawrakow (Contributor, Author) commented:

@ggerganov I propose we close this PR. Although there is some benefit from RMSE minimization for QX_1 and QX_3 quantization of the 7B model, the benefit mostly goes away for 13B (and Q5_1 is actually worse with RMSE minimization than without at 13B).

@ivanstepanovftw (Collaborator) commented:

You are minimizing error, so why should it be worse? It may be worse for one case but better for another, no?

@ivanstepanovftw (Collaborator) commented May 3, 2023

By that I mean that perplexity on a wide range of other files (other than en-wikitext or whatever) may be better. And maybe not for one model, but for another...

Quantization itself is here to compress the data as much as possible without affecting the model's quality much.

@ggerganov closed this on May 3, 2023
@ikawrakow deleted the ik/rmse_quantization branch on June 11, 2023