From 7e2122622f20d94e72ad27c5a484956f47386852 Mon Sep 17 00:00:00 2001 From: Klaus Post Date: Fri, 17 Feb 2023 15:43:42 +0100 Subject: [PATCH] s2: Add LZ4 block converter (#748) This allows converting compressed LZ4 blocks to S2 (or snappy) blocks without decompression. LZ4 -> S2 seems to be same size on average. LZ4 -> Snappy is usually worse. ## Single threaded performance Speed (excluding LZ4 encoding): ``` BenchmarkLZ4Converter_ConvertBlock/html-32 28237 42827 ns/op 2390.99 MB/s 559.0 b_saved 0 B/op 0 allocs/op BenchmarkLZ4Converter_ConvertBlock/urls-32 2138 541816 ns/op 1295.80 MB/s -3943 b_saved 0 B/op 0 allocs/op BenchmarkLZ4Converter_ConvertBlock/jpg-32 514826 2328 ns/op 52874.24 MB/s 482.0 b_saved 0 B/op 0 allocs/op BenchmarkLZ4Converter_ConvertBlock/jpg_200b-32 34821668 33.48 ns/op 5973.00 MB/s 2.000 b_saved 0 B/op 0 allocs/op BenchmarkLZ4Converter_ConvertBlock/pdf-32 198241 5975 ns/op 17136.81 MB/s 136.0 b_saved 0 B/op 0 allocs/op BenchmarkLZ4Converter_ConvertBlock/html4-32 7002 173440 ns/op 2361.63 MB/s 1840 b_saved 0 B/op 0 allocs/op BenchmarkLZ4Converter_ConvertBlock/txt1-32 5940 196951 ns/op 772.22 MB/s 106.0 b_saved 0 B/op 0 allocs/op BenchmarkLZ4Converter_ConvertBlock/txt2-32 6656 177228 ns/op 706.32 MB/s -1427 b_saved 0 B/op 0 allocs/op BenchmarkLZ4Converter_ConvertBlock/txt3-32 2355 510435 ns/op 836.06 MB/s 384.0 b_saved 0 B/op 0 allocs/op BenchmarkLZ4Converter_ConvertBlock/txt4-32 1700 694444 ns/op 693.88 MB/s -9125 b_saved 0 B/op 0 allocs/op BenchmarkLZ4Converter_ConvertBlock/pb-32 37118 32141 ns/op 3689.60 MB/s 1.000 b_saved 0 B/op 0 allocs/op BenchmarkLZ4Converter_ConvertBlock/gaviota-32 6961 172253 ns/op 1070.05 MB/s 9303 b_saved 0 B/op 0 allocs/op BenchmarkLZ4Converter_ConvertBlock/txt1_128b-32 19923691 59.82 ns/op 2139.66 MB/s 0 b_saved 0 B/op 0 allocs/op BenchmarkLZ4Converter_ConvertBlock/txt1_1000b-32 3180837 375.2 ns/op 2665.40 MB/s 16.00 b_saved 0 B/op 0 allocs/op BenchmarkLZ4Converter_ConvertBlock/txt1_10000b-32 184214 6350 ns/op 1574.70 MB/s 90.00 b_saved 0 B/op 0 allocs/op BenchmarkLZ4Converter_ConvertBlock/txt1_20000b-32 74031 15521 ns/op 1288.54 MB/s -5.000 b_saved 0 B/op 0 allocs/op ``` Assembly speed (amd64) ``` BenchmarkLZ4Converter_ConvertBlock/html-32 47457 24463 ns/op 4185.89 MB/s 559.0 b_saved 0 B/op 0 allocs/op BenchmarkLZ4Converter_ConvertBlock/urls-32 3506 330277 ns/op 2125.75 MB/s -3943 b_saved 0 B/op 0 allocs/op BenchmarkLZ4Converter_ConvertBlock/jpg-32 450177 2718 ns/op 45294.89 MB/s 482.0 b_saved 0 B/op 0 allocs/op BenchmarkLZ4Converter_ConvertBlock/jpg_200b-32 76887589 15.52 ns/op 12887.16 MB/s 2.000 b_saved 0 B/op 0 allocs/op BenchmarkLZ4Converter_ConvertBlock/pdf-32 279540 4322 ns/op 23694.21 MB/s 136.0 b_saved 0 B/op 0 allocs/op BenchmarkLZ4Converter_ConvertBlock/html4-32 10000 107485 ns/op 3810.75 MB/s 1840 b_saved 0 B/op 0 allocs/op BenchmarkLZ4Converter_ConvertBlock/txt1-32 10000 117764 ns/op 1291.47 MB/s 106.0 b_saved 0 B/op 0 allocs/op BenchmarkLZ4Converter_ConvertBlock/txt2-32 10000 100578 ns/op 1244.60 MB/s -1427 b_saved 0 B/op 0 allocs/op BenchmarkLZ4Converter_ConvertBlock/txt3-32 3793 313021 ns/op 1363.34 MB/s 384.0 b_saved 0 B/op 0 allocs/op BenchmarkLZ4Converter_ConvertBlock/txt4-32 2988 399888 ns/op 1204.99 MB/s -9125 b_saved 0 B/op 0 allocs/op BenchmarkLZ4Converter_ConvertBlock/pb-32 57486 19277 ns/op 6151.76 MB/s 1.000 b_saved 0 B/op 0 allocs/op BenchmarkLZ4Converter_ConvertBlock/gaviota-32 10000 115641 ns/op 1593.90 MB/s 9303 b_saved 0 B/op 0 allocs/op BenchmarkLZ4Converter_ConvertBlock/txt1_128b-32 38400122 31.21 ns/op 
4101.10 MB/s 0 b_saved 0 B/op 0 allocs/op BenchmarkLZ4Converter_ConvertBlock/txt1_1000b-32 6509028 179.9 ns/op 5559.38 MB/s 16.00 b_saved 0 B/op 0 allocs/op BenchmarkLZ4Converter_ConvertBlock/txt1_10000b-32 368212 3244 ns/op 3082.28 MB/s 90.00 b_saved 0 B/op 0 allocs/op BenchmarkLZ4Converter_ConvertBlock/txt1_20000b-32 141013 8303 ns/op 2408.69 MB/s -5.000 b_saved 0 B/op 0 allocs/op ``` Reference compression speed: ``` BenchmarkCompressBlockReference/html/default-32 14070 82449 ns/op 1241.98 MB/s 0 B/op 0 allocs/op BenchmarkCompressBlockReference/urls/default-32 1215 970890 ns/op 723.14 MB/s 0 B/op 0 allocs/op BenchmarkCompressBlockReference/jpg/default-32 193770 5904 ns/op 20849.30 MB/s 0 B/op 0 allocs/op BenchmarkCompressBlockReference/jpg_200b/default-32 8297767 144.1 ns/op 1387.77 MB/s 0 B/op 0 allocs/op BenchmarkCompressBlockReference/pdf/default-32 94203 12694 ns/op 8066.76 MB/s 0 B/op 0 allocs/op BenchmarkCompressBlockReference/html4/default-32 12174 97969 ns/op 4180.90 MB/s 0 B/op 0 allocs/op BenchmarkCompressBlockReference/txt1/default-32 3613 333851 ns/op 455.56 MB/s 0 B/op 0 allocs/op BenchmarkCompressBlockReference/txt2/default-32 4683 260579 ns/op 480.39 MB/s 0 B/op 0 allocs/op BenchmarkCompressBlockReference/txt3/default-32 1268 947209 ns/op 450.54 MB/s 0 B/op 0 allocs/op BenchmarkCompressBlockReference/txt4/default-32 1083 1097426 ns/op 439.08 MB/s 0 B/op 0 allocs/op BenchmarkCompressBlockReference/pb/default-32 18357 64771 ns/op 1830.90 MB/s 0 B/op 0 allocs/op BenchmarkCompressBlockReference/gaviota/default-32 3942 295275 ns/op 624.23 MB/s 0 B/op 0 allocs/op BenchmarkCompressBlockReference/txt1_128b/default-32 11448295 105.7 ns/op 1210.45 MB/s 0 B/op 0 allocs/op BenchmarkCompressBlockReference/txt1_1000b/default-32 1000000 1021 ns/op 979.26 MB/s 0 B/op 0 allocs/op BenchmarkCompressBlockReference/txt1_10000b/default-32 116739 10114 ns/op 988.68 MB/s 0 B/op 0 allocs/op BenchmarkCompressBlockReference/txt1_20000b/default-32 49216 23409 ns/op 854.39 MB/s 0 B/op 0 allocs/op BenchmarkCompressBlockReference/html/better-32 6649 174667 ns/op 586.26 MB/s 0 B/op 0 allocs/op BenchmarkCompressBlockReference/urls/better-32 627 1905706 ns/op 368.41 MB/s 0 B/op 0 allocs/op BenchmarkCompressBlockReference/jpg/better-32 52425 22783 ns/op 5402.88 MB/s 0 B/op 0 allocs/op BenchmarkCompressBlockReference/jpg_200b/better-32 2772865 433.3 ns/op 461.61 MB/s 0 B/op 0 allocs/op BenchmarkCompressBlockReference/pdf/better-32 9210 127051 ns/op 805.97 MB/s 0 B/op 0 allocs/op BenchmarkCompressBlockReference/html4/better-32 5835 201146 ns/op 2036.33 MB/s 0 B/op 0 allocs/op BenchmarkCompressBlockReference/txt1/better-32 2034 566702 ns/op 268.38 MB/s 0 B/op 0 allocs/op BenchmarkCompressBlockReference/txt2/better-32 2386 500580 ns/op 250.07 MB/s 0 B/op 0 allocs/op BenchmarkCompressBlockReference/txt3/better-32 758 1556541 ns/op 274.17 MB/s 0 B/op 0 allocs/op BenchmarkCompressBlockReference/txt4/better-32 591 2013515 ns/op 239.31 MB/s 0 B/op 0 allocs/op BenchmarkCompressBlockReference/pb/better-32 7836 155117 ns/op 764.51 MB/s 0 B/op 0 allocs/op BenchmarkCompressBlockReference/gaviota/better-32 2473 484975 ns/op 380.06 MB/s 0 B/op 0 allocs/op BenchmarkCompressBlockReference/txt1_128b/better-32 4322678 275.5 ns/op 464.59 MB/s 0 B/op 0 allocs/op BenchmarkCompressBlockReference/txt1_1000b/better-32 468687 2533 ns/op 394.76 MB/s 0 B/op 0 allocs/op BenchmarkCompressBlockReference/txt1_10000b/better-32 49606 23720 ns/op 421.59 MB/s 0 B/op 0 allocs/op BenchmarkCompressBlockReference/txt1_20000b/better-32 14823 81300 
ns/op 246.00 MB/s 0 B/op 0 allocs/op ``` Size comparisons (using Go lz4 encoder): ``` === RUN TestLZ4Converter_ConvertBlock/html lz4convert_test.go:42: input size: 102400 lz4convert_test.go:43: lz4 size: 21195 lz4convert_test.go:60: lz4->snappy size: 21828 lz4convert_test.go:79: lz4->s2 size: 20636 lz4convert_test.go:91: s2 (default) size: 20865 lz4convert_test.go:95: s2 (better) size: 18969 lz4convert_test.go:97: lz4 -> s2 bytes saved: 559 lz4convert_test.go:98: lz4 -> snappy bytes saved: -633 lz4convert_test.go:99: data -> s2 (default) bytes saved: 330 lz4convert_test.go:100: data -> s2 (better) bytes saved: 2226 lz4convert_test.go:101: direct data -> s2 (default) compared to converted from lz4: -229 lz4convert_test.go:102: direct data -> s2 (better) compared to converted from lz4: 1667 --- PASS: TestLZ4Converter_ConvertBlock/html (0.00s) === RUN TestLZ4Converter_ConvertBlock/urls lz4convert_test.go:42: input size: 702087 lz4convert_test.go:43: lz4 size: 292514 lz4convert_test.go:60: lz4->snappy size: 297926 lz4convert_test.go:79: lz4->s2 size: 296457 lz4convert_test.go:91: s2 (default) size: 286538 lz4convert_test.go:95: s2 (better) size: 248076 lz4convert_test.go:97: lz4 -> s2 bytes saved: -3943 lz4convert_test.go:98: lz4 -> snappy bytes saved: -5412 lz4convert_test.go:99: data -> s2 (default) bytes saved: 5976 lz4convert_test.go:100: data -> s2 (better) bytes saved: 44438 lz4convert_test.go:101: direct data -> s2 (default) compared to converted from lz4: 9919 lz4convert_test.go:102: direct data -> s2 (better) compared to converted from lz4: 48381 --- PASS: TestLZ4Converter_ConvertBlock/urls (0.01s) === RUN TestLZ4Converter_ConvertBlock/jpg lz4convert_test.go:42: input size: 123093 lz4convert_test.go:43: lz4 size: 123522 lz4convert_test.go:60: lz4->snappy size: 123040 lz4convert_test.go:79: lz4->s2 size: 123040 lz4convert_test.go:91: s2 (default) size: 123097 lz4convert_test.go:95: s2 (better) size: 123097 lz4convert_test.go:97: lz4 -> s2 bytes saved: 482 lz4convert_test.go:98: lz4 -> snappy bytes saved: 482 lz4convert_test.go:99: data -> s2 (default) bytes saved: 425 lz4convert_test.go:100: data -> s2 (better) bytes saved: 425 lz4convert_test.go:101: direct data -> s2 (default) compared to converted from lz4: -57 lz4convert_test.go:102: direct data -> s2 (better) compared to converted from lz4: -57 --- PASS: TestLZ4Converter_ConvertBlock/jpg (0.00s) === RUN TestLZ4Converter_ConvertBlock/jpg_200b lz4convert_test.go:42: input size: 200 lz4convert_test.go:43: lz4 size: 155 lz4convert_test.go:60: lz4->snappy size: 153 lz4convert_test.go:79: lz4->s2 size: 153 lz4convert_test.go:91: s2 (default) size: 153 lz4convert_test.go:95: s2 (better) size: 147 lz4convert_test.go:97: lz4 -> s2 bytes saved: 2 lz4convert_test.go:98: lz4 -> snappy bytes saved: 2 lz4convert_test.go:99: data -> s2 (default) bytes saved: 2 lz4convert_test.go:100: data -> s2 (better) bytes saved: 8 lz4convert_test.go:101: direct data -> s2 (default) compared to converted from lz4: 0 lz4convert_test.go:102: direct data -> s2 (better) compared to converted from lz4: 6 --- PASS: TestLZ4Converter_ConvertBlock/jpg_200b (0.00s) === RUN TestLZ4Converter_ConvertBlock/pdf lz4convert_test.go:42: input size: 102400 lz4convert_test.go:43: lz4 size: 83152 lz4convert_test.go:60: lz4->snappy size: 83428 lz4convert_test.go:79: lz4->s2 size: 83016 lz4convert_test.go:91: s2 (default) size: 84199 lz4convert_test.go:95: s2 (better) size: 82884 lz4convert_test.go:97: lz4 -> s2 bytes saved: 136 lz4convert_test.go:98: lz4 -> snappy bytes saved: 
-276 lz4convert_test.go:99: data -> s2 (default) bytes saved: -1047 lz4convert_test.go:100: data -> s2 (better) bytes saved: 268 lz4convert_test.go:101: direct data -> s2 (default) compared to converted from lz4: -1183 lz4convert_test.go:102: direct data -> s2 (better) compared to converted from lz4: 132 --- PASS: TestLZ4Converter_ConvertBlock/pdf (0.00s) === RUN TestLZ4Converter_ConvertBlock/html4 lz4convert_test.go:42: input size: 409600 lz4convert_test.go:43: lz4 size: 81908 lz4convert_test.go:60: lz4->snappy size: 84886 lz4convert_test.go:79: lz4->s2 size: 80068 lz4convert_test.go:91: s2 (default) size: 20867 lz4convert_test.go:95: s2 (better) size: 18979 lz4convert_test.go:97: lz4 -> s2 bytes saved: 1840 lz4convert_test.go:98: lz4 -> snappy bytes saved: -2978 lz4convert_test.go:99: data -> s2 (default) bytes saved: 61041 lz4convert_test.go:100: data -> s2 (better) bytes saved: 62929 lz4convert_test.go:101: direct data -> s2 (default) compared to converted from lz4: 59201 lz4convert_test.go:102: direct data -> s2 (better) compared to converted from lz4: 61089 --- PASS: TestLZ4Converter_ConvertBlock/html4 (0.00s) === RUN TestLZ4Converter_ConvertBlock/txt1 lz4convert_test.go:42: input size: 152089 lz4convert_test.go:43: lz4 size: 79672 lz4convert_test.go:60: lz4->snappy size: 79567 lz4convert_test.go:79: lz4->s2 size: 79566 lz4convert_test.go:91: s2 (default) size: 85931 lz4convert_test.go:95: s2 (better) size: 71608 lz4convert_test.go:97: lz4 -> s2 bytes saved: 106 lz4convert_test.go:98: lz4 -> snappy bytes saved: 105 lz4convert_test.go:99: data -> s2 (default) bytes saved: -6259 lz4convert_test.go:100: data -> s2 (better) bytes saved: 8064 lz4convert_test.go:101: direct data -> s2 (default) compared to converted from lz4: -6365 lz4convert_test.go:102: direct data -> s2 (better) compared to converted from lz4: 7958 --- PASS: TestLZ4Converter_ConvertBlock/txt1 (0.00s) === RUN TestLZ4Converter_ConvertBlock/txt2 lz4convert_test.go:42: input size: 125179 lz4convert_test.go:43: lz4 size: 70801 lz4convert_test.go:60: lz4->snappy size: 72231 lz4convert_test.go:79: lz4->s2 size: 72228 lz4convert_test.go:91: s2 (default) size: 79572 lz4convert_test.go:95: s2 (better) size: 65938 lz4convert_test.go:97: lz4 -> s2 bytes saved: -1427 lz4convert_test.go:98: lz4 -> snappy bytes saved: -1430 lz4convert_test.go:99: data -> s2 (default) bytes saved: -8771 lz4convert_test.go:100: data -> s2 (better) bytes saved: 4863 lz4convert_test.go:101: direct data -> s2 (default) compared to converted from lz4: -7344 lz4convert_test.go:102: direct data -> s2 (better) compared to converted from lz4: 6290 --- PASS: TestLZ4Converter_ConvertBlock/txt2 (0.00s) === RUN TestLZ4Converter_ConvertBlock/txt3 lz4convert_test.go:42: input size: 426754 lz4convert_test.go:43: lz4 size: 207038 lz4convert_test.go:60: lz4->snappy size: 206693 lz4convert_test.go:79: lz4->s2 size: 206654 lz4convert_test.go:91: s2 (default) size: 220380 lz4convert_test.go:95: s2 (better) size: 184936 lz4convert_test.go:97: lz4 -> s2 bytes saved: 384 lz4convert_test.go:98: lz4 -> snappy bytes saved: 345 lz4convert_test.go:99: data -> s2 (default) bytes saved: -13342 lz4convert_test.go:100: data -> s2 (better) bytes saved: 22102 lz4convert_test.go:101: direct data -> s2 (default) compared to converted from lz4: -13726 lz4convert_test.go:102: direct data -> s2 (better) compared to converted from lz4: 21718 --- PASS: TestLZ4Converter_ConvertBlock/txt3 (0.01s) === RUN TestLZ4Converter_ConvertBlock/txt4 lz4convert_test.go:42: input size: 481861 
lz4convert_test.go:43: lz4 size: 277731 lz4convert_test.go:60: lz4->snappy size: 286863 lz4convert_test.go:79: lz4->s2 size: 286856 lz4convert_test.go:91: s2 (default) size: 318193 lz4convert_test.go:95: s2 (better) size: 264987 lz4convert_test.go:97: lz4 -> s2 bytes saved: -9125 lz4convert_test.go:98: lz4 -> snappy bytes saved: -9132 lz4convert_test.go:99: data -> s2 (default) bytes saved: -40462 lz4convert_test.go:100: data -> s2 (better) bytes saved: 12744 lz4convert_test.go:101: direct data -> s2 (default) compared to converted from lz4: -31337 lz4convert_test.go:102: direct data -> s2 (better) compared to converted from lz4: 21869 --- PASS: TestLZ4Converter_ConvertBlock/txt4 (0.01s) === RUN TestLZ4Converter_ConvertBlock/pb lz4convert_test.go:42: input size: 118588 lz4convert_test.go:43: lz4 size: 19003 lz4convert_test.go:60: lz4->snappy size: 21130 lz4convert_test.go:79: lz4->s2 size: 19002 lz4convert_test.go:91: s2 (default) size: 18603 lz4convert_test.go:95: s2 (better) size: 17686 lz4convert_test.go:97: lz4 -> s2 bytes saved: 1 lz4convert_test.go:98: lz4 -> snappy bytes saved: -2127 lz4convert_test.go:99: data -> s2 (default) bytes saved: 400 lz4convert_test.go:100: data -> s2 (better) bytes saved: 1317 lz4convert_test.go:101: direct data -> s2 (default) compared to converted from lz4: 399 lz4convert_test.go:102: direct data -> s2 (better) compared to converted from lz4: 1316 --- PASS: TestLZ4Converter_ConvertBlock/pb (0.00s) === RUN TestLZ4Converter_ConvertBlock/gaviota lz4convert_test.go:42: input size: 184320 lz4convert_test.go:43: lz4 size: 71749 lz4convert_test.go:60: lz4->snappy size: 63392 lz4convert_test.go:79: lz4->s2 size: 62446 lz4convert_test.go:91: s2 (default) size: 65016 lz4convert_test.go:95: s2 (better) size: 55395 lz4convert_test.go:97: lz4 -> s2 bytes saved: 9303 lz4convert_test.go:98: lz4 -> snappy bytes saved: 8357 lz4convert_test.go:99: data -> s2 (default) bytes saved: 6733 lz4convert_test.go:100: data -> s2 (better) bytes saved: 16354 lz4convert_test.go:101: direct data -> s2 (default) compared to converted from lz4: -2570 lz4convert_test.go:102: direct data -> s2 (better) compared to converted from lz4: 7051 --- PASS: TestLZ4Converter_ConvertBlock/gaviota (0.00s) === RUN TestLZ4Converter_ConvertBlock/txt1_128b lz4convert_test.go:42: input size: 128 lz4convert_test.go:43: lz4 size: 84 lz4convert_test.go:60: lz4->snappy size: 84 lz4convert_test.go:79: lz4->s2 size: 84 lz4convert_test.go:91: s2 (default) size: 80 lz4convert_test.go:95: s2 (better) size: 76 lz4convert_test.go:97: lz4 -> s2 bytes saved: 0 lz4convert_test.go:98: lz4 -> snappy bytes saved: 0 lz4convert_test.go:99: data -> s2 (default) bytes saved: 4 lz4convert_test.go:100: data -> s2 (better) bytes saved: 8 lz4convert_test.go:101: direct data -> s2 (default) compared to converted from lz4: 4 lz4convert_test.go:102: direct data -> s2 (better) compared to converted from lz4: 8 --- PASS: TestLZ4Converter_ConvertBlock/txt1_128b (0.00s) === RUN TestLZ4Converter_ConvertBlock/txt1_1000b lz4convert_test.go:42: input size: 1000 lz4convert_test.go:43: lz4 size: 807 lz4convert_test.go:60: lz4->snappy size: 791 lz4convert_test.go:79: lz4->s2 size: 791 lz4convert_test.go:91: s2 (default) size: 772 lz4convert_test.go:95: s2 (better) size: 744 lz4convert_test.go:97: lz4 -> s2 bytes saved: 16 lz4convert_test.go:98: lz4 -> snappy bytes saved: 16 lz4convert_test.go:99: data -> s2 (default) bytes saved: 35 lz4convert_test.go:100: data -> s2 (better) bytes saved: 63 lz4convert_test.go:101: direct data -> s2 (default) 
compared to converted from lz4: 19 lz4convert_test.go:102: direct data -> s2 (better) compared to converted from lz4: 47 --- PASS: TestLZ4Converter_ConvertBlock/txt1_1000b (0.00s) === RUN TestLZ4Converter_ConvertBlock/txt1_10000b lz4convert_test.go:42: input size: 10000 lz4convert_test.go:43: lz4 size: 6969 lz4convert_test.go:60: lz4->snappy size: 6879 lz4convert_test.go:79: lz4->s2 size: 6879 lz4convert_test.go:91: s2 (default) size: 6931 lz4convert_test.go:95: s2 (better) size: 6216 lz4convert_test.go:97: lz4 -> s2 bytes saved: 90 lz4convert_test.go:98: lz4 -> snappy bytes saved: 90 lz4convert_test.go:99: data -> s2 (default) bytes saved: 38 lz4convert_test.go:100: data -> s2 (better) bytes saved: 753 lz4convert_test.go:101: direct data -> s2 (default) compared to converted from lz4: -52 lz4convert_test.go:102: direct data -> s2 (better) compared to converted from lz4: 663 --- PASS: TestLZ4Converter_ConvertBlock/txt1_10000b (0.00s) === RUN TestLZ4Converter_ConvertBlock/txt1_20000b lz4convert_test.go:42: input size: 20000 lz4convert_test.go:43: lz4 size: 12750 lz4convert_test.go:60: lz4->snappy size: 12755 lz4convert_test.go:79: lz4->s2 size: 12755 lz4convert_test.go:91: s2 (default) size: 13513 lz4convert_test.go:95: s2 (better) size: 11489 lz4convert_test.go:97: lz4 -> s2 bytes saved: -5 lz4convert_test.go:98: lz4 -> snappy bytes saved: -5 lz4convert_test.go:99: data -> s2 (default) bytes saved: -763 lz4convert_test.go:100: data -> s2 (better) bytes saved: 1261 lz4convert_test.go:101: direct data -> s2 (default) compared to converted from lz4: -758 lz4convert_test.go:102: direct data -> s2 (better) compared to converted from lz4: 1266 ``` --- huff0/decompress.go | 2 +- internal/lz4ref/LICENSE | 28 + internal/lz4ref/block.go | 433 + internal/lz4ref/errors.go | 9 + s2/_generate/gen.go | 287 +- s2/decode_other.go | 13 + s2/encode_amd64.go | 2 + s2/encode_go.go | 10 + s2/encodeblock_amd64.go | 10 + s2/encodeblock_amd64.s | 14860 +++++++++--------- s2/lz4convert.go | 585 + s2/lz4convert_test.go | 448 + s2/testdata/fuzz/FuzzLZ4Block.zip | Bin 0 -> 203950 bytes s2/testdata/fuzz/lz4-convert-corpus-raw.zip | Bin 0 -> 19780 bytes 14 files changed, 9575 insertions(+), 7112 deletions(-) create mode 100644 internal/lz4ref/LICENSE create mode 100644 internal/lz4ref/block.go create mode 100644 internal/lz4ref/errors.go create mode 100644 s2/lz4convert.go create mode 100644 s2/lz4convert_test.go create mode 100644 s2/testdata/fuzz/FuzzLZ4Block.zip create mode 100644 s2/testdata/fuzz/lz4-convert-corpus-raw.zip diff --git a/huff0/decompress.go b/huff0/decompress.go index 42a237eac4..3c0b398c72 100644 --- a/huff0/decompress.go +++ b/huff0/decompress.go @@ -61,7 +61,7 @@ func ReadTable(in []byte, s *Scratch) (s2 *Scratch, remain []byte, err error) { b, err := fse.Decompress(in[:iSize], s.fse) s.fse.Out = nil if err != nil { - return s, nil, err + return s, nil, fmt.Errorf("fse decompress returned: %w", err) } if len(b) > 255 { return s, nil, errors.New("corrupt input: output table too large") diff --git a/internal/lz4ref/LICENSE b/internal/lz4ref/LICENSE new file mode 100644 index 0000000000..bd899d8353 --- /dev/null +++ b/internal/lz4ref/LICENSE @@ -0,0 +1,28 @@ +Copyright (c) 2015, Pierre Curto +All rights reserved. + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are met: + +* Redistributions of source code must retain the above copyright notice, this + list of conditions and the following disclaimer. 
+ +* Redistributions in binary form must reproduce the above copyright notice, + this list of conditions and the following disclaimer in the documentation + and/or other materials provided with the distribution. + +* Neither the name of xxHash nor the names of its + contributors may be used to endorse or promote products derived from + this software without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE +FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR +SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER +CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, +OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + diff --git a/internal/lz4ref/block.go b/internal/lz4ref/block.go new file mode 100644 index 0000000000..cedcd98bc2 --- /dev/null +++ b/internal/lz4ref/block.go @@ -0,0 +1,433 @@ +package lz4ref + +import ( + "encoding/binary" + "fmt" + "math/bits" + "sync" +) + +const ( + // The following constants are used to setup the compression algorithm. + minMatch = 4 // the minimum size of the match sequence size (4 bytes) + winSizeLog = 16 // LZ4 64Kb window size limit + winSize = 1 << winSizeLog + winMask = winSize - 1 // 64Kb window of previous data for dependent blocks + + // hashLog determines the size of the hash table used to quickly find a previous match position. + // Its value influences the compression speed and memory usage, the lower the faster, + // but at the expense of the compression ratio. + // 16 seems to be the best compromise for fast compression. + hashLog = 16 + htSize = 1 << hashLog + + mfLimit = 10 + minMatch // The last match cannot start within the last 14 bytes. +) + +// blockHash hashes the lower five bytes of x into a value < htSize. +func blockHash(x uint64) uint32 { + const prime6bytes = 227718039650203 + x &= 1<<40 - 1 + return uint32((x * prime6bytes) >> (64 - hashLog)) +} + +func CompressBlockBound(n int) int { + return n + n/255 + 16 +} + +type Compressor struct { + // Offsets are at most 64kiB, so we can store only the lower 16 bits of + // match positions: effectively, an offset from some 64kiB block boundary. + // + // When we retrieve such an offset, we interpret it as relative to the last + // block boundary si &^ 0xffff, or the one before, (si &^ 0xffff) - 0x10000, + // depending on which of these is inside the current window. If a table + // entry was generated more than 64kiB back in the input, we find out by + // inspecting the input stream. + table [htSize]uint16 + + // Bitmap indicating which positions in the table are in use. + // This allows us to quickly reset the table for reuse, + // without having to zero everything. + inUse [htSize / 32]uint32 +} + +// Get returns the position of a presumptive match for the hash h. +// The match may be a false positive due to a hash collision or an old entry. +// If si < winSize, the return value may be negative. 
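+// For example, with si = 0x21234 and table[h] = 0x9abc the candidate becomes
+// 0x29abc, which is ahead of si, so winSize is subtracted and 0x19abc (inside
+// the previous 64 KiB block) is returned. The caller must still verify that
+// the bytes at that position actually match.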
+func (c *Compressor) get(h uint32, si int) int { + h &= htSize - 1 + i := 0 + if c.inUse[h/32]&(1<<(h%32)) != 0 { + i = int(c.table[h]) + } + i += si &^ winMask + if i >= si { + // Try previous 64kiB block (negative when in first block). + i -= winSize + } + return i +} + +func (c *Compressor) put(h uint32, si int) { + h &= htSize - 1 + c.table[h] = uint16(si) + c.inUse[h/32] |= 1 << (h % 32) +} + +func (c *Compressor) reset() { c.inUse = [htSize / 32]uint32{} } + +var compressorPool = sync.Pool{New: func() interface{} { return new(Compressor) }} + +func CompressBlock(src, dst []byte) (int, error) { + c := compressorPool.Get().(*Compressor) + n, err := c.CompressBlock(src, dst) + compressorPool.Put(c) + return n, err +} + +func (c *Compressor) CompressBlock(src, dst []byte) (int, error) { + // Zero out reused table to avoid non-deterministic output (issue #65). + c.reset() + + const debug = false + + if debug { + fmt.Printf("lz4 block start: len(src): %d, len(dst):%d \n", len(src), len(dst)) + } + + // Return 0, nil only if the destination buffer size is < CompressBlockBound. + isNotCompressible := len(dst) < CompressBlockBound(len(src)) + + // adaptSkipLog sets how quickly the compressor begins skipping blocks when data is incompressible. + // This significantly speeds up incompressible data and usually has very small impact on compression. + // bytes to skip = 1 + (bytes since last match >> adaptSkipLog) + const adaptSkipLog = 7 + + // si: Current position of the search. + // anchor: Position of the current literals. + var si, di, anchor int + sn := len(src) - mfLimit + if sn <= 0 { + goto lastLiterals + } + + // Fast scan strategy: the hash table only stores the last five-byte sequences. + for si < sn { + // Hash the next five bytes (sequence)... + match := binary.LittleEndian.Uint64(src[si:]) + h := blockHash(match) + h2 := blockHash(match >> 8) + + // We check a match at s, s+1 and s+2 and pick the first one we get. + // Checking 3 only requires us to load the source one. + ref := c.get(h, si) + ref2 := c.get(h2, si+1) + c.put(h, si) + c.put(h2, si+1) + + offset := si - ref + + if offset <= 0 || offset >= winSize || uint32(match) != binary.LittleEndian.Uint32(src[ref:]) { + // No match. Start calculating another hash. + // The processor can usually do this out-of-order. + h = blockHash(match >> 16) + ref3 := c.get(h, si+2) + + // Check the second match at si+1 + si += 1 + offset = si - ref2 + + if offset <= 0 || offset >= winSize || uint32(match>>8) != binary.LittleEndian.Uint32(src[ref2:]) { + // No match. Check the third match at si+2 + si += 1 + offset = si - ref3 + c.put(h, si) + + if offset <= 0 || offset >= winSize || uint32(match>>16) != binary.LittleEndian.Uint32(src[ref3:]) { + // Skip one extra byte (at si+3) before we check 3 matches again. + si += 2 + (si-anchor)>>adaptSkipLog + continue + } + } + } + + // Match found. + lLen := si - anchor // Literal length. + // We already matched 4 bytes. + mLen := 4 + + // Extend backwards if we can, reducing literals. + tOff := si - offset - 1 + for lLen > 0 && tOff >= 0 && src[si-1] == src[tOff] { + si-- + tOff-- + lLen-- + mLen++ + } + + // Add the match length, so we continue search at the end. + // Use mLen to store the offset base. + si, mLen = si+mLen, si+minMatch + + // Find the longest match by looking by batches of 8 bytes. + for si+8 <= sn { + x := binary.LittleEndian.Uint64(src[si:]) ^ binary.LittleEndian.Uint64(src[si-offset:]) + if x == 0 { + si += 8 + } else { + // Stop is first non-zero byte. 
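+ // TrailingZeros64(x)>>3 counts the low-order bytes that still match
+ // before the first differing byte in this 8-byte group.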
+ si += bits.TrailingZeros64(x) >> 3 + break + } + } + + mLen = si - mLen + if di >= len(dst) { + return 0, ErrInvalidSourceShortBuffer + } + if mLen < 0xF { + dst[di] = byte(mLen) + } else { + dst[di] = 0xF + } + + // Encode literals length. + if debug { + fmt.Printf("emit %d literals\n", lLen) + } + if lLen < 0xF { + dst[di] |= byte(lLen << 4) + } else { + dst[di] |= 0xF0 + di++ + l := lLen - 0xF + for ; l >= 0xFF && di < len(dst); l -= 0xFF { + dst[di] = 0xFF + di++ + } + if di >= len(dst) { + return 0, ErrInvalidSourceShortBuffer + } + dst[di] = byte(l) + } + di++ + + // Literals. + if di+lLen > len(dst) { + return 0, ErrInvalidSourceShortBuffer + } + copy(dst[di:di+lLen], src[anchor:anchor+lLen]) + di += lLen + 2 + anchor = si + + // Encode offset. + if debug { + fmt.Printf("emit copy, length: %d, offset: %d\n", mLen+minMatch, offset) + } + if di > len(dst) { + return 0, ErrInvalidSourceShortBuffer + } + dst[di-2], dst[di-1] = byte(offset), byte(offset>>8) + + // Encode match length part 2. + if mLen >= 0xF { + for mLen -= 0xF; mLen >= 0xFF && di < len(dst); mLen -= 0xFF { + dst[di] = 0xFF + di++ + } + if di >= len(dst) { + return 0, ErrInvalidSourceShortBuffer + } + dst[di] = byte(mLen) + di++ + } + // Check if we can load next values. + if si >= sn { + break + } + // Hash match end-2 + h = blockHash(binary.LittleEndian.Uint64(src[si-2:])) + c.put(h, si-2) + } + +lastLiterals: + if isNotCompressible && anchor == 0 { + // Incompressible. + return 0, nil + } + + // Last literals. + if di >= len(dst) { + return 0, ErrInvalidSourceShortBuffer + } + lLen := len(src) - anchor + if lLen < 0xF { + dst[di] = byte(lLen << 4) + } else { + dst[di] = 0xF0 + di++ + for lLen -= 0xF; lLen >= 0xFF && di < len(dst); lLen -= 0xFF { + dst[di] = 0xFF + di++ + } + if di >= len(dst) { + return 0, ErrInvalidSourceShortBuffer + } + dst[di] = byte(lLen) + } + di++ + + // Write the last literals. + if isNotCompressible && di >= anchor { + // Incompressible. + return 0, nil + } + if di+len(src)-anchor > len(dst) { + return 0, ErrInvalidSourceShortBuffer + } + di += copy(dst[di:di+len(src)-anchor], src[anchor:]) + return di, nil +} + +func UncompressBlock(dst, src []byte) (ret int) { + // Restrict capacities so we don't read or write out of bounds. + dst = dst[:len(dst):len(dst)] + src = src[:len(src):len(src)] + + const debug = false + + const hasError = -2 + + if len(src) == 0 { + return hasError + } + + defer func() { + if r := recover(); r != nil { + if debug { + fmt.Println("recover:", r) + } + ret = hasError + } + }() + + var si, di uint + for { + if si >= uint(len(src)) { + return hasError + } + // Literals and match lengths (token). + b := uint(src[si]) + si++ + + // Literals. + if lLen := b >> 4; lLen > 0 { + switch { + case lLen < 0xF && si+16 < uint(len(src)): + // Shortcut 1 + // if we have enough room in src and dst, and the literals length + // is small enough (0..14) then copy all 16 bytes, even if not all + // are part of the literals. + copy(dst[di:], src[si:si+16]) + si += lLen + di += lLen + if debug { + fmt.Println("ll:", lLen) + } + if mLen := b & 0xF; mLen < 0xF { + // Shortcut 2 + // if the match length (4..18) fits within the literals, then copy + // all 18 bytes, even if not all are part of the literals. + mLen += 4 + if offset := u16(src[si:]); mLen <= offset && offset < di { + i := di - offset + // The remaining buffer may not hold 18 bytes. + // See https://github.com/pierrec/lz4/issues/51. 
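+ // If dst cannot hold the full 18-byte copy, skip the shortcut and
+ // fall through to the generic match copy below.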
+ if end := i + 18; end <= uint(len(dst)) { + copy(dst[di:], dst[i:end]) + si += 2 + di += mLen + continue + } + } + } + case lLen == 0xF: + for { + x := uint(src[si]) + if lLen += x; int(lLen) < 0 { + if debug { + fmt.Println("int(lLen) < 0") + } + return hasError + } + si++ + if x != 0xFF { + break + } + } + fallthrough + default: + copy(dst[di:di+lLen], src[si:si+lLen]) + si += lLen + di += lLen + if debug { + fmt.Println("ll:", lLen) + } + + } + } + + mLen := b & 0xF + if si == uint(len(src)) && mLen == 0 { + break + } else if si >= uint(len(src))-2 { + return hasError + } + + offset := u16(src[si:]) + if offset == 0 { + return hasError + } + si += 2 + + // Match. + mLen += minMatch + if mLen == minMatch+0xF { + for { + x := uint(src[si]) + if mLen += x; int(mLen) < 0 { + return hasError + } + si++ + if x != 0xFF { + break + } + } + } + if debug { + fmt.Println("ml:", mLen, "offset:", offset) + } + + // Copy the match. + if di < offset { + return hasError + } + + expanded := dst[di-offset:] + if mLen > offset { + // Efficiently copy the match dst[di-offset:di] into the dst slice. + bytesToCopy := offset * (mLen / offset) + for n := offset; n <= bytesToCopy+offset; n *= 2 { + copy(expanded[n:], expanded[:n]) + } + di += bytesToCopy + mLen -= bytesToCopy + } + di += uint(copy(dst[di:di+mLen], expanded[:mLen])) + } + + return int(di) +} + +func u16(p []byte) uint { return uint(binary.LittleEndian.Uint16(p)) } diff --git a/internal/lz4ref/errors.go b/internal/lz4ref/errors.go new file mode 100644 index 0000000000..b10c5e1b53 --- /dev/null +++ b/internal/lz4ref/errors.go @@ -0,0 +1,9 @@ +package lz4ref + +type Error string + +func (e Error) Error() string { return string(e) } + +const ( + ErrInvalidSourceShortBuffer Error = "lz4: invalid source or destination buffer too short" +) diff --git a/s2/_generate/gen.go b/s2/_generate/gen.go index 19120783ad..983fae77ab 100644 --- a/s2/_generate/gen.go +++ b/s2/_generate/gen.go @@ -8,7 +8,9 @@ import ( "flag" "fmt" "math" + "math/rand" "runtime" + "strings" . "github.com/mmcloughlin/avo/build" "github.com/mmcloughlin/avo/buildtags" @@ -86,6 +88,7 @@ func main() { o.snappy = false o.outputMargin = 0 o.maxLen = math.MaxUint32 + o.maxOffset = math.MaxUint32 - 1 o.genEmitLiteral() o.genEmitRepeat() o.genEmitCopy() @@ -93,6 +96,10 @@ func main() { o.genEmitCopyNoRepeat() o.snappy = false o.genMatchLen() + o.cvtLZ4BlockAsm() + o.snappy = true + o.cvtLZ4BlockAsm() + Generate() } @@ -133,6 +140,7 @@ type options struct { bmi1 bool bmi2 bool maxLen int + maxOffset int outputMargin int // Should be at least 5. maxSkip int } @@ -145,6 +153,7 @@ func (o options) genEncodeBlockAsm(name string, tableBits, skipLog, hashBytes, m Pragma("noescape") o.maxLen = maxLen + o.maxOffset = maxLen - 1 var literalMaxOverhead = maxLitOverheadFor(maxLen) var tableSize = 4 * (1 << tableBits) @@ -446,7 +455,7 @@ func (o options) genEncodeBlockAsm(name string, tableBits, skipLog, hashBytes, m // Emit as copy instead... 
Label("repeat_as_copy_" + name) } - o.emitCopy("repeat_as_copy_"+name, length, offsetVal, nil, dst, LabelRef("repeat_end_emit_"+name)) + o.emitCopy("repeat_as_copy_"+name, length, offsetVal, nil, dst, nil, LabelRef("repeat_end_emit_"+name)) Label("repeat_end_emit_" + name) // Store new dst and nextEmit @@ -648,7 +657,7 @@ func (o options) genEncodeBlockAsm(name string, tableBits, skipLog, hashBytes, m // length += 4 ADDL(U8(4), length.As32()) MOVL(s, nextEmitL) // nextEmit = s - o.emitCopy("match_nolit_"+name, length, offset, nil, dst, LabelRef("match_nolit_emitcopy_end_"+name)) + o.emitCopy("match_nolit_"+name, length, offset, nil, dst, nil, LabelRef("match_nolit_emitcopy_end_"+name)) Label("match_nolit_emitcopy_end_" + name) // if s >= sLimit { end } @@ -799,6 +808,7 @@ func (o options) genEncodeBetterBlockAsm(name string, lTableBits, sTableBits, sk const sHashBytes = 4 o.maxLen = maxLen + o.maxOffset = maxLen - 1 var lTableSize = 4 * (1 << lTableBits) var sTableSize = 4 * (1 << sTableBits) @@ -1295,7 +1305,7 @@ func (o options) genEncodeBetterBlockAsm(name string, lTableBits, sTableBits, sk // NOT REPEAT { // Check if match is better.. - if o.maxLen > 65535 { + if o.maxOffset > 65535 { CMPL(length.As32(), U8(1)) JG(LabelRef("match_length_ok_" + name)) CMPL(offset32, U32(65535)) @@ -1316,7 +1326,7 @@ func (o options) genEncodeBetterBlockAsm(name string, lTableBits, sTableBits, sk // length += 4 ADDL(U8(4), length.As32()) MOVL(s, nextEmitL) // nextEmit = s - o.emitCopy("match_nolit_"+name, length, offset, nil, dst, LabelRef("match_nolit_emitcopy_end_"+name)) + o.emitCopy("match_nolit_"+name, length, offset, nil, dst, nil, LabelRef("match_nolit_emitcopy_end_"+name)) // Jumps at end } // REPEAT @@ -1679,7 +1689,7 @@ func (o options) genEmitLiteral() { // stack must have at least 32 bytes. // retval will contain emitted bytes, but can be nil if this is not interesting. // dstBase and litBase are updated. -// Uses 2 GP registers. With AVX 4 registers. +// Uses 2 GP registers. // If updateDst is true dstBase will have the updated end pointer and an additional register will be used. func (o options) emitLiteral(name string, litLen, retval, dstBase, litBase reg.GPVirtual, end LabelRef, updateDst bool) { n := GP32() @@ -1891,8 +1901,7 @@ func (o options) emitRepeat(name string, length reg.GPVirtual, offset reg.GPVirt // We have have more than 24 bits // Emit so we have at least 4 bytes left. 
LEAL(Mem{Base: length, Disp: -(maxRepeat - 4)}, length.As32()) // length -= (maxRepeat - 4) - MOVW(U16(7<<2|tagCopy1), Mem{Base: dstBase}) // dst[0] = 7<<2 | tagCopy1, dst[1] = 0 - MOVW(U16(65531), Mem{Base: dstBase, Disp: 2}) // 0xfffb + MOVL(U32(7<<2|tagCopy1|(65531<<16)), Mem{Base: dstBase}) // dst[0] = 7<<2 | tagCopy1, dst[1] = 0 dst[2+3] = 65531 MOVB(U8(255), Mem{Base: dstBase, Disp: 4}) ADDQ(U8(5), dstBase) if retval != nil { @@ -1995,7 +2004,7 @@ func (o options) genEmitCopy() { Load(Param("dst").Base(), dstBase) Load(Param("offset"), offset) Load(Param("length"), length) - o.emitCopy("standalone", length, offset, retval, dstBase, LabelRef("gen_emit_copy_end")) + o.emitCopy("standalone", length, offset, retval, dstBase, nil, LabelRef("gen_emit_copy_end")) Label("gen_emit_copy_end") Store(retval, ReturnIndex(0)) RET() @@ -2026,7 +2035,7 @@ func (o options) genEmitCopyNoRepeat() { Load(Param("dst").Base(), dstBase) Load(Param("offset"), offset) Load(Param("length"), length) - o.emitCopy("standalone_snappy", length, offset, retval, dstBase, "gen_emit_copy_end_snappy") + o.emitCopy("standalone_snappy", length, offset, retval, dstBase, nil, "gen_emit_copy_end_snappy") Label("gen_emit_copy_end_snappy") Store(retval, ReturnIndex(0)) RET() @@ -2044,10 +2053,19 @@ const ( // retval can be nil. // Will jump to end label when finished. // Uses 2 GP registers. -func (o options) emitCopy(name string, length, offset, retval, dstBase reg.GPVirtual, end LabelRef) { +// dstLimit is optional. If set each iteration will check if dstLimit is reached. +// If so, it will jump to "end" without emitting everything. +func (o options) emitCopy(name string, length, offset, retval, dstBase, dstLimit reg.GPVirtual, end LabelRef) { Comment("emitCopy") - if o.maxLen >= 65536 { + checkDst := func(reg reg.GPVirtual) { + if dstLimit != nil { + CMPQ(reg, dstLimit) + JAE(end) + } + } + + if o.maxOffset >= 65536 { //if offset >= 65536 { CMPL(offset.As32(), U32(65536)) JL(LabelRef("two_byte_offset_" + name)) @@ -2082,8 +2100,10 @@ func (o options) emitCopy(name string, length, offset, retval, dstBase reg.GPVir // Inline call to emitRepeat. Will jump to end if !o.snappy { o.emitRepeat(name+"_emit_copy", length, offset, retval, dstBase, end, false) + } else { + checkDst(dstBase) + JMP(LabelRef("four_bytes_loop_back_" + name)) } - JMP(LabelRef("four_bytes_loop_back_" + name)) Label("four_bytes_remain_" + name) // if length == 0 { @@ -2099,9 +2119,9 @@ func (o options) emitCopy(name string, length, offset, retval, dstBase reg.GPVir // dst[i+3] = uint8(offset >> 16) // dst[i+4] = uint8(offset >> 24) tmp := GP64() - MOVB(U8(tagCopy4), tmp.As8()) + XORL(tmp.As32(), tmp.As32()) // Use displacement to subtract 1 from upshifted length. - LEAL(Mem{Base: tmp, Disp: -(1 << 2), Index: length, Scale: 4}, length.As32()) + LEAL(Mem{Base: tmp, Disp: -(1 << 2) | tagCopy4, Index: length, Scale: 4}, length.As32()) MOVB(length.As8(), Mem{Base: dstBase}) MOVL(offset.As32(), Mem{Base: dstBase, Disp: 1}) // return i + 5 @@ -2168,29 +2188,35 @@ func (o options) emitCopy(name string, length, offset, retval, dstBase reg.GPVir // Inline call to emitRepeat. Will jump to end if !o.snappy { o.emitRepeat(name+"_emit_copy_short", length, offset, retval, dstBase, end, false) + } else { + checkDst(dstBase) + JMP(LabelRef("two_byte_offset_" + name)) } - JMP(LabelRef("two_byte_offset_" + name)) Label("two_byte_offset_short_" + name) + + // Create a length * 4 as early as possible. 
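+ // length*4 feeds both the tagCopy1 and tagCopy2 encodings below, so the tag
+ // bits and the length bias can be folded into a single LEAL displacement.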
+ length4 := GP32() + MOVL(length.As32(), length4) + SHLL(U8(2), length4) + //if length >= 12 || offset >= 2048 { CMPL(length.As32(), U8(12)) JGE(LabelRef("emit_copy_three_" + name)) - if o.maxLen >= 2048 { + if o.maxOffset >= 2048 { CMPL(offset.As32(), U32(2048)) JGE(LabelRef("emit_copy_three_" + name)) } // Emit the remaining copy, encoded as 2 bytes. // dst[1] = uint8(offset) // dst[0] = uint8(offset>>8)<<5 | uint8(length-4)<<2 | tagCopy1 - tmp := GP64() - MOVB(U8(tagCopy1), tmp.As8()) // Use scale and displacement to shift and subtract values from length. - LEAL(Mem{Base: tmp, Index: length, Scale: 4, Disp: -(4 << 2)}, length.As32()) + LEAL(Mem{Base: length4, Disp: -(4 << 2) | tagCopy1}, length4.As32()) MOVB(offset.As8(), Mem{Base: dstBase, Disp: 1}) // Store offset lower byte SHRL(U8(8), offset.As32()) // Remove lower SHLL(U8(5), offset.As32()) // Shift back up - ORL(offset.As32(), length.As32()) // OR result - MOVB(length.As8(), Mem{Base: dstBase, Disp: 0}) + ORL(offset.As32(), length4.As32()) // OR result + MOVB(length4.As8(), Mem{Base: dstBase, Disp: 0}) if retval != nil { ADDQ(U8(2), retval) // i += 2 } @@ -2203,10 +2229,9 @@ func (o options) emitCopy(name string, length, offset, retval, dstBase reg.GPVir // dst[2] = uint8(offset >> 8) // dst[1] = uint8(offset) // dst[0] = uint8(length-1)<<2 | tagCopy2 - tmp = GP64() - MOVB(U8(tagCopy2), tmp.As8()) - LEAL(Mem{Base: tmp, Disp: -(1 << 2), Index: length, Scale: 4}, length.As32()) - MOVB(length.As8(), Mem{Base: dstBase}) + + LEAL(Mem{Base: length4, Disp: -(1 << 2) | tagCopy2}, length4.As32()) + MOVB(length4.As8(), Mem{Base: dstBase}) MOVW(offset.As16(), Mem{Base: dstBase, Disp: 1}) // return 3 if retval != nil { @@ -2768,3 +2793,215 @@ func (o options) matchLenAlt(name string, a, b, len reg.GPVirtual, end LabelRef) JMP(end) return matched } + +func (o options) cvtLZ4BlockAsm() { + snap := "Asm" + name := "lz4_s2_" + if o.snappy { + snap = "SnappyAsm" + name = "lz4_snappy_" + } + TEXT("cvtLZ4Block"+snap, NOSPLIT, "func(dst, src []byte) (uncompressed int, dstUsed int)") + Doc("cvtLZ4Block converts an LZ4 block to S2", "") + Pragma("noescape") + o.outputMargin = 10 + o.maxOffset = math.MaxUint16 + + const ( + errCorrupt = -1 + errDstTooSmall = -2 + ) + dst, dstLen, src, srcLen, retval := GP64(), GP64(), GP64(), GP64(), GP64() + + // retval = 0 + XORQ(retval, retval) + + Load(Param("dst").Base(), dst) + Load(Param("dst").Len(), dstLen) + Load(Param("src").Base(), src) + Load(Param("src").Len(), srcLen) + srcEnd, dstEnd := GP64(), GP64() + LEAQ(Mem{Base: src, Index: srcLen, Scale: 1, Disp: 0}, srcEnd) + LEAQ(Mem{Base: dst, Index: dstLen, Scale: 1, Disp: -o.outputMargin}, dstEnd) + lastOffset := GP64() + if !o.snappy { + XORQ(lastOffset, lastOffset) + } + + checkSrc := func(reg reg.GPVirtual) { + if debug { + name := fmt.Sprintf(name+"ok_%d", rand.Int31()) + + CMPQ(reg, srcEnd) + JB(LabelRef(name)) + JMP(LabelRef(name + "corrupt")) + Label(name) + } else { + CMPQ(reg, srcEnd) + JAE(LabelRef(name + "corrupt")) + } + } + checkDst := func(reg reg.GPVirtual) { + CMPQ(reg, dstEnd) + JAE(LabelRef(name + "dstfull")) + } + + const lz4MinMatch = 4 + + Label(name + "loop") + checkSrc(src) + checkDst(dst) + token := GP64() + MOVBQZX(Mem{Base: src}, token) + ll, ml := GP64(), GP64() + MOVQ(token, ll) + MOVQ(token, ml) + SHRQ(U8(4), ll) + ANDQ(U8(0xf), ml) + + // If upper nibble is 15, literal length is extended + { + CMPQ(token, U8(0xf0)) + JB(LabelRef(name + "ll_end")) + Label(name + "ll_loop") + INCQ(src) // s++ + checkSrc(src) + val := GP64() + 
MOVBQZX(Mem{Base: src}, val) + ADDQ(val, ll) + CMPQ(val, U8(255)) + JEQ(LabelRef(name + "ll_loop")) + Label(name + "ll_end") + } + + // if s+ll >= len(src) + endLits := GP64() + LEAQ(Mem{Base: src, Index: ll, Scale: 1}, endLits) + ADDQ(U8(lz4MinMatch), ml) + checkSrc(endLits) + INCQ(src) // s++ + INCQ(endLits) + TESTQ(ll, ll) + JZ(LabelRef(name + "lits_done")) + { + dstEnd := GP64() + LEAQ(Mem{Base: dst, Index: ll, Scale: 1}, dstEnd) + checkDst(dstEnd) + o.outputMargin++ + ADDQ(ll, retval) + o.emitLiteral(strings.TrimRight(name, "_"), ll, nil, dst, src, LabelRef(name+"lits_emit_done"), true) + o.outputMargin-- + Label(name + "lits_emit_done") + MOVQ(endLits, src) + } + Label(name + "lits_done") + // if s == len(src) && ml == lz4MinMatch + CMPQ(src, srcEnd) + JNE(LabelRef(name + "match")) + CMPQ(ml, U8(lz4MinMatch)) + JEQ(LabelRef(name + "done")) + JMP(LabelRef(name + "corrupt")) + + Label(name + "match") + // if s >= len(src)-2 { + end := GP64() + LEAQ(Mem{Base: src, Disp: 2}, end) + checkSrc(end) + offset := GP64() + MOVWQZX(Mem{Base: src}, offset) + MOVQ(end, src) // s = s + 2 + + if debug { + // if offset == 0 { + TESTQ(offset, offset) + JNZ(LabelRef(name + "c1")) + JMP(LabelRef(name + "corrupt")) + + Label(name + "c1") + + // if int(offset) > uncompressed { + CMPQ(offset, retval) + JB(LabelRef(name + "c2")) + JMP(LabelRef(name + "corrupt")) + + Label(name + "c2") + + } else { + // if offset == 0 { + TESTQ(offset, offset) + JZ(LabelRef(name + "corrupt")) + + // if int(offset) > uncompressed { + CMPQ(offset, retval) + JA(LabelRef(name + "corrupt")) + } + + // if ml == lz4MinMatch+15 { + { + CMPQ(ml, U8(lz4MinMatch+15)) + JNE(LabelRef(name + "ml_done")) + + Label(name + "ml_loop") + val := GP64() + MOVBQZX(Mem{Base: src}, val) + INCQ(src) // s++ + ADDQ(val, ml) // ml += val + checkSrc(src) + CMPQ(val, U8(255)) + JEQ(LabelRef(name + "ml_loop")) + } + Label(name + "ml_done") + + // uncompressed += ml + ADDQ(ml, retval) + if !o.snappy { + CMPQ(offset, lastOffset) + JNE(LabelRef(name + "docopy")) + // Offsets can only be 16 bits + { + // emitRepeat16(dst[d:], offset, ml) + o.emitRepeat("lz4_s2", ml, offset, nil, dst, LabelRef(name+"loop"), false) + } + } + Label(name + "docopy") + { + // emitCopy16(dst[d:], offset, ml) + if !o.snappy { + MOVQ(offset, lastOffset) + } + o.emitCopy("lz4_s2", ml, offset, nil, dst, dstEnd, LabelRef(name+"loop")) + } + + Label(name + "done") + { + tmp := GP64() + Load(Param("dst").Base(), tmp) + SUBQ(tmp, dst) + Store(retval, ReturnIndex(0)) + Store(dst, ReturnIndex(1)) + RET() + } + Label(name + "corrupt") + { + tmp := GP64() + if debug { + tmp := GP64() + Load(Param("dst").Base(), tmp) + SUBQ(tmp, dst) + Store(dst, ReturnIndex(1)) + } + XORQ(tmp, tmp) + LEAQ(Mem{Base: tmp, Disp: errCorrupt}, retval) + Store(retval, ReturnIndex(0)) + RET() + } + + Label(name + "dstfull") + { + tmp := GP64() + XORQ(tmp, tmp) + LEAQ(Mem{Base: tmp, Disp: errDstTooSmall}, retval) + Store(retval, ReturnIndex(0)) + RET() + } +} diff --git a/s2/decode_other.go b/s2/decode_other.go index 11300c3a81..2cb55c2c77 100644 --- a/s2/decode_other.go +++ b/s2/decode_other.go @@ -57,6 +57,9 @@ func s2Decode(dst, src []byte) int { } length = int(x) + 1 if length > len(dst)-d || length > len(src)-s || (strconv.IntSize == 32 && length <= 0) { + if debug { + fmt.Println("corrupt: lit size", length) + } return decodeErrCodeCorrupt } if debug { @@ -109,6 +112,10 @@ func s2Decode(dst, src []byte) int { } if offset <= 0 || d < offset || length > len(dst)-d { + if debug { + fmt.Println("corrupt: match, 
length", length, "offset:", offset, "dst avail:", len(dst)-d, "dst pos:", d) + } + return decodeErrCodeCorrupt } @@ -175,6 +182,9 @@ func s2Decode(dst, src []byte) int { } length = int(x) + 1 if length > len(dst)-d || length > len(src)-s || (strconv.IntSize == 32 && length <= 0) { + if debug { + fmt.Println("corrupt: lit size", length) + } return decodeErrCodeCorrupt } if debug { @@ -241,6 +251,9 @@ func s2Decode(dst, src []byte) int { } if offset <= 0 || d < offset || length > len(dst)-d { + if debug { + fmt.Println("corrupt: match, length", length, "offset:", offset, "dst avail:", len(dst)-d, "dst pos:", d) + } return decodeErrCodeCorrupt } diff --git a/s2/encode_amd64.go b/s2/encode_amd64.go index 6b93daa5ae..ebc332ad5f 100644 --- a/s2/encode_amd64.go +++ b/s2/encode_amd64.go @@ -3,6 +3,8 @@ package s2 +const hasAmd64Asm = true + // encodeBlock encodes a non-empty src to a guaranteed-large-enough dst. It // assumes that the varint-encoded length of the decompressed bytes has already // been written. diff --git a/s2/encode_go.go b/s2/encode_go.go index db08fc355e..46b7b8707d 100644 --- a/s2/encode_go.go +++ b/s2/encode_go.go @@ -7,6 +7,8 @@ import ( "math/bits" ) +const hasAmd64Asm = false + // encodeBlock encodes a non-empty src to a guaranteed-large-enough dst. It // assumes that the varint-encoded length of the decompressed bytes has already // been written. @@ -312,3 +314,11 @@ func matchLen(a []byte, b []byte) int { } return len(a) + checked } + +func cvtLZ4BlockAsm(dst []byte, src []byte) (uncompressed int, dstUsed int) { + panic("cvtLZ4BlockAsm should be unreachable") +} + +func cvtLZ4BlockSnappyAsm(dst []byte, src []byte) (uncompressed int, dstUsed int) { + panic("cvtLZ4BlockSnappyAsm should be unreachable") +} diff --git a/s2/encodeblock_amd64.go b/s2/encodeblock_amd64.go index 7e00bac3ea..b41bceec56 100644 --- a/s2/encodeblock_amd64.go +++ b/s2/encodeblock_amd64.go @@ -192,3 +192,13 @@ func emitCopyNoRepeat(dst []byte, offset int, length int) int // //go:noescape func matchLen(a []byte, b []byte) int + +// cvtLZ4Block converts an LZ4 block to S2 +// +//go:noescape +func cvtLZ4BlockAsm(dst []byte, src []byte) (uncompressed int, dstUsed int) + +// cvtLZ4Block converts an LZ4 block to S2 +// +//go:noescape +func cvtLZ4BlockSnappyAsm(dst []byte, src []byte) (uncompressed int, dstUsed int) diff --git a/s2/encodeblock_amd64.s b/s2/encodeblock_amd64.s index 81a487d6de..24727d2a70 100644 --- a/s2/encodeblock_amd64.s +++ b/s2/encodeblock_amd64.s @@ -36,8 +36,8 @@ zero_loop_encodeBlockAsm: MOVL $0x00000000, 12(SP) MOVQ src_len+32(FP), CX LEAQ -9(CX), DX - LEAQ -8(CX), SI - MOVL SI, 8(SP) + LEAQ -8(CX), BX + MOVL BX, 8(SP) SHRQ $0x05, CX SUBL CX, DX LEAQ (AX)(DX*1), DX @@ -47,609 +47,601 @@ zero_loop_encodeBlockAsm: MOVQ src_base+24(FP), DX search_loop_encodeBlockAsm: - MOVL CX, SI - SUBL 12(SP), SI - SHRL $0x06, SI - LEAL 4(CX)(SI*1), SI - CMPL SI, 8(SP) + MOVL CX, BX + SUBL 12(SP), BX + SHRL $0x06, BX + LEAL 4(CX)(BX*1), BX + CMPL BX, 8(SP) JGE emit_remainder_encodeBlockAsm - MOVQ (DX)(CX*1), DI - MOVL SI, 20(SP) - MOVQ $0x0000cf1bbcdcbf9b, R9 - MOVQ DI, R10 - MOVQ DI, R11 - SHRQ $0x08, R11 - SHLQ $0x10, R10 - IMULQ R9, R10 - SHRQ $0x32, R10 - SHLQ $0x10, R11 - IMULQ R9, R11 - SHRQ $0x32, R11 - MOVL 24(SP)(R10*4), SI - MOVL 24(SP)(R11*4), R8 - MOVL CX, 24(SP)(R10*4) - LEAL 1(CX), R10 - MOVL R10, 24(SP)(R11*4) - MOVQ DI, R10 - SHRQ $0x10, R10 + MOVQ (DX)(CX*1), SI + MOVL BX, 20(SP) + MOVQ $0x0000cf1bbcdcbf9b, R8 + MOVQ SI, R9 + MOVQ SI, R10 + SHRQ $0x08, R10 + SHLQ $0x10, R9 + IMULQ R8, 
R9 + SHRQ $0x32, R9 SHLQ $0x10, R10 - IMULQ R9, R10 + IMULQ R8, R10 SHRQ $0x32, R10 - MOVL CX, R9 - SUBL 16(SP), R9 - MOVL 1(DX)(R9*1), R11 - MOVQ DI, R9 - SHRQ $0x08, R9 - CMPL R9, R11 + MOVL 24(SP)(R9*4), BX + MOVL 24(SP)(R10*4), DI + MOVL CX, 24(SP)(R9*4) + LEAL 1(CX), R9 + MOVL R9, 24(SP)(R10*4) + MOVQ SI, R9 + SHRQ $0x10, R9 + SHLQ $0x10, R9 + IMULQ R8, R9 + SHRQ $0x32, R9 + MOVL CX, R8 + SUBL 16(SP), R8 + MOVL 1(DX)(R8*1), R10 + MOVQ SI, R8 + SHRQ $0x08, R8 + CMPL R8, R10 JNE no_repeat_found_encodeBlockAsm - LEAL 1(CX), DI - MOVL 12(SP), R8 - MOVL DI, SI - SUBL 16(SP), SI + LEAL 1(CX), SI + MOVL 12(SP), DI + MOVL SI, BX + SUBL 16(SP), BX JZ repeat_extend_back_end_encodeBlockAsm repeat_extend_back_loop_encodeBlockAsm: - CMPL DI, R8 + CMPL SI, DI JLE repeat_extend_back_end_encodeBlockAsm - MOVB -1(DX)(SI*1), BL - MOVB -1(DX)(DI*1), R9 - CMPB BL, R9 + MOVB -1(DX)(BX*1), R8 + MOVB -1(DX)(SI*1), R9 + CMPB R8, R9 JNE repeat_extend_back_end_encodeBlockAsm - LEAL -1(DI), DI - DECL SI + LEAL -1(SI), SI + DECL BX JNZ repeat_extend_back_loop_encodeBlockAsm repeat_extend_back_end_encodeBlockAsm: - MOVL 12(SP), SI - CMPL SI, DI + MOVL 12(SP), BX + CMPL BX, SI JEQ emit_literal_done_repeat_emit_encodeBlockAsm - MOVL DI, R9 - MOVL DI, 12(SP) - LEAQ (DX)(SI*1), R10 - SUBL SI, R9 - LEAL -1(R9), SI - CMPL SI, $0x3c + MOVL SI, R8 + MOVL SI, 12(SP) + LEAQ (DX)(BX*1), R9 + SUBL BX, R8 + LEAL -1(R8), BX + CMPL BX, $0x3c JLT one_byte_repeat_emit_encodeBlockAsm - CMPL SI, $0x00000100 + CMPL BX, $0x00000100 JLT two_bytes_repeat_emit_encodeBlockAsm - CMPL SI, $0x00010000 + CMPL BX, $0x00010000 JLT three_bytes_repeat_emit_encodeBlockAsm - CMPL SI, $0x01000000 + CMPL BX, $0x01000000 JLT four_bytes_repeat_emit_encodeBlockAsm MOVB $0xfc, (AX) - MOVL SI, 1(AX) + MOVL BX, 1(AX) ADDQ $0x05, AX JMP memmove_long_repeat_emit_encodeBlockAsm four_bytes_repeat_emit_encodeBlockAsm: - MOVL SI, R11 - SHRL $0x10, R11 + MOVL BX, R10 + SHRL $0x10, R10 MOVB $0xf8, (AX) - MOVW SI, 1(AX) - MOVB R11, 3(AX) + MOVW BX, 1(AX) + MOVB R10, 3(AX) ADDQ $0x04, AX JMP memmove_long_repeat_emit_encodeBlockAsm three_bytes_repeat_emit_encodeBlockAsm: MOVB $0xf4, (AX) - MOVW SI, 1(AX) + MOVW BX, 1(AX) ADDQ $0x03, AX JMP memmove_long_repeat_emit_encodeBlockAsm two_bytes_repeat_emit_encodeBlockAsm: MOVB $0xf0, (AX) - MOVB SI, 1(AX) + MOVB BL, 1(AX) ADDQ $0x02, AX - CMPL SI, $0x40 + CMPL BX, $0x40 JL memmove_repeat_emit_encodeBlockAsm JMP memmove_long_repeat_emit_encodeBlockAsm one_byte_repeat_emit_encodeBlockAsm: - SHLB $0x02, SI - MOVB SI, (AX) + SHLB $0x02, BL + MOVB BL, (AX) ADDQ $0x01, AX memmove_repeat_emit_encodeBlockAsm: - LEAQ (AX)(R9*1), SI + LEAQ (AX)(R8*1), BX // genMemMoveShort - CMPQ R9, $0x08 + CMPQ R8, $0x08 JLE emit_lit_memmove_repeat_emit_encodeBlockAsm_memmove_move_8 - CMPQ R9, $0x10 + CMPQ R8, $0x10 JBE emit_lit_memmove_repeat_emit_encodeBlockAsm_memmove_move_8through16 - CMPQ R9, $0x20 + CMPQ R8, $0x20 JBE emit_lit_memmove_repeat_emit_encodeBlockAsm_memmove_move_17through32 JMP emit_lit_memmove_repeat_emit_encodeBlockAsm_memmove_move_33through64 emit_lit_memmove_repeat_emit_encodeBlockAsm_memmove_move_8: - MOVQ (R10), R11 - MOVQ R11, (AX) + MOVQ (R9), R10 + MOVQ R10, (AX) JMP memmove_end_copy_repeat_emit_encodeBlockAsm emit_lit_memmove_repeat_emit_encodeBlockAsm_memmove_move_8through16: - MOVQ (R10), R11 - MOVQ -8(R10)(R9*1), R10 - MOVQ R11, (AX) - MOVQ R10, -8(AX)(R9*1) + MOVQ (R9), R10 + MOVQ -8(R9)(R8*1), R9 + MOVQ R10, (AX) + MOVQ R9, -8(AX)(R8*1) JMP memmove_end_copy_repeat_emit_encodeBlockAsm 
emit_lit_memmove_repeat_emit_encodeBlockAsm_memmove_move_17through32: - MOVOU (R10), X0 - MOVOU -16(R10)(R9*1), X1 + MOVOU (R9), X0 + MOVOU -16(R9)(R8*1), X1 MOVOU X0, (AX) - MOVOU X1, -16(AX)(R9*1) + MOVOU X1, -16(AX)(R8*1) JMP memmove_end_copy_repeat_emit_encodeBlockAsm emit_lit_memmove_repeat_emit_encodeBlockAsm_memmove_move_33through64: - MOVOU (R10), X0 - MOVOU 16(R10), X1 - MOVOU -32(R10)(R9*1), X2 - MOVOU -16(R10)(R9*1), X3 + MOVOU (R9), X0 + MOVOU 16(R9), X1 + MOVOU -32(R9)(R8*1), X2 + MOVOU -16(R9)(R8*1), X3 MOVOU X0, (AX) MOVOU X1, 16(AX) - MOVOU X2, -32(AX)(R9*1) - MOVOU X3, -16(AX)(R9*1) + MOVOU X2, -32(AX)(R8*1) + MOVOU X3, -16(AX)(R8*1) memmove_end_copy_repeat_emit_encodeBlockAsm: - MOVQ SI, AX + MOVQ BX, AX JMP emit_literal_done_repeat_emit_encodeBlockAsm memmove_long_repeat_emit_encodeBlockAsm: - LEAQ (AX)(R9*1), SI + LEAQ (AX)(R8*1), BX // genMemMoveLong - MOVOU (R10), X0 - MOVOU 16(R10), X1 - MOVOU -32(R10)(R9*1), X2 - MOVOU -16(R10)(R9*1), X3 - MOVQ R9, R12 - SHRQ $0x05, R12 - MOVQ AX, R11 - ANDL $0x0000001f, R11 - MOVQ $0x00000040, R13 - SUBQ R11, R13 - DECQ R12 + MOVOU (R9), X0 + MOVOU 16(R9), X1 + MOVOU -32(R9)(R8*1), X2 + MOVOU -16(R9)(R8*1), X3 + MOVQ R8, R11 + SHRQ $0x05, R11 + MOVQ AX, R10 + ANDL $0x0000001f, R10 + MOVQ $0x00000040, R12 + SUBQ R10, R12 + DECQ R11 JA emit_lit_memmove_long_repeat_emit_encodeBlockAsmlarge_forward_sse_loop_32 - LEAQ -32(R10)(R13*1), R11 - LEAQ -32(AX)(R13*1), R14 + LEAQ -32(R9)(R12*1), R10 + LEAQ -32(AX)(R12*1), R13 emit_lit_memmove_long_repeat_emit_encodeBlockAsmlarge_big_loop_back: - MOVOU (R11), X4 - MOVOU 16(R11), X5 - MOVOA X4, (R14) - MOVOA X5, 16(R14) - ADDQ $0x20, R14 - ADDQ $0x20, R11 + MOVOU (R10), X4 + MOVOU 16(R10), X5 + MOVOA X4, (R13) + MOVOA X5, 16(R13) ADDQ $0x20, R13 - DECQ R12 + ADDQ $0x20, R10 + ADDQ $0x20, R12 + DECQ R11 JNA emit_lit_memmove_long_repeat_emit_encodeBlockAsmlarge_big_loop_back emit_lit_memmove_long_repeat_emit_encodeBlockAsmlarge_forward_sse_loop_32: - MOVOU -32(R10)(R13*1), X4 - MOVOU -16(R10)(R13*1), X5 - MOVOA X4, -32(AX)(R13*1) - MOVOA X5, -16(AX)(R13*1) - ADDQ $0x20, R13 - CMPQ R9, R13 + MOVOU -32(R9)(R12*1), X4 + MOVOU -16(R9)(R12*1), X5 + MOVOA X4, -32(AX)(R12*1) + MOVOA X5, -16(AX)(R12*1) + ADDQ $0x20, R12 + CMPQ R8, R12 JAE emit_lit_memmove_long_repeat_emit_encodeBlockAsmlarge_forward_sse_loop_32 MOVOU X0, (AX) MOVOU X1, 16(AX) - MOVOU X2, -32(AX)(R9*1) - MOVOU X3, -16(AX)(R9*1) - MOVQ SI, AX + MOVOU X2, -32(AX)(R8*1) + MOVOU X3, -16(AX)(R8*1) + MOVQ BX, AX emit_literal_done_repeat_emit_encodeBlockAsm: ADDL $0x05, CX - MOVL CX, SI - SUBL 16(SP), SI - MOVQ src_len+32(FP), R9 - SUBL CX, R9 - LEAQ (DX)(CX*1), R10 - LEAQ (DX)(SI*1), SI + MOVL CX, BX + SUBL 16(SP), BX + MOVQ src_len+32(FP), R8 + SUBL CX, R8 + LEAQ (DX)(CX*1), R9 + LEAQ (DX)(BX*1), BX // matchLen - XORL R12, R12 - CMPL R9, $0x08 + XORL R11, R11 + CMPL R8, $0x08 JL matchlen_match4_repeat_extend_encodeBlockAsm matchlen_loopback_repeat_extend_encodeBlockAsm: - MOVQ (R10)(R12*1), R11 - XORQ (SI)(R12*1), R11 - TESTQ R11, R11 + MOVQ (R9)(R11*1), R10 + XORQ (BX)(R11*1), R10 + TESTQ R10, R10 JZ matchlen_loop_repeat_extend_encodeBlockAsm #ifdef GOAMD64_v3 - TZCNTQ R11, R11 + TZCNTQ R10, R10 #else - BSFQ R11, R11 + BSFQ R10, R10 #endif - SARQ $0x03, R11 - LEAL (R12)(R11*1), R12 + SARQ $0x03, R10 + LEAL (R11)(R10*1), R11 JMP repeat_extend_forward_end_encodeBlockAsm matchlen_loop_repeat_extend_encodeBlockAsm: - LEAL -8(R9), R9 - LEAL 8(R12), R12 - CMPL R9, $0x08 + LEAL -8(R8), R8 + LEAL 8(R11), R11 + CMPL R8, $0x08 JGE 
matchlen_loopback_repeat_extend_encodeBlockAsm JZ repeat_extend_forward_end_encodeBlockAsm matchlen_match4_repeat_extend_encodeBlockAsm: - CMPL R9, $0x04 + CMPL R8, $0x04 JL matchlen_match2_repeat_extend_encodeBlockAsm - MOVL (R10)(R12*1), R11 - CMPL (SI)(R12*1), R11 + MOVL (R9)(R11*1), R10 + CMPL (BX)(R11*1), R10 JNE matchlen_match2_repeat_extend_encodeBlockAsm - SUBL $0x04, R9 - LEAL 4(R12), R12 + SUBL $0x04, R8 + LEAL 4(R11), R11 matchlen_match2_repeat_extend_encodeBlockAsm: - CMPL R9, $0x02 + CMPL R8, $0x02 JL matchlen_match1_repeat_extend_encodeBlockAsm - MOVW (R10)(R12*1), R11 - CMPW (SI)(R12*1), R11 + MOVW (R9)(R11*1), R10 + CMPW (BX)(R11*1), R10 JNE matchlen_match1_repeat_extend_encodeBlockAsm - SUBL $0x02, R9 - LEAL 2(R12), R12 + SUBL $0x02, R8 + LEAL 2(R11), R11 matchlen_match1_repeat_extend_encodeBlockAsm: - CMPL R9, $0x01 + CMPL R8, $0x01 JL repeat_extend_forward_end_encodeBlockAsm - MOVB (R10)(R12*1), R11 - CMPB (SI)(R12*1), R11 + MOVB (R9)(R11*1), R10 + CMPB (BX)(R11*1), R10 JNE repeat_extend_forward_end_encodeBlockAsm - LEAL 1(R12), R12 + LEAL 1(R11), R11 repeat_extend_forward_end_encodeBlockAsm: - ADDL R12, CX - MOVL CX, SI - SUBL DI, SI - MOVL 16(SP), DI - TESTL R8, R8 + ADDL R11, CX + MOVL CX, BX + SUBL SI, BX + MOVL 16(SP), SI + TESTL DI, DI JZ repeat_as_copy_encodeBlockAsm // emitRepeat emit_repeat_again_match_repeat_encodeBlockAsm: - MOVL SI, R8 - LEAL -4(SI), SI - CMPL R8, $0x08 + MOVL BX, DI + LEAL -4(BX), BX + CMPL DI, $0x08 JLE repeat_two_match_repeat_encodeBlockAsm - CMPL R8, $0x0c + CMPL DI, $0x0c JGE cant_repeat_two_offset_match_repeat_encodeBlockAsm - CMPL DI, $0x00000800 + CMPL SI, $0x00000800 JLT repeat_two_offset_match_repeat_encodeBlockAsm cant_repeat_two_offset_match_repeat_encodeBlockAsm: - CMPL SI, $0x00000104 + CMPL BX, $0x00000104 JLT repeat_three_match_repeat_encodeBlockAsm - CMPL SI, $0x00010100 + CMPL BX, $0x00010100 JLT repeat_four_match_repeat_encodeBlockAsm - CMPL SI, $0x0100ffff + CMPL BX, $0x0100ffff JLT repeat_five_match_repeat_encodeBlockAsm - LEAL -16842747(SI), SI - MOVW $0x001d, (AX) - MOVW $0xfffb, 2(AX) + LEAL -16842747(BX), BX + MOVL $0xfffb001d, (AX) MOVB $0xff, 4(AX) ADDQ $0x05, AX JMP emit_repeat_again_match_repeat_encodeBlockAsm repeat_five_match_repeat_encodeBlockAsm: - LEAL -65536(SI), SI - MOVL SI, DI + LEAL -65536(BX), BX + MOVL BX, SI MOVW $0x001d, (AX) - MOVW SI, 2(AX) - SARL $0x10, DI - MOVB DI, 4(AX) + MOVW BX, 2(AX) + SARL $0x10, SI + MOVB SI, 4(AX) ADDQ $0x05, AX JMP repeat_end_emit_encodeBlockAsm repeat_four_match_repeat_encodeBlockAsm: - LEAL -256(SI), SI + LEAL -256(BX), BX MOVW $0x0019, (AX) - MOVW SI, 2(AX) + MOVW BX, 2(AX) ADDQ $0x04, AX JMP repeat_end_emit_encodeBlockAsm repeat_three_match_repeat_encodeBlockAsm: - LEAL -4(SI), SI + LEAL -4(BX), BX MOVW $0x0015, (AX) - MOVB SI, 2(AX) + MOVB BL, 2(AX) ADDQ $0x03, AX JMP repeat_end_emit_encodeBlockAsm repeat_two_match_repeat_encodeBlockAsm: - SHLL $0x02, SI - ORL $0x01, SI - MOVW SI, (AX) + SHLL $0x02, BX + ORL $0x01, BX + MOVW BX, (AX) ADDQ $0x02, AX JMP repeat_end_emit_encodeBlockAsm repeat_two_offset_match_repeat_encodeBlockAsm: - XORQ R8, R8 - LEAL 1(R8)(SI*4), SI - MOVB DI, 1(AX) - SARL $0x08, DI - SHLL $0x05, DI - ORL DI, SI - MOVB SI, (AX) + XORQ DI, DI + LEAL 1(DI)(BX*4), BX + MOVB SI, 1(AX) + SARL $0x08, SI + SHLL $0x05, SI + ORL SI, BX + MOVB BL, (AX) ADDQ $0x02, AX JMP repeat_end_emit_encodeBlockAsm repeat_as_copy_encodeBlockAsm: // emitCopy - CMPL DI, $0x00010000 + CMPL SI, $0x00010000 JL two_byte_offset_repeat_as_copy_encodeBlockAsm - 
-four_bytes_loop_back_repeat_as_copy_encodeBlockAsm: - CMPL SI, $0x40 + CMPL BX, $0x40 JLE four_bytes_remain_repeat_as_copy_encodeBlockAsm MOVB $0xff, (AX) - MOVL DI, 1(AX) - LEAL -64(SI), SI + MOVL SI, 1(AX) + LEAL -64(BX), BX ADDQ $0x05, AX - CMPL SI, $0x04 + CMPL BX, $0x04 JL four_bytes_remain_repeat_as_copy_encodeBlockAsm // emitRepeat emit_repeat_again_repeat_as_copy_encodeBlockAsm_emit_copy: - MOVL SI, R8 - LEAL -4(SI), SI - CMPL R8, $0x08 + MOVL BX, DI + LEAL -4(BX), BX + CMPL DI, $0x08 JLE repeat_two_repeat_as_copy_encodeBlockAsm_emit_copy - CMPL R8, $0x0c + CMPL DI, $0x0c JGE cant_repeat_two_offset_repeat_as_copy_encodeBlockAsm_emit_copy - CMPL DI, $0x00000800 + CMPL SI, $0x00000800 JLT repeat_two_offset_repeat_as_copy_encodeBlockAsm_emit_copy cant_repeat_two_offset_repeat_as_copy_encodeBlockAsm_emit_copy: - CMPL SI, $0x00000104 + CMPL BX, $0x00000104 JLT repeat_three_repeat_as_copy_encodeBlockAsm_emit_copy - CMPL SI, $0x00010100 + CMPL BX, $0x00010100 JLT repeat_four_repeat_as_copy_encodeBlockAsm_emit_copy - CMPL SI, $0x0100ffff + CMPL BX, $0x0100ffff JLT repeat_five_repeat_as_copy_encodeBlockAsm_emit_copy - LEAL -16842747(SI), SI - MOVW $0x001d, (AX) - MOVW $0xfffb, 2(AX) + LEAL -16842747(BX), BX + MOVL $0xfffb001d, (AX) MOVB $0xff, 4(AX) ADDQ $0x05, AX JMP emit_repeat_again_repeat_as_copy_encodeBlockAsm_emit_copy repeat_five_repeat_as_copy_encodeBlockAsm_emit_copy: - LEAL -65536(SI), SI - MOVL SI, DI + LEAL -65536(BX), BX + MOVL BX, SI MOVW $0x001d, (AX) - MOVW SI, 2(AX) - SARL $0x10, DI - MOVB DI, 4(AX) + MOVW BX, 2(AX) + SARL $0x10, SI + MOVB SI, 4(AX) ADDQ $0x05, AX JMP repeat_end_emit_encodeBlockAsm repeat_four_repeat_as_copy_encodeBlockAsm_emit_copy: - LEAL -256(SI), SI + LEAL -256(BX), BX MOVW $0x0019, (AX) - MOVW SI, 2(AX) + MOVW BX, 2(AX) ADDQ $0x04, AX JMP repeat_end_emit_encodeBlockAsm repeat_three_repeat_as_copy_encodeBlockAsm_emit_copy: - LEAL -4(SI), SI + LEAL -4(BX), BX MOVW $0x0015, (AX) - MOVB SI, 2(AX) + MOVB BL, 2(AX) ADDQ $0x03, AX JMP repeat_end_emit_encodeBlockAsm repeat_two_repeat_as_copy_encodeBlockAsm_emit_copy: - SHLL $0x02, SI - ORL $0x01, SI - MOVW SI, (AX) + SHLL $0x02, BX + ORL $0x01, BX + MOVW BX, (AX) ADDQ $0x02, AX JMP repeat_end_emit_encodeBlockAsm repeat_two_offset_repeat_as_copy_encodeBlockAsm_emit_copy: - XORQ R8, R8 - LEAL 1(R8)(SI*4), SI - MOVB DI, 1(AX) - SARL $0x08, DI - SHLL $0x05, DI - ORL DI, SI - MOVB SI, (AX) + XORQ DI, DI + LEAL 1(DI)(BX*4), BX + MOVB SI, 1(AX) + SARL $0x08, SI + SHLL $0x05, SI + ORL SI, BX + MOVB BL, (AX) ADDQ $0x02, AX JMP repeat_end_emit_encodeBlockAsm - JMP four_bytes_loop_back_repeat_as_copy_encodeBlockAsm four_bytes_remain_repeat_as_copy_encodeBlockAsm: - TESTL SI, SI + TESTL BX, BX JZ repeat_end_emit_encodeBlockAsm - MOVB $0x03, BL - LEAL -4(BX)(SI*4), SI - MOVB SI, (AX) - MOVL DI, 1(AX) + XORL DI, DI + LEAL -1(DI)(BX*4), BX + MOVB BL, (AX) + MOVL SI, 1(AX) ADDQ $0x05, AX JMP repeat_end_emit_encodeBlockAsm two_byte_offset_repeat_as_copy_encodeBlockAsm: - CMPL SI, $0x40 + CMPL BX, $0x40 JLE two_byte_offset_short_repeat_as_copy_encodeBlockAsm - CMPL DI, $0x00000800 + CMPL SI, $0x00000800 JAE long_offset_short_repeat_as_copy_encodeBlockAsm - MOVL $0x00000001, R8 - LEAL 16(R8), R8 - MOVB DI, 1(AX) - MOVL DI, R9 - SHRL $0x08, R9 - SHLL $0x05, R9 - ORL R9, R8 - MOVB R8, (AX) + MOVL $0x00000001, DI + LEAL 16(DI), DI + MOVB SI, 1(AX) + MOVL SI, R8 + SHRL $0x08, R8 + SHLL $0x05, R8 + ORL R8, DI + MOVB DI, (AX) ADDQ $0x02, AX - SUBL $0x08, SI + SUBL $0x08, BX // emitRepeat - LEAL -4(SI), SI + LEAL -4(BX), BX JMP 
cant_repeat_two_offset_repeat_as_copy_encodeBlockAsm_emit_copy_short_2b emit_repeat_again_repeat_as_copy_encodeBlockAsm_emit_copy_short_2b: - MOVL SI, R8 - LEAL -4(SI), SI - CMPL R8, $0x08 + MOVL BX, DI + LEAL -4(BX), BX + CMPL DI, $0x08 JLE repeat_two_repeat_as_copy_encodeBlockAsm_emit_copy_short_2b - CMPL R8, $0x0c + CMPL DI, $0x0c JGE cant_repeat_two_offset_repeat_as_copy_encodeBlockAsm_emit_copy_short_2b - CMPL DI, $0x00000800 + CMPL SI, $0x00000800 JLT repeat_two_offset_repeat_as_copy_encodeBlockAsm_emit_copy_short_2b cant_repeat_two_offset_repeat_as_copy_encodeBlockAsm_emit_copy_short_2b: - CMPL SI, $0x00000104 + CMPL BX, $0x00000104 JLT repeat_three_repeat_as_copy_encodeBlockAsm_emit_copy_short_2b - CMPL SI, $0x00010100 + CMPL BX, $0x00010100 JLT repeat_four_repeat_as_copy_encodeBlockAsm_emit_copy_short_2b - CMPL SI, $0x0100ffff + CMPL BX, $0x0100ffff JLT repeat_five_repeat_as_copy_encodeBlockAsm_emit_copy_short_2b - LEAL -16842747(SI), SI - MOVW $0x001d, (AX) - MOVW $0xfffb, 2(AX) + LEAL -16842747(BX), BX + MOVL $0xfffb001d, (AX) MOVB $0xff, 4(AX) ADDQ $0x05, AX JMP emit_repeat_again_repeat_as_copy_encodeBlockAsm_emit_copy_short_2b repeat_five_repeat_as_copy_encodeBlockAsm_emit_copy_short_2b: - LEAL -65536(SI), SI - MOVL SI, DI + LEAL -65536(BX), BX + MOVL BX, SI MOVW $0x001d, (AX) - MOVW SI, 2(AX) - SARL $0x10, DI - MOVB DI, 4(AX) + MOVW BX, 2(AX) + SARL $0x10, SI + MOVB SI, 4(AX) ADDQ $0x05, AX JMP repeat_end_emit_encodeBlockAsm repeat_four_repeat_as_copy_encodeBlockAsm_emit_copy_short_2b: - LEAL -256(SI), SI + LEAL -256(BX), BX MOVW $0x0019, (AX) - MOVW SI, 2(AX) + MOVW BX, 2(AX) ADDQ $0x04, AX JMP repeat_end_emit_encodeBlockAsm repeat_three_repeat_as_copy_encodeBlockAsm_emit_copy_short_2b: - LEAL -4(SI), SI + LEAL -4(BX), BX MOVW $0x0015, (AX) - MOVB SI, 2(AX) + MOVB BL, 2(AX) ADDQ $0x03, AX JMP repeat_end_emit_encodeBlockAsm repeat_two_repeat_as_copy_encodeBlockAsm_emit_copy_short_2b: - SHLL $0x02, SI - ORL $0x01, SI - MOVW SI, (AX) + SHLL $0x02, BX + ORL $0x01, BX + MOVW BX, (AX) ADDQ $0x02, AX JMP repeat_end_emit_encodeBlockAsm repeat_two_offset_repeat_as_copy_encodeBlockAsm_emit_copy_short_2b: - XORQ R8, R8 - LEAL 1(R8)(SI*4), SI - MOVB DI, 1(AX) - SARL $0x08, DI - SHLL $0x05, DI - ORL DI, SI - MOVB SI, (AX) + XORQ DI, DI + LEAL 1(DI)(BX*4), BX + MOVB SI, 1(AX) + SARL $0x08, SI + SHLL $0x05, SI + ORL SI, BX + MOVB BL, (AX) ADDQ $0x02, AX JMP repeat_end_emit_encodeBlockAsm long_offset_short_repeat_as_copy_encodeBlockAsm: MOVB $0xee, (AX) - MOVW DI, 1(AX) - LEAL -60(SI), SI + MOVW SI, 1(AX) + LEAL -60(BX), BX ADDQ $0x03, AX // emitRepeat emit_repeat_again_repeat_as_copy_encodeBlockAsm_emit_copy_short: - MOVL SI, R8 - LEAL -4(SI), SI - CMPL R8, $0x08 + MOVL BX, DI + LEAL -4(BX), BX + CMPL DI, $0x08 JLE repeat_two_repeat_as_copy_encodeBlockAsm_emit_copy_short - CMPL R8, $0x0c + CMPL DI, $0x0c JGE cant_repeat_two_offset_repeat_as_copy_encodeBlockAsm_emit_copy_short - CMPL DI, $0x00000800 + CMPL SI, $0x00000800 JLT repeat_two_offset_repeat_as_copy_encodeBlockAsm_emit_copy_short cant_repeat_two_offset_repeat_as_copy_encodeBlockAsm_emit_copy_short: - CMPL SI, $0x00000104 + CMPL BX, $0x00000104 JLT repeat_three_repeat_as_copy_encodeBlockAsm_emit_copy_short - CMPL SI, $0x00010100 + CMPL BX, $0x00010100 JLT repeat_four_repeat_as_copy_encodeBlockAsm_emit_copy_short - CMPL SI, $0x0100ffff + CMPL BX, $0x0100ffff JLT repeat_five_repeat_as_copy_encodeBlockAsm_emit_copy_short - LEAL -16842747(SI), SI - MOVW $0x001d, (AX) - MOVW $0xfffb, 2(AX) + LEAL -16842747(BX), BX + MOVL $0xfffb001d, 
(AX) MOVB $0xff, 4(AX) ADDQ $0x05, AX JMP emit_repeat_again_repeat_as_copy_encodeBlockAsm_emit_copy_short repeat_five_repeat_as_copy_encodeBlockAsm_emit_copy_short: - LEAL -65536(SI), SI - MOVL SI, DI + LEAL -65536(BX), BX + MOVL BX, SI MOVW $0x001d, (AX) - MOVW SI, 2(AX) - SARL $0x10, DI - MOVB DI, 4(AX) + MOVW BX, 2(AX) + SARL $0x10, SI + MOVB SI, 4(AX) ADDQ $0x05, AX JMP repeat_end_emit_encodeBlockAsm repeat_four_repeat_as_copy_encodeBlockAsm_emit_copy_short: - LEAL -256(SI), SI + LEAL -256(BX), BX MOVW $0x0019, (AX) - MOVW SI, 2(AX) + MOVW BX, 2(AX) ADDQ $0x04, AX JMP repeat_end_emit_encodeBlockAsm repeat_three_repeat_as_copy_encodeBlockAsm_emit_copy_short: - LEAL -4(SI), SI + LEAL -4(BX), BX MOVW $0x0015, (AX) - MOVB SI, 2(AX) + MOVB BL, 2(AX) ADDQ $0x03, AX JMP repeat_end_emit_encodeBlockAsm repeat_two_repeat_as_copy_encodeBlockAsm_emit_copy_short: - SHLL $0x02, SI - ORL $0x01, SI - MOVW SI, (AX) + SHLL $0x02, BX + ORL $0x01, BX + MOVW BX, (AX) ADDQ $0x02, AX JMP repeat_end_emit_encodeBlockAsm repeat_two_offset_repeat_as_copy_encodeBlockAsm_emit_copy_short: - XORQ R8, R8 - LEAL 1(R8)(SI*4), SI - MOVB DI, 1(AX) - SARL $0x08, DI - SHLL $0x05, DI - ORL DI, SI - MOVB SI, (AX) + XORQ DI, DI + LEAL 1(DI)(BX*4), BX + MOVB SI, 1(AX) + SARL $0x08, SI + SHLL $0x05, SI + ORL SI, BX + MOVB BL, (AX) ADDQ $0x02, AX JMP repeat_end_emit_encodeBlockAsm - JMP two_byte_offset_repeat_as_copy_encodeBlockAsm two_byte_offset_short_repeat_as_copy_encodeBlockAsm: - CMPL SI, $0x0c + MOVL BX, DI + SHLL $0x02, DI + CMPL BX, $0x0c JGE emit_copy_three_repeat_as_copy_encodeBlockAsm - CMPL DI, $0x00000800 + CMPL SI, $0x00000800 JGE emit_copy_three_repeat_as_copy_encodeBlockAsm - MOVB $0x01, BL - LEAL -16(BX)(SI*4), SI - MOVB DI, 1(AX) - SHRL $0x08, DI - SHLL $0x05, DI - ORL DI, SI - MOVB SI, (AX) + LEAL -15(DI), DI + MOVB SI, 1(AX) + SHRL $0x08, SI + SHLL $0x05, SI + ORL SI, DI + MOVB DI, (AX) ADDQ $0x02, AX JMP repeat_end_emit_encodeBlockAsm emit_copy_three_repeat_as_copy_encodeBlockAsm: - MOVB $0x02, BL - LEAL -4(BX)(SI*4), SI - MOVB SI, (AX) - MOVW DI, 1(AX) + LEAL -2(DI), DI + MOVB DI, (AX) + MOVW SI, 1(AX) ADDQ $0x03, AX repeat_end_emit_encodeBlockAsm: @@ -657,16 +649,16 @@ repeat_end_emit_encodeBlockAsm: JMP search_loop_encodeBlockAsm no_repeat_found_encodeBlockAsm: - CMPL (DX)(SI*1), DI + CMPL (DX)(BX*1), SI JEQ candidate_match_encodeBlockAsm - SHRQ $0x08, DI - MOVL 24(SP)(R10*4), SI - LEAL 2(CX), R9 - CMPL (DX)(R8*1), DI + SHRQ $0x08, SI + MOVL 24(SP)(R9*4), BX + LEAL 2(CX), R8 + CMPL (DX)(DI*1), SI JEQ candidate2_match_encodeBlockAsm - MOVL R9, 24(SP)(R10*4) - SHRQ $0x08, DI - CMPL (DX)(SI*1), DI + MOVL R8, 24(SP)(R9*4) + SHRQ $0x08, SI + CMPL (DX)(BX*1), SI JEQ candidate3_match_encodeBlockAsm MOVL 20(SP), CX JMP search_loop_encodeBlockAsm @@ -676,549 +668,542 @@ candidate3_match_encodeBlockAsm: JMP candidate_match_encodeBlockAsm candidate2_match_encodeBlockAsm: - MOVL R9, 24(SP)(R10*4) + MOVL R8, 24(SP)(R9*4) INCL CX - MOVL R8, SI + MOVL DI, BX candidate_match_encodeBlockAsm: - MOVL 12(SP), DI - TESTL SI, SI + MOVL 12(SP), SI + TESTL BX, BX JZ match_extend_back_end_encodeBlockAsm match_extend_back_loop_encodeBlockAsm: - CMPL CX, DI + CMPL CX, SI JLE match_extend_back_end_encodeBlockAsm - MOVB -1(DX)(SI*1), BL + MOVB -1(DX)(BX*1), DI MOVB -1(DX)(CX*1), R8 - CMPB BL, R8 + CMPB DI, R8 JNE match_extend_back_end_encodeBlockAsm LEAL -1(CX), CX - DECL SI + DECL BX JZ match_extend_back_end_encodeBlockAsm JMP match_extend_back_loop_encodeBlockAsm match_extend_back_end_encodeBlockAsm: - MOVL CX, DI - SUBL 12(SP), 
DI - LEAQ 5(AX)(DI*1), DI - CMPQ DI, (SP) + MOVL CX, SI + SUBL 12(SP), SI + LEAQ 5(AX)(SI*1), SI + CMPQ SI, (SP) JL match_dst_size_check_encodeBlockAsm MOVQ $0x00000000, ret+48(FP) RET match_dst_size_check_encodeBlockAsm: - MOVL CX, DI - MOVL 12(SP), R8 - CMPL R8, DI + MOVL CX, SI + MOVL 12(SP), DI + CMPL DI, SI JEQ emit_literal_done_match_emit_encodeBlockAsm - MOVL DI, R9 - MOVL DI, 12(SP) - LEAQ (DX)(R8*1), DI - SUBL R8, R9 - LEAL -1(R9), R8 - CMPL R8, $0x3c + MOVL SI, R8 + MOVL SI, 12(SP) + LEAQ (DX)(DI*1), SI + SUBL DI, R8 + LEAL -1(R8), DI + CMPL DI, $0x3c JLT one_byte_match_emit_encodeBlockAsm - CMPL R8, $0x00000100 + CMPL DI, $0x00000100 JLT two_bytes_match_emit_encodeBlockAsm - CMPL R8, $0x00010000 + CMPL DI, $0x00010000 JLT three_bytes_match_emit_encodeBlockAsm - CMPL R8, $0x01000000 + CMPL DI, $0x01000000 JLT four_bytes_match_emit_encodeBlockAsm MOVB $0xfc, (AX) - MOVL R8, 1(AX) + MOVL DI, 1(AX) ADDQ $0x05, AX JMP memmove_long_match_emit_encodeBlockAsm four_bytes_match_emit_encodeBlockAsm: - MOVL R8, R10 - SHRL $0x10, R10 + MOVL DI, R9 + SHRL $0x10, R9 MOVB $0xf8, (AX) - MOVW R8, 1(AX) - MOVB R10, 3(AX) + MOVW DI, 1(AX) + MOVB R9, 3(AX) ADDQ $0x04, AX JMP memmove_long_match_emit_encodeBlockAsm three_bytes_match_emit_encodeBlockAsm: MOVB $0xf4, (AX) - MOVW R8, 1(AX) + MOVW DI, 1(AX) ADDQ $0x03, AX JMP memmove_long_match_emit_encodeBlockAsm two_bytes_match_emit_encodeBlockAsm: MOVB $0xf0, (AX) - MOVB R8, 1(AX) + MOVB DI, 1(AX) ADDQ $0x02, AX - CMPL R8, $0x40 + CMPL DI, $0x40 JL memmove_match_emit_encodeBlockAsm JMP memmove_long_match_emit_encodeBlockAsm one_byte_match_emit_encodeBlockAsm: - SHLB $0x02, R8 - MOVB R8, (AX) + SHLB $0x02, DI + MOVB DI, (AX) ADDQ $0x01, AX memmove_match_emit_encodeBlockAsm: - LEAQ (AX)(R9*1), R8 + LEAQ (AX)(R8*1), DI // genMemMoveShort - CMPQ R9, $0x08 + CMPQ R8, $0x08 JLE emit_lit_memmove_match_emit_encodeBlockAsm_memmove_move_8 - CMPQ R9, $0x10 + CMPQ R8, $0x10 JBE emit_lit_memmove_match_emit_encodeBlockAsm_memmove_move_8through16 - CMPQ R9, $0x20 + CMPQ R8, $0x20 JBE emit_lit_memmove_match_emit_encodeBlockAsm_memmove_move_17through32 JMP emit_lit_memmove_match_emit_encodeBlockAsm_memmove_move_33through64 emit_lit_memmove_match_emit_encodeBlockAsm_memmove_move_8: - MOVQ (DI), R10 - MOVQ R10, (AX) + MOVQ (SI), R9 + MOVQ R9, (AX) JMP memmove_end_copy_match_emit_encodeBlockAsm emit_lit_memmove_match_emit_encodeBlockAsm_memmove_move_8through16: - MOVQ (DI), R10 - MOVQ -8(DI)(R9*1), DI - MOVQ R10, (AX) - MOVQ DI, -8(AX)(R9*1) + MOVQ (SI), R9 + MOVQ -8(SI)(R8*1), SI + MOVQ R9, (AX) + MOVQ SI, -8(AX)(R8*1) JMP memmove_end_copy_match_emit_encodeBlockAsm emit_lit_memmove_match_emit_encodeBlockAsm_memmove_move_17through32: - MOVOU (DI), X0 - MOVOU -16(DI)(R9*1), X1 + MOVOU (SI), X0 + MOVOU -16(SI)(R8*1), X1 MOVOU X0, (AX) - MOVOU X1, -16(AX)(R9*1) + MOVOU X1, -16(AX)(R8*1) JMP memmove_end_copy_match_emit_encodeBlockAsm emit_lit_memmove_match_emit_encodeBlockAsm_memmove_move_33through64: - MOVOU (DI), X0 - MOVOU 16(DI), X1 - MOVOU -32(DI)(R9*1), X2 - MOVOU -16(DI)(R9*1), X3 + MOVOU (SI), X0 + MOVOU 16(SI), X1 + MOVOU -32(SI)(R8*1), X2 + MOVOU -16(SI)(R8*1), X3 MOVOU X0, (AX) MOVOU X1, 16(AX) - MOVOU X2, -32(AX)(R9*1) - MOVOU X3, -16(AX)(R9*1) + MOVOU X2, -32(AX)(R8*1) + MOVOU X3, -16(AX)(R8*1) memmove_end_copy_match_emit_encodeBlockAsm: - MOVQ R8, AX + MOVQ DI, AX JMP emit_literal_done_match_emit_encodeBlockAsm memmove_long_match_emit_encodeBlockAsm: - LEAQ (AX)(R9*1), R8 + LEAQ (AX)(R8*1), DI // genMemMoveLong - MOVOU (DI), X0 - MOVOU 16(DI), X1 - MOVOU 
-32(DI)(R9*1), X2 - MOVOU -16(DI)(R9*1), X3 - MOVQ R9, R11 - SHRQ $0x05, R11 - MOVQ AX, R10 - ANDL $0x0000001f, R10 - MOVQ $0x00000040, R12 - SUBQ R10, R12 - DECQ R11 + MOVOU (SI), X0 + MOVOU 16(SI), X1 + MOVOU -32(SI)(R8*1), X2 + MOVOU -16(SI)(R8*1), X3 + MOVQ R8, R10 + SHRQ $0x05, R10 + MOVQ AX, R9 + ANDL $0x0000001f, R9 + MOVQ $0x00000040, R11 + SUBQ R9, R11 + DECQ R10 JA emit_lit_memmove_long_match_emit_encodeBlockAsmlarge_forward_sse_loop_32 - LEAQ -32(DI)(R12*1), R10 - LEAQ -32(AX)(R12*1), R13 + LEAQ -32(SI)(R11*1), R9 + LEAQ -32(AX)(R11*1), R12 emit_lit_memmove_long_match_emit_encodeBlockAsmlarge_big_loop_back: - MOVOU (R10), X4 - MOVOU 16(R10), X5 - MOVOA X4, (R13) - MOVOA X5, 16(R13) - ADDQ $0x20, R13 - ADDQ $0x20, R10 + MOVOU (R9), X4 + MOVOU 16(R9), X5 + MOVOA X4, (R12) + MOVOA X5, 16(R12) ADDQ $0x20, R12 - DECQ R11 + ADDQ $0x20, R9 + ADDQ $0x20, R11 + DECQ R10 JNA emit_lit_memmove_long_match_emit_encodeBlockAsmlarge_big_loop_back emit_lit_memmove_long_match_emit_encodeBlockAsmlarge_forward_sse_loop_32: - MOVOU -32(DI)(R12*1), X4 - MOVOU -16(DI)(R12*1), X5 - MOVOA X4, -32(AX)(R12*1) - MOVOA X5, -16(AX)(R12*1) - ADDQ $0x20, R12 - CMPQ R9, R12 + MOVOU -32(SI)(R11*1), X4 + MOVOU -16(SI)(R11*1), X5 + MOVOA X4, -32(AX)(R11*1) + MOVOA X5, -16(AX)(R11*1) + ADDQ $0x20, R11 + CMPQ R8, R11 JAE emit_lit_memmove_long_match_emit_encodeBlockAsmlarge_forward_sse_loop_32 MOVOU X0, (AX) MOVOU X1, 16(AX) - MOVOU X2, -32(AX)(R9*1) - MOVOU X3, -16(AX)(R9*1) - MOVQ R8, AX + MOVOU X2, -32(AX)(R8*1) + MOVOU X3, -16(AX)(R8*1) + MOVQ DI, AX emit_literal_done_match_emit_encodeBlockAsm: match_nolit_loop_encodeBlockAsm: - MOVL CX, DI - SUBL SI, DI - MOVL DI, 16(SP) + MOVL CX, SI + SUBL BX, SI + MOVL SI, 16(SP) ADDL $0x04, CX - ADDL $0x04, SI - MOVQ src_len+32(FP), DI - SUBL CX, DI - LEAQ (DX)(CX*1), R8 - LEAQ (DX)(SI*1), SI + ADDL $0x04, BX + MOVQ src_len+32(FP), SI + SUBL CX, SI + LEAQ (DX)(CX*1), DI + LEAQ (DX)(BX*1), BX // matchLen - XORL R10, R10 - CMPL DI, $0x08 + XORL R9, R9 + CMPL SI, $0x08 JL matchlen_match4_match_nolit_encodeBlockAsm matchlen_loopback_match_nolit_encodeBlockAsm: - MOVQ (R8)(R10*1), R9 - XORQ (SI)(R10*1), R9 - TESTQ R9, R9 + MOVQ (DI)(R9*1), R8 + XORQ (BX)(R9*1), R8 + TESTQ R8, R8 JZ matchlen_loop_match_nolit_encodeBlockAsm #ifdef GOAMD64_v3 - TZCNTQ R9, R9 + TZCNTQ R8, R8 #else - BSFQ R9, R9 + BSFQ R8, R8 #endif - SARQ $0x03, R9 - LEAL (R10)(R9*1), R10 + SARQ $0x03, R8 + LEAL (R9)(R8*1), R9 JMP match_nolit_end_encodeBlockAsm matchlen_loop_match_nolit_encodeBlockAsm: - LEAL -8(DI), DI - LEAL 8(R10), R10 - CMPL DI, $0x08 + LEAL -8(SI), SI + LEAL 8(R9), R9 + CMPL SI, $0x08 JGE matchlen_loopback_match_nolit_encodeBlockAsm JZ match_nolit_end_encodeBlockAsm matchlen_match4_match_nolit_encodeBlockAsm: - CMPL DI, $0x04 + CMPL SI, $0x04 JL matchlen_match2_match_nolit_encodeBlockAsm - MOVL (R8)(R10*1), R9 - CMPL (SI)(R10*1), R9 + MOVL (DI)(R9*1), R8 + CMPL (BX)(R9*1), R8 JNE matchlen_match2_match_nolit_encodeBlockAsm - SUBL $0x04, DI - LEAL 4(R10), R10 + SUBL $0x04, SI + LEAL 4(R9), R9 matchlen_match2_match_nolit_encodeBlockAsm: - CMPL DI, $0x02 + CMPL SI, $0x02 JL matchlen_match1_match_nolit_encodeBlockAsm - MOVW (R8)(R10*1), R9 - CMPW (SI)(R10*1), R9 + MOVW (DI)(R9*1), R8 + CMPW (BX)(R9*1), R8 JNE matchlen_match1_match_nolit_encodeBlockAsm - SUBL $0x02, DI - LEAL 2(R10), R10 + SUBL $0x02, SI + LEAL 2(R9), R9 matchlen_match1_match_nolit_encodeBlockAsm: - CMPL DI, $0x01 + CMPL SI, $0x01 JL match_nolit_end_encodeBlockAsm - MOVB (R8)(R10*1), R9 - CMPB (SI)(R10*1), R9 + MOVB (DI)(R9*1), R8 + 
CMPB (BX)(R9*1), R8 JNE match_nolit_end_encodeBlockAsm - LEAL 1(R10), R10 + LEAL 1(R9), R9 match_nolit_end_encodeBlockAsm: - ADDL R10, CX - MOVL 16(SP), SI - ADDL $0x04, R10 + ADDL R9, CX + MOVL 16(SP), BX + ADDL $0x04, R9 MOVL CX, 12(SP) // emitCopy - CMPL SI, $0x00010000 + CMPL BX, $0x00010000 JL two_byte_offset_match_nolit_encodeBlockAsm - -four_bytes_loop_back_match_nolit_encodeBlockAsm: - CMPL R10, $0x40 + CMPL R9, $0x40 JLE four_bytes_remain_match_nolit_encodeBlockAsm MOVB $0xff, (AX) - MOVL SI, 1(AX) - LEAL -64(R10), R10 + MOVL BX, 1(AX) + LEAL -64(R9), R9 ADDQ $0x05, AX - CMPL R10, $0x04 + CMPL R9, $0x04 JL four_bytes_remain_match_nolit_encodeBlockAsm // emitRepeat emit_repeat_again_match_nolit_encodeBlockAsm_emit_copy: - MOVL R10, DI - LEAL -4(R10), R10 - CMPL DI, $0x08 + MOVL R9, SI + LEAL -4(R9), R9 + CMPL SI, $0x08 JLE repeat_two_match_nolit_encodeBlockAsm_emit_copy - CMPL DI, $0x0c + CMPL SI, $0x0c JGE cant_repeat_two_offset_match_nolit_encodeBlockAsm_emit_copy - CMPL SI, $0x00000800 + CMPL BX, $0x00000800 JLT repeat_two_offset_match_nolit_encodeBlockAsm_emit_copy cant_repeat_two_offset_match_nolit_encodeBlockAsm_emit_copy: - CMPL R10, $0x00000104 + CMPL R9, $0x00000104 JLT repeat_three_match_nolit_encodeBlockAsm_emit_copy - CMPL R10, $0x00010100 + CMPL R9, $0x00010100 JLT repeat_four_match_nolit_encodeBlockAsm_emit_copy - CMPL R10, $0x0100ffff + CMPL R9, $0x0100ffff JLT repeat_five_match_nolit_encodeBlockAsm_emit_copy - LEAL -16842747(R10), R10 - MOVW $0x001d, (AX) - MOVW $0xfffb, 2(AX) + LEAL -16842747(R9), R9 + MOVL $0xfffb001d, (AX) MOVB $0xff, 4(AX) ADDQ $0x05, AX JMP emit_repeat_again_match_nolit_encodeBlockAsm_emit_copy repeat_five_match_nolit_encodeBlockAsm_emit_copy: - LEAL -65536(R10), R10 - MOVL R10, SI + LEAL -65536(R9), R9 + MOVL R9, BX MOVW $0x001d, (AX) - MOVW R10, 2(AX) - SARL $0x10, SI - MOVB SI, 4(AX) + MOVW R9, 2(AX) + SARL $0x10, BX + MOVB BL, 4(AX) ADDQ $0x05, AX JMP match_nolit_emitcopy_end_encodeBlockAsm repeat_four_match_nolit_encodeBlockAsm_emit_copy: - LEAL -256(R10), R10 + LEAL -256(R9), R9 MOVW $0x0019, (AX) - MOVW R10, 2(AX) + MOVW R9, 2(AX) ADDQ $0x04, AX JMP match_nolit_emitcopy_end_encodeBlockAsm repeat_three_match_nolit_encodeBlockAsm_emit_copy: - LEAL -4(R10), R10 + LEAL -4(R9), R9 MOVW $0x0015, (AX) - MOVB R10, 2(AX) + MOVB R9, 2(AX) ADDQ $0x03, AX JMP match_nolit_emitcopy_end_encodeBlockAsm repeat_two_match_nolit_encodeBlockAsm_emit_copy: - SHLL $0x02, R10 - ORL $0x01, R10 - MOVW R10, (AX) + SHLL $0x02, R9 + ORL $0x01, R9 + MOVW R9, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeBlockAsm repeat_two_offset_match_nolit_encodeBlockAsm_emit_copy: - XORQ DI, DI - LEAL 1(DI)(R10*4), R10 - MOVB SI, 1(AX) - SARL $0x08, SI - SHLL $0x05, SI - ORL SI, R10 - MOVB R10, (AX) + XORQ SI, SI + LEAL 1(SI)(R9*4), R9 + MOVB BL, 1(AX) + SARL $0x08, BX + SHLL $0x05, BX + ORL BX, R9 + MOVB R9, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeBlockAsm - JMP four_bytes_loop_back_match_nolit_encodeBlockAsm four_bytes_remain_match_nolit_encodeBlockAsm: - TESTL R10, R10 + TESTL R9, R9 JZ match_nolit_emitcopy_end_encodeBlockAsm - MOVB $0x03, BL - LEAL -4(BX)(R10*4), R10 - MOVB R10, (AX) - MOVL SI, 1(AX) + XORL SI, SI + LEAL -1(SI)(R9*4), R9 + MOVB R9, (AX) + MOVL BX, 1(AX) ADDQ $0x05, AX JMP match_nolit_emitcopy_end_encodeBlockAsm two_byte_offset_match_nolit_encodeBlockAsm: - CMPL R10, $0x40 + CMPL R9, $0x40 JLE two_byte_offset_short_match_nolit_encodeBlockAsm - CMPL SI, $0x00000800 + CMPL BX, $0x00000800 JAE long_offset_short_match_nolit_encodeBlockAsm - 
MOVL $0x00000001, DI - LEAL 16(DI), DI - MOVB SI, 1(AX) - MOVL SI, R8 - SHRL $0x08, R8 - SHLL $0x05, R8 - ORL R8, DI - MOVB DI, (AX) + MOVL $0x00000001, SI + LEAL 16(SI), SI + MOVB BL, 1(AX) + MOVL BX, DI + SHRL $0x08, DI + SHLL $0x05, DI + ORL DI, SI + MOVB SI, (AX) ADDQ $0x02, AX - SUBL $0x08, R10 + SUBL $0x08, R9 // emitRepeat - LEAL -4(R10), R10 + LEAL -4(R9), R9 JMP cant_repeat_two_offset_match_nolit_encodeBlockAsm_emit_copy_short_2b emit_repeat_again_match_nolit_encodeBlockAsm_emit_copy_short_2b: - MOVL R10, DI - LEAL -4(R10), R10 - CMPL DI, $0x08 + MOVL R9, SI + LEAL -4(R9), R9 + CMPL SI, $0x08 JLE repeat_two_match_nolit_encodeBlockAsm_emit_copy_short_2b - CMPL DI, $0x0c + CMPL SI, $0x0c JGE cant_repeat_two_offset_match_nolit_encodeBlockAsm_emit_copy_short_2b - CMPL SI, $0x00000800 + CMPL BX, $0x00000800 JLT repeat_two_offset_match_nolit_encodeBlockAsm_emit_copy_short_2b cant_repeat_two_offset_match_nolit_encodeBlockAsm_emit_copy_short_2b: - CMPL R10, $0x00000104 + CMPL R9, $0x00000104 JLT repeat_three_match_nolit_encodeBlockAsm_emit_copy_short_2b - CMPL R10, $0x00010100 + CMPL R9, $0x00010100 JLT repeat_four_match_nolit_encodeBlockAsm_emit_copy_short_2b - CMPL R10, $0x0100ffff + CMPL R9, $0x0100ffff JLT repeat_five_match_nolit_encodeBlockAsm_emit_copy_short_2b - LEAL -16842747(R10), R10 - MOVW $0x001d, (AX) - MOVW $0xfffb, 2(AX) + LEAL -16842747(R9), R9 + MOVL $0xfffb001d, (AX) MOVB $0xff, 4(AX) ADDQ $0x05, AX JMP emit_repeat_again_match_nolit_encodeBlockAsm_emit_copy_short_2b repeat_five_match_nolit_encodeBlockAsm_emit_copy_short_2b: - LEAL -65536(R10), R10 - MOVL R10, SI + LEAL -65536(R9), R9 + MOVL R9, BX MOVW $0x001d, (AX) - MOVW R10, 2(AX) - SARL $0x10, SI - MOVB SI, 4(AX) + MOVW R9, 2(AX) + SARL $0x10, BX + MOVB BL, 4(AX) ADDQ $0x05, AX JMP match_nolit_emitcopy_end_encodeBlockAsm repeat_four_match_nolit_encodeBlockAsm_emit_copy_short_2b: - LEAL -256(R10), R10 + LEAL -256(R9), R9 MOVW $0x0019, (AX) - MOVW R10, 2(AX) + MOVW R9, 2(AX) ADDQ $0x04, AX JMP match_nolit_emitcopy_end_encodeBlockAsm repeat_three_match_nolit_encodeBlockAsm_emit_copy_short_2b: - LEAL -4(R10), R10 + LEAL -4(R9), R9 MOVW $0x0015, (AX) - MOVB R10, 2(AX) + MOVB R9, 2(AX) ADDQ $0x03, AX JMP match_nolit_emitcopy_end_encodeBlockAsm repeat_two_match_nolit_encodeBlockAsm_emit_copy_short_2b: - SHLL $0x02, R10 - ORL $0x01, R10 - MOVW R10, (AX) + SHLL $0x02, R9 + ORL $0x01, R9 + MOVW R9, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeBlockAsm repeat_two_offset_match_nolit_encodeBlockAsm_emit_copy_short_2b: - XORQ DI, DI - LEAL 1(DI)(R10*4), R10 - MOVB SI, 1(AX) - SARL $0x08, SI - SHLL $0x05, SI - ORL SI, R10 - MOVB R10, (AX) + XORQ SI, SI + LEAL 1(SI)(R9*4), R9 + MOVB BL, 1(AX) + SARL $0x08, BX + SHLL $0x05, BX + ORL BX, R9 + MOVB R9, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeBlockAsm long_offset_short_match_nolit_encodeBlockAsm: MOVB $0xee, (AX) - MOVW SI, 1(AX) - LEAL -60(R10), R10 + MOVW BX, 1(AX) + LEAL -60(R9), R9 ADDQ $0x03, AX // emitRepeat emit_repeat_again_match_nolit_encodeBlockAsm_emit_copy_short: - MOVL R10, DI - LEAL -4(R10), R10 - CMPL DI, $0x08 + MOVL R9, SI + LEAL -4(R9), R9 + CMPL SI, $0x08 JLE repeat_two_match_nolit_encodeBlockAsm_emit_copy_short - CMPL DI, $0x0c + CMPL SI, $0x0c JGE cant_repeat_two_offset_match_nolit_encodeBlockAsm_emit_copy_short - CMPL SI, $0x00000800 + CMPL BX, $0x00000800 JLT repeat_two_offset_match_nolit_encodeBlockAsm_emit_copy_short cant_repeat_two_offset_match_nolit_encodeBlockAsm_emit_copy_short: - CMPL R10, $0x00000104 + CMPL R9, $0x00000104 JLT 
repeat_three_match_nolit_encodeBlockAsm_emit_copy_short - CMPL R10, $0x00010100 + CMPL R9, $0x00010100 JLT repeat_four_match_nolit_encodeBlockAsm_emit_copy_short - CMPL R10, $0x0100ffff + CMPL R9, $0x0100ffff JLT repeat_five_match_nolit_encodeBlockAsm_emit_copy_short - LEAL -16842747(R10), R10 - MOVW $0x001d, (AX) - MOVW $0xfffb, 2(AX) + LEAL -16842747(R9), R9 + MOVL $0xfffb001d, (AX) MOVB $0xff, 4(AX) ADDQ $0x05, AX JMP emit_repeat_again_match_nolit_encodeBlockAsm_emit_copy_short repeat_five_match_nolit_encodeBlockAsm_emit_copy_short: - LEAL -65536(R10), R10 - MOVL R10, SI + LEAL -65536(R9), R9 + MOVL R9, BX MOVW $0x001d, (AX) - MOVW R10, 2(AX) - SARL $0x10, SI - MOVB SI, 4(AX) + MOVW R9, 2(AX) + SARL $0x10, BX + MOVB BL, 4(AX) ADDQ $0x05, AX JMP match_nolit_emitcopy_end_encodeBlockAsm repeat_four_match_nolit_encodeBlockAsm_emit_copy_short: - LEAL -256(R10), R10 + LEAL -256(R9), R9 MOVW $0x0019, (AX) - MOVW R10, 2(AX) + MOVW R9, 2(AX) ADDQ $0x04, AX JMP match_nolit_emitcopy_end_encodeBlockAsm repeat_three_match_nolit_encodeBlockAsm_emit_copy_short: - LEAL -4(R10), R10 + LEAL -4(R9), R9 MOVW $0x0015, (AX) - MOVB R10, 2(AX) + MOVB R9, 2(AX) ADDQ $0x03, AX JMP match_nolit_emitcopy_end_encodeBlockAsm repeat_two_match_nolit_encodeBlockAsm_emit_copy_short: - SHLL $0x02, R10 - ORL $0x01, R10 - MOVW R10, (AX) + SHLL $0x02, R9 + ORL $0x01, R9 + MOVW R9, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeBlockAsm repeat_two_offset_match_nolit_encodeBlockAsm_emit_copy_short: - XORQ DI, DI - LEAL 1(DI)(R10*4), R10 - MOVB SI, 1(AX) - SARL $0x08, SI - SHLL $0x05, SI - ORL SI, R10 - MOVB R10, (AX) + XORQ SI, SI + LEAL 1(SI)(R9*4), R9 + MOVB BL, 1(AX) + SARL $0x08, BX + SHLL $0x05, BX + ORL BX, R9 + MOVB R9, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeBlockAsm - JMP two_byte_offset_match_nolit_encodeBlockAsm two_byte_offset_short_match_nolit_encodeBlockAsm: - CMPL R10, $0x0c + MOVL R9, SI + SHLL $0x02, SI + CMPL R9, $0x0c JGE emit_copy_three_match_nolit_encodeBlockAsm - CMPL SI, $0x00000800 + CMPL BX, $0x00000800 JGE emit_copy_three_match_nolit_encodeBlockAsm - MOVB $0x01, BL - LEAL -16(BX)(R10*4), R10 - MOVB SI, 1(AX) - SHRL $0x08, SI - SHLL $0x05, SI - ORL SI, R10 - MOVB R10, (AX) + LEAL -15(SI), SI + MOVB BL, 1(AX) + SHRL $0x08, BX + SHLL $0x05, BX + ORL BX, SI + MOVB SI, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeBlockAsm emit_copy_three_match_nolit_encodeBlockAsm: - MOVB $0x02, BL - LEAL -4(BX)(R10*4), R10 - MOVB R10, (AX) - MOVW SI, 1(AX) + LEAL -2(SI), SI + MOVB SI, (AX) + MOVW BX, 1(AX) ADDQ $0x03, AX match_nolit_emitcopy_end_encodeBlockAsm: CMPL CX, 8(SP) JGE emit_remainder_encodeBlockAsm - MOVQ -2(DX)(CX*1), DI + MOVQ -2(DX)(CX*1), SI CMPQ AX, (SP) JL match_nolit_dst_ok_encodeBlockAsm MOVQ $0x00000000, ret+48(FP) RET match_nolit_dst_ok_encodeBlockAsm: - MOVQ $0x0000cf1bbcdcbf9b, R9 - MOVQ DI, R8 - SHRQ $0x10, DI - MOVQ DI, SI - SHLQ $0x10, R8 - IMULQ R9, R8 - SHRQ $0x32, R8 - SHLQ $0x10, SI - IMULQ R9, SI - SHRQ $0x32, SI - LEAL -2(CX), R9 - LEAQ 24(SP)(SI*4), R10 - MOVL (R10), SI - MOVL R9, 24(SP)(R8*4) - MOVL CX, (R10) - CMPL (DX)(SI*1), DI + MOVQ $0x0000cf1bbcdcbf9b, R8 + MOVQ SI, DI + SHRQ $0x10, SI + MOVQ SI, BX + SHLQ $0x10, DI + IMULQ R8, DI + SHRQ $0x32, DI + SHLQ $0x10, BX + IMULQ R8, BX + SHRQ $0x32, BX + LEAL -2(CX), R8 + LEAQ 24(SP)(BX*4), R9 + MOVL (R9), BX + MOVL R8, 24(SP)(DI*4) + MOVL CX, (R9) + CMPL (DX)(BX*1), SI JEQ match_nolit_loop_encodeBlockAsm INCL CX JMP search_loop_encodeBlockAsm @@ -1422,8 +1407,8 @@ zero_loop_encodeBlockAsm4MB: MOVL 
$0x00000000, 12(SP) MOVQ src_len+32(FP), CX LEAQ -9(CX), DX - LEAQ -8(CX), SI - MOVL SI, 8(SP) + LEAQ -8(CX), BX + MOVL BX, 8(SP) SHRQ $0x05, CX SUBL CX, DX LEAQ (AX)(DX*1), DX @@ -1433,555 +1418,551 @@ zero_loop_encodeBlockAsm4MB: MOVQ src_base+24(FP), DX search_loop_encodeBlockAsm4MB: - MOVL CX, SI - SUBL 12(SP), SI - SHRL $0x06, SI - LEAL 4(CX)(SI*1), SI - CMPL SI, 8(SP) + MOVL CX, BX + SUBL 12(SP), BX + SHRL $0x06, BX + LEAL 4(CX)(BX*1), BX + CMPL BX, 8(SP) JGE emit_remainder_encodeBlockAsm4MB - MOVQ (DX)(CX*1), DI - MOVL SI, 20(SP) - MOVQ $0x0000cf1bbcdcbf9b, R9 - MOVQ DI, R10 - MOVQ DI, R11 - SHRQ $0x08, R11 - SHLQ $0x10, R10 - IMULQ R9, R10 - SHRQ $0x32, R10 - SHLQ $0x10, R11 - IMULQ R9, R11 - SHRQ $0x32, R11 - MOVL 24(SP)(R10*4), SI - MOVL 24(SP)(R11*4), R8 - MOVL CX, 24(SP)(R10*4) - LEAL 1(CX), R10 - MOVL R10, 24(SP)(R11*4) - MOVQ DI, R10 - SHRQ $0x10, R10 + MOVQ (DX)(CX*1), SI + MOVL BX, 20(SP) + MOVQ $0x0000cf1bbcdcbf9b, R8 + MOVQ SI, R9 + MOVQ SI, R10 + SHRQ $0x08, R10 + SHLQ $0x10, R9 + IMULQ R8, R9 + SHRQ $0x32, R9 SHLQ $0x10, R10 - IMULQ R9, R10 + IMULQ R8, R10 SHRQ $0x32, R10 - MOVL CX, R9 - SUBL 16(SP), R9 - MOVL 1(DX)(R9*1), R11 - MOVQ DI, R9 - SHRQ $0x08, R9 - CMPL R9, R11 + MOVL 24(SP)(R9*4), BX + MOVL 24(SP)(R10*4), DI + MOVL CX, 24(SP)(R9*4) + LEAL 1(CX), R9 + MOVL R9, 24(SP)(R10*4) + MOVQ SI, R9 + SHRQ $0x10, R9 + SHLQ $0x10, R9 + IMULQ R8, R9 + SHRQ $0x32, R9 + MOVL CX, R8 + SUBL 16(SP), R8 + MOVL 1(DX)(R8*1), R10 + MOVQ SI, R8 + SHRQ $0x08, R8 + CMPL R8, R10 JNE no_repeat_found_encodeBlockAsm4MB - LEAL 1(CX), DI - MOVL 12(SP), R8 - MOVL DI, SI - SUBL 16(SP), SI + LEAL 1(CX), SI + MOVL 12(SP), DI + MOVL SI, BX + SUBL 16(SP), BX JZ repeat_extend_back_end_encodeBlockAsm4MB repeat_extend_back_loop_encodeBlockAsm4MB: - CMPL DI, R8 + CMPL SI, DI JLE repeat_extend_back_end_encodeBlockAsm4MB - MOVB -1(DX)(SI*1), BL - MOVB -1(DX)(DI*1), R9 - CMPB BL, R9 + MOVB -1(DX)(BX*1), R8 + MOVB -1(DX)(SI*1), R9 + CMPB R8, R9 JNE repeat_extend_back_end_encodeBlockAsm4MB - LEAL -1(DI), DI - DECL SI + LEAL -1(SI), SI + DECL BX JNZ repeat_extend_back_loop_encodeBlockAsm4MB repeat_extend_back_end_encodeBlockAsm4MB: - MOVL 12(SP), SI - CMPL SI, DI + MOVL 12(SP), BX + CMPL BX, SI JEQ emit_literal_done_repeat_emit_encodeBlockAsm4MB - MOVL DI, R9 - MOVL DI, 12(SP) - LEAQ (DX)(SI*1), R10 - SUBL SI, R9 - LEAL -1(R9), SI - CMPL SI, $0x3c + MOVL SI, R8 + MOVL SI, 12(SP) + LEAQ (DX)(BX*1), R9 + SUBL BX, R8 + LEAL -1(R8), BX + CMPL BX, $0x3c JLT one_byte_repeat_emit_encodeBlockAsm4MB - CMPL SI, $0x00000100 + CMPL BX, $0x00000100 JLT two_bytes_repeat_emit_encodeBlockAsm4MB - CMPL SI, $0x00010000 + CMPL BX, $0x00010000 JLT three_bytes_repeat_emit_encodeBlockAsm4MB - MOVL SI, R11 - SHRL $0x10, R11 + MOVL BX, R10 + SHRL $0x10, R10 MOVB $0xf8, (AX) - MOVW SI, 1(AX) - MOVB R11, 3(AX) + MOVW BX, 1(AX) + MOVB R10, 3(AX) ADDQ $0x04, AX JMP memmove_long_repeat_emit_encodeBlockAsm4MB three_bytes_repeat_emit_encodeBlockAsm4MB: MOVB $0xf4, (AX) - MOVW SI, 1(AX) + MOVW BX, 1(AX) ADDQ $0x03, AX JMP memmove_long_repeat_emit_encodeBlockAsm4MB two_bytes_repeat_emit_encodeBlockAsm4MB: MOVB $0xf0, (AX) - MOVB SI, 1(AX) + MOVB BL, 1(AX) ADDQ $0x02, AX - CMPL SI, $0x40 + CMPL BX, $0x40 JL memmove_repeat_emit_encodeBlockAsm4MB JMP memmove_long_repeat_emit_encodeBlockAsm4MB one_byte_repeat_emit_encodeBlockAsm4MB: - SHLB $0x02, SI - MOVB SI, (AX) + SHLB $0x02, BL + MOVB BL, (AX) ADDQ $0x01, AX memmove_repeat_emit_encodeBlockAsm4MB: - LEAQ (AX)(R9*1), SI + LEAQ (AX)(R8*1), BX // genMemMoveShort - CMPQ R9, $0x08 + CMPQ R8, 
$0x08 JLE emit_lit_memmove_repeat_emit_encodeBlockAsm4MB_memmove_move_8 - CMPQ R9, $0x10 + CMPQ R8, $0x10 JBE emit_lit_memmove_repeat_emit_encodeBlockAsm4MB_memmove_move_8through16 - CMPQ R9, $0x20 + CMPQ R8, $0x20 JBE emit_lit_memmove_repeat_emit_encodeBlockAsm4MB_memmove_move_17through32 JMP emit_lit_memmove_repeat_emit_encodeBlockAsm4MB_memmove_move_33through64 emit_lit_memmove_repeat_emit_encodeBlockAsm4MB_memmove_move_8: - MOVQ (R10), R11 - MOVQ R11, (AX) + MOVQ (R9), R10 + MOVQ R10, (AX) JMP memmove_end_copy_repeat_emit_encodeBlockAsm4MB emit_lit_memmove_repeat_emit_encodeBlockAsm4MB_memmove_move_8through16: - MOVQ (R10), R11 - MOVQ -8(R10)(R9*1), R10 - MOVQ R11, (AX) - MOVQ R10, -8(AX)(R9*1) + MOVQ (R9), R10 + MOVQ -8(R9)(R8*1), R9 + MOVQ R10, (AX) + MOVQ R9, -8(AX)(R8*1) JMP memmove_end_copy_repeat_emit_encodeBlockAsm4MB emit_lit_memmove_repeat_emit_encodeBlockAsm4MB_memmove_move_17through32: - MOVOU (R10), X0 - MOVOU -16(R10)(R9*1), X1 + MOVOU (R9), X0 + MOVOU -16(R9)(R8*1), X1 MOVOU X0, (AX) - MOVOU X1, -16(AX)(R9*1) + MOVOU X1, -16(AX)(R8*1) JMP memmove_end_copy_repeat_emit_encodeBlockAsm4MB emit_lit_memmove_repeat_emit_encodeBlockAsm4MB_memmove_move_33through64: - MOVOU (R10), X0 - MOVOU 16(R10), X1 - MOVOU -32(R10)(R9*1), X2 - MOVOU -16(R10)(R9*1), X3 + MOVOU (R9), X0 + MOVOU 16(R9), X1 + MOVOU -32(R9)(R8*1), X2 + MOVOU -16(R9)(R8*1), X3 MOVOU X0, (AX) MOVOU X1, 16(AX) - MOVOU X2, -32(AX)(R9*1) - MOVOU X3, -16(AX)(R9*1) + MOVOU X2, -32(AX)(R8*1) + MOVOU X3, -16(AX)(R8*1) memmove_end_copy_repeat_emit_encodeBlockAsm4MB: - MOVQ SI, AX + MOVQ BX, AX JMP emit_literal_done_repeat_emit_encodeBlockAsm4MB memmove_long_repeat_emit_encodeBlockAsm4MB: - LEAQ (AX)(R9*1), SI + LEAQ (AX)(R8*1), BX // genMemMoveLong - MOVOU (R10), X0 - MOVOU 16(R10), X1 - MOVOU -32(R10)(R9*1), X2 - MOVOU -16(R10)(R9*1), X3 - MOVQ R9, R12 - SHRQ $0x05, R12 - MOVQ AX, R11 - ANDL $0x0000001f, R11 - MOVQ $0x00000040, R13 - SUBQ R11, R13 - DECQ R12 + MOVOU (R9), X0 + MOVOU 16(R9), X1 + MOVOU -32(R9)(R8*1), X2 + MOVOU -16(R9)(R8*1), X3 + MOVQ R8, R11 + SHRQ $0x05, R11 + MOVQ AX, R10 + ANDL $0x0000001f, R10 + MOVQ $0x00000040, R12 + SUBQ R10, R12 + DECQ R11 JA emit_lit_memmove_long_repeat_emit_encodeBlockAsm4MBlarge_forward_sse_loop_32 - LEAQ -32(R10)(R13*1), R11 - LEAQ -32(AX)(R13*1), R14 + LEAQ -32(R9)(R12*1), R10 + LEAQ -32(AX)(R12*1), R13 emit_lit_memmove_long_repeat_emit_encodeBlockAsm4MBlarge_big_loop_back: - MOVOU (R11), X4 - MOVOU 16(R11), X5 - MOVOA X4, (R14) - MOVOA X5, 16(R14) - ADDQ $0x20, R14 - ADDQ $0x20, R11 + MOVOU (R10), X4 + MOVOU 16(R10), X5 + MOVOA X4, (R13) + MOVOA X5, 16(R13) ADDQ $0x20, R13 - DECQ R12 + ADDQ $0x20, R10 + ADDQ $0x20, R12 + DECQ R11 JNA emit_lit_memmove_long_repeat_emit_encodeBlockAsm4MBlarge_big_loop_back emit_lit_memmove_long_repeat_emit_encodeBlockAsm4MBlarge_forward_sse_loop_32: - MOVOU -32(R10)(R13*1), X4 - MOVOU -16(R10)(R13*1), X5 - MOVOA X4, -32(AX)(R13*1) - MOVOA X5, -16(AX)(R13*1) - ADDQ $0x20, R13 - CMPQ R9, R13 + MOVOU -32(R9)(R12*1), X4 + MOVOU -16(R9)(R12*1), X5 + MOVOA X4, -32(AX)(R12*1) + MOVOA X5, -16(AX)(R12*1) + ADDQ $0x20, R12 + CMPQ R8, R12 JAE emit_lit_memmove_long_repeat_emit_encodeBlockAsm4MBlarge_forward_sse_loop_32 MOVOU X0, (AX) MOVOU X1, 16(AX) - MOVOU X2, -32(AX)(R9*1) - MOVOU X3, -16(AX)(R9*1) - MOVQ SI, AX + MOVOU X2, -32(AX)(R8*1) + MOVOU X3, -16(AX)(R8*1) + MOVQ BX, AX emit_literal_done_repeat_emit_encodeBlockAsm4MB: ADDL $0x05, CX - MOVL CX, SI - SUBL 16(SP), SI - MOVQ src_len+32(FP), R9 - SUBL CX, R9 - LEAQ (DX)(CX*1), R10 - LEAQ (DX)(SI*1), SI 
+ MOVL CX, BX + SUBL 16(SP), BX + MOVQ src_len+32(FP), R8 + SUBL CX, R8 + LEAQ (DX)(CX*1), R9 + LEAQ (DX)(BX*1), BX // matchLen - XORL R12, R12 - CMPL R9, $0x08 + XORL R11, R11 + CMPL R8, $0x08 JL matchlen_match4_repeat_extend_encodeBlockAsm4MB matchlen_loopback_repeat_extend_encodeBlockAsm4MB: - MOVQ (R10)(R12*1), R11 - XORQ (SI)(R12*1), R11 - TESTQ R11, R11 + MOVQ (R9)(R11*1), R10 + XORQ (BX)(R11*1), R10 + TESTQ R10, R10 JZ matchlen_loop_repeat_extend_encodeBlockAsm4MB #ifdef GOAMD64_v3 - TZCNTQ R11, R11 + TZCNTQ R10, R10 #else - BSFQ R11, R11 + BSFQ R10, R10 #endif - SARQ $0x03, R11 - LEAL (R12)(R11*1), R12 + SARQ $0x03, R10 + LEAL (R11)(R10*1), R11 JMP repeat_extend_forward_end_encodeBlockAsm4MB matchlen_loop_repeat_extend_encodeBlockAsm4MB: - LEAL -8(R9), R9 - LEAL 8(R12), R12 - CMPL R9, $0x08 + LEAL -8(R8), R8 + LEAL 8(R11), R11 + CMPL R8, $0x08 JGE matchlen_loopback_repeat_extend_encodeBlockAsm4MB JZ repeat_extend_forward_end_encodeBlockAsm4MB matchlen_match4_repeat_extend_encodeBlockAsm4MB: - CMPL R9, $0x04 + CMPL R8, $0x04 JL matchlen_match2_repeat_extend_encodeBlockAsm4MB - MOVL (R10)(R12*1), R11 - CMPL (SI)(R12*1), R11 + MOVL (R9)(R11*1), R10 + CMPL (BX)(R11*1), R10 JNE matchlen_match2_repeat_extend_encodeBlockAsm4MB - SUBL $0x04, R9 - LEAL 4(R12), R12 + SUBL $0x04, R8 + LEAL 4(R11), R11 matchlen_match2_repeat_extend_encodeBlockAsm4MB: - CMPL R9, $0x02 + CMPL R8, $0x02 JL matchlen_match1_repeat_extend_encodeBlockAsm4MB - MOVW (R10)(R12*1), R11 - CMPW (SI)(R12*1), R11 + MOVW (R9)(R11*1), R10 + CMPW (BX)(R11*1), R10 JNE matchlen_match1_repeat_extend_encodeBlockAsm4MB - SUBL $0x02, R9 - LEAL 2(R12), R12 + SUBL $0x02, R8 + LEAL 2(R11), R11 matchlen_match1_repeat_extend_encodeBlockAsm4MB: - CMPL R9, $0x01 + CMPL R8, $0x01 JL repeat_extend_forward_end_encodeBlockAsm4MB - MOVB (R10)(R12*1), R11 - CMPB (SI)(R12*1), R11 + MOVB (R9)(R11*1), R10 + CMPB (BX)(R11*1), R10 JNE repeat_extend_forward_end_encodeBlockAsm4MB - LEAL 1(R12), R12 + LEAL 1(R11), R11 repeat_extend_forward_end_encodeBlockAsm4MB: - ADDL R12, CX - MOVL CX, SI - SUBL DI, SI - MOVL 16(SP), DI - TESTL R8, R8 + ADDL R11, CX + MOVL CX, BX + SUBL SI, BX + MOVL 16(SP), SI + TESTL DI, DI JZ repeat_as_copy_encodeBlockAsm4MB // emitRepeat - MOVL SI, R8 - LEAL -4(SI), SI - CMPL R8, $0x08 + MOVL BX, DI + LEAL -4(BX), BX + CMPL DI, $0x08 JLE repeat_two_match_repeat_encodeBlockAsm4MB - CMPL R8, $0x0c + CMPL DI, $0x0c JGE cant_repeat_two_offset_match_repeat_encodeBlockAsm4MB - CMPL DI, $0x00000800 + CMPL SI, $0x00000800 JLT repeat_two_offset_match_repeat_encodeBlockAsm4MB cant_repeat_two_offset_match_repeat_encodeBlockAsm4MB: - CMPL SI, $0x00000104 + CMPL BX, $0x00000104 JLT repeat_three_match_repeat_encodeBlockAsm4MB - CMPL SI, $0x00010100 + CMPL BX, $0x00010100 JLT repeat_four_match_repeat_encodeBlockAsm4MB - LEAL -65536(SI), SI - MOVL SI, DI + LEAL -65536(BX), BX + MOVL BX, SI MOVW $0x001d, (AX) - MOVW SI, 2(AX) - SARL $0x10, DI - MOVB DI, 4(AX) + MOVW BX, 2(AX) + SARL $0x10, SI + MOVB SI, 4(AX) ADDQ $0x05, AX JMP repeat_end_emit_encodeBlockAsm4MB repeat_four_match_repeat_encodeBlockAsm4MB: - LEAL -256(SI), SI + LEAL -256(BX), BX MOVW $0x0019, (AX) - MOVW SI, 2(AX) + MOVW BX, 2(AX) ADDQ $0x04, AX JMP repeat_end_emit_encodeBlockAsm4MB repeat_three_match_repeat_encodeBlockAsm4MB: - LEAL -4(SI), SI + LEAL -4(BX), BX MOVW $0x0015, (AX) - MOVB SI, 2(AX) + MOVB BL, 2(AX) ADDQ $0x03, AX JMP repeat_end_emit_encodeBlockAsm4MB repeat_two_match_repeat_encodeBlockAsm4MB: - SHLL $0x02, SI - ORL $0x01, SI - MOVW SI, (AX) + SHLL $0x02, BX + ORL 
$0x01, BX + MOVW BX, (AX) ADDQ $0x02, AX JMP repeat_end_emit_encodeBlockAsm4MB repeat_two_offset_match_repeat_encodeBlockAsm4MB: - XORQ R8, R8 - LEAL 1(R8)(SI*4), SI - MOVB DI, 1(AX) - SARL $0x08, DI - SHLL $0x05, DI - ORL DI, SI - MOVB SI, (AX) + XORQ DI, DI + LEAL 1(DI)(BX*4), BX + MOVB SI, 1(AX) + SARL $0x08, SI + SHLL $0x05, SI + ORL SI, BX + MOVB BL, (AX) ADDQ $0x02, AX JMP repeat_end_emit_encodeBlockAsm4MB repeat_as_copy_encodeBlockAsm4MB: // emitCopy - CMPL DI, $0x00010000 + CMPL SI, $0x00010000 JL two_byte_offset_repeat_as_copy_encodeBlockAsm4MB - -four_bytes_loop_back_repeat_as_copy_encodeBlockAsm4MB: - CMPL SI, $0x40 + CMPL BX, $0x40 JLE four_bytes_remain_repeat_as_copy_encodeBlockAsm4MB MOVB $0xff, (AX) - MOVL DI, 1(AX) - LEAL -64(SI), SI + MOVL SI, 1(AX) + LEAL -64(BX), BX ADDQ $0x05, AX - CMPL SI, $0x04 + CMPL BX, $0x04 JL four_bytes_remain_repeat_as_copy_encodeBlockAsm4MB // emitRepeat - MOVL SI, R8 - LEAL -4(SI), SI - CMPL R8, $0x08 + MOVL BX, DI + LEAL -4(BX), BX + CMPL DI, $0x08 JLE repeat_two_repeat_as_copy_encodeBlockAsm4MB_emit_copy - CMPL R8, $0x0c + CMPL DI, $0x0c JGE cant_repeat_two_offset_repeat_as_copy_encodeBlockAsm4MB_emit_copy - CMPL DI, $0x00000800 + CMPL SI, $0x00000800 JLT repeat_two_offset_repeat_as_copy_encodeBlockAsm4MB_emit_copy cant_repeat_two_offset_repeat_as_copy_encodeBlockAsm4MB_emit_copy: - CMPL SI, $0x00000104 + CMPL BX, $0x00000104 JLT repeat_three_repeat_as_copy_encodeBlockAsm4MB_emit_copy - CMPL SI, $0x00010100 + CMPL BX, $0x00010100 JLT repeat_four_repeat_as_copy_encodeBlockAsm4MB_emit_copy - LEAL -65536(SI), SI - MOVL SI, DI + LEAL -65536(BX), BX + MOVL BX, SI MOVW $0x001d, (AX) - MOVW SI, 2(AX) - SARL $0x10, DI - MOVB DI, 4(AX) + MOVW BX, 2(AX) + SARL $0x10, SI + MOVB SI, 4(AX) ADDQ $0x05, AX JMP repeat_end_emit_encodeBlockAsm4MB repeat_four_repeat_as_copy_encodeBlockAsm4MB_emit_copy: - LEAL -256(SI), SI + LEAL -256(BX), BX MOVW $0x0019, (AX) - MOVW SI, 2(AX) + MOVW BX, 2(AX) ADDQ $0x04, AX JMP repeat_end_emit_encodeBlockAsm4MB repeat_three_repeat_as_copy_encodeBlockAsm4MB_emit_copy: - LEAL -4(SI), SI + LEAL -4(BX), BX MOVW $0x0015, (AX) - MOVB SI, 2(AX) + MOVB BL, 2(AX) ADDQ $0x03, AX JMP repeat_end_emit_encodeBlockAsm4MB repeat_two_repeat_as_copy_encodeBlockAsm4MB_emit_copy: - SHLL $0x02, SI - ORL $0x01, SI - MOVW SI, (AX) + SHLL $0x02, BX + ORL $0x01, BX + MOVW BX, (AX) ADDQ $0x02, AX JMP repeat_end_emit_encodeBlockAsm4MB repeat_two_offset_repeat_as_copy_encodeBlockAsm4MB_emit_copy: - XORQ R8, R8 - LEAL 1(R8)(SI*4), SI - MOVB DI, 1(AX) - SARL $0x08, DI - SHLL $0x05, DI - ORL DI, SI - MOVB SI, (AX) + XORQ DI, DI + LEAL 1(DI)(BX*4), BX + MOVB SI, 1(AX) + SARL $0x08, SI + SHLL $0x05, SI + ORL SI, BX + MOVB BL, (AX) ADDQ $0x02, AX JMP repeat_end_emit_encodeBlockAsm4MB - JMP four_bytes_loop_back_repeat_as_copy_encodeBlockAsm4MB four_bytes_remain_repeat_as_copy_encodeBlockAsm4MB: - TESTL SI, SI + TESTL BX, BX JZ repeat_end_emit_encodeBlockAsm4MB - MOVB $0x03, BL - LEAL -4(BX)(SI*4), SI - MOVB SI, (AX) - MOVL DI, 1(AX) + XORL DI, DI + LEAL -1(DI)(BX*4), BX + MOVB BL, (AX) + MOVL SI, 1(AX) ADDQ $0x05, AX JMP repeat_end_emit_encodeBlockAsm4MB two_byte_offset_repeat_as_copy_encodeBlockAsm4MB: - CMPL SI, $0x40 + CMPL BX, $0x40 JLE two_byte_offset_short_repeat_as_copy_encodeBlockAsm4MB - CMPL DI, $0x00000800 + CMPL SI, $0x00000800 JAE long_offset_short_repeat_as_copy_encodeBlockAsm4MB - MOVL $0x00000001, R8 - LEAL 16(R8), R8 - MOVB DI, 1(AX) - SHRL $0x08, DI - SHLL $0x05, DI - ORL DI, R8 - MOVB R8, (AX) + MOVL $0x00000001, DI + LEAL 16(DI), DI + MOVB 
SI, 1(AX) + SHRL $0x08, SI + SHLL $0x05, SI + ORL SI, DI + MOVB DI, (AX) ADDQ $0x02, AX - SUBL $0x08, SI + SUBL $0x08, BX // emitRepeat - LEAL -4(SI), SI + LEAL -4(BX), BX JMP cant_repeat_two_offset_repeat_as_copy_encodeBlockAsm4MB_emit_copy_short_2b - MOVL SI, R8 - LEAL -4(SI), SI - CMPL R8, $0x08 + MOVL BX, DI + LEAL -4(BX), BX + CMPL DI, $0x08 JLE repeat_two_repeat_as_copy_encodeBlockAsm4MB_emit_copy_short_2b - CMPL R8, $0x0c + CMPL DI, $0x0c JGE cant_repeat_two_offset_repeat_as_copy_encodeBlockAsm4MB_emit_copy_short_2b - CMPL DI, $0x00000800 + CMPL SI, $0x00000800 JLT repeat_two_offset_repeat_as_copy_encodeBlockAsm4MB_emit_copy_short_2b cant_repeat_two_offset_repeat_as_copy_encodeBlockAsm4MB_emit_copy_short_2b: - CMPL SI, $0x00000104 + CMPL BX, $0x00000104 JLT repeat_three_repeat_as_copy_encodeBlockAsm4MB_emit_copy_short_2b - CMPL SI, $0x00010100 + CMPL BX, $0x00010100 JLT repeat_four_repeat_as_copy_encodeBlockAsm4MB_emit_copy_short_2b - LEAL -65536(SI), SI - MOVL SI, DI + LEAL -65536(BX), BX + MOVL BX, SI MOVW $0x001d, (AX) - MOVW SI, 2(AX) - SARL $0x10, DI - MOVB DI, 4(AX) + MOVW BX, 2(AX) + SARL $0x10, SI + MOVB SI, 4(AX) ADDQ $0x05, AX JMP repeat_end_emit_encodeBlockAsm4MB repeat_four_repeat_as_copy_encodeBlockAsm4MB_emit_copy_short_2b: - LEAL -256(SI), SI + LEAL -256(BX), BX MOVW $0x0019, (AX) - MOVW SI, 2(AX) + MOVW BX, 2(AX) ADDQ $0x04, AX JMP repeat_end_emit_encodeBlockAsm4MB repeat_three_repeat_as_copy_encodeBlockAsm4MB_emit_copy_short_2b: - LEAL -4(SI), SI + LEAL -4(BX), BX MOVW $0x0015, (AX) - MOVB SI, 2(AX) + MOVB BL, 2(AX) ADDQ $0x03, AX JMP repeat_end_emit_encodeBlockAsm4MB repeat_two_repeat_as_copy_encodeBlockAsm4MB_emit_copy_short_2b: - SHLL $0x02, SI - ORL $0x01, SI - MOVW SI, (AX) + SHLL $0x02, BX + ORL $0x01, BX + MOVW BX, (AX) ADDQ $0x02, AX JMP repeat_end_emit_encodeBlockAsm4MB repeat_two_offset_repeat_as_copy_encodeBlockAsm4MB_emit_copy_short_2b: - XORQ R8, R8 - LEAL 1(R8)(SI*4), SI - MOVB DI, 1(AX) - SARL $0x08, DI - SHLL $0x05, DI - ORL DI, SI - MOVB SI, (AX) + XORQ DI, DI + LEAL 1(DI)(BX*4), BX + MOVB SI, 1(AX) + SARL $0x08, SI + SHLL $0x05, SI + ORL SI, BX + MOVB BL, (AX) ADDQ $0x02, AX JMP repeat_end_emit_encodeBlockAsm4MB long_offset_short_repeat_as_copy_encodeBlockAsm4MB: MOVB $0xee, (AX) - MOVW DI, 1(AX) - LEAL -60(SI), SI + MOVW SI, 1(AX) + LEAL -60(BX), BX ADDQ $0x03, AX // emitRepeat - MOVL SI, R8 - LEAL -4(SI), SI - CMPL R8, $0x08 + MOVL BX, DI + LEAL -4(BX), BX + CMPL DI, $0x08 JLE repeat_two_repeat_as_copy_encodeBlockAsm4MB_emit_copy_short - CMPL R8, $0x0c + CMPL DI, $0x0c JGE cant_repeat_two_offset_repeat_as_copy_encodeBlockAsm4MB_emit_copy_short - CMPL DI, $0x00000800 + CMPL SI, $0x00000800 JLT repeat_two_offset_repeat_as_copy_encodeBlockAsm4MB_emit_copy_short cant_repeat_two_offset_repeat_as_copy_encodeBlockAsm4MB_emit_copy_short: - CMPL SI, $0x00000104 + CMPL BX, $0x00000104 JLT repeat_three_repeat_as_copy_encodeBlockAsm4MB_emit_copy_short - CMPL SI, $0x00010100 + CMPL BX, $0x00010100 JLT repeat_four_repeat_as_copy_encodeBlockAsm4MB_emit_copy_short - LEAL -65536(SI), SI - MOVL SI, DI + LEAL -65536(BX), BX + MOVL BX, SI MOVW $0x001d, (AX) - MOVW SI, 2(AX) - SARL $0x10, DI - MOVB DI, 4(AX) + MOVW BX, 2(AX) + SARL $0x10, SI + MOVB SI, 4(AX) ADDQ $0x05, AX JMP repeat_end_emit_encodeBlockAsm4MB repeat_four_repeat_as_copy_encodeBlockAsm4MB_emit_copy_short: - LEAL -256(SI), SI + LEAL -256(BX), BX MOVW $0x0019, (AX) - MOVW SI, 2(AX) + MOVW BX, 2(AX) ADDQ $0x04, AX JMP repeat_end_emit_encodeBlockAsm4MB 
repeat_three_repeat_as_copy_encodeBlockAsm4MB_emit_copy_short: - LEAL -4(SI), SI + LEAL -4(BX), BX MOVW $0x0015, (AX) - MOVB SI, 2(AX) + MOVB BL, 2(AX) ADDQ $0x03, AX JMP repeat_end_emit_encodeBlockAsm4MB repeat_two_repeat_as_copy_encodeBlockAsm4MB_emit_copy_short: - SHLL $0x02, SI - ORL $0x01, SI - MOVW SI, (AX) + SHLL $0x02, BX + ORL $0x01, BX + MOVW BX, (AX) ADDQ $0x02, AX JMP repeat_end_emit_encodeBlockAsm4MB repeat_two_offset_repeat_as_copy_encodeBlockAsm4MB_emit_copy_short: - XORQ R8, R8 - LEAL 1(R8)(SI*4), SI - MOVB DI, 1(AX) - SARL $0x08, DI - SHLL $0x05, DI - ORL DI, SI - MOVB SI, (AX) + XORQ DI, DI + LEAL 1(DI)(BX*4), BX + MOVB SI, 1(AX) + SARL $0x08, SI + SHLL $0x05, SI + ORL SI, BX + MOVB BL, (AX) ADDQ $0x02, AX JMP repeat_end_emit_encodeBlockAsm4MB - JMP two_byte_offset_repeat_as_copy_encodeBlockAsm4MB two_byte_offset_short_repeat_as_copy_encodeBlockAsm4MB: - CMPL SI, $0x0c + MOVL BX, DI + SHLL $0x02, DI + CMPL BX, $0x0c JGE emit_copy_three_repeat_as_copy_encodeBlockAsm4MB - CMPL DI, $0x00000800 + CMPL SI, $0x00000800 JGE emit_copy_three_repeat_as_copy_encodeBlockAsm4MB - MOVB $0x01, BL - LEAL -16(BX)(SI*4), SI - MOVB DI, 1(AX) - SHRL $0x08, DI - SHLL $0x05, DI - ORL DI, SI - MOVB SI, (AX) + LEAL -15(DI), DI + MOVB SI, 1(AX) + SHRL $0x08, SI + SHLL $0x05, SI + ORL SI, DI + MOVB DI, (AX) ADDQ $0x02, AX JMP repeat_end_emit_encodeBlockAsm4MB emit_copy_three_repeat_as_copy_encodeBlockAsm4MB: - MOVB $0x02, BL - LEAL -4(BX)(SI*4), SI - MOVB SI, (AX) - MOVW DI, 1(AX) + LEAL -2(DI), DI + MOVB DI, (AX) + MOVW SI, 1(AX) ADDQ $0x03, AX repeat_end_emit_encodeBlockAsm4MB: @@ -1989,16 +1970,16 @@ repeat_end_emit_encodeBlockAsm4MB: JMP search_loop_encodeBlockAsm4MB no_repeat_found_encodeBlockAsm4MB: - CMPL (DX)(SI*1), DI + CMPL (DX)(BX*1), SI JEQ candidate_match_encodeBlockAsm4MB - SHRQ $0x08, DI - MOVL 24(SP)(R10*4), SI - LEAL 2(CX), R9 - CMPL (DX)(R8*1), DI + SHRQ $0x08, SI + MOVL 24(SP)(R9*4), BX + LEAL 2(CX), R8 + CMPL (DX)(DI*1), SI JEQ candidate2_match_encodeBlockAsm4MB - MOVL R9, 24(SP)(R10*4) - SHRQ $0x08, DI - CMPL (DX)(SI*1), DI + MOVL R8, 24(SP)(R9*4) + SHRQ $0x08, SI + CMPL (DX)(BX*1), SI JEQ candidate3_match_encodeBlockAsm4MB MOVL 20(SP), CX JMP search_loop_encodeBlockAsm4MB @@ -2008,506 +1989,502 @@ candidate3_match_encodeBlockAsm4MB: JMP candidate_match_encodeBlockAsm4MB candidate2_match_encodeBlockAsm4MB: - MOVL R9, 24(SP)(R10*4) + MOVL R8, 24(SP)(R9*4) INCL CX - MOVL R8, SI + MOVL DI, BX candidate_match_encodeBlockAsm4MB: - MOVL 12(SP), DI - TESTL SI, SI + MOVL 12(SP), SI + TESTL BX, BX JZ match_extend_back_end_encodeBlockAsm4MB match_extend_back_loop_encodeBlockAsm4MB: - CMPL CX, DI + CMPL CX, SI JLE match_extend_back_end_encodeBlockAsm4MB - MOVB -1(DX)(SI*1), BL + MOVB -1(DX)(BX*1), DI MOVB -1(DX)(CX*1), R8 - CMPB BL, R8 + CMPB DI, R8 JNE match_extend_back_end_encodeBlockAsm4MB LEAL -1(CX), CX - DECL SI + DECL BX JZ match_extend_back_end_encodeBlockAsm4MB JMP match_extend_back_loop_encodeBlockAsm4MB match_extend_back_end_encodeBlockAsm4MB: - MOVL CX, DI - SUBL 12(SP), DI - LEAQ 4(AX)(DI*1), DI - CMPQ DI, (SP) + MOVL CX, SI + SUBL 12(SP), SI + LEAQ 4(AX)(SI*1), SI + CMPQ SI, (SP) JL match_dst_size_check_encodeBlockAsm4MB MOVQ $0x00000000, ret+48(FP) RET match_dst_size_check_encodeBlockAsm4MB: - MOVL CX, DI - MOVL 12(SP), R8 - CMPL R8, DI + MOVL CX, SI + MOVL 12(SP), DI + CMPL DI, SI JEQ emit_literal_done_match_emit_encodeBlockAsm4MB - MOVL DI, R9 - MOVL DI, 12(SP) - LEAQ (DX)(R8*1), DI - SUBL R8, R9 - LEAL -1(R9), R8 - CMPL R8, $0x3c + MOVL SI, R8 + MOVL SI, 12(SP) + LEAQ 
(DX)(DI*1), SI + SUBL DI, R8 + LEAL -1(R8), DI + CMPL DI, $0x3c JLT one_byte_match_emit_encodeBlockAsm4MB - CMPL R8, $0x00000100 + CMPL DI, $0x00000100 JLT two_bytes_match_emit_encodeBlockAsm4MB - CMPL R8, $0x00010000 + CMPL DI, $0x00010000 JLT three_bytes_match_emit_encodeBlockAsm4MB - MOVL R8, R10 - SHRL $0x10, R10 + MOVL DI, R9 + SHRL $0x10, R9 MOVB $0xf8, (AX) - MOVW R8, 1(AX) - MOVB R10, 3(AX) + MOVW DI, 1(AX) + MOVB R9, 3(AX) ADDQ $0x04, AX JMP memmove_long_match_emit_encodeBlockAsm4MB three_bytes_match_emit_encodeBlockAsm4MB: MOVB $0xf4, (AX) - MOVW R8, 1(AX) + MOVW DI, 1(AX) ADDQ $0x03, AX JMP memmove_long_match_emit_encodeBlockAsm4MB two_bytes_match_emit_encodeBlockAsm4MB: MOVB $0xf0, (AX) - MOVB R8, 1(AX) + MOVB DI, 1(AX) ADDQ $0x02, AX - CMPL R8, $0x40 + CMPL DI, $0x40 JL memmove_match_emit_encodeBlockAsm4MB JMP memmove_long_match_emit_encodeBlockAsm4MB one_byte_match_emit_encodeBlockAsm4MB: - SHLB $0x02, R8 - MOVB R8, (AX) + SHLB $0x02, DI + MOVB DI, (AX) ADDQ $0x01, AX memmove_match_emit_encodeBlockAsm4MB: - LEAQ (AX)(R9*1), R8 + LEAQ (AX)(R8*1), DI // genMemMoveShort - CMPQ R9, $0x08 + CMPQ R8, $0x08 JLE emit_lit_memmove_match_emit_encodeBlockAsm4MB_memmove_move_8 - CMPQ R9, $0x10 + CMPQ R8, $0x10 JBE emit_lit_memmove_match_emit_encodeBlockAsm4MB_memmove_move_8through16 - CMPQ R9, $0x20 + CMPQ R8, $0x20 JBE emit_lit_memmove_match_emit_encodeBlockAsm4MB_memmove_move_17through32 JMP emit_lit_memmove_match_emit_encodeBlockAsm4MB_memmove_move_33through64 emit_lit_memmove_match_emit_encodeBlockAsm4MB_memmove_move_8: - MOVQ (DI), R10 - MOVQ R10, (AX) + MOVQ (SI), R9 + MOVQ R9, (AX) JMP memmove_end_copy_match_emit_encodeBlockAsm4MB emit_lit_memmove_match_emit_encodeBlockAsm4MB_memmove_move_8through16: - MOVQ (DI), R10 - MOVQ -8(DI)(R9*1), DI - MOVQ R10, (AX) - MOVQ DI, -8(AX)(R9*1) + MOVQ (SI), R9 + MOVQ -8(SI)(R8*1), SI + MOVQ R9, (AX) + MOVQ SI, -8(AX)(R8*1) JMP memmove_end_copy_match_emit_encodeBlockAsm4MB emit_lit_memmove_match_emit_encodeBlockAsm4MB_memmove_move_17through32: - MOVOU (DI), X0 - MOVOU -16(DI)(R9*1), X1 + MOVOU (SI), X0 + MOVOU -16(SI)(R8*1), X1 MOVOU X0, (AX) - MOVOU X1, -16(AX)(R9*1) + MOVOU X1, -16(AX)(R8*1) JMP memmove_end_copy_match_emit_encodeBlockAsm4MB emit_lit_memmove_match_emit_encodeBlockAsm4MB_memmove_move_33through64: - MOVOU (DI), X0 - MOVOU 16(DI), X1 - MOVOU -32(DI)(R9*1), X2 - MOVOU -16(DI)(R9*1), X3 + MOVOU (SI), X0 + MOVOU 16(SI), X1 + MOVOU -32(SI)(R8*1), X2 + MOVOU -16(SI)(R8*1), X3 MOVOU X0, (AX) MOVOU X1, 16(AX) - MOVOU X2, -32(AX)(R9*1) - MOVOU X3, -16(AX)(R9*1) + MOVOU X2, -32(AX)(R8*1) + MOVOU X3, -16(AX)(R8*1) memmove_end_copy_match_emit_encodeBlockAsm4MB: - MOVQ R8, AX + MOVQ DI, AX JMP emit_literal_done_match_emit_encodeBlockAsm4MB memmove_long_match_emit_encodeBlockAsm4MB: - LEAQ (AX)(R9*1), R8 + LEAQ (AX)(R8*1), DI // genMemMoveLong - MOVOU (DI), X0 - MOVOU 16(DI), X1 - MOVOU -32(DI)(R9*1), X2 - MOVOU -16(DI)(R9*1), X3 - MOVQ R9, R11 - SHRQ $0x05, R11 - MOVQ AX, R10 - ANDL $0x0000001f, R10 - MOVQ $0x00000040, R12 - SUBQ R10, R12 - DECQ R11 + MOVOU (SI), X0 + MOVOU 16(SI), X1 + MOVOU -32(SI)(R8*1), X2 + MOVOU -16(SI)(R8*1), X3 + MOVQ R8, R10 + SHRQ $0x05, R10 + MOVQ AX, R9 + ANDL $0x0000001f, R9 + MOVQ $0x00000040, R11 + SUBQ R9, R11 + DECQ R10 JA emit_lit_memmove_long_match_emit_encodeBlockAsm4MBlarge_forward_sse_loop_32 - LEAQ -32(DI)(R12*1), R10 - LEAQ -32(AX)(R12*1), R13 + LEAQ -32(SI)(R11*1), R9 + LEAQ -32(AX)(R11*1), R12 emit_lit_memmove_long_match_emit_encodeBlockAsm4MBlarge_big_loop_back: - MOVOU (R10), X4 - MOVOU 16(R10), X5 
- MOVOA X4, (R13) - MOVOA X5, 16(R13) - ADDQ $0x20, R13 - ADDQ $0x20, R10 + MOVOU (R9), X4 + MOVOU 16(R9), X5 + MOVOA X4, (R12) + MOVOA X5, 16(R12) ADDQ $0x20, R12 - DECQ R11 + ADDQ $0x20, R9 + ADDQ $0x20, R11 + DECQ R10 JNA emit_lit_memmove_long_match_emit_encodeBlockAsm4MBlarge_big_loop_back emit_lit_memmove_long_match_emit_encodeBlockAsm4MBlarge_forward_sse_loop_32: - MOVOU -32(DI)(R12*1), X4 - MOVOU -16(DI)(R12*1), X5 - MOVOA X4, -32(AX)(R12*1) - MOVOA X5, -16(AX)(R12*1) - ADDQ $0x20, R12 - CMPQ R9, R12 + MOVOU -32(SI)(R11*1), X4 + MOVOU -16(SI)(R11*1), X5 + MOVOA X4, -32(AX)(R11*1) + MOVOA X5, -16(AX)(R11*1) + ADDQ $0x20, R11 + CMPQ R8, R11 JAE emit_lit_memmove_long_match_emit_encodeBlockAsm4MBlarge_forward_sse_loop_32 MOVOU X0, (AX) MOVOU X1, 16(AX) - MOVOU X2, -32(AX)(R9*1) - MOVOU X3, -16(AX)(R9*1) - MOVQ R8, AX + MOVOU X2, -32(AX)(R8*1) + MOVOU X3, -16(AX)(R8*1) + MOVQ DI, AX emit_literal_done_match_emit_encodeBlockAsm4MB: match_nolit_loop_encodeBlockAsm4MB: - MOVL CX, DI - SUBL SI, DI - MOVL DI, 16(SP) + MOVL CX, SI + SUBL BX, SI + MOVL SI, 16(SP) ADDL $0x04, CX - ADDL $0x04, SI - MOVQ src_len+32(FP), DI - SUBL CX, DI - LEAQ (DX)(CX*1), R8 - LEAQ (DX)(SI*1), SI + ADDL $0x04, BX + MOVQ src_len+32(FP), SI + SUBL CX, SI + LEAQ (DX)(CX*1), DI + LEAQ (DX)(BX*1), BX // matchLen - XORL R10, R10 - CMPL DI, $0x08 + XORL R9, R9 + CMPL SI, $0x08 JL matchlen_match4_match_nolit_encodeBlockAsm4MB matchlen_loopback_match_nolit_encodeBlockAsm4MB: - MOVQ (R8)(R10*1), R9 - XORQ (SI)(R10*1), R9 - TESTQ R9, R9 + MOVQ (DI)(R9*1), R8 + XORQ (BX)(R9*1), R8 + TESTQ R8, R8 JZ matchlen_loop_match_nolit_encodeBlockAsm4MB #ifdef GOAMD64_v3 - TZCNTQ R9, R9 + TZCNTQ R8, R8 #else - BSFQ R9, R9 + BSFQ R8, R8 #endif - SARQ $0x03, R9 - LEAL (R10)(R9*1), R10 + SARQ $0x03, R8 + LEAL (R9)(R8*1), R9 JMP match_nolit_end_encodeBlockAsm4MB matchlen_loop_match_nolit_encodeBlockAsm4MB: - LEAL -8(DI), DI - LEAL 8(R10), R10 - CMPL DI, $0x08 + LEAL -8(SI), SI + LEAL 8(R9), R9 + CMPL SI, $0x08 JGE matchlen_loopback_match_nolit_encodeBlockAsm4MB JZ match_nolit_end_encodeBlockAsm4MB matchlen_match4_match_nolit_encodeBlockAsm4MB: - CMPL DI, $0x04 + CMPL SI, $0x04 JL matchlen_match2_match_nolit_encodeBlockAsm4MB - MOVL (R8)(R10*1), R9 - CMPL (SI)(R10*1), R9 + MOVL (DI)(R9*1), R8 + CMPL (BX)(R9*1), R8 JNE matchlen_match2_match_nolit_encodeBlockAsm4MB - SUBL $0x04, DI - LEAL 4(R10), R10 + SUBL $0x04, SI + LEAL 4(R9), R9 matchlen_match2_match_nolit_encodeBlockAsm4MB: - CMPL DI, $0x02 + CMPL SI, $0x02 JL matchlen_match1_match_nolit_encodeBlockAsm4MB - MOVW (R8)(R10*1), R9 - CMPW (SI)(R10*1), R9 + MOVW (DI)(R9*1), R8 + CMPW (BX)(R9*1), R8 JNE matchlen_match1_match_nolit_encodeBlockAsm4MB - SUBL $0x02, DI - LEAL 2(R10), R10 + SUBL $0x02, SI + LEAL 2(R9), R9 matchlen_match1_match_nolit_encodeBlockAsm4MB: - CMPL DI, $0x01 + CMPL SI, $0x01 JL match_nolit_end_encodeBlockAsm4MB - MOVB (R8)(R10*1), R9 - CMPB (SI)(R10*1), R9 + MOVB (DI)(R9*1), R8 + CMPB (BX)(R9*1), R8 JNE match_nolit_end_encodeBlockAsm4MB - LEAL 1(R10), R10 + LEAL 1(R9), R9 match_nolit_end_encodeBlockAsm4MB: - ADDL R10, CX - MOVL 16(SP), SI - ADDL $0x04, R10 + ADDL R9, CX + MOVL 16(SP), BX + ADDL $0x04, R9 MOVL CX, 12(SP) // emitCopy - CMPL SI, $0x00010000 + CMPL BX, $0x00010000 JL two_byte_offset_match_nolit_encodeBlockAsm4MB - -four_bytes_loop_back_match_nolit_encodeBlockAsm4MB: - CMPL R10, $0x40 + CMPL R9, $0x40 JLE four_bytes_remain_match_nolit_encodeBlockAsm4MB MOVB $0xff, (AX) - MOVL SI, 1(AX) - LEAL -64(R10), R10 + MOVL BX, 1(AX) + LEAL -64(R9), R9 ADDQ $0x05, AX - 
CMPL R10, $0x04 + CMPL R9, $0x04 JL four_bytes_remain_match_nolit_encodeBlockAsm4MB // emitRepeat - MOVL R10, DI - LEAL -4(R10), R10 - CMPL DI, $0x08 + MOVL R9, SI + LEAL -4(R9), R9 + CMPL SI, $0x08 JLE repeat_two_match_nolit_encodeBlockAsm4MB_emit_copy - CMPL DI, $0x0c + CMPL SI, $0x0c JGE cant_repeat_two_offset_match_nolit_encodeBlockAsm4MB_emit_copy - CMPL SI, $0x00000800 + CMPL BX, $0x00000800 JLT repeat_two_offset_match_nolit_encodeBlockAsm4MB_emit_copy cant_repeat_two_offset_match_nolit_encodeBlockAsm4MB_emit_copy: - CMPL R10, $0x00000104 + CMPL R9, $0x00000104 JLT repeat_three_match_nolit_encodeBlockAsm4MB_emit_copy - CMPL R10, $0x00010100 + CMPL R9, $0x00010100 JLT repeat_four_match_nolit_encodeBlockAsm4MB_emit_copy - LEAL -65536(R10), R10 - MOVL R10, SI + LEAL -65536(R9), R9 + MOVL R9, BX MOVW $0x001d, (AX) - MOVW R10, 2(AX) - SARL $0x10, SI - MOVB SI, 4(AX) + MOVW R9, 2(AX) + SARL $0x10, BX + MOVB BL, 4(AX) ADDQ $0x05, AX JMP match_nolit_emitcopy_end_encodeBlockAsm4MB repeat_four_match_nolit_encodeBlockAsm4MB_emit_copy: - LEAL -256(R10), R10 + LEAL -256(R9), R9 MOVW $0x0019, (AX) - MOVW R10, 2(AX) + MOVW R9, 2(AX) ADDQ $0x04, AX JMP match_nolit_emitcopy_end_encodeBlockAsm4MB repeat_three_match_nolit_encodeBlockAsm4MB_emit_copy: - LEAL -4(R10), R10 + LEAL -4(R9), R9 MOVW $0x0015, (AX) - MOVB R10, 2(AX) + MOVB R9, 2(AX) ADDQ $0x03, AX JMP match_nolit_emitcopy_end_encodeBlockAsm4MB repeat_two_match_nolit_encodeBlockAsm4MB_emit_copy: - SHLL $0x02, R10 - ORL $0x01, R10 - MOVW R10, (AX) + SHLL $0x02, R9 + ORL $0x01, R9 + MOVW R9, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeBlockAsm4MB repeat_two_offset_match_nolit_encodeBlockAsm4MB_emit_copy: - XORQ DI, DI - LEAL 1(DI)(R10*4), R10 - MOVB SI, 1(AX) - SARL $0x08, SI - SHLL $0x05, SI - ORL SI, R10 - MOVB R10, (AX) + XORQ SI, SI + LEAL 1(SI)(R9*4), R9 + MOVB BL, 1(AX) + SARL $0x08, BX + SHLL $0x05, BX + ORL BX, R9 + MOVB R9, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeBlockAsm4MB - JMP four_bytes_loop_back_match_nolit_encodeBlockAsm4MB four_bytes_remain_match_nolit_encodeBlockAsm4MB: - TESTL R10, R10 + TESTL R9, R9 JZ match_nolit_emitcopy_end_encodeBlockAsm4MB - MOVB $0x03, BL - LEAL -4(BX)(R10*4), R10 - MOVB R10, (AX) - MOVL SI, 1(AX) + XORL SI, SI + LEAL -1(SI)(R9*4), R9 + MOVB R9, (AX) + MOVL BX, 1(AX) ADDQ $0x05, AX JMP match_nolit_emitcopy_end_encodeBlockAsm4MB two_byte_offset_match_nolit_encodeBlockAsm4MB: - CMPL R10, $0x40 + CMPL R9, $0x40 JLE two_byte_offset_short_match_nolit_encodeBlockAsm4MB - CMPL SI, $0x00000800 + CMPL BX, $0x00000800 JAE long_offset_short_match_nolit_encodeBlockAsm4MB - MOVL $0x00000001, DI - LEAL 16(DI), DI - MOVB SI, 1(AX) - SHRL $0x08, SI - SHLL $0x05, SI - ORL SI, DI - MOVB DI, (AX) + MOVL $0x00000001, SI + LEAL 16(SI), SI + MOVB BL, 1(AX) + SHRL $0x08, BX + SHLL $0x05, BX + ORL BX, SI + MOVB SI, (AX) ADDQ $0x02, AX - SUBL $0x08, R10 + SUBL $0x08, R9 // emitRepeat - LEAL -4(R10), R10 + LEAL -4(R9), R9 JMP cant_repeat_two_offset_match_nolit_encodeBlockAsm4MB_emit_copy_short_2b - MOVL R10, DI - LEAL -4(R10), R10 - CMPL DI, $0x08 + MOVL R9, SI + LEAL -4(R9), R9 + CMPL SI, $0x08 JLE repeat_two_match_nolit_encodeBlockAsm4MB_emit_copy_short_2b - CMPL DI, $0x0c + CMPL SI, $0x0c JGE cant_repeat_two_offset_match_nolit_encodeBlockAsm4MB_emit_copy_short_2b - CMPL SI, $0x00000800 + CMPL BX, $0x00000800 JLT repeat_two_offset_match_nolit_encodeBlockAsm4MB_emit_copy_short_2b cant_repeat_two_offset_match_nolit_encodeBlockAsm4MB_emit_copy_short_2b: - CMPL R10, $0x00000104 + CMPL R9, $0x00000104 JLT 
repeat_three_match_nolit_encodeBlockAsm4MB_emit_copy_short_2b - CMPL R10, $0x00010100 + CMPL R9, $0x00010100 JLT repeat_four_match_nolit_encodeBlockAsm4MB_emit_copy_short_2b - LEAL -65536(R10), R10 - MOVL R10, SI + LEAL -65536(R9), R9 + MOVL R9, BX MOVW $0x001d, (AX) - MOVW R10, 2(AX) - SARL $0x10, SI - MOVB SI, 4(AX) + MOVW R9, 2(AX) + SARL $0x10, BX + MOVB BL, 4(AX) ADDQ $0x05, AX JMP match_nolit_emitcopy_end_encodeBlockAsm4MB repeat_four_match_nolit_encodeBlockAsm4MB_emit_copy_short_2b: - LEAL -256(R10), R10 + LEAL -256(R9), R9 MOVW $0x0019, (AX) - MOVW R10, 2(AX) + MOVW R9, 2(AX) ADDQ $0x04, AX JMP match_nolit_emitcopy_end_encodeBlockAsm4MB repeat_three_match_nolit_encodeBlockAsm4MB_emit_copy_short_2b: - LEAL -4(R10), R10 + LEAL -4(R9), R9 MOVW $0x0015, (AX) - MOVB R10, 2(AX) + MOVB R9, 2(AX) ADDQ $0x03, AX JMP match_nolit_emitcopy_end_encodeBlockAsm4MB repeat_two_match_nolit_encodeBlockAsm4MB_emit_copy_short_2b: - SHLL $0x02, R10 - ORL $0x01, R10 - MOVW R10, (AX) + SHLL $0x02, R9 + ORL $0x01, R9 + MOVW R9, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeBlockAsm4MB repeat_two_offset_match_nolit_encodeBlockAsm4MB_emit_copy_short_2b: - XORQ DI, DI - LEAL 1(DI)(R10*4), R10 - MOVB SI, 1(AX) - SARL $0x08, SI - SHLL $0x05, SI - ORL SI, R10 - MOVB R10, (AX) + XORQ SI, SI + LEAL 1(SI)(R9*4), R9 + MOVB BL, 1(AX) + SARL $0x08, BX + SHLL $0x05, BX + ORL BX, R9 + MOVB R9, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeBlockAsm4MB long_offset_short_match_nolit_encodeBlockAsm4MB: MOVB $0xee, (AX) - MOVW SI, 1(AX) - LEAL -60(R10), R10 + MOVW BX, 1(AX) + LEAL -60(R9), R9 ADDQ $0x03, AX // emitRepeat - MOVL R10, DI - LEAL -4(R10), R10 - CMPL DI, $0x08 + MOVL R9, SI + LEAL -4(R9), R9 + CMPL SI, $0x08 JLE repeat_two_match_nolit_encodeBlockAsm4MB_emit_copy_short - CMPL DI, $0x0c + CMPL SI, $0x0c JGE cant_repeat_two_offset_match_nolit_encodeBlockAsm4MB_emit_copy_short - CMPL SI, $0x00000800 + CMPL BX, $0x00000800 JLT repeat_two_offset_match_nolit_encodeBlockAsm4MB_emit_copy_short cant_repeat_two_offset_match_nolit_encodeBlockAsm4MB_emit_copy_short: - CMPL R10, $0x00000104 + CMPL R9, $0x00000104 JLT repeat_three_match_nolit_encodeBlockAsm4MB_emit_copy_short - CMPL R10, $0x00010100 + CMPL R9, $0x00010100 JLT repeat_four_match_nolit_encodeBlockAsm4MB_emit_copy_short - LEAL -65536(R10), R10 - MOVL R10, SI + LEAL -65536(R9), R9 + MOVL R9, BX MOVW $0x001d, (AX) - MOVW R10, 2(AX) - SARL $0x10, SI - MOVB SI, 4(AX) + MOVW R9, 2(AX) + SARL $0x10, BX + MOVB BL, 4(AX) ADDQ $0x05, AX JMP match_nolit_emitcopy_end_encodeBlockAsm4MB repeat_four_match_nolit_encodeBlockAsm4MB_emit_copy_short: - LEAL -256(R10), R10 + LEAL -256(R9), R9 MOVW $0x0019, (AX) - MOVW R10, 2(AX) + MOVW R9, 2(AX) ADDQ $0x04, AX JMP match_nolit_emitcopy_end_encodeBlockAsm4MB repeat_three_match_nolit_encodeBlockAsm4MB_emit_copy_short: - LEAL -4(R10), R10 + LEAL -4(R9), R9 MOVW $0x0015, (AX) - MOVB R10, 2(AX) + MOVB R9, 2(AX) ADDQ $0x03, AX JMP match_nolit_emitcopy_end_encodeBlockAsm4MB repeat_two_match_nolit_encodeBlockAsm4MB_emit_copy_short: - SHLL $0x02, R10 - ORL $0x01, R10 - MOVW R10, (AX) + SHLL $0x02, R9 + ORL $0x01, R9 + MOVW R9, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeBlockAsm4MB repeat_two_offset_match_nolit_encodeBlockAsm4MB_emit_copy_short: - XORQ DI, DI - LEAL 1(DI)(R10*4), R10 - MOVB SI, 1(AX) - SARL $0x08, SI - SHLL $0x05, SI - ORL SI, R10 - MOVB R10, (AX) + XORQ SI, SI + LEAL 1(SI)(R9*4), R9 + MOVB BL, 1(AX) + SARL $0x08, BX + SHLL $0x05, BX + ORL BX, R9 + MOVB R9, (AX) ADDQ $0x02, AX JMP 
match_nolit_emitcopy_end_encodeBlockAsm4MB - JMP two_byte_offset_match_nolit_encodeBlockAsm4MB two_byte_offset_short_match_nolit_encodeBlockAsm4MB: - CMPL R10, $0x0c + MOVL R9, SI + SHLL $0x02, SI + CMPL R9, $0x0c JGE emit_copy_three_match_nolit_encodeBlockAsm4MB - CMPL SI, $0x00000800 + CMPL BX, $0x00000800 JGE emit_copy_three_match_nolit_encodeBlockAsm4MB - MOVB $0x01, BL - LEAL -16(BX)(R10*4), R10 - MOVB SI, 1(AX) - SHRL $0x08, SI - SHLL $0x05, SI - ORL SI, R10 - MOVB R10, (AX) + LEAL -15(SI), SI + MOVB BL, 1(AX) + SHRL $0x08, BX + SHLL $0x05, BX + ORL BX, SI + MOVB SI, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeBlockAsm4MB emit_copy_three_match_nolit_encodeBlockAsm4MB: - MOVB $0x02, BL - LEAL -4(BX)(R10*4), R10 - MOVB R10, (AX) - MOVW SI, 1(AX) + LEAL -2(SI), SI + MOVB SI, (AX) + MOVW BX, 1(AX) ADDQ $0x03, AX match_nolit_emitcopy_end_encodeBlockAsm4MB: CMPL CX, 8(SP) JGE emit_remainder_encodeBlockAsm4MB - MOVQ -2(DX)(CX*1), DI + MOVQ -2(DX)(CX*1), SI CMPQ AX, (SP) JL match_nolit_dst_ok_encodeBlockAsm4MB MOVQ $0x00000000, ret+48(FP) RET match_nolit_dst_ok_encodeBlockAsm4MB: - MOVQ $0x0000cf1bbcdcbf9b, R9 - MOVQ DI, R8 - SHRQ $0x10, DI - MOVQ DI, SI - SHLQ $0x10, R8 - IMULQ R9, R8 - SHRQ $0x32, R8 - SHLQ $0x10, SI - IMULQ R9, SI - SHRQ $0x32, SI - LEAL -2(CX), R9 - LEAQ 24(SP)(SI*4), R10 - MOVL (R10), SI - MOVL R9, 24(SP)(R8*4) - MOVL CX, (R10) - CMPL (DX)(SI*1), DI + MOVQ $0x0000cf1bbcdcbf9b, R8 + MOVQ SI, DI + SHRQ $0x10, SI + MOVQ SI, BX + SHLQ $0x10, DI + IMULQ R8, DI + SHRQ $0x32, DI + SHLQ $0x10, BX + IMULQ R8, BX + SHRQ $0x32, BX + LEAL -2(CX), R8 + LEAQ 24(SP)(BX*4), R9 + MOVL (R9), BX + MOVL R8, 24(SP)(DI*4) + MOVL CX, (R9) + CMPL (DX)(BX*1), SI JEQ match_nolit_loop_encodeBlockAsm4MB INCL CX JMP search_loop_encodeBlockAsm4MB @@ -2703,8 +2680,8 @@ zero_loop_encodeBlockAsm12B: MOVL $0x00000000, 12(SP) MOVQ src_len+32(FP), CX LEAQ -9(CX), DX - LEAQ -8(CX), SI - MOVL SI, 8(SP) + LEAQ -8(CX), BX + MOVL BX, 8(SP) SHRQ $0x05, CX SUBL CX, DX LEAQ (AX)(DX*1), DX @@ -2714,428 +2691,426 @@ zero_loop_encodeBlockAsm12B: MOVQ src_base+24(FP), DX search_loop_encodeBlockAsm12B: - MOVL CX, SI - SUBL 12(SP), SI - SHRL $0x05, SI - LEAL 4(CX)(SI*1), SI - CMPL SI, 8(SP) + MOVL CX, BX + SUBL 12(SP), BX + SHRL $0x05, BX + LEAL 4(CX)(BX*1), BX + CMPL BX, 8(SP) JGE emit_remainder_encodeBlockAsm12B - MOVQ (DX)(CX*1), DI - MOVL SI, 20(SP) - MOVQ $0x000000cf1bbcdcbb, R9 - MOVQ DI, R10 - MOVQ DI, R11 - SHRQ $0x08, R11 - SHLQ $0x18, R10 - IMULQ R9, R10 - SHRQ $0x34, R10 - SHLQ $0x18, R11 - IMULQ R9, R11 - SHRQ $0x34, R11 - MOVL 24(SP)(R10*4), SI - MOVL 24(SP)(R11*4), R8 - MOVL CX, 24(SP)(R10*4) - LEAL 1(CX), R10 - MOVL R10, 24(SP)(R11*4) - MOVQ DI, R10 - SHRQ $0x10, R10 + MOVQ (DX)(CX*1), SI + MOVL BX, 20(SP) + MOVQ $0x000000cf1bbcdcbb, R8 + MOVQ SI, R9 + MOVQ SI, R10 + SHRQ $0x08, R10 + SHLQ $0x18, R9 + IMULQ R8, R9 + SHRQ $0x34, R9 SHLQ $0x18, R10 - IMULQ R9, R10 + IMULQ R8, R10 SHRQ $0x34, R10 - MOVL CX, R9 - SUBL 16(SP), R9 - MOVL 1(DX)(R9*1), R11 - MOVQ DI, R9 - SHRQ $0x08, R9 - CMPL R9, R11 + MOVL 24(SP)(R9*4), BX + MOVL 24(SP)(R10*4), DI + MOVL CX, 24(SP)(R9*4) + LEAL 1(CX), R9 + MOVL R9, 24(SP)(R10*4) + MOVQ SI, R9 + SHRQ $0x10, R9 + SHLQ $0x18, R9 + IMULQ R8, R9 + SHRQ $0x34, R9 + MOVL CX, R8 + SUBL 16(SP), R8 + MOVL 1(DX)(R8*1), R10 + MOVQ SI, R8 + SHRQ $0x08, R8 + CMPL R8, R10 JNE no_repeat_found_encodeBlockAsm12B - LEAL 1(CX), DI - MOVL 12(SP), R8 - MOVL DI, SI - SUBL 16(SP), SI + LEAL 1(CX), SI + MOVL 12(SP), DI + MOVL SI, BX + SUBL 16(SP), BX JZ 
repeat_extend_back_end_encodeBlockAsm12B repeat_extend_back_loop_encodeBlockAsm12B: - CMPL DI, R8 + CMPL SI, DI JLE repeat_extend_back_end_encodeBlockAsm12B - MOVB -1(DX)(SI*1), BL - MOVB -1(DX)(DI*1), R9 - CMPB BL, R9 + MOVB -1(DX)(BX*1), R8 + MOVB -1(DX)(SI*1), R9 + CMPB R8, R9 JNE repeat_extend_back_end_encodeBlockAsm12B - LEAL -1(DI), DI - DECL SI + LEAL -1(SI), SI + DECL BX JNZ repeat_extend_back_loop_encodeBlockAsm12B repeat_extend_back_end_encodeBlockAsm12B: - MOVL 12(SP), SI - CMPL SI, DI + MOVL 12(SP), BX + CMPL BX, SI JEQ emit_literal_done_repeat_emit_encodeBlockAsm12B - MOVL DI, R9 - MOVL DI, 12(SP) - LEAQ (DX)(SI*1), R10 - SUBL SI, R9 - LEAL -1(R9), SI - CMPL SI, $0x3c + MOVL SI, R8 + MOVL SI, 12(SP) + LEAQ (DX)(BX*1), R9 + SUBL BX, R8 + LEAL -1(R8), BX + CMPL BX, $0x3c JLT one_byte_repeat_emit_encodeBlockAsm12B - CMPL SI, $0x00000100 + CMPL BX, $0x00000100 JLT two_bytes_repeat_emit_encodeBlockAsm12B MOVB $0xf4, (AX) - MOVW SI, 1(AX) + MOVW BX, 1(AX) ADDQ $0x03, AX JMP memmove_long_repeat_emit_encodeBlockAsm12B two_bytes_repeat_emit_encodeBlockAsm12B: MOVB $0xf0, (AX) - MOVB SI, 1(AX) + MOVB BL, 1(AX) ADDQ $0x02, AX - CMPL SI, $0x40 + CMPL BX, $0x40 JL memmove_repeat_emit_encodeBlockAsm12B JMP memmove_long_repeat_emit_encodeBlockAsm12B one_byte_repeat_emit_encodeBlockAsm12B: - SHLB $0x02, SI - MOVB SI, (AX) + SHLB $0x02, BL + MOVB BL, (AX) ADDQ $0x01, AX memmove_repeat_emit_encodeBlockAsm12B: - LEAQ (AX)(R9*1), SI + LEAQ (AX)(R8*1), BX // genMemMoveShort - CMPQ R9, $0x08 + CMPQ R8, $0x08 JLE emit_lit_memmove_repeat_emit_encodeBlockAsm12B_memmove_move_8 - CMPQ R9, $0x10 + CMPQ R8, $0x10 JBE emit_lit_memmove_repeat_emit_encodeBlockAsm12B_memmove_move_8through16 - CMPQ R9, $0x20 + CMPQ R8, $0x20 JBE emit_lit_memmove_repeat_emit_encodeBlockAsm12B_memmove_move_17through32 JMP emit_lit_memmove_repeat_emit_encodeBlockAsm12B_memmove_move_33through64 emit_lit_memmove_repeat_emit_encodeBlockAsm12B_memmove_move_8: - MOVQ (R10), R11 - MOVQ R11, (AX) + MOVQ (R9), R10 + MOVQ R10, (AX) JMP memmove_end_copy_repeat_emit_encodeBlockAsm12B emit_lit_memmove_repeat_emit_encodeBlockAsm12B_memmove_move_8through16: - MOVQ (R10), R11 - MOVQ -8(R10)(R9*1), R10 - MOVQ R11, (AX) - MOVQ R10, -8(AX)(R9*1) + MOVQ (R9), R10 + MOVQ -8(R9)(R8*1), R9 + MOVQ R10, (AX) + MOVQ R9, -8(AX)(R8*1) JMP memmove_end_copy_repeat_emit_encodeBlockAsm12B emit_lit_memmove_repeat_emit_encodeBlockAsm12B_memmove_move_17through32: - MOVOU (R10), X0 - MOVOU -16(R10)(R9*1), X1 + MOVOU (R9), X0 + MOVOU -16(R9)(R8*1), X1 MOVOU X0, (AX) - MOVOU X1, -16(AX)(R9*1) + MOVOU X1, -16(AX)(R8*1) JMP memmove_end_copy_repeat_emit_encodeBlockAsm12B emit_lit_memmove_repeat_emit_encodeBlockAsm12B_memmove_move_33through64: - MOVOU (R10), X0 - MOVOU 16(R10), X1 - MOVOU -32(R10)(R9*1), X2 - MOVOU -16(R10)(R9*1), X3 + MOVOU (R9), X0 + MOVOU 16(R9), X1 + MOVOU -32(R9)(R8*1), X2 + MOVOU -16(R9)(R8*1), X3 MOVOU X0, (AX) MOVOU X1, 16(AX) - MOVOU X2, -32(AX)(R9*1) - MOVOU X3, -16(AX)(R9*1) + MOVOU X2, -32(AX)(R8*1) + MOVOU X3, -16(AX)(R8*1) memmove_end_copy_repeat_emit_encodeBlockAsm12B: - MOVQ SI, AX + MOVQ BX, AX JMP emit_literal_done_repeat_emit_encodeBlockAsm12B memmove_long_repeat_emit_encodeBlockAsm12B: - LEAQ (AX)(R9*1), SI + LEAQ (AX)(R8*1), BX // genMemMoveLong - MOVOU (R10), X0 - MOVOU 16(R10), X1 - MOVOU -32(R10)(R9*1), X2 - MOVOU -16(R10)(R9*1), X3 - MOVQ R9, R12 - SHRQ $0x05, R12 - MOVQ AX, R11 - ANDL $0x0000001f, R11 - MOVQ $0x00000040, R13 - SUBQ R11, R13 - DECQ R12 + MOVOU (R9), X0 + MOVOU 16(R9), X1 + MOVOU -32(R9)(R8*1), X2 + MOVOU 
-16(R9)(R8*1), X3 + MOVQ R8, R11 + SHRQ $0x05, R11 + MOVQ AX, R10 + ANDL $0x0000001f, R10 + MOVQ $0x00000040, R12 + SUBQ R10, R12 + DECQ R11 JA emit_lit_memmove_long_repeat_emit_encodeBlockAsm12Blarge_forward_sse_loop_32 - LEAQ -32(R10)(R13*1), R11 - LEAQ -32(AX)(R13*1), R14 + LEAQ -32(R9)(R12*1), R10 + LEAQ -32(AX)(R12*1), R13 emit_lit_memmove_long_repeat_emit_encodeBlockAsm12Blarge_big_loop_back: - MOVOU (R11), X4 - MOVOU 16(R11), X5 - MOVOA X4, (R14) - MOVOA X5, 16(R14) - ADDQ $0x20, R14 - ADDQ $0x20, R11 + MOVOU (R10), X4 + MOVOU 16(R10), X5 + MOVOA X4, (R13) + MOVOA X5, 16(R13) ADDQ $0x20, R13 - DECQ R12 + ADDQ $0x20, R10 + ADDQ $0x20, R12 + DECQ R11 JNA emit_lit_memmove_long_repeat_emit_encodeBlockAsm12Blarge_big_loop_back emit_lit_memmove_long_repeat_emit_encodeBlockAsm12Blarge_forward_sse_loop_32: - MOVOU -32(R10)(R13*1), X4 - MOVOU -16(R10)(R13*1), X5 - MOVOA X4, -32(AX)(R13*1) - MOVOA X5, -16(AX)(R13*1) - ADDQ $0x20, R13 - CMPQ R9, R13 + MOVOU -32(R9)(R12*1), X4 + MOVOU -16(R9)(R12*1), X5 + MOVOA X4, -32(AX)(R12*1) + MOVOA X5, -16(AX)(R12*1) + ADDQ $0x20, R12 + CMPQ R8, R12 JAE emit_lit_memmove_long_repeat_emit_encodeBlockAsm12Blarge_forward_sse_loop_32 MOVOU X0, (AX) MOVOU X1, 16(AX) - MOVOU X2, -32(AX)(R9*1) - MOVOU X3, -16(AX)(R9*1) - MOVQ SI, AX + MOVOU X2, -32(AX)(R8*1) + MOVOU X3, -16(AX)(R8*1) + MOVQ BX, AX emit_literal_done_repeat_emit_encodeBlockAsm12B: ADDL $0x05, CX - MOVL CX, SI - SUBL 16(SP), SI - MOVQ src_len+32(FP), R9 - SUBL CX, R9 - LEAQ (DX)(CX*1), R10 - LEAQ (DX)(SI*1), SI + MOVL CX, BX + SUBL 16(SP), BX + MOVQ src_len+32(FP), R8 + SUBL CX, R8 + LEAQ (DX)(CX*1), R9 + LEAQ (DX)(BX*1), BX // matchLen - XORL R12, R12 - CMPL R9, $0x08 + XORL R11, R11 + CMPL R8, $0x08 JL matchlen_match4_repeat_extend_encodeBlockAsm12B matchlen_loopback_repeat_extend_encodeBlockAsm12B: - MOVQ (R10)(R12*1), R11 - XORQ (SI)(R12*1), R11 - TESTQ R11, R11 + MOVQ (R9)(R11*1), R10 + XORQ (BX)(R11*1), R10 + TESTQ R10, R10 JZ matchlen_loop_repeat_extend_encodeBlockAsm12B #ifdef GOAMD64_v3 - TZCNTQ R11, R11 + TZCNTQ R10, R10 #else - BSFQ R11, R11 + BSFQ R10, R10 #endif - SARQ $0x03, R11 - LEAL (R12)(R11*1), R12 + SARQ $0x03, R10 + LEAL (R11)(R10*1), R11 JMP repeat_extend_forward_end_encodeBlockAsm12B matchlen_loop_repeat_extend_encodeBlockAsm12B: - LEAL -8(R9), R9 - LEAL 8(R12), R12 - CMPL R9, $0x08 + LEAL -8(R8), R8 + LEAL 8(R11), R11 + CMPL R8, $0x08 JGE matchlen_loopback_repeat_extend_encodeBlockAsm12B JZ repeat_extend_forward_end_encodeBlockAsm12B matchlen_match4_repeat_extend_encodeBlockAsm12B: - CMPL R9, $0x04 + CMPL R8, $0x04 JL matchlen_match2_repeat_extend_encodeBlockAsm12B - MOVL (R10)(R12*1), R11 - CMPL (SI)(R12*1), R11 + MOVL (R9)(R11*1), R10 + CMPL (BX)(R11*1), R10 JNE matchlen_match2_repeat_extend_encodeBlockAsm12B - SUBL $0x04, R9 - LEAL 4(R12), R12 + SUBL $0x04, R8 + LEAL 4(R11), R11 matchlen_match2_repeat_extend_encodeBlockAsm12B: - CMPL R9, $0x02 + CMPL R8, $0x02 JL matchlen_match1_repeat_extend_encodeBlockAsm12B - MOVW (R10)(R12*1), R11 - CMPW (SI)(R12*1), R11 + MOVW (R9)(R11*1), R10 + CMPW (BX)(R11*1), R10 JNE matchlen_match1_repeat_extend_encodeBlockAsm12B - SUBL $0x02, R9 - LEAL 2(R12), R12 + SUBL $0x02, R8 + LEAL 2(R11), R11 matchlen_match1_repeat_extend_encodeBlockAsm12B: - CMPL R9, $0x01 + CMPL R8, $0x01 JL repeat_extend_forward_end_encodeBlockAsm12B - MOVB (R10)(R12*1), R11 - CMPB (SI)(R12*1), R11 + MOVB (R9)(R11*1), R10 + CMPB (BX)(R11*1), R10 JNE repeat_extend_forward_end_encodeBlockAsm12B - LEAL 1(R12), R12 + LEAL 1(R11), R11 
repeat_extend_forward_end_encodeBlockAsm12B: - ADDL R12, CX - MOVL CX, SI - SUBL DI, SI - MOVL 16(SP), DI - TESTL R8, R8 + ADDL R11, CX + MOVL CX, BX + SUBL SI, BX + MOVL 16(SP), SI + TESTL DI, DI JZ repeat_as_copy_encodeBlockAsm12B // emitRepeat - MOVL SI, R8 - LEAL -4(SI), SI - CMPL R8, $0x08 + MOVL BX, DI + LEAL -4(BX), BX + CMPL DI, $0x08 JLE repeat_two_match_repeat_encodeBlockAsm12B - CMPL R8, $0x0c + CMPL DI, $0x0c JGE cant_repeat_two_offset_match_repeat_encodeBlockAsm12B - CMPL DI, $0x00000800 + CMPL SI, $0x00000800 JLT repeat_two_offset_match_repeat_encodeBlockAsm12B cant_repeat_two_offset_match_repeat_encodeBlockAsm12B: - CMPL SI, $0x00000104 + CMPL BX, $0x00000104 JLT repeat_three_match_repeat_encodeBlockAsm12B - LEAL -256(SI), SI + LEAL -256(BX), BX MOVW $0x0019, (AX) - MOVW SI, 2(AX) + MOVW BX, 2(AX) ADDQ $0x04, AX JMP repeat_end_emit_encodeBlockAsm12B repeat_three_match_repeat_encodeBlockAsm12B: - LEAL -4(SI), SI + LEAL -4(BX), BX MOVW $0x0015, (AX) - MOVB SI, 2(AX) + MOVB BL, 2(AX) ADDQ $0x03, AX JMP repeat_end_emit_encodeBlockAsm12B repeat_two_match_repeat_encodeBlockAsm12B: - SHLL $0x02, SI - ORL $0x01, SI - MOVW SI, (AX) + SHLL $0x02, BX + ORL $0x01, BX + MOVW BX, (AX) ADDQ $0x02, AX JMP repeat_end_emit_encodeBlockAsm12B repeat_two_offset_match_repeat_encodeBlockAsm12B: - XORQ R8, R8 - LEAL 1(R8)(SI*4), SI - MOVB DI, 1(AX) - SARL $0x08, DI - SHLL $0x05, DI - ORL DI, SI - MOVB SI, (AX) + XORQ DI, DI + LEAL 1(DI)(BX*4), BX + MOVB SI, 1(AX) + SARL $0x08, SI + SHLL $0x05, SI + ORL SI, BX + MOVB BL, (AX) ADDQ $0x02, AX JMP repeat_end_emit_encodeBlockAsm12B repeat_as_copy_encodeBlockAsm12B: // emitCopy -two_byte_offset_repeat_as_copy_encodeBlockAsm12B: - CMPL SI, $0x40 + CMPL BX, $0x40 JLE two_byte_offset_short_repeat_as_copy_encodeBlockAsm12B - CMPL DI, $0x00000800 + CMPL SI, $0x00000800 JAE long_offset_short_repeat_as_copy_encodeBlockAsm12B - MOVL $0x00000001, R8 - LEAL 16(R8), R8 - MOVB DI, 1(AX) - SHRL $0x08, DI - SHLL $0x05, DI - ORL DI, R8 - MOVB R8, (AX) + MOVL $0x00000001, DI + LEAL 16(DI), DI + MOVB SI, 1(AX) + SHRL $0x08, SI + SHLL $0x05, SI + ORL SI, DI + MOVB DI, (AX) ADDQ $0x02, AX - SUBL $0x08, SI + SUBL $0x08, BX // emitRepeat - LEAL -4(SI), SI + LEAL -4(BX), BX JMP cant_repeat_two_offset_repeat_as_copy_encodeBlockAsm12B_emit_copy_short_2b - MOVL SI, R8 - LEAL -4(SI), SI - CMPL R8, $0x08 + MOVL BX, DI + LEAL -4(BX), BX + CMPL DI, $0x08 JLE repeat_two_repeat_as_copy_encodeBlockAsm12B_emit_copy_short_2b - CMPL R8, $0x0c + CMPL DI, $0x0c JGE cant_repeat_two_offset_repeat_as_copy_encodeBlockAsm12B_emit_copy_short_2b - CMPL DI, $0x00000800 + CMPL SI, $0x00000800 JLT repeat_two_offset_repeat_as_copy_encodeBlockAsm12B_emit_copy_short_2b cant_repeat_two_offset_repeat_as_copy_encodeBlockAsm12B_emit_copy_short_2b: - CMPL SI, $0x00000104 + CMPL BX, $0x00000104 JLT repeat_three_repeat_as_copy_encodeBlockAsm12B_emit_copy_short_2b - LEAL -256(SI), SI + LEAL -256(BX), BX MOVW $0x0019, (AX) - MOVW SI, 2(AX) + MOVW BX, 2(AX) ADDQ $0x04, AX JMP repeat_end_emit_encodeBlockAsm12B repeat_three_repeat_as_copy_encodeBlockAsm12B_emit_copy_short_2b: - LEAL -4(SI), SI + LEAL -4(BX), BX MOVW $0x0015, (AX) - MOVB SI, 2(AX) + MOVB BL, 2(AX) ADDQ $0x03, AX JMP repeat_end_emit_encodeBlockAsm12B repeat_two_repeat_as_copy_encodeBlockAsm12B_emit_copy_short_2b: - SHLL $0x02, SI - ORL $0x01, SI - MOVW SI, (AX) + SHLL $0x02, BX + ORL $0x01, BX + MOVW BX, (AX) ADDQ $0x02, AX JMP repeat_end_emit_encodeBlockAsm12B repeat_two_offset_repeat_as_copy_encodeBlockAsm12B_emit_copy_short_2b: - XORQ R8, R8 - 
LEAL 1(R8)(SI*4), SI - MOVB DI, 1(AX) - SARL $0x08, DI - SHLL $0x05, DI - ORL DI, SI - MOVB SI, (AX) + XORQ DI, DI + LEAL 1(DI)(BX*4), BX + MOVB SI, 1(AX) + SARL $0x08, SI + SHLL $0x05, SI + ORL SI, BX + MOVB BL, (AX) ADDQ $0x02, AX JMP repeat_end_emit_encodeBlockAsm12B long_offset_short_repeat_as_copy_encodeBlockAsm12B: MOVB $0xee, (AX) - MOVW DI, 1(AX) - LEAL -60(SI), SI + MOVW SI, 1(AX) + LEAL -60(BX), BX ADDQ $0x03, AX // emitRepeat - MOVL SI, R8 - LEAL -4(SI), SI - CMPL R8, $0x08 + MOVL BX, DI + LEAL -4(BX), BX + CMPL DI, $0x08 JLE repeat_two_repeat_as_copy_encodeBlockAsm12B_emit_copy_short - CMPL R8, $0x0c + CMPL DI, $0x0c JGE cant_repeat_two_offset_repeat_as_copy_encodeBlockAsm12B_emit_copy_short - CMPL DI, $0x00000800 + CMPL SI, $0x00000800 JLT repeat_two_offset_repeat_as_copy_encodeBlockAsm12B_emit_copy_short cant_repeat_two_offset_repeat_as_copy_encodeBlockAsm12B_emit_copy_short: - CMPL SI, $0x00000104 + CMPL BX, $0x00000104 JLT repeat_three_repeat_as_copy_encodeBlockAsm12B_emit_copy_short - LEAL -256(SI), SI + LEAL -256(BX), BX MOVW $0x0019, (AX) - MOVW SI, 2(AX) + MOVW BX, 2(AX) ADDQ $0x04, AX JMP repeat_end_emit_encodeBlockAsm12B repeat_three_repeat_as_copy_encodeBlockAsm12B_emit_copy_short: - LEAL -4(SI), SI + LEAL -4(BX), BX MOVW $0x0015, (AX) - MOVB SI, 2(AX) + MOVB BL, 2(AX) ADDQ $0x03, AX JMP repeat_end_emit_encodeBlockAsm12B repeat_two_repeat_as_copy_encodeBlockAsm12B_emit_copy_short: - SHLL $0x02, SI - ORL $0x01, SI - MOVW SI, (AX) + SHLL $0x02, BX + ORL $0x01, BX + MOVW BX, (AX) ADDQ $0x02, AX JMP repeat_end_emit_encodeBlockAsm12B repeat_two_offset_repeat_as_copy_encodeBlockAsm12B_emit_copy_short: - XORQ R8, R8 - LEAL 1(R8)(SI*4), SI - MOVB DI, 1(AX) - SARL $0x08, DI - SHLL $0x05, DI - ORL DI, SI - MOVB SI, (AX) + XORQ DI, DI + LEAL 1(DI)(BX*4), BX + MOVB SI, 1(AX) + SARL $0x08, SI + SHLL $0x05, SI + ORL SI, BX + MOVB BL, (AX) ADDQ $0x02, AX JMP repeat_end_emit_encodeBlockAsm12B - JMP two_byte_offset_repeat_as_copy_encodeBlockAsm12B two_byte_offset_short_repeat_as_copy_encodeBlockAsm12B: - CMPL SI, $0x0c + MOVL BX, DI + SHLL $0x02, DI + CMPL BX, $0x0c JGE emit_copy_three_repeat_as_copy_encodeBlockAsm12B - CMPL DI, $0x00000800 + CMPL SI, $0x00000800 JGE emit_copy_three_repeat_as_copy_encodeBlockAsm12B - MOVB $0x01, BL - LEAL -16(BX)(SI*4), SI - MOVB DI, 1(AX) - SHRL $0x08, DI - SHLL $0x05, DI - ORL DI, SI - MOVB SI, (AX) + LEAL -15(DI), DI + MOVB SI, 1(AX) + SHRL $0x08, SI + SHLL $0x05, SI + ORL SI, DI + MOVB DI, (AX) ADDQ $0x02, AX JMP repeat_end_emit_encodeBlockAsm12B emit_copy_three_repeat_as_copy_encodeBlockAsm12B: - MOVB $0x02, BL - LEAL -4(BX)(SI*4), SI - MOVB SI, (AX) - MOVW DI, 1(AX) + LEAL -2(DI), DI + MOVB DI, (AX) + MOVW SI, 1(AX) ADDQ $0x03, AX repeat_end_emit_encodeBlockAsm12B: @@ -3143,16 +3118,16 @@ repeat_end_emit_encodeBlockAsm12B: JMP search_loop_encodeBlockAsm12B no_repeat_found_encodeBlockAsm12B: - CMPL (DX)(SI*1), DI + CMPL (DX)(BX*1), SI JEQ candidate_match_encodeBlockAsm12B - SHRQ $0x08, DI - MOVL 24(SP)(R10*4), SI - LEAL 2(CX), R9 - CMPL (DX)(R8*1), DI + SHRQ $0x08, SI + MOVL 24(SP)(R9*4), BX + LEAL 2(CX), R8 + CMPL (DX)(DI*1), SI JEQ candidate2_match_encodeBlockAsm12B - MOVL R9, 24(SP)(R10*4) - SHRQ $0x08, DI - CMPL (DX)(SI*1), DI + MOVL R8, 24(SP)(R9*4) + SHRQ $0x08, SI + CMPL (DX)(BX*1), SI JEQ candidate3_match_encodeBlockAsm12B MOVL 20(SP), CX JMP search_loop_encodeBlockAsm12B @@ -3162,391 +3137,389 @@ candidate3_match_encodeBlockAsm12B: JMP candidate_match_encodeBlockAsm12B candidate2_match_encodeBlockAsm12B: - MOVL R9, 24(SP)(R10*4) + MOVL 
R8, 24(SP)(R9*4) INCL CX - MOVL R8, SI + MOVL DI, BX candidate_match_encodeBlockAsm12B: - MOVL 12(SP), DI - TESTL SI, SI + MOVL 12(SP), SI + TESTL BX, BX JZ match_extend_back_end_encodeBlockAsm12B match_extend_back_loop_encodeBlockAsm12B: - CMPL CX, DI + CMPL CX, SI JLE match_extend_back_end_encodeBlockAsm12B - MOVB -1(DX)(SI*1), BL + MOVB -1(DX)(BX*1), DI MOVB -1(DX)(CX*1), R8 - CMPB BL, R8 + CMPB DI, R8 JNE match_extend_back_end_encodeBlockAsm12B LEAL -1(CX), CX - DECL SI + DECL BX JZ match_extend_back_end_encodeBlockAsm12B JMP match_extend_back_loop_encodeBlockAsm12B match_extend_back_end_encodeBlockAsm12B: - MOVL CX, DI - SUBL 12(SP), DI - LEAQ 3(AX)(DI*1), DI - CMPQ DI, (SP) + MOVL CX, SI + SUBL 12(SP), SI + LEAQ 3(AX)(SI*1), SI + CMPQ SI, (SP) JL match_dst_size_check_encodeBlockAsm12B MOVQ $0x00000000, ret+48(FP) RET match_dst_size_check_encodeBlockAsm12B: - MOVL CX, DI - MOVL 12(SP), R8 - CMPL R8, DI + MOVL CX, SI + MOVL 12(SP), DI + CMPL DI, SI JEQ emit_literal_done_match_emit_encodeBlockAsm12B - MOVL DI, R9 - MOVL DI, 12(SP) - LEAQ (DX)(R8*1), DI - SUBL R8, R9 - LEAL -1(R9), R8 - CMPL R8, $0x3c + MOVL SI, R8 + MOVL SI, 12(SP) + LEAQ (DX)(DI*1), SI + SUBL DI, R8 + LEAL -1(R8), DI + CMPL DI, $0x3c JLT one_byte_match_emit_encodeBlockAsm12B - CMPL R8, $0x00000100 + CMPL DI, $0x00000100 JLT two_bytes_match_emit_encodeBlockAsm12B MOVB $0xf4, (AX) - MOVW R8, 1(AX) + MOVW DI, 1(AX) ADDQ $0x03, AX JMP memmove_long_match_emit_encodeBlockAsm12B two_bytes_match_emit_encodeBlockAsm12B: MOVB $0xf0, (AX) - MOVB R8, 1(AX) + MOVB DI, 1(AX) ADDQ $0x02, AX - CMPL R8, $0x40 + CMPL DI, $0x40 JL memmove_match_emit_encodeBlockAsm12B JMP memmove_long_match_emit_encodeBlockAsm12B one_byte_match_emit_encodeBlockAsm12B: - SHLB $0x02, R8 - MOVB R8, (AX) + SHLB $0x02, DI + MOVB DI, (AX) ADDQ $0x01, AX memmove_match_emit_encodeBlockAsm12B: - LEAQ (AX)(R9*1), R8 + LEAQ (AX)(R8*1), DI // genMemMoveShort - CMPQ R9, $0x08 + CMPQ R8, $0x08 JLE emit_lit_memmove_match_emit_encodeBlockAsm12B_memmove_move_8 - CMPQ R9, $0x10 + CMPQ R8, $0x10 JBE emit_lit_memmove_match_emit_encodeBlockAsm12B_memmove_move_8through16 - CMPQ R9, $0x20 + CMPQ R8, $0x20 JBE emit_lit_memmove_match_emit_encodeBlockAsm12B_memmove_move_17through32 JMP emit_lit_memmove_match_emit_encodeBlockAsm12B_memmove_move_33through64 emit_lit_memmove_match_emit_encodeBlockAsm12B_memmove_move_8: - MOVQ (DI), R10 - MOVQ R10, (AX) + MOVQ (SI), R9 + MOVQ R9, (AX) JMP memmove_end_copy_match_emit_encodeBlockAsm12B emit_lit_memmove_match_emit_encodeBlockAsm12B_memmove_move_8through16: - MOVQ (DI), R10 - MOVQ -8(DI)(R9*1), DI - MOVQ R10, (AX) - MOVQ DI, -8(AX)(R9*1) + MOVQ (SI), R9 + MOVQ -8(SI)(R8*1), SI + MOVQ R9, (AX) + MOVQ SI, -8(AX)(R8*1) JMP memmove_end_copy_match_emit_encodeBlockAsm12B emit_lit_memmove_match_emit_encodeBlockAsm12B_memmove_move_17through32: - MOVOU (DI), X0 - MOVOU -16(DI)(R9*1), X1 + MOVOU (SI), X0 + MOVOU -16(SI)(R8*1), X1 MOVOU X0, (AX) - MOVOU X1, -16(AX)(R9*1) + MOVOU X1, -16(AX)(R8*1) JMP memmove_end_copy_match_emit_encodeBlockAsm12B emit_lit_memmove_match_emit_encodeBlockAsm12B_memmove_move_33through64: - MOVOU (DI), X0 - MOVOU 16(DI), X1 - MOVOU -32(DI)(R9*1), X2 - MOVOU -16(DI)(R9*1), X3 + MOVOU (SI), X0 + MOVOU 16(SI), X1 + MOVOU -32(SI)(R8*1), X2 + MOVOU -16(SI)(R8*1), X3 MOVOU X0, (AX) MOVOU X1, 16(AX) - MOVOU X2, -32(AX)(R9*1) - MOVOU X3, -16(AX)(R9*1) + MOVOU X2, -32(AX)(R8*1) + MOVOU X3, -16(AX)(R8*1) memmove_end_copy_match_emit_encodeBlockAsm12B: - MOVQ R8, AX + MOVQ DI, AX JMP emit_literal_done_match_emit_encodeBlockAsm12B 
memmove_long_match_emit_encodeBlockAsm12B: - LEAQ (AX)(R9*1), R8 + LEAQ (AX)(R8*1), DI // genMemMoveLong - MOVOU (DI), X0 - MOVOU 16(DI), X1 - MOVOU -32(DI)(R9*1), X2 - MOVOU -16(DI)(R9*1), X3 - MOVQ R9, R11 - SHRQ $0x05, R11 - MOVQ AX, R10 - ANDL $0x0000001f, R10 - MOVQ $0x00000040, R12 - SUBQ R10, R12 - DECQ R11 + MOVOU (SI), X0 + MOVOU 16(SI), X1 + MOVOU -32(SI)(R8*1), X2 + MOVOU -16(SI)(R8*1), X3 + MOVQ R8, R10 + SHRQ $0x05, R10 + MOVQ AX, R9 + ANDL $0x0000001f, R9 + MOVQ $0x00000040, R11 + SUBQ R9, R11 + DECQ R10 JA emit_lit_memmove_long_match_emit_encodeBlockAsm12Blarge_forward_sse_loop_32 - LEAQ -32(DI)(R12*1), R10 - LEAQ -32(AX)(R12*1), R13 + LEAQ -32(SI)(R11*1), R9 + LEAQ -32(AX)(R11*1), R12 emit_lit_memmove_long_match_emit_encodeBlockAsm12Blarge_big_loop_back: - MOVOU (R10), X4 - MOVOU 16(R10), X5 - MOVOA X4, (R13) - MOVOA X5, 16(R13) - ADDQ $0x20, R13 - ADDQ $0x20, R10 + MOVOU (R9), X4 + MOVOU 16(R9), X5 + MOVOA X4, (R12) + MOVOA X5, 16(R12) ADDQ $0x20, R12 - DECQ R11 + ADDQ $0x20, R9 + ADDQ $0x20, R11 + DECQ R10 JNA emit_lit_memmove_long_match_emit_encodeBlockAsm12Blarge_big_loop_back emit_lit_memmove_long_match_emit_encodeBlockAsm12Blarge_forward_sse_loop_32: - MOVOU -32(DI)(R12*1), X4 - MOVOU -16(DI)(R12*1), X5 - MOVOA X4, -32(AX)(R12*1) - MOVOA X5, -16(AX)(R12*1) - ADDQ $0x20, R12 - CMPQ R9, R12 + MOVOU -32(SI)(R11*1), X4 + MOVOU -16(SI)(R11*1), X5 + MOVOA X4, -32(AX)(R11*1) + MOVOA X5, -16(AX)(R11*1) + ADDQ $0x20, R11 + CMPQ R8, R11 JAE emit_lit_memmove_long_match_emit_encodeBlockAsm12Blarge_forward_sse_loop_32 MOVOU X0, (AX) MOVOU X1, 16(AX) - MOVOU X2, -32(AX)(R9*1) - MOVOU X3, -16(AX)(R9*1) - MOVQ R8, AX + MOVOU X2, -32(AX)(R8*1) + MOVOU X3, -16(AX)(R8*1) + MOVQ DI, AX emit_literal_done_match_emit_encodeBlockAsm12B: match_nolit_loop_encodeBlockAsm12B: - MOVL CX, DI - SUBL SI, DI - MOVL DI, 16(SP) + MOVL CX, SI + SUBL BX, SI + MOVL SI, 16(SP) ADDL $0x04, CX - ADDL $0x04, SI - MOVQ src_len+32(FP), DI - SUBL CX, DI - LEAQ (DX)(CX*1), R8 - LEAQ (DX)(SI*1), SI + ADDL $0x04, BX + MOVQ src_len+32(FP), SI + SUBL CX, SI + LEAQ (DX)(CX*1), DI + LEAQ (DX)(BX*1), BX // matchLen - XORL R10, R10 - CMPL DI, $0x08 + XORL R9, R9 + CMPL SI, $0x08 JL matchlen_match4_match_nolit_encodeBlockAsm12B matchlen_loopback_match_nolit_encodeBlockAsm12B: - MOVQ (R8)(R10*1), R9 - XORQ (SI)(R10*1), R9 - TESTQ R9, R9 + MOVQ (DI)(R9*1), R8 + XORQ (BX)(R9*1), R8 + TESTQ R8, R8 JZ matchlen_loop_match_nolit_encodeBlockAsm12B #ifdef GOAMD64_v3 - TZCNTQ R9, R9 + TZCNTQ R8, R8 #else - BSFQ R9, R9 + BSFQ R8, R8 #endif - SARQ $0x03, R9 - LEAL (R10)(R9*1), R10 + SARQ $0x03, R8 + LEAL (R9)(R8*1), R9 JMP match_nolit_end_encodeBlockAsm12B matchlen_loop_match_nolit_encodeBlockAsm12B: - LEAL -8(DI), DI - LEAL 8(R10), R10 - CMPL DI, $0x08 + LEAL -8(SI), SI + LEAL 8(R9), R9 + CMPL SI, $0x08 JGE matchlen_loopback_match_nolit_encodeBlockAsm12B JZ match_nolit_end_encodeBlockAsm12B matchlen_match4_match_nolit_encodeBlockAsm12B: - CMPL DI, $0x04 + CMPL SI, $0x04 JL matchlen_match2_match_nolit_encodeBlockAsm12B - MOVL (R8)(R10*1), R9 - CMPL (SI)(R10*1), R9 + MOVL (DI)(R9*1), R8 + CMPL (BX)(R9*1), R8 JNE matchlen_match2_match_nolit_encodeBlockAsm12B - SUBL $0x04, DI - LEAL 4(R10), R10 + SUBL $0x04, SI + LEAL 4(R9), R9 matchlen_match2_match_nolit_encodeBlockAsm12B: - CMPL DI, $0x02 + CMPL SI, $0x02 JL matchlen_match1_match_nolit_encodeBlockAsm12B - MOVW (R8)(R10*1), R9 - CMPW (SI)(R10*1), R9 + MOVW (DI)(R9*1), R8 + CMPW (BX)(R9*1), R8 JNE matchlen_match1_match_nolit_encodeBlockAsm12B - SUBL $0x02, DI - LEAL 2(R10), R10 + SUBL 
$0x02, SI + LEAL 2(R9), R9 matchlen_match1_match_nolit_encodeBlockAsm12B: - CMPL DI, $0x01 + CMPL SI, $0x01 JL match_nolit_end_encodeBlockAsm12B - MOVB (R8)(R10*1), R9 - CMPB (SI)(R10*1), R9 + MOVB (DI)(R9*1), R8 + CMPB (BX)(R9*1), R8 JNE match_nolit_end_encodeBlockAsm12B - LEAL 1(R10), R10 + LEAL 1(R9), R9 match_nolit_end_encodeBlockAsm12B: - ADDL R10, CX - MOVL 16(SP), SI - ADDL $0x04, R10 + ADDL R9, CX + MOVL 16(SP), BX + ADDL $0x04, R9 MOVL CX, 12(SP) // emitCopy -two_byte_offset_match_nolit_encodeBlockAsm12B: - CMPL R10, $0x40 + CMPL R9, $0x40 JLE two_byte_offset_short_match_nolit_encodeBlockAsm12B - CMPL SI, $0x00000800 + CMPL BX, $0x00000800 JAE long_offset_short_match_nolit_encodeBlockAsm12B - MOVL $0x00000001, DI - LEAL 16(DI), DI - MOVB SI, 1(AX) - SHRL $0x08, SI - SHLL $0x05, SI - ORL SI, DI - MOVB DI, (AX) + MOVL $0x00000001, SI + LEAL 16(SI), SI + MOVB BL, 1(AX) + SHRL $0x08, BX + SHLL $0x05, BX + ORL BX, SI + MOVB SI, (AX) ADDQ $0x02, AX - SUBL $0x08, R10 + SUBL $0x08, R9 // emitRepeat - LEAL -4(R10), R10 + LEAL -4(R9), R9 JMP cant_repeat_two_offset_match_nolit_encodeBlockAsm12B_emit_copy_short_2b - MOVL R10, DI - LEAL -4(R10), R10 - CMPL DI, $0x08 + MOVL R9, SI + LEAL -4(R9), R9 + CMPL SI, $0x08 JLE repeat_two_match_nolit_encodeBlockAsm12B_emit_copy_short_2b - CMPL DI, $0x0c + CMPL SI, $0x0c JGE cant_repeat_two_offset_match_nolit_encodeBlockAsm12B_emit_copy_short_2b - CMPL SI, $0x00000800 + CMPL BX, $0x00000800 JLT repeat_two_offset_match_nolit_encodeBlockAsm12B_emit_copy_short_2b cant_repeat_two_offset_match_nolit_encodeBlockAsm12B_emit_copy_short_2b: - CMPL R10, $0x00000104 + CMPL R9, $0x00000104 JLT repeat_three_match_nolit_encodeBlockAsm12B_emit_copy_short_2b - LEAL -256(R10), R10 + LEAL -256(R9), R9 MOVW $0x0019, (AX) - MOVW R10, 2(AX) + MOVW R9, 2(AX) ADDQ $0x04, AX JMP match_nolit_emitcopy_end_encodeBlockAsm12B repeat_three_match_nolit_encodeBlockAsm12B_emit_copy_short_2b: - LEAL -4(R10), R10 + LEAL -4(R9), R9 MOVW $0x0015, (AX) - MOVB R10, 2(AX) + MOVB R9, 2(AX) ADDQ $0x03, AX JMP match_nolit_emitcopy_end_encodeBlockAsm12B repeat_two_match_nolit_encodeBlockAsm12B_emit_copy_short_2b: - SHLL $0x02, R10 - ORL $0x01, R10 - MOVW R10, (AX) + SHLL $0x02, R9 + ORL $0x01, R9 + MOVW R9, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeBlockAsm12B repeat_two_offset_match_nolit_encodeBlockAsm12B_emit_copy_short_2b: - XORQ DI, DI - LEAL 1(DI)(R10*4), R10 - MOVB SI, 1(AX) - SARL $0x08, SI - SHLL $0x05, SI - ORL SI, R10 - MOVB R10, (AX) + XORQ SI, SI + LEAL 1(SI)(R9*4), R9 + MOVB BL, 1(AX) + SARL $0x08, BX + SHLL $0x05, BX + ORL BX, R9 + MOVB R9, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeBlockAsm12B long_offset_short_match_nolit_encodeBlockAsm12B: MOVB $0xee, (AX) - MOVW SI, 1(AX) - LEAL -60(R10), R10 + MOVW BX, 1(AX) + LEAL -60(R9), R9 ADDQ $0x03, AX // emitRepeat - MOVL R10, DI - LEAL -4(R10), R10 - CMPL DI, $0x08 + MOVL R9, SI + LEAL -4(R9), R9 + CMPL SI, $0x08 JLE repeat_two_match_nolit_encodeBlockAsm12B_emit_copy_short - CMPL DI, $0x0c + CMPL SI, $0x0c JGE cant_repeat_two_offset_match_nolit_encodeBlockAsm12B_emit_copy_short - CMPL SI, $0x00000800 + CMPL BX, $0x00000800 JLT repeat_two_offset_match_nolit_encodeBlockAsm12B_emit_copy_short cant_repeat_two_offset_match_nolit_encodeBlockAsm12B_emit_copy_short: - CMPL R10, $0x00000104 + CMPL R9, $0x00000104 JLT repeat_three_match_nolit_encodeBlockAsm12B_emit_copy_short - LEAL -256(R10), R10 + LEAL -256(R9), R9 MOVW $0x0019, (AX) - MOVW R10, 2(AX) + MOVW R9, 2(AX) ADDQ $0x04, AX JMP 
match_nolit_emitcopy_end_encodeBlockAsm12B repeat_three_match_nolit_encodeBlockAsm12B_emit_copy_short: - LEAL -4(R10), R10 + LEAL -4(R9), R9 MOVW $0x0015, (AX) - MOVB R10, 2(AX) + MOVB R9, 2(AX) ADDQ $0x03, AX JMP match_nolit_emitcopy_end_encodeBlockAsm12B repeat_two_match_nolit_encodeBlockAsm12B_emit_copy_short: - SHLL $0x02, R10 - ORL $0x01, R10 - MOVW R10, (AX) + SHLL $0x02, R9 + ORL $0x01, R9 + MOVW R9, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeBlockAsm12B repeat_two_offset_match_nolit_encodeBlockAsm12B_emit_copy_short: - XORQ DI, DI - LEAL 1(DI)(R10*4), R10 - MOVB SI, 1(AX) - SARL $0x08, SI - SHLL $0x05, SI - ORL SI, R10 - MOVB R10, (AX) + XORQ SI, SI + LEAL 1(SI)(R9*4), R9 + MOVB BL, 1(AX) + SARL $0x08, BX + SHLL $0x05, BX + ORL BX, R9 + MOVB R9, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeBlockAsm12B - JMP two_byte_offset_match_nolit_encodeBlockAsm12B two_byte_offset_short_match_nolit_encodeBlockAsm12B: - CMPL R10, $0x0c + MOVL R9, SI + SHLL $0x02, SI + CMPL R9, $0x0c JGE emit_copy_three_match_nolit_encodeBlockAsm12B - CMPL SI, $0x00000800 + CMPL BX, $0x00000800 JGE emit_copy_three_match_nolit_encodeBlockAsm12B - MOVB $0x01, BL - LEAL -16(BX)(R10*4), R10 - MOVB SI, 1(AX) - SHRL $0x08, SI - SHLL $0x05, SI - ORL SI, R10 - MOVB R10, (AX) + LEAL -15(SI), SI + MOVB BL, 1(AX) + SHRL $0x08, BX + SHLL $0x05, BX + ORL BX, SI + MOVB SI, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeBlockAsm12B emit_copy_three_match_nolit_encodeBlockAsm12B: - MOVB $0x02, BL - LEAL -4(BX)(R10*4), R10 - MOVB R10, (AX) - MOVW SI, 1(AX) + LEAL -2(SI), SI + MOVB SI, (AX) + MOVW BX, 1(AX) ADDQ $0x03, AX match_nolit_emitcopy_end_encodeBlockAsm12B: CMPL CX, 8(SP) JGE emit_remainder_encodeBlockAsm12B - MOVQ -2(DX)(CX*1), DI + MOVQ -2(DX)(CX*1), SI CMPQ AX, (SP) JL match_nolit_dst_ok_encodeBlockAsm12B MOVQ $0x00000000, ret+48(FP) RET match_nolit_dst_ok_encodeBlockAsm12B: - MOVQ $0x000000cf1bbcdcbb, R9 - MOVQ DI, R8 - SHRQ $0x10, DI - MOVQ DI, SI - SHLQ $0x18, R8 - IMULQ R9, R8 - SHRQ $0x34, R8 - SHLQ $0x18, SI - IMULQ R9, SI - SHRQ $0x34, SI - LEAL -2(CX), R9 - LEAQ 24(SP)(SI*4), R10 - MOVL (R10), SI - MOVL R9, 24(SP)(R8*4) - MOVL CX, (R10) - CMPL (DX)(SI*1), DI + MOVQ $0x000000cf1bbcdcbb, R8 + MOVQ SI, DI + SHRQ $0x10, SI + MOVQ SI, BX + SHLQ $0x18, DI + IMULQ R8, DI + SHRQ $0x34, DI + SHLQ $0x18, BX + IMULQ R8, BX + SHRQ $0x34, BX + LEAL -2(CX), R8 + LEAQ 24(SP)(BX*4), R9 + MOVL (R9), BX + MOVL R8, 24(SP)(DI*4) + MOVL CX, (R9) + CMPL (DX)(BX*1), SI JEQ match_nolit_loop_encodeBlockAsm12B INCL CX JMP search_loop_encodeBlockAsm12B @@ -3731,8 +3704,8 @@ zero_loop_encodeBlockAsm10B: MOVL $0x00000000, 12(SP) MOVQ src_len+32(FP), CX LEAQ -9(CX), DX - LEAQ -8(CX), SI - MOVL SI, 8(SP) + LEAQ -8(CX), BX + MOVL BX, 8(SP) SHRQ $0x05, CX SUBL CX, DX LEAQ (AX)(DX*1), DX @@ -3742,428 +3715,426 @@ zero_loop_encodeBlockAsm10B: MOVQ src_base+24(FP), DX search_loop_encodeBlockAsm10B: - MOVL CX, SI - SUBL 12(SP), SI - SHRL $0x05, SI - LEAL 4(CX)(SI*1), SI - CMPL SI, 8(SP) + MOVL CX, BX + SUBL 12(SP), BX + SHRL $0x05, BX + LEAL 4(CX)(BX*1), BX + CMPL BX, 8(SP) JGE emit_remainder_encodeBlockAsm10B - MOVQ (DX)(CX*1), DI - MOVL SI, 20(SP) - MOVQ $0x9e3779b1, R9 - MOVQ DI, R10 - MOVQ DI, R11 - SHRQ $0x08, R11 - SHLQ $0x20, R10 - IMULQ R9, R10 - SHRQ $0x36, R10 - SHLQ $0x20, R11 - IMULQ R9, R11 - SHRQ $0x36, R11 - MOVL 24(SP)(R10*4), SI - MOVL 24(SP)(R11*4), R8 - MOVL CX, 24(SP)(R10*4) - LEAL 1(CX), R10 - MOVL R10, 24(SP)(R11*4) - MOVQ DI, R10 - SHRQ $0x10, R10 + MOVQ (DX)(CX*1), SI + MOVL BX, 20(SP) + MOVQ 
$0x9e3779b1, R8 + MOVQ SI, R9 + MOVQ SI, R10 + SHRQ $0x08, R10 + SHLQ $0x20, R9 + IMULQ R8, R9 + SHRQ $0x36, R9 SHLQ $0x20, R10 - IMULQ R9, R10 + IMULQ R8, R10 SHRQ $0x36, R10 - MOVL CX, R9 - SUBL 16(SP), R9 - MOVL 1(DX)(R9*1), R11 - MOVQ DI, R9 - SHRQ $0x08, R9 - CMPL R9, R11 + MOVL 24(SP)(R9*4), BX + MOVL 24(SP)(R10*4), DI + MOVL CX, 24(SP)(R9*4) + LEAL 1(CX), R9 + MOVL R9, 24(SP)(R10*4) + MOVQ SI, R9 + SHRQ $0x10, R9 + SHLQ $0x20, R9 + IMULQ R8, R9 + SHRQ $0x36, R9 + MOVL CX, R8 + SUBL 16(SP), R8 + MOVL 1(DX)(R8*1), R10 + MOVQ SI, R8 + SHRQ $0x08, R8 + CMPL R8, R10 JNE no_repeat_found_encodeBlockAsm10B - LEAL 1(CX), DI - MOVL 12(SP), R8 - MOVL DI, SI - SUBL 16(SP), SI + LEAL 1(CX), SI + MOVL 12(SP), DI + MOVL SI, BX + SUBL 16(SP), BX JZ repeat_extend_back_end_encodeBlockAsm10B repeat_extend_back_loop_encodeBlockAsm10B: - CMPL DI, R8 + CMPL SI, DI JLE repeat_extend_back_end_encodeBlockAsm10B - MOVB -1(DX)(SI*1), BL - MOVB -1(DX)(DI*1), R9 - CMPB BL, R9 + MOVB -1(DX)(BX*1), R8 + MOVB -1(DX)(SI*1), R9 + CMPB R8, R9 JNE repeat_extend_back_end_encodeBlockAsm10B - LEAL -1(DI), DI - DECL SI + LEAL -1(SI), SI + DECL BX JNZ repeat_extend_back_loop_encodeBlockAsm10B repeat_extend_back_end_encodeBlockAsm10B: - MOVL 12(SP), SI - CMPL SI, DI + MOVL 12(SP), BX + CMPL BX, SI JEQ emit_literal_done_repeat_emit_encodeBlockAsm10B - MOVL DI, R9 - MOVL DI, 12(SP) - LEAQ (DX)(SI*1), R10 - SUBL SI, R9 - LEAL -1(R9), SI - CMPL SI, $0x3c + MOVL SI, R8 + MOVL SI, 12(SP) + LEAQ (DX)(BX*1), R9 + SUBL BX, R8 + LEAL -1(R8), BX + CMPL BX, $0x3c JLT one_byte_repeat_emit_encodeBlockAsm10B - CMPL SI, $0x00000100 + CMPL BX, $0x00000100 JLT two_bytes_repeat_emit_encodeBlockAsm10B MOVB $0xf4, (AX) - MOVW SI, 1(AX) + MOVW BX, 1(AX) ADDQ $0x03, AX JMP memmove_long_repeat_emit_encodeBlockAsm10B two_bytes_repeat_emit_encodeBlockAsm10B: MOVB $0xf0, (AX) - MOVB SI, 1(AX) + MOVB BL, 1(AX) ADDQ $0x02, AX - CMPL SI, $0x40 + CMPL BX, $0x40 JL memmove_repeat_emit_encodeBlockAsm10B JMP memmove_long_repeat_emit_encodeBlockAsm10B one_byte_repeat_emit_encodeBlockAsm10B: - SHLB $0x02, SI - MOVB SI, (AX) + SHLB $0x02, BL + MOVB BL, (AX) ADDQ $0x01, AX memmove_repeat_emit_encodeBlockAsm10B: - LEAQ (AX)(R9*1), SI + LEAQ (AX)(R8*1), BX // genMemMoveShort - CMPQ R9, $0x08 + CMPQ R8, $0x08 JLE emit_lit_memmove_repeat_emit_encodeBlockAsm10B_memmove_move_8 - CMPQ R9, $0x10 + CMPQ R8, $0x10 JBE emit_lit_memmove_repeat_emit_encodeBlockAsm10B_memmove_move_8through16 - CMPQ R9, $0x20 + CMPQ R8, $0x20 JBE emit_lit_memmove_repeat_emit_encodeBlockAsm10B_memmove_move_17through32 JMP emit_lit_memmove_repeat_emit_encodeBlockAsm10B_memmove_move_33through64 emit_lit_memmove_repeat_emit_encodeBlockAsm10B_memmove_move_8: - MOVQ (R10), R11 - MOVQ R11, (AX) + MOVQ (R9), R10 + MOVQ R10, (AX) JMP memmove_end_copy_repeat_emit_encodeBlockAsm10B emit_lit_memmove_repeat_emit_encodeBlockAsm10B_memmove_move_8through16: - MOVQ (R10), R11 - MOVQ -8(R10)(R9*1), R10 - MOVQ R11, (AX) - MOVQ R10, -8(AX)(R9*1) + MOVQ (R9), R10 + MOVQ -8(R9)(R8*1), R9 + MOVQ R10, (AX) + MOVQ R9, -8(AX)(R8*1) JMP memmove_end_copy_repeat_emit_encodeBlockAsm10B emit_lit_memmove_repeat_emit_encodeBlockAsm10B_memmove_move_17through32: - MOVOU (R10), X0 - MOVOU -16(R10)(R9*1), X1 + MOVOU (R9), X0 + MOVOU -16(R9)(R8*1), X1 MOVOU X0, (AX) - MOVOU X1, -16(AX)(R9*1) + MOVOU X1, -16(AX)(R8*1) JMP memmove_end_copy_repeat_emit_encodeBlockAsm10B emit_lit_memmove_repeat_emit_encodeBlockAsm10B_memmove_move_33through64: - MOVOU (R10), X0 - MOVOU 16(R10), X1 - MOVOU -32(R10)(R9*1), X2 - MOVOU -16(R10)(R9*1), X3 
+ MOVOU (R9), X0 + MOVOU 16(R9), X1 + MOVOU -32(R9)(R8*1), X2 + MOVOU -16(R9)(R8*1), X3 MOVOU X0, (AX) MOVOU X1, 16(AX) - MOVOU X2, -32(AX)(R9*1) - MOVOU X3, -16(AX)(R9*1) + MOVOU X2, -32(AX)(R8*1) + MOVOU X3, -16(AX)(R8*1) memmove_end_copy_repeat_emit_encodeBlockAsm10B: - MOVQ SI, AX + MOVQ BX, AX JMP emit_literal_done_repeat_emit_encodeBlockAsm10B memmove_long_repeat_emit_encodeBlockAsm10B: - LEAQ (AX)(R9*1), SI + LEAQ (AX)(R8*1), BX // genMemMoveLong - MOVOU (R10), X0 - MOVOU 16(R10), X1 - MOVOU -32(R10)(R9*1), X2 - MOVOU -16(R10)(R9*1), X3 - MOVQ R9, R12 - SHRQ $0x05, R12 - MOVQ AX, R11 - ANDL $0x0000001f, R11 - MOVQ $0x00000040, R13 - SUBQ R11, R13 - DECQ R12 + MOVOU (R9), X0 + MOVOU 16(R9), X1 + MOVOU -32(R9)(R8*1), X2 + MOVOU -16(R9)(R8*1), X3 + MOVQ R8, R11 + SHRQ $0x05, R11 + MOVQ AX, R10 + ANDL $0x0000001f, R10 + MOVQ $0x00000040, R12 + SUBQ R10, R12 + DECQ R11 JA emit_lit_memmove_long_repeat_emit_encodeBlockAsm10Blarge_forward_sse_loop_32 - LEAQ -32(R10)(R13*1), R11 - LEAQ -32(AX)(R13*1), R14 + LEAQ -32(R9)(R12*1), R10 + LEAQ -32(AX)(R12*1), R13 emit_lit_memmove_long_repeat_emit_encodeBlockAsm10Blarge_big_loop_back: - MOVOU (R11), X4 - MOVOU 16(R11), X5 - MOVOA X4, (R14) - MOVOA X5, 16(R14) - ADDQ $0x20, R14 - ADDQ $0x20, R11 + MOVOU (R10), X4 + MOVOU 16(R10), X5 + MOVOA X4, (R13) + MOVOA X5, 16(R13) ADDQ $0x20, R13 - DECQ R12 + ADDQ $0x20, R10 + ADDQ $0x20, R12 + DECQ R11 JNA emit_lit_memmove_long_repeat_emit_encodeBlockAsm10Blarge_big_loop_back emit_lit_memmove_long_repeat_emit_encodeBlockAsm10Blarge_forward_sse_loop_32: - MOVOU -32(R10)(R13*1), X4 - MOVOU -16(R10)(R13*1), X5 - MOVOA X4, -32(AX)(R13*1) - MOVOA X5, -16(AX)(R13*1) - ADDQ $0x20, R13 - CMPQ R9, R13 + MOVOU -32(R9)(R12*1), X4 + MOVOU -16(R9)(R12*1), X5 + MOVOA X4, -32(AX)(R12*1) + MOVOA X5, -16(AX)(R12*1) + ADDQ $0x20, R12 + CMPQ R8, R12 JAE emit_lit_memmove_long_repeat_emit_encodeBlockAsm10Blarge_forward_sse_loop_32 MOVOU X0, (AX) MOVOU X1, 16(AX) - MOVOU X2, -32(AX)(R9*1) - MOVOU X3, -16(AX)(R9*1) - MOVQ SI, AX + MOVOU X2, -32(AX)(R8*1) + MOVOU X3, -16(AX)(R8*1) + MOVQ BX, AX emit_literal_done_repeat_emit_encodeBlockAsm10B: ADDL $0x05, CX - MOVL CX, SI - SUBL 16(SP), SI - MOVQ src_len+32(FP), R9 - SUBL CX, R9 - LEAQ (DX)(CX*1), R10 - LEAQ (DX)(SI*1), SI + MOVL CX, BX + SUBL 16(SP), BX + MOVQ src_len+32(FP), R8 + SUBL CX, R8 + LEAQ (DX)(CX*1), R9 + LEAQ (DX)(BX*1), BX // matchLen - XORL R12, R12 - CMPL R9, $0x08 + XORL R11, R11 + CMPL R8, $0x08 JL matchlen_match4_repeat_extend_encodeBlockAsm10B matchlen_loopback_repeat_extend_encodeBlockAsm10B: - MOVQ (R10)(R12*1), R11 - XORQ (SI)(R12*1), R11 - TESTQ R11, R11 + MOVQ (R9)(R11*1), R10 + XORQ (BX)(R11*1), R10 + TESTQ R10, R10 JZ matchlen_loop_repeat_extend_encodeBlockAsm10B #ifdef GOAMD64_v3 - TZCNTQ R11, R11 + TZCNTQ R10, R10 #else - BSFQ R11, R11 + BSFQ R10, R10 #endif - SARQ $0x03, R11 - LEAL (R12)(R11*1), R12 + SARQ $0x03, R10 + LEAL (R11)(R10*1), R11 JMP repeat_extend_forward_end_encodeBlockAsm10B matchlen_loop_repeat_extend_encodeBlockAsm10B: - LEAL -8(R9), R9 - LEAL 8(R12), R12 - CMPL R9, $0x08 + LEAL -8(R8), R8 + LEAL 8(R11), R11 + CMPL R8, $0x08 JGE matchlen_loopback_repeat_extend_encodeBlockAsm10B JZ repeat_extend_forward_end_encodeBlockAsm10B matchlen_match4_repeat_extend_encodeBlockAsm10B: - CMPL R9, $0x04 + CMPL R8, $0x04 JL matchlen_match2_repeat_extend_encodeBlockAsm10B - MOVL (R10)(R12*1), R11 - CMPL (SI)(R12*1), R11 + MOVL (R9)(R11*1), R10 + CMPL (BX)(R11*1), R10 JNE matchlen_match2_repeat_extend_encodeBlockAsm10B - SUBL $0x04, R9 - LEAL 4(R12), R12 
+ SUBL $0x04, R8 + LEAL 4(R11), R11 matchlen_match2_repeat_extend_encodeBlockAsm10B: - CMPL R9, $0x02 + CMPL R8, $0x02 JL matchlen_match1_repeat_extend_encodeBlockAsm10B - MOVW (R10)(R12*1), R11 - CMPW (SI)(R12*1), R11 + MOVW (R9)(R11*1), R10 + CMPW (BX)(R11*1), R10 JNE matchlen_match1_repeat_extend_encodeBlockAsm10B - SUBL $0x02, R9 - LEAL 2(R12), R12 + SUBL $0x02, R8 + LEAL 2(R11), R11 matchlen_match1_repeat_extend_encodeBlockAsm10B: - CMPL R9, $0x01 + CMPL R8, $0x01 JL repeat_extend_forward_end_encodeBlockAsm10B - MOVB (R10)(R12*1), R11 - CMPB (SI)(R12*1), R11 + MOVB (R9)(R11*1), R10 + CMPB (BX)(R11*1), R10 JNE repeat_extend_forward_end_encodeBlockAsm10B - LEAL 1(R12), R12 + LEAL 1(R11), R11 repeat_extend_forward_end_encodeBlockAsm10B: - ADDL R12, CX - MOVL CX, SI - SUBL DI, SI - MOVL 16(SP), DI - TESTL R8, R8 + ADDL R11, CX + MOVL CX, BX + SUBL SI, BX + MOVL 16(SP), SI + TESTL DI, DI JZ repeat_as_copy_encodeBlockAsm10B // emitRepeat - MOVL SI, R8 - LEAL -4(SI), SI - CMPL R8, $0x08 + MOVL BX, DI + LEAL -4(BX), BX + CMPL DI, $0x08 JLE repeat_two_match_repeat_encodeBlockAsm10B - CMPL R8, $0x0c + CMPL DI, $0x0c JGE cant_repeat_two_offset_match_repeat_encodeBlockAsm10B - CMPL DI, $0x00000800 + CMPL SI, $0x00000800 JLT repeat_two_offset_match_repeat_encodeBlockAsm10B cant_repeat_two_offset_match_repeat_encodeBlockAsm10B: - CMPL SI, $0x00000104 + CMPL BX, $0x00000104 JLT repeat_three_match_repeat_encodeBlockAsm10B - LEAL -256(SI), SI + LEAL -256(BX), BX MOVW $0x0019, (AX) - MOVW SI, 2(AX) + MOVW BX, 2(AX) ADDQ $0x04, AX JMP repeat_end_emit_encodeBlockAsm10B repeat_three_match_repeat_encodeBlockAsm10B: - LEAL -4(SI), SI + LEAL -4(BX), BX MOVW $0x0015, (AX) - MOVB SI, 2(AX) + MOVB BL, 2(AX) ADDQ $0x03, AX JMP repeat_end_emit_encodeBlockAsm10B repeat_two_match_repeat_encodeBlockAsm10B: - SHLL $0x02, SI - ORL $0x01, SI - MOVW SI, (AX) + SHLL $0x02, BX + ORL $0x01, BX + MOVW BX, (AX) ADDQ $0x02, AX JMP repeat_end_emit_encodeBlockAsm10B repeat_two_offset_match_repeat_encodeBlockAsm10B: - XORQ R8, R8 - LEAL 1(R8)(SI*4), SI - MOVB DI, 1(AX) - SARL $0x08, DI - SHLL $0x05, DI - ORL DI, SI - MOVB SI, (AX) + XORQ DI, DI + LEAL 1(DI)(BX*4), BX + MOVB SI, 1(AX) + SARL $0x08, SI + SHLL $0x05, SI + ORL SI, BX + MOVB BL, (AX) ADDQ $0x02, AX JMP repeat_end_emit_encodeBlockAsm10B repeat_as_copy_encodeBlockAsm10B: // emitCopy -two_byte_offset_repeat_as_copy_encodeBlockAsm10B: - CMPL SI, $0x40 + CMPL BX, $0x40 JLE two_byte_offset_short_repeat_as_copy_encodeBlockAsm10B - CMPL DI, $0x00000800 + CMPL SI, $0x00000800 JAE long_offset_short_repeat_as_copy_encodeBlockAsm10B - MOVL $0x00000001, R8 - LEAL 16(R8), R8 - MOVB DI, 1(AX) - SHRL $0x08, DI - SHLL $0x05, DI - ORL DI, R8 - MOVB R8, (AX) + MOVL $0x00000001, DI + LEAL 16(DI), DI + MOVB SI, 1(AX) + SHRL $0x08, SI + SHLL $0x05, SI + ORL SI, DI + MOVB DI, (AX) ADDQ $0x02, AX - SUBL $0x08, SI + SUBL $0x08, BX // emitRepeat - LEAL -4(SI), SI + LEAL -4(BX), BX JMP cant_repeat_two_offset_repeat_as_copy_encodeBlockAsm10B_emit_copy_short_2b - MOVL SI, R8 - LEAL -4(SI), SI - CMPL R8, $0x08 + MOVL BX, DI + LEAL -4(BX), BX + CMPL DI, $0x08 JLE repeat_two_repeat_as_copy_encodeBlockAsm10B_emit_copy_short_2b - CMPL R8, $0x0c + CMPL DI, $0x0c JGE cant_repeat_two_offset_repeat_as_copy_encodeBlockAsm10B_emit_copy_short_2b - CMPL DI, $0x00000800 + CMPL SI, $0x00000800 JLT repeat_two_offset_repeat_as_copy_encodeBlockAsm10B_emit_copy_short_2b cant_repeat_two_offset_repeat_as_copy_encodeBlockAsm10B_emit_copy_short_2b: - CMPL SI, $0x00000104 + CMPL BX, $0x00000104 JLT 
repeat_three_repeat_as_copy_encodeBlockAsm10B_emit_copy_short_2b - LEAL -256(SI), SI + LEAL -256(BX), BX MOVW $0x0019, (AX) - MOVW SI, 2(AX) + MOVW BX, 2(AX) ADDQ $0x04, AX JMP repeat_end_emit_encodeBlockAsm10B repeat_three_repeat_as_copy_encodeBlockAsm10B_emit_copy_short_2b: - LEAL -4(SI), SI + LEAL -4(BX), BX MOVW $0x0015, (AX) - MOVB SI, 2(AX) + MOVB BL, 2(AX) ADDQ $0x03, AX JMP repeat_end_emit_encodeBlockAsm10B repeat_two_repeat_as_copy_encodeBlockAsm10B_emit_copy_short_2b: - SHLL $0x02, SI - ORL $0x01, SI - MOVW SI, (AX) + SHLL $0x02, BX + ORL $0x01, BX + MOVW BX, (AX) ADDQ $0x02, AX JMP repeat_end_emit_encodeBlockAsm10B repeat_two_offset_repeat_as_copy_encodeBlockAsm10B_emit_copy_short_2b: - XORQ R8, R8 - LEAL 1(R8)(SI*4), SI - MOVB DI, 1(AX) - SARL $0x08, DI - SHLL $0x05, DI - ORL DI, SI - MOVB SI, (AX) + XORQ DI, DI + LEAL 1(DI)(BX*4), BX + MOVB SI, 1(AX) + SARL $0x08, SI + SHLL $0x05, SI + ORL SI, BX + MOVB BL, (AX) ADDQ $0x02, AX JMP repeat_end_emit_encodeBlockAsm10B long_offset_short_repeat_as_copy_encodeBlockAsm10B: MOVB $0xee, (AX) - MOVW DI, 1(AX) - LEAL -60(SI), SI + MOVW SI, 1(AX) + LEAL -60(BX), BX ADDQ $0x03, AX // emitRepeat - MOVL SI, R8 - LEAL -4(SI), SI - CMPL R8, $0x08 + MOVL BX, DI + LEAL -4(BX), BX + CMPL DI, $0x08 JLE repeat_two_repeat_as_copy_encodeBlockAsm10B_emit_copy_short - CMPL R8, $0x0c + CMPL DI, $0x0c JGE cant_repeat_two_offset_repeat_as_copy_encodeBlockAsm10B_emit_copy_short - CMPL DI, $0x00000800 + CMPL SI, $0x00000800 JLT repeat_two_offset_repeat_as_copy_encodeBlockAsm10B_emit_copy_short cant_repeat_two_offset_repeat_as_copy_encodeBlockAsm10B_emit_copy_short: - CMPL SI, $0x00000104 + CMPL BX, $0x00000104 JLT repeat_three_repeat_as_copy_encodeBlockAsm10B_emit_copy_short - LEAL -256(SI), SI + LEAL -256(BX), BX MOVW $0x0019, (AX) - MOVW SI, 2(AX) + MOVW BX, 2(AX) ADDQ $0x04, AX JMP repeat_end_emit_encodeBlockAsm10B repeat_three_repeat_as_copy_encodeBlockAsm10B_emit_copy_short: - LEAL -4(SI), SI + LEAL -4(BX), BX MOVW $0x0015, (AX) - MOVB SI, 2(AX) + MOVB BL, 2(AX) ADDQ $0x03, AX JMP repeat_end_emit_encodeBlockAsm10B repeat_two_repeat_as_copy_encodeBlockAsm10B_emit_copy_short: - SHLL $0x02, SI - ORL $0x01, SI - MOVW SI, (AX) + SHLL $0x02, BX + ORL $0x01, BX + MOVW BX, (AX) ADDQ $0x02, AX JMP repeat_end_emit_encodeBlockAsm10B repeat_two_offset_repeat_as_copy_encodeBlockAsm10B_emit_copy_short: - XORQ R8, R8 - LEAL 1(R8)(SI*4), SI - MOVB DI, 1(AX) - SARL $0x08, DI - SHLL $0x05, DI - ORL DI, SI - MOVB SI, (AX) + XORQ DI, DI + LEAL 1(DI)(BX*4), BX + MOVB SI, 1(AX) + SARL $0x08, SI + SHLL $0x05, SI + ORL SI, BX + MOVB BL, (AX) ADDQ $0x02, AX JMP repeat_end_emit_encodeBlockAsm10B - JMP two_byte_offset_repeat_as_copy_encodeBlockAsm10B two_byte_offset_short_repeat_as_copy_encodeBlockAsm10B: - CMPL SI, $0x0c + MOVL BX, DI + SHLL $0x02, DI + CMPL BX, $0x0c JGE emit_copy_three_repeat_as_copy_encodeBlockAsm10B - CMPL DI, $0x00000800 + CMPL SI, $0x00000800 JGE emit_copy_three_repeat_as_copy_encodeBlockAsm10B - MOVB $0x01, BL - LEAL -16(BX)(SI*4), SI - MOVB DI, 1(AX) - SHRL $0x08, DI - SHLL $0x05, DI - ORL DI, SI - MOVB SI, (AX) + LEAL -15(DI), DI + MOVB SI, 1(AX) + SHRL $0x08, SI + SHLL $0x05, SI + ORL SI, DI + MOVB DI, (AX) ADDQ $0x02, AX JMP repeat_end_emit_encodeBlockAsm10B emit_copy_three_repeat_as_copy_encodeBlockAsm10B: - MOVB $0x02, BL - LEAL -4(BX)(SI*4), SI - MOVB SI, (AX) - MOVW DI, 1(AX) + LEAL -2(DI), DI + MOVB DI, (AX) + MOVW SI, 1(AX) ADDQ $0x03, AX repeat_end_emit_encodeBlockAsm10B: @@ -4171,16 +4142,16 @@ repeat_end_emit_encodeBlockAsm10B: JMP 
search_loop_encodeBlockAsm10B no_repeat_found_encodeBlockAsm10B: - CMPL (DX)(SI*1), DI + CMPL (DX)(BX*1), SI JEQ candidate_match_encodeBlockAsm10B - SHRQ $0x08, DI - MOVL 24(SP)(R10*4), SI - LEAL 2(CX), R9 - CMPL (DX)(R8*1), DI + SHRQ $0x08, SI + MOVL 24(SP)(R9*4), BX + LEAL 2(CX), R8 + CMPL (DX)(DI*1), SI JEQ candidate2_match_encodeBlockAsm10B - MOVL R9, 24(SP)(R10*4) - SHRQ $0x08, DI - CMPL (DX)(SI*1), DI + MOVL R8, 24(SP)(R9*4) + SHRQ $0x08, SI + CMPL (DX)(BX*1), SI JEQ candidate3_match_encodeBlockAsm10B MOVL 20(SP), CX JMP search_loop_encodeBlockAsm10B @@ -4190,391 +4161,389 @@ candidate3_match_encodeBlockAsm10B: JMP candidate_match_encodeBlockAsm10B candidate2_match_encodeBlockAsm10B: - MOVL R9, 24(SP)(R10*4) + MOVL R8, 24(SP)(R9*4) INCL CX - MOVL R8, SI + MOVL DI, BX candidate_match_encodeBlockAsm10B: - MOVL 12(SP), DI - TESTL SI, SI + MOVL 12(SP), SI + TESTL BX, BX JZ match_extend_back_end_encodeBlockAsm10B match_extend_back_loop_encodeBlockAsm10B: - CMPL CX, DI + CMPL CX, SI JLE match_extend_back_end_encodeBlockAsm10B - MOVB -1(DX)(SI*1), BL + MOVB -1(DX)(BX*1), DI MOVB -1(DX)(CX*1), R8 - CMPB BL, R8 + CMPB DI, R8 JNE match_extend_back_end_encodeBlockAsm10B LEAL -1(CX), CX - DECL SI + DECL BX JZ match_extend_back_end_encodeBlockAsm10B JMP match_extend_back_loop_encodeBlockAsm10B match_extend_back_end_encodeBlockAsm10B: - MOVL CX, DI - SUBL 12(SP), DI - LEAQ 3(AX)(DI*1), DI - CMPQ DI, (SP) - JL match_dst_size_check_encodeBlockAsm10B + MOVL CX, SI + SUBL 12(SP), SI + LEAQ 3(AX)(SI*1), SI + CMPQ SI, (SP) + JL match_dst_size_check_encodeBlockAsm10B MOVQ $0x00000000, ret+48(FP) RET match_dst_size_check_encodeBlockAsm10B: - MOVL CX, DI - MOVL 12(SP), R8 - CMPL R8, DI + MOVL CX, SI + MOVL 12(SP), DI + CMPL DI, SI JEQ emit_literal_done_match_emit_encodeBlockAsm10B - MOVL DI, R9 - MOVL DI, 12(SP) - LEAQ (DX)(R8*1), DI - SUBL R8, R9 - LEAL -1(R9), R8 - CMPL R8, $0x3c + MOVL SI, R8 + MOVL SI, 12(SP) + LEAQ (DX)(DI*1), SI + SUBL DI, R8 + LEAL -1(R8), DI + CMPL DI, $0x3c JLT one_byte_match_emit_encodeBlockAsm10B - CMPL R8, $0x00000100 + CMPL DI, $0x00000100 JLT two_bytes_match_emit_encodeBlockAsm10B MOVB $0xf4, (AX) - MOVW R8, 1(AX) + MOVW DI, 1(AX) ADDQ $0x03, AX JMP memmove_long_match_emit_encodeBlockAsm10B two_bytes_match_emit_encodeBlockAsm10B: MOVB $0xf0, (AX) - MOVB R8, 1(AX) + MOVB DI, 1(AX) ADDQ $0x02, AX - CMPL R8, $0x40 + CMPL DI, $0x40 JL memmove_match_emit_encodeBlockAsm10B JMP memmove_long_match_emit_encodeBlockAsm10B one_byte_match_emit_encodeBlockAsm10B: - SHLB $0x02, R8 - MOVB R8, (AX) + SHLB $0x02, DI + MOVB DI, (AX) ADDQ $0x01, AX memmove_match_emit_encodeBlockAsm10B: - LEAQ (AX)(R9*1), R8 + LEAQ (AX)(R8*1), DI // genMemMoveShort - CMPQ R9, $0x08 + CMPQ R8, $0x08 JLE emit_lit_memmove_match_emit_encodeBlockAsm10B_memmove_move_8 - CMPQ R9, $0x10 + CMPQ R8, $0x10 JBE emit_lit_memmove_match_emit_encodeBlockAsm10B_memmove_move_8through16 - CMPQ R9, $0x20 + CMPQ R8, $0x20 JBE emit_lit_memmove_match_emit_encodeBlockAsm10B_memmove_move_17through32 JMP emit_lit_memmove_match_emit_encodeBlockAsm10B_memmove_move_33through64 emit_lit_memmove_match_emit_encodeBlockAsm10B_memmove_move_8: - MOVQ (DI), R10 - MOVQ R10, (AX) + MOVQ (SI), R9 + MOVQ R9, (AX) JMP memmove_end_copy_match_emit_encodeBlockAsm10B emit_lit_memmove_match_emit_encodeBlockAsm10B_memmove_move_8through16: - MOVQ (DI), R10 - MOVQ -8(DI)(R9*1), DI - MOVQ R10, (AX) - MOVQ DI, -8(AX)(R9*1) + MOVQ (SI), R9 + MOVQ -8(SI)(R8*1), SI + MOVQ R9, (AX) + MOVQ SI, -8(AX)(R8*1) JMP memmove_end_copy_match_emit_encodeBlockAsm10B 
emit_lit_memmove_match_emit_encodeBlockAsm10B_memmove_move_17through32: - MOVOU (DI), X0 - MOVOU -16(DI)(R9*1), X1 + MOVOU (SI), X0 + MOVOU -16(SI)(R8*1), X1 MOVOU X0, (AX) - MOVOU X1, -16(AX)(R9*1) + MOVOU X1, -16(AX)(R8*1) JMP memmove_end_copy_match_emit_encodeBlockAsm10B emit_lit_memmove_match_emit_encodeBlockAsm10B_memmove_move_33through64: - MOVOU (DI), X0 - MOVOU 16(DI), X1 - MOVOU -32(DI)(R9*1), X2 - MOVOU -16(DI)(R9*1), X3 + MOVOU (SI), X0 + MOVOU 16(SI), X1 + MOVOU -32(SI)(R8*1), X2 + MOVOU -16(SI)(R8*1), X3 MOVOU X0, (AX) MOVOU X1, 16(AX) - MOVOU X2, -32(AX)(R9*1) - MOVOU X3, -16(AX)(R9*1) + MOVOU X2, -32(AX)(R8*1) + MOVOU X3, -16(AX)(R8*1) memmove_end_copy_match_emit_encodeBlockAsm10B: - MOVQ R8, AX + MOVQ DI, AX JMP emit_literal_done_match_emit_encodeBlockAsm10B memmove_long_match_emit_encodeBlockAsm10B: - LEAQ (AX)(R9*1), R8 + LEAQ (AX)(R8*1), DI // genMemMoveLong - MOVOU (DI), X0 - MOVOU 16(DI), X1 - MOVOU -32(DI)(R9*1), X2 - MOVOU -16(DI)(R9*1), X3 - MOVQ R9, R11 - SHRQ $0x05, R11 - MOVQ AX, R10 - ANDL $0x0000001f, R10 - MOVQ $0x00000040, R12 - SUBQ R10, R12 - DECQ R11 + MOVOU (SI), X0 + MOVOU 16(SI), X1 + MOVOU -32(SI)(R8*1), X2 + MOVOU -16(SI)(R8*1), X3 + MOVQ R8, R10 + SHRQ $0x05, R10 + MOVQ AX, R9 + ANDL $0x0000001f, R9 + MOVQ $0x00000040, R11 + SUBQ R9, R11 + DECQ R10 JA emit_lit_memmove_long_match_emit_encodeBlockAsm10Blarge_forward_sse_loop_32 - LEAQ -32(DI)(R12*1), R10 - LEAQ -32(AX)(R12*1), R13 + LEAQ -32(SI)(R11*1), R9 + LEAQ -32(AX)(R11*1), R12 emit_lit_memmove_long_match_emit_encodeBlockAsm10Blarge_big_loop_back: - MOVOU (R10), X4 - MOVOU 16(R10), X5 - MOVOA X4, (R13) - MOVOA X5, 16(R13) - ADDQ $0x20, R13 - ADDQ $0x20, R10 + MOVOU (R9), X4 + MOVOU 16(R9), X5 + MOVOA X4, (R12) + MOVOA X5, 16(R12) ADDQ $0x20, R12 - DECQ R11 + ADDQ $0x20, R9 + ADDQ $0x20, R11 + DECQ R10 JNA emit_lit_memmove_long_match_emit_encodeBlockAsm10Blarge_big_loop_back emit_lit_memmove_long_match_emit_encodeBlockAsm10Blarge_forward_sse_loop_32: - MOVOU -32(DI)(R12*1), X4 - MOVOU -16(DI)(R12*1), X5 - MOVOA X4, -32(AX)(R12*1) - MOVOA X5, -16(AX)(R12*1) - ADDQ $0x20, R12 - CMPQ R9, R12 + MOVOU -32(SI)(R11*1), X4 + MOVOU -16(SI)(R11*1), X5 + MOVOA X4, -32(AX)(R11*1) + MOVOA X5, -16(AX)(R11*1) + ADDQ $0x20, R11 + CMPQ R8, R11 JAE emit_lit_memmove_long_match_emit_encodeBlockAsm10Blarge_forward_sse_loop_32 MOVOU X0, (AX) MOVOU X1, 16(AX) - MOVOU X2, -32(AX)(R9*1) - MOVOU X3, -16(AX)(R9*1) - MOVQ R8, AX + MOVOU X2, -32(AX)(R8*1) + MOVOU X3, -16(AX)(R8*1) + MOVQ DI, AX emit_literal_done_match_emit_encodeBlockAsm10B: match_nolit_loop_encodeBlockAsm10B: - MOVL CX, DI - SUBL SI, DI - MOVL DI, 16(SP) + MOVL CX, SI + SUBL BX, SI + MOVL SI, 16(SP) ADDL $0x04, CX - ADDL $0x04, SI - MOVQ src_len+32(FP), DI - SUBL CX, DI - LEAQ (DX)(CX*1), R8 - LEAQ (DX)(SI*1), SI + ADDL $0x04, BX + MOVQ src_len+32(FP), SI + SUBL CX, SI + LEAQ (DX)(CX*1), DI + LEAQ (DX)(BX*1), BX // matchLen - XORL R10, R10 - CMPL DI, $0x08 + XORL R9, R9 + CMPL SI, $0x08 JL matchlen_match4_match_nolit_encodeBlockAsm10B matchlen_loopback_match_nolit_encodeBlockAsm10B: - MOVQ (R8)(R10*1), R9 - XORQ (SI)(R10*1), R9 - TESTQ R9, R9 + MOVQ (DI)(R9*1), R8 + XORQ (BX)(R9*1), R8 + TESTQ R8, R8 JZ matchlen_loop_match_nolit_encodeBlockAsm10B #ifdef GOAMD64_v3 - TZCNTQ R9, R9 + TZCNTQ R8, R8 #else - BSFQ R9, R9 + BSFQ R8, R8 #endif - SARQ $0x03, R9 - LEAL (R10)(R9*1), R10 + SARQ $0x03, R8 + LEAL (R9)(R8*1), R9 JMP match_nolit_end_encodeBlockAsm10B matchlen_loop_match_nolit_encodeBlockAsm10B: - LEAL -8(DI), DI - LEAL 8(R10), R10 - CMPL DI, $0x08 + LEAL 
-8(SI), SI + LEAL 8(R9), R9 + CMPL SI, $0x08 JGE matchlen_loopback_match_nolit_encodeBlockAsm10B JZ match_nolit_end_encodeBlockAsm10B matchlen_match4_match_nolit_encodeBlockAsm10B: - CMPL DI, $0x04 + CMPL SI, $0x04 JL matchlen_match2_match_nolit_encodeBlockAsm10B - MOVL (R8)(R10*1), R9 - CMPL (SI)(R10*1), R9 + MOVL (DI)(R9*1), R8 + CMPL (BX)(R9*1), R8 JNE matchlen_match2_match_nolit_encodeBlockAsm10B - SUBL $0x04, DI - LEAL 4(R10), R10 + SUBL $0x04, SI + LEAL 4(R9), R9 matchlen_match2_match_nolit_encodeBlockAsm10B: - CMPL DI, $0x02 + CMPL SI, $0x02 JL matchlen_match1_match_nolit_encodeBlockAsm10B - MOVW (R8)(R10*1), R9 - CMPW (SI)(R10*1), R9 + MOVW (DI)(R9*1), R8 + CMPW (BX)(R9*1), R8 JNE matchlen_match1_match_nolit_encodeBlockAsm10B - SUBL $0x02, DI - LEAL 2(R10), R10 + SUBL $0x02, SI + LEAL 2(R9), R9 matchlen_match1_match_nolit_encodeBlockAsm10B: - CMPL DI, $0x01 + CMPL SI, $0x01 JL match_nolit_end_encodeBlockAsm10B - MOVB (R8)(R10*1), R9 - CMPB (SI)(R10*1), R9 + MOVB (DI)(R9*1), R8 + CMPB (BX)(R9*1), R8 JNE match_nolit_end_encodeBlockAsm10B - LEAL 1(R10), R10 + LEAL 1(R9), R9 match_nolit_end_encodeBlockAsm10B: - ADDL R10, CX - MOVL 16(SP), SI - ADDL $0x04, R10 + ADDL R9, CX + MOVL 16(SP), BX + ADDL $0x04, R9 MOVL CX, 12(SP) // emitCopy -two_byte_offset_match_nolit_encodeBlockAsm10B: - CMPL R10, $0x40 + CMPL R9, $0x40 JLE two_byte_offset_short_match_nolit_encodeBlockAsm10B - CMPL SI, $0x00000800 + CMPL BX, $0x00000800 JAE long_offset_short_match_nolit_encodeBlockAsm10B - MOVL $0x00000001, DI - LEAL 16(DI), DI - MOVB SI, 1(AX) - SHRL $0x08, SI - SHLL $0x05, SI - ORL SI, DI - MOVB DI, (AX) + MOVL $0x00000001, SI + LEAL 16(SI), SI + MOVB BL, 1(AX) + SHRL $0x08, BX + SHLL $0x05, BX + ORL BX, SI + MOVB SI, (AX) ADDQ $0x02, AX - SUBL $0x08, R10 + SUBL $0x08, R9 // emitRepeat - LEAL -4(R10), R10 + LEAL -4(R9), R9 JMP cant_repeat_two_offset_match_nolit_encodeBlockAsm10B_emit_copy_short_2b - MOVL R10, DI - LEAL -4(R10), R10 - CMPL DI, $0x08 + MOVL R9, SI + LEAL -4(R9), R9 + CMPL SI, $0x08 JLE repeat_two_match_nolit_encodeBlockAsm10B_emit_copy_short_2b - CMPL DI, $0x0c + CMPL SI, $0x0c JGE cant_repeat_two_offset_match_nolit_encodeBlockAsm10B_emit_copy_short_2b - CMPL SI, $0x00000800 + CMPL BX, $0x00000800 JLT repeat_two_offset_match_nolit_encodeBlockAsm10B_emit_copy_short_2b cant_repeat_two_offset_match_nolit_encodeBlockAsm10B_emit_copy_short_2b: - CMPL R10, $0x00000104 + CMPL R9, $0x00000104 JLT repeat_three_match_nolit_encodeBlockAsm10B_emit_copy_short_2b - LEAL -256(R10), R10 + LEAL -256(R9), R9 MOVW $0x0019, (AX) - MOVW R10, 2(AX) + MOVW R9, 2(AX) ADDQ $0x04, AX JMP match_nolit_emitcopy_end_encodeBlockAsm10B repeat_three_match_nolit_encodeBlockAsm10B_emit_copy_short_2b: - LEAL -4(R10), R10 + LEAL -4(R9), R9 MOVW $0x0015, (AX) - MOVB R10, 2(AX) + MOVB R9, 2(AX) ADDQ $0x03, AX JMP match_nolit_emitcopy_end_encodeBlockAsm10B repeat_two_match_nolit_encodeBlockAsm10B_emit_copy_short_2b: - SHLL $0x02, R10 - ORL $0x01, R10 - MOVW R10, (AX) + SHLL $0x02, R9 + ORL $0x01, R9 + MOVW R9, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeBlockAsm10B repeat_two_offset_match_nolit_encodeBlockAsm10B_emit_copy_short_2b: - XORQ DI, DI - LEAL 1(DI)(R10*4), R10 - MOVB SI, 1(AX) - SARL $0x08, SI - SHLL $0x05, SI - ORL SI, R10 - MOVB R10, (AX) + XORQ SI, SI + LEAL 1(SI)(R9*4), R9 + MOVB BL, 1(AX) + SARL $0x08, BX + SHLL $0x05, BX + ORL BX, R9 + MOVB R9, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeBlockAsm10B long_offset_short_match_nolit_encodeBlockAsm10B: MOVB $0xee, (AX) - MOVW SI, 1(AX) - LEAL 
-60(R10), R10 + MOVW BX, 1(AX) + LEAL -60(R9), R9 ADDQ $0x03, AX // emitRepeat - MOVL R10, DI - LEAL -4(R10), R10 - CMPL DI, $0x08 + MOVL R9, SI + LEAL -4(R9), R9 + CMPL SI, $0x08 JLE repeat_two_match_nolit_encodeBlockAsm10B_emit_copy_short - CMPL DI, $0x0c + CMPL SI, $0x0c JGE cant_repeat_two_offset_match_nolit_encodeBlockAsm10B_emit_copy_short - CMPL SI, $0x00000800 + CMPL BX, $0x00000800 JLT repeat_two_offset_match_nolit_encodeBlockAsm10B_emit_copy_short cant_repeat_two_offset_match_nolit_encodeBlockAsm10B_emit_copy_short: - CMPL R10, $0x00000104 + CMPL R9, $0x00000104 JLT repeat_three_match_nolit_encodeBlockAsm10B_emit_copy_short - LEAL -256(R10), R10 + LEAL -256(R9), R9 MOVW $0x0019, (AX) - MOVW R10, 2(AX) + MOVW R9, 2(AX) ADDQ $0x04, AX JMP match_nolit_emitcopy_end_encodeBlockAsm10B repeat_three_match_nolit_encodeBlockAsm10B_emit_copy_short: - LEAL -4(R10), R10 + LEAL -4(R9), R9 MOVW $0x0015, (AX) - MOVB R10, 2(AX) + MOVB R9, 2(AX) ADDQ $0x03, AX JMP match_nolit_emitcopy_end_encodeBlockAsm10B repeat_two_match_nolit_encodeBlockAsm10B_emit_copy_short: - SHLL $0x02, R10 - ORL $0x01, R10 - MOVW R10, (AX) + SHLL $0x02, R9 + ORL $0x01, R9 + MOVW R9, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeBlockAsm10B repeat_two_offset_match_nolit_encodeBlockAsm10B_emit_copy_short: - XORQ DI, DI - LEAL 1(DI)(R10*4), R10 - MOVB SI, 1(AX) - SARL $0x08, SI - SHLL $0x05, SI - ORL SI, R10 - MOVB R10, (AX) + XORQ SI, SI + LEAL 1(SI)(R9*4), R9 + MOVB BL, 1(AX) + SARL $0x08, BX + SHLL $0x05, BX + ORL BX, R9 + MOVB R9, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeBlockAsm10B - JMP two_byte_offset_match_nolit_encodeBlockAsm10B two_byte_offset_short_match_nolit_encodeBlockAsm10B: - CMPL R10, $0x0c + MOVL R9, SI + SHLL $0x02, SI + CMPL R9, $0x0c JGE emit_copy_three_match_nolit_encodeBlockAsm10B - CMPL SI, $0x00000800 + CMPL BX, $0x00000800 JGE emit_copy_three_match_nolit_encodeBlockAsm10B - MOVB $0x01, BL - LEAL -16(BX)(R10*4), R10 - MOVB SI, 1(AX) - SHRL $0x08, SI - SHLL $0x05, SI - ORL SI, R10 - MOVB R10, (AX) + LEAL -15(SI), SI + MOVB BL, 1(AX) + SHRL $0x08, BX + SHLL $0x05, BX + ORL BX, SI + MOVB SI, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeBlockAsm10B emit_copy_three_match_nolit_encodeBlockAsm10B: - MOVB $0x02, BL - LEAL -4(BX)(R10*4), R10 - MOVB R10, (AX) - MOVW SI, 1(AX) + LEAL -2(SI), SI + MOVB SI, (AX) + MOVW BX, 1(AX) ADDQ $0x03, AX match_nolit_emitcopy_end_encodeBlockAsm10B: CMPL CX, 8(SP) JGE emit_remainder_encodeBlockAsm10B - MOVQ -2(DX)(CX*1), DI + MOVQ -2(DX)(CX*1), SI CMPQ AX, (SP) JL match_nolit_dst_ok_encodeBlockAsm10B MOVQ $0x00000000, ret+48(FP) RET match_nolit_dst_ok_encodeBlockAsm10B: - MOVQ $0x9e3779b1, R9 - MOVQ DI, R8 - SHRQ $0x10, DI - MOVQ DI, SI - SHLQ $0x20, R8 - IMULQ R9, R8 - SHRQ $0x36, R8 - SHLQ $0x20, SI - IMULQ R9, SI - SHRQ $0x36, SI - LEAL -2(CX), R9 - LEAQ 24(SP)(SI*4), R10 - MOVL (R10), SI - MOVL R9, 24(SP)(R8*4) - MOVL CX, (R10) - CMPL (DX)(SI*1), DI + MOVQ $0x9e3779b1, R8 + MOVQ SI, DI + SHRQ $0x10, SI + MOVQ SI, BX + SHLQ $0x20, DI + IMULQ R8, DI + SHRQ $0x36, DI + SHLQ $0x20, BX + IMULQ R8, BX + SHRQ $0x36, BX + LEAL -2(CX), R8 + LEAQ 24(SP)(BX*4), R9 + MOVL (R9), BX + MOVL R8, 24(SP)(DI*4) + MOVL CX, (R9) + CMPL (DX)(BX*1), SI JEQ match_nolit_loop_encodeBlockAsm10B INCL CX JMP search_loop_encodeBlockAsm10B @@ -4759,8 +4728,8 @@ zero_loop_encodeBlockAsm8B: MOVL $0x00000000, 12(SP) MOVQ src_len+32(FP), CX LEAQ -9(CX), DX - LEAQ -8(CX), SI - MOVL SI, 8(SP) + LEAQ -8(CX), BX + MOVL BX, 8(SP) SHRQ $0x05, CX SUBL CX, DX LEAQ (AX)(DX*1), DX @@ 
-4770,414 +4739,412 @@ zero_loop_encodeBlockAsm8B: MOVQ src_base+24(FP), DX search_loop_encodeBlockAsm8B: - MOVL CX, SI - SUBL 12(SP), SI - SHRL $0x04, SI - LEAL 4(CX)(SI*1), SI - CMPL SI, 8(SP) + MOVL CX, BX + SUBL 12(SP), BX + SHRL $0x04, BX + LEAL 4(CX)(BX*1), BX + CMPL BX, 8(SP) JGE emit_remainder_encodeBlockAsm8B - MOVQ (DX)(CX*1), DI - MOVL SI, 20(SP) - MOVQ $0x9e3779b1, R9 - MOVQ DI, R10 - MOVQ DI, R11 - SHRQ $0x08, R11 - SHLQ $0x20, R10 - IMULQ R9, R10 - SHRQ $0x38, R10 - SHLQ $0x20, R11 - IMULQ R9, R11 - SHRQ $0x38, R11 - MOVL 24(SP)(R10*4), SI - MOVL 24(SP)(R11*4), R8 - MOVL CX, 24(SP)(R10*4) - LEAL 1(CX), R10 - MOVL R10, 24(SP)(R11*4) - MOVQ DI, R10 - SHRQ $0x10, R10 + MOVQ (DX)(CX*1), SI + MOVL BX, 20(SP) + MOVQ $0x9e3779b1, R8 + MOVQ SI, R9 + MOVQ SI, R10 + SHRQ $0x08, R10 + SHLQ $0x20, R9 + IMULQ R8, R9 + SHRQ $0x38, R9 SHLQ $0x20, R10 - IMULQ R9, R10 + IMULQ R8, R10 SHRQ $0x38, R10 - MOVL CX, R9 - SUBL 16(SP), R9 - MOVL 1(DX)(R9*1), R11 - MOVQ DI, R9 - SHRQ $0x08, R9 - CMPL R9, R11 + MOVL 24(SP)(R9*4), BX + MOVL 24(SP)(R10*4), DI + MOVL CX, 24(SP)(R9*4) + LEAL 1(CX), R9 + MOVL R9, 24(SP)(R10*4) + MOVQ SI, R9 + SHRQ $0x10, R9 + SHLQ $0x20, R9 + IMULQ R8, R9 + SHRQ $0x38, R9 + MOVL CX, R8 + SUBL 16(SP), R8 + MOVL 1(DX)(R8*1), R10 + MOVQ SI, R8 + SHRQ $0x08, R8 + CMPL R8, R10 JNE no_repeat_found_encodeBlockAsm8B - LEAL 1(CX), DI - MOVL 12(SP), R8 - MOVL DI, SI - SUBL 16(SP), SI + LEAL 1(CX), SI + MOVL 12(SP), DI + MOVL SI, BX + SUBL 16(SP), BX JZ repeat_extend_back_end_encodeBlockAsm8B repeat_extend_back_loop_encodeBlockAsm8B: - CMPL DI, R8 + CMPL SI, DI JLE repeat_extend_back_end_encodeBlockAsm8B - MOVB -1(DX)(SI*1), BL - MOVB -1(DX)(DI*1), R9 - CMPB BL, R9 + MOVB -1(DX)(BX*1), R8 + MOVB -1(DX)(SI*1), R9 + CMPB R8, R9 JNE repeat_extend_back_end_encodeBlockAsm8B - LEAL -1(DI), DI - DECL SI + LEAL -1(SI), SI + DECL BX JNZ repeat_extend_back_loop_encodeBlockAsm8B repeat_extend_back_end_encodeBlockAsm8B: - MOVL 12(SP), SI - CMPL SI, DI + MOVL 12(SP), BX + CMPL BX, SI JEQ emit_literal_done_repeat_emit_encodeBlockAsm8B - MOVL DI, R9 - MOVL DI, 12(SP) - LEAQ (DX)(SI*1), R10 - SUBL SI, R9 - LEAL -1(R9), SI - CMPL SI, $0x3c + MOVL SI, R8 + MOVL SI, 12(SP) + LEAQ (DX)(BX*1), R9 + SUBL BX, R8 + LEAL -1(R8), BX + CMPL BX, $0x3c JLT one_byte_repeat_emit_encodeBlockAsm8B - CMPL SI, $0x00000100 + CMPL BX, $0x00000100 JLT two_bytes_repeat_emit_encodeBlockAsm8B MOVB $0xf4, (AX) - MOVW SI, 1(AX) + MOVW BX, 1(AX) ADDQ $0x03, AX JMP memmove_long_repeat_emit_encodeBlockAsm8B two_bytes_repeat_emit_encodeBlockAsm8B: MOVB $0xf0, (AX) - MOVB SI, 1(AX) + MOVB BL, 1(AX) ADDQ $0x02, AX - CMPL SI, $0x40 + CMPL BX, $0x40 JL memmove_repeat_emit_encodeBlockAsm8B JMP memmove_long_repeat_emit_encodeBlockAsm8B one_byte_repeat_emit_encodeBlockAsm8B: - SHLB $0x02, SI - MOVB SI, (AX) + SHLB $0x02, BL + MOVB BL, (AX) ADDQ $0x01, AX memmove_repeat_emit_encodeBlockAsm8B: - LEAQ (AX)(R9*1), SI + LEAQ (AX)(R8*1), BX // genMemMoveShort - CMPQ R9, $0x08 + CMPQ R8, $0x08 JLE emit_lit_memmove_repeat_emit_encodeBlockAsm8B_memmove_move_8 - CMPQ R9, $0x10 + CMPQ R8, $0x10 JBE emit_lit_memmove_repeat_emit_encodeBlockAsm8B_memmove_move_8through16 - CMPQ R9, $0x20 + CMPQ R8, $0x20 JBE emit_lit_memmove_repeat_emit_encodeBlockAsm8B_memmove_move_17through32 JMP emit_lit_memmove_repeat_emit_encodeBlockAsm8B_memmove_move_33through64 emit_lit_memmove_repeat_emit_encodeBlockAsm8B_memmove_move_8: - MOVQ (R10), R11 - MOVQ R11, (AX) + MOVQ (R9), R10 + MOVQ R10, (AX) JMP memmove_end_copy_repeat_emit_encodeBlockAsm8B 
emit_lit_memmove_repeat_emit_encodeBlockAsm8B_memmove_move_8through16: - MOVQ (R10), R11 - MOVQ -8(R10)(R9*1), R10 - MOVQ R11, (AX) - MOVQ R10, -8(AX)(R9*1) + MOVQ (R9), R10 + MOVQ -8(R9)(R8*1), R9 + MOVQ R10, (AX) + MOVQ R9, -8(AX)(R8*1) JMP memmove_end_copy_repeat_emit_encodeBlockAsm8B emit_lit_memmove_repeat_emit_encodeBlockAsm8B_memmove_move_17through32: - MOVOU (R10), X0 - MOVOU -16(R10)(R9*1), X1 + MOVOU (R9), X0 + MOVOU -16(R9)(R8*1), X1 MOVOU X0, (AX) - MOVOU X1, -16(AX)(R9*1) + MOVOU X1, -16(AX)(R8*1) JMP memmove_end_copy_repeat_emit_encodeBlockAsm8B emit_lit_memmove_repeat_emit_encodeBlockAsm8B_memmove_move_33through64: - MOVOU (R10), X0 - MOVOU 16(R10), X1 - MOVOU -32(R10)(R9*1), X2 - MOVOU -16(R10)(R9*1), X3 + MOVOU (R9), X0 + MOVOU 16(R9), X1 + MOVOU -32(R9)(R8*1), X2 + MOVOU -16(R9)(R8*1), X3 MOVOU X0, (AX) MOVOU X1, 16(AX) - MOVOU X2, -32(AX)(R9*1) - MOVOU X3, -16(AX)(R9*1) + MOVOU X2, -32(AX)(R8*1) + MOVOU X3, -16(AX)(R8*1) memmove_end_copy_repeat_emit_encodeBlockAsm8B: - MOVQ SI, AX + MOVQ BX, AX JMP emit_literal_done_repeat_emit_encodeBlockAsm8B memmove_long_repeat_emit_encodeBlockAsm8B: - LEAQ (AX)(R9*1), SI + LEAQ (AX)(R8*1), BX // genMemMoveLong - MOVOU (R10), X0 - MOVOU 16(R10), X1 - MOVOU -32(R10)(R9*1), X2 - MOVOU -16(R10)(R9*1), X3 - MOVQ R9, R12 - SHRQ $0x05, R12 - MOVQ AX, R11 - ANDL $0x0000001f, R11 - MOVQ $0x00000040, R13 - SUBQ R11, R13 - DECQ R12 + MOVOU (R9), X0 + MOVOU 16(R9), X1 + MOVOU -32(R9)(R8*1), X2 + MOVOU -16(R9)(R8*1), X3 + MOVQ R8, R11 + SHRQ $0x05, R11 + MOVQ AX, R10 + ANDL $0x0000001f, R10 + MOVQ $0x00000040, R12 + SUBQ R10, R12 + DECQ R11 JA emit_lit_memmove_long_repeat_emit_encodeBlockAsm8Blarge_forward_sse_loop_32 - LEAQ -32(R10)(R13*1), R11 - LEAQ -32(AX)(R13*1), R14 + LEAQ -32(R9)(R12*1), R10 + LEAQ -32(AX)(R12*1), R13 emit_lit_memmove_long_repeat_emit_encodeBlockAsm8Blarge_big_loop_back: - MOVOU (R11), X4 - MOVOU 16(R11), X5 - MOVOA X4, (R14) - MOVOA X5, 16(R14) - ADDQ $0x20, R14 - ADDQ $0x20, R11 + MOVOU (R10), X4 + MOVOU 16(R10), X5 + MOVOA X4, (R13) + MOVOA X5, 16(R13) ADDQ $0x20, R13 - DECQ R12 + ADDQ $0x20, R10 + ADDQ $0x20, R12 + DECQ R11 JNA emit_lit_memmove_long_repeat_emit_encodeBlockAsm8Blarge_big_loop_back emit_lit_memmove_long_repeat_emit_encodeBlockAsm8Blarge_forward_sse_loop_32: - MOVOU -32(R10)(R13*1), X4 - MOVOU -16(R10)(R13*1), X5 - MOVOA X4, -32(AX)(R13*1) - MOVOA X5, -16(AX)(R13*1) - ADDQ $0x20, R13 - CMPQ R9, R13 + MOVOU -32(R9)(R12*1), X4 + MOVOU -16(R9)(R12*1), X5 + MOVOA X4, -32(AX)(R12*1) + MOVOA X5, -16(AX)(R12*1) + ADDQ $0x20, R12 + CMPQ R8, R12 JAE emit_lit_memmove_long_repeat_emit_encodeBlockAsm8Blarge_forward_sse_loop_32 MOVOU X0, (AX) MOVOU X1, 16(AX) - MOVOU X2, -32(AX)(R9*1) - MOVOU X3, -16(AX)(R9*1) - MOVQ SI, AX + MOVOU X2, -32(AX)(R8*1) + MOVOU X3, -16(AX)(R8*1) + MOVQ BX, AX emit_literal_done_repeat_emit_encodeBlockAsm8B: ADDL $0x05, CX - MOVL CX, SI - SUBL 16(SP), SI - MOVQ src_len+32(FP), R9 - SUBL CX, R9 - LEAQ (DX)(CX*1), R10 - LEAQ (DX)(SI*1), SI + MOVL CX, BX + SUBL 16(SP), BX + MOVQ src_len+32(FP), R8 + SUBL CX, R8 + LEAQ (DX)(CX*1), R9 + LEAQ (DX)(BX*1), BX // matchLen - XORL R12, R12 - CMPL R9, $0x08 + XORL R11, R11 + CMPL R8, $0x08 JL matchlen_match4_repeat_extend_encodeBlockAsm8B matchlen_loopback_repeat_extend_encodeBlockAsm8B: - MOVQ (R10)(R12*1), R11 - XORQ (SI)(R12*1), R11 - TESTQ R11, R11 + MOVQ (R9)(R11*1), R10 + XORQ (BX)(R11*1), R10 + TESTQ R10, R10 JZ matchlen_loop_repeat_extend_encodeBlockAsm8B #ifdef GOAMD64_v3 - TZCNTQ R11, R11 + TZCNTQ R10, R10 #else - BSFQ R11, R11 + BSFQ R10, 
R10 #endif - SARQ $0x03, R11 - LEAL (R12)(R11*1), R12 - JMP repeat_extend_forward_end_encodeBlockAsm8B + SARQ $0x03, R10 + LEAL (R11)(R10*1), R11 + JMP repeat_extend_forward_end_encodeBlockAsm8B matchlen_loop_repeat_extend_encodeBlockAsm8B: - LEAL -8(R9), R9 - LEAL 8(R12), R12 - CMPL R9, $0x08 + LEAL -8(R8), R8 + LEAL 8(R11), R11 + CMPL R8, $0x08 JGE matchlen_loopback_repeat_extend_encodeBlockAsm8B JZ repeat_extend_forward_end_encodeBlockAsm8B matchlen_match4_repeat_extend_encodeBlockAsm8B: - CMPL R9, $0x04 + CMPL R8, $0x04 JL matchlen_match2_repeat_extend_encodeBlockAsm8B - MOVL (R10)(R12*1), R11 - CMPL (SI)(R12*1), R11 + MOVL (R9)(R11*1), R10 + CMPL (BX)(R11*1), R10 JNE matchlen_match2_repeat_extend_encodeBlockAsm8B - SUBL $0x04, R9 - LEAL 4(R12), R12 + SUBL $0x04, R8 + LEAL 4(R11), R11 matchlen_match2_repeat_extend_encodeBlockAsm8B: - CMPL R9, $0x02 + CMPL R8, $0x02 JL matchlen_match1_repeat_extend_encodeBlockAsm8B - MOVW (R10)(R12*1), R11 - CMPW (SI)(R12*1), R11 + MOVW (R9)(R11*1), R10 + CMPW (BX)(R11*1), R10 JNE matchlen_match1_repeat_extend_encodeBlockAsm8B - SUBL $0x02, R9 - LEAL 2(R12), R12 + SUBL $0x02, R8 + LEAL 2(R11), R11 matchlen_match1_repeat_extend_encodeBlockAsm8B: - CMPL R9, $0x01 + CMPL R8, $0x01 JL repeat_extend_forward_end_encodeBlockAsm8B - MOVB (R10)(R12*1), R11 - CMPB (SI)(R12*1), R11 + MOVB (R9)(R11*1), R10 + CMPB (BX)(R11*1), R10 JNE repeat_extend_forward_end_encodeBlockAsm8B - LEAL 1(R12), R12 + LEAL 1(R11), R11 repeat_extend_forward_end_encodeBlockAsm8B: - ADDL R12, CX - MOVL CX, SI - SUBL DI, SI - MOVL 16(SP), DI - TESTL R8, R8 + ADDL R11, CX + MOVL CX, BX + SUBL SI, BX + MOVL 16(SP), SI + TESTL DI, DI JZ repeat_as_copy_encodeBlockAsm8B // emitRepeat - MOVL SI, DI - LEAL -4(SI), SI - CMPL DI, $0x08 + MOVL BX, SI + LEAL -4(BX), BX + CMPL SI, $0x08 JLE repeat_two_match_repeat_encodeBlockAsm8B - CMPL DI, $0x0c + CMPL SI, $0x0c JGE cant_repeat_two_offset_match_repeat_encodeBlockAsm8B cant_repeat_two_offset_match_repeat_encodeBlockAsm8B: - CMPL SI, $0x00000104 + CMPL BX, $0x00000104 JLT repeat_three_match_repeat_encodeBlockAsm8B - LEAL -256(SI), SI + LEAL -256(BX), BX MOVW $0x0019, (AX) - MOVW SI, 2(AX) + MOVW BX, 2(AX) ADDQ $0x04, AX JMP repeat_end_emit_encodeBlockAsm8B repeat_three_match_repeat_encodeBlockAsm8B: - LEAL -4(SI), SI + LEAL -4(BX), BX MOVW $0x0015, (AX) - MOVB SI, 2(AX) + MOVB BL, 2(AX) ADDQ $0x03, AX JMP repeat_end_emit_encodeBlockAsm8B repeat_two_match_repeat_encodeBlockAsm8B: - SHLL $0x02, SI - ORL $0x01, SI - MOVW SI, (AX) + SHLL $0x02, BX + ORL $0x01, BX + MOVW BX, (AX) ADDQ $0x02, AX JMP repeat_end_emit_encodeBlockAsm8B - XORQ R8, R8 - LEAL 1(R8)(SI*4), SI - MOVB DI, 1(AX) - SARL $0x08, DI - SHLL $0x05, DI - ORL DI, SI - MOVB SI, (AX) + XORQ DI, DI + LEAL 1(DI)(BX*4), BX + MOVB SI, 1(AX) + SARL $0x08, SI + SHLL $0x05, SI + ORL SI, BX + MOVB BL, (AX) ADDQ $0x02, AX JMP repeat_end_emit_encodeBlockAsm8B repeat_as_copy_encodeBlockAsm8B: // emitCopy -two_byte_offset_repeat_as_copy_encodeBlockAsm8B: - CMPL SI, $0x40 + CMPL BX, $0x40 JLE two_byte_offset_short_repeat_as_copy_encodeBlockAsm8B - CMPL DI, $0x00000800 + CMPL SI, $0x00000800 JAE long_offset_short_repeat_as_copy_encodeBlockAsm8B - MOVL $0x00000001, R8 - LEAL 16(R8), R8 - MOVB DI, 1(AX) - SHRL $0x08, DI - SHLL $0x05, DI - ORL DI, R8 - MOVB R8, (AX) + MOVL $0x00000001, DI + LEAL 16(DI), DI + MOVB SI, 1(AX) + SHRL $0x08, SI + SHLL $0x05, SI + ORL SI, DI + MOVB DI, (AX) ADDQ $0x02, AX - SUBL $0x08, SI + SUBL $0x08, BX // emitRepeat - LEAL -4(SI), SI + LEAL -4(BX), BX JMP 
cant_repeat_two_offset_repeat_as_copy_encodeBlockAsm8B_emit_copy_short_2b - MOVL SI, DI - LEAL -4(SI), SI - CMPL DI, $0x08 + MOVL BX, SI + LEAL -4(BX), BX + CMPL SI, $0x08 JLE repeat_two_repeat_as_copy_encodeBlockAsm8B_emit_copy_short_2b - CMPL DI, $0x0c + CMPL SI, $0x0c JGE cant_repeat_two_offset_repeat_as_copy_encodeBlockAsm8B_emit_copy_short_2b cant_repeat_two_offset_repeat_as_copy_encodeBlockAsm8B_emit_copy_short_2b: - CMPL SI, $0x00000104 + CMPL BX, $0x00000104 JLT repeat_three_repeat_as_copy_encodeBlockAsm8B_emit_copy_short_2b - LEAL -256(SI), SI + LEAL -256(BX), BX MOVW $0x0019, (AX) - MOVW SI, 2(AX) + MOVW BX, 2(AX) ADDQ $0x04, AX JMP repeat_end_emit_encodeBlockAsm8B repeat_three_repeat_as_copy_encodeBlockAsm8B_emit_copy_short_2b: - LEAL -4(SI), SI + LEAL -4(BX), BX MOVW $0x0015, (AX) - MOVB SI, 2(AX) + MOVB BL, 2(AX) ADDQ $0x03, AX JMP repeat_end_emit_encodeBlockAsm8B repeat_two_repeat_as_copy_encodeBlockAsm8B_emit_copy_short_2b: - SHLL $0x02, SI - ORL $0x01, SI - MOVW SI, (AX) + SHLL $0x02, BX + ORL $0x01, BX + MOVW BX, (AX) ADDQ $0x02, AX JMP repeat_end_emit_encodeBlockAsm8B - XORQ R8, R8 - LEAL 1(R8)(SI*4), SI - MOVB DI, 1(AX) - SARL $0x08, DI - SHLL $0x05, DI - ORL DI, SI - MOVB SI, (AX) + XORQ DI, DI + LEAL 1(DI)(BX*4), BX + MOVB SI, 1(AX) + SARL $0x08, SI + SHLL $0x05, SI + ORL SI, BX + MOVB BL, (AX) ADDQ $0x02, AX JMP repeat_end_emit_encodeBlockAsm8B long_offset_short_repeat_as_copy_encodeBlockAsm8B: MOVB $0xee, (AX) - MOVW DI, 1(AX) - LEAL -60(SI), SI + MOVW SI, 1(AX) + LEAL -60(BX), BX ADDQ $0x03, AX // emitRepeat - MOVL SI, DI - LEAL -4(SI), SI - CMPL DI, $0x08 + MOVL BX, SI + LEAL -4(BX), BX + CMPL SI, $0x08 JLE repeat_two_repeat_as_copy_encodeBlockAsm8B_emit_copy_short - CMPL DI, $0x0c + CMPL SI, $0x0c JGE cant_repeat_two_offset_repeat_as_copy_encodeBlockAsm8B_emit_copy_short cant_repeat_two_offset_repeat_as_copy_encodeBlockAsm8B_emit_copy_short: - CMPL SI, $0x00000104 + CMPL BX, $0x00000104 JLT repeat_three_repeat_as_copy_encodeBlockAsm8B_emit_copy_short - LEAL -256(SI), SI + LEAL -256(BX), BX MOVW $0x0019, (AX) - MOVW SI, 2(AX) + MOVW BX, 2(AX) ADDQ $0x04, AX JMP repeat_end_emit_encodeBlockAsm8B repeat_three_repeat_as_copy_encodeBlockAsm8B_emit_copy_short: - LEAL -4(SI), SI + LEAL -4(BX), BX MOVW $0x0015, (AX) - MOVB SI, 2(AX) + MOVB BL, 2(AX) ADDQ $0x03, AX JMP repeat_end_emit_encodeBlockAsm8B repeat_two_repeat_as_copy_encodeBlockAsm8B_emit_copy_short: - SHLL $0x02, SI - ORL $0x01, SI - MOVW SI, (AX) + SHLL $0x02, BX + ORL $0x01, BX + MOVW BX, (AX) ADDQ $0x02, AX JMP repeat_end_emit_encodeBlockAsm8B - XORQ R8, R8 - LEAL 1(R8)(SI*4), SI - MOVB DI, 1(AX) - SARL $0x08, DI - SHLL $0x05, DI - ORL DI, SI - MOVB SI, (AX) + XORQ DI, DI + LEAL 1(DI)(BX*4), BX + MOVB SI, 1(AX) + SARL $0x08, SI + SHLL $0x05, SI + ORL SI, BX + MOVB BL, (AX) ADDQ $0x02, AX JMP repeat_end_emit_encodeBlockAsm8B - JMP two_byte_offset_repeat_as_copy_encodeBlockAsm8B two_byte_offset_short_repeat_as_copy_encodeBlockAsm8B: - CMPL SI, $0x0c + MOVL BX, DI + SHLL $0x02, DI + CMPL BX, $0x0c JGE emit_copy_three_repeat_as_copy_encodeBlockAsm8B - MOVB $0x01, BL - LEAL -16(BX)(SI*4), SI - MOVB DI, 1(AX) - SHRL $0x08, DI - SHLL $0x05, DI - ORL DI, SI - MOVB SI, (AX) + LEAL -15(DI), DI + MOVB SI, 1(AX) + SHRL $0x08, SI + SHLL $0x05, SI + ORL SI, DI + MOVB DI, (AX) ADDQ $0x02, AX JMP repeat_end_emit_encodeBlockAsm8B emit_copy_three_repeat_as_copy_encodeBlockAsm8B: - MOVB $0x02, BL - LEAL -4(BX)(SI*4), SI - MOVB SI, (AX) - MOVW DI, 1(AX) + LEAL -2(DI), DI + MOVB DI, (AX) + MOVW SI, 1(AX) ADDQ $0x03, AX 
repeat_end_emit_encodeBlockAsm8B: @@ -5185,16 +5152,16 @@ repeat_end_emit_encodeBlockAsm8B: JMP search_loop_encodeBlockAsm8B no_repeat_found_encodeBlockAsm8B: - CMPL (DX)(SI*1), DI + CMPL (DX)(BX*1), SI JEQ candidate_match_encodeBlockAsm8B - SHRQ $0x08, DI - MOVL 24(SP)(R10*4), SI - LEAL 2(CX), R9 - CMPL (DX)(R8*1), DI + SHRQ $0x08, SI + MOVL 24(SP)(R9*4), BX + LEAL 2(CX), R8 + CMPL (DX)(DI*1), SI JEQ candidate2_match_encodeBlockAsm8B - MOVL R9, 24(SP)(R10*4) - SHRQ $0x08, DI - CMPL (DX)(SI*1), DI + MOVL R8, 24(SP)(R9*4) + SHRQ $0x08, SI + CMPL (DX)(BX*1), SI JEQ candidate3_match_encodeBlockAsm8B MOVL 20(SP), CX JMP search_loop_encodeBlockAsm8B @@ -5204,381 +5171,379 @@ candidate3_match_encodeBlockAsm8B: JMP candidate_match_encodeBlockAsm8B candidate2_match_encodeBlockAsm8B: - MOVL R9, 24(SP)(R10*4) + MOVL R8, 24(SP)(R9*4) INCL CX - MOVL R8, SI + MOVL DI, BX candidate_match_encodeBlockAsm8B: - MOVL 12(SP), DI - TESTL SI, SI + MOVL 12(SP), SI + TESTL BX, BX JZ match_extend_back_end_encodeBlockAsm8B match_extend_back_loop_encodeBlockAsm8B: - CMPL CX, DI + CMPL CX, SI JLE match_extend_back_end_encodeBlockAsm8B - MOVB -1(DX)(SI*1), BL + MOVB -1(DX)(BX*1), DI MOVB -1(DX)(CX*1), R8 - CMPB BL, R8 + CMPB DI, R8 JNE match_extend_back_end_encodeBlockAsm8B LEAL -1(CX), CX - DECL SI + DECL BX JZ match_extend_back_end_encodeBlockAsm8B JMP match_extend_back_loop_encodeBlockAsm8B match_extend_back_end_encodeBlockAsm8B: - MOVL CX, DI - SUBL 12(SP), DI - LEAQ 3(AX)(DI*1), DI - CMPQ DI, (SP) + MOVL CX, SI + SUBL 12(SP), SI + LEAQ 3(AX)(SI*1), SI + CMPQ SI, (SP) JL match_dst_size_check_encodeBlockAsm8B MOVQ $0x00000000, ret+48(FP) RET match_dst_size_check_encodeBlockAsm8B: - MOVL CX, DI - MOVL 12(SP), R8 - CMPL R8, DI + MOVL CX, SI + MOVL 12(SP), DI + CMPL DI, SI JEQ emit_literal_done_match_emit_encodeBlockAsm8B - MOVL DI, R9 - MOVL DI, 12(SP) - LEAQ (DX)(R8*1), DI - SUBL R8, R9 - LEAL -1(R9), R8 - CMPL R8, $0x3c + MOVL SI, R8 + MOVL SI, 12(SP) + LEAQ (DX)(DI*1), SI + SUBL DI, R8 + LEAL -1(R8), DI + CMPL DI, $0x3c JLT one_byte_match_emit_encodeBlockAsm8B - CMPL R8, $0x00000100 + CMPL DI, $0x00000100 JLT two_bytes_match_emit_encodeBlockAsm8B MOVB $0xf4, (AX) - MOVW R8, 1(AX) + MOVW DI, 1(AX) ADDQ $0x03, AX JMP memmove_long_match_emit_encodeBlockAsm8B two_bytes_match_emit_encodeBlockAsm8B: MOVB $0xf0, (AX) - MOVB R8, 1(AX) + MOVB DI, 1(AX) ADDQ $0x02, AX - CMPL R8, $0x40 + CMPL DI, $0x40 JL memmove_match_emit_encodeBlockAsm8B JMP memmove_long_match_emit_encodeBlockAsm8B one_byte_match_emit_encodeBlockAsm8B: - SHLB $0x02, R8 - MOVB R8, (AX) + SHLB $0x02, DI + MOVB DI, (AX) ADDQ $0x01, AX memmove_match_emit_encodeBlockAsm8B: - LEAQ (AX)(R9*1), R8 + LEAQ (AX)(R8*1), DI // genMemMoveShort - CMPQ R9, $0x08 + CMPQ R8, $0x08 JLE emit_lit_memmove_match_emit_encodeBlockAsm8B_memmove_move_8 - CMPQ R9, $0x10 + CMPQ R8, $0x10 JBE emit_lit_memmove_match_emit_encodeBlockAsm8B_memmove_move_8through16 - CMPQ R9, $0x20 + CMPQ R8, $0x20 JBE emit_lit_memmove_match_emit_encodeBlockAsm8B_memmove_move_17through32 JMP emit_lit_memmove_match_emit_encodeBlockAsm8B_memmove_move_33through64 emit_lit_memmove_match_emit_encodeBlockAsm8B_memmove_move_8: - MOVQ (DI), R10 - MOVQ R10, (AX) + MOVQ (SI), R9 + MOVQ R9, (AX) JMP memmove_end_copy_match_emit_encodeBlockAsm8B emit_lit_memmove_match_emit_encodeBlockAsm8B_memmove_move_8through16: - MOVQ (DI), R10 - MOVQ -8(DI)(R9*1), DI - MOVQ R10, (AX) - MOVQ DI, -8(AX)(R9*1) + MOVQ (SI), R9 + MOVQ -8(SI)(R8*1), SI + MOVQ R9, (AX) + MOVQ SI, -8(AX)(R8*1) JMP 
memmove_end_copy_match_emit_encodeBlockAsm8B emit_lit_memmove_match_emit_encodeBlockAsm8B_memmove_move_17through32: - MOVOU (DI), X0 - MOVOU -16(DI)(R9*1), X1 + MOVOU (SI), X0 + MOVOU -16(SI)(R8*1), X1 MOVOU X0, (AX) - MOVOU X1, -16(AX)(R9*1) + MOVOU X1, -16(AX)(R8*1) JMP memmove_end_copy_match_emit_encodeBlockAsm8B emit_lit_memmove_match_emit_encodeBlockAsm8B_memmove_move_33through64: - MOVOU (DI), X0 - MOVOU 16(DI), X1 - MOVOU -32(DI)(R9*1), X2 - MOVOU -16(DI)(R9*1), X3 + MOVOU (SI), X0 + MOVOU 16(SI), X1 + MOVOU -32(SI)(R8*1), X2 + MOVOU -16(SI)(R8*1), X3 MOVOU X0, (AX) MOVOU X1, 16(AX) - MOVOU X2, -32(AX)(R9*1) - MOVOU X3, -16(AX)(R9*1) + MOVOU X2, -32(AX)(R8*1) + MOVOU X3, -16(AX)(R8*1) memmove_end_copy_match_emit_encodeBlockAsm8B: - MOVQ R8, AX + MOVQ DI, AX JMP emit_literal_done_match_emit_encodeBlockAsm8B memmove_long_match_emit_encodeBlockAsm8B: - LEAQ (AX)(R9*1), R8 + LEAQ (AX)(R8*1), DI // genMemMoveLong - MOVOU (DI), X0 - MOVOU 16(DI), X1 - MOVOU -32(DI)(R9*1), X2 - MOVOU -16(DI)(R9*1), X3 - MOVQ R9, R11 - SHRQ $0x05, R11 - MOVQ AX, R10 - ANDL $0x0000001f, R10 - MOVQ $0x00000040, R12 - SUBQ R10, R12 - DECQ R11 + MOVOU (SI), X0 + MOVOU 16(SI), X1 + MOVOU -32(SI)(R8*1), X2 + MOVOU -16(SI)(R8*1), X3 + MOVQ R8, R10 + SHRQ $0x05, R10 + MOVQ AX, R9 + ANDL $0x0000001f, R9 + MOVQ $0x00000040, R11 + SUBQ R9, R11 + DECQ R10 JA emit_lit_memmove_long_match_emit_encodeBlockAsm8Blarge_forward_sse_loop_32 - LEAQ -32(DI)(R12*1), R10 - LEAQ -32(AX)(R12*1), R13 + LEAQ -32(SI)(R11*1), R9 + LEAQ -32(AX)(R11*1), R12 emit_lit_memmove_long_match_emit_encodeBlockAsm8Blarge_big_loop_back: - MOVOU (R10), X4 - MOVOU 16(R10), X5 - MOVOA X4, (R13) - MOVOA X5, 16(R13) - ADDQ $0x20, R13 - ADDQ $0x20, R10 + MOVOU (R9), X4 + MOVOU 16(R9), X5 + MOVOA X4, (R12) + MOVOA X5, 16(R12) ADDQ $0x20, R12 - DECQ R11 + ADDQ $0x20, R9 + ADDQ $0x20, R11 + DECQ R10 JNA emit_lit_memmove_long_match_emit_encodeBlockAsm8Blarge_big_loop_back emit_lit_memmove_long_match_emit_encodeBlockAsm8Blarge_forward_sse_loop_32: - MOVOU -32(DI)(R12*1), X4 - MOVOU -16(DI)(R12*1), X5 - MOVOA X4, -32(AX)(R12*1) - MOVOA X5, -16(AX)(R12*1) - ADDQ $0x20, R12 - CMPQ R9, R12 + MOVOU -32(SI)(R11*1), X4 + MOVOU -16(SI)(R11*1), X5 + MOVOA X4, -32(AX)(R11*1) + MOVOA X5, -16(AX)(R11*1) + ADDQ $0x20, R11 + CMPQ R8, R11 JAE emit_lit_memmove_long_match_emit_encodeBlockAsm8Blarge_forward_sse_loop_32 MOVOU X0, (AX) MOVOU X1, 16(AX) - MOVOU X2, -32(AX)(R9*1) - MOVOU X3, -16(AX)(R9*1) - MOVQ R8, AX + MOVOU X2, -32(AX)(R8*1) + MOVOU X3, -16(AX)(R8*1) + MOVQ DI, AX emit_literal_done_match_emit_encodeBlockAsm8B: match_nolit_loop_encodeBlockAsm8B: - MOVL CX, DI - SUBL SI, DI - MOVL DI, 16(SP) + MOVL CX, SI + SUBL BX, SI + MOVL SI, 16(SP) ADDL $0x04, CX - ADDL $0x04, SI - MOVQ src_len+32(FP), DI - SUBL CX, DI - LEAQ (DX)(CX*1), R8 - LEAQ (DX)(SI*1), SI + ADDL $0x04, BX + MOVQ src_len+32(FP), SI + SUBL CX, SI + LEAQ (DX)(CX*1), DI + LEAQ (DX)(BX*1), BX // matchLen - XORL R10, R10 - CMPL DI, $0x08 + XORL R9, R9 + CMPL SI, $0x08 JL matchlen_match4_match_nolit_encodeBlockAsm8B matchlen_loopback_match_nolit_encodeBlockAsm8B: - MOVQ (R8)(R10*1), R9 - XORQ (SI)(R10*1), R9 - TESTQ R9, R9 + MOVQ (DI)(R9*1), R8 + XORQ (BX)(R9*1), R8 + TESTQ R8, R8 JZ matchlen_loop_match_nolit_encodeBlockAsm8B #ifdef GOAMD64_v3 - TZCNTQ R9, R9 + TZCNTQ R8, R8 #else - BSFQ R9, R9 + BSFQ R8, R8 #endif - SARQ $0x03, R9 - LEAL (R10)(R9*1), R10 + SARQ $0x03, R8 + LEAL (R9)(R8*1), R9 JMP match_nolit_end_encodeBlockAsm8B matchlen_loop_match_nolit_encodeBlockAsm8B: - LEAL -8(DI), DI - LEAL 8(R10), R10 
- CMPL DI, $0x08 + LEAL -8(SI), SI + LEAL 8(R9), R9 + CMPL SI, $0x08 JGE matchlen_loopback_match_nolit_encodeBlockAsm8B JZ match_nolit_end_encodeBlockAsm8B matchlen_match4_match_nolit_encodeBlockAsm8B: - CMPL DI, $0x04 + CMPL SI, $0x04 JL matchlen_match2_match_nolit_encodeBlockAsm8B - MOVL (R8)(R10*1), R9 - CMPL (SI)(R10*1), R9 + MOVL (DI)(R9*1), R8 + CMPL (BX)(R9*1), R8 JNE matchlen_match2_match_nolit_encodeBlockAsm8B - SUBL $0x04, DI - LEAL 4(R10), R10 + SUBL $0x04, SI + LEAL 4(R9), R9 matchlen_match2_match_nolit_encodeBlockAsm8B: - CMPL DI, $0x02 + CMPL SI, $0x02 JL matchlen_match1_match_nolit_encodeBlockAsm8B - MOVW (R8)(R10*1), R9 - CMPW (SI)(R10*1), R9 + MOVW (DI)(R9*1), R8 + CMPW (BX)(R9*1), R8 JNE matchlen_match1_match_nolit_encodeBlockAsm8B - SUBL $0x02, DI - LEAL 2(R10), R10 + SUBL $0x02, SI + LEAL 2(R9), R9 matchlen_match1_match_nolit_encodeBlockAsm8B: - CMPL DI, $0x01 + CMPL SI, $0x01 JL match_nolit_end_encodeBlockAsm8B - MOVB (R8)(R10*1), R9 - CMPB (SI)(R10*1), R9 + MOVB (DI)(R9*1), R8 + CMPB (BX)(R9*1), R8 JNE match_nolit_end_encodeBlockAsm8B - LEAL 1(R10), R10 + LEAL 1(R9), R9 match_nolit_end_encodeBlockAsm8B: - ADDL R10, CX - MOVL 16(SP), SI - ADDL $0x04, R10 + ADDL R9, CX + MOVL 16(SP), BX + ADDL $0x04, R9 MOVL CX, 12(SP) // emitCopy -two_byte_offset_match_nolit_encodeBlockAsm8B: - CMPL R10, $0x40 + CMPL R9, $0x40 JLE two_byte_offset_short_match_nolit_encodeBlockAsm8B - CMPL SI, $0x00000800 + CMPL BX, $0x00000800 JAE long_offset_short_match_nolit_encodeBlockAsm8B - MOVL $0x00000001, DI - LEAL 16(DI), DI - MOVB SI, 1(AX) - SHRL $0x08, SI - SHLL $0x05, SI - ORL SI, DI - MOVB DI, (AX) + MOVL $0x00000001, SI + LEAL 16(SI), SI + MOVB BL, 1(AX) + SHRL $0x08, BX + SHLL $0x05, BX + ORL BX, SI + MOVB SI, (AX) ADDQ $0x02, AX - SUBL $0x08, R10 + SUBL $0x08, R9 // emitRepeat - LEAL -4(R10), R10 + LEAL -4(R9), R9 JMP cant_repeat_two_offset_match_nolit_encodeBlockAsm8B_emit_copy_short_2b - MOVL R10, SI - LEAL -4(R10), R10 - CMPL SI, $0x08 + MOVL R9, BX + LEAL -4(R9), R9 + CMPL BX, $0x08 JLE repeat_two_match_nolit_encodeBlockAsm8B_emit_copy_short_2b - CMPL SI, $0x0c + CMPL BX, $0x0c JGE cant_repeat_two_offset_match_nolit_encodeBlockAsm8B_emit_copy_short_2b cant_repeat_two_offset_match_nolit_encodeBlockAsm8B_emit_copy_short_2b: - CMPL R10, $0x00000104 + CMPL R9, $0x00000104 JLT repeat_three_match_nolit_encodeBlockAsm8B_emit_copy_short_2b - LEAL -256(R10), R10 + LEAL -256(R9), R9 MOVW $0x0019, (AX) - MOVW R10, 2(AX) + MOVW R9, 2(AX) ADDQ $0x04, AX JMP match_nolit_emitcopy_end_encodeBlockAsm8B repeat_three_match_nolit_encodeBlockAsm8B_emit_copy_short_2b: - LEAL -4(R10), R10 + LEAL -4(R9), R9 MOVW $0x0015, (AX) - MOVB R10, 2(AX) + MOVB R9, 2(AX) ADDQ $0x03, AX JMP match_nolit_emitcopy_end_encodeBlockAsm8B repeat_two_match_nolit_encodeBlockAsm8B_emit_copy_short_2b: - SHLL $0x02, R10 - ORL $0x01, R10 - MOVW R10, (AX) + SHLL $0x02, R9 + ORL $0x01, R9 + MOVW R9, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeBlockAsm8B - XORQ DI, DI - LEAL 1(DI)(R10*4), R10 - MOVB SI, 1(AX) - SARL $0x08, SI - SHLL $0x05, SI - ORL SI, R10 - MOVB R10, (AX) + XORQ SI, SI + LEAL 1(SI)(R9*4), R9 + MOVB BL, 1(AX) + SARL $0x08, BX + SHLL $0x05, BX + ORL BX, R9 + MOVB R9, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeBlockAsm8B long_offset_short_match_nolit_encodeBlockAsm8B: MOVB $0xee, (AX) - MOVW SI, 1(AX) - LEAL -60(R10), R10 + MOVW BX, 1(AX) + LEAL -60(R9), R9 ADDQ $0x03, AX // emitRepeat - MOVL R10, SI - LEAL -4(R10), R10 - CMPL SI, $0x08 + MOVL R9, BX + LEAL -4(R9), R9 + CMPL BX, $0x08 JLE 
repeat_two_match_nolit_encodeBlockAsm8B_emit_copy_short - CMPL SI, $0x0c + CMPL BX, $0x0c JGE cant_repeat_two_offset_match_nolit_encodeBlockAsm8B_emit_copy_short cant_repeat_two_offset_match_nolit_encodeBlockAsm8B_emit_copy_short: - CMPL R10, $0x00000104 + CMPL R9, $0x00000104 JLT repeat_three_match_nolit_encodeBlockAsm8B_emit_copy_short - LEAL -256(R10), R10 + LEAL -256(R9), R9 MOVW $0x0019, (AX) - MOVW R10, 2(AX) + MOVW R9, 2(AX) ADDQ $0x04, AX JMP match_nolit_emitcopy_end_encodeBlockAsm8B repeat_three_match_nolit_encodeBlockAsm8B_emit_copy_short: - LEAL -4(R10), R10 + LEAL -4(R9), R9 MOVW $0x0015, (AX) - MOVB R10, 2(AX) + MOVB R9, 2(AX) ADDQ $0x03, AX JMP match_nolit_emitcopy_end_encodeBlockAsm8B repeat_two_match_nolit_encodeBlockAsm8B_emit_copy_short: - SHLL $0x02, R10 - ORL $0x01, R10 - MOVW R10, (AX) + SHLL $0x02, R9 + ORL $0x01, R9 + MOVW R9, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeBlockAsm8B - XORQ DI, DI - LEAL 1(DI)(R10*4), R10 - MOVB SI, 1(AX) - SARL $0x08, SI - SHLL $0x05, SI - ORL SI, R10 - MOVB R10, (AX) + XORQ SI, SI + LEAL 1(SI)(R9*4), R9 + MOVB BL, 1(AX) + SARL $0x08, BX + SHLL $0x05, BX + ORL BX, R9 + MOVB R9, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeBlockAsm8B - JMP two_byte_offset_match_nolit_encodeBlockAsm8B two_byte_offset_short_match_nolit_encodeBlockAsm8B: - CMPL R10, $0x0c + MOVL R9, SI + SHLL $0x02, SI + CMPL R9, $0x0c JGE emit_copy_three_match_nolit_encodeBlockAsm8B - MOVB $0x01, BL - LEAL -16(BX)(R10*4), R10 - MOVB SI, 1(AX) - SHRL $0x08, SI - SHLL $0x05, SI - ORL SI, R10 - MOVB R10, (AX) + LEAL -15(SI), SI + MOVB BL, 1(AX) + SHRL $0x08, BX + SHLL $0x05, BX + ORL BX, SI + MOVB SI, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeBlockAsm8B emit_copy_three_match_nolit_encodeBlockAsm8B: - MOVB $0x02, BL - LEAL -4(BX)(R10*4), R10 - MOVB R10, (AX) - MOVW SI, 1(AX) + LEAL -2(SI), SI + MOVB SI, (AX) + MOVW BX, 1(AX) ADDQ $0x03, AX match_nolit_emitcopy_end_encodeBlockAsm8B: CMPL CX, 8(SP) JGE emit_remainder_encodeBlockAsm8B - MOVQ -2(DX)(CX*1), DI + MOVQ -2(DX)(CX*1), SI CMPQ AX, (SP) JL match_nolit_dst_ok_encodeBlockAsm8B MOVQ $0x00000000, ret+48(FP) RET match_nolit_dst_ok_encodeBlockAsm8B: - MOVQ $0x9e3779b1, R9 - MOVQ DI, R8 - SHRQ $0x10, DI - MOVQ DI, SI - SHLQ $0x20, R8 - IMULQ R9, R8 - SHRQ $0x38, R8 - SHLQ $0x20, SI - IMULQ R9, SI - SHRQ $0x38, SI - LEAL -2(CX), R9 - LEAQ 24(SP)(SI*4), R10 - MOVL (R10), SI - MOVL R9, 24(SP)(R8*4) - MOVL CX, (R10) - CMPL (DX)(SI*1), DI + MOVQ $0x9e3779b1, R8 + MOVQ SI, DI + SHRQ $0x10, SI + MOVQ SI, BX + SHLQ $0x20, DI + IMULQ R8, DI + SHRQ $0x38, DI + SHLQ $0x20, BX + IMULQ R8, BX + SHRQ $0x38, BX + LEAL -2(CX), R8 + LEAQ 24(SP)(BX*4), R9 + MOVL (R9), BX + MOVL R8, 24(SP)(DI*4) + MOVL CX, (R9) + CMPL (DX)(BX*1), SI JEQ match_nolit_loop_encodeBlockAsm8B INCL CX JMP search_loop_encodeBlockAsm8B @@ -5763,8 +5728,8 @@ zero_loop_encodeBetterBlockAsm: MOVL $0x00000000, 12(SP) MOVQ src_len+32(FP), CX LEAQ -6(CX), DX - LEAQ -8(CX), SI - MOVL SI, 8(SP) + LEAQ -8(CX), BX + MOVL BX, 8(SP) SHRQ $0x05, CX SUBL CX, DX LEAQ (AX)(DX*1), DX @@ -5774,818 +5739,810 @@ zero_loop_encodeBetterBlockAsm: MOVQ src_base+24(FP), DX search_loop_encodeBetterBlockAsm: - MOVL CX, SI - SUBL 12(SP), SI - SHRL $0x07, SI - CMPL SI, $0x63 + MOVL CX, BX + SUBL 12(SP), BX + SHRL $0x07, BX + CMPL BX, $0x63 JLE check_maxskip_ok_encodeBetterBlockAsm - LEAL 100(CX), SI + LEAL 100(CX), BX JMP check_maxskip_cont_encodeBetterBlockAsm check_maxskip_ok_encodeBetterBlockAsm: - LEAL 1(CX)(SI*1), SI + LEAL 1(CX)(BX*1), BX 
check_maxskip_cont_encodeBetterBlockAsm: - CMPL SI, 8(SP) + CMPL BX, 8(SP) JGE emit_remainder_encodeBetterBlockAsm - MOVQ (DX)(CX*1), DI - MOVL SI, 20(SP) - MOVQ $0x00cf1bbcdcbfa563, R9 - MOVQ $0x9e3779b1, SI - MOVQ DI, R10 - MOVQ DI, R11 - SHLQ $0x08, R10 - IMULQ R9, R10 - SHRQ $0x2f, R10 - SHLQ $0x20, R11 - IMULQ SI, R11 - SHRQ $0x32, R11 - MOVL 24(SP)(R10*4), SI - MOVL 524312(SP)(R11*4), R8 - MOVL CX, 24(SP)(R10*4) - MOVL CX, 524312(SP)(R11*4) - MOVQ (DX)(SI*1), R10 - MOVQ (DX)(R8*1), R11 - CMPQ R10, DI + MOVQ (DX)(CX*1), SI + MOVL BX, 20(SP) + MOVQ $0x00cf1bbcdcbfa563, R8 + MOVQ $0x9e3779b1, BX + MOVQ SI, R9 + MOVQ SI, R10 + SHLQ $0x08, R9 + IMULQ R8, R9 + SHRQ $0x2f, R9 + SHLQ $0x20, R10 + IMULQ BX, R10 + SHRQ $0x32, R10 + MOVL 24(SP)(R9*4), BX + MOVL 524312(SP)(R10*4), DI + MOVL CX, 24(SP)(R9*4) + MOVL CX, 524312(SP)(R10*4) + MOVQ (DX)(BX*1), R9 + MOVQ (DX)(DI*1), R10 + CMPQ R9, SI JEQ candidate_match_encodeBetterBlockAsm - CMPQ R11, DI + CMPQ R10, SI JNE no_short_found_encodeBetterBlockAsm - MOVL R8, SI + MOVL DI, BX JMP candidate_match_encodeBetterBlockAsm no_short_found_encodeBetterBlockAsm: - CMPL R10, DI + CMPL R9, SI JEQ candidate_match_encodeBetterBlockAsm - CMPL R11, DI + CMPL R10, SI JEQ candidateS_match_encodeBetterBlockAsm MOVL 20(SP), CX JMP search_loop_encodeBetterBlockAsm candidateS_match_encodeBetterBlockAsm: - SHRQ $0x08, DI - MOVQ DI, R10 - SHLQ $0x08, R10 - IMULQ R9, R10 - SHRQ $0x2f, R10 - MOVL 24(SP)(R10*4), SI + SHRQ $0x08, SI + MOVQ SI, R9 + SHLQ $0x08, R9 + IMULQ R8, R9 + SHRQ $0x2f, R9 + MOVL 24(SP)(R9*4), BX INCL CX - MOVL CX, 24(SP)(R10*4) - CMPL (DX)(SI*1), DI + MOVL CX, 24(SP)(R9*4) + CMPL (DX)(BX*1), SI JEQ candidate_match_encodeBetterBlockAsm DECL CX - MOVL R8, SI + MOVL DI, BX candidate_match_encodeBetterBlockAsm: - MOVL 12(SP), DI - TESTL SI, SI + MOVL 12(SP), SI + TESTL BX, BX JZ match_extend_back_end_encodeBetterBlockAsm match_extend_back_loop_encodeBetterBlockAsm: - CMPL CX, DI + CMPL CX, SI JLE match_extend_back_end_encodeBetterBlockAsm - MOVB -1(DX)(SI*1), BL + MOVB -1(DX)(BX*1), DI MOVB -1(DX)(CX*1), R8 - CMPB BL, R8 + CMPB DI, R8 JNE match_extend_back_end_encodeBetterBlockAsm LEAL -1(CX), CX - DECL SI + DECL BX JZ match_extend_back_end_encodeBetterBlockAsm JMP match_extend_back_loop_encodeBetterBlockAsm match_extend_back_end_encodeBetterBlockAsm: - MOVL CX, DI - SUBL 12(SP), DI - LEAQ 5(AX)(DI*1), DI - CMPQ DI, (SP) + MOVL CX, SI + SUBL 12(SP), SI + LEAQ 5(AX)(SI*1), SI + CMPQ SI, (SP) JL match_dst_size_check_encodeBetterBlockAsm MOVQ $0x00000000, ret+48(FP) RET match_dst_size_check_encodeBetterBlockAsm: - MOVL CX, DI + MOVL CX, SI ADDL $0x04, CX - ADDL $0x04, SI - MOVQ src_len+32(FP), R8 - SUBL CX, R8 - LEAQ (DX)(CX*1), R9 - LEAQ (DX)(SI*1), R10 + ADDL $0x04, BX + MOVQ src_len+32(FP), DI + SUBL CX, DI + LEAQ (DX)(CX*1), R8 + LEAQ (DX)(BX*1), R9 // matchLen - XORL R12, R12 - CMPL R8, $0x08 + XORL R11, R11 + CMPL DI, $0x08 JL matchlen_match4_match_nolit_encodeBetterBlockAsm matchlen_loopback_match_nolit_encodeBetterBlockAsm: - MOVQ (R9)(R12*1), R11 - XORQ (R10)(R12*1), R11 - TESTQ R11, R11 + MOVQ (R8)(R11*1), R10 + XORQ (R9)(R11*1), R10 + TESTQ R10, R10 JZ matchlen_loop_match_nolit_encodeBetterBlockAsm #ifdef GOAMD64_v3 - TZCNTQ R11, R11 + TZCNTQ R10, R10 #else - BSFQ R11, R11 + BSFQ R10, R10 #endif - SARQ $0x03, R11 - LEAL (R12)(R11*1), R12 + SARQ $0x03, R10 + LEAL (R11)(R10*1), R11 JMP match_nolit_end_encodeBetterBlockAsm matchlen_loop_match_nolit_encodeBetterBlockAsm: - LEAL -8(R8), R8 - LEAL 8(R12), R12 - CMPL R8, $0x08 + LEAL -8(DI), DI + 
LEAL 8(R11), R11 + CMPL DI, $0x08 JGE matchlen_loopback_match_nolit_encodeBetterBlockAsm JZ match_nolit_end_encodeBetterBlockAsm matchlen_match4_match_nolit_encodeBetterBlockAsm: - CMPL R8, $0x04 + CMPL DI, $0x04 JL matchlen_match2_match_nolit_encodeBetterBlockAsm - MOVL (R9)(R12*1), R11 - CMPL (R10)(R12*1), R11 + MOVL (R8)(R11*1), R10 + CMPL (R9)(R11*1), R10 JNE matchlen_match2_match_nolit_encodeBetterBlockAsm - SUBL $0x04, R8 - LEAL 4(R12), R12 + SUBL $0x04, DI + LEAL 4(R11), R11 matchlen_match2_match_nolit_encodeBetterBlockAsm: - CMPL R8, $0x02 + CMPL DI, $0x02 JL matchlen_match1_match_nolit_encodeBetterBlockAsm - MOVW (R9)(R12*1), R11 - CMPW (R10)(R12*1), R11 + MOVW (R8)(R11*1), R10 + CMPW (R9)(R11*1), R10 JNE matchlen_match1_match_nolit_encodeBetterBlockAsm - SUBL $0x02, R8 - LEAL 2(R12), R12 + SUBL $0x02, DI + LEAL 2(R11), R11 matchlen_match1_match_nolit_encodeBetterBlockAsm: - CMPL R8, $0x01 + CMPL DI, $0x01 JL match_nolit_end_encodeBetterBlockAsm - MOVB (R9)(R12*1), R11 - CMPB (R10)(R12*1), R11 + MOVB (R8)(R11*1), R10 + CMPB (R9)(R11*1), R10 JNE match_nolit_end_encodeBetterBlockAsm - LEAL 1(R12), R12 + LEAL 1(R11), R11 match_nolit_end_encodeBetterBlockAsm: - MOVL CX, R8 - SUBL SI, R8 + MOVL CX, DI + SUBL BX, DI // Check if repeat - CMPL 16(SP), R8 + CMPL 16(SP), DI JEQ match_is_repeat_encodeBetterBlockAsm - CMPL R12, $0x01 + CMPL R11, $0x01 JG match_length_ok_encodeBetterBlockAsm - CMPL R8, $0x0000ffff + CMPL DI, $0x0000ffff JLE match_length_ok_encodeBetterBlockAsm MOVL 20(SP), CX INCL CX JMP search_loop_encodeBetterBlockAsm match_length_ok_encodeBetterBlockAsm: - MOVL R8, 16(SP) - MOVL 12(SP), SI - CMPL SI, DI + MOVL DI, 16(SP) + MOVL 12(SP), BX + CMPL BX, SI JEQ emit_literal_done_match_emit_encodeBetterBlockAsm - MOVL DI, R9 - MOVL DI, 12(SP) - LEAQ (DX)(SI*1), R10 - SUBL SI, R9 - LEAL -1(R9), SI - CMPL SI, $0x3c + MOVL SI, R8 + MOVL SI, 12(SP) + LEAQ (DX)(BX*1), R9 + SUBL BX, R8 + LEAL -1(R8), BX + CMPL BX, $0x3c JLT one_byte_match_emit_encodeBetterBlockAsm - CMPL SI, $0x00000100 + CMPL BX, $0x00000100 JLT two_bytes_match_emit_encodeBetterBlockAsm - CMPL SI, $0x00010000 + CMPL BX, $0x00010000 JLT three_bytes_match_emit_encodeBetterBlockAsm - CMPL SI, $0x01000000 + CMPL BX, $0x01000000 JLT four_bytes_match_emit_encodeBetterBlockAsm MOVB $0xfc, (AX) - MOVL SI, 1(AX) + MOVL BX, 1(AX) ADDQ $0x05, AX JMP memmove_long_match_emit_encodeBetterBlockAsm four_bytes_match_emit_encodeBetterBlockAsm: - MOVL SI, R11 - SHRL $0x10, R11 + MOVL BX, R10 + SHRL $0x10, R10 MOVB $0xf8, (AX) - MOVW SI, 1(AX) - MOVB R11, 3(AX) + MOVW BX, 1(AX) + MOVB R10, 3(AX) ADDQ $0x04, AX JMP memmove_long_match_emit_encodeBetterBlockAsm three_bytes_match_emit_encodeBetterBlockAsm: MOVB $0xf4, (AX) - MOVW SI, 1(AX) + MOVW BX, 1(AX) ADDQ $0x03, AX JMP memmove_long_match_emit_encodeBetterBlockAsm two_bytes_match_emit_encodeBetterBlockAsm: MOVB $0xf0, (AX) - MOVB SI, 1(AX) + MOVB BL, 1(AX) ADDQ $0x02, AX - CMPL SI, $0x40 + CMPL BX, $0x40 JL memmove_match_emit_encodeBetterBlockAsm JMP memmove_long_match_emit_encodeBetterBlockAsm one_byte_match_emit_encodeBetterBlockAsm: - SHLB $0x02, SI - MOVB SI, (AX) + SHLB $0x02, BL + MOVB BL, (AX) ADDQ $0x01, AX memmove_match_emit_encodeBetterBlockAsm: - LEAQ (AX)(R9*1), SI + LEAQ (AX)(R8*1), BX // genMemMoveShort - CMPQ R9, $0x04 + CMPQ R8, $0x04 JLE emit_lit_memmove_match_emit_encodeBetterBlockAsm_memmove_move_4 - CMPQ R9, $0x08 + CMPQ R8, $0x08 JB emit_lit_memmove_match_emit_encodeBetterBlockAsm_memmove_move_4through7 - CMPQ R9, $0x10 + CMPQ R8, $0x10 JBE 
emit_lit_memmove_match_emit_encodeBetterBlockAsm_memmove_move_8through16 - CMPQ R9, $0x20 + CMPQ R8, $0x20 JBE emit_lit_memmove_match_emit_encodeBetterBlockAsm_memmove_move_17through32 JMP emit_lit_memmove_match_emit_encodeBetterBlockAsm_memmove_move_33through64 emit_lit_memmove_match_emit_encodeBetterBlockAsm_memmove_move_4: - MOVL (R10), R11 - MOVL R11, (AX) + MOVL (R9), R10 + MOVL R10, (AX) JMP memmove_end_copy_match_emit_encodeBetterBlockAsm emit_lit_memmove_match_emit_encodeBetterBlockAsm_memmove_move_4through7: - MOVL (R10), R11 - MOVL -4(R10)(R9*1), R10 - MOVL R11, (AX) - MOVL R10, -4(AX)(R9*1) + MOVL (R9), R10 + MOVL -4(R9)(R8*1), R9 + MOVL R10, (AX) + MOVL R9, -4(AX)(R8*1) JMP memmove_end_copy_match_emit_encodeBetterBlockAsm emit_lit_memmove_match_emit_encodeBetterBlockAsm_memmove_move_8through16: - MOVQ (R10), R11 - MOVQ -8(R10)(R9*1), R10 - MOVQ R11, (AX) - MOVQ R10, -8(AX)(R9*1) + MOVQ (R9), R10 + MOVQ -8(R9)(R8*1), R9 + MOVQ R10, (AX) + MOVQ R9, -8(AX)(R8*1) JMP memmove_end_copy_match_emit_encodeBetterBlockAsm emit_lit_memmove_match_emit_encodeBetterBlockAsm_memmove_move_17through32: - MOVOU (R10), X0 - MOVOU -16(R10)(R9*1), X1 + MOVOU (R9), X0 + MOVOU -16(R9)(R8*1), X1 MOVOU X0, (AX) - MOVOU X1, -16(AX)(R9*1) + MOVOU X1, -16(AX)(R8*1) JMP memmove_end_copy_match_emit_encodeBetterBlockAsm emit_lit_memmove_match_emit_encodeBetterBlockAsm_memmove_move_33through64: - MOVOU (R10), X0 - MOVOU 16(R10), X1 - MOVOU -32(R10)(R9*1), X2 - MOVOU -16(R10)(R9*1), X3 + MOVOU (R9), X0 + MOVOU 16(R9), X1 + MOVOU -32(R9)(R8*1), X2 + MOVOU -16(R9)(R8*1), X3 MOVOU X0, (AX) MOVOU X1, 16(AX) - MOVOU X2, -32(AX)(R9*1) - MOVOU X3, -16(AX)(R9*1) + MOVOU X2, -32(AX)(R8*1) + MOVOU X3, -16(AX)(R8*1) memmove_end_copy_match_emit_encodeBetterBlockAsm: - MOVQ SI, AX + MOVQ BX, AX JMP emit_literal_done_match_emit_encodeBetterBlockAsm memmove_long_match_emit_encodeBetterBlockAsm: - LEAQ (AX)(R9*1), SI + LEAQ (AX)(R8*1), BX // genMemMoveLong - MOVOU (R10), X0 - MOVOU 16(R10), X1 - MOVOU -32(R10)(R9*1), X2 - MOVOU -16(R10)(R9*1), X3 - MOVQ R9, R13 - SHRQ $0x05, R13 - MOVQ AX, R11 - ANDL $0x0000001f, R11 - MOVQ $0x00000040, R14 - SUBQ R11, R14 - DECQ R13 + MOVOU (R9), X0 + MOVOU 16(R9), X1 + MOVOU -32(R9)(R8*1), X2 + MOVOU -16(R9)(R8*1), X3 + MOVQ R8, R12 + SHRQ $0x05, R12 + MOVQ AX, R10 + ANDL $0x0000001f, R10 + MOVQ $0x00000040, R13 + SUBQ R10, R13 + DECQ R12 JA emit_lit_memmove_long_match_emit_encodeBetterBlockAsmlarge_forward_sse_loop_32 - LEAQ -32(R10)(R14*1), R11 - LEAQ -32(AX)(R14*1), R15 + LEAQ -32(R9)(R13*1), R10 + LEAQ -32(AX)(R13*1), R14 emit_lit_memmove_long_match_emit_encodeBetterBlockAsmlarge_big_loop_back: - MOVOU (R11), X4 - MOVOU 16(R11), X5 - MOVOA X4, (R15) - MOVOA X5, 16(R15) - ADDQ $0x20, R15 - ADDQ $0x20, R11 + MOVOU (R10), X4 + MOVOU 16(R10), X5 + MOVOA X4, (R14) + MOVOA X5, 16(R14) ADDQ $0x20, R14 - DECQ R13 + ADDQ $0x20, R10 + ADDQ $0x20, R13 + DECQ R12 JNA emit_lit_memmove_long_match_emit_encodeBetterBlockAsmlarge_big_loop_back emit_lit_memmove_long_match_emit_encodeBetterBlockAsmlarge_forward_sse_loop_32: - MOVOU -32(R10)(R14*1), X4 - MOVOU -16(R10)(R14*1), X5 - MOVOA X4, -32(AX)(R14*1) - MOVOA X5, -16(AX)(R14*1) - ADDQ $0x20, R14 - CMPQ R9, R14 + MOVOU -32(R9)(R13*1), X4 + MOVOU -16(R9)(R13*1), X5 + MOVOA X4, -32(AX)(R13*1) + MOVOA X5, -16(AX)(R13*1) + ADDQ $0x20, R13 + CMPQ R8, R13 JAE emit_lit_memmove_long_match_emit_encodeBetterBlockAsmlarge_forward_sse_loop_32 MOVOU X0, (AX) MOVOU X1, 16(AX) - MOVOU X2, -32(AX)(R9*1) - MOVOU X3, -16(AX)(R9*1) - MOVQ SI, AX + MOVOU X2, -32(AX)(R8*1) 
+ MOVOU X3, -16(AX)(R8*1) + MOVQ BX, AX emit_literal_done_match_emit_encodeBetterBlockAsm: - ADDL R12, CX - ADDL $0x04, R12 + ADDL R11, CX + ADDL $0x04, R11 MOVL CX, 12(SP) // emitCopy - CMPL R8, $0x00010000 + CMPL DI, $0x00010000 JL two_byte_offset_match_nolit_encodeBetterBlockAsm - -four_bytes_loop_back_match_nolit_encodeBetterBlockAsm: - CMPL R12, $0x40 + CMPL R11, $0x40 JLE four_bytes_remain_match_nolit_encodeBetterBlockAsm MOVB $0xff, (AX) - MOVL R8, 1(AX) - LEAL -64(R12), R12 + MOVL DI, 1(AX) + LEAL -64(R11), R11 ADDQ $0x05, AX - CMPL R12, $0x04 + CMPL R11, $0x04 JL four_bytes_remain_match_nolit_encodeBetterBlockAsm // emitRepeat emit_repeat_again_match_nolit_encodeBetterBlockAsm_emit_copy: - MOVL R12, SI - LEAL -4(R12), R12 - CMPL SI, $0x08 + MOVL R11, BX + LEAL -4(R11), R11 + CMPL BX, $0x08 JLE repeat_two_match_nolit_encodeBetterBlockAsm_emit_copy - CMPL SI, $0x0c + CMPL BX, $0x0c JGE cant_repeat_two_offset_match_nolit_encodeBetterBlockAsm_emit_copy - CMPL R8, $0x00000800 + CMPL DI, $0x00000800 JLT repeat_two_offset_match_nolit_encodeBetterBlockAsm_emit_copy cant_repeat_two_offset_match_nolit_encodeBetterBlockAsm_emit_copy: - CMPL R12, $0x00000104 + CMPL R11, $0x00000104 JLT repeat_three_match_nolit_encodeBetterBlockAsm_emit_copy - CMPL R12, $0x00010100 + CMPL R11, $0x00010100 JLT repeat_four_match_nolit_encodeBetterBlockAsm_emit_copy - CMPL R12, $0x0100ffff + CMPL R11, $0x0100ffff JLT repeat_five_match_nolit_encodeBetterBlockAsm_emit_copy - LEAL -16842747(R12), R12 - MOVW $0x001d, (AX) - MOVW $0xfffb, 2(AX) + LEAL -16842747(R11), R11 + MOVL $0xfffb001d, (AX) MOVB $0xff, 4(AX) ADDQ $0x05, AX JMP emit_repeat_again_match_nolit_encodeBetterBlockAsm_emit_copy repeat_five_match_nolit_encodeBetterBlockAsm_emit_copy: - LEAL -65536(R12), R12 - MOVL R12, R8 + LEAL -65536(R11), R11 + MOVL R11, DI MOVW $0x001d, (AX) - MOVW R12, 2(AX) - SARL $0x10, R8 - MOVB R8, 4(AX) + MOVW R11, 2(AX) + SARL $0x10, DI + MOVB DI, 4(AX) ADDQ $0x05, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm repeat_four_match_nolit_encodeBetterBlockAsm_emit_copy: - LEAL -256(R12), R12 + LEAL -256(R11), R11 MOVW $0x0019, (AX) - MOVW R12, 2(AX) + MOVW R11, 2(AX) ADDQ $0x04, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm repeat_three_match_nolit_encodeBetterBlockAsm_emit_copy: - LEAL -4(R12), R12 + LEAL -4(R11), R11 MOVW $0x0015, (AX) - MOVB R12, 2(AX) + MOVB R11, 2(AX) ADDQ $0x03, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm repeat_two_match_nolit_encodeBetterBlockAsm_emit_copy: - SHLL $0x02, R12 - ORL $0x01, R12 - MOVW R12, (AX) + SHLL $0x02, R11 + ORL $0x01, R11 + MOVW R11, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm repeat_two_offset_match_nolit_encodeBetterBlockAsm_emit_copy: - XORQ SI, SI - LEAL 1(SI)(R12*4), R12 - MOVB R8, 1(AX) - SARL $0x08, R8 - SHLL $0x05, R8 - ORL R8, R12 - MOVB R12, (AX) + XORQ BX, BX + LEAL 1(BX)(R11*4), R11 + MOVB DI, 1(AX) + SARL $0x08, DI + SHLL $0x05, DI + ORL DI, R11 + MOVB R11, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm - JMP four_bytes_loop_back_match_nolit_encodeBetterBlockAsm four_bytes_remain_match_nolit_encodeBetterBlockAsm: - TESTL R12, R12 + TESTL R11, R11 JZ match_nolit_emitcopy_end_encodeBetterBlockAsm - MOVB $0x03, BL - LEAL -4(BX)(R12*4), R12 - MOVB R12, (AX) - MOVL R8, 1(AX) + XORL BX, BX + LEAL -1(BX)(R11*4), R11 + MOVB R11, (AX) + MOVL DI, 1(AX) ADDQ $0x05, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm two_byte_offset_match_nolit_encodeBetterBlockAsm: - CMPL R12, $0x40 + CMPL R11, $0x40 JLE 
two_byte_offset_short_match_nolit_encodeBetterBlockAsm - CMPL R8, $0x00000800 + CMPL DI, $0x00000800 JAE long_offset_short_match_nolit_encodeBetterBlockAsm - MOVL $0x00000001, SI - LEAL 16(SI), SI - MOVB R8, 1(AX) - MOVL R8, R9 - SHRL $0x08, R9 - SHLL $0x05, R9 - ORL R9, SI - MOVB SI, (AX) + MOVL $0x00000001, BX + LEAL 16(BX), BX + MOVB DI, 1(AX) + MOVL DI, R8 + SHRL $0x08, R8 + SHLL $0x05, R8 + ORL R8, BX + MOVB BL, (AX) ADDQ $0x02, AX - SUBL $0x08, R12 + SUBL $0x08, R11 // emitRepeat - LEAL -4(R12), R12 + LEAL -4(R11), R11 JMP cant_repeat_two_offset_match_nolit_encodeBetterBlockAsm_emit_copy_short_2b emit_repeat_again_match_nolit_encodeBetterBlockAsm_emit_copy_short_2b: - MOVL R12, SI - LEAL -4(R12), R12 - CMPL SI, $0x08 + MOVL R11, BX + LEAL -4(R11), R11 + CMPL BX, $0x08 JLE repeat_two_match_nolit_encodeBetterBlockAsm_emit_copy_short_2b - CMPL SI, $0x0c + CMPL BX, $0x0c JGE cant_repeat_two_offset_match_nolit_encodeBetterBlockAsm_emit_copy_short_2b - CMPL R8, $0x00000800 + CMPL DI, $0x00000800 JLT repeat_two_offset_match_nolit_encodeBetterBlockAsm_emit_copy_short_2b cant_repeat_two_offset_match_nolit_encodeBetterBlockAsm_emit_copy_short_2b: - CMPL R12, $0x00000104 + CMPL R11, $0x00000104 JLT repeat_three_match_nolit_encodeBetterBlockAsm_emit_copy_short_2b - CMPL R12, $0x00010100 + CMPL R11, $0x00010100 JLT repeat_four_match_nolit_encodeBetterBlockAsm_emit_copy_short_2b - CMPL R12, $0x0100ffff + CMPL R11, $0x0100ffff JLT repeat_five_match_nolit_encodeBetterBlockAsm_emit_copy_short_2b - LEAL -16842747(R12), R12 - MOVW $0x001d, (AX) - MOVW $0xfffb, 2(AX) + LEAL -16842747(R11), R11 + MOVL $0xfffb001d, (AX) MOVB $0xff, 4(AX) ADDQ $0x05, AX JMP emit_repeat_again_match_nolit_encodeBetterBlockAsm_emit_copy_short_2b repeat_five_match_nolit_encodeBetterBlockAsm_emit_copy_short_2b: - LEAL -65536(R12), R12 - MOVL R12, R8 + LEAL -65536(R11), R11 + MOVL R11, DI MOVW $0x001d, (AX) - MOVW R12, 2(AX) - SARL $0x10, R8 - MOVB R8, 4(AX) + MOVW R11, 2(AX) + SARL $0x10, DI + MOVB DI, 4(AX) ADDQ $0x05, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm repeat_four_match_nolit_encodeBetterBlockAsm_emit_copy_short_2b: - LEAL -256(R12), R12 + LEAL -256(R11), R11 MOVW $0x0019, (AX) - MOVW R12, 2(AX) + MOVW R11, 2(AX) ADDQ $0x04, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm repeat_three_match_nolit_encodeBetterBlockAsm_emit_copy_short_2b: - LEAL -4(R12), R12 + LEAL -4(R11), R11 MOVW $0x0015, (AX) - MOVB R12, 2(AX) + MOVB R11, 2(AX) ADDQ $0x03, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm repeat_two_match_nolit_encodeBetterBlockAsm_emit_copy_short_2b: - SHLL $0x02, R12 - ORL $0x01, R12 - MOVW R12, (AX) + SHLL $0x02, R11 + ORL $0x01, R11 + MOVW R11, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm repeat_two_offset_match_nolit_encodeBetterBlockAsm_emit_copy_short_2b: - XORQ SI, SI - LEAL 1(SI)(R12*4), R12 - MOVB R8, 1(AX) - SARL $0x08, R8 - SHLL $0x05, R8 - ORL R8, R12 - MOVB R12, (AX) + XORQ BX, BX + LEAL 1(BX)(R11*4), R11 + MOVB DI, 1(AX) + SARL $0x08, DI + SHLL $0x05, DI + ORL DI, R11 + MOVB R11, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm long_offset_short_match_nolit_encodeBetterBlockAsm: MOVB $0xee, (AX) - MOVW R8, 1(AX) - LEAL -60(R12), R12 + MOVW DI, 1(AX) + LEAL -60(R11), R11 ADDQ $0x03, AX // emitRepeat emit_repeat_again_match_nolit_encodeBetterBlockAsm_emit_copy_short: - MOVL R12, SI - LEAL -4(R12), R12 - CMPL SI, $0x08 + MOVL R11, BX + LEAL -4(R11), R11 + CMPL BX, $0x08 JLE repeat_two_match_nolit_encodeBetterBlockAsm_emit_copy_short - CMPL 
SI, $0x0c + CMPL BX, $0x0c JGE cant_repeat_two_offset_match_nolit_encodeBetterBlockAsm_emit_copy_short - CMPL R8, $0x00000800 + CMPL DI, $0x00000800 JLT repeat_two_offset_match_nolit_encodeBetterBlockAsm_emit_copy_short cant_repeat_two_offset_match_nolit_encodeBetterBlockAsm_emit_copy_short: - CMPL R12, $0x00000104 + CMPL R11, $0x00000104 JLT repeat_three_match_nolit_encodeBetterBlockAsm_emit_copy_short - CMPL R12, $0x00010100 + CMPL R11, $0x00010100 JLT repeat_four_match_nolit_encodeBetterBlockAsm_emit_copy_short - CMPL R12, $0x0100ffff + CMPL R11, $0x0100ffff JLT repeat_five_match_nolit_encodeBetterBlockAsm_emit_copy_short - LEAL -16842747(R12), R12 - MOVW $0x001d, (AX) - MOVW $0xfffb, 2(AX) + LEAL -16842747(R11), R11 + MOVL $0xfffb001d, (AX) MOVB $0xff, 4(AX) ADDQ $0x05, AX JMP emit_repeat_again_match_nolit_encodeBetterBlockAsm_emit_copy_short repeat_five_match_nolit_encodeBetterBlockAsm_emit_copy_short: - LEAL -65536(R12), R12 - MOVL R12, R8 + LEAL -65536(R11), R11 + MOVL R11, DI MOVW $0x001d, (AX) - MOVW R12, 2(AX) - SARL $0x10, R8 - MOVB R8, 4(AX) + MOVW R11, 2(AX) + SARL $0x10, DI + MOVB DI, 4(AX) ADDQ $0x05, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm repeat_four_match_nolit_encodeBetterBlockAsm_emit_copy_short: - LEAL -256(R12), R12 + LEAL -256(R11), R11 MOVW $0x0019, (AX) - MOVW R12, 2(AX) + MOVW R11, 2(AX) ADDQ $0x04, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm repeat_three_match_nolit_encodeBetterBlockAsm_emit_copy_short: - LEAL -4(R12), R12 + LEAL -4(R11), R11 MOVW $0x0015, (AX) - MOVB R12, 2(AX) + MOVB R11, 2(AX) ADDQ $0x03, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm repeat_two_match_nolit_encodeBetterBlockAsm_emit_copy_short: - SHLL $0x02, R12 - ORL $0x01, R12 - MOVW R12, (AX) + SHLL $0x02, R11 + ORL $0x01, R11 + MOVW R11, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm repeat_two_offset_match_nolit_encodeBetterBlockAsm_emit_copy_short: - XORQ SI, SI - LEAL 1(SI)(R12*4), R12 - MOVB R8, 1(AX) - SARL $0x08, R8 - SHLL $0x05, R8 - ORL R8, R12 - MOVB R12, (AX) + XORQ BX, BX + LEAL 1(BX)(R11*4), R11 + MOVB DI, 1(AX) + SARL $0x08, DI + SHLL $0x05, DI + ORL DI, R11 + MOVB R11, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm - JMP two_byte_offset_match_nolit_encodeBetterBlockAsm two_byte_offset_short_match_nolit_encodeBetterBlockAsm: - CMPL R12, $0x0c + MOVL R11, BX + SHLL $0x02, BX + CMPL R11, $0x0c JGE emit_copy_three_match_nolit_encodeBetterBlockAsm - CMPL R8, $0x00000800 + CMPL DI, $0x00000800 JGE emit_copy_three_match_nolit_encodeBetterBlockAsm - MOVB $0x01, BL - LEAL -16(BX)(R12*4), R12 - MOVB R8, 1(AX) - SHRL $0x08, R8 - SHLL $0x05, R8 - ORL R8, R12 - MOVB R12, (AX) + LEAL -15(BX), BX + MOVB DI, 1(AX) + SHRL $0x08, DI + SHLL $0x05, DI + ORL DI, BX + MOVB BL, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm emit_copy_three_match_nolit_encodeBetterBlockAsm: - MOVB $0x02, BL - LEAL -4(BX)(R12*4), R12 - MOVB R12, (AX) - MOVW R8, 1(AX) + LEAL -2(BX), BX + MOVB BL, (AX) + MOVW DI, 1(AX) ADDQ $0x03, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm match_is_repeat_encodeBetterBlockAsm: - MOVL 12(SP), SI - CMPL SI, DI + MOVL 12(SP), BX + CMPL BX, SI JEQ emit_literal_done_match_emit_repeat_encodeBetterBlockAsm - MOVL DI, R9 - MOVL DI, 12(SP) - LEAQ (DX)(SI*1), R10 - SUBL SI, R9 - LEAL -1(R9), SI - CMPL SI, $0x3c + MOVL SI, R8 + MOVL SI, 12(SP) + LEAQ (DX)(BX*1), R9 + SUBL BX, R8 + LEAL -1(R8), BX + CMPL BX, $0x3c JLT one_byte_match_emit_repeat_encodeBetterBlockAsm - CMPL SI, $0x00000100 + 
CMPL BX, $0x00000100 JLT two_bytes_match_emit_repeat_encodeBetterBlockAsm - CMPL SI, $0x00010000 + CMPL BX, $0x00010000 JLT three_bytes_match_emit_repeat_encodeBetterBlockAsm - CMPL SI, $0x01000000 + CMPL BX, $0x01000000 JLT four_bytes_match_emit_repeat_encodeBetterBlockAsm MOVB $0xfc, (AX) - MOVL SI, 1(AX) + MOVL BX, 1(AX) ADDQ $0x05, AX JMP memmove_long_match_emit_repeat_encodeBetterBlockAsm four_bytes_match_emit_repeat_encodeBetterBlockAsm: - MOVL SI, R11 - SHRL $0x10, R11 + MOVL BX, R10 + SHRL $0x10, R10 MOVB $0xf8, (AX) - MOVW SI, 1(AX) - MOVB R11, 3(AX) + MOVW BX, 1(AX) + MOVB R10, 3(AX) ADDQ $0x04, AX JMP memmove_long_match_emit_repeat_encodeBetterBlockAsm three_bytes_match_emit_repeat_encodeBetterBlockAsm: MOVB $0xf4, (AX) - MOVW SI, 1(AX) + MOVW BX, 1(AX) ADDQ $0x03, AX JMP memmove_long_match_emit_repeat_encodeBetterBlockAsm two_bytes_match_emit_repeat_encodeBetterBlockAsm: MOVB $0xf0, (AX) - MOVB SI, 1(AX) + MOVB BL, 1(AX) ADDQ $0x02, AX - CMPL SI, $0x40 + CMPL BX, $0x40 JL memmove_match_emit_repeat_encodeBetterBlockAsm JMP memmove_long_match_emit_repeat_encodeBetterBlockAsm one_byte_match_emit_repeat_encodeBetterBlockAsm: - SHLB $0x02, SI - MOVB SI, (AX) + SHLB $0x02, BL + MOVB BL, (AX) ADDQ $0x01, AX memmove_match_emit_repeat_encodeBetterBlockAsm: - LEAQ (AX)(R9*1), SI + LEAQ (AX)(R8*1), BX // genMemMoveShort - CMPQ R9, $0x04 + CMPQ R8, $0x04 JLE emit_lit_memmove_match_emit_repeat_encodeBetterBlockAsm_memmove_move_4 - CMPQ R9, $0x08 + CMPQ R8, $0x08 JB emit_lit_memmove_match_emit_repeat_encodeBetterBlockAsm_memmove_move_4through7 - CMPQ R9, $0x10 + CMPQ R8, $0x10 JBE emit_lit_memmove_match_emit_repeat_encodeBetterBlockAsm_memmove_move_8through16 - CMPQ R9, $0x20 + CMPQ R8, $0x20 JBE emit_lit_memmove_match_emit_repeat_encodeBetterBlockAsm_memmove_move_17through32 JMP emit_lit_memmove_match_emit_repeat_encodeBetterBlockAsm_memmove_move_33through64 emit_lit_memmove_match_emit_repeat_encodeBetterBlockAsm_memmove_move_4: - MOVL (R10), R11 - MOVL R11, (AX) + MOVL (R9), R10 + MOVL R10, (AX) JMP memmove_end_copy_match_emit_repeat_encodeBetterBlockAsm emit_lit_memmove_match_emit_repeat_encodeBetterBlockAsm_memmove_move_4through7: - MOVL (R10), R11 - MOVL -4(R10)(R9*1), R10 - MOVL R11, (AX) - MOVL R10, -4(AX)(R9*1) + MOVL (R9), R10 + MOVL -4(R9)(R8*1), R9 + MOVL R10, (AX) + MOVL R9, -4(AX)(R8*1) JMP memmove_end_copy_match_emit_repeat_encodeBetterBlockAsm emit_lit_memmove_match_emit_repeat_encodeBetterBlockAsm_memmove_move_8through16: - MOVQ (R10), R11 - MOVQ -8(R10)(R9*1), R10 - MOVQ R11, (AX) - MOVQ R10, -8(AX)(R9*1) + MOVQ (R9), R10 + MOVQ -8(R9)(R8*1), R9 + MOVQ R10, (AX) + MOVQ R9, -8(AX)(R8*1) JMP memmove_end_copy_match_emit_repeat_encodeBetterBlockAsm emit_lit_memmove_match_emit_repeat_encodeBetterBlockAsm_memmove_move_17through32: - MOVOU (R10), X0 - MOVOU -16(R10)(R9*1), X1 + MOVOU (R9), X0 + MOVOU -16(R9)(R8*1), X1 MOVOU X0, (AX) - MOVOU X1, -16(AX)(R9*1) + MOVOU X1, -16(AX)(R8*1) JMP memmove_end_copy_match_emit_repeat_encodeBetterBlockAsm emit_lit_memmove_match_emit_repeat_encodeBetterBlockAsm_memmove_move_33through64: - MOVOU (R10), X0 - MOVOU 16(R10), X1 - MOVOU -32(R10)(R9*1), X2 - MOVOU -16(R10)(R9*1), X3 + MOVOU (R9), X0 + MOVOU 16(R9), X1 + MOVOU -32(R9)(R8*1), X2 + MOVOU -16(R9)(R8*1), X3 MOVOU X0, (AX) MOVOU X1, 16(AX) - MOVOU X2, -32(AX)(R9*1) - MOVOU X3, -16(AX)(R9*1) - + MOVOU X2, -32(AX)(R8*1) + MOVOU X3, -16(AX)(R8*1) + memmove_end_copy_match_emit_repeat_encodeBetterBlockAsm: - MOVQ SI, AX + MOVQ BX, AX JMP emit_literal_done_match_emit_repeat_encodeBetterBlockAsm 
memmove_long_match_emit_repeat_encodeBetterBlockAsm: - LEAQ (AX)(R9*1), SI + LEAQ (AX)(R8*1), BX // genMemMoveLong - MOVOU (R10), X0 - MOVOU 16(R10), X1 - MOVOU -32(R10)(R9*1), X2 - MOVOU -16(R10)(R9*1), X3 - MOVQ R9, R13 - SHRQ $0x05, R13 - MOVQ AX, R11 - ANDL $0x0000001f, R11 - MOVQ $0x00000040, R14 - SUBQ R11, R14 - DECQ R13 + MOVOU (R9), X0 + MOVOU 16(R9), X1 + MOVOU -32(R9)(R8*1), X2 + MOVOU -16(R9)(R8*1), X3 + MOVQ R8, R12 + SHRQ $0x05, R12 + MOVQ AX, R10 + ANDL $0x0000001f, R10 + MOVQ $0x00000040, R13 + SUBQ R10, R13 + DECQ R12 JA emit_lit_memmove_long_match_emit_repeat_encodeBetterBlockAsmlarge_forward_sse_loop_32 - LEAQ -32(R10)(R14*1), R11 - LEAQ -32(AX)(R14*1), R15 + LEAQ -32(R9)(R13*1), R10 + LEAQ -32(AX)(R13*1), R14 emit_lit_memmove_long_match_emit_repeat_encodeBetterBlockAsmlarge_big_loop_back: - MOVOU (R11), X4 - MOVOU 16(R11), X5 - MOVOA X4, (R15) - MOVOA X5, 16(R15) - ADDQ $0x20, R15 - ADDQ $0x20, R11 + MOVOU (R10), X4 + MOVOU 16(R10), X5 + MOVOA X4, (R14) + MOVOA X5, 16(R14) ADDQ $0x20, R14 - DECQ R13 + ADDQ $0x20, R10 + ADDQ $0x20, R13 + DECQ R12 JNA emit_lit_memmove_long_match_emit_repeat_encodeBetterBlockAsmlarge_big_loop_back emit_lit_memmove_long_match_emit_repeat_encodeBetterBlockAsmlarge_forward_sse_loop_32: - MOVOU -32(R10)(R14*1), X4 - MOVOU -16(R10)(R14*1), X5 - MOVOA X4, -32(AX)(R14*1) - MOVOA X5, -16(AX)(R14*1) - ADDQ $0x20, R14 - CMPQ R9, R14 + MOVOU -32(R9)(R13*1), X4 + MOVOU -16(R9)(R13*1), X5 + MOVOA X4, -32(AX)(R13*1) + MOVOA X5, -16(AX)(R13*1) + ADDQ $0x20, R13 + CMPQ R8, R13 JAE emit_lit_memmove_long_match_emit_repeat_encodeBetterBlockAsmlarge_forward_sse_loop_32 MOVOU X0, (AX) MOVOU X1, 16(AX) - MOVOU X2, -32(AX)(R9*1) - MOVOU X3, -16(AX)(R9*1) - MOVQ SI, AX + MOVOU X2, -32(AX)(R8*1) + MOVOU X3, -16(AX)(R8*1) + MOVQ BX, AX emit_literal_done_match_emit_repeat_encodeBetterBlockAsm: - ADDL R12, CX - ADDL $0x04, R12 + ADDL R11, CX + ADDL $0x04, R11 MOVL CX, 12(SP) // emitRepeat emit_repeat_again_match_nolit_repeat_encodeBetterBlockAsm: - MOVL R12, SI - LEAL -4(R12), R12 - CMPL SI, $0x08 + MOVL R11, BX + LEAL -4(R11), R11 + CMPL BX, $0x08 JLE repeat_two_match_nolit_repeat_encodeBetterBlockAsm - CMPL SI, $0x0c + CMPL BX, $0x0c JGE cant_repeat_two_offset_match_nolit_repeat_encodeBetterBlockAsm - CMPL R8, $0x00000800 + CMPL DI, $0x00000800 JLT repeat_two_offset_match_nolit_repeat_encodeBetterBlockAsm cant_repeat_two_offset_match_nolit_repeat_encodeBetterBlockAsm: - CMPL R12, $0x00000104 + CMPL R11, $0x00000104 JLT repeat_three_match_nolit_repeat_encodeBetterBlockAsm - CMPL R12, $0x00010100 + CMPL R11, $0x00010100 JLT repeat_four_match_nolit_repeat_encodeBetterBlockAsm - CMPL R12, $0x0100ffff + CMPL R11, $0x0100ffff JLT repeat_five_match_nolit_repeat_encodeBetterBlockAsm - LEAL -16842747(R12), R12 - MOVW $0x001d, (AX) - MOVW $0xfffb, 2(AX) + LEAL -16842747(R11), R11 + MOVL $0xfffb001d, (AX) MOVB $0xff, 4(AX) ADDQ $0x05, AX JMP emit_repeat_again_match_nolit_repeat_encodeBetterBlockAsm repeat_five_match_nolit_repeat_encodeBetterBlockAsm: - LEAL -65536(R12), R12 - MOVL R12, R8 + LEAL -65536(R11), R11 + MOVL R11, DI MOVW $0x001d, (AX) - MOVW R12, 2(AX) - SARL $0x10, R8 - MOVB R8, 4(AX) + MOVW R11, 2(AX) + SARL $0x10, DI + MOVB DI, 4(AX) ADDQ $0x05, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm repeat_four_match_nolit_repeat_encodeBetterBlockAsm: - LEAL -256(R12), R12 + LEAL -256(R11), R11 MOVW $0x0019, (AX) - MOVW R12, 2(AX) + MOVW R11, 2(AX) ADDQ $0x04, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm 
repeat_three_match_nolit_repeat_encodeBetterBlockAsm: - LEAL -4(R12), R12 + LEAL -4(R11), R11 MOVW $0x0015, (AX) - MOVB R12, 2(AX) + MOVB R11, 2(AX) ADDQ $0x03, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm repeat_two_match_nolit_repeat_encodeBetterBlockAsm: - SHLL $0x02, R12 - ORL $0x01, R12 - MOVW R12, (AX) + SHLL $0x02, R11 + ORL $0x01, R11 + MOVW R11, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm repeat_two_offset_match_nolit_repeat_encodeBetterBlockAsm: - XORQ SI, SI - LEAL 1(SI)(R12*4), R12 - MOVB R8, 1(AX) - SARL $0x08, R8 - SHLL $0x05, R8 - ORL R8, R12 - MOVB R12, (AX) + XORQ BX, BX + LEAL 1(BX)(R11*4), R11 + MOVB DI, 1(AX) + SARL $0x08, DI + SHLL $0x05, DI + ORL DI, R11 + MOVB R11, (AX) ADDQ $0x02, AX match_nolit_emitcopy_end_encodeBetterBlockAsm: @@ -6597,50 +6554,50 @@ match_nolit_emitcopy_end_encodeBetterBlockAsm: RET match_nolit_dst_ok_encodeBetterBlockAsm: - MOVQ $0x00cf1bbcdcbfa563, SI - MOVQ $0x9e3779b1, R8 - LEAQ 1(DI), DI - LEAQ -2(CX), R9 - MOVQ (DX)(DI*1), R10 - MOVQ 1(DX)(DI*1), R11 - MOVQ (DX)(R9*1), R12 - MOVQ 1(DX)(R9*1), R13 - SHLQ $0x08, R10 - IMULQ SI, R10 - SHRQ $0x2f, R10 - SHLQ $0x20, R11 - IMULQ R8, R11 - SHRQ $0x32, R11 - SHLQ $0x08, R12 - IMULQ SI, R12 - SHRQ $0x2f, R12 - SHLQ $0x20, R13 - IMULQ R8, R13 - SHRQ $0x32, R13 - LEAQ 1(DI), R8 - LEAQ 1(R9), R14 - MOVL DI, 24(SP)(R10*4) - MOVL R9, 24(SP)(R12*4) - MOVL R8, 524312(SP)(R11*4) - MOVL R14, 524312(SP)(R13*4) - ADDQ $0x01, DI - SUBQ $0x01, R9 + MOVQ $0x00cf1bbcdcbfa563, BX + MOVQ $0x9e3779b1, DI + LEAQ 1(SI), SI + LEAQ -2(CX), R8 + MOVQ (DX)(SI*1), R9 + MOVQ 1(DX)(SI*1), R10 + MOVQ (DX)(R8*1), R11 + MOVQ 1(DX)(R8*1), R12 + SHLQ $0x08, R9 + IMULQ BX, R9 + SHRQ $0x2f, R9 + SHLQ $0x20, R10 + IMULQ DI, R10 + SHRQ $0x32, R10 + SHLQ $0x08, R11 + IMULQ BX, R11 + SHRQ $0x2f, R11 + SHLQ $0x20, R12 + IMULQ DI, R12 + SHRQ $0x32, R12 + LEAQ 1(SI), DI + LEAQ 1(R8), R13 + MOVL SI, 24(SP)(R9*4) + MOVL R8, 24(SP)(R11*4) + MOVL DI, 524312(SP)(R10*4) + MOVL R13, 524312(SP)(R12*4) + ADDQ $0x01, SI + SUBQ $0x01, R8 index_loop_encodeBetterBlockAsm: - CMPQ DI, R9 + CMPQ SI, R8 JAE search_loop_encodeBetterBlockAsm - MOVQ (DX)(DI*1), R8 - MOVQ (DX)(R9*1), R10 - SHLQ $0x08, R8 - IMULQ SI, R8 - SHRQ $0x2f, R8 - SHLQ $0x08, R10 - IMULQ SI, R10 - SHRQ $0x2f, R10 - MOVL DI, 24(SP)(R8*4) - MOVL R9, 24(SP)(R10*4) - ADDQ $0x02, DI - SUBQ $0x02, R9 + MOVQ (DX)(SI*1), DI + MOVQ (DX)(R8*1), R9 + SHLQ $0x08, DI + IMULQ BX, DI + SHRQ $0x2f, DI + SHLQ $0x08, R9 + IMULQ BX, R9 + SHRQ $0x2f, R9 + MOVL SI, 24(SP)(DI*4) + MOVL R8, 24(SP)(R9*4) + ADDQ $0x02, SI + SUBQ $0x02, R8 JMP index_loop_encodeBetterBlockAsm emit_remainder_encodeBetterBlockAsm: @@ -6842,8 +6799,8 @@ zero_loop_encodeBetterBlockAsm4MB: MOVL $0x00000000, 12(SP) MOVQ src_len+32(FP), CX LEAQ -6(CX), DX - LEAQ -8(CX), SI - MOVL SI, 8(SP) + LEAQ -8(CX), BX + MOVL BX, 8(SP) SHRQ $0x05, CX SUBL CX, DX LEAQ (AX)(DX*1), DX @@ -6853,756 +6810,752 @@ zero_loop_encodeBetterBlockAsm4MB: MOVQ src_base+24(FP), DX search_loop_encodeBetterBlockAsm4MB: - MOVL CX, SI - SUBL 12(SP), SI - SHRL $0x07, SI - CMPL SI, $0x63 + MOVL CX, BX + SUBL 12(SP), BX + SHRL $0x07, BX + CMPL BX, $0x63 JLE check_maxskip_ok_encodeBetterBlockAsm4MB - LEAL 100(CX), SI + LEAL 100(CX), BX JMP check_maxskip_cont_encodeBetterBlockAsm4MB check_maxskip_ok_encodeBetterBlockAsm4MB: - LEAL 1(CX)(SI*1), SI + LEAL 1(CX)(BX*1), BX check_maxskip_cont_encodeBetterBlockAsm4MB: - CMPL SI, 8(SP) + CMPL BX, 8(SP) JGE emit_remainder_encodeBetterBlockAsm4MB - MOVQ (DX)(CX*1), DI - MOVL SI, 20(SP) - MOVQ 
$0x00cf1bbcdcbfa563, R9 - MOVQ $0x9e3779b1, SI - MOVQ DI, R10 - MOVQ DI, R11 - SHLQ $0x08, R10 - IMULQ R9, R10 - SHRQ $0x2f, R10 - SHLQ $0x20, R11 - IMULQ SI, R11 - SHRQ $0x32, R11 - MOVL 24(SP)(R10*4), SI - MOVL 524312(SP)(R11*4), R8 - MOVL CX, 24(SP)(R10*4) - MOVL CX, 524312(SP)(R11*4) - MOVQ (DX)(SI*1), R10 - MOVQ (DX)(R8*1), R11 - CMPQ R10, DI + MOVQ (DX)(CX*1), SI + MOVL BX, 20(SP) + MOVQ $0x00cf1bbcdcbfa563, R8 + MOVQ $0x9e3779b1, BX + MOVQ SI, R9 + MOVQ SI, R10 + SHLQ $0x08, R9 + IMULQ R8, R9 + SHRQ $0x2f, R9 + SHLQ $0x20, R10 + IMULQ BX, R10 + SHRQ $0x32, R10 + MOVL 24(SP)(R9*4), BX + MOVL 524312(SP)(R10*4), DI + MOVL CX, 24(SP)(R9*4) + MOVL CX, 524312(SP)(R10*4) + MOVQ (DX)(BX*1), R9 + MOVQ (DX)(DI*1), R10 + CMPQ R9, SI JEQ candidate_match_encodeBetterBlockAsm4MB - CMPQ R11, DI + CMPQ R10, SI JNE no_short_found_encodeBetterBlockAsm4MB - MOVL R8, SI + MOVL DI, BX JMP candidate_match_encodeBetterBlockAsm4MB no_short_found_encodeBetterBlockAsm4MB: - CMPL R10, DI + CMPL R9, SI JEQ candidate_match_encodeBetterBlockAsm4MB - CMPL R11, DI + CMPL R10, SI JEQ candidateS_match_encodeBetterBlockAsm4MB MOVL 20(SP), CX JMP search_loop_encodeBetterBlockAsm4MB candidateS_match_encodeBetterBlockAsm4MB: - SHRQ $0x08, DI - MOVQ DI, R10 - SHLQ $0x08, R10 - IMULQ R9, R10 - SHRQ $0x2f, R10 - MOVL 24(SP)(R10*4), SI + SHRQ $0x08, SI + MOVQ SI, R9 + SHLQ $0x08, R9 + IMULQ R8, R9 + SHRQ $0x2f, R9 + MOVL 24(SP)(R9*4), BX INCL CX - MOVL CX, 24(SP)(R10*4) - CMPL (DX)(SI*1), DI + MOVL CX, 24(SP)(R9*4) + CMPL (DX)(BX*1), SI JEQ candidate_match_encodeBetterBlockAsm4MB DECL CX - MOVL R8, SI + MOVL DI, BX candidate_match_encodeBetterBlockAsm4MB: - MOVL 12(SP), DI - TESTL SI, SI + MOVL 12(SP), SI + TESTL BX, BX JZ match_extend_back_end_encodeBetterBlockAsm4MB match_extend_back_loop_encodeBetterBlockAsm4MB: - CMPL CX, DI + CMPL CX, SI JLE match_extend_back_end_encodeBetterBlockAsm4MB - MOVB -1(DX)(SI*1), BL + MOVB -1(DX)(BX*1), DI MOVB -1(DX)(CX*1), R8 - CMPB BL, R8 + CMPB DI, R8 JNE match_extend_back_end_encodeBetterBlockAsm4MB LEAL -1(CX), CX - DECL SI + DECL BX JZ match_extend_back_end_encodeBetterBlockAsm4MB JMP match_extend_back_loop_encodeBetterBlockAsm4MB match_extend_back_end_encodeBetterBlockAsm4MB: - MOVL CX, DI - SUBL 12(SP), DI - LEAQ 4(AX)(DI*1), DI - CMPQ DI, (SP) + MOVL CX, SI + SUBL 12(SP), SI + LEAQ 4(AX)(SI*1), SI + CMPQ SI, (SP) JL match_dst_size_check_encodeBetterBlockAsm4MB MOVQ $0x00000000, ret+48(FP) RET match_dst_size_check_encodeBetterBlockAsm4MB: - MOVL CX, DI + MOVL CX, SI ADDL $0x04, CX - ADDL $0x04, SI - MOVQ src_len+32(FP), R8 - SUBL CX, R8 - LEAQ (DX)(CX*1), R9 - LEAQ (DX)(SI*1), R10 + ADDL $0x04, BX + MOVQ src_len+32(FP), DI + SUBL CX, DI + LEAQ (DX)(CX*1), R8 + LEAQ (DX)(BX*1), R9 // matchLen - XORL R12, R12 - CMPL R8, $0x08 + XORL R11, R11 + CMPL DI, $0x08 JL matchlen_match4_match_nolit_encodeBetterBlockAsm4MB matchlen_loopback_match_nolit_encodeBetterBlockAsm4MB: - MOVQ (R9)(R12*1), R11 - XORQ (R10)(R12*1), R11 - TESTQ R11, R11 + MOVQ (R8)(R11*1), R10 + XORQ (R9)(R11*1), R10 + TESTQ R10, R10 JZ matchlen_loop_match_nolit_encodeBetterBlockAsm4MB #ifdef GOAMD64_v3 - TZCNTQ R11, R11 + TZCNTQ R10, R10 #else - BSFQ R11, R11 + BSFQ R10, R10 #endif - SARQ $0x03, R11 - LEAL (R12)(R11*1), R12 + SARQ $0x03, R10 + LEAL (R11)(R10*1), R11 JMP match_nolit_end_encodeBetterBlockAsm4MB matchlen_loop_match_nolit_encodeBetterBlockAsm4MB: - LEAL -8(R8), R8 - LEAL 8(R12), R12 - CMPL R8, $0x08 + LEAL -8(DI), DI + LEAL 8(R11), R11 + CMPL DI, $0x08 JGE matchlen_loopback_match_nolit_encodeBetterBlockAsm4MB 
JZ match_nolit_end_encodeBetterBlockAsm4MB matchlen_match4_match_nolit_encodeBetterBlockAsm4MB: - CMPL R8, $0x04 + CMPL DI, $0x04 JL matchlen_match2_match_nolit_encodeBetterBlockAsm4MB - MOVL (R9)(R12*1), R11 - CMPL (R10)(R12*1), R11 + MOVL (R8)(R11*1), R10 + CMPL (R9)(R11*1), R10 JNE matchlen_match2_match_nolit_encodeBetterBlockAsm4MB - SUBL $0x04, R8 - LEAL 4(R12), R12 + SUBL $0x04, DI + LEAL 4(R11), R11 matchlen_match2_match_nolit_encodeBetterBlockAsm4MB: - CMPL R8, $0x02 + CMPL DI, $0x02 JL matchlen_match1_match_nolit_encodeBetterBlockAsm4MB - MOVW (R9)(R12*1), R11 - CMPW (R10)(R12*1), R11 + MOVW (R8)(R11*1), R10 + CMPW (R9)(R11*1), R10 JNE matchlen_match1_match_nolit_encodeBetterBlockAsm4MB - SUBL $0x02, R8 - LEAL 2(R12), R12 + SUBL $0x02, DI + LEAL 2(R11), R11 matchlen_match1_match_nolit_encodeBetterBlockAsm4MB: - CMPL R8, $0x01 + CMPL DI, $0x01 JL match_nolit_end_encodeBetterBlockAsm4MB - MOVB (R9)(R12*1), R11 - CMPB (R10)(R12*1), R11 + MOVB (R8)(R11*1), R10 + CMPB (R9)(R11*1), R10 JNE match_nolit_end_encodeBetterBlockAsm4MB - LEAL 1(R12), R12 + LEAL 1(R11), R11 match_nolit_end_encodeBetterBlockAsm4MB: - MOVL CX, R8 - SUBL SI, R8 + MOVL CX, DI + SUBL BX, DI // Check if repeat - CMPL 16(SP), R8 + CMPL 16(SP), DI JEQ match_is_repeat_encodeBetterBlockAsm4MB - CMPL R12, $0x01 + CMPL R11, $0x01 JG match_length_ok_encodeBetterBlockAsm4MB - CMPL R8, $0x0000ffff + CMPL DI, $0x0000ffff JLE match_length_ok_encodeBetterBlockAsm4MB MOVL 20(SP), CX INCL CX JMP search_loop_encodeBetterBlockAsm4MB match_length_ok_encodeBetterBlockAsm4MB: - MOVL R8, 16(SP) - MOVL 12(SP), SI - CMPL SI, DI + MOVL DI, 16(SP) + MOVL 12(SP), BX + CMPL BX, SI JEQ emit_literal_done_match_emit_encodeBetterBlockAsm4MB - MOVL DI, R9 - MOVL DI, 12(SP) - LEAQ (DX)(SI*1), R10 - SUBL SI, R9 - LEAL -1(R9), SI - CMPL SI, $0x3c + MOVL SI, R8 + MOVL SI, 12(SP) + LEAQ (DX)(BX*1), R9 + SUBL BX, R8 + LEAL -1(R8), BX + CMPL BX, $0x3c JLT one_byte_match_emit_encodeBetterBlockAsm4MB - CMPL SI, $0x00000100 + CMPL BX, $0x00000100 JLT two_bytes_match_emit_encodeBetterBlockAsm4MB - CMPL SI, $0x00010000 + CMPL BX, $0x00010000 JLT three_bytes_match_emit_encodeBetterBlockAsm4MB - MOVL SI, R11 - SHRL $0x10, R11 + MOVL BX, R10 + SHRL $0x10, R10 MOVB $0xf8, (AX) - MOVW SI, 1(AX) - MOVB R11, 3(AX) + MOVW BX, 1(AX) + MOVB R10, 3(AX) ADDQ $0x04, AX JMP memmove_long_match_emit_encodeBetterBlockAsm4MB three_bytes_match_emit_encodeBetterBlockAsm4MB: MOVB $0xf4, (AX) - MOVW SI, 1(AX) + MOVW BX, 1(AX) ADDQ $0x03, AX JMP memmove_long_match_emit_encodeBetterBlockAsm4MB two_bytes_match_emit_encodeBetterBlockAsm4MB: MOVB $0xf0, (AX) - MOVB SI, 1(AX) + MOVB BL, 1(AX) ADDQ $0x02, AX - CMPL SI, $0x40 + CMPL BX, $0x40 JL memmove_match_emit_encodeBetterBlockAsm4MB JMP memmove_long_match_emit_encodeBetterBlockAsm4MB one_byte_match_emit_encodeBetterBlockAsm4MB: - SHLB $0x02, SI - MOVB SI, (AX) + SHLB $0x02, BL + MOVB BL, (AX) ADDQ $0x01, AX memmove_match_emit_encodeBetterBlockAsm4MB: - LEAQ (AX)(R9*1), SI + LEAQ (AX)(R8*1), BX // genMemMoveShort - CMPQ R9, $0x04 + CMPQ R8, $0x04 JLE emit_lit_memmove_match_emit_encodeBetterBlockAsm4MB_memmove_move_4 - CMPQ R9, $0x08 + CMPQ R8, $0x08 JB emit_lit_memmove_match_emit_encodeBetterBlockAsm4MB_memmove_move_4through7 - CMPQ R9, $0x10 + CMPQ R8, $0x10 JBE emit_lit_memmove_match_emit_encodeBetterBlockAsm4MB_memmove_move_8through16 - CMPQ R9, $0x20 + CMPQ R8, $0x20 JBE emit_lit_memmove_match_emit_encodeBetterBlockAsm4MB_memmove_move_17through32 JMP emit_lit_memmove_match_emit_encodeBetterBlockAsm4MB_memmove_move_33through64 
emit_lit_memmove_match_emit_encodeBetterBlockAsm4MB_memmove_move_4: - MOVL (R10), R11 - MOVL R11, (AX) + MOVL (R9), R10 + MOVL R10, (AX) JMP memmove_end_copy_match_emit_encodeBetterBlockAsm4MB emit_lit_memmove_match_emit_encodeBetterBlockAsm4MB_memmove_move_4through7: - MOVL (R10), R11 - MOVL -4(R10)(R9*1), R10 - MOVL R11, (AX) - MOVL R10, -4(AX)(R9*1) + MOVL (R9), R10 + MOVL -4(R9)(R8*1), R9 + MOVL R10, (AX) + MOVL R9, -4(AX)(R8*1) JMP memmove_end_copy_match_emit_encodeBetterBlockAsm4MB emit_lit_memmove_match_emit_encodeBetterBlockAsm4MB_memmove_move_8through16: - MOVQ (R10), R11 - MOVQ -8(R10)(R9*1), R10 - MOVQ R11, (AX) - MOVQ R10, -8(AX)(R9*1) + MOVQ (R9), R10 + MOVQ -8(R9)(R8*1), R9 + MOVQ R10, (AX) + MOVQ R9, -8(AX)(R8*1) JMP memmove_end_copy_match_emit_encodeBetterBlockAsm4MB emit_lit_memmove_match_emit_encodeBetterBlockAsm4MB_memmove_move_17through32: - MOVOU (R10), X0 - MOVOU -16(R10)(R9*1), X1 + MOVOU (R9), X0 + MOVOU -16(R9)(R8*1), X1 MOVOU X0, (AX) - MOVOU X1, -16(AX)(R9*1) + MOVOU X1, -16(AX)(R8*1) JMP memmove_end_copy_match_emit_encodeBetterBlockAsm4MB emit_lit_memmove_match_emit_encodeBetterBlockAsm4MB_memmove_move_33through64: - MOVOU (R10), X0 - MOVOU 16(R10), X1 - MOVOU -32(R10)(R9*1), X2 - MOVOU -16(R10)(R9*1), X3 + MOVOU (R9), X0 + MOVOU 16(R9), X1 + MOVOU -32(R9)(R8*1), X2 + MOVOU -16(R9)(R8*1), X3 MOVOU X0, (AX) MOVOU X1, 16(AX) - MOVOU X2, -32(AX)(R9*1) - MOVOU X3, -16(AX)(R9*1) + MOVOU X2, -32(AX)(R8*1) + MOVOU X3, -16(AX)(R8*1) memmove_end_copy_match_emit_encodeBetterBlockAsm4MB: - MOVQ SI, AX + MOVQ BX, AX JMP emit_literal_done_match_emit_encodeBetterBlockAsm4MB memmove_long_match_emit_encodeBetterBlockAsm4MB: - LEAQ (AX)(R9*1), SI + LEAQ (AX)(R8*1), BX // genMemMoveLong - MOVOU (R10), X0 - MOVOU 16(R10), X1 - MOVOU -32(R10)(R9*1), X2 - MOVOU -16(R10)(R9*1), X3 - MOVQ R9, R13 - SHRQ $0x05, R13 - MOVQ AX, R11 - ANDL $0x0000001f, R11 - MOVQ $0x00000040, R14 - SUBQ R11, R14 - DECQ R13 + MOVOU (R9), X0 + MOVOU 16(R9), X1 + MOVOU -32(R9)(R8*1), X2 + MOVOU -16(R9)(R8*1), X3 + MOVQ R8, R12 + SHRQ $0x05, R12 + MOVQ AX, R10 + ANDL $0x0000001f, R10 + MOVQ $0x00000040, R13 + SUBQ R10, R13 + DECQ R12 JA emit_lit_memmove_long_match_emit_encodeBetterBlockAsm4MBlarge_forward_sse_loop_32 - LEAQ -32(R10)(R14*1), R11 - LEAQ -32(AX)(R14*1), R15 + LEAQ -32(R9)(R13*1), R10 + LEAQ -32(AX)(R13*1), R14 emit_lit_memmove_long_match_emit_encodeBetterBlockAsm4MBlarge_big_loop_back: - MOVOU (R11), X4 - MOVOU 16(R11), X5 - MOVOA X4, (R15) - MOVOA X5, 16(R15) - ADDQ $0x20, R15 - ADDQ $0x20, R11 + MOVOU (R10), X4 + MOVOU 16(R10), X5 + MOVOA X4, (R14) + MOVOA X5, 16(R14) ADDQ $0x20, R14 - DECQ R13 + ADDQ $0x20, R10 + ADDQ $0x20, R13 + DECQ R12 JNA emit_lit_memmove_long_match_emit_encodeBetterBlockAsm4MBlarge_big_loop_back emit_lit_memmove_long_match_emit_encodeBetterBlockAsm4MBlarge_forward_sse_loop_32: - MOVOU -32(R10)(R14*1), X4 - MOVOU -16(R10)(R14*1), X5 - MOVOA X4, -32(AX)(R14*1) - MOVOA X5, -16(AX)(R14*1) - ADDQ $0x20, R14 - CMPQ R9, R14 + MOVOU -32(R9)(R13*1), X4 + MOVOU -16(R9)(R13*1), X5 + MOVOA X4, -32(AX)(R13*1) + MOVOA X5, -16(AX)(R13*1) + ADDQ $0x20, R13 + CMPQ R8, R13 JAE emit_lit_memmove_long_match_emit_encodeBetterBlockAsm4MBlarge_forward_sse_loop_32 MOVOU X0, (AX) MOVOU X1, 16(AX) - MOVOU X2, -32(AX)(R9*1) - MOVOU X3, -16(AX)(R9*1) - MOVQ SI, AX + MOVOU X2, -32(AX)(R8*1) + MOVOU X3, -16(AX)(R8*1) + MOVQ BX, AX emit_literal_done_match_emit_encodeBetterBlockAsm4MB: - ADDL R12, CX - ADDL $0x04, R12 + ADDL R11, CX + ADDL $0x04, R11 MOVL CX, 12(SP) // emitCopy - CMPL R8, $0x00010000 + 
CMPL DI, $0x00010000 JL two_byte_offset_match_nolit_encodeBetterBlockAsm4MB - -four_bytes_loop_back_match_nolit_encodeBetterBlockAsm4MB: - CMPL R12, $0x40 + CMPL R11, $0x40 JLE four_bytes_remain_match_nolit_encodeBetterBlockAsm4MB MOVB $0xff, (AX) - MOVL R8, 1(AX) - LEAL -64(R12), R12 + MOVL DI, 1(AX) + LEAL -64(R11), R11 ADDQ $0x05, AX - CMPL R12, $0x04 + CMPL R11, $0x04 JL four_bytes_remain_match_nolit_encodeBetterBlockAsm4MB // emitRepeat - MOVL R12, SI - LEAL -4(R12), R12 - CMPL SI, $0x08 + MOVL R11, BX + LEAL -4(R11), R11 + CMPL BX, $0x08 JLE repeat_two_match_nolit_encodeBetterBlockAsm4MB_emit_copy - CMPL SI, $0x0c + CMPL BX, $0x0c JGE cant_repeat_two_offset_match_nolit_encodeBetterBlockAsm4MB_emit_copy - CMPL R8, $0x00000800 + CMPL DI, $0x00000800 JLT repeat_two_offset_match_nolit_encodeBetterBlockAsm4MB_emit_copy cant_repeat_two_offset_match_nolit_encodeBetterBlockAsm4MB_emit_copy: - CMPL R12, $0x00000104 + CMPL R11, $0x00000104 JLT repeat_three_match_nolit_encodeBetterBlockAsm4MB_emit_copy - CMPL R12, $0x00010100 + CMPL R11, $0x00010100 JLT repeat_four_match_nolit_encodeBetterBlockAsm4MB_emit_copy - LEAL -65536(R12), R12 - MOVL R12, R8 + LEAL -65536(R11), R11 + MOVL R11, DI MOVW $0x001d, (AX) - MOVW R12, 2(AX) - SARL $0x10, R8 - MOVB R8, 4(AX) + MOVW R11, 2(AX) + SARL $0x10, DI + MOVB DI, 4(AX) ADDQ $0x05, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm4MB repeat_four_match_nolit_encodeBetterBlockAsm4MB_emit_copy: - LEAL -256(R12), R12 + LEAL -256(R11), R11 MOVW $0x0019, (AX) - MOVW R12, 2(AX) + MOVW R11, 2(AX) ADDQ $0x04, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm4MB repeat_three_match_nolit_encodeBetterBlockAsm4MB_emit_copy: - LEAL -4(R12), R12 + LEAL -4(R11), R11 MOVW $0x0015, (AX) - MOVB R12, 2(AX) + MOVB R11, 2(AX) ADDQ $0x03, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm4MB repeat_two_match_nolit_encodeBetterBlockAsm4MB_emit_copy: - SHLL $0x02, R12 - ORL $0x01, R12 - MOVW R12, (AX) + SHLL $0x02, R11 + ORL $0x01, R11 + MOVW R11, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm4MB repeat_two_offset_match_nolit_encodeBetterBlockAsm4MB_emit_copy: - XORQ SI, SI - LEAL 1(SI)(R12*4), R12 - MOVB R8, 1(AX) - SARL $0x08, R8 - SHLL $0x05, R8 - ORL R8, R12 - MOVB R12, (AX) + XORQ BX, BX + LEAL 1(BX)(R11*4), R11 + MOVB DI, 1(AX) + SARL $0x08, DI + SHLL $0x05, DI + ORL DI, R11 + MOVB R11, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm4MB - JMP four_bytes_loop_back_match_nolit_encodeBetterBlockAsm4MB four_bytes_remain_match_nolit_encodeBetterBlockAsm4MB: - TESTL R12, R12 + TESTL R11, R11 JZ match_nolit_emitcopy_end_encodeBetterBlockAsm4MB - MOVB $0x03, BL - LEAL -4(BX)(R12*4), R12 - MOVB R12, (AX) - MOVL R8, 1(AX) + XORL BX, BX + LEAL -1(BX)(R11*4), R11 + MOVB R11, (AX) + MOVL DI, 1(AX) ADDQ $0x05, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm4MB two_byte_offset_match_nolit_encodeBetterBlockAsm4MB: - CMPL R12, $0x40 + CMPL R11, $0x40 JLE two_byte_offset_short_match_nolit_encodeBetterBlockAsm4MB - CMPL R8, $0x00000800 + CMPL DI, $0x00000800 JAE long_offset_short_match_nolit_encodeBetterBlockAsm4MB - MOVL $0x00000001, SI - LEAL 16(SI), SI - MOVB R8, 1(AX) - SHRL $0x08, R8 - SHLL $0x05, R8 - ORL R8, SI - MOVB SI, (AX) + MOVL $0x00000001, BX + LEAL 16(BX), BX + MOVB DI, 1(AX) + SHRL $0x08, DI + SHLL $0x05, DI + ORL DI, BX + MOVB BL, (AX) ADDQ $0x02, AX - SUBL $0x08, R12 + SUBL $0x08, R11 // emitRepeat - LEAL -4(R12), R12 + LEAL -4(R11), R11 JMP cant_repeat_two_offset_match_nolit_encodeBetterBlockAsm4MB_emit_copy_short_2b - 
MOVL R12, SI - LEAL -4(R12), R12 - CMPL SI, $0x08 + MOVL R11, BX + LEAL -4(R11), R11 + CMPL BX, $0x08 JLE repeat_two_match_nolit_encodeBetterBlockAsm4MB_emit_copy_short_2b - CMPL SI, $0x0c + CMPL BX, $0x0c JGE cant_repeat_two_offset_match_nolit_encodeBetterBlockAsm4MB_emit_copy_short_2b - CMPL R8, $0x00000800 + CMPL DI, $0x00000800 JLT repeat_two_offset_match_nolit_encodeBetterBlockAsm4MB_emit_copy_short_2b cant_repeat_two_offset_match_nolit_encodeBetterBlockAsm4MB_emit_copy_short_2b: - CMPL R12, $0x00000104 + CMPL R11, $0x00000104 JLT repeat_three_match_nolit_encodeBetterBlockAsm4MB_emit_copy_short_2b - CMPL R12, $0x00010100 + CMPL R11, $0x00010100 JLT repeat_four_match_nolit_encodeBetterBlockAsm4MB_emit_copy_short_2b - LEAL -65536(R12), R12 - MOVL R12, R8 + LEAL -65536(R11), R11 + MOVL R11, DI MOVW $0x001d, (AX) - MOVW R12, 2(AX) - SARL $0x10, R8 - MOVB R8, 4(AX) + MOVW R11, 2(AX) + SARL $0x10, DI + MOVB DI, 4(AX) ADDQ $0x05, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm4MB repeat_four_match_nolit_encodeBetterBlockAsm4MB_emit_copy_short_2b: - LEAL -256(R12), R12 + LEAL -256(R11), R11 MOVW $0x0019, (AX) - MOVW R12, 2(AX) + MOVW R11, 2(AX) ADDQ $0x04, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm4MB repeat_three_match_nolit_encodeBetterBlockAsm4MB_emit_copy_short_2b: - LEAL -4(R12), R12 + LEAL -4(R11), R11 MOVW $0x0015, (AX) - MOVB R12, 2(AX) + MOVB R11, 2(AX) ADDQ $0x03, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm4MB repeat_two_match_nolit_encodeBetterBlockAsm4MB_emit_copy_short_2b: - SHLL $0x02, R12 - ORL $0x01, R12 - MOVW R12, (AX) + SHLL $0x02, R11 + ORL $0x01, R11 + MOVW R11, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm4MB repeat_two_offset_match_nolit_encodeBetterBlockAsm4MB_emit_copy_short_2b: - XORQ SI, SI - LEAL 1(SI)(R12*4), R12 - MOVB R8, 1(AX) - SARL $0x08, R8 - SHLL $0x05, R8 - ORL R8, R12 - MOVB R12, (AX) - ADDQ $0x02, AX - JMP match_nolit_emitcopy_end_encodeBetterBlockAsm4MB + XORQ BX, BX + LEAL 1(BX)(R11*4), R11 + MOVB DI, 1(AX) + SARL $0x08, DI + SHLL $0x05, DI + ORL DI, R11 + MOVB R11, (AX) + ADDQ $0x02, AX + JMP match_nolit_emitcopy_end_encodeBetterBlockAsm4MB long_offset_short_match_nolit_encodeBetterBlockAsm4MB: MOVB $0xee, (AX) - MOVW R8, 1(AX) - LEAL -60(R12), R12 + MOVW DI, 1(AX) + LEAL -60(R11), R11 ADDQ $0x03, AX // emitRepeat - MOVL R12, SI - LEAL -4(R12), R12 - CMPL SI, $0x08 + MOVL R11, BX + LEAL -4(R11), R11 + CMPL BX, $0x08 JLE repeat_two_match_nolit_encodeBetterBlockAsm4MB_emit_copy_short - CMPL SI, $0x0c + CMPL BX, $0x0c JGE cant_repeat_two_offset_match_nolit_encodeBetterBlockAsm4MB_emit_copy_short - CMPL R8, $0x00000800 + CMPL DI, $0x00000800 JLT repeat_two_offset_match_nolit_encodeBetterBlockAsm4MB_emit_copy_short cant_repeat_two_offset_match_nolit_encodeBetterBlockAsm4MB_emit_copy_short: - CMPL R12, $0x00000104 + CMPL R11, $0x00000104 JLT repeat_three_match_nolit_encodeBetterBlockAsm4MB_emit_copy_short - CMPL R12, $0x00010100 + CMPL R11, $0x00010100 JLT repeat_four_match_nolit_encodeBetterBlockAsm4MB_emit_copy_short - LEAL -65536(R12), R12 - MOVL R12, R8 + LEAL -65536(R11), R11 + MOVL R11, DI MOVW $0x001d, (AX) - MOVW R12, 2(AX) - SARL $0x10, R8 - MOVB R8, 4(AX) + MOVW R11, 2(AX) + SARL $0x10, DI + MOVB DI, 4(AX) ADDQ $0x05, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm4MB repeat_four_match_nolit_encodeBetterBlockAsm4MB_emit_copy_short: - LEAL -256(R12), R12 + LEAL -256(R11), R11 MOVW $0x0019, (AX) - MOVW R12, 2(AX) + MOVW R11, 2(AX) ADDQ $0x04, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm4MB 
repeat_three_match_nolit_encodeBetterBlockAsm4MB_emit_copy_short: - LEAL -4(R12), R12 + LEAL -4(R11), R11 MOVW $0x0015, (AX) - MOVB R12, 2(AX) + MOVB R11, 2(AX) ADDQ $0x03, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm4MB repeat_two_match_nolit_encodeBetterBlockAsm4MB_emit_copy_short: - SHLL $0x02, R12 - ORL $0x01, R12 - MOVW R12, (AX) + SHLL $0x02, R11 + ORL $0x01, R11 + MOVW R11, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm4MB repeat_two_offset_match_nolit_encodeBetterBlockAsm4MB_emit_copy_short: - XORQ SI, SI - LEAL 1(SI)(R12*4), R12 - MOVB R8, 1(AX) - SARL $0x08, R8 - SHLL $0x05, R8 - ORL R8, R12 - MOVB R12, (AX) + XORQ BX, BX + LEAL 1(BX)(R11*4), R11 + MOVB DI, 1(AX) + SARL $0x08, DI + SHLL $0x05, DI + ORL DI, R11 + MOVB R11, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm4MB - JMP two_byte_offset_match_nolit_encodeBetterBlockAsm4MB two_byte_offset_short_match_nolit_encodeBetterBlockAsm4MB: - CMPL R12, $0x0c + MOVL R11, BX + SHLL $0x02, BX + CMPL R11, $0x0c JGE emit_copy_three_match_nolit_encodeBetterBlockAsm4MB - CMPL R8, $0x00000800 + CMPL DI, $0x00000800 JGE emit_copy_three_match_nolit_encodeBetterBlockAsm4MB - MOVB $0x01, BL - LEAL -16(BX)(R12*4), R12 - MOVB R8, 1(AX) - SHRL $0x08, R8 - SHLL $0x05, R8 - ORL R8, R12 - MOVB R12, (AX) + LEAL -15(BX), BX + MOVB DI, 1(AX) + SHRL $0x08, DI + SHLL $0x05, DI + ORL DI, BX + MOVB BL, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm4MB emit_copy_three_match_nolit_encodeBetterBlockAsm4MB: - MOVB $0x02, BL - LEAL -4(BX)(R12*4), R12 - MOVB R12, (AX) - MOVW R8, 1(AX) + LEAL -2(BX), BX + MOVB BL, (AX) + MOVW DI, 1(AX) ADDQ $0x03, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm4MB match_is_repeat_encodeBetterBlockAsm4MB: - MOVL 12(SP), SI - CMPL SI, DI + MOVL 12(SP), BX + CMPL BX, SI JEQ emit_literal_done_match_emit_repeat_encodeBetterBlockAsm4MB - MOVL DI, R9 - MOVL DI, 12(SP) - LEAQ (DX)(SI*1), R10 - SUBL SI, R9 - LEAL -1(R9), SI - CMPL SI, $0x3c + MOVL SI, R8 + MOVL SI, 12(SP) + LEAQ (DX)(BX*1), R9 + SUBL BX, R8 + LEAL -1(R8), BX + CMPL BX, $0x3c JLT one_byte_match_emit_repeat_encodeBetterBlockAsm4MB - CMPL SI, $0x00000100 + CMPL BX, $0x00000100 JLT two_bytes_match_emit_repeat_encodeBetterBlockAsm4MB - CMPL SI, $0x00010000 + CMPL BX, $0x00010000 JLT three_bytes_match_emit_repeat_encodeBetterBlockAsm4MB - MOVL SI, R11 - SHRL $0x10, R11 + MOVL BX, R10 + SHRL $0x10, R10 MOVB $0xf8, (AX) - MOVW SI, 1(AX) - MOVB R11, 3(AX) + MOVW BX, 1(AX) + MOVB R10, 3(AX) ADDQ $0x04, AX JMP memmove_long_match_emit_repeat_encodeBetterBlockAsm4MB three_bytes_match_emit_repeat_encodeBetterBlockAsm4MB: MOVB $0xf4, (AX) - MOVW SI, 1(AX) + MOVW BX, 1(AX) ADDQ $0x03, AX JMP memmove_long_match_emit_repeat_encodeBetterBlockAsm4MB two_bytes_match_emit_repeat_encodeBetterBlockAsm4MB: MOVB $0xf0, (AX) - MOVB SI, 1(AX) + MOVB BL, 1(AX) ADDQ $0x02, AX - CMPL SI, $0x40 + CMPL BX, $0x40 JL memmove_match_emit_repeat_encodeBetterBlockAsm4MB JMP memmove_long_match_emit_repeat_encodeBetterBlockAsm4MB one_byte_match_emit_repeat_encodeBetterBlockAsm4MB: - SHLB $0x02, SI - MOVB SI, (AX) + SHLB $0x02, BL + MOVB BL, (AX) ADDQ $0x01, AX memmove_match_emit_repeat_encodeBetterBlockAsm4MB: - LEAQ (AX)(R9*1), SI + LEAQ (AX)(R8*1), BX // genMemMoveShort - CMPQ R9, $0x04 + CMPQ R8, $0x04 JLE emit_lit_memmove_match_emit_repeat_encodeBetterBlockAsm4MB_memmove_move_4 - CMPQ R9, $0x08 + CMPQ R8, $0x08 JB emit_lit_memmove_match_emit_repeat_encodeBetterBlockAsm4MB_memmove_move_4through7 - CMPQ R9, $0x10 + CMPQ R8, $0x10 
JBE emit_lit_memmove_match_emit_repeat_encodeBetterBlockAsm4MB_memmove_move_8through16 - CMPQ R9, $0x20 + CMPQ R8, $0x20 JBE emit_lit_memmove_match_emit_repeat_encodeBetterBlockAsm4MB_memmove_move_17through32 JMP emit_lit_memmove_match_emit_repeat_encodeBetterBlockAsm4MB_memmove_move_33through64 emit_lit_memmove_match_emit_repeat_encodeBetterBlockAsm4MB_memmove_move_4: - MOVL (R10), R11 - MOVL R11, (AX) + MOVL (R9), R10 + MOVL R10, (AX) JMP memmove_end_copy_match_emit_repeat_encodeBetterBlockAsm4MB emit_lit_memmove_match_emit_repeat_encodeBetterBlockAsm4MB_memmove_move_4through7: - MOVL (R10), R11 - MOVL -4(R10)(R9*1), R10 - MOVL R11, (AX) - MOVL R10, -4(AX)(R9*1) + MOVL (R9), R10 + MOVL -4(R9)(R8*1), R9 + MOVL R10, (AX) + MOVL R9, -4(AX)(R8*1) JMP memmove_end_copy_match_emit_repeat_encodeBetterBlockAsm4MB emit_lit_memmove_match_emit_repeat_encodeBetterBlockAsm4MB_memmove_move_8through16: - MOVQ (R10), R11 - MOVQ -8(R10)(R9*1), R10 - MOVQ R11, (AX) - MOVQ R10, -8(AX)(R9*1) + MOVQ (R9), R10 + MOVQ -8(R9)(R8*1), R9 + MOVQ R10, (AX) + MOVQ R9, -8(AX)(R8*1) JMP memmove_end_copy_match_emit_repeat_encodeBetterBlockAsm4MB emit_lit_memmove_match_emit_repeat_encodeBetterBlockAsm4MB_memmove_move_17through32: - MOVOU (R10), X0 - MOVOU -16(R10)(R9*1), X1 + MOVOU (R9), X0 + MOVOU -16(R9)(R8*1), X1 MOVOU X0, (AX) - MOVOU X1, -16(AX)(R9*1) + MOVOU X1, -16(AX)(R8*1) JMP memmove_end_copy_match_emit_repeat_encodeBetterBlockAsm4MB emit_lit_memmove_match_emit_repeat_encodeBetterBlockAsm4MB_memmove_move_33through64: - MOVOU (R10), X0 - MOVOU 16(R10), X1 - MOVOU -32(R10)(R9*1), X2 - MOVOU -16(R10)(R9*1), X3 + MOVOU (R9), X0 + MOVOU 16(R9), X1 + MOVOU -32(R9)(R8*1), X2 + MOVOU -16(R9)(R8*1), X3 MOVOU X0, (AX) MOVOU X1, 16(AX) - MOVOU X2, -32(AX)(R9*1) - MOVOU X3, -16(AX)(R9*1) + MOVOU X2, -32(AX)(R8*1) + MOVOU X3, -16(AX)(R8*1) memmove_end_copy_match_emit_repeat_encodeBetterBlockAsm4MB: - MOVQ SI, AX + MOVQ BX, AX JMP emit_literal_done_match_emit_repeat_encodeBetterBlockAsm4MB memmove_long_match_emit_repeat_encodeBetterBlockAsm4MB: - LEAQ (AX)(R9*1), SI + LEAQ (AX)(R8*1), BX // genMemMoveLong - MOVOU (R10), X0 - MOVOU 16(R10), X1 - MOVOU -32(R10)(R9*1), X2 - MOVOU -16(R10)(R9*1), X3 - MOVQ R9, R13 - SHRQ $0x05, R13 - MOVQ AX, R11 - ANDL $0x0000001f, R11 - MOVQ $0x00000040, R14 - SUBQ R11, R14 - DECQ R13 + MOVOU (R9), X0 + MOVOU 16(R9), X1 + MOVOU -32(R9)(R8*1), X2 + MOVOU -16(R9)(R8*1), X3 + MOVQ R8, R12 + SHRQ $0x05, R12 + MOVQ AX, R10 + ANDL $0x0000001f, R10 + MOVQ $0x00000040, R13 + SUBQ R10, R13 + DECQ R12 JA emit_lit_memmove_long_match_emit_repeat_encodeBetterBlockAsm4MBlarge_forward_sse_loop_32 - LEAQ -32(R10)(R14*1), R11 - LEAQ -32(AX)(R14*1), R15 + LEAQ -32(R9)(R13*1), R10 + LEAQ -32(AX)(R13*1), R14 emit_lit_memmove_long_match_emit_repeat_encodeBetterBlockAsm4MBlarge_big_loop_back: - MOVOU (R11), X4 - MOVOU 16(R11), X5 - MOVOA X4, (R15) - MOVOA X5, 16(R15) - ADDQ $0x20, R15 - ADDQ $0x20, R11 + MOVOU (R10), X4 + MOVOU 16(R10), X5 + MOVOA X4, (R14) + MOVOA X5, 16(R14) ADDQ $0x20, R14 - DECQ R13 + ADDQ $0x20, R10 + ADDQ $0x20, R13 + DECQ R12 JNA emit_lit_memmove_long_match_emit_repeat_encodeBetterBlockAsm4MBlarge_big_loop_back emit_lit_memmove_long_match_emit_repeat_encodeBetterBlockAsm4MBlarge_forward_sse_loop_32: - MOVOU -32(R10)(R14*1), X4 - MOVOU -16(R10)(R14*1), X5 - MOVOA X4, -32(AX)(R14*1) - MOVOA X5, -16(AX)(R14*1) - ADDQ $0x20, R14 - CMPQ R9, R14 + MOVOU -32(R9)(R13*1), X4 + MOVOU -16(R9)(R13*1), X5 + MOVOA X4, -32(AX)(R13*1) + MOVOA X5, -16(AX)(R13*1) + ADDQ $0x20, R13 + CMPQ R8, R13 JAE 
emit_lit_memmove_long_match_emit_repeat_encodeBetterBlockAsm4MBlarge_forward_sse_loop_32 MOVOU X0, (AX) MOVOU X1, 16(AX) - MOVOU X2, -32(AX)(R9*1) - MOVOU X3, -16(AX)(R9*1) - MOVQ SI, AX + MOVOU X2, -32(AX)(R8*1) + MOVOU X3, -16(AX)(R8*1) + MOVQ BX, AX emit_literal_done_match_emit_repeat_encodeBetterBlockAsm4MB: - ADDL R12, CX - ADDL $0x04, R12 + ADDL R11, CX + ADDL $0x04, R11 MOVL CX, 12(SP) // emitRepeat - MOVL R12, SI - LEAL -4(R12), R12 - CMPL SI, $0x08 + MOVL R11, BX + LEAL -4(R11), R11 + CMPL BX, $0x08 JLE repeat_two_match_nolit_repeat_encodeBetterBlockAsm4MB - CMPL SI, $0x0c + CMPL BX, $0x0c JGE cant_repeat_two_offset_match_nolit_repeat_encodeBetterBlockAsm4MB - CMPL R8, $0x00000800 + CMPL DI, $0x00000800 JLT repeat_two_offset_match_nolit_repeat_encodeBetterBlockAsm4MB cant_repeat_two_offset_match_nolit_repeat_encodeBetterBlockAsm4MB: - CMPL R12, $0x00000104 + CMPL R11, $0x00000104 JLT repeat_three_match_nolit_repeat_encodeBetterBlockAsm4MB - CMPL R12, $0x00010100 + CMPL R11, $0x00010100 JLT repeat_four_match_nolit_repeat_encodeBetterBlockAsm4MB - LEAL -65536(R12), R12 - MOVL R12, R8 + LEAL -65536(R11), R11 + MOVL R11, DI MOVW $0x001d, (AX) - MOVW R12, 2(AX) - SARL $0x10, R8 - MOVB R8, 4(AX) + MOVW R11, 2(AX) + SARL $0x10, DI + MOVB DI, 4(AX) ADDQ $0x05, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm4MB repeat_four_match_nolit_repeat_encodeBetterBlockAsm4MB: - LEAL -256(R12), R12 + LEAL -256(R11), R11 MOVW $0x0019, (AX) - MOVW R12, 2(AX) + MOVW R11, 2(AX) ADDQ $0x04, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm4MB repeat_three_match_nolit_repeat_encodeBetterBlockAsm4MB: - LEAL -4(R12), R12 + LEAL -4(R11), R11 MOVW $0x0015, (AX) - MOVB R12, 2(AX) + MOVB R11, 2(AX) ADDQ $0x03, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm4MB repeat_two_match_nolit_repeat_encodeBetterBlockAsm4MB: - SHLL $0x02, R12 - ORL $0x01, R12 - MOVW R12, (AX) + SHLL $0x02, R11 + ORL $0x01, R11 + MOVW R11, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm4MB repeat_two_offset_match_nolit_repeat_encodeBetterBlockAsm4MB: - XORQ SI, SI - LEAL 1(SI)(R12*4), R12 - MOVB R8, 1(AX) - SARL $0x08, R8 - SHLL $0x05, R8 - ORL R8, R12 - MOVB R12, (AX) + XORQ BX, BX + LEAL 1(BX)(R11*4), R11 + MOVB DI, 1(AX) + SARL $0x08, DI + SHLL $0x05, DI + ORL DI, R11 + MOVB R11, (AX) ADDQ $0x02, AX match_nolit_emitcopy_end_encodeBetterBlockAsm4MB: @@ -7614,50 +7567,50 @@ match_nolit_emitcopy_end_encodeBetterBlockAsm4MB: RET match_nolit_dst_ok_encodeBetterBlockAsm4MB: - MOVQ $0x00cf1bbcdcbfa563, SI - MOVQ $0x9e3779b1, R8 - LEAQ 1(DI), DI - LEAQ -2(CX), R9 - MOVQ (DX)(DI*1), R10 - MOVQ 1(DX)(DI*1), R11 - MOVQ (DX)(R9*1), R12 - MOVQ 1(DX)(R9*1), R13 - SHLQ $0x08, R10 - IMULQ SI, R10 - SHRQ $0x2f, R10 - SHLQ $0x20, R11 - IMULQ R8, R11 - SHRQ $0x32, R11 - SHLQ $0x08, R12 - IMULQ SI, R12 - SHRQ $0x2f, R12 - SHLQ $0x20, R13 - IMULQ R8, R13 - SHRQ $0x32, R13 - LEAQ 1(DI), R8 - LEAQ 1(R9), R14 - MOVL DI, 24(SP)(R10*4) - MOVL R9, 24(SP)(R12*4) - MOVL R8, 524312(SP)(R11*4) - MOVL R14, 524312(SP)(R13*4) - ADDQ $0x01, DI - SUBQ $0x01, R9 + MOVQ $0x00cf1bbcdcbfa563, BX + MOVQ $0x9e3779b1, DI + LEAQ 1(SI), SI + LEAQ -2(CX), R8 + MOVQ (DX)(SI*1), R9 + MOVQ 1(DX)(SI*1), R10 + MOVQ (DX)(R8*1), R11 + MOVQ 1(DX)(R8*1), R12 + SHLQ $0x08, R9 + IMULQ BX, R9 + SHRQ $0x2f, R9 + SHLQ $0x20, R10 + IMULQ DI, R10 + SHRQ $0x32, R10 + SHLQ $0x08, R11 + IMULQ BX, R11 + SHRQ $0x2f, R11 + SHLQ $0x20, R12 + IMULQ DI, R12 + SHRQ $0x32, R12 + LEAQ 1(SI), DI + LEAQ 1(R8), R13 + MOVL SI, 24(SP)(R9*4) + MOVL R8, 24(SP)(R11*4) + MOVL DI, 
524312(SP)(R10*4) + MOVL R13, 524312(SP)(R12*4) + ADDQ $0x01, SI + SUBQ $0x01, R8 index_loop_encodeBetterBlockAsm4MB: - CMPQ DI, R9 + CMPQ SI, R8 JAE search_loop_encodeBetterBlockAsm4MB - MOVQ (DX)(DI*1), R8 - MOVQ (DX)(R9*1), R10 - SHLQ $0x08, R8 - IMULQ SI, R8 - SHRQ $0x2f, R8 - SHLQ $0x08, R10 - IMULQ SI, R10 - SHRQ $0x2f, R10 - MOVL DI, 24(SP)(R8*4) - MOVL R9, 24(SP)(R10*4) - ADDQ $0x02, DI - SUBQ $0x02, R9 + MOVQ (DX)(SI*1), DI + MOVQ (DX)(R8*1), R9 + SHLQ $0x08, DI + IMULQ BX, DI + SHRQ $0x2f, DI + SHLQ $0x08, R9 + IMULQ BX, R9 + SHRQ $0x2f, R9 + MOVL SI, 24(SP)(DI*4) + MOVL R8, 24(SP)(R9*4) + ADDQ $0x02, SI + SUBQ $0x02, R8 JMP index_loop_encodeBetterBlockAsm4MB emit_remainder_encodeBetterBlockAsm4MB: @@ -7851,8 +7804,8 @@ zero_loop_encodeBetterBlockAsm12B: MOVL $0x00000000, 12(SP) MOVQ src_len+32(FP), CX LEAQ -6(CX), DX - LEAQ -8(CX), SI - MOVL SI, 8(SP) + LEAQ -8(CX), BX + MOVL BX, 8(SP) SHRQ $0x05, CX SUBL CX, DX LEAQ (AX)(DX*1), DX @@ -7862,601 +7815,599 @@ zero_loop_encodeBetterBlockAsm12B: MOVQ src_base+24(FP), DX search_loop_encodeBetterBlockAsm12B: - MOVL CX, SI - SUBL 12(SP), SI - SHRL $0x06, SI - LEAL 1(CX)(SI*1), SI - CMPL SI, 8(SP) + MOVL CX, BX + SUBL 12(SP), BX + SHRL $0x06, BX + LEAL 1(CX)(BX*1), BX + CMPL BX, 8(SP) JGE emit_remainder_encodeBetterBlockAsm12B - MOVQ (DX)(CX*1), DI - MOVL SI, 20(SP) - MOVQ $0x0000cf1bbcdcbf9b, R9 - MOVQ $0x9e3779b1, SI - MOVQ DI, R10 - MOVQ DI, R11 - SHLQ $0x10, R10 - IMULQ R9, R10 - SHRQ $0x32, R10 - SHLQ $0x20, R11 - IMULQ SI, R11 - SHRQ $0x34, R11 - MOVL 24(SP)(R10*4), SI - MOVL 65560(SP)(R11*4), R8 - MOVL CX, 24(SP)(R10*4) - MOVL CX, 65560(SP)(R11*4) - MOVQ (DX)(SI*1), R10 - MOVQ (DX)(R8*1), R11 - CMPQ R10, DI + MOVQ (DX)(CX*1), SI + MOVL BX, 20(SP) + MOVQ $0x0000cf1bbcdcbf9b, R8 + MOVQ $0x9e3779b1, BX + MOVQ SI, R9 + MOVQ SI, R10 + SHLQ $0x10, R9 + IMULQ R8, R9 + SHRQ $0x32, R9 + SHLQ $0x20, R10 + IMULQ BX, R10 + SHRQ $0x34, R10 + MOVL 24(SP)(R9*4), BX + MOVL 65560(SP)(R10*4), DI + MOVL CX, 24(SP)(R9*4) + MOVL CX, 65560(SP)(R10*4) + MOVQ (DX)(BX*1), R9 + MOVQ (DX)(DI*1), R10 + CMPQ R9, SI JEQ candidate_match_encodeBetterBlockAsm12B - CMPQ R11, DI + CMPQ R10, SI JNE no_short_found_encodeBetterBlockAsm12B - MOVL R8, SI + MOVL DI, BX JMP candidate_match_encodeBetterBlockAsm12B no_short_found_encodeBetterBlockAsm12B: - CMPL R10, DI + CMPL R9, SI JEQ candidate_match_encodeBetterBlockAsm12B - CMPL R11, DI + CMPL R10, SI JEQ candidateS_match_encodeBetterBlockAsm12B MOVL 20(SP), CX JMP search_loop_encodeBetterBlockAsm12B candidateS_match_encodeBetterBlockAsm12B: - SHRQ $0x08, DI - MOVQ DI, R10 - SHLQ $0x10, R10 - IMULQ R9, R10 - SHRQ $0x32, R10 - MOVL 24(SP)(R10*4), SI + SHRQ $0x08, SI + MOVQ SI, R9 + SHLQ $0x10, R9 + IMULQ R8, R9 + SHRQ $0x32, R9 + MOVL 24(SP)(R9*4), BX INCL CX - MOVL CX, 24(SP)(R10*4) - CMPL (DX)(SI*1), DI + MOVL CX, 24(SP)(R9*4) + CMPL (DX)(BX*1), SI JEQ candidate_match_encodeBetterBlockAsm12B DECL CX - MOVL R8, SI + MOVL DI, BX candidate_match_encodeBetterBlockAsm12B: - MOVL 12(SP), DI - TESTL SI, SI + MOVL 12(SP), SI + TESTL BX, BX JZ match_extend_back_end_encodeBetterBlockAsm12B match_extend_back_loop_encodeBetterBlockAsm12B: - CMPL CX, DI + CMPL CX, SI JLE match_extend_back_end_encodeBetterBlockAsm12B - MOVB -1(DX)(SI*1), BL + MOVB -1(DX)(BX*1), DI MOVB -1(DX)(CX*1), R8 - CMPB BL, R8 + CMPB DI, R8 JNE match_extend_back_end_encodeBetterBlockAsm12B LEAL -1(CX), CX - DECL SI + DECL BX JZ match_extend_back_end_encodeBetterBlockAsm12B JMP match_extend_back_loop_encodeBetterBlockAsm12B 
match_extend_back_end_encodeBetterBlockAsm12B: - MOVL CX, DI - SUBL 12(SP), DI - LEAQ 3(AX)(DI*1), DI - CMPQ DI, (SP) + MOVL CX, SI + SUBL 12(SP), SI + LEAQ 3(AX)(SI*1), SI + CMPQ SI, (SP) JL match_dst_size_check_encodeBetterBlockAsm12B MOVQ $0x00000000, ret+48(FP) RET match_dst_size_check_encodeBetterBlockAsm12B: - MOVL CX, DI + MOVL CX, SI ADDL $0x04, CX - ADDL $0x04, SI - MOVQ src_len+32(FP), R8 - SUBL CX, R8 - LEAQ (DX)(CX*1), R9 - LEAQ (DX)(SI*1), R10 + ADDL $0x04, BX + MOVQ src_len+32(FP), DI + SUBL CX, DI + LEAQ (DX)(CX*1), R8 + LEAQ (DX)(BX*1), R9 // matchLen - XORL R12, R12 - CMPL R8, $0x08 + XORL R11, R11 + CMPL DI, $0x08 JL matchlen_match4_match_nolit_encodeBetterBlockAsm12B matchlen_loopback_match_nolit_encodeBetterBlockAsm12B: - MOVQ (R9)(R12*1), R11 - XORQ (R10)(R12*1), R11 - TESTQ R11, R11 + MOVQ (R8)(R11*1), R10 + XORQ (R9)(R11*1), R10 + TESTQ R10, R10 JZ matchlen_loop_match_nolit_encodeBetterBlockAsm12B #ifdef GOAMD64_v3 - TZCNTQ R11, R11 + TZCNTQ R10, R10 #else - BSFQ R11, R11 + BSFQ R10, R10 #endif - SARQ $0x03, R11 - LEAL (R12)(R11*1), R12 + SARQ $0x03, R10 + LEAL (R11)(R10*1), R11 JMP match_nolit_end_encodeBetterBlockAsm12B matchlen_loop_match_nolit_encodeBetterBlockAsm12B: - LEAL -8(R8), R8 - LEAL 8(R12), R12 - CMPL R8, $0x08 + LEAL -8(DI), DI + LEAL 8(R11), R11 + CMPL DI, $0x08 JGE matchlen_loopback_match_nolit_encodeBetterBlockAsm12B JZ match_nolit_end_encodeBetterBlockAsm12B matchlen_match4_match_nolit_encodeBetterBlockAsm12B: - CMPL R8, $0x04 + CMPL DI, $0x04 JL matchlen_match2_match_nolit_encodeBetterBlockAsm12B - MOVL (R9)(R12*1), R11 - CMPL (R10)(R12*1), R11 + MOVL (R8)(R11*1), R10 + CMPL (R9)(R11*1), R10 JNE matchlen_match2_match_nolit_encodeBetterBlockAsm12B - SUBL $0x04, R8 - LEAL 4(R12), R12 + SUBL $0x04, DI + LEAL 4(R11), R11 matchlen_match2_match_nolit_encodeBetterBlockAsm12B: - CMPL R8, $0x02 + CMPL DI, $0x02 JL matchlen_match1_match_nolit_encodeBetterBlockAsm12B - MOVW (R9)(R12*1), R11 - CMPW (R10)(R12*1), R11 + MOVW (R8)(R11*1), R10 + CMPW (R9)(R11*1), R10 JNE matchlen_match1_match_nolit_encodeBetterBlockAsm12B - SUBL $0x02, R8 - LEAL 2(R12), R12 + SUBL $0x02, DI + LEAL 2(R11), R11 matchlen_match1_match_nolit_encodeBetterBlockAsm12B: - CMPL R8, $0x01 + CMPL DI, $0x01 JL match_nolit_end_encodeBetterBlockAsm12B - MOVB (R9)(R12*1), R11 - CMPB (R10)(R12*1), R11 + MOVB (R8)(R11*1), R10 + CMPB (R9)(R11*1), R10 JNE match_nolit_end_encodeBetterBlockAsm12B - LEAL 1(R12), R12 + LEAL 1(R11), R11 match_nolit_end_encodeBetterBlockAsm12B: - MOVL CX, R8 - SUBL SI, R8 + MOVL CX, DI + SUBL BX, DI // Check if repeat - CMPL 16(SP), R8 + CMPL 16(SP), DI JEQ match_is_repeat_encodeBetterBlockAsm12B - MOVL R8, 16(SP) - MOVL 12(SP), SI - CMPL SI, DI + MOVL DI, 16(SP) + MOVL 12(SP), BX + CMPL BX, SI JEQ emit_literal_done_match_emit_encodeBetterBlockAsm12B - MOVL DI, R9 - MOVL DI, 12(SP) - LEAQ (DX)(SI*1), R10 - SUBL SI, R9 - LEAL -1(R9), SI - CMPL SI, $0x3c + MOVL SI, R8 + MOVL SI, 12(SP) + LEAQ (DX)(BX*1), R9 + SUBL BX, R8 + LEAL -1(R8), BX + CMPL BX, $0x3c JLT one_byte_match_emit_encodeBetterBlockAsm12B - CMPL SI, $0x00000100 + CMPL BX, $0x00000100 JLT two_bytes_match_emit_encodeBetterBlockAsm12B MOVB $0xf4, (AX) - MOVW SI, 1(AX) + MOVW BX, 1(AX) ADDQ $0x03, AX JMP memmove_long_match_emit_encodeBetterBlockAsm12B two_bytes_match_emit_encodeBetterBlockAsm12B: MOVB $0xf0, (AX) - MOVB SI, 1(AX) + MOVB BL, 1(AX) ADDQ $0x02, AX - CMPL SI, $0x40 + CMPL BX, $0x40 JL memmove_match_emit_encodeBetterBlockAsm12B JMP memmove_long_match_emit_encodeBetterBlockAsm12B 
one_byte_match_emit_encodeBetterBlockAsm12B: - SHLB $0x02, SI - MOVB SI, (AX) + SHLB $0x02, BL + MOVB BL, (AX) ADDQ $0x01, AX memmove_match_emit_encodeBetterBlockAsm12B: - LEAQ (AX)(R9*1), SI + LEAQ (AX)(R8*1), BX // genMemMoveShort - CMPQ R9, $0x04 + CMPQ R8, $0x04 JLE emit_lit_memmove_match_emit_encodeBetterBlockAsm12B_memmove_move_4 - CMPQ R9, $0x08 + CMPQ R8, $0x08 JB emit_lit_memmove_match_emit_encodeBetterBlockAsm12B_memmove_move_4through7 - CMPQ R9, $0x10 + CMPQ R8, $0x10 JBE emit_lit_memmove_match_emit_encodeBetterBlockAsm12B_memmove_move_8through16 - CMPQ R9, $0x20 + CMPQ R8, $0x20 JBE emit_lit_memmove_match_emit_encodeBetterBlockAsm12B_memmove_move_17through32 JMP emit_lit_memmove_match_emit_encodeBetterBlockAsm12B_memmove_move_33through64 emit_lit_memmove_match_emit_encodeBetterBlockAsm12B_memmove_move_4: - MOVL (R10), R11 - MOVL R11, (AX) + MOVL (R9), R10 + MOVL R10, (AX) JMP memmove_end_copy_match_emit_encodeBetterBlockAsm12B emit_lit_memmove_match_emit_encodeBetterBlockAsm12B_memmove_move_4through7: - MOVL (R10), R11 - MOVL -4(R10)(R9*1), R10 - MOVL R11, (AX) - MOVL R10, -4(AX)(R9*1) + MOVL (R9), R10 + MOVL -4(R9)(R8*1), R9 + MOVL R10, (AX) + MOVL R9, -4(AX)(R8*1) JMP memmove_end_copy_match_emit_encodeBetterBlockAsm12B emit_lit_memmove_match_emit_encodeBetterBlockAsm12B_memmove_move_8through16: - MOVQ (R10), R11 - MOVQ -8(R10)(R9*1), R10 - MOVQ R11, (AX) - MOVQ R10, -8(AX)(R9*1) + MOVQ (R9), R10 + MOVQ -8(R9)(R8*1), R9 + MOVQ R10, (AX) + MOVQ R9, -8(AX)(R8*1) JMP memmove_end_copy_match_emit_encodeBetterBlockAsm12B emit_lit_memmove_match_emit_encodeBetterBlockAsm12B_memmove_move_17through32: - MOVOU (R10), X0 - MOVOU -16(R10)(R9*1), X1 + MOVOU (R9), X0 + MOVOU -16(R9)(R8*1), X1 MOVOU X0, (AX) - MOVOU X1, -16(AX)(R9*1) + MOVOU X1, -16(AX)(R8*1) JMP memmove_end_copy_match_emit_encodeBetterBlockAsm12B emit_lit_memmove_match_emit_encodeBetterBlockAsm12B_memmove_move_33through64: - MOVOU (R10), X0 - MOVOU 16(R10), X1 - MOVOU -32(R10)(R9*1), X2 - MOVOU -16(R10)(R9*1), X3 + MOVOU (R9), X0 + MOVOU 16(R9), X1 + MOVOU -32(R9)(R8*1), X2 + MOVOU -16(R9)(R8*1), X3 MOVOU X0, (AX) MOVOU X1, 16(AX) - MOVOU X2, -32(AX)(R9*1) - MOVOU X3, -16(AX)(R9*1) + MOVOU X2, -32(AX)(R8*1) + MOVOU X3, -16(AX)(R8*1) memmove_end_copy_match_emit_encodeBetterBlockAsm12B: - MOVQ SI, AX + MOVQ BX, AX JMP emit_literal_done_match_emit_encodeBetterBlockAsm12B memmove_long_match_emit_encodeBetterBlockAsm12B: - LEAQ (AX)(R9*1), SI + LEAQ (AX)(R8*1), BX // genMemMoveLong - MOVOU (R10), X0 - MOVOU 16(R10), X1 - MOVOU -32(R10)(R9*1), X2 - MOVOU -16(R10)(R9*1), X3 - MOVQ R9, R13 - SHRQ $0x05, R13 - MOVQ AX, R11 - ANDL $0x0000001f, R11 - MOVQ $0x00000040, R14 - SUBQ R11, R14 - DECQ R13 + MOVOU (R9), X0 + MOVOU 16(R9), X1 + MOVOU -32(R9)(R8*1), X2 + MOVOU -16(R9)(R8*1), X3 + MOVQ R8, R12 + SHRQ $0x05, R12 + MOVQ AX, R10 + ANDL $0x0000001f, R10 + MOVQ $0x00000040, R13 + SUBQ R10, R13 + DECQ R12 JA emit_lit_memmove_long_match_emit_encodeBetterBlockAsm12Blarge_forward_sse_loop_32 - LEAQ -32(R10)(R14*1), R11 - LEAQ -32(AX)(R14*1), R15 - + LEAQ -32(R9)(R13*1), R10 + LEAQ -32(AX)(R13*1), R14 + emit_lit_memmove_long_match_emit_encodeBetterBlockAsm12Blarge_big_loop_back: - MOVOU (R11), X4 - MOVOU 16(R11), X5 - MOVOA X4, (R15) - MOVOA X5, 16(R15) - ADDQ $0x20, R15 - ADDQ $0x20, R11 + MOVOU (R10), X4 + MOVOU 16(R10), X5 + MOVOA X4, (R14) + MOVOA X5, 16(R14) ADDQ $0x20, R14 - DECQ R13 + ADDQ $0x20, R10 + ADDQ $0x20, R13 + DECQ R12 JNA emit_lit_memmove_long_match_emit_encodeBetterBlockAsm12Blarge_big_loop_back 
emit_lit_memmove_long_match_emit_encodeBetterBlockAsm12Blarge_forward_sse_loop_32: - MOVOU -32(R10)(R14*1), X4 - MOVOU -16(R10)(R14*1), X5 - MOVOA X4, -32(AX)(R14*1) - MOVOA X5, -16(AX)(R14*1) - ADDQ $0x20, R14 - CMPQ R9, R14 + MOVOU -32(R9)(R13*1), X4 + MOVOU -16(R9)(R13*1), X5 + MOVOA X4, -32(AX)(R13*1) + MOVOA X5, -16(AX)(R13*1) + ADDQ $0x20, R13 + CMPQ R8, R13 JAE emit_lit_memmove_long_match_emit_encodeBetterBlockAsm12Blarge_forward_sse_loop_32 MOVOU X0, (AX) MOVOU X1, 16(AX) - MOVOU X2, -32(AX)(R9*1) - MOVOU X3, -16(AX)(R9*1) - MOVQ SI, AX + MOVOU X2, -32(AX)(R8*1) + MOVOU X3, -16(AX)(R8*1) + MOVQ BX, AX emit_literal_done_match_emit_encodeBetterBlockAsm12B: - ADDL R12, CX - ADDL $0x04, R12 + ADDL R11, CX + ADDL $0x04, R11 MOVL CX, 12(SP) // emitCopy -two_byte_offset_match_nolit_encodeBetterBlockAsm12B: - CMPL R12, $0x40 + CMPL R11, $0x40 JLE two_byte_offset_short_match_nolit_encodeBetterBlockAsm12B - CMPL R8, $0x00000800 + CMPL DI, $0x00000800 JAE long_offset_short_match_nolit_encodeBetterBlockAsm12B - MOVL $0x00000001, SI - LEAL 16(SI), SI - MOVB R8, 1(AX) - SHRL $0x08, R8 - SHLL $0x05, R8 - ORL R8, SI - MOVB SI, (AX) + MOVL $0x00000001, BX + LEAL 16(BX), BX + MOVB DI, 1(AX) + SHRL $0x08, DI + SHLL $0x05, DI + ORL DI, BX + MOVB BL, (AX) ADDQ $0x02, AX - SUBL $0x08, R12 + SUBL $0x08, R11 // emitRepeat - LEAL -4(R12), R12 + LEAL -4(R11), R11 JMP cant_repeat_two_offset_match_nolit_encodeBetterBlockAsm12B_emit_copy_short_2b - MOVL R12, SI - LEAL -4(R12), R12 - CMPL SI, $0x08 + MOVL R11, BX + LEAL -4(R11), R11 + CMPL BX, $0x08 JLE repeat_two_match_nolit_encodeBetterBlockAsm12B_emit_copy_short_2b - CMPL SI, $0x0c + CMPL BX, $0x0c JGE cant_repeat_two_offset_match_nolit_encodeBetterBlockAsm12B_emit_copy_short_2b - CMPL R8, $0x00000800 + CMPL DI, $0x00000800 JLT repeat_two_offset_match_nolit_encodeBetterBlockAsm12B_emit_copy_short_2b cant_repeat_two_offset_match_nolit_encodeBetterBlockAsm12B_emit_copy_short_2b: - CMPL R12, $0x00000104 + CMPL R11, $0x00000104 JLT repeat_three_match_nolit_encodeBetterBlockAsm12B_emit_copy_short_2b - LEAL -256(R12), R12 + LEAL -256(R11), R11 MOVW $0x0019, (AX) - MOVW R12, 2(AX) + MOVW R11, 2(AX) ADDQ $0x04, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm12B repeat_three_match_nolit_encodeBetterBlockAsm12B_emit_copy_short_2b: - LEAL -4(R12), R12 + LEAL -4(R11), R11 MOVW $0x0015, (AX) - MOVB R12, 2(AX) + MOVB R11, 2(AX) ADDQ $0x03, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm12B repeat_two_match_nolit_encodeBetterBlockAsm12B_emit_copy_short_2b: - SHLL $0x02, R12 - ORL $0x01, R12 - MOVW R12, (AX) + SHLL $0x02, R11 + ORL $0x01, R11 + MOVW R11, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm12B repeat_two_offset_match_nolit_encodeBetterBlockAsm12B_emit_copy_short_2b: - XORQ SI, SI - LEAL 1(SI)(R12*4), R12 - MOVB R8, 1(AX) - SARL $0x08, R8 - SHLL $0x05, R8 - ORL R8, R12 - MOVB R12, (AX) + XORQ BX, BX + LEAL 1(BX)(R11*4), R11 + MOVB DI, 1(AX) + SARL $0x08, DI + SHLL $0x05, DI + ORL DI, R11 + MOVB R11, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm12B long_offset_short_match_nolit_encodeBetterBlockAsm12B: MOVB $0xee, (AX) - MOVW R8, 1(AX) - LEAL -60(R12), R12 + MOVW DI, 1(AX) + LEAL -60(R11), R11 ADDQ $0x03, AX // emitRepeat - MOVL R12, SI - LEAL -4(R12), R12 - CMPL SI, $0x08 + MOVL R11, BX + LEAL -4(R11), R11 + CMPL BX, $0x08 JLE repeat_two_match_nolit_encodeBetterBlockAsm12B_emit_copy_short - CMPL SI, $0x0c + CMPL BX, $0x0c JGE cant_repeat_two_offset_match_nolit_encodeBetterBlockAsm12B_emit_copy_short - CMPL 
R8, $0x00000800 + CMPL DI, $0x00000800 JLT repeat_two_offset_match_nolit_encodeBetterBlockAsm12B_emit_copy_short cant_repeat_two_offset_match_nolit_encodeBetterBlockAsm12B_emit_copy_short: - CMPL R12, $0x00000104 + CMPL R11, $0x00000104 JLT repeat_three_match_nolit_encodeBetterBlockAsm12B_emit_copy_short - LEAL -256(R12), R12 + LEAL -256(R11), R11 MOVW $0x0019, (AX) - MOVW R12, 2(AX) + MOVW R11, 2(AX) ADDQ $0x04, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm12B repeat_three_match_nolit_encodeBetterBlockAsm12B_emit_copy_short: - LEAL -4(R12), R12 + LEAL -4(R11), R11 MOVW $0x0015, (AX) - MOVB R12, 2(AX) + MOVB R11, 2(AX) ADDQ $0x03, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm12B repeat_two_match_nolit_encodeBetterBlockAsm12B_emit_copy_short: - SHLL $0x02, R12 - ORL $0x01, R12 - MOVW R12, (AX) + SHLL $0x02, R11 + ORL $0x01, R11 + MOVW R11, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm12B repeat_two_offset_match_nolit_encodeBetterBlockAsm12B_emit_copy_short: - XORQ SI, SI - LEAL 1(SI)(R12*4), R12 - MOVB R8, 1(AX) - SARL $0x08, R8 - SHLL $0x05, R8 - ORL R8, R12 - MOVB R12, (AX) + XORQ BX, BX + LEAL 1(BX)(R11*4), R11 + MOVB DI, 1(AX) + SARL $0x08, DI + SHLL $0x05, DI + ORL DI, R11 + MOVB R11, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm12B - JMP two_byte_offset_match_nolit_encodeBetterBlockAsm12B two_byte_offset_short_match_nolit_encodeBetterBlockAsm12B: - CMPL R12, $0x0c + MOVL R11, BX + SHLL $0x02, BX + CMPL R11, $0x0c JGE emit_copy_three_match_nolit_encodeBetterBlockAsm12B - CMPL R8, $0x00000800 + CMPL DI, $0x00000800 JGE emit_copy_three_match_nolit_encodeBetterBlockAsm12B - MOVB $0x01, BL - LEAL -16(BX)(R12*4), R12 - MOVB R8, 1(AX) - SHRL $0x08, R8 - SHLL $0x05, R8 - ORL R8, R12 - MOVB R12, (AX) + LEAL -15(BX), BX + MOVB DI, 1(AX) + SHRL $0x08, DI + SHLL $0x05, DI + ORL DI, BX + MOVB BL, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm12B emit_copy_three_match_nolit_encodeBetterBlockAsm12B: - MOVB $0x02, BL - LEAL -4(BX)(R12*4), R12 - MOVB R12, (AX) - MOVW R8, 1(AX) + LEAL -2(BX), BX + MOVB BL, (AX) + MOVW DI, 1(AX) ADDQ $0x03, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm12B match_is_repeat_encodeBetterBlockAsm12B: - MOVL 12(SP), SI - CMPL SI, DI + MOVL 12(SP), BX + CMPL BX, SI JEQ emit_literal_done_match_emit_repeat_encodeBetterBlockAsm12B - MOVL DI, R9 - MOVL DI, 12(SP) - LEAQ (DX)(SI*1), R10 - SUBL SI, R9 - LEAL -1(R9), SI - CMPL SI, $0x3c + MOVL SI, R8 + MOVL SI, 12(SP) + LEAQ (DX)(BX*1), R9 + SUBL BX, R8 + LEAL -1(R8), BX + CMPL BX, $0x3c JLT one_byte_match_emit_repeat_encodeBetterBlockAsm12B - CMPL SI, $0x00000100 + CMPL BX, $0x00000100 JLT two_bytes_match_emit_repeat_encodeBetterBlockAsm12B MOVB $0xf4, (AX) - MOVW SI, 1(AX) + MOVW BX, 1(AX) ADDQ $0x03, AX JMP memmove_long_match_emit_repeat_encodeBetterBlockAsm12B two_bytes_match_emit_repeat_encodeBetterBlockAsm12B: MOVB $0xf0, (AX) - MOVB SI, 1(AX) + MOVB BL, 1(AX) ADDQ $0x02, AX - CMPL SI, $0x40 + CMPL BX, $0x40 JL memmove_match_emit_repeat_encodeBetterBlockAsm12B JMP memmove_long_match_emit_repeat_encodeBetterBlockAsm12B one_byte_match_emit_repeat_encodeBetterBlockAsm12B: - SHLB $0x02, SI - MOVB SI, (AX) + SHLB $0x02, BL + MOVB BL, (AX) ADDQ $0x01, AX memmove_match_emit_repeat_encodeBetterBlockAsm12B: - LEAQ (AX)(R9*1), SI + LEAQ (AX)(R8*1), BX // genMemMoveShort - CMPQ R9, $0x04 + CMPQ R8, $0x04 JLE emit_lit_memmove_match_emit_repeat_encodeBetterBlockAsm12B_memmove_move_4 - CMPQ R9, $0x08 + CMPQ R8, $0x08 JB 
emit_lit_memmove_match_emit_repeat_encodeBetterBlockAsm12B_memmove_move_4through7 - CMPQ R9, $0x10 + CMPQ R8, $0x10 JBE emit_lit_memmove_match_emit_repeat_encodeBetterBlockAsm12B_memmove_move_8through16 - CMPQ R9, $0x20 + CMPQ R8, $0x20 JBE emit_lit_memmove_match_emit_repeat_encodeBetterBlockAsm12B_memmove_move_17through32 JMP emit_lit_memmove_match_emit_repeat_encodeBetterBlockAsm12B_memmove_move_33through64 emit_lit_memmove_match_emit_repeat_encodeBetterBlockAsm12B_memmove_move_4: - MOVL (R10), R11 - MOVL R11, (AX) + MOVL (R9), R10 + MOVL R10, (AX) JMP memmove_end_copy_match_emit_repeat_encodeBetterBlockAsm12B emit_lit_memmove_match_emit_repeat_encodeBetterBlockAsm12B_memmove_move_4through7: - MOVL (R10), R11 - MOVL -4(R10)(R9*1), R10 - MOVL R11, (AX) - MOVL R10, -4(AX)(R9*1) + MOVL (R9), R10 + MOVL -4(R9)(R8*1), R9 + MOVL R10, (AX) + MOVL R9, -4(AX)(R8*1) JMP memmove_end_copy_match_emit_repeat_encodeBetterBlockAsm12B emit_lit_memmove_match_emit_repeat_encodeBetterBlockAsm12B_memmove_move_8through16: - MOVQ (R10), R11 - MOVQ -8(R10)(R9*1), R10 - MOVQ R11, (AX) - MOVQ R10, -8(AX)(R9*1) + MOVQ (R9), R10 + MOVQ -8(R9)(R8*1), R9 + MOVQ R10, (AX) + MOVQ R9, -8(AX)(R8*1) JMP memmove_end_copy_match_emit_repeat_encodeBetterBlockAsm12B emit_lit_memmove_match_emit_repeat_encodeBetterBlockAsm12B_memmove_move_17through32: - MOVOU (R10), X0 - MOVOU -16(R10)(R9*1), X1 + MOVOU (R9), X0 + MOVOU -16(R9)(R8*1), X1 MOVOU X0, (AX) - MOVOU X1, -16(AX)(R9*1) + MOVOU X1, -16(AX)(R8*1) JMP memmove_end_copy_match_emit_repeat_encodeBetterBlockAsm12B emit_lit_memmove_match_emit_repeat_encodeBetterBlockAsm12B_memmove_move_33through64: - MOVOU (R10), X0 - MOVOU 16(R10), X1 - MOVOU -32(R10)(R9*1), X2 - MOVOU -16(R10)(R9*1), X3 + MOVOU (R9), X0 + MOVOU 16(R9), X1 + MOVOU -32(R9)(R8*1), X2 + MOVOU -16(R9)(R8*1), X3 MOVOU X0, (AX) MOVOU X1, 16(AX) - MOVOU X2, -32(AX)(R9*1) - MOVOU X3, -16(AX)(R9*1) + MOVOU X2, -32(AX)(R8*1) + MOVOU X3, -16(AX)(R8*1) memmove_end_copy_match_emit_repeat_encodeBetterBlockAsm12B: - MOVQ SI, AX + MOVQ BX, AX JMP emit_literal_done_match_emit_repeat_encodeBetterBlockAsm12B memmove_long_match_emit_repeat_encodeBetterBlockAsm12B: - LEAQ (AX)(R9*1), SI + LEAQ (AX)(R8*1), BX // genMemMoveLong - MOVOU (R10), X0 - MOVOU 16(R10), X1 - MOVOU -32(R10)(R9*1), X2 - MOVOU -16(R10)(R9*1), X3 - MOVQ R9, R13 - SHRQ $0x05, R13 - MOVQ AX, R11 - ANDL $0x0000001f, R11 - MOVQ $0x00000040, R14 - SUBQ R11, R14 - DECQ R13 + MOVOU (R9), X0 + MOVOU 16(R9), X1 + MOVOU -32(R9)(R8*1), X2 + MOVOU -16(R9)(R8*1), X3 + MOVQ R8, R12 + SHRQ $0x05, R12 + MOVQ AX, R10 + ANDL $0x0000001f, R10 + MOVQ $0x00000040, R13 + SUBQ R10, R13 + DECQ R12 JA emit_lit_memmove_long_match_emit_repeat_encodeBetterBlockAsm12Blarge_forward_sse_loop_32 - LEAQ -32(R10)(R14*1), R11 - LEAQ -32(AX)(R14*1), R15 + LEAQ -32(R9)(R13*1), R10 + LEAQ -32(AX)(R13*1), R14 emit_lit_memmove_long_match_emit_repeat_encodeBetterBlockAsm12Blarge_big_loop_back: - MOVOU (R11), X4 - MOVOU 16(R11), X5 - MOVOA X4, (R15) - MOVOA X5, 16(R15) - ADDQ $0x20, R15 - ADDQ $0x20, R11 + MOVOU (R10), X4 + MOVOU 16(R10), X5 + MOVOA X4, (R14) + MOVOA X5, 16(R14) ADDQ $0x20, R14 - DECQ R13 + ADDQ $0x20, R10 + ADDQ $0x20, R13 + DECQ R12 JNA emit_lit_memmove_long_match_emit_repeat_encodeBetterBlockAsm12Blarge_big_loop_back emit_lit_memmove_long_match_emit_repeat_encodeBetterBlockAsm12Blarge_forward_sse_loop_32: - MOVOU -32(R10)(R14*1), X4 - MOVOU -16(R10)(R14*1), X5 - MOVOA X4, -32(AX)(R14*1) - MOVOA X5, -16(AX)(R14*1) - ADDQ $0x20, R14 - CMPQ R9, R14 + MOVOU -32(R9)(R13*1), X4 + MOVOU 
-16(R9)(R13*1), X5 + MOVOA X4, -32(AX)(R13*1) + MOVOA X5, -16(AX)(R13*1) + ADDQ $0x20, R13 + CMPQ R8, R13 JAE emit_lit_memmove_long_match_emit_repeat_encodeBetterBlockAsm12Blarge_forward_sse_loop_32 MOVOU X0, (AX) MOVOU X1, 16(AX) - MOVOU X2, -32(AX)(R9*1) - MOVOU X3, -16(AX)(R9*1) - MOVQ SI, AX + MOVOU X2, -32(AX)(R8*1) + MOVOU X3, -16(AX)(R8*1) + MOVQ BX, AX emit_literal_done_match_emit_repeat_encodeBetterBlockAsm12B: - ADDL R12, CX - ADDL $0x04, R12 + ADDL R11, CX + ADDL $0x04, R11 MOVL CX, 12(SP) // emitRepeat - MOVL R12, SI - LEAL -4(R12), R12 - CMPL SI, $0x08 + MOVL R11, BX + LEAL -4(R11), R11 + CMPL BX, $0x08 JLE repeat_two_match_nolit_repeat_encodeBetterBlockAsm12B - CMPL SI, $0x0c + CMPL BX, $0x0c JGE cant_repeat_two_offset_match_nolit_repeat_encodeBetterBlockAsm12B - CMPL R8, $0x00000800 + CMPL DI, $0x00000800 JLT repeat_two_offset_match_nolit_repeat_encodeBetterBlockAsm12B cant_repeat_two_offset_match_nolit_repeat_encodeBetterBlockAsm12B: - CMPL R12, $0x00000104 + CMPL R11, $0x00000104 JLT repeat_three_match_nolit_repeat_encodeBetterBlockAsm12B - LEAL -256(R12), R12 + LEAL -256(R11), R11 MOVW $0x0019, (AX) - MOVW R12, 2(AX) + MOVW R11, 2(AX) ADDQ $0x04, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm12B repeat_three_match_nolit_repeat_encodeBetterBlockAsm12B: - LEAL -4(R12), R12 + LEAL -4(R11), R11 MOVW $0x0015, (AX) - MOVB R12, 2(AX) + MOVB R11, 2(AX) ADDQ $0x03, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm12B repeat_two_match_nolit_repeat_encodeBetterBlockAsm12B: - SHLL $0x02, R12 - ORL $0x01, R12 - MOVW R12, (AX) + SHLL $0x02, R11 + ORL $0x01, R11 + MOVW R11, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm12B repeat_two_offset_match_nolit_repeat_encodeBetterBlockAsm12B: - XORQ SI, SI - LEAL 1(SI)(R12*4), R12 - MOVB R8, 1(AX) - SARL $0x08, R8 - SHLL $0x05, R8 - ORL R8, R12 - MOVB R12, (AX) + XORQ BX, BX + LEAL 1(BX)(R11*4), R11 + MOVB DI, 1(AX) + SARL $0x08, DI + SHLL $0x05, DI + ORL DI, R11 + MOVB R11, (AX) ADDQ $0x02, AX match_nolit_emitcopy_end_encodeBetterBlockAsm12B: @@ -8468,50 +8419,50 @@ match_nolit_emitcopy_end_encodeBetterBlockAsm12B: RET match_nolit_dst_ok_encodeBetterBlockAsm12B: - MOVQ $0x0000cf1bbcdcbf9b, SI - MOVQ $0x9e3779b1, R8 - LEAQ 1(DI), DI - LEAQ -2(CX), R9 - MOVQ (DX)(DI*1), R10 - MOVQ 1(DX)(DI*1), R11 - MOVQ (DX)(R9*1), R12 - MOVQ 1(DX)(R9*1), R13 - SHLQ $0x10, R10 - IMULQ SI, R10 - SHRQ $0x32, R10 - SHLQ $0x20, R11 - IMULQ R8, R11 - SHRQ $0x34, R11 - SHLQ $0x10, R12 - IMULQ SI, R12 - SHRQ $0x32, R12 - SHLQ $0x20, R13 - IMULQ R8, R13 - SHRQ $0x34, R13 - LEAQ 1(DI), R8 - LEAQ 1(R9), R14 - MOVL DI, 24(SP)(R10*4) - MOVL R9, 24(SP)(R12*4) - MOVL R8, 65560(SP)(R11*4) - MOVL R14, 65560(SP)(R13*4) - ADDQ $0x01, DI - SUBQ $0x01, R9 + MOVQ $0x0000cf1bbcdcbf9b, BX + MOVQ $0x9e3779b1, DI + LEAQ 1(SI), SI + LEAQ -2(CX), R8 + MOVQ (DX)(SI*1), R9 + MOVQ 1(DX)(SI*1), R10 + MOVQ (DX)(R8*1), R11 + MOVQ 1(DX)(R8*1), R12 + SHLQ $0x10, R9 + IMULQ BX, R9 + SHRQ $0x32, R9 + SHLQ $0x20, R10 + IMULQ DI, R10 + SHRQ $0x34, R10 + SHLQ $0x10, R11 + IMULQ BX, R11 + SHRQ $0x32, R11 + SHLQ $0x20, R12 + IMULQ DI, R12 + SHRQ $0x34, R12 + LEAQ 1(SI), DI + LEAQ 1(R8), R13 + MOVL SI, 24(SP)(R9*4) + MOVL R8, 24(SP)(R11*4) + MOVL DI, 65560(SP)(R10*4) + MOVL R13, 65560(SP)(R12*4) + ADDQ $0x01, SI + SUBQ $0x01, R8 index_loop_encodeBetterBlockAsm12B: - CMPQ DI, R9 + CMPQ SI, R8 JAE search_loop_encodeBetterBlockAsm12B - MOVQ (DX)(DI*1), R8 - MOVQ (DX)(R9*1), R10 - SHLQ $0x10, R8 - IMULQ SI, R8 - SHRQ $0x32, R8 - SHLQ $0x10, R10 - IMULQ SI, R10 - SHRQ $0x32, 
R10 - MOVL DI, 24(SP)(R8*4) - MOVL R9, 24(SP)(R10*4) - ADDQ $0x02, DI - SUBQ $0x02, R9 + MOVQ (DX)(SI*1), DI + MOVQ (DX)(R8*1), R9 + SHLQ $0x10, DI + IMULQ BX, DI + SHRQ $0x32, DI + SHLQ $0x10, R9 + IMULQ BX, R9 + SHRQ $0x32, R9 + MOVL SI, 24(SP)(DI*4) + MOVL R8, 24(SP)(R9*4) + ADDQ $0x02, SI + SUBQ $0x02, R8 JMP index_loop_encodeBetterBlockAsm12B emit_remainder_encodeBetterBlockAsm12B: @@ -8694,8 +8645,8 @@ zero_loop_encodeBetterBlockAsm10B: MOVL $0x00000000, 12(SP) MOVQ src_len+32(FP), CX LEAQ -6(CX), DX - LEAQ -8(CX), SI - MOVL SI, 8(SP) + LEAQ -8(CX), BX + MOVL BX, 8(SP) SHRQ $0x05, CX SUBL CX, DX LEAQ (AX)(DX*1), DX @@ -8705,601 +8656,599 @@ zero_loop_encodeBetterBlockAsm10B: MOVQ src_base+24(FP), DX search_loop_encodeBetterBlockAsm10B: - MOVL CX, SI - SUBL 12(SP), SI - SHRL $0x05, SI - LEAL 1(CX)(SI*1), SI - CMPL SI, 8(SP) + MOVL CX, BX + SUBL 12(SP), BX + SHRL $0x05, BX + LEAL 1(CX)(BX*1), BX + CMPL BX, 8(SP) JGE emit_remainder_encodeBetterBlockAsm10B - MOVQ (DX)(CX*1), DI - MOVL SI, 20(SP) - MOVQ $0x0000cf1bbcdcbf9b, R9 - MOVQ $0x9e3779b1, SI - MOVQ DI, R10 - MOVQ DI, R11 - SHLQ $0x10, R10 - IMULQ R9, R10 - SHRQ $0x34, R10 - SHLQ $0x20, R11 - IMULQ SI, R11 - SHRQ $0x36, R11 - MOVL 24(SP)(R10*4), SI - MOVL 16408(SP)(R11*4), R8 - MOVL CX, 24(SP)(R10*4) - MOVL CX, 16408(SP)(R11*4) - MOVQ (DX)(SI*1), R10 - MOVQ (DX)(R8*1), R11 - CMPQ R10, DI + MOVQ (DX)(CX*1), SI + MOVL BX, 20(SP) + MOVQ $0x0000cf1bbcdcbf9b, R8 + MOVQ $0x9e3779b1, BX + MOVQ SI, R9 + MOVQ SI, R10 + SHLQ $0x10, R9 + IMULQ R8, R9 + SHRQ $0x34, R9 + SHLQ $0x20, R10 + IMULQ BX, R10 + SHRQ $0x36, R10 + MOVL 24(SP)(R9*4), BX + MOVL 16408(SP)(R10*4), DI + MOVL CX, 24(SP)(R9*4) + MOVL CX, 16408(SP)(R10*4) + MOVQ (DX)(BX*1), R9 + MOVQ (DX)(DI*1), R10 + CMPQ R9, SI JEQ candidate_match_encodeBetterBlockAsm10B - CMPQ R11, DI + CMPQ R10, SI JNE no_short_found_encodeBetterBlockAsm10B - MOVL R8, SI + MOVL DI, BX JMP candidate_match_encodeBetterBlockAsm10B no_short_found_encodeBetterBlockAsm10B: - CMPL R10, DI + CMPL R9, SI JEQ candidate_match_encodeBetterBlockAsm10B - CMPL R11, DI + CMPL R10, SI JEQ candidateS_match_encodeBetterBlockAsm10B MOVL 20(SP), CX JMP search_loop_encodeBetterBlockAsm10B candidateS_match_encodeBetterBlockAsm10B: - SHRQ $0x08, DI - MOVQ DI, R10 - SHLQ $0x10, R10 - IMULQ R9, R10 - SHRQ $0x34, R10 - MOVL 24(SP)(R10*4), SI + SHRQ $0x08, SI + MOVQ SI, R9 + SHLQ $0x10, R9 + IMULQ R8, R9 + SHRQ $0x34, R9 + MOVL 24(SP)(R9*4), BX INCL CX - MOVL CX, 24(SP)(R10*4) - CMPL (DX)(SI*1), DI + MOVL CX, 24(SP)(R9*4) + CMPL (DX)(BX*1), SI JEQ candidate_match_encodeBetterBlockAsm10B DECL CX - MOVL R8, SI + MOVL DI, BX candidate_match_encodeBetterBlockAsm10B: - MOVL 12(SP), DI - TESTL SI, SI + MOVL 12(SP), SI + TESTL BX, BX JZ match_extend_back_end_encodeBetterBlockAsm10B match_extend_back_loop_encodeBetterBlockAsm10B: - CMPL CX, DI + CMPL CX, SI JLE match_extend_back_end_encodeBetterBlockAsm10B - MOVB -1(DX)(SI*1), BL + MOVB -1(DX)(BX*1), DI MOVB -1(DX)(CX*1), R8 - CMPB BL, R8 + CMPB DI, R8 JNE match_extend_back_end_encodeBetterBlockAsm10B LEAL -1(CX), CX - DECL SI + DECL BX JZ match_extend_back_end_encodeBetterBlockAsm10B JMP match_extend_back_loop_encodeBetterBlockAsm10B match_extend_back_end_encodeBetterBlockAsm10B: - MOVL CX, DI - SUBL 12(SP), DI - LEAQ 3(AX)(DI*1), DI - CMPQ DI, (SP) + MOVL CX, SI + SUBL 12(SP), SI + LEAQ 3(AX)(SI*1), SI + CMPQ SI, (SP) JL match_dst_size_check_encodeBetterBlockAsm10B MOVQ $0x00000000, ret+48(FP) RET match_dst_size_check_encodeBetterBlockAsm10B: - MOVL CX, DI + MOVL CX, SI ADDL $0x04, CX - 
ADDL $0x04, SI - MOVQ src_len+32(FP), R8 - SUBL CX, R8 - LEAQ (DX)(CX*1), R9 - LEAQ (DX)(SI*1), R10 + ADDL $0x04, BX + MOVQ src_len+32(FP), DI + SUBL CX, DI + LEAQ (DX)(CX*1), R8 + LEAQ (DX)(BX*1), R9 // matchLen - XORL R12, R12 - CMPL R8, $0x08 + XORL R11, R11 + CMPL DI, $0x08 JL matchlen_match4_match_nolit_encodeBetterBlockAsm10B matchlen_loopback_match_nolit_encodeBetterBlockAsm10B: - MOVQ (R9)(R12*1), R11 - XORQ (R10)(R12*1), R11 - TESTQ R11, R11 + MOVQ (R8)(R11*1), R10 + XORQ (R9)(R11*1), R10 + TESTQ R10, R10 JZ matchlen_loop_match_nolit_encodeBetterBlockAsm10B #ifdef GOAMD64_v3 - TZCNTQ R11, R11 + TZCNTQ R10, R10 #else - BSFQ R11, R11 + BSFQ R10, R10 #endif - SARQ $0x03, R11 - LEAL (R12)(R11*1), R12 + SARQ $0x03, R10 + LEAL (R11)(R10*1), R11 JMP match_nolit_end_encodeBetterBlockAsm10B matchlen_loop_match_nolit_encodeBetterBlockAsm10B: - LEAL -8(R8), R8 - LEAL 8(R12), R12 - CMPL R8, $0x08 + LEAL -8(DI), DI + LEAL 8(R11), R11 + CMPL DI, $0x08 JGE matchlen_loopback_match_nolit_encodeBetterBlockAsm10B JZ match_nolit_end_encodeBetterBlockAsm10B matchlen_match4_match_nolit_encodeBetterBlockAsm10B: - CMPL R8, $0x04 + CMPL DI, $0x04 JL matchlen_match2_match_nolit_encodeBetterBlockAsm10B - MOVL (R9)(R12*1), R11 - CMPL (R10)(R12*1), R11 + MOVL (R8)(R11*1), R10 + CMPL (R9)(R11*1), R10 JNE matchlen_match2_match_nolit_encodeBetterBlockAsm10B - SUBL $0x04, R8 - LEAL 4(R12), R12 + SUBL $0x04, DI + LEAL 4(R11), R11 matchlen_match2_match_nolit_encodeBetterBlockAsm10B: - CMPL R8, $0x02 + CMPL DI, $0x02 JL matchlen_match1_match_nolit_encodeBetterBlockAsm10B - MOVW (R9)(R12*1), R11 - CMPW (R10)(R12*1), R11 + MOVW (R8)(R11*1), R10 + CMPW (R9)(R11*1), R10 JNE matchlen_match1_match_nolit_encodeBetterBlockAsm10B - SUBL $0x02, R8 - LEAL 2(R12), R12 + SUBL $0x02, DI + LEAL 2(R11), R11 matchlen_match1_match_nolit_encodeBetterBlockAsm10B: - CMPL R8, $0x01 + CMPL DI, $0x01 JL match_nolit_end_encodeBetterBlockAsm10B - MOVB (R9)(R12*1), R11 - CMPB (R10)(R12*1), R11 + MOVB (R8)(R11*1), R10 + CMPB (R9)(R11*1), R10 JNE match_nolit_end_encodeBetterBlockAsm10B - LEAL 1(R12), R12 + LEAL 1(R11), R11 match_nolit_end_encodeBetterBlockAsm10B: - MOVL CX, R8 - SUBL SI, R8 + MOVL CX, DI + SUBL BX, DI // Check if repeat - CMPL 16(SP), R8 + CMPL 16(SP), DI JEQ match_is_repeat_encodeBetterBlockAsm10B - MOVL R8, 16(SP) - MOVL 12(SP), SI - CMPL SI, DI + MOVL DI, 16(SP) + MOVL 12(SP), BX + CMPL BX, SI JEQ emit_literal_done_match_emit_encodeBetterBlockAsm10B - MOVL DI, R9 - MOVL DI, 12(SP) - LEAQ (DX)(SI*1), R10 - SUBL SI, R9 - LEAL -1(R9), SI - CMPL SI, $0x3c + MOVL SI, R8 + MOVL SI, 12(SP) + LEAQ (DX)(BX*1), R9 + SUBL BX, R8 + LEAL -1(R8), BX + CMPL BX, $0x3c JLT one_byte_match_emit_encodeBetterBlockAsm10B - CMPL SI, $0x00000100 + CMPL BX, $0x00000100 JLT two_bytes_match_emit_encodeBetterBlockAsm10B MOVB $0xf4, (AX) - MOVW SI, 1(AX) + MOVW BX, 1(AX) ADDQ $0x03, AX JMP memmove_long_match_emit_encodeBetterBlockAsm10B two_bytes_match_emit_encodeBetterBlockAsm10B: MOVB $0xf0, (AX) - MOVB SI, 1(AX) + MOVB BL, 1(AX) ADDQ $0x02, AX - CMPL SI, $0x40 + CMPL BX, $0x40 JL memmove_match_emit_encodeBetterBlockAsm10B JMP memmove_long_match_emit_encodeBetterBlockAsm10B one_byte_match_emit_encodeBetterBlockAsm10B: - SHLB $0x02, SI - MOVB SI, (AX) + SHLB $0x02, BL + MOVB BL, (AX) ADDQ $0x01, AX memmove_match_emit_encodeBetterBlockAsm10B: - LEAQ (AX)(R9*1), SI + LEAQ (AX)(R8*1), BX // genMemMoveShort - CMPQ R9, $0x04 + CMPQ R8, $0x04 JLE emit_lit_memmove_match_emit_encodeBetterBlockAsm10B_memmove_move_4 - CMPQ R9, $0x08 + CMPQ R8, $0x08 JB 
emit_lit_memmove_match_emit_encodeBetterBlockAsm10B_memmove_move_4through7 - CMPQ R9, $0x10 + CMPQ R8, $0x10 JBE emit_lit_memmove_match_emit_encodeBetterBlockAsm10B_memmove_move_8through16 - CMPQ R9, $0x20 + CMPQ R8, $0x20 JBE emit_lit_memmove_match_emit_encodeBetterBlockAsm10B_memmove_move_17through32 JMP emit_lit_memmove_match_emit_encodeBetterBlockAsm10B_memmove_move_33through64 emit_lit_memmove_match_emit_encodeBetterBlockAsm10B_memmove_move_4: - MOVL (R10), R11 - MOVL R11, (AX) + MOVL (R9), R10 + MOVL R10, (AX) JMP memmove_end_copy_match_emit_encodeBetterBlockAsm10B emit_lit_memmove_match_emit_encodeBetterBlockAsm10B_memmove_move_4through7: - MOVL (R10), R11 - MOVL -4(R10)(R9*1), R10 - MOVL R11, (AX) - MOVL R10, -4(AX)(R9*1) + MOVL (R9), R10 + MOVL -4(R9)(R8*1), R9 + MOVL R10, (AX) + MOVL R9, -4(AX)(R8*1) JMP memmove_end_copy_match_emit_encodeBetterBlockAsm10B emit_lit_memmove_match_emit_encodeBetterBlockAsm10B_memmove_move_8through16: - MOVQ (R10), R11 - MOVQ -8(R10)(R9*1), R10 - MOVQ R11, (AX) - MOVQ R10, -8(AX)(R9*1) + MOVQ (R9), R10 + MOVQ -8(R9)(R8*1), R9 + MOVQ R10, (AX) + MOVQ R9, -8(AX)(R8*1) JMP memmove_end_copy_match_emit_encodeBetterBlockAsm10B emit_lit_memmove_match_emit_encodeBetterBlockAsm10B_memmove_move_17through32: - MOVOU (R10), X0 - MOVOU -16(R10)(R9*1), X1 + MOVOU (R9), X0 + MOVOU -16(R9)(R8*1), X1 MOVOU X0, (AX) - MOVOU X1, -16(AX)(R9*1) + MOVOU X1, -16(AX)(R8*1) JMP memmove_end_copy_match_emit_encodeBetterBlockAsm10B emit_lit_memmove_match_emit_encodeBetterBlockAsm10B_memmove_move_33through64: - MOVOU (R10), X0 - MOVOU 16(R10), X1 - MOVOU -32(R10)(R9*1), X2 - MOVOU -16(R10)(R9*1), X3 + MOVOU (R9), X0 + MOVOU 16(R9), X1 + MOVOU -32(R9)(R8*1), X2 + MOVOU -16(R9)(R8*1), X3 MOVOU X0, (AX) MOVOU X1, 16(AX) - MOVOU X2, -32(AX)(R9*1) - MOVOU X3, -16(AX)(R9*1) + MOVOU X2, -32(AX)(R8*1) + MOVOU X3, -16(AX)(R8*1) memmove_end_copy_match_emit_encodeBetterBlockAsm10B: - MOVQ SI, AX + MOVQ BX, AX JMP emit_literal_done_match_emit_encodeBetterBlockAsm10B memmove_long_match_emit_encodeBetterBlockAsm10B: - LEAQ (AX)(R9*1), SI + LEAQ (AX)(R8*1), BX // genMemMoveLong - MOVOU (R10), X0 - MOVOU 16(R10), X1 - MOVOU -32(R10)(R9*1), X2 - MOVOU -16(R10)(R9*1), X3 - MOVQ R9, R13 - SHRQ $0x05, R13 - MOVQ AX, R11 - ANDL $0x0000001f, R11 - MOVQ $0x00000040, R14 - SUBQ R11, R14 - DECQ R13 + MOVOU (R9), X0 + MOVOU 16(R9), X1 + MOVOU -32(R9)(R8*1), X2 + MOVOU -16(R9)(R8*1), X3 + MOVQ R8, R12 + SHRQ $0x05, R12 + MOVQ AX, R10 + ANDL $0x0000001f, R10 + MOVQ $0x00000040, R13 + SUBQ R10, R13 + DECQ R12 JA emit_lit_memmove_long_match_emit_encodeBetterBlockAsm10Blarge_forward_sse_loop_32 - LEAQ -32(R10)(R14*1), R11 - LEAQ -32(AX)(R14*1), R15 + LEAQ -32(R9)(R13*1), R10 + LEAQ -32(AX)(R13*1), R14 emit_lit_memmove_long_match_emit_encodeBetterBlockAsm10Blarge_big_loop_back: - MOVOU (R11), X4 - MOVOU 16(R11), X5 - MOVOA X4, (R15) - MOVOA X5, 16(R15) - ADDQ $0x20, R15 - ADDQ $0x20, R11 + MOVOU (R10), X4 + MOVOU 16(R10), X5 + MOVOA X4, (R14) + MOVOA X5, 16(R14) ADDQ $0x20, R14 - DECQ R13 + ADDQ $0x20, R10 + ADDQ $0x20, R13 + DECQ R12 JNA emit_lit_memmove_long_match_emit_encodeBetterBlockAsm10Blarge_big_loop_back emit_lit_memmove_long_match_emit_encodeBetterBlockAsm10Blarge_forward_sse_loop_32: - MOVOU -32(R10)(R14*1), X4 - MOVOU -16(R10)(R14*1), X5 - MOVOA X4, -32(AX)(R14*1) - MOVOA X5, -16(AX)(R14*1) - ADDQ $0x20, R14 - CMPQ R9, R14 + MOVOU -32(R9)(R13*1), X4 + MOVOU -16(R9)(R13*1), X5 + MOVOA X4, -32(AX)(R13*1) + MOVOA X5, -16(AX)(R13*1) + ADDQ $0x20, R13 + CMPQ R8, R13 JAE 
emit_lit_memmove_long_match_emit_encodeBetterBlockAsm10Blarge_forward_sse_loop_32 MOVOU X0, (AX) MOVOU X1, 16(AX) - MOVOU X2, -32(AX)(R9*1) - MOVOU X3, -16(AX)(R9*1) - MOVQ SI, AX + MOVOU X2, -32(AX)(R8*1) + MOVOU X3, -16(AX)(R8*1) + MOVQ BX, AX emit_literal_done_match_emit_encodeBetterBlockAsm10B: - ADDL R12, CX - ADDL $0x04, R12 + ADDL R11, CX + ADDL $0x04, R11 MOVL CX, 12(SP) // emitCopy -two_byte_offset_match_nolit_encodeBetterBlockAsm10B: - CMPL R12, $0x40 + CMPL R11, $0x40 JLE two_byte_offset_short_match_nolit_encodeBetterBlockAsm10B - CMPL R8, $0x00000800 + CMPL DI, $0x00000800 JAE long_offset_short_match_nolit_encodeBetterBlockAsm10B - MOVL $0x00000001, SI - LEAL 16(SI), SI - MOVB R8, 1(AX) - SHRL $0x08, R8 - SHLL $0x05, R8 - ORL R8, SI - MOVB SI, (AX) + MOVL $0x00000001, BX + LEAL 16(BX), BX + MOVB DI, 1(AX) + SHRL $0x08, DI + SHLL $0x05, DI + ORL DI, BX + MOVB BL, (AX) ADDQ $0x02, AX - SUBL $0x08, R12 + SUBL $0x08, R11 // emitRepeat - LEAL -4(R12), R12 + LEAL -4(R11), R11 JMP cant_repeat_two_offset_match_nolit_encodeBetterBlockAsm10B_emit_copy_short_2b - MOVL R12, SI - LEAL -4(R12), R12 - CMPL SI, $0x08 + MOVL R11, BX + LEAL -4(R11), R11 + CMPL BX, $0x08 JLE repeat_two_match_nolit_encodeBetterBlockAsm10B_emit_copy_short_2b - CMPL SI, $0x0c + CMPL BX, $0x0c JGE cant_repeat_two_offset_match_nolit_encodeBetterBlockAsm10B_emit_copy_short_2b - CMPL R8, $0x00000800 + CMPL DI, $0x00000800 JLT repeat_two_offset_match_nolit_encodeBetterBlockAsm10B_emit_copy_short_2b cant_repeat_two_offset_match_nolit_encodeBetterBlockAsm10B_emit_copy_short_2b: - CMPL R12, $0x00000104 + CMPL R11, $0x00000104 JLT repeat_three_match_nolit_encodeBetterBlockAsm10B_emit_copy_short_2b - LEAL -256(R12), R12 + LEAL -256(R11), R11 MOVW $0x0019, (AX) - MOVW R12, 2(AX) + MOVW R11, 2(AX) ADDQ $0x04, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm10B repeat_three_match_nolit_encodeBetterBlockAsm10B_emit_copy_short_2b: - LEAL -4(R12), R12 + LEAL -4(R11), R11 MOVW $0x0015, (AX) - MOVB R12, 2(AX) + MOVB R11, 2(AX) ADDQ $0x03, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm10B repeat_two_match_nolit_encodeBetterBlockAsm10B_emit_copy_short_2b: - SHLL $0x02, R12 - ORL $0x01, R12 - MOVW R12, (AX) + SHLL $0x02, R11 + ORL $0x01, R11 + MOVW R11, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm10B repeat_two_offset_match_nolit_encodeBetterBlockAsm10B_emit_copy_short_2b: - XORQ SI, SI - LEAL 1(SI)(R12*4), R12 - MOVB R8, 1(AX) - SARL $0x08, R8 - SHLL $0x05, R8 - ORL R8, R12 - MOVB R12, (AX) + XORQ BX, BX + LEAL 1(BX)(R11*4), R11 + MOVB DI, 1(AX) + SARL $0x08, DI + SHLL $0x05, DI + ORL DI, R11 + MOVB R11, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm10B long_offset_short_match_nolit_encodeBetterBlockAsm10B: MOVB $0xee, (AX) - MOVW R8, 1(AX) - LEAL -60(R12), R12 + MOVW DI, 1(AX) + LEAL -60(R11), R11 ADDQ $0x03, AX // emitRepeat - MOVL R12, SI - LEAL -4(R12), R12 - CMPL SI, $0x08 + MOVL R11, BX + LEAL -4(R11), R11 + CMPL BX, $0x08 JLE repeat_two_match_nolit_encodeBetterBlockAsm10B_emit_copy_short - CMPL SI, $0x0c + CMPL BX, $0x0c JGE cant_repeat_two_offset_match_nolit_encodeBetterBlockAsm10B_emit_copy_short - CMPL R8, $0x00000800 + CMPL DI, $0x00000800 JLT repeat_two_offset_match_nolit_encodeBetterBlockAsm10B_emit_copy_short cant_repeat_two_offset_match_nolit_encodeBetterBlockAsm10B_emit_copy_short: - CMPL R12, $0x00000104 + CMPL R11, $0x00000104 JLT repeat_three_match_nolit_encodeBetterBlockAsm10B_emit_copy_short - LEAL -256(R12), R12 + LEAL -256(R11), R11 MOVW $0x0019, (AX) - 
MOVW R12, 2(AX) + MOVW R11, 2(AX) ADDQ $0x04, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm10B repeat_three_match_nolit_encodeBetterBlockAsm10B_emit_copy_short: - LEAL -4(R12), R12 + LEAL -4(R11), R11 MOVW $0x0015, (AX) - MOVB R12, 2(AX) + MOVB R11, 2(AX) ADDQ $0x03, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm10B repeat_two_match_nolit_encodeBetterBlockAsm10B_emit_copy_short: - SHLL $0x02, R12 - ORL $0x01, R12 - MOVW R12, (AX) + SHLL $0x02, R11 + ORL $0x01, R11 + MOVW R11, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm10B repeat_two_offset_match_nolit_encodeBetterBlockAsm10B_emit_copy_short: - XORQ SI, SI - LEAL 1(SI)(R12*4), R12 - MOVB R8, 1(AX) - SARL $0x08, R8 - SHLL $0x05, R8 - ORL R8, R12 - MOVB R12, (AX) + XORQ BX, BX + LEAL 1(BX)(R11*4), R11 + MOVB DI, 1(AX) + SARL $0x08, DI + SHLL $0x05, DI + ORL DI, R11 + MOVB R11, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm10B - JMP two_byte_offset_match_nolit_encodeBetterBlockAsm10B two_byte_offset_short_match_nolit_encodeBetterBlockAsm10B: - CMPL R12, $0x0c + MOVL R11, BX + SHLL $0x02, BX + CMPL R11, $0x0c JGE emit_copy_three_match_nolit_encodeBetterBlockAsm10B - CMPL R8, $0x00000800 + CMPL DI, $0x00000800 JGE emit_copy_three_match_nolit_encodeBetterBlockAsm10B - MOVB $0x01, BL - LEAL -16(BX)(R12*4), R12 - MOVB R8, 1(AX) - SHRL $0x08, R8 - SHLL $0x05, R8 - ORL R8, R12 - MOVB R12, (AX) + LEAL -15(BX), BX + MOVB DI, 1(AX) + SHRL $0x08, DI + SHLL $0x05, DI + ORL DI, BX + MOVB BL, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm10B emit_copy_three_match_nolit_encodeBetterBlockAsm10B: - MOVB $0x02, BL - LEAL -4(BX)(R12*4), R12 - MOVB R12, (AX) - MOVW R8, 1(AX) + LEAL -2(BX), BX + MOVB BL, (AX) + MOVW DI, 1(AX) ADDQ $0x03, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm10B match_is_repeat_encodeBetterBlockAsm10B: - MOVL 12(SP), SI - CMPL SI, DI + MOVL 12(SP), BX + CMPL BX, SI JEQ emit_literal_done_match_emit_repeat_encodeBetterBlockAsm10B - MOVL DI, R9 - MOVL DI, 12(SP) - LEAQ (DX)(SI*1), R10 - SUBL SI, R9 - LEAL -1(R9), SI - CMPL SI, $0x3c + MOVL SI, R8 + MOVL SI, 12(SP) + LEAQ (DX)(BX*1), R9 + SUBL BX, R8 + LEAL -1(R8), BX + CMPL BX, $0x3c JLT one_byte_match_emit_repeat_encodeBetterBlockAsm10B - CMPL SI, $0x00000100 + CMPL BX, $0x00000100 JLT two_bytes_match_emit_repeat_encodeBetterBlockAsm10B MOVB $0xf4, (AX) - MOVW SI, 1(AX) + MOVW BX, 1(AX) ADDQ $0x03, AX JMP memmove_long_match_emit_repeat_encodeBetterBlockAsm10B two_bytes_match_emit_repeat_encodeBetterBlockAsm10B: MOVB $0xf0, (AX) - MOVB SI, 1(AX) + MOVB BL, 1(AX) ADDQ $0x02, AX - CMPL SI, $0x40 + CMPL BX, $0x40 JL memmove_match_emit_repeat_encodeBetterBlockAsm10B JMP memmove_long_match_emit_repeat_encodeBetterBlockAsm10B one_byte_match_emit_repeat_encodeBetterBlockAsm10B: - SHLB $0x02, SI - MOVB SI, (AX) + SHLB $0x02, BL + MOVB BL, (AX) ADDQ $0x01, AX memmove_match_emit_repeat_encodeBetterBlockAsm10B: - LEAQ (AX)(R9*1), SI + LEAQ (AX)(R8*1), BX // genMemMoveShort - CMPQ R9, $0x04 + CMPQ R8, $0x04 JLE emit_lit_memmove_match_emit_repeat_encodeBetterBlockAsm10B_memmove_move_4 - CMPQ R9, $0x08 + CMPQ R8, $0x08 JB emit_lit_memmove_match_emit_repeat_encodeBetterBlockAsm10B_memmove_move_4through7 - CMPQ R9, $0x10 + CMPQ R8, $0x10 JBE emit_lit_memmove_match_emit_repeat_encodeBetterBlockAsm10B_memmove_move_8through16 - CMPQ R9, $0x20 + CMPQ R8, $0x20 JBE emit_lit_memmove_match_emit_repeat_encodeBetterBlockAsm10B_memmove_move_17through32 JMP 
emit_lit_memmove_match_emit_repeat_encodeBetterBlockAsm10B_memmove_move_33through64 emit_lit_memmove_match_emit_repeat_encodeBetterBlockAsm10B_memmove_move_4: - MOVL (R10), R11 - MOVL R11, (AX) + MOVL (R9), R10 + MOVL R10, (AX) JMP memmove_end_copy_match_emit_repeat_encodeBetterBlockAsm10B emit_lit_memmove_match_emit_repeat_encodeBetterBlockAsm10B_memmove_move_4through7: - MOVL (R10), R11 - MOVL -4(R10)(R9*1), R10 - MOVL R11, (AX) - MOVL R10, -4(AX)(R9*1) + MOVL (R9), R10 + MOVL -4(R9)(R8*1), R9 + MOVL R10, (AX) + MOVL R9, -4(AX)(R8*1) JMP memmove_end_copy_match_emit_repeat_encodeBetterBlockAsm10B emit_lit_memmove_match_emit_repeat_encodeBetterBlockAsm10B_memmove_move_8through16: - MOVQ (R10), R11 - MOVQ -8(R10)(R9*1), R10 - MOVQ R11, (AX) - MOVQ R10, -8(AX)(R9*1) + MOVQ (R9), R10 + MOVQ -8(R9)(R8*1), R9 + MOVQ R10, (AX) + MOVQ R9, -8(AX)(R8*1) JMP memmove_end_copy_match_emit_repeat_encodeBetterBlockAsm10B emit_lit_memmove_match_emit_repeat_encodeBetterBlockAsm10B_memmove_move_17through32: - MOVOU (R10), X0 - MOVOU -16(R10)(R9*1), X1 + MOVOU (R9), X0 + MOVOU -16(R9)(R8*1), X1 MOVOU X0, (AX) - MOVOU X1, -16(AX)(R9*1) + MOVOU X1, -16(AX)(R8*1) JMP memmove_end_copy_match_emit_repeat_encodeBetterBlockAsm10B emit_lit_memmove_match_emit_repeat_encodeBetterBlockAsm10B_memmove_move_33through64: - MOVOU (R10), X0 - MOVOU 16(R10), X1 - MOVOU -32(R10)(R9*1), X2 - MOVOU -16(R10)(R9*1), X3 + MOVOU (R9), X0 + MOVOU 16(R9), X1 + MOVOU -32(R9)(R8*1), X2 + MOVOU -16(R9)(R8*1), X3 MOVOU X0, (AX) MOVOU X1, 16(AX) - MOVOU X2, -32(AX)(R9*1) - MOVOU X3, -16(AX)(R9*1) + MOVOU X2, -32(AX)(R8*1) + MOVOU X3, -16(AX)(R8*1) memmove_end_copy_match_emit_repeat_encodeBetterBlockAsm10B: - MOVQ SI, AX + MOVQ BX, AX JMP emit_literal_done_match_emit_repeat_encodeBetterBlockAsm10B memmove_long_match_emit_repeat_encodeBetterBlockAsm10B: - LEAQ (AX)(R9*1), SI + LEAQ (AX)(R8*1), BX // genMemMoveLong - MOVOU (R10), X0 - MOVOU 16(R10), X1 - MOVOU -32(R10)(R9*1), X2 - MOVOU -16(R10)(R9*1), X3 - MOVQ R9, R13 - SHRQ $0x05, R13 - MOVQ AX, R11 - ANDL $0x0000001f, R11 - MOVQ $0x00000040, R14 - SUBQ R11, R14 - DECQ R13 + MOVOU (R9), X0 + MOVOU 16(R9), X1 + MOVOU -32(R9)(R8*1), X2 + MOVOU -16(R9)(R8*1), X3 + MOVQ R8, R12 + SHRQ $0x05, R12 + MOVQ AX, R10 + ANDL $0x0000001f, R10 + MOVQ $0x00000040, R13 + SUBQ R10, R13 + DECQ R12 JA emit_lit_memmove_long_match_emit_repeat_encodeBetterBlockAsm10Blarge_forward_sse_loop_32 - LEAQ -32(R10)(R14*1), R11 - LEAQ -32(AX)(R14*1), R15 + LEAQ -32(R9)(R13*1), R10 + LEAQ -32(AX)(R13*1), R14 emit_lit_memmove_long_match_emit_repeat_encodeBetterBlockAsm10Blarge_big_loop_back: - MOVOU (R11), X4 - MOVOU 16(R11), X5 - MOVOA X4, (R15) - MOVOA X5, 16(R15) - ADDQ $0x20, R15 - ADDQ $0x20, R11 + MOVOU (R10), X4 + MOVOU 16(R10), X5 + MOVOA X4, (R14) + MOVOA X5, 16(R14) ADDQ $0x20, R14 - DECQ R13 + ADDQ $0x20, R10 + ADDQ $0x20, R13 + DECQ R12 JNA emit_lit_memmove_long_match_emit_repeat_encodeBetterBlockAsm10Blarge_big_loop_back emit_lit_memmove_long_match_emit_repeat_encodeBetterBlockAsm10Blarge_forward_sse_loop_32: - MOVOU -32(R10)(R14*1), X4 - MOVOU -16(R10)(R14*1), X5 - MOVOA X4, -32(AX)(R14*1) - MOVOA X5, -16(AX)(R14*1) - ADDQ $0x20, R14 - CMPQ R9, R14 + MOVOU -32(R9)(R13*1), X4 + MOVOU -16(R9)(R13*1), X5 + MOVOA X4, -32(AX)(R13*1) + MOVOA X5, -16(AX)(R13*1) + ADDQ $0x20, R13 + CMPQ R8, R13 JAE emit_lit_memmove_long_match_emit_repeat_encodeBetterBlockAsm10Blarge_forward_sse_loop_32 MOVOU X0, (AX) MOVOU X1, 16(AX) - MOVOU X2, -32(AX)(R9*1) - MOVOU X3, -16(AX)(R9*1) - MOVQ SI, AX + MOVOU X2, -32(AX)(R8*1) + MOVOU 
X3, -16(AX)(R8*1) + MOVQ BX, AX emit_literal_done_match_emit_repeat_encodeBetterBlockAsm10B: - ADDL R12, CX - ADDL $0x04, R12 + ADDL R11, CX + ADDL $0x04, R11 MOVL CX, 12(SP) // emitRepeat - MOVL R12, SI - LEAL -4(R12), R12 - CMPL SI, $0x08 + MOVL R11, BX + LEAL -4(R11), R11 + CMPL BX, $0x08 JLE repeat_two_match_nolit_repeat_encodeBetterBlockAsm10B - CMPL SI, $0x0c + CMPL BX, $0x0c JGE cant_repeat_two_offset_match_nolit_repeat_encodeBetterBlockAsm10B - CMPL R8, $0x00000800 + CMPL DI, $0x00000800 JLT repeat_two_offset_match_nolit_repeat_encodeBetterBlockAsm10B cant_repeat_two_offset_match_nolit_repeat_encodeBetterBlockAsm10B: - CMPL R12, $0x00000104 + CMPL R11, $0x00000104 JLT repeat_three_match_nolit_repeat_encodeBetterBlockAsm10B - LEAL -256(R12), R12 + LEAL -256(R11), R11 MOVW $0x0019, (AX) - MOVW R12, 2(AX) + MOVW R11, 2(AX) ADDQ $0x04, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm10B repeat_three_match_nolit_repeat_encodeBetterBlockAsm10B: - LEAL -4(R12), R12 + LEAL -4(R11), R11 MOVW $0x0015, (AX) - MOVB R12, 2(AX) + MOVB R11, 2(AX) ADDQ $0x03, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm10B repeat_two_match_nolit_repeat_encodeBetterBlockAsm10B: - SHLL $0x02, R12 - ORL $0x01, R12 - MOVW R12, (AX) + SHLL $0x02, R11 + ORL $0x01, R11 + MOVW R11, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm10B repeat_two_offset_match_nolit_repeat_encodeBetterBlockAsm10B: - XORQ SI, SI - LEAL 1(SI)(R12*4), R12 - MOVB R8, 1(AX) - SARL $0x08, R8 - SHLL $0x05, R8 - ORL R8, R12 - MOVB R12, (AX) + XORQ BX, BX + LEAL 1(BX)(R11*4), R11 + MOVB DI, 1(AX) + SARL $0x08, DI + SHLL $0x05, DI + ORL DI, R11 + MOVB R11, (AX) ADDQ $0x02, AX match_nolit_emitcopy_end_encodeBetterBlockAsm10B: @@ -9311,50 +9260,50 @@ match_nolit_emitcopy_end_encodeBetterBlockAsm10B: RET match_nolit_dst_ok_encodeBetterBlockAsm10B: - MOVQ $0x0000cf1bbcdcbf9b, SI - MOVQ $0x9e3779b1, R8 - LEAQ 1(DI), DI - LEAQ -2(CX), R9 - MOVQ (DX)(DI*1), R10 - MOVQ 1(DX)(DI*1), R11 - MOVQ (DX)(R9*1), R12 - MOVQ 1(DX)(R9*1), R13 - SHLQ $0x10, R10 - IMULQ SI, R10 - SHRQ $0x34, R10 - SHLQ $0x20, R11 - IMULQ R8, R11 - SHRQ $0x36, R11 - SHLQ $0x10, R12 - IMULQ SI, R12 - SHRQ $0x34, R12 - SHLQ $0x20, R13 - IMULQ R8, R13 - SHRQ $0x36, R13 - LEAQ 1(DI), R8 - LEAQ 1(R9), R14 - MOVL DI, 24(SP)(R10*4) - MOVL R9, 24(SP)(R12*4) - MOVL R8, 16408(SP)(R11*4) - MOVL R14, 16408(SP)(R13*4) - ADDQ $0x01, DI - SUBQ $0x01, R9 + MOVQ $0x0000cf1bbcdcbf9b, BX + MOVQ $0x9e3779b1, DI + LEAQ 1(SI), SI + LEAQ -2(CX), R8 + MOVQ (DX)(SI*1), R9 + MOVQ 1(DX)(SI*1), R10 + MOVQ (DX)(R8*1), R11 + MOVQ 1(DX)(R8*1), R12 + SHLQ $0x10, R9 + IMULQ BX, R9 + SHRQ $0x34, R9 + SHLQ $0x20, R10 + IMULQ DI, R10 + SHRQ $0x36, R10 + SHLQ $0x10, R11 + IMULQ BX, R11 + SHRQ $0x34, R11 + SHLQ $0x20, R12 + IMULQ DI, R12 + SHRQ $0x36, R12 + LEAQ 1(SI), DI + LEAQ 1(R8), R13 + MOVL SI, 24(SP)(R9*4) + MOVL R8, 24(SP)(R11*4) + MOVL DI, 16408(SP)(R10*4) + MOVL R13, 16408(SP)(R12*4) + ADDQ $0x01, SI + SUBQ $0x01, R8 index_loop_encodeBetterBlockAsm10B: - CMPQ DI, R9 + CMPQ SI, R8 JAE search_loop_encodeBetterBlockAsm10B - MOVQ (DX)(DI*1), R8 - MOVQ (DX)(R9*1), R10 - SHLQ $0x10, R8 - IMULQ SI, R8 - SHRQ $0x34, R8 - SHLQ $0x10, R10 - IMULQ SI, R10 - SHRQ $0x34, R10 - MOVL DI, 24(SP)(R8*4) - MOVL R9, 24(SP)(R10*4) - ADDQ $0x02, DI - SUBQ $0x02, R9 + MOVQ (DX)(SI*1), DI + MOVQ (DX)(R8*1), R9 + SHLQ $0x10, DI + IMULQ BX, DI + SHRQ $0x34, DI + SHLQ $0x10, R9 + IMULQ BX, R9 + SHRQ $0x34, R9 + MOVL SI, 24(SP)(DI*4) + MOVL R8, 24(SP)(R9*4) + ADDQ $0x02, SI + SUBQ $0x02, R8 JMP 
index_loop_encodeBetterBlockAsm10B emit_remainder_encodeBetterBlockAsm10B: @@ -9537,8 +9486,8 @@ zero_loop_encodeBetterBlockAsm8B: MOVL $0x00000000, 12(SP) MOVQ src_len+32(FP), CX LEAQ -6(CX), DX - LEAQ -8(CX), SI - MOVL SI, 8(SP) + LEAQ -8(CX), BX + MOVL BX, 8(SP) SHRQ $0x05, CX SUBL CX, DX LEAQ (AX)(DX*1), DX @@ -9548,587 +9497,585 @@ zero_loop_encodeBetterBlockAsm8B: MOVQ src_base+24(FP), DX search_loop_encodeBetterBlockAsm8B: - MOVL CX, SI - SUBL 12(SP), SI - SHRL $0x04, SI - LEAL 1(CX)(SI*1), SI - CMPL SI, 8(SP) + MOVL CX, BX + SUBL 12(SP), BX + SHRL $0x04, BX + LEAL 1(CX)(BX*1), BX + CMPL BX, 8(SP) JGE emit_remainder_encodeBetterBlockAsm8B - MOVQ (DX)(CX*1), DI - MOVL SI, 20(SP) - MOVQ $0x0000cf1bbcdcbf9b, R9 - MOVQ $0x9e3779b1, SI - MOVQ DI, R10 - MOVQ DI, R11 - SHLQ $0x10, R10 - IMULQ R9, R10 - SHRQ $0x36, R10 - SHLQ $0x20, R11 - IMULQ SI, R11 - SHRQ $0x38, R11 - MOVL 24(SP)(R10*4), SI - MOVL 4120(SP)(R11*4), R8 - MOVL CX, 24(SP)(R10*4) - MOVL CX, 4120(SP)(R11*4) - MOVQ (DX)(SI*1), R10 - MOVQ (DX)(R8*1), R11 - CMPQ R10, DI + MOVQ (DX)(CX*1), SI + MOVL BX, 20(SP) + MOVQ $0x0000cf1bbcdcbf9b, R8 + MOVQ $0x9e3779b1, BX + MOVQ SI, R9 + MOVQ SI, R10 + SHLQ $0x10, R9 + IMULQ R8, R9 + SHRQ $0x36, R9 + SHLQ $0x20, R10 + IMULQ BX, R10 + SHRQ $0x38, R10 + MOVL 24(SP)(R9*4), BX + MOVL 4120(SP)(R10*4), DI + MOVL CX, 24(SP)(R9*4) + MOVL CX, 4120(SP)(R10*4) + MOVQ (DX)(BX*1), R9 + MOVQ (DX)(DI*1), R10 + CMPQ R9, SI JEQ candidate_match_encodeBetterBlockAsm8B - CMPQ R11, DI + CMPQ R10, SI JNE no_short_found_encodeBetterBlockAsm8B - MOVL R8, SI + MOVL DI, BX JMP candidate_match_encodeBetterBlockAsm8B no_short_found_encodeBetterBlockAsm8B: - CMPL R10, DI + CMPL R9, SI JEQ candidate_match_encodeBetterBlockAsm8B - CMPL R11, DI + CMPL R10, SI JEQ candidateS_match_encodeBetterBlockAsm8B MOVL 20(SP), CX JMP search_loop_encodeBetterBlockAsm8B candidateS_match_encodeBetterBlockAsm8B: - SHRQ $0x08, DI - MOVQ DI, R10 - SHLQ $0x10, R10 - IMULQ R9, R10 - SHRQ $0x36, R10 - MOVL 24(SP)(R10*4), SI + SHRQ $0x08, SI + MOVQ SI, R9 + SHLQ $0x10, R9 + IMULQ R8, R9 + SHRQ $0x36, R9 + MOVL 24(SP)(R9*4), BX INCL CX - MOVL CX, 24(SP)(R10*4) - CMPL (DX)(SI*1), DI + MOVL CX, 24(SP)(R9*4) + CMPL (DX)(BX*1), SI JEQ candidate_match_encodeBetterBlockAsm8B DECL CX - MOVL R8, SI + MOVL DI, BX candidate_match_encodeBetterBlockAsm8B: - MOVL 12(SP), DI - TESTL SI, SI + MOVL 12(SP), SI + TESTL BX, BX JZ match_extend_back_end_encodeBetterBlockAsm8B match_extend_back_loop_encodeBetterBlockAsm8B: - CMPL CX, DI + CMPL CX, SI JLE match_extend_back_end_encodeBetterBlockAsm8B - MOVB -1(DX)(SI*1), BL + MOVB -1(DX)(BX*1), DI MOVB -1(DX)(CX*1), R8 - CMPB BL, R8 + CMPB DI, R8 JNE match_extend_back_end_encodeBetterBlockAsm8B LEAL -1(CX), CX - DECL SI + DECL BX JZ match_extend_back_end_encodeBetterBlockAsm8B JMP match_extend_back_loop_encodeBetterBlockAsm8B match_extend_back_end_encodeBetterBlockAsm8B: - MOVL CX, DI - SUBL 12(SP), DI - LEAQ 3(AX)(DI*1), DI - CMPQ DI, (SP) + MOVL CX, SI + SUBL 12(SP), SI + LEAQ 3(AX)(SI*1), SI + CMPQ SI, (SP) JL match_dst_size_check_encodeBetterBlockAsm8B MOVQ $0x00000000, ret+48(FP) RET match_dst_size_check_encodeBetterBlockAsm8B: - MOVL CX, DI + MOVL CX, SI ADDL $0x04, CX - ADDL $0x04, SI - MOVQ src_len+32(FP), R8 - SUBL CX, R8 - LEAQ (DX)(CX*1), R9 - LEAQ (DX)(SI*1), R10 - - // matchLen - XORL R12, R12 - CMPL R8, $0x08 + ADDL $0x04, BX + MOVQ src_len+32(FP), DI + SUBL CX, DI + LEAQ (DX)(CX*1), R8 + LEAQ (DX)(BX*1), R9 + + // matchLen + XORL R11, R11 + CMPL DI, $0x08 JL 
matchlen_match4_match_nolit_encodeBetterBlockAsm8B matchlen_loopback_match_nolit_encodeBetterBlockAsm8B: - MOVQ (R9)(R12*1), R11 - XORQ (R10)(R12*1), R11 - TESTQ R11, R11 + MOVQ (R8)(R11*1), R10 + XORQ (R9)(R11*1), R10 + TESTQ R10, R10 JZ matchlen_loop_match_nolit_encodeBetterBlockAsm8B #ifdef GOAMD64_v3 - TZCNTQ R11, R11 + TZCNTQ R10, R10 #else - BSFQ R11, R11 + BSFQ R10, R10 #endif - SARQ $0x03, R11 - LEAL (R12)(R11*1), R12 + SARQ $0x03, R10 + LEAL (R11)(R10*1), R11 JMP match_nolit_end_encodeBetterBlockAsm8B matchlen_loop_match_nolit_encodeBetterBlockAsm8B: - LEAL -8(R8), R8 - LEAL 8(R12), R12 - CMPL R8, $0x08 + LEAL -8(DI), DI + LEAL 8(R11), R11 + CMPL DI, $0x08 JGE matchlen_loopback_match_nolit_encodeBetterBlockAsm8B JZ match_nolit_end_encodeBetterBlockAsm8B matchlen_match4_match_nolit_encodeBetterBlockAsm8B: - CMPL R8, $0x04 + CMPL DI, $0x04 JL matchlen_match2_match_nolit_encodeBetterBlockAsm8B - MOVL (R9)(R12*1), R11 - CMPL (R10)(R12*1), R11 + MOVL (R8)(R11*1), R10 + CMPL (R9)(R11*1), R10 JNE matchlen_match2_match_nolit_encodeBetterBlockAsm8B - SUBL $0x04, R8 - LEAL 4(R12), R12 + SUBL $0x04, DI + LEAL 4(R11), R11 matchlen_match2_match_nolit_encodeBetterBlockAsm8B: - CMPL R8, $0x02 + CMPL DI, $0x02 JL matchlen_match1_match_nolit_encodeBetterBlockAsm8B - MOVW (R9)(R12*1), R11 - CMPW (R10)(R12*1), R11 + MOVW (R8)(R11*1), R10 + CMPW (R9)(R11*1), R10 JNE matchlen_match1_match_nolit_encodeBetterBlockAsm8B - SUBL $0x02, R8 - LEAL 2(R12), R12 + SUBL $0x02, DI + LEAL 2(R11), R11 matchlen_match1_match_nolit_encodeBetterBlockAsm8B: - CMPL R8, $0x01 + CMPL DI, $0x01 JL match_nolit_end_encodeBetterBlockAsm8B - MOVB (R9)(R12*1), R11 - CMPB (R10)(R12*1), R11 + MOVB (R8)(R11*1), R10 + CMPB (R9)(R11*1), R10 JNE match_nolit_end_encodeBetterBlockAsm8B - LEAL 1(R12), R12 + LEAL 1(R11), R11 match_nolit_end_encodeBetterBlockAsm8B: - MOVL CX, R8 - SUBL SI, R8 + MOVL CX, DI + SUBL BX, DI // Check if repeat - CMPL 16(SP), R8 + CMPL 16(SP), DI JEQ match_is_repeat_encodeBetterBlockAsm8B - MOVL R8, 16(SP) - MOVL 12(SP), SI - CMPL SI, DI + MOVL DI, 16(SP) + MOVL 12(SP), BX + CMPL BX, SI JEQ emit_literal_done_match_emit_encodeBetterBlockAsm8B - MOVL DI, R9 - MOVL DI, 12(SP) - LEAQ (DX)(SI*1), R10 - SUBL SI, R9 - LEAL -1(R9), SI - CMPL SI, $0x3c + MOVL SI, R8 + MOVL SI, 12(SP) + LEAQ (DX)(BX*1), R9 + SUBL BX, R8 + LEAL -1(R8), BX + CMPL BX, $0x3c JLT one_byte_match_emit_encodeBetterBlockAsm8B - CMPL SI, $0x00000100 + CMPL BX, $0x00000100 JLT two_bytes_match_emit_encodeBetterBlockAsm8B MOVB $0xf4, (AX) - MOVW SI, 1(AX) + MOVW BX, 1(AX) ADDQ $0x03, AX JMP memmove_long_match_emit_encodeBetterBlockAsm8B two_bytes_match_emit_encodeBetterBlockAsm8B: MOVB $0xf0, (AX) - MOVB SI, 1(AX) + MOVB BL, 1(AX) ADDQ $0x02, AX - CMPL SI, $0x40 + CMPL BX, $0x40 JL memmove_match_emit_encodeBetterBlockAsm8B JMP memmove_long_match_emit_encodeBetterBlockAsm8B one_byte_match_emit_encodeBetterBlockAsm8B: - SHLB $0x02, SI - MOVB SI, (AX) + SHLB $0x02, BL + MOVB BL, (AX) ADDQ $0x01, AX memmove_match_emit_encodeBetterBlockAsm8B: - LEAQ (AX)(R9*1), SI + LEAQ (AX)(R8*1), BX // genMemMoveShort - CMPQ R9, $0x04 + CMPQ R8, $0x04 JLE emit_lit_memmove_match_emit_encodeBetterBlockAsm8B_memmove_move_4 - CMPQ R9, $0x08 + CMPQ R8, $0x08 JB emit_lit_memmove_match_emit_encodeBetterBlockAsm8B_memmove_move_4through7 - CMPQ R9, $0x10 + CMPQ R8, $0x10 JBE emit_lit_memmove_match_emit_encodeBetterBlockAsm8B_memmove_move_8through16 - CMPQ R9, $0x20 + CMPQ R8, $0x20 JBE emit_lit_memmove_match_emit_encodeBetterBlockAsm8B_memmove_move_17through32 JMP 
emit_lit_memmove_match_emit_encodeBetterBlockAsm8B_memmove_move_33through64 emit_lit_memmove_match_emit_encodeBetterBlockAsm8B_memmove_move_4: - MOVL (R10), R11 - MOVL R11, (AX) + MOVL (R9), R10 + MOVL R10, (AX) JMP memmove_end_copy_match_emit_encodeBetterBlockAsm8B emit_lit_memmove_match_emit_encodeBetterBlockAsm8B_memmove_move_4through7: - MOVL (R10), R11 - MOVL -4(R10)(R9*1), R10 - MOVL R11, (AX) - MOVL R10, -4(AX)(R9*1) + MOVL (R9), R10 + MOVL -4(R9)(R8*1), R9 + MOVL R10, (AX) + MOVL R9, -4(AX)(R8*1) JMP memmove_end_copy_match_emit_encodeBetterBlockAsm8B emit_lit_memmove_match_emit_encodeBetterBlockAsm8B_memmove_move_8through16: - MOVQ (R10), R11 - MOVQ -8(R10)(R9*1), R10 - MOVQ R11, (AX) - MOVQ R10, -8(AX)(R9*1) + MOVQ (R9), R10 + MOVQ -8(R9)(R8*1), R9 + MOVQ R10, (AX) + MOVQ R9, -8(AX)(R8*1) JMP memmove_end_copy_match_emit_encodeBetterBlockAsm8B emit_lit_memmove_match_emit_encodeBetterBlockAsm8B_memmove_move_17through32: - MOVOU (R10), X0 - MOVOU -16(R10)(R9*1), X1 + MOVOU (R9), X0 + MOVOU -16(R9)(R8*1), X1 MOVOU X0, (AX) - MOVOU X1, -16(AX)(R9*1) + MOVOU X1, -16(AX)(R8*1) JMP memmove_end_copy_match_emit_encodeBetterBlockAsm8B emit_lit_memmove_match_emit_encodeBetterBlockAsm8B_memmove_move_33through64: - MOVOU (R10), X0 - MOVOU 16(R10), X1 - MOVOU -32(R10)(R9*1), X2 - MOVOU -16(R10)(R9*1), X3 + MOVOU (R9), X0 + MOVOU 16(R9), X1 + MOVOU -32(R9)(R8*1), X2 + MOVOU -16(R9)(R8*1), X3 MOVOU X0, (AX) MOVOU X1, 16(AX) - MOVOU X2, -32(AX)(R9*1) - MOVOU X3, -16(AX)(R9*1) + MOVOU X2, -32(AX)(R8*1) + MOVOU X3, -16(AX)(R8*1) memmove_end_copy_match_emit_encodeBetterBlockAsm8B: - MOVQ SI, AX + MOVQ BX, AX JMP emit_literal_done_match_emit_encodeBetterBlockAsm8B memmove_long_match_emit_encodeBetterBlockAsm8B: - LEAQ (AX)(R9*1), SI + LEAQ (AX)(R8*1), BX // genMemMoveLong - MOVOU (R10), X0 - MOVOU 16(R10), X1 - MOVOU -32(R10)(R9*1), X2 - MOVOU -16(R10)(R9*1), X3 - MOVQ R9, R13 - SHRQ $0x05, R13 - MOVQ AX, R11 - ANDL $0x0000001f, R11 - MOVQ $0x00000040, R14 - SUBQ R11, R14 - DECQ R13 + MOVOU (R9), X0 + MOVOU 16(R9), X1 + MOVOU -32(R9)(R8*1), X2 + MOVOU -16(R9)(R8*1), X3 + MOVQ R8, R12 + SHRQ $0x05, R12 + MOVQ AX, R10 + ANDL $0x0000001f, R10 + MOVQ $0x00000040, R13 + SUBQ R10, R13 + DECQ R12 JA emit_lit_memmove_long_match_emit_encodeBetterBlockAsm8Blarge_forward_sse_loop_32 - LEAQ -32(R10)(R14*1), R11 - LEAQ -32(AX)(R14*1), R15 + LEAQ -32(R9)(R13*1), R10 + LEAQ -32(AX)(R13*1), R14 emit_lit_memmove_long_match_emit_encodeBetterBlockAsm8Blarge_big_loop_back: - MOVOU (R11), X4 - MOVOU 16(R11), X5 - MOVOA X4, (R15) - MOVOA X5, 16(R15) - ADDQ $0x20, R15 - ADDQ $0x20, R11 + MOVOU (R10), X4 + MOVOU 16(R10), X5 + MOVOA X4, (R14) + MOVOA X5, 16(R14) ADDQ $0x20, R14 - DECQ R13 + ADDQ $0x20, R10 + ADDQ $0x20, R13 + DECQ R12 JNA emit_lit_memmove_long_match_emit_encodeBetterBlockAsm8Blarge_big_loop_back emit_lit_memmove_long_match_emit_encodeBetterBlockAsm8Blarge_forward_sse_loop_32: - MOVOU -32(R10)(R14*1), X4 - MOVOU -16(R10)(R14*1), X5 - MOVOA X4, -32(AX)(R14*1) - MOVOA X5, -16(AX)(R14*1) - ADDQ $0x20, R14 - CMPQ R9, R14 + MOVOU -32(R9)(R13*1), X4 + MOVOU -16(R9)(R13*1), X5 + MOVOA X4, -32(AX)(R13*1) + MOVOA X5, -16(AX)(R13*1) + ADDQ $0x20, R13 + CMPQ R8, R13 JAE emit_lit_memmove_long_match_emit_encodeBetterBlockAsm8Blarge_forward_sse_loop_32 MOVOU X0, (AX) MOVOU X1, 16(AX) - MOVOU X2, -32(AX)(R9*1) - MOVOU X3, -16(AX)(R9*1) - MOVQ SI, AX + MOVOU X2, -32(AX)(R8*1) + MOVOU X3, -16(AX)(R8*1) + MOVQ BX, AX emit_literal_done_match_emit_encodeBetterBlockAsm8B: - ADDL R12, CX - ADDL $0x04, R12 + ADDL R11, CX + ADDL 
$0x04, R11 MOVL CX, 12(SP) // emitCopy -two_byte_offset_match_nolit_encodeBetterBlockAsm8B: - CMPL R12, $0x40 + CMPL R11, $0x40 JLE two_byte_offset_short_match_nolit_encodeBetterBlockAsm8B - CMPL R8, $0x00000800 + CMPL DI, $0x00000800 JAE long_offset_short_match_nolit_encodeBetterBlockAsm8B - MOVL $0x00000001, SI - LEAL 16(SI), SI - MOVB R8, 1(AX) - SHRL $0x08, R8 - SHLL $0x05, R8 - ORL R8, SI - MOVB SI, (AX) + MOVL $0x00000001, BX + LEAL 16(BX), BX + MOVB DI, 1(AX) + SHRL $0x08, DI + SHLL $0x05, DI + ORL DI, BX + MOVB BL, (AX) ADDQ $0x02, AX - SUBL $0x08, R12 + SUBL $0x08, R11 // emitRepeat - LEAL -4(R12), R12 + LEAL -4(R11), R11 JMP cant_repeat_two_offset_match_nolit_encodeBetterBlockAsm8B_emit_copy_short_2b - MOVL R12, SI - LEAL -4(R12), R12 - CMPL SI, $0x08 + MOVL R11, BX + LEAL -4(R11), R11 + CMPL BX, $0x08 JLE repeat_two_match_nolit_encodeBetterBlockAsm8B_emit_copy_short_2b - CMPL SI, $0x0c + CMPL BX, $0x0c JGE cant_repeat_two_offset_match_nolit_encodeBetterBlockAsm8B_emit_copy_short_2b cant_repeat_two_offset_match_nolit_encodeBetterBlockAsm8B_emit_copy_short_2b: - CMPL R12, $0x00000104 + CMPL R11, $0x00000104 JLT repeat_three_match_nolit_encodeBetterBlockAsm8B_emit_copy_short_2b - LEAL -256(R12), R12 + LEAL -256(R11), R11 MOVW $0x0019, (AX) - MOVW R12, 2(AX) + MOVW R11, 2(AX) ADDQ $0x04, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm8B repeat_three_match_nolit_encodeBetterBlockAsm8B_emit_copy_short_2b: - LEAL -4(R12), R12 + LEAL -4(R11), R11 MOVW $0x0015, (AX) - MOVB R12, 2(AX) + MOVB R11, 2(AX) ADDQ $0x03, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm8B repeat_two_match_nolit_encodeBetterBlockAsm8B_emit_copy_short_2b: - SHLL $0x02, R12 - ORL $0x01, R12 - MOVW R12, (AX) + SHLL $0x02, R11 + ORL $0x01, R11 + MOVW R11, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm8B - XORQ SI, SI - LEAL 1(SI)(R12*4), R12 - MOVB R8, 1(AX) - SARL $0x08, R8 - SHLL $0x05, R8 - ORL R8, R12 - MOVB R12, (AX) + XORQ BX, BX + LEAL 1(BX)(R11*4), R11 + MOVB DI, 1(AX) + SARL $0x08, DI + SHLL $0x05, DI + ORL DI, R11 + MOVB R11, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm8B long_offset_short_match_nolit_encodeBetterBlockAsm8B: MOVB $0xee, (AX) - MOVW R8, 1(AX) - LEAL -60(R12), R12 + MOVW DI, 1(AX) + LEAL -60(R11), R11 ADDQ $0x03, AX // emitRepeat - MOVL R12, SI - LEAL -4(R12), R12 - CMPL SI, $0x08 + MOVL R11, BX + LEAL -4(R11), R11 + CMPL BX, $0x08 JLE repeat_two_match_nolit_encodeBetterBlockAsm8B_emit_copy_short - CMPL SI, $0x0c + CMPL BX, $0x0c JGE cant_repeat_two_offset_match_nolit_encodeBetterBlockAsm8B_emit_copy_short cant_repeat_two_offset_match_nolit_encodeBetterBlockAsm8B_emit_copy_short: - CMPL R12, $0x00000104 + CMPL R11, $0x00000104 JLT repeat_three_match_nolit_encodeBetterBlockAsm8B_emit_copy_short - LEAL -256(R12), R12 + LEAL -256(R11), R11 MOVW $0x0019, (AX) - MOVW R12, 2(AX) + MOVW R11, 2(AX) ADDQ $0x04, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm8B repeat_three_match_nolit_encodeBetterBlockAsm8B_emit_copy_short: - LEAL -4(R12), R12 + LEAL -4(R11), R11 MOVW $0x0015, (AX) - MOVB R12, 2(AX) + MOVB R11, 2(AX) ADDQ $0x03, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm8B repeat_two_match_nolit_encodeBetterBlockAsm8B_emit_copy_short: - SHLL $0x02, R12 - ORL $0x01, R12 - MOVW R12, (AX) + SHLL $0x02, R11 + ORL $0x01, R11 + MOVW R11, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm8B - XORQ SI, SI - LEAL 1(SI)(R12*4), R12 - MOVB R8, 1(AX) - SARL $0x08, R8 - SHLL $0x05, R8 - ORL R8, R12 - MOVB R12, (AX) + XORQ BX, 
BX + LEAL 1(BX)(R11*4), R11 + MOVB DI, 1(AX) + SARL $0x08, DI + SHLL $0x05, DI + ORL DI, R11 + MOVB R11, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm8B - JMP two_byte_offset_match_nolit_encodeBetterBlockAsm8B two_byte_offset_short_match_nolit_encodeBetterBlockAsm8B: - CMPL R12, $0x0c + MOVL R11, BX + SHLL $0x02, BX + CMPL R11, $0x0c JGE emit_copy_three_match_nolit_encodeBetterBlockAsm8B - MOVB $0x01, BL - LEAL -16(BX)(R12*4), R12 - MOVB R8, 1(AX) - SHRL $0x08, R8 - SHLL $0x05, R8 - ORL R8, R12 - MOVB R12, (AX) + LEAL -15(BX), BX + MOVB DI, 1(AX) + SHRL $0x08, DI + SHLL $0x05, DI + ORL DI, BX + MOVB BL, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm8B emit_copy_three_match_nolit_encodeBetterBlockAsm8B: - MOVB $0x02, BL - LEAL -4(BX)(R12*4), R12 - MOVB R12, (AX) - MOVW R8, 1(AX) + LEAL -2(BX), BX + MOVB BL, (AX) + MOVW DI, 1(AX) ADDQ $0x03, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm8B match_is_repeat_encodeBetterBlockAsm8B: - MOVL 12(SP), SI - CMPL SI, DI + MOVL 12(SP), BX + CMPL BX, SI JEQ emit_literal_done_match_emit_repeat_encodeBetterBlockAsm8B - MOVL DI, R8 - MOVL DI, 12(SP) - LEAQ (DX)(SI*1), R9 - SUBL SI, R8 - LEAL -1(R8), SI - CMPL SI, $0x3c + MOVL SI, DI + MOVL SI, 12(SP) + LEAQ (DX)(BX*1), R8 + SUBL BX, DI + LEAL -1(DI), BX + CMPL BX, $0x3c JLT one_byte_match_emit_repeat_encodeBetterBlockAsm8B - CMPL SI, $0x00000100 + CMPL BX, $0x00000100 JLT two_bytes_match_emit_repeat_encodeBetterBlockAsm8B MOVB $0xf4, (AX) - MOVW SI, 1(AX) + MOVW BX, 1(AX) ADDQ $0x03, AX JMP memmove_long_match_emit_repeat_encodeBetterBlockAsm8B two_bytes_match_emit_repeat_encodeBetterBlockAsm8B: MOVB $0xf0, (AX) - MOVB SI, 1(AX) + MOVB BL, 1(AX) ADDQ $0x02, AX - CMPL SI, $0x40 + CMPL BX, $0x40 JL memmove_match_emit_repeat_encodeBetterBlockAsm8B JMP memmove_long_match_emit_repeat_encodeBetterBlockAsm8B one_byte_match_emit_repeat_encodeBetterBlockAsm8B: - SHLB $0x02, SI - MOVB SI, (AX) + SHLB $0x02, BL + MOVB BL, (AX) ADDQ $0x01, AX memmove_match_emit_repeat_encodeBetterBlockAsm8B: - LEAQ (AX)(R8*1), SI + LEAQ (AX)(DI*1), BX // genMemMoveShort - CMPQ R8, $0x04 + CMPQ DI, $0x04 JLE emit_lit_memmove_match_emit_repeat_encodeBetterBlockAsm8B_memmove_move_4 - CMPQ R8, $0x08 + CMPQ DI, $0x08 JB emit_lit_memmove_match_emit_repeat_encodeBetterBlockAsm8B_memmove_move_4through7 - CMPQ R8, $0x10 + CMPQ DI, $0x10 JBE emit_lit_memmove_match_emit_repeat_encodeBetterBlockAsm8B_memmove_move_8through16 - CMPQ R8, $0x20 + CMPQ DI, $0x20 JBE emit_lit_memmove_match_emit_repeat_encodeBetterBlockAsm8B_memmove_move_17through32 JMP emit_lit_memmove_match_emit_repeat_encodeBetterBlockAsm8B_memmove_move_33through64 emit_lit_memmove_match_emit_repeat_encodeBetterBlockAsm8B_memmove_move_4: - MOVL (R9), R10 - MOVL R10, (AX) + MOVL (R8), R9 + MOVL R9, (AX) JMP memmove_end_copy_match_emit_repeat_encodeBetterBlockAsm8B emit_lit_memmove_match_emit_repeat_encodeBetterBlockAsm8B_memmove_move_4through7: - MOVL (R9), R10 - MOVL -4(R9)(R8*1), R9 - MOVL R10, (AX) - MOVL R9, -4(AX)(R8*1) + MOVL (R8), R9 + MOVL -4(R8)(DI*1), R8 + MOVL R9, (AX) + MOVL R8, -4(AX)(DI*1) JMP memmove_end_copy_match_emit_repeat_encodeBetterBlockAsm8B emit_lit_memmove_match_emit_repeat_encodeBetterBlockAsm8B_memmove_move_8through16: - MOVQ (R9), R10 - MOVQ -8(R9)(R8*1), R9 - MOVQ R10, (AX) - MOVQ R9, -8(AX)(R8*1) + MOVQ (R8), R9 + MOVQ -8(R8)(DI*1), R8 + MOVQ R9, (AX) + MOVQ R8, -8(AX)(DI*1) JMP memmove_end_copy_match_emit_repeat_encodeBetterBlockAsm8B 
emit_lit_memmove_match_emit_repeat_encodeBetterBlockAsm8B_memmove_move_17through32: - MOVOU (R9), X0 - MOVOU -16(R9)(R8*1), X1 + MOVOU (R8), X0 + MOVOU -16(R8)(DI*1), X1 MOVOU X0, (AX) - MOVOU X1, -16(AX)(R8*1) + MOVOU X1, -16(AX)(DI*1) JMP memmove_end_copy_match_emit_repeat_encodeBetterBlockAsm8B emit_lit_memmove_match_emit_repeat_encodeBetterBlockAsm8B_memmove_move_33through64: - MOVOU (R9), X0 - MOVOU 16(R9), X1 - MOVOU -32(R9)(R8*1), X2 - MOVOU -16(R9)(R8*1), X3 + MOVOU (R8), X0 + MOVOU 16(R8), X1 + MOVOU -32(R8)(DI*1), X2 + MOVOU -16(R8)(DI*1), X3 MOVOU X0, (AX) MOVOU X1, 16(AX) - MOVOU X2, -32(AX)(R8*1) - MOVOU X3, -16(AX)(R8*1) + MOVOU X2, -32(AX)(DI*1) + MOVOU X3, -16(AX)(DI*1) memmove_end_copy_match_emit_repeat_encodeBetterBlockAsm8B: - MOVQ SI, AX + MOVQ BX, AX JMP emit_literal_done_match_emit_repeat_encodeBetterBlockAsm8B memmove_long_match_emit_repeat_encodeBetterBlockAsm8B: - LEAQ (AX)(R8*1), SI + LEAQ (AX)(DI*1), BX // genMemMoveLong - MOVOU (R9), X0 - MOVOU 16(R9), X1 - MOVOU -32(R9)(R8*1), X2 - MOVOU -16(R9)(R8*1), X3 - MOVQ R8, R11 - SHRQ $0x05, R11 - MOVQ AX, R10 - ANDL $0x0000001f, R10 - MOVQ $0x00000040, R13 - SUBQ R10, R13 - DECQ R11 + MOVOU (R8), X0 + MOVOU 16(R8), X1 + MOVOU -32(R8)(DI*1), X2 + MOVOU -16(R8)(DI*1), X3 + MOVQ DI, R10 + SHRQ $0x05, R10 + MOVQ AX, R9 + ANDL $0x0000001f, R9 + MOVQ $0x00000040, R12 + SUBQ R9, R12 + DECQ R10 JA emit_lit_memmove_long_match_emit_repeat_encodeBetterBlockAsm8Blarge_forward_sse_loop_32 - LEAQ -32(R9)(R13*1), R10 - LEAQ -32(AX)(R13*1), R14 + LEAQ -32(R8)(R12*1), R9 + LEAQ -32(AX)(R12*1), R13 emit_lit_memmove_long_match_emit_repeat_encodeBetterBlockAsm8Blarge_big_loop_back: - MOVOU (R10), X4 - MOVOU 16(R10), X5 - MOVOA X4, (R14) - MOVOA X5, 16(R14) - ADDQ $0x20, R14 - ADDQ $0x20, R10 + MOVOU (R9), X4 + MOVOU 16(R9), X5 + MOVOA X4, (R13) + MOVOA X5, 16(R13) ADDQ $0x20, R13 - DECQ R11 + ADDQ $0x20, R9 + ADDQ $0x20, R12 + DECQ R10 JNA emit_lit_memmove_long_match_emit_repeat_encodeBetterBlockAsm8Blarge_big_loop_back emit_lit_memmove_long_match_emit_repeat_encodeBetterBlockAsm8Blarge_forward_sse_loop_32: - MOVOU -32(R9)(R13*1), X4 - MOVOU -16(R9)(R13*1), X5 - MOVOA X4, -32(AX)(R13*1) - MOVOA X5, -16(AX)(R13*1) - ADDQ $0x20, R13 - CMPQ R8, R13 + MOVOU -32(R8)(R12*1), X4 + MOVOU -16(R8)(R12*1), X5 + MOVOA X4, -32(AX)(R12*1) + MOVOA X5, -16(AX)(R12*1) + ADDQ $0x20, R12 + CMPQ DI, R12 JAE emit_lit_memmove_long_match_emit_repeat_encodeBetterBlockAsm8Blarge_forward_sse_loop_32 MOVOU X0, (AX) MOVOU X1, 16(AX) - MOVOU X2, -32(AX)(R8*1) - MOVOU X3, -16(AX)(R8*1) - MOVQ SI, AX + MOVOU X2, -32(AX)(DI*1) + MOVOU X3, -16(AX)(DI*1) + MOVQ BX, AX emit_literal_done_match_emit_repeat_encodeBetterBlockAsm8B: - ADDL R12, CX - ADDL $0x04, R12 + ADDL R11, CX + ADDL $0x04, R11 MOVL CX, 12(SP) // emitRepeat - MOVL R12, SI - LEAL -4(R12), R12 - CMPL SI, $0x08 + MOVL R11, BX + LEAL -4(R11), R11 + CMPL BX, $0x08 JLE repeat_two_match_nolit_repeat_encodeBetterBlockAsm8B - CMPL SI, $0x0c + CMPL BX, $0x0c JGE cant_repeat_two_offset_match_nolit_repeat_encodeBetterBlockAsm8B cant_repeat_two_offset_match_nolit_repeat_encodeBetterBlockAsm8B: - CMPL R12, $0x00000104 + CMPL R11, $0x00000104 JLT repeat_three_match_nolit_repeat_encodeBetterBlockAsm8B - LEAL -256(R12), R12 + LEAL -256(R11), R11 MOVW $0x0019, (AX) - MOVW R12, 2(AX) + MOVW R11, 2(AX) ADDQ $0x04, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm8B repeat_three_match_nolit_repeat_encodeBetterBlockAsm8B: - LEAL -4(R12), R12 + LEAL -4(R11), R11 MOVW $0x0015, (AX) - MOVB R12, 2(AX) + MOVB R11, 2(AX) ADDQ 
$0x03, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm8B repeat_two_match_nolit_repeat_encodeBetterBlockAsm8B: - SHLL $0x02, R12 - ORL $0x01, R12 - MOVW R12, (AX) + SHLL $0x02, R11 + ORL $0x01, R11 + MOVW R11, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeBetterBlockAsm8B - XORQ SI, SI - LEAL 1(SI)(R12*4), R12 - MOVB R8, 1(AX) - SARL $0x08, R8 - SHLL $0x05, R8 - ORL R8, R12 - MOVB R12, (AX) + XORQ BX, BX + LEAL 1(BX)(R11*4), R11 + MOVB DI, 1(AX) + SARL $0x08, DI + SHLL $0x05, DI + ORL DI, R11 + MOVB R11, (AX) ADDQ $0x02, AX match_nolit_emitcopy_end_encodeBetterBlockAsm8B: @@ -10140,50 +10087,50 @@ match_nolit_emitcopy_end_encodeBetterBlockAsm8B: RET match_nolit_dst_ok_encodeBetterBlockAsm8B: - MOVQ $0x0000cf1bbcdcbf9b, SI - MOVQ $0x9e3779b1, R8 - LEAQ 1(DI), DI - LEAQ -2(CX), R9 - MOVQ (DX)(DI*1), R10 - MOVQ 1(DX)(DI*1), R11 - MOVQ (DX)(R9*1), R12 - MOVQ 1(DX)(R9*1), R13 - SHLQ $0x10, R10 - IMULQ SI, R10 - SHRQ $0x36, R10 - SHLQ $0x20, R11 - IMULQ R8, R11 - SHRQ $0x38, R11 - SHLQ $0x10, R12 - IMULQ SI, R12 - SHRQ $0x36, R12 - SHLQ $0x20, R13 - IMULQ R8, R13 - SHRQ $0x38, R13 - LEAQ 1(DI), R8 - LEAQ 1(R9), R14 - MOVL DI, 24(SP)(R10*4) - MOVL R9, 24(SP)(R12*4) - MOVL R8, 4120(SP)(R11*4) - MOVL R14, 4120(SP)(R13*4) - ADDQ $0x01, DI - SUBQ $0x01, R9 + MOVQ $0x0000cf1bbcdcbf9b, BX + MOVQ $0x9e3779b1, DI + LEAQ 1(SI), SI + LEAQ -2(CX), R8 + MOVQ (DX)(SI*1), R9 + MOVQ 1(DX)(SI*1), R10 + MOVQ (DX)(R8*1), R11 + MOVQ 1(DX)(R8*1), R12 + SHLQ $0x10, R9 + IMULQ BX, R9 + SHRQ $0x36, R9 + SHLQ $0x20, R10 + IMULQ DI, R10 + SHRQ $0x38, R10 + SHLQ $0x10, R11 + IMULQ BX, R11 + SHRQ $0x36, R11 + SHLQ $0x20, R12 + IMULQ DI, R12 + SHRQ $0x38, R12 + LEAQ 1(SI), DI + LEAQ 1(R8), R13 + MOVL SI, 24(SP)(R9*4) + MOVL R8, 24(SP)(R11*4) + MOVL DI, 4120(SP)(R10*4) + MOVL R13, 4120(SP)(R12*4) + ADDQ $0x01, SI + SUBQ $0x01, R8 index_loop_encodeBetterBlockAsm8B: - CMPQ DI, R9 + CMPQ SI, R8 JAE search_loop_encodeBetterBlockAsm8B - MOVQ (DX)(DI*1), R8 - MOVQ (DX)(R9*1), R10 - SHLQ $0x10, R8 - IMULQ SI, R8 - SHRQ $0x36, R8 - SHLQ $0x10, R10 - IMULQ SI, R10 - SHRQ $0x36, R10 - MOVL DI, 24(SP)(R8*4) - MOVL R9, 24(SP)(R10*4) - ADDQ $0x02, DI - SUBQ $0x02, R9 + MOVQ (DX)(SI*1), DI + MOVQ (DX)(R8*1), R9 + SHLQ $0x10, DI + IMULQ BX, DI + SHRQ $0x36, DI + SHLQ $0x10, R9 + IMULQ BX, R9 + SHRQ $0x36, R9 + MOVL SI, 24(SP)(DI*4) + MOVL R8, 24(SP)(R9*4) + ADDQ $0x02, SI + SUBQ $0x02, R8 JMP index_loop_encodeBetterBlockAsm8B emit_remainder_encodeBetterBlockAsm8B: @@ -10366,8 +10313,8 @@ zero_loop_encodeSnappyBlockAsm: MOVL $0x00000000, 12(SP) MOVQ src_len+32(FP), CX LEAQ -9(CX), DX - LEAQ -8(CX), SI - MOVL SI, 8(SP) + LEAQ -8(CX), BX + MOVL BX, 8(SP) SHRQ $0x05, CX SUBL CX, DX LEAQ (AX)(DX*1), DX @@ -10377,321 +10324,321 @@ zero_loop_encodeSnappyBlockAsm: MOVQ src_base+24(FP), DX search_loop_encodeSnappyBlockAsm: - MOVL CX, SI - SUBL 12(SP), SI - SHRL $0x06, SI - LEAL 4(CX)(SI*1), SI - CMPL SI, 8(SP) + MOVL CX, BX + SUBL 12(SP), BX + SHRL $0x06, BX + LEAL 4(CX)(BX*1), BX + CMPL BX, 8(SP) JGE emit_remainder_encodeSnappyBlockAsm - MOVQ (DX)(CX*1), DI - MOVL SI, 20(SP) - MOVQ $0x0000cf1bbcdcbf9b, R9 - MOVQ DI, R10 - MOVQ DI, R11 - SHRQ $0x08, R11 + MOVQ (DX)(CX*1), SI + MOVL BX, 20(SP) + MOVQ $0x0000cf1bbcdcbf9b, R8 + MOVQ SI, R9 + MOVQ SI, R10 + SHRQ $0x08, R10 + SHLQ $0x10, R9 + IMULQ R8, R9 + SHRQ $0x32, R9 SHLQ $0x10, R10 - IMULQ R9, R10 + IMULQ R8, R10 SHRQ $0x32, R10 - SHLQ $0x10, R11 - IMULQ R9, R11 - SHRQ $0x32, R11 - MOVL 24(SP)(R10*4), SI - MOVL 24(SP)(R11*4), R8 - MOVL CX, 24(SP)(R10*4) - LEAL 1(CX), R10 - MOVL R10, 
24(SP)(R11*4) - MOVQ DI, R10 - SHRQ $0x10, R10 - SHLQ $0x10, R10 - IMULQ R9, R10 - SHRQ $0x32, R10 - MOVL CX, R9 - SUBL 16(SP), R9 - MOVL 1(DX)(R9*1), R11 - MOVQ DI, R9 - SHRQ $0x08, R9 - CMPL R9, R11 - JNE no_repeat_found_encodeSnappyBlockAsm - LEAL 1(CX), DI - MOVL 12(SP), SI - MOVL DI, R8 + MOVL 24(SP)(R9*4), BX + MOVL 24(SP)(R10*4), DI + MOVL CX, 24(SP)(R9*4) + LEAL 1(CX), R9 + MOVL R9, 24(SP)(R10*4) + MOVQ SI, R9 + SHRQ $0x10, R9 + SHLQ $0x10, R9 + IMULQ R8, R9 + SHRQ $0x32, R9 + MOVL CX, R8 SUBL 16(SP), R8 + MOVL 1(DX)(R8*1), R10 + MOVQ SI, R8 + SHRQ $0x08, R8 + CMPL R8, R10 + JNE no_repeat_found_encodeSnappyBlockAsm + LEAL 1(CX), SI + MOVL 12(SP), BX + MOVL SI, DI + SUBL 16(SP), DI JZ repeat_extend_back_end_encodeSnappyBlockAsm repeat_extend_back_loop_encodeSnappyBlockAsm: - CMPL DI, SI + CMPL SI, BX JLE repeat_extend_back_end_encodeSnappyBlockAsm - MOVB -1(DX)(R8*1), BL - MOVB -1(DX)(DI*1), R9 - CMPB BL, R9 + MOVB -1(DX)(DI*1), R8 + MOVB -1(DX)(SI*1), R9 + CMPB R8, R9 JNE repeat_extend_back_end_encodeSnappyBlockAsm - LEAL -1(DI), DI - DECL R8 + LEAL -1(SI), SI + DECL DI JNZ repeat_extend_back_loop_encodeSnappyBlockAsm repeat_extend_back_end_encodeSnappyBlockAsm: - MOVL 12(SP), SI - CMPL SI, DI + MOVL 12(SP), BX + CMPL BX, SI JEQ emit_literal_done_repeat_emit_encodeSnappyBlockAsm - MOVL DI, R8 - MOVL DI, 12(SP) - LEAQ (DX)(SI*1), R9 - SUBL SI, R8 - LEAL -1(R8), SI - CMPL SI, $0x3c + MOVL SI, DI + MOVL SI, 12(SP) + LEAQ (DX)(BX*1), R8 + SUBL BX, DI + LEAL -1(DI), BX + CMPL BX, $0x3c JLT one_byte_repeat_emit_encodeSnappyBlockAsm - CMPL SI, $0x00000100 + CMPL BX, $0x00000100 JLT two_bytes_repeat_emit_encodeSnappyBlockAsm - CMPL SI, $0x00010000 + CMPL BX, $0x00010000 JLT three_bytes_repeat_emit_encodeSnappyBlockAsm - CMPL SI, $0x01000000 + CMPL BX, $0x01000000 JLT four_bytes_repeat_emit_encodeSnappyBlockAsm MOVB $0xfc, (AX) - MOVL SI, 1(AX) + MOVL BX, 1(AX) ADDQ $0x05, AX JMP memmove_long_repeat_emit_encodeSnappyBlockAsm four_bytes_repeat_emit_encodeSnappyBlockAsm: - MOVL SI, R10 - SHRL $0x10, R10 + MOVL BX, R9 + SHRL $0x10, R9 MOVB $0xf8, (AX) - MOVW SI, 1(AX) - MOVB R10, 3(AX) + MOVW BX, 1(AX) + MOVB R9, 3(AX) ADDQ $0x04, AX JMP memmove_long_repeat_emit_encodeSnappyBlockAsm three_bytes_repeat_emit_encodeSnappyBlockAsm: MOVB $0xf4, (AX) - MOVW SI, 1(AX) + MOVW BX, 1(AX) ADDQ $0x03, AX JMP memmove_long_repeat_emit_encodeSnappyBlockAsm two_bytes_repeat_emit_encodeSnappyBlockAsm: MOVB $0xf0, (AX) - MOVB SI, 1(AX) + MOVB BL, 1(AX) ADDQ $0x02, AX - CMPL SI, $0x40 + CMPL BX, $0x40 JL memmove_repeat_emit_encodeSnappyBlockAsm JMP memmove_long_repeat_emit_encodeSnappyBlockAsm one_byte_repeat_emit_encodeSnappyBlockAsm: - SHLB $0x02, SI - MOVB SI, (AX) + SHLB $0x02, BL + MOVB BL, (AX) ADDQ $0x01, AX memmove_repeat_emit_encodeSnappyBlockAsm: - LEAQ (AX)(R8*1), SI + LEAQ (AX)(DI*1), BX // genMemMoveShort - CMPQ R8, $0x08 + CMPQ DI, $0x08 JLE emit_lit_memmove_repeat_emit_encodeSnappyBlockAsm_memmove_move_8 - CMPQ R8, $0x10 + CMPQ DI, $0x10 JBE emit_lit_memmove_repeat_emit_encodeSnappyBlockAsm_memmove_move_8through16 - CMPQ R8, $0x20 + CMPQ DI, $0x20 JBE emit_lit_memmove_repeat_emit_encodeSnappyBlockAsm_memmove_move_17through32 JMP emit_lit_memmove_repeat_emit_encodeSnappyBlockAsm_memmove_move_33through64 emit_lit_memmove_repeat_emit_encodeSnappyBlockAsm_memmove_move_8: - MOVQ (R9), R10 - MOVQ R10, (AX) + MOVQ (R8), R9 + MOVQ R9, (AX) JMP memmove_end_copy_repeat_emit_encodeSnappyBlockAsm emit_lit_memmove_repeat_emit_encodeSnappyBlockAsm_memmove_move_8through16: - MOVQ (R9), R10 - MOVQ -8(R9)(R8*1), 
R9 - MOVQ R10, (AX) - MOVQ R9, -8(AX)(R8*1) + MOVQ (R8), R9 + MOVQ -8(R8)(DI*1), R8 + MOVQ R9, (AX) + MOVQ R8, -8(AX)(DI*1) JMP memmove_end_copy_repeat_emit_encodeSnappyBlockAsm emit_lit_memmove_repeat_emit_encodeSnappyBlockAsm_memmove_move_17through32: - MOVOU (R9), X0 - MOVOU -16(R9)(R8*1), X1 + MOVOU (R8), X0 + MOVOU -16(R8)(DI*1), X1 MOVOU X0, (AX) - MOVOU X1, -16(AX)(R8*1) + MOVOU X1, -16(AX)(DI*1) JMP memmove_end_copy_repeat_emit_encodeSnappyBlockAsm emit_lit_memmove_repeat_emit_encodeSnappyBlockAsm_memmove_move_33through64: - MOVOU (R9), X0 - MOVOU 16(R9), X1 - MOVOU -32(R9)(R8*1), X2 - MOVOU -16(R9)(R8*1), X3 + MOVOU (R8), X0 + MOVOU 16(R8), X1 + MOVOU -32(R8)(DI*1), X2 + MOVOU -16(R8)(DI*1), X3 MOVOU X0, (AX) MOVOU X1, 16(AX) - MOVOU X2, -32(AX)(R8*1) - MOVOU X3, -16(AX)(R8*1) + MOVOU X2, -32(AX)(DI*1) + MOVOU X3, -16(AX)(DI*1) memmove_end_copy_repeat_emit_encodeSnappyBlockAsm: - MOVQ SI, AX + MOVQ BX, AX JMP emit_literal_done_repeat_emit_encodeSnappyBlockAsm memmove_long_repeat_emit_encodeSnappyBlockAsm: - LEAQ (AX)(R8*1), SI + LEAQ (AX)(DI*1), BX // genMemMoveLong - MOVOU (R9), X0 - MOVOU 16(R9), X1 - MOVOU -32(R9)(R8*1), X2 - MOVOU -16(R9)(R8*1), X3 - MOVQ R8, R11 - SHRQ $0x05, R11 - MOVQ AX, R10 - ANDL $0x0000001f, R10 - MOVQ $0x00000040, R12 - SUBQ R10, R12 - DECQ R11 + MOVOU (R8), X0 + MOVOU 16(R8), X1 + MOVOU -32(R8)(DI*1), X2 + MOVOU -16(R8)(DI*1), X3 + MOVQ DI, R10 + SHRQ $0x05, R10 + MOVQ AX, R9 + ANDL $0x0000001f, R9 + MOVQ $0x00000040, R11 + SUBQ R9, R11 + DECQ R10 JA emit_lit_memmove_long_repeat_emit_encodeSnappyBlockAsmlarge_forward_sse_loop_32 - LEAQ -32(R9)(R12*1), R10 - LEAQ -32(AX)(R12*1), R13 + LEAQ -32(R8)(R11*1), R9 + LEAQ -32(AX)(R11*1), R12 emit_lit_memmove_long_repeat_emit_encodeSnappyBlockAsmlarge_big_loop_back: - MOVOU (R10), X4 - MOVOU 16(R10), X5 - MOVOA X4, (R13) - MOVOA X5, 16(R13) - ADDQ $0x20, R13 - ADDQ $0x20, R10 + MOVOU (R9), X4 + MOVOU 16(R9), X5 + MOVOA X4, (R12) + MOVOA X5, 16(R12) ADDQ $0x20, R12 - DECQ R11 + ADDQ $0x20, R9 + ADDQ $0x20, R11 + DECQ R10 JNA emit_lit_memmove_long_repeat_emit_encodeSnappyBlockAsmlarge_big_loop_back emit_lit_memmove_long_repeat_emit_encodeSnappyBlockAsmlarge_forward_sse_loop_32: - MOVOU -32(R9)(R12*1), X4 - MOVOU -16(R9)(R12*1), X5 - MOVOA X4, -32(AX)(R12*1) - MOVOA X5, -16(AX)(R12*1) - ADDQ $0x20, R12 - CMPQ R8, R12 + MOVOU -32(R8)(R11*1), X4 + MOVOU -16(R8)(R11*1), X5 + MOVOA X4, -32(AX)(R11*1) + MOVOA X5, -16(AX)(R11*1) + ADDQ $0x20, R11 + CMPQ DI, R11 JAE emit_lit_memmove_long_repeat_emit_encodeSnappyBlockAsmlarge_forward_sse_loop_32 MOVOU X0, (AX) MOVOU X1, 16(AX) - MOVOU X2, -32(AX)(R8*1) - MOVOU X3, -16(AX)(R8*1) - MOVQ SI, AX + MOVOU X2, -32(AX)(DI*1) + MOVOU X3, -16(AX)(DI*1) + MOVQ BX, AX emit_literal_done_repeat_emit_encodeSnappyBlockAsm: ADDL $0x05, CX - MOVL CX, SI - SUBL 16(SP), SI - MOVQ src_len+32(FP), R8 - SUBL CX, R8 - LEAQ (DX)(CX*1), R9 - LEAQ (DX)(SI*1), SI + MOVL CX, BX + SUBL 16(SP), BX + MOVQ src_len+32(FP), DI + SUBL CX, DI + LEAQ (DX)(CX*1), R8 + LEAQ (DX)(BX*1), BX // matchLen - XORL R11, R11 - CMPL R8, $0x08 + XORL R10, R10 + CMPL DI, $0x08 JL matchlen_match4_repeat_extend_encodeSnappyBlockAsm matchlen_loopback_repeat_extend_encodeSnappyBlockAsm: - MOVQ (R9)(R11*1), R10 - XORQ (SI)(R11*1), R10 - TESTQ R10, R10 + MOVQ (R8)(R10*1), R9 + XORQ (BX)(R10*1), R9 + TESTQ R9, R9 JZ matchlen_loop_repeat_extend_encodeSnappyBlockAsm #ifdef GOAMD64_v3 - TZCNTQ R10, R10 + TZCNTQ R9, R9 #else - BSFQ R10, R10 + BSFQ R9, R9 #endif - SARQ $0x03, R10 - LEAL (R11)(R10*1), R11 + SARQ $0x03, R9 + LEAL 
(R10)(R9*1), R10 JMP repeat_extend_forward_end_encodeSnappyBlockAsm matchlen_loop_repeat_extend_encodeSnappyBlockAsm: - LEAL -8(R8), R8 - LEAL 8(R11), R11 - CMPL R8, $0x08 + LEAL -8(DI), DI + LEAL 8(R10), R10 + CMPL DI, $0x08 JGE matchlen_loopback_repeat_extend_encodeSnappyBlockAsm JZ repeat_extend_forward_end_encodeSnappyBlockAsm matchlen_match4_repeat_extend_encodeSnappyBlockAsm: - CMPL R8, $0x04 + CMPL DI, $0x04 JL matchlen_match2_repeat_extend_encodeSnappyBlockAsm - MOVL (R9)(R11*1), R10 - CMPL (SI)(R11*1), R10 + MOVL (R8)(R10*1), R9 + CMPL (BX)(R10*1), R9 JNE matchlen_match2_repeat_extend_encodeSnappyBlockAsm - SUBL $0x04, R8 - LEAL 4(R11), R11 + SUBL $0x04, DI + LEAL 4(R10), R10 matchlen_match2_repeat_extend_encodeSnappyBlockAsm: - CMPL R8, $0x02 + CMPL DI, $0x02 JL matchlen_match1_repeat_extend_encodeSnappyBlockAsm - MOVW (R9)(R11*1), R10 - CMPW (SI)(R11*1), R10 + MOVW (R8)(R10*1), R9 + CMPW (BX)(R10*1), R9 JNE matchlen_match1_repeat_extend_encodeSnappyBlockAsm - SUBL $0x02, R8 - LEAL 2(R11), R11 + SUBL $0x02, DI + LEAL 2(R10), R10 matchlen_match1_repeat_extend_encodeSnappyBlockAsm: - CMPL R8, $0x01 + CMPL DI, $0x01 JL repeat_extend_forward_end_encodeSnappyBlockAsm - MOVB (R9)(R11*1), R10 - CMPB (SI)(R11*1), R10 + MOVB (R8)(R10*1), R9 + CMPB (BX)(R10*1), R9 JNE repeat_extend_forward_end_encodeSnappyBlockAsm - LEAL 1(R11), R11 + LEAL 1(R10), R10 repeat_extend_forward_end_encodeSnappyBlockAsm: - ADDL R11, CX - MOVL CX, SI - SUBL DI, SI - MOVL 16(SP), DI + ADDL R10, CX + MOVL CX, BX + SUBL SI, BX + MOVL 16(SP), SI // emitCopy - CMPL DI, $0x00010000 + CMPL SI, $0x00010000 JL two_byte_offset_repeat_as_copy_encodeSnappyBlockAsm four_bytes_loop_back_repeat_as_copy_encodeSnappyBlockAsm: - CMPL SI, $0x40 + CMPL BX, $0x40 JLE four_bytes_remain_repeat_as_copy_encodeSnappyBlockAsm MOVB $0xff, (AX) - MOVL DI, 1(AX) - LEAL -64(SI), SI + MOVL SI, 1(AX) + LEAL -64(BX), BX ADDQ $0x05, AX - CMPL SI, $0x04 + CMPL BX, $0x04 JL four_bytes_remain_repeat_as_copy_encodeSnappyBlockAsm JMP four_bytes_loop_back_repeat_as_copy_encodeSnappyBlockAsm four_bytes_remain_repeat_as_copy_encodeSnappyBlockAsm: - TESTL SI, SI + TESTL BX, BX JZ repeat_end_emit_encodeSnappyBlockAsm - MOVB $0x03, BL - LEAL -4(BX)(SI*4), SI - MOVB SI, (AX) - MOVL DI, 1(AX) + XORL DI, DI + LEAL -1(DI)(BX*4), BX + MOVB BL, (AX) + MOVL SI, 1(AX) ADDQ $0x05, AX JMP repeat_end_emit_encodeSnappyBlockAsm two_byte_offset_repeat_as_copy_encodeSnappyBlockAsm: - CMPL SI, $0x40 + CMPL BX, $0x40 JLE two_byte_offset_short_repeat_as_copy_encodeSnappyBlockAsm MOVB $0xee, (AX) - MOVW DI, 1(AX) - LEAL -60(SI), SI + MOVW SI, 1(AX) + LEAL -60(BX), BX ADDQ $0x03, AX JMP two_byte_offset_repeat_as_copy_encodeSnappyBlockAsm two_byte_offset_short_repeat_as_copy_encodeSnappyBlockAsm: - CMPL SI, $0x0c + MOVL BX, DI + SHLL $0x02, DI + CMPL BX, $0x0c JGE emit_copy_three_repeat_as_copy_encodeSnappyBlockAsm - CMPL DI, $0x00000800 + CMPL SI, $0x00000800 JGE emit_copy_three_repeat_as_copy_encodeSnappyBlockAsm - MOVB $0x01, BL - LEAL -16(BX)(SI*4), SI - MOVB DI, 1(AX) - SHRL $0x08, DI - SHLL $0x05, DI - ORL DI, SI - MOVB SI, (AX) + LEAL -15(DI), DI + MOVB SI, 1(AX) + SHRL $0x08, SI + SHLL $0x05, SI + ORL SI, DI + MOVB DI, (AX) ADDQ $0x02, AX JMP repeat_end_emit_encodeSnappyBlockAsm emit_copy_three_repeat_as_copy_encodeSnappyBlockAsm: - MOVB $0x02, BL - LEAL -4(BX)(SI*4), SI - MOVB SI, (AX) - MOVW DI, 1(AX) + LEAL -2(DI), DI + MOVB DI, (AX) + MOVW SI, 1(AX) ADDQ $0x03, AX repeat_end_emit_encodeSnappyBlockAsm: @@ -10699,16 +10646,16 @@ repeat_end_emit_encodeSnappyBlockAsm: 
JMP search_loop_encodeSnappyBlockAsm no_repeat_found_encodeSnappyBlockAsm: - CMPL (DX)(SI*1), DI + CMPL (DX)(BX*1), SI JEQ candidate_match_encodeSnappyBlockAsm - SHRQ $0x08, DI - MOVL 24(SP)(R10*4), SI - LEAL 2(CX), R9 - CMPL (DX)(R8*1), DI + SHRQ $0x08, SI + MOVL 24(SP)(R9*4), BX + LEAL 2(CX), R8 + CMPL (DX)(DI*1), SI JEQ candidate2_match_encodeSnappyBlockAsm - MOVL R9, 24(SP)(R10*4) - SHRQ $0x08, DI - CMPL (DX)(SI*1), DI + MOVL R8, 24(SP)(R9*4) + SHRQ $0x08, SI + CMPL (DX)(BX*1), SI JEQ candidate3_match_encodeSnappyBlockAsm MOVL 20(SP), CX JMP search_loop_encodeSnappyBlockAsm @@ -10718,331 +10665,331 @@ candidate3_match_encodeSnappyBlockAsm: JMP candidate_match_encodeSnappyBlockAsm candidate2_match_encodeSnappyBlockAsm: - MOVL R9, 24(SP)(R10*4) + MOVL R8, 24(SP)(R9*4) INCL CX - MOVL R8, SI + MOVL DI, BX candidate_match_encodeSnappyBlockAsm: - MOVL 12(SP), DI - TESTL SI, SI + MOVL 12(SP), SI + TESTL BX, BX JZ match_extend_back_end_encodeSnappyBlockAsm match_extend_back_loop_encodeSnappyBlockAsm: - CMPL CX, DI + CMPL CX, SI JLE match_extend_back_end_encodeSnappyBlockAsm - MOVB -1(DX)(SI*1), BL + MOVB -1(DX)(BX*1), DI MOVB -1(DX)(CX*1), R8 - CMPB BL, R8 + CMPB DI, R8 JNE match_extend_back_end_encodeSnappyBlockAsm LEAL -1(CX), CX - DECL SI + DECL BX JZ match_extend_back_end_encodeSnappyBlockAsm JMP match_extend_back_loop_encodeSnappyBlockAsm match_extend_back_end_encodeSnappyBlockAsm: - MOVL CX, DI - SUBL 12(SP), DI - LEAQ 5(AX)(DI*1), DI - CMPQ DI, (SP) + MOVL CX, SI + SUBL 12(SP), SI + LEAQ 5(AX)(SI*1), SI + CMPQ SI, (SP) JL match_dst_size_check_encodeSnappyBlockAsm MOVQ $0x00000000, ret+48(FP) RET match_dst_size_check_encodeSnappyBlockAsm: - MOVL CX, DI - MOVL 12(SP), R8 - CMPL R8, DI + MOVL CX, SI + MOVL 12(SP), DI + CMPL DI, SI JEQ emit_literal_done_match_emit_encodeSnappyBlockAsm - MOVL DI, R9 - MOVL DI, 12(SP) - LEAQ (DX)(R8*1), DI - SUBL R8, R9 - LEAL -1(R9), R8 - CMPL R8, $0x3c + MOVL SI, R8 + MOVL SI, 12(SP) + LEAQ (DX)(DI*1), SI + SUBL DI, R8 + LEAL -1(R8), DI + CMPL DI, $0x3c JLT one_byte_match_emit_encodeSnappyBlockAsm - CMPL R8, $0x00000100 + CMPL DI, $0x00000100 JLT two_bytes_match_emit_encodeSnappyBlockAsm - CMPL R8, $0x00010000 + CMPL DI, $0x00010000 JLT three_bytes_match_emit_encodeSnappyBlockAsm - CMPL R8, $0x01000000 + CMPL DI, $0x01000000 JLT four_bytes_match_emit_encodeSnappyBlockAsm MOVB $0xfc, (AX) - MOVL R8, 1(AX) + MOVL DI, 1(AX) ADDQ $0x05, AX JMP memmove_long_match_emit_encodeSnappyBlockAsm four_bytes_match_emit_encodeSnappyBlockAsm: - MOVL R8, R10 - SHRL $0x10, R10 + MOVL DI, R9 + SHRL $0x10, R9 MOVB $0xf8, (AX) - MOVW R8, 1(AX) - MOVB R10, 3(AX) + MOVW DI, 1(AX) + MOVB R9, 3(AX) ADDQ $0x04, AX JMP memmove_long_match_emit_encodeSnappyBlockAsm three_bytes_match_emit_encodeSnappyBlockAsm: MOVB $0xf4, (AX) - MOVW R8, 1(AX) + MOVW DI, 1(AX) ADDQ $0x03, AX JMP memmove_long_match_emit_encodeSnappyBlockAsm two_bytes_match_emit_encodeSnappyBlockAsm: MOVB $0xf0, (AX) - MOVB R8, 1(AX) + MOVB DI, 1(AX) ADDQ $0x02, AX - CMPL R8, $0x40 + CMPL DI, $0x40 JL memmove_match_emit_encodeSnappyBlockAsm JMP memmove_long_match_emit_encodeSnappyBlockAsm one_byte_match_emit_encodeSnappyBlockAsm: - SHLB $0x02, R8 - MOVB R8, (AX) + SHLB $0x02, DI + MOVB DI, (AX) ADDQ $0x01, AX memmove_match_emit_encodeSnappyBlockAsm: - LEAQ (AX)(R9*1), R8 + LEAQ (AX)(R8*1), DI // genMemMoveShort - CMPQ R9, $0x08 + CMPQ R8, $0x08 JLE emit_lit_memmove_match_emit_encodeSnappyBlockAsm_memmove_move_8 - CMPQ R9, $0x10 + CMPQ R8, $0x10 JBE emit_lit_memmove_match_emit_encodeSnappyBlockAsm_memmove_move_8through16 - 
CMPQ R9, $0x20 + CMPQ R8, $0x20 JBE emit_lit_memmove_match_emit_encodeSnappyBlockAsm_memmove_move_17through32 JMP emit_lit_memmove_match_emit_encodeSnappyBlockAsm_memmove_move_33through64 emit_lit_memmove_match_emit_encodeSnappyBlockAsm_memmove_move_8: - MOVQ (DI), R10 - MOVQ R10, (AX) + MOVQ (SI), R9 + MOVQ R9, (AX) JMP memmove_end_copy_match_emit_encodeSnappyBlockAsm emit_lit_memmove_match_emit_encodeSnappyBlockAsm_memmove_move_8through16: - MOVQ (DI), R10 - MOVQ -8(DI)(R9*1), DI - MOVQ R10, (AX) - MOVQ DI, -8(AX)(R9*1) + MOVQ (SI), R9 + MOVQ -8(SI)(R8*1), SI + MOVQ R9, (AX) + MOVQ SI, -8(AX)(R8*1) JMP memmove_end_copy_match_emit_encodeSnappyBlockAsm emit_lit_memmove_match_emit_encodeSnappyBlockAsm_memmove_move_17through32: - MOVOU (DI), X0 - MOVOU -16(DI)(R9*1), X1 + MOVOU (SI), X0 + MOVOU -16(SI)(R8*1), X1 MOVOU X0, (AX) - MOVOU X1, -16(AX)(R9*1) + MOVOU X1, -16(AX)(R8*1) JMP memmove_end_copy_match_emit_encodeSnappyBlockAsm emit_lit_memmove_match_emit_encodeSnappyBlockAsm_memmove_move_33through64: - MOVOU (DI), X0 - MOVOU 16(DI), X1 - MOVOU -32(DI)(R9*1), X2 - MOVOU -16(DI)(R9*1), X3 + MOVOU (SI), X0 + MOVOU 16(SI), X1 + MOVOU -32(SI)(R8*1), X2 + MOVOU -16(SI)(R8*1), X3 MOVOU X0, (AX) MOVOU X1, 16(AX) - MOVOU X2, -32(AX)(R9*1) - MOVOU X3, -16(AX)(R9*1) + MOVOU X2, -32(AX)(R8*1) + MOVOU X3, -16(AX)(R8*1) memmove_end_copy_match_emit_encodeSnappyBlockAsm: - MOVQ R8, AX + MOVQ DI, AX JMP emit_literal_done_match_emit_encodeSnappyBlockAsm memmove_long_match_emit_encodeSnappyBlockAsm: - LEAQ (AX)(R9*1), R8 + LEAQ (AX)(R8*1), DI // genMemMoveLong - MOVOU (DI), X0 - MOVOU 16(DI), X1 - MOVOU -32(DI)(R9*1), X2 - MOVOU -16(DI)(R9*1), X3 - MOVQ R9, R11 - SHRQ $0x05, R11 - MOVQ AX, R10 - ANDL $0x0000001f, R10 - MOVQ $0x00000040, R12 - SUBQ R10, R12 - DECQ R11 + MOVOU (SI), X0 + MOVOU 16(SI), X1 + MOVOU -32(SI)(R8*1), X2 + MOVOU -16(SI)(R8*1), X3 + MOVQ R8, R10 + SHRQ $0x05, R10 + MOVQ AX, R9 + ANDL $0x0000001f, R9 + MOVQ $0x00000040, R11 + SUBQ R9, R11 + DECQ R10 JA emit_lit_memmove_long_match_emit_encodeSnappyBlockAsmlarge_forward_sse_loop_32 - LEAQ -32(DI)(R12*1), R10 - LEAQ -32(AX)(R12*1), R13 + LEAQ -32(SI)(R11*1), R9 + LEAQ -32(AX)(R11*1), R12 emit_lit_memmove_long_match_emit_encodeSnappyBlockAsmlarge_big_loop_back: - MOVOU (R10), X4 - MOVOU 16(R10), X5 - MOVOA X4, (R13) - MOVOA X5, 16(R13) - ADDQ $0x20, R13 - ADDQ $0x20, R10 + MOVOU (R9), X4 + MOVOU 16(R9), X5 + MOVOA X4, (R12) + MOVOA X5, 16(R12) ADDQ $0x20, R12 - DECQ R11 + ADDQ $0x20, R9 + ADDQ $0x20, R11 + DECQ R10 JNA emit_lit_memmove_long_match_emit_encodeSnappyBlockAsmlarge_big_loop_back emit_lit_memmove_long_match_emit_encodeSnappyBlockAsmlarge_forward_sse_loop_32: - MOVOU -32(DI)(R12*1), X4 - MOVOU -16(DI)(R12*1), X5 - MOVOA X4, -32(AX)(R12*1) - MOVOA X5, -16(AX)(R12*1) - ADDQ $0x20, R12 - CMPQ R9, R12 + MOVOU -32(SI)(R11*1), X4 + MOVOU -16(SI)(R11*1), X5 + MOVOA X4, -32(AX)(R11*1) + MOVOA X5, -16(AX)(R11*1) + ADDQ $0x20, R11 + CMPQ R8, R11 JAE emit_lit_memmove_long_match_emit_encodeSnappyBlockAsmlarge_forward_sse_loop_32 MOVOU X0, (AX) MOVOU X1, 16(AX) - MOVOU X2, -32(AX)(R9*1) - MOVOU X3, -16(AX)(R9*1) - MOVQ R8, AX + MOVOU X2, -32(AX)(R8*1) + MOVOU X3, -16(AX)(R8*1) + MOVQ DI, AX emit_literal_done_match_emit_encodeSnappyBlockAsm: match_nolit_loop_encodeSnappyBlockAsm: - MOVL CX, DI - SUBL SI, DI - MOVL DI, 16(SP) + MOVL CX, SI + SUBL BX, SI + MOVL SI, 16(SP) ADDL $0x04, CX - ADDL $0x04, SI - MOVQ src_len+32(FP), DI - SUBL CX, DI - LEAQ (DX)(CX*1), R8 - LEAQ (DX)(SI*1), SI + ADDL $0x04, BX + MOVQ src_len+32(FP), SI + SUBL CX, SI + 
LEAQ (DX)(CX*1), DI + LEAQ (DX)(BX*1), BX // matchLen - XORL R10, R10 - CMPL DI, $0x08 + XORL R9, R9 + CMPL SI, $0x08 JL matchlen_match4_match_nolit_encodeSnappyBlockAsm matchlen_loopback_match_nolit_encodeSnappyBlockAsm: - MOVQ (R8)(R10*1), R9 - XORQ (SI)(R10*1), R9 - TESTQ R9, R9 + MOVQ (DI)(R9*1), R8 + XORQ (BX)(R9*1), R8 + TESTQ R8, R8 JZ matchlen_loop_match_nolit_encodeSnappyBlockAsm #ifdef GOAMD64_v3 - TZCNTQ R9, R9 + TZCNTQ R8, R8 #else - BSFQ R9, R9 + BSFQ R8, R8 #endif - SARQ $0x03, R9 - LEAL (R10)(R9*1), R10 + SARQ $0x03, R8 + LEAL (R9)(R8*1), R9 JMP match_nolit_end_encodeSnappyBlockAsm matchlen_loop_match_nolit_encodeSnappyBlockAsm: - LEAL -8(DI), DI - LEAL 8(R10), R10 - CMPL DI, $0x08 + LEAL -8(SI), SI + LEAL 8(R9), R9 + CMPL SI, $0x08 JGE matchlen_loopback_match_nolit_encodeSnappyBlockAsm JZ match_nolit_end_encodeSnappyBlockAsm matchlen_match4_match_nolit_encodeSnappyBlockAsm: - CMPL DI, $0x04 + CMPL SI, $0x04 JL matchlen_match2_match_nolit_encodeSnappyBlockAsm - MOVL (R8)(R10*1), R9 - CMPL (SI)(R10*1), R9 + MOVL (DI)(R9*1), R8 + CMPL (BX)(R9*1), R8 JNE matchlen_match2_match_nolit_encodeSnappyBlockAsm - SUBL $0x04, DI - LEAL 4(R10), R10 + SUBL $0x04, SI + LEAL 4(R9), R9 matchlen_match2_match_nolit_encodeSnappyBlockAsm: - CMPL DI, $0x02 + CMPL SI, $0x02 JL matchlen_match1_match_nolit_encodeSnappyBlockAsm - MOVW (R8)(R10*1), R9 - CMPW (SI)(R10*1), R9 + MOVW (DI)(R9*1), R8 + CMPW (BX)(R9*1), R8 JNE matchlen_match1_match_nolit_encodeSnappyBlockAsm - SUBL $0x02, DI - LEAL 2(R10), R10 + SUBL $0x02, SI + LEAL 2(R9), R9 matchlen_match1_match_nolit_encodeSnappyBlockAsm: - CMPL DI, $0x01 + CMPL SI, $0x01 JL match_nolit_end_encodeSnappyBlockAsm - MOVB (R8)(R10*1), R9 - CMPB (SI)(R10*1), R9 + MOVB (DI)(R9*1), R8 + CMPB (BX)(R9*1), R8 JNE match_nolit_end_encodeSnappyBlockAsm - LEAL 1(R10), R10 + LEAL 1(R9), R9 match_nolit_end_encodeSnappyBlockAsm: - ADDL R10, CX - MOVL 16(SP), SI - ADDL $0x04, R10 + ADDL R9, CX + MOVL 16(SP), BX + ADDL $0x04, R9 MOVL CX, 12(SP) // emitCopy - CMPL SI, $0x00010000 + CMPL BX, $0x00010000 JL two_byte_offset_match_nolit_encodeSnappyBlockAsm four_bytes_loop_back_match_nolit_encodeSnappyBlockAsm: - CMPL R10, $0x40 + CMPL R9, $0x40 JLE four_bytes_remain_match_nolit_encodeSnappyBlockAsm MOVB $0xff, (AX) - MOVL SI, 1(AX) - LEAL -64(R10), R10 + MOVL BX, 1(AX) + LEAL -64(R9), R9 ADDQ $0x05, AX - CMPL R10, $0x04 + CMPL R9, $0x04 JL four_bytes_remain_match_nolit_encodeSnappyBlockAsm JMP four_bytes_loop_back_match_nolit_encodeSnappyBlockAsm four_bytes_remain_match_nolit_encodeSnappyBlockAsm: - TESTL R10, R10 + TESTL R9, R9 JZ match_nolit_emitcopy_end_encodeSnappyBlockAsm - MOVB $0x03, BL - LEAL -4(BX)(R10*4), R10 - MOVB R10, (AX) - MOVL SI, 1(AX) + XORL SI, SI + LEAL -1(SI)(R9*4), R9 + MOVB R9, (AX) + MOVL BX, 1(AX) ADDQ $0x05, AX JMP match_nolit_emitcopy_end_encodeSnappyBlockAsm two_byte_offset_match_nolit_encodeSnappyBlockAsm: - CMPL R10, $0x40 + CMPL R9, $0x40 JLE two_byte_offset_short_match_nolit_encodeSnappyBlockAsm MOVB $0xee, (AX) - MOVW SI, 1(AX) - LEAL -60(R10), R10 + MOVW BX, 1(AX) + LEAL -60(R9), R9 ADDQ $0x03, AX JMP two_byte_offset_match_nolit_encodeSnappyBlockAsm two_byte_offset_short_match_nolit_encodeSnappyBlockAsm: - CMPL R10, $0x0c + MOVL R9, SI + SHLL $0x02, SI + CMPL R9, $0x0c JGE emit_copy_three_match_nolit_encodeSnappyBlockAsm - CMPL SI, $0x00000800 + CMPL BX, $0x00000800 JGE emit_copy_three_match_nolit_encodeSnappyBlockAsm - MOVB $0x01, BL - LEAL -16(BX)(R10*4), R10 - MOVB SI, 1(AX) - SHRL $0x08, SI - SHLL $0x05, SI - ORL SI, R10 - MOVB R10, (AX) + 
LEAL -15(SI), SI + MOVB BL, 1(AX) + SHRL $0x08, BX + SHLL $0x05, BX + ORL BX, SI + MOVB SI, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeSnappyBlockAsm emit_copy_three_match_nolit_encodeSnappyBlockAsm: - MOVB $0x02, BL - LEAL -4(BX)(R10*4), R10 - MOVB R10, (AX) - MOVW SI, 1(AX) + LEAL -2(SI), SI + MOVB SI, (AX) + MOVW BX, 1(AX) ADDQ $0x03, AX match_nolit_emitcopy_end_encodeSnappyBlockAsm: CMPL CX, 8(SP) JGE emit_remainder_encodeSnappyBlockAsm - MOVQ -2(DX)(CX*1), DI + MOVQ -2(DX)(CX*1), SI CMPQ AX, (SP) JL match_nolit_dst_ok_encodeSnappyBlockAsm MOVQ $0x00000000, ret+48(FP) RET match_nolit_dst_ok_encodeSnappyBlockAsm: - MOVQ $0x0000cf1bbcdcbf9b, R9 - MOVQ DI, R8 - SHRQ $0x10, DI - MOVQ DI, SI - SHLQ $0x10, R8 - IMULQ R9, R8 - SHRQ $0x32, R8 - SHLQ $0x10, SI - IMULQ R9, SI - SHRQ $0x32, SI - LEAL -2(CX), R9 - LEAQ 24(SP)(SI*4), R10 - MOVL (R10), SI - MOVL R9, 24(SP)(R8*4) - MOVL CX, (R10) - CMPL (DX)(SI*1), DI + MOVQ $0x0000cf1bbcdcbf9b, R8 + MOVQ SI, DI + SHRQ $0x10, SI + MOVQ SI, BX + SHLQ $0x10, DI + IMULQ R8, DI + SHRQ $0x32, DI + SHLQ $0x10, BX + IMULQ R8, BX + SHRQ $0x32, BX + LEAL -2(CX), R8 + LEAQ 24(SP)(BX*4), R9 + MOVL (R9), BX + MOVL R8, 24(SP)(DI*4) + MOVL CX, (R9) + CMPL (DX)(BX*1), SI JEQ match_nolit_loop_encodeSnappyBlockAsm INCL CX JMP search_loop_encodeSnappyBlockAsm @@ -11246,8 +11193,8 @@ zero_loop_encodeSnappyBlockAsm64K: MOVL $0x00000000, 12(SP) MOVQ src_len+32(FP), CX LEAQ -9(CX), DX - LEAQ -8(CX), SI - MOVL SI, 8(SP) + LEAQ -8(CX), BX + MOVL BX, 8(SP) SHRQ $0x05, CX SUBL CX, DX LEAQ (AX)(DX*1), DX @@ -11257,278 +11204,278 @@ zero_loop_encodeSnappyBlockAsm64K: MOVQ src_base+24(FP), DX search_loop_encodeSnappyBlockAsm64K: - MOVL CX, SI - SUBL 12(SP), SI - SHRL $0x06, SI - LEAL 4(CX)(SI*1), SI - CMPL SI, 8(SP) + MOVL CX, BX + SUBL 12(SP), BX + SHRL $0x06, BX + LEAL 4(CX)(BX*1), BX + CMPL BX, 8(SP) JGE emit_remainder_encodeSnappyBlockAsm64K - MOVQ (DX)(CX*1), DI - MOVL SI, 20(SP) - MOVQ $0x0000cf1bbcdcbf9b, R9 - MOVQ DI, R10 - MOVQ DI, R11 - SHRQ $0x08, R11 - SHLQ $0x10, R10 - IMULQ R9, R10 - SHRQ $0x32, R10 - SHLQ $0x10, R11 - IMULQ R9, R11 - SHRQ $0x32, R11 - MOVL 24(SP)(R10*4), SI - MOVL 24(SP)(R11*4), R8 - MOVL CX, 24(SP)(R10*4) - LEAL 1(CX), R10 - MOVL R10, 24(SP)(R11*4) - MOVQ DI, R10 - SHRQ $0x10, R10 + MOVQ (DX)(CX*1), SI + MOVL BX, 20(SP) + MOVQ $0x0000cf1bbcdcbf9b, R8 + MOVQ SI, R9 + MOVQ SI, R10 + SHRQ $0x08, R10 + SHLQ $0x10, R9 + IMULQ R8, R9 + SHRQ $0x32, R9 SHLQ $0x10, R10 - IMULQ R9, R10 + IMULQ R8, R10 SHRQ $0x32, R10 - MOVL CX, R9 - SUBL 16(SP), R9 - MOVL 1(DX)(R9*1), R11 - MOVQ DI, R9 - SHRQ $0x08, R9 - CMPL R9, R11 - JNE no_repeat_found_encodeSnappyBlockAsm64K - LEAL 1(CX), DI - MOVL 12(SP), SI - MOVL DI, R8 + MOVL 24(SP)(R9*4), BX + MOVL 24(SP)(R10*4), DI + MOVL CX, 24(SP)(R9*4) + LEAL 1(CX), R9 + MOVL R9, 24(SP)(R10*4) + MOVQ SI, R9 + SHRQ $0x10, R9 + SHLQ $0x10, R9 + IMULQ R8, R9 + SHRQ $0x32, R9 + MOVL CX, R8 SUBL 16(SP), R8 + MOVL 1(DX)(R8*1), R10 + MOVQ SI, R8 + SHRQ $0x08, R8 + CMPL R8, R10 + JNE no_repeat_found_encodeSnappyBlockAsm64K + LEAL 1(CX), SI + MOVL 12(SP), BX + MOVL SI, DI + SUBL 16(SP), DI JZ repeat_extend_back_end_encodeSnappyBlockAsm64K repeat_extend_back_loop_encodeSnappyBlockAsm64K: - CMPL DI, SI + CMPL SI, BX JLE repeat_extend_back_end_encodeSnappyBlockAsm64K - MOVB -1(DX)(R8*1), BL - MOVB -1(DX)(DI*1), R9 - CMPB BL, R9 + MOVB -1(DX)(DI*1), R8 + MOVB -1(DX)(SI*1), R9 + CMPB R8, R9 JNE repeat_extend_back_end_encodeSnappyBlockAsm64K - LEAL -1(DI), DI - DECL R8 + LEAL -1(SI), SI + DECL DI JNZ 
repeat_extend_back_loop_encodeSnappyBlockAsm64K repeat_extend_back_end_encodeSnappyBlockAsm64K: - MOVL 12(SP), SI - CMPL SI, DI + MOVL 12(SP), BX + CMPL BX, SI JEQ emit_literal_done_repeat_emit_encodeSnappyBlockAsm64K - MOVL DI, R8 - MOVL DI, 12(SP) - LEAQ (DX)(SI*1), R9 - SUBL SI, R8 - LEAL -1(R8), SI - CMPL SI, $0x3c + MOVL SI, DI + MOVL SI, 12(SP) + LEAQ (DX)(BX*1), R8 + SUBL BX, DI + LEAL -1(DI), BX + CMPL BX, $0x3c JLT one_byte_repeat_emit_encodeSnappyBlockAsm64K - CMPL SI, $0x00000100 + CMPL BX, $0x00000100 JLT two_bytes_repeat_emit_encodeSnappyBlockAsm64K MOVB $0xf4, (AX) - MOVW SI, 1(AX) + MOVW BX, 1(AX) ADDQ $0x03, AX JMP memmove_long_repeat_emit_encodeSnappyBlockAsm64K two_bytes_repeat_emit_encodeSnappyBlockAsm64K: MOVB $0xf0, (AX) - MOVB SI, 1(AX) + MOVB BL, 1(AX) ADDQ $0x02, AX - CMPL SI, $0x40 + CMPL BX, $0x40 JL memmove_repeat_emit_encodeSnappyBlockAsm64K JMP memmove_long_repeat_emit_encodeSnappyBlockAsm64K one_byte_repeat_emit_encodeSnappyBlockAsm64K: - SHLB $0x02, SI - MOVB SI, (AX) + SHLB $0x02, BL + MOVB BL, (AX) ADDQ $0x01, AX memmove_repeat_emit_encodeSnappyBlockAsm64K: - LEAQ (AX)(R8*1), SI + LEAQ (AX)(DI*1), BX // genMemMoveShort - CMPQ R8, $0x08 + CMPQ DI, $0x08 JLE emit_lit_memmove_repeat_emit_encodeSnappyBlockAsm64K_memmove_move_8 - CMPQ R8, $0x10 + CMPQ DI, $0x10 JBE emit_lit_memmove_repeat_emit_encodeSnappyBlockAsm64K_memmove_move_8through16 - CMPQ R8, $0x20 + CMPQ DI, $0x20 JBE emit_lit_memmove_repeat_emit_encodeSnappyBlockAsm64K_memmove_move_17through32 JMP emit_lit_memmove_repeat_emit_encodeSnappyBlockAsm64K_memmove_move_33through64 emit_lit_memmove_repeat_emit_encodeSnappyBlockAsm64K_memmove_move_8: - MOVQ (R9), R10 - MOVQ R10, (AX) + MOVQ (R8), R9 + MOVQ R9, (AX) JMP memmove_end_copy_repeat_emit_encodeSnappyBlockAsm64K emit_lit_memmove_repeat_emit_encodeSnappyBlockAsm64K_memmove_move_8through16: - MOVQ (R9), R10 - MOVQ -8(R9)(R8*1), R9 - MOVQ R10, (AX) - MOVQ R9, -8(AX)(R8*1) + MOVQ (R8), R9 + MOVQ -8(R8)(DI*1), R8 + MOVQ R9, (AX) + MOVQ R8, -8(AX)(DI*1) JMP memmove_end_copy_repeat_emit_encodeSnappyBlockAsm64K emit_lit_memmove_repeat_emit_encodeSnappyBlockAsm64K_memmove_move_17through32: - MOVOU (R9), X0 - MOVOU -16(R9)(R8*1), X1 + MOVOU (R8), X0 + MOVOU -16(R8)(DI*1), X1 MOVOU X0, (AX) - MOVOU X1, -16(AX)(R8*1) + MOVOU X1, -16(AX)(DI*1) JMP memmove_end_copy_repeat_emit_encodeSnappyBlockAsm64K emit_lit_memmove_repeat_emit_encodeSnappyBlockAsm64K_memmove_move_33through64: - MOVOU (R9), X0 - MOVOU 16(R9), X1 - MOVOU -32(R9)(R8*1), X2 - MOVOU -16(R9)(R8*1), X3 + MOVOU (R8), X0 + MOVOU 16(R8), X1 + MOVOU -32(R8)(DI*1), X2 + MOVOU -16(R8)(DI*1), X3 MOVOU X0, (AX) MOVOU X1, 16(AX) - MOVOU X2, -32(AX)(R8*1) - MOVOU X3, -16(AX)(R8*1) + MOVOU X2, -32(AX)(DI*1) + MOVOU X3, -16(AX)(DI*1) memmove_end_copy_repeat_emit_encodeSnappyBlockAsm64K: - MOVQ SI, AX + MOVQ BX, AX JMP emit_literal_done_repeat_emit_encodeSnappyBlockAsm64K memmove_long_repeat_emit_encodeSnappyBlockAsm64K: - LEAQ (AX)(R8*1), SI + LEAQ (AX)(DI*1), BX // genMemMoveLong - MOVOU (R9), X0 - MOVOU 16(R9), X1 - MOVOU -32(R9)(R8*1), X2 - MOVOU -16(R9)(R8*1), X3 - MOVQ R8, R11 - SHRQ $0x05, R11 - MOVQ AX, R10 - ANDL $0x0000001f, R10 - MOVQ $0x00000040, R12 - SUBQ R10, R12 - DECQ R11 + MOVOU (R8), X0 + MOVOU 16(R8), X1 + MOVOU -32(R8)(DI*1), X2 + MOVOU -16(R8)(DI*1), X3 + MOVQ DI, R10 + SHRQ $0x05, R10 + MOVQ AX, R9 + ANDL $0x0000001f, R9 + MOVQ $0x00000040, R11 + SUBQ R9, R11 + DECQ R10 JA emit_lit_memmove_long_repeat_emit_encodeSnappyBlockAsm64Klarge_forward_sse_loop_32 - LEAQ -32(R9)(R12*1), R10 - LEAQ 
-32(AX)(R12*1), R13 + LEAQ -32(R8)(R11*1), R9 + LEAQ -32(AX)(R11*1), R12 emit_lit_memmove_long_repeat_emit_encodeSnappyBlockAsm64Klarge_big_loop_back: - MOVOU (R10), X4 - MOVOU 16(R10), X5 - MOVOA X4, (R13) - MOVOA X5, 16(R13) - ADDQ $0x20, R13 - ADDQ $0x20, R10 + MOVOU (R9), X4 + MOVOU 16(R9), X5 + MOVOA X4, (R12) + MOVOA X5, 16(R12) ADDQ $0x20, R12 - DECQ R11 + ADDQ $0x20, R9 + ADDQ $0x20, R11 + DECQ R10 JNA emit_lit_memmove_long_repeat_emit_encodeSnappyBlockAsm64Klarge_big_loop_back emit_lit_memmove_long_repeat_emit_encodeSnappyBlockAsm64Klarge_forward_sse_loop_32: - MOVOU -32(R9)(R12*1), X4 - MOVOU -16(R9)(R12*1), X5 - MOVOA X4, -32(AX)(R12*1) - MOVOA X5, -16(AX)(R12*1) - ADDQ $0x20, R12 - CMPQ R8, R12 + MOVOU -32(R8)(R11*1), X4 + MOVOU -16(R8)(R11*1), X5 + MOVOA X4, -32(AX)(R11*1) + MOVOA X5, -16(AX)(R11*1) + ADDQ $0x20, R11 + CMPQ DI, R11 JAE emit_lit_memmove_long_repeat_emit_encodeSnappyBlockAsm64Klarge_forward_sse_loop_32 MOVOU X0, (AX) MOVOU X1, 16(AX) - MOVOU X2, -32(AX)(R8*1) - MOVOU X3, -16(AX)(R8*1) - MOVQ SI, AX + MOVOU X2, -32(AX)(DI*1) + MOVOU X3, -16(AX)(DI*1) + MOVQ BX, AX emit_literal_done_repeat_emit_encodeSnappyBlockAsm64K: ADDL $0x05, CX - MOVL CX, SI - SUBL 16(SP), SI - MOVQ src_len+32(FP), R8 - SUBL CX, R8 - LEAQ (DX)(CX*1), R9 - LEAQ (DX)(SI*1), SI + MOVL CX, BX + SUBL 16(SP), BX + MOVQ src_len+32(FP), DI + SUBL CX, DI + LEAQ (DX)(CX*1), R8 + LEAQ (DX)(BX*1), BX // matchLen - XORL R11, R11 - CMPL R8, $0x08 + XORL R10, R10 + CMPL DI, $0x08 JL matchlen_match4_repeat_extend_encodeSnappyBlockAsm64K matchlen_loopback_repeat_extend_encodeSnappyBlockAsm64K: - MOVQ (R9)(R11*1), R10 - XORQ (SI)(R11*1), R10 - TESTQ R10, R10 + MOVQ (R8)(R10*1), R9 + XORQ (BX)(R10*1), R9 + TESTQ R9, R9 JZ matchlen_loop_repeat_extend_encodeSnappyBlockAsm64K #ifdef GOAMD64_v3 - TZCNTQ R10, R10 + TZCNTQ R9, R9 #else - BSFQ R10, R10 + BSFQ R9, R9 #endif - SARQ $0x03, R10 - LEAL (R11)(R10*1), R11 + SARQ $0x03, R9 + LEAL (R10)(R9*1), R10 JMP repeat_extend_forward_end_encodeSnappyBlockAsm64K matchlen_loop_repeat_extend_encodeSnappyBlockAsm64K: - LEAL -8(R8), R8 - LEAL 8(R11), R11 - CMPL R8, $0x08 + LEAL -8(DI), DI + LEAL 8(R10), R10 + CMPL DI, $0x08 JGE matchlen_loopback_repeat_extend_encodeSnappyBlockAsm64K JZ repeat_extend_forward_end_encodeSnappyBlockAsm64K matchlen_match4_repeat_extend_encodeSnappyBlockAsm64K: - CMPL R8, $0x04 + CMPL DI, $0x04 JL matchlen_match2_repeat_extend_encodeSnappyBlockAsm64K - MOVL (R9)(R11*1), R10 - CMPL (SI)(R11*1), R10 + MOVL (R8)(R10*1), R9 + CMPL (BX)(R10*1), R9 JNE matchlen_match2_repeat_extend_encodeSnappyBlockAsm64K - SUBL $0x04, R8 - LEAL 4(R11), R11 + SUBL $0x04, DI + LEAL 4(R10), R10 matchlen_match2_repeat_extend_encodeSnappyBlockAsm64K: - CMPL R8, $0x02 + CMPL DI, $0x02 JL matchlen_match1_repeat_extend_encodeSnappyBlockAsm64K - MOVW (R9)(R11*1), R10 - CMPW (SI)(R11*1), R10 + MOVW (R8)(R10*1), R9 + CMPW (BX)(R10*1), R9 JNE matchlen_match1_repeat_extend_encodeSnappyBlockAsm64K - SUBL $0x02, R8 - LEAL 2(R11), R11 + SUBL $0x02, DI + LEAL 2(R10), R10 matchlen_match1_repeat_extend_encodeSnappyBlockAsm64K: - CMPL R8, $0x01 + CMPL DI, $0x01 JL repeat_extend_forward_end_encodeSnappyBlockAsm64K - MOVB (R9)(R11*1), R10 - CMPB (SI)(R11*1), R10 + MOVB (R8)(R10*1), R9 + CMPB (BX)(R10*1), R9 JNE repeat_extend_forward_end_encodeSnappyBlockAsm64K - LEAL 1(R11), R11 + LEAL 1(R10), R10 repeat_extend_forward_end_encodeSnappyBlockAsm64K: - ADDL R11, CX - MOVL CX, SI - SUBL DI, SI - MOVL 16(SP), DI + ADDL R10, CX + MOVL CX, BX + SUBL SI, BX + MOVL 16(SP), SI // emitCopy 
two_byte_offset_repeat_as_copy_encodeSnappyBlockAsm64K: - CMPL SI, $0x40 + CMPL BX, $0x40 JLE two_byte_offset_short_repeat_as_copy_encodeSnappyBlockAsm64K MOVB $0xee, (AX) - MOVW DI, 1(AX) - LEAL -60(SI), SI + MOVW SI, 1(AX) + LEAL -60(BX), BX ADDQ $0x03, AX JMP two_byte_offset_repeat_as_copy_encodeSnappyBlockAsm64K two_byte_offset_short_repeat_as_copy_encodeSnappyBlockAsm64K: - CMPL SI, $0x0c + MOVL BX, DI + SHLL $0x02, DI + CMPL BX, $0x0c JGE emit_copy_three_repeat_as_copy_encodeSnappyBlockAsm64K - CMPL DI, $0x00000800 + CMPL SI, $0x00000800 JGE emit_copy_three_repeat_as_copy_encodeSnappyBlockAsm64K - MOVB $0x01, BL - LEAL -16(BX)(SI*4), SI - MOVB DI, 1(AX) - SHRL $0x08, DI - SHLL $0x05, DI - ORL DI, SI - MOVB SI, (AX) + LEAL -15(DI), DI + MOVB SI, 1(AX) + SHRL $0x08, SI + SHLL $0x05, SI + ORL SI, DI + MOVB DI, (AX) ADDQ $0x02, AX JMP repeat_end_emit_encodeSnappyBlockAsm64K emit_copy_three_repeat_as_copy_encodeSnappyBlockAsm64K: - MOVB $0x02, BL - LEAL -4(BX)(SI*4), SI - MOVB SI, (AX) - MOVW DI, 1(AX) + LEAL -2(DI), DI + MOVB DI, (AX) + MOVW SI, 1(AX) ADDQ $0x03, AX repeat_end_emit_encodeSnappyBlockAsm64K: @@ -11536,16 +11483,16 @@ repeat_end_emit_encodeSnappyBlockAsm64K: JMP search_loop_encodeSnappyBlockAsm64K no_repeat_found_encodeSnappyBlockAsm64K: - CMPL (DX)(SI*1), DI + CMPL (DX)(BX*1), SI JEQ candidate_match_encodeSnappyBlockAsm64K - SHRQ $0x08, DI - MOVL 24(SP)(R10*4), SI - LEAL 2(CX), R9 - CMPL (DX)(R8*1), DI + SHRQ $0x08, SI + MOVL 24(SP)(R9*4), BX + LEAL 2(CX), R8 + CMPL (DX)(DI*1), SI JEQ candidate2_match_encodeSnappyBlockAsm64K - MOVL R9, 24(SP)(R10*4) - SHRQ $0x08, DI - CMPL (DX)(SI*1), DI + MOVL R8, 24(SP)(R9*4) + SHRQ $0x08, SI + CMPL (DX)(BX*1), SI JEQ candidate3_match_encodeSnappyBlockAsm64K MOVL 20(SP), CX JMP search_loop_encodeSnappyBlockAsm64K @@ -11555,288 +11502,288 @@ candidate3_match_encodeSnappyBlockAsm64K: JMP candidate_match_encodeSnappyBlockAsm64K candidate2_match_encodeSnappyBlockAsm64K: - MOVL R9, 24(SP)(R10*4) + MOVL R8, 24(SP)(R9*4) INCL CX - MOVL R8, SI + MOVL DI, BX candidate_match_encodeSnappyBlockAsm64K: - MOVL 12(SP), DI - TESTL SI, SI + MOVL 12(SP), SI + TESTL BX, BX JZ match_extend_back_end_encodeSnappyBlockAsm64K match_extend_back_loop_encodeSnappyBlockAsm64K: - CMPL CX, DI + CMPL CX, SI JLE match_extend_back_end_encodeSnappyBlockAsm64K - MOVB -1(DX)(SI*1), BL + MOVB -1(DX)(BX*1), DI MOVB -1(DX)(CX*1), R8 - CMPB BL, R8 + CMPB DI, R8 JNE match_extend_back_end_encodeSnappyBlockAsm64K LEAL -1(CX), CX - DECL SI + DECL BX JZ match_extend_back_end_encodeSnappyBlockAsm64K JMP match_extend_back_loop_encodeSnappyBlockAsm64K match_extend_back_end_encodeSnappyBlockAsm64K: - MOVL CX, DI - SUBL 12(SP), DI - LEAQ 3(AX)(DI*1), DI - CMPQ DI, (SP) + MOVL CX, SI + SUBL 12(SP), SI + LEAQ 3(AX)(SI*1), SI + CMPQ SI, (SP) JL match_dst_size_check_encodeSnappyBlockAsm64K MOVQ $0x00000000, ret+48(FP) RET match_dst_size_check_encodeSnappyBlockAsm64K: - MOVL CX, DI - MOVL 12(SP), R8 - CMPL R8, DI + MOVL CX, SI + MOVL 12(SP), DI + CMPL DI, SI JEQ emit_literal_done_match_emit_encodeSnappyBlockAsm64K - MOVL DI, R9 - MOVL DI, 12(SP) - LEAQ (DX)(R8*1), DI - SUBL R8, R9 - LEAL -1(R9), R8 - CMPL R8, $0x3c + MOVL SI, R8 + MOVL SI, 12(SP) + LEAQ (DX)(DI*1), SI + SUBL DI, R8 + LEAL -1(R8), DI + CMPL DI, $0x3c JLT one_byte_match_emit_encodeSnappyBlockAsm64K - CMPL R8, $0x00000100 + CMPL DI, $0x00000100 JLT two_bytes_match_emit_encodeSnappyBlockAsm64K MOVB $0xf4, (AX) - MOVW R8, 1(AX) + MOVW DI, 1(AX) ADDQ $0x03, AX JMP memmove_long_match_emit_encodeSnappyBlockAsm64K 
two_bytes_match_emit_encodeSnappyBlockAsm64K: MOVB $0xf0, (AX) - MOVB R8, 1(AX) + MOVB DI, 1(AX) ADDQ $0x02, AX - CMPL R8, $0x40 + CMPL DI, $0x40 JL memmove_match_emit_encodeSnappyBlockAsm64K JMP memmove_long_match_emit_encodeSnappyBlockAsm64K one_byte_match_emit_encodeSnappyBlockAsm64K: - SHLB $0x02, R8 - MOVB R8, (AX) + SHLB $0x02, DI + MOVB DI, (AX) ADDQ $0x01, AX memmove_match_emit_encodeSnappyBlockAsm64K: - LEAQ (AX)(R9*1), R8 + LEAQ (AX)(R8*1), DI // genMemMoveShort - CMPQ R9, $0x08 + CMPQ R8, $0x08 JLE emit_lit_memmove_match_emit_encodeSnappyBlockAsm64K_memmove_move_8 - CMPQ R9, $0x10 + CMPQ R8, $0x10 JBE emit_lit_memmove_match_emit_encodeSnappyBlockAsm64K_memmove_move_8through16 - CMPQ R9, $0x20 + CMPQ R8, $0x20 JBE emit_lit_memmove_match_emit_encodeSnappyBlockAsm64K_memmove_move_17through32 JMP emit_lit_memmove_match_emit_encodeSnappyBlockAsm64K_memmove_move_33through64 emit_lit_memmove_match_emit_encodeSnappyBlockAsm64K_memmove_move_8: - MOVQ (DI), R10 - MOVQ R10, (AX) + MOVQ (SI), R9 + MOVQ R9, (AX) JMP memmove_end_copy_match_emit_encodeSnappyBlockAsm64K emit_lit_memmove_match_emit_encodeSnappyBlockAsm64K_memmove_move_8through16: - MOVQ (DI), R10 - MOVQ -8(DI)(R9*1), DI - MOVQ R10, (AX) - MOVQ DI, -8(AX)(R9*1) + MOVQ (SI), R9 + MOVQ -8(SI)(R8*1), SI + MOVQ R9, (AX) + MOVQ SI, -8(AX)(R8*1) JMP memmove_end_copy_match_emit_encodeSnappyBlockAsm64K emit_lit_memmove_match_emit_encodeSnappyBlockAsm64K_memmove_move_17through32: - MOVOU (DI), X0 - MOVOU -16(DI)(R9*1), X1 + MOVOU (SI), X0 + MOVOU -16(SI)(R8*1), X1 MOVOU X0, (AX) - MOVOU X1, -16(AX)(R9*1) + MOVOU X1, -16(AX)(R8*1) JMP memmove_end_copy_match_emit_encodeSnappyBlockAsm64K emit_lit_memmove_match_emit_encodeSnappyBlockAsm64K_memmove_move_33through64: - MOVOU (DI), X0 - MOVOU 16(DI), X1 - MOVOU -32(DI)(R9*1), X2 - MOVOU -16(DI)(R9*1), X3 + MOVOU (SI), X0 + MOVOU 16(SI), X1 + MOVOU -32(SI)(R8*1), X2 + MOVOU -16(SI)(R8*1), X3 MOVOU X0, (AX) MOVOU X1, 16(AX) - MOVOU X2, -32(AX)(R9*1) - MOVOU X3, -16(AX)(R9*1) + MOVOU X2, -32(AX)(R8*1) + MOVOU X3, -16(AX)(R8*1) memmove_end_copy_match_emit_encodeSnappyBlockAsm64K: - MOVQ R8, AX + MOVQ DI, AX JMP emit_literal_done_match_emit_encodeSnappyBlockAsm64K memmove_long_match_emit_encodeSnappyBlockAsm64K: - LEAQ (AX)(R9*1), R8 + LEAQ (AX)(R8*1), DI // genMemMoveLong - MOVOU (DI), X0 - MOVOU 16(DI), X1 - MOVOU -32(DI)(R9*1), X2 - MOVOU -16(DI)(R9*1), X3 - MOVQ R9, R11 - SHRQ $0x05, R11 - MOVQ AX, R10 - ANDL $0x0000001f, R10 - MOVQ $0x00000040, R12 - SUBQ R10, R12 - DECQ R11 + MOVOU (SI), X0 + MOVOU 16(SI), X1 + MOVOU -32(SI)(R8*1), X2 + MOVOU -16(SI)(R8*1), X3 + MOVQ R8, R10 + SHRQ $0x05, R10 + MOVQ AX, R9 + ANDL $0x0000001f, R9 + MOVQ $0x00000040, R11 + SUBQ R9, R11 + DECQ R10 JA emit_lit_memmove_long_match_emit_encodeSnappyBlockAsm64Klarge_forward_sse_loop_32 - LEAQ -32(DI)(R12*1), R10 - LEAQ -32(AX)(R12*1), R13 + LEAQ -32(SI)(R11*1), R9 + LEAQ -32(AX)(R11*1), R12 emit_lit_memmove_long_match_emit_encodeSnappyBlockAsm64Klarge_big_loop_back: - MOVOU (R10), X4 - MOVOU 16(R10), X5 - MOVOA X4, (R13) - MOVOA X5, 16(R13) - ADDQ $0x20, R13 - ADDQ $0x20, R10 + MOVOU (R9), X4 + MOVOU 16(R9), X5 + MOVOA X4, (R12) + MOVOA X5, 16(R12) ADDQ $0x20, R12 - DECQ R11 + ADDQ $0x20, R9 + ADDQ $0x20, R11 + DECQ R10 JNA emit_lit_memmove_long_match_emit_encodeSnappyBlockAsm64Klarge_big_loop_back emit_lit_memmove_long_match_emit_encodeSnappyBlockAsm64Klarge_forward_sse_loop_32: - MOVOU -32(DI)(R12*1), X4 - MOVOU -16(DI)(R12*1), X5 - MOVOA X4, -32(AX)(R12*1) - MOVOA X5, -16(AX)(R12*1) - ADDQ $0x20, R12 - CMPQ R9, R12 
+ MOVOU -32(SI)(R11*1), X4 + MOVOU -16(SI)(R11*1), X5 + MOVOA X4, -32(AX)(R11*1) + MOVOA X5, -16(AX)(R11*1) + ADDQ $0x20, R11 + CMPQ R8, R11 JAE emit_lit_memmove_long_match_emit_encodeSnappyBlockAsm64Klarge_forward_sse_loop_32 MOVOU X0, (AX) MOVOU X1, 16(AX) - MOVOU X2, -32(AX)(R9*1) - MOVOU X3, -16(AX)(R9*1) - MOVQ R8, AX + MOVOU X2, -32(AX)(R8*1) + MOVOU X3, -16(AX)(R8*1) + MOVQ DI, AX emit_literal_done_match_emit_encodeSnappyBlockAsm64K: match_nolit_loop_encodeSnappyBlockAsm64K: - MOVL CX, DI - SUBL SI, DI - MOVL DI, 16(SP) + MOVL CX, SI + SUBL BX, SI + MOVL SI, 16(SP) ADDL $0x04, CX - ADDL $0x04, SI - MOVQ src_len+32(FP), DI - SUBL CX, DI - LEAQ (DX)(CX*1), R8 - LEAQ (DX)(SI*1), SI + ADDL $0x04, BX + MOVQ src_len+32(FP), SI + SUBL CX, SI + LEAQ (DX)(CX*1), DI + LEAQ (DX)(BX*1), BX // matchLen - XORL R10, R10 - CMPL DI, $0x08 + XORL R9, R9 + CMPL SI, $0x08 JL matchlen_match4_match_nolit_encodeSnappyBlockAsm64K matchlen_loopback_match_nolit_encodeSnappyBlockAsm64K: - MOVQ (R8)(R10*1), R9 - XORQ (SI)(R10*1), R9 - TESTQ R9, R9 + MOVQ (DI)(R9*1), R8 + XORQ (BX)(R9*1), R8 + TESTQ R8, R8 JZ matchlen_loop_match_nolit_encodeSnappyBlockAsm64K #ifdef GOAMD64_v3 - TZCNTQ R9, R9 + TZCNTQ R8, R8 #else - BSFQ R9, R9 + BSFQ R8, R8 #endif - SARQ $0x03, R9 - LEAL (R10)(R9*1), R10 + SARQ $0x03, R8 + LEAL (R9)(R8*1), R9 JMP match_nolit_end_encodeSnappyBlockAsm64K matchlen_loop_match_nolit_encodeSnappyBlockAsm64K: - LEAL -8(DI), DI - LEAL 8(R10), R10 - CMPL DI, $0x08 + LEAL -8(SI), SI + LEAL 8(R9), R9 + CMPL SI, $0x08 JGE matchlen_loopback_match_nolit_encodeSnappyBlockAsm64K JZ match_nolit_end_encodeSnappyBlockAsm64K matchlen_match4_match_nolit_encodeSnappyBlockAsm64K: - CMPL DI, $0x04 + CMPL SI, $0x04 JL matchlen_match2_match_nolit_encodeSnappyBlockAsm64K - MOVL (R8)(R10*1), R9 - CMPL (SI)(R10*1), R9 + MOVL (DI)(R9*1), R8 + CMPL (BX)(R9*1), R8 JNE matchlen_match2_match_nolit_encodeSnappyBlockAsm64K - SUBL $0x04, DI - LEAL 4(R10), R10 + SUBL $0x04, SI + LEAL 4(R9), R9 matchlen_match2_match_nolit_encodeSnappyBlockAsm64K: - CMPL DI, $0x02 + CMPL SI, $0x02 JL matchlen_match1_match_nolit_encodeSnappyBlockAsm64K - MOVW (R8)(R10*1), R9 - CMPW (SI)(R10*1), R9 + MOVW (DI)(R9*1), R8 + CMPW (BX)(R9*1), R8 JNE matchlen_match1_match_nolit_encodeSnappyBlockAsm64K - SUBL $0x02, DI - LEAL 2(R10), R10 + SUBL $0x02, SI + LEAL 2(R9), R9 matchlen_match1_match_nolit_encodeSnappyBlockAsm64K: - CMPL DI, $0x01 + CMPL SI, $0x01 JL match_nolit_end_encodeSnappyBlockAsm64K - MOVB (R8)(R10*1), R9 - CMPB (SI)(R10*1), R9 + MOVB (DI)(R9*1), R8 + CMPB (BX)(R9*1), R8 JNE match_nolit_end_encodeSnappyBlockAsm64K - LEAL 1(R10), R10 + LEAL 1(R9), R9 match_nolit_end_encodeSnappyBlockAsm64K: - ADDL R10, CX - MOVL 16(SP), SI - ADDL $0x04, R10 + ADDL R9, CX + MOVL 16(SP), BX + ADDL $0x04, R9 MOVL CX, 12(SP) // emitCopy two_byte_offset_match_nolit_encodeSnappyBlockAsm64K: - CMPL R10, $0x40 + CMPL R9, $0x40 JLE two_byte_offset_short_match_nolit_encodeSnappyBlockAsm64K MOVB $0xee, (AX) - MOVW SI, 1(AX) - LEAL -60(R10), R10 + MOVW BX, 1(AX) + LEAL -60(R9), R9 ADDQ $0x03, AX JMP two_byte_offset_match_nolit_encodeSnappyBlockAsm64K two_byte_offset_short_match_nolit_encodeSnappyBlockAsm64K: - CMPL R10, $0x0c + MOVL R9, SI + SHLL $0x02, SI + CMPL R9, $0x0c JGE emit_copy_three_match_nolit_encodeSnappyBlockAsm64K - CMPL SI, $0x00000800 + CMPL BX, $0x00000800 JGE emit_copy_three_match_nolit_encodeSnappyBlockAsm64K - MOVB $0x01, BL - LEAL -16(BX)(R10*4), R10 - MOVB SI, 1(AX) - SHRL $0x08, SI - SHLL $0x05, SI - ORL SI, R10 - MOVB R10, (AX) + LEAL -15(SI), SI + 
MOVB BL, 1(AX) + SHRL $0x08, BX + SHLL $0x05, BX + ORL BX, SI + MOVB SI, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeSnappyBlockAsm64K emit_copy_three_match_nolit_encodeSnappyBlockAsm64K: - MOVB $0x02, BL - LEAL -4(BX)(R10*4), R10 - MOVB R10, (AX) - MOVW SI, 1(AX) + LEAL -2(SI), SI + MOVB SI, (AX) + MOVW BX, 1(AX) ADDQ $0x03, AX match_nolit_emitcopy_end_encodeSnappyBlockAsm64K: CMPL CX, 8(SP) JGE emit_remainder_encodeSnappyBlockAsm64K - MOVQ -2(DX)(CX*1), DI + MOVQ -2(DX)(CX*1), SI CMPQ AX, (SP) JL match_nolit_dst_ok_encodeSnappyBlockAsm64K MOVQ $0x00000000, ret+48(FP) RET match_nolit_dst_ok_encodeSnappyBlockAsm64K: - MOVQ $0x0000cf1bbcdcbf9b, R9 - MOVQ DI, R8 - SHRQ $0x10, DI - MOVQ DI, SI - SHLQ $0x10, R8 - IMULQ R9, R8 - SHRQ $0x32, R8 - SHLQ $0x10, SI - IMULQ R9, SI - SHRQ $0x32, SI - LEAL -2(CX), R9 - LEAQ 24(SP)(SI*4), R10 - MOVL (R10), SI - MOVL R9, 24(SP)(R8*4) - MOVL CX, (R10) - CMPL (DX)(SI*1), DI + MOVQ $0x0000cf1bbcdcbf9b, R8 + MOVQ SI, DI + SHRQ $0x10, SI + MOVQ SI, BX + SHLQ $0x10, DI + IMULQ R8, DI + SHRQ $0x32, DI + SHLQ $0x10, BX + IMULQ R8, BX + SHRQ $0x32, BX + LEAL -2(CX), R8 + LEAQ 24(SP)(BX*4), R9 + MOVL (R9), BX + MOVL R8, 24(SP)(DI*4) + MOVL CX, (R9) + CMPL (DX)(BX*1), SI JEQ match_nolit_loop_encodeSnappyBlockAsm64K INCL CX JMP search_loop_encodeSnappyBlockAsm64K @@ -12021,8 +11968,8 @@ zero_loop_encodeSnappyBlockAsm12B: MOVL $0x00000000, 12(SP) MOVQ src_len+32(FP), CX LEAQ -9(CX), DX - LEAQ -8(CX), SI - MOVL SI, 8(SP) + LEAQ -8(CX), BX + MOVL BX, 8(SP) SHRQ $0x05, CX SUBL CX, DX LEAQ (AX)(DX*1), DX @@ -12032,278 +11979,278 @@ zero_loop_encodeSnappyBlockAsm12B: MOVQ src_base+24(FP), DX search_loop_encodeSnappyBlockAsm12B: - MOVL CX, SI - SUBL 12(SP), SI - SHRL $0x05, SI - LEAL 4(CX)(SI*1), SI - CMPL SI, 8(SP) + MOVL CX, BX + SUBL 12(SP), BX + SHRL $0x05, BX + LEAL 4(CX)(BX*1), BX + CMPL BX, 8(SP) JGE emit_remainder_encodeSnappyBlockAsm12B - MOVQ (DX)(CX*1), DI - MOVL SI, 20(SP) - MOVQ $0x000000cf1bbcdcbb, R9 - MOVQ DI, R10 - MOVQ DI, R11 - SHRQ $0x08, R11 - SHLQ $0x18, R10 - IMULQ R9, R10 - SHRQ $0x34, R10 - SHLQ $0x18, R11 - IMULQ R9, R11 - SHRQ $0x34, R11 - MOVL 24(SP)(R10*4), SI - MOVL 24(SP)(R11*4), R8 - MOVL CX, 24(SP)(R10*4) - LEAL 1(CX), R10 - MOVL R10, 24(SP)(R11*4) - MOVQ DI, R10 - SHRQ $0x10, R10 + MOVQ (DX)(CX*1), SI + MOVL BX, 20(SP) + MOVQ $0x000000cf1bbcdcbb, R8 + MOVQ SI, R9 + MOVQ SI, R10 + SHRQ $0x08, R10 + SHLQ $0x18, R9 + IMULQ R8, R9 + SHRQ $0x34, R9 SHLQ $0x18, R10 - IMULQ R9, R10 + IMULQ R8, R10 SHRQ $0x34, R10 - MOVL CX, R9 - SUBL 16(SP), R9 - MOVL 1(DX)(R9*1), R11 - MOVQ DI, R9 - SHRQ $0x08, R9 - CMPL R9, R11 - JNE no_repeat_found_encodeSnappyBlockAsm12B - LEAL 1(CX), DI - MOVL 12(SP), SI - MOVL DI, R8 + MOVL 24(SP)(R9*4), BX + MOVL 24(SP)(R10*4), DI + MOVL CX, 24(SP)(R9*4) + LEAL 1(CX), R9 + MOVL R9, 24(SP)(R10*4) + MOVQ SI, R9 + SHRQ $0x10, R9 + SHLQ $0x18, R9 + IMULQ R8, R9 + SHRQ $0x34, R9 + MOVL CX, R8 SUBL 16(SP), R8 + MOVL 1(DX)(R8*1), R10 + MOVQ SI, R8 + SHRQ $0x08, R8 + CMPL R8, R10 + JNE no_repeat_found_encodeSnappyBlockAsm12B + LEAL 1(CX), SI + MOVL 12(SP), BX + MOVL SI, DI + SUBL 16(SP), DI JZ repeat_extend_back_end_encodeSnappyBlockAsm12B repeat_extend_back_loop_encodeSnappyBlockAsm12B: - CMPL DI, SI + CMPL SI, BX JLE repeat_extend_back_end_encodeSnappyBlockAsm12B - MOVB -1(DX)(R8*1), BL - MOVB -1(DX)(DI*1), R9 - CMPB BL, R9 + MOVB -1(DX)(DI*1), R8 + MOVB -1(DX)(SI*1), R9 + CMPB R8, R9 JNE repeat_extend_back_end_encodeSnappyBlockAsm12B - LEAL -1(DI), DI - DECL R8 + LEAL -1(SI), SI + DECL DI JNZ 
repeat_extend_back_loop_encodeSnappyBlockAsm12B repeat_extend_back_end_encodeSnappyBlockAsm12B: - MOVL 12(SP), SI - CMPL SI, DI + MOVL 12(SP), BX + CMPL BX, SI JEQ emit_literal_done_repeat_emit_encodeSnappyBlockAsm12B - MOVL DI, R8 - MOVL DI, 12(SP) - LEAQ (DX)(SI*1), R9 - SUBL SI, R8 - LEAL -1(R8), SI - CMPL SI, $0x3c + MOVL SI, DI + MOVL SI, 12(SP) + LEAQ (DX)(BX*1), R8 + SUBL BX, DI + LEAL -1(DI), BX + CMPL BX, $0x3c JLT one_byte_repeat_emit_encodeSnappyBlockAsm12B - CMPL SI, $0x00000100 + CMPL BX, $0x00000100 JLT two_bytes_repeat_emit_encodeSnappyBlockAsm12B MOVB $0xf4, (AX) - MOVW SI, 1(AX) + MOVW BX, 1(AX) ADDQ $0x03, AX JMP memmove_long_repeat_emit_encodeSnappyBlockAsm12B two_bytes_repeat_emit_encodeSnappyBlockAsm12B: MOVB $0xf0, (AX) - MOVB SI, 1(AX) + MOVB BL, 1(AX) ADDQ $0x02, AX - CMPL SI, $0x40 + CMPL BX, $0x40 JL memmove_repeat_emit_encodeSnappyBlockAsm12B JMP memmove_long_repeat_emit_encodeSnappyBlockAsm12B one_byte_repeat_emit_encodeSnappyBlockAsm12B: - SHLB $0x02, SI - MOVB SI, (AX) + SHLB $0x02, BL + MOVB BL, (AX) ADDQ $0x01, AX memmove_repeat_emit_encodeSnappyBlockAsm12B: - LEAQ (AX)(R8*1), SI + LEAQ (AX)(DI*1), BX // genMemMoveShort - CMPQ R8, $0x08 + CMPQ DI, $0x08 JLE emit_lit_memmove_repeat_emit_encodeSnappyBlockAsm12B_memmove_move_8 - CMPQ R8, $0x10 + CMPQ DI, $0x10 JBE emit_lit_memmove_repeat_emit_encodeSnappyBlockAsm12B_memmove_move_8through16 - CMPQ R8, $0x20 + CMPQ DI, $0x20 JBE emit_lit_memmove_repeat_emit_encodeSnappyBlockAsm12B_memmove_move_17through32 JMP emit_lit_memmove_repeat_emit_encodeSnappyBlockAsm12B_memmove_move_33through64 emit_lit_memmove_repeat_emit_encodeSnappyBlockAsm12B_memmove_move_8: - MOVQ (R9), R10 - MOVQ R10, (AX) + MOVQ (R8), R9 + MOVQ R9, (AX) JMP memmove_end_copy_repeat_emit_encodeSnappyBlockAsm12B emit_lit_memmove_repeat_emit_encodeSnappyBlockAsm12B_memmove_move_8through16: - MOVQ (R9), R10 - MOVQ -8(R9)(R8*1), R9 - MOVQ R10, (AX) - MOVQ R9, -8(AX)(R8*1) + MOVQ (R8), R9 + MOVQ -8(R8)(DI*1), R8 + MOVQ R9, (AX) + MOVQ R8, -8(AX)(DI*1) JMP memmove_end_copy_repeat_emit_encodeSnappyBlockAsm12B emit_lit_memmove_repeat_emit_encodeSnappyBlockAsm12B_memmove_move_17through32: - MOVOU (R9), X0 - MOVOU -16(R9)(R8*1), X1 + MOVOU (R8), X0 + MOVOU -16(R8)(DI*1), X1 MOVOU X0, (AX) - MOVOU X1, -16(AX)(R8*1) + MOVOU X1, -16(AX)(DI*1) JMP memmove_end_copy_repeat_emit_encodeSnappyBlockAsm12B emit_lit_memmove_repeat_emit_encodeSnappyBlockAsm12B_memmove_move_33through64: - MOVOU (R9), X0 - MOVOU 16(R9), X1 - MOVOU -32(R9)(R8*1), X2 - MOVOU -16(R9)(R8*1), X3 + MOVOU (R8), X0 + MOVOU 16(R8), X1 + MOVOU -32(R8)(DI*1), X2 + MOVOU -16(R8)(DI*1), X3 MOVOU X0, (AX) MOVOU X1, 16(AX) - MOVOU X2, -32(AX)(R8*1) - MOVOU X3, -16(AX)(R8*1) + MOVOU X2, -32(AX)(DI*1) + MOVOU X3, -16(AX)(DI*1) memmove_end_copy_repeat_emit_encodeSnappyBlockAsm12B: - MOVQ SI, AX + MOVQ BX, AX JMP emit_literal_done_repeat_emit_encodeSnappyBlockAsm12B memmove_long_repeat_emit_encodeSnappyBlockAsm12B: - LEAQ (AX)(R8*1), SI + LEAQ (AX)(DI*1), BX // genMemMoveLong - MOVOU (R9), X0 - MOVOU 16(R9), X1 - MOVOU -32(R9)(R8*1), X2 - MOVOU -16(R9)(R8*1), X3 - MOVQ R8, R11 - SHRQ $0x05, R11 - MOVQ AX, R10 - ANDL $0x0000001f, R10 - MOVQ $0x00000040, R12 - SUBQ R10, R12 - DECQ R11 + MOVOU (R8), X0 + MOVOU 16(R8), X1 + MOVOU -32(R8)(DI*1), X2 + MOVOU -16(R8)(DI*1), X3 + MOVQ DI, R10 + SHRQ $0x05, R10 + MOVQ AX, R9 + ANDL $0x0000001f, R9 + MOVQ $0x00000040, R11 + SUBQ R9, R11 + DECQ R10 JA emit_lit_memmove_long_repeat_emit_encodeSnappyBlockAsm12Blarge_forward_sse_loop_32 - LEAQ -32(R9)(R12*1), R10 - LEAQ 
-32(AX)(R12*1), R13 + LEAQ -32(R8)(R11*1), R9 + LEAQ -32(AX)(R11*1), R12 emit_lit_memmove_long_repeat_emit_encodeSnappyBlockAsm12Blarge_big_loop_back: - MOVOU (R10), X4 - MOVOU 16(R10), X5 - MOVOA X4, (R13) - MOVOA X5, 16(R13) - ADDQ $0x20, R13 - ADDQ $0x20, R10 + MOVOU (R9), X4 + MOVOU 16(R9), X5 + MOVOA X4, (R12) + MOVOA X5, 16(R12) ADDQ $0x20, R12 - DECQ R11 + ADDQ $0x20, R9 + ADDQ $0x20, R11 + DECQ R10 JNA emit_lit_memmove_long_repeat_emit_encodeSnappyBlockAsm12Blarge_big_loop_back emit_lit_memmove_long_repeat_emit_encodeSnappyBlockAsm12Blarge_forward_sse_loop_32: - MOVOU -32(R9)(R12*1), X4 - MOVOU -16(R9)(R12*1), X5 - MOVOA X4, -32(AX)(R12*1) - MOVOA X5, -16(AX)(R12*1) - ADDQ $0x20, R12 - CMPQ R8, R12 + MOVOU -32(R8)(R11*1), X4 + MOVOU -16(R8)(R11*1), X5 + MOVOA X4, -32(AX)(R11*1) + MOVOA X5, -16(AX)(R11*1) + ADDQ $0x20, R11 + CMPQ DI, R11 JAE emit_lit_memmove_long_repeat_emit_encodeSnappyBlockAsm12Blarge_forward_sse_loop_32 MOVOU X0, (AX) MOVOU X1, 16(AX) - MOVOU X2, -32(AX)(R8*1) - MOVOU X3, -16(AX)(R8*1) - MOVQ SI, AX + MOVOU X2, -32(AX)(DI*1) + MOVOU X3, -16(AX)(DI*1) + MOVQ BX, AX emit_literal_done_repeat_emit_encodeSnappyBlockAsm12B: ADDL $0x05, CX - MOVL CX, SI - SUBL 16(SP), SI - MOVQ src_len+32(FP), R8 - SUBL CX, R8 - LEAQ (DX)(CX*1), R9 - LEAQ (DX)(SI*1), SI + MOVL CX, BX + SUBL 16(SP), BX + MOVQ src_len+32(FP), DI + SUBL CX, DI + LEAQ (DX)(CX*1), R8 + LEAQ (DX)(BX*1), BX // matchLen - XORL R11, R11 - CMPL R8, $0x08 + XORL R10, R10 + CMPL DI, $0x08 JL matchlen_match4_repeat_extend_encodeSnappyBlockAsm12B matchlen_loopback_repeat_extend_encodeSnappyBlockAsm12B: - MOVQ (R9)(R11*1), R10 - XORQ (SI)(R11*1), R10 - TESTQ R10, R10 + MOVQ (R8)(R10*1), R9 + XORQ (BX)(R10*1), R9 + TESTQ R9, R9 JZ matchlen_loop_repeat_extend_encodeSnappyBlockAsm12B #ifdef GOAMD64_v3 - TZCNTQ R10, R10 + TZCNTQ R9, R9 #else - BSFQ R10, R10 + BSFQ R9, R9 #endif - SARQ $0x03, R10 - LEAL (R11)(R10*1), R11 + SARQ $0x03, R9 + LEAL (R10)(R9*1), R10 JMP repeat_extend_forward_end_encodeSnappyBlockAsm12B matchlen_loop_repeat_extend_encodeSnappyBlockAsm12B: - LEAL -8(R8), R8 - LEAL 8(R11), R11 - CMPL R8, $0x08 + LEAL -8(DI), DI + LEAL 8(R10), R10 + CMPL DI, $0x08 JGE matchlen_loopback_repeat_extend_encodeSnappyBlockAsm12B JZ repeat_extend_forward_end_encodeSnappyBlockAsm12B matchlen_match4_repeat_extend_encodeSnappyBlockAsm12B: - CMPL R8, $0x04 + CMPL DI, $0x04 JL matchlen_match2_repeat_extend_encodeSnappyBlockAsm12B - MOVL (R9)(R11*1), R10 - CMPL (SI)(R11*1), R10 + MOVL (R8)(R10*1), R9 + CMPL (BX)(R10*1), R9 JNE matchlen_match2_repeat_extend_encodeSnappyBlockAsm12B - SUBL $0x04, R8 - LEAL 4(R11), R11 + SUBL $0x04, DI + LEAL 4(R10), R10 matchlen_match2_repeat_extend_encodeSnappyBlockAsm12B: - CMPL R8, $0x02 + CMPL DI, $0x02 JL matchlen_match1_repeat_extend_encodeSnappyBlockAsm12B - MOVW (R9)(R11*1), R10 - CMPW (SI)(R11*1), R10 + MOVW (R8)(R10*1), R9 + CMPW (BX)(R10*1), R9 JNE matchlen_match1_repeat_extend_encodeSnappyBlockAsm12B - SUBL $0x02, R8 - LEAL 2(R11), R11 + SUBL $0x02, DI + LEAL 2(R10), R10 matchlen_match1_repeat_extend_encodeSnappyBlockAsm12B: - CMPL R8, $0x01 + CMPL DI, $0x01 JL repeat_extend_forward_end_encodeSnappyBlockAsm12B - MOVB (R9)(R11*1), R10 - CMPB (SI)(R11*1), R10 + MOVB (R8)(R10*1), R9 + CMPB (BX)(R10*1), R9 JNE repeat_extend_forward_end_encodeSnappyBlockAsm12B - LEAL 1(R11), R11 + LEAL 1(R10), R10 repeat_extend_forward_end_encodeSnappyBlockAsm12B: - ADDL R11, CX - MOVL CX, SI - SUBL DI, SI - MOVL 16(SP), DI + ADDL R10, CX + MOVL CX, BX + SUBL SI, BX + MOVL 16(SP), SI // emitCopy 
two_byte_offset_repeat_as_copy_encodeSnappyBlockAsm12B: - CMPL SI, $0x40 + CMPL BX, $0x40 JLE two_byte_offset_short_repeat_as_copy_encodeSnappyBlockAsm12B MOVB $0xee, (AX) - MOVW DI, 1(AX) - LEAL -60(SI), SI + MOVW SI, 1(AX) + LEAL -60(BX), BX ADDQ $0x03, AX JMP two_byte_offset_repeat_as_copy_encodeSnappyBlockAsm12B two_byte_offset_short_repeat_as_copy_encodeSnappyBlockAsm12B: - CMPL SI, $0x0c + MOVL BX, DI + SHLL $0x02, DI + CMPL BX, $0x0c JGE emit_copy_three_repeat_as_copy_encodeSnappyBlockAsm12B - CMPL DI, $0x00000800 + CMPL SI, $0x00000800 JGE emit_copy_three_repeat_as_copy_encodeSnappyBlockAsm12B - MOVB $0x01, BL - LEAL -16(BX)(SI*4), SI - MOVB DI, 1(AX) - SHRL $0x08, DI - SHLL $0x05, DI - ORL DI, SI - MOVB SI, (AX) + LEAL -15(DI), DI + MOVB SI, 1(AX) + SHRL $0x08, SI + SHLL $0x05, SI + ORL SI, DI + MOVB DI, (AX) ADDQ $0x02, AX JMP repeat_end_emit_encodeSnappyBlockAsm12B emit_copy_three_repeat_as_copy_encodeSnappyBlockAsm12B: - MOVB $0x02, BL - LEAL -4(BX)(SI*4), SI - MOVB SI, (AX) - MOVW DI, 1(AX) + LEAL -2(DI), DI + MOVB DI, (AX) + MOVW SI, 1(AX) ADDQ $0x03, AX repeat_end_emit_encodeSnappyBlockAsm12B: @@ -12311,16 +12258,16 @@ repeat_end_emit_encodeSnappyBlockAsm12B: JMP search_loop_encodeSnappyBlockAsm12B no_repeat_found_encodeSnappyBlockAsm12B: - CMPL (DX)(SI*1), DI + CMPL (DX)(BX*1), SI JEQ candidate_match_encodeSnappyBlockAsm12B - SHRQ $0x08, DI - MOVL 24(SP)(R10*4), SI - LEAL 2(CX), R9 - CMPL (DX)(R8*1), DI + SHRQ $0x08, SI + MOVL 24(SP)(R9*4), BX + LEAL 2(CX), R8 + CMPL (DX)(DI*1), SI JEQ candidate2_match_encodeSnappyBlockAsm12B - MOVL R9, 24(SP)(R10*4) - SHRQ $0x08, DI - CMPL (DX)(SI*1), DI + MOVL R8, 24(SP)(R9*4) + SHRQ $0x08, SI + CMPL (DX)(BX*1), SI JEQ candidate3_match_encodeSnappyBlockAsm12B MOVL 20(SP), CX JMP search_loop_encodeSnappyBlockAsm12B @@ -12330,288 +12277,288 @@ candidate3_match_encodeSnappyBlockAsm12B: JMP candidate_match_encodeSnappyBlockAsm12B candidate2_match_encodeSnappyBlockAsm12B: - MOVL R9, 24(SP)(R10*4) + MOVL R8, 24(SP)(R9*4) INCL CX - MOVL R8, SI + MOVL DI, BX candidate_match_encodeSnappyBlockAsm12B: - MOVL 12(SP), DI - TESTL SI, SI + MOVL 12(SP), SI + TESTL BX, BX JZ match_extend_back_end_encodeSnappyBlockAsm12B match_extend_back_loop_encodeSnappyBlockAsm12B: - CMPL CX, DI + CMPL CX, SI JLE match_extend_back_end_encodeSnappyBlockAsm12B - MOVB -1(DX)(SI*1), BL + MOVB -1(DX)(BX*1), DI MOVB -1(DX)(CX*1), R8 - CMPB BL, R8 + CMPB DI, R8 JNE match_extend_back_end_encodeSnappyBlockAsm12B LEAL -1(CX), CX - DECL SI + DECL BX JZ match_extend_back_end_encodeSnappyBlockAsm12B JMP match_extend_back_loop_encodeSnappyBlockAsm12B match_extend_back_end_encodeSnappyBlockAsm12B: - MOVL CX, DI - SUBL 12(SP), DI - LEAQ 3(AX)(DI*1), DI - CMPQ DI, (SP) + MOVL CX, SI + SUBL 12(SP), SI + LEAQ 3(AX)(SI*1), SI + CMPQ SI, (SP) JL match_dst_size_check_encodeSnappyBlockAsm12B MOVQ $0x00000000, ret+48(FP) RET match_dst_size_check_encodeSnappyBlockAsm12B: - MOVL CX, DI - MOVL 12(SP), R8 - CMPL R8, DI + MOVL CX, SI + MOVL 12(SP), DI + CMPL DI, SI JEQ emit_literal_done_match_emit_encodeSnappyBlockAsm12B - MOVL DI, R9 - MOVL DI, 12(SP) - LEAQ (DX)(R8*1), DI - SUBL R8, R9 - LEAL -1(R9), R8 - CMPL R8, $0x3c + MOVL SI, R8 + MOVL SI, 12(SP) + LEAQ (DX)(DI*1), SI + SUBL DI, R8 + LEAL -1(R8), DI + CMPL DI, $0x3c JLT one_byte_match_emit_encodeSnappyBlockAsm12B - CMPL R8, $0x00000100 + CMPL DI, $0x00000100 JLT two_bytes_match_emit_encodeSnappyBlockAsm12B MOVB $0xf4, (AX) - MOVW R8, 1(AX) + MOVW DI, 1(AX) ADDQ $0x03, AX JMP memmove_long_match_emit_encodeSnappyBlockAsm12B 
two_bytes_match_emit_encodeSnappyBlockAsm12B: MOVB $0xf0, (AX) - MOVB R8, 1(AX) + MOVB DI, 1(AX) ADDQ $0x02, AX - CMPL R8, $0x40 + CMPL DI, $0x40 JL memmove_match_emit_encodeSnappyBlockAsm12B JMP memmove_long_match_emit_encodeSnappyBlockAsm12B one_byte_match_emit_encodeSnappyBlockAsm12B: - SHLB $0x02, R8 - MOVB R8, (AX) + SHLB $0x02, DI + MOVB DI, (AX) ADDQ $0x01, AX memmove_match_emit_encodeSnappyBlockAsm12B: - LEAQ (AX)(R9*1), R8 + LEAQ (AX)(R8*1), DI // genMemMoveShort - CMPQ R9, $0x08 + CMPQ R8, $0x08 JLE emit_lit_memmove_match_emit_encodeSnappyBlockAsm12B_memmove_move_8 - CMPQ R9, $0x10 + CMPQ R8, $0x10 JBE emit_lit_memmove_match_emit_encodeSnappyBlockAsm12B_memmove_move_8through16 - CMPQ R9, $0x20 + CMPQ R8, $0x20 JBE emit_lit_memmove_match_emit_encodeSnappyBlockAsm12B_memmove_move_17through32 JMP emit_lit_memmove_match_emit_encodeSnappyBlockAsm12B_memmove_move_33through64 emit_lit_memmove_match_emit_encodeSnappyBlockAsm12B_memmove_move_8: - MOVQ (DI), R10 - MOVQ R10, (AX) + MOVQ (SI), R9 + MOVQ R9, (AX) JMP memmove_end_copy_match_emit_encodeSnappyBlockAsm12B emit_lit_memmove_match_emit_encodeSnappyBlockAsm12B_memmove_move_8through16: - MOVQ (DI), R10 - MOVQ -8(DI)(R9*1), DI - MOVQ R10, (AX) - MOVQ DI, -8(AX)(R9*1) + MOVQ (SI), R9 + MOVQ -8(SI)(R8*1), SI + MOVQ R9, (AX) + MOVQ SI, -8(AX)(R8*1) JMP memmove_end_copy_match_emit_encodeSnappyBlockAsm12B emit_lit_memmove_match_emit_encodeSnappyBlockAsm12B_memmove_move_17through32: - MOVOU (DI), X0 - MOVOU -16(DI)(R9*1), X1 + MOVOU (SI), X0 + MOVOU -16(SI)(R8*1), X1 MOVOU X0, (AX) - MOVOU X1, -16(AX)(R9*1) + MOVOU X1, -16(AX)(R8*1) JMP memmove_end_copy_match_emit_encodeSnappyBlockAsm12B emit_lit_memmove_match_emit_encodeSnappyBlockAsm12B_memmove_move_33through64: - MOVOU (DI), X0 - MOVOU 16(DI), X1 - MOVOU -32(DI)(R9*1), X2 - MOVOU -16(DI)(R9*1), X3 + MOVOU (SI), X0 + MOVOU 16(SI), X1 + MOVOU -32(SI)(R8*1), X2 + MOVOU -16(SI)(R8*1), X3 MOVOU X0, (AX) MOVOU X1, 16(AX) - MOVOU X2, -32(AX)(R9*1) - MOVOU X3, -16(AX)(R9*1) + MOVOU X2, -32(AX)(R8*1) + MOVOU X3, -16(AX)(R8*1) memmove_end_copy_match_emit_encodeSnappyBlockAsm12B: - MOVQ R8, AX + MOVQ DI, AX JMP emit_literal_done_match_emit_encodeSnappyBlockAsm12B memmove_long_match_emit_encodeSnappyBlockAsm12B: - LEAQ (AX)(R9*1), R8 + LEAQ (AX)(R8*1), DI // genMemMoveLong - MOVOU (DI), X0 - MOVOU 16(DI), X1 - MOVOU -32(DI)(R9*1), X2 - MOVOU -16(DI)(R9*1), X3 - MOVQ R9, R11 - SHRQ $0x05, R11 - MOVQ AX, R10 - ANDL $0x0000001f, R10 - MOVQ $0x00000040, R12 - SUBQ R10, R12 - DECQ R11 + MOVOU (SI), X0 + MOVOU 16(SI), X1 + MOVOU -32(SI)(R8*1), X2 + MOVOU -16(SI)(R8*1), X3 + MOVQ R8, R10 + SHRQ $0x05, R10 + MOVQ AX, R9 + ANDL $0x0000001f, R9 + MOVQ $0x00000040, R11 + SUBQ R9, R11 + DECQ R10 JA emit_lit_memmove_long_match_emit_encodeSnappyBlockAsm12Blarge_forward_sse_loop_32 - LEAQ -32(DI)(R12*1), R10 - LEAQ -32(AX)(R12*1), R13 + LEAQ -32(SI)(R11*1), R9 + LEAQ -32(AX)(R11*1), R12 emit_lit_memmove_long_match_emit_encodeSnappyBlockAsm12Blarge_big_loop_back: - MOVOU (R10), X4 - MOVOU 16(R10), X5 - MOVOA X4, (R13) - MOVOA X5, 16(R13) - ADDQ $0x20, R13 - ADDQ $0x20, R10 + MOVOU (R9), X4 + MOVOU 16(R9), X5 + MOVOA X4, (R12) + MOVOA X5, 16(R12) ADDQ $0x20, R12 - DECQ R11 + ADDQ $0x20, R9 + ADDQ $0x20, R11 + DECQ R10 JNA emit_lit_memmove_long_match_emit_encodeSnappyBlockAsm12Blarge_big_loop_back emit_lit_memmove_long_match_emit_encodeSnappyBlockAsm12Blarge_forward_sse_loop_32: - MOVOU -32(DI)(R12*1), X4 - MOVOU -16(DI)(R12*1), X5 - MOVOA X4, -32(AX)(R12*1) - MOVOA X5, -16(AX)(R12*1) - ADDQ $0x20, R12 - CMPQ R9, R12 
+ MOVOU -32(SI)(R11*1), X4 + MOVOU -16(SI)(R11*1), X5 + MOVOA X4, -32(AX)(R11*1) + MOVOA X5, -16(AX)(R11*1) + ADDQ $0x20, R11 + CMPQ R8, R11 JAE emit_lit_memmove_long_match_emit_encodeSnappyBlockAsm12Blarge_forward_sse_loop_32 MOVOU X0, (AX) MOVOU X1, 16(AX) - MOVOU X2, -32(AX)(R9*1) - MOVOU X3, -16(AX)(R9*1) - MOVQ R8, AX + MOVOU X2, -32(AX)(R8*1) + MOVOU X3, -16(AX)(R8*1) + MOVQ DI, AX emit_literal_done_match_emit_encodeSnappyBlockAsm12B: match_nolit_loop_encodeSnappyBlockAsm12B: - MOVL CX, DI - SUBL SI, DI - MOVL DI, 16(SP) + MOVL CX, SI + SUBL BX, SI + MOVL SI, 16(SP) ADDL $0x04, CX - ADDL $0x04, SI - MOVQ src_len+32(FP), DI - SUBL CX, DI - LEAQ (DX)(CX*1), R8 - LEAQ (DX)(SI*1), SI + ADDL $0x04, BX + MOVQ src_len+32(FP), SI + SUBL CX, SI + LEAQ (DX)(CX*1), DI + LEAQ (DX)(BX*1), BX // matchLen - XORL R10, R10 - CMPL DI, $0x08 + XORL R9, R9 + CMPL SI, $0x08 JL matchlen_match4_match_nolit_encodeSnappyBlockAsm12B matchlen_loopback_match_nolit_encodeSnappyBlockAsm12B: - MOVQ (R8)(R10*1), R9 - XORQ (SI)(R10*1), R9 - TESTQ R9, R9 + MOVQ (DI)(R9*1), R8 + XORQ (BX)(R9*1), R8 + TESTQ R8, R8 JZ matchlen_loop_match_nolit_encodeSnappyBlockAsm12B #ifdef GOAMD64_v3 - TZCNTQ R9, R9 + TZCNTQ R8, R8 #else - BSFQ R9, R9 + BSFQ R8, R8 #endif - SARQ $0x03, R9 - LEAL (R10)(R9*1), R10 + SARQ $0x03, R8 + LEAL (R9)(R8*1), R9 JMP match_nolit_end_encodeSnappyBlockAsm12B matchlen_loop_match_nolit_encodeSnappyBlockAsm12B: - LEAL -8(DI), DI - LEAL 8(R10), R10 - CMPL DI, $0x08 + LEAL -8(SI), SI + LEAL 8(R9), R9 + CMPL SI, $0x08 JGE matchlen_loopback_match_nolit_encodeSnappyBlockAsm12B JZ match_nolit_end_encodeSnappyBlockAsm12B matchlen_match4_match_nolit_encodeSnappyBlockAsm12B: - CMPL DI, $0x04 + CMPL SI, $0x04 JL matchlen_match2_match_nolit_encodeSnappyBlockAsm12B - MOVL (R8)(R10*1), R9 - CMPL (SI)(R10*1), R9 + MOVL (DI)(R9*1), R8 + CMPL (BX)(R9*1), R8 JNE matchlen_match2_match_nolit_encodeSnappyBlockAsm12B - SUBL $0x04, DI - LEAL 4(R10), R10 + SUBL $0x04, SI + LEAL 4(R9), R9 matchlen_match2_match_nolit_encodeSnappyBlockAsm12B: - CMPL DI, $0x02 + CMPL SI, $0x02 JL matchlen_match1_match_nolit_encodeSnappyBlockAsm12B - MOVW (R8)(R10*1), R9 - CMPW (SI)(R10*1), R9 + MOVW (DI)(R9*1), R8 + CMPW (BX)(R9*1), R8 JNE matchlen_match1_match_nolit_encodeSnappyBlockAsm12B - SUBL $0x02, DI - LEAL 2(R10), R10 + SUBL $0x02, SI + LEAL 2(R9), R9 matchlen_match1_match_nolit_encodeSnappyBlockAsm12B: - CMPL DI, $0x01 + CMPL SI, $0x01 JL match_nolit_end_encodeSnappyBlockAsm12B - MOVB (R8)(R10*1), R9 - CMPB (SI)(R10*1), R9 + MOVB (DI)(R9*1), R8 + CMPB (BX)(R9*1), R8 JNE match_nolit_end_encodeSnappyBlockAsm12B - LEAL 1(R10), R10 + LEAL 1(R9), R9 match_nolit_end_encodeSnappyBlockAsm12B: - ADDL R10, CX - MOVL 16(SP), SI - ADDL $0x04, R10 + ADDL R9, CX + MOVL 16(SP), BX + ADDL $0x04, R9 MOVL CX, 12(SP) // emitCopy two_byte_offset_match_nolit_encodeSnappyBlockAsm12B: - CMPL R10, $0x40 + CMPL R9, $0x40 JLE two_byte_offset_short_match_nolit_encodeSnappyBlockAsm12B MOVB $0xee, (AX) - MOVW SI, 1(AX) - LEAL -60(R10), R10 + MOVW BX, 1(AX) + LEAL -60(R9), R9 ADDQ $0x03, AX JMP two_byte_offset_match_nolit_encodeSnappyBlockAsm12B two_byte_offset_short_match_nolit_encodeSnappyBlockAsm12B: - CMPL R10, $0x0c + MOVL R9, SI + SHLL $0x02, SI + CMPL R9, $0x0c JGE emit_copy_three_match_nolit_encodeSnappyBlockAsm12B - CMPL SI, $0x00000800 + CMPL BX, $0x00000800 JGE emit_copy_three_match_nolit_encodeSnappyBlockAsm12B - MOVB $0x01, BL - LEAL -16(BX)(R10*4), R10 - MOVB SI, 1(AX) - SHRL $0x08, SI - SHLL $0x05, SI - ORL SI, R10 - MOVB R10, (AX) + LEAL -15(SI), SI + 
MOVB BL, 1(AX) + SHRL $0x08, BX + SHLL $0x05, BX + ORL BX, SI + MOVB SI, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeSnappyBlockAsm12B emit_copy_three_match_nolit_encodeSnappyBlockAsm12B: - MOVB $0x02, BL - LEAL -4(BX)(R10*4), R10 - MOVB R10, (AX) - MOVW SI, 1(AX) + LEAL -2(SI), SI + MOVB SI, (AX) + MOVW BX, 1(AX) ADDQ $0x03, AX match_nolit_emitcopy_end_encodeSnappyBlockAsm12B: CMPL CX, 8(SP) JGE emit_remainder_encodeSnappyBlockAsm12B - MOVQ -2(DX)(CX*1), DI + MOVQ -2(DX)(CX*1), SI CMPQ AX, (SP) JL match_nolit_dst_ok_encodeSnappyBlockAsm12B MOVQ $0x00000000, ret+48(FP) RET match_nolit_dst_ok_encodeSnappyBlockAsm12B: - MOVQ $0x000000cf1bbcdcbb, R9 - MOVQ DI, R8 - SHRQ $0x10, DI - MOVQ DI, SI - SHLQ $0x18, R8 - IMULQ R9, R8 - SHRQ $0x34, R8 - SHLQ $0x18, SI - IMULQ R9, SI - SHRQ $0x34, SI - LEAL -2(CX), R9 - LEAQ 24(SP)(SI*4), R10 - MOVL (R10), SI - MOVL R9, 24(SP)(R8*4) - MOVL CX, (R10) - CMPL (DX)(SI*1), DI + MOVQ $0x000000cf1bbcdcbb, R8 + MOVQ SI, DI + SHRQ $0x10, SI + MOVQ SI, BX + SHLQ $0x18, DI + IMULQ R8, DI + SHRQ $0x34, DI + SHLQ $0x18, BX + IMULQ R8, BX + SHRQ $0x34, BX + LEAL -2(CX), R8 + LEAQ 24(SP)(BX*4), R9 + MOVL (R9), BX + MOVL R8, 24(SP)(DI*4) + MOVL CX, (R9) + CMPL (DX)(BX*1), SI JEQ match_nolit_loop_encodeSnappyBlockAsm12B INCL CX JMP search_loop_encodeSnappyBlockAsm12B @@ -12796,8 +12743,8 @@ zero_loop_encodeSnappyBlockAsm10B: MOVL $0x00000000, 12(SP) MOVQ src_len+32(FP), CX LEAQ -9(CX), DX - LEAQ -8(CX), SI - MOVL SI, 8(SP) + LEAQ -8(CX), BX + MOVL BX, 8(SP) SHRQ $0x05, CX SUBL CX, DX LEAQ (AX)(DX*1), DX @@ -12807,278 +12754,278 @@ zero_loop_encodeSnappyBlockAsm10B: MOVQ src_base+24(FP), DX search_loop_encodeSnappyBlockAsm10B: - MOVL CX, SI - SUBL 12(SP), SI - SHRL $0x05, SI - LEAL 4(CX)(SI*1), SI - CMPL SI, 8(SP) + MOVL CX, BX + SUBL 12(SP), BX + SHRL $0x05, BX + LEAL 4(CX)(BX*1), BX + CMPL BX, 8(SP) JGE emit_remainder_encodeSnappyBlockAsm10B - MOVQ (DX)(CX*1), DI - MOVL SI, 20(SP) - MOVQ $0x9e3779b1, R9 - MOVQ DI, R10 - MOVQ DI, R11 - SHRQ $0x08, R11 - SHLQ $0x20, R10 - IMULQ R9, R10 - SHRQ $0x36, R10 - SHLQ $0x20, R11 - IMULQ R9, R11 - SHRQ $0x36, R11 - MOVL 24(SP)(R10*4), SI - MOVL 24(SP)(R11*4), R8 - MOVL CX, 24(SP)(R10*4) - LEAL 1(CX), R10 - MOVL R10, 24(SP)(R11*4) - MOVQ DI, R10 - SHRQ $0x10, R10 + MOVQ (DX)(CX*1), SI + MOVL BX, 20(SP) + MOVQ $0x9e3779b1, R8 + MOVQ SI, R9 + MOVQ SI, R10 + SHRQ $0x08, R10 + SHLQ $0x20, R9 + IMULQ R8, R9 + SHRQ $0x36, R9 SHLQ $0x20, R10 - IMULQ R9, R10 + IMULQ R8, R10 SHRQ $0x36, R10 - MOVL CX, R9 - SUBL 16(SP), R9 - MOVL 1(DX)(R9*1), R11 - MOVQ DI, R9 - SHRQ $0x08, R9 - CMPL R9, R11 - JNE no_repeat_found_encodeSnappyBlockAsm10B - LEAL 1(CX), DI - MOVL 12(SP), SI - MOVL DI, R8 + MOVL 24(SP)(R9*4), BX + MOVL 24(SP)(R10*4), DI + MOVL CX, 24(SP)(R9*4) + LEAL 1(CX), R9 + MOVL R9, 24(SP)(R10*4) + MOVQ SI, R9 + SHRQ $0x10, R9 + SHLQ $0x20, R9 + IMULQ R8, R9 + SHRQ $0x36, R9 + MOVL CX, R8 SUBL 16(SP), R8 + MOVL 1(DX)(R8*1), R10 + MOVQ SI, R8 + SHRQ $0x08, R8 + CMPL R8, R10 + JNE no_repeat_found_encodeSnappyBlockAsm10B + LEAL 1(CX), SI + MOVL 12(SP), BX + MOVL SI, DI + SUBL 16(SP), DI JZ repeat_extend_back_end_encodeSnappyBlockAsm10B repeat_extend_back_loop_encodeSnappyBlockAsm10B: - CMPL DI, SI + CMPL SI, BX JLE repeat_extend_back_end_encodeSnappyBlockAsm10B - MOVB -1(DX)(R8*1), BL - MOVB -1(DX)(DI*1), R9 - CMPB BL, R9 + MOVB -1(DX)(DI*1), R8 + MOVB -1(DX)(SI*1), R9 + CMPB R8, R9 JNE repeat_extend_back_end_encodeSnappyBlockAsm10B - LEAL -1(DI), DI - DECL R8 + LEAL -1(SI), SI + DECL DI JNZ 
repeat_extend_back_loop_encodeSnappyBlockAsm10B repeat_extend_back_end_encodeSnappyBlockAsm10B: - MOVL 12(SP), SI - CMPL SI, DI + MOVL 12(SP), BX + CMPL BX, SI JEQ emit_literal_done_repeat_emit_encodeSnappyBlockAsm10B - MOVL DI, R8 - MOVL DI, 12(SP) - LEAQ (DX)(SI*1), R9 - SUBL SI, R8 - LEAL -1(R8), SI - CMPL SI, $0x3c + MOVL SI, DI + MOVL SI, 12(SP) + LEAQ (DX)(BX*1), R8 + SUBL BX, DI + LEAL -1(DI), BX + CMPL BX, $0x3c JLT one_byte_repeat_emit_encodeSnappyBlockAsm10B - CMPL SI, $0x00000100 + CMPL BX, $0x00000100 JLT two_bytes_repeat_emit_encodeSnappyBlockAsm10B MOVB $0xf4, (AX) - MOVW SI, 1(AX) + MOVW BX, 1(AX) ADDQ $0x03, AX JMP memmove_long_repeat_emit_encodeSnappyBlockAsm10B two_bytes_repeat_emit_encodeSnappyBlockAsm10B: MOVB $0xf0, (AX) - MOVB SI, 1(AX) + MOVB BL, 1(AX) ADDQ $0x02, AX - CMPL SI, $0x40 + CMPL BX, $0x40 JL memmove_repeat_emit_encodeSnappyBlockAsm10B JMP memmove_long_repeat_emit_encodeSnappyBlockAsm10B one_byte_repeat_emit_encodeSnappyBlockAsm10B: - SHLB $0x02, SI - MOVB SI, (AX) + SHLB $0x02, BL + MOVB BL, (AX) ADDQ $0x01, AX memmove_repeat_emit_encodeSnappyBlockAsm10B: - LEAQ (AX)(R8*1), SI + LEAQ (AX)(DI*1), BX // genMemMoveShort - CMPQ R8, $0x08 + CMPQ DI, $0x08 JLE emit_lit_memmove_repeat_emit_encodeSnappyBlockAsm10B_memmove_move_8 - CMPQ R8, $0x10 + CMPQ DI, $0x10 JBE emit_lit_memmove_repeat_emit_encodeSnappyBlockAsm10B_memmove_move_8through16 - CMPQ R8, $0x20 + CMPQ DI, $0x20 JBE emit_lit_memmove_repeat_emit_encodeSnappyBlockAsm10B_memmove_move_17through32 JMP emit_lit_memmove_repeat_emit_encodeSnappyBlockAsm10B_memmove_move_33through64 emit_lit_memmove_repeat_emit_encodeSnappyBlockAsm10B_memmove_move_8: - MOVQ (R9), R10 - MOVQ R10, (AX) + MOVQ (R8), R9 + MOVQ R9, (AX) JMP memmove_end_copy_repeat_emit_encodeSnappyBlockAsm10B emit_lit_memmove_repeat_emit_encodeSnappyBlockAsm10B_memmove_move_8through16: - MOVQ (R9), R10 - MOVQ -8(R9)(R8*1), R9 - MOVQ R10, (AX) - MOVQ R9, -8(AX)(R8*1) + MOVQ (R8), R9 + MOVQ -8(R8)(DI*1), R8 + MOVQ R9, (AX) + MOVQ R8, -8(AX)(DI*1) JMP memmove_end_copy_repeat_emit_encodeSnappyBlockAsm10B emit_lit_memmove_repeat_emit_encodeSnappyBlockAsm10B_memmove_move_17through32: - MOVOU (R9), X0 - MOVOU -16(R9)(R8*1), X1 + MOVOU (R8), X0 + MOVOU -16(R8)(DI*1), X1 MOVOU X0, (AX) - MOVOU X1, -16(AX)(R8*1) + MOVOU X1, -16(AX)(DI*1) JMP memmove_end_copy_repeat_emit_encodeSnappyBlockAsm10B emit_lit_memmove_repeat_emit_encodeSnappyBlockAsm10B_memmove_move_33through64: - MOVOU (R9), X0 - MOVOU 16(R9), X1 - MOVOU -32(R9)(R8*1), X2 - MOVOU -16(R9)(R8*1), X3 + MOVOU (R8), X0 + MOVOU 16(R8), X1 + MOVOU -32(R8)(DI*1), X2 + MOVOU -16(R8)(DI*1), X3 MOVOU X0, (AX) MOVOU X1, 16(AX) - MOVOU X2, -32(AX)(R8*1) - MOVOU X3, -16(AX)(R8*1) + MOVOU X2, -32(AX)(DI*1) + MOVOU X3, -16(AX)(DI*1) memmove_end_copy_repeat_emit_encodeSnappyBlockAsm10B: - MOVQ SI, AX + MOVQ BX, AX JMP emit_literal_done_repeat_emit_encodeSnappyBlockAsm10B memmove_long_repeat_emit_encodeSnappyBlockAsm10B: - LEAQ (AX)(R8*1), SI + LEAQ (AX)(DI*1), BX // genMemMoveLong - MOVOU (R9), X0 - MOVOU 16(R9), X1 - MOVOU -32(R9)(R8*1), X2 - MOVOU -16(R9)(R8*1), X3 - MOVQ R8, R11 - SHRQ $0x05, R11 - MOVQ AX, R10 - ANDL $0x0000001f, R10 - MOVQ $0x00000040, R12 - SUBQ R10, R12 - DECQ R11 + MOVOU (R8), X0 + MOVOU 16(R8), X1 + MOVOU -32(R8)(DI*1), X2 + MOVOU -16(R8)(DI*1), X3 + MOVQ DI, R10 + SHRQ $0x05, R10 + MOVQ AX, R9 + ANDL $0x0000001f, R9 + MOVQ $0x00000040, R11 + SUBQ R9, R11 + DECQ R10 JA emit_lit_memmove_long_repeat_emit_encodeSnappyBlockAsm10Blarge_forward_sse_loop_32 - LEAQ -32(R9)(R12*1), R10 - LEAQ 
-32(AX)(R12*1), R13 + LEAQ -32(R8)(R11*1), R9 + LEAQ -32(AX)(R11*1), R12 emit_lit_memmove_long_repeat_emit_encodeSnappyBlockAsm10Blarge_big_loop_back: - MOVOU (R10), X4 - MOVOU 16(R10), X5 - MOVOA X4, (R13) - MOVOA X5, 16(R13) - ADDQ $0x20, R13 - ADDQ $0x20, R10 + MOVOU (R9), X4 + MOVOU 16(R9), X5 + MOVOA X4, (R12) + MOVOA X5, 16(R12) ADDQ $0x20, R12 - DECQ R11 + ADDQ $0x20, R9 + ADDQ $0x20, R11 + DECQ R10 JNA emit_lit_memmove_long_repeat_emit_encodeSnappyBlockAsm10Blarge_big_loop_back emit_lit_memmove_long_repeat_emit_encodeSnappyBlockAsm10Blarge_forward_sse_loop_32: - MOVOU -32(R9)(R12*1), X4 - MOVOU -16(R9)(R12*1), X5 - MOVOA X4, -32(AX)(R12*1) - MOVOA X5, -16(AX)(R12*1) - ADDQ $0x20, R12 - CMPQ R8, R12 + MOVOU -32(R8)(R11*1), X4 + MOVOU -16(R8)(R11*1), X5 + MOVOA X4, -32(AX)(R11*1) + MOVOA X5, -16(AX)(R11*1) + ADDQ $0x20, R11 + CMPQ DI, R11 JAE emit_lit_memmove_long_repeat_emit_encodeSnappyBlockAsm10Blarge_forward_sse_loop_32 MOVOU X0, (AX) MOVOU X1, 16(AX) - MOVOU X2, -32(AX)(R8*1) - MOVOU X3, -16(AX)(R8*1) - MOVQ SI, AX + MOVOU X2, -32(AX)(DI*1) + MOVOU X3, -16(AX)(DI*1) + MOVQ BX, AX emit_literal_done_repeat_emit_encodeSnappyBlockAsm10B: ADDL $0x05, CX - MOVL CX, SI - SUBL 16(SP), SI - MOVQ src_len+32(FP), R8 - SUBL CX, R8 - LEAQ (DX)(CX*1), R9 - LEAQ (DX)(SI*1), SI + MOVL CX, BX + SUBL 16(SP), BX + MOVQ src_len+32(FP), DI + SUBL CX, DI + LEAQ (DX)(CX*1), R8 + LEAQ (DX)(BX*1), BX // matchLen - XORL R11, R11 - CMPL R8, $0x08 + XORL R10, R10 + CMPL DI, $0x08 JL matchlen_match4_repeat_extend_encodeSnappyBlockAsm10B matchlen_loopback_repeat_extend_encodeSnappyBlockAsm10B: - MOVQ (R9)(R11*1), R10 - XORQ (SI)(R11*1), R10 - TESTQ R10, R10 + MOVQ (R8)(R10*1), R9 + XORQ (BX)(R10*1), R9 + TESTQ R9, R9 JZ matchlen_loop_repeat_extend_encodeSnappyBlockAsm10B #ifdef GOAMD64_v3 - TZCNTQ R10, R10 + TZCNTQ R9, R9 #else - BSFQ R10, R10 + BSFQ R9, R9 #endif - SARQ $0x03, R10 - LEAL (R11)(R10*1), R11 + SARQ $0x03, R9 + LEAL (R10)(R9*1), R10 JMP repeat_extend_forward_end_encodeSnappyBlockAsm10B matchlen_loop_repeat_extend_encodeSnappyBlockAsm10B: - LEAL -8(R8), R8 - LEAL 8(R11), R11 - CMPL R8, $0x08 + LEAL -8(DI), DI + LEAL 8(R10), R10 + CMPL DI, $0x08 JGE matchlen_loopback_repeat_extend_encodeSnappyBlockAsm10B JZ repeat_extend_forward_end_encodeSnappyBlockAsm10B matchlen_match4_repeat_extend_encodeSnappyBlockAsm10B: - CMPL R8, $0x04 + CMPL DI, $0x04 JL matchlen_match2_repeat_extend_encodeSnappyBlockAsm10B - MOVL (R9)(R11*1), R10 - CMPL (SI)(R11*1), R10 + MOVL (R8)(R10*1), R9 + CMPL (BX)(R10*1), R9 JNE matchlen_match2_repeat_extend_encodeSnappyBlockAsm10B - SUBL $0x04, R8 - LEAL 4(R11), R11 + SUBL $0x04, DI + LEAL 4(R10), R10 matchlen_match2_repeat_extend_encodeSnappyBlockAsm10B: - CMPL R8, $0x02 + CMPL DI, $0x02 JL matchlen_match1_repeat_extend_encodeSnappyBlockAsm10B - MOVW (R9)(R11*1), R10 - CMPW (SI)(R11*1), R10 + MOVW (R8)(R10*1), R9 + CMPW (BX)(R10*1), R9 JNE matchlen_match1_repeat_extend_encodeSnappyBlockAsm10B - SUBL $0x02, R8 - LEAL 2(R11), R11 + SUBL $0x02, DI + LEAL 2(R10), R10 matchlen_match1_repeat_extend_encodeSnappyBlockAsm10B: - CMPL R8, $0x01 + CMPL DI, $0x01 JL repeat_extend_forward_end_encodeSnappyBlockAsm10B - MOVB (R9)(R11*1), R10 - CMPB (SI)(R11*1), R10 + MOVB (R8)(R10*1), R9 + CMPB (BX)(R10*1), R9 JNE repeat_extend_forward_end_encodeSnappyBlockAsm10B - LEAL 1(R11), R11 + LEAL 1(R10), R10 repeat_extend_forward_end_encodeSnappyBlockAsm10B: - ADDL R11, CX - MOVL CX, SI - SUBL DI, SI - MOVL 16(SP), DI + ADDL R10, CX + MOVL CX, BX + SUBL SI, BX + MOVL 16(SP), SI // emitCopy 
two_byte_offset_repeat_as_copy_encodeSnappyBlockAsm10B: - CMPL SI, $0x40 + CMPL BX, $0x40 JLE two_byte_offset_short_repeat_as_copy_encodeSnappyBlockAsm10B MOVB $0xee, (AX) - MOVW DI, 1(AX) - LEAL -60(SI), SI + MOVW SI, 1(AX) + LEAL -60(BX), BX ADDQ $0x03, AX JMP two_byte_offset_repeat_as_copy_encodeSnappyBlockAsm10B two_byte_offset_short_repeat_as_copy_encodeSnappyBlockAsm10B: - CMPL SI, $0x0c + MOVL BX, DI + SHLL $0x02, DI + CMPL BX, $0x0c JGE emit_copy_three_repeat_as_copy_encodeSnappyBlockAsm10B - CMPL DI, $0x00000800 + CMPL SI, $0x00000800 JGE emit_copy_three_repeat_as_copy_encodeSnappyBlockAsm10B - MOVB $0x01, BL - LEAL -16(BX)(SI*4), SI - MOVB DI, 1(AX) - SHRL $0x08, DI - SHLL $0x05, DI - ORL DI, SI - MOVB SI, (AX) + LEAL -15(DI), DI + MOVB SI, 1(AX) + SHRL $0x08, SI + SHLL $0x05, SI + ORL SI, DI + MOVB DI, (AX) ADDQ $0x02, AX JMP repeat_end_emit_encodeSnappyBlockAsm10B emit_copy_three_repeat_as_copy_encodeSnappyBlockAsm10B: - MOVB $0x02, BL - LEAL -4(BX)(SI*4), SI - MOVB SI, (AX) - MOVW DI, 1(AX) + LEAL -2(DI), DI + MOVB DI, (AX) + MOVW SI, 1(AX) ADDQ $0x03, AX repeat_end_emit_encodeSnappyBlockAsm10B: @@ -13086,16 +13033,16 @@ repeat_end_emit_encodeSnappyBlockAsm10B: JMP search_loop_encodeSnappyBlockAsm10B no_repeat_found_encodeSnappyBlockAsm10B: - CMPL (DX)(SI*1), DI + CMPL (DX)(BX*1), SI JEQ candidate_match_encodeSnappyBlockAsm10B - SHRQ $0x08, DI - MOVL 24(SP)(R10*4), SI - LEAL 2(CX), R9 - CMPL (DX)(R8*1), DI + SHRQ $0x08, SI + MOVL 24(SP)(R9*4), BX + LEAL 2(CX), R8 + CMPL (DX)(DI*1), SI JEQ candidate2_match_encodeSnappyBlockAsm10B - MOVL R9, 24(SP)(R10*4) - SHRQ $0x08, DI - CMPL (DX)(SI*1), DI + MOVL R8, 24(SP)(R9*4) + SHRQ $0x08, SI + CMPL (DX)(BX*1), SI JEQ candidate3_match_encodeSnappyBlockAsm10B MOVL 20(SP), CX JMP search_loop_encodeSnappyBlockAsm10B @@ -13105,288 +13052,288 @@ candidate3_match_encodeSnappyBlockAsm10B: JMP candidate_match_encodeSnappyBlockAsm10B candidate2_match_encodeSnappyBlockAsm10B: - MOVL R9, 24(SP)(R10*4) + MOVL R8, 24(SP)(R9*4) INCL CX - MOVL R8, SI + MOVL DI, BX candidate_match_encodeSnappyBlockAsm10B: - MOVL 12(SP), DI - TESTL SI, SI + MOVL 12(SP), SI + TESTL BX, BX JZ match_extend_back_end_encodeSnappyBlockAsm10B match_extend_back_loop_encodeSnappyBlockAsm10B: - CMPL CX, DI + CMPL CX, SI JLE match_extend_back_end_encodeSnappyBlockAsm10B - MOVB -1(DX)(SI*1), BL + MOVB -1(DX)(BX*1), DI MOVB -1(DX)(CX*1), R8 - CMPB BL, R8 + CMPB DI, R8 JNE match_extend_back_end_encodeSnappyBlockAsm10B LEAL -1(CX), CX - DECL SI + DECL BX JZ match_extend_back_end_encodeSnappyBlockAsm10B JMP match_extend_back_loop_encodeSnappyBlockAsm10B match_extend_back_end_encodeSnappyBlockAsm10B: - MOVL CX, DI - SUBL 12(SP), DI - LEAQ 3(AX)(DI*1), DI - CMPQ DI, (SP) + MOVL CX, SI + SUBL 12(SP), SI + LEAQ 3(AX)(SI*1), SI + CMPQ SI, (SP) JL match_dst_size_check_encodeSnappyBlockAsm10B MOVQ $0x00000000, ret+48(FP) RET match_dst_size_check_encodeSnappyBlockAsm10B: - MOVL CX, DI - MOVL 12(SP), R8 - CMPL R8, DI + MOVL CX, SI + MOVL 12(SP), DI + CMPL DI, SI JEQ emit_literal_done_match_emit_encodeSnappyBlockAsm10B - MOVL DI, R9 - MOVL DI, 12(SP) - LEAQ (DX)(R8*1), DI - SUBL R8, R9 - LEAL -1(R9), R8 - CMPL R8, $0x3c + MOVL SI, R8 + MOVL SI, 12(SP) + LEAQ (DX)(DI*1), SI + SUBL DI, R8 + LEAL -1(R8), DI + CMPL DI, $0x3c JLT one_byte_match_emit_encodeSnappyBlockAsm10B - CMPL R8, $0x00000100 + CMPL DI, $0x00000100 JLT two_bytes_match_emit_encodeSnappyBlockAsm10B MOVB $0xf4, (AX) - MOVW R8, 1(AX) + MOVW DI, 1(AX) ADDQ $0x03, AX JMP memmove_long_match_emit_encodeSnappyBlockAsm10B 
two_bytes_match_emit_encodeSnappyBlockAsm10B: MOVB $0xf0, (AX) - MOVB R8, 1(AX) + MOVB DI, 1(AX) ADDQ $0x02, AX - CMPL R8, $0x40 + CMPL DI, $0x40 JL memmove_match_emit_encodeSnappyBlockAsm10B JMP memmove_long_match_emit_encodeSnappyBlockAsm10B one_byte_match_emit_encodeSnappyBlockAsm10B: - SHLB $0x02, R8 - MOVB R8, (AX) + SHLB $0x02, DI + MOVB DI, (AX) ADDQ $0x01, AX memmove_match_emit_encodeSnappyBlockAsm10B: - LEAQ (AX)(R9*1), R8 + LEAQ (AX)(R8*1), DI // genMemMoveShort - CMPQ R9, $0x08 + CMPQ R8, $0x08 JLE emit_lit_memmove_match_emit_encodeSnappyBlockAsm10B_memmove_move_8 - CMPQ R9, $0x10 + CMPQ R8, $0x10 JBE emit_lit_memmove_match_emit_encodeSnappyBlockAsm10B_memmove_move_8through16 - CMPQ R9, $0x20 + CMPQ R8, $0x20 JBE emit_lit_memmove_match_emit_encodeSnappyBlockAsm10B_memmove_move_17through32 JMP emit_lit_memmove_match_emit_encodeSnappyBlockAsm10B_memmove_move_33through64 emit_lit_memmove_match_emit_encodeSnappyBlockAsm10B_memmove_move_8: - MOVQ (DI), R10 - MOVQ R10, (AX) + MOVQ (SI), R9 + MOVQ R9, (AX) JMP memmove_end_copy_match_emit_encodeSnappyBlockAsm10B emit_lit_memmove_match_emit_encodeSnappyBlockAsm10B_memmove_move_8through16: - MOVQ (DI), R10 - MOVQ -8(DI)(R9*1), DI - MOVQ R10, (AX) - MOVQ DI, -8(AX)(R9*1) + MOVQ (SI), R9 + MOVQ -8(SI)(R8*1), SI + MOVQ R9, (AX) + MOVQ SI, -8(AX)(R8*1) JMP memmove_end_copy_match_emit_encodeSnappyBlockAsm10B emit_lit_memmove_match_emit_encodeSnappyBlockAsm10B_memmove_move_17through32: - MOVOU (DI), X0 - MOVOU -16(DI)(R9*1), X1 + MOVOU (SI), X0 + MOVOU -16(SI)(R8*1), X1 MOVOU X0, (AX) - MOVOU X1, -16(AX)(R9*1) + MOVOU X1, -16(AX)(R8*1) JMP memmove_end_copy_match_emit_encodeSnappyBlockAsm10B emit_lit_memmove_match_emit_encodeSnappyBlockAsm10B_memmove_move_33through64: - MOVOU (DI), X0 - MOVOU 16(DI), X1 - MOVOU -32(DI)(R9*1), X2 - MOVOU -16(DI)(R9*1), X3 + MOVOU (SI), X0 + MOVOU 16(SI), X1 + MOVOU -32(SI)(R8*1), X2 + MOVOU -16(SI)(R8*1), X3 MOVOU X0, (AX) MOVOU X1, 16(AX) - MOVOU X2, -32(AX)(R9*1) - MOVOU X3, -16(AX)(R9*1) + MOVOU X2, -32(AX)(R8*1) + MOVOU X3, -16(AX)(R8*1) memmove_end_copy_match_emit_encodeSnappyBlockAsm10B: - MOVQ R8, AX + MOVQ DI, AX JMP emit_literal_done_match_emit_encodeSnappyBlockAsm10B memmove_long_match_emit_encodeSnappyBlockAsm10B: - LEAQ (AX)(R9*1), R8 + LEAQ (AX)(R8*1), DI // genMemMoveLong - MOVOU (DI), X0 - MOVOU 16(DI), X1 - MOVOU -32(DI)(R9*1), X2 - MOVOU -16(DI)(R9*1), X3 - MOVQ R9, R11 - SHRQ $0x05, R11 - MOVQ AX, R10 - ANDL $0x0000001f, R10 - MOVQ $0x00000040, R12 - SUBQ R10, R12 - DECQ R11 + MOVOU (SI), X0 + MOVOU 16(SI), X1 + MOVOU -32(SI)(R8*1), X2 + MOVOU -16(SI)(R8*1), X3 + MOVQ R8, R10 + SHRQ $0x05, R10 + MOVQ AX, R9 + ANDL $0x0000001f, R9 + MOVQ $0x00000040, R11 + SUBQ R9, R11 + DECQ R10 JA emit_lit_memmove_long_match_emit_encodeSnappyBlockAsm10Blarge_forward_sse_loop_32 - LEAQ -32(DI)(R12*1), R10 - LEAQ -32(AX)(R12*1), R13 + LEAQ -32(SI)(R11*1), R9 + LEAQ -32(AX)(R11*1), R12 emit_lit_memmove_long_match_emit_encodeSnappyBlockAsm10Blarge_big_loop_back: - MOVOU (R10), X4 - MOVOU 16(R10), X5 - MOVOA X4, (R13) - MOVOA X5, 16(R13) - ADDQ $0x20, R13 - ADDQ $0x20, R10 + MOVOU (R9), X4 + MOVOU 16(R9), X5 + MOVOA X4, (R12) + MOVOA X5, 16(R12) ADDQ $0x20, R12 - DECQ R11 + ADDQ $0x20, R9 + ADDQ $0x20, R11 + DECQ R10 JNA emit_lit_memmove_long_match_emit_encodeSnappyBlockAsm10Blarge_big_loop_back emit_lit_memmove_long_match_emit_encodeSnappyBlockAsm10Blarge_forward_sse_loop_32: - MOVOU -32(DI)(R12*1), X4 - MOVOU -16(DI)(R12*1), X5 - MOVOA X4, -32(AX)(R12*1) - MOVOA X5, -16(AX)(R12*1) - ADDQ $0x20, R12 - CMPQ R9, R12 
+ MOVOU -32(SI)(R11*1), X4 + MOVOU -16(SI)(R11*1), X5 + MOVOA X4, -32(AX)(R11*1) + MOVOA X5, -16(AX)(R11*1) + ADDQ $0x20, R11 + CMPQ R8, R11 JAE emit_lit_memmove_long_match_emit_encodeSnappyBlockAsm10Blarge_forward_sse_loop_32 MOVOU X0, (AX) MOVOU X1, 16(AX) - MOVOU X2, -32(AX)(R9*1) - MOVOU X3, -16(AX)(R9*1) - MOVQ R8, AX + MOVOU X2, -32(AX)(R8*1) + MOVOU X3, -16(AX)(R8*1) + MOVQ DI, AX emit_literal_done_match_emit_encodeSnappyBlockAsm10B: match_nolit_loop_encodeSnappyBlockAsm10B: - MOVL CX, DI - SUBL SI, DI - MOVL DI, 16(SP) + MOVL CX, SI + SUBL BX, SI + MOVL SI, 16(SP) ADDL $0x04, CX - ADDL $0x04, SI - MOVQ src_len+32(FP), DI - SUBL CX, DI - LEAQ (DX)(CX*1), R8 - LEAQ (DX)(SI*1), SI + ADDL $0x04, BX + MOVQ src_len+32(FP), SI + SUBL CX, SI + LEAQ (DX)(CX*1), DI + LEAQ (DX)(BX*1), BX // matchLen - XORL R10, R10 - CMPL DI, $0x08 + XORL R9, R9 + CMPL SI, $0x08 JL matchlen_match4_match_nolit_encodeSnappyBlockAsm10B matchlen_loopback_match_nolit_encodeSnappyBlockAsm10B: - MOVQ (R8)(R10*1), R9 - XORQ (SI)(R10*1), R9 - TESTQ R9, R9 + MOVQ (DI)(R9*1), R8 + XORQ (BX)(R9*1), R8 + TESTQ R8, R8 JZ matchlen_loop_match_nolit_encodeSnappyBlockAsm10B #ifdef GOAMD64_v3 - TZCNTQ R9, R9 + TZCNTQ R8, R8 #else - BSFQ R9, R9 + BSFQ R8, R8 #endif - SARQ $0x03, R9 - LEAL (R10)(R9*1), R10 + SARQ $0x03, R8 + LEAL (R9)(R8*1), R9 JMP match_nolit_end_encodeSnappyBlockAsm10B matchlen_loop_match_nolit_encodeSnappyBlockAsm10B: - LEAL -8(DI), DI - LEAL 8(R10), R10 - CMPL DI, $0x08 + LEAL -8(SI), SI + LEAL 8(R9), R9 + CMPL SI, $0x08 JGE matchlen_loopback_match_nolit_encodeSnappyBlockAsm10B JZ match_nolit_end_encodeSnappyBlockAsm10B matchlen_match4_match_nolit_encodeSnappyBlockAsm10B: - CMPL DI, $0x04 + CMPL SI, $0x04 JL matchlen_match2_match_nolit_encodeSnappyBlockAsm10B - MOVL (R8)(R10*1), R9 - CMPL (SI)(R10*1), R9 + MOVL (DI)(R9*1), R8 + CMPL (BX)(R9*1), R8 JNE matchlen_match2_match_nolit_encodeSnappyBlockAsm10B - SUBL $0x04, DI - LEAL 4(R10), R10 + SUBL $0x04, SI + LEAL 4(R9), R9 matchlen_match2_match_nolit_encodeSnappyBlockAsm10B: - CMPL DI, $0x02 + CMPL SI, $0x02 JL matchlen_match1_match_nolit_encodeSnappyBlockAsm10B - MOVW (R8)(R10*1), R9 - CMPW (SI)(R10*1), R9 + MOVW (DI)(R9*1), R8 + CMPW (BX)(R9*1), R8 JNE matchlen_match1_match_nolit_encodeSnappyBlockAsm10B - SUBL $0x02, DI - LEAL 2(R10), R10 + SUBL $0x02, SI + LEAL 2(R9), R9 matchlen_match1_match_nolit_encodeSnappyBlockAsm10B: - CMPL DI, $0x01 + CMPL SI, $0x01 JL match_nolit_end_encodeSnappyBlockAsm10B - MOVB (R8)(R10*1), R9 - CMPB (SI)(R10*1), R9 + MOVB (DI)(R9*1), R8 + CMPB (BX)(R9*1), R8 JNE match_nolit_end_encodeSnappyBlockAsm10B - LEAL 1(R10), R10 + LEAL 1(R9), R9 match_nolit_end_encodeSnappyBlockAsm10B: - ADDL R10, CX - MOVL 16(SP), SI - ADDL $0x04, R10 + ADDL R9, CX + MOVL 16(SP), BX + ADDL $0x04, R9 MOVL CX, 12(SP) // emitCopy two_byte_offset_match_nolit_encodeSnappyBlockAsm10B: - CMPL R10, $0x40 + CMPL R9, $0x40 JLE two_byte_offset_short_match_nolit_encodeSnappyBlockAsm10B MOVB $0xee, (AX) - MOVW SI, 1(AX) - LEAL -60(R10), R10 + MOVW BX, 1(AX) + LEAL -60(R9), R9 ADDQ $0x03, AX JMP two_byte_offset_match_nolit_encodeSnappyBlockAsm10B two_byte_offset_short_match_nolit_encodeSnappyBlockAsm10B: - CMPL R10, $0x0c + MOVL R9, SI + SHLL $0x02, SI + CMPL R9, $0x0c JGE emit_copy_three_match_nolit_encodeSnappyBlockAsm10B - CMPL SI, $0x00000800 + CMPL BX, $0x00000800 JGE emit_copy_three_match_nolit_encodeSnappyBlockAsm10B - MOVB $0x01, BL - LEAL -16(BX)(R10*4), R10 - MOVB SI, 1(AX) - SHRL $0x08, SI - SHLL $0x05, SI - ORL SI, R10 - MOVB R10, (AX) + LEAL -15(SI), SI + 
MOVB BL, 1(AX) + SHRL $0x08, BX + SHLL $0x05, BX + ORL BX, SI + MOVB SI, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeSnappyBlockAsm10B emit_copy_three_match_nolit_encodeSnappyBlockAsm10B: - MOVB $0x02, BL - LEAL -4(BX)(R10*4), R10 - MOVB R10, (AX) - MOVW SI, 1(AX) + LEAL -2(SI), SI + MOVB SI, (AX) + MOVW BX, 1(AX) ADDQ $0x03, AX match_nolit_emitcopy_end_encodeSnappyBlockAsm10B: CMPL CX, 8(SP) JGE emit_remainder_encodeSnappyBlockAsm10B - MOVQ -2(DX)(CX*1), DI + MOVQ -2(DX)(CX*1), SI CMPQ AX, (SP) JL match_nolit_dst_ok_encodeSnappyBlockAsm10B MOVQ $0x00000000, ret+48(FP) RET match_nolit_dst_ok_encodeSnappyBlockAsm10B: - MOVQ $0x9e3779b1, R9 - MOVQ DI, R8 - SHRQ $0x10, DI - MOVQ DI, SI - SHLQ $0x20, R8 - IMULQ R9, R8 - SHRQ $0x36, R8 - SHLQ $0x20, SI - IMULQ R9, SI - SHRQ $0x36, SI - LEAL -2(CX), R9 - LEAQ 24(SP)(SI*4), R10 - MOVL (R10), SI - MOVL R9, 24(SP)(R8*4) - MOVL CX, (R10) - CMPL (DX)(SI*1), DI + MOVQ $0x9e3779b1, R8 + MOVQ SI, DI + SHRQ $0x10, SI + MOVQ SI, BX + SHLQ $0x20, DI + IMULQ R8, DI + SHRQ $0x36, DI + SHLQ $0x20, BX + IMULQ R8, BX + SHRQ $0x36, BX + LEAL -2(CX), R8 + LEAQ 24(SP)(BX*4), R9 + MOVL (R9), BX + MOVL R8, 24(SP)(DI*4) + MOVL CX, (R9) + CMPL (DX)(BX*1), SI JEQ match_nolit_loop_encodeSnappyBlockAsm10B INCL CX JMP search_loop_encodeSnappyBlockAsm10B @@ -13571,8 +13518,8 @@ zero_loop_encodeSnappyBlockAsm8B: MOVL $0x00000000, 12(SP) MOVQ src_len+32(FP), CX LEAQ -9(CX), DX - LEAQ -8(CX), SI - MOVL SI, 8(SP) + LEAQ -8(CX), BX + MOVL BX, 8(SP) SHRQ $0x05, CX SUBL CX, DX LEAQ (AX)(DX*1), DX @@ -13582,276 +13529,276 @@ zero_loop_encodeSnappyBlockAsm8B: MOVQ src_base+24(FP), DX search_loop_encodeSnappyBlockAsm8B: - MOVL CX, SI - SUBL 12(SP), SI - SHRL $0x04, SI - LEAL 4(CX)(SI*1), SI - CMPL SI, 8(SP) + MOVL CX, BX + SUBL 12(SP), BX + SHRL $0x04, BX + LEAL 4(CX)(BX*1), BX + CMPL BX, 8(SP) JGE emit_remainder_encodeSnappyBlockAsm8B - MOVQ (DX)(CX*1), DI - MOVL SI, 20(SP) - MOVQ $0x9e3779b1, R9 - MOVQ DI, R10 - MOVQ DI, R11 - SHRQ $0x08, R11 - SHLQ $0x20, R10 - IMULQ R9, R10 - SHRQ $0x38, R10 - SHLQ $0x20, R11 - IMULQ R9, R11 - SHRQ $0x38, R11 - MOVL 24(SP)(R10*4), SI - MOVL 24(SP)(R11*4), R8 - MOVL CX, 24(SP)(R10*4) - LEAL 1(CX), R10 - MOVL R10, 24(SP)(R11*4) - MOVQ DI, R10 - SHRQ $0x10, R10 + MOVQ (DX)(CX*1), SI + MOVL BX, 20(SP) + MOVQ $0x9e3779b1, R8 + MOVQ SI, R9 + MOVQ SI, R10 + SHRQ $0x08, R10 + SHLQ $0x20, R9 + IMULQ R8, R9 + SHRQ $0x38, R9 SHLQ $0x20, R10 - IMULQ R9, R10 + IMULQ R8, R10 SHRQ $0x38, R10 - MOVL CX, R9 - SUBL 16(SP), R9 - MOVL 1(DX)(R9*1), R11 - MOVQ DI, R9 - SHRQ $0x08, R9 - CMPL R9, R11 - JNE no_repeat_found_encodeSnappyBlockAsm8B - LEAL 1(CX), DI - MOVL 12(SP), SI - MOVL DI, R8 + MOVL 24(SP)(R9*4), BX + MOVL 24(SP)(R10*4), DI + MOVL CX, 24(SP)(R9*4) + LEAL 1(CX), R9 + MOVL R9, 24(SP)(R10*4) + MOVQ SI, R9 + SHRQ $0x10, R9 + SHLQ $0x20, R9 + IMULQ R8, R9 + SHRQ $0x38, R9 + MOVL CX, R8 SUBL 16(SP), R8 + MOVL 1(DX)(R8*1), R10 + MOVQ SI, R8 + SHRQ $0x08, R8 + CMPL R8, R10 + JNE no_repeat_found_encodeSnappyBlockAsm8B + LEAL 1(CX), SI + MOVL 12(SP), BX + MOVL SI, DI + SUBL 16(SP), DI JZ repeat_extend_back_end_encodeSnappyBlockAsm8B repeat_extend_back_loop_encodeSnappyBlockAsm8B: - CMPL DI, SI + CMPL SI, BX JLE repeat_extend_back_end_encodeSnappyBlockAsm8B - MOVB -1(DX)(R8*1), BL - MOVB -1(DX)(DI*1), R9 - CMPB BL, R9 + MOVB -1(DX)(DI*1), R8 + MOVB -1(DX)(SI*1), R9 + CMPB R8, R9 JNE repeat_extend_back_end_encodeSnappyBlockAsm8B - LEAL -1(DI), DI - DECL R8 + LEAL -1(SI), SI + DECL DI JNZ repeat_extend_back_loop_encodeSnappyBlockAsm8B 
repeat_extend_back_end_encodeSnappyBlockAsm8B: - MOVL 12(SP), SI - CMPL SI, DI + MOVL 12(SP), BX + CMPL BX, SI JEQ emit_literal_done_repeat_emit_encodeSnappyBlockAsm8B - MOVL DI, R8 - MOVL DI, 12(SP) - LEAQ (DX)(SI*1), R9 - SUBL SI, R8 - LEAL -1(R8), SI - CMPL SI, $0x3c + MOVL SI, DI + MOVL SI, 12(SP) + LEAQ (DX)(BX*1), R8 + SUBL BX, DI + LEAL -1(DI), BX + CMPL BX, $0x3c JLT one_byte_repeat_emit_encodeSnappyBlockAsm8B - CMPL SI, $0x00000100 + CMPL BX, $0x00000100 JLT two_bytes_repeat_emit_encodeSnappyBlockAsm8B MOVB $0xf4, (AX) - MOVW SI, 1(AX) + MOVW BX, 1(AX) ADDQ $0x03, AX JMP memmove_long_repeat_emit_encodeSnappyBlockAsm8B two_bytes_repeat_emit_encodeSnappyBlockAsm8B: MOVB $0xf0, (AX) - MOVB SI, 1(AX) + MOVB BL, 1(AX) ADDQ $0x02, AX - CMPL SI, $0x40 + CMPL BX, $0x40 JL memmove_repeat_emit_encodeSnappyBlockAsm8B JMP memmove_long_repeat_emit_encodeSnappyBlockAsm8B one_byte_repeat_emit_encodeSnappyBlockAsm8B: - SHLB $0x02, SI - MOVB SI, (AX) + SHLB $0x02, BL + MOVB BL, (AX) ADDQ $0x01, AX memmove_repeat_emit_encodeSnappyBlockAsm8B: - LEAQ (AX)(R8*1), SI + LEAQ (AX)(DI*1), BX // genMemMoveShort - CMPQ R8, $0x08 + CMPQ DI, $0x08 JLE emit_lit_memmove_repeat_emit_encodeSnappyBlockAsm8B_memmove_move_8 - CMPQ R8, $0x10 + CMPQ DI, $0x10 JBE emit_lit_memmove_repeat_emit_encodeSnappyBlockAsm8B_memmove_move_8through16 - CMPQ R8, $0x20 + CMPQ DI, $0x20 JBE emit_lit_memmove_repeat_emit_encodeSnappyBlockAsm8B_memmove_move_17through32 JMP emit_lit_memmove_repeat_emit_encodeSnappyBlockAsm8B_memmove_move_33through64 emit_lit_memmove_repeat_emit_encodeSnappyBlockAsm8B_memmove_move_8: - MOVQ (R9), R10 - MOVQ R10, (AX) + MOVQ (R8), R9 + MOVQ R9, (AX) JMP memmove_end_copy_repeat_emit_encodeSnappyBlockAsm8B emit_lit_memmove_repeat_emit_encodeSnappyBlockAsm8B_memmove_move_8through16: - MOVQ (R9), R10 - MOVQ -8(R9)(R8*1), R9 - MOVQ R10, (AX) - MOVQ R9, -8(AX)(R8*1) + MOVQ (R8), R9 + MOVQ -8(R8)(DI*1), R8 + MOVQ R9, (AX) + MOVQ R8, -8(AX)(DI*1) JMP memmove_end_copy_repeat_emit_encodeSnappyBlockAsm8B emit_lit_memmove_repeat_emit_encodeSnappyBlockAsm8B_memmove_move_17through32: - MOVOU (R9), X0 - MOVOU -16(R9)(R8*1), X1 + MOVOU (R8), X0 + MOVOU -16(R8)(DI*1), X1 MOVOU X0, (AX) - MOVOU X1, -16(AX)(R8*1) + MOVOU X1, -16(AX)(DI*1) JMP memmove_end_copy_repeat_emit_encodeSnappyBlockAsm8B emit_lit_memmove_repeat_emit_encodeSnappyBlockAsm8B_memmove_move_33through64: - MOVOU (R9), X0 - MOVOU 16(R9), X1 - MOVOU -32(R9)(R8*1), X2 - MOVOU -16(R9)(R8*1), X3 + MOVOU (R8), X0 + MOVOU 16(R8), X1 + MOVOU -32(R8)(DI*1), X2 + MOVOU -16(R8)(DI*1), X3 MOVOU X0, (AX) MOVOU X1, 16(AX) - MOVOU X2, -32(AX)(R8*1) - MOVOU X3, -16(AX)(R8*1) + MOVOU X2, -32(AX)(DI*1) + MOVOU X3, -16(AX)(DI*1) memmove_end_copy_repeat_emit_encodeSnappyBlockAsm8B: - MOVQ SI, AX + MOVQ BX, AX JMP emit_literal_done_repeat_emit_encodeSnappyBlockAsm8B memmove_long_repeat_emit_encodeSnappyBlockAsm8B: - LEAQ (AX)(R8*1), SI + LEAQ (AX)(DI*1), BX // genMemMoveLong - MOVOU (R9), X0 - MOVOU 16(R9), X1 - MOVOU -32(R9)(R8*1), X2 - MOVOU -16(R9)(R8*1), X3 - MOVQ R8, R11 - SHRQ $0x05, R11 - MOVQ AX, R10 - ANDL $0x0000001f, R10 - MOVQ $0x00000040, R12 - SUBQ R10, R12 - DECQ R11 + MOVOU (R8), X0 + MOVOU 16(R8), X1 + MOVOU -32(R8)(DI*1), X2 + MOVOU -16(R8)(DI*1), X3 + MOVQ DI, R10 + SHRQ $0x05, R10 + MOVQ AX, R9 + ANDL $0x0000001f, R9 + MOVQ $0x00000040, R11 + SUBQ R9, R11 + DECQ R10 JA emit_lit_memmove_long_repeat_emit_encodeSnappyBlockAsm8Blarge_forward_sse_loop_32 - LEAQ -32(R9)(R12*1), R10 - LEAQ -32(AX)(R12*1), R13 + LEAQ -32(R8)(R11*1), R9 + LEAQ -32(AX)(R11*1), R12 
emit_lit_memmove_long_repeat_emit_encodeSnappyBlockAsm8Blarge_big_loop_back: - MOVOU (R10), X4 - MOVOU 16(R10), X5 - MOVOA X4, (R13) - MOVOA X5, 16(R13) - ADDQ $0x20, R13 - ADDQ $0x20, R10 + MOVOU (R9), X4 + MOVOU 16(R9), X5 + MOVOA X4, (R12) + MOVOA X5, 16(R12) ADDQ $0x20, R12 - DECQ R11 + ADDQ $0x20, R9 + ADDQ $0x20, R11 + DECQ R10 JNA emit_lit_memmove_long_repeat_emit_encodeSnappyBlockAsm8Blarge_big_loop_back emit_lit_memmove_long_repeat_emit_encodeSnappyBlockAsm8Blarge_forward_sse_loop_32: - MOVOU -32(R9)(R12*1), X4 - MOVOU -16(R9)(R12*1), X5 - MOVOA X4, -32(AX)(R12*1) - MOVOA X5, -16(AX)(R12*1) - ADDQ $0x20, R12 - CMPQ R8, R12 + MOVOU -32(R8)(R11*1), X4 + MOVOU -16(R8)(R11*1), X5 + MOVOA X4, -32(AX)(R11*1) + MOVOA X5, -16(AX)(R11*1) + ADDQ $0x20, R11 + CMPQ DI, R11 JAE emit_lit_memmove_long_repeat_emit_encodeSnappyBlockAsm8Blarge_forward_sse_loop_32 MOVOU X0, (AX) MOVOU X1, 16(AX) - MOVOU X2, -32(AX)(R8*1) - MOVOU X3, -16(AX)(R8*1) - MOVQ SI, AX + MOVOU X2, -32(AX)(DI*1) + MOVOU X3, -16(AX)(DI*1) + MOVQ BX, AX emit_literal_done_repeat_emit_encodeSnappyBlockAsm8B: ADDL $0x05, CX - MOVL CX, SI - SUBL 16(SP), SI - MOVQ src_len+32(FP), R8 - SUBL CX, R8 - LEAQ (DX)(CX*1), R9 - LEAQ (DX)(SI*1), SI + MOVL CX, BX + SUBL 16(SP), BX + MOVQ src_len+32(FP), DI + SUBL CX, DI + LEAQ (DX)(CX*1), R8 + LEAQ (DX)(BX*1), BX // matchLen - XORL R11, R11 - CMPL R8, $0x08 + XORL R10, R10 + CMPL DI, $0x08 JL matchlen_match4_repeat_extend_encodeSnappyBlockAsm8B matchlen_loopback_repeat_extend_encodeSnappyBlockAsm8B: - MOVQ (R9)(R11*1), R10 - XORQ (SI)(R11*1), R10 - TESTQ R10, R10 + MOVQ (R8)(R10*1), R9 + XORQ (BX)(R10*1), R9 + TESTQ R9, R9 JZ matchlen_loop_repeat_extend_encodeSnappyBlockAsm8B #ifdef GOAMD64_v3 - TZCNTQ R10, R10 + TZCNTQ R9, R9 #else - BSFQ R10, R10 + BSFQ R9, R9 #endif - SARQ $0x03, R10 - LEAL (R11)(R10*1), R11 + SARQ $0x03, R9 + LEAL (R10)(R9*1), R10 JMP repeat_extend_forward_end_encodeSnappyBlockAsm8B matchlen_loop_repeat_extend_encodeSnappyBlockAsm8B: - LEAL -8(R8), R8 - LEAL 8(R11), R11 - CMPL R8, $0x08 + LEAL -8(DI), DI + LEAL 8(R10), R10 + CMPL DI, $0x08 JGE matchlen_loopback_repeat_extend_encodeSnappyBlockAsm8B JZ repeat_extend_forward_end_encodeSnappyBlockAsm8B matchlen_match4_repeat_extend_encodeSnappyBlockAsm8B: - CMPL R8, $0x04 + CMPL DI, $0x04 JL matchlen_match2_repeat_extend_encodeSnappyBlockAsm8B - MOVL (R9)(R11*1), R10 - CMPL (SI)(R11*1), R10 + MOVL (R8)(R10*1), R9 + CMPL (BX)(R10*1), R9 JNE matchlen_match2_repeat_extend_encodeSnappyBlockAsm8B - SUBL $0x04, R8 - LEAL 4(R11), R11 + SUBL $0x04, DI + LEAL 4(R10), R10 matchlen_match2_repeat_extend_encodeSnappyBlockAsm8B: - CMPL R8, $0x02 + CMPL DI, $0x02 JL matchlen_match1_repeat_extend_encodeSnappyBlockAsm8B - MOVW (R9)(R11*1), R10 - CMPW (SI)(R11*1), R10 + MOVW (R8)(R10*1), R9 + CMPW (BX)(R10*1), R9 JNE matchlen_match1_repeat_extend_encodeSnappyBlockAsm8B - SUBL $0x02, R8 - LEAL 2(R11), R11 + SUBL $0x02, DI + LEAL 2(R10), R10 matchlen_match1_repeat_extend_encodeSnappyBlockAsm8B: - CMPL R8, $0x01 + CMPL DI, $0x01 JL repeat_extend_forward_end_encodeSnappyBlockAsm8B - MOVB (R9)(R11*1), R10 - CMPB (SI)(R11*1), R10 + MOVB (R8)(R10*1), R9 + CMPB (BX)(R10*1), R9 JNE repeat_extend_forward_end_encodeSnappyBlockAsm8B - LEAL 1(R11), R11 + LEAL 1(R10), R10 repeat_extend_forward_end_encodeSnappyBlockAsm8B: - ADDL R11, CX - MOVL CX, SI - SUBL DI, SI - MOVL 16(SP), DI + ADDL R10, CX + MOVL CX, BX + SUBL SI, BX + MOVL 16(SP), SI // emitCopy two_byte_offset_repeat_as_copy_encodeSnappyBlockAsm8B: - CMPL SI, $0x40 + CMPL BX, $0x40 JLE 
two_byte_offset_short_repeat_as_copy_encodeSnappyBlockAsm8B MOVB $0xee, (AX) - MOVW DI, 1(AX) - LEAL -60(SI), SI + MOVW SI, 1(AX) + LEAL -60(BX), BX ADDQ $0x03, AX JMP two_byte_offset_repeat_as_copy_encodeSnappyBlockAsm8B two_byte_offset_short_repeat_as_copy_encodeSnappyBlockAsm8B: - CMPL SI, $0x0c + MOVL BX, DI + SHLL $0x02, DI + CMPL BX, $0x0c JGE emit_copy_three_repeat_as_copy_encodeSnappyBlockAsm8B - MOVB $0x01, BL - LEAL -16(BX)(SI*4), SI - MOVB DI, 1(AX) - SHRL $0x08, DI - SHLL $0x05, DI - ORL DI, SI - MOVB SI, (AX) + LEAL -15(DI), DI + MOVB SI, 1(AX) + SHRL $0x08, SI + SHLL $0x05, SI + ORL SI, DI + MOVB DI, (AX) ADDQ $0x02, AX JMP repeat_end_emit_encodeSnappyBlockAsm8B emit_copy_three_repeat_as_copy_encodeSnappyBlockAsm8B: - MOVB $0x02, BL - LEAL -4(BX)(SI*4), SI - MOVB SI, (AX) - MOVW DI, 1(AX) + LEAL -2(DI), DI + MOVB DI, (AX) + MOVW SI, 1(AX) ADDQ $0x03, AX repeat_end_emit_encodeSnappyBlockAsm8B: @@ -13859,16 +13806,16 @@ repeat_end_emit_encodeSnappyBlockAsm8B: JMP search_loop_encodeSnappyBlockAsm8B no_repeat_found_encodeSnappyBlockAsm8B: - CMPL (DX)(SI*1), DI + CMPL (DX)(BX*1), SI JEQ candidate_match_encodeSnappyBlockAsm8B - SHRQ $0x08, DI - MOVL 24(SP)(R10*4), SI - LEAL 2(CX), R9 - CMPL (DX)(R8*1), DI + SHRQ $0x08, SI + MOVL 24(SP)(R9*4), BX + LEAL 2(CX), R8 + CMPL (DX)(DI*1), SI JEQ candidate2_match_encodeSnappyBlockAsm8B - MOVL R9, 24(SP)(R10*4) - SHRQ $0x08, DI - CMPL (DX)(SI*1), DI + MOVL R8, 24(SP)(R9*4) + SHRQ $0x08, SI + CMPL (DX)(BX*1), SI JEQ candidate3_match_encodeSnappyBlockAsm8B MOVL 20(SP), CX JMP search_loop_encodeSnappyBlockAsm8B @@ -13878,286 +13825,286 @@ candidate3_match_encodeSnappyBlockAsm8B: JMP candidate_match_encodeSnappyBlockAsm8B candidate2_match_encodeSnappyBlockAsm8B: - MOVL R9, 24(SP)(R10*4) + MOVL R8, 24(SP)(R9*4) INCL CX - MOVL R8, SI + MOVL DI, BX candidate_match_encodeSnappyBlockAsm8B: - MOVL 12(SP), DI - TESTL SI, SI + MOVL 12(SP), SI + TESTL BX, BX JZ match_extend_back_end_encodeSnappyBlockAsm8B match_extend_back_loop_encodeSnappyBlockAsm8B: - CMPL CX, DI + CMPL CX, SI JLE match_extend_back_end_encodeSnappyBlockAsm8B - MOVB -1(DX)(SI*1), BL + MOVB -1(DX)(BX*1), DI MOVB -1(DX)(CX*1), R8 - CMPB BL, R8 + CMPB DI, R8 JNE match_extend_back_end_encodeSnappyBlockAsm8B LEAL -1(CX), CX - DECL SI + DECL BX JZ match_extend_back_end_encodeSnappyBlockAsm8B JMP match_extend_back_loop_encodeSnappyBlockAsm8B match_extend_back_end_encodeSnappyBlockAsm8B: - MOVL CX, DI - SUBL 12(SP), DI - LEAQ 3(AX)(DI*1), DI - CMPQ DI, (SP) + MOVL CX, SI + SUBL 12(SP), SI + LEAQ 3(AX)(SI*1), SI + CMPQ SI, (SP) JL match_dst_size_check_encodeSnappyBlockAsm8B MOVQ $0x00000000, ret+48(FP) RET match_dst_size_check_encodeSnappyBlockAsm8B: - MOVL CX, DI - MOVL 12(SP), R8 - CMPL R8, DI + MOVL CX, SI + MOVL 12(SP), DI + CMPL DI, SI JEQ emit_literal_done_match_emit_encodeSnappyBlockAsm8B - MOVL DI, R9 - MOVL DI, 12(SP) - LEAQ (DX)(R8*1), DI - SUBL R8, R9 - LEAL -1(R9), R8 - CMPL R8, $0x3c + MOVL SI, R8 + MOVL SI, 12(SP) + LEAQ (DX)(DI*1), SI + SUBL DI, R8 + LEAL -1(R8), DI + CMPL DI, $0x3c JLT one_byte_match_emit_encodeSnappyBlockAsm8B - CMPL R8, $0x00000100 + CMPL DI, $0x00000100 JLT two_bytes_match_emit_encodeSnappyBlockAsm8B MOVB $0xf4, (AX) - MOVW R8, 1(AX) + MOVW DI, 1(AX) ADDQ $0x03, AX JMP memmove_long_match_emit_encodeSnappyBlockAsm8B two_bytes_match_emit_encodeSnappyBlockAsm8B: MOVB $0xf0, (AX) - MOVB R8, 1(AX) + MOVB DI, 1(AX) ADDQ $0x02, AX - CMPL R8, $0x40 + CMPL DI, $0x40 JL memmove_match_emit_encodeSnappyBlockAsm8B JMP memmove_long_match_emit_encodeSnappyBlockAsm8B 
one_byte_match_emit_encodeSnappyBlockAsm8B: - SHLB $0x02, R8 - MOVB R8, (AX) + SHLB $0x02, DI + MOVB DI, (AX) ADDQ $0x01, AX memmove_match_emit_encodeSnappyBlockAsm8B: - LEAQ (AX)(R9*1), R8 + LEAQ (AX)(R8*1), DI // genMemMoveShort - CMPQ R9, $0x08 + CMPQ R8, $0x08 JLE emit_lit_memmove_match_emit_encodeSnappyBlockAsm8B_memmove_move_8 - CMPQ R9, $0x10 + CMPQ R8, $0x10 JBE emit_lit_memmove_match_emit_encodeSnappyBlockAsm8B_memmove_move_8through16 - CMPQ R9, $0x20 + CMPQ R8, $0x20 JBE emit_lit_memmove_match_emit_encodeSnappyBlockAsm8B_memmove_move_17through32 JMP emit_lit_memmove_match_emit_encodeSnappyBlockAsm8B_memmove_move_33through64 emit_lit_memmove_match_emit_encodeSnappyBlockAsm8B_memmove_move_8: - MOVQ (DI), R10 - MOVQ R10, (AX) + MOVQ (SI), R9 + MOVQ R9, (AX) JMP memmove_end_copy_match_emit_encodeSnappyBlockAsm8B emit_lit_memmove_match_emit_encodeSnappyBlockAsm8B_memmove_move_8through16: - MOVQ (DI), R10 - MOVQ -8(DI)(R9*1), DI - MOVQ R10, (AX) - MOVQ DI, -8(AX)(R9*1) + MOVQ (SI), R9 + MOVQ -8(SI)(R8*1), SI + MOVQ R9, (AX) + MOVQ SI, -8(AX)(R8*1) JMP memmove_end_copy_match_emit_encodeSnappyBlockAsm8B emit_lit_memmove_match_emit_encodeSnappyBlockAsm8B_memmove_move_17through32: - MOVOU (DI), X0 - MOVOU -16(DI)(R9*1), X1 + MOVOU (SI), X0 + MOVOU -16(SI)(R8*1), X1 MOVOU X0, (AX) - MOVOU X1, -16(AX)(R9*1) + MOVOU X1, -16(AX)(R8*1) JMP memmove_end_copy_match_emit_encodeSnappyBlockAsm8B emit_lit_memmove_match_emit_encodeSnappyBlockAsm8B_memmove_move_33through64: - MOVOU (DI), X0 - MOVOU 16(DI), X1 - MOVOU -32(DI)(R9*1), X2 - MOVOU -16(DI)(R9*1), X3 + MOVOU (SI), X0 + MOVOU 16(SI), X1 + MOVOU -32(SI)(R8*1), X2 + MOVOU -16(SI)(R8*1), X3 MOVOU X0, (AX) MOVOU X1, 16(AX) - MOVOU X2, -32(AX)(R9*1) - MOVOU X3, -16(AX)(R9*1) + MOVOU X2, -32(AX)(R8*1) + MOVOU X3, -16(AX)(R8*1) memmove_end_copy_match_emit_encodeSnappyBlockAsm8B: - MOVQ R8, AX + MOVQ DI, AX JMP emit_literal_done_match_emit_encodeSnappyBlockAsm8B memmove_long_match_emit_encodeSnappyBlockAsm8B: - LEAQ (AX)(R9*1), R8 + LEAQ (AX)(R8*1), DI // genMemMoveLong - MOVOU (DI), X0 - MOVOU 16(DI), X1 - MOVOU -32(DI)(R9*1), X2 - MOVOU -16(DI)(R9*1), X3 - MOVQ R9, R11 - SHRQ $0x05, R11 - MOVQ AX, R10 - ANDL $0x0000001f, R10 - MOVQ $0x00000040, R12 - SUBQ R10, R12 - DECQ R11 + MOVOU (SI), X0 + MOVOU 16(SI), X1 + MOVOU -32(SI)(R8*1), X2 + MOVOU -16(SI)(R8*1), X3 + MOVQ R8, R10 + SHRQ $0x05, R10 + MOVQ AX, R9 + ANDL $0x0000001f, R9 + MOVQ $0x00000040, R11 + SUBQ R9, R11 + DECQ R10 JA emit_lit_memmove_long_match_emit_encodeSnappyBlockAsm8Blarge_forward_sse_loop_32 - LEAQ -32(DI)(R12*1), R10 - LEAQ -32(AX)(R12*1), R13 + LEAQ -32(SI)(R11*1), R9 + LEAQ -32(AX)(R11*1), R12 emit_lit_memmove_long_match_emit_encodeSnappyBlockAsm8Blarge_big_loop_back: - MOVOU (R10), X4 - MOVOU 16(R10), X5 - MOVOA X4, (R13) - MOVOA X5, 16(R13) - ADDQ $0x20, R13 - ADDQ $0x20, R10 + MOVOU (R9), X4 + MOVOU 16(R9), X5 + MOVOA X4, (R12) + MOVOA X5, 16(R12) ADDQ $0x20, R12 - DECQ R11 + ADDQ $0x20, R9 + ADDQ $0x20, R11 + DECQ R10 JNA emit_lit_memmove_long_match_emit_encodeSnappyBlockAsm8Blarge_big_loop_back emit_lit_memmove_long_match_emit_encodeSnappyBlockAsm8Blarge_forward_sse_loop_32: - MOVOU -32(DI)(R12*1), X4 - MOVOU -16(DI)(R12*1), X5 - MOVOA X4, -32(AX)(R12*1) - MOVOA X5, -16(AX)(R12*1) - ADDQ $0x20, R12 - CMPQ R9, R12 + MOVOU -32(SI)(R11*1), X4 + MOVOU -16(SI)(R11*1), X5 + MOVOA X4, -32(AX)(R11*1) + MOVOA X5, -16(AX)(R11*1) + ADDQ $0x20, R11 + CMPQ R8, R11 JAE emit_lit_memmove_long_match_emit_encodeSnappyBlockAsm8Blarge_forward_sse_loop_32 MOVOU X0, (AX) MOVOU X1, 16(AX) - 
MOVOU X2, -32(AX)(R9*1) - MOVOU X3, -16(AX)(R9*1) - MOVQ R8, AX + MOVOU X2, -32(AX)(R8*1) + MOVOU X3, -16(AX)(R8*1) + MOVQ DI, AX emit_literal_done_match_emit_encodeSnappyBlockAsm8B: match_nolit_loop_encodeSnappyBlockAsm8B: - MOVL CX, DI - SUBL SI, DI - MOVL DI, 16(SP) + MOVL CX, SI + SUBL BX, SI + MOVL SI, 16(SP) ADDL $0x04, CX - ADDL $0x04, SI - MOVQ src_len+32(FP), DI - SUBL CX, DI - LEAQ (DX)(CX*1), R8 - LEAQ (DX)(SI*1), SI + ADDL $0x04, BX + MOVQ src_len+32(FP), SI + SUBL CX, SI + LEAQ (DX)(CX*1), DI + LEAQ (DX)(BX*1), BX // matchLen - XORL R10, R10 - CMPL DI, $0x08 + XORL R9, R9 + CMPL SI, $0x08 JL matchlen_match4_match_nolit_encodeSnappyBlockAsm8B matchlen_loopback_match_nolit_encodeSnappyBlockAsm8B: - MOVQ (R8)(R10*1), R9 - XORQ (SI)(R10*1), R9 - TESTQ R9, R9 + MOVQ (DI)(R9*1), R8 + XORQ (BX)(R9*1), R8 + TESTQ R8, R8 JZ matchlen_loop_match_nolit_encodeSnappyBlockAsm8B #ifdef GOAMD64_v3 - TZCNTQ R9, R9 + TZCNTQ R8, R8 #else - BSFQ R9, R9 + BSFQ R8, R8 #endif - SARQ $0x03, R9 - LEAL (R10)(R9*1), R10 + SARQ $0x03, R8 + LEAL (R9)(R8*1), R9 JMP match_nolit_end_encodeSnappyBlockAsm8B matchlen_loop_match_nolit_encodeSnappyBlockAsm8B: - LEAL -8(DI), DI - LEAL 8(R10), R10 - CMPL DI, $0x08 + LEAL -8(SI), SI + LEAL 8(R9), R9 + CMPL SI, $0x08 JGE matchlen_loopback_match_nolit_encodeSnappyBlockAsm8B JZ match_nolit_end_encodeSnappyBlockAsm8B matchlen_match4_match_nolit_encodeSnappyBlockAsm8B: - CMPL DI, $0x04 + CMPL SI, $0x04 JL matchlen_match2_match_nolit_encodeSnappyBlockAsm8B - MOVL (R8)(R10*1), R9 - CMPL (SI)(R10*1), R9 + MOVL (DI)(R9*1), R8 + CMPL (BX)(R9*1), R8 JNE matchlen_match2_match_nolit_encodeSnappyBlockAsm8B - SUBL $0x04, DI - LEAL 4(R10), R10 + SUBL $0x04, SI + LEAL 4(R9), R9 matchlen_match2_match_nolit_encodeSnappyBlockAsm8B: - CMPL DI, $0x02 + CMPL SI, $0x02 JL matchlen_match1_match_nolit_encodeSnappyBlockAsm8B - MOVW (R8)(R10*1), R9 - CMPW (SI)(R10*1), R9 + MOVW (DI)(R9*1), R8 + CMPW (BX)(R9*1), R8 JNE matchlen_match1_match_nolit_encodeSnappyBlockAsm8B - SUBL $0x02, DI - LEAL 2(R10), R10 + SUBL $0x02, SI + LEAL 2(R9), R9 matchlen_match1_match_nolit_encodeSnappyBlockAsm8B: - CMPL DI, $0x01 + CMPL SI, $0x01 JL match_nolit_end_encodeSnappyBlockAsm8B - MOVB (R8)(R10*1), R9 - CMPB (SI)(R10*1), R9 + MOVB (DI)(R9*1), R8 + CMPB (BX)(R9*1), R8 JNE match_nolit_end_encodeSnappyBlockAsm8B - LEAL 1(R10), R10 + LEAL 1(R9), R9 match_nolit_end_encodeSnappyBlockAsm8B: - ADDL R10, CX - MOVL 16(SP), SI - ADDL $0x04, R10 + ADDL R9, CX + MOVL 16(SP), BX + ADDL $0x04, R9 MOVL CX, 12(SP) // emitCopy two_byte_offset_match_nolit_encodeSnappyBlockAsm8B: - CMPL R10, $0x40 + CMPL R9, $0x40 JLE two_byte_offset_short_match_nolit_encodeSnappyBlockAsm8B MOVB $0xee, (AX) - MOVW SI, 1(AX) - LEAL -60(R10), R10 + MOVW BX, 1(AX) + LEAL -60(R9), R9 ADDQ $0x03, AX JMP two_byte_offset_match_nolit_encodeSnappyBlockAsm8B two_byte_offset_short_match_nolit_encodeSnappyBlockAsm8B: - CMPL R10, $0x0c + MOVL R9, SI + SHLL $0x02, SI + CMPL R9, $0x0c JGE emit_copy_three_match_nolit_encodeSnappyBlockAsm8B - MOVB $0x01, BL - LEAL -16(BX)(R10*4), R10 - MOVB SI, 1(AX) - SHRL $0x08, SI - SHLL $0x05, SI - ORL SI, R10 - MOVB R10, (AX) + LEAL -15(SI), SI + MOVB BL, 1(AX) + SHRL $0x08, BX + SHLL $0x05, BX + ORL BX, SI + MOVB SI, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeSnappyBlockAsm8B emit_copy_three_match_nolit_encodeSnappyBlockAsm8B: - MOVB $0x02, BL - LEAL -4(BX)(R10*4), R10 - MOVB R10, (AX) - MOVW SI, 1(AX) + LEAL -2(SI), SI + MOVB SI, (AX) + MOVW BX, 1(AX) ADDQ $0x03, AX 
match_nolit_emitcopy_end_encodeSnappyBlockAsm8B: CMPL CX, 8(SP) JGE emit_remainder_encodeSnappyBlockAsm8B - MOVQ -2(DX)(CX*1), DI + MOVQ -2(DX)(CX*1), SI CMPQ AX, (SP) JL match_nolit_dst_ok_encodeSnappyBlockAsm8B MOVQ $0x00000000, ret+48(FP) RET match_nolit_dst_ok_encodeSnappyBlockAsm8B: - MOVQ $0x9e3779b1, R9 - MOVQ DI, R8 - SHRQ $0x10, DI - MOVQ DI, SI - SHLQ $0x20, R8 - IMULQ R9, R8 - SHRQ $0x38, R8 - SHLQ $0x20, SI - IMULQ R9, SI - SHRQ $0x38, SI - LEAL -2(CX), R9 - LEAQ 24(SP)(SI*4), R10 - MOVL (R10), SI - MOVL R9, 24(SP)(R8*4) - MOVL CX, (R10) - CMPL (DX)(SI*1), DI + MOVQ $0x9e3779b1, R8 + MOVQ SI, DI + SHRQ $0x10, SI + MOVQ SI, BX + SHLQ $0x20, DI + IMULQ R8, DI + SHRQ $0x38, DI + SHLQ $0x20, BX + IMULQ R8, BX + SHRQ $0x38, BX + LEAL -2(CX), R8 + LEAQ 24(SP)(BX*4), R9 + MOVL (R9), BX + MOVL R8, 24(SP)(DI*4) + MOVL CX, (R9) + CMPL (DX)(BX*1), SI JEQ match_nolit_loop_encodeSnappyBlockAsm8B INCL CX JMP search_loop_encodeSnappyBlockAsm8B @@ -14342,8 +14289,8 @@ zero_loop_encodeSnappyBetterBlockAsm: MOVL $0x00000000, 12(SP) MOVQ src_len+32(FP), CX LEAQ -9(CX), DX - LEAQ -8(CX), SI - MOVL SI, 8(SP) + LEAQ -8(CX), BX + MOVL BX, 8(SP) SHRQ $0x05, CX SUBL CX, DX LEAQ (AX)(DX*1), DX @@ -14353,369 +14300,369 @@ zero_loop_encodeSnappyBetterBlockAsm: MOVQ src_base+24(FP), DX search_loop_encodeSnappyBetterBlockAsm: - MOVL CX, SI - SUBL 12(SP), SI - SHRL $0x07, SI - CMPL SI, $0x63 + MOVL CX, BX + SUBL 12(SP), BX + SHRL $0x07, BX + CMPL BX, $0x63 JLE check_maxskip_ok_encodeSnappyBetterBlockAsm - LEAL 100(CX), SI + LEAL 100(CX), BX JMP check_maxskip_cont_encodeSnappyBetterBlockAsm check_maxskip_ok_encodeSnappyBetterBlockAsm: - LEAL 1(CX)(SI*1), SI + LEAL 1(CX)(BX*1), BX check_maxskip_cont_encodeSnappyBetterBlockAsm: - CMPL SI, 8(SP) + CMPL BX, 8(SP) JGE emit_remainder_encodeSnappyBetterBlockAsm - MOVQ (DX)(CX*1), DI - MOVL SI, 20(SP) - MOVQ $0x00cf1bbcdcbfa563, R9 - MOVQ $0x9e3779b1, SI - MOVQ DI, R10 - MOVQ DI, R11 - SHLQ $0x08, R10 - IMULQ R9, R10 - SHRQ $0x2f, R10 - SHLQ $0x20, R11 - IMULQ SI, R11 - SHRQ $0x32, R11 - MOVL 24(SP)(R10*4), SI - MOVL 524312(SP)(R11*4), R8 - MOVL CX, 24(SP)(R10*4) - MOVL CX, 524312(SP)(R11*4) - MOVQ (DX)(SI*1), R10 - MOVQ (DX)(R8*1), R11 - CMPQ R10, DI + MOVQ (DX)(CX*1), SI + MOVL BX, 20(SP) + MOVQ $0x00cf1bbcdcbfa563, R8 + MOVQ $0x9e3779b1, BX + MOVQ SI, R9 + MOVQ SI, R10 + SHLQ $0x08, R9 + IMULQ R8, R9 + SHRQ $0x2f, R9 + SHLQ $0x20, R10 + IMULQ BX, R10 + SHRQ $0x32, R10 + MOVL 24(SP)(R9*4), BX + MOVL 524312(SP)(R10*4), DI + MOVL CX, 24(SP)(R9*4) + MOVL CX, 524312(SP)(R10*4) + MOVQ (DX)(BX*1), R9 + MOVQ (DX)(DI*1), R10 + CMPQ R9, SI JEQ candidate_match_encodeSnappyBetterBlockAsm - CMPQ R11, DI + CMPQ R10, SI JNE no_short_found_encodeSnappyBetterBlockAsm - MOVL R8, SI + MOVL DI, BX JMP candidate_match_encodeSnappyBetterBlockAsm no_short_found_encodeSnappyBetterBlockAsm: - CMPL R10, DI + CMPL R9, SI JEQ candidate_match_encodeSnappyBetterBlockAsm - CMPL R11, DI + CMPL R10, SI JEQ candidateS_match_encodeSnappyBetterBlockAsm MOVL 20(SP), CX JMP search_loop_encodeSnappyBetterBlockAsm candidateS_match_encodeSnappyBetterBlockAsm: - SHRQ $0x08, DI - MOVQ DI, R10 - SHLQ $0x08, R10 - IMULQ R9, R10 - SHRQ $0x2f, R10 - MOVL 24(SP)(R10*4), SI + SHRQ $0x08, SI + MOVQ SI, R9 + SHLQ $0x08, R9 + IMULQ R8, R9 + SHRQ $0x2f, R9 + MOVL 24(SP)(R9*4), BX INCL CX - MOVL CX, 24(SP)(R10*4) - CMPL (DX)(SI*1), DI + MOVL CX, 24(SP)(R9*4) + CMPL (DX)(BX*1), SI JEQ candidate_match_encodeSnappyBetterBlockAsm DECL CX - MOVL R8, SI + MOVL DI, BX candidate_match_encodeSnappyBetterBlockAsm: - MOVL 
12(SP), DI - TESTL SI, SI + MOVL 12(SP), SI + TESTL BX, BX JZ match_extend_back_end_encodeSnappyBetterBlockAsm match_extend_back_loop_encodeSnappyBetterBlockAsm: - CMPL CX, DI + CMPL CX, SI JLE match_extend_back_end_encodeSnappyBetterBlockAsm - MOVB -1(DX)(SI*1), BL + MOVB -1(DX)(BX*1), DI MOVB -1(DX)(CX*1), R8 - CMPB BL, R8 + CMPB DI, R8 JNE match_extend_back_end_encodeSnappyBetterBlockAsm LEAL -1(CX), CX - DECL SI + DECL BX JZ match_extend_back_end_encodeSnappyBetterBlockAsm JMP match_extend_back_loop_encodeSnappyBetterBlockAsm match_extend_back_end_encodeSnappyBetterBlockAsm: - MOVL CX, DI - SUBL 12(SP), DI - LEAQ 5(AX)(DI*1), DI - CMPQ DI, (SP) + MOVL CX, SI + SUBL 12(SP), SI + LEAQ 5(AX)(SI*1), SI + CMPQ SI, (SP) JL match_dst_size_check_encodeSnappyBetterBlockAsm MOVQ $0x00000000, ret+48(FP) RET match_dst_size_check_encodeSnappyBetterBlockAsm: - MOVL CX, DI + MOVL CX, SI ADDL $0x04, CX - ADDL $0x04, SI - MOVQ src_len+32(FP), R8 - SUBL CX, R8 - LEAQ (DX)(CX*1), R9 - LEAQ (DX)(SI*1), R10 + ADDL $0x04, BX + MOVQ src_len+32(FP), DI + SUBL CX, DI + LEAQ (DX)(CX*1), R8 + LEAQ (DX)(BX*1), R9 // matchLen - XORL R12, R12 - CMPL R8, $0x08 + XORL R11, R11 + CMPL DI, $0x08 JL matchlen_match4_match_nolit_encodeSnappyBetterBlockAsm matchlen_loopback_match_nolit_encodeSnappyBetterBlockAsm: - MOVQ (R9)(R12*1), R11 - XORQ (R10)(R12*1), R11 - TESTQ R11, R11 + MOVQ (R8)(R11*1), R10 + XORQ (R9)(R11*1), R10 + TESTQ R10, R10 JZ matchlen_loop_match_nolit_encodeSnappyBetterBlockAsm #ifdef GOAMD64_v3 - TZCNTQ R11, R11 + TZCNTQ R10, R10 #else - BSFQ R11, R11 + BSFQ R10, R10 #endif - SARQ $0x03, R11 - LEAL (R12)(R11*1), R12 + SARQ $0x03, R10 + LEAL (R11)(R10*1), R11 JMP match_nolit_end_encodeSnappyBetterBlockAsm matchlen_loop_match_nolit_encodeSnappyBetterBlockAsm: - LEAL -8(R8), R8 - LEAL 8(R12), R12 - CMPL R8, $0x08 + LEAL -8(DI), DI + LEAL 8(R11), R11 + CMPL DI, $0x08 JGE matchlen_loopback_match_nolit_encodeSnappyBetterBlockAsm JZ match_nolit_end_encodeSnappyBetterBlockAsm matchlen_match4_match_nolit_encodeSnappyBetterBlockAsm: - CMPL R8, $0x04 + CMPL DI, $0x04 JL matchlen_match2_match_nolit_encodeSnappyBetterBlockAsm - MOVL (R9)(R12*1), R11 - CMPL (R10)(R12*1), R11 + MOVL (R8)(R11*1), R10 + CMPL (R9)(R11*1), R10 JNE matchlen_match2_match_nolit_encodeSnappyBetterBlockAsm - SUBL $0x04, R8 - LEAL 4(R12), R12 + SUBL $0x04, DI + LEAL 4(R11), R11 matchlen_match2_match_nolit_encodeSnappyBetterBlockAsm: - CMPL R8, $0x02 + CMPL DI, $0x02 JL matchlen_match1_match_nolit_encodeSnappyBetterBlockAsm - MOVW (R9)(R12*1), R11 - CMPW (R10)(R12*1), R11 + MOVW (R8)(R11*1), R10 + CMPW (R9)(R11*1), R10 JNE matchlen_match1_match_nolit_encodeSnappyBetterBlockAsm - SUBL $0x02, R8 - LEAL 2(R12), R12 + SUBL $0x02, DI + LEAL 2(R11), R11 matchlen_match1_match_nolit_encodeSnappyBetterBlockAsm: - CMPL R8, $0x01 + CMPL DI, $0x01 JL match_nolit_end_encodeSnappyBetterBlockAsm - MOVB (R9)(R12*1), R11 - CMPB (R10)(R12*1), R11 + MOVB (R8)(R11*1), R10 + CMPB (R9)(R11*1), R10 JNE match_nolit_end_encodeSnappyBetterBlockAsm - LEAL 1(R12), R12 + LEAL 1(R11), R11 match_nolit_end_encodeSnappyBetterBlockAsm: - MOVL CX, R8 - SUBL SI, R8 + MOVL CX, DI + SUBL BX, DI // Check if repeat - CMPL R12, $0x01 + CMPL R11, $0x01 JG match_length_ok_encodeSnappyBetterBlockAsm - CMPL R8, $0x0000ffff + CMPL DI, $0x0000ffff JLE match_length_ok_encodeSnappyBetterBlockAsm MOVL 20(SP), CX INCL CX JMP search_loop_encodeSnappyBetterBlockAsm match_length_ok_encodeSnappyBetterBlockAsm: - MOVL R8, 16(SP) - MOVL 12(SP), SI - CMPL SI, DI + MOVL DI, 16(SP) + MOVL 12(SP), BX + 
CMPL BX, SI JEQ emit_literal_done_match_emit_encodeSnappyBetterBlockAsm - MOVL DI, R9 - MOVL DI, 12(SP) - LEAQ (DX)(SI*1), R10 - SUBL SI, R9 - LEAL -1(R9), SI - CMPL SI, $0x3c + MOVL SI, R8 + MOVL SI, 12(SP) + LEAQ (DX)(BX*1), R9 + SUBL BX, R8 + LEAL -1(R8), BX + CMPL BX, $0x3c JLT one_byte_match_emit_encodeSnappyBetterBlockAsm - CMPL SI, $0x00000100 + CMPL BX, $0x00000100 JLT two_bytes_match_emit_encodeSnappyBetterBlockAsm - CMPL SI, $0x00010000 + CMPL BX, $0x00010000 JLT three_bytes_match_emit_encodeSnappyBetterBlockAsm - CMPL SI, $0x01000000 + CMPL BX, $0x01000000 JLT four_bytes_match_emit_encodeSnappyBetterBlockAsm MOVB $0xfc, (AX) - MOVL SI, 1(AX) + MOVL BX, 1(AX) ADDQ $0x05, AX JMP memmove_long_match_emit_encodeSnappyBetterBlockAsm four_bytes_match_emit_encodeSnappyBetterBlockAsm: - MOVL SI, R11 - SHRL $0x10, R11 + MOVL BX, R10 + SHRL $0x10, R10 MOVB $0xf8, (AX) - MOVW SI, 1(AX) - MOVB R11, 3(AX) + MOVW BX, 1(AX) + MOVB R10, 3(AX) ADDQ $0x04, AX JMP memmove_long_match_emit_encodeSnappyBetterBlockAsm three_bytes_match_emit_encodeSnappyBetterBlockAsm: MOVB $0xf4, (AX) - MOVW SI, 1(AX) + MOVW BX, 1(AX) ADDQ $0x03, AX JMP memmove_long_match_emit_encodeSnappyBetterBlockAsm two_bytes_match_emit_encodeSnappyBetterBlockAsm: MOVB $0xf0, (AX) - MOVB SI, 1(AX) + MOVB BL, 1(AX) ADDQ $0x02, AX - CMPL SI, $0x40 + CMPL BX, $0x40 JL memmove_match_emit_encodeSnappyBetterBlockAsm JMP memmove_long_match_emit_encodeSnappyBetterBlockAsm one_byte_match_emit_encodeSnappyBetterBlockAsm: - SHLB $0x02, SI - MOVB SI, (AX) + SHLB $0x02, BL + MOVB BL, (AX) ADDQ $0x01, AX memmove_match_emit_encodeSnappyBetterBlockAsm: - LEAQ (AX)(R9*1), SI + LEAQ (AX)(R8*1), BX // genMemMoveShort - CMPQ R9, $0x08 + CMPQ R8, $0x08 JLE emit_lit_memmove_match_emit_encodeSnappyBetterBlockAsm_memmove_move_8 - CMPQ R9, $0x10 + CMPQ R8, $0x10 JBE emit_lit_memmove_match_emit_encodeSnappyBetterBlockAsm_memmove_move_8through16 - CMPQ R9, $0x20 + CMPQ R8, $0x20 JBE emit_lit_memmove_match_emit_encodeSnappyBetterBlockAsm_memmove_move_17through32 JMP emit_lit_memmove_match_emit_encodeSnappyBetterBlockAsm_memmove_move_33through64 emit_lit_memmove_match_emit_encodeSnappyBetterBlockAsm_memmove_move_8: - MOVQ (R10), R11 - MOVQ R11, (AX) + MOVQ (R9), R10 + MOVQ R10, (AX) JMP memmove_end_copy_match_emit_encodeSnappyBetterBlockAsm emit_lit_memmove_match_emit_encodeSnappyBetterBlockAsm_memmove_move_8through16: - MOVQ (R10), R11 - MOVQ -8(R10)(R9*1), R10 - MOVQ R11, (AX) - MOVQ R10, -8(AX)(R9*1) + MOVQ (R9), R10 + MOVQ -8(R9)(R8*1), R9 + MOVQ R10, (AX) + MOVQ R9, -8(AX)(R8*1) JMP memmove_end_copy_match_emit_encodeSnappyBetterBlockAsm emit_lit_memmove_match_emit_encodeSnappyBetterBlockAsm_memmove_move_17through32: - MOVOU (R10), X0 - MOVOU -16(R10)(R9*1), X1 + MOVOU (R9), X0 + MOVOU -16(R9)(R8*1), X1 MOVOU X0, (AX) - MOVOU X1, -16(AX)(R9*1) + MOVOU X1, -16(AX)(R8*1) JMP memmove_end_copy_match_emit_encodeSnappyBetterBlockAsm emit_lit_memmove_match_emit_encodeSnappyBetterBlockAsm_memmove_move_33through64: - MOVOU (R10), X0 - MOVOU 16(R10), X1 - MOVOU -32(R10)(R9*1), X2 - MOVOU -16(R10)(R9*1), X3 + MOVOU (R9), X0 + MOVOU 16(R9), X1 + MOVOU -32(R9)(R8*1), X2 + MOVOU -16(R9)(R8*1), X3 MOVOU X0, (AX) MOVOU X1, 16(AX) - MOVOU X2, -32(AX)(R9*1) - MOVOU X3, -16(AX)(R9*1) + MOVOU X2, -32(AX)(R8*1) + MOVOU X3, -16(AX)(R8*1) memmove_end_copy_match_emit_encodeSnappyBetterBlockAsm: - MOVQ SI, AX + MOVQ BX, AX JMP emit_literal_done_match_emit_encodeSnappyBetterBlockAsm memmove_long_match_emit_encodeSnappyBetterBlockAsm: - LEAQ (AX)(R9*1), SI + LEAQ (AX)(R8*1), BX // 
genMemMoveLong - MOVOU (R10), X0 - MOVOU 16(R10), X1 - MOVOU -32(R10)(R9*1), X2 - MOVOU -16(R10)(R9*1), X3 - MOVQ R9, R13 - SHRQ $0x05, R13 - MOVQ AX, R11 - ANDL $0x0000001f, R11 - MOVQ $0x00000040, R14 - SUBQ R11, R14 - DECQ R13 + MOVOU (R9), X0 + MOVOU 16(R9), X1 + MOVOU -32(R9)(R8*1), X2 + MOVOU -16(R9)(R8*1), X3 + MOVQ R8, R12 + SHRQ $0x05, R12 + MOVQ AX, R10 + ANDL $0x0000001f, R10 + MOVQ $0x00000040, R13 + SUBQ R10, R13 + DECQ R12 JA emit_lit_memmove_long_match_emit_encodeSnappyBetterBlockAsmlarge_forward_sse_loop_32 - LEAQ -32(R10)(R14*1), R11 - LEAQ -32(AX)(R14*1), R15 + LEAQ -32(R9)(R13*1), R10 + LEAQ -32(AX)(R13*1), R14 emit_lit_memmove_long_match_emit_encodeSnappyBetterBlockAsmlarge_big_loop_back: - MOVOU (R11), X4 - MOVOU 16(R11), X5 - MOVOA X4, (R15) - MOVOA X5, 16(R15) - ADDQ $0x20, R15 - ADDQ $0x20, R11 + MOVOU (R10), X4 + MOVOU 16(R10), X5 + MOVOA X4, (R14) + MOVOA X5, 16(R14) ADDQ $0x20, R14 - DECQ R13 + ADDQ $0x20, R10 + ADDQ $0x20, R13 + DECQ R12 JNA emit_lit_memmove_long_match_emit_encodeSnappyBetterBlockAsmlarge_big_loop_back emit_lit_memmove_long_match_emit_encodeSnappyBetterBlockAsmlarge_forward_sse_loop_32: - MOVOU -32(R10)(R14*1), X4 - MOVOU -16(R10)(R14*1), X5 - MOVOA X4, -32(AX)(R14*1) - MOVOA X5, -16(AX)(R14*1) - ADDQ $0x20, R14 - CMPQ R9, R14 + MOVOU -32(R9)(R13*1), X4 + MOVOU -16(R9)(R13*1), X5 + MOVOA X4, -32(AX)(R13*1) + MOVOA X5, -16(AX)(R13*1) + ADDQ $0x20, R13 + CMPQ R8, R13 JAE emit_lit_memmove_long_match_emit_encodeSnappyBetterBlockAsmlarge_forward_sse_loop_32 MOVOU X0, (AX) MOVOU X1, 16(AX) - MOVOU X2, -32(AX)(R9*1) - MOVOU X3, -16(AX)(R9*1) - MOVQ SI, AX + MOVOU X2, -32(AX)(R8*1) + MOVOU X3, -16(AX)(R8*1) + MOVQ BX, AX emit_literal_done_match_emit_encodeSnappyBetterBlockAsm: - ADDL R12, CX - ADDL $0x04, R12 + ADDL R11, CX + ADDL $0x04, R11 MOVL CX, 12(SP) // emitCopy - CMPL R8, $0x00010000 + CMPL DI, $0x00010000 JL two_byte_offset_match_nolit_encodeSnappyBetterBlockAsm four_bytes_loop_back_match_nolit_encodeSnappyBetterBlockAsm: - CMPL R12, $0x40 + CMPL R11, $0x40 JLE four_bytes_remain_match_nolit_encodeSnappyBetterBlockAsm MOVB $0xff, (AX) - MOVL R8, 1(AX) - LEAL -64(R12), R12 + MOVL DI, 1(AX) + LEAL -64(R11), R11 ADDQ $0x05, AX - CMPL R12, $0x04 + CMPL R11, $0x04 JL four_bytes_remain_match_nolit_encodeSnappyBetterBlockAsm JMP four_bytes_loop_back_match_nolit_encodeSnappyBetterBlockAsm four_bytes_remain_match_nolit_encodeSnappyBetterBlockAsm: - TESTL R12, R12 + TESTL R11, R11 JZ match_nolit_emitcopy_end_encodeSnappyBetterBlockAsm - MOVB $0x03, BL - LEAL -4(BX)(R12*4), R12 - MOVB R12, (AX) - MOVL R8, 1(AX) + XORL BX, BX + LEAL -1(BX)(R11*4), R11 + MOVB R11, (AX) + MOVL DI, 1(AX) ADDQ $0x05, AX JMP match_nolit_emitcopy_end_encodeSnappyBetterBlockAsm two_byte_offset_match_nolit_encodeSnappyBetterBlockAsm: - CMPL R12, $0x40 + CMPL R11, $0x40 JLE two_byte_offset_short_match_nolit_encodeSnappyBetterBlockAsm MOVB $0xee, (AX) - MOVW R8, 1(AX) - LEAL -60(R12), R12 + MOVW DI, 1(AX) + LEAL -60(R11), R11 ADDQ $0x03, AX JMP two_byte_offset_match_nolit_encodeSnappyBetterBlockAsm two_byte_offset_short_match_nolit_encodeSnappyBetterBlockAsm: - CMPL R12, $0x0c + MOVL R11, BX + SHLL $0x02, BX + CMPL R11, $0x0c JGE emit_copy_three_match_nolit_encodeSnappyBetterBlockAsm - CMPL R8, $0x00000800 + CMPL DI, $0x00000800 JGE emit_copy_three_match_nolit_encodeSnappyBetterBlockAsm - MOVB $0x01, BL - LEAL -16(BX)(R12*4), R12 - MOVB R8, 1(AX) - SHRL $0x08, R8 - SHLL $0x05, R8 - ORL R8, R12 - MOVB R12, (AX) + LEAL -15(BX), BX + MOVB DI, 1(AX) + SHRL $0x08, DI + SHLL $0x05, DI + 
ORL DI, BX + MOVB BL, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeSnappyBetterBlockAsm emit_copy_three_match_nolit_encodeSnappyBetterBlockAsm: - MOVB $0x02, BL - LEAL -4(BX)(R12*4), R12 - MOVB R12, (AX) - MOVW R8, 1(AX) + LEAL -2(BX), BX + MOVB BL, (AX) + MOVW DI, 1(AX) ADDQ $0x03, AX match_nolit_emitcopy_end_encodeSnappyBetterBlockAsm: @@ -14727,50 +14674,50 @@ match_nolit_emitcopy_end_encodeSnappyBetterBlockAsm: RET match_nolit_dst_ok_encodeSnappyBetterBlockAsm: - MOVQ $0x00cf1bbcdcbfa563, SI - MOVQ $0x9e3779b1, R8 - LEAQ 1(DI), DI - LEAQ -2(CX), R9 - MOVQ (DX)(DI*1), R10 - MOVQ 1(DX)(DI*1), R11 - MOVQ (DX)(R9*1), R12 - MOVQ 1(DX)(R9*1), R13 - SHLQ $0x08, R10 - IMULQ SI, R10 - SHRQ $0x2f, R10 - SHLQ $0x20, R11 - IMULQ R8, R11 - SHRQ $0x32, R11 - SHLQ $0x08, R12 - IMULQ SI, R12 - SHRQ $0x2f, R12 - SHLQ $0x20, R13 - IMULQ R8, R13 - SHRQ $0x32, R13 - LEAQ 1(DI), R8 - LEAQ 1(R9), R14 - MOVL DI, 24(SP)(R10*4) - MOVL R9, 24(SP)(R12*4) - MOVL R8, 524312(SP)(R11*4) - MOVL R14, 524312(SP)(R13*4) - ADDQ $0x01, DI - SUBQ $0x01, R9 + MOVQ $0x00cf1bbcdcbfa563, BX + MOVQ $0x9e3779b1, DI + LEAQ 1(SI), SI + LEAQ -2(CX), R8 + MOVQ (DX)(SI*1), R9 + MOVQ 1(DX)(SI*1), R10 + MOVQ (DX)(R8*1), R11 + MOVQ 1(DX)(R8*1), R12 + SHLQ $0x08, R9 + IMULQ BX, R9 + SHRQ $0x2f, R9 + SHLQ $0x20, R10 + IMULQ DI, R10 + SHRQ $0x32, R10 + SHLQ $0x08, R11 + IMULQ BX, R11 + SHRQ $0x2f, R11 + SHLQ $0x20, R12 + IMULQ DI, R12 + SHRQ $0x32, R12 + LEAQ 1(SI), DI + LEAQ 1(R8), R13 + MOVL SI, 24(SP)(R9*4) + MOVL R8, 24(SP)(R11*4) + MOVL DI, 524312(SP)(R10*4) + MOVL R13, 524312(SP)(R12*4) + ADDQ $0x01, SI + SUBQ $0x01, R8 index_loop_encodeSnappyBetterBlockAsm: - CMPQ DI, R9 + CMPQ SI, R8 JAE search_loop_encodeSnappyBetterBlockAsm - MOVQ (DX)(DI*1), R8 - MOVQ (DX)(R9*1), R10 - SHLQ $0x08, R8 - IMULQ SI, R8 - SHRQ $0x2f, R8 - SHLQ $0x08, R10 - IMULQ SI, R10 - SHRQ $0x2f, R10 - MOVL DI, 24(SP)(R8*4) - MOVL R9, 24(SP)(R10*4) - ADDQ $0x02, DI - SUBQ $0x02, R9 + MOVQ (DX)(SI*1), DI + MOVQ (DX)(R8*1), R9 + SHLQ $0x08, DI + IMULQ BX, DI + SHRQ $0x2f, DI + SHLQ $0x08, R9 + IMULQ BX, R9 + SHRQ $0x2f, R9 + MOVL SI, 24(SP)(DI*4) + MOVL R8, 24(SP)(R9*4) + ADDQ $0x02, SI + SUBQ $0x02, R8 JMP index_loop_encodeSnappyBetterBlockAsm emit_remainder_encodeSnappyBetterBlockAsm: @@ -14972,8 +14919,8 @@ zero_loop_encodeSnappyBetterBlockAsm64K: MOVL $0x00000000, 12(SP) MOVQ src_len+32(FP), CX LEAQ -9(CX), DX - LEAQ -8(CX), SI - MOVL SI, 8(SP) + LEAQ -8(CX), BX + MOVL BX, 8(SP) SHRQ $0x05, CX SUBL CX, DX LEAQ (AX)(DX*1), DX @@ -14983,309 +14930,309 @@ zero_loop_encodeSnappyBetterBlockAsm64K: MOVQ src_base+24(FP), DX search_loop_encodeSnappyBetterBlockAsm64K: - MOVL CX, SI - SUBL 12(SP), SI - SHRL $0x07, SI - LEAL 1(CX)(SI*1), SI - CMPL SI, 8(SP) + MOVL CX, BX + SUBL 12(SP), BX + SHRL $0x07, BX + LEAL 1(CX)(BX*1), BX + CMPL BX, 8(SP) JGE emit_remainder_encodeSnappyBetterBlockAsm64K - MOVQ (DX)(CX*1), DI - MOVL SI, 20(SP) - MOVQ $0x00cf1bbcdcbfa563, R9 - MOVQ $0x9e3779b1, SI - MOVQ DI, R10 - MOVQ DI, R11 - SHLQ $0x08, R10 - IMULQ R9, R10 - SHRQ $0x30, R10 - SHLQ $0x20, R11 - IMULQ SI, R11 - SHRQ $0x32, R11 - MOVL 24(SP)(R10*4), SI - MOVL 262168(SP)(R11*4), R8 - MOVL CX, 24(SP)(R10*4) - MOVL CX, 262168(SP)(R11*4) - MOVQ (DX)(SI*1), R10 - MOVQ (DX)(R8*1), R11 - CMPQ R10, DI + MOVQ (DX)(CX*1), SI + MOVL BX, 20(SP) + MOVQ $0x00cf1bbcdcbfa563, R8 + MOVQ $0x9e3779b1, BX + MOVQ SI, R9 + MOVQ SI, R10 + SHLQ $0x08, R9 + IMULQ R8, R9 + SHRQ $0x30, R9 + SHLQ $0x20, R10 + IMULQ BX, R10 + SHRQ $0x32, R10 + MOVL 24(SP)(R9*4), BX + MOVL 262168(SP)(R10*4), DI + MOVL 
CX, 24(SP)(R9*4) + MOVL CX, 262168(SP)(R10*4) + MOVQ (DX)(BX*1), R9 + MOVQ (DX)(DI*1), R10 + CMPQ R9, SI JEQ candidate_match_encodeSnappyBetterBlockAsm64K - CMPQ R11, DI + CMPQ R10, SI JNE no_short_found_encodeSnappyBetterBlockAsm64K - MOVL R8, SI + MOVL DI, BX JMP candidate_match_encodeSnappyBetterBlockAsm64K no_short_found_encodeSnappyBetterBlockAsm64K: - CMPL R10, DI + CMPL R9, SI JEQ candidate_match_encodeSnappyBetterBlockAsm64K - CMPL R11, DI + CMPL R10, SI JEQ candidateS_match_encodeSnappyBetterBlockAsm64K MOVL 20(SP), CX JMP search_loop_encodeSnappyBetterBlockAsm64K candidateS_match_encodeSnappyBetterBlockAsm64K: - SHRQ $0x08, DI - MOVQ DI, R10 - SHLQ $0x08, R10 - IMULQ R9, R10 - SHRQ $0x30, R10 - MOVL 24(SP)(R10*4), SI + SHRQ $0x08, SI + MOVQ SI, R9 + SHLQ $0x08, R9 + IMULQ R8, R9 + SHRQ $0x30, R9 + MOVL 24(SP)(R9*4), BX INCL CX - MOVL CX, 24(SP)(R10*4) - CMPL (DX)(SI*1), DI + MOVL CX, 24(SP)(R9*4) + CMPL (DX)(BX*1), SI JEQ candidate_match_encodeSnappyBetterBlockAsm64K DECL CX - MOVL R8, SI + MOVL DI, BX candidate_match_encodeSnappyBetterBlockAsm64K: - MOVL 12(SP), DI - TESTL SI, SI + MOVL 12(SP), SI + TESTL BX, BX JZ match_extend_back_end_encodeSnappyBetterBlockAsm64K match_extend_back_loop_encodeSnappyBetterBlockAsm64K: - CMPL CX, DI + CMPL CX, SI JLE match_extend_back_end_encodeSnappyBetterBlockAsm64K - MOVB -1(DX)(SI*1), BL + MOVB -1(DX)(BX*1), DI MOVB -1(DX)(CX*1), R8 - CMPB BL, R8 + CMPB DI, R8 JNE match_extend_back_end_encodeSnappyBetterBlockAsm64K LEAL -1(CX), CX - DECL SI + DECL BX JZ match_extend_back_end_encodeSnappyBetterBlockAsm64K JMP match_extend_back_loop_encodeSnappyBetterBlockAsm64K match_extend_back_end_encodeSnappyBetterBlockAsm64K: - MOVL CX, DI - SUBL 12(SP), DI - LEAQ 3(AX)(DI*1), DI - CMPQ DI, (SP) + MOVL CX, SI + SUBL 12(SP), SI + LEAQ 3(AX)(SI*1), SI + CMPQ SI, (SP) JL match_dst_size_check_encodeSnappyBetterBlockAsm64K MOVQ $0x00000000, ret+48(FP) RET match_dst_size_check_encodeSnappyBetterBlockAsm64K: - MOVL CX, DI + MOVL CX, SI ADDL $0x04, CX - ADDL $0x04, SI - MOVQ src_len+32(FP), R8 - SUBL CX, R8 - LEAQ (DX)(CX*1), R9 - LEAQ (DX)(SI*1), R10 + ADDL $0x04, BX + MOVQ src_len+32(FP), DI + SUBL CX, DI + LEAQ (DX)(CX*1), R8 + LEAQ (DX)(BX*1), R9 // matchLen - XORL R12, R12 - CMPL R8, $0x08 + XORL R11, R11 + CMPL DI, $0x08 JL matchlen_match4_match_nolit_encodeSnappyBetterBlockAsm64K matchlen_loopback_match_nolit_encodeSnappyBetterBlockAsm64K: - MOVQ (R9)(R12*1), R11 - XORQ (R10)(R12*1), R11 - TESTQ R11, R11 + MOVQ (R8)(R11*1), R10 + XORQ (R9)(R11*1), R10 + TESTQ R10, R10 JZ matchlen_loop_match_nolit_encodeSnappyBetterBlockAsm64K #ifdef GOAMD64_v3 - TZCNTQ R11, R11 + TZCNTQ R10, R10 #else - BSFQ R11, R11 + BSFQ R10, R10 #endif - SARQ $0x03, R11 - LEAL (R12)(R11*1), R12 + SARQ $0x03, R10 + LEAL (R11)(R10*1), R11 JMP match_nolit_end_encodeSnappyBetterBlockAsm64K matchlen_loop_match_nolit_encodeSnappyBetterBlockAsm64K: - LEAL -8(R8), R8 - LEAL 8(R12), R12 - CMPL R8, $0x08 + LEAL -8(DI), DI + LEAL 8(R11), R11 + CMPL DI, $0x08 JGE matchlen_loopback_match_nolit_encodeSnappyBetterBlockAsm64K JZ match_nolit_end_encodeSnappyBetterBlockAsm64K matchlen_match4_match_nolit_encodeSnappyBetterBlockAsm64K: - CMPL R8, $0x04 + CMPL DI, $0x04 JL matchlen_match2_match_nolit_encodeSnappyBetterBlockAsm64K - MOVL (R9)(R12*1), R11 - CMPL (R10)(R12*1), R11 + MOVL (R8)(R11*1), R10 + CMPL (R9)(R11*1), R10 JNE matchlen_match2_match_nolit_encodeSnappyBetterBlockAsm64K - SUBL $0x04, R8 - LEAL 4(R12), R12 + SUBL $0x04, DI + LEAL 4(R11), R11 
matchlen_match2_match_nolit_encodeSnappyBetterBlockAsm64K: - CMPL R8, $0x02 + CMPL DI, $0x02 JL matchlen_match1_match_nolit_encodeSnappyBetterBlockAsm64K - MOVW (R9)(R12*1), R11 - CMPW (R10)(R12*1), R11 + MOVW (R8)(R11*1), R10 + CMPW (R9)(R11*1), R10 JNE matchlen_match1_match_nolit_encodeSnappyBetterBlockAsm64K - SUBL $0x02, R8 - LEAL 2(R12), R12 + SUBL $0x02, DI + LEAL 2(R11), R11 matchlen_match1_match_nolit_encodeSnappyBetterBlockAsm64K: - CMPL R8, $0x01 + CMPL DI, $0x01 JL match_nolit_end_encodeSnappyBetterBlockAsm64K - MOVB (R9)(R12*1), R11 - CMPB (R10)(R12*1), R11 + MOVB (R8)(R11*1), R10 + CMPB (R9)(R11*1), R10 JNE match_nolit_end_encodeSnappyBetterBlockAsm64K - LEAL 1(R12), R12 + LEAL 1(R11), R11 match_nolit_end_encodeSnappyBetterBlockAsm64K: - MOVL CX, R8 - SUBL SI, R8 + MOVL CX, DI + SUBL BX, DI // Check if repeat - MOVL R8, 16(SP) - MOVL 12(SP), SI - CMPL SI, DI + MOVL DI, 16(SP) + MOVL 12(SP), BX + CMPL BX, SI JEQ emit_literal_done_match_emit_encodeSnappyBetterBlockAsm64K - MOVL DI, R9 - MOVL DI, 12(SP) - LEAQ (DX)(SI*1), R10 - SUBL SI, R9 - LEAL -1(R9), SI - CMPL SI, $0x3c + MOVL SI, R8 + MOVL SI, 12(SP) + LEAQ (DX)(BX*1), R9 + SUBL BX, R8 + LEAL -1(R8), BX + CMPL BX, $0x3c JLT one_byte_match_emit_encodeSnappyBetterBlockAsm64K - CMPL SI, $0x00000100 + CMPL BX, $0x00000100 JLT two_bytes_match_emit_encodeSnappyBetterBlockAsm64K MOVB $0xf4, (AX) - MOVW SI, 1(AX) + MOVW BX, 1(AX) ADDQ $0x03, AX JMP memmove_long_match_emit_encodeSnappyBetterBlockAsm64K two_bytes_match_emit_encodeSnappyBetterBlockAsm64K: MOVB $0xf0, (AX) - MOVB SI, 1(AX) + MOVB BL, 1(AX) ADDQ $0x02, AX - CMPL SI, $0x40 + CMPL BX, $0x40 JL memmove_match_emit_encodeSnappyBetterBlockAsm64K JMP memmove_long_match_emit_encodeSnappyBetterBlockAsm64K one_byte_match_emit_encodeSnappyBetterBlockAsm64K: - SHLB $0x02, SI - MOVB SI, (AX) + SHLB $0x02, BL + MOVB BL, (AX) ADDQ $0x01, AX memmove_match_emit_encodeSnappyBetterBlockAsm64K: - LEAQ (AX)(R9*1), SI + LEAQ (AX)(R8*1), BX // genMemMoveShort - CMPQ R9, $0x08 + CMPQ R8, $0x08 JLE emit_lit_memmove_match_emit_encodeSnappyBetterBlockAsm64K_memmove_move_8 - CMPQ R9, $0x10 + CMPQ R8, $0x10 JBE emit_lit_memmove_match_emit_encodeSnappyBetterBlockAsm64K_memmove_move_8through16 - CMPQ R9, $0x20 + CMPQ R8, $0x20 JBE emit_lit_memmove_match_emit_encodeSnappyBetterBlockAsm64K_memmove_move_17through32 JMP emit_lit_memmove_match_emit_encodeSnappyBetterBlockAsm64K_memmove_move_33through64 emit_lit_memmove_match_emit_encodeSnappyBetterBlockAsm64K_memmove_move_8: - MOVQ (R10), R11 - MOVQ R11, (AX) + MOVQ (R9), R10 + MOVQ R10, (AX) JMP memmove_end_copy_match_emit_encodeSnappyBetterBlockAsm64K emit_lit_memmove_match_emit_encodeSnappyBetterBlockAsm64K_memmove_move_8through16: - MOVQ (R10), R11 - MOVQ -8(R10)(R9*1), R10 - MOVQ R11, (AX) - MOVQ R10, -8(AX)(R9*1) + MOVQ (R9), R10 + MOVQ -8(R9)(R8*1), R9 + MOVQ R10, (AX) + MOVQ R9, -8(AX)(R8*1) JMP memmove_end_copy_match_emit_encodeSnappyBetterBlockAsm64K emit_lit_memmove_match_emit_encodeSnappyBetterBlockAsm64K_memmove_move_17through32: - MOVOU (R10), X0 - MOVOU -16(R10)(R9*1), X1 + MOVOU (R9), X0 + MOVOU -16(R9)(R8*1), X1 MOVOU X0, (AX) - MOVOU X1, -16(AX)(R9*1) + MOVOU X1, -16(AX)(R8*1) JMP memmove_end_copy_match_emit_encodeSnappyBetterBlockAsm64K emit_lit_memmove_match_emit_encodeSnappyBetterBlockAsm64K_memmove_move_33through64: - MOVOU (R10), X0 - MOVOU 16(R10), X1 - MOVOU -32(R10)(R9*1), X2 - MOVOU -16(R10)(R9*1), X3 + MOVOU (R9), X0 + MOVOU 16(R9), X1 + MOVOU -32(R9)(R8*1), X2 + MOVOU -16(R9)(R8*1), X3 MOVOU X0, (AX) MOVOU X1, 16(AX) - MOVOU 
X2, -32(AX)(R9*1) - MOVOU X3, -16(AX)(R9*1) + MOVOU X2, -32(AX)(R8*1) + MOVOU X3, -16(AX)(R8*1) memmove_end_copy_match_emit_encodeSnappyBetterBlockAsm64K: - MOVQ SI, AX + MOVQ BX, AX JMP emit_literal_done_match_emit_encodeSnappyBetterBlockAsm64K memmove_long_match_emit_encodeSnappyBetterBlockAsm64K: - LEAQ (AX)(R9*1), SI + LEAQ (AX)(R8*1), BX // genMemMoveLong - MOVOU (R10), X0 - MOVOU 16(R10), X1 - MOVOU -32(R10)(R9*1), X2 - MOVOU -16(R10)(R9*1), X3 - MOVQ R9, R13 - SHRQ $0x05, R13 - MOVQ AX, R11 - ANDL $0x0000001f, R11 - MOVQ $0x00000040, R14 - SUBQ R11, R14 - DECQ R13 + MOVOU (R9), X0 + MOVOU 16(R9), X1 + MOVOU -32(R9)(R8*1), X2 + MOVOU -16(R9)(R8*1), X3 + MOVQ R8, R12 + SHRQ $0x05, R12 + MOVQ AX, R10 + ANDL $0x0000001f, R10 + MOVQ $0x00000040, R13 + SUBQ R10, R13 + DECQ R12 JA emit_lit_memmove_long_match_emit_encodeSnappyBetterBlockAsm64Klarge_forward_sse_loop_32 - LEAQ -32(R10)(R14*1), R11 - LEAQ -32(AX)(R14*1), R15 + LEAQ -32(R9)(R13*1), R10 + LEAQ -32(AX)(R13*1), R14 emit_lit_memmove_long_match_emit_encodeSnappyBetterBlockAsm64Klarge_big_loop_back: - MOVOU (R11), X4 - MOVOU 16(R11), X5 - MOVOA X4, (R15) - MOVOA X5, 16(R15) - ADDQ $0x20, R15 - ADDQ $0x20, R11 + MOVOU (R10), X4 + MOVOU 16(R10), X5 + MOVOA X4, (R14) + MOVOA X5, 16(R14) ADDQ $0x20, R14 - DECQ R13 + ADDQ $0x20, R10 + ADDQ $0x20, R13 + DECQ R12 JNA emit_lit_memmove_long_match_emit_encodeSnappyBetterBlockAsm64Klarge_big_loop_back emit_lit_memmove_long_match_emit_encodeSnappyBetterBlockAsm64Klarge_forward_sse_loop_32: - MOVOU -32(R10)(R14*1), X4 - MOVOU -16(R10)(R14*1), X5 - MOVOA X4, -32(AX)(R14*1) - MOVOA X5, -16(AX)(R14*1) - ADDQ $0x20, R14 - CMPQ R9, R14 + MOVOU -32(R9)(R13*1), X4 + MOVOU -16(R9)(R13*1), X5 + MOVOA X4, -32(AX)(R13*1) + MOVOA X5, -16(AX)(R13*1) + ADDQ $0x20, R13 + CMPQ R8, R13 JAE emit_lit_memmove_long_match_emit_encodeSnappyBetterBlockAsm64Klarge_forward_sse_loop_32 MOVOU X0, (AX) MOVOU X1, 16(AX) - MOVOU X2, -32(AX)(R9*1) - MOVOU X3, -16(AX)(R9*1) - MOVQ SI, AX + MOVOU X2, -32(AX)(R8*1) + MOVOU X3, -16(AX)(R8*1) + MOVQ BX, AX emit_literal_done_match_emit_encodeSnappyBetterBlockAsm64K: - ADDL R12, CX - ADDL $0x04, R12 + ADDL R11, CX + ADDL $0x04, R11 MOVL CX, 12(SP) // emitCopy two_byte_offset_match_nolit_encodeSnappyBetterBlockAsm64K: - CMPL R12, $0x40 + CMPL R11, $0x40 JLE two_byte_offset_short_match_nolit_encodeSnappyBetterBlockAsm64K MOVB $0xee, (AX) - MOVW R8, 1(AX) - LEAL -60(R12), R12 + MOVW DI, 1(AX) + LEAL -60(R11), R11 ADDQ $0x03, AX JMP two_byte_offset_match_nolit_encodeSnappyBetterBlockAsm64K two_byte_offset_short_match_nolit_encodeSnappyBetterBlockAsm64K: - CMPL R12, $0x0c + MOVL R11, BX + SHLL $0x02, BX + CMPL R11, $0x0c JGE emit_copy_three_match_nolit_encodeSnappyBetterBlockAsm64K - CMPL R8, $0x00000800 + CMPL DI, $0x00000800 JGE emit_copy_three_match_nolit_encodeSnappyBetterBlockAsm64K - MOVB $0x01, BL - LEAL -16(BX)(R12*4), R12 - MOVB R8, 1(AX) - SHRL $0x08, R8 - SHLL $0x05, R8 - ORL R8, R12 - MOVB R12, (AX) + LEAL -15(BX), BX + MOVB DI, 1(AX) + SHRL $0x08, DI + SHLL $0x05, DI + ORL DI, BX + MOVB BL, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeSnappyBetterBlockAsm64K emit_copy_three_match_nolit_encodeSnappyBetterBlockAsm64K: - MOVB $0x02, BL - LEAL -4(BX)(R12*4), R12 - MOVB R12, (AX) - MOVW R8, 1(AX) + LEAL -2(BX), BX + MOVB BL, (AX) + MOVW DI, 1(AX) ADDQ $0x03, AX match_nolit_emitcopy_end_encodeSnappyBetterBlockAsm64K: @@ -15297,50 +15244,50 @@ match_nolit_emitcopy_end_encodeSnappyBetterBlockAsm64K: RET match_nolit_dst_ok_encodeSnappyBetterBlockAsm64K: - MOVQ 
$0x00cf1bbcdcbfa563, SI - MOVQ $0x9e3779b1, R8 - LEAQ 1(DI), DI - LEAQ -2(CX), R9 - MOVQ (DX)(DI*1), R10 - MOVQ 1(DX)(DI*1), R11 - MOVQ (DX)(R9*1), R12 - MOVQ 1(DX)(R9*1), R13 - SHLQ $0x08, R10 - IMULQ SI, R10 - SHRQ $0x30, R10 - SHLQ $0x20, R11 - IMULQ R8, R11 - SHRQ $0x32, R11 - SHLQ $0x08, R12 - IMULQ SI, R12 - SHRQ $0x30, R12 - SHLQ $0x20, R13 - IMULQ R8, R13 - SHRQ $0x32, R13 - LEAQ 1(DI), R8 - LEAQ 1(R9), R14 - MOVL DI, 24(SP)(R10*4) - MOVL R9, 24(SP)(R12*4) - MOVL R8, 262168(SP)(R11*4) - MOVL R14, 262168(SP)(R13*4) - ADDQ $0x01, DI - SUBQ $0x01, R9 + MOVQ $0x00cf1bbcdcbfa563, BX + MOVQ $0x9e3779b1, DI + LEAQ 1(SI), SI + LEAQ -2(CX), R8 + MOVQ (DX)(SI*1), R9 + MOVQ 1(DX)(SI*1), R10 + MOVQ (DX)(R8*1), R11 + MOVQ 1(DX)(R8*1), R12 + SHLQ $0x08, R9 + IMULQ BX, R9 + SHRQ $0x30, R9 + SHLQ $0x20, R10 + IMULQ DI, R10 + SHRQ $0x32, R10 + SHLQ $0x08, R11 + IMULQ BX, R11 + SHRQ $0x30, R11 + SHLQ $0x20, R12 + IMULQ DI, R12 + SHRQ $0x32, R12 + LEAQ 1(SI), DI + LEAQ 1(R8), R13 + MOVL SI, 24(SP)(R9*4) + MOVL R8, 24(SP)(R11*4) + MOVL DI, 262168(SP)(R10*4) + MOVL R13, 262168(SP)(R12*4) + ADDQ $0x01, SI + SUBQ $0x01, R8 index_loop_encodeSnappyBetterBlockAsm64K: - CMPQ DI, R9 + CMPQ SI, R8 JAE search_loop_encodeSnappyBetterBlockAsm64K - MOVQ (DX)(DI*1), R8 - MOVQ (DX)(R9*1), R10 - SHLQ $0x08, R8 - IMULQ SI, R8 - SHRQ $0x30, R8 - SHLQ $0x08, R10 - IMULQ SI, R10 - SHRQ $0x30, R10 - MOVL DI, 24(SP)(R8*4) - MOVL R9, 24(SP)(R10*4) - ADDQ $0x02, DI - SUBQ $0x02, R9 + MOVQ (DX)(SI*1), DI + MOVQ (DX)(R8*1), R9 + SHLQ $0x08, DI + IMULQ BX, DI + SHRQ $0x30, DI + SHLQ $0x08, R9 + IMULQ BX, R9 + SHRQ $0x30, R9 + MOVL SI, 24(SP)(DI*4) + MOVL R8, 24(SP)(R9*4) + ADDQ $0x02, SI + SUBQ $0x02, R8 JMP index_loop_encodeSnappyBetterBlockAsm64K emit_remainder_encodeSnappyBetterBlockAsm64K: @@ -15523,8 +15470,8 @@ zero_loop_encodeSnappyBetterBlockAsm12B: MOVL $0x00000000, 12(SP) MOVQ src_len+32(FP), CX LEAQ -9(CX), DX - LEAQ -8(CX), SI - MOVL SI, 8(SP) + LEAQ -8(CX), BX + MOVL BX, 8(SP) SHRQ $0x05, CX SUBL CX, DX LEAQ (AX)(DX*1), DX @@ -15534,309 +15481,309 @@ zero_loop_encodeSnappyBetterBlockAsm12B: MOVQ src_base+24(FP), DX search_loop_encodeSnappyBetterBlockAsm12B: - MOVL CX, SI - SUBL 12(SP), SI - SHRL $0x06, SI - LEAL 1(CX)(SI*1), SI - CMPL SI, 8(SP) + MOVL CX, BX + SUBL 12(SP), BX + SHRL $0x06, BX + LEAL 1(CX)(BX*1), BX + CMPL BX, 8(SP) JGE emit_remainder_encodeSnappyBetterBlockAsm12B - MOVQ (DX)(CX*1), DI - MOVL SI, 20(SP) - MOVQ $0x0000cf1bbcdcbf9b, R9 - MOVQ $0x9e3779b1, SI - MOVQ DI, R10 - MOVQ DI, R11 - SHLQ $0x10, R10 - IMULQ R9, R10 - SHRQ $0x32, R10 - SHLQ $0x20, R11 - IMULQ SI, R11 - SHRQ $0x34, R11 - MOVL 24(SP)(R10*4), SI - MOVL 65560(SP)(R11*4), R8 - MOVL CX, 24(SP)(R10*4) - MOVL CX, 65560(SP)(R11*4) - MOVQ (DX)(SI*1), R10 - MOVQ (DX)(R8*1), R11 - CMPQ R10, DI + MOVQ (DX)(CX*1), SI + MOVL BX, 20(SP) + MOVQ $0x0000cf1bbcdcbf9b, R8 + MOVQ $0x9e3779b1, BX + MOVQ SI, R9 + MOVQ SI, R10 + SHLQ $0x10, R9 + IMULQ R8, R9 + SHRQ $0x32, R9 + SHLQ $0x20, R10 + IMULQ BX, R10 + SHRQ $0x34, R10 + MOVL 24(SP)(R9*4), BX + MOVL 65560(SP)(R10*4), DI + MOVL CX, 24(SP)(R9*4) + MOVL CX, 65560(SP)(R10*4) + MOVQ (DX)(BX*1), R9 + MOVQ (DX)(DI*1), R10 + CMPQ R9, SI JEQ candidate_match_encodeSnappyBetterBlockAsm12B - CMPQ R11, DI + CMPQ R10, SI JNE no_short_found_encodeSnappyBetterBlockAsm12B - MOVL R8, SI + MOVL DI, BX JMP candidate_match_encodeSnappyBetterBlockAsm12B no_short_found_encodeSnappyBetterBlockAsm12B: - CMPL R10, DI + CMPL R9, SI JEQ candidate_match_encodeSnappyBetterBlockAsm12B - CMPL R11, DI + CMPL R10, SI JEQ 
candidateS_match_encodeSnappyBetterBlockAsm12B MOVL 20(SP), CX JMP search_loop_encodeSnappyBetterBlockAsm12B candidateS_match_encodeSnappyBetterBlockAsm12B: - SHRQ $0x08, DI - MOVQ DI, R10 - SHLQ $0x10, R10 - IMULQ R9, R10 - SHRQ $0x32, R10 - MOVL 24(SP)(R10*4), SI + SHRQ $0x08, SI + MOVQ SI, R9 + SHLQ $0x10, R9 + IMULQ R8, R9 + SHRQ $0x32, R9 + MOVL 24(SP)(R9*4), BX INCL CX - MOVL CX, 24(SP)(R10*4) - CMPL (DX)(SI*1), DI + MOVL CX, 24(SP)(R9*4) + CMPL (DX)(BX*1), SI JEQ candidate_match_encodeSnappyBetterBlockAsm12B DECL CX - MOVL R8, SI + MOVL DI, BX candidate_match_encodeSnappyBetterBlockAsm12B: - MOVL 12(SP), DI - TESTL SI, SI + MOVL 12(SP), SI + TESTL BX, BX JZ match_extend_back_end_encodeSnappyBetterBlockAsm12B match_extend_back_loop_encodeSnappyBetterBlockAsm12B: - CMPL CX, DI + CMPL CX, SI JLE match_extend_back_end_encodeSnappyBetterBlockAsm12B - MOVB -1(DX)(SI*1), BL + MOVB -1(DX)(BX*1), DI MOVB -1(DX)(CX*1), R8 - CMPB BL, R8 + CMPB DI, R8 JNE match_extend_back_end_encodeSnappyBetterBlockAsm12B LEAL -1(CX), CX - DECL SI + DECL BX JZ match_extend_back_end_encodeSnappyBetterBlockAsm12B JMP match_extend_back_loop_encodeSnappyBetterBlockAsm12B match_extend_back_end_encodeSnappyBetterBlockAsm12B: - MOVL CX, DI - SUBL 12(SP), DI - LEAQ 3(AX)(DI*1), DI - CMPQ DI, (SP) + MOVL CX, SI + SUBL 12(SP), SI + LEAQ 3(AX)(SI*1), SI + CMPQ SI, (SP) JL match_dst_size_check_encodeSnappyBetterBlockAsm12B MOVQ $0x00000000, ret+48(FP) RET match_dst_size_check_encodeSnappyBetterBlockAsm12B: - MOVL CX, DI + MOVL CX, SI ADDL $0x04, CX - ADDL $0x04, SI - MOVQ src_len+32(FP), R8 - SUBL CX, R8 - LEAQ (DX)(CX*1), R9 - LEAQ (DX)(SI*1), R10 + ADDL $0x04, BX + MOVQ src_len+32(FP), DI + SUBL CX, DI + LEAQ (DX)(CX*1), R8 + LEAQ (DX)(BX*1), R9 // matchLen - XORL R12, R12 - CMPL R8, $0x08 + XORL R11, R11 + CMPL DI, $0x08 JL matchlen_match4_match_nolit_encodeSnappyBetterBlockAsm12B matchlen_loopback_match_nolit_encodeSnappyBetterBlockAsm12B: - MOVQ (R9)(R12*1), R11 - XORQ (R10)(R12*1), R11 - TESTQ R11, R11 + MOVQ (R8)(R11*1), R10 + XORQ (R9)(R11*1), R10 + TESTQ R10, R10 JZ matchlen_loop_match_nolit_encodeSnappyBetterBlockAsm12B #ifdef GOAMD64_v3 - TZCNTQ R11, R11 + TZCNTQ R10, R10 #else - BSFQ R11, R11 + BSFQ R10, R10 #endif - SARQ $0x03, R11 - LEAL (R12)(R11*1), R12 + SARQ $0x03, R10 + LEAL (R11)(R10*1), R11 JMP match_nolit_end_encodeSnappyBetterBlockAsm12B matchlen_loop_match_nolit_encodeSnappyBetterBlockAsm12B: - LEAL -8(R8), R8 - LEAL 8(R12), R12 - CMPL R8, $0x08 + LEAL -8(DI), DI + LEAL 8(R11), R11 + CMPL DI, $0x08 JGE matchlen_loopback_match_nolit_encodeSnappyBetterBlockAsm12B JZ match_nolit_end_encodeSnappyBetterBlockAsm12B matchlen_match4_match_nolit_encodeSnappyBetterBlockAsm12B: - CMPL R8, $0x04 + CMPL DI, $0x04 JL matchlen_match2_match_nolit_encodeSnappyBetterBlockAsm12B - MOVL (R9)(R12*1), R11 - CMPL (R10)(R12*1), R11 + MOVL (R8)(R11*1), R10 + CMPL (R9)(R11*1), R10 JNE matchlen_match2_match_nolit_encodeSnappyBetterBlockAsm12B - SUBL $0x04, R8 - LEAL 4(R12), R12 + SUBL $0x04, DI + LEAL 4(R11), R11 matchlen_match2_match_nolit_encodeSnappyBetterBlockAsm12B: - CMPL R8, $0x02 + CMPL DI, $0x02 JL matchlen_match1_match_nolit_encodeSnappyBetterBlockAsm12B - MOVW (R9)(R12*1), R11 - CMPW (R10)(R12*1), R11 + MOVW (R8)(R11*1), R10 + CMPW (R9)(R11*1), R10 JNE matchlen_match1_match_nolit_encodeSnappyBetterBlockAsm12B - SUBL $0x02, R8 - LEAL 2(R12), R12 + SUBL $0x02, DI + LEAL 2(R11), R11 matchlen_match1_match_nolit_encodeSnappyBetterBlockAsm12B: - CMPL R8, $0x01 + CMPL DI, $0x01 JL 
match_nolit_end_encodeSnappyBetterBlockAsm12B - MOVB (R9)(R12*1), R11 - CMPB (R10)(R12*1), R11 + MOVB (R8)(R11*1), R10 + CMPB (R9)(R11*1), R10 JNE match_nolit_end_encodeSnappyBetterBlockAsm12B - LEAL 1(R12), R12 + LEAL 1(R11), R11 match_nolit_end_encodeSnappyBetterBlockAsm12B: - MOVL CX, R8 - SUBL SI, R8 + MOVL CX, DI + SUBL BX, DI // Check if repeat - MOVL R8, 16(SP) - MOVL 12(SP), SI - CMPL SI, DI + MOVL DI, 16(SP) + MOVL 12(SP), BX + CMPL BX, SI JEQ emit_literal_done_match_emit_encodeSnappyBetterBlockAsm12B - MOVL DI, R9 - MOVL DI, 12(SP) - LEAQ (DX)(SI*1), R10 - SUBL SI, R9 - LEAL -1(R9), SI - CMPL SI, $0x3c + MOVL SI, R8 + MOVL SI, 12(SP) + LEAQ (DX)(BX*1), R9 + SUBL BX, R8 + LEAL -1(R8), BX + CMPL BX, $0x3c JLT one_byte_match_emit_encodeSnappyBetterBlockAsm12B - CMPL SI, $0x00000100 + CMPL BX, $0x00000100 JLT two_bytes_match_emit_encodeSnappyBetterBlockAsm12B MOVB $0xf4, (AX) - MOVW SI, 1(AX) + MOVW BX, 1(AX) ADDQ $0x03, AX JMP memmove_long_match_emit_encodeSnappyBetterBlockAsm12B two_bytes_match_emit_encodeSnappyBetterBlockAsm12B: MOVB $0xf0, (AX) - MOVB SI, 1(AX) + MOVB BL, 1(AX) ADDQ $0x02, AX - CMPL SI, $0x40 + CMPL BX, $0x40 JL memmove_match_emit_encodeSnappyBetterBlockAsm12B JMP memmove_long_match_emit_encodeSnappyBetterBlockAsm12B one_byte_match_emit_encodeSnappyBetterBlockAsm12B: - SHLB $0x02, SI - MOVB SI, (AX) + SHLB $0x02, BL + MOVB BL, (AX) ADDQ $0x01, AX memmove_match_emit_encodeSnappyBetterBlockAsm12B: - LEAQ (AX)(R9*1), SI + LEAQ (AX)(R8*1), BX // genMemMoveShort - CMPQ R9, $0x08 + CMPQ R8, $0x08 JLE emit_lit_memmove_match_emit_encodeSnappyBetterBlockAsm12B_memmove_move_8 - CMPQ R9, $0x10 + CMPQ R8, $0x10 JBE emit_lit_memmove_match_emit_encodeSnappyBetterBlockAsm12B_memmove_move_8through16 - CMPQ R9, $0x20 + CMPQ R8, $0x20 JBE emit_lit_memmove_match_emit_encodeSnappyBetterBlockAsm12B_memmove_move_17through32 JMP emit_lit_memmove_match_emit_encodeSnappyBetterBlockAsm12B_memmove_move_33through64 emit_lit_memmove_match_emit_encodeSnappyBetterBlockAsm12B_memmove_move_8: - MOVQ (R10), R11 - MOVQ R11, (AX) + MOVQ (R9), R10 + MOVQ R10, (AX) JMP memmove_end_copy_match_emit_encodeSnappyBetterBlockAsm12B emit_lit_memmove_match_emit_encodeSnappyBetterBlockAsm12B_memmove_move_8through16: - MOVQ (R10), R11 - MOVQ -8(R10)(R9*1), R10 - MOVQ R11, (AX) - MOVQ R10, -8(AX)(R9*1) + MOVQ (R9), R10 + MOVQ -8(R9)(R8*1), R9 + MOVQ R10, (AX) + MOVQ R9, -8(AX)(R8*1) JMP memmove_end_copy_match_emit_encodeSnappyBetterBlockAsm12B emit_lit_memmove_match_emit_encodeSnappyBetterBlockAsm12B_memmove_move_17through32: - MOVOU (R10), X0 - MOVOU -16(R10)(R9*1), X1 + MOVOU (R9), X0 + MOVOU -16(R9)(R8*1), X1 MOVOU X0, (AX) - MOVOU X1, -16(AX)(R9*1) + MOVOU X1, -16(AX)(R8*1) JMP memmove_end_copy_match_emit_encodeSnappyBetterBlockAsm12B emit_lit_memmove_match_emit_encodeSnappyBetterBlockAsm12B_memmove_move_33through64: - MOVOU (R10), X0 - MOVOU 16(R10), X1 - MOVOU -32(R10)(R9*1), X2 - MOVOU -16(R10)(R9*1), X3 + MOVOU (R9), X0 + MOVOU 16(R9), X1 + MOVOU -32(R9)(R8*1), X2 + MOVOU -16(R9)(R8*1), X3 MOVOU X0, (AX) MOVOU X1, 16(AX) - MOVOU X2, -32(AX)(R9*1) - MOVOU X3, -16(AX)(R9*1) + MOVOU X2, -32(AX)(R8*1) + MOVOU X3, -16(AX)(R8*1) memmove_end_copy_match_emit_encodeSnappyBetterBlockAsm12B: - MOVQ SI, AX + MOVQ BX, AX JMP emit_literal_done_match_emit_encodeSnappyBetterBlockAsm12B memmove_long_match_emit_encodeSnappyBetterBlockAsm12B: - LEAQ (AX)(R9*1), SI + LEAQ (AX)(R8*1), BX // genMemMoveLong - MOVOU (R10), X0 - MOVOU 16(R10), X1 - MOVOU -32(R10)(R9*1), X2 - MOVOU -16(R10)(R9*1), X3 - MOVQ R9, R13 - SHRQ $0x05, 
R13 - MOVQ AX, R11 - ANDL $0x0000001f, R11 - MOVQ $0x00000040, R14 - SUBQ R11, R14 - DECQ R13 + MOVOU (R9), X0 + MOVOU 16(R9), X1 + MOVOU -32(R9)(R8*1), X2 + MOVOU -16(R9)(R8*1), X3 + MOVQ R8, R12 + SHRQ $0x05, R12 + MOVQ AX, R10 + ANDL $0x0000001f, R10 + MOVQ $0x00000040, R13 + SUBQ R10, R13 + DECQ R12 JA emit_lit_memmove_long_match_emit_encodeSnappyBetterBlockAsm12Blarge_forward_sse_loop_32 - LEAQ -32(R10)(R14*1), R11 - LEAQ -32(AX)(R14*1), R15 + LEAQ -32(R9)(R13*1), R10 + LEAQ -32(AX)(R13*1), R14 emit_lit_memmove_long_match_emit_encodeSnappyBetterBlockAsm12Blarge_big_loop_back: - MOVOU (R11), X4 - MOVOU 16(R11), X5 - MOVOA X4, (R15) - MOVOA X5, 16(R15) - ADDQ $0x20, R15 - ADDQ $0x20, R11 + MOVOU (R10), X4 + MOVOU 16(R10), X5 + MOVOA X4, (R14) + MOVOA X5, 16(R14) ADDQ $0x20, R14 - DECQ R13 + ADDQ $0x20, R10 + ADDQ $0x20, R13 + DECQ R12 JNA emit_lit_memmove_long_match_emit_encodeSnappyBetterBlockAsm12Blarge_big_loop_back emit_lit_memmove_long_match_emit_encodeSnappyBetterBlockAsm12Blarge_forward_sse_loop_32: - MOVOU -32(R10)(R14*1), X4 - MOVOU -16(R10)(R14*1), X5 - MOVOA X4, -32(AX)(R14*1) - MOVOA X5, -16(AX)(R14*1) - ADDQ $0x20, R14 - CMPQ R9, R14 + MOVOU -32(R9)(R13*1), X4 + MOVOU -16(R9)(R13*1), X5 + MOVOA X4, -32(AX)(R13*1) + MOVOA X5, -16(AX)(R13*1) + ADDQ $0x20, R13 + CMPQ R8, R13 JAE emit_lit_memmove_long_match_emit_encodeSnappyBetterBlockAsm12Blarge_forward_sse_loop_32 MOVOU X0, (AX) MOVOU X1, 16(AX) - MOVOU X2, -32(AX)(R9*1) - MOVOU X3, -16(AX)(R9*1) - MOVQ SI, AX + MOVOU X2, -32(AX)(R8*1) + MOVOU X3, -16(AX)(R8*1) + MOVQ BX, AX emit_literal_done_match_emit_encodeSnappyBetterBlockAsm12B: - ADDL R12, CX - ADDL $0x04, R12 + ADDL R11, CX + ADDL $0x04, R11 MOVL CX, 12(SP) // emitCopy two_byte_offset_match_nolit_encodeSnappyBetterBlockAsm12B: - CMPL R12, $0x40 + CMPL R11, $0x40 JLE two_byte_offset_short_match_nolit_encodeSnappyBetterBlockAsm12B MOVB $0xee, (AX) - MOVW R8, 1(AX) - LEAL -60(R12), R12 + MOVW DI, 1(AX) + LEAL -60(R11), R11 ADDQ $0x03, AX JMP two_byte_offset_match_nolit_encodeSnappyBetterBlockAsm12B two_byte_offset_short_match_nolit_encodeSnappyBetterBlockAsm12B: - CMPL R12, $0x0c + MOVL R11, BX + SHLL $0x02, BX + CMPL R11, $0x0c JGE emit_copy_three_match_nolit_encodeSnappyBetterBlockAsm12B - CMPL R8, $0x00000800 + CMPL DI, $0x00000800 JGE emit_copy_three_match_nolit_encodeSnappyBetterBlockAsm12B - MOVB $0x01, BL - LEAL -16(BX)(R12*4), R12 - MOVB R8, 1(AX) - SHRL $0x08, R8 - SHLL $0x05, R8 - ORL R8, R12 - MOVB R12, (AX) + LEAL -15(BX), BX + MOVB DI, 1(AX) + SHRL $0x08, DI + SHLL $0x05, DI + ORL DI, BX + MOVB BL, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeSnappyBetterBlockAsm12B emit_copy_three_match_nolit_encodeSnappyBetterBlockAsm12B: - MOVB $0x02, BL - LEAL -4(BX)(R12*4), R12 - MOVB R12, (AX) - MOVW R8, 1(AX) + LEAL -2(BX), BX + MOVB BL, (AX) + MOVW DI, 1(AX) ADDQ $0x03, AX match_nolit_emitcopy_end_encodeSnappyBetterBlockAsm12B: @@ -15848,50 +15795,50 @@ match_nolit_emitcopy_end_encodeSnappyBetterBlockAsm12B: RET match_nolit_dst_ok_encodeSnappyBetterBlockAsm12B: - MOVQ $0x0000cf1bbcdcbf9b, SI - MOVQ $0x9e3779b1, R8 - LEAQ 1(DI), DI - LEAQ -2(CX), R9 - MOVQ (DX)(DI*1), R10 - MOVQ 1(DX)(DI*1), R11 - MOVQ (DX)(R9*1), R12 - MOVQ 1(DX)(R9*1), R13 - SHLQ $0x10, R10 - IMULQ SI, R10 - SHRQ $0x32, R10 - SHLQ $0x20, R11 - IMULQ R8, R11 - SHRQ $0x34, R11 - SHLQ $0x10, R12 - IMULQ SI, R12 - SHRQ $0x32, R12 - SHLQ $0x20, R13 - IMULQ R8, R13 - SHRQ $0x34, R13 - LEAQ 1(DI), R8 - LEAQ 1(R9), R14 - MOVL DI, 24(SP)(R10*4) - MOVL R9, 24(SP)(R12*4) - MOVL R8, 65560(SP)(R11*4) - 
MOVL R14, 65560(SP)(R13*4) - ADDQ $0x01, DI - SUBQ $0x01, R9 + MOVQ $0x0000cf1bbcdcbf9b, BX + MOVQ $0x9e3779b1, DI + LEAQ 1(SI), SI + LEAQ -2(CX), R8 + MOVQ (DX)(SI*1), R9 + MOVQ 1(DX)(SI*1), R10 + MOVQ (DX)(R8*1), R11 + MOVQ 1(DX)(R8*1), R12 + SHLQ $0x10, R9 + IMULQ BX, R9 + SHRQ $0x32, R9 + SHLQ $0x20, R10 + IMULQ DI, R10 + SHRQ $0x34, R10 + SHLQ $0x10, R11 + IMULQ BX, R11 + SHRQ $0x32, R11 + SHLQ $0x20, R12 + IMULQ DI, R12 + SHRQ $0x34, R12 + LEAQ 1(SI), DI + LEAQ 1(R8), R13 + MOVL SI, 24(SP)(R9*4) + MOVL R8, 24(SP)(R11*4) + MOVL DI, 65560(SP)(R10*4) + MOVL R13, 65560(SP)(R12*4) + ADDQ $0x01, SI + SUBQ $0x01, R8 index_loop_encodeSnappyBetterBlockAsm12B: - CMPQ DI, R9 + CMPQ SI, R8 JAE search_loop_encodeSnappyBetterBlockAsm12B - MOVQ (DX)(DI*1), R8 - MOVQ (DX)(R9*1), R10 - SHLQ $0x10, R8 - IMULQ SI, R8 - SHRQ $0x32, R8 - SHLQ $0x10, R10 - IMULQ SI, R10 - SHRQ $0x32, R10 - MOVL DI, 24(SP)(R8*4) - MOVL R9, 24(SP)(R10*4) - ADDQ $0x02, DI - SUBQ $0x02, R9 + MOVQ (DX)(SI*1), DI + MOVQ (DX)(R8*1), R9 + SHLQ $0x10, DI + IMULQ BX, DI + SHRQ $0x32, DI + SHLQ $0x10, R9 + IMULQ BX, R9 + SHRQ $0x32, R9 + MOVL SI, 24(SP)(DI*4) + MOVL R8, 24(SP)(R9*4) + ADDQ $0x02, SI + SUBQ $0x02, R8 JMP index_loop_encodeSnappyBetterBlockAsm12B emit_remainder_encodeSnappyBetterBlockAsm12B: @@ -16074,8 +16021,8 @@ zero_loop_encodeSnappyBetterBlockAsm10B: MOVL $0x00000000, 12(SP) MOVQ src_len+32(FP), CX LEAQ -9(CX), DX - LEAQ -8(CX), SI - MOVL SI, 8(SP) + LEAQ -8(CX), BX + MOVL BX, 8(SP) SHRQ $0x05, CX SUBL CX, DX LEAQ (AX)(DX*1), DX @@ -16085,309 +16032,309 @@ zero_loop_encodeSnappyBetterBlockAsm10B: MOVQ src_base+24(FP), DX search_loop_encodeSnappyBetterBlockAsm10B: - MOVL CX, SI - SUBL 12(SP), SI - SHRL $0x05, SI - LEAL 1(CX)(SI*1), SI - CMPL SI, 8(SP) + MOVL CX, BX + SUBL 12(SP), BX + SHRL $0x05, BX + LEAL 1(CX)(BX*1), BX + CMPL BX, 8(SP) JGE emit_remainder_encodeSnappyBetterBlockAsm10B - MOVQ (DX)(CX*1), DI - MOVL SI, 20(SP) - MOVQ $0x0000cf1bbcdcbf9b, R9 - MOVQ $0x9e3779b1, SI - MOVQ DI, R10 - MOVQ DI, R11 - SHLQ $0x10, R10 - IMULQ R9, R10 - SHRQ $0x34, R10 - SHLQ $0x20, R11 - IMULQ SI, R11 - SHRQ $0x36, R11 - MOVL 24(SP)(R10*4), SI - MOVL 16408(SP)(R11*4), R8 - MOVL CX, 24(SP)(R10*4) - MOVL CX, 16408(SP)(R11*4) - MOVQ (DX)(SI*1), R10 - MOVQ (DX)(R8*1), R11 - CMPQ R10, DI + MOVQ (DX)(CX*1), SI + MOVL BX, 20(SP) + MOVQ $0x0000cf1bbcdcbf9b, R8 + MOVQ $0x9e3779b1, BX + MOVQ SI, R9 + MOVQ SI, R10 + SHLQ $0x10, R9 + IMULQ R8, R9 + SHRQ $0x34, R9 + SHLQ $0x20, R10 + IMULQ BX, R10 + SHRQ $0x36, R10 + MOVL 24(SP)(R9*4), BX + MOVL 16408(SP)(R10*4), DI + MOVL CX, 24(SP)(R9*4) + MOVL CX, 16408(SP)(R10*4) + MOVQ (DX)(BX*1), R9 + MOVQ (DX)(DI*1), R10 + CMPQ R9, SI JEQ candidate_match_encodeSnappyBetterBlockAsm10B - CMPQ R11, DI + CMPQ R10, SI JNE no_short_found_encodeSnappyBetterBlockAsm10B - MOVL R8, SI + MOVL DI, BX JMP candidate_match_encodeSnappyBetterBlockAsm10B no_short_found_encodeSnappyBetterBlockAsm10B: - CMPL R10, DI + CMPL R9, SI JEQ candidate_match_encodeSnappyBetterBlockAsm10B - CMPL R11, DI + CMPL R10, SI JEQ candidateS_match_encodeSnappyBetterBlockAsm10B MOVL 20(SP), CX JMP search_loop_encodeSnappyBetterBlockAsm10B candidateS_match_encodeSnappyBetterBlockAsm10B: - SHRQ $0x08, DI - MOVQ DI, R10 - SHLQ $0x10, R10 - IMULQ R9, R10 - SHRQ $0x34, R10 - MOVL 24(SP)(R10*4), SI + SHRQ $0x08, SI + MOVQ SI, R9 + SHLQ $0x10, R9 + IMULQ R8, R9 + SHRQ $0x34, R9 + MOVL 24(SP)(R9*4), BX INCL CX - MOVL CX, 24(SP)(R10*4) - CMPL (DX)(SI*1), DI + MOVL CX, 24(SP)(R9*4) + CMPL (DX)(BX*1), SI JEQ 
candidate_match_encodeSnappyBetterBlockAsm10B DECL CX - MOVL R8, SI + MOVL DI, BX candidate_match_encodeSnappyBetterBlockAsm10B: - MOVL 12(SP), DI - TESTL SI, SI + MOVL 12(SP), SI + TESTL BX, BX JZ match_extend_back_end_encodeSnappyBetterBlockAsm10B match_extend_back_loop_encodeSnappyBetterBlockAsm10B: - CMPL CX, DI + CMPL CX, SI JLE match_extend_back_end_encodeSnappyBetterBlockAsm10B - MOVB -1(DX)(SI*1), BL + MOVB -1(DX)(BX*1), DI MOVB -1(DX)(CX*1), R8 - CMPB BL, R8 + CMPB DI, R8 JNE match_extend_back_end_encodeSnappyBetterBlockAsm10B LEAL -1(CX), CX - DECL SI + DECL BX JZ match_extend_back_end_encodeSnappyBetterBlockAsm10B JMP match_extend_back_loop_encodeSnappyBetterBlockAsm10B match_extend_back_end_encodeSnappyBetterBlockAsm10B: - MOVL CX, DI - SUBL 12(SP), DI - LEAQ 3(AX)(DI*1), DI - CMPQ DI, (SP) + MOVL CX, SI + SUBL 12(SP), SI + LEAQ 3(AX)(SI*1), SI + CMPQ SI, (SP) JL match_dst_size_check_encodeSnappyBetterBlockAsm10B MOVQ $0x00000000, ret+48(FP) RET match_dst_size_check_encodeSnappyBetterBlockAsm10B: - MOVL CX, DI + MOVL CX, SI ADDL $0x04, CX - ADDL $0x04, SI - MOVQ src_len+32(FP), R8 - SUBL CX, R8 - LEAQ (DX)(CX*1), R9 - LEAQ (DX)(SI*1), R10 + ADDL $0x04, BX + MOVQ src_len+32(FP), DI + SUBL CX, DI + LEAQ (DX)(CX*1), R8 + LEAQ (DX)(BX*1), R9 // matchLen - XORL R12, R12 - CMPL R8, $0x08 + XORL R11, R11 + CMPL DI, $0x08 JL matchlen_match4_match_nolit_encodeSnappyBetterBlockAsm10B matchlen_loopback_match_nolit_encodeSnappyBetterBlockAsm10B: - MOVQ (R9)(R12*1), R11 - XORQ (R10)(R12*1), R11 - TESTQ R11, R11 + MOVQ (R8)(R11*1), R10 + XORQ (R9)(R11*1), R10 + TESTQ R10, R10 JZ matchlen_loop_match_nolit_encodeSnappyBetterBlockAsm10B #ifdef GOAMD64_v3 - TZCNTQ R11, R11 + TZCNTQ R10, R10 #else - BSFQ R11, R11 + BSFQ R10, R10 #endif - SARQ $0x03, R11 - LEAL (R12)(R11*1), R12 + SARQ $0x03, R10 + LEAL (R11)(R10*1), R11 JMP match_nolit_end_encodeSnappyBetterBlockAsm10B matchlen_loop_match_nolit_encodeSnappyBetterBlockAsm10B: - LEAL -8(R8), R8 - LEAL 8(R12), R12 - CMPL R8, $0x08 + LEAL -8(DI), DI + LEAL 8(R11), R11 + CMPL DI, $0x08 JGE matchlen_loopback_match_nolit_encodeSnappyBetterBlockAsm10B JZ match_nolit_end_encodeSnappyBetterBlockAsm10B matchlen_match4_match_nolit_encodeSnappyBetterBlockAsm10B: - CMPL R8, $0x04 + CMPL DI, $0x04 JL matchlen_match2_match_nolit_encodeSnappyBetterBlockAsm10B - MOVL (R9)(R12*1), R11 - CMPL (R10)(R12*1), R11 + MOVL (R8)(R11*1), R10 + CMPL (R9)(R11*1), R10 JNE matchlen_match2_match_nolit_encodeSnappyBetterBlockAsm10B - SUBL $0x04, R8 - LEAL 4(R12), R12 + SUBL $0x04, DI + LEAL 4(R11), R11 matchlen_match2_match_nolit_encodeSnappyBetterBlockAsm10B: - CMPL R8, $0x02 + CMPL DI, $0x02 JL matchlen_match1_match_nolit_encodeSnappyBetterBlockAsm10B - MOVW (R9)(R12*1), R11 - CMPW (R10)(R12*1), R11 + MOVW (R8)(R11*1), R10 + CMPW (R9)(R11*1), R10 JNE matchlen_match1_match_nolit_encodeSnappyBetterBlockAsm10B - SUBL $0x02, R8 - LEAL 2(R12), R12 + SUBL $0x02, DI + LEAL 2(R11), R11 matchlen_match1_match_nolit_encodeSnappyBetterBlockAsm10B: - CMPL R8, $0x01 + CMPL DI, $0x01 JL match_nolit_end_encodeSnappyBetterBlockAsm10B - MOVB (R9)(R12*1), R11 - CMPB (R10)(R12*1), R11 + MOVB (R8)(R11*1), R10 + CMPB (R9)(R11*1), R10 JNE match_nolit_end_encodeSnappyBetterBlockAsm10B - LEAL 1(R12), R12 + LEAL 1(R11), R11 match_nolit_end_encodeSnappyBetterBlockAsm10B: - MOVL CX, R8 - SUBL SI, R8 + MOVL CX, DI + SUBL BX, DI // Check if repeat - MOVL R8, 16(SP) - MOVL 12(SP), SI - CMPL SI, DI + MOVL DI, 16(SP) + MOVL 12(SP), BX + CMPL BX, SI JEQ emit_literal_done_match_emit_encodeSnappyBetterBlockAsm10B 
- MOVL DI, R9 - MOVL DI, 12(SP) - LEAQ (DX)(SI*1), R10 - SUBL SI, R9 - LEAL -1(R9), SI - CMPL SI, $0x3c + MOVL SI, R8 + MOVL SI, 12(SP) + LEAQ (DX)(BX*1), R9 + SUBL BX, R8 + LEAL -1(R8), BX + CMPL BX, $0x3c JLT one_byte_match_emit_encodeSnappyBetterBlockAsm10B - CMPL SI, $0x00000100 + CMPL BX, $0x00000100 JLT two_bytes_match_emit_encodeSnappyBetterBlockAsm10B MOVB $0xf4, (AX) - MOVW SI, 1(AX) + MOVW BX, 1(AX) ADDQ $0x03, AX JMP memmove_long_match_emit_encodeSnappyBetterBlockAsm10B two_bytes_match_emit_encodeSnappyBetterBlockAsm10B: MOVB $0xf0, (AX) - MOVB SI, 1(AX) + MOVB BL, 1(AX) ADDQ $0x02, AX - CMPL SI, $0x40 + CMPL BX, $0x40 JL memmove_match_emit_encodeSnappyBetterBlockAsm10B JMP memmove_long_match_emit_encodeSnappyBetterBlockAsm10B one_byte_match_emit_encodeSnappyBetterBlockAsm10B: - SHLB $0x02, SI - MOVB SI, (AX) + SHLB $0x02, BL + MOVB BL, (AX) ADDQ $0x01, AX memmove_match_emit_encodeSnappyBetterBlockAsm10B: - LEAQ (AX)(R9*1), SI + LEAQ (AX)(R8*1), BX // genMemMoveShort - CMPQ R9, $0x08 + CMPQ R8, $0x08 JLE emit_lit_memmove_match_emit_encodeSnappyBetterBlockAsm10B_memmove_move_8 - CMPQ R9, $0x10 + CMPQ R8, $0x10 JBE emit_lit_memmove_match_emit_encodeSnappyBetterBlockAsm10B_memmove_move_8through16 - CMPQ R9, $0x20 + CMPQ R8, $0x20 JBE emit_lit_memmove_match_emit_encodeSnappyBetterBlockAsm10B_memmove_move_17through32 JMP emit_lit_memmove_match_emit_encodeSnappyBetterBlockAsm10B_memmove_move_33through64 emit_lit_memmove_match_emit_encodeSnappyBetterBlockAsm10B_memmove_move_8: - MOVQ (R10), R11 - MOVQ R11, (AX) + MOVQ (R9), R10 + MOVQ R10, (AX) JMP memmove_end_copy_match_emit_encodeSnappyBetterBlockAsm10B emit_lit_memmove_match_emit_encodeSnappyBetterBlockAsm10B_memmove_move_8through16: - MOVQ (R10), R11 - MOVQ -8(R10)(R9*1), R10 - MOVQ R11, (AX) - MOVQ R10, -8(AX)(R9*1) + MOVQ (R9), R10 + MOVQ -8(R9)(R8*1), R9 + MOVQ R10, (AX) + MOVQ R9, -8(AX)(R8*1) JMP memmove_end_copy_match_emit_encodeSnappyBetterBlockAsm10B emit_lit_memmove_match_emit_encodeSnappyBetterBlockAsm10B_memmove_move_17through32: - MOVOU (R10), X0 - MOVOU -16(R10)(R9*1), X1 + MOVOU (R9), X0 + MOVOU -16(R9)(R8*1), X1 MOVOU X0, (AX) - MOVOU X1, -16(AX)(R9*1) + MOVOU X1, -16(AX)(R8*1) JMP memmove_end_copy_match_emit_encodeSnappyBetterBlockAsm10B emit_lit_memmove_match_emit_encodeSnappyBetterBlockAsm10B_memmove_move_33through64: - MOVOU (R10), X0 - MOVOU 16(R10), X1 - MOVOU -32(R10)(R9*1), X2 - MOVOU -16(R10)(R9*1), X3 + MOVOU (R9), X0 + MOVOU 16(R9), X1 + MOVOU -32(R9)(R8*1), X2 + MOVOU -16(R9)(R8*1), X3 MOVOU X0, (AX) MOVOU X1, 16(AX) - MOVOU X2, -32(AX)(R9*1) - MOVOU X3, -16(AX)(R9*1) + MOVOU X2, -32(AX)(R8*1) + MOVOU X3, -16(AX)(R8*1) memmove_end_copy_match_emit_encodeSnappyBetterBlockAsm10B: - MOVQ SI, AX + MOVQ BX, AX JMP emit_literal_done_match_emit_encodeSnappyBetterBlockAsm10B memmove_long_match_emit_encodeSnappyBetterBlockAsm10B: - LEAQ (AX)(R9*1), SI + LEAQ (AX)(R8*1), BX // genMemMoveLong - MOVOU (R10), X0 - MOVOU 16(R10), X1 - MOVOU -32(R10)(R9*1), X2 - MOVOU -16(R10)(R9*1), X3 - MOVQ R9, R13 - SHRQ $0x05, R13 - MOVQ AX, R11 - ANDL $0x0000001f, R11 - MOVQ $0x00000040, R14 - SUBQ R11, R14 - DECQ R13 + MOVOU (R9), X0 + MOVOU 16(R9), X1 + MOVOU -32(R9)(R8*1), X2 + MOVOU -16(R9)(R8*1), X3 + MOVQ R8, R12 + SHRQ $0x05, R12 + MOVQ AX, R10 + ANDL $0x0000001f, R10 + MOVQ $0x00000040, R13 + SUBQ R10, R13 + DECQ R12 JA emit_lit_memmove_long_match_emit_encodeSnappyBetterBlockAsm10Blarge_forward_sse_loop_32 - LEAQ -32(R10)(R14*1), R11 - LEAQ -32(AX)(R14*1), R15 + LEAQ -32(R9)(R13*1), R10 + LEAQ -32(AX)(R13*1), R14 
emit_lit_memmove_long_match_emit_encodeSnappyBetterBlockAsm10Blarge_big_loop_back: - MOVOU (R11), X4 - MOVOU 16(R11), X5 - MOVOA X4, (R15) - MOVOA X5, 16(R15) - ADDQ $0x20, R15 - ADDQ $0x20, R11 + MOVOU (R10), X4 + MOVOU 16(R10), X5 + MOVOA X4, (R14) + MOVOA X5, 16(R14) ADDQ $0x20, R14 - DECQ R13 + ADDQ $0x20, R10 + ADDQ $0x20, R13 + DECQ R12 JNA emit_lit_memmove_long_match_emit_encodeSnappyBetterBlockAsm10Blarge_big_loop_back emit_lit_memmove_long_match_emit_encodeSnappyBetterBlockAsm10Blarge_forward_sse_loop_32: - MOVOU -32(R10)(R14*1), X4 - MOVOU -16(R10)(R14*1), X5 - MOVOA X4, -32(AX)(R14*1) - MOVOA X5, -16(AX)(R14*1) - ADDQ $0x20, R14 - CMPQ R9, R14 + MOVOU -32(R9)(R13*1), X4 + MOVOU -16(R9)(R13*1), X5 + MOVOA X4, -32(AX)(R13*1) + MOVOA X5, -16(AX)(R13*1) + ADDQ $0x20, R13 + CMPQ R8, R13 JAE emit_lit_memmove_long_match_emit_encodeSnappyBetterBlockAsm10Blarge_forward_sse_loop_32 MOVOU X0, (AX) MOVOU X1, 16(AX) - MOVOU X2, -32(AX)(R9*1) - MOVOU X3, -16(AX)(R9*1) - MOVQ SI, AX + MOVOU X2, -32(AX)(R8*1) + MOVOU X3, -16(AX)(R8*1) + MOVQ BX, AX emit_literal_done_match_emit_encodeSnappyBetterBlockAsm10B: - ADDL R12, CX - ADDL $0x04, R12 + ADDL R11, CX + ADDL $0x04, R11 MOVL CX, 12(SP) // emitCopy two_byte_offset_match_nolit_encodeSnappyBetterBlockAsm10B: - CMPL R12, $0x40 + CMPL R11, $0x40 JLE two_byte_offset_short_match_nolit_encodeSnappyBetterBlockAsm10B MOVB $0xee, (AX) - MOVW R8, 1(AX) - LEAL -60(R12), R12 + MOVW DI, 1(AX) + LEAL -60(R11), R11 ADDQ $0x03, AX JMP two_byte_offset_match_nolit_encodeSnappyBetterBlockAsm10B two_byte_offset_short_match_nolit_encodeSnappyBetterBlockAsm10B: - CMPL R12, $0x0c + MOVL R11, BX + SHLL $0x02, BX + CMPL R11, $0x0c JGE emit_copy_three_match_nolit_encodeSnappyBetterBlockAsm10B - CMPL R8, $0x00000800 + CMPL DI, $0x00000800 JGE emit_copy_three_match_nolit_encodeSnappyBetterBlockAsm10B - MOVB $0x01, BL - LEAL -16(BX)(R12*4), R12 - MOVB R8, 1(AX) - SHRL $0x08, R8 - SHLL $0x05, R8 - ORL R8, R12 - MOVB R12, (AX) + LEAL -15(BX), BX + MOVB DI, 1(AX) + SHRL $0x08, DI + SHLL $0x05, DI + ORL DI, BX + MOVB BL, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeSnappyBetterBlockAsm10B emit_copy_three_match_nolit_encodeSnappyBetterBlockAsm10B: - MOVB $0x02, BL - LEAL -4(BX)(R12*4), R12 - MOVB R12, (AX) - MOVW R8, 1(AX) + LEAL -2(BX), BX + MOVB BL, (AX) + MOVW DI, 1(AX) ADDQ $0x03, AX match_nolit_emitcopy_end_encodeSnappyBetterBlockAsm10B: @@ -16399,50 +16346,50 @@ match_nolit_emitcopy_end_encodeSnappyBetterBlockAsm10B: RET match_nolit_dst_ok_encodeSnappyBetterBlockAsm10B: - MOVQ $0x0000cf1bbcdcbf9b, SI - MOVQ $0x9e3779b1, R8 - LEAQ 1(DI), DI - LEAQ -2(CX), R9 - MOVQ (DX)(DI*1), R10 - MOVQ 1(DX)(DI*1), R11 - MOVQ (DX)(R9*1), R12 - MOVQ 1(DX)(R9*1), R13 - SHLQ $0x10, R10 - IMULQ SI, R10 - SHRQ $0x34, R10 - SHLQ $0x20, R11 - IMULQ R8, R11 - SHRQ $0x36, R11 - SHLQ $0x10, R12 - IMULQ SI, R12 - SHRQ $0x34, R12 - SHLQ $0x20, R13 - IMULQ R8, R13 - SHRQ $0x36, R13 - LEAQ 1(DI), R8 - LEAQ 1(R9), R14 - MOVL DI, 24(SP)(R10*4) - MOVL R9, 24(SP)(R12*4) - MOVL R8, 16408(SP)(R11*4) - MOVL R14, 16408(SP)(R13*4) - ADDQ $0x01, DI - SUBQ $0x01, R9 + MOVQ $0x0000cf1bbcdcbf9b, BX + MOVQ $0x9e3779b1, DI + LEAQ 1(SI), SI + LEAQ -2(CX), R8 + MOVQ (DX)(SI*1), R9 + MOVQ 1(DX)(SI*1), R10 + MOVQ (DX)(R8*1), R11 + MOVQ 1(DX)(R8*1), R12 + SHLQ $0x10, R9 + IMULQ BX, R9 + SHRQ $0x34, R9 + SHLQ $0x20, R10 + IMULQ DI, R10 + SHRQ $0x36, R10 + SHLQ $0x10, R11 + IMULQ BX, R11 + SHRQ $0x34, R11 + SHLQ $0x20, R12 + IMULQ DI, R12 + SHRQ $0x36, R12 + LEAQ 1(SI), DI + LEAQ 1(R8), R13 + MOVL SI, 
24(SP)(R9*4) + MOVL R8, 24(SP)(R11*4) + MOVL DI, 16408(SP)(R10*4) + MOVL R13, 16408(SP)(R12*4) + ADDQ $0x01, SI + SUBQ $0x01, R8 index_loop_encodeSnappyBetterBlockAsm10B: - CMPQ DI, R9 + CMPQ SI, R8 JAE search_loop_encodeSnappyBetterBlockAsm10B - MOVQ (DX)(DI*1), R8 - MOVQ (DX)(R9*1), R10 - SHLQ $0x10, R8 - IMULQ SI, R8 - SHRQ $0x34, R8 - SHLQ $0x10, R10 - IMULQ SI, R10 - SHRQ $0x34, R10 - MOVL DI, 24(SP)(R8*4) - MOVL R9, 24(SP)(R10*4) - ADDQ $0x02, DI - SUBQ $0x02, R9 + MOVQ (DX)(SI*1), DI + MOVQ (DX)(R8*1), R9 + SHLQ $0x10, DI + IMULQ BX, DI + SHRQ $0x34, DI + SHLQ $0x10, R9 + IMULQ BX, R9 + SHRQ $0x34, R9 + MOVL SI, 24(SP)(DI*4) + MOVL R8, 24(SP)(R9*4) + ADDQ $0x02, SI + SUBQ $0x02, R8 JMP index_loop_encodeSnappyBetterBlockAsm10B emit_remainder_encodeSnappyBetterBlockAsm10B: @@ -16625,8 +16572,8 @@ zero_loop_encodeSnappyBetterBlockAsm8B: MOVL $0x00000000, 12(SP) MOVQ src_len+32(FP), CX LEAQ -9(CX), DX - LEAQ -8(CX), SI - MOVL SI, 8(SP) + LEAQ -8(CX), BX + MOVL BX, 8(SP) SHRQ $0x05, CX SUBL CX, DX LEAQ (AX)(DX*1), DX @@ -16636,307 +16583,307 @@ zero_loop_encodeSnappyBetterBlockAsm8B: MOVQ src_base+24(FP), DX search_loop_encodeSnappyBetterBlockAsm8B: - MOVL CX, SI - SUBL 12(SP), SI - SHRL $0x04, SI - LEAL 1(CX)(SI*1), SI - CMPL SI, 8(SP) + MOVL CX, BX + SUBL 12(SP), BX + SHRL $0x04, BX + LEAL 1(CX)(BX*1), BX + CMPL BX, 8(SP) JGE emit_remainder_encodeSnappyBetterBlockAsm8B - MOVQ (DX)(CX*1), DI - MOVL SI, 20(SP) - MOVQ $0x0000cf1bbcdcbf9b, R9 - MOVQ $0x9e3779b1, SI - MOVQ DI, R10 - MOVQ DI, R11 - SHLQ $0x10, R10 - IMULQ R9, R10 - SHRQ $0x36, R10 - SHLQ $0x20, R11 - IMULQ SI, R11 - SHRQ $0x38, R11 - MOVL 24(SP)(R10*4), SI - MOVL 4120(SP)(R11*4), R8 - MOVL CX, 24(SP)(R10*4) - MOVL CX, 4120(SP)(R11*4) - MOVQ (DX)(SI*1), R10 - MOVQ (DX)(R8*1), R11 - CMPQ R10, DI + MOVQ (DX)(CX*1), SI + MOVL BX, 20(SP) + MOVQ $0x0000cf1bbcdcbf9b, R8 + MOVQ $0x9e3779b1, BX + MOVQ SI, R9 + MOVQ SI, R10 + SHLQ $0x10, R9 + IMULQ R8, R9 + SHRQ $0x36, R9 + SHLQ $0x20, R10 + IMULQ BX, R10 + SHRQ $0x38, R10 + MOVL 24(SP)(R9*4), BX + MOVL 4120(SP)(R10*4), DI + MOVL CX, 24(SP)(R9*4) + MOVL CX, 4120(SP)(R10*4) + MOVQ (DX)(BX*1), R9 + MOVQ (DX)(DI*1), R10 + CMPQ R9, SI JEQ candidate_match_encodeSnappyBetterBlockAsm8B - CMPQ R11, DI + CMPQ R10, SI JNE no_short_found_encodeSnappyBetterBlockAsm8B - MOVL R8, SI + MOVL DI, BX JMP candidate_match_encodeSnappyBetterBlockAsm8B no_short_found_encodeSnappyBetterBlockAsm8B: - CMPL R10, DI + CMPL R9, SI JEQ candidate_match_encodeSnappyBetterBlockAsm8B - CMPL R11, DI + CMPL R10, SI JEQ candidateS_match_encodeSnappyBetterBlockAsm8B MOVL 20(SP), CX JMP search_loop_encodeSnappyBetterBlockAsm8B candidateS_match_encodeSnappyBetterBlockAsm8B: - SHRQ $0x08, DI - MOVQ DI, R10 - SHLQ $0x10, R10 - IMULQ R9, R10 - SHRQ $0x36, R10 - MOVL 24(SP)(R10*4), SI + SHRQ $0x08, SI + MOVQ SI, R9 + SHLQ $0x10, R9 + IMULQ R8, R9 + SHRQ $0x36, R9 + MOVL 24(SP)(R9*4), BX INCL CX - MOVL CX, 24(SP)(R10*4) - CMPL (DX)(SI*1), DI + MOVL CX, 24(SP)(R9*4) + CMPL (DX)(BX*1), SI JEQ candidate_match_encodeSnappyBetterBlockAsm8B DECL CX - MOVL R8, SI + MOVL DI, BX candidate_match_encodeSnappyBetterBlockAsm8B: - MOVL 12(SP), DI - TESTL SI, SI + MOVL 12(SP), SI + TESTL BX, BX JZ match_extend_back_end_encodeSnappyBetterBlockAsm8B match_extend_back_loop_encodeSnappyBetterBlockAsm8B: - CMPL CX, DI + CMPL CX, SI JLE match_extend_back_end_encodeSnappyBetterBlockAsm8B - MOVB -1(DX)(SI*1), BL + MOVB -1(DX)(BX*1), DI MOVB -1(DX)(CX*1), R8 - CMPB BL, R8 + CMPB DI, R8 JNE match_extend_back_end_encodeSnappyBetterBlockAsm8B LEAL 
-1(CX), CX - DECL SI + DECL BX JZ match_extend_back_end_encodeSnappyBetterBlockAsm8B JMP match_extend_back_loop_encodeSnappyBetterBlockAsm8B match_extend_back_end_encodeSnappyBetterBlockAsm8B: - MOVL CX, DI - SUBL 12(SP), DI - LEAQ 3(AX)(DI*1), DI - CMPQ DI, (SP) + MOVL CX, SI + SUBL 12(SP), SI + LEAQ 3(AX)(SI*1), SI + CMPQ SI, (SP) JL match_dst_size_check_encodeSnappyBetterBlockAsm8B MOVQ $0x00000000, ret+48(FP) RET match_dst_size_check_encodeSnappyBetterBlockAsm8B: - MOVL CX, DI + MOVL CX, SI ADDL $0x04, CX - ADDL $0x04, SI - MOVQ src_len+32(FP), R8 - SUBL CX, R8 - LEAQ (DX)(CX*1), R9 - LEAQ (DX)(SI*1), R10 + ADDL $0x04, BX + MOVQ src_len+32(FP), DI + SUBL CX, DI + LEAQ (DX)(CX*1), R8 + LEAQ (DX)(BX*1), R9 // matchLen - XORL R12, R12 - CMPL R8, $0x08 + XORL R11, R11 + CMPL DI, $0x08 JL matchlen_match4_match_nolit_encodeSnappyBetterBlockAsm8B matchlen_loopback_match_nolit_encodeSnappyBetterBlockAsm8B: - MOVQ (R9)(R12*1), R11 - XORQ (R10)(R12*1), R11 - TESTQ R11, R11 + MOVQ (R8)(R11*1), R10 + XORQ (R9)(R11*1), R10 + TESTQ R10, R10 JZ matchlen_loop_match_nolit_encodeSnappyBetterBlockAsm8B #ifdef GOAMD64_v3 - TZCNTQ R11, R11 + TZCNTQ R10, R10 #else - BSFQ R11, R11 + BSFQ R10, R10 #endif - SARQ $0x03, R11 - LEAL (R12)(R11*1), R12 + SARQ $0x03, R10 + LEAL (R11)(R10*1), R11 JMP match_nolit_end_encodeSnappyBetterBlockAsm8B matchlen_loop_match_nolit_encodeSnappyBetterBlockAsm8B: - LEAL -8(R8), R8 - LEAL 8(R12), R12 - CMPL R8, $0x08 + LEAL -8(DI), DI + LEAL 8(R11), R11 + CMPL DI, $0x08 JGE matchlen_loopback_match_nolit_encodeSnappyBetterBlockAsm8B JZ match_nolit_end_encodeSnappyBetterBlockAsm8B matchlen_match4_match_nolit_encodeSnappyBetterBlockAsm8B: - CMPL R8, $0x04 + CMPL DI, $0x04 JL matchlen_match2_match_nolit_encodeSnappyBetterBlockAsm8B - MOVL (R9)(R12*1), R11 - CMPL (R10)(R12*1), R11 + MOVL (R8)(R11*1), R10 + CMPL (R9)(R11*1), R10 JNE matchlen_match2_match_nolit_encodeSnappyBetterBlockAsm8B - SUBL $0x04, R8 - LEAL 4(R12), R12 + SUBL $0x04, DI + LEAL 4(R11), R11 matchlen_match2_match_nolit_encodeSnappyBetterBlockAsm8B: - CMPL R8, $0x02 + CMPL DI, $0x02 JL matchlen_match1_match_nolit_encodeSnappyBetterBlockAsm8B - MOVW (R9)(R12*1), R11 - CMPW (R10)(R12*1), R11 + MOVW (R8)(R11*1), R10 + CMPW (R9)(R11*1), R10 JNE matchlen_match1_match_nolit_encodeSnappyBetterBlockAsm8B - SUBL $0x02, R8 - LEAL 2(R12), R12 + SUBL $0x02, DI + LEAL 2(R11), R11 matchlen_match1_match_nolit_encodeSnappyBetterBlockAsm8B: - CMPL R8, $0x01 + CMPL DI, $0x01 JL match_nolit_end_encodeSnappyBetterBlockAsm8B - MOVB (R9)(R12*1), R11 - CMPB (R10)(R12*1), R11 + MOVB (R8)(R11*1), R10 + CMPB (R9)(R11*1), R10 JNE match_nolit_end_encodeSnappyBetterBlockAsm8B - LEAL 1(R12), R12 + LEAL 1(R11), R11 match_nolit_end_encodeSnappyBetterBlockAsm8B: - MOVL CX, R8 - SUBL SI, R8 + MOVL CX, DI + SUBL BX, DI // Check if repeat - MOVL R8, 16(SP) - MOVL 12(SP), SI - CMPL SI, DI + MOVL DI, 16(SP) + MOVL 12(SP), BX + CMPL BX, SI JEQ emit_literal_done_match_emit_encodeSnappyBetterBlockAsm8B - MOVL DI, R9 - MOVL DI, 12(SP) - LEAQ (DX)(SI*1), R10 - SUBL SI, R9 - LEAL -1(R9), SI - CMPL SI, $0x3c + MOVL SI, R8 + MOVL SI, 12(SP) + LEAQ (DX)(BX*1), R9 + SUBL BX, R8 + LEAL -1(R8), BX + CMPL BX, $0x3c JLT one_byte_match_emit_encodeSnappyBetterBlockAsm8B - CMPL SI, $0x00000100 + CMPL BX, $0x00000100 JLT two_bytes_match_emit_encodeSnappyBetterBlockAsm8B MOVB $0xf4, (AX) - MOVW SI, 1(AX) + MOVW BX, 1(AX) ADDQ $0x03, AX JMP memmove_long_match_emit_encodeSnappyBetterBlockAsm8B two_bytes_match_emit_encodeSnappyBetterBlockAsm8B: MOVB $0xf0, (AX) - MOVB SI, 1(AX) + 
MOVB BL, 1(AX) ADDQ $0x02, AX - CMPL SI, $0x40 + CMPL BX, $0x40 JL memmove_match_emit_encodeSnappyBetterBlockAsm8B JMP memmove_long_match_emit_encodeSnappyBetterBlockAsm8B one_byte_match_emit_encodeSnappyBetterBlockAsm8B: - SHLB $0x02, SI - MOVB SI, (AX) + SHLB $0x02, BL + MOVB BL, (AX) ADDQ $0x01, AX memmove_match_emit_encodeSnappyBetterBlockAsm8B: - LEAQ (AX)(R9*1), SI + LEAQ (AX)(R8*1), BX // genMemMoveShort - CMPQ R9, $0x08 + CMPQ R8, $0x08 JLE emit_lit_memmove_match_emit_encodeSnappyBetterBlockAsm8B_memmove_move_8 - CMPQ R9, $0x10 + CMPQ R8, $0x10 JBE emit_lit_memmove_match_emit_encodeSnappyBetterBlockAsm8B_memmove_move_8through16 - CMPQ R9, $0x20 + CMPQ R8, $0x20 JBE emit_lit_memmove_match_emit_encodeSnappyBetterBlockAsm8B_memmove_move_17through32 JMP emit_lit_memmove_match_emit_encodeSnappyBetterBlockAsm8B_memmove_move_33through64 emit_lit_memmove_match_emit_encodeSnappyBetterBlockAsm8B_memmove_move_8: - MOVQ (R10), R11 - MOVQ R11, (AX) + MOVQ (R9), R10 + MOVQ R10, (AX) JMP memmove_end_copy_match_emit_encodeSnappyBetterBlockAsm8B emit_lit_memmove_match_emit_encodeSnappyBetterBlockAsm8B_memmove_move_8through16: - MOVQ (R10), R11 - MOVQ -8(R10)(R9*1), R10 - MOVQ R11, (AX) - MOVQ R10, -8(AX)(R9*1) + MOVQ (R9), R10 + MOVQ -8(R9)(R8*1), R9 + MOVQ R10, (AX) + MOVQ R9, -8(AX)(R8*1) JMP memmove_end_copy_match_emit_encodeSnappyBetterBlockAsm8B emit_lit_memmove_match_emit_encodeSnappyBetterBlockAsm8B_memmove_move_17through32: - MOVOU (R10), X0 - MOVOU -16(R10)(R9*1), X1 + MOVOU (R9), X0 + MOVOU -16(R9)(R8*1), X1 MOVOU X0, (AX) - MOVOU X1, -16(AX)(R9*1) + MOVOU X1, -16(AX)(R8*1) JMP memmove_end_copy_match_emit_encodeSnappyBetterBlockAsm8B emit_lit_memmove_match_emit_encodeSnappyBetterBlockAsm8B_memmove_move_33through64: - MOVOU (R10), X0 - MOVOU 16(R10), X1 - MOVOU -32(R10)(R9*1), X2 - MOVOU -16(R10)(R9*1), X3 + MOVOU (R9), X0 + MOVOU 16(R9), X1 + MOVOU -32(R9)(R8*1), X2 + MOVOU -16(R9)(R8*1), X3 MOVOU X0, (AX) MOVOU X1, 16(AX) - MOVOU X2, -32(AX)(R9*1) - MOVOU X3, -16(AX)(R9*1) + MOVOU X2, -32(AX)(R8*1) + MOVOU X3, -16(AX)(R8*1) memmove_end_copy_match_emit_encodeSnappyBetterBlockAsm8B: - MOVQ SI, AX + MOVQ BX, AX JMP emit_literal_done_match_emit_encodeSnappyBetterBlockAsm8B memmove_long_match_emit_encodeSnappyBetterBlockAsm8B: - LEAQ (AX)(R9*1), SI + LEAQ (AX)(R8*1), BX // genMemMoveLong - MOVOU (R10), X0 - MOVOU 16(R10), X1 - MOVOU -32(R10)(R9*1), X2 - MOVOU -16(R10)(R9*1), X3 - MOVQ R9, R13 - SHRQ $0x05, R13 - MOVQ AX, R11 - ANDL $0x0000001f, R11 - MOVQ $0x00000040, R14 - SUBQ R11, R14 - DECQ R13 + MOVOU (R9), X0 + MOVOU 16(R9), X1 + MOVOU -32(R9)(R8*1), X2 + MOVOU -16(R9)(R8*1), X3 + MOVQ R8, R12 + SHRQ $0x05, R12 + MOVQ AX, R10 + ANDL $0x0000001f, R10 + MOVQ $0x00000040, R13 + SUBQ R10, R13 + DECQ R12 JA emit_lit_memmove_long_match_emit_encodeSnappyBetterBlockAsm8Blarge_forward_sse_loop_32 - LEAQ -32(R10)(R14*1), R11 - LEAQ -32(AX)(R14*1), R15 + LEAQ -32(R9)(R13*1), R10 + LEAQ -32(AX)(R13*1), R14 emit_lit_memmove_long_match_emit_encodeSnappyBetterBlockAsm8Blarge_big_loop_back: - MOVOU (R11), X4 - MOVOU 16(R11), X5 - MOVOA X4, (R15) - MOVOA X5, 16(R15) - ADDQ $0x20, R15 - ADDQ $0x20, R11 + MOVOU (R10), X4 + MOVOU 16(R10), X5 + MOVOA X4, (R14) + MOVOA X5, 16(R14) ADDQ $0x20, R14 - DECQ R13 + ADDQ $0x20, R10 + ADDQ $0x20, R13 + DECQ R12 JNA emit_lit_memmove_long_match_emit_encodeSnappyBetterBlockAsm8Blarge_big_loop_back emit_lit_memmove_long_match_emit_encodeSnappyBetterBlockAsm8Blarge_forward_sse_loop_32: - MOVOU -32(R10)(R14*1), X4 - MOVOU -16(R10)(R14*1), X5 - MOVOA X4, -32(AX)(R14*1) - 
MOVOA X5, -16(AX)(R14*1) - ADDQ $0x20, R14 - CMPQ R9, R14 + MOVOU -32(R9)(R13*1), X4 + MOVOU -16(R9)(R13*1), X5 + MOVOA X4, -32(AX)(R13*1) + MOVOA X5, -16(AX)(R13*1) + ADDQ $0x20, R13 + CMPQ R8, R13 JAE emit_lit_memmove_long_match_emit_encodeSnappyBetterBlockAsm8Blarge_forward_sse_loop_32 MOVOU X0, (AX) MOVOU X1, 16(AX) - MOVOU X2, -32(AX)(R9*1) - MOVOU X3, -16(AX)(R9*1) - MOVQ SI, AX + MOVOU X2, -32(AX)(R8*1) + MOVOU X3, -16(AX)(R8*1) + MOVQ BX, AX emit_literal_done_match_emit_encodeSnappyBetterBlockAsm8B: - ADDL R12, CX - ADDL $0x04, R12 + ADDL R11, CX + ADDL $0x04, R11 MOVL CX, 12(SP) // emitCopy two_byte_offset_match_nolit_encodeSnappyBetterBlockAsm8B: - CMPL R12, $0x40 + CMPL R11, $0x40 JLE two_byte_offset_short_match_nolit_encodeSnappyBetterBlockAsm8B MOVB $0xee, (AX) - MOVW R8, 1(AX) - LEAL -60(R12), R12 + MOVW DI, 1(AX) + LEAL -60(R11), R11 ADDQ $0x03, AX JMP two_byte_offset_match_nolit_encodeSnappyBetterBlockAsm8B two_byte_offset_short_match_nolit_encodeSnappyBetterBlockAsm8B: - CMPL R12, $0x0c + MOVL R11, BX + SHLL $0x02, BX + CMPL R11, $0x0c JGE emit_copy_three_match_nolit_encodeSnappyBetterBlockAsm8B - MOVB $0x01, BL - LEAL -16(BX)(R12*4), R12 - MOVB R8, 1(AX) - SHRL $0x08, R8 - SHLL $0x05, R8 - ORL R8, R12 - MOVB R12, (AX) + LEAL -15(BX), BX + MOVB DI, 1(AX) + SHRL $0x08, DI + SHLL $0x05, DI + ORL DI, BX + MOVB BL, (AX) ADDQ $0x02, AX JMP match_nolit_emitcopy_end_encodeSnappyBetterBlockAsm8B emit_copy_three_match_nolit_encodeSnappyBetterBlockAsm8B: - MOVB $0x02, BL - LEAL -4(BX)(R12*4), R12 - MOVB R12, (AX) - MOVW R8, 1(AX) + LEAL -2(BX), BX + MOVB BL, (AX) + MOVW DI, 1(AX) ADDQ $0x03, AX match_nolit_emitcopy_end_encodeSnappyBetterBlockAsm8B: @@ -16948,50 +16895,50 @@ match_nolit_emitcopy_end_encodeSnappyBetterBlockAsm8B: RET match_nolit_dst_ok_encodeSnappyBetterBlockAsm8B: - MOVQ $0x0000cf1bbcdcbf9b, SI - MOVQ $0x9e3779b1, R8 - LEAQ 1(DI), DI - LEAQ -2(CX), R9 - MOVQ (DX)(DI*1), R10 - MOVQ 1(DX)(DI*1), R11 - MOVQ (DX)(R9*1), R12 - MOVQ 1(DX)(R9*1), R13 - SHLQ $0x10, R10 - IMULQ SI, R10 - SHRQ $0x36, R10 - SHLQ $0x20, R11 - IMULQ R8, R11 - SHRQ $0x38, R11 - SHLQ $0x10, R12 - IMULQ SI, R12 - SHRQ $0x36, R12 - SHLQ $0x20, R13 - IMULQ R8, R13 - SHRQ $0x38, R13 - LEAQ 1(DI), R8 - LEAQ 1(R9), R14 - MOVL DI, 24(SP)(R10*4) - MOVL R9, 24(SP)(R12*4) - MOVL R8, 4120(SP)(R11*4) - MOVL R14, 4120(SP)(R13*4) - ADDQ $0x01, DI - SUBQ $0x01, R9 + MOVQ $0x0000cf1bbcdcbf9b, BX + MOVQ $0x9e3779b1, DI + LEAQ 1(SI), SI + LEAQ -2(CX), R8 + MOVQ (DX)(SI*1), R9 + MOVQ 1(DX)(SI*1), R10 + MOVQ (DX)(R8*1), R11 + MOVQ 1(DX)(R8*1), R12 + SHLQ $0x10, R9 + IMULQ BX, R9 + SHRQ $0x36, R9 + SHLQ $0x20, R10 + IMULQ DI, R10 + SHRQ $0x38, R10 + SHLQ $0x10, R11 + IMULQ BX, R11 + SHRQ $0x36, R11 + SHLQ $0x20, R12 + IMULQ DI, R12 + SHRQ $0x38, R12 + LEAQ 1(SI), DI + LEAQ 1(R8), R13 + MOVL SI, 24(SP)(R9*4) + MOVL R8, 24(SP)(R11*4) + MOVL DI, 4120(SP)(R10*4) + MOVL R13, 4120(SP)(R12*4) + ADDQ $0x01, SI + SUBQ $0x01, R8 index_loop_encodeSnappyBetterBlockAsm8B: - CMPQ DI, R9 + CMPQ SI, R8 JAE search_loop_encodeSnappyBetterBlockAsm8B - MOVQ (DX)(DI*1), R8 - MOVQ (DX)(R9*1), R10 - SHLQ $0x10, R8 - IMULQ SI, R8 - SHRQ $0x36, R8 - SHLQ $0x10, R10 - IMULQ SI, R10 - SHRQ $0x36, R10 - MOVL DI, 24(SP)(R8*4) - MOVL R9, 24(SP)(R10*4) - ADDQ $0x02, DI - SUBQ $0x02, R9 + MOVQ (DX)(SI*1), DI + MOVQ (DX)(R8*1), R9 + SHLQ $0x10, DI + IMULQ BX, DI + SHRQ $0x36, DI + SHLQ $0x10, R9 + IMULQ BX, R9 + SHRQ $0x36, R9 + MOVL SI, 24(SP)(DI*4) + MOVL R8, 24(SP)(R9*4) + ADDQ $0x02, SI + SUBQ $0x02, R8 JMP index_loop_encodeSnappyBetterBlockAsm8B 
emit_remainder_encodeSnappyBetterBlockAsm8B: @@ -17343,8 +17290,7 @@ cant_repeat_two_offset_standalone: CMPL DX, $0x0100ffff JLT repeat_five_standalone LEAL -16842747(DX), DX - MOVW $0x001d, (AX) - MOVW $0xfffb, 2(AX) + MOVL $0xfffb001d, (AX) MOVB $0xff, 4(AX) ADDQ $0x05, AX ADDQ $0x05, BX @@ -17410,8 +17356,6 @@ TEXT ·emitCopy(SB), NOSPLIT, $0-48 // emitCopy CMPL CX, $0x00010000 JL two_byte_offset_standalone - -four_bytes_loop_back_standalone: CMPL DX, $0x40 JLE four_bytes_remain_standalone MOVB $0xff, (AX) @@ -17441,8 +17385,7 @@ cant_repeat_two_offset_standalone_emit_copy: CMPL DX, $0x0100ffff JLT repeat_five_standalone_emit_copy LEAL -16842747(DX), DX - MOVW $0x001d, (AX) - MOVW $0xfffb, 2(AX) + MOVL $0xfffb001d, (AX) MOVB $0xff, 4(AX) ADDQ $0x05, AX ADDQ $0x05, BX @@ -17494,13 +17437,12 @@ repeat_two_offset_standalone_emit_copy: ADDQ $0x02, BX ADDQ $0x02, AX JMP gen_emit_copy_end - JMP four_bytes_loop_back_standalone four_bytes_remain_standalone: TESTL DX, DX JZ gen_emit_copy_end - MOVB $0x03, SI - LEAL -4(SI)(DX*4), DX + XORL SI, SI + LEAL -1(SI)(DX*4), DX MOVB DL, (AX) MOVL CX, 1(AX) ADDQ $0x05, BX @@ -17546,8 +17488,7 @@ cant_repeat_two_offset_standalone_emit_copy_short_2b: CMPL DX, $0x0100ffff JLT repeat_five_standalone_emit_copy_short_2b LEAL -16842747(DX), DX - MOVW $0x001d, (AX) - MOVW $0xfffb, 2(AX) + MOVL $0xfffb001d, (AX) MOVB $0xff, 4(AX) ADDQ $0x05, AX ADDQ $0x05, BX @@ -17626,8 +17567,7 @@ cant_repeat_two_offset_standalone_emit_copy_short: CMPL DX, $0x0100ffff JLT repeat_five_standalone_emit_copy_short LEAL -16842747(DX), DX - MOVW $0x001d, (AX) - MOVW $0xfffb, 2(AX) + MOVL $0xfffb001d, (AX) MOVB $0xff, 4(AX) ADDQ $0x05, AX ADDQ $0x05, BX @@ -17679,28 +17619,27 @@ repeat_two_offset_standalone_emit_copy_short: ADDQ $0x02, BX ADDQ $0x02, AX JMP gen_emit_copy_end - JMP two_byte_offset_standalone two_byte_offset_short_standalone: + MOVL DX, SI + SHLL $0x02, SI CMPL DX, $0x0c JGE emit_copy_three_standalone CMPL CX, $0x00000800 JGE emit_copy_three_standalone - MOVB $0x01, SI - LEAL -16(SI)(DX*4), DX + LEAL -15(SI), SI MOVB CL, 1(AX) SHRL $0x08, CX SHLL $0x05, CX - ORL CX, DX - MOVB DL, (AX) + ORL CX, SI + MOVB SI, (AX) ADDQ $0x02, BX ADDQ $0x02, AX JMP gen_emit_copy_end emit_copy_three_standalone: - MOVB $0x02, SI - LEAL -4(SI)(DX*4), DX - MOVB DL, (AX) + LEAL -2(SI), SI + MOVB SI, (AX) MOVW CX, 1(AX) ADDQ $0x03, BX ADDQ $0x03, AX @@ -17735,8 +17674,8 @@ four_bytes_loop_back_standalone_snappy: four_bytes_remain_standalone_snappy: TESTL DX, DX JZ gen_emit_copy_end_snappy - MOVB $0x03, SI - LEAL -4(SI)(DX*4), DX + XORL SI, SI + LEAL -1(SI)(DX*4), DX MOVB DL, (AX) MOVL CX, 1(AX) ADDQ $0x05, BX @@ -17754,25 +17693,25 @@ two_byte_offset_standalone_snappy: JMP two_byte_offset_standalone_snappy two_byte_offset_short_standalone_snappy: + MOVL DX, SI + SHLL $0x02, SI CMPL DX, $0x0c JGE emit_copy_three_standalone_snappy CMPL CX, $0x00000800 JGE emit_copy_three_standalone_snappy - MOVB $0x01, SI - LEAL -16(SI)(DX*4), DX + LEAL -15(SI), SI MOVB CL, 1(AX) SHRL $0x08, CX SHLL $0x05, CX - ORL CX, DX - MOVB DL, (AX) + ORL CX, SI + MOVB SI, (AX) ADDQ $0x02, BX ADDQ $0x02, AX JMP gen_emit_copy_end_snappy emit_copy_three_standalone_snappy: - MOVB $0x02, SI - LEAL -4(SI)(DX*4), DX - MOVB DL, (AX) + LEAL -2(SI), SI + MOVB SI, (AX) MOVW CX, 1(AX) ADDQ $0x03, BX ADDQ $0x03, AX @@ -17846,3 +17785,752 @@ matchlen_match1_standalone: gen_match_len_end: MOVQ SI, ret+48(FP) RET + +// func cvtLZ4BlockAsm(dst []byte, src []byte) (uncompressed int, dstUsed int) +// Requires: SSE2 +TEXT ·cvtLZ4BlockAsm(SB), 
NOSPLIT, $0-64 + XORQ SI, SI + MOVQ dst_base+0(FP), AX + MOVQ dst_len+8(FP), CX + MOVQ src_base+24(FP), DX + MOVQ src_len+32(FP), BX + LEAQ (DX)(BX*1), BX + LEAQ -10(AX)(CX*1), CX + XORQ DI, DI + +lz4_s2_loop: + CMPQ DX, BX + JAE lz4_s2_corrupt + CMPQ AX, CX + JAE lz4_s2_dstfull + MOVBQZX (DX), R8 + MOVQ R8, R9 + MOVQ R8, R10 + SHRQ $0x04, R9 + ANDQ $0x0f, R10 + CMPQ R8, $0xf0 + JB lz4_s2_ll_end + +lz4_s2_ll_loop: + INCQ DX + CMPQ DX, BX + JAE lz4_s2_corrupt + MOVBQZX (DX), R8 + ADDQ R8, R9 + CMPQ R8, $0xff + JEQ lz4_s2_ll_loop + +lz4_s2_ll_end: + LEAQ (DX)(R9*1), R8 + ADDQ $0x04, R10 + CMPQ R8, BX + JAE lz4_s2_corrupt + INCQ DX + INCQ R8 + TESTQ R9, R9 + JZ lz4_s2_lits_done + LEAQ (AX)(R9*1), R11 + CMPQ R11, CX + JAE lz4_s2_dstfull + ADDQ R9, SI + LEAL -1(R9), R11 + CMPL R11, $0x3c + JLT one_byte_lz4_s2 + CMPL R11, $0x00000100 + JLT two_bytes_lz4_s2 + CMPL R11, $0x00010000 + JLT three_bytes_lz4_s2 + CMPL R11, $0x01000000 + JLT four_bytes_lz4_s2 + MOVB $0xfc, (AX) + MOVL R11, 1(AX) + ADDQ $0x05, AX + JMP memmove_long_lz4_s2 + +four_bytes_lz4_s2: + MOVL R11, R12 + SHRL $0x10, R12 + MOVB $0xf8, (AX) + MOVW R11, 1(AX) + MOVB R12, 3(AX) + ADDQ $0x04, AX + JMP memmove_long_lz4_s2 + +three_bytes_lz4_s2: + MOVB $0xf4, (AX) + MOVW R11, 1(AX) + ADDQ $0x03, AX + JMP memmove_long_lz4_s2 + +two_bytes_lz4_s2: + MOVB $0xf0, (AX) + MOVB R11, 1(AX) + ADDQ $0x02, AX + CMPL R11, $0x40 + JL memmove_lz4_s2 + JMP memmove_long_lz4_s2 + +one_byte_lz4_s2: + SHLB $0x02, R11 + MOVB R11, (AX) + ADDQ $0x01, AX + +memmove_lz4_s2: + LEAQ (AX)(R9*1), R11 + + // genMemMoveShort + CMPQ R9, $0x08 + JLE emit_lit_memmove_lz4_s2_memmove_move_8 + CMPQ R9, $0x10 + JBE emit_lit_memmove_lz4_s2_memmove_move_8through16 + CMPQ R9, $0x20 + JBE emit_lit_memmove_lz4_s2_memmove_move_17through32 + JMP emit_lit_memmove_lz4_s2_memmove_move_33through64 + +emit_lit_memmove_lz4_s2_memmove_move_8: + MOVQ (DX), R12 + MOVQ R12, (AX) + JMP memmove_end_copy_lz4_s2 + +emit_lit_memmove_lz4_s2_memmove_move_8through16: + MOVQ (DX), R12 + MOVQ -8(DX)(R9*1), DX + MOVQ R12, (AX) + MOVQ DX, -8(AX)(R9*1) + JMP memmove_end_copy_lz4_s2 + +emit_lit_memmove_lz4_s2_memmove_move_17through32: + MOVOU (DX), X0 + MOVOU -16(DX)(R9*1), X1 + MOVOU X0, (AX) + MOVOU X1, -16(AX)(R9*1) + JMP memmove_end_copy_lz4_s2 + +emit_lit_memmove_lz4_s2_memmove_move_33through64: + MOVOU (DX), X0 + MOVOU 16(DX), X1 + MOVOU -32(DX)(R9*1), X2 + MOVOU -16(DX)(R9*1), X3 + MOVOU X0, (AX) + MOVOU X1, 16(AX) + MOVOU X2, -32(AX)(R9*1) + MOVOU X3, -16(AX)(R9*1) + +memmove_end_copy_lz4_s2: + MOVQ R11, AX + JMP lz4_s2_lits_emit_done + +memmove_long_lz4_s2: + LEAQ (AX)(R9*1), R11 + + // genMemMoveLong + MOVOU (DX), X0 + MOVOU 16(DX), X1 + MOVOU -32(DX)(R9*1), X2 + MOVOU -16(DX)(R9*1), X3 + MOVQ R9, R13 + SHRQ $0x05, R13 + MOVQ AX, R12 + ANDL $0x0000001f, R12 + MOVQ $0x00000040, R14 + SUBQ R12, R14 + DECQ R13 + JA emit_lit_memmove_long_lz4_s2large_forward_sse_loop_32 + LEAQ -32(DX)(R14*1), R12 + LEAQ -32(AX)(R14*1), R15 + +emit_lit_memmove_long_lz4_s2large_big_loop_back: + MOVOU (R12), X4 + MOVOU 16(R12), X5 + MOVOA X4, (R15) + MOVOA X5, 16(R15) + ADDQ $0x20, R15 + ADDQ $0x20, R12 + ADDQ $0x20, R14 + DECQ R13 + JNA emit_lit_memmove_long_lz4_s2large_big_loop_back + +emit_lit_memmove_long_lz4_s2large_forward_sse_loop_32: + MOVOU -32(DX)(R14*1), X4 + MOVOU -16(DX)(R14*1), X5 + MOVOA X4, -32(AX)(R14*1) + MOVOA X5, -16(AX)(R14*1) + ADDQ $0x20, R14 + CMPQ R9, R14 + JAE emit_lit_memmove_long_lz4_s2large_forward_sse_loop_32 + MOVOU X0, (AX) + MOVOU X1, 16(AX) + MOVOU X2, -32(AX)(R9*1) + MOVOU X3, 
-16(AX)(R9*1) + MOVQ R11, AX + +lz4_s2_lits_emit_done: + MOVQ R8, DX + +lz4_s2_lits_done: + CMPQ DX, BX + JNE lz4_s2_match + CMPQ R10, $0x04 + JEQ lz4_s2_done + JMP lz4_s2_corrupt + +lz4_s2_match: + LEAQ 2(DX), R8 + CMPQ R8, BX + JAE lz4_s2_corrupt + MOVWQZX (DX), R9 + MOVQ R8, DX + TESTQ R9, R9 + JZ lz4_s2_corrupt + CMPQ R9, SI + JA lz4_s2_corrupt + CMPQ R10, $0x13 + JNE lz4_s2_ml_done + +lz4_s2_ml_loop: + MOVBQZX (DX), R8 + INCQ DX + ADDQ R8, R10 + CMPQ DX, BX + JAE lz4_s2_corrupt + CMPQ R8, $0xff + JEQ lz4_s2_ml_loop + +lz4_s2_ml_done: + ADDQ R10, SI + CMPQ R9, DI + JNE lz4_s2_docopy + + // emitRepeat +emit_repeat_again_lz4_s2: + MOVL R10, R8 + LEAL -4(R10), R10 + CMPL R8, $0x08 + JLE repeat_two_lz4_s2 + CMPL R8, $0x0c + JGE cant_repeat_two_offset_lz4_s2 + CMPL R9, $0x00000800 + JLT repeat_two_offset_lz4_s2 + +cant_repeat_two_offset_lz4_s2: + CMPL R10, $0x00000104 + JLT repeat_three_lz4_s2 + CMPL R10, $0x00010100 + JLT repeat_four_lz4_s2 + CMPL R10, $0x0100ffff + JLT repeat_five_lz4_s2 + LEAL -16842747(R10), R10 + MOVL $0xfffb001d, (AX) + MOVB $0xff, 4(AX) + ADDQ $0x05, AX + JMP emit_repeat_again_lz4_s2 + +repeat_five_lz4_s2: + LEAL -65536(R10), R10 + MOVL R10, R9 + MOVW $0x001d, (AX) + MOVW R10, 2(AX) + SARL $0x10, R9 + MOVB R9, 4(AX) + ADDQ $0x05, AX + JMP lz4_s2_loop + +repeat_four_lz4_s2: + LEAL -256(R10), R10 + MOVW $0x0019, (AX) + MOVW R10, 2(AX) + ADDQ $0x04, AX + JMP lz4_s2_loop + +repeat_three_lz4_s2: + LEAL -4(R10), R10 + MOVW $0x0015, (AX) + MOVB R10, 2(AX) + ADDQ $0x03, AX + JMP lz4_s2_loop + +repeat_two_lz4_s2: + SHLL $0x02, R10 + ORL $0x01, R10 + MOVW R10, (AX) + ADDQ $0x02, AX + JMP lz4_s2_loop + +repeat_two_offset_lz4_s2: + XORQ R8, R8 + LEAL 1(R8)(R10*4), R10 + MOVB R9, 1(AX) + SARL $0x08, R9 + SHLL $0x05, R9 + ORL R9, R10 + MOVB R10, (AX) + ADDQ $0x02, AX + JMP lz4_s2_loop + +lz4_s2_docopy: + MOVQ R9, DI + + // emitCopy + CMPL R10, $0x40 + JLE two_byte_offset_short_lz4_s2 + CMPL R9, $0x00000800 + JAE long_offset_short_lz4_s2 + MOVL $0x00000001, R8 + LEAL 16(R8), R8 + MOVB R9, 1(AX) + MOVL R9, R11 + SHRL $0x08, R11 + SHLL $0x05, R11 + ORL R11, R8 + MOVB R8, (AX) + ADDQ $0x02, AX + SUBL $0x08, R10 + + // emitRepeat + LEAL -4(R10), R10 + JMP cant_repeat_two_offset_lz4_s2_emit_copy_short_2b + +emit_repeat_again_lz4_s2_emit_copy_short_2b: + MOVL R10, R8 + LEAL -4(R10), R10 + CMPL R8, $0x08 + JLE repeat_two_lz4_s2_emit_copy_short_2b + CMPL R8, $0x0c + JGE cant_repeat_two_offset_lz4_s2_emit_copy_short_2b + CMPL R9, $0x00000800 + JLT repeat_two_offset_lz4_s2_emit_copy_short_2b + +cant_repeat_two_offset_lz4_s2_emit_copy_short_2b: + CMPL R10, $0x00000104 + JLT repeat_three_lz4_s2_emit_copy_short_2b + CMPL R10, $0x00010100 + JLT repeat_four_lz4_s2_emit_copy_short_2b + CMPL R10, $0x0100ffff + JLT repeat_five_lz4_s2_emit_copy_short_2b + LEAL -16842747(R10), R10 + MOVL $0xfffb001d, (AX) + MOVB $0xff, 4(AX) + ADDQ $0x05, AX + JMP emit_repeat_again_lz4_s2_emit_copy_short_2b + +repeat_five_lz4_s2_emit_copy_short_2b: + LEAL -65536(R10), R10 + MOVL R10, R9 + MOVW $0x001d, (AX) + MOVW R10, 2(AX) + SARL $0x10, R9 + MOVB R9, 4(AX) + ADDQ $0x05, AX + JMP lz4_s2_loop + +repeat_four_lz4_s2_emit_copy_short_2b: + LEAL -256(R10), R10 + MOVW $0x0019, (AX) + MOVW R10, 2(AX) + ADDQ $0x04, AX + JMP lz4_s2_loop + +repeat_three_lz4_s2_emit_copy_short_2b: + LEAL -4(R10), R10 + MOVW $0x0015, (AX) + MOVB R10, 2(AX) + ADDQ $0x03, AX + JMP lz4_s2_loop + +repeat_two_lz4_s2_emit_copy_short_2b: + SHLL $0x02, R10 + ORL $0x01, R10 + MOVW R10, (AX) + ADDQ $0x02, AX + JMP lz4_s2_loop + 
+repeat_two_offset_lz4_s2_emit_copy_short_2b: + XORQ R8, R8 + LEAL 1(R8)(R10*4), R10 + MOVB R9, 1(AX) + SARL $0x08, R9 + SHLL $0x05, R9 + ORL R9, R10 + MOVB R10, (AX) + ADDQ $0x02, AX + JMP lz4_s2_loop + +long_offset_short_lz4_s2: + MOVB $0xee, (AX) + MOVW R9, 1(AX) + LEAL -60(R10), R10 + ADDQ $0x03, AX + + // emitRepeat +emit_repeat_again_lz4_s2_emit_copy_short: + MOVL R10, R8 + LEAL -4(R10), R10 + CMPL R8, $0x08 + JLE repeat_two_lz4_s2_emit_copy_short + CMPL R8, $0x0c + JGE cant_repeat_two_offset_lz4_s2_emit_copy_short + CMPL R9, $0x00000800 + JLT repeat_two_offset_lz4_s2_emit_copy_short + +cant_repeat_two_offset_lz4_s2_emit_copy_short: + CMPL R10, $0x00000104 + JLT repeat_three_lz4_s2_emit_copy_short + CMPL R10, $0x00010100 + JLT repeat_four_lz4_s2_emit_copy_short + CMPL R10, $0x0100ffff + JLT repeat_five_lz4_s2_emit_copy_short + LEAL -16842747(R10), R10 + MOVL $0xfffb001d, (AX) + MOVB $0xff, 4(AX) + ADDQ $0x05, AX + JMP emit_repeat_again_lz4_s2_emit_copy_short + +repeat_five_lz4_s2_emit_copy_short: + LEAL -65536(R10), R10 + MOVL R10, R9 + MOVW $0x001d, (AX) + MOVW R10, 2(AX) + SARL $0x10, R9 + MOVB R9, 4(AX) + ADDQ $0x05, AX + JMP lz4_s2_loop + +repeat_four_lz4_s2_emit_copy_short: + LEAL -256(R10), R10 + MOVW $0x0019, (AX) + MOVW R10, 2(AX) + ADDQ $0x04, AX + JMP lz4_s2_loop + +repeat_three_lz4_s2_emit_copy_short: + LEAL -4(R10), R10 + MOVW $0x0015, (AX) + MOVB R10, 2(AX) + ADDQ $0x03, AX + JMP lz4_s2_loop + +repeat_two_lz4_s2_emit_copy_short: + SHLL $0x02, R10 + ORL $0x01, R10 + MOVW R10, (AX) + ADDQ $0x02, AX + JMP lz4_s2_loop + +repeat_two_offset_lz4_s2_emit_copy_short: + XORQ R8, R8 + LEAL 1(R8)(R10*4), R10 + MOVB R9, 1(AX) + SARL $0x08, R9 + SHLL $0x05, R9 + ORL R9, R10 + MOVB R10, (AX) + ADDQ $0x02, AX + JMP lz4_s2_loop + +two_byte_offset_short_lz4_s2: + MOVL R10, R8 + SHLL $0x02, R8 + CMPL R10, $0x0c + JGE emit_copy_three_lz4_s2 + CMPL R9, $0x00000800 + JGE emit_copy_three_lz4_s2 + LEAL -15(R8), R8 + MOVB R9, 1(AX) + SHRL $0x08, R9 + SHLL $0x05, R9 + ORL R9, R8 + MOVB R8, (AX) + ADDQ $0x02, AX + JMP lz4_s2_loop + +emit_copy_three_lz4_s2: + LEAL -2(R8), R8 + MOVB R8, (AX) + MOVW R9, 1(AX) + ADDQ $0x03, AX + JMP lz4_s2_loop + +lz4_s2_done: + MOVQ dst_base+0(FP), CX + SUBQ CX, AX + MOVQ SI, uncompressed+48(FP) + MOVQ AX, dstUsed+56(FP) + RET + +lz4_s2_corrupt: + XORQ AX, AX + LEAQ -1(AX), SI + MOVQ SI, uncompressed+48(FP) + RET + +lz4_s2_dstfull: + XORQ AX, AX + LEAQ -2(AX), SI + MOVQ SI, uncompressed+48(FP) + RET + +// func cvtLZ4BlockSnappyAsm(dst []byte, src []byte) (uncompressed int, dstUsed int) +// Requires: SSE2 +TEXT ·cvtLZ4BlockSnappyAsm(SB), NOSPLIT, $0-64 + XORQ SI, SI + MOVQ dst_base+0(FP), AX + MOVQ dst_len+8(FP), CX + MOVQ src_base+24(FP), DX + MOVQ src_len+32(FP), BX + LEAQ (DX)(BX*1), BX + LEAQ -10(AX)(CX*1), CX + +lz4_snappy_loop: + CMPQ DX, BX + JAE lz4_snappy_corrupt + CMPQ AX, CX + JAE lz4_snappy_dstfull + MOVBQZX (DX), DI + MOVQ DI, R8 + MOVQ DI, R9 + SHRQ $0x04, R8 + ANDQ $0x0f, R9 + CMPQ DI, $0xf0 + JB lz4_snappy_ll_end + +lz4_snappy_ll_loop: + INCQ DX + CMPQ DX, BX + JAE lz4_snappy_corrupt + MOVBQZX (DX), DI + ADDQ DI, R8 + CMPQ DI, $0xff + JEQ lz4_snappy_ll_loop + +lz4_snappy_ll_end: + LEAQ (DX)(R8*1), DI + ADDQ $0x04, R9 + CMPQ DI, BX + JAE lz4_snappy_corrupt + INCQ DX + INCQ DI + TESTQ R8, R8 + JZ lz4_snappy_lits_done + LEAQ (AX)(R8*1), R10 + CMPQ R10, CX + JAE lz4_snappy_dstfull + ADDQ R8, SI + LEAL -1(R8), R10 + CMPL R10, $0x3c + JLT one_byte_lz4_snappy + CMPL R10, $0x00000100 + JLT two_bytes_lz4_snappy + CMPL R10, $0x00010000 + JLT 
three_bytes_lz4_snappy + CMPL R10, $0x01000000 + JLT four_bytes_lz4_snappy + MOVB $0xfc, (AX) + MOVL R10, 1(AX) + ADDQ $0x05, AX + JMP memmove_long_lz4_snappy + +four_bytes_lz4_snappy: + MOVL R10, R11 + SHRL $0x10, R11 + MOVB $0xf8, (AX) + MOVW R10, 1(AX) + MOVB R11, 3(AX) + ADDQ $0x04, AX + JMP memmove_long_lz4_snappy + +three_bytes_lz4_snappy: + MOVB $0xf4, (AX) + MOVW R10, 1(AX) + ADDQ $0x03, AX + JMP memmove_long_lz4_snappy + +two_bytes_lz4_snappy: + MOVB $0xf0, (AX) + MOVB R10, 1(AX) + ADDQ $0x02, AX + CMPL R10, $0x40 + JL memmove_lz4_snappy + JMP memmove_long_lz4_snappy + +one_byte_lz4_snappy: + SHLB $0x02, R10 + MOVB R10, (AX) + ADDQ $0x01, AX + +memmove_lz4_snappy: + LEAQ (AX)(R8*1), R10 + + // genMemMoveShort + CMPQ R8, $0x08 + JLE emit_lit_memmove_lz4_snappy_memmove_move_8 + CMPQ R8, $0x10 + JBE emit_lit_memmove_lz4_snappy_memmove_move_8through16 + CMPQ R8, $0x20 + JBE emit_lit_memmove_lz4_snappy_memmove_move_17through32 + JMP emit_lit_memmove_lz4_snappy_memmove_move_33through64 + +emit_lit_memmove_lz4_snappy_memmove_move_8: + MOVQ (DX), R11 + MOVQ R11, (AX) + JMP memmove_end_copy_lz4_snappy + +emit_lit_memmove_lz4_snappy_memmove_move_8through16: + MOVQ (DX), R11 + MOVQ -8(DX)(R8*1), DX + MOVQ R11, (AX) + MOVQ DX, -8(AX)(R8*1) + JMP memmove_end_copy_lz4_snappy + +emit_lit_memmove_lz4_snappy_memmove_move_17through32: + MOVOU (DX), X0 + MOVOU -16(DX)(R8*1), X1 + MOVOU X0, (AX) + MOVOU X1, -16(AX)(R8*1) + JMP memmove_end_copy_lz4_snappy + +emit_lit_memmove_lz4_snappy_memmove_move_33through64: + MOVOU (DX), X0 + MOVOU 16(DX), X1 + MOVOU -32(DX)(R8*1), X2 + MOVOU -16(DX)(R8*1), X3 + MOVOU X0, (AX) + MOVOU X1, 16(AX) + MOVOU X2, -32(AX)(R8*1) + MOVOU X3, -16(AX)(R8*1) + +memmove_end_copy_lz4_snappy: + MOVQ R10, AX + JMP lz4_snappy_lits_emit_done + +memmove_long_lz4_snappy: + LEAQ (AX)(R8*1), R10 + + // genMemMoveLong + MOVOU (DX), X0 + MOVOU 16(DX), X1 + MOVOU -32(DX)(R8*1), X2 + MOVOU -16(DX)(R8*1), X3 + MOVQ R8, R12 + SHRQ $0x05, R12 + MOVQ AX, R11 + ANDL $0x0000001f, R11 + MOVQ $0x00000040, R13 + SUBQ R11, R13 + DECQ R12 + JA emit_lit_memmove_long_lz4_snappylarge_forward_sse_loop_32 + LEAQ -32(DX)(R13*1), R11 + LEAQ -32(AX)(R13*1), R14 + +emit_lit_memmove_long_lz4_snappylarge_big_loop_back: + MOVOU (R11), X4 + MOVOU 16(R11), X5 + MOVOA X4, (R14) + MOVOA X5, 16(R14) + ADDQ $0x20, R14 + ADDQ $0x20, R11 + ADDQ $0x20, R13 + DECQ R12 + JNA emit_lit_memmove_long_lz4_snappylarge_big_loop_back + +emit_lit_memmove_long_lz4_snappylarge_forward_sse_loop_32: + MOVOU -32(DX)(R13*1), X4 + MOVOU -16(DX)(R13*1), X5 + MOVOA X4, -32(AX)(R13*1) + MOVOA X5, -16(AX)(R13*1) + ADDQ $0x20, R13 + CMPQ R8, R13 + JAE emit_lit_memmove_long_lz4_snappylarge_forward_sse_loop_32 + MOVOU X0, (AX) + MOVOU X1, 16(AX) + MOVOU X2, -32(AX)(R8*1) + MOVOU X3, -16(AX)(R8*1) + MOVQ R10, AX + +lz4_snappy_lits_emit_done: + MOVQ DI, DX + +lz4_snappy_lits_done: + CMPQ DX, BX + JNE lz4_snappy_match + CMPQ R9, $0x04 + JEQ lz4_snappy_done + JMP lz4_snappy_corrupt + +lz4_snappy_match: + LEAQ 2(DX), DI + CMPQ DI, BX + JAE lz4_snappy_corrupt + MOVWQZX (DX), R8 + MOVQ DI, DX + TESTQ R8, R8 + JZ lz4_snappy_corrupt + CMPQ R8, SI + JA lz4_snappy_corrupt + CMPQ R9, $0x13 + JNE lz4_snappy_ml_done + +lz4_snappy_ml_loop: + MOVBQZX (DX), DI + INCQ DX + ADDQ DI, R9 + CMPQ DX, BX + JAE lz4_snappy_corrupt + CMPQ DI, $0xff + JEQ lz4_snappy_ml_loop + +lz4_snappy_ml_done: + ADDQ R9, SI + + // emitCopy +two_byte_offset_lz4_s2: + CMPL R9, $0x40 + JLE two_byte_offset_short_lz4_s2 + MOVB $0xee, (AX) + MOVW R8, 1(AX) + LEAL -60(R9), R9 + ADDQ $0x03, AX 
+ CMPQ AX, CX + JAE lz4_snappy_loop + JMP two_byte_offset_lz4_s2 + +two_byte_offset_short_lz4_s2: + MOVL R9, DI + SHLL $0x02, DI + CMPL R9, $0x0c + JGE emit_copy_three_lz4_s2 + CMPL R8, $0x00000800 + JGE emit_copy_three_lz4_s2 + LEAL -15(DI), DI + MOVB R8, 1(AX) + SHRL $0x08, R8 + SHLL $0x05, R8 + ORL R8, DI + MOVB DI, (AX) + ADDQ $0x02, AX + JMP lz4_snappy_loop + +emit_copy_three_lz4_s2: + LEAL -2(DI), DI + MOVB DI, (AX) + MOVW R8, 1(AX) + ADDQ $0x03, AX + JMP lz4_snappy_loop + +lz4_snappy_done: + MOVQ dst_base+0(FP), CX + SUBQ CX, AX + MOVQ SI, uncompressed+48(FP) + MOVQ AX, dstUsed+56(FP) + RET + +lz4_snappy_corrupt: + XORQ AX, AX + LEAQ -1(AX), SI + MOVQ SI, uncompressed+48(FP) + RET + +lz4_snappy_dstfull: + XORQ AX, AX + LEAQ -2(AX), SI + MOVQ SI, uncompressed+48(FP) + RET diff --git a/s2/lz4convert.go b/s2/lz4convert.go new file mode 100644 index 0000000000..46ed908e3c --- /dev/null +++ b/s2/lz4convert.go @@ -0,0 +1,585 @@ +// Copyright (c) 2022 Klaus Post. All rights reserved. +// Use of this source code is governed by a BSD-style +// license that can be found in the LICENSE file. + +package s2 + +import ( + "encoding/binary" + "errors" + "fmt" +) + +// LZ4Converter provides conversion from LZ4 blocks as defined here: +// https://github.com/lz4/lz4/blob/dev/doc/lz4_Block_format.md +type LZ4Converter struct { +} + +// ErrDstTooSmall is returned when provided destination is too small. +var ErrDstTooSmall = errors.New("s2: destination too small") + +// ConvertBlock will convert an LZ4 block and append it as an S2 +// block without block length to dst. +// The uncompressed size is returned as well. +// dst must have capacity to contain the entire compressed block. +func (l *LZ4Converter) ConvertBlock(dst, src []byte) ([]byte, int, error) { + if len(src) == 0 { + return dst, 0, nil + } + const debug = false + const inline = true + const lz4MinMatch = 4 + + s, d := 0, len(dst) + dst = dst[:cap(dst)] + if !debug && hasAmd64Asm { + res, sz := cvtLZ4BlockAsm(dst[d:], src) + if res < 0 { + const ( + errCorrupt = -1 + errDstTooSmall = -2 + ) + switch res { + case errCorrupt: + return nil, 0, ErrCorrupt + case errDstTooSmall: + return nil, 0, ErrDstTooSmall + default: + return nil, 0, fmt.Errorf("unexpected result: %d", res) + } + } + if d+sz > len(dst) { + return nil, 0, ErrDstTooSmall + } + return dst[:d+sz], res, nil + } + + dLimit := len(dst) - 10 + var lastOffset uint16 + var uncompressed int + if debug { + fmt.Printf("convert block start: len(src): %d, len(dst):%d \n", len(src), len(dst)) + } + + for { + if s >= len(src) { + return dst[:d], 0, ErrCorrupt + } + // Read literal info + token := src[s] + ll := int(token >> 4) + ml := int(lz4MinMatch + (token & 0xf)) + + // If upper nibble is 15, literal length is extended + if token >= 0xf0 { + for { + s++ + if s >= len(src) { + if debug { + fmt.Printf("error reading ll: s (%d) >= len(src) (%d)\n", s, len(src)) + } + return dst[:d], 0, ErrCorrupt + } + val := src[s] + ll += int(val) + if val != 255 { + break + } + } + } + // Skip past token + if s+ll >= len(src) { + if debug { + fmt.Printf("error literals: s+ll (%d+%d) >= len(src) (%d)\n", s, ll, len(src)) + } + return nil, 0, ErrCorrupt + } + s++ + if ll > 0 { + if d+ll > dLimit { + return nil, 0, ErrDstTooSmall + } + if debug { + fmt.Printf("emit %d literals\n", ll) + } + d += emitLiteralGo(dst[d:], src[s:s+ll]) + s += ll + uncompressed += ll + } + + // Check if we are done... 
+ if s == len(src) && ml == lz4MinMatch { + break + } + // 2 byte offset + if s >= len(src)-2 { + if debug { + fmt.Printf("s (%d) >= len(src)-2 (%d)", s, len(src)-2) + } + return nil, 0, ErrCorrupt + } + offset := binary.LittleEndian.Uint16(src[s:]) + s += 2 + if offset == 0 { + if debug { + fmt.Printf("error: offset 0, ml: %d, len(src)-s: %d\n", ml, len(src)-s) + } + return nil, 0, ErrCorrupt + } + if int(offset) > uncompressed { + if debug { + fmt.Printf("error: offset (%d)> uncompressed (%d)\n", offset, uncompressed) + } + return nil, 0, ErrCorrupt + } + + if ml == lz4MinMatch+15 { + for { + if s >= len(src) { + if debug { + fmt.Printf("error reading ml: s (%d) >= len(src) (%d)\n", s, len(src)) + } + return nil, 0, ErrCorrupt + } + val := src[s] + s++ + ml += int(val) + if val != 255 { + if s >= len(src) { + if debug { + fmt.Printf("error reading ml: s (%d) >= len(src) (%d)\n", s, len(src)) + } + return nil, 0, ErrCorrupt + } + break + } + } + } + if offset == lastOffset { + if debug { + fmt.Printf("emit repeat, length: %d, offset: %d\n", ml, offset) + } + if !inline { + d += emitRepeat16(dst[d:], offset, ml) + } else { + length := ml + dst := dst[d:] + for len(dst) > 5 { + // Repeat offset, make length cheaper + length -= 4 + if length <= 4 { + dst[0] = uint8(length)<<2 | tagCopy1 + dst[1] = 0 + d += 2 + break + } + if length < 8 && offset < 2048 { + // Encode WITH offset + dst[1] = uint8(offset) + dst[0] = uint8(offset>>8)<<5 | uint8(length)<<2 | tagCopy1 + d += 2 + break + } + if length < (1<<8)+4 { + length -= 4 + dst[2] = uint8(length) + dst[1] = 0 + dst[0] = 5<<2 | tagCopy1 + d += 3 + break + } + if length < (1<<16)+(1<<8) { + length -= 1 << 8 + dst[3] = uint8(length >> 8) + dst[2] = uint8(length >> 0) + dst[1] = 0 + dst[0] = 6<<2 | tagCopy1 + d += 4 + break + } + const maxRepeat = (1 << 24) - 1 + length -= 1 << 16 + left := 0 + if length > maxRepeat { + left = length - maxRepeat + 4 + length = maxRepeat - 4 + } + dst[4] = uint8(length >> 16) + dst[3] = uint8(length >> 8) + dst[2] = uint8(length >> 0) + dst[1] = 0 + dst[0] = 7<<2 | tagCopy1 + if left > 0 { + d += 5 + emitRepeat16(dst[5:], offset, left) + break + } + d += 5 + break + } + } + } else { + if debug { + fmt.Printf("emit copy, length: %d, offset: %d\n", ml, offset) + } + if !inline { + d += emitCopy16(dst[d:], offset, ml) + } else { + length := ml + dst := dst[d:] + for len(dst) > 5 { + // Offset no more than 2 bytes. + if length > 64 { + off := 3 + if offset < 2048 { + // emit 8 bytes as tagCopy1, rest as repeats. + dst[1] = uint8(offset) + dst[0] = uint8(offset>>8)<<5 | uint8(8-4)<<2 | tagCopy1 + length -= 8 + off = 2 + } else { + // Emit a length 60 copy, encoded as 3 bytes. + // Emit remaining as repeat value (minimum 4 bytes). + dst[2] = uint8(offset >> 8) + dst[1] = uint8(offset) + dst[0] = 59<<2 | tagCopy2 + length -= 60 + } + // Emit remaining as repeats, at least 4 bytes remain. + d += off + emitRepeat16(dst[off:], offset, length) + break + } + if length >= 12 || offset >= 2048 { + // Emit the remaining copy, encoded as 3 bytes. + dst[2] = uint8(offset >> 8) + dst[1] = uint8(offset) + dst[0] = uint8(length-1)<<2 | tagCopy2 + d += 3 + break + } + // Emit the remaining copy, encoded as 2 bytes. 
+ dst[1] = uint8(offset) + dst[0] = uint8(offset>>8)<<5 | uint8(length-4)<<2 | tagCopy1 + d += 2 + break + } + } + lastOffset = offset + } + uncompressed += ml + if d > dLimit { + return nil, 0, ErrDstTooSmall + } + } + + return dst[:d], uncompressed, nil +} + +// ConvertBlockSnappy will convert an LZ4 block and append it +// as a Snappy block without block length to dst. +// The uncompressed size is returned as well. +// dst must have capacity to contain the entire compressed block. +func (l *LZ4Converter) ConvertBlockSnappy(dst, src []byte) ([]byte, int, error) { + if len(src) == 0 { + return dst, 0, nil + } + const debug = false + const lz4MinMatch = 4 + + s, d := 0, len(dst) + dst = dst[:cap(dst)] + // Use assembly when possible + if !debug && hasAmd64Asm { + res, sz := cvtLZ4BlockSnappyAsm(dst[d:], src) + if res < 0 { + const ( + errCorrupt = -1 + errDstTooSmall = -2 + ) + switch res { + case errCorrupt: + return nil, 0, ErrCorrupt + case errDstTooSmall: + return nil, 0, ErrDstTooSmall + default: + return nil, 0, fmt.Errorf("unexpected result: %d", res) + } + } + if d+sz > len(dst) { + return nil, 0, ErrDstTooSmall + } + return dst[:d+sz], res, nil + } + + dLimit := len(dst) - 10 + var uncompressed int + if debug { + fmt.Printf("convert block start: len(src): %d, len(dst):%d \n", len(src), len(dst)) + } + + for { + if s >= len(src) { + return nil, 0, ErrCorrupt + } + // Read literal info + token := src[s] + ll := int(token >> 4) + ml := int(lz4MinMatch + (token & 0xf)) + + // If upper nibble is 15, literal length is extended + if token >= 0xf0 { + for { + s++ + if s >= len(src) { + if debug { + fmt.Printf("error reading ll: s (%d) >= len(src) (%d)\n", s, len(src)) + } + return nil, 0, ErrCorrupt + } + val := src[s] + ll += int(val) + if val != 255 { + break + } + } + } + // Skip past token + if s+ll >= len(src) { + if debug { + fmt.Printf("error literals: s+ll (%d+%d) >= len(src) (%d)\n", s, ll, len(src)) + } + return nil, 0, ErrCorrupt + } + s++ + if ll > 0 { + if d+ll > dLimit { + return nil, 0, ErrDstTooSmall + } + if debug { + fmt.Printf("emit %d literals\n", ll) + } + d += emitLiteralGo(dst[d:], src[s:s+ll]) + s += ll + uncompressed += ll + } + + // Check if we are done... + if s == len(src) && ml == lz4MinMatch { + break + } + // 2 byte offset + if s >= len(src)-2 { + if debug { + fmt.Printf("s (%d) >= len(src)-2 (%d)", s, len(src)-2) + } + return nil, 0, ErrCorrupt + } + offset := binary.LittleEndian.Uint16(src[s:]) + s += 2 + if offset == 0 { + if debug { + fmt.Printf("error: offset 0, ml: %d, len(src)-s: %d\n", ml, len(src)-s) + } + return nil, 0, ErrCorrupt + } + if int(offset) > uncompressed { + if debug { + fmt.Printf("error: offset (%d)> uncompressed (%d)\n", offset, uncompressed) + } + return nil, 0, ErrCorrupt + } + + if ml == lz4MinMatch+15 { + for { + if s >= len(src) { + if debug { + fmt.Printf("error reading ml: s (%d) >= len(src) (%d)\n", s, len(src)) + } + return nil, 0, ErrCorrupt + } + val := src[s] + s++ + ml += int(val) + if val != 255 { + if s >= len(src) { + if debug { + fmt.Printf("error reading ml: s (%d) >= len(src) (%d)\n", s, len(src)) + } + return nil, 0, ErrCorrupt + } + break + } + } + } + if debug { + fmt.Printf("emit copy, length: %d, offset: %d\n", ml, offset) + } + length := ml + // d += emitCopyNoRepeat(dst[d:], int(offset), ml) + for length > 0 { + if d >= dLimit { + return nil, 0, ErrDstTooSmall + } + + // Offset no more than 2 bytes. + if length > 64 { + // Emit a length 64 copy, encoded as 3 bytes. 
+ dst[d+2] = uint8(offset >> 8) + dst[d+1] = uint8(offset) + dst[d+0] = 63<<2 | tagCopy2 + length -= 64 + d += 3 + continue + } + if length >= 12 || offset >= 2048 || length < 4 { + // Emit the remaining copy, encoded as 3 bytes. + dst[d+2] = uint8(offset >> 8) + dst[d+1] = uint8(offset) + dst[d+0] = uint8(length-1)<<2 | tagCopy2 + d += 3 + break + } + // Emit the remaining copy, encoded as 2 bytes. + dst[d+1] = uint8(offset) + dst[d+0] = uint8(offset>>8)<<5 | uint8(length-4)<<2 | tagCopy1 + d += 2 + break + } + uncompressed += ml + if d > dLimit { + return nil, 0, ErrDstTooSmall + } + } + + return dst[:d], uncompressed, nil +} + +// emitRepeat writes a repeat chunk and returns the number of bytes written. +// Length must be at least 4 and < 1<<24 +func emitRepeat16(dst []byte, offset uint16, length int) int { + // Repeat offset, make length cheaper + length -= 4 + if length <= 4 { + dst[0] = uint8(length)<<2 | tagCopy1 + dst[1] = 0 + return 2 + } + if length < 8 && offset < 2048 { + // Encode WITH offset + dst[1] = uint8(offset) + dst[0] = uint8(offset>>8)<<5 | uint8(length)<<2 | tagCopy1 + return 2 + } + if length < (1<<8)+4 { + length -= 4 + dst[2] = uint8(length) + dst[1] = 0 + dst[0] = 5<<2 | tagCopy1 + return 3 + } + if length < (1<<16)+(1<<8) { + length -= 1 << 8 + dst[3] = uint8(length >> 8) + dst[2] = uint8(length >> 0) + dst[1] = 0 + dst[0] = 6<<2 | tagCopy1 + return 4 + } + const maxRepeat = (1 << 24) - 1 + length -= 1 << 16 + left := 0 + if length > maxRepeat { + left = length - maxRepeat + 4 + length = maxRepeat - 4 + } + dst[4] = uint8(length >> 16) + dst[3] = uint8(length >> 8) + dst[2] = uint8(length >> 0) + dst[1] = 0 + dst[0] = 7<<2 | tagCopy1 + if left > 0 { + return 5 + emitRepeat16(dst[5:], offset, left) + } + return 5 +} + +// emitCopy writes a copy chunk and returns the number of bytes written. +// +// It assumes that: +// +// dst is long enough to hold the encoded bytes +// 1 <= offset && offset <= math.MaxUint16 +// 4 <= length && length <= math.MaxUint32 +func emitCopy16(dst []byte, offset uint16, length int) int { + // Offset no more than 2 bytes. + if length > 64 { + off := 3 + if offset < 2048 { + // emit 8 bytes as tagCopy1, rest as repeats. + dst[1] = uint8(offset) + dst[0] = uint8(offset>>8)<<5 | uint8(8-4)<<2 | tagCopy1 + length -= 8 + off = 2 + } else { + // Emit a length 60 copy, encoded as 3 bytes. + // Emit remaining as repeat value (minimum 4 bytes). + dst[2] = uint8(offset >> 8) + dst[1] = uint8(offset) + dst[0] = 59<<2 | tagCopy2 + length -= 60 + } + // Emit remaining as repeats, at least 4 bytes remain. + return off + emitRepeat16(dst[off:], offset, length) + } + if length >= 12 || offset >= 2048 { + // Emit the remaining copy, encoded as 3 bytes. + dst[2] = uint8(offset >> 8) + dst[1] = uint8(offset) + dst[0] = uint8(length-1)<<2 | tagCopy2 + return 3 + } + // Emit the remaining copy, encoded as 2 bytes. + dst[1] = uint8(offset) + dst[0] = uint8(offset>>8)<<5 | uint8(length-4)<<2 | tagCopy1 + return 2 +} + +// emitLiteral writes a literal chunk and returns the number of bytes written. 
+// +// It assumes that: +// +// dst is long enough to hold the encoded bytes +// 0 <= len(lit) && len(lit) <= math.MaxUint32 +func emitLiteralGo(dst, lit []byte) int { + if len(lit) == 0 { + return 0 + } + i, n := 0, uint(len(lit)-1) + switch { + case n < 60: + dst[0] = uint8(n)<<2 | tagLiteral + i = 1 + case n < 1<<8: + dst[1] = uint8(n) + dst[0] = 60<<2 | tagLiteral + i = 2 + case n < 1<<16: + dst[2] = uint8(n >> 8) + dst[1] = uint8(n) + dst[0] = 61<<2 | tagLiteral + i = 3 + case n < 1<<24: + dst[3] = uint8(n >> 16) + dst[2] = uint8(n >> 8) + dst[1] = uint8(n) + dst[0] = 62<<2 | tagLiteral + i = 4 + default: + dst[4] = uint8(n >> 24) + dst[3] = uint8(n >> 16) + dst[2] = uint8(n >> 8) + dst[1] = uint8(n) + dst[0] = 63<<2 | tagLiteral + i = 5 + } + return i + copy(dst[i:], lit) +} diff --git a/s2/lz4convert_test.go b/s2/lz4convert_test.go new file mode 100644 index 0000000000..82ee5bda55 --- /dev/null +++ b/s2/lz4convert_test.go @@ -0,0 +1,448 @@ +// Copyright (c) 2022 Klaus Post. All rights reserved. +// Use of this source code is governed by a BSD-style +// license that can be found in the LICENSE file. + +package s2 + +import ( + "bytes" + "encoding/binary" + "fmt" + "path/filepath" + "sort" + "testing" + + "github.com/klauspost/compress/internal/fuzz" + "github.com/klauspost/compress/internal/lz4ref" + "github.com/klauspost/compress/internal/snapref" +) + +func TestLZ4Converter_ConvertBlock(t *testing.T) { + for _, tf := range testFiles { + t.Run(tf.label, func(t *testing.T) { + if err := downloadBenchmarkFiles(t, tf.filename); err != nil { + t.Fatalf("failed to download testdata: %s", err) + } + + bDir := filepath.FromSlash(*benchdataDir) + data := readFile(t, filepath.Join(bDir, tf.filename)) + if n := tf.sizeLimit; 0 < n && n < len(data) { + data = data[:n] + } + + lz4Data := make([]byte, lz4ref.CompressBlockBound(len(data))) + n, err := lz4ref.CompressBlock(data, lz4Data) + if err != nil { + t.Fatal(err) + } + if n == 0 { + t.Skip("incompressible") + return + } + t.Log("input size:", len(data)) + t.Log("lz4 size:", n) + lz4Data = lz4Data[:n] + s2Dst := make([]byte, binary.MaxVarintLen32, MaxEncodedLen(len(data))) + s2Dst = s2Dst[:binary.PutUvarint(s2Dst, uint64(len(data)))] + hdr := len(s2Dst) + + conv := LZ4Converter{} + + szS := 0 + out, n, err := conv.ConvertBlockSnappy(s2Dst, lz4Data) + if err != nil { + t.Fatal(err) + } + if n != len(data) { + t.Fatalf("length mismatch: want %d, got %d", len(data), n) + } + szS = len(out) - hdr + t.Log("lz4->snappy size:", szS) + + decom, err := snapref.Decode(nil, out) + if err != nil { + t.Fatal(err) + } + if !bytes.Equal(decom, data) { + t.Errorf("output mismatch") + } + + sz := 0 + out, n, err = conv.ConvertBlock(s2Dst, lz4Data) + if err != nil { + t.Fatal(err) + } + if n != len(data) { + t.Fatalf("length mismatch: want %d, got %d", len(data), n) + } + sz = len(out) - hdr + t.Log("lz4->s2 size:", sz) + + decom, err = Decode(nil, out) + if err != nil { + t.Fatal(err) + } + if !bytes.Equal(decom, data) { + t.Errorf("output mismatch") + } + + out2 := Encode(s2Dst[:0], data) + sz2 := len(out2) - hdr + t.Log("s2 (default) size:", sz2) + + out2 = EncodeBetter(s2Dst[:0], data) + sz3 := len(out2) - hdr + t.Log("s2 (better) size:", sz3) + + t.Log("lz4 -> s2 bytes saved:", len(lz4Data)-sz) + t.Log("lz4 -> snappy bytes saved:", len(lz4Data)-szS) + t.Log("data -> s2 (default) bytes saved:", len(lz4Data)-sz2) + t.Log("data -> s2 (better) bytes saved:", len(lz4Data)-sz3) + t.Log("direct data -> s2 (default) compared to converted from lz4:", sz-sz2) + 
t.Log("direct data -> s2 (better) compared to converted from lz4:", sz-sz3) + }) + } +} + +func TestLZ4Converter_ConvertBlockSingle(t *testing.T) { + // Mainly for analyzing fuzz failures. + lz4Data := []byte{0x6f, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x1, 0x0, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x30, 0xf, 0x30, 0x30, 0xe4, 0x1f, 0x30, 0x30, 0x30, 0xff, 0xff, 0x30, 0x2f, 0x30, 0x30, 0x30, 0x30, 0xcf, 0x7f, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0xaf, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0xff, 0xff, 0x30, 0xf, 0x30, 0x30, 0x30, 0x1f, 0x30, 0x30, 0x30, 0xff, 0xff, 0x30, 0x30, 0x30, 0x30, 0x30} + lz4Decoded := make([]byte, 4<<20) + lzN := lz4ref.UncompressBlock(lz4Decoded, lz4Data) + data := lz4Decoded + if lzN < 0 { + t.Skip(lzN) + } else { + data = data[:lzN] + } + t.Log("uncompressed size:", lzN) + t.Log("lz4 size:", len(lz4Data)) + s2Dst := make([]byte, binary.MaxVarintLen32, MaxEncodedLen(len(data))) + s2Dst = s2Dst[:binary.PutUvarint(s2Dst, uint64(len(data)))] + hdr := len(s2Dst) + + conv := LZ4Converter{} + + szS := 0 + out, n, err := conv.ConvertBlockSnappy(s2Dst, lz4Data) + if err != nil { + t.Fatal(err) + } + if n != len(data) { + t.Fatalf("length mismatch: want %d, got %d", len(data), n) + } + szS = len(out) - hdr + t.Log("lz4->snappy size:", szS) + + decom, err := snapref.Decode(nil, out) + if err != nil { + t.Fatal(err) + } + if !bytes.Equal(decom, data) { + t.Errorf("output mismatch") + } + + sz := 0 + out, n, err = conv.ConvertBlock(s2Dst, lz4Data) + if err != nil { + t.Fatal(err) + } + if n != len(data) { + t.Fatalf("length mismatch: want %d, got %d", len(data), n) + } + sz = len(out) - hdr + t.Log("lz4->s2 size:", sz) + + decom, err = Decode(nil, out) + if err != nil { + t.Fatal(err) + } + if !bytes.Equal(decom, data) { + t.Errorf("output mismatch") + } + + out2 := Encode(s2Dst[:0], data) + sz2 := len(out2) - hdr + t.Log("s2 (default) size:", sz2) + + out2 = EncodeBetter(s2Dst[:0], data) + sz3 := len(out2) - hdr + t.Log("s2 (better) size:", sz3) + + t.Log("lz4 -> s2 bytes saved:", 
len(lz4Data)-sz) + t.Log("lz4 -> snappy bytes saved:", len(lz4Data)-szS) + t.Log("data -> s2 (default) bytes saved:", len(lz4Data)-sz2) + t.Log("data -> s2 (better) bytes saved:", len(lz4Data)-sz3) + t.Log("direct data -> s2 (default) compared to converted from lz4:", sz-sz2) + t.Log("direct data -> s2 (better) compared to converted from lz4:", sz-sz3) +} + +func BenchmarkLZ4Converter_ConvertBlock(b *testing.B) { + for _, tf := range testFiles { + b.Run(tf.label, func(b *testing.B) { + if err := downloadBenchmarkFiles(b, tf.filename); err != nil { + b.Fatalf("failed to download testdata: %s", err) + } + + bDir := filepath.FromSlash(*benchdataDir) + data := readFile(b, filepath.Join(bDir, tf.filename)) + if n := tf.sizeLimit; 0 < n && n < len(data) { + data = data[:n] + } + + lz4Data := make([]byte, lz4ref.CompressBlockBound(len(data))) + n, err := lz4ref.CompressBlock(data, lz4Data) + if err != nil { + b.Fatal(err) + } + if n == 0 { + b.Skip("incompressible") + return + } + lz4Data = lz4Data[:n] + s2Dst := make([]byte, MaxEncodedLen(len(data))) + conv := LZ4Converter{} + b.ReportAllocs() + b.ResetTimer() + b.SetBytes(int64(len(data))) + sz := 0 + for i := 0; i < b.N; i++ { + out, n, err := conv.ConvertBlock(s2Dst[:0], lz4Data) + if err != nil { + b.Fatal(err) + } + if n != len(data) { + b.Fatalf("length mismatch: want %d, got %d", len(data), n) + } + sz = len(out) + } + b.ReportMetric(float64(len(lz4Data)-sz), "b_saved") + }) + } +} + +func BenchmarkLZ4Converter_ConvertBlockSnappy(b *testing.B) { + for _, tf := range testFiles { + b.Run(tf.label, func(b *testing.B) { + if err := downloadBenchmarkFiles(b, tf.filename); err != nil { + b.Fatalf("failed to download testdata: %s", err) + } + + bDir := filepath.FromSlash(*benchdataDir) + data := readFile(b, filepath.Join(bDir, tf.filename)) + if n := tf.sizeLimit; 0 < n && n < len(data) { + data = data[:n] + } + + lz4Data := make([]byte, lz4ref.CompressBlockBound(len(data))) + n, err := lz4ref.CompressBlock(data, lz4Data) + if err != nil { + b.Fatal(err) + } + if n == 0 { + b.Skip("incompressible") + return + } + lz4Data = lz4Data[:n] + s2Dst := make([]byte, MaxEncodedLen(len(data))) + conv := LZ4Converter{} + b.ReportAllocs() + b.ResetTimer() + b.SetBytes(int64(len(data))) + sz := 0 + for i := 0; i < b.N; i++ { + out, n, err := conv.ConvertBlockSnappy(s2Dst[:0], lz4Data) + if err != nil { + b.Fatal(err) + } + if n != len(data) { + b.Fatalf("length mismatch: want %d, got %d", len(data), n) + } + sz = len(out) + } + b.ReportMetric(float64(len(lz4Data)-sz), "b_saved") + }) + } +} + +func BenchmarkLZ4Converter_ConvertBlockParallel(b *testing.B) { + sort.Slice(testFiles, func(i, j int) bool { + return testFiles[i].filename < testFiles[j].filename + }) + for _, tf := range testFiles { + b.Run(tf.filename, func(b *testing.B) { + if err := downloadBenchmarkFiles(b, tf.filename); err != nil { + b.Fatalf("failed to download testdata: %s", err) + } + + bDir := filepath.FromSlash(*benchdataDir) + data := readFile(b, filepath.Join(bDir, tf.filename)) + + lz4Data := make([]byte, lz4ref.CompressBlockBound(len(data))) + n, err := lz4ref.CompressBlock(data, lz4Data) + if err != nil { + b.Fatal(err) + } + if n == 0 { + b.Skip("incompressible") + return + } + lz4Data = lz4Data[:n] + conv := LZ4Converter{} + b.ReportAllocs() + b.ResetTimer() + b.SetBytes(int64(len(data))) + b.RunParallel(func(pb *testing.PB) { + s2Dst := make([]byte, MaxEncodedLen(len(data))) + for pb.Next() { + _, n, err := conv.ConvertBlock(s2Dst[:0], lz4Data) + if err != nil { + b.Fatal(err) + 
} + if n != len(data) { + b.Fatalf("length mismatch: want %d, got %d", len(data), n) + } + } + }) + }) + } +} +func BenchmarkCompressBlockReference(b *testing.B) { + b.Skip("Only reference for BenchmarkLZ4Converter_ConvertBlock") + for _, tf := range testFiles { + b.Run(tf.label, func(b *testing.B) { + if err := downloadBenchmarkFiles(b, tf.filename); err != nil { + b.Fatalf("failed to download testdata: %s", err) + } + bDir := filepath.FromSlash(*benchdataDir) + data := readFile(b, filepath.Join(bDir, tf.filename)) + if n := tf.sizeLimit; 0 < n && n < len(data) { + data = data[:n] + } + + lz4Data := make([]byte, lz4ref.CompressBlockBound(len(data))) + n, err := lz4ref.CompressBlock(data, lz4Data) + if err != nil { + b.Fatal(err) + } + if n == 0 { + b.Skip("incompressible") + return + } + s2Dst := make([]byte, MaxEncodedLen(len(data))) + + b.Run("default", func(b *testing.B) { + b.ReportAllocs() + b.ResetTimer() + b.SetBytes(int64(len(data))) + for i := 0; i < b.N; i++ { + _ = Encode(s2Dst, data) + } + }) + b.Run("better", func(b *testing.B) { + b.ReportAllocs() + b.ResetTimer() + b.SetBytes(int64(len(data))) + for i := 0; i < b.N; i++ { + _ = EncodeBetter(s2Dst, data) + } + }) + }) + } +} + +func FuzzLZ4Block(f *testing.F) { + fuzz.AddFromZip(f, "testdata/fuzz/lz4-convert-corpus-raw.zip", true, false) + fuzz.AddFromZip(f, "testdata/fuzz/FuzzLZ4Block.zip", false, false) + // Fuzzing tweaks: + const ( + // Max input size: + maxSize = 1 << 20 + ) + + conv := LZ4Converter{} + + f.Fuzz(func(t *testing.T, data []byte) { + if len(data) > maxSize || len(data) == 0 { + return + } + + lz4Decoded := make([]byte, len(data)*2+65536) + lzN := lz4ref.UncompressBlock(lz4Decoded, data) + converted := make([]byte, len(data)*2+4096) + hdr := 0 + if lzN >= 0 { + hdr = binary.PutUvarint(converted, uint64(lzN)) + } + + cV, cN, cErr := conv.ConvertBlock(converted[:hdr], data) + if lzN >= 0 && cErr == nil { + if cN != lzN { + panic(fmt.Sprintf("uncompressed lz4 size: %d, s2 size: %d", lzN, cN)) + } + lz4Decoded = lz4Decoded[:lzN] + // Both success + s2Dec, err := Decode(nil, cV) + if err != nil { + panic(fmt.Sprintf("block: %#v: %v", cV, err)) + } + if !bytes.Equal(lz4Decoded, s2Dec) { + panic("output mismatch") + } + return + } + if lzN >= 0 && cErr != nil { + panic(fmt.Sprintf("lz4 returned %d, conversion returned %v\n lz4 block: %#v", lzN, cErr, data)) + } + if lzN < 0 && cErr == nil { + // We might get an error if there isn't enough space to decompress the LZ4 content. + // Try with the decompressed size from conversion. + lz4Decoded = make([]byte, cN) + lzN = lz4ref.UncompressBlock(lz4Decoded, data) + if lzN < 0 { + panic(fmt.Sprintf("lz4 returned %d, conversion returned %v, input: %#v", lzN, cErr, data)) + } + // Compare now that we have success... + lz4Decoded = lz4Decoded[:lzN] + + // Re-add correct header. + tmp := make([]byte, binary.MaxVarintLen32+len(cV)) + hdr = binary.PutUvarint(tmp, uint64(cN)) + cV = append(tmp[:hdr], cV...) + + // Both success + s2Dec, err := Decode(nil, cV) + if err != nil { + panic(fmt.Sprintf("block: %#v: %v\ninput: %#v\n", cV, err, data)) + } + if !bytes.Equal(lz4Decoded, s2Dec) { + panic("output mismatch") + } + } + // Snappy.... 
+ hdr = binary.PutUvarint(converted, uint64(lzN)) + cV, cN, cErr = conv.ConvertBlockSnappy(converted[:hdr], data) + if lzN >= 0 && cErr == nil { + if cN != lzN { + panic(fmt.Sprintf("uncompressed lz4 size: %d, s2 size: %d", lzN, cN)) + } + lz4Decoded = lz4Decoded[:lzN] + // Both success + s2Dec, err := snapref.Decode(nil, cV) + if err != nil { + panic(fmt.Sprintf("block: %#v: %v", cV, err)) + } + if !bytes.Equal(lz4Decoded, s2Dec) { + panic("output mismatch") + } + return + } + // Snappy can expand a lot due to 64 byte match length limit + if lzN >= 0 && cErr != ErrDstTooSmall { + panic(fmt.Sprintf("lz4 returned %d, conversion returned %v\n lz4 block: %#v", lzN, cErr, data)) + } + if lzN < 0 && cErr == nil { + panic(fmt.Sprintf("lz4 returned %d, conversion returned %v, input: %#v", lzN, cErr, data)) + } + }) +} diff --git a/s2/testdata/fuzz/FuzzLZ4Block.zip b/s2/testdata/fuzz/FuzzLZ4Block.zip new file mode 100644 index 0000000000000000000000000000000000000000..c2ef2f86d34288ec2015019ac4c3e93d83f6fee6 GIT binary patch literal 203950 [base85-encoded binary zip data omitted]
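
For reference, a minimal usage sketch of the new API (not part of the patch): it mirrors the pattern in the tests above, where a uvarint uncompressed-length header is written first and the converted block is appended to it, giving a complete S2 block that s2.Decode can read. The names lz4ToS2Block and uncompressedSize are illustrative only; the exported identifiers (LZ4Converter, ConvertBlock, MaxEncodedLen) are the ones introduced or already present in this package.

```
package lz4convertexample

import (
	"encoding/binary"
	"fmt"

	"github.com/klauspost/compress/s2"
)

// lz4ToS2Block converts a raw LZ4 block to a complete S2 block:
// a uvarint uncompressed-length header followed by the converted block.
// uncompressedSize must be the known decompressed size of lz4Block.
func lz4ToS2Block(lz4Block []byte, uncompressedSize int) ([]byte, error) {
	// Reserve the header, with capacity for the worst-case block,
	// as done in TestLZ4Converter_ConvertBlock above.
	dst := make([]byte, binary.MaxVarintLen32, s2.MaxEncodedLen(uncompressedSize))
	dst = dst[:binary.PutUvarint(dst, uint64(uncompressedSize))]

	conv := s2.LZ4Converter{}
	out, n, err := conv.ConvertBlock(dst, lz4Block)
	if err != nil {
		return nil, err
	}
	if n != uncompressedSize {
		return nil, fmt.Errorf("unexpected uncompressed size: got %d, want %d", n, uncompressedSize)
	}
	// out can now be decoded with s2.Decode(nil, out).
	return out, nil
}
```

ConvertBlockSnappy follows the same calling pattern; its output is decodable by a Snappy decoder instead.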
zh0T8eQ59Q45iso=Ovhh%M@5g zw0<4XG6fxA^?g`qffxq++CutEbPWFQ1CKVszLVxQ06qTTOqI64LsYe1lwmEa^WBF8 zdXV!4yFd;JZXmwufi078?DIdK_LOR3lW5)fqx9FfAVTWELgY+w?TcCu>AYmIQ!^79 zUlfp{#3lR=Rzvt5r^X}r{_UU7_V_s=fp>i&)C7{#fk=cT_MTPWKZgDJ1Q3mo2$%da znP=MTS$S3TKmbSgK5>bG zt7!7qi~99VV8+lwDU**-_zqfo-$IvAhhRGFPT>yaV7B$q!Er7pFIHFb)V#T(X6hfp zhm3$V(FA8QsSJ6*bL{NTYc*U;Ow#ZWG5H}#Etom*(z1C#G~Dd&X+y@ZXFmg=acW@Q z^q=mx34?PavedV35sh^{(W2rIp6ej0$UED1af!7?QLlQSBPjp zzzy6#+q^BH!;uwqP5^C})`ADP2_Ybh33N0-R{?Y<%`E{YAM_>7!6%S%3M{}JpyLHH z7&w8SmIHu$c>oF@+T^eT)>^>y2k@T%9$>=5CiXoTf87@GiFw^fV&v!lLYM6k7_l-2 z#-f7w$CyBD21vpHig{p{0_G_|p8+_0(2leMxuw9*3sPpd0dxp3^r6i^e&Dd;f@bva z11T49nDFxLVM?jPi|>AT_vAbzacN##B&Z^m*a=yqdr_|SHqbWGr^V)Iv@Iawy!N0Q zFD56YH8<8e;2L$C2{}%5rGfbNF#OKMUyBqbw;O3GhXVN>%pu+jmv`}Mi`CxZjj6^> z+-co|{8TpLBhC>B3(r2$G2fGTmQ2Ytup3x-zQl!EfbXhEmf6aX@2qCAkg4GuS?O%z z?w8pU5r7qXs6MmIGNwXVACb*(p!G<0Wgb32NYTYsnJ?IuV`Jngp~{n**Duss1~o}( zb27qkn))8ULYEHs=Q)f8%^_UOGO?%Ncs2jg90D4B3jjO<5l$eRiH)D#91w*#pjizl z>zK^B!Q%qw93Obdpw}V@Z3YGnPA;eejuU7Dfr1hE0?a_G9=s~hCkLEpkS75mXh5%( z4LG1c*PM-=7raKO+X!_1_fmornDubOJ~gqbHJlF>4AeZMyx+@V?Vz`U5>5Ol95JZ+ z8R6c8I+B7Z5BJZn#t2W}M)(bV?rgse71+574If?k)OoRa-FB!es;_@O{Ggz3*)RFd zMI||fdeQ0HRoF;n%D@W2OvxWY#9(Ae12uWSn=W7NR8vF}yjfNDT94Z4kw53<(yW+h z^VxpHPN|pn=;T+ z4tg-Q0<7hJ-x38`NvK<>%_td*lPZA0Ti=RuSb2n*^aGKz%W)8L7zYIlBWmbA)ZqOF z7`&s8FF|6%YBQ@Ye_zbQi3xVvuqW$-?^4-DSLz~D_^^%A26D|G&6n-J9CEjgEN z_}k!J6ZzZV{h7z`PLqeI#`-!ue)kz$>X3aDP~zjQ%(xk3z#9>SJi@?qTE1Kt#O0@@ z{W1C4Y0VNz^1hDzhrOT1%?&%?0?yx3o?`0tr|3i-b8DL$Q0)I_DMn;iSAm;i#{Rv0TWMQAuxTk_L1cp;zOh*i5lsKXCYXD-+v51jC3;t4T8YPui z@N`_4dfrJ{N?%aI_YeH5y*4Pz+-XE_7*=NmXUlo(AhZb&%ik)DzgXT%tC=l%-J6K` zL&?E68I>08I#{AP?_veFGDTkoFHIe#c&kQ=fGI?B%tJjX$u9$!FNc`- za-)IoJ*ky?`hXxYNmq9dlV3EyEaytdVsfIYPe*RJW8YDP;y|n;D1(FRa&qyT%BaSh znFpg{q7pw|z2=d6|5@w@B*!3^DD7}}P;J>=)~l@jSkd(+IeA6GOz}&=-J|bC9U+&y z_ZL}$*y;9k^*h#|qK)dX5h2tB4=Ra z00R6QR1EScilm3=#XqzKWcQmRF81e?Y!n(}zx&nBFZf)Jw_ma|>)!09LY>{4fL9~O zwU9BO&fQ0Kw}KRYl;#S{9T9CLsH6vYHBCO{o@}4w`(OIK6vcm~Y%TS2Qt7sPQ`+Hm z&UGM!Kbt|1n?&<`Y(Zi&)6_Zzo9@fT2M;t#xr}-bM7#&$(lGipI2Xv?BItWnsj$DC z-Et*)&a(Z4q}R4Dso<&Qs?slYM1Lq3Md7O${|wB38S}(egM;Jk(^gp^!1q^$bLw!o zoNHXCrH7qeO2fR#RiS&*VdY*Q!nE*&tvn@3sn$Z`+9;y!I^M=#nL>`#&8knIv4vjd zvKvdCh?kU?c>}pWj-(iEX7;nG)Xt-Xn$QG3(LnS@vDX`lp_duw=ap%RszLYBtP|*u z%8Bancwv6}(?m=#!m0COnN2LX#DP)q?`4Upf1K!g7vCi`b-mJ1yG9OTv8 zH@yh|qTRv-%5_OAcYLTAh2zh`@bCf`FVJoQzc2Wyyxd?+0Hp|G#DGu=Naum7hYx6= zKwcbF)&_8gX27ck900$3ncV#U6AehI?cbv#xxhRhwl>Tz(ATCm$@Zy z2ZF#Zuo46DZ&1)9@HYd&Jjj~h=Huo8a2sGX;AaEwZIHhSWO;x~3Dhz`$n`&3Zxp2A z@_p@sFtUY=-6P{EVk<&6M)`z{gOA`U6*+y{0skrv!D)slh}=5DUQaPJ4?oTb2KU81 zU2}qPGs>`dzAk!G>;oAIMNG|xzpBlksTnn|F~p#Y;JH6^ULXw=n8*1o`1rx7fyxnh zBY}pD0~nh5%((&okPApv0IiZ2c&-6W5@>IL%nKkAfkzlvN`aZ--@Tx}tKxp8W~d4A zRsBtRgbdF1C(&jCs22b#h!O)2YoG=+2Z|qHAqOxXE^vB*D$fe!K>-H`2w?(-U=Vf; z&^s0YWds#qg8yx9$qN9bJkYW8{|B0BN$*9VZ-GGY{CmUzc?&K= z;N}K_wSZdzz_H*)2T7w)+$Enm;4gs1xD`k?2E;8OH3K3zGhTCcez1)MYD2Jw|Mw&Q zm+KIS&k)J}CJWsLhtD<%Srp30GTAxA1ixYa{7imUQno*XLj*s;8wd^O; z1Ge<+G=6rOur8=!H*$;J_fDJnR5E62;i}Z&`6bc9=mTYKWm=nJ#=(JQ^R)`&sc}LU zMcP+i6U#1+Hj;r|_j8hrSJIa|D@iAXY8XXEBP+q6A;v--oLtvW8m>sZ!=!mRiTCZ17VZy9MG&FH3f2r~<_sB&pok;x=H7*tvr*DXr z{kWy?@Wt$00#{jSNJq|w&G_zMH|{>nWvP=pbSCCEkW z{nCb-+L3qckPrO*yrubOblF#1uo(50Xyozg~neaZ&G;;X(bnYv{zOdiy`@ z0(>>K@t29dD>q{X_34m(f<(f(qC_4L5`RJAmZn73M|q{efwPEz7#FHdxx(f>g8p7! z{?+9(OC@+tbICW@2Q>pU@ot9vZcSm6FP)^unByMjwmP~AkGQw{Y#6e8)|#7GI8_4- z6n~5rP6n-9Dt6!TknnOy-@~v*A`B%gwNJQSi87tJf%h#b^2#a(U!b}=BLldlfp{F! 
zk@w?hbp2X;ne8s~_F^D(;6rtRWsytkb~lgb{fnQe$rr>6sC#ErHgcw?(0ko2go;q>X($QMW7R zUmN*0XE3~Vv%yxW)@{MCDvbhm(NIhNib==xC)Lb(0n;4IdIRNtLC(qGka_kL_UZ+V zr*qK*-;}v#vXI8J*J#x8>K}D!Pmy&CQh4j!6TNxLD7o|V4K?e~SktI$cgcuh4?BI6 z1tjn_WQi*Fv6|U}v_uzJUsSJEG1OR2eK_)Ph?YuscrQtXEF zAkQ841Hof--^J<KHlALJ~q=%nnWdi;xTxz7~F^PUhETPsCqz)2p? zC2%d5&}0zuSTm6(bLaZx1LGD;gD}Djy=?NpwDdgyf~wz#01%XKS!gHbdC_prwva>Q zmOyRYrsKY95=`plErDvw<_rF}V(tm}*vE@OJqL}~hXP`-q-`~IlP~8q1O27dIjmQ9 z!@j*xiptNdU~X_{pRT$@PKX;lSF-IGr<8m>tQ!`Zu|r%jLQ9Xp!o?0(v|5gZqoVhn zst)0KVs?r$3Q9?>p~%eZ(Q4$Ars(A=TuC-OenuNQ%|cN$F4DG$w?+l#Jn~FoviRM8 zh)@pH@KLCGR%)r-?CVP*TMjiJkILfG1{^`mo7$mVk;gOD8J|!+$=Bly~i4pQ0tPfUvE)eUpD~B~pPeWma8}9Xj)W_;cn5 z^*1252cmH>FL8k=O%NUnOdtTU2r`WML17NiQb5==HwTdF^McZ*Ihb*|czK`+%Y5b* z|CjW~Kb;p8vN#f8=Fh0|?yLzxk+5w1d;T=?Gd1l5y4~_hP>S=9h5a?z0xmcQh+<;{ zX>q^^%MA_Z1#$6!=fh_Kwifd5TTesql zydJul4jsb9JI`Q&XNak}nrS@dO%@SOVj4;i9At85-4manW^5J6HFrzPuEk3;uF1q2 zdRf!gE88WNr%mI$D>ZnG*_Y2nLn_9miP03zckV$+`Fzc>^f8W>#dT#?uW=f(7{9ZD zAwjXZn#fnYZ`hp^9^a^x;h4$l1IRq&XUdkYYF=j91;(sCw)HoRISec|eeIyaQz?#A{~vLX~+up0a>TS1D4!|J=AN34b8qOlykZ%ZD?T83xm)M?C}BbH;g_CU{~; zquT*Ov3DODV#x7el#Tl*c>57knTpOIvSup5vMs z=?-K!Pm`h*CGQakl?1l+2iUyp4GQ}fIk!BhE33YXpQQI)hxMXY&@!2Fo!U5^5VOt~ zo&$BwiOvERg}A@G^plbgB{B&XO|d1T(`v~praFCnsg)D|O`}lv0p1P>rhO!)IPEu1 z7w#9(`PVZI%)ba*DY7=7_xLK`3?!O(4k{D0VQ!D*gs`B!OkB~G-nwmB4K7H^A8t3! z>M*yhu8azg9>RTlPZ2YuJ>@JvE2Qig5J*Sl&{L4Rc*0-Ho;hNh&%e@ZB`(do(Zn^U zoGE{N@aQcaTJQrWUff#mi{@FJC2a;Hej$SuB-*BA-w|ppQ*u>Wy9vsgiax() zXGG!2!u=SEQucEz6aJ9_TDc}QGjVxtKy%i(`SaIdOiFntgLvfLlKMD%`1^RCk-~k7LM$`g#}tLKkE@$TTluGX{al|nxo`SS4W8=W-F(VsKzS8mjTsw#dFkgTDF zw0aWFqY)E0Levpa$3(jRT>ryI(ls1Arsym}35paBy{cU@V=J~#GT$Pnl^m~{AzG!= zvqcBoC8(^@J$9v3Xkfd|H#&@On$zEQideRn5`I8FJGq3@c(O0Xp8}q*M zV}-y>E(3mQs@FM(7nZ@&UnrQCIbMFs)ReT76($HSDZiyDygZhv`dY(eUC=d&k)+mU z!w{tByG1P0go9`GG(o1vH`GGdsNi%GSd%K@0dPN6`hWl6YP}-MUVxd z?yG%sdWw0FbM+pNu%^ZPsC_p{$>?WL!MNSh+06jt^kPEmHt8BIkiU|I!Js|W?Oz4yn57^`2#l!(l)KaC9Nx0~i{ z)Oh7cl;0_x*ejk4Ad&UK>;?M%WM2_Zxj@My#`3oME zy~@lDA=7@swmr}B3AOs?hg?ZKwe!>q_wh#5Yu`b^b^MDz3oB<~TKCHG+< zFK|S3K0QLVJ+UzyL3z4$^R9f4Q|j$*o=p}aHba4RroyhZ^*TWxdgpfTMwdwxWrS=Q zD@pMae$0@`Sjy^q&2h`LhvbQ#dqNM3+fE$CXR(H5=! zx~XsO!q$e7vSo@;XZZZVx`Q6VwLfur6v~@7PDf4pjJxss2V@&VpZ2IYXWsNPPM7?s zfJ@R!(T+i1Fr9QaW_6@@%s=NjX23D|8J-N|IGs~Z`69Oe2xSo)0u^w-$k zCW9r}!K#?M>5mtw5>v>x4bIdz3O`e`^=C?xvu6)xOQqHly%8mMXu=5ZUR^Y4uJ$&v zrpHyJUMGxvd_PTj^;{Je0)d68DDOOrHHH1yPH10R_TPQXs%iZXKE9f0{D*&N`?AX9 z;Xw7A*hR6Bb%L)c|G)t1gXtQaHlcfl#;@mtIM9>A@4!wUK zigzhL=92ef4)WURMHye^a!bbd4Tp6OBckXhL|ZqAk1j*LzHY@meblwYX58& zT9lsRh~8&mx*J4C`b{JqP97BnAB0u0nIq=L-w3HSdc<3P`u_b=uP?36(KzTi_94pl zc>v++ZOpJlHkt_m0@EA!uVM7 ze5Qp~)1oNtpKuIUEw=(wp2-iPrWQ06M^K-t#A&~hr8O|~Gj9|laKH zy2cTlZ}{cXi6t;WLN(rD9YTCLsx3m8BThNwE44Wbzo)pSD0>lYuHM!uR+X8q)Ptg0^zyqUMj&70I^_!re4!BAv`ph>W$*n_n_31*68#UAJQs~|BIgKAoV zJE}e!p6j>*PV=SID+`3_A8)O!+zh7DP?U2k)&!Jr*6d0}Yw-LCpAf#SM5C8dY5Qmy zEB**id+XD%$$;2>p@IFJQYV(y;<(1@rr^<__CB}@yZ-XGZ64s59SJ@VP_y?Z(kESrZVXD^IL!E+hq>(?B`PV~~I1aDB1~#7U z$@^m7|6J6EMy%**mkfCNsZS!7q`J6>_%k3e_qV zaf(x0ky5IEzckEEJB==81*aNR&B^PMHV*57%FIZU${y*GvVM1Ypw@o9Msw#3M;{Wa zU7Mf#UgNMg$b?wxdGYPI&!bLXqYut&4KKg(9#E7 z9Xf`7y7gInYIWD?$Zh8DdUNeT)Nv7el@zR%B}J#>vqi!Ce6?F?*VAe6d%?N>mm9YJ z2RqZ3JzeyQd%HOlI#~9pf#31)@^d{UOPZp^d+wh38qpyUKT$35&N00grygzxY5C+- zD?MN_`ZVoLwb72E^VL&Ffe)H+4m{L?50YP_ytYf{BBN_O9d0eWo8=qw73u9e%hk_! 
z{3Meg+Pa>2e=*wp2K{|L&qzrnVf`TzB>|s^3>oaS?MGq0h^{V|Zyo(u4~E9I=6i$Y z`tguTIJ)IxAp(IaOUqL&$UlD^ucBaPPilVP(=3jpBYn2gygq#nV_d)&G=|G8_&pTq!WVs&z z&)H>OJ;GoOXA)0=!6P`ZdgRYTcQ%ksIlII7Ui#gr-N;!1!3aW&7@p0N$oqnr#-j{U zpGlGX@3@e*$z{0t;Oq*U)D7t?adcj0CYppXJ@MX5pSv+ESeP35R`Dzy#~(#7j`&F% zZ)K_N(~OKFYor$}(0?s8_%VOXM=A?Iiu2~}P1&}|BDvcu_W$YDt#?2(ydB-i+&^{5 zjnvxRm_~2?vbXKy9YWZQ}y<576oWUk983D{c^|XAQC`02vEFJgoWnxj}!F4baX&6U1Uy z8!nbv4oBlC1`Q8QFP1V{rv1ddr>vJj&vl%A zcOjM2merGSmqdrqw|lAV;PN<(mb+|aS(!nh?Y<>s8;BX=}fG_Ce~<_pfX-bN4(OuYXTKt1b1nasm`>MA>4-4=_|{43Iww_?$8P8 zH7uj<%vfo$?eQzRC$h(~L`L#F4(HZLFz>rR+tL{+Bkscnkx;-=ga)65Q%RcUp2ii6 zqcapmR8t|e2e3yx*o0wya3@CfGDmpT*TFT^Ts5>(dX=o?ZP6~1B{I^R$Kst42N|cn zsjJd>;m6s^HIM1UJ>KQN>M};S)b5FKt5fktv`AW7&@Doz;V>v{jCeyHyxg02EP#V- zk{XYMvo<-N85~ZAR7VrfKuJvKI-f^SS8{UoWu$N*@9f)Napq@H{}4$YB0+ND%hkEY z>?KrszBRi5FX8~vMzx;yjBj4v_nn-DoQ|mOexR4vVMZFTiU%PRS9qRCQ^DLHtNWsj zEBBH@Xv7IdNJ?3jrO3W*HBjT&D9!1BQu2%?oZRqJqwmf)ZS^?8H{Z#fV(5jdTmu!a z(}Uo0k?62YzpcR+vu;{aVFkkQH$x6O7kAx$#trpY!bpu*&_pS-}us;{RH7GgM zpT@x|J?4tC^`!2L7~w;)O+!5exby|PsEB7OrTxJyiBppjN^7fg7n8*&HDyyb8XhSdO>=$0*MxQMq8 z?snnx8HBJnzImf}5T>k{{DSUdt2RTrF%r54-JTtVqJ`yek`xB5SX++kvMM@k2 z&XjH$b#(mQ)_#9VCACV^-za9YM+q(&ycwTC)_oK{k6e}=%pNJ+WcR`w;vzW0Oh2S0 z>--j12wybz)=)dhOxznyShg_AysJzH9ORzJWFEP?^N9|-DEz?a}Y(e5qSBamnS28p&!#+J;0Z+k2@ZFUQkrX zKLvB#dgYYLcPKf3>@icH3yv&qqH1hj=Ei7EwHM9fO}Ov&_?!0RUPsr{1o-KQnp<>3 zDijAj(l1f9iN;2SCx6;gX81@ls8tOnDvf`O-BdvM_;c8!{YXf5$FrLJL`n zi(ousPir*%?d|?6;fORPSUzOpclPjgFF2b?D~8EoSy|*{WXJ*(7nzb`Z^wQhWh@fiLYA?40h6ROK=}~ z74Vx>r~FT+yD+fFInS#q+nXdHh#{`)L(Zu&NCR{d{nCdaNE3^w$+?b~>sDM?dvNd4 zCe=+d3`bNnjg0{t29R<%Gc{~-Y&(+o`kdc9k%^?uvzhp1Nn>pG4}0afC(=tPoP@x9mXRpfh1w{&}^moCG1Cf9w& zo|(#%&3yPOlkskr#Cl%{cvqaP-8hG$Ywfcx6=It5YIUa)LL4k=t7oPO8mc4tGzK`2 zqc^DpN}EGz^5mXiWt={CPI_Lj&I7(hFNSkvx3Qr$!$|4NGo#u{y(!oE&gccjG`U+T zEC{4%^9=Kks>px-EwILm|H7On8hRo{e!atYTGYcfSk6gM1Slifw(S71767F!SYC_m$k7Ci+&RXW5D0gvm;08GAkHOh|es z+js9=UgpiLC@Q0B6xB}`NuJ(&UN{bT1(Gi#Q?<7r+i@OQg_Z34Oko$$7)j#QaV*Rm zdoHUBgx_Z)Lu{r>t?O!iW2VM4n~*H#7*8_mr;ElA7xz(QWaDjVGjBo1T0Z%?_3! 
zA%K06H!vfsZEYIVWXn(fBUqizotYSZyw^Jc13}iALG1g40+n*`@R9pa6#`~XOaf#2 zId<@Tu-MVhZ$>s0lwRrCuJ@ZGFyE-f42CD~(9N;(YCZOqwSD5=EDh@j7enBB7YoXhyN~NRz-}L^X+MWqN6ireC|(2 z^#DuO&;Co-ncD66>SB&yi#sfyhNoRdsbbtEreHC!A7ALj;=>f**xUxTxcaXOZg`sU zmg(^NA9bncz=)TGIc${fWJG@C@5WHWeDEl*^E6&o=xZash4G=O#&>*XIvQ2vwfPYO z5vgHgsg2w7&U@>0ahxB*$xBV?cS*!f{C#g*8%HvA2t0Z-kl(y0xR(}+>a79Ojoju} zs>)H*hh^)HTzOPF0k*hobQ|U{1->l3%g@AcCZ=2_K3B45Df)Jnf94J`hE#Ydni)Qy z+j*kjkgFF&lkRz3_A24zJo=gfH(E)LK)x;??h6J>ZpmS!-DRIkeyF}-3$i5Bp=C8W z*y3sg%ie1e$3&1>N^_XBq*j^alI;G`A+>-|uc&QarTYSQi_IW;HTXroXBB6z5{q?X zo!0ZVj&0X2$}dk@Do{UnkxMI8wCc-zG;1A zb+?We#AgZt^GN~oVr4kD3<>N*-{dYD^UH?e1qipS2cc%lzl34U9n-ZBDzP zSO4*NGa`jO1A!t;G3Fdb@F?5{R2cNEpIv71u!hIq)r(f;rQUNT=|o&r&Q~lj`j7uod55S~{jutCSl76Ya(7Nv?aCfQ=rBH{VHL@Hd>trfq|97lv{-u9XXV z+MJGgDlfLVS!=)WLB*v<&2=lA>c-9{N(i&QbbHdwRaM^+6V(lSwCcIK`i&_^uJJ>* z#PoVc?6?kzEn|1hIbw@MW{F=gP#DOG8|eF}Q^8ET(v?N3P6tyD(C!CQwarGY!j#$7 zG{tjoE+O)twPh8NZT30{Xdk;%ZPhY1B9Chv?QJp_TFNly3QJs zA}Ey`QiY>Qe&$)H@@Nc8ezn61SCcxj|-^*0it6?+V zzaRg<+WJka{jZ(xFrVEnWVL&oUS@H#lpDYuyP>5~bP ze7o>;)@c}!YNcL74Vl+H$I4heL%TjlvyoxGj;d;O5!${QYqLINd5z&DJ9uWX)PzSw zL?nAWpv9Yc(IqsqGxQpxEMt9&OBH)4PY!F+;NHQWVgZey%gfD)#}4L;&`~E8)WSUP z9Lh-w?r3ib%WYWHdAlBzY26Rn*f$Hlj zYo+zer^`VeC4C?ke5BGqx+>&h#UEZt3%x;~4AFD6ZLrv(m1d_TNOH}O%@4_|kG4x$ zSpH#^1O|d0=J3rALnV!cy{?q+!%6wWGC(P?Rsfq(o zNoexrvBCX?R3wid@9kFk3<#C4uOt**Wy5wd^TBU|BOvINR@>j@aaX6o*%Y3C;=I<3 z%fUwV15fe?vn@pZS++P6eCR{_x1yum%h(l>XW}J1aagQ7CyE;y=6CxnB7P?-4dSk) z_|T7})X3xddF;Y)DPq;r#?_;7L^+4*x>Jo-Gw0Wr;#iUfq^k^Ydh=-s{jsP1+IsLD zAxH~pG~A!<#^njaQk?GLTN6`k?YF-c#K>8!wA=v-@7;M*mE4!?zKJpueG5s`um^G5 zb^Aft6hM%45|TOR0*zX+Hv&eXp#K5O{BpOIvvq*a!FwwSs48ftAo<{yy!Gu*cu1lc zPb_*9|5eHKopy6UNGjYK@@WD!%#|LPhY+TmdAXe%R^HK)HQ00toAgvAHde^+oYI2- z4`zsLENDrPHZM8j^R+w)jD>Y$I1)i7q-}0e#}11;26~<6PbYe z(45$d09gkAy)BD{Z?uV`d}R!Ne6x9HWa}z48d{j=ifi7|ty4(e4|ulsJSi-$-|pu^cS8Bp6T7lW~Kwq-3Hm9n<8 zMyC7ENj!H4!>0R%HXk~elELbr2fiA*gs&JOhXS`r0wy= zEX|VQnI^?2;^u~DKX7dV19-`i87Sz2bH^U~NOI13QpT@D1{-B$0dlP|L9avSGAmbPKjjn*63YR#PfJE9wdt-6m)kTL z2QjWmYh>xDjj4&3_Bs0$PR;%9Qr_4RWPU|yQdt%swX@^Pr~SU)MTSV=20x6*Q>#dt zxT_LiK1~9De9`_*q7%GqdXWBVN}Zb8akdA3Vg_F&~P=L|F;Vg_zT6_9_hP0i{?# z%2g2t^+7-2Q@Wi=Zc%6`{ie?WM~c!>Uy7#%#|x1+uTtREUTq*4798!(8Y)6RclAYq z8NRT$mq^%$-c(oPB9&scOf+0p^&bLv*qcT(8pR2R(m%RvG-;u+l!f#jEhNKtXqoF& zD9s-#Z831j*C>HdymQp9bYhWejR7v#pL+blV-9A~@%7_%>YwGi`o&i^h%XGqUMVmu z&&L9-VQ{iIw9t>?AexC_cD}eI)2Aylrn~)iwGSVSRF&MX(z(^iz1jVk3)LbnJ>W3c zc7)K<`P5f>9)OvdaU){UT*G>FeON@HqRj>}RJur&OC}OrZzIk0aZaPt;vS|;S!$;i z$PuuTY9la1!&Hm<%7&{5EOF7s+f{HonM9SMtb~#g1xKxupHAx}7mtGfr!oI$;qXRW z(<8;&O6y{xkJLrbJw)b;^X4P>_QHT}aA7cEj-QXVA#J156UBSs# z3$4W41(LE@Vk`~?OTSxx%`rU95Fz?ySNFX_|K!HtPF4>Z9v)BNJe+@n(R_PyjVt3$ z5{pYPylBTfz&^Z@exFs^XuKoB-AXP++9JHTk|ouScY6vYXzGKBXV3xTewTOmw+R= zcVo)RxryrZda@->I`|GT7MKIh*uB2kRBkfcGx2Ew8ypcZMw=CJz6lV75Su1g6~;?+ z5Fa8~HNPm1b93b5xIa)*8)^zZv(IMfm1JMC^44dIv(regV>BUQYDr{14M^4nIEst_A;y3LL1w1EC5wRmMZmkylUZ^O#1RKS{?=WkhbRJc z?h*k-ii};`UnkF*TB2sFn)WH_>FKi?ujp7`lCYZl)N0doQe*PQ15s#nMx>~&bRxy6 zt{I0hF`HDcbp1rVuZqs@SAk<+9#)d$iNy?lu`22zD(aS#TYjCxvfr{2o@}h@!PFQq zG8Z&5ZZCkHezG57YL9i-J0vSu_GmfT&|}%TWzJ0xt0D+gvbdHgAAPLG@HTUpm3IM^ z5Jygr&ll+_uJNfEIXAU9B_C!+S;M<1{A9@*dpq^Ns-0xBQD&pgG%-&7i6Rl{oZc~F z##WJ#!_T_v`_dc2Z{d)dJnmtYi{N%hQ|^qO0{>p)jid!15hOWcAz^`7Ty&t;>8uB& zbhSw+;X@;9X{0n|ysxu%XN*SrxYIxbUWX*uCI`m&l681yNL)+hyCXd!g)BKF9~HFD zdje{8+b3G1s85^d20roTWkwvPoW^4O*fD1)koukA+ki;obAS)#jZ?j$9P_HOZIOa( zg5>BthX$GHQNchgHK%tpl`kYaA)Sfbco^}BsR>&6Vsim|+nF#1D58}Vp}caFwkw8i z2gwR%gS-K!^vIZN@=V?k-DRq*FMJ~-(v#mXJ=IjmK47M`#nvW}ij*o)xm;BWBMQt&si8{MQ*gKWDa%OGg; 
zq71R_oEl>oE|_kS=65qv%awn6v2tnol4eK%hH(K|)yLz&1))4{5xOmM34I1Gm8}U8 zQL~9~epRp^cD&gl5&>n3$_7+33RDR|hG9WDxo_YBX;uUkEMa9{M0C&*2$g90Zr@us z6Y1{nICxA|AbWEABe+0nIetZJf-g7#kUpA+kLUSypB6kKVBJIS}9xE>+s`(?&Y8HeVjj&Pcg<2>#cw&orl#-5GA>y41j|;wlK2|wLO}^=hK+FFk z7DGo7+bc<0$lgn%ZbZT5FAJUH2i2kX45SV&tXmNF)ttzW?We3xTpRjZR(2JQj#y$Vr4;Y}<`<5eU@GhtdKaBwO*Py$TmH>cmv%LxYDjiF z-7?>nO_&KbZ28cdp0llvINIWFYVl_Vh!W5g9o`Bc%TbeqTYRF{NZB)7pj8j?gvgND z$N3J3<_xP@@LC5wHY^|lJP|4OX>v5z_`8}$!eDVPPC>f8pb17t_{ zY6eYnXV$qAc;}0qt@{mR)z0knWlhK%hm_yJzIc9X<&CzAjr7AZtrdZhH_{9~@6-*h zu?T(V>113&FB~n+I7m1jZj!C1vhZ01D9H&SwY;1kzkiPGC00F#vgvrn@bZV>cWct%xW=u`$K9!aYCknj#LsjT^}+jkfR`d+m1LM2^Hlu!WA6mFIeM z@=L#M<>sIm6+xw8BDpCs!NR5E7Z`ldLDVol+)AO{Hs6d%$O*A3`ZAtw`R{9syh zQf74Od+ao4yD?~|)kB9FRD3?6qI0~?CMJKcF{9q2e;P}VO5TX4)KQ<1ngH6czWqpU z+>b!N*!cOagBkYwrX32Zc~x`wWM(jwqB;G%bg7yM_Icx!fllPl zFJD?$Sx5#!=t12Vk%8}ryVS*u>;k&p&-{lO!?G5Z6bQaRn5kLa+3?SZta{0M%<#Z# zMf>KY=)=9sOE`sh$Y!1E3EUOn*OA!Yf0%_lH!eP3x&)1~?n|r6q}}WuhMcOKDD*J2 zTsv8@noclaaf&{YCU-$-lD1_xRbNf^ysJCaaEER4&kZUsJn!u?Xu2#n3c7kfhLAZ= zUj_mWNCpB10_s;fPqy4GxmmJ*vBRGRw*96piDH*0hIg$0Aw`8+WP4 z1V;ovMTq$}Cjhwh+KPkQke&eM!+E-SEk<-7tbH3~i|t$+;55N2gO+-SXt1*kHVrw0 zrhYT|CJM2&!Mn=%RJ~zi#ljAU8VT)9~N~1K1du;E=BW!2JFWY!6NritcOP zzu$QbX}`4iF=gNiM@7c}_p`vVY8}5*k=6Wb z^%|Otq>jVhhTI0ZDa;*XF33#qtHX{HULHg^N>Yiyd8GX&=%3%(T_0(8SscodPuE*{ zT8e5o4HlH)`r5K0wB>!ecD)(0Yk^_YTij)MsO-EwxT?H=c7bm9@tUSxjGpI5)m71b zzLG<h+DJE1UQ0)1YKRxqr+*RbLkpQ0}E-e_7D zt0#QJ|D)D>nUSC1zO~L^oSG<9+gNwjFFkvKMiWI6XMR)UIYPa)CA?H-0&}|o8^~ZFO{33d74!$ z;b!jMV5cX1^l&w8)|m+H^`zTx{;ZWF$x1P=vUZZ6y$0O&w6eK=&c~>AEz9d*6+c-O zmr)esK*EfO2?^^WGlN+I4fuz$|01)C1an>FQaQJ5$D#uTV*FH@+xc|(@@_mVr_r-P zJ1wvD>>>$*ty>SVW#bj|){{7+gT_rq_ZUTFz^NMitf3170rIDZ2cr6T{S7}}^9;hD z%i(bQ-z1Fx@d5hZSHzc;x8VNAxBRyk`2RLW|7+=~d?Ndg&7B%OROv)2Hta9*VQHHm zfRx~gXx*CiO{sK`26`{(UF6_ad~IZM2*9mjnQBgwd(BSPwO?5Xc6q)D&Pl@JITFvFcIu6RE0{~t^KugP>rxKw!ysb9Zi1w z3PPBTE+K}vE#Dl72+iQGwDqnHRhkSBWAetV_S(FIlF9Y*{y4%>~`EkdviiUGF?R6};dFWZ!A0BRAFd3fu-HZ0o1VBD5Zu^~MV-OWy zMBP1Ww~F5PMYnkB!a-Gq&rFY4*?)7LsDP4=jUKhHD7Z;bC8{fDa$%qhDH+>G)&sh% z_%1AyB6D~nbU5V5HP}fKYME6>@SN0xSKFvrh^gBEg37;yW#E{B_xkT)UFc7+ctz!{ay4dxBPlL}$ZSt|f+uQc6KE?|9OM3_M8vSPk{>&UqpS zOL0&!5dESw#A*k0+48z^=5GmSikuwZ>EmX+3Bw0MQ!B<2Sf{6y5H?n?%~OQ4WSb}^ zoB4vOFKHaCvXJcsRdVT>; zH7IYYlIPqWSA!MIy9+Te*BF-=jiZ2Rplkm$KwFuw5UJ|+uS+j}ADJ~WJqA~U2Cq2E zE{SXxES{0W+w{m=$`_pf6h5i1nA}h@mg9MdAW%7wL6d$A3>GmU^18#SR>&)4sP{6p z(iz9?Cuvr7rpHNJ_gJeg?Sx;3s+X3)77;_=!`$v=+g_!C>Lrpy&@uvKt8S#F4M-VH zsvn7nuB*m*k{8v%A2B<_6|K2#5xyNVJ%BkO|C!KJmeiCjgOi)-@betBfo;3Z`Hskue2!LT^Cevnni-BlSQGd4si3w3s|{Vlwvn zFSZe2P{nQz^cgY)P2CQzzIViO3C<<#JJDDxr!y@Zz*1nGW{L3Sawv3%j(=0{U0Jo* zSZDwm9^Ie(Ly(k~l(^})zO6v!xABhk+0q+%ZNXyq2So>@2woe>EnxnuJ&rQuq3!)L zW6F8r^H_+mELb}H=k5X_FV*4!b#q^g=f#&@W>$24l7lPblx)}H+9!DTVzk%N(M$I} z{=VO4F?MlXs#4X>l8?819mFxS!q8dkEPpRKO@3jd;9k1X% zK=iTAG3KSeo(5eILY#l+%YQ^;1s1K^J@U6@h}8cn+qYmLwQ#8V!DgWs-&xY4tYp(f zp4Kueh_rJSuN7cw`?0(^_qD~2`{oVBne$$1@+bQ?;iomz;=-mNcCnApssf$k; z+J}BYK$OpS279w$FZ+Z6WT&lwI+36#n^adv&5Xl6{xTX|1c-tl@6f32wf5)kiX<{3 zMPMV~Tc@bb9FO}!s>!lU2>dUyk7_VxN^Z0K zw{!cj>yg#O(;^DpVuD1U)DA!U_4GOx9J5kJ>Ewbv{t4mVowbr9U!-Y;8RKJ!gC)|q zWP7+dWdAPKQ}eajjpQz8fNrkP+h5i`N8NTT%A_D;iuf|%PVvpbY+ya)Ji9pkFGXj| z09nL@0zpg?fcB9U{0G`+!e;chcZdm)^g++ULeFXl=m0k|WCheGu>#Ts7z{Z8@h*&j z05WDkLpz{26_6!i$N-Q|0Da}mfO0G&w(~TRoP!R8p{p0fadzU?QCw3~upcY77Umvd z6q!$@^4=UeNUb=74FTpy>5>TNIz| zXh2>(&>lFK%S}!#H;w`b6^LQJh=d;wI7TQY!%7Mt7LF36Z$KR4s@uh{P1K(q*&_RM zhIFNU|H_J)Rl?=3<{utDOUVh*o58m|;H;-p$alj-K?g!Y|Q zpfSn**eGw3>eULUVousA3~_MO`Ki+KCV$(j0GT)4rJCt)XwX<1CmIS6MbuSsW3B<0 
z4vdIpV~HW-{2AY<5yG+H6tYjZi3be3nWESgjJ!xR!(r>Sz_%HYqa(VBDPLgHRUSa# zuzfWN{Am=8S2!yv4dCPN-DGm*Y^rm014ebx3HQwJ1XyZX?4zm6m*6(+5#+Swtq)Q6 zsc>{^%VjF&YRe(NW<$lzLDv1vs~qf$xZB3^%m1#e*ZEh==kF+c!Gi#QOcB8!gP>* zm`?2yzu{k&vJei{!ruV(Pv!t4AR2A(g~}2NeLhjY{|vr!rKb$9TL3vKw!ds3F*f@x z$ES3I3E*=a!j`UmdO+j%6#s>Gsj+o*Jn$_$rdLhsn4}FkMa)Z-(9b2lBM4;5d3!6l z`KGULDHOlxf=ySp4FlkF{4Bm|1vBJdV`K}O9E4)*9Q@ru8+(s~uqILF5tJxH%Rwmo z4WhWerwDjdE`{`JJU=?Dj9SS1nME2;5QJKseEbm*HQpeiiPt$uf{TLlPAZrnIA3|9tr8^tK-XD<7iU4%BjEf#=3pMe(22vVJ`jYZ zjoCL8&xeyFU@IVw6$wImqDjpE*XGER)!KB-UGgsQ476z;{Nj-$%<=jBA%>rrq9N9< zj}TuV;Z7ojfF~iBD!6_e+#{S!@`DSC8Bh}b{wRVsE+2fL`ODh&Rjt|&b$&3WnEVYw zFK0h|Ct9h{H^O6dMeuuH!J%%#Ng1$O2F$NG) z_Gwe{zE{Zi4iZw}eLXe{2q`NC`lu){w)s%q^zJPLJ!U=YS>gq=V}Z+s#Mt~Ri`0sO zc#bhpFEzM1k7R;CX`34sr6pD~a%b9)k-7ps7j4dt#`}9p-bMHe3=>T4Gg1sKmxeF- zd$NPdr}o22-VOdRC^yoXARStSrz!t?9SLwb+H|=qPmC`C;F&I=U)IWQUNl0DffCBU za@hJ}2=s?7F>@#sJ@soVtE_+s8U1?!myt3wkz8h+Zo1b}P~H*^vNhX{~y+Y|DhV*QNvo5wG=@!#0(?*qTe}bK9sw2tBS72U!Sru|~$P@tfC0iRIIUxQI zSf8v@CWhHix&v!dIf|>=yo1(qh1C)ZPBWnC8_84^FF_ihgut|G{5OW+KgX{B$9>Pv zyv_gdLiEU#~32FxYUWMTf4^W6 ze@+Ov#@F7g<9@Jlnpg0r{*@TO%@%QZ-wqN@&V0c_0c{@PJ4dVA){=T`Kilue2_9F5 zZX&h`4<6F$&Jr8(op{tzp8WwTqJAf-)#lDxAhslGu~O9-`jU>t<3vj5JEcoW|v5 zIY@EupZ*t{lDQw4)Fu3*qP6{CH_iTsQRBSc-$R67{Ha14k&lZYuJ4c&pBl?q>Jh?1 z3f{QMv?$3pRQP~o*9b|DnXu^a0>K>v5mF=@+#JCuK#RsmiM>S`D0Fx6*@q@*4?)P6 z$Lt*;_%r$M5CjrnaOdDjV`Q513FXZKnMmVHCT|!e@Aiad7G2$=!vcS#jYCf+2@26z zkFo$*KUj-19TI(A!8AZXsOtCl<~RC?nbmdIfpK7Qt_IUjj;J{zny&PAQJ`+PQT&G%L)whIlodn1FQ$M-g@qQds>#{_Y)tnY|RB>692@sH%r zwD*uZJk1XSaZt}aSW+*rzKWVLWga&MV4B&q&Hb<+_N$G8TaRWq(Qpl>P#0=@x-$x- zDU9LCoI#+4p|&sYur>qua`eK|!N8&=_CdsXX5do4jzl7xV5iN+7%)QQV?&#+DLhXs zIj8L?Xbk{%HYF~MM6i^;E(7K(v?D)fE}p}&eOg@vP?MdKZ>AVF68ipzAX_di>a2AW zh!Tq5mYPGkH_W`~M5;w{9-$lBE$tn(ljf z+E{n@8GZhj^;5wGuzsCl0M;++}4@?9ZJHXke zVw5VoLLxl!DmbbsburL-U}TgLU}w{+CcuM1K1s;7W(7^#99!`Ma7e^ADq&e_21hE=Q@m0W3GfQh0qqVszPuf_nHNEe9%P)O^7Bavj|a zYv0-^)5g1;Sft$|&*HFGmNB^4L^~^L!7o2#7*u5pG7sWYRKy zIDe5|f~lRCec;0C-q5v_bi-N_PHidwKUu%4mDZ@cF#~+S*YmlJ9Ti;&N9qp4L>v@^ z64eQTNP)jI@^pRw#68|8J@-qJC{x`*_Hstyxa+|h;2Rx?x5=|D=j^XJ{>tj5`3`Q1 zi~F*p|KiOs)3wEZEq8-=^;h}9XVEgpCwxZX`9)*%@|Lcb7mf=2Ke!&dQ^kA8c*U_j zW#yGy-NvXR*2WrYx{z>5S>)Iu9 z+v>nEBT;0vZ_SicQiIBEZG<*aSqM>!Y(f7~sn*rxQkh8^)rVei_>@mT@jB%`;ZpBl zFs6=s@xH9XTvf@a9=&v!GMa|%Xo9fxD+)qAVk*_pv09mP|LTi6Z(Op0StQP;2)x_S zQ(7zxVajS4e1A|}SAwgZAtoi~BZ?qH#rG>5Z!NBTsio8FY>OhxN(QewNzW>#p>-tg z>>=t>I=65}(LDDodkF#n_{|i);$=tB&nQ^M_2SylK~<{*h4*#;THG#wRr_dHI+LyB zYP?(!otD$67qGA;(Y-wEc6f0yLF4ndB1tmtJg7&;c}V^nUikmhQesl|Kkm$au1mvt zHFn?Ln@?|#At^~})X20b51BYt$GyCCXNd|udy?KRR1G-80C29}fbof5x}w}XVr6>E zPFjwo9!-2mg(-!brh2uIu}<>C(jarx7=|Q~tdljUlhsWkOMJS$P~b=MhC&MMOUDz z)lgj7MKc$x_82J)d>6|bC#0iXU0IGkK9YsTBIsy(*mEgBr|3eZxK*NJBBi|iL=R<| zKJeNSufs}(%EFe$OHkNHC5H8roFN*r{Rvq%@{HZByg56nIvhPPd6%suug==+BCZ=88R^>q zLwM-+B)eRyfqaihP%>p# znRyn@Pgtp4w?QZytQp&1=z?gE0wX;&Ue(6+tL5c#jNaKoNFi)jYtZD*YU?`&O!BY# zbzZ&cAxU=J(cvi#-zgSvAJQY5TNLL}oBpAVa=lAx)<)(e^*m4Ybx+)66QCc<8?<)) zi4d%nT~hj@Pt(gwh!c}wO)wt(Ju&Y?KFxD42=rQ{Q6`@=(4q?7G2>u4m|!(BzvJ^E z%4$L!My@>HcOCR+p&&* z>IB^1W+Gj_Y`+}?A&yWOc{!TiK)&&V2W@voMf9Hwb-!DGxqzA?(uwO`JYBI$0}HW= zu^|*@gG9L@)nzo8?|eb3e?C0&`9<*~^4b^H#KNpDeD~GWzWvCfHNj;);gyMg--+2E!)q}vD$s^EFAbH=*KC7Za8J)xT>M`~oeRU}8J}VE zxV)BeQm`KLV9A}|4Nz7FEYmat-8q6S1lp(_pACfKHZYwZ*lP)pPM> z6IDPIVlI!v@VH4}W__R#*7JlAEH_jTB*Lm)6)Vv|E!MSpWatHQDamrBtSvtdc;@uB zXR-@p12Y3Ep(Mt|C#9Ap0F=)zT~2(nX7h&0T~UHMOrt(3hU0voAvPb|rYSsqsBrex z5VPlveEl#B_HL%7(%5i4q0Wyyc|Yk+S6|)TgE=qDF!j(8Q2k1iNJjzL{XTI=TN^$J z#^<0XAiJEGMN0zK$jJh&pX7X8SA9AcwUbxfdZ`^HEFQrkXkrt_jl1T30<{x3*$o;3 
zH-bFj8zjsI;i^j=5(CYKh8_rwl1+bxvS^Wo!?ZZ@S=EsX*=Us;DT-XlKJi1F6~d%6 z9lc8fWH3KWoAXF%=(10Ox-&9st^^M~uh#54uGDf66^V)j50H!dtucSCimHb2 zT&R+pVOUsbr?lenugT{7g5@eY3*Lfce`RXCx7Ty7+i`qe{1;1*t>GEqBQ zq()komk*}JlC3>MVQt;TLQ}NhqWw+O^%&?SxR-o}{!hr~vf4NxBO@S|KWiQ;1*p{c z%7VYSS#n9ygS%9eq;lxt2Kgk)glRGFmsK6f>;q4j#CPY)8p`+yvFvu0*O!O=-hCzK zy<9Px`L7xwMb>4fgS`oJ+*cyC#IRZ7svpaF;G&2vr^5)quC(=Va10D8S#Ihc>RD13 z5|alVi^F}n%8J~*2>jsBFwotN;IrL2RB{2uq^j4??s!tgFTG|^ju&irZ50~2Xn5F7{S)Mr!hV1RZ^ApCzN9WxjKOz0ed$Y6Q~c0e@- zqah%2mze{Qy=P>~%*X;L$pFONngC>JQ+5`1dO!j<2cR+o(3QdjXm9}p3^TI-PeHtl zK!Iscz(V+}{51miJ#}KMq0Ei?KY7M*P2%c$`6ZH||BE$#5O7?PBI3hybJoz-i5Sc z=-|p2{DKBjn5N5M4%hK`*m%=xxxZxIL6>*?eUxNasugo;WV^OTZ~G9K!m_Z&VtQV6 zY~prpNK@45P)L1d=3HHL!HDXps6?+rkhieo$d7SwT9C2G_6k7)^_a?*a(xMt{g%KT z@I$s*5m9VqNn0t&3{D2JoYT^E!ZgndzC`AQy@DmS16f`UlOTRs2TL?rEk^=Aq5{A? zh|-EyC#fjn+aCDTPitvv3?mvi-zRJM%CXQO{lcEWgii23ok7rvLX;C)jNk$$t-r_p z$FUv=fS&&gT? zKS}Iqe`e9uepu74(7?4C6?ffmcKO&SiA!v4TX|5qczB$?ihO?fRd7m?E_=p+5OTV| zmu`Ox#M(>8(M&_35^*1K0Lu!5w6Fbr169tSH?8f|88>oj1g>Q`EdlrFVAcdoVrfXLrbu@+L(k;aR#V}`#|T-@6qA~fE6D5j+$mgjk-LJ(!PyW zZ?#LvTF_75jdh^U0MZ-l8puSjV2K-OGYJX2orqxKz9;FGs@fNv#9%l4;tLrv3mn;o z1Ym_B0IW~~{)Y9l)w%Gwi2H?Qu-{psf-GA%+mW<$>4I^M`*C4gHFBp?UM*ANIMH={ zumqpV?>QQP6%vZbeJUmP82VPVWdV``(-0CWuyK1&j8O|g%f?zLOZYQn3BL`%hS7^X zJptFw?DiSPs(g7Wpd$8HBm=O*mybj`xF`-_Qx#AERw%^@QVR;e3dI%_09YZ;WIq5a zM<;?%FdMDpyuD6RGR`nNNYp$d6_>Pg?zX!AkkQpa%cTeqqv}D6t3nd&e z@@JO`ll8wHPycbig}%1r^q(W;FjTS~P$(|t>1WZfWZDW-rYQ`7YSX_**MN{RNyjJx zos4p9U!Z3ur6CY4Q+Ou~D%<*sV@J&lRZ$>`sVvTZ&wrl^CkQxoV-_h*^$R*iu#jyq zYt1<%adoW3>(u;>c9@uRVs!!A*-Cr*Xv944XfJaqnv^si)tA;({hsbPuOHd-vWw%T z4}t)!)x_+t$mcYnUDQrlrP)cpUnNa6H9y7pm)|z5ZeL} z6qsIK7FGJWam6{3)&asC9_Qu^q=6t^4N_b@vICZyVnQ^K6D`5VMUe+P5SZwUfm%|` zhuYPa8zv%@~cM5c(cX$08?G12^~IS(v8*yuEF0K2|g zm-m}^4pLF7rm1%KiyA0w!Gz3(gHGFhkD{2xDi9azf?nJ>nV;*Nt~o&G9=<&hQyCU>R*=`=#f|JztS6o}s{$w;FC&>j9Ci@2KYdgRYT zw6tla&KK@|nh(6mKL)IFX+Q}tLZzNIzC};RT({DL24xKOPH9>FW03!RqJWu{+aZXm zY>&l1h~aNLb&hWg{ii+0q^jVe0Im>TCY)4|z!`ZjG#cw4TJjhMaB!9NO}=4edvgI# zx&-yRDWs0r^3Qdn~m-V5t`|x!7l=+>L|B1wJR%@3Wu?&eLlRxGq?m^IsM6m!11X>VSoXUu3wqVxR+DqMFZCjB`h zZcfa+z#?G<092t*`#y5IXfRji&sA&7FKj!IfkvK}C8FMjKkkZB*cv-Zbt>2zPhV0> z1WetebZot&pW6}FFl%qD)2DdW_iB1et;KXu(n6LH}wm)DS$PujNafhiCEv?5@eK zVmzBXer{izPRd&Cq+hCuqC1J9ND}ZQF%!fw4a9QzS1-Yl+%$k12Yr%J+SYEE;&FH)>Ii3b)SQ3^QnGE^PJSmA`q#^F)h(KIS-h~*`c$VIA26<`r%cKNcbzD{CwJwql0v-v&3 z!E#Tse9(#{2##yYuf$KlM!DTJrou*IhvUB2#mPE*A%g8wuShkea0oOULYA%lSW-Mk zT`i#|X_p{GOheuf6Fe>b0WkwDnVRwdqQp~hz$!T`OwcCzyW{h{f7rNy+$sK&YlJ_3 z%8(@P8=k=Qcbe79(4OCvNMU$nEPp3-Z%lG16+pa7A(|N9+S&NzTz_Yyyxs1{k3kY; zs*OFDT)2XR_JD`+`9=+VYd=}P%E3`H6y2>lPxgJ*6vAY$Yx9nv%rdXJW+llm9t?cu zl7E?`@_WUAl6cukp5vr#F>8mhN^FCNYkK6) zRHzse@u_#6j78TggSU%Kq}EwOsJVv_pAzJTqiW4CANrw+hNm0&FE)szkduoJy0drG zr0}UOXOdN1f(v(=>(AXhx(4j8B;F9$eF!2d`-ItWKwK z*gQs|zV`L@J)VyRWo=x3dujJ4SxK65&@$^1CwdwgtMhg9#n}+0gm!KEb!f!YP*|(& zdY0MF)v>v26x8=@{3VJQrjb401V(bDiiHyB;}L>GVqM7{m;wHdG1=4hv3a*eBmI z&_2#}Zoxt>MC*(w6F^Hs>Q`nmP*IWGw@tHzcF<8RJh9Br5p!MR%Gt+S70iBClWKV5 zitcvBlnB)1J3HPW6_QpFcz)A7OEc7-nGdc(e!$r-miW_s&v|@-UR}rd;`*68=uW zC2HkYgqZaP3ZCs(fPM{oBpF!bDl%g`(HH!V%LYPXEOw5SLH<}pE3B3HsQxs}E97Lv zD~`7)&xZxgfv0R5Z?FV zE3_*$s56(Bhh=T^oR=vtVP(j4c#8hgbUfI_1H$IMEYAz;%&eIDdIwk5sn4gbMa?h3 zL`n2p`SkWzY*6&yRD{eiUD?(l&Bw+7C&^3th^SvdvqMBy5=-(us=f2jzxu`wOJxES z=bS>&CFX%&A$!m+BucFQfp-%Y4a$nKeKxFE-FG8K&qD&`MhUYD%=k~PmY-hUg zy<{Cqi`J|a=zWXRNqd$Fn!itO+{(2TejPe+-@PC^bH6E#Ph`J}OE$@QJKjv1XMc!Q z3l|$JF3;=U2PJ^fK0*%kWEoucf+8wScz}I#*6+W5VuRU!sR_J7>i-Bv`4r+zC!dbh2D%ghM3Q^E19pU^9=w>5HhL8DG 
z_-Rp=)diY|jIQ6$<&rZgmJ(=}T-vD~8&6T%WygqA9!t9*wKQN4H2F#^j(>B3um@rZ zrP6+$0}k6y$I~vyz7@9V#Q$XeL2UW){(}XYTDsz&3hDs%2LJ=@EAlgpr5Q;vlB`R8 zm-auJKNjS2s#?xA4>f`$*gYo)ZEt8yB-0fc$0^g%UUIawZYf~NZC-tv-X@X$Ex2R> zgbn0*zs7uku<@V$2Y~(}BYGBABS4`6pw)og*ua>c$;7~f)zp~7$cW7dPe|p288|r>ed*UIE?54k_Mo>fu7Cye}#?z6*m4?*!W*z<9~&X{}nd=p9ve_7xYX_ zWJA|#X|4%Z=LB95zNTUlS;gNT=-D=Si@!6Gu6h=hIZ7>ZO?M7u+%DFdEF0CQ#9huF z+ph7hV#2{o`)0O<*uXU&c|>9Yuwnv(Tl8of!w2;{XhkW)2NgP;2bRFlS%@T>~NOMg~sjSaOBs zf#Y9x$upjapll6Mo*-vIP6Q{Se>4MOq*i!NC24$pB&EASx}_VWySt^k zyT5yHFwb+&_n!Cv$M@W0D9;{<+r8$RYtAdy1l6Gm-M|+=fBQy;6@^Lh*5~I7EoTbO z?@!n*qq^w^`>uM8a6x;ncORy623vd!1v&Ut6L!mBr(n^sxsS4BMfX%+|8ybGV-g_` zVk|<1#co&am9h?Xz4i|=?N5tt2%V`Eo0g{Teeo9ePS?c!XJS-aBU_1z0nU&1GT$Ik zP6}35Zagj$Yz-I@AMHdZv^pAORwh>0+YvbMS30-H7)>`%dCiCAoa+{T+GX!!ZEK|k z9hMpM@vO>R(S9J40k3-IBM+{qLWY}+$(~{VceT*B*5{pFRG*X#oTls}t5vaZH=3Dx z?Jthj>~rVqI2;FPur8U>0>#>@8OD|iHpd9dj|L{kjS$ka9ZBQz7+OHX&0=)80H(2D z_y}t?GJydSNHdob)94S@(`_?H25p$s?m=mvXC^6aSc za8eRT4;@AyE%<>kOfK&1zy%x1|E0aLUupmlgacxDrL0f8=9F>%}ayIJy5qB zi03y}O$dj}QJ?T=iKvGwS*lV9$yTsWOxuktrJ9~Wod|5$&G1moC-%s|==kiR*&9Oo zU}ucjAp%AJc(24mld786WWIGbr}`8_o&1dJI}}|4T5FwY4UzspS>)~e+fb^PXzY~j}-zi~%${BTE>7iIzONPOZFNmi13 zy|T~?2TOPyB!qksAFZaj?|iUAr?5u=#%`nQwc0Sj-jR)9K4(sU&l0@gM|~#aNiZV9 z_cdg6y6VI}ALbxyBn;18M*Q(irBsuEaS0n46bJM9jxrJAh}g9A)q3fTJFWFmPG1%JzY3O98|Ht6 zcOQ-v=m{n!Pw3v&ehULz`}qaseZ8%H!DuY0TI5^do3H(=2=6;eW5y} z>EqxD@@9xLEs{D5+GSvE8QCL&p<6-5!h3^$Gnt0c6H2KJyzNg;BZR+)7JPVlSxc7N z)>pySQ^_4pDQddpq_|WP&TDm;OwoD>uitF=zI!1!w(!_y0}I9lr}dlfZkLp|iho3W z=eF*{mT~(bf?=>bQrcyH8Ix}RW7H>@p#oCb!O0Vp<|K)&=m7^My)kj&tG2O@*w7e` z171Ni`~KTORZle}M_Q&X`r3o_^gGsLr&Rj$o6TCY(Q!oK7HxbF4fUuSis~q(9|`m# zyOlQCZZ9y|dcN8YmTDLpdF_V9^o(R@R*!JVKA^XFFww;8?)^cQ2QeSh4~vFdoljml zd9&QJT+De!{G3$Qy-9y+x^h`}E0BrIh$#?9+LMy&d32-ORN>d~Z)c2a*wD>i1^dUj zw8G$v-l{Z8cKNU-Ix@7*?$O*kGUk2o0Vg@gJC2OO{6X*i9;If~xeu`%Lz*vd~!ARU538<#`p|qX*F6t^6l7euo_Xe9w@%17vOqh}? zl<5s~SYze10@0UF%|0I99<}rmVHgB?f7oH`q`y^-Scayz80ymZ{DRtVE}}a%er_~Q z33NChAeHYl3-Ly-b2<>S5(kyX9)7|5OmCJa@$}9nf+Z4_f4F8B+iONcv%HM~adAUB z73^opjfBwet^B*qc9hQ^UJYCCYV#VrBKAYtGPjPTuA)vHu09IY*x2o0XZ`N=eCk=D z<(vBZWHD_pDmLC5Puo%etLtM`YyvI@k8)Uc{KFaSEe! 
z0gNOD7y@bMOL(V+GYJ`c5Uwq9P=LEt1D<+d)u_5f)r}=u*K0^+<84YV-4-0EDA!uq zay2m)D51}#-@jEbX&UCQUjWO%{4BK+(aZ4a^+ZW&x%`YHHBZ0W+5Y-3izkNmzC5 zjHD+k+ORnS%G-UeAh%H4@VYYBlSdr4sla?fr7H*js_DU-?MKDaaYpGf#{-EDwzdns zgPr@Mi?RhpH7?f5oVw2VW8_)Z|sYt(@_&`2lkcg)M=%l;mq)1rfoyft=Qo)4nqT z|I;aDBE2*|p$$9_FNsoRo^1;r?yV2Ew zApw9hk_GM7*6vjaUK`+y936kusSq-NgB_$g!$f^LS}`9?xMZysfe}>b_$eP@!(316 z9(>gkBzX8K1y>C;eG8GIKmp&W7VgXgI5S6LkMqoLzGrd|pMHF9PB4~!MKnxS?n_#N zh}7>u8>{EQ*QU7d;J2o?HxZ^*A}u@xZF8?-fX}CPy8cZb;m^5TaB|y zmYhhKv9yE)id0y&^+xCmM6w~N_mtm@!|m`AAiyb7P1cBcwL+O>Gt14L0nSK2t6bdb zxfZ-IJ97muOqxtAL2TyAEFP$+9>5uynDOSJKZ2D+qJ*hwyqae_2jGk}Thn;(b;L?a zkv|)6Y=$79IALTJr%9f%oM1dkP)AEX=yFahVEdsc;g^$?gwfsQXMQ=5U#{AE7X&a< zPLwU#i(wz?Pj?lq-<`Kdj6#vAaLx00IZ>2=`$i33RN(FsYoKEfyhMs=Z(KbV?){Y_ z1b^YU29N$z8B}ZEvAytCDfgY{X$E9IS?})iW_o^5e%78#^IT7DUQN3HTX+#Ho_Ojq zA9>d4i83#39=05DsMR6I@hrtcEHV!igNcS$v}p}&ACHuLsqRopcS9R8oGyz5X)Ho)taY>3eWJ=@p zR4~_REPN;GWF=pFO6YUsS9q=k8gKI7&o{P9BQ=mM#XqAU=A2Ct#gl?KBN68DsEYe0 zoFqLzq#o+D8{(0ij%|FOfmD;*!0n`mh{|^13dj8q`LH?-I~OysG&5W_bYOTp)S$6> zn=>Qd$uak{)z%XejMOIUM8NOrYV}c5-qVK(_(r7a4!gX9gc+Y|$rNe?yMTKKsrP|E zL|h4Z@LV^iBLaaUAnbrc=ip1BmCHvxk=z$0FY2U%7nQSLrP1PovD9nFQ$I|{7V|LG zZbSz;A1yA?s;=AJ+w;ShE!Tk~4-U|m9TbPz=H6}&ff5+=0m{gZaN?9DBf1=yW2M^& zu2x|tJ;|0x9qC_mP}WCYyRbd0v0sqj_c_bYYvm_O0zcM3K~B=m9$1 zIQYj_*aTtz=8qC=RbqK~=zaW>tb{duDYho4MKEdeAqb)V~ zd8>n2^-=w(^8H(h(scXQe2eccI_za9bXX&t?JO+S)_BvDx<#(q)H+rCWz#r|Y?&$~ zi6qCGk1R1hE70iC1bAk&8hxyS!siy@e~Pd4I;tZ}_#q7&`kignj>xy)2^1rI9ZHw}grx%3g2cKYh>#@OO01kjH7G}w z!^QFg$SqX?MV1Q^W-o)*c6&*4=jImaP(J9RJ@@};)f>u{J&xxx{H*X~XU)DQhT4t{qedY#SKsDf6u4oiEQGT} zV((<~z$1bZM(jk|+Ub%dlsF+BZ^6&t$}U_sLkPLY5J^)rJLH|p{P1UF@r2L?%jQ!t z#NGradGszSI2q}eRe1sO1=T0mZFsjT>_sVdJpHsd`7$=e`qTH15_t74FS(S^>}L{) zsAdyb_9qS{R%C?k5el|%X+2Gtnv=`2rk?XASHhi(HlD97s=vU_p4?cnH(wThG}Ms~ zjYxYUMY&mTMmlw+Bp1fPAGykI-*gJ1^jjGp_TO7LqE==JJlo%qrc7MurF>P4wMwz` zCFXcYEn+G&)mAtnUPu=dn2Y;hJo;jX8FxMmJ@OMML1>usvzt`_RUh%0%RYhj8>bUp z*tefwlB!8Uc{I34`s82rw_Q%ATc-}iDA^ilEv6cna_LVsFNjZTIj9cS!lkOb@LU=Y z35E%Q5xWk>=c#gtMBqCy0;r{W%U8tQe zd)sf`I4G6_M8mgZ&0DO6N)438h%ZYnf=t2&NkcC$rUg1oK5begyUH*ulw?La1ko?;Dk-XvgrzEFoH?AzatMtdLSw@NTf? 
zNqr+wIerXsNw{DM2E ztAj}WC026XdPMWiQ=v zO|0Wq#dk?NO(G|;d0&A;hfo)hpv(#`K&8oraIJ>ujJh?Y#D!s1-6y~2Qsf1Yp@E&z zdWlyBkEx)3iCK%Q)$J*d3PrG^E|a*fe(0j~7JXgO>`ppc5$&FI5mP+<)|^G^7Hw6| z>bH;mR1po9tHf>Z+uo;?07?IOYN#{?Tp$&AOmJ8ySKIqcrBFE;I9bTkOdBK>&&6EC zamZ0ZXX8GpVOwaSx9}7^Z8y7X%4l($Q%A?>(6T`}#`CjJWd`cKM<1=y`eIqB9@cV@fY9ESL>5PKol(!#!+m)sttDim^gG9cQRs z%m}u#ow*BU8({_PjbksxsxVokPY)UDG*2$t<$S9+oX%0lq;Wf6J;_L@endUyxw88# zdMI3V266yt+4qLp<&5L072m6JK6jWh)eT}qtuIzUH)O6jUEkQ$ooEXxha5og*^ZyY z@={gfTzqvR^j6`UbUIqy6+0qP_ug!9leH>&j`KNpK9-b6wL*>E$JQ;dN<)j5e|%QoEIDmC|GRx_#;s??m^93*hELkn=qBC$$ts)uxQbp<*?CM}S#%ls$6G?Ju@ zVaWdb*5i`L%T8|Ix&L2Q1kpjF?X7qJZxw+JYwRc8vP-sDVge%Bw=lwc98|WaxczGx zZ68}Lk5cBmGNV*_&oAEZG~zAylwev`K@MoOtaIzSuB|H$(2Cr-xjLw7*(BsH=Xi@- zCC=uOY+}U)d-raRBU&dV_3iq{uh8>pt|R22SiNf(t9SEmdnbI>{)Qpm^Z;-^+wZIc zNg4Sakd(jNB<0t&@&PY>SUR7SlSbj~YF6sCe^4pm4D5f#y9TLIP;wK9G`w|M4?yy9 zRg)K=l@+zGv+~zg`H^79g0;QwJUtz0&NjTk%N-hb(;!XUte5br$`tpv2x>qG)<~G- zaco1TyHf0hTWwx_KhpAxX{NZj`$*5n*{g|HwVTaPE!pm9I>fE5#7L3o^(jWqQ}EYI zfaaKe;bxw@tUOO{OIkDXffNeKZA<8jtG-ya^pm|-0{u8*w)B3BKWfKKz4AmMa2hN) zlNt7fEc=|P)yDGJUj^fzSvepx-CP7XIl;#{*3l=1o2(p5_GiVSp*b|)66XIBg|Q8rsu8R{8_s~qBVdHh*ew1 z)$p2xgtz%8_;4#fQW0iB;bIcExL2`; zOs9A!ayNDL^xTqCQK}g4-ka?SA(Mt7X2HZ>PK6@Q07zidTMtdgyue1xteB zEbNzWyQxEBaOz!OrC)nYb@P6C1`LvTh4!QF^e{5GXK;~I76yNIFN`qca5=03g4VBr z$BfU~44aPp_C41_4v7tMHZ9u`92)@&cR55zao>sUOA(ki+KsO|XNoO9UrGSZLW}=Z z7V@5F`TOKwUQI08X9l+=8Hrb86W;4zYk=DJ&Ak}5b%Bn>X7nvWBvFE zg9*@6vEVyZT(X^_&-!P)tQG(F*$LW8ugF7j4BNrvjC7JJQ-bp1ti~^V^4@zpOesZ8 zP0ZQC0PA=lISCC&y@qEP3fYp2uqu#bC%3=V~pBJ7+Ci_lbsiqv!_LS5nMYJLUE z8kfH^f_uOa1%muzmgvof>RSEiWp99>1UP`8gIqx)I_XEVO{~=4Xa)s zsAkPCNCb9!swqI(?drnU?NcexKB?O^94zk0;)&Z}r*f8PJxw_(i=86Z`9KnrEL5#Y zFZ^kg&To!;#6YKBoz=L{x|p8_-2O@tnKc7gbYZ6fV`kae%%*zaK#^>P~YdajE zhU_2)S2LqopaO8!Kmr7gkzBujC)19Y!33nwpcV6a<@L&7;<~gi7_=xRl$V|hg_QMu z3=Et14MFpu4$&3A$|P`VDh;;JtdvDoL%w{0n=tQ6L+Tnax4o`U91C3rK zV&KE9wQ~}VXsxsCctLX!U3wdw+vdBB*4U7rZLQZ7Ty=3Gyrz~!ltHAgS{MNK#KGOv zSQBt$OV~?~jmBF-vJ6@FjB(es9n`UH(GId5V zXPiMt*;7^8Ml52=Mr@QpXE<2qWS3F$rcQhoryQP%bYO9vxxp!XPfsm1W``9xR2I<6; zyJX~ab2jG_(H|t9OK#vH6$mqsLpyA6i)Ix)^7lTEq^%<#$amaSsA&g z8L3Q+y9Pr{aXcr4;K>)n9HZvzse*%5`m7Xh-3rs67igW%uq>+2FLfvcEaEtvy>trMBl%k7c$|pU3Q?xVG6(y|fl{x9 zex*ZncQ8>Ss(=%>`Qz6f%6;nR^ed&lAY)q^aOn?3W=aV#qb{)f7%rR4`Fts6fo!Ss3sL&BKwzP>Ao;lw zA^mmYb|v%3qrD38f9~EmlsJK*OZ3-uLR(@;sud@=dwT41hgUabH5@Rwq<2=eCGYAOQT^d4aeO=Qio!pDkl&Iji!ft=j zJAnMTJv<1hAe@1Gv3Bg^>%gK%CE!i*C5LR80C(8bf(U7&LUXA6Jehv5|H%{kPis-# zJD&tpOmWo@X8x=%{q_4IBIYv%5@o_)?lI%KW?$-z`uCHl>Kf8R;)oC0mR+tybR3+f zEATd*OV7;dbNjpOaMav~^AtNfRj!YJAjgBz7I6H47#@r|qD&_>b^b_+Nn}Y)*-c{L ztf2f{mkomva^zmun80DnZ_?Bed?^UdP+{nh9Dex-eFi6C-=zXa+|CH}dc7oq`(he& zhr`P+KcW_NCkpfUNfc%{CLewjCU9k2jX=3|Ueh19!9kFwbLzVn?3qdl|57+uxcpOc z`+f~)-sHCPOf7J*#oh!PE9nic3a(9cSsrv-T-dUU@Cs%e-U(iGQrX;pnKCRJ0vEKj zDb|c8&s6s1U5w|+?^c-$^!x@pi<<$ zDGw4pobY`gKDFfFkcz4uX$2xVS?%2p8fEsK1X0*fZd_D^TRZ-tlZO34XZsWRzC%VvPAm6l{K=Ms1INyZ!?&_iAD?d00vFjCgG|ZKgFzshc ze%hZzN7pLv=Vsh>X$OD}uL^o<3VV{)@UE-1zza?XGw6C4=Iftx0PF z@MfK>>+=V=`ssgzf^{J)N#e*r9_(gO;rGx9Ro&7R{arx&;6B;gmJsq0oa;e!?sHz+ z00bsDeEg&tjU~s!bT%ruEP@15v7>%M{WMTIc%AAq7mSwb77Z*I7)&5|Q%X~3M@wgT zMBl$SS)&QdPxYR^^i}WaE?8Z@5@PZ*iS2{xU4Oyr$Hy9b7&}7fqOW8w47F*w^2GOG zdAZlz`NQQaNJ9GwI5Ts;uK`}B`)~q;d_d%nI1`lg07w={quL4qzqKxgM{hD9&{#xh zfW!r-bFO@=KQSj6NPO-8*>ml_*`26bDV<#HTAO@t$F3-r=2YtI1|Sqnou9iK0vUpQ zXj6Z+vn8UhBo`SS(o3!%0KH^SED#D~udhQvG)#V`uxb6C`)2E+J9fKwaf9XL<#xX8 z@&Tlsjw(!eT-=ziP(M1TYii3mCl@>56xqF%BcQiHVdV941Kdux^GXZQP-%&CbZ8DG zg~J&ly%HJ{o&fI#kMb|^-gYWAtpNbE;ib5?8`dWyVQMswmSf7%jDc%jW$!rX4z$xN 
z-5ZZRR-AGG4um`9ngj8X(m3pxvwK*48m7Y|9%U7ieO(lx-ksP{n0#Se5ZP$}K6qv} zn|mw8+2(-uL9H2>LS_j$`57}uv%qAS5D+*~gY?<}P#J{9^n-!&|Kg8Tg2e;|%8ZJB z;HFJ~;-=SRn2OST_O2Y?v510e&1V5J7W|I88Yv}OHrNGPc-|;=5I*nUwb?pX`ByHf zs%@8S;1*&J3FZ$D#;Gnn^+|W6B11t&QhHJVT?qY|9%?^oKYhV2Q>0=9CLH6g*e2JD zALPo8WCy@m0PmQ+UEeNr4d@!=G>3y#rBh14ZNw-8v21mvSnzd`cMbpWR89N&$s<@{Y>N@*uu>(b2)60b?# z#jpmGm&X-y37wFJ*RhtT?6@)CQ4GS6sI_IEmtDtehc>2OTU-XD;k_$%LFOWJ)?A+@ zCFb=`?d?Mp!)(}j0E_!}0d9$A_FGJK(caz1OGbo9 zIY7^^w*inu-aPVwUqPX0W*)6!n&xo@!wL-+Lj(N*CmKpf|F(?gy|ll=Sy6MDB}~Uus;J zY5&~VvS$M<#@6+vedW?m=s5pCS`pcqHv)y&Q`Np;r(x?kfSU&}1Y~h@i~cA=5_ah- zb^_WDj@rB<36<>O*cp2f+s0>r!WhBH##Um1Kzodq%sGNhAWnBHpBi(OfH^O4fe`WU z3WToD(LY;MESE$ef0F@yR&m35PSC%r6T0WtZ+AlKgu-3D1@cKGpiZC%>V%aI&j-6< zDM+cX-#Nmn#}z*M0yj&L>QNDau?3dv-oec7Lp{pz6QGxP8V*E0YWmM`L;dulUj^7{ zpjvo2JInva$zH#vv@*&(`Z15=r-DlrKe&i3J}J6ipcSdlCi- zK>CnE&#EGeBX1wTs5NaUU}tp%bv!+FJo@>E-UBTqsVY$CA?8CM>h2``MW=gw*g!-q z4qKESwkyVJZp5ZceU2svT1t&Pa7)>&2DFs-Hpu-z2@h@}!pa^^+^{3Z3{unNRqBbb z4{sn*D}PdhbkzkyPrNjgDZLq@Rha5Yg+jxlSl7U0z&9K}+2{S>G`$GMfHs}62iFSefc9a= z+DOilE(~(~gLC{%hRRbE<6#5{UVsq%krJ=@W{8eJu6@+^Cua!ezNoVzAbd0DZXg&w z{4TW!PUCibb~iaZ%-n$u)J)5En8%F&8O*K!yKYpB09``(K#4^H07}Eb~=P{%UrT`CUIH@L8HONM6jG;A`Wp`*pxht4%LEGz?sRALd zI;lX{m}F}&?&<*V3h$RPGY3#+a>!{_#bQH2X`0-DC^PA48R_UW>2*Lfz=c|xn%deL zv>+A`?F$-u8YX&e;5INCkQTj`HlvmnEh7^h3$q4`76UVIUlsgXVe&U(G@( za^um-qfXI4>QTRlh*~6szQjdm(HWz<%DcJHQ8aX<^&NCk34XK(H5PLjO!Y?x62o$3 zGS;gDk_UAmA2S!5R=F#RtDMYKU9=rYrj!^+k<=v@YRseoZD!$Neg3n|{AUzG-NA6* zGWy&nhO^*%)D`Y{O4i#iMyrma^39{RMgbd% z<@z&OWEIhrgyi?EcfTZkPvUdd91U(-fWv@BYfzPo6qe2^%`_833ruqRPPz~^qDhF@ z7Stcu=YSD9FMqM7ODp>F5Jw10-&|^(%_1duzy=blYXrYu5^R8fUJBeQ8Emm;@6T^es#j|&@VN~mwXGrLraY)}4Hkz=0AM$#rvfl$CJ*S%{{ z@@bw8F*IMus*e0TvC5}xAvm-Y6bU)~G&gkX>TS#N;M*NnR}}7%O?W4s;rC#v_;=9- z*W+I!AbIobOeEV@`r3kLc?KMvDNPAas#TY}1k_wnXHxmM7O4@w4@Zo>8eW2`y}LvC zIh)cegurpW^qqJuY$Oe5M^NvrB%v53-AvjBv_a4?anGcn3ucoUh@#y!(al?#;%76=s{VUPbP6jMF162__ z+9zIc-YPuj&8OVh(Mx>q+7%)~M_aB(F9egEhz*6! 
zpBkgnW;XIEiQGmr__tv1xe=>B3bAzS$iRBX;~r-XuPF^&xf0qOOZ=a{^Hg5FmMKC8>DQ(7#pLij@DG@G~XT^1tX2nh2dfj+Z11D?hqJm zshhSB`*Amos4ulauX%ZqR*8{u%k|-Hn&<~GQ4feXBk(GS+%0f(@)3?(48u9NTq${h zsTK3o-Hd$`#^#>&Ynz~U<1Ny&zMbt|is|DeywVS;acA!-*oAQ}Na_OkEtLz(GET!+ zug+AEFbksI-Pzgc_I*n6VZ~>A7aMcpdtbiM=i>reOjHnGVS>Wzz??ehgMB#I+bPr?52@(`4-+Uz4YNzdi05^ai3|;g zGIfP>Ci0V$^$cgDx- zXJvV?c5nXeB}AXOzZJA|LuAFcG_48F$+blcY!HA+kMhx0WE63Tph;+$Dz*l|0eU*N z|7c0CMUinS;JSyu%jTDIuKXIwa8?HEGrN)h!pr^}HHNPIwfG6#Xnpvfs4@Q1XHGfB z-RLv7YQXx;-poJrnI|hzfIc$-qR))E(Pzp6`b=90-QEG#K0Ljq+c%s4rrX!RbbIj) z-ELtFq1(wJbbGkmED9ckZjS-z_FORCKKUozj(0=1$Ni++8-AnP-vV@dRQK*ry1mJx z3{1D%R9=PrqT3Gux_y`$pxaI3O2Bk`ee+McJ?xrpS9w__wiWzAG^p!0y1hl>IX%F* z2j$dY{-E2p?ZI@r+@Jc)pLEUJf6!;Xhf7ucul1R)-o~Zcr2en;ng3ibyC`2%g5Lj; z5=^|Z-V^;7y8YAdbbEAMe9bja0~Ti9=*0jTaNPIL|H9LhZ~y4)f6(o5|4O%?Qv4&` zZku`|2ATxKK)=!La5r>2=QZ7q{5##g0MPA@p?}luHkBtKzv%X1fNoc>;RWb+rztSq zj)ceagKh_h0zkLBUDNH&QCpTbbo(TPZddF4oo)|{s~!a=_+8WO1;6O_8ZAYVNY)SL zn3lWT=mjK=>nNL?B8a#ZC}J3J%EhMzT`yqy7^VBmlI;*HBrKA%Np|m_~tx zy_}HC<9^Fd%mTj&d&{nZOmP22*oy%``G9=9+AhYJb=)-w#MEhmNPrLS(F5q<0+>A- z;sOrj12016y_u^n(j{FCf#rT~SRRdVjPTjv*QIqB<_10gk=u3bVJrDs{(eWh+E4oel zlNBWgc=G-Qof}rP@`U4t6`igEv!cs`f3Tw9qo=zx0K$s0ts(zpMcG1izN8aA4x=T- z)220)>iAf6mOx}kY=UvZpv9{y*~7=4qrN1FJ66(|Wql%|pFujzxO;DyDQ@<@U+ENk z^=f^q@7(^94Z^$%ge(#5Nh+XkmH*72PV;sVP+KWq^ylNJij{W_=|O;CFW61HEQ5$* z@NNR@GR=j7Tk(eqz0&|IA50R}`qw((aC`^=pRRox|4;|~ zcRq~*O9I{haUF22$~>j`M^)w@iJ1DADiaA%W!ivMne1RyriHpTpvnyXp~_qURGBA~ zf2%S}<~nodf2lGL09EFJ2%yR=>4vB>-Ti*5GDQH_#OvCy8&xKdiuZ0*nbSdFRVJh| z`K8Ld$;H>IOvP>ulp9rMjaF}4igP$;#>E^r;`DVcE-#itp8_j)S8-oRaoA?5FD2XQ zO^2c-L$pt2#Q{3irW}nJxJt$Lz|9l?93O6R9ZlxgNW0Ret;)h7zd)~$?&lq{I1FF` z<_cKGv#E#s6Z8UO^+6EO>nJ@3=SDKUJl0!O|4-7V%ms+o=6C5+(RA?-$_vll_qRk0 z`LcWrtl0G(0w}Nl5iB30%3^8p;eQ3o?<}5SU&HdwS5;tG9(*wVg5}#Krq|Wle6C^n zEMS5H0L$kBusm=uK8ifmxJ*0rgHue7M7>pI!$~rvbhib(7&E8_ijc*4h%;8AGqY~N z#V{CT_{5%JElMKc9QkWhbIGH)b>>6qq}5IO7-1J2Vb{ei#3b1O3|!$^0^CTRZsRL| z@sK2%4~6IKcOI?2G-*^iBB&1kvOwE4%8#qUHqqWY2Gj-^?m0vnos<;cY4|%DXE_i` z<}Pxztyi>?(z}9mvvQkv!>gU|vZypKB+$-Gs73{pk|N(g?}n+0FA-?(_j`u_oz z=U`s=e+kR~=M1Do^q(>i&c#2P3Pk_srh*^>hF@IWL(9Z#nMe?zVWV;cn+k%jT|Aw@ zMKw33fLuhYq~KNY(Y!~`v!lQ3eNXC0aHN>#8eP-@Y7Uqh4V|$ zlYcGfx%gYqlWPeGdd6=AJ>cB{2zn&`7W7!;q?Y_982=^c(f?OLPthNO9-050pr=ad zy31tyNyU?>o*5hdA^4!8+?BJ>BsjlKS*PAW7##g@(UqouoR-aZG|#W`uEsSi4!GTp z*#WnkJ;m?Z1OCyPb@hti3){o#3%awd^IE8bM+;mFT>Z)ruT7lZ>Pk}qNUk&Mnf^2V zc~Rm{4y;D{M*7aIb03f#sOi0rV-TDFPa&xz>UC_5uGpc+E@Y@m7l(|1)tD>BU~bh$ zOm#v=5rz2L5d_i1oW7?1p@{kaTT~AgWu6�?Ji(fxpwY>IPU4Be1rlXuWxr6eYkU zq5n?L2J=fZ0gl95iS>YCuf#zGTi;D=Qc{12{VpO1B|7^1ylX*}AqxwjRj`1jo3naT z-MaGaU;gc2Wi+r>o5usRE<8L(c6&~UeSFH$MJ6>@YMDKY0=bJ9Hr3w~RL(UOt5Zf7 zQcZ2B7x*tHI69YXhX_IW2=oi`g9j(md6WRFN&{w9KVGw{6?lKMs=1Z`t2%zes5s&cgdWK|y%E`nLr z@tsTgU##jdz^ba(xB{#y(Aoj4DiWU7Pgd0n!m9S|tNq?n{l=@ zMqZ~qjJD&u_4LsdA49Ahz!ceDdoU-{eInvAs%|C!v=qYOMO(BI#Y+D!AqGsrEdJNr zlhGY1L=|ga|J8Wl`R=v)of#x`4;6mHDRvAn(`|lbTlbYi1X>U9sUO9HEl;FX5XS>F zYs@(<=6qmTUB|zZ)%~^$!DaqDs3tpt{CVIK{9hp!pNRulFqiZI*_H3*Kgq5zQqaJ3 zDn|8;f}bxg&JyV|^);RP!x9E4>yMuREkEJ^olX_Lku3v~7#21Now}O)&va_c4V`NH zC!LD-gHC0K(5ZaC)2U$qow}#^H=R1Zvnc*=+uix0G)bh51~^X5q{FC;83u# z$GoOfaZRhcZ|GE82%YM{uvPLKo$7oF0_fCv2%XyH@RLrx`sE3OyYYlM|LqAI27AJc zZ#-e(-2iyPQm;K>>8ED1*PbwzVTe%ppFCj${|`N3fC;l;dgzB96C?P3L`j=6+6)?? 
z$AoC=RU1||!f~3{N1A6{X}~Tuj1@pS=7kfs(h4IPL97%imDtnmUw57MF4)M3_UVPe#vVWy+k z*_ebeapgvZ`u_N>r;_Bf40CgnY9eFm+mkWInlMLibq{`}k>x4lt#zC3Eo;`vvveAH z6YR}m_qR;5l)@~1o!(mq+ge7^&CTjqP-$|RB}}*8 zNfAJi5ko`0yDR~|p^Xd*s-*^u8>GS9f_hieia-VR<_-oFR8RgrC@3YlTU_nW023CL z5Y%fzK%B!RI(6}&Mxr4(wnPn>b;)Im3k5|(jROT0{~7ohySc#6fm-DP)Uw-FP+!M@ z@6b~O20T2^go@We1HRpd2r8Z*0CwVc%ICCpH%*V72&fLu)8GJAt<}=4cZ=G#``8g3sNRx=kJgD8i9YxV|Je-br1EyZn*uuTVH~;IK(-|QIew<31V^ zr`MpqQe<@VUcO=(ltP2Ap}*Q5vBKAJLA_Jw!;trb)pSH0CTXZaq{?w}yaj`jh@Uo1 z9M|`Ca$qXv0Fx)_k!z)(rf6JfL@^lnDU- zlBbaoOVlvPiP_p&!0V5J^)f6FTCk8

2!tlQAiZKL6$c{OGT1JOr0UU2q*IPyxd8 z%N$$Q&t!c)RysM{28)z&EJOp^?3E67up))-EH}M-7f6Ef+a#|peMz~e^nKj!K&UAR z0UrbK-WfoQfUE$-3|#l50n86$0xq|v(R{(E0iva+0kLS%GiozvuxRKozhGeohIBE4 z=;?sbUci0WS{g63X&G5GHLxw=3TilErQk^=^hnT^A@uYU(_gsLBR+FS{b zh=_7Hf4+)p^-Gzfqct84D+2>P(Q32H4-r{sI`>Vy z_obsA2EJ>1c!%cQE%xp=lWp|m1D~)L*Pl|nY6*jP%ts{lrwt6$bEL;uKJ0~&>F=z$ zql)axEFJY)^z+IC9ZG9qWxI!brZ5_rdZ%VQARkPpab0-f&-o`R`E@?reP8zDbF6jE zJi|{Sp6VrV;ESZ+>Dy*TL3EElu+P`|q(u9kbZ9|LCP72Twg->LXI1|c2KC`H6xR2a z^I2TWUl^aY4Vggg=+N^!HKBa)&|}O)=pD=4mp}hbcL)8vQ@}Gl)Ribjjn^_mF;gW| zvYo%4%i*qAni>ZW&vz*C)8$2>Rah-PJ6zusuabsd;a9KSa|b$LWqZ=L-DcSbQCcdO z=dEYCvE*QwhYg+#2U!sd!b7{xdPK~;co>@@k36ShK#c3t zEfjv}+K5!qhj9XWUgc)muXg%qp=pmEy-0lot9T3_&RvVjY0u;n=4Lweg}bNg*2rRi8V#Mellx8I!eh-At*?aA{br3}!#M<5CfW|x5+km|F z_%hxNbr_zmAC#_)@@}09%lT;7 zeer#AE`583*wMmLf0Z}5_bXvqPU!<{`v$pcK=}pTE}LsIB|+Ce@`|4Z7VXV|iB1G5 zi6q}$rLTAT@U(^TXJq0jwvQiqEF-?7v{g2DeD>h>ml@Jw!X|D(98zbwRCdD_(I-!t z?PS1FjPM2u<}p{E*&;Vjx?@mGI37Mvo34oXOxwwJW&+Ifh>vn@M;0Y0h{ZFD=9zVBKVL)!z?DFAl%9J$q@p<|z^4MEXEULd_%V%x$eaRP)n2)v0!{A>l_n-}Pf5VAl zI8nGfO9kuqnxpf(ayK;Fl3&G6RjOyw2L2IVQ9&!d)FzmYbS9MJ9Cy^&9n{w`cD|QxUbXeIL;k7AyG^xeHjT&mM#Iy`k zC8apv%q39Z!G2a=CcD+q?ooLpw1?!`7BD*&-qOah?d&S%IIe1ApNjw&|Kz~OY0PNs zka~??Kdb-y&SsAN$MyiKxxKVtr8?$$g*!r$O=BlL!)eWyn8|bQdvOImZ==}1=8}n_ zACgahnp#wxJ{S(;sWN+*jng>UEiyO}+pSfx{?v%8tY~^#Lp^%Qm%652wCm}!PWH#l zQHCd|TT(%6>n2?xE82$5_BjeWS&mgTeV7SH(^9#ejAq61USUsA7Y$qkipkB_tPFLy zyYy?6SI>iG>hGUP9L%tdW<`B93__b#w=X?_w3cD_fteLw-Uuu%{?IrW(oh^f0 zPBQ3oX-7f%kJQgdtQ5*xtzPQm%*)^FV;Hk!@5i%ylwDxE3HROK%E_h;#%~5%)Ox!} zztbxj@$yJOmq9*r96pWvyVIjB5nL#6n9 zU+h3Cp|Dv{skgE7|FHJfVNtzX|F8%K(jiC+C?z!s3?-?w(j7C*j36b_NVk&GElNqJ zf;7^NARUU7Gy;N1{q~^WI_EskbKdKEg=?5={*cAJ*IsMi>;6PYahY==$K^|kmHajV zyc#Z!+BNOA=#)CAYn*ZUN)IbfzRzrWg}J@D>UD7QG+ugrjNcglIaP(uc>Ybh;MXB9 zY7fr62!;E^!bnrl!-Q6AEFA5YmHa;VNGf?>)62uyh)~S~Hg3&s=G0EduT5o(LxE0> z(cj)HeQw;Sy{ASVHCs)kZYgN&OK@4;}ms>{-ub`Wh!sp01-d)l1GYOM_;a&DRSX^{O zZF)9DiH-;}`$hj4#7mZ!=sweQ2vTKf4Sis5P8My-EarZ3;}O-7>6-(&uLC{%ueMUm z4;t@;Sv1D;K+ca74aF^$g_gfrK-xfQj%FE!t`(E0L#Ls99)+RFZ*JwdTo9*W6vKTJ zAaoX{s2D{BxnEG^3qOnzEjfg34Sgrvjb(X9rsi`5b+aVtdH==5m!+ug+JW!_EU=`y z40hNT>BiIjh^>Yqp8Qlbl`4>k;e(^+(9801eu3HS)}_dOMh0-pT(b(DaPAF~38jS4 zY2x{}cy-vpHFBp+C*w=3wl=aUDaOR7wJszyms=(ud&`wd$T$o}yHTp$Fs}6+M!YsI z3wcSew2)Ww_2DDrbe4cowhh@*VpoYU2^7o!RML@Cz2L6q@%Ie>gmn+Lu&C50ww|Zf zwE_Ad^7e6Mj-Lybu8txhN@<^6hbLF0ltsf`NXW%W;#U^ToLUiP0%;{k@nYbTka#QC;%ATQFtZ@X~&#bpTC>;JO z;sL$?F0ndMT*H>vUr*yzz(bz#+_h^qz@uhp1pe7Wo*)_mfx%H=B%r*Bfq@ZNq!|{B z1|IsbaNvm?3Lr$G!caiU6ODjlVQ`Ev#tezT2m+8&C{hqm`oym3;+ON^CFi@*@Z}pl zl*e#Va1RBxw`Rq1W4U>=)t^~UYIFafTQ09FE!Ix!@ykYm!>gUbm_+KbL|X?*7fX`N80K0 zTtiDjoQ$5S89#5mD~2l7Zg!YUdQM(pd!t?u`Y5eadfIP|BNt1$uB`vea16dH)$TbZ z8xbtH!6kmqhc)?gMU;QiD?)x>CS<$a)lk|2x=uE^bI3@+cOUN@oELA6uDtX5WB%y# zq5G8bjA@D@HgC)Qyq!wzsMCiOno=1uZn${olr-#l`#)NvQhoVSLL9YyP3ioPx*e&C zd)LSA7s!LucGp=KshXAM{U8o-u}e~p>~arSrLVEbOZTyPT1LgQ{CDTA>MVi655 zCgtZ%R5lswo-+%`e&>-tDQ+kj4CJ(_Be=lkP_ouZL_q+%Y!SiR8%?d?l0L`i@+^Ve zP)zl1r$o?vO`^VR&elpQsa!&!^})wCy#$d4{#)e2hcV2$6L)W&Xnv}XFqPr~ZSK-Q zZkc0lQgY#=9;?-Eo_IZpB^?xnq}`o?$;1qaOPcJpFV#JPt}D0H1{*B4UQS|WJedi-(C+`+P{ds%6p!;L~$!p_gul(ueIgJW48Z#=}@dN@pNkRTCvwaWg zy57%n>Fe{RwKWHbt^1;O4#euw6MP#gH_J+SC{BI~Dm&BhMaDWw#}z&1_wHgN{_9B}8^u*a9~PawrsxmRU;8aE zRlJvri-=x2D4P;%-a4;H_0Hga2D@s=J>NX~T-ICs@g<700rv4-NlH1bw{-0^LtlmJ zDG`rZa2RGb=Is~VFn!NR!>yDwsgqXa-v>s0mLz_h$TpWnBtxl_>H7=> zCUEI{%U}9PD)xhBq{+7T%pPSIe)e|ASYDo9&&bR(sV81e3MO1kC9}ERcXbGRJTGk3 zM}1?+ow$o?UY;iEenfgQ=ZCWAdK}VUGv134H@;Yl_)va>ClgKYJ*dw3LU!tT7`YlR 
z2eP21V0hULwjar&R?;byZ1Y5*R$%@@{&kb%mg6d4xy_lDr$L1ACIRQ7c{f-g^1l}!H?1eSC|vF1ZB1l4!kvxG?m4$l8HCY zDP$2J9KFA!0%Mg(W4Da(c)YHcz%Q02ZON3WE-EH(l&YE^bq97%@Uwee0uk-XO1upN zgDJ|bjlbV}NXrImZGI zHFXeO0UN(}PEvyE6~bUXLkUXL0VEaT5gq3TLI+tz>hHU1)GP8OVIyMp0# zL7{{1mtHP3^SXD6i>1d1KS@{MHm{F>ePf+Sx&_h<0Y4o)aD|L*$c-c6A@|tL#k*cH z{1~~srStMy0j<8-SGls&Ft2@IZ>PKU$2v|GD(*``^d4CkYk9kfKR;PxkQZ!EAinPD zx~=jtUb-p$x}^Ju8K&=T5p||$5Df;j< zv6%7o?@{Gx^YSlQg+E4>N(hi_y89!Mgu?mA7g-S@v}sN+KuJ_ySyTv5cmzP z2>BdG!jH(~olTL;*7c{Tse%P6o9@ng4;CO{s<++V*xicXu}g+Ae=-bB=!D#ly*-@S zdAp|Ure<@j`|F=8(*^2U06njYipugO!92_>I|N71YhMo(un&IweYNb7>SIYQzaZLp zK;J|2I!@mse&3vZX5a2e@%hEdpUawVWqcI$4bxHDa>iABPeWW@zpbl$d*g5V9yJDG zZ#MJ{&u{=A)R_&ZXZk*fyxS8NAdQal*UBjHwx$KshCG>^3EsS&Je>9Pc{ValYCGVcd0D7LoT+&Vhr1#o_M&|Owh3b|7c_v@wSkmd6pAcvAOhuDj zw!iVBf9iRfxY9#Pglm}p8j-&st@isDcJ+{{y`;rd%ko}BhL^D)p8#n!nsW1hl4oKD zPLfh8BL-5=j*`+>Oh+-elcd9u_VWJk!T;pvF)x-FJrC{OITQGBl6(6f1wJ+oezEKg zPHpXRavHyc;(nba^(b55CXN627fv2MyP@x9g%k9g89A-{jr*HOpZ=^kk-V4Ya_3|W zC+K<9T52mil96$@{p1gQroDS5UcSdvYcEVs2~OHh2>1RkeJ1DTCqbOQS-xlbOyis$ zqNhSQ!05$f-_$mh6q{4gpG$Bq0k_TRl?a}`BIi4El?DJa@BYeZh@QUn6I<(`UxJ<{ zfIgEEiH{2n4lqjdr6R^L-NtrFK?q2OOXHb_&mP7+P-obY2w;1=9`<9q9ATI|&ULTU zO+!OKt}mwE^Jz(f$7yYnS+T-rJP+hQr&@kU5d*ODXt_U#J( zZ(;>rZsSs4zaP`iVfm6P-d=mVT%g%pyAbV-ifuaW)-=_mqjH{Z zS-4?mP+sv&FG@Qr;S!Ovv;vLdle@y5GC6WVgYPTDMbvD&H_ovQ4g z*JuGnPuGY%vO|e*KS0ql^2Qm486x#NCvl3NH8wmAZ-%?h6g>_9QuG8DVl&09pm$^O*#X#Ss2Z1ulTY$KG;n<3IZ35*Hi^8)LjrFJxDJy!rLTXnO6}I_Lb`MZz^RbYY)t=uD1$-Wv5L ztPa23A8~Ri-5w5e$G@Uq$Vy+A z?vSRL*t_F?FkU+&SgvM#V?_n|j@l=--qYfXA}&8@`#I`P$?fayxoRDtP29K{$=A72ao;N3H&6|oaJ_F)rChR2HSD!m_f&? z-wIv}_C)r^cf)Y_wckLtn(rdtg8}BI3lk0;ads^1`t$!57yRE#%W}m3T3YJ=Q)w{< z6kpAz5!LoQn72zx@OeGWKk@c*Qd{>iKdk_ksZo_lRn zFTFnBv|YoPIaE!sQgU+O%;Ycl$qQ^VvQ1dIFrQQYIPp%-^|}e+*zSWW75WHU{yPAS z?RD3Shq#NON0~RPD52@`-9|h2mYk%()YGSL?w@}QTnstk0R1@J#X!hT{&dt6=0v3D zH8qDQxxy{;B5*La?k5$Ln}@=lS@M1c#}2o|$Kx&o38?_cQ%%JI@>H6WGX|5-f5X@$ z02teJVVnLwFaH_4+?<=N2&h=*sfJHdXoD0K-sh=lH@*!?2x+aewr$ye(YrF(B0mJW zmuBO_4Tyt$SL!%eJ6sxO>{M|(Wd+1R9#XT4ng?;t$*kH9rzOwqoJlb#2H8YyGUDW@ z?MAH^Du2sS|0xbqEJi}7n_?94mps+IOQ|@C-22WCe+g1gBpg_jAktwrUmOlUeYjoR z6i0grpgZjULki(r+n|aC+2!Dov_$9IBP4OvgL)F|>92Hs9N?J{T3v_%_Km7jl8?`)Mn0 zA?|5L=9fB1F`!3X6euZ11n5zNiR|>dMDABs?qntd$>3+D&Myb=g64+dlkr&0 zEBc~&nPHtxEI1#)Uw&hPQKYGFmc9CO?KwUFc3?Y)QPYkvd`2JwwB2+--B%L$378QJ3k#!!;b;sRj>VXP zkwP#86oddiG!sHW;4pw2j)B3!P%H=wf?yDWU<3wc215eJ?J(e67o_w*oUC^Vo%2h$ zf-l2zanQmd!-YEHi<;Q8yI||mtWvydTo#Y<{uUQC5ZB>k4V4TK&n{5*adAO~&4i&4 zL70#r)C{l%G84uk1kHrNLV_@?5CRAjB80|bgy3Kd8U_RlHiHO(g~1pU7!3g+0BEe> ze-)SC)!$*2L_(DDeL7NGqz%V!hEDSXRa1l5J46uM0l%Nn63k4yuf*=(5zl{sGa+)Q0m;dEj z5(6Ra`cEgy0wLY|9TFG~jDs;CKwJQb3x+}g-(UQz#G+D?}Ir$AE;uW`J!s0*FZviH4v7 zvnmi24u%8cZU9My!J<*?336Txog;1gJKSp^Vb6U1hgYCbJyj8>q^@0k*6}zQfu@Jo zbo^=#dRHL!kRIid>Miib&hGO3Lw7s4%ls#kjUo}pmHSL@U`|EF8pbQT?-IM?53%b^ zIsNFJ)W@x_?6ck2jm+>)M02>hDg$bDjo{wlrN@JWR;HnK*dKY9P|X|U@ZNQuTc^T? 
zZeA3YE9tZIu6E=?ouS`$+{I&y8yVZmD9^VwKeSq|^-6B6d3TT++T=C|wo1~!oY<{X z!cnk!bhF85>+1D`(i}EAl@%bK1a)oNRG(*H&W$KU6V}$ja#` zKWlc=3)I`7H=2{q%eFaO%MfKT6_8?kHJfsdB7YYY*YbeW*$g3>h+Hs|5jMkLVS^CM z5KCXQoxSVKKs8K9XB;@45**XylTo#O`|uN4TAPud;Jxh_IYd~S!4<}#IdB-eU&(`q z)e|K*cm^H3v5#q}OEQ|KKDQ-xT8#YAjcWBE7VW5%Cr}i^XTQMzg(!?9@;h6Pn^?lD zR=LPf>1rtD`n&o#3kk=argvQ~K?7RiPaT(a?GpGU0ok9U**jXnHe$Xi*IFN;h9n9K zRzk(u@bTEAoTfwQLBp`+02kA4{8ycFR-`PZd+{GyP`8J>9@&H&6?L5#={j&MaK7~{ z0@)x|xTidpKQiI%nJ zJ*J_>K@Bmh0D=$j#h3RUue?XF_q637Zd!scZ`p{5V>;9>3fW$j92DGKrHJ)YAk@D}=am zYDWI@@ngX*_OXtlmUPihD}GFuYFezcyxE;9n~S6!X1n9jL%JWgxUMLc_786KqWgej&drjG+*Blt5A@U$QA6v9iC}vfajnYt z@!L`tO(#&(aHh)m?&ZTJh)CBYnU_l{MedN*kZT)Tb~L6>t1;NaVkS>{bq$R-l*a+u zcwLcGmKS+@`XJUV<%uie|@Mw#MSy*s>lkMGc1TplRA_!f-!v8lJeIIc0v zraAwUHB;>S&|1B>tCiP5l@h_eA6k3V`|)-zCX~(9ez|b`>CW^14cAj0<(ot8)lp?{ z19Q012Z*Zqsh{JH52UNvrJPj=--f+@bkHUfg9a;plEvRp9y~94^J<(P^X6`nJM)8K znm9pd;?`(PV=DP9B>J4}f)jU66lsEznEN)j_7(?G6~-8?lt7W!U1upntZ6cce3*7h z=(wIB!m85b%C?})o6450zkyl$(IzM7t)x?gOmtr65a`)hw;b2hDW}ppwzn8Ww3KLA zVy<%~xCORa$7-1AHn97#t`1^D-f>lL59w^iZ`b#GVIEBM_|lp=JfT}y&(p^%HV}_` zrHX#b`{QY`&dNyo_@L-d=OPu$rm^=%PmH#C-ZdPiH+P-Jds56NYEa9m{r~yBz#zAa zSuf;z_UeEAM%mOAC{(MZIT$543<)pOoOI^e5zj)xqinRp*{27s-p59Cx?j_f6Iu== z-9gszHWvgv{hGvE0iIF(qY9o++K!|;H<~RL%Kj~}*@##cuZV=e*HZN5NAEx`p#*Le zdHoj4V)F5j^^`8x_rQ4Mr68#dYDVE_Jg4QS$Db{x;O0wch?)D(t0J{ojjlg7XW4GC z#;hkM6$vhPJcBW%TcljLu-9g~JlB?_AYa{vtA_Wd2@Yt!RXafMtAA#z9~e8QC-A-? zD8#U$>T;BtJnfxV;-jPQhdzJoy|i>|j_Eyycaiu+w8QmRJ)DkB6s!7nSr9q@{;A4g zNmO_OAy>boMr1Be3>nrvn zL0oNZFEL5ee!`L9ylNrbO!0xe2V4?xQ*he=>#c?V?6@Y7tKXeJ`^Uq)sqVbSM#Z?xar#jhTNzMJl?aCh`mzjBaHs}7F6r=gS` zO;`;|V_Z$rz3K>TQq=g{9?3cyaHlCJ=C(Rlx|L3QjQU{1CIfq^sJl`!c*u`vPS+)3p7~Fb>ke zA>jwHYsTP+H{%^0R+qqK8*dK3n)+$q_t^TjPoorSXx6=qTwyd)@p_&vtm_-Lrbt{J z-SbJEWBI0z7OC^slM-Fp4;*CvcSna(#IJkkJR89%kB02;+cLSW(-9vqKauWos-O@# zX3-(y8*Xz#qDpO+xG&@Ua9>?;HH#g9lmUg9#Kx4Bq+4 z_hVj&Bf{iJB3`9aAM&pExhHWob!mLw6PmnXkYGri-y0S3xXf;X%x?cFM&YnL$o44o z?xW#L_nyvKu+9o-TtO9x6gmpPFTI?ueYE7cq5Fbhgaz};XO5TWR=K^xlNeD+dfm?u z8n0=3GcN+UWK;iG^yp(*>FeM}7Z$9{zP(;E_03mEE?cJ9>%ln0R9-*X61kzCGoH;W z9C#!91*R{cG59G^T?>HfN-|V}tF8{O%Wg%TRo5M$x*q?ouBMqXKy@w1e|*OcsIDbI zb-nbfx{m!(T_=F*>a=`TUA=(ns;t>h2UOQLbG3O-f~S91*B*9g$6!>~+rUe=&=*Ah zs;+jxNBWqjhzV; z?{YHVoc~#c=23TTM;z#eX8e=Uz-zDTGn+q(ol>n#fE}fO`tHbfCF&;0DZiQO?)w%N z^6XrEkzupq=Aepk!t~^5S#c2-ny3GDFw_2Z(IuVnOINbfkah+OpRdI|>NM@Bt{6x0 zU(v;`(P-2Od)G)9n;9C%SwM-5LqDuzVk+|=UZ}PC{_I(GeVX;40klrN@?vk?%!dYh zYki_NnvIFoD3eYPzckz3gGD0ugbzW9wmtUI5<<)NL$wmHWtr%+mlLkc+gFw&q<=E( z$B*|_q*{m$Ic0wu-mOVGXQImBsZYR3*J4!4X0Ry)2AT%0=MuRb6 zKuAsygcZUdAW$eUJAk9X2)LjiRuBSc%mI5YI2wzDVuX;wU>FWD1jYRKQL2f>{QFMy z91s%43I|FP2+8$#NC*Icgg^7z{8j1~$&1|86Jh$8S}fn0Ocsb%^N)2#08=C|2#x_D z{=y(&JOedDqCpryP7t`dC@2s(5+n#9#e@+E2r#Yz#tH5U>OJ>-y}ww<8{hvmUec|S%zqLUwt)!k7JVWNfCzp5SXVFvn1_I|0M7>m!9rk2 zK?Dp9MhijBAaG#P1gs_y6$DsW5Rj5+6fgq7zySCf5;s8+2CS+7tK=N}b6qdv)|Ew{ zpE=pI9*Dqa0N4T77z&K+fawhiP-sMAgam=?1!xoiYe@(MBnmJO17U@sxY-Uc&jThG zC^!uLUj^<*_s3u?9XA*=_@4%2TR?#1AlB=~K!66nR~Can0lyj;)nO4(A>igm<7$-< z3WLRgv3b$Su5b&1>3JXFoV6+(uECiHTK_~!rgaa<-7$i^_goJ=Z0VpkKFh~#z z$C{a;{;S2U`kfUFjn9acSg1(vYtEZdwdUVoR){nVEuqOTO_$~;=@C?xxy*ik{PMZ| z7yr!W6(H`Oy;B(}Aa3P9k^%)nqXaQfERd*XNGt{eATvP_1Ta&>Ku~5NkQvqtgaBx6 zz|Ca_++`r3&I;lPZCD_M0JP?RmlXd#VN1_sQB09Q&9 z2onO_Krnz{hXsBs6afNK7`XIiW>_o)46GUm4HE>0qClH~3!;E&DN+a}2!SHN0H72N zv>6{|RV?)` z^={`COib&n^xx3zVGds-{5Ae~ccbt^f}3c3z|9T?O^|iJdIN zg_JSae42suzT-972hl{9Dzr>mbBVRbw)hg$Fo59-2>? 
zAf53!O>|qXZcNV4qZ%2=_jRFXA0&z70S9Vc$1JvEh=_ErTwK{h zQ#7Z2-WvKgPw})>o~UuvV5`~470FglX`0ee6VfkqH?B2MRu`hXcbs(V8-GL)7r8ud zoAlwQ%Y8hZ)k794uC1ic-^WCqZp>-xIb9FgAlT$>bMdve3P&yWMOn>nS;=8v7F-nM zE@F1qRPz2RLQO)Tl^3n|oYpJaO67B2ufu!?7m2pUtMC`~LANkhc1&Cs-n5n4)@#IF z;o`rzz4+9o`2KD&QTh+_@26a(MvYN?)!9E--#j@wpq(S2^yLlZ7wEVpRYh*c9QcK* zPwqofeAma ze0jtUGdg=spRes|rU9EjUFBR=)S#;8$*^ed&SJ;Rz?TZS_mJ^$Ow2|x6~Acaf_*X7 z4Rle3?kIZ{!7Cc;1r-YahB(Kxl&ZXKS>%lY1+t_^EpL6(XlbLS2@KBWfqq%JN#v}SU!nIo45oD-N?CkI$*E)cxpRzf#r{VEzVDOo(<8B5 zo!+IYUnxrNFlmdY+}w3{2pK5Q$y8dnj=b5FLJ-iM60Um$+Xnw&jw&_={6ev|LYKJj)%xH=h1W^4Gfn_~Rnk!ic+ zXZ}6PO4_Exy_zR%41GOTwu)^P2a7zHg@W$A7C2d}6Y99eZJm%LxwMA=szJuSwsUkY zjW$Gl<>Ul6=b?Q?2!FzyzOxc!1BA zif!nH`{x8h^o5ZoX1%S;0#75OeAHB`=5jTX3-8ljEqz25))@?*bL}_jBNa){gR<)$QIzV!*D!>ISC?KBd<*qQD}Q$ycIXV0t5%ue zFsR|G)tJ0QNk0+JUFd92NVc1>p#2&#*7W@NXJ9U2SbvP#7cD~H8#F2Yf_OHAR!?Kd zz1`lImVYWpP|$5bQwIs=7p z=t+o#mq(r3x5rB3AG59_M6Ti&Z$7wEwz`!iTHA{L@fp1&26V>M18Gcichyc_yFjr~ zi3hku`q|-KexZeKm4n9`^;0ndz}sF0+mlLTyRywYOJ3u#+3s1~CKe|j*AM%R?S33X z1%&P&`R1+er~MTFYZEqzRMW4A@TyBLSIg zIIhiN(Skth1>LcOSwEH zMK)RdIE4B5$YUx?F@67y&h>ZGJL}GV^BRvLQPeU{YnM9J&iN7zojTq=*?*^UcV$;% zV7c(4YQ)D+FXk_umOOlsgkFVfu*{JfAAM1>AV7HU)^N5HZL*~PdjCYu|QTE0T* z0jERH_?mCMxx;|YH18TQaT^jpx2wUzM$a$e2pEuGP!b!fW*pMZz6!O)pAEh07AyM< znXba>S9#jMPRDB=Sh;z#LeD25J(uam>8QZbvMAWzVu#?KUd)hr?O=-750R-m%`9R` zDg0Oai0&q(YZ|7Gi(kR6_l|)mNAZKcWaIJ=;bSjtceSL4CvYHkFbl6 z1}WhDdZzenQHgmp??76vyV7C<*$Owu<)6vWKia=H*z!uPB`PI9d>F#V(p+GOtk<|d zh3Da_zoM+0Uo?n29C8Zw;X_B?XwXTGCw>;vc1gU&-Zp~5Go0-@R7O&j7R#@wYaT9D z`4DCFj0naDX>C(gI2c^8>6}65-Kt4i?CAzP*(l90LWB*|Z+Kk)qUgy1VfCH-n}%IAl6=%k{k0#;^m{-&;jftkOD8CU z)Hm_s>9E87GsLE>h=MNOcl+i)iHaLVRMO86vm(1vZ0mGYA3ncS*(l}tg0u9g6;qFn z(onYp;oZsWZ?ZnPK##-i7kiqAcs}mQXwG z)#M#p>{l8g=c|#tu57^BP+h=8`#lb5gXb}=MBfdUy&PNG_95x_w(94lc1aXo zf9jY_$cGIpn37UbrEa42)e?*qv4Oc#SK{wf(nUAiiY~qFpQ)19u)o@!DgX6GJPmoy zv{~~jsSB%}#HAJt=RPQibS?KwRCESwP8JMZ#hndtaMD)4@^GlGgjt_RP$g))e(idz z}P&znXVdI!@FukH(Sv_3%$6K$=Z`pP8Plw>4&jf3(>zffRk-jMjgzpK*HkB{)T{;N5MPu!-tH;=l+W#*OMcyFMA@ zM5mhL)vQhO?Nyzl`5_kF#zqSneecpQb;KN9Z}z3$>d+Pwqv&0?cc#B2YdFaInUv3}-&X6Y79uQlb&coc%16Dkf5shGy-R6=zIC3s z@SV47m9h}xKG~*mTb1P3Sd*ESRQW}NYfg%VYtUAYi)k-wNlNdWe@=f<=7)N5{1Y|a z&7aq)Z<;!}Y~DV3Jo&Lc2t0wkVf>j!*?5NG-1 zedjids=w{n>wqq-wQF{b4bZ`^0%SBiv$XlA)#Iv z*d~qE4NqVyWq;Jt)6?@cm|5nV^T;blCcFLPvKk1_=a#3@cFxZGHV10kTd$`WO$=jR ze5JLv3S7FY;IUw&b@bzbVt15@*@=m^MXa`koSlZ}rx(UV)m>0$KI>>IEr7e|LwtnFmx3N_ zH{K~x2R4}HM2jm~GJuH%4RNAJL%z%a1~9PzKfftu?iDuXdI90wW_?TGHivPh`u6Hu z((cm9TcuP3dJJ<dTGPTy%=SFFlPA|V5 z@qU;X&Zrk~psEkstBDxQf|>5s*yepTIrtDX&vo~+JIlSamc7TW4eG?(e3hpQjz@=Y zR9M-8qv7u4?X~m6M6?H9r`WPHT(Ci!5~h;8jfa+hF?&B3F4s@(78!fU=#D|?3! zgfYCp>)fLga=RWiT-Ww|pk?Hf`%>h!uR*BU^C`4j0g4Y=TPIV*cg}?r+@EN8Eq8Y?3@duV!&pq)~-fNIk-Tgi$^XGX+O znW41`wM@hHpqr`2BByutlS{V+MJ>h3Gbo|app6}|nO;C!jXK%WXVf0Y>rn(WM2F&3 zUldrM(NT1DnazGuc%XJCJ+Gk9_lg5V?xy$HEcyJhCm;Eo0z=%KJf9&oh7}yOL%OTV zVk@;RCk4*wFIXWEcBrxlViC;leC?|6S8DSgCZvaZ77QIo{ua|`Y?ws)? 
zM}7n1rxlwOY<1`Likd&L4cpA|w)4;a1n=ARA7D#5>_u5@onJ=B(*Lv{?!w?1Z{ED$ z#(p=(;U?-e^kZ79N?i4`hmm0$J zZ6amLm4z(L$G5|;N`G#8Bton#5;d^_f$Ip( z$r&nLDfB3mU;K1_+i7|!J`N+@@*-S8ppyO7_XPs>Whd42v=i@hqN%wPZ|O8ox*Ghf zLG4EcS@+*I9M6-4W?GLlAq-up6J=dG)jwx^9cHV@KoSt=@j52BzLr8%ENm7ZNC$Gr zYFU)N8*ZrGVOORVP4it7myKn#&niTt_2Fa#=TMPH##AkMHLF3_jZ!P$JET&zeeeHN zknR(6uzst0_u~coB#Uvjdt&!b&vAIU9Tzvq-mth{T~;>oY*JIw^Nu1UdnePZ+_c-* zA^Hv4GTz&98mh~7M^p%N?mCmVWU+2QK+-CcA(Zo_FExD+Ok+ihFkERwn2-<824cPt zbKl#oV<8&{&Mh0q+8v2wS^Ok!h^jB!EgsIWQISpK2Jtfzk_ed{9umz zEcMdDhnsvIN1+hc`x$=g#KRmMYIdF@Kkh;YR6kOiP!747gOB${BxNHdFM3uu#~Qah zxOZGbd#Q}aYp-~%LPdPvC!^hFY}Kp6S*S(@rgyD<-<~$~+EXc+*ES%B2A_E*pUBJV zGnj4(x+-tybB|KX1sl&q1D8+y{u-!G0m-SSz;k% zfkZLtc2RZg=+Tqh`pO@9<5g?gX-3~pchS33t+11i-y%5Vd5@_vJbZ`=L$ zX!`>!tWL_yMlO(|BWi${cODG-?j_D7EpX^{^zm`}W7CvZSC2^lux@ksvGQ4OTm1CU zmUn#V#9j>0s@}qRjPKajhQl@p7L->2q4RP1zNt=%OKz0Q`aKA`y_q+cSC-HA~oS(f}kAtHn?f^S!K$R??Y;@4<5NG@P z_GGjdx`MNPRXI`rnZIm4@#G7$YoV}ynB&sdK7iu$3(=oClS;Lg!ZK85@0$uJ4~lYv zpiY+-lh;Ul{%{!n({vs1)WCVqhTmh_rdxSHCirQQCAE_M{_5W9K43f!Fgw9@tvb2 z)?dcs_ZxaxCn}4(-y^#5hBe6qtmlUdEG7K6?=}J~I7Ro`Ibr$dc?NR;vqMkOKiifD z=NlLwRTUhBU4A!h``gE&0q0{8w643F*7}rNpLs6DYQE{s@j7E+`D(<@7Z8tvF0}K? z7#W3qY1RI5n9C`Vo>BERi@cqAx|QxyvF!dA)m!u?5;jZn8-q3brJ-?mqc4x=75BG> z?0qQ!c!@e^yu@hL-kT=kXS_tDu73dbWrO8Qg;BCUTwiU)0b<9egByRts~2Aclvh8+ z>gzl-%01@Rc__eiqIhOLPR6BDBRuu(NoYjAzRROQ_GlN?I$6Yh#K7(^>+!!}?2pgP z$36Zs9}i-UR!@&Lv+|faoKp1+KVv64-Q~zRbAKIA)+Pn=trzD%K6w=k{g3p-oz-hg zqs%)6UpsE^^dCV_fj7L+?uLRyZgI=Ay!z{GbWl#^nXkKNsv4*`5Ar`UoCzJ@u*OU1A91ThmjY+?9f(~gLXo5BXG=)!;?F;2LX&l z(|tGZl{%Sd2;A+9k?>miqU>={m16p?0zU$SPelYZ&F(dBTl~#J5<(0aXQ9l4<+N&^)CaS;}48ZoVftwSKEb% zhsD#qzl>i2Vj^Jt+P7mFh$ANM0LHHXv!fEn>{y5%Ji8yR7FkNP5aQJJt5=%8h0o0c z>iR9a5<|bNh&6Cl#AoXIKvDvZfk#Dv=XlPap5r=V+HaI=0DJL=MV?hZXII*hb*iE` z)|qj8??e8@Gf(QNg2g|rEUNx5&%m^Qcm@LE`r`tYSVgaSy7Paf&DpzzVH_doEOQP| z{zK;cjW+G`i}QHKH^Y$;+48g2Rc_1P4oY-K7J#jK4mY#lh4|&&1|RTa$P> z$2h;jTxR2m4j!XX_h2nnU)P@a>kLCKRVm`|iz?Aye zJ)vm2EuBU!q48DK@1tE0Q*q-ZY~QP0~>DYW)*B_5K{|I4*& zv>h0m@3$TJK|=#%|obg0%qJ4)2 z*0|3+S{%Xxaq_=e@}Hlzuf$mh<-gPG9VXMcZFnr4GHSsIbmHdKD!txT!!z^gjWo<@1iA119QW(3M0hV*6Mm^a{+X)ta&q59xn|nE(F5`0 zKot=Xy%S-q;#|XXU|$1EoiU7X>7x7)Q*1Y~;1yJ62>Y%LI0Q^g7nrI%Gc~E9inb{v zSQTr)2Hz)Q=X)XgQNaGk-ui0U>%PY&@Zn}92G^|BB(l<5vLIXL>02JS2K)kOz?wk8 zRg3}}uzFDkT|8xNaO_zFRxgYK-8mPJYrx|`1HSYEXuu-Z|7^gWxCZUdQM>zTKA zDJI*9apQ-0Te}vYgz$2dg8F|{n09Q->JF0zGX)J9-_$t!63yGRf4zAiaaH_H5o2+H zK~-ATr+5>4v&wFqr0&D`4h@WT+vd)`*U9?nU#B3if%(hT2OnMz07v^`zXd;W$OteB z7_11HL6CxG0E!2InLq&e3=AoRK!5>g6$puefMH-TLJ)~D1K=LOi3xyH5(Z!*!h#sc zx&eOWf-;R!+dk(+n`Z0q+ub#>o4Fq~i2CU#1>-LgnN7ZZ2l`1@zAtVuWoxD7ww}3i z&2i}7gYA*UcRy#QutE)GWv)`U1tI$U{lhhtGwOvrChGK!LkKdmh|>p}PA}fa_H)lQePFgUjT_tFE|48eSrk>u#9;gkRIe4NnlD19R*N3h_$~MZUJsd>z)P z@V$ebVcKfKJZJKJK*k-Dw5Tn7jnEN>m&#R#GNAoU4?5EfOef%9pgjQnc-QRcDxj{u2qaIfex9sw4xy(U!l( z)bP@bR+;Z7H~bVDdg+Ilt0ETVAv=sUx5lJjI&x)3`M1dke5CyR~hX zl9mn;=?>|X?(XiGOr%@7ySp1CM5PgskdkhY29fUW-Vf-z-gm8UfB#$f*Hd;E+J^aJgcPCU?H_irKFE@;2xYD! zfMt=(UPARVNJ&y!|B$xZV%NbHvo_Q5xj%bLl6xRT3i@RNq4}u+lh(Axv6o5nGxqTO zn?7@_kyk{WEH8-f#`;@hYUk_2qD-YioZADS4A$%6dV87HM++OS z1IblOZn$GX>1}^5-rt*PhSljEEY4uK#A6~*F=_!D&yYDFQ=L{=bd?bQZg1V_yRp%J zpacoN$uC$}s{ z6k9^{zE;VT=;8JrgNj;^VcnK65#65@62Z>GRIbZro7kY0KWjF~V;=b|k)>Plqbacz z$kuk)4=Vbdgw#uJXY0@esli*J8qf=dnj}AHTm~C+aZyq;uw%2K`id$n_xrCZSi>cR z8~d`BluKv*c)B2IQ3Un27P_4 z_-@gadHgvZ<{*`T*Dyy(Dw0Ap^62^{ziEtfnhDRVEk5D~Z)Zn1ok;-_xICuM6>;)! 
z1dD1YuWbcskf{-WVC0z2g9p$GHPs;N=|%@D6Rh?TMf+~cWZMi`iUNKXyR38i9EA^i z!z3CW#QY=;!&UDyz{N6oU&%2zAJ+i8vn?~+C>%5R1AdqoTy@u+YxZ(H;%d;|-IpyF zQselL6C_GW*pWAwQOV*Jibdmv!vfW}Wj9yvs@SXTolqr)3saRPn>a_*;uC;vKKnTW zJlW3F+mMfw`+_Ke&?<<-B^oYONzB(_f&RGy${Y}#gZRw{sXz4GTs_Z z{v5CBSralrAq1tzd5$Uc_h9C_3#lbUm(!gC!rkP6v}Z`af)>3>8U#KhMcRT=2(Nsf z-ONp`7&UIwwhji`_ z9=VAuo8U@6nKxxtU$f*!)8FY48J2QvIPMPN3W(~weQQ+vaYj7Op|7XEeT3d?OYG+d z6HIzNFJCIJuwI8ehD`3jFJfnE^X@_WJsWQ{-L$QjKfP`oUgjcKuRV?nqXCJPhr-UF zG?J5*#0+`b*yO;2H-4-zOm6zUTs!;O)f#{|cJ4PlxLo9Qdagus-Z5Ltt!Z={_vNMM ztI`E~4|RK)eIgc=&2Gp$u3s&yGOr#eq8~6eIEOv=9)4+X+|%KUA2MDquNz#0qaLc( z#}2%--rt8Aj2}wYrOh^MR4zjPy;>##t7TS%NH+{1@9rjodZHD^?3^4X#vtG*HrNOR z97Sej1r9|6>UnNcR#u>?nTg$miId9|%*1X4^cEYjv6=$i&&(!9oWQwhc3=TD*-h}m{oH0i_g%|p!$Eh({R0vEm!0S0$;TU3UV{2V0VvUk*)U$D1W8iCU(tgN~mT+hh7V< zD`1n~$@cuj*vuGMa?mUm533#d<0|M9YfNwG0*uSO_ChaA#7IICwMVeXu=Yk4MQS&U zfQ!IQlC<}QIrCwW%iWwY#&@$nlHqslZ zV>MGS{PGbOs`^iU5w%a!D!v|%Bm2ZOB@`jGuN-%2LP=Y{s*l{W^6`&q7_?)8c(!>=U=n} z>R+^?{gGA}c%S6H*o6BRt@z$Km&$$iL@Sd1jaJGr)P&8_6Rlu+q7}J4j`+m3@A4DYEUq1A+5uWI45QdZE|FM6 z^)Ffx`7!Pft>8#_q7_C@w4y9IkJ%c5IaBC~R-Dv@{-G7Gn$J2l`krV-6F@7z-C_KL zR)o#*{-G7DoNw^!Jyd?|iJ6%e?8|y~3=Xb{7|CWFG^IPR8j^b6rx1Mz1Zahx0mZsH zS<)m6^(XSUvJ{bOQD9FwGgkXgTH&#k{6A?0sfR~(%6fK$)aGK{BduVYxv;pMYjNwq z6==h@x}4$p4_Xmka{nboF7^+tFo%4k6_<5&J^!Q?G{0#D$G>RBd=)?|##*ARnfh-( z=PA|-JRGt=(F(kO(269|nhuiRw1WGIR>Uygr~izn;)~w+U$kOBKoBUOJ<$s0f6|JGKeWQ|iB`1!lU8g#(uyz0O5l5v(yQ(6 zP(#R|7iw8G>1 ziB_EWy_W!JMKD`y&TM1A0_8Wl=#`sq&#;rs)Y{K{jn;`IRsCWm3yEXV;G5F$JOl+@ z=%HfkHq%{xk!U*g;!lKlK+cs2DM`h*zCBW_0J*Kp?_Vt}a)VUuJP~Y~5 z$gxQ2w^>_Ye~^z!{uu#cBz)&4xiXxiWBLC@E0EEti87;2dAL{KZY``Y5~1d$+!=UV z02Snlo?b}rgug^~Z#>@}(qCmi-0~#vIzGH1EFX__BhU|-y5iQCci>PJ8U2&vp10ZZ8-r?b4F>0>o;h9K44I6alJNJ=R*=3n)tRaqcjmq3Ex z7<+rpW3JpWRz3scgWAd3!5c*DZ#PzBduy2ce+-Jl0PL~8NO-+t;IARl*els;_I58mje3*SAmwexX61O^-BHow z^2eV@6!}y*qqtdgd&N@0ee+LFLGZll@4Rhg7YopO@#qF*p%iX$<1In|2D`SWeLKJ? z6#f@Ram*C!@T2W#_`A*0_Tv0(Z0MVK;f`%gffTSJIy;x zqPJperD4)Y`dBtx|3}$Sc}0Ci|Eb0pSAA!|q49TF4$u{f+jI>i;`Fit9UT958&UtO z*nFn*u(PhU`G~^KGQEeA9x=sHl$;7=AK+ODt$e98VH@bHUh(@EHEL5)& zzTNFTJjN-qUDHpSk9BS__;eq;MfF7#Uf3`IutHmDzW37;tf&HDMXAQmIiB1(@L5>l z!_EPjsS5xr0s&Z&8pLG{CNR6!aU1(GMYv9E*|={pvrOBc9FOwuQW+FF%%9I-pYJvAO>rj| z%Y|)xyjliJYNDb>pogz$*0i zRz=t~4osLosSE@}&E=8fKUt_mep{%@vH@b2k<$73%T@l5_cyq8Koao8!Lq-Et}T(? 
z*^+Cw{jnnttV7XboZ=Ygw$nD{Rof^vw&^f0a)jrUI^BO%bfdekC(ThmWwkXD4TCH) zf1Oy^_|`)%!%DK6x=a%|cb{17Z#)4D-tt<0y_2SNC%FbnU>fn37g0UNZ~qIcVg+FZ z{ivU)?|qc9-n3YiJ(A4ir~FGGeLHeK@-fcs-{udH8HAT#it-1iTGrhg?5>Xvs=OrL z=Dy3Md%~%B{}ZP^c@&E||8H>Wdz350v!`c7vRDq`$H%K<7XkpMdU0%>A0u4+7Ugw4 zJ|j7OYo7BAa+I7LkAZOn}gn09|`o?nJ|NlI9Rzj!K_T|tXzN!3b17XJ%NCQ3d90*Tmr|}jk!#L zKqfXY(C4~0%#PD82=T0yzgSaQsBWtsX`iC3xVyx@|MtRE53Yl0NA7K&z@k+?gxC=iU;9X?^8B_Bg850v1J0}x68#B-e4D^CBvokTXngHeF$7(SU#Ad?6%?x4( zv|C&(V4%g29q7sA0z^?j=Oz~i(CG%W$pSJak5^I{!3QUF%P|*}Qj}W>uzFrs*`KDj?OZ>HCnSg0=SWXkvQqTU*)8^2fxa!uUOSoWvqsN1)`k zdu{9{%j`Vmwy|)E3#2VYg#*qnFXl*7Z;+EJtkOoZV%8II0BzVdOx1eLDvf`<=zqXC zEDZ_Gr>no}2{&ughNB-Ds~nV`H1*UkFWe`7&l8pTtlrWWgYq`2zIBCbF**jBa|wc< zT3C$#yO2^YHOew&8i#jwdGRr}Yw(xPfxbr-_>@@}Y0RGK83z5SB~yfcqd$aubieuY z8F_6`A?!PprwC6~ki0TqLVrD?>~1pt3dfu)_|@I1vFB>VlZZ(?;d1~I64VvNqljq} z5HVp^_`!=v^h;G45j0ss$vi`&F{Dp;AXr^S+xe-_j1A{e5mAJEA9L+u%)K45AOABf zGJ<47DjM;p$BOGn?%SMBz2FJec#N6%mK-dC+YN}%k!j}x8+~!5GG<;De2U+ihh7>z zqaV#i{We59ftWG8IM?{@S5k4ss)~K=GHvkwG-|vU8vLZ}egDek#RUiSRk@9=wQVrf z$R~lh)d!J~iR&8j!a6IzNbKDfH=e9tD1eBGO1!d)Z` zvVu8#Lsyil+`{=*h>KvWzQ3qw_Z5Wb?pVh(Mh^FU*cSaqd zak3NDA0Rd45|kuiYI%v%b-yn`K&kuS>CvC7l+tgkPp^G^o%=phr`x+Y|8kmtD8A3M zUF%DBw1Tpp9tmY_EUAQd|7m4_V5QF##G2MzCzgR+lE(~*rhIYnbsCO+G0ae;LDCYG z469Y;=o-s6E49}pa;DprmRt8%9O|xU^RNtAKV9t(TO6FWHpfN_-^)m7WKM8fGw^kG zs7qq#fokjcq2vfd&h5o{1_m{wKiJT9V%l6_);0H?Xusk9>gUB$k$CoXcDO_Mg5v;J zMYB6bI_AU1hYIm7Ee!vj@ZOxFwR0-wvx_$5Mo?2<&S>EdEzEU zup*X_gP6{FH?7GYH+%fqOQvpexwMO8jkN+CH{YL*^9Lv7guiM*yBZQxpPUgSqW8Ju z#X)d*E}7MDoy$`@ep;EC6-7mAUG(uG*9ql(;J-Y?kDvgRRj!>9ze6cEJ?-bFPHBzR z3lnpRpU~}#&*G(vitQS(<3oNGJGEI&MO9g{q=c2YoKv`%rhrSK<%T$azy4zc2j_xC zv}!d!(<<{^q=vtiKJ9w74PW3Fy6)LeObl0T$?N_^f6Y(wmV6q6Lh@%DHj4@oXHtf! z%9PtYL$&!VJQ=dj$Ds=Ro-u$60TGi)IN%By z8U`aL0v6?)H>}DjGewRAVgtnenT@`AKFAGfUnhU|!WL=B8O|%=DS#(PTE5;%JjWq1 z(1qKn^S(OV;MOzY1ymtcA04z|&Ec6!uWKtIeh95>Xvi!p*DbA;*F{T%#D5}ppJiO6 zT^WeB9U7~b?D)EQ(a+fHANe2IzVD%OX}-b4iC+U5Na^TwjAK>Zy%Kh?xf^I;CFzJS zXe=V16IZSGTp^LJ!UNkFL)3L|`4PmT^wY<2F&k^mr)cdlc=mCAl&C3qnhh zYj^1d>Pz75w))tuPEa~lgJzn&UWaU3$md&lGsw+`q&i!a-ab+ACbsWV}z zk6E*>#1WJDcCmf`zG-9Jr>fe@v<=nK$rAsMh)M5{h{?NL$HIFwbc85&Q+-yuqISf> zMbew%Lc+jl`pRc?%8%Fa|3jGcf7U7e|N5>Eqv?HjdI9E|RCkaRRaAofQcmWJP7H<> z>MfHs$uXKiO5H2hRMcf9pc5mnWAiW$YGcU<6;}x%&uaL)3Ao@j&sa@N*%1x3A$~+*48CO-$<|9UDVz(z zcbKa>+!69-#t^HV^F`yiG9bTshG2aX#x z1_`a~O}E{e6ydxpym;zYJF9X2T@zwcn}5|f0Cxvym`VdsA2m#dj0$ua2-%HiPM-1GFEFy1D}%5A4O3B!-lK*I zmU`IzkA|tN)!_}_#Bfb04~3oSEv<<)*gV*pz2Q|3KbhedP5iA#4by_aVi#gImJM`@ zye%&mJhoQWqmGWr2uUj6a^PH8A=*oBH+;4mZx3kK=gw~^#ZstWP?$fb9k$$BU4=dG zZB?G!un<6nUzlq$(CzKluBf1BuigcObb5<4!GMsCh3)7geA*~`4V19>mW0r%p@;*2EV~B1Zo>u2KT`2FI zYAs^zRyVWB%-Nh{Ul zJu1F=8eoDBP&aX96-6ul`6^A%*P$QmU z>dF;SXAmw8=ynOZ-Q3revy#0_v+fH9y)o|6TAznU)-#ze?Q1-jKW8Gagj$L23#9!- zc{R8vU{Qf`wfsiE<*YwMU>K4Nlh!)>twGW_i;Re`-$Y#$doIdLffj9tl!oVwKF#kE zy#%{Wxe6DXh6f(*5rT2674_Z@?0_bsPd;XoIOB`)f8ANgsYw1ztRwSg?j_-eSb|No zc*mW$lUWFO$*T+O6pI}B?Q8g`O7VBp?=e~H`Q%Y|ec7~3-P*@375i#G%4%K@xRdIu zK)DD9_9Ju25AKaAQ9wUgm{K&_GU#)A0Si+F=_E_Z6m05;?1I5D-Wxu^!elSJ{@cP7 z3|N?Iz;CbG-V`%Q6fnI!6Jm423e#6E18cul+Q-Y`&bqR>#djvEDk2N`BlSo+Kosnj`S zRm0&y z()${3K3-lHOZUgR#NVDsJlZW1C9tNjG6wU;T&$)>-3m{Wq=|5N#i$&rx5KC^7?Yz; z3LfPa7lC?7)a(a2?vtrsTzKB$VOi4!j5ebdT&$Zy6sI$#D#LS2n zQ%~Y`jak6AEM}4ZjAmX-{Ym6}{0lK2MI%KUJ!5qwG8uzI=`qQi{*TuJd9sL(#{uxc z4x|iZ(S_7up>z+DYxy7WR+6fyEc4JI*FtOSWix~2uGDgTT;P=#z29>^uYoT)K^mng zsLnj-r+3&VcA`KGI&+(8htBp7t7X;tUP^rDHtQCLmutVj_khC)1&}wy# zV*Q47Zdrtdp4|2PE}1fnK_>788oE73;CWVPrbg_$w<8V5$I$sfbQHFA-!zgJ+$*iE zoO;N!d$1ub^zw4cMW8YzdgN7U*?xWiZW)q@p+d+ijRtm 
zFG%H6fa2o$kJJNVOHwWjGE(cNw#kC>YorE zekrOG`;4`ZDer7T@o{#t-5AC}<3N+H@qA4fQXK9Z*vQ5F)YtZ z#x}k_G}(td`}%pq7ft`B;yDFNiQ(txd{C1g`io}8l!wJpC8Iw0B8l!C`fIWDmyMH9 z>!-H7U-xBx9TqS_7>D%y^K~!2*rVY>Ecq09*Lxpar2x;RsDksC;bPDw6!y9%x*gs^ z<&d4kX){03+C-{76@GqD3F45nXWYPzQoKDWX+^|Oo@&wanOnfOfSlXmpB|{2t+gO? z$dym*t%^b^O#os^qNl7*JcIm_p;sLV&zoDD1wmTq6FO@jU;yWOY5pJ!i2R3_Or>JmN2|uUCLMJX} zy-+WOOKLg2h<#3JmW?WvxTU~v_ng$c`JEjs4xv;!{A)xBPjWs3JoyzKjDS@?@?Dmh!T_w8 zS=mlm15sZzH`W{L>;9c4@!1=~1g4mNYLo*N^Z~J1aQ|9tbX^GV@qo7E_%WeZmI8)TBLU{M7>_aWog%S1+?vq<4;S>@Yh z!EnoMG<2bi>_Vyg?#16rn>Dbs$t8=Y+X71)+MlJ(gcHPSYQn+&z$_pZ4nXI|Vge}K*iE^az{cQ5V-tBJu(VxJ1iu|f z{#eonoKyi*$K%B_t8is7*|vXbrF=Z8k}(2Wz13kK!8EO3xG&MMyS#Dw{H1vZuJ(@_d3qo!k&hgPmvKuJ6J_N5Bpni8SH272d2XOUg*>GL@PCw&^PlFI`8 z?0itY*4ZKEG*K<)M%?4F##i$xQrYo7>}`~m$COHn&}gkRQ~JKPpYz#tH-l3H^~?_a zbJfAZ>+>nf+e5`F*Tngb0pp9XW=uQ^^-wmQ#0ew!A$>#f9vdZ#)Q+O~OMajIyGF-1)B=qEuQ)7Pn6hP)dH( zEmF%vzZTx!)pbJLqTRTu3eUuc*9#XmvX+lN^er}QI#cDa-wk_Z|As}1^yq~-`l0yL z-JvKYJ-IqYBxYLUcTW#@8BfXJX8~^)ZEX@at3*Q~(b1_Mn`(R9p3S(xex_qwWYn$) znrbJ{dTMp3YBBSxS-LQ}LtZYW)Nrm5iObqM3!7*qB=CIZL$%g<3KtrouDCo6TPa#tL$&A+2GR?^A@QcGcdu z<2Q&2F5)7jv~uU_A3DmaHWgN4#u$=dWi}={xJ!#m6N%ydUeMgmyN1f>*(kDF=mDTJdgy;IXjOe^8lLr>Gbg@bMwI53_wduOBrRoF``)wZ7php-y?5;Qk z*einLk4Hz-Ywh3)aWUR?j3g2D4VIDcbh&5TVV)cGj%<8xB3aSeOghNa`9(D!*~YJ zNjv=mSq2oHUA_|74U3YW=9t^o&Yvqa+0LK2_*~5ndnM9oo~_i)eh-*V7C=CQF|)Cp zm6?Fgy6tBqYzMhx{vu|v8N{k-Pvt*%(Ks~BNJ=&m?#ZbYXRMS}m{J!{* zv|KhOI8E=1<(Rh+Bn?TF85uWc_}OwUm3uTj`ywA#M>v$J$re5qVM0{+o!nUB0bDxvov^mTi@&LwNKHr-X-A>|J{_mCr=TOHHeKQXG6jG0CI zsP|bV4RH*vQyCMNn`Gsbkp>kc6T-H4EOJZKae2`1PNIF7q-QGAyHdUjDrm`uG_zxo zN>anRxAMT@9QA^1%{g4%Q~2%uQxo56CcPh zV(_zW$TkY~dyf@RvqS*xL5czI8m_`sr;7Ix`9u1j&k`TMINaKD*r}4PEU&KE@cqYN zU$Hmp-;0?Uu$Ud5IL$w{y8LBF1dx$20Fl@qgJPJux!IY(%*KxvSY~cOjA(4a2BhDB zI9Ry2ft8JenUw_$Vge#uSiyiglZ_RKQZf3ktd75Jq(NW&(gX$}zQ9Pr8h5Zk2gd%7 zfD~h3fCw4O-=u*7YW>kS8Zj{eoDK-m263^Qa&a&-f!P2{A}~l3BOrN)jgtlV0YIP$ z=u?4nES#)Buf?MSDROcFyUzb=fE`3nPS1C~c>IG9>{CEmid#A)3^BPcvPdp< zxNvOKziTG7fpJb@HRj;3cy7`+Q|*wM&xd8VhE0^)3gOnW@ zsfh^_5E}$Ms*j$>N5uv=Gb<-R?^xOXZ?a{qd{O@0D=7_(hfO5`1_X@f{U2Q?;O;eM zWj6vYJd8{@m{_>jSXhBal8Xaihk*K&gA-8t8Udb3pz{RyDd0pc5I_!q(WmAUQy|NQ z`@eeJo}O4GSz3-zPG1PDHc#j(v4lt=tH4z7%UIGjE@T3(d1=_L8k})W#k_P_Q~V;5 zKn!$we*}Nue}6y)fjjVw$P>!|xC2vv?f?t$d;?lkz&Fau1;~<(!5k)>Y;2EDGdCwY z=i|3H04U1@^(rfHRqy3792dW=`O1|K$_^zgTG9@)LQea;#7lvD}XWWnr_k`UOw9QAgDOdh`*+{9g65<-XwO z4wvgVh4}iy=(RdXh{F0W(<>Lk|1iBW0E5p#fps%Ogn($%gM6yp7y;E9W^NV$k^?vy z1Tq3%A2>~bRwZL2=0`{LqZydZlnwX-6E_EtUIb`h!Nw+lI|fjd0!nKp&Rxxb1U?Bg z1~Q+{n70F;?P{%ejhG#4(W8a?5yMpeA>Rn)@-Wn%=eYfU$=bbqLGJdt&Np%!;qAKcY8DZ z4CfMRysi3Md|0djv3l3kMD z)YTgr3|zMj;%9311{%q(i*cjAx3#z?7KIHg?I*H0+k~o|QmNcEtmZU`MpGkEx&AAZ z4Wc#n5?jj^snngAuA}Ux=@fczcznhRPEaKJh9g8`+;Mc_v+$jHo z;mTo#X#VC;ks+$!6lz$tO$$*J|842MA>m*CK3sRuT8l1E$|w>G!}Vu3p-Hv+v+czdy2aON6_Fi z&TJ%l;+oFiz$-P68^KQ#`K3A!>_-(7BAr#g8j`kKT%MaTW=L*sliGB()f+CzLIXszAdf zG@He+@!_t?dhESb%^k#4(Epl3S&q=)MCq6EGV*;l!s|M-sBNKKtbD4nJYvC7y$^fj z{fTdp2B7u?$utK|p?CvXx#rJ>Zb>tEJHCM>EUCfd%$@C+gI zslcozkHRxlS0k}+T>=w8RUmNARCkLaynvBx(3Z%H`AHb!&aCU~OG1v~qTUaIRQ%q| zywde0DsFfUQuyG;$z{BGT|H-9Pgj}f_0Al5o6yJvn%v}REZz$s#J&;s0~}Kw^g`mYt{?f|A@*GKl0b-k$?p(*lOID{1az=c7ym2lt z^#`Ltu!Gzx3^4oah~9p$(GuWB$zkpMz(y2s4D&2Li2MiH=DJjkl}U0K+O=S62;^mg zS7bJU775w7FYW8jE2&k28A)_QW(3dzxKw`X{nwBJjAt31Ra-R|q}Rgw71u`-3iB~! 
zny346x~{2Ax3qzzy<_9)gKDDzCHK2QdZDRVFm`Bf^n20Fp1pNo)XWlsA=JePj1s)J zlD$4>i{t3wzvxGleBnMU3OkapeuNiIoH_2Vwi$evn&U=|HFvT920GNlI(f-Yy0-$M zz)(zvSOAMJxu|vl3W|&l>D?wIp#m+qX9JhtEFakyUR;G`Zh`jG#2@M%+1h5 z2q42A;vM<4xMlWN+;8bA!OR#VB9p%}>|Y!A^647>E5n|}p6uTUH%p#hxs=;_(+C{zmG7EmTn-U2UH2hhS5Z6q-c;I}Rq;%gJPaLp zEdIDwrrvI$xr_5)J|~#Dc<-$je<#M{5=ZQfX|ZO-&ss1>Iw^L&^4Z57_S;f*4}=5cZf*dk4vK(av+ zcZ*bSyA{pvHpq)RP=8>sp;dzD(geAAcb|6*_|X?3X81J$vu!_6Ea1KbY)oJ1K2X`r z)ZM$X=ojmAy?jc5ZW^NL;&O1l9*qJ#2cv^@+{^xrp7wIOm{Z@`@7G}lky$$c&%v9& z`qB&0fK%3+^$IG#qcnXCrax1D+N$ql2mZ@A&^cd)$C|J~1_@kK{Nv{9D9SpsC>gVd zJ8)vvcUxT2fN%+IR zj@mz+_wGsiT(lCs_@;>L2fMAnm4|kLR~GJefj)mZ*z6=@(R1=X`TiHf!_i$V>UcZB zfseqQ3ef45m3=!*=)*QC8}CP(^RYYWwU zqtRgE(-{RzW3|=x=Ohf1B43t3E!N>G+*nEFS$}kZ zZ%$FyN#xUI&DK(fCR4jhS=WVoMEf-~^!jQ{xBY7#Z8P9J zBc)y=@5{$9(v;JOJ}MIE7?har#xdR5{u)>i=(jfxyrrj%cOFHPtY5NA&Sb51uf(`L z1Uuha-+tZk>AG(Dq=pKpDOscr7tiqYOL3t~pzG!&BzZm%JWjqr``xLX zA4pFEgwz$LZlTB7b0Ur6fgNVKoL=9xIQ^+dU+nQX|H@y7tr2O0oaW!`@BQ{^oL>a> zTU4b->-?UV;@w_bWO%3v-K77oGmSh(4>$72s&V0AmV{7**KI%V&rJLHIMblI11k|g z^lLV!FNQ-CxAQJ#;11#H0PmW3Tq z5p%MDjf{aE4-g#!WH12H7e>s0HPQ6H+c^H-*+#1f)3P01@U&Lrf90;I{m@f$zy_sDKl{K&%$KF{ut`Y2{s0Casi<%CdNiUz5*L? z%+?5yKeKbQu(5C(fs6sq^Ii&c0=*ngaL48ScLMyhiI0HJz(VmA=py>d&(>MLcV_6C zbG3u5+i!Q>-t6R4Rjf~Wku_p=A7s2ewjoH$6h&2#ULODSDW3IiU_0H%4Pzpa060xQ-#+SQ6|j<+#8i4Hzq*(% z3s2H^V0;A{n5L7`b$6SiZKrOGdDbwYLR3A{n{QZN$E5fraWlgp;c!KycUj6Ih81bg z!vmKE5x}}{7(d;uX7Vv+6E~`rXsN5y;KkMVIQ57|XXPq<^VC|!Swlic28-&7zd)p} z45hnJ1^eedUtxEk^zbH!>cgzyO*^-#L}W1O_0tW1n52OSxQKsa8v@vM}pFp$J*=y<#oR%o|igfHvYy=z<+BN71XzUZB&To1qEJ0ihb zH|A}$@%SW~LMe*fb}HWP5*Z5;4dgb&WhwQ zS0#hNAv*7$_g&P^PZSh_8wk@Lmd;r0`o&i15$J0C9AkvK&Xb&m8G=0X&r!)={w$}* z1%s6LYrZsbT!{tanz~O=_G@{I{?K?KTu9YOqSIGt-jRvJM1q_+t@S#_`>RH966ug0 z(8`PU!PAY!n7GoHf%Isp89GxD#Kv3hH^N0Ba+C#ys%7@Y1LA$1?jlVd0CXtjL zO29oV^(EpnxWA;6olV_XhkWl9&F8rem)C-g$7@qL8v!x4gGjFj>LfKOyYnJ-7tpsH zgtfk%QrZizIq+n?&zB;#=q4t58)t{Q1eGTFf!4Z;7@8mo6ecK3cUVQNFBZC#%G~dq zYg&E`0NqhA-K*|cR=RsA#)I)f)-ULnkn56CL)2FA6--W2?5!`QSvcxLd0msq^kaI_ zHA&`vZW&>MnU_RajK6q*w8LEA1gOtV6lrboYQy3iP{^rc-XDeS3f|Xel}<#5fg7xq z$DP&?ap?;}TV3?iNB9S^xzfnyo&)voE?v}RhHX5~{r;)1LBy!DOTESvm-B3o^>538 zKn&3^!w+PS^>5D7=X3?xD6~>hAEOiDjdkj~Fd}x7wwY5me5c=K+&FmyVZx zzMF+`>Wo?@xfOY45~U-tlZo(}zJ zVIw``b)6T+0gu}LlI`MUp@PG+9q)VjgnHccV+lfkh(T*P_mcG71fRP#kgKc>)GN_0 zo74dH7vmfIOR$8p)njOrA-7VNPAogZ+MHJ^B|6i2r@usCdbU)rf{${mIQL1aezNgZ zhl^_6FO#{blKW|Sr2J?n>7^Ew2`1%_*Y68)W`Xbvqm?5(LN4^|(){TFmvxbV%3%Mj;Wn3~Z(&d>)+%pl?W=Fr=7HGvy?K`5domKwksnHb<*S%B4T{kGJSI{`;5X1r*6 zw#)EcH+g*Q=&77}D^;UtF7ERV&%v{Eo2`UIDBSc^NR?wX^zajLu46gK-4DYfG^CJc zvHeu|Y;SP9aZ-dfHGJXMp>~|(gp=y21W&%ifIj&s48X>*+2$8VGCZ6Mj)Vo$7u`%8G*qZoIpzj zu(koo+yB+tONR1PQhx1sCGfl&h6w_aLXP1dzUIeq!4qdONdV*etJDqnfJ|6`h&mv= z%#;NTl-^B&uy$rv4t8b^p#H`Thg~vT;kSx*8P2uWVO)E=?6= z77E*S>UQ?pyGk-=d$YzGB_xW^SH;FzM~&pMvu4n?4D5)>U%Ojo8C)e9JfvV#6>dTF zAh@Md`8fr~rX1@U`Y|#Q@a||*3VREEj>MeU9KPi_r>jB3Z1Wo)nqDlSH%Yz5L6p}@ zr(rwJc(^21K%uHy7a*7RWAX5IY+I6w+ODrp=)*XGnt?1uYI-F}H<5(??QNCfMP7q> z`#@fTl#CL5V>HLcVUQ#i{c4AA! 
zlBhlk_wXsAXx|^*7m6c_@$p;@saEs5CUB5h#w%=z3>45VeuAjzka6v8*CH6C6$ zIbTPAUYp)M9Lrnni}EYkR3c0$=d~EWgdS4^b9RI zbJVK_atFN4L0nqdmnY;87l7lN45Lop9}4}A=F#<;Bot18c}yvNJeHVi?h2G^cxJ4w z?2KzKWbtYv)5=eY&~3fG;IRZZ0t+$AWHsMAfc*G^GJhwhKk%ZUmS!ZK6r`Cf?q#R1w}B?i=^L9$WWtVHs=Lk^;`bB6h^x$hTxg!WPy@Ch{qgLzfabZX$Y1u zc&;mZp3gUSs+-8V^G;tqh(n>-`^G!HiE?u+4MyLBDVm?EWs#_Q2oXTe+V&` zl5ZLGD%r1@k!ql$4EZv|C<|%nRok#7C;GE%x;sdmM(&^rQcqi7M}U|;6UY2LKXh0e z5=7q`h2w^p4?t9Zgbjm6VcbFUm83Zx@UPq$`Kp3YQg-4>6tz@KsN3mDvAOwBi3z}iN{21ph zjq76c!I9}S`C74#D1enfEW`wVaADgL(55j)d0U=+1sCl)$I|l!Tw30aq z>>4*Bd4$x5?vcS0vzcjLOp;SSI}T(gj@Tww@5oK83|Lijnjr zR%GcOqO?jCgPa*Q9s>D{*f2W+Kd)x1>-{MV1X_ba5IV|*spILJOJ{Vky;8Sr4Hg4J zXXGH@@>Y_feAVqJF9OcEMSrzXm~X+a4tEvU4r7=H>^|I8Az}pogtkLCEpT&=CWGII zHza6SB3^$BiP(9RLB#+~s+9qDg{Sw#Dlyg^#mF>xp-@n=HY;do0?tF>`@SMohX{sc`Fbft||0n=NdKS)km=E2)D?Y&j4U!UGam?b z<+^(A`+5vXdk&+TY;T{<#nySJ9m@Qw01is$Qu(Oa;N_H#PvzexGi zdJxjf$$%};3$la7F0QO+w6f;-fk^`j`I;;@Ld(j6Za|bPpvhMV$%Lvsj@e`8$OP~y znFjUbE9>_~hIA4*&@ry^mkc{q&bpMA3bq5!m#7#uKN-)Mb+}Cq?m1WE6{q- zzQpiBBdlWGZE`AuG&$4HYiQwJ&o)8E2mPelqVm{#yA5At5$5aHye4-W>Y|UW-^uGE zXw-Pls`b({w3|0nzrUeWW< z;%mXZ9pa{WPS1PO;9rA<1RT>c-&!U(Ao*@mhZ1*I9y9ie4#U4vW>E|V#!(NvFq7cD zozUgiulkbOXj65N(h)aV+|=Q$NVByd!p?Z~t>qX3X5B_KL&QU&!}yOR4g_)uf7AWO zJ`WaHWt`vAK6n*p2W(vby0SWXB)Ynsp2*B6_WDQ8Pl5xN33tb^yG6)ys-a{_7ZM@n zU~cf2`Y9bddI{E(yoZcdE;=iT&AxTOGV-7H8Bkm&2|l-tO!0-pSC%$nO8#@x`YH87 zf^|x$)`7FS~D2H>6kPp!HU6d|+6qM(0(q%86r_#)s>*dy@MqTu0N3 zn$Bzq5?9TS>u~9RkX$eSJUtQ^__)~a8lIJHv?zLHV!hG)l_Z6tmi7(oqqKyf`OOm% z;WHWISPcfJ8pV|Gt^S>_GxgV!NktlK4Wg%NS1CC&w{s=#Bp!N1)AHuNa}Hab7HvrW zKbi0fndq1B3W;?VZ7sgI~WH=lV=qdY$Uv@lL$lD%vwU`%s zbLspWt#Ro|)+c4(Fw-!e--%l0lElQ~?%&#cR!q1^VvkoN$&Oca6O1NqvK<_6^6Hnd z>QwAH|3f^^XbEv)Uec&!RO1ox|{dRaUS?J)_B-{w5TaC5?&cG^{m(9(NFMr#e_!-m8!oT&FSC2b$W=D5v@Xm-w|$zpXq%j zzS@C~JP7lD)6eUG@ykLcJ}Gla-30{IxgX&ZRL>8L!hnqQRAI*mX|bp6{(@$K5@HU0@uz#w1{OLXrhyg+R-{RE4|07O`{r)pR{$7X~M5ptI zBmZQIF`66GFRbZPWIzDL_>RPFiQDI4>c{SRD_koi<@*b2#Baq|oEB*kJOA)-IGlTZ z)mF2ezPF%n>nYPrqid5j^M?1cyFabTl>cXlN+xS-XPPwkO|GbRRF$3&k@TNKgv*{B z^z0XbV(mt$7G{X@0`D!X`5&i8mz2`K`vA<)VBpJIu8uRYa)rZ}9XM=1feVwHS zsb!gYBlcC-;W_=(Etc!7JjIHwovK$IvO1%QPoO(pZ4t3#c$e{D4VYkccmMLiaGTHhWmH^u ziY06H2L1=ky+fGsm(G@HSKp53Ww*iF9h*B-{ren8&ju`~#(I`?UbB9DtxMl|eH_ifvqS%fQT7^{U#iNfb#MifdacgPZtW5ccB1p z1ps8A_Af8Ogx3(p2k1wjYB3B52miA%?`vKhUKYXO0u_YO^)EzZzLmuAxDqsMnRy8h>9TozTYWV7Y0|}IA-B#Ys$pTF z?$?ok!*0uf^<#T4o$N4|H$X^GDbn;4Tk$+wF_cXiueo^epqsJ3Qte1#+XPFKhbAeh z*281h4_R6K@VVF0Y_>k29c|FmcEoSH@LyWZZB!3_z|eE$(QrlQO-JYI+m?0lZxVXf zd#YH+n(}!X**43{x@ztSy>|NaDZMwlR&eea({7Y_ej-J;mB2_>&>g8pRlc1G;>jh3 za7n}aA}TGadc!E`2;Mv7PD>kk9ydNO)Jx3-a=erl*EZt3O;O;cE;6DiLgqb8I>RLX zjPw4tpHfzY@P%`j_g{p0UM*W-$ z-%UnM!eFCup~YicBV8joKff1UDWfXP<4av+cp3EK$ITwh87tSMtoIXCil-SgUa0MC zJah^g%ey>V`hg_5?W*vl;Ft2_mOB~#-F%BzUJmo8A>GQ$BCxid*l1bu`x%)NL#i%4 zf-|XylW=PL4Y)lT-Yp;<%|-F@+)WUu8EjG58=@H^^Gd1pol z58Ym&&1JIs((fmPgNvLa_h}J(G9-_Rp1j{iaocWT32o|#c@zC;3pDH8bg-n78ns_6 zYH#P&yDZC_jJ3pp6ke$)fBJ*n1}PV%d=Y?HTI44<{lyszR4O_5$yofpezJ+7OViSN z@@;{>6^@po`y|sk>3jVleO`W29;kYB*PHn=Pisrn>Z!3>@KK0PTRN4HlKE|s`Aor| zp$skx^M0OGL&h2-QAk^)b^wGglGXH%{W6!tpto7Eoox=XHYl?I^RRYi?^q-5 zZ@xLe5zC1sHQ{2*U56!5+`5Sz;WFYd!gDEl;~uxtkR{w+6T`?)J@;Xl;mF}pJL-1B zvS-x&8>T@Kq4C$>$|QeQ?^bi;@{lq&!Ff#g=-!nnHEnBwo> z4M(_VnSvs|yuGhIIPmTekj329=)9@mP@K;$`napN^EjvE-%J>e|x zpE6b@s9%Z4q_1CoEeLBUnr&G9lsZosb26`+V=D^#-YF8qn=lCgEQqd$XC3TEfpOCp zkuO6Z@Q@VQ=8)DGNz=!*WZ$-e9KPdOUB3RJT=8=1IPOk1t-K`{AM4MybvyS~R z{St4V@E`NowT>IU+E>f1l4z*E*+?$Al$;)}m@#fV*YyAV$S#qvpF1KlGRw2cy3ybx z)*hp2^Jfy0R9=#F0?MAL*B9#RWN_U+LOJ>$+nfap5L|iDzo-Du*1KwqG<#GK%!Z*o 
zdcpc1F5DOGYu$G>ohNdvg$*3d{TsPH-F%zcSR5qei0`nrZz`j(@?ss2QbjRj5yH0ZrBS6U`LGqJJn)a?t`$HXD@{Sw7y%+uem);$~1 z33%3w6B+SGd|rkBjLqM!0z!`Zsu5S{w}@Z{Pax5 zMY*S|$3n*XTX>@HlnDzz=K0l0c^9_8!uclomWVzpfokr?dd2wuU1&*)YGAec7(%rgdC+9 zviTVTp5k{Wx<=hg;EiAch6GE<-2T&wRg2n$kY7kHV6UR?0zaQMOcG7h=Woh1bXwKF zQP6WXySgv*x5WShSPa`}nSMh)fLuqI!1;j6zOe}!%>z{SfzG}F0)gh| z1;R*tXk#Goga8^z;H;eoq}9Q40f`1&E*_u@Wz6>567UbHa^oIRirn8N@b zxG~fSSO|{Q`JhGsLyi$}T0&m)KiFq}uMo*S-eQC%6fjV`fo4-^eMh<6^WDM#4%RC@ zQ;9jL11lBBSvL9ctNFBtj|Ouc8nFcnJ5k%v4acQ*J_2|L6pcqBhpl@@_Bc7MmNrwV=iFiF&&wea{ zqKx59U$hN1^qg{Fa;Sof>EnesH&szRB#wMYsLMK?D&qAf$hD zBAPfm@3yuCc7iWEb|Ng$NZIzg>$!MwBizUH(e=_}bK zdEB1H*-?h|9jKJInAAP834I1+3FB_BcELyXlnC87fIa4%^4h%WfeCXA&5Z?u-WJ)f z?d#gwT`g96GU?s8pIM6=CkDcNx+cg8Zm3g*t43V+!m%ON$#)gB&x$SnaXAA{dg{%B z!rDFiY1K2(a^|U-Lx|6XyR#f73cQkl*p9qu=m8aY$qqXWho`vTCaUNvRgwkeDeKs=KTtGG+g&u54?0_ z1YCRVbEfb!cJQ;PCFaR$FShn|b7tMwSP!K$3!o~{VySg#lf-@0!xcX8!1Js6f<@b{ z7K^+k#B+2G z$`{GUeIm3z9Ez19NMY|GN{ULpk%6{aR3v}VB*(?_C2uFrV0CQ~%gDfRO-zMxD&+gC zK87JB*>4jX33&8E>-u$W}rmzmg?veJ4|vpQ6xv^ z7dOcR3wXdXwG2|1hnCTK4WZq;^5HCcO-rZ+9q&x;a;3WOhp$a&uF|IeqQtFxurV=P zXF&m)ej$$;wZKTC8*4AH#6P}CPWeHu&?ieAhs!?YCo9zNtHtq2IeW=4;`JNV=hZxy z0-q04_+Fw=`{q+Z2(2V29K0f6hyJ3R8tKH6CZ_w&nlSmy<8?5z7Gd{Ai#3(*;EK*X_+*!FQFJ~qjuCAOU?1yefhvu#UI4qA$ zBVI%Zr0U-km9VY4c=>3nq7y8T@09oP9=!_l+*WQ{Jg%5IQ?zs?(>Y+iAcobW+C!LE zD@S?utFteYM~GmN?nc##pt&EkVUt@atwn>4bNU6Nm&Wym_tlI=sVB+9)7^7!^Vrz7 zf|~q>h&S`0P56A(5-nD3D*mL$CZu5|fIr66_#Fxh9J{02LYf(%X2H&U&p>ObwDoa2 z!i2=|>9>WZ9RF``xZsu0y`P#h%I2jNtXu4#bG8Ui z83kC5OOa`zNzeOLCY$6U-HUwt`dIg16B(={E~sm`ZvAMjHv%&)T+){{(SzmDM_qB_yw$oSX&@YuUeTYt_b29l1HoF{7gaKCf&P97n=V=6TfHHGqjMxfu8UF z%XDeV_NPpjUrNX7t+eD+vy?Ec()I8|qpmw0x$;X_a281o_u+aiUAC6B=->_{5Vq!# z6+nC@7x#sa$+`x-l&&}A9$exd`}X`m=Y5W~q|6lq5JklOH5bQ9xtJHdmG0M?k(Bx9_d>z(rH@r*>v=vGX~j;F#l$RH}-!f z6?yb(h?ugczMJm*C182>YqRJ~T8AsQa;bmlukANAxW4&TTVi8yZJmGM+K?tlC>lV) zV7%v-Qxk9?42WY0V{qV(G6o!IBmx7jgCYU{$QWh}Wc&gC4IHkccwu0x4rp8d))_Y4 z-5}X*7TWe^tQJ@gi>D|2@C5~hBC-0@2@c-bwTn4s_ zchMBx1-486v08w`K^*};1V-!;WCZ}@3^2IB>H+9K8YAIIAXkEhAz`2~Xn=ZyfdhI3 zMlupqR^~$qp!ku7{~l@QH)1gGsu3H}5RuBK3)o~TgNs;yTQy)TD;Th*OFZC!O{9T)E->)SoWJu<^o`8|@FX-YT|G#4~~x^)!WCAcj99Iiom=C3{!d+yNDU> zKbeobfU3m91G*K?hc?2H^v% zfQH2j+>RFxR*XYn zj2!zP02Pd3VPWP(N}$Uc{%}Kp^Z*!CBT%FY^Z*PDC)ALS4-FV?fN;l;0NgK-Qv)T% zk%n+EL6AIr00YWr0uZiX+MsxO;K+Z=usc63h)F(8*|kJ_7QzCWbu!9_Z-2?^KE2 zf%B7hjf1T82aJwgE! 
z5-5Uz9UD*#29p)UHX{Iz<^`dH-w19Dc&dP64MMUJKt%C@%Oe1r8~q0}K{c3|+WQ}L zS&T(Zc`QmQz#`W_EHVOIQp}DKVFUs!a3dHmAj0wi7AFh_D9FZO2g%QK9wSVEuk-ML zJ+6SU5t<*!aG{Nj5a|CTocTXcJimH{zw%z>#lgmhTw>0-?TU{|%fM#z$*0bE_^-iN z4-Vd_!(&uf%paFzW6w}{~CY@fZ;KRsvKMZxYzdwaL*V9 zIvoxJI%^;g0$9CZo|}LPkANB>k;b5Fd5}QB2E~V2J^-FIzY%~!V#pGJDgjPZ5OBnL zv~=oHEqVLF;pJ9L=Aoy%dk%L})56~*fi+1nj{l3g0R=c0ajSp2$D?<`Nm`8F{rThA z@kzA2dae@92!~bD46Le3X>#NF^C6)l&LQ~rX-tiWvt+i{mz2S>Me`>pbq!R5wuV|` z42d^qc3<_qv{(06jc3%+_*aHn`erNgx`d(BTPc>uEN7b?>Kb{0w%oQYkL!oDKeB`r z5s4=w9LydboYaIgR@InVdP3eXe(bHwWk=R(X4dFTHGKSdDp@J4O)5P;bU3WpIqqT= zv5{G8{-uzmE{TJImC$J?BWo^8+S6$Kb;lCkb*!sxx*9bcily_#@1Lm4MN;|d$JiLX z3PG|cE6Lwqn|>6jzn^H+2Hg+qyEiEJDoPOf8lG*Xi7z>RXNEEHyZ1c%*-=r>Mh-fn`(w1A_a&n3vhI4UtA=+ ze(!Sorg=LzOW^dHeV(sTZW4WTWUkjc#qLwZz$aoW9#(Xov^OJS%VL)#NlXgLUbRyR zsCfuGW`<$~Ix&*7j7<^G@|g6FK2Q_4=bb8sBDiMlxR=?}L*OABEz5DuOo0@5?qK z5PYSVriNB1)xs-Z7&>tr37MMj_;zq_UHGscZ@eeMRN&PuE~z>->!=ZX0}tXJ9DN7h_0L1US=vLM$NYvUXr2_8o05^DPJHArE-q$W{~ z*wnvhRMY9#p7iF+^M}_ix2|hux?th7Msf!ZCfx?`y}AA`rZm1|Fh!Xnif@91zL6_SDTd ze0`4{ofQM3`_#G2fdVce_qlmk`sC@KNe4RzTkEBvjCYmrRUxh8(6{4aJ5h(F2!YqT zw$Z3&S-sZ13-l(pCA2G(bU7?60f$a^+#i|kH4sn z57^M(H0P~CBo^d{<(&qQ?uGNz5txzPIy=HIqmP>*9$o30!;OtW8Mmc|d`N?`XArmh zMmk=-_9j=9u334dx2J|E-}OrI0yQyAlv|Buva`t@i$Oclshsrs1?hwr56Z6mWO1YS zs~qw77LU#)5xR(A38<|9>65x>CSlLTUJ2cvGM#O4v|C{PXhYBTZUdJ!GQ!-cAs(0P z`C{TI9m!aQ*U55QI7QCZ2DJF>#14lzi=#K$LATburGBm4>IJ{vn{`#e1u@wCq5_eMSYp`r0n zD&mC1=mge%w$>Ka6)PiIZ<|L+NklkrCLfFNK);T)Nvkm}{k%&vK&2MedHb7VwX~(@ z8MaYHvD{c&HSIj=>Os5`SMHLU%61(Lg$ehE4LL$G?+o0pL6{{@2RBkrGoGf_v?X9Nvn`!SSy;|MjXZMVYnN?F9n>08>u#2#=aSq4%vd1{Ifh6R)`- zuN8wk0??L;mLYzMEAW?;juf?l!GGfR@Xv93uubznaC>#G0IY|B%>XPF!-7nAeI>x{ zMJ>zz4?Yj9j4+IVzu(^Mt|Sh&?p*3#p{&>xefL!G-P6H~_%{4^IBK^Qwg!ZD6xt2* zdEPn1CAWSgI0xK_n-(TDeQ|KIq3; zM5%mcOzo7QayO~DtD~0)S@Hqxj3_F4s=DY2apvlS^vE8G(@MUH5%oHrTKA!bWOt%1 zbV(Fx1@5@j5wgRRER zCF-+J@y7!I>>Uvl{yzYF8_lLuJfb2(=aWEkHXHy7$>0?8Rmyt)vr;5nlMGgx2*K%1 zk=m8$Y|`Yd#?4B3(@}xUZ!3*I%pvyExAxUUzfX6N+2jt$N}iq0s=v#XA{&*C&SP?; z&qXZwz&wnTi-A`J6)MHPzy3MO_o#t_!T)VWvX{xy#2(?A(GqPgZKSEamxzk?9>yyw zMuAt@DIWpEURiikQHb-1uC<)G^aH5?bIu33kS>Kd-qa4hn$3($9+HK5qkJ^QI-L|HB5S~ks^vCtMMbuT%TI8v`G6=k$I`OR`1q>Bj~oXLX%CwZHMWG)tO#l ze;4@ZhX5VPhh_-^9iDepH7KYv%F~O-Er+tyvx=@=a9^WN_kCR2hICX*Yu5A@md~-#_8kKiR47M z&TouW$&HWd3M~~A-AMnoBBFCg|7)#iJ|4y!!;|mls7NQA7W=Xd!R21fHKCBk2vl4iCn>cK<4=-bPruA2D zVI*}`MUf%zGq#6i+Fi4sW$J1z_+?ydYO5T=zD##J?R@!6m#BEFb1aPcz8NF$WvXv( z_btz3%KMXBbxc%Ro1A%LBhGL=LWs%nB**-z-Pzr|rF)#p7P%%P3V zOm{u)*?>G*rqKEq$Z%x%CI;t@fffG`M9<+A7G7(dv>zUPez$sj0AKE`Nqv!`pd4TL zV$D6JJxOnmirZ%Q=QUIi40Wd~K9X*;_D1f7(a1=j}2?Fr$K+XgWg~APu(8f>{AFr_h42}kg zZNB?`nk@G?xm4xYi}KZ2;EO?_iuoM(MpQoHy>;kBLVU%@AWe=jr+zO!T0VK&m|OX} zBM!-kO@mm+ou_)zre@LOasAwR3n>U7klqr^@38Pb2*mq@w$FwbvNzKQjx%@*z{{H~ zpl(zXCUAC%f%0LbToCVDAd%5uczP8AG1Isp_6`VV(M4Q@1TupEUWJ4}(hhMUJ^bK_ z)MW@`BxFFwWVoIN9+(HcSVlAlJHj>4gN2)tyavY>a`fQ=5HlvqG-~XBBoL!PpShhZ zLJj|>%4NdKBLGs>AWsK$?)X6_$wUAU1_5S(2MHwn0KyT#8o*Tqjl?8%K$;Acx&m6X zP>{_u23$fE@4AL`>MS2w!-4#Zc*e{mllb`YOStBg6C4j}Vqy9pbKhrUL}z0MJHOry zd!GCF<5`b#B`>YVF7H`zuXXY$^Kw;!-^fVOs~umT7FU$s*0czFS(B}{+@n3doVSDp zZ%?_|?~$5d8|ZaCxl546{9ku~caSb^;30LlXqnTHiE{xENw!{uc4EIQ?I ze+fXmV|!a3t>D}8dC8kwg`>Hi*cVP#Zhx7e**nR@+hud>2hp>xO}Sa?AhedWyW&Q< zgIDc*BHmXMh82P-STvovn^ZZ`gmHQZ72U9+lIew&arz(3F3_ov-f{Ey{uYZa~7f#Q?^Jf9^1qC zebC!(Pau0~D>d7VjOF>){Yh4QmAy`RsxH{VIPzVyYd^l}j<(5z{P$njRD6aN*FUh( zl0*b-Zpj3hGfb5s$!Q8GIlBV%1aKQ$Rr|%aBUxTi8fnnT&U;HP0a-3qe(jouc{(_Y z9JQreTG-5{_tAywLOe;io1EP@aXFGgb5LzKGn*`f!+IyT|w=Q!M=)KA#EsD^|CHTP;pPKxKphjBzx zx9Ok6oi*vx`n^ckgmC$#l)Df)T>MfL0Phc(fQq_gs@!aPW>4{Q@kz`8;ahx_EG3}J 
zW$URrn3&C{PYiUq@SNhDX_WiwK3nG5q5xo({WWh2{YC78_XDQJA^+9tkUP|<09 zTE4ttI#*y3ImJXJY{JQaMf>Q&6n)I&_t-LdvO1ZhKFgn2D5zC+o?emMhcKz~iMsgh za(dI~?H)S@ea61_AXKMSOp40jt?uISK(DTe5V@cD&D3{amhl>h^~y1V%gNxS*i!@MtrR8s##x-rakskbZsU6VnNDef(3A|-ejtnV z;9MSw%V>6Y%^|5wvmuVUWav#_@aPvOR~kK2J>#1K6w^Q`DRJxTa^lTxe;v<1b-BQ6r91KwOEbc(pp7avYQu{XccNq@~{R)kM%va>uPuI?bMa;|()U zv4>=Q9ol@yo|wFy@-wtv^=4*Aj)TI=n1D|b=VOdsQg<(bvr>5ZdXmA)#Z$nt>kgg< zYDsn2a76`*p*CJtzn?72GM$RQ1;T)4laO^qY)ZQG?x2s0d4x07l0*^Ww4wN^*{)xw zi;m3@jdiR`;}B!_I4!VWm#R_E&an zc@HDneP=xH&S*`3!WqBUxo0Hc_b8@nvOri-zI@tMF9n%sE`IfJ>B+C5@j`VEqNOk# zcY~PTHiuIrF1?zkiEYwHYZ3wp7kv$kF4t|0E*H<@*^I`*Lj=&};`E{=WgM<^5$Sbu znGa;GxJzsL%*)L-`t zXbD9Mqh}lalU%&K1X4BZ+1<}>D`hSHFaibXlHA_x;W~1*Nr#4E_ydl~9@mnhHFI_Hfpvxuu zcU>-p&uhHj{-MjI_;2WP4g8_YW&Qz;&{pi^EXD|Vg(k*F^u)#6i(Cc^&%fWaqr*e4nPM9-NdY?Zz#$;?V_(Y>3pAjZy{-7oJSXtH&USXV+RHx$;7l+^Ocx}R1JydaCznrUI(xdI@b2Jw?d|Rl zwtti2Iv3{3i*L}b;u~4hZZL1Z^%pSc30S_45RP}80n_pMJt5~k#p@gXf8~s zPEO49+dXx>TkfW&E}5J6*=)Zr>{DJuL$wlIw$`ymPZ#tH*z}ZC>;$PpASzy%r~P=K z$3;746h%%4AY!>gPuGe7xz7ajTyP-^Bky z!Ar8-p1q5I^%S*(m-7>Rl0s9J?93{KG;Rf33f^h)nCsIaS}9h?&I`>{e1IG4E?;1m ziry^h$h_;WcGs(0T_(r!`d=<_0k47wCt8D3H~2?~F_#BA zN~KE9rBhr>&fQu=iYvhZ4-+%=;JGzQQJ4M(1c?UyU|G2w4jczU1MmZ4;3t@6KvNFX z;{uX0Mn>S67$8*u<^8 zdEu=w9u-lm$`!FD6b|-}k8vmRZ{jbeK@N|zlW`Eue2y#j6Cz8I5IqqRhr)Y^j+_Sh zLr!1}(9mvFh5WRHboY5QQbFQ!R&_?ei{8XSkg>O*ZZbJHUTn+h;&dvHTFB*%_`bZ* z4fdnd2Zcioim{i4^JZf5Ooy&aCfsQ?{ueC1F=+S7Q@^N4@Ld7IzuOHMf^d+4KnNI_ z7y=<30HqQDRh|L1jvu471Evbl83za*IKV9$0vH067YGRg-XI850L{V6i`p2%)Pug_ z_eG=tKQC>bxc8B*`)fO0;my~`GvSq2rXDfHG@S3dK3B7>w`n!B@P=JObw)oqijS+< z*9LRo{N_FSZ6+Q3)~xTKTb1gRbC8DxLeY&78KU1cqMH;J)7lImv!?Id)jxMiXQonK zjaW#`x7tFl)m47vIkC#Hk12>;tc_Dl3aY^R%ij$^d+!Da6^DcGcK@Tj0Br{YC;1>$ z2*5yt_+We}Llc0N;e!G64=)c8A>;v4M*_S6bp-M)fC~W%+whto0fq$l1Be?Ny@+dA z;$$zrcqsE&T_i^!Oor?Xdo=~A`$W#& zSbd!X^zY_EAoZHtB$%er7gc@QO$C#_;?c&yHO#wW<7+(M-LsyLZCxc5TdGl5tBhjp zYq%%4$ykwG@%df-YTQn}q2Q+x`C3nO6qLO1Z(c17THE$eMK&0G&7Z9`L;#Q;3TWH| zZUw+*01-k^02_uh;V}X-@j&nqK>h$~&In*m04M@rHGp^_C{zcg0YEyz5GMQ^{h+lx zV01k-$koK#e~)f;F|Et}q_dRWp0=V%hlO6Vb*r++wdODHM=u*q1^!B@WbM29`Lh=s z(;m!CufR0`%v6XMXzfDmd22&V&)=Dgy@$s`i~UoCtq=9cODKNlZd{hJvfAV$OWjY2 zp{#w>Iqx19Wv!w3wgb&@{u)<2&{E6b(l}S}C8mFlD-wf!f*OKy^Y?c)O>|U)q#Z&Ak*_^5 zW$RG5pPcUwUs)bMkO>|)RT;8+vZi+481Uel0*dACp?*K@O3N|P;hKCt@tScZzG>Zk z_eqJSeQ%nQe6hWY+Gf}|3jVJ`Q&!rl8cUv!P!Zt2GyM^jFeyQRTmrloei$f_fC5=T zAST6+0AM%(nS+BM4WL~B?F0r8K$&1XK+y}7MmI4*f$|D4;M)IdtN0<$;|L+{j=@-T zM3NWw?yHszeXvlPo7?6j+$8aBzwT;6$SBIgTV#smp>QG`Q`R|lLY$}!u4E^#T#@Iq zlH$CT&Fq_z9n2$|5Quzyc0aZ^Lh0FN>KU2c2RDTX&zNjR$xyZXyyuC(OKj%$7yjn1 zNYGr4la{wg@I4v8Da61}fRhve6%qwN`3L}-#NgTZ`2chb2`)-7P|*TJC=jT?v_+sm z)ptG^2-YxwPsJorc?IA`P?LWPSM(d>2GG+yMWq+6u>7l4Z?#l4MDr*=W1y%he_919 zLm)xD1!FibAHYzWpi!XQ0-T2rz)%7D8O9G@M1i`E{2+3GOfIM+fj}C9A%OAm0~k0G z^`GF}FjhS&%`Q#%@$varjBF7wOkn-MBMYpbRLXvL7Y`Bzd>IM_mo9!xH4T^mA1{E| zfh#k>jv)Z>4nTL2JfPeH$b*6p0(2h0_=3p~wAA?E0#Mk0Y2N>(dH2*e`((T zrFsAVNb^>4mK3EfY-l#szzMlJ9qN7?Gv@NbScS_E%GzT0P6>}*D` zLaFG%+VHa2z@1vD*srkP6mP))7JyzIcseE8G;m_cZ~sm^oZ4Q}Iw2y_yFoF@Z_5I( zwgF#yLcebW!y1`u~0lC;Mg2&McgwZ`-$!vf_nR>gnJCzg6Whw|jWMvEKM* zXUC?))QYQ+cv2u7d~bd7+{Q)SPm=^QN^|@(O5a7GfMJen@h8Ka(yuEdL56eY~qmBbq6Z>ld7pqcn_mXgSw`>$2+8?XYPmI?b%0G1_(^1oLA zCDxf*H#Wa({;qXibH=_1g?b#(9&B5uD zGNIeBxuhtbhG&o667C{775j=y)vt_iz<9L{MJzy3x3}j97VqX6pEXI~!Eriuvv0csNuu8QhfuZx&&sdYHi zKmO)1pb?N{>MkyL?}H^)7H2_Hu$06VK@@$b-Z1yAbdIDuhs_sdad<#I&i4=5mOqL< z@ME#)d|H(BK`-v5PuXH6u1}2a;ltTODSVG6*VK!|k6N`ateQ>AuPA zmRZ#Lj>$`Go(MW|v&ILdyEn(HWE5HYXd`qmaxZ+OA1fz007Qm{FOS*z+70s9vV%oB zSu`G|949ob+O0A#3dp@&nzLVa{O}PypX&QL=-$;V26=z@B#AVG*AXKJiey*RLq~gS 
zuup_Z2j}LbPxsyJjjL8okBQcETF=r29ew2d9f=QbPa(Wkh#6amOF1|;x4O8btMToq z{URwoNJoZH>nFT6h_if4dwHZ`fxdXVsOi_t1)hG=w)c1JNSZG_Q~Gq(=GoC8gZ4$g z<;{yq${)HGGrfJ1l5W*Y_U@trWXt-+YSlQ&&77pln9(SWpOcoS-^sO?x*Wpzmt+{( zZutaXxzF)Np2frDES%1uJA}@R5uuc)Vze>`ig9kxWPE)hx_94)>OlCRBChGYR3XJy zb`42s#bbdASN{(AZ$a5>D)0S;>}Lrp3TB^nO7k2(oMWp@HJX(|+28zR`z?$$mcroh zN8KI#E`d5-UJIG#1#HusoUx?gJzq9`39o8=H!e>h~9`1r4VzC+OC){Ob@lAMzXx@*@Rw-ZWve=IoSJICmJS{ zJX-Q;_r`yA9TmU6d)H^^Y5w$LnN?l&TKm+BTZ{6qSIDZ!MMFG6?R?+VNi;3~s#93v z!wg1vu#-fX7~)%HkzMrQuIkn7h^w|fW)!Nn@~Fcnf)^M4oZ}y1bp&6&RDx}Eo$1ZD zuq}$#F*B{J>M*GN&kg=h0<2_WJ(O`0Yq=$odoPDKBdDixosbld6D3Pn^BAsYD=^4E z=unv3RKoJUemmvM1iRg3YpbMGLPx=qah*KsJH`bXRQH5SmIW$a#%!$Nv4Sibp*j>QP)-^`8#Hx z8>M#CD+AYi_x!xd?_C_KC=cdcylNsm$Zqx0?PT31#~Ou4LKj5zcBm#ukwP5Qq9Lzr zA?kUTAfbJ_{qBG+a=Yj```hmc&ae%r96gY94f|(oM)b`n)cyj z>L%^iV$QZ_H2cvCsf0VsZ`cLkG*>s|CN0*V11Xo|tP4QOWg$29aSDRasmQ@?zht%; zBjtkjGhEG!MI$7}@4}geX9GUW&K&>XV0`La@8Q&k;fY_`SoxkJqO8 zGZ%@aVD4L~;2fX(%p%ha*bOFEu35}HwUuqXNGM@u@2`-jdFSzCXvaM>h;5W7L+%T# zVQjs0tNv}R7ksxfk%ceFi!SV26QtcgdGO}&^!j(iTW;&jizFfH`L1J7A04CBOCr@* zCSi(EOeUW9;uVH%z1XKK(#ImrveZvkue@DAeBusFr_~zN+qHnu%R!#I_tz0!-_AbF zKIztfm)&<-bm_@AfwShXd6r{qN8K%66IIy6FN~m&Ovv690{pnR3;1MWblxUwC$nj0 z+nt0F7QM*(L(X^}T(6vN3VKmi3e1@5HNB9#-Ct$b!aO@<2)%XdP73^B!~IC^)+)oZ zEM?A>lg)!y=fve9K7GlQuN)AkJt0VT!b=fC&`@!x#|Pp6(toV8ZN*f;pq8fFdiau35}E zE-Ip!#e_L0jI6Gr^8eM$?%D2X&iT%F-hSU++*POMexCcTy6UQ{>c*(!E%Ibsl~OV1 zcW6yl5E!MV7Q;_8qZJKUFS`09p?fOr@3*>oMH41YmYV6CiBFC0`-pC%6BbR4e{I0K zI*UdOxaP|r(;ctKzV^>+GK*vfZawkQZJyuptKDYbxcd&rj68hnzVDqk<=Yi!%wK=~ zr@H^vZR4B&{9*q+hW|F?(>}lKcu>zDd(Svy=qhLY@YkOG_MUKf!$~_$m^k?AyLS3= z)KlXR&7J%00oTu-{qdyD?p^!qw_jVZ>z~7a-Sdv!&inS_(&V0}JT-8~&AuD9T`v95 zIsGO)_`}3A2CsDA&nF(%WzX|wp77Ji<$k_x)cLoKf9To1cYpUy@0s_1{OOL@|8(!$ zpPzi^=TCiHzHhC`mC9w8EIaV*T^`!y(sfpw_v~4lo%hbzzaB95n)4?<*JHqjJN5nS z*?0Gvw8n_1PPpgc@mp=PT$jqsgBI;MY|xw^Z+!IUHT(Qy^t`1O{pz6PZtGuv)O!ED zCgh1P_FL+c%*vmxxa`@V3_51D=hhq3`;IQxuRDH&ac7TNsmp;a&;9G7@<03Eeo0&t zvdE}k-?{C{aZ8L!UOfE1Teo}Zx_)hOdCK9BmhKpJ)SZVfdF7D7$L=_I*_~$I_wJjU z%{lg(e_y!7<~j(z>6=??(}TWy=DZc2`F7^B6NbMyuI1}5&pdXyC*PQK{2Djzch)U; z?()u7Z|>XqrsC$emp}Q(unk{+>F}+0xoo+!zC8S~sn?$LVfQ}cZe8NSyQa)q?T5ML zI}f{YmGY|JE_m^$e=j)X%54|>adh)@SI(Q&F#D8Ee>`wOWp>ZCCcSdVIv?!*#M9Fr zS!|if%RD#h_ir!%=91GdSS|bVh7%iB?$K+DEgQ$(keYGBBfIV~fAVSfKYibyzs;Hd zWMP{X7hm;*hfm#XkIgsSVe;&mYj?l>nZ-W6YVoTsyyW7uckI_?;!)#{`(?r0?mzY) z@ZE0LZF}AY1264;<>mW)w)AH6XU{KR*7>KU?whq#!zO*EoIml$y)R9UJ$>rC^ShpY z_~7IB-zss!&i9U8d%Kb4sk2`=x#`g_e;b^<`{PF!U!-vRW`m#H{HWEhfBVVT@BFlF z3R}ytzH?Xm@|L#-to;0?C;WQI;hWDo?V>Xt{Ohj$CtNygsoz{2GIzktvkE7i`{-}Y zi*`Qmfw>nhtJ`Q!W}~z3+WY>cU+Olg#~zohxA^!m7e4)c*JWN^?cu(&r*&<~ojdHV z-ZxAd|8UD^4^<9{AJ+RKMorC*URo6 zzU1#aoOI{(?arR_=Dr8@`OAsT_ukh3>y7Si`;Sd$?3x?2SKm=@_22Q@zn-=3%oTSW z)%)kxna5r={mzje-?r+g2_L&fl=bBflt59sAql^ilhq`bPfXtGf>y< zM~^x9wS9K}qw&V;Pg`uaE-N1W#E33Jck7pf9_;qiPj{_*aHn45FWLL}xyx^{P5(U~ z8-3O)XKp`zqiL5sI(7bEPQLZw?wvcmcK+0v_dU8`v28Ef@vh-Vjs2UfWDx z@s>YkpR~(I`;YkIqAi!$P_xmOHy=FtHbX=&ST6a1W}{Uf8okm!AMVoU!KGh6=YU=l zw!eMrk9K|g+VT4=|MG(~U;N|2Nxg5le(NTmDAP3J}Sykg6b4!WUV-aj?%xBbptt7~Q8 zos*v0=8-?PJ9_0;r#w7w?QK__JLSIrGoEZ|(B- zzjl59hO5Vnd+T3~Hy!=)*nR$W>7keRT>i)Jp7>hM`{c9%#pZt;`P!+oE^>!u=6`354)iMW6PG0xbVkKcRTO$3J)S;s=cQ^U+^+ z%MaP?;x3;SW?Z@ZsDlSTf8(cxS1S+hGGVn1x<5PSsH^?a{LGj|3!iU3r}tK?f3V0q zr(U_=5!2_)nv~yos{{AD;O1^qUQIu^{TpBGn|!oWs&{db?1$@Juxito^{p_JVb{Mnj1#>1( zO8MFMU-9&yH@@gKe0sK5c4DX00oxzHedGSypK?ZK+IPGBbVQdSJ9p~5LcI z?0)u;=5Bl3*f#-ML>Z^x8=p=MYqb5TH7RZ>jxinvcjQfgNE$;?F%%$^PG@obi6+;Q zC>U2->6CN%*0`vwNdqwm;JWa9W4_wju=9pAPr0Md2L1i)1JqUu|FO|-?Lz-8;g;_v 
z#kow*tt_f`P3n0@p1AT^r*cIFHl4|o(fPS-$yLiPYRSMnNqDG#8V4^_W;?A=~if}z^ z2!YP(#%gP?dXhcQ7%(1D(a^1KZTGiXYfTj!O;?D?9!ko_2AC0$jS5TF^~xsYw7Nzm znKGWDXe0yihG7xPLanY-%b1~4ZSAhgkmj#P4j2{1^X=-^R%Ab+2$*sMUQ^Drn6z@X zwYp8+D3(^Y+9(&sw3}>GV>wL_l2Go=7hPlSk*bE8#8RoY_9koV{Cdc9QQ_hLys2uZ1j2-dXqTH$6tpa2>637G_ zq?@4J>PmT%hJ=!EwY7(=?aa~BH%YUgs@GwctU#QhS6-<#(PYey=|xEz2oI~#gaQ+f ziev+gZsI8?(M<`r3#zPVOS%;RU6o1q29uMr}E@gW+qKAehYeSW5c1h6*cy{)A*Ts`O z$PO&{Gd}I-E!8co!rQZ_key(p-7rrjO!E^jJwDO!>51dp4{Ya?_-eYs)2C(<*tHG7`^2N~kWH8I(=9 z-#u$U1V>}oQPKPd`V3Xxtg#lwe9e1X)GoPX`+h^hbKO(7wzKS;q?4TWu-qkIa)Xoc zDuzusN}0SdC(T#EoBheBMFI7sJAYt}`f1_+4 zx8&(As;wQ?I;8oSkCn;YoB4BN7BapC-}tm!n{1i{pQNtq zPEAuU%)iJR8DUUSxmc8R@*Osg(LcppQoUG(m@6YRBb+KW`Tg8@U-Kjnx3rS77LY%fnp|nnqg6Lv{P9FlS*VPqyZ!k z)lc&L^X-zqZ+%#JtyTY{G4PlFS=;omMnute$jVtI?NYMU*{-CBzid;&pInSp(Q8vG zn(4v8l-+_A8mXoxbIjs`u4-$eSo`*^v7a4bmvrmZ@kvfuhBdyQgD5*+#*Ve*Oe1gQ zi(NG4UQK_`EE#jmO(+*^Od#WtsXJ_b@ikEr2CJbVC(0JdguzTjp3wnIXS3-{rJOdGG-dLKGa^tpY5soJh*U1MCU!r`PcHcWi9K(HaKGt0$bxK_K;VJw!+o6%ht%DqZUwY3LX>oV`d zUjil6zqidys7+#yQZ2g7C`vp;`H&N=pY?BOK#`H^2|` zRCC?}m$aU>$dK?Q_3!PKuoe@WGR16b$`~3-EWR+2Eb09Vafc`n7v6ecZo9<^RnnO> z;RgI=IvhXRk}XxQt^2`4nn&(E?v)a2^BjYMKjYK>B!{gL)^r=@c#@*ggY>e=tg*k8 zDgaf{P$_fWY@Qr}SRyaLLn`(Pg=~!W%jff{nrPb3dFxl0^)yN+os%;r(k2Z~Gx`l|s$N@H@$H!*>(38@a9^0G z;LrH9ukF6DmTWY*hd-&9+yg-Zc+Q-s3+O3;`Bf79XRGN@CGy2$tH#S1Wtd8sjfONV zS6dsybDeeiac_q>DTJwoFy+&3t+-ZLgFK9z3iWKcVj3QVL$p+Easi;0L>jF|(2|1I zkpNjeW4_LiB(z_N$f@Q@UhYYrv-2h=a>xyj*ID}|tZ8hFiF~|sE9ogmS>~TTz$86i zDU%jGn?RM;yvb!IY(cRpUda{&_$dKlbf8SxxcyAZ^7JTE=zP9VuW{s$o zxopy)O7iJMiXavUiEAT(2Ny~xDnk4fZ!lr)DZ$;wCyL6zWSM-8pIqmHA}x&SQK%JSVLt6^JI66>4Ej~xk|T*UkWYREEpUTer!x+qG{ThT z&525RW8sj){6BdI*kU=w zvYpw|>T<}~I0k^`b0!F;%CHx=BQ3B+6ZWw|SwR@roPg!+$zuoa{9a5NJ*IOe2JF1U-rKE3BG1MBMzcMisucPzbI3rfh)&MQ^(#zPDwv9O^K3; zm-K1U*qZa!3Yl-V>g`JfrT><1!8bnb)>fAupvH+GGZ96}j-_fYFORXAd9&|gG`Kip zikF1fyBJ8hQ?xp1Tk!cz=Rmj-#!i8vJ1PjVEjJZj%H_sBzzH}2o9!`cP{5KWn~ zwuQ9&)}?M~_p?-rC1F9T-@-K~OM;%XN|dGQ^N(e7{)>>Kf5gPKRH^0!9Lbkt#$0mW zU~7B7Zfzep^a3zGL03devM~prmO;11aF0n{0QO}i0z;{VBer#8yepq<L(Jkb$kqWjAl57lJHpL3!$`3YAhR1XsM90 zy82a(NtTrW>-=6?Q1n~0_${T2xtjAfl(k3KANSlw@!ot3zVT^40hxjoOWQ3ZO~q|E ztl6E29t2R)mA&Tk`g;VR=o=4KSA0zJkKT>I=u%<1ag8qtJlXB&^*@VDxIW8&w9T5M zWAL4bQV~CGcZrUBcA-TGq1710xc)ZLC?B;+)UYy*X(zKKymOj>xB9&u>%G0)>C%gL zW$nMW68OvitZmPJ)~2^2X~LnM0_jAy5%8NZyOi?!N@fznLFr;>e=RR^g6Q#)SGnW} z#Zhy*k6`WY<+^`2kcG$Vc1h>{tfkVaY|h}_Y%1GGRV+*z6bY!}P~e;-+z$FRW?Y-# z9SW}ej`_2V&UwaQ`HLb`&1k(x61Ae4X%EYM~oQKJnPfpi~Nm=UXwcRl1Y}FCIoh} z>3iKZ%wL2aNWy4ZBhtnJNZF<-n1m-qT5=zrY_?bw<)qH7sTyl>x<5H=(HCP+?%-SS zjZgcYynLsy<}kO2&J|r(sIHd0S3+WpCydy_6CmpBaKrQ=BUregb4P2Bg|%`^DJi71fXh_oUY=>R`bCoHAA^2P zI*s8I>c$QN^9b%fGh0DLHGP|%vDZz&-FS!$-#TGmV-HB&aLd1?#`ys*@=Ke za{(!hQc;0H^61DpaX3j^)5`L6d(zC-nmyUX+NrnC+a#I~PGhkUkI|?7y$$?JSW8r2&AL-E6k5fNtNzazH=zK&6uAYV6m>Eas~P0`jP(tTmbN%T9OigATeXcFE1Y z1>g9zTMOD!&XK~E0m!hu3CP(}MVf|5lYHu}F`Nc%n8P7A^4s=NN`b<_WUYdwWXuAt zKHY&QSN>tocLGnI=UecNPrJ3_4hU=BHug1{MhN-EwIDeZFk?2-?6h=5aFyFzI-d-^ zvt=uJ%o;IVXjg6RbAEFFX}P(#V=!)Cwj)2eXpuFHrcAh1%LHl;FiE29w2OLa{!+@D zg?h5Byf#V$&y2S-ULI8n4H?*5!(wmD+Fo0Kw@S11{7{#7uY4eDw3#KTd(LkzGc~1H zhmsmiz8nK)?b6@5zX12C1n71LO~wi%w}%j=wYLd5Q2bqMddHfttB zV{z@o=rlG!xZ>faHO%16Y!nL&Znl!gj5_ePKj@xLH6Ax^SypvHk8+=alJWiGt|5}SPx~%O9@J(nlhJ;Oi44Bnvy&0S<_a4!Ztn{u%hAw?bEz`6FlAxh zF~{t%B`Y=WEl|+2d$)WU==B)if^U4H+`fImkd2P*5=m6Nb_-W26#I`#S#~Z z-2%mw`eaqWp8PbeYutcqEo=dA-Gsn%!|2t!WJB-m^sD;sHH)<^>U_zfhlDk73h)L& z%-2UI$a3`xrtNd*$~bY=27N{8OlwR?hnKVY7#h#JWprVcYo6rUFxQQ*oVinYuKAri zKFQRftoayQjVw@v%L)K~0(gmTf@rPF@aR%nk6sb60#EQQfj%*#ijRsOHD0-|UGlFv 
zue~(Yx~A4uA7+I$jemY?eUFL^s7KF7`j$}C;fMp3gC+2?jK=A(p9BI=7a|NoPi|EG zBs*cAHXnD%-8)*_^>y#p`#jb2sR|Fug=D&m`08@Q4Qcg z8#C!FdH^!3wl>Y$&OUO~$Zc4gS(kMjc{FP=i$1C6Olp?*_ESxzDAsV6A_1p}=(S0V zW=GiDPRbrC)!ad%3DK-_QEe?wz@Imb+G%6f>KTls9164Xom{sXpo~2s*9Amdp4sxy zI;yQJ%3>`wr~7PAa_UK6E-5W+=u@}031JOdgBsB!0J5?6 zLUh748JJ@*6Dupta%E1dQXp2A2d^Y5N{yBeyGr+Jc&<*Kr*40KpYSEY ztQCCY({Akv)-dAXgZ-+U1=2<9t!P^DD`2*qG+qlPkdI;q`Q;4nna?}k2y7{Tx@MOI zuYBT*bGo8o8v6MbeB;w@ZP0OHP26`^hbz@wlYw<7N99*p6)2RUdMIN+WJo zZ+pqpEjPx2vrDVj7B#9qr}zIX)^%dt33xNC1@#(X8cz|^0?blc8lVqD=z#6tpYcw z;^qznrJcBs11D;lUe?1#bEpP@P01Q|W=m`9bbRl-BI0aN=O=$}vsO~F0FTQqJ>j1e z**R)A{FS^Eu)ox);f}gJxzkD3hJ|r<{MKftM7c)v=a4){SW;2S zWCQUU&Wueu# z;2WQIYsr&YLm~j%#f+N1WYH7|&n)C6VtT8`Y1`q?iwSIODXG+<@~4iE$B#>0C>5%$ zea+f^YYyKg_{mBgdc8ZWX=I0?5HvZV*%-{IjX@(kugIEu0NK?v#dxJY2}VJ&`LqAhm_ zNviF^qEZ#7BCB=|94xn&_zxF2Y4S%oxf(>hzl~DknQNY8#J4Vg-akK{WPR+uF|395 z!NQ`_7lbu9fUUr%5sPKYv@2M2K{}#fdoO^LL%tAW5t4#>4GIcea?b4OuS(||&Z|4! zO=DPt;CR|lD1dr77|vTtf<&qVC`$W=X}%a}aZ+r2`F|29avzj`GgpI^U$C|@7wVql{Sj9fB>l9B5h*JNCUG6zhH zVogceDxsPXf+UQo;1z?bDcAUt+0I*PhsLvmpS0aOSlj7rYXeecIWgcVP@9=h@;0MQ z=vGKAKwH_V(I7~Rid+h9r23)y2X|sr+%i_ZwqP(W|74vRu^(5h+a<4rHFx6Ca7Z|n zXFNeXTEI$dKcQ0y8{?)hm_)sR4~9O)a(+}Mj@n2hY9h|9Shb7yzwx@kOX2Y9sz&8= z!dm>Ys8y8>#STwTtTg8Z<$zkqA+<0gA@TxMO$>O-d;4bd?DczC z+cR}*TV`TILotyy+ajTPP*aFu((_WK@NeYECupK;7N83c1 zP(XzygGWPN7?7h5U7IaNMs?Sgf6R%R@V34M-}tm!dpWEHH)GB-*^COW5EnP^(JeGO z55SX&_2f^U4wAuQvwHaY80}@2R9NI|(jRsTUCmnKJx>}r~B_1Xf`3(dF z*UH(jl2D7l&3V>jRBziQzs-JUuYs&=S*MDaaZOmW{{0Ro_?arbtDw4;M!nogpFZs7`!8bnb?`_HJ zSQFpinhI_K`$@C_tVUm8MY%y$MVT*&9dz@c@?Rvc4gej4aO(Nugln>nhwRC--e3Ht zVcy#(Jv&;vDXfJ`L>oY;tAiC3w|J7MAbD`v*a@JOI&FeFN1=$yk$}Q==mdF<;S;LY z_6=)2eqH*K53J|r)jP7-rCY+9f+fIA>_!?K+?;H?P2a>(h1bAcs{u+!?G2P349O!S zJ8a;*8&j*Tg_`Bt;ZxVjvG%KP!8bnbPjc$5tYKYbh#~KwuMiJ7MPZYUJtVd+bCgO( zf^Tg?HKlZv^66cKxwg_uwYBhEyFGsP^J}pXNT}c&pLT26+gNiUy&G~lLR(TTn;-%@ z@eA;RrF0-Yi6e4KIYb>(ENUCJaxGG+qGD_y5_CNk59-P8prJ-5*jfSH6E#orR_|^6(^uUSp6ly6&$ZJu z*6_KivCt2#tIL2DD{Ts!hUQi3NcHNcdwi0cm{b?FD=eF{FKd+l*Z9f)PQWEEZqhB! z+DUa8Rm1J90ih07ym4|}EQM(4kz(J~XDg)?7FY=|)%F0$K%x$VGX(tvCP%#HbcbSp zouRK@w(uNhLYp-&OJj$hn!#_Xd2qxr4~zd$U9>taxBkBPF3AHJ8e{aSD~0pWCZ%fi zE;%#S_Vb!I#OXe%F6&t9j#!%lI*v!gPHOUnG>B^TuGki;uw07}p|WXIT8NhQK{y3v z_jgjtsxjtj_{ov14Or}^#|o^~YYZQ2vxcvwhoEPX+}PHtwwQ712R;NtslbQAi&~}? 
ztCYnpgTiei(Ea{)<73w8Rsptf9KEFW-P&e87}hjH@lvGLIHvgFiZ)8bD!)n`AuDwL;5z8I99;u6G>8C96!jXt zLCtx)`$!eBWmelhp5z&I(ey75u||2u6H9#kBxx{uEo5o&wCJRH4`>H%X_;^{vV;fY z$mr)KJc0s#3=-hFa%xkv>_XfG|N@i(HG;Z+Fc?6eIdZ=5)S7d@# zE{UVx!v>0ra@*z?lG|6mx5(dC=(g2{;Y-4M6@25O}?6rN_0S<+>ba-3iTaT*HstURM)d^%2&HyXo*~m2LZ& zd5YtNeqv@O6_Q>m#T_|opq_GB>lkHmUKXjHEKp!$&Dt)(DeQK_T`TQtJtx=2fYqjB zzw8lCL4x0egmIh#!P?=fYwl;p4HC^O0+Ti(xnuk;LBEn%lnW?dYq{4z=7r^^_l+E2 ze%&s4Evx}JQNiY|WHLnL`XHC}eqFu7%lSk}9Zq#sf-P(Q(YfVraO$R9^2q!zWN96g$Lga-GxPqKGCIFI} z2no?C;eS=K!|Q>BH7DQzYx{hK;)D%YtH&i*JQdb3%4jE*Z1FfSIdwF`m99|IiiT7_ zl7~Vrgy_UONDXNs_>yzXv#m8=`5kNf{U7_SK8UsB>(;i$)2!i{N`c|RD?0+T-^hQJ zO(X^Oi7cALP>vDhTg)=?SkYtDl_d88dTV&DsO5jM@0**ovi3D7RPbkf+TYuAVU5zr zmEaZmZj*7i_?burRW(RX>47vE$?RmQULy1;)*44DLkK|wc+^t!-XhE2_NJTf2)+cZ z+ripJGgym<7)gs_f?V4#Ej$NDSBcruYWiqHS!4_ajT7!SVimm>_k){eRpTe`W9{`W zBbQrvC3DSZSc80HqBUYc6O*fyC|vzC5_4J*>mhc9DD%`@z>mH$1PaEBO9S0yurH51hlls8Sl7SnPWMe=EpS4#1_Yp_;m9kr(I5y3o|-K#o3a~dK;`S)`xH#CeoYS9i)a^y_bT1XRx%aRaA|0j+UoRVS6C=|VAj}aVn zD9G#(%a{gfiXgAlLU5}cwc0B`=t)i<^K-B8B|>KhYh6BM4aM&*DQ7EGm9dqMHqB&6 zW~5g-x0J$Y;`AZlq)4#*5>AOM2PKLwsmXC7b{}`@*&n~c(yron!JqMI-;)!<8ipR} z4*f`PBs*vVbfj}k+$wQ$%%wbje%$ZWB(+gojA~X4`URk~HGA?R?`@@JZz>R5G(1rE z-e!EH0%GSOW>oo<@AUHNaa9V7BDY3i8BYb{N&`oIPF{j5YEg_cAzD%_bgQpcKgp9= zdwt3uD>qqJ&t*EU?V^ua16bWc-4t0Fxik7a5BRusK4=twc~f|;ZG8*A@o9gOYkn2h25Ux_FQ^aRTPldF3A8u!b%{Ohh-L zcdyaU7+}XtN=K26%W87|kb@L>fUC|%VtbWU>2pO!0GCx;>)}aG-{yf{fr9F59h2Lv zLFLghp25UOqx8Os-^lp5{|xj%>7j~{^f;M-3jQ`}F+w&JLttOEwTPzYG_G0d z+xl~@4Z_DUDZs`6ZM)j><*Py*MEB$PhKdWB0I$MX-lqz@iZ?C2J;G&HTl&_%Q$5R^V7Yhw3H ztZn_+f{TO2u2$@RZl-I{TC>+ z5qT{ttDQ(Hv^sNwFo|eV;zneLI+3DuE5IX^6Ot|?D_YbU6^>)(SM{Fk%G!~crJGt; zQps~fLD?W8KZ7A10iyR-!>I9YSe4R2EDOt}Pk#qS28d*g!a_$4c00fF1y1{*0%n&9oN<^(Fx{?b|R9d7!;2-F>pf=Lt)Fi9SDUs zEnI?6)N~}KPM*?^qNg?wsJV7B#dEBE{lYSfY;KpVQFpp;-QLeqj zE?yF4izYB&0rlgDYXHQV2GuLn$&n!lr?Rmoc0a}1=?$m6a*`+6{BcJ%bh{loH3(5o zk8qp}JrW>e4+lzXmO_@3Om zQ-d?ARUS^oPx6b%PYr6ix?2r#K&Uy`I{h?HAxR>06y9VXp;?7s-04}vE6+J(X!G$8 zyuSRF8&Wi87VEGl`xsUMg5pVBZyBwaIB=|&BwQ5?X@Y!GMAeCevzFaqD>;2VK;k{= zj5F?eR-b^|Slha}^vIOeu&jqI(!tsTE&$SbES~@jrM|=~SPJR1ay?!IU^rn2hK+I!Q{W{wRL5!|H^Y0+j*hZT1kbZgc`7vg+wz* z%ayBIv*fxV1`>fnzbwe{SLhiJfUFIdC>w#oV$5z$#ChSMq0MVFz0&WjMKOj0JH~*e z@9NaRV|kRJhajz$U)vE%M5YkX2C1e?5LNc|jR!Fm^g>XWD*Q@=G7?wKRIlxR*7pB+ zl~oQ~-sM3Tbv)hk9`4kDT8#-G)D>i7fJZBJbdtoV#i$qhHLu4Q9DAc2tb?o$lbI)( z!J0V!B44$&k1Io)FWT?iu5r4rt~=e2zvKi&x(M=cQh}lr?y(ZvQ@L@31&46sb;xaPer!`focB`%M(73!@0%9;uX8cBk$;KXAD?uQx0 zQp092k|-_lM>7krth*h2FxNw?iewV8&!JU6$)#DFJN4Kbqx$f1owbW!;ko2*(eFj~ z5fyAz&8UdU7J&!YI2tiSNf~pVv!-afChw(5tS8uXo2ez`Irr8KEHpZm#iM4XiEu)p{*~x$5iPx3OlthJHuHkDo0w!~tr9Bzh`gL#5e*9L(Ni z|66jCrK_gU{sEQ|#Ch)O_x5XfX!C|&Ub}D9s5YvrtQ_zWKN&hM0maLs9Wm8FxNg#k zuMthvO^pKpmlSiP27yEk(&qOnsZ2yE@weLAFRX2O^KnbZp4_AENv0YBZIMm$sdtEVP?n36&5b$`lJFX&@I7nGB-ZS+ zKWoG4*0$WI5MDF|BsoSS=w#qN4n($vSU$=CM1|%XXda0UW!)^THmT+unZKAQ*)P`Y z$?sVEzsi-UBeLzKJUndWN90hJzMa`LC z^KgR{Rd*m>fIO?8u;znWEBa; z2hcS~U`(tNc}@@kkb=Yx#HgJ)kZ2j&yw26#ZaYqFn_rhvt@}Bz?9BtEP!^NGKHAK4 ziUGoeVh%m6CNP2jr!OnP5(qCn(xR;fQNK^ko_wV^w7K>D1vhWdy`kX{N4yY^(Wm`M z{x+AN)B_;wQ*clsQ&mNOAnqcTqkvf?BBd1U$mpp7Tnp zthKDGyHT1s)h#qPgXydk-0QRE30TnlVC+tNk?N@UDoscAk5~Aolr@hrk zQWJ4*U~R<%U!MAsC%I)E3TkixV(bR8+L-l3ji(SzMacmJK}B<5Ovo@745=u^K&XTO zNsJAjqQGJ*Ol^*H8EaRaH(*BBr5YM$kdGDGp5uR?nhz^#7h6MsW!X1bLhax)#GI1Z(d-6Qi`d_rks0u5u)^Xue@&1H>u@a}~edy;oY z2<3I9(10qj(7$Q=XHRJgMz_%LY;mU5!LxP}#9tk8XBG$-Z2xte+#?B3Db|jkvPXC)KloExJG$=V81{tO(FXh zXfrUpP>bp1nMEQfv=-rowP)5k{@Q}I?N+z8(c4>F2G4~gYNDtZGr}Tk12g?LVn=JuliVcU+Xg3`5xC@`I%})#5T47`g@mtS2&G5p85L^` 
z60z%~#`+YMG<4WWyr2UmV>V;>nPNBl9%%KG>>F$Q`fvYk;Y;dc_nWNI#3;7PBCr<$ zr}W0-4h#g1tc_&Mz!%V#5*BWd;u{_5P-t^eO*JvBcGB6#tc~99!#R;H_NzN@69+p1 zbl#4DZUcQkdYF_1JhdqPD97Q*9m))_SaQRbmI$ z>U~MVC5)ma!60yjF{Bz3OQg5RaSVyw68E+u5RoH9E%G|8^Y|$7R{@Ua@o=l37q#@>HOmpE@T>dxD| zk#GqGC{n&n3gj18uNTN=#5St5N)Vzc04Pb;fHKoPQqHAKNt5WQi#*lVmSe5oGre!! zlC=SKC*bzOIyY#-QszScNt6sxfD&y*`K~cRI42+>TDO>|yjD@i3h8k*s7R-zN-qAe zqqUn~UHq5ftbI^tZKuPnO@WFDhez{h0uAnLA=c3F@_k)RXbr@nVFm_1ks!Icntv2= zkfwO_TGii@)^_P8kM-PQp=;}Q1kVNZ3N8wa*b&uO68$JkWhL5Hpa7wKB}wI=scPto zWmamYYVGEfimWx!baB=O-!|GvK=R)EwTX`bjRz+QmKl-@EX&<&NyECT!m!+$+KjL?>JeJW@JyMD0+? z6vkR@1QelbwNF>t=D~IyXTViB8 zb)D8#VFi|pNd_W05ab_->2cnaQZ8SDG%7;3_HoVWUWBzSW6ymge91O-)|MQ_bJ@q3 z65TbPlrx}06aNv&IQk$%Y!P@b#4BZD7LUP0P-2alO!)Ph_|}cJAxBJmY!}w8@{Czsw?QV}Vsl$VD`>4oanisJ(2h^sobkcm=rUe!DfSxHUA32QUv z?mH{2)$@~cSQ9+sI1#eM)wP-;W0;J@$?kOOT1532xDzvafjuZbin^i?A+<&bxp?)S z3~P_wxWV8dtQ}vswvh%YVl1d4f>3j;y&lOmQW*%NGGv1Lbx%)hcr{H0CeFW2?IQD0 zA~Vme+FJ8|L!00I*PoB=$J#sMK*68!X^(HK9vdhKnh*5~u9pV`g(&4>G7`Tp7Af$l zH^)$CtR!kmy$>KnstIlYeX%B6%(2$|;SKKw&-F{4wb`tZ=!C|Tj)YG_Zh1N}+*8X1 zttk1O9~u=ndKublG-%Azp^R3Gi+M)1N}WYnyZqOoeK&;^UaUh37ak|;P;v{dScs&1 z#&0Ua+La0(3If&;lGNzX>mTXZqg+q4Gq-Qv;w&7pAKE1azGFS@!-g#CCaCt&namW6MF`H`Rp-if_ zHj=g3i@o*q0&AOHx3(oO;gu~d6Vv;R?;-9}Nr5G!oey^&YI#lnhVp?1igG4KAi1`=zgAnjm$fyYpYqd9tPMV@qqVCqjkU$hT(c%+WZV`NguF~{ zEwo|$iSaZ(kvYW}DQXulwvk!^)a6^z=(RQB%UC<)-p|HaRzq(`yx`CHw4b*dF6(Ua z5&^4BK;y1;*r1#=cN;Uy4MtSm2ap6(|5dS#_aFrdj>;@~lRPzFRj;kY+UF;nG5!)3 z&LY__gejkPYgb>+PwH3|M-G4!m)T$E_OQy|R>3Fcc0GIpglW#w;dUbrw7 zpK5Cdu(st{TiiB|g_G*6jhg^-MbS&TLF27@5mg4`6QhoUdvw~QSSDc*!wE9>e$1hz zR5OwSwP;p+?Nq6)SbK7hAv>JEP-~+n!pf?I5vbMNn0xO6tLz!z=LFi&95HYri zI1962kaw-E^z{(`Kd!m2-X#mWu>OD6riZnV!~Y+yudlYYuv_T=XKla9P|*MJPWozV z3%l|5f7bTDwlj6Kl3Uaf9AK)cNJ{^I-G5tcZDH36{?FQhH`|l{$4dsQtu5@n!2enM cowfh#?SVsf?gkU6ru)A)N6#1RO#1@e%nwt?ugT?h?fT0YN?jN`gc))790@NB)9}QDMXn zC`Lp@K@{U(qXyJ98j}bHaS#v%RALeZMdD`t;BUnk!=8F?9?e`d+py`FxW3cXRk!Xv z_uR8Pa%hv%Hbo^xk&R$SJ=k;ZNr1Z@PcH*&dvT^D^~vcA_5t#^#AZRU?!x2C?R-@@Ch(%N@A ztzSE5QD^eL75hG2>Eb=N7B{?4C_jiD=7fP1Zs_t@F{W*93yHt!*+K+?D;+y)Il1pB zQe~6NA1-b7*amiH>5UhaSIxh${R`cH@=BAYYad&4I2$E)tIy*%Zl9Y24r+n^Qe z>E~|la{P*FynZ>J+%5*f@|@7IeanwT7+F>rg}(AR^YQhrV=JL$Xa`yUyEQvM8asKE zQt`y8I-a!Wqx$;2b!)e8Z&iNTH=oY^?4w6AJ4;)aG^?I>(t?s^SMGTG>tXH3v_0>H zk;NCZeLj{YWE%q=UH$CF+iS##UC+bYPr7QFXJ3$r)v@t8BR zxb!32(_BT`j$F$Nl^-+DRh;2dw6c}wbK8+w`HL-U>N7jn)$jbMd_eWiuTCf<$ZgpZ zZI1?GhXS$_aH%Yda~=AT3bYaeUlUrcR8Hi^T07E3JY+?F*Gm-JynbhS$)d_kdjdUp z_cOW*u*>k|Hpq3|(3UE2g~iPRx$noJw)`-5J!z}J_Cv>KVb-z_j_Zk!&FCUTMh)u%_ z7qz_Ioy|MnFnZg}8`ch(IcfPVJr=Ye@A}63d)rock2!_h3HZ1IJY=>TS`lNh^0*&F z7IS^>GvD?Ea)d8@J8KtW22!-F6UJShWwfzG9l9Ji=~5Vs?1xkI(C&2&6bZM%9*%R6?JWZvvN z_T%-_D%$tBVDP=|oj**t;IySn#`j%*a__f- z=05r2w~JQ~?S!?MKroLjgV5$YvZRP&?qE&G{H_w%L}XIibve&vR2mFtb<4#;-8Cgm zn#{j?+_L-c??T@6Lesu4R^Scgc+81$%VMEY2n1wo+l>)pnlr(rXJG|`St=H;ZR4vi z9v43Gz`oIQB|*I9-HXpDAa>6ox|*T71h^DD3|SO}GDh0-9A+{6P43_W@Lji#(0cJD zO4D;$wW0S~Q^7hmR1}0giXT@t?t*RkzK@O3JmA`K0v8Jobl?TbMQY03>Y~nNlLr^w zaiF5?Z9o4_S-Gh8^Z`|6=aw8*^4#}i{pZSiUucf^HN%tJ5iUaRyMfD*mT&>3jbOI9 z#jGHXm108JA<{T*`bE!dd-lz5T1>j4m_W{Hy6GvCm&~2ZL8A7_7<(ZjltF1bv2R78 zl!$2C)mGqG5z-0rl1Bpg^+Tez&(HQDDIaAw4m`&W zMHqV7d|EQJVfui1|J+YjJmlg5@0yTki6?i6AxDg%R%7EM8FP*QhMkUdh-Jrq?4h!{ zc+;(Jt!C9UEiBn`S0D5JeGA{Oec#d=L;!gmS4nUdI5D#YqJq1=?`l_S9fVnFa%XLI z|B?w;Xqo9S_Xq zj-w!wDl1ZPmYK7@nlAwZl{Jy62FDcEAe4uIcNIhA za5u!YLeaF9V{zspe1yWCb$lo56)I=u6H_-UtB)sNck0|vuQ6PuZ2{R6fu*=of=2;I z;g`a4nUIzTjv*rMFu_njgM3Y{tsXRD(V|L%S@gzir_pM`=cGm)IWklhE@|jUAN*M} z=^E~b5+6zxv231WBIu`=mUREHzJ4SvHH-hT?<{lqP4dm1g>tnoaUo-ifv^b#r5#(2 
z<{-|oEh9e;;xKY$ejjSr+|WMr5?SnDwx72m12}Vy@{kb&o{jWv$9~8I4>ZE@U4e?_ zOHZ*#3Wm_iy7{e57CT{5plkoVvaW7z%ibT<)1?pn-JpLMx>;DdK%R1OwQTGuvU;Gn zCs8AjDnoDw5h~{6h(%;Jx;EE9OfS890ONi5Dc2VFe80Y)#{b68-gu`UUQpfeA%I5J zRmzqOR7&z#$3f^RTLcJ>IF5LL;>lerYd2jxs$yQIvgq|+?|ZXm_4Chf{$RoI@=cxZ z|MHu6FPy$~+nLLP3&JaZKl?Y+#{Th%j_+PN=2u+@Eh|N{omOA|^bsT& z?tkFi?k2w4;4#4<{75>M3?gt}A-E+yh9u(PdtyWb5~4382fNHiZtgbF%kb{q13Q`v zR#AYII*LHnK~PM_Wk$XlWat*NzQ!&$IwaAq9v?IX3enRBE(< z8fD1?k>a)mP8HZ9#8nWWi=ehhaOFqblWEuGPe)L$e|hD(>kMcQJZ2}cxB!HRd5*w& z;9{cV66kZ|82J=O=LSv?;oq`;uBNQHzIx`7)osa2zr0T_Hb+xvAY4y`s2xh13XO6F zVu50U>Su#`fqCG6X>L|rxT{G6QCZ!JV83#~Q5Tj0yAZW#+Fa&>tVmm<1!UL*S1@dh z<)C;7C&bzvkWX|F-|b(C!qa$Dv+pmt+Ek;0UI2cDCO20|!Os-u2uu;PAV9MgV-s+M z8m%BY-2-=^ymZ9=fZ~qp^y1syfXncheZ>KbC}PqaxQ7Gt6&esXc73#EF=&>p!4{Pk zS)Q*(!Gy*gbp7^L6{q!cy3DWc+_F#I1LP~-AN`vt9B)}uoWIH_0*;ILz=ew(RBVA# z5yZ&88mTObLxw+=>lu0C|0CU(&FJk~c5U0PTZjGY^40aNKeWiZrQ7jK6Ox5K)DsaC zK4cI#Mji&?#f9-e#UfwvPzC`g38+~t8dZ_9L1nCk>T@cM`BUFHZMKP`LN>-7c0l!z zEMi1Iq8&Pf=Oey+?cih_--iZ+a+Q_89(GVScYW)ok4}^?56pCaegI{aRlOJgqM!og zPD;kYjvz4j=t?Bgmb8$bWXKV}$SI*41<+lbsDV{w=2dS!k|vX-H79H}DRnj;^H(oI zF9nGr4n3$8vEra3lKbeR9S+WGGmcasyZ}M^19H)dr^-8)&(8d$ef)|R z(}nH#g{{%f#n6DzMIj;J4|+spBgHa_uad5Z4wvN^chXEB+%jG~Yu2VyvNK+zDgE^U zQ(iHb<TYE?5Uro*} zIi{c0aR`mytrzvYCxaK81fAPAFK`3In7Jcph|yPgjs(5%q;?hh0jNz3M+xe0Zt8@| zKXOwCQ(`g18uPTNSqJt~qM3*T6hK=;;Q-MJY*SGps9C~8_J){+G$HcmQ@iH6(t9Z< zdcCIE5|a!2<1t}|HalRE1?FSnkcIHj*m^PgPM-tLjwLDs1T<7|j$C+16I6BL?ztw5 z^uc3x0y)I;T`-X_j&NnAXLJGdo{*S=2%8Z{u7lg?W}P3+Sq-WmG?9B3MW8Bz_cK!8A=vt%NB{yx9Z9l=bqA~ z3Aq%r`#!X07%*SMV|Ga5mU*Ecn9OT>F`Oj64i{TFidhg9aqkh>=xe-<bv5EMBRR;j{$#G-cmoU==uA4{C|& z^&E!M)QCwpZCl*TvfC^lS9R>2cP*nyb<+kkFMxhZH&o>DN{LcdZ$?+8#WzV*AY(ic3a4{?^(}E&KWHo;-4MRp;9>$Cii> z+mX|LcvZ){jovT>k2!6Va&ccFRHB*jxh1$~2O;EK&3z6|4<@6OhZJZzx4!qn{_p2b zqGf#H+2g8DXh1VZ6@XJZ*bLvYLktkWHW)4(1T&*1LDI54Xb~DrMmACt8w4Z>i3@*F zKH7E8{Lv=#!J~5#fi43IA36aR4=Tuor#+p(VCM$lNFhNIX!AgP5Z!a{GRjbZa4jWKRmgOhv;QAdX04Hx|UQ@fr^6zK*8i5m;mZ2 zD1byOk)x5BX2oNgv+A$reMJN5zds##YQL=dqMvCHqOI=^aJhxT`!-<~F-))!s6$fW593p->o8e~_9G)&l1;!toB z;K!~P1rh|oK}@+ecE~*_5gnzidFOw~{`aE;$2M*I#m3P$Zk_qX#^}p0+njxG z&s|qnWY%8MGC_=! 
z!puXOW|~K+8yIPM4N736xNwdIbIX)ejDD%*ya`8iDCP_6MwL$PMmCzQ?9{|)kG04K z<`y9aFkrz^c^qv8(i~pn30G(^xX_Dz26_ehYJ&{#eeaew_l~$h&D?G~|2k>?z$bU# zIsAru_79LvZd_Pdxu|+wdAB+m_f59f9b;HaA;Bq*NuL0RP~7(%Lth*gDO}YMH4(9o zP%`p{o4-wk?=!kaPAZrh$F&nDOjhw!ReL1)vPaL#jGcy8o!yYyTc%?|87?gCFj`W8T)rUF{8vX@ z*S{}WcM!H}-W2Puz>_->^d+tWbwd{&qSj+70}YsggSn*xl}Fo22`bUsj03|?c#5d12y=+(~4oJ3aFZ)diZNh2mo zbF0IBFeDjsGuJ_^GRwo`AV`t2z}8*lMF#>()-jv(|bjk92#yzmOX6d-w zmfl+PbNk6g0y?&S|>7VkZzVC{PJ_a_)Jq@YTIFk^s&4G02XYTQW>56w}OkuJGH zIblA=t*&!W7c*Htyy@>+E(Q&{{1IfoOU#MyvxA@9f)91!O-n;i9rCu&$rRBnCWN) z%@%qUM&TG#W1AZ3;aj$4G0+(@{6_{lWnbO#H*=E;w+Xs82(VFvJaAQXk!GZcrXG|T zgJ|S?M}WT?DW~b$W-W>`l})OL{_X9lGg)={zH^@Iv$5--)nV(s{p}wARrm3RiZ6dR zZEoW}ruFP>x#?kkipK;&Xc)2Yfg*V<#>^gCL8z4j?bO1h5&;I0Q2SzBsa!2-X{4hC z^Ex}9kYm{LRy^M@+Yxw7P=z6+ayyJrSp`F^n#q(oR-Z=_Q&|zBgGQ)&S@-Z+SNnBs z2A8T;J;@T5UvmCKCde*q$b%Sif^SKDfGP61>x*;Xc@O2tD7V<4e6yV%JCAdOwflosp;6YMEh-n3ZpVs=I zR_}OW|LLX8SW(@=WsmkCU-01Pn@{i5@CtJ{SheM2*kJT@R1O6ujd_M2XiVl^ZmE!Q z*aHFx4QyzkAAfywukpdic}EX4EqN_E@_tWjdgBeM^DMpb_JL+nYy#XIm-f9t8=(iv zp(Qb>6Uv3^?SV?f7)fZ%D$(D@aUqG9s*?tlos}sXbzh%xQ+ie(H;8-*ruX|z?^IgA zhqCPoWGh6kk<*j_Cx#LPzQ}EiilL=@m_-Cx`5k^am!JUzdR_jHxhNVFbL@5`z%bES zU_gWHiLouVpIN?970~lRsGu}q@zkexn$k58Pi}v4fi3))Y0zdUbCKmDAA$BrjHRF+ zXcS70CIPzX4+sbTsN0s8Z7uG)Hz1)Z6e<^!I;hPGSv6)%6pm8X=f+NT< z{j`kNA+kZxY>;x47Z5`RSya1Vf(KRF)C8*U81gY6_1ymK!BPmQGuonnLD;eJ@e)Z% zTd{{;9$ye3KVq&8c4uY%MxDSxQb<_)BT~o<&3}1>NfV|QH$e~?jW@yw0IV5gO(+MG zTSO9Sc4Wm4QZeRbATS7m1EOT!+OaZ2Yev(e5mSxUSrCzdZEO)9h6 z?O^B$r4p1EL%*Efxut&ba{c+MZ#}!FYxZIP+cVIyH_qJ`fBgOb7+PT-a}6TsLm?z| zrVv`71t>~JH&RI8JdV)WhqxAK3d2Ly?wdX<+1dmU2+pGB25QRmS#xS8PMtn`R!Coa z(TwgBPQ^b1MI#qzF%%D`FbqRjaSh$oRiI!{-VqVTec%|S1zlWLBz$4a`ohtG4YXOcNl52>#nu!-kzbV#x4c*HfXaTi;24FzL5jVFfnqEQn_d#to1bWyDs< zy_iLK5H>6U>k?Wbi<9WMzV&}jiq-G;zqJCdC@;K5Avv&Idmc&Csbk<9acCQu+@L_C ztqV{yoFe<6lo3Z{C_mD3~?JqrM|)h_ceo?+GR*_sjD!-?T-7E-eX8{sjo1>y~^OGcNtP(Y9dU4|6zbr zTM8*IwGbw_1CZ1)22N=xq`1^SnBcB7xFDS=-GmgF`UVqVlWwf^ExwYQJrQY&AA>w$EnH}+9n>g7vte`#=2I{PRp_4FmEe>JG79exy-8u}7k z4^rJ>*!w6jweKas_Zi^SWKQeR$z`{x{PT9+Qhr9Qj_w;%X*dRHFB zrS?06o6qX^8Qj!nJc>)*bjf;q!C!S4-Z~0Qt#k?S^#(Y-v5o>$6I}wlCkLF?Qb%#A zlPU4pv+xas|N6qkDG65Rha zxapmB6qp+265yE-P}1AxC@%HLCAeD*ZfeII#iho$1lNY5i~lR6lr}kvOU-c!?sWz? zy;Y6^QwLlE{7(R<_rpVivfM?m#T?}?+h)bf_#K5uZ-8{jA~wX`L`HoO!Vq@*;s zQCwS#+Ad>RDs^!_%AOYLe2?r(FrY0YaC zm)g}5+@l~z9)@|10#nOb0{nmhPHjM=xYSFQ;C8{dJH4}v;!^)uf_r@qH?^CL0#l1v z0{lII55q7iW;!<;1f?J89Ak6qFtzs0HTEP{bpxE*7e;ZZ3oOCy4G|xH>6D%@ic1||3GST+H@*Lh0#pB10^9{w z`NME?QDAD*N`P-Sz^P4J6qow365LYwP}94!C@wW$CAiZJZh9*g1*YyO1Dvl%Ct)m& z|F;Jzy;2mH8l{p2&oa2FZBrDN`lS-wmkn-e_Y}pYrl;!>+qvfh^rZhGSr1*Tr71h{<$z^R>2 z6qg#965P8DZfZLd#ij101a}{B(|egHF10HqxHX*)%e+K^sXHkF-e7=JdzC0IbtM_x ze5Cd04BXTnC5mfsFwo2W>E7_yqWmnw;HGpdQB>+y$^&Qcmlq9cdhZeirq-kcxK|ed zr#324T