Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bit packing inputs with 256 symbols do not disable bit packing #65

Closed
zaeleus opened this issue Oct 14, 2022 · 1 comment · Fixed by #66
Closed

Bit packing inputs with 256 symbols do not disable bit packing #65

zaeleus opened this issue Oct 14, 2022 · 1 comment · Fixed by #66

Comments

@zaeleus
Copy link

zaeleus commented Oct 14, 2022

Encoders that use hts_pack disable bit packing if there are more than 16 symbols, e.g., rANS Nx16 and the arithmetic range coder. However, the checks that disable bit packing compare against a potentially truncated value. When attempting to bit pack inputs with 256 symbols, out[c_meta_len] is 0, out[c_meta_len] > 16 fails, and the encoders do not disable bit packing.

An example using the rANS Nx16 test program (version 1.3.0):

$ ruby -e "print (0..255).to_a.pack('C*')" | ./rans4x16pr -r -o 128 | xxd
Took 46003 microseconds,   0.0 MB/s
00000000: a082 0000 8200 0001 0203 0405 0607 0809  ................
00000010: 0a0b 0c0d 0e0f 1011 1213 1415 1617 1819  ................
00000020: 1a1b 1c1d 1e1f 2021 2223 2425 2627 2829  ...... !"#$%&'()
00000030: 2a2b 2c2d 2e2f 3031 3233 3435 3637 3839  *+,-./0123456789
00000040: 3a3b 3c3d 3e3f 4041 4243 4445 4647 4849  :;<=>?@ABCDEFGHI
00000050: 4a4b 4c4d 4e4f 5051 5253 5455 5657 5859  JKLMNOPQRSTUVWXY
00000060: 5a5b 5c5d 5e5f 6061 6263 6465 6667 6869  Z[\]^_`abcdefghi
00000070: 6a6b 6c6d 6e6f 7071 7273 7475 7677 7879  jklmnopqrstuvwxy
00000080: 7a7b 7c7d 7e7f 8081 8283 8485 8687 8889  z{|}~...........
00000090: 8a8b 8c8d 8e8f 9091 9293 9495 9697 9899  ................
000000a0: 9a9b 9c9d 9e9f a0a1 a2a3 a4a5 a6a7 a8a9  ................
000000b0: aaab acad aeaf b0b1 b2b3 b4b5 b6b7 b8b9  ................
000000c0: babb bcbd bebf c0c1 c2c3 c4c5 c6c7 c8c9  ................
000000d0: cacb cccd cecf d0d1 d2d3 d4d5 d6d7 d8d9  ................
000000e0: dadb dcdd dedf e0e1 e2e3 e4e5 e6e7 e8e9  ................
000000f0: eaeb eced eeef f0f1 f2f3 f4f5 f6f7 f8f9  ................
00000100: fafb fcfd feff                           ......
@jkbonfield
Copy link
Collaborator

I think this was probably something that I tightened up in the spec with an explicit "not permitted to have nsym > 16 or nsym == 0" statement at a later stage.

The bug is really one of not knowing how best to deal with errors; eg failed to allocate memory (hard error) vs inappropriate data (soft error). I think I took the wrong choice in making it return a buffer in the latter case, as it's both wasteful of CPU cycles and also punts the error handling elsewhere. Specifically here:

if (!packed || (pmeta_len == 1 && out[c_meta_len] > 16)) {
(which should check for 0 too, as stated in the specification).

However given the handling of the NULL case anyway, hts_pack should probably just treat all error types the same. The caller can, if it chooses, always look at errno for ENOMEM to distinguish.

Thanks for the bug report.

jkbonfield added a commit to jkbonfield/htscodecs that referenced this issue Oct 17, 2022
Previously when too many symbols were present, we still returned
packed data but with 1 byte per symbol.  It was then up to the caller
to reject this as inappropriate use of pack.  This meant that hts_pack
was a transparent transform that always gives back something even if
it's actually just growing the data by 1 byte.

However the CRAMcodecs spec states this scenario is illegal.

The checks in rANS_static4x16pr.c were bugged for nsym==256 as there
wasn't a check of "|| out[c_meta_len] == 0", but now it's a moot point
as NULL will be returned.

Fixes samtools#65
@zaeleus zaeleus changed the title Bit packing inputs with 255 symbols do not disable bit packing Bit packing inputs with 256 symbols do not disable bit packing Oct 17, 2022
jkbonfield added a commit to jkbonfield/htscodecs that referenced this issue Oct 25, 2022
Previously when too many symbols were present, we still returned
packed data but with 1 byte per symbol.  It was then up to the caller
to reject this as inappropriate use of pack.  This meant that hts_pack
was a transparent transform that always gives back something even if
it's actually just growing the data by 1 byte.

However the CRAMcodecs spec states this scenario is illegal.

The checks in rANS_static4x16pr.c were bugged for nsym==256 as there
wasn't a check of "|| out[c_meta_len] == 0", but now it's a moot point
as NULL will be returned.

Fixes samtools#65
daviesrob pushed a commit that referenced this issue Oct 25, 2022
Previously when too many symbols were present, we still returned
packed data but with 1 byte per symbol.  It was then up to the caller
to reject this as inappropriate use of pack.  This meant that hts_pack
was a transparent transform that always gives back something even if
it's actually just growing the data by 1 byte.

However the CRAMcodecs spec states this scenario is illegal.

The checks in rANS_static4x16pr.c were bugged for nsym==256 as there
wasn't a check of "|| out[c_meta_len] == 0", but now it's a moot point
as NULL will be returned.

Fixes #65
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants