Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve checking of *out_size in rans4x16 & arith #127

Merged
merged 1 commit into from
Sep 4, 2024

Conversation

jkbonfield
Copy link
Collaborator

These should be set to a minimum of at least rans_compress_bound_4x16() or arith_compress_bound() but as they're user supplied parameters we ought to consider the possibility of them being smaller and the potential for an encode failure. (Although arguably it's a user error too as we're not using the API as intended.)

This happens in many places, but not consistently when very small sizes are given such that we don't have room for the compression header meta-data.

@jkbonfield jkbonfield force-pushed the compress_bound_checking branch 2 times, most recently from 34e6ee0 to 2aca1ef Compare September 3, 2024 11:33
@jkbonfield
Copy link
Collaborator Author

Another round of updates squashed in. I expect there may be issues with other codecs too, but we don't have to solve every problem in every PR. This focuses only on the CRAM 3.1 rans and arith coders.

I have a (not committed as it's hacky) little test which constructs (ABAC)* repetition and then permits me to compress using various "order" options, with manually specified ranges of out_size buffers. Note this is NOT how the API is meant to work. You should call the appropriate function (eg arith_compress_bound) with an input buffer size and the order byte and it'll give you the worst case size of memory needed to compress the data, and it's up to you to ensure you provide a buffer that big.

We do just that in all our use cases, so this is not a bug anywhere for htslib, samtools, etc. However if you're a trouble maker and allocate a buffer too small then we now bail out correctly. Obviously you can still allocate it too small and say it's larger in which case it dies, but that's no different than lying to memcpy etc. Both are API usage errors, but the former at least should have some chance of spotting memory errors before they happen as we've told it how big our (undersized) buffer was.

@jkbonfield
Copy link
Collaborator Author

Somewhat unrelated, I also changed it to always return out_size=0 when failing, as well as returning NULL. I don't know if that's useful or not, but it seemed tidier. It's neither here nor there I guess.

htscodecs/arith_dynamic.c Outdated Show resolved Hide resolved
htscodecs/rANS_static4x16pr.c Outdated Show resolved Hide resolved
These should be set to a minimum of at least rans_compress_bound_4x16()
or arith_compress_bound()  but as they're user supplied parameters we
ought to consider the possibility of them being smaller and the
potential for an encode failure.  (Although arguably it's a user error
too as we're not using the API as intended.)

This happens in many places, but not consistently when very small
sizes are given such that we don't have room for the compression
header meta-data.
@daviesrob daviesrob merged commit 027ebf6 into samtools:master Sep 4, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants