bgzf_useek and bgzidx_t scope #1239

Open
Poshi opened this issue Feb 19, 2021 · 0 comments

Poshi commented Feb 19, 2021

I've been using the bgzf functions to process some BGZ files and found some issues that could be addressed without too much effort.

My need was to slice the file into equal-sized chunks, so I had to open the file, get the uncompressed file size, divide it, and extract each chunk.
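
To make the slicing step concrete, here is a minimal sketch of the arithmetic only, assuming the uncompressed size is already known. The names `chunk_range` and `chunk_bounds` are hypothetical and not part of htslib; each resulting `start` would be the value passed to `bgzf_useek` with `SEEK_SET`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Half-open byte range [start, end) in uncompressed coordinates.
 * Hypothetical helper type, not an htslib type. */
typedef struct {
    int64_t start;
    int64_t end;
} chunk_range;

/*
 * Compute the i-th of n near-equal chunks of a stream of `total` bytes.
 * The first (total % n) chunks get one extra byte, so chunk sizes differ
 * by at most one. Returns 0 on success, -1 on invalid arguments.
 */
int chunk_bounds(int64_t total, int n, int i, chunk_range *out)
{
    if (total < 0 || n <= 0 || i < 0 || i >= n || out == NULL)
        return -1;

    int64_t base  = total / n;   /* minimum chunk size */
    int64_t extra = total % n;   /* number of chunks that get one more byte */

    out->start = base * i + (i < extra ? i : extra);
    out->end   = out->start + base + (i < extra ? 1 : 0);

    assert(out->end <= total);   /* internal sanity check */
    return 0;
}
```

This splits on raw byte boundaries; if the data is record-oriented, each worker would still have to skip forward to the next record start.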

First, documentation. The code is the documentation. I couldn't find a proper place where the different modules are explained.
Second, bgzf_useek only accepts SEEK_SET. SEEK_CUR should be trivial to implement, and SEEK_END should be easy.

For SEEK_CUR you only need a SEEK_SET to the current uncompressed position (as reported by a bgzf_utell() call, since bgzf_tell() returns a virtual offset) plus the offset.
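
A minimal sketch of that translation, written as a standalone function so it does not depend on htslib. The name `resolve_uoffset` and its parameters are hypothetical. It assumes the caller supplies the current uncompressed position and, for SEEK_END, the total uncompressed size (or a negative value when the size is unknown); the result is the absolute offset that would then be handed to a SEEK_SET seek.

```c
#include <stdint.h>
#include <stdio.h>   /* SEEK_SET, SEEK_CUR, SEEK_END */

/*
 * Translate (offset, whence) into an absolute uncompressed offset.
 *   current: current uncompressed position (used for SEEK_CUR)
 *   total:   total uncompressed size, or < 0 if unknown (used for SEEK_END)
 * Follows the lseek convention: target = base + offset, so SEEK_END
 * normally takes a zero or negative offset.
 * Returns 0 and stores the target in *abs_out, or -1 on error.
 */
int resolve_uoffset(int64_t offset, int whence,
                    int64_t current, int64_t total, int64_t *abs_out)
{
    int64_t base;

    switch (whence) {
    case SEEK_SET:
        base = 0;
        break;
    case SEEK_CUR:
        if (current < 0) return -1;   /* position unknown */
        base = current;
        break;
    case SEEK_END:
        if (total < 0) return -1;     /* size unknown */
        base = total;
        break;
    default:
        return -1;
    }

    /* Reject overflow and targets before the start of the stream. */
    if (offset > 0 && base > INT64_MAX - offset) return -1;
    if (base + offset < 0) return -1;

    *abs_out = base + offset;
    return 0;
}
```

Keeping the translation separate from the seek itself means the existing SEEK_SET code path would stay untouched.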

For SEEK_END you need to know the length of the uncompressed file. That's trickier. You need to access the index, go to the last entry, seek there, and decompress the last block. Count how many bytes have been decompressed, add that to the uncompressed offset of the last index entry, and you have the total. From there, SEEK_END is a SEEK_SET to that total plus the (usually negative) offset.
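
The size computation described above can be sketched as follows. This is not htslib code: `idx_entry` is a hypothetical stand-in for an index entry holding a compressed and an uncompressed block start, and `last_block_len` is assumed to be the byte count obtained by decompressing the block that the last entry points to.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one index entry: where a block starts in the
 * compressed file and in the uncompressed stream. */
typedef struct {
    uint64_t caddr;  /* compressed offset of the block */
    uint64_t uaddr;  /* uncompressed offset of the block */
} idx_entry;

/*
 * Total uncompressed size = uncompressed start of the last indexed block
 * plus the number of bytes that block decompresses to.
 * Returns -1 if the index is empty or the arguments are invalid.
 */
int64_t uncompressed_size(const idx_entry *entries, size_t n,
                          int64_t last_block_len)
{
    if (entries == NULL || n == 0 || last_block_len < 0)
        return -1;

    uint64_t last_start = entries[n - 1].uaddr;

    /* Guard against overflow when converting to a signed result. */
    if (last_start > (uint64_t)INT64_MAX - (uint64_t)last_block_len)
        return -1;

    return (int64_t)last_start + last_block_len;
}
```

This only holds if the last index entry really refers to the final data block of the file, which is an assumption of the sketch.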

If someone wants to implement all of this from outside the library, they need access to the bgzidx_t and bgzidx1_t structs, which are defined in the implementation file. These structs should be moved to the header file, or a set of functions to manage the index should be provided.
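
As an illustration of the second option, here is a rough sketch of what a small accessor layer could look like. Everything in it is hypothetical (`toy_entry`, `toy_index`, and the `toy_index_*` functions are invented for this example and do not exist in htslib); it only shows that a count, a getter, and a lookup by uncompressed offset would be enough for the use cases above without exposing the struct layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical index model: entries sorted by ascending uaddr. */
typedef struct { uint64_t caddr; uint64_t uaddr; } toy_entry;
typedef struct { const toy_entry *e; size_t n; } toy_index;

/* Number of entries in the index. */
size_t toy_index_count(const toy_index *idx)
{
    return idx ? idx->n : 0;
}

/* Read entry i; either output pointer may be NULL. Returns 0 or -1. */
int toy_index_get(const toy_index *idx, size_t i,
                  uint64_t *caddr, uint64_t *uaddr)
{
    if (idx == NULL || i >= idx->n) return -1;
    if (caddr) *caddr = idx->e[i].caddr;
    if (uaddr) *uaddr = idx->e[i].uaddr;
    return 0;
}

/*
 * Position of the last entry whose uaddr is <= uoffset, i.e. the block
 * that contains that uncompressed offset. Returns -1 if there is none.
 */
long toy_index_find(const toy_index *idx, uint64_t uoffset)
{
    if (idx == NULL || idx->n == 0) return -1;

    size_t lo = 0, hi = idx->n;            /* binary search over [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (idx->e[mid].uaddr <= uoffset)
            lo = mid + 1;
        else
            hi = mid;
    }
    return (long)lo - 1;                   /* lo = count of entries <= uoffset */
}
```

With functions along these lines in the public header, the structs themselves could stay private to the implementation file.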

With all of this in place, working with BGZ files should be considerably easier.

Other ideas: automatically loading an index when opening the file, if one is present, or even automatically generating an in-memory index when one is required (a random access is attempted) and none has been found or loaded.
