Write-Once-Read-Many Disk-Based B-Tree.
The name comes from the decision to use 1024-byte blocks (the Linux kernel block size) as the page structure in memory. This turned out not to matter for performance, but I've kept it for generating fixed-size pages that are loaded on demand.
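To illustrate the on-demand loading, here's a minimal sketch (the names here are mine for illustration, not the package's API):

const PAGE_SIZE = 1024

# Read one fixed-size page from the index file at the given byte offset.
function load_page(io::IO, offset::Integer)
    seek(io, offset)
    page = read(io, PAGE_SIZE)  # Vector{UInt8}
    length(page) == PAGE_SIZE || error("short read at offset $offset")
    page
end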
I thought aligning the pages to the block size on disk would improve performance but, like many things one imagines, it made things worse. I even copied the whole index onto a RAM disk and it made no difference to the runtime. Either my code is so terrible that it is CPU-bound, or Linux read-ahead is a great mitigator.
I needed a way to index 10 GB of read-only data, keyed on the range 1 to 999999999999:
julia> bitstring(999999999999)
"0000000000000000000000001110100011010100101001010000111111111111"
The data is supplied in a delimited CSV file, e.g.
10024655943,88253,7678,"TR22 0PL","E99999999","E99999999","E06000053","E05011091","E04012730","E18000010","E92000001","E12000009","E14000964","E15000009","E30000252","E06000053","E99999999","E00096400","E01019077","E02006781","E33050815","E38000089","E34999999","E35999999","E2","1B3","E37000005","","E23000035",22165
192000016,88276,8001,"TR22 0PL","E99999999","E99999999","E06000053","E05011091","E04012730","E18000010","E92000001","E12000009","E14000964","E15000009","E30000252","E06000053","E99999999","E00096400","E01019077","E02006781","E33050815","E38000089","E34999999","E35999999","E2","1B3","E37000005","","E23000035",22165
Whether I index the CSV file itself or not isn't relevant to the index; the leaf nodes will hold whatever payload that turns out to be.
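As one plausible payload, a leaf could store each record's byte offset in the CSV, keyed on the first field. A rough sketch (again my names, not the package's API; "data.csv" is a placeholder):

function csv_offsets(io::IO)
    entries = Pair{UInt64,UInt64}[]
    while !eof(io)
        offset = position(io)                      # byte offset of this record
        line = readline(io)
        key = parse(UInt64, split(line, ',')[1])   # first field is the key
        push!(entries, key => UInt64(offset))
    end
    entries
end

entries = open(csv_offsets, "data.csv")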
https://lawless-m.github.io/Index1024.jl/
-
Keeping the 1k block size, I want to experiment with SeaweedFS: store the pages individually and use SeaweedFS FIDs, instead of file-pointer positions, to load pages.
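A SeaweedFS volume server serves a blob's content over plain HTTP given its FID, so reading a page could be as simple as this sketch (using HTTP.jl; the server URL and FID below are placeholders):

using HTTP

# Fetch one 1024-byte page from a SeaweedFS volume server by FID.
function fetch_page(volume_url::AbstractString, fid::AbstractString)
    resp = HTTP.get("$volume_url/$fid")
    page = Vector{UInt8}(resp.body)
    length(page) == 1024 || error("expected a 1024-byte page, got $(length(page)) bytes")
    page
end

fetch_page("http://localhost:8080", "3,01637037d6")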
-
Perhaps this will make it possible to update the index by re-writing pages on insert.
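One way that could work: ask the SeaweedFS master for a new FID via its /dir/assign endpoint, write the replacement page there, and re-point the parent node at the new FID, copy-on-write style, finishing at the root. A hedged sketch of just the write step (assuming HTTP.jl and JSON.jl; the master URL is a placeholder):

using HTTP, JSON

# Write a replacement page as a new SeaweedFS blob; returns its FID.
function write_page(master_url::AbstractString, page::Vector{UInt8})
    assign = JSON.parse(String(HTTP.get("$master_url/dir/assign").body))
    form = HTTP.Form(Dict("file" => HTTP.Multipart("page", IOBuffer(page))))
    HTTP.post("http://$(assign["url"])/$(assign["fid"])", [], form)
    assign["fid"]
end

write_page("http://localhost:9333", zeros(UInt8, 1024))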
I've started work on a SeaweedFS client.