Implementation

I was not able to find any existing data structure whose implementation exactly matches this one. Please let me know if you are able do find it. The closest data structures are:

radix tree
hash array mapped trie
QuadTree
Judy array

General idea

data structure is a tree/trie with branching factor of 32 (2⁵)
32 bit key is split into 5-bit chunks
each node is tied to a specific 5-bit portion of the key space and maps each of its 32 children to a specific value of its portion of the key
chunks from first and second key dimension are interleaved down the node hierarchy, i.e.:
- consider key {X, Y}, where X and Y are 32-bit ints
- let X₀..X₇ and Y₀..Y₇ be X and Y key components split into 5-bit chunks, where X₀ represents lower five bits of X and X₇ represents higher two bits of X + zero padding
- in this scenario, tree levels starting from the root (assuming full occupancy) would address following key chunks:
  - lvl₀ (root): X₇
  - lvl₁ : Y₇
  - lvl₂ : X₆
  - lvl₃ : Y₆
  - lvl₄ : X₅
  - ...
  - lvl₁₂ : X₁
  - lvl₁₃ : Y₁
  - lvl₁₄ : X₀
  - lvl₁₅ : Y₀
- children of lvl₁₅ are actual T values

Optimizations

General idea above has obvious flaws in terms of performance and memory consumption. To rectify them several optimizations were done.

Key space compression

As described in "Intended usage", keys are expected to be closely clustered together. Internally stored keys are translated into the integer space that is positive and is close to zero. This is done by the pair of internally maintained integer shifts.
Depth compression

Storing the 16 levels needed for the full keyspace is not optimal both in terms of space and performance. "Key space compression" reduces the key space to the actually occupied range, and the actual hierarchy depth is only as deep as necessary to accommodate the new reduced key space.

For example, to store elements in the key range of 0..100 only two levels (or four levels in 2 dimesnions) are required.

Tree depth is automatically adjusted when the new elements are added and removed. I.e. the following invariant is preserved: when there are more than two levels in the tree, top two levels (root and its child) must have at least one and two elements each, otherwise these levels are removed and the tree depth is reduced by two.
Node compression

Storing 32-element array on every level is wasteful in terms of space, especially for sparse data. When a node has two or less children, they are stored in a fixed two-element array and their keys are stored alongside in a separate field.
Caching

Sequential access is one of the things that this data structure is optimized for. Primary use case is row-major access order (i.e. A[0,0], A[0,1], A[0,2], A[1,0], A[1,1]...).

For every access, the pointer to the lowest level of the hierarchy is cached. This way, if next access happens to be in the close horizontal vicinity (5-bit address range) of the last access, no CPU cycles are wasted to traverse the whole hierarchy, moreover, all accessed memory is (hopefully) already in the CPU cache.

Due to this caching, this data structure is not thread-safe, even for reading. To read the data simultaneously from multiple threads, read-only view must be created for each thread (See Grid.java header for details).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

implementation.md

implementation.md

Implementation

General idea

Optimizations

Files

implementation.md

Latest commit

History

implementation.md

File metadata and controls

Implementation

General idea

Optimizations