Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problem: memiavl snapshot format is not optimal #890

Merged
merged 13 commits into from
Mar 1, 2023

Conversation

yihuang
Copy link
Collaborator

@yihuang yihuang commented Feb 21, 2023

Solution:

  • reduce node size (without hash) from 32bytes to 16bytes, leveraging the property of post-order traversal
  • store key/values in same file, so we only reference with one offset field.
  • specialize Get operation for persisted node to improve performance.
  • add optional native-endian implementation to further improve performance sacrificing portability.
  • build optional MPHF index for key-value pairs.

in the end, the on-disk IAVL tree query performance is on-par with tidwall/btree library which is used for cache store.

$ go test -run=^$ -bench=. -benchmem ./ -count=2 -tags nativebyteorder
goos: darwin
goarch: amd64
pkg: github.com/crypto-org-chain/cronos/memiavl
cpu: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
BenchmarkOffsetsRoaring-12        	  201961	      6224 ns/op	       0 B/op	       0 allocs/op
BenchmarkOffsetsRoaring-12        	  211574	      5862 ns/op	       0 B/op	       0 allocs/op
BenchmarkOffsetsEliasfano32-12    	39410566	        30.84 ns/op	       0 B/op	       0 allocs/op
BenchmarkOffsetsEliasfano32-12    	39221587	        32.75 ns/op	       0 B/op	       0 allocs/op
BenchmarkByteCompare-12           	289818466	         4.028 ns/op	       0 B/op	       0 allocs/op
BenchmarkByteCompare-12           	292231899	         4.454 ns/op	       0 B/op	       0 allocs/op
BenchmarkRandomGet/memiavl-12     	 9985905	       115.0 ns/op	       0 B/op	       0 allocs/op
BenchmarkRandomGet/memiavl-12     	10408438	       111.8 ns/op	       0 B/op	       0 allocs/op
BenchmarkRandomGet/memiavl-disk-12         	 7177082	       159.6 ns/op	       0 B/op	       0 allocs/op
BenchmarkRandomGet/memiavl-disk-12         	 7454918	       158.9 ns/op	       0 B/op	       0 allocs/op
BenchmarkRandomGet/snapshot-get-12         	 5282593	       230.8 ns/op	       0 B/op	       0 allocs/op
BenchmarkRandomGet/snapshot-get-12         	 5292380	       225.0 ns/op	       0 B/op	       0 allocs/op
BenchmarkRandomGet/btree-degree-2-12       	 9466953	       125.0 ns/op	       0 B/op	       0 allocs/op
BenchmarkRandomGet/btree-degree-2-12       	 9190824	       128.2 ns/op	       0 B/op	       0 allocs/op
BenchmarkRandomGet/btree-degree-32-12      	 6432990	       180.8 ns/op	       0 B/op	       0 allocs/op
BenchmarkRandomGet/btree-degree-32-12      	 6528072	       181.7 ns/op	       0 B/op	       0 allocs/op
BenchmarkRandomSet/memiavl-12              	       1	2184414232 ns/op	255999872 B/op	 1999999 allocs/op
BenchmarkRandomSet/memiavl-12              	       1	2197133556 ns/op	255999872 B/op	 1999999 allocs/op
BenchmarkRandomSet/tree2-12                	       1	2338391700 ns/op	203984536 B/op	 2060413 allocs/op
BenchmarkRandomSet/tree2-12                	       1	2310088559 ns/op	203984632 B/op	 2060414 allocs/op
BenchmarkRandomSet/tree32-12               	       1	1118375364 ns/op	139084272 B/op	   69353 allocs/op
BenchmarkRandomSet/tree32-12               	       1	1134007771 ns/op	139084272 B/op	   69353 allocs/op
PASS
ok  	github.com/crypto-org-chain/cronos/memiavl	52.815s

Takeaways:

  • The "nativebyteorder"'s query performance is on-par with pure in-memory data structures (when OS page cache is warm), the portable version is a little bit slower though.
  • Benchmark roaring-bitmap and Elias-Fano on the select operation, Elias-Fano is much better in both compression ratio and reading speed:
    RoaringBitmap size 2004788
    EliasFano size 918496
    
    But Elias-Fano is not fast enough for the keys file, which is accessed too frequently, so we use plain little endian int32 array there.

👮🏻👮🏻👮🏻 !!!! REFERENCE THE PROBLEM YOUR ARE SOLVING IN THE PR TITLE AND DESCRIBE YOUR SOLUTION HERE !!!! DO NOT FORGET !!!! 👮🏻👮🏻👮🏻

PR Checklist:

  • Have you read the CONTRIBUTING.md?
  • Does your PR follow the C4 patch requirements?
  • Have you rebased your work on top of the latest master?
  • Have you checked your code compiles? (make)
  • Have you included tests for any non-trivial functionality?
  • Have you checked your code passes the unit tests? (make test)
  • Have you checked your code formatting is correct? (go fmt)
  • Have you checked your basic code style is fine? (golangci-lint run)
  • If you added any dependencies, have you checked they do not contain any known vulnerabilities? (go list -json -m all | nancy sleuth)
  • If your changes affect the client infrastructure, have you run the integration test?
  • If your changes affect public APIs, does your PR follow the C4 evolution of public contracts?
  • If your code changes public APIs, have you incremented the crate version numbers and documented your changes in the CHANGELOG.md?
  • If you are contributing for the first time, please read the agreement in CONTRIBUTING.md now and add a comment to this pull request stating that your PR is in accordance with the Developer's Certificate of Origin.

Thank you for your code, it's appreciated! :)

@yihuang yihuang requested a review from a team as a code owner February 21, 2023 06:11
@yihuang yihuang requested review from mmsqe and JayT106 and removed request for a team February 21, 2023 06:11
CHANGELOG.md Outdated Show resolved Hide resolved
memiavl/tree.go Fixed Show fixed Hide fixed
memiavl/tree.go Fixed Show fixed Hide fixed
memiavl/snapshot.go Fixed Show fixed Hide fixed
memiavl/snapshot.go Fixed Show fixed Hide fixed
memiavl/snapshot.go Fixed Show fixed Hide fixed
memiavl/persisted_node.go Fixed Show fixed Hide fixed
@@ -148,7 +148,7 @@
if _, err := w.Write(buf[0:n]); err != nil {
return fmt.Errorf("writing size, %w", err)
}
n = binary.PutVarint(buf[:], node.Version())
n = binary.PutVarint(buf[:], int64(node.Version()))

Check failure

Code scanning / gosec

Potential integer overflow by integer type conversion

Potential integer overflow by integer type conversion
memiavl/snapshot.go Outdated Show resolved Hide resolved
@@ -283,7 +283,7 @@
if _, err := w.Write(buf[:n]); err != nil {
return err
}
n = binary.PutVarint(buf[:], node.Version())
n = binary.PutVarint(buf[:], int64(node.Version()))

Check failure

Code scanning / gosec

Potential integer overflow by integer type conversion

Potential integer overflow by integer type conversion
@codecov
Copy link

codecov bot commented Feb 21, 2023

Codecov Report

Merging #890 (b4a10ac) into main (65d7297) will increase coverage by 0.18%.
The diff coverage is 57.73%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #890      +/-   ##
==========================================
+ Coverage   36.60%   36.78%   +0.18%     
==========================================
  Files          56       57       +1     
  Lines        4065     4170     +105     
==========================================
+ Hits         1488     1534      +46     
- Misses       2406     2460      +54     
- Partials      171      176       +5     
Impacted Files Coverage Δ
versiondb/client/restore_app_db.go 0.00% <0.00%> (ø)
memiavl/tree.go 57.69% <23.52%> (-17.31%) ⬇️
memiavl/mem_node.go 83.83% <47.36%> (-9.43%) ⬇️
memiavl/persisted_node.go 60.93% <53.19%> (-31.38%) ⬇️
memiavl/snapshot.go 50.90% <59.25%> (+1.95%) ⬆️
memiavl/layout_little_endian.go 100.00% <100.00%> (ø)
memiavl/node.go 73.84% <100.00%> (ø)

@yihuang yihuang marked this pull request as draft February 21, 2023 08:07
memiavl/persisted_node.go Outdated Show resolved Hide resolved
memiavl/snapshot.go Outdated Show resolved Hide resolved
}

// Version returns the current tree version
func (t *Tree) Version() int64 {
return t.version
return int64(t.version)

Check failure

Code scanning / gosec

Potential integer overflow by integer type conversion

Potential integer overflow by integer type conversion
@@ -66,12 +78,12 @@
t.version = t.initialVersion
}

return hash, t.version, nil
return hash, int64(t.version), nil

Check failure

Code scanning / gosec

Potential integer overflow by integer type conversion

Potential integer overflow by integer type conversion
tree := New()
tree.initialVersion = initialVersion
tree.initialVersion = uint32(initialVersion)

Check failure

Code scanning / gosec

Potential integer overflow by integer type conversion

Potential integer overflow by integer type conversion
if version >= math.MaxUint32 {
panic("version overflows uint32")
}
return &Tree{version: uint32(version)}

Check failure

Code scanning / gosec

Potential integer overflow by integer type conversion

Potential integer overflow by integer type conversion
memiavl/persisted_node.go Fixed Show fixed Hide fixed
memiavl/snapshot.go Fixed Show fixed Hide fixed
memiavl/persisted_node.go Fixed Show fixed Hide fixed
memiavl/persisted_node.go Fixed Show fixed Hide fixed
memiavl/layout_little_endian.go Fixed Show fixed Hide fixed
Solution:
- store offset table inside keys/values file, so we can reference it with leaf index in nodes.
- leverage the property of post-order traversal to further reduce some offsets.
- specialize `Get` operation for persisted node to improve query performance.
- add native-endian implementation to improve performance.

in the end, the on-disk IAVL tree query performance is on-par with tidwall/btree library
which is used for cache store.
@yihuang yihuang marked this pull request as ready for review February 24, 2023 07:54
height: node.Height(),
size: node.Size(),
height: data.Height(),
size: int64(data.Size()),

Check failure

Code scanning / gosec

Potential integer overflow by integer type conversion

Potential integer overflow by integer type conversion
memiavl/persisted_node.go Fixed Show fixed Hide fixed
memiavl/README.md Outdated Show resolved Hide resolved
memiavl/README.md Outdated Show resolved Hide resolved
memiavl/README.md Outdated Show resolved Hide resolved
memiavl/README.md Outdated Show resolved Hide resolved
memiavl/snapshot.go Fixed Show fixed Hide fixed
memiavl/snapshot.go Fixed Show fixed Hide fixed
return err
}
len2 := uint64(binary.LittleEndian.Uint32(numBuf[:]))
if _, err := io.CopyN(io.Discard, reader, int64(len2)); err != nil {

Check failure

Code scanning / gosec

Potential integer overflow by integer type conversion

Potential integer overflow by integer type conversion
if err := fpNodes.Sync(); err != nil {

// re-open kvs file for reading
input, err := os.Open(kvsFile)

Check failure

Code scanning / gosec

Potential file inclusion via variable

Potential file inclusion via variable
return err
}
defer input.Close()

Check failure

Code scanning / gosec

Deferring unsafe method "Close" on type "*os.File"

Deferring unsafe method "Close" on type "*os.File"
Copy link
Collaborator

@JayT106 JayT106 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

memiavl/snapshot.go Outdated Show resolved Hide resolved
memiavl/persisted_node.go Outdated Show resolved Hide resolved
memiavl/snapshot.go Outdated Show resolved Hide resolved
memiavl/snapshot.go Outdated Show resolved Hide resolved
memiavl/snapshot.go Outdated Show resolved Hide resolved
@yihuang yihuang changed the title Problem: (memiavl) offset table is not used to reduce node size Problem: memiavl snapshot format is not optimal Mar 1, 2023
CHANGELOG.md Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants