-
Notifications
You must be signed in to change notification settings - Fork 453
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[dbnode] Faster M3TSZ decoding by using 64 bit operations #2827
Conversation
834e25e
to
99f874f
Compare
e130b7e
to
3cf877e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with a few small comments
Codecov Report
@@ Coverage Diff @@
## master #2827 +/- ##
=======================================
Coverage 70.9% 70.9%
=======================================
Files 1079 1079
Lines 100527 100527
=======================================
Hits 71319 71319
Misses 24148 24148
Partials 5060 5060
Flags with carried forward coverage won't be shown. Click here to find out more. Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #2827 +/- ##
=======================================
Coverage 72.3% 72.3%
=======================================
Files 1096 1096
Lines 102315 102315
=======================================
Hits 74020 74020
Misses 23195 23195
Partials 5100 5100
Flags with carried forward coverage won't be shown. Click here to find out more. Continue to review full report at Codecov.
|
…nasm/m3tsz-performance
@robskillington we've noticed (in the CPU profile above) that M3TSZ decoding can take a significant chunk of peer bootstrappers processing time (decoding data streamed from peers). Which makes me think we should prioritise this PR already. |
} | ||
return Bit(is.consumeBuffer(1)), nil | ||
// NewIStream creates a new IStream | ||
func NewIStream(reader64 xio.Reader64) *IStream { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is returning a ptr here for performance reasons? Noticed none of the fields are public.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A good question. IStream
used to be an interface before this change, and now it has become a struct. I think I left it returning a pointer in order to avoid accidental semantics change at the call sites (to avoid any potential pass-by-value of this struct down the stream).
numBits -= numToRead | ||
res := readBitsInWord(is.current, numBits) | ||
bitsNeeded := numBits - is.remaining | ||
if err := is.readWordFromStream(); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not following this logic here... shouldn't we check if there are remaining bits to be reading before reading another word from the stream?
Might be worth adding some comments here around the logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is such check with an early return few lines above:
if numBits <= is.remaining {
// Have enough bits buffered.
return is.consumeBuffer(numBits), nil
}
I'll add some comments for clarity.
r.index += 8 | ||
return res, 8, nil | ||
} | ||
if r.index >= len(r.data) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: maybe this check can go first? nbd though haha
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I think the current implementation should be more performant. On most of the calls to this method we will have 8 (or more) bytes available, so we will do an early return from the first if, avoiding any other checks.
src/dbnode/x/xio/segment_reader.go
Outdated
sr.lazyInit() | ||
|
||
var ( | ||
nh = len(sr.lazyHead) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: the variable names are a little hard to parse here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, will rename:
headLen = len(sr.lazyHead)
headTailLen = headLen + len(sr.lazyTail)
Code semantically LGTM. Taking the perf measurements at face value :). |
What this PR does / why we need it:
Changes
IStream
to read and process the encoded data stream by using 64 bit operations (instead of using the standardio.Reader
that reads the data byte-by-byte).According to the microbenchmark based on some real world time series, this improves M3TSZ decoding performance by ~37%. Microbenchmark results:
before this PR -
BenchmarkM3TSZDecode-12 10000 108797 ns/op
after switching to 64 bit decoding -
BenchmarkM3TSZDecode-12 16813 71793 ns/op
after dropping IStream interface and using the struct directly -
BenchmarkM3TSZDecode-12 16867 69272 ns/op
Also included macro-benchmark results on some real world data, obtained using a modified
read_data_files
utility:Command:
read_data_files -b 1596009600000000000 -p ~/testdata/m3db -s 4 -v 0 -B datapoints
Before the change:
After the change:
31% improvement (including the overhead of actually reading the file).
Special notes for your reviewer:
Does this PR introduce a user-facing and/or backwards incompatible change?:
NONE
Does this PR require updating code package or user-facing documentation?:
NONE