common: add Jamtis base32 encoding #6
Is this worth reviewing? Do you still intend to send this "into the ring" as a possible alternative to the code that @DangerousFreedom1984 PRed? I am a bit confused that you closed and then re-opened ...
Yes, I did a hand-written version specifically because I wanted to see a mode where it's easy to encode blocks of 5 bits at a time, because the Jamtis body bit-size probably won't be an even multiple of 8, and the library code that @DangerousFreedom1984 adapted (or other libraries) did not seem like it would make that easy without large rewrites. There's also a no-allocate API provided here, which is quicker for fixed-size fields (like Jamtis addresses), and this code also does mis-type and case normalization.
I made a review and left some comments and questions after there were 2 votes in favor of this version versus 0 votes in favor of @DangerousFreedom1984's Base32 PR.
I can't claim to understand every bit of calculation that is done here to encode and decode, but the test cases show that the code works in principle, so I don't think that disqualifies my review.
Thanks for the review @rbrunner7
Nice how many comments you added, thank you. Future people trying to find their way into the Monero codebase might be very grateful :)
Looks good to me now.
};

// table of the base32 symbols, in Jamtis order
extern const char JAMTIS_ALPHABET[32];
Do these tables really need to be exported? Can they be local to the cpp?
They are exported so they can be used by the base32 checksum PR #7 as default tables.
extern unsigned char JAMTIS_INVERTED_ALPHABET[256];

// constants in the inverted table that signal an ascii code is invalid or ignoreable, respectively
static constexpr const unsigned char BADC = 255;
Same with these constants, why export them?
They are exported so they can be used by the base32 checksum PR #7 as default tables.

enum class Mode
{
    encoded_lossy, // when decoding, discard odd encoded LSB bits left at end of tail (default).
When is lossy useful? And how to select not lossy?
Mode::binary_lossy is useful for encoding exact blocks of 5 bits so that the encoded base32 string isn't as long. For example, Jamtis address body sizes will be an odd number of bits long, not divisible by 8. Thus, we can make the encoded string one byte shorter, since there are leftover bits in the binary that we aren't using.
You select binary_lossy by passing it as the mode parameter in each function.
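To make the size difference between the two modes concrete, here is a minimal sketch of the length arithmetic. The enum below is a hypothetical stand-in that only reuses the mode names from this thread, and `encoded_size_sketch` is an illustrative helper, not a function from this PR. It assumes that encoded_lossy rounds the symbol count up (every raw bit is kept) and that binary_lossy rounds it down (trailing raw bits that don't fill a 5-bit block are dropped).

```cpp
#include <cstddef>

// Hypothetical stand-in for the PR's Mode enum; only the names come from the thread.
enum class Mode { encoded_lossy, binary_lossy };

// Number of base32 symbols produced for `num_bytes` bytes of raw data.
// Assumption: encoded_lossy rounds up, binary_lossy rounds down.
constexpr std::size_t encoded_size_sketch(std::size_t num_bytes, Mode mode)
{
    return mode == Mode::binary_lossy
        ? (num_bytes * 8) / 5         // only complete 5-bit blocks are encoded
        : (num_bytes * 8 + 4) / 5;    // the last symbol is padded with zero bits
}
```

Under these assumptions, a 32-byte buffer (256 bits) would take 52 symbols in encoded_lossy mode and 51 in binary_lossy mode, while a 5-byte buffer takes 8 symbols in both.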
I thought it was expected when divisible by 5-bits that no additional values would be appended. I guess not. But then how does someone select lossless mode? There is no enum for it.
Although it isn't explicit, almost all base32 libraries take the "encoded lossy" approach which preserves every bit in the raw data and discards extraneous encoded string bits, since that's the expected behavior 90% of the time. That's the default behavior for this code too, but now you have the option.
> But then how does someone select lossless mode?
You could make a lossless mode if and only if you forced the user to only encode raw data whose byte length is divisible by 5, and to only decode encoded strings whose length is divisible by 8.
It could be added, although I don't know when that would be useful.
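For reference, the divisibility condition above follows from 5 bytes = 40 bits = 8 symbols. A minimal sketch of the two checks a lossless mode would need (both helper names are hypothetical and not part of this PR):

```cpp
#include <cstddef>

// 5 raw bytes = 40 bits = 8 base32 symbols, so only these sizes map one-to-one.
constexpr bool raw_size_is_lossless(std::size_t num_bytes)
{
    return num_bytes % 5 == 0;
}

constexpr bool encoded_size_is_lossless(std::size_t num_symbols)
{
    return num_symbols % 8 == 0;
}
```

A 32-byte key, for example, fails the first check, which is why one of the lossy modes has to apply to it.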
return static_cast<ssize_t>(Error::invalid_char);

// write symbol bits to current pointed-to byte
decoded_buf_out[byte_offset] |= v << 3 >> bit_offset;
I don't understand why the << 3 here. This should shift the first value by 3 (left), and I don't see how that could be accurate.
Since MSBs are encoded "before" LSBs, we shift the 5-bit alphabet index up 3 to align it with the first bit in the byte, the MSB, then we shift it according to the bit_offset.
This is a design choice: we could've encoded the LSBs before the MSBs and not needed the << 3, but I like the MSB->LSB type of encoding because it makes more sense for humans when you convert the raw data into a binary string.
That's also what most base32 libraries do anyways.
Yup, looked at the encoding algorithm to see what you did. I suppose it doesn't matter, as long as it's consistent behavior.
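For anyone following the `<< 3 >> bit_offset` discussion, here is a minimal standalone sketch of MSB-first packing of 5-bit values into bytes. It is not the PR's decoder: the function name and the vector-based interface are assumptions made for illustration, and it drops leftover trailing bits, as the encoded_lossy comment above describes.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Pack 5-bit values MSB-first into bytes; leftover trailing bits are discarded.
std::vector<std::uint8_t> pack_5bit_msb_first_sketch(const std::vector<std::uint8_t> &symbols)
{
    std::vector<std::uint8_t> out((symbols.size() * 5) / 8, 0);
    std::size_t bit_pos = 0;
    for (const std::uint8_t v : symbols)
    {
        const std::size_t byte_offset = bit_pos / 8;
        const std::size_t bit_offset = bit_pos % 8;

        // v << 3 puts the symbol's top bit at the byte's MSB;
        // >> bit_offset then slides it past the bits already written
        if (byte_offset < out.size())
            out[byte_offset] |= static_cast<std::uint8_t>((v << 3) >> bit_offset);

        // if the symbol straddles a byte boundary, its low bits
        // go to the top of the next byte
        if (bit_offset > 3 && byte_offset + 1 < out.size())
            out[byte_offset + 1] |= static_cast<std::uint8_t>(v << (11 - bit_offset));

        bit_pos += 5;
    }
    return out;
}
```

With the symbols {31, 0, 31, 0, 31, 0, 31, 0}, the bit string is 11111 00000 repeated four times, which packs into the bytes F8 3E 0F 83 E0.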
Thanks for the review @vtnerd, the newest commit should have all those changes you requested.
see encoding scheme spec here: https://gist.github.com/tevador/50160d160d24cfc6c52ae02eb3d17024#35-base32-encoding
1. No-allocate API provided
2. "binary-lossy" mode, which lets us encode blocks of 5 bits at a time, useful for Jamtis addresses
3. Normalizes mis-typed characters and has case-insensitive decoding
4. Ignores hyphens when decoding
5. Error code handling
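Points 3 and 4 can be handled by a single 256-entry inverted table, roughly as sketched below. This is an illustration under stated assumptions, not the PR's code: `make_inverted_table_sketch` and its alias parameter are hypothetical, `IGNC = 254` is a guessed value (only `BADC = 255` appears in the reviewed diff), and the alphabet and look-alike pairs are supplied by the caller rather than taken from the Jamtis spec.

```cpp
#include <array>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

constexpr unsigned char BADC = 255;  // invalid character (value quoted in the review)
constexpr unsigned char IGNC = 254;  // ignorable character (name and value assumed)

// Build a lookup table from ASCII code to 5-bit symbol index.
std::array<unsigned char, 256> make_inverted_table_sketch(
    const std::string &alphabet,                        // 32 lowercase symbols
    const std::vector<std::pair<char, char>> &aliases)  // {mistyped, canonical}
{
    std::array<unsigned char, 256> table;
    table.fill(BADC);

    // canonical symbols, accepted in either case
    for (std::size_t i = 0; i < alphabet.size(); ++i)
    {
        const unsigned char c = static_cast<unsigned char>(alphabet[i]);
        table[c] = static_cast<unsigned char>(i);
        table[static_cast<unsigned char>(std::toupper(c))] = static_cast<unsigned char>(i);
    }

    // commonly mistyped characters decode as their look-alike
    for (const auto &alias : aliases)
    {
        const unsigned char from = static_cast<unsigned char>(alias.first);
        const unsigned char to = static_cast<unsigned char>(alias.second);
        table[from] = table[to];
        table[static_cast<unsigned char>(std::toupper(from))] = table[to];
    }

    // hyphens are skipped rather than rejected
    table[static_cast<unsigned char>('-')] = IGNC;
    return table;
}
```

As a stand-in for trying it out, the RFC 4648 alphabet "abcdefghijklmnopqrstuvwxyz234567" with the aliases {'0','o'} and {'1','l'} works; the Jamtis alphabet and its normalization pairs are defined in the linked spec.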
see encoding scheme spec here: https://gist.github.com/tevador/50160d160d24cfc6c52ae02eb3d17024#35-base32-encoding
This PR is an alternative to #2. The motivation for this alternative was less code for reviewers (only about 60 real lines of code) and additional built-in functionality of the existing library cppcodec. The unit tests here include a sanity check for allowing for Jamtis address prefixes "xmra{1..9}{t,s,m}..." and a test to make sure that the added dependency doesn't change underneath our feet: base32.future_modification_protection.