Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

common: add Jamtis base32 encoding #6

Merged

Conversation

jeffro256
Copy link

@jeffro256 jeffro256 commented Sep 10, 2023

see encoding scheme spec here: https://gist.github.com/tevador/50160d160d24cfc6c52ae02eb3d17024#35-base32-encoding

This PR is an alternative to #2. The motivation for this alternative was less code for reviewers (only about 60 real lines of code) and additional built-in functionality of the existing library cppcodec. The unit tests here include a sanity check for allowing for Jamtis address prefixes "xmra{1..9}{t,s,m}..." and a test to make sure that the added dependency doesn't change underneath our feet: base32.future_modification_protection.

src/common/base32_monero.h Outdated Show resolved Hide resolved
@rbrunner7
Copy link
Member

Is this worth to review? Do you still intend to send this "into the ring" as possible alternative to the code that @DangerousFreedom1984 PRed? I am bit confused that you closed and then re-opened ....

@jeffro256 jeffro256 force-pushed the jamtis_base32_sm branch 2 times, most recently from e61709d to a175a69 Compare September 15, 2023 07:20
@jeffro256
Copy link
Author

jeffro256 commented Sep 15, 2023

Yes, I did a hand-written version specifically because I wanted to see a mode where its easy to encode blocks of 5 bits at a time b/c the Jamtis body bit-size probably won't be an even multiple of 8, and the library code that @DangerousFreedom1984 adapted (or other libraries) did not seem like it would make that easy without large rewrites. There's also a no-allocate API provided here, which is quicker for fixed size fields (like Jamtis addresses), and this code also does mis-type and case normalization.

Copy link
Member

@rbrunner7 rbrunner7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a review and left some comments and questions after there were 2 votes in favor of this version versus 0 votes in favor of @DangerousFreedom1984 's Base32 PR.

I can't claim to understand every bit of calculation that is done here to code and encode, but well, the test cases show that the code works in principle, so I don't think that disqualifies my review.

tests/unit_tests/base32.cpp Outdated Show resolved Hide resolved
tests/unit_tests/base32.cpp Show resolved Hide resolved
src/common/base32.cpp Outdated Show resolved Hide resolved
src/common/base32.cpp Outdated Show resolved Hide resolved
src/common/base32.cpp Outdated Show resolved Hide resolved
src/common/base32.cpp Outdated Show resolved Hide resolved
src/common/base32.cpp Outdated Show resolved Hide resolved
tests/unit_tests/base32.cpp Show resolved Hide resolved
tests/unit_tests/base32.cpp Show resolved Hide resolved
@jeffro256
Copy link
Author

Thanks for the review @rbrunner7

Copy link
Member

@rbrunner7 rbrunner7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice how many comments you added, thank you. Future people trying to find their way into the Monero cdebase might be very grateful :)

Looks good to me now.

src/common/base32.h Outdated Show resolved Hide resolved
};

// table of the base32 symbols, in Jamtis order
extern const char JAMTIS_ALPHABET[32];
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do these tables really need to be exported? Can they be local to the cpp?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are exported so they can be used by the base32 checksum PR #7 as default tables.

extern unsigned char JAMTIS_INVERTED_ALPHABET[256];

// constants in the inverted table that signal an ascii code is invalid or ignoreable, respectively
static constexpr const unsigned char BADC = 255;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same with these constants, why export them?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are exported so they can be used by the base32 checksum PR #7 as default tables.

src/common/base32.cpp Outdated Show resolved Hide resolved
src/common/base32.cpp Outdated Show resolved Hide resolved
src/common/base32.cpp Outdated Show resolved Hide resolved
src/common/base32.cpp Outdated Show resolved Hide resolved
src/common/base32.cpp Outdated Show resolved Hide resolved

enum class Mode
{
encoded_lossy, // when decoding, discard odd encoded LSB bits left at end of tail (default).
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When is lossy useful? And how to select not lossy?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mode::binary_lossy in useful for encoding exact blocks of 5 bits so that the encoded base32 string isn't as long. For example, Jamtis address body sizes will be an odd number of bits long, not divisible by 8. Thus, we can make the encoded string one byte shorter since there's leftover bits in the binary that we aren't using.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You select binary_lossy by passing it as the mode parameter in each function.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it was expected when divisible by 5-bits that no additional values would be appended. I guess not. But then how does someone select lossless mode? There is no enum for it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although it isn't explicit, almost all base32 libraries take the "encoded lossy" approach which preserves every bit in the raw data and discards extraneous encoded string bits, since that's the expected behavior 90% of the time. That's the default behavior for this code too, but now you have the option.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But then how does someone select lossless mode?

You could make a lossless mode if and only if you forced the user to only encode raw data for which the byte length is divisible by 5, and decode encoded strings of which the length is divisible by 8.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be added, although I don't know when that would be useful.

return static_cast<ssize_t>(Error::invalid_char);

// write symbol bits to current pointed-to byte
decoded_buf_out[byte_offset] |= v << 3 >> bit_offset;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why the << 3 here. This should shift the first value by 3 (left), and I don't see how that could be accurate.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since MSBs are encoded "before" LSBs, we shift the 5-bit alphabet index up 3 to align it with the first bit in the byte, the MSB, then we shift it according to the bit_offset.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a design choice where we could've encoded the LSBs before the MSBs, and not needed the << 3 but I like MSB->LSB type of encoding because it makes more sense for humans when you convert the raw data into a binary string.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's also what most base32 libraries do anyways.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, looked at the encoding algorithm to see what you did. I suppose it doesn't matter, as long as its consistent behavior.

@jeffro256
Copy link
Author

Thanks for the review @vtnerd, the newest commit should have all those changes you requested

see encoding scheme spec here: https://gist.github.com/tevador/50160d160d24cfc6c52ae02eb3d17024#35-base32-encoding

1. No-allocate API provided
2. "binary-lossy" mode, which lets us encrypt blocks of 5 bits at a time, useful for Jamtis addresses
3. Normalizes mis-typed characters and has case-insensitive decoding
4. Ignores hyphens when decoding
5. Error code handling
@rbrunner7 rbrunner7 merged commit d7e89f7 into seraphis-migration:seraphis_wallet Sep 26, 2023
@jeffro256 jeffro256 deleted the jamtis_base32_sm branch September 26, 2023 19:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants