-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base32 algorithm with a basic unit_test #2
base32 algorithm with a basic unit_test #2
Conversation
src/common/base32.cpp
Outdated
|
||
|
||
} // namespace base32 | ||
} // namespace tools |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ends of files need newlines. If you glance over the PR on github you will see a nice symbol complaining about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks
add space
A quick comment about the fact that this PR has now already two commits: In my experience, the established workflow for PRs, especially small ones like this one, is not to make new commits for it when changing something (for whatever reason, be it review comments or be it on your own), but amend the one commit using Commenting on the PR can become difficult with changes distributed over several commits, and the commit history of the branch we merge into becomes longer with no clear advantage. |
src/common/base32.cpp
Outdated
static constexpr char base32_monero_alphabet[] = {'x', 'm', 'r', 'b', 'a', 's', 'e', '3', '2', 'c', 'd', | ||
'f', 'g', 'h', 'i', 'j', 'k', 'n', 'p', 'q', 't', 'u', | ||
'w', 'y', '0', '1', '4', '5', '6', '7', '8', '9'}; | ||
//------------------------------------------------------------------------------------------------------------------- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see here a very good opportunity to explain why we use a different alphabet than almost anybody else for "our" Base32 variant, and mention which characters are missing or different.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was the agreed alphabet proposed by tevador after some research and discussion: https://gist.github.com/tevador/50160d160d24cfc6c52ae02eb3d17024#35-base32-encoding
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, yes, I know, you know, but a reader new to the code and Seraphis in general will not, hence my idea why not make some comments?
* | ||
* Adapted from https://github.com/ahmed-masud/libbase32, | ||
* commit 79761b2b79b0545697945efe0987a8d3004512f9. | ||
* Quite different now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quite different now.
I wonder quite a bit about this. Why, explained on a high conceptual level, did the code have to change much? The different alphabet is clear of course, but beyond that?
I have a hard time anyway to compare your code with the forked original code, which makes it very hard to quickly decide whether the code is still correct without basically looking at everything. The original ist this file, right? https://github.com/ahmed-masud/libbase32/blob/master/base32.c
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, misunderstand from my side, this line is not from you, @DangerousFreedom1984 , but already present in the code that you forked:
Adapted from https://github.com/ahmed-masud/libbase32
As far as I see you don't give info in this PR where you got the code from, unfortunately, but I think I found it here: https://github.com/tplgy/cppcodec/blob/master/cppcodec/detail/base32.hpp , with parts from other files of that repo copied in, to have it all in one file I suppose.
Which still leads me to the question: That file from Ahmed Masud I linked above is wonderfully small, and wonderfully readable. Is it really not fit for purpose?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I copied most of the code from https://github.com/tplgy/cppcodec/blob/master/cppcodec/detail/base32.hpp as it seems to me the most reliable and efficient base32 library.
The code from Ahmed does not seem to be as efficient as this one as they improved it. I may be wrong though.
Since both are MIT licenses I believe that the correct way to copy is by pasting the exact header in our header file, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since both are MIT licenses I believe that the correct way to copy is by pasting the exact header in our header file, right?
I am a unsure myself. The wording of the license is a bit different, maybe just a different iteration. At least we are on the safe side if we copy verbatim I guess.
size_t decoded_max_size(size_t encoded_size) noexcept; | ||
size_t encoded_size(size_t binary_size) noexcept; | ||
void decode_block(std::string& decoded, const alphabet_index_t* idx); | ||
void decode_tail(std::string& decoded, const alphabet_index_t* idx, size_t idx_len); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you happen to know what "decode the tail" means, and whether we really need this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes we need it. It decodes correctly the last remaining block. You can check the idea of the algorithm here:
https://herongyang.com/Encoding/Base32-Encoding-Algorithm.html
I believe that this is the typical PR that needs different implementations, comparison and performance tests. I dont expect this exact code to be used in the final version but so far I do believe that it is correct, optimized and will serve for our purposes at the moment.
It is useful to have multiple commits during review, otherwise you need to re-review the entire PR after every update. |
It's easy to overlook, but GitHub solves exactly that problem: It gives you a view that shows only the changes resulting from a force-push. This screenshot is from one of my PRs: The "Compare" link to the right gives you this handy view of the first changes I did based on early review comments. Here is a PR from @jeffro256 with 4 such after-review-comments force-pushes, here one from @moneromooo-monero with 6, and here one from @j-berman with 4. They all seemed to stick to a single commit over the time the PRs were revised. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of these are benign.
|
||
if (alphabet_index_info::is_invalid(*alphabet_index_ptr)) | ||
{ | ||
throw std::invalid_argument("decode: Symbol error"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've typically returned bool
in these types of (simple/basic) functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any problem in using 'throw' ?
tests/unit_tests/base32.cpp
Outdated
|
||
ASSERT_EQ(encoded_test, "gskwr0fmgskwr0fmgskwr0fmgskwr0fmgskwr0fmgskwr0fmgskwr0fmgskwr0fmgskwr0fmgskwr0fmgskwr0fmgskwr02"); | ||
|
||
base32::decode(encoded_test, recovered_test); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should be some tests for invalid characters that "return false
".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Included.
tests/unit_tests/base32.cpp
Outdated
|
||
TEST(base32, encode_decode) | ||
{ | ||
do_test_simple(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick, but why have the function at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was dumb. More tests were created.
src/common/base32.cpp
Outdated
'w', 'y', '0', '1', '4', '5', '6', '7', '8', '9'}; | ||
//------------------------------------------------------------------------------------------------------------------- | ||
//------------------------------------------------------------------------------------------------------------------- | ||
static constexpr uint8_t binary_block_size() { return 5; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These could be constants instead of functions, the compiler doesn't have to reserve static memory for them if the address is never taken, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
} | ||
//------------------------------------------------------------------------------------------------------------------- | ||
//------------------------------------------------------------------------------------------------------------------- | ||
static constexpr char symbol(alphabet_index_t idx) { return base32_monero_alphabet[idx]; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick again, but this function isn't really doing much.
src/common/base32.cpp
Outdated
auto remaining_src_len = src_end - src; | ||
if (!remaining_src_len || remaining_src_len >= binary_block_size()) | ||
{ | ||
abort(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this possible? Seems like a good candidate for assert
. Or just throw std::logic_error
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, abort is too abrupt.
src/common/base32.cpp
Outdated
} | ||
} | ||
//------------------------------------------------------------------------------------------------------------------- | ||
void encode(const std::string input, std::string& encoded_out) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const std::string& input
or even better epee::span<const std::uint8_t>
. It doesn't appear the null termination byte can ever be used in the encoder or decoder.
src/common/base32.cpp
Outdated
encode(encoded_out, binary, binary_size); | ||
} | ||
//------------------------------------------------------------------------------------------------------------------- | ||
void decode(const std::string input, std::string& decoded_out) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, const std::string&
or epee::span<const std::uint8_t>
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok.
src/common/base32.cpp
Outdated
alphabet_indexes[0] = alphabet_index_info::eof_idx; | ||
|
||
alphabet_index_t* const alphabet_index_start = &alphabet_indexes[0]; | ||
alphabet_index_t* const alphabet_index_end = &alphabet_indexes[encoded_block_size()]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps mark alphabet_index const* const
for both?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could do that but then I would have to use const_cast in this line alphabet_index_ptr = const_cast<alphabet_index_t* (alphabet_index_start);
Which I dont know if it make things safer.
src/common/base32.cpp
Outdated
|
||
if (last_index_ptr != alphabet_index_start) | ||
{ | ||
if (alphabet_index_ptr >= alphabet_index_end) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this even possible given the logic of the function ? An assert()
if this should never fail is ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, better.
Thank you for the review @vtnerd. I also included @jeffro256 unit_tests from PR #6 . |
One file base32 algorithm using specified Jamtis alphabet.