Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

base32 algorithm with a basic unit_test #2

Conversation

DangerousFreedom1984
Copy link

One file base32 algorithm using specified Jamtis alphabet.



} // namespace base32
} // namespace tools
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ends of files need newlines. If you glance over the PR on github you will see a nice symbol complaining about it.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

@rbrunner7
Copy link
Member

A quick comment about the fact that this PR has now already two commits:

In my experience, the established workflow for PRs, especially small ones like this one, is not to make new commits for it when changing something (for whatever reason, be it review comments or be it on your own), but amend the one commit using git commit --amend and then make a force-push with git push --force.

Commenting on the PR can become difficult with changes distributed over several commits, and the commit history of the branch we merge into becomes longer with no clear advantage.

static constexpr char base32_monero_alphabet[] = {'x', 'm', 'r', 'b', 'a', 's', 'e', '3', '2', 'c', 'd',
'f', 'g', 'h', 'i', 'j', 'k', 'n', 'p', 'q', 't', 'u',
'w', 'y', '0', '1', '4', '5', '6', '7', '8', '9'};
//-------------------------------------------------------------------------------------------------------------------
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see here a very good opportunity to explain why we use a different alphabet than almost anybody else for "our" Base32 variant, and mention which characters are missing or different.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was the agreed alphabet proposed by tevador after some research and discussion: https://gist.github.com/tevador/50160d160d24cfc6c52ae02eb3d17024#35-base32-encoding

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, yes, I know, you know, but a reader new to the code and Seraphis in general will not, hence my idea why not make some comments?

*
* Adapted from https://github.com/ahmed-masud/libbase32,
* commit 79761b2b79b0545697945efe0987a8d3004512f9.
* Quite different now.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quite different now.

I wonder quite a bit about this. Why, explained on a high conceptual level, did the code have to change much? The different alphabet is clear of course, but beyond that?

I have a hard time anyway to compare your code with the forked original code, which makes it very hard to quickly decide whether the code is still correct without basically looking at everything. The original ist this file, right? https://github.com/ahmed-masud/libbase32/blob/master/base32.c

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, misunderstand from my side, this line is not from you, @DangerousFreedom1984 , but already present in the code that you forked:

Adapted from https://github.com/ahmed-masud/libbase32

As far as I see you don't give info in this PR where you got the code from, unfortunately, but I think I found it here: https://github.com/tplgy/cppcodec/blob/master/cppcodec/detail/base32.hpp , with parts from other files of that repo copied in, to have it all in one file I suppose.

Which still leads me to the question: That file from Ahmed Masud I linked above is wonderfully small, and wonderfully readable. Is it really not fit for purpose?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I copied most of the code from https://github.com/tplgy/cppcodec/blob/master/cppcodec/detail/base32.hpp as it seems to me the most reliable and efficient base32 library.
The code from Ahmed does not seem to be as efficient as this one as they improved it. I may be wrong though.
Since both are MIT licenses I believe that the correct way to copy is by pasting the exact header in our header file, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since both are MIT licenses I believe that the correct way to copy is by pasting the exact header in our header file, right?

I am a unsure myself. The wording of the license is a bit different, maybe just a different iteration. At least we are on the safe side if we copy verbatim I guess.

size_t decoded_max_size(size_t encoded_size) noexcept;
size_t encoded_size(size_t binary_size) noexcept;
void decode_block(std::string& decoded, const alphabet_index_t* idx);
void decode_tail(std::string& decoded, const alphabet_index_t* idx, size_t idx_len);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you happen to know what "decode the tail" means, and whether we really need this?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we need it. It decodes correctly the last remaining block. You can check the idea of the algorithm here:
https://herongyang.com/Encoding/Base32-Encoding-Algorithm.html

I believe that this is the typical PR that needs different implementations, comparison and performance tests. I dont expect this exact code to be used in the final version but so far I do believe that it is correct, optimized and will serve for our purposes at the moment.

@UkoeHB
Copy link

UkoeHB commented Jun 9, 2023

A quick comment about the fact that this PR has now already two commits:

It is useful to have multiple commits during review, otherwise you need to re-review the entire PR after every update.

@rbrunner7
Copy link
Member

It is useful to have multiple commits during review, otherwise you need to re-review the entire PR after every update

It's easy to overlook, but GitHub solves exactly that problem: It gives you a view that shows only the changes resulting from a force-push. This screenshot is from one of my PRs:

image

The "Compare" link to the right gives you this handy view of the first changes I did based on early review comments.

Here is a PR from @jeffro256 with 4 such after-review-comments force-pushes, here one from @moneromooo-monero with 6, and here one from @j-berman with 4. They all seemed to stick to a single commit over the time the PRs were revised.

Copy link

@vtnerd vtnerd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of these are benign.


if (alphabet_index_info::is_invalid(*alphabet_index_ptr))
{
throw std::invalid_argument("decode: Symbol error");
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've typically returned bool in these types of (simple/basic) functions.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any problem in using 'throw' ?


ASSERT_EQ(encoded_test, "gskwr0fmgskwr0fmgskwr0fmgskwr0fmgskwr0fmgskwr0fmgskwr0fmgskwr0fmgskwr0fmgskwr0fmgskwr0fmgskwr02");

base32::decode(encoded_test, recovered_test);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be some tests for invalid characters that "return false".

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Included.


TEST(base32, encode_decode)
{
do_test_simple();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick, but why have the function at all?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was dumb. More tests were created.

'w', 'y', '0', '1', '4', '5', '6', '7', '8', '9'};
//-------------------------------------------------------------------------------------------------------------------
//-------------------------------------------------------------------------------------------------------------------
static constexpr uint8_t binary_block_size() { return 5; }
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These could be constants instead of functions, the compiler doesn't have to reserve static memory for them if the address is never taken, etc.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

}
//-------------------------------------------------------------------------------------------------------------------
//-------------------------------------------------------------------------------------------------------------------
static constexpr char symbol(alphabet_index_t idx) { return base32_monero_alphabet[idx]; }
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick again, but this function isn't really doing much.

auto remaining_src_len = src_end - src;
if (!remaining_src_len || remaining_src_len >= binary_block_size())
{
abort();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this possible? Seems like a good candidate for assert. Or just throw std::logic_error.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, abort is too abrupt.

}
}
//-------------------------------------------------------------------------------------------------------------------
void encode(const std::string input, std::string& encoded_out)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const std::string& input or even better epee::span<const std::uint8_t>. It doesn't appear the null termination byte can ever be used in the encoder or decoder.

encode(encoded_out, binary, binary_size);
}
//-------------------------------------------------------------------------------------------------------------------
void decode(const std::string input, std::string& decoded_out)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, const std::string& or epee::span<const std::uint8_t>?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

alphabet_indexes[0] = alphabet_index_info::eof_idx;

alphabet_index_t* const alphabet_index_start = &alphabet_indexes[0];
alphabet_index_t* const alphabet_index_end = &alphabet_indexes[encoded_block_size()];
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps mark alphabet_index const* const for both?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could do that but then I would have to use const_cast in this line alphabet_index_ptr = const_cast<alphabet_index_t* (alphabet_index_start);
Which I dont know if it make things safer.


if (last_index_ptr != alphabet_index_start)
{
if (alphabet_index_ptr >= alphabet_index_end)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this even possible given the logic of the function ? An assert() if this should never fail is ok.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, better.

@DangerousFreedom1984
Copy link
Author

Thank you for the review @vtnerd. I also included @jeffro256 unit_tests from PR #6 .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants