Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use size_t where appropriate #363

Closed
minad opened this issue Oct 8, 2019 · 36 comments
Closed

use size_t where appropriate #363

minad opened this issue Oct 8, 2019 · 36 comments
Milestone

Comments

@minad
Copy link
Member

minad commented Oct 8, 2019

In particular in mp_int. 2.0 refactoring!

@minad minad added this to the v2.0.0 milestone Oct 9, 2019
@minad
Copy link
Member Author

minad commented Oct 18, 2019

@dmitry-lipetsk interesting! Are there issues/mistakes you found while doing the porting? You mention that in your readme. It would be nice to backport them here.

@dmitry-lipetsk
Copy link
Contributor

If I remember correcly, I described all detected issues here (in Issues).

@minad
Copy link
Member Author

minad commented Oct 18, 2019

Ok thank you!

@czurnieden
Copy link
Contributor

Are there plans to tackle the carefully ignored integer overflow in mp_count_bits? That problem is theoretical only for 32/64-bit large size_t but not so much for the LibCs for some 16-bit architectures with 16-bit size_t (e.g.: AVR Libc ) with enough memory.

Adding an overflow check in mp_count_bits is not a problem, signaling the error is. We could return SIZE_MAX (as the error MP_SOF maybe?) which is safe because (sizeof(size_t) * CHAR_BIT)/MP_DIGIT_BIT < SIZE_MAX for all of the current MP_nBIT sizes.

E.g.: something like this:

/* Useful elsewhere, too? */
#define MP_USED_MAX (SIZE_MAX/MP_DIGIT_BIT)

size_t mp_count_bits(const mp_int *a)
{
   size_t   r;
   mp_digit q;

   /* shortcut */
   if (MP_IS_ZERO(a)) {
      return 0;
   }

   /* get number of digits and add that */
   r = a->used;
   if (r > MP_USED_MAX) {
      return SIZE_MAX /*MP_SOF*/;
   }
   r = (r - 1) * MP_DIGIT_BIT;

   /* take the last digit and count the bits in it */
   q = a->dp[a->used - 1];
   while (q > 0u) {
      ++r;
      q >>= 1u;
   }
   return r;
}

But is it worth the hassle? I don't know.

@nijtmans
Copy link
Collaborator

I don't think this is a problem at all. Reasoning: The problem arises when a->used > MP_USED_MAX. Well, a->used is always smaller than a->alloc (since you cannot use more bytes than you allocate). So, the problem can only exist when a->alloc > MP_USED_MAX. This means more bytes than MP_USED_MAX*MP_DIGIT_BIT (= SIZE_MAX) were allocated for the mp_int. That's impossible, since such a malloc will fail. Conclusion: This situation can simply never happen if a was initialised by libtommath.

@nijtmans
Copy link
Collaborator

Hm ... Thinking more about it, yes it can happen. I forgot the difference between bytes/bits. As soon as the mp_int occupies more than 1/8th of the maximum addressable memory, r might overflow.

@minad
Copy link
Member Author

minad commented Oct 19, 2019

We could always check before malloc that size*digitbits does not overflow?

@nijtmans
Copy link
Collaborator

I prefer your original proposal: Checking for it and returning SIZE_MAX in case of overflow.

@nijtmans
Copy link
Collaborator

..... or make r an "unsigned long long" and return that. No machine can allocate that much memory

@minad
Copy link
Member Author

minad commented Oct 19, 2019

Or simply return mp_err.

@czurnieden
Copy link
Contributor

Or simply return mp_err.

Would work only now with int (just return something negative) but not later with size_t which is unsigned according to the standard.

Or change the function to something like e.g.: mp_err mp_count_bits(mp_int *a, size_t *size) ?

@minad
Copy link
Member Author

minad commented Oct 19, 2019

Change the function

@czurnieden
Copy link
Contributor

Change the function

Started that proces in #391
Might take some time to end it, it is used at a lot of places ;-)

@minad
Copy link
Member Author

minad commented Oct 20, 2019

@czurnieden please wait a bit until we made a better plan. This is a major change which touches everything.

@minad
Copy link
Member Author

minad commented Oct 22, 2019

I started to make a plan regarding the more specific size types. For clarity I introduced typedefs for the types in question. It doesn't mean that we will introduce these typedefs, it is just to make clear what is used and where it is used.

typedef int mp_digitcount;
typedef int mp_bitcount;

typedef struct  {
   mp_digitcount used, alloc;
   mp_sign sign;
   mp_digit *dp;
} mp_int;

/* use mp_digitcount */
mp_err mp_grow(mp_int *a, mp_digitcount size);
mp_err mp_init_size(mp_int *a, mp_digitcount size);
void mp_rshd(mp_int *a, mp_digitcount b);
mp_err mp_lshd(mp_int *a, mp_digitcount b);
mp_err mp_rand(mp_int *a, mp_digitcount digits);
int mp_prime_rabin_miller_trials(mp_digitcount size);
mp_err mp_prime_rand(mp_int *a, int t, mp_digitcount size, int flags);

/* use mp_bitcount */
mp_bitcount mp_count_bits(const mp_int *a);
mp_err mp_div_2d(const mp_int *a, mp_bitcount b, mp_int *c, mp_int *d);
mp_err mp_mul_2d(const mp_int *a, mp_bitcount b, mp_int *c);
mp_err mp_mod_2d(const mp_int *a, mp_bitcount b, mp_int *c);
mp_err mp_2expt(mp_int *a, mp_bitcount b);
mp_err mp_signed_rsh(const mp_int *a, mp_bitcount b, mp_int *c);
mp_bitcount mp_cnt_lsb(const mp_int *a);

It might make sense to define the types as size_t, but then we get an overflow issue for the bitcount. Maybe use a larger type for the bitcount then. But should it be platform specific, since size_t is also platform specific?

Ping @czurnieden, @nijtmans, @sjaeckel

EDIT: I should add - we also have to be careful about size_t, since size_t counts bytes and not digits. This means calling mp_init with SIZE_MAX is an overflow. The allocation request could not be satisfied for sure, but we have to make sure that no overflow happens (are we using calloc there which ensures that?).

@nijtmans
Copy link
Collaborator

Yes, that's like the idea.

One more function:

int mp_prime_rabin_miller_trials(mp_bitcount size);

I would propose mp_digitcount to be size_t, and mp_bitcount to be simply "unsigned long". That means for 16-bit and 32-bit platforms we can allocate 2^32 bits = 2^29 bytes (=536870912 bytes = 0.5 gigabytes ), that's at least double what we can do in v1.2.0. For 64bit platforms (LP64), it's the same as size_t, and can even address 2^64 bits of memory, so 2^61 bytes (=2305843009213693952 bytes), much more than any memory in the forseeable future....

@nijtmans
Copy link
Collaborator

One more:

mp_bitcount mp_cnt_lsb(const mp_int *a);

@minad
Copy link
Member Author

minad commented Oct 22, 2019

@nijtmans Shall we introduce typedefs or simply use the plain types? Typedefs obfuscate the underlying type but clarify the usage.

@nijtmans
Copy link
Collaborator

The bitcount part is committed here: #398

@nijtmans
Copy link
Collaborator

@nijtmans Shall we introduce typedefs or simply use the plain types? Typedefs obfuscate the underlying type but clarify the usage.

I prefer the plain types. But I can live with typedefs too.

@minad
Copy link
Member Author

minad commented Oct 22, 2019

@nijtmans I would prefer if we wait a bit with PRs until the discussion is finished and we know what we want. I closed the similar PR by @czurnieden, but also due to conflicts with other PRs.

@nijtmans
Copy link
Collaborator

Please consider this commit as base for discussion. I have no problem with throwing it away if a different outcome finalizes. This way we can see exactly what the changes are, so we can discuss about it.

@minad
Copy link
Member Author

minad commented Oct 22, 2019

Ok, but let's keep the discussion here. I am a bit split about the typedef question. Using unsigned long looks arbitrary. In the future it could well happen that someone introduces another function which should use bitcount, but uses int mistakenly.

@sjaeckel
Copy link
Member

That means for 16-bit and 32-bit platforms we can allocate 2^32 bits = 2^29 bytes (=536870912 bytes = 0.5 gigabytes ), that's at least double what we can do in v1.2.0. For 64bit platforms (LP64), it's the same as size_t, and can even address 2^64 bits of memory, so 2^61 bytes (=2305843009213693952 bytes), much more than any memory in the forseeable future....

how likely is it that someone wants to work with 2^32 bit wide MPI's or wider?

The allocation request could not be satisfied for sure, but we have to make sure that no overflow happens (are we using calloc there which ensures that?).

yes we use calloc there

I could btw. live with using unsinged long long or uint64_t for both, but I'm tired of these LP/LLP problems and want to avoid using unsigned long.

Whether it should be a typedef or not... depends on what we use as basetype

@minad
Copy link
Member Author

minad commented Oct 23, 2019

I say we define typedefs mp_bitcount and mp_digitcount. mp_digitcount as size_t since this is tied to allocations. For mp_digitcount we use size_t on 32/64 bit platforms (no lp/llp issue) and uint32_t on 16 bit architectures. Then we ensure at allocation time that no overflow can happen. Overflow is only possible for 32 bit architectures then since there both digitcount and bitcount are 32 bit.

@minad
Copy link
Member Author

minad commented Oct 23, 2019

Using uint64_t for both is not feasible on 16 bit archs. On 32/64 bit archs we could simply use size_t for both. But then we should drop MP_16BIT and I am not sure if that is desired.

@minad
Copy link
Member Author

minad commented Oct 23, 2019

Alternatively just use size_t for both and ensure at allocation time that the bitcount won't overflow. This restricts as to 8kib numbers on 16 bit archs. This was my original proposal but @nijtmans didn't like it for some reaso. Could you elaborate @nijtmans? While 8kib numbers are not large you can only allocate 8 of them on a 16bit architecture, so it is not that small after all. But on 16bit archs there are also long pointers which might allow for more memory. I have no idea if such scenarios would work with ltm.

I think using size_t for both would be reasonable for both 32 and 64 bit archs and that's what we should focus on the most.

@minad
Copy link
Member Author

minad commented Oct 23, 2019

@dmitry-lipetsk seems also to use size_t for both digit and bit count.

@sjaeckel
Copy link
Member

I'm also fine with size_t btw.

@nijtmans
Copy link
Collaborator

I concur: size_t is probably the best option for both. My experiment (now closed) shows that when using a different type for both situations, results in the need of a lot of type-casts in various situations. Well, I don't like type-casts :-)

@czurnieden
Copy link
Contributor

how likely is it that someone wants to work with 2^32 bit wide MPI's or wider?

"[H]ow likely is it […]" is a term many "Famous Last Words" begin with ;-)

But it can happen accidentally, so it would be nice to have some kind of check; allocation time would be a nice place to pin it, I think, like, who was it, @minad suggested.

size_t?
I'm all for it!

And the problem with MP_16BIT is not that hard, don't forget that these are small MCUs, they won't do a lot of number crunching and it i still enough for cryptography with larger primes.

@minad
Copy link
Member Author

minad commented Oct 24, 2019

Good! Let's go with size_t then 👍

How shall we proceed with a PR? There are not much open finished other PRs right now, which would lead to conflicts. Maybe work together on one big PR changing all the functions and the struct as mentioned in my comment before? I am not sure if it is possible to split the work into smaller units.

@minad
Copy link
Member Author

minad commented Oct 27, 2019

Someone wants to take a stab at this? Maybe we should start to convert each function step by step with conversion warnings turned off, only ensuring that the test suite stays green. Afterwards fix the conversion warnings. This way we can split the work a bit, I hope.

@minad minad mentioned this issue Oct 28, 2019
@minad
Copy link
Member Author

minad commented Oct 29, 2019

I made a first attempt in #430, but abandoned that one for now. At first I want to do some preparatory simplifications #434.

Besides that, when experimenting with this, I thought we should maybe discuss a bit more.

Moving to size_t means two things - changing the width and changing the signedness.
I see two downsides if we do that:

  • Moving to an unsigned type has the obvious problem that we have to be more careful about loop variables, due the wrap around. This is certainly a problem if one converts an existing code base. Many style guides also don't recommend unsigned types for that reason, since they lead to more error prone code. I partially agree with that, certainly more care is needed.

  • Changing the size is also a bit questionable. For example I think on 16 bit platforms size_t/unsigned int is a good choice, allowing 8KiB numbers. On 32 bit platforms int allows 256MiB, unsigned int/size_t allows 512MiB which is more than enough. On 64 bit platforms, size_t feels like overkill however. We could however still just use unsigned int since numbers larger than 512MiB are still not common I think. GMP also uses int everywhere. Maybe we could also introduce our own mp_count=int/unsigned int type which is either 16 or 32 bit if we decide against size_t.

@minad
Copy link
Member Author

minad commented Feb 17, 2020

Right now I don't have intentions to implement this change. I am not convinced anymore, in particular due to the issues with loop variables. Instead of such a needless refactoring, we should focus on more pressing correctness issues like #475 etc.

@minad minad closed this as completed Feb 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants