use size_t where appropriate #363

minad · 2019-10-08T20:49:40Z

In particular in mp_int. 2.0 refactoring!

dmitry-lipetsk · 2019-10-18T17:46:03Z

https://github.com/dmitry-lipetsk/libtommath-for-ibprovider--set01/blob/master/ibp_external__libtommatch_set01__tommath.h

:)

minad · 2019-10-18T17:51:49Z

@dmitry-lipetsk interesting! Are there issues/mistakes you found while doing the porting? You mention that in your readme. It would be nice to backport them here.

dmitry-lipetsk · 2019-10-18T18:12:32Z

If I remember correcly, I described all detected issues here (in Issues).

minad · 2019-10-18T18:40:23Z

Ok thank you!

czurnieden · 2019-10-18T22:01:59Z

Are there plans to tackle the carefully ignored integer overflow in mp_count_bits? That problem is theoretical only for 32/64-bit large size_t but not so much for the LibCs for some 16-bit architectures with 16-bit size_t (e.g.: AVR Libc ) with enough memory.

Adding an overflow check in mp_count_bits is not a problem, signaling the error is. We could return SIZE_MAX (as the error MP_SOF maybe?) which is safe because (sizeof(size_t) * CHAR_BIT)/MP_DIGIT_BIT < SIZE_MAX for all of the current MP_nBIT sizes.

E.g.: something like this:

/* Useful elsewhere, too? */
#define MP_USED_MAX (SIZE_MAX/MP_DIGIT_BIT)

size_t mp_count_bits(const mp_int *a)
{
   size_t   r;
   mp_digit q;

   /* shortcut */
   if (MP_IS_ZERO(a)) {
      return 0;
   }

   /* get number of digits and add that */
   r = a->used;
   if (r > MP_USED_MAX) {
      return SIZE_MAX /*MP_SOF*/;
   }
   r = (r - 1) * MP_DIGIT_BIT;

   /* take the last digit and count the bits in it */
   q = a->dp[a->used - 1];
   while (q > 0u) {
      ++r;
      q >>= 1u;
   }
   return r;
}

But is it worth the hassle? I don't know.

nijtmans · 2019-10-19T19:09:59Z

I don't think this is a problem at all. Reasoning: The problem arises when a->used > MP_USED_MAX. Well, a->used is always smaller than a->alloc (since you cannot use more bytes than you allocate). So, the problem can only exist when a->alloc > MP_USED_MAX. This means more bytes than MP_USED_MAX*MP_DIGIT_BIT (= SIZE_MAX) were allocated for the mp_int. That's impossible, since such a malloc will fail. Conclusion: This situation can simply never happen if a was initialised by libtommath.

nijtmans · 2019-10-19T19:12:22Z

Hm ... Thinking more about it, yes it can happen. I forgot the difference between bytes/bits. As soon as the mp_int occupies more than 1/8th of the maximum addressable memory, r might overflow.

minad · 2019-10-19T19:16:41Z

We could always check before malloc that size*digitbits does not overflow?

nijtmans · 2019-10-19T19:19:38Z

I prefer your original proposal: Checking for it and returning SIZE_MAX in case of overflow.

nijtmans · 2019-10-19T19:22:50Z

..... or make r an "unsigned long long" and return that. No machine can allocate that much memory

minad · 2019-10-19T19:25:48Z

Or simply return mp_err.

czurnieden · 2019-10-19T20:06:46Z

Or simply return mp_err.

Would work only now with int (just return something negative) but not later with size_t which is unsigned according to the standard.

Or change the function to something like e.g.: mp_err mp_count_bits(mp_int *a, size_t *size) ?

minad · 2019-10-19T20:09:29Z

Change the function

czurnieden · 2019-10-19T21:45:32Z

Change the function

Started that proces in #391
Might take some time to end it, it is used at a lot of places ;-)

minad · 2019-10-20T16:45:36Z

@czurnieden please wait a bit until we made a better plan. This is a major change which touches everything.

minad · 2019-10-22T13:27:20Z

I started to make a plan regarding the more specific size types. For clarity I introduced typedefs for the types in question. It doesn't mean that we will introduce these typedefs, it is just to make clear what is used and where it is used.

typedef int mp_digitcount;
typedef int mp_bitcount;

typedef struct  {
   mp_digitcount used, alloc;
   mp_sign sign;
   mp_digit *dp;
} mp_int;

/* use mp_digitcount */
mp_err mp_grow(mp_int *a, mp_digitcount size);
mp_err mp_init_size(mp_int *a, mp_digitcount size);
void mp_rshd(mp_int *a, mp_digitcount b);
mp_err mp_lshd(mp_int *a, mp_digitcount b);
mp_err mp_rand(mp_int *a, mp_digitcount digits);
int mp_prime_rabin_miller_trials(mp_digitcount size);
mp_err mp_prime_rand(mp_int *a, int t, mp_digitcount size, int flags);

/* use mp_bitcount */
mp_bitcount mp_count_bits(const mp_int *a);
mp_err mp_div_2d(const mp_int *a, mp_bitcount b, mp_int *c, mp_int *d);
mp_err mp_mul_2d(const mp_int *a, mp_bitcount b, mp_int *c);
mp_err mp_mod_2d(const mp_int *a, mp_bitcount b, mp_int *c);
mp_err mp_2expt(mp_int *a, mp_bitcount b);
mp_err mp_signed_rsh(const mp_int *a, mp_bitcount b, mp_int *c);
mp_bitcount mp_cnt_lsb(const mp_int *a);

It might make sense to define the types as size_t, but then we get an overflow issue for the bitcount. Maybe use a larger type for the bitcount then. But should it be platform specific, since size_t is also platform specific?

Ping @czurnieden, @nijtmans, @sjaeckel

EDIT: I should add - we also have to be careful about size_t, since size_t counts bytes and not digits. This means calling mp_init with SIZE_MAX is an overflow. The allocation request could not be satisfied for sure, but we have to make sure that no overflow happens (are we using calloc there which ensures that?).

nijtmans · 2019-10-22T15:01:51Z

Yes, that's like the idea.

One more function:

int mp_prime_rabin_miller_trials(mp_bitcount size);

I would propose mp_digitcount to be size_t, and mp_bitcount to be simply "unsigned long". That means for 16-bit and 32-bit platforms we can allocate 2^32 bits = 2^29 bytes (=536870912 bytes = 0.5 gigabytes ), that's at least double what we can do in v1.2.0. For 64bit platforms (LP64), it's the same as size_t, and can even address 2^64 bits of memory, so 2^61 bytes (=2305843009213693952 bytes), much more than any memory in the forseeable future....

nijtmans · 2019-10-22T15:14:54Z

One more:

mp_bitcount mp_cnt_lsb(const mp_int *a);

minad · 2019-10-22T15:30:02Z

@nijtmans Shall we introduce typedefs or simply use the plain types? Typedefs obfuscate the underlying type but clarify the usage.

nijtmans · 2019-10-22T15:30:19Z

The bitcount part is committed here: #398

nijtmans · 2019-10-22T15:32:33Z

@nijtmans Shall we introduce typedefs or simply use the plain types? Typedefs obfuscate the underlying type but clarify the usage.

I prefer the plain types. But I can live with typedefs too.

minad · 2019-10-22T15:32:42Z

@nijtmans I would prefer if we wait a bit with PRs until the discussion is finished and we know what we want. I closed the similar PR by @czurnieden, but also due to conflicts with other PRs.

nijtmans · 2019-10-22T15:36:01Z

Please consider this commit as base for discussion. I have no problem with throwing it away if a different outcome finalizes. This way we can see exactly what the changes are, so we can discuss about it.

minad · 2019-10-22T15:41:24Z

Ok, but let's keep the discussion here. I am a bit split about the typedef question. Using unsigned long looks arbitrary. In the future it could well happen that someone introduces another function which should use bitcount, but uses int mistakenly.

sjaeckel · 2019-10-23T16:02:41Z

That means for 16-bit and 32-bit platforms we can allocate 2^32 bits = 2^29 bytes (=536870912 bytes = 0.5 gigabytes ), that's at least double what we can do in v1.2.0. For 64bit platforms (LP64), it's the same as size_t, and can even address 2^64 bits of memory, so 2^61 bytes (=2305843009213693952 bytes), much more than any memory in the forseeable future....

how likely is it that someone wants to work with 2^32 bit wide MPI's or wider?

The allocation request could not be satisfied for sure, but we have to make sure that no overflow happens (are we using calloc there which ensures that?).

yes we use calloc there

I could btw. live with using unsinged long long or uint64_t for both, but I'm tired of these LP/LLP problems and want to avoid using unsigned long.

Whether it should be a typedef or not... depends on what we use as basetype

minad · 2019-10-23T16:20:38Z

I say we define typedefs mp_bitcount and mp_digitcount. mp_digitcount as size_t since this is tied to allocations. For mp_digitcount we use size_t on 32/64 bit platforms (no lp/llp issue) and uint32_t on 16 bit architectures. Then we ensure at allocation time that no overflow can happen. Overflow is only possible for 32 bit architectures then since there both digitcount and bitcount are 32 bit.

minad · 2019-10-23T16:24:40Z

Using uint64_t for both is not feasible on 16 bit archs. On 32/64 bit archs we could simply use size_t for both. But then we should drop MP_16BIT and I am not sure if that is desired.

minad · 2019-10-23T17:11:05Z

Alternatively just use size_t for both and ensure at allocation time that the bitcount won't overflow. This restricts as to 8kib numbers on 16 bit archs. This was my original proposal but @nijtmans didn't like it for some reaso. Could you elaborate @nijtmans? While 8kib numbers are not large you can only allocate 8 of them on a 16bit architecture, so it is not that small after all. But on 16bit archs there are also long pointers which might allow for more memory. I have no idea if such scenarios would work with ltm.

I think using size_t for both would be reasonable for both 32 and 64 bit archs and that's what we should focus on the most.

minad · 2019-10-23T17:16:14Z

@dmitry-lipetsk seems also to use size_t for both digit and bit count.

sjaeckel · 2019-10-23T17:57:57Z

I'm also fine with size_t btw.

nijtmans · 2019-10-23T20:33:01Z

I concur: size_t is probably the best option for both. My experiment (now closed) shows that when using a different type for both situations, results in the need of a lot of type-casts in various situations. Well, I don't like type-casts :-)

czurnieden · 2019-10-23T22:13:29Z

how likely is it that someone wants to work with 2^32 bit wide MPI's or wider?

"[H]ow likely is it […]" is a term many "Famous Last Words" begin with ;-)

But it can happen accidentally, so it would be nice to have some kind of check; allocation time would be a nice place to pin it, I think, like, who was it, @minad suggested.

size_t?
I'm all for it!

And the problem with MP_16BIT is not that hard, don't forget that these are small MCUs, they won't do a lot of number crunching and it i still enough for cryptography with larger primes.

minad · 2019-10-24T05:28:10Z

Good! Let's go with size_t then 👍

How shall we proceed with a PR? There are not much open finished other PRs right now, which would lead to conflicts. Maybe work together on one big PR changing all the functions and the struct as mentioned in my comment before? I am not sure if it is possible to split the work into smaller units.

minad · 2019-10-27T22:34:36Z

Someone wants to take a stab at this? Maybe we should start to convert each function step by step with conversion warnings turned off, only ensuring that the test suite stays green. Afterwards fix the conversion warnings. This way we can split the work a bit, I hope.

minad · 2019-10-29T16:55:44Z

I made a first attempt in #430, but abandoned that one for now. At first I want to do some preparatory simplifications #434.

Besides that, when experimenting with this, I thought we should maybe discuss a bit more.

Moving to size_t means two things - changing the width and changing the signedness.
I see two downsides if we do that:

Moving to an unsigned type has the obvious problem that we have to be more careful about loop variables, due the wrap around. This is certainly a problem if one converts an existing code base. Many style guides also don't recommend unsigned types for that reason, since they lead to more error prone code. I partially agree with that, certainly more care is needed.
Changing the size is also a bit questionable. For example I think on 16 bit platforms size_t/unsigned int is a good choice, allowing 8KiB numbers. On 32 bit platforms int allows 256MiB, unsigned int/size_t allows 512MiB which is more than enough. On 64 bit platforms, size_t feels like overkill however. We could however still just use unsigned int since numbers larger than 512MiB are still not common I think. GMP also uses int everywhere. Maybe we could also introduce our own mp_count=int/unsigned int type which is either 16 or 32 bit if we decide against size_t.

minad · 2020-02-17T22:57:05Z

Right now I don't have intentions to implement this change. I am not convinced anymore, in particular due to the issues with loop variables. Instead of such a needless refactoring, we should focus on more pressing correctness issues like #475 etc.

minad added this to the v2.0.0 milestone Oct 9, 2019

This was referenced Oct 20, 2019

Change type for result of mp_count_bits to size_t #391

Closed

use size_t for mp_radix_size #396

Merged

mp_prime_next_prime: use mp_bool for bbs_style #397

Merged

nijtmans mentioned this issue Oct 22, 2019

Use unsigned long for bitsizes #398

Closed

minad mentioned this issue Oct 27, 2019

stdbool/stdint refactoring, various optimizations/simplifications #405

Merged

minad mentioned this issue Oct 28, 2019

use size_t #430

Closed

minad mentioned this issue Oct 30, 2019

Alternative 1: Suffix renamings (no suffix) #437

Closed

minad closed this as completed Feb 17, 2020

use size_t where appropriate #363

use size_t where appropriate #363

Comments

minad commented Oct 8, 2019

dmitry-lipetsk commented Oct 18, 2019

minad commented Oct 18, 2019

dmitry-lipetsk commented Oct 18, 2019

minad commented Oct 18, 2019

czurnieden commented Oct 18, 2019

nijtmans commented Oct 19, 2019

nijtmans commented Oct 19, 2019

minad commented Oct 19, 2019

nijtmans commented Oct 19, 2019

nijtmans commented Oct 19, 2019

minad commented Oct 19, 2019

czurnieden commented Oct 19, 2019

minad commented Oct 19, 2019

czurnieden commented Oct 19, 2019

minad commented Oct 20, 2019

minad commented Oct 22, 2019 • edited Loading

nijtmans commented Oct 22, 2019

nijtmans commented Oct 22, 2019

minad commented Oct 22, 2019

nijtmans commented Oct 22, 2019

nijtmans commented Oct 22, 2019

minad commented Oct 22, 2019

nijtmans commented Oct 22, 2019

minad commented Oct 22, 2019

sjaeckel commented Oct 23, 2019

minad commented Oct 23, 2019

minad commented Oct 23, 2019

minad commented Oct 23, 2019

minad commented Oct 23, 2019

sjaeckel commented Oct 23, 2019

nijtmans commented Oct 23, 2019

czurnieden commented Oct 23, 2019

minad commented Oct 24, 2019

minad commented Oct 27, 2019

minad commented Oct 29, 2019

minad commented Feb 17, 2020

minad commented Oct 22, 2019 •

edited

Loading