Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ML-KEM refactor #1763

Merged
merged 13 commits into from
Aug 23, 2024
Merged

ML-KEM refactor #1763

merged 13 commits into from
Aug 23, 2024

Conversation

dkostic
Copy link
Contributor

@dkostic dkostic commented Aug 13, 2024

Issues:

CryptoAlg-2614

Description of changes:

Previous to this change, ML-KEM was implemented such that
the code for a parameter set was selected by defining the correct
C pre-processor flag (for example, if you want to compile the code
for ML-KEM 512 parameter set you would #define KYBER_K 2).
The consequence of this was that we had to compile the code
three times for three ML-KEM parameter sets. We do this by adding
a C file for each parameter set where we first define the corresponding
KYBER_K value and then include all the ML-KEM C files. Besides being
an awkward way to handle multiple parameter sets, this will not work
for the FIPS module where we bundle all C files inside bcm.c and
compile it as a single compilation unit.

In this change we refactor ML-KEM by parametrizing all functions that
depend on values that are specific to a parameter set, i.e., that directly
or indirectly depend on the value of KYBER_K. To do this, in params.h
we define a structure that holds those ML-KEM parameters and functions
that initialize a given structure with values corresponding to a parameter
set. This structure can then be passed to every function that requires it.
For example, polyvec_ntt function was previously implemented like this:

void polyvec_ntt(polyvec *r) {
  unsigned int i;
  for(i=0;i<KYBER_K;i++)
    poly_ntt(&r->vec[i]);
}

We now change that to:

void polyvec_ntt(ml_kem_params *params, polyvec *r) {
  unsigned int i;
  for(i=0;i<params->k;i++)
    poly_ntt(&r->vec[i]);
}

Of course, now we have to change all callers of polyvec_ntt to also
have ml_kem_params as an input argument, and then callers of the
caller, etc. These changes bubble up to the highest level API defined
in kem.h where we now have:

int crypto_kem_keypair(ml_kem_params *params, ...);
int crypto_kem_enc(ml_kem_params * params, ...);
int crypto_kem_dec(ml_kem_params * params, ...);
int crypto_kem_keypair_derand(ml_kem_params *params, ...);
int crypto_kem_enc_derand(ml_kem_params * params, ...);

Then we can easily implement these functions for specific parameter sets,
which can be seen in ml_kem.c file. For example:

int ml_kem_512_ipd_keypair(...) {
  ml_kem_params params;
  ml_kem_512_params_init(&params);
  return ml_kem_keypair_ref(&params, ...);
}
int ml_kem_768_ipd_keypair(...) {
  ml_kem_params params;
  ml_kem_768_params_init(&params);
  return ml_kem_keypair_ref(&params, ...);
}

Call-outs:

Point out areas that need special attention or support during the review process. Discuss architecture or design changes.

Testing:

How is this change tested (unit tests, fuzz tests, etc.)? Are there any testing steps to be verified by the reviewer?

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and the ISC license.

@codecov-commenter
Copy link

codecov-commenter commented Aug 13, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 78.32%. Comparing base (79d5d16) to head (9c591df).
Report is 18 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1763      +/-   ##
==========================================
- Coverage   78.44%   78.32%   -0.13%     
==========================================
  Files         580      581       +1     
  Lines       96780    97176     +396     
  Branches    13863    13930      +67     
==========================================
+ Hits        75921    76113     +192     
- Misses      20243    20440     +197     
- Partials      616      623       +7     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@dkostic dkostic marked this pull request as ready for review August 15, 2024 23:25
@dkostic dkostic requested a review from a team as a code owner August 15, 2024 23:25
@dkostic dkostic changed the title [DRAFT] ML-KEM refactor ML-KEM refactor Aug 16, 2024
crypto/ml_kem/ml_kem_ipd_ref_common/params.h Outdated Show resolved Hide resolved
crypto/ml_kem/ml_kem_ipd_ref_common/params.h Outdated Show resolved Hide resolved
crypto/ml_kem/ml_kem_ipd_ref_common/params.h Outdated Show resolved Hide resolved
crypto/ml_kem/ml_kem_ipd_ref_common/params.h Show resolved Hide resolved
crypto/ml_kem/ml_kem_ipd_ref_common/params.h Show resolved Hide resolved
crypto/ml_kem/ml_kem_ipd_ref_common/indcpa.c Show resolved Hide resolved
crypto/ml_kem/ml_kem_ipd_ref_common/indcpa.c Show resolved Hide resolved
crypto/ml_kem/ml_kem_ipd_ref_common/indcpa.c Show resolved Hide resolved
crypto/ml_kem/ml_kem_ipd_ref_common/indcpa.c Outdated Show resolved Hide resolved
crypto/ml_kem/ml_kem_ipd_ref_common/indcpa.c Outdated Show resolved Hide resolved
Copy link

@sgmenda-aws sgmenda-aws left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm with minor changes

Comment on lines +112 to +113
} else {
cbd3(r, buf);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This else is risky because release builds may remove the above assert so this code path may be reached by code with eta1 != 3. Might be worth replacing with else if (params->eta1 == 3) and do a separate else that is a no-op.

Copy link
Contributor

@WillChilds-Klein WillChilds-Klein Aug 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^ or move the assert to that else clause.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't the assert still be a no-op in release builds?

Comment on lines -24 to +28
const uint8_t seed[KYBER_SYMBYTES])
const uint8_t *seed)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't seed always KYBER_SYMBYTES = 32 bytes long, irrespective of parameter set? And we use exactly KYBER_SYMBYTES bytes of seed, so seems worth keeping the constraint.

@@ -157,15 +162,15 @@ static unsigned int rej_uniform(int16_t *r,
**************************************************/
#define GEN_MATRIX_NBLOCKS ((12*KYBER_N/8*(1 << 12)/KYBER_Q + XOF_BLOCKBYTES)/XOF_BLOCKBYTES)
// Not static for benchmarking
void gen_matrix(polyvec *a, const uint8_t seed[KYBER_SYMBYTES], int transposed)
void gen_matrix(ml_kem_params *params, polyvec *a, const uint8_t *seed, int transposed)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above for seed.

uint8_t *c,
const uint8_t *m,
const uint8_t *pk,
const uint8_t *coins)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

poly_getnoise_* assumes coins is KYBER_SYMBYTES, so might be worth keeping the constraint.

#include "params.h"

static void ml_kem_params_init(ml_kem_params *params, size_t k) {
assert((k == 2) || (k == 3) || (k == 4));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

release builds may ignore this assert, leading to unexpected data for the below conditionals.


for(k=0;k<8;k++)
r->vec[i].coeffs[8*j+k] = ((uint32_t)(t[k] & 0x7FF)*KYBER_Q + 1024) >> 11;
assert((params->poly_vec_compressed_bytes == params->k * 352) ||
Copy link
Contributor

@WillChilds-Klein WillChilds-Klein Aug 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this code would be a bit more legible if we did e.g. #define K_MLKEM_1024 352 for the 3 security levels in params.h

Copy link
Contributor

@WillChilds-Klein WillChilds-Klein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code changes look good, no blockers. per sanketh's earlier comments, can we keep parameter-invariant buffers as byte arrays rather than pointers?

NOTE: my review consisted of validating that refactored code remained unchanged and that null dereferences aren't possible given current initialization logic. I assumed the correctness of the existing implementation.

@@ -6,22 +6,25 @@
#include "polyvec.h"

#define gen_matrix KYBER_NAMESPACE(gen_matrix)
void gen_matrix(polyvec *a, const uint8_t seed[KYBER_SYMBYTES], int transposed);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to sanketh's earlier comment -- seed size is invariant over parameters, right? why can't we define this as an array rather than a pointer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, will fix in #1775

@dkostic dkostic merged commit 7b5ba89 into aws:main Aug 23, 2024
106 checks passed
smittals2 added a commit that referenced this pull request Sep 17, 2024
## What's Changed
* Use OPENSSL_STATIC_ASSERT which handles all the platform/compiler/C s…
by @andrewhop in #1791
* ML-KEM refactor by @dkostic in #1763
* ML-KEM-IPD to ML-KEM as defined in FIPS 203 by @dkostic in
#1796
* Add KDA OneStep testing to ACVP by @skmcgrail in
#1792
* Updating erroneous documentation for BIO_get_mem_data and subsequent
usage by @smittals2 in #1752
* No-op impls for several EVP_PKEY_CTX functions by @justsmth in
#1759
* Drop "ipd" suffix from ML-KEM related code by @dkostic in
#1797
* Upstream merge 2024 08 19 by @skmcgrail in
#1781
* ML-KEM move to the FIPS module by @dkostic in
#1802
* Reduce collision probability for variable names by @torben-hansen in
#1804
* Refactor ENGINE API and memory around METHOD structs by @smittals2 in
#1776
* bn: Move x86-64 argument-based dispatching of bn_mul_mont to C. by
@justsmth in #1795
* Check at runtime that the tool is loading the same libcrypto it was
built with by @andrewhop in #1716
* Avoid matching prefixes of a symbol as arm registers by @torben-hansen
in #1807
* Add CI for FreeBSD by @justsmth in
#1787
* Move curve25519 implementations to fips module except spake25519 by
@torben-hansen in #1809
* Add CAST for SP 800-56Cr2 One-Step function by @skmcgrail in
#1803
* Remove custom PKCS7 ASN1 functions, add new structs by
@WillChilds-Klein in #1726
* NASM use default debug format by @justsmth in
#1747
* Add KDF in counter mode ACVP Testing by @skmcgrail in
#1810
* add support for OCSP_request_verify by @samuel40791765 in
#1778
* Fix GitHub/CodeBuild Purge Lambda by @justsmth in
#1808
* KBKDF_ctr_hmac FIPS Service Indicator by @skmcgrail in
#1798
* Update x509 tool to write all output to common BIO which is a file or
stdout by @andrewhop in #1800
* Add ML-KEM to speed.cc, bump AWSLC_API_VERSION to 30 by @andrewhop in
#1817
* Add EVP_PKEY_asn1_* functions by @justsmth in
#1751
* Improve portability of CI integration script by @torben-hansen in
#1815
* Upstream merge 2024 08 23 by @justsmth in
#1799
* Replace ECDSA_METHOD with EC_KEY_METHOD and add the associated API by
@smittals2 in #1785
* Cherrypick "Add some barebones support for DH in EVP" by
@samuel40791765 in #1813
* Add KDA OneStep (SSKDF_digest and SSKDF_hmac) to FIPS indicator by
@skmcgrail in #1793
* Add EVP_Digest one-shot test XOFs by @WillChilds-Klein in
#1820
* Wire-up ACVP Testing for SHA3 Signatures with RSA by @skmcgrail in
#1805
* Make SHA3 (not SHAKE) Approved for EVP_DigestSign/Verify, RSA and
ECDSA. by @nebeid in #1821
* Begin tracking RelWithDebInfo library statistics by @andrewhop in
#1822
* Move EVP ed25519 function table under FIPS module by @torben-hansen in
#1826
* Avoid C11 Atomics on Windows by @justsmth in
#1824
* Improve pre-sandbox setup by @torben-hansen in
#1825
* Add OCSP round trip integration test with minor fixes by
@samuel40791765 in #1811
* Add various PKCS7 getters and setters by @WillChilds-Klein in
#1780
* Run clang-format on pkcs7 code by @WillChilds-Klein in
#1830
* Move KEM API and ML-KEM definitions to FIPS module by @torben-hansen
in #1828
* fix socat integration CI by @samuel40791765 in
#1833
* Retire out-of-module KEM folder by @torben-hansen in
#1832
* Refactor RSA_METHOD and expand API by @smittals2 in
#1790
* Update benchmark documentation in tool/readme.md by @andrewhop in
#1812
* Pre jail unit test by @torben-hansen in
#1835
* Move EVP KEM implementation to in-module and correct OID by
@torben-hansen in #1838
* More minor symbols Ruby depends on by @samuel40791765 in
#1837
* ED25519 Power-on Self Test / CAST / KAT by @skmcgrail in
#1834
* ACVP ML-KEM testing by @skmcgrail in
#1840
* ACVP ECDSA SHA3 Digest Testing by @skmcgrail in
#1819
* ML-KEM Service Indicator for EVP_PKEY_keygen, EVP_PKEY_encapsulate,
EVP_PKEY_decapsulate by @skmcgrail in
#1844
* Add ML-KEM CAST for KeyGen, Encaps, and Decaps by @skmcgrail in
#1846
* ED25519 Service Indicator by @skmcgrail in
#1829
* Update Allowed RSA KeySize Generation to FIPS 186-5 specification by
@skmcgrail in #1823
* Add ED25519 ACVP Testing by @skmcgrail in
#1818
* Make EDDSA/Ed25519 POST lazy initalized by @skmcgrail in
#1848
* add support for PEM Parameters without ASN1 hooks by @samuel40791765
in #1831
* Add OpenVPN tip of main to CI by @smittals2 in
#1843
* Ensure SSE2 is enabled when using optimized assembly for 32-bit x86 by
@graebm in #1841
* Add support for `EVP_PKEY_CTX_ctrl_str` - Step #1 by @justsmth in
#1842
* Added SHA3/SHAKE XOF functionality by @jakemas in
#1839
* Migrated ML-KEM SHA3/SHAKE usage to fipsmodule by @jakemas in
#1851
* AVX-512 support for RSA Signing by @pittma in
#1273
andrewhop pushed a commit that referenced this pull request Oct 11, 2024
### Issues:
CryptoAlg-2724

### Description of changes: 

#### Parameterization of ML-DSA
Previous to this change, ML-DSA was implemented such that the code for a
parameter set was selected by defining the correct C pre-processor flag
(for example, if you want to compile the code for ML-DSA-65 parameter
set you would `#define DILITHIUM_MODE 3`).

The consequence of this was that we had to compile the code three times
for three ML-DSA parameter sets. We do this by adding a C file for each
parameter set where we first define the corresponding `DILITHIUM_MODE`
value and then include all the ML-DSA C files. Besides being an awkward
way to handle multiple parameter sets, this will not work for the FIPS
module where we bundle all C files inside `bcm.c` and compile it as a
single compilation unit.

In this change we refactor ML-DSA by parametrizing all functions that
depend on values that are specific to a parameter set, i.e., that
directly or indirectly depend on the value of `DILITHIUM_MODE`. To do
this, in `params.h` we define a structure that holds those ML-DSA
parameters and functions that initialize a given structure with values
corresponding to a parameter set. This structure can then be passed to
every function that requires it. For example, `polyvecl_add` function
was previously implemented as:

```
void polyvecl_add(polyvecl *w,
                  const polyvecl *u, 
                  const polyvecl *v) {
  unsigned int i;

  for(i = 0; i < L; ++i)
    poly_add(&w->vec[i], &u->vec[i], &v->vec[i]);
}
```
Is now changed to:

```
void polyvecl_add(ml_dsa_params *params,
                  polyvecl *w,
                  const polyvecl *u,
                  const polyvecl *v) {
  unsigned int i;

  for(i = 0; i < params->l; ++i)
    poly_add(&w->vec[i], &u->vec[i], &v->vec[i]);
}

```

Of course, now we have to change all callers of `polyvecl_add` to also
have `ml_dsa_params` as an input argument, and then callers of the
caller, etc. These changes bubble up to the highest level API defined in
sign.h where we now have:

```

int crypto_sign_keypair(ml_dsa_params *params, uint8_t *pk, uint8_t *sk);

int crypto_sign_signature(ml_dsa_params *params,
                          uint8_t *sig, size_t *siglen,
                          const uint8_t *m, size_t mlen,
                          const uint8_t *ctx, size_t ctxlen,
                          const uint8_t *sk);

int crypto_sign(ml_dsa_params *params,
                uint8_t *sm, size_t *smlen,
                const uint8_t *m, size_t mlen,
                const uint8_t *ctx, size_t ctxlen,
                const uint8_t *sk);

int crypto_sign_verify(ml_dsa_params *params,
                       const uint8_t *sig, size_t siglen,
                       const uint8_t *m, size_t mlen,
                       const uint8_t *ctx, size_t ctxlen,
                       const uint8_t *pk);

int crypto_sign_open(ml_dsa_params *params,
                     uint8_t *m, size_t *mlen,
                     const uint8_t *sm, size_t smlen,
                     const uint8_t *ctx, size_t ctxlen,
                     const uint8_t *pk);
```

Then we can easily implement these functions for specific parameter
sets, which can be seen in `sig_dilithium3.c` file. For example:

```
int ml_dsa_65_keypair(uint8_t *public_key /* OUT */,
                       uint8_t *secret_key /* OUT */) {
  ml_dsa_params params;
  ml_dsa_65_params_init(&params);
  return crypto_sign_keypair(&params, public_key, secret_key);
}

int ml_dsa_65_sign(uint8_t *sig               /* OUT */,
                    size_t *sig_len            /* OUT */,
                    const uint8_t *message     /* IN */,
                    size_t message_len         /* IN */,
                    const uint8_t *ctx         /* IN */,
                    size_t ctx_len             /* IN */,
                    const uint8_t *secret_key  /* IN */) {
  ml_dsa_params params;
  ml_dsa_65_params_init(&params);
  return crypto_sign_signature(&params, sig, sig_len, message, message_len,
                                             ctx, ctx_len, secret_key);
}

int ml_dsa_65_verify(const uint8_t *message    /* IN */,
                      size_t message_len        /* IN */,
                      const uint8_t *sig        /* IN */,
                      size_t sig_len            /* IN */,
                      const uint8_t *ctx        /* IN */,
                      size_t ctx_len            /* IN */,
                      const uint8_t *public_key /* IN */) {
  ml_dsa_params params;
  ml_dsa_65_params_init(&params);
  return crypto_sign_verify(&params, sig, sig_len, message, message_len,
                                          ctx, ctx_len, public_key);
}
```
As such, files:
- `dilithium3r3_ref.c`
- `api.h`
- `config.h`

are no longer required, and have been removed.

#### Other Changes
Also modified in this PR are the KAT test framework in
`p_dilithium_test.cc`. This KAT framework has been modified to support
multiple levels of ML-DSA (to be added in a later PR).

### Call-outs:
See similar changes to ML-KEM in #1763

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license and the ISC license.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants