Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Packaging up js-quic for Linux, Windows, MacOS #7

Closed
11 tasks done
CMCDragonkai opened this issue Apr 12, 2023 · 26 comments
Closed
11 tasks done

Packaging up js-quic for Linux, Windows, MacOS #7

CMCDragonkai opened this issue Apr 12, 2023 · 26 comments
Assignees
Labels
development Standard development r&d:polykey:core activity 4 End to End Networking behind Consumer NAT Devices

Comments

@CMCDragonkai
Copy link
Member

CMCDragonkai commented Apr 12, 2023

Specification

As js-quic is a native package, it has much more complicated packaging procedure making it similar to js-db. However this is different because it uses Rust instead of C++, therefore the entire toolchain is different from js-db.

Rather than tsc, node, node-gyp, napi-macros and node-gyp-build. This is just tsc, node and napi-rs.

What needs to be done is:

  1. Remove initial experimental scaffolding that was using neon. For example the neon-cli package is no longer needed and cargo-cp-artifact is also not required.
  2. Remove initial experimental scaffolding that was using napi-macros and node-addon-api and node-gyp. All of these packages and related configuration is no longer needed. This includes removing node-gyp-build too.
  3. We are only using napi-rs and the related rust and cargo tools are brought in from the shell.nix.
  4. Unrelated and useless packages should be removed like threads.
  5. The npm run napi-build and npm run napi-build-prod should be integrated into the npm run prebuild script located in src/scripts/prebuild.js.
  6. The entire prebuild.js was ported from js-db which uses node-gyp. This does not use node-gyp at all, and instead needs to use napi build command instead. All relevant flags and options should be adapted to napi build which comes from the @napi-rs/cli package.
  7. The purpose of prebuild.js is be able to build binaries and put them into a prebuild directory. This directory is packaged as part of the npm distribution, and so binary compiled artifacts are "prebuilt" before they are downloaded by downstream users. We should maintain this structure.
  8. There is some work on native packages that spread their binary artifacts into different NPM pacakges, and for the npm to install the ones that are required, thus reducing the final closure size. Haven't experimented with this yet, but I see some other native packages making use of this pattern.
  9. Currently the napi build ends up putting the artifact into quic.linux-x64-gnu.node on the project root. This should be put into the equivalent prebuild directories that js-db does.
  10. If we do use the same prebuild directories... we might recover the ability to use node-gyp-build to look it up. But right now all that logic is written into src/native/quiche.ts. Alternatively we give up on node-gyp-build entirely and just use our own script to find the right bindings.

Note that on MacOS and Windows we are using homebrew and chocolatey respectively. This means the rust toolchain would come from those 2 areas instead of Nix. No way to use Nix on MacOS just yet, so it's just a good idea to ensure that we are in fact building with the same Rust version.

Additional context

Tasks

  • 1. Remove initial experimental scaffolding that was using neon. For example the neon-cli package is no longer needed and cargo-cp-artifact is also not required.
  • 2. Remove initial experimental scaffolding that was using napi-macros and node-addon-api and node-gyp. All of these packages and related configuration is no longer needed. This includes removing node-gyp-build too.
  • 3. We are only using napi-rs and the related rust and cargo tools are brought in from the shell.nix.
  • 4. Unrelated and useless packages should be removed like threads.
  • 6. The npm run napi-build and npm run napi-build-prod should be integrated into the npm run prebuild script located in src/scripts/prebuild.js.
  • 7. The entire prebuild.js was ported from js-db which uses node-gyp. This does not use node-gyp at all, and instead needs to use napi build command instead. All relevant flags and options should be adapted to napi build which comes from the @napi-rs/cli package.
  • 8. The purpose of prebuild.js is be able to build binaries and put them into a prebuild directory. This directory is packaged as part of the npm distribution, and so binary compiled artifacts are "prebuilt" before they are downloaded by downstream users. We should maintain this structure.
  • 9. There is some work on native packages that spread their binary artifacts into different NPM pacakges, and for the npm to install the ones that are required, thus reducing the final closure size. Haven't experimented with this yet, but I see some other native packages making use of this pattern.
  • 10. Currently the napi build ends up putting the artifact into quic.linux-x64-gnu.node on the project root. This should be put into the equivalent prebuild directories that js-db does.
  • 11. If we do use the same prebuild directories... we might recover the ability to use node-gyp-build to look it up. But right now all that logic is written into src/native/quiche.ts. Alternatively we give up on node-gyp-build entirely and just use our own script to find the right bindings.
  • Test on CI/CD
@CMCDragonkai CMCDragonkai added the development Standard development label Apr 12, 2023
@CMCDragonkai
Copy link
Member Author

Nothing on mobile just yet... but that will be important in the future. Mobile usage is complicated by the fact that the same code will need to compile to target the mobile platforms, and the bindings won't be napi-rs since it's not an Node API, instead native modules like in React would need to be used.

@CMCDragonkai
Copy link
Member Author

Branch and tag protections have been applied to this repo now.

image

image

@CMCDragonkai
Copy link
Member Author

Will temporarily rebase and squash the master branch later in order to clean up the history too.

@CMCDragonkai
Copy link
Member Author

@tegefaulkes to track this.

@CMCDragonkai
Copy link
Member Author

The codesee maps is integrated into this https://app.codesee.io/maps/c108c990-f46b-11ed-ad69-ede85191d1b1.

Gitlab pull mirroring isn't working, so this requires manual pushes atm.

git remote add cicd [email protected]:MatrixAI/open-source/js-quic.git
git push cicd
git push cicd <branch>

@CMCDragonkai
Copy link
Member Author

History has been squashed... and reset. Both staging and master is now on the same commits.

@tegefaulkes can you try to rebuild/re-run any tests to see I didn't screw it up. Just clone the repo to a new directory before trying anything, then if it is all good you can delete both repos and just clone from the new repo.

@CMCDragonkai
Copy link
Member Author

Currently CI/CD is failing... need to investigate why. I haven't reviewed any of the tests or whether the code is linted or not, and I don't know if mac/windows build scripts will work. These will need to be debugged on our local systems in office.

@tegefaulkes
Copy link
Contributor

just ran the tests, I'm getting the following.

[nix-shell:~/matixWorkspace/gitRepos/js-quic]$ npm run test

> @matrixai/[email protected] test
> jest

Determining test suites to run...
GLOBAL SETUP
 PASS  tests/utils.test.ts (6.647 s)
  utils
     detect IPv4 mapped IPv6 addresses (24 ms)
     detect IPv4 mapped IPv6 Dec addresses (9 ms)
     detect IPv4 mapped IPv6 Hex addresses (12 ms)
     to IPv4 mapped IPv6 addresses Dec (1 ms)
     to IPv4 mapped IPv6 addresses Hex (3 ms)
     from IPv4 mapped IPv6 addresses (3 ms)
     to canonical IP address (4 ms)
     resolves zero IP to local IP (4 ms)

 FAIL  tests/QUICSocket.test.ts
   Test suite failed to run

    Failed requiring possible native bindings: quic-linux-x64.node,@matrixai/quic-linux-x64

      73 |     }
      74 |   }
    > 75 |   throw new Error(
         |         ^
      76 |     `Failed requiring possible native bindings: ${prebuildTargets.concat(npmTargets)}`
      77 |   );
      78 | }

      at requireBinding (src/native/quiche.ts:75:9)
      at Object.<anonymous> (src/native/quiche.ts:117:25)
      at Object.<anonymous> (src/native/index.ts:1:1)

 FAIL  tests/QUICClient.test.ts
   Test suite failed to run

    Failed requiring possible native bindings: quic-linux-x64.node,@matrixai/quic-linux-x64

      73 |     }
      74 |   }
    > 75 |   throw new Error(
         |         ^
      76 |     `Failed requiring possible native bindings: ${prebuildTargets.concat(npmTargets)}`
      77 |   );
      78 | }

      at requireBinding (src/native/quiche.ts:75:9)
      at Object.<anonymous> (src/native/quiche.ts:117:25)
      at Object.<anonymous> (src/native/index.ts:1:1)

 FAIL  tests/QUICStream.test.ts
   Test suite failed to run

    Failed requiring possible native bindings: quic-linux-x64.node,@matrixai/quic-linux-x64

      73 |     }
      74 |   }
    > 75 |   throw new Error(
         |         ^
      76 |     `Failed requiring possible native bindings: ${prebuildTargets.concat(npmTargets)}`
      77 |   );
      78 | }

      at requireBinding (src/native/quiche.ts:75:9)
      at Object.<anonymous> (src/native/quiche.ts:117:25)
      at Object.<anonymous> (src/native/index.ts:1:1)

 FAIL  tests/concurrency.test.ts
   Test suite failed to run

    Failed requiring possible native bindings: quic-linux-x64.node,@matrixai/quic-linux-x64

      73 |     }
      74 |   }
    > 75 |   throw new Error(
         |         ^
      76 |     `Failed requiring possible native bindings: ${prebuildTargets.concat(npmTargets)}`
      77 |   );
      78 | }

      at requireBinding (src/native/quiche.ts:75:9)
      at Object.<anonymous> (src/native/quiche.ts:117:25)
      at Object.<anonymous> (src/native/index.ts:1:1)

Test Suites: 4 failed, 1 passed, 5 total
Tests:       8 passed, 8 total
Snapshots:   0 total
Time:        7.267 s, estimated 35 s
Ran all test suites.
GLOBAL TEARDOWN

@CMCDragonkai
Copy link
Member Author

CMCDragonkai commented May 17, 2023

That's cause you need to build it now npm run prebuild. We don't use the one that is being put into the project root anymore.

@CMCDragonkai
Copy link
Member Author

Some of your stream tests are really slow btw. We need to review all tests and make them pass.

I've just run them too and I get:

[nix-shell:~/Projects/js-quic]$ npm run test

> @matrixai/[email protected] test
> jest

Determining test suites to run...
GLOBAL SETUP
ERROR:QUICClient Test.QUICClient.QUICConnection afa13aa3:Connection timed out
ERROR:QUICClient Test.QUICClient:ErrorQUICConnectionTimeout
ERROR:QUICClient Test.QUICClient.QUICConnection 9257e992:recv error TlsFail
ERROR:QUICClient Test.QUICClient:ErrorQUICConnectionTLSFailure
ERROR:QUICClient Test.QUICClient.QUICConnection b9f45a27:recv error TlsFail
ERROR:QUICClient Test.QUICClient:ErrorQUICConnectionTLSFailure
ERROR:QUICClient Test.QUICClient.QUICConnection 191abda0:recv error TlsFail
ERROR:QUICClient Test.QUICClient:ErrorQUICConnectionTLSFailure
ERROR:QUICClient Test.QUICClient.QUICConnection 44d016bf:recv error TlsFail
ERROR:QUICClient Test.QUICClient:ErrorQUICConnectionTLSFailure
ERROR:QUICClient Test.QUICClient.QUICConnection ae8b5bb4:recv error TlsFail
ERROR:QUICClient Test.QUICClient:ErrorQUICConnectionTLSFailure
ERROR:QUICClient Test.QUICClient.QUICConnection 18af7ff4:recv error TlsFail
ERROR:QUICClient Test.QUICClient:ErrorQUICConnectionTLSFailure
ERROR:QUICClient Test.QUICClient.QUICConnection 0831c291:Connection timed out
ERROR:QUICClient Test.QUICServer.QUICConnection 71583001-0831c291:Connection timed out
ERROR:QUICClient Test.QUICClient.QUICConnection 9c27faeb:Connection timed out
ERROR:QUICClient Test.QUICServer.QUICConnection 7bae4ed5-9c27faeb:Connection timed out
ERROR:QUICClient Test.QUICClient.QUICConnection c9ab4e7e:Connection timed out
ERROR:QUICClient Test.QUICClient:ErrorQUICConnectionTimeout
 PASS  tests/QUICClient.test.ts (25.664 s)
  QUICClient
    ✓ times out when there is no server (3042 ms)
    ✎ todo client times out after connection stops responding
    ✎ todo server times out after connection stops responding
    ✎ todo server handles socket error
    ✎ todo client handles socket error
    dual stack client
      ✓ to ipv4 server succeeds (with seed=1978936421) (429 ms)
      ✓ to ipv6 server succeeds (with seed=1161311110) (385 ms)
      ✓ to dual stack server succeeds (with seed=-114001202) (333 ms)
    TLS rotation
      ✓ existing connections config is unchanged and still function (with seed=-302456834) (331 ms)
      ✓ new connections use new config (with seed=-815516438) (584 ms)
    graceful tls handshake
      ✓ server verification succeeds (with seed=565140315) (324 ms)
      ✓ client and server verification succeeds (with seed=1464301031) (421 ms)
      ✓ graceful failure verifying server (with seed=1245969200) (6061 ms)
      ✓ graceful failure verifying client and server (with seed=681974782) (3078 ms)
      ○ skipped client verification succeeds (with seed=731304151)
      ○ skipped graceful failure verifying client (with seed=850795266)
    UDP nat punching
      ✓ server can send init packets (2002 ms)
      ✓ init ends when connection establishes (1038 ms)
      ✓ init returns with existing connections (36 ms)
    handles random packets
      ✓ client handles random noise from server (with seed=1188545511) (119 ms)
      ✓ client handles random noise from external (with seed=-887653356) (108 ms)
      ✓ server handles random noise from client (with seed=-1570099967) (82 ms)
      ✓ server handles random noise from external (with seed=1386981673) (119 ms)
    keepalive
      ✓ connection can time out on client (210 ms)
      ✓ connection can time out on server (210 ms)
      ✓ keep alive prevents timeout on client (336 ms)
      ✓ keep alive prevents timeout on server (340 ms)
      ✓ client keep alive prevents timeout on server (339 ms)
      ✓ Keep alive does not prevent connection timeout (3000 ms)

 PASS  tests/QUICSocket.test.ts
  QUICSocket
    ✓ enabling ipv6 only prevents binding to ipv4 hosts (13 ms)
    ✓ disabling ipv6 only does not prevent binding to ipv6 hosts (1 ms)
    ✓ ipv4 wildcard to ipv4 succeeds (1 ms)
    ✓ ipv6 wildcard to ipv6 succeeds (1 ms)
    ✓ socket should throw if stopped with active connections (17 ms)
    ✓ socket should stop when forced with active connections (1 ms)
    ipv4
      ✓ type will be `ipv4` (2 ms)
      ✓ to ipv4 succeeds (3 ms)
      ✓ to ipv6 fails (11 ms)
      ✓ to dual stack succeeds (1 ms)
    ipv6
      ✓ type will be `ipv6` (1 ms)
      ✓ to ipv6 succeeds (1 ms)
      ✓ to ipv4 fails (2 ms)
      ✓ to dual stack succeeds (1 ms)
    ipv6 only when using `::`
      ✓ type will be `ipv6` (1 ms)
      ✓ to ipv4 fails (1 ms)
      ✓ to ipv6 succeeds (1 ms)
      ✓ to dual stack succeds (1 ms)
    dual stack
      ✓ type will be `ipv4&ipv6` (1 ms)
      ✓ to ipv4 succeeds (1 ms)
      ✓ to ipv6 succeeds (1 ms)
      ✓ to dual stack succeeds (2 ms)
    ipv4 mapped ipv6 - dotted decimal variant
      ✓ type will be `ipv4`
      ✓ to ipv4 fails (1 ms)
      ✓ to ipv6 fails (1 ms)
      ✓ to ipv4 mapped ipv6 succeeds (2 ms)
    ipv4 mapped ipv6 - hex variant
      ✓ type will be `ipv4`
      ✓ to ipv4 fails (1 ms)
      ✓ to ipv6 fails (1 ms)
      ✓ to ipv4 mapped ipv6 succeeds (1 ms)

 PASS  tests/QUICStream.test.ts (29.541 s)
  QUICStream
    ✓ should create streams (with seed=-1671438937) (16000 ms)
    ✓ destroying stream should clean up on both ends while streams are used (with seed=899455710) (4186 ms)
    ✓ should send data over stream (with seed=-246057904) (1243 ms)
    ✓ should propagate errors over stream for writable (with seed=1029918994) (2079 ms)
    ✓ should clean up streams when connection ends (with seed=-2141210820) (4527 ms)
    ✓ streams should contain metadata (with seed=-874283823) (57 ms)
    ✓ streams can be cancelled (with seed=-420901957) (852 ms)
    ○ skipped should propagate errors over stream for readable (with seed=1406844894)

 PASS  tests/concurrency.test.ts (11.345 s)
  Concurrency tests
    ✓ Multiple clients connecting to a server (with seed=-1095171874) (3469 ms)
    ✓ Multiple clients sharing a socket (with seed=1573595033) (6069 ms)
    ✓ Multiple clients sharing a socket with a server (with seed=348828160) (1299 ms)

 PASS  tests/utils.test.ts
  utils
    ✓ detect IPv4 mapped IPv6 addresses (4 ms)
    ✓ detect IPv4 mapped IPv6 Dec addresses (1 ms)
    ✓ detect IPv4 mapped IPv6 Hex addresses (2 ms)
    ✓ to IPv4 mapped IPv6 addresses Dec
    ✓ to IPv4 mapped IPv6 addresses Hex (2 ms)
    ✓ from IPv4 mapped IPv6 addresses (4 ms)
    ✓ to canonical IP address (2 ms)
    ✓ resolves zero IP to local IP (3 ms)

Test Suites: 5 passed, 5 total
Tests:       3 skipped, 4 todo, 71 passed, 78 total
Snapshots:   0 total
Time:        67.265 s
Ran all test suites.
GLOBAL TEARDOWN

@CMCDragonkai
Copy link
Member Author

One of the problems with using brew is that it doesn't always have exact versions. In the case of rust we only have just stable. Long term solution is to move to our own CI/CD system using our own machines with nix on them including macos. Because we cannot afford time to setup nix on gitlab's macos machines.

@CMCDragonkai
Copy link
Member Author

Looks like I also need:

  thread 'main' panicked at 'Unable to find libclang: "couldn't find any valid shared libraries matching: ['clang.dll', 'libclang.dll'], set the `LIBCLANG_PATH` environment variable to a path where one of these files can be found (invalid: [])"', C:\Users\gitlab_runner\.cargo\registry\src\github.com-1ecc6299db9ec823\bindgen-0.60.1\src/lib.rs:2172:31

For the windows compilation. Even though it is not necessary in Linux. Bringing that into Chocolatey requires llvm.

Furthermore the universal architecture isn't being recognised, I'm updating the napi to see if that helps.

@CMCDragonkai
Copy link
Member Author

For windows I'm now hitting this problem: cloudflare/boring#121. Not sure how to proceed here for now. It appears to work on their Github actions, but they are using an older llvm. I may need to try version 11. Also using RUST_BACKTRACE=1 right now to see if it provides more clues.

@CMCDragonkai
Copy link
Member Author

For macos, it appears that we actually need to build separately the arm64 and x64, then use the universal subcommand to combine them together.

I think this can be done by running the prebuild script twice... and using universal to combine them.

@CMCDragonkai
Copy link
Member Author

Regarding macos, I'm testing on the local system now.

It appears that arm64 macos requires an additional thing to be able to cross compile for x86_64.

A message pops up saying:

error[E0463]: can't find crate for `core`
  |
  = note: the `x86_64-apple-darwin` target may not be installed
  = help: consider downloading the target with `rustup target add x86_64-apple-darwin`

This might also be needed on the macos machines on gitlab too. The need to add additional targets.

@CMCDragonkai
Copy link
Member Author

Ok it appears that the command napi universal is just ending up doing:

spawnSync('lipo', ['-create', '-output', outPath, ...srcPaths])

This command is part of xcode.

I could skip using napi universal since it's specific to a particular structure, and instead just call this directly to combine the binaries.

@CMCDragonkai
Copy link
Member Author

CMCDragonkai commented May 17, 2023

Actually this lipo command is specific to macos, and I realise we don't really need it, integrating it makes it complicated either in prebuild.js or in the .gitlab-ci.yml. We can just distribute 4 packages:

  "optionalDependencies": {
    "@matrixai/quic-linux-x64": "0.0.1",
    "@matrixai/quic-win32-x64": "0.0.1",
    "@matrixai/quic-darwin-x64": "0.0.1",
    "@matrixai/quic-darwin-arm64": "0.0.1"
  },

At least it will end up being quite specific. So on arm64, load the arm64 version, on the x64, load the x64 version.

@CMCDragonkai
Copy link
Member Author

Seems like brew update is necessary in case the installed brew has a too old index...

Right now specific rust versions is not necessary, but cmake is still necessary.

@CMCDragonkai
Copy link
Member Author

We should have a working linux and macos build.

Windows will depend on cloudflare/boring#121

@CMCDragonkai
Copy link
Member Author

There's an existing rust installation already on the gitlab macos.

warning: it looks like you have an existing installation of Rust at:
warning: /Users/gitlab/.asdf/shims
warning: It is recommended that rustup be the primary Rust installation.
warning: Otherwise you may have confusion unless you are careful with your PATH
warning: If you are sure that you want both rustup and your already installed Rust
warning: then please reply `y' or `yes' or set RUSTUP_INIT_SKIP_PATH_CHECK to yes
warning: or pass `-y' to ignore all ignorable checks.
error: cannot install while Rust is installed
warning: continuing (because the -y flag is set and the error is ignorable)

If we use rustup during installation, does the installed rust override it?

@CMCDragonkai
Copy link
Member Author

There's a script that rustup-init leaves:

#!/bin/sh
# rustup shell setup
# affix colons on either side of $PATH to simplify matching
case ":${PATH}:" in
    *:"$HOME/.cargo/bin":*)
        ;;
    *)
        # Prepending path in case a system-installed rustc needs to be overridden
        export PATH="$HOME/.cargo/bin:$PATH"
        ;;
esac

But it seems to be appending the path instead of prepending the path.

@CMCDragonkai
Copy link
Member Author

CMCDragonkai commented May 17, 2023

Apparently the MacOS will only be for premium: https://gitlab.com/gitlab-com/runner-saas-macos-access-requests/-/issues/233#note_1394334519

Will probably have to upgrade soon enough.

We may need to switch up the tags to saas-macos-medium-m1 and use macos-12-xcode-14 by 31st May.

@CMCDragonkai
Copy link
Member Author

MacOS builds are succeeding:

image

@tegefaulkes tests are failing though, some tests take WAY too long, you need to split it up or optimise it. See pipeline test timeouts: https://gitlab.com/MatrixAI/open-source/js-quic/-/pipelines on macos job and occasionally on linux job too.

Also you probably want to clean up your logging messages too, it's a bit ugly.

@CMCDragonkai
Copy link
Member Author

I'm aware that rust also is capable of full cross compilation directly from Linux. I should explore this later, because that could simplify things in the CI/CD system.

@CMCDragonkai
Copy link
Member Author

If this could work... it might be better to move all native packages to using rust instead of the old node-gyp system which is overly complicated.

@CMCDragonkai
Copy link
Member Author

The CI/CD is all tested. The windows build is no go for now. Not important, will track that for a later issue.

The only thing left is for @tegefaulkes to clean up the tests, and then test out the version scripts and the prepublish scripts. We'll leave that until the tests are done and 0.0.1 is being published. Will close this issue for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
development Standard development r&d:polykey:core activity 4 End to End Networking behind Consumer NAT Devices
Development

No branches or pull requests

2 participants