Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

module: validate that strip-types does not insert any code #54141

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

ChALkeR
Copy link
Member

@ChALkeR ChALkeR commented Jul 31, 2024

We can do this while the mode is strip-types only

The wasm binary is not expected to insert any other changes in the sourcecode except for the allowed symbols / patterns
We can use this to increase trust in the code that wasm binary outputs

It is still possible to do unexpected things under this limitation, but it makes anything funny much harder for a supply chain attack

Now does this if internal swc lib is e.g. switched to a codegen mode unexpectedly:
image

Existing tests should pass, this is just a safeguard
Should also add some multi-byte stripping tests

@nodejs-github-bot
Copy link
Collaborator

Review requested:

  • @nodejs/loaders

@nodejs-github-bot nodejs-github-bot added the needs-ci PRs that need a full CI run. label Jul 31, 2024
@ChALkeR ChALkeR force-pushed the chalker/swc-safeguard branch 5 times, most recently from 31545ec to 46fd9dd Compare July 31, 2024 11:58
@ChALkeR ChALkeR marked this pull request as ready for review July 31, 2024 12:15
@ChALkeR ChALkeR force-pushed the chalker/swc-safeguard branch 2 times, most recently from 8e3566e to 4ab7ef5 Compare July 31, 2024 12:22
@marco-ippolito
Copy link
Member

I'm ok with 346469c, n2 can we do that in amaro with a flag?

@marco-ippolito marco-ippolito added the strip-types Issues or PRs related to strip-types support label Jul 31, 2024
@ChALkeR ChALkeR force-pushed the chalker/swc-safeguard branch 3 times, most recently from 294585f to 239e6bb Compare July 31, 2024 12:30
Copy link
Member

@mcollina mcollina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the wrong place to solve this.
This adds non-null overhead for loading every single file to defend against code that we control. Let's fix the reproducible build problem.

I'm ok if we run this check in CI either in the form of a test or an optional check.

@ChALkeR
Copy link
Member Author

ChALkeR commented Jul 31, 2024

This adds non-null overhead for loading every single file

@mcollina not "every single file", but every local typescript file under --experimental-strip-types flag which is processed by wasm, where perf issues were not being addressed yet and are on the roadmap

This does not affect existing setups

Running this in CI only won't give any sufficient gurantees

to defend against code that we control.

We don't

No one here reviewed those 500 MiBs of deps in /amaro repo

No one even noticed those are not being in fact used from that repo but are coming from elsewhere

Copy link
Contributor

@aduh95 aduh95 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, regardless of reproducable builds it seems useful to have some guarentee that V8 will indeed execute users' code and nothing else. It only slow down files with types to strip, not all files IIUC.

nit: Any chance we can avoid the Buffer -> string -> Buffer conversion in e.g.

const code = tsParse(stringify(source));

Maybe we could supply another parameter that contains the original source in addition to the stringified version.

Copy link
Member

@ronag ronag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what this solves? We don't trust the strip code and another layer during runtime to make sure it does? Shouldn't this just be a test that runs in CI prior to release? Can download x number of repositories run the type stripping and test that it does what it should.

@ChALkeR
Copy link
Member Author

ChALkeR commented Jul 31, 2024

@ronag a supply chain attack like the one on ESLint can hide itself and activate itself under specific conditions

Testing on CI does not help validate blackboxes against intentional attacks (only against unintentional mistakes)

Sandboxing + validating in runtime does

@joyeecheung
Copy link
Member

joyeecheung commented Jul 31, 2024

Instead of doing checks at runtime, maybe we can just have a flag that spits out the type-stripped code to a directory (or just to stdout and let users redirect as they like?) That might also be useful for those who try to debug the type stripping, in case there are any bugs.

@ChALkeR
Copy link
Member Author

ChALkeR commented Jul 31, 2024

As for the perf concern: this adds 3% overhead to pure transformSync (not counting file load, execution, etc -- even less with those counted)

I.e. this check is ~33x faster than transformSync even without the buffer optimization mentioned above

(and affects only files transformed from typescript under --experimental-strip-types flag by wasm)

@ChALkeR
Copy link
Member Author

ChALkeR commented Jul 31, 2024

@joyeecheung that seems much more complex (user options/format/output decisions etc) and doesn't solve the problem of a potentally carefully hidden exploit in the supply chain that activates only after specific conditions

We can't expect users to check all code generated at runtime

Especially given that it's immediately executed either way

@ronag
Copy link
Member

ronag commented Jul 31, 2024

We also have to consider that this might make future progress and iteration on this feature more difficult (possibly turning away contributors). For example moving beyond just type stripping.

@ronag
Copy link
Member

ronag commented Jul 31, 2024

How does bun and deno solve this?

@ChALkeR
Copy link
Member Author

ChALkeR commented Jul 31, 2024

@ronag landing this would fix the immediate concern, it doesn't mean that it can't be removed once we figure out {build chain / what gets into the bundle / how do we review it} process

Which should be done by the time something except typestripping is added

Comment on lines +312 to +314
const sourceBuf = BufferFrom(source);
const codeBuf = BufferFrom(code);
if (sourceBuf.length !== codeBuf.length) { return false; }
Copy link
Contributor

@aduh95 aduh95 Jul 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw, have you considered doing a comparison of the JS strings? It seems to me it would be much simpler (and maybe more performant) than comparing buffers, especially given the parameters are already strings. wdyt?

for (let i = 0; i < source.length; i++) {
  const char = StringPrototypeCodePointAt(code, i);
  if (char !== 0x3b && char !== 0x20 && char !== StringPrototypeCodePointAt(source, i)) {
     return false;
  }
}
return true;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aduh95 will recheck!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't use that because the original impl which this rechecks is byte-level based
But it can be done, yes

and maybe more performant

perf results are inconclusive on this even after initial optimization of this for fast paths -- this is faster for multi-byte but slower for regular ascii sources

will try another version shortly

Copy link
Member Author

@ChALkeR ChALkeR Jul 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For context, the code I used was:

  function doesTSStripTypesResultMatchSource2(code, source) {
    if (code.length !== source.length) { return false; }
    for (let i = 0; i < code.length; i++) {
      const a = code.codePointAt(i)
      if (a === source.codePointAt(i)) continue
      if (a !== 0x20 && a !== 0x3b && a !== 0xa0 && a !== 0x2002 && a !== 0xfeff) return false
    }
    return true
  }
  1. The set of chars is different
  2. This also misses a check for now that 0xfeff can come only after a space

I think we can land Buffer-based impl and optimize it later if needed (hopefully not)

Copy link
Member

@mcollina mcollina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

I think we can remove this at a later step.

Copy link

codecov bot commented Jul 31, 2024

Codecov Report

Attention: Patch coverage is 44.00000% with 28 lines in your changes missing coverage. Please review.

Project coverage is 87.06%. Comparing base (b4fd1fd) to head (1e8101d).
Report is 4 commits behind head on main.

Files Patch % Lines
lib/internal/modules/helpers.js 44.00% 28 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main   #54141   +/-   ##
=======================================
  Coverage   87.06%   87.06%           
=======================================
  Files         643      643           
  Lines      181576   181625   +49     
  Branches    34894    34895    +1     
=======================================
+ Hits       158088   158140   +52     
- Misses      16759    16777   +18     
+ Partials     6729     6708   -21     
Files Coverage Δ
lib/internal/modules/helpers.js 90.95% <44.00%> (-6.76%) ⬇️

... and 33 files with indirect coverage changes

@mcollina mcollina added the request-ci Add this label to start a Jenkins CI on a PR. label Jul 31, 2024
Copy link
Member

@marco-ippolito marco-ippolito left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed in the tsc meeting, I believe this check belongs inside amaro behind a configuration flag that should be enabled by node

@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Jul 31, 2024
@nodejs-github-bot
Copy link
Collaborator

@ChALkeR
Copy link
Member Author

ChALkeR commented Jul 31, 2024

Yes, that seems to solve the concerns as well!
Will file a PR shortly today

@ChALkeR
Copy link
Member Author

ChALkeR commented Jul 31, 2024

One thing that is degraded from moving into amaro directly is that it' now not possible to use safe methods that were cached prior to user code being loaded

Which means that there will be an easy way for the users to circumvent this check when it's moved into amaro itself

... which seems to be an acceptable risk / out of scope

@RafaelGSS
Copy link
Member

Which means that there will be an easy way for the users to circumvent this check when it's moved into amaro itself
... which seems to be an acceptable risk / out of scope

Possibly, yes. But, according to our threat model, this is ok.

@ChALkeR
Copy link
Member Author

ChALkeR commented Jul 31, 2024

Yeah, just mentioning for visibility/transparency
Moving into the lib is still ok

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs-ci PRs that need a full CI run. strip-types Issues or PRs related to strip-types support
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants