Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bad transform of fake this param #71

Closed
patricklx opened this issue Feb 29, 2024 · 8 comments · Fixed by #74 or #83 · May be fixed by #72 or #80
Closed

bad transform of fake this param #71

patricklx opened this issue Feb 29, 2024 · 8 comments · Fixed by #74 or #83 · May be fixed by #72 or #80

Comments

@patricklx
Copy link

patricklx commented Feb 29, 2024

const { Preprocessor } = require('content-tag');

const p = new Preprocessor();

const res = p.process(`
f = function(this: Context, ...args) {
    function t(this: Context, ...args) {};
    <template></template>
}`);

console.log(res);

results to

import { template } from "@ember/template-compiler";
f = function(this: Context, ...args) {
   function t(this1: Context, ...args1) {}
   ;
   template(``, {
       eval () {
           return eval(arguments[0]);
       }
   });
};

removing the template will make it work correctly

chancancode added a commit to chancancode/content-tag that referenced this issue Mar 1, 2024
Ultimately the goal is to fix embroider-build#71 (which this commit does *not* do)

While working on that I decided to refactor the existing import
related code and add some commentary to help others understand how
it works. This just moves code around without changing behavior.
@chancancode
Copy link
Contributor

(I looked into this with @zvkemp but the following conclusions are mine)

Refer to the commentary in #72 for a primer on how this works.

TL;DR My bottomline conclusion is that we are incorrectly using the hygiene code in SWC and need to do something else, probably roll our own much more narrower/targeted version

  1. Bug in swc_ecma_transforms_base::hygiene?

    Yes, I think you can say, the most direct cause for this issue is that there is a bug in swc_ecma_transforms_base::hygiene, where it incorrectly considers the TypeScript this type annotation as a regular, renameable identifier. Perhaps we can even file that bug and get it fixed. We'd then have to rebase our fork and ingest it.

  2. They don't run the hygiene pass like this

    I think part of the reason why it wasn't caught upstream by now, is that they don't run the hygiene pass this early. It seems like it generally does not expect TypeScript to be present at this stage, as it's typically used as a minifier, or at the equivalent time in dev builds but in non-mangling mode.

  3. It's overly broad for what we need

    When you run the hygiene pass on the code, it flattens out and renames all identifiers it consider to be a possible collision, causing other variables we didn't touch nor care about to be renamed. This results in a slightly awkward development experience. I have noticed things getting renamed foo1, etc, when debugging in the Chrome console, I just didn't make the connection who was doing that. It would be good to avoid that if possible, and I think it's quite unexpected when content-tag touches the other parts of the code it has no business changing.

  4. We are using it wrong and we are not getting correct results in cases we actually care about

    Consider this:

    import { template } from "@ember/template-compilation";
    
    export default function(template) {
      console.log(template); // the local variable
      return <template>hi</template>;
    };
    
    console.log(template); // the import

    Here, you would expect that the hygiene code to do something like this:

    import { template as template1 } from "@ember/template-compilation";
    
    export default function(template) {
      console.log(template); // the local variable
      return template1(`X`, { eval() { return eval(arguments[0])} });
    };
    
    console.log(template1); // the import

    Or perhaps:

    import { template } from "@ember/template-compilation";
    
    export default function(template1) {
      console.log(template1); // the local variable
      return template(`X`, { eval() { return eval(arguments[0])} });
    };
    
    console.log(template); // the import

    But it actually does neither. Instead, you get the following incorrect code:

    import { template } from "@ember/template-compilation";
    
    export default function(template) {
      console.log(template); // the local variable
      return template(`X`, { eval() { return eval(arguments[0])} });
    };
    
    console.log(template); // the import

    These are exactly the kind of cases that is hard to get right, and if the SWC hygiene code does this for us it would perhaps be worth the trouble with the other renames. But it just doesn't, and it's not really a bug, it's just that we are using it differently (incorrectly) than what it's intended for.

    The way the SWC macro expansion hygiene system works is that it expects there to be user code (input source code) and synthesized code, and the "hygiene" in this case is that synthesized code exists in a parallel namespace universe that is doesn't interfere with user code and vice versa (i.e. hygienic macros). Code introduced by the macros won't accidentally shadow user code, and likewise user code won't shadow stuff form the synthesized code, so that it works reliably regardless the insertion position.

    Specifically – the top_level_mark exists for precisely this reason. We are expected run the "clean" AST through the resolver before inserting any synthesized code. The resolver would tag all existing identifiers with a mark/syntax context that is derived from the top_level_mark. Any other code inserted thereafter and aren't derived from this mark are considered synthesized code. (By the way, we currently violate this assumption by calling resolve too late, which I fixed in Refactor import insertion code into its own module #72.)

    This is fine in the case when there is no preexisting import, and we use unique_identifier! to generate a hygienic identifier for the imported template function. However, in the case where there is a preexisting import, we are reusing the existing identifier, which is considered user code (descendant of the top_level_mark), violating the hygiene assumptions.

    Why does it matter? One keep reason for the distinction between user code and synthesized code is eval(). When there is an eval() statement, SWC suppress the renaming for that scope and any of its parents. The logic is that the eval statement may need to see any possible upvars in their original name. However, SWC is free to (and will) rename any synthesized identifiers, because that is by definition non-breaking for the user eval code. (Check out feat(es/renamer): Rename synthesized identifiers even on eval swc-project/swc#6818, which is also when top_level_mark was introduced.)

    So that explains what is going on with the example I gave. We basically put the hygienic rename code in an impossible situation. Technically the example I gave isn't actually unsolvable, but you can easily construct ones that are, and the bottomline is that we violated the fundamental assumptions of their algorithm.

So, my conclusion is that I think the built-in hygiene stuff from SWC is a poor fit for what we want to do, and it both does too much and too little. I think we'll probably have to write our own visitors, loosely based on what rename/hygiene does, but also tweak it to fit our purpose. We can probably reuse some parts of those code, maybe the rename::Analyzer/rename::Scope stuff, but I'll have to play with it to know.

We tried the obvious/naive thing of just implementing a Renamer (basically what hygiene does), but the assumptions are baked in too deep and that didn't work out. The code really expect you to actually rename the identifiers, and if you try to return the same one in cases where it expects you to rename them, you'll end up being stuck in an infinite loop.

@patricklx
Copy link
Author

Wouldn't it be enough to rename the identifiers that we introduced for template?
Since we also control the exact swc version, we can make assumptions to what we can rename template.
We are already visiting nodes, so we could also visit identifiers and check if there is a collision.
If there is, we rename our identifiers. I guess it's only template?
Probably not template1, but maybe _template_

@chancancode
Copy link
Contributor

That's "more or less" what I meant by "write our own visitors, loosely based on..."

Yep, it's just all of those things, but done correctly, devil is in the details 😄

Yep, template is the only identifier we care about.

We just have to make sure we visit all the right types of identifiers to check for all the right types of scopes – blocks, functions, take into account var hoisting, eval, etc.

In the case there is no existing import, then yes we just have to pick a name that doesn't collide with anything in the places where there is a template tag and the replacement name has to be collision free in all of the intermediate scopes as well.

In the case where there is an existing import, it gets complicated real fast (but as I said, the existing SWC code also doesn't handle this correctly), especially when there are evals (which we basically guarantee there is going to be), because renaming the existing import name can break user code that relies on the template (or whatever they named it) being there in that exact name. So in the case where a <template> tag is trapped in a spot where the imported name is shadowed, we are kind of stuck, the easiest thing to do is probably to introduce a top-level alias (const template1 = tempalteOrOriginalSometimesShadowedName;).

95% of the time it's basically doing a bunch of deep analysis to discover we don't have to do anything. 50% of the challenge is to do that analysis correctly, and the other 50% of the challenge is to make sure we handle the edge cases where we do have to do something.

All said and done, we can maybe reuse 30% of the SWC code, but it's still helpful to have that code available as a guide, informing us what kind of nodes we need to care about for the purpose of the analysis, etc. All the potentially useful parts are already exported, so we don't really need to make changes in our fork. It's best to keep the code outside of the SWC tree as much as possible to make it easier to merge in upstream changes. Plus you don't really want to be messing around with changing something as core as this.

If we can find overlapping availability considering timezone differences, would also be happy to pair on it.

@patricklx
Copy link
Author

You mean there could be an eval which expects the template identifier to be available which is introduced by content-tag? Or from an existing import?
I think we should not touch any existing identifiers or imports.
If there is an existing import, we should just leave it and we can add our own import with different local specifier?

@patricklx
Copy link
Author

patricklx commented Mar 2, 2024

Mayb would be enough to rebase swc on latest:

swc-project/swc#236 (comment)

Edit, looks like content-tag is already using v1.3.92

@ef4
Copy link
Collaborator

ef4 commented Mar 2, 2024 via email

@chancancode
Copy link
Contributor

@SergeAstapov
Copy link

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment