
New deriving that generates O(n) code for PartialOrd, Ord, etc. #15503

Merged (3 commits) on Jul 11, 2014

Conversation

@pnkfelix (Member) commented Jul 7, 2014

Instead of generating a separate case (albeit trivial) for each of the N*N cases when comparing two instances of an enum with N variants, this `deriving` uses the strategy outlined here: #15375 (comment)

In particular, it generates code that looks more like this:

```rust
    match (this, that, ...) {
      (Variant1, Variant1, Variant1) => ... // delegate Matching on Variant1
      (Variant2, Variant2, Variant2) => ... // delegate Matching on Variant2
      ...
      _ => {
        let index_tup = {
          let idx_this = match this { Variant1 => 0u, Variant2 => 1u, ... };
          let idx_that = match that { Variant1 => 0u, Variant2 => 1u, ... };
          ...
          (idx_this, idx_that, ...)
        };
        ... // delegate to catch-all; it can inspect `index_tup` for its logic
      }
    }
```
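
For concreteness, here is a hand-written sketch in modern Rust of that shape, specialized to `partial_cmp` with two self arguments. This is not the exact code the deriving emits; the `Shape` enum and `variant_index` helper are illustrative names only.

```rust
use std::cmp::Ordering;

enum Shape {
    Circle(f64),
    Square(f64),
    Point,
}

// Stand-in for the emitted "map a value to its variant index" match.
fn variant_index(s: &Shape) -> usize {
    match *s {
        Shape::Circle(..) => 0,
        Shape::Square(..) => 1,
        Shape::Point => 2,
    }
}

// Shape of the generated comparison: one arm per variant plus one catch-all,
// rather than one arm per pair of variants.
fn partial_cmp_shape(this: &Shape, that: &Shape) -> Option<Ordering> {
    match (this, that) {
        // Matching variants: delegate to the fields.
        (Shape::Circle(a), Shape::Circle(b)) => a.partial_cmp(b),
        (Shape::Square(a), Shape::Square(b)) => a.partial_cmp(b),
        (Shape::Point, Shape::Point) => Some(Ordering::Equal),
        // Non-matching variants: compare the variant indices.
        _ => {
            let index_tup = (variant_index(this), variant_index(that));
            index_tup.0.partial_cmp(&index_tup.1)
        }
    }
}

fn main() {
    assert_eq!(
        partial_cmp_shape(&Shape::Circle(1.0), &Shape::Square(1.0)),
        Some(Ordering::Less)
    );
}
```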

While adding a new variant to the `const_nonmatching` flag (and renaming it to `on_nonmatching`) so that the above could be expressed, while still allowing one to opt back into the old `O(N^2)` code-generation behavior (in general `O(N^K)`, where `K` is the number of self arguments), I had two realizations:

  1. Observation: Nothing except for the comparison derivings (`PartialOrd`, `Ord`, `PartialEq`, `Eq`) was even using the old `O(N^K)` code generator. So after this hypothetically lands, *nothing* needs to use it, and thus that code-generation strategy could be removed, under the assumption that it is very unlikely that any `deriving` mode will actually need that level of generality.
  2. Observation: The new code generator I am adding can actually be unified with all of the other code generators that just dispatch on the variant tag (they all assume that there is only one self argument).

These two observations mean that one can get rid of the `const_nonmatching` (aka `on_nonmatching`) flag entirely. So I did that too in this PR.

The question is: Do we actually want to follow through on both of the above observations? I'm pretty sure the second observation is a pure win. But there *might* be someone out there with an example that invalidates the reasoning in the first observation. That is, there might be a client out there with an example of a hypothetical deriving mode that wants to opt into the `O(N^K)` behavior. If that is true, then I can revise this PR to resurrect the `on_nonmatching` flag and provide a way to access the `O(N^K)` behavior.

The manner in which I choose to squash these commits during a post-review rebase depends on the answer to the above question.

Fix #15375.

@pnkfelix (Member, Author) commented Jul 7, 2014

r? @huonw

@lilyball (Contributor) commented Jul 7, 2014

I am generally in favor of this kind of rewrite. But more concretely, do you have any information on what practical effect this has? The two metrics of interest are

  1. Does this affect compile-time? I hope it speeds it up. Especially in the pathological 500-variant enum that @chris-morgan had issues with.
  2. Does this affect runtime performance?

@brson (Contributor) commented Jul 7, 2014

Awesome effort, @pnkfelix.

@sfackler (Member) commented Jul 7, 2014

I wouldn't be too concerned about keeping the old O(n^k) code around just in case somebody may potentially want to use it someday. It'll still be in the history and we can resurrect it if needed, or it can get pulled out into a separate library.

@pnkfelix (Member, Author) commented Jul 7, 2014

@kballard I'll address your questions in reverse order. :)

> 2. Does this affect runtime performance?

It might. But only because it hasn't been quite as tightly optimized as it should be.

I was thinking of the particular case of C-like enums on the way home today.

Those cases are covered by this (not yet implemented) optimization: I really should be letting comparisons of two enum values whose variants both have no subparts (e.g. `B` and `D` in `enum E { A(int), B, C(f64), D }`) fall through to the catch-all as well, since the catch-all will have all of the information it needs to give the answers in such cases.

Once such an optimization is in place, then for C-like enums all of the cases would just fall through to the catch-all (i.e. there would be no preceding arms).
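
As a hedged illustration of what that would look like for a fully C-like enum (illustrative names, not code from this PR): every comparison goes straight to the index-based catch-all, with no per-variant arms at all.

```rust
use std::cmp::Ordering;

enum Color { Red, Green, Blue }

// The same "value to discriminant index" mapping the catch-all relies on.
fn index_of(c: &Color) -> usize {
    match *c {
        Color::Red => 0,
        Color::Green => 1,
        Color::Blue => 2,
    }
}

fn cmp_color(this: &Color, that: &Color) -> Ordering {
    // Fieldless variants carry no extra data, so comparing the indices
    // already gives the full answer, including the equal case.
    index_of(this).cmp(&index_of(that))
}

fn main() {
    assert_eq!(cmp_color(&Color::Green, &Color::Blue), Ordering::Less);
    assert_eq!(cmp_color(&Color::Red, &Color::Red), Ordering::Equal);
}
```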


The other piece of the story there would be speeding up the mapping from the value to its discriminant index. Currently I am just emitting that code in place as another match statement, but in principle that could (and should?) be replaced by a compiler intrinsic, as discussed here: #15375 (comment)
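
As an aside about today's standard library (not something this PR adds): `std::mem::discriminant` now exposes this mapping as an opaque value that supports equality but not ordering, so an ordering catch-all would still need a numeric index. A minimal sketch:

```rust
use std::mem::discriminant;

// Illustrative enum, echoing the example above.
#[allow(dead_code)]
enum E { A(i32), B, C(f64), D }

// Equivalent to "index of this == index of that" without writing the
// per-variant match by hand.
fn same_variant(this: &E, that: &E) -> bool {
    discriminant(this) == discriminant(that)
}

fn main() {
    assert!(same_variant(&E::B, &E::B));
    assert!(!same_variant(&E::A(1), &E::C(2.0)));
}
```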


Anyway, I haven't done any benchmarking yet of the generated code. I will try to whip some up in parallel with the rest of the review.

As for the compile times and code size measures, I have informally benchmarked that; I'll post those results in a follow-up comment.

@pnkfelix (Member, Author) commented Jul 7, 2014

@kballard okay here is a gist with some semi-formal benchmark results.

https://gist.github.com/pnkfelix/3dcc567222d0906f1c42

The gist has both the program I used to generate the test data and the benchmarking driver script, but the important things are the two "files" at the bottom, which are transcripts comparing my stage0 rustc behavior with my stage1 rustc behavior.

The main points to note are:

  • Regarding expanded code size (which is one thing that affects overall compile times): stage0 on line 28 shows that for an enum with 150 variants, the generated source code size is 2.6 megabytes. Compare that with stage1 on line 28, where the same 150 variant enum has a generated source code size of 148K.
  • Regarding compile times themselves, stage0 on line 55 shows that for that enum of 150 variants, compiling to object code took 11 seconds. The analogous result for stage1 on line 55 is 0.55 seconds.

Since we're talking about quadratic code-size generation here, the problem noted above only gets worse as the number of variants in the enum grows. E.g. for a 200-variant enum, the stage0 expanded code size I observe for this generator program is 4.6M, versus 200K for stage1.
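
A back-of-envelope arm count makes that scaling concrete (assuming `K = 2` self arguments, as for `PartialOrd`); the figures below are purely illustrative, not measurements from the gist:

```rust
fn main() {
    for n in [150u64, 200] {
        let old_arms = n * n; // one arm per (variant, variant) pair
        let new_arms = n + 1; // one arm per variant, plus a single catch-all
        println!("{n} variants: {old_arms} arms (old) vs {new_arms} arms (new)");
    }
}
```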

(There is also a difference in the code size of the final output binary, which you can see in the gist as well, but I think the number of enum variants has to grow quite large, e.g. > 200 variants, before those numbers get as drastically bad as what I'm reporting above for the expanded code size, presumably because much of the time we spend in compilation is LLVM getting the generated code back into something semi-reasonable. Still, the generated code for the new deriving is smaller when the number of variants is large, and I believe that with the optimizations I outlined in my previous comment, it should compete favorably with our current deriving even for enums with small numbers of variants.)

@huonw (Member) commented Jul 7, 2014

Those numbers look good @pnkfelix! (I will review later today.)

@lilyball (Contributor) commented Jul 7, 2014

@pnkfelix Awesome!

@huonw (Member) commented on the diff, quoting this doc-comment context:

> first tuple, for other1 are in the second tuple, etc.)
> non-matching variants of the enum, but with all state hidden from
> the consequent code. The first component holds Idents for all of
> the self arguments; the second component is a slice of all of the

idents of the Self arguments, right? (i.e. the Self type, not just the self value)

@pnkfelix (Member, Author) replied:

The comment is indeed meant to refer to all arguments of `Self` type, not just the `self` value.

I think I used lowercase self in some places because the pre-existing file sometimes uses lowercase self, e.g. in phrases like "nonself args" that refer to all arguments of non-`Self` type.

But I agree uppercase `Self` is clearer. I will at least update my own new/changed comments to use `Self`, and see if I can rephrase any other occurrences of self args as well.

@huonw (Member) commented Jul 8, 2014

> That is, there might be a client out there with an example of a hypothetical deriving mode that wants to opt into the `O(N^K)` behavior.

Thinking about it now, I don't think any generic deriving-like mode can do anything meaningful with the contained data when the variants don't match: there's rarely a non-trivial relationship between the fields, e.g. types will only match by chance or in rare cases. So I'm 100% happy with the old code disappearing; the new code is so much nicer. (Far less `O(N^K)` recursion, while retaining the entirety of the generally useful information. ❤️)

@pnkfelix (Member, Author) commented:
I still want to try to adopt this refactoring suggested by huon:

> (Theoretically, I guess those two `Vec`s could be combined into one as a `Vec<(Ident, Ident)>` (or with a struct with informative field names, heh), since these tag-idents correspond directly to each Self arg ...)

After I do that then I'll have a go at pushing.
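
A minimal sketch of the suggested shape, using plain `String`s in place of the compiler's `Ident` type; the struct and function names here are hypothetical:

```rust
// Two parallel Vecs collapsed into one Vec of a small named-field struct,
// so each Self argument travels together with the ident of its variant index.
struct TaggedSelfArg {
    self_arg: String,  // stands in for the Ident of a Self argument
    tag_ident: String, // stands in for the Ident bound to its variant index
}

fn combine(self_args: Vec<String>, tag_idents: Vec<String>) -> Vec<TaggedSelfArg> {
    self_args
        .into_iter()
        .zip(tag_idents)
        .map(|(self_arg, tag_ident)| TaggedSelfArg { self_arg, tag_ident })
        .collect()
}

fn main() {
    let combined = combine(
        vec!["this".to_string(), "that".to_string()],
        vec!["idx_this".to_string(), "idx_that".to_string()],
    );
    assert_eq!(combined.len(), 2);
    assert_eq!(combined[1].self_arg, "that");
    assert_eq!(combined[0].tag_ident, "idx_this");
}
```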

@pnkfelix (Member, Author) commented:
Okay the refactoring @huonw suggested is probably doable, but doing it well (i.e. not just a local transformation, but rather propagating its good effects outward to the client code) requires wider revision to the code than I want to take on right now. So I am just going to rebase atop master and push.

…ent.

In particular, I want authors of deriving modes to understand what
they are opting into (namely quadratic code size or worse) when they
select NonMatchesExplode.
In the above formulas, `n` is the number of variants, and `k` is the
number of self-args fed into deriving.  In the particular case of
interest (namely `PartialOrd` and `Ord`), `k` is always 2, so we are
basically comparing `O(n)` versus `O(n^2)`.

Also, the stage is set for having *all* enum deriving codes go through
`build_enum_match_tuple` and getting rid of `build_enum_match`.

Also, seriously attempted to clean up the code itself.  Added a bunch
of comments attempting to document what I learned as I worked through
the original code and adapted it to this new strategy.
Remove the `NonMatchesExplode` variant now that no deriving impl uses it.
Removed `EnumNonMatching` entirely.
Remove now irrelevant `on_matching` field and `HandleNonMatchingEnums` type.
Removed unused `EnumNonMatchFunc` type def.

Drive-by: revise `EnumNonMatchCollapsedFunc` doc.

Made all calls to `expand_enum_method_body` go directly to
`build_enum_match_tuple`.

Alpha-rename `enum_nonmatch_g` back to `enum_nonmatch_f` to reduce overall diff noise.
Inline sole call of `some_ordering_const`.
Inline sole call of `ordering_const`.

Removed a bunch of code that became dead after the above changes.
bors added a commit that referenced this pull request Jul 11, 2014
…r=huonw

@bors bors closed this Jul 11, 2014
@bors bors merged commit 5cee578 into rust-lang:master Jul 11, 2014
Successfully merging this pull request may close these issues.

#[deriving(PartialOrd)] is O(N^2) code size for N enum variants