Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

concrete-eval: Use concrete eval information even if the result is big #47283

Merged
merged 1 commit into from
Oct 24, 2022

Conversation

Keno
Copy link
Member

@Keno Keno commented Oct 22, 2022

When we originally implemented concrete-eval, we decided not to create Const lattice elements for constants that are too large, on the fear that these would end up in the IR and blowing up the size. Now that we have some experience with this, I think that decision was incorrect for several reasons:

  1. We've already performed the concrete evaluation (and allocated the big object), so we're just throwing away precision here that we could have otherwise used (Although if the call result is unused, we probably shouldn't do concrete eval at all - see Don't attempt concrete eval if there's no information to be gained #46774).
  2. There's a number of other places in the compiler where we read large values into Const. Unless we add these kinds of check there too, we need to have appropriate guards in the optimizer and the cache anyway, to prevent the IR size blowup.
  3. It turns out that throwing away this precision actually causes significant performance problems for code that is just over the line. Consider for example a lookup of a small struct inside a large, multi-level constant structure. The final result might be quite tiny, but if we refuse to propagate the intermediate levels, we might end up doing an (expensive) full-constprop when propagating the constant information could have given us a much cheaper concrete evaluation.

This commit simply removes that check. If we see any regressions as a result of this, we should see if there are additional guards needed in the optimizer.

Copy link
Sponsor Member

@aviatesk aviatesk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider for example a lookup of a small struct inside a large, multi-level constant structure. The final result might be quite tiny, but if we refuse to propagate the intermediate levels, we might end up doing an (expensive) full-constprop when propagating the constant information could have given us a much cheaper concrete evaluation.

Maybe we can have a benchmark target for this kind of case in BaseBenchmarks.jl?
This PR itself LGTM.

@Keno
Copy link
Member Author

Keno commented Oct 22, 2022

Maybe we can have a benchmark target for this kind of case in BaseBenchmarks.jl?

Yes, should be reasonably to do. Just build a trie or a heap out of tuples or something.

@aviatesk
Copy link
Sponsor Member

Yes, should be reasonably to do. Just build a trie or a heap out of tuples or something.

I will try to come up with one. For the meanwhile, let's try the other benchmarks and merge this if there are no apparent regressions.

@nanosoldier runbenchmarks("inference", vs=":master")

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

…o large

When we originally implemented concrete-eval, we decided not to create `Const`
lattice elements for constants that are too large, on the fear that these
would end up in the IR and blowing up the size. Now that we have some
experience with this, I think that decision was incorrect for several reasons:

1. We've already performed the concrete evaluation (and allocated the big
   object), so we're just throwing away precision here that we could
   have otherwise used (Although if the call result is unused, we probably
   shouldn't do concrete eval at all - see #46774).
2. There's a number of other places in the compiler where we read large
   values into `Const`. Unless we add these kinds of check there too,
   we need to have appropriate guards in the optimizer and the cache
   anyway, to prevent the IR size blowup.
3. It turns out that throwing away this precision actually causes
   significant performance problems for code that is just over the line.
   Consider for example a lookup of a small struct inside a large,
   multi-level constant structure. The final result might be quite
   tiny, but if we refuse to propagate the intermediate levels,
   we might end up doing an (expensive) full-constprop when propagating
   the constant information could have given us a much cheaper
   concrete evaluation.

This commit simply removes that check. If we see any regressions
as a result of this, we should see if there are additional guards
needed in the optimizer.
@aviatesk
Copy link
Sponsor Member

@nanosoldier runbenchmarks("inference", vs=":master")

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

@Keno
Copy link
Member Author

Keno commented Oct 24, 2022

Nanosolider is fine here right, since "allinference" went down it actually got faster, optimization time for this one frame going up would make sense if things became more inlineable and optimization has to do more work, but since allinference got faster on the same benchmark, inference overall would be faster.

@aviatesk
Copy link
Sponsor Member

Sounds reasonable. Let's get this merged.

@aviatesk aviatesk merged commit 13c0060 into master Oct 24, 2022
@aviatesk aviatesk deleted the kf/constpropbig branch October 24, 2022 04:46
Keno added a commit that referenced this pull request Oct 24, 2022
It is possible for concrete-eval to refine effects, but be unable
to inline (because the constant is too large, c.f. #47283).
However, in that case, we would still like to use the extra
effect information to make sure that the optimizer can delete
the statement if it turns out to be unused. There are two cases:
the first is where the call is not inlineable at all. This
one is simple, because we just apply the effects on the :invoke
as we usually would. The second is trickier: If we do end up
inlining the call, we need to apply the overriden effects to
every inlined statement, because we lose the identity of the
function as a whole. This is a bit nasty and I don't really like
it, but I'm not sure what a better alternative would be. We could
always refuse to inline calls with large-constant results
(since we currently pessimize what is being inlined anyway), but
I'm not sure that would be better. This is a simple solution and
works for the case I have in practice, but we may want to revisit
it in the future.
Keno added a commit that referenced this pull request Oct 24, 2022
It is possible for concrete-eval to refine effects, but be unable
to inline (because the constant is too large, c.f. #47283).
However, in that case, we would still like to use the extra
effect information to make sure that the optimizer can delete
the statement if it turns out to be unused. There are two cases:
the first is where the call is not inlineable at all. This
one is simple, because we just apply the effects on the :invoke
as we usually would. The second is trickier: If we do end up
inlining the call, we need to apply the overriden effects to
every inlined statement, because we lose the identity of the
function as a whole. This is a bit nasty and I don't really like
it, but I'm not sure what a better alternative would be. We could
always refuse to inline calls with large-constant results
(since we currently pessimize what is being inlined anyway), but
I'm not sure that would be better. This is a simple solution and
works for the case I have in practice, but we may want to revisit
it in the future.
@@ -1483,15 +1487,12 @@ function handle_concrete_result!(cases::Vector{InliningCase}, result::ConcreteRe
return true
end

may_inline_concrete_result(result::ConcreteResult) =
isdefined(result, :result) && is_inlineable_constant(result.result)
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am hoping someday we can get rid of this arbitrary size cutoff entirely, and just do something simple like:

is_inlineable_constant(x) = (!ismutable(x) && datatype_pointerfree(typeof(x))) || x isa Type || x isa Symbol

this avoids embedding pointers to random bits of mutable memory, but otherwise drops the random size cut-off

Keno added a commit that referenced this pull request Oct 27, 2022
It is possible for concrete-eval to refine effects, but be unable
to inline (because the constant is too large, c.f. #47283).
However, in that case, we would still like to use the extra
effect information to make sure that the optimizer can delete
the statement if it turns out to be unused. There are two cases:
the first is where the call is not inlineable at all. This
one is simple, because we just apply the effects on the :invoke
as we usually would. The second is trickier: If we do end up
inlining the call, we need to apply the overriden effects to
every inlined statement, because we lose the identity of the
function as a whole. This is a bit nasty and I don't really like
it, but I'm not sure what a better alternative would be. We could
always refuse to inline calls with large-constant results
(since we currently pessimize what is being inlined anyway), but
I'm not sure that would be better. This is a simple solution and
works for the case I have in practice, but we may want to revisit
it in the future.
Keno added a commit that referenced this pull request Oct 27, 2022
It is possible for concrete-eval to refine effects, but be unable
to inline (because the constant is too large, c.f. #47283).
However, in that case, we would still like to use the extra
effect information to make sure that the optimizer can delete
the statement if it turns out to be unused. There are two cases:
the first is where the call is not inlineable at all. This
one is simple, because we just apply the effects on the :invoke
as we usually would. The second is trickier: If we do end up
inlining the call, we need to apply the overriden effects to
every inlined statement, because we lose the identity of the
function as a whole. This is a bit nasty and I don't really like
it, but I'm not sure what a better alternative would be. We could
always refuse to inline calls with large-constant results
(since we currently pessimize what is being inlined anyway), but
I'm not sure that would be better. This is a simple solution and
works for the case I have in practice, but we may want to revisit
it in the future.
Keno added a commit that referenced this pull request Oct 28, 2022
This undoes some of the choices made in #47283 and #47305.
As Shuhei correctly pointed out, even with the restriction to
`nothrow`, adding the extra flags on the inlined statements
results in incorrect IR. Also, my bigger motivating test case
turns out to be insufficiently optimized without the effect_free
flags (which I removed in the final revision of #47305).
I think for the time being, the best course of action here
is to just stop inlining concrete-eval'ed
calls entirely. The const result is available for inference,
so in most cases the call will get deleted. If there's an
important case we care about where this does not happen,
we should take a look at that separately.
Keno added a commit that referenced this pull request Oct 28, 2022
This undoes some of the choices made in #47283 and #47305.
As Shuhei correctly pointed out, even with the restriction to
`nothrow`, adding the extra flags on the inlined statements
results in incorrect IR. Also, my bigger motivating test case
turns out to be insufficiently optimized without the effect_free
flags (which I removed in the final revision of #47305).
I think for the time being, the best course of action here
is to just stop inlining concrete-eval'ed
calls entirely. The const result is available for inference,
so in most cases the call will get deleted. If there's an
important case we care about where this does not happen,
we should take a look at that separately.
Keno added a commit that referenced this pull request Oct 28, 2022
…rge (#47371)

This undoes some of the choices made in #47283 and #47305.
As Shuhei correctly pointed out, even with the restriction to
`nothrow`, adding the extra flags on the inlined statements
results in incorrect IR. Also, my bigger motivating test case
turns out to be insufficiently optimized without the effect_free
flags (which I removed in the final revision of #47305).
I think for the time being, the best course of action here
is to just stop inlining concrete-eval'ed
calls entirely. The const result is available for inference,
so in most cases the call will get deleted. If there's an
important case we care about where this does not happen,
we should take a look at that separately.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants