Test Enzyme and reexport ADTypes.AutoEnzyme #1887

Draft
wants to merge 81 commits into base: master


devmotion
Member

@devmotion devmotion commented Sep 28, 2022

Note: This does not work yet


I opened this PR to make it easier to debug (and possibly fix) issues with Enzyme.

Currently, the following example does not work (note that the snippet does not require this PR, which at this point solely reexports AutoEnzyme):

using Turing
using Enzyme
using ADTypes
Enzyme.API.runtimeActivity!(true);
Enzyme.API.typeWarning!(false);

@model function model()
    m ~ Normal(0, 1)
    s ~ InverseGamma()
    x ~ Normal(m, s)
end

sample(model() | (; x=0.5), NUTS(; adtype = ADTypes.AutoEnzyme()), 10)

With Enzyme#main my Julia (1.8.1) segfaults. An incomplete (it filled my whole terminal) output: https://gist.github.com/devmotion/1352197f2354c6fecddd7b778ec4bcf7#file-log-txt

The example now works (latest releases of Turing, Enzyme, and ADTypes on Julia 1.10.0), but the following warnings show up:

warning: didn't implement memmove, using memcpy as fallback which can result in errors
warning: didn't implement memmove, using memcpy as fallback which can result in errors

@coveralls

coveralls commented Nov 13, 2022

Pull Request Test Coverage Report for Build 11521373400

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 647 unchanged lines in 18 files lost coverage.
  • Overall coverage decreased (-41.9%) to 44.523%

Files with Coverage Reduction              New Missed Lines   %
src/essential/container.jl                 1                  87.1%
src/mcmc/abstractmcmc.jl                   2                  92.86%
src/variational/VariationalInference.jl    4                  0.0%
src/mcmc/gibbs.jl                          6                  77.38%
src/mcmc/gibbs_conditional.jl              12                 0.0%
src/mcmc/is.jl                             16                 0.0%
src/mcmc/hmc.jl                            20                 78.36%
src/stdlib/RandomMeasures.jl               22                 0.0%
ext/TuringDynamicHMCExt.jl                 29                 0.0%
src/mcmc/mh.jl                             31                 60.9%
Totals Coverage Status
Change from base Build 11521350911: -41.9%
Covered Lines: 691
Relevant Lines: 1552

💛 - Coveralls

@codecov

codecov bot commented Nov 13, 2022

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 44.52%. Comparing base (269081e) to head (98f2c2b).
Report is 1 commits behind head on master.

❗ There is a different number of reports uploaded between BASE (269081e) and HEAD (98f2c2b). Click for more details.

HEAD has 33 uploads less than BASE
Flag   BASE (269081e)   HEAD (98f2c2b)
       54               21
Additional details and impacted files
@@             Coverage Diff             @@
##           master    #1887       +/-   ##
===========================================
- Coverage   86.39%   44.52%   -41.88%     
===========================================
  Files          22       22               
  Lines        1573     1552       -21     
===========================================
- Hits         1359      691      -668     
- Misses        214      861      +647     


@wsmoses
Collaborator

wsmoses commented Jun 26, 2023

Also if you want to disable the warnings you can set it like so (https://github.com/EnzymeAD/Enzyme.jl/blob/c29e6119c7963ddb22f1363726f762455748e193/src/api.jl#L414):

Enzyme.API.typeWarning!(false)

@wsmoses
Collaborator

wsmoses commented Jun 26, 2023

You also may want to set the version to 0.11.2 since your CI currently is running at 0.11.0 (⌃ [7da242da] Enzyme v0.11.0)

@wsmoses
Collaborator

wsmoses commented Jun 27, 2023

@devmotion this PR (EnzymeAD/Enzyme.jl#914) should fix the immediate issues you see on CI if you want to try.

@yebai
Member

yebai commented Aug 15, 2024

that way we can make sure enzyme+turing's 1.10 support doesn't regress

I have a relatively strong preference for adding more unit and integration tests to Enzyme itself instead of relying on packages like Turing to catch Enzyme failures.

If this PR were merged, Turing CI might frequently be broken due to Enzyme issues, which could be a big distraction.

@yebai yebai marked this pull request as draft August 15, 2024 10:23
@yebai
Member

yebai commented Aug 15, 2024

Marked as draft since this depends on

@devmotion
Member Author

If this PR were merged, Turing CI might frequently be broken due to Enzyme issues, which could be a big distraction.

I wonder if we should run Enzyme tests in a separate GH action, similar to e.g. separate nightly tests, that could be allowed to fail and maybe make it easier to see that everything else still passes?
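
A separate, allowed-to-fail CI job needs some way to choose which backends a given test run covers. A minimal Julia-side sketch, assuming a hypothetical `TURING_TEST_GROUP` environment variable and placeholder backend symbols (neither is part of Turing's actual test setup):

```julia
using Test

# Hypothetical variable name; Turing's real test configuration may differ.
const GROUP_VAR = "TURING_TEST_GROUP"

# Map a test-group name to the AD backends that group should exercise.
# The symbols are placeholders standing in for real backend objects.
function backends_for(group::AbstractString)
    core = [:forwarddiff, :reversediff]
    experimental = [:enzyme]
    if group == "core"
        return core
    elseif group == "enzyme"
        return experimental
    elseif group == "all"
        return vcat(core, experimental)
    else
        throw(ArgumentError("unknown test group: $group"))
    end
end

# Default to the core group when the variable is not set.
group = get(ENV, GROUP_VAR, "core")

@testset "AD backend: $backend" for backend in backends_for(group)
    # Placeholder for the real sampler tests run with this backend.
    @test backend isa Symbol
end
```

The Enzyme job would then set the variable to `enzyme` and be marked as allowed to fail in the workflow, while the default job keeps running only the `core` group.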

@wsmoses
Collaborator

wsmoses commented Aug 15, 2024

Totally up to you guys how you want to do testing, but I'd recommend putting it wherever you currently test AD packages (which I think is here?). Either way, these tests were useful historically, and it would be good to ensure they run to catch regressions.

And @yebai we've added relevant minimized tests to Enzyme for each of the relevant pieces of functionality which caused issues in the past (we now have thousands of distinct tests in our suites). Happy to add more integration tests of course, if you have a PR, or want to add this repo as a downstream CI like here: https://github.com/EnzymeAD/Enzyme.jl/pull/1675/files

I also don't think this PR depends on those PRs (as these tests pass regardless). Of course adding the tests in those repos is good too (and currently awaiting action from the Turing side for MWEs), but I don't think a PR adding useful tests should be blocked on an unrelated PR which also adds other useful tests.

@yebai
Member

yebai commented Aug 21, 2024

Yes, it is a good idea to separate Enzyme tests into a separate CI task. We should consider doing the same for all tested autodiff backends. In addition, we can reduce CI time by testing gradient correctness only (see #2307).
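
As an illustration of what a gradient-only check could look like (not the approach proposed in #2307, whose details are not shown here), the following sketch compares a gradient against central finite differences. The target density, the `backend_gradient` stand-in, and the tolerance are all assumptions made for this example; a real test would call the AD backend under test instead of a hand-written gradient.

```julia
using Test

# Illustrative target: unnormalised log density of independent standard normals.
logdensity(x) = -0.5 * sum(abs2, x)

# Stand-in for the gradient an AD backend would return; derived by hand here.
backend_gradient(x) = -x

# Central finite differences, used as an independent reference gradient.
function finite_difference_gradient(f, x; h=1e-6)
    g = zeros(length(x))
    for i in eachindex(x)
        step = zeros(length(x))
        step[i] = h
        g[i] = (f(x .+ step) - f(x .- step)) / (2h)
    end
    return g
end

x = [0.3, -1.2, 2.0]
reference = finite_difference_gradient(logdensity, x)

# Only the gradient is checked; no sampler is run.
@test isapprox(backend_gradient(x), reference; rtol=1e-6)
```

A check like this takes a fraction of the time of a full sampling run, which is where the CI saving would come from.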

Also, testing a small, carefully chosen set of Turing models on Enzyme's CI suite is very helpful, too.

@wsmoses
Collaborator

wsmoses commented Sep 2, 2024

bumping this. having tests is generally a good thing (and will prevent breakages in the future), and this PR does nothing different from how existing AD packages function.

Totally up to you if you want to refactor how AD backend testing works, but that can be a follow up PR.

@yebai
Member

yebai commented Sep 4, 2024

@mhauru, can you adapt the following distributions and Turing test suite to help create an integration test PR for Enzyme?

These integration tests could supersede TuringLang/DistributionsAD.jl#254.

@wsmoses
Collaborator

wsmoses commented Sep 12, 2024 via email

@mhauru
Member

mhauru commented Sep 12, 2024

Merging this before/at the same time as Enzyme's Turing CI makes sense. However, I think we need EnzymeAD/Enzyme.jl#1811 and EnzymeAD/Enzyme.jl#1812 fixed before merging this. I'm surprised that the Turing test suite didn't catch them; I need to investigate why and maybe add some tests here.


using AdvancedPS: AdvancedPS

include("container.jl")

export @model,
    @varname,
    AutoEnzyme,

Member


Let's export this as Turing.Experimental.AutoEnzyme until Enzyme becomes more stable.

Collaborator


What is the threshold for being considered stable here?

Member


@mhauru and @penelopeysm probably have a lot more experience on this.

My heuristic threshold:

  • Enzyme passes all Distributions.jl and Turing.jl tests
  • No known segfaults for Enzyme

for a continuous period of 8 weeks.

@mhauru
Member

mhauru commented Oct 24, 2024

I've merged the latest master and upgraded to Enzyme v0.12. We are still being held back from v0.13 by Bijectors.jl. There are a number of new test failures, because

  1. There seem to be some regressions, tests that used to pass but now don't. I need to investigate.
  2. We had previously only tested Enzyme on the HMC and SGHMC tests. I hadn't realised/had forgotten that we weren't testing Enzyme with the full test suite. I've now added it to test/mcmc/gibbs.jl, test/mcmc/abstractmcmc.jl, and a couple of others as well: everywhere there is a loop over AD backends.
  3. Runtime activity is no longer a global setting in Enzyme. Having to change how we set runtime activity I think is a good opportunity to take stock of its effect, so I've just removed using it for now. Let's try to get to a point where the only test failures are ones where Enzyme says "you may need to use runtime activity", see how many there are, and only then enable it.
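
For reference, a minimal sketch of what the non-global setting from point 3 can look like. It assumes Enzyme v0.13 or later (which provides `set_runtime_activity`) and an ADTypes version whose `AutoEnzyme` accepts a `mode` keyword; the variable names are illustrative only.

```julia
using ADTypes: AutoEnzyme
using Enzyme: Reverse, set_runtime_activity

# Default backend configuration, with runtime activity left off.
adtype_default = AutoEnzyme()

# Runtime activity enabled for this backend object only, via the mode
# (assumes Enzyme >= 0.13; earlier versions used a global API setting).
adtype_runtime = AutoEnzyme(; mode=set_runtime_activity(Reverse))
```

Either value can then be passed as the `adtype` keyword of a sampler, as in the `NUTS(; adtype = ...)` call from the PR description, so runtime activity can be switched on only for the tests that need it.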

Getting Bijectors.jl to support Enzyme v0.13 I think has to be the next step, because otherwise any of the failures we see here might already be fixed on v0.13, and thus minimising and reporting them is pointless.

@wsmoses
Collaborator

wsmoses commented Nov 6, 2024

gentle bump here

@yebai
Member

yebai commented Nov 11, 2024

It would be good to address EnzymeAD/Enzyme.jl#1812, #2307 and TuringLang/Bijectors.jl#341 before merging this PR.

10 participants