[WIP] quickrun result error inspection #491

Closed

dwhswenson
Member

This adds an inspect-result command to the CLI, which takes a result JSON file and outputs some basic information on each unit.

The information included here can definitely be expanded later (timings, possibly more details on simulation parameters, etc.), but I thought it was most urgent to get error messages out.

Here's example output from an edge in which one unit hit a NaN failure:
$ openfe inspect-result easy_rbfe_lig_ejm_46_solvent_lig_jmc_28_solvent.json
This edge was a SUCCESS.
This edge consists of 4 units.

Unit ProtocolUnitResult-e8e4c380537f4bfd89639cf52b8a8b95 ran successfully.

Unit ProtocolUnitFailure-4c981804da4c40b198d7fc310401c755 failed with an error:
SimulationNaNError: Propagating replica 1 at state 1 resulted in a NaN!
The state of the system and integrator before the error were saved in results/easy_rbfe_lig_ejm_46_solvent_lig_jmc_28_solvent/shared_RelativeHybridTopologyProtocolUnit-60df96a488094f63978880d19487d299_attempt_0/nan-error-logs
Traceback (most recent call last):
  File "/lila/home/henrym3/micromamba/envs/openfe-0.10.1/lib/python3.10/site-packages/openmmtools/multistate/multistatesampler.py", line 1326, in _propagate_replica
    mcmc_move.apply(thermodynamic_state, sampler_state, context_cache=self.sampler_context_cache)
  File "/lila/home/henrym3/micromamba/envs/openfe-0.10.1/lib/python3.10/site-packages/openmmtools/mcmc.py", line 1151, in apply
    super(LangevinDynamicsMove, self).apply(thermodynamic_state, sampler_state,
  File "/lila/home/henrym3/micromamba/envs/openfe-0.10.1/lib/python3.10/site-packages/openmmtools/mcmc.py", line 755, in apply
    raise IntegratorMoveError(err_msg, self, context)
openmmtools.mcmc.IntegratorMoveError: Potential energy is NaN after 20 attempts of integration with move LangevinDynamicsMove

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/lila/home/henrym3/micromamba/envs/openfe-0.10.1/lib/python3.10/site-packages/gufe/protocols/protocolunit.py", line 308, in execute
    outputs = self._execute(context, **inputs)
  File "/lila/home/henrym3/micromamba/envs/openfe-0.10.1/lib/python3.10/site-packages/openfe/protocols/openmm_rfe/equil_rfe_methods.py", line 674, in _execute
    outputs = self.run(scratch_basepath=ctx.scratch, shared_basepath=ctx.shared)
  File "/lila/home/henrym3/micromamba/envs/openfe-0.10.1/lib/python3.10/site-packages/openfe/protocols/openmm_rfe/equil_rfe_methods.py", line 617, in run
    sampler.equilibrate(int(equil_steps / mc_steps))  # type: ignore
  File "/lila/home/henrym3/micromamba/envs/openfe-0.10.1/lib/python3.10/site-packages/openmmtools/multistate/multistatesampler.py", line 696, in equilibrate
    self._propagate_replicas()
  File "/lila/home/henrym3/micromamba/envs/openfe-0.10.1/lib/python3.10/site-packages/openmmtools/utils/utils.py", line 95, in _wrapper
    return func(*args, **kwargs)
  File "/lila/home/henrym3/micromamba/envs/openfe-0.10.1/lib/python3.10/site-packages/openmmtools/multistate/multistatesampler.py", line 1299, in _propagate_replicas
    propagated_states, replica_ids = mpiplus.distribute(self._propagate_replica, range(self.n_replicas),
  File "/lila/home/henrym3/micromamba/envs/openfe-0.10.1/lib/python3.10/site-packages/mpiplus/mpiplus.py", line 523, in distribute
    all_results = [task(job_args, *other_args, **kwargs) for job_args in distributed_args]
  File "/lila/home/henrym3/micromamba/envs/openfe-0.10.1/lib/python3.10/site-packages/mpiplus/mpiplus.py", line 523, in <listcomp>
    all_results = [task(job_args, *other_args, **kwargs) for job_args in distributed_args]
  File "/lila/home/henrym3/micromamba/envs/openfe-0.10.1/lib/python3.10/site-packages/openmmtools/multistate/multistatesampler.py", line 1337, in _propagate_replica
    raise SimulationNaNError(message)
openmmtools.multistate.utils.SimulationNaNError: Propagating replica 1 at state 1 resulted in a NaN!
The state of the system and integrator before the error were saved in results/easy_rbfe_lig_ejm_46_solvent_lig_jmc_28_solvent/shared_RelativeHybridTopologyProtocolUnit-60df96a488094f63978880d19487d299_attempt_0/nan-error-logs


Unit ProtocolUnitResult-3be6df093b0e4f61ad443dc966595621 ran successfully.

Unit ProtocolUnitResult-5ea0dbdc7ee94e86a547f66e421adb95 ran successfully.
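
For readers who want to see how a per-unit report like the one above could be produced, here is a minimal sketch. It is not the code in this PR. It assumes a hypothetical layout in which the result JSON has a `unit_results` mapping from unit key to a dict, and in which failed units carry an `exception` entry (a `[class_name, args]` pair) and a `traceback` string. The function name `summarize_result` and the example data are made up for illustration, and a real result file may need more than plain `json.load` to decode fully.

```python
import json


def summarize_result(result: dict) -> str:
    """Build a short per-unit text report from a loaded result dict.

    ASSUMPTION (not confirmed by the PR): result["unit_results"] maps a
    unit key to a dict, and failed units contain "exception" as a
    [class_name, args] pair plus a "traceback" string.
    """
    units = result.get("unit_results", {})
    lines = [f"This edge consists of {len(units)} units.", ""]
    for key, unit in units.items():
        if "exception" in unit:
            exc_name, exc_args = unit["exception"]
            # Use the first exception argument as the message, if any.
            message = exc_args[0] if exc_args else ""
            lines.append(f"Unit {key} failed with an error:")
            lines.append(f"{exc_name}: {message}")
            lines.append(unit.get("traceback", "").rstrip())
        else:
            lines.append(f"Unit {key} ran successfully.")
        lines.append("")
    return "\n".join(lines)


def summarize_file(path: str) -> str:
    """Load a result JSON file from disk and summarize it."""
    with open(path) as f:
        return summarize_result(json.load(f))


# Hypothetical, hand-written stand-in for a loaded result file.
example = {
    "unit_results": {
        "unit-a": {"outputs": {}},
        "unit-b": {
            "exception": ["SimulationNaNError", ["NaN!"]],
            "traceback": "Traceback (most recent call last):\n  ...\n",
        },
    }
}
report = summarize_result(example)
print(report)
```

Keeping the formatting in a function that takes a dict, separate from file loading, makes the output easy to check in tests without a real simulation result.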

This still needs tests; those will come after review comments on the output.

@dwhswenson
Member Author

Ah, shoot, this PR has some unrelated changes mixed in. I'll move the file to a clean PR.
