Tests: Manually stop daemon after `verdi devel revive` test #5689
Conversation
I think this should fix the problem, although the solution is not really ideal. It is a bit of a workaround, since I still couldn't understand 100% what is going on. This solution seems to work for now, though, and since the failure is really messing with all builds, we might want to consider merging this while we investigate further to find the real root cause.
Warnings are raised when a profile is loaded that configures a RabbitMQ server with an unsupported version, or when the installed `aiida-core` code is not a released version. These warnings are not relevant for testing, so they are suppressed by setting the relevant config options. The options are set on the automatically created config in the case of the temporary test profile, as well as on the test profile that is created manually before the tests run in the GitHub Actions workflow.
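As a rough sketch of what this amounts to (the option names `warnings.development_version` and `warnings.rabbitmq_version` are assumptions and may not match the actual implementation exactly), the warnings can be switched off on the config used for the tests:

```python
# Sketch only: disable version-related warnings on the config used for testing.
# The option names below are assumptions based on the description above.
from aiida.manage.configuration import get_config

config = get_config()
config.set_option('warnings.development_version', False)  # non-released aiida-core
config.set_option('warnings.rabbitmq_version', False)     # unsupported RabbitMQ version
config.store()
```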
The `Computer` created by the `aiida_localhost` fixture configures the `core.direct` scheduler plugin, which does not support setting a maximum memory directive. Doing so leads to a warning being logged every time a job is submitted to the computer.
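For illustration, a minimal sketch of the idea (not the actual fixture code; the label and work directory are hypothetical): the computer is set up with the `core.direct` scheduler and no memory default is configured.

```python
# Sketch only: a localhost computer using the `core.direct` scheduler.
# Deliberately no `set_default_memory_per_machine` call, since the direct
# scheduler cannot honour it and would log a warning for every submitted job.
from aiida import orm

computer = orm.Computer(
    label='localhost',              # hypothetical label
    hostname='localhost',
    transport_type='core.local',
    scheduler_type='core.direct',
    workdir='/tmp/aiida_run',       # hypothetical work directory
).store()
```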
If the `submit_and_wait` fixture times out waiting for the submitted process to reach the desired state, usually there is a problem with the daemon workers. To make debugging easier, the status of the daemon as well as the content of the daemon log file are included in the exception message.
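Conceptually, the timeout branch now gathers the daemon diagnostics before raising, roughly along these lines (a sketch with an assumed helper name; the real fixture is structured differently):

```python
# Sketch only: poll a process node and include daemon diagnostics in the timeout error.
import time

from aiida.engine.daemon.client import get_daemon_client


def wait_for_process_state(node, target_state, timeout=30.0, interval=0.1):
    """Wait until ``node`` reaches ``target_state`` or raise with daemon status and log."""
    start = time.time()
    while node.process_state != target_state:
        if time.time() - start > timeout:
            client = get_daemon_client()
            with open(client.daemon_log_file, encoding='utf-8') as handle:
                daemon_log = handle.read()
            raise AssertionError(
                f'timed out waiting for {node} to reach `{target_state}`.\n'
                f'Daemon running: {client.is_daemon_running}\n'
                f'Daemon log:\n{daemon_log}'
            )
        time.sleep(interval)
```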
There was a problem where the `verdi process pause` test in `tests/cmdline/commands/test_process.py` would except because the timeout would be hit. The direct cause was that the daemon worker could not load the node from the database, which in turn was because the session was in a pending rollback state after a previous database operation had excepted. That exception seemed to be due to the daemon trying to call `CalcJob.delete_state` or `Process.delete_checkpoint` in the `on_terminated` calls. For some reason, the update statement executed for this, to remove the relevant attribute key, would match 0 rows. The suspicion is that the relevant node had already been removed from the database, probably because another test, run between the two daemon tests, had cleaned the database, so the node no longer existed but the process task somehow did. It is not quite clear exactly where the problem lies, but for now the temporary work-around is to manually stop the daemon in the first test, which apparently cleans the state such that the original exception is no longer hit and the daemon doesn't get stuck with an inconsistent session.
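In essence, the work-around just stops the daemon explicitly at the end of the `verdi devel revive` test, roughly like this (a sketch; the fixture name `started_daemon_client` and the test structure are assumptions, not the actual test code):

```python
# Sketch of the work-around only; fixture names and test structure are assumptions.
def test_revive(started_daemon_client, submit_and_wait):
    """Exercise `verdi devel revive`, then stop the daemon explicitly."""
    # ... the actual revive assertions go here ...

    # Work-around: stop the daemon so a later daemon test does not inherit a
    # worker whose database session is stuck in a pending-rollback state.
    started_daemon_client.stop_daemon(wait=True)
```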
@chrisjsewell I am merging this soon since it is blocking all other PRs. Let me know if you still want to have a look, otherwise I will go ahead.
cheers
Fixes #5687