Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor task killing & simplify task state deltas #6457

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

MetRonnie
Copy link
Member

@MetRonnie MetRonnie commented Nov 4, 2024

In preparation for tackling the last piece of the cylc remove proposal:

When all flows are removed from active tasks, Cylc should first attempt to kill the job as it serves no further purpose and may cause resource contention issues if left running, however, failure of the kill command should be tolerated, though logged.

I extracted and de-deduped the kill mechanism into a scheduler method.

Also, it seems to me the data store manager's delta_task_state() method should include is_held, is_queued and is_runahead.

Check List

  • I have read CONTRIBUTING.md and added my name as a Code Contributor.
  • Contains logically grouped changes (else tidy your branch by rebase).
  • Does not contain off-topic changes (use other PRs for other changes).
  • No dependency changes
  • Tests are included (or explain why tests are not needed).
  • Changelog entry not needed as refactor.
  • Docs not needed
  • If this is a bug fix, PR should be raised against the relevant ?.?.x branch.

@MetRonnie MetRonnie added small code refactor Large code refactors labels Nov 4, 2024
@MetRonnie MetRonnie added this to the 8.4.0 milestone Nov 4, 2024
@MetRonnie MetRonnie self-assigned this Nov 4, 2024

Returns number of tasks that could not be killed.
"""
jobless = self.get_run_mode() == RunMode.SIMULATION
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Need to follow-up with skip mode #6039)

@@ -8,13 +8,13 @@
<span style="color:#000000;background:#e5e5e5"> </span><span style="color:#000000;background:#e5e5e5"> </span><span style="color:#000000;background:#e5e5e5"> </span><span style="color:#000000;background:#e5e5e5">̿● a </span>
<span style="color:#000000;background:#e5e5e5"> </span><span style="color:#000000;background:#e5e5e5">-</span><span style="color:#000000;background:#e5e5e5"> </span><span style="color:#000000;background:#e5e5e5">̿⊗ Y </span>
<span style="color:#000000;background:#e5e5e5"> </span><span style="color:#000000;background:#e5e5e5"> </span><span style="color:#000000;background:#e5e5e5"> </span><span style="color:#000000;background:#e5e5e5">̿⊗ b </span>
<span style="color:#000000;background:#e5e5e5"> </span><span style="color:#000000;background:#e5e5e5">-</span><span style="color:#000000;background:#e5e5e5"> </span><span style="color:#000000;background:#e5e5e5">̿⊘ 2 </span>
<span style="color:#000000;background:#e5e5e5"> </span><span style="color:#000000;background:#e5e5e5">-</span><span style="color:#000000;background:#e5e5e5"> </span><span style="color:#000000;background:#e5e5e5">̎⊘ 2 </span>
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for the change in icon is that previously, the held state was not updated in the data store by the test fixture:

def set_task_state(schd, task_states):
"""Force tasks into the desired states.
Task states should be of the format (cycle, task, state, is_held).
"""
for cycle, task, state, is_held in task_states:
itask = schd.pool.get_task(cycle, task)
if not itask:
itask = schd.pool.spawn_task(task, cycle, {1})
itask.state_reset(state, is_held=is_held)
schd.data_store_mgr.delta_task_state(itask)

@MetRonnie MetRonnie marked this pull request as draft November 5, 2024 11:12
@MetRonnie MetRonnie marked this pull request as ready for review November 5, 2024 11:41
@@ -1322,7 +1322,6 @@ def _process_message_failed(self, itask, event_time, message, forced):
no_retries = True
if itask.state_reset(TASK_STATUS_FAILED, forced=forced):
self.setup_event_handlers(itask, self.EVENT_FAILED, message)
self.data_store_mgr.delta_task_state(itask)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate of L1328 just below

tp_id, PbTaskProxy(id=tp_id))
tp_delta.stamp = f'{tp_id}@{update_time}'
tp_delta.state = itask.state.status
tp_delta.is_held = itask.state.is_held
Copy link
Member

@oliver-sanders oliver-sanders Nov 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently:

attr updater
status delta_task_state
is_held delta_task_held
is_queued delta_task_queued
is_runahead delta_task_runahead

I'm not sure why we would move is_held in with delta_task_state? Unless we wanted to move all the attrs into delta_task_state?

Copy link
Member Author

@MetRonnie MetRonnie Nov 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me it looks like it makes sense to move all into delta_task_state? These are all handled by 1 method on TaskProxy, so it would reduce chances of forgetting to update them all in the data store. And it would be more efficient in the data store to do all at once

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could do it all in one, but I think it would be less (not more) efficient for the data store to do this all at once as we would be pushing out all four attributes every time any one of them changes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that's a concern, could do

if tproxy.is_held != itask.state.is_held:
    tp_delta.is_held = itask.state.is_held

etc.

Or drop the commit as I've separated it out into its own commit now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. I think, either we leave as is, or we combine all the related attrs and compare to previous value as you suggested.

status=(TASK_STATUS_FAILED if jobless else None),
is_held=True,
)
self.data_store_mgr.delta_task_state(itask)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
self.data_store_mgr.delta_task_state(itask)
self.data_store_mgr.delta_task_state(itask)
self.data_store_mgr.delta_task_held(itask)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not needed if we include itask.is_held in delta_task_state()

@MetRonnie MetRonnie changed the title Refactor task killing Refactor task killing & simplify task state deltas Nov 5, 2024
Copy link
Member

@dwsutherland dwsutherland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approval conditional on your option of doing this:

if tproxy.is_held != itask.state.is_held:
    tp_delta.is_held = itask.state.is_held

And you'd want to do the same with queue and runahead ..etc
Also check/do nothing if nothing has changed (unless that's impossible upstream)..

It's only if a field is set that it will be sent as a delta, so if we can avoid sending a field, that's ideal.

@MetRonnie
Copy link
Member Author

@dwsutherland I noticed that if you inspect tp_delta.is_held it already as a value (True/False) at time of creation of the PbTaskProxy object. However, unless you proactively set a value on it, it does not get included in the delta received at the UI. What sorcery is this? 😅

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
code refactor Large code refactors small
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants