Fix duplicate decref of subtask input chunk #2611

Merged: 2 commits, Dec 10, 2021

Conversation

Catch-Bull
Contributor

  • fix decrease subtask input chunk

This is a race condition.

Suppose stage A has two subtasks, S1 and S2, that share the same input chunk C.

  1. S1 fails with an error, and stage_processor.done is set.
  2. S2 calls set_subtask_result, which has already decreased the reference count of C but has not yet set stage_processor.results[C.key].
  3. TaskProcessorActor finds that stage_processor got an error and calls self._cur_processor.incref_stage(stage_processor) in TaskProcessorActor.start, which also decreases the reference count of C, the input of S2. As a result, C is decref'ed twice on behalf of S2.
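
The interleaving above can be reproduced in a small toy model. The sketch below is not Mars code and does not show the change made in this PR: RefCounter, ToyStage, and every method name in it are hypothetical stand-ins, and the `released` set is just one possible guard, assumed here for illustration. It only shows why releasing a subtask's inputs before recording its result lets an error cleanup release them a second time.

```python
from collections import Counter


class RefCounter:
    """Toy chunk reference counter (hypothetical, not the Mars lifecycle API)."""

    def __init__(self):
        self.counts = Counter()

    def incref(self, key):
        self.counts[key] += 1

    def decref(self, key):
        if self.counts[key] <= 0:
            raise RuntimeError(f"duplicate decref of chunk {key!r}")
        self.counts[key] -= 1


class ToyStage:
    """Toy stage: each subtask holds one reference per input chunk."""

    def __init__(self, subtask_inputs, refs, guarded):
        self.subtask_inputs = subtask_inputs  # subtask name -> input chunk keys
        self.refs = refs
        self.guarded = guarded
        self.results = {}      # subtask name -> result, set only on completion
        self.released = set()  # assumed guard: subtasks whose inputs were released
        for inputs in subtask_inputs.values():
            for key in inputs:
                refs.incref(key)

    def _release_inputs(self, subtask):
        if self.guarded:
            if subtask in self.released:
                return  # inputs already released, skip the second decref
            self.released.add(subtask)
        for key in self.subtask_inputs[subtask]:
            self.refs.decref(key)

    def begin_set_result(self, subtask):
        # Step 2, first half: inputs are released before the result is stored.
        self._release_inputs(subtask)

    def finish_set_result(self, subtask):
        # Step 2, second half: the result is finally recorded.
        self.results[subtask] = "done"

    def cleanup_after_error(self):
        # Step 3: release inputs of every subtask that has no recorded result.
        for subtask in self.subtask_inputs:
            if subtask not in self.results:
                self._release_inputs(subtask)


def run_interleaving(guarded):
    refs = RefCounter()
    stage = ToyStage({"S1": ["C"], "S2": ["C"]}, refs, guarded)
    # Step 1: S1 fails, so it never records a result.
    stage.begin_set_result("S2")   # step 2: C released for S2, result not yet set
    stage.cleanup_after_error()    # step 3: runs in the gap
    stage.finish_set_result("S2")
    return refs.counts["C"]
```

Without the guard, run_interleaving(guarded=False) raises RuntimeError inside cleanup_after_error, because C is released for S2 a second time. With the guard, each subtask's inputs are released at most once and the count of C ends at zero.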

What do these changes do?

Related issue number

Fixes #2610

Check code requirements

  • tests added / passed (if needed)
  • Ensure all linting tests pass, see here for how to run them

* fix decrease subtask input chunk
@hekaisheng added the "mod: task service", "to be backported", and "type: bug" labels on Dec 9, 2021
@hekaisheng added this to the v0.9.0a1 milestone on Dec 9, 2021
@qinxuye changed the title from "[BUG FIX] fix duplicate decref of subtask input chunk" to "[BUG] fix duplicate decref of subtask input chunk" on Dec 9, 2021
@wjsi changed the title from "[BUG] fix duplicate decref of subtask input chunk" to "Fix duplicate decref of subtask input chunk" on Dec 10, 2021
Member

@wjsi wjsi left a comment

LGTM

Contributor

@hekaisheng hekaisheng left a comment

LGTM

@hekaisheng merged commit 889ffb7 into mars-project:master on Dec 10, 2021
wjsi pushed a commit to wjsi/mars that referenced this pull request on Dec 13, 2021
hekaisheng pushed a commit that referenced this pull request on Dec 13, 2021
@hekaisheng added the "backported already" label and removed the "to be backported" label on Dec 13, 2021
Labels
backported already, mod: task service, type: bug
Development

Successfully merging this pull request may close these issues.

[BUG] race condition: duplicate decref of subtask input chunk
3 participants