Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor recover job #4515

Merged
merged 2 commits into from
Aug 25, 2022
Merged

refactor recover job #4515

merged 2 commits into from
Aug 25, 2022

Conversation

liwenhui-soul
Copy link
Contributor

@liwenhui-soul liwenhui-soul commented Aug 12, 2022

What type of PR is this?

  • bug
  • feature
  • enhancement

What problem(s) does this PR solve?

Issue(s) number:

Description:

How do you solve it?

  1. a meta job that used to be RUNNING would be set to FAILED when process restart

  2. only jobs that are failed or stopped could be recovered

  3. all recovered jobs would be set to QUEUE

  4. for a balance job:
    (1) if there's a newer finished balance job, the stopped or failed balance job can't be recovered.
    (2) only the lasted stopped or failed balance job could be recovered

Special notes for your reviewer, ex. impact of this fix, design document, etc:

Checklist:

Tests:

  • Unit test(positive and negative cases)
  • Function test
  • Performance test
  • N/A

Affects:

  • Documentation affected (Please add the label if documentation needs to be modified.)
  • Incompatibility (If it breaks the compatibility, please describe it and add the label.)
  • If it's needed to cherry-pick (If cherry-pick to some branches is required, please label the destination version(s).)
  • Performance impacted: Consumes more CPU/Memory

Release notes:

Please confirm whether to be reflected in release notes and how to describe:

ex. Fixed the bug .....

@Sophie-Xie Sophie-Xie added the ready-for-testing PR: ready for the CI test label Aug 16, 2022
@codecov-commenter
Copy link

Codecov Report

Merging #4515 (552b729) into master (c884477) will increase coverage by 0.03%.
The diff coverage is 92.19%.

@@            Coverage Diff             @@
##           master    #4515      +/-   ##
==========================================
+ Coverage   84.66%   84.69%   +0.03%     
==========================================
  Files        1357     1357              
  Lines      135081   135160      +79     
==========================================
+ Hits       114360   114475     +115     
+ Misses      20721    20685      -36     
Impacted Files Coverage Δ
src/clients/meta/MetaClient.cpp 76.30% <0.00%> (-0.08%) ⬇️
src/meta/processors/job/AdminJobProcessor.cpp 65.95% <0.00%> (ø)
src/meta/processors/job/JobManager.cpp 74.20% <87.36%> (+1.10%) ⬆️
src/storage/test/GetPropTest.cpp 95.34% <94.59%> (-0.05%) ⬇️
src/meta/test/JobManagerTest.cpp 99.11% <100.00%> (+0.07%) ⬆️
src/storage/exec/QueryUtils.h 94.01% <100.00%> (+0.10%) ⬆️
src/graph/context/Result.cpp 70.00% <0.00%> (-7.78%) ⬇️
src/graph/executor/StorageAccessExecutor.h 62.22% <0.00%> (-4.45%) ⬇️
src/kvstore/wal/WalFileIterator.cpp 64.40% <0.00%> (-4.24%) ⬇️
src/kvstore/raftex/RaftPart.cpp 70.34% <0.00%> (-3.23%) ⬇️
... and 32 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

if (optJob.getStatus() == cpp2::JobStatus::RUNNING && je->isMetaJob()) {
jds.emplace_back(optJob);
jds.emplace_back(std::move(optJob));
Copy link
Contributor

@critical27 critical27 Aug 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this would break the rule below?

for a balance job:
(1) if there's a newer finished balance job, the stopped or failed balance job can't be recovered.
(2) only the lasted stopped or failed balance job could be recovered

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the guarantee is at jobmanager.cpp:832

}
}
}
for (auto& jd : jds) {
jd.setStatus(cpp2::JobStatus::QUEUE, true);
jd.setStatus(cpp2::JobStatus::FAILED, true);
Copy link
Contributor

@critical27 critical27 Aug 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why you prefer FAILED instead of QUEUED?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the rule 1

@critical27 critical27 merged commit 0844f6e into vesoft-inc:master Aug 25, 2022
@Sophie-Xie Sophie-Xie linked an issue Aug 27, 2022 that may be closed by this pull request
@foesa-yang foesa-yang self-requested a review October 13, 2022 05:53
@Sophie-Xie Sophie-Xie added the doc affected PR: improvements or additions to documentation label Oct 13, 2022
@jinyingsunny
Copy link

checked ent-3.3.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
doc affected PR: improvements or additions to documentation ready-for-testing PR: ready for the CI test
Projects
None yet
Development

Successfully merging this pull request may close these issues.

change the job status into "queue" when recover job
7 participants