Better management of LastDeclareJobs - no more wrong fallback-system activations #904
Conversation
Bencher (Continuous Benchmarking) posted benchmark results on this PR:
🚨 8 ALERTS: Threshold Boundary Limits exceeded!
🚨 1 ALERT: Threshold Boundary Limit exceeded!
Can you add an MG test for this situation?
I never wrote an MG test, but I can try to dig into it and add it.
This is a perfect candidate for an MG test; whenever we do a PR that fixes an issue arising from message ordering, we should add an MG test. This will add a lot of value, since these kinds of issues are hard to find. The spec is important, but a test suite that handles all the corner cases, which for obvious reasons are not discussed in the spec, is also important.
Agree, will do it after some more tests.
the original description on #901 says:
Do we also have a bug on the pool? Should we create an issue for that?
The pool doesn't have bugs, but the issue is caused by the fact that a wrong/old prev_hash is set by JDC using the
This reverts commit 040d81d.
I reverted the last commit since it introduced no meaningful changes.
s.last_declare_mining_job_sent
s.last_declare_mining_jobs_sent
    .get(&request_id)
    .expect("LastDeclareJob not found")
We should never panic while we are holding a lock, otherwise the shared data becomes unusable for everyone. If something is obviously unreachable, it is ok to put an expect, but when it is not very clear why the code can never panic you should write an explanation. Otherwise, it would be a lot better to take the Result or the Option out of the safe_lock and, if needed, unwrap it outside.
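For context on why this matters, here is a minimal sketch (using plain std::sync::Mutex for illustration rather than the project's safe_lock helper) of how a panic raised while a lock is held poisons the mutex and makes the shared data unusable for every other user:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    let shared = Arc::new(Mutex::new(0u32));

    // A thread that panics while it still holds the guard poisons the mutex.
    let poisoner = Arc::clone(&shared);
    let _ = thread::spawn(move || {
        let _guard = poisoner.lock().unwrap();
        panic!("panicked while holding the lock");
    })
    .join();

    // From now on every lock() returns Err(PoisonError): the shared data is
    // effectively unusable unless every caller explicitly recovers from poison.
    assert!(shared.lock().is_err());
}
```

Returning the Option/Result out of the locked closure and handling it outside the lock avoids this failure mode entirely.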
here you should remove the job btw
Something like this:
fn get_last_declare_job_sent(self_mutex: &Arc<Mutex<Self>>, request_id: u32) -> LastDeclareJob {
    let id = self_mutex
        .safe_lock(|s| {
            s.last_declare_mining_jobs_sent
                .remove(&request_id)
                .clone()
        })
        .unwrap();
    id.expect("Impossible to get last declare job sent").clone().expect("This is ok")
}
Some considerations:
- Having a HashMap of Option does not make much sense; I would remove the inner Option so we do not have the last expect.
- Not having the required id is a real possibility, so the function should return an Option and the caller should handle the case of a downstream sending shares with a wrong id (by closing the connection, or the entire process); a sketch combining these two points follows below.
- As I said below, it could be enough to have an array of 2 elements rather than a hashmap here; (1) and (2) still apply.
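A sketch of how the first two points could look together, with hypothetical type names and plain std::sync::Mutex standing in for the project's safe_lock wrapper (an illustration of the suggestion, not the actual JDC code):

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Hypothetical stand-ins so the sketch is self-contained; the real JDC types differ.
#[derive(Clone, Debug)]
pub struct LastDeclareJob;

pub struct JobDeclarator {
    // Plain values in the map: no inner Option, so no second expect is needed.
    last_declare_mining_jobs_sent: HashMap<u32, LastDeclareJob>,
}

impl JobDeclarator {
    // Remove the job while the lock is held, but let the caller decide what to do
    // about a missing request_id instead of panicking inside the lock.
    fn get_last_declare_job_sent(
        self_mutex: &Arc<Mutex<Self>>,
        request_id: u32,
    ) -> Option<LastDeclareJob> {
        let mut guard = self_mutex.lock().ok()?;
        guard.last_declare_mining_jobs_sent.remove(&request_id)
    }
}
```

On None the caller can then close the downstream connection (or stop the process), as point (2) suggests, rather than panicking while the lock is held.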
.unwrap()
.safe_lock(|s| {
    //check hashmap size in order to not let it grow indefinitely
    if s.last_declare_mining_jobs_sent.len() < 10 {
This number "10" really requires a comment. Why is it 10 and not 2 or 3? How many jobs do we expect to have at the same time? I guess the max is 2? If not, why? And is it ok to have more than 3 jobs? If not, we should just close the process, because something unexpected is happening and we don't want people to keep mining (paying the electricity bill) when we are not sure that they are producing shares that will get paid.
Maybe it would be better to just have an array with [job-1, job0]. Are there cases where we would need to access an older job? If a downstream sends a share for a job that is 7 jobs old, I would say that something is off either here or in the downstream, so the safest thing to do would be to close the connection. Maybe I'm missing something, but I would like to have it addressed in a comment.
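To illustrate, a hypothetical sketch of that two-slot idea (names and layout invented for the example, not taken from the codebase):

```rust
// Keep only job(-1) and job(0) instead of an unbounded map; anything older is
// treated as a protocol error by the caller.
#[derive(Clone, Debug)]
pub struct LastDeclareJob {
    pub request_id: u32,
    // ...the rest of the declared-job data would live here
}

#[derive(Default)]
pub struct DeclaredJobs {
    // slots[0] = job(-1), slots[1] = job(0)
    slots: [Option<LastDeclareJob>; 2],
}

impl DeclaredJobs {
    // Pushing a new job shifts the current one into the "previous" slot and
    // silently drops anything older, so the structure can never grow.
    pub fn push(&mut self, job: LastDeclareJob) {
        self.slots[0] = self.slots[1].take();
        self.slots[1] = Some(job);
    }

    // A miss means the downstream referenced a job older than job(-1); the caller
    // can then close the connection rather than keep accepting stale shares.
    pub fn get(&self, request_id: u32) -> Option<&LastDeclareJob> {
        self.slots.iter().flatten().find(|j| j.request_id == request_id)
    }
}
```

With this shape the structure is bounded by construction, so neither the "10" limit nor the size check would be needed.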
@Fi3 Thanks for the review.
This PR addresses the way last_declare_mining_jobs_sent are managed by JDC.
Before this PR, we could end up in a situation in which we were not able to exit from the loop inside the on_set_new_prev_hash function because of this line:
stratum/roles/jd-client/src/lib/job_declarator/mod.rs, line 362 in 0b13191
The situation occurred every time a new job was created and declared right after the future job linked to the new prev_hash. Because of this line, the second (non-future) job was extracted in this scenario instead of the right one, i.e. the previous, future job. As a result, the future job was never inserted into future_jobs through this line, and so we never exited the loop. This meant that that specific prev_hash was constantly set by the loop, causing the behaviour described in issue #901.
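To make the ordering problem concrete, here is a small, hypothetical model of the two approaches (not the actual JDC code): a single "last job sent" slot loses the future job as soon as the next declaration goes out, while a map keyed by request_id keeps every outstanding job until its response arrives.

```rust
use std::collections::HashMap;

// Hypothetical, simplified model of the ordering problem described above.
#[derive(Clone, Debug)]
struct DeclaredJob {
    request_id: u32,
    is_future: bool,
}

fn main() {
    // Before: a single "last declare job sent" slot. The future job is stored...
    let mut last_sent = Some(DeclaredJob { request_id: 1, is_future: true });
    assert!(last_sent.as_ref().unwrap().is_future);
    // ...but the non-future job declared right afterwards overwrites it, so the
    // response to the future declaration is matched against the wrong job and
    // future_jobs never gets populated.
    last_sent = Some(DeclaredJob { request_id: 2, is_future: false });
    assert!(!last_sent.unwrap().is_future);

    // After: jobs are keyed by request_id, so the response for request_id 1 still
    // finds the future job even though a newer declaration happened in between.
    let mut sent: HashMap<u32, DeclaredJob> = HashMap::new();
    sent.insert(1, DeclaredJob { request_id: 1, is_future: true });
    sent.insert(2, DeclaredJob { request_id: 2, is_future: false });
    assert!(sent.remove(&1).unwrap().is_future);
}
```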
Fixes #901