
Better management of LastDeclareJobs - no more wrong fallback-system activations #904

Merged: 10 commits merged on May 21, 2024


GitGab19 (Collaborator)

This PR changes how JDC manages last_declare_mining_jobs_sent.
Before this PR, we could end up in a situation where we were unable to exit the loop inside the on_set_new_prev_hash function because of this line:

    match s.future_jobs.remove(&id) {

This happened every time a new job was created and declared right after the future job linked to the new prev_hash. In that scenario, JDC retrieved the second (non-future) job instead of the correct, earlier one (the future job). As a result, the future job was never inserted into future_jobs, so the loop never exited.

This meant the loop kept setting that specific prev_hash over and over, causing the behaviour described in issue #901.
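To make the failure mode concrete, here is a simplified model in Rust. It assumes, based on the change visible in this PR's diff (from a single last_declare_mining_job_sent to last_declare_mining_jobs_sent.get(&request_id)), that JDC previously kept one slot for the last declared job. The types and methods (SingleSlot, PerRequest, on_declare_sent, on_declare_success) are hypothetical; this is not JDC's actual code.

```rust
use std::collections::HashMap;

/// Simplified, hypothetical stand-in for JDC's LastDeclareJob.
#[derive(Debug, Clone, PartialEq)]
struct LastDeclareJob {
    request_id: u32,
    is_future: bool,
}

/// Before (assumed): a single slot, overwritten by every new declaration.
#[derive(Default)]
struct SingleSlot {
    last_declare_mining_job_sent: Option<LastDeclareJob>,
}

impl SingleSlot {
    fn on_declare_sent(&mut self, job: LastDeclareJob) {
        self.last_declare_mining_job_sent = Some(job);
    }

    /// The request id in the response cannot be used: whatever is in the slot wins.
    fn on_declare_success(&mut self, _request_id: u32) -> Option<LastDeclareJob> {
        self.last_declare_mining_job_sent.take()
    }
}

/// After: one entry per in-flight declaration, keyed by request id.
#[derive(Default)]
struct PerRequest {
    last_declare_mining_jobs_sent: HashMap<u32, LastDeclareJob>,
}

impl PerRequest {
    fn on_declare_sent(&mut self, job: LastDeclareJob) {
        self.last_declare_mining_jobs_sent.insert(job.request_id, job);
    }

    fn on_declare_success(&mut self, request_id: u32) -> Option<LastDeclareJob> {
        self.last_declare_mining_jobs_sent.remove(&request_id)
    }
}

fn main() {
    let future = LastDeclareJob { request_id: 1, is_future: true };
    let next = LastDeclareJob { request_id: 2, is_future: false };

    // The future job is declared, then a new job is declared right after it.
    let mut before = SingleSlot::default();
    before.on_declare_sent(future.clone());
    before.on_declare_sent(next.clone());
    // The response to request 1 yields the non-future job, so the future job
    // never reaches future_jobs, which (per the PR description) is what keeps
    // the loop in on_set_new_prev_hash from exiting.
    assert_eq!(before.on_declare_success(1), Some(next.clone()));

    let mut after = PerRequest::default();
    after.on_declare_sent(future.clone());
    after.on_declare_sent(next.clone());
    assert_eq!(after.on_declare_success(1), Some(future));
    assert_eq!(after.on_declare_success(2), Some(next));
}
```

Keying by request id means each response is matched to the declaration it answers, regardless of how many declarations are in flight.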

Fixes #901

GitGab19 linked an issue on May 10, 2024 that may be closed by this pull request

github-actions bot (Contributor) commented May 10, 2024

🐰Bencher

Report: Tue, May 21, 2024 at 14:10:04 UTC
Project: Stratum v2 (SRI)
Branch: patch-901
Testbed: sv1
All benchmark results (all within their upper boundaries ✅):

| Benchmark | Estimated cycles (Δ%) | Estimated cycles upper boundary (%) | Instructions (Δ%) | Instructions upper boundary (%) | L1 accesses (Δ%) | L1 accesses upper boundary (%) | L2 accesses (Δ%) | L2 accesses upper boundary (%) | RAM accesses (Δ%) | RAM accesses upper boundary (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| get_authorize | 8,512.00 (+1.05%) | 8,707.41 (97.76%) | 3,746.00 (+0.25%) | 3,849.93 (97.30%) | 5,247.00 (+0.17%) | 5,391.14 (97.33%) | 9.00 (+9.95%) | 10.55 (85.30%) | 92.00 (+2.39%) | 93.75 (98.14%) |
| get_submit | 95,613.00 (+0.07%) | 96,137.99 (99.45%) | 59,439.00 (-0.03%) | 59,768.12 (99.45%) | 85,348.00 (-0.04%) | 85,816.06 (99.45%) | 58.00 (+4.54%) | 62.72 (92.48%) | 285.00 (+0.92%) | 288.10 (98.92%) |
| get_subscribe | 8,083.00 (+1.34%) | 8,272.92 (97.70%) | 2,841.00 (+0.44%) | 2,939.35 (96.65%) | 3,963.00 (+0.29%) | 4,097.78 (96.71%) | 19.00 (+16.19%) | 19.84 (95.76%) | 115.00 (+2.07%) | 117.03 (98.26%) |
| serialize_authorize | 12,231.00 (+0.41%) | 12,455.28 (98.20%) | 5,317.00 (+0.18%) | 5,420.93 (98.08%) | 7,411.00 (+0.13%) | 7,555.15 (98.09%) | 12.00 (+7.64%) | 13.69 (87.64%) | 136.00 (+0.75%) | 139.02 (97.83%) |
| serialize_deserialize_authorize | 24,510.00 (+0.25%) | 24,671.74 (99.34%) | 9,898.00 (+0.06%) | 10,012.74 (98.85%) | 13,955.00 (+0.01%) | 14,125.59 (98.79%) | 39.00 (+4.83%) | 41.95 (92.96%) | 296.00 (+0.48%) | 297.20 (99.60%) |
| serialize_deserialize_handle_authorize | 30,211.00 (+0.23%) | 30,328.15 (99.61%) | 12,101.00 (+0.08%) | 12,204.93 (99.15%) | 17,116.00 (+0.03%) | 17,270.35 (99.11%) | 64.00 (+8.95%) | 64.86 (98.67%) | 365.00 (+0.31%) | 366.66 (99.55%) |
| serialize_deserialize_handle_submit | 126,438.00 (+0.03%) | 127,030.50 (99.53%) | 73,224.00 (-0.02%) | 73,610.22 (99.48%) | 104,938.00 (-0.03%) | 105,491.89 (99.47%) | 128.00 (+5.69%) | 133.00 (96.24%) | 596.00 (+0.14%) | 599.39 (99.43%) |
| serialize_deserialize_handle_subscribe | 27,531.00 (+0.26%) | 27,610.08 (99.71%) | 9,643.00 (+0.13%) | 9,741.35 (98.99%) | 13,631.00 (+0.08%) | 13,774.29 (98.96%) | 71.00 (+8.18%) | 72.24 (98.28%) | 387.00 (+0.25%) | 388.47 (99.62%) |
| serialize_deserialize_submit | 115,069.00 (+0.02%) | 115,606.69 (99.53%) | 68,001.00 (-0.05%) | 68,371.60 (99.46%) | 97,549.00 (-0.06%) | 98,093.00 (99.45%) | 74.00 (+6.45%) | 75.17 (98.44%) | 490.00 (+0.35%) | 492.63 (99.47%) |
| serialize_deserialize_subscribe | 22,950.00 (+0.37%) | 23,097.21 (99.36%) | 8,195.00 (+0.14%) | 8,295.36 (98.79%) | 11,535.00 (+0.08%) | 11,678.11 (98.77%) | 43.00 (+8.50%) | 44.36 (96.93%) | 320.00 (+0.52%) | 321.19 (99.63%) |
| serialize_submit | 99,896.00 (+0.02%) | 100,443.58 (99.45%) | 61,483.00 (-0.03%) | 61,817.72 (99.46%) | 88,196.00 (-0.04%) | 88,670.81 (99.46%) | 58.00 (+3.67%) | 61.75 (93.92%) | 326.00 (+0.43%) | 329.08 (99.06%) |
| serialize_subscribe | 11,378.00 (+0.58%) | 11,579.45 (98.26%) | 4,188.00 (+0.30%) | 4,286.35 (97.71%) | 5,823.00 (+0.22%) | 5,957.72 (97.74%) | 19.00 (+15.15%) | 19.21 (98.90%) | 156.00 (+0.74%) | 158.95 (98.14%) |

Bencher - Continuous Benchmarking
View Public Perf Page
Docs | Repo | Chat | Help

github-actions bot (Contributor) commented May 10, 2024

🐰Bencher

Report: Tue, May 21, 2024 at 14:09:57 UTC
Project: Stratum v2 (SRI)
Branch: 904/merge
Testbed: sv1

🚨 8 ALERTS: Threshold Boundary Limits exceeded!
| Benchmark | Measure (units) | Value (Δ%) | Upper boundary (%) |
|---|---|---|---|
| client-submit-serialize-deserialize-handle/client-submit-serialize-deserialize-handle | Latency (ns) | 8,905.90 (+5.62%) | 8,861.00 (100.51%) |
| client-sv1-authorize-serialize-deserialize/client-sv1-authorize-serialize-deserialize | Latency (ns) | 722.72 (+3.63%) | 719.94 (100.39%) |
| client-sv1-authorize-serialize/client-sv1-authorize-serialize | Latency (ns) | 267.90 (+7.86%) | 257.74 (103.94%) |
| client-sv1-get-authorize/client-sv1-get-authorize | Latency (ns) | 170.65 (+8.55%) | 162.96 (104.72%) |
| client-sv1-get-submit | Latency (ns) | 7,106.50 (+6.16%) | 7,066.98 (100.56%) |
| client-sv1-subscribe-serialize-deserialize-handle/client-sv1-subscribe-serialize-deserialize-handle | Latency (ns) | 780.96 (+4.40%) | 779.04 (100.25%) |
| client-sv1-subscribe-serialize-deserialize/client-sv1-subscribe-serialize-deserialize | Latency (ns) | 651.88 (+5.87%) | 641.84 (101.56%) |
| client-sv1-subscribe-serialize/client-sv1-subscribe-serialize | Latency (ns) | 222.12 (+7.23%) | 221.58 (100.24%) |

All benchmark results:

| Benchmark | Status | Latency, ns (Δ%) | Latency upper boundary, ns (%) |
|---|---|---|---|
| client-submit-serialize | ✅ | 7,193.60 (+3.80%) | 7,290.13 (98.68%) |
| client-submit-serialize-deserialize | ✅ | 8,188.30 (+4.35%) | 8,282.69 (98.86%) |
| client-submit-serialize-deserialize-handle/client-submit-serialize-deserialize-handle | 🚨 | 8,905.90 (+5.62%) | 8,861.00 (100.51%) |
| client-sv1-authorize-serialize-deserialize-handle/client-sv1-authorize-serialize-deserialize-handle | ✅ | 925.46 (+2.97%) | 928.32 (99.69%) |
| client-sv1-authorize-serialize-deserialize/client-sv1-authorize-serialize-deserialize | 🚨 | 722.72 (+3.63%) | 719.94 (100.39%) |
| client-sv1-authorize-serialize/client-sv1-authorize-serialize | 🚨 | 267.90 (+7.86%) | 257.74 (103.94%) |
| client-sv1-get-authorize/client-sv1-get-authorize | 🚨 | 170.65 (+8.55%) | 162.96 (104.72%) |
| client-sv1-get-submit | 🚨 | 7,106.50 (+6.16%) | 7,066.98 (100.56%) |
| client-sv1-get-subscribe/client-sv1-get-subscribe | ✅ | 293.91 (+5.09%) | 294.97 (99.64%) |
| client-sv1-subscribe-serialize-deserialize-handle/client-sv1-subscribe-serialize-deserialize-handle | 🚨 | 780.96 (+4.40%) | 779.04 (100.25%) |
| client-sv1-subscribe-serialize-deserialize/client-sv1-subscribe-serialize-deserialize | 🚨 | 651.88 (+5.87%) | 641.84 (101.56%) |
| client-sv1-subscribe-serialize/client-sv1-subscribe-serialize | 🚨 | 222.12 (+7.23%) | 221.58 (100.24%) |


github-actions bot (Contributor) commented May 10, 2024

🐰Bencher

Report: Tue, May 21, 2024 at 14:09:57 UTC
Project: Stratum v2 (SRI)
Branch: patch-901
Testbed: sv2
All benchmark results (all within their upper boundaries ✅):

| Benchmark | Estimated cycles (Δ%) | Estimated cycles upper boundary (%) | Instructions (Δ%) | Instructions upper boundary (%) | L1 accesses (Δ%) | L1 accesses upper boundary (%) | L2 accesses (Δ%) | L2 accesses upper boundary (%) | RAM accesses (Δ%) | RAM accesses upper boundary (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| client_sv2_handle_message_common | 2,065.00 (+0.55%) | 2,134.87 (96.73%) | 473.00 (+0.49%) | 486.40 (97.25%) | 735.00 (+0.44%) | 754.70 (97.39%) | 7.00 (-4.55%) | 11.72 (59.75%) | 37.00 (+0.76%) | 38.68 (95.65%) |
| client_sv2_handle_message_mining | 8,203.00 (+0.02%) | 8,345.60 (98.29%) | 2,137.00 (+0.46%) | 2,171.37 (98.42%) | 3,163.00 (+0.60%) | 3,215.25 (98.37%) | 35.00 (-10.00%) | 43.49 (80.49%) | 139.00 (+0.04%) | 142.15 (97.79%) |
| client_sv2_mining_message_submit_standard | 6,242.00 (-0.60%) | 6,389.89 (97.69%) | 1,750.00 (+0.03%) | 1,763.19 (99.25%) | 2,557.00 (+0.13%) | 2,575.63 (99.28%) | 16.00 (-9.34%) | 22.65 (70.65%) | 103.00 (-0.91%) | 106.91 (96.35%) |
| client_sv2_mining_message_submit_standard_serialize | 14,825.00 (+0.24%) | 15,057.26 (98.46%) | 4,694.00 (+0.01%) | 4,707.19 (99.72%) | 6,755.00 (+0.02%) | 6,774.86 (99.71%) | 46.00 (-3.53%) | 52.81 (87.11%) | 224.00 (+0.54%) | 230.49 (97.18%) |
| client_sv2_mining_message_submit_standard_serialize_deserialize | 27,560.00 (+0.22%) | 27,885.79 (98.83%) | 10,545.00 (+0.03%) | 10,558.59 (99.87%) | 15,340.00 (+0.01%) | 15,359.78 (99.87%) | 85.00 (+0.88%) | 90.42 (94.01%) | 337.00 (+0.47%) | 346.34 (97.30%) |
| client_sv2_open_channel | 4,527.00 (+0.68%) | 4,614.48 (98.10%) | 1,461.00 (+0.06%) | 1,474.45 (99.09%) | 2,152.00 (-0.02%) | 2,172.74 (99.05%) | 13.00 (+7.83%) | 15.41 (84.38%) | 66.00 (+1.16%) | 68.24 (96.71%) |
| client_sv2_open_channel_serialize | 14,266.00 (+0.23%) | 14,483.48 (98.50%) | 5,064.00 (+0.02%) | 5,077.45 (99.74%) | 7,316.00 (-0.02%) | 7,338.91 (99.69%) | 39.00 (+4.41%) | 41.85 (93.19%) | 193.00 (+0.38%) | 199.42 (96.78%) |
| client_sv2_open_channel_serialize_deserialize | 22,708.00 (+0.19%) | 23,073.18 (98.42%) | 7,987.00 (+0.05%) | 8,001.16 (99.82%) | 11,613.00 (+0.01%) | 11,635.95 (99.80%) | 77.00 (+4.68%) | 82.22 (93.65%) | 306.00 (+0.24%) | 316.18 (96.78%) |
| client_sv2_setup_connection | 4,699.00 (-0.04%) | 4,765.24 (98.61%) | 1,502.00 (+0.05%) | 1,515.45 (99.11%) | 2,279.00 (+0.10%) | 2,299.64 (99.10%) | 8.00 (-11.84%) | 13.24 (60.41%) | 68.00 (+0.05%) | 69.58 (97.73%) |
| client_sv2_setup_connection_serialize | 16,368.00 (+0.59%) | 16,477.39 (99.34%) | 5,963.00 (+0.01%) | 5,976.45 (99.77%) | 8,653.00 (-0.02%) | 8,677.32 (99.72%) | 45.00 (+0.79%) | 49.13 (91.60%) | 214.00 (+1.30%) | 217.01 (98.61%) |
| client_sv2_setup_connection_serialize_deserialize | 35,596.00 (+0.14%) | 35,768.61 (99.52%) | 14,814.00 (+0.03%) | 14,828.16 (99.90%) | 21,751.00 (+0.01%) | 21,771.95 (99.90%) | 102.00 (+1.79%) | 114.37 (89.19%) | 381.00 (+0.29%) | 384.69 (99.04%) |


github-actions bot (Contributor) commented May 10, 2024

🐰Bencher

Report: Tue, May 21, 2024 at 14:09:57 UTC
Project: Stratum v2 (SRI)
Branch: patch-901
Testbed: sv2

🚨 1 ALERT: Threshold Boundary Limit exceeded!
| Benchmark | Measure (units) | Value (Δ%) | Upper boundary (%) |
|---|---|---|---|
| client_sv2_open_channel_serialize | Latency (ns) | 305.29 (+7.11%) | 297.72 (102.54%) |

All benchmark results:

| Benchmark | Status | Latency, ns (Δ%) | Latency upper boundary, ns (%) |
|---|---|---|---|
| client_sv2_handle_message_common | ✅ | 44.54 (-0.73%) | 49.92 (89.23%) |
| client_sv2_handle_message_mining | ✅ | 74.95 (+1.82%) | 83.79 (89.45%) |
| client_sv2_mining_message_submit_standard | ✅ | 14.63 (-0.14%) | 14.69 (99.62%) |
| client_sv2_mining_message_submit_standard_serialize | ✅ | 248.38 (-6.07%) | 283.92 (87.48%) |
| client_sv2_mining_message_submit_standard_serialize_deserialize | ✅ | 600.54 (+0.82%) | 642.24 (93.51%) |
| client_sv2_open_channel | ✅ | 169.90 (+2.36%) | 172.29 (98.61%) |
| client_sv2_open_channel_serialize | 🚨 | 305.29 (+7.11%) | 297.72 (102.54%) |
| client_sv2_open_channel_serialize_deserialize | ✅ | 371.30 (-1.24%) | 401.37 (92.51%) |
| client_sv2_setup_connection | ✅ | 162.86 (-0.71%) | 174.80 (93.17%) |
| client_sv2_setup_connection_serialize | ✅ | 465.23 (-1.70%) | 498.32 (93.36%) |
| client_sv2_setup_connection_serialize_deserialize | ✅ | 953.79 (-1.52%) | 1,039.57 (91.75%) |


Fi3 (Collaborator) commented May 11, 2024

Can you add an MG test for this situation?

GitGab19 (Collaborator, Author) commented May 13, 2024

> Can you add an MG test for this situation?

I have never written an MG test, but I can dig into it and try to add one.
I'm not sure MG gives me the granularity I need: I need the TP to send a NewTemplate right after a SetNewPrevHash is sent. Maybe @lorbax knows how to do this?

Fi3 (Collaborator) commented May 14, 2024

This is a perfect candidate for an MG test. Whenever we open a PR that fixes an issue arising from message ordering, we should add an MG test. This adds a lot of value, since this kind of issue is hard to find.

The spec is important, but so is a test suite that handles all the corner cases, which for obvious reasons are not discussed in the spec.

GitGab19 marked this pull request as draft on May 14, 2024, 11:01

GitGab19 (Collaborator, Author)

> This is a perfect candidate for an MG test. Whenever we open a PR that fixes an issue arising from message ordering, we should add an MG test. This adds a lot of value, since this kind of issue is hard to find.
>
> The spec is important, but so is a test suite that handles all the corner cases, which for obvious reasons are not discussed in the spec.

Agreed, I will do it after some more testing.
I've just turned this PR into a draft because I noticed there is still a case that isn't handled correctly and can lead to the issue this PR addresses.

plebhash (Collaborator)

The original description in #901 says:

  • Sometimes (I still don't know exactly why) a valid share is sent to the pool (in this case through JDC) with the correct prev_hash.
  • When the pool checks the valid share, it uses a wrong prev_hash in the check_target function, so the share is not valid for the pool.

Is there also a bug in the pool? Should we open an issue for it?
Or is it a false alarm, and this PR solves everything related to this problem?

GitGab19 (Collaborator, Author)

> The original description in #901 says:
>
>   • Sometimes (I still don't know exactly why) a valid share is sent to the pool (in this case through JDC) with the correct prev_hash.
>   • When the pool checks the valid share, it uses a wrong prev_hash in the check_target function, so the share is not valid for the pool.
>
> Is there also a bug in the pool? Should we open an issue for it? Or is it a false alarm, and this PR solves everything related to this problem?

The pool doesn't have a bug: the issue is caused by JDC setting a wrong (old) prev_hash in the SetCustomMiningJob message. That JDC bug is what I'm trying to fix in this PR.

GitGab19 (Collaborator, Author)

I reverted the last commit since it introduced no meaningful changes.
As discussed with @Fi3, the code is better (and cleaner) as it was before.

GitGab19 (Collaborator, Author)

I also identified a new bug (described in issue #920) and pushed a basic (but not elegant) fix for it.
Issue #920 still needs to be solved properly, which includes removing the timeout I just introduced.

plebhash merged commit 1f289e4 into stratum-mining:dev on May 21, 2024
13 checks passed
    s.last_declare_mining_job_sent
    s.last_declare_mining_jobs_sent
        .get(&request_id)
        .expect("LastDeclareJob not found")
Collaborator (review comment)

We should never panic while holding a lock; otherwise the shared data becomes unusable for everyone. If something is obviously unreachable, it's OK to use an expect, but if it's not clear why the code can never panic, you should write an explanation. Otherwise, it's much better to take the Result or the Option out of the safe_lock closure and, if needed, unwrap it outside.
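A minimal sketch of the pattern this comment describes, using std::sync::Mutex as a stand-in for the project's safe_lock helper (an assumption; the helper itself is not shown here). It shows why panicking while the guard is held is harmful (the mutex becomes poisoned for every other user) and how to move the Option out of the critical section instead. LastDeclareJob and take_job are simplified, hypothetical definitions.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Simplified, hypothetical stand-in for JDC's LastDeclareJob.
#[derive(Debug, Clone, PartialEq)]
struct LastDeclareJob {
    request_id: u32,
}

/// Removes the job under the lock and hands the Option back to the caller.
/// Nothing inside the critical section can panic.
fn take_job(jobs: &Mutex<HashMap<u32, LastDeclareJob>>, request_id: u32) -> Option<LastDeclareJob> {
    // A poisoned lock is reported as None here; real code may prefer to surface it.
    let job = jobs.lock().ok()?.remove(&request_id);
    // The guard was a temporary of the statement above and is already dropped.
    job
}

fn main() {
    let jobs: Arc<Mutex<HashMap<u32, LastDeclareJob>>> = Arc::new(Mutex::new(HashMap::new()));
    jobs.lock().unwrap().insert(1, LastDeclareJob { request_id: 1 });

    // Good: a missing id is an ordinary None that the caller can handle.
    assert_eq!(take_job(&jobs, 42), None);
    assert_eq!(take_job(&jobs, 1), Some(LastDeclareJob { request_id: 1 }));

    // Bad: calling expect while the guard is held poisons the mutex.
    let shared = Arc::clone(&jobs);
    let _ = thread::spawn(move || {
        let guard = shared.lock().unwrap();
        let _job = guard.get(&7).expect("LastDeclareJob not found"); // panics with the guard held
    })
    .join();

    assert!(jobs.is_poisoned());
    // From now on every lock() returns Err, for every user of the shared data.
    assert_eq!(take_job(&jobs, 1), None);
}
```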

Collaborator (review comment)

By the way, here you should remove the job.

Collaborator (review comment)

Something like this:

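    fn get_last_declare_job_sent(self_mutex: &Arc<Mutex<Self>>, request_id: u32) -> LastDeclareJob {
        // Remove the job while holding the lock, but don't unwrap anything inside it.
        let job = self_mutex
            .safe_lock(|s| s.last_declare_mining_jobs_sent.remove(&request_id))
            .unwrap();
        // Outside the lock: the outer Option comes from the map lookup, the inner
        // one is the Option<LastDeclareJob> stored as the map value.
        job.expect("Impossible to get last declare job sent")
            .expect("This is ok")
    }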

Some considerations:

  1. Having a HashMap of Options doesn't make much sense; I would remove the inner Option so we don't need the last expect.
  2. Not finding the requested id is a real possibility: the function should return an Option, and the caller should handle the case of a downstream sending shares with a wrong id (by closing the connection, or the entire process).
  3. As I said below, an array of 2 elements could be enough here instead of a HashMap; (1) and (2) still apply.
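A sketch of considerations (1) and (2), again with std::sync::Mutex standing in for safe_lock: the map stores LastDeclareJob values directly, the lookup removes the job and returns an Option, and the caller decides what to do with an unknown id. The JobDeclarator fields and the Action enum are simplified, hypothetical stand-ins, and treating a poisoned lock as "not found" is a choice made for this sketch, not something the review specifies.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Simplified, hypothetical stand-in for JDC's LastDeclareJob.
#[derive(Debug, Clone, PartialEq)]
struct LastDeclareJob {
    request_id: u32,
}

/// Simplified, hypothetical JDC state.
struct JobDeclarator {
    // (1) No inner Option: an entry either exists or it does not.
    last_declare_mining_jobs_sent: HashMap<u32, LastDeclareJob>,
}

impl JobDeclarator {
    /// (2) An unknown id is a real possibility, so return Option instead of panicking.
    /// The job is removed on use, as suggested in the review.
    fn get_last_declare_job_sent(self_mutex: &Arc<Mutex<Self>>, request_id: u32) -> Option<LastDeclareJob> {
        // A poisoned lock is treated as "not found" in this sketch.
        let job = self_mutex.lock().ok()?.last_declare_mining_jobs_sent.remove(&request_id);
        job
    }
}

/// Hypothetical outcome for the caller.
#[derive(Debug, PartialEq)]
enum Action {
    Continue(LastDeclareJob),
    CloseConnection,
}

fn on_response(jd: &Arc<Mutex<JobDeclarator>>, request_id: u32) -> Action {
    match JobDeclarator::get_last_declare_job_sent(jd, request_id) {
        Some(job) => Action::Continue(job),
        // Unknown id: per the review, close the connection (or the whole process).
        None => Action::CloseConnection,
    }
}

fn main() {
    let mut jobs = HashMap::new();
    jobs.insert(3, LastDeclareJob { request_id: 3 });
    let jd = Arc::new(Mutex::new(JobDeclarator { last_declare_mining_jobs_sent: jobs }));

    assert_eq!(on_response(&jd, 3), Action::Continue(LastDeclareJob { request_id: 3 }));
    // Already removed on first use, so a second lookup is unknown.
    assert_eq!(on_response(&jd, 3), Action::CloseConnection);
}
```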

    .unwrap()
    .safe_lock(|s| {
        // Check the HashMap size so that it doesn't grow indefinitely
        if s.last_declare_mining_jobs_sent.len() < 10 {
Collaborator (review comment)

This number "10" really requires a comment. Why 10 and not 2 or 3? How many jobs do we expect to have at the same time? I guess the maximum is 2; if not, why not? And is it OK to have more than 3 jobs? If not, we should just close the process, because something unexpected is happening, and we don't want people to keep mining (and paying the electricity bill) when we are not sure they are producing shares that will get paid.
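One way to act on this comment, sketched under assumptions: give the limit a name and a documented rationale, and turn an unexpected number of in-flight declarations into an error the caller can use to stop the process, instead of relying on a bare size check. MAX_IN_FLIGHT (set to 2, following the guess in the comment), TooManyInFlight and record_declare_job are hypothetical names, not part of the codebase.

```rust
use std::collections::HashMap;

/// Simplified, hypothetical stand-in for JDC's LastDeclareJob.
#[derive(Debug, Clone, PartialEq)]
struct LastDeclareJob {
    request_id: u32,
}

/// Hypothetical limit: how many declarations we expect to be in flight at once.
/// 2 follows the reviewer's guess; the diff under review uses 10.
const MAX_IN_FLIGHT: usize = 2;

/// Hypothetical error: more in-flight declarations than expected.
#[derive(Debug, PartialEq)]
struct TooManyInFlight {
    in_flight: usize,
}

/// Records a declared job, or reports that the limit was reached so the
/// caller can stop instead of continuing to mine in an unexpected state.
fn record_declare_job(
    jobs: &mut HashMap<u32, LastDeclareJob>,
    job: LastDeclareJob,
) -> Result<(), TooManyInFlight> {
    if jobs.len() >= MAX_IN_FLIGHT {
        return Err(TooManyInFlight { in_flight: jobs.len() });
    }
    jobs.insert(job.request_id, job);
    Ok(())
}

fn main() {
    let mut jobs = HashMap::new();
    assert_eq!(record_declare_job(&mut jobs, LastDeclareJob { request_id: 1 }), Ok(()));
    assert_eq!(record_declare_job(&mut jobs, LastDeclareJob { request_id: 2 }), Ok(()));
    // A third concurrent declaration is unexpected: surface it as an error.
    assert_eq!(
        record_declare_job(&mut jobs, LastDeclareJob { request_id: 3 }),
        Err(TooManyInFlight { in_flight: 2 })
    );
}
```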

Collaborator (review comment)

Maybe it would be better to just have an array with [job-1, job0]. Are there cases where we would need to access an older job? If a downstream sends a share for a job that is 7 jobs old, I would say something is off either here or in the downstream, so the safest thing to do would be to close the connection. Maybe I'm missing something, but I would like this addressed in a comment.
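A minimal sketch of the two-slot alternative suggested here, where only [job-1, job0] are kept and any older request id is reported as unknown. LastTwoJobs and its methods are hypothetical; whether two slots are enough in practice is exactly the open question this comment raises.

```rust
/// Simplified, hypothetical stand-in for JDC's LastDeclareJob.
#[derive(Debug, Clone, PartialEq)]
struct LastDeclareJob {
    request_id: u32,
}

/// Hypothetical fixed-size store holding only the previous and the current job.
#[derive(Default)]
struct LastTwoJobs {
    slots: [Option<LastDeclareJob>; 2], // [job-1, job0]
}

impl LastTwoJobs {
    /// Pushes a newly declared job; the oldest one is dropped.
    fn push(&mut self, job: LastDeclareJob) {
        self.slots[0] = self.slots[1].take();
        self.slots[1] = Some(job);
    }

    /// Finds a job by request id. Anything older than job-1 is unknown (None),
    /// which the caller can treat as a reason to close the connection.
    fn get(&self, request_id: u32) -> Option<&LastDeclareJob> {
        self.slots.iter().flatten().find(|j| j.request_id == request_id)
    }
}

fn main() {
    let mut store = LastTwoJobs::default();
    for id in 1..=3 {
        store.push(LastDeclareJob { request_id: id });
    }
    assert_eq!(store.get(3).map(|j| j.request_id), Some(3)); // job0
    assert_eq!(store.get(2).map(|j| j.request_id), Some(2)); // job-1
    assert!(store.get(1).is_none()); // older than job-1: unknown
}
```

Unlike a HashMap with a size cap, this structure cannot grow at all, so the "how many jobs at once" assumption is encoded in the type.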

GitGab19 (Collaborator, Author)

@Fi3 Thanks for the review.
I opened an issue to track the changes you suggested and to add an MG test for this specific scenario.

plebhash mentioned this pull request on May 23, 2024
Fi3 pushed a commit that referenced this pull request on May 28, 2024
plebhash mentioned this pull request on May 28, 2024
plebhash mentioned this pull request on Aug 13, 2024
Successfully merging this pull request may close these issues:
JDC executes fallback system too frequently

3 participants