Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes average stats bug #436

Merged
merged 3 commits into from
May 12, 2023
Merged

Conversation

zainkabani
Copy link
Contributor

When there are multiple concurrent active servers, the stats reporting reports a 0 value average. This is because of the old value being updated and then the average being overridden when a new server pointing to the same address stats gets updated.

SERVER STATS: "postgres_shard_0_replica_0", 452826896
total_value: 1648, old_value: 0
average: 329
SERVER STATS: "postgres_shard_0_replica_1", -1989878371
total_value: 1585, old_value: 0
average: 317
SERVER STATS: "postgres_shard_0_replica_0", 1447777444
total_value: 1648, old_value: 1648
average: 0
SERVER STATS: "postgres_shard_0_replica_0", -1279761882
total_value: 1648, old_value: 1648
average: 0
SERVER STATS: "postgres_shard_0_replica_1", 1763453831
total_value: 1585, old_value: 1585
average: 0
SERVER STATS: "postgres_shard_0_replica_1", 2126090134
total_value: 1585, old_value: 1585
average: 0
SERVER STATS: "postgres_shard_0_replica_0", -969830376
total_value: 1648, old_value: 1648
average: 0

This PR fixes this by running a check to make sure that we only ever do this once for any given AddressStats Arc object.

@zainkabani zainkabani marked this pull request as ready for review May 12, 2023 00:12
@@ -107,8 +107,19 @@ impl Collector {
loop {
interval.tick().await;

for stats in SERVER_STATS.read().values() {
stats.address_stats().update_averages();
// Hold read lock for duration of update to retain all server stats
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused here. We only have one instance of the collector running, so there is only one task updating averages. How can we run update_averages() twice or more even with multiple server connections connected to the same address?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This issue here comes from all the servers having the same reference to the AddressStats. When we update address stat's old_total. The next server that we try to update averages for will see no difference between the current total and the old total cause it was updated in the previous iteration of the loop

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have an example of this in the description of the PR

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aaah okay. Confusing! Arcs are tricky. Ok, sounds good, I think this is fine for now. Would be good to investigate further and maybe remove the leaked reference?

@levkk levkk merged commit 7326069 into postgresml:main May 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants