[Bug]: upload_mlbf_to_remote_settings hangs on the diff #15170

Open
1 task done
KevinMind opened this issue Nov 15, 2024 · 1 comment · Fixed by mozilla/addons-server#22859 · May be fixed by mozilla/addons-server#22861
Assignees
Labels
repository:addons-server Issue relating to addons-server

Comments

@KevinMind
Contributor

KevinMind commented Nov 15, 2024

What happened?

When trying to compute changed_count (which requires generating a diff), we end up executing an O(n^2) loop over millions of records. The loop hangs without a timeout or error, which prevents the cron job from finishing. The only signal we get is the Grafana "filter not uploaded" error.

grafana: https://earthangel-b40313e5.influxcloud.net/d/IWZpIQgMk?orgId=1
logs: https://console.cloud.google.com/kubernetes/pod/us-west1/webservices-high-prod/amo-prod/addons-server-v1-cronjob-upload-mlbf-to-remote-setti-28860rv9s9/details?invt=Abhivw&project=moz-fx-webservices-high-prod&cloudshell=true


What did you expect to happen?

The loop should take a matter of seconds and have O(n) complexity.
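As a rough illustration of the difference (not the actual addons-server code), the sketch below contrasts a diff built with list membership checks, which is quadratic, against one built with hash sets, which is linear. The function names `diff_quadratic`, `diff_linear`, and `changed_count_sketch`, and the string item format, are hypothetical and chosen only for this example.

```python
from typing import Iterable, List, Set, Tuple


def diff_quadratic(previous: List[str], current: List[str]) -> Tuple[List[str], List[str]]:
    """O(n * m): every `in` check on a list scans the whole list."""
    added = [item for item in current if item not in previous]
    removed = [item for item in previous if item not in current]
    return added, removed


def diff_linear(previous: Iterable[str], current: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """O(n + m): build each hash set once, then take set differences."""
    previous_set = set(previous)
    current_set = set(current)
    added = current_set - previous_set
    removed = previous_set - current_set
    return added, removed


def changed_count_sketch(previous: Iterable[str], current: Iterable[str]) -> int:
    """Hypothetical stand-in for a changed count: number of added plus removed items."""
    added, removed = diff_linear(previous, current)
    return len(added) + len(removed)
```

With millions of records on each side, the list-based version performs on the order of 10^12 comparisons or more, while the set-based version does a few million hash lookups. This assumes the items are hashable and that ordering of the diff does not matter.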

Additionally, if this type of error were to happen, we should be able to catch it on stage before it reaches prod. We should also get more explicit warnings or errors when this kind of loop runs for a long time: it has been over 12 hours with no notifications.
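One possible shape for such a warning, sketched here under assumptions rather than taken from the project, is a small context manager that logs if a block of work exceeds a time threshold. The name `warn_if_slow`, the logger name, and the `on_slow` callback are all hypothetical; only the standard library is used.

```python
import logging
import threading
from contextlib import contextmanager
from typing import Callable, Iterator, Optional

log = logging.getLogger("mlbf.watchdog")  # hypothetical logger name


@contextmanager
def warn_if_slow(
    label: str,
    threshold_seconds: float,
    on_slow: Optional[Callable[[str], None]] = None,
) -> Iterator[None]:
    """Log a warning if the wrapped block is still running after the threshold."""

    def _fire() -> None:
        log.warning("%s still running after %.1fs", label, threshold_seconds)
        if on_slow is not None:
            on_slow(label)  # e.g. send a notification (hypothetical hook)

    timer = threading.Timer(threshold_seconds, _fire)
    timer.daemon = True  # never keep the process alive just for the timer
    timer.start()
    try:
        yield
    finally:
        timer.cancel()  # no-op if the timer has already fired
```

Wrapping the diff step in `with warn_if_slow("mlbf diff", 600):` would surface a log line after ten minutes instead of silence. Note that this only reports the slow block; it does not interrupt it, so a hard timeout on the cron job would still be a separate measure.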

Is there an existing issue for this?

  • I have searched the existing issues

┆Issue is synchronized with this Jira Task

@willdurand
Member

> Additionally, if this type of error were to happen, we should be able to catch it on stage before prod and finally we should get more explicit warnings/errors that this kind of loop is running for a long time. It has been over 12 hours and no notifications.

This needs its own issue.

KevinMind reopened this Nov 15, 2024