Agent QOS
#1378
Replies: 1 comment 5 replies
-
What happens if we reinterpret channel/notify to simply be something like a Rust Waker? Its job isn't to carry job details; it just awakens agents to poll again. Ideally it awakens a single idle one. Maybe it awakens all idle ones and they race, which would probably still be okay.
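Here's a rough sketch of what that could look like, using only the standard library. `WakeSignal` and its methods are hypothetical names made up for illustration, not anything that exists in the agent today. The signal carries no job details: it only records that "something changed, poll again", and repeated wake-ups coalesce into one.

```rust
use std::sync::{Condvar, Mutex};

/// Hypothetical wake-up signal: carries no job details, only "poll again".
pub struct WakeSignal {
    pending: Mutex<bool>,
    idle: Condvar,
}

impl WakeSignal {
    pub fn new() -> Self {
        WakeSignal {
            pending: Mutex::new(false),
            idle: Condvar::new(),
        }
    }

    /// Called when a notification arrives. Repeated calls coalesce
    /// into a single pending wake-up.
    pub fn wake(&self) {
        *self.pending.lock().unwrap() = true;
        // Wake a single idle agent. Using `notify_all` here instead would
        // wake every idle agent and let them race.
        self.idle.notify_one();
    }

    /// Blocks until a wake-up is pending, then consumes it.
    pub fn wait(&self) {
        let mut pending = self.pending.lock().unwrap();
        while !*pending {
            pending = self.idle.wait(pending).unwrap();
        }
        *pending = false;
    }

    /// Non-blocking variant: returns true if a wake-up was pending,
    /// and consumes it.
    pub fn try_consume(&self) -> bool {
        let mut pending = self.pending.lock().unwrap();
        std::mem::replace(&mut *pending, false)
    }
}
```

Because wake-ups coalesce, an agent that gets woken would need to keep dequeuing until it finds nothing left, rather than assuming one wake-up means one job.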
-
We've observed significant backlogs in the agent due to the high volume of auto-discover jobs. This has a severe negative impact on user experience: because the agent processes jobs sequentially, users sometimes need to sit and wait for the agent to process 30+ auto-discovers before it will even start to process their interactive discover. So let's talk about what we can do to fix this.
The most obvious and straightforward solution seems to be the introduction of a prioritization mechanism within the agent, so that it will process interactive jobs before processing any automated background jobs.
The main issue with that approach is due to how the agent processes jobs. Each time a job row is inserted or updated, the agent sends a `HandlerInvocation` onto an unbounded channel. There's one channel for all job types, and we process each `HandlerInvocation` in the order it was received. The `HandlerInvocation` doesn't contain any information about the individual job. It only has the name of the job table (e.g. `publications`). So there's no direct way to prioritize the different `HandlerInvocation`s.
Put another way: It would be easy to update the `dequeue` function for each handler to have it prioritize interactive jobs over automated ones. But that wouldn't help us prioritize an interactive `publications` job over an automated `discovers` job. If `n` background `discovers` jobs are created, followed by 1 interactive `publications` job, then all `n` background `discovers` jobs would still be processed before the interactive `publications` job.
The behavior we want is for all interactive jobs, of all types, to be processed before any background jobs.
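To make the ordering problem concrete, here's a small standalone sketch. The `HandlerInvocation` struct below is a simplified stand-in that I'm assuming for illustration (the `table_name` field and the `fifo_processing_order` helper are made up, not the agent's actual code), and `std::sync::mpsc::channel` plays the role of the single unbounded channel.

```rust
use std::sync::mpsc;

/// Simplified stand-in for the agent's `HandlerInvocation`:
/// it only knows which job table changed, nothing about the job itself.
#[derive(Debug, Clone, PartialEq)]
pub struct HandlerInvocation {
    pub table_name: &'static str,
}

/// Sends `n` background `discovers` invocations followed by one interactive
/// `publications` invocation, then returns them in the order a FIFO
/// consumer would see them.
pub fn fifo_processing_order(n: usize) -> Vec<HandlerInvocation> {
    // One unbounded queue shared by all job types.
    let (tx, rx) = mpsc::channel();
    for _ in 0..n {
        tx.send(HandlerInvocation { table_name: "discovers" }).unwrap();
    }
    tx.send(HandlerInvocation { table_name: "publications" }).unwrap();
    // Drop the sender so the receiver's iterator terminates.
    drop(tx);
    rx.iter().collect()
}
```

With nothing but the table name to go on, the consumer has no basis for pulling the `publications` invocation ahead of the `discovers` ones, so it always comes out last.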
One possible path forward would be to update the event payload that's sent with `NOTIFY` to include a boolean indicating whether it's a background job. Currently these notifications only contain the table name. Adding a `background` boolean would allow us to put `HandlerInvocation`s in a sorted data structure and handle the interactive ones before the background ones.
That's not quite as straightforward as it sounds. For one thing, we still use polling, so we'd need to somehow account for `background` there too. Perhaps each poll of a handler could actually be two separate polls, one that allows background jobs and another that does not. The other issue is that we currently execute `pg_notify` for every update of a job table, even those that update it to a terminal status. I expect we'll want to update that logic to only notify for jobs where `job_status->>'type' = 'queued'`.
At this point, I'm reasonably confident that we can significantly improve the latency of handling interactive jobs with this approach. But overall, our job handling still feels pretty gross and unwieldy. I don't like how we can get notified about one job and then `dequeue` a different job. And there's still the issue of how we hold locks for a long time when processing publications, which seems like it will require a slightly different approach for how we handle jobs. So I'm going to spend a little time thinking about job handling more holistically, and see if there's a way to solve both of those issues at once. If I don't think of anything better soon, then we can just do the relatively quick and dirty thing described above.
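For reference, the "sorted data structure" part of the quick and dirty approach might look roughly like this. It's only a sketch under assumptions: `PrioritizedInvocation`, its `seq` arrival counter, and `processing_order` are all hypothetical names, and it says nothing about how polling or the `NOTIFY` payload would actually be wired up.

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

/// Hypothetical invocation carrying the proposed `background` flag.
/// `seq` is an assumed arrival counter that preserves FIFO order
/// within each priority class.
#[derive(Debug, PartialEq, Eq)]
pub struct PrioritizedInvocation {
    pub table_name: &'static str,
    pub background: bool,
    pub seq: u64,
}

impl Ord for PrioritizedInvocation {
    fn cmp(&self, other: &Self) -> Ordering {
        // `BinaryHeap` is a max-heap, so "greater" means "handled sooner":
        // interactive beats background, then the lower `seq` wins.
        other
            .background
            .cmp(&self.background)
            .then_with(|| other.seq.cmp(&self.seq))
            .then_with(|| self.table_name.cmp(other.table_name))
    }
}

impl PartialOrd for PrioritizedInvocation {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

/// Returns table names in the order the invocations would be handled.
pub fn processing_order(invocations: Vec<PrioritizedInvocation>) -> Vec<&'static str> {
    let mut heap: BinaryHeap<PrioritizedInvocation> = invocations.into_iter().collect();
    let mut order = Vec::new();
    while let Some(inv) = heap.pop() {
        order.push(inv.table_name);
    }
    order
}
```

With three background `discovers` invocations queued ahead of an interactive `publications` one, the `publications` invocation now comes out first, and invocations within the same class keep their arrival order.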