
Message queues #19
Open · 3 tasks
jonathan-d-zhang opened this issue Aug 1, 2023 · 4 comments
Labels: enhancement (New feature or request)
@jonathan-d-zhang

Tracking issue for implementing the message queue.

  • Client
  • Mainframe
  • Cronjob/uploader thing
@import-pandas-as-numpy
Member

Minimum project spec for RabbitMQ integration:

  • The Loader restarting should never cause duplicate messages to propagate down to clients.
  • The RabbitMQ instance restarting should never cause duplicate messages to propagate down to clients.
  • A client failing to ack a job (whether it was terminated, lost its connection, or entered an unhandled fail state) should cause the package to be requeued.
  • The RabbitMQ instance must support and enforce authentication for modifying the queue in any respect.
  • Messages should NEVER, by default, enter a state where they are repeatedly requeued and retried. The clearest example is file-size failures: a client should never receive a job that has already been rejected for exceeding the file-size limit unless that client is explicitly flagged to handle such files.

Rationale: The premise is to keep work off client nodes as much as possible. Sending duplicate information to the client and relying on server-side deduplication represents an enormous amount of wasted compute. This deduplication must occur before a client ever interfaces with a job, and an effort should be made to cover a broad range of edge cases.
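
A minimal consumer-side sketch of the ack/requeue behaviour above, assuming the pika client; the queue name, size limit, and scan() helper are illustrative placeholders rather than project code. Manual acknowledgements give the requeue-on-disconnect behaviour, and nacking without requeue is one way to keep an unrecoverable job from looping back forever:

```python
# Sketch only: queue name, size limit, and scan() are placeholders, not project code.
import pika

MAX_FILE_SIZE = 128 * 1024 * 1024  # illustrative unrecoverable-rejection condition

connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()

# Durable queue (messages should also be published as persistent) so a broker
# restart does not drop in-flight jobs.
channel.queue_declare(queue="jobs", durable=True)


def scan(body: bytes) -> None:
    """Hypothetical placeholder for the actual scanning routine."""


def on_job(ch, method, properties, body):
    try:
        if len(body) > MAX_FILE_SIZE:
            # Unrecoverable condition: reject WITHOUT requeueing so the job
            # never enters a queue/retry loop against clients that cannot handle it.
            ch.basic_nack(delivery_tag=method.delivery_tag, requeue=False)
            return
        scan(body)
        ch.basic_ack(delivery_tag=method.delivery_tag)
    except Exception:
        # Unhandled failure: return the job to the queue for another client.
        ch.basic_nack(delivery_tag=method.delivery_tag, requeue=True)
        raise


# auto_ack=False means a client that is terminated or drops its connection
# before acking causes RabbitMQ to requeue the delivery automatically.
channel.basic_consume(queue="jobs", on_message_callback=on_job, auto_ack=False)
channel.start_consuming()
```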

Furthermore:

  • Modifications to the current codebase will not be accepted without tests. Our system works now, and switching over to the MQ setup represents a massive number of modifications to all components of our scanning framework. This will be tested extensively before acceptance.

@Robin5605

Some thoughts on authentication:
We can use RabbitMQ's built-in Authentication, Authorization, and Access Control feature to provision each client with a username/password combination.

  • All clients should be restricted to basic.publish on the results queue and basic.consume on the incoming jobs queue.
  • Mainframe should also have provisioned credentials, with only basic.consume on the results queue.
  • Loader should have basic.publish on the incoming jobs queue.
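
For illustration, a hedged sketch of how a provisioned client might connect with those credentials using pika; the host, vhost, usernames, and queue names are assumptions, not actual deployment values:

```python
# Sketch only: host, vhost, credentials, and queue names are placeholders.
import pika

credentials = pika.PlainCredentials("client-01", "per-client-secret")
params = pika.ConnectionParameters(
    host="rabbitmq.example.internal",
    virtual_host="dragonfly",
    credentials=credentials,
)
connection = pika.BlockingConnection(params)
channel = connection.channel()

# Under the permissions described above, this user can publish to the results
# queue and consume from the incoming jobs queue; anything else is refused by
# the broker with an ACCESS_REFUSED channel error.
channel.basic_publish(exchange="", routing_key="results", body=b'{"status": "ok"}')
```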

@Robin5605

On further thought - is there a need for a return queue? Can clients simply POST their results directly to the Dragonfly API?
I assume that since they will have to interface with the API anyway to fetch their ruleset when they detect they're out of date, they may as well POST results directly to the API.

@import-pandas-as-numpy
Member

> On further thought - is there a need for a return queue? Can clients simply POST their results directly to the Dragonfly API? I assume that since they will have to interface with the API anyway to fetch their ruleset when they detect they're out of date, they may as well POST results directly to the API.

Having a return queue likely helps alleviate situations where many clients are scanning many packages (and the API cannot keep up), but I'm not sure that's a goal worth aspiring to right now. It's definitely effective future-proofing, though.

The intention when we discussed this was that clients would POST directly to the API anyway. Clients bouncing a single request for rules off the API itself wasn't something I had really considered; typically these jobs are queued with the current ruleset SHA, correct? I don't see that being an issue, but I'd be a little concerned that if we ever spin up multiple clients, and they're moving through packages quickly, we would be making potentially hundreds of requests to this endpoint per second. Not that it's serving that much, and I don't anticipate it would be an issue, but it bears mentioning nonetheless. If we put it behind any sort of rate limiting, we'll likely footgun ourselves in that regard.
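
As a rough sketch of the flow being discussed, under the assumption that each queued job carries the ruleset SHA it was queued against; the endpoint paths and payload shape are illustrative, not the actual Dragonfly API:

```python
# Illustrative only: endpoints, payload shape, and the per-job ruleset SHA
# are assumptions about the design discussed above, not the real API.
import requests

API = "https://dragonfly.example/api"

cached_sha: str | None = None
cached_rules: dict | None = None


def ensure_rules(job_ruleset_sha: str) -> dict:
    """Fetch rules only when the job was queued against a newer ruleset."""
    global cached_sha, cached_rules
    if job_ruleset_sha != cached_sha:
        resp = requests.get(f"{API}/rules")
        resp.raise_for_status()
        cached_rules = resp.json()
        cached_sha = job_ruleset_sha
    return cached_rules


def post_results(package: str, version: str, matches: list[str]) -> None:
    """Clients POST results straight to the API instead of using a return queue."""
    resp = requests.post(
        f"{API}/results",
        json={"package": package, "version": version, "matches": matches},
    )
    resp.raise_for_status()
```

Caching on the SHA keeps the rules-endpoint traffic bounded by ruleset updates rather than by package throughput, which is the concern about many fast clients raised above.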
