
Message queues #19
Open · 3 tasks
jonathan-d-zhang opened this issue Aug 1, 2023 · 4 comments
Labels: enhancement (New feature or request)
@jonathan-d-zhang

Tracking issue for implementing the message queue.

  • Client
  • Mainframe
  • Cronjob/uploader thing
@import-pandas-as-numpy
Member

Minimum project spec for RabbitMQ integration:

  • The Loader restarting should never cause duplicate messages to propagate down to clients.
  • The RabbitMQ instance restarting should never cause duplicate messages to propagate down to clients.
  • A client failing to ack a job (whether it was terminated, lost its connection, or entered an unhandled fail state) should cause the package to be requeued.
  • The RabbitMQ instance must support and enforce authentication for modifying the queue in any respect.
  • Messages should NEVER, by default, enter a state where they are repeatedly requeued and retried. The clearest example is file-size failures: a client should never receive a job that has already been rejected for exceeding the file-size limit unless that client is explicitly flagged to handle such files.

Rationale: The premise is to keep work off client nodes as much as possible. Sending duplicate information to the client and relying on server-side deduplication represents an enormous amount of wasted compute. This deduplication must occur before a client ever interfaces with a job, and an effort should be made to cover a broad range of edge cases.
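
A minimal consumer-side sketch of the ack/requeue behaviour above, assuming the pika client; the queue name, size limit, and scan() helper are illustrative placeholders rather than project code. Manual acknowledgements give the requeue-on-disconnect behaviour, and nacking without requeue is one way to keep an unrecoverable job from looping back forever:

```python
# Sketch only: queue name, size limit, and scan() are placeholders, not project code.
import pika

MAX_FILE_SIZE = 128 * 1024 * 1024  # illustrative unrecoverable-rejection condition

connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()

# Durable queue (messages should also be published as persistent) so a broker
# restart does not drop in-flight jobs.
channel.queue_declare(queue="jobs", durable=True)


def scan(body: bytes) -> None:
    """Hypothetical placeholder for the actual scanning routine."""


def on_job(ch, method, properties, body):
    try:
        if len(body) > MAX_FILE_SIZE:
            # Unrecoverable condition: reject WITHOUT requeueing so the job
            # never enters a queue/retry loop against clients that cannot handle it.
            ch.basic_nack(delivery_tag=method.delivery_tag, requeue=False)
            return
        scan(body)
        ch.basic_ack(delivery_tag=method.delivery_tag)
    except Exception:
        # Unhandled failure: return the job to the queue for another client.
        ch.basic_nack(delivery_tag=method.delivery_tag, requeue=True)
        raise


# auto_ack=False means a client that is terminated or drops its connection
# before acking causes RabbitMQ to requeue the delivery automatically.
channel.basic_consume(queue="jobs", on_message_callback=on_job, auto_ack=False)
channel.start_consuming()
```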

Furthermore:

  • Modifications to the current codebase will not be accepted without tests. Our system works now, and switching over to the MQ setup represents a massive number of modifications to all components of our scanning framework. This will be tested extensively before acceptance.

@Robin5605

Some thoughts on authentication:
We can use RabbitMQ's built-in Authentication, Authorization, and Access Control feature to provision each client with a username/password combination.

  • All clients should be restricted to basic.publish on the results queue and basic.consume on the incoming jobs queue.
  • Mainframe should also have provisioned credentials, with only basic.consume on the results queue.
  • Loader should have basic.publish on the incoming jobs queue.
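
For illustration, a hedged sketch of how a provisioned client might connect with those credentials using pika; the host, vhost, usernames, and queue names are assumptions, not actual deployment values:

```python
# Sketch only: host, vhost, credentials, and queue names are placeholders.
import pika

credentials = pika.PlainCredentials("client-01", "per-client-secret")
params = pika.ConnectionParameters(
    host="rabbitmq.example.internal",
    virtual_host="dragonfly",
    credentials=credentials,
)
connection = pika.BlockingConnection(params)
channel = connection.channel()

# Under the permissions described above, this user can publish to the results
# queue and consume from the incoming jobs queue; anything else is refused by
# the broker with an ACCESS_REFUSED channel error.
channel.basic_publish(exchange="", routing_key="results", body=b'{"status": "ok"}')
```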

@Robin5605

On further thought - is there a need for a return queue? Can clients simply POST their results directly to the Dragonfly API?
I assume that since they will have to interface with the API anyway to fetch their ruleset when they detect they're out of date, they may as well POST results directly to the API.

@import-pandas-as-numpy
Member

> On further thought - is there a need for a return queue? Can clients simply POST their results directly to the Dragonfly API? I assume that since they will have to interface with the API anyway to fetch their ruleset when they detect they're out of date, they may as well POST results directly to the API.

Having a return queue likely helps alleviate situations where many clients are scanning many packages (and the API cannot keep up), but I'm not sure that's a goal worth aspiring to right now. It's definitely effective future-proofing, though.

The intention when we discussed this was that clients would POST directly to the API anyway. Clients bouncing a single request for rules off the API itself wasn't something I had really considered; typically these jobs are queued with the current ruleset SHA, correct? I don't see that being an issue, but I'd be a little concerned that if we ever spin up multiple clients, and they're moving through packages quickly, we would be making potentially hundreds of requests to this endpoint per second. Not that it's serving that much, and I don't anticipate it would be an issue, but it bears mentioning nonetheless. If we put it behind any sort of rate limiting, we'll likely footgun ourselves in that regard.
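
As a rough sketch of the flow being discussed, under the assumption that each queued job carries the ruleset SHA it was queued against; the endpoint paths and payload shape are illustrative, not the actual Dragonfly API:

```python
# Illustrative only: endpoints, payload shape, and the per-job ruleset SHA
# are assumptions about the design discussed above, not the real API.
import requests

API = "https://dragonfly.example/api"

cached_sha: str | None = None
cached_rules: dict | None = None


def ensure_rules(job_ruleset_sha: str) -> dict:
    """Fetch rules only when the job was queued against a newer ruleset."""
    global cached_sha, cached_rules
    if job_ruleset_sha != cached_sha:
        resp = requests.get(f"{API}/rules")
        resp.raise_for_status()
        cached_rules = resp.json()
        cached_sha = job_ruleset_sha
    return cached_rules


def post_results(package: str, version: str, matches: list[str]) -> None:
    """Clients POST results straight to the API instead of using a return queue."""
    resp = requests.post(
        f"{API}/results",
        json={"package": package, "version": version, "matches": matches},
    )
    resp.raise_for_status()
```

Caching on the SHA keeps the rules-endpoint traffic bounded by ruleset updates rather than by package throughput, which is the concern about many fast clients raised above.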
