-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FIFO queues #279
FIFO queues #279
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is the code for parsing SNS attributes from SQS messages that would need some adjustment
lib/template.js
Outdated
@@ -461,7 +477,7 @@ module.exports = (options = {}) => { | |||
Type: 'AWS::ECS::Service', | |||
Properties: { | |||
Cluster: options.cluster, | |||
DesiredCount: options.minSize, | |||
DesiredCount: !options.fifo ? options.minSize : 1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👉 pinning the number of containers to 1
lib/template.js
Outdated
@@ -475,7 +491,7 @@ module.exports = (options = {}) => { | |||
options.placementStrategies; | |||
|
|||
|
|||
Resources[prefixed('ScalingRole')] = { | |||
if (!options.fifo) Resources[prefixed('ScalingRole')] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👉 disabling autoscaling
lib/template.js
Outdated
@@ -511,7 +527,7 @@ module.exports = (options = {}) => { | |||
} | |||
}; | |||
|
|||
Resources[prefixed('ScalingTarget')] = { | |||
if (!options.fifo) Resources[prefixed('ScalingTarget')] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👉 disabling autoscaling
lib/template.js
Outdated
@@ -528,7 +544,7 @@ module.exports = (options = {}) => { | |||
} | |||
}; | |||
|
|||
Resources[prefixed('ScaleUp')] = { | |||
if (!options.fifo) Resources[prefixed('ScaleUp')] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👉 disabling autoscaling
lib/template.js
Outdated
@@ -548,7 +564,7 @@ module.exports = (options = {}) => { | |||
} | |||
}; | |||
|
|||
Resources[prefixed('ScaleUpTrigger')] = { | |||
if (!options.fifo) Resources[prefixed('ScaleUpTrigger')] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👉 disabling autoscaling
lib/template.js
Outdated
@@ -734,7 +750,7 @@ module.exports = (options = {}) => { | |||
} | |||
}; | |||
|
|||
Resources[prefixed('LambdaScalingRole')] = { | |||
if (!options.fifo) Resources[prefixed('LambdaScalingRole')] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👉 disabling autoscaling
lib/template.js
Outdated
|
||
Resources[prefixed('ScalingLambda')] = { | ||
if (!options.fifo) Resources[prefixed('ScalingLambda')] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👉 disabling autoscaling
lib/template.js
Outdated
@@ -785,7 +801,7 @@ module.exports = (options = {}) => { | |||
} | |||
}; | |||
|
|||
Resources[prefixed('CustomScalingResource')] = { | |||
if (!options.fifo) Resources[prefixed('CustomScalingResource')] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👉 disabling autoscaling
@rclark I added some documentation to this branch. I think that the functionality is tested by the combination of template validation and snapshots. So this is ready for a review. This branch seems to be working well in our prototype. But I'd be happy to sit on it until the functionality is more proven, if you're hesitant to merge into master. |
dde6ee4
to
29d4471
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @davidtheclark -- just a couple of documentation points to cover. Then we should merge/tag/publish a minor version when you're ready to roll with your first fifo system.
docs/using-a-fifo-queue.md
Outdated
|
||
To use a FIFO queue, set the template option `fifo: true`. | ||
|
||
A FIFO queue behaves differently and has some limitations, so you should read the AWS documentation and the details below if you are considering the switch. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we link out here to what AWS documentation is helpful to read?
docs/using-a-fifo-queue.md
Outdated
|
||
### Triggering FIFO workers | ||
|
||
With a standard queue, you trigger a Watchbot worker by sending an SNS message to the watcher's topic. With a FIFO queue, you will send a message directly to the watcher's FIFO queue, instead. Make sure you have read about the AWS documentation about the expected for these messages (for example, you need to include a `MessageGroupId`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sure you have read about the AWS documentation about the expected for these messages
This sentence is a little clumsy. Additional about
and expected
vs expectations
(I think?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haha.
@rclark the AWS docs for dead-letter queues suggest not using one for most FIFO queues:
Do you think we should worry about this and add the ability to turn off the dead letter queue? Or should we just recommend that you should be cognizant and maybe set the |
Perhaps we should get more clarity here, but generally I don't believe this suggestion. If you have a dead letter threshold of 10, then the FIFO queue should try that message 10 times and if it can't process, drop it in the dead letter queue. Now the FIFO queue can move on to the next message. Strictly speaking you are now out of order. This could be important for some applications where maybe the next message literally cannot proceed without the first one succeeding. But for many use cases, that may not be true -- the series of messages may not depend on each other but need ordering for another purpose. So maybe there's a |
Yeah, I agree that the suggestion does not apply to our flagship FIFO use case, where we actually do want a dead-letter queue. What do you think of ignoring this situation for now? If in the future somebody needs the strict behavior we could add a |
Sure: build it when the need arises. |
I remembered that we need to merge and release this! Everything has been working just fine in our Publisher tests so far. Don't want to let this branch get more stale. I squashed and rebased off master. @rclark could you take a quick look over the diff to double-check that my rebase didn't introduce any suspect lines? Then I will, as you suggest above, "merge/tag/publish a minor version". cc @mapbox/frontend-platform |
Published as v4.13.0 |
This PR explores adding a configuration option that allows you to launch a watchbot stack that operates against a FIFO queue. On the first pass, the biggest adjustments watchbot and its users need to be aware of:
Usually watchbot makes an SNS topic to feed an SQS queue. When an SQS message is received, it is expected that the message came via SNS, which leads to the
Subject
andMessage
(SNS message attributes) being parsed and exposed to the worker as environment variables. You can't send messages to a FIFO queue via SNS topic -- instead you have to send them directly to the queue. This means the parsing assumptions that watchbot makes may be invalid. The least invasive approach I can think of here would be to adjust message parsing in a backwards compatible way, and expose theMessageGroupId
to the worker as theSubject
environment variable, and theMessage
as the SQS message body.When feeding a message to a FIFO queue, you must provide a
MessageGroupId
. This acts as a sort of "shard" for the queue -- FIFO behavior is only guaranteed for messages that share an identicalMessageGroupId
. We need some additional documentation in here to explain this.Scaling behavior in watchbot is based on the total number of messages in the queue. If there are 50 messages, concurrent processing scales up to 50 (or the configured
MaxSize
if lower). In a FIFO queue, really you would want to scale on the number ofMessageGroupId
s that are being fed to your queue. If there are 50 messages in the queue, but only 2MessageGroupId
s, then watchbot would launch 50 containers for processing, but only 2 at a time would be able to process messages.Currently this PR locks the number of containers to
1
if you specifyfifo: true
, and turns off autoscaling entirely. This is probably not the right approach to take. Given that something like "number of MessageGroupIds" is not a metric that SQS provides in cloudwatch, and given that it is entirely dependent on how the user sends messages to the queue, I don't think we're going to find an ideal scaling behavior here. We should probably re-enable scaling, and document that in a FIFO queue, its important that you consider up-front how manyMessageGroupIds
your queue will have, and set your stack'sMaxSize
to that value or lower.fixes #278
cc @davidtheclark