Optimize signal ingestion pipeline and create perf-test cli util #3337

Merged
merged 6 commits into from
May 4, 2023

wssheldon
Contributor

@wssheldon wssheldon commented May 2, 2023

This PR introduces a blocking, thread-safe FIFO queue in the signal ingestion pipeline. This ensures signals are processed in the order they were received and prevents race conditions where a signal instance can overwrite a shared value (e.g. case associations). The queue has a maximum size of 500, and the same value limits the number of instances retrieved by the query in process_signals. We also create a scoped_session for each run of the scheduler.
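To illustrate the idea, here is a minimal sketch of a bounded, blocking FIFO queue built on Python's standard `queue.Queue`. It is not the code from this PR: `process_in_order` and `handler` are hypothetical names, and a single consumer thread is assumed so that arrival order is preserved.

```python
import queue
import threading

MAX_SIGNAL_INSTANCES = 500  # queue bound named in this PR


def process_in_order(instance_ids, handler):
    """Feed ids through a bounded FIFO queue and handle them in arrival order.

    Hypothetical helper for illustration only.
    """
    q = queue.Queue(maxsize=MAX_SIGNAL_INSTANCES)  # put() blocks while the queue is full
    sentinel = object()  # marks the end of the stream
    processed = []

    def worker():
        while True:
            item = q.get()  # blocks until an item is available
            if item is sentinel:
                break
            processed.append(handler(item))

    consumer = threading.Thread(target=worker)
    consumer.start()
    for instance_id in instance_ids:
        q.put(instance_id)  # producer waits here when 500 items are pending
    q.put(sentinel)
    consumer.join()
    return processed


result = process_in_order(range(1000), lambda instance_id: instance_id)
```

Because only one thread consumes from the queue, two instances never update shared state at the same time, which is the property the description relies on.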

The ingestion query now includes a bound, .filter(SignalInstance.created_at >= one_hour_ago), so instances older than one hour are no longer picked up. Falling an hour behind should never happen; if it does, skipping those older instances is an acceptable trade-off to prevent performance issues and outages.
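The effect of the one-hour bound combined with the 500-instance limit can be sketched in plain Python. The `Instance` class and `select_for_processing` function below are hypothetical stand-ins for the SQLAlchemy model and query, and the oldest-first ordering is an assumption made to match the FIFO behaviour described above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

MAX_SIGNAL_INSTANCES = 500


@dataclass
class Instance:  # hypothetical stand-in for SignalInstance
    id: int
    created_at: datetime


def select_for_processing(instances, now, limit=MAX_SIGNAL_INSTANCES):
    """Keep instances created within the last hour, oldest first, capped at `limit`."""
    one_hour_ago = now - timedelta(hours=1)
    recent = [i for i in instances if i.created_at >= one_hour_ago]
    recent.sort(key=lambda i: i.created_at)  # assumed ordering: oldest first
    return recent[:limit]
```

An instance created exactly one hour ago is still selected, since the comparison is `>=`.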

Caching logic was introduced in _should_update_signal_message to prevent the Slack chat.update method from being rate limited.
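The description does not say how the cache decides when an update is allowed. One possible approach, shown purely as an assumption, is to remember when each message was last updated and skip updates that arrive too soon. `UpdateThrottle`, `should_update`, and the interval value are hypothetical and are not taken from this PR.

```python
import time


class UpdateThrottle:
    """Allow at most one update per key within `min_interval_seconds` (hypothetical sketch)."""

    def __init__(self, min_interval_seconds, clock=time.monotonic):
        self._min_interval = min_interval_seconds
        self._clock = clock  # injectable so the behaviour can be tested without sleeping
        self._last_update = {}  # key -> time of the last allowed update

    def should_update(self, key):
        now = self._clock()
        last = self._last_update.get(key)
        if last is not None and now - last < self._min_interval:
            return False  # too soon: skip the API call
        self._last_update[key] = now
        return True
```

A caller would check `should_update(message_key)` before calling the Slack API and skip the call when it returns `False`.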

Finally, a stress test CLI utility was created that attempts to replicate signal ingestion via the /instances API endpoint.

Usage:

dispatch scheduler perf-test --api-token redacted --num-instances 5000
...
Sent 5000 of 5000 signal instances:/Users/wshel/Projects/dispatch/src/dispatch/cli.py:send_signal_instances:769
Elapsed time: 281.17 seconds
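The core loop of such a stress test can be sketched as follows. The function name matches the one in the log line above, but the body is an assumption rather than the code from this PR: the `send` callable and the payload shape are hypothetical, and in a real run `send` would POST to the /instances endpoint with the API token.

```python
import time


def send_signal_instances(send, num_instances):
    """Call `send` once per instance and return (sent_count, elapsed_seconds)."""
    start = time.perf_counter()
    sent = 0
    for n in range(num_instances):
        send({"raw": {"id": n}})  # hypothetical payload shape
        sent += 1
    elapsed = time.perf_counter() - start
    return sent, elapsed


# Usage with a stand-in sender that only records payloads.
collected = []
sent, elapsed = send_signal_instances(collected.append, 10)
print(f"Sent {sent} of 10 signal instances")
print(f"Elapsed time: {elapsed:.2f} seconds")
```

Passing the sender in as a callable keeps the loop testable without a running Dispatch server.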

I tested a push of 5000 instances from my M1 MBP.

timer results:

Non-deduplicated signal instance (calls many external APIs and results in a new case).

DEBUG:function.elapsed.time.dispatch.signal.scheduled.process_signal_instance: 29.435003541992046

Deduplicated signal instance.

DEBUG:function.elapsed.time.dispatch.signal.scheduled.process_signal_instance: 0.0374222089885734
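Timings like the two above can be produced by a small decorator that logs elapsed wall-clock time at DEBUG level. This is a sketch under assumptions: the `timer` name and the logger-name scheme are inferred from the log lines and may not match the decorator Dispatch actually uses.

```python
import functools
import logging
import time


def timer(func):
    """Log how long each call to `func` takes, in seconds, at DEBUG level."""
    # Assumed naming scheme, inferred from the log output above.
    logger = logging.getLogger(
        f"function.elapsed.time.{func.__module__}.{func.__name__}"
    )

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs even if func raises, so failed calls are timed too.
            logger.debug("%s", time.perf_counter() - start)

    return wrapper


@timer
def add(a, b):
    return a + b
```

With `logging.basicConfig(level=logging.DEBUG)` enabled, each call emits one line carrying the logger name and the elapsed seconds.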

We can play around with these configurable options to see what makes the most sense:

@scheduler.add(every(1).minutes, name="signal-process")
MAX_SIGNAL_INSTANCES = 500
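As a rough check on these options, the timings above give best-case and worst-case durations for one full batch. The calculation assumes instances in a run are handled one at a time, which the description implies but does not state outright, so treat the figures as estimates only.

```python
MAX_SIGNAL_INSTANCES = 500
RUN_INTERVAL_SECONDS = 60  # every(1).minutes

# Per-instance timings reported above (seconds).
NON_DEDUPLICATED_SECONDS = 29.435003541992046
DEDUPLICATED_SECONDS = 0.0374222089885734

# Time to drain one full batch if every instance is of the same kind.
worst_case_batch = MAX_SIGNAL_INSTANCES * NON_DEDUPLICATED_SECONDS
best_case_batch = MAX_SIGNAL_INSTANCES * DEDUPLICATED_SECONDS

# How many new-case instances fit into a single scheduler interval.
new_cases_per_interval = int(RUN_INTERVAL_SECONDS // NON_DEDUPLICATED_SECONDS)

print(f"all deduplicated: {best_case_batch:.1f} s per batch")
print(f"all new cases:    {worst_case_batch:.1f} s per batch")
print(f"new cases per interval: {new_cases_per_interval}")
```

A batch made up entirely of new cases would run far longer than the one-minute interval, so the right values depend heavily on the share of deduplicated signals.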

@wssheldon wssheldon added the enhancement New feature or request label May 2, 2023
@wssheldon wssheldon requested a review from mvilanova May 2, 2023 16:00
src/dispatch/cli.py Outdated Show resolved Hide resolved