
Optimize ORM usage, db_session instantiation, and tuning #3365

Merged: kevgliss merged 6 commits into master from enhancement/signal-process-optimization on May 15, 2023


@wssheldon (Contributor) commented May 5, 2023

This change speeds up processing of deduplicated Signal instances. By optimizing ORM usage across the flow (primarily by no longer loading unnecessary relationships, columns, and rows), this path is about 4x faster: processing 500 deduplicated instances now takes roughly 10 seconds.

create_signal_messages had the most room for improvement because it unnecessarily loaded all signal_instances and their associated raw data. It previously took about 3 seconds, and longer as more signals were ingested. It now takes about 0.06 seconds, roughly 55x faster according to the logs below.
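To illustrate the kind of change involved, here is a minimal sketch using the standard-library sqlite3 module rather than the SQLAlchemy ORM that Dispatch actually uses. The signal_instance table below is a hypothetical, simplified schema, and the sketch assumes the message only needs a summary such as the instance count. If it needs a few fields, selecting just those columns (load_only() in SQLAlchemy) has the same effect of leaving the raw payloads in the database.

```python
import json
import sqlite3

# Hypothetical, simplified schema: each instance belongs to a case and carries a raw JSON payload.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE signal_instance (id INTEGER PRIMARY KEY, case_id INTEGER, raw TEXT)")
conn.executemany(
    "INSERT INTO signal_instance (case_id, raw) VALUES (?, ?)",
    [(1, json.dumps({"payload": "x" * 1_000})) for _ in range(500)],
)


def instance_count_eager(conn: sqlite3.Connection, case_id: int) -> int:
    # Anti-pattern: materialize every row, raw payload included, only to count them.
    rows = conn.execute(
        "SELECT id, case_id, raw FROM signal_instance WHERE case_id = ?", (case_id,)
    ).fetchall()
    return len(rows)


def instance_count_lean(conn: sqlite3.Connection, case_id: int) -> int:
    # Let the database aggregate: a single integer comes back, no payloads reach Python.
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM signal_instance WHERE case_id = ?", (case_id,)
    ).fetchone()
    return count


print(instance_count_eager(conn, 1), instance_count_lean(conn, 1))  # 500 500
```

Both return the same answer; the difference is how much data is transferred and deserialized, which grows with the number of ingested instances in the eager version.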

Before:
DEBUG:function.elapsed.time.dispatch.plugins.dispatch_slack.case.messages.create_signal_messages: 3.1490371670006425

After:
DEBUG:function.elapsed.time.dispatch.plugins.dispatch_slack.case.messages.create_signal_messages: 0.056947083001432475:/Users/wshel/Projects/dispatch/src/dispatch/decorators.py:wrapper:185
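The DEBUG lines above are emitted by a timing wrapper in dispatch/decorators.py. The sketch below is a hypothetical approximation, not Dispatch's actual decorator: it measures wall-clock time with time.perf_counter() and logs a metric named after the function's module and qualified name, matching the shape of those lines. The names timer and logger are illustrative.

```python
import functools
import logging
import time

logger = logging.getLogger(__name__)


def timer(func):
    """Hypothetical approximation of a timing decorator; not Dispatch's real implementation."""

    @functools.wraps(func)  # keep func's name/docstring for logs and introspection
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Log even if func raises, so slow failures are still visible.
            elapsed = time.perf_counter() - start
            logger.debug(
                "function.elapsed.time.%s.%s: %s", func.__module__, func.__qualname__, elapsed
            )

    return wrapper


@timer
def create_signal_messages():
    time.sleep(0.05)  # stand-in for real work


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    create_signal_messages()
```

Wrapping a function this way makes before/after comparisons like the ones above cheap to collect without touching the function body.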

Creating a new case from scratch (non-deduplicated) now takes about 6 seconds. It previously took roughly 30 seconds:

DEBUG:function.elapsed.time.dispatch.signal.scheduled.process_signal_instance: 29.435003541992046

This is primarily due to removing the creation of external resources (Google Docs, Google Drive, etc.) from this path.

The default dedupe filter is also faster: instead of loading every matching row in the time window, it fetches a single row, and only its case_id column, to decide whether to deduplicate.
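A minimal sketch of the "one row, one column" lookup, again in plain sqlite3 rather than Dispatch's SQLAlchemy code. The schema, the find_dedupe_case_id name, and matching on signal_id alone are simplifying assumptions; the real filter may match on other attributes such as entities.

```python
import sqlite3
from typing import Optional


def find_dedupe_case_id(
    conn: sqlite3.Connection, signal_id: int, window_seconds: float, now: float
) -> Optional[int]:
    """Return the case_id of the newest matching instance in the window, or None (sketch)."""
    row = conn.execute(
        # Only the case_id column, and at most one row: enough to decide whether to dedupe.
        "SELECT case_id FROM signal_instance "
        "WHERE signal_id = ? AND case_id IS NOT NULL AND created_at >= ? "
        "ORDER BY created_at DESC LIMIT 1",
        (signal_id, now - window_seconds),
    ).fetchone()
    return row[0] if row else None


# Hypothetical schema; an index on (signal_id, created_at) lets the query stop after one row.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE signal_instance (id INTEGER PRIMARY KEY, signal_id INTEGER, "
    "case_id INTEGER, created_at REAL, raw TEXT)"
)
conn.execute("CREATE INDEX ix_signal_time ON signal_instance (signal_id, created_at)")
now = 1_000_000.0
conn.executemany(
    "INSERT INTO signal_instance (signal_id, case_id, created_at, raw) VALUES (?, ?, ?, ?)",
    [(7, 42, now - 60, "{}"), (7, 41, now - 7_200, "{}"), (8, 99, now - 30, "{}")],
)

print(find_dedupe_case_id(conn, 7, 3_600, now))  # 42: recent instance, attach to case 42
print(find_dedupe_case_id(conn, 9, 3_600, now))  # None: no match, create a new case
```

The answer to "should we dedupe?" only needs existence plus a case to attach to, so the cost stays flat no matter how many instances fall inside the window.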

Screenshot 2023-05-05 at 10 04 21 AM

Testing

  • Python tests
  • Playwright E2E tests
  • Run all schedulers end to end
  • Run with one entity type defined
  • Run with multiple entity types defined
  • Run with many different Signal types being sent at once
  • Run with engagement filter
  • Run with defined dedupe filter
  • Run with defined snooze filter

@wssheldon wssheldon added the enhancement New feature or request label May 5, 2023
@wssheldon wssheldon requested a review from mvilanova May 5, 2023 17:08
@kevgliss (Contributor) commented May 15, 2023

I've modified the approach here slightly. Instead of relying on the scheduler, a single dedicated process now handles all new signals in some reasonable order.
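A rough sketch of that model in plain sqlite3. It assumes, hypothetically, a processed flag and a created_at column on a signal_instance table, and takes oldest-first as the "reasonable order"; process_pending_signals and run_forever are illustrative names, and the actual worker in Dispatch may be structured differently.

```python
import sqlite3
import time
from typing import Callable


def process_pending_signals(
    conn: sqlite3.Connection, handle: Callable[[int, str], None], batch_size: int = 100
) -> int:
    """Drain unprocessed signal instances oldest-first; return how many were handled."""
    handled = 0
    while True:
        rows = conn.execute(
            "SELECT id, raw FROM signal_instance WHERE processed = 0 "
            "ORDER BY created_at, id LIMIT ?",
            (batch_size,),
        ).fetchall()
        if not rows:
            return handled
        for instance_id, raw in rows:
            handle(instance_id, raw)  # errors propagate to the caller (no retry handling here)
            conn.execute("UPDATE signal_instance SET processed = 1 WHERE id = ?", (instance_id,))
            handled += 1
        conn.commit()


def run_forever(
    conn: sqlite3.Connection, handle: Callable[[int, str], None], idle_seconds: float = 1.0
) -> None:
    # One long-lived process instead of scheduler-fired jobs: drain, then sleep when idle.
    while True:
        if process_pending_signals(conn, handle) == 0:
            time.sleep(idle_seconds)
```

Running several of these processes concurrently, as suggested below, would also need a claim step so two workers never pick up the same instance (for example, SELECT ... FOR UPDATE SKIP LOCKED on PostgreSQL).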

With the other improvements to case creation, this is likely "good enough" for now. If required, we can look into running several of these processes concurrently (instead of having a scheduler do it for us).

I also added a wrk Lua script to aid performance testing.

@kevgliss kevgliss self-requested a review May 15, 2023 18:35
@kevgliss kevgliss merged commit 4416b2d into master May 15, 2023
@kevgliss kevgliss deleted the enhancement/signal-process-optimization branch May 15, 2023 18:53
kevgliss added 2 commits that referenced this pull request May 17, 2023