Optimize ORM usage, db_session instantiation, and tuning #3365
This change improves performance for ingesting deduplicated Signal instances. By optimizing ORM usage across the flow (primarily by no longer loading unnecessary relationships, columns, and rows), this path is about 4x faster: processing 500 deduplicated instances now takes roughly 10 seconds.
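As an illustration of the kind of ORM trimming involved, here is a minimal sketch with toy models (these are not Dispatch's real schema; SQLAlchemy's `load_only` and `noload` options are real, but the model and column names are assumptions):

```python
# Minimal sketch (toy models, not Dispatch's real schema): restrict an ORM
# query to the columns and relationships the code path actually uses.
from sqlalchemy import JSON, Column, DateTime, ForeignKey, Integer, create_engine
from sqlalchemy.orm import (
    declarative_base,
    load_only,
    noload,
    relationship,
    sessionmaker,
)

Base = declarative_base()

class Case(Base):
    __tablename__ = "case"
    id = Column(Integer, primary_key=True)
    signal_instances = relationship("SignalInstance", back_populates="case")

class SignalInstance(Base):
    __tablename__ = "signal_instance"
    id = Column(Integer, primary_key=True)
    case_id = Column(Integer, ForeignKey("case.id"))
    created_at = Column(DateTime(timezone=True))
    raw = Column(JSON)  # large payload we want to keep out of memory
    case = relationship("Case", back_populates="signal_instances")

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

# Load only the columns this path reads, and skip every relationship;
# without these options the ORM can pull far more data than needed.
case = (
    session.query(Case)
    .options(load_only(Case.id), noload("*"))
    .filter(Case.id == 1)
    .one_or_none()
)
```

Restricting queries this way avoids fetching and deserializing columns and relationships the code path never touches.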
The `create_signal_messages` function had the most room for improvement: it was unnecessarily loading all signal_instances and their associated raw data. It previously took about 3 seconds (and progressively longer as more and more signals are ingested). It is now about 5-6x faster.

Before:
DEBUG:function.elapsed.time.dispatch.plugins.dispatch_slack.case.messages.create_signal_messages: 3.1490371670006425
After:
DEBUG:function.elapsed.time.dispatch.plugins.dispatch_slack.case.messages.create_signal_messages: 0.056947083001432475:/Users/wshel/Projects/dispatch/src/dispatch/decorators.py:wrapper:185
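A sketch of the shape of that fix, continuing the toy models above (the actual query in `create_signal_messages` may select different fields; this illustrates the single-column pattern):

```python
# Continuing the toy models from the sketch above. The slow version was
# effectively `for instance in case.signal_instances: ...`, which loads
# every instance row including its large raw payload. Selecting a single
# small column keeps the payloads in the database.
def signal_instance_ids(db_session, case_id: int) -> list[int]:
    rows = (
        db_session.query(SignalInstance.id)  # one small column, no raw JSON
        .filter(SignalInstance.case_id == case_id)
        .order_by(SignalInstance.id)
        .all()
    )
    return [row.id for row in rows]
```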
Creating a new case from scratch (non-deduped) now takes about 6 seconds, down from roughly 30:
DEBUG:function.elapsed.time.dispatch.signal.scheduled.process_signal_instance: 29.435003541992046
This improvement is primarily due to removing the creation of external resources (Google Docs, Drive, etc.) from this path.
The default dedupe filter is also faster: it now fetches a single row to determine whether to dedupe, instead of every row in the time window, and selects only the column it needs (case_id).
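A sketch of that pattern, continuing the toy models above (the real default dedupe filter matches on more criteria than a simple time window; the filter here is a stand-in):

```python
# Continuing the toy models above. One matching row is enough to decide
# whether to dedupe, so LIMIT 1 plus a single-column select replaces
# loading every SignalInstance in the window.
from datetime import datetime, timedelta, timezone

def dedupe_target_case_id(db_session, window_minutes=60):
    """Return the case_id to dedupe into, or None if no recent match."""
    since = datetime.now(timezone.utc) - timedelta(minutes=window_minutes)
    return (
        db_session.query(SignalInstance.case_id)  # only the needed column
        .filter(SignalInstance.created_at >= since)
        .filter(SignalInstance.case_id.isnot(None))
        .order_by(SignalInstance.created_at.desc())
        .limit(1)
        .scalar()  # one row or None, instead of .all()
    )
```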
Testing