This repository has been archived by the owner on Nov 30, 2022. It is now read-only.

Execute Privacy Requests with Celery #621

Merged
merged 71 commits into from
Jun 22, 2022

Conversation


@seanpreston seanpreston commented Jun 9, 2022

Purpose

  • This PR changes how privacy requests are dispatched for processing: instead of running in a background process, each request is executed as a Celery task

Changes

  • Converts PrivacyRequestRunner.run into run_privacy_request
  • Removes PrivacyRequestRunner

Checklist

  • Update CHANGELOG.md file
    • Merge in main so the most recent CHANGELOG.md file is being appended to
    • Add description within the Unreleased section in an appropriate category. Add a new category from the list at the top of the file if the needed one isn't already there.
    • Add a link to this PR at the end of the description with the PR number as the text. example: #1
  • Applicable documentation updated (guides, quickstart, postman collections, tutorial, fidesdemo, database diagram)
  • If docs updated (select one):
    • documentation complete, or draft/outline provided (tag docs-team to complete/review on this branch)
    • documentation issue created (tag docs-team to complete issue separately)
  • Good unit test/integration test coverage
  • This PR contains a DB migration. If checked, the reviewer should confirm with the author that the down_revision correctly references the previous migration before merging
  • The Run Unsafe PR Checks label has been applied, and checks have passed, if this PR touches any external services

Ticket

Fixes #632

@seanpreston seanpreston changed the title Celery privacy request dispatch Execute Privacy Requests with Celery Jun 9, 2022
@seanpreston seanpreston mentioned this pull request Jun 9, 2022
10 tasks
@seanpreston seanpreston marked this pull request as ready for review June 12, 2022 17:00

@pattisdr pattisdr left a comment

This is working well now, Sean. Nice work getting all the config and the worker pieces sorted out.

My lingering concern is that all tests were passing in earlier rounds of CR while several Celery pieces were still broken. Have you thought about ways to end-to-end test this reliably?

docker-compose.yml (resolved)
src/fidesops/core/config.py (resolved)
from fidesops.tasks.scheduled.scheduler import scheduler
from fidesops.util.async_util import run_async

Surfacing this again @seanpreston

fidesops.toml (resolved)
src/fidesops/tasks/__init__.py (outdated, resolved)
@seanpreston seanpreston added and removed the run unsafe ci checks label (Triggers running of unsafe CI checks) Jun 22, 2022

@pattisdr pattisdr left a comment

This looks good to me @seanpreston, thanks for adding the extra step of caching the request id and asserting this in tests.

def get_async_execution_task(self) -> Optional[AsyncResult]:
    """Returns a task reflecting the state of this privacy request's asynchronous execution."""
    task_id = self.get_cached_task_id()
    res: AsyncResult = AsyncResult(task_id)

If task_id is None (perhaps because the cache has expired), this will throw an error. I see we're only using this in tests right now, though.
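
One way to address the concern above is to return None when no task id is cached, which also matches the Optional[AsyncResult] return type. The sketch below is an assumption, not the code merged in this PR: `StubAsyncResult` is a hypothetical stand-in for `celery.result.AsyncResult` (so the example runs without Celery installed), and the cache is a plain dict rather than Redis.

```python
from typing import Dict, Optional


class StubAsyncResult:
    """Hypothetical stand-in for celery.result.AsyncResult; rejects a missing id."""

    def __init__(self, task_id: Optional[str]) -> None:
        if task_id is None:
            raise ValueError("a task id is required")
        self.id = task_id


def get_async_execution_task(
    cache: Dict[str, str], cache_key: str
) -> Optional[StubAsyncResult]:
    """Return a result handle for the cached task id, or None if none is cached."""
    task_id = cache.get(cache_key)
    if task_id is None:
        # Cache entry missing or expired: report "no task" instead of raising.
        return None
    return StubAsyncResult(task_id)
```

With this guard, callers outside of tests would need to handle the None case explicitly.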

Comment on lines +403 to +404
assert pr.get_cached_task_id() is not None
assert pr.get_async_execution_task() is not None

thanks for adding these assertions

Comment on lines +218 to +224
def cache_task_id(self, task_id: str) -> None:
    """Sets a task_id for this privacy request's asynchronous execution."""
    cache: FidesopsRedis = get_cache()
    cache.set(
        get_async_task_tracking_cache_key(self.id),
        task_id,
    )

Note to self: this is stored in the same db index as privacy-request resources, but Celery itself uses a different index (1), with keys prefixed celery-task-meta-*
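
To illustrate the layout described in this note, here is a rough sketch using in-memory dicts in place of Redis logical databases. The index used for the application cache (0 here) and the tracking key format are assumptions for illustration; only index 1 and the celery-task-meta- prefix come from the note.

```python
from typing import Dict

# In-memory stand-ins for Redis logical databases: index -> key/value map.
redis_dbs: Dict[int, Dict[str, str]] = {0: {}, 1: {}}

APP_CACHE_DB = 0      # assumption: index holding privacy-request cache entries
CELERY_RESULT_DB = 1  # index Celery uses, per the note above


def app_tracking_key(privacy_request_id: str) -> str:
    """Hypothetical key format for the cached task id."""
    return f"privacy-request-{privacy_request_id}-task-id"


def celery_result_key(task_id: str) -> str:
    """Key under which Celery stores task metadata (celery-task-meta- prefix)."""
    return f"celery-task-meta-{task_id}"


# The application writes the task id next to other privacy-request keys...
redis_dbs[APP_CACHE_DB][app_tracking_key("pr-1")] = "task-123"
# ...while Celery writes the task state to its own index.
redis_dbs[CELERY_RESULT_DB][celery_result_key("task-123")] = '{"status": "PENDING"}'
```

Looking up a task's state therefore takes two hops: the application cache yields the task id, and the Celery index holds the metadata for that id.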

Comment on lines +131 to +135
def queue_privacy_request(
    privacy_request_id: str,
    from_webhook_id: Optional[str] = None,
    from_step: Optional[str] = None,
) -> str:

Good idea here, since we've now got this extra step of caching the task id.

@pattisdr pattisdr merged commit c222ce0 into main Jun 22, 2022
@pattisdr pattisdr deleted the celery-privacy-request-dispatch branch June 22, 2022 22:19
@pattisdr

A follow-up: I imagine we'll also need to add some sane defaults for task concurrency. The code this replaces limited the number of concurrent background threads that could be used, because in light testing we found we could open too many connections to databases.
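
A sketch of what such defaults could look like. `worker_concurrency` and `worker_prefetch_multiplier` are existing Celery settings; the environment variable name and the default values are assumptions for illustration, not values from this repository.

```python
import os
from typing import Dict, Mapping, Optional


def celery_worker_defaults(env: Optional[Mapping[str, str]] = None) -> Dict[str, int]:
    """Build conservative worker settings, overridable through the environment."""
    env = os.environ if env is None else env
    return {
        # Cap on concurrent worker processes per Celery worker (hypothetical default).
        "worker_concurrency": int(env.get("EXAMPLE_WORKER_CONCURRENCY", "2")),
        # Reserve one task at a time per process so a worker does not hoard the queue.
        "worker_prefetch_multiplier": 1,
    }
```

The resulting dict could be applied with `app.conf.update(...)` on the Celery app. Since each running task may open its own database connections, the concurrency cap also bounds how many connections a worker opens at once.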

sanders41 pushed a commit that referenced this pull request Sep 22, 2022
 Updates the way privacy requests are dispatched into processing from a background process into a Celery task
Labels
run unsafe ci checks Triggers running of unsafe CI checks
Development

Successfully merging this pull request may close these issues.

Replace PrivacyRequestRunner.submit with a Celery task
4 participants