Understanding internals: Jobs / tasks / actions #625

Open
AndreasAlbertQC opened this issue Apr 19, 2023 · 0 comments


Context

Hi!

I'm trying to better understand the quetz code base. In order to do that, I'd like to start by documenting some of the internals and design choices, starting with the topic of background job scheduling. I have compiled below what I understand so far based on docs and code.

@wolfv @btel @janjagusch (and whoever else knows more): I would be super grateful if you could help me out with corrections, additions, or any additional pointers. I'd be happy to contribute the resulting understanding into developer docs.

My current understanding of jobs/tasks/actions in quetz

There are two main ways for quetz to handle background tasks:

  • Mode 1: the starlette BackgroundTask API is used. This mode has nothing to do with the "worker" settings in the quetz config.
    • Scheduling:
      • route functions in main.py depend on fastapi.BackgroundTasks .
      • To schedule a task for execution, the route function adds it to that BackgroundTasks instance (via add_task), which keeps a list of pending tasks.
    • Execution:
    • Where is it used:
      • This method is used to execute indexing.update_indexes in a number of route functions:
        • delete_package
        • delete_package_version
        • post_file_to_package
        • post_upload
        • post_file_to_channel
  • Mode 2: "channel actions". This mode uses the "worker" settings in the quetz config.
    • Scheduling:
      • route functions depend on get_tasks_worker, which returns a Task instance, which exposes execute_channel_action
      • execute_channel_action can schedule one-off jobs as well as repeating jobs.
      • execute_channel_action does not execute anything, but writes the job definition to the DB
    • Execution:
      • Execution is managed by the Supervisor process.
      • By default, the Supervisor is started together with the server through cli.run / cli.start.
      • Alternatively, the supervisor process may also be started separately through the cli without starting the server. This requires you to pass in the deployment directory.
      • The Supervisor uses the worker section of the configuration in order to decide how to execute the job.
        • If worker=="thread" (the default), the job is executed in a thread pool
        • If worker=="subprocess", the job is executed as a subprocess of the supervisor process
        • If worker=="redis", then the job is not executed by the supervisor directly, but is sent to the redis queue.
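
To make the mode 2 flow above concrete, here is a minimal, standard-library-only sketch of the "schedule first, let a supervisor dispatch later" pattern. It is not quetz code: `schedule_job`, `supervise`, `JOB_FUNCTIONS`, and the in-memory `pending_jobs` list are all hypothetical stand-ins (quetz writes job definitions to its DB, and the "redis" branch here only puts the job on a local `queue.Queue`).

```python
import json
import queue
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the jobs table in the database.
pending_jobs = []


def schedule_job(func_name, **kwargs):
    """Scheduling step: record the job definition, do not run anything."""
    pending_jobs.append({"func": func_name, "kwargs": kwargs, "status": "pending"})


def say_hello(name):
    """Toy job function used for illustration."""
    return f"hello {name}"


# Hypothetical registry mapping job names to callables.
JOB_FUNCTIONS = {"say_hello": say_hello}

# Code executed by the child interpreter in "subprocess" mode (toy equivalent of say_hello).
CHILD_CODE = "import json, sys; kw = json.loads(sys.argv[1]); print('hello ' + kw['name'])"


def supervise(worker, jobs):
    """Execution step: dispatch each pending job according to the worker setting."""
    results = []
    external_queue = queue.Queue()  # stand-in for a redis-backed queue
    with ThreadPoolExecutor(max_workers=2) as pool:
        for job in jobs:
            if worker == "thread":
                # Run the job in a thread pool inside the supervisor process.
                future = pool.submit(JOB_FUNCTIONS[job["func"]], **job["kwargs"])
                results.append(future.result())
                job["status"] = "done"
            elif worker == "subprocess":
                # Run the job in a child process of the supervisor.
                proc = subprocess.run(
                    [sys.executable, "-c", CHILD_CODE, json.dumps(job["kwargs"])],
                    capture_output=True, text=True, check=True,
                )
                results.append(proc.stdout.strip())
                job["status"] = "done"
            elif worker == "redis":
                # Do not execute here; hand the job over to an external queue.
                external_queue.put(job)
                job["status"] = "queued"
            else:
                raise ValueError(f"unknown worker type: {worker}")
    return results, external_queue
```

The point of the sketch is the separation of concerns: the route handler only records what should happen, and the worker setting is consulted only at dispatch time, so switching execution backends does not touch the scheduling code.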

Some questions

  • Is the above summary correct?
  • Why does mode 1 exist?
    • AFAICT, it is only used to run indexing.update_indexes , which is also available for running in mode 2. Since mode 2 is more flexible in configuring how tasks should be executed, it seems sensible to use mode 2 everywhere.
  • Is anyone currently successfully using mode 2 with worker=="redis" in a real-life setup?
    • When I try to set it up, I cannot seem to make it work because quetz.config.Config cannot be pickled or unpickled.
  • Why is quetz.config.Config designed the way it is? There are a few things going on here:
    • The design uses a singleton-ish pattern implemented through __new__. What is the motivation for this approach?
    • There also seems to be some support for searching for config files and possibly dynamically combining multiple config sources. What requirements make this necessary? Are there requirements that would prevent us from replacing the current Config with something super straightforward like pydantic.BaseSettings?
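
For readers unfamiliar with the pickling problem mentioned above, here is a generic illustration. It does not reproduce quetz's Config, and I have not established what actually makes that class fail to pickle. `SingletonConfig` and `can_pickle` are hypothetical names; the sketch only shows a singleton created through `__new__` that holds an unpicklable attribute (a lock), and one common workaround: send the config file path to the worker and rebuild the config there.

```python
import pickle
import threading


class SingletonConfig:
    """Toy config: one shared instance, created lazily in __new__."""

    _instance = None

    def __new__(cls, path):
        if cls._instance is None:
            inst = super().__new__(cls)
            inst.path = path
            # Locks cannot be pickled, so this attribute makes the
            # whole object unpicklable with the default pickling logic.
            inst._lock = threading.Lock()
            cls._instance = inst
        return cls._instance


def can_pickle(obj):
    """Return True if pickle.dumps succeeds for obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, TypeError):
        return False


cfg = SingletonConfig("config.toml")

# Later calls return the same instance and ignore the new argument.
same = SingletonConfig("other.toml")

# The object itself cannot cross a process boundary via pickle,
# but the plain string path can; a worker could rebuild the config from it.
picklable_object = can_pickle(cfg)
picklable_path = can_pickle(cfg.path)
```

If the real cause in quetz turns out to be similar, passing the deployment directory or config path to the worker and constructing the config on the other side would avoid pickling the object at all.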