[BUG] - conda-store-server worker detection does not work when scaling api servers #897

Open
soapy1 opened this issue Oct 17, 2024 · 2 comments
Assignees
Labels
area: container 📦 area: user experience 👩🏻‍💻 Items impacting the end-user experience needs: investigation 🔎 needs: triaging 🚦 Someone needs to have a look at this issue and triage roadmap: operations type: bug 🐛 Something isn't working

Comments

@soapy1
Contributor

soapy1 commented Oct 17, 2024

Describe the bug

When the conda-store server starts up, it waits for a worker to become available. It logs the following messages while it is waiting and once a worker is found:

2024-10-17 23:07:56   WARNING CondaStoreServer: 379: Waiting for worker... Use --standalone if running outside of docker
2024-10-17 23:07:56      INFO CondaStoreServer: 373: Worker initialized

However, if a worker is deployed first and a server is started afterwards, the worker is never detected and the server keeps logging the warning message:

...
2024-10-17 23:19:20   WARNING CondaStoreServer: 379: Waiting for worker... Use --standalone if running outside of docker
2024-10-17 23:19:35   WARNING CondaStoreServer: 379: Waiting for worker... Use --standalone if running outside of docker
2024-10-17 23:19:40   WARNING CondaStoreServer: 379: Waiting for worker... Use --standalone if running outside of docker
...

Note that when this happens, the server is still functional and can submit tasks to workers, and the available workers can pick up those tasks.

Expected behavior

Users should not receive a message WARNING CondaStoreServer: 379: Waiting for worker... Use --standalone if running outside of docker when workers are available.

How to Reproduce the problem?

  1. launch a conda-store worker
  2. launch a conda-store server
  3. view the logs for conda-store server, noting the warning message

For example, if using Kubernetes:

  1. confirm all pods are up after setting up the cluster (kubectl get po)
  2. scale down the replicas for the server deployment
     $ kubectl scale deployment/conda-store-server --replicas=0
  3. scale up the replicas
     $ kubectl scale deployment/conda-store-server --replicas=1
  4. view the logs
     $ kubectl logs deployment/conda-store-server

...
2024-10-17 23:19:30   WARNING CondaStoreServer: 379: Waiting for worker... Use --standalone if running outside of docker
2024-10-17 23:19:35   WARNING CondaStoreServer: 379: Waiting for worker... Use --standalone if running outside of docker
2024-10-17 23:19:40   WARNING CondaStoreServer: 379: Waiting for worker... Use --standalone if running outside of docker

Output

No response

Versions and dependencies used.

conda-store - main branch

Anything else?

Looks like this functionality was introduced in #705 to resolve #605.

@soapy1 soapy1 added the type: bug 🐛 Something isn't working label Oct 17, 2024
@peytondmurray peytondmurray added area: user experience 👩🏻‍💻 Items impacting the end-user experience area: container 📦 needs: triaging 🚦 Someone needs to have a look at this issue and triage needs: investigation 🔎 labels Oct 19, 2024
@peytondmurray peytondmurray self-assigned this Oct 25, 2024
@peytondmurray
Contributor

Will investigate to figure out what the best path forward is.

@soapy1
Contributor Author

soapy1 commented Oct 25, 2024

This happens because the server determines whether a worker is available by querying the workers table in the database, but the server startup process clears all entries in that table. So, if a worker comes up before the server, the server will never "know" that a worker is available.

Here is the snippet that clears the table: https://github.com/conda-incubator/conda-store/blob/main/conda-store-server/conda_store_server/_internal/server/app.py#L394-L413
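
To make the ordering problem concrete, here is a minimal standalone sketch of it. This is not the conda-store code: the `worker` table schema and the function names are hypothetical, and it assumes a worker registers itself only once, when it starts. It uses an in-memory SQLite database so it can be run as-is.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical `worker` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE worker (id INTEGER PRIMARY KEY, initialized INTEGER)"
    )
    return conn


def worker_start(conn: sqlite3.Connection) -> None:
    """Assumption: a worker announces itself once, at startup, by adding a row."""
    conn.execute("INSERT INTO worker (initialized) VALUES (1)")
    conn.commit()


def server_start(conn: sqlite3.Connection) -> None:
    """The server clears every row in the table as part of its startup."""
    conn.execute("DELETE FROM worker")
    conn.commit()


def server_sees_worker(conn: sqlite3.Connection) -> bool:
    """The server's availability check: does the table contain any row?"""
    (count,) = conn.execute("SELECT COUNT(*) FROM worker").fetchone()
    return count > 0


def simulate(worker_first: bool) -> bool:
    """Run both startups in the given order and report what the server sees."""
    conn = make_db()
    if worker_first:
        worker_start(conn)
        server_start(conn)  # wipes the row the worker just wrote
    else:
        server_start(conn)
        worker_start(conn)
    return server_sees_worker(conn)


if __name__ == "__main__":
    print("server first, worker detected:", simulate(worker_first=False))
    print("worker first, worker detected:", simulate(worker_first=True))
```

In this sketch the check only succeeds when the server starts first. When the worker starts first, its row is deleted before the server ever looks, and under the register-once assumption nothing writes it back, which would match the endless "Waiting for worker..." warning.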

I think this isn't the best pattern. Like, the server should not need to know if a worker is available. I think the best step forward is to remove this functionality and remove the workers table in the database.

Projects
Status: Ready 🛎️