CI build runs slow due to redis and more #3820

Closed
bufke opened this issue May 24, 2022 · 2 comments
bufke (Contributor) commented May 24, 2022

Description

Building the kpi Docker image takes about 14 minutes. The build log contains references to Redis:

ERROR:celery.backends.redis:Connection to Redis lost: Retry (4/20) in 1.00 second.

Redis isn't needed to build the Docker image. It may be worth reviewing the build process more generally: the build should not attempt service connections to anything other than the PyPI and APT repositories.

Steps to Reproduce

https://gitlab.com/kobo_toolbox/kpi/-/jobs/2500185890

jnm (Member) commented Aug 20, 2022

So we're killing off #3440 to get rid of the Redis annoyance. I would object, but no one seems to use the runtime-configurable autoscale feature. It was built for UNHCR, and they don't use it; just recently, on OCHA, even I forgot about it while dealing with exactly the situation it was designed for.

I think it's worth noting that when #3440 was made, it did not cause 40 seconds of wasted time during the CI build process. I'm not sure what has changed since then, but it doesn't feel like the last time we'll ever grapple with Python code trying to access a database when none is available. It's a fair point, though, that Django discourages talking to databases in AppConfig.ready(), and #3440 did that.
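
For illustration, the discouraged pattern looks roughly like this (a hypothetical reconstruction, not the actual #3440 code):

    from django.apps import AppConfig

    class WorkerAutoscaleConfig(AppConfig):
        # Hypothetical app config, for illustration only
        name = 'worker_autoscale'

        def ready(self):
            # Discouraged: ready() runs for *every* management command,
            # including build-time ones like collectstatic, so this query
            # fires during the Docker build and retries while Redis and
            # Postgres are unreachable.
            from worker_autoscale.models import Setting  # hypothetical model
            Setting.objects.first()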

I find #3863 dissatisfying because, while it cuts off a sore thumb, it does not address the broader problems raised in this issue. We should answer these questions:

  1. Do we ever need to run management commands or other application code in "non-runtime" environments like the Docker build process?
  2. If so, how do we distinguish between runtime and non-runtime? (One possible approach is sketched just below this list.)
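
One possible answer to question 2 is an explicit flag; a minimal sketch, assuming a hypothetical KPI_RUNTIME variable that only the uWSGI/Celery entrypoints export:

    import os

    def is_runtime() -> bool:
        # Hypothetical convention: the entrypoints that actually start
        # services export KPI_RUNTIME=1; Dockerfile RUN steps and CI jobs
        # leave it unset, so startup code can skip service connections
        # cleanly instead of retrying Redis twenty times.
        return os.environ.get('KPI_RUNTIME') == '1'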

Some further thoughts:

  1. Would it be better to avoid the management commands in our current Dockerfile altogether?
    1. Django itself stores .mo files in git. We could do that too and get rid of compilemessages as a build step.
    2. Could collectstatic be invoked at runtime and put its junk directly in the proper place, instead of having it deposit files in one location that then has to be rsynced to the volume that's shared with the nginx container? (A sketch follows this list.)
  2. Is it really useful to have defaults for database URLs? It's hard to imagine these existing for any reason other than to be overwritten, especially when we bizarrely have both localhost and redis_cache in the Redis defaults:
    kpi$ grep 'redis://' kobo/settings/base.py
    CELERY_BROKER_URL = os.environ.get('KPI_BROKER_URL', 'redis://localhost:6379/1')
    'REDIS_SESSION_URL', default='redis://redis_cache:6380/2'
    'default': env.cache(default='redis://redis_cache:6380/3'),
    
    1. The non-localhost hostnames look like they were set to match kobo-docker service names. If kobo-docker really depends on the default values, which I doubt, then let kobo-docker handle setting the environment variables appropriately.
    2. Removing the defaults would make it quick to tell when the databases are not accessible, but that wouldn't differentiate between configuration errors and "non-runtime" mode. (Also sketched below.)
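
On question 1.2, a minimal sketch of invoking collectstatic from a runtime entrypoint rather than the Dockerfile; the entrypoint file is hypothetical, and it assumes STATIC_ROOT already points at the volume shared with nginx:

    # docker/entrypoint.py (hypothetical), run at container start
    import django
    from django.core.management import call_command

    # Requires DJANGO_SETTINGS_MODULE to be set in the environment
    django.setup()

    # With STATIC_ROOT pointing directly at the shared volume, files land
    # in their final location and the rsync step disappears.
    call_command('collectstatic', interactive=False, clear=True)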
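
And on question 2, a sketch of what dropping the defaults could look like; the require_env helper and its fail-fast behavior are my assumptions:

    import os

    def require_env(name: str) -> str:
        # Hypothetical helper: fail fast with a clear message instead of
        # silently falling back to localhost or redis_cache and retrying.
        try:
            return os.environ[name]
        except KeyError:
            raise RuntimeError(f'{name} must be set; no default is provided')

    CELERY_BROKER_URL = require_env('KPI_BROKER_URL')

As noted in 2.2, though, a missing variable alone can't tell us whether we're misconfigured or simply in a "non-runtime" environment.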

bufke (Contributor, Author) commented Aug 22, 2022

Regarding the original need to quickly scale Celery: with the move to k8s, we don't have to manually scale up workers. But if we did, I would argue it's better to do so with an env var than with the current solution.

  • There is a stronger guarantee that an env var fed directly to a Celery init script will produce consistent and predictable results. We could view it as a pure function with one saved state variable: it produces the same result every time and has no outside dependency. A signal that reads a database value and then executes a Celery task that reconfigures Celery is a weaker guarantee because of its many possibly-failing dependencies (Postgres, Redis, Celery, and conflicting values between the init-triggered async Celery task and the synchronous signal). (A sketch follows this list.)
  • If we have correctly identified that higher concurrency is desirable (in an I/O-wait scenario), why not always use the higher value?
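
For illustration, feeding concurrency straight from the environment might look like this; the CELERY_WORKER_CONCURRENCY variable name is made up, while worker_concurrency is a real Celery setting:

    import os
    from celery import Celery

    app = Celery('kpi')
    # The worker's concurrency is a pure function of its environment:
    # no Postgres read, no signal, and no reconfiguration task at startup.
    app.conf.worker_concurrency = int(
        os.environ.get('CELERY_WORKER_CONCURRENCY', '4')
    )

Scaling up then just means restarting the worker with a different value (or, on k8s, changing replicas).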

I'm for merging #3863, but I did offer a less disruptive alternative if you'd rather wait. I think it's a step in the direction I'd like to see.

1 and 2 - I think it's fine to run management commands at build time, such as collectstatic (assuming the target is a local file system). I don't believe we should depend on external resources at build time (nor at app startup), except for PyPI, npm, and Docker registries, which are explicitly build-time resources. That's the distinction.

Further thought 1) compilemessages has led to some "why does my build not work today" issues seen on internal chat when something is wrong with translations. I'd rather see this done by a CI bot or manually on some schedule. Storing the .mo files in git is fine: it's a snapshot of the translations at a point in time. Maybe it's not perfectly normalized, since it's really a computed value, but it's good enough, and the .mo files do have unique meaning in that they are known to be working translations.

I don't love the use of rsync (or chmod) that duplicates files across layers, effectively doubling image size and slowing Docker builds. I think there is a way we could remove this, but I would need to consider how to do so without breaking existing desired behavior.

  1. We could align the Python defaults with the docker-compose defaults. That would greatly simplify the kobo-docker compose files, maybe to the point where they're 50 lines of YAML with, for developers, absolutely zero configuration: docker compose up and that's it. End users would still define a minimum (the Django secret key, etc.). Of course, that isn't the reality today, and the honest answer is that I'm used to working in this zero-conf manner, so I write defaults without thinking about it.

Zero-conf setups are far beyond the scope of "speed up Docker builds", but I'd love to discuss this more on a video chat sometime.
