Dockerized chroma arguments customization #1658

Merged
merged 10 commits into from
Feb 7, 2024


@MrZoidberg MrZoidberg commented Jan 18, 2024

Description of changes

Summarize the changes made by this PR.

  • New functionality
    • Added the ability to customize the default arguments that are passed from the docker run or docker compose command field to uvicorn chromadb.app:app. I needed this to customize the port, because in certain scenarios it cannot be changed (e.g. ECS, where the internal port is proxied as is). The default arguments are unchanged: --workers 1 --host 0.0.0.0 --port 8000 --proxy-headers --log-config chromadb/log_config.yml --timeout-keep-alive 30
    • Added ENV variables for basic customization with default values:
CHROMA_HOST_ADDR="0.0.0.0"
CHROMA_HOST_PORT=8000
CHROMA_WORKERS=1
CHROMA_LOG_CONFIG="chromadb/log_config.yml"
CHROMA_TIMEOUT_KEEP_ALIVE=30
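
As a rough illustration of how these defaults could feed the server's command line, the sketch below assembles the uvicorn arguments from the five variables and falls back to the listed defaults when a variable is unset. The function name `build_uvicorn_args` is hypothetical and not taken from the PR; the actual image may wire the variables differently.

```shell
#!/bin/sh
# Hypothetical helper (not from the PR): build the uvicorn argument string
# from the CHROMA_* variables, using the defaults listed above when unset.
build_uvicorn_args() {
  printf '%s\n' "--workers ${CHROMA_WORKERS:-1} --host ${CHROMA_HOST_ADDR:-0.0.0.0} --port ${CHROMA_HOST_PORT:-8000} --proxy-headers --log-config ${CHROMA_LOG_CONFIG:-chromadb/log_config.yml} --timeout-keep-alive ${CHROMA_TIMEOUT_KEEP_ALIVE:-30}"
}

# With nothing set, the defaults apply.
build_uvicorn_args

# Overriding one variable, much as `docker run -e CHROMA_HOST_PORT=8002 ...` would.
# The subshell keeps the override from leaking into later calls.
(CHROMA_HOST_PORT=8002; build_uvicorn_args)
```

Only the overridden variable changes in the second call; every other argument keeps its default.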

Test plan

How are these changes tested?

  • Tested locally using docker build and docker run commands
  • Tested customization in docker-compose - now it works as expected.

Documentation Changes

Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs repository?

TODO: Deployment docs need to be updated to cover container argument customization.


Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of an unexpectedly high quality (Readability, Modularity, Intuitiveness)?

Contributor

@beggers beggers left a comment

This looks good -- I agree with Hammad that we should leave the ports in docker-compose.yml as 8000. I suspect that's the reason for the failing tests too.

Contributor

@tazarov tazarov left a comment

Hey @MrZoidberg, thanks for this PR, but we need to step back and think through some of the ramifications of doing the suggested changes.

  export IS_PERSISTENT=1
  export CHROMA_SERVER_NOFILE=65535
- exec uvicorn chromadb.app:app --workers 1 --host 0.0.0.0 --port 8000 --proxy-headers --log-config chromadb/log_config.yml --timeout-keep-alive 30
+ echo "Starting server with args: ${@}"
+ exec uvicorn chromadb.app:app ${@}
Contributor

@MrZoidberg, I don't think we should proceed with this change. The main reason for this is that not having sensible defaults (which by the way can be exposed via ENV vars), will result in a broken docker image where people will not be able to do:

docker run -d --rm --name chromadb -p 8000:8000  chromadb/chroma:latest

Contributor Author

ENTRYPOINT is the process that's executed inside the container. CMD is the default set of arguments that are supplied to the ENTRYPOINT process.
By stripping the executable out of CMD, I think you make the container a bit more secure, as it's harder to override the executable.
I made this change because in the current Chroma docker you CANNOT override the port easily.

With this change both commands work (built with docker build -t server .):

  • docker run -it --rm --name chromadb -p 8000:8000 server --> results in chroma running on port 8000
  • docker run -it --rm --name chromadb -p 8002:8002 server --workers 1 --host 0.0.0.0 --port 8002 --proxy-headers --log-config chromadb/log_config.yml --timeout-keep-alive 30 --> results in chroma running on port 8002
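
To make the ENTRYPOINT/CMD split concrete, here is a small shell simulation of how Docker resolves the final command: arguments given after the image name replace CMD, while the ENTRYPOINT stays fixed. The names `resolve_command`, `ENTRYPOINT_CMD` and `DEFAULT_CMD` are illustrative only. This is a simplified model of Docker's behaviour (it treats uvicorn itself as the entrypoint), not a copy of this image's Dockerfile or entrypoint script.

```shell
#!/bin/sh
# Illustrative model: the executable is pinned, CMD only holds default args.
ENTRYPOINT_CMD="uvicorn chromadb.app:app"
DEFAULT_CMD="--workers 1 --host 0.0.0.0 --port 8000 --proxy-headers --log-config chromadb/log_config.yml --timeout-keep-alive 30"

resolve_command() {
  if [ "$#" -eq 0 ]; then
    # No arguments after the image name: Docker falls back to CMD.
    printf '%s %s\n' "$ENTRYPOINT_CMD" "$DEFAULT_CMD"
  else
    # Arguments after the image name replace CMD entirely.
    printf '%s %s\n' "$ENTRYPOINT_CMD" "$*"
  fi
}

# Models: docker run ... server
resolve_command

# Models: docker run ... server --workers 1 --host 0.0.0.0 --port 8002
resolve_command --workers 1 --host 0.0.0.0 --port 8002
```

Note that custom arguments replace the whole default set rather than merging with it, which is why the second docker run example above repeats every flag.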

Contributor Author

I can revert this change if that's the preference, so the default CMD in Dockerfile would be uvicorn chromadb.app:app --workers 1 --host 0.0.0.0 --port 8000 --proxy-headers --log-config chromadb/log_config.yml --timeout-keep-alive 30 and in docker_entrypoint.sh line 10 would be exec ${@}

Contributor

@MrZoidberg, thanks for the explanation. I think reconfigurability is important to accommodate more use cases like yours. Security is also important. Overall, I agree with your reasoning on this. Do you think it merits investigating env vars instead of actual uvicorn params? Or perhaps a mix of both.

Here's my reasoning on this:

  • Exposing uvicorn params is great for power users who know their stuff, but not so great for users who don't necessarily know a thing about uvicorn; they shouldn't be forced to learn it just to configure a port
  • Exposing env vars for common config params is great for less advanced users, but limiting for users who want more control over uvicorn

Conclusion - let's meet in the middle - env vars in the Dockerfile with sensible defaults. Those same vars can be added to the default CMD

wdyt?

Contributor Author

@tazarov is this what you described?

image

Note: EXPOSE doesn't do anything and, to my knowledge, can't be set dynamically other than with a build-time ARG.

Contributor

ENV only sets sensible defaults. If the user sets -e CHROMA_HOST_PORT=8002, then the ENV will be overwritten by whatever the user sets.

Your suggestion is spot on with what I was thinking.

Contributor Author

ok will check to make sure it's working with defaults and custom values

Contributor Author

image

works. will push changes soon.

Contributor Author

done with updates

Contributor

@MrZoidberg, thank you for all the efforts and for sticking with me through this. I think the PR looks excellent and serves your and, I hope, countless other users' use cases without compromising on usability and security and not negatively impacting existing DX.

@beggers have a look, but I think this is good to go after the tests pass.

@MrZoidberg
Contributor Author

@tazarov I rolled back the bash-related changes and answered the concern regarding docker_entrypoint.sh. Please check.

@tazarov
Contributor

tazarov commented Jan 19, 2024

@MrZoidberg, the tests are failing because the new Docker image does not expect the uvicorn command, only its params:

Can you also update those two files, to align with the docker-compose.yaml changes:

  • docker-compose.test.yml
  • docker-compose.test-auth.yml

@MrZoidberg
Contributor Author

@tazarov done

@tazarov
Contributor

tazarov commented Jan 21, 2024

> @tazarov done

@MrZoidberg, thank you. I still feel uneasy about whether the changes in Docker can cause troubles for user deployments.

Here is my reasoning:

Today, Chroma is used across hundreds of thousands of deployments, ranging from public to private. It is not unreasonable to think that some of those deployments will have their own docker-compose files with the uvicorn command. All these deployments will get an error when they pull in the new Chroma image.

So, let's step back for a second and re-evaluate. I have an idea for keeping backward compatibility: we can check whether the first argument passed to docker_entrypoint.sh is uvicorn and, if so, shift the args by 1.

The test result would be that running docker-compose with the new docker-compose.yaml and the old one should not yield an error (like we've seen in the failing tests before you updated the test compose files).

I would also suggest adding a deprecation warning in the event uvicorn is the first arg.

@MrZoidberg, what do you think?
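
A minimal sketch of that backward-compatibility check, for illustration only: it is not the script that was merged, and the function name `normalize_args` and the warning text are made up here. It also assumes that the legacy form names the app module (chromadb.app:app) right after uvicorn and drops that token too, which goes one step beyond the "shift args by 1" described above.

```shell
#!/bin/sh
# Hypothetical sketch (not the merged script): accept both the legacy
# "uvicorn chromadb.app:app <args>" form and the new args-only form.
normalize_args() {
  if [ "${1:-}" = "uvicorn" ]; then
    # Deprecation warning goes to stderr so it does not pollute the args.
    echo "WARNING: passing 'uvicorn' in the command is deprecated; pass only the arguments." >&2
    shift
    # Assumption: the legacy form also names the app module, so drop it too.
    if [ "${1:-}" = "chromadb.app:app" ]; then
      shift
    fi
  fi
  printf '%s\n' "$*"
}

# Legacy form: warns on stderr, prints only the arguments.
normalize_args uvicorn chromadb.app:app --workers 1 --port 8000

# New form: no warning, arguments pass through unchanged.
normalize_args --workers 1 --port 8000
```

Both calls print the same argument string, so old and new compose files would start the server the same way; only the legacy form triggers the warning.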

@MrZoidberg
Contributor Author

@tazarov a bit busy, but I can surely do that. Please note that currently a custom CMD in docker-compose just doesn't work at all; that's the first thing I checked when I hit the port problem. Will make this change in the upcoming days.

@MrZoidberg
Contributor Author

@tazarov hi, I've updated the docker entrypoint script to work with args that start with uvicorn correctly but produce a warning.
Please check if that's what you expect. Thanks!

image

@tazarov
Contributor

tazarov commented Feb 1, 2024

@MrZoidberg, thanks for all your patience and persistence with this. Let me run some local tests, and then I think we should be good to go.

@tazarov
Contributor

tazarov commented Feb 1, 2024

Test the following:

  • docker run --rm --name chromadb -p 8001:8000 -v ./chroma:/chroma/chroma -e IS_PERSISTENT=TRUE -e ANONYMIZED_TELEMETRY=TRUE chroma-unvironless
  • docker run --rm --name chromadb -p 8001:8000 -v ./chroma:/chroma/chroma -e IS_PERSISTENT=TRUE -e ANONYMIZED_TELEMETRY=TRUE chroma-unvironless uvicorn chromadb.app:app --workers 1 --host 0.0.0.0 --port 8000 --proxy-headers --log-config chromadb/log_config.yml --timeout-keep-alive 30
  • docker compose with command: "uvicorn chromadb.app:app --workers 1 --host 0.0.0.0 --port 8000 --proxy-headers --log-config chromadb/log_config.yml --timeout-keep-alive 30"
  • docker compose with command: "--workers 1 --host 0.0.0.0 --port 8000 --proxy-headers --log-config chromadb/log_config.yml --timeout-keep-alive 30"

All seems to be working fine. The warning is raised as expected when uvicorn is used in the command.

@MrZoidberg
Contributor Author

Thanks. Waiting for the merge and gonna replace my custom-build image with the official one.

@beggers beggers merged commit 691ac3f into chroma-core:main Feb 7, 2024
97 checks passed
@beggers
Contributor

beggers commented Feb 7, 2024

Thank you @MrZoidberg!

tazarov pushed a commit to amikos-tech/chroma-core that referenced this pull request Feb 10, 2024
4 participants