Rid ourselves of pipenv, requirements files, dev containers and other unnecessary complexity #914

Merged
merged 8 commits on Oct 8, 2024
15 changes: 3 additions & 12 deletions .docker/app_dockerfile
@@ -10,19 +10,10 @@ COPY webapp/package.json webapp/yarn.lock ./

# Using a custom node_modules location to avoid mounting it outside of docker
RUN --mount=type=cache,target=/root/.cache/yarn yarn install --frozen-lockfile --modules-folder /node_modules
ENV PATH $PATH:/node_modules/.bin
ENV PATH=$PATH:/node_modules/.bin

FROM base as development

# Install cypress dependencies for local testing
RUN apt-get update \
&& apt install -y libgtk2.0-0 libgtk-3-0 libgbm-dev libnotify-dev libgconf-2-4 libnss3 libxss1 libasound2 libxtst6 xauth xvfb \
&& rm -rf /var/lib/apt/lists/*

CMD [ "yarn", "serve", "--port", "8081"]

FROM base as production
ENV NODE_ENV production
FROM base AS production
ENV NODE_ENV=production

# These get replaced by the entrypoint script for production builds.
# Set the real values in `.env` files or an external docker-compose.
37 changes: 9 additions & 28 deletions .docker/server_dockerfile
@@ -12,9 +12,7 @@ RUN apt update && apt install -y gnupg curl tree mdbtools && apt clean
WORKDIR /opt
RUN wget https://fastdl.mongodb.org/tools/db/mongodb-database-tools-ubuntu2204-x86_64-100.9.0.deb && apt install ./mongodb-database-tools-*-100.9.0.deb

FROM base as app

WORKDIR /app
FROM base AS app

COPY --from=ghcr.io/astral-sh/uv:0.4 /uv /usr/local/bin/uv
ENV UV_LINK_MODE=copy \
@@ -23,31 +21,18 @@ ENV UV_LINK_MODE=copy \
UV_PROJECT_ENVIRONMENT=/opt/.venv \
UV_PYTHON=python3.10

WORKDIR /opt
COPY pydatalab/pyproject.toml /_lock/
COPY pydatalab/uv.lock /_lock/
COPY pydatalab/requirements/requirements-all.txt /_lock/
RUN uv venv
RUN uv pip install -r /_lock/requirements-all.txt

# Create development image using flask's dev server with hot-reload
FROM app as development


WORKDIR /app
ENV FLASK_APP "pydatalab.main"
ENV FLASK_ENV "development"
ENV PORT=5001
CMD [ "/bin/bash", "-c", "source /opt/.venv/bin/activate && exec flask run --reload --port ${PORT} --host 0.0.0.0" ]

# Create production image using gunicorn and minimal dependencies
FROM app as production

WORKDIR /opt
RUN [ "uv", "pip", "install", "gunicorn" ]
COPY ./pydatalab/pyproject.toml .
COPY ./pydatalab/uv.lock .
RUN uv sync --locked --no-dev --all-extras

FROM app AS production
WORKDIR /app

# Install the local version of the package and mount the repository data to get version info
COPY ./pydatalab/ ./
RUN git config --global --add safe.directory /
RUN --mount=type=bind,target=/.git,source=./.git uv pip install --python /opt/.venv/bin/python --no-deps .

# This will define the number of gunicorn workers
ARG WEB_CONCURRENCY=4
@@ -57,10 +42,6 @@ ARG PORT=5001
EXPOSE ${PORT}
ENV PORT=${PORT}

# Install the local version of the package and mount the repository data to get version info
RUN git config --global --add safe.directory /
RUN --mount=type=bind,target=/.git,source=./.git ["uv", "pip", "install", "--python", "/opt/.venv/bin/python", "--no-deps", "."]

CMD ["/bin/bash", "-c", "/opt/.venv/bin/python -m gunicorn --preload -w ${WEB_CONCURRENCY} --error-logfile /logs/pydatalab_error.log --access-logfile - -b 0.0.0.0:${PORT} 'pydatalab.main:create_app()'"]

HEALTHCHECK --interval=30s --timeout=30s --start-interval=15s --start-period=30s --retries=3 CMD curl --fail http://localhost:${PORT}/healthcheck/is_ready || exit 1
36 changes: 2 additions & 34 deletions .github/workflows/ci.yml
@@ -35,7 +35,7 @@ jobs:
- name: Install dependencies
working-directory: ./pydatalab
run: |
uv sync --locked --all-extras
uv sync --locked --all-extras --dev

- name: Run pre-commit
working-directory: ./pydatalab
@@ -83,7 +83,7 @@ jobs:
- name: Install locked versions of dependencies
working-directory: ./pydatalab
run: |
uv sync --locked --all-extras
uv sync --locked --all-extras --dev

- name: Run all tests
working-directory: ./pydatalab
@@ -120,38 +120,6 @@ jobs:
working-directory: ./webapp
run: yarn build

docker:
name: Test dev Docker builds
runs-on: ubuntu-latest

steps:
- uses: actions/checkout@v4

- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3

- name: Build Docker containers
uses: docker/bake-action@v5
with:
files: docker-compose.yml
load: true
targets: ${{ matrix.target }}
set: |
app_dev.cache-to=type=gha,scope=build-app_dev,mode=max
app_dev.cache-from=type=gha,scope=build-app_dev
app_dev.tags=datalab-app_dev:latest
api_dev.cache-to=type=gha,scope=build-api_dev,mode=max
api_dev.cache-from=type=gha,scope=build-api_dev
api_dev.tags=datalab-api_dev:latest
database_dev.cache-to=type=gha,scope=build-database-dev,mode=max
database_dev.cache-from=type=gha,scope=build-database_dev
database_dev.tags=datalab-database_dev:latest

- name: Start services
run: |
# Launch dev container profiles and wait for them to come up with healthchecks
docker compose up app_dev api_dev database_dev --wait --no-build --force-recreate -d

e2e:
name: e2e tests
runs-on: ubuntu-latest
3 changes: 1 addition & 2 deletions .readthedocs.yaml
@@ -10,7 +10,6 @@ build:
- asdf plugin add uv
- asdf install uv latest
- asdf global uv latest
- cd pydatalab && uv venv
- cd pydatalab && uv pip install -r requirements/requirements-all-dev.txt
- cd pydatalab && uv sync --all-extras --dev
- cd pydatalab && uv pip install .
- cd pydatalab && .venv/bin/mkdocs build --site-dir $READTHEDOCS_OUTPUT/html
89 changes: 28 additions & 61 deletions INSTALL.md
@@ -9,7 +9,6 @@ instance, in which case please check out the separate Python API package at

The instructions below outline how to make a development installation on your local machine.
We strongly recommend following the [deployment instructions](deployment.md) on [docs.datalab-org.io](https://docs.datalab-org.io/en/stable/deployment/) if you are deploying for use in production.
These instructions are also useful for developers who want to use Docker to create a reproducible development environment.

This repository consists of two components:

@@ -23,6 +22,7 @@ This repository consists of two components:
To run *datalab*, you will need to install the environments for each component.

Firstly, from the desired folder, clone this repository from GitHub to your local machine with `git clone https://github.com/datalab-org/datalab`.
If you are not familiar with `git` or GitHub, you can do worse than reading through the [GitHub getting started documentation](https://docs.github.com/en/get-started/start-your-journey/about-github-and-git).

### `pydatalab` server installation

@@ -33,96 +33,73 @@ The instructions in this section will leave you with a running *datalab* server
*datalab* uses MongoDB as its database backend.
This requires a MongoDB server to be running on your desired host machine.

1. Install the free MongoDB community edition (full instructions on the [MongoDB website](https://docs.mongodb.com/manual/installation/)).
* For Mac users, MongoDB is available via [HomeBrew](https://github.com/mongodb/homebrew-brew).
- You can alternatively run the MongoDB via Docker using the config in this package with `docker compose up database` (see [deployment instructions](deploy.md).
1. Install the free MongoDB community edition (see the full instructions for your OS on the [MongoDB website](https://docs.mongodb.com/manual/installation/)).
* For MacOS users, MongoDB is available via [HomeBrew](https://github.com/mongodb/homebrew-brew).
* You can alternatively run the MongoDB via Docker using the config in this package with `docker compose up database` (see [deployment instructions](deploy.md)).
* If you wish to view the database directly, MongoDB has several GUIs, e.g. [MongoDB Compass](https://www.mongodb.com/products/compass) or [Studio 3T](https://robomongo.org/).
- For persistence, you will need to set up MongoDB to run as a service on your computer (or run manually each time you run the `pydatalab` server).
* For persistence, you will need to set up MongoDB to run as a service on your computer (or run manually each time you run the `pydatalab` server).
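Before starting the `pydatalab` server, it can be useful to confirm that your MongoDB service is actually up. The following is a minimal sketch (not part of the repository) that simply checks whether something is listening on the assumed default host `localhost` and port `27017`:

```python
import socket


def mongo_reachable(host="localhost", port=27017, timeout=2.0):
    """Return True if a TCP connection to the (assumed) MongoDB port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable
        return False


if __name__ == "__main__":
    print("MongoDB reachable:", mongo_reachable())
```

Note that a successful connection only shows a service is listening on that port, not that it is a healthy MongoDB instance; use a GUI or `mongosh` for a fuller check.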

#### Python setup

The next step is to set up a Python environment that contains all of the required dependencies with the correct versions.
You will need Python 3.10 or higher to run *datalab*; we recommend using
something like [`pyenv`](https://github.com/pyenv/pyenv) to manage Python versions on your machine, to avoid breakages based on your OS's Python versioning.
You will need Python 3.10 or higher to run *datalab*; we recommend using a tool to manage Python versions on your machine, to avoid breakages based on your OS's Python versioning (e.g., [`pyenv`](https://github.com/pyenv/pyenv) or [`uv`](https://github.com/astral-sh/uv)).

##### Using virtual environments with `uv` or `venv`
##### Installation with `uv` or `venv`

We recommend using a virtual environment tool of your choice to manage the dependencies for the
Python server, for example [`uv`](https://github.com/astral-sh/uv) (see
repository for installation instructions), or the
standard library Python `venv` module.
We recommend using [`uv`](https://github.com/astral-sh/uv) (see the linked repository or https://docs.astral.sh/uv for installation instructions) for managing your *datalab* installation.

You could also use the standard library `venv` module, but this will not allow you to install pinned dependencies as easily, and is significantly slower than `uv`.

1. Create a virtual environment for *datalab*, ideally inside the `pydatalab` directory.
- For `uv`, this can be done with `uv venv`.
- For `uv`, you can run `uv venv` (when installing using `uv sync`, this will be done automatically on installation).
- For `venv`, this can be done with `python -m venv .venv`.
- You will be left with a folder called `.venv` that bundles a Python
environment.
2. Activate the virtual environment (optional for `uv`) and install dependencies. One can either use the loosely pinned dependencies in `pyproject.toml`, or the locked versions in the `requirements/requirements-all-dev.txt` and `requirements/requirements-all.txt` files.
- Either way, you will be left with a folder called `.venv` in your `pydatalab` directory that bundles an entire Python environment.
2. Activate the virtual environment (again, optional for `uv`) and install dependencies. One can either use the loosely pinned dependencies in `pyproject.toml`, or the locked versions in `uv.lock`.

=== "Installation with `uv`"

```shell
# EITHER: Install all dependencies with locked versions, then install the local package
uv pip install -r requirements/requirements-all-dev.txt
uv pip install -e '.[all,dev]'
# EITHER: Install all dependencies with locked versions (recommended)
uv sync --all-extras --dev --locked
# OR: Install all dependencies with loosely pinned versions
uv pip install -e '.[all,dev]'
uv pip install -e '.[all]'
```

=== "Installation with `venv`"

```shell
source .venv/bin/activate
# EITHER: Install all dependencies with locked versions, then install the local package
pip install -r requirements/requirements-all-dev.txt
pip install -e '.[all, dev]'
# OR: Install all dependencies with loosely pinned versions
pip install -e '.[all, dev]'
# Install all dependencies with loosely pinned versions
pip install -e '.[all]'
```

##### Using `pipenv` (DEPRECATED)

Previously, *datalab* used `pipenv` for dependency management.
We maintain a `pipenv` lockfile (`Pipfile.lock`) of all dependencies that must be installed to run the server, though this will be removed in future versions.

To make use of this file:

1. Install `pipenv` on your machine.
- Detailed instructions for installing `pipenv`, `pip` and Python itself can be found on the [`pipenv` website](https://pipenv.pypa.io/en/latest/install/#installing-pipenv).
- We recommend you install `pipenv` from PyPI (with `pip install pipenv` or `pip install --user pipenv`) for the Python distribution of your choice (in a virtual environment or otherwise). `pipenv` will be used to create its own virtual environment for installation of the `pydatalab` package.
1. Install the `pydatalab` package.
- Navigate to the `pydatalab` folder and run `pipenv sync --dev`.
- The default Python executable on your machine must be 3.10+, otherwise this must be specified explicitly at this point).
- This will create a `pipenv` environment for `pydatalab` and all of its dependencies that is registered within *this folder* only.
- You can remove this environment to start fresh at any time by running `pipenv --rm` from within this directory.

#### Running the development server

1. Run the server from the `pydatalab` folder with either:

=== "Launching with `uv` or `venv`"
=== "Launching with `uv`"

```shell
cd pydatalab
source .venv/bin/activate
flask --app 'pydatalab:main' run --reload
uv run flask --app 'pydatalab.main' run --reload --port 5001
```

=== "Launching with `pipenv`"
=== "Launching with `venv`"

```shell
cd pydatalab
pipenv run flask --app 'pydatalab:main' run --reload
source .venv/bin/activate
flask --app 'pydatalab.main' run --reload --port 5001
```

The server should now be accessible at [http://localhost:5001](http://localhost:5001).
If the server is running, navigating to this URL will display a simple dashboard.
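To script against the running dev server, you can poll the same readiness endpoint that the Docker healthcheck uses. A hedged, standard-library-only sketch; the port and the `/healthcheck/is_ready` route match the defaults shown above, but adjust them to your configuration:

```python
import urllib.request

PORT = 5001  # assumed default dev port from the flask command above
HEALTH_URL = f"http://localhost:{PORT}/healthcheck/is_ready"


def is_ready(url=HEALTH_URL, timeout=5.0):
    """Return True if the datalab API answers its readiness check with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection errors, timeouts, and urllib's URLError
        return False
```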

Should you wish to contribute to/modify the Python code, you may wish to perform these extra steps:

1. From an activated virtual environment, run `pre-commit install` to begin using `pre-commit` to check all of your modifications when you run `git commit`.
1. From an activated virtual environment (or prepending `uv run`), run `pre-commit install` to begin using `pre-commit` to check all of your modifications when you run `git commit`.
- The hooks that run on each commit can be found in the top-level `.pre-commit-config.yml` file.
1. From an activate virtual environment, the tests on the Python code can be run by executing `pytest` from the `pydatalab/` folder.
1. From an activated virtual environment, the tests on the Python code can be run by executing `pytest` from the `pydatalab/` folder (or `uv run pytest`).

#### Additional notes

@@ -161,9 +138,9 @@ Previously, *datalab* used `pipenv` for dependency management, which enforced a
strict lockfile of dependencies that effectively forced all dependencies to be updated when
adding a new one.
This is no longer the case, and the `pyproject.toml` file is now the canonical
source of dependencies, however, `requirements` files are maintained for the
purpose of strict locking for deployment and testing.
Now, we use the `pip-tools`-esque functionality of `uv` to create lock files
source of dependencies, with `uv.lock` providing the strict locked versions for
testing.
Now, we use the `uv` functionality to create lock files
(and thus it is assumed that you installed the package in a `uv` virtual
environment, as described above).

@@ -176,18 +153,8 @@ underlying project updates.
Finally, recreate the lock files with:

```shell
uv pip compile pyproject.toml -o requirements/requirements-all-dev.txt --extra all --extra dev
uv pip compile pyproject.toml -o requirements/requirements-all.txt --extra all
uv lock
```

You should then inspect the changes to the requirements files (only your new
package and its subdependencies should have been added) and commit the changes.

> Regenerating the `Pipfile.lock` will not be necessary for long, but in this
> case it can be synced with the requirements.txt files via `pipenv install -r requirements/requirements-all-dev.txt`,
> and the resulting `Pipfile.lock` can be committed to the repository.

### Test server authentication/authorisation

There are two approaches to authentication when developing *datalab* features locally.
44 changes: 0 additions & 44 deletions docker-compose.yml
@@ -19,18 +19,6 @@ services:
ports:
- "8081:8081"

app_dev:
profiles: ["dev"]
build:
context: .
dockerfile: .docker/app_dockerfile
target: development
volumes:
- ./logs:/logs
- ./webapp:/app
ports:
- "8081:8081"

api:
profiles: ["prod"]
build:
@@ -53,38 +41,6 @@ services:
environment:
- PYDATALAB_MONGO_URI=mongodb://database:27017/datalabvue

api_dev:
profiles: ["dev"]
build:
context: .
dockerfile: .docker/server_dockerfile
target: development
depends_on:
- database_dev
volumes:
- ./logs:/logs
- ./.git:/.git
- ./pydatalab/src:/app
ports:
- "5001:5001"
networks:
- backend
environment:
- PYDATALAB_MONGO_URI=mongodb://database_dev:27017/datalabvue

database_dev:
profiles: ["dev"]
build:
context: .
dockerfile: .docker/mongo_dockerfile
volumes:
- ./logs:/var/logs/mongod
restart: unless-stopped
networks:
- backend
ports:
- "27017:27017"

database:
profiles: ["prod"]
build: