Dockerfile for census builder #267

Merged: 3 commits, Mar 17, 2023. Changes shown are from 1 commit.
Dockerfile: 22 changes (22 additions, 0 deletions)
@@ -0,0 +1,22 @@
FROM ubuntu:22.04

ENV DEBIAN_FRONTEND=noninteractive

ARG COMMIT_SHA
ENV COMMIT_SHA=${COMMIT_SHA}


# RUN apt-get update && \
# apt-get install -y python3 libhdf5-dev python3-h5py gettext moreutils build-essential libxml2-dev python3-dev python3-pip zlib1g-dev python3-requests python3-aiohttp llvm jq && \
# rm -rf /var/lib/apt/lists/*

RUN apt-get update && \
    apt-get install -y python3.10-venv python3-pip awscli gh jq && \
    rm -rf /var/lib/apt/lists/*

COPY tools/cell_census_builder/ /tools/cell_census_builder
COPY tools/scripts/requirements.txt .
COPY entrypoint.py .
COPY build-census.yaml .

RUN python3 -m pip install -r requirements.txt

ENTRYPOINT ["./entrypoint.py"]
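Since the image deliberately omits the .git folder, the commit SHA has to be supplied when the image is built. A minimal sketch of that step in Python, assuming the image is built from the repository root with the Docker and git CLIs available on the host; `docker_build_cmd`, `host_commit_sha`, and the `census-builder` tag are illustrative names, not part of this PR:

```python
import subprocess
from typing import List


def docker_build_cmd(commit_sha: str, tag: str = "census-builder", context: str = ".") -> List[str]:
    """Return a `docker build` argv that forwards `commit_sha` as the COMMIT_SHA build arg."""
    # The Dockerfile copies ARG COMMIT_SHA into ENV COMMIT_SHA, so this value is
    # what the builder sees at run time. The default tag is a hypothetical name.
    return ["docker", "build", "--build-arg", f"COMMIT_SHA={commit_sha}", "-t", tag, context]


def host_commit_sha(repo_dir: str = ".") -> str:
    """Resolve HEAD with the git CLI on the host, where the .git folder exists."""
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()
```

Running `subprocess.run(docker_build_cmd(host_commit_sha()), check=True)` from the repository root would then build an image whose `COMMIT_SHA` environment variable matches the checked-out commit. If `--build-arg COMMIT_SHA=...` is omitted, the `ENV` line still defines `COMMIT_SHA`, but as an empty string.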
build-census.yaml: 11 changes (11 additions, 0 deletions)
@@ -0,0 +1,11 @@
census-builder:
  uri:
    /data/cell-census-small/
  verbose:
    true
  commands:
    build:
      manifest:
        /data/manifest-small.csv
      test-disable-dirty-git-check:
        true
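The entrypoint walks this file as nested mappings. Assuming the intended nesting (builder-level options plus a `commands` mapping of subcommands and their options), `yaml.safe_load` would return the equivalent of this Python literal, with YAML `true` loaded as Python `True`:

```python
# Python equivalent of what yaml.safe_load returns for build-census.yaml,
# written out as a literal so the nesting is explicit.
config = {
    "census-builder": {
        "uri": "/data/cell-census-small/",
        "verbose": True,
        "commands": {
            "build": {
                "manifest": "/data/manifest-small.csv",
                "test-disable-dirty-git-check": True,
            },
        },
    },
}

builder = config["census-builder"]
# Keys other than "commands" are builder-level options; each key under
# "commands" names a subcommand that carries its own options.
global_opts = {key: val for key, val in builder.items() if key != "commands"}
subcommands = list(builder["commands"])
```

This is the same split the entrypoint makes when `add_args` skips the `commands` key.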
entrypoint.py: 31 changes (31 additions, 0 deletions)
@@ -0,0 +1,31 @@
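#!/usr/bin/env python3

import subprocess
import sys

import yaml


def add_args(opts, args):
    """Translate one level of the YAML config into command-line arguments."""
    for opt_key, opt_val in opts.items():
        if opt_key == "uri":
            # The URI is the builder's positional argument
            args.append(str(opt_val))
        elif opt_key == "commands":
            # Subcommands are handled separately below
            continue
        elif isinstance(opt_val, bool):
            # Boolean options become flags, emitted only when true
            if opt_val:
                args.append(f"--{opt_key}")
        else:
            # subprocess requires every argument to be a string
            args.append(f"--{opt_key}")
            args.append(str(opt_val))


with open("build-census.yaml") as y:
    config = yaml.safe_load(y)

args = ["python3", "-m", "tools.cell_census_builder"]
builder = config["census-builder"]
add_args(builder, args)
for cmd, opts in builder["commands"].items():
    subcommand_args = args + [cmd]
    # A subcommand listed without options loads as None
    add_args(opts or {}, subcommand_args)
    print(subcommand_args)
    returncode = subprocess.call(subcommand_args)
    if returncode != 0:
        # Stop at the first failing subcommand and propagate its exit code
        sys.exit(returncode)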
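Because the YAML-to-argv translation is pure list-building, it can be unit-tested without launching the builder. A dry-run sketch that mirrors the entrypoint's rules, where `build_argv` is a hypothetical helper that returns one argv per subcommand instead of running it; it assumes that `false` booleans should be omitted and that non-string values should be stringified:

```python
from typing import Any, Dict, List


def add_args(opts: Dict[str, Any], args: List[str]) -> None:
    """Append CLI arguments for one level of the config (same rules as the entrypoint)."""
    for key, val in opts.items():
        if key == "uri":
            args.append(str(val))  # positional argument
        elif key == "commands":
            continue  # subcommands are expanded by build_argv
        elif isinstance(val, bool):
            if val:
                args.append(f"--{key}")  # flag present only when true
        else:
            args.extend([f"--{key}", str(val)])


def build_argv(builder: Dict[str, Any]) -> List[List[str]]:
    """Hypothetical dry-run helper: one argv list per subcommand, nothing executed."""
    base = ["python3", "-m", "tools.cell_census_builder"]
    add_args(builder, base)
    result = []
    for cmd, opts in builder.get("commands", {}).items():
        argv = base + [cmd]
        add_args(opts or {}, argv)
        result.append(argv)
    return result
```

For the sample build-census.yaml this yields `python3 -m tools.cell_census_builder /data/cell-census-small/ --verbose build --manifest /data/manifest-small.csv --test-disable-dirty-git-check`, which differs from the command in entrypoint.sh only in spelling the verbosity flag `--verbose` instead of `-v`.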
Comment from the PR author on `subprocess.call(subcommand_args)`:
Instead of calling subprocess, we could modify the builder so that it can be called directly. At minimum, this would require changing its main function to accept a Namespace, but for now I'd like to validate this approach before we make major changes.
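The comment above suggests calling the builder in-process with a `Namespace`. A sketch of that shape, where `make_parser` and `main` are hypothetical stand-ins (the builder's real parser and entry point are not part of this diff) and only the options used by build-census.yaml are modeled:

```python
import argparse
from typing import List, Optional


def make_parser() -> argparse.ArgumentParser:
    # Hypothetical parser modeling only the options that build-census.yaml uses.
    parser = argparse.ArgumentParser(prog="cell_census_builder")
    parser.add_argument("uri")
    parser.add_argument("-v", "--verbose", action="store_true")
    sub = parser.add_subparsers(dest="subcommand", required=True)
    build = sub.add_parser("build")
    build.add_argument("--manifest")
    build.add_argument("--test-disable-dirty-git-check", action="store_true")
    return parser


def main(args: argparse.Namespace) -> int:
    # Hypothetical entry point that takes an already-parsed Namespace
    # instead of reading sys.argv itself.
    print(f"would run {args.subcommand} on {args.uri} (manifest={args.manifest})")
    return 0


def run(argv: Optional[List[str]] = None) -> int:
    # Command-line path: parse argv (or sys.argv when None), then delegate.
    return main(make_parser().parse_args(argv))
```

With this shape, the entrypoint could pass the argv list it already assembles to `make_parser().parse_args(...)`, or construct an `argparse.Namespace` directly, and call `main` without spawning a second Python process. The trade-off is that a crash in the builder would also take down the entrypoint, whereas `subprocess` keeps them isolated.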

entrypoint.sh: 3 changes (3 additions, 0 deletions)
@@ -0,0 +1,3 @@
#!/bin/bash

python3 -m tools.cell_census_builder /data/cell-census-small/ -v build --manifest /data/manifest-small.csv --test-disable-dirty-git-check

Comment from the PR author on `#!/bin/bash`:
This file is no longer necessary, but I included it as an example in case we want a simpler build process that doesn't use the YAML file.
tools/cell_census_builder/util.py: 10 changes (9 additions, 1 deletion)
@@ -1,8 +1,8 @@
import os
import time
import urllib.parse
from typing import Any, Iterator, Optional, Union

import git
import numpy as np
import numpy.typing as npt
import pandas as pd
@@ -134,6 +134,12 @@ def get_git_commit_sha() -> str:
    """
    Returns the git commit SHA for the current repo
    """
    # Try to get the git commit SHA from the COMMIT_SHA env variable.
    # An empty value (e.g. an image built without --build-arg COMMIT_SHA) is treated as unset.
    commit_sha_var = os.getenv("COMMIT_SHA")
    if commit_sha_var:
        return commit_sha_var
    import git  # Scoped import - this requires the git executable to exist on the machine

    repo = git.Repo(search_parent_directories=True)
    hexsha: str = repo.head.object.hexsha
    return hexsha

Comment from the PR author on `commit_sha_var = os.getenv("COMMIT_SHA")`:
This is necessary because we don't want the .git folder to be in the Docker image, so we need to pass the commit SHA externally.
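The lookup order in `get_git_commit_sha` (environment first, git second) can be exercised without a repository by injecting the fallback. A sketch under that assumption, where `resolve_commit_sha` and its `from_git` parameter are hypothetical; it treats an empty `COMMIT_SHA` as unset, because `ENV COMMIT_SHA=${COMMIT_SHA}` in the Dockerfile defines the variable as an empty string when no build arg is given:

```python
import os
from typing import Callable, Mapping, Optional


def resolve_commit_sha(
    env: Optional[Mapping[str, str]] = None,
    from_git: Optional[Callable[[], str]] = None,
) -> str:
    """Return COMMIT_SHA from the environment, else fall back to git.

    `from_git` is a hypothetical injection point standing in for the
    GitPython lookup, so this sketch runs without a repository.
    """
    env = os.environ if env is None else env
    sha = env.get("COMMIT_SHA")
    # An empty string counts as "not provided", not as a valid SHA.
    if sha:
        return sha
    if from_git is None:
        raise RuntimeError("COMMIT_SHA is not set and no git fallback is available")
    return from_git()
```

Injecting the fallback keeps the environment path testable on machines where importing GitPython would fail.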
@@ -143,6 +149,8 @@ def is_git_repo_dirty() -> bool:
    """
    Returns True if the git repo is dirty, i.e. there are uncommitted changes
    """
    import git  # Scoped import - this requires the git executable to exist on the machine

    repo = git.Repo(search_parent_directories=True)
    is_dirty: bool = repo.is_dirty()
    return is_dirty
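The scoped `import git` keeps util.py importable on machines without a git executable, such as a slim container image; GitPython raises an ImportError at import time if it cannot find git. A generic sketch of the same deferral using `importlib`, where `require_module` is a hypothetical helper that turns a missing optional dependency into a clearer error at the call site:

```python
import importlib
from types import ModuleType


def require_module(name: str, purpose: str) -> ModuleType:
    """Import `name` on first use instead of at module load time.

    Hypothetical helper: code paths that never call this never import
    (or fail on) the optional dependency.
    """
    try:
        return importlib.import_module(name)
    except ImportError as exc:  # also covers ModuleNotFoundError
        raise RuntimeError(f"{purpose} requires the {name!r} module") from exc
```

Inside a function such as `is_git_repo_dirty`, `git = require_module("git", "is_git_repo_dirty")` would replace the bare scoped import; the design choice is the same, only the error message differs.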