Support pip packages and pip GH installs in a conda environment.yaml file list. #4
Comments
+1 for support of |
main issue with pip is lack of determinism. But we can probably work around that |
The install from |
In a way that is what conda-env does, right? Not sure if it issues the --no-deps though. I don't want to make this overly complicated; the first pass should be just a warning instead of a failure: "You have pip packages in your environment file. This env won't pass a round trip and won't be fully reproducible. The pip packages are dropped from the lock file." |
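A minimal sketch of the "warn instead of fail" first pass described above. The function name `split_dependencies` and the warning text are illustrative assumptions, not conda-lock's actual API; the input is the `dependencies` list as parsed from an environment file.

```python
import warnings


def split_dependencies(dependencies):
    """Separate conda specs (strings) from the optional pip section (a dict).

    Hypothetical helper: warns, rather than raising, when pip packages exist.
    """
    conda_specs = [dep for dep in dependencies if isinstance(dep, str)]
    pip_specs = []
    for dep in dependencies:
        if isinstance(dep, dict):
            # An empty `pip:` key parses to None, so fall back to an empty list.
            pip_specs.extend(dep.get("pip") or [])
    if pip_specs:
        warnings.warn(
            "You have pip packages in your environment file. This env won't "
            "pass a round trip and won't be fully reproducible. Dropped from "
            "the lock file: " + ", ".join(pip_specs),
            UserWarning,
        )
    return conda_specs, pip_specs
```

The caller would lock only `conda_specs` and could still report or persist `pip_specs` separately.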
Remove pip dependencies that were parsed from the environment file; conda doesn't emit them into an `--explicit` export format since they're managed by pip in the conda env. Related to conda#4.
This attempts to address some parts of conda#4:
- no longer error when producing lock files for environments with pip dependencies
- output a `*.lock.pip` file that contains the pip-specified dependencies for the target environment

Note that any pip args passed in the original environment spec file (e.g. `--extra-index-url`) are dropped, since conda-lock processes the resolved environment from conda. Included some help text when producing pip 'lock' files.
Pip can check hashes for downloaded files: |
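As a rough illustration of the hash-checking idea: pip accepts a `--hash=sha256:<digest>` option on each line of a requirements file and enforces it when run with `--require-hashes`. The sketch below only formats such a line from the bytes of an already downloaded artifact; the helper name `requirement_with_hash` is made up for this example.

```python
import hashlib


def requirement_with_hash(name, version, artifact_bytes):
    """Return a requirements-file line pinned to the sha256 of the artifact.

    Hypothetical helper; `artifact_bytes` is the content of the wheel or sdist.
    """
    digest = hashlib.sha256(artifact_bytes).hexdigest()
    return f"{name}=={version} --hash=sha256:{digest}"
```

A file made of such lines could then be installed with `pip install --require-hashes -r <file>`, which rejects any download whose digest does not match.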
I guess that the main challenge here is to make conda really understand what was installed by TL;DR this may be more difficult than it sounds b/c one would have to dig into @mariusvniekerk what do you say? Am I way off here? |
So we can put some of the pip packages as a magic comment. |
One option that wouldn't quite achieve reproducibility on the

from copy import deepcopy

import yaml
from conda_lock.conda_lock import (
    ensure_conda,
    fn_to_dist_name,
    search_for_md5s,
    solve_specs_for_arch,
)


def load_env_file(file_name):
    """Read an environment YAML file into a plain dict."""
    with open(file_name, 'r') as f:
        # safe_load is enough here: environment files contain only plain data
        return yaml.safe_load(f)


def write_env_file(env_data, file_name):
    """Write environment data back out as YAML."""
    with open(file_name, 'w') as f:
        yaml.safe_dump(env_data, stream=f)


def lock_conda_specs(conda_dependencies: list, channels: list) -> list:
    """Given a list of conda dependencies return a list of locked dependencies."""
    conda_path = ensure_conda()
    platform = 'linux-64'
    dry_run_install = solve_specs_for_arch(
        conda=conda_path,
        channels=channels,
        specs=conda_dependencies,
        platform=platform
    )
    # check the solve result before reading the actions out of it
    if not dry_run_install['success']:
        raise RuntimeError('solve failed')
    link_actions = dry_run_install["actions"]["LINK"]
    for link in link_actions:
        link[
            "url_base"
        ] = f"{link['base_url']}/{link['platform']}/{link['dist_name']}"
        link["url"] = f"{link['url_base']}.tar.bz2"
        link["url_conda"] = f"{link['url_base']}.conda"
    link_dists = {link["dist_name"] for link in link_actions}
    fetch_actions = dry_run_install["actions"]["FETCH"]
    fetch_by_dist_name = {
        fn_to_dist_name(pkg["fn"]): pkg for pkg in fetch_actions
    }
    # packages already in the local cache have no FETCH entry; look up their md5s
    non_fetch_packages = link_dists - set(fetch_by_dist_name)
    if len(non_fetch_packages) > 0:
        for search_res in search_for_md5s(
            conda_path,
            [x for x in link_actions if x["dist_name"] in non_fetch_packages],
            platform,
        ):
            dist_name = fn_to_dist_name(search_res["fn"])
            fetch_by_dist_name[dist_name] = search_res
    pkgs = []
    for pkg in link_actions:
        url = fetch_by_dist_name[pkg["dist_name"]]["url"]
        md5 = fetch_by_dist_name[pkg["dist_name"]]["md5"]
        pkgs.append(f"{url}#{md5}")
    return pkgs


def lock_env_data(env_data):
    """Convert conda environment dependencies to locked specs using conda_lock.

    Args:
        env_data (dict): parsed environment file with 'channels' and 'dependencies'
    """
    # split dependencies into conda dependencies and pip dependencies
    deps = env_data['dependencies']
    conda_deps = [dep for dep in deps if isinstance(dep, str)]
    pip_deps = [dep for dep in deps if not isinstance(dep, str)]
    if len(pip_deps) > 1:
        raise ValueError("there is more than one dictionary in dependencies. Should be only pip")
    locked_conda_deps = lock_conda_specs(conda_deps, env_data['channels'])
    # the pip section is passed through unchanged (it is not locked)
    if pip_deps:
        locked_conda_deps.append(pip_deps[0])
    locked_env_data = deepcopy(env_data)
    locked_env_data['dependencies'] = locked_conda_deps
    return locked_env_data


def lock_env_file(env_file, locked_env_file):
    env_data = load_env_file(env_file)
    locked_env_data = lock_env_data(env_data)
    write_env_file(locked_env_data, locked_env_file)
    return locked_env_file

As an example, it converts this environment:

name: test
channels:
- conda-forge
dependencies:
- python=3.7
- pandas=1.0.5
- pip:
  - sidetable=0.7.0
prefix: /opt/conda/envs/test

To this:

channels:
- conda-forge
dependencies:
- https://conda.anaconda.org/conda-forge/linux-64/_libgcc_mutex-0.1-conda_forge.tar.bz2#d7c89558ba9fa0495403155b64376d81
- https://conda.anaconda.org/conda-forge/linux-64/ca-certificates-2020.6.20-hecda079_0.tar.bz2#1b1cca86e95c416a8e7eb6062af6d503
- https://conda.anaconda.org/conda-forge/linux-64/ld_impl_linux-64-2.34-hc38a660_9.tar.bz2#aa1e7603f8dd36f8d60026cda3f1fb2c
- https://conda.anaconda.org/conda-forge/linux-64/libgfortran-ng-7.5.0-hdf63c60_16.tar.bz2#d403b27c431064370f9d1b1962f8a86b
- https://conda.anaconda.org/conda-forge/linux-64/libstdcxx-ng-9.3.0-hdf63c60_16.tar.bz2#2c7c23cdad4f42f924d19029ef97475c
- https://conda.anaconda.org/conda-forge/linux-64/libgomp-9.3.0-h24d8f2e_16.tar.bz2#48f89ebfddb4ac93e74b0f4ab14c4a13
- https://conda.anaconda.org/conda-forge/linux-64/_openmp_mutex-4.5-1_gnu.tar.bz2#561e277319a41d4f24f5c05a9ef63c04
- https://conda.anaconda.org/conda-forge/linux-64/libgcc-ng-9.3.0-h24d8f2e_16.tar.bz2#846daf5c2a4dd387047cc5ccc6b9c613
- https://conda.anaconda.org/conda-forge/linux-64/libffi-3.2.1-he1b5a44_1007.tar.bz2#11389072d7d6036fd811c3d9460475cd
- https://conda.anaconda.org/conda-forge/linux-64/libopenblas-0.3.10-pthreads_hb3c22a3_4.tar.bz2#8e3914247353e97a184909dbee132bfb
- https://conda.anaconda.org/conda-forge/linux-64/ncurses-6.2-he1b5a44_1.tar.bz2#d3da4932f3d8e6b3c81fcf177d1e6eab
- https://conda.anaconda.org/conda-forge/linux-64/openssl-1.1.1g-h516909a_1.tar.bz2#6fdcd20ec22aeffa10b6102bccc47e7f
- https://conda.anaconda.org/conda-forge/linux-64/xz-5.2.5-h516909a_1.tar.bz2#33f601066901f3e1a85af3522a8113f9
- https://conda.anaconda.org/conda-forge/linux-64/zlib-1.2.11-h516909a_1009.tar.bz2#93486907c6757170a5125198506d9cf8
- https://conda.anaconda.org/conda-forge/linux-64/libblas-3.8.0-17_openblas.tar.bz2#fdd1790e564778bf0c616e639badfe58
- https://conda.anaconda.org/conda-forge/linux-64/readline-8.0-he28a2e2_2.tar.bz2#4d0ae8d473f863696088f76800ef9d38
- https://conda.anaconda.org/conda-forge/linux-64/tk-8.6.10-hed695b0_0.tar.bz2#9a3e126468fa7fb6a54caad41b5a2d45
- https://conda.anaconda.org/conda-forge/linux-64/libcblas-3.8.0-17_openblas.tar.bz2#28f6376d1c4ca5e0fc287fb0484e37a1
- https://conda.anaconda.org/conda-forge/linux-64/liblapack-3.8.0-17_openblas.tar.bz2#09cfbcdb4888dc9010b4cbc60e55c6ad
- https://conda.anaconda.org/conda-forge/linux-64/sqlite-3.33.0-h4cf870e_0.tar.bz2#b22603a9c94d2cda5911f7a2cd55aa95
- https://conda.anaconda.org/conda-forge/linux-64/python-3.7.8-h425cb1d_1_cpython.tar.bz2#3197fc7597f6d13d32350dd93e15f3e2
- https://conda.anaconda.org/conda-forge/linux-64/python_abi-3.7-1_cp37m.tar.bz2#658a5c3d766bfc6574480204b10a6f20
- https://conda.anaconda.org/conda-forge/noarch/pytz-2020.1-pyh9f0ad1d_0.tar.bz2#e52abc1f0fd70e05001c1ceb2696f625
- https://conda.anaconda.org/conda-forge/noarch/six-1.15.0-pyh9f0ad1d_0.tar.bz2#1eec421f0f1f39e579e44e4a5ce646a2
- https://conda.anaconda.org/conda-forge/linux-64/numpy-1.19.1-py37h7ea13bd_2.tar.bz2#f05213c1f8539d8ee086139df2b762c7
- https://conda.anaconda.org/conda-forge/noarch/python-dateutil-2.8.1-py_0.tar.bz2#0d0150ed9c2d25817f5324108d3f7571
- https://conda.anaconda.org/conda-forge/linux-64/pandas-1.0.5-py37h0da4684_0.tar.bz2#6fddaa88968614a9be807964f586e91c
- pip:
  - sidetable=0.7.0
name: test
prefix: /opt/conda/envs/test |
@sterlinm one relatively crude approach that we can take is to harvest the pip packages and embed them as a special comment in the lockfile that installers can use |
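One way the "special comment" idea might look, as a sketch only. The `# pip: ` marker is a made-up convention for illustration, not a format conda-lock defines. It relies on conda treating lines that start with `#` in an explicit spec file as comments, so the embedded pip specs would be invisible to conda and only meaningful to an installer that knows the marker.

```python
PIP_MARKER = "# pip: "  # hypothetical marker, chosen for this example only


def embed_pip_specs(lock_lines, pip_specs):
    """Append each pip spec to the lockfile lines as a marked comment."""
    return list(lock_lines) + [f"{PIP_MARKER}{spec}" for spec in pip_specs]


def split_lockfile(lock_lines):
    """Recover (conda_lines, pip_specs) from lockfile lines."""
    conda_lines, pip_specs = [], []
    for line in lock_lines:
        if line.startswith(PIP_MARKER):
            pip_specs.append(line[len(PIP_MARKER):].strip())
        else:
            conda_lines.append(line)
    return conda_lines, pip_specs
```

An installer could pass `conda_lines` to conda as usual and then hand `pip_specs` to pip inside the created environment.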
It would be wonderful to make progress on this issue. Does the new pip resolver help? Also, pip-tools can create "lock" files as well. |
Using the new resolver / pip-tools might be feasible with some pretty aggressive hacks
And for the installer
|
@mariusvniekerk That seems like a feasible approach. We do something similar now, but don't actually ensure the transitive dependencies of the anaconda and pip packages are compatible. Would it be possible to reverse the order of the pip and conda resolution in your algorithm? One thing we've noticed is that |
Also, this tool seems relevant. It somehow combines conda, pip, and nix packages, and has its own dependency resolution approach: https://github.com/DavHau/mach-nix. |
Just to chip in: support for |
@RafalSkolasinski we have a similar problem, but we use conda lock for the anaconda dependencies and pip-tools for the pip packages. Of course, there could be some inconsistencies between the conda and pip lock files, but the setup is still deterministic, so it doesn't break randomly. |
I have the same issue as @RafalSkolasinski and @nbren12. I'm considering using @nbren12's approach of running the conda-lock and pip-compile tools independently, but I'm a bit nervous about incompatibilities from pip overwriting the conda dependency versions, and also this results in wasted space in Docker images. Still, good point that at least it's deterministic. For posterity, I asked a StackOverflow question about this: https://stackoverflow.com/questions/68171629/how-do-i-pin-versioned-dependencies-in-python-when-using-both-conda-and-pip |
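When the two lock files are produced independently, a cheap sanity check is to look for packages pinned in both at different versions. The sketch below assumes the conda side is a list of explicit package URLs and the pip side is a list of `name==version` lines such as pip-compile emits; all function names are illustrative. It also assumes conda and PyPI use the same package name, which is not always true, so a clean result does not prove the two sets are compatible.

```python
import re


def _normalize(name):
    """Lowercase and collapse '-', '_' and '.' runs, as PyPI does for names."""
    return re.sub(r"[-_.]+", "-", name).lower()


def conda_pins(explicit_urls):
    """Map package name -> version from conda package URLs."""
    pins = {}
    for url in explicit_urls:
        filename = url.split("#", 1)[0].rsplit("/", 1)[-1]
        for ext in (".tar.bz2", ".conda"):
            if filename.endswith(ext):
                filename = filename[: -len(ext)]
        # conda file names have the form <name>-<version>-<build>
        name, version, _build = filename.rsplit("-", 2)
        pins[_normalize(name)] = version
    return pins


def pip_pins(requirement_lines):
    """Map package name -> version from 'name==version' requirement lines."""
    pins = {}
    for line in requirement_lines:
        # drop comments and environment markers before looking for the pin
        line = line.split("#", 1)[0].split(";", 1)[0].strip()
        if "==" not in line:
            continue  # e.g. '--hash=...' continuation lines
        name, version = line.split("==", 1)
        pins[_normalize(name.strip())] = version.split()[0]
    return pins


def find_conflicts(conda, pip):
    """Return {name: (conda_version, pip_version)} where the pins disagree."""
    return {
        name: (conda[name], pip[name])
        for name in conda.keys() & pip.keys()
        if conda[name] != pip[name]
    }
```

Running `find_conflicts(conda_pins(...), pip_pins(...))` in CI would at least flag the case where pip is about to overwrite a conda-installed version.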
In case it helps others, I went with a heavier weight approach of installing the conda+pip dependencies in a temporary conda environment and then using I wrote up this approach here: https://gist.github.com/jli/b2d2d62ad44b7fcb5101502c08dca1ae |
@jli for a realized env like that you can use https://github.com/olegtarasov/conda-export That "heavy weight" approach is probably the only way to solve this at the moment. |
@ocefpaf Hm, I'm not sure I understand how I guess |
Oh. Sorry, it was missing some context. Not using conda-export per se but the part that can figure out what was pip installed vs conda installed could be used to perform a "conda-lock" in a realized env. |
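A sketch of how the pip-installed vs conda-installed split might be done for a realized env. It works on the text produced by `conda list --json` and assumes, as an unverified premise of this example, that pip-installed packages are reported with `"channel": "pypi"`. The function name is hypothetical, and this is not how conda-export is implemented.

```python
import json


def split_by_installer(conda_list_json):
    """Split `conda list --json` records into (conda_pkgs, pip_pkgs).

    Assumption: records for pip-installed packages carry channel == "pypi".
    """
    records = json.loads(conda_list_json)
    conda_pkgs = [r for r in records if r.get("channel") != "pypi"]
    pip_pkgs = [r for r in records if r.get("channel") == "pypi"]
    return conda_pkgs, pip_pkgs
```

The input could be captured by running `conda list --json` against the temporary environment; the conda half would then go into the lock file and the pip half into pinned `name==version` lines.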
Original example environment in this issue includes a
#197 is related but not currently equivalent since it specifically references installing from private GitHub repos. Should that issue be expanded, this one reopened, or a new one created? |
If you have something like:
conda-lock will crash. We should either fail more gracefully or add support for pip installs too.