Support pip packages and pip GH installs in a conda environment.yaml file list. #4

Closed
ocefpaf opened this issue Feb 24, 2020 · 22 comments

Comments

@ocefpaf
Contributor

ocefpaf commented Feb 24, 2020

If you have something like:

name: test
channels:
  - conda-forge
dependencies:
  - python=3.7
  - pip:
    - some-pkg
    - git+https://github.com/someuser/another-pkg.git@master

conda-lock will crash. We should either fail more gracefully or add support for pip installs too.

@marcelotrevisani
Member

+1 for support of pip install --no-deps

@mariusvniekerk
Collaborator

The main issue with pip is its lack of determinism, but we can probably work around that.

@ocefpaf
Contributor Author

ocefpaf commented Feb 26, 2020

The install from master example above is just wrong IMO and I added it to discuss some sort of warning for conda-lock. For PyPI packages we can probably look into pip-tools for inspiration.

@ocefpaf
Contributor Author

ocefpaf commented Feb 27, 2020

> +1 for support of pip install --no-deps

In a way that is what conda-env does, right? Not sure if it issues the --no-deps though.

I don't want to make this overly complicated; the first pass should be just a warning instead of a failure: "You have pip packages in your environment file. This env won't pass a round trip and won't be fully reproducible. The pip packages are dropped from the lock file."
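
A minimal sketch of that first pass, assuming the environment file has already been parsed into a dict (for example with PyYAML). The helper name `split_pip_dependencies` is hypothetical and is not part of conda-lock:

```python
import warnings


def split_pip_dependencies(env_data):
    """Return (conda_deps, pip_deps) from a parsed environment file.

    Hypothetical helper for illustration; not conda-lock's actual API.
    """
    conda_deps, pip_deps = [], []
    for dep in env_data.get("dependencies", []):
        if isinstance(dep, dict) and "pip" in dep:
            # A nested "pip:" section is parsed as a one-key mapping.
            pip_deps.extend(dep["pip"] or [])
        else:
            conda_deps.append(dep)
    if pip_deps:
        # Warn instead of failing; the pip packages are simply left out.
        warnings.warn(
            "pip packages found in the environment file; they are dropped "
            "from the lock file, so the env will not be fully reproducible: "
            + ", ".join(pip_deps)
        )
    return conda_deps, pip_deps


env = {
    "name": "test",
    "channels": ["conda-forge"],
    "dependencies": ["python=3.7", {"pip": ["some-pkg"]}],
}
conda_deps, pip_deps = split_pip_dependencies(env)
print(conda_deps)  # ['python=3.7']
print(pip_deps)    # ['some-pkg']
```

Only the conda specs would then be handed to the solver, while the warning tells the user what was left out.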

noahp added a commit to noahp/conda-lock that referenced this issue May 24, 2020
Remove pip dependencies that were parsed from the environment file;
conda doesn't emit them into an `--explicit` export format since they're
managed by pip in the conda env.

Related to conda#4.
noahp added a commit to noahp/conda-lock that referenced this issue May 24, 2020
This attempts to address some parts of conda#4:

- no longer error when producing lock files for environments with pip
dependencies
- output a `*.lock.pip` file that contains the pip-specified
dependencies for the target environment

Note that if any pip args were passed in the original environment spec
file, they're dropped, since conda-lock processes the resolved
environment from conda (eg if you pass `--extra-index-url` etc).

Included some help text when producing pip 'lock' files.
@noahp
Contributor

noahp commented May 27, 2020

Pip can check hashes for downloaded files:
https://pip.pypa.io/en/stable/reference/pip_install/#hash-checking-mode
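
As a rough illustration of what a hash-pinned requirement looks like, the sketch below formats one line for pip's hash-checking mode. The helper name and the placeholder bytes are assumptions made for this example; in practice the digest would be computed over the real wheel or sdist file:

```python
import hashlib


def requirement_with_hash(name, version, artifact_bytes):
    """Format a pinned requirement line for pip's hash-checking mode."""
    digest = hashlib.sha256(artifact_bytes).hexdigest()
    return f"{name}=={version} --hash=sha256:{digest}"


# Placeholder bytes stand in for a downloaded wheel or sdist.
line = requirement_with_hash("some-pkg", "1.0.0", b"fake artifact contents")
print(line)
```

A file made of such lines can be installed with `pip install --require-hashes -r requirements.txt`, in which case pip rejects any download whose digest does not match.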

@ocefpaf
Contributor Author

ocefpaf commented May 27, 2020

> Pip can check hashes for downloaded files:
> pip.pypa.io/en/stable/reference/pip_install/#hash-checking-mode

I guess that the main challenge here is to make conda really understand what was installed by pip. I'm not even sure whether an old bug, which made packages with a - in the name be wrongly identified as pip packages even when installed with conda, was ever fixed.

TL;DR this may be more difficult than it sounds b/c one would have to dig into conda itself.

@mariusvniekerk what do you say? Am I way off here?

@mariusvniekerk
Collaborator

So we can put some of the pip packages as a magic comment.
This also implies that conda-lock would need an install mode that runs the appropriate pip installs, since I'm pretty sure conda install isn't going to.
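
One way the magic-comment idea could look, as a sketch only. The `# pip ` marker is a hypothetical convention invented for this example, and it assumes that conda skips comment lines in an explicit lock file while a conda-lock installer reads them back:

```python
PIP_MARKER = "# pip "  # hypothetical marker, not a conda or conda-lock convention


def add_pip_comments(lock_lines, pip_requirements):
    """Append each pip requirement to the lock file as a marked comment."""
    return list(lock_lines) + [PIP_MARKER + req for req in pip_requirements]


def read_pip_comments(lock_lines):
    """Recover the pip requirements an installer would pass to pip."""
    return [
        line[len(PIP_MARKER):].strip()
        for line in lock_lines
        if line.startswith(PIP_MARKER)
    ]


lock = [
    "@EXPLICIT",
    "https://conda.anaconda.org/conda-forge/noarch/six-1.15.0-pyh9f0ad1d_0.tar.bz2",
]
locked = add_pip_comments(lock, ["some-pkg==1.0.0"])
print(read_pip_comments(locked))  # ['some-pkg==1.0.0']
```

The installer would first let conda handle the explicit URLs and then run pip on whatever `read_pip_comments` returns.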

@sterlinm

sterlinm commented Sep 1, 2020

One option that wouldn't quite achieve reproducibility on the pip end but still might be useful is to use conda-lock to update the conda dependencies in the environment.yml file with the locked conda packages while leaving the pip section of the dependencies alone. It wouldn't guarantee the reproducibility of the pip packages but those are always installed after the conda environment is solved, so it still might be an improvement for some people. I have a clunky script that's doing something like this.

from copy import deepcopy
from conda_lock.conda_lock import solve_specs_for_arch, ensure_conda, fn_to_dist_name, search_for_md5s
import yaml

def load_env_file(file_name):
    # safe_load is enough here: an environment file only holds plain YAML types
    with open(file_name, 'r') as f:
        data = yaml.safe_load(f)
    return data

def write_env_file(env_data, file_name):
    with open(file_name, 'w') as f:
        yaml.dump(env_data, stream=f, Dumper=yaml.Dumper)

def lock_conda_specs(conda_dependencies: list, channels: list) -> list:
    """Given a list of conda dependencies return a list of locked dependencies."""
    conda_path = ensure_conda()
    platform = 'linux-64'
    dry_run_install = solve_specs_for_arch(
        conda=conda_path,
        channels=channels,
        specs=conda_dependencies,
        platform=platform
    )

    # check that the solve succeeded before reading its actions
    if not dry_run_install['success']:
        raise RuntimeError('solve failed')
    link_actions = dry_run_install["actions"]["LINK"]
    for link in link_actions:
        link[
            "url_base"
        ] = f"{link['base_url']}/{link['platform']}/{link['dist_name']}"
        link["url"] = f"{link['url_base']}.tar.bz2"
        link["url_conda"] = f"{link['url_base']}.conda"
    link_dists = {link["dist_name"] for link in link_actions}

    fetch_actions = dry_run_install["actions"]["FETCH"]

    fetch_by_dist_name = {
        fn_to_dist_name(pkg["fn"]): pkg for pkg in fetch_actions
    }

    non_fetch_packages = link_dists - set(fetch_by_dist_name)
    if len(non_fetch_packages) > 0:
        for search_res in search_for_md5s(
            conda_path,
            [x for x in link_actions if x["dist_name"] in non_fetch_packages],
            platform,
        ):
            dist_name = fn_to_dist_name(search_res["fn"])
            fetch_by_dist_name[dist_name] = search_res

    pkgs = []
    for pkg in link_actions:
        url = fetch_by_dist_name[pkg["dist_name"]]["url"]
        md5 = fetch_by_dist_name[pkg["dist_name"]]["md5"]
        pkgs.append(f"{url}#{md5}")
    
    return pkgs



def lock_env_data(env_data):
    """Convert conda environment dependencies to locked specs using conda_lock.

    Args:
        env_data (dict): parsed contents of an environment.yml file
    """
    # split dependencies into conda dependencies and pip dependencies
    deps = env_data['dependencies']
    conda_deps = [dep for dep in deps if isinstance(dep, str)]
    pip_deps = [dep for dep in deps if not isinstance(dep, str)]
    if len(pip_deps) > 1:
        raise ValueError("there is more than one dictionary in dependencies. Should be only pip")

    locked_conda_deps = lock_conda_specs(conda_deps, env_data['channels'])
    if pip_deps:
        locked_conda_deps.append(pip_deps[0])
    locked_env_data = deepcopy(env_data)
    locked_env_data['dependencies'] = locked_conda_deps
    return locked_env_data

def lock_env_file(env_file, locked_env_file):
    env_data = load_env_file(env_file)
    locked_env_data = lock_env_data(env_data)
    write_env_file(locked_env_data, locked_env_file)
    return locked_env_file

As an example, it converts this environment:

name: test
channels:
 - conda-forge
dependencies:
 - python=3.7
 - pandas=1.0.5
 - pip:
    - sidetable=0.7.0
prefix: /opt/conda/envs/test

To this:

channels:
- conda-forge
dependencies:
- https://conda.anaconda.org/conda-forge/linux-64/_libgcc_mutex-0.1-conda_forge.tar.bz2#d7c89558ba9fa0495403155b64376d81
- https://conda.anaconda.org/conda-forge/linux-64/ca-certificates-2020.6.20-hecda079_0.tar.bz2#1b1cca86e95c416a8e7eb6062af6d503
- https://conda.anaconda.org/conda-forge/linux-64/ld_impl_linux-64-2.34-hc38a660_9.tar.bz2#aa1e7603f8dd36f8d60026cda3f1fb2c
- https://conda.anaconda.org/conda-forge/linux-64/libgfortran-ng-7.5.0-hdf63c60_16.tar.bz2#d403b27c431064370f9d1b1962f8a86b
- https://conda.anaconda.org/conda-forge/linux-64/libstdcxx-ng-9.3.0-hdf63c60_16.tar.bz2#2c7c23cdad4f42f924d19029ef97475c
- https://conda.anaconda.org/conda-forge/linux-64/libgomp-9.3.0-h24d8f2e_16.tar.bz2#48f89ebfddb4ac93e74b0f4ab14c4a13
- https://conda.anaconda.org/conda-forge/linux-64/_openmp_mutex-4.5-1_gnu.tar.bz2#561e277319a41d4f24f5c05a9ef63c04
- https://conda.anaconda.org/conda-forge/linux-64/libgcc-ng-9.3.0-h24d8f2e_16.tar.bz2#846daf5c2a4dd387047cc5ccc6b9c613
- https://conda.anaconda.org/conda-forge/linux-64/libffi-3.2.1-he1b5a44_1007.tar.bz2#11389072d7d6036fd811c3d9460475cd
- https://conda.anaconda.org/conda-forge/linux-64/libopenblas-0.3.10-pthreads_hb3c22a3_4.tar.bz2#8e3914247353e97a184909dbee132bfb
- https://conda.anaconda.org/conda-forge/linux-64/ncurses-6.2-he1b5a44_1.tar.bz2#d3da4932f3d8e6b3c81fcf177d1e6eab
- https://conda.anaconda.org/conda-forge/linux-64/openssl-1.1.1g-h516909a_1.tar.bz2#6fdcd20ec22aeffa10b6102bccc47e7f
- https://conda.anaconda.org/conda-forge/linux-64/xz-5.2.5-h516909a_1.tar.bz2#33f601066901f3e1a85af3522a8113f9
- https://conda.anaconda.org/conda-forge/linux-64/zlib-1.2.11-h516909a_1009.tar.bz2#93486907c6757170a5125198506d9cf8
- https://conda.anaconda.org/conda-forge/linux-64/libblas-3.8.0-17_openblas.tar.bz2#fdd1790e564778bf0c616e639badfe58
- https://conda.anaconda.org/conda-forge/linux-64/readline-8.0-he28a2e2_2.tar.bz2#4d0ae8d473f863696088f76800ef9d38
- https://conda.anaconda.org/conda-forge/linux-64/tk-8.6.10-hed695b0_0.tar.bz2#9a3e126468fa7fb6a54caad41b5a2d45
- https://conda.anaconda.org/conda-forge/linux-64/libcblas-3.8.0-17_openblas.tar.bz2#28f6376d1c4ca5e0fc287fb0484e37a1
- https://conda.anaconda.org/conda-forge/linux-64/liblapack-3.8.0-17_openblas.tar.bz2#09cfbcdb4888dc9010b4cbc60e55c6ad
- https://conda.anaconda.org/conda-forge/linux-64/sqlite-3.33.0-h4cf870e_0.tar.bz2#b22603a9c94d2cda5911f7a2cd55aa95
- https://conda.anaconda.org/conda-forge/linux-64/python-3.7.8-h425cb1d_1_cpython.tar.bz2#3197fc7597f6d13d32350dd93e15f3e2
- https://conda.anaconda.org/conda-forge/linux-64/python_abi-3.7-1_cp37m.tar.bz2#658a5c3d766bfc6574480204b10a6f20
- https://conda.anaconda.org/conda-forge/noarch/pytz-2020.1-pyh9f0ad1d_0.tar.bz2#e52abc1f0fd70e05001c1ceb2696f625
- https://conda.anaconda.org/conda-forge/noarch/six-1.15.0-pyh9f0ad1d_0.tar.bz2#1eec421f0f1f39e579e44e4a5ce646a2
- https://conda.anaconda.org/conda-forge/linux-64/numpy-1.19.1-py37h7ea13bd_2.tar.bz2#f05213c1f8539d8ee086139df2b762c7
- https://conda.anaconda.org/conda-forge/noarch/python-dateutil-2.8.1-py_0.tar.bz2#0d0150ed9c2d25817f5324108d3f7571
- https://conda.anaconda.org/conda-forge/linux-64/pandas-1.0.5-py37h0da4684_0.tar.bz2#6fddaa88968614a9be807964f586e91c
- pip:
  - sidetable=0.7.0
name: test
prefix: /opt/conda/envs/test

@mariusvniekerk
Collaborator

@sterlinm one relatively crude approach that we can take is to harvest the pip packages and embed them as a special comment in the lockfile that installers can use.

@nbren12

nbren12 commented Feb 18, 2021

It would be wonderful to make progress on this issue. Does the new pip resolver help? Also, pip-tools can create "lock" files as well.

@mariusvniekerk
Collaborator

Using the new resolver / pip-tools might be feasible with some pretty aggressive hacks:

  1. do the regular conda solve with an added pip
  2. determine which of those packages are in fact python packages (non-trivial)
  3. reverse name map those conda names to pypi names (similar to what we do for pyproject.toml etc)
  4. use those names + resolved version numbers + the pip specified packages to generate a requirements.in
  5. solve that thing with pip-tools
  6. prune away all the things that conda provides
  7. add the pip packages + hashes to a magic comment somewhere in the lock

And for the installer

  1. Teach the installer about the magic comment block
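
Steps 3, 4, and 6 of the solver list above can be sketched as follows. Everything here is an assumption made for illustration: the `CONDA_TO_PYPI` table is a tiny hypothetical stand-in for a real name mapping, the name comparison ignores full PyPI name normalization, and the pip-tools solve of step 5 is replaced by a hand-written dict:

```python
# Hypothetical conda-name -> PyPI-name overrides; a real table would be larger.
CONDA_TO_PYPI = {"pytorch": "torch"}


def build_requirements_in(conda_python_pkgs, pip_specs):
    """Steps 3-4: pin conda-resolved Python packages next to the pip specs."""
    lines = [
        f"{CONDA_TO_PYPI.get(name, name)}=={version}"
        for name, version in sorted(conda_python_pkgs.items())
    ]
    return lines + list(pip_specs)


def prune_conda_provided(resolved_pins, conda_python_pkgs):
    """Step 6: keep only the pins that conda does not already provide."""
    provided = {CONDA_TO_PYPI.get(n, n).lower() for n in conda_python_pkgs}
    return {n: v for n, v in resolved_pins.items() if n.lower() not in provided}


conda_pkgs = {"pandas": "1.0.5", "numpy": "1.19.1"}
requirements_in = build_requirements_in(conda_pkgs, ["sidetable"])
print(requirements_in)  # ['numpy==1.19.1', 'pandas==1.0.5', 'sidetable']

# Stand-in for the output of step 5 (the pip-tools solve).
resolved = {"numpy": "1.19.1", "pandas": "1.0.5", "sidetable": "0.7.0"}
print(prune_conda_provided(resolved, conda_pkgs))  # {'sidetable': '0.7.0'}
```

Pinning the conda versions in the input is what keeps the pip solve from picking versions that conflict with what conda already installed.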

@nbren12

nbren12 commented Mar 4, 2021

@mariusvniekerk That seems like a feasible approach. We do something similar now, but don't actually ensure the transitive dependencies of the anaconda and pip packages are compatible.

Would it be possible to reverse the order of the pip and conda resolution in your algorithm? One thing we've noticed is that pip-tools does not work very well for some packages (e.g. cartopy) which require system libraries to be installed before running the setup.py. Since we are using conda anyway, it would be nice to avoid running pip-compile on these tricky packages.

@nbren12

nbren12 commented Mar 4, 2021

Also, this tool seems relevant. It somehow combines conda, pip, and nix packages, and has its own dependency resolution approach: https://github.com/DavHau/mach-nix.

@RafalSkolasinski

Just to chip in: the lack of support for pip packages is currently what stops us from exploring this tool.
We are looking for an option to "lock" environments for ML model serving. Some packages that we need to include do not come as conda packages and need to be listed as pip deps, and locking their versions, together with their second-level and deeper dependencies, is critical.

@nbren12

nbren12 commented Mar 17, 2021

@RafalSkolasinski we have a similar problem, but we use conda-lock for the anaconda dependencies and pip-tools for the pip packages. Of course, there could be some inconsistencies between the conda and pip lock files, but the setup is still deterministic, so it doesn't break randomly.

@jli

jli commented Jun 29, 2021

I have the same issue as @RafalSkolasinski and @nbren12. I'm considering using @nbren12's approach of running the conda-lock and pip-compile tools independently, but I'm a bit nervous about incompatibilities from pip overwriting the conda dependency versions, and also this results in wasted space in Docker images. Still, good point that at least it's deterministic.
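
A small check along these lines might catch the overwriting case before install time. It is only a sketch: it assumes both lock files have already been parsed into name-to-version dicts, that conda and PyPI use the same package names, and that comparing version strings for equality is good enough:

```python
def normalize(name):
    """Loosely normalize a package name so 'Foo_bar' matches 'foo-bar'."""
    return name.lower().replace("_", "-").replace(".", "-")


def find_conflicts(conda_pins, pip_pins):
    """Return packages pinned by both tools at different versions."""
    conda_by_name = {normalize(n): v for n, v in conda_pins.items()}
    conflicts = {}
    for name, pip_version in pip_pins.items():
        conda_version = conda_by_name.get(normalize(name))
        if conda_version is not None and conda_version != pip_version:
            conflicts[normalize(name)] = (conda_version, pip_version)
    return conflicts


conda_pins = {"numpy": "1.19.1", "pandas": "1.0.5"}
pip_pins = {"numpy": "1.21.0", "sidetable": "0.7.0"}
print(find_conflicts(conda_pins, pip_pins))  # {'numpy': ('1.19.1', '1.21.0')}
```

An empty result does not prove the two lock files are compatible, only that pip will not replace a conda-installed package with a different version.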

For posterity, I asked a StackOverflow question about this: https://stackoverflow.com/questions/68171629/how-do-i-pin-versioned-dependencies-in-python-when-using-both-conda-and-pip

@jli

jli commented Jul 1, 2021

In case it helps others, I went with a heavier weight approach of installing the conda+pip dependencies in a temporary conda environment and then using conda env export to generate a lock file that includes both conda and pip packages.

I wrote up this approach here: https://gist.github.com/jli/b2d2d62ad44b7fcb5101502c08dca1ae
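
For readers who want the shape of that workflow without opening the gist, here is a rough sketch of the commands involved. It is not taken from the gist; the function name and the default environment name are hypothetical, and the commands are only built as lists, not executed:

```python
def export_lock_commands(spec_file, env_name="tmp-lock-env"):
    """Build the conda commands for a create / export / remove round trip."""
    return [
        # Realize the high-level spec (conda and pip parts) in a scratch env.
        ["conda", "env", "create", "--name", env_name, "--file", spec_file],
        # The stdout of this command is the fully pinned environment file.
        ["conda", "env", "export", "--name", env_name],
        # Clean up the scratch env afterwards.
        ["conda", "env", "remove", "--name", env_name, "--yes"],
    ]


for command in export_lock_commands("environment.yml"):
    print(" ".join(command))
```

Each list could be passed to `subprocess.run(..., check=True)`, capturing the output of the export step and writing it to the lock file.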

@ocefpaf
Contributor Author

ocefpaf commented Jul 1, 2021

@jli for a realized env like that you can use https://github.com/olegtarasov/conda-export

That "heavy weight" approach is probably the only way to solve this at the moment.

@jli

jli commented Jul 1, 2021

@ocefpaf Hm, I'm not sure I understand how conda-export helps. It seems to go the opposite direction from what I want? (I want to give a high-level spec (w/ direct dependencies and minimal version constraints), and get out a low-level lock file w/ all dependencies (including transitive) at specific versions.)

I guess conda-export would be useful to get out the high-level spec from an existing environment that was created in an ad hoc way?

@ocefpaf
Contributor Author

ocefpaf commented Jul 1, 2021

Oh. Sorry, that was missing some context. I didn't mean using conda-export per se, but the part that can figure out what was pip installed vs conda installed could be used to perform a "conda-lock" in a realized env.

@mariusvniekerk
Collaborator

mariusvniekerk commented Mar 3, 2022

This is now supported by #122. The PR is #124.

@srstsavage

The original example environment in this issue includes a git+https:// pip dependency, which doesn't work in conda-lock 1.1.1.

$ cat environment.yml
name: test
channels:
  - conda-forge
dependencies:
  - python=3.7
  - pip:
    - xarray
    - git+https://github.com/pandas-dev/[email protected]
$ conda list | grep conda-lock
conda-lock                1.1.1              pyhd8ed1ab_0    conda-forge
$ conda-lock -p osx-64 -p linux-64 2>&1 | tail
    parsed_req = Requirement.parse(requirement_specifier)
  File "/home/shane/miniconda3/lib/python3.9/site-packages/pkg_resources/__init__.py", line 3139, in parse
    req, = parse_requirements(s)
  File "/home/shane/miniconda3/lib/python3.9/site-packages/pkg_resources/__init__.py", line 3084, in parse_requirements
    yield Requirement(line)
  File "/home/shane/miniconda3/lib/python3.9/site-packages/pkg_resources/__init__.py", line 3094, in __init__
    super(Requirement, self).__init__(requirement_string)
  File "/home/shane/miniconda3/lib/python3.9/site-packages/pkg_resources/_vendor/packaging/requirements.py", line 100, in __init__
    raise InvalidRequirement(
pkg_resources.extern.packaging.requirements.InvalidRequirement: Parse error at "'+https:/'": Expected stringEnd

#197 is related but not currently equivalent since it specifically references installing from private GitHub repos. Should that issue be expanded, this one reopened, or a new one created?
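
The traceback suggests the VCS URL is being handed to a parser that only accepts named requirements. A possible guard, sketched under the assumption that a simple prefix check is acceptable, would separate the two kinds of entries first. The helper names are hypothetical, and PEP 508 direct references of the form `name @ git+https://...` are not detected by this check:

```python
# URL prefixes for the version control systems pip can install from.
VCS_PREFIXES = ("git+", "hg+", "svn+", "bzr+")


def is_vcs_requirement(spec):
    """Return True for entries such as 'git+https://host/repo.git@ref'."""
    return spec.strip().lower().startswith(VCS_PREFIXES)


def partition_pip_specs(specs):
    """Split pip entries into (named requirements, VCS URLs)."""
    named, vcs = [], []
    for spec in specs:
        (vcs if is_vcs_requirement(spec) else named).append(spec)
    return named, vcs


specs = ["xarray", "git+https://github.com/someuser/another-pkg.git@master"]
named, vcs = partition_pip_specs(specs)
print(named)  # ['xarray']
print(vcs)    # ['git+https://github.com/someuser/another-pkg.git@master']
```

Only the named entries would go through the requirement parser; the VCS entries could be rejected with a clear message or passed through unlocked with a warning.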


9 participants