Support Multiple Categories for Sub-Dependencies in Lockfile #390
@maresb Any chance you can review this sometime this week? |
It's really tough for me since I'm in the middle of a big move... I wanted to ask... have you checked that the resulting lockfiles install with Micromamba? |
I feel really bad. Despite appearances I'm very eager to get this in. |
@maresb No worries! I wanted to remind you just in case, but not a big deal at all. I have tested this PR on a very large production environment lockfile, but not with Micromamba (since Micromamba used to not have certain features I needed until 1.4). I will do so soon and let you know! |
@srilman, great! I'm on it now... |
Please don't take these comments seriously yet. They aren't final, and some of them are incorrect. I just want to get them out now in case a rebase comes in.
for dep in deps: | ||
dep_to_extra[dep] = category | ||
dep_to_extra[dep] = cat |
Out of scope for this PR: I don't see anything preventing a dependency from belonging to multiple extras. I don't think `dep_to_extra` should be single-valued.
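To illustrate the point, here is a minimal sketch of a multi-valued mapping. It assumes the extras are available as a dict from extra name to a list of dependency names; the function name `build_dep_to_extras` is hypothetical and not part of conda-lock.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_dep_to_extras(extras: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Map each dependency name to every extra that lists it (hypothetical helper)."""
    dep_to_extras: Dict[str, Set[str]] = defaultdict(set)
    for extra, deps in extras.items():
        for dep in deps:
            # A dependency may appear under several extras, so collect all of them.
            dep_to_extras[dep].add(extra)
    return dict(dep_to_extras)


extras = {"dev": ["pytest", "mypy"], "test": ["pytest"]}
mapping = build_dep_to_extras(extras)
```

With a single-valued dict, the last extra seen would silently overwrite the earlier one for `pytest`.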
@@ -111,9 +112,9 @@ def parse_poetry_pyproject_toml( | |||
} |
Out of scope for this PR: keys are pyproject.toml subsections of `("tool", "poetry")` and values are conda-lock categories.
@@ -125,16 +126,48 @@ def parse_poetry_pyproject_toml( | |||
for depname, depattrs in get_in( |
Out of scope for this PR: It would make a lot more sense to me if `categories.items()` were assigned to `section, conda_lock_category` instead of `section, default_category`.
in_extra: bool = False | ||
|
||
# Extras can only be defined in `tool.poetry.dependencies` | ||
if default_category == "main": |
I think it'd be more direct here to write
if section == ("dependencies",):
if default_category == "main": | ||
category = dep_to_extra.get(depname) or "main" | ||
in_extra = category != "main" | ||
|
Avoid redefinition of `category` and `in_extra` by removing previous definitions and adding here:
    else:
        category = default_category
        in_extra = False
if optional_flag is not None and default_category != "main": | ||
warnings.warn( | ||
f"`{depname}` in file {path.name} is specified with the `optional` flag. " | ||
f"Conda-Lock will follows Poetry behavior and ignore the flag. " |
`follows` → `follow`
category: str = default_category | ||
optional: bool = category != "main" |
I suggest removing the right-hand sides and leaving only the type definition. (See comment below for explanation.)
in_extra = category != "main" | ||
else: | ||
warnings.warn( |
Won't this trigger on all dependency groups?
conda_lock/conda_lock.py
Outdated
@@ -542,7 +542,7 @@ def render_lockfile_for_platform( # noqa: C901 | |||
lockfile.toposort_inplace() | |||
|
|||
for p in lockfile.package: | |||
if p.platform == platform and p.category in categories: | |||
if p.platform == platform and not p.categories.isdisjoint(categories): |
The double-negative requires some mental overhead to parse. It may be a bit verbose, but I'd prefer:
    ... and len(p.categories & categories) > 0
and also for clarity it'd be nice to rename `categories` → `categories_to_install`.
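For reference, a small sketch showing that the two spellings agree on plain Python sets. The function names are hypothetical and only stand in for the condition inside `render_lockfile_for_platform`.

```python
from typing import Set


def selected_via_isdisjoint(
    package_categories: Set[str], categories_to_install: Set[str]
) -> bool:
    # Spelling used in the diff: "not disjoint" means at least one shared category.
    return not package_categories.isdisjoint(categories_to_install)


def selected_via_intersection(
    package_categories: Set[str], categories_to_install: Set[str]
) -> bool:
    # Suggested spelling: a non-empty intersection.
    return len(package_categories & categories_to_install) > 0


cases = [
    ({"dev", "test"}, {"main", "dev"}),  # shares "dev"
    ({"test"}, {"main", "dev"}),         # nothing shared
    (set(), {"main"}),                   # package with no categories
]
results = [
    (selected_via_isdisjoint(p, c), selected_via_intersection(p, c)) for p, c in cases
]
```

Both forms give the same answer for every case, so the choice is purely about readability.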
Important note: in order to avoid breaking backwards compatibility, we should still emit the |
I'm worried now that even including |
@maresb I'm going to rebase this PR and address some of the comments. |
That would be wonderful, thanks so much @srilman!!! |
c437936
to
94c68e4
Compare
@maresb I ran into this problem recently and couldn't get around it, so I can't put off this PR any longer. I tried naively rebasing but ran into some issues that I need your opinion on. I noticed that a lot has changed in terms of the lockfile format. There seem to be 2 lockfile formats now, I presume because of how Pydantic works. In this PR, we add an additional field to the lockfile format, What are your thoughts? |
Hey @srilman, it's really great to hear from you again!!! I already attempted and failed to rebase this in #410. I've been trying to steer the changes in a direction to make a rebase easier, so maybe it makes sense to give this another shot. We can't change the lockfile format without big discussions from the Mamba and Rattler folks. (We would need to do a major-version release coordinated with Micromamba.) Hence the V2 format is a proposal, and we work internally with V2, but the input/output always goes through V1. And Ideally we would implement Maybe we could add some flags that are very clearly marked as experimental so that we can export a V2 format file that will only work with conda-lock. You may be interested in #546 where we're working on merging version specifications. It'd be really great to get your feedback there if you get the chance. |
94c68e4
to
77807e1
Compare
@maresb Took a while, but finally rebased the PR. You're right, it seemed easier to integrate than before. Like you said, I made I'll take a look at #546, but it looks like that PR might need some changes from this PR. |
This is very exciting, thanks so much @srilman!!!
Ok, but we'll need to thoroughly test this with both While #546 could greatly benefit from your input, I'd really like to get this in without more rebases, so let's make this PR top priority. |
Any thoughts on how to do this? Can we write unit tests for this? Probably for |
It would be really hard to write unit tests here for Micromamba's install process. I'm mainly interested in if/how Micromamba handles deduplication. If you have a Python dependency in both Other than that, we might want to write a unit test with a simple example to make sure that we get a functioning environment after installing from a simple multi-category generated lockfile, and the test would just be running some installed thing as a subprocess. |
In the PR, I added an extra filter pass that takes any dependency in
Let me see what I can do, but feel free to add to this PR if you have the time to do so. |
Ok so far for my use case, this PR seems to be working well. It seems to be producing a valid lockfile (based on a quick script I whipped up to make sure that the dependency tree is correct). In addition, I added the test case for the @maresb When you have a chance, could you review this PR and let me know if there's anything else you'd like me to do? I'd really love to see this merged soon so that I can start using it safely. |
1f31cd1
to
90a7f08
Compare
build=self.build, | ||
optional=self.category != "main", | ||
) | ||
categories: Set[str] = set() |
Shouldn't this be `set("main")`? The previous default value for `category` was `"main"`.
I think you want `{"main"}`, as `set("main")` gives you `{"m", "a", ...`
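A quick check of the difference, using only built-in Python behavior: `set()` iterates over its argument, so a string is split into characters, while a set literal keeps the string whole.

```python
# set() consumes an iterable, and iterating a string yields its characters.
from_constructor = set("main")

# A set literal stores the string itself as the single element.
from_literal = {"main"}
```

Here `from_constructor` holds four one-character strings and `from_literal` holds the single string `"main"`.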
The problem is that in `apply_categories`, we assume that `categories` is initially empty and add to the set as needed. If we initially set it to `{'main'}`, then every dependency will be in the main category and will always be installed.
@maresb So I've started integrating this into my own use case in general, and had a chance to test it at large scale and with micromamba. This is what I saw so far:
IMO the |
FYI I think the test case I added in this PR is a good example to test Micromamba with |
I've carefully looked through the first commit. It was really challenging for me to understand what's going on. Probably I will have a lot of comments to add. Here are some notes: I'm trying to understand "Change category → categories in LockfileV2 only". There are a few classes involved, namely: _BaseDependency: has BaseLockedDependency → LockedDependencyV1, LockedDependencyV2 We want to change We add
For 1. we convert a v2 dependency to a list of v1 dependencies that are identical except To convert from v1 back to v2, we start with a list of For 3. we start by leaving |
In that case, I'm going to add some comments to make the second commit a bit easier to understand. Thanks for pointing that out. |
@@ -163,7 +194,7 @@ def write_conda_lock_file( | |||
content.filter_virtual_packages_inplace() | |||
with path.open("w") as f: | |||
if include_help_text: | |||
categories = set(p.category for p in content.package) | |||
categories: Set[str] = set().union(*(p.categories for p in content.package)) |
Takes the union of all sets of categories of all dependencies to get the final set of categories to store in the lockfile.
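As a standalone illustration of the `set().union(*...)` idiom, with plain sets standing in for `p.categories` (the sample data is made up):

```python
from typing import List, Set

# Each entry stands in for one package's `categories` set.
package_categories: List[Set[str]] = [{"main"}, {"dev", "test"}, {"dev"}]

# Unpacking passes every set as an argument to a single union call.
all_categories: Set[str] = set().union(*package_categories)

# Starting from an empty set also makes the no-packages case safe.
no_categories: Set[str] = set().union(*[])
```

Starting from `set()` rather than calling `set.union(*...)` directly avoids an error when there are no packages at all.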
categories = [*categories, *(k for k in by_category if k not in categories)] | ||
root_requests = {} | ||
root_requests: DefaultDict[str, List[str]] = defaultdict(list) |
Before, we used to map each package (LockedDependency) to 1 root dependency that is its ancestor (Dependency). A root dependency is a dependency specified in a source file. So the root dependency uses the package as a sub-dependency. As the comment says, we would try to use the first root.
Now, we need to store all root dependencies, in case there are multiple that use a package as a sub-dependency. That will allow us to take multiple `category` values, one from each root, and combine them to create the `categories` value for a package.
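A simplified sketch of the idea, not the actual implementation in this PR: it assumes the dependency graph is a plain dict from package name to the names it requires, and `collect_roots` is a hypothetical helper.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Set


def collect_roots(
    roots: List[str], requires: Dict[str, List[str]]
) -> DefaultDict[str, List[str]]:
    """Record, for each package, every root dependency that reaches it."""
    root_requests: DefaultDict[str, List[str]] = defaultdict(list)
    for root in roots:
        seen: Set[str] = set()
        todo = [root]
        while todo:
            name = todo.pop()
            if name in seen:
                continue
            seen.add(name)
            root_requests[name].append(root)
            todo.extend(requires.get(name, []))
    return root_requests


# Made-up example graph: numpy is a sub-dependency of two roots.
requires = {"astropy": ["numpy"], "pandas": ["numpy"], "numpy": []}
root_category = {"astropy": "test", "pandas": "dev"}

root_requests = collect_roots(["astropy", "pandas"], requires)
# Combine one category per root into the package's set of categories.
numpy_categories = {root_category[root] for root in root_requests["numpy"]}
```

Keeping only the first root would have put `numpy` in `test` alone, which is the limitation this change removes.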
This is the core change for the second commit.
|
||
# For any dep that is part of the 'main' category | ||
# we should remove all other categories | ||
_truncate_main_category(planned) |
Before this PR, if any package has a root / ancestor package that is part of the `main` category, it should also be in the `main` category. `main` root packages are always installed, so their sub-dependencies should also always be installed.
We want to maintain that same behavior in this PR. Thus, for any package that is in the `main` category, we don't need to remember any other categories it is in, since it is already non-optional. So function `_truncate_main_category` will remove other categories in that case.
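A rough sketch of that rule on a plain dict of category sets. This is an assumption about the shape of the data, and `truncate_main_category_sketch` is a hypothetical stand-in, not the signature of `_truncate_main_category`.

```python
from typing import Dict, Set


def truncate_main_category_sketch(categories_by_package: Dict[str, Set[str]]) -> None:
    """Collapse the categories of any package in 'main' down to just {'main'}."""
    for name, categories in categories_by_package.items():
        if "main" in categories:
            # Already non-optional, so the other categories carry no information.
            categories_by_package[name] = {"main"}


planned = {
    "python": {"main", "dev", "test"},
    "numpy": {"dev", "test"},
    "pyspark": {"main"},
}
truncate_main_category_sketch(planned)
```

Packages that are only in optional categories, like `numpy` here, keep all of them.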
- linux-64 | ||
|
||
dependencies: | ||
- pyspark =3.4 |
I'm trying to test the following with this setup.
- Pandas should be in the lockfile with 2 optional categories, dev and test. It is a source dependency in file `dev.yml`, so we can make sure that special case works.
- Numpy should be in the lockfile with 2 optional categories, dev and test. It is a subdependency of astropy and PySpark.
- Python should be in the lockfile, with just the main category. It is a dependency of every other package, so this can test `_truncate_main_category`.
Hey, I just wanted to say that I have not forgotten about this!!! I've been trying to very carefully understand the fine details of this PR, and have been working on trying to make pure refactoring changes in #589 that will allow us to simply switch over to the new system. I feel really bad because I keep having annoying personal stuff (like bureaucracy of registering in a new country, taxes, etc.) that keeps draining away the time that I want to devote to pushing this through. At the same time, I want to be realistic, and I haven't had the available time I envisioned to push things forward here. The folks at prefix.dev have a team working full-time doing amazing work with Conda lockfiles. So I think pixi is a more sustainable solution in the medium-term. I'd like to converge with them so that conda-lock users can smoothly transition when they're ready. I just pinged the prefix folks here: https://discord.com/channels/1082332781146800168/1214169703811907604/1214169703811907604 |
Superseded by #697 |
Support Multiple Categories for Sub-Dependencies in Lockfile (Rebase #390)
Partially implements #278. This PR supports internally storing multiple categories for sub-dependencies (dependencies of dependencies specified in source files) and including that information in the lockfile, following the syntax specified in the associated issue.
Note that I specifically tackled sub-dependencies first because source dependencies can have the issue where they are specified in two different source files. In addition to having 2 different category specs, they may have different version specs as well. There is an existing PR (#300) to merge version constraints together, but it is blocked right now due to a discussion. I feel like this is not as common of a situation, so I decided to work on sub-dependencies first.
As discussed in the associated issue, this PR will change the resulting lockfile by adding the additional field `categories`. Furthermore, we will also have multiple copies of the same dependency, one for each category in `categories`. This PR makes sure that `conda-lock` produces a lockfile in this format and parses lockfiles in both this and the previous format correctly.
This PR builds on top of #389.
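To make the "multiple copies" idea concrete, here is a minimal sketch under stated assumptions: the two dataclasses and both helpers are hypothetical, carry only a name and version, and do not reflect the real lockfile models or field layout.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set, Tuple


@dataclass(frozen=True)
class MultiCategoryEntry:  # hypothetical: one entry holding a set of categories
    name: str
    version: str
    categories: FrozenSet[str]


@dataclass(frozen=True)
class SingleCategoryEntry:  # hypothetical: one entry per category
    name: str
    version: str
    category: str


def expand(entry: MultiCategoryEntry) -> List[SingleCategoryEntry]:
    """Emit one copy of the entry per category, in sorted order for stable output."""
    return [
        SingleCategoryEntry(entry.name, entry.version, category)
        for category in sorted(entry.categories)
    ]


def merge(entries: List[SingleCategoryEntry]) -> List[MultiCategoryEntry]:
    """Group copies that differ only by category back into one entry."""
    grouped: Dict[Tuple[str, str], Set[str]] = {}
    for entry in entries:
        grouped.setdefault((entry.name, entry.version), set()).add(entry.category)
    return [
        MultiCategoryEntry(name, version, frozenset(categories))
        for (name, version), categories in grouped.items()
    ]


numpy_entry = MultiCategoryEntry("numpy", "1.24.0", frozenset({"dev", "test"}))
copies = expand(numpy_entry)
round_trip = merge(copies)
```

The round trip through the per-category copies recovers the original entry, which is the property a reader of both formats would rely on.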