Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Multiple Categories for Sub-Dependencies in Lockfile #390

Closed
wants to merge 3 commits into from

Conversation

srilman
Copy link
Contributor

@srilman srilman commented Mar 13, 2023

Partially implements #278. This PR supports internally storing multiple categories for sub-dependencies (dependencies of dependencies specified in source files) and including that information in the lockfile, following the syntax specified in the associated issue.

Note that I specifically tackled sub-dependencies first because source dependencies can have the issue where they are specified in two different source files. In addition to having 2 different category specs, they may have different version specs as well. There is an existing PR (#300) to merge version constraints together, but it is blocked right now due to a discussion. I feel like this is not as common of a situation, so I decided to work on sub-dependencies first.

As discussed in the associated issue, this PR will change the resulting lockfile by adding the additional field categories. Furthermore, we will also have multiple copies of the same dependency for each category in the categories. This PR makes sure that conda-lock produces a lockfile in this format and parse lockfiles with this and the previous format correctly.

This PR builds on top of #389.

@srilman srilman requested a review from a team as a code owner March 13, 2023 00:53
@netlify
Copy link

netlify bot commented Mar 13, 2023

Deploy Preview for conda-lock ready!

Name Link
🔨 Latest commit 90a7f08
🔍 Latest deploy log https://app.netlify.com/sites/conda-lock/deploys/65963a534838f60008fe3f5e
😎 Deploy Preview https://deploy-preview-390--conda-lock.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@srilman
Copy link
Contributor Author

srilman commented Apr 2, 2023

@maresb Any chance you can review this sometime this week?

@maresb
Copy link
Contributor

maresb commented Apr 2, 2023

It's really tough for me since I'm in the middle of a big move...

I wanted to ask... have you checked that the resulting lockfiles install with Micromamba?

@maresb
Copy link
Contributor

maresb commented Apr 2, 2023

I feel really bad. Despite appearances I'm very eager to get this in.

@srilman
Copy link
Contributor Author

srilman commented Apr 5, 2023

@maresb No worries! I wanted to remind you just in case, but not a big deal at all. i have tested this PR on a very large production environment lockfile, but not with Micromamba (since Micromamba used to not have certain features I needed until 1.4). I will do so soon and let you know!

@maresb
Copy link
Contributor

maresb commented May 20, 2023

@srilman, I'd like to get as much as possible merged this weekend. I've lost sight a bit of all the PRs. Could you please suggest to me a merge order? Like #389, #390, and then maybe #300?

@srilman
Copy link
Contributor Author

srilman commented May 20, 2023

@maresb I think #300 needs to be rebased a little bit first. So if possible, could you review #389 and then #390 for now?

@maresb
Copy link
Contributor

maresb commented May 20, 2023

@srilman, great! I'm on it now...

@maresb
Copy link
Contributor

maresb commented May 20, 2023

In this PR #390 it looks like optional was removed via commits in the middle, as per #389. That means this PR will need to be rebased, right? For the moment I'm proceeding with the review, ignoring optional...

EDIT: I think I was confused... the optional I was looking at was for Poetry. 🙂

Copy link
Contributor

@maresb maresb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't take these comments seriously yet. They aren't final, and some of them are incorrect. I just want to get them out now in case a rebase comes in.

for dep in deps:
dep_to_extra[dep] = category
dep_to_extra[dep] = cat
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of scope for this PR: I don't see anything preventing a dependency from belonging to multiple extras. I don't think dep_to_extra should be single-valued.

@@ -111,9 +112,9 @@ def parse_poetry_pyproject_toml(
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of scope for this PR: keys are pyproject.toml subsections of ("tool", "poetry") and values are conda-lock categories.

@@ -125,16 +126,48 @@ def parse_poetry_pyproject_toml(
for depname, depattrs in get_in(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of scope for this PR: It would make a lot more sense to me if categories.items() were assigned to section, conda_lock_category instead of section, default_category.

in_extra: bool = False

# Extras can only be defined in `tool.poetry.dependencies`
if default_category == "main":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd be more direct here to write

if section == ("dependencies",):

if default_category == "main":
category = dep_to_extra.get(depname) or "main"
in_extra = category != "main"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid redefinition of category and in_extra by removing previous definitions and adding here:

else:
    category = default_category
    in_extra = False

if optional_flag is not None and default_category != "main":
warnings.warn(
f"`{depname}` in file {path.name} is specified with the `optional` flag. "
f"Conda-Lock will follows Poetry behavior and ignore the flag. "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

followsfollow

Comment on lines 129 to 130
category: str = default_category
optional: bool = category != "main"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest removing the right-hand-sides and leave only the type definition. (See comment below for explanation.)

in_extra = category != "main"
else:
warnings.warn(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't this trigger on all dependency groups?

@@ -542,7 +542,7 @@ def render_lockfile_for_platform( # noqa: C901
lockfile.toposort_inplace()

for p in lockfile.package:
if p.platform == platform and p.category in categories:
if p.platform == platform and not p.categories.isdisjoint(categories):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The double-negative requires some mental overhead to parse. It may be a bit verbose, but I'd prefer:

... and len(p.categories & categories) > 0

and also for clarity it'd be nice to rename categoriescategories_to_install.

@maresb maresb mentioned this pull request May 20, 2023
@maresb
Copy link
Contributor

maresb commented May 20, 2023

Important note: in order to avoid breaking backwards compatibility, we should still emit the category key for now. (Removing category will unfortunately require a schema change.) This probably entails reincluding the previously-computed value for category, preferably fenced off with comments to aid future excision.

@maresb
Copy link
Contributor

maresb commented May 20, 2023

I'm worried now that even including categories will require a schema change, since LockedDependency is a StrictModel. Does this mean that conda-lock <=1.4.0 will balk if it finds categories in a lockfile? 🤔

@srilman
Copy link
Contributor Author

srilman commented Jul 20, 2023

@maresb I'm going to rebase this PR and address some of the comments.

@maresb
Copy link
Contributor

maresb commented Jul 21, 2023

That would be wonderful, thanks so much @srilman!!!

@srilman
Copy link
Contributor Author

srilman commented Nov 24, 2023

@maresb I ran into this problem recently and couldn't get around it, so I can't put off this PR any longer. I tried naively rebasing but ran into some issues that I need your opinion with.

I noticed that a lot has changed in terms of the lockfile format. There seems to be 2 lockfile formats now, I presume because of how Pydantic works. In this PR, we add an additional field to the lockfile format, categories. Can we still add the new field to both V1 and V2, or can we only add it to V2? If we can only add it to V2, then we will need a follow-up PR to make V2 the default. If there are any other changes we want to make to the lockfile format, we should also do so soon.

What are your thoughts?

@maresb
Copy link
Contributor

maresb commented Nov 24, 2023

Hey @srilman, it's really great to hear from you again!!!

I already attempted and failed to rebase this in #410. I've been trying to steer the changes in a direction to make a rebase easier, so maybe it makes sense to give this another shot.

We can't change the lockfile format without big discussions from the Mamba and Rattler folks. (We would need to do a major-version release coordinated with Micromamba.) Hence the V2 format is a proposal, and we work internally with V2, but the input/output always goes through V1. And categories is V2-only. Does this make sense?

Ideally we would implement categories in V2 where everything works as it should, and then when we convert to V1 for output, it reproduces as closely as possible the current weird behavior. This way we don't have any breaking changes, and everything's ready to go with V2.

Maybe we could add some flags that are very clearly marked as experimental so that we can export a V2 format file that will only work with conda-lock.

You may be interested in #546 where we're working on merging version specifications. It'd be really great to get your feedback there if you get the chance.

@srilman
Copy link
Contributor Author

srilman commented Dec 23, 2023

@maresb Took a while, but finally rebased the PR. You're right, it seemed easier to integrate than before.

Like you said, I made categories a V2 field while expanding to multiple elements with a different category for V1. Everything seems to work fine right now, but we can add an experimental V2 flag in another PR.

I'll take a look at #546, but it looks like that PR might need some changes from this PR.

@maresb
Copy link
Contributor

maresb commented Dec 24, 2023

This is very exciting, thanks so much @srilman!!!

while expanding to multiple elements with a different category for V1

Ok, but we'll need to thoroughly test this with both conda-lock install and micromamba to make sure it doesn't break anything. If everything works that would be amazing!!! How much have you done in terms of testing?

While #546 could greatly benefit from your input, I'd really like to get this in without more rebases, so let's make this PR top priority.

@srilman
Copy link
Contributor Author

srilman commented Dec 25, 2023

Ok, but we'll need to thoroughly test this with both conda-lock install and micromamba to make sure it doesn't break anything. If everything works that would be amazing!!! How much have you done in terms of testing?

Any thoughts on how to do this? Can we write unit tests for this? Probably for conda-lock, but micromamba? I wrote some unit tests for the generation portion, but not the install step. In terms of manual testing, I'm manually verifying that it works for my use case.

@maresb
Copy link
Contributor

maresb commented Dec 25, 2023

It would be really hard to write unit tests here for Micromamba's install process. I'm mainly interested in if/how Micromamba handles deduplication. If you have a Python dependency in both main and dev, then will Micromamba install Python twice?

Other than that, we might want to write a unit test with a simple example to make sure that we get a functioning environment after installing from a simple multi-category generated lockfile, and the test would just be running some installed thing as a subprocess.

@srilman
Copy link
Contributor Author

srilman commented Dec 31, 2023

If you have a Python dependency in both main and dev, then will Micromamba install Python twice?

In the PR, I added an extra filter pass that takes any dependency in main and another category and moves it to just main. But I get what you mean in the general sense. Any dependency in 2 optional categories that are selected at the same time.

Other than that, we might want to write a unit test with a simple example to make sure that we get a functioning environment after installing from a simple multi-category generated lockfile, and the test would just be running some installed thing as a subprocess.

Let me see what I can do, but feel free to add to this PR if you have the time to do so.

@srilman
Copy link
Contributor Author

srilman commented Jan 4, 2024

Ok so far for my use case, this PR seems to be working well. It seems to be producing a valid lockfile (based on a quick script I whipped up to make sure that the dependency tree is correct). conda-lock install worked, but I haven't tested micromamba just yet.

In addition, I added the test case for the conda-lock install step.

@maresb When you have a chance, could you review this PR and let me know if there's anything else you'd like me to do? I'd really love to see this merged soon so that I can start using it safely.

build=self.build,
optional=self.category != "main",
)
categories: Set[str] = set()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be set("main")? The previous default value for category was "main".

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you want {"main"}, as set("main") gives you {"m", "a", ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that in apply_categories, we assume that categories is initially empty and append to the set as needed. If we initially set it to {'main'}, then every dependency will be in the main category and will always be installed.

@srilman
Copy link
Contributor Author

srilman commented Jan 11, 2024

@maresb So I've started integrating this into my own use case in general, and had a chance to large-scale test and with micromamba. This is what I saw so far:

  • Running conda-lock install with this branch is working just fine. No major issues I found, and nothing has changed
  • Running conda-lock install with a lockfile generated on this branch but installed with conda-lock 2.5.1 fails. The current release seems to have an edge case where if we select 2 optional categories, an optional dependency in both categories fails to install. It seems to work fine when for optional dependencies in 1 selected category.
  • Micromamba is working just fine as well. It doesn't have the issue conda-lock 2.5.1 has and it installed all of the expected dependencies. Although IMO it's weird to me that micromamba doesn't assume the main category by default.

IMO the conda-lock 2.5.1 is not a blocker since merging this PR will fix it. It's just annoying for me since I'm trying to use this PR now.

@srilman
Copy link
Contributor Author

srilman commented Jan 11, 2024

FYI I think the test case I added in this PR is a good example to test Micromamba with

@srilman srilman requested a review from maresb January 13, 2024 23:38
@maresb
Copy link
Contributor

maresb commented Jan 14, 2024

I've carefully looked through the first commit. It was really challenging for me to understand what's going on. Probably I will have a lot of comments to add. Here are some notes:

I'm trying to understand "Change category → categories in LockfileV2 only".

There are a few classes involved, namely:

_BaseDependency: has .category that defaults to "main"

BaseLockedDependency → LockedDependencyV1, LockedDependencyV2

We want to change LockedDependencyV2 to have categories instead of category.
It's currently on BasedLockDepenedency (with default value "main"),
so we need to move it from BaseLockedDependency to LockedDependency (V1).
This lets us add it to LockedDependency (V2) as a Set[str].

We add categories to LockedDependencyV2 with default value set().
This forces a few adjustments:

  1. We need to return multiple LockedDependencyV1 objects when converting v2→v1.
  2. We need to aggregate the multiple LockedDependencyV2 objects into a single v1 object.
  3. We need to fill in the value of categories

For 1. we convert a v2 dependency to a list of v1 dependencies that are identical except
for their category field, and the category field spans the values in categories.

To convert from v1 back to v2, we start with a list of LockedDependencyV1.
These will come from reading the lockfile which will contain several duplicates with
the same key that differ only in category. We assemble these duplicates into a list,
copy the fields from the first element of the list, and let categories be the union
of the category fields from the individual elements.

For 3. we start by leaving categories empty so that they can be filled in according to
transitivity.

@srilman
Copy link
Contributor Author

srilman commented Jan 15, 2024

In that case, I'm going to add some comments to make the second commit a bit easier to understand. Thanks for pointing that out.

@@ -163,7 +194,7 @@ def write_conda_lock_file(
content.filter_virtual_packages_inplace()
with path.open("w") as f:
if include_help_text:
categories = set(p.category for p in content.package)
categories: Set[str] = set().union(*(p.categories for p in content.package))
Copy link
Contributor Author

@srilman srilman Jan 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Takes the union of all sets of categories of all dependencies to get the final set of categories to store in the lockfile.

categories = [*categories, *(k for k in by_category if k not in categories)]
root_requests = {}
root_requests: DefaultDict[str, List[str]] = defaultdict(list)
Copy link
Contributor Author

@srilman srilman Jan 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before, we used to map each package (LockedDependency) to 1 root dependency that is its ancestor (Dependency). A root dependency is a dependency specified in a source file. So the root dependency uses the package as a sub-dependency. As the comment says, we would try to use the first root.

Now, we need to store all root dependencies, in case there are multiple that use a package as a sub-dependency. That will allow us to take multiple category values, one from each root, and combine to create the categories value for a package.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the core change for the second commit.


# For any dep that is part of the 'main' category
# we should remove all other categories
_truncate_main_category(planned)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before this PR, if any package has a root / ancestor package that is part of the main category, it should also be in the main category. main root packages are always installed, so their sub-dependencies should also always be installed.

We want to maintain that same behavior in this PR. Thus, for any package that is in the main category, we don't need to remember any other categories it is in, since it is already non-optional. So function _truncate_main_category will remove other categories in that case.

- linux-64

dependencies:
- pyspark =3.4
Copy link
Contributor Author

@srilman srilman Jan 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to test the following with this setup.

  • Pandas should be in the lockfile with 2 optional categories, dev and test. It is a source dependency in file dev.yml, so we can make sure that special case works.
  • Numpy should be in the lockfile with 2 optional categories, dev and test. It is a subdependency of astropy and PySpark.
  • Python should be in the lockfile, with just the main category. It is a dependency of every other package, so this can test _truncate_main_category

@maresb
Copy link
Contributor

maresb commented Mar 4, 2024

Hey, I just wanted to say that I have not forgotten about this!!!

I've been trying to very carefully understand the fine details of this PR, and have been working on trying to make pure refactoring changes in #589 that will allow us to simply switch over to the new system.

I feel really bad because I keep having annoying personal stuff (like bureaucracy of registering in a new country, taxes, etc.) that keeps draining away the time that I want to devote to pushing this through.

At the same time, I want to be realistic, and I haven't had the available time I envisioned to push things forward here. The folks at prefix.dev have a team working full-time doing amazing work with Conda lockfiles. So I think pixi is a more sustainable solution in the medium-term. I'd like to converge with them so that conda-lock users can smoothly transition when they're ready.

I just pinged the prefix folks here: https://discord.com/channels/1082332781146800168/1214169703811907604/1214169703811907604
It seems like they're not concerned about supporting conda-lock lockfiles, so I think as long as we support conda-lock and micromamba we'll be good to go. 🚀

@maresb
Copy link
Contributor

maresb commented Sep 13, 2024

Superseded by #697

@maresb maresb closed this Sep 13, 2024
maresb added a commit that referenced this pull request Sep 13, 2024
Support Multiple Categories for Sub-Dependencies in Lockfile (Rebase #390)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants