Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New Source File and Lock Specification Approach #316

Open
wants to merge 6 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
96 changes: 6 additions & 90 deletions conda_lock/conda_lock.py
Original file line number Diff line number Diff line change
Expand Up @@ -63,7 +63,6 @@
except ImportError:
PIP_SUPPORT = False
from conda_lock.lockfile import (
Dependency,
GitMeta,
InputMeta,
LockedDependency,
Expand All @@ -76,12 +75,8 @@
write_conda_lock_file,
)
from conda_lock.lookup import set_lookup_location
from conda_lock.src_parser import LockSpecification, aggregate_lock_specs
from conda_lock.src_parser.environment_yaml import parse_environment_file
from conda_lock.src_parser.meta_yaml import parse_meta_yaml_file
from conda_lock.src_parser.pyproject_toml import parse_pyproject_toml
from conda_lock.src_parser import LockSpecification, make_lock_spec
from conda_lock.virtual_package import (
FakeRepoData,
default_virtual_package_repodata,
virtual_package_repo_from_specification,
)
Expand Down Expand Up @@ -114,8 +109,6 @@
sys.exit(1)


DEFAULT_PLATFORMS = ["osx-64", "linux-64", "win-64"]

KIND_EXPLICIT: Literal["explicit"] = "explicit"
KIND_LOCK: Literal["lock"] = "lock"
KIND_ENV: Literal["env"] = "env"
Expand Down Expand Up @@ -243,44 +236,6 @@ def fn_to_dist_name(fn: str) -> str:
return fn


def make_lock_spec(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both make_lock_spec and parse_source_files are specifically related to parsing source files, and don't have a big effect on the rest of the program. Thus, I moved them to conda_lock/src_parser/__init__.py to reduce the amount of code in this file (since its about 1500 lines of code).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for all this work!!! It would still really help for reviewing to have these refactor steps in separate commits so that I could view the substantive changes separately. (In general it's much easier to follow several smaller logical commits than one massive one.)

I'm sincerely very eager to see this through quickly, but my schedule looks difficult at the moment. I'll see what I can do, but apologies in advance if I'm slow to respond.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maresb I tried to break this PR down into a couple of commits to make it a bit easier. I had some trouble breaking down the last couple of commits since the contents is very much tied together. But if it is still difficult to look through, let me know.

Copy link
Contributor

@maresb maresb Feb 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, the additional commits! This is much better for review.

It could still be even better... The best would be one single logical change per commit. Please don't change this particular commit now, but to explain what I mean, your first commit "Move Function Sub PR" could be further broken down into:

  • Add pip_support as argument to make_lock_spec and parse_source_files
  • Move parse_source_files to src_parser
  • Move make_lock_spec to src_parser

because this is the level to which I need to deconstruct the changes to see what's going on. (Currently I have to diff each function removed from conda_lock.py with each function added to __init__.py in order to see exactly what changed, so a verbatim cut-from-one-file and paste-in-the-other easier to process as a single logical change.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, good point. I'm used to writing large PRs in general. In the future, I can definitely break down my commits even further. Would you like me to modify the last large commit of this PR? My concern with modifying that commit is that I'm not sure how to break it down without having some commit be broken. I normally try to ensure that every commit exposed to master is a somewhat-working impl of the app or library.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Large PRs are fine, it's just large commits which are difficult to understand.

Your commits don't need to be perfect, and I'm asking for a fairly high standard. I can explain how I try to write commits:

I'm not sure how to break it down without having some commit be broken.

This is a good rule in general. But in some situations I think it's fine to break something in one commit and fix it in a subsequent one. (For example, in one commit I might remove functionality X, and then in the next commit I add functionality Y which replaces X. This way the new details of Y aren't confused with the old details of X.)

What also helps is to stage partial changes. For example, in the process of implementing X, I may modify some type hints in another part of the code. In this case, I can stage and commit the type hints in a separate commit, so that my implementation of X remains focused.

The most complicated technique is to rebase code you've already committed, but that is really a lot of work.

A few concrete suggestions for how you could break up the main commit:

  • Switch to ruamel
  • Define new classes
  • Implement core logic using the new classes

I think I can handle the large commit as-is, but I would need to find an uninterrupted block of time where I could work through the whole thing at once. If you manage to break up the commit, then I can probably finish reviewing it sooner.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you spent a lot of time looking at the last 2 commits: Initial Version of SourceFile Approach and Adding Test Yaml Files. If not, I can try to split those further.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you already looked through the first 2 commits thoroughly, so I will leave them alone.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you already looked through the first 2 commits thoroughly, so I will leave them alone.

Yes, in fact, if you create a separate PR for those I think we can already merge them since they are minor refactoring changes.

For the big commit I was thinking: since this is a major change, Marius will have to review it after me. Thus it may be worthwhile to invest extra time to make it readable.

*,
src_files: List[pathlib.Path],
virtual_package_repo: FakeRepoData,
channel_overrides: Optional[Sequence[str]] = None,
platform_overrides: Optional[Sequence[str]] = None,
required_categories: Optional[AbstractSet[str]] = None,
) -> LockSpecification:
"""Generate the lockfile specs from a set of input src_files. If required_categories is set filter out specs that do not match those"""
lock_specs = parse_source_files(
src_files=src_files, platform_overrides=platform_overrides
)

lock_spec = aggregate_lock_specs(lock_specs)
lock_spec.virtual_package_repo = virtual_package_repo
lock_spec.channels = (
[Channel.from_string(co) for co in channel_overrides]
if channel_overrides
else lock_spec.channels
)
lock_spec.platforms = (
list(platform_overrides) if platform_overrides else lock_spec.platforms
) or list(DEFAULT_PLATFORMS)

if required_categories is not None:

def dep_has_category(d: Dependency, categories: AbstractSet[str]) -> bool:
return d.category in categories

lock_spec.dependencies = [
d
for d in lock_spec.dependencies
if dep_has_category(d, categories=required_categories)
]

return lock_spec


def make_lock_files(
*,
conda: PathLike,
Expand Down Expand Up @@ -353,11 +308,12 @@ def make_lock_files(

with virtual_package_repo:
lock_spec = make_lock_spec(
src_files=src_files,
src_file_paths=src_files,
channel_overrides=channel_overrides,
platform_overrides=platform_overrides,
platform_overrides=set(platform_overrides) if platform_overrides else set(),
virtual_package_repo=virtual_package_repo,
required_categories=required_categories if filter_categories else None,
pip_support=PIP_SUPPORT,
)
lock_content: Optional[Lockfile] = None

Expand Down Expand Up @@ -704,12 +660,8 @@ def _solve_for_arch(
"""
if update_spec is None:
update_spec = UpdateSpecification()
# filter requested and locked dependencies to the current platform
dependencies = [
dep
for dep in spec.dependencies
if (not dep.selectors.platform) or platform in dep.selectors.platform
]
dependencies = spec.dependencies[platform]

locked = [dep for dep in update_spec.locked if dep.platform == platform]
requested_deps_by_name = {
manager: {dep.name: dep for dep in dependencies if dep.manager == manager}
Expand Down Expand Up @@ -867,42 +819,6 @@ def create_lockfile_from_spec(
)


def parse_source_files(
src_files: List[pathlib.Path],
platform_overrides: Optional[Sequence[str]],
) -> List[LockSpecification]:
"""
Parse a sequence of dependency specifications from source files

Parameters
----------
src_files :
Files to parse for dependencies
platform_overrides :
Target platforms to render environment.yaml and meta.yaml files for
"""
desired_envs: List[LockSpecification] = []
for src_file in src_files:
if src_file.name == "meta.yaml":
desired_envs.append(
parse_meta_yaml_file(
src_file, list(platform_overrides or DEFAULT_PLATFORMS)
)
)
elif src_file.name == "pyproject.toml":
desired_envs.append(parse_pyproject_toml(src_file))
else:
desired_envs.append(
parse_environment_file(
src_file,
platform_overrides,
default_platforms=DEFAULT_PLATFORMS,
pip_support=PIP_SUPPORT,
)
)
return desired_envs


def _add_auth_to_line(line: str, auth: Dict[str, str]) -> str:
matching_auths = [a for a in auth if a in line]
if not matching_auths:
Expand Down
175 changes: 143 additions & 32 deletions conda_lock/src_parser/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,18 +5,30 @@
import typing

from itertools import chain
from typing import Dict, List, Optional, Tuple, Union
from typing import (
AbstractSet,
Dict,
Iterable,
List,
Optional,
Sequence,
Set,
Tuple,
Union,
)

from pydantic import BaseModel, validator
from typing_extensions import Literal

from conda_lock.common import ordered_union, suffix_union
from conda_lock.common import suffix_union
from conda_lock.errors import ChannelAggregationError
from conda_lock.models import StrictModel
from conda_lock.models.channel import Channel
from conda_lock.virtual_package import FakeRepoData


DEFAULT_PLATFORMS = {"osx-64", "linux-64", "win-64"}

logger = logging.getLogger(__name__)


Expand All @@ -42,7 +54,9 @@ class _BaseDependency(StrictModel):
optional: bool = False
category: str = "main"
extras: List[str] = []
selectors: Selectors = Selectors()

def to_source(self) -> "SourceDependency":
return SourceDependency(dep=self) # type: ignore


class VersionedDependency(_BaseDependency):
Expand All @@ -59,32 +73,66 @@ class URLDependency(_BaseDependency):
Dependency = Union[VersionedDependency, URLDependency]


class SourceDependency(StrictModel):
dep: Dependency
selectors: Selectors = Selectors()
Comment on lines +76 to +78
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain in words what's the idea behind a SourceDependency?

Copy link
Contributor Author

@srilman srilman Feb 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure (I can also add a comment). So we want the LockSpecification to be a mapping from environment (platform right now, but potentially other attributes in the future) to a list of dependencies. Those dependencies don't need selectors, because selectors are only used when constructing the LockSpecification for determining if a dep is required for an environment, or multiple versions to use.

Thus, a SourceDependency represents a dependency with any additional info associated with it from a source file. Right now, its just selectors, but in the future, we may have other limiters like min or max Python version.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would something like this make sense as a docstring?

A SourceDependency represents information about a particular dependency specification which has been extracted from a SourceFile (e.g. environment.yml or pyproject.toml). Generally, information about the target environment (e.g. platform) is not included, but it might be specified by including selectors.

Please rewrite in case I misunderstood the details.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's perfect! Will add



class Package(StrictModel):
url: str
hash: str


class SourceFile(StrictModel):
file: pathlib.Path
dependencies: List[SourceDependency]
# TODO: Should we store the auth info in here?
channels: List[Channel]
platforms: Set[str]

@validator("channels", pre=True)
def validate_channels(cls, v: List[Union[Channel, str]]) -> List[Channel]:
for i, e in enumerate(v):
if isinstance(e, str):
v[i] = Channel.from_string(e)
return typing.cast(List[Channel], v)

def spec(self, platform: str) -> List[Dependency]:
from conda_lock.src_parser.selectors import dep_in_platform_selectors

return [
dep.dep
for dep in self.dependencies
if dep.selectors.platform is None
or dep_in_platform_selectors(dep, platform)
]


class LockSpecification(BaseModel):
dependencies: List[Dependency]
dependencies: Dict[str, List[Dependency]]
# TODO: Should we store the auth info in here?
channels: List[Channel]
platforms: List[str]
sources: List[pathlib.Path]
virtual_package_repo: Optional[FakeRepoData] = None

@property
def platforms(self) -> List[str]:
return list(self.dependencies.keys())

def content_hash(self) -> Dict[str, str]:
return {
platform: self.content_hash_for_platform(platform)
for platform in self.platforms
for platform in self.dependencies.keys()
}

def content_hash_for_platform(self, platform: str) -> str:
data = {
"channels": [c.json() for c in self.channels],
"specs": [
p.dict()
for p in sorted(self.dependencies, key=lambda p: (p.manager, p.name))
if p.selectors.for_platform(platform)
for p in sorted(
self.dependencies[platform], key=lambda p: (p.manager, p.name)
)
],
}
if self.virtual_package_repo is not None:
Expand All @@ -105,34 +153,97 @@ def validate_channels(cls, v: List[Union[Channel, str]]) -> List[Channel]:
return typing.cast(List[Channel], v)


def aggregate_lock_specs(
lock_specs: List[LockSpecification],
) -> LockSpecification:

# unique dependencies
def aggregate_deps(grouped_deps: List[List[Dependency]]) -> List[Dependency]:
# List unique dependencies
unique_deps: Dict[Tuple[str, str], Dependency] = {}
for dep in chain.from_iterable(
[lock_spec.dependencies for lock_spec in lock_specs]
):
for dep in chain.from_iterable(grouped_deps):
key = (dep.manager, dep.name)
if key in unique_deps:
# Override existing, but merge selectors
previous_selectors = unique_deps[key].selectors
previous_selectors |= dep.selectors
dep.selectors = previous_selectors
unique_deps[key] = dep

dependencies = list(unique_deps.values())
try:
channels = suffix_union(lock_spec.channels or [] for lock_spec in lock_specs)
except ValueError as e:
raise ChannelAggregationError(*e.args)
return list(unique_deps.values())


def aggregate_channels(
channels: Iterable[List[Channel]],
channel_overrides: Optional[Sequence[str]] = None,
) -> List[Channel]:
if channel_overrides:
return [Channel.from_string(co) for co in channel_overrides]
else:
# Ensure channels are correctly ordered
try:
return suffix_union(channels)
except ValueError as e:
raise ChannelAggregationError(*e.args)


def parse_source_files(
src_file_paths: List[pathlib.Path], pip_support: bool = True
) -> List[SourceFile]:
"""
Parse a sequence of dependency specifications from source files

Parameters
----------
src_files :
Files to parse for dependencies
pip_support :
Support pip dependencies
"""
from conda_lock.src_parser.environment_yaml import parse_environment_file
from conda_lock.src_parser.meta_yaml import parse_meta_yaml_file
from conda_lock.src_parser.pyproject_toml import parse_pyproject_toml
Comment on lines +193 to +195
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any particular reason to put these imports inside the function? For more standardized code, I'd prefer to have imports at the top of the file unless there's a good reason.

Copy link
Contributor Author

@srilman srilman Feb 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those modules import conda_lock/src_parser/__init__.py. When I included them at the top, I was getting a circular dependency error. Those files use the SourceDependency, SourceFile, VersionedDependency, and URLDependency classes. If I move these classes to a new file like conda_lock/src_parser/models.py, then I can get rid of the circular dependency and have these be top level imports. Thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes, circular dependencies in Python are really annoying.

Those files use the SourceDependency, SourceFile, VersionedDependency, and URLDependency classes.

Are they using them just for type hints? If so, then they are not true import cycles. In that case, you can do

from typing import TYPE_CHECKING

if TYPE_CHECKING;
    from ... import SourceDependency, ...

and the types won't be imported at runtime, avoiding the cycle. (Very nice, I see that you already know this trick! 😄)

If it's not just for type hints, then there may be some genuinely circular logic occurring. For instance, if a imports B from c and c imports D from a, then you may need to make a new module e which contains B and D. (Then a imports B from e and c imports D from e, and all is well.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not just for type hints. I will move them to a new module.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot to mention in my previous comment that Python does permit certain types of circular imports.

Running from a import B is essentially equivalent to import c; B = c.B. Python will allow you to import c from a while also importing a from c, as long as you don't access the module attributes (i.e. c.B) before the module has been fully loaded. (This works when all a.B accesses occur within function definitions.)

So there is another strategy: rather than fix the cycles, try to import modules lazily, i.e. replace from c import Bimport c and Bc.B. But I have the impression that this leads to very fragile code. I think it's generally much more robust to remove all non-typing-related circular imports when possible.


src_files: List[SourceFile] = []
for src_file_path in src_file_paths:
if src_file_path.name in ("meta.yaml", "meta.yml"):
src_files.append(parse_meta_yaml_file(src_file_path))
elif src_file_path.name == "pyproject.toml":
src_files.append(parse_pyproject_toml(src_file_path))
else:
src_files.append(
parse_environment_file(
src_file_path,
pip_support=pip_support,
)
)
return src_files


def make_lock_spec(
*,
src_file_paths: List[pathlib.Path],
virtual_package_repo: FakeRepoData,
channel_overrides: Optional[Sequence[str]] = None,
platform_overrides: Optional[Set[str]] = None,
required_categories: Optional[AbstractSet[str]] = None,
pip_support: bool = True,
) -> LockSpecification:
"""Generate the lockfile specs from a set of input src_files. If required_categories is set filter out specs that do not match those"""
src_files = parse_source_files(src_file_paths, pip_support)

# Determine Platforms to Render for
platforms = (
platform_overrides
or {plat for sf in src_files for plat in sf.platforms}
or DEFAULT_PLATFORMS
)

spec = {
plat: aggregate_deps([sf.spec(plat) for sf in src_files]) for plat in platforms
}

if required_categories is not None:
spec = {
plat: [d for d in deps if d.category in required_categories]
for plat, deps in spec.items()
}

return LockSpecification(
dependencies=dependencies,
# Ensure channel are correctly ordered
channels=channels,
# uniquify metadata, preserving order
platforms=ordered_union(lock_spec.platforms or [] for lock_spec in lock_specs),
sources=ordered_union(lock_spec.sources or [] for lock_spec in lock_specs),
dependencies=spec,
channels=aggregate_channels(
(sf.channels for sf in src_files), channel_overrides
),
sources=src_file_paths,
virtual_package_repo=virtual_package_repo,
)
Loading