WIP: Implementation for publish goal #12805
Conversation
```python
)

# ???
PublishedPackageSet(packages)
```
Not quite sure what you're meant to do with the resulting values from the concrete implementations?
Only for logging purposes in the publish goal, I'd say.
Yea, this is likely a good place to render something to the `Console` that lists the things that were published.
Better is logging imo:
pants/src/python/pants/core/goals/package.py
Lines 102 to 109 in fd71177
```python
for pkg in packages:
    for artifact in pkg.artifacts:
        msg = []
        if artifact.relpath:
            msg.append(f"Wrote {dist_dir.relpath / artifact.relpath}")
        msg.extend(str(line) for line in artifact.extra_log_lines)
        if msg:
            logger.info("\n".join(msg))
```
We've been using logging when it describes a side-effect like this.
By this point the publish will already have happened in the background, so that's where the relevant side-effect logging belongs (see other comment). That's why this felt like a good place for a summary, similar to:
pants/src/python/pants/core/goals/lint.py
Lines 260 to 275 in e1dd981
```python
if all_results:
    console.print_stderr("")

for results in all_results:
    if results.skipped:
        sigil = console.sigil_skipped()
        status = "skipped"
    elif results.exit_code == 0:
        sigil = console.sigil_succeeded()
        status = "succeeded"
    else:
        sigil = console.sigil_failed()
        status = "failed"
        exit_code = results.exit_code
    console.print_stderr(f"{sigil} {results.linter_name} {status}.")

return Lint(exit_code)
```
Most excellent draft so far, in my opinion.
I have left a few comments to think about; however, don't take them for more than my opinion. I am sure that you'll get more feedback from the core maintainers.
```python
default = "${TWINE_PASSWORD}"


class PypiPublishRequest(PublishRequest):
```
I think that, rather than re-inventing this for the publish request scenario, it would be better to make use of the existing field set mechanisms. Looking at the other core goals, they all manage to employ this across packaging, testing, linting, etc. That would use `TargetRootsToFieldSetsRequest` and friends.
Makes sense?
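For reference, here is a minimal sketch of how that pattern looks in the `package` goal today; the `PublishFieldSet` name is a hypothetical union base that concrete publishers would subclass, not anything in this PR:

```python
from pants.engine.target import (
    NoApplicableTargetsBehavior,
    TargetRootsToFieldSets,
    TargetRootsToFieldSetsRequest,
)

# Inside a hypothetical publish @goal_rule:
targets_to_field_sets = await Get(
    TargetRootsToFieldSets,
    TargetRootsToFieldSetsRequest(
        PublishFieldSet,  # assumed union base for publishable field sets
        goal_description="the `publish` goal",
        no_applicable_targets_behavior=NoApplicableTargetsBehavior.error,
    ),
)
```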
The approach makes sense, but the feasibility of the implementation not so much...
I went with this approach to support publishing to multiple different repositories for any given type.
My understanding is a `FieldSet` is just a subset of the fields of a target. But we're not just operating on a subset of those fields. E.g. in this example: https://github.com/xlevus/pants2-plugins/blob/publish/examples/pypi_repository/BUILD the `example_package` has two `publish_targets`: one is a pypi repository, the other an S3 bucket.
So we end up with two calls to `Get(PublishedPackage, ...)`, roughly like:

```python
Get(PublishedPackage, PublishRequest, PyPiPublishRequest(publish_target=":pypi-repo", fieldset=PypiFieldSet()))
Get(PublishedPackage, PublishRequest, S3PublishRequest(publish_target=":s3-bucket", fieldset=S3Bucket()))
```
So (as far as my understanding goes), even using `TargetRootsToFieldSetsRequest`, there's still going to need to be a union'd class to carry the `(FieldSet, PublishTarget)` pair.
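A rough sketch of what that union'd carrier class could look like; all names here are hypothetical, not the PR's actual code:

```python
from dataclasses import dataclass

from pants.engine.target import FieldSet, Target
from pants.engine.unions import union


@union
@dataclass(frozen=True)
class PublishRequest:
    """Carries the (FieldSet, PublishTarget) pair; concrete publishers subclass this."""

    field_set: FieldSet
    publish_target: Target
```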
It gets a bit weird, as I predicted that a publish-target (repository) might need to know more than just the artifacts of the build (e.g. a `docker push` needs to know the image and version), so there's a bit of ping-pong:
- We find all the targets that have the minimum publishable fields.
- We find the different publish-targets for the publishee.
- We then determine what fields each of those targets needs.
- We check that the publishee has the fields that the publish-target wants.
- We do the thing.
Pretty close, but I think a few of those steps are handled by the `FieldSet` logic for us.
I've been playing with a POC locally to try out my idea, and what I think I've landed on is that it will be easier if each repository/registry has its own field type, registered to each target type that it supports publishing for.
Say that the pypi publisher registers a `pypi_repositories` field for `python_distribution` targets, and the s3 publisher registers an `s3_buckets` field for `python_distribution`, `archive`, and `files`, for instance. This has the added benefit that it will be very clear which kinds of repositories are supported for which kinds of targets. (See the sketch below.)
Then it'll be straightforward to get the field sets for all targets to publish, based on targets that have any of those publish fields in the BUILD file. Some of this reasoning is inspired by the `lint` goal, and how it constructs the set of `LintRequest`s.
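A minimal sketch of registering such a plugin field; the field name and help text are my assumptions from the comment above:

```python
from pants.backend.python.target_types import PythonDistribution
from pants.engine.target import StringSequenceField


class PypiRepositoriesField(StringSequenceField):
    alias = "pypi_repositories"  # assumed field name
    help = "Addresses of pypi repository targets to publish this distribution to."


def rules():
    # Plugin fields let a backend attach its own field to an existing target type.
    return [PythonDistribution.register_plugin_field(PypiRepositoriesField)]
```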
My reasoning for implementing it this way was to make things, in a sense, 'just work'.
If developer A writes a plugin that allows them to publish a file to some form of file store, and then developer B writes a plugin that bundles some files together, both should work together for the end-user.
Having a specific field type for each type of publishable target would mean that either A would need to know about B to add support, or the end user would have to write their own plugin to shim A and B together.
With this implementation, A just needs to ensure they do some sanity-checking on input, and B just needs to ensure their target has a `publish_targets` field.
Yes, there are cases where things are not compatible, e.g. uploading a docker image (a 'fileless' artifact) to a file-store, or a tarball to a PyPI repository. But to me, it does not seem too egregious to just error with an "I can't do that".
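A rough sketch of what that sanity check could look like inside a publisher's rule; every type and attribute name here is hypothetical:

```python
from pants.engine.rules import rule


@rule
async def publish_to_file_store(request: FileStorePublishRequest) -> PublishedPackage:
    # Hypothetical check: a file store can only handle file artifacts, so error
    # out clearly instead of requiring plugins A and B to know about each other.
    if not any(artifact.relpath for artifact in request.built_package.artifacts):
        raise ValueError(
            f"Cannot publish {request.field_set.address} to "
            f"{request.publish_target.address}: it produced no file artifacts."
        )
    ...
```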
@Eric-Arellano : Any thoughts here?
```diff
@@ -1,10 +1,14 @@
 # Copyright 2021 Pants project contributors (see CONTRIBUTORS.md).
 # Licensed under the Apache License, Version 2.0 (see LICENSE).

-from pants.backend.experimental.python import user_lockfiles
+from pants.backend.experimental.python import publish_pypi, user_lockfiles
```
I think that it would be better to separate the rules for publishing to pypi from the tool used to do it (twine). That way, there can potentially be multiple tools to choose from that all publish to a pypi registry, and it will be up to the user to decide which one to enable.
Would the `pypi_repository` target stay as a 'core' type, with the `twine` subsystem being a user-configurable backend, or does `pypi_repository` become `twine_repository` and sit in its own backend?
My counter argument:
> There should be one-- and preferably only one --obvious way to do it.

Supporting two tools to publish to pypi just sounds like twice as much code to maintain to achieve the same goal.
Yes, not saying that there'd necessarily be more than one supported by core pants. Not entirely sure how best to structure this, though.
> I think that it would be better to separate the rules for publishing to pypi from the tool used to do it (twine). That way, there can potentially be multiple tools to choose from that all publish to a pypi registry, and it will be up to the user to decide which one to enable.

I'm less sure about this... since the destination (PyPI) has a particular API, auth requirements, etc., various tools to publish to it are likely to be fairly similar in terms of their arguments. Seems like maybe premature generalization.
Just consider if I'd rather use, say, poetry to publish my dist instead of twine: how would that work? (For reasons like using fewer tools/configuration, etc.)
I may have to provide my own plugin to support using poetry for this, but I'd want to be able to do so without the default twine implementation tripping me up.
You'd have to define a `poetry_pypi_repository` target.
```diff
@@ -16,6 +16,7 @@

 name_value_re = re.compile(r"([A-Za-z_]\w*)=(.*)")
 shorthand_re = re.compile(r"([A-Za-z_]\w*)")
+interpolation_re = re.compile(r"^\$\{([A-Za-z_]{1,}[A-Za-z0-9]{0,})\}$")
```
I find it confusing that `${VAR}` is expanded but `$VAR` is not. Also, I would expect `"embedded $VAR here"` to expand the `VAR` in the middle of the string.
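To illustrate the behavior in question, using the `interpolation_re` from the diff above (the test strings are my own examples):

```python
import re

interpolation_re = re.compile(r"^\$\{([A-Za-z_]{1,}[A-Za-z0-9]{0,})\}$")

# Only a full-string ${VAR} form matches...
assert interpolation_re.match("${TWINE_PASSWORD}") is not None
# ...while a bare $VAR is left alone,
assert interpolation_re.match("$TWINE_PASSWORD") is None
# and so is a variable embedded in a larger string, since the regex is anchored.
assert interpolation_re.match("embedded ${VAR} here") is None
```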
With that said, I'm not too sure that this is what we want here, or need. The Pants way to do it is to add the required values as options on the tool used.
Say you have a twine tool. Along with that are options for specifying which version to use (if not the default) and possibly overriding the download URL, if needed.
That will also be the right place to add twine-specific configuration, such as username, password, and config files for twine, to set up repository configurations. Those options may pick values for specific options from ENV vars as needed. (See the sketch below.)
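A minimal sketch of such a tool subsystem, loosely modeled on other Python tool subsystems in the codebase; the option names and version pin are my assumptions:

```python
from pants.backend.python.subsystems.python_tool_base import PythonToolBase
from pants.backend.python.target_types import ConsoleScript


class Twine(PythonToolBase):
    options_scope = "twine"
    help = "The utility for publishing Python distributions to PyPI (https://twine.readthedocs.io)."

    default_version = "twine==3.4.2"  # assumed pin
    default_main = ConsoleScript("twine")

    @classmethod
    def register_options(cls, register):
        super().register_options(register)
        # Assumed options; the credential options could fall back to env vars.
        register("--username", type=str, default=None, help="Username for the repository.")
        register("--password", type=str, default=None, help="Password for the repository.")
        register("--config", type=str, default=None, help="Path to a .pypirc config file.")
```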
Ps.
I realize this comes from my own suggestion earlier over in your own repo. Didn't think of the tooling options then, apologies :)
Ds.
> I find it confusing that `${VAR}` is expanded but `$VAR` is not. Also, I would expect `"embedded $VAR here"` to expand the `VAR` in the middle of the string.
This is mostly just a result of my idioms, as I'm used to predominantly using the `${foo}` syntax in `Dockerfile` and `docker-compose.yml` files.
It also has the convenient side-effect of supporting non-variable values that start with a `$` symbol. It still doesn't cover `${Not_A_Variable,This_Is_a_password}`, but I think the odds of that are significantly smaller.
> With that said, I'm not too sure that this is what we want here, or need. The Pants way to do it is to add the required values as options on the tool used.
> Say you have a twine tool. Along with that are options for specifying which version to use (if not the default) and possibly overriding the download URL, if needed.
> That will also be the right place to add twine-specific configuration, such as username, password, and config files for twine, to set up repository configurations. Those options may pick values for specific options from ENV vars as needed.
The username and password aren't configuration values for Twine, but rather configuration values for the repository. In my use-case, I have two destinations for python packages: one for internal tooling, and the other a "public" repository for clients. Both have different credentials.
Ideally, I'd rather not implement this specifically here, and instead have it handled by something that resolves #12797. But my familiarity with Pants only goes so far 😉
Usually, there's an escape for `$` itself for disambiguation. That makes it clear what is, and what is not, a variable for expansion.
However, regarding twine: you can provide a configuration file with repositories, and those in turn support environment variables: https://twine.readthedocs.io/en/latest/#environment-variables
The question then just becomes what environment variables to pass on when running twine, but I think that will be a cleaner approach.
Edit: Read that documentation a bit too quickly. The env var support was instead of the config file. Sigh. Will reconsider my reply..
Perhaps it is an OK limitation to not expect publishing to more than a single pypi repo at a time (when relying on env vars)? In which case the env variables to use are a defined set at all times.
If using a config file, and that file can also hold the required credentials, then there's no such limitation.
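For illustration, a `.pypirc` along these lines (the repository names and URLs are made up) can carry credentials for several destinations at once, which the single set of `TWINE_*` environment variables cannot:

```ini
[distutils]
index-servers =
    internal
    clients

[internal]
repository = https://pypi.internal.example.com/
username = internal-deploy
password = ...

[clients]
repository = https://pypi.clients.example.com/
username = client-deploy
password = ...
```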
```python
process = await Get(
    Process,
    VenvPexProcess,
    VenvPexProcess(
        twine_pex,
        argv=["upload", *call_args.args, *paths],
        input_digest=request.built_package.digest,
        extra_env=call_args.env,
        description=f"Publishing {', '.join(paths)} to {request.publish_target.address}.",
    ),
)

await Get(ProcessResult, Process, process)
```
This is side-effecting, which makes it a vaguely sketchy thing to do in a `@rule` rather than at the top level in the `@goal_rule`, if there is any chance that the `Process` is not idempotent. For example: this process invoke will be cached (unless you pass a non-default `CacheScope` for the `*Process`), and will not re-run after succeeding. By default, failed processes are not cached, so that part should be fine. But it's also the case that while executing a cache lookup the N+1th time this process runs, we might retry the process and then cancel it.
So. An important question for this `Goal`'s API is: do you think that it is a reasonable requirement that all publishing commands/destinations MUST be idempotent? If so, then I think that the current API works. If not, then rather than actually executing the publish processes in `@rule`s, you would want to return them as a list of publish `InteractiveProcess`es for the `@goal_rule` to execute synchronously in the foreground.
In my experience, idempotent publish commands are generally a good idea, but it's something to be intentional about here.
Little unsure on how to proceed with this.
Twine isn't idempotent; re-running the upload with the same inputs will result in an error.
So, I'm guessing I need to move the publish call from the `@rule` to the `@goal_rule`, presumably by just returning a (boxed?) `InteractiveProcess` for the goal to execute.
But what about systems that do not necessarily use a process? E.g. the `docker` subsystem could in theory also use the docker library.
I've moved the execution of the process to the `@goal_rule`.
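For reference, a minimal sketch of that shape; the `Publish` goal class and the `collect_publish_processes` helper are hypothetical placeholders, while `InteractiveRunner` is what other foreground goals (like `run` and `repl`) use:

```python
from typing import List

from pants.engine.console import Console
from pants.engine.goal import goal_rule
from pants.engine.process import InteractiveProcess, InteractiveRunner


@goal_rule
async def publish(console: Console, interactive_runner: InteractiveRunner) -> Publish:
    # Hypothetical: each publisher @rule returns the InteractiveProcess to run,
    # rather than running it, so the side-effect is never cached or retried.
    publish_processes: List[InteractiveProcess] = await collect_publish_processes()
    exit_code = 0
    for process in publish_processes:
        # Execute synchronously in the foreground.
        result = interactive_runner.run(process)
        if result.exit_code != 0:
            exit_code = result.exit_code
            console.print_stderr(f"Publish failed with exit code {result.exit_code}.")
    return Publish(exit_code)
```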
> Twine isn't idempotent; re-running the upload with the same inputs will result in an error.

True by default, although it can be configured to not fail for existing destinations. @benjyw, @jsirois: thoughts on this one?

> But what about systems that do not necessarily use a process? E.g. the `docker` subsystem could in theory also use the docker library.

I think that even if some of the implementations (like docker) are idempotent, it's still OK to execute them in the foreground. So this makes sense, I think.
Twine fails on re-upload for a reason; I'm not sure we want to elide that away unconditionally.
Thanks a lot, this looks great! Kicked off CI.
@xlevus: When this is ready for review, please take it out of Draft. Thanks!
Allows rules/targets to easily support values that can be pulled from the environment. Alternative solution to pantsbuild#12797.
Implementation for #8935
Full(er) implementation here: https://github.com/xlevus/pants2-plugins/tree/publish