WIP: Implementation for publish goal #12805

Closed · wants to merge 4 commits

Conversation

@xlevus (Contributor) commented Sep 9, 2021:

Implementation for #8935

Full(er) implementation here: https://github.com/xlevus/pants2-plugins/tree/publish

)

# ???
PublishedPackageSet(packages)

Contributor Author (@xlevus):

Not quite sure what you're meant to do with the resulting values from the concrete implementations?

Member:

Only for logging purposes in the publish goal, I'd say.

Member:

Yea, this is likely a good place to render something to the Console that lists the things that were published.

Contributor:

Better is logging imo:

for pkg in packages:
    for artifact in pkg.artifacts:
        msg = []
        if artifact.relpath:
            msg.append(f"Wrote {dist_dir.relpath / artifact.relpath}")
        msg.extend(str(line) for line in artifact.extra_log_lines)
        if msg:
            logger.info("\n".join(msg))

We've been using logging when it describes a side-effect like this.

Member:

By this point the publish will already have happened in the background, so that's the relevant side-effect logging (see other comment). So that's why this felt like a good place for a summary, similar to:

if all_results:
    console.print_stderr("")
for results in all_results:
    if results.skipped:
        sigil = console.sigil_skipped()
        status = "skipped"
    elif results.exit_code == 0:
        sigil = console.sigil_succeeded()
        status = "succeeded"
    else:
        sigil = console.sigil_failed()
        status = "failed"
        exit_code = results.exit_code
    console.print_stderr(f"{sigil} {results.linter_name} {status}.")

return Lint(exit_code)

@kaos (Member) left a comment:

Most excellent draft, so far, in my opinion.

I have left a few comments to think about; however, don't take them for more than my opinion. I am sure that you'll get more feedback from the core maintainers.

default = "${TWINE_PASSWORD}"


class PypiPublishRequest(PublishRequest):

Member:

I think that, rather than re-inventing things for the publish request scenario, we should instead make use of the existing field set mechanisms. Looking at the other core goals, they all manage to employ this across packaging, testing, linting, etc.

That would use the TargetRootsToFieldSetsRequest and friends.
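
For illustration only, here is a rough sketch of how the publish goal could lean on that machinery. Everything below is an assumption modeled on how the other core goals collect their field sets; the exact keyword arguments of TargetRootsToFieldSetsRequest in particular may differ:

```python
from pants.engine.goal import Goal, GoalSubsystem
from pants.engine.rules import Get, collect_rules, goal_rule
from pants.engine.target import (
    FieldSet,
    NoApplicableTargetsBehavior,
    TargetRootsToFieldSets,
    TargetRootsToFieldSetsRequest,
)
from pants.engine.unions import union


@union
class PublishFieldSet(FieldSet):
    """Hypothetical base class; each concrete publisher registers a subclass."""


class PublishSubsystem(GoalSubsystem):
    name = "publish"
    help = "Publish packages to their configured repositories."


class Publish(Goal):
    subsystem_cls = PublishSubsystem


@goal_rule
async def publish() -> Publish:
    # Map the requested target roots onto every applicable PublishFieldSet.
    targets_to_field_sets = await Get(
        TargetRootsToFieldSets,
        TargetRootsToFieldSetsRequest(
            PublishFieldSet,
            goal_description="the `publish` goal",
            no_applicable_targets_behavior=NoApplicableTargetsBehavior.error,
        ),
    )
    for field_set in targets_to_field_sets.field_sets:
        ...  # hand each field set to the matching publisher via a union Get
    return Publish(exit_code=0)


def rules():
    return collect_rules()
```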

makes sense?

Contributor Author (@xlevus), Sep 9, 2021:

The approach makes sense, but the feasibility of the implementation not so much...

I went with this approach to support publishing to multiple different repositories for any given type.

My understanding is a FieldSet is just a subset of the fields of a target. But we're not just operating on a subset of those fields.

e.g. this example: https://github.com/xlevus/pants2-plugins/blob/publish/examples/pypi_repository/BUILD
The example_package has two publish_targets: one is a pypi repository, the other an S3 bucket.

So we end up with two calls to Get(PublishedPackage,...) roughly like:

Get(PublishedPackage, PublishRequest, PyPiPublishRequest(publish_target=":pypi-repo", fieldset=PypiFieldSet()))
Get(PublishedPackage, PublishRequest, S3PublishRequest(publish_target=":s3-bucket", fieldset=S3Bucket()))

So (as far as my understanding goes), using TargetRootsToFieldSetsRequest, there's still going to need to be a union'd class to carry the [FieldSet, PublishTarget] pair (sketched after the steps below).

It gets a bit weird, as I predicted that a publish-target (repository) might need to know more than just the artifacts of the build (e.g. a docker push needs to know the image and version), so there's a bit of ping-pong:

  1. We find all the targets that have the minimum publishable fields
  2. We find the different publish-targets for the publishee
  3. We then determine what fields each of those targets need
  4. We check that the publishee has the fields that the publish-target wants.
  5. we do the thing.
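
For illustration, a minimal sketch of the kind of union'd "pair" class mentioned above, loosely based on the names already used in this draft (PublishRequest, PyPiPublishRequest); this is a sketch, not the actual implementation:

```python
from dataclasses import dataclass

from pants.engine.target import FieldSet, Target
from pants.engine.unions import union


@union
@dataclass(frozen=True)
class PublishRequest:
    """Pairs the publishee's field set with the publish-target it should go to."""

    field_set: FieldSet
    publish_target: Target


class PyPiPublishRequest(PublishRequest):
    """Concrete request type that the pypi/twine rules would handle."""
```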

Member:

Pretty close, but I think a few of those steps are handled by the FieldSet logic for us.
I've been playing with a POC locally to try out my idea, and what I think I've landed on is that it will be easier if each repository/registry has its own field type, registered on each target type that it supports publishing for.

Say that the pypi publisher registers a pypi_repositories field for python_distribution targets, and the s3 publisher registers an s3_buckets field for python_distribution, archive and files, for instance. This has the added benefit that it will be very clear which kinds of repositories are supported for which kinds of targets.

Then it'll be straightforward to get the field sets for all targets to publish, based on targets that have any of those publish fields in the BUILD file. Some of this reasoning is inspired by the lint goal, and how it constructs the set of LintRequests.
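
As a rough illustration of that idea (the field name and target type below are only examples; register_plugin_field is the existing hook for attaching a custom field to an existing target type):

```python
from pants.backend.python.target_types import PythonDistribution
from pants.engine.target import StringSequenceField


class PyPiRepositoriesField(StringSequenceField):
    """Addresses of the pypi repositories this distribution should be published to."""

    alias = "pypi_repositories"


def rules():
    # Makes `pypi_repositories=[...]` available on python_distribution targets in BUILD files.
    return [PythonDistribution.register_plugin_field(PyPiRepositoriesField)]
```

The s3 publisher would register its own field in the same way, against whichever target types it supports.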

Contributor Author (@xlevus):

My reasoning for implementing it this way was to make things in a sense 'just work'.

If developer A writes a plugin that allows them to publish a file to some form of file store, and developer B writes a plugin that bundles some files together, both should work together for the end user.

Having a specific field type for each type of publishable target would mean that either A would need to know about B to add support, or the end user would have to write their own plugin to shim A and B together.

With this implementation, A just needs to ensure they do some sanity-checking on input, and B just needs to ensure their target has a publish_targets field.

Yes, there are cases where things are not compatible, e.g. uploading a docker image (a 'fileless' artifact) to a file store, or a tarball to a pypi repository. But to me, it does not seem too egregious to just error with an "I can't do that".
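
For contrast, a very rough sketch of the shared-field approach described here; all names are illustrative, not the draft's actual code:

```python
from typing import Optional

from pants.engine.target import InvalidFieldException, SpecialCasedDependencies


class PublishTargetsField(SpecialCasedDependencies):
    """Addresses of the repositories/registries to publish this target to."""

    alias = "publish_targets"


def check_can_publish(artifact_relpath: Optional[str]) -> None:
    # Each publisher does its own sanity-checking and errors clearly when the
    # combination doesn't make sense (e.g. a fileless artifact sent to a file store).
    if artifact_relpath is None:
        raise InvalidFieldException("This publisher can only upload file artifacts.")
```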

Member:

@Eric-Arellano: Any thoughts here?

@@ -1,10 +1,14 @@
 # Copyright 2021 Pants project contributors (see CONTRIBUTORS.md).
 # Licensed under the Apache License, Version 2.0 (see LICENSE).

-from pants.backend.experimental.python import user_lockfiles
+from pants.backend.experimental.python import publish_pypi, user_lockfiles

Member:

I think that it would be better to separate the rules for publishing to pypi from the tool used to do it (twine). That way, there can potentially be multiple tools to choose from that all publish to a pypi registry, and it will be up to the user to decide which one to enable.

Contributor Author (@xlevus), Sep 9, 2021:

Would the pypi_repository target stay as a 'core' type, with the twine subsystem being a user-configurable backend, or does pypi_repository become twine_repository and sit in its own backend?

My counter argument:

> There should be one-- and preferably only one --obvious way to do it.

Supporting two tools to publish to pypi just sounds like twice as much code to maintain to achieve the same goal.

Member (@kaos), Sep 9, 2021:

Yes, not saying that there'd necessarily be more than one supported by core pants. Not entirely sure how best to structure this, though.

Member:

> I think that it would be better to separate the rules for publishing to pypi from the tool used to do it (twine). That way, there can potentially be multiple tools to choose from that all publish to a pypi registry, and it will be up to the user to decide which one to enable.

I'm less sure about this... since the destination (PyPI) has a particular API (auth requirements, etc.), various tools to publish to it are likely to be fairly similar in terms of their arguments. Seems like maybe premature generalization.

Member:

Just consider: if I'd rather use, say, poetry to publish my dist instead of twine, how would that work (for reasons like using fewer tools/less configuration, etc.)?
I may have to provide my own plugin to support using poetry for this, but I'd want to be able to do so without the default twine implementation tripping me up.

Contributor Author (@xlevus):

You'd have to define a poetry_pypi_repository target.

@@ -16,6 +16,7 @@

name_value_re = re.compile(r"([A-Za-z_]\w*)=(.*)")
shorthand_re = re.compile(r"([A-Za-z_]\w*)")
interpolation_re = re.compile(r"^\$\{([A-Za-z_]{1,}[A-Za-z0-9]{0,})\}$")

Member:

I find it confusing that ${VAR} is expanded but $VAR is not. Also, I would expect "embedded $VAR here" to expand the VAR in the middle of the string.
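
For illustration, a quick check of how the interpolation_re pattern above behaves; only values that are exactly ${VAR} match, which is the behaviour being described:

```python
import re

interpolation_re = re.compile(r"^\$\{([A-Za-z_]{1,}[A-Za-z0-9]{0,})\}$")

print(interpolation_re.match("${TWINE_PASSWORD}"))     # matches; group(1) == "TWINE_PASSWORD"
print(interpolation_re.match("$TWINE_PASSWORD"))       # None: bare $VAR is not expanded
print(interpolation_re.match("embedded ${VAR} here"))  # None: the pattern is anchored to the full string
```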

With that said, I'm not too sure that this is what we want, or need, here. The Pants way to do it is to add the required values as options on the tool used.

Say you have a twine tool. Along with that are options for specifying which version to use (if not the default) and possibly overriding the download url, if needed.
That will also be the right place to add twine-specific configuration, such as username, password, and config files for twine, to set up repository configurations. Those options may pick values from ENV vars as needed.

Member:

Ps.
I realize this comes from my own suggestion earlier over in your own repo. Didn't think of the tooling options then, apologies :)
Ds.

Contributor Author (@xlevus):

> I find it confusing that ${VAR} is expanded but $VAR is not. Also, I would expect "embedded $VAR here" to expand the VAR in the middle of the string.

This is mostly just a result of my idioms, as I'm used to predominantly using the ${foo} syntax in Dockerfile and docker-compose.yml files.

It also has the convenient side-effect of supporting non-variable values that start with a $ symbol. It still doesn't cover ${Not_A_Variable,This_Is_a_password}, but I think the odds of that are significantly smaller.

> With that said, I'm not too sure that this is what we want, or need, here. The Pants way to do it is to add the required values as options on the tool used.
> Say you have a twine tool. Along with that are options for specifying which version to use (if not the default) and possibly overriding the download url, if needed.
> That will also be the right place to add twine-specific configuration, such as username, password, and config files for twine, to set up repository configurations. Those options may pick values from ENV vars as needed.

The username and password aren't configuration values for Twine, but rather configuration values for the repository. In my use-case, I have two destinations for python packages: one for internal tooling, and the other a "public" repository for clients. Both have different credentials.

Ideally, I wouldn't implement it specifically here, but rather have it handled by something that resolves #12797. But my familiarity with Pants only goes so far 😉

Member (@kaos), Sep 9, 2021:

Usually, there's an escape for $ itself for disambiguation. That makes it clear what is, and what is not, a variable for expansion.

However, regarding twine, you can provide a configuration file with repositories, and those in turn support environment variables: https://twine.readthedocs.io/en/latest/#environment-variables

The question then just becomes which environment variables to pass on when running twine, but I think that will be a cleaner approach.
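
For illustration, a sketch of forwarding a defined set of twine's documented environment variables to the process; the variable names come from the twine docs linked above, while the rule shape and EnvironmentRequest usage are my assumption:

```python
from pants.engine.environment import Environment, EnvironmentRequest
from pants.engine.rules import Get, rule

# Variables documented by twine; requested explicitly so they reach the sandboxed process.
TWINE_ENV_VARS = ("TWINE_USERNAME", "TWINE_PASSWORD", "TWINE_REPOSITORY_URL")


class TwineEnvironment(Environment):
    """Hypothetical marker type for the env vars forwarded to the twine process."""


@rule
async def twine_environment() -> TwineEnvironment:
    env = await Get(Environment, EnvironmentRequest(TWINE_ENV_VARS))
    # Pass this as `extra_env` when constructing the twine process.
    return TwineEnvironment(env)
```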

Edit: I read that documentation a bit too quickly; the environment variables are used instead of the config file. Sigh. Will reconsider my reply.

Member (@kaos), Sep 9, 2021:

Perhaps it is an OK limitation to not expect publishing to more than a single pypi repo at a time (when relying on env vars)?
In which case the env variables to use are a defined set at all times.

If using a config file, and that file can also hold the required credentials, then there's no such limitation.

Comment on lines 119 to 131
process = await Get(
    Process,
    VenvPexProcess,
    VenvPexProcess(
        twine_pex,
        argv=["upload", *call_args.args, *paths],
        input_digest=request.built_package.digest,
        extra_env=call_args.env,
        description=f"Publishing {', '.join(paths)} to {request.publish_target.address}.",
    ),
)

await Get(ProcessResult, Process, process)

Member (@stuhood), Sep 9, 2021:

This is side-effecting, which makes it a vaguely sketchy thing to do in a @rule rather than at the top level in the @goal_rule if there is any chance that the Process is not idempotent. For example: this process invoke will be cached (unless you pass a non-default CacheScope for the *Process), and will not re-run after succeeding. By default, failed processes are not cached, so that part should be fine. But it's also the case that while executing a cache lookup the N+1th time this process runs, we might retry the process and then cancel it.

So. An important question for this Goal's API is: do you think that it is a reasonable requirement that all publishing commands/destinations MUST be idempotent? If so, then I think that the current API works. If not so, then rather than actually executing the publish processes in @rules, you would want to return them as a list of publish InteractiveProcesses to have the @goal_rule execute synchronously in the foreground.

In my experience, idempotent publish commands are generally a good idea, but it's something to be intentional about here.
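
For reference, a rough sketch of the second option (returning the publish commands for the @goal_rule to run in the foreground); InteractiveProcess/InteractiveRunner are the engine types being referred to, but the surrounding shape here is assumed:

```python
from typing import List

from pants.engine.process import InteractiveProcess, InteractiveRunner


def run_publishes(
    interactive_runner: InteractiveRunner, publish_processes: List[InteractiveProcess]
) -> int:
    # Called from the @goal_rule: each concrete publisher has *returned* an
    # InteractiveProcess describing its upload command rather than executing it,
    # so the side-effecting run is never cached or speculatively retried.
    exit_code = 0
    for proc in publish_processes:
        result = interactive_runner.run(proc)
        if result.exit_code != 0:
            exit_code = result.exit_code
    return exit_code
```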

Contributor Author (@xlevus):

Little unsure on how to proceed with this.

Twine isn't idempotent; re-running the upload with the same inputs will result in an error.

So, I'm guessing I need to move the publish call from the @rule to the @goal_rule. Presumably by just returning a (boxed?) InteractiveProcess for the goal to execute.

But what about systems that do not necessarily use a process? E.g. the docker subsystem could in theory also use the docker library.

Contributor Author (@xlevus):

I've moved the execution of the process to the @goal_rule.

Member:

> Twine isn't idempotent; re-running the upload with the same inputs will result in an error.

True by default, although it can be configured to not fail for existing destinations. @benjyw, @jsirois: thoughts on this one?

> But what about systems that do not necessarily use a process? E.g. the docker subsystem could in theory also use the docker library.

I think that even if some of the implementations (like docker) are idempotent, it's still ok to execute them in the foreground. So this makes sense I think.

Contributor:

Twine fails on re-upload for a reason; I'm not sure we want to elide that away unconditionally.

@stuhood (Member) commented Sep 9, 2021:

Thanks a lot, this looks great! Kicked off CI.

@stuhood (Member) commented Sep 10, 2021:

@xlevus : When this is ready for review, please take it out of Draft: thanks!

@kaos (Member) commented Nov 12, 2021:

Thanks for kicking this off the ground, @xlevus
Closing this as we have merged #13057.

@kaos closed this Nov 12, 2021