
Pants: Add BUILD metadata to handle git submodule #6258

Merged
merged 10 commits on Oct 8, 2024


cognifloyd
Member

@cognifloyd commented Oct 5, 2024

This PR has commits extracted from #6202 where I'm working on getting all of our unit tests to run under pants+pytest.

This PR focuses on managing the git submodule we use to test pack features, especially the content version selection that relies on git worktree. Pants runs processes like pytest in a sandbox, and the .git directory does not get copied into that sandbox. Plus, .git is generally ignored, so I had to work around the pants assumptions that ignore .git in order to copy the submodule's git metadata into the sandbox.
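A minimal Python sketch of the failure mode, under simplifying assumptions: the hypothetical `copy_into_sandbox` models an ignore rule that skips hidden directories (such as `.git/`) but keeps hidden files, and the hypothetical `resolve_gitdir` follows a submodule's `.git` pointer file (`gitdir: <path>`). Neither is a Pants internal, and the `sub` layout is made up. In the repo the pointer resolves; in the sandbox it dangles because `.git/modules` was never copied.

```python
import os
import shutil
import tempfile
from pathlib import Path
from typing import Optional


def copy_into_sandbox(repo: Path, sandbox: Path) -> None:
    """Copy `repo` into `sandbox`, skipping hidden directories but not hidden files.

    Hypothetical model of an ignore rule like ".*/"; not the Pants implementation.
    """
    for dirpath, dirnames, filenames in os.walk(repo):
        # Prune hidden directories in place so os.walk never descends into ".git/".
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        rel = Path(dirpath).relative_to(repo)
        (sandbox / rel).mkdir(parents=True, exist_ok=True)
        for name in filenames:
            # Hidden *files*, such as a submodule's ".git" pointer, are still copied.
            shutil.copy2(Path(dirpath) / name, sandbox / rel / name)


def resolve_gitdir(worktree: Path) -> Optional[Path]:
    """Follow a submodule's ".git" file to its metadata directory, or return None."""
    git_file = worktree / ".git"
    if not git_file.is_file():
        return None
    content = git_file.read_text().strip()
    if not content.startswith("gitdir:"):
        return None
    # The gitdir path is relative to the directory that contains the ".git" file.
    target = (worktree / content[len("gitdir:"):].strip()).resolve()
    return target if target.is_dir() else None


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as tmp:
        repo, sandbox = Path(tmp, "repo"), Path(tmp, "sandbox")
        # Hypothetical layout: submodule "sub" with its metadata in ".git/modules/sub".
        (repo / ".git" / "modules" / "sub").mkdir(parents=True)
        (repo / "sub").mkdir()
        (repo / "sub" / ".git").write_text("gitdir: ../.git/modules/sub\n")
        in_repo = resolve_gitdir(repo / "sub") is not None
        copy_into_sandbox(repo, sandbox)
        in_sandbox = resolve_gitdir(sandbox / "sub") is not None
        return in_repo, in_sandbox


if __name__ == "__main__":
    print(demo())  # (True, False)
```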

This is the target that copies the .git directory for the submodule:

st2/BUILD

Lines 105 to 109 in a25bcfd

shell_command(
    name="capture_git_modules",
    environment="in_repo_workspace",
    command="cp -r .git/modules {chroot}/.git",
    tools=["cp"],

This relies on a few pants features, some of which are experimental (which means the interface can change in a future release of pants):

  • shell_command(...): This target tells pants to run a command when another target depends on this one. See the docs. For example, this is a test that depends on this target:

"test_pythonrunner.py": dict(
    dependencies=[
        "st2tests/st2tests/resources/packs/pythonactions/actions",
        "//:capture_git_modules",

  • environment=...: Environments are a newer, experimental feature in pants. Typically things run in the local environment, but they can also run remotely or in docker, depending on the config. See the docs for this feature, and the blog post about it.
  • "in_repo_workspace": this is the name of the environment we're using. To define the environment, I had to register it in pants.toml and add a BUILD target. Here is the snippet from pants.toml (note that environments are an experimental/preview feature):

st2/pants.toml

Lines 252 to 254 in a25bcfd

[environments-preview.names]
# https://www.pantsbuild.org/stable/docs/using-pants/environments
in_repo_workspace = "//:in_repo_workspace"
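To make the dependency wiring concrete, here is a toy Python model of how a consumer's sandbox inputs could be assembled: a test that depends on `//:capture_git_modules` receives the command's captured outputs plus whatever its `output_dependencies` list, while `execution_dependencies` only feed the command's own run. The `Target` class, `sandbox_inputs`, `build_graph`, and the `path/to/submodule/.git` path are hypothetical; this is a sketch of the idea, not the Pants rule graph.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Target:
    """Hypothetical, simplified stand-in for a Pants target."""
    address: str
    sources: Set[str] = field(default_factory=set)
    outputs: Set[str] = field(default_factory=set)  # captured by a shell_command
    dependencies: List[str] = field(default_factory=list)
    execution_dependencies: List[str] = field(default_factory=list)  # command-only
    output_dependencies: List[str] = field(default_factory=list)  # passed to consumers


def sandbox_inputs(root: str, targets: Dict[str, Target]) -> Set[str]:
    """Collect the files placed in the sandbox of a consumer target such as a test."""
    files: Set[str] = set()
    seen: Set[str] = set()
    stack = [root]
    while stack:
        address = stack.pop()
        if address in seen:
            continue
        seen.add(address)
        target = targets[address]
        files |= target.sources | target.outputs
        # execution_dependencies are deliberately NOT followed: in this model they
        # only matter while the command itself runs.
        stack.extend(target.dependencies + target.output_dependencies)
    return files


def build_graph(with_output_deps: bool) -> Dict[str, Target]:
    gitmodules = Target("//:gitmodules", sources={".gitmodules", "path/to/submodule/.git"})
    capture = Target(
        "//:capture_git_modules",
        outputs={".git/modules"},
        execution_dependencies=["//:gitmodules"],
        output_dependencies=["//:gitmodules"] if with_output_deps else [],
    )
    test = Target(
        "test_pythonrunner.py",
        sources={"test_pythonrunner.py"},
        dependencies=["//:capture_git_modules"],
    )
    return {t.address: t for t in (gitmodules, capture, test)}
```

With `build_graph(False)`, the test sandbox gets `.git/modules` but not the submodule's `.git` pointer file, which is the gap `output_dependencies` closes.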

The "in_repo_workspace" environment uses an even newer experimental feature, the experimental_workspace_environment(...) target. See the docs overview and the docs reference. This environment is the key to capturing .git, because it lets the shell_command run in the repo (in the workspace) instead of in a sandbox. Pants can also run commands with other targets, like run_shell_command, but those either don't allow capturing output files for use in other sandboxed tasks, or were more cumbersome. I tried to add plenty of documentation to the new BUILD.environment file. In the future we might add one or more docker_environment targets as well. Here is the target definition:

st2/BUILD.environment

Lines 10 to 14 in a25bcfd

experimental_workspace_environment(
    name="in_repo_workspace",
    description=(
        """
        This allows shell_command and similar to run in the repo, instead of in a sandbox.
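A rough Python model of what running in the workspace means for `capture_git_modules`: the command runs with the repo root as its working directory, `{chroot}` is replaced by a fresh scratch directory, and afterwards only the declared output directories are captured from that scratch directory. `run_in_workspace` is a hypothetical helper, not a Pants API, and the `mkdir -p` in the demo command is an assumption added so `cp -r` lands in `.git/modules` in this sketch. It needs a POSIX shell with `mkdir` and `cp`.

```python
import shlex
import subprocess
import tempfile
from pathlib import Path
from typing import Dict, List


def run_in_workspace(command: str, workspace: Path, output_directories: List[str]) -> Dict[str, bytes]:
    """Run `command` in `workspace` (not a sandbox) and capture outputs from a scratch dir."""
    with tempfile.TemporaryDirectory() as chroot:
        # Substitute {chroot} with the shell-quoted scratch directory path.
        subprocess.run(
            command.replace("{chroot}", shlex.quote(chroot)),
            shell=True,
            cwd=workspace,
            check=True,
        )
        captured: Dict[str, bytes] = {}
        for out_dir in output_directories:
            base = Path(chroot, out_dir)
            if not base.is_dir():
                continue
            for path in base.rglob("*"):
                if path.is_file():
                    # Keys are relative to the scratch dir, e.g. ".git/modules/sub/HEAD".
                    captured[path.relative_to(chroot).as_posix()] = path.read_bytes()
        return captured


def demo() -> Dict[str, bytes]:
    with tempfile.TemporaryDirectory() as repo:
        # Hypothetical submodule metadata that exists only in the workspace.
        head = Path(repo, ".git", "modules", "sub", "HEAD")
        head.parent.mkdir(parents=True)
        head.write_bytes(b"0123abcd\n")
        return run_in_workspace(
            "mkdir -p {chroot}/.git && cp -r .git/modules {chroot}/.git",
            Path(repo),
            output_directories=[".git/modules"],
        )
```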

Returning to the shell_command, with command="cp -r .git/modules {chroot}/.git":

  • Here, {chroot} is very important. Although the command does not run in a sandbox (aka "chroot"), it still has a sandbox directory. That directory can contain generated files or, for our purposes, files captured as "outputs", which pants then places as "inputs" in the sandbox of each target that depends on the command. The run_shell_command docs are helpful in understanding this.
  • This is where the shell_command defines the files to capture from its sandbox. The command merely copies the files we need (which are very small because the test repo is tiny) into the sandbox so they can be captured:

st2/BUILD

Line 118 in a25bcfd

output_directories=[".git/modules"],

  • And this is where I defined some dependencies of the command. execution_dependencies makes pants copy files into the command's sandbox and, more importantly, tells pants it has to rerun the command if those files change. output_dependencies are passed on to any targets that depend on the command, so they also transitively depend on the files.

st2/BUILD

Lines 116 to 117 in a25bcfd

execution_dependencies=[":gitmodules"],
output_dependencies=[":gitmodules"],

  • The gitmodules target is defined here. Note that .git is a file, not a directory, in the submodule. It points git to the actual location of the git metadata in the st2 repo's .git directory. We can't directly capture .git/modules like this, because the .git directory is ignored.

st2/BUILD

Lines 97 to 103 in a25bcfd

files(
    name="gitmodules",
    sources=[
        ".gitmodules",
        "**/.git",
    ],
)

  • Sadly, I couldn't find a clean way to make pants invalidate cached results of the process when someone updates the commit in the submodule. So, I left this note with the workaround:

st2/BUILD

Lines 110 to 115 in a25bcfd

# execution_dependencies allows pants to invalidate the output
# of this command if the .gitmodules file changes (for example:
# if a submodule gets updated to a different repo).
# Sadly this does not get invalidated if the submodule commit
# is updated. In our case, that should be rare. To work around
# this, kill the `pantsd` process after updating a submodule.
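Why a submodule commit bump slips through can be shown with a toy cache key: if a result is keyed on the command plus the contents of its declared inputs (here `.gitmodules` and the submodule's `.git` pointer file), then moving the submodule to a new commit changes neither file, so the key stays the same. This is a simplified, hypothetical model, not how Pants actually computes its process digests, and the file contents below are made up.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import List


def cache_key(command: str, inputs: List[Path]) -> str:
    """Hash the command together with each declared input's path and contents."""
    digest = hashlib.sha256(command.encode())
    for path in sorted(inputs):
        digest.update(str(path).encode())
        digest.update(path.read_bytes())
    return digest.hexdigest()


def demo() -> tuple:
    command = "cp -r .git/modules {chroot}/.git"
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        gitmodules = repo / ".gitmodules"
        pointer = repo / "sub" / ".git"
        head = repo / ".git" / "modules" / "sub" / "HEAD"  # NOT a declared input
        pointer.parent.mkdir()
        head.parent.mkdir(parents=True)
        gitmodules.write_text('[submodule "sub"]\n\tpath = sub\n\turl = https://example.com/sub.git\n')
        pointer.write_text("gitdir: ../.git/modules/sub\n")
        head.write_text("1" * 40 + "\n")

        inputs = [gitmodules, pointer]
        before = cache_key(command, inputs)
        # Simulate a submodule commit update: only metadata under .git/ changes.
        head.write_text("2" * 40 + "\n")
        after_commit_bump = cache_key(command, inputs)
        # Changing .gitmodules (e.g. pointing at a different repo) does change the key.
        gitmodules.write_text('[submodule "sub"]\n\tpath = sub\n\turl = https://example.com/other.git\n')
        after_url_change = cache_key(command, inputs)
        # (stale key after commit bump?, fresh key after .gitmodules change?)
        return before == after_commit_bump, before != after_url_change
```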

Finally, the last piece of getting these tests to run with pytest+pants in #6202 was working around a quirk of cloning in GHA. The actions/checkout action only fetches 1 commit by default, which is great for st2.git, but not enough for the submodule, which needs the full history and git tags. So, I updated the pants test GHA workflow to work around this.
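A hypothetical CI guard, sketched in Python under the assumption that git uses its default files-based ref storage: a shallow clone records its cut-off commits in `$GIT_DIR/shallow`, and tags live either as files under `refs/tags/` or as lines in `packed-refs`. For a submodule, `$GIT_DIR` is the directory under `.git/modules/` that its `.git` pointer file names. `check_submodule_history` and the `v0.1.0` tag in the demo are illustrative only.

```python
import tempfile
from pathlib import Path
from typing import List


def is_shallow(gitdir: Path) -> bool:
    """A shallow repository has a "shallow" file listing its cut-off commits."""
    return (gitdir / "shallow").is_file()


def tag_names(gitdir: Path) -> List[str]:
    """List tags from loose refs and packed-refs (files ref backend only)."""
    names = set()
    tags_dir = gitdir / "refs" / "tags"
    if tags_dir.is_dir():
        names.update(p.relative_to(tags_dir).as_posix() for p in tags_dir.rglob("*") if p.is_file())
    packed = gitdir / "packed-refs"
    if packed.is_file():
        for line in packed.read_text().splitlines():
            # Skips the "# pack-refs with: ..." header and "^<sha>" peeled-tag lines.
            parts = line.split()
            if len(parts) == 2 and parts[1].startswith("refs/tags/"):
                names.add(parts[1][len("refs/tags/"):])
    return sorted(names)


def check_submodule_history(gitdir: Path) -> List[str]:
    """Return a list of problems; an empty list means history and tags look present."""
    problems = []
    if is_shallow(gitdir):
        problems.append("shallow clone: full history is missing")
    if not tag_names(gitdir):
        problems.append("no tags found")
    return problems


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as tmp:
        gitdir = Path(tmp)
        (gitdir / "shallow").write_text("a" * 40 + "\n")
        shallow_problems = check_submodule_history(gitdir)
        (gitdir / "shallow").unlink()
        (gitdir / "packed-refs").write_text(
            "# pack-refs with: peeled fully-peeled sorted\n"
            + "a" * 40 + " refs/tags/v0.1.0\n"
            + "^" + "b" * 40 + "\n"
        )
        return shallow_problems, tag_names(gitdir), check_submodule_history(gitdir)
```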

@cognifloyd cognifloyd added this to the pants milestone Oct 5, 2024
@cognifloyd cognifloyd self-assigned this Oct 5, 2024
@pull-request-size pull-request-size bot added the size/M PR that changes 30-99 lines. Good size to review. label Oct 5, 2024
@cognifloyd cognifloyd changed the title Pants dep metadata Pants: Add BUILD metadata to handle git submodule Oct 5, 2024
@cognifloyd cognifloyd marked this pull request as ready for review October 5, 2024 15:43
@cognifloyd
Member Author

I have no idea why circleci is failing. Nothing here touches anything that gets packaged.

@cognifloyd cognifloyd merged commit 4d69f8a into master Oct 8, 2024
29 checks passed
@cognifloyd cognifloyd deleted the pants-dep-metadata branch October 8, 2024 14:19
Labels
pantsbuild size/M PR that changes 30-99 lines. Good size to review. tests

3 participants