Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make TestModule#test run in T.dest folder by default #3347

Merged
merged 15 commits into from
Aug 11, 2024

Conversation

lihaoyi
Copy link
Member

@lihaoyi lihaoyi commented Aug 8, 2024

This PR makes it such that TestModule#test runs subprocesses inside T.dest / "sandbox" by default, rather than T.workspace.

This change can be disabled with a flag def testSandboxWorkingDir = false, but is a good default to encourage:

  • Before, because tests run with the pwd in the repository root, there are no guardrails preventing you from doing the "easy" thing and just reading stuff off of disk rather than having explicit build dependencies.

    • Even though often people put stuff in resources/ and read from it, which did the right thing "accidentally" since tests depend on the resource folder, it was easy for people to put stuff in other not-resources/ folders and have the dependency between the test files and the test task missing.
    • This also means things like --watch do not properly pick up changes, and testCached does not properly get invalidated.
    • It also means that it is easy to accidentally write over shared files, which becomes problematic when parallelism is turned on (which becomes the default for testCached in https://github.com/com-lihaoyi/mill/pull/3265/files, and we probably want to encourage more testCached going forward)
  • With this change, tests run in empty T.dest / "sandbox" folders. This means that it is very hard to "accidentally" do the wrong thing:

    • Although it is still possible to read and write anywhere on disk, it now becomes something you have to try hard to do, rather than something that can be done by accident.
    • Rather, you should prefer to pass files to tests via forkEnv or forkArgs, which is a pattern we already do in Mill's own build (e.g. here and here).
    • This makes an dependencies between tests and their files explicit, so --watch works and testCached properly invalidates
    • To provide convenience for the common case of use cases where we read stuff from the resource directories, I added a MILL_TEST_RESOURCE_FOLDER environment variable that contains the resource folders as a semicolon-separated string. In the common case, this is a single path, and can be used in your test code to refer to the resource folder as an alternative to the old practice reading directly from os.pwd / "src" / "test" / "resources" or whatever

There is some design leeway here; we could follow bazel, and create symlinks for all referenced PathRefs inside the test sandbox folder. Or we could do nothing, and let people pass in the stuff they need manually via forkEnv or forkArgs. I think the proposed approach of providing MILL_TEST_RESOURCE_FOLDER by default, and letting people provide other things on their own, is a reasonable compromise between these approaches. A full design and implementation of a bazel-style "Runfiles" system is probably more complexity than I am willing bear at this moment. This change is nowhere near the sophistication of the bazel sandbox, but it is a start

This is a binary compatible but semantically breaking change, and should go into 0.12.0. It will also need extensive updates to the documentation, which I'll postpone till later since we don't want those changes to be visible to users until this change has been released

@lihaoyi lihaoyi changed the title [RFC] Make TestModule#test run in T.dest folder by default Make TestModule#test run in T.dest folder by default Aug 8, 2024
@lefou
Copy link
Member

lefou commented Aug 8, 2024

I could imaging a dedicated testSandboxResources targets, which automatically get copied into the sandbox when preparing the test run. This can be helpful if the app under test needs some files.

In contrast, to the other resources, which always specify resources meant to be found on the classpath, this new target explicitly is about files in the FS at runtime, but not on the classpath.

@lefou
Copy link
Member

lefou commented Aug 8, 2024

This is a binary compatible but semantically breaking change, and should go into 0.12.0. It will also need extensive updates to the documentation, which I'll postpone till later since we don't want those changes to be visible to users until this change has been released

Again, I think we should not postpone any documentation. It hinders early adoption and testing, also proofreading and incremental fine-tuning. Instead we should make the latest stable release the default version we show in the documentation. For me, the ideal scenario is to only release features that are documented at release time.

@lihaoyi lihaoyi requested a review from lefou August 9, 2024 00:54
@lihaoyi
Copy link
Member Author

lihaoyi commented Aug 10, 2024

@lefou I've added some docs to this PR covering the different ways of loading resources in tests, both classpath based and file based approaches using environment variables, in Java and Scala

I think for this PR let's just stick with the status-quo pass-path-as-environmental-variable approach that we've used in the past. There's definitely more we can do here, but that will take more thought and can come in a follow up

@lihaoyi lihaoyi added this to the 0.12.0 milestone Aug 10, 2024
@lefou
Copy link
Member

lefou commented Aug 11, 2024

The reason why I proposed a new testSandboxResouces is the fact that test resources are already bundles up in the main jar and therefore using them directly from the filesystem blurs their correct purpose. This leads to projects having lots of test reosurces in the resouces directory, which will effectively also end up in the jar, but were never supposed to end up there. I think the fact that so many projects use the test resources folder is just convenience. Maven, who invented the src/test/resources folder is simply too hard to customized to use a better location for such resources, to begin with.

In contrast, we could easily make a distinction here, make it easy to do the right thing, and avoid encouraging users to do the not-so-correct thing. resources is supposed to be packaged up in the jar, so the correct usage of these files is via the classpath. (In Maven slang, we're speaking of something like a src/run/resources.) A new type of resources (testSandboxResources or some other name) will never end up in the package. Looks like a better place for resources a test case shall look up at run time from the filesystem. Even if we don't copy them into the sandbox but make the path accessible via a Java sysprop, it looks to me as the cleaner approach.

@lihaoyi
Copy link
Member Author

lihaoyi commented Aug 11, 2024

Going to merge this for now, but let's continue test file discussion in #3362 and see what solution we can find

@lihaoyi lihaoyi merged commit 1cc6f91 into com-lihaoyi:main Aug 11, 2024
30 checks passed
lihaoyi added a commit that referenced this pull request Aug 16, 2024
This makes `os.pwd` an empty directory: `out/mill-worker-*/sandbox/` for
Mill's server mode, `out/mill-no-server/sandbox/` otherwise

In general, code outside of tasks should use `millSourcePath`, while
code inside of tasks should use `T.dest` or the `PathRef`s given to it
by upstream tasks. Accessing project files via `os.pwd` can result in
tasks not invalidating when files changed.

Like #3347, this does not
prevent people intentionally walking the filesystem to work mischieve,
but it does help mitigate "accidental" usage of `os.pwd` to access
project files. The original `os.pwd` is still made available as
`mill.api.WorkspaceRoot.workspaceRoot` outside of tasks, or
`T.workspace` inside of tasks, for the occasional scenarios that need
it. But at least this adds some partial guardrails against accidentally
using `os.pwd` in a way that subverts Mill's expectations, and the APIs
that expose the workspace root can be appropriately documented to warn
people against doing the wrong thing
lihaoyi added a commit that referenced this pull request Aug 18, 2024
Also got rid of `MILL_TEST_DEST_FOLDER` since
#3347 makes `os.pwd` sufficient.

There are also a few other `MILL_*` things that look like env vars but
are actually system properties, so I left them out of this PR
lihaoyi added a commit that referenced this pull request Sep 5, 2024
Since #3347 made all test suites
run with `pwd` in the test's `.dest` folder by default, we are already
guaranteed that different concurrent test suites will not collide, so we
do not need the `out/blah/blah` segments to avoid conflicts
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants