Build & Test Caching / Incremental Builds / "Modular Cloud" (Remote Computation Cache) #121
Comments
I'm not at all convinced this is a good idea, but Jest has some logic to figure out changed files and dependencies between files.
It is also possible to define custom Jest runners: https://www.youtube.com/watch?v=U_IYuAXtJZ0. It might be an interesting experiment to see whether these packages can be used to detect changed modules/packages?
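As a rough sketch of the "changed files" idea, the same information can be obtained from plain git (this is a stand-in for what Jest's changed-files logic does, not its actual implementation; the helper names are mine):

```typescript
import { execFileSync } from 'child_process';

// Parse the output of `git diff --name-only` into a list of paths.
function parseChangedFiles(diffOutput: string): string[] {
  return diffOutput
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

// List files changed relative to a base ref (e.g. 'origin/main').
// Assumes `git` is on the PATH and `cwd` is inside a repository.
function changedFiles(baseRef: string, cwd: string = process.cwd()): string[] {
  const out = execFileSync('git', ['diff', '--name-only', baseRef], {
    cwd,
    encoding: 'utf8',
  });
  return parseChangedFiles(out);
}
```

Mapping changed files to affected packages would then be a matter of grouping these paths by workspace directory.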
By the way, this issue assumes that we want to build and deploy all of the widgets/views within the repository, and that this repository will grow very large because it's a single repository for (almost) all applications. If that weren't the case, or we didn't want to use caching to solve this, we could go back to the original 'module federation' solution that @NMinhNguyen and I created back in April (and only use 'affected' logic). There are also other solutions described here (e.g. lazy compilation). However, presumably we'd still need caching for coverage reports of large repositories, since I understand they're required for all of the source code.
Related to this is @tannerlinsley's gist "Replacing Create React App with the Next.js CLI".
It would be worth investigating how a Next.js app using webpack 5 works with persistent caching (perhaps using the internal cache tool we've built, or experimenting on a branch on GitHub using the GitHub Cache Action).
There's a tracking issue for webpack 5 support in Create React App: facebook/create-react-app#9994. Feels like something we could potentially contribute to ourselves.
Modular is an open-source framework built for scaling up your development workflow so you can spend more time doing productive work. One important aspect of allowing a project to scale is its build/test time; if this is high, it degrades the efficiency of the whole development workflow.
Early in the project we implemented a cache of `node_modules/` to reduce/skip the installation step on CI. This is a table-stakes feature of modern open-source CI systems, and benefits projects within the firm by collapsing `yarn install` times from ~2 minutes to ~18 seconds.

Very recently @NMinhNguyen installed this caching layer into another internal project and found that the impact was far greater, with the CI process going from ~19 minutes in total to only ~3 minutes. This outcome was highly surprising; it turned out the reason was that the project in question used Nx, which incidentally stores a build cache within the `node_modules/` directory we cached.

This technique of caching build information/outputs is extremely common, and absolutely necessary if you wish to scale a very large project within a single repository / CI process.
It is widely seen as a way of making builds/tests up to 10x faster, which can mean a 10x higher throughput of PRs by a team, or a very significant cost saving for the business with regards to CI resource usage.

If you're wondering whether remote caching is a basic feature of all projects that allow you to build/test large repositories, you'd be correct. It is a foundational feature that every monorepo build toolchain (e.g. Bazel and Nx) implements.
Right from the outset of the internal `*-cache` project, its purpose has been to create the foundations for fast builds/tests within the firm, which is required for Modular to provide a scalable development workflow. In a way, what has been built essentially acts like 'Nx Private Cloud', but with fewer bells and whistles.

Now that a remote cache is deployed internally, we'll be able to use it to increase the scalability of other work within a repository (e.g. coverage); we discuss tooling and primitives to do so below.
Build in-house vs delegate to an open-source tool
Rebuilding a tool like Nx or Bazel would be development intensive.
We could choose to wrap one of these tools or we could choose to provide guidance on how they could be used on top of Modular. However, since build/test scalability appears to be an important aspect of what we're trying to achieve, completely delegating this work to huge/complex toolchains that we aren't able to control or understand well might be a mistake.
My opinion is that it would be a good idea to create internal caching 'primitives' that can be used by each of the tasks in a development workflow. We should experiment with and try to understand the design choices in the bigger toolchains, but also take a look at smaller libraries which each deal with a small aspect of the problem, to see if we can learn approaches from them (e.g. `beezel`, `backfill` and `@rushstack/package-deps-hash`).

Implement a `CacheStorage` class

Something that should be mentioned is that adding a new remote cache to some of the build tools is relatively easy. For example, Nx has such a class (see also `nx-remotecache-gcs`) and `backfill` has a similar `CacheStorage` class.

Minh suggested that we could use `patch-package` to add an S3 cache provider to `backfill` and then upstream this as a PR once we are happy with it.

Caching Primitives, Caching Strategies & Cache Coarseness
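One concrete primitive is the storage layer itself. Below is a minimal sketch of a pluggable cache-storage class, loosely modelled on the shape of `backfill`'s `CacheStorage`; the method names and the local-directory backend standing in for S3 are illustrative assumptions, not `backfill`'s actual API:

```typescript
import { promises as fs } from 'fs';
import * as path from 'path';

// Abstract storage primitive: concrete backends (S3, GCS, a local
// directory, ...) implement fetch/put for a content-addressed key.
abstract class CacheStorage {
  // Returns true on a cache hit, after restoring the cached artefact.
  abstract fetch(key: string, destination: string): Promise<boolean>;
  // Uploads the artefact under the given key.
  abstract put(key: string, source: string): Promise<void>;
}

// Simplest possible backend: one file per key in a local directory.
class LocalDirectoryStorage extends CacheStorage {
  constructor(private root: string) {
    super();
  }

  async fetch(key: string, destination: string): Promise<boolean> {
    try {
      await fs.copyFile(path.join(this.root, key), destination);
      return true;
    } catch {
      return false; // cache miss
    }
  }

  async put(key: string, source: string): Promise<void> {
    await fs.mkdir(this.root, { recursive: true });
    await fs.copyFile(source, path.join(this.root, key));
  }
}
```

An S3 backend would implement the same two methods against the S3 SDK, which is what makes the storage layer swappable.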
Depending on whether you are caching package builds, very large applications, `node_modules/` or tests/coverage, you could need a completely different caching strategy.

Concepts
Key generation
Keys are constructed from the information used to execute a task. (See also Nx's docs on this.)
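A minimal sketch of key generation: hash everything that influences a task's output, so that identical inputs always map to the same key. The particular inputs listed here (lockfile hash, tool version) are illustrative, not Modular's actual key schema:

```typescript
import { createHash } from 'crypto';

// Derive a deterministic cache key for a task from its inputs.
function cacheKey(
  task: string, // e.g. 'build' or 'test'
  fileHashes: Record<string, string>, // path -> content hash of source files
  lockfileHash: string, // hash of yarn.lock (dependency versions)
  toolVersion: string, // version of the tool producing the output
): string {
  const hash = createHash('sha256');
  hash.update(task);
  hash.update(lockfileHash);
  hash.update(toolVersion);
  // Sort paths so the key is independent of enumeration order.
  for (const filePath of Object.keys(fileHashes).sort()) {
    hash.update(filePath);
    hash.update(fileHashes[filePath]);
  }
  return hash.digest('hex');
}
```

Anything that changes the output but is left out of the key (environment variables, Node version, ...) becomes a source of stale-cache bugs, which is why tools like Nx are explicit about what goes into the key.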
Fallbacks when there are cache misses (but useful caches are still available)

GitHub Cache has the concept of `restore-keys` to allow falling back to other caches.

Per-package caches
Tools like Nx create a cache per package. These work well for monorepos with many independent packages, since we can decide for each package whether to skip build or test steps depending on whether a cache key exists or not.
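The per-package decision can be sketched as a simple check-then-build loop; `RemoteCache` here is a hypothetical minimal backend interface, not any particular tool's API:

```typescript
// Hypothetical minimal remote-cache backend.
interface RemoteCache {
  has(key: string): Promise<boolean>;
  restore(key: string, outputDir: string): Promise<void>;
  store(key: string, outputDir: string): Promise<void>;
}

// For one package: skip the build entirely on a cache hit,
// otherwise build and publish the outputs for other CI runs.
async function buildPackage(
  key: string, // content-derived cache key for this package
  outputDir: string,
  cache: RemoteCache,
  build: () => Promise<void>, // the actual build step
): Promise<'restored' | 'built'> {
  if (await cache.has(key)) {
    await cache.restore(key, outputDir); // cache hit: skip the build
    return 'restored';
  }
  await build();
  await cache.store(key, outputDir);
  return 'built';
}
```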
Application caches
On the other hand, if you have a webpack application that spans a whole repository and this is updated and persisted on every build, a tool like Nx wouldn't be a good choice. If your cache is produced from a large amount of source code and this is not partitioned by package, then a single file change would cause the cache key for the whole application cache to change and caching would be completely unviable as a solution.
In this situation, we should not generate cache keys in the same way. Instead we need a strategy in which the last known good cache for the branch is retrieved. A number of CI providers appear to use a time-based strategy of this kind, restoring the most recently uploaded cache.
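A 'last known good' lookup can be sketched as follows: instead of an exact content-derived key, take the newest cache uploaded for the current branch, falling back to the newest one from the default branch. The in-memory array stands in for a real remote cache backend, and `main` as the default branch is an assumption:

```typescript
interface AppCache {
  branch: string;
  uploadedAt: number; // epoch millis of the upload
  url: string; // where the cached artefact lives
}

// Newest cache for the branch, else newest cache for the default branch,
// else undefined (full miss: build from scratch).
function latestCacheFor(
  caches: AppCache[],
  branch: string,
  defaultBranch: string = 'main',
): AppCache | undefined {
  const newestOn = (b: string): AppCache | undefined =>
    caches
      .filter((c) => c.branch === b)
      .sort((x, y) => y.uploadedAt - x.uploadedAt)[0];
  return newestOn(branch) ?? newestOn(defaultBranch);
}
```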
Additionally, this depends on there being a way of caching the build of applications in a highly granular way (Snowpack might be the best case here?). If we're continuing to use webpack, we would need to upgrade to webpack 5, which adds support for a persistent cache. See the following comment:
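For reference, webpack 5's persistent cache is enabled via the `cache` option; `type: 'filesystem'` and `buildDependencies` are real webpack 5 settings, while the rest of a real configuration is omitted from this sketch (webpack's docs use `__filename` in `buildDependencies`; a literal path is shown here to keep the fragment self-contained):

```typescript
// webpack.config.ts (sketch): enable webpack 5's persistent filesystem cache.
const config = {
  cache: {
    type: 'filesystem' as const,
    // Invalidate the cache when the build configuration itself changes.
    buildDependencies: {
      config: ['./webpack.config.ts'],
    },
  },
};

export default config;
```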
Open-sourcing the backend as "Modular Cloud"
If we'd like Modular to scale large development workflows outside the firm, we should consider open-sourcing a version of the internal `*-cache` backend without any logic specific to internal services. We could call it 'Modular Cloud', as it would be analogous to 'Nx Cloud'.

The other possibility would be to allow users to swap out the cache backend used by Modular.