"modular go fast" - Reducing task times in modular projects - Umbrella Issue ☂️ #62
Comments
Build Performance

I'm going to try my best to explain 'the problem' and possible approaches in this comment. Apologies if I'm re-explaining something that has already been said -- there's a lot of interconnection in my thoughts and I need to get the whole of it out in the open for criticism.

The problem

The basic problem is that while small projects often have respectable webpack build times, larger projects frequently end up with build times that are unbearably bad on CI and sometimes affect local DX.

Part of the issue is that, within the industry we work in (finance), we're often creating applications that are larger than those found in other industries. Firstly, these are web apps and not web sites, and even though the visible UI is often small, the business logic and functionality are often complicated due to regulations or user needs (each application could be 6+ months of a 5-person team's work). Secondly, and more importantly, we're often asked to build applications that aggregate other applications together into something a bit like an operating system, tiling window manager or dashboard.

Given these two points, and the possibility that 500+ applications might end up being accessed from the same interface, it's very easy to end up with webpack builds that are either impossible or which massively slow down a team's ability to get work done in a timely manner. This makes us heavy consumers of CI/compute infrastructure, which sometimes can't be procured when our demands are too high.

What tends to end up happening is that teams learn to develop and deploy their applications completely independently so they can have their own pipelines with their own metrics, and ensure that their build times don't get in the way of them doing work.
As discussed previously, this causes horrible difficulty integrating and testing, significant wheel reinvention, bloated bundle sizes, and eventual terminal lock-in to specific versions of libraries (that could have bugs, vulnerabilities or performance deficiencies compared to what is state-of-the-art).

Another way that we are different is that we tend to not use traditional routers. Elsewhere in the industry it is common for developers to use libraries like React Router to code-split an application at the URL boundary, but here applications tend to mainly exist against one URL and be flexibly assembled from separate views into a layout. The source code that describes this might look like:

import React, { lazy } from 'react';
import { Layout } from 'layout';

const viewsMap = {
  '@modular-app/view-0001': lazy(() => import('@modular-app/view-0001')),
  '@modular-app/view-0002': lazy(() => import('@modular-app/view-0002')),
  '@modular-app/view-0003': lazy(() => import('@modular-app/view-0003')),
  '@modular-app/view-0004': lazy(() => import('@modular-app/view-0004')),
  '@modular-app/view-0005': lazy(() => import('@modular-app/view-0005')),
  // ... many more views ...
  //
  // If each application had a 2-minute long build time, by
  // combining every application into one the build time
  // will increase to the point that either webpack fails
  // to build due to an OOM crash, or it takes so long
  // that engineers aren’t able to get their PRs merged
  // in a timely manner.
  //
  // ... many more views ...
  '@modular-app/view-0997': lazy(() => import('@modular-app/view-0997')),
  '@modular-app/view-0998': lazy(() => import('@modular-app/view-0998')),
  '@modular-app/view-0999': lazy(() => import('@modular-app/view-0999')),
  '@modular-app/view-1000': lazy(() => import('@modular-app/view-1000')),
};

export default function App({ visibleViews = {} }) {
  return (
    <Layout>
      {/* visibleViews maps a view name to the props for that view */}
      {Object.entries(visibleViews).map(
        ([viewName, props]) => {
          const View = viewsMap[viewName];
          return <View key={viewName} {...props} />;
        }
      )}
    </Layout>
  );
}

Effectively, the problem with bundling the source code above is that while code-splitting allows you to only pay the runtime cost of the views you load, you still have to pay the build-time cost of all of the views pointed to by the page. This eventually stops being viable.

The impact of not solving the problem

Given the scale that we operate at and the type of applications we tend to create, Modular will not work for us in an acceptable way unless we solve this problem. It is a critical issue that must be resolved. The risk isn’t just that builds would take too long. The risk is that we would cause out-of-memory (OOM) crashes. This was confirmed in another repository testing super-large webpack builds. If all applications within our part of the company are built within a single repository, and we ignore this problem until we get OOMs, it would be catastrophic. It would block all PR merges and deployments for all UI Software Engineers.

High-level goals and non-functional requirements (NFRs)
The approaches

We’ve looked into a number of different techniques and tools including webpack 5’s persistent caching, Vite, Snowpack, smart ESM CDNs (discussed when we were considering ESM vs Module Federation), esbuild, Rush.js (and interesting supporting libraries

Build source code faster

esbuild is roughly 100x faster than other bundlers; however, that is presumably only the case if we were to use it for bundling, transpilation and minification. If we are still going to use webpack as our bundler, and Babel as the transpiler, then we will not get as significant a speed-up from esbuild. That said, it would be beneficial to use it for minification and perhaps transpilation. A consideration if using it for transpilation is that it is written in Go and is the work of a single developer. It might be harder for engineers to contribute to, and presumably there are fewer people checking the implementation of transpilation rules at PR.

Do less work

Ideally, we’d not need to transpile or bundle at all: browsers would support modern syntax, and corporate networks wouldn’t cause us any trouble. This is some of the promise of tools like Snowpack, which avoid transpilation and bundling, and ship ES Modules. It’s a good idea, but in practice we don’t think the ecosystem is quite ready, and some of the solutions demand a more modern environment than we often find ourselves working with (for example, I’ve heard of HTTP/2+ being disabled on some corporate networks).

Incremental builds
The latest beta of webpack 5 has support for persistent caching, which would improve the speed of builds in the majority of cases by re-using previous work. I believe that this would need to be coupled to logic from Rushstack, Nx or Backfill to create a hash of what we use to build an application and to associate the build cache/outputs to this on CI. Unfortunately, CRA doesn’t yet support webpack 5 (although I did start some work towards this upgrade back in March). Since Next.js currently supports webpack 5, if required we should have the necessary context to finish any upgrade to CRA.

The big issue with depending on incremental builds for speed is that if the build gets too large and we then inadvertently bust the cache with a change to a core library, we could end up in a situation where a full rebuild is needed but we can’t do it, because it takes too long or, in the worst-case scenario, crashes with OOM errors.

Build applications lazily

So far each option has presumed that you would build an application upfront during your CI process, but what if you could build application source code lazily at the point that it’s required? Since in many cases ‘builds’ are never deployed, it makes sense that we wouldn’t want to pay the cost of bundling JavaScript and assets until they are needed. This is what smart CDNs like This approach was also brought up by @threepointone on Twitter here. Potentially, if we were to do this, we could look into rewriting the

There are probably cons to this approach that I haven’t considered. My immediate concerns are that (1) any transpilation/bundler errors that would have surfaced during CI are pushed to the runtime, and (2) if we don’t ‘warm’ these lazily built imports by building them at startup, the application could stall at runtime while it waits for them to be built in the background.
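To make the 'warming' concern a bit more concrete, here is a minimal sketch of a lazy builder that builds each view at most once and can be warmed at startup. Everything in it is hypothetical: `createLazyBuilder`, `get`, `warm` and the `buildView` callback are names invented for illustration, and `buildView` stands in for whatever bundler invocation would actually be used.

```javascript
// Hypothetical sketch: build each view on first request and reuse the result.
// `buildView(viewName)` is an assumed callback that returns (or resolves to)
// the build output for one view; it is not a real webpack or CDN API.
function createLazyBuilder(buildView) {
  const builds = new Map(); // viewName -> Promise of build output

  function get(viewName) {
    if (!builds.has(viewName)) {
      // The Promise executor runs synchronously, so concurrent requests for
      // the same view share one build, and synchronous throws become rejections.
      const pending = new Promise((resolve) => resolve(buildView(viewName)))
        .catch((error) => {
          // Forget failed builds so that a later request can retry.
          builds.delete(viewName);
          throw error;
        });
      builds.set(viewName, pending);
    }
    return builds.get(viewName);
  }

  // Build a known set of views up front so first requests do not stall.
  // Failures are collected rather than thrown, so one bad view does not
  // prevent the others from being warmed.
  function warm(viewNames) {
    return Promise.allSettled(viewNames.map((viewName) => get(viewName)));
  }

  return { get, warm };
}
```

With something like this, a server could call `warm([...])` for the most commonly used views at startup and let the long tail build on demand. It does not address concern (1): build errors would still only show up when a view is first requested.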
Separate applications into multiple builds and join them together using module federation

You can read about module federation here. It’s a feature that allows you to split a build up into multiple build outputs which are then stitched into a single application at runtime. The main drawbacks are that (1) the integration is complicated enough that we could accidentally create lock-in if we’re not careful, (2) forgetting to share modules which contain singletons will break your application at runtime, and (3) because webpack is only considering parts of the module graph at a time and you must explicitly opt in to sharing modules, vendoring and chunking are less effective and the bundle size will not be optimal.

Closing Remarks

Combining multiple approaches might be the best option. During a discussion with @NMinhNguyen he mentioned that we could use a pattern like

I’m unsure about the right solution and it could be very different from what I am envisaging, but I do have a few opinions:
^ This is a bit of a long comment so I'm going to attempt to prioritise and split off the ideas into separate GitHub issues. With regards to 'Build Performance' I would personally prioritise the work as follows:
(We might not need to do all of these things immediately of course.)
This is an umbrella task for all things related to reducing task times in modular projects.
As modular projects grow (as they should), because we do centralised tasks for build/start/test/etc, we will hit bottlenecks in being able to develop and deploy quickly. While this shouldn't affect daily development per se, it'll start affecting productivity as a whole. Some examples:
modular start might take a long time to warm up, which isn't nice.
(note: We should make a comprehensive list of pain points; the solutions won't be super general, so we should make sure we've looked at every possible pain point.)
(note: this issue is not about runtime performance of react applications, though we should probably make an umbrella task for that too)
Possible solutions and strategies:
caching third party dependencies, so it doesn't have to be pulled down on every build (existing work: https://circleci.com/docs/2.0/caching/, https://github.com/actions/cache)
webpack 5's module federation for builds https://webpack.js.org/concepts/module-federation/ We can split the build into pieces (and stitch them up manually, if at all) with this feature; it'll be key that we do this automatically, and without exposing the internals to consumers, or else it'll be hell to unwind later. Things to consider: deduplicating dependencies, verifying module graphs, etc.
webpack 5's persistent caching https://github.com/webpack/changelog-v5/blob/master/guides/persistent-caching.md Again, it'll be key to do this without exposing any internals to the user.
configure jest to only run affected tests on builds (https://jestjs.io/docs/en/cli#--changedsince, https://jestjs.io/docs/en/cli#--findrelatedtests-spaceseparatedlistofsourcefiles) A thing to note here is that we should be able to generate complete coverage reports even if we only run a subset of the tests during a build. (another precedent in java land - https://github.com/jpmorganchase/sandboni-core)
Use feature flags for rapid release cycles: we should be able to build and ship features rapidly, and be able to turn them on/off as we desire. (https://martinfowler.com/articles/feature-toggles.html, and my own writeup https://gist.github.com/threepointone/2c2fae0622681284410ec9edcc6acf9e)
lint/prettier only on changed files (copy react's yarn linc, basically): In the modular repo, we've set up linting only for changed files when we do commits, but not as a standalone command. We can copy react's linc command. We should also ship the commit behaviour and commands in generated repositories.
Incremental builds: This is fairly nascent in production for the javascript ecosystem, usually because solutions are tightly bound to serving infrastructure and so on. 'Big' companies like fb/google have in-house solutions for the same. There's an opportunity here to start work on a project that's designed for this from the start. (I hear whispers of parcel 2 also working on similar goals)
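For the caching and incremental build items above, the common building block is a key derived from everything that can affect a build's output, so that outputs can be stored and looked up on CI. The following is a minimal sketch of that idea only; `computeCacheKey` and its argument shapes are hypothetical, and it is not the algorithm used by any particular tool.

```javascript
const { createHash } = require("node:crypto");

// Hypothetical sketch: derive a cache key from build inputs.
// `sources` maps a file path to its contents; `toolVersions` maps a tool
// name to its version string. Both shapes are assumptions for illustration.
function computeCacheKey(sources, toolVersions) {
  const hash = createHash("sha256");

  // NUL separators keep ("ab", "c") and ("a", "bc") from hashing the same.
  const addRecord = (kind, name, value) => {
    hash.update(`${kind}\0${name}\0${value}\0`);
  };

  // Sort keys so the result does not depend on object insertion order.
  for (const name of Object.keys(toolVersions).sort()) {
    addRecord("tool", name, toolVersions[name]);
  }
  for (const filePath of Object.keys(sources).sort()) {
    addRecord("file", filePath, sources[filePath]);
  }

  return hash.digest("hex");
}
```

A CI step could compute this key before building, reuse stored outputs when the key already exists, and build and store them otherwise. Note that a change to a widely shared input changes the key for every package that depends on it, which forces a full rebuild of all of them.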
Please feel free to add more to this list in the replies, and/or give feedback. I'll keep updating this list based on that. If you'd like to start work on any of these, please file a separate issue and link it back here.