
Coupling all xtasks into one binary can hurt performance #2

Open · epage opened this issue Oct 16, 2019 · 17 comments
Comments

epage (Collaborator) commented Oct 16, 2019

Some xtasks a developer will want to run regularly (like bundling up artifacts for wasm or embedded targets as part of their debug cycle). Others they might run only rarely, like some code-gen.

It's similar for CI. Multiple independent jobs will need to build the xtasks, but most jobs will only need the ones related to CI checks. Only one job will need to validate code-gen, and only a couple of jobs, conditioned on version tags, will need to call xtask dist.

By putting all of these into a single binary, we're losing out on performance by requiring developers to build dependencies and run build scripts for xtasks that won't be run.

epage (Collaborator, Author) commented Oct 16, 2019

To be clear, I understand this is a stable approach to getting tasks, and that it might eventually get baked in in a way that allows these to be separate packages, improving build times. I just wanted to ensure that the downside under the current system was noted and that we look for ways to address it with native tasks.

matklad (Owner) commented Oct 16, 2019

Note that splitting it over several binaries/packages can hurt performance as well, as rustc would need to do more linking (each binary gets its own copy of std).

I wouldn't worry too much about performance, as tasks are compiled once and should be cacheable pretty robustly. In particular, for CI I'd just add ./xtask/target to the cache and be done with it.

matklad (Owner) commented Oct 16, 2019

I've added some general advice about build times: 15ac41b

Given that xtasks shouldn't be rebuilt that often, I am inclined to close this issue!

matklad closed this as completed Oct 16, 2019
epage (Collaborator, Author) commented Oct 16, 2019

> I wouldn't worry too much about performance, as tasks are compiled once and should be cacheable pretty robustly. In particular, for CI I'd just add ./xtask/target to the cache and be done with it.

I disagree for CI. I feel like caching in CI is still very limited until we have worked out ways to clean up a workspace's stale files. For AzDO, caching is still in preview and they discourage it for production builds. I'll leave the decision on whether to re-open to you.

Less important is the impact on me as a user. I find that from time to time I do clean builds. Sometimes it's to recover space from stale files, but usually it's for various scenarios I'm testing out. A machine-wide cache of non-workspace dependencies would be a big help in reducing the impact of clean builds.

matklad reopened this Oct 16, 2019
matklad (Owner) commented Oct 16, 2019

Yeah, let's reopen! I can't come up with a nice strategy here myself (splitting xtasks over several packages seems like a pain), but perhaps somebody else will!

kornelski (Collaborator) commented

I would prefer tasks to be separate binaries, as I don't want to parse arguments. Dumping my spaghetti code in main is easier :)

matklad (Owner) commented Oct 17, 2019

@kornelski you'd need to define a separate alias for each xtask then, which is messy. And something like xtask --help would not work.

kornelski (Collaborator) commented Oct 17, 2019

Yeah, I would define a separate alias for each task. I expected generation of aliases to be automatic.

Supporting --help seems strange and unnecessary to me. Tasks I use don't take arguments, so why would they have help? make foo doesn't support help.

To me ideally the workflow would be something like:

  1. Make `tasks/say_moo.rs` with `fn main() { println!("moo!") }`
  2. Run `cargo xtasks --update` or such
  3. Expect `say_moo` to be added to `[[bin]]` in Cargo.toml (or moved to `src/bin/say_moo.rs`) and a `say_moo` alias added

Where steps 2 and 3 would not exist if it becomes a Cargo built-in solution. This maps to my solution with shell files, where I have many shell files, not one with a case statement.

matklad (Owner) commented Oct 17, 2019

> Supporting --help seems strange and unnecessary to me. Tasks I use don't take arguments, so why would they have help? make foo doesn't support help.

I think there should be a way for the new contributor to the project to figure out the list of available tasks. cargo xtask, cargo xtask --help (or just about any other invocation that doesn't mention existing task) seem like a good UI for showing the list of tasks.

I'll add "multiple binaries" as an alternative, but I personally would still prefer a single binary:

  • less linking for Cargo
  • common boilerplate like error handling and argument parsing needs to be done only once
  • adding a task should update only one file
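
For illustration, a minimal sketch of the single-binary pattern described above (the task names and help fallback here are hypothetical, not from the repo):

```rust
// xtask/src/main.rs — one binary, one match over task names.
use std::env;

type DynError = Box<dyn std::error::Error>;

fn main() -> Result<(), DynError> {
    match env::args().nth(1).as_deref() {
        Some("dist") => dist(),
        Some("codegen") => codegen(),
        _ => {
            // Doubles as `cargo xtask --help`: any unknown invocation lists the tasks.
            eprintln!("available tasks: dist, codegen");
            Ok(())
        }
    }
}

fn dist() -> Result<(), DynError> {
    // Build release artifacts and stage them, e.g. under target/dist.
    Ok(())
}

fn codegen() -> Result<(), DynError> {
    // Regenerate checked-in source files.
    Ok(())
}
```

Adding a task is then one new match arm and one new fn in this file.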

kornelski (Collaborator) commented Oct 17, 2019

But cargo task server-start may just run Command for one or two executables, no linking needed. OTOH cargo task make-emoji-pack may require serde, network libraries, unicode libraries, font rendering, image processing libraries, compression libraries, and take minutes to build.

To me separate tasks mean each task starts as fast as it can. Merged tasks mean every task is at least as slow to start as the slowest task.

matklad (Owner) commented Oct 17, 2019

If binaries are part of the same Cargo package, you'd still need to build all deps for any binary, unfortunately :-(

kornelski (Collaborator) commented

Oh, it does indeed. That's a bummer :(

kornelski (Collaborator) commented Oct 17, 2019

However, I still wouldn't want to be in the business of having a multi-purpose bin. The tasks I have are sometimes wildly different.

For example, for lib.rs I have the following binaries:

  • reindexing all crates
  • reindexing user data. Even though that's still reindexing, it could reuse maybe 10% of the previous script, because it uses a different database and does parallelism differently.
  • interactive tool for inspecting github repos.
  • building crates in docker and collecting results in a database
  • exporting collected build results into a compressed summary
  • building a filtered crates.io index git from a compressed summary
  • building of the whole website as static pages
  • building of an individual page. Takes 1 arg. No help. I copy'n'pasted the previous script and deleted irrelevant code, instead of making it configurable.
  • export of rss feed and sitemap
  • conversion of YAML syntaxes to a compressed bundle for syntect lib
  • conversion of JSON emoji definitions into a perfect hashmap

I would not want to have all of them in one file with a giant match statement. I'd split them off into separate modules, but managing that doesn't seem better than wrapping each in its own fn main().

I shall also add that I treat them as internal tools (things that would be in bash if bash was faster and could call Rust code better ;), so their code is not very pretty or DRY: they're either write-once tools, or just a thin layer of glue over library functions that have better code and tests elsewhere. They generally don't take args, or take just one.

tv42 commented May 22, 2023

You could avoid per-task entries in .cargo/config by having one binary dispatch to the others, based on the first argument:

```
cargo xtask foo --flags-for-foo
  -> cargo run --package xtask -- foo --flags-for-foo
  -> xtask foo --flags-for-foo
  -> cargo run --package xtask --bin foo -- --flags-for-foo
  -> foo --flags-for-foo
```
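
For illustration, a sketch of such a dispatcher, assuming each task lives in its own `[[bin]]` target of the xtask package and a single alias like `xtask = "run --package xtask --"` in .cargo/config (the error handling and the `CARGO` fallback are assumptions, not from this thread):

```rust
// xtask/src/main.rs — forwards `cargo xtask <task> [args…]` to `cargo run --bin <task>`.
use std::env;
use std::process::{exit, Command};

fn main() {
    let mut args = env::args().skip(1);
    let Some(task) = args.next() else {
        eprintln!("usage: cargo xtask <task> [args…]");
        exit(2);
    };
    // Re-invoke cargo so only the requested task binary gets built and run;
    // fall back to plain `cargo` if the CARGO env var isn't set.
    let cargo = env::var("CARGO").unwrap_or_else(|_| "cargo".into());
    let status = Command::new(cargo)
        .args(["run", "--package", "xtask", "--bin", task.as_str(), "--"])
        .args(args)
        .status()
        .expect("failed to spawn cargo");
    exit(status.code().unwrap_or(1));
}
```

Note that, per matklad's point above, the task binaries would need to live in separate packages (or gate their heavy dependencies behind optional features plus required-features on each [[bin]]) for this to actually avoid building unused dependencies.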

tommy-gilligan commented Aug 24, 2023

I ended up bundling all my tasks into a single binary and found this to be kind of cumbersome for my project. I'm going to try using the xtask binary as merely a dispatcher for my project and see how I go with that, but I'm also wondering if a consensus here has evolved much?

Imberflur commented

I saw another approach here of putting a less-used task with more dependencies behind a feature, and then having the xtask rebuild itself with the required features if necessary: https://github.com/gfx-rs/wgpu/blob/55b73a9fb8421135a55f5f5f44a7deca505715cb/xtask/src/main.rs#L18

Might be useful for certain cases.

jcbhmr commented Apr 20, 2024

I've used the rebuild-with-feature-flags strategy myself. 👍 It works particularly well when you really want to keep everything in a single file, such as when using the new cargo script feature.

Example of a cargo script with a self-rerun feature flag:

````rust
#!/usr/bin/env -S cargo +nightly -Zscript
```cargo
[features]
generate = ["quick-xml", "wasmtime"]
[dependencies]
quick-xml = { version = "0.3.1", optional = true }
wasmtime = { version = "19.0.1", optional = true }
```

#[cfg(feature = "generate")]
fn generate() -> Result<(), Box<dyn std::error::Error>> {
    use quick_xml::*;
    use wasmtime::*;
    // Do something with quick_xml and wasmtime...
    Ok(())
}

fn build_release() -> Result<(), Box<dyn std::error::Error>> {
    // Do the quick & easy build copy stuff.
    Ok(())
}

fn test_e2e() -> Result<(), Box<dyn std::error::Error>> {
    // Another easy one that requires only quick deps.
    Ok(())
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    match std::env::args().nth(1).ok_or("no task")?.as_str() {
        "generate" => {
            // With the feature enabled, run the task directly.
            #[cfg(feature = "generate")]
            return generate();
            // Without it, re-run this script with `--features generate`.
            #[cfg(not(feature = "generate"))]
            return std::process::Command::new("cargo")
                .args(["+nightly", "-Zscript", "run", "--manifest-path", file!()])
                .args(["--features", "generate", "--", "generate"])
                .status()?
                .success()
                .then_some(())
                .ok_or("cmd failed".into());
        }
        "build-release" => build_release(),
        "test-e2e" => test_e2e(),
        _ => Err("no such task".into()),
    }
}
````
