
Coupling all xtasks into one binary can hurt performance #2

Open · epage opened this issue Oct 16, 2019 · 17 comments
Comments

epage (Collaborator) commented Oct 16, 2019

Some xtasks a developer will want to run regularly (like bundling up artifacts for wasm or embedded targets as part of their debug cycle). Others they might run only rarely, like some code-gen.

It's similar for CI. Multiple independent jobs will need to build the xtasks, but most jobs will only need the ones related to CI checks. Only one job will need to validate code-gen, and only a couple of jobs, conditioned on version tags, will need to call xtask dist.

By putting all of these into a single binary, we're losing out on performance by requiring developers to build dependencies and run build scripts for xtasks that won't be run.

epage (Collaborator, Author) commented Oct 16, 2019

To be clear, I understand this is a stable approach to getting tasks, and that it might eventually get baked in in a way that allows these to be separate packages, improving build times. I just wanted to ensure that the downside under the current system was noted and that we look for ways to address it with native tasks.

matklad (Owner) commented Oct 16, 2019

Note that splitting it over several binaries/packages can hurt performance as well, as rustc would need to do more linking (each binary gets its own copy of std).

I wouldn't worry too much about performance, as tasks are compiled once and should be cacheable pretty robustly. In particular, for CI I'd just add ./xtask/target to the cache and be done with it.

matklad (Owner) commented Oct 16, 2019

I've added some general advice about build times: 15ac41b

Given that xtasks shouldn't be rebuilt that often, I am inclined to close this issue!

matklad closed this as completed Oct 16, 2019
epage (Collaborator, Author) commented Oct 16, 2019

> I wouldn't worry too much about performance, as tasks are compiled once and should be cacheable pretty robustly. In particular, for CI I'd just add ./xtask/target to the cache and be done with it.

I disagree for CI. I feel like caching in CI is still very limited until we have worked out ways to clean up a workspace's stale files. For AzDO, caching is still in preview and they discourage it for production builds. I'll leave the decision on whether to re-open to you.

Less important is the impact on me as a user. I find that from time to time I do clean builds. Sometimes it's to recover space from stale files, but usually it's for various scenarios I'm testing out. A machine-wide cache of non-workspace dependencies would be a big help in reducing the impact of clean builds.

matklad reopened this Oct 16, 2019
matklad (Owner) commented Oct 16, 2019

Yeah, let's reopen! I can't come up with a nice strategy here myself (splitting xtasks over several packages seems like a pain), but perhaps somebody else will!

kornelski (Collaborator) commented

I would prefer tasks to be separate binaries, as I don't want to parse arguments. Dumping my spaghetti code in main is easier :)

matklad (Owner) commented Oct 17, 2019

@kornelski you'd need to define a separate alias for each xtask then, which is messy. And something like xtask --help would not work.

kornelski (Collaborator) commented Oct 17, 2019

Yeah, I would define a separate alias for each task. I expected generation of aliases to be automatic.

Supporting --help seems strange and unnecessary to me. Tasks I use don't take arguments, so why would they have help? make foo doesn't support help.

To me ideally the workflow would be something like:

  1. Make `tasks/say_moo.rs` with `fn main() { println!("moo!") }`
  2. Run `cargo xtasks --update` or such
  3. Expect `say_moo` to be added to `[[bin]]` in Cargo.toml (or moved to `src/bin/say_moo.rs`) and a `say_moo` alias added

Where steps 2 and 3 would not exist if it becomes a Cargo built-in solution. This maps to my solution with shell files, where I have many shell files, not one with a case statement.

matklad (Owner) commented Oct 17, 2019

> Supporting --help seems strange and unnecessary to me. Tasks I use don't take arguments, so why would they have help? make foo doesn't support help.

I think there should be a way for the new contributor to the project to figure out the list of available tasks. cargo xtask, cargo xtask --help (or just about any other invocation that doesn't mention existing task) seem like a good UI for showing the list of tasks.

I'll add "multiple binaries" as an alternative, but I personally would still prefer a single binary:

  • less linking for Cargo
  • common boilerplate like error handling and argument parsing needs to be done only once
  • adding a task should update only one file
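
For illustration, a minimal sketch of the single-binary pattern described above (the task names and help fallback here are hypothetical, not from the repo):

```rust
// xtask/src/main.rs — one binary, one match over task names.
use std::env;

type DynError = Box<dyn std::error::Error>;

fn main() -> Result<(), DynError> {
    match env::args().nth(1).as_deref() {
        Some("dist") => dist(),
        Some("codegen") => codegen(),
        _ => {
            // Doubles as `cargo xtask --help`: any unknown invocation lists the tasks.
            eprintln!("available tasks: dist, codegen");
            Ok(())
        }
    }
}

fn dist() -> Result<(), DynError> {
    // Build release artifacts and stage them, e.g. under target/dist.
    Ok(())
}

fn codegen() -> Result<(), DynError> {
    // Regenerate checked-in source files.
    Ok(())
}
```

Adding a task is then one new match arm and one new fn in this file.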

kornelski (Collaborator) commented Oct 17, 2019

But cargo task server-start may just run Command for one or two executables, no linking needed. OTOH cargo task make-emoji-pack may require serde, network libraries, unicode libraries, font rendering, image processing libraries, compression libraries, and take minutes to build.

To me separate tasks mean each task starts as fast as it can. Merged tasks mean every task is at least as slow to start as the slowest task.

matklad (Owner) commented Oct 17, 2019

If binaries are part of the same Cargo package, you'd still need to build all deps for any binary, unfortunately :-(

kornelski (Collaborator) commented

Oh, it does indeed. That's a bummer :(

kornelski (Collaborator) commented Oct 17, 2019

However, I still wouldn't want to be in the business of having a multi-purpose bin. The tasks I have are sometimes wildly different.

For example, for lib.rs I have the following binaries:

  • reindexing all crates
  • reindexing user data. Even though that's still reindexing, it could reuse maybe 10% of the previous script, because it uses a different database and does parallelism differently.
  • interactive tool for inspecting github repos.
  • building crates in docker and collecting results in a database
  • exporting collected build results into a compressed summary
  • building a filtered crates.io index git from a compressed summary
  • building of the whole website as static pages
  • building of an individual page. Takes 1 arg. No help. I copy'n'pasted the previous script and deleted irrelevant code, instead of making it configurable.
  • export of rss feed and sitemap
  • conversion of YAML syntaxes to a compressed bundle for syntect lib
  • conversion of JSON emoji definitions into a perfect hashmap

I would not want to have all of them in one file with a giant match statement. I'd split them off into separate modules, but managing that doesn't seem better than wrapping each in its own fn main().

I shall also add that I treat them as internal tools (things that would be in bash if bash was faster and could call Rust code better ;), so their code is not very pretty or DRY: they're either write-once tools, or just a thin layer of glue over library functions that have better code and tests elsewhere. They generally don't take args, or take just one.

tv42 commented May 22, 2023

You could avoid per-task entries in .cargo/config by having one binary dispatch to the others, based on the first argument:

```
cargo xtask foo --flags-for-foo
  -> cargo run --package xtask -- foo --flags-for-foo
  -> xtask foo --flags-for-foo
  -> cargo run --package xtask --bin foo -- --flags-for-foo
  -> foo --flags-for-foo
```
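
For illustration, a sketch of such a dispatcher, assuming each task lives in its own `[[bin]]` target of the xtask package and a single alias like `xtask = "run --package xtask --"` in .cargo/config (the error handling and the `CARGO` fallback are assumptions, not from this thread):

```rust
// xtask/src/main.rs — forwards `cargo xtask <task> [args…]` to `cargo run --bin <task>`.
use std::env;
use std::process::{exit, Command};

fn main() {
    let mut args = env::args().skip(1);
    let Some(task) = args.next() else {
        eprintln!("usage: cargo xtask <task> [args…]");
        exit(2);
    };
    // Re-invoke cargo so only the requested task binary gets built and run;
    // fall back to plain `cargo` if the CARGO env var isn't set.
    let cargo = env::var("CARGO").unwrap_or_else(|_| "cargo".into());
    let status = Command::new(cargo)
        .args(["run", "--package", "xtask", "--bin", task.as_str(), "--"])
        .args(args)
        .status()
        .expect("failed to spawn cargo");
    exit(status.code().unwrap_or(1));
}
```

Note that, per matklad's point above, the task binaries would need to live in separate packages (or gate their heavy dependencies behind optional features plus required-features on each [[bin]]) for this to actually avoid building unused dependencies.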

tommy-gilligan commented Aug 24, 2023

I ended up bundling all my tasks into a single binary and found this to be kind of cumbersome for my project. I'm going to try using the xtask binary as merely a dispatcher for my project and see how I go with that, but I'm also wondering if a consensus here has evolved much?

Imberflur commented

I saw another approach here of putting a less-used task with more dependencies behind a feature, and then having the xtask rebuild itself with the required features if necessary: https://github.com/gfx-rs/wgpu/blob/55b73a9fb8421135a55f5f5f44a7deca505715cb/xtask/src/main.rs#L18

Might be useful for certain cases.

jcbhmr commented Apr 20, 2024

I've used the rebuild-with-feature-flags strategy myself. 👍 It works particularly well when you really want to keep everything in a single file, such as when using the new cargo script feature.

Example of a cargo script with a self-rerun feature flag:

````rust
#!/usr/bin/env -S cargo +nightly -Zscript
```cargo
[features]
generate = ["quick-xml", "wasmtime"]
[dependencies]
quick-xml = { version = "0.3.1", optional = true }
wasmtime = { version = "19.0.1", optional = true }
```

#[cfg(feature = "generate")]
fn generate() -> Result<(), Box<dyn std::error::Error>> {
    use quick_xml::*;
    use wasmtime::*;
    // Do something with quick_xml and wasmtime...
    Ok(())
}

fn build_release() -> Result<(), Box<dyn std::error::Error>> {
    // Do the quick & easy build copy stuff.
    Ok(())
}

fn test_e2e() -> Result<(), Box<dyn std::error::Error>> {
    // Another easy one that requires only quick deps.
    Ok(())
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    match std::env::args().nth(1).ok_or("no task")?.as_str() {
        "generate" => {
            // With the feature enabled, run the task directly.
            #[cfg(feature = "generate")]
            return generate();
            // Without it, re-run this script with `--features generate`.
            #[cfg(not(feature = "generate"))]
            return std::process::Command::new("cargo")
                .args(["+nightly", "-Zscript", "run", "--manifest-path", file!()])
                .args(["--features", "generate", "--", "generate"])
                .status()?
                .success()
                .then_some(())
                .ok_or("cmd failed".into());
        }
        "build-release" => build_release(),
        "test-e2e" => test_e2e(),
        _ => Err("no such task".into()),
    }
}
````
