Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimise rust-based formulas by setting target-cpu #15530

Closed
1 task done
Tokarak opened this issue Jun 6, 2023 · 14 comments · Fixed by #15544
Closed
1 task done

Optimise rust-based formulas by setting target-cpu #15530

Tokarak opened this issue Jun 6, 2023 · 14 comments · Fixed by #15544
Assignees
Labels
features New features outdated PR was locked due to age

Comments

@Tokarak
Copy link
Contributor

Tokarak commented Jun 6, 2023

Verification

Provide a detailed description of the proposed feature

Rust has optional support for building with certain cpu features enabled. For example, arm-based macs, I would imagine, use only apple-m1 or apple-m2 cpus. rustc --print cfg | grep target_feature | wc -l gives me 28 features, while the generic target only has neon.

Specific features can also be specified — this can be useful if it's feasible to take the common features of a set of cpus.

If a better union set of cpu features can be found than generic, that can be an improvement to the ecosystem.

What is the motivation for the feature?

This can increase peformance (sometimes marginally, other times significantly) for binaries built in Rust.

How will the feature be relevant to at least 90% of Homebrew users?

Mac users on official hardware should use only a small set of cpus. This means there is a high potential for improvement.

I'm worried about alienating the small number of hackingtosh and opencore-patcher users. Perphaps if the OS requires specific features, then those can be used. Otherwise, if a Hackingtosh can be built on any hardware, then it would seem that the only way to avoid alienation is to stick to the generic targets, or provide multiple builds (though I'm not looking to inflate the CI minutes).

This is particularly true for silicon based macs. AFAIK, there are no arm-based hackingtoshes, and there are only two, very similar cpu types. This could mean that it's completely safe to add these features cargo_args on silicon builds and distributions.

What alternatives to the feature have been considered?

  • Builds for each official cpu target, with generic target as backup
  • Lowest-common-denominator builds for official hardware. Generic backup may or may not be needed.
  • Add this only where it is convenient (eg for arm-based targets)
  • Add this to local build only (very achievable, but building wastes more power than unspecialised binaries ever will — alone, this is almost redundant)
  • Don't fix what ain't broken
@Tokarak Tokarak added the features New features label Jun 6, 2023
@carlocab
Copy link
Member

carlocab commented Jun 6, 2023

For C/C++-based formulae, we set -march to the oldest supported architecture for a given OS version on Intel macOS. Currently that's nehalem, which is pretty old, but still an improvement over what you get when you omit -march (penryn). We don't do this on M1 macOS since Clang will default to targeting apple-m1 without any -march/-mcpu flags.

It seems reasonable to do something similar for Rust. How do you set the desired target-cpu with rustc?

@Bo98
Copy link
Member

Bo98 commented Jun 6, 2023

We don't do this on M1 macOS since Clang will default to targeting apple-m1 without any -march/-mcpu flags.

We will probably add flags at some point for the benefit of GCC. The flags are however different on GCC and Clang so it's trickier than Intel.

But yeah for Rust this seems like a good idea.

@alex
Copy link
Contributor

alex commented Jun 6, 2023

FWIW starting with Rust 1.71 (coming out in 5.5 weeks) rust will default to targeting an M1 in macOS arm64 targets: rust-lang/rust#109899

@Bo98
Copy link
Member

Bo98 commented Jun 6, 2023

Ah excellent - we probably don't need to do anything here for arm64 then.

Not sure if there's anything worth doing for x86_64 or not.

@Tokarak
Copy link
Contributor Author

Tokarak commented Jun 7, 2023

Currently that's nehalem

This simplifies deciding which cpu to target.

FWIW starting with Rust 1.71 (coming out in 5.5 weeks) rust will default to targeting an M1 in macOS arm64 targets: rust-lang/rust#109899

That's nice to know, thanks.

nehalem is in the list of cpus supported by LLVM; I checked. The following flag needs to be added to cargo (x64 builds only):
-Ctarget-cpu=nehalem
Equivalently (but redundantly since Rust 1.71) for aarch64:
-Ctarget-cpu=apple-m1
To optimise local builds, -Ctarget-cpu=native can be passed. I don't know if homebrew differentiates between local and distribution builds. Also, for native AVX2 might need to be disabled by default; I saw many issues online saying it could make the binary slower.

If the flag can be conditionally added to std_cargo_args, that would mean no refactoring of formulas. I don't know much detail about the homebrew codebase, so maybe there is another way.

@carlocab
Copy link
Member

carlocab commented Jun 7, 2023

I don't think passing a -C flag to cargo is correct. From cargo-install(1):

       -C PATH
           Changes the current working directory before executing any specified operations. This affects things like where cargo looks
           by default for the project manifest (Cargo.toml), as well as the directories searched for discovering .cargo/config.toml,
           for example.

Perhaps you need for this to be set in RUSTFLAGS?

@Tokarak
Copy link
Contributor Author

Tokarak commented Jun 7, 2023

Yes, you are right. -C passes a codegen flag to rustc.

For cargo, this option can be set in rustflags.

@MikeMcQuaid
Copy link
Member

I don't know if homebrew differentiates between local and distribution builds.

We used to but intentionally don't any more to reduce the difference (and we don't support local builds anyway).

As a result,

-Ctarget-cpu=nehalem

sounds perfect.

@Tokarak
Copy link
Contributor Author

Tokarak commented Jun 7, 2023

RUSTFLAGS='-C target-cpu=nehalem' cargo build ...

@Tokarak
Copy link
Contributor Author

Tokarak commented Jun 12, 2023

I think I can self-assign this.

I could add a config flag to std_cargo_args based on the target triple (thus sorting out conditionals on cpu arch). A cargo config file will be more ergonomic at this point, but that can be left for a future PR.

If anyone has general advice or warnings of any pitfalls, please let me know.

@carlocab
Copy link
Member

Off the top of my head, one important thing to note is that we'll want to use the same value returned from here instead of actually hard-coding nehalem or apple-m1:

# @private
sig { returns(Symbol) }
def effective_arch
if @build_bottle && @bottle_arch
@bottle_arch.to_sym
else
Hardware.oldest_cpu
end
end

@MikeMcQuaid
Copy link
Member

Sounds good, looking forward to the PR @Tokarak!

@Tokarak
Copy link
Contributor Author

Tokarak commented Jun 13, 2023

I looked at oldest_cpus, and it will be a nightmare to compatibilise with rust.
Current rust nightly behaviour defaults to oldest supported cpus

  • Defaults to M1 on aarch64-apple-darwin
  • Core2 on x86_64-apple-darwin
    • This is the current homebrew behaviour for Version < 10.14; it's easier not to change this in that case.

Given that Rust already minimaxs the cpu without version context, it may be best to create a function which returns rust flags with version context, similar to

# @private
sig { returns(Symbol) }
def effective_arch
if @build_bottle && @bottle_arch
@bottle_arch.to_sym
else
Hardware.oldest_cpu
end
end

but without oldest_cpu which has a lot of complex logic which is incompatible with rustflags and is already handled by cargo. This allows to use useful default rust cpus (just return an empty string to not change rust flags!) Maintaining 5 lines of code shouldn't be an issue either, even though oldest_cpu and this function will have to be updated in parallel.

@MikeMcQuaid
Copy link
Member

Closing to keep discussion in #15544.

Given that Rust already minimaxs the cpu without version context, it may be best to create a function which returns rust flags with version context, similar to

Seems reasonable 👍🏻

@github-actions github-actions bot added the outdated PR was locked due to age label Jul 15, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 15, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
features New features outdated PR was locked due to age
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants