
Sanity check profiler atomics #113448

Closed

Conversation

SeanMollet

This fixes #112313

Probably the ideal way to fix it would be to add support for 64-bit atomic operations to LLVM for every platform. I don't have time to do that, and I don't think it's really necessary anyway. Atomics improve the accuracy of profiling, but that's not really needed for checking code coverage (which is the primary use of it here). Non-zero values are sufficient to know that something has been called.

@rustbot
Collaborator

rustbot commented Jul 7, 2023

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @cuviper (or someone else) soon.

Please see the contribution instructions for more information. Namely, to keep review lag to a minimum, PR authors and assigned reviewers should ensure that the review labels (S-waiting-on-review and S-waiting-on-author) stay updated, invoking these commands when appropriate:

  • @rustbot author: the review is finished, PR author should check the comments and take action accordingly
  • @rustbot review: the author is ready for a review, this PR will be queued again in the reviewer's queue

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jul 7, 2023
@rustbot
Collaborator

rustbot commented Jul 7, 2023

This PR changes how LLVM is built. Consider updating src/bootstrap/download-ci-llvm-stamp.

@SeanMollet
Author

I've built a toolchain with this change for x86_64, aarch64 and mipsel (the one I need, which doesn't support 64-bit atomics) and tested the toolchain's outputs. They all work as intended.

@Urgau
Member

Urgau commented Jul 7, 2023

Did you try linking the resulting binary against libatomic (which should provide the necessary symbols)?

If not, you can try adding println!("cargo:rustc-link-lib=atomic"); to your build.rs and see if it works.

If it does, then it might be better to link against libatomic than to provide a sub-par experience.
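
For reference, a minimal build.rs sketch of that suggestion (assuming a system libatomic is installed and visible to the linker):

// build.rs: ask Cargo to link the final binary against the system
// libatomic, which is where the __atomic_*/__sync_* helper symbols
// would normally come from.
fn main() {
    println!("cargo:rustc-link-lib=atomic");
}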

@SeanMollet
Author

@Urgau I tried it for the sake of completeness. Same error.

My take is: the atomics aren't function calls, they're LLVM intrinsics. They should never leave the compiler backend, but since the backend doesn't know how to lower that operation for this target, it assumes it should emit it as a function call instead. That call ends up in the output and we get our error.

FYI, __sync_fetch_and_add_8 isn't in GNU libatomic.

// So, don't do it on 32 bit platforms
if (TargetTriple.isArch64Bit()) {
    Options.Atomic = true;
}
Member

It does work on i686-unknown-linux-gnu, at least, so this must be more nuanced than just being 64-bit.

Author

Yes and no. 32-bit architectures can't do them in a single operation because their registers aren't large enough for the 64-bit value used by the counter.

Some 32-bit targets (x86, and Arm sub-targets that support sync) have library implementations of the operation.

But those library implementations are slow, which means a significant slowdown and a change of timing for the program.

So it's a trade-off: either better accuracy in the counts, at the cost of a slow program whose timing is influenced by the profiler, or a program that runs more like normal and works on all 32-bit platforms, at the cost of potential undercounts in the profiler. It can't undercount to 0, though.

The primary use here, code coverage measurement, is unaffected by the potential inaccuracy.

Member

Are we introducing UB if we instrument a data race with non-atomic updates?

That's not a rhetorical question - I really don't know. Maybe it's done low enough in the stack that such formal UB doesn't exist, but I think we should be very sure about it.

Author

UB = undefined behavior?

The variables being incremented are distinct per function, so the race would be between threads calling the same function. The worst case is that they both read and increment, then the thread changes and they end up overwriting each other's increment.

Net result is an undercount by 1 anytime there is a collision.

LLVM's profiler ran this way exclusively for more than a decade. The option for atomics was added, but it still defaults to off.

We could also check for x86 and arm and turn it on for them. I don’t really have a stake either way.

My personal opinion is that I'd rather trade a little accuracy in the counts for more normal performance. It makes analysis of races and other timing-critical things more useful.
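
For illustration only, a standalone Rust sketch of the lost-update behavior described above (this is not the instrumentation itself):

use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

// Each increment below is a separate load and store, so two threads hitting
// the counter at the same time can overwrite each other's update. The final
// value can undercount, but it can never stay at 0.
static COUNTER: AtomicU64 = AtomicU64::new(0);

fn main() {
    let handles: Vec<_> = (0..4)
        .map(|_| {
            thread::spawn(|| {
                for _ in 0..100_000 {
                    // Split read-modify-write, like a non-atomic counter bump.
                    let v = COUNTER.load(Ordering::Relaxed);
                    COUNTER.store(v + 1, Ordering::Relaxed);
                    // COUNTER.fetch_add(1, Ordering::Relaxed) would never lose one.
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    // Usually prints something below 400000, yet always well above 0.
    println!("count = {}", COUNTER.load(Ordering::Relaxed));
}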

Member

We have target information about this support at the Rust level, cfg(target_has_atomic="64") and max_atomic_width(), so maybe we can just pass that in as yet-another parameter? Either making that bool InstrumentCoverage a tri-state flag, or adding another bool for whether to use atomics.

(LLVMRustOptimize is getting so many parameters that we might want a new struct...)
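
As a rough standalone sketch of that tri-state shape (the names here are made up for illustration, not rustc's actual types):

// Instead of a plain bool, the coverage option would also say whether the
// counter increments should be atomic.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum InstrumentCoverage {
    Off,
    NonAtomicCounters,
    AtomicCounters,
}

// The decision would be driven by the target's maximum atomic width,
// i.e. roughly the cfg(target_has_atomic = "64") / max_atomic_width() info.
fn coverage_mode(enabled: bool, max_atomic_width: u64) -> InstrumentCoverage {
    if !enabled {
        InstrumentCoverage::Off
    } else if max_atomic_width >= 64 {
        InstrumentCoverage::AtomicCounters
    } else {
        InstrumentCoverage::NonAtomicCounters
    }
}

fn main() {
    // e.g. a mipsel target with only 32-bit atomics falls back to plain counters.
    assert_eq!(coverage_mode(true, 32), InstrumentCoverage::NonAtomicCounters);
    assert_eq!(coverage_mode(true, 64), InstrumentCoverage::AtomicCounters);
}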

Author

We have target information about this support at the Rust level, cfg(target_has_atomic="64") and max_atomic_width(), so maybe we can just pass that in as yet-another parameter? Either making that bool InstrumentCoverage a tri-state flag, or adding another bool for whether to use atomics.

(LLVMRustOptimize is getting so many parameters that we might want a new struct...)

IMHO, this is absolutely the right answer. I'm not sure that I'm well versed enough in all of this to do that, but I'd be willing to give it a shot if I could impose on somebody for a "block diagram" of where it would need to be wired in.

Member

The definition of LLVMRustOptimize here needs to be changed, either switching the InstrumentCoverage parameter over to the new type introduced or adding a new parameter.

extern "C" LLVMRustResult
LLVMRustOptimize(
LLVMModuleRef ModuleRef,
LLVMTargetMachineRef TMRef,
LLVMRustPassBuilderOptLevel OptLevelRust,
LLVMRustOptStage OptStage,
bool NoPrepopulatePasses, bool VerifyIR, bool UseThinLTOBuffers,
bool MergeFunctions, bool UnrollLoops, bool SLPVectorize, bool LoopVectorize,
bool DisableSimplifyLibCalls, bool EmitLifetimeMarkers,
LLVMRustSanitizerOptions *SanitizerOptions,
const char *PGOGenPath, const char *PGOUsePath,
bool InstrumentCoverage, const char *InstrProfileOutput,

Then the binding in cg_llvm to that function needs to be changed to match:

pub fn LLVMRustOptimize<'a>(

And the call site of LLVMRustOptimize needs to be altered appropriately:

let result = llvm::LLVMRustOptimize(

You may have trouble getting the "does our target have atomics?" information at this point; you may need to work back a bit, possibly by making sure the information is accessible via the "god-object" CodegenContext:

pub(crate) unsafe fn llvm_optimize(
cgcx: &CodegenContext<LlvmCodegenBackend>,

Which is documented here:
https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/back/write/struct.CodegenContext.html

And parameterized by this:
https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_llvm/struct.LlvmCodegenBackend.html
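
Purely as an illustrative sketch of that plumbing, with stand-in types (none of these names are rustc's real ones): the idea is that the target's atomic capability gets recorded once on the codegen context and is then forwarded as one more argument at the LLVMRustOptimize call site.

// Stand-in for CodegenContext; the real one would carry a flag derived
// from the target spec, e.g. something like max_atomic_width() >= 64.
struct StubCodegenContext {
    target_has_64bit_atomics: bool,
    instrument_coverage: bool,
}

// Stand-in for the unsafe llvm::LLVMRustOptimize FFI call.
fn llvm_rust_optimize_stub(instrument_coverage: bool, atomic_counters: bool) {
    println!("InstrumentCoverage = {instrument_coverage}, atomic counters = {atomic_counters}");
}

fn llvm_optimize(cgcx: &StubCodegenContext) {
    llvm_rust_optimize_stub(
        cgcx.instrument_coverage,
        cgcx.instrument_coverage && cgcx.target_has_64bit_atomics,
    );
}

fn main() {
    // A target with only 32-bit atomics keeps coverage but drops atomic counters.
    let cgcx = StubCodegenContext {
        target_has_64bit_atomics: false,
        instrument_coverage: true,
    };
    llvm_optimize(&cgcx);
}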

Author

@workingjubilee Thanks!

@Zalathar
Contributor

Zalathar commented Jul 8, 2023

I would highly recommend squashing all of these changes down into one simple commit, so that the intermediate commits don't end up obscuring the history of unrelated lines in the file.

@SeanMollet
Author

@Zalathar Fair. I assumed you guys squashed on PR merge. I assume from your response that you don't, so I've gone ahead and done it.

@Mark-Simulacrum
Member

Atomics improve accuracy of profiling, but that's not really needed for checking code coverage (which is the primary use of it here).

Isn't this code path also used for the PGO data collection? Or does that use some other coverage instrumentation? If so, that is a case where having accurate counts may be quite important. I don't think we do PGO on non-64-bit targets today in our CI anywhere, so it's probably not easy to check against our existing benchmarks...

@cuviper
Member

cuviper commented Jul 10, 2023

Issue #91092 also cared about accurate counts, which is why we made it atomic in the first place.

@workingjubilee
Member

__sync_fetch_and_add_8 isn't in GNU libatomic.

I am pretty sure we do not care about libatomic at all.

@SeanMollet
Author

__sync_fetch_and_add_8 isn't in GNU libatomic.

I am pretty sure we do not care about libatomic at all.

I think you're right. I was responding to the suggestion of linking with it as a resolution.

@workingjubilee
Member

workingjubilee commented Jul 11, 2023

Ah, true. I didn't make the connection.

Yes and no. 32-bit architectures can't do them in a single operation because their registers aren't large enough for the 64-bit value used by the counter.

Registers are a convenient illusion provided by the CPU, and CMPXCHG8B dates back to the Pentium, but I mean obviously if we could tell LLVM to use AtomicUsize if supported we would, I guess?

@cuviper
Member

cuviper commented Jul 21, 2023

@rustbot author

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 21, 2023
@Dylan-DPC
Member

@SeanMollet any updates?

@SeanMollet
Author

If you saw my last comment before I deleted it, you'll see how out of touch I am right now: buried trying to get the product out the door. We have a reasonable proposed work-around solution that I'd forgotten about.

My work-around works well enough to solve my immediate problem. I'll come back to this after shipping and implement the more thorough work-around.

@JohnCSimon
Member

@SeanMollet

Ping from triage: I'm closing this due to inactivity. Please reopen when you are ready to continue with this.
Note: if you are going to continue, please reopen the PR BEFORE you push to it, or else you won't be able to reopen it - this is a quirk of GitHub.
Thanks for your contribution.

@rustbot label: +S-inactive

@rustbot rustbot added the S-inactive Status: Inactive and waiting on the author. This is often applied to closed PRs. label Nov 12, 2023
@JohnCSimon JohnCSimon closed this Nov 12, 2023
@Dylan-DPC Dylan-DPC removed the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Nov 12, 2023