
Support Profiling #1106

Open
Tracked by #1611
marandaneto opened this issue Nov 5, 2022 · 34 comments

Comments

@marandaneto
Contributor

marandaneto commented Nov 5, 2022

Description

Similar to https://docs.sentry.io/platforms/android/profiling/ but for Dart code

Related issues: dart-lang/sdk#3686, dart-lang/sdk#37664, dart-lang/sdk#50055 (Dart SDK) and flutter/flutter#37204

@marandaneto
Contributor Author

Right now that would be possible if you init the SDK manually.

Enable Performance and Profiling directly on the Native SDKs, for example, Android.
docs.sentry.io/platforms/android/performance
docs.sentry.io/platforms/android/profiling

The same steps apply to iOS. This would work for Android and iOS native code only, not for the Dart bits or C/C++ code.

@bruno-garcia
Member

Hey @marandaneto, I was talking to @vaind and he might take a look at this to see how hard it is and what options we have.

@kahest
Member

kahest commented Jul 7, 2023

Hey @bruno-garcia @vaind are there any updates on this?

@vaind
Collaborator

vaind commented Jul 7, 2023

@kahest no updates yet, just started looking into this recently

@vaind
Collaborator

vaind commented Jul 8, 2023

This article by a Dart SDK developer gives an introduction to how profiling is implemented in the Dart VM and exposed via DevTools. TL;DR:

  • there's native code in the dart VM responsible for sampling
  • this code is exposed via a VM service (websocket)
  • this code is not compiled in a release build (a.k.a. "product"), see for example #ifndef PRODUCT in profiler.cc
  • even if it was available in a release build, the VM service opens a port which would be accessible from any other app on the same device, so it would likely not be feasible to use this in production apps anyway.

Therefore, this looks like a dead end.

I'll update here if I can find an alternative solution, e.g. isolate stacks sampling from dart directly.

@marandaneto
Contributor Author

@vaind what if we propose making this available in release builds behind a build-time opt-in flag, with the port closed in that case?
Could we reuse most of the profiler implementation if we get buy-in from the Dart team (raising an issue and so on)?

@vaind
Collaborator

vaind commented Jul 8, 2023

@marandaneto I've considered that too, but I'm not sure it's feasible because the VM service port is exposed to every app on the device. It doesn't really matter whether it's an opt-in at build time; you wouldn't want to distribute such an app, especially on mobile devices.

@marandaneto
Contributor Author

@vaind that was my point: can we change the approach to the port, i.e. find another way to consume the service without opening a port, or via dart-lang/sdk#37664?

@krystofwoldrich
Member

To send profiles, the SDK will need to update the way it sends/enriches events on Android.

The Outbox sender opens the saved envelope and sends its items as individual envelopes, but profiles have to be ingested together with their transaction (Tx) in one envelope.
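For illustration, a minimal C sketch of writing a transaction and its profile as two items of a single envelope, using the newline-delimited envelope layout (an envelope header line, then a header line and a payload for each item). The helper name `build_tx_profile_envelope` and the placeholder JSON payloads are hypothetical; this is not the sentry-java Outbox code, and a real SDK would add further envelope headers and a full profile payload.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes ONE envelope carrying both the transaction and
   its profile, so the two items reach ingestion together instead of being
   split into separate envelopes. Returns the number of bytes written, or -1
   on an encoding error or if `cap` is too small. */
static int build_tx_profile_envelope(char *buf, size_t cap,
                                     const char *event_id,
                                     const char *tx_json,
                                     const char *profile_json) {
    assert(buf != NULL && event_id != NULL);
    assert(tx_json != NULL && profile_json != NULL);
    int n = snprintf(buf, cap,
                     /* envelope header line */
                     "{\"event_id\":\"%s\"}\n"
                     /* item 1: header line + transaction payload */
                     "{\"type\":\"transaction\",\"length\":%zu}\n%s\n"
                     /* item 2: header line + profile payload */
                     "{\"type\":\"profile\",\"length\":%zu}\n%s\n",
                     event_id,
                     strlen(tx_json), tx_json,
                     strlen(profile_json), profile_json);
    if (n < 0 || (size_t)n >= cap) {
        return -1; /* error or truncated output */
    }
    return n;
}
```

Keeping both items in the same envelope is the property the Outbox sender would need to preserve; how the Android SDK would actually restructure this is not shown here.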

@vaind
Collaborator

vaind commented Jul 11, 2023

So apparently, native profilers should work with AOT-compiled Dart. I'm going to see if I can make it work with our existing native SDK profilers. See this thread on Discord:

mraleph
Anything that works for native code will work just the same for Dart, so if you have some sampling profiler for C++ / Objective-C / Swift then you can just use that.
AOT compiled binaries are just normal native binaries (at least on Linux, Android and Mac OS X / iOS - Windows is an exception) which just need some runtime support to run.
Our calling conventions are fairly traditional (frames are linked through the frame pointer) and we generate eh_frame / debug_frame, so non-FP-based unwinders should also be able to unwind the stack.
This means native tools like perf and Instruments work just fine with AOT compiled Dart code and I also know that https://gperftools.github.io/gperftools/cpuprofile.html (which is a very simple profiler which simply unwinds stack using frame-pointer chaining) works as well.
(There is one minor catch which trips over simpleperf on Android ARM64 - but I don't think it matters much if you just write a manual unwinder which follows FP chain)
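A minimal C sketch of the frame-pointer chain walk described above. It assumes frame pointers are kept and the usual x86-64/AArch64 frame record layout (the caller's saved FP followed by the return address). A real sampling profiler would take the starting FP from the interrupted thread's signal context and check every pointer against the thread's stack bounds before dereferencing it; neither is shown here. The names `frame_record` and `unwind_fp_chain` are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* With frame pointers enabled, x86-64 and AArch64 both keep a two-word
   "frame record" at the address held in the frame pointer register:
   the caller's saved frame pointer, then the return address. */
struct frame_record {
    const struct frame_record *prev; /* caller's saved frame pointer */
    uintptr_t return_address;        /* where execution resumes in the caller */
};

/* Follows the frame-pointer chain starting at `fp`, storing up to `max`
   return addresses (innermost first). The stack grows downwards, so each
   caller's record must sit at a higher address; stop otherwise to avoid
   looping forever on a corrupt chain. */
static size_t unwind_fp_chain(const struct frame_record *fp,
                              uintptr_t *out, size_t max) {
    assert(out != NULL || max == 0);
    size_t n = 0;
    while (fp != NULL && n < max) {
        out[n++] = fp->return_address;
        const struct frame_record *next = fp->prev;
        if (next != NULL && (uintptr_t)next <= (uintptr_t)fp) {
            break; /* chain does not move up the stack: give up */
        }
        fp = next;
    }
    return n;
}
```

Each collected return address would then be symbolicated like any other native frame, which is what makes this approach applicable to AOT-compiled Dart code as well.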

@marandaneto
Contributor Author

@vaind the Android profiler currently only profiles Java/Kotlin code, not native (C/C++) code; maybe the iOS one works, though.

@vaind
Collaborator

vaind commented Jul 20, 2023

Update: the sentry-cocoa profiler seems to work, somewhat. In a Flutter app on macOS, I started a transaction in Swift, then ran a heavy operation in Dart and stopped the Swift transaction afterwards. The profile is captured and, after symbolication, it shows function names, although line numbers are not consistently available... See sample profile.

On the other hand, the CPU profiler is going to show the work that is actually being executed, so in the case of async/await it may be more complicated to see what is actually going on. I'll have to devise a better test app to evaluate that.

@vaind
Collaborator

vaind commented Jul 20, 2023

After some testing, it seems like the native profiler could be the way to go, at least for iOS and macOS.
However, I'm having issues with the symbolication: all the Flutter symbols (e.g. referencing /private/var/containers/Bundle/Application/2D6824B6-DC93-4F60-AFC2-ADE201585EC6/Runner.app/Frameworks/Flutter.framework/Flutter) in this profile are unresolved (symbol not found). I don't know, was there some custom handling in the symbolicator for Flutter-specific symbols? Maybe that doesn't get triggered when the transaction comes from Swift... Also, I've tested triggering an error in Swift and the image seems resolved, but the frames say redacted. Do you know what that is about @marandaneto ?

@marandaneto
Contributor Author

@vaind not aware of any changes/bugs.
There were changes for Flutter specifically, mostly around source maps IIRC.
I recall that as well https://github.com/getsentry/symbolic/blob/11472bfbb31f2ed76802ff50bfc40a2b0852ee1b/symbolic-debuginfo/src/dwarf.rs#L519-L521 but not sure if there's any impact.
Are the redacted frames inApp, or are they from system apps/3rd-party libs?

@vaind
Collaborator

vaind commented Jul 21, 2023

OK, so this would definitely need more attention to get working properly. I'm not sure investigating this deeply makes sense just yet, with the other platforms still unresolved. I think we should first make sure all other desired platforms can be supported before doing the detailed work on iOS. WDYT @marandaneto ?

Also, I understand the goal would be to support all platforms supported by Flutter. However, if we go the route of native profiling, that means the platforms would be evaluated & implemented one by one. Would that be acceptable? If so, what are the priorities for platform support and is there a hard stop if some specific platform cannot be supported?

@marandaneto
Contributor Author

marandaneto commented Jul 21, 2023

@vaind makes sense, I'd focus on iOS and Android first, most likely starting from iOS since the iOS profiler should work (as you stated with a few gotchas).
Next is Android, although we'd need a different solution, probably something built into https://github.com/getsentry/sentry-native?
Maybe @stefanosiano and/or @indragiek can chime in here; maybe they know of or have investigated C/C++ profilers for Android, instead of the current Java/Kotlin-only approach.

I know this: https://developer.android.com/topic/performance/tracing/custom-events-native

I wonder whether the Android native profiler would also work for Windows and Linux, but that's definitely a stretch since sentry-native isn't built into Sentry Flutter yet anyway.

@vaind
Collaborator

vaind commented Jul 21, 2023

Good, my idea was to verify the feasibility of native profiling on Flutter with the Android SDK (as you mentioned, most likely via sentry-native). A PoC would be enough IMO, and then we can go on and finish iOS before fully implementing Android.

@vaind
Collaborator

vaind commented Jul 25, 2023

Some notes on Android profiling:

  • simpleperf apparently has issues profiling Dart
  • gperftools seems usable although the build system is a bit outdated and it's not clear whether Android is actually supported
  • pprof-rs also looks like an option, but AFAIK internal testing in the Rust SDK has shown inconsistent sampling ratios

@marandaneto
Contributor Author

@vaind your best bet to find out which native profilers work well on Android for native code/NDK (at runtime, at low frequency, in release mode) is to ask on the Android United Slack community; there's a #ndk channel and some Googlers are there, including @DanAlbert, who is one of the lead contributors to https://github.com/android/ndk

If we can't use simpleperf directly, they might know some other options.

@marandaneto
Contributor Author

marandaneto commented Jul 25, 2023

https://android.googlesource.com/platform/system/extras/+/refs/heads/main/simpleperf/doc/android_application_profiling.md
Apparently simpleperf can only profile apps in debug and profile mode; there's a workaround, but it still depends on adb.

If you want to profile a release build of an application:
For the release build type, Android Studio sets android:debuggable="false" in AndroidManifest.xml, disables JNI checks and optimizes C/C++ code. However, security restrictions mean that only apps with android:debuggable set to true can be profiled. So simpleperf can only profile a release build under these three circumstances:

If you are on a rooted device, you can profile any app.

If you are on Android >= Q, you can add profileableFromShell flag in AndroidManifest.xml, this makes a released app profileable by preinstalled profiling tools. In this case, simpleperf downloaded by adb will invoke simpleperf preinstalled in system image to profile the app.

@vaind did you check the Firefox Profiler? https://profiler.firefox.com/docs/#/./guide-profiling-android-directly-on-device

Edit: apparently it uses simpleperf as well https://searchfox.org/mozilla-central/source/third_party/libwebrtc/tools_webrtc/android/profiling/perf_setup.sh

@DanAlbert

I know nothing about Dart.

@marandaneto
Contributor Author

Flutter apps written in Dart compile to native code, so this isn't really about Dart profilers but rather about Android profilers that can profile native code, not only Java/Kotlin.

@vaind
Collaborator

vaind commented Jul 25, 2023

OK, so at least in errors, the issue of some stack frames not being symbolicated is due to missing dSYMs for Flutter.framework (or FlutterMacOS.framework). They're currently not shipped with Flutter, so the Dart plugin won't upload them to Sentry and thus they can't be used for symbolication, see flutter/flutter#117404 (comment)
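Why a missing dSYM leaves frames unresolved can be sketched like this: a symbolicator maps each PC to the image that contains it, computes the image-relative address, and looks that up in the image's debug symbols. This is a simplified, hypothetical model in C (`struct image`, `symbolicate` and the addresses are made up); it ignores details such as the image's preferred load address, debug IDs and line information.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A symbol from an image's debug file, keyed by image-relative address. */
struct symbol {
    uint64_t start; /* image-relative start address; array sorted ascending */
    const char *name;
};

/* A loaded binary image (app executable, framework, ...). */
struct image {
    const char *path;
    uint64_t load_address; /* where the image was loaded in this process */
    uint64_t size;
    const struct symbol *symbols; /* NULL if no debug file (e.g. dSYM) exists */
    size_t symbol_count;
};

/* Returns the symbol name for `pc`, or NULL if the PC lies outside every
   image or the owning image has no debug symbols available. */
static const char *symbolicate(const struct image *images, size_t image_count,
                               uint64_t pc) {
    assert(images != NULL || image_count == 0);
    for (size_t i = 0; i < image_count; i++) {
        const struct image *img = &images[i];
        if (pc < img->load_address || pc - img->load_address >= img->size) {
            continue; /* PC belongs to a different image */
        }
        uint64_t rel = pc - img->load_address;
        if (img->symbols == NULL) {
            return NULL; /* image found, but nothing to resolve it with */
        }
        /* Pick the last symbol starting at or before `rel`. */
        const char *best = NULL;
        for (size_t s = 0; s < img->symbol_count && img->symbols[s].start <= rel; s++) {
            best = img->symbols[s].name;
        }
        return best;
    }
    return NULL;
}
```

In this model, a PC inside an image without a debug file (such as Flutter.framework without its dSYM) is attributed to the image but never to a symbol, matching the unresolved Flutter frames reported earlier in this thread.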

@marandaneto
Contributor Author

@vaind we can probably make a Flutter symbol server https://docs.sentry.io/platforms/unreal/data-management/debug-files/symbol-servers/
Another option is for the Dart plugin to figure out the correct Flutter version/download link and download/upload them.

@vaind
Collaborator

vaind commented Jul 26, 2023

And a third one, IMO safer for long-term maintenance, would be to update the flutter tool to include the dSYM together with the rest of the build output. The same applies to iOS, macOS and likely Android symbols.

FYI, after downloading the dSYM manually and uploading it to sentry.io as a DIF, the issue stack trace now looks much better.

@marandaneto
Contributor Author

@vaind totally agree but the issue is ~2y old already, not sure if this will ever be addressed.
We can be more proactive and find a solution that won't demand too much work.
Symbol servers work with GCP so maybe it's an easy win.

What we can also do for now is amend the docs and let people know that they can do this manually (via sentry-cli), so at least it's documented as a limitation of our automatic approach (and link to the original GH issue).

@marandaneto
Contributor Author

I've filed a feature request for a new built-in Flutter symbol server, let's see if this is possible, and is less work/less to maintain than the other options.

@vaind
Collaborator

vaind commented Jul 27, 2023

The additional issue with symbolication on iOS I'm having trouble with is:

While symbolication works reasonably well, one function I've devised to produce a lot of load is getting multiple frames with no line numbers. I can't figure out what the issue is and why other frames in the stack trace do have line numbers, even the caller Dart function, which is second in the stack... Maybe @Swatinem could help out here?

I've uploaded the whole build folder with the debug symbols, the envelope with the captured profile and the profile as downloaded (symbolicated) from Sentry.

@marandaneto
Contributor Author

@vaind I will be OOO until the 7th but feel free to ping @Swatinem on Discord
@kahest or @krystofwoldrich can be the bridge as well if needed.

@mraleph

mraleph commented Jul 31, 2023

It might be that the information is simply missing from the DWARF we generate. We emit just enough information to make meaningful stack traces, which means we don't emit any useful DWARF for the places which are not calls. So if you write something like this:

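const N = 1000000; // any large iteration count

int foo() {
  var acc = 0;
  for (var i = 0; i < N; i++) {
    // Do some math without any calls.
    acc = (acc * 31 + i) & 0xFFFFFFFF;
  }
  return acc;
}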

Then the best you can get is that the time is spent in the foo function, but you would not be able to tell where exactly in that function the time is spent.

I suggest just looking at raw PCs that profiler has collected and then looking at the corresponding generated machine code & DWARF to see if this is indeed the case.
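To make the effect concrete, here is a simplified C model of looking up a PC in a decoded DWARF line table: each row covers the addresses up to the next row, and line 0 means no source line is known. The struct, the function name and the table values are hypothetical, chosen to mimic a function whose call sites carry line entries while a call-free loop body is covered by a line-0 row; real line programs also carry file, column and end-of-sequence information, which is omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One row of a decoded DWARF line table: from `address` up to the next
   row's address, code is attributed to `line`. Line 0 = "no source line". */
struct line_row {
    uint64_t address;
    unsigned line;
};

/* Returns the line for `pc`, or 0 if the PC precedes the table or falls
   into a range described with line 0. `rows` must be sorted by address. */
static unsigned line_for_pc(const struct line_row *rows, size_t n, uint64_t pc) {
    assert(rows != NULL || n == 0);
    unsigned line = 0;
    for (size_t i = 0; i < n && rows[i].address <= pc; i++) {
        line = rows[i].line; /* last row starting at or before pc wins */
    }
    return line;
}
```

For a hypothetical table `{0x100, 1}, {0x110, 0}, {0x180, 5}, {0x190, 0}`, a sample at 0x150 (inside the loop) maps to line 0, while one at 0x184 (a call site) maps to line 5, so the profiler can only attribute the loop's time to the function as a whole.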

@vaind
Collaborator

vaind commented Aug 8, 2023

Thanks Slava, I suspect as much too; I just wasn't able to confirm it with my limited knowledge of DWARF. I was hoping @Swatinem could have a look at some point. It's not really a blocker, as it seems to be "just" the leaf function code.

@Swatinem
Member

Swatinem commented Aug 9, 2023

I’m stretched really thin these days, please ping me again next week :-)

@vaind
Collaborator

vaind commented Sep 12, 2023

@Swatinem any chance you could have a look? The latest profile has even less info, probably inlined?

@Swatinem
Member

Was looking at the DWARF, it is indeed reporting line 0, column 0 for the whole block of code that is hit by the profiler.

Looking at the very latest profile / debug file you posted, indeed the DWARF only reports that as a single toplevel function, but it has a ton of line table entries from multiple files as well.

So maybe the DWARF info for inlined functions is not being generated correctly.
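For context on the inlining suspicion: symbolicators typically expand one PC into several frames using DWARF `DW_TAG_inlined_subroutine` entries, whose address ranges nest inside the concrete function. The C sketch below is a simplified, hypothetical model of that expansion (flat records with an explicit depth instead of the real DIE tree, no `DW_AT_ranges`, no call-site lines); it shows that when those records are missing, every sample collapses into the single top-level function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified inlining record: a [low_pc, high_pc) range covered by an
   inlined function, plus its nesting depth inside the concrete function. */
struct inline_range {
    uint64_t low_pc, high_pc;
    const char *name;
    unsigned depth; /* 1 = inlined directly into the top-level function */
};

/* Fills `out` with frame names for `pc`, innermost first, ending with the
   top-level function. Returns the number of frames written. */
static size_t expand_inline_frames(const char *function_name,
                                   const struct inline_range *ranges, size_t n,
                                   uint64_t pc, const char **out, size_t max) {
    assert(out != NULL || max == 0);
    const char *by_depth[16] = {0}; /* innermost match per nesting depth */
    unsigned deepest = 0;
    for (size_t i = 0; i < n; i++) {
        const struct inline_range *r = &ranges[i];
        if (pc >= r->low_pc && pc < r->high_pc && r->depth < 16) {
            by_depth[r->depth] = r->name;
            if (r->depth > deepest) deepest = r->depth;
        }
    }
    size_t count = 0;
    for (unsigned d = deepest; d >= 1 && count < max; d--) {
        if (by_depth[d] != NULL) out[count++] = by_depth[d];
    }
    if (count < max) out[count++] = function_name; /* the concrete function */
    return count;
}
```

With records for hypothetical inlined functions `helper` (depth 1) and `mathStep` (depth 2), a PC inside both expands to `mathStep`, `helper`, then the top-level function; with no records it yields only the top-level frame.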

Projects
Status: No status
Status: In Progress
Development

No branches or pull requests

8 participants