-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LTTNG-UST 2.13 changes soname breaking loading of libcoreclrtraceptprovider #57784
Comments
Tagging subscribers to this area: @tommcdon Issue DetailsThere's an ABI breaking change between LTTNG-ust 2.12 and 2.13, so they've reved their soname from @brianrob @dotnet/dotnet-diag
|
Note, same (shim) tricks are used for libnuma.so, which is closer to where coreclrtraceptprovider lives. 🙂 |
Thanks. The shim approach is tempting. We sort of codegen the trace points, so this one will be interesting if we decide to support both. |
After some digging, it turned out
Here is the expansion of macros for our generated # on Ubuntu x64
$ ./build.sh clr
$ clang-9 -E \
artifacts/obj/coreclr/Linux.x64.Debug/pal/src/eventprovider/lttngprovider/lttng/tpdotnetruntimemonoprofiler.h \
-Isrc/coreclr/pal/inc/rt -Isrc/coreclr/pal/inc > dump the dump looks like this: http://sprunge.us/J5rsv3 (and file tpdotnetruntimemonoprofiler.h: http://sprunge.us/LPmIuJ). Search for I will defer to @janvorli as to the best approach for lttngshim. One option is that we install both versions on the official build machine, and compile both v0 and v1 implementations in our code with our pseudo init; which will probe for .so.1, and if not found, .so.0, then select the implementation based on the result. This approach will prevent us from stamping out symbols of |
@am11 thank you for the analysis! I'll take a look into the possibilities of shimming lttng and consider your suggestion. |
Thanks. I actually found about dlopen/dlsym pattern in their code when I commented out these lines (to see what breaks)
just to find out it had no effect on the build result (still succeeds) other than we (unnecessarily) link to liblttng-ust{-tracepoint}.so.0 . 🙂 All public APIs in their header are already shim-like macro (without multi-version support):
|
It seems that we will need to create and distribute two flavors of the libcoreclrtraceptprovider.so, one compiled with one version of lttng and the other with the other one. Then we would try to load one and if we fail, try to load the other. It seems that's essentially what @am11 has suggested, right @am11? |
Yes, and that also seems to be what needs to happen from my understanding. |
@janvorli, yes, that's what I was thinking. It will be cleaner, as it will save us from using macro post-expansion calls. The quickest way to get there is to duplicate python code (as macro names are changed in 2.13). Part II - python free implementationHowever, in a separate project, we can replace the python scripts with lttngshim which will dynamically define those macros (at build time) and we can expand it without needing to build code. It is more work as python scripts read the XML (ETL manifest) file and generates multiple header and code files. If we think replacing python is a good idea, then I was imagining for:
|
One thing to add here: Using |
I understand the rationale, and using
runtime/src/libraries/Common/src/Interop/Unix/System.Native/Interop.UnixFileSystemTypes.cs Line 15 in 462ab6e
|
Agree that we have precedent for duplicate lists that have to be kept in sync, I'd just rather not add another one. :) |
Yes, but eventing is one of those things that often doesn't get updated much, but it gets updated by more than their known maintainers. Maintaining it outside the manifest is brittle. That's the main reason the python scripts have changed. It could easily be a little C# app that we stick in clr.prereqs and reads the manifest; we could stub them for bring up or generate them in a separate machine if needed if all we are trying to do is get rid of the Python dependency. There's prior art for this with the DAC table generation. |
Looking at git history of ClrEtwAll.man, it doesn't seem that way.
Why would the C# app better than a simple msbuild XmlPeek like Line 49 in b201a16
|
Both C# app and msbuild would be problematic for bring-up of new platforms where we need to be able to build runtime native parts without pre-existing .NET runtime. So it seems it would be preferable to use a shell script instead if we wanted to drop python. |
@janvorli for bring up, do you consider LTTNG crucial? Eventing in general is important for breadcrumbs, but I don't see why on bring up the source can't be checked in for the time being or stubbed out until it's deemed usable. We didn't want bash - we'd need bash + cmd. That's why it's python right now. I don't see all that much issue in python at the moment, but I know @am11 and @jkotas have a different opinion around having it as a dependency. |
I was thinking about a header containing a list of macros with event definitions (similar to That header file containing definitions could be generated/update with msbuild task when msbuild is available. On new platforms, where msbuild is not available and we pass Regarding the shell script option, it would require an additional dependency on xml parsing tool which I was avoiding. |
No, however we would need to enable building runtime without eventing. However, it would be nicer to be able to build all native parts even on a new platform. In the past, we've done bringup using an existing .NET SDK as a basis where we've replaced all native binaries be ones built on the new platform. So having all of them makes things complete. Also, I am not sure how would the source build that is expected to be used by distros handle this case. |
This looks similar to what I have suggested in #39052 (comment)
I believe that source build assumes pre-built SDK. It does not bootstrap from nothing. |
Ah, I've forgotten about this comment, that sounds like the best solution to take care of my concern. |
FWIW, Fedora 36 has picked up this version of lttng-ust and I am seeing crashes (in crossgen2!) there while trying to build .NET 6. Here's the backtrace from lldb:
Is there a fix for this planned? |
Current workaround is to disable event trace by deleting the following block before build: runtime/src/coreclr/clrfeatures.cmake Lines 5 to 7 in 9d4ce40
Supporting both ABI versions of liblttng-ust in parallel would require more work than a few liner fix. |
Disabling feature like this: diff --git a/src/coreclr/clrfeatures.cmake b/src/coreclr/clrfeatures.cmake
index f82ff1aa4e7..dfab148dbc3 100644
--- a/src/coreclr/clrfeatures.cmake
+++ b/src/coreclr/clrfeatures.cmake
@@ -3,7 +3,7 @@ if(CLR_CMAKE_TARGET_TIZEN_LINUX)
endif()
if(NOT DEFINED FEATURE_EVENT_TRACE)
- set(FEATURE_EVENT_TRACE 1)
+ set(FEATURE_EVENT_TRACE 0)
endif(NOT DEFINED FEATURE_EVENT_TRACE)
if(NOT DEFINED FEATURE_PERFTRACING AND FEATURE_EVENT_TRACE) Or even diff --git a/src/coreclr/clrfeatures.cmake b/src/coreclr/clrfeatures.cmake
index f82ff1aa4e7..54a61c808e6 100644
--- a/src/coreclr/clrfeatures.cmake
+++ b/src/coreclr/clrfeatures.cmake
@@ -3,7 +3,6 @@ if(CLR_CMAKE_TARGET_TIZEN_LINUX)
endif()
if(NOT DEFINED FEATURE_EVENT_TRACE)
- set(FEATURE_EVENT_TRACE 1)
endif(NOT DEFINED FEATURE_EVENT_TRACE)
if(NOT DEFINED FEATURE_PERFTRACING AND FEATURE_EVENT_TRACE) Doesn't seem to work too well out of the box. A
And many similar errors. |
I'm reproducing this issue.
I think the build works fine while using the prebuilts that link against In the stacktrace we see |
I can reproduce the issue by invoking corerun on a hello world console app. The crash happens at this line: https://github.com/lttng/lttng-ust/blob/4c155a06d838e1ab5d385abd1d73ae56e71b7d5e/src/lib/lttng-ust/lttng-probes.c#L153. The field is null.
|
These fields get initialized dynamically.
It seems they have not been initialized (yet):
|
binary built against |
Yes, but it doesn't get used. Because runtime/src/coreclr/pal/src/misc/tracepointprovider.cpp Lines 120 to 122 in b015912
|
Yes, in addition to updating PAL (adding version probing at run-time), I think we would also need two versions of lttng is a bit tricky as we use their macro based APIs and their headers expand those macros to dlopen (with hardcoded SO version) calls at build time, so we would need both set of headers on the build machine. That's why I think best way is to have those headers in our source tree under |
This issue is about adding support for both versions. |
Reason for adding this support is because .NET apps started to crash on systems with newer lttng library. |
That is about .NET crashing when built against
From what I see, .NET built against
|
@tmds, point is
this should fix the crash you are seeing. Besides, the crash you are seeing is same as #60407 which was closed as a duplicate. |
Source build has patching mechanism in place, you can disable lttng feature at build time as discussed above. |
See: dotnet/runtime#57784 Signed-off-by: Alex J Lennon <[email protected]>
Is it still not resolved after 3 years? |
There's an ABI breaking change between LTTNG-ust 2.12 and 2.13, so they've reved their soname from
liblttng-ust.so.0
toliblttng-ust.so.1
. We link against the former. Given some distros will start supporting newer versions, customers will likely end up upgrading packages and realize they can't loadlibcoreclrtraceptprovider.so
. We should investigate avenues to support multiple versions as needed using compilation tricks like the ones used in System.Security for openssl. Other possibility is to only support 2.13 and that brings us come code sanity advantages - this would only happen in .NET 7+ realistically.@brianrob @dotnet/dotnet-diag
The text was updated successfully, but these errors were encountered: