-
-
Notifications
You must be signed in to change notification settings - Fork 169
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AIX support #593
AIX support #593
Conversation
So far, I'm guessing:
|
Any ideas how this could be tested? There is a very good chance support for this is going to routinely regress unless we find a way to test this in a somewhat automated way. |
I can contact the people I know at IBM for what CI resources are available. Worst case, I can contact some people I know with spare capacity to run a VM for CI. |
So far, much better, though my big question is why I'm getting the messages about invalid transport from the One issue is in the dladdr equivalent, I realize now I have to heap-alloc both strings which is Not Great™. Of course, the symbolizer actually shields the rest of the client from dl* implementation details, so maybe these could become owned strings, or
|
Oops, setting the default transport used for tests in CMakeLists actually makes them go!
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks very nice so far!
A few notes:
- Please use bounds-checked versions of
snprintf
, etc. - Please use our signal-safe
sentry_malloc
where needed. - I was under the impression that
alloca
should be avoided in general. Also you are only calling that with constants, so there is no real difference to just using achar buf[N]
.
Regarding support, I will go even further than @mitsuhiko and say that we won’t actively support this, even if we had access to CI.
(Comparing this to mingw, which we also don’t actively support, but which has higher customer demand, and is also easier to get working in CI, and reproduced locally)
Having said that, I’m happy to just skip any tests that can’t be made to work with reasonable effort.
Regarding the lack of debug-ids: I’m wondering how it will be possible to symbolicate crashes at all without them server-side. So it seems that client-side symbolicaton is the only useful thing we can do?
In that case, you might want to turn on client-side symbolication by default, as is done on Android:
sentry-native/src/sentry_options.c
Lines 40 to 45 in 90966cc
opts->symbolize_stacktraces = | |
#ifdef SENTRY_PLATFORM_ANDROID | |
true; | |
#else | |
false; | |
#endif |
@@ -1,6 +1,12 @@ | |||
#include "sentry_boot.h" | |||
|
|||
#if defined(SENTRY_PLATFORM_DARWIN) || defined(__GLIBC__) | |||
// XXX: Make into a CMake check |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO its fine to have this here.
*/ | ||
if (member_part[0] == '\0') { | ||
/* Not an archive, just copy the file name. */ | ||
sprintf(libname, "%s", file_part); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please do a snprintf
here and below just in case.
|
||
sentry_value_append(g_modules, module); | ||
|
||
cur = (struct ld_info*)((char*)cur + cur->ldinfo_next); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so if I understand correctly, this is a relative offset from the current element?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's basically all the structures from the loader packed and arranged into a linked list.
static void | ||
load_modules(void) | ||
{ | ||
char *buf = (char*)alloca(10000); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to use a normal stack allocation (like libname
below) instead of using alloca?
Also, if I understand the code correctly, loadquery
writes a linked list into the provided buffer. What happens if it is too small? Can we ever get into the situation where ldinfo_next
points out of bounds?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should return ENOMEM if it's too small per https://www.ibm.com/docs/en/aix/7.2?topic=l-loadquery-subroutine
@@ -5,6 +5,191 @@ | |||
#include <dlfcn.h> | |||
#include <string.h> | |||
|
|||
/* XXX: Break into a separate file */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that would be nice, yes ;-)
static int | ||
dladdr(void* s, Dl_info* i) { | ||
/* | ||
* Use stack instead of heap because malloc may be messed up. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given this comment, is it still safe to malloc internally as you do?
In general, I think we can be a lot smarter about this, since sentry__symbolize
is based on callbacks (and its own struct), you control the lifetime of whatever string data is used there. Which means instead of intentionally leaking, you can clean up after yourself just fine.
Also note that we have our own safe sentry_malloc
wrapper which is signal safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implementing it via malloc was the only method I had. I did take a look at the symbolizer, but I thought the lifetimes were more complicated than that. Knowing I can at least.
The bit about malloc being messed up was where the code I wrote was being used before. I'll try to clean up the memory stuff here.
Most of the integration tests are failing because of:
You can also just skip the unit tests asserting debug-ids since you simply don’t have those. That should leave the |
Yes, agreed. My suspicion is it's subtle issues related to locking (i.e. the fact that fd for flock must be
I do have an ersatz debug ID generated from the timestamp on the header, but I doubt this is reliable or quite what we want.
I did have a workaround for |
Hmmmmm, the failed to open file messages about the envelope seem to be ENOENT if I add a bit about errno to the failure message. I wonder why 🤔 |
After placing some hokey printf, I think I know why: The run directory doesn't seem to exist:
My hack so far: diff --git a/src/path/sentry_path_unix.c b/src/path/sentry_path_unix.c
index 7ffa980..ac75449 100644
--- a/src/path/sentry_path_unix.c
+++ b/src/path/sentry_path_unix.c
@@ -331,7 +331,8 @@ sentry__path_create_dir_all(const sentry_path_t *path)
#define _TRY_MAKE_DIR \
do { \
int mrv = mkdir(p, 0700); \
- if (mrv != 0 && errno != EEXIST) { \
+ if (mrv != 0 && errno != EEXIST && errno != EINVAL) { \
+fprintf(stderr, " * errno is %d", errno);\
rv = 1; \
goto done; \
} \
@@ -475,10 +476,25 @@ write_buffer_with_flags(
{
int fd = open(
path->path, flags, S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH);
+fprintf(stderr, " **** Attempting write_buffer_with_flags: %s with flags %x\n", path->path, flags);
if (fd < 0) {
- SENTRY_TRACEF("failed to open file \"%s\" for writing", path->path);
+ SENTRY_TRACEF("failed to open file \"%s\" for writing (errno %d, flags %x)", path->path, errno, flags);
+// test other paths
+char *_x = strdup(path->path);
+char *_y = _x;
+while ((_y = strrchr(_x, '/')) != NULL && strlen(_x)) {
+ *_y = '\0';
+ struct stat _s;
+ errno = 0;
+ memset(&_s, 0, sizeof(_s));
+ int _ret = stat(_x, &_s);
+ fprintf(stderr, " ** %s: ret %d errno %d // mode %o\n", _x, _ret, errno, _s.st_mode);
+}
+free(_x);
+// end
return 1;
}
+fprintf(stderr, " **** Done write_buffer_with_flags: fd %d\n", fd);
size_t remaining = write_loop(fd, buf, buf_len);
|
I'm getting more and more confused with this fragment (the other failed tests are similar) - mkdir succeeds, the directory has been made, but it somehow ceases to exist (ENOENT is 2) when called later? I'm guessing something could be calling |
Ugh! This sucks! I think something is removing the run path when it shouldn't be, i.e.
With a printf put on path_remove. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So are you suggesting that the call comes from somewhere inside sentry?
src/sentry_options.c
Outdated
@@ -38,7 +38,8 @@ sentry_options_new(void) | |||
opts->auto_session_tracking = true; | |||
opts->system_crash_reporter_enabled = false; | |||
opts->symbolize_stacktraces = | |||
#ifdef SENTRY_PLATFORM_ANDROID | |||
// these platforms don't have debug IDs and we'll need to symbolize |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the comment is not quite correct. Android has debug IDs. We symbolize on-device because we most likely don’t have access to the object files from the server.
Oh, I think maybe I know what is wrong. I think this check here fails: sentry-native/src/sentry_database.c Lines 166 to 171 in 90966cc
The comment is a bit misleading. File locking is supposed to also lock things in the current process, and prevent us from cleaning up that directory. Maybe we can skip the current run directory/lock directly, instead of trying to lock it and skipping then. |
I don't think that's the spot, from placing a printf there. Time for even more printf... |
I actually think it's this per a printf: https://github.com/getsentry/sentry-native/blob/master/src/sentry_database.c#L222
|
Well yes, that what I meant ;-) Maybe I didn’t phrase that correctly. My understanding was that it does not skip cleaning/removing the directory based on the lockfile check (the code I linked to above). |
I was writing a comment then I realized we were agreeing on the same thing instead of disagreeing 🙃. That does indeed do it:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm!
Please port the suggested alloca
and snprintf
changes from symbolizer to modulefinder.
And run the C code through a make format
.
Otherwise the CI fails look like some python dependency issue, not sure what is going on there.
src/sentry_options.c
Outdated
// AIX doesn't have reliable debug IDs and needs symbols. | ||
// Android, the object files are unlikely on the Sentry server. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// AIX doesn't have reliable debug IDs and needs symbols. | |
// Android, the object files are unlikely on the Sentry server. | |
// AIX doesn't have reliable debug IDs for server-side symbolication, | |
// and the diversity of Android makes it infeasible to have access to debug files. |
The CI issues were caused by version updates in the CI runners and will be fixed if you rebase/merge master. |
libutil is not the BSD one, but a compat library provided by IBM in their Yum repo. XXX: On AIX, technically flock is provided by libbsd (which is NOT the fd.o libbsd), but it also adds additional functions that will radically alter some libc functionality.
XXX: PASE *does*, but the functionality is not exposed to AIX. To do so, we must acquire the ILE pthread_setname_np and convert our AIX pthread_t to an ILE one somehow (or call ILE pthread_self and have the same limitation as macOS). (IBM i also allows naming mutexes, locks, etc., but other platforms don't have this.)
This is ugly and possibly a performance issue, but I don't see a better way other than rolling our own timegm...
Uses the loader subsystem, which is slightly awkward. While we could use procfs, this is unavailable under PASE and thus we'll have to rely on something else instead. This uses code I originally wrote for Mono, adapted for the purpose. Mono is under the same license, so this is compatible legally and I also wrote it.
Slightly hackish prepended to symbolizer. Caveat empire: As with the previous commit, based on MIT-licensed Mono code I wrote.
Three possibilities - argv[0], may be clobbered or have diff cwd - sysv procfs, not available under PASE (can't test) - procinfo, will always get us a valid name with no path Unfortunately, going with the lesser evil.
This is annoying because I'm not sure if the heap will work when we need to call these, and they leak because the API assumes the strings are somewhere in loader-owned memory. Possible strategies include string ownership or making the char ptrs in frame into char[n] fields.
Come up with another obviously bogus address instead at the other end of memory.
Debug binary identification is sus on AIX, like on Android.
It's ugly but less ugly. Still heap allocates though. Perhaps we could just shove the strings as fields of Dl_info on AIX instead, so it doens't need to be freed? The lifetimes would be the same either way...
For some reason, the lock check doesn't trigger on AIX. Better safe than sorry.
AIX doesn't return this, but PASE can on the mode part. Oddly, it's documented to only do this if non-permission bits are set, but we always only set the user permissions...
32d6cc6
to
d42998d
Compare
FWIW, can't reformat on this platform because no LLVM, will reformat from another system |
Windows CI complains about:
|
Hmm, there's no path compare function from skimming the sentry_path header, so it'd have to be ifdef'd for on Windows wide... or one implemented. Which would be preferred? |
I think its okay to ifdef it there. |
should this be a sentry_path_compare (which needs to be written)?
Codecov Report
@@ Coverage Diff @@
## master #593 +/- ##
==========================================
+ Coverage 86.74% 86.77% +0.03%
==========================================
Files 49 49
Lines 3938 3947 +9
Branches 842 844 +2
==========================================
+ Hits 3416 3425 +9
- Misses 432 434 +2
+ Partials 90 88 -2 |
🎉 thanks for working on this! |
Now I have to actually develop the code to use it after hacking on this for a month, so maybe I'll be here again 🥴 |
* meta: Bump python dependencies (getsentry#600) The old version of pytest breaks with python 3.10 which changed a little how code object internals work. Since python 3.10 is now released it starts being used in CI. * fix: Ensure that a valid DSN has a public_key (getsentry#598) * feat: AIX support (getsentry#593) Co-authored-by: Floris Bruynooghe <[email protected]> Co-authored-by: Arpad Borsos <[email protected]> Co-authored-by: Calvin Buckley <[email protected]>
) * feat: Add support for Qt 6 (getsentry#509) * fix: Windows SDK Compiler Warning (getsentry#511) * fix: Validate tm put into strftime (getsentry#510) * fix: Rewrite the Linux module finder (getsentry#431) It now works in memory, without requiring to mmap the libraries again, which should make it work correctly on android when loading libraries directly from apk or appbundle files. The new code will keep track of readable memory maps and will translate read requests based on the offset of those memory maps, making sure that we actually can read whatever we are trying to read. * build: Avoid building all targets (getsentry#512) It looks like cmake is broken and builds ALL the targets when the parallel option is specified first, lol * fix: Update Crashpad to 2021-04-12 and fix macOS universal build (getsentry#513) * feat: Invoke before_send hook when using Crashpad (getsentry#519) * feat: Add more Event Payload convenience methods (getsentry#517) Adds: * `sentry_value_new_exception` * `sentry_value_new_thread` * `sentry_value_new_stacktrace` * `sentry_event_add_exception` * `sentry_event_add_thread` Deprecates `sentry_event_value_add_stacktrace` * feat: Introduce `sentry_close` (getsentry#518) This replaces the former `sentry_shutdown`, which is being forwarded. * meta: Prepare Changelog for upcoming release (getsentry#522) * ref: Pass options to scope apply directly (getsentry#521) * fix: Further clean up platform libraries for static linking (getsentry#523) * fix: Better macOS availability checks (getsentry#524) This should allow building on older macOS versions as well as running on older versions by fixing the usage of __builtin_available, and adding a different clock source for older macOS versions. * release: 0.4.9 * fix: Avoid double-free on invalid DSN (getsentry#527) * meta: Use correct libunwindstack commit * fix: Allow for Unity builds (getsentry#536) * ref: Add more testcases that trigger crashes in various ways (getsentry#538) * ref(craft): Modernize Craft config (getsentry#543) * fix: Update venv and test-discovery Makefile targets (getsentry#544) * fix: Avoid recursion when using `sentry_reinstall_backend` (getsentry#548) Previously, the `inproc` and `crashpad` (on linux) backends didn’t correctly reset their signal handlers when doing `reinstall_backend` (or multiple `init` calls for that matter). This has led to an infinite loop generating crashes. The fix now correctly unregisters the inproc/crashpad signal handlers, and adds an integration test using `reinstall_backend` to make sure we do not end up in an infinite loop. Co-authored-by: Mischa Alff <[email protected]> * fix: Address -Wundef warning for SENTRY_UNITTEST defines (getsentry#549) * build: Set 32-bit option for compiling assembly as well (getsentry#550) This fixes compilation of breakpad for 32-bit systems * meta: Update break/crashpad to 2021-06-14 (getsentry#552) * fix: Shorten/Split Locked sections to avoid deadlock (getsentry#551) We have received a report that the `sentry_get_modules_list` on mac can deadlock when other code concurrently does a `dlopen` and thus invokes the `add_image` callback from a different thread. We shorten/split the locked blocks in order to avoid holding a lock in the `get_modules` function whenever the `add_image` function is being invoked possibly from other threads. * fix: Tighten Stack Usage (getsentry#553) According to some docs, JVM/JNI stacks on Android can be as small as 32K, and our own sigaltstack is not much larger with 64K. Make sure to avoid large stack allocations as much as possible. We have especially seen the literal content of `/proc/self/maps` as well as formatted addresses inside corrupted release/environment attributes, which might point to overflows that write into a previously allocated release/environment string. * meta: Update Changelog (getsentry#556) * release: 0.4.10 * reformat * fix: Make Linux modulefinder/unwinder safer (getsentry#559) This is using the `process_vm_read` call to safely poke at random memory. It also makes sure to shim the libc provided call with a direct syscall for older Android devices. * docs: Try to better explain unwind API (getsentry#564) * fix: Make Crashpad Backend respect max_breadcrumbs setting (getsentry#566) * fix: Cancel slow winhttp requests on shutdown (getsentry#570) Co-authored-by: Gerhard Herbert <[email protected]> * fix: Properly close the background worker thread on timeout (getsentry#571) * fix: Possible race conditions in init/close and sessions (getsentry#545) * meta: Draft Changelog (getsentry#572) * release: 0.4.11 * feat: Make shutdown timeout customizable (getsentry#577) Co-authored-by: Andrei Muraru <[email protected]> * CMake: Link to the CURL::libcurl target when available (getsentry#579) Caters better for newer cmake versions. * meta: Update crashpad to 2021-07-14 (getsentry#580) * fix: Properly use `SENTRY_BUILD_RUNTIMESTATIC` for `sentry_fuzz_json` unit test (getsentry#583) * meta: Update break/crashpad to 2021-07-28 (getsentry#584) * release: 0.4.12 * fix: Increment CXX standard to 14 to allow crashpad build (getsentry#585) Fixes getsentry#574 * meta: Bump python dependencies (getsentry#600) The old version of pytest breaks with python 3.10 which changed a little how code object internals work. Since python 3.10 is now released it starts being used in CI. * fix: Ensure that a valid DSN has a public_key (getsentry#598) * feat: AIX support (getsentry#593) * CMake: Check whether libcurl was already found (getsentry#602) Currently when there is any other project that brings libcurl as a dependency, the build fails with “Could NOT find CURL (missing: CURL_LIBRARY CURL_INCLUDE_DIR)“, even though libcurl has already added as CURL::libcurl library. This patch adds a check for CURL_FOUND, to indicate that the library was already found, if set by another project. It also skips the additional find_package() step so it does not fail. Signed-off-by: Ladislav Macoun <[email protected]> * CMake: fix `SENTRY_BACKEND` defined in outer scope (getsentry#603) * CMake: add ability to set solution folder name (getsentry#604) * [pull] master from getsentry:master (#14) * ci(codechecker): Workaround for code checker not building due to node issues (getsentry#615) * meta: Update breakpad/crashpad to 2021-12-03 (getsentry#614) * feat(tracing): Add config options (getsentry#613) * fix: Correct changelog entry (getsentry#622) * meta: Bump breakpad (getsentry#621) * feat: Add internal UUID types (getsentry#616) This adds in support for internal UUIDs needed by tracing, such as the trace ID and the span ID. The major difference between this and the "standard" UUID is that the hyphens are stripped during serialization. sentry appears to not consider the hyphenated representations of these UUIDs to be valid for certain fields in an event. * meta: Update changelog (getsentry#625) * release: 0.4.13 * feat(tracing): Groundwork to add tracing context to all events (getsentry#617) This adds the appropriate stubs and fields to start storing spans on the (universal) scope. No actual logic has been added to actually support setting spans on the scope itself. The focus of this is to begin including tracing info in the context on all events if there is a transaction set on the scope. It does this fairly naively right now as the tooling to merge `sentry_value_t`s are basically nonexistent. * ci: Make integration tests capable of reading the non-backwards compatible version number for Big Sur (getsentry#627) * feat(tracing): Basic transaction context creation (getsentry#619) This adds in the ability to create and manipulate transaction contexts as defined in https://develop.sentry.dev/sdk/performance/#new-span-and-transaction-classes, under Transaction Interface. Instead of defining several transaction constructor functions with varying names (since overloading doesn't exist), the decision has been made to have the user construct an "inactive" transaction which should be fed into the SDK's implementation of `start_transaction`. This follows an existing pattern in the SDK where exceptions, threads, messages, etc can be constructed but they must be explicitly added to an event to be sent to sentry. * feat(tracing): Support basic sampling of transactions (getsentry#620) If an event is a transaction, event flushing should determine discard or forward the transaction to sentry based on the sample rate as configured in sentry options. Follows the sampling rules as defined in https://develop.sentry.dev/sdk/performance/#sampling-context. This does not take into consideration parent sampling as that property is currently unimplemented on the transaction context. * feat(tracing): Introduce a helper that identifies events that are transactions (getsentry#628) * feat(tracing): Restrict `sentry_capture_event` so it only sends non-transaction events (getsentry#629) Prevent the public API from being used to send transaction events as another transaction-specific function is meant to be used to accomplish this. * fix: Avoid deadlocks with uninitialized options (getsentry#639) The `SENTRY_WITH_OPTIONS_MUT` was a footgun since it never unlocked when the options were NULL (uninitialized). This removes the macro and replaces its uses with explicit lock/unlock calls. * feat(tracing): Add in basic Envelope support for Transactions (getsentry#630) * feat(tracing): Allow manual creation and sending of spanless Transactions (getsentry#631) * feat(tracing): Defer some transaction validation and allow creation of internal spans (getsentry#633) Co-authored-by: relaxolotl <[email protected]> Co-authored-by: Sebastian Zivota <[email protected]> Co-authored-by: getsentry-bot <[email protected]> Co-authored-by: Arpad Borsos <[email protected]> Co-authored-by: Arpad Borsos <[email protected]> Co-authored-by: Tor Arne Vestbø <[email protected]> Co-authored-by: Arpad Borsos <[email protected]> Co-authored-by: Luke Street <[email protected]> Co-authored-by: getsentry-bot <[email protected]> Co-authored-by: Sentry Bot <[email protected]> Co-authored-by: Arpad Borsos <[email protected]> Co-authored-by: bschatt <[email protected]> Co-authored-by: Burak Yigit Kaya <[email protected]> Co-authored-by: MikeRumplerSentry <[email protected]> Co-authored-by: Mischa Alff <[email protected]> Co-authored-by: Michał Janiszewski <[email protected]> Co-authored-by: getsentry-bot <[email protected]> Co-authored-by: Gerhard Herbert <[email protected]> Co-authored-by: andrei-mu <[email protected]> Co-authored-by: Andrei Muraru <[email protected]> Co-authored-by: pastdue <[email protected]> Co-authored-by: Roshan Padaki <[email protected]> Co-authored-by: mjvankampen <[email protected]> Co-authored-by: Floris Bruynooghe <[email protected]> Co-authored-by: Calvin Buckley <[email protected]> Co-authored-by: Ladislav <[email protected]> Co-authored-by: Mikhail Paulyshka <[email protected]> Co-authored-by: pull[bot] <39814207+pull[bot]@users.noreply.github.com> Co-authored-by: relaxolotl <[email protected]> Co-authored-by: Sebastian Zivota <[email protected]>
At work, we're looking into crash reporting for a language runtime, and Sentry is one of the solutions we're evaluating. However, all the users are on IBM i PASE (basically syscall emulator for AIX binaries, so treat it as AIX), so we need a working C client for it.
As of right now, it seems to build and passes quite a few tests, but also fails quite a few more (as of 2021-09-20).
Breakpad/Crashpad is unlikely to be supported, so this will have to rely on inproc.
update 2021-09-21