PortDumpTest fails if kernel.core_pattern forces the core to be redirected #6300

Open

davidjmccann opened this issue Jan 13, 2022 · 15 comments

@davidjmccann (Contributor)

Running omrporttest with kernel.core_pattern=core.%p (for example), PortDumpTest passes.

If it's something like "|/usr/share/apport/apport %p %s %c %d %P %E" then the core doesn't necessarily get created in the current working directory, and the test fails.

This then raises the point that, for this code to work, the user must disable core dump redirection, which is a privileged operation. Even worse, there is no way of setting it within a container unless the container is privileged, meaning that the host has to be changed to get a core dump generated within the container with the specified name. This is described here:
containers/podman#6528

Could the core dump code for Linux be changed to generate the file directly, similar to OSX, so it wouldn't be affected by core_pattern at all?
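
For reference, the piped-handler case can be detected by inspecting /proc/sys/kernel/core_pattern: a value beginning with '|' means the kernel hands the core to a user-space handler instead of writing a file. A minimal sketch, assuming nothing about OMR itself (the helper name is hypothetical):

/*
 * Hypothetical helper: reports whether kernel.core_pattern pipes cores
 * to a handler such as apport or systemd-coredump, in which case no
 * "core" file will appear in the current working directory.
 */
#include <stdio.h>

static int
core_pattern_is_piped(void)
{
	char pattern[256];
	int result = -1; /* unable to determine */
	FILE *fp = fopen("/proc/sys/kernel/core_pattern", "r");
	if (NULL != fp) {
		if (NULL != fgets(pattern, sizeof(pattern), fp)) {
			/* a leading '|' means the core is piped to a user-space handler */
			result = ('|' == pattern[0]) ? 1 : 0;
		}
		fclose(fp);
	}
	return result;
}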

@davidjmccann (Contributor, Author)

Google coredumper (https://code.google.com/archive/p/google-coredumper/) did attempt this kind of thing, but it's very old. https://github.com/Percona-Lab/coredumper was an attempt at updating it, but it uses various bits of assembler that no longer seem to build.

@davidjmccann (Contributor, Author)

The basic issue I'm trying to solve is... in a Kubernetes environment, how can I get a core dump out of a process:

  1. Without requiring elevated privileges in the container
  2. Without requiring root access on the host node to set specific core settings
  3. With the core dump going to the persistent volume of choice rather than storage on the host.

@babsingh (Contributor) commented Mar 3, 2022

With apport enabled, is a core file generated in any of the below locations?

  1. /var/lib/apport/coredump/
  2. /var/crash/

@babsingh (Contributor) commented Mar 3, 2022

# apport disabled
systemctl stop apport

./fvtest/porttest/omrporttest --verbose --gtest_filter=PortDumpTest.dump_test_create_dump_with_NO_name
Note: Google Test filter = PortDumpTest.dump_test_create_dump_with_NO_name
[==========] Running 1 test from 1 test case.
[----------] 1 test from PortDumpTest
[----------] 1 test from PortDumpTest (18 ms total)

[==========] 1 test from 1 test case ran. (18 ms total)
[  PASSED  ] 1 test.
[  ALL TESTS PASSED  ]
# apport enabled
systemctl start apport

./fvtest/porttest/omrporttest --verbose --gtest_filter=PortDumpTest.dump_test_create_dump_with_NO_name
Note: Google Test filter = PortDumpTest.dump_test_create_dump_with_NO_name
[==========] Running 1 test from 1 test case.
[----------] 1 test from PortDumpTest
JVMPORT030W
/root/openj9-openjdk-jdk18/omr/fvtest/porttest/omrdumpTest.cpp line  173: omrdump_test_create_dump_with_NO_name omrdump_create returned: 1, with filename: The core file created by child process with pid = 21208 was not found. Expected to find core file with name "core"
		LastErrorNumber: 0
		LastErrorMessage:

/root/openj9-openjdk-jdk18/omr/fvtest/porttest/testHelpers.cpp:109: Failure
Value of: 0 == numberFailedTestsInComponent
  Actual: false
Expected: true
Test failed!
[  FAILED  ] PortDumpTest.dump_test_create_dump_with_NO_name (5376 ms)
[----------] 1 test from PortDumpTest (5376 ms total)

[==========] 1 test from 1 test case ran. (5377 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] PortDumpTest.dump_test_create_dump_with_NO_name

 1 FAILED TEST

@babsingh (Contributor) commented Mar 3, 2022

My OS:

NAME="Ubuntu"
VERSION="18.04.5 LTS (Bionic Beaver)"

With apport enabled, the core file gets generated in /var/lib/apport/coredump/core._root_openj9-openjdk-jdk18_omr_build_fvtest_porttest_omrporttest.0.d4015845-7c5d-441f-90c2-82548c7c33d0.21208.706762115.

With apport disabled, the core file gets generated in the current working directory.

omrdump_create(struct OMRPortLibrary *portLibrary, char *filename, char *dumpType, void *userData)

renameDump(struct OMRPortLibrary *portLibrary, char *filename, pid_t pid, int signalNumber)

The above functions work correctly with apport disabled, since they expect the core file to be generated in the current working directory. We can look into having the above functions work with apport. Will there be value in this, given that Apport is not enabled by default in stable releases, even if it is installed? The reasons are specified in https://wiki.ubuntu.com/Apport:

  1. Apport collects potentially sensitive data, such as core dumps, stack traces, and log files. They can contain passwords, credit card numbers, serial numbers, and other private material.
  2. During the development release we already collect thousands of crash reports, much more than we can ever fix. Continuing to collect those for stable releases is not really useful, since ...
  3. Data collection from apport takes a nontrivial amount of CPU and I/O resources, which slow down the computer and don't allow you to restart the crashed program for several seconds.
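
For context, here is a simplified sketch of the working-directory flow those two functions rely on, assuming the fork-a-child-and-abort approach implied by the test output above; names and details are illustrative rather than the exact OMR code:

/*
 * Illustrative only: fork a child, let it abort() so the kernel writes a
 * core file for the child, then rename the result to the requested name.
 * The rename is exactly the step that breaks when core_pattern pipes the
 * core to apport or systemd-coredump instead of writing it to the CWD.
 */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int
create_dump_via_child(const char *filename)
{
	char defaultName[32];
	pid_t pid = fork();
	if (0 == pid) {
		abort(); /* child: kernel writes "core" or "core.<pid>" per core_pattern */
	} else if (pid < 0) {
		return -1;
	}
	waitpid(pid, NULL, 0);
	/* the kernel appends the pid if core_pattern is e.g. core.%p */
	snprintf(defaultName, sizeof(defaultName), "core.%d", (int)pid);
	if (0 == rename(defaultName, filename)) {
		return 0;
	}
	/* fails if the core was redirected away from the CWD */
	return rename("core", filename);
}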

@davidjmccann (Contributor, Author)

In an OpenShift environment we're more likely to be looking at systemd-coredump rather than Apport.

@keithc-ca (Member)

This seems a reasonable feature to add, but I suggest it should be opt-in: an application must explicitly indicate that the new mechanism should be used to generate core files (as opposed to allowing the Linux kernel to do that).

@babsingh (Contributor) commented Mar 3, 2022

More details on the reasonable feature:

  • With apport or systemd-coredump, core files generated may be placed in a centralized location which can be inaccessible from the container environment.
  • Instead of relying upon the Linux OS to generate and redirect the core file to an inaccessible location, the new feature will generate and write the core file at the desired location. This approach is used on OSX:
    coredump_to_file(mach_port_t task_port, pid_t pid)
  • Due to differences in system calls between Linux and OSX, the OSX approach cannot be used as-is on Linux. It will need to be re-implemented for Linux.
  • Opt-in methods can be either compile time via a flag or runtime via an environment variable or command line option.

@mikezhang1234567890 While implementing #6014, did you find any resources which will allow us to extend the core dump tool to Linux?

@mikezhang1234567890 (Contributor) commented Mar 3, 2022

If we're looking to implement a user-space core dump tool, the basic approach can roughly be the same: dump the memory (of a copy or of the original process) and dump the thread state.

Most of the information needed for an implementation concerns the binary format, which is ELF on Linux, and documentation for ELF is plentiful. https://www.gabriel.urdhr.fr/2015/05/29/core-file/ is a good read on the basic structure of a core file generated by the kernel or GDB.

I don't have anything for challenges or issues specific to Linux unfortunately.
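
To make that structure concrete, here is a minimal sketch of the start of such a writer, assuming x86-64 Linux and glibc's <sys/procfs.h>; the function name is hypothetical, and a real core needs one NT_PRSTATUS note per thread plus PT_LOAD segments for each dumped mapping:

/*
 * Skeleton of a user-space ELF core writer (x86-64 assumed). Emits the
 * ELF header and a PT_NOTE segment carrying one NT_PRSTATUS note; the
 * PT_LOAD headers and memory contents (walked via /proc/<pid>/maps and
 * read with process_vm_readv or ptrace) would follow. Layout follows
 * the structure described in the gabriel.urdhr.fr article above.
 */
#include <elf.h>
#include <string.h>
#include <sys/procfs.h> /* struct elf_prstatus */
#include <unistd.h>

static size_t
align4(size_t n)
{
	return (n + 3) & ~(size_t)3;
}

int
write_minimal_core(int fd, const struct elf_prstatus *prstatus)
{
	Elf64_Ehdr ehdr;
	Elf64_Phdr notePhdr;
	Elf64_Nhdr nhdr;
	const char name[8] = "CORE"; /* note name, zero-padded to 4 bytes */

	memset(&ehdr, 0, sizeof(ehdr));
	memcpy(ehdr.e_ident, ELFMAG, SELFMAG);
	ehdr.e_ident[EI_CLASS] = ELFCLASS64;
	ehdr.e_ident[EI_DATA] = ELFDATA2LSB;
	ehdr.e_ident[EI_VERSION] = EV_CURRENT;
	ehdr.e_type = ET_CORE;
	ehdr.e_machine = EM_X86_64;
	ehdr.e_version = EV_CURRENT;
	ehdr.e_phoff = sizeof(ehdr);
	ehdr.e_ehsize = sizeof(ehdr);
	ehdr.e_phentsize = sizeof(Elf64_Phdr);
	ehdr.e_phnum = 1; /* + one PT_LOAD per dumped memory region */

	memset(&notePhdr, 0, sizeof(notePhdr));
	notePhdr.p_type = PT_NOTE;
	notePhdr.p_offset = sizeof(ehdr) + (sizeof(Elf64_Phdr) * ehdr.e_phnum);
	notePhdr.p_filesz = sizeof(nhdr) + align4(5) + sizeof(*prstatus);

	nhdr.n_namesz = 5; /* "CORE" + NUL */
	nhdr.n_descsz = sizeof(*prstatus);
	nhdr.n_type = NT_PRSTATUS; /* one per thread in a real core */

	if (write(fd, &ehdr, sizeof(ehdr)) < 0) return -1;
	if (write(fd, &notePhdr, sizeof(notePhdr)) < 0) return -1;
	if (write(fd, &nhdr, sizeof(nhdr)) < 0) return -1;
	if (write(fd, name, align4(5)) < 0) return -1;
	if (write(fd, prstatus, sizeof(*prstatus)) < 0) return -1;
	/* next: NT_PRPSINFO, NT_FPREGSET and NT_FILE notes, then the PT_LOAD
	 * headers and the bytes of each readable mapping */
	return 0;
}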

@davidjmccann (Contributor, Author)

Particular care would probably need to be taken for setuid/setgid executables. The file permissions would probably need to restrict reading to just the owning user of the process?
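
One possible shape for that check, as an illustrative sketch only: mirror the kernel's own policy via PR_GET_DUMPABLE and create the file readable by the owner alone (the function name is hypothetical):

/*
 * Illustrative only: refuse to dump a setuid/setgid (non-dumpable)
 * process, as the kernel itself would, and create the core file with
 * owner-only permissions.
 */
#include <fcntl.h>
#include <sys/prctl.h>
#include <sys/stat.h>

static int
open_core_file_safely(const char *filename)
{
	if (1 != prctl(PR_GET_DUMPABLE, 0, 0, 0, 0)) {
		/* setuid/setgid or otherwise protected: do not dump */
		return -1;
	}
	/* 0600: owner read/write only; O_EXCL refuses a pre-planted file */
	return open(filename, O_CREAT | O_EXCL | O_WRONLY, S_IRUSR | S_IWUSR);
}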

@kgibm (Contributor) commented Mar 7, 2022

The basic issue I'm trying to solve is... in a Kubernetes environment, how can I get a core dump out of a process:

  1. Without requiring elevated privileges in the container
  2. Without requiring root access on the host node to set specific core settings
  3. With the core dump going to the persistent volume of choice rather than storage on the host.

Note that there are two common solutions to this:

  1. Install gdb in the image so that the gcore command is available and then run gcore ${PID} which attaches gdb to the process, writes the core to the current directory, and then detaches:
    % podman exec -it $(podman ps -q) sh -c 'gcore "$(cat /opt/IBM/WebSphere/AppServer/profiles/AppSrv01/logs/server1/server1.pid)" && ls -l core*'
    [...]
    -rw-r--r--. 1 root root 5468347904 Mar  7 18:35 core.1306
    
    It doesn't write out all the same VMAs as the kernel does when it creates a core, but it's close. The one downside is that if you happen to take the core while the JVM is in a sensitive operation, like a garbage collection, then pointers might be in flight and the core might be useless for some forms of common Java heap dump analysis. There are ways around this, such as injecting an exception into the process and using an -Xdump handler with request=exclusive+prepwalk, filtered to that exception, to exec a tool script that calls gcore, but that's complicated.
  2. For the most common core_pattern of |/usr/lib/systemd/systemd-coredump, set ProcessSizeMax and ExternalSizeMax (both default to 2GB, which may cause truncation, until systemd-coredump v250, where the default becomes 32GB) in /etc/systemd/coredump.conf on the worker node, run systemctl daemon-reload, and then, after the container produces a core, find it on the worker node with coredumpctl; see the example configuration below.
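
For reference, the settings from item 2 go in /etc/systemd/coredump.conf like this (32G simply mirrors the v250 default mentioned above; size the limits for your largest expected process):

[Coredump]
ProcessSizeMax=32G
ExternalSizeMax=32G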

With that, it would be very nice to have a core-dumper in J9 like on macOS, since 1) most containers don't have gdb and rebuilding their images is a complex process for most customers, and 2) worker node operations are privileged and complicated at many customers.

@davidjmccann (Contributor, Author)

Thanks @kgibm

  1. Doesn't that still require extra privileges in the container, though? Adding gdb also adds quite a few MB to the image size. And running gcore requires you to run the command after an event has occurred, rather than taking a core at the moment a problem occurs - e.g. a SIGSEGV.
  2. This just highlights how awkward getting core dumps can be!

@kgibm (Contributor) commented Mar 8, 2022

  1. Doesn't that still require extra privileges in the container though?

Yes, you're right, I just tried this and it requires --cap-add SYS_PTRACE.

rather than being able to take a core when a problem occurs - e.g. a SIGSEGV.

One could create an -Xdump handler for, e.g., gpf that uses the tool option to exec out to gcore, but yeah, that's cumbersome, and made largely moot by the above privilege point.

  2. This just highlights how awkward getting core dumps can be!

Agreed, and from my experience, most customers have the default systemd-coredump configuration which truncates many cores.

@mikezhang1234567890 (Contributor) commented Apr 14, 2022

I can have a try at this. I will initially try to do this with an environment variable or Java option to toggle the behaviour.
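
As a sketch of what the runtime toggle could look like (the variable name OMR_USERSPACE_COREDUMP is hypothetical, not an existing option):

#include <stdlib.h>
#include <string.h>

/* default to the existing kernel/core_pattern path unless explicitly opted in */
static int
userspace_coredump_requested(void)
{
	const char *value = getenv("OMR_USERSPACE_COREDUMP");
	return (NULL != value) && (0 == strcmp("1", value));
}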

@roolebo commented Oct 16, 2023

When apport is installed as the default coredump handler, if you listen on /run/apport.socket inside the container, you can accept the core dump from within the container. I have not found any way to do something similar with systemd.
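
A minimal sketch of the listening side, assuming a plain Unix-domain stream socket at that path; apport's actual forwarding protocol (the metadata and core data it sends over the connection) still has to be handled by the reader, and error handling is trimmed for brevity:

#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

int
main(void)
{
	struct sockaddr_un addr;
	int listenFd = socket(AF_UNIX, SOCK_STREAM, 0);

	memset(&addr, 0, sizeof(addr));
	addr.sun_family = AF_UNIX;
	strncpy(addr.sun_path, "/run/apport.socket", sizeof(addr.sun_path) - 1);
	unlink(addr.sun_path); /* remove any stale socket from a previous run */

	if ((bind(listenFd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		|| (listen(listenFd, 1) < 0)
	) {
		perror("apport socket");
		return 1;
	}
	for (;;) {
		int connFd = accept(listenFd, NULL, NULL);
		if (connFd < 0) {
			break;
		}
		/* read the forwarded crash data here and persist it to the volume */
		close(connFd);
	}
	close(listenFd);
	return 0;
}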
