[Core] Disabling object store from appearing in worker coredumps (ray-project#30150)

Description
If a Ray worker process crashes and generates a coredump, Linux by default dumps all pages mapped by the process. This includes the plasma (object store) pages, which can be very large on big instances. This PR uses madvise to disable dumping of the plasma pages in worker processes. As a result, coredumps generated by Ray worker processes are now ~300MB instead of roughly the size of the object store.

See "[public] Coredumps appearing to cause hangs in Ray" for more information.
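
To illustrate the approach outside of Ray, here is a minimal standalone sketch. It assumes a Linux build where MADV_DONTDUMP is defined; the helper names ExcludeFromCoreDump and MapScratchRegion are hypothetical and are not part of this PR, and an anonymous shared mapping stands in for the plasma mapping.

```cpp
#include <sys/mman.h>

#include <cerrno>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical helper (not part of Ray): asks the kernel to leave the given
// mapping out of core dumps. Returns true on success. On platforms without
// MADV_DONTDUMP the call is a no-op and also returns true.
bool ExcludeFromCoreDump(void *addr, std::size_t length) {
#if defined(__linux__) && defined(MADV_DONTDUMP)
  if (madvise(addr, length, MADV_DONTDUMP) != 0) {
    std::fprintf(stderr, "madvise(MADV_DONTDUMP) failed: %s\n",
                 std::strerror(errno));
    return false;
  }
#else
  (void)addr;
  (void)length;
#endif
  return true;
}

// Hypothetical helper: maps an anonymous shared region as a stand-in for the
// plasma mapping. Returns nullptr if mmap fails.
void *MapScratchRegion(std::size_t length) {
  void *p = mmap(nullptr, length, PROT_READ | PROT_WRITE,
                 MAP_SHARED | MAP_ANONYMOUS, -1, 0);
  return p == MAP_FAILED ? nullptr : p;
}
```

The flag applies to the address range, so it has to be set again for every new mapping; that is presumably why the PR makes the call right after mmap in the ClientMmapTableEntry constructor.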

Testing
It is difficult to test this in CI because (1) at the C++ level there isn't a Linux API to verify the madvise status of pages, and (2) coredumps aren't enabled (/proc/sys/kernel/core_pattern is set to an invalid value and ulimit -c is 0), so a Python-level test won't work. I can enable coredumps in CI, but that feels like a big change, so I want to check before going down that path.
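
As a rough sketch of how the two settings mentioned above could be inspected from C++, the snippet below reads the soft RLIMIT_CORE limit (the limit behind ulimit -c) and the first line of /proc/sys/kernel/core_pattern. The function names are hypothetical, and the sketch only reports values; it does not try to decide whether a core_pattern is valid.

```cpp
#include <sys/resource.h>

#include <fstream>
#include <string>

// Hypothetical helper: stores the soft core-size limit in *out.
// Returns false if getrlimit fails.
bool GetCoreSoftLimit(rlim_t *out) {
  struct rlimit lim;
  if (getrlimit(RLIMIT_CORE, &lim) != 0) {
    return false;
  }
  *out = lim.rlim_cur;
  return true;
}

// Hypothetical helper: true when the soft limit is 0, meaning the kernel
// will not write a core file for this process.
bool CoreDumpsDisabledByRlimit() {
  rlim_t soft = 0;
  return GetCoreSoftLimit(&soft) && soft == 0;
}

// Hypothetical helper: returns the first line of
// /proc/sys/kernel/core_pattern, or an empty string if it cannot be read
// (for example on non-Linux systems).
std::string ReadCorePattern() {
  std::ifstream in("/proc/sys/kernel/core_pattern");
  std::string line;
  if (in) {
    std::getline(in, line);
  }
  return line;
}
```

A CI job could log both values before deciding whether a coredump-based test is feasible at all.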

For manual testing: the madvise call is compiled out by a macro on non-Linux builds. On Linux, the coredump size drops significantly:

$ ls -alh /tmp/core.ray::Actor.abor.88940 # without madvise
-rw------- 1 ray users 9.6G Nov 13 10:52 /tmp/core.ray::Actor.abor.88940
$ ls -alh /tmp/core.ray::Actor.abor.97217 # with madvise
-rw------- 1 ray users 239M Nov 13 11:09 /tmp/core.ray::Actor.abor.97217
$ gdb -c /tmp/core.ray::Actor.abor.97217
(gdb) info proc mappings
Mapped address spaces:

          Start Addr           End Addr       Size     Offset objfile
      0x55a3af4a7000     0x55a3af506000    0x5f000        0x0 /home/ray/anaconda3/bin/python3.8
      0x55a3af506000     0x55a3af6fe000   0x1f8000    0x5f000 /home/ray/anaconda3/bin/python3.8
      0x55a3af6fe000     0x55a3af7e5000    0xe7000   0x257000 /home/ray/anaconda3/bin/python3.8
      0x55a3af7e6000     0x55a3af7eb000     0x5000   0x33e000 /home/ray/anaconda3/bin/python3.8
      0x55a3af7eb000     0x55a3af823000    0x38000   0x343000 /home/ray/anaconda3/bin/python3.8
      0x7fc9792c0000     0x7fcd64000000 0x3ead40000        0x0 /dev/shm/plasmax6oDcM (deleted)
(...)
Closes ray-project#29576

Signed-off-by: Weichen Xu <[email protected]>
cadedaniel authored and WeichenXu123 committed Dec 19, 2022
1 parent d50b901 commit 953dfce
Showing 3 changed files with 32 additions and 0 deletions.
4 changes: 4 additions & 0 deletions src/ray/common/ray_config_def.h
@@ -714,3 +714,7 @@ RAY_CONFIG(int64_t, health_check_initial_delay_ms, 5000)
RAY_CONFIG(int64_t, health_check_period_ms, 3000)
RAY_CONFIG(int64_t, health_check_timeout_ms, 10000)
RAY_CONFIG(int64_t, health_check_failure_threshold, 5)

/// Use madvise to prevent worker coredump from including the mapped plasma pages
/// in the worker processes.
RAY_CONFIG(bool, worker_core_dump_exclude_plasma_store, true)
26 changes: 26 additions & 0 deletions src/ray/object_manager/plasma/shared_memory.cc
@@ -7,6 +7,7 @@
#include <unistd.h>
#endif

#include "ray/common/ray_config.h"
#include "ray/object_manager/plasma/malloc.h"
#include "ray/util/logging.h"

@@ -33,6 +34,31 @@ ClientMmapTableEntry::ClientMmapTableEntry(MEMFD_TYPE fd, int64_t map_size)
    RAY_LOG(FATAL) << "mmap failed";
  }
  close(fd.first); // Closing this fd has an effect on performance.

#endif

  MaybeMadviseDontdump();
}

void ClientMmapTableEntry::MaybeMadviseDontdump() {
  if (!RayConfig::instance().worker_core_dump_exclude_plasma_store()) {
    RAY_LOG(DEBUG) << "worker_core_dump_exclude_plasma_store disabled, worker coredumps "
                      "will contain "
                   << "the object store mappings.";
    return;
  }

#if !defined(__linux__)
  RAY_LOG(DEBUG)
      << "Filtering object store pages from coredumps only supported on linux.";
#else
  int rval = madvise(pointer_, length_, MADV_DONTDUMP);
  if (rval) {
    RAY_LOG(WARNING) << "madvise(MADV_DONTDUMP) call failed: " << rval << ", "
                     << strerror(errno);
  } else {
    RAY_LOG(DEBUG) << "madvise(MADV_DONTDUMP) call succeeded.";
  }
#endif
}

2 changes: 2 additions & 0 deletions src/ray/object_manager/plasma/shared_memory.h
@@ -27,6 +27,8 @@ class ClientMmapTableEntry {
  /// The length of the memory-mapped file.
  size_t length_;

  void MaybeMadviseDontdump();

  RAY_DISALLOW_COPY_AND_ASSIGN(ClientMmapTableEntry);
};
