Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP][Plasma] Use jemalloc instead of dlmalloc in plasma #2593

Closed
wants to merge 6 commits into from

Conversation

pcmoritz
Copy link
Contributor

This is currently blocked on jemalloc/jemalloc#1329 but I wanted to post it here too in case anybody has an idea how to move forward.

(@xhochy You are an expert with jemalloc, let me know if you have any ideas on what the right way to do this is. I'd really like to have one memory mapped file only if at all possible so we can remove the sending of file descriptors.)

ExternalProject_Add(jemalloc
URL ${jemalloc_URL}
DOWNLOAD_DIR "${DOWNLOAD_LOCATION}"
CONFIGURE_COMMAND ./autogen.sh "--prefix=${JEMALLOC_PREFIX}" "--with-jemalloc-prefix=je_plasma_" "--with-private-namespace=je_plasma_private_" "--disable-tls"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use the same jemalloc version as we do in Arrow C++. Eventhough they should live happily along eachother, we should not add multiple versions of a library to the project.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside: if we're able to remove our vendored jemalloc by making this upgrade that would be great

@wesm
Copy link
Member

wesm commented Jan 29, 2019

Is there a plan about how to proceed on this patch?

@pcmoritz
Copy link
Contributor Author

pcmoritz commented Jan 29, 2019

Yes, the plan is like this:

  • Get ARROW-4296: [Plasma] Use one mmap file by default, prevent crash with -f #3490 merged
  • In a follow up PR, impose a memory limit on the number of bytes that are allocated from the plasma side (i.e. plasma will keep track of the memory limit), and if jemalloc will require a bit more due to fragmentation and metadata, that's ok. We will need to make sure that the files are living on a file system that is ~ 10% or so larger than the plasma memory limit to account for the extra memory.
  • Then after we don't depend on the dlmalloc footprint limit any more, come back and merge this PR, replacing dlmalloc with jemalloc.
  • Don't assume we will be able to coerce jemalloc to use exactly one memory mapped file (that was the original plan, but I've given up on that and with ARROW-3958: [Plasma] Reduce number of IPCs #3124 it's not so important any more),

cc @anuragkh who suggested this plan

@wesm wesm changed the title [WIP] Use jemalloc instead of dlmalloc in plasma [WIP][Plasma] Use jemalloc instead of dlmalloc in plasma May 31, 2019
@kszucs kszucs force-pushed the master branch 2 times, most recently from ed180da to 85fe336 Compare July 22, 2019 19:29
@emkornfield
Copy link
Contributor

@pcmoritz it looks like this is has gotten fairly out of date, do you have plans to pick it up soon, or should we close the PR?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants