Support dynamic calculation of JVM resources in CLI cmd #944

Merged
merged 5 commits into from
Apr 18, 2024

Commits on Apr 16, 2024

  1. Support dynamic calculation of JVM resources in CLI cmd

    Signed-off-by: Ahmed Hussein (amahussein) <[email protected]>
    
    Fixes NVIDIA#943
    
    This change reduces the probability of an OOME thrown by the core-tools when too many threads are created within the core module.
    The problem is that a thread processing an eventlog needs around 4-6 GB to succeed. This PR dynamically calculates the number of threads that can fit into the virtual memory of the host.
    Note that this does not solve the problem; it is an improvement that dynamically passes JVM resources to the java cmd.
    An OOME can still be thrown if the batch of eventlogs is large enough to exceed the expected 8 GB scenario.
    
    What has changed:
    
    - Use G1GC as the GC algorithm, overriding the default JDK8 parallel GC. G1GC (Garbage-First GC) could be a better option for short-lived objects.
    - Pull the virtual memory information of the host machine to calculate the default heap size. By default, the heap size is set to 80% of the total virtual memory.
    - Next, calculate the number of threads to be passed to the RAPIDS java cmd, assuming that a thread needs at least 8 GB of heap memory. The number of threads is calculated as (`heap_size / 8`).
    - If the CLI is running in concurrent mode (i.e., estimation_model is enabled), the CLI splits the resources between Profiling and Qualification in a 2:1 ratio, respectively.
    - Add `jvm_heap_size` to the `spark_rapids` CLI.
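
The arithmetic described in the list above can be sketched as follows. This is a minimal illustration, not the code from this PR: the names `plan_resources`, `split_for_concurrent_mode`, and `JvmResources` are hypothetical, the total memory is passed in as a plain number rather than probed from the host, and rounding down with a floor of one thread is an assumption. Only the 80% heap fraction, the 8 GB per thread, the 2:1 split, and the use of G1GC come from the commit message.

```python
from dataclasses import dataclass
from typing import List, Tuple

HEAP_FRACTION = 0.8   # commit message: heap defaults to 80% of total virtual memory
GB_PER_THREAD = 8     # commit message: a thread needs at least 8 GB of heap


@dataclass
class JvmResources:
    """Hypothetical container for the values passed to the java cmd."""
    heap_gb: int
    threads: int

    def jvm_flags(self) -> List[str]:
        # -XX:+UseG1GC and -Xmx are standard HotSpot JVM options.
        return ["-XX:+UseG1GC", f"-Xmx{self.heap_gb}g"]


def plan_resources(total_mem_gb: float) -> JvmResources:
    """Derive heap size and thread count from the host's total memory (in GB)."""
    # Assumption: round down to whole GB and never go below 1 GB / 1 thread.
    heap_gb = max(1, int(total_mem_gb * HEAP_FRACTION))
    threads = max(1, heap_gb // GB_PER_THREAD)
    return JvmResources(heap_gb, threads)


def split_for_concurrent_mode(res: JvmResources) -> Tuple[JvmResources, JvmResources]:
    """Split the heap 2:1 between Profiling and Qualification, in that order."""
    prof_heap = max(1, res.heap_gb * 2 // 3)
    qual_heap = max(1, res.heap_gb - prof_heap)
    profiling = JvmResources(prof_heap, max(1, prof_heap // GB_PER_THREAD))
    qualification = JvmResources(qual_heap, max(1, qual_heap // GB_PER_THREAD))
    return profiling, qualification


if __name__ == "__main__":
    res = plan_resources(64)
    print(res, res.jvm_flags())
    print(split_for_concurrent_mode(res))
```

On a host with 64 GB this yields a 51 GB heap and 6 threads; in concurrent mode the sketch gives Profiling 34 GB and Qualification 17 GB.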
    amahussein committed Apr 16, 2024
    7361d70
  2. Disable running RAPIDS tools in parallel

    Signed-off-by: Ahmed Hussein (amahussein) <[email protected]>
    amahussein committed Apr 16, 2024
    f0016f0
  3. Address review comments

    Signed-off-by: Ahmed Hussein (amahussein) <[email protected]>
    amahussein committed Apr 16, 2024
    17e8d2d

Commits on Apr 17, 2024

  1. de6b285

Commits on Apr 18, 2024

  1. Add jvm_threads as argument to the CLI

    Signed-off-by: Ahmed Hussein (amahussein) <[email protected]>
    amahussein committed Apr 18, 2024
    eaeb76b