Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TASK] Optimize the storage of accumulables in core tools #1263

Merged
merged 16 commits into from
Aug 8, 2024

Conversation

amahussein
Copy link
Collaborator

@amahussein amahussein commented Aug 6, 2024

Fixes #1202

While investigating bottlenecks, it was found that most of the objects being allocated are representing metrics.

In original code the accumulables were stored in a huge map AppBase.taskStageAccumMap

  // accum id to task stage accum info
  var taskStageAccumMap: HashMap[Long, ArrayBuffer[TaskStageAccumCase]] =
    HashMap[Long, ArrayBuffer[TaskStageAccumCase]]()

There is another map hashMap that stored AccumulableIDs to set of StageIDs to build the Exec-to-Stage map.

The class TaskStageAccumCase definition is as follows:

case class TaskStageAccumCase(
    stageId: Int,
    attemptId: Int,
    taskId: Option[Long],
    accumulatorId: Long,
    name: Option[String],
    // The total accumulated so far for all tasks
    value: Option[Long],
    // The amount for this particular task/update
    update: Option[Long],
    isInternal: Boolean)

A new TaskStageAccumCase is created for each accumulable when the stage/stage is completed.

Lets say there is a stage with 100 tasks.
An accumulator ID will have an ArrayBuffer of size 101. all of those entries will repeat the same common fields and differ only in the TaskID/update values if any.


Changes

This PR revisits the way the accumulables are stored.

  • Accumulable names are stored in a global concurrent hashMap AccumNameRef. This implies that we create only one string and use it across all the threads to represent the same accumulable.
  • Create a new class AccumMetaRef that holds <accumId-AccNameRef>: this encapsulation tends to be very important to propagate the same optimization while dumping the data.
  • AccumMetaRef are stored in a per-app hashMap because this should not be shared across the different threads/applications. Once analysis is done, the map is collected
  • AccumProfileResult is changed to use AccumMetaRef to optimize the memory consumption. This reduces the number of allocations since accumMetaRef already exists in memory. Finally the CSVformat conversion is also part of the AccumNameRef because we should create only one value for each accumulator instead of reformating a new string for each row (X by number of stages)

Unit tests affected

  • "test printSQLPlanMetrics" : this unit-test was affected because in the legacy code we used to use a "0" if a task-update does not exist. As a result, it forced the minimum to be 0 for all the accumulables which is incorrect. The new code handles this correctlt because it only aggregates stats for records with valid mapping.
  • "test dsv1 complex": the estimate GPU speedup is different in the new code.
    • The new code will create an entry between AccumID-to-StageID before a stage is completed. This should capture the cases where updates an accumulable but it is not completed
    • As mentioned earliuer, the new code does not enforce a "0" to the accumulable in case a map between stage/task-AccumId does not exists.
    • The new code captures the case when a task updates the same Accumulable multiple times.
    • the maxDuration is correctly looking into the stage records. The legacy code was looking for the maximum value in the entire ArrayBuffer. This could eventually pick tasks if stage entries do not exist (incomplete stages)

Future work and Followups

See the list of tasks in #815


Performance Optimizations

@bilalbari please share some performance numbers in this PR description:

  • heap-dumps before and after
  • benchmarkSuite before/after with different heap size
  • Proof of memory savings by showing a case where an eventlog would fail on heap size and eventually succeeds on the new branch with smaller or equal heap size

Heap Usage Before Changes

Screenshot from 2024-08-05 16-56-06
Heap Usage Post Changes

Screenshot from 2024-08-05 16-53-49

Example of Failed Run that OOM
Setup -

Heap Size - 14G ( This continues to OOM till 18G heap size )
Event Log Size - 1.5G ( gz compressed )

Screenshot from 2024-08-07 10-18-32

OOM BASELINES

  • Before changes
    • Minimum memory required to run - 19G
  • After changes
    • Minimum memory required to run - 5G
  • %age memory improvement - 74%

BenchmarkSuite output without changes

JVM Name                   :   OpenJDK 64-Bit Server VM 
Java Version               :   1.8.0_422 
OS Name                    :   Linux 
OS Version                 :   6.5.0-41-generic 
MaxHeapMemory              :   21504 MB 
Total Warm Up Iterations   :   3 
Total Runtime Iterations   :   3 
Input Arguments            :    --output-directory /home/sbari/project-repos/scratch_folder/issue-367/output_folder /home/sbari/project-repos/scratch_folder/issue-367/eventlogs/temp-event-logs 
 
================================================================================================
Benchmark_Per_SQL_Arg_Qualification
================================================================================================

Benchmark :                               Best Time(ms)   Avg Time(ms)   Stdev(ms)      Avg GC Time(ms)       Avg GC Count     Stdev GC Count    Max GC Time(ms)       Max GC Count   Relative
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Enable_Per_SQL_Arg_Qualification                  87366          87441         100               6607.0               48.0                  4               6632                 51      1.00X
Disable_Per_SQL_Arg_Qualification                 86165          86448         257               6627.0               45.0                  0               6908                 46      1.01X

BenchmarkSuite output with changes

JVM Name                   :   OpenJDK 64-Bit Server VM 
Java Version               :   1.8.0_422 
OS Name                    :   Linux 
OS Version                 :   6.5.0-41-generic 
MaxHeapMemory              :   10240 MB 
Total Warm Up Iterations   :   3 
Total Runtime Iterations   :   3 
Input Arguments            :    --output-directory /home/sbari/project-repos/scratch_folder/issue-367/output_folder /home/sbari/project-repos/scratch_folder/issue-367/eventlogs/temp-event-logs 
 
================================================================================================
Benchmark_Per_SQL_Arg_Qualification
================================================================================================

Benchmark :                               Best Time(ms)   Avg Time(ms)   Stdev(ms)      Avg GC Time(ms)       Avg GC Count     Stdev GC Count    Max GC Time(ms)       Max GC Count   Relative
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Enable_Per_SQL_Arg_Qualification                 116229         116712         424               3416.0              142.0                  2               3455                144      1.00X
Disable_Per_SQL_Arg_Qualification                117012         118136        1466               3157.0              129.0                  1               3180                130      0.99X

amahussein and others added 14 commits July 29, 2024 17:03
Signed-off-by: Ahmed Hussein <[email protected]>
Signed-off-by: Ahmed Hussein (amahussein) <[email protected]>
Signed-off-by: Ahmed Hussein <[email protected]>
Signed-off-by: Sayed Bilal Bari <[email protected]>
Signed-off-by: Sayed Bilal Bari <[email protected]>
Signed-off-by: Sayed Bilal Bari <[email protected]>
Signed-off-by: Sayed Bilal Bari <[email protected]>
Signed-off-by: Sayed Bilal Bari <[email protected]>
Signed-off-by: Sayed Bilal Bari <[email protected]>
Signed-off-by: Sayed Bilal Bari <[email protected]>
@amahussein amahussein added the core_tools Scope the core module (scala) label Aug 6, 2024
cindyyuanjiang
cindyyuanjiang previously approved these changes Aug 7, 2024
Copy link
Collaborator

@cindyyuanjiang cindyyuanjiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor nits. LGTM.

Copy link
Collaborator

@tgravescs tgravescs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just minor nits, otherwise looks good

* @param taskId the taskId pulled from the TaskEnd event
* @param accumulableInfo the accumulableInfo from the TaskEnd event
*/
def addAccToTask(stageId: Int, taskId: Long, accumulableInfo: AccumulableInfo): Unit = {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit rename addAccumToTask

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

* triggered by a taskEnd event and the mapo between stage-Acc has not been
* established yet.
*/
def addAccToStage(stageId: Int,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit rename addAccumToStage

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link
Collaborator Author

@amahussein amahussein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @tgravescs and @cindyyuanjiang for the prompt review
I addressed the comments and renamed classe names "Acc*" to "Accum*" to be consistent

* triggered by a taskEnd event and the mapo between stage-Acc has not been
* established yet.
*/
def addAccToStage(stageId: Int,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

* @param taskId the taskId pulled from the TaskEnd event
* @param accumulableInfo the accumulableInfo from the TaskEnd event
*/
def addAccToTask(stageId: Int, taskId: Long, accumulableInfo: AccumulableInfo): Unit = {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link
Collaborator

@cindyyuanjiang cindyyuanjiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! LGTM.

Copy link
Collaborator

@parthosa parthosa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @bilalbari and @amahussein for the design and implementation of this optimization. The numbers look promising.

NAMES_TABLE.computeIfAbsent(nameKey, AccumNameRef.fromString)
}

// Intern the accumulator name if it is not already present in the table.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Should this be insert?

@amahussein amahussein merged commit 1a2351f into NVIDIA:dev Aug 8, 2024
14 checks passed
@amahussein amahussein deleted the rapids-tools-1202-final branch August 8, 2024 18:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core_tools Scope the core module (scala)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[TASK] Optimize the storage of accumulables in core tools
6 participants