Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support onprem platform for user-tools profiling command #431

Merged
merged 3 commits into from
Jul 12, 2023

Conversation

nartal1
Copy link
Collaborator

@nartal1 nartal1 commented Jul 11, 2023

This PR fixes #315.
This PR adds user-tools profiling command for On-premisis platform. Updated the documentation for profiling CLI.

Command to test profiling tool:

export EVENT_LOGS=$SCRATCH_DIR/eventlogs
export WORKER_INFO=$SCRATCH_DIR/gpu_worker_info.yaml

 spark_rapids_user_tools onprem profiling \
    --eventlogs $EVENT_LOGS \
    --worker_info $WORKER_INFO

@nartal1 nartal1 added the user_tools Scope the wrapper module running CSP, QualX, and reports (python) label Jul 11, 2023
@nartal1 nartal1 self-assigned this Jul 11, 2023
| **rapids_options**** | A list of valid [Profiling tool options](../../core/docs/spark-profiling-tool.md#qualification-tool-options). Note that (`output-directory`, `auto-tuner`, `combined`) flags are ignored | N/A | N |

If the CLI does not provide an argument `worker_info`, the tool will throw an error and exit.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine for now. But would it be simple to allow the user to run profiling without worker_info and then not execute auto-tuner and just generate profiling output files locally? That would be needed to fully deprecate Profiler java command as some customers have run w/o cluster info to get metrics output without recommendations.

Copy link
Collaborator

@amahussein amahussein Jul 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that's the case, then I suggest that to be done within the same PR so we don't have to go back to the spark_rapids_user_tools CLI to add features.
The trick with the new ascli proposal that it recommends removing the "rapids_options" arguments which means that some arguments consumed by Profiler java have to be hardcoded.

Copy link
Collaborator

@amahussein amahussein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • @nartal1 If you find that it is easy modifying the profiling command to only run the java command without recommendation, then we can do it within the same PR. Otherwise, then we can keep that feature out of scope since it is part of the proposal of the new CLI
  • Something that was overlooked in previous PRs is that onPRem qualification does not have doc-string to generate the help command. We can add the doc-string of the qualification here to get done with it.

@nartal1
Copy link
Collaborator Author

nartal1 commented Jul 12, 2023

  • @nartal1 If you find that it is easy modifying the profiling command to only run the java command without recommendation, then we can do it within the same PR. Otherwise, then we can keep that feature out of scope since it is part of the proposal of the new CLI
  • Something that was overlooked in previous PRs is that onPRem qualification does not have doc-string to generate the help command. We can add the doc-string of the qualification here to get done with it.

Modifying profiling command to run java command without recommendation will take more effort. As discussed offline, we will incorporate this change in the new CLI.

Updated the doc-string for qualification command. Removed arguments which are not needed for on prem platform in that process. PTAL

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
user_tools Scope the wrapper module running CSP, QualX, and reports (python)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] Support onprem platform for user tools profiling command
3 participants