Improve console output from python tool for failed/gpu/photon event logs #1235

Merged: 1 commit into NVIDIA:dev from spark-rapids-tools-1126
Jul 29, 2024


parthosa
Collaborator

@parthosa parthosa commented Jul 26, 2024

Fixes #1126.

#1187 added a feature to the Scala tools that includes the status of all apps in the report, even those that failed or were skipped due to Photon event logs. This PR improves the Python tools to always display the number of processed apps (even when passed GPU event logs, streaming logs, or logs that fail for any other reason).

Output

Case 1: Event logs with no successful apps: photon/gpu/csp event log with authentication issue

spark_rapids qualification --platform <platform> --eventlogs </path/to/gpu-or-photon-logs> --tools_jar <tools_jar> --verbose

Previously

____________________________________________________________________________________________________
                                        QUALIFICATION Report
____________________________________________________________________________________________________
The Qualification tool did not generate any output. Nothing to display.

After this change

    - Intermediate output generated by tools: /Users/psarthi/Work/tools-run/qual_20240726231326_caB38e8A/intermediate_output
    - Application status report: /Users/psarthi/Work/tools-run/qual_20240726231326_caB38e8A/rapids_4_spark_qualification_output/rapids_4_spark_qualification_output_status.csv

Qualification tool found no successful applications to process.

Report Summary:
----------------------  -
Total applications      3
Processed applications  0
Top candidates          0
----------------------  -

Case 2: Event logs with no top candidates

Console Output
    - Summarized savings and speedups CSV report: /Users/psarthi/Work/tools-run/qual_20240726211957_B9A2A6A9/qualification_summary.csv
    - Intermediate output generated by tools: /Users/psarthi/Work/tools-run/qual_20240726211957_B9A2A6A9/intermediate_output
    - *Cluster config recommendations: /Users/psarthi/Work/tools-run/qual_20240726211957_B9A2A6A9/rapids_4_spark_qualification_output/tuning
    - Metadata file with cluster recommendation and tuning details: /Users/psarthi/Work/tools-run/qual_20240726211957_B9A2A6A9/qualification_summary_metadata.json
    - Application status report: /Users/psarthi/Work/tools-run/qual_20240726211957_B9A2A6A9/rapids_4_spark_qualification_output/rapids_4_spark_qualification_output_status.csv

Qualification tool found no qualified applications after applying the filters.
See the CSV file for full report or disable the filters.

Report Summary:
----------------------  -
Total applications      3
Processed applications  3
Top candidates          0
----------------------  -

Case 3: Event logs path with some top candidate apps, some gpu apps/photon apps

Console Output
    - Summarized savings and speedups CSV report: /Users/psarthi/Work/tools-run/qual_20240726225824_511D59E9/qualification_summary.csv
    - Intermediate output generated by tools: /Users/psarthi/Work/tools-run/qual_20240726225824_511D59E9/intermediate_output
    - *Cluster config recommendations: /Users/psarthi/Work/tools-run/qual_20240726225824_511D59E9/rapids_4_spark_qualification_output/tuning
    - Metadata file with cluster recommendation and tuning details: /Users/psarthi/Work/tools-run/qual_20240726225824_511D59E9/qualification_summary_metadata.json
    - Application status report: /Users/psarthi/Work/tools-run/qual_20240726225824_511D59E9/rapids_4_spark_qualification_output/rapids_4_spark_qualification_output_status.csv
+----+-------------------------+-------------------------+-----------------+---------------------------+------------------------------+-----------------------------+
|    | App Name                | App ID                  | Estimated GPU   | Qualified Node            | Full Cluster                 | GPU Config                  |
|    |                         |                         | Speedup         | Recommendation            | Config                       | Recommendation              |
|    |                         |                         | Category**      |                           | Recommendations*             | Breakdown*                  |
|----+-------------------------+-------------------------+-----------------+---------------------------+------------------------------+-----------------------------|
|  1 | test_spark_app_111111   | app-20240311195738-0000 | Small           | r5d.2xlarge to g5.2xlarge | app-20240311195738-0000.conf | app-20240311195738-0000.log |
|  0 | test_spark_app_222222   | app-20240311074805-0000 | Small           | r5d.2xlarge to g5.2xlarge | app-20240311074805-0000.conf | app-20240311074805-0000.log |
+----+-------------------------+-------------------------+-----------------+---------------------------+------------------------------+-----------------------------+

Report Summary:
----------------------  -
Total applications      7
Processed applications  4
Top candidates          2
----------------------  -
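
The three cases above differ only in which of the counts are zero. The following is a minimal sketch of that selection logic and of the summary layout, not the code from this PR: the names `ReportCounts`, `summary_message`, and `render_summary` are hypothetical, and the message strings are copied from the console output shown above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReportCounts:
    """Hypothetical container for the three numbers in the Report Summary."""
    total_apps: int
    processed_apps: int
    top_candidates: int


def summary_message(counts: ReportCounts) -> Optional[str]:
    """Return the notice shown instead of the candidates table, if any."""
    if counts.processed_apps == 0:
        # Case 1: every event log failed or was skipped.
        return 'Qualification tool found no successful applications to process.'
    if counts.top_candidates == 0:
        # Case 2: apps were processed but none passed the filters.
        return ('Qualification tool found no qualified applications after '
                'applying the filters.\n'
                'See the CSV file for full report or disable the filters.')
    # Case 3: at least one top candidate, so the table is printed instead.
    return None


def render_summary(counts: ReportCounts) -> str:
    """Format the counts as a two-column block like the console output above."""
    rows = [
        ('Total applications', counts.total_apps),
        ('Processed applications', counts.processed_apps),
        ('Top candidates', counts.top_candidates),
    ]
    label_width = max(len(label) for label, _ in rows)
    value_width = max(len(str(value)) for _, value in rows)
    rule = '-' * label_width + '  ' + '-' * value_width
    body = [f'{label:<{label_width}}  {value:>{value_width}}' for label, value in rows]
    return '\n'.join(['Report Summary:', rule, *body, rule])
```

Calling `print(render_summary(ReportCounts(7, 4, 2)))` would print a block in the same layout as Case 3. Keeping the message selection separate from the rendering means the summary is always printed, regardless of which notice (if any) precedes it.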

Changes

  • In class qualification.py::QualificationSummary:
    • Removed the duplicate all_apps and df_results DataFrames
    • Renamed class variables and added comments stating what each one refers to
    • Renamed getters
    • The comments that list the output files are now shown even if there are no candidates (e.g. Case 2 above)
  • In class qualification.py::Qualification:
    • Introduced a helper method _read_qualification_output_file() that reads files from the output folder generated by the Scala tool
    • Now that the status report is read, the event log is added to the metadata JSON file for each app
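
As a rough illustration of how the status report can drive the "Total applications" and "Processed applications" counts, here is a standalone sketch. It is not the PR's `_read_qualification_output_file()` implementation: the function names `read_status_report` and `count_apps` are hypothetical, and the `Status` column name and `SUCCESS` value are assumptions about the CSV layout. Only the file name is taken from the console output above.

```python
import csv
from pathlib import Path
from typing import Dict, List, Tuple

# File name as it appears in the console output above.
STATUS_FILE = 'rapids_4_spark_qualification_output_status.csv'


def read_status_report(output_dir: str, file_name: str = STATUS_FILE) -> List[Dict[str, str]]:
    """Read the status CSV from the Scala output folder; empty list if absent."""
    path = Path(output_dir) / file_name
    if not path.is_file():
        return []
    with path.open(newline='', encoding='utf-8') as fh:
        return list(csv.DictReader(fh))


def count_apps(rows: List[Dict[str, str]],
               status_column: str = 'Status',      # assumed column name
               success_value: str = 'SUCCESS'      # assumed status value
               ) -> Tuple[int, int]:
    """Return (total apps, successfully processed apps)."""
    total = len(rows)
    processed = sum(
        1 for row in rows
        if (row.get(status_column) or '').strip().upper() == success_value
    )
    return total, processed
```

Because the counts come from the status report rather than from the summary of qualified apps, failed or skipped event logs still contribute to the total even when no app is processed.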

@parthosa parthosa added bug Something isn't working user_tools Scope the wrapper module running CSP, QualX, and reports (python) usability track issues related to the Tools's user experience labels Jul 26, 2024
@parthosa parthosa self-assigned this Jul 26, 2024
@parthosa parthosa changed the title Improve console output from python tool Improve console output from python tool for failed/gpu/photon event logs Jul 26, 2024
@parthosa parthosa marked this pull request as ready for review July 27, 2024 00:31
Collaborator

@amahussein amahussein left a comment

LGTME!
Thanks @parthosa

Collaborator

@cindyyuanjiang cindyyuanjiang left a comment

Thanks @parthosa! LGTM.

@parthosa parthosa merged commit fa9d11e into NVIDIA:dev Jul 29, 2024
15 checks passed
@parthosa parthosa deleted the spark-rapids-tools-1126 branch July 29, 2024 23:20
@tgravescs
Collaborator

Did this change the output of the top candidate table and where the * and ** definitions appear?

like:

  • *Cluster config recommendations: /Users/psarthi/Work/tools-run/qual_20240726225824_511D59E9/rapids_4_spark_qualification_output/tuning

is now above the table?

@parthosa parthosa added affect-output A change that modifies the output (add/remove/rename files, add/remove/rename columns) and removed affect-output A change that modifies the output (add/remove/rename files, add/remove/rename columns) labels Jul 31, 2024
Labels
bug Something isn't working usability track issues related to the Tools's user experience user_tools Scope the wrapper module running CSP, QualX, and reports (python)
Development

Successfully merging this pull request may close these issues.

[BUG] python user tools should always display processed apps - even if passed GPU event logs
4 participants