Fix prediction CSV files for multiple qual directories #1267

leewyang · 2024-08-07T18:50:28Z

This PR fixes an open TODO item. Currently, if the path passed to --qual_output contains more than one qual tool output directory, the code loops over those directories, making predictions and saving various CSV files (e.g. per_app.csv, per_sql.csv, shap_values.csv) in the xgboost_predictions output folder. Unfortunately, these files are overwritten on each iteration of the loop, so they end up holding data for only the last directory. Note, however, that the final dataset_summaries contains the full, concatenated results of all iterations, so only these CSV files were impacted.

This PR combines the qual tool output directories into a single prediction "dataset", so the various debugging files now contain data for all qual tool output directories found in --qual_output. This has the side-benefit of speeding up prediction in these cases. If the user wants individual results per qual tool output directory, they can still invoke the spark_rapids prediction command for each of those directories to produce one output directory per input directory.
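To illustrate the difference, here is a minimal sketch, not the actual qualx code. The function names, the `predict_rows` stand-in, and the row layout are hypothetical; only the `per_app.csv` file name comes from this PR.

```python
import csv
import os


def predict_rows(qual_dir):
    # Hypothetical stand-in for the real prediction step:
    # returns one row per qual tool output directory.
    return [{"qual_dir": os.path.basename(qual_dir), "speedup": "1.0"}]


def write_csv(path, rows):
    # Opening with mode "w" truncates any existing file at this path.
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["qual_dir", "speedup"])
        writer.writeheader()
        writer.writerows(rows)


def predict_per_directory(qual_dirs, output_dir):
    # Old behavior (sketch): every iteration rewrites the same file,
    # so only the last directory's rows survive.
    for qual_dir in qual_dirs:
        write_csv(os.path.join(output_dir, "per_app.csv"), predict_rows(qual_dir))


def predict_combined(qual_dirs, output_dir):
    # New behavior (sketch): gather rows from every directory first,
    # then write the file once.
    rows = []
    for qual_dir in qual_dirs:
        rows.extend(predict_rows(qual_dir))
    write_csv(os.path.join(output_dir, "per_app.csv"), rows)


def count_rows(path):
    # Count the data rows (excluding the header) in a CSV file.
    with open(path, newline="") as f:
        return len(list(csv.DictReader(f)))
```

With three input directories, the per-directory version would leave a single data row in per_app.csv, while the combined version would leave three.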

I have confirmed that the final prediction output matches that of the prior version of the code (aside from ordering), while the CSV files now contain the full, expected data.

Test

The following commands have been tested.

External Usage:

spark_rapids prediction

Internal Usage:

python qualx_main.py predict

Signed-off-by: Lee Yang <[email protected]>

parthosa

Thanks @leewyang. I tested the changes using the command:

spark_rapids prediction --qual_output test_dir

Directory Structure:

test_dir
├── qual_20240807213120_796364d1
├── qual_20240807213202_bdee1EF3
├── qual_20240807213609_FBbF7EBC

The tool now correctly generates the values for each app, whereas previously it overwrote the files and kept the results for only the last one.

However, the appName seems to be inconsistent.

In features.csv and per_sql.csv, appName is test_dir
In per_app.csv and prediction.csv, appName is qual_2024xxx
In the previous version, all four files had appName as qual_2024xxx

The actual appName values (from qualification_summary.csv) should be:

NDS - query72 for qual_20240807213120_796364d1
Databricks Shell for qual_20240807213202_bdee1EF3
NDS - Power Run for qual_20240807213609_FBbF7EBC

We can fix this bug in a separate PR if needed.
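A mismatch like this can be spotted with a small script. The following is only a sketch: the helper names are hypothetical, and it assumes each CSV file has an `appName` column, as described above.

```python
import csv
import os


def app_names(csv_path, column="appName"):
    # Collect the distinct values of the appName column in one CSV file.
    with open(csv_path, newline="") as f:
        return {row[column] for row in csv.DictReader(f)}


def inconsistent_files(output_dir, file_names, expected):
    # Return a mapping of file name to found appName values for every
    # file whose values differ from the expected set.
    mismatches = {}
    for name in file_names:
        found = app_names(os.path.join(output_dir, name))
        if found != expected:
            mismatches[name] = found
    return mismatches
```

Calling `inconsistent_files` with the four file names and the set of appName values from qualification_summary.csv would list any file that reports something else.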

leewyang · 2024-08-07T23:07:53Z

@parthosa Thanks for catching that. I was actually trying to match the behavior of the current code, but it should be simpler to just keep the original appName (e.g. NDS - query72), so I'll try to make that change.

Signed-off-by: Lee Yang <[email protected]>

parthosa

Thanks @leewyang for the fix. LGTM.

fix prediction CSV files for multiple qual directories

1f33e9e

Signed-off-by: Lee Yang <[email protected]>

leewyang self-assigned this Aug 7, 2024

leewyang requested a review from parthosa August 7, 2024 18:50

leewyang added the user_tools Scope the wrapper module running CSP, QualX, and reports (python) label Aug 7, 2024

parthosa reviewed Aug 7, 2024

parthosa added the bug Something isn't working label Aug 7, 2024

use original appNames

2a67bb3

Signed-off-by: Lee Yang <[email protected]>

leewyang requested a review from parthosa August 8, 2024 00:04

reset appName earlier

488196a

Signed-off-by: Lee Yang <[email protected]>

parthosa approved these changes Aug 8, 2024

leewyang merged commit 3cf66ce into NVIDIA:dev Aug 8, 2024
14 checks passed

leewyang deleted the qualx_predict_csv branch August 8, 2024 17:04
