-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Datasets] Print out Dataset statistics automatically after execution #29876
Conversation
Congrats @scottjlee for the first PR! As discussed offline, let's do several things:
|
Signed-off-by: Scott Lee <[email protected]>
Signed-off-by: Scott Lee <[email protected]>
Signed-off-by: Scott Lee <[email protected]>
Signed-off-by: Scott Lee <[email protected]>
Signed-off-by: Scott Lee <[email protected]> Signed-off-by: Scott Lee <[email protected]>
Signed-off-by: Scott Lee <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @scottjlee! This looks quite solid.
Signed-off-by: Scott Lee <[email protected]>
Signed-off-by: Scott Lee <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Signed-off-by: Scott Lee <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall, just a few nits!
Signed-off-by: Scott Lee <[email protected]>
Signed-off-by: Scott Lee <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, LGTM!
CI looks good, merging! |
…ray-project#29876) Currently, the stats can only be printed manually via the Dataset.stats() method from user side. This is inconvenient during debugging as it adds overhead on the user side. We add functionality to automatically log Dataset summary statistics after execution of each stage, which can be enabled via a new configuration in DatasetContext, enable_auto_log_stats. Signed-off-by: Weichen Xu <[email protected]>
Why are these changes needed?
Currently, the stats can only be printed manually via the Dataset.stats() method from user side. This is inconvenient during debugging as it adds overhead on the user side. We add functionality to automatically log
Dataset
summary statistics after execution of each stage, which can be enabled via a new configuration inDatasetContext
,enable_auto_log_stats
.Related issue number
Closes #29560
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.