[Datasets] Persist Datasets statistics to log file #30557
Signed-off-by: Scott Lee <[email protected]>
Thanks @scottjlee, looks solid, have some comments.
Hi @clarkzinzow and @jianoaix, could you help take a look? Thanks.
So currently, this log is written to the worker log, which is already persisted as a .out file, so in case of job failure we can access the stats from the worker log, right?
d8c09b2 to 7c4db74
"""
# Logger used for logging to the log file (in addition to the root logger,
# which logs to stdout as normal). We set `logger.propagate` to False
# to ensure the file logger only logs to the file, and not stdout, by default.
This documentation seems confusing to me since the class-level comment says it writes to the file in addition to stdout.
Reworded the comments and documentation here and in the class-level comment to clarify. Let me know if anything is still confusing. Thanks!
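The `logger.propagate = False` behavior discussed in this thread can be illustrated with Python's standard `logging` module. This is a minimal sketch, not the PR's code: `make_file_logger`, the logger name, and the temporary log path are hypothetical, and a root handler writing to an `io.StringIO` stands in for the usual console (stdout) handler so its output can be inspected.

```python
import io
import logging
import os
import tempfile


def make_file_logger(name: str, log_path: str) -> logging.Logger:
    """Hypothetical helper: a logger that writes to `log_path` only."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    file_handler = logging.FileHandler(log_path)
    file_handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
    logger.addHandler(file_handler)
    # With propagate=False, records are handled only by this logger's own
    # handlers; handlers on the root logger (e.g. a console handler) never see them.
    logger.propagate = False
    return logger


if __name__ == "__main__":
    # Stand-in for the root logger's usual console handler, so we can inspect it.
    root_output = io.StringIO()
    logging.getLogger().addHandler(logging.StreamHandler(root_output))

    log_path = os.path.join(tempfile.mkdtemp(), "example.log")
    file_logger = make_file_logger("example.file_only", log_path)
    file_logger.info("stats go to the file only")

    with open(log_path) as f:
        print("file:", f.read().strip())  # file: INFO example.file_only: stats go to the file only
    print("root saw:", repr(root_output.getvalue()))  # root saw: ''
```

Flipping `propagate` back to `True` would send the same record to the root logger's handlers as well, which is why a file-only logger needs it set to `False`.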
The failed CI test looks unrelated (see https://flakey-tests.ray.io/).
@clarkzinzow, @jianoaix, any more comments? Thanks!
Why are these changes needed?
Currently, when we print Dataset stats after execution, there is no way to retrieve this information in case of job failure/crash. By persisting the logs to a separate file, we can access the stats, which could be helpful for debugging. By default, this is configured to write to `/logs/ray-data.log`. The new logger, `DatasetLogger`, is configured to always write logs to the `ray-data.log` file, and optionally also writes to stdout (this is enabled by default). The motivation is to let users filter for Dataset logs using the dedicated log file, while still keeping console logs for those who rely on them.
Related issue number
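The behavior described above (always write to a log file, optionally also to stdout, with stdout enabled by default) can be sketched with the standard `logging` module. This is not Ray's `DatasetLogger`: the class name `FileAndStdoutLogger`, its `get_logger(log_to_stdout=...)` method, the `console_stream` parameter, and the temporary log directory are all assumptions for illustration.

```python
import logging
import os
import sys
import tempfile
from typing import Optional, TextIO


class FileAndStdoutLogger:
    """Hypothetical sketch: always log to `<log_dir>/<file_name>`, optionally to stdout too."""

    def __init__(
        self,
        name: str,
        log_dir: str,
        file_name: str = "ray-data.log",
        console_stream: Optional[TextIO] = None,
    ) -> None:
        self.log_path = os.path.join(log_dir, file_name)
        formatter = logging.Formatter("%(asctime)s %(levelname)s %(name)s -- %(message)s")

        # One FileHandler shared by both loggers, so every record lands in the file.
        file_handler = logging.FileHandler(self.log_path)
        file_handler.setFormatter(formatter)
        console_handler = logging.StreamHandler(console_stream or sys.stdout)
        console_handler.setFormatter(formatter)

        # Logger that writes to both the file and the console.
        self._both = logging.getLogger(name)
        # Child logger that writes to the file only.
        self._file_only = logging.getLogger(f"{name}.file_only")

        for logger, handlers in (
            (self._both, [file_handler, console_handler]),
            (self._file_only, [file_handler]),
        ):
            logger.setLevel(logging.INFO)
            for handler in handlers:
                logger.addHandler(handler)
            # Don't pass records up the hierarchy: that would duplicate console
            # output via the root logger (and, for the child, via `self._both`).
            logger.propagate = False

    def get_logger(self, log_to_stdout: bool = True) -> logging.Logger:
        """Return a logger that always writes to the file; stdout is opt-out."""
        return self._both if log_to_stdout else self._file_only


if __name__ == "__main__":
    dataset_logger = FileAndStdoutLogger("data_example", tempfile.mkdtemp())
    dataset_logger.get_logger().info("Dataset stats: ...")  # file and stdout
    dataset_logger.get_logger(log_to_stdout=False).info("written to the file only")
    with open(dataset_logger.log_path) as f:
        print(f.read())
```

Sharing one `FileHandler` between the two loggers keeps a single file as the place to filter for Dataset logs, and setting `propagate = False` on both avoids duplicate console lines coming from the root logger.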
Closes #29575
Checks
I've signed off every commit (`git commit -s`) in this PR.
I've run `scripts/format.sh` to lint the changes in this PR.