[CI][Green-Ray][4] Compute and store unique crash pattern from logs #34200
will it be easier to get this info if we ask the test to print/log this in a more structured form, rather than trying to parse the log?
This logic mainly attempts to capture crash information from the Ray logs (https://docs.ray.io/en/latest/ray-observability/ray-logging.html#logging-directory-structure). I 100% agree that there are observability improvements to make in these logs, and I would love to do more there.
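To make the structured-logging suggestion above concrete, here is a minimal sketch of what it could look like. Everything in it is an assumption for illustration: the `CRASH_RECORD ` prefix, the JSON fields, and the helper names `format_crash_record` and `parse_crash_records` are hypothetical and are not part of Ray or of this PR.

```python
import json
import traceback
from typing import Any, Dict, List

# Hypothetical marker that lets a reader pick crash records out of a noisy log.
PREFIX = "CRASH_RECORD "


def format_crash_record(exc: BaseException) -> str:
    """Serialize an exception as one machine-readable log line."""
    frames = traceback.extract_tb(exc.__traceback__)
    record = {
        "type": type(exc).__name__,
        "message": str(exc),
        # Function names only, so the record stays stable across line edits.
        "frames": [frame.name for frame in frames],
    }
    return PREFIX + json.dumps(record, sort_keys=True)


def parse_crash_records(log: str) -> List[Dict[str, Any]]:
    """Recover every crash record from a log without any stack-trace parsing."""
    records = []
    for line in log.splitlines():
        if line.startswith(PREFIX):
            records.append(json.loads(line[len(PREFIX):]))
    return records
```

With a format like this, the consumer only needs a prefix check and `json.loads`. The trade-off is that it covers only crashes the test itself can catch and log, which is why parsing the existing Ray logs is still useful.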
Signed-off-by: Cuong Nguyen <[email protected]>
Generally looks good, but the stack trace parsing does not seem to be related to the DB reporter. Let's move it into a separate Python module!
release/ray_release/reporter/db.py
def compute_crash_pattern(self, logs: str) -> str:
    stack_trace = self._compute_stack_trace(logs.splitlines())
    return self._compute_signature(stack_trace)[:4000]  # limit of Databricks field
This doesn't really have anything to do with the Databricks reporter. Can we move the crash pattern parsing into a separate python file?
Looks good! Ping me for merge when CI passes
class LogAggregator:
    def __init__(self, log: str):
        self.log = log
Just a nit, but it doesn't look like we actually need a class here (we could just have functions instead) - but fine with me
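For reference, a function-only version of the crash-pattern logic might look like the sketch below. The bodies of `_compute_stack_trace` and `_compute_signature` are not shown in this conversation, so the traceback detection and the masking rules here are assumptions, not the PR's actual implementation. Only the 4000-character limit comes from the reviewed snippet.

```python
import re
from typing import List

# Taken from the reviewed snippet: the Databricks field holds at most 4000 chars.
MAX_PATTERN_LENGTH = 4000


def compute_stack_trace(lines: List[str]) -> List[str]:
    """Return the lines of the first Python traceback in the log, or []."""
    trace: List[str] = []
    in_trace = False
    for line in lines:
        if not in_trace:
            if line.startswith("Traceback (most recent call last):"):
                in_trace = True
                trace.append(line)
            continue
        trace.append(line)
        # Frame lines are indented; the first unindented line is the
        # exception line, which ends the traceback.
        if not line.startswith(" "):
            break
    return trace


def compute_signature(stack_trace: List[str]) -> str:
    """Mask volatile details so that similar crashes map to the same string."""
    text = "\n".join(line.strip() for line in stack_trace)
    text = re.sub(r"0x[0-9a-fA-F]+", "<hex>", text)  # memory addresses
    text = re.sub(r"\d+", "<num>", text)  # line numbers, pids, ports
    return text


def compute_crash_pattern(log: str) -> str:
    """Compose the two steps and truncate to the storage limit."""
    signature = compute_signature(compute_stack_trace(log.splitlines()))
    return signature[:MAX_PATTERN_LENGTH]
```

Plain functions like these hold no state, so they are easy to unit test with small log strings. A class mainly pays off if the log needs to be parsed once and queried several times.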
Why are these changes needed?
This PR computes and aggregates unique crash patterns from logs, then stores them in Databricks. Later, this will let us build a dashboard with a heat map of errors from the aggregated logs and help us prioritize the most impactful errors to fix.
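As a rough illustration of the aggregation step that such a dashboard would rely on, the sketch below counts how often each crash pattern occurs. The function name `aggregate_crash_patterns` and the input shape (one pattern string per test run, empty when nothing crashed) are assumptions; the actual Databricks schema and queries are not shown in this PR.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def aggregate_crash_patterns(patterns: Iterable[str]) -> List[Tuple[str, int]]:
    """Count occurrences of each non-empty pattern, most frequent first."""
    # Runs without a crash are assumed to yield an empty pattern; skip them.
    counts = Counter(pattern for pattern in patterns if pattern)
    return counts.most_common()
```

Sorting by frequency puts the errors that affect the most runs at the top, which is one simple way to rank what to fix first.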
Checks
I've signed off every commit (git commit -s) in this PR.
I've run scripts/format.sh to lint the changes in this PR.