Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(library): add error field truncating #231

Merged
merged 6 commits into from
Feb 16, 2022
Merged

Conversation

ecrupper
Copy link
Contributor

@ecrupper ecrupper commented Jan 31, 2022

Fixes Issue 468

On rare occasions, the error generated by a step exceeds our database limit of 500 characters. When this happens, the build hangs and does not complete. This PR makes sure we don't set the error field to a value we can't support in the database.

@ecrupper ecrupper requested a review from a team as a code owner January 31, 2022 19:40
@ecrupper ecrupper self-assigned this Jan 31, 2022
@codecov
Copy link

codecov bot commented Jan 31, 2022

Codecov Report

Merging #231 (1ecd2a6) into master (00b58d8) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #231   +/-   ##
=======================================
  Coverage   97.06%   97.06%           
=======================================
  Files          53       53           
  Lines        5794     5797    +3     
=======================================
+ Hits         5624     5627    +3     
  Misses        125      125           
  Partials       45       45           
Impacted Files Coverage Δ
library/hook.go 100.00% <ø> (ø)
database/build.go 100.00% <100.00%> (ø)

Copy link
Member

@cognifloyd cognifloyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which part of the error is most likely to include the useful bits? In python, I always want the end of the traceback. I'm still too new to go to say for sure.

What if you took the first and last 250 characters, dropping everything in the middle?

@ecrupper
Copy link
Contributor Author

Which part of the error is most likely to include the useful bits? In python, I always want the end of the traceback. I'm still too new to go to say for sure.

What if you took the first and last 250 characters, dropping everything in the middle?

I like this idea. Most errors contain the vital information in the beginning or end.

@kneal
Copy link

kneal commented Feb 1, 2022

Also, wonder if in addition to character limits if we should just support more characters for the field in the database. I don't think it would be super harmful to allow 1000 characters.

@GregoryDosh
Copy link
Contributor

Is it possible to place the truncation code to the "last responsible moment" so that within any libraries or other bits of code that want to use the full error, they can, but when we go to place the data into the database we're only saving the important bits? I don't have context into if you've already explored that, but could reduce the code duplication and not preemptively truncate before we need to do so.

Otherwise it looks good and I don't see any obvious issues. Thanks for your contribution!

@@ -24,6 +24,9 @@ const (
// BuildTimeoutDefault defines the default value in minutes for repo build timeout.
BuildTimeoutDefault = 30

// ErrorLimit defines the maximum size in characters for resource error fields.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, one small thing, this comment is helpful, but it doesn't let anyone know what happens if the limit is exceeded. They'd have to dig into the code to understand that half of the beginning and half of the end are kept. It might be beneficial for future folks looking in the codebase to understand the intent of that and what happens if it gets exceeded.

@ecrupper
Copy link
Contributor Author

ecrupper commented Feb 2, 2022

I relocated these changes to be within the crop function of the build database object. It seems like that's where it belongs. I decided to leave out hook, step, and service for now. When trying to reproduce the error from the issue with just the build crop updated, it seemed to work just fine.

GregoryDosh
GregoryDosh previously approved these changes Feb 2, 2022
Copy link
Contributor

@GregoryDosh GregoryDosh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch! This all looks good to me.

@@ -27,6 +27,8 @@ const (
maxTitleLength = 1000
// Maximum message field length.
maxMessageLength = 2000
// Maximum error field length.
maxErrorLength = 500
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @kneal 👍

We should explore expanding the number of chars we allow for the error field across all resources.

Doubling it from 500 -> 1000 sounds like a fair place to start 😃

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's currently a somewhat rare occurrence (afaik) so handling the outliers this way instead of having the outliers shape the data size seems fine to me. do we expect this be more commonplace?

Copy link

@kneal kneal Feb 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My personal view would be I don't think that field has ever changed in size. So, maybe this is showing errors that could be growing in size. I also don't think 1000 is that big of a concern and then would save us having to parse errors and hopefully not cut off the important part. If we had to go beyond 1000 then I think truncating could be useful.

I'm also not sure if we use error wrapping everywhere in the worker/runtime to make these errors longer so depending how errors are handled or want to be handled long term in that codebase 1000 might be the norm but I honestly don't know. The error: field in the build resource is entirely for admins because thats where we populate a platform error. So, cutting off the data really only hurts ourselves for richer research.

We should explore expanding the number of chars we allow for the error field across all resources.

@jbrockopp probably not a bad idea since I don't think that's ever been done. I assume most of these char lengths are there original defaults.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good callout on error wrapping! in other words, it has good chances of becoming larger. I'm good with bumping to 1000 then 👍🏼

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if we bump the limit to 1000, do we still want to crop the value if it exceeds that limit? Or do we have another idea for an approach to that situation?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only other idea I have would be compression if we wanna keep the field small but keep the full error messages.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you support compressing the other fields currently being processed by the crop method? (message and title)

Copy link

@kneal kneal Feb 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know the answer to that one. I think for error it might make sense to compress because that field could be quite variable depending on how the error wrapping works.

Title and Message might make sense to crop or inherit because those come from the SCM system attached and I would assume GitHub has limits on those. So, it might make sense to just mirror those limits.

Copy link
Contributor

@jbrockopp jbrockopp Feb 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ecrupper

So if we bump the limit to 1000, do we still want to crop the value if it exceeds that limit? Or do we have another idea for an approach to that situation?

I'm onboard with the idea of a "simple" cropping strategy similar to title and message.

In the future, if this becomes a problem with the 1000 char limit, then we should introduce compression.

cc: @kneal @wass3r

Comment on lines 81 to 86
if len(b.Error.String) > maxErrorLength {
front := maxErrorLength - (maxErrorLength / 2)
end := len(b.Error.String) - (maxErrorLength / 2)
str := b.Error.String[:front] + b.Error.String[end:]
b.Error = sql.NullString{String: str, Valid: true}
}
Copy link
Contributor

@jbrockopp jbrockopp Feb 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure we're taking the right approach here? 😅

Personally, I'd rather start with increasing the number of characters before going down this route 👍

If we feel that's insufficient, then we should use compression for the error field.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand not wanting to process string logic like this, but I am curious as to why the message and title fields get cropped in a similar fashion?

Also, I do wonder if it may be more efficient to handle larger errors by referencing the log from which it came rather than attempt to store some truncated version.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To your point, I think we might put the errors in STDOUT but I can't be sure about that. A big advantage to why we added the error field was to track platform errors that occur in the runtime and have them uniquely identified vs the user error in a container.

Having the field exposed here also in the past made it easier for admins to do their own support because they didn't have to go digging through logs to find the error or crash log. They could just see the information directly in the build and troubleshoot that way.

Copy link
Contributor

@jbrockopp jbrockopp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ecrupper ecrupper merged commit 0ee0f1e into master Feb 16, 2022
@ecrupper ecrupper deleted the fix/error-field-limiting branch February 16, 2022 16:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Service that throws error with more than 500 characters doesn't update build state
6 participants