-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ENH] Push logs with grpc error handling #1749
Conversation
Current dependencies on/for this PR: This stack of pull requests is managed by Graphite. |
Reviewer ChecklistPlease leverage this checklist to ensure your code review is thorough before approving Testing, Bugs, Errors, Logs, Documentation
System Compatibility
Quality
|
f39d158
to
e0d8923
Compare
|
||
message PushLogsResponse { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should have error types propagated in the response.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does grpc messages not have error types?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can try to write a test to verify
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does, but we should expand, make sure calling code propagates and add message/status etc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
message Status { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Follow the status pattern used in the rest of coordinator.proto.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Status status = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that is the correct pattern. I think we need to fix that code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either we can:
- Implement the correct change uniformly
- Follow the existing pattern
Fragmentation of standards is not something we should be ok with rn.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cannot fix everything I see. We should always be mindful about making incremental improvements. These improvements will come by priorities not by the fact that we see them. Also, fixing existing SysDB code is needed before we launch the product. I'm sure that we will fix it soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gRPC service definitions should have error handling propagation.
"gorm.io/gorm" | ||
"time" | ||
) | ||
|
||
type recordLogDb struct { | ||
db *gorm.DB |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to initialize db.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is in earlier pr
Record: &recordsContent[index], | ||
}) | ||
} | ||
err := tx.CreateInBatches(recordLogs, len(recordLogs)).Error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to confirm, this will commit if the transaction is successful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test won't pass otherwise
assert.Equal(t, collectionId.String(), *recordLogs[index].CollectionID) | ||
record := &coordinatorpb.SubmitEmbeddingRecord{} | ||
if err := proto.Unmarshal(*recordLogs[index].Record, record); err != nil { | ||
panic(err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should not panic here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a no-panic linter rule? Panic should be rare and we can expect ourselves to call its permissible use out explicitly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
even in tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just mean orthogonally. Not in tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, we should almost never panic. I would just assert.NotNil here.
if err != nil { | ||
return nil, grpcError | ||
} | ||
return nil, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should return a logservicepb.PushLogsResponse struct with proper field values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should return error if it's error case... otherwise your grpc respons will have 200 status code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can go verify but I think the default interceptor is going to ignore the response anyways...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and if we are going to return results in error case, we should be very specific about what we are returning. is the result partial?
recordCount, err := s.logService.PushLogs(ctx, collectionID, recordsContent) | ||
if err != nil { | ||
log.Error("error pushing logs", zap.Error(err)) | ||
return nil, grpcutils.BuildInternalGrpcError("error pushing logs") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should return a logservicepb.PushLogsResponse struct with proper field values.
|
||
message PushLogsResponse { | ||
int32 record_count = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also include a status code? Also, can you double check how GRPC handles golang errors in its response to client?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused... There is a Status in the grpc message itself why do we need to have a Status in the response? shouldn't it be the responsibility for the interceptor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be nice to have our own status on different errors for our service. For example, it is useful in SysDB to pass back to the front end why creating a collection is not successful so that the user can take the right action (e.g. unique constraint violation).
I am not sure how the python side handles the errors raised by the golang GRPC response.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My 2 cents, I think we could have both. I understand the value of having specific error code, but also It seems weird to have it in the response. I think we could use grpc response metadata to propagate the custom error code. Keeping native grpc code would help with monitoring and standard tooling. Putting it in metadata, allow us to set it at the framework lvl, and we don't have to add it in every response.
|
||
message PushLogsResponse { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Follow the status pattern used in the rest of coordinator.proto.
1145424
to
ca7b332
Compare
@HammadB discussed offline. do you still want to request this change? |
You said you were not doing it now - I don't follow what the point of this question is sorry. Do you want me to approve the PR? |
Do you still think this change should block this PR? Did we agree that this change can happen later in a different PR? |
No, we did not agree. But we can agree to disagree :) |
|
Sigh. I never said that we should do it in the same PR - I suggested to
first figure out what we think the right way is with high confidence if how
it is is wrong and then issue a stacked change so we don’t leave things
halfway.
Hammad Bashir
EECS | Cal
…On Wed, Feb 21, 2024 at 10:01 PM Weili Gu ***@***.***> wrote:
gRPC service definitions should have error handling propagation.
@HammadB <https://github.com/HammadB> discussed offline. do you still
want to request this change?
You said you were not doing it now - I don't follow what the point of this
question is sorry. Do you want me to approve the PR?
Do you still think this change should block this PR? Did we agree that
this change can happen later in a different PR?
No, we did not agree. But we can agree to disagree :)
1. Fixing the SysDB gRPC is a separate non-trivial change and should
not be included in this PR
2. I disagree with the other option to implement gRPC protocol in the
wrong way just to make things consistent.
It is your call to disagree with above points and keep the change
request.
—
Reply to this email directly, view it on GitHub
<#1749 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABKW32J34CA27ONUXQOWAZLYU3NL5AVCNFSM6AAAAABDTYDXOSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSNJYG43DCMRTGU>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shipit
chromadb/logservice/grpc/client.py
Outdated
import grpc | ||
|
||
|
||
class GrpcLogService: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion to use the existing log abstractions or somehow rectify this with those.
Error handling for SysDB is covered in https://linear.app/trychroma/issue/CHR-281/sysdb-service-error-handling-and-retry-policy. I can add some notes in this ticket to also fix the Status Code. This does not seem to be a quick fix since we also need to fix front end services and I see this lower priority than getting Log service stand up and running. I agree that having two different implementations is not ideal but I also think we should accept incremental changes and work towards unified implementation. |
eefa7d8
to
b3e082c
Compare
zap.Int64("timestamp", timestamp), | ||
zap.Int("count", len(recordsContent))) | ||
|
||
var recordLogs []*dbmodel.RecordLog |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can make(type, size) if you know the size of the array to avoid append.
log.Error("error building grpc error", zap.Error(err)) | ||
return nil, err | ||
} | ||
return nil, grpcError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should use named return params. To avoid shadowing, and this kind of return
## Description of changes https://linear.app/trychroma/issue/CHR-295/push-log-api - PushLogs implementation in DAO - PushLogs API for gRPC - DB error handling and retry is not included ## Test plan - record_log_service test - record_log test
Description of changes
https://linear.app/trychroma/issue/CHR-295/push-log-api
Test plan
How are these changes tested?