Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] Push logs with grpc error handling #1749

Merged
merged 16 commits into from
Feb 23, 2024
Merged

[ENH] Push logs with grpc error handling #1749

merged 16 commits into from
Feb 23, 2024

Conversation

weiligu
Copy link
Contributor

@weiligu weiligu commented Feb 21, 2024

Description of changes

https://linear.app/trychroma/issue/CHR-295/push-log-api

  • PushLogs implementation in DAO
  • PushLogs API for gRPC
  • DB error handling and retry is not included

Test plan

How are these changes tested?

  • record_log_service test
  • record_log test

Copy link
Contributor Author

weiligu commented Feb 21, 2024

Current dependencies on/for this PR:

This stack of pull requests is managed by Graphite.

Copy link

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of a unexpectedly high quality (Readability, Modularity, Intuitiveness)

@weiligu weiligu changed the title push logs with error handling, need to add arrow push logs with error handling Feb 21, 2024
@weiligu weiligu changed the base branch from fix_system_test to main February 21, 2024 21:04
@weiligu weiligu marked this pull request as ready for review February 21, 2024 21:09

message PushLogsResponse {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should have error types propagated in the response.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does grpc messages not have error types?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can try to write a test to verify

Copy link
Collaborator

@HammadB HammadB Feb 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does, but we should expand, make sure calling code propagates and add message/status etc

https://cloud.google.com/apis/design/errors#error_model

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

message Status {

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow the status pattern used in the rest of coordinator.proto.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that is the correct pattern. I think we need to fix that code

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either we can:

  1. Implement the correct change uniformly
  2. Follow the existing pattern

Fragmentation of standards is not something we should be ok with rn.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot fix everything I see. We should always be mindful about making incremental improvements. These improvements will come by priorities not by the fact that we see them. Also, fixing existing SysDB code is needed before we launch the product. I'm sure that we will fix it soon.

Copy link
Collaborator

@HammadB HammadB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gRPC service definitions should have error handling propagation.

@weiligu weiligu changed the title push logs with error handling push logs with grpc error handling Feb 21, 2024
"gorm.io/gorm"
"time"
)

type recordLogDb struct {
db *gorm.DB
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to initialize db.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is in earlier pr

Record: &recordsContent[index],
})
}
err := tx.CreateInBatches(recordLogs, len(recordLogs)).Error
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to confirm, this will commit if the transaction is successful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test won't pass otherwise

assert.Equal(t, collectionId.String(), *recordLogs[index].CollectionID)
record := &coordinatorpb.SubmitEmbeddingRecord{}
if err := proto.Unmarshal(*recordLogs[index].Record, record); err != nil {
panic(err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should not panic here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a no-panic linter rule? Panic should be rare and we can expect ourselves to call its permissible use out explicitly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

even in tests?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just mean orthogonally. Not in tests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, we should almost never panic. I would just assert.NotNil here.

if err != nil {
return nil, grpcError
}
return nil, err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should return a logservicepb.PushLogsResponse struct with proper field values.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should return error if it's error case... otherwise your grpc respons will have 200 status code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can go verify but I think the default interceptor is going to ignore the response anyways...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and if we are going to return results in error case, we should be very specific about what we are returning. is the result partial?

recordCount, err := s.logService.PushLogs(ctx, collectionID, recordsContent)
if err != nil {
log.Error("error pushing logs", zap.Error(err))
return nil, grpcutils.BuildInternalGrpcError("error pushing logs")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should return a logservicepb.PushLogsResponse struct with proper field values.


message PushLogsResponse {
int32 record_count = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also include a status code? Also, can you double check how GRPC handles golang errors in its response to client?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused... There is a Status in the grpc message itself why do we need to have a Status in the response? shouldn't it be the responsibility for the interceptor

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be nice to have our own status on different errors for our service. For example, it is useful in SysDB to pass back to the front end why creating a collection is not successful so that the user can take the right action (e.g. unique constraint violation).

I am not sure how the python side handles the errors raised by the golang GRPC response.

Copy link
Contributor

@nicolasgere nicolasgere Feb 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My 2 cents, I think we could have both. I understand the value of having specific error code, but also It seems weird to have it in the response. I think we could use grpc response metadata to propagate the custom error code. Keeping native grpc code would help with monitoring and standard tooling. Putting it in metadata, allow us to set it at the framework lvl, and we don't have to add it in every response.

@HammadB HammadB changed the title push logs with grpc error handling [ENH] Push logs with grpc error handling Feb 21, 2024

message PushLogsResponse {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow the status pattern used in the rest of coordinator.proto.

@weiligu
Copy link
Contributor Author

weiligu commented Feb 22, 2024

gRPC service definitions should have error handling propagation.

@HammadB discussed offline. do you still want to request this change?

@HammadB
Copy link
Collaborator

HammadB commented Feb 22, 2024

gRPC service definitions should have error handling propagation.

@HammadB discussed offline. do you still want to request this change?

You said you were not doing it now - I don't follow what the point of this question is sorry. Do you want me to approve the PR?

@weiligu
Copy link
Contributor Author

weiligu commented Feb 22, 2024

gRPC service definitions should have error handling propagation.

@HammadB discussed offline. do you still want to request this change?

You said you were not doing it now - I don't follow what the point of this question is sorry. Do you want me to approve the PR?

Do you still think this change should block this PR? Did we agree that this change can happen later in a different PR?

@HammadB
Copy link
Collaborator

HammadB commented Feb 22, 2024

gRPC service definitions should have error handling propagation.

@HammadB discussed offline. do you still want to request this change?

You said you were not doing it now - I don't follow what the point of this question is sorry. Do you want me to approve the PR?

Do you still think this change should block this PR? Did we agree that this change can happen later in a different PR?

No, we did not agree. But we can agree to disagree :)

@weiligu
Copy link
Contributor Author

weiligu commented Feb 22, 2024

gRPC service definitions should have error handling propagation.

@HammadB discussed offline. do you still want to request this change?

You said you were not doing it now - I don't follow what the point of this question is sorry. Do you want me to approve the PR?

Do you still think this change should block this PR? Did we agree that this change can happen later in a different PR?

No, we did not agree. But we can agree to disagree :)

  1. Fixing the SysDB gRPC is a separate non-trivial change and should not be included in this PR
  2. I disagree with the other option to implement gRPC protocol in the wrong way just to make things consistent.
    It is your call to disagree with above points and keep the change request.

@HammadB
Copy link
Collaborator

HammadB commented Feb 22, 2024 via email

Copy link
Collaborator

@HammadB HammadB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shipit

import grpc


class GrpcLogService:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion to use the existing log abstractions or somehow rectify this with those.

@weiligu
Copy link
Contributor Author

weiligu commented Feb 22, 2024

Sigh. I never said that we should do it in the same PR - I suggested to first figure out what we think the right way is with high confidence if how it is is wrong and then issue a stacked change so we don’t leave things halfway. Hammad Bashir EECS | Cal

On Wed, Feb 21, 2024 at 10:01 PM Weili Gu @.> wrote: gRPC service definitions should have error handling propagation. @HammadB https://github.com/HammadB discussed offline. do you still want to request this change? You said you were not doing it now - I don't follow what the point of this question is sorry. Do you want me to approve the PR? Do you still think this change should block this PR? Did we agree that this change can happen later in a different PR? No, we did not agree. But we can agree to disagree :) 1. Fixing the SysDB gRPC is a separate non-trivial change and should not be included in this PR 2. I disagree with the other option to implement gRPC protocol in the wrong way just to make things consistent. It is your call to disagree with above points and keep the change request. — Reply to this email directly, view it on GitHub <#1749 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABKW32J34CA27ONUXQOWAZLYU3NL5AVCNFSM6AAAAABDTYDXOSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSNJYG43DCMRTGU . You are receiving this because you were mentioned.Message ID: @.>

Error handling for SysDB is covered in https://linear.app/trychroma/issue/CHR-281/sysdb-service-error-handling-and-retry-policy. I can add some notes in this ticket to also fix the Status Code. This does not seem to be a quick fix since we also need to fix front end services and I see this lower priority than getting Log service stand up and running. I agree that having two different implementations is not ideal but I also think we should accept incremental changes and work towards unified implementation.

zap.Int64("timestamp", timestamp),
zap.Int("count", len(recordsContent)))

var recordLogs []*dbmodel.RecordLog
Copy link
Contributor

@nicolasgere nicolasgere Feb 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can make(type, size) if you know the size of the array to avoid append.

log.Error("error building grpc error", zap.Error(err))
return nil, err
}
return nil, grpcError
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should use named return params. To avoid shadowing, and this kind of return

@weiligu weiligu merged commit e61ac08 into main Feb 23, 2024
117 checks passed
atroyn pushed a commit to csbasil/chroma that referenced this pull request Apr 3, 2024
## Description of changes
https://linear.app/trychroma/issue/CHR-295/push-log-api
- PushLogs implementation in DAO
- PushLogs API for gRPC
- DB error handling and retry is not included

## Test plan
- record_log_service test
- record_log test
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants