Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remote registry client does not map application errors #4392

Closed
dmartinol opened this issue Aug 8, 2024 · 16 comments · Fixed by #4458
Closed

Remote registry client does not map application errors #4392

dmartinol opened this issue Aug 8, 2024 · 16 comments · Fixed by #4458

Comments

@dmartinol
Copy link
Contributor

Expected Behavior

A feast application has a registry of type remote connected to a registry server.
The feast client invokes a an API on the feast registry that thrown an error, e.g. store.get_feature_service("foo") which raises
FeatureServiceNotFoundException from the registry server because foo is not a registered feature service.

The expectation is that the method invoked by the client raises the FeatureServiceNotFoundException and the application can detect it with:

try:
    fs = store.get_feature_service("foo")
except FeatureServiceNotFoundException
    ....

Current Behavior

The registy server always raises an _InactiveRpcError error instead.

This bug causes unexpected errors when the client invokes services whose implementation needs such expections to execute fall-back logic, like _get_feature_views_to_materialize.

Steps to reproduce

See Expected Behavior section.

Specifications

  • Version: 0.40.0
  • Platform: any
  • Subsystem: registry server

Possible Solution

The registry server can wrap the original exception in an error status message that the remote registry client can catch and translate into the original exception:

@tokoko
Copy link
Collaborator

tokoko commented Aug 8, 2024

@dmartinol (This might be more relevant for online store/offline store, but...) Do you think the goal here should be to forward all exceptions or only ones defined by feast? If we forward all, remote implementations will in a way become indistinguishable from concrete implementations, but on the other hand I think we might have a hard time raising exact same external exceptions on the client side when some of those libraries might not even be present in the client environment.

I think we should aim to "re-raise" feast exceptions, but suppress all others behind some new exception type.

@dmartinol
Copy link
Contributor Author

@tokoko Yes, the problem is broader than described in the issue. Ideally, we should remap all exceptions across all servers to ensure the client code remains agnostic of the underlying implementation. Not an easy task, also considering the number of exception that have been modelled, but worth to invest some time.
IMO, If we find a general way to "serialize" and "deserialize" it'd be great, at least for feast-exceptions and some core errors like ValueError and of course PermissionError.

@tokoko tokoko self-assigned this Aug 24, 2024
@dmartinol
Copy link
Contributor Author

@tokoko what about to tackle this one, maybe stating from the registry server?
any idea/pattern for how to transfer and rebuild the exceptions? (I wanted to write "serialize and deserialize", but it could be misleading)

@tokoko
Copy link
Collaborator

tokoko commented Aug 27, 2024

@dmartinol I haven't looked into it too much, but I agree, "serialize and deserialize" is definitely the wrong way to go about it. I'm thinking, we should use richer grpc error messages (don't know what that means, but sounds suitable) to transfer two string values (exception type and exception message). I think we should start by limiting transferred exceptions exclusively to feast exception types and reassess later if there are any other core exceptions that also need to be treated in a similar fashion.

One reason for not trying to pass any more details (trace + other exceptions) is that at least in case of a feature server I think we should assume that a possible future server implementation might not be python-based, so binding error information to python too much can be a headache.

@dmartinol
Copy link
Contributor Author

Agree on the error model, that has to be minimalistic and compatible with all the communication protocols used by the Feast servers. Are you already working on this issue or do you want any of us to contribute?

@tokoko
Copy link
Collaborator

tokoko commented Aug 27, 2024

No, not working on it. Feel free to take it up, let me unassign myself.

@tokoko tokoko removed their assignment Aug 27, 2024
@dmartinol
Copy link
Contributor Author

Looked at the grpc options, and I think we have an int value with a predefined status code that we cannot customize, and a detail label.
Because of the limitations above, we can try to send a JSON in the detail label with all data required to rebuild the original error, WDYT?

...
except Exception as e:
    m = {
       "type": f"{type(e).__name__}",
       "message": f"{str(e)}"
    }
    context.abort(
        grpc.StatusCode.INTERNAL,
        json.dumps(m),
    )
...

To optimize it, we can limit the except to detect only the desired errors (Feast errors + selected builtins) and also try to map the code to a meaningful value,at least when it makes sense (e.g. PermissionError can be grpc.StatusCode.PERMISSION_DENIED and so on)
Applying a similar "serialization" could help also the other comm protocols (REST and Arrow).

@dmartinol
Copy link
Contributor Author

started draft PR #4458 to verify we are in sync.
@tmihalac @lokeshrangineni once we validate the design we can proceed in parallel for REST and Arrow servers.

@tokoko
Copy link
Collaborator

tokoko commented Aug 28, 2024

@dmartinol yeah, json sounds good to me, certainly better than a single concatenated string.

btw, we should probably create a FeastException type and rewrite all existing exceptions to extend it. That way we would be able to simply do except FeastException as e in the try block.

@dmartinol
Copy link
Contributor Author

I was thinking the same, and can also have a to_grpc_status_code and to_http_status_code method to override instead of a global function.
BUT, some errors are not in the feast package. Should we add a Feast version of builtins.PermissionError?

@tokoko
Copy link
Collaborator

tokoko commented Aug 28, 2024

yeah, I think so. We should switch to FeastPermissionException (or Error... whatever convention existing exceptions use) that will extend builtins.PermissionError.

@dmartinol
Copy link
Contributor Author

or Error... whatever convention existing exceptions use)

Some numbers: 8 Feast classes named like *Exception and 22 like *Error. Considering that .*Error seems the python standard name pattern, I would go with FeastError and, of course FeastPermissionErrror (see also PEP 8)

@tokoko
Copy link
Collaborator

tokoko commented Aug 29, 2024

Agreed

@dmartinol
Copy link
Contributor Author

@tokoko Can we keep this open to apply the same error mapping to the remaining servers, or do you prefer dedicated GH issues instead?

@tokoko
Copy link
Collaborator

tokoko commented Aug 30, 2024

dedicated issues would be better imho, they are almost completely different problems to solve from implementation perspective.

@dmartinol
Copy link
Contributor Author

@tmihalac @lokeshrangineni could you pls raise server-specific issues to track the error mapping problem?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants