return better structured error logs to connector builder #46963
Conversation
```python
try:
    return response.content.decode("utf-8")
except Exception:
    return None
```
This change allows this method to also properly handle response bodies which are just string values, since that currently causes a JSONDecodeError
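For illustration, a minimal sketch of that fallback behavior (the function name here is illustrative, not the actual CDK method):

```python
from typing import Optional

def decode_response_body(content: bytes) -> Optional[str]:
    # Return the raw body as text; undecodable payloads yield None
    # instead of raising, so a plain-string response body no longer
    # blows up with a JSONDecodeError downstream.
    try:
        return content.decode("utf-8")
    except Exception:
        return None

print(decode_response_body(b"rate limit exceeded"))  # plain-string body survives
print(decode_response_body(b"\xff\xfe"))             # undecodable bytes -> None
```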
The code change itself seems perfectly fine. It does raise a longer-term compatibility concern (the log error message model should map to the protocol better), but that can be solved later. Left a comment on that.
```
@@ -39,6 +39,8 @@ class StreamReadSlices:
class LogMessage:
    message: str
    level: str
    internal_message: Optional[str] = None
```
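For context, a minimal sketch of the dataclass after this change; because the new field defaults to `None`, existing call sites keep working unchanged:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LogMessage:
    message: str
    level: str
    internal_message: Optional[str] = None

# Existing two-argument construction still works:
log = LogMessage(message="request failed", level="ERROR")
print(log.internal_message)  # defaults to None
```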
The trade off here is that since this does not map to the protocol error message model, when we move to using source-declarative-manifest container as manifest runner for Builder, we'd need to make sure that the protocol has these fields, otherwise error reporting will break.
Should we do the work and align the models in this file with the protocol-level models now, or ship the small improvement and record a TODO for later?
/cc @bnchrch
That's a good callout, Natik!
Looking into it, this does already align with the protocol model `AirbyteTraceMessage`.
❓ @lmossman can you confirm I'm reading this right, and this is already a protocol-blessed type? If so, should `LogMessage` become a union type of `AirbyteTraceMessage` and `AirbyteLogMessage`?
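A rough sketch of that suggestion, using simplified stand-ins for the protocol models (the real `AirbyteLogMessage` and `AirbyteTraceMessage` have more fields):

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class AirbyteLogMessage:  # simplified stand-in for the protocol model
    level: str
    message: str

@dataclass
class AirbyteTraceMessage:  # simplified stand-in for the protocol model
    type: str
    emitted_at: float

# The union-type idea: the builder's LogMessage would just be
# either protocol message type, rather than a bespoke class.
LogMessage = Union[AirbyteLogMessage, AirbyteTraceMessage]

msg: LogMessage = AirbyteLogMessage(level="ERROR", message="boom")
```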
> The trade off here is that since this does not map to the protocol error message model, when we move to using source-declarative-manifest container as manifest runner for Builder, we'd need to make sure that the protocol has these fields, otherwise error reporting will break.
This `LogMessage` class is only used in the `message_grouper` logic, which wraps a call to the `read` protocol method.
Even if we swap out the runtime to use source-declarative-manifest to execute the read, we will still need all of this extra `message_grouper` logic around it to create the `StreamRead` output object that the connector builder server expects, and it shouldn't need to be changed, because the `get_message_groups` function already operates on the airbyte protocol and just normalizes those protocol messages into a form that is more easily understood by the connector builder server.
If we don't want to directly execute any python at all anymore, and therefore delete this `message_grouper` logic, then the connector builder server will require a bunch of other changes anyway to replicate what that python code is doing now.
So I think I'd prefer to keep this PR simple right now, especially because I don't think we can easily reference the protocol AirbyteLogMessage and AirbyteTraceMessage in our connector-builder-server and airbyte-server openapi specs, and I'd rather not repeat those entire schemas, as they are more complicated than the LogMessage schema I've defined here.
But let me know if anything I said above sounds off @natikgadzhi @bnchrch !
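A rough sketch of the normalization being described (function and field names are illustrative, not the actual `message_grouper` code):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LogMessage:
    message: str
    level: str
    internal_message: Optional[str] = None

def normalize_trace_message(trace: dict) -> LogMessage:
    # Flatten a protocol-style trace message into the simpler shape
    # the connector builder server consumes.
    error = trace.get("error") or {}
    return LogMessage(
        message=error.get("message", ""),
        level="ERROR",
        internal_message=error.get("internal_message"),
    )

msg = normalize_trace_message(
    {"type": "TRACE", "error": {"message": "boom", "internal_message": "Traceback ..."}}
)
```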
Two questions! Should pass once those are addressed.
```python
# is that this message will be shown in the Builder.
if (
    traced_exception.message is not None
    and "During the sync, the following streams did not sync successfully" in traced_exception.message
```
❓ Can we modify `AbstractSource` to indicate it's a "Final Exception"?
@bnchrch Do you have a specific approach in mind? We could, for example, set `internal_message` to something to indicate this, but that approach would still rely on comparing strings.
I'm wary of raising a different exception in abstract_source here, because I'm not sure what relies on this being an AirbyteTracedException, and I don't feel familiar enough with that fairly central CDK code to know what is safe to change.
I would also add that any raised AirbyteTracedException is sort of already implied to be the final exception. Within `abstract_source.py` ( https://github.com/airbytehq/airbyte/blob/master/airbyte-cdk/python/airbyte_cdk/sources/abstract_source.py#L171-L177 ), during a read of a stream we always try to catch any form of Exception or AirbyteTracedException thrown by a stream and wrap it into an emitted trace message. We only formally raise the "final" AirbyteTracedException because we need to end the process with a non-zero exit code.
Granted, the small gap is if we get an exception outside of that try/except block. But from my perspective, trying to adjust the protocol or create new abstractions to capture what is effectively a work-around feels like over-engineering. The ideal design is that the platform can identify that merely receiving any AirbyteTracedException is enough to know the sync was unsuccessful; then we could get rid of this work-around.
So TL;DR: I think how we do it here is fine.
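The work-around under discussion boils down to a string check; a minimal sketch (the constant and function names are illustrative):

```python
from typing import Optional

FINAL_SYNC_ERROR_SNIPPET = "During the sync, the following streams did not sync successfully"

def is_final_traced_exception(message: Optional[str]) -> bool:
    # Detect the wrap-up exception by string matching, since the protocol
    # has no explicit "this is the final exception" flag.
    return message is not None and FINAL_SYNC_ERROR_SNIPPET in message
```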
Force-pushed from a93d329 to 0200577.
I know there are other questions on the marketplace side, but no concerns from me.
/approve-regression-tests This only changes the connector_builder wrapper around airbyte-cdk, so it shouldn't affect any actual connectors. I have tested this locally against the connector builder (along with the platform changes linked in the PRs in the description).
What
Partially resolves https://github.com/airbytehq/airbyte-internal-issues/issues/6977
This is the first of 3 PRs to have better structured error messages in the Connector Builder.
See the other 2 PRs that build on top of this one:
How
This PR changes how the connector_builder module of the CDK returns trace messages back to the connector builder server in a few key ways:
- Splits `message` and `stacktrace` into separate fields
- Moves the `internal_message` of the trace message to another field in the returned Log
- Filters out the final `AirbyteTracedException`, which is only meant to cause a non-zero return value to result in a sync failure, but does not provide any useful information beyond the other traced exceptions that are already yielded
Testing
Follow the Testing instructions in the description of this PR to test this end-to-end:
Can this PR be safely reverted and rolled back?