@lmossman or @benmoriceau From the git blame it seems you two know the context of the code I'm going to change. Could you take a look at this ticket and tell me whether this proposal makes sense and is doable? I'd appreciate any comments before I actually start working on it. Thanks a lot!
A closer look showed that the workflow issue was already handled correctly. Each underlying issue needs to be resolved individually. The previous PR is sufficient to close this issue for now.
The error rates on the following APIs are slightly higher than desired, according to the Datadog monitoring page:
A closer look at the traces suggests that some of them might not come from a server error. For example:
/v1/source_oauths/complete_oauth failed because java.io.IOException: Undefined 'code' from consent redirected url. (failed oauth?)
/v1/web_backend/connections/get failed because Duplicate key [deals] (attempted merging values {"type":["null","array"],"items":{"type":["null","string"]}} and {"type":["null","string"]}) (invalid schema maybe?)
/v1/connections/reset failed because Could not find job with id: -1 (has the job ever existed?)
/v1/scheduler/sources/check_connection failed due to com.google.api.gax.rpc.InvalidArgumentException: io.grpc.StatusRuntimeException: INVALID_ARGUMENT: Secret Payload cannot be empty. (maybe invalid auth info from customer?)
/v1/connections/sync failed because a sync was already running at that moment. (can be reproduced by opening the same workspace in two tabs and clicking sync on both pages)
For some errors we could validate the state before running the workflow, such as checking whether a sync is already running. For others, such as validating oauth, we need to actually start the workflow and rely on the result.
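To make the first case concrete, here is a minimal Java sketch of checking for a running sync before starting the workflow. It is only an illustration under assumptions: `SyncPreCheck`, `SyncAlreadyRunningException`, and `startSync` are hypothetical names that do not exist in the Airbyte codebase, and the in-memory set stands in for whatever job state the scheduler actually tracks.

```java
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: none of these names come from the Airbyte codebase.
final class SyncPreCheck {

    /** Hypothetical exception the server could map to a 4xx response instead of a 500. */
    static final class SyncAlreadyRunningException extends RuntimeException {
        SyncAlreadyRunningException(UUID connectionId) {
            super("A sync is already running for connection " + connectionId);
        }
    }

    // Assumption: an in-memory set replaces the real lookup of running jobs.
    private final Set<UUID> runningSyncs = ConcurrentHashMap.newKeySet();

    /** Rejects the request up front if a sync is already running, otherwise runs the workflow. */
    void startSync(UUID connectionId, Runnable workflow) {
        // add() returns false when the id is already present, so check and mark happen atomically.
        if (!runningSyncs.add(connectionId)) {
            throw new SyncAlreadyRunningException(connectionId);
        }
        try {
            workflow.run();
        } finally {
            // Always clear the marker so a later sync can start.
            runningSyncs.remove(connectionId);
        }
    }
}
```

With a check like this, the two-tab scenario would be rejected before the workflow starts, and the rejection carries a dedicated exception type rather than a generic runtime error.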
It seems that we forward errors from the worker as IllegalStateException, which later gets translated into a 500 error code: https://github.com/airbytehq/airbyte/blob/master/airbyte-server/src/main/java/io/airbyte/server/handlers/SchedulerHandler.java#L384
In other places we simply throw a RuntimeException: https://github.com/airbytehq/airbyte/blob/master/airbyte-scheduler/client/src/main/java/io/airbyte/scheduler/client/DefaultSynchronousSchedulerClient.java#L129-L131
We should classify them into the correct error types so that we maintain a healthy monitoring environment and can spot server-side errors more easily.
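One possible shape for this classification, sketched in Java, is shown here. Everything in it is an assumption for illustration: `ErrorClassification`, `KnownApiException`, and its subclasses are hypothetical names, and the status codes chosen for each case are a suggestion, not what the server currently returns. The sketch walks the cause chain because worker errors may arrive wrapped in an IllegalStateException or RuntimeException.

```java
// Hypothetical sketch: these types are not part of the Airbyte codebase.
final class ErrorClassification {

    /** Base type for failures that are known not to be server faults. */
    static class KnownApiException extends RuntimeException {
        final int httpStatus;

        KnownApiException(int httpStatus, String message) {
            super(message);
            this.httpStatus = httpStatus;
        }
    }

    /** E.g. an oauth redirect that carries no 'code' parameter. */
    static final class BadRequestException extends KnownApiException {
        BadRequestException(String message) { super(400, message); }
    }

    /** E.g. a reset that refers to a job id that does not exist. */
    static final class NotFoundException extends KnownApiException {
        NotFoundException(String message) { super(404, message); }
    }

    /** E.g. a sync requested while another sync is still running. */
    static final class ConflictException extends KnownApiException {
        ConflictException(String message) { super(409, message); }
    }

    private ErrorClassification() {}

    /** Returns the status of the first known exception in the cause chain, or 500 if there is none. */
    static int toHttpStatus(Throwable error) {
        for (Throwable t = error; t != null; t = t.getCause()) {
            if (t instanceof KnownApiException) {
                return ((KnownApiException) t).httpStatus;
            }
        }
        // Anything unclassified is still treated as a genuine server error.
        return 500;
    }
}
```

The point of the design is that only unclassified failures count toward the 5xx error rate, so the monitoring page reflects real server-side problems.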
@davinchia FYI since we have discussed this :)
Implementation goals: