-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MySql source : source binary node transforms to text node #5878
Comments
The issue with BinaryNode is general for all source connectors: after formatting binary data to the Airbyte record, data is stored in BinaryNode, but after |
@irynakruk the source should output the binary data as a base64 or hex-encoded string. Can you elaborate on what's needed, code-wise, to make this change? what's the scope/subtasks? |
@sherifnada I'll investigate and come with an update |
@sherifnada in current implementation binary data is read using
The node, which is inserted, is of type BinaryNode. So, e.x. we have 'test' value in MySql VARBINARY column, created BinaryNode after read contains During serialization it is passed as "dGVzdA==" String value and is inserted to a destination as String value without any additional conversion. Is this an expected behavior based on your question? Or maybe I misunderstood you? |
I think encoding it as b64 is reasonable behavior for the moment. Let's add a note to the data type mappings in the docs and call it good I think. @DoNotPanicUA do you have any thoughts on this? |
@sherifnada,
Note! Of course, the user can decode the values after all, but it doesn't sound like a convenient tool :) In my humble opinion, we should transform all binary values into text by our own inside sources. |
Based on Andrii's comment, if we come up with a decision to transform all binary values into a text, there are two possible ways to do it:
I think, with proper tests which should cover binary datatype for all source JDBC connectors, it'd better to move on with the first option. In case any connector would have an issue with |
just to make sure we are aligned, the desired behavior of the connector is to output JSONs as follows:
this would allow us to use the
So for now, the source should output only base64 string encoding of the binary data, and the destination should be expected to convert it to binary again. @irynakruk @DoNotPanicUA do you think this makes sense? |
I totally agree that it's the best solution, but I'm not sure that we can make it in one dash. I see a few major challenges here:
So, I propose to split this task into three milestones:
@sherifnada |
@DoNotPanicUA this sounds like a great approach. So we will first make sure the source outputs b64 encoding, then change the JSON schema representation to express that the data is binary, then migrate destinations to start handling this correctly. I think we should do it step by step like you proposed Andrii. |
Found during work on #3932
Current Behavior
Initial investigation showed that the data type test gets airbyte messages with binary values as text.
As a result, check
isBinary
returns false and the value contains useless data (not even string version).Note! The issue is not investigated fully. There might be a global issue of binary values.
Expected Behavior
Binary values in Airbyte message data should be in Binnary nodes for proper processing.
The text was updated successfully, but these errors were encountered: