Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong detection of magic byte during avro deserialization when the first field is empty #1479

Closed
tszappanos opened this issue May 4, 2021 · 7 comments
Labels
type/bug Something isn't working

Comments

@tszappanos
Copy link

In the case of BINARY encoding, at the empty value of the first field, the serializer inserts 0 into the first Byte and then the deserializer evaluates it as a magic byte and ends with an error.
The first byte should always be used for the magic byte. Even if it is not a value of 0.
Serializer code (AbstractKafkaSerializer) is not compatible:
if (headersHandler != null && headers != null) { headersHandler.writeHeaders(headers, schema.toArtifactReference()); serializeData(headers, parsedSchema, data, out); } else { out.write(MAGIC_BYTE); getIdHandler().writeId(schema.toArtifactReference(), out); serializeData(parsedSchema, data, out); }
with the deserializer (AbstractKafkaDeserializer) code
if (data[0] == MAGIC_BYTE){ return deserialize(topic, data); } else if (headers == null){ throw new IllegalStateException("Headers cannot be null"); } else { ArtifactReference artifactReference = headersHandler.readHeaders(headers);

Schema example (optional field):
"fields": [{ "name": "alternativeName", "type": ["null", { "type": "string", "avro.java.string": "String", "pattern": "^. {0,100} $" }], "doc": "", "default": null }

@famarting
Copy link
Contributor

so, if I understood correctly, because of using a schema that allows to have an empty value in the first field... the deserializers incorrectly understand that as the magic byte and try to do the deserialization as if the Artifact Global ID was after that first byte.

I agree this is an issue.

Btw, the reason why this is happening now is because now (after Apicurio Registry 2.0.0.Final) the serializers use the headers to write the Artifact Global ID by default. This way in your usecase, the Global ID is in the message headers, so there is no magic byte in the message, and because of your schema allowing to have empty values in the first field , that ends up with the deserializer getting confused and thinking your messages are not using the headers to pass the Artifact Global ID while they are actually using it.

I have to think of a fix for this, meanwhile I think disabling the headers in the serializer will solve your issue (https://github.com/Apicurio/apicurio-registry/blob/master/serdes/serde-common/src/main/java/io/apicurio/registry/serde/SerdeConfig.java#L151) @tszappanos

@famarting famarting added the type/bug Something isn't working label May 4, 2021
@tszappanos
Copy link
Author

so, if I understood correctly, because of using a schema that allows to have an empty value in the first field... the deserializers incorrectly understand that as the magic byte and try to do the deserialization as if the Artifact Global ID was after that first byte.

I agree this is an issue.

Btw, the reason why this is happening now is because now (after Apicurio Registry 2.0.0.Final) the serializers use the headers to write the Artifact Global ID by default. This way in your usecase, the Global ID is in the message headers, so there is no magic byte in the message, and because of your schema allowing to have empty values in the first field , that ends up with the deserializer getting confused and thinking your messages are not using the headers to pass the Artifact Global ID while they are actually using it.

I have to think of a fix for this, meanwhile I think disabling the headers in the serializer will solve your issue (https://github.com/Apicurio/apicurio-registry/blob/master/serdes/serde-common/src/main/java/io/apicurio/registry/serde/SerdeConfig.java#L151) @tszappanos

thanks for the workaround, it looks like it works fine.

@EricWittmann
Copy link
Member

@famartinrh Do you want to keep this issue open until we have a fix? Or is the workaround sufficient for this use-case?

@famarting
Copy link
Contributor

the workaround could be sufficient, but maybe it's better to know from the users if they really wanted to use the headers to transmit the globalId, wdyt @tszappanos is the workaround sufficient?

@tszappanos
Copy link
Author

@famartinrh , after a deeper analysis, we found that we need to correct the behavior of globalids in the headers. The workaround is not enough. When could this bugfix be ready, please? thanks.

@EricWittmann
Copy link
Member

@tszappanos Fabian just submitted a PR to fix this and it has been merged into both master and 2.0.x. We'll be doing a 2.0.1.Final release this week (maybe today). The fix will be in that release.

@EricWittmann
Copy link
Member

Fixed in #1573

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants