
Fix: decode in the case of field missing in both writer binary and reader struct. #452

Merged (3 commits) on Sep 16, 2024

Conversation

redaLaanait (Contributor)

I think I overlooked an edge case the last time I worked on schema evolution:

If a field is missing in both the binary and the reader struct, it should be ignored rather than calling skipDecoder (which technically performs a read and moves the cursor).

The current implementation may perform a dirty read, causing errors like:

- `unexpected EOF`
- `ReadString: size is greater than Config.MaxByteSliceSize`

nrwiersma (Member) previously approved these changes on Sep 16, 2024:

LGTM


nrwiersma merged commit 571d881 into hamba:main on Sep 16, 2024. 12 checks passed.
TheGreatAbyss

Hello, and again, thank you so much for this library

Sorry if there is something I'm totally missing, but is there a way to set the `FieldSetDefault` action as a global config for a schema? Here is my example use case, which fails with `reading int: EOF`:

func TestLeaveMeExampleOfConvertToAvroWithOldSchemaThenDecodeWithNew(t *testing.T) {
	schema, err := avro.Parse(`{
		"type": "record",
		"name": "simple",
		"namespace": "org.hamba.avro",
		"fields": [
			{"name": "a", "type": "int"},
			{"name": "b", "type": "string"},
			{"name": "c", "type": "string"},
			{"name": "d", "type": "string"}
		]
	}`)
	require.NoError(t, err)

	// encode
	// Need to add and remove magic bytes before and after the wire
	buf := bytes.NewBuffer([]byte{})
	enc := avro.NewEncoderForSchema(schema, buf)

	obj2 := map[string]any{
		"a": 3,
		"b": "John",
		"c": "Jill",
		"d": "Jack",
	}

	err = enc.Encode(obj2)
	require.NoError(t, err)
	data := buf.Bytes()
	fmt.Println(data)

	// Decode
	newSchema, err := avro.Parse(`{
		"type": "record",
		"name": "simple",
		"namespace": "org.hamba.avro",
		"fields": [
			{"name": "a", "type": "int"},
			{"name": "b", "type": "string"},
			{"name": "c", "type": "string"},
			{"name": "d", "type": "string"},
			{"name": "e", "type": ["null", "string"], "default": null}
		]
	}`)
	require.NoError(t, err)

	decoder := avro.NewDecoderForSchema(newSchema, buf)
	stringToInterfaceMap := map[string]any{}
	err = decoder.Decode(&stringToInterfaceMap)
	require.NoError(t, err)
	fmt.Println(stringToInterfaceMap)

	expected := map[string]any{
		"a": 3,
		"b": "John",
		"c": "Jill",
		"d": "Jack",
		"e": nil,
	}

	assert.EqualValues(t, expected, stringToInterfaceMap)
}

If I create a composite schema using `SchemaCompatibility`, I do get a schema in which field "e" has an action of `set_to_default`, and that schema decodes correctly:

	schemaCompatibility := avro.NewSchemaCompatibility()
	compositeSchema, err := schemaCompatibility.Resolve(newSchema, schema)
	assert.NoError(t, err)

However, in practice having to use `NewSchemaCompatibility` isn't ideal: I'd like to simply always pull the latest schema and have new fields that were absent from previous versions decode to their defaults. The only workaround I can think of is to always pull both the first and the latest version of a schema so I can build the composite.
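One way to keep that workaround cheap is to resolve each reader/writer pair once and memoize the result. This is a library-agnostic sketch, not hamba/avro API: `resolveFn` is a hypothetical stand-in for `SchemaCompatibility.Resolve`, and the string keys stand in for schema fingerprints or versions.

```go
package main

import (
	"fmt"
	"sync"
)

// resolveFn stands in for a schema-resolution call such as
// SchemaCompatibility.Resolve: it derives a composite schema
// from a reader/writer schema pair. (Hypothetical signature.)
type resolveFn func(reader, writer string) (string, error)

// compositeCache memoizes resolved composites keyed by the
// (reader, writer) pair, so each pair is resolved at most once.
type compositeCache struct {
	mu      sync.Mutex
	resolve resolveFn
	cache   map[[2]string]string
}

func newCompositeCache(r resolveFn) *compositeCache {
	return &compositeCache{resolve: r, cache: map[[2]string]string{}}
}

// Get returns the cached composite for the pair, resolving on first use.
func (c *compositeCache) Get(reader, writer string) (string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	key := [2]string{reader, writer}
	if s, ok := c.cache[key]; ok {
		return s, nil
	}
	s, err := c.resolve(reader, writer)
	if err != nil {
		return "", err
	}
	c.cache[key] = s
	return s, nil
}

func main() {
	calls := 0
	cc := newCompositeCache(func(reader, writer string) (string, error) {
		calls++
		return reader + "+" + writer, nil // placeholder composite
	})
	s1, _ := cc.Get("v2", "v1")
	s2, _ := cc.Get("v2", "v1") // cache hit: resolve not called again
	fmt.Println(s1, s2, calls)  // v2+v1 v2+v1 1
}
```

With a cache like this, the per-message cost is a map lookup rather than a full schema resolution.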

Again sorry if I'm missing something super basic.

Thank You!!

nrwiersma (Member)

What you want does not work. To decode with a reader schema that differs from the writer schema, you must use a composite. Avro and its spec assume the byte data exactly matches the writer schema; nothing in the data itself signals otherwise.

TheGreatAbyss

Thank you for your response.

I was curious how I hadn't run into this before, so I reverted to v2.18.0, and the above test does pass there. Just pointing out that this is a breaking change.

Thanks Again.
