
Fix: decode in the case of field missing in both writer binary and reader struct. #452

Merged (3 commits) on Sep 16, 2024

Conversation

redaLaanait (Contributor)

I think I overlooked an edge case the last time I worked on schema evolution:

If a field is missing in both the binary and the reader struct, it should be ignored rather than calling skipDecoder (which technically performs a read and moves the cursor).

The current implementation may perform a dirty read, causing errors like:

- `unexpected EOF`
- `ReadString: size is greater than Config.MaxByteSliceSize`

nrwiersma (Member) previously approved these changes on Sep 16, 2024:

LGTM


nrwiersma merged commit 571d881 into hamba:main on Sep 16, 2024. 12 checks passed.
TheGreatAbyss

Hello, and again, thank you so much for this library

Sorry if there is something I'm totally missing, but is there a way to set the `FieldSetDefault` action as a global config for a schema? Here is my example use case, which fails with `reading int: EOF`:

func TestLeaveMeExampleOfConvertToAvroWithOldSchemaThenDecodeWithNew(t *testing.T) {
	schema, err := avro.Parse(`{
		"type": "record",
		"name": "simple",
		"namespace": "org.hamba.avro",
		"fields": [
			{"name": "a", "type": "int"},
			{"name": "b", "type": "string"},
			{"name": "c", "type": "string"},
			{"name": "d", "type": "string"}
		]
	}`)
	require.NoError(t, err)

	// encode
	// Need to add and remove magic bytes before and after the wire
	buf := bytes.NewBuffer([]byte{})
	enc := avro.NewEncoderForSchema(schema, buf)

	obj2 := map[string]any{
		"a": 3,
		"b": "John",
		"c": "Jill",
		"d": "Jack",
	}

	err = enc.Encode(obj2)
	require.NoError(t, err)
	data := buf.Bytes()
	fmt.Println(data)

	// Decode
	newSchema, err := avro.Parse(`{
		"type": "record",
		"name": "simple",
		"namespace": "org.hamba.avro",
		"fields": [
			{"name": "a", "type": "int"},
			{"name": "b", "type": "string"},
			{"name": "c", "type": "string"},
			{"name": "d", "type": "string"},
			{"name": "e", "type": ["null", "string"], "default": null}
		]
	}`)
	require.NoError(t, err)

	decoder := avro.NewDecoderForSchema(newSchema, buf)
	stringToInterfaceMap := map[string]any{}
	err = decoder.Decode(&stringToInterfaceMap)
	require.NoError(t, err)
	fmt.Println(stringToInterfaceMap)

	expected := map[string]any{
		"a": 3,
		"b": "John",
		"c": "Jill",
		"d": "Jack",
		"e": nil,
	}

	assert.EqualValues(t, expected, stringToInterfaceMap)
}

If I create a composite schema using `SchemaCompatibility`, I do get a schema in which field "e" has an action of `set_to_default`, and that schema decodes correctly:

	schemaCompatibility := avro.NewSchemaCompatibility()
	compositeSchema, err := schemaCompatibility.Resolve(newSchema, schema)
	assert.NoError(t, err)

However, in practice having to use `NewSchemaCompatibility` isn't ideal: I'd like to simply always pull the latest schema and have new fields that were absent from previous versions decode to their defaults. The only workaround I can think of is to always pull both the first and the latest version of a schema so I can build the composite.
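One way to keep that workaround cheap is to resolve each reader/writer pair once and memoize the result. This is a library-agnostic sketch, not hamba/avro API: `resolveFn` is a hypothetical stand-in for `SchemaCompatibility.Resolve`, and the string keys stand in for schema fingerprints or versions.

```go
package main

import (
	"fmt"
	"sync"
)

// resolveFn stands in for a schema-resolution call such as
// SchemaCompatibility.Resolve: it derives a composite schema
// from a reader/writer schema pair. (Hypothetical signature.)
type resolveFn func(reader, writer string) (string, error)

// compositeCache memoizes resolved composites keyed by the
// (reader, writer) pair, so each pair is resolved at most once.
type compositeCache struct {
	mu      sync.Mutex
	resolve resolveFn
	cache   map[[2]string]string
}

func newCompositeCache(r resolveFn) *compositeCache {
	return &compositeCache{resolve: r, cache: map[[2]string]string{}}
}

// Get returns the cached composite for the pair, resolving on first use.
func (c *compositeCache) Get(reader, writer string) (string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	key := [2]string{reader, writer}
	if s, ok := c.cache[key]; ok {
		return s, nil
	}
	s, err := c.resolve(reader, writer)
	if err != nil {
		return "", err
	}
	c.cache[key] = s
	return s, nil
}

func main() {
	calls := 0
	cc := newCompositeCache(func(reader, writer string) (string, error) {
		calls++
		return reader + "+" + writer, nil // placeholder composite
	})
	s1, _ := cc.Get("v2", "v1")
	s2, _ := cc.Get("v2", "v1") // cache hit: resolve not called again
	fmt.Println(s1, s2, calls)  // v2+v1 v2+v1 1
}
```

With a cache like this, the per-message cost is a map lookup rather than a full schema resolution.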

Again sorry if I'm missing something super basic.

Thank You!!

nrwiersma (Member)

What you want does not work. To decode with a reader schema that differs from the writer schema, you must use a composite. Avro and its spec assume the byte data exactly matches the writer schema; nothing in the data itself signals otherwise.

TheGreatAbyss

Thank you for your response.

I was curious how I hadn't run into this before, so I reverted to v2.18.0, and the above test does pass there. Just pointing out that this is a breaking change.

Thanks Again.
