Schema Parsing does not seem to respect versions referenced in Schema Registry #474

Open
TheGreatAbyss opened this issue Oct 23, 2024 · 3 comments

Comments

@TheGreatAbyss

Hello Again,

Since v2.18 the library no longer supports reading data in which the null is not present for a union schema with a default of null. Because of that change, I now need to read both the original schema and the current schema from Schema Registry in order to compile a composite schema.

However I have some declared record types that specify a version. For example:

some record version 1

{
  "type": "record",
  "name": "some_record",
  "fields": [
    {
      "name": "email",
      "type": "string"
    },
    {
      "name": "timestamp",
      "type": "long",
      "logicalType": "timestamp-millis"
    }
  ]
}

some record version 2

{
  "type": "record",
  "name": "some_record",
  "fields": [
    {
      "name": "email",
      "type": "string"
    },
    {
      "name": "timestamp",
      "type": "long",
      "logicalType": "timestamp-millis"
    },
    {
      "name": "received_timestamp",
      "type": [
        "null",
        "long"
      ],
      "default": null,
      "logicalType": "timestamp-millis"
    }
  ]
}

Then I have another record that references some_record:

{
  "type": "record",
  "name": "record_collection",
  "fields": [
    {
      "name": "delivered_at_timestamp",
      "type": [
        "null",
        "long"
      ],
      "default": null,
      "logicalType": "timestamp-millis"
    },
    {
      "name": "events",
      "type": {
        "type": "array",
        "items": "some_record"
      }
    }
  ],
  "default": null
}

record_collection has two versions. The only difference is that the first version references some_record version 1 and the second references version 2. I'm not going to write them both out, but you get the idea; the references section looks like this, with the version being either 1 or 2:

  "references": [
    {
      "name": "some_record",
      "subject": "some_record-value",
      "version": 1|2
    }
  ]

AFAICT the library doesn't follow the version number of the referenced record. Parsing ends up in the function parsePrimitiveType, where it finds the referenced schema in the cache by name alone, regardless of version (or id). Note that the referenced schema is also used directly in my application, so it was already present in the cache.

	default:
		schema := cache.Get(fullName(namespace, s))
		if schema != nil {
			return schema, nil
		}

An obvious but suboptimal workaround is to rebuild my schemas without using reference types.

Is there something I'm missing, such as some configuration I need to set? If not, can this be a feature request?

Thank You Again

- Eric
@TheGreatAbyss
Author

TheGreatAbyss commented Oct 23, 2024

After changing around the order in which I initiate topics, I'm realizing this library doesn't actually reach out to Schema Registry (SR) to resolve referenced schema types; it just depends on them already being present in the cache.

Still, it would be nice for the internal cache to be keyed off id or name+version.
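To make the request concrete, here is a minimal sketch of a cache keyed on name+version. Everything in it (`schemaKey`, `versionedCache`, its `Add` and `Get` methods) is hypothetical and not part of the library, and schemas are held as plain strings so the example has no dependencies.

```go
package main

import "fmt"

// schemaKey is a hypothetical composite key: full type name plus registry version.
// A cache keyed by name alone cannot tell some_record v1 from some_record v2.
type schemaKey struct {
	Name    string
	Version int
}

// versionedCache is an illustrative stand-in for a schema cache.
// Schemas are stored as raw strings to keep the sketch dependency-free.
type versionedCache struct {
	schemas map[schemaKey]string
}

func newVersionedCache() *versionedCache {
	return &versionedCache{schemas: make(map[schemaKey]string)}
}

// Add stores a schema under its name and version.
func (c *versionedCache) Add(name string, version int, schema string) {
	c.schemas[schemaKey{Name: name, Version: version}] = schema
}

// Get returns the schema registered for exactly this name and version.
func (c *versionedCache) Get(name string, version int) (string, bool) {
	s, ok := c.schemas[schemaKey{Name: name, Version: version}]
	return s, ok
}

func main() {
	cache := newVersionedCache()
	cache.Add("some_record", 1, "some_record v1 schema")
	cache.Add("some_record", 2, "some_record v2 schema")

	// Each lookup names the version it wants, so both versions can coexist.
	for _, v := range []int{1, 2} {
		if s, ok := cache.Get("some_record", v); ok {
			fmt.Printf("version %d -> %s\n", v, s)
		}
	}
}
```

With a key like this, both versions of some_record could live in the cache at once, and a lookup would only succeed for the version a reference actually names.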

@nrwiersma
Member

The Schema Parser has no concept of the registry or its references. The registry client would need to set up a specific schema cache, parse all of the references first, then parse the record. Needless to say, this is not happening in the registry at this point. The issue here, though, is in the registry, not the schema parser. This, however, should not be wildly difficult to implement.
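As a rough illustration of that order of operations (and not the library's actual implementation), the sketch below resolves references depth-first into a cache created fresh for each top-level schema. The types `reference`, `registered`, `subjectVersion` and the function `resolve` are hypothetical, the registry is simulated with an in-memory map, and the point where a real parser would be called is represented by returning the schema text.

```go
package main

import "fmt"

// reference mirrors one entry of the "references" list shown earlier in the thread.
type reference struct {
	Name    string
	Subject string
	Version int
}

// registered is a hypothetical stand-in for what a registry returns
// for one subject version: the schema text plus its references.
type registered struct {
	Schema     string
	References []reference
}

type subjectVersion struct {
	Subject string
	Version int
}

// resolve loads every referenced schema into cache (type name -> schema text)
// before returning the schema that uses them. A real implementation would
// parse each schema against the cache here. There is no cycle detection.
func resolve(reg map[subjectVersion]registered, sv subjectVersion, cache map[string]string) (string, error) {
	entry, ok := reg[sv]
	if !ok {
		return "", fmt.Errorf("subject %q version %d not found", sv.Subject, sv.Version)
	}
	for _, ref := range entry.References {
		refSchema, err := resolve(reg, subjectVersion{Subject: ref.Subject, Version: ref.Version}, cache)
		if err != nil {
			return "", err
		}
		cache[ref.Name] = refSchema
	}
	return entry.Schema, nil
}

func main() {
	reg := map[subjectVersion]registered{
		{Subject: "some_record-value", Version: 1}: {Schema: "some_record v1"},
		{Subject: "some_record-value", Version: 2}: {Schema: "some_record v2"},
		{Subject: "record_collection-value", Version: 1}: {
			Schema:     "record_collection v1",
			References: []reference{{Name: "some_record", Subject: "some_record-value", Version: 1}},
		},
		{Subject: "record_collection-value", Version: 2}: {
			Schema:     "record_collection v2",
			References: []reference{{Name: "some_record", Subject: "some_record-value", Version: 2}},
		},
	}

	for v := 1; v <= 2; v++ {
		// A fresh cache per top-level schema keeps the two versions apart.
		cache := map[string]string{}
		schema, err := resolve(reg, subjectVersion{Subject: "record_collection-value", Version: v}, cache)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s uses %s\n", schema, cache["some_record"])
	}
}
```

Because each top-level schema gets its own cache, the name some_record can map to version 1 for one record_collection and to version 2 for the other without the parser needing to know about versions at all.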

@TheGreatAbyss
Author

That would be awesome. Thank You!
