Allow topics and subjects to be different #45

eparreno · 2018-08-02T15:41:40Z

I'd like to discuss this before creating a PR. You're assuming that topics match subjects, which is the most common approach but not always the case. In our app, for instance, we have one topic per city, and all of them carry the same kind of messages. Some people do this with partitions, but we chose this approach for our own reasons.

In your implementation, topic and subject are assumed to be the same. Do you think it would be worth implementing an interface like:

producer.produce(topic, partition, subject, value, key);


ricardohbin · 2018-08-06T20:21:59Z

@eparreno hi, thanks for your message. Just out of curiosity about the architecture: how many topics/cities are you thinking about? Are they created dynamically? This approach is quite unusual, and it's also very expensive for the Kafka cluster.

All clients (in every language, from Java to C++) use the same inputs. Don't you agree that if you need this, something is a bit strange?

rocketraman · 2018-08-16T07:17:43Z

Recently, changes were made to Kafka Avro serialization on the Java side that allow more flexibility in subject names. There is good reason for this: ordering can only be maintained across different message types if they are all on the same topic.

See this blog post: https://www.confluent.io/blog/put-several-event-types-kafka-topic/

They use a "strategy" configuration option that controls how the subject is calculated. The default, TopicNameStrategy, maps topics to subjects. They also support TopicRecordNameStrategy, which combines the topic and the Avro record type to form the subject, and RecordNameStrategy, which forms the subject from the Avro record type alone.
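The three strategies differ only in how the subject string is built. The sketch below assumes the subject formats used by the Java serializer (`<topic>-key` / `<topic>-value`, `<topic>-<fully.qualified.RecordName>`, and `<fully.qualified.RecordName>`); `subjectName` is a hypothetical helper for illustration, not part of kafka-avro.

```javascript
// Hypothetical helper: compute a registry subject the way the Java
// serializer's subject name strategies are assumed to do it.
function subjectName(strategy, topic, recordFullName, isKey) {
  switch (strategy) {
    case 'TopicNameStrategy':
      // One subject per topic, split into key and value subjects.
      return `${topic}-${isKey ? 'key' : 'value'}`;
    case 'TopicRecordNameStrategy':
      // Several record types per topic, each versioned on its own.
      return `${topic}-${recordFullName}`;
    case 'RecordNameStrategy':
      // Subject depends only on the record type, shared across topics.
      return recordFullName;
    default:
      throw new Error(`Unknown subject name strategy: ${strategy}`);
  }
}

console.log(subjectName('TopicRecordNameStrategy', 'orders', 'com.example.Order', false));
```

With TopicRecordNameStrategy, a single topic can carry several record types in order while each type keeps its own compatibility history.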

For compatibility with the overall Kafka ecosystem, I would suggest supporting the same here.

rocketraman · 2018-08-16T17:31:29Z

I need this functionality, so I'll probably work on it in a fork. I believe the logic used in this project can be simplified and improved at the same time. Currently the schema registry code downloads all the schemas in advance and makes assumptions about them based on the subject name, such as which topic a schema is associated with and whether it is a key or value schema. These assumptions are based not on information in the schema registry itself but on how the registry was populated, and they no longer hold in Confluent 4+.

Also, I think a lot of this logic is unnecessary: when deserializing, you don't need to determine the topic name, version, and so forth at all, because the schema ID in each message's header (right after the magic byte) can be used to look up the schema directly. There is also no need to read and cache all the schemas at startup; they can simply be looked up at deserialization time (and cached).
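A minimal sketch of that lookup path, assuming the Confluent wire format (byte 0 is a magic byte equal to 0, bytes 1 to 4 are a big-endian schema ID, the rest is the Avro body). `fetchSchemaById` is a hypothetical stand-in for a registry call such as `GET /schemas/ids/{id}`; the cache stores the pending promise so concurrent messages with the same ID trigger only one fetch.

```javascript
const MAGIC_BYTE = 0;

// Split a Confluent-framed message into schema ID and Avro payload.
function parseHeader(buf) {
  if (buf.length < 5 || buf.readUInt8(0) !== MAGIC_BYTE) {
    throw new Error('Not a Confluent-framed Avro message');
  }
  return {
    schemaId: buf.readInt32BE(1), // 4-byte big-endian schema ID
    payload: buf.subarray(5),     // remaining bytes: Avro binary body
  };
}

// Lazily fetch schemas by ID and cache them (hypothetical fetcher).
function makeSchemaCache(fetchSchemaById) {
  const cache = new Map(); // schemaId -> Promise<schema>
  return function getSchema(id) {
    if (!cache.has(id)) {
      const pending = Promise.resolve(fetchSchemaById(id)).catch((err) => {
        cache.delete(id); // do not cache failures; allow a retry later
        throw err;
      });
      cache.set(id, pending);
    }
    return cache.get(id);
  };
}
```

Nothing here depends on the topic or subject name, which is why the deserializer does not need a subject naming strategy at all.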

The subject naming strategies are only needed at produce time, to register a new schema under the appropriate subject if it does not already exist in the registry. Since the current code doesn't do this anyway AFAICT, I'll likely punt on this and assume the registry already contains everything it needs. This can be a future enhancement.
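The produce-time side could look roughly like this sketch. `registry.register(subject, schemaJson)` is a hypothetical async client method (the Confluent REST endpoint it would wrap is `POST /subjects/{subject}/versions`, which returns the existing ID when the schema is already registered under that subject), `resolveSubject` is any subject naming function, and Avro encoding of the record itself is assumed to happen elsewhere.

```javascript
// Hypothetical encoder: resolve the subject, ensure the schema is
// registered under it, then prepend the 5-byte Confluent header.
function makeEncoder(registry, resolveSubject) {
  const idCache = new Map(); // `${subject}\n${schemaJson}` -> Promise<id>

  function schemaIdFor(subject, schemaJson) {
    const key = `${subject}\n${schemaJson}`;
    if (!idCache.has(key)) {
      idCache.set(key, registry.register(subject, schemaJson));
    }
    return idCache.get(key);
  }

  return async function encode(topic, schemaJson, recordFullName, avroBytes, isKey = false) {
    const subject = resolveSubject(topic, recordFullName, isKey);
    const id = await schemaIdFor(subject, schemaJson);
    const header = Buffer.alloc(5);
    header.writeUInt8(0, 0);    // magic byte
    header.writeInt32BE(id, 1); // schema ID, big-endian
    return Buffer.concat([header, avroBytes]);
  };
}
```

Caching by subject plus schema text means each distinct pair hits the registry once per process.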

ricardohbin · 2018-08-17T22:43:43Z

@rocketraman yes, everything you said is true: there are many closed issues (e.g. #41) that mention the current strategy of kafka-avro. The CachedSchemaRegistryClient strategy would be a much better implementation, and the plan is to release version 2.0.0 with it.

Unfortunately, I don't have time to create this implementation right now... :/

I'll keep an eye on your fork, by the way; maybe we can use it here.

rocketraman · 2018-08-18T13:57:43Z

> I will stay tuned at your fork btw, maybe we can use it here

Unfortunately, I ran out of time to make this happen, and worked around it. I'll look forward to a 2.0 release. Thanks!

bfncs · 2019-03-18T11:02:57Z

I just wanted to second this: it would be great to find a way to support it. I think it would make sense to support the three strategies that the Java serializer popularized, as well as to allow providing your own schema name resolution function, (topic, subject) => schemaName, for more involved strategies.
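One way such an option could be dispatched, as a sketch only: `resolveStrategy` and the idea of passing either a strategy name or a function are hypothetical, and the built-in formats are assumed to match the Java serializer's. The custom example assumes a made-up naming convention in which per-city topics like `orders.madrid` all share the `orders` subjects.

```javascript
// Built-in strategies, keyed by the Java class names (assumed formats).
const BUILT_IN = new Map([
  ['TopicNameStrategy', (topic, record, isKey) => `${topic}-${isKey ? 'key' : 'value'}`],
  ['TopicRecordNameStrategy', (topic, record) => `${topic}-${record}`],
  ['RecordNameStrategy', (topic, record) => record],
]);

// Accept a built-in strategy name or a custom (topic, record, isKey) => subject function.
function resolveStrategy(option = 'TopicNameStrategy') {
  if (typeof option === 'function') return option;
  const fn = BUILT_IN.get(option);
  if (!fn) throw new Error(`Unknown subject name strategy: ${option}`);
  return fn;
}

// Hypothetical custom strategy: every city topic maps to one shared subject.
const perCity = resolveStrategy(
  (topic, record, isKey) => `${topic.split('.')[0]}-${isKey ? 'key' : 'value'}`
);
console.log(perCity('orders.madrid', 'com.example.Order', false));
```

A `Map` is used instead of a plain object so that names like `toString` are not accidentally resolved to inherited properties.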

I read in #59 that this kind of functionality is planned for a bigger 2.0 rewrite. Does it still make sense to add this to the current codebase beforehand?

pleszczy · 2019-09-06T14:02:20Z

Hey,
I have implemented TopicRecordNameStrategy, one of the subject name strategies supported by the Confluent platform, in this PR: https://github.com/waldophotos/kafka-avro/pull/71/files
