Allow topics and subjects to be different #45

Open
eparreno opened this issue Aug 2, 2018 · 7 comments

Comments

@eparreno
Contributor

eparreno commented Aug 2, 2018

I want to discuss this before creating a PR. You're assuming that topics match subjects, which is the most common approach but not always the case. In our app, for instance, we have one topic per city, and we send the same kind of messages to each of them. Some people do this with partitions, but we chose this approach for our own reasons.

Your implementation assumes that topic and subject are the same. Do you think it would be worth implementing an interface like this?

producer.produce(topic, partition, subject, value, key);

@ricardohbin
Collaborator

ricardohbin commented Aug 6, 2018

@eparreno hi, thanks for your message. Just out of curiosity about the architecture: how many topics/cities are you thinking about? Are they created dynamically? This approach is very particular, and it's also very expensive for the Kafka cluster.

All clients (in every language, from Java to C++) use the same inputs. Don't you agree that if you need this, something is a bit strange?

@rocketraman

Recently, the Java Kafka Avro serializer was changed to allow more flexibility in subject names. This is for good reason: ordering can only be maintained across different message types if they are all on the same topic.

See this blog post: https://www.confluent.io/blog/put-several-event-types-kafka-topic/

They use a "strategy" configuration option that controls how the subject is calculated. The default is TopicNameStrategy, which maps topics to subjects. They also support TopicRecordNameStrategy, which combines the topic and the Avro record type to form the subject, and RecordNameStrategy, which forms the subject from the Avro record type alone.
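To make the difference between the three strategies concrete, here is a minimal JavaScript sketch, assuming Avro schemas are plain JS objects with `name` and optional `namespace` fields. The function names are hypothetical illustrations, not part of kafka-avro's API or the Confluent Java client; they only mirror the subject formats those strategies produce (`<topic>-key`/`<topic>-value`, `<fully-qualified record name>`, and `<topic>-<fully-qualified record name>`).

```javascript
// Hypothetical helpers mirroring the subject formats of Confluent's
// subject name strategies. Not an existing kafka-avro API.

// TopicNameStrategy (the default): "<topic>-key" or "<topic>-value".
function topicNameStrategy(topic, recordName, isKey) {
  return `${topic}-${isKey ? "key" : "value"}`;
}

// RecordNameStrategy: the fully-qualified Avro record name only.
function recordNameStrategy(topic, recordName, isKey) {
  return recordName;
}

// TopicRecordNameStrategy: "<topic>-<fully-qualified record name>".
function topicRecordNameStrategy(topic, recordName, isKey) {
  return `${topic}-${recordName}`;
}

// Fully-qualified Avro name: a dotted name is already qualified;
// otherwise prefix the namespace when there is one.
function fullName(schema) {
  if (schema.name.includes(".") || !schema.namespace) return schema.name;
  return `${schema.namespace}.${schema.name}`;
}

const orderCreated = {
  type: "record",
  name: "OrderCreated",
  namespace: "com.example.orders",
  fields: [],
};
const name = fullName(orderCreated);

console.log(topicNameStrategy("orders-paris", name, false));
// orders-paris-value
console.log(recordNameStrategy("orders-paris", name, false));
// com.example.orders.OrderCreated
console.log(topicRecordNameStrategy("orders-paris", name, false));
// orders-paris-com.example.orders.OrderCreated
```

With RecordNameStrategy, per-city topics such as `orders-paris` and `orders-madrid` would share one subject per record type, which matches the setup described at the start of this issue. With TopicNameStrategy, each topic gets its own subject.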

For compatibility with the overall Kafka ecosystem, I would suggest supporting the same strategies here.

@rocketraman

I need this functionality, so I'll probably work on it in a fork. I believe the logic in this project can be simplified and improved at the same time. Currently, the schema registry code downloads all the schemas in advance and makes assumptions about them based on the subject name, such as which topic a schema is associated with and whether it is a key or value schema. These assumptions are not based on information in the schema registry itself but on how the registry was populated, and they no longer hold in Confluent 4+.

Also, I think a lot of this logic is unnecessary. When deserializing, you don't need to determine the topic name, version, and so forth at all: the schema ID that follows the magic byte can be used to look up the schema directly. There is also no need to read and cache all the schemas at startup; they can be looked up at deserialization time (and cached).
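The deserialization-side idea can be sketched in JavaScript as follows. It relies on the Confluent wire format: byte 0 is the magic byte (`0`), bytes 1 to 4 hold the schema ID as a big-endian 32-bit integer, and the rest is the Avro payload. `parseHeader`, `createSchemaCache`, and the `fetchSchemaById` callback are hypothetical names; in a real client, `fetchSchemaById` would presumably call the registry's `GET /schemas/ids/{id}` endpoint. Avro decoding of the payload itself is left out.

```javascript
// Hypothetical sketch: read the schema ID from the Confluent wire format
// and resolve schemas lazily by ID instead of preloading every subject.

const MAGIC_BYTE = 0;

// Split a framed message into its schema ID and Avro payload.
function parseHeader(buf) {
  if (buf.length < 5 || buf.readUInt8(0) !== MAGIC_BYTE) {
    throw new Error("Not a Confluent-framed Avro message");
  }
  return { schemaId: buf.readInt32BE(1), payload: buf.subarray(5) };
}

// Cache schema lookups by ID. The promise itself is cached, so concurrent
// lookups for the same ID share one request; failed lookups are evicted
// so they can be retried later.
function createSchemaCache(fetchSchemaById) {
  const cache = new Map();
  return function getSchema(id) {
    if (!cache.has(id)) {
      const pending = fetchSchemaById(id).catch((err) => {
        cache.delete(id);
        throw err;
      });
      cache.set(id, pending);
    }
    return cache.get(id);
  };
}

// Usage with a stand-in fetcher that counts registry calls.
let calls = 0;
const getSchema = createSchemaCache(async (id) => {
  calls += 1;
  return { id, type: "string" };
});

const msg = Buffer.concat([Buffer.from([0, 0, 0, 0, 42]), Buffer.from("payload")]);
const { schemaId, payload } = parseHeader(msg);
getSchema(schemaId);
getSchema(schemaId); // served from the cache; no second fetch
console.log(schemaId, payload.toString(), calls); // 42 payload 1
```

Because the ID travels with every message, the consumer never needs to know which subject or topic the schema was registered under.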

The subject naming strategies only need to be used at produce time, to register a new schema under the appropriate subject if it does not already exist in the registry. Since the current code doesn't do this anyway AFAICT, I'll likely punt on this and assume the registry already contains everything it needs. This can be a future enhancement.
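The produce-time side described above might look roughly like this JavaScript sketch: resolve the subject with a pluggable strategy function, obtain a schema ID for that subject (registering the schema if needed), and prepend the wire-format header. `createEncoder`, `frame`, and the `registerSchema` callback are hypothetical; `registerSchema` stands in for a call to the registry's `POST /subjects/{subject}/versions` endpoint, which returns the existing ID when an identical schema is already registered under that subject. The Avro payload is assumed to be encoded already.

```javascript
// Hypothetical sketch of produce-time framing with a pluggable subject strategy.

// Prepend the Confluent wire-format header: magic byte + big-endian schema ID.
function frame(schemaId, avroPayload) {
  const header = Buffer.alloc(5);
  header.writeUInt8(0, 0); // magic byte
  header.writeInt32BE(schemaId, 1); // schema ID
  return Buffer.concat([header, avroPayload]);
}

// subjectStrategy: (topic, recordName, isKey) => subject
// registerSchema: async (subject, schema) => schema ID
function createEncoder({ subjectStrategy, registerSchema }) {
  const ids = new Map(); // cache keyed by subject + schema JSON
  return async function encode(topic, schema, avroPayload, isKey = false) {
    const recordName =
      schema.name.includes(".") || !schema.namespace
        ? schema.name
        : `${schema.namespace}.${schema.name}`;
    const subject = subjectStrategy(topic, recordName, isKey);
    const cacheKey = `${subject}|${JSON.stringify(schema)}`;
    if (!ids.has(cacheKey)) {
      ids.set(cacheKey, await registerSchema(subject, schema));
    }
    return frame(ids.get(cacheKey), avroPayload);
  };
}

// Usage with a TopicRecordNameStrategy-style resolver and a stand-in registry.
const encode = createEncoder({
  subjectStrategy: (topic, recordName) => `${topic}-${recordName}`,
  registerSchema: async (subject, schema) => 42,
});

encode(
  "orders-paris",
  { type: "record", name: "OrderCreated", namespace: "com.example", fields: [] },
  Buffer.from([2])
).then((buf) => console.log(buf)); // <Buffer 00 00 00 00 2a 02>
```

Taking the strategy as a plain function keeps the three standard strategies and any user-supplied resolver interchangeable.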

@ricardohbin
Collaborator

@rocketraman yes, all your points are true: many closed issues (e.g. #41) mention the current strategy of kafka-avro. The CachedSchemaRegistryClient strategy would be a much better implementation, and the plan is to release version 2.0.0 with it.

Unfortunately, I don't have time to create this implementation right now... :/

I'll keep an eye on your fork, by the way; maybe we can use it here.

@rocketraman

> I will stay tuned at your fork btw, maybe we can use it here

Unfortunately, I ran out of time to make this happen and worked around it instead. I look forward to a 2.0 release. Thanks!

@bfncs
Contributor

bfncs commented Mar 18, 2019

I just wanted to second that it would be great to find a way to support this. I think it would make sense to support the three strategies popularized by the Java serializer, and also to make it possible to provide your own schema name resolution function, (topic, subject) => schemaName, for more involved strategies.

I read in #59 that this kind of functionality is planned for a bigger 2.0 rewrite. Does it still make sense to add this to the current codebase beforehand?

@pleszczy
Contributor

pleszczy commented Sep 6, 2019

Hey,
I have implemented TopicRecordNameStrategy, one of the subject name strategies supported by the Confluent Platform, in this PR: https://github.com/waldophotos/kafka-avro/pull/71/files


5 participants