Allow topics and subjects to be different #45
Comments
@eparreno hi, thanks for your message. Just curious about the architecture: how many topics/cities are you thinking about? Are they created dynamically? This approach is quite unusual, and it is also very expensive for the Kafka cluster. All clients (in all languages, from Java to C++) use the same inputs. Don't you agree that if you need this, something is a bit strange?
Recently, changes were made for Kafka Avro serialization on the Java side that allow for more flexibility in subject names, for good reason, since ordering can only be maintained across different message types if they are all on the same topic. See this blog post: https://www.confluent.io/blog/put-several-event-types-kafka-topic/ They use a "strategy" configuration option which controls how the subject is calculated. The default is For compatibility with the overall Kafka ecosystem, I would suggest supporting the same here.
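For illustration, here is a minimal sketch of how the three subject naming strategies from the Java serializer could be expressed in JavaScript. The function names, the `strategies` map, and `resolveSubject` are hypothetical and not part of this project's API; the strategy is always passed explicitly, so no default is assumed.

```javascript
// Hypothetical sketch: names below are illustrative, not this library's API.
// Each strategy maps (topic, isKey, recordName) to a schema registry subject.

// TopicNameStrategy: "<topic>-key" or "<topic>-value".
function topicNameStrategy(topic, isKey, recordName) {
  return `${topic}-${isKey ? 'key' : 'value'}`;
}

// RecordNameStrategy: the fully qualified record name, independent of topic.
function recordNameStrategy(topic, isKey, recordName) {
  return recordName;
}

// TopicRecordNameStrategy: "<topic>-<fully qualified record name>".
function topicRecordNameStrategy(topic, isKey, recordName) {
  return `${topic}-${recordName}`;
}

const strategies = {
  TopicNameStrategy: topicNameStrategy,
  RecordNameStrategy: recordNameStrategy,
  TopicRecordNameStrategy: topicRecordNameStrategy,
};

// Accepts either a built-in strategy name or a user-supplied function,
// so callers can plug in their own subject resolution logic.
function resolveSubject(strategy, topic, isKey, recordName) {
  const fn = typeof strategy === 'function' ? strategy : strategies[strategy];
  if (!fn) {
    throw new Error(`unknown subject naming strategy: ${strategy}`);
  }
  return fn(topic, isKey, recordName);
}
```

With something like this, several record types can share one topic (each with its own subject), and several topics can share one subject.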
I need this functionality, so I'm probably going to work on this in a fork. I believe the logic used in this project can be simplified and improved at the same time. Currently the schema registry code downloads all the schemas in advance and makes assumptions about them based on the subject name: which topic the schema is associated with, whether it is a key or value schema, and so on. These assumptions are not based on the information in the schema registry itself but on how it was populated, and they are now incorrect in Confluent 4+. Also, I think a lot of this logic is unnecessary. When deserializing you don't need to determine the topic name, version, and so forth at all: the schema ID that follows the magic byte in each message can be used to look up the schema directly. There is also no need to read and cache all the schemas at startup; they can just be looked up at deserialization time (and cached). The subject naming strategies only need to be used at produce time, to register a new schema under the appropriate subject if it does not already exist in the registry. Since the current code doesn't do this anyway AFAICT, I'll likely just punt on this and assume the registry contains everything it needs. This can be a future enhancement.
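A rough sketch of the lookup-at-deserialization idea described above, assuming the standard Confluent wire format (one magic byte `0`, a 4-byte big-endian schema ID, then the Avro payload). `parseHeader`, `SchemaCache`, and the injected `fetchSchemaById` callback are hypothetical names, not this project's code; in practice the callback would likely call the registry's `GET /schemas/ids/{id}` endpoint.

```javascript
// Hypothetical sketch: lazy schema lookup keyed by the ID in each message.
const MAGIC_BYTE = 0;

// Splits a Confluent-framed message into its schema ID and Avro payload.
function parseHeader(message) {
  if (!Buffer.isBuffer(message) || message.length < 5) {
    throw new Error('message too short for the Confluent wire format');
  }
  if (message.readUInt8(0) !== MAGIC_BYTE) {
    throw new Error(`unexpected magic byte: ${message.readUInt8(0)}`);
  }
  return {
    schemaId: message.readInt32BE(1), // bytes 1..4, big-endian
    payload: message.subarray(5),     // remaining bytes are the Avro body
  };
}

// Caches one promise per schema ID so each schema is fetched at most once,
// even when many messages with the same ID arrive concurrently.
class SchemaCache {
  constructor(fetchSchemaById) {
    this.fetchSchemaById = fetchSchemaById; // assumed: (id) => schema or Promise<schema>
    this.cache = new Map();
  }

  get(schemaId) {
    if (!this.cache.has(schemaId)) {
      const pending = Promise.resolve(this.fetchSchemaById(schemaId)).catch((err) => {
        this.cache.delete(schemaId); // do not cache failures; allow a retry
        throw err;
      });
      this.cache.set(schemaId, pending);
    }
    return this.cache.get(schemaId);
  }
}
```

Nothing here depends on the topic or subject name, which is the point: on the consume side the message itself says which schema to use.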
@rocketraman yes, all the points you made are true: there are many closed issues (e.g. #41) that mention the current strategy of Unfortunately, I don't have time to create this implementation right now... :/ I will keep an eye on your fork btw, maybe we can use it here.
Unfortunately, I ran out of time to make this happen, and worked around it. I'll look forward to a 2.0 release. Thanks!
I just wanted to second that it would be great to find a way to support this. I think it would make sense to support the three strategies that the Java serializer popularized, as well as making it possible to provide your own schema name resolution function. I read in #59 that this kind of functionality is planned for a bigger 2.0 rewrite. Does it still make sense to add this to the current codebase beforehand?
Hey,
I just want to discuss this before creating a PR. You're assuming that topics match subjects, which is the most common approach but is not always the case. In our app, for instance, we have one topic per city, and we send the same kind of messages to each of them. Some people do this with partitions, but we chose this approach for several reasons.
Your implementation assumes that the topic and the subject are the same. Do you think it would be worthwhile to implement an interface like
producer.produce(topic, partition, subject, value, key);
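To make the proposal concrete, here is a minimal sketch of what a `produce` call with an explicit subject might do internally. `makeProducer`, the `schemas` map, the `send` callback, and the topic and subject names are all hypothetical, and JSON stands in for real Avro encoding; only the argument order is taken from the call above.

```javascript
// Hypothetical sketch: the schema is chosen by subject, not derived from topic.
// `schemas` maps subject -> { id, encode(value) -> Buffer } (assumed shape).
// `send` is an injected transport: (topic, partition, message, key) => any.
function makeProducer(schemas, send) {
  return {
    produce(topic, partition, subject, value, key) {
      const schema = schemas.get(subject);
      if (!schema) {
        throw new Error(`no schema registered for subject "${subject}"`);
      }
      // Confluent framing: magic byte 0, then the 4-byte big-endian schema ID.
      const header = Buffer.alloc(5);
      header.writeUInt8(0, 0);
      header.writeInt32BE(schema.id, 1);
      const message = Buffer.concat([header, schema.encode(value)]);
      return send(topic, partition, message, key);
    },
  };
}

// Usage: two per-city topics share one subject (JSON stands in for Avro).
const schemas = new Map([
  ['ride-value', { id: 42, encode: (v) => Buffer.from(JSON.stringify(v)) }],
]);
const sent = [];
const producer = makeProducer(schemas, (topic, partition, message, key) => {
  sent.push({ topic, partition, message, key });
});

producer.produce('rides-madrid', 0, 'ride-value', { fare: 12 }, 'ride-1');
producer.produce('rides-london', 0, 'ride-value', { fare: 15 }, 'ride-2');
```

Both messages carry the same schema ID in their header even though they go to different topics, which is the behaviour being asked for.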