-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: introduce Format interface #4437
Conversation
8309e29
to
5ca39a6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @agavra
Few thoughts below. I see you've just pushed a new change so I'll check that out too.
public abstract class AbstractFormat implements Format { | ||
|
||
@Override | ||
public String toString() { | ||
return name(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're adding a abstract base class just to overload toString
? Yuck! Just call name()
rather than use toString()
.
This is even more true considering we're to support pluggable formats: we can't rely on other people correctly overloading toString
, so we shouldn't rely on it.
* in ksqlDB. The builtin formats are specified in the {@link Formats} | ||
* class. | ||
*/ | ||
public interface Format { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a @PublicApi
annotation and mark this with it, (and any others you can think of, e.g. UDF family of stuff).
(Separate PR, obvs).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#4439 - it isn't public quite yet, but I'll make sure to create some sort of annotation for this and the others
* @return {@code true} if this {@code Format} supports schema inference | ||
* through Confluent Schema Registry. | ||
*/ | ||
default boolean supportsSchemaInference() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice :D
* | ||
* @return a set of valid property names | ||
*/ | ||
Set<String> getSupportedProperties(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
default to empty set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was debating this or forcing people to implement it - I'm happy defaulting to empty set.
* @return the set of properties that are considered "inheritable" | ||
*/ | ||
public Set<String> getInheritableProperties() { | ||
return inheritableProperties; | ||
default Set<String> getInheritableProperties() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: on the javadoc
Specifies the set of "inheritable" properties - these properties will persist across streams and tables if a sink is created from a source of the same type and does not explicitly overwrite the property.
Don't really know what 'source of the same type' means. Do you mean the sink has the same format?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes - updated the javadoc to clarify this (hopefully)
throw new KsqlException("Unknown format: " + value.getFormat()); | ||
} | ||
} | ||
KsqlSerdeFactory getSerdeFactory(FormatInfo info); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If KsqlSerdeFactory
is going to become part of our public API then we should give it some love to make it more extensible, e.g.:
- Rather than passing config and SR clients as params, pass them via a Context object, which allows us to add more things to it later without needing to change the signature.
- Consider removing KSqlConfig from the method sigs completely, as this is specific to our serde classes. Instead, pluggable formats can implement
Configurable
if they need config. (We should limit what config is passed to 3rd party formats just like we do for UDFs e.g. we shouldn't be passing all config that may contain secrets etc). - move the type checking version of
createSerde
out of the interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/** | ||
* A class containing the builtin supported formats in ksqlDB. | ||
*/ | ||
public final class Formats { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I take it this will grow to be a class that knows about all formats in time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will know about only builtin formats - I haven't figured out exactly how we're going to do the factory (whether it be annotation/classpath based or configuring a plugin in config and packaging jars). At the point when that mechanism is worked out this class will probably be removed and we'll dogfood that mechanism with all format classes
|
||
public final class AvroFormat extends AbstractFormat { | ||
|
||
public static final String FULL_SCHEMA_NAME = "fullSchemaName"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Humm... just looked at how VALUE_AVRO_SCHEMA_FULL_NAME
in the WITH clause is wired up to this. (Same for DELIMITED).
Thinking...
- Remember that we'll have key variants of these at some point soon. I think what's there will work, i.e. we'll have to change
CreateSourceAsProperties.getFormatProperties()
togetValueFormatProperties()
or something. Maybe if you plan ahead for key formats you may get a better long term design. No biggie though. VALUE_AVRO_SCHEMA_FULL_NAME
should urgently be renamed toVALUE_SCHEMA_FULL_NAME
is we're soon to have Json or Protobuf schema names in there.CreateSourceAsProperties
codifies the mapping fromVALUE_AVRO_SCHEMA_FULL_NAME
tofullSchemaName
. That works fine for our built in ones, but how would other formats expose their properties in the WITH clause? Maybe the property names exposed by the formats should match the names used in the WITH clause, (with optionalVALUE_
orKEY_
prefixes).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- good point, haven't thought this through much but IMO the Formats should be agnostic of key/value formats (if they are supposed to behave differently, then they should be two different formats)
- yup, will definitely do that :)
- my intention going forward was to ditch this mapping from whats in the WITH clause to what is used internally and have them stick to what's in the with clause. I'll create a ticket describing this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No idea what you changed in your last commit...
@@ -178,7 +177,7 @@ public void shouldBuildStepForWindowedAggregate() { | |||
ExecutionStepFactory.streamWindowedAggregate( | |||
queryContext, | |||
schemaGroupedStream.getSourceStep(), | |||
Formats.of(expected, valueFormat, SerdeOption.none()), | |||
io.confluent.ksql.execution.plan.Formats.of(expected, valueFormat, SerdeOption.none()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hummm... nasty name clash... can we rename one? e.g. PlanFormats
? Or make your Formats
something like FormatFactory
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
renamed mine to FormatFactory
because the other is in the serialized plan
4ce5986
to
fb33bfc
Compare
fb33bfc
to
5f93260
Compare
Description
This change looks large but it's really quite simple - most of it is just auto-renaming of things:
Format
and rename the existing enumFormats
Format.AVRO
is nowAvroFormat
)This sets us up toward allowing pluggable schema support and makes it easier to add new formats (e.g. protobuf, coming soon).
Review Guide
Format
,Formats
,AvroFormat
,JsonFormat
,KafkaFormat
andDelimitedFormat
. The rest is basically IDE jitter.Testing done
mvn clean install
Reviewer checklist