Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: introduce Format interface #4437

Merged
merged 1 commit into from
Feb 5, 2020

Conversation

agavra
Copy link
Contributor

@agavra agavra commented Feb 4, 2020

Description

This change looks large but it's really quite simple - most of it is just auto-renaming of things:

  • Extract an interface Format and rename the existing enum Formats
  • Move each enum into its own class (e.g. Format.AVRO is now AvroFormat)
  • Move some functionality into the classes instead of using switch statements throughout the code

This sets us up toward allowing pluggable schema support and makes it easier to add new formats (e.g. protobuf, coming soon).

Review Guide

  • All you really need to look at are Format, Formats, AvroFormat, JsonFormat, KafkaFormat and DelimitedFormat. The rest is basically IDE jitter.

Testing done

  • Added some new tests
  • mvn clean install

Reviewer checklist

  • Ensure docs are updated if necessary. (eg. if a user visible feature is being added or changed).
  • Ensure relevant issues are linked (description should include text like "Fixes #")

@agavra agavra requested a review from a team as a code owner February 4, 2020 20:10
@big-andy-coates big-andy-coates self-assigned this Feb 4, 2020
Copy link
Contributor

@big-andy-coates big-andy-coates left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @agavra

Few thoughts below. I see you've just pushed a new change so I'll check that out too.

Comment on lines 18 to 23
public abstract class AbstractFormat implements Format {

@Override
public String toString() {
return name();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're adding a abstract base class just to overload toString? Yuck! Just call name() rather than use toString().

This is even more true considering we're to support pluggable formats: we can't rely on other people correctly overloading toString, so we shouldn't rely on it.

* in ksqlDB. The builtin formats are specified in the {@link Formats}
* class.
*/
public interface Format {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a @PublicApi annotation and mark this with it, (and any others you can think of, e.g. UDF family of stuff).

(Separate PR, obvs).

Copy link
Contributor Author

@agavra agavra Feb 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#4439 - it isn't public quite yet, but I'll make sure to create some sort of annotation for this and the others

* @return {@code true} if this {@code Format} supports schema inference
* through Confluent Schema Registry.
*/
default boolean supportsSchemaInference() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice :D

*
* @return a set of valid property names
*/
Set<String> getSupportedProperties();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

default to empty set?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was debating this or forcing people to implement it - I'm happy defaulting to empty set.

* @return the set of properties that are considered "inheritable"
*/
public Set<String> getInheritableProperties() {
return inheritableProperties;
default Set<String> getInheritableProperties() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: on the javadoc

Specifies the set of "inheritable" properties - these properties will persist across streams and tables if a sink is created from a source of the same type and does not explicitly overwrite the property.

Don't really know what 'source of the same type' means. Do you mean the sink has the same format?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes - updated the javadoc to clarify this (hopefully)

throw new KsqlException("Unknown format: " + value.getFormat());
}
}
KsqlSerdeFactory getSerdeFactory(FormatInfo info);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If KsqlSerdeFactory is going to become part of our public API then we should give it some love to make it more extensible, e.g.:

  • Rather than passing config and SR clients as params, pass them via a Context object, which allows us to add more things to it later without needing to change the signature.
  • Consider removing KSqlConfig from the method sigs completely, as this is specific to our serde classes. Instead, pluggable formats can implement Configurable if they need config. (We should limit what config is passed to 3rd party formats just like we do for UDFs e.g. we shouldn't be passing all config that may contain secrets etc).
  • move the type checking version of createSerde out of the interface.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/**
* A class containing the builtin supported formats in ksqlDB.
*/
public final class Formats {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I take it this will grow to be a class that knows about all formats in time?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will know about only builtin formats - I haven't figured out exactly how we're going to do the factory (whether it be annotation/classpath based or configuring a plugin in config and packaging jars). At the point when that mechanism is worked out this class will probably be removed and we'll dogfood that mechanism with all format classes


public final class AvroFormat extends AbstractFormat {

public static final String FULL_SCHEMA_NAME = "fullSchemaName";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Humm... just looked at how VALUE_AVRO_SCHEMA_FULL_NAME in the WITH clause is wired up to this. (Same for DELIMITED).

Thinking...

  1. Remember that we'll have key variants of these at some point soon. I think what's there will work, i.e. we'll have to change CreateSourceAsProperties.getFormatProperties() to getValueFormatProperties() or something. Maybe if you plan ahead for key formats you may get a better long term design. No biggie though.
  2. VALUE_AVRO_SCHEMA_FULL_NAME should urgently be renamed to VALUE_SCHEMA_FULL_NAME is we're soon to have Json or Protobuf schema names in there.
  3. CreateSourceAsProperties codifies the mapping from VALUE_AVRO_SCHEMA_FULL_NAME to fullSchemaName. That works fine for our built in ones, but how would other formats expose their properties in the WITH clause? Maybe the property names exposed by the formats should match the names used in the WITH clause, (with optional VALUE_ or KEY_ prefixes).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. good point, haven't thought this through much but IMO the Formats should be agnostic of key/value formats (if they are supposed to behave differently, then they should be two different formats)
  2. yup, will definitely do that :)
  3. my intention going forward was to ditch this mapping from whats in the WITH clause to what is used internally and have them stick to what's in the with clause. I'll create a ticket describing this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@big-andy-coates big-andy-coates left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No idea what you changed in your last commit...

@@ -178,7 +177,7 @@ public void shouldBuildStepForWindowedAggregate() {
ExecutionStepFactory.streamWindowedAggregate(
queryContext,
schemaGroupedStream.getSourceStep(),
Formats.of(expected, valueFormat, SerdeOption.none()),
io.confluent.ksql.execution.plan.Formats.of(expected, valueFormat, SerdeOption.none()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hummm... nasty name clash... can we rename one? e.g. PlanFormats? Or make your Formats something like FormatFactory?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamed mine to FormatFactory because the other is in the serialized plan

@agavra agavra merged commit b324bf5 into confluentinc:master Feb 5, 2020
@agavra agavra deleted the format_refactor branch February 5, 2020 17:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants