
Updated Kafka + SQL storage variant, re-using existing kafka storage utils/data model #1026

Merged
Merged 19 commits into master on Nov 24, 2020

Conversation

EricWittmann
Member

OK here is an updated implementation. I think we have some interesting ideas here that will ultimately lead to a nice hybrid between the Kafka and SQL approaches that I think will be useful in a variety of deployment configurations.

This implementation currently has a few problems that I think we can address in future iterations.

  1. Kafka log compaction not possible
  2. Number and structure of Kafka messages not optimized
  3. SQL layer not optimized

Regarding (1), I think our goal must be to support log compaction so that the Kafka log retention requirements are as small as possible. To that end, I think we need to evolve this implementation to adhere to the following principles:

  • Each Kafka message key should be composed of tuple(artifactId, version)
  • Each Kafka message should contain all information about an artifact version (all meta-data), unless the type is DELETE
  • Content should be managed separately from artifacts and indexed using e.g. shaHash(content)
  • Kafka messages for the artifact+version should reference the content by its unique contentId
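A minimal sketch of what that key/value layout might look like (the class and field names here are hypothetical illustrations, not taken from the PR):

```java
// Hypothetical sketch of the message layout described above.
// The key is the (artifactId, version) tuple, so compaction keeps only the
// latest state for each artifact version.
record MessageKey(String artifactId, int version) { }

// The value carries all of the version's meta-data plus a reference to the
// content by its unique contentId (e.g. a SHA hash of the content bytes);
// the content itself is stored and keyed separately.
record MessageValue(String action,     // e.g. CREATE, UPDATE, DELETE
                    String contentId,
                    java.util.Map<String, String> metaData) { }
```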

If my understanding of log compaction is correct, the above rules should be a reasonable starting point. As I understand it, the idea is to ensure that even if Kafka throws away all messages except the most recent one for any given unique key, the integrity/correctness of the application still holds. I think this approach also has the side benefit of ensuring message ordering.
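For reference, compaction is enabled per topic via `cleanup.policy`; a sketch of what the topic setup might look like (the topic name and tuning values here are assumptions, not taken from the PR):

```shell
# Illustrative topic creation with log compaction enabled; the topic name
# and tuning values are assumptions, not the project's actual configuration.
kafka-topics.sh --create \
  --bootstrap-server localhost:9092 \
  --topic registry-storage-topic \
  --partitions 1 --replication-factor 1 \
  --config cleanup.policy=compact \
  --config min.compaction.lag.ms=60000
```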

Still more for me to understand/think about!

Regarding (2): we have a few places where we create or update meta-data using two Kafka messages, and I think we may not be handling content in the most efficient way possible.

Finally, part of the reason for the approach in (2) is that the (3) SQL layer isn't designed to facilitate this hybrid approach. In the next iteration we should enhance the SQL layer with functionality useful to the hybrid approach, for example (but not limited to):

  • Fast existence checks
  • Manage content separately from artifacts
  • Upserts: artifacts, artifact versions, rules, etc
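To illustrate the kind of SQL-layer additions meant here, a sketch of an upsert and a fast existence check (PostgreSQL syntax; the table and column names are hypothetical, not the actual Apicurio Registry schema):

```java
// Illustrative SQL statements for the hybrid storage layer; the table and
// column names are assumptions, not the real schema.
final class HybridSqlStatements {

    // Upsert: insert the artifact row, or update it if it already exists
    // (PostgreSQL ON CONFLICT syntax; other dialects use MERGE instead).
    static final String UPSERT_ARTIFACT =
        "INSERT INTO artifacts (artifact_id, type, created_on) " +
        "VALUES (?, ?, ?) " +
        "ON CONFLICT (artifact_id) DO UPDATE SET type = EXCLUDED.type";

    // Fast existence check: avoids fetching any row data.
    static final String ARTIFACT_EXISTS =
        "SELECT 1 FROM artifacts WHERE artifact_id = ? LIMIT 1";
}
```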

throw e;
}

return submitter
Member

I don't see how the artifact version is being handled in this storage. I know that -1 is what the streams storage is sending at this point, but then the StreamsTopologyProvider updates that version appropriately. I don't see anything that replaces that functionality here.

Member Author

@EricWittmann Nov 24, 2020


The version is created when the SQL layer creates the row in the versions table. I believe the try-catch is purely a fail-fast check for when the artifact doesn't exist.

The way this will work is that a Kafka message with the artifactId and a "create version" action (and the content) will be published. Then all replicas will consume that message and perform the action, which is to create a new version for the given artifactId in the DB. At that point the version ID will be generated and communicated back to the original thread via the coordinator.
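The round trip described above could be sketched roughly like this (a simplified illustration of the coordinator idea, not the actual implementation):

```java
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Simplified sketch of the coordinator idea: the producing thread registers
// a request UUID, publishes the Kafka message carrying that UUID, and then
// waits on a future; the consuming thread completes the future once the SQL
// layer has created the row and generated the version id.
class Coordinator {

    private final ConcurrentMap<UUID, CompletableFuture<Long>> pending =
            new ConcurrentHashMap<>();

    // Called by the producing thread before sending the Kafka message.
    CompletableFuture<Long> register(UUID requestId) {
        CompletableFuture<Long> future = new CompletableFuture<>();
        pending.put(requestId, future);
        return future;
    }

    // Called by the consuming thread after the DB row (and version id) exists.
    void complete(UUID requestId, long generatedVersionId) {
        CompletableFuture<Long> future = pending.remove(requestId);
        if (future != null) {
            future.complete(generatedVersionId);
        }
    }
}
```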

Does that explanation make sense?

Member

Yes, it does. Thanks, now I see the query and its usage in the CommonSqlStatements.

@carlesarnal
Member

LGTM, but I have a few questions.

  • Why don't we use the globalId as the message key?
  • As I said in my previous comment, I don't see how the artifact versions are being handled.
  • The new ksql storage is missing from the GitHub Actions workflow.

@famarting
Contributor

Looks good so far, and I agree with the points to address that you described. But regarding log compaction, something I've just realized: because the globalId is generated by the SQL storage, we must ensure that the globalIds generated after processing a compacted topic are the same.
For instance, in a sequence like:

artifact1 -> artifact created (globalId 1)
artifact2 -> artifact created (globalId 2)
artifact1 -> artifact deleted

after log compaction it will be like:

artifact2 -> artifact created (globalId 2)
artifact1 -> artifact deleted

We need to think about how we are going to ensure that artifact2 still ends up with globalId 2 after replaying the compacted topic.
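One possible direction (purely a sketch, not something decided in this PR) is to have the producer assign the globalId and carry it inside the message value, so that replaying a compacted topic reproduces the same ids regardless of which messages were compacted away:

```java
// Hypothetical message value that carries its own globalId, making id
// assignment independent of replay order after compaction. The names here
// are illustrative only.
record ArtifactCreatedValue(long globalId, String artifactId, String contentId) { }
```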

@EricWittmann
Member Author

I think I'm going to start a document to work through the details of log compaction as it applies to this use-case. I think it's tricky and if we want to use it, we'll need to get it right. :)

@EricWittmann EricWittmann merged commit 5f8b06f into master Nov 24, 2020
@EricWittmann EricWittmann deleted the learning/kafka-sql branch November 24, 2020 19:15
carlesarnal pushed a commit that referenced this pull request Nov 25, 2020
…utils/data model (#1026)

* Added a Kafka+SQL storage variant

* introduced overlays for application.properties to avoid putting all properties in one file

* Fixed the ksql tests - they all pass!

* added some logging to the merge properties mojo

* push the UUID into the payload and make the kafka message key the artifactId to ensure ordering

* fix selectArtifactMetaDataByGlobalId query bug and add reproducer test

* some tweaks based on perf testing

* kafka + sql storage variant, reusing streams variant datamodel (#1012)

* kafka + sql storage variant, reusing streams variant datamodel

* ksql - integration tests

* fix streams storage

* fixed some bugs in the ksql modified impl, and modified some tests to handle new async behavior

* minor TODO

* run integration tests and fix storage bug (#1028)

* fix ksql storage - error create/update artifact with metadata (#1029)

* update after some PR feedback

* remove some debug methods

* updated the perftest readme

Co-authored-by: Fabian Martinez <[email protected]>
Co-authored-by: Fabian Martinez <[email protected]>