Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CDC: Fix Producer/Consumer State Machine #2721

Merged
merged 3 commits into from
Apr 3, 2021

Conversation

cgardens
Copy link
Contributor

@cgardens cgardens commented Apr 2, 2021

What

How

  • This solution matches the requirement laid out in the problem statement in the aforementioned PR.
  • Creates a publisher and consumer concept in the CDC source and codifies state machine of each.

Recommended reading order

  1. PostgresSource.java
  2. DebeziumRecordPublisher.java
  3. DebeziumRecordConsumer.java
  4. the rest

@auto-assign auto-assign bot requested review from davinchia and jrhizor April 2, 2021 20:47
@cgardens cgardens changed the title CDC: Fix Pub/Sub State Machine CDC: Fix Producer/Consumer State Machine Apr 2, 2021

public class DebeziumEventUtils {

public static AirbyteMessage convertChangeEvent(ChangeEvent<String, String> event, Instant emittedAt) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: toAirbyteMessage

}

private static JsonNode formatDebeziumData(JsonNode before, JsonNode after, JsonNode source) {
final ObjectNode base = (ObjectNode) (after.isNull() ? before : after);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we OK to modify the input object?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah. it is safe in this case. i will add comment to make that clear.

* alive as long as the publisher is not closed or if there are any new records for it to process
* (even if the publisher is closed).
*/
public class DebeziumRecordConsumer extends AbstractIterator<ChangeEvent<String, String>>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name is very confusing. It is not really a consumer. It is an Iterator

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DebeziumRecordIterator?

return hasClosed.get();
}

public synchronized void close() throws Exception {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why dont you make close idempotent instead of synchronized?

public void close() throws Exception {
  if (hasclosed.compareAndSet(false, true) {
     close stuff & shutdown
  }
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that doesn't quite give us the guarantee that we want. there are things we care about here:

  1. we only want the internals of close to be called no more than once
  2. we only want to set isClosed to true after the internals of close have run.

i think what you're describing only gives us 1. if we introduced a second boolean isClosing we might be able to use this technique but it leads to a weird behavior where if close is called twice, the second call can return from close while the object still isn't actually closed (unless we add a busy wait). gets kinda complicated. lmk if you think i'm missing something. definitely happy to try a different approach.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, you're correct!

Copy link
Contributor

@jrhizor jrhizor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if it didn't fix the race condition this would be a nice refactor.

Base automatically changed from cgardens/cdc_state_management to jrhizor/debezium April 2, 2021 23:51
@cgardens cgardens merged commit ec6abf5 into jrhizor/debezium Apr 3, 2021
@cgardens cgardens deleted the cgardens/cdc_pub_sub branch April 3, 2021 00:16
@davinchia
Copy link
Contributor

I'm late; commenting to say this makes sense to me and I appreciated the comments.

jrhizor added a commit that referenced this pull request Apr 9, 2021
* spike

* more

* debezium wip

* use oneof for configuration

* iterator wrapping structure

* push current

* working loop

* move capability into source

* hack it into a sharable state

* debezium test runner (#2617)

* CDC Wait for Values (#2618)

* output actual AirbyteMessages for cdc (#2631)

* message conversion

* fmt

* add lsn extraction and comparison (#2613)

* postgres cdc catalog (#2673)

* update cdc catalog

* A

* table selection for cdc (#2690)

* table selection for cdc

* fix broken merge

* also test double quote in name

* Add state management to CDC (#2718)

* CDC: Fix Producer/Consumer State Machine (#2721)

* CDC Postgres Tests (#2777)

* fix postgres cdc image name and run check before reading data (#2785)

* minor postgres cdc fixes

* add test and fix check behavior

* fix

* improve comment

* remove unused props, remove todos, add some more sanity tests (#2791)

* cdc: add offset store tests (#2793)

* clean (#2798)

* postgres cdc docs (#2784)

* cdc docs

* Update docs/integrations/sources/postgres.md

Co-authored-by: Charles <[email protected]>

* address gcp

* learn too english

* add link

* add more disk space warnings

* add additional cdc use case

* add information on how to find postgresql.conf

* add how to find the file

Co-authored-by: Charles <[email protected]>

* various merge conflict fixes (#2799)

* cdc standard tests (#2813)

* require cdc users to create publications & update docs (#2818)

* postgres cdc race condition

* working? but different process

* add additional logging to help debug in the future

* everything done except working config

* remove unintended change

* Use oneOf in PG CDC spec (#2827)

* add oneOf configuration for postgres cdc  (#2831)

* add oneof configuration for cdc postgres

* fmt

Co-authored-by: Charles <[email protected]>

* fix test (#2834)

* fix test

* bump version

* add docs on creating replica identities (#2838)

* add docs on creating replica identities

* emphasize danger

* grammar

* bump pg version in source catalog

* generate seed files

Co-authored-by: cgardens <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants