Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sometimes a a transaction error occurs - Cannot call send in state COMMITTING_TRANSACTION #25

Closed
astubbs opened this issue Nov 10, 2020 · 3 comments · Fixed by #31
Closed
Labels
verified bug Something isn't working

Comments

@astubbs
Copy link
Contributor

astubbs commented Nov 10, 2020

While running and publishing messages back to Kafka (pollAndProduce), sometimes a transaction error occurs in the log:

java.lang.IllegalStateException: Cannot call send in state COMMITTING_TRANSACTION

This is due to some error in the way transaction state is managed / monitored by the system.

Also, as reported by a user:

It doesn’t get stuck on Cannot call send in state COMMITTING_TRANSACTION, it just re-processes.
However, sometimes I get Invalid transition attempted from state READY to state COMMITTING_TRANSACTION on startup and then an infinite loop occurs. I’ve only seen this happening when I set max.poll.records very low.

@astubbs astubbs added the verified bug Something isn't working label Nov 10, 2020
JorgenRingen added a commit to JorgenRingen/parallel-consumer that referenced this issue Nov 10, 2020
JorgenRingen added a commit to JorgenRingen/parallel-consumer that referenced this issue Nov 10, 2020
JorgenRingen added a commit to JorgenRingen/parallel-consumer that referenced this issue Nov 10, 2020
@JorgenRingen
Copy link
Contributor

JorgenRingen commented Nov 10, 2020

I've contributed a small test-case that reproduces some of the issues we discussed on slack: JorgenRingen@9d5fd91

By tweaking the parallel-consumer and kafka-consumer settings I can reproduce Invalid transition attempted from state READY to state COMMITTING_TRANSACTION. Should happen on maxPolledRecords .

Also reproduced messagesProcessed > messagesProduced and Cannot call send in state COMMITTING_TRANSACTION (as the test is now).

I get somewhat different behavior by tweaking parallel-consumer options and the max.poll.records. It's not completely deterministic.

@astubbs
Copy link
Contributor Author

astubbs commented Nov 11, 2020

Afternoon @JorgenRingen, I believe I have fixed the issue with transactions / commit state, and now made transactions optional as well: https://github.com/confluentinc/parallel-consumer/pull/31/files

Take a look at the interface and let me know what you think - specifically this choice: https://github.com/confluentinc/parallel-consumer/pull/31/files#diff-12c3d1f966a5367ab47be49f7c0f11a9dcd4cf339b73acb092b8c44ae9c9a6e2R45

@JorgenRingen
Copy link
Contributor

Evening @astubbs, great with optional transactions. Adding the parameter and verifying by IllegalArugmentException'ing seems like a pragmatic and fair approach to me. However, if introducing support for optional producer, the parameter might be a little confusing. Don't have any immediate ideas on how to improve. Maybe some "fluent style" perhaps (which might be totally overkill) like ParallelConsumerOptions.with[outProducer|NonTransactionProducer|TransactionalProducer]().numberOfThreads(...).maxConcurrency(...)

I actually found a bug in the test:
https://github.com/confluentinc/parallel-consumer/pull/31/files#diff-d6d31b42ae96a5e31f2793c52624720af3a084c434707a9f031969a7af1b4e14R96 <- this line would always override maxPollRecords instrumented by the tests.

I deleted the line, added a couple of more tests and better verification and error-messages.
In my branch the following tests now fail fairly consistently:

  • io.confluent.parallelconsumer.examples.core.Bug25AppTest#testTransactionalLowMaxPoll (1) (infinite loop on every run)
  • io.confluent.parallelconsumer.examples.core.Bug25AppTest#testTransactionalDefaultMaxPoll (500) (infinite loop about ~50% of the times)

Typically infinite loops occurs when processed - produced=16 (16 = default number of threads)

All non-tx tests works and tx works when running with maxPollRecords=10000.

Checkout updated test:
JorgenRingen@be57d92

astubbs pushed a commit to astubbs/parallel-consumer that referenced this issue Nov 20, 2020
…l commits

Reproducing bug 25: confluentinc#25

WIP! Feature: Makes transactions optional

More

More - optional tx test exposes tx state error - non tx works

ReentrantReadWrite lock protects non-thread safe transactional producer from incorrect multithreaded use

Refactor

Fix onClose, remove TransactionState monitoring attempt - not needed

WIP! docs: Adds transaction system architecture

Wider lock to prevent transaction's containing produced messages that they shouldn't

Wider lock to prevent transaction's containing produced messages that they shouldn't

Squash tighten up

Tighten up

Fix

Implement non transactional synchronous commit sync properly

Volume tests adapted to non transactional, passes

More

Test fix - must start tx in MockProducer as well

More

fixed bug in test, added more tests and better verification

Fixes double lock causing deadlock, stops interruption of transaction commits

Fixes wakeup issues with non-transactional commits, adds supervision to poller

Bug: fix example app tests

- Fixes a performance issue with the async committer not being woken up
- Enhances tests to run under multiple commit modes

More

More

WIP! Update tests to run in all 3 commit modes

More

More

More

More

More

- fixes example app tests - incorrectly testing wrong thing and MockProducer not configured to auto complete

Use iteration over stream as can’t workout a concurrent modification here

- remove Optional#ifPresentOrElse} - it’s only @SInCE 9

Fixes vertx app test, adds wiremock

Squash

Squash

Squash

Squash! Fix concurrent modification exception

Squash

bug: Stabilise test and make committer thread revoke partitions and commit

- CloseAndOpenOffset test from some race condition
- Make sure the correct thread owner of the committer is the to close the consumer and thus commit the offsets

Fix concurrency and ProducerFencedExceptions

Enhancement: make the thread responsible for committing the thread which runs the commit upon close
Have onPartitionsRevoked be responsible for committing on close, instead of an explicit call to commit by controller

Make sure Broker Poller now drains properly, committing any waiting work
Add missing revoke flow to MockConsumer wrapper
Add missing latch timeout check
astubbs pushed a commit to astubbs/parallel-consumer that referenced this issue Nov 20, 2020
…mits

- Choose either Consumer sync or async commits
- Fixes confluentinc#25 confluentinc#25:
-- Sometimes a a transaction error occurs - Cannot call send in state COMMITTING_TRANSACTION confluentinc#25
- ReentrantReadWrite lock protects non-thread safe transactional producer from incorrect multithreaded use
- Wider lock to prevent transaction's containing produced messages that they shouldn't
- Implement non transactional synchronous commit sync properly
- Select tests adapted to non transactional as well
- Must start tx in MockProducer as well
- Adds supervision to poller
- Fixes a performance issue with the async committer not being woken up
- Enhances tests to run under multiple commit modes
- Fixes example app tests - incorrectly testing wrong thing and MockProducer not configured to auto complete
- Make committer thread revoke partitions and commit
- Have onPartitionsRevoked be responsible for committing on close, instead of an explicit call to commit by controller
- Make sure Broker Poller now drains properly, committing any waiting work
- Add missing revoke flow to MockConsumer wrapper
- Add missing latch timeout check
astubbs pushed a commit to astubbs/parallel-consumer that referenced this issue Nov 20, 2020
…mits

- Choose either Consumer sync or async commits
- Fixes confluentinc#25 confluentinc#25:
-- Sometimes a a transaction error occurs - Cannot call send in state COMMITTING_TRANSACTION confluentinc#25
- ReentrantReadWrite lock protects non-thread safe transactional producer from incorrect multithreaded use
- Wider lock to prevent transaction's containing produced messages that they shouldn't
- Implement non transactional synchronous commit sync properly
- Select tests adapted to non transactional as well
- Must start tx in MockProducer as well
- Adds supervision to poller
- Fixes a performance issue with the async committer not being woken up
- Enhances tests to run under multiple commit modes
- Fixes example app tests - incorrectly testing wrong thing and MockProducer not configured to auto complete
- Make committer thread revoke partitions and commit
- Have onPartitionsRevoked be responsible for committing on close, instead of an explicit call to commit by controller
- Make sure Broker Poller now drains properly, committing any waiting work
- Add missing revoke flow to MockConsumer wrapper
- Add missing latch timeout check
astubbs pushed a commit to astubbs/parallel-consumer that referenced this issue Nov 20, 2020
…mits

- Choose either Consumer sync or async commits
- Fixes confluentinc#25 confluentinc#25:
-- Sometimes a a transaction error occurs - Cannot call send in state COMMITTING_TRANSACTION confluentinc#25
- ReentrantReadWrite lock protects non-thread safe transactional producer from incorrect multithreaded use
- Wider lock to prevent transaction's containing produced messages that they shouldn't
- Implement non transactional synchronous commit sync properly
- Select tests adapted to non transactional as well
- Must start tx in MockProducer as well
- Adds supervision to poller
- Fixes a performance issue with the async committer not being woken up
- Enhances tests to run under multiple commit modes
- Fixes example app tests - incorrectly testing wrong thing and MockProducer not configured to auto complete
- Make committer thread revoke partitions and commit
- Have onPartitionsRevoked be responsible for committing on close, instead of an explicit call to commit by controller
- Make sure Broker Poller now drains properly, committing any waiting work
- Add missing revoke flow to MockConsumer wrapper
- Add missing latch timeout check
astubbs pushed a commit to astubbs/parallel-consumer that referenced this issue Nov 20, 2020
…mits

- Choose either Consumer sync or async commits
- Fixes confluentinc#25 confluentinc#25:
-- Sometimes a a transaction error occurs - Cannot call send in state COMMITTING_TRANSACTION confluentinc#25
- ReentrantReadWrite lock protects non-thread safe transactional producer from incorrect multithreaded use
- Wider lock to prevent transaction's containing produced messages that they shouldn't
- Implement non transactional synchronous commit sync properly
- Select tests adapted to non transactional as well
- Must start tx in MockProducer as well
- Adds supervision to poller
- Fixes a performance issue with the async committer not being woken up
- Enhances tests to run under multiple commit modes
- Fixes example app tests - incorrectly testing wrong thing and MockProducer not configured to auto complete
- Make committer thread revoke partitions and commit
- Have onPartitionsRevoked be responsible for committing on close, instead of an explicit call to commit by controller
- Make sure Broker Poller now drains properly, committing any waiting work
- Add missing revoke flow to MockConsumer wrapper
- Add missing latch timeout check
astubbs pushed a commit to astubbs/parallel-consumer that referenced this issue Nov 23, 2020
…mits

- Choose either Consumer sync or async commits
- Fixes confluentinc#25 confluentinc#25:
-- Sometimes a a transaction error occurs - Cannot call send in state COMMITTING_TRANSACTION confluentinc#25
- ReentrantReadWrite lock protects non-thread safe transactional producer from incorrect multithreaded use
- Wider lock to prevent transaction's containing produced messages that they shouldn't
- Implement non transactional synchronous commit sync properly
- Select tests adapted to non transactional as well
- Must start tx in MockProducer as well
- Adds supervision to poller
- Fixes a performance issue with the async committer not being woken up
- Enhances tests to run under multiple commit modes
- Fixes example app tests - incorrectly testing wrong thing and MockProducer not configured to auto complete
- Make committer thread revoke partitions and commit
- Have onPartitionsRevoked be responsible for committing on close, instead of an explicit call to commit by controller
- Make sure Broker Poller now drains properly, committing any waiting work
- Add missing revoke flow to MockConsumer wrapper
- Add missing latch timeout check
astubbs added a commit to astubbs/parallel-consumer that referenced this issue Nov 23, 2020
…mits

- Choose either Consumer sync or async commits
- Fixes confluentinc#25 confluentinc#25:
-- Sometimes a a transaction error occurs - Cannot call send in state COMMITTING_TRANSACTION confluentinc#25
- ReentrantReadWrite lock protects non-thread safe transactional producer from incorrect multithreaded use
- Wider lock to prevent transaction's containing produced messages that they shouldn't
- Implement non transactional synchronous commit sync properly
- Select tests adapted to non transactional as well
- Must start tx in MockProducer as well
- Adds supervision to poller
- Fixes a performance issue with the async committer not being woken up
- Enhances tests to run under multiple commit modes
- Fixes example app tests - incorrectly testing wrong thing and MockProducer not configured to auto complete
- Make committer thread revoke partitions and commit
- Have onPartitionsRevoked be responsible for committing on close, instead of an explicit call to commit by controller
- Make sure Broker Poller now drains properly, committing any waiting work
- Add missing revoke flow to MockConsumer wrapper
- Add missing latch timeout check
astubbs added a commit that referenced this issue Nov 23, 2020
…mits

- Choose either Consumer sync or async commits
- Fixes #25 #25:
-- Sometimes a a transaction error occurs - Cannot call send in state COMMITTING_TRANSACTION #25
- ReentrantReadWrite lock protects non-thread safe transactional producer from incorrect multithreaded use
- Wider lock to prevent transaction's containing produced messages that they shouldn't
- Implement non transactional synchronous commit sync properly
- Select tests adapted to non transactional as well
- Must start tx in MockProducer as well
- Adds supervision to poller
- Fixes a performance issue with the async committer not being woken up
- Enhances tests to run under multiple commit modes
- Fixes example app tests - incorrectly testing wrong thing and MockProducer not configured to auto complete
- Make committer thread revoke partitions and commit
- Have onPartitionsRevoked be responsible for committing on close, instead of an explicit call to commit by controller
- Make sure Broker Poller now drains properly, committing any waiting work
- Add missing revoke flow to MockConsumer wrapper
- Add missing latch timeout check
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
verified bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants