RequestReplyHelper#call fails with IllegalMonitorStateException #118
At first glance it looks to me like an actor's mailbox is being shared with some other fiber/actor trying to receive messages from it (actors' mailboxes are usually single-consumer, as they are meant to be private to the actor). Is something like this somehow happening in your code? |
Actually, I don't share any actor's mailbox anywhere. All communication between actors is implemented through primitive message sending or the RequestReplyHelper.call pattern. But what I found looks strange to me. The method ActorRef#sendSync is calling Fiber.exec, according to the following stacktrace.
I guess that at the moment sendSync is trying to unpark some receiver actor, that exact actor is blocked in its mailbox's receive method. Am I correct? The first one (a RequestReplyCall nested in another RequestReplyCall):
And the second:
|
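For context on the pattern being discussed: the request-reply exchange can be sketched without Quasar. The following plain-JDK sketch (the `Request` class and the doubling server are made up for illustration, not Quasar's actual implementation) shows what RequestReplyHelper#call does conceptually: the request carries a private reply channel, standing in for the temporary "from" actor that the helper sets up, and the caller blocks on that channel for the answer.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class RequestReplyDemo {
    // A request carries its own single-slot reply queue, playing the role
    // of the temporary reply channel RequestReplyHelper sets up per call.
    static final class Request {
        final int payload;
        final BlockingQueue<Integer> reply = new ArrayBlockingQueue<>(1);
        Request(int payload) { this.payload = payload; }
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Request> mailbox = new LinkedBlockingQueue<>();

        // The "server actor": the single consumer of its own mailbox.
        Thread server = new Thread(() -> {
            try {
                while (true) {
                    Request req = mailbox.take();      // receive()
                    req.reply.put(req.payload * 2);    // reply(...)
                }
            } catch (InterruptedException e) { /* shut down */ }
        });
        server.setDaemon(true);
        server.start();

        // The caller side of a synchronous call:
        Request req = new Request(21);
        mailbox.put(req);
        int answer = req.reply.take();                 // block for the reply
        System.out.println(answer);                    // prints 42
        server.interrupt();
    }
}
```

The key invariant this illustrates is the one the maintainer raised above: the server's mailbox has exactly one consumer, while replies flow over per-request channels.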
Hi.
Especially when you say sequentially? |
Hi, |
Yes, you are correct. But how are you reproducing it and how do you know there's no unregister? |
I've simply added 2 breakpoints on the register and unregister methods that print stack traces.
Unfortunately, I don't understand how to reproduce it; it happens quite rarely. I need some time to write a minimal showcase for reproducing it. |
Do you provide any configurations to the actor's mailbox (size, policy)? |
No, I don't. Every actor is constructed by co.paralleluniverse.actors.behaviors.BehaviorActor#BehaviorActor(java.lang.String) |
I've put Anyway, can you try with |
Yeah, for sure I can. Thanks. |
Good. In that case just note that all bug-fixes go into the |
Unfortunately, it doesn't help. Everything remains the same, so we can conclude there is no exception thrown between the calls to register and unregister. Don't you find it strange that the exception message
tells us that exactly the same strand is trying to call OwnedSynchronizer.register once again? |
Yes. Unfortunately my debugging capabilities are diminished due to not being able to reproduce the problem, so any further information you can provide -- like a flight recorder dump (if you can reproduce with it on) -- would be very helpful. |
This is so frustrating. With your commit, the test always passes when I run Gradle in debug mode, and without debug mode it sometimes fails, like before... I've just run it 100 times in debug mode and it passed every time. Maybe this can be explained by the fact that debug mode disables some JIT-specific optimizations, so the chance of hitting the race condition is close to zero due to the performance drop. OK, my next step will be to provide a flight recorder dump here OR close this bug. I just need some time. Thanks |
No need to close this bug. It's a real race condition, and eventually we'll catch it. It isn't the first one nor the last. :) |
It seems I found out why unregister doesn't get called. I have an uninstrumented method in the call stack:
And here is what we get in that case: when the fiber parks, it throws SuspendExecution, which is then caught somewhere and not rethrown as a RuntimeSuspendExecution. So the parked fiber can't be unparked correctly, and it loses its stack, including the lines
That means the method co.paralleluniverse.actors.Mailbox#unlock is never called. If my theory is correct, then I need to instrument this method
But I've tried to add
to my META-INF/suspendables file. Unfortunately, neither of those helps. |
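For readers unfamiliar with the mechanism: Quasar looks for a META-INF/suspendables resource listing fully qualified method names, one per line. A hypothetical entry (the class and method names below are placeholders, not taken from this issue) would look like:

```
com.example.transport.BlockingSender.sendAndWait
```

Alternatively, a method under your own control can be declared suspendable by annotating it with co.paralleluniverse.fibers.Suspendable (or by declaring `throws SuspendExecution`), so the instrumentation agent picks it up without a suspendables file.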
Yeah, it seems I was right. If I spawn all my actors with spawnThread (meaning Fiber.park is never called), then this problem goes away and never appears. |
There should never be a reason to instrument |
I get your point! I fixed that problem, instrumentation now works fine (I got rid of CompletableFuture.AsyncRun in my stack trace), and this error never happens anymore. So, summing up my experience, I have to say that Quasar users should pay more attention to uninstrumented-method warnings; otherwise they can get unpredictable application behaviour if a SuspendExecution exception is silently swallowed somewhere. In any case, I would like to ask one more question, if you don't mind. I don't know whether it's related to this issue or not. When a fiber is unparked, is there any guarantee that the worker thread will be the same one as before parking? It seems to me that a new worker thread would be obtained from the fiber pool without any such guarantee... To deal with that problem, Quasar provides some concurrent abstractions like co.paralleluniverse.strands.concurrent.ReentrantLock. But what should I do with third-party library classes which use the JDK's native java.util.concurrent.locks.ReentrantLock? At this point I rarely get this exception:
Thanks |
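The thread-affinity question above can be demonstrated without Quasar: java.util.concurrent.locks.ReentrantLock requires that unlock() run on the thread that acquired the lock, which is exactly what breaks when a strand resumes on a different worker thread after parking. A minimal plain-JDK sketch:

```java
import java.util.concurrent.locks.ReentrantLock;

public class LockMigrationDemo {
    public static void main(String[] args) throws InterruptedException {
        ReentrantLock lock = new ReentrantLock();
        lock.lock(); // "before park": the main thread owns the lock

        // "after unpark": a different thread tries to release it,
        // modeling a fiber resuming on another worker thread.
        Thread other = new Thread(() -> {
            try {
                lock.unlock();
                System.out.println("unlocked");
            } catch (IllegalMonitorStateException e) {
                System.out.println("IllegalMonitorStateException");
            }
        });
        other.start();
        other.join();

        lock.unlock(); // the owning thread can still release it
    }
}
```

This is why Quasar ships strand-aware variants such as co.paralleluniverse.strands.concurrent.ReentrantLock for code that runs in fibers.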
That entirely depends on the
Why is that a problem? Thread-locals travel with the fiber.
Ah, that's code that explicitly accesses the identity of the underlying thread. If you must use this in a fiber (and why does the fiber block between lock acquisition and release?), you can use a scheduler that pins fibers to threads. It shouldn't be hard to do. The simplest one is a scheduler that schedules all fibers onto a single thread, which is created with: new FiberExecutorScheduler(Executors.newSingleThreadExecutor()) Of course, you are free to use different schedulers for different fibers and mix them freely. However, in this case I would consider (again, without knowing your codebase, so it may or may not be a good suggestion) using plain threads to call the IO library, and having them send messages to fibers over channels/actor mailboxes. |
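To illustrate the pinning property this suggestion relies on, here is a plain-JDK sketch (Quasar itself omitted) showing that Executors.newSingleThreadExecutor() runs every task on one fixed thread, which is what lets a scheduler built over it keep a fiber's underlying thread identity stable across park/unpark:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PinnedSchedulerDemo {
    public static void main(String[] args) throws Exception {
        // Stand-in for the executor that would back FiberExecutorScheduler:
        // a single-thread executor reuses one fixed thread for all tasks.
        ExecutorService exec = Executors.newSingleThreadExecutor();

        // Two separate submissions model a fiber running before and after a park.
        Future<String> first  = exec.submit(() -> Thread.currentThread().getName());
        Future<String> second = exec.submit(() -> Thread.currentThread().getName());

        System.out.println("same thread: " + first.get().equals(second.get()));
        exec.shutdown();
    }
}
```

With such pinning, thread-affine constructs like java.util.concurrent.locks.ReentrantLock keep working, at the cost of losing the ability to spread fibers over a pool.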
Thanks for the fast reply, that makes sense. I think your last suggestion is the most suitable for us. As a temporary solution, I wrapped the call to org.apache.avro.ipc.Requestor.getRemote in co.paralleluniverse.fibers.FiberAsync#runBlocking(java.util.concurrent.ExecutorService, co.paralleluniverse.common.util.CheckedCallable<V,E>) with a single-thread pool as the first argument, and now java.util.concurrent.locks.ReentrantLock works correctly in the worker thread. Thanks again for your really useful responses. I think this issue can be closed, because the original problem was happening due to incomplete instrumentation of the application code. |
Glad to be of service. |
Hey.
First, I apologize for not being able to share all the code reproducing the problem, which happens quite rarely (it seems like there is a race condition somewhere).
But I can describe the environment in a few words: we have a complex actor-interaction algorithm in our actor system, and some of the actors communicate via RequestReplyHelper#call. Sometimes this exception is raised. The problems are
So it's hard for me to reason about the source of this exception. Could you help me find out whether we are doing something wrong or whether this is a bug in quasar-actors? It would be great if you've already encountered this problem and know what mistakes we are making. Thanks.