Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RequestReplyHelper#call fails with IllegalMonitorStateException #118

Closed
hsestupin opened this issue Sep 16, 2015 · 22 comments
Closed

RequestReplyHelper#call fails with IllegalMonitorStateException #118

hsestupin opened this issue Sep 16, 2015 · 22 comments

Comments

@hsestupin
Copy link
Contributor

Hey.
First, I apologize for not being able to share all the code reproducing the problem which happens quite rarely (it seems like there is a race condition somewhere).
But I can describe environment in some words - we have complex actor's interaction algorithm at our actor system, some of them are communicating via RequestReplyHelper#call. And sometimes this exception raises. The problems are

  1. I don't have enough knowledge about actors mailbox queue internals, and don't understand what happens
  2. There isn't lots of javadoc comments such internal classes like SingleConsumerQueueChannel, OwnedSynchronizer

So its hard for me to reason about the source of this exception. Could you help me to find out are we doing something wrong or is this a bug in quasar-actors? It would be great if you've already encountered this problem and know the mistakes we are doing. Thanks.

java.lang.IllegalMonitorStateException: attempt by Fiber@10000046:Communicator[task: ParkableForkJoinTask@56d103cb(Fiber@10000046), target: co.paralleluniverse.actors.ActorRunner@2c27afe, scheduler: co.paralleluniverse.fibers.FiberForkJoinScheduler@6a9637db] but owned by Fiber@10000046:Communicator[task: ParkableForkJoinTask@56d103cb(Fiber@10000046), target: co.paralleluniverse.actors.ActorRunner@2c27afe, scheduler: co.paralleluniverse.fibers.FiberForkJoinScheduler@6a9637db]
    at co.paralleluniverse.strands.OwnedSynchronizer.register(OwnedSynchronizer.java:36)
    at co.paralleluniverse.strands.channels.SingleConsumerQueueChannel.receive(SingleConsumerQueueChannel.java:87)
    at co.paralleluniverse.actors.Actor.receive(Actor.java:462)
    at co.paralleluniverse.actors.behaviors.BehaviorActor.behavior(BehaviorActor.java:236)
    at co.paralleluniverse.actors.behaviors.BehaviorActor.doRun(BehaviorActor.java:293)
    at co.paralleluniverse.actors.behaviors.BehaviorActor.doRun(BehaviorActor.java:36)
    at co.paralleluniverse.actors.Actor.run0(Actor.java:691)
    at co.paralleluniverse.actors.ActorRunner.run(ActorRunner.java:51)
    at co.paralleluniverse.fibers.Fiber.run(Fiber.java:1024)
    at co.paralleluniverse.fibers.Fiber.run1(Fiber.java:1019)
    at co.paralleluniverse.fibers.Fiber.exec(Fiber.java:730)
    at co.paralleluniverse.fibers.FiberForkJoinScheduler$FiberForkJoinTask.exec1(FiberForkJoinScheduler.java:265)
    at co.paralleluniverse.concurrent.forkjoin.ParkableForkJoinTask.doExec(ParkableForkJoinTask.java:116)
    at co.paralleluniverse.concurrent.forkjoin.ParkableForkJoinTask.exec(ParkableForkJoinTask.java:73)
    at co.paralleluniverse.fibers.FiberForkJoinScheduler$FiberForkJoinTask.doExec(FiberForkJoinScheduler.java:272)
    at co.paralleluniverse.fibers.Fiber.immediateExecHelper(Fiber.java:1205)
    at co.paralleluniverse.fibers.Fiber.exec(Fiber.java:1173)
    at co.paralleluniverse.fibers.Fiber.yieldAndUnpark1(Fiber.java:691)
    at co.paralleluniverse.fibers.Fiber.yieldAndUnpark(Fiber.java:619)
    at co.paralleluniverse.strands.Strand.yieldAndUnpark(Strand.java:531)
    at co.paralleluniverse.strands.OwnedSynchronizer.signalAndWait(OwnedSynchronizer.java:66)
    at co.paralleluniverse.strands.channels.QueueChannel.signalAndWait(QueueChannel.java:109)
    at co.paralleluniverse.strands.channels.QueueChannel.send0(QueueChannel.java:245)
    at co.paralleluniverse.strands.channels.QueueChannel.sendSync(QueueChannel.java:204)
    at co.paralleluniverse.actors.Mailbox.sendSync(Mailbox.java:73)
    at co.paralleluniverse.actors.Actor.sendSync(Actor.java:429)
    at co.paralleluniverse.actors.ActorRef.sendSync(ActorRef.java:97)
    at co.paralleluniverse.actors.behaviors.RequestReplyHelper.call(RequestReplyHelper.java:173)
    at co.paralleluniverse.actors.behaviors.RequestReplyHelper.call(RequestReplyHelper.java:112)
@circlespainter
Copy link
Member

At first glance it looks to me like an actor's mailbox is being shared with some other fiber/actor trying to receive messages from it (and usually actors mailboxes are single-consumer as they are meant to be private to the actor for receive operations).

Is something like this somehow happening in your code?

@hsestupin
Copy link
Contributor Author

Actually I don't share actor's mailbox anywhere. All communications beetween actors is implemented through primitive message sending or RequestReplyHelper.call pattern. But what I found looks strange for me. Method ActorRef#sendSync is calling Fiber.exec according to the following stacktrace.

Caused by: java.lang.IllegalMonitorStateException: attempt by Fiber@10000014:Communicator[task: ParkableForkJoinTask@40e50f3a(Fiber@10000014), target: co.paralleluniverse.actors.ActorRunner@11999a26, scheduler: co.paralleluniverse.fibers.FiberForkJoinScheduler@3fcc90e2] but owned by Fiber@10000014:Communicator[task: ParkableForkJoinTask@40e50f3a(Fiber@10000014), target: co.paralleluniverse.actors.ActorRunner@11999a26, scheduler: co.paralleluniverse.fibers.FiberForkJoinScheduler@3fcc90e2]
    at co.paralleluniverse.strands.OwnedSynchronizer.register(OwnedSynchronizer.java:36)
    at co.paralleluniverse.strands.channels.SingleConsumerQueueChannel.receive(SingleConsumerQueueChannel.java:87)
    at co.paralleluniverse.actors.Actor.receive(Actor.java:462)
    at co.paralleluniverse.actors.behaviors.BehaviorActor.behavior(BehaviorActor.java:236)
    at co.paralleluniverse.actors.behaviors.BehaviorActor.doRun(BehaviorActor.java:293)
    at co.paralleluniverse.actors.behaviors.BehaviorActor.doRun(BehaviorActor.java:36)
    at co.paralleluniverse.actors.Actor.run0(Actor.java:691)
    at co.paralleluniverse.actors.ActorRunner.run(ActorRunner.java:51)
    at co.paralleluniverse.fibers.Fiber.run(Fiber.java:1024)
    at co.paralleluniverse.fibers.Fiber.run1(Fiber.java:1019)
    at co.paralleluniverse.fibers.Fiber.exec(Fiber.java:730)
    at co.paralleluniverse.fibers.FiberForkJoinScheduler$FiberForkJoinTask.exec1(FiberForkJoinScheduler.java:265)
    at co.paralleluniverse.concurrent.forkjoin.ParkableForkJoinTask.doExec(ParkableForkJoinTask.java:116)
    at co.paralleluniverse.concurrent.forkjoin.ParkableForkJoinTask.exec(ParkableForkJoinTask.java:73)
    at co.paralleluniverse.fibers.FiberForkJoinScheduler$FiberForkJoinTask.doExec(FiberForkJoinScheduler.java:272)
    at co.paralleluniverse.fibers.Fiber.immediateExecHelper(Fiber.java:1205)
    at co.paralleluniverse.fibers.Fiber.exec(Fiber.java:1173)
    at co.paralleluniverse.fibers.Fiber.yieldAndUnpark1(Fiber.java:691)
    at co.paralleluniverse.fibers.Fiber.yieldAndUnpark(Fiber.java:619)
    at co.paralleluniverse.strands.Strand.yieldAndUnpark(Strand.java:531)
    at co.paralleluniverse.strands.OwnedSynchronizer.signalAndWait(OwnedSynchronizer.java:66)
    at co.paralleluniverse.strands.channels.QueueChannel.signalAndWait(QueueChannel.java:109)
    at co.paralleluniverse.strands.channels.QueueChannel.send0(QueueChannel.java:245)
    at co.paralleluniverse.strands.channels.QueueChannel.sendSync(QueueChannel.java:204)
    at co.paralleluniverse.actors.Mailbox.sendSync(Mailbox.java:73)
    at co.paralleluniverse.actors.Actor.sendSync(Actor.java:429)
    at co.paralleluniverse.actors.ActorRef.sendSync(ActorRef.java:97)
    at co.paralleluniverse.actors.behaviors.RequestReplyHelper.call(RequestReplyHelper.java:173)
    at co.paralleluniverse.actors.behaviors.RequestReplyHelper.call(RequestReplyHelper.java:112)

I guess at the moment when sendSync is trying to unpark some receiver-actor - exactly that actor is blocked on his mailbox receive method. Am I correct?
So I can inctroduce to you 2 stacktraces which are called OwnedSynchronizer.register on the same mailbox sequentially without calling OwnedSynchronizer.unregister by none of both fibers.

The first one (RequestReplyCall is nested in RequestReplyCall) -

at co.paralleluniverse.strands.OwnedSynchronizer.register(OwnedSynchronizer.java:35)
    at co.paralleluniverse.actors.Mailbox.lock(Mailbox.java:82)
    at co.paralleluniverse.actors.SelectiveReceiveHelper.receive(SelectiveReceiveHelper.java:103)
    at co.paralleluniverse.actors.behaviors.RequestReplyHelper.call(RequestReplyHelper.java:174)
    at co.paralleluniverse.actors.behaviors.RequestReplyHelper.call(RequestReplyHelper.java:112)
    at service.actor.RemoteTransceiver.lambda$doRequest$1(RemoteTransceiver.java:80)
    at service.actor.RemoteTransceiver$$Lambda$52/525708236.get(Unknown Source)
    at util.Suspendables.getSuppressedInterruptable(Suspendables.java:58)
    at service.actor.RemoteTransceiver.transceive(RemoteTransceiver.java:60)
    at org.apache.avro.ipc.Requestor.getRemote(Requestor.java:285)
    at service.actor.ActorRegistry$AsyncSpecificRequestor$AsyncRequest.run(ActorRegistry.java:183)
    at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1618)
    at service.actor.avro.ActorRpcWrapper$$Lambda$32/13750979.accept(Unknown Source)
    at service.actor.avro.ActorRpcWrapper.handleMessage(ActorRpcWrapper.java:86)
    at co.paralleluniverse.actors.behaviors.BehaviorActor.behavior(BehaviorActor.java:237)
    at co.paralleluniverse.actors.behaviors.BehaviorActor.doRun(BehaviorActor.java:293)
    at co.paralleluniverse.actors.behaviors.BehaviorActor.doRun(BehaviorActor.java:36)
    at co.paralleluniverse.actors.Actor.run0(Actor.java:691)
    at co.paralleluniverse.actors.ActorRunner.run(ActorRunner.java:51)
    at co.paralleluniverse.fibers.Fiber.run(Fiber.java:1024)
    at co.paralleluniverse.fibers.Fiber.run1(Fiber.java:1019)
    at co.paralleluniverse.fibers.Fiber.exec(Fiber.java:730)
    at co.paralleluniverse.fibers.FiberForkJoinScheduler$FiberForkJoinTask.exec1(FiberForkJoinScheduler.java:265)
    at co.paralleluniverse.concurrent.forkjoin.ParkableForkJoinTask.doExec(ParkableForkJoinTask.java:116)
    at co.paralleluniverse.concurrent.forkjoin.ParkableForkJoinTask.exec(ParkableForkJoinTask.java:73)
    at co.paralleluniverse.fibers.FiberForkJoinScheduler$FiberForkJoinTask.doExec(FiberForkJoinScheduler.java:272)
    at co.paralleluniverse.fibers.Fiber.immediateExecHelper(Fiber.java:1205)
    at co.paralleluniverse.fibers.Fiber.exec(Fiber.java:1173)
    at co.paralleluniverse.fibers.Fiber.yieldAndUnpark1(Fiber.java:691)
    at co.paralleluniverse.fibers.Fiber.yieldAndUnpark(Fiber.java:619)
    at co.paralleluniverse.strands.Strand.yieldAndUnpark(Strand.java:531)
    at co.paralleluniverse.strands.OwnedSynchronizer.signalAndWait(OwnedSynchronizer.java:66)
    at co.paralleluniverse.strands.channels.QueueChannel.signalAndWait(QueueChannel.java:109)
    at co.paralleluniverse.strands.channels.QueueChannel.send0(QueueChannel.java:245)
    at co.paralleluniverse.strands.channels.QueueChannel.sendSync(QueueChannel.java:204)
    at co.paralleluniverse.actors.Mailbox.sendSync(Mailbox.java:73)
    at co.paralleluniverse.actors.Actor.sendSync(Actor.java:429)
    at co.paralleluniverse.actors.ActorRef.sendSync(ActorRef.java:97)
    at co.paralleluniverse.actors.behaviors.RequestReplyHelper.call(RequestReplyHelper.java:173)
    at co.paralleluniverse.actors.behaviors.RequestReplyHelper.call(RequestReplyHelper.java:112)
    at service.actor.RemoteTransceiver.lambda$doRequest$1(RemoteTransceiver.java:80)
    at service.actor.RemoteTransceiver$$Lambda$52/525708236.get(Unknown Source)
    at util.Suspendables.getSuppressedInterruptable(Suspendables.java:58)
    at service.actor.SyncTransceiver.transceive(SyncTransceiver.java:33)
    at org.apache.avro.ipc.Requestor.request(Requestor.java:147)
    at org.apache.avro.ipc.Requestor.request(Requestor.java:101)
    at org.apache.avro.ipc.specific.SpecificRequestor.invoke(SpecificRequestor.java:118)
    at service.actor.ActorRegistry$RpcInvocationHandler.handleInvocation(ActorRegistry.java:212)
    at com.google.common.reflect.AbstractInvocationHandler.invoke(AbstractInvocationHandler.java:87)
    at com.sun.proxy.$Proxy16.registerHttpHandler(Unknown Source)
    at lobby.http.CommandExecutor$RegisterCommand.run(CommandExecutor.java:146)
    at service.actor.avro.ActorRpcWrapper$$Lambda$28/541874155.accept(Unknown Source)
    at service.actor.avro.ActorRpcWrapper.handleMessage(ActorRpcWrapper.java:86)
    at lobby.http.CommandExecutor.handleMessage(CommandExecutor.java:74)
    at co.paralleluniverse.actors.behaviors.BehaviorActor.behavior(BehaviorActor.java:237)
    at co.paralleluniverse.actors.behaviors.BehaviorActor.doRun(BehaviorActor.java:293)
    at co.paralleluniverse.actors.behaviors.BehaviorActor.doRun(BehaviorActor.java:36)
    at co.paralleluniverse.actors.Actor.run0(Actor.java:691)
    at co.paralleluniverse.actors.ActorRunner.run(ActorRunner.java:51)
    at co.paralleluniverse.fibers.Fiber.run(Fiber.java:1024)
    at co.paralleluniverse.fibers.Fiber.run1(Fiber.java:1019)
    at co.paralleluniverse.fibers.Fiber.exec(Fiber.java:730)
    at co.paralleluniverse.fibers.FiberForkJoinScheduler$FiberForkJoinTask.exec1(FiberForkJoinScheduler.java:265)
    at co.paralleluniverse.concurrent.forkjoin.ParkableForkJoinTask.doExec(ParkableForkJoinTask.java:116)
    at co.paralleluniverse.concurrent.forkjoin.ParkableForkJoinTask.exec(ParkableForkJoinTask.java:73)
    at jsr166e.ForkJoinTask.doExec(ForkJoinTask.java:261)
    at jsr166e.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:988)
    at jsr166e.ForkJoinPool.runWorker(ForkJoinPool.java:1628)
    at jsr166e.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

And the second

    at co.paralleluniverse.strands.OwnedSynchronizer.register(OwnedSynchronizer.java:35)
    at co.paralleluniverse.strands.channels.SingleConsumerQueueChannel.receive(SingleConsumerQueueChannel.java:87)
    at co.paralleluniverse.actors.Actor.receive(Actor.java:462)
    at co.paralleluniverse.actors.behaviors.BehaviorActor.behavior(BehaviorActor.java:236)
    at co.paralleluniverse.actors.behaviors.BehaviorActor.doRun(BehaviorActor.java:293)
    at co.paralleluniverse.actors.behaviors.BehaviorActor.doRun(BehaviorActor.java:36)
    at co.paralleluniverse.actors.Actor.run0(Actor.java:691)
    at co.paralleluniverse.actors.ActorRunner.run(ActorRunner.java:51)
    at co.paralleluniverse.fibers.Fiber.run(Fiber.java:1024)
    at co.paralleluniverse.fibers.Fiber.run1(Fiber.java:1019)
    at co.paralleluniverse.fibers.Fiber.exec(Fiber.java:730)
    at co.paralleluniverse.fibers.FiberForkJoinScheduler$FiberForkJoinTask.exec1(FiberForkJoinScheduler.java:265)
    at co.paralleluniverse.concurrent.forkjoin.ParkableForkJoinTask.doExec(ParkableForkJoinTask.java:116)
    at co.paralleluniverse.concurrent.forkjoin.ParkableForkJoinTask.exec(ParkableForkJoinTask.java:73)
    at co.paralleluniverse.fibers.FiberForkJoinScheduler$FiberForkJoinTask.doExec(FiberForkJoinScheduler.java:272)
    at co.paralleluniverse.fibers.Fiber.immediateExecHelper(Fiber.java:1205)
    at co.paralleluniverse.fibers.Fiber.exec(Fiber.java:1173)
    at co.paralleluniverse.fibers.Fiber.yieldAndUnpark1(Fiber.java:691)
    at co.paralleluniverse.fibers.Fiber.yieldAndUnpark(Fiber.java:619)
    at co.paralleluniverse.strands.Strand.yieldAndUnpark(Strand.java:531)
    at co.paralleluniverse.strands.OwnedSynchronizer.signalAndWait(OwnedSynchronizer.java:66)
    at co.paralleluniverse.strands.channels.QueueChannel.signalAndWait(QueueChannel.java:109)
    at co.paralleluniverse.strands.channels.QueueChannel.send0(QueueChannel.java:245)
    at co.paralleluniverse.strands.channels.QueueChannel.sendSync(QueueChannel.java:204)
    at co.paralleluniverse.actors.Mailbox.sendSync(Mailbox.java:73)
    at co.paralleluniverse.actors.Actor.sendSync(Actor.java:429)
    at co.paralleluniverse.actors.ActorRef.sendSync(ActorRef.java:97)
    at co.paralleluniverse.actors.behaviors.RequestReplyHelper.call(RequestReplyHelper.java:173)
    at co.paralleluniverse.actors.behaviors.RequestReplyHelper.call(RequestReplyHelper.java:112)
    at service.actor.RemoteTransceiver.lambda$doRequest$1(RemoteTransceiver.java:80)
    at service.actor.RemoteTransceiver$$Lambda$52/525708236.get(Unknown Source)
    at util.Suspendables.getSuppressedInterruptable(Suspendables.java:58)
    at service.actor.SyncTransceiver.transceive(SyncTransceiver.java:33)
    at org.apache.avro.ipc.Requestor.request(Requestor.java:147)
    at org.apache.avro.ipc.Requestor.request(Requestor.java:101)
    at org.apache.avro.ipc.specific.SpecificRequestor.invoke(SpecificRequestor.java:118)
    at service.actor.ActorRegistry$RpcInvocationHandler.handleInvocation(ActorRegistry.java:212)
    at com.google.common.reflect.AbstractInvocationHandler.invoke(AbstractInvocationHandler.java:87)
    at com.sun.proxy.$Proxy16.registerHttpHandler(Unknown Source)
    at lobby.http.CommandExecutor$RegisterCommand.run(CommandExecutor.java:146)
    at service.actor.avro.ActorRpcWrapper$$Lambda$28/541874155.accept(Unknown Source)
    at service.actor.avro.ActorRpcWrapper.handleMessage(ActorRpcWrapper.java:86)
    at lobby.http.CommandExecutor.handleMessage(CommandExecutor.java:74)
    at co.paralleluniverse.actors.behaviors.BehaviorActor.behavior(BehaviorActor.java:237)
    at co.paralleluniverse.actors.behaviors.BehaviorActor.doRun(BehaviorActor.java:293)
    at co.paralleluniverse.actors.behaviors.BehaviorActor.doRun(BehaviorActor.java:36)
    at co.paralleluniverse.actors.Actor.run0(Actor.java:691)
    at co.paralleluniverse.actors.ActorRunner.run(ActorRunner.java:51)
    at co.paralleluniverse.fibers.Fiber.run(Fiber.java:1024)
    at co.paralleluniverse.fibers.Fiber.run1(Fiber.java:1019)
    at co.paralleluniverse.fibers.Fiber.exec(Fiber.java:730)
    at co.paralleluniverse.fibers.FiberForkJoinScheduler$FiberForkJoinTask.exec1(FiberForkJoinScheduler.java:265)
    at co.paralleluniverse.concurrent.forkjoin.ParkableForkJoinTask.doExec(ParkableForkJoinTask.java:116)
    at co.paralleluniverse.concurrent.forkjoin.ParkableForkJoinTask.exec(ParkableForkJoinTask.java:73)
    at jsr166e.ForkJoinTask.doExec(ForkJoinTask.java:261)
    at jsr166e.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:988)
    at jsr166e.ForkJoinPool.runWorker(ForkJoinPool.java:1628)
    at jsr166e.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

@pron
Copy link
Contributor

pron commented Sep 17, 2015

Hi.
Can you explain what you mean exactly by

So I can inctroduce to you 2 stacktraces which are called OwnedSynchronizer.register on the same mailbox sequentially without calling OwnedSynchronizer.unregister by none of both fibers.

Especially when you say sequentially?

@hsestupin
Copy link
Contributor Author

Hi,
I meant that expected behaviour of working with OwnedSynchronizer object is the following - for each register call we had to call unregister before calling register again... Am I right? And I used word sequentially to point out there was no unregister method call between 2 sequantially
register calls. Does this make sense? Sorry for my bad explanation, hope it becomes clearer now

@pron
Copy link
Contributor

pron commented Sep 17, 2015

Yes, you are correct. But how are you reproducing it and how do you know there's no unregister?

@hsestupin
Copy link
Contributor Author

how do you know there's no unregister?

I've simply added 2 breakpoints on register and unregister methods that prints stacktraces.

But how are you reproducing

Unfortunately, I don't understand how to reproduce it, it happens quite rarely. I need some time to write a minimal showcase for reproducing it..

@pron
Copy link
Contributor

pron commented Sep 18, 2015

Do you provide any configurations to the actor's mailbox (size, policy)?

@hsestupin
Copy link
Contributor Author

No, I don't. Every actor is constructed by co.paralleluniverse.actors.behaviors.BehaviorActor#BehaviorActor(java.lang.String)

pron added a commit that referenced this issue Sep 18, 2015
@pron
Copy link
Contributor

pron commented Sep 18, 2015

I've put unregister in a finally block (where it should be anyway), but I think there must be some exception thrown for unregister not to have been called.

Anyway, can you try with 0.7.4-SNAPSHOT (built from the 0.7.0 branch and uploaded to sonatype)

@hsestupin
Copy link
Contributor Author

Yeah, for sure I can. Thanks.
(Actually you didn't have to upload new release, cause we already have our quasar fork :) )

@pron
Copy link
Contributor

pron commented Sep 18, 2015

Good. In that case just note that all bug-fixes go into the 0.7.0 branch.

@hsestupin
Copy link
Contributor Author

Unfortunately, it doesn't help. Everything remains the same, so we can conclude there is no exception thrown between calls to register and unregister.

Don't you find strange that exception message

java.lang.IllegalMonitorStateException: attempt by Fiber@10000002:Communicator[task: ParkableForkJoinTask@35c6721a(Fiber@10000002), target: co.paralleluniverse.actors.ActorRunner@2613a67d, scheduler: co.paralleluniverse.fibers.FiberForkJoinScheduler@4ef21c7a] but owned by Fiber@10000002:Communicator[task: ParkableForkJoinTask@35c6721a(Fiber@10000002), target: co.paralleluniverse.actors.ActorRunner@2613a67d, scheduler: co.paralleluniverse.fibers.FiberForkJoinScheduler@4ef21c7a]

tells us that exactly the same strand is trying to call OwnedSynchronizer.register once again?

@pron
Copy link
Contributor

pron commented Sep 18, 2015

Yes. Unfortunately my debugging capabilities are diminished due to not being able to reproduce the problem, so any further information you can provide -- like a flight recorder dump (if you can reproduce with it on) -- would be very helpful.

@hsestupin
Copy link
Contributor Author

This is so frustrating. With your commit test starts to always pass when I run gradle in debug-mode and fails without debug-mode sometimes like before...

I've just run it 100 times in debug mode and it passes always. May be it can be explained by the fact that debug-mode disables some JIT-specific optimizations and chances to make race condition happen are close to zero due to performance drop.

Ok, my next step would be to provide here flight recorder dump OR close this bug. Just need some time. thanks

@pron
Copy link
Contributor

pron commented Sep 18, 2015

No need to close this bug. It's a real race condition, and eventually we'll catch it. It isn't the first one nor the last. :)

@hsestupin
Copy link
Contributor Author

It seems I found out the reason why unregister doesn't get called. I have uninstrumented method in a callstack:

WARNING: Uninstrumented methods (marked '**') or call-sites (marked '!!') detected on the call stack: 
    at co.paralleluniverse.common.util.ExtendedStackTrace.here (ExtendedStackTrace.java:44 bci: 8)
    at co.paralleluniverse.fibers.Fiber.checkInstrumentation (Fiber.java:1618 bci: 0)
    at co.paralleluniverse.fibers.Fiber.verifySuspend (Fiber.java:1591 bci: 6)
    at co.paralleluniverse.fibers.Fiber.verifySuspend (Fiber.java:1586 bci: 3)
    at co.paralleluniverse.fibers.Fiber.park (Fiber.java:583 bci: 0)
    at co.paralleluniverse.fibers.Fiber.park (Fiber.java:595 bci: 4)
    at co.paralleluniverse.strands.Strand.park(java.lang.Object) (Strand.java:493 bci: 78)
    at co.paralleluniverse.strands.ConditionSynchronizer.await(int) (ConditionSynchronizer.java:54 bci: 255)
    at co.paralleluniverse.actors.Mailbox.await(int) (Mailbox.java:90 bci: 99)
    at co.paralleluniverse.actors.SelectiveReceiveHelper.receive(long,java.util.concurrent.TimeUnit,co.paralleluniverse.actors.MessageProcessor) (SelectiveReceiveHelper.java:145 bci: 917)
    at co.paralleluniverse.actors.behaviors.RequestReplyHelper.call(co.paralleluniverse.actors.ActorRef,co.paralleluniverse.actors.behaviors.RequestMessage,long,java.util.concurrent.TimeUnit) (RequestReplyHelper.java:174 bci: 663)
    at co.paralleluniverse.actors.behaviors.RequestReplyHelper.call(co.paralleluniverse.actors.ActorRef,co.paralleluniverse.actors.behaviors.RequestMessage) (RequestReplyHelper.java:112 bci: 335)
    at service.actor.RemoteTransceiver$GetAvroResponse.get) (RemoteTransceiver.java:102 bci: 228)
    at service.actor.RemoteTransceiver$GetAvroResponse.get) (RemoteTransceiver.java:82 bci: 1) (optimized)
    at util.Suspendables.getSuppressedInterruptable(util.function.SuspendableSupplier) (Suspendables.java:58 bci: 88)
    at service.actor.RemoteTransceiver.transceive(java.util.List) (RemoteTransceiver.java:62 bci: 132)
    at org.apache.avro.ipc.Requestor.getRemote) (Requestor.java:285 bci: 281)
    at service.actor.ActorRegistry$AsyncHandler$MyRunnable.run) (ActorRegistry.java:179 bci: 108)
    at java.util.concurrent.CompletableFuture$AsyncRun.run) (CompletableFuture.java:1618 bci: 36) **
    at service.actor.avro.ActorRpcWrapper$$Lambda$32/1406230445.accept (Unknown Source bci: 4)
    at service.actor.avro.ActorRpcWrapper.handleMessage (ActorRpcWrapper.java:92 bci: 208)
    at co.paralleluniverse.actors.behaviors.BehaviorActor.behavior (BehaviorActor.java:237 bci: 137)
    at co.paralleluniverse.actors.behaviors.BehaviorActor.doRun (BehaviorActor.java:293 bci: 122)
    at co.paralleluniverse.actors.behaviors.BehaviorActor.doRun (BehaviorActor.java:36 bci: 1) (optimized)
    at co.paralleluniverse.actors.Actor.run0 (Actor.java:697 bci: 244)
    at co.paralleluniverse.actors.ActorRunner.run (ActorRunner.java:51 bci: 148)
    at co.paralleluniverse.fibers.Fiber.run (Fiber.java:1024 bci: 11)
    at co.paralleluniverse.fibers.Fiber.run1 (Fiber.java:1019 bci: 1)

And what we've got in that case - when fiber parks it throws SuspendExecution and then in some place it's catched and is not rethrown as a RuntimeSuspendExecution. So the parked fiber couldn't be unparked correctly and it loses it stack including lines

at co.paralleluniverse.strands.ConditionSynchronizer.await(int) (ConditionSynchronizer.java:54 bci: 255)
at co.paralleluniverse.actors.Mailbox.await(int) (Mailbox.java:90 bci: 99)
at co.paralleluniverse.actors.SelectiveReceiveHelper.receive(long,java.util.concurrent.TimeUnit,co.paralleluniverse.actors.MessageProcessor) (SelectiveReceiveHelper.java:145 bci: 917)

That means method co.paralleluniverse.actors.Mailbox#unlock is never called. If my theory is correct then I need to instrument this method

at java.util.concurrent.CompletableFuture$AsyncRun.run) (CompletableFuture.java:1618 bci: 36) **

But I've tried to add

  • java.util.concurrent.CompletableFuture$AsyncRun.run or
  • java.util.concurrent.CompletableFuture.AsyncRun.run

to my META-INF/suspendables file. Unfortunately neither of those helps.
How could I force Quasar to instrument java.util.concurrent.CompletableFuture$AsyncRun.run?

@hsestupin
Copy link
Contributor Author

Yeah, it seems I was right. If I spawn all my actors with spawnThread (so it means Fiber.park will not being called) - than this problem goes away and never appears.

@pron
Copy link
Contributor

pron commented Oct 1, 2015

There should never be reason to instrument CompletableFuture. Either it's doing nothing in this case, or it's trying to do something asynchronous, while Quasar is inherently synchronous. Perhaps if you could explain what it is that your behavior method does, I could help with removing the CompletableFuture or using it differently.

@hsestupin
Copy link
Contributor Author

I've got your point! I fixed that problem, now instrumentation works fine (I got rid of CompletableFuture.AsyncRun at my stack trace) and this error never happens.

So summing up my experience I had to say that Quasar users should pay more attention to uninstrumented methods warnings otherwise they can get unpredictable behaviour of the application if SuspendExecution exception is silently ignored somewhere.

In any case I would like to ask one more question if you don't mind. Don't know is it related to this issue or not. When fiber is unparked is there any guarantee that worker thread would be the same as one before parking?

It seems for me new worker thread would be obtained from fiber-pool without any guarantee... To deal with that problem quasar provides some concurrent abstractions like co.paralleluniverse.strands.concurrent.ReentrantLock. But what should I do with third-party library classes which uses jdk native java.util.concurrent.locks.ReentrantLock ? At this point I rarely get exception:

java.lang.IllegalMonitorStateException
    at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(ReentrantLock.java:151)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1261)
    at java.util.concurrent.locks.ReentrantLock.unlock(ReentrantLock.java:457)
    at org.apache.avro.ipc.Requestor.getRemote(Requestor.java:292)

Thanks

@pron
Copy link
Contributor

pron commented Oct 5, 2015

When fiber is unparked is there any guarantee that worker thread would be the same as one before parking?

That entirely depends on the FiberScheduler. The default one does not make this guarantee.

To deal with that problem

Why is that a problem? Thread-locals travel with the fiber.

java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(ReentrantLock.java:151)

Ah, that's code that explicitly accesses the identity of the underlying thread. If you must use this in a fiber (and why does the fiber block between lock acquisition and release?) you can use a scheduler that pins fibers to threads. It shouldn't be hard to do. The simplest one is simply a scheduler that schedules all fibers onto a single thread, which is created with:

new FiberExecutorScheduler(Executors.newSingleThreadExecutor())

Of course, you are free to use different scheduler for different fibers and mix them freely.

However, in this case I would consider (again, without knowing your codebase, so it may or may not be a good suggestion) to use plain threads to call the IO library, and have them send messages to fibers over channels/actor-mailboxes.

@hsestupin
Copy link
Contributor Author

Thanks for fast reply, that makes sense. I think your last suggestion is the most suitable for us. And as a temporary solution I wrapped call org.apache.avro.ipc.Requestor.getRemote to co.paralleluniverse.fibers.FiberAsync#runBlocking(java.util.concurrent.ExecutorService, co.paralleluniverse.common.util.CheckedCallable<V,E>) with single thread pool as a first argument. And now java.util.concurrent.locks.ReentrantLock starts to work correct in a thread worker. Thanks again for your really useful responses.

I think this issue can be closed because the original problem was happening due to incomplete instrumentation of application code.

@pron
Copy link
Contributor

pron commented Oct 5, 2015

Glad to be of service.

@pron pron closed this as completed Oct 5, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants