Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

preventing CancelledKeyException #170

Merged
merged 5 commits into from
Nov 14, 2017

Conversation

HoneyryderChuck
Copy link
Contributor

@HoneyryderChuck HoneyryderChuck commented Nov 7, 2017

Fixes #168 .

I'm not sure about what to do when the failure happens though, as right now I'm just ignoring it.

@tarcieri could use some suggestion here.

Due to the non-deterministic nature of it, can't also come up with a proper unit test.

@tarcieri
Copy link
Contributor

tarcieri commented Nov 7, 2017

My suggested behavior in #168 was to trigger a pseudo-read interest, so that when this monitor selects for reading and a read is attempted, EOFError is raised.

Perhaps a good first step is to confirm this is what the libev backend does already. The easiest way to do that is probably to write a test, and make sure the behavior is consistent across both backends.

@HoneyryderChuck
Copy link
Contributor Author

The easiest way to do that is probably to write a test,

Easy if it were deterministic :) During my tries, I even found out a java.nio.channels.ClosedSelectorException which is also not guarded against.

I can tell you that, by having just a quick look at the SelectionKey and Selector APIs, that a few of the methods used in the extension are not being guarded against, such as:

  • SelectionKey#readOps (throws java.nio.channels.CancelledKeyException)
  • SelectionKey#interestOps (throws java.nio.channels.CancelledKeyException)
  • Selector#selectedKey (throws java.nio.channels.ClosedSelectorException)
  • Selector#select (throws also java.nio.channels.ClosedSelectorException, besides the IOException being saved already)

A combination of factors can make these exceptions potentially leak to nio, i.e. the selector loops in one thread and is closed in a different one. I'll see about reproducing this, but it could be a good idea just to rescue these exceptions and act accordingly (even if not in the way I did).

@tarcieri
Copy link
Contributor

tarcieri commented Nov 8, 2017

I was suggesting writing tests for the libev backend in this case, then making the Java implementation match the libev behavior (or choose a meet-in-the-middle behavior).

My assumption is under the libev backend objects will select readable if closed, so that an attempted read operation will raise EOFError.

It would be good to confirm that is actually the case.

@HoneyryderChuck
Copy link
Contributor Author

My assumption is under the libev backend objects will select readable if closed, so that an attempted read operation will raise EOFError.

Ahh, I see your point now. I'll see about doing that.

Still, all of those other exceptions should be taken into consideration, as they aren't handled anywhere.

@tarcieri
Copy link
Contributor

tarcieri commented Nov 8, 2017

Handling them will require ensuring consistent semantics with the libev-backend, even if it involves having them pseudo-select as readable when the exception is thrown.

@HoneyryderChuck
Copy link
Contributor Author

I added a test to show the inconsistency in an edge case of the libev/nio backends, namely after the selectable has been closed, the readiness changes to nil in java nio (it's :rw for libev-based API).

This is because of the SelectionKey#readyOps API, which will call sun.nio.ch.SelectionKeyImpl.ensureValid, which probably checks the state of the selectable (I'm assuming here), while the libev-based NIO_monitor_readiness accesses the monitor struct, which won't update its state based on the selectable, and returns the memoized value for revents.

I didn't managed to trigger the aforementioned Exception yet, but based on this result, I'd say that this is a fundamental difference between the two implementations.

@HoneyryderChuck
Copy link
Contributor Author

This thread also explain well what's going on. Specifically, if the key is being selected in a separate thread and handled/closed somewhere else, it might lead to the CancelledKeyException.

@HoneyryderChuck
Copy link
Contributor Author

I also have another test here, and when I comment this line here, I can reproduce it more deterministically.

What I see is that the keys are only cancelled before the select call. My question would be: shouldn't there be other places where the key should be cancelled? (When the monitor is closed, for example). And would there be thread-safety considerations? The way I see it, the problem seems to be that I'm selecting in a thread and closing the monitors/selectables/the selector itself in a separate thread.

@tarcieri
Copy link
Contributor

One way or another the behavior should be made consistent.

What I have been vaguely suggesting above is catching the CancelledKeyExceptions, wherever they may be thrown, and returning something identical to what the libev backend would return.

This may or may not involve some active tracking of a closed state in the monitor object.

If the current libev backend behavior doesn't make sense, it may need a (breaking) change.

@HoneyryderChuck
Copy link
Contributor Author

@tarcieri I really think that a better approach would be to use the #isValid API everywhere before a method call that might blow. This is the recommended solution in the links I posted, there are just too many places this can happen (list of NIO4r methods above) and I don't think that the lib will benefit from forcing parity in this case.

I also suspect that this issue never arose before because most nio4r usage in the wild don't handle read/write events at once, and isn't particularly interested in jruby compatibility, but you think otherwise, feel free to link me any open projects where that might be the case.

@HoneyryderChuck
Copy link
Contributor Author

the last commit shows the minimal set of changes needed to be done to: make my test pass, and prevent the exception for the particular case I'm using it in (when calling #readyOps).

I think that more work could be done to prevent similar exceptions with other java NIO used API (i.e.: #interestOps), but it can be done in separately. So I'd consider this done, if you think it's ready.

@HoneyryderChuck
Copy link
Contributor Author

and travis is breaking due to not being able to resolve localhost (?). Related to #172

@tarcieri
Copy link
Contributor

@HoneyryderChuck try rebasing. #172 is fixed now

@HoneyryderChuck
Copy link
Contributor Author

@tarcieri done

@tarcieri
Copy link
Contributor

This looks good as a baseline fix

@tarcieri tarcieri merged commit 4e41e89 into socketry:master Nov 14, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants