Error : "socket already registered" when upgrading from 0.1.11 to 0.1.13 #774

Closed
guydunigo opened this issue Nov 26, 2018 · 11 comments
Labels
C-bug Category: This is a bug.

Comments

@guydunigo

Version

Working version (0.1.11):
└── tokio v0.1.11
    ├── tokio-codec v0.1.1
    │   └── tokio-io v0.1.10
    ├── tokio-current-thread v0.1.4
    │   └── tokio-executor v0.1.5
    ├── tokio-executor v0.1.5 (*)
    ├── tokio-fs v0.1.4
    │   ├── tokio-io v0.1.10 (*)
    │   └── tokio-threadpool v0.1.9
    │       ├── tokio-executor v0.1.5 (*)
    │       └── tokio-io v0.1.10 (*)
    ├── tokio-io v0.1.10 (*)
    ├── tokio-reactor v0.1.7
    │   ├── tokio-executor v0.1.5 (*)
    │   └── tokio-io v0.1.10 (*)
    ├── tokio-tcp v0.1.2
    │   ├── tokio-io v0.1.10 (*)
    │   └── tokio-reactor v0.1.7 (*)
    ├── tokio-threadpool v0.1.9 (*)
    ├── tokio-timer v0.2.8
    │   └── tokio-executor v0.1.5 (*)
    ├── tokio-udp v0.1.3
    │   ├── tokio-codec v0.1.1 (*)
    │   ├── tokio-io v0.1.10 (*)
    │   └── tokio-reactor v0.1.7 (*)
    └── tokio-uds v0.2.4
        ├── tokio-codec v0.1.1 (*)
        ├── tokio-io v0.1.10 (*)
        └── tokio-reactor v0.1.7 (*)

Failing version (0.1.13):
└── tokio v0.1.13
    ├── tokio-codec v0.1.1
    │   └── tokio-io v0.1.10
    ├── tokio-current-thread v0.1.3
    │   └── tokio-executor v0.1.5
    ├── tokio-executor v0.1.5 (*)
    ├── tokio-fs v0.1.4
    │   ├── tokio-io v0.1.10 (*)
    │   └── tokio-threadpool v0.1.8
    │       ├── tokio-executor v0.1.5 (*)
    │       └── tokio-io v0.1.10 (*)
    ├── tokio-io v0.1.10 (*)
    ├── tokio-reactor v0.1.6
    │   ├── tokio-executor v0.1.5 (*)
    │   └── tokio-io v0.1.10 (*)
    ├── tokio-tcp v0.1.2
    │   ├── tokio-io v0.1.10 (*)
    │   └── tokio-reactor v0.1.6 (*)
    ├── tokio-threadpool v0.1.8 (*)
    ├── tokio-timer v0.2.8
    │   └── tokio-executor v0.1.5 (*)
    ├── tokio-udp v0.1.2
    │   ├── tokio-codec v0.1.1 (*)
    │   ├── tokio-io v0.1.10 (*)
    │   └── tokio-reactor v0.1.6 (*)
    └── tokio-uds v0.2.3
        ├── tokio-io v0.1.10 (*)
        └── tokio-reactor v0.1.6 (*)

Platform

Linux Moi-arch 4.19.2-arch1-1-ARCH #1 SMP PREEMPT Tue Nov 13 21:16:19 UTC 2018 x86_64 GNU/Linux

Description

When updating from tokio 0.1.11 to 0.1.13, an I/O error appeared: "socket already registered".
This happens when creating a new Frame on a TcpStream that already has a Frame attached to it.
For instance: I have a frame that listens for arriving packets, and I want to create another one to send messages.

I think this error comes from #660: the new frame is registered with a different reactor than the first one, hence the "already registered" error.

@carllerche
Member

Interesting... Do you have a repro?

thoughts @stjepang?

@guydunigo
Author

guydunigo commented Nov 28, 2018

Hi @carllerche, I made a simple proof-of-concept for this error: https://github.com/guydunigo/tokio_test
Just run nc -l -p 8000 in a terminal and then cargo run.
With tokio = "=0.1.13" in Cargo.toml you will get the "socket already registered" error (not on every run, but quite often); with = "=0.1.11" there is no issue.
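In short, the pattern that triggers it looks roughly like this (an illustrative sketch, not a copy of the linked repo; LinesCodec stands in for the real codec):

use tokio::codec::{Decoder, LinesCodec};
use tokio::net::TcpStream;
use tokio::prelude::*;

fn main() {
    let addr = "127.0.0.1:8000".parse().unwrap();

    let client = TcpStream::connect(&addr)
        .map_err(|e| eprintln!("connect error: {}", e))
        .and_then(|stream| {
            // Clone the socket so we can attach a second frame to it.
            let clone = stream.try_clone().expect("try_clone failed");

            // One frame that listens for arriving packets...
            let reader = LinesCodec::new()
                .framed(stream)
                .for_each(|line| {
                    println!("got: {}", line);
                    Ok(())
                })
                .map_err(|e| eprintln!("read error: {}", e));

            // ...and one frame on the clone to send messages.
            // On 0.1.13 this often fails with "socket already registered",
            // because the clone may end up registered with a different reactor.
            let writer = LinesCodec::new()
                .framed(clone)
                .send("hello".to_string())
                .map(|_| ())
                .map_err(|e| eprintln!("write error: {}", e));

            tokio::spawn(reader);
            tokio::spawn(writer);
            Ok(())
        });

    tokio::run(client);
}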

@carllerche carllerche assigned ghost Nov 28, 2018
@carllerche carllerche added the C-bug Category: This is a bug. label Nov 28, 2018
@ghost

ghost commented Nov 30, 2018

@guydunigo Thanks for the example!

I can confirm the issue is reproducible and something between 0.1.11 and 0.1.13 is causing it. Looking into it...

@ghost

ghost commented Dec 2, 2018

Ok, I've found the problem - it's in try_clone().

TL;DR: TcpStream::try_clone() creates a clone with a brand new Registration (not associated with any mio::Poll), but keeps the old SelectorId (possibly already associated with a mio::Poll). When polling the new TcpStream clone, we try to register it with a possibly different mio::Poll, which fails because the SelectorIds don't match.

Introduction

Take a look at tokio_tcp::TcpStream:

pub struct TcpStream {
    io: PollEvented<mio::net::TcpStream>,
}

PollEvented is defined like this:

pub struct PollEvented<E: Evented> {
    io: Option<E>,
    inner: Inner,
}

struct Inner {
    registration: Registration,

    /// Currently visible read readiness
    read_readiness: AtomicUsize,

    /// Currently visible write readiness
    write_readiness: AtomicUsize,
}

The registration holds a reference to the mio handle:

pub struct Registration {
    /// Stores the handle. Once set, the value is not changed.
    ///
    /// Setting this requires acquiring the lock from state.
    inner: UnsafeCell<Option<Inner>>,

    /// Tracks the state of the registration.
    ///
    /// The least significant 2 bits are used to track the lifecycle of the
    /// registration. The rest of the `state` variable is a pointer to tasks
    /// that must be notified once the lock is released.
    state: AtomicUsize,
}

If we go down the rabbit hole and look at what this Option<Inner> field is, we'll eventually see that it holds a weak reference to a mio::Poll. That's the Poll instance this Registration is associated with.

Once a Registration is associated with a Poll, it is tied to it forever and can only be used with that particular Poll.

The problem

Suppose we have an instance of tokio_tcp::TcpStream and call .try_clone() on it:

pub fn try_clone(&self) -> io::Result<TcpStream> {
    let io = self.io.get_ref().try_clone()?;
    Ok(TcpStream::new(io))
}

Now let's see what goes on in self.io.get_ref().try_clone() (this is mio::net::TcpStream::try_clone()):

pub fn try_clone(&self) -> io::Result<TcpStream> {
    self.sys.try_clone().map(|s| {
        TcpStream {
            sys: s,
            selector_id: self.selector_id.clone(),
        }
    })
}

Now, mio::net::TcpStream is defined like this:

pub struct TcpStream {
    sys: sys::TcpStream,
    selector_id: SelectorId,
}

The selector_id identifies the mio::Poll instance the socket is registered with.

So the crux of the problem is this: when cloning a TcpStream, we create a new Registration that is not associated with any mio::Poll instance, but we keep the old selector_id, which may already be associated with a mio::Poll.

This means the new TcpStream clone will try to register with whatever the current reactor is (because the new Registration is not associated with any reactor), but registration will fail because our selector_id does not match the selector_id of the mio::Poll we're registering with.
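For reference, the error string itself comes from mio's SelectorId check. Here is a small self-contained model of that check (illustrative only, not mio's actual code; poll_id stands in for mio's internal selector id):

use std::io;
use std::sync::atomic::{AtomicUsize, Ordering};

// A socket remembers the id of the first Poll it was registered with;
// registering it with a Poll that has a different id fails.
struct SelectorId {
    id: AtomicUsize,
}

impl SelectorId {
    fn associate_selector(&self, poll_id: usize) -> io::Result<()> {
        let current = self.id.load(Ordering::SeqCst);
        if current != 0 && current != poll_id {
            return Err(io::Error::new(
                io::ErrorKind::Other,
                "socket already registered",
            ));
        }
        self.id.store(poll_id, Ordering::SeqCst);
        Ok(())
    }
}

fn main() -> io::Result<()> {
    let selector_id = SelectorId { id: AtomicUsize::new(0) };
    selector_id.associate_selector(1)?; // first reactor: ok
    assert!(selector_id.associate_selector(2).is_err()); // second reactor: "socket already registered"
    Ok(())
}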

Solution

I tried a very simple change in mio::net::TcpStream::try_clone() - when cloning a TcpStream, create a new uninitialized SelectorId:

pub fn try_clone(&self) -> io::Result<TcpStream> {
    self.sys.try_clone().map(|s| {
        TcpStream {
            sys: s,
            // selector_id: self.selector_id.clone(), // Buggy.
            selector_id: SelectorId::new(), // Yay, fixed!
        }
    })
}

This seems to fix the reported issue, but I'm not sure whether it's the correct solution because I don't know whether a clone of a TcpStream is allowed to be associated with a different mio::Poll. My guess is yes, but I'm not sure. @carllerche, thoughts?

If this is correct, should we apply the same fix in other places, e.g. mio::net::UdpSocket::try_clone()?

@carllerche
Member

@stjepang I feel like we discussed this in Gitter. Do you remember the conclusion? IIRC, we came to the conclusion that sockets could not be cloned due to how things worked.

In this case, the solution would be to deprecate try_clone and update it to always return an error.

@guydunigo could you explain your use case for try_clone?

@carllerche
Member

We should probably follow up with better splitting on sockets.
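For context, tokio 0.1 already lets you split a single TcpStream into a read half and a write half without cloning the socket; a minimal sketch (essentially the stock echo-server pattern) looks like this:

use tokio::io::{copy, AsyncRead};
use tokio::net::TcpListener;
use tokio::prelude::*;

fn main() {
    let addr = "127.0.0.1:8000".parse().unwrap();
    let listener = TcpListener::bind(&addr).unwrap();

    let server = listener
        .incoming()
        .map_err(|e| eprintln!("accept error: {}", e))
        .for_each(|stream| {
            // Split one socket into independent read/write halves;
            // no try_clone() involved.
            let (reader, writer) = stream.split();
            // Echo everything back as a simple demonstration.
            let echo = copy(reader, writer)
                .map(|_| ())
                .map_err(|e| eprintln!("io error: {}", e));
            tokio::spawn(echo)
        });

    tokio::run(server);
}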

@ancwrd1

ancwrd1 commented Jan 7, 2019

How do I send a Shutdown(Shutdown::Write) to a socket that has been split?
Before that change to try_clone I could clone the stream and call a shutdown method on it.

The use case is: split a stream, spawn two futures (read and write), and when the write completes, shut down the write side so that the remote peer shuts down and the read side terminates.
This is a legitimate case, e.g. for a print protocol (printer port 9100).
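Roughly the pattern that used to work with try_clone (an illustrative sketch only; LinesCodec and the names are placeholders, not my actual code):

use std::net::Shutdown;
use tokio::codec::{Decoder, LinesCodec};
use tokio::net::TcpStream;
use tokio::prelude::*;

// Send one job to the printer, then shut down the write side so the printer
// sees EOF; the read side runs until the printer closes the connection.
fn print_job(stream: TcpStream, job: String) -> impl Future<Item = (), Error = ()> {
    // Second handle to the same socket, kept only so we can call shutdown() later.
    let control = stream.try_clone().expect("try_clone failed");

    let (sink, source) = LinesCodec::new().framed(stream).split();

    let write = sink
        .send(job)
        .and_then(move |_sink| control.shutdown(Shutdown::Write))
        .map_err(|e| eprintln!("write error: {}", e));

    let read = source
        .for_each(|line| {
            println!("printer says: {}", line);
            Ok(())
        })
        .map_err(|e| eprintln!("read error: {}", e));

    write.join(read).map(|_| ())
}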

@ancwrd1

ancwrd1 commented Jan 7, 2019

I tried tokio::io::shutdown(writer), but the TcpStream implementation seems to lack the actual shutdown logic:

impl<'a> AsyncWrite for &'a TcpStream {
    fn shutdown(&mut self) -> Poll<(), io::Error> {
        Ok(().into())
    }
}

Is that missing functionality (a bug)?

@artemii235

Removing try_clone causes issues with the following use case (a bit of pseudocode included):

  1. I'm connecting to a remote JSON-RPC server working over TCP: TcpStream::connect(&addr).
  2. I assume that if I don't receive any data from it during the last 60 seconds, there's some issue with the connection and I need to shut it down and reconnect:
let last_chunk = Arc::new(AtomicU64::new(now_ms()));
let stream_clone = stream.try_clone().unwrap();
let interval = Interval::new_interval(Duration::from_secs(60));
interval.for_each(move |_| {
    let last = last_chunk.load(AtomicOrdering::Relaxed);
    if now_ms() - last > 60 {
        stream_clone.shutdown(Shutdown::Both).unwrap();
        // return err to stop the interval execution
        return futures::future::err(());
    }
    futures::future::ok(())
})
  3. Next I split the initial stream with the codec. The codec consumes the initial stream. Then I deal with the Sink and Stream, which is not too important; it's enough to know that the Stream part updates last_chunk.
let (sink, stream) = Bytes.framed(stream).split();
// ... deal with sink and stream, update last_chunk when data is received

I guess the SelectorId issue doesn't affect this use case, because I never poll the cloned stream; I only need a copy so I can forcefully shut it down in some circumstances.

Maybe I'm also doing something wrong and there's an alternative way to do this?

@ghost

ghost commented Apr 25, 2019

@carllerche What if we just wrap the PollEvented inside TcpStream in an Arc<Mutex<_>> and lock it on every operation? That way we could clone a TcpStream by cloning the Arc and everything works out.

The only problem is that we'd pay a performance penalty due to locking, but maybe it's not too bad?

@carllerche
Member

@stjepang I'm not sure how that would solve the problem. IIRC, to handle it, Tokio would have to keep track of all outstanding clones and fan out notifications to all of them.

@artemii235 re: your example, you should be able to do this without cloning. Instead, just drop the stream on idle.
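For illustration, here is a sketch of what "drop the stream on idle" could look like in tokio 0.1, assuming tokio::timer::Timeout resets its deadline each time the stream yields an item; LinesCodec is just a stand-in for the real codec:

use std::time::Duration;
use tokio::codec::{Decoder, LinesCodec};
use tokio::net::TcpStream;
use tokio::prelude::*;
use tokio::timer::Timeout;

// If no data arrives for 60 seconds, the timeout yields an error, the task
// ends, and dropping the TcpStream closes the socket; no try_clone() needed.
fn handle(stream: TcpStream) -> impl Future<Item = (), Error = ()> {
    let (_sink, lines) = LinesCodec::new().framed(stream).split();

    Timeout::new(lines, Duration::from_secs(60))
        .for_each(|line| {
            println!("got: {}", line);
            Ok(())
        })
        .map_err(|e| eprintln!("connection idle or broken: {}", e))
}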
