Error : "socket already registered" when upgrading from 0.1.11 to 0.1.13 #774

Closed
guydunigo opened this issue Nov 26, 2018 · 11 comments
Labels
C-bug Category: This is a bug.

Comments

@guydunigo

Version

Working version (0.1.11):
└── tokio v0.1.11
    ├── tokio-codec v0.1.1
    │   └── tokio-io v0.1.10
    ├── tokio-current-thread v0.1.4
    │   └── tokio-executor v0.1.5
    ├── tokio-executor v0.1.5 (*)
    ├── tokio-fs v0.1.4
    │   ├── tokio-io v0.1.10 (*)
    │   └── tokio-threadpool v0.1.9
    │       ├── tokio-executor v0.1.5 (*)
    │       └── tokio-io v0.1.10 (*)
    ├── tokio-io v0.1.10 (*)
    ├── tokio-reactor v0.1.7
    │   ├── tokio-executor v0.1.5 (*)
    │   └── tokio-io v0.1.10 (*)
    ├── tokio-tcp v0.1.2
    │   ├── tokio-io v0.1.10 (*)
    │   └── tokio-reactor v0.1.7 (*)
    ├── tokio-threadpool v0.1.9 (*)
    ├── tokio-timer v0.2.8
    │   └── tokio-executor v0.1.5 (*)
    ├── tokio-udp v0.1.3
    │   ├── tokio-codec v0.1.1 (*)
    │   ├── tokio-io v0.1.10 (*)
    │   └── tokio-reactor v0.1.7 (*)
    └── tokio-uds v0.2.4
        ├── tokio-codec v0.1.1 (*)
        ├── tokio-io v0.1.10 (*)
        └── tokio-reactor v0.1.7 (*)

Failing version (0.1.13):
└── tokio v0.1.13
    ├── tokio-codec v0.1.1
    │   └── tokio-io v0.1.10
    ├── tokio-current-thread v0.1.3
    │   └── tokio-executor v0.1.5
    ├── tokio-executor v0.1.5 (*)
    ├── tokio-fs v0.1.4
    │   ├── tokio-io v0.1.10 (*)
    │   └── tokio-threadpool v0.1.8
    │       ├── tokio-executor v0.1.5 (*)
    │       └── tokio-io v0.1.10 (*)
    ├── tokio-io v0.1.10 (*)
    ├── tokio-reactor v0.1.6
    │   ├── tokio-executor v0.1.5 (*)
    │   └── tokio-io v0.1.10 (*)
    ├── tokio-tcp v0.1.2
    │   ├── tokio-io v0.1.10 (*)
    │   └── tokio-reactor v0.1.6 (*)
    ├── tokio-threadpool v0.1.8 (*)
    ├── tokio-timer v0.2.8
    │   └── tokio-executor v0.1.5 (*)
    ├── tokio-udp v0.1.2
    │   ├── tokio-codec v0.1.1 (*)
    │   ├── tokio-io v0.1.10 (*)
    │   └── tokio-reactor v0.1.6 (*)
    └── tokio-uds v0.2.3
        ├── tokio-io v0.1.10 (*)
        └── tokio-reactor v0.1.6 (*)

Platform

Linux Moi-arch 4.19.2-arch1-1-ARCH #1 SMP PREEMPT Tue Nov 13 21:16:19 UTC 2018 x86_64 GNU/Linux

Description

When updating from tokio 0.1.11 to 0.1.13, an I/O error appeared: "socket already registered".
This happens when creating a new Frame on a TcpStream that already has a Frame attached to it.
For instance: I have a frame that listens for arriving packets, and I want to create another one to send messages.

I think this error comes from #660: the new frame is registered with a different reactor than the first one, hence the "already registered" error.

@carllerche
Member

Interesting... Do you have a repro?

thoughts @stjepang?

@guydunigo
Author

guydunigo commented Nov 28, 2018

Hi @carllerche, I made a simple proof-of-concept for this error: https://github.com/guydunigo/tokio_test
Just run nc -l -p 8000 in a terminal and then cargo run.
With tokio = "=0.1.13" in Cargo.toml you will get the "socket already registered" error (not on every run, but quite often); with = "=0.1.11" there is no issue.
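In short, the pattern that triggers it looks roughly like this (an illustrative sketch, not a copy of the linked repo; LinesCodec stands in for the real codec):

use tokio::codec::{Decoder, LinesCodec};
use tokio::net::TcpStream;
use tokio::prelude::*;

fn main() {
    let addr = "127.0.0.1:8000".parse().unwrap();

    let client = TcpStream::connect(&addr)
        .map_err(|e| eprintln!("connect error: {}", e))
        .and_then(|stream| {
            // Clone the socket so we can attach a second frame to it.
            let clone = stream.try_clone().expect("try_clone failed");

            // One frame that listens for arriving packets...
            let reader = LinesCodec::new()
                .framed(stream)
                .for_each(|line| {
                    println!("got: {}", line);
                    Ok(())
                })
                .map_err(|e| eprintln!("read error: {}", e));

            // ...and one frame on the clone to send messages.
            // On 0.1.13 this often fails with "socket already registered",
            // because the clone may end up registered with a different reactor.
            let writer = LinesCodec::new()
                .framed(clone)
                .send("hello".to_string())
                .map(|_| ())
                .map_err(|e| eprintln!("write error: {}", e));

            tokio::spawn(reader);
            tokio::spawn(writer);
            Ok(())
        });

    tokio::run(client);
}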

@carllerche carllerche assigned ghost Nov 28, 2018
@carllerche carllerche added the C-bug Category: This is a bug. label Nov 28, 2018
@ghost

ghost commented Nov 30, 2018

@guydunigo Thanks for the example!

I can confirm the issue is reproducible and something between 0.1.11 and 0.1.13 is causing it. Looking into it...

@ghost

ghost commented Dec 2, 2018

Ok, I've found the problem - it's in try_clone().

TL;DR: TcpStream::try_clone() creates a clone with a brand new Registration (not associated with any mio::Poll), but keeps the old SelectorId (possibly already associated with a mio::Poll). When polling the new TcpStream clone, we try to register it with a possibly different mio::Poll, which fails because the SelectorIds don't match.

Introduction

Take a look at tokio_tcp::TcpStream:

pub struct TcpStream {
    io: PollEvented<mio::net::TcpStream>,
}

PollEvented is defined like this:

pub struct PollEvented<E: Evented> {
    io: Option<E>,
    inner: Inner,
}

struct Inner {
    registration: Registration,

    /// Currently visible read readiness
    read_readiness: AtomicUsize,

    /// Currently visible write readiness
    write_readiness: AtomicUsize,
}

The registration holds a reference to the mio handle:

pub struct Registration {
    /// Stores the handle. Once set, the value is not changed.
    ///
    /// Setting this requires acquiring the lock from state.
    inner: UnsafeCell<Option<Inner>>,

    /// Tracks the state of the registration.
    ///
    /// The least significant 2 bits are used to track the lifecycle of the
    /// registration. The rest of the `state` variable is a pointer to tasks
    /// that must be notified once the lock is released.
    state: AtomicUsize,
}

If we go down the rabbit hole and look at what this Option<Inner> field is, we'll eventually see that it holds a weak reference to a mio::Poll. That's the Poll instance this Registration is associated with.

Once a Registration is associated with a Poll, it is tied to it forever and can only be used with that particular Poll.

The problem

Suppose we have an instance of tokio_tcp::TcpStream and call .try_clone() on it:

pub fn try_clone(&self) -> io::Result<TcpStream> {
    let io = self.io.get_ref().try_clone()?;
    Ok(TcpStream::new(io))
}

Now let's see what goes on in self.io.get_ref().try_clone() (this is mio::net::TcpStream::try_clone()):

pub fn try_clone(&self) -> io::Result<TcpStream> {
    self.sys.try_clone().map(|s| {
        TcpStream {
            sys: s,
            selector_id: self.selector_id.clone(),
        }
    })
}

Now, mio::net::TcpStream is defined like this:

pub struct TcpStream {
    sys: sys::TcpStream,
    selector_id: SelectorId,
}

The selector_id identifies the mio::Poll instance the socket is registered with.

So the crux of the problem is this: when cloning a TcpStream, we create a new Registration that is not associated with any mio::Poll instance, but we keep the old selector_id, which may already be associated with a mio::Poll.

This means the new TcpStream clone will try to register with whatever the current reactor is (because the new Registration is not associated with any reactor), but registration will fail because our selector_id does not match the selector_id of the mio::Poll we're registering with.
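For reference, the error string itself comes from mio's SelectorId check. Here is a small self-contained model of that check (illustrative only, not mio's actual code; poll_id stands in for mio's internal selector id):

use std::io;
use std::sync::atomic::{AtomicUsize, Ordering};

// A socket remembers the id of the first Poll it was registered with;
// registering it with a Poll that has a different id fails.
struct SelectorId {
    id: AtomicUsize,
}

impl SelectorId {
    fn associate_selector(&self, poll_id: usize) -> io::Result<()> {
        let current = self.id.load(Ordering::SeqCst);
        if current != 0 && current != poll_id {
            return Err(io::Error::new(
                io::ErrorKind::Other,
                "socket already registered",
            ));
        }
        self.id.store(poll_id, Ordering::SeqCst);
        Ok(())
    }
}

fn main() -> io::Result<()> {
    let selector_id = SelectorId { id: AtomicUsize::new(0) };
    selector_id.associate_selector(1)?; // first reactor: ok
    assert!(selector_id.associate_selector(2).is_err()); // second reactor: "socket already registered"
    Ok(())
}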

Solution

I tried a very simple change in mio::net::TcpStream::try_clone() - when cloning a TcpStream, create a new uninitialized SelectorId:

pub fn try_clone(&self) -> io::Result<TcpStream> {
    self.sys.try_clone().map(|s| {
        TcpStream {
            sys: s,
            // selector_id: self.selector_id.clone(), // Buggy.
            selector_id: SelectorId::new(), // Yay, fixed!
        }
    })
}

This seems to fix the reported issue, but I'm not sure whether it's the correct solution because I don't know whether a clone of a TcpStream is allowed to be associated with a different mio::Poll. My guess is yes, but I'm not sure. @carllerche, thoughts?

If this is correct, should we apply the same fix in other places, e.g. mio::net::UdpSocket::try_clone()?

@carllerche
Member

@stjepang I feel like we discussed this in Gitter. Do you remember the conclusion? IIRC, we came to the conclusion that sockets could not be cloned due to how things worked.

In this case, the solution would be to deprecate try_clone and update it to always return an error.

@guydunigo could you explain your use case for try_clone?

@carllerche
Member

We should probably follow up with better splitting on sockets.
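For context, tokio 0.1 already lets you split a single TcpStream into a read half and a write half without cloning the socket; a minimal sketch (essentially the stock echo-server pattern) looks like this:

use tokio::io::{copy, AsyncRead};
use tokio::net::TcpListener;
use tokio::prelude::*;

fn main() {
    let addr = "127.0.0.1:8000".parse().unwrap();
    let listener = TcpListener::bind(&addr).unwrap();

    let server = listener
        .incoming()
        .map_err(|e| eprintln!("accept error: {}", e))
        .for_each(|stream| {
            // Split one socket into independent read/write halves;
            // no try_clone() involved.
            let (reader, writer) = stream.split();
            // Echo everything back as a simple demonstration.
            let echo = copy(reader, writer)
                .map(|_| ())
                .map_err(|e| eprintln!("io error: {}", e));
            tokio::spawn(echo)
        });

    tokio::run(server);
}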

@ancwrd1

ancwrd1 commented Jan 7, 2019

How do I send a Shutdown(Shutdown::Write) to a socket that has been split?
Before that change to try_clone I could clone the stream and call a shutdown method on it.

The use case is: split a stream, spawn two futures (read and write), and when the write completes, shut down the write side so that the remote peer shuts down and the read side terminates.
This is a legitimate case, e.g. for a print protocol (printer port 9100).
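Roughly the pattern that used to work with try_clone (an illustrative sketch only; LinesCodec and the names are placeholders, not my actual code):

use std::net::Shutdown;
use tokio::codec::{Decoder, LinesCodec};
use tokio::net::TcpStream;
use tokio::prelude::*;

// Send one job to the printer, then shut down the write side so the printer
// sees EOF; the read side runs until the printer closes the connection.
fn print_job(stream: TcpStream, job: String) -> impl Future<Item = (), Error = ()> {
    // Second handle to the same socket, kept only so we can call shutdown() later.
    let control = stream.try_clone().expect("try_clone failed");

    let (sink, source) = LinesCodec::new().framed(stream).split();

    let write = sink
        .send(job)
        .and_then(move |_sink| control.shutdown(Shutdown::Write))
        .map_err(|e| eprintln!("write error: {}", e));

    let read = source
        .for_each(|line| {
            println!("printer says: {}", line);
            Ok(())
        })
        .map_err(|e| eprintln!("read error: {}", e));

    write.join(read).map(|_| ())
}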

@ancwrd1

ancwrd1 commented Jan 7, 2019

I tried tokio::io::shutdown(writer), but the TcpStream implementation seems to lack the actual shutdown logic:

impl<'a> AsyncWrite for &'a TcpStream {
    fn shutdown(&mut self) -> Poll<(), io::Error> {
        Ok(().into())
    }
}

Is that missing functionality (a bug)?

@artemii235

Removing try_clone causes issues with the following use case (a bit of pseudocode included):

  1. I'm connecting to a remote JSON-RPC server working over TCP: TcpStream::connect(&addr).
  2. I assume that if I don't receive any data from it during the last 60 seconds, there's some issue with the connection and I need to shut it down and reconnect:
let last_chunk = Arc::new(AtomicU64::new(now_ms()));
let stream_clone = stream.try_clone().unwrap();
let interval = Interval::new_interval(Duration::from_secs(60));
interval.for_each(move |_| {
    let last = last_chunk.load(AtomicOrdering::Relaxed);
    if now_ms() - last > 60 {
        stream_clone.shutdown(Shutdown::Both).unwrap();
        // return err to stop the interval execution
        return futures::future::err(());
    }
    futures::future::ok(())
})
  3. Next I split the initial stream with the codec. The codec consumes the initial stream. Then I deal with the Sink and Stream, which is not too important; it's enough to know that the Stream part updates last_chunk.
let (sink, stream) = Bytes.framed(stream).split();
// ... deal with sink and stream, update last_chunk when data is received

I guess the SelectorId issue doesn't affect this use case, because I never poll the cloned stream; I only need a copy so I can forcefully shut it down in some circumstances.

Maybe I'm also doing something wrong and there's an alternative way to do this?

@ghost

ghost commented Apr 25, 2019

@carllerche What if we just wrap the PollEvented inside TcpStream in an Arc<Mutex<_>> and lock it on every operation? That way we could clone a TcpStream by cloning the Arc and everything works out.

The only problem is that we'd pay a performance penalty due to locking, but maybe it's not too bad?

@carllerche
Member

@stjepang I'm not sure how that would solve the problem. IIRC, to handle it, Tokio would have to keep track of all outstanding clones and fan out notifications to all of them.

@artemii235 re: your example, you should be able to do this without cloning. Instead, just drop the stream on idle.
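For illustration, here is a sketch of what "drop the stream on idle" could look like in tokio 0.1, assuming tokio::timer::Timeout resets its deadline each time the stream yields an item; LinesCodec is just a stand-in for the real codec:

use std::time::Duration;
use tokio::codec::{Decoder, LinesCodec};
use tokio::net::TcpStream;
use tokio::prelude::*;
use tokio::timer::Timeout;

// If no data arrives for 60 seconds, the timeout yields an error, the task
// ends, and dropping the TcpStream closes the socket; no try_clone() needed.
fn handle(stream: TcpStream) -> impl Future<Item = (), Error = ()> {
    let (_sink, lines) = LinesCodec::new().framed(stream).split();

    Timeout::new(lines, Duration::from_secs(60))
        .for_each(|line| {
            println!("got: {}", line);
            Ok(())
        })
        .map_err(|e| eprintln!("connection idle or broken: {}", e))
}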
