
Streams recreate themselves after ending #24

Open
tegefaulkes opened this issue Oct 10, 2023 · 11 comments
Labels
development Standard development r&d:polykey:supporting activity Supporting core activity

Comments

@tegefaulkes
Contributor

Specification

After a stream has ended, it seems to re-create itself with the same streamId and gets emitted as a new stream event. This happens after creating a single stream and ending it gracefully.

Here is a reproducible example.

import * as testsUtils from "./utils";
import { promise } from "@/utils";
import * as events from "@/events";
import WebSocketServer from "@/WebSocketServer";
import WebSocketClient from "@/WebSocketClient";
import Logger, { formatting, LogLevel, StreamHandler } from "@matrixai/logger";
import { KeyTypes, sleep } from "./utils";
import { WebSocketConnection, WebSocketStream } from "@";


test('quick test', async () => {
  const logger = new Logger(`${WebSocketClient.name} Test`, LogLevel.INFO, [
    new StreamHandler(
      formatting.format`${formatting.level}:${formatting.keys}:${formatting.msg}`,
    ),
  ]);
  const localhost = '127.0.0.1';
  const types: Array<KeyTypes> = ['RSA', 'ECDSA', 'ED25519'];
  // Const types: Array<KeyTypes> = ['RSA'];
  const defaultType = types[0];

  const tlsConfigServer = await testsUtils.generateConfig(defaultType);

  const connectionEventProm =
    promise<WebSocketConnection>();
  const server = new WebSocketServer({
    logger: logger.getChild(WebSocketServer.name),
    config: {
      key: tlsConfigServer.key,
      cert: tlsConfigServer.cert,
      verifyPeer: false,
    },
  });
  server.addEventListener(
    events.EventWebSocketServerConnection.name,
    (e: events.EventWebSocketServerConnection) =>
      connectionEventProm.resolveP(e.detail),
  );
  await server.start({
    host: localhost,
  });
  // If the server is slow to respond then this will time out.
  // The main cause of this was the server not processing the initial packet
  // that creates the `WebSocketConnection`; as a result, the whole creation
  // waited an extra 1 second for the client to retry the initial packet.
  const client = await WebSocketClient.createWebSocketClient(
    {
      host: localhost,
      port: server.port,
      logger: logger.getChild(WebSocketClient.name),
      config: {
        verifyPeer: false,
      },
    },
    { timer: 500 },
  );
  client.connection.addEventListener(
    events.EventWebSocketConnectionStream.name,
    (evt: events.EventWebSocketConnectionStream) => {
      logger.warn('stream emitted by CLIENT connection');
    },
  );
  const serverConnection = await connectionEventProm.p;
  const prom = promise<WebSocketStream>();
  serverConnection.addEventListener(
    events.EventWebSocketConnectionStream.name,
    (evt: events.EventWebSocketConnectionStream) => {
      logger.warn('stream emitted by SERVER connection');
      prom.resolveP(evt.detail);
    },
  );
  console.log('starting stream');
  const streamLocal = await client.connection.newStream();
  const streamPeer = await prom.p;
  const message = Buffer.from('message!');
  console.log('asd');
  const writerLocal = streamLocal.writable.getWriter();
  await writerLocal.write(message);
  await writerLocal.close();
  for await (const message of streamPeer.readable) {
    console.log(Buffer.from(message).toString());
  }
  const writerPeer = streamPeer.writable.getWriter();
  await writerPeer.write(message);
  await writerPeer.close();
  for await (const message of streamLocal.readable) {
    console.log(Buffer.from(message).toString());
  }
  console.log('waiting');
  await sleep(2000);
  console.log('waited!');

  await client.destroy({ force: true });
  await server.stop({ force: true });
});

I have already applied a quick fix to this in the WebSocketConnection with the following.

    let stream = this.streamMap.get(streamId);
    if (stream == null) {
      // FIXME: tempfix for preventing dead stream re-creation
      if (this.usedIdSet.has(streamId)) {
        // Silently ignore the message
        return;
      } else {
        this.usedIdSet.add(streamId);
      }
// ...
// ...

But this is far from ideal. The whole problem needs to be investigated and a proper fix applied for it.

Additional context

Tasks

  1. Remove the temp fix.
  2. Investigate the problem.
  3. Apply a proper fix.
@tegefaulkes tegefaulkes added the development Standard development label Oct 10, 2023
@amydevs
Member

amydevs commented Oct 10, 2023

I think this is because of a race condition in how streams are created.

The opening of a stream happens when an ACK message is received. So if the peer sends an ACK after a stream has been closed locally, the stream will be re-opened.
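The race can be shown with a minimal sketch (the names here are hypothetical, not the actual js-ws code): a naive get-or-create handler will resurrect a closed stream whenever a late ACK arrives carrying the same streamId.

```typescript
// Naive get-or-create handling of ACKs, as a sketch of the bug above.
class NaiveConnection {
  public streamMap = new Map<number, { open: boolean }>();

  public onAck(streamId: number): void {
    let stream = this.streamMap.get(streamId);
    if (stream == null) {
      // Bug: the stream may have existed and been closed already;
      // this recreates it and re-emits it as a "new stream" event.
      stream = { open: true };
      this.streamMap.set(streamId, stream);
    }
  }

  public closeStream(streamId: number): void {
    this.streamMap.delete(streamId);
  }
}

const c = new NaiveConnection();
c.onAck(0);        // stream 0 created
c.closeStream(0);  // stream 0 closed gracefully
c.onAck(0);        // late ACK: stream 0 is resurrected
```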

@amydevs
Member

amydevs commented Oct 10, 2023

Possible fixes could be

  • Implement a handshake for stream closing/opening
  • Have a separate message type for opening a stream
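The second option could look something like the following sketch. All names (`StreamMessageType`, `StreamTable`, `handleMessage`) are illustrative assumptions, not the js-ws API: only an explicit `Open` message may create a stream, so a late `Ack` can never recreate one.

```typescript
// Hypothetical message types for a dedicated stream-open handshake.
enum StreamMessageType {
  Open = 0, // sent exactly once by the stream initiator
  Ack = 1,
  Data = 2,
  Close = 3,
}

type StreamId = number;

class StreamTable {
  protected streams = new Set<StreamId>();
  protected closed = new Set<StreamId>();

  // Only an explicit Open creates a stream; Ack/Data for an unknown
  // or already-closed streamId is ignored rather than recreating state.
  public handleMessage(
    type: StreamMessageType,
    streamId: StreamId,
  ): 'created' | 'dispatched' | 'ignored' {
    if (this.closed.has(streamId)) return 'ignored';
    if (type === StreamMessageType.Open) {
      this.streams.add(streamId);
      return 'created';
    }
    if (!this.streams.has(streamId)) return 'ignored';
    if (type === StreamMessageType.Close) {
      this.streams.delete(streamId);
      this.closed.add(streamId);
    }
    return 'dispatched';
  }
}
```

The trade-off is that the receiver must remember closed stream IDs (or a threshold over them) to tell a late ACK apart from a genuinely new stream.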

@CMCDragonkai
Member

The opening of stream happens when an ACK message is received. So if the peer sends an ACK after a stream has been closed locally, the stream will be re-opened

This actually occurred before in js-quic. There, too, a concurrent message triggered the re-opening of the stream.

I forget how we dealt with it, but @tegefaulkes has more details here.

@tegefaulkes
Contributor Author

This is happening during a graceful close. Nothing concurrent or otherwise unusual is happening that should be causing this.

What happened in js-quic was triggered by a concurrent cancellation of both sides of the stream. I don't think it's similar here.

@CMCDragonkai
Member

CMCDragonkai commented Oct 10, 2023

One of the ways we dealt with this was to make QUICStream.read and QUICStream.write synchronous methods.

Then when we process a stream event, we always create the stream whether it is readable or writable, and if the stream already exists, we run the read or write operation instead.

So if you look at QUICConnection.processStreams it looks very symmetric.
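A rough sketch of that symmetric pattern, with hypothetical names standing in for the actual QUICConnection/QUICStream internals: one handler get-or-creates the stream, then dispatches the matching synchronous operation.

```typescript
type StreamId = number;

// Stand-in for QUICStream with synchronous read/write (names assumed).
class SketchStream {
  public readLog: Array<string> = [];
  public writeLog: Array<string> = [];
  public read(data: string): void { this.readLog.push(data); }
  public write(data: string): void { this.writeLog.push(data); }
}

class SketchConnection {
  protected streamMap = new Map<StreamId, SketchStream>();

  // Symmetric handling: whether the event is readable or writable,
  // get-or-create the stream, then run the matching synchronous op.
  public processStreamEvent(
    streamId: StreamId,
    dir: 'read' | 'write',
    data: string,
  ): SketchStream {
    let stream = this.streamMap.get(streamId);
    if (stream == null) {
      stream = new SketchStream();
      this.streamMap.set(streamId, stream);
    }
    if (dir === 'read') stream.read(data);
    else stream.write(data);
    return stream;
  }
}
```

Note this sketch alone does not fix the resurrection bug; it still needs the closed-stream guard discussed below it in the thread.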

@CMCDragonkai
Member

It can still be concurrent on the network: the stream may be recreated because the peer's ACK is only received after the stream has already been destroyed.

Point is, we need some sort of condition that says that once a stream is closed/destroyed, it can never be recreated in any form. If that requires keeping track of the peer's stream ID counter, so be it.

Although that seems similar to MatrixAI/js-quic#66

@CMCDragonkai
Member

The temp fix above isn't really efficient: stream IDs are always monotonically incremented, so it's sufficient to simply keep a counter as the threshold.
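Assuming monotonic stream IDs, a minimal sketch of that counter-based guard (hypothetical names, not the js-ws API): a single threshold replaces the unbounded `usedIdSet`.

```typescript
// Counter-based guard: because stream IDs only ever increase, one
// threshold value is enough to reject any already-consumed ID.
class StreamIdGuard {
  protected nextPeerStreamId: number = 0;

  // Returns true if this streamId may create a new stream.
  public allowCreate(streamId: number): boolean {
    if (streamId < this.nextPeerStreamId) {
      // ID already consumed: a late ACK for a dead stream; ignore it.
      return false;
    }
    this.nextPeerStreamId = streamId + 1;
    return true;
  }
}
```

This is O(1) memory versus a Set that grows with every stream ever opened.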

@CMCDragonkai
Member

CMCDragonkai commented Oct 10, 2023

For js-quic, we are prompting quiche to shut down the underlying quiche state, and we're trying to maintain consistency between QUICStream and quiche state.

For js-ws, there is no underlying library for our streams. All state is just in the JS side (that we invented and have full control over).

Therefore, if there's a pending ACK that may end up recreating a stream, the solution is to just ignore such packets once the stream ID counter has gone past that point.

At some point you have to think about protocol robustness.

What if the peer is improperly implemented? What if they send stream frames with stream IDs that aren't supposed to occur, or try to re-use stream IDs?

I believe in QUIC, you're supposed to consider these as connection protocol errors and simply close the connection. You could do the same in WS.

However, if it is natural for the ACK to be delayed in processing, you probably don't want to close the whole connection because of it. Therefore the connection lifecycle should be able to gracefully deal with delayed processing of the ACK.
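Combining the two points above, a sketch of how a frame router might distinguish a tolerable late ACK from an actual protocol violation. All names are hypothetical, and the strict "IDs must arrive in order" rule is an assumption of this sketch, not necessarily the right policy for js-ws.

```typescript
// Classify an incoming stream frame: dispatch, silently ignore, or
// treat as a connection-level protocol error.
class StreamFrameRouter {
  protected nextPeerStreamId: number = 0;
  protected streamMap = new Set<number>();

  public onStreamFrame(streamId: number): 'dispatch' | 'ignore' | 'protocol-error' {
    if (this.streamMap.has(streamId)) return 'dispatch';
    if (streamId < this.nextPeerStreamId) {
      // ID already consumed: a delayed ACK for a closed stream; drop it.
      return 'ignore';
    }
    if (streamId > this.nextPeerStreamId) {
      // ID skipped ahead: under this sketch's strict consecutive-ID
      // scheme the peer is misbehaving; close the connection.
      return 'protocol-error';
    }
    this.streamMap.add(streamId);
    this.nextPeerStreamId = streamId + 1;
    return 'dispatch';
  }

  public closeStream(streamId: number): void {
    this.streamMap.delete(streamId);
  }
}
```

For comparison, QUIC treats frames that re-use retired stream IDs as a connection error (STREAM_STATE_ERROR), whereas a delayed-but-legitimate frame is tolerated; the sketch mirrors that split.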

Contributor Author

This is fixed, isn't it? Or is it still a temp fix? I'll have to double check…

@CMCDragonkai
Member

Did you check this? @tegefaulkes

@CMCDragonkai CMCDragonkai added the r&d:polykey:supporting activity Supporting core activity label Aug 13, 2024
Contributor Author

I haven't checked this yet.
