DTLS: Use bio callback to get fragment packet. v5.0.156, v6.0.47 #3565

Merged: 4 commits, Jun 5, 2023

@winlinvip winlinvip commented Jun 2, 2023

When the certificate is large, for example a 4096-bit certificate or one that carries a lot of information, the DTLS handshake data sent in one UDP packet may exceed the 1200-byte MTU, even though the MTU has been set with SSL_set_mtu.

image

Note: The packet capture for the picture above is srs-dtls-server-no-fragment.pcapng.zip.

To reproduce this issue, you can change the SRS certificate to 4096 bits:

srs_error_t SrsDtlsCertificate::initialize() {
        // Generates a key pair and stores it in the RSA structure provided in rsa.
        // @see https://www.openssl.org/docs/man1.0.2/man3/RSA_generate_key_ex.html
        int key_bits = 4096;

Then, please configure SRS to use the DTLS client role:

vhost __defaultVhost__ {
    rtc { enabled on; rtmp_to_rtc on; rtc_to_rtmp on; dtls_role active; }
}

To be clear, the MTU setting does take effect. For example, the fourth packet, which carries the certificate, is split into two DTLS records: the first is 1092 bytes and the second is 130 bytes.

So this bug is not related to the DTLS MTU setting, a point that confused me and others.

The DTLS records are already split to comply with the MTU limit, so each record is smaller than 1200 bytes.

The issue arises when we use BIO_get_mem_data to retrieve the DTLS response for the peer. That call returns everything pending in the BIO, 2056 bytes in total: ServerHello (82B), CertificateFragment (1092B), CertificateFragment (130B), ServerKeyExchange (564B), CertificateRequest (66B), and ServerHelloDone (12B). Although each of these packets is smaller than the MTU, we receive all of them in a single call and send them as one UDP packet.

It is possible to parse the DTLS records one by one and send each record in its own UDP packet.

int left = size;
while (left > 0) {
    char* p = (char*)data + size - left;
    int nn = 13 + ((int)(uint8_t)p[11])*256 + (uint8_t)p[12];
    nn = srs_min(nn, left);
    nn = srs_min(nn, 1500);
    printf("Split large=%d UDP to %d packet\n", size, nn);
    if (nn > 0 && (err = network_->write(p, nn, NULL)) != srs_success) {
        return srs_error_wrap(err, "dtls send size=%u", nn);
    }
    left -= nn;
}
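
The parsing step above relies on the 13-byte DTLS record header, whose last two bytes hold the payload length in big-endian order. A minimal standalone sketch of that splitting logic follows; `split_dtls_records` and `kDtlsRecordHeaderSize` are hypothetical names used for illustration, not part of SRS. Unlike the loop above, it stops at a truncated record rather than clamping the size.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// DTLS record header: type(1) + version(2) + epoch(2) + sequence(6) + length(2) = 13 bytes.
static const size_t kDtlsRecordHeaderSize = 13;

// Returns (offset, size) for each complete DTLS record found in [data, data + size).
// Stops at the first truncated record instead of reading past the buffer.
std::vector<std::pair<size_t, size_t> > split_dtls_records(const uint8_t* data, size_t size) {
    std::vector<std::pair<size_t, size_t> > records;
    size_t offset = 0;
    while (size - offset >= kDtlsRecordHeaderSize) {
        const uint8_t* p = data + offset;
        // The payload length is a big-endian 16-bit integer at bytes 11 and 12.
        size_t payload = (size_t(p[11]) << 8) | size_t(p[12]);
        size_t record = kDtlsRecordHeaderSize + payload;
        if (record > size - offset) break; // Truncated record.
        records.push_back(std::make_pair(offset, record));
        offset += record;
    }
    return records;
}
```

Each returned pair could then be passed to the network write call as one UDP packet.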

However, this approach is not ideal and we would prefer not to use it: SSL should already handle this, and we should be able to use an appropriate SSL API to obtain correctly sized packets for transmission.

For instance, Janus does not encounter this issue because it does not use BIO_new with BIO_s_mem. Instead, it uses BIO_janus_dtls_agent_new, which returns a BIO with a custom BIO_METHOD whose write function sends the data directly.

dtls->write_bio = BIO_janus_dtls_agent_new(dtls);

BIO *BIO_janus_dtls_agent_new(void *dtls) {
	BIO* bio = BIO_new(BIO_janus_dtls_agent_method());
	return bio;
}

static BIO_METHOD *BIO_janus_dtls_agent_method(void) {
	return &janus_dtls_bio_agent_methods;
}

static BIO_METHOD janus_dtls_bio_agent_methods = {
	BIO_TYPE_BIO,
	"janus agent writer",
	janus_dtls_bio_agent_write,
};

static int janus_dtls_bio_agent_write(BIO *bio, const char *in, int inl) {
	JANUS_LOG(LOG_HUGE, "janus_dtls_bio_agent_write: %p, %d\n", in, inl);
	int bytes = nice_agent_send(handle->agent, component->stream_id, component->component_id, inl, in);

If you wish to keep using BIO_new with BIO_s_mem, you can set a BIO callback instead.

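BIO_set_callback(bio_out, srs_dtls_bio_out_callback);
BIO_set_callback_arg(bio_out, (char*)this);

long srs_dtls_bio_out_callback(BIO* bio, int cmd, const char* argp, int argi, long argl, long ret) {
    // Before the operation, return a positive value so the BIO operation continues;
    // after the operation (BIO_CB_RETURN), pass the original result through.
    long r0 = (cmd & BIO_CB_RETURN) ? ret : 1;
    // Forward each chunk written to bio_out as its own UDP packet.
    if (cmd == BIO_CB_WRITE && argp && argi > 0) {
        SrsDtlsImpl* dtls = (SrsDtlsImpl*)BIO_get_callback_arg(bio);
        srs_error_t err = dtls->write_dtls_data((void*)argp, argi);
        srs_freep(err); // Error reporting is omitted in this excerpt.
    }
    return r0;
}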

This is the intended outcome of this patch: we now send fragmented UDP packets, each of which is smaller than the MTU.

image

Note: The packet capture for the picture above is srs-dtls-server-fragment.pcapng.zip.
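
The difference between reading the whole memory BIO and observing each write can be pictured with a toy model. The sketch below uses no OpenSSL calls; `ToySink` and `sizes_seen_by_hook` are hypothetical names, and it assumes, as the captures suggest, that the library issues one write per MTU-sized packet. The two certificate records of 1092 and 130 bytes serve as sample input.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Toy stand-in for a memory BIO: every write is appended to one buffer, and an
// optional hook observes each write as it happens. All names are hypothetical.
class ToySink {
public:
    std::function<void(size_t)> on_write; // Called once per write with its size.

    void write(size_t nbytes) {
        buffered_ += nbytes;
        if (on_write) on_write(nbytes);
    }

    // Models reading the whole buffer at once: the write boundaries are lost.
    size_t drain() {
        size_t n = buffered_;
        buffered_ = 0;
        return n;
    }

private:
    size_t buffered_ = 0;
};

// Sends the given writes through the sink and returns the sizes seen by the hook.
std::vector<size_t> sizes_seen_by_hook(ToySink& sink, const std::vector<size_t>& writes) {
    std::vector<size_t> seen;
    sink.on_write = [&seen](size_t n) { seen.push_back(n); };
    for (size_t n : writes) sink.write(n);
    sink.on_write = nullptr;
    return seen;
}
```

In this model the hook sees 1092 and 130 as separate sizes, both under 1200, while draining the buffer yields a single 1222-byte blob that exceeds the MTU.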

By the way, we have made some improvements to the MTU setting. Following mediasoup's implementation (versatica/mediasoup#217), we added a call to DTLS_set_link_mtu, and we removed the call to SSL_set_max_send_fragment because it was unnecessary. See the code below:

    // We have set the MTU to fragment the DTLS packet. It is important to note that the packet is split
    // to ensure that each handshake packet is smaller than the MTU.
    // @see https://stackoverflow.com/questions/62413602/openssl-server-packets-get-fragmented-into-270-bytes-per-packet
    SSL_set_options(dtls, SSL_OP_NO_QUERY_MTU);
    SSL_set_mtu(dtls, DTLS_FRAGMENT_MAX_SIZE);
    // See https://github.com/versatica/mediasoup/pull/217
    DTLS_set_link_mtu(dtls, DTLS_FRAGMENT_MAX_SIZE);

We have also refined the handshake process. We now call SSL_do_handshake only after ICE has completed, since SSL_read can handle the handshake messages while the handshake is not yet complete. See the code below:

srs_error_t SrsDtlsImpl::start_active_handshake() {
    // During initialization, we only need to call SSL_do_handshake once because SSL_read consumes
    // the handshake message if the handshake is incomplete.
    // To simplify maintenance, we initiate the handshake for both the DTLS server and client after
    // sending out the ICE response in the start_active_handshake function. It's worth noting that
    // although the DTLS server may receive the ClientHello immediately after sending out the ICE
    // response, this shouldn't be an issue as the handshake function is called before any DTLS
    // packets are received.
    int r0 = SSL_do_handshake(dtls);
    int r1 = SSL_get_error(dtls, r0); ERR_clear_error();
    // Fatal SSL error, for example, no available suite when peer is DTLS 1.0 while we are DTLS 1.2.
    if (r0 < 0 && (r1 != SSL_ERROR_NONE && r1 != SSL_ERROR_WANT_READ && r1 != SSL_ERROR_WANT_WRITE)) {
        return srs_error_new(ERROR_RTC_DTLS, "handshake r0=%d, r1=%d", r0, r1);
    }

    if ((err = start_arq()) != srs_success) {

Upon receiving a DTLS packet, whether it is a handshake message or an application message, we feed it to bio_in and then consume it using SSL_read. There is no need to call SSL_do_handshake to consume the handshake message, since SSL_read handles it on its own. See the code below:

srs_error_t SrsDtlsImpl::do_on_dtls(char* data, int nb_data) {
    BIO_write(bio_in, data, nb_data);
    r0 = SSL_read(dtls, buf, sizeof(buf));
    if (!handshake_done_for_us && SSL_is_init_finished(dtls) == 1) {
        handshake_done_for_us = true;
        on_handshake_done();

We observed that SSL automatically resets the previous timeout to zero upon receiving DTLS packets, so there is no need to reset it manually. All that is required is to set the timeout, as in the following code:

unsigned int dtls_timer_cb(SSL* dtls, unsigned int previous_us) {
    unsigned int timeout_us = previous_us * 2;
    timeout_us = srs_max(timeout_us, 50 * 1000);
    timeout_us = srs_min(timeout_us, 30 * 1000 * 1000);
    return timeout_us;
}
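
To see what retransmission schedule this policy produces, the doubling and clamping can be run on its own. This is a sketch with the SSL* argument dropped and `std::max`/`std::min` in place of the SRS helpers; `next_timeout_us` and `timeout_schedule` are hypothetical names, and it assumes the callback is first invoked with a previous timeout of 0.

```cpp
#include <algorithm>
#include <vector>

// Same policy as dtls_timer_cb above, without the SSL* argument: double the
// previous timeout, then clamp it to [50ms, 30s]. Units are microseconds.
unsigned int next_timeout_us(unsigned int previous_us) {
    unsigned int timeout_us = previous_us * 2;
    timeout_us = std::max(timeout_us, 50u * 1000u);
    timeout_us = std::min(timeout_us, 30u * 1000u * 1000u);
    return timeout_us;
}

// Returns the first `count` timeouts, starting from a previous value of 0.
std::vector<unsigned int> timeout_schedule(int count) {
    std::vector<unsigned int> schedule;
    unsigned int previous_us = 0;
    for (int i = 0; i < count; i++) {
        previous_us = next_timeout_us(previous_us);
        schedule.push_back(previous_us);
    }
    return schedule;
}
```

Under that assumption the schedule starts at 50 ms and doubles on each timeout until it is capped at 30 s.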

Additionally, within the ARQ loop, we obtain the remaining timeout using DTLSv1_get_timeout and sleep for that duration before calling DTLSv1_handle_timeout, so that retransmission happens at the right time. Sleeping for a constant interval before calling DTLSv1_handle_timeout would make the timeout slightly imprecise, although the deviation should not be significant. See the code below:

srs_error_t SrsDtlsClientImpl::cycle() {
    while (1) {
        if (handshake_done_for_us) break;

        // If there is a timeout in progress, it sets *out to the time remaining
        // and returns one. Otherwise, it returns zero.
        timeval to = {0};
        int r0 = DTLSv1_get_timeout(dtls, &to);
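        // Convert the timeval to microseconds: tv_sec is in seconds, tv_usec in microseconds.
        srs_utime_t timeout = r0 == 1 ? (srs_utime_t)to.tv_sec * 1000 * 1000 + to.tv_usec : 0;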

        // There is timeout to wait, so we should wait, because there is no packet in openssl.
        if (timeout > 0) {
            srs_usleep(timeout);
            continue;
        }

        // DTLSv1_handle_timeout is called when a DTLS handshake timeout expires. If no timeout
        // had expired, it returns 0. Otherwise, it retransmits the previous flight of handshake
        // messages and returns 1. If too many timeouts had expired without progress or an error
        // occurs, it returns -1.
        r0 = DTLSv1_handle_timeout(dtls);
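
The remaining time from DTLSv1_get_timeout arrives as a timeval, so the sleep duration depends on converting it to a single unit. A minimal sketch of that conversion follows, assuming the sleep function expects microseconds; `timeval_to_us` is a hypothetical helper, not an SRS or OpenSSL function.

```cpp
#include <cstdint>
#include <sys/time.h>

// Converts a timeval to microseconds: whole seconds are scaled by 1,000,000
// before the microsecond remainder is added.
int64_t timeval_to_us(const timeval& tv) {
    return int64_t(tv.tv_sec) * 1000 * 1000 + int64_t(tv.tv_usec);
}
```

Adding tv_sec and tv_usec without scaling would treat each whole second as one microsecond, so the loop would wake far too early whenever the remaining time is one second or more.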

The session close code has been refined to close the session immediately upon receiving an SSL CloseNotify or a fatal alert. For instance, if the DTLS handshake fails, the session is closed on the fatal Illegal Parameter alert.

@winlinvip winlinvip force-pushed the bugfix/dtls-fragment branch 2 times, most recently from d9e8256 to 2d1db8a Compare June 4, 2023 10:00
@winlinvip winlinvip changed the title DTLS: Use bio callback to get fragment packet. DTLS: Use bio callback to get fragment packet. v5.0.156, v6.0.47 Jun 5, 2023
@winlinvip winlinvip merged commit 24c4919 into ossrs:develop Jun 5, 2023
winlinvip added a commit that referenced this pull request Jun 5, 2023
1. The MTU is effective, with the certificate being split into two DTLS records to comply with the limit.
2. The issue occurs when using BIO_get_mem_data, which retrieves all DTLS packets in a single call, even though each is smaller than the MTU.
3. An alternative callback is available for using BIO_new with BIO_s_mem.
4. Improvements to the MTU setting were made, including adding the DTLS_set_link_mtu function and removing the SSL_set_max_send_fragment function.
5. The handshake process was refined, calling SSL_do_handshake only after ICE completion, and using SSL_read to handle handshake messages.
6. The session close code was improved to enable immediate closure upon receiving an SSL CloseNotify or fatal message.

------

Co-authored-by: chundonglinlin
johzzy pushed a commit to johzzy/srs that referenced this pull request Jun 26, 2023
@winlinvip winlinvip added the TransByAI Translated by AI/GPT. label Jul 29, 2023