Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The asychttpserver dies when a reverse proxy is used. #18161

Open
mrhdias opened this issue Jun 3, 2021 · 25 comments
Open

The asychttpserver dies when a reverse proxy is used. #18161

mrhdias opened this issue Jun 3, 2021 · 25 comments

Comments

@mrhdias
Copy link
Contributor

mrhdias commented Jun 3, 2021

I made an app that uses the asynchttpserver library and it works well. But many times when there are a lot of requests the server die with the following error: Exception message: Too many open files.
I did a test with wrk and when the requests are made directly to the server without any reverse proxy in between the server don't crash. But when I use a reverse proxy the server goes down!

Nginx configuration

server {
    listen  8080;
    server_name test;

    location / {
        proxy_pass http://localhost:9000;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_cache_bypass $http_upgrade;
    }
}

Example to test

import asynchttpserver, asyncdispatch

proc main {.async.} =
  var server = newAsyncHttpServer()
  proc cb(req: Request) {.async.} =
    let headers = {"Date": "Tue, 29 Apr 2014 23:40:08 GMT",
        "Content-type": "text/plain; charset=utf-8"}
    await req.respond(Http200, "Hello World", headers.newHttpHeaders())

  server.listen Port(9000)
  while true:
    if server.shouldAcceptRequest():
      await server.acceptRequest(cb)
    else:
      poll()

asyncCheck main()
runForever()

After run:

$ wrk --latency -t 1 -c 1000 http://test:8080/

Current Output

nim c -r test.nim                                                                            8m  Thu 03 Jun 2021 11:38:13 AM WEST
Hint: used config file '/usr/local/programs/x86_64/nim-1.4.8/config/nim.cfg' [Conf]
Hint: used config file '/usr/local/programs/x86_64/nim-1.4.8/config/config.nims' [Conf]
Hint: 7870 lines; 0.021s; 6.992MiB peakmem; Debug build; proj: /home/hdias/devel/test.nim; out: /home/hdias/devel/test [SuccessX]
Hint: /home/devel/test  [Exec]
/home/hdias/devel/test.nim(18) test
/usr/local/programs/x86_64/nim-1.4.8/lib/pure/asyncdispatch.nim(1930) runForever
/usr/local/programs/x86_64/nim-1.4.8/lib/pure/asyncdispatch.nim(1627) poll
/usr/local/programs/x86_64/nim-1.4.8/lib/pure/asyncdispatch.nim(1368) runOnce
/usr/local/programs/x86_64/nim-1.4.8/lib/pure/asyncdispatch.nim(208) processPendingCallbacks
/usr/local/programs/x86_64/nim-1.4.8/lib/pure/asyncmacro.nim(29) acceptRequestNimAsyncContinue
/usr/local/programs/x86_64/nim-1.4.8/lib/pure/asyncmacro.nim(145) acceptRequestIter
/usr/local/programs/x86_64/nim-1.4.8/lib/pure/asyncfutures.nim(390) read
[[reraised from:
/home/hdias/devel/test.nim(18) test
/usr/local/programs/x86_64/nim-1.4.8/lib/pure/asyncdispatch.nim(1930) runForever
/usr/local/programs/x86_64/nim-1.4.8/lib/pure/asyncdispatch.nim(1627) poll
/usr/local/programs/x86_64/nim-1.4.8/lib/pure/asyncdispatch.nim(1368) runOnce
/usr/local/programs/x86_64/nim-1.4.8/lib/pure/asyncdispatch.nim(208) processPendingCallbacks
/usr/local/programs/x86_64/nim-1.4.8/lib/pure/asyncmacro.nim(29) mainNimAsyncContinue
/usr/local/programs/x86_64/nim-1.4.8/lib/pure/asyncmacro.nim(145) mainIter
/usr/local/programs/x86_64/nim-1.4.8/lib/pure/asyncfutures.nim(390) read
]]
[[reraised from:
/home/hdias/devel/test.nim(18) test
/usr/local/programs/x86_64/nim-1.4.8/lib/pure/asyncdispatch.nim(1930) runForever
/usr/local/programs/x86_64/nim-1.4.8/lib/pure/asyncdispatch.nim(1627) poll
/usr/local/programs/x86_64/nim-1.4.8/lib/pure/asyncdispatch.nim(1368) runOnce
/usr/local/programs/x86_64/nim-1.4.8/lib/pure/asyncdispatch.nim(208) processPendingCallbacks
/usr/local/programs/x86_64/nim-1.4.8/lib/pure/asyncfutures.nim(438) asyncCheckCallback
]]
Error: unhandled exception: Too many open files
Async traceback:
  /home/hdias/devel/test.nim(18)                                   test
  /usr/local/programs/x86_64/nim-1.4.8/lib/pure/asyncdispatch.nim(1930) runForever
  /usr/local/programs/x86_64/nim-1.4.8/lib/pure/asyncdispatch.nim(1627) poll
  /usr/local/programs/x86_64/nim-1.4.8/lib/pure/asyncdispatch.nim(1368) runOnce
  /usr/local/programs/x86_64/nim-1.4.8/lib/pure/asyncdispatch.nim(208)  processPendingCallbacks
  /usr/local/programs/x86_64/nim-1.4.8/lib/pure/asyncmacro.nim(29)      acceptRequestNimAsyncContinue
  /usr/local/programs/x86_64/nim-1.4.8/lib/pure/asyncmacro.nim(145)     acceptRequestIter
  /usr/local/programs/x86_64/nim-1.4.8/lib/pure/asyncfutures.nim(390)   read
  #[
    /home/hdias/devel/test.nim(18)                                   test
    /usr/local/programs/x86_64/nim-1.4.8/lib/pure/asyncdispatch.nim(1930) runForever
    /usr/local/programs/x86_64/nim-1.4.8/lib/pure/asyncdispatch.nim(1627) poll
    /usr/local/programs/x86_64/nim-1.4.8/lib/pure/asyncdispatch.nim(1368) runOnce
    /usr/local/programs/x86_64/nim-1.4.8/lib/pure/asyncdispatch.nim(208)  processPendingCallbacks
    /usr/local/programs/x86_64/nim-1.4.8/lib/pure/asyncmacro.nim(29)      mainNimAsyncContinue
    /usr/local/programs/x86_64/nim-1.4.8/lib/pure/asyncmacro.nim(145)     mainIter
    /usr/local/programs/x86_64/nim-1.4.8/lib/pure/asyncfutures.nim(390)   read
  ]#
Exception message: Too many open files
 [OSError]
Error: execution of an external program failed: '/home/hdias/devel/test '

Additional Information

Nim Compiler Version 1.4.8 [Linux: amd64]
Compiled at 2021-05-25
Copyright (c) 2006-2021 by Andreas Rumpf

git hash: 44e653a9314e1b8503f0fa4a8a34c3380b26fff3
active boot switches: -d:release
@mrhdias
Copy link
Contributor Author

mrhdias commented Jun 5, 2021

I did another test with several values for the argument assumedDescriptorsPerRequest. Even when it is equal to zero and in this case it shouldn't accept requests, the assumedDescriptorsPerRequest has no effect.

My OS is Arch Linux, if it helps in something.

import asynchttpserver, asyncdispatch

proc main {.async.} =
  var server = newAsyncHttpServer()
  proc cb(req: Request) {.async.} =
    let headers = {"Date": "Tue, 29 Apr 2014 23:40:08 GMT",
        "Content-type": "text/plain; charset=utf-8"}
    await req.respond(Http200, "Hello World", headers.newHttpHeaders())

  server.listen Port(9000)
  while true:
    echo "Max Descriptors: ", maxDescriptors()
    echo "Active Descriptors: ", activeDescriptors()
    if server.shouldAcceptRequest(assumedDescriptorsPerRequest = 1):
      echo "Accept Request: ", activeDescriptors()
      await server.acceptRequest(cb)
    else:
      poll()

asyncCheck main()
runForever()

Result:

Max Descriptors: 1023
Active Descriptors: 2
Accept Request: 2
Max Descriptors: 1023
Active Descriptors: 2
Accept Request: 2
Max Descriptors: 1023
Active Descriptors: 2
Accept Request: 2
Max Descriptors: 1023
Active Descriptors: 2
Accept Request: 2
Max Descriptors: 1023
Active Descriptors: 2
Accept Request: 2
/home/hdias/dev/test.nim(21) test
/usr/local/programs/x86_64/nim-1.4.8/lib/pure/asyncdispatch.nim(1930) runForever
/usr/local/programs/x86_64/nim-1.4.8/lib/pure/asyncdispatch.nim(1627) poll
/usr/local/programs/x86_64/nim-1.4.8/lib/pure/asyncdispatch.nim(1368) runOnce
/usr/local/programs/x86_64/nim-1.4.8/lib/pure/asyncdispatch.nim(208) processPendingCallbacks
/usr/local/programs/x86_64/nim-1.4.8/lib/pure/asyncmacro.nim(29) acceptRequestNimAsyncContinue
/usr/local/programs/x86_64/nim-1.4.8/lib/pure/asyncmacro.nim(145) acceptRequestIter
/usr/local/programs/x86_64/nim-1.4.8/lib/pure/asyncfutures.nim(390) read
[[reraised from:
/home/hdias/dev/test.nim(21) test
/usr/local/programs/x86_64/nim-1.4.8/lib/pure/asyncdispatch.nim(1930) runForever
/usr/local/programs/x86_64/nim-1.4.8/lib/pure/asyncdispatch.nim(1627) poll
/usr/local/programs/x86_64/nim-1.4.8/lib/pure/asyncdispatch.nim(1368) runOnce
/usr/local/programs/x86_64/nim-1.4.8/lib/pure/asyncdispatch.nim(208) processPendingCallbacks
/usr/local/programs/x86_64/nim-1.4.8/lib/pure/asyncmacro.nim(29) mainNimAsyncContinue
/usr/local/programs/x86_64/nim-1.4.8/lib/pure/asyncmacro.nim(145) mainIter
/usr/local/programs/x86_64/nim-1.4.8/lib/pure/asyncfutures.nim(390) read
]]
[[reraised from:
/home/hdias/dev/test.nim(21) test
/usr/local/programs/x86_64/nim-1.4.8/lib/pure/asyncdispatch.nim(1930) runForever
/usr/local/programs/x86_64/nim-1.4.8/lib/pure/asyncdispatch.nim(1627) poll
/usr/local/programs/x86_64/nim-1.4.8/lib/pure/asyncdispatch.nim(1368) runOnce
/usr/local/programs/x86_64/nim-1.4.8/lib/pure/asyncdispatch.nim(208) processPendingCallbacks
/usr/local/programs/x86_64/nim-1.4.8/lib/pure/asyncfutures.nim(438) asyncCheckCallback
]]
Error: unhandled exception: Too many open files
Async traceback:
  /home/hdias/dev/test.nim(21)                                   test
  /usr/local/programs/x86_64/nim-1.4.8/lib/pure/asyncdispatch.nim(1930) runForever
  /usr/local/programs/x86_64/nim-1.4.8/lib/pure/asyncdispatch.nim(1627) poll
  /usr/local/programs/x86_64/nim-1.4.8/lib/pure/asyncdispatch.nim(1368) runOnce
  /usr/local/programs/x86_64/nim-1.4.8/lib/pure/asyncdispatch.nim(208)  processPendingCallbacks
  /usr/local/programs/x86_64/nim-1.4.8/lib/pure/asyncmacro.nim(29)      acceptRequestNimAsyncContinue
  /usr/local/programs/x86_64/nim-1.4.8/lib/pure/asyncmacro.nim(145)     acceptRequestIter
  /usr/local/programs/x86_64/nim-1.4.8/lib/pure/asyncfutures.nim(390)   read
  #[
    /home/hdias/dev/test.nim(21)                                   test
    /usr/local/programs/x86_64/nim-1.4.8/lib/pure/asyncdispatch.nim(1930) runForever
    /usr/local/programs/x86_64/nim-1.4.8/lib/pure/asyncdispatch.nim(1627) poll
    /usr/local/programs/x86_64/nim-1.4.8/lib/pure/asyncdispatch.nim(1368) runOnce
    /usr/local/programs/x86_64/nim-1.4.8/lib/pure/asyncdispatch.nim(208)  processPendingCallbacks
    /usr/local/programs/x86_64/nim-1.4.8/lib/pure/asyncmacro.nim(29)      mainNimAsyncContinue
    /usr/local/programs/x86_64/nim-1.4.8/lib/pure/asyncmacro.nim(145)     mainIter
    /usr/local/programs/x86_64/nim-1.4.8/lib/pure/asyncfutures.nim(390)   read
  ]#
Exception message: Too many open files
 [OSError]
Error: execution of an external program failed: '/home/hdias/dev/test '

I did another test but with an app made in golang and in this case the issue also happens, but the server doesn't die. Is it possible to implement something identical in Nim?

2021/06/05 11:16:56 http: Accept error: accept tcp [::]:9000: accept4: too many open files; retrying in 5ms
2021/06/05 11:16:56 http: Accept error: accept tcp [::]:9000: accept4: too many open files; retrying in 5ms
2021/06/05 11:16:56 http: Accept error: accept tcp [::]:9000: accept4: too many open files; retrying in 5ms
2021/06/05 11:16:56 http: Accept error: accept tcp [::]:9000: accept4: too many open files; retrying in 5ms
2021/06/05 11:16:57 http: Accept error: accept tcp [::]:9000: accept4: too many open files; retrying in 5ms
2021/06/05 11:16:57 http: Accept error: accept tcp [::]:9000: accept4: too many open files; retrying in 5ms
2021/06/05 11:16:57 http: Accept error: accept tcp [::]:9000: accept4: too many open files; retrying in 5ms
2021/06/05 11:16:57 http: Accept error: accept tcp [::]:9000: accept4: too many open files; retrying in 5ms
2021/06/05 11:16:57 http: Accept error: accept tcp [::]:9000: accept4: too many open files; retrying in 5ms
2021/06/05 11:16:57 http: Accept error: accept tcp [::]:9000: accept4: too many open files; retrying in 5ms
2021/06/05 11:16:57 http: Accept error: accept tcp [::]:9000: accept4: too many open files; retrying in 10ms
2021/06/05 11:16:57 http: Accept error: accept tcp [::]:9000: accept4: too many open files; retrying in 5ms
2021/06/05 11:16:57 http: Accept error: accept tcp [::]:9000: accept4: too many open files; retrying in 5ms
2021/06/05 11:16:57 http: Accept error: accept tcp [::]:9000: accept4: too many open files; retrying in 5ms
2021/06/05 11:16:58 http: Accept error: accept tcp [::]:9000: accept4: too many open files; retrying in 5ms
2021/06/05 11:16:58 http: Accept error: accept tcp [::]:9000: accept4: too many open files; retrying in 5ms

@dom96
Copy link
Contributor

dom96 commented Jun 5, 2021

I still dislike this shouldAcceptRequest API and seeing that Golang handles this transparently suggests we should get rid of it and log failed accepts just like Go.

For reference, lots of discussion in the PR for this feature: #15957.

@mrhdias
Copy link
Contributor Author

mrhdias commented Jun 5, 2021

Yes @dom96, I also think that the implementation of golang is the most correct. But in the previous solution it was not possible to catch the exception #15925. I just want to catch the exception and have the server keep running and not die!

I took a look at the asyncdispatch file and I think the activeDescriptors function doesn't read the actual value of the number of open files.

...
  # https://github.com/nim-lang/Nim/blob/version-1-4/lib/pure/asyncdispatch.nim#L299
  proc getGlobalDispatcher*(): PDispatcher =
    if gDisp.isNil:
      setGlobalDispatcher(newDispatcher())
    result = gDisp
...
# https://github.com/nim-lang/Nim/blob/version-1-4/lib/pure/asyncdispatch.nim#L1939
proc activeDescriptors*(): int {.inline.} =
  ## Returns the current number of active file descriptors for the current
  ## event loop. This is a cheap operation that does not involve a system call.
  when defined(windows):
    result = getGlobalDispatcher().handles.len
  elif not defined(nimdoc):
    result = getGlobalDispatcher().selector.count
...

@timotheecour
Copy link
Member

timotheecour commented Jun 5, 2021

potential duplicate of #16603, or at least related to it.

note, i had a WIP branch to fix this on linux, by querying the actual number of fd's, see timotheecour#750, see the RFC here: nim-lang/RFCs#382

@mrhdias
Copy link
Contributor Author

mrhdias commented Jun 5, 2021

potential duplicate of #16603, or at least related to it.

note, i had a WIP branch to fix this on linux, by querying the actual number of fd's, see timotheecour#750, but there's probably a C API call to do that?

The function to get the number of opened file descriptors for user is useful and can be used in many contexts. I don't know if the function will slow everything down.
But why not simplify and make the OSError exception work?

Anything simple like this, but keep the http server running.

proc serve*(server: AsyncHttpServer, port: Port,
            callback: proc (request: Request): Future[void] {.closure, gcsafe.},
            address = "") {.async.} =

  listen server, port, address
  while true:
    var
      address = ""
      client: AsyncSocket 
    try:
      (address, client) = await server.socket.acceptAddr()
    except OSError as e:
      echo e.meg # Too many open files
      client.close()
      await sleepAsync(500)
      continue
      # poll()

     asyncCheck processClient(server, client, address, callback)

@timotheecour
Copy link
Member

timotheecour commented Jun 5, 2021

But why not simplify and make the OSError exception work?

to avoid serving a request if we know ahead of time it will fail (and cause other in-flight requests to fail as a result), eg:

without shouldAcceptRequest logic, you can end up in a situation where all in-flight requests fail, in cases where you have requests that take a while to complete and that can open multiple fd's over the course of the request:

serve req1 # will take a while to complete, and open several fds
serve req2
serve req3
serve req4 # deny based on shouldAcceptRequest
finish req1
finish req2
finish req3

without shouldAcceptRequest, you could get:

serve req1
serve req2
serve req3
serve req4
req1 fails because of too many files opened
req2 fails because of too many files opened
req3 fails because of too many files opened
req4 fails because of too many files opened

another important benefit is to avoid in-flight requests to fail (which could be a bad thing for transactions etc)

@Varriount
Copy link
Contributor

The entire basis of shouldAcceptRequest is fatally flawed. You cannot know ahead of time whether a connection will be rejected due to the file descriptor limit, as it is entirely possible for other threads to allocate or deallocate file descriptors between the call to shouldAcceptRequest and the call to accept. This is yet another TOCTOU bug.

@mrhdias
Copy link
Contributor Author

mrhdias commented Jun 6, 2021

I've been checking the code in asynchttpserver file and found that in the processClient function the connection is never closed after the "while" loop. So even if the keepalive header is off the number of open files is always increasing. With the keepalive off and if the connection is closed after the "while" loop the server never dies.

# https://github.com/nim-lang/Nim/blob/version-1-4/lib/pure/asynchttpserver.nim#L292
proc processClient(server: AsyncHttpServer, client: AsyncSocket, address: string,
                   callback: proc (request: Request):
                      Future[void] {.closure, gcsafe.}) {.async.} =
  var request = newFutureVar[Request]("asynchttpserver.processClient")
  request.mget().url = initUri()
  request.mget().headers = newHttpHeaders()
  var lineFut = newFutureVar[string]("asynchttpserver.processClient")
  lineFut.mget() = newStringOfCap(80)

  while not client.isClosed:
    let retry = await processRequest(
      server, request, client, address, lineFut, callback
    )
    if not retry: break

  client.close() # <<<==== Close the connection

mrhdias added a commit to mrhdias/Nim that referenced this issue Jun 6, 2021
I've been checking the code in asynchttpserver file and found that in the processClient function the connection is never closed after the "while" loop. So even if the keepalive header is off the number of open files is always increasing. With the keepalive off and if the connection is closed after the "while" loop the server never dies.

If the header keepalive is on and if it exceeds the pre-defined number of file descriptors it automatically closes the connection! I also added a timeout for keepalive connections.
@mrhdias
Copy link
Contributor Author

mrhdias commented Jun 6, 2021

But why not simplify and make the OSError exception work?

to avoid serving a request if we know ahead of time it will fail (and cause other in-flight requests to fail as a result), eg:

without shouldAcceptRequest logic, you can end up in a situation where all in-flight requests fail, in cases where you have requests that take a while to complete and that can open multiple fd's over the course of the request:

serve req1 # will take a while to complete, and open several fds
serve req2
serve req3
serve req4 # deny based on shouldAcceptRequest
finish req1
finish req2
finish req3

without shouldAcceptRequest, you could get:

serve req1
serve req2
serve req3
serve req4
req1 fails because of too many files opened
req2 fails because of too many files opened
req3 fails because of too many files opened
req4 fails because of too many files opened

another important benefit is to avoid in-flight requests to fail (which could be a bad thing for transactions etc)

@timotheecour, can you see this PR #18198?

@timotheecour
Copy link
Member

yes, i commented there

mrhdias added a commit to mrhdias/Nim that referenced this issue Jun 6, 2021
…ainst devel branch.

The client header "connection" if equal to "keep-alive" is disabled whenever the number of file descriptors exceeds the specified maxFDs or exceeds the keep-alive timeout.
If the value of header "connection" equal "close", the connection is closed, if it is "keep-alive" it remains open and the number of file descriptors increases and the server die.
To avoid this issue the "keep-alive" connection is disabled. It is temporary until the number of files descriptors goes down, when the value goes down the connection can remain open. It also solves the issue when the "connection" is equal to "update".
@dom96
Copy link
Contributor

dom96 commented Jun 6, 2021

I agree with @Varriount. My take on how to solve this:

#18198 (comment)

@Araq
Copy link
Member

Araq commented Jun 7, 2021

The entire basis of shouldAcceptRequest is fatally flawed. You cannot know ahead of time whether a connection will be rejected due to the file descriptor limit, as it is entirely possible for other threads to allocate or deallocate file descriptors between the call to shouldAcceptRequest and the call to accept. This is yet another TOCTOU bug.

No, it's not a TOCTOU bug, it's simply defensive programming. Real world analogy: I don't send you 2 dollars when I have only 1 dollar left on my account. Yes, by the time you get the $2 somebody might give $6 to me so that I don't run into debt. But the chances of this happening are too slim so I won't risk it.

@Varriount
Copy link
Contributor

Varriount commented Jun 7, 2021

No, it's not a TOCTOU bug, it's simply defensive programming.

How is this not a TOCTOU bug?

  • At a point in time, the code checks whether the per-process file descriptor limit has been reached.
  • If the the file descriptor limit hasn't been reached, the code assumes there is a file descriptor available, and continues. If the file descriptor limit has been reached, the server just silently continues, not raising any warning or error.
  • Right after the check, some other process in the system consumes all available file descriptors, and the system file descriptor limit is reached. Alternately, a thread in the same process consumes all available file descriptors, and the per-process file descriptor limit is reached. This occurs after the previous check, but before the next step.
  • At a point in time shortly after, the code calls accept, intending to allocate and use a file descriptor, without handling the case where the file descriptor limit has been exhausted.
  • Because the system- or process-level file descriptor limit has been reached, an unhandled error is thrown, halting the program.

Also, why is

You cannot know ahead of time whether a connection will be rejected due to the file descriptor limit, as it is entirely possible for other threads to allocate or deallocate file descriptors between the call to shouldAcceptRequest and the call to accept.

invalid?

Real world analogy: I don't send you 2 dollars when I have only 1 dollar left on my account.

How do you know that you only have 1 dollar left in your account?

Keep in mind that, in order for the analogy to be accurate:

  • Your bank account is constantly being used by several other people (you don't know how many at a given moment), with multiple deposits, withdrawals, and transfers being done every minute, and sometimes even every second.
  • Your bank doesn't charge a fee for attempting to withdraw or transfer more money than is available.
  • But your bank does charge a fee for asking what the current account balance is.

If you want to send me $2, it makes more sense to simply attempt to send me the money. Checking beforehand would incur a fee, and wouldn't actually guarantee that the transfer would be successful anyway. You would be able to get a rough estimate (for a fee), but that would be it.

To be more explicit, there are per-process and system-wide file descriptor limits, as well as per-process and system-wide file descriptor counts, All four of these can change at any given moment. In particular, the number of file descriptors being used can change very rapidly. Furthermore, calling accept will always implicitly check whether a file descriptor can be allocated, likely in a manner that is much more performant than what can be done in userspace (since I assume it mostly runs in kernel-mode, with access to kernel-mode data). Performing the same check (in a manner that is likely less efficient) before calling accept is just duplicating logic that will be carried out regardless.


To be clear, I can see a rough check possibly being useful, but it runs into the problems outlined in this comment: nim-lang/RFCs#382 (comment)

@Araq
Copy link
Member

Araq commented Jun 7, 2021

If the file descriptor limit has been reached, the server just silently continues, not raising any warning or error.

Well it continues and then eventually the connection would be accepted, much like if accept would have returned EAGAIN. You seem to misrepresent what is actually happening just so that you can name it "TOCTOU".

Maybe this is helpful:

However Rust does not prevent general race conditions.

This is pretty fundamentally impossible, and probably honestly undesirable. Your hardware is racy, your OS is racy, the other programs on your computer are racy, and the world this all runs in is racy. Any system that could genuinely claim to prevent all race conditions would be pretty awful to use, if not just incorrect.

from https://doc.rust-lang.org/nomicon/races.html

Performing the same check (in a manner that is likely less efficient) before calling accept is just duplicating logic that will be carried out regardless.

But it is not the same check at all, accept only cares about a single file descriptor being available and then you hobble along, hoping for the best.

@Araq
Copy link
Member

Araq commented Jun 7, 2021

I still dislike this shouldAcceptRequest API and seeing that Golang handles this transparently suggests we should get rid of it and log failed accepts just like Go.

So where is your PR? And how is "just log it" good API design? It's terrible, libraries shouldn't log by themselves, we have "frameworks" for that.

@Varriount
Copy link
Contributor

Varriount commented Jun 7, 2021

Well it continues and then eventually the connection would be accepted, much like if accept would have returned EAGAIN. You seem to misrepresent what is actually happening just so that you can name it "TOCTOU".

Yes, "much like", but not exactly like. This logic is less accurate, and is vulnerable to a preventable race condition between the time it runs, and the time accept is called.

However Rust does not prevent general race conditions.

I don't understand how this refutes any point in my argument, unless you're asserting that efforts to prevent race conditions are pointless. Yes, some race conditions can't be prevented, however others can, like in this situation.

But it is not the same check at all, accept only cares about a single file descriptor being available and then you hobble along, hoping for the best.

Your point here seems to be that the check being performed before the call to accept is superior to handling the exception from accept, because it attempts to take into account the fact that each request handler may be using multiple file descriptors. The problem is that, in order for this to be effective:

  • A loosely accurate measurement of the number of file descriptors each in-flight handler is using must be known.
  • The measurement must take into account both system and process file descriptor limits and counts, and never undercount them.
  • The check must remain valid between the time it is made, to the time accept is called.

The first point is possible, but practically tiresome. As someone who works on large backend services, it is very difficult to know how many file descriptors are being used by even a single kind of request handler - I would have to measure not only how many file descriptors each kind of request handler in the service uses, but also the file descriptors being used by support libraries and running threads. Given that a single service can contain possibly hundreds of kinds of request handlers, and that I would have to continuously re-measure as development occurs and libraries are updated, such measurements would quickly become unfeasible.

The second point is harder to do, but I suppose it is technically possible, if expensive from a performance standpoint (multiple syscalls would likely need to be made).

The last point is impossible to do with a check, unless you're running in kernel mode. You would need to ensure that no file descriptor is allocated between the time the check is made, and the time the accept call is made.


I have yet to be told how this argument:

You cannot know ahead of time whether a connection will be rejected due to the file descriptor limit, as it is entirely possible for other threads to allocate or deallocate file descriptors between the call to shouldAcceptRequest and the call to accept.

is false, and have yet to be told how something like this:

  # Pseudo-code. 
  listen server, port, address
  while true:
    try:
      var (address, client) = await server.socket.acceptAddr()
      asyncCheck processClient(server, client, address, callback)
    except NoFileDescriptorsLeftException:
      poll()

won't work.

Instead, tangential details of my arguments are attacked instead - classification of the bug - and unrelated points are brought up - the possible merits of the existing code.

Call the bug TOCTOU, or a race condition, or anything you want. Point out any number of possible benefits the current logic may have. At the end of the day, there is still a bug here, even in the presence of the current logic, and it won't be fixed by making a check before a call to accept. However, it can be fixed by properly handling the exceptions that accept raises.

@dom96
Copy link
Contributor

dom96 commented Jun 7, 2021

it is very difficult to know how many file descriptors are being used by even a single kind of request handler - I would have to measure not only how many file descriptors each kind of request handler in the service uses, but also the file descriptors being used by support libraries and running threads. Given that a single service can contain possibly hundreds of kinds of request handlers, and that I would have to continuously re-measure as development occurs and libraries are updated, such measurements would quickly become unfeasible.

Quoting @Varriount ^. This is the reason why the shouldAcceptRequest isn't useful and just adds more complexity to the API.

So where is your PR? And how is "just log it" good API design? It's terrible, libraries shouldn't log by themselves, we have "frameworks" for that.

This is already an edge case in my eyes, if you're running out of fds on your system then you need to fix that. For edge cases like this there is no shame to simply copy what other mature libraries do. Golang logs so why don't we do the same? The server will continue running and things will be fine.

You can also expose a function call that will throw an exception so that they can be handled for more custom circumstances.

Now you ask me where my PR is, which implies that you'd agree with me if I just created a PR. That implies that you will accept whatever comes first, which isn't great. It also forgets that @mrhdias is trying to create the PR, I'm trying to help them do so.

@timotheecour
Copy link
Member

So where is your PR?

@Araq my PR would start by implementing nim-lang/RFCs#382 but I'm not sure why you downvoted it (without leaving a comment); it's useful for other use cases but in particular for properly implementing shouldAcceptRequest, because activeDescriptors does not work (see #16603) and getOpenFiles will.

it is very difficult to know how many file descriptors are being used by even a single kind of request handler

estimating an upper bound is sufficient and should be feasible in lots of situations; in those cases server code can be made more robust against over-committing server and aborting in-flight requests by denying accept when fds start becoming scarce. The logic can be left for client code which have more context than stdlib, but a key ingredient for that is exposing getOpenFiles

@Varriount
Copy link
Contributor

I feel like there's some confusion here, as this appears to be about two questions:

  • Is shouldAcceptRequest a check that should even be performed?
  • Is shouldAcceptRequest sufficient to prevent file descriptor exhaustion exceptions?

My answer to both questions is "No". I am speaking as someone who has developed, and currently does develop, backend servers/services for a living.


The logic can be left for client code which have more context than stdlib, but a key ingredient for that is exposing getOpenFiles

Let's expand this. How about just providing client code with the ability to determine whether a connection should be accepted? Then, if a situation calls for it, they can implement this class of logic in a way that is tailored to their program. That way, instead of hard-coding feature after feature that we assume users will want and use, we leave it up to the users.

Also, lets be real: given how most applications are developed, do you really think a developer is going to remember to set this "estimated file descriptor usage" in 99.99% of cases? Honestly? Because if they don't set it, then in 99.99% of cases, this check is just wasted CPU cycles.

estimating an upper bound is sufficient and should be feasible in lots of situations

Do these situations overlap with the kinds of applications where this "feature" might actually be useful? And assuming that they do, you are stating that it is feasible for the for developers of such programs to:

  • know what system calls allocate file descriptors (believe me, it's more than just accept and open)
  • know which parts of the program will make those system calls.
  • know which parts of the dependencies the program uses will make those system calls.
  • repeatedly re-evaluate all of the above as both your code and dependencies are updated.

http://0pointer.net/blog/file-descriptor-limits.html

@Araq
Copy link
Member

Araq commented Jun 10, 2021

Now you ask me where my PR is, which implies that you'd agree with me if I just created a PR. That implies that you will accept whatever comes first, which isn't great. It also forgets that @mrhdias is trying to create the PR, I'm trying to help them do so.

Well here is what really happened: asynchttpserver had a severe bug (yet another btw). I fixed the bug. Yes, really. There was a test case that failed and after my PR it started to work. I know perfectly well that:

  • The mechanism I introduced is not the "real fix", which would be proper exception handling. I'm still waiting for a PR that does that, so far, everybody loves to talk about how bad my workaround is.
  • The mechanism I introduced is generally useful, yes, I know you don't agree. But I too did develop backend services that did run 24h 7 days a week. And I argue that the mechanism is worth well having.

If you think that you cannot estimate the number of file descriptor handles well then I cannot help you but I notice that every major OS has a maximum limit of these things, indicating that it's a resource you should be aware of.

@dom96
Copy link
Contributor

dom96 commented Jun 10, 2021

I really I just want us to fix this bug correctly now. You keep saying you are waiting for a PR, I am just trying to help mrhdias implement this PR.

Can we just agree to get rid of this API and expose proper exceptions for this case? With the default behaviour of logging these errors like Go (and probably most implementations) does?

@Araq
Copy link
Member

Araq commented Jun 11, 2021

You keep saying you are waiting for a PR, I am just trying to help mrhdias implement this PR.

Good, I have no objection here.

Can we just agree to get rid of this API and expose proper exceptions for this case? With the default behaviour of logging these errors like Go (and probably most implementations) does?

Well I happen to like the mechanism. And if you tell me that we need logging otherwise to compensate for this case I like the mechanism even moreso because the Nim standard library doesn't log. I know we have no official guideline that says that, because it never occured to me that anybody would accept a standard library that logs.

@mrhdias
Copy link
Contributor Author

mrhdias commented Jun 11, 2021

The psutils package of the python language has the "num_fds" function that gives he number of file descriptors currently opened for every OS.

#!/usr/bin/python

# https://psutil.readthedocs.io/en/latest/#psutil.Process.num_fds
# https://github.com/giampaolo/psutil/blob/master/psutil/_pslinux.py#L2135

import psutil

p = psutil.Process()
print(p.num_fds())

@Varriount
Copy link
Contributor

Note that psutils isn't part of Python's standard library.

@KhazAkar
Copy link

Note that psutils isn't part of Python's standard library.

Using resource module you can try replicate what psutil does here:
https://docs.python.org/3/library/resource.html

BTW, is this ticket still applicable today?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants