Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[windows] Use non blocking reads and writes #1221

Closed
reconbot opened this issue Jul 6, 2017 · 17 comments
Closed

[windows] Use non blocking reads and writes #1221

reconbot opened this issue Jul 6, 2017 · 17 comments
Labels

Comments

@reconbot
Copy link
Member

reconbot commented Jul 6, 2017

We need help with this one!

  • SerialPort Version: 5.0.0-beta6
  • Operating System and Hardware Platform: Windows / All

Right now we can only support 3 or 4 ports open at a time in the same process on windows unless the env var UV_THREADPOOL_SIZE is set to something higher than 4. This is because we're using WriteFile and ReadFile for our reads and writes which are blocking. We should transition to WriteFileEx and ReadFileEx which support callbacks when data is read or written. This should allow us to support "any" number of ports and have lower cpu utilization.

Relevant code; EIO_Write and EIO_Read

@reconbot reconbot added feature-request Feature or Enhancement windows labels Jul 6, 2017
@reconbot reconbot changed the title Use non blocking reads and writes on windows. [windows] Use non blocking reads and writes on windows. Jul 6, 2017
@reconbot reconbot changed the title [windows] Use non blocking reads and writes on windows. [windows] Use non blocking reads and writes Jul 6, 2017
@reconbot
Copy link
Member Author

reconbot commented Jul 6, 2017

Notes

http://tinyclouds.org/iocp-links.html

The fundamental variation is that in a Unix you generally ask the kernel to wait for state change in a file descriptor's readability or writablity. With overlapped I/O and IOCPs the programmers waits for asynchronous function calls to complete.

Unix non-blocking I/O is not beautiful.

https://msdn.microsoft.com/en-us/library/ms679360(v=vs.85).aspx

The error codes returned by a function are not part of the Windows API specification and can vary by operating system or device driver. For this reason, we cannot provide the complete list of error codes that can be returned by each function. There are also many functions whose documentation does not include even a partial list of error codes that can be returned.

@noopkat
Copy link

noopkat commented Jul 8, 2017

not sure I can help out with the C++ side, but happy to manually test! Let me know 😁

@reconbot
Copy link
Member Author

reconbot commented Jul 9, 2017

This was referenced Jul 29, 2017
@Craytor
Copy link

Craytor commented Aug 2, 2017

General question regarding this - is it possible that this will help the CPU usage go below 30%? Maybe 5%-10%?

@reconbot
Copy link
Member Author

reconbot commented Aug 3, 2017 via email

@dustmop
Copy link
Contributor

dustmop commented Aug 15, 2017

I have a working proof-of-concept using WriteFileEx and (incorrectly) CreateThread. Need to use uv_thread_id, implement the same for Read'ing, and clean it up. Will possibly have a PR by end of week?

@reconbot
Copy link
Member Author

Woh! This is great!

Why do we need threads?

@dustmop
Copy link
Contributor

dustmop commented Aug 16, 2017

From WriteFileEx: https://msdn.microsoft.com/en-us/library/windows/desktop/aa365748(v=vs.85).aspx

"When this I/O operation finishes, and the calling thread is blocked in an alertable wait state, the operating system calls the function pointed to by lpCompletionRoutine"

An alertable wait state, from Alertable I/O: https://msdn.microsoft.com/en-us/library/windows/desktop/aa363772(v=vs.85).aspx

"To do this, it must enter an alertable state. A thread can only do this by calling one of the following functions with the appropriate flags: ..."

The WriteFileEx calls needs to be on a thread that can be blocked (with something like SleepEx), otherwise the OS won't call the async callback. Perhaps there's something I'm missing with regards to how node's events work, but I think the only way to do this is use a thread from the uv pool. Hacking on it today, I couldn't get the callback to ever trigger with everything staying on the main event loop thread. Hope this doesn't conflict with the expected utilization and performance improvements.

@dustmop
Copy link
Contributor

dustmop commented Aug 16, 2017

Actually, since the libuv threadpool only allows 3 or 4 threads, as mentioned a few times in this thread (sorry!) perhaps the correct solution is to actually use CreateThread for the WriteFileEx call and callback. That way, the libuv pool won't be exhausted. A new thread will be created for each write operation, which I don't understand the consequences of quite that well, but it doesn't seem possible to avoid it.

@reconbot
Copy link
Member Author

reconbot commented Aug 16, 2017 via email

@dustmop
Copy link
Contributor

dustmop commented Aug 23, 2017

Update on my status:

  1. Changing writing to use asynchronous callbacks is pretty simple. Using WriteFileEx and a FileIOCompletion callback just works, as long as its in a separate thread which can enter the alertable wait state.

  2. Reading is much trickier. The reason is due to how Windows does buffering and timeouts. For starters, the comment on line 159 about setting "ReadIntervalTimeout" to "MAXDWORD" meaning "Never timeout" is wrong. MAXDWORD means "timeout immediately", so the Read* call returns right away, even if there's zero bytes in the buffer. This is the reason the code needs to poll constantly, pegging the cpu, and this will remain the case even using ReadFileEx's asynchronous callbacks. In my own testing, sending a two byte string from an Arduino to SerialPort every second, ReadFileEx was called ~150,000 times before receiving anything.

  3. The fix for avoiding polling is to set "ReadIntervalTimeout" to 0, which is what actually means "never timeout". This will wait until the input buffer contains enough bytes to fulfill the requested size, but leads to another problem: the call will block if it doesn't have enough bytes, so the only safe technique is to request 1 byte at a time, which leads to lots of system calls receiving a single byte each.

  4. I believe the best technique is to set ReadIntervalTimeout to 0, call ReadFileEx, then once the callback triggers, set ReadIntervalTimeout to MAXDWORD and call SetCommTimeouts again, before calling ReadFile to get the rest of what's in the buffer. This should be safe as long as there's one thread reading at a time, which I believe should be the case?

Another possible approach would be to have two separate HANDLEs, one for reading and one for writing, so that reading could use WaitForSingleObject to see if there's data ready on the input buffer. I haven't tested this yet to see if it could work; I will put it off until later as a separate change.

I've been figuring this out using a stand-alone command-line application, so will need to integrate into Serialport, which shouldn't be too difficult at this point.

@reconbot
Copy link
Member Author

Wohhh! This is great!

  • Yes, only one read per HANDLE will be in flight at a time. Same with writes.
  • I'm unsure if we really really want to have an unlimited timeout, but maybe? Lets see how it works! Better than what we got for sure.

reconbot pushed a commit that referenced this issue Sep 2, 2017
…ws (#1313)

Instead of ReadFile and WriteFile, which block and transfer data synchronously, use ReadFileEx and WriteFileEx, which both allow async callbacks. In addition, change how timeouts are used for ReadFile*, using an unlimited timeout for the first byte, and no timeout for the rest of the data in the input buffer. This removes the need to poll entirely, while still retrieving all data available in the input buffer. In both cases, the I/O operations happen in their own threads, since Windows requires IOCompletion callbacks to wait for their calling thread to be in an "alertable wait state".

Fixes #1221
@reconbot
Copy link
Member Author

reconbot commented Sep 2, 2017

Some notes from testing;

  • Single device read and write low cpu 🎉
  • Passes on binding read no more than x bytes fix(windows): Read at max bytesToRead #1322
  • Passes on r/w for more 4+ arduinos
  • Passes on integration 2k read/write
  • Passes full Test suite

dustmop added a commit to dustmop/node-serialport that referenced this issue Sep 6, 2017
Instead of ReadFile and WriteFile, which block and transfer data synchronously, use ReadFileEx and WriteFileEx, which both allow async callbacks. In addition, change how timeouts are used for ReadFile*, using an unlimited timeout for the first byte, and no timeout for the rest of the data in the input buffer. This removes the need to poll entirely, while still retrieving all data available in the input buffer. In both cases, the I/O operations happen in their own threads, since Windows requires IOCompletion callbacks to wait for their calling thread to be in an "alertable wait state".

Fixes serialport#1221
@dustmop
Copy link
Contributor

dustmop commented Sep 6, 2017

Have some questions about the above testing results:

Passes on binding read no more than x bytes #1322

Should be fixed, now, if I understand correctly. I merged in the same fix from #1322 into the new PR.

Passes on r/w for more 4+ arduinos

I don't have a way to test this. Is this a regression? Does the current implementation handle it correctly?

Passes on integration 2k read/write

How do I run this?

Passes full Test suite

Is this the same as TEST_PORT=<port> npm test?

@reconbot
Copy link
Member Author

reconbot commented Sep 6, 2017

The previous implementation would block the libuv thread pool which has 4 threads by default. So worst your case your implementation does the same.

full test suite is TEST_PORT=<port> npm test if you leave out the TEST_PORT it skips a ton of tests

reconbot pushed a commit that referenced this issue Sep 7, 2017
Instead of ReadFile and WriteFile, which block and transfer data synchronously, use ReadFileEx and WriteFileEx, which both allow async callbacks. In addition, change how timeouts are used for ReadFile*, using an unlimited timeout for the first byte, and no timeout for the rest of the data in the input buffer. This removes the need to poll entirely, while still retrieving all data available in the input buffer. In both cases, the I/O operations happen in their own threads, since Windows requires IOCompletion callbacks to wait for their calling thread to be in an "alertable wait state".

Fixes #1221
@reconbot reconbot reopened this Sep 7, 2017
@reconbot
Copy link
Member Author

reconbot commented Sep 7, 2017

Keeping this open because we need to verify the # of concurrent arduinos we can run.

[email protected] has been released! 🎉

@reconbot
Copy link
Member Author

reconbot commented Oct 8, 2017

So far I haven't gotten support requests for this. It's passed my muster. I think it's worth going to prod.

I can't actually test how many concurrent connections we can make, it wouldn't be worse. We'll know for sure eventually. ¯\_(ツ)_/¯

@reconbot reconbot closed this as completed Oct 8, 2017
@lock lock bot locked and limited conversation to collaborators Apr 6, 2018
reconbot pushed a commit that referenced this issue Jul 24, 2018
…ws (#1313)

Instead of ReadFile and WriteFile, which block and transfer data synchronously, use ReadFileEx and WriteFileEx, which both allow async callbacks. In addition, change how timeouts are used for ReadFile*, using an unlimited timeout for the first byte, and no timeout for the rest of the data in the input buffer. This removes the need to poll entirely, while still retrieving all data available in the input buffer. In both cases, the I/O operations happen in their own threads, since Windows requires IOCompletion callbacks to wait for their calling thread to be in an "alertable wait state".

Fixes #1221
reconbot pushed a commit that referenced this issue Jul 24, 2018
Instead of ReadFile and WriteFile, which block and transfer data synchronously, use ReadFileEx and WriteFileEx, which both allow async callbacks. In addition, change how timeouts are used for ReadFile*, using an unlimited timeout for the first byte, and no timeout for the rest of the data in the input buffer. This removes the need to poll entirely, while still retrieving all data available in the input buffer. In both cases, the I/O operations happen in their own threads, since Windows requires IOCompletion callbacks to wait for their calling thread to be in an "alertable wait state".

Fixes #1221
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Development

No branches or pull requests

4 participants