openconnection() raises unexpected python error with high timeout value #52

choksi81 · 2014-05-24T17:36:46Z

When invoking the openconnection() API call with a large timeout value, an unexpected Python socket error is raised. The error can be reproduced with the following code.

timeout = 300
start_time = getruntime()
log("Start time: " + str(start_time))

try:
    openconnection(gethostbyname("gmail.com"), 12345, getmyip(), 12345, timeout)
except TimeoutError:
    pass
except Exception, e:
    log("Error raised: " + str(e)) 

end_time = getruntime()
log("End time: " + str(end_time))
log("Elapsed time: " + str(end_time - start_time))

Output is:

Start time: 0.353349924088
Internal Error

---
Uncaught exception!

---
Following is a full traceback, and a user traceback.
The user traceback excludes non-user modules. The most recent call is displayed last.

Full debugging traceback:
  "/home/monzum/exdisk/work/affix_library/namespace.py", line 1206, in wrapped_function
  "/home/monzum/exdisk/work/affix_library/emulcomm.py", line 1304, in openconnection
  "/home/monzum/exdisk/work/affix_library/emulcomm.py", line 1143, in _timed_conn_initialize
  "/usr/lib/python2.7/socket.py", line 224, in meth

User traceback:

Exception (with class 'socket.error'): [Errno 103] Software caused connection abort

JustinCappos · 2014-07-21T23:24:39Z

Dani is working on this...

aaaaalbert · 2017-09-04T15:18:46Z

I can confirm that this problem exists, testing on Mac OS X 10.11.6 with Python 2.7.13 and the sample code above. A packet trace in Wireshark shows that the TCP stack sends multiple SYNs to the destination, gets no reply, increases the inter-segment spacing following a certain progression (1, 1, 1, 1, 2, 4, 8, 16, and 32 seconds), and then gives up after 75 seconds have elapsed. Then, the next call to a socket method raises a socket error with errno 22, "Invalid argument". (This is different from the Linux error, and Windows probably shows yet something else.)
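As a quick sanity check on the numbers in the trace, the send times of the SYNs can be added up. This is only a sketch: the gap list is copied from the observation above, and it assumes that the first SYN leaves at t = 0 and that each listed value is the gap to the next SYN.

```python
from itertools import accumulate

# Gaps between successive SYNs as reported from the packet trace (seconds).
SYN_GAPS = [1, 1, 1, 1, 2, 4, 8, 16, 32]

# Assumption: the first SYN is sent at t = 0, so every later SYN is sent
# at the running sum of the gaps that precede it.
syn_send_times = [0] + list(accumulate(SYN_GAPS))

GIVE_UP_AT = 75  # seconds; the point at which the stack was seen to give up
last_syn_at = syn_send_times[-1]

print(syn_send_times)
print("last SYN at %d s, %d s before giving up" % (last_syn_at, GIVE_UP_AT - last_syn_at))
```

Under those assumptions the last SYN goes out at 66 seconds, which leaves 9 seconds before the stack gives up at 75 seconds.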

I'll test on Travis and AppVeyor to see what the maximum allowed timeout is between platforms. We can then patch emulcomm to raise a RepyArgumentError for timeouts larger than this.

aaaaalbert · 2017-09-04T15:55:27Z

On Linux / Ubuntu 14.04.5 LTS, the maximum timeout is 127 seconds.

aaaaalbert · 2017-09-04T16:26:08Z

Windows Server 2012 R2 (as tested on AppVeyor) just raises a WSAETIMEDOUT every 20 seconds. Within the 300-second period, it raises no other error.

The actual errno and message are (10060, 'A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond')

aaaaalbert · 2017-09-04T16:35:52Z

(Side note, Ubuntu 14.04.5 LTS with an old Python 2.5.6 does it the Windows way...)

aaaaalbert · 2017-09-05T08:08:33Z

OK, seems we have two options here:

1. Cap the allowed timeout at below 75 seconds to accommodate OSX/BSD, or
2. Try to catch the Python socket exception and report something useful to the sandbox.

My problem with the first option is that hardcoded constants like these may change across OS versions, and we'd need to play catch-up. OTOH, the patch is a single additional max=75 in the namespace definition of the timeout parameter.
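To make the effect of the first option concrete, here is a minimal standalone sketch. The real change would sit in the namespace definition of the timeout parameter, which is not shown in this thread; `check_timeout`, `MAX_CONNECT_TIMEOUT`, and the local `RepyArgumentError` class are stand-ins defined only so the sketch runs on its own, and treating 75 itself as allowed is an assumption.

```python
class RepyArgumentError(Exception):
    """Stand-in for Repy's RepyArgumentError so this sketch is self-contained."""


# Assumption: 75 s, the shortest give-up time observed so far (OS X / BSD).
MAX_CONNECT_TIMEOUT = 75


def check_timeout(timeout, max_timeout=MAX_CONNECT_TIMEOUT):
    """Return timeout unchanged, or raise if it exceeds max_timeout."""
    if timeout > max_timeout:
        raise RepyArgumentError(
            "timeout %r exceeds the maximum of %r seconds" % (timeout, max_timeout))
    return timeout
```

With this in place, the reproduction case from the original report (timeout = 300) would fail immediately with an argument error instead of hitting the OS-specific socket error later.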

My problem with the second option is that the Linux/Mac errors are so unspecific (software abort / invalid argument) that we could end up masking other errors. (However, I don't think we've seen "other errors" in the Repy network API lately.) Also, since Windows doesn't raise a different type of error for large timeouts, and the upper limits for Linux and Mac are different, we'd be exposing the node's OS indirectly.

lukpueh · 2017-09-07T18:58:15Z

What about a combination of both? That is, catch the Python socket exception and raise something that suggests a socket timeout error only if the timeout argument is above a certain threshold, and a generic RepySocketException otherwise, instead of preemptively failing with a RepyArgumentError.

aaaaalbert · 2017-09-07T19:25:10Z

I think this would be very similar to my second option, which is (un)reliable (due to the generic error number of the exception) and OS-dependent (since Windows always raises the same error). Am I missing something?

lukpueh · 2017-09-07T21:09:02Z

Or maybe I missed something. Let me try to rephrase:

(1) Always raise RepyArgumentError if timeout > max_timeout, even before Python raises a socket.error
(problem: how to choose the right max_timeout)

(2) Always re-raise Python's socket.error as RepySocketTimeoutError
(problem: could mask other errors)

(3) Re-raise Python socket.error as RepySocketTimeoutError if timeout > max_timeout and RepySocketError otherwise
(reduces the likelihood of masking other errors compared to (2), and reduces the severity of choosing the wrong max_timeout compared to (1), because it does not fail preemptively)

I had (3) in mind and read your ideas as (1) and (2) respectively.
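For illustration, strategy (3) could look roughly like the sketch below. The exception classes reuse the names from this comment but are local stand-ins, not Repy's actual classes. `connect_with_translation`, `translate_socket_error`, and the `connect` callable are hypothetical helpers introduced here, and the default threshold of 75 is an assumption.

```python
import socket


class RepySocketError(Exception):
    """Stand-in for a generic Repy socket error."""


class RepySocketTimeoutError(RepySocketError):
    """Stand-in for a Repy error that points at a too-large timeout."""


def translate_socket_error(exc, timeout, max_timeout):
    """Choose the exception to raise for a caught socket.error (strategy 3)."""
    if timeout > max_timeout:
        # The timeout is above the threshold, so the OS giving up on the
        # connection attempt is the most plausible cause.
        return RepySocketTimeoutError(
            "timeout of %r s exceeds %r s: %s" % (timeout, max_timeout, exc))
    # Below the threshold: do not guess, report a generic socket error.
    return RepySocketError(str(exc))


def connect_with_translation(connect, timeout, max_timeout=75):
    """Call connect(timeout) and translate any socket.error it raises."""
    try:
        return connect(timeout)
    except socket.error as exc:
        raise translate_socket_error(exc, timeout, max_timeout)
```

Note that nothing fails before the connection attempt is made; the threshold only affects which error is reported afterwards.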

aaaaalbert · 2017-09-12T15:33:14Z

After having clarified my concerns with @lukpueh offline, I think we agree that OS specificity is the key issue here. I will propose a patch that clamps the possible timeout values at 75 seconds, and prepare a unit test that checks the validity of this number so that we don't shoot our future selves in the foot.
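One possible shape for such a check is sketched below. It is not the proposed unit test: how the real test would drive openconnection against an unresponsive host is left out. `seconds_until_socket_error`, `clamp_is_safe`, and the injectable `attempt_connect` and `clock` parameters are hypothetical names chosen so the logic can be exercised without a network.

```python
import socket
import time

MAX_CONNECT_TIMEOUT = 75  # the clamp value whose validity is being checked


def seconds_until_socket_error(attempt_connect, clock=time.time):
    """Run attempt_connect() and return the seconds elapsed until it raises
    socket.error, or None if it returns without raising."""
    start = clock()
    try:
        attempt_connect()
    except socket.error:
        return clock() - start
    return None


def clamp_is_safe(give_up_after, max_timeout=MAX_CONNECT_TIMEOUT):
    """The clamp holds if the OS never gives up before max_timeout elapses."""
    return give_up_after is None or give_up_after >= max_timeout
```

A test built on this would fail on any platform whose TCP stack gives up earlier than the clamp, which is the signal that the constant needs to be revisited.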

choksi81 assigned monzum May 24, 2014

aaaaalbert assigned dani-witherspoon and unassigned monzum Jul 31, 2014

aaaaalbert removed T: defect labels Sep 10, 2015

aaaaalbert unassigned dani-witherspoon Sep 4, 2017

aaaaalbert added a commit to aaaaalbert/repy_v2 that referenced this issue Sep 4, 2017

Test SeattleTestbed#52, socket timeout

89819fa

This patches `emulcomm` to provide some insight into the mechanics of too-large socket timeouts, and provides a unit test to trigger the instrumentation.
