openconnection() raises unexpected python error with high timeout value #52

Open
choksi81 opened this issue May 24, 2014 · 10 comments

@choksi81
Contributor

When the openconnection() API call is invoked with a large timeout value, an unexpected Python socket error is raised. The error can be reproduced with the following code.

timeout = 300
start_time = getruntime()
log("Start time: " + str(start_time))

try:
    openconnection(gethostbyname("gmail.com"), 12345, getmyip(), 12345, timeout)
except TimeoutError:
    pass
except Exception, e:
    log("Error raised: " + str(e)) 

end_time = getruntime()
log("End time: " + str(end_time))
log("Elapsed time: " + str(end_time - start_time))

Output is:

Start time: 0.353349924088Internal Error

---
Uncaught exception!

---
Following is a full traceback, and a user traceback.
The user traceback excludes non-user modules. The most recent call is displayed last.

Full debugging traceback:
  "/home/monzum/exdisk/work/affix_library/namespace.py", line 1206, in wrapped_function
  "/home/monzum/exdisk/work/affix_library/emulcomm.py", line 1304, in openconnection
  "/home/monzum/exdisk/work/affix_library/emulcomm.py", line 1143, in _timed_conn_initialize
  "/usr/lib/python2.7/socket.py", line 224, in meth

User traceback:

Exception (with class 'socket.error'): [Errno 103] Software caused connection abort
@JustinCappos
Contributor

Dani is working on this...

@aaaaalbert
Collaborator

I can confirm that this problem exists, testing on Mac OS X 10.11.6 with Python 2.7.13 and the sample code above. A packet trace in Wireshark shows that the TCP stack sends multiple SYNs to the destination, gets no reply, increases the inter-segment spacing following a certain progression (1, 1, 1, 1, 2, 4, 8, 16, and 32 seconds), and then gives up after 75 seconds have elapsed. Then, the next call to a socket method raises a socket error with errno 22, "Invalid argument". (This is different from the Linux error, and Windows probably shows yet something else.)
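The retransmission schedule above can be turned into absolute send times with a few lines of Python. This is only an arithmetic sketch: it assumes the listed values are the gaps between consecutive SYNs and that the first SYN leaves at t = 0. The names `SYN_GAPS`, `GIVE_UP_AFTER`, and `syn_send_times` are made up for illustration.

```python
# Gaps between consecutive SYNs, in seconds, as read from the packet trace.
SYN_GAPS = [1, 1, 1, 1, 2, 4, 8, 16, 32]

# Point at which the TCP stack gave up, in seconds (Mac OS X 10.11.6).
GIVE_UP_AFTER = 75


def syn_send_times(gaps):
    """Return the send time of every SYN, assuming the first goes out at t=0."""
    times = [0]
    for gap in gaps:
        times.append(times[-1] + gap)
    return times


send_times = syn_send_times(SYN_GAPS)
last_syn = send_times[-1]
final_wait = GIVE_UP_AFTER - last_syn

print(send_times)   # [0, 1, 2, 3, 4, 6, 10, 18, 34, 66]
print(final_wait)   # 9
```

Under these assumptions the last SYN goes out at 66 seconds, and the stack waits another 9 seconds before giving up at 75.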

I'll test on Travis and AppVeyor to see what the maximum allowed timeout is between platforms. We can then patch emulcomm to raise a RepyArgumentError for timeouts larger than this.

aaaaalbert added a commit to aaaaalbert/repy_v2 that referenced this issue Sep 4, 2017
This patches `emulcomm` to provide some insight into the mechanics
of too-large socket timeouts, and provides a unit test to trigger
the instrumentation.
@aaaaalbert
Collaborator

On Linux / Ubuntu 14.04.5 LTS, the maximum timeout is 127 seconds.
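One plausible explanation for the 127 second figure, which the thread itself does not confirm, is the usual Linux default of six SYN retransmissions (`net.ipv4.tcp_syn_retries`) with an initial retransmission timeout of 1 second that doubles on each attempt. The helper name below is hypothetical; the sketch only does the arithmetic.

```python
def connect_give_up_time(initial_rto, retries):
    """Total wait for the initial SYN plus `retries` retransmissions.

    Assumes plain exponential backoff: every wait is twice the previous one.
    """
    return sum(initial_rto * 2 ** attempt for attempt in range(retries + 1))


# 1 + 2 + 4 + 8 + 16 + 32 + 64
print(connect_give_up_time(1, 6))   # 127
```

If that assumption holds, the limit is a tunable kernel setting rather than a fixed property of the platform, so it could differ between nodes running the same OS.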

@aaaaalbert
Collaborator

Windows Server 2012 R2 (as tested on AppVeyor) just raises a WSAETIMEDOUT every 20 seconds. Within the 300-second period, it raises no other error.

The actual errno and message are (10060, 'A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond')

@aaaaalbert
Collaborator

(Side note, Ubuntu 14.04.5 LTS with an old Python 2.5.6 does it the Windows way...)

@aaaaalbert
Collaborator

OK, seems we have two options here:

  • Cap the allowed timeout at below 75 seconds to accommodate OSX/BSD, or
  • Try to catch the Python socket exception and report something useful to the sandbox.

My problem with the first option is that hardcoded constants like these may change across OS versions, and we'd need to play catch-up. OTOH, the patch is a single additional max=75 in the namespace definition of the timeout parameter.

My problem with the second option is that the Linux/Mac errors are so unspecific (software abort / invalid argument) that we could end up masking other errors. (However, I don't think we've seen "other errors" in the Repy network API lately.) Also, since Windows doesn't raise a different type of error for large timeouts, and the upper limits for Linux and Mac are different, we'd be exposing the node's OS indirectly.
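For reference, the first option amounts to an up-front range check along these lines. This is a standalone sketch, not the namespace-layer patch: `RepyArgumentError` is redefined here as a stand-in so the snippet runs outside Repy, and `check_timeout` and `MAX_TIMEOUT` are hypothetical names.

```python
# Hardcoded cap in seconds; chosen to stay at or below the smallest
# per-OS limit observed so far (75 seconds on Mac OS X).
MAX_TIMEOUT = 75


class RepyArgumentError(Exception):
    """Stand-in for Repy's exception of the same name."""


def check_timeout(timeout, max_timeout=MAX_TIMEOUT):
    """Reject too-large timeouts before any socket call is made."""
    if timeout > max_timeout:
        raise RepyArgumentError(
            "timeout %s exceeds the maximum of %s seconds" % (timeout, max_timeout))
    return timeout


check_timeout(30)        # passes
try:
    check_timeout(300)   # the value from the original report
except RepyArgumentError as err:
    print("Rejected: " + str(err))
```

Because the check never touches the socket, the sandbox sees the same behavior on every OS, which is the main argument for this option.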

@lukpueh
Contributor

lukpueh commented Sep 7, 2017

What about a combination of both? That is, catch the Python socket exception and raise something that suggests a socket timeout error only if the timeout argument is above a certain threshold, and a generic RepySocketException otherwise, instead of preemptively failing with a RepyArgumentError.

@aaaaalbert
Collaborator

I think this would be very similar to my second option, which is (un)reliable (due to the generic error number of the exception) and OS-dependent (since Windows always raises the same error). Am I missing something?

@lukpueh
Contributor

lukpueh commented Sep 7, 2017

Or maybe I missed something. Let me try to rephrase:

(1) Always raise RepyArgumentError if timeout > max_timeout, even before Python raises a socket.error
(problem: how to choose the right max_timeout)

(2) Always re-raise Python's socket.error as RepySocketTimeoutError
(problem: could mask other errors)

(3) Re-raise Python socket.error as RepySocketTimeoutError if timeout > max_timeout and RepySocketError otherwise
(reduces the likelihood of masking other errors (2) and reduces the severity of choosing the wrong max_timeout because it does not fail preemptively)

I had (3) in mind and read your ideas as (1) and (2) respectively.
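A rough sketch of what (3) could look like, to make the difference from (1) and (2) concrete. The two exception classes are stand-ins named after the ones mentioned above, and `guarded_connect`, `translate_socket_error`, and `MAX_TIMEOUT` are hypothetical; none of this is taken from emulcomm.

```python
import socket

MAX_TIMEOUT = 75  # assumed threshold in seconds


class RepySocketError(Exception):
    """Stand-in for a generic Repy socket error."""


class RepySocketTimeoutError(Exception):
    """Stand-in for a Repy error that points at the timeout."""


def translate_socket_error(exc, timeout, max_timeout=MAX_TIMEOUT):
    """Pick the Repy-level exception for a low-level socket.error."""
    if timeout > max_timeout:
        # The OS probably gave up before the requested timeout elapsed.
        return RepySocketTimeoutError(
            "connection attempt aborted, timeout %s may be too large: %s"
            % (timeout, exc))
    # Below the threshold the timeout is unlikely to be the cause.
    return RepySocketError(str(exc))


def guarded_connect(connect, timeout):
    """Call `connect()` and re-raise any socket.error as a Repy error."""
    try:
        return connect()
    except socket.error as exc:
        raise translate_socket_error(exc, timeout)
```

Nothing fails before the socket call, so a badly chosen threshold only changes which exception type the sandbox sees, not whether the connection is attempted.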

@aaaaalbert
Collaborator

After having clarified my concerns with @lukpueh offline, I think we agree that OS specificity is the key issue here. I will propose a patch that clamps the possible timeout values at 75 seconds, and prepare a unit test that checks the validity of this number so that we don't shoot our future selves in the foot.


6 participants