YARN integration test fails with: Remote end closed connection without response #720
Looking up
The URL to the EJR was effectively the one recently introduced to avoid F5 rate-limiting:
Can't seem to reproduce the issue; maybe a temporary hiccup? Test steps:
from openeogeotrellis.backend import get_elastic_job_registry
ejr = get_elastic_job_registry()
ejr.health_check()
for _ in range(100):
    ejr.get_job('j-2403131df01e4cbcb656ef2bdcefdd8b', user_id='1ff4f5cf-95cc-4bbb-ad8f-b5096d95006a')["status"]
In the meantime, another integration test failed because a download request was unable to log resource usage with the ETL API:
Both of these cases are configured to be retried ( |
Got it: connection errors are retried, but POST requests are not, most likely because this type of request is typically not idempotent. In this case (a search request and an idempotent usage report) they are, though.
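To illustrate the behaviour described above, here is a minimal sketch, assuming `requests` with urllib3 1.26 or later (where `Retry` accepts `allowed_methods`). The helper names `build_session` and `survives_reset` and the `example.test` URL are hypothetical and not taken from the openEO code base. By default urllib3 does not retry a POST after a read error such as a connection reset, and adding `POST` to the allowed methods opts in:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.exceptions import ProtocolError
from urllib3.util.retry import Retry


def build_session(retry_post: bool, total: int = 3) -> requests.Session:
    """Build a session that retries failed requests; POST only on opt-in."""
    allowed = set(Retry.DEFAULT_ALLOWED_METHODS)
    if retry_post:
        # Only safe when the server treats a repeated POST as idempotent.
        allowed.add("POST")
    retry = Retry(total=total, backoff_factor=0.5, allowed_methods=frozenset(allowed))
    session = requests.Session()
    adapter = HTTPAdapter(max_retries=retry)
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session


def survives_reset(session: requests.Session, method: str) -> bool:
    """Check whether the retry policy permits another attempt after a reset."""
    retry = session.get_adapter("https://example.test/").max_retries
    error = ProtocolError("Connection aborted.")
    try:
        # increment() re-raises the error when the method is not retryable.
        retry.increment(method=method, url="/", error=error)
    except ProtocolError:
        return False
    return True
```

With the default method list, `survives_reset(build_session(retry_post=False), "POST")` is false, which matches the `retry.py` frame visible in the tracebacks; with `retry_post=True` the same check passes.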
Traceback (most recent call last):
  File "/opt/venv/lib64/python3.8/site-packages/urllib3/connectionpool.py", line 715, in urlopen
    httplib_response = self._make_request(
  File "/opt/venv/lib64/python3.8/site-packages/urllib3/connectionpool.py", line 467, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/opt/venv/lib64/python3.8/site-packages/urllib3/connectionpool.py", line 462, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/lib64/python3.8/http/client.py", line 1347, in getresponse
    response.begin()
  File "/usr/lib64/python3.8/http/client.py", line 307, in begin
    version, status, reason = self._read_status()
  File "/usr/lib64/python3.8/http/client.py", line 268, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/lib64/python3.8/socket.py", line 669, in readinto
    return self._sock.recv_into(b)
  File "/usr/lib64/python3.8/ssl.py", line 1241, in recv_into
    return self.read(nbytes, buffer)
  File "/usr/lib64/python3.8/ssl.py", line 1099, in read
    return self._sslobj.read(len, buffer)
ConnectionResetError: [Errno 104] Connection reset by peer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/venv/lib64/python3.8/site-packages/requests/adapters.py", line 486, in send
    resp = conn.urlopen(
  File "/opt/venv/lib64/python3.8/site-packages/urllib3/connectionpool.py", line 799, in urlopen
    retries = retries.increment(
  File "/opt/venv/lib64/python3.8/site-packages/urllib3/util/retry.py", line 550, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/opt/venv/lib64/python3.8/site-packages/urllib3/packages/six.py", line 769, in reraise
    raise value.with_traceback(tb)
  File "/opt/venv/lib64/python3.8/site-packages/urllib3/connectionpool.py", line 715, in urlopen
    httplib_response = self._make_request(
  File "/opt/venv/lib64/python3.8/site-packages/urllib3/connectionpool.py", line 467, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/opt/venv/lib64/python3.8/site-packages/urllib3/connectionpool.py", line 462, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/lib64/python3.8/http/client.py", line 1347, in getresponse
    response.begin()
  File "/usr/lib64/python3.8/http/client.py", line 307, in begin
    version, status, reason = self._read_status()
  File "/usr/lib64/python3.8/http/client.py", line 268, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/lib64/python3.8/socket.py", line 669, in readinto
    return self._sock.recv_into(b)
  File "/usr/lib64/python3.8/ssl.py", line 1241, in recv_into
    return self.read(nbytes, buffer)
  File "/usr/lib64/python3.8/ssl.py", line 1099, in read
    return self._sslobj.read(len, buffer)
urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/venv/lib64/python3.8/site-packages/flask/app.py", line 1484, in full_dispatch_request
    rv = self.dispatch_request()
  File "/opt/venv/lib64/python3.8/site-packages/flask/app.py", line 1469, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "/opt/venv/lib64/python3.8/site-packages/openeo_driver/users/auth.py", line 88, in decorated
    return f(*args, **kwargs)
  File "/opt/venv/lib64/python3.8/site-packages/openeo_driver/views.py", line 671, in result
    costs = backend_implementation.request_costs(success=True, **request_costs_kwargs)
  File "/opt/venv/lib64/python3.8/site-packages/openeogeotrellis/backend.py", line 1525, in request_costs
    costs = etl_api.log_resource_usage(
  File "/opt/venv/lib64/python3.8/site-packages/openeogeotrellis/integrations/etl_api.py", line 138, in log_resource_usage
    with self._session.post(f"{self._endpoint}/resources", headers={'Authorization': f"Bearer {access_token}"},
  File "/opt/venv/lib64/python3.8/site-packages/requests/sessions.py", line 637, in post
    return self.request("POST", url, data=data, json=json, **kwargs)
  File "/opt/venv/lib64/python3.8/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "/opt/venv/lib64/python3.8/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "/opt/venv/lib64/python3.8/site-packages/requests/adapters.py", line 501, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
This does not include JobNotFoundException.

Traceback (most recent call last):
  File "/opt/venv/lib64/python3.8/site-packages/urllib3/connectionpool.py", line 715, in urlopen
    httplib_response = self._make_request(
  File "/opt/venv/lib64/python3.8/site-packages/urllib3/connectionpool.py", line 467, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/opt/venv/lib64/python3.8/site-packages/urllib3/connectionpool.py", line 462, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/lib64/python3.8/http/client.py", line 1347, in getresponse
    response.begin()
  File "/usr/lib64/python3.8/http/client.py", line 307, in begin
    version, status, reason = self._read_status()
  File "/usr/lib64/python3.8/http/client.py", line 276, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/venv/lib64/python3.8/site-packages/requests/adapters.py", line 486, in send
    resp = conn.urlopen(
  File "/opt/venv/lib64/python3.8/site-packages/urllib3/connectionpool.py", line 799, in urlopen
    retries = retries.increment(
  File "/opt/venv/lib64/python3.8/site-packages/urllib3/util/retry.py", line 550, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/opt/venv/lib64/python3.8/site-packages/urllib3/packages/six.py", line 769, in reraise
    raise value.with_traceback(tb)
  File "/opt/venv/lib64/python3.8/site-packages/urllib3/connectionpool.py", line 715, in urlopen
    httplib_response = self._make_request(
  File "/opt/venv/lib64/python3.8/site-packages/urllib3/connectionpool.py", line 467, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/opt/venv/lib64/python3.8/site-packages/urllib3/connectionpool.py", line 462, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/lib64/python3.8/http/client.py", line 1347, in getresponse
    response.begin()
  File "/usr/lib64/python3.8/http/client.py", line 307, in begin
    version, status, reason = self._read_status()
  File "/usr/lib64/python3.8/http/client.py", line 276, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/venv/lib64/python3.8/site-packages/flask/app.py", line 1484, in full_dispatch_request
    rv = self.dispatch_request()
  File "/opt/venv/lib64/python3.8/site-packages/flask/app.py", line 1469, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "/opt/venv/lib64/python3.8/site-packages/openeo_driver/users/auth.py", line 88, in decorated
    return f(*args, **kwargs)
  File "/opt/venv/lib64/python3.8/site-packages/openeo_driver/views.py", line 889, in get_job_info
    job_info: BatchJobMetadata = backend_implementation.batch_jobs.get_job_info(job_id, user.user_id)
  File "/opt/venv/lib64/python3.8/site-packages/openeogeotrellis/backend.py", line 1829, in get_job_info
    job_metadata = registry.get_job_metadata(job_id, user_id)
  File "/opt/venv/lib64/python3.8/site-packages/openeogeotrellis/job_registry.py", line 810, in get_job_metadata
    ejr_job_info = self.elastic_job_registry.get_job(job_id=job_id, user_id=user_id)
  File "/opt/venv/lib64/python3.8/site-packages/openeo_driver/jobregistry.py", line 414, in get_job
    jobs = self._search(query=query, fields=fields or ["*"])
  File "/opt/venv/lib64/python3.8/site-packages/openeo_driver/jobregistry.py", line 534, in _search
    return self._do_request("POST", "/jobs/search", json=query)
  File "/opt/venv/lib64/python3.8/site-packages/openeo_driver/jobregistry.py", line 320, in _do_request
    response = self._session.request(
  File "/opt/venv/lib64/python3.8/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "/opt/venv/lib64/python3.8/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "/opt/venv/lib64/python3.8/site-packages/requests/adapters.py", line 501, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

Open-EO/openeo-geopyspark-driver#720
Requests towards the ETL API are all idempotent, so POST has been added to the list of retryable verbs. EJR API search requests are handled in a more ad-hoc way, because simply retrying all POST requests would result in duplicate jobs and the EJR API does nothing to prevent that. Instead, the error is propagated to the user, who will then be aware that something might be off.
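One way to picture the ad-hoc handling is a small wrapper that is applied only to calls known to be safe to repeat, such as a search, while other POST requests stay unwrapped and fail fast. This is a sketch under that assumption, not the actual openEO implementation; `retry_call` and its parameters are hypothetical names:

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_call(
    func: Callable[[], T],
    exceptions: Tuple[Type[BaseException], ...] = (ConnectionError,),
    attempts: int = 3,
    delay: float = 0.0,
) -> T:
    """Call func, retrying on the given exceptions up to `attempts` times."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                # Out of attempts: propagate the last error to the caller.
                raise
            time.sleep(delay)
    raise AssertionError("unreachable")
```

A caller would wrap only the idempotent request, for example `retry_call(lambda: do_search(query), exceptions=(requests.exceptions.ConnectionError,))`, where `do_search` stands in for whatever function performs the search. Note that `requests.exceptions.ConnectionError` does not derive from the built-in `ConnectionError`, so it has to be passed explicitly.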
Integration test test_random_forest_train_and_load_from_jobid failed while polling for job status: