gsutil runs into socket timeout with -m options #1739

Open
MichaelJThomas-2016 opened this issue Aug 31, 2023 · 0 comments

MichaelJThomas-2016 commented Aug 31, 2023

Hi,

I am trying to rsync a bucket from GCS to AWS S3 using gsutil.

I am using Cloud Composer to schedule a Bash script that runs:

    set -e
    export AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}
    export AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}
    sudo apt-get update -y && sudo apt-get install google-cloud-cli -y # This in itself is an issue with the python runtime on GKE
    gsutil -o "GSUtil:max_upload_compression_buffer_size=8G" -m rsync -r gs://MY-BUCKET/MY_PREFIX/year={{execution_date.year}}/month={{execution_date.strftime('%m')}}/day={{execution_date.strftime('%d')}} \
      s3://MY-BUCKET/MY_PREFIX/year={{execution_date.year}}/month={{execution_date.strftime('%m')}}/day={{execution_date.strftime('%d')}}
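
To check whether the timeout also reproduces outside Composer, the Jinja-templated paths can be rebuilt by hand. This is a minimal shell sketch, assuming a hypothetical `EXECUTION_DATE` variable in `YYYY-MM-DD` form that stands in for Airflow's `execution_date`; it mirrors `execution_date.year`, `strftime('%m')` and `strftime('%d')` using plain parameter expansion, and `partition_path` is an illustrative helper name, not part of gsutil or Airflow.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical: EXECUTION_DATE stands in for Airflow's {{ execution_date }},
# formatted as YYYY-MM-DD. Override it when running the script by hand.
EXECUTION_DATE="${EXECUTION_DATE:-2023-08-31}"

# Build the year=/month=/day= partition suffix from a YYYY-MM-DD string.
partition_path() {
  local d="$1"
  local year="${d%%-*}"      # text before the first '-'
  local rest="${d#*-}"       # text after the first '-', e.g. "08-31"
  local month="${rest%%-*}"  # zero-padded month, like strftime('%m')
  local day="${rest#*-}"     # zero-padded day, like strftime('%d')
  printf 'year=%s/month=%s/day=%s' "$year" "$month" "$day"
}

PARTITION="$(partition_path "$EXECUTION_DATE")"
SRC="gs://MY-BUCKET/MY_PREFIX/${PARTITION}"
DST="s3://MY-BUCKET/MY_PREFIX/${PARTITION}"

echo "$SRC"
echo "$DST"
```

The printed paths can then be passed to the same `gsutil -m rsync -r` command from an interactive shell, which helps separate Composer-specific behaviour from gsutil itself.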

If I remove the -m option, the Composer task fails (an issue I should probably raise with the Composer team), but a few files do upload first. If I keep -m, I get:

[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO - Traceback (most recent call last):
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -   File "/usr/lib/google-cloud-sdk/platform/bundledpythonunix/lib/python3.9/threading.py", line 980, in _bootstrap_inner
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -     self.run()
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -   File "/usr/lib/google-cloud-sdk/platform/bundledpythonunix/lib/python3.9/threading.py", line 917, in run
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -     self._target(*self._args, **self._kwargs)
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -   File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/daisy_chain_wrapper.py", line 189, in PerformDownload
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -     self.gsutil_api.GetObjectMedia(
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -   File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/cloud_api_delegator.py", line 352, in GetObjectMedia
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -     return self._GetApi(provider).GetObjectMedia(
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -   File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/gcs_json_api.py", line 1244, in GetObjectMedia
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -     return self._PerformDownload(bucket_name,
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -   File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/gcs_json_api.py", line 1383, in _PerformDownload
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -     apitools_download.GetRange(additional_headers=additional_headers,
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -   File "/usr/lib/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/py/transfer.py", line 485, in GetRange
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -     response = self.__GetChunk(progress, end_byte,
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -   File "/usr/lib/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/py/transfer.py", line 418, in __GetChunk
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -     return http_wrapper.MakeRequest(
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -   File "/usr/lib/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/py/http_wrapper.py", line 359, in MakeRequest
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -     retry_func(ExceptionRetryArgs(http, http_request, e, retry,
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -   File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/utils/retry_util.py", line 84, in RetriesInDataTransferHandler
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -     http_wrapper.RethrowExceptionHandler(retry_args)
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -   File "/usr/lib/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/py/http_wrapper.py", line 348, in MakeRequest
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -     return _MakeRequestNoRetry(
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -   File "/usr/lib/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/py/http_wrapper.py", line 397, in _MakeRequestNoRetry
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -     info, content = http.request(
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -   File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/gcs_json_media.py", line 544, in NewRequest
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -     return request_orig(uri, method=method, body=body,
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -   File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/vendored/oauth2client/oauth2client/transport.py", line 173, in new_request
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -     resp, content = request(orig_request_method, uri, method, body,
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -   File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/vendored/oauth2client/oauth2client/transport.py", line 280, in request
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -     return http_callable(uri, method=method, body=body, headers=headers,
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -   File "/usr/lib/google-cloud-sdk/platform/gsutil/third_party/httplib2/python3/httplib2/__init__.py", line 1701, in request
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -     (response, content) = self._request(
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -   File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/gcs_json_media.py", line 452, in OverrideRequest
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -     (response, content) = self._conn_request(conn, request_uri, method, body,
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -   File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/gcs_json_media.py", line 685, in _conn_request
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -     new_data = http_stream.read(TRANSFER_BUFFER_SIZE)
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -   File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/gcs_json_media.py", line 403, in read
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -     data = orig_read_func(amt)
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -   File "/usr/lib/google-cloud-sdk/platform/bundledpythonunix/lib/python3.9/http/client.py", line 463, in read
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -     n = self.readinto(b)
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -   File "/usr/lib/google-cloud-sdk/platform/bundledpythonunix/lib/python3.9/http/client.py", line 507, in readinto
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -     n = self.fp.readinto(b)
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -   File "/usr/lib/google-cloud-sdk/platform/bundledpythonunix/lib/python3.9/socket.py", line 704, in readinto
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -     return self._sock.recv_into(b)
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -   File "/usr/lib/google-cloud-sdk/platform/bundledpythonunix/lib/python3.9/ssl.py", line 1242, in recv_into
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -     return self.read(nbytes, buffer)
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -   File "/usr/lib/google-cloud-sdk/platform/bundledpythonunix/lib/python3.9/ssl.py", line 1100, in read
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO -     return self._sslobj.read(len, buffer)
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO - socket.timeout: The read operation timed out
[2023-08-31, 00:01:21 UTC] {subprocess.py:93} INFO - The read operation timed out

I'm not sure whether the problem is on the AWS end, but any help would be appreciated.
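
For what it's worth, the traceback runs through `daisy_chain_wrapper.py` (`PerformDownload`) and `gcs_json_api.py` (`GetObjectMedia`), which suggests the read timed out on the GCS download side of gsutil's cloud-to-cloud (daisy-chain) copy rather than on the S3 upload side. Since the failure only shows up with `-m`, one thing that may be worth trying is lowering the parallelism and rerunning `rsync` on failure: `rsync` only copies objects that look missing or changed at the destination, so a rerun should continue roughly where the previous attempt stopped. This is a sketch of that approach, not a confirmed fix. `parallel_process_count` and `parallel_thread_count` are the `[GSUtil]` boto settings that control `-m`; the `retry` and `sync_partition` helpers and the specific counts are hypothetical choices for illustration.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: run "$@" up to max_attempts times, doubling the
# sleep between attempts. Returns 0 on success, else the last exit status.
retry() {
  local max_attempts="$1" delay="$2" attempt=1 status=0
  shift 2
  while :; do
    if "$@"; then
      return 0
    else
      status=$?   # exit status of the failed command
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "retry: giving up after $attempt attempts (exit $status)" >&2
      return "$status"
    fi
    echo "retry: attempt $attempt failed (exit $status), sleeping ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
    delay=$((delay * 2))
  done
}

# Hypothetical wrapper around the original command. The process/thread
# counts are illustrative assumptions meant to be lower than what -m uses
# by default; tune them for the Composer worker.
sync_partition() {
  gsutil \
    -o "GSUtil:max_upload_compression_buffer_size=8G" \
    -o "GSUtil:parallel_process_count=1" \
    -o "GSUtil:parallel_thread_count=4" \
    -m rsync -r "$1" "$2"
}

# Usage, with SRC/DST set to the gs:// and s3:// paths from the script:
#   retry 3 30 sync_partition "$SRC" "$DST"
```

Retrying the whole `rsync` keeps the logic outside gsutil, so it works the same whether the timeout turns out to be on the GCS or the AWS side.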
