Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wikihow is not retrying to fetch illustrations #164

Open
benoit74 opened this issue May 17, 2024 · 0 comments
Open

wikihow is not retrying to fetch illustrations #164

benoit74 opened this issue May 17, 2024 · 0 comments
Labels
bug Something isn't working
Milestone

Comments

@benoit74
Copy link
Collaborator

Task : https://farm.openzim.org/pipeline/dafaba4f-3b4c-4fa2-a541-e031be49402e/debug (wikihow_fr_maxi)

Logs:

[MainThread::2024-05-06 15:02:06,711] DEBUG:Adding illustrations
[MainThread::2024-05-06 15:17:50,755] ERROR:Interrupting process due to error: ('Connection aborted.', TimeoutError(110, 'Connection timed out'))
[MainThread::2024-05-06 15:17:50,755] ERROR:('Connection aborted.', TimeoutError(110, 'Connection timed out'))
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 715, in urlopen
    httplib_response = self._make_request(
  File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 404, in _make_request
    self._validate_conn(conn)
  File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 1058, in _validate_conn
    conn.connect()
  File "/usr/local/lib/python3.8/site-packages/urllib3/connection.py", line 419, in connect
    self.sock = ssl_wrap_socket(
  File "/usr/local/lib/python3.8/site-packages/urllib3/util/ssl_.py", line 449, in ssl_wrap_socket
    ssl_sock = _ssl_wrap_socket_impl(
  File "/usr/local/lib/python3.8/site-packages/urllib3/util/ssl_.py", line 493, in _ssl_wrap_socket_impl
    return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
  File "/usr/local/lib/python3.8/ssl.py", line 500, in wrap_socket
    return self.sslsocket_class._create(
  File "/usr/local/lib/python3.8/ssl.py", line 1073, in _create
    self.do_handshake()
  File "/usr/local/lib/python3.8/ssl.py", line 1342, in do_handshake
    self._sslobj.do_handshake()
TimeoutError: [Errno 110] Connection timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/requests/adapters.py", line 486, in send
    resp = conn.urlopen(
  File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 799, in urlopen
    retries = retries.increment(
  File "/usr/local/lib/python3.8/site-packages/urllib3/util/retry.py", line 550, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/usr/local/lib/python3.8/site-packages/urllib3/packages/six.py", line 769, in reraise
    raise value.with_traceback(tb)
  File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 715, in urlopen
    httplib_response = self._make_request(
  File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 404, in _make_request
    self._validate_conn(conn)
  File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 1058, in _validate_conn
    conn.connect()
  File "/usr/local/lib/python3.8/site-packages/urllib3/connection.py", line 419, in connect
    self.sock = ssl_wrap_socket(
  File "/usr/local/lib/python3.8/site-packages/urllib3/util/ssl_.py", line 449, in ssl_wrap_socket
    ssl_sock = _ssl_wrap_socket_impl(
  File "/usr/local/lib/python3.8/site-packages/urllib3/util/ssl_.py", line 493, in _ssl_wrap_socket_impl
    return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
  File "/usr/local/lib/python3.8/ssl.py", line 500, in wrap_socket
    return self.sslsocket_class._create(
  File "/usr/local/lib/python3.8/ssl.py", line 1073, in _create
    self.do_handshake()
  File "/usr/local/lib/python3.8/ssl.py", line 1342, in do_handshake
    self._sslobj.do_handshake()
urllib3.exceptions.ProtocolError: ('Connection aborted.', TimeoutError(110, 'Connection timed out'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/wikihow2zim-1.2.3-py3.8.egg/wikihow2zim/scraper.py", line 948, in run
    self.add_illustrations()
  File "/usr/local/lib/python3.8/site-packages/wikihow2zim-1.2.3-py3.8.egg/wikihow2zim/scraper.py", line 170, in add_illustrations
    handle_user_provided_file(source=self.conf.icon, dest=src_illus_fpath)
  File "/usr/local/lib/python3.8/site-packages/zimscraperlib/inputs.py", line 40, in handle_user_provided_file
    stream_file(url=str(source), fpath=dest)
  File "/usr/local/lib/python3.8/site-packages/zimscraperlib/download.py", line 197, in stream_file
    resp = session.get(
  File "/usr/local/lib/python3.8/site-packages/requests/sessions.py", line 602, in get
    return self.request("GET", url, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.8/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/requests/adapters.py", line 501, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', TimeoutError(110, 'Connection timed out'))

Here it clearly looks like the illustration is fetched from an online source (i.e. it is not specified by the user in the recipe configuration) and the fetch through handle_user_provided_file fails. There is no retry mechanism so the thing just stops the scrape.

@benoit74 benoit74 added the bug Something isn't working label May 17, 2024
@benoit74 benoit74 added this to the 1.3.0 milestone May 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant