
Fatal crashes - "[Errno 32] Broken pipe" #431

Closed
motin opened this issue Jul 30, 2019 · 5 comments

@motin
Contributor

motin commented Jul 30, 2019

In automation/SocketInterface.py, in send() at line 168:

Traceback (most recent call last):
  File "crawler.py", line 102, in <module>
    manager.execute_command_sequence(command_sequence)
  File "/opt/OpenWPM/automation/TaskManager.py", line 541, in execute_command_sequence
    self._distribute_command(command_sequence, index)
  File "/opt/OpenWPM/automation/TaskManager.py", line 341, in _distribute_command
    thread = self._start_thread(browser, command_seq)
  File "/opt/OpenWPM/automation/TaskManager.py", line 408, in _start_thread
    "site_url": command_sequence.url
  File "/opt/OpenWPM/automation/SocketInterface.py", line 168, in send
    sent = self.sock.send(msg[totalsent:])
socket.error: [Errno 32] Broken pipe

This currently prevents larger crawls from completing.

If this issue can't be resolved, it'd be great if the exception could be caught gracefully and the site visit retried, so that the crawl can continue.
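As a rough illustration of the "catch and retry" idea, here is a hedged sketch of a whole-message send loop that reconnects and resends on a broken pipe. The helper name `send_with_retry` and the `make_sock` factory are hypothetical, not part of OpenWPM's SocketInterface; this only works if the receiving process can actually come back, which (as discussed below) is not the case when the aggregator itself has crashed.

```python
import socket


def send_with_retry(make_sock, msg, retries=3):
    """Hypothetical helper (not OpenWPM API): resend the whole message
    on a fresh connection when the peer end of the socket has died.
    `make_sock` is assumed to be a factory that (re)connects and
    returns a connected socket."""
    last_err = None
    for attempt in range(retries):
        sock = make_sock()
        try:
            totalsent = 0
            while totalsent < len(msg):
                # This send() is the call that raises
                # [Errno 32] Broken pipe when the receiver is gone.
                sent = sock.send(msg[totalsent:])
                if sent == 0:
                    raise BrokenPipeError("socket connection broken")
                totalsent += sent
            return totalsent
        except (BrokenPipeError, ConnectionResetError) as e:
            last_err = e  # retry with a fresh connection
        finally:
            sock.close()
    raise last_err
```

A caller would wrap its connect logic in the factory, e.g. `send_with_retry(connect_to_aggregator, payload)`, so each retry gets a new socket rather than reusing the dead one.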

@motin
Contributor Author

motin commented Jul 30, 2019

Maybe revisiting WebSockets (as partially implemented in #221) is a good idea?

@motin
Contributor Author

motin commented Jul 31, 2019

A crawl with backoffLimit 6 failed after 20 of these exceptions, and got through about 3k sites.
Another crawl with backoffLimit 100 failed after 125 of these exceptions, and got through about 8k sites.

Based on this very limited data, these socket-induced crashes occur on average once every 64-150 sites crawled, and crash the container about 30-80% of the time.

@motin
Contributor Author

motin commented Jul 31, 2019

Relates to #255

@englehardt
Collaborator

This is almost always thrown when another process that the host process is communicating with has crashed, leaving the other end of the socket dead. Thus, catching this exception and recovering from the crash will be difficult or impossible. For example, the stack trace you included is from https://github.com/mozilla/OpenWPM/blob/9f0442241909fdac273eeb444bf06c0f2beb781c/automation/TaskManager.py#L404-L409. This likely means that the data aggregator process has crashed, which is not going to be easy to recover from. In this particular case, the right fix is in #438.
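To illustrate the failure mode described above, here is a minimal, self-contained reproduction using a plain `socket.socketpair` (an assumption for demonstration; OpenWPM's SocketInterface uses its own connection setup). Once the process holding the other end of a stream socket is gone, writes fail with EPIPE or ECONNRESET, and nothing the sender does can revive that connection; the real fix is keeping the peer process alive.

```python
import socket

# Minimal reproduction: a dead peer turns every subsequent write
# into [Errno 32] Broken pipe (or ECONNRESET on some platforms).
a, b = socket.socketpair()
b.close()  # stand-in for the aggregator process crashing

try:
    a.send(b"ping")  # may already fail...
    a.send(b"ping")  # ...and a second write is certain to
except (BrokenPipeError, ConnectionResetError) as e:
    print("send failed with errno", e.errno)
finally:
    a.close()
```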

@englehardt englehardt added the bug label Aug 6, 2019
@englehardt
Collaborator

Closing this since the cause for this particular set of errors was fixed in #438. I don't think there's anything more to do here, for the reasons stated in my comment above.
