Fail on missing resources #160

rgaudin · 2020-10-26T10:49:32Z

As seen in #159, there are cases where we failed to download resources yet the scraper still reported success. We should fail on missing resources.

satyamtg · 2020-10-26T13:39:28Z

@rgaudin should we fail on all types of resources (which might be a bit tricky to implement, as we have some invalid URLs too), or should we fail only on major resources like videos?

rgaudin · 2020-10-26T13:40:46Z

Can you describe the failures we currently get? Which URLs, and for what reasons?

satyamtg · 2020-10-27T12:09:28Z

Yep. Some of them happen during the subtitle download for some videos, which fails with a 404 because of invalid links in the HTML (these can be quite random). We currently track whether each download was successful and rewrite the links only for successful downloads.

One solution would be to handle this explicitly for different xblocks and types of assets. A better solution would be to fail when we get errors and have exhausted all retry attempts. But then we need to make sure that the URL actually exists, so that a random invalid URL does not fail the whole scraper.

Moreover, for some links, the content might not be available. An example would be video 8 on https://mooc.phzh.ch/courses/course-v1:PHZH+W-IB+2019_E/9a122b295d484793bbf1a33ab0217a69/ , which has been removed from YouTube, and hence youtube_dl would throw an error.
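To make the "fail once retries are exhausted" idea concrete, here is a minimal sketch. It is not the scraper's actual code: `download_with_retries`, `ResourceDownloadError`, the `fetch` callable and the `required` flag are all hypothetical names, and the `required` flag is only one assumed way to let known-flaky assets such as subtitles be skipped instead of failing the run.

```python
import time
from typing import Callable, Optional


class ResourceDownloadError(Exception):
    """Raised when a required resource is still missing after all retries."""


def download_with_retries(
    fetch: Callable[[str], bytes],  # hypothetical: any function that downloads a URL
    url: str,
    max_attempts: int = 3,
    delay: float = 0.0,
    required: bool = True,
) -> Optional[bytes]:
    """Try to fetch `url` up to `max_attempts` times.

    Returns the content on success. If every attempt fails, raises
    ResourceDownloadError for required resources and returns None otherwise.
    """
    last_error: Optional[Exception] = None
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except Exception as exc:  # broad on purpose in this sketch
            last_error = exc
            if attempt < max_attempts:
                time.sleep(delay)
    if required:
        raise ResourceDownloadError(
            f"{url} failed after {max_attempts} attempts"
        ) from last_error
    return None
```

With something like this, a removed YouTube video or a broken subtitle link could be marked `required=False`, and the caller would skip the link rewrite when `None` comes back.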

stale · 2020-12-26T12:19:50Z

This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.

benoit74 · 2023-07-10T08:18:40Z

Would it make sense to allow some failed resources, like I did on iFixit? I mean we should probably not fail on the first missing resource, but an absolute and/or a relative threshold might make sense, e.g. if more than 10% of resources are missing, it means that we have a significant bug which should fail the scraper run. Does that make sense?
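A rough sketch of what such a threshold check could look like is below. The `DownloadStats` class, its method names and the default values are assumptions for illustration only; this is not what iFixit or this scraper implements.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DownloadStats:
    """Hypothetical counter of attempted and failed resource downloads."""

    total: int = 0
    failed: int = 0

    def record(self, success: bool) -> None:
        """Count one download attempt and whether it failed."""
        self.total += 1
        if not success:
            self.failed += 1

    def too_many_failures(
        self,
        max_failed: Optional[int] = None,  # absolute threshold, disabled if None
        max_ratio: Optional[float] = 0.10,  # relative threshold, disabled if None
    ) -> bool:
        """Return True when either threshold is strictly exceeded."""
        if max_failed is not None and self.failed > max_failed:
            return True
        if max_ratio is not None and self.total > 0:
            return self.failed / self.total > max_ratio
        return False
```

The scraper could call `record()` after each download and check `too_many_failures()` at the end of the run (or periodically), exiting with a non-zero status when it returns `True`.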

rgaudin added the bug label Oct 26, 2020

rgaudin assigned satyamtg Oct 26, 2020

satyamtg added a commit that referenced this issue Nov 1, 2020

Fixes #160 - Fail properly on downloads

8dbd23e

satyamtg mentioned this issue Nov 1, 2020

Fixes #160 - Fail properly on downloads #164

Closed

kelson42 pinned this issue Nov 27, 2020

stale bot added the stale label Dec 26, 2020

stale bot removed the stale label Jul 10, 2023

benoit74 mentioned this issue Jul 10, 2023

Failure to download html + problem xblocks #175

Open

benoit74 assigned benoit74 and unassigned satyamtg Jul 11, 2023

benoit74 mentioned this issue Jul 11, 2023

Fail scrapper when there are too many errors while retrieving xblocks #178

Merged

benoit74 added this to the v1.1.0 milestone Jul 13, 2023

benoit74 closed this as completed in #178 Jul 14, 2023

kelson42 unpinned this issue Jan 12, 2024
