Introduce multithreading #63

Open
satyamtg opened this issue Jul 15, 2020 · 2 comments

Comments

@satyamtg
Contributor

We can easily support multithreading here by running the download method of the xblock_extractor objects in multiple threads. However, videos fetched through youtube_dl need to go in a separate queue, because those downloads are throttled. I think we need to handle that carefully, as multithreading drastically improves the performance of this scraper. Maybe we can have a main multithreaded process for the many HTTP requests and handle YouTube separately.
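A rough sketch of that split, not the scraper's actual code: regular downloads go to a multi-worker thread pool, while video downloads go to a single-worker pool that acts as a serial queue. `FakeExtractor`, its `is_video` flag, and `download_all` are hypothetical names used for illustration only; the real xblock_extractor objects and the youtube_dl call are assumed to sit behind `download()`.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class FakeExtractor:
    """Hypothetical stand-in for an xblock_extractor object."""

    def __init__(self, name, is_video=False):
        self.name = name
        self.is_video = is_video  # assumption: we can tell video blocks apart
        self.thread_name = None

    def download(self):
        # A real extractor would issue HTTP requests or call youtube_dl here.
        time.sleep(0.01)
        self.thread_name = threading.current_thread().name
        return self.name


def download_all(extractors, max_workers=8):
    """Run regular downloads in a thread pool and videos in a one-thread queue."""
    with ThreadPoolExecutor(max_workers=max_workers, thread_name_prefix="http") as http_pool, \
            ThreadPoolExecutor(max_workers=1, thread_name_prefix="video") as video_pool:
        futures = [
            (video_pool if ext.is_video else http_pool).submit(ext.download)
            for ext in extractors
        ]
        # Collect results in submission order; exceptions re-raise here.
        return [future.result() for future in futures]
```

With a single worker, the video pool processes one download at a time, so the throttled source never sees concurrent requests, while the HTTP pool keeps running in parallel.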

@rgaudin
Member

rgaudin commented Jul 15, 2020

Agreed. Thanks for your experiments with multiprocessing.

This is very similar to other scrapers in that we have several kinds of concurrent work:

  • long cpu-intensive stuff we don't want to supervise (ffmpeg)
  • cpu-intensive stuff we want to supervise (image optimization)
  • unthrottled downloads
  • throttled downloads
  • unthrottled uploads

That is a lot of requirements, and it calls for flexibility. Also, we definitely want to assess our S3 performance before getting into this, as we need to know where the bottlenecks are and which methods deliver best for those download/upload use cases.
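One possible way to start that assessment is to time the same workload at several concurrency levels. The sketch below is only an illustration: `fake_transfer` is a hypothetical placeholder that sleeps instead of talking to S3 or any real server, and `time_with_workers` and `compare_worker_counts` are made-up helper names.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_transfer(item):
    """Hypothetical I/O-bound task standing in for one download or upload."""
    time.sleep(0.02)
    return item


def time_with_workers(task, items, max_workers):
    """Return the wall-clock seconds needed to run task over all items."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        list(pool.map(task, items))  # consume the iterator to wait for all tasks
    return time.perf_counter() - start


def compare_worker_counts(task, items, worker_counts=(1, 4, 16)):
    """Measure the same workload at several concurrency levels."""
    return {n: time_with_workers(task, items, n) for n in worker_counts}
```

Swapping `fake_transfer` for a real download or upload function would show where adding workers stops helping, which hints at whether the bottleneck is the network, the remote end, or the local machine.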

All of this makes the problem quite complex, which is why I think we should first attempt to solve it on a less fragile scraper (youtube?) and then document and replicate the solution on the others.

@stale

stale bot commented Sep 13, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.

@stale stale bot added the stale label Sep 13, 2020