Add support for grabbing all videos, no matter the language #171
@benoit74 I have an idea of how we can approach this issue. In entrypoint.py we can add a parser argument that scrapes videos in all languages (shown in screenshot), and in the error handling we make sure that the user doesn't pass the languages and all-languages arguments at the same time. Then in scraper.py we would need to initialize all_languages in the class and make sure we convert all language queries into TED language codes. Please let me know if I'm on the right track here.
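A minimal sketch of the separate-flag idea from the comment above, using the standard library's `argparse` mutually exclusive groups so the parser itself rejects passing both options, with no custom error handling needed. The option names `--languages` and `--all-languages` are assumptions for illustration, not the scraper's actual arguments.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical option names; the scraper's real arguments may differ.
    parser = argparse.ArgumentParser()
    # argparse rejects any command line that uses more than one option of this group.
    group = parser.add_mutually_exclusive_group()
    group.add_argument(
        "--languages",
        help="comma-separated list of languages to scrape",
    )
    group.add_argument(
        "--all-languages",
        action="store_true",
        help="scrape videos in every available language",
    )
    return parser
```

With this, `build_parser().parse_args(["--languages", "eng", "--all-languages"])` ends with an argparse usage error (exit status 2) before any scraping starts.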
I don't think we need to add a new parser argument. We can probably just make the language argument optional: if it is not set, it means we do not want a specific language but all available videos (in the selected topic(s) or playlist(s)). Then, when the language argument is not set, we have to adapt the queries that use this argument (or the derived source_languages attribute, or any other attribute) so they no longer filter by language. Adding all language codes is too cumbersome and risky (what if a new language appears and we do not support it yet in our list of TED codes?). It is important to adapt both run modes: by playlist and by topic. We also need to ensure that TED multi (where we create one ZIM per playlist or topic) is adapted as well. Is that clearer? WDYT?
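A minimal sketch of the optional-language approach described above: `--languages` defaults to `None`, and a single helper adds a language filter to a query only when languages were actually given, so the topic and playlist modes can share it. The option name, the `language` query key, and the helper names `parse_languages` and `with_language_filter` are hypothetical; TED's real query parameters and the scraper's attributes (such as `source_languages`) may differ.

```python
from __future__ import annotations

import argparse
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # Hypothetical option: when omitted, None means "do not filter by language".
    parser.add_argument("--languages", default=None)
    return parser


def parse_languages(raw: Optional[str]) -> Optional[list[str]]:
    """Split a comma-separated value; None (or no usable codes) means no filter."""
    if raw is None:
        return None
    codes = [code.strip() for code in raw.split(",") if code.strip()]
    return codes or None


def with_language_filter(
    params: dict[str, str], languages: Optional[list[str]]
) -> dict[str, str]:
    """Return a copy of params, adding a language filter only when one is requested."""
    filtered = dict(params)
    if languages is not None:
        # Hypothetical query key; check the actual TED request before relying on it.
        filtered["language"] = ",".join(languages)
    return filtered
```

Because both run modes go through the same helper, a topic query built with `languages=None` stays unfiltered, while a playlist query with `["en", "fr"]` gains the filter. For TED multi, the same `None` value would simply be passed through to each per-playlist or per-topic run, rather than expanding it into a hard-coded list of TED codes.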
@benoit74, I have read your reply, and here's what I understand:
I would prefer to base the decision on …
@benoit74, I have been digging through the code and have made some fixes that should address the issue, but I need a little clarification. As there is no flag for a dry run, I had to download the output of videos_with_lang.json. Here is where I need clarification: looking at the output, there is one video link irrespective of whether the … Also, I would like to propose disabling the …
It looks normal, yes:
What is the issue if we do not do this? I don't see the problem.
Okay, I think I misunderstood it a little. I'm still wrapping my head around all the options.
Do not hesitate to keep asking questions or to speak up if what I'm saying makes no sense: you have the code in front of you, while I only have my memory of it.
Okay. Thanks for your assistance.
The scraper should support the case where a user wants "all" languages.
For now, this is not possible: the user has to pass the precise list of languages needed.