Restore support for Youtube-hosted videos (videos not hosted on TED CDN) #164

Closed
benoit74 opened this issue Mar 4, 2024 · 8 comments · Fixed by #168

@benoit74
Collaborator

benoit74 commented Mar 4, 2024

See openzim/zim-requests#849 for details

@benoit74 benoit74 added the bug label Mar 4, 2024
@benoit74 benoit74 self-assigned this Mar 4, 2024
@benoit74 benoit74 added this to the 2.1.1 milestone Mar 8, 2024
@benoit74
Collaborator Author

https://www.ted.com/talks/william_sieghart_the_connective_potential_of_poetry has a null h264 property in playerData.resources.

In that case the scraper fails, while the player on the web page falls back to a YouTube player, as can be seen in the screenshot below:

[Screenshot: TED talk page falling back to a YouTube player]

Other videos hosted on TED CDN have a different player:

[Screenshot: player shown for a video hosted on TED CDN]

This issue is therefore really about supporting YouTube-hosted videos when the TED-hosted one is not available. This impacts many topics, where some videos are missing because they are hosted only on YouTube. I'm currently running an evaluation on the big science topic.
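
As a rough illustration of the fallback described above, here is a minimal sketch in Python. Only `resources` and `h264` come from the page data mentioned in this issue. The shape of the `h264` entries (a list of objects with a `file` URL), the `external` key with its `service` and `code` fields, and the function name `pick_video_source` are assumptions made for the example; this is not the scraper's actual code.

```python
from typing import Any, Dict, Optional


def pick_video_source(player_data: Dict[str, Any]) -> Optional[Dict[str, str]]:
    """Return where the video could be fetched from, or None if nothing is usable."""
    resources = player_data.get("resources") or {}
    h264 = resources.get("h264")

    # TED CDN case. ASSUMPTION: h264 is a list of {"file": <url>, ...} entries.
    if h264 and h264[0].get("file"):
        return {"host": "ted", "url": h264[0]["file"]}

    # Case from this issue: h264 is null and the video lives on YouTube.
    # ASSUMPTION: the "external" key and its "service"/"code" fields are hypothetical.
    external = player_data.get("external") or {}
    service = str(external.get("service", "")).lower()
    code = external.get("code")
    if service == "youtube" and code:
        return {"host": "youtube", "url": f"https://www.youtube.com/watch?v={code}"}

    # Neither a TED-hosted file nor a known external host was found.
    return None
```

With a check of this kind, a talk whose `h264` property is null would be routed to the YouTube download path instead of being ignored.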

@benoit74
Collaborator Author

Btw, the Firefox vs Chrome/Brave situation is just a side effect we should not care about; it is mainly a TED problem in fact ^^

@benoit74 benoit74 changed the title from "Some videos are not found while they could be opened in Brave/Chrome" to "Add support for Youtube-hosted videos (videos not hosted on TED CDN)" Mar 11, 2024
@benoit74
Collaborator Author

In the science topic, 55 videos have been ignored. It looks like all of them could have been downloaded. The total number of videos in this topic is 1489, so this represents about 3.7%.

@rgaudin
Member

rgaudin commented Mar 12, 2024

Good to know!

@benoit74
Collaborator Author

I should have written "restore support for ...". It was working in the past, and most of the code is still there; we just no longer parse the video JSON data properly.

@benoit74 benoit74 changed the title from "Add support for Youtube-hosted videos (videos not hosted on TED CDN)" to "Restore support for Youtube-hosted videos (videos not hosted on TED CDN)" Mar 14, 2024
@rgaudin
Member

rgaudin commented Mar 14, 2024

Good news then (I think)

@benoit74
Collaborator Author

Very good, yes. I just had to add a bit more stuff to align everything a bit better, as you will see in the PR, but at least it was fast and easy.

@benoit74
Collaborator Author

Thank you for remembering so well how the scraper worked in the past, it definitely helped (together with the usual git bisect, I love this tool ^^)

@benoit74 benoit74 modified the milestones: 2.1.1, 3.0.0 Mar 19, 2024