Fix For SexbabesVR Scraper #1847
The scene ID on the scene webpage now seems to be 614 for all scenes, which causes all scenes to be rescraped and new scenes never to be added. This change pulls the poster URL, which appears to have a unique identifier in the second-to-last directory. I also updated the cover URL to pull the image used for the thumbnail on the index page, because the latest scene has an SBS image for the cover whereas the thumbnail contains a more useful image. All appears functional.
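For illustration, the ID extraction described above could look roughly like the sketch below. It is not the code from this PR. The function name `posterDirID` and the example URL shape are hypothetical, and it assumes "second-to-last directory" means the second-to-last path segment once the file name is dropped.

```go
package main

import (
	"errors"
	"net/url"
	"strings"
)

// posterDirID returns the second-to-last directory of a poster URL.
// Assumption: the path looks like /<...>/<unique-id>/<dir>/<file>,
// e.g. /media/1234/poster/cover.jpg. The real URL layout may differ.
func posterDirID(posterURL string) (string, error) {
	u, err := url.Parse(posterURL)
	if err != nil {
		return "", err
	}
	// Split the path into non-empty segments; the last one is the file name.
	segs := strings.FieldsFunc(u.Path, func(r rune) bool { return r == '/' })
	if len(segs) < 3 {
		return "", errors.New("poster URL has fewer than two directories")
	}
	dirs := segs[:len(segs)-1] // drop the file name
	return dirs[len(dirs)-2], nil
}
```

With the hypothetical URL `https://example.com/media/1234/poster/cover.jpg`, this returns `1234`. A URL with fewer than two directories yields an error rather than a wrong ID.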
da5a2df to 28e93ab
The site posts the synopsis in three different ways depending on the age of the scene. A random sampling across all scenes shows that the synopsis is being scraped successfully.
It ran once, but I am unsure how to properly test it.
The scraper update looks fine, but I think we should get rid of the migration. It takes forever without actually changing the scene IDs, and in my case it even led to a duplicate for some reason.
According to KLH, there are some scenes that change scene IDs starting around 610. Tomorrow I will add logic to check only scenes starting at 600 and to skip the update when they match what is already present. This should greatly cut down on the migration time, because newer scenes do get a little wonky.
Added some error handling in case the website is unreachable. Added logic to ensure we only check scenes originating from SexBabesVR. Only scenes starting at 600 are checked, as this is where the reported divergence between the numbering of the scene ID sources occurred, and only scenes whose IDs diverge are updated.
Done
The scene ID on the scene webpage now seems to be 614 for all scenes, which causes all scenes to be rescraped and new scenes never to be added.
This pulls the poster url which appears to have a unique identifier in the last directory.
Also updated the cover URL to pull the image used for the thumbnail on the index page, because the latest scene has an SBS image for the cover whereas the thumbnail contains a more useful image.
All appears functional.
Might require deleting all scraped SexbabesVR scenes, because the shift in scene ID causes file mismatches and scene preview generation hiccups.
Added migration code.