xpath, improvement: add another xpath for image for Hustler #1370

Merged (2 commits) on Jul 27, 2023

nrg101 (Contributor) commented Jun 23, 2023

This adds another xpath selector for the existing Hustler.yaml scraper so that it can get the cover image for scene pages, e.g.

I left the existing selector in case there are different styles/layouts for scene pages where the existing selector will continue to work.
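
A minimal sketch of how two selectors can coexist in one field, assuming the XPath union operator (`|`) that the Details selectors in this PR's diff also use. The first Image selector is the one that appears in the diff; the second is a hypothetical placeholder standing in for the other selector, which is not shown in this conversation. The `sceneScraper` name and the surrounding layout are likewise assumptions, not the merged file.

```yaml
# Sketch only: scraper name and layout are assumed, not copied from Hustler.yaml.
xPathScrapers:
  sceneScraper:
    scene:
      # Union (|) of two selectors: each side targets a different page layout,
      # so a page that matches either one still yields an image URL.
      Image: //div[@class="img-container"]/img/@src|//img[@id="hypothetical-cover"]/@src
```

Keeping both sides of the union means a layout that only the older selector matches is not broken by the change.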

JackDawson94 (Contributor) left a comment

This works for the image as intended.

However, most of the other selectors are not working when scanning with the test links above: Performers, Date, and Details are not parsed.

nrg101 (Contributor, Author) commented Jun 30, 2023

I've tweaked the XPath selectors for Title, Date, Performers, and Details:

| URL | Title | Performers | Date | Details |
| --- | --- | --- | --- | --- |
| https://www.hustler.com/model/sahara-skye/movie/20230629/CAUGHT_MY_BUSTY_NEIGHBOR_MASTURBATING_3 | ✔️ | ✔️ | ✔️ | ✔️ |
| https://www.hustler.com/model/charity-bangs/movie/20151116/black_cock_justice_pt__2 | ✔️ | ✔️ | ✔️ | |
| https://www.hustler.com/model/madelyn-monroe/movie/20151123/black_cock_justice_pt__2 | ✔️ | ✔️ | ✔️ | |
| https://www.hustler.com/model/mona-azar/movie/20230601/STACKED_MILFS | ✔️ | ✔️ | ✔️ | ✔️ |

I think the pages for older scenes that have descriptions must be structured differently, which means Details can't be scraped in the same way.

In any case, the images work and I've now fixed all of the issues for new scenes and all but one (Details) of the issues for older scenes.

JackDawson94 (Contributor) left a comment

Indeed, it looks like the only way to get the Details for these old URLs would be to have a subscraper call the JSON API. That would probably require migrating the scraper to Python or Ruby, so I think this is the best we can do with the current XPath scraper.
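
If the scraper were migrated as suggested, the YAML would hand the scene URL to a script instead of evaluating XPath. A rough sketch, assuming Stash's `script` action, where the script receives a JSON fragment on stdin and prints the scraped scene as JSON on stdout. `Hustler.py` and the URL pattern are hypothetical, and the script itself is not shown because the JSON API endpoint is not given in this conversation.

```yaml
# Sketch only: file name and URL pattern are assumptions.
name: Hustler
sceneByURL:
  - action: script
    url:
      - hustler.com/      # assumed URL pattern
    script:
      - python
      - Hustler.py        # hypothetical script that would call the JSON API
```

The trade-off is that a script scraper can reach data the rendered page does not expose, at the cost of requiring a Python (or Ruby) runtime on the user's machine.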

postProcess:
- parseDate: Jan 02, 2006
Details: //meta[@property="og:description"]/@content|//div[@class="description"]/p
Image: //div[@class="img-container"]/img/@src
Details: //p[following-sibling::a[@class="clickable"]]|//meta[@property="og:description"]/@content|//div[@class="description"]/p
A Collaborator left a comment

Suggested change
- Details: //p[following-sibling::a[@class="clickable"]]|//meta[@property="og:description"]/@content|//div[@class="description"]/p
+ Details: //div[@class="panel-content"]/div/div/text()|//meta[@property="og:description"]/@content|//div[@class="description"]/p

This gets part of the details for older scenes.

nrg101 (Contributor, Author) replied

Hmm, when I try this with the new scenes, it also gets only the first part (the text shown before clicking Read More) rather than the full description. I will see if I can come up with a selector combination that gets the full description for all scenes.

@Maista6969 Maista6969 merged commit a577cc6 into stashapp:master Jul 27, 2023
1 check passed
litcum22 pushed a commit to litcum22/CommunityScrapers that referenced this pull request Aug 1, 2023
xpath, improvement: add another xpath for image for Hustler