Add Xpath post processing and performer name query #333
I have added the ability to use the xpath scraper to perform performer name queries from the "Scrape from..." drop-down.
The […] When scraping a performer by fragment (i.e. from the Scrape Performer dialog), the server will check for […]. To illustrate this, I've extended the Boobpedia scraper config. It should now be a pretty much feature-complete scraper for that site, fulfilling the requirements of #310. Once this PR is merged, I will add this config to the community scrapers repo. See below:
It seems to work OK for me. A note on the Boobpedia scraper example, not the code: on some older entries, e.g. Mia Khalifa or Brandi Love, the EyeColor attribute doesn't have an anchor, so EyeColor is not found. Removing the trailing […] A warning for the Birthdate on Brandi Love […] It seems to me that the entries on Boobpedia are not that standardized, so it is natural that we can't match everything at once.
Force-pushed from 3d21142 to c50f90a.
I've rebased to resolve conflicts. Please retest.
Looks OK to me.
* Extend xpath configuration. Support concatenation
* Add parseDate parsing option
* Add regex replacements
* Add xpath query performer by name
* Fix loading spinner on scrape performer
* Change ReplaceAll to Replace
Note: by necessity, this PR supersedes and extends #332. I can close the other PR, or just rebase this one when it is merged (ideally the latter).
This change adds some post-processing functionality to the xpath scraping configuration.
In an xpath scraper, a field value may now be either a string xpath selector value, or a sub-object.
If it is a sub-object, it must contain the `selector` field, which has the xpath selector value. Within the sub-object, further fields are available to perform post-processing:
* `concat`: if an xpath matches multiple elements, and `concat` is present, then all of the elements will be concatenated together.
* `replace`: contains an array of sub-objects. Each sub-object must have a `regex` and a `with` field. The `regex` field is the regex pattern to replace, and `with` is the string to replace it with. `$` is used to reference capture groups: `$1` is the first capture group, `$2` the second, and so on. Replacements are performed in order of the array.
* `parseDate`: if present, the value is the date format using Go's reference date (2006-01-02). For example, if an example date was `14-Mar-2003`, then the date format would be `02-Jan-2006`. See the time.Parse documentation for details. When present, the scraper will convert the input string into a date, then convert it to the string format used by stash (`YYYY-MM-DD`).
Post-processing is done in order of the fields above: `concat`, then `replace`, then `parseDate`.
Below are two example scrapers that will hopefully illustrate these concepts.
This Boobpedia scraper illustrates the `concat` and `parseDate` operations:
This pornhub performer scraper illustrates `replace` and `parseDate`. I tested it against Mia Malkova's performer page on pornhub, since it had most of the information filled in: