-
Notifications
You must be signed in to change notification settings - Fork 199
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proposal: Add an "excludes" field to page selectors #239
Comments
Hey Kevin, Thanks for the kind words. This seems like a great idea to me, and I think you've identified a very clean implementation plan. I think the one thing left to consider is where and how users will edit these excludes selectors. I think it would be too much/overcomplicated to add this feature to the bookmarklet. It should be a sort of "advanced" feature for power users. So I think you could just get away with adding a field to the Note that you shouldn't need multiple exclude selectors, since you can just use a disjoint selector to represent this case (e.g. Let me know if you want any guidance on navigating rails, and thanks for contributing! |
Part of a solution to themarshallproject#239
Add exclude_selector to relevant views See themarshallproject#239
- Add exclude_selector to Page - Adjust Page.match_text and PageSnapshot.match_text to use exclude_selector - Expose exclude_selector in relevant views - Tests - Documentation Fixes themarshallproject#239
Hi Klaxon team,
First of all, thanks for open sourcing such an incredible tool. I've been experimenting with Klaxon to track changes in 2020 candidates's websites, and it has been super simple to use. Stellar work.
I am running into an issue, though, possibly because of the way I'm trying to use Klaxon. Because I'm interested in almost all changes in a candidate's website, I've been using
body
as my selector. That works great for some sites but picks up a lot of junk for others.The website Tim Ryan For Congress includes a section of his latest Twitter posts. I really don't care about those for the purposes of this (and these changes are filling up my free Heroku database fast). I think I can get around this limitation by using a
:not()
in my selector for this particular site. Luckily all content appears in aheader
orsection
element, and all these elements are siblings of each other, so I should be able to write a simple CSS selector to exclude the Tweets. But if the site were more complicated, I fear I'd run into some nokogiri limitations on its CSS selector logic that don't seem likely to be resolved until nokogiri v2.0.0.I'd like to propose adding some sort of exclude option for pages. I am envisioning the option as a CSS selector (or selectors). Any changes that occur within the excludes selectors would be ignored. In my Tim Ryan example above, my selector could remain
body
but I could exclude#feed-section
.What do you all think of an excludes option?
I am happy to try implementing this feature (although my Ruby skills are roughly nonexistent). I think it could be done fairly surgically by adjusting (
Page.match_text
)[https://github.com/themarshallproject/klaxon/blob/develop/app/models/page.rb#L39-L41] to do the following:.text
method to return the text of everything elsePlease let me know if you have any thoughts or considerations. And again, props to releasing such a great project.
The text was updated successfully, but these errors were encountered: