Allow direct manipulation of TagSpec object #83
Comments
This has come up a few times. Historically, I've pushed back on exposing this type directly since I consider it an implementation detail, and I've already changed the type signature dramatically several times to get things to run faster. However, I would be OK reworking things so that we can expose some lower-level APIs while keeping the real internals hidden and out of the public API. Do you have a proposal for what such an API would look like, or an idea of what sort of operations you are looking for?
Essentially I would like 3 different types: A list/vector of A An I would really like to be able to pattern match on and inspect/print these types; it helps a lot with debugging and intuitiveness. The exact details of this API and what's underneath it are not important, but basically anything that's easy, intuitive, and debuggable with the above API should ideally not have to change too significantly to work.
I think that sounds doable. One area that still needs some consideration is the edge cases around how different kinds of malformed HTML are handled. We'd also need to ensure that we do not regress on performance; a lot of commits went into getting things running fast with the current data structures.

Another option to consider would be exposing a scraper that returns a list of TagSoup Tags. You could then combine this with TagSoup's tagTree to get a tree:

    tags :: StringLike str => Selector -> Scraper str [Tag str]
    tags = foldSpec (\tag s -> tag : s)

    -- tagTree returns a forest, so the result is a list of TagTree values
    tree :: StringLike str => Selector -> Scraper str [TagTree str]
    tree selector = tagTree <$> tags selector

I suspect the first approach would take a nontrivial amount of work, and I don't have much bandwidth myself to work on this right now. However, I'd be happy to take a patch if you want to take it on.
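To make the tree-based option a bit more concrete, here is a minimal sketch of what pattern matching on TagSoup's `TagTree` looks like once you have a list of tags. It uses only TagSoup itself (`parseTags`, `tagTree`, and the `TagBranch`/`TagLeaf` constructors); `branchNames` is a made-up helper for illustration and is not part of scalpel or TagSoup.

```haskell
import Text.HTML.TagSoup (parseTags)
import Text.HTML.TagSoup.Tree (TagTree (..), tagTree)

-- Hypothetical helper: collect the names of all elements in document
-- order by pattern matching directly on the TagTree constructors.
branchNames :: [TagTree String] -> [String]
branchNames = concatMap go
  where
    -- An element: record its name, then descend into its children.
    go (TagBranch name _attrs kids) = name : branchNames kids
    -- A leaf (text, comment, unmatched tag): nothing to record.
    go (TagLeaf _) = []

main :: IO ()
main = print (branchNames (tagTree (parseTags "<div><p>hi</p><span></span></div>")))
```

Because `TagTree` derives `Show`, the intermediate forest can also be printed directly while debugging, which seems close to the inspect/print use case mentioned above.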
Sometimes I would rather work with the node tree (and thus `TagSpec`) itself rather than the `Scraper`/`SerialScraper` interface.

It would be optimal for my use case if `TagSpec` and various functions for manipulating it (`children`, `name`, etc.) were exposed as a low-level API. The current high-level API would then be a layer on top of that and would stay the same as it is now, except perhaps for some extra functions for dropping into the low-level API when desired.

Of course, `TagSpec` itself would have to be an abstract data type with a hidden constructor and fields, rather than a tuple, to keep various invariants from being violated. It would also probably be worth renaming the type to something like `Html` or `Nodes` or similar. Another thing to consider is whether it's worth having explicit types for when you know you have a single node vs. potentially zero or multiple nodes (`Tree`/`Node` vs. `Forest`/`Nodes`/`Html`), so that functions like `name :: Node str -> str` make more sense.
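As a rough illustration of the shape being proposed, here is a small self-contained sketch of the single-node vs. many-nodes split. Everything in it is hypothetical: the representation, the `element` smart constructor, and `toList` are assumptions made for the example, and it says nothing about how `TagSpec` is actually implemented. Text nodes are left out to keep it short.

```haskell
-- Hypothetical sketch: none of these names exist in the library.
-- In a real package this would be its own module whose export list names
-- the types without their constructors, for example:
--   module Html (Node, Nodes, element, name, children, toList) where

-- Exactly one element node: a tag name plus its child elements.
data Node str = Node str [Node str]
  deriving (Eq, Show)

-- Zero or more nodes, e.g. what a selector might match.
newtype Nodes str = Nodes [Node str]
  deriving (Eq, Show)

-- Smart constructor standing in for whatever builds nodes internally.
element :: str -> [Node str] -> Node str
element = Node

-- Total, because a Node is always a single element.
name :: Node str -> str
name (Node n _) = n

-- The children of one node form a (possibly empty) collection.
children :: Node str -> Nodes str
children (Node _ cs) = Nodes cs

-- Escape hatch for inspecting a collection node by node.
toList :: Nodes str -> [Node str]
toList (Nodes ns) = ns

main :: IO ()
main = do
  let doc = element "ul" [element "li" [], element "li" []]
  print (name doc)
  print (map name (toList (children doc)))
```

With the constructors hidden behind an export list, callers could only go through `name`, `children`, and friends, so the internal representation would stay free to change for performance reasons.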