You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
When used not from under chroot, lists is idempotent.
ghci> scrapeStringLike "<foo><baz></baz></foo><bar></bar>" $ lists $ html anySelector
Just ["<foo><baz></baz></foo>","<bar></bar>"]
ghci> scrapeStringLike "<foo><baz></baz></foo><bar></bar>" $ lists $ lists $ html anySelector
Just [["<foo><baz></baz></foo>"],["<bar></bar>"]]
ghci> scrapeStringLike "<foo><baz></baz></foo><bar></bar>" $ lists $ lists $ lists $ html anySelector
Just [[["<foo><baz></baz></foo>"]],[["<bar></bar>"]]]
Not so much when used from under chroot:
ghci> scrapeStringLike "<outer><foo><baz></baz></foo><bar></bar></outer>" $ chroot "outer" $ lists $ html anySelector
Just ["<foo><baz></baz></foo>","<bar></bar>"]
ghci> scrapeStringLike "<outer><foo><baz></baz></foo><bar></bar></outer>" $ chroot "outer" $ lists $ lists $ html anySelector
Just [["<baz></baz>"],[]]
ghci> scrapeStringLike "<outer><foo><baz></baz></foo><bar></bar></outer>" $ chroot "outer" $ lists $ lists $ lists $ html anySelector
Just [[[]],[]]
This inconsistency makes it much harder to reason locally about the written code. IMHO for a good user experience, one of the two behaviors should be selected, and I think the former is better because it's less opinionated. (The second behavior could be obtained by interspersing chroots between lists calls).
I think, this can be fixed by setting ctxInChroot to False in toZipper (defined as a where binding for inSerial.)
The text was updated successfully, but these errors were encountered:
FWIW I don't particularly like the fact that inSerial changes the behavior when under a chroot. I find it unintuitive and it couples the combinators together. I'd prefer to have an explicit combinator that allows to drop inside the children of a node, rather than doing it implicitly in an ad-hoc way depending on whether we are under chroot or not.
Maybe something like:
children :: ScraperT str m a -> ScraperT str m a
root :: Selector -> ScraperT str m a -> ScraperT str m a
chroot = children . root
This change would of course alter the behavior when getting attributes of the chrooted node.
I think the reason for the discrepancy is that inSerial conceptually wraps the HTML in an outer set of tags when not in a chroot so that for HTML like <a>1</a><b>2</b>, inSerial would visit the <a> and <b> sub-trees.
It does seem broken that stacking multiple calls to inSerial wouldn't result in descending further into the tree.
Define
lists
as follows:When used not from under
chroot
,lists
is idempotent.Not so much when used from under
chroot
:This inconsistency makes it much harder to reason locally about the written code. IMHO for a good user experience, one of the two behaviors should be selected, and I think the former is better because it's less opinionated. (The second behavior could be obtained by interspersing
chroot
s betweenlists
calls).I think, this can be fixed by setting
ctxInChroot
toFalse
intoZipper
(defined as awhere
binding forinSerial
.)The text was updated successfully, but these errors were encountered: