Add Aggregator typeclass #5
Comments
Even a provisional 👍 would let the cats issue be closed, hence the /CC to @johnynek as a cats maintainer. |
So there's no agreement right now that the |
Aggregator is parallel. Fold is sequential. Algebird has both. Aggregator is basically: If I could do it again, I would do:

```scala
import cats.Semigroup

trait Aggregator[-I, +O] {
  type A
  def prepare(i: I): A
  def semigroup: Semigroup[A]
  def present(a: A): O
}
```

(hide the middle type in a type member). Since a semigroup is associative, it can be applied in parallel, and therefore so can the Aggregator. Also, CommutativeAggregator is interesting: in map-reduce settings (like Spark or Hadoop), you can do pre-aggregation before you group by the key to finish the combining. That is not the same as Fold. Every Aggregator can be a Fold, but not vice versa. |
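A minimal sketch of why associativity buys parallelism, assuming a dependency-free stand-in `Semigroup` trait (not the cats one) and hypothetical helpers `treeReduce`, `runTree` and `runSequential`. The prepared values are combined either as a balanced tree, the shape a parallel runtime could use, or as a plain left fold, and both give the same answer. The example aggregator computes an average through a hidden `(sum, count)` intermediate type.

```scala
// Dependency-free stand-in for cats' Semigroup (an assumption for this sketch).
trait Semigroup[A] { def combine(x: A, y: A): A }

// The type-member Aggregator shape proposed above.
trait Aggregator[-I, +O] {
  type A
  def prepare(i: I): A
  def semigroup: Semigroup[A]
  def present(a: A): O
}

object TreeAggregation {
  // Hypothetical helper: combine values pairwise as a balanced tree. Because
  // combine is associative, the two halves could be reduced on different workers.
  def treeReduce[A](xs: Vector[A], sg: Semigroup[A]): Option[A] =
    if (xs.isEmpty) None
    else if (xs.size == 1) Some(xs.head)
    else {
      val (left, right) = xs.splitAt(xs.size / 2)
      for {
        l <- treeReduce(left, sg)
        r <- treeReduce(right, sg)
      } yield sg.combine(l, r)
    }

  def runTree[I, O](agg: Aggregator[I, O], in: Vector[I]): Option[O] = {
    val prepared: Vector[agg.A] = in.map(i => agg.prepare(i))
    treeReduce[agg.A](prepared, agg.semigroup).map(a => agg.present(a))
  }

  def runSequential[I, O](agg: Aggregator[I, O], in: Vector[I]): Option[O] = {
    val prepared: Vector[agg.A] = in.map(i => agg.prepare(i))
    prepared.reduceLeftOption[agg.A]((x, y) => agg.semigroup.combine(x, y)).map(a => agg.present(a))
  }

  // Average: the hidden intermediate type is (sum, count).
  val average: Aggregator[Long, Double] = new Aggregator[Long, Double] {
    type A = (Long, Long)
    def prepare(i: Long): (Long, Long) = (i, 1L)
    val semigroup: Semigroup[(Long, Long)] = new Semigroup[(Long, Long)] {
      def combine(x: (Long, Long), y: (Long, Long)): (Long, Long) = (x._1 + y._1, x._2 + y._2)
    }
    def present(a: (Long, Long)): Double = a._1.toDouble / a._2
  }
}
```

For a lawful (associative) semigroup, swapping `runTree` for `runSequential` cannot change the result; a Fold offers no such freedom, because its step function only ever sees the running state plus one new element.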
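Thanks for the explanation, now I understand. I'm wondering if we could generalize this to:

```scala
import cats.Alternative

// F is declared covariant so that F[O] type-checks with the covariant O.
trait Aggregator[F[+_], -I, +O] {
  type A
  val applicative: Alternative[F]
  def prepare(i: I): F[A]
  def present(a: A): F[O]
}
```

I think that |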
I haven't thought about it. There may be some useful examples, but Algebird has MANY Aggregator examples. So many things can be expressed that way (virtually everything anyone uses from SQL, for instance). At Stripe, we got by for a long time with almost all of our feature generation expressed as Aggregators. So I would strongly encourage you to have this type, since so many computations fall into the pattern. For Alternative, you have to be prepared to make a monoid for all |
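To illustrate the point about SQL, here is a hedged sketch of `SELECT COUNT(*), SUM(amount), MAX(amount)` as a single Aggregator. The `Semigroup` stand-in, the `Order` row type and the helpers `of`, `zip` and `run` are hypothetical names for this example, not Algebird's or cats' API; `zip` runs two aggregators side by side in one pass by pairing their intermediate values.

```scala
// Dependency-free stand-ins (assumptions for this sketch, not cats or Algebird).
trait Semigroup[A] { def combine(x: A, y: A): A }

trait Aggregator[-I, +O] {
  type A
  def prepare(i: I): A
  def semigroup: Semigroup[A]
  def present(a: A): O
}

object SqlLike {
  final case class Order(customer: String, amount: Long)

  val longSum: Semigroup[Long] = new Semigroup[Long] { def combine(x: Long, y: Long): Long = x + y }
  val longMax: Semigroup[Long] = new Semigroup[Long] { def combine(x: Long, y: Long): Long = math.max(x, y) }

  // Hypothetical helper: project each row to a Long and combine with `sg`.
  def of[I](f: I => Long, sg: Semigroup[Long]): Aggregator[I, Long] = new Aggregator[I, Long] {
    type A = Long
    def prepare(i: I): Long = f(i)
    val semigroup: Semigroup[Long] = sg
    def present(a: Long): Long = a
  }

  // Hypothetical helper: run two aggregators in one pass by pairing their states.
  def zip[I, O1, O2](a: Aggregator[I, O1], b: Aggregator[I, O2]): Aggregator[I, (O1, O2)] =
    new Aggregator[I, (O1, O2)] {
      type A = (a.A, b.A)
      def prepare(i: I): A = (a.prepare(i), b.prepare(i))
      val semigroup: Semigroup[A] = new Semigroup[A] {
        def combine(x: A, y: A): A =
          (a.semigroup.combine(x._1, y._1), b.semigroup.combine(x._2, y._2))
      }
      def present(x: A): (O1, O2) = (a.present(x._1), b.present(x._2))
    }

  // Sequential runner; None for an empty table.
  def run[I, O](agg: Aggregator[I, O], rows: Seq[I]): Option[O] = {
    val prepared: Seq[agg.A] = rows.map(r => agg.prepare(r))
    prepared.reduceLeftOption[agg.A]((x, y) => agg.semigroup.combine(x, y)).map(a => agg.present(a))
  }

  // SELECT COUNT(*), SUM(amount), MAX(amount) FROM orders
  val countSumMax: Aggregator[Order, ((Long, Long), Long)] =
    zip(zip(of[Order](_ => 1L, longSum), of[Order](_.amount, longSum)), of[Order](_.amount, longMax))
}
```

Because the zipped result is itself an Aggregator, the combined query keeps the parallel-combining property discussed above.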
That was the purpose of my comment about using |
I also agree this type is wonderfully powerful yet deceptively simple. My commercial use case was a distributed big-data database for multi-dimensional time series. The exact dimension was a runtime parameter, so the monoid's So this is why I'm such a fan of the I'm sure a quick flick through some literature/Haskell libs would yield a whole bunch of other additions for folds, but they would bloat cats. So why not have the |
Hmm, perhaps that should have been "deceptively powerful yet wonderfully simple". I'm sure you get my gist 😉 More importantly, /CC @benhutchison, who originally raised the issue; I'd also add that a Reducer typeclass would be another possible addition, perhaps as an alternative to cats' Reducible. I'm certainly not suggesting that origami become a general bucket for fold-like things that don't make it into cats. Far from it. I think there are many justified use cases that are perhaps a bit too specialized or edge-case to fit into a more general FP library, but that fully deserve to exist. I concede that this might extend the original intent from "Monadic folds" to perhaps "Monadic folds and more." And finally, here's another potential client for an |
Just wondering. An

```scala
def aggregator[A, O : Monoid, B](f: A => O, finalize: O => B) = new FoldId[A, B] {
  type S = O
  def start = Monoid[O].empty
  def fold = (s: S, a: A) => Monoid[O].combine(s, f(a))
  def end(s: S) = finalize(s)
}
```

The main difference with the On the question of |
Algebird's fold is here: indeed you can create a Fold from an Aggregator: but again, once you have a fold, you can't work in parallel. By contrast, with an Aggregator, after you do the initial prepare, you can build a tree of aggregation with the semigroup (as you can do any semigroup in parallel). Additionally, if you have a commutative semigroup, you can do "map-side" aggregation, which is a huge performance improvement in Hadoop (using Scalding) or Spark. Lastly, immutable data structures sadly often give you a big performance hit in these applications. Using a semigroup, we added
I realize I am repeating myself somewhat, but what your comment seems to be missing is that in a map-reduce setting (e.g. Spark), you can use Aggregator for parallel aggregation before doing the groupings by key. Indeed, you can always lift a Semigroup into a Monoid, if you don't mind boxing. Since Scala boxes so much anyway, I'm not sure this is a huge deal. If you keep Semigroup, you can push the Option to the point where you apply it, e.g. |
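The two options in the last part of this comment can be sketched as follows, again with dependency-free stand-in `Semigroup`/`Monoid` traits rather than the cats or Algebird types. `optionMonoid` lifts any semigroup into a monoid with `None` as the identity (every value is wrapped in `Some`, which is the boxing cost); `combineAllOption` is a hypothetical helper that keeps the plain semigroup and introduces `Option` only once, when a possibly empty input is reduced.

```scala
// Dependency-free stand-ins (assumptions for this sketch, not cats or Algebird).
trait Semigroup[A] { def combine(x: A, y: A): A }
trait Monoid[A] extends Semigroup[A] { def empty: A }

object Lifting {
  // Lift any Semigroup into a Monoid on Option: None is the identity element.
  // Every value is wrapped in Some, which is the boxing cost.
  def optionMonoid[A](sg: Semigroup[A]): Monoid[Option[A]] = new Monoid[Option[A]] {
    val empty: Option[A] = None
    def combine(x: Option[A], y: Option[A]): Option[A] = (x, y) match {
      case (Some(l), Some(r)) => Some(sg.combine(l, r))
      case (Some(_), None)    => x
      case (None, _)          => y
    }
  }

  // Keep the plain Semigroup and introduce Option only once, for a possibly
  // empty input (hypothetical helper name).
  def combineAllOption[A](sg: Semigroup[A], xs: Iterable[A]): Option[A] =
    xs.reduceLeftOption[A]((x, y) => sg.combine(x, y))

  // Example semigroup: max on Int.
  val maxInt: Semigroup[Int] = new Semigroup[Int] { def combine(x: Int, y: Int): Int = math.max(x, y) }
}
```

Both routes return `None` for empty input; the second only wraps the final result rather than every intermediate value.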
with mutable combines is just what we had. We also used twitterFutureSemigroup for the final aggregation. |
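A sketch of the mutable-combine idea, assuming a stand-in `Semigroup` with an overridable batch method `combineAllOption` (loosely in the spirit of Algebird's `sumOption`; treat the names here as hypothetical). The pairwise `combine` for word-count maps builds a new immutable `Map` at every step, while the batch override accumulates into one `mutable.HashMap` and converts it once at the end; both give the same result.

```scala
import scala.collection.mutable

// Dependency-free stand-in Semigroup with an overridable batch method
// (hypothetical names, not the cats or Algebird API).
trait Semigroup[A] {
  def combine(x: A, y: A): A
  // Default batch combine: a left fold that allocates a new value per step.
  def combineAllOption(xs: Iterable[A]): Option[A] =
    xs.reduceLeftOption[A]((x, y) => combine(x, y))
}

object WordCounts {
  val mapSum: Semigroup[Map[String, Long]] = new Semigroup[Map[String, Long]] {
    // Pairwise combine: builds a new immutable Map at every step.
    def combine(x: Map[String, Long], y: Map[String, Long]): Map[String, Long] =
      y.foldLeft(x) { case (acc, (k, v)) => acc.updated(k, acc.getOrElse(k, 0L) + v) }

    // Batch override: accumulate into one mutable map, convert once at the end.
    override def combineAllOption(xs: Iterable[Map[String, Long]]): Option[Map[String, Long]] =
      if (xs.isEmpty) None
      else {
        val acc = mutable.HashMap.empty[String, Long]
        xs.foreach(_.foreach { case (k, v) => acc.update(k, acc.getOrElse(k, 0L) + v) })
        Some(acc.toMap)
      }
  }
}
```

The mutation is confined to the inside of one call, so callers still see an ordinary associative semigroup.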
@johnynek thanks for your RDD example. I don't remember the Hadoop / Spark specifics, but performance-wise, is it the same to:
rather than
|
Also, as discussed with @BennyHill, I think the main difference between folds in origami and Algebird is that origami folds have no traversal methods at all. Traversal is left to the collections that use the folds, a bit like you do with RDDs. |
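A minimal sketch of that separation, using a simplified `SimpleFold` trait in the start/fold/end shape of the `FoldId` snippet earlier in the thread (an assumption, not origami's actual `Fold` definition). The fold only describes the state transitions; the hypothetical `runList` and `runIterator` functions supply the traversal for two different data sources.

```scala
// Simplified fold in the start/fold/end shape (an assumption, not origami's Fold).
trait SimpleFold[A, B] {
  type S
  def start: S
  def fold: (S, A) => S
  def end(s: S): B
}

object Traversals {
  // The fold has no traversal method of its own; each data source drives it.
  def runList[A, B](f: SimpleFold[A, B], as: List[A]): B =
    f.end(as.foldLeft(f.start)(f.fold))

  def runIterator[A, B](f: SimpleFold[A, B], it: Iterator[A]): B = {
    var s = f.start
    while (it.hasNext) s = f.fold(s, it.next())
    f.end(s)
  }

  // Example fold: mean of Doubles, None for empty input.
  val mean: SimpleFold[Double, Option[Double]] = new SimpleFold[Double, Option[Double]] {
    type S = (Double, Long)
    def start: S = (0.0, 0L)
    val fold: (S, Double) => S = { case ((sum, n), x) => (sum + x, n + 1) }
    def end(s: S): Option[Double] = if (s._2 == 0L) None else Some(s._1 / s._2)
  }
}
```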
WRT the motives for my original proposal, it's funny how limited they were and how divergent from the present discussion.

I was trying to do an efficient append operation over a parametrically typed container (a Vector underneath), and wondered "What type class can I bring in that defines append?" Googling around, the only one I found was Ed Kmett's Reducer [https://hackage.haskell.org/package/reducers-3.12.1/docs/Data-Semigroup-Reducer.html], which defines cons and snoc (i.e. append).

I kind of assumed at the time that there was some foundational maths justifying why snoc was on the Reducer interface. But since then I've come to the opinion that its inclusion in the Haskell typeclass was pretty arbitrary and perhaps a mistake; e.g. not all reducible containers necessarily have two "ends" to append at.

I postulate that a Snoc/Append typeclass might be separately useful, but IMO it should be separated from reduction folds.
(Replying to Eric Torreborre's comment of Thu, Aug 24, 2017 at 1:57 AM.)
|
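A hedged sketch of the separate Snoc/Append type class postulated in the comment above, with no reduction or fold attached. The names `Snoc`, `snoc` and `appendAll` are hypothetical; the `Vector` instance uses `:+`, which the Scala collections documentation lists as effectively constant time for `Vector`.

```scala
// Hypothetical Snoc/Append type class: add one element at the end of a container.
// Deliberately independent of any fold or reduction abstraction.
trait Snoc[F[_]] {
  def snoc[A](fa: F[A], a: A): F[A]
}

object Snoc {
  // Instance in the companion, so it is found without an import.
  implicit val vectorSnoc: Snoc[Vector] = new Snoc[Vector] {
    def snoc[A](fa: Vector[A], a: A): Vector[A] = fa :+ a
  }

  // Generic code that only needs "append", not a full Reducer.
  def appendAll[F[_], A](fa: F[A], as: List[A])(implicit S: Snoc[F]): F[A] =
    as.foldLeft(fa)((acc, a) => S.snoc(acc, a))
}
```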
For the record, I rediscovered the idea of folds when working on specs2, and my first attempt was to use... Reducers (after discussing with Ed Kmett :-)). I think I got the idea of packaging the functionality I needed as monadic folds on my way back from LambdaJam 2014. Then I realised, through a post by Gabriel Gonzalez, that this idea had already been described in 1993 (I think). Unfortunately, I looked for that original blog post recently but couldn't find it. |
@benhutchison I think that cons and snoc form right monoid actions. I don't know of a term for their relationship specifically in the literature, but if something is a monoid, it will always have two ends to append at, because you can "concatenate" something onto either end. |
What are the two ends of a Set?
(Replying to Edmund Noble's comment of Sat, Aug 26, 2017 at 3:18 AM.)
|
Sets are commutative monoids, so both ends are equivalent. What I mean by "always have two ends to append at" is if you have a function |
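A small sketch of the "two ends" argument, with a stand-in `Monoid` trait and a hypothetical injection function `single` (it plays the role of `unit` in the Haskell `Reducer` class linked earlier in the thread): `cons` combines on the left and `snoc` on the right. For `String` the two ends differ; for `Set`, whose union is commutative, they coincide.

```scala
// Dependency-free stand-in for a Monoid type class (an assumption for this sketch).
trait Monoid[A] {
  def empty: A
  def combine(x: A, y: A): A
}

object Monoid {
  // Instances live in the companion so they are found without imports.
  implicit val stringMonoid: Monoid[String] = new Monoid[String] {
    val empty: String = ""
    def combine(x: String, y: String): String = x + y
  }

  implicit def setMonoid[A]: Monoid[Set[A]] = new Monoid[Set[A]] {
    val empty: Set[A] = Set.empty[A]
    def combine(x: Set[A], y: Set[A]): Set[A] = x union y
  }
}

object Ends {
  // `single` injects one element into the monoid (hypothetical parameter).
  def cons[E, M](single: E => M, e: E, m: M)(implicit M: Monoid[M]): M =
    M.combine(single(e), m) // attach at the left end

  def snoc[E, M](single: E => M, m: M, e: E)(implicit M: Monoid[M]): M =
    M.combine(m, single(e)) // attach at the right end
}
```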
I'm closing this issue now due to the lack of activity |
Ref typelevel/cats#1360
I think having an Aggregator somewhere would be handy, and origami looks like a good home. Time permitting, I would be happy to help out here.