Replace Monoid constraint by CommutativeSemigroup in the reduce syntax #203
Hi @alonsodomin, could you bump cats-effect to 0.5 and cats-mtl to the next version (after typelevel/cats-mtl#15 is merged and released)?
rdd.csum shouldBe 1.to(20).zip(1.to(20)).toMap
}
// property("rdd Map[Int,Int] monoid example") {
// import frameless.cats.implicits._
The Map instance was moved to alleycats in typelevel/cats@1f0cba0.
Hi @alonsodomin, can we bring this back based on @iravid's suggestion?
Hi @imarios, sorry for leaving this unattended. I just pushed an upgrade of the cats-mtl package.
There is still a problem with the Map test: no CommutativeSemigroup instance is possible for an arbitrary Map. There could be one for SortedMap, but neither cats nor alleycats provides one.
You can see @iravid's comment below, in which he acknowledges that he was thinking of the Traversable instance, not the CommutativeSemigroup one. So, if frameless is interested in enforcing correctness for those methods, it's probably better to drop the test for the Map... either that, or contribute a CommutativeSemigroup instance for SortedMap to alleycats and wait until it's merged...
What if we replace Map with SortedMap? Will we be able to have this test back?
Hi @imarios, I just made a PR to cats adding a CommutativeMonoid instance for SortedMap: typelevel/cats#2047. Once it gets merged, I can restore the commented-out tests here.
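For illustration, here is a minimal sketch of what a key-wise `CommutativeMonoid` for `SortedMap` could look like. It does not use cats: `CommutativeSemigroup`, `CommutativeMonoid`, `SortedMapInstances` and `intAddition` are simplified, hypothetical stand-ins defined locally, and the merge strategy (combine values that share a key, keep all other entries) is an assumption, not the code in typelevel/cats#2047.

```scala
import scala.collection.immutable.SortedMap

// Hypothetical, simplified stand-ins for the cats type classes (not the cats API).
trait CommutativeSemigroup[A] {
  // Expected laws: associative, and combine(x, y) == combine(y, x).
  def combine(x: A, y: A): A
}

trait CommutativeMonoid[A] extends CommutativeSemigroup[A] {
  def empty: A
}

object SortedMapInstances {
  // Key-wise merge (assumed strategy): keys present in only one map are kept
  // as-is; values under a shared key are combined with the value instance.
  // The result is commutative whenever the value combine is commutative.
  implicit def sortedMapCommutativeMonoid[K, V](
      implicit ord: Ordering[K],
      vs: CommutativeSemigroup[V]
  ): CommutativeMonoid[SortedMap[K, V]] =
    new CommutativeMonoid[SortedMap[K, V]] {
      def empty: SortedMap[K, V] = SortedMap.empty[K, V]

      def combine(x: SortedMap[K, V], y: SortedMap[K, V]): SortedMap[K, V] =
        y.foldLeft(x) { case (acc, (k, v)) =>
          acc.updated(k, acc.get(k).fold(v)(w => vs.combine(w, v)))
        }
    }

  // Integer addition is commutative, so it can serve as the value instance.
  implicit val intAddition: CommutativeSemigroup[Int] =
    new CommutativeSemigroup[Int] {
      def combine(x: Int, y: Int): Int = x + y
    }
}
```

Combining `SortedMap("x" -> 1, "y" -> 2)` with `SortedMap("y" -> 10, "z" -> 3)` yields `SortedMap("x" -> 1, "y" -> 12, "z" -> 3)` in either argument order. With a non-commutative value instance, the merged map would depend on argument order.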
Hi @iravid, I'll bump … On the other hand, … Besides all of this, the tests do not seem to exercise the …
Codecov Report
@@ Coverage Diff @@
## master #203 +/- ##
==========================================
+ Coverage 95.71% 96.18% +0.47%
==========================================
Files 39 52 +13
Lines 793 944 +151
Branches 11 18 +7
==========================================
+ Hits 759 908 +149
- Misses 34 36 +2
@alonsodomin - entirely right re …
@iravid, anything else you want to add to this PR? If not, I can merge. Thank you @alonsodomin!
Looks great to me @imarios, thanks @alonsodomin!
Something is failing in …
@OlivierBlanvillain yeah, it's an example about reducing an RDD with a …
Ah, cool, so this PR will bump us to cats 1.0.0 :-)
I definitely like adding a … However, moving from …
def csum(implicit m: CommutativeMonoid[A]): A = lhs.fold(m.empty)(_ |+| _)
// we may get better performance out of using `reduce` and catching
// `UnsupportedOperationException` for empty RDDs
def csumOption(implicit m: CommutativeSemigroup[A]): Option[A] =
  lhs.fold[Option[A]](None)((acc, a) => Some(acc.fold(a)(_ |+| a)))
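A Spark-free sketch of the intended semantics of the two methods above, assuming a plain Scala `Seq` in place of an RDD and a function argument in place of the type-class instance (`LocalCsum` is a hypothetical name). `csum` needs an identity element, while `csumOption` only needs a combine and returns `None` for an empty input instead of throwing, as a bare `reduce` does (Spark's `RDD.reduce` throws `UnsupportedOperationException` on an empty RDD).

```scala
object LocalCsum {
  // csum: requires an identity element (the monoid's `empty`),
  // so an empty input simply yields `empty`.
  def csum[A](xs: Seq[A])(empty: A)(combine: (A, A) => A): A =
    xs.foldLeft(empty)(combine)

  // csumOption: only requires a combine (a semigroup); an empty input
  // yields None, and otherwise the first element seeds the accumulator.
  def csumOption[A](xs: Seq[A])(combine: (A, A) => A): Option[A] =
    xs.foldLeft(Option.empty[A]) { (acc, a) =>
      Some(acc.fold(a)(prev => combine(prev, a)))
    }
}
```

For example, `LocalCsum.csumOption(Seq.empty[Int])(_ + _)` is `None`, whereas `Seq.empty[Int].reduce(_ + _)` throws.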
I do like the idea of avoiding exceptions, but I'm not sure about significant performance improvements, since most of the overhead comes from the Spark runtime. But if the other members are on board, the change is quite straightforward. Would we also be interested in avoiding exceptions in the other …
@alonsodomin I'd love to see those … I should probably add the disclaimer that everything I say should be taken with a grain of salt, because I currently use neither Spark nor Frameless, but I think that I'll be using both in the near future :)
@alonsodomin I believe that RC2 should be released now. Are you still available to give it a try?
@ceedubs yeah, will give it another try.
The build finally passes now. This is still using the … I do like @ceedubs' proposal of having …
def csumOption(implicit m: CommutativeSemigroup[A]): Option[A] =
  lhs.aggregate[Option[A]](None)(
    (acc, a) => Some(acc.fold(a)(_ |+| a)),
    (l, r) => l.fold(r)(x => r.map(_ |+| x) <+> Some(x))
  )
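To see what the two functions passed to `aggregate` do, here is a local model of the computation. It assumes each inner `Seq` stands for one partition and uses `Int` addition in place of `|+|` and `orElse` in place of `<+>`; `LocalAggregate` is a hypothetical helper, not Spark's implementation. The first function (seqOp) folds the elements within one partition, and the second (combOp) merges the per-partition results.

```scala
object LocalAggregate {
  // Conceptual model of RDD.aggregate: fold each partition with seqOp,
  // starting from `zero`, then merge the partial results with combOp.
  def aggregate[A, B](partitions: Seq[Seq[A]])(zero: B)(
      seqOp: (B, A) => B,
      combOp: (B, B) => B
  ): B =
    partitions.map(_.foldLeft(zero)(seqOp)).foldLeft(zero)(combOp)

  // csumOption specialised to Int addition, mirroring the reviewed code.
  def csumOption(partitions: Seq[Seq[Int]]): Option[Int] =
    aggregate[Int, Option[Int]](partitions)(None)(
      // seqOp: add the element to the running partial sum, if any
      (acc, a) => Some(acc.fold(a)(_ + a)),
      // combOp: add x into r when r is defined, otherwise keep Some(x)
      (l, r) => l.fold(r)(x => r.map(_ + x).orElse(Some(x)))
    )
}
```

With a single partition, this model only ever calls combOp with `None` on the left, so the branch where both sides are `Some` is exercised only when the input is split across several partitions. Spark may also merge partition results in task-completion order, which is one reason the combine has to be commutative.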
I may be reading this wrong, but this seems incorrect to me. Isn't x getting added in twice in the case where both l and r are Some?
If so, it's concerning that unit tests didn't catch this. Maybe some places that are calling toRDD on a List should be calling parallelize?
I should probably replace the <+> operator with orElse, since I'm basically relying on the fact that the implementation of <+> for Option is exactly that.
In essence, I just want to express the Alternative between the two Options (<|>), but in cats this is implemented based on MonoidK, and therefore <|> becomes <+>...
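The distinction between the two ways of combining `Option`s can be checked with a small sketch. It assumes `Int` addition as the value combine; `orElseChoice` mirrors the `orElse` behaviour described above for `<+>` on `Option`, and `semigroupCombine` mirrors a `|+|`-style combine that merges both values when both are defined. All names here are hypothetical.

```scala
object OptionCombine {
  // orElse-style choice (the <+> behaviour for Option): keep the first defined value.
  def orElseChoice[A](l: Option[A], r: Option[A]): Option[A] = l.orElse(r)

  // Semigroup-style combine (|+|-like): merge both values when both are defined.
  def semigroupCombine[A](l: Option[A], r: Option[A])(f: (A, A) => A): Option[A] =
    (l, r) match {
      case (Some(a), Some(b)) => Some(f(a, b))
      case _                  => l.orElse(r)
    }

  // The reviewed combOp, written with the orElse choice.
  def combOp(l: Option[Int], r: Option[Int]): Option[Int] =
    l.fold(r)(x => orElseChoice(r.map(_ + x), Some(x)))

  // The same shape with a semigroup-style combine: x is added a second time.
  def doubleCounting(l: Option[Int], r: Option[Int]): Option[Int] =
    l.fold(r)(x => semigroupCombine(r.map(_ + x), Some(x))(_ + _))
}
```

With the `orElse` choice, `combOp(Some(2), Some(3))` gives `Some(5)`, counting `x` once; swapping in the semigroup-style combine gives `Some(7)`.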
Ah right, that makes sense. I think that I'm one of the people who advocated for doing this in Cats, so I probably should have realized what was going on here :P
It may be more straightforward to just use orElse, but if it works the right way then my main concern is gone. Thanks for the explanation!
I'm not a frameless maintainer, but for what it's worth, I've reviewed this and it looks like good stuff to me!
@alonsodomin LGTM! Sorry this took so long. Did you cover everything you wanted from your side? If not, then I can merge.
@imarios yeah, this is complete as far as I'm concerned.
After the release of Cats 1.0.0-RC1, CommutativeSemigroup instances for tuples can be derived provided there is a CommutativeSemigroup for each member of the tuple, which unblocks #117.
Now, by changing the restriction from Monoid to CommutativeSemigroup, one of the tests is no longer correct: the one reducing an RDD[Map[Int, Int]] into a Map[Int, Int], because there is no CommutativeSemigroup for Maps. The test has been commented out, and I'm submitting this for review and discussion.
Fixes #117
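A minimal sketch of the tuple derivation described in the PR description, using a locally defined stand-in for `CommutativeSemigroup` (not the cats type class) and two hypothetical instances: addition for `Int` and max for `Long`. The derived tuple instance combines component-wise, so it is commutative whenever both component instances are.

```scala
// Hypothetical, simplified stand-in for cats' CommutativeSemigroup.
trait CommutativeSemigroup[A] {
  def combine(x: A, y: A): A
}

object TupleDerivation {
  implicit val intAddition: CommutativeSemigroup[Int] =
    new CommutativeSemigroup[Int] {
      def combine(x: Int, y: Int): Int = x + y
    }

  implicit val longMax: CommutativeSemigroup[Long] =
    new CommutativeSemigroup[Long] {
      def combine(x: Long, y: Long): Long = math.max(x, y)
    }

  // Component-wise combine: derived from the instances of the members.
  implicit def tuple2[A, B](
      implicit a: CommutativeSemigroup[A],
      b: CommutativeSemigroup[B]
  ): CommutativeSemigroup[(A, B)] =
    new CommutativeSemigroup[(A, B)] {
      def combine(x: (A, B), y: (A, B)): (A, B) =
        (a.combine(x._1, y._1), b.combine(x._2, y._2))
    }

  // Reduce with whatever instance is in scope; assumes a non-empty input.
  def reduceAll[A](xs: Seq[A])(implicit s: CommutativeSemigroup[A]): A =
    xs.tail.foldLeft(xs.head)((acc, x) => s.combine(acc, x))
}
```

Reducing `Seq((1, 5L), (2, 9L), (3, 7L))` gives `(6, 9L)`, and reducing the same elements in reverse order gives the same result: the order independence that a reduce over partitions can rely on.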