-
Notifications
You must be signed in to change notification settings - Fork 72
Add lazyZip operation (formerly zipWith) #223
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What’s the impact on performance if we implement zip
in terms of zipWith
, to avoid some code duplication?
I was doing that at this very moment :D. It's gonna take some time until the benchmarks complete. I'll keep you posted |
Oh well, I implemented this as well and literally wrote the exact same code. Nice work! |
Note:
@julienrf, the results look pretty much the same. I have the changes on a separate branch: marcelocenerine/collection-strawman@master...marcelocenerine:zip_in_terms_of_zipWith . Shall I apply them to this PR? |
IMO |
|
In Haskell it makes more sense because the mapping function comes first (due to currying conventions), which means that In Scala, however, Precedents in Scala that would match Haskell's way of thinking use |
I think we need to call it |
What do you mean by that? In Haskell zipWithIndex can be implemented as a special case of zip (which is a special case of zipWith): zipWithIndex xs = zip xs [0..((length xs) - 1)] |
The |
if you call this trait Coll[A] {
def zipBy[B](f:A=>B):Coll[(A,B)] = this map { it => (it, f(it)) }
} |
Thanks @marcelocenerine for running the benchmarks :) For me it makes sense to implement About the name… So far the proposals have been the following: xs.zipTo(ys)(_ + _)
xs.zipBy(ys)(_ + _)
xs.zipMap(ys)(_ + _)
xs.zipWith(ys)(_ + _)
xs.mapWith(ys)(_ + _) It’s unfortunate that the “with” doesn’t introduces the same thing as in Haskell. Therefore I don’t see a strong reason to follow the same naming convention. However the name |
Sounds good, @julienrf. I cherry-picked the commit and updated this PR |
@@ -172,12 +172,13 @@ sealed abstract class LazyList[+A] | |||
else prefix.lazyAppendAll(nonEmptyPrefix.tail.flatMap(f)) | |||
} | |||
|
|||
override final def zip[B](xs: collection.Iterable[B]): LazyList[(A, B)] = | |||
override final def zip[B](xs: collection.Iterable[B]): LazyList[(A, B)] = zipWith(xs)((_, _)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to override it here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch! After rewriting zip
in terms of zipWith
, there's no need to override zip
here.
@@ -76,14 +76,16 @@ class ImmutableArray[+A] private[collection] (private val elements: scala.Array[ | |||
ImmutableArray.fromIterable(View.Concat(xs, toIterable)) | |||
} | |||
|
|||
override def zip[B](xs: collection.Iterable[B]): ImmutableArray[(A, B)] = | |||
override def zip[B](xs: collection.Iterable[B]): ImmutableArray[(A, B)] = zipWith(xs)((_, _)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this override necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
Thanks @marcelocenerine. Could you add some tests too? (in the |
9ee2026
to
207572c
Compare
207572c
to
b74d6be
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot @marcelocenerine! Overall that looks great, I’ve left a few comments, mainly about documentation or typos.
Could you also fix the merge conflicts?
@@ -835,9 +835,24 @@ trait IterableOps[+A, +CC[X], +C] extends Any { | |||
* corresponding elements of this $coll and `that`. The length | |||
* of the returned collection is the minimum of the lengths of this $coll and `that`. | |||
*/ | |||
def zip[B](xs: Iterable[B]): CC[(A @uncheckedVariance, B)] = fromIterable(View.Zip(toIterable, xs)) | |||
def zip[B](xs: Iterable[B]): CC[(A @uncheckedVariance, B)] = zipWith(xs)((_, _)) | |||
|
|||
// sound bcs of VarianceNote |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is really related to the @uncheckedVariance
above, you should not insert a new line in between them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ooops....will do
* @param f The function to apply to each pair of elements | ||
* @tparam B the type of the elements in the second half of the combined pairs | ||
* @tparam R the type of the elements in the resulting collection | ||
* @return a new collection of type `That` containing the results of applying the given function `f` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not use That
, that’s a left over of the standard collections based on CanBuildFrom
:)
We should just say “a new $Coll containing …”
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you also fix zip
’s scaladoc at the same time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's good to know. I was in doubt about it. I guess you meant $Coll
with lowercase c
, right?
sure, I can ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, $coll
with lowercase c
is better, you’re right.
* to each pair of corresponding elements of this $coll and `that`. The length | ||
* of the returned collection is the minimum of the lengths of this $coll and `that`. | ||
*/ | ||
def zipWith[B, R](xs: Iterable[B])(f: (A, B) => R): CC[R] = fromIterable(View.ZipWith(toIterable, xs, f)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you rename the xs
parameter to that
, to match the scaladoc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will do
case class Zip[A, B](underlying: Iterable[A], other: Iterable[B]) extends View[(A, B)] { | ||
def iterator() = underlying.iterator().zip(other) | ||
/** A view that generalizes the zip operation by applying a function to each pair of elements | ||
* in the underlying collection and another collection or iterator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should remove “or iterator” since this is not anymore the case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will do
val array: ArrayOps[Int] = Array(1, 2, 3) | ||
val result: Array[(Int, String)] = array.zip(List("a", "b", "c")) | ||
|
||
assertArrayEquals(Array((1, "a"), (2, "b"), (3, "c")).asInstanceOf[Array[AnyRef]], result.asInstanceOf[Array[AnyRef]]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn’t it working if you omit the asInstanceOf
s?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, it does not compile:
overloaded method value assertArrayEquals with alternatives:
[error] (x$1: Array[Long],x$2: Array[Long])Unit <and>
[error] (x$1: Array[Int],x$2: Array[Int])Unit <and>
[error] (x$1: Array[Short],x$2: Array[Short])Unit <and>
[error] (x$1: Array[Char],x$2: Array[Char])Unit <and>
[error] (x$1: Array[Byte],x$2: Array[Byte])Unit <and>
[error] (x$1: Array[Object],x$2: Array[Object])Unit
[error] cannot be applied to (Array[(Int, String)], Array[(Int, String)])
[error] assertArrayEquals(Array((1, "a"), (2, "b"), (3, "c")), result)
[error] ^
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if you replace that with assertTrue(Array((1, "a"), ...).equals(result))
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yeah, I can do this but with sameElements
so that these asserts become a bit shorter.
@@ -17,4 +17,60 @@ class TreeSetTest { | |||
assertEquals(set, set drop Int.MinValue) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don’t think we need these tests since TreeSet
inherits the default implementation, which is already tested in IterableViewLikeTest
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TreeSet
has overloaded zip
and zipWith
methods inherited from SortedSet
. These tests are testing the overloaded methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yes, that’s right.
My proposal would be to find a way to have any additional methods sit somewhere where we can easily decide where to put them in the end. There is a case for each individual method, but there is also a clear case against adding dozens of new methods. The trick is to balance one against the other. This general comment applies to all additions of new methods. In the case of
|
b74d6be
to
b9ff618
Compare
Wow, today I learn about Maybe it’s just me but I really think the discoverability of this One way to make things more discoverable would be to make |
@marcelocenerine @Ichoran @szeiger Do you have an opinion about the |
The problem with
|
No, sorry I wasn’t clear: (xs viewWith ys).map(binaryFunction) == (xs, ys).zipped.map(binaryFunction)
(xs viewWith ys).forall(binaryPredicate) == (xs, ys).zipped.forall(binaryPredicate)
(xs viewWith ys viewWith zs).filter(ternaryPredicate) == (xs, ys, zs).zipped.filter(ternaryPredicate) We would support |
@julienrf Ah, I misunderstood indeed. But then there's still the problem that it's binary only. I have used |
I prefer staying with |
My point is that it’s hard to know that it exists and that a
(xs1 wiewWith xs2 viewWith xs3 viewWith xs4).filter((x1, x2, x3, x4) => x1 == x2 && x3 != x4) |
def transform_zipWithIndex(bh: Blackhole): Unit = bh.consume(xs.zipWithIndex) | ||
|
||
@Benchmark | ||
def transform_lazyZip(bh: Blackhole): Unit = bh.consume(xs.lazyZip(xs).map((_, _))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you also add more benchmarks to compare xs.lazyZip(ys).map(f)
with xs.zip(ys).map(f.tupled)
?
* {{{ | ||
* val xs = $Coll(1, 2, 3) | ||
* val res = (xs lazyZip xs lazyZip xs lazyZip xs).map((a, b, c, d) => a + b + c + d) | ||
* // res == $Col(4, 8, 12) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo: $Coll
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or maybe you should just use List
here.
* @return a decorator `Tuple2Zipped` that allows strict operations to be performed on the lazily evaluated pairs | ||
* or chained calls to `lazyZip`. Implicit conversion to `Iterable[(A, B)]` is also supported. | ||
*/ | ||
def lazyZip[B, C2[X] <: Iterable[X]](that: C2[B]): Tuple2Zipped[A, C1[A], B, C2[B]] = new Tuple2Zipped(`this`, that) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By convention we use the CC
identifier for Collection type Constructors, while C
is used for Collection types. BTW, I’m wondering, here, what’s the advantage of using a collection type constructor over a collection type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m wondering, here, what’s the advantage of using a collection type constructor over a collection type?
scala> import strawman.collection._
import strawman.collection._
scala> import strawman.collection.immutable._
import strawman.collection.immutable._
scala> val xs: List[Int] = List(1, 2, 3)
xs: strawman.collection.immutable.List[Int] = List(1, 2, 3)
scala> xs.lazyZip(xs)
<console>:19: error: value lazyZip is not a member of strawman.collection.immutable.List[Int]
xs.lazyZip(xs)
^
Using collection type in TupleNZipped.lazyZip
:
scala> xs.lazyZip(xs).lazyZip(xs)
<console>:19: error: inferred type arguments [Nothing,strawman.collection.immutable.List[Int]] do not conform to method lazyZip's type parameter bounds [B,C3 <: strawman.collection.Iterable[B]]
xs.lazyZip(xs).lazyZip(xs)
^
<console>:19: error: type mismatch;
found : strawman.collection.immutable.List[Int]
required: C3
xs.lazyZip(xs).lazyZip(xs)
^
I got confused with this one as well. I started with the signature suggested in one of your last comments, but then I faced these errors. This was pretty much the reason I used implicit conversions in the first iteration. Am I missing anything?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm ok I see. Thanks for the explanation!
def next() = (elems1.next(), elems2.next()) | ||
} | ||
|
||
def className = getClass.getName |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m not sure the class name Tuple2Zipped
is very useful. What do you think of implementing toString
as follows?
def toString = s"$coll1.lazyZip($coll2)"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, will do
object Tuple2Zipped { | ||
implicit def tuple2ZippedToIterable[El1, C1 <: Iterable[El1], | ||
El2, C2 <: Iterable[El2]](zipped2: Tuple2Zipped[El1, C1, El2, C2]): Iterable[(El1, El2)] = | ||
new View[(El1, El2)] { def iterator() = zipped2.iterator() } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should be more explicit and also put View[(El1, El2)]
as the static return type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, will do
def tuple4Zipped_empty(): Unit = { | ||
assertTrue(zipped3.lazyZip(List.empty).isEmpty) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also check that TreeSet(1, 2, 3).lazyZip(List(4, 5, 6)).map((x, y) => x + y)
returns a TreeSet
?
And that TreeMap(1 -> "foo", 2 -> "bar").lazyZip(List("baz", "bah")).map { case ((k, v), s) => k -> (v ++ s) }
also returns a TreeMap
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it does not work well with TreeMap
:(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue here doesn't only concern TreeMap
but Map
s in general. I guess the reason is the same for having overloaded methods in Map
(e.g. map
, flatMap
, concat
).
Having a more specific implicit conversion in the Map
companion object may address this:
implicit class LazyZipOps[K, V, CC1[K, V] <: Map[K, V]](`this`: CC1[K, V]) {
def lazyZip[B, CC2[B] <: Iterable[B]](that: CC2[B]): Tuple2Zipped[(K, V), CC1[K, V], B, CC2[B]] = new Tuple2Zipped(`this`, that)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that’s a good idea.
Also, I think we don’t need to keep track of the precise type of the zipped collections: we just need C1
(to drive the type of the collection resulting from a transformation operation applied to the zipped collections), but we never need C2
, C3
or C4
.
So, I think we could simplify things as follows:
// TupleZipped.scala
final class Tuple2Zipped[El1, El2, C1 <: Iterable[El1]](coll1: C1, coll2: Iterable[El2]) { … }
final class Tuple3Zipped[El1, El2, El3, C1 <: Iterable[El1]](coll1: C1, coll2: Iterable[El2], coll3: Iterable[El3]) { … }
// same for Tuple4Zipped
class LazyZipOps[A, C1 <: Iterable[A]](`this`: C1) {
def lazyZip[B](that: Iterable[B]): Tuple2Zipped[A, B, C1] = new Tuple2Zipped[A, B, C1](`this`, that)
}
// Iterable.scala
object Iterable {
implicit def toLazyZipOps[CC[X] <: Iterable[X], A](it: CC[A]): LazyZipOps[A, CC[A]] = new LazyZipOps(it)
}
// Map.scala
object Map {
implicit def toLazyZipOps[CC[X, Y] <: Iterable[(X, Y)], K, V](it: CC[K, V]): LazyZipOps[(K, V), CC[K, V]] = new LazyZipOps(it)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I was considering doing that too. The collection types were needed in previous iterations when filter
was equivalent to (xs zip ys).filter(f.tupled).unzip
, but now I can get rid of them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That looks great, thanks! I’ve left some comments.
Hey @marcelocenerine what’s your status on this work? This really looks great but needs a few tweaks. If you are busy with other things I can try to finish it. Just let me know what works best for you :) |
Hi @julienrf , sorry for the delay. I was away last week and didn't find time to address all the comments yet.
I left a comment regarding the latter. Please see what you think. I believe I can finish it and update the PR this evening. But if you need it earlier than that, please feel free to cherry pick my commit and work on top of that. I can push the changes I have made so far... |
OK, thanks for your detailed response! So, I will let you finish :) |
545a971
to
83b2836
Compare
@julienrf, all the review comments are now addressed. Many thanks for your suggestions. I ended up renaming the I'm running the benchmarks again and will post the results as soon as they complete. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome contribution @marcelocenerine, thanks a lot!
import scala.{Boolean, StringContext, Unit} | ||
import scala.language.implicitConversions | ||
|
||
final class LazyZipOps[A, C1 <: Iterable[A]] private[collection](`this`: C1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could even make it a value class (even though I bet the saving wouldn’t be significant at all)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a very easy change, so I think it's worth doing it despite how much saving we eventually get.
Please find attached the benchmark results comparing |
Thanks Marcelo. The benchmark results are interesting. For example, for |
@marcelocenerine That looks great to me, could you please squash your commits into a single one so that I can merge the PR? |
87d45b5
to
4dcda79
Compare
I figured there was a problem there. The views returned by My commits are now squashed. |
|
||
object LazyZip2 { | ||
implicit def lazyZip2ToIterable[El1, El2](zipped2: LazyZip2[El1, El2, _]): View[(El1, El2)] = | ||
new View[(El1, El2)] { def iterator() = zipped2.iterator() } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should also override knownSize
here, then:
override def knownSize = zipped2.coll1.knownSize min zipped2.coll2.knownSize
And same for other arities.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops. done
4dcda79
to
7d1f06d
Compare
Woohoo, thanks a lot Marcelo! |
Changes made for #221:
zipWith
method tocollection.IterableOps
andcollection.Iterator
ZipWith
zipWith
method inImmutableArray
andLazyList
to be consistent with thezip
operationzipWith
method incollection.ArrayOps
andcollection.SortedSet
to be consistent with thezip
operationzipWith
,zip
andzipWithIndex
(the latter two were missing)It was not clear to me how the unit tests are organized. I wrote a bunch of them for the main collection types in both immutable and mutable packages to make sure my implementation works but did not include them in this PR.