Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add distinct method on NonEmptyList and NonEmptyVector #1243

Merged
merged 2 commits into from
Aug 12, 2016
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
15 changes: 15 additions & 0 deletions core/src/main/scala/cats/data/NonEmptyList.scala
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@ import cats.instances.list._
import cats.syntax.order._

import scala.annotation.tailrec
import scala.collection.immutable.TreeSet
import scala.collection.mutable.ListBuffer

/**
Expand Down Expand Up @@ -106,6 +107,20 @@ final case class NonEmptyList[A](head: A, tail: List[A]) {
toList.iterator.map(A.show).mkString("NonEmptyList(", ", ", ")")

override def toString: String = s"NonEmpty$toList"

/**
* Remove duplicates. Duplicates are checked using `Order[_]` instance.
*/
def distinct(implicit O: Order[A]): NonEmptyList[A] = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only question: there is an O(N^2) algorithm that uses only Eq[A] I wonder is it makes sense to also implement that?

we could use something like: https://github.com/non/algebra/blob/master/core/src/main/scala/algebra/Priority.scala

(or we could use Xor for that) and have something like:

def distinct(implicit oe: Xor[Eq[A], Order[A]]): NonEmptyList[A] = oe match {
  case Xor.Right(ord) => // do the tree set which is `O(log N)` per check
  case Xor.Left(eq) => // do a "listset" approach of checking each item, this incurs `O(N)` per check
}

If we had a Hash[A] type that potentially extended Eq[A] we could even have something like:

def distinct(implicit oe: Xor[Eq[A], Xor[Order[A], Hash[A]]]): NonEmptyList[A]

which would prefer to use hash sets, then tree sets, then list sets.

This is perhaps best served with different method names so readers can be more clear which complexity they get, but it is interesting that the semantics of the method don't care.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is Priority recursive, i.e. can support > 2 implicits? Regarding hash sets and Hash typeclass, since hash sets implementations in Scala are based on Object.hashcode() it would require some wrapping of the elements, wouldn't it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is perhaps best served with different method names so readers can be more clear which complexity they get

I strongly agree with this statement. Especially since IntelliJ has a nasty habit of telling people that imports are unused when they are only used to bring in implicit instances. It would be really unfortunate if deleting an import brought the Hash or Order instance out of scope and some code went from O(n) to O(n^2).

I don't feel a strong need to add an O(n^2) version that requires Eq[A] instead of Order[A], because O(n^2) operations are often impractical, and I suspect that most of the cases in which you want to do something like this you probably can have an Order[A] available. I could definitely see some value in a hash-based approach, but that's a higher barrier given what's currently in Cats.

Personally I'm pretty happy to go forward with this approach and leave open the possibility of another approach in the future.

implicit val ord = O.toOrdering

val buf = ListBuffer.empty[A]
tail.foldLeft(TreeSet(head)) { (elementsSoFar, a) =>
if (elementsSoFar(a)) elementsSoFar else { buf += a; elementsSoFar + a }
}

NonEmptyList(head, buf.toList)
}
}

object NonEmptyList extends NonEmptyListInstances {
Expand Down
16 changes: 15 additions & 1 deletion core/src/main/scala/cats/data/NonEmptyVector.scala
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@ package cats
package data

import scala.annotation.tailrec
import scala.collection.immutable.VectorBuilder
import scala.collection.immutable.{TreeSet, VectorBuilder}
import cats.instances.vector._

/**
Expand Down Expand Up @@ -130,6 +130,20 @@ final class NonEmptyVector[A] private (val toVector: Vector[A]) extends AnyVal {
def length: Int = toVector.length

override def toString: String = s"NonEmpty${toVector.toString}"

/**
* Remove duplicates. Duplicates are checked using `Order[_]` instance.
*/
def distinct(implicit O: Order[A]): NonEmptyVector[A] = {
implicit val ord = O.toOrdering

val buf = Vector.newBuilder[A]
tail.foldLeft(TreeSet(head)) { (elementsSoFar, a) =>
if (elementsSoFar(a)) elementsSoFar else { buf += a; elementsSoFar + a }
}

NonEmptyVector(head, buf.result())
}
}

private[data] sealed trait NonEmptyVectorInstances {
Expand Down
6 changes: 6 additions & 0 deletions tests/src/test/scala/cats/tests/NonEmptyListTests.scala
Original file line number Diff line number Diff line change
Expand Up @@ -178,6 +178,12 @@ class NonEmptyListTests extends CatsSuite {
(i :: nel).toList should === (i :: nel.toList)
}
}

test("NonEmptyList#distinct is consistent with List#distinct") {
forAll { nel: NonEmptyList[Int] =>
nel.distinct.toList should === (nel.toList.distinct)
}
}
}

class ReducibleNonEmptyListCheck extends ReducibleCheck[NonEmptyList]("NonEmptyList") {
Expand Down
6 changes: 6 additions & 0 deletions tests/src/test/scala/cats/tests/NonEmptyVectorTests.scala
Original file line number Diff line number Diff line change
Expand Up @@ -271,6 +271,12 @@ class NonEmptyVectorTests extends CatsSuite {
test("Cannot create a new NonEmptyVector[Int] from apply with a an empty Vector") {
"val bad: NonEmptyVector[Int] = NonEmptyVector(Vector.empty[Int])" shouldNot compile
}

test("NonEmptyVector#distinct is consistent with Vector#distinct") {
forAll { nonEmptyVector: NonEmptyVector[Int] =>
nonEmptyVector.distinct.toVector should === (nonEmptyVector.toVector.distinct)
}
}
}

class ReducibleNonEmptyVectorCheck extends ReducibleCheck[NonEmptyVector]("NonEmptyVector") {
Expand Down