Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Data.List.compareLength #257

Closed
Bodigrim opened this issue Feb 21, 2024 · 35 comments
Closed

Add Data.List.compareLength #257

Bodigrim opened this issue Feb 21, 2024 · 35 comments
Labels
approved Approved by CLC vote base-4.21 Implemented in base-4.21 (GHC 9.12)

Comments

@Bodigrim
Copy link
Collaborator

Bodigrim commented Feb 21, 2024

Let's extend Data.List / Data.List.NonEmpty with new functions:

compareLength :: [a] -> Int -> Ordering
compareLength xs n
  | n < 0 = GT
  | otherwise = foldr
    (\_ f m -> if m > 0 then f (m - 1) else GT)
    (\m -> if m > 0 then LT else EQ)
    xs
    n

compareLength :: NonEmpty a -> Int -> Ordering
compareLength xs n
  | n < 1 = GT
  | otherwise = foldr
    (\_ f m -> if m > 0 then f (m - 1) else GT)
    (\m -> if m > 0 then LT else EQ)
    xs
    n

The idea is that compareLength xs n serves as a safer and faster alternative to compare (length xs) n. Similarly, it's better to write compareLength xs 10 == LT instead of length xs < 10. While length would force and traverse the entire spine of xs (which could even diverge if xs is infinite), compareLength traverses at most n elements to determine its result.

(This deficiency of length has been long recognised by warning STAN-0103 in stan)

> compareLength [] 0
EQ
> compareLength [] 1
LT
> compareLength [`a`] 1
EQ
> compareLength [`a`, `b`] 1
GT
> compareLength [0..] 100
GT
> compareLength undefined (-1)
GT
> compareLength ('a' : undefined) 0
GT

Prior art:

The choice of name compareLength follows precedents in prior art, but also it's most natural to name a better version of (compare .) . length as compareLength.

MR can be found at https://gitlab.haskell.org/ghc/ghc/-/merge_requests/12822


  • How about generalizing from [a] and NonEmpty a to any Foldable?

    While the definition above uses only foldr and indeed works for any Foldable, it's hard to tell whether compareLength is better than (compare .) . length for a given Foldable or no. If length is O(1) then compare (length xs) n is faster than compareLength xs n which is O(n). Thus the proposal argues for adding compareLength only to containers for which length is known to be inefficient.

    Having the monomorphic Data.List.compareLength does not preclude anyone from raising a proposal to add Data.Foldable.compareLength or whatever they fancy to argue for.

  • How about generalizing from Int to any Num?

    I imagine that in the majority of cases n in compareLength xs n is a literal constant, so the change would force everyone to write an explicit type as in compareLength xs (10 :: Int), which will get very annoying soon. So let's stick to the usual convention of Int for length/indexing in Data.List.

    One could argue for genericlengthCompare xs n, polymorphic by n similar to the existing genericLength, but again, let me leave it to others to propose.

  • How about adding a rewrite rule changing compare (length xs) n to compareLength xs n?

    Such rule could affect semantics, making diverging programs terminate. While I'd expect such change to be a desirable one in the majority of cases, I'd rather leave it to a (potential) subsequent proposal.

@treeowl
Copy link

treeowl commented Feb 21, 2024

  1. Is the other argument order also seen in the wild? I would have expected the list to be last.
  2. An equally important function is the continuous extension of compare `on` length. I believe we should consider the names of both functions at the same time.

@Bodigrim
Copy link
Collaborator Author

Is the other argument order also seen in the wild? I would have expected the list to be last.

No, I cannot find any examples for Int -> [a] -> Ordering.

An equally important function is the continuous extension of compare `on` length. I believe we should consider the names of both functions at the same time.

Prior art in extra suggests comparingLength :: [a] -> [a] -> Ordering.

@treeowl
Copy link

treeowl commented Feb 21, 2024

Another function whose name I think we should have in the backs of our minds when considering this proposal is the one checking whether the length of a list is at least a particular number, and the two-list variants of that one (i.e., the extensions of (>=) `on` length and (>=) `on` length, extra lazy in the first and second argument, respectively). This zoo is out there in the wild (e.g., the GHC source code), but I don't know if anyone's come up with a nice, consistent, naming scheme.

Edit: similarly, less than and greater than.

@thielema
Copy link

similar:
https://hackage.haskell.org/package/utility-ht-0.0.17.1/docs/Data-List-Match.html#v:compareLength

A simple implementation would be:

compareLength :: [a] -> Int -> Ordering
compareLength xs n = compare (void xs) (replicate n ())

@Bodigrim
Copy link
Collaborator Author

I'm not sure I want to bikeshed naming of other similar functions right now. There is consistent prior art to use compareLength for [a] -> Int -> Ordering, and I'm happy to leave further developments to others. I do not think that such developments should steer us away from a well-attested name.

With regards to lazy analogs of length xs {<,<=,>,>=} c: while they could be marginally better than compareLength in certain scenarios, I'll leave it to others to convince CLC members that they are worth inclusion into base. IMO compareLength solves 95% of issues with performance and laziness.


compareLength xs n = compare (void xs) (replicate n ()) is neat, but foldr in the top post is much better in terms of performance. It compiles to a tight little loop without any extra allocations:

compareLength1 :: forall {a}. [a] -> Int# -> Ordering
compareLength1
  = \ (@a_s13T) (ds_s13L :: [a_s13T]) (ww_s13O :: Int#) ->
      case ds_s13L of {
        [] ->
          case <# 0# ww_s13O of {
            __DEFAULT ->
              case ww_s13O of {
                __DEFAULT -> GT;
                0# -> EQ
              };
            1# -> LT
          };
        : y_a13H ys_a13I ->
          case ># 0# ww_s13O of {
            __DEFAULT -> compareLength1 ys_a13I (-# ww_s13O 1#);
            1# -> GT
          }
      }

@parsonsmatt
Copy link

This seems like a solid improvement and follows prior art. I'm in favor.

@meooow25
Copy link

What should be the result of compareLength undefined (-1)?

For Data.ByteString.Lazy.compareLength it is GT.
For Data.Text.compareLength and Data.List.Extra.compareLength it is undefined.

In other words, should we be lazy in the list if the length is negative?

@thielema
Copy link

There seem to be three sensible functions with this name:

compareLength_L :: Int -> [a] -> Ordering
compareLength_R :: [a] -> Int -> Ordering
compareLength_C :: [a] -> [a] -> Ordering

and no clear winner. Of compareLength_L and compareLength_R I would prefer compareLength_L because most often the Int argument will be a constant, which allows for partial application like in map (compareLength 1).

I guess you want to save an import of a utility library. However the price to pay is to restrict your code to a recent version GHC. I would not add such a function to base.

@tomjaguarpaw
Copy link
Member

I'm finding it hard to understand where this would be used. Could someone point to examples in the wild?

@treeowl
Copy link

treeowl commented Feb 22, 2024

I'm finding it hard to understand where this would be used. Could someone point to examples in the wild?

I don't remember the names of the functions, or the module in which they're defined, but GHC uses several such functions. Vague recollection: try git grep lengthAtLeast.

@sjoerdvisscher
Copy link

The text and bytestring versions of this function come with an extensive set of rewrite rules, should these be added here too?

And what about generalising to Foldable and/or Num?

@hsyl20
Copy link

hsyl20 commented Feb 22, 2024

As @treeowl mentions, GHC uses:

lengthAtMost :: [a] -> Int -> Bool 
lengthIsNot :: [a] -> Int -> Bool 
lengthIs :: [a] -> Int -> Bool  
lengthAtLeast :: [a] -> Int -> Bool 
lengthExceeds :: [a] -> Int -> Bool 
lengthLessThan :: [a] -> Int -> Bool 

compareLength :: [a] -> [b] -> Ordering 
equalLength :: [a] -> [b] -> Bool 

These are used to avoid forcing lazy lists or to avoid traversing potentially long lists.

@Bodigrim
Copy link
Collaborator Author

What should be the result of compareLength undefined (-1)?
...
In other words, should we be lazy in the list if the length is negative?

@meooow25 A weak precedent is that take is lazy in the list if the length is non-positive:

> take 0 undefined
[]

Ideally I'd love to have compareLength :: [a] -> Word -> Ordering, which does not pose this kind of tricky questions, but it would be out of line with the rest of Data.List API.


I guess you want to save an import of a utility library. However the price to pay is to restrict your code to a recent version GHC. I would not add such a function to base.

@thielema this seems to be a general argument against adding anything ever to base, isn't it? Is this proposal somehow particularly aggravating?

I want base to exhibit best practices and help users to avoid common peformance / divergence pitfalls. Users can continue using utility libraries such as extra, which will conditionally re-export a definition from base, so there is no price to pay in terms of restricting to new GHCs only. We followed this pattern already for several functions, e. g., (!?) and unsnoc.


@tomjaguarpaw this proposals stems from our discussion with @noughtmare at kowainik/stan#550.

@meooow25
Copy link

@Bodigrim that makes sense. But the proposed implementation needs some changes if it is to be lazy.

@VitWW
Copy link

VitWW commented Feb 26, 2024

Since GHC uses compareLength :: [a] -> [b] -> Ordering (since things with same type only are comparable) and lengthSomething :: [a] -> Int -> Bool ,
it is good to name them:

lengthGuess :: [a] -> Int -> Ordering
-- or
guessLength :: [a] -> Int -> Ordering


compareLength :: [a] -> [b] -> Ordering 

@noughtmare
Copy link

I still like compareLength more because it is the concatenation of the two functions you would otherwise use: compare (length xs) n. I'd use comparingLength for the function that GHC calls compareLength, because that's again the concatenation of the two functions you'd use: comparing length xs ys

@Bodigrim
Copy link
Collaborator Author

@sjoerdvisscher excellent questions!

The text and bytestring versions of this function come with an extensive set of rewrite rules, should these be added here too?

Rules rewriting compare (length xs) n to compareLength xs n could change semantics, making diverging programs terminate. While I'd expect such change to be a desirable one in the majority of cases, I'd rather left it to a (potential) subsequent proposal.

In theory users might use length xs as a poor man's deepseq and expect that after compare (length xs) n the spine of xs has been forced. Is it a reasonable expectation?.. Could it be crucial property for certain programs not to explode?.. Let me leave it to someone else to explore.

And what about generalising to Foldable [...]?

While the definition above indeed works for any Foldable, it's hard to tell whether compareLength is better than (compare .) . length for a given Foldable or no. If length is O(1) then compare (length xs) n is faster than compareLength xs n which is O(n).

Note that introduction of a monomorphic Data.List.compareLength does not preclude anyone from raising a proposal to add Data.Foldable.compareLength or whatever they fancy to argue for.

I think a good addition to the proposal would be Data.List.NonEmpty.compareLength, because we know that length is inefficient for NonEmpty too.

And what about generalising to [...] Num?

It would be quite attractive for two reasons:

  • to use Word instead of Int so that negative numbers are not an option,
  • to replace compare (genericLength xs) (10 :: Int8) (which does not do what you think) with compareLength xs (10 :: Int8) (which does what you think).

Yet I imagine that in the majority of cases n in compareLength xs n is a literal constant, so the change would force everyone to write an explicit type as in compareLength xs (10 :: Int), which will be annoying soon. So I think I'd stick to monomorphic Int, continuing the notorious tradition of Data.List.

@mixphix
Copy link
Collaborator

mixphix commented Mar 22, 2024

@Bodigrim do you have an MR prepared for this function? More than once since you brought it up have I wished we already had it in base.

@Bodigrim
Copy link
Collaborator Author

Bodigrim commented Jun 9, 2024

I've updated the proposal and linked the MR.

@meooow25
Copy link

I suggest a couple of changes.

 compareLength :: [a] -> Int -> Ordering
 compareLength xs n
   | n < 0 = GT
-  | otherwise = foldr (\_ f m -> if 0 > m then GT else f (m - 1)) (compare 0) xs n
+  | otherwise = foldr (\_ f m -> if m > 0 then f (m - 1) else GT) (\m -> if m > 0 then LT else EQ) xs n
  • In the fold function, m being 0 is enough to tell us that the list is longer.
  • Given that m is never negative, the final check does not need to test for GT because it's impossible.

@Bodigrim
Copy link
Collaborator Author

@meooow25 could you possibly elaborate what's improved by the suggested change?

@meooow25
Copy link

Sure,

  • If we stop at m == 0, we have to look at one less element to conclude with a result. This is observable with an example like
    λ> foldr (\_ f m -> if 0 > m then GT else f (m - 1)) (compare 0) (1:2:undefined) 1
    *** Exception: Prelude.undefined
    ...
    λ> foldr (\_ f m -> if m > 0 then f (m - 1) else GT) (compare 0) (1:2:undefined) 1
    GT
  • compare 0 is short for (an equivalent of) \m -> if 0 < m then LT else (if 0 == m then EQ else GT). With the above change, the last branch which requires m < 0 is impossible. So the 0 == m test will always be true and can be removed.

hubot pushed a commit to ghc/ghc that referenced this issue Jun 15, 2024
`compareLength xs n` is a safer and faster alternative to `compare (length xs) n`.
The latter would force and traverse the entire spine (potentially diverging),
while the former traverses as few elements as possible.

The implementation is carefully designed to maintain as much laziness as possible.

As per haskell/core-libraries-committee#257
@Bodigrim
Copy link
Collaborator Author

@meooow25 thanks, I've update the MR. Does it look good now?

@meooow25
Copy link

Looks good, thanks!

I am wondering if it would be useful to INLINE compareLength for list fusion, but I am not convinced. Usually you would want to do something with the list after checking the length, which would kill fusion. I leave this to someone with a stronger opinion.

@Bodigrim
Copy link
Collaborator Author

My preference is to apply {-# INLINE #-} only if well justified, so I'd rather leave it to another proposal / proposer to investigate.

@Bodigrim
Copy link
Collaborator Author

I'll trigger a vote soon unless there are more comments / suggestions.

@Bodigrim
Copy link
Collaborator Author

Dear CLC members, let's vote on the proposal to add Data.List{,.NonEmpty}.compareLength :: [a] -> Int -> Ordering, which is a safer and faster alternative to compare (length xs) n. Please refer to the top post for more details, examples and prior art in existing libraries.

@hasufell @velveteer @mixphix @tomjaguarpaw @angerman @parsonsmatt


+1 from me, unsurprisingly.

@parsonsmatt
Copy link

+1, anything that reduces length on lists is a win

@velveteer
Copy link
Contributor

+1

@Bodigrim Bodigrim added the vote-in-progress The CLC vote is in progress label Jun 21, 2024
hubot pushed a commit to ghc/ghc that referenced this issue Jun 22, 2024
`compareLength xs n` is a safer and faster alternative to `compare (length xs) n`.
The latter would force and traverse the entire spine (potentially diverging),
while the former traverses as few elements as possible.

The implementation is carefully designed to maintain as much laziness as possible.

As per haskell/core-libraries-committee#257
@Bodigrim
Copy link
Collaborator Author

@mixphix @hasufell @tomjaguarpaw @angerman just a gentle reminder to vote.

@hasufell
Copy link
Member

+1

2 similar comments
@mixphix
Copy link
Collaborator

mixphix commented Jun 24, 2024

+1

@tomjaguarpaw
Copy link
Member

+1

@Bodigrim
Copy link
Collaborator Author

Thanks all, that's enough votes to approve.

@Bodigrim Bodigrim added approved Approved by CLC vote and removed vote-in-progress The CLC vote is in progress labels Jun 24, 2024
hubot pushed a commit to ghc/ghc that referenced this issue Jun 24, 2024
`compareLength xs n` is a safer and faster alternative to `compare (length xs) n`.
The latter would force and traverse the entire spine (potentially diverging),
while the former traverses as few elements as possible.

The implementation is carefully designed to maintain as much laziness as possible.

As per haskell/core-libraries-committee#257
hubot pushed a commit to ghc/ghc that referenced this issue Jun 25, 2024
`compareLength xs n` is a safer and faster alternative to `compare (length xs) n`.
The latter would force and traverse the entire spine (potentially diverging),
while the former traverses as few elements as possible.

The implementation is carefully designed to maintain as much laziness as possible.

As per haskell/core-libraries-committee#257
hubot pushed a commit to ghc/ghc that referenced this issue Jun 26, 2024
`compareLength xs n` is a safer and faster alternative to `compare (length xs) n`.
The latter would force and traverse the entire spine (potentially diverging),
while the former traverses as few elements as possible.

The implementation is carefully designed to maintain as much laziness as possible.

As per haskell/core-libraries-committee#257
hubot pushed a commit to ghc/ghc that referenced this issue Jun 27, 2024
`compareLength xs n` is a safer and faster alternative to `compare (length xs) n`.
The latter would force and traverse the entire spine (potentially diverging),
while the former traverses as few elements as possible.

The implementation is carefully designed to maintain as much laziness as possible.

As per haskell/core-libraries-committee#257
@Bodigrim Bodigrim added the base-4.21 Implemented in base-4.21 (GHC 9.12) label Jun 28, 2024
@Bodigrim
Copy link
Collaborator Author

There is a discussion what to do with the existing compareLength in extra at ndmitchell/extra#112, which might be of interest.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Approved by CLC vote base-4.21 Implemented in base-4.21 (GHC 9.12)
Projects
None yet
Development

No branches or pull requests