Improve {Set,Map}.fromDistinct{Asc,Desc}List #950

Merged: treeowl merged 5 commits into haskell:master from the from-distinct-mono-list branch on Jun 24, 2023

meooow25
Contributor

As discussed in #949, this is a different implementation of the current strategy; it is faster and is a good consumer of the input list. However, allocations increase in the no-fusion case (for example, from 159 KB to 224 KB for Set.fromDistinctAscList in the benchmarks below).
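
To make the two cases concrete, here is a minimal sketch of the call shapes involved, assuming the benchmarks distinguish "fusion" from "no fusion" roughly this way (the benchmark code itself is not shown in this PR). The names `fromExistingList` and `fromEnumeration` are hypothetical; only `Set.fromDistinctAscList` comes from containers.

```haskell
import qualified Data.Set as Set

-- No-fusion shape (assumed): the input list already exists as a data
-- structure, so the function has to walk real cons cells.
fromExistingList :: [Int] -> Set.Set Int
fromExistingList xs = Set.fromDistinctAscList xs

-- Fusion shape (assumed): the list is produced right at the call site.
-- If fromDistinctAscList is inlined and written as a good consumer,
-- GHC's list fusion may avoid materializing [1 .. n] at all.
fromEnumeration :: Int -> Set.Set Int
fromEnumeration n = Set.fromDistinctAscList [1 .. n]
```

Both shapes build the same set; they differ only in whether the optimizer gets a chance to combine the producer with the consumer.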

Benchmarks

On GHC 9.2.5:

Before
Set
  fromDistinctAscList:         OK (0.15s)
    37.7 μs ± 2.9 μs, 159 KB allocated, 3.1 KB copied, 8.0 MB peak memory
  fromDistinctAscList:fusion:  OK (0.12s)
    58.7 μs ± 5.6 μs, 448 KB allocated,  12 KB copied, 8.0 MB peak memory
  fromDistinctDescList:        OK (0.16s)
    38.1 μs ± 3.6 μs, 159 KB allocated, 3.1 KB copied, 8.0 MB peak memory
  fromDistinctDescList:fusion: OK (0.26s)
    61.5 μs ± 3.1 μs, 480 KB allocated,  13 KB copied, 8.0 MB peak memory

Map
  fromDistinctAscList:         OK (0.18s)
    41.1 μs ± 2.8 μs, 191 KB allocated, 4.5 KB copied, 9.0 MB peak memory
  fromDistinctAscList:fusion:  OK (0.14s)
    67.0 μs ± 5.4 μs, 574 KB allocated,  17 KB copied, 9.0 MB peak memory
  fromDistinctDescList:        OK (0.16s)
    40.2 μs ± 2.7 μs, 191 KB allocated, 4.5 KB copied, 9.0 MB peak memory
  fromDistinctDescList:fusion: OK (0.15s)
    70.8 μs ± 6.1 μs, 608 KB allocated,  19 KB copied, 9.0 MB peak memory

After

Set
  fromDistinctAscList:         OK (0.21s)
    26.1 μs ± 1.5 μs, 224 KB allocated, 4.4 KB copied, 8.0 MB peak memory, 30% less than baseline
  fromDistinctAscList:fusion:  OK (0.22s)
    25.6 μs ± 1.5 μs, 288 KB allocated, 7.7 KB copied, 8.0 MB peak memory, 56% less than baseline
  fromDistinctDescList:        OK (0.22s)
    25.9 μs ± 1.3 μs, 224 KB allocated, 4.4 KB copied, 8.0 MB peak memory, 31% less than baseline
  fromDistinctDescList:fusion: OK (0.20s)
    26.0 μs ± 1.4 μs, 288 KB allocated, 7.9 KB copied, 8.0 MB peak memory, 57% less than baseline

Map
  fromDistinctAscList:         OK (0.14s)
    34.1 μs ± 3.0 μs, 271 KB allocated, 6.5 KB copied, 9.0 MB peak memory, 17% less than baseline
  fromDistinctAscList:fusion:  OK (0.11s)
    30.2 μs ± 3.0 μs, 336 KB allocated,  10 KB copied, 9.0 MB peak memory, 54% less than baseline
  fromDistinctDescList:        OK (0.14s)
    33.5 μs ± 2.9 μs, 271 KB allocated, 6.5 KB copied, 9.0 MB peak memory, 16% less than baseline
  fromDistinctDescList:fusion: OK (0.13s)
    31.2 μs ± 2.9 μs, 336 KB allocated,  10 KB copied, 9.0 MB peak memory, 55% less than baseline

I also tried the latest GHC, 9.6.2, and the speedup turns out to be larger there.

Before
Set
  fromDistinctAscList:         OK (0.15s)
    35.3 μs ± 3.2 μs, 159 KB allocated, 3.2 KB copied, 8.0 MB peak memory
  fromDistinctAscList:fusion:  OK (0.23s)
    55.0 μs ± 3.0 μs, 448 KB allocated,  12 KB copied, 8.0 MB peak memory
  fromDistinctDescList:        OK (0.15s)
    36.1 μs ± 2.7 μs, 159 KB allocated, 3.2 KB copied, 8.0 MB peak memory
  fromDistinctDescList:fusion: OK (0.24s)
    56.6 μs ± 2.8 μs, 480 KB allocated,  13 KB copied, 8.0 MB peak memory

Map
  fromDistinctAscList:         OK (0.17s)
    40.9 μs ± 3.6 μs, 191 KB allocated, 4.4 KB copied, 9.0 MB peak memory
  fromDistinctAscList:fusion:  OK (0.13s)
    64.5 μs ± 5.7 μs, 574 KB allocated,  17 KB copied, 9.0 MB peak memory
  fromDistinctDescList:        OK (0.17s)
    40.1 μs ± 2.9 μs, 191 KB allocated, 4.4 KB copied, 9.0 MB peak memory
  fromDistinctDescList:fusion: OK (0.14s)
    68.0 μs ± 5.6 μs, 608 KB allocated,  19 KB copied, 9.0 MB peak memory

After

Set
  fromDistinctAscList:         OK (0.16s)
    19.7 μs ± 1.4 μs, 224 KB allocated, 4.4 KB copied, 8.0 MB peak memory, 44% less than baseline
  fromDistinctAscList:fusion:  OK (0.13s)
    17.9 μs ± 1.4 μs, 288 KB allocated, 7.7 KB copied, 8.0 MB peak memory, 67% less than baseline
  fromDistinctDescList:        OK (0.16s)
    20.1 μs ± 1.5 μs, 224 KB allocated, 4.4 KB copied, 8.0 MB peak memory, 44% less than baseline
  fromDistinctDescList:fusion: OK (0.16s)
    18.5 μs ± 1.5 μs, 288 KB allocated, 7.9 KB copied, 8.0 MB peak memory, 67% less than baseline

Map
  fromDistinctAscList:         OK (0.22s)
    26.3 μs ± 1.9 μs, 272 KB allocated, 6.6 KB copied, 9.0 MB peak memory, 35% less than baseline
  fromDistinctAscList:fusion:  OK (0.18s)
    21.1 μs ± 1.5 μs, 336 KB allocated,  10 KB copied, 9.0 MB peak memory, 67% less than baseline
  fromDistinctDescList:        OK (0.22s)
    26.1 μs ± 1.7 μs, 272 KB allocated, 6.6 KB copied, 9.0 MB peak memory, 34% less than baseline
  fromDistinctDescList:fusion: OK (0.18s)
    21.7 μs ± 1.4 μs, 336 KB allocated,  10 KB copied, 9.0 MB peak memory, 68% less than baseline
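
The relative changes in the tables above can be recomputed from the printed values. This is a small sketch with a hypothetical helper, `relChange`; because the printed means are rounded, the results may differ by about a point from the reported "less than baseline" figures.

```haskell
-- | Relative change from a baseline, in percent.
-- Negative means a reduction, positive means an increase.
relChange :: Double -> Double -> Double
relChange before after = (after - before) / before * 100

-- Values copied from the GHC 9.2.5 Set fromDistinctAscList rows above.
timeNoFusion, timeFusion, allocNoFusion, allocFusion :: Double
timeNoFusion  = relChange 37.7 26.1  -- mean time in microseconds, no fusion
timeFusion    = relChange 58.7 25.6  -- mean time in microseconds, with fusion
allocNoFusion = relChange 159 224    -- KB allocated, no fusion
allocFusion   = relChange 448 288    -- KB allocated, with fusion
```

With these inputs, time drops by about 31% without fusion and 56% with it, while allocations rise by about 41% without fusion and fall by about 36% with it.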

Closes #949.

A faster and fusion-friendly implementation of the current strategy.

On GHC 9.2.5:
For Set this takes 56% less time when there is fusion and 30% when not.
For Map this takes 55% less time when there is fusion and 16% when not.
@meooow25
Contributor Author

meooow25 commented Jun 3, 2023

Hi @treeowl, do you mind taking a look at this?

@treeowl
Contributor

treeowl commented Jun 3, 2023

I haven't forgotten (yet)! I will look at it as soon as I can.

@meooow25
Contributor Author

Hi, did you get a chance to take a look?

@@ -3410,8 +3413,7 @@ instance (Ord k) => GHCExts.IsList (Map k v) where
 -- If the list contains more than one value for the same key, the last value
 -- for the key is retained.
 --
--- If the keys of the list are ordered, linear-time implementation is used,
--- with the performance equal to 'fromDistinctAscList'.
+-- If the keys of the list are ordered, a linear-time implementation is used.
Contributor

Can you use the same approach (and share code) here?

Contributor Author

I made a rough attempt but it performed worse for the not-sorted case. I'll try to see if it can be improved, but it'd probably be best in a separate PR.

linkAll (State0 stk) = foldl'Stack (\l x r -> link x l r) Tip stk
linkAll (State1 l0 stk) = foldl'Stack (\l x r -> link x l r) l0 stk

{-# INLINE fromDistinctDescList #-} -- INLINE for fusion
Contributor

Can you break this up to reduce the amount of code that needs to be inlined at each call site? I expect the pieces just need to be INLINABLE to specialize, and the top level bit should probably do with that as well?

Contributor Author

next surely needs to be inlined, but maybe we can make linkTop and linkAll top-level INLINABLE functions and let GHC decide. I'll see if that affects benchmarks.

Contributor Author

Done. Things become worse if linkTop is not INLINABLE, but with it I don't see any changes in benchmark results.
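
The split discussed in this thread can be sketched as follows: a small INLINE wrapper that owns the fold and `next`, with `linkTop` and `linkAll` as top-level INLINABLE functions. This is a self-contained approximation under stated assumptions, not the code merged in this PR. `Tree`, `Stack`, `State`, `bin`, `joinStack` and `fromDistinctAscListSketch` are hypothetical stand-ins, and the final join does no rebalancing, whereas the excerpt above uses `link`.

```haskell
import Data.List (foldl')

-- A plain size-annotated search tree (a stand-in, not containers' Set).
data Tree a = Tip | Bin !Int !a !(Tree a) !(Tree a)

size :: Tree a -> Int
size Tip = 0
size (Bin n _ _ _) = n

bin :: a -> Tree a -> Tree a -> Tree a
bin x l r = Bin (size l + size r + 1) x l r

toAscList :: Tree a -> [a]
toAscList t = go t []
  where
    go Tip acc = acc
    go (Bin _ x l r) acc = go l (x : go r acc)

-- Each entry is an element plus its left subtree, waiting for a right subtree.
data Stack a = Nada | Push !a !(Tree a) !(Stack a)

data State a
  = State0 !(Stack a)            -- the next element starts a new singleton tree
  | State1 !(Tree a) !(Stack a)  -- the next element becomes the parent of this tree

{-# INLINE fromDistinctAscListSketch #-} -- INLINE so foldl' can meet the list producer
fromDistinctAscListSketch :: [a] -> Tree a
fromDistinctAscListSketch = linkAll . foldl' next (State0 Nada)
  where
    -- Local, so it is inlined together with the wrapper.
    next (State0 stk) x = linkTop (bin x Tip Tip) stk
    next (State1 l stk) x = State0 (Push x l stk)

-- Merge the new tree with the stack top while both have equal size.
{-# INLINABLE linkTop #-}
linkTop :: Tree a -> Stack a -> State a
linkTop r (Push x l stk)
  | size r == size l = linkTop (bin x l r) stk
linkTop t stk = State1 t stk

-- Join whatever is left once the input is exhausted.
{-# INLINABLE linkAll #-}
linkAll :: State a -> Tree a
linkAll (State0 stk) = joinStack Tip stk
linkAll (State1 r stk) = joinStack r stk

-- Simplification: a naive join without rebalancing.
joinStack :: Tree a -> Stack a -> Tree a
joinStack r Nada = r
joinStack r (Push x l stk) = joinStack (bin x l r) stk
```

For the input [1 .. 7], the fold ends with an empty stack and a single perfect tree of size 7 with 4 at the root. Only the wrapper and `next` are duplicated at each call site; the helpers stay shared unless GHC decides otherwise.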

Contributor Author

@treeowl does the change look good?

@treeowl
Contributor

treeowl commented Jun 15, 2023

I'm sorry I've been so slow. I finally added some review comments.

And leave further optimization to GHC.
@treeowl
Contributor

treeowl commented Jun 24, 2023

Let's do it. Thanks!

@treeowl treeowl merged commit 48196fb into haskell:master Jun 24, 2023
@meooow25 meooow25 deleted the from-distinct-mono-list branch July 14, 2023 20:03
Development

Successfully merging this pull request may close these issues.

Fusible Set.fromDistinctAscList definition
2 participants