Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ new ] Map and Set #3

Merged
merged 31 commits into from
Sep 25, 2024
Merged

[ new ] Map and Set #3

merged 31 commits into from
Sep 25, 2024

Conversation

Matthew-Mosior
Copy link
Contributor

@Matthew-Mosior Matthew-Mosior commented Aug 23, 2024

This PR adds a finite Map and Set implementation, ported from Haskell.

Map

This Map implementation is very much on par (if not a bit more performant) with SortedMap in terms of performance.

Map is faster than SortedMap for the following operations:

  • fromList (most likely stemming from a preliminary sorting test prior to generating the internal tree)
  • delete
  • update
  • lookup
  • keys
  • values

Map is slower than SortedMap for the following operation:

  • insert

Set

This Set implementation is very much on par (if not a bit more performant) with SortedSet in terms of performance.

Set is faster than SortedSet for the following operations:

  • fromList (most likely stemming from a preliminary sorting test prior to generating the internal tree)
  • delete
  • member (contains)
  • union
  • intersection
  • difference

Set is slower than SortedSet for the following operation:

  • insert

Summary

Overall, I think this PR offers solid implementations of both a map/dictionary and set data structure which both can provide great utility covering numerous use cases.

Results (Profiling)

# 1: containers/List/1
  time per run : 11.73 ns
  mean         : 45.28 ns
  r2           : 0.9993927919125739


# 2: containers/List/100
  time per run : 1.359 μs
  mean         : 1.310 μs
  r2           : 0.9999484730156484


# 3: containers/List/1000
  time per run : 13.46 μs
  mean         : 13.55 μs
  r2           : 0.9998059974521858


# 4: containers/fromListMap/1
  time per run : 19.92 ns
  mean         : 37.28 ns
  r2           : 0.9994116199259881


# 5: containers/fromListMap/100
  time per run : 6.382 μs
  mean         : 6.281 μs
  r2           : 0.9995784596151399


# 6: containers/fromListMap/1000
  time per run : 76.45 μs
  mean         : 77.43 μs
  r2           : 0.9995926135054102


# 7: containers/fromListSortedMap/1
  time per run : 28.42 ns
  mean         : 48.05 ns
  r2           : 0.9998841550283749


# 8: containers/fromListSortedMap/100
  time per run : 13.40 μs
  mean         : 12.97 μs
  r2           : 0.9997831540341396


# 9: containers/fromListSortedMap/1000
  time per run : 200.1 μs
  mean         : 205.1 μs
  r2           : 0.9996161562977962


# 10: containers/fromListSet/1
  time per run : 14.72 ns
  mean         : 31.04 ns
  r2           : 0.9999113953545874


# 11: containers/fromListSet/100
  time per run : 5.356 μs
  mean         : 5.199 μs
  r2           : 0.9998941171907879


# 12: containers/fromListSet/1000
  time per run : 67.34 μs
  mean         : 67.02 μs
  r2           : 0.9992315622085863


# 13: containers/fromListSortedSet/1
  time per run : 28.16 ns
  mean         : 45.37 ns
  r2           : 0.9993528610828151


# 14: containers/fromListSortedSet/100
  time per run : 13.10 μs
  mean         : 12.87 μs
  r2           : 0.999969094826013


# 15: containers/fromListSortedSet/1000
  time per run : 202.3 μs
  mean         : 204.6 μs
  r2           : 1.0001180741207758


# 16: containers/insertMap/10
  time per run : 604.4 ns
  mean         : 627.8 ns
  r2           : 0.9999876929544413


# 17: containers/insertSortedMap/10
  time per run : 402.6 ns
  mean         : 434.6 ns
  r2           : 0.9999333898342786


# 18: containers/insertSet/10
  time per run : 635.0 ns
  mean         : 657.6 ns
  r2           : 0.9999788903453586


# 19: containers/insertSortedSet/10
  time per run : 454.7 ns
  mean         : 486.7 ns
  r2           : 0.9999102455220619


# 20: containers/deleteMap/10
  time per run : 866.2 ns
  mean         : 871.6 ns
  r2           : 0.9999454931424272


# 21: containers/deleteSortedMap/10
  time per run : 971.7 ns
  mean         : 987.0 ns
  r2           : 0.9997588662664558


# 22: containers/deleteSet/10
  time per run : 729.3 ns
  mean         : 741.9 ns
  r2           : 0.9997890445012843


# 23: containers/deleteSortedSet/10
  time per run : 965.1 ns
  mean         : 1.022 μs
  r2           : 0.9992856831442604


# 24: containers/updateMap/10
  time per run : 1.094 μs
  mean         : 1.108 μs
  r2           : 0.9999370477949228


# 25: containers/updateSortedMap/10
  time per run : 1.621 μs
  mean         : 1.703 μs
  r2           : 0.9995988053322473


# 26: containers/lookupMap/10
  time per run : 496.3 ns
  mean         : 522.1 ns
  r2           : 0.9998662589760252


# 27: containers/lookupSortedMap/10
  time per run : 572.8 ns
  mean         : 600.5 ns
  r2           : 0.9998008089430952


# 28: containers/memberSet/10
  time per run : 380.1 ns
  mean         : 405.7 ns
  r2           : 0.9999577671697981


# 29: containers/containsSortedSet/10
  time per run : 566.6 ns
  mean         : 612.9 ns
  r2           : 0.9999350025977768


# 30: containers/keysMap/1000000
  time per run : 412.2 ms
  mean         : 410.1 ms
  r2           : 0.9999680219219297


# 31: containers/keysSortedMap/1000000
  time per run : 684.4 ms
  mean         : 675.3 ms
  r2           : 0.999572041447055


# 32: containers/valuesMap/1000000
  time per run : 414.7 ms
  mean         : 410.9 ms
  r2           : 0.9945389944902798


# 33: containers/valuesSortedMap/1000000
  time per run : 670.9 ms
  mean         : 673.7 ms
  r2           : 0.9999263720915622


# 34: containers/unionSet/1000000
  time per run : 503.3 ms
  mean         : 507.3 ms
  r2           : 0.9989775555804369


# 35: containers/unionSortedSet/1000000
  time per run : 1.047  s
  mean         : 1.043  s
  r2           : 0.9998500838041852


# 36: containers/differenceSet/1000
  time per run : 185.8 μs
  mean         : 183.8 μs
  r2           : 1.000002432289213


# 37: containers/differenceSortedSet/1000
  time per run : 397.5 μs
  mean         : 405.6 μs
  r2           : 0.9995717445025095


# 38: containers/intersectionSet/1000
  time per run : 151.0 μs
  mean         : 146.1 μs
  r2           : 0.9995860905349901


# 39: containers/intersectionSortedSet/1000
  time per run : 477.0 μs
  mean         : 487.5 μs
  r2           : 1.000514616434706

@Matthew-Mosior Matthew-Mosior changed the title [ new] Map [ new ] Map Aug 23, 2024
@Matthew-Mosior Matthew-Mosior marked this pull request as ready for review August 23, 2024 23:33
src/Data/Map/Internal.idr Outdated Show resolved Hide resolved
@stefan-hoeck
Copy link
Owner

Thanks for this! One question: How does this perform compared with Data.SortedMap in base, which is also a self-balancing sorted tree using Ord for the keys?

@Matthew-Mosior
Copy link
Contributor Author

Thanks for this! One question: How does this perform compared with Data.SortedMap in base, which is also a self-balancing sorted tree using Ord for the keys?

I haven't thoroughly tested yet, but will certainly do so, and report back here.

@Matthew-Mosior Matthew-Mosior marked this pull request as draft September 18, 2024 15:31
@Matthew-Mosior Matthew-Mosior changed the title [ new ] Map [ new ] Map and Set Sep 18, 2024
@Matthew-Mosior Matthew-Mosior marked this pull request as ready for review September 19, 2024 00:54
@Matthew-Mosior
Copy link
Contributor Author

@stefan-hoeck

This should be ready for you to look at when you have a chance.

src/Data/Map.idr Outdated
Comment on lines 367 to 374
export
find : Ord k => k -> Map k v -> v
find _ Tip = assert_total $ idris_crash "Map.!: given key is not an element in the map"
find k (Bin _ kx x l r) =
case compare k kx of
LT => find k l
GT => find k r
EQ => x
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a total function, because it can crash at runtime when the key is not in the map. We must therefore not use assert_total here: assert_total is for those cases where we know (and can proof) that a function is total but have no other way to convince Idris. This function should be partial.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stefan-hoeck

Awesome catch here, I totally forgot to add those annotations, they should be added now via 5586d97.

@stefan-hoeck
Copy link
Owner

This is incredible and really, really cool. The API is huge, and it's clean code and well documented. Thank you so much for doing this. I did not yet have the time to look at the whol thing in detail, (it's a lot of code), but I'm deeply impressed by the thoroughness of it.

We should have some tests for this, but I suggest the following: We should soon merge this as it is, and then we can set up some property tests in a later PR. That way, CI will run automatically, without me always clicking the "approve" button. I can setup some basic Hedgehog tests if you haven't used those before, but for some of the more intricate functions it will make sense if you come up with some test cases of your own.

Again, this is huge, so thank you very much for your contribution!

@Matthew-Mosior
Copy link
Contributor Author

Matthew-Mosior commented Sep 19, 2024

This is incredible and really, really cool. The API is huge, and it's clean code and well documented. Thank you so much for doing this. I did not yet have the time to look at the whol thing in detail, (it's a lot of code), but I'm deeply impressed by the thoroughness of it.

We should have some tests for this, but I suggest the following: We should soon merge this as it is, and then we can set up some property tests in a later PR. That way, CI will run automatically, without me always clicking the "approve" button. I can setup some basic Hedgehog tests if you haven't used those before, but for some of the more intricate functions it will make sense if you come up with some test cases of your own.

Again, this is huge, so thank you very much for your contribution!

You're welcome, really glad I could help contribute this! I'm very excited by the prospect of having these data structures at the communities' disposal now.

That sounds great, this should be in a good place to merge as of now (barring any linting issues). I really like the idea of setting up a CI pipeline, and I would definitely be interested in helping write some of the more nuanced tests.

@Matthew-Mosior
Copy link
Contributor Author

Matthew-Mosior commented Sep 19, 2024

@stefan-hoeck

Linting issues should be fixed via 953a4ce and good to merge.

@Matthew-Mosior
Copy link
Contributor Author

@stefan-hoeck

Wanted to check in and see if there was anything else needed before merging this PR.

@stefan-hoeck
Copy link
Owner

Sorry for the delay. And once again: Thanks for adding this. It looks really well designed and documented.

@stefan-hoeck stefan-hoeck merged commit 3f303a1 into stefan-hoeck:main Sep 25, 2024
2 checks passed
@Matthew-Mosior
Copy link
Contributor Author

Sorry for the delay. And once again: Thanks for adding this. It looks really well designed and documented.

No worries at all! Very excited for this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants