Cleanup varmat multi-index to use a hashmap instead of a set #3188

SteveBronder · 2023-05-03T19:38:43Z

Submission Checklist

Run unit tests: ./runTests.py src/test/unit
Run cpplint: make cpplint
Declare copyright holder and open-source license: Steve Bronder

Summary

When we first wrote the assignments for the new matrix type we didn't optimize very hard for multi_index assignments. The discourse thread here makes a case for using these so I decided to fix these up a bit.

Instead of using a set to keep track of unique indices being assigned to we just use a map with the key being the index to assign to and the value being the linear index position to access the right hand side vector / matrix. I haven't benchmarked this but it should take this from O(N * ln(N)) to O(N). It also cuts out an N sized vector we used to originally store indices. It also cuts out an if statement we had to do in the loop which should be much nicer for the CPU

Intended Effect

Make multi index assignment more performant

How to Verify

All tests still pass for multi index assignment

Side Effects

None hopefully

Documentation

No doc changes

Copyright and Licensing

Please list the copyright holder for the work you are submitting (this will be you or your assignee, such as a university or company): Steve Bronder

By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:

Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)

…a hashmap

…mat-multi-index

WardBrian · 2023-05-12T15:39:56Z

I'd like to see some more benchmarks on this before merging. For the model @mike-lawrence was using on the forums, I was able to re-create the following numbers

Matrix mode	Time (approximate)
`AoS`	3 seconds
`SoA` (2.32.1 release)	12 seconds
`SoA` (this branch)	8 seconds

So for a very large model with a lot of multi-indexing, this seems to be a win over the current implementation but still quite far off from the default (AoS) implementation.

There are really two questions:

Would alternative implementations (like Boost 1.82's flat_map or our own small implementation of the same) would be faster (curious, but not essential)
Are there models where this change makes multi-index assignment slower?

SteveBronder · 2023-05-15T21:07:16Z

Yes I'm going to writeup a google benchmark for this vs AoS soon. I think the big Q for me is at whether multi index assignment should always be turned off in the compiler. Those are like 3x slowdowns so I think there would need to be a lot of other work being done before SoA leads to an improvement here

bob-carpenter · 2023-05-15T21:11:53Z

I think the big Q for me is at whether multi index assignment should always be turned off in the compiler. Those are like 3x slowdowns

Given that we have to support multi-index assignment in the language. Is the issue just getting it efficient for varmat relative to matvar? How are you measuring? Also given that the advantage of varmat is downstream operation memory locality, not the assignment itself, how are you measuring?

SteveBronder · 2023-05-15T21:31:41Z

Given that we have to support multi-index assignment in the language. Is the issue just getting it efficient for varmat relative to matvar?

Yep!

How are you measuring? Also given that the advantage of varmat is downstream operation memory locality, not the assignment itself, how are you measuring?

These are Qs I don't have answers to rn but I'm thinking about. It's hard! My first thought is varying sizes of multi-index assignments followed by a large multiplication then call to normal_lpdf.

bob-carpenter · 2023-05-16T16:41:33Z

My first thought is varying sizes of multi-index assignments followed by a large multiplication then call to normal_lpdf.

Me, too. That sounds completely reasonable. Of course, based on size of the matrix, results will vary, so it'll be good to report for a range of sizes.

SteveBronder · 2023-06-07T20:01:30Z

So I've built a little repo for generally doing benchmarks with Stan and Stan math

https://github.com/SteveBronder/stan-perf

In there I have benchmarks for testing the current impl, using unordered_map, and a hash table with open addressing and linear probing.

I'm running a benchmark that does a little more work rn, but for just a plain assignment multi index is pretty slow for SoA. The current version (set) is horrific, unordered map is about twice as good, and the custom map about twice as good as unordered map.

I took just the unordered map and custom map and looked at their average performance relative to matrix. It seems like it's about a 50% slowdown even with the custom map, but imo I think as long as the user is doing just a few operations the new memory scheme should still be an improvement. I'm running some benchmarks rn to test that.

SteveBronder · 2023-06-08T14:48:29Z

Okay so after updating the tests to also do a little extra work at the end (dot product and addition) I think we should add the custom hash table.

https://github.com/SteveBronder/stan-perf/tree/main/benchmarks/varmat_multi_assign

Here are the updated raw plots where custom map is much faster after only doing a little extra work after the assignment.

And here is the ratio of times which shows the custom map being about 30% faster than matrix<var>.

@bob-carpenter are you okay with me adding a paired down version of the hashmap from here? I'm not sure what folder it would go in?

bob-carpenter · 2023-06-08T14:59:33Z

In terms of bigger picture, we want matrices to work in the case where users set every entry. In fact, I think that's a very common use case for something like GPs (and why I want to get comprehensions in ASAP). I'm actually OK having struct of arrays (I thought that was what we'd settled on?) be constant and not user settable at all. That is, if a container gets modified, use array of structs.

Are there example use cases where a Stan program only sets a few elements of a larger matrix? Fiddling with the diagonal of a matrix maybe, but that's still N ops for an N x N matrix.

@bob-carpenter are you okay with me adding a paired down version of the hashmap from here?

Totally OK adding a custom map as long as it has sufficient testing.

I'm not sure what folder it would go in?

We can create something like a new utility class folder if there isn't something like that already. I guess that'd be inside prim since this won't do any explicit autodiff, right?

SteveBronder · 2023-06-08T15:39:53Z

In terms of bigger picture, we want matrices to work in the case where users set every entry. In fact, I think that's a very common use case for something like GPs (and why I want to get comprehensions in ASAP).

Yes I agree when @WardBrian gets back I think we will talk about how to do that

I'm actually OK having struct of arrays (I thought that was what we'd settled on?) be constant and not user settable at all. That is, if a container gets modified, use array of structs.

I think it's pretty easy to add the hash table and we should be able to allow for all the subsettting besides single index in a loop

Are there example use cases where a Stan program only sets a few elements of a larger matrix? Fiddling with the diagonal of a matrix maybe, but that's still N ops for an N x N matrix.

I think sometimes people will do things with contiguous slices of a matrix, like saying operating over each column. The speed for that stuff should be fine and this change should make the rest fine as well.

@bob-carpenter are you okay with me adding a paired down version of the hashmap from here?

Totally OK adding a custom map as long as it has sufficient testing.

Alrighty yeah the hashmap I used comes from this repo (MIT) licensed. It doesn't have any tests but is a well known table so I may kill two birds with one stone and add tests here and send a PR upstream to that PR with the tests

We can create something like a new utility class folder if there isn't something like that already. I guess that'd be inside prim since this won't do any explicit autodiff, right?

I think I'd keep the folder in the model folder since I'd rather not have to go back and forth between repos and this is the only place we use it atm. If that changes then we can always put it in the math library

SteveBronder added 2 commits May 3, 2023 14:55

Cleanup varmat multi-index so that instead of using a set we can use …

6c6ed0a

…a hashmap

clang-format

4c36684

WardBrian mentioned this pull request May 9, 2023

boost 1.82 upgrade stan-dev/math#2898

Closed

Merge remote-tracking branch 'origin/develop' into feature/faster-var…

86ccc58

…mat-multi-index

WardBrian linked an issue Jul 24, 2023 that may be closed by this pull request

SoA optimization can over-promote objects used inside loops to produce slower code stan-dev/stanc3#1280

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cleanup varmat multi-index to use a hashmap instead of a set #3188

Cleanup varmat multi-index to use a hashmap instead of a set #3188

SteveBronder commented May 3, 2023 •

edited

Loading

WardBrian commented May 12, 2023

SteveBronder commented May 15, 2023

bob-carpenter commented May 15, 2023

SteveBronder commented May 15, 2023

bob-carpenter commented May 16, 2023

SteveBronder commented Jun 7, 2023

SteveBronder commented Jun 8, 2023

bob-carpenter commented Jun 8, 2023

SteveBronder commented Jun 8, 2023

Cleanup varmat multi-index to use a hashmap instead of a set #3188

Are you sure you want to change the base?

Cleanup varmat multi-index to use a hashmap instead of a set #3188

Conversation

SteveBronder commented May 3, 2023 • edited Loading

Submission Checklist

Summary

Intended Effect

How to Verify

Side Effects

Documentation

Copyright and Licensing

WardBrian commented May 12, 2023

SteveBronder commented May 15, 2023

bob-carpenter commented May 15, 2023

SteveBronder commented May 15, 2023

bob-carpenter commented May 16, 2023

SteveBronder commented Jun 7, 2023

SteveBronder commented Jun 8, 2023

bob-carpenter commented Jun 8, 2023

SteveBronder commented Jun 8, 2023

SteveBronder commented May 3, 2023 •

edited

Loading