Implement StructArrays as core for agent data #492

fbanning · 2021-04-22T16:21:03Z

StructArrays in Agents.jl?

The mission

Attempting to solve #396 via a complete reimplementation of agent data in the ABM struct by replacing the model.agents Dict with a StructArray. Current API should remain "as is" as far as possible.

The caveats

Tests mostly work fine but not all of them, of course, because so much of the underlying structure of Agents.jl has to be changed to accomodate a move away from the battle-tested dictionary approach.

Mixed models don't work right now (not the focus of this first iteration). Those tests have been commented out because the thrown errors were annoying me.

The benchmarks

Arbitrary benchmark code:

using BenchmarkTools, Agents, Random

mutable struct SAAgent <: AbstractAgent
    id::Int
    money::Float64
end

function init(n)
    Random.seed!(42)
    p = Dict(
        :abs_growth => 0.0,
    )
    m = ABM(SAAgent, properties = p)
    for i in 1:n
        add_agent!(SAAgent(i, rand(m.rng)), m)
    end
    return m
end

function m_s!(m)
    m.abs_growth = 0 
    for a in allagents(m)
        gains = a.money * rand(m.rng, [-0.005, -0.001, 0.01])
        m.abs_growth += gains
        a.money += gains
    end
end

@benchmark begin
    m = init(2_500)
    adata = [:money]
    mdata = [:abs_growth]
    run!(m, dummystep, m_s!, 2_500; adata=adata, mdata=mdata);
end

Dict:

BenchmarkTools.Trial:
  memory estimate:  3.58 GiB
  allocs estimate:  43936674
  --------------
  minimum time:     3.347 s (14.65% GC)
  median time:      3.535 s (12.17% GC)
  mean time:        3.535 s (12.17% GC)
  maximum time:     3.724 s (9.95% GC)
  --------------
  samples:          2
  evals/sample:     1

StructArrays:

BenchmarkTools.Trial:
  memory estimate:  7.12 GiB
  allocs estimate:  198792134
  --------------
  minimum time:     13.096 s (6.24% GC)
  median time:      13.096 s (6.24% GC)
  mean time:        13.096 s (6.24% GC)
  maximum time:     13.096 s (6.24% GC)
  --------------
  samples:          1
  evals/sample:     1

The conclusion

Either my implementation is borked (actually highly likely because this is a major overhaul) or it's just not more performant to use StructArrays like I tried to do here.

Eager to hear your thoughts. :)

src/core/model.jl

fbanning · 2021-04-23T12:01:24Z

Improvements have been made with this naive implementation of a Dict{Int,Int}(id => index):

BenchmarkTools.Trial: 
  memory estimate:  7.16 GiB
  allocs estimate:  201602564
  --------------
  minimum time:     10.755 s (8.20% GC)
  median time:      10.755 s (8.20% GC)
  mean time:        10.755 s (8.20% GC)
  maximum time:     10.755 s (8.20% GC)
  --------------
  samples:          1
  evals/sample:     1

fbanning · 2021-04-23T12:26:29Z

Extending on the benchmark, this one also aggregates some data across all agents and should be a better way to judge the potential benefits of directly accessing columns of the StructArray :

using BenchmarkTools, Agents, Random
using StatsBase: mean

mutable struct SAAgent <: AbstractAgent
    id::Int
    money::Float64
end

function init(n)
    Random.seed!(42)
    p = Dict(
        :abs_growth => 0.0,
        :mean_wealth => 0.0,
    )
    m = ABM(SAAgent, properties = p)
    for i in 1:n
        add_agent!(SAAgent(i, rand(m.rng)), m)
    end
    return m
end

function m_s!(m)
    m.abs_growth = 0 
    for a in allagents(m)
        gains = a.money * rand(m.rng, [-0.005, -0.001, 0.01])
        m.abs_growth += gains
        a.money += gains
    end
    m.mean_wealth = mean(a.money for a in allagents(m)) # for Dictionary
    m.mean_wealth = mean(m.agents.money) # for StructArray
end

@benchmark begin
    m = init(2_500)
    adata = [:money]
    mdata = [:abs_growth, :mean_wealth]
    run!(m, dummystep, m_s!, 2_500; adata=adata, mdata=mdata);
end

Dict:

BenchmarkTools.Trial: 
  memory estimate:  3.59 GiB
  allocs estimate:  43956686
  --------------
  minimum time:     3.820 s (10.55% GC)
  median time:      3.894 s (13.87% GC)
  mean time:        3.894 s (13.87% GC)
  maximum time:     3.967 s (17.07% GC)
  --------------
  samples:          2
  evals/sample:     1

StructArrays:

BenchmarkTools.Trial: 
  memory estimate:  7.17 GiB
  allocs estimate:  201630070
  --------------
  minimum time:     10.408 s (8.25% GC)
  median time:      10.408 s (8.25% GC)
  mean time:        10.408 s (8.25% GC)
  maximum time:     10.408 s (8.25% GC)
  --------------
  samples:          1
  evals/sample:     1

As you can see, aggregating data across all agents costs some time (about 10% increase) in the regular Agents.jl approach.
With the StructArray approach, this aggregation does not seem to have any measurable cost to it at all.

So while in this benchmark it's overall ~3x as slow, takes up twice the memory and needs 5 times as many allocations, the expected speedup of aggregation over columns is real. I'm really wondering why the memory estimate is doubled though...

AayushSabharwal · 2021-04-26T12:16:15Z

This piqued my interest. Currently, I'm a bit tight on time as my semester is coming to a close, so I haven't been able to explore the code properly. However, if I understand correctly, add/deleting an agent is an O(n) operation, since the subsequent elements have to be shifted back an index, and the corresponding entries in the Dict updated? If so, I have a couple ideas, but I'm not sure of their feasibility. Just throwing them out to see if it's valid,

To delete an agent, it could just be swapped with the last one and popped off the end? That way shifting of all higher indices is avoided, and only one (the former last index) needs to be updated.
Deletes could possibly be pooled and avoid even swapping if the indices are simply invalidated. All invalid indices are stored in a SortedSet. Any access to these indices throws an error/does whatever is appropriate. When an agent is added, it simply takes the smallest invalid index. This would maintain the sorted order of ids in the array. However, this approach will leave a lot of invalid indices if the agent population falls drastically.

It's possible I've misunderstood something. If so, please correct me.

Libbum · 2021-04-26T12:30:15Z

Decent ideas @AayushSabharwal. We're looking at any way it's possible to leverage this without causing major slow-downs in other regions of the codebase. It's all 'is this possible' at the moment, so any further ideas are welcomed.

fbanning · 2021-04-26T13:43:33Z

To delete an agent, it could just be swapped with the last one and popped off the end

Definitely feasible, I think.

Keep in mind that the current benchmark only adds 100 agents and doesn't remove any agents at all. So at least from my point of view, the process of deleting agents is of a lower priority right now.

All invalid indices are stored in a `SortedSet`

Possible, yes. However, it would leave a big memory footprint because we would have to keep the rows in the StructArray so that the indexing functions as before.
If we delete an agent, this will leave an unused row in our agent data until a new agent is added there. Furthermore, constant checks are needed to always check whether a given id is valid in the current context of the model. It's basically introducing another lookup which can be avoided by an up-to-date dict in the first place.
Or imagine an empty model where the user adds an agent with id = 1 and another one with id = 101 (for whatever reasons). Having static indices with a set of invalid indices would then require us to add 99 rows to the StructArray which are just placeholders to be filled later, whenever the user decides to add an agent with exactly this id. But the model still needs to store all these "placeholder agents".
I would rather not go down that route with respect to memory efficiency.

It's all 'is this possible' at the moment

Currently I'm inclined to say "yes, but it isn't really more performant".

Compared to our regular "id => struct" implementation, performance is bad enough as it currently is. Just lazily accessing the rows (i.e. accessing a single agent's field values via model[id].property) takes so much computational time, that it just doesn't seem sensible to opt for StructArrays instead of regular structs.
The benefit of aggregation isn't used often enough in standard ABMs (i.e. altering microdata of heterogenous agents) that it would really allow us to leverage the power of a StructArray. The downside seems just too big for me at the moment. If we can somehow avoid that downside of LazyRow access then we might actually have a trade-off that is worthy of consideration. Right now (with my naive implementation here), this doesn't seem feasible.

so any further ideas are welcome

Most definitely!

Datseris · 2021-12-28T12:32:33Z

I guess we can close this? Thanks a lot Frederik for the effort, but the conclusion is that it doesn't really offer that much benefit?

fbanning · 2021-12-31T19:18:24Z

I guess we can close this? Thanks a lot Frederik for the effort, but the conclusion is that it doesn't really offer that much benefit?

I agree because nobody has commented on this PR since April and nobody seems to have found any better ways of implementing StructArrays. It was a nice idea but sadly didn't fulfill its promise. I'll just close this draft PR.

If at some point somebody wants to dive back into this topic, I'd be very happy to chime back in. For now we got other things to focus on. :)

first iteration

5e34d81

fbanning marked this pull request as draft April 22, 2021 16:22

Libbum reviewed Apr 23, 2021

View reviewed changes

src/core/model.jl Outdated Show resolved Hide resolved

add naive id => index dict

d9d6a23

fbanning closed this Dec 31, 2021

fbanning mentioned this pull request Dec 21, 2022

Papers using ABMs in Julia that didn't use Agents.jl #683

Open

Tortar mentioned this pull request Nov 28, 2023

Multi Agent model implementation proposal #841

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement StructArrays as core for agent data #492

Implement StructArrays as core for agent data #492

fbanning commented Apr 22, 2021

fbanning commented Apr 23, 2021

fbanning commented Apr 23, 2021 •

edited

Loading

AayushSabharwal commented Apr 26, 2021

Libbum commented Apr 26, 2021

fbanning commented Apr 26, 2021 •

edited

Loading

Datseris commented Dec 28, 2021

fbanning commented Dec 31, 2021

Implement StructArrays as core for agent data #492

Implement StructArrays as core for agent data #492

Conversation

fbanning commented Apr 22, 2021

StructArrays in Agents.jl?

The mission

The caveats

The benchmarks

The conclusion

fbanning commented Apr 23, 2021

fbanning commented Apr 23, 2021 • edited Loading

AayushSabharwal commented Apr 26, 2021

Libbum commented Apr 26, 2021

fbanning commented Apr 26, 2021 • edited Loading

Datseris commented Dec 28, 2021

fbanning commented Dec 31, 2021

fbanning commented Apr 23, 2021 •

edited

Loading

fbanning commented Apr 26, 2021 •

edited

Loading