
Faster agent attribute collection #576

Merged · 5 commits · Sep 30, 2018

Conversation

Corvince
Contributor

Addresses #575

This implementation adds a fast track for collecting agent attributes when all reporters are string-based (no custom or lambda functions). If no string-based reporters are present, or for a mixture of reporters, it falls back to a slightly improved version of the current implementation.
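The fast track described above can be sketched roughly as follows; the `Agent` class and reporter names here are hypothetical stand-ins, and `operator.attrgetter` is one plausible way to grab all string-named attributes per agent in a single call:

```python
from operator import attrgetter

class Agent:
    """Minimal stand-in for a mesa Agent (hypothetical)."""
    def __init__(self, unique_id, wealth, age):
        self.unique_id = unique_id
        self.wealth = wealth
        self.age = age

# Reporters given as plain attribute names -- the "string-based" case.
reporters = {"wealth": "wealth", "age": "age"}

# Fast track: one attrgetter fetches all reported attributes per agent.
get_reports = attrgetter(*reporters.values())

agents = [Agent(i, wealth=i * 10, age=20 + i) for i in range(3)]
records = [(a.unique_id, *get_reports(a)) for a in agents]
# records -> [(0, 0, 20), (1, 10, 21), (2, 20, 22)]
```

If any reporter is a callable instead of a string, this shortcut no longer applies, which is why a mixed set of reporters has to fall back to the slower path.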

I also changed the structure of the class variable agent_vars so that it stores less data: the agent's unique_id is now stored only once rather than for every reporter. This breaks code that relies on that variable, although none of the examples do.

One way to encourage the use of string-based agent reporters would be to warn users when a reporter is not string-based and point out that they could benefit from switching to string-based reporters (or even announce that support for non-string-based reporters will be dropped in a much later version of mesa).

Hope you like it

Only significantly faster if all agent reporters are attributes
@coveralls

Coverage Status

Coverage increased (+0.1%) to 80.693% when pulling 4ebbb0e on Corvince:FastDC into d662b7a on projectmesa:master.


@coveralls

coveralls commented Aug 23, 2018

Coverage Status

Coverage increased (+0.2%) to 80.753% when pulling 3f4f26b on Corvince:FastDC into d662b7a on projectmesa:master.

@Corvince
Contributor Author

Okay, forget this implementation!

I found a much better way with virtually no overhead!!

Instead of evaluating the reporter functions directly we can use the map function to create a generator and only calculate the values once we build up the dataframe.

Something like:
agent_records = map(reportgetter, model.schedule.agents)

This will also work with custom reporter functions, not just string-based reporters. I could probably still speed up the creation of the dataframe for pure string reporters, but that seems less critical.
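The one-liner above can be sketched end to end; `reportgetter` is a hypothetical getter (here an `attrgetter` over unique_id plus the reported attributes), and the agent/schedule classes are stand-ins for mesa's:

```python
from operator import attrgetter

class Agent:
    """Minimal stand-in for a mesa Agent (hypothetical)."""
    def __init__(self, unique_id, wealth):
        self.unique_id = unique_id
        self.wealth = wealth

agents = [Agent(i, wealth=i * 10) for i in range(3)]

# reportgetter mirrors the line above: unique_id plus reported attributes.
reportgetter = attrgetter("unique_id", "wealth")

# map() builds a lazy iterator -- nothing is evaluated at this point.
agent_records = map(reportgetter, agents)

# Values are only computed when the records are consumed,
# e.g. by list() or by building a DataFrame.
records = list(agent_records)
# records -> [(0, 0), (1, 10), (2, 20)]
```

Note that the laziness is exactly what backfires in the next comment: the values are read at consumption time, not at the time `map()` is called.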

@Corvince
Contributor Author

Oh gosh I wish I could trust my eyes more...

I somehow thought that map would auto-magically track the agents current state at the cost of increased memory usage. I quickly checked it and everything seemed fine, but I must have looked at the wrong data or something.

Unfortunately (but logically), no auto-magic is happening. Because of lazy evaluation, by the time you create the DataFrame, the values stored for all steps reflect only the current state (the last step).

Guess I was too excited 😢 I will look into this further and probably revert to the previous implementation (which speeds things up, but still with some overhead).
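The pitfall described above is easy to reproduce in isolation; this toy sketch (hypothetical names, one agent, one attribute) keeps a lazy `map` per step and only evaluates at the end:

```python
class Agent:
    """Minimal stand-in for a mesa Agent (hypothetical)."""
    def __init__(self, unique_id, wealth):
        self.unique_id = unique_id
        self.wealth = wealth

agents = [Agent(0, wealth=0)]
snapshots = {}

for step in range(3):
    # Lazy map: this stores an iterator, NOT the values at this step.
    snapshots[step] = map(lambda a: a.wealth, agents)
    agents[0].wealth += 1  # agent state keeps changing

# Evaluating afterwards: every "snapshot" sees only the final state.
evaluated = {step: list(gen) for step, gen in snapshots.items()}
# evaluated -> {0: [3], 1: [3], 2: [3]}, not {0: [0], 1: [1], 2: [2]}
```

The fix, as the later comments explain, is to evaluate the map eagerly (e.g. with `list()`) inside `collect`, once per step.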

Until then, please don't merge anything just yet.

@Corvince
Contributor Author

Ok, new implementation online, again. And since I am spamming this thread already, here is some explanation how the current implementation is faster (for those interested).

In the original implementation we had to access each agent 2 * reporters times (because each reporter would also store the agent's unique_id). For 2000 agents and 5 reporters this results in 20,000 memory accesses and operations.
In my implementation we access each agent only once and collect all its attributes using only built-in functions (i.e. only 2000 memory accesses). That is why the speed-up only applies to attribute reporters.
Note, however, that I know little about Python's internal optimizations. The difference in memory accesses is probably smaller than that, but the main speed-up comes from using a built-in function.

Now in the current implementation I have "outsourced" the attribute collection into a function called _record_agents, which maps the attribute collection over each agent. This mapping is very fast; the "slow" part of collecting data is evaluating the map and putting the results in a list, which was missing from the previous (extremely fast) implementation and is now done in the collect function. The advantage of this approach is that we can also write other functions that, for example, write the mapping directly to a database or file without first constructing a list and then iterating over it.
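A minimal sketch of this split, with hypothetical stand-ins for mesa's model/schedule classes: `_record_agents` returns the cheap lazy map, and `collect` evaluates it eagerly so the values are pinned to the current step.

```python
from operator import attrgetter

class _Agent:
    """Minimal stand-ins, not the real mesa classes."""
    def __init__(self, unique_id, wealth):
        self.unique_id = unique_id
        self.wealth = wealth

class _Schedule:
    def __init__(self, agents):
        self.agents = agents
        self.steps = 0

class _Model:
    def __init__(self, agents):
        self.schedule = _Schedule(agents)

class DataCollector:
    """Sketch of the approach described above (simplified, hypothetical)."""
    def __init__(self, agent_reporters):
        # agent_reporters maps column name -> attribute name (all strings)
        self._agent_records = {}  # step -> list of (unique_id, *values)
        self._getter = attrgetter("unique_id", *agent_reporters.values())

    def _record_agents(self, model):
        # Cheap: just builds a lazy map over the agents.
        return map(self._getter, model.schedule.agents)

    def collect(self, model):
        # Eager: evaluating the map here pins the values to this step.
        self._agent_records[model.schedule.steps] = list(self._record_agents(model))

model = _Model([_Agent(0, wealth=5), _Agent(1, wealth=7)])
dc = DataCollector({"wealth": "wealth"})
dc.collect(model)
# dc._agent_records -> {0: [(0, 5), (1, 7)]}
```

A streaming consumer (database or file writer) could iterate over `_record_agents(model)` directly instead of calling `list()`, as suggested above.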

NB: collect now stores agent records in a dictionary with model steps as the keys. That means it would also be trivial to add a "step" keyword to get_agent_vars_dataframe() to return the dataframe for only a single step or a range of steps (the latter not so trivial, notation-wise).
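Given that dictionary layout, the single-step filter mentioned above could look something like this (hypothetical helper, shown without the pandas DataFrame construction):

```python
def iter_agent_rows(agent_records, step=None):
    """Hypothetical helper: agent_records is {step: [(unique_id, value, ...), ...]}.

    Yields flat (step, unique_id, value, ...) rows, optionally for one step only.
    """
    steps = agent_records if step is None else [step]
    for s in steps:
        for record in agent_records[s]:
            yield (s, *record)

records = {0: [(0, 10), (1, 20)], 1: [(0, 11), (1, 21)]}
single_step = list(iter_agent_rows(records, step=1))
# single_step -> [(1, 0, 11), (1, 1, 21)]
```

The resulting rows could then be fed to a DataFrame constructor with ("Step", "AgentID") as the index, matching the shape get_agent_vars_dataframe() returns today.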
