
Faster agent attribute collection #576

Merged · 5 commits · Sep 30, 2018

Conversation

Corvince
Contributor

Addresses #575

This implementation adds a fast track for collecting agent attributes when all reporters are string-based (no custom or lambda functions). If no string-based reporters are present, or for a mixture of reporters, it falls back to a slightly improved version of the current implementation.
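The fast track described above can be sketched roughly as follows; the `Agent` class and reporter names here are hypothetical stand-ins, and `operator.attrgetter` is one plausible way to grab all string-named attributes per agent in a single call:

```python
from operator import attrgetter

class Agent:
    """Minimal stand-in for a mesa Agent (hypothetical)."""
    def __init__(self, unique_id, wealth, age):
        self.unique_id = unique_id
        self.wealth = wealth
        self.age = age

# Reporters given as plain attribute names -- the "string-based" case.
reporters = {"wealth": "wealth", "age": "age"}

# Fast track: one attrgetter fetches all reported attributes per agent.
get_reports = attrgetter(*reporters.values())

agents = [Agent(i, wealth=i * 10, age=20 + i) for i in range(3)]
records = [(a.unique_id, *get_reports(a)) for a in agents]
# records -> [(0, 0, 20), (1, 10, 21), (2, 20, 22)]
```

If any reporter is a callable instead of a string, this shortcut no longer applies, which is why a mixed set of reporters has to fall back to the slower path.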

I also changed the structure of the class variable agent_vars so that it stores less data: the agent's unique_id is now stored only once rather than for every reporter. This breaks code that relies on that variable, although none of the examples do.

One way to encourage the use of string-based agent reporters would be to warn users when a reporter is not string-based and point out that they could benefit from switching to string-based reporters (or even announce that support for non-string-based reporters will be dropped in a much later version of mesa).

Hope you like it

Only significantly faster if all agent reporters are attributes
@coveralls

Coverage Status

Coverage increased (+0.1%) to 80.693% when pulling 4ebbb0e on Corvince:FastDC into d662b7a on projectmesa:master.


@coveralls

coveralls commented Aug 23, 2018

Coverage Status

Coverage increased (+0.2%) to 80.753% when pulling 3f4f26b on Corvince:FastDC into d662b7a on projectmesa:master.

@Corvince
Contributor Author

Okay, forget this implementation!

I found a much better way with virtually no overhead!!

Instead of evaluating the reporter functions directly we can use the map function to create a generator and only calculate the values once we build up the dataframe.

Something like:
agent_records = map(reportgetter, model.schedule.agents)

This will also work with custom reporter functions, not just string-based reporters. I could probably still speed up the creation of the dataframe for pure string reporters, but that seems less critical.
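The one-liner above can be sketched end to end; `reportgetter` is a hypothetical getter (here an `attrgetter` over unique_id plus the reported attributes), and the agent/schedule classes are stand-ins for mesa's:

```python
from operator import attrgetter

class Agent:
    """Minimal stand-in for a mesa Agent (hypothetical)."""
    def __init__(self, unique_id, wealth):
        self.unique_id = unique_id
        self.wealth = wealth

agents = [Agent(i, wealth=i * 10) for i in range(3)]

# reportgetter mirrors the line above: unique_id plus reported attributes.
reportgetter = attrgetter("unique_id", "wealth")

# map() builds a lazy iterator -- nothing is evaluated at this point.
agent_records = map(reportgetter, agents)

# Values are only computed when the records are consumed,
# e.g. by list() or by building a DataFrame.
records = list(agent_records)
# records -> [(0, 0), (1, 10), (2, 20)]
```

Note that the laziness is exactly what backfires in the next comment: the values are read at consumption time, not at the time `map()` is called.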

@Corvince
Contributor Author

Oh gosh I wish I could trust my eyes more...

I somehow thought that map would auto-magically track the agents current state at the cost of increased memory usage. I quickly checked it and everything seemed fine, but I must have looked at the wrong data or something.

Unfortunately (but logically), no auto-magic is happening. Because of lazy evaluation, by the time you create the DataFrame, the values stored for all steps reflect only the current state (the last step).

Guess I was too excited 😢 I will look into this further and probably revert to the previous implementation (which speeds things up, but still with some overhead).
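The pitfall described above is easy to reproduce in isolation; this toy sketch (hypothetical names, one agent, one attribute) keeps a lazy `map` per step and only evaluates at the end:

```python
class Agent:
    """Minimal stand-in for a mesa Agent (hypothetical)."""
    def __init__(self, unique_id, wealth):
        self.unique_id = unique_id
        self.wealth = wealth

agents = [Agent(0, wealth=0)]
snapshots = {}

for step in range(3):
    # Lazy map: this stores an iterator, NOT the values at this step.
    snapshots[step] = map(lambda a: a.wealth, agents)
    agents[0].wealth += 1  # agent state keeps changing

# Evaluating afterwards: every "snapshot" sees only the final state.
evaluated = {step: list(gen) for step, gen in snapshots.items()}
# evaluated -> {0: [3], 1: [3], 2: [3]}, not {0: [0], 1: [1], 2: [2]}
```

The fix, as the later comments explain, is to evaluate the map eagerly (e.g. with `list()`) inside `collect`, once per step.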

Until then, please don't merge anything just yet.

@Corvince
Contributor Author

Ok, new implementation online, again. And since I am spamming this thread already, here is some explanation how the current implementation is faster (for those interested).

In the original implementation we had to access each agent 2 * reporters times (because each reporter would also store the agent's unique_id). For 2000 agents and 5 reporters this results in 20,000 memory accesses and operations.
In my implementation we access each agent only once and collect all its attributes using only built-in functions (i.e. only 2000 memory accesses). That is why the speed-up only applies to attribute reporters.
Note, however, that I know little about Python's internal optimizations. The difference in memory accesses is probably smaller than that, but the main speed-up comes from using a built-in function.

Now in the current implementation I have "outsourced" the attribute collection into a function called _record_agents, which maps the attribute collection over each agent. This mapping is very fast; the "slow" part of collecting data is evaluating the map and putting the results in a list, which was missing from the previous (extremely fast) implementation and is now done in the collect function. The advantage of this approach is that we can also write other functions that, for example, write the mapping directly to a database or file without first constructing a list and then iterating over it.
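A minimal sketch of this split, with hypothetical stand-ins for mesa's model/schedule classes: `_record_agents` returns the cheap lazy map, and `collect` evaluates it eagerly so the values are pinned to the current step.

```python
from operator import attrgetter

class _Agent:
    """Minimal stand-ins, not the real mesa classes."""
    def __init__(self, unique_id, wealth):
        self.unique_id = unique_id
        self.wealth = wealth

class _Schedule:
    def __init__(self, agents):
        self.agents = agents
        self.steps = 0

class _Model:
    def __init__(self, agents):
        self.schedule = _Schedule(agents)

class DataCollector:
    """Sketch of the approach described above (simplified, hypothetical)."""
    def __init__(self, agent_reporters):
        # agent_reporters maps column name -> attribute name (all strings)
        self._agent_records = {}  # step -> list of (unique_id, *values)
        self._getter = attrgetter("unique_id", *agent_reporters.values())

    def _record_agents(self, model):
        # Cheap: just builds a lazy map over the agents.
        return map(self._getter, model.schedule.agents)

    def collect(self, model):
        # Eager: evaluating the map here pins the values to this step.
        self._agent_records[model.schedule.steps] = list(self._record_agents(model))

model = _Model([_Agent(0, wealth=5), _Agent(1, wealth=7)])
dc = DataCollector({"wealth": "wealth"})
dc.collect(model)
# dc._agent_records -> {0: [(0, 5), (1, 7)]}
```

A streaming consumer (database or file writer) could iterate over `_record_agents(model)` directly instead of calling `list()`, as suggested above.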

NB: collect now stores agent records in a dictionary with model steps as the keys. That means it would also be trivial to add a "step" keyword to get_agent_vars_dataframe() to return the dataframe for only a single step or a range of steps (the latter not so trivial, notation-wise).
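Given that dictionary layout, the single-step filter mentioned above could look something like this (hypothetical helper, shown without the pandas DataFrame construction):

```python
def iter_agent_rows(agent_records, step=None):
    """Hypothetical helper: agent_records is {step: [(unique_id, value, ...), ...]}.

    Yields flat (step, unique_id, value, ...) rows, optionally for one step only.
    """
    steps = agent_records if step is None else [step]
    for s in steps:
        for record in agent_records[s]:
            yield (s, *record)

records = {0: [(0, 10), (1, 20)], 1: [(0, 11), (1, 21)]}
single_step = list(iter_agent_rows(records, step=1))
# single_step -> [(1, 0, 11), (1, 1, 21)]
```

The resulting rows could then be fed to a DataFrame constructor with ("Step", "AgentID") as the index, matching the shape get_agent_vars_dataframe() returns today.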
