Fix #1419. DataCollector accepts an arbitrary schedule at creation (d… #1481
…creation (defaults to model.schedule otherwise) and will return None if an attribute is not found instead of throwing an AttributeError.
I'm unclear why the black hook is failing. Running |
You need to apply black formatter to your changed files after pip installing it |
Whoop, yes done. |
Codecov Report
Base: 91.34% // Head: 91.13% // Decreases project coverage by 0.21%.
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1481      +/-   ##
==========================================
- Coverage   91.34%   91.13%   -0.21%
==========================================
  Files          15       15
  Lines        1306     1309       +3
  Branches      223      226       +3
==========================================
  Hits         1193     1193
- Misses         80       81       +1
- Partials       33       35       +2
mesa/datacollection.py
        return agent_records

    @staticmethod
    def _get_reports(collector, steps, agent):
This method name is too generic. It could have meant getting the model report.
It's not used elsewhere in the code. Why not put the function back in _record_agents?
I wrongly assumed that by avoiding redefining the same function every call we'd get a performance improvement. This was not the case.
I used the introductory tutorial at "Collecting Data" and removed the model_reporters, then timed this:
from timeit import default_timer as timer # not called while timing, just here to show what package I was using
start = timer()
model.datacollector.collect(model)
end = timer()
print(end - start) # time in seconds
For each variation:
6.0040998505428433e-05 # original
0.00016220801626332104 # this commit
0.00014566699974238873 # leaving the function in _record_agents
4.683298175223172e-05 # leaving the function in _record_agents and removing the model argument in _record_agents that is no longer used
So actually I was mistaken but we can get it to be pretty fast by leaving the func inside and removing the unused arg. The functions in question for the last timed test are:
def _record_agents(self, schedule):
    """Record agents data in a mapping of functions and agents."""
    rep_funcs = self.agent_reporters.values()

    def get_reports(agent):
        _prefix = (schedule.steps, agent.unique_id)
        reports = tuple(rep(agent) for rep in rep_funcs)
        return _prefix + reports

    agent_records = map(get_reports, schedule.agents)
    return agent_records

def collect(self, model):
    """Collect all the data for the given model object."""
    if self.schedule is None:
        schedule = model.schedule
    else:
        schedule = self.schedule
    if self.model_reporters:
        for var, reporter in self.model_reporters.items():
            # Check if Lambda operator
            if isinstance(reporter, types.LambdaType):
                self.model_vars[var].append(reporter(model))
            # Check if model attribute
            elif isinstance(reporter, partial):
                self.model_vars[var].append(reporter(model))
            # Check if function with arguments
            elif isinstance(reporter, list):
                self.model_vars[var].append(reporter[0](*reporter[1]))
            else:
                self.model_vars[var].append(self._reporter_decorator(reporter))
    if self.agent_reporters:
        agent_records = self._record_agents(schedule)
        self._agent_records[schedule.steps] = list(agent_records)
If we like that, I can commit whichever is preferred. Thoughts?
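Single-run timings like the ones above are noisy, so a repeated measurement may give a steadier comparison. The following is only a rough sketch, not Mesa code: the Agent class, the record_nested and record_external functions, and the module-level _get_reports helper are hypothetical stand-ins for the two variants discussed here, and the agent count and repeat counts are arbitrary assumptions.

```python
import timeit


class Agent:
    """Toy stand-in for a Mesa agent (hypothetical, for timing only)."""

    def __init__(self, unique_id):
        self.unique_id = unique_id
        self.wealth = unique_id % 7


def record_nested(agents, steps, rep_funcs):
    """Variant with the helper defined inside the recording function."""

    def get_reports(agent):
        _prefix = (steps, agent.unique_id)
        reports = tuple(rep(agent) for rep in rep_funcs)
        return _prefix + reports

    return list(map(get_reports, agents))


def _get_reports(rep_funcs, steps, agent):
    """Module-level helper, standing in for the static-method variant."""
    return (steps, agent.unique_id) + tuple(rep(agent) for rep in rep_funcs)


def record_external(agents, steps, rep_funcs):
    """Variant that calls the external helper once per agent."""
    return [_get_reports(rep_funcs, steps, agent) for agent in agents]


agents = [Agent(i) for i in range(1000)]
rep_funcs = [lambda a: a.wealth]

# Take the minimum over several repeats; a single run is dominated by noise.
for fn in (record_nested, record_external):
    best = min(timeit.repeat(lambda: fn(agents, 0, rep_funcs), number=50, repeat=5))
    print(f"{fn.__name__}: {best / 50:.2e} s per call")
```

The absolute numbers will differ by machine, so only the relative ordering of the variants is worth reading off.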
I think having the function defined inside _record_agents is clearer. This way, you can have the name remain short, get_reports.
The only reason we need to pass in |
I thought about this but I decided against it because that would negate what I was trying to do: namely, to have an arbitrary schedule be data-collected (really the agents in the schedule) where a model can have several schedules with different types of agents. I was also able to mix the agents together if I wanted them all intermingled during a |
But you can still have a data collector |
To make sure I'm understanding, we're suggesting that I noticed that we pass I'm also unclear on why we would want to pass in a schedule to a datacollector at the |
Great discussion, now that we talk about it I wonder if we should require a model or schedule at all. I think we could achieve a clearer separation of concerns if we keep the data collection as independent as possible. We could also just pass in a list of agents to the collect method and then collect over these agents. I am not sure anymore about the advantage of baking them in. Expanding on this idea we could gain maximal flexibility by also providing the reporter functions per collect call. |
This is nice: to be able to collect on a subset of agents (a representative sample, say). My question is then: what about model level reporters? Would
Is there a reason one would want to change the reporter functions during the execution of a model? I also don't know enough about CS to know whether or not this would create performance issues. What're your thoughts? |
Yeah, I didn't really think about model level parameters. I don't have an idea off the top of my head, but I don't think we need to consider multi-model simulations. I don't think this is a common use case (although maybe one should think about it; maybe it is just something that was feasible until now).
Nah, I don't think reporters would change. I was thinking in the direction that you could use different reporters for different sets of agents (so some reporters a, b for type A, and some other reporters c, d for type B). |
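The idea floated above, passing both the agents and the reporter functions to each collect call, could look roughly like this. It is only a sketch under stated assumptions: the collect function, the AgentA and AgentB classes, and the reporter names a, b, c, d are all hypothetical and are not part of Mesa's DataCollector.

```python
from dataclasses import dataclass


@dataclass
class AgentA:
    """Hypothetical agent of type A with attributes a and b."""

    unique_id: int
    a: float
    b: float


@dataclass
class AgentB:
    """Hypothetical agent of type B with attributes c and d."""

    unique_id: int
    c: float
    d: float


def collect(step, agents, reporters):
    """Return one record per agent: (step, unique_id, *reported values)."""
    return [
        (step, agent.unique_id) + tuple(rep(agent) for rep in reporters.values())
        for agent in agents
    ]


type_a_agents = [AgentA(0, 5.0, 1.0), AgentA(1, 2.5, 3.0)]
type_b_agents = [AgentB(2, 1.0, 9.0)]

# Each call names its own agents and its own reporters.
a_records = collect(0, type_a_agents, {"a": lambda x: x.a, "b": lambda x: x.b})
b_records = collect(0, type_b_agents, {"c": lambda x: x.c, "d": lambda x: x.d})

print(a_records)  # [(0, 0, 5.0, 1.0), (0, 1, 2.5, 3.0)]
print(b_records)  # [(0, 2, 1.0, 9.0)]
```

Because nothing is baked in at construction time, this shape does not answer the open question about model-level reporters; it only covers agent-level data.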
Agreed. I suppose one new possibility would be the same agents acting with different models in the same simulation. But even then I'm unsure how this could not be accomplished by just using different groups of agents that have different characteristics.
Makes sense. My way of solving this was to just have different schedules (which are then passed to different datacollectors, potentially) and mix them when stepping through the model. I was able to make a simple subclass of the I suppose I'm also wary of breaking changes which might be the case if we made it possible to send in a list of agents to collect data from and completely separated the |
I forgot that |
This is not accurate. It should be "to allow associating a |
@rht Great, shall I commit with the changes above; i.e. keeping |
Yes, sounds good. |
Thanks for this effort! When you've processed the review comments, could you add a (few) usage examples? Maybe we also can mention it in one of the tutorials. |
Yes, no problem. Should that go in examples/ or shall I update some of the docs? |
…() now accepts a schedule instead of a model. Attributes that do not exist return None instead of an AttributeError.
I think for the scope of this PR, adding a test case for a custom schedule object would be sufficient. As for the doc/example, we don't have a documentation .rst that describes the complete features of the |
Looking at To be completely honest, I've never written a unit test in my life so I suppose I'm looking for some guidance here. Should I just write a simple script, semi-based on [EDIT: Ended up just adding some sample code to the useful-snippets file. Seemed easiest and most appropriate since I figure most people won't end up using this functionality regularly and it feels weird to write unit tests for custom classes that don't appear natively in the project.] |
…n the same model.
Content wise this is perfect and really useful, especially since Mesa is used a lot as a teaching tool. Good, accessible documentation and examples are essential for that, so thanks! The document doesn't seem to render correctly however, I think you need to add a white line after the We also might have to mention (link to) this snippet in a tutorial or example, but that can indeed be done in another PR. Maybe we can expand the wolf_sheep example, since it already uses different agent types. (CC @tpike3) |
How close are we to a merge on this? I can use this to show snippets as I rerecord session 6 |
It's almost there. Just need to tidy the code snippet. |
Will have it done by this afternoon PST. |
docs/useful-snippets/tmp.py
@@ -0,0 +1,49 @@
from random import shuffle
This file is accidentally committed.
Solved! Sorry, oversight.
OK, pending the discussion result of #1506. |
Hi @woolgathering! @rht, @tpike3, @jackiekazil and I discussed multiple agent types for the DataCollector. We discussed multiple options, including adding a type attribute to the Agent class (consistent, but conditional filtering would take a large performance hit) and layering the DataCollector with a dict key for each agent type like in #1142 (not very scalable to other use cases). The problem we converged on was that Mesa supports a single scheduler and a single datacollector by default. This PR adds multiple schedulers with multiple datacollectors. However, there is also a use case for a single scheduler (for example Do you think this PR supports (or can be adapted to) that case of a single scheduler with multiple data collectors, and if so would you be willing to add an additional example for that? |
Use cases to consider:
|
Very good cases! Aside from examples, how complex would it be to add tests for these cases? |
I think we can do this in a separate PR, because it requires a scheduler that groups not by |
@woolgathering, what do you think? |
I feel we now have some momentum on this issue, and I would really like not to lose it again this time. @woolgathering, have you seen the comment above, and what do you think of it? Especially about the single-scheduler multi-datacollector case/example? If you don't have time to work on it, that is also perfectly fine; please let us know, and we can make another plan forward. |
I'm generally of the opinion that PRs should be fairly limited in scope, so adding the test cases would probably be best in another PR. I'm honestly not sure how common those cases will be.
I was thinking about the single-scheduler, multiple-collector case this weekend. It seems that the cleanest way would be to just pass the collector a list of the agents one wants to track. The problem there, of course, is that this would seem to require relatively significant reworking of the DataCollector class and would complicate adding/removing agents. Given the way this fix is currently written, I can't think of a simple way of doing it.
Suppose I have 1000 agents but only want to track 100 specific agents. This would be a case where one could put all the agents into a single scheduler, then capture only the 100 desired agents in the data collector. This is my understanding of the problem.
However... the same thing could be accomplished using multiple schedulers mixed together, as in the example I wrote in the snippets. This could even be adapted to the case where one wants a specific order (in a single scheduler, all agents have a
So I think overall I agree with @rht in that those cases could work out of the box and get the same end result, even if on the execution layer it looks different. |
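The multiple-schedulers route for the 1000-agents, 100-tracked case might be sketched as below. This is an illustration only and not the snippet from the PR: SimpleSchedule, model_step and record_agents are hypothetical names, the schedule is a bare container rather than a Mesa scheduler, and the random sample of tracked agents is an assumption.

```python
import random


class Agent:
    """Toy agent whose wealth grows by one each step (hypothetical)."""

    def __init__(self, unique_id):
        self.unique_id = unique_id
        self.wealth = 0

    def step(self):
        self.wealth += 1


class SimpleSchedule:
    """Bare container with agents and a step counter (not Mesa's scheduler)."""

    def __init__(self, agents=()):
        self.agents = list(agents)
        self.steps = 0


rng = random.Random(42)
all_agents = [Agent(i) for i in range(1000)]
tracked_ids = set(rng.sample(range(1000), 100))

# Two schedules: one holds only the agents we want data from.
tracked = SimpleSchedule(a for a in all_agents if a.unique_id in tracked_ids)
untracked = SimpleSchedule(a for a in all_agents if a.unique_id not in tracked_ids)


def model_step():
    """Step every agent in one shuffled pass, mixing both schedules."""
    mixed = tracked.agents + untracked.agents
    rng.shuffle(mixed)
    for agent in mixed:
        agent.step()
    tracked.steps += 1
    untracked.steps += 1


def record_agents(schedule, rep_funcs):
    """Collect (steps, unique_id, *reports) for the agents in one schedule."""
    return [
        (schedule.steps, a.unique_id) + tuple(rep(a) for rep in rep_funcs)
        for a in schedule.agents
    ]


model_step()
records = record_agents(tracked, [lambda a: a.wealth])
print(len(records))  # 100
```

All 1000 agents are stepped together, while only the 100 in the tracked schedule are ever visited by the collection code.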
Thanks for your extensive reply and the time thinking about this! While I feel using multiple schedulers is not the most intuitive for users, it’s an improvement over the current situation when that means the datacollector can be used for each scheduler. If you and @rht think another example with that use case is out of the scope of this PR, I would say go ahead and merge it.
@tpike3 this opportunity window has probably passed right? |
I sketched an example for 3 data collectors, 1 scheduler ( |
Hmmm, I wonder if it would be best to be able to pass a list of agents to the DataCollector after all... then if it's That may not be as hard, though, especially if the list were passed on each |
It could as well be a I think for now, to solve the Sugarscape G1MT for the complexity tutorial use case (1 scheduler, multiple data collectors), we should just implement a custom filter. The mental model with the custom filter is much simpler. Also, it's only 200 agents at most, and so should be fine. We can have a more performant option later. |
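A custom filter for the one-scheduler, several-collectors case could be as small as the following. It is a sketch under assumptions, not the implementation that was eventually written: FilteredCollector, of_kind, the kind attribute and the "sugar"/"spice"/"trader" labels are all hypothetical, and a plain list stands in for the scheduler's agents.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    unique_id: int
    kind: str  # hypothetical type label: "sugar", "spice" or "trader"
    amount: float


class FilteredCollector:
    """Toy collector that records only agents accepted by agent_filter."""

    def __init__(self, reporters, agent_filter=None):
        self.reporters = reporters
        # With no filter given, every agent is collected.
        self.agent_filter = agent_filter if agent_filter is not None else (lambda agent: True)
        self.records = []

    def collect(self, step, agents):
        for agent in filter(self.agent_filter, agents):
            values = tuple(rep(agent) for rep in self.reporters.values())
            self.records.append((step, agent.unique_id) + values)


def of_kind(kind):
    """Build a predicate that accepts agents with the given kind label."""
    return lambda agent: agent.kind == kind


# One shared list of agents, standing in for a single scheduler.
scheduled_agents = [
    Agent(0, "sugar", 4.0),
    Agent(1, "spice", 2.0),
    Agent(2, "trader", 7.5),
    Agent(3, "trader", 1.5),
]

# Three collectors over the same agents, one per kind.
collectors = {
    kind: FilteredCollector({"amount": lambda a: a.amount}, of_kind(kind))
    for kind in ("sugar", "spice", "trader")
}

for step in range(2):
    for collector in collectors.values():
        collector.collect(step, scheduled_agents)

print(collectors["trader"].records)
# [(0, 2, 7.5), (0, 3, 1.5), (1, 2, 7.5), (1, 3, 1.5)]
```

Every collector scans all agents on every call, which is the performance cost mentioned earlier in the thread; with a couple of hundred agents that scan is cheap.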
I should add that this is convenient for rapidly prototyping a model before scaling it up. |
Inspired other updates to datacollector |
…efaults to model.schedule otherwise) and will return None if an attribute is not found instead of throwing an AttributeError.
Fix #1419.
DataCollector now accepts an optional fourth argument, schedule, which is an arbitrary schedule in the model. If none is provided, the DataCollector defaults to previous behavior, assuming that there exists an attribute to each agent called model.schedule.
This also does not result in an AttributeError when an attribute does not exist for an agent. It instead returns None.
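The two behaviors described here can be illustrated with a small stand-alone toy. This is not the PR's code or Mesa's DataCollector: SketchCollector is a hypothetical class, reporters are given as attribute-name strings only, and SimpleNamespace objects stand in for the model, schedules and agents.

```python
from types import SimpleNamespace


class SketchCollector:
    """Toy collector: optional schedule, missing attributes reported as None."""

    def __init__(self, agent_reporters, schedule=None):
        self.agent_reporters = agent_reporters  # maps column name -> attribute name
        self.schedule = schedule
        self.agent_records = {}

    def collect(self, model):
        # Fall back to model.schedule when no schedule was given at creation.
        schedule = self.schedule if self.schedule is not None else model.schedule
        self.agent_records[schedule.steps] = [
            (schedule.steps, agent.unique_id)
            + tuple(getattr(agent, attr, None) for attr in self.agent_reporters.values())
            for agent in schedule.agents
        ]


wolf = SimpleNamespace(unique_id=0, energy=4)
sheep = SimpleNamespace(unique_id=1)  # has no "energy" attribute

main = SimpleNamespace(agents=[wolf, sheep], steps=0)
wolves_only = SimpleNamespace(agents=[wolf], steps=0)
model = SimpleNamespace(schedule=main)

default_dc = SketchCollector({"Energy": "energy"})
custom_dc = SketchCollector({"Energy": "energy"}, schedule=wolves_only)

default_dc.collect(model)
custom_dc.collect(model)

print(default_dc.agent_records[0])  # [(0, 0, 4), (0, 1, None)]
print(custom_dc.agent_records[0])   # [(0, 0, 4)]
```

The sheep row shows the None fallback, and the second collector shows that a schedule passed at creation takes precedence over model.schedule.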