Fix #1419. DataCollector accepts an arbitrary schedule at creation (d… #1481

woolgathering · 2022-10-29T21:13:58Z

…efaults to model.schedule otherwise) and will return None if an attribute is not found instead of throwing an AttributeError.

Fix #1419. DataCollector now accepts an optional fourth argument schedule which is an arbitrary schedule in the model. If none is provided, the DataCollector defaults to previous behavior, assuming that there exists an attribute to each agent called model.scheudle.

This also does not result in an AttributeError when an attribute does not exist for an agent. It instead returns None.

…creation (defaults to model.schedule otherwise) and will return None if an attribute is not found instead of throwing an AttributeError.

woolgathering · 2022-10-29T21:22:02Z

I'm unclear why the black hook is failing. Running flake8 . --ignore=F403,E501,E123,E128,W504,W503 --exclude=docs,build on my machine returns no errors. Please advise.

Tortar · 2022-10-29T21:40:38Z

You need to apply black formatter to your changed files after pip installing it

woolgathering · 2022-10-30T03:14:48Z

Whoop, yes done.

codecov · 2022-10-30T05:53:00Z

Codecov Report

Base: 91.34% // Head: 91.13% // Decreases project coverage by -0.20% ⚠️

Coverage data is based on head (d486f2e) compared to base (952857c).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1481      +/-   ##
==========================================
- Coverage   91.34%   91.13%   -0.21%     
==========================================
  Files          15       15              
  Lines        1306     1309       +3     
  Branches      223      226       +3     
==========================================
  Hits         1193     1193              
- Misses         80       81       +1     
- Partials       33       35       +2

Impacted Files	Coverage Δ
mesa/datacollection.py	`95.29% <100.00%> (-2.39%)`	⬇️
mesa/space.py	`95.78% <0.00%> (-0.45%)`	⬇️
mesa/batchrunner.py	`92.88% <0.00%> (-0.04%)`	⬇️
mesa/visualization/TextVisualization.py	`39.47% <0.00%> (ø)`
mesa/visualization/ModularVisualization.py	`78.88% <0.00%> (ø)`
mesa/visualization/modules/ChartVisualization.py	`91.66% <0.00%> (ø)`
mesa/visualization/modules/HexGridVisualization.py	`96.29% <0.00%> (ø)`
mesa/visualization/modules/NetworkVisualization.py	`100.00% <0.00%> (ø)`
...esa/visualization/modules/BarChartVisualization.py	`88.88% <0.00%> (ø)`
... and 3 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

rht · 2022-10-30T06:30:28Z

mesa/datacollection.py

        return agent_records

+    @staticmethod
+    def _get_reports(collector, steps, agent):


This method name is too generic. It could have meant getting the model report.

It's not used elsewhere in the code. Why not put the function back in _record_agents?

I wrongly assumed that by avoiding redefining the same function every call we'd get a performance improvement. This was not the case.

I used the introductory tutorial at "Collecting Data" and removed the model_reporters, then timed this:

from timeit import default_timer as timer # not called while timing, just here to show what package I was using start = timer() model.datacollector.collect(model) end = timer() print(end - start) # time in seconds

For each variation:

6.0040998505428433e-05 # original 0.00016220801626332104 # this commit 0.00014566699974238873 # leaving the function in _record_agents 4.683298175223172e-05 # leaving the function in _record_agents and removing the model argument in _record_agents that is no longer used

So actually I was mistaken but we can get it to be pretty fast by leaving the func inside and removing the unused arg. The functions in question for the last timed test are:

def _record_agents(self, schedule): """Record agents data in a mapping of functions and agents.""" rep_funcs = self.agent_reporters.values() def get_reports(agent): _prefix = (schedule.steps, agent.unique_id) reports = tuple(rep(agent) for rep in rep_funcs) return _prefix + reports agent_records = map(get_reports, schedule.agents) return agent_records def collect(self, model): """Collect all the data for the given model object.""" if self.schedule is None: schedule = model.schedule else: schedule = self.schedule if self.model_reporters: for var, reporter in self.model_reporters.items(): # Check if Lambda operator if isinstance(reporter, types.LambdaType): self.model_vars[var].append(reporter(model)) # Check if model attribute elif isinstance(reporter, partial): self.model_vars[var].append(reporter(model)) # Check if function with arguments elif isinstance(reporter, list): self.model_vars[var].append(reporter[0](*reporter[1])) else: self.model_vars[var].append(self._reporter_decorator(reporter)) if self.agent_reporters: agent_records = self._record_agents(schedule) self._agent_records[schedule.steps] = list(agent_records)

If we like that, I can commit whichever is preferred. Thoughts?

I think having the function defined inside _record_agents is clearer. This way, you can have the name remain short, get_reports.

rht · 2022-10-30T07:16:23Z

The only reason we need to pass in model in collect is to access its scheduler. I wonder if we should make model an optional argument in collect.

woolgathering · 2022-10-30T21:07:44Z

The only reason we need to pass in model in collect is to access its scheduler. I wonder if we should make model an optional argument in collect.

I thought about this but I decided against it because that would negate what I was trying to do: namely, to have an arbitrary schedule be data-collected (really the agents in the schedule) where a model can have several schedules with different types of agents. I was also able to mix the agents together if I wanted them all intermingled during a step() with a simple schedule subclass which did not interfere with the data collecting.

rht · 2022-10-31T03:29:18Z

But you can still have a data collector collect receiving arbitrary schedule without model. Making it optional doesn't negate this capability.

woolgathering · 2022-10-31T17:46:45Z

To make sure I'm understanding, we're suggesting that collect have two arguments: model (optional) and schedule (optional), correct?

I noticed that we pass model to the reporter() when we loop over the reporters. If schedule is defined in collect, we would just do reporter(schedule.model). But if model is defined in collect, we then do reporter(model), yes? I just want to make sure I understand before I start making commits.

I'm also unclear on why we would want to pass in a schedule to a datacollector at the collect stage. This to me would imply that someone would want to change the collected schedule while the model is running which seems weird. I would think that a datacollector is "tied" to a specific schedule with specific agents and that we wouldn't want to change it while the model is running, hence having the datacollector have a schedule attribute.

Corvince · 2022-10-31T18:04:28Z

Great discussion, now that we talk about it I wonder if we should require a model or schedule at all.

I think we could achieve a clearer separation of concerns if we keep the data collection as independent as possible.

We could also just pass in a list of agents to the collect method and then collect over these agents. I am not sure anymore about the advantage of baking them in. Expanding on this idea we could gain maximal flexibility by also providing the reporter functions per collect call.

woolgathering · 2022-10-31T18:39:55Z

We could also just pass in a list of agents to the collect method and then collect over these agents. I am not sure anymore about the advantage of baking them in.

This is nice: to be able to collect on a subset of agents (a representative sample, say). My question is then: what about model level reporters? Would collect accept a list of agents and a model on which to collect model level info? I also can't think of an instance in which I'd want to have more than one model running in a simulation, though it's entirely possible that I just haven't encountered the use-case yet.

Expanding on this idea we could gain maximal flexibility by also providing the reporter functions per collect call.

Is there a reason one would want to change the reporter functions during the execution of a model? I also don't know enough about CS to know whether or not this would create performance issues. What're your thoughts?

Corvince · 2022-10-31T20:55:14Z

We could also just pass in a list of agents to the collect method and then collect over these agents. I am not sure anymore about the advantage of baking them in.

This is nice: to be able to collect on a subset of agents (a representative sample, say). My question is then: what about model level reporters? Would collect accept a list of agents and a model on which to collect model level info? I also can't think of an instance in which I'd want to have more than one model running in a simulation, though it's entirely possible that I just haven't encountered the use-case yet.

Yeah, I didn't really think about model level parameters. I don't have an idea on the top of my head, but I don't think we need to consider multi model simulations. I don't think this is a common use case (although maybe one should think about it, maybe it is just something that was feasible until now).

Expanding on this idea we could gain maximal flexibility by also providing the reporter functions per collect call.

Is there a reason one would want to change the reporter functions during the execution of a model? I also don't know enough about CS to know whether or not this would create performance issues. What're your thoughts?

Na I don't think reporters would change, I was thinking in the direction that you could use different reporters for different sets of agents. ( So some reporters a, b for type A, some other reporters c, d for type B)

Corvince · 2022-10-31T20:59:44Z

We could also just pass in a list of agents to the collect method and then collect over these agents. I am not sure anymore about the advantage of baking them in.

This is nice: to be able to collect on a subset of agents (a representative sample, say). My question is then: what about model level reporters? Would collect accept a list of agents and a model on which to collect model level info? I also can't think of an instance in which I'd want to have more than one model running in a simulation, though it's entirely possible that I just haven't encountered the use-case yet.

Yeah, I didn't really think about model level parameters. I don't have an idea on the top of my head, but I don't think we need to consider multi model simulations. I don't think this is a common use case (although maybe one should think about it, maybe it is just something that was feasible until now).

Expanding on this idea we could gain maximal flexibility by also providing the reporter functions per collect call.

Is there a reason one would want to change the reporter functions during the execution of a model? I also don't know enough about CS to know whether or not this would create performance issues. What're your thoughts?

Na I don't think reporters would change, I was thinking in the direction that you could use different reporters for different sets of agents. ( So some reporters a, b for type A, some other reporters c, d for type B)

woolgathering · 2022-10-31T22:21:11Z

Yeah, I didn't really think about model level parameters. I don't have an idea on the top of my head, but I don't think we need to consider multi model simulations. I don't think this is a common use case (although maybe one should think about it, maybe it is just something that was feasible until now).

Agreed. I suppose one new thing that would be possible would be the possibility of the same agents acting with different models in the same simulation. But even then I'm unsure how this could not be accomplished by just using different groups of agents that have different characteristics.

Na I don't think reporters would change, I was thinking in the direction that you could use different reporters for different sets of agents. ( So some reporters a, b for type A, some other reporters c, d for type B)

Makes sense. My way of solving this was to just have different schedules (which are then passed to different datacollectors, potentially) and mixing them when stepping through the model. I was able to make a simple subclass of the BaseScheduler that took in multiple schedules and mixed the agents in them. So sort of like RandomActivation but it could mix agents from multiple schedules. This achieves the same functionality as what you're suggesting if all the agents are the same type, if I'm understanding correctly.

I suppose I'm also weary of breaking changes which might be the case if we made it possible to send in a list of agents to collect data from and completely separated the DataCollector out. I'd leave that up to the existing contributors to decide if/when that is appropriate.

rht · 2022-11-01T05:29:51Z

I forgot that model is needed in collect for the model reporters. Let's keep the scope of this PR small for now, i.e. to allow passing an explicit schedule object in collect. The use case is specifically when you have multiple schedulers in one model.

rht · 2022-11-01T05:33:51Z

to allow passing an explicit schedule object in collect.

This is not accurate. It should be "to allow associating a DataCollector object with a custom scheduler, in case a model has multiple schedulers"

woolgathering · 2022-11-01T07:13:10Z

@rht Great, shall I commit with the changes above; i.e. keeping get_reports defined inside _record_agents and passing in model to collect()? I thought it was pretty clean overall and it ran pretty quick.

rht · 2022-11-01T08:37:16Z

Yes, sounds good.

EwoutH · 2022-11-01T21:47:28Z

Thanks for this effort! When you've processed the review comments, could you add a (few) usage examples?

Maybe we also can mention it in one of the tutorials.

woolgathering · 2022-11-02T05:05:59Z

Thanks for this effort! When you've processed the review comments, could you add a (few) usage examples?

Yes, no problem. Should that go in examples/ or shall I update some of the docs?

…() now accepts a scheudle instead of a model. Attributes that do not exist return None instead of an AttributeError.

rht · 2022-11-02T05:27:42Z

I think for the scope of this PR, adding a test case for a custom schedule object would be sufficient.

As for the doc/example, we don't have a documentation .rst that describes the complete features of the DataCollector yet. Having to make an example simulation code for a particular feature is overkill, considering that an example usually consists of multiple files. Maybe a short doc in https://github.com/projectmesa/mesa/tree/main/docs/useful-snippets would do for now. And we can defer this documentation to a separate PR.

woolgathering · 2022-11-02T05:55:19Z

Looking at test_datacollector.py as an example, it looks like I'd need to edit the initialize_data_collector() function in the Model class since it would assign the DataCollector to self.datacollector in every case (i.e. overwrite if I called it multiple times for multiple collectors and schedules).

To be completely honest, I've never written a unit test in my life so I suppose I'm looking for some guidance here. Should I just write a simple script, semi-based on test_datacollector.py with a bunch of assertions?

[EDIT: Ended up just adding some sample code to the useful-snippets file. Seemed easiest and most appropriate since I figure most people won't end up using this functionality regularly and it feels weird to write unit tests for custom classes that don't appear natively in the project.]

…n the same model.

EwoutH · 2022-11-02T09:52:07Z

Ended up just adding some sample code to the useful-snippets file.

Content wise this is perfect and really useful, especially since Mesa is used a lot as a teaching tool. Good, accessible documentation and examples are essential for that, so thanks!

The document doesn't seem to render correctly however, I think you need to add a white line after the .. code:: python part.

We also might have to mention (link to) this snippet in a tutorial or example, but that can indeed be done in another PR. Maybe we can expand the wolf_sheep example, since it already uses different agent types. (CC @tpike3)

tpike3 · 2022-11-05T10:33:47Z

How close are we to a merge on this? I can use this to show snippets as I rerecord session 6

rht · 2022-11-05T10:45:08Z

It's almost there. Just need to tidy the code snippet.

jacob-thrackle · 2022-11-07T18:53:23Z

Will have it done by this afternoon PST.

rht · 2022-11-08T02:51:45Z

docs/useful-snippets/tmp.py

@@ -0,0 +1,49 @@
+from random import shuffle


This file is accidentally committed.

Solved! Sorry, oversight.

rht · 2022-11-08T22:57:05Z

OK, pending the discussion result of #1506.

EwoutH · 2022-11-10T12:31:52Z

Hi @woolgathering! @rht, @tpike3, @jackiekazil and I discussed multiple agent types for the DataCollector. We discussed multiple options, including adding a type attribute to the Agent class (consistent, but conditional filtering would have take a large performance impact) and layering the DataCollector with a dict key for each agent type like in #1142 (not very scalable to other use cases).

The problem to we converged to was that Mesa supports a single scheduler and a single datacollector by default. This PR adds multiple schedulers with multiple datacollectors. However, there is also a use case for a single scheduler (for example RandomActivationByType) but with multiple datacollectors.

Do you think this PR supports (or can be adapted to) that case of single scheduler with multiple data collectors, and if so would you be willing to add an additional example for that?

rht · 2022-11-10T15:53:56Z

Do you think this PR supports (or can be adapted to) that case of single scheduler with multiple data collectors, and if so would you be willing to add an additional example for that?

RandomActivationByType.agents_by_type can be considered to be a collection of schedulers (which keys are static), so I think it would work out of the box with the solution in this PR.

Use cases to consider:

Currently, RandomActivationByType groups agents by their agent class, obtained via type(agent). But sometimes, one might want to group by various agent attributes. It might be possible to have 2 agent classes sharing the same attribute, hence, grouped in the same schedule. Or the other way around: 2 agents with the same agent class, but have different attributes
An agent attributes may change over time, and so, the agent might be moved to a different step stage as the simulation progresses.

EwoutH · 2022-11-10T16:23:44Z

Very good cases!

Aside from examples, how complex would it be to add tests for these cases?

rht · 2022-11-11T09:06:20Z

Aside from examples, how complex would it be to add tests for these cases?

I think we can do this in a separate PR, because it requires a scheduler that groups not by type(agent). And once the 1 scheduler with multiple data collectors has been documented, it would require a little tweak to accommodate for the scheduler with custom grouping.

EwoutH · 2022-11-12T14:35:47Z

@woolgathering, what do you think?

EwoutH · 2022-11-14T09:10:02Z

I feel we now have some momentum on this issue, and I would really like not to lose it again this time.

@woolgathering have you seen the comment above and what do you think of it? Especially about the single-scheduler multi-datacollector case/example?

If you don't have time to work on it this is also perfectly fine, then please let it know, then we can make another plan forward.

woolgathering · 2022-11-14T20:52:53Z

I'm generally of the opinion that PRs should be fairly limited in scope so adding the test cases would probably be best in another PR. I'm honestly not sure how common those cases will be.

I was thinking about the single scheduler, multiple collector this weekend. It seems that the cleanest way would be to just pass a list to the collector of the agents one wants to track. The problem there, of course, is that this seems it would require relatively significant reworking of the DataCollector class and would complicate adding/removing agents. Given the way this fix is currently written, I can't think of a simple way of doing it.

Suppose I have the problem here: that I have 1000 agents but I only want to track 100 specific agents. This would be a case where one could put all the agents into a single scheduler, then only capture in the data collector the 100 desired agents. This is my understanding of the problem. However... the same thing could be accomplished using multiple schedulers mixed together as in the example I wrote in the snippets. This could even be adapted to the case where one wants a specific order (in a single scheduler, all agents have a unique_id and the mix could be ordered according to that), or to activate them by type (no different than having multiple schedulers with different types executed in series).

So I think overall I agree with @rht in that those cases could work out of the box and get the same end result, even if on the execution layer it looks different.

EwoutH · 2022-11-14T21:29:41Z

Thanks for your extensive reply and the time thinking about this!

While I feel using multiple schedulers is not the most intuitive for users, it’s an improvement over the current situation when that means the datacollector can be used for each scheduler. If you and @rht think another example with that use case is out of the scope of this PR, I would say go ahead and merge it.

How close are we to a merge on this? I can use this to show snippets as I rerecord session 6

@tpike3 this opportunity window has probably passed right?

rht · 2022-11-15T08:58:44Z

I sketched an example for 3 data collectors, 1 scheduler (RandomActivationByType) in https://github.com/rht/mesa/tree/test_dc_schedule. It didn't work, because the data collector strictly assumes that the object being passed has an attribute agents (see https://github.com/rht/mesa/blob/dbb29de3decb350fac406ba985fb846e14597960/mesa/datacollection.py#L167).

woolgathering · 2022-11-15T20:42:55Z

Hmmm, I wonder if it would be best to be able to pass a list of agents to the DataCollector after all... then if it's None it would just default to schedule.agents (all the agents). It would then be on the user to make sure that if they're adding/removing agents to do it in both places.

That may not be as hard, though, especially if the list were passed on each collect. It just feels kind of cumbersome and bloated. Thoughts?

rht · 2022-11-16T11:29:48Z

It could as well be a dict instead of a list, or maybe have DataCollector.__getitem__, not sure.

I think for now, to solve the Sugarscape G1MT for the complexity tutorial use case (1 scheduler, multiple data collectors), we should just implement a custom filter. The mental model with the custom filter is much simpler. Also, it's only 200 agents at most, and so should be fine. We can have a more performant option later.

rht · 2022-11-16T11:33:45Z

The mental model with the custom filter is much simpler.

I should add, that this is convenient for rapid prototyping a model, before scaling it up.

tpike3 · 2023-06-19T19:37:01Z

Inspired other updates to datacollector

jacob-thrackle added 2 commits October 29, 2022 14:10

Fix projectmesa#1419. DataCollector accepts an arbitrary schedule at …

f81be13

…creation (defaults to model.schedule otherwise) and will return None if an attribute is not found instead of throwing an AttributeError.

Removed unused attrgetter

1918e0c

Fixed formatting with black

f9622c8

rht reviewed Oct 30, 2022

View reviewed changes

tpike3 added this to the v1.2.0 Taylor milestone Oct 31, 2022

EwoutH mentioned this pull request Nov 1, 2022

Enable Read the Docs generation on Pull Requests #1420

Closed

Fix projectmesa#1419 as discussed in projectmesa#1481. _record_agents…

4fcf6a0

…() now accepts a scheudle instead of a model. Attributes that do not exist return None instead of an AttributeError.

Added sample snippet code for data collecting on multiple schedules i…

bf0f3a1

…n the same model.

Reformatted snippet code

3d336d2

rht reviewed Nov 8, 2022

View reviewed changes

Removed accidentally commited file.

d486f2e

Simpdaddyjosh approved these changes Nov 18, 2022

View reviewed changes

tpike3 modified the milestones: v1.2.0 Taylor, Mesa 2.0 Dec 8, 2022

tpike3 modified the milestones: Mesa 2.0, Milestone 1.2 Jan 8, 2023

tpike3 modified the milestones: Milestone 1.2, Mesa 2.0 Feb 5, 2023

rht mentioned this pull request May 21, 2023

feat: Implement exclude_none_values in DataCollector #1702

Merged

tpike3 removed this from the Mesa 2.0 milestone Jun 19, 2023

tpike3 closed this Jun 19, 2023

Fix #1419. DataCollector accepts an arbitrary schedule at creation (d… #1481

Fix #1419. DataCollector accepts an arbitrary schedule at creation (d… #1481

Conversation

woolgathering commented Oct 29, 2022

woolgathering commented Oct 29, 2022

Tortar commented Oct 29, 2022 • edited Loading

woolgathering commented Oct 30, 2022

codecov bot commented Oct 30, 2022 • edited Loading

Codecov Report

rht Oct 30, 2022

Choose a reason for hiding this comment

rht Oct 30, 2022

Choose a reason for hiding this comment

woolgathering Oct 30, 2022 • edited Loading

Choose a reason for hiding this comment

rht Oct 31, 2022

Choose a reason for hiding this comment

rht commented Oct 30, 2022

woolgathering commented Oct 30, 2022 • edited Loading

rht commented Oct 31, 2022

woolgathering commented Oct 31, 2022

Corvince commented Oct 31, 2022

woolgathering commented Oct 31, 2022 • edited Loading

Corvince commented Oct 31, 2022

Corvince commented Oct 31, 2022

woolgathering commented Oct 31, 2022 • edited Loading

rht commented Nov 1, 2022 • edited Loading

rht commented Nov 1, 2022

woolgathering commented Nov 1, 2022

rht commented Nov 1, 2022

EwoutH commented Nov 1, 2022

woolgathering commented Nov 2, 2022

rht commented Nov 2, 2022

woolgathering commented Nov 2, 2022 • edited Loading

EwoutH commented Nov 2, 2022 • edited Loading

tpike3 commented Nov 5, 2022

rht commented Nov 5, 2022

jacob-thrackle commented Nov 7, 2022

rht Nov 8, 2022

Choose a reason for hiding this comment

woolgathering Nov 8, 2022

Choose a reason for hiding this comment

rht commented Nov 8, 2022

EwoutH commented Nov 10, 2022 • edited Loading

rht commented Nov 10, 2022

EwoutH commented Nov 10, 2022

rht commented Nov 11, 2022

EwoutH commented Nov 12, 2022

EwoutH commented Nov 14, 2022

woolgathering commented Nov 14, 2022

EwoutH commented Nov 14, 2022

rht commented Nov 15, 2022

woolgathering commented Nov 15, 2022

rht commented Nov 16, 2022

rht commented Nov 16, 2022

tpike3 commented Jun 19, 2023

Tortar commented Oct 29, 2022 •

edited

Loading

codecov bot commented Oct 30, 2022 •

edited

Loading

woolgathering Oct 30, 2022 •

edited

Loading

woolgathering commented Oct 30, 2022 •

edited

Loading

woolgathering commented Oct 31, 2022 •

edited

Loading

woolgathering commented Oct 31, 2022 •

edited

Loading

rht commented Nov 1, 2022 •

edited

Loading

woolgathering commented Nov 2, 2022 •

edited

Loading

EwoutH commented Nov 2, 2022 •

edited

Loading

EwoutH commented Nov 10, 2022 •

edited

Loading