Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"Instant" command processing #1613

Closed
alice-i-cecile opened this issue Mar 11, 2021 · 17 comments
Closed

"Instant" command processing #1613

alice-i-cecile opened this issue Mar 11, 2021 · 17 comments
Labels
A-ECS Entities, components, systems, and events C-Feature A new feature, making something new possible C-Usability A targeted quality-of-life change that makes Bevy easier to use S-Needs-Design-Doc This issue or PR is particularly complex, and needs an approved design doc before it can be merged

Comments

@alice-i-cecile
Copy link
Member

alice-i-cecile commented Mar 11, 2021

What problem does this solve or what need does it fill?

Having to wait until a hard sync point (currently the end of the Stage) for basic commands to process slows down prototyping badly. This is particularly frustrating with

  • spawning entities
  • despawning entities
  • adding components
  • removing components

Without this feature, you have to be very thoughtful about how and when you're using commands, even when prototyping, and where you place systems that rely on other things being mutated via commands. Failing to account for command delays properly can result in frame lags or buggy behavior, and moving systems between stages to accommodate the required hard syncs takes a lot of work, which must be refactored whenever your prototype changes significantly.

Possible Solutions

This is a very technically complex feature request, and as such there are several potential solutions that deserve more exploration. The gist of the challenge is that commands can, in general, touch arbitrary data, making them very hard to schedule. To get around that, regular hard sync points exist, which have exclusive access to the entire world and can perform arbitrary operations to process any commands that may have built up.

  1. Bit-flipping: see Fast add/remove components via bit flipping. #356. Iterators have to be aware that they can skip components in tables due to change detection, so we can use this property to toggle a bit to "soft-remove" an entity from an archetype without mutable access to the entire World (and possibly do something similar to add it to a new one). We can use hard syncs to clean up the archetypes, and recover data locality / future performance. From @BoxyUwU.
  2. Commands subworlds: Copy the data into a subworld when using commands, queries must access the main world and all subworlds. From Kae on Discord.
  3. Dynamically scheduled commands: fragment commands based on the data that they need to access. Dynamically schedule them at a point between they are created and the first time that data would be used. Remove most hard syncs in the process. See this comment on System organization overhaul and the road to a stageless schedule #1375.
@alice-i-cecile alice-i-cecile added A-ECS Entities, components, systems, and events C-Feature A new feature, making something new possible help wanted C-Usability A targeted quality-of-life change that makes Bevy easier to use labels Mar 11, 2021
@Davier
Copy link
Contributor

Davier commented Mar 11, 2021

I like the idea of having a "staging" world that commands insert into.
As discussed in #1446, it would be useful to spawn loaded assets (especially scenes) and modify them on the fly.

@Ratysz
Copy link
Contributor

Ratysz commented Mar 11, 2021

"It makes prototyping easier by letting you think about X less" does not sound like compelling motivation to me for this case. Maybe I'm picking up more from phrasing than there is.

I fairly sure we already had this conversation on why multiple worlds is not a good general solution for this. Some form of slowboat staging world could be a thing, but it doesn't have to be powered by commands; copying out parts of the world for things like rendering is a different matter entirely.

Fragmenting commands could work, but it is an entire new layer of constraints on the systems graph - automatically supporting that for all systems is bound to harm parallelization; it should be opt-in, and it should be immediately obvious which commands can be "instant", perhaps to the point of having separate dedicated queues. This also doesn't have to be dynamic: the system must declare what commands it may issue, exactly the same way it must declare what data it accesses or if it cares about execution order, so "scheduling commands" of a system can be done as part of scheduling the system itself.

It bears mentioning that no solution will be able to make all commands instant, only a subset.

@cart
Copy link
Member

cart commented Mar 11, 2021

Something to consider: we can allow "immediate" insertions and removals if we can guarantee that no other system is reading or writing a given storage (Table, SparseSet, Archetypes, Entities).

We could add a new api that enables users to claim these resources uniquely.

Spawning/despawning in a specific Archetype requires unique write access to the following:

  • Archetype value
    • Cannot be run concurrently with &Archetypes queries (requires new logic) or other operations that need this Archetype value (which can be guaranteed if all ArchetypeComponents are grabbed uniquely and that is globally treated as a unique Archetype borrow)
  • Table of archetype
    • Claiming Write access to each ArchetypeComponent in the archetype prevents concurrent reads/writes to components
  • Sparse Sets in archetype
    • Claiming Write access to each sparse Component in the archetype prevents concurrent reads/writes (this could be translated to all current ArchetypeComponents for a given Component to avoid complicating the scheduler)
  • Entities
    • This is the "hard" part. Grabbing this uniquely prevents all other spawn/despawn actions, in addition to metadata lookups for things like Query::get_component::<T>(entity) and Query::get(entity). This wouldn't block Query iteration as it doesn't access the metadata collection. This would almost certainly require some kind of synchronization.

Meeting the constraints above should enable things like:

// Claims an _exact_ archetype (using the logic above) and nothing else
// (and initializes the archetype if it doesn't already exist)
fn sys(mut writer: ArchetypeWriter<(A, B, C)>) { 
  writer.spawn((A(0), B(0), C(0)));
  writer.despawn(some_entity); // fails if entity is not in archetype
}

// Claims _all_ archetypes that currently match the query  (using the logic above)
// no new archetypes are allowed (to prevent the need to lock Archetypes uniquely and
// because we can't anticipate what component types are needed ahead of time)
fn sys(mut writer: QueryWriter<(A, B, C)>) {
  writer.spawn((A(0), B(0), C(0)));
  writer.spawn((A(1), B(1), C(1), D(1)); // note that D's storage was implicitly claimed by QueryWriter
  writer.despawn(some_entity);
}

// Claims the before/after archetype (using the logic above) and nothing else
// (and initializes the archetypes if they doesn't already exist)
fn sys(mut remover: Remover<(A, B, C),  B>) { 
  remover.remove(entity);
}

We should also consider just making it possible to add exclusive systems (aka hard sync points) before/after any system in a stage. Solving that problem allows for simpler, less restrictive world access.

@alice-i-cecile
Copy link
Member Author

We should also consider just making it possible to add exclusive systems (aka hard sync points) before/after any system in a stage. Solving that problem allows for simpler, less restrictive world access.

This should be vastly improved by a stageless design. Following up on it though, I would much rather see users able to add parallel systems as exclusive systems that induce a hard sync point, whose commands are immediately processed. This allows them to stick to the standard type signatures, and then convert them back into ordinary parallel systems quickly once they're done prototyping.

e.g. add_system(my_system) becomes add_exclusive_system(my_system) and vice versa, quickly letting you replace them without needing to completely rewrite things in terms of World like you do now. Using the .system() and .exclusive_system() methods would also work for now, but ideally that boilerplate will be able to be removed Soon:tm:.

@alice-i-cecile
Copy link
Member Author

Something to consider: we can allow "immediate" insertions and removals if we can guarantee that no other system is reading or writing a given storage (Table, SparseSet, Archetypes, Entities).

This idea feels like... partially exclusive systems to me? I like them a lot, but for the same reasons as my last comment, I feel like they should be defined upon system insertion, rather than forcing a modification of the parameter types in your system.

@alice-i-cecile
Copy link
Member Author

Overall, I think that the "run this parallel system as an exclusive system", inserted into arbitrary points in the schedule is the most promising approach to try out first because:

  • It dovetails very nicely with the stageless work we want to do already
  • As far as I can see, it introduces no new serious technical challenges
  • It enables people to write exclusive systems with more comfortable syntax, rather than learning to need how to grab the data they need from the world directly
  • It's very easy to convert systems to-and-from this form, allowing for simple prototyping
  • It won't impose any overhead on optimized games that only use carefully scheduled parallel systems

@Ratysz
Copy link
Contributor

Ratysz commented Mar 12, 2021

Any system can already be used as an exclusive system, you just need to insert it with .exclusive_system(). Naturally, that comes with all the restrictions, main one being that it's inserted into the spots where exclusive systems go. It's not possible to insert such a coerced exclusive system into arbitrary points in the schedule for the same reasons as with regular exclusive systems.

Stageless will indeed let us make nicer APIs for this: specifying hard sync points without the additional boilerplate of stages is what stageless is all about, in the end. I'm working on it, but it'll need more thinking time; multiple issues seemingly need to be solved at the same time and I would like to avoid doing that with a ground-up rewrite.

None of that is relevant to the issue at hand, though. Commands are intended to enable parallel systems to perform generally non-parallelizable operations while remaining parallel; making some of the systems exclusive defeats the purpose. Calling systems issuing disjoint commands "partially exclusive" is confusing.

@alice-i-cecile
Copy link
Member Author

Here's a really clear use case for this functionality, with no great workarounds.

@AlphaModder
Copy link

I've got another usecase for this functionality, indirectly. I've been discussing how to do rendering with @mtsr and a few others on discord recently, and we're looking at an architecture that classes systems into one of two kinds: simulation and extraction.

Simulation systems, well, simulate the world, while extraction systems copy data into renderer data structures. The details of this process are not important to this issue, but what is important is that there is what I've been calling a 'soft sync point' between simulation and extraction. That is, any system marked extraction needs to run after all systems labeled simulation that touch the same data have finished their work. This is different from the hard sync point introduced by a stage, because it's not the case that every extraction system must run after all simulation systems, and in fact for performance reasons we want an extraction system to run as soon as possible once the data it needs is available.

In slightly more mathematical language, a hard sync point between stage A and stage B expresses the constraint forall a, b : System. ((stage(a) = A and stage(b) = B) implies a <= b) on the system ordering poset. But instead, we want the constraint forall a, b : System. ((label(a) = Simulation and label(b) = Extraction) implies not(a >= b)). This is not equivalent, since it permits Simulation and Extraction systems to be unordered with respect to each other. Only if system dependencies prevent parallelization (i.e. a writes to a resource b reads) does the constraint come into play.

If the only way for systems to touch the World was through queries, it would be trivial (I think) to implement this kind of constraint. However, the problem is commands. It is critical that if a simulation system runs commands that touch render data, those commands are processed by the time an extraction system reads that data. But right now, commands can only be processed during the hard sync point between two stages, which completely eliminates the possibility of running any extraction systems before all simulation systems have completed. So getting optimal performance out of this rendering model would require some kind of finer-grained command scheduling.

At the time I ran into this problem, I wasn't aware this issue existed, so I tried to come up with my own idea for command scheduling. Rather than abandoning commands entirely, or providing instant execution, my aim was to preserve most of the way commands work now, but to automatically schedule execution of commands per-system in a granular fashion, rather than at manually specified points. To be honest, I'm not convinced what I came up with is preferable to what @cart described above, and it's got a bit of a mad science flavor to it, but I was asked to add it to this issue on discord, so I'll try to sketch it out under the cut.

On-demand command scheduling explanation

The approach is based on two core ideas:

  • A system should not make changes to entities it did not either obtain from a query or create itself.
  • A system must declare in its signature what components it could possibly add or remove from entities it accesses.

From an API perspective, here's how I would implement this. First, create a borrowed type representing the ID of an entity a system may make changes to:

pub struct LiveEntity<'w>(Entity, PhantomData<&'w ()>); // I think this is the right variance...

impl<'w> From<LiveEntity<'w>> for Entity { 
	fn from(live: LiveEntity<'w>) -> Entity { live.0 }
}

Then, change <EntityFetch as Fetch<'w>>::Item from Entity to LiveEntity<'w>. Likewise, alter the signatures of Commands::current_entity and Commands::set_current_entity to use LiveEntity instead of Entity. For each possible operation a command can apply to an entity, we make the method on Commands require a LiveEntity and an instance of a Copy ZST marker that can be obtained as a SystemParam. This type should include the types of components, thus making the components a system can add or remove part of its signature. For example, for insert_one, it would look like this:

pub struct FetchInsert<T>(PhantomData<T>);

#[derive(Copy, Clone)]
pub struct Insert<'a, T>(PhantomData<&'a T>);

impl<'a, T> SystemParam for Insert<'a, T> {
    type Fetch = FetchInsert<'a, T>;
}

impl<'a, T> FetchSystemParam<'a> for FetchInsert<T> {
	type Item = Insert<'a, T>;
	fn init(system_state: &mut SystemState, world: &World, resources: &mut Resources) { }
	unsafe fn get_param(system_state: &'a SystemState, world: &'a World, resources: &'a Resources) -> Option<Self::Item> {
		Some(Insert(PhantomData))
	}
}

impl Commands {
    fn insert_one<T: Component>(&mut self, entity: LiveEntity, insert: Insert<T>, component: T) { ... } // relying on lifetime elision here, might not work
}

There would be an analogous interface for remove_one and friends, and spawning an entity with a component would probably just require the same marker type as an insert. despawn would need its own marker Despawn, I think. There is no way to support the Commands::add_command API in this approach, since it takes exclusive access to the world as a given.

With these APIs, I believe it should be possible to fully describe which archetypes a system could access with a struct like this:

struct SystemAccess {
    any: DNF<ComponentId>,
    none: DNF<ComponentId>,
    add: Vec<ComponentId>,
    remove: Vec<ComponentId>,
    can_despawn: bool,
}

Here, DNF<T> refers to a boolean expression in disjunctive normal form, with variables represented by the type T.

Conceptually, there is a simple function that checks if an archetype could contain entities live in a system:

fn is_archetype_live(system: &SystemAccess, archetype: &Archetype) -> bool {
    archetype.matches(system.any) && !archetype.matches(system.none)
}

And still conceptually, there is a more complicated function to determine if a system could generate commands touching a given archetype:

fn is_archetype_modified(system: &SystemAccess, archetype: &Archetype) -> bool {
    if archetype == Archetype::Empty {
        return false; // special case, since the empty archetype never actually needs to reallocate storage
    }
    let is_live = is_archetype_live(system, archetype);
    let can_move_out = is_live && (system.can_despawn || system.add.is_empty() || archetype.matches(system.remove));
    let can_move_in = match is_live { 
        true => archetype.matches(system.add),
        false => archetype.matches(system.any - system.remove) // This subtraction gives the DNF generated by removing any component ids in remove from each clause in any.
    }
    can_move_out || can_move_in
}

While I have chosen to express it in code, the curried function is_archetype_modified(system, _) is just a boolean expression with one variable for each component type. An archetype then corresponds to an assignment of these variables.

Thus, for two systems s1 and s2 with a soft sync point between them, we can construct the expression is_archetype_modified(s1, archetype) && is_archetype_live(s2, archetype) and use our favorite SAT solver to determine whether to flush s1's commands before running s2. SAT requires worst-case time exponential in the number of variables, but we only need to do this once, and the number of variables involved would only be the sum of the number of components mentioned by each system's SystemAccess, so it's possible that effiency would not be a big problem. Once it has been determined what systems require s1's commands to be flushed before running, we can insert a 'system' with the appropriate exclusive archetype access into the schedule, have it perform the flush, and make the dependent systems depend on it, rather than s1 directly. And thus, we have command execution with soft sync points.

As a disclaimer, I am actually not that familiar with the specific implementation details of the Bevy ECS, just the implementation of ECSs in general, so it's possible I have missed something critical in this proposal. Feel free to point out any problems.

@mtsr
Copy link
Contributor

mtsr commented Apr 1, 2021

SAT requires worst-case time exponential in the number of variables, but we only need to do this once, and the number of variables involved would only be the sum of the number of components mentioned by each system's SystemAccess, so it's possible that effiency would not be a big problem. Once it has been determined what systems require s1's commands to be flushed before running, we can insert a 'system' with the appropriate exclusive archetype access into the schedule, have it perform the flush, and make the dependent systems depend on it, rather than s1 directly. And thus, we have command execution with soft sync points.

If it's doable to achieve this without high constants in the implementation and the number of components isn't prohibitive, soft sync points could give significant performance benefits, since CPU occupancy should better than with one or more hard sync points. Combine this with pipelining and the wins are even bigger, for rendering, but even simulation of frame N+1 could start while frame N is still winding down.

It would be worthwhile exploring the actual cost of computing scheduling constraints. I think some publicized ECS based games have had rather high numbers of components types. Will something like this effictively put limits on that? Maybe there could be a way to do (partial) precomputation at build time? But what about dynamic systems and components? Could these include their own (partially) precomputed constraints that can be completed on plugging in?

Edit: Actually, having this seems like it would make pipelining much easier to implement.

@AlphaModder
Copy link

I think some publicized ECS based games have had rather high numbers of components types. Will something like this effictively put limits on that?

It's worth pointing out that although an archetype conceptually represents a variable assignment in the number of component types throughout the entire application, the proposition is_archetype_modified(s1, archetype) && is_archetype_live(s2, archetype) will only ever depend on types mentioned in one of the system's SystemAccess fields. So there should not be any particular limit on the total number of component types. I'd have to benchmark it, but since the vast majority of systems operate on only a few component types, even a brute force O(2^n) algorithm not be unacceptably slow.

@alice-i-cecile
Copy link
Member Author

alice-i-cecile commented Aug 7, 2021

In conversation with Cart, I proposed the following solution for instant component addition / removal, which seems promising:

  1. Make component addition / removal a method on Query, which restricts you to altering m
  2. Add a heightened-locking component wrapper, Modify, which allows you to add / remove components of that type. This gives our schedule the information it needs, without constantly incurring a locking cost.

The final API would look like:

fn toggle_system(query: Query<Modify<Toggled>, With<ToBeToggled>>){
  for toggle in query.iter_mut(){
     // Modify wrapper would work like Option<T> queries and return an Option
     match maybe_toggled {
      // No type is needed, since we have access to it from the `Modify` wrapper
       Some(_) => maybe_toggled.remove(),
       // We are only allowed to insert components with the `Toggled` type here
       None => maybe_toggled.insert(Toggled)
     }
  }
} 

The existing commands API would have to remain intact, since you may want to insert / remove components of arbitrary types.

You'd also want to extend this to both bundles (easy) and spawning / despawning (pretty hard).

@alice-i-cecile
Copy link
Member Author

More related work: bevyengine/rfcs#57, bevyengine/rfcs#30

@ItsDoot
Copy link
Contributor

ItsDoot commented Dec 14, 2023

Would this be mostly solved by #9822, which makes it easier to understand when commands are applied?

@alice-i-cecile
Copy link
Member Author

Yep, this plus manual apply_deferred will effectively solve the need here.

@james7132
Copy link
Member

With auto-insert sync points in #9822 and the followup PRs, this seems to be largely transparent to most users now, where the only concern being the performance impact of a sync point. Is this still desirable to have as an ECS feature?

@alice-i-cecile
Copy link
Member Author

Yep, I'm content with the state of things now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-ECS Entities, components, systems, and events C-Feature A new feature, making something new possible C-Usability A targeted quality-of-life change that makes Bevy easier to use S-Needs-Design-Doc This issue or PR is particularly complex, and needs an approved design doc before it can be merged
Projects
None yet
Development

No branches or pull requests

8 participants