Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a guard to prevent interleaving of async calls #3291

Closed
divega opened this issue Oct 1, 2015 · 20 comments
Closed

Add a guard to prevent interleaving of async calls #3291

divega opened this issue Oct 1, 2015 · 20 comments
Assignees
Labels
closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. type-enhancement
Milestone

Comments

@divega
Copy link
Contributor

divega commented Oct 1, 2015

We did this in EF6 to prevent multiple threads from using the same context concurrently, and to guide users to the right async patterns.

See #3240.

@mikary
Copy link
Contributor

mikary commented Dec 7, 2015

The EF6 implementation of guards around async calls appears to focus on wrapping top level API calls in semaphore checks via memory barrier (using Interlocked exchanges to increment and decrement a ThrowingMonitor instance attached to the ObjectContext).

EF6 updates the semaphore (and throws if already set) on the following operations:

  • RefreshEntitiesAsync
  • SaveChangesAsync
  • ExecuteStoreCommandAsync
  • ExecuteStoreQueryAsync
  • GetResultAsync (Query)

There is also a method on the ThrowingMonitor that checks the semaphore value without modification. This method is used by sync code paths, presumably to catch interleaving sync and async calls with a minimal performance impact. It is also used for multiple overloads for the same method that call a shared private/protected version to throw earlier with a smaller stack trace.

This throwing but not mutating method is called in the following locations:

  • RefreshAsync
  • RefreshEntities
  • SaveChanges
  • SaveChangesAsync
  • ExecuteFunction
  • ExecuteStoreCommand
  • ExecuteStoreCommandAsync
  • ExecuteStoreQuery
  • ExecuteStoreQueryAsync
  • GetEnumerator (Query)
  • GetEnumeratorAsync (Query)
  • GetResult (Query)
  • GetResultAsync (Query)
  • ExecuteSqlQuery
  • ExecuteSqlQueryAsync
  • Find
  • FindAsync

Note: The GetResultAsync semaphore encapsulates query compilation and execution, but not enumeration due to the method returning an ObjectResult<T>

@mikary
Copy link
Contributor

mikary commented Dec 7, 2015

Based on discussions with the team so far, this enhancement has the following requirements / considerations:

  • The guard should never throw when the context is being used properly
  • The async entry points for customer code should all be covered
  • There are no special cases (i.e. AsNoTracking queries are not thread safe)
  • The performance impact should be kept to a minimum

We can get the same effect as the EF6 ObjectContext from a scoped service with a similar implementation to ThrowingMonitor.

There are fewer entry points to the current EF7 implementation

  • SaveChanges (Async)
  • EnsureCreated (Async)
  • EnsureDeleted (Async)
  • GetEnumerator - on InternalDbSet
  • AsyncEnumerable - on InternalDbSet
  • ExecuteSqlCommand (Async)

Notes on Query:

  • In EF6, query creates a DbDataReader and passes it into the ObjectResult via the Shaper<T>, which means the Task returned by GetResultAsync has already executed the query, but the results will not be read until the ObjectResult returns the IEnumerator<T> and this enumerator is enumerated.
  • In EF7, query creates an AsyncQueryingEnumerable that does not execute the query until the first MoveNext call.
  • If we add semaphores around the query entry point (EntityQueryProvider or QueryCompiler) we will only cover query compilation and construction of the IAsyncEnumerable without catching command execution.

The previous implementation relied on placing guards around the entry points to the API. While effective, this does not give any deep / common coverage. This kind of coverage could be added in the consolidated RelationalCommand API to ensure a single DbContext instance is never executing more than one command at a time. This guard would have to introduce a second semaphore variable because its execution would generally be subsumed by a higher level operation (i.e. query)

  • It has been suggested that such a guard could be implemented without using memory barriers, to avoid a performance impact if we are willing to accept a lower level of reliability for concurrency detection
  • The anticipated performance impact of introducing a memory barrier for each command may also be negligible enough that it is worth including to increase chances of detection

@mikary
Copy link
Contributor

mikary commented Dec 7, 2015

Could I have you take a look at my description of the considerations for query and relational commands for this enhancement to see if there are any important factors that I have overlooked?
@AndriySvyryd @anpete

@mikary
Copy link
Contributor

mikary commented Dec 7, 2015

I may have missed some entry points in the DbContext where we could add guarding calls (i.e. Add / Attach). Would we like to add guarding calls to these (or other) areas in the API?
@ajcvickers

@ajcvickers
Copy link
Member

@mikary Personally I think this should probably be scoped only to entry points that should be awaited.

Also, when you say, "It has been suggested that such a guard could be implemented without using memory barriers, to avoid a performance impact if we are willing to accept a lower level of reliability for concurrency detection" how is this going to work without the potential for throwing when the code is correct? What is to stop the second thread getting a stale value if no memory barriers are used?

@mikary
Copy link
Contributor

mikary commented Dec 8, 2015

@ajcvickers As long as the calling code is correct, there shouldn't be a second thread running at the same time. It shouldn't be any different from a standard sync or async code path that increments and decrements a counter.
The likelihood of properly throwing in the negative use case is pretty questionable at best. I listed the option for completeness, but I think it makes the most sense to either implement with proper memory barriers or to leave these checks out all together.

@ajcvickers
Copy link
Member

@mikary Thread 1 runs, increments counter, decrements counter. Thread 2 runs looks at counter but doesn't see the decremented value because no memory barrier; throws. Code is correct. Throws anyway.

@divega
Copy link
Contributor Author

divega commented Dec 8, 2015

Unless I am missing something, I agree with @ajcvickers is right that doing this in unsafe way could lead to false positives, and obviously we don't want that.

I am interested in hearing more details about this:

Personally I think this should probably be scoped only to entry points that should be awaited.

Initially I was of the same opinion but now I am finding it harder to defend it. I get that a guard at the database access level would not catch any concurrent access of the sync entry points that don't access the database it should catch most other misuses. Is it that the cost on the happy path would be too high?

@mikary
Copy link
Contributor

mikary commented Dec 8, 2015

@ajcvickers I don't see how that is a correct use case, we're talking about a counter that is scoped to a single DbContext instance. If more than one thread is operating on the context the code is not correct.

@ajcvickers
Copy link
Member

@mikary If more than one thread is operating concurrently on the context the code is not correct. One thread making a call and that call ending, followed by another thread making a call on the same instance is valid.

@divega It's a balance between the value of the check and the overhead (in terms of pure perf, possible contention, and implementation/maintenance cost). Async invites, even encourages, people to do multi-threading without even thinking about it. Catching this has high value. Also, async by its nature in our code means an I/O operation, and async is slow, so the perf/contention factor is less likely to be a problem. On the other extreme, to @mikary's question, If we think that the value in trying to catch general multi-threaded use is high enough, then we should try to test what the overhead is of the check for very fast, non I/O operations. But this investigation is also a cost we have to pay. I don't think it is worth it.

@mikary
Copy link
Contributor

mikary commented Dec 8, 2015

@divega If the concern @ajcvickers raised about reference counting and threading is an issue for this case, we have other areas in the code (i.e. RelationalConnection) where the same problem would also be an issue. Do we want to open an issue / take an action item to review it?

@AndriySvyryd
Copy link
Member

@ajcvickers Wouldn't there need to be some memory barrier between the calls on different threads in that case anyway? Otherwise how are they synchronized not to overlap?

@mikary
Copy link
Contributor

mikary commented Dec 9, 2015

After further discussion, we are looking at putting the guards around the main entry points to the API (SaveChanges, ADO commands like ExecuteSqlCommand, and Query). These guards should be in Core if possible, but deep enough in the stack to cover as many code paths as we can.

SaveChanges
Potential locations:

  • DbContext - Guarding service taken from ServiceProvider, likely in LazyRef
  • StateManager - Constructor injection
  • Database - Constructor injection, requires converting abstract methods to concrete implementation

Recommendation:

  • StateManager - lowest level concrete implementation in Core, no impact on public API

ExecuteSqlCommand
Potential/Recommended location

  • RelationalDatabaseFacadeExtensions - only viable option

Query
There are two stages of query execution that need to be covered independently due to the current factoring

  • Query parameterization and compilation
  • Query execution (command execution and enumeration)

Potential locations (compilation):

  • InternalDbSet - Guarding service taken from ServiceProvider, likely in LazyRef
  • EntityQueryable - Adding service would be messy
  • EntityQueryProvider - Adding service would be messy
  • QueryCompiler - Constructor injection
  • Database - Constructor injection, does not cover parameterization

Recommendation:

  • QueryCompiler - lowest level that includes parameterization, simple implementation, no public API changes

Potential locations (execution and enumeration):

  • EntityQueryModelVisitor - Built into executor lambda, guarding service added to QueryContext
  • QueryingEnumerable / AsyncQueryingEnumerable - guard in place around each MoveNext call, guarding service added to QueryContext, introduces memory barrier for each retrieved row.

Recommendation:

  • EntityQueryModelVisitor - Covers all of query execution, single guard per query enumeration

@mikary
Copy link
Contributor

mikary commented Dec 9, 2015

Additional thoughts on Query execution and enumeration

  • EntityQueryModelVisitor - The lambda execution does not cover the actual command execution and enumeration, a potential solution would be to wrap the IAsyncEnumerable from the Expression and perform the guard operations in the wrapper.
  • QueryingEnumerable / AsyncQueryingEnumerable - This solution would only cover Relational

@anpete
Copy link
Contributor

anpete commented Dec 9, 2015

@mikary As a starting point, put it anywhere we do diagnostics exception interception:

untitled

This gets you all queries and SaveChanges.

@rowanmiller rowanmiller modified the milestones: 7.0.0-rc2, 7.0.0 Dec 9, 2015
@rowanmiller rowanmiller changed the title Consider adding a guard to prevent interleaving of async calls Add a guard to prevent interleaving of async calls Dec 9, 2015
@divega
Copy link
Contributor Author

divega commented Dec 9, 2015

It seems to me we shouldn't need to put a guard around query parameterization and compilation because chances are we can detect misuse later during query execution. But perhaps I am missing something?

@anpete
Copy link
Contributor

anpete commented Dec 9, 2015

Singleton queries get executed immediately.

@divega
Copy link
Contributor Author

divega commented Dec 9, 2015

Singleton queries get executed immediately.

You mean that is what I was missing? I see this as another query execution path that we need to put guards around (i.e. it is not only about results enumeration) but still not a reason to put a guard around query compilation and parameterization.

@anpete
Copy link
Contributor

anpete commented Dec 9, 2015

Yes, and those paths are in QueryCompiler.

@divega
Copy link
Contributor Author

divega commented Dec 9, 2015

I just wanted to make clear that putting a guard around query compilation and parameterization is not a goal.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. type-enhancement
Projects
None yet
Development

No branches or pull requests

6 participants