Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve SqlAlchemy session handling for QueryBuilder #3708

Merged
merged 3 commits into from
Jan 31, 2020

Conversation

sphuber
Copy link
Contributor

@sphuber sphuber commented Jan 7, 2020

Fixes #3645 and fixes #3372

@CasperWA
Copy link
Contributor

CasperWA commented Jan 19, 2020

Concerning using this in a REST-API server, i.e., aiida-optimade, what do you suggest to do?
I have tried to retrieve the session via get_manager().get_backend().get_session() before handling a request and then calling session.close() at the end, but this still prompts a QueuePool overflow for both backends.

Edit:
Calling get_manager().get_backend_manager().reset_backend_environment() after every response handle seems to work.
However, this meant I also discovered a wrong import in your code for that function in the django backend.
I am still not sure this is the correct approach though.

@@ -32,6 +32,8 @@ def _load_backend_environment(self):

def reset_backend_environment(self):
"""Reset the backend environment."""
from aiida.orm.implementation.django.querybuilder import reset_session
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
from aiida.orm.implementation.django.querybuilder import reset_session
from . import reset_session

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is still an issue?

Copy link
Contributor

@CasperWA CasperWA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look's great. I do have some change requests though (see comments).

@@ -62,7 +62,7 @@ def reset_backend_environment(self):
from aiida.backends import sqlalchemy
if sqlalchemy.ENGINE is not None:
sqlalchemy.ENGINE.dispose()
sqlalchemy.SCOPED_SESSION_CLASS = None
sqlalchemy.SESSION_FACTORY = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this method simply call aiida.backends.sqlalchemy.reset_session()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure does :)

@@ -32,6 +32,8 @@ def _load_backend_environment(self):

def reset_backend_environment(self):
"""Reset the backend environment."""
from aiida.orm.implementation.django.querybuilder import reset_session
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is still an issue?

@@ -9,6 +9,12 @@
###########################################################################
"""Backend query implementation classes"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe note here that the general QueryBuilder here is based on the assumption that all queries are performed with SQLAlchemy, no matter the actual backend? I.e., just copy some stuff in here from your commit messages.

def get_session():
"""Return a database session that can be used by the `QueryBuilder` to perform its query.

If there is an exception within the context then the changes will be rolled back and the state will
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be under the description of the transaction method?

Since the `QueryBuilder` implementation uses SqlAlchemy to map the query
onto the models in order to generate the SQL to be sent to the database,
it requires a session, which is an :class:`sqlalchemy.orm.session.Session`
instance. The only purpose is for SqlAlchemy to be able to connect to the
database perform the query and retrieve the results. Even the Django
backend implementation will use SqlAlchemy for its `QueryBuilder` and so
also needs an SqlA session. It is important that we do not reuse the
scoped session factory in the SqlAlchemy implementation, because that
runs the risk of cross-talk once profiles can be switched dynamically in
a single python interpreter.

Therefore the Django implementation of the `QueryBuilder` should keep its
own SqlAlchemy engine and scoped session factory instances that are used
to provide the query builder with a session.
The scoped session is an instance of SqlAlchemy's `Session` class that
is used by the query builder to connect to the database, for both the
SqlAlchemy database backend as well as for Django. Both database
backends need to maintain their own scoped session factory which can be
called to get a session instance.

Certain applications need access to the session. For example,
applications that run AiiDA in a threaded way, such as a REST API server
need to manually close the session after the query has finished because
this is not done automatically when the thread ends. The associated
database connection remains open causing an eventual timeout when a new
request comes in. The method `Backend.get_session` provides an official
API to access the global scoped session instance which can then be
closed.

Additionally, a lot of code that was duplicated across the two
implementations of the `QueryBuilder` for the two database backends has
been moved to the abstract `BackendQueryBuilder`. Normally this code
does indeed belong in the implementations but since the current
implementation for both backends is based on SqlAlchemy they are both
nearly identical. When in the future a new backend is implemented that
does not use SqlAlchemy the current code can be factored out to a
specific `SqlAlchemyQueryBuilder` that can be used for both database
backends.
Presumably, the `multiprocessing.recreate_after_fork` hook on the
session creation of the SqlAlchemy was added because of the design of
the old daemon of the v0.* series. There the daemon was implemented
using `celery` which would spawn workers, by forking the main process.
Since the definition of forking process means "cloning a process with
the exact same state", the forked process would have the same session
which is undesirable. This is why the session has to be reset such that
an independent one has to be initialized.

With the new engine design, there is no forking anymore by the daemon
and it is also not supported. Instead, each daemon worker is launched as
a completely new process and so will not start with an existing session.

Finally, the `multiprocessing.recreate_after_fork` is not even
documented and according to `https://bugs.python.org/issue21372` this is
for a reason, because it is not intended for external use. Combined with
the fact that its cross-platform consistent operation is pulled into
question, it is better to not use it.
@sphuber sphuber force-pushed the fix_3645_backend_session_handling branch from 9bc6f85 to 3c304e9 Compare January 31, 2020 10:06
Copy link
Contributor

@CasperWA CasperWA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cheers @sphuber!
Note for later: We should add good documentation about threat handling with AiiDA SQLAlchemy sessions / use of QueryBuilder. This is especially important for REST API frameworks and servers, where multi-threading is commonplace.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants