-
Notifications
You must be signed in to change notification settings - Fork 189
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve SqlAlchemy session handling for QueryBuilder
#3708
Improve SqlAlchemy session handling for QueryBuilder
#3708
Conversation
Concerning using this in a REST-API server, i.e., Edit: |
aiida/backends/djsite/manager.py
Outdated
@@ -32,6 +32,8 @@ def _load_backend_environment(self): | |||
|
|||
def reset_backend_environment(self): | |||
"""Reset the backend environment.""" | |||
from aiida.orm.implementation.django.querybuilder import reset_session |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from aiida.orm.implementation.django.querybuilder import reset_session | |
from . import reset_session |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is still an issue?
a8b6026
to
84044e2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Look's great. I do have some change requests though (see comments).
aiida/backends/sqlalchemy/manager.py
Outdated
@@ -62,7 +62,7 @@ def reset_backend_environment(self): | |||
from aiida.backends import sqlalchemy | |||
if sqlalchemy.ENGINE is not None: | |||
sqlalchemy.ENGINE.dispose() | |||
sqlalchemy.SCOPED_SESSION_CLASS = None | |||
sqlalchemy.SESSION_FACTORY = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this method simply call aiida.backends.sqlalchemy.reset_session()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure does :)
aiida/backends/djsite/manager.py
Outdated
@@ -32,6 +32,8 @@ def _load_backend_environment(self): | |||
|
|||
def reset_backend_environment(self): | |||
"""Reset the backend environment.""" | |||
from aiida.orm.implementation.django.querybuilder import reset_session |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is still an issue?
@@ -9,6 +9,12 @@ | |||
########################################################################### | |||
"""Backend query implementation classes""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe note here that the general QueryBuilder here is based on the assumption that all queries are performed with SQLAlchemy, no matter the actual backend? I.e., just copy some stuff in here from your commit messages.
def get_session(): | ||
"""Return a database session that can be used by the `QueryBuilder` to perform its query. | ||
|
||
If there is an exception within the context then the changes will be rolled back and the state will |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be under the description of the transaction
method?
Since the `QueryBuilder` implementation uses SqlAlchemy to map the query onto the models in order to generate the SQL to be sent to the database, it requires a session, which is an :class:`sqlalchemy.orm.session.Session` instance. The only purpose is for SqlAlchemy to be able to connect to the database perform the query and retrieve the results. Even the Django backend implementation will use SqlAlchemy for its `QueryBuilder` and so also needs an SqlA session. It is important that we do not reuse the scoped session factory in the SqlAlchemy implementation, because that runs the risk of cross-talk once profiles can be switched dynamically in a single python interpreter. Therefore the Django implementation of the `QueryBuilder` should keep its own SqlAlchemy engine and scoped session factory instances that are used to provide the query builder with a session.
The scoped session is an instance of SqlAlchemy's `Session` class that is used by the query builder to connect to the database, for both the SqlAlchemy database backend as well as for Django. Both database backends need to maintain their own scoped session factory which can be called to get a session instance. Certain applications need access to the session. For example, applications that run AiiDA in a threaded way, such as a REST API server need to manually close the session after the query has finished because this is not done automatically when the thread ends. The associated database connection remains open causing an eventual timeout when a new request comes in. The method `Backend.get_session` provides an official API to access the global scoped session instance which can then be closed. Additionally, a lot of code that was duplicated across the two implementations of the `QueryBuilder` for the two database backends has been moved to the abstract `BackendQueryBuilder`. Normally this code does indeed belong in the implementations but since the current implementation for both backends is based on SqlAlchemy they are both nearly identical. When in the future a new backend is implemented that does not use SqlAlchemy the current code can be factored out to a specific `SqlAlchemyQueryBuilder` that can be used for both database backends.
Presumably, the `multiprocessing.recreate_after_fork` hook on the session creation of the SqlAlchemy was added because of the design of the old daemon of the v0.* series. There the daemon was implemented using `celery` which would spawn workers, by forking the main process. Since the definition of forking process means "cloning a process with the exact same state", the forked process would have the same session which is undesirable. This is why the session has to be reset such that an independent one has to be initialized. With the new engine design, there is no forking anymore by the daemon and it is also not supported. Instead, each daemon worker is launched as a completely new process and so will not start with an existing session. Finally, the `multiprocessing.recreate_after_fork` is not even documented and according to `https://bugs.python.org/issue21372` this is for a reason, because it is not intended for external use. Combined with the fact that its cross-platform consistent operation is pulled into question, it is better to not use it.
9bc6f85
to
3c304e9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cheers @sphuber!
Note for later: We should add good documentation about threat handling with AiiDA SQLAlchemy sessions / use of QueryBuilder. This is especially important for REST API frameworks and servers, where multi-threading is commonplace.
Fixes #3645 and fixes #3372