Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Centralize get_session() for backend QueryBuilders #3676

Closed

Conversation

CasperWA
Copy link
Contributor

Fixes #3645

Extend the current reset backend functionality to also be valid for the django backend.
Centralize common static methods in Backend and BackendQueryBuilder.

Extend the current reset backend functionality to also be valid for the
django backend.
Centralize common static methods.
@abc.abstractmethod
def reset_backend_environment(self):
@staticmethod
def reset_backend_environment():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't be removed from the django specific implementation. This method is for actually resetting the entire database environment, i.e. what is done by _load_backend_environment. The fact that we use SqlAlchemy for the query builder even for the Django environment is an implementation detail of the query builder not the whole database backend as a whole. So this is also not the correct place to reset the SqlA session for the query builder.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method never existed for django, but was empty. But you're saying that having the method actually doing something for the django backend is wrong?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And whether I reset the session for the QB or not shouldn't matter? If I want to reset the session, for any reason, I guess the place to do this should indeed be on the backend level, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method is intended to reset the database environment for that specific backend. This is necessary if you want to switch profile in the same interpreter. When you load a profile, as soon as the database backend is needed the _load_backend_environment is called. In the case of Django this will populate the aiida.backends.djsite.settings module with the correct database connection parameters and call django.setup. If you switch profile with another database, you have to "undo" i.e. reset this and do it again for the new profile settings. For SqlAlchemy this loading and resetting is simply the setting and resetting of the session.

Currently, the switching of profiles is not yet supported, which is why reset_backend_environment is not yet implemented for Django, because I haven't looked into it yet on how to do it. However, that doesn't mean it should be used for resetting the SqlAlchemy session when the querybuilder wants to. This is an implementation detail of the query builder and so should be performed there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right! Thank you for the detailed explanation. I will find a work-around, specifically for the QB.

@CasperWA
Copy link
Contributor Author

Instead of resetting the backend, calling the function aiida.backends.sqlalchemy.reset_session() also works for both backends. Calling aiida.backends.sqlalchemy.recreate_after_fork() also works, but only for SQLA.
reset_session() creates a new engine and a new session.
recreate_after_fork() disposes the engine and starts a new session. Since this function does not check if the global ENGINE is set or not, this runs into an error for the django backend, where ENGINE may not always be set, apparently.

@CasperWA CasperWA changed the title Reset backend functionality for django Centralize get_session() for backend QueryBuilders Dec 16, 2019
@giovannipizzi
Copy link
Member

Just to be sure (I'm not sure it's clear): the sole purpose of recreate_after_fork() is to be called as a callback when the process is forked (e.g. by a demoniser - this was the case with celery), here:

https://github.com/aiidateam/aiida-core/blob/0644e7d9ff492fdb2c94d608c575c697daee6d32/aiida/backends/sqlalchemy/__init__.py#L78

I don't think it should be (re)used for other purposes.

By the way, I realise now that probably:

  1. We should dispose of the passed engine, and not the global one
  2. When switching backend, probably we should deregister the callback if we dispose of the engine?

I'm adding this comments also to #2813

My other comment is: should we really reuse logic from the sqlalchemy backend, importing it from the django code? Is this what we do elsewhere?
I think that, following the indication of @sphuber, probably it's better to have a specific reset function (anyway it should be of a few lines only) directly in the Django backend, that knows what is set and what is not in the case of Django (so e.g. it does not crash because ENGINE is not set etc.)?

@CasperWA
Copy link
Contributor Author

Just to be sure (I'm not sure it's clear): the sole purpose of recreate_after_fork() is to be called as a callback when the process is forked (e.g. by a demoniser - this was the case with celery), here:

https://github.com/aiidateam/aiida-core/blob/0644e7d9ff492fdb2c94d608c575c697daee6d32/aiida/backends/sqlalchemy/__init__.py#L78

I don't think it should be (re)used for other purposes.

That makes sense. I also could not understand why you passed an engine without using it.

By the way, I realise now that probably:

1. We should dispose of the passed engine, and not the global one

Yup.

2. When switching backend, probably we should deregister the callback if we dispose of the engine?

My other comment is: should we really reuse logic from the sqlalchemy backend, importing it from the django code? Is this what we do elsewhere?

For the QueryBuilder, yes. Since it is based on SQLA, this makes sense in that context. There is no django-specific thing that needs to be done, it is indeed only the QB-used session(s), and attachments to it that needs to be reset.
Since it is so deeply grained into the code base to use the global scoped session for everything SQLA, it is difficult to generally split up these concepts, I think.
I tried initially doing the "session-injection", but it proved difficult to handle all use-cases due to this exact problem.

I think that, following the indication of @sphuber, probably it's better to have a specific reset function (anyway it should be of a few lines only) directly in the Django backend, that knows what is set and what is not in the case of Django (so e.g. it does not crash because ENGINE is not set etc.)?

That's fine, but know that it will be the exact same as for the SQLA backend, since again, it is the SQLA that is used for the QB.

@CasperWA
Copy link
Contributor Author

I think that, following the indication of @sphuber, probably it's better to have a specific reset function (anyway it should be of a few lines only) directly in the Django backend, that knows what is set and what is not in the case of Django (so e.g. it does not crash because ENGINE is not set etc.)?

That's fine, but know that it will be the exact same as for the SQLA backend, since again, it is the SQLA that is used for the QB.

The reason why this still may make sense, is for conceptual reasons.
Handling QB-specific circumstances outside the realm of QB is not intuitive, hence, having a method or flag in the QB (or even the backend) to do the same thing as what is done when calling reset_session() would be more intuitive.

@sphuber
Copy link
Contributor

sphuber commented Dec 17, 2019

I think design wise we should take a step back and see what the various requirements are.

  1. We need an ORM to map our database schema's and provide with an engine to connect to said database. This is what our Backend layer does.
  2. On top of the ORM we have built the QueryBuilder. Just as an implementation detail this uses SqlAlchemy to map the query onto SQL for both database backends. For this purpose it needs an SqlAlchemy engine to connect to the database and perform the query. When using the SqlA backend, we can simply use the same engine we use normally, however, for Django we need to construct a special one. Here we should not use the SqlA infrastructure that is used for its generic database environment.

If the QueryBuilder needs an SqlA session independent of the backend and it is only used there, it should be close to the QueryBuilder. Potentially it should have an internal get_session method that is then implemented in the backend specific implementations. For SqlA it just uses the one it already has, the Django one will have to construct one, but I don't think it should just go and use the SqlA database backend code.

Now I remember that having the QueryBuilder be responsible of creating and handling the session was problematic. If that is true, it might be better to use dependency injection and insert it into the query builder instance. The caller will then manage the lifetime. If this is the better design we should simply have some code (independent of the normal database environment) to manage SqlA sessions with the sole purpose of injecting it into query builder instances.

We can discuss and work this further out in person if you like in case I have missed any crucial limitations

@CasperWA
Copy link
Contributor Author

@sphuber, you're hitting some issues we have already turned over several times, and things both @szoupanos, @giovannipizzi and I have tried to consider and test.

To me, it seems it indeed depends on a question of design. Having QB handle a session/having the user handle it, is not as easy as it seems, due to the intricacies of how AiiDA otherwise uses sessions.
The easiest way to meet all requirements, it seems to me, is to replicate the code needed for the django QB in the django backend. It will be similar (if not exactly equal to) the current code in the SQLA backend, but at least they will conceptually and "import"-wise be completely separated (at this point). It should be noted that this was not the case previously, where the django QB would indeed fetch some SQLA-backend code to get a session.

So in the end, I agree, let's sit down and go over how it may be done, meeting all of our various requirements.

@CasperWA CasperWA changed the title Centralize get_session() for backend QueryBuilders [WIP] Centralize get_session() for backend QueryBuilders Dec 24, 2019
@sphuber
Copy link
Contributor

sphuber commented Dec 24, 2019

@CasperWA just FYI I have a working branch that tackles this problem and others. I haven't opened a PR yet because I am not sure yet about certain things, but maybe I should already open it to start discussion?

@CasperWA
Copy link
Contributor Author

@CasperWA just FYI I have a working branch that tackles this problem and others. I haven't opened a PR yet because I am not sure yet about certain things, but maybe I should already open it to start discussion?

Would be good. It would already supersede this PR and may also supersede @szoupanos' #3655.

In any case, it would be good to have the diff online to be able to discuss.

@sphuber
Copy link
Contributor

sphuber commented Jan 7, 2020

Closing in favor of #3708

@sphuber sphuber closed this Jan 7, 2020
@CasperWA CasperWA deleted the fix_3645_reset_db_and_centralize branch January 10, 2020 10:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Sessions not handled correctly in QB for asynchronous function calls
3 participants