Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

♻️ REFACTOR: NodeRepositoryMixin -> NodeRepository #5439

Closed
wants to merge 3 commits into from

Conversation

chrisjsewell
Copy link
Member

@chrisjsewell chrisjsewell commented Mar 14, 2022

This PR forms part of addressing #4976 (comment) and related to https://en.wikipedia.org/wiki/Composition_over_inheritance,
and reduces the number of attributes/methods on aiida.orm.Node from 83 to 68.

Essentially, it deprecates the following methods (whilst still allowing them to be used):

Node.copy_tree
Node.delete_object
Node.get_object
Node.get_object_content
Node.glob
Node.list_objects
Node.list_object_names
Node.open
Node.put_object_from_filelike
Node.put_object_from_file
Node.put_object_from_tree
Node.walk
Node.repository_metadata  # -> Node.repo.metadata 

and instead places them under the Node.ctx.repository. "namespace", e.g. Node.ctx.repository.copy_tree.

Not only does this reduce the number of methods, it also makes it a lot clearer to users, the purpose of these methods.

All deprecated methods, can still be used as before, via "re-routing" in Node.__getattr__.
So as not to suddenly hit users with many deprecation warnings,
the aiida.common.warnings.warn_deprecation function has been implemented,
which only "activates" warnings when the environmental variable AIIDA_WARN_v3 is set to true or 1.
This implementation was inspired by https://docs.sqlalchemy.org/en/14/changelog/migration_20.html#migration-to-2-0-step-two-turn-on-removedin20warnings

Deprecation warnings show as e.g.

 tests/test_dataclasses.py::TestCifData::test_reload_cifdata
  /home/runner/work/aiida-core/aiida-core/tests/test_dataclasses.py:155: AiidaDeprecationWarning: `CifData.list_object_names` is deprecated, use `CifData.repo.list_object_names` instead. (this will be removed in v3)

Comment on lines -208 to -270
@override
def put_object_from_filelike(self, handle: io.BufferedReader, path: str):
"""Store the byte contents of a file in the repository.

:param handle: filelike object with the byte content to be stored.
:param path: the relative path where to store the object in the repository.
:raises TypeError: if the path is not a string and relative path.
:raises aiida.common.exceptions.ModificationNotAllowed: when the node is sealed and therefore immutable.
"""
self.check_mutability()

if isinstance(handle, io.StringIO):
handle = io.BytesIO(handle.read().encode('utf-8'))

if isinstance(handle, tempfile._TemporaryFileWrapper): # pylint: disable=protected-access
if 'b' in handle.file.mode:
handle = io.BytesIO(handle.read())
else:
handle = io.BytesIO(handle.read().encode('utf-8'))

self._repository.put_object_from_filelike(handle, path)
self._update_repository_metadata()

@override
def put_object_from_file(self, filepath: str, path: str):
"""Store a new object under `path` with contents of the file located at `filepath` on the local file system.

:param filepath: absolute path of file whose contents to copy to the repository
:param path: the relative path where to store the object in the repository.
:raises TypeError: if the path is not a string and relative path, or the handle is not a byte stream.
:raises aiida.common.exceptions.ModificationNotAllowed: when the node is sealed and therefore immutable.
"""
self.check_mutability()
self._repository.put_object_from_file(filepath, path)
self._update_repository_metadata()

@override
def put_object_from_tree(self, filepath: str, path: str = None):
"""Store the entire contents of `filepath` on the local file system in the repository with under given `path`.

:param filepath: absolute path of the directory whose contents to copy to the repository.
:param path: the relative path where to store the objects in the repository.
:raises TypeError: if the path is not a string and relative path.
:raises aiida.common.exceptions.ModificationNotAllowed: when the node is sealed and therefore immutable.
"""
self.check_mutability()
self._repository.put_object_from_tree(filepath, path)
self._update_repository_metadata()

@override
def delete_object(self, path: str):
"""Delete the object from the repository.

:param key: fully qualified identifier for the object within the repository.
:raises TypeError: if the path is not a string and relative path.
:raises FileNotFoundError: if the file does not exist.
:raises IsADirectoryError: if the object is a directory and not a file.
:raises OSError: if the file could not be deleted.
:raises aiida.common.exceptions.ModificationNotAllowed: when the node is sealed and therefore immutable.
"""
self.check_mutability()
self._repository.delete_object(path)
self._update_repository_metadata()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless I am missing something, none of these methods, actually changed anything from the ones they are overriding!?
Perhaps the check_mutability was meant to also check for whether the node is sealed, but that is certainly not the case at present?

@chrisjsewell
Copy link
Member Author

Note, I also need to update the documentation, but will wait until the changes are approved in principle to do that

def warn_deprecation(message: str, version: int, stacklevel=2) -> None:
"""Warns about a deprecation for a future aiida-core version.

Warnings are activated if the `AIIDA_WARN_v{major}` environment variable is set to `True`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does it work like this though? Why do we need an env variable to tell the code which deprecation warnings to print instead of just printing all of them?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the plugin developers and users can choose when they want to switch them on, usually when they get round to refactoring their code.
This is exactly how other libraries do their deprecations: https://docs.sqlalchemy.org/en/14/changelog/migration_20.html#migration-to-2-0-step-two-turn-on-removedin20warnings

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You "turn-on" the deprecations when running your test suite, then apply all the fixes, until there are no more warnings, job done.
This is exactly how I updated aiida-core to the sqlalchemy v2 API, and how I updated all the code in these PRs

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically, we don't want to be flooding users with loads of deprecation warnings for v3, as soon as they update to v2

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmm, I see. I agree that it is good to give them the option to silence them, but should they not be "on" by default? I would bet a lot of people may have AiiDA scripts that need update and are not necessarily being tested in this CI automated way. If we don't notify them by default they run the risk of only noticing this once the deprecated features are dropped and then they get errors instead of warnings.

On the other hand, why do we need to distinguish between versions for this (AIIDA_WARN_v{major})? Why not have just AIIDA_DEPRECATION_WARNINGS? At any given time there will only be one set of deprecations, why do we need the selectivity?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because they are only just upgrading to v2, and likely won't want to be hit with lots of deprecation warnings also for v3. These can be turned on by default, in a later v2 release, closer to a v3 release.

@sphuber
Copy link
Contributor

sphuber commented Mar 16, 2022

I am generally in favor of the idea of this PR and those that follow up in the same vein, however, there are a few questions and for simplicity will collect them here.

  • I agree with @ramirezfranciscof that we should be careful with hiding deprecation warnings. I understand that they can be annoying but they are there for a reason. If we opt for a solution where they are hidden by default, then we should still communicate this very clearly and (as already suggested) at a later stage in the v2.x release cycle make them show by default. I think the user base of AiiDA is not necessarily comparable to that of sqlalchemy and I agree with @ramirezfranciscof that those users won't be used to enabling deprecation warning manually through defining an environment variable.
  • If we decide to adopt the new deprecation warning system, I would propose we split that off from this PR and deal with that separately first and make sure it is used consistently across the codebase, ensuring that existing deprecations also adopt this new mechanism.
  • Regarding the name Node.attrs: it is indeed unfortunate that attributes already exist. I think it would be really valuable to be able to repurpose that and we should try if that is possible by trying to see how often attributes is used currently in existing plugins and by user code.
  • Same problem with Node.extras. I really don't like Node.xtras though. We should really think hard if there isn't an alternative.
  • Regarding the NodeRepository: on the one hand good to centralize these related methods under a single namespace. But I am wondering if we should tie this to the name "repo". In recent months we have been discussing how we can liberate some of the terminology from the implementation details of the current default storage backend. The fact that attributes are in the database and files are in a repository on disk is in a sense an implementation detail. For a user the more useful distinction would be "that what is queryable" and that "which isn't". Are there other ways of making this more clear perhaps also in the interface?
  • As a side note (I don't think any of your PRs actually do or propose this): I really don't think we should "hide" methods from users by prefixing them with underscores. If those methods should be callable from outside the class, they should be public. If they are not really intended for end-users, then so be it. The other approach is what was done in the very early stages of AiiDA and we have been moving away from it for, what I think, good reasons.

I think we should discuss these points, and any other that may be relevant, with the team and then come to a decision. Then we can move on with the implementation.

@chrisjsewell
Copy link
Member Author

chrisjsewell commented Mar 16, 2022

at a later stage in the v2.x release cycle make them show by default.

Exactly this was the point; it's going to be discouraging for users, updating from v1, if the have to work out too many changes at the same time.

For a user the more useful distinction would be "that what is queryable" and that "which isn't".

Two things to note here:

  1. I would say a ~parallel distinction is just JSONable data vs binary data, that's basically the difference.
  2. "Users" should never really be using attrs or repo; they are for plugin developers to implement data plugins with, then users should use the specific methods on a data class, e.g. using Int.value rather than Int.attrs.get('value'). (Obviously there is overlap though, when thinking about end users vs plugin developers)

@chrisjsewell
Copy link
Member Author

Note, Node.objects would maybe have been an ideal replacement for Node.repo, but again that is also already in use, bizarrely for NodeCollection, which makes no sense 🤷😒

@sphuber
Copy link
Contributor

sphuber commented Mar 17, 2022

Exactly this was the point; it's going to be discouraging for users, updating from v1, if the have to work out too many changes at the same time.

On the other hand, these changes would be relatively easy for aiida-upgrade to automate, so using that the effort should be minimal.

I would say a ~parallel distinction is just JSONable data vs binary data, that's basically the difference.

Yes, kind of, but the JSONable data could also be written to the file repository and then it is still not queryable. So if you maintain that distinction, it is not automatically clear what can be queried for through the QueryBuillder.

"Users" should never really be using attrs or repo; they are for plugin developers to implement data plugins with, then users should use the specific methods on a data class, e.g. using Int.value rather than Int.attrs.get('value'). (Obviously there is overlap though, when thinking about end users vs plugin developers)

I am not sure about this. Sure maybe for custom Data plugins, the developers of that would provide properties directly on the class to retrieve important attributes, but the various process nodes in aiida-core, CalcJobNode for example, store plenty of important data in their attributes and so very often users interact directly with the attribute methods to inspect and retrieve them. I don't think this is necessarily a bad thing either, especially as long you also still have to define attributes.some_key in the QueryBuilder to address them.

@chrisjsewell
Copy link
Member Author

especially as long you also still have to define attributes.some_key in the QueryBuilder to address them.

Well you won't after #5088

@chrisjsewell
Copy link
Member Author

On the other hand, these changes would be relatively easy for aiida-upgrade to automate

Well not so much. aiida-upgrade (and basically any refactor tool) uses static analysis to find what to change, it can't understand dynamic variables.
e.g. if you have:

whatever = load_node(1)
x = whatever.attributes["a"]

it would need to change this, currently, to:

whatever = load_node(1)
x = whatever.attrs.all["a"]

But it has no idea that whatever is a Node. So, I think, the only way to do this, would be to "blanket" convert every attributes attribute it finds to attrs.all.

I am generally in favor of the idea of this PR and those that follow up in the same vein

That's good though 😄
Just to clarify on my previous comments: I don't really "care" that much about the deprecation strategy and naming, as much as the general design change, that's the key here; I don't want to get bogged down in bikeshedding 😬

@chrisjsewell
Copy link
Member Author

@sphuber I've updated this PR, to comply with #5465, and split it into the additional commits

@sphuber
Copy link
Contributor

sphuber commented Apr 6, 2022

superseded by #5472

@sphuber sphuber closed this Apr 6, 2022
@chrisjsewell chrisjsewell deleted the orm-node-repo branch April 7, 2022 03:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants