♻️ REFACTOR: `NodeRepositoryMixin` -> `NodeRepository` #5439

chrisjsewell · 2022-03-14T02:24:37Z

This PR forms part of addressing #4976 (comment) and related to https://en.wikipedia.org/wiki/Composition_over_inheritance,
and reduces the number of attributes/methods on aiida.orm.Node from 83 to 68.

Essentially, it deprecates the following methods (whilst still allowing them to be used):

Node.copy_tree
Node.delete_object
Node.get_object
Node.get_object_content
Node.glob
Node.list_objects
Node.list_object_names
Node.open
Node.put_object_from_filelike
Node.put_object_from_file
Node.put_object_from_tree
Node.walk
Node.repository_metadata  # -> Node.repo.metadata

and instead places them under the Node.ctx.repository. "namespace", e.g. Node.ctx.repository.copy_tree.

Not only does this reduce the number of methods, it also makes it a lot clearer to users, the purpose of these methods.

All deprecated methods, can still be used as before, via "re-routing" in Node.__getattr__.
So as not to suddenly hit users with many deprecation warnings,
the aiida.common.warnings.warn_deprecation function has been implemented,
which only "activates" warnings when the environmental variable AIIDA_WARN_v3 is set to true or 1.
This implementation was inspired by https://docs.sqlalchemy.org/en/14/changelog/migration_20.html#migration-to-2-0-step-two-turn-on-removedin20warnings

Deprecation warnings show as e.g.

 tests/test_dataclasses.py::TestCifData::test_reload_cifdata
  /home/runner/work/aiida-core/aiida-core/tests/test_dataclasses.py:155: AiidaDeprecationWarning: `CifData.list_object_names` is deprecated, use `CifData.repo.list_object_names` instead. (this will be removed in v3)

chrisjsewell · 2022-03-14T02:32:20Z

aiida/orm/utils/mixins.py

-    @override
-    def put_object_from_filelike(self, handle: io.BufferedReader, path: str):
-        """Store the byte contents of a file in the repository.
-
-        :param handle: filelike object with the byte content to be stored.
-        :param path: the relative path where to store the object in the repository.
-        :raises TypeError: if the path is not a string and relative path.
-        :raises aiida.common.exceptions.ModificationNotAllowed: when the node is sealed and therefore immutable.
-        """
-        self.check_mutability()
-
-        if isinstance(handle, io.StringIO):
-            handle = io.BytesIO(handle.read().encode('utf-8'))
-
-        if isinstance(handle, tempfile._TemporaryFileWrapper):  # pylint: disable=protected-access
-            if 'b' in handle.file.mode:
-                handle = io.BytesIO(handle.read())
-            else:
-                handle = io.BytesIO(handle.read().encode('utf-8'))
-
-        self._repository.put_object_from_filelike(handle, path)
-        self._update_repository_metadata()
-
-    @override
-    def put_object_from_file(self, filepath: str, path: str):
-        """Store a new object under `path` with contents of the file located at `filepath` on the local file system.
-
-        :param filepath: absolute path of file whose contents to copy to the repository
-        :param path: the relative path where to store the object in the repository.
-        :raises TypeError: if the path is not a string and relative path, or the handle is not a byte stream.
-        :raises aiida.common.exceptions.ModificationNotAllowed: when the node is sealed and therefore immutable.
-        """
-        self.check_mutability()
-        self._repository.put_object_from_file(filepath, path)
-        self._update_repository_metadata()
-
-    @override
-    def put_object_from_tree(self, filepath: str, path: str = None):
-        """Store the entire contents of `filepath` on the local file system in the repository with under given `path`.
-
-        :param filepath: absolute path of the directory whose contents to copy to the repository.
-        :param path: the relative path where to store the objects in the repository.
-        :raises TypeError: if the path is not a string and relative path.
-        :raises aiida.common.exceptions.ModificationNotAllowed: when the node is sealed and therefore immutable.
-        """
-        self.check_mutability()
-        self._repository.put_object_from_tree(filepath, path)
-        self._update_repository_metadata()
-
-    @override
-    def delete_object(self, path: str):
-        """Delete the object from the repository.
-
-        :param key: fully qualified identifier for the object within the repository.
-        :raises TypeError: if the path is not a string and relative path.
-        :raises FileNotFoundError: if the file does not exist.
-        :raises IsADirectoryError: if the object is a directory and not a file.
-        :raises OSError: if the file could not be deleted.
-        :raises aiida.common.exceptions.ModificationNotAllowed: when the node is sealed and therefore immutable.
-        """
-        self.check_mutability()
-        self._repository.delete_object(path)
-        self._update_repository_metadata()


Unless I am missing something, none of these methods, actually changed anything from the ones they are overriding!?
Perhaps the check_mutability was meant to also check for whether the node is sealed, but that is certainly not the case at present?

chrisjsewell · 2022-03-14T04:10:37Z

Note, I also need to update the documentation, but will wait until the changes are approved in principle to do that

ramirezfranciscof · 2022-03-15T11:14:17Z

aiida/common/warnings.py

+def warn_deprecation(message: str, version: int, stacklevel=2) -> None:
+    """Warns about a deprecation for a future aiida-core version.
+
+    Warnings are activated if the `AIIDA_WARN_v{major}` environment variable is set to `True`.


Why does it work like this though? Why do we need an env variable to tell the code which deprecation warnings to print instead of just printing all of them?

So the plugin developers and users can choose when they want to switch them on, usually when they get round to refactoring their code.
This is exactly how other libraries do their deprecations: https://docs.sqlalchemy.org/en/14/changelog/migration_20.html#migration-to-2-0-step-two-turn-on-removedin20warnings

You "turn-on" the deprecations when running your test suite, then apply all the fixes, until there are no more warnings, job done.
This is exactly how I updated aiida-core to the sqlalchemy v2 API, and how I updated all the code in these PRs

Basically, we don't want to be flooding users with loads of deprecation warnings for v3, as soon as they update to v2

Mmm, I see. I agree that it is good to give them the option to silence them, but should they not be "on" by default? I would bet a lot of people may have AiiDA scripts that need update and are not necessarily being tested in this CI automated way. If we don't notify them by default they run the risk of only noticing this once the deprecated features are dropped and then they get errors instead of warnings.

On the other hand, why do we need to distinguish between versions for this (AIIDA_WARN_v{major})? Why not have just AIIDA_DEPRECATION_WARNINGS? At any given time there will only be one set of deprecations, why do we need the selectivity?

Because they are only just upgrading to v2, and likely won't want to be hit with lots of deprecation warnings also for v3. These can be turned on by default, in a later v2 release, closer to a v3 release.

sphuber · 2022-03-16T21:59:29Z

I am generally in favor of the idea of this PR and those that follow up in the same vein, however, there are a few questions and for simplicity will collect them here.

I agree with @ramirezfranciscof that we should be careful with hiding deprecation warnings. I understand that they can be annoying but they are there for a reason. If we opt for a solution where they are hidden by default, then we should still communicate this very clearly and (as already suggested) at a later stage in the v2.x release cycle make them show by default. I think the user base of AiiDA is not necessarily comparable to that of sqlalchemy and I agree with @ramirezfranciscof that those users won't be used to enabling deprecation warning manually through defining an environment variable.
If we decide to adopt the new deprecation warning system, I would propose we split that off from this PR and deal with that separately first and make sure it is used consistently across the codebase, ensuring that existing deprecations also adopt this new mechanism.
Regarding the name Node.attrs: it is indeed unfortunate that attributes already exist. I think it would be really valuable to be able to repurpose that and we should try if that is possible by trying to see how often attributes is used currently in existing plugins and by user code.
Same problem with Node.extras. I really don't like Node.xtras though. We should really think hard if there isn't an alternative.
Regarding the NodeRepository: on the one hand good to centralize these related methods under a single namespace. But I am wondering if we should tie this to the name "repo". In recent months we have been discussing how we can liberate some of the terminology from the implementation details of the current default storage backend. The fact that attributes are in the database and files are in a repository on disk is in a sense an implementation detail. For a user the more useful distinction would be "that what is queryable" and that "which isn't". Are there other ways of making this more clear perhaps also in the interface?
As a side note (I don't think any of your PRs actually do or propose this): I really don't think we should "hide" methods from users by prefixing them with underscores. If those methods should be callable from outside the class, they should be public. If they are not really intended for end-users, then so be it. The other approach is what was done in the very early stages of AiiDA and we have been moving away from it for, what I think, good reasons.

I think we should discuss these points, and any other that may be relevant, with the team and then come to a decision. Then we can move on with the implementation.

chrisjsewell · 2022-03-16T23:16:40Z

at a later stage in the v2.x release cycle make them show by default.

Exactly this was the point; it's going to be discouraging for users, updating from v1, if the have to work out too many changes at the same time.

For a user the more useful distinction would be "that what is queryable" and that "which isn't".

Two things to note here:

I would say a ~parallel distinction is just JSONable data vs binary data, that's basically the difference.
"Users" should never really be using attrs or repo; they are for plugin developers to implement data plugins with, then users should use the specific methods on a data class, e.g. using Int.value rather than Int.attrs.get('value'). (Obviously there is overlap though, when thinking about end users vs plugin developers)

chrisjsewell · 2022-03-16T23:22:06Z

Note, Node.objects would maybe have been an ideal replacement for Node.repo, but again that is also already in use, bizarrely for NodeCollection, which makes no sense 🤷😒

sphuber · 2022-03-17T06:54:43Z

Exactly this was the point; it's going to be discouraging for users, updating from v1, if the have to work out too many changes at the same time.

On the other hand, these changes would be relatively easy for aiida-upgrade to automate, so using that the effort should be minimal.

I would say a ~parallel distinction is just JSONable data vs binary data, that's basically the difference.

Yes, kind of, but the JSONable data could also be written to the file repository and then it is still not queryable. So if you maintain that distinction, it is not automatically clear what can be queried for through the QueryBuillder.

"Users" should never really be using attrs or repo; they are for plugin developers to implement data plugins with, then users should use the specific methods on a data class, e.g. using Int.value rather than Int.attrs.get('value'). (Obviously there is overlap though, when thinking about end users vs plugin developers)

I am not sure about this. Sure maybe for custom Data plugins, the developers of that would provide properties directly on the class to retrieve important attributes, but the various process nodes in aiida-core, CalcJobNode for example, store plenty of important data in their attributes and so very often users interact directly with the attribute methods to inspect and retrieve them. I don't think this is necessarily a bad thing either, especially as long you also still have to define attributes.some_key in the QueryBuilder to address them.

chrisjsewell · 2022-03-17T07:17:53Z

especially as long you also still have to define attributes.some_key in the QueryBuilder to address them.

Well you won't after #5088

chrisjsewell · 2022-03-17T14:37:27Z

On the other hand, these changes would be relatively easy for aiida-upgrade to automate

Well not so much. aiida-upgrade (and basically any refactor tool) uses static analysis to find what to change, it can't understand dynamic variables.
e.g. if you have:

whatever = load_node(1)
x = whatever.attributes["a"]

it would need to change this, currently, to:

whatever = load_node(1)
x = whatever.attrs.all["a"]

But it has no idea that whatever is a Node. So, I think, the only way to do this, would be to "blanket" convert every attributes attribute it finds to attrs.all.

I am generally in favor of the idea of this PR and those that follow up in the same vein

That's good though 😄
Just to clarify on my previous comments: I don't really "care" that much about the deprecation strategy and naming, as much as the general design change, that's the key here; I don't want to get bogged down in bikeshedding 😬

chrisjsewell · 2022-04-05T13:56:58Z

@sphuber I've updated this PR, to comply with #5465, and split it into the additional commits

This is a pre-cursor to moving methods off of the `Node`

sphuber · 2022-04-06T10:07:43Z

superseded by #5472

chrisjsewell requested review from sphuber and mbercx March 14, 2022 02:24

chrisjsewell mentioned this pull request Mar 14, 2022

👌 IMPROVE: Hide dev/non-public API methods #4976

Closed

chrisjsewell commented Mar 14, 2022

View reviewed changes

chrisjsewell mentioned this pull request Mar 14, 2022

♻️ REFACTOR: EntityAttributesMixin -> NodeAttributes #5442

Merged

chrisjsewell force-pushed the orm-node-repo branch 3 times, most recently from a6a257f to fc666e3 Compare March 14, 2022 22:14

chrisjsewell mentioned this pull request Mar 15, 2022

♻️ REFACTOR: EntityExtrasMixin -> EntityExtras #5445

Merged

ramirezfranciscof reviewed Mar 15, 2022

View reviewed changes

chrisjsewell mentioned this pull request Mar 15, 2022

Refactor AbstractRepositoryBackend with transaction/commit protocol #5451

Open

chrisjsewell force-pushed the orm-node-repo branch from fc666e3 to f6e8c0a Compare March 21, 2022 23:29

chrisjsewell added this to the v2.0.0 milestone Mar 22, 2022

chrisjsewell force-pushed the orm-node-repo branch from f6e8c0a to b9efc45 Compare April 5, 2022 13:55

chrisjsewell added 3 commits April 5, 2022 15:57

🔧 MAINTAIN: Add warn_deprecation function

38cfc51

🔧MAINTAIN: Add Node.ctx

079b3bf

This is a pre-cursor to moving methods off of the `Node`

♻️ REFACTOR: NodeRepositoryMixin -> NodeRepository

84058dd

chrisjsewell force-pushed the orm-node-repo branch from b9efc45 to 84058dd Compare April 5, 2022 13:57

sphuber closed this Apr 6, 2022

chrisjsewell deleted the orm-node-repo branch April 7, 2022 03:51

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

♻️ REFACTOR: `NodeRepositoryMixin` -> `NodeRepository` #5439

♻️ REFACTOR: `NodeRepositoryMixin` -> `NodeRepository` #5439

chrisjsewell commented Mar 14, 2022 •

edited

Loading

chrisjsewell Mar 14, 2022

chrisjsewell commented Mar 14, 2022

ramirezfranciscof Mar 15, 2022

chrisjsewell Mar 15, 2022

chrisjsewell Mar 15, 2022

chrisjsewell Mar 15, 2022

ramirezfranciscof Mar 16, 2022

chrisjsewell Mar 16, 2022

sphuber commented Mar 16, 2022

chrisjsewell commented Mar 16, 2022 •

edited

Loading

chrisjsewell commented Mar 16, 2022

sphuber commented Mar 17, 2022

chrisjsewell commented Mar 17, 2022

chrisjsewell commented Mar 17, 2022

chrisjsewell commented Apr 5, 2022

sphuber commented Apr 6, 2022

♻️ REFACTOR: NodeRepositoryMixin -> NodeRepository #5439

♻️ REFACTOR: NodeRepositoryMixin -> NodeRepository #5439

Conversation

chrisjsewell commented Mar 14, 2022 • edited Loading

chrisjsewell Mar 14, 2022

Choose a reason for hiding this comment

chrisjsewell commented Mar 14, 2022

ramirezfranciscof Mar 15, 2022

Choose a reason for hiding this comment

chrisjsewell Mar 15, 2022

Choose a reason for hiding this comment

chrisjsewell Mar 15, 2022

Choose a reason for hiding this comment

chrisjsewell Mar 15, 2022

Choose a reason for hiding this comment

ramirezfranciscof Mar 16, 2022

Choose a reason for hiding this comment

chrisjsewell Mar 16, 2022

Choose a reason for hiding this comment

sphuber commented Mar 16, 2022

chrisjsewell commented Mar 16, 2022 • edited Loading

chrisjsewell commented Mar 16, 2022

sphuber commented Mar 17, 2022

chrisjsewell commented Mar 17, 2022

chrisjsewell commented Mar 17, 2022

chrisjsewell commented Apr 5, 2022

sphuber commented Apr 6, 2022

♻️ REFACTOR: `NodeRepositoryMixin` -> `NodeRepository` #5439

♻️ REFACTOR: `NodeRepositoryMixin` -> `NodeRepository` #5439

chrisjsewell commented Mar 14, 2022 •

edited

Loading

chrisjsewell commented Mar 16, 2022 •

edited

Loading