Create restapi endpoint for counting full_types #4277
Codecov Report
@@ Coverage Diff @@
## develop #4277 +/- ##
===========================================
+ Coverage 79.41% 79.48% +0.08%
===========================================
Files 482 482
Lines 35287 35326 +39
===========================================
+ Hits 28020 28076 +56
+ Misses 7267 7250 -17
For reference: I just tried this with the 2D materials database and the results seem to indicate that the time of the The numbers remained mostly the same for django or sqlalchemy backends. I'll see if I can try some bigger database. |
Ok, I was checking on the status of this and noticed the work in progress (WIP) tag set by @CasperWA . I'm not sure exactly what would be interpreted as a WIP but just in case let me clarify the current situation:
@ramirezfranciscof Thank you Francisco for clarifying this. |
I've tested the new endpoint on the AiiDA databases used in Materials Cloud: it's very fast for small databases, and for the biggest ones (up to ~1.4 million nodes) it takes around 3 seconds. @giovannipizzi which level of specification do we want to show in the statistics pie chart? Is it ok to have the namespaces (or labels) of the leaves, i.e. namespaces that do not have subspaces? Does it make sense to include this information in the |
Thanks! I think it's fast enough. @asle85 - I don't know, I think we need to check a couple of examples together to decide. If this PR gives enough info to create any pie chart anyway, I suggest we move on and merge it, and we can discuss what to show in the pie chart in an issue on the issue tracker of one of the Materials Cloud repositories. |
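For the pie chart question, here is a minimal sketch of how the leaves (namespaces without subspaces) could be pulled out of the counted tree. It assumes the response has the nested `label` / `counter` / `subspaces` layout shown in the test data of this PR; the `iter_leaves` helper and the trimmed-down `example` dict are hypothetical and only for illustration.

```python
def iter_leaves(namespace):
    """Yield (label, counter) for every namespace that has no subspaces."""
    subspaces = namespace.get('subspaces', [])
    if not subspaces:
        yield namespace['label'], namespace['counter']
        return
    for subspace in subspaces:
        yield from iter_leaves(subspace)


# Hypothetical, trimmed-down response used only for illustration.
example = {
    'label': 'node',
    'counter': 3,
    'subspaces': [
        {'label': 'Data', 'counter': 2, 'subspaces': [
            {'label': 'Dict', 'counter': 2, 'subspaces': []},
        ]},
        {'label': 'Process', 'counter': 1, 'subspaces': [
            {'label': 'CalcJobNode', 'counter': 1, 'subspaces': []},
        ]},
    ],
}

leaves = list(iter_leaves(example))
```

Whether the leaves are the right granularity for the chart is still the open question; the sketch only shows that the tree carries enough information to build either view.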
Ok, good; I now just need someone to review the code and I'll merge. I requested a review from @sphuber, but he may be busy with other stuff, so feel free to request one from another person or even do it yourself if you have the time @giovannipizzi. |
@flavianojs I cannot add you as a reviewer, not sure why (maybe you need to be part of the aiidateam and you are not?), but yeah, as we told you, feel free to take a look and comment. |
I'll look through this tomorrow @ramirezfranciscof, pester me if you don't hear anything lol |
Hey @chrisjsewell : friendly reminder if you can check this. |
Yep it generally looks fine to me 👍 "Ideally", I'd say re-write the test file as pytest (using the fixtures), then just use pytest-regressions to validate the whole JSON blob, since it's all deterministic. But that's a lot of hassle! But, at the very least, check that the top-level counter is 9:

    data = {
        'counter': 9,
        'full_type': 'node.%|',
        'label': 'node',
        'namespace': 'node',
        'path': 'node',
        'subspaces': [{
            'counter': 6,
            'full_type': 'data.%|',
            'label': 'Data',
            'namespace': 'data',
            'path': 'node.data',
            'subspaces': [{
                'counter': 1,
                'full_type': 'data.array.%|',
                'label': 'array',
                'namespace': 'array',
                'path': 'node.data.array',
                'subspaces': [{
                    'counter': 1,
                    'full_type': 'data.array.kpoints.KpointsData.|',
                    'label': 'KpointsData',
                    'namespace': 'kpoints',
                    'path': 'node.data.array.kpoints',
                    'subspaces': []
                }]
            }, {
                'counter': 1,
                'full_type': 'data.cif.CifData.|',
                'label': 'CifData',
                'namespace': 'cif',
                'path': 'node.data.cif',
                'subspaces': []
            }, {
                'counter': 2,
                'full_type': 'data.dict.Dict.|',
                'label': 'Dict',
                'namespace': 'dict',
                'path': 'node.data.dict',
                'subspaces': []
            }, {
                'counter': 1,
                'full_type': 'data.folder.FolderData.|',
                'label': 'FolderData',
                'namespace': 'folder',
                'path': 'node.data.folder',
                'subspaces': []
            }, {
                'counter': 1,
                'full_type': 'data.structure.StructureData.|',
                'label': 'StructureData',
                'namespace': 'structure',
                'path': 'node.data.structure',
                'subspaces': []
            }]
        }, {
            'counter': 3,
            'full_type': 'process.%|%',
            'label': 'Process',
            'namespace': 'process',
            'path': 'node.process',
            'subspaces': [{
                'counter': 3,
                'full_type': 'process.calculation.%|%',
                'label': 'Calculation',
                'namespace': 'calculation',
                'path': 'node.process.calculation',
                'subspaces': [{
                    'counter': 1,
                    'full_type': 'process.calculation.calcfunction.CalcFunctionNode.|',
                    'label': 'CalcFunctionNode',
                    'namespace': 'calcfunction',
                    'path': 'node.process.calculation.calcfunction',
                    'subspaces': []
                }, {
                    'counter': 2,
                    'full_type': 'process.calculation.calcjob.CalcJobNode.|',
                    'label': 'CalcJobNode',
                    'namespace': 'calcjob',
                    'path': 'node.process.calculation.calcjob',
                    'subspaces': []
                }]
            }]
        }]
    }
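As a middle ground between checking only the top-level counter and a full regression fixture, a test could verify that the counters are internally consistent. The sketch below assumes, as holds in the expected data above, that a branch contains no nodes of its own, so its counter equals the sum of its subspaces. The `check_counters` helper is hypothetical, and the `data` dict here is a shortened stand-in, not the real expected output.

```python
def check_counters(namespace):
    """Return the counter of ``namespace`` after validating all its branches.

    Assumption: every branch counter equals the sum of its subspace counters.
    """
    subspaces = namespace['subspaces']
    if subspaces:
        total = sum(check_counters(subspace) for subspace in subspaces)
        assert namespace['counter'] == total, namespace['path']
    return namespace['counter']


# Shortened, hypothetical stand-in for the expected response.
data = {
    'counter': 9,
    'path': 'node',
    'subspaces': [
        {'counter': 6, 'path': 'node.data', 'subspaces': []},
        {'counter': 3, 'path': 'node.process', 'subspaces': [
            {'counter': 3, 'path': 'node.process.calculation', 'subspaces': [
                {'counter': 1, 'path': 'node.process.calculation.calcfunction', 'subspaces': []},
                {'counter': 2, 'path': 'node.process.calculation.calcjob', 'subspaces': []},
            ]},
        ]},
    ],
}

top_level = check_counters(data)
```

The assertion message carries the `path` of the offending branch, which makes a mismatch easier to locate than a diff of the whole blob.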
aiida/restapi/common/identifiers.py (outdated)
    if process_type == '':
        builder.append(orm.Data, filters={'node_type': {'==': node_type}})
    else:
        builder.append(
there should also really be a test for this scenario
See comment
@ramirezfranciscof Since we are adding a new endpoint, it makes sense to release a new minor version of the REST API, i.e. change the version to 4.1.0 in the config. This way we can detect from the version number whether the endpoint will be there or not. |
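A rough sketch of the client-side check this would enable. It assumes the REST API version is available to the client as a plain dotted string such as `'4.1.0'` and that the endpoint ships with 4.1.0 as proposed here; the function name is hypothetical, and how the version string is obtained is left out.

```python
def has_counting_endpoint(api_version):
    """Return True if ``api_version`` (e.g. '4.1.0') is at least 4.1.0.

    Assumption: the version is a purely numeric 'major.minor.patch' string.
    """
    parts = tuple(int(part) for part in api_version.split('.')[:3])
    return parts >= (4, 1, 0)
```

Comparing integer tuples instead of raw strings avoids the usual pitfall where `'4.10.0'` would sort before `'4.9.0'`.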
Thanks for the rebase @ramirezfranciscof that helps a lot. I will start going through it now. First question that I have now straight away: can you give me examples of the process nodes that have an empty process type string and those that have |
There should be calcfunctions with EDIT: I think it might be this one for the |
aiida/restapi/common/identifiers.py (outdated)
        filters['process_type'] = {'==': process_type}
    else:
        filters['process_type'] = {'or': [{'==': ''}, {'==': None}]}
If we are going to account for process types that are either an empty string or `None`, then we might as well get rid of the conditional entirely and just always set `filters['process_type'] = {'==': process_type}`. This single line will have exactly the same effect as the current four lines. We should add a comment that currently (probably due to a bug in the migrations) the process type can be empty or `None` at all.
So, I don't think it would have the same effect because currently if `process_type` is either `''` or `None`, it will search for both `''` and `None`. Your change would make `None` just search for `None` and `''` just search for `''`.

The reason I grouped these together is that this function takes the `full_type` string (that looks like `node.type.Descriptor|process.type.descriptor`) and splits it into the `node_type` and the `process_type`. It is possible to pass an empty `process_type` in this way (`node.type.Descriptor|`) but it is not possible to do so with a null process type (as there is no simple way to distinguish between the `'None'` string and `None` in `node.type.Descriptor|None`).
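To make the grouping concrete, here is a simplified sketch of the split-and-filter logic described above. It is not the code in `identifiers.py`: the helper names are hypothetical, wildcard (`%`) handling is ignored, and the only parts taken from this PR are the `node_type|process_type` layout and the two filter forms from the diff.

```python
def split_full_type(full_type):
    """Split 'node_type|process_type' into its two parts."""
    node_type, _, process_type = full_type.partition('|')
    return node_type, process_type


def build_filters(full_type):
    """Return query filters for a full_type (simplified, no wildcards)."""
    node_type, process_type = split_full_type(full_type)
    filters = {'node_type': {'==': node_type}}
    if process_type:
        filters['process_type'] = {'==': process_type}
    else:
        # A missing process type matches both '' and None, because the
        # string encoding cannot tell the two apart.
        filters['process_type'] = {'or': [{'==': ''}, {'==': None}]}
    return filters
```

With this layout, `node.type.Descriptor|` is the single spelling for "no process type", and nodes stored with either `''` or `None` end up in the same bucket.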
So you are saying that the full types will look like `node.type|None` both when `process_type` is `''` as well as `None`? In that case, you are right that what I proposed is not the same. I am still worried about all these edge cases that really shouldn't exist and we should figure out how and why they came about. At the very least, you should add a comment here to explain that your additional clause is because there (erroneously) exist process nodes with a process_type equal to empty string or null.
> so you are saying that the full types will look like `node.type|None` both when process_type is `''` as well as `None`?

Nono, it will show `node.type|` both when it is `''` and when it is `None`. I mentioned the `node.type|None` example to illustrate that it is not easy to separate the `None` case from the `''` case because you need to encode this into a string (the `full_type`) and therefore you would then also need to distinguish between the null `None` and the string `'None'`.

> I am still worried about all these edge cases that really shouldn't exist and we should figure out how and why they came about. At the very least, you should add a comment here to explain that your additional clause is because there (erroneously) exist process nodes with a process_type equal to empty string or null

Ok, I can add that, and we could open an issue. I mean, it might be the case that this was a bug in some migrations that already got fixed, but nevertheless these nodes will exist in databases so I think it is not bad to be able to catch these in the most secure way possible.
Thanks @ramirezfranciscof . I have some suggestions and questions. I already reviewed the first commit in a different tab, but those don't show up in this review. I hope they won't get lost and show up...
Hey, just want to say I have some corrections that should go into these files as well. I have implemented them in the #4337 PR, but I haven't had time to update it yet, so just highlighting some things that may be interesting to implement here: Ah - and I see now I am missing uploading the stuff that touches on generating the node and process types. Since it's on my home desktop computer and I'm currently in the office, I'll have to push that tonight, if I remember. It just simplifies the creation of the Edit: If this information is completely out-of-scope of this PR, then please disregard it 😅 |
@CasperWA haha, well yes it is in scope, this is one of the changes I implemented here. Please do try to push the changes asap and ping me when you do; we were trying to get this PR merged for tomorrow's release. Do you remember what changes you made related to this that you could share now? |
I believe I simplified this function to a single f-string: aiida-core/aiida/restapi/common/identifiers.py Lines 64 to 77 in bd197f3
Remove the try-except here: aiida-core/aiida/restapi/translator/nodes/node.py Lines 566 to 569 in bd197f3
Using instead the .get method.
The latter change here relies on what Maybe you've already fixed this (didn't check, sorry)? But that's at least what I can remember for now :) I'll have to push what I have locally tonight otherwise. |
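For readers without the referenced lines at hand, the general shape of the second suggestion (dropping a try-except in favour of `.get`) looks roughly like this. This is a generic illustration on a plain dict, not the code in `node.py`; both function names and the `None` default are assumptions.

```python
def lookup_with_try(mapping, key):
    """Look up ``key``, falling back to None via try-except."""
    try:
        return mapping[key]
    except KeyError:
        return None


def lookup_with_get(mapping, key):
    """Same behaviour in one line: dict.get returns None for missing keys."""
    return mapping.get(key)
```

The two are only interchangeable when the except branch does nothing beyond supplying a default value.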
@CasperWA Ok, thanks, I see you've updated your branch. I think we are both dealing with the problem of atypical @sphuber @chrisjsewell ready for a new pass. |
@ramirezfranciscof I was more thinking of moving some of the changes over to this PR, not if the changes in each of the PRs are compatible with each other. But I'll leave it up to you and just rebase whenever, if it comes to that 👍 |
@CasperWA Yeah, we can also do that. I can check with @sphuber if he thinks it's better to do that: this one has to go today, so it will depend on how much work it would be to incorporate both together. For starters, we might need to rebase your branch over the first commit of this one (title "Further considerations for @chrisjsewell would you be able to give this a final pass before our meeting at 15? |
I think you misunderstand me. My PR should not be incorporated into this PR. But I implemented some minor fixes in my PR that may be relevant to get into this one. But it seems it will be too difficult, so let's keep it separate and I will update my PR accordingly. |
The `process_type` attribute has changed over the years; currently it must have some descriptor for processes and be None for data types. Apparently this has not always been the case, and thus old databases may have both data and process nodes with either empty strings ('') and/or None entries in their `process_type` attributes. Additionally, there were some problems with how the unregistered entry points were considered that made it impossible to query for them.

In order to consider all of this when filtering and doing statistics, it has been decided to:

1) Group all instances of a given node_type that have either '' or None as their process_type in the same `full_type` (`node_type|`) and hence always query for both when the `process_type` is missing.
2) Remove the `aiida.descriptor:` and the `no-entry-point` from the `process_type` part of unregistered processes. This was interfering when the `full_type` was given to return the filtering options to query for these processes.

Tests were adapted to cover these new compatibility aspects.
This feature returns a namespace tree of the available node types in the database (data node_types + process process_types) with the addition of a count at each leaf / branch. It also has the option of doing so for a single user, if the pk is provided as an option.
Thanks a lot @ramirezfranciscof
Fixes #4247
Fixes #4541
Work in progress: currently only the end leaves are counted, and not the container branches.
`aiida/restapi/api.py` and `aiida/restapi/common/utils.py` only add the new endpoint to relevant lists, and `aiida/restapi/resources.py` redirects the request to the new endpoint to an internal function.
That function (`get_namespace`) resides in `aiida/restapi/translator/nodes/node.py`, and is the same as the one requested for the `full_types` endpoint, except it has been adapted with an optional keyword to count the nodes. This in turn is passed to `get_node_namespace` inside of `aiida/restapi/common/identifiers.py`.
In `aiida/restapi/common/identifiers.py`, where the namespace class is defined, the initialization was adapted to include a counter, which receives a `None` value unless the option to count nodes was selected when calling `get_node_namespace`. This value gets returned when the `get_description` method is called (when requesting the endpoint).
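The counter behaviour described above can be sketched with a stripped-down stand-in for the namespace class. Everything here is an assumption made for illustration: the `Namespace` class, the `build_namespace` helper, its `count_nodes` keyword and the dotted-path input format are hypothetical and much simpler than the real code in `identifiers.py`. Unlike the work-in-progress state mentioned above, this sketch also accumulates counts on the branches.

```python
class Namespace:
    """Hypothetical, minimal stand-in for the namespace class."""

    def __init__(self, name, counter=None):
        self.name = name
        self.counter = counter  # stays None unless counting was requested
        self.subspaces = {}

    def get_description(self):
        """Return a nested dict with the counter of each namespace."""
        return {
            'namespace': self.name,
            'counter': self.counter,
            'subspaces': [sub.get_description() for sub in self.subspaces.values()],
        }


def build_namespace(leaf_counts, count_nodes=False):
    """Build a tree from input such as {'node.data.dict': 2}.

    Assumption: every path starts with the root name 'node'.
    """
    initial = 0 if count_nodes else None
    root = Namespace('node', counter=initial)
    for path, count in leaf_counts.items():
        current = root
        if count_nodes:
            root.counter += count
        for name in path.split('.')[1:]:
            if name not in current.subspaces:
                current.subspaces[name] = Namespace(name, counter=initial)
            current = current.subspaces[name]
            if count_nodes:
                current.counter += count
    return root


counts = {'node.data.dict': 2, 'node.data.cif': 1, 'node.process.calculation.calcjob': 2}
counted = build_namespace(counts, count_nodes=True).get_description()
plain = build_namespace(counts).get_description()
```

Keeping the counter as `None` by default means the existing `full_types` output shape does not change unless counting is explicitly requested.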