Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Files endpoint #360

Merged
merged 36 commits into from
Jun 2, 2022
Merged

Files endpoint #360

merged 36 commits into from
Jun 2, 2022

Conversation

merkys
Copy link
Member

@merkys merkys commented Jun 8, 2021

In #211 the need to provide links to files from entries has been discussed. @rartino proposed introducing files endpoint as the most universal way to deal with files, recycling JSON API relationships to provide links between files and other types of entries.

Differences from suggestions in #211:

  • url attribute is merely a string instead of JSON API links object, as I cannot think of a use case where the links object would be of use;
  • I did not formalize means to describe link attributes between other entries and files. This could be done later on in other PRs.

Closes #211. Addresses the original intent of #343 superseding it.

Edit: url redefined as JSON API links object.

@merkys merkys added the PR/ready-for-review Add this flag if you are the author of the PR and you want it to be reviewed. Remove it when editing label Jun 8, 2021
Copy link
Member

@CasperWA CasperWA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A single comment for now. I'll do another review later with pedantic syntax corrections when we all start to agree that this should go in.

Great effort on this though - and I'm not at all against an explicit /files endpoint. It makes it more specific and clear where to place these kind of entries - instead of in a meta or as a links value... Although, the latter there might be a solution if a single file represents any one entry (e.g., a CIF file for a structure, or a single paper for a reference).

Finally, for completeness I'd like to keep the JSON API philosophy of keeping all links or rather URL-related values as a mix of a string, a Links object, and null. For the specific attribute url here, it's fine to not allow null, but it doesn't make sense to me to disallow the use of a Links object - other than just because. It might be that one wants to specify a meta value for the links that this links only works locally on a specific computer (which is very weird) or that the link points to an external resource. E.g., I could think of a case where a structure in an AiiDA DB comes from COD, so the file is kept and the url can then both point to the COD link and state as much.

To counter-argue my last point, one could say that this information could also be put directly into the files-entry's meta field (that's at the same level as id, type, attributes, etc.)

optimade.rst Outdated Show resolved Hide resolved
@merkys
Copy link
Member Author

merkys commented Jun 8, 2021

A single comment for now. I'll do another review later with pedantic syntax corrections when we all start to agree that this should go in.

Thanks!

Great effort on this though - and I'm not at all against an explicit /files endpoint. It makes it more specific and clear where to place these kind of entries - instead of in a meta or as a links value... Although, the latter there might be a solution if a single file represents any one entry (e.g., a CIF file for a structure, or a single paper for a reference).

I agree that single file could be linked more easier, but I prefer more homogeneous approach.

Finally, for completeness I'd like to keep the JSON API philosophy of keeping all links or rather URL-related values as a mix of a string, a Links object, and null. For the specific attribute url here, it's fine to not allow null, but it doesn't make sense to me to disallow the use of a Links object - other than just because. It might be that one wants to specify a meta value for the links that this links only works locally on a specific computer (which is very weird) or that the link points to an external resource. E.g., I could think of a case where a structure in an AiiDA DB comes from COD, so the file is kept and the url can then both point to the COD link and state as much.

To counter-argue my last point, one could say that this information could also be put directly into the files-entry's meta field (that's at the same level as id, type, attributes, etc.)

My motivation for describing url as string is both the lack of a use case for URL-less file and the wish to promote stuff from meta to attributes. But I understand the wish to be homogeneous here too (BTW, references.url is also defined as a string to remain close to BibTeX).

@ml-evs
Copy link
Member

ml-evs commented Jun 8, 2021

Looks like the CI was initially failing here due to the Fastly outage (which impacted PyPI), everything seems okay now

rartino
rartino previously requested changes Jul 7, 2021
Copy link
Contributor

@rartino rartino left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, except for some URL adjustments and a suggested addition.

Also a question: what is it in the present specification that says which entry types/endpoints are optional to implement? Based on what in the specification can I presently decide not to implement the /files endpoint?

optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
@ml-evs ml-evs modified the milestone: v1.2 Jul 7, 2021
Copy link
Member

@ml-evs ml-evs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for writing this up @merkys, we should definitely add this functionality to OPTIMADE somehow.

I have two general comments (with more specific comments below):

  1. It looks like the aim isn't really to allow querying on a files endpoint, so perhaps we could simplify this by just adding the fields described here into the meta of a "special" kind of JSON:API link entry? That way, instead of needing to define a new entry type (with corresponding endpoints), files can be served as JSON:API links (per structure entry), rather than the relationships/included route.
  2. If we decide to go ahead with this approach (rather than my point above), does that make a /files endpoint mandatory? I would rather not maintain a database of files but instead serve everything on-the-fly (I think this would be a common use case for people with computational data). In that case, my /files endpoint might return no data, but a particular file id /files/structure/1.cif might have a response (where the file ID is structure/odbx/1.cif)...

optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
@merkys
Copy link
Member Author

merkys commented Sep 22, 2021

I have two general comments (with more specific comments below):

  1. It looks like the aim isn't really to allow querying on a files endpoint, so perhaps we could simplify this by just adding the fields described here into the meta of a "special" kind of JSON:API link entry? That way, instead of needing to define a new entry type (with corresponding endpoints), files can be served as JSON:API links (per structure entry), rather than the relationships/included route.

I think we have discussed this approach in #211 (it was my initial suggestion), and the consensus there was that /files endpoint would be more homogeneous with the rest of the specification.

  1. If we decide to go ahead with this approach (rather than my point above), does that make a /files endpoint mandatory? I would rather not maintain a database of files but instead serve everything on-the-fly (I think this would be a common use case for people with computational data). In that case, my /files endpoint might return no data, but a particular file id /files/structure/1.cif might have a response (where the file ID is structure/odbx/1.cif)...

I see your point in not serving the /files endpoint (entry listing might just be pointless). However, they seem mandatory as per the specification:

Entry listing endpoints return a list of resource objects representing entries of a specific type.

"Entry listing" endpoint response dictionaries MUST have a :field:data key.
The value of this key MUST be a list containing dictionaries that represent individual entries.

On the other hand, if served on-the-fly, entry listing could be easily populated with just minimal data:

{
    "id": "1.cif",
    "type": "files",
    "attributes": {
        "url": "/files/structure/1.cif"
    }
}

@sauliusg
Copy link
Contributor

Just a thought – it is usually very useful when files in any file system have as a subset attributes mandated by POSIX; in particular atime, mtime, ctime. Should we maybe include them?

@merkys
Copy link
Member Author

merkys commented Sep 27, 2021

Just a thought – it is usually very useful when files in any file system have as a subset attributes mandated by POSIX; in particular atime, mtime, ctime. Should we maybe include them?

I can in principle introduce provisions for optional atime, mtime, ctime fields.

@merkys
Copy link
Member Author

merkys commented Nov 22, 2021

@CasperWA @ml-evs @rartino @sauliusg Are there any blockers left for this PR? Please let me know if I did not address any of your comments.

I think standardizing of the way we link in files is of relatively high priority. For example, we at the COD very much would want to provide CIF files for the COD entries, since CIF files are "the ultimate source of truth" for the crystal data at the COD.

@ml-evs
Copy link
Member

ml-evs commented Jan 24, 2022

A comment arising from a discussion I just had with @CasperWA: how do we see this endpoint interacting with the proposed #364 endpoint? In my mind, this endpoint already fulfills many of the criteria we discussed at the workshop for /archives, we might just need a way to codify that a particular files entry points to an OPTIMADE archive type, and a way to add a relationship between the resource and the database. Maybe this could be done via a relationships field in the info endpoint of the database.

Do we feel that this is overloading/missing the point of the current files endpoint design?

@merkys
Copy link
Member Author

merkys commented Jan 24, 2022

@ml-evs, @CasperWA: I gave #364 a glance, and to me /files and /archives entry types seem to be largely overlapping. However, in /archives, file tree hierarchy seems to be important, whereas in /files it is not mandated, as the organization of actual files is left for the provider to decide on. Maybe this could indeed be represented using relationships.

Additionally, /archives has means to specify licenses. I see communication of licenses as an issue on its own (I opened #102 quite some time ago), and would be happy to see license becoming a property supported by all entry types.

optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
@JPBergsma
Copy link
Contributor

During the discussions about the trajectories, a point was raised that it could be a good idea to include data about the "role" of a file.
A file could for example be an input structure, an output structure, or a file with settings used to do the calculation.
We thought it would therefore be good to somehow indicate this "role" a file has. This does not necessarily have to be done at the files endpoint. We may also be able to do this in the relationships field.

I was wondering whether you prefer to discuss this as part of the Files endpoint PR or you would prefer if I open a seperate Issue for this.

@merkys
Copy link
Member Author

merkys commented Jun 1, 2022

@JPBergsma I see the "role" as a relationship attribute. Thus I think "role" belongs in meta. If there is a need to standardize on that, I would suggest opening a separate PR (or amend #24 or #34) after current PR is merged.

Copy link
Contributor

@rartino rartino left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
PR/ready-for-review Add this flag if you are the author of the PR and you want it to be reviewed. Remove it when editing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Links to files
6 participants