Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor API endpoints for simpler architecture/design/implementation to maintain/extend #131

Closed
al-niessner opened this issue May 4, 2022 · 12 comments
Assignees
Labels

Comments

@al-niessner
Copy link
Contributor

💪 Motivation

Adding new grouping endpoints (like /bundles, /collections, and /products) like /observations requires adding 3 new endpoints and then to add the referencing endpoints are more endpoints (see #12 for more details on the math). Changes #11 #12 and #128 all require many endpoint alteration and additions. Changing the architecture/design to reduce the effort for these types of changes is the goal.

📖 Additional Details

⚖️ Acceptance Criteria

To add the new grouping observation, should not require any new endpoints.

⚙️ Engineering Details

@al-niessner al-niessner added enhancement New feature or request needs:triage requirement the current issue is a requirement sprint-backlog labels May 4, 2022
@jordanpadams jordanpadams added B13.0 and removed enhancement New feature or request labels May 5, 2022
@jordanpadams jordanpadams changed the title As a maintainer I want a simpler architecture/design/implementation to maintain/extend Refactor API endpoints for simpler architecture/design/implementation to maintain/extend May 9, 2022
@jordanpadams jordanpadams added enhancement New feature or request p.should-have and removed needs:triage requirement the current issue is a requirement labels May 9, 2022
@jordanpadams jordanpadams removed their assignment May 9, 2022
@al-niessner
Copy link
Contributor Author

@jordanpadams @tloubrieu-jpl

Question: Can one always tell a bundle from a collection from a product (not bundle or collection) by how many ':' separators it has? For instance is the naming always urn:nasa:pds:bundlename::ver while collections urn:nasa:pds:bundlename:collectionname::ver and products urn:nasa:pds:bundlename:collectionname:productname::ver?

Are there every any other prefixes.? Are there any instance where there does not hold?

@tloubrieu-jpl
Copy link
Member

Hi @al-niessner, That sounds like a question for @jordanpadams , but regarding the API/Registry design, I think we said in the past that we don't want to get any logic from the content of the ID because it is not the right way to design an ID in the first place.

@al-niessner
Copy link
Contributor Author

@tloubrieu-jpl

I agree that abusing the lidvid in to determine product class is not very robust or elegant. I do want to know if I can use to determine or infer hierarchy. For instance if the number of ':' separators is equal does that mean they are always siblings/cousins?

The problem I having trouble resolving is one that you posed already in #12 that I am going reiterate here in slightly different words:

Mission 800 lb Gorilla demands that they be able to access all non bundle and collection items with the endpoint /800LbGorilla. The enum in swagger supports this. The problem that is eluding me and causing me to waffle in the new design (it is still MVC etc but using interfaces to encapsulate functionality tighter) is how to do that. Do I change the interface to have an interface with must, must not, filter (and, and not, or)? Do I make a preset map that is <String, List> instead of <String,String>? Saying must not is less likely to need work in the future. Widening the map gets much harder after <String, List> and probably just means moving to must, must not, filter interface sooner rather than later.

I was trying to see if I could reliably hold the dam back with lidvid interpretation instead of moving to must, must not, filter interface.

How often are new product classes added? Has it been stable for 20 years despite numerous gorillas demanding change?

@al-niessner
Copy link
Contributor Author

@jordanpadams @tloubrieu-jpl

Can I get list of all known product_class values please?

@jordanpadams
Copy link
Member

@jordanpadams
Copy link
Member

@al-niessner

Question: Can one always tell a bundle from a collection from a product (not bundle or collection) by how many ':' separators it has? For instance is the naming always urn:nasa:pds:bundlename::ver while collections urn:nasa:pds:bundlename:collectionname::ver and products urn:nasa:pds:bundlename:collectionname:productname::ver?

You can infer what "type" of product it is with the number of colons. The Standard states:

urn:<national_agency>:<archiving_agency>:<bundle_id>:<collection_id>:<product_id>`

Depending on how you want to use that information, one thing to keep in mind per our conversation on changing LIDs when a product is versioned, productname may have synonymous collection LIDs, e.g. productname::1.0 can belong to urn:nasa:pds:bundlename:collectionname while productname::2.0can belong to urn:nasa:pds:bundlename:differentcollectionname. Per @tdddblog's recent implementation with a new alternate_ids field, this should be something we can now support with the registry.

NASA-PDS/registry-mgr#46

That may have gone down a rabbit hole you were not interested in, in which case, ignore that second half.

@tloubrieu-jpl
Copy link
Member

tloubrieu-jpl commented Jun 2, 2022

Hi @al-niessner , @jordanpadams

I am not finding the ticket where Al's full proposal for refactoring the URL path is (although I have pasted your proposal in a note of mine).
Sorry if it feels like I am ignoring some aspects of Al's proposal but I am trying to stick to some design principles that I am providing here and the ticket objective (and Al's proposal) was mainly focused on simplification of the registry API code which is important but not our main driver I think.

Here is my proposal, that we can discuss this afternoon:

Drivers for the design:
In this order of priority:

  1. Transparently shows the PDS4 model (as much as possible) so not to re-invent a PDS model and require to document new concepts specifically for the API (for example use relationships as specified in https://pds.nasa.gov/datastandards/documents/dd/current/PDS4_PDS_DD_1H00.html#N-2021801474)
  2. Follow the web restful API design principle and best practices (for example resources specified as nouns are preferred over verbs in the URL path)
  3. Refer to use cases to validate the design
  4. Have in mind a simple implementation which will help with the maintainability of the code and the readability of the behavior of the API

Search and identifier resolution:

For all products:
http://pds.nasa.gov/api/search/1.0/products/{id}

For specific groups of products (classes):
http://pds.nasa.gov/api/search/1.0/{class} as found in https://pds.nasa.gov/datastandards/documents/dd/current/PDS4_PDS_DD_1H00.html#d5e85, by taking the part after Product_ and making it lower case and plural, for example:

Note the list given in https://pds.nasa.gov/datastandards/documents/dd/current/PDS4_PDS_DD_1H00.html#d5e85 is sometimes not consistent in having nouns as the part after Product_ although most of them are nouns. Rest-ful apis prefers nouns (see https://restfulapi.net/resource-naming/ ). We could replace the adjective or verbs by related nouns:

  • browse --> browseable (anything better ?)
  • observational --> observation

in swagger specification we would only have this endpoint entry:
http://pds.nasa.gov/api/search/1.0/{class} where class is 'products' (anything) or other specific classes (softwares, bundles...)

Referencing:
Post used a reference: https://www.moesif.com/blog/technical/api-design/REST-API-Design-Best-Practices-for-Sub-and-Nested-Resources/

From any identifier resolution URL (for example http://pds.nasa.gov/api/search/1.0/observationals/{id} or http://pds.nasa.gov/api/search/1.0/bundles /{id}) we can get their references by roles of the association as shown on the diagram on page https://pds.nasa.gov/datastandards/documents/im/current/index_1H00.html

For example:

  • bundles has-member collections --> bundles/{id}/member-collections returns list of collection ids
  • collections has-member observationals --> collections/{id}/member-observationals returns list of observationals and documents
  • collections as-member documents --> collections/{id}/document-members
  • bundles has members collections have members observationals or documents --> /bundles/{id}/member-collections/member-observationals (but should we do that or rely on the user to do 2 requests instead of one, since I don't know the exact use case, I would lean toward the later)
  • observationals is-member-of collections --> observationals/{id}/owner-collections returns list of collection ids
  • documents is-member-of collections --> documents/{id}/owner-collections returns list of collection ids
  • collections is-member-of bundles--> collections/{id}/owner-bundles returns list of bundles id
  • observationals is-member-of collections is-member-of bundles --> observationals/{id}/owner-collections/owner-bundles (but should we do that or rely on the user to do 2 requests instead of one)

Alternate option is not to have instead of collection-members only members or instead of bundle-owners only owners . That would less explicit for the user but more simple and that might be obvious enough/documented elsewhere for the users.

products/{id} would not have any referencing besides the generic referencing that a product might have, could be for example Citation_Information which applies to any class of product (see https://pds.nasa.gov/datastandards/documents/dd/current/PDS4_PDS_DD_1H00.html#N-2006280352). Although we don't have specific entries with identifiers for dois in the registry yet.

@al-niessner
Copy link
Contributor Author

al-niessner commented Jun 2, 2022

@tloubrieu-jpl @jordanpadams

If you want a super long prefix like http://pds.nasa.gov/api/search/1.0/ that is fine. Maybe it is good to have its version as part of the URL for clarity too. I like search too but it is being used a verb here -- asking server to search not return a search that happened a year ago that is sitting in a file which is the noun.

What I strong disagree with is /{class}/{id} like /observation/{id} or /bundle/{id} because it means the user has to work at matching the class to the id given. Why? Given an id, we know the class or can look it up. Why make the user tell us? More importantly, what to do if they get it wrong? Why should it matter if they get it wrong? If my boss (use case) gives me an id and I do not know what it is, then I have to look it up first? I know you are shooting at something with making the user specify the type of id provided but cannot tell what.

All of your examples are simply a case of /{id}/references/{class/group} just spelled out instead of using enums. Makes it hard for me to understand the desire to spell them out. What is the advantage of spelling them out? What is the advantage of making the user specify the class/group of the id?

@al-niessner
Copy link
Contributor Author

@tloubrieu-jpl

Are you thinking /bundles/{id} and /observations/{id} as nesting?

The examples given a like customer id and account id. The interesting part of the examples for nesting is that an account cannot be a customer and a customer cannot be an account. Therefore nesting like /customers/{customer id}/accounts/{account id} make sense because you have two independent resources.

In the case of bundles, collections, observations, and products they are not independent so you have a single resource /products if you like but only if products always means everything and does not exclude bundles and collections.

Therefore you may have /bundles /collections ... but only /products/{id} since the id can specify any of the various types.

Maybe what you want is (I moved the version because it is the API that is versioned maybe not the groups and products):

  • ://pds.nasa.gov/api/1.0/groups/{group name} -- where group name is currently any product class but gives room for future requests when users figure out they are allowed to ask.
  • ://pds.nasa.gov/api/1.0/products/{id} -- same as latest
  • ://pds.nasa.gov/api/1.0/products/{id}/{all,latest}
  • ://pds.nasa.gov/api/1.0/products/{id}/references/{group name} -- same as latest
  • ://pds.nasa.gov/api/1.0/products/{id}/references/{group name}/{all,latest}
  • ://pds.nasa.gov/api/1.0/products/{id}/referencedby/{group name} -- same as latest
  • ://pds.nasa.gov/api/1.0/products/{id}/referencedby/{group name}/{all,latest}

Here you have the two resources groups (again today is equivalent to product class) and products (all entities in the data base) that can be identified with a lidvid.

The nested dcoument you site states many reasons not to do nested many of which apply to us: redundant endponts, multiple database queries, and long URLs. Those 3 cover us pretty well and with the changing URLs it pretty much says that nesting is not necessarily the way to go.

@tloubrieu-jpl
Copy link
Member

They way I was thinking we can go around nesting is the url like:
/collections/{id}/observational would return a list of ids instead of a list of observational product description but this is questionable if we look at the use case. If we support search criteria on these end-points that makes it a useful end-point.

I can understand that using enumeration in the path is convenient from a coding point of view but for example:
/products/{id}/references/collections it is not clear what that would mean depending on what the {id} is. Most of time that would be unrelevant.

Sorry if I was unclear also that http://pds.nasa.gov/api/search/1.0/ is the url where the registry API is deployed in production, so you can now request for example http://pds.nasa.gov/api/search/1.0/products or http://pds.nasa.gov/api/search/1.0/collections. We are having an API url scheme like http://pds.nasa.gov/api///...., this is detailed in https://nasa-pds.github.io/pds-api/overview.html . You are right search is a verb. I didn't thought of that.

@al-niessner
Copy link
Contributor Author

I agree that references is murky. /products/{id}/contained/collections and /products/{id}/contained_by/collections? Maybe just referencing for references is better?

You could do /products/{id}/groups/collections if you want it resource oriented but then how to describe the reverse? /groups/collection/products/{id}?

  • /products/{bundle id}/groups/collections - children
  • /groups/collections/products/{bundle id} - parents

generic

  • /products/{id}/groups/{group name} - same as id references
  • /groups/{group name}/products/{id} - same as id is referenced by

That is resource oriented but to me is not any clearer.

@jordanpadams
Copy link
Member

closed per #142

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants