[DS-4301] Added Content Reports section and Filtered Collections report therein #202

jeffmorin · 2022-09-13T15:27:13Z

This pull request features a REST specification for the first of the two REST-based reports to be ported to DSpace 7.x (see ticket DS-4301, DSpace/DSpace#7641), the Filtered Collections report.

Following a suggestion from Andrea Bollini, I created a new /contentreports branch in which I created a /filteredcollections URI for the report itself. I added an entry under Endpoints Under Development/Discussion in endpoints.md.

I also added a missing link and removed a duplicate entry in endpoints.md.

jeffmorin · 2022-12-02T18:44:28Z

Added the second report, Filtered Items (for metadata queries).

tdonohue

@jeffmorin : Gave this a quick review today (while also reviewing the corresponding backend PR: DSpace/DSpace#8598 -- still testing it though). I have some questions about the design here, namely about the GET vs POST versions of these endpoints... I'm not sure I understand why POST is used so heavily (as POST tends to mean you are creating something new).

Also, I noticed that in the old REST API, these were all GET queries: https://wiki.lyrasis.org/display/DSDOC6x/REST+Reports+-+Summary+of+API+Calls

It's possible I'm simply forgetting why these are switched to POST, or it has to do with the limit to the number of params that you pass to a GET. But, I think we need to describe GET vs POST better in this Contract.

tdonohue · 2022-12-16T18:37:45Z

contentreports-filteredcollections.md

+?filters=is_discoverable,has_multiple_originals,has_pdf_original
+```
+
+In POST mode, it is defined as a JSON document like this:


I'd prefer if we describe GET and POST mode separately. I'm having a hard time understanding the way this is documented. When would someone use GET and when would they use POST? It's unclear if everything below this point in the docs is ONLY for POST or if it also applies to GET? Could we give some more examples here as to what the differences are?

I reorganized report parameterization in the Filtered Collections report documentation. I also fixed a few mistakes and added some info that I realised was missing (e.g., definition of "basic report" in Filtered Items).

About usage of GET vs. POST, please see my other comments below.

tdonohue · 2022-12-16T18:40:18Z

contentreports-filtereditems.md

+[Back to the list of all defined endpoints](endpoints.md)
+
+## Statistics for the whole repository
+**POST /api/contentreports/filtereditems**


Why is this a POST instead of a GET? I notice that the "statistics" endpoints always use GET except when they are adding data to the statistics. See https://github.com/DSpace/RestContract/blob/main/statistics-reports.md and https://github.com/DSpace/RestContract/blob/main/statistics-viewevents.md

Could we better describe why we need to use POST for these endpoints? It appears they are readonly, which implies they might be switched to GET.

That's true, I thought about it. My concern with GET is, as you suggested above, the limited length of the parameters passed as part of the URL. There should be no problems with the Filtered Collections report (only Boolean filters).

Parameterization of the Filtered Items report, however, is much more complex and can easily become long enough to exceed any limit enforced by application servers for URL query strings. This is why I implemented this report as a POST endpoint.

For (a bit of) uniformity, I also added POST support to the Filtered Collections report.

Besides, while the HTTP spec clearly states that GET should be used for read-only requests, I saw nothing stating that POST should be used only for data-changing requests.

If you feel that everything should be switched to GET anyway, I can do it. In this case, the Filtered Items report shall be thoroughly tested with lots of parameters selected to make sure that nothing goes wrong due to too long a query string.

About Filtered Items: The parameters could be organized into a GET query string, although such a string might become quite long. Another concern I had while designing the API for this report is the "query predicates" part: these are structured parameters (a query predicate is a (field, operator, value) tuple). This is another reason why I didn't include a GET version of this report.
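
To make the length concern concrete, here is a minimal sketch of how a list of (field, operator, value) predicates might be flattened into a GET query string. The `queryPredicates` parameter name appears later in this thread, but the `field:operator:value` encoding, the field names, and the operator names below are assumptions made for illustration, not the encoding defined by the contract.

```python
from urllib.parse import urlencode

# Hypothetical predicates; field and operator names are illustrative only.
predicates = [
    ("dc.title", "contains", "rainbows"),
    ("dc.contributor.author", "equals", "Smith, Jane"),
    ("dc.date.issued", "matches", "^2022"),
]


def build_query_string(predicates):
    """Flatten (field, operator, value) tuples into repeated query parameters.

    The "field:operator:value" encoding is an assumption made for this sketch.
    """
    params = [
        ("queryPredicates", f"{field}:{op}:{value}")
        for field, op, value in predicates
    ]
    # urlencode percent-encodes reserved characters, which inflates the length.
    return urlencode(params)


query = build_query_string(predicates)
print(query)
print(len(query), "characters")
```

Each predicate adds a few dozen characters once percent-encoded, so the query string grows linearly with the number of predicates selected.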

Please check: https://github.com/DSpace/RestContract/blob/main/search-endpoint.md solution.

A query string solution is used there for filtering results:

f.<:filter-name>=<:filter-value>,<:filter-operator>

where a filter-name belongs to a predefined structure that is previously returned. Like: f.title=rainbows,notcontains

{ "query":"my query", "scope":"9076bd16-e69a-48d6-9e41-0238cb40d863", "appliedFilters": [ { "filter" : "title", "operator" : "notcontains", "value" : "abcd", "label" : "abcd" },
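
As a rough illustration of the `f.<filter-name>=<filter-value>,<filter-operator>` convention quoted above, the following sketch builds such a parameter on the client side. The helper name `search_filter_params` is hypothetical, and the sketch only mirrors the syntax shown in this comment; it is not taken from any DSpace client.

```python
from urllib.parse import urlencode


def search_filter_params(filters):
    """Turn (name, value, operator) triples into f.<name>=<value>,<operator> pairs."""
    return [(f"f.{name}", f"{value},{operator}") for name, value, operator in filters]


params = search_filter_params([("title", "rainbows", "notcontains")])
# safe="," keeps the comma readable instead of encoding it as %2C.
encoded = urlencode(params, safe=",")
print(encoded)  # f.title=rainbows,notcontains
```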

I think this discussion stems from a disagreement between HTTP and REST about what POST is for. RFC 9110 says that creating a new resource is only one possible use for POST. "The POST method requests that the target resource process the representation enclosed in the request according to the resource's own specific semantics." The description of POST here is quite a bit narrower. One might say that REST is a reuse of HTTP syntax with different semantics.

So it can be argued that REST is not a very good fit for this operation, but it is what we have.

Would it be a violation of the "spirit of REST" to consider such a POST to be storing a report description, which is consumed in the act of generating the report? Reports may take some time to create. Suppose that one POSTs a document describing the desired report and receives a token in the response. The report generator runs in the background. When finished, the token can be presented (using GET) to receive the report, and the report description is then destroyed.
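
The submit-then-fetch idea can be sketched with an in-memory stand-in. Everything here is hypothetical (the `ReportQueue` class, its method names, and the shape of the report); it only illustrates the lifecycle described above, in which the stored description is consumed when the report is retrieved.

```python
import uuid


class ReportQueue:
    """In-memory stand-in for the submit-then-fetch flow; not DSpace code."""

    def __init__(self):
        self._pending = {}  # token -> report description

    def submit(self, description):
        """Analogue of the POST: store the description, hand back a token."""
        token = str(uuid.uuid4())
        self._pending[token] = description
        return token

    def fetch(self, token):
        """Analogue of the GET: build the report, then discard the description."""
        # pop() raises KeyError if the token is unknown or was already used.
        description = self._pending.pop(token)
        # A real generator would run in the background; this just echoes the filters.
        return {"filters": description["filters"], "status": "done"}


queue = ReportQueue()
token = queue.submit({"filters": ["is_discoverable", "has_pdf_original"]})
report = queue.fetch(token)
print(report)
```

Under this model the POST does create something (a short-lived report description), which is what would bring it back in line with the narrower reading of POST.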

jeffmorin · 2023-01-11T15:34:37Z

I changed all contentreports occurrences to contentreport to address a request in the REST layer regarding category names.

paulo-graca · 2023-04-12T16:49:49Z

Thank you @jeffmorin for this important contribution. I would like to help you bring this to DSpace core. I haven't checked your implementation yet, and I was expecting to use the DSpace demo site for comparison, but it's unavailable.

I agree with @tdonohue: our REST Contract specifies that the POST method should be used to create new resources, leaving GET to access resources:
https://github.com/DSpace/RestContract#on-collection-of-resources-endpoints

My understanding is that we aren't creating resources but accessing them, so we should use GET.
I understand that GET has a maximum length limitation and that a POST could help us here. That's why this solution was chosen.
A somewhat similar feature (to me) was addressed at:
https://github.com/DSpace/RestContract/blob/main/search-endpoint.md

At the discovery or search endpoints, a complex structure is defined at the backend, stating what the client can do (defining available browses, facets, sort options, ...). It's also possible to use the query string to pass params. But when the structure is predefined, you don't need to pass a lot of query params.
I haven't looked at your implementation yet, but this feature could benefit from discovery's REST design.

mwoodiupui · 2023-04-12T18:46:10Z

I could argue that we are neither creating nor accessing resources, because these reports are not resources; they are snapshots of the operational state of a store of resources. (I realize that this is the opposite of what I said above.) What is wanted is really not REST at all, but REST is all that we have to work with. REST basically defines putting things into a store and getting them out again. Complex configurable reports on the status of the store itself are something that REST isn't able to do if one cares about ideological purity.

The notion of triggering pre-defined reports (which is how I understand the Discovery analogy) is interesting, but I think that we need to consider how the reports will be used. Statistics might be a useful model. One sometimes doesn't yet know what questions to ask, so one engages in wide-ranging exploration of the data to find promising perspectives. At other times (such as after exploratory examination) one has specific questions to be answered, possibly over and over again as the dataset evolves. Is this a good fit to the way we expect Content Reports to be used? Because doing exploratory work by repeatedly editing a set of preconfigured reports on the back-end sounds slow, tedious and exhausting.

paulo-graca · 2023-04-14T15:34:29Z

I don't have access to this feature right now, but I found a very useful video by Terry Brady (@terrywbrady):
https://www.youtube.com/watch?v=K2gGHYUZI40. In it he shows some of the work they did and how we can interact with it.

To give others some context, I would also like to add the Wiki page URL for the DS6 feature:
https://wiki.lyrasis.org/display/DSDOC6x/REST+Reports+-+Summary+of+API+Calls

jeffmorin · 2023-04-19T19:59:27Z

I just added a GET endpoint to the Filtered Items content report. I also made a minor update to the Filtered Collections report documentation.

abollini

@jeffmorin thanks for sharing your proposal to move the DSpace 6 reporting system to DSpace 7. I have added my feedback inline

contentreport-filteredcollections.md

abollini · 2023-05-07T13:35:42Z

contentreport-filteredcollections.md

+The GET-based endpoint takes a `filters` query parameter whose value is a comma-separated list of filters
+like the following:
+```
+?filters=is_discoverable,has_multiple_originals,has_pdf_original
+```
+
+Alternatively, the comma-separated list can be replaced by a repetition of the `filters` parameter
+for each requested filter:
+```
+?filters=is_discoverable&filters=has_multiple_originals&filters=has_pdf_original
+```
+
+
+Please see [below](#available-filters) for the list of available filters.
+
+**POST /api/contentreport/filteredcollections**
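
For readers comparing the two GET forms in the hunk above, here is a minimal sketch of how a server or test client might normalise them into one list of filter names. It assumes the repeated form uses the same `filters` parameter name as the comma-separated form, as the prose in the hunk states; the `parse_filters` helper is hypothetical.

```python
from urllib.parse import parse_qs


def parse_filters(query_string):
    """Collect filter names from comma-separated and/or repeated `filters` values."""
    values = parse_qs(query_string).get("filters", [])
    names = []
    for value in values:
        # Each occurrence may itself hold a comma-separated list.
        names.extend(name for name in value.split(",") if name)
    return names


comma_form = "filters=is_discoverable,has_multiple_originals,has_pdf_original"
repeated_form = (
    "filters=is_discoverable&filters=has_multiple_originals&filters=has_pdf_original"
)
print(parse_filters(comma_form) == parse_filters(repeated_form))  # True
```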


I would recommend against that. We should use the filters to build a unique ID for the report, similar to what is done in the statistics endpoint; see https://github.com/DSpace/RestContract/blob/main/statistics-reports.md
That way, /api/contentreports/filteredcollections will represent our collection-of-resources endpoint (it can return 405, as there is no use case for scrolling through all the potential reports) and /api/contentreports/filteredcollections/<concatenation-of-filter-as-report-id> the individual report. HTTP caching will then work at its best.
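
A minimal sketch of what such a report ID could look like, assuming the ID is simply the selected filter names joined together. The comma separator, the sorting step, and the `report_id` helper are assumptions made for this sketch; the comment above only asks for a concatenation of the filters.

```python
def report_id(filters):
    """Build a deterministic ID from a selection of filter names.

    Sorting and de-duplicating means the same selection always maps to the
    same URL, whatever order the client listed the filters in. The ","
    separator is an assumption made for this sketch.
    """
    return ",".join(sorted(set(filters)))


base = "/api/contentreports/filteredcollections"
first = report_id(["has_pdf_original", "is_discoverable"])
second = report_id(["is_discoverable", "has_pdf_original"])
print(f"{base}/{first}")
print(first == second)  # True
```

A stable ID per filter selection is what lets an HTTP cache treat each report as its own resource.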

abollini · 2023-05-07T14:02:48Z

contentreport-filtereditems.md

This seems to be very close to the discovery endpoint... we need to build a query and get the list of matching items. If we want a different representation of the resulting items, it should be an "export" script.
My suggestion is to look at which "filter" types are not yet supported in discovery and, if needed, implement them there rather than building a separate endpoint from scratch. This would allow us to reuse almost everything in the UI by just defining a special discovery configuration for this purpose (similar to what has been done for the administrative search).

jeffmorin · 2023-05-25T17:20:04Z

Please see the comment I wrote at DSpace/DSpace#8598.

jeffmorin · 2023-11-21T15:18:27Z

My fork is now synchronized with the main branch.

paulo-graca

Thank you @jeffmorin !
I've already shared my thoughts about when we should bring this feature into DS. But that doesn't mean that I don't want it for DS8. I just think we need to agree on the right path for it, then build the scaffolding needed to put this feature on top of it. Just for comparison, a specific Working Group was created solely to discuss the Entities feature in DSpace 7.
I'm afraid that if we ship this feature as is, others might build features on top of it or use it, and then it will be very hard to refactor later on.

The DS6 content report feature has a set of functionalities that are still missing from the specification in this PR, for example collection filters (I didn't see any endpoint specification that retrieves the available filters) and filtering by collection ID.

You can take a look at:
https://wiki.lyrasis.org/display/DSDOC6x/REST+Reports+-+Collection+Report+Screenshots+with+Annotated+API+Calls

Also, I think queryPredicates at the item filtering level could be better described in the documentation. I also think we need endpoints that list which operations are available for each field (perhaps advanced search could help here).
Even if the implementation doesn't support it yet, I think those missing endpoints should be defined at this phase.

mwoodiupui · 2024-02-12T13:14:05Z

Perhaps some of the difficulty is in calling it "beta"? That implies that the design is finished and the code is expected to work well. Would "experimental" or "preview" be more suitable? With a stern warning: "the work is unfinished and we expect to make substantial changes in the final, supported code to address, at least, these issues...."

Because that's what was agreed upon, I think: an early release in 8.0 despite 8.0's compressed schedule, to get this facility into the hands of users this year and to gather wider experience; and a stable release with full support by 9.0.

If people choose to rely on internals of code when they've been clearly warned not to, that is their problem, and we should not be shy about reminding them. Politely appreciative of feedback: yes; apologetic: no.

It would be helpful to have missing features logged as issues so that they can be studied, scheduled and tracked.

tdonohue · 2024-02-12T15:44:26Z

@paulo-graca : From talking with @pilasou and @jeffmorin , I don't think it's possible to update this PR any further in time for 8.0. The decision for 8.0 was to scale back because @pilasou and @jeffmorin don't have availability to do any more detailed work on this code. We have to remember that, while this is wanted by many, the development on this is entirely donated by U of Laval.

There are only two options that remain:

1. We accept this work pretty much "as-is" for 8.0 based on the agreement we achieved in recent DevMtgs (see notes from Feb 8 meeting). Then we would work to improve it / fix bugs in 8.1+ and 9.0.
2. OR, the only other option is to delay everything until 9.0. There is no time remaining to add more features or even design new endpoints....both of those take time, and we have no time left to get this into 8.0. (Keep in mind, this feature was initially built for 7.x, but was delayed until 8.0. This option would be yet another delay)

As decided in the meeting on Feb 8, we voted for option 1. That means we let this into 8.0 so that everyone can start to improve/enhance it. But, put warnings on it that it is missing features & there are some known issues. Those missing features / known issues can be detailed in GitHub issue tickets to be addressed in future 8.x or 9.0 releases.

paulo-graca · 2024-02-12T19:22:11Z

@tdonohue, to me, there are two different aspects here:

The first one is what you just mentioned, and it's related to code changes and missing features. I completely understand that we don't have the time and necessary resources to address them in time for the 8.0 release. I want to make it clear that I don't oppose it for DS8, nor am I in any kind of blocking position here. I want to move it forward. The decision in our meeting was to address only minor things like the requests (POST->GET) and merge it as it is.
The second aspect relates specifically to this PR, the REST contract, and my comments were about it. I understand that some specifications are still missing, and some parts should be more detailed, such as the queryPredicates. I understand that the REST contract, generally speaking, could address the definition of parts that aren't yet implemented. So, I think my suggestions don't pose a blocking state to this feature (Content Reports) for DS8, especially because the REST Contract isn't tied to any specific version of DSpace. It's very version-agnostic (>=7.0)

tdonohue · 2024-02-12T19:29:59Z

@paulo-graca : To clarify my point... I disagree with this statement:

> Even if the implementation doesn't support it yet, I think those missing endpoints should be defined at this phase.

I feel the REST Contract should only define the endpoints that are implemented at this time. Otherwise, it is misleading to other developers as to what features actually exist. (And to clarify, we do have a versioned REST Contract. We have the main branch which is pre-8.0, and the dspace-7_x branch, which is the 7.x version of the REST Contract. So, the contract is versioned in the same way that the backend is versioned because the contract describes the backend.)

Overall, I agree that we should enhance this contract as necessary to describe the current implementation. However, we should not add endpoints which do not yet exist, as the Contract is meant to describe the implementation.

I hope that clarifies things. If you do have feedback on the contract compared to the current implementation, then please do point it out. We both agree that there is the opportunity to clean up this PR as the implementation PR also gets cleaned up.

tdonohue

👍 Thanks @jeffmorin ! This looks good now.

Jean-François Morin added 2 commits September 13, 2022 10:47

Added a missing link and deleted a duplicate entry

177457e

Added the Filtered Collections report spec

1d35992

jeffmorin mentioned this pull request Sep 13, 2022

[DS-4301] Ensure that DSpace REST Report capabilities are part of the DSpace 8 REST API DSpace/DSpace#7641

Closed

jeffmorin and others added 6 commits November 30, 2022 11:20

Merge branch 'DSpace:main' into main

022db3c

Filtered Items report

7a51061

Merge branch 'main' of github.com:jeffmorin/RestContract

9e094d3

JSON and content fixes

77bcc5e

JSON fix

3470e30

JSON fix

b8a1274

This was referenced Dec 7, 2022

Content reports ported from DSpace 6.x DSpace/DSpace#8598

Merged

Content reports ported from DSpace 6.x DSpace/dspace-angular#1985

Closed

tdonohue assigned jeffmorin Dec 16, 2022

tdonohue self-requested a review December 16, 2022 17:51

tdonohue added this to the 7.5 milestone Dec 16, 2022

tdonohue requested a review from abollini December 16, 2022 18:36

tdonohue requested changes Dec 16, 2022


Jean-François Morin added 2 commits January 9, 2023 15:28

Improved API documentation

1527355

Changed contentreports category to contentreport

8bad9b3

Merge branch 'DSpace:main' into main

a1a13dd

tdonohue removed this from the 7.5 milestone Feb 21, 2023

Merge branch 'DSpace:main' into main

3cc4598

jeffmorin mentioned this pull request Mar 24, 2023

Content reports ported from DSpace 6.x DSpace/dspace-angular#2163

Merged

6 tasks

paulo-graca self-requested a review March 30, 2023 15:01

tdonohue self-requested a review April 13, 2023 14:40

Merge branch 'DSpace:main' into main

5455187

Added GET endpoint to Filtered Items report

41e9ac3

Merge branch 'DSpace:main' into main

001a342

abollini requested changes May 7, 2023


Merge branch 'DSpace:main' into main

b6af0ae

jeffmorin and others added 2 commits November 20, 2023 08:41

Merge branch 'DSpace:main' into main

26648ee

Updated to latest version from main branch

718ca8a

Merge branch 'DSpace:main' into main

a552cb8

paulo-graca reviewed Feb 11, 2024


Merge branch 'DSpace:main' into main

6386f46

jeffmorin and others added 5 commits February 20, 2024 15:59

Merge branch 'DSpace:main' into main

768e541

Merge branch 'DSpace:main' into main

487f59b

Added beta feature warning in both Content Report pages

014af1b

Removed POST endpoints from documentation

9b2bdca

Merge branch 'DSpace:main' into main

1d4a1ff

tdonohue approved these changes Feb 28, 2024


tdonohue added this to the 8.0 milestone Feb 28, 2024

tdonohue merged commit 1ed2768 into DSpace:main Feb 28, 2024
1 check passed
