Merge pull request #202 from jeffmorin/main
[DS-4301] Added Content Reports section and Filtered Collections report therein
tdonohue authored Feb 28, 2024
2 parents 73df0d4 + 1d4a1ff commit 1ed2768
Showing 3 changed files with 291 additions and 0 deletions.
149 changes: 149 additions & 0 deletions contentreport-filteredcollections.md
@@ -0,0 +1,149 @@
# Filtered Collections report
[Back to the list of all defined endpoints](endpoints.md)

This endpoint provides aggregated statistics about the number of items per collection according to selected filters.

NOTE: This is currently a beta feature.


**GET /api/contentreport/filteredcollections**

The endpoint takes a `filters` query parameter whose value is a comma-separated list of filters
like the following:
```
?filters=is_discoverable,has_multiple_originals,has_pdf_original
```

Alternatively, the comma-separated list can be replaced by a repetition of the `filters` parameter
for each requested filter:
```
?filters=is_discoverable&filters=has_multiple_originals&filters=has_pdf_original
```
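
As a minimal sketch, both forms of the query string can be built with Python's standard `urllib.parse.urlencode`. Only the endpoint path and the `filters` parameter come from this page; the helper names are illustrative and not part of the REST contract.

```python
from urllib.parse import urlencode

# Endpoint path as documented above; the server base URL is left out on purpose.
ENDPOINT = "/api/contentreport/filteredcollections"


def comma_separated_url(filters):
    """Build the query with a single comma-separated `filters` value."""
    # safe="," keeps the commas literal instead of percent-encoding them as %2C.
    return ENDPOINT + "?" + urlencode({"filters": ",".join(filters)}, safe=",")


def repeated_param_url(filters):
    """Build the query by repeating the `filters` parameter once per filter."""
    # doseq=True expands the list into one key=value pair per element.
    return ENDPOINT + "?" + urlencode({"filters": list(filters)}, doseq=True)


if __name__ == "__main__":
    wanted = ["is_discoverable", "has_multiple_originals", "has_pdf_original"]
    print(comma_separated_url(wanted))
    print(repeated_param_url(wanted))
```

Both helpers request the same set of filters; which form to use is a matter of client convenience.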


Please see [below](#available-filters) for the list of available filters.

## Report contents

For each collection, the basic report consists of:
* name (label) and handle of the collection
* name (label) and handle of the parent community
* total number of items
* number of items matching all selected filters

In addition, a `summary` element provides the total number of items and the total number of items matching all filters
for the whole repository.

An example JSON response document to `/api/contentreport/filteredcollections`:
```json
{
  "id": "filteredcollections",
  "collections": [
    {
      "label": "Collection 1",
      "handle": "100/1",
      "values": {
        "is_discoverable": 23,
        "has_multiple_originals": 3,
        "has_pdf_original": 14
      },
      "community_label": "Community 1",
      "community_handle": "20.500.11794/1",
      "nb_total_items": 23,
      "all_filters_value": 3
    },
    {
      "label": "Collection 2",
      "handle": "100/2",
      "values": {
        "is_discoverable": 1,
        "has_multiple_originals": 0,
        "has_pdf_original": 0
      },
      "community_label": "Community 1",
      "community_handle": "20.500.11794/1",
      "nb_total_items": 1,
      "all_filters_value": 0
    },
    {
      "label": "Collection 3",
      "handle": "100/3",
      "values": {
        "is_discoverable": 1,
        "has_multiple_originals": 0,
        "has_pdf_original": 1
      },
      "community_label": "Community 1",
      "community_handle": "20.500.11794/1",
      "nb_total_items": 1,
      "all_filters_value": 0
    }
  ],
  "summary": {
    "label": null,
    "handle": null,
    "values": {
      "is_discoverable": 25,
      "has_multiple_originals": 3,
      "has_pdf_original": 15
    },
    "community_label": null,
    "community_handle": null,
    "nb_total_items": 25,
    "all_filters_value": 3
  },
  "type": "filtered-collections",
  "_links": {
    "self": {
      "href": "http://localhost:8080/dspace-server/api/contentreport/filtered-collections"
    }
  }
}
```
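
To illustrate how a client might read this document, the following sketch computes, for each collection, the share of items matching all selected filters. It relies only on the `collections`, `label`, `nb_total_items` and `all_filters_value` fields shown in the example above; the function name `matching_share` is hypothetical.

```python
def matching_share(report):
    """Map each collection label to the fraction of its items matching all filters."""
    shares = {}
    for collection in report["collections"]:
        total = collection["nb_total_items"]
        matching = collection["all_filters_value"]
        # Guard against empty collections to avoid a division by zero.
        shares[collection["label"]] = matching / total if total else 0.0
    return shares


if __name__ == "__main__":
    # Trimmed-down stand-in for a parsed response (e.g. the result of json.loads).
    sample = {
        "collections": [
            {"label": "Collection A", "nb_total_items": 4, "all_filters_value": 1},
            {"label": "Collection B", "nb_total_items": 0, "all_filters_value": 0},
        ]
    }
    print(matching_share(sample))
```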

## Available filters

The available filters are as follows:

* Item Property Filters
  * `is_item`: Is Item - always true
  * `is_withdrawn`: Withdrawn Items
  * `is_not_withdrawn`: Available Items - Not Withdrawn
  * `is_discoverable`: Discoverable Items - Not Private
  * `is_not_discoverable`: Not Discoverable - Private Item
* Basic Bitstream Filters
  * `has_multiple_originals`: Item has Multiple Original Bitstreams
  * `has_no_originals`: Item has No Original Bitstreams
  * `has_one_original`: Item has One Original Bitstream
* Bitstream Filters by MIME Type
  * `has_doc_original`: Item has a Doc Original Bitstream (PDF, Office, Text, HTML, XML, etc)
  * `has_image_original`: Item has an Image Original Bitstream
  * `has_unsupp_type`: Has Other Bitstream Types (not Doc or Image)
  * `has_mixed_original`: Item has multiple types of Original Bitstreams (Doc, Image, Other)
  * `has_pdf_original`: Item has a PDF Original Bitstream
  * `has_jpg_original`: Item has JPG Original Bitstream
  * `has_small_pdf`: Has unusually small PDF
  * `has_large_pdf`: Has unusually large PDF
  * `has_doc_without_text`: Has document bitstream without TEXT item
* Supported MIME Type Filters
  * `has_only_supp_image_type`: Item Image Bitstreams are Supported
  * `has_unsupp_image_type`: Item has Image Bitstream that is Unsupported
  * `has_only_supp_doc_type`: Item Document Bitstreams are Supported
  * `has_unsupp_doc_type`: Item has Document Bitstream that is Unsupported
* Bitstream Bundle Filters
  * `has_unsupported_bundle`: Has bitstream in an unsupported bundle
  * `has_small_thumbnail`: Has unusually small thumbnail
  * `has_original_without_thumbnail`: Has original bitstream without thumbnail
  * `has_invalid_thumbnail_name`: Has invalid thumbnail name (assumes one thumbnail for each original)
  * `has_non_generated_thumb`: Has non-generated thumbnail
  * `no_license`: Doesn't have a license
  * `has_license_documentation`: Has documentation in the license bundle
* Permission Filters
  * `has_restricted_original`: Item has Restricted Original Bitstream
  * `has_restricted_thumbnail`: Item has Restricted Thumbnail
  * `has_restricted_metadata`: Item has Restricted Metadata

Possible response status:

* 200 OK - The specific report data was found, and the data has been properly returned.
* 403 Forbidden - The user session is not authorized to access the report.
140 changes: 140 additions & 0 deletions contentreport-filtereditems.md
@@ -0,0 +1,140 @@
# Metadata query (aka Filtered Items) report
[Back to the list of all defined endpoints](endpoints.md)

This endpoint provides a custom query API to select items from existing collections,
according to given Boolean and metadata filters.

NOTE: This is currently a beta feature.


**GET /api/contentreport/filtereditems**

The report parameters are described [below](#report-parameterization).

Additionally, a `pageNumber` parameter is available to retrieve results starting at a given page
(according to `pageLimit`, the maximum number of items per page). Page numbering starts at 0.
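
As a small illustration of zero-based paging, the sketch below lists the `pageNumber` values needed to cover a known number of matching items. Both inputs are assumptions on the caller's side: `item_count` is taken to be the total number of matches (this page does not say where that total comes from), and the function name is hypothetical.

```python
def page_numbers(item_count, page_limit):
    """Return the zero-based pageNumber values needed to cover item_count items."""
    if page_limit <= 0:
        raise ValueError("page_limit must be positive")
    # Ceiling division without floating point: 41 items at 10 per page -> 5 pages.
    page_count = -(-item_count // page_limit)
    return list(range(page_count))


if __name__ == "__main__":
    print(page_numbers(40, 10))
    print(page_numbers(41, 10))
```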

All parameters except `pageNumber` and `pageLimit` are repeatable. Multiple values can be expressed either
by repeating the corresponding parameter, e.g.:
```
?filters=is_discoverable&filters=has_multiple_originals&filters=has_pdf_original
```

or by using a comma-separated value, e.g.:

```
?filters=is_discoverable,has_multiple_originals,has_pdf_original
```

The exception is the `queryPredicates` parameter, which supports only parameter repetition for multiple values,
to avoid any ambiguity when a predicate value contains commas.

Please see [below](#report-parameterization) for parameterization details.
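
The rules above can be sketched as a query-string builder: `filters` is sent as one comma-separated value, while `queryPredicates` is always repeated so that commas inside a predicate stay unambiguous. The predicate syntax is not described on this page, so the predicate strings in the usage example are opaque placeholders, and the function name is hypothetical.

```python
from urllib.parse import urlencode

# Endpoint path as documented above; the server base URL is left out on purpose.
ENDPOINT = "/api/contentreport/filtereditems"


def filtered_items_query(filters, query_predicates, page_number, page_limit):
    """Build a query string following the repetition rules described above."""
    parts = []
    if filters:
        # Comma-separated form; safe="," keeps the separators literal.
        parts.append(urlencode({"filters": ",".join(filters)}, safe=","))
    # One queryPredicates parameter per predicate; commas inside a predicate
    # are percent-encoded (%2C), so they cannot be mistaken for separators.
    parts.append(urlencode([("queryPredicates", p) for p in query_predicates]))
    parts.append(urlencode({"pageNumber": page_number, "pageLimit": page_limit}))
    # Drop empty fragments (e.g. when no predicates were given).
    return ENDPOINT + "?" + "&".join(part for part in parts if part)


if __name__ == "__main__":
    print(filtered_items_query(
        ["is_discoverable", "has_pdf_original"],
        ["placeholder one", "a,b"],  # opaque placeholder predicates
        page_number=0,
        page_limit=10,
    ))
```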

## Report contents

An example JSON response document to `/api/contentreport/filtereditems` (metadata removed for brevity):
```json
{
  "id": "filtereditems",
  "items": [
    {
      "id": "07e388ff-f22b-4d4f-8275-acab5c3edacc",
      "uuid": "07e388ff-f22b-4d4f-8275-acab5c3edacc",
      "name": "Enhancing the lubricity of an environmentally friendly Swedish diesel fuel MK1",
      "handle": "20.500.11794/42",
      "metadata": {
        "dc.contributor.author": [
          {
            "value": "Smith, John",
            "language": null,
            "authority": "6eee383a-f126-4705-9ffb-b4aa4832070e",
            "confidence": 600,
            "place": 0
          }
        ],
        "dc.publisher": [
          {
            "value": "Elsevier",
            "language": "fr_CA",
            "authority": null,
            "confidence": -1,
            "place": 0
          }
        ]
      },
      "inArchive": true,
      "discoverable": true,
      "withdrawn": false,
      "lastModified": "2015-11-23T17:30:21.463+00:00",
      "entityType": "Publication",
      "owningCollection": {
        "id": "d98a828c-45c2-43d9-9861-6b9800bf14f5",
        "uuid": "d98a828c-45c2-43d9-9861-6b9800bf14f5",
        "name": "Articles publiés dans des revues avec comité de lecture",
        "handle": "100/1",
        "metadata": {
          "dc.identifier.uri": [
            {
              "value": "http://localhost:4000/handle/100/1",
              "language": null,
              "authority": null,
              "confidence": -1,
              "place": 0
            }
          ],
          "dspace.entity.type": [
            {
              "value": "Publication",
              "language": null,
              "authority": null,
              "confidence": -1,
              "place": 0
            }
          ]
        },
        "type": "collection"
      },
      "type": "item"
    },
    {
      ...
    }
  ],
  "itemCount": 40,
  "type": "filtereditemsreport",
  "_links": {
    "self": {
      "href": "http://localhost:8080/dspace-server/api/contentreport/filtereditems"
    }
  }
}
```

## Report parameterization

The parameters are specified as follows:

* `collections`: UUIDs of the collections in which to search for items. If none are provided, the whole repository is searched.
* `presetQuery`: This parameter is not used on the REST API side. It identifies a predefined set of query predicates
  maintained in the Angular layer.
* `queryPredicates`: Predicates used to filter matching items. They can be predefined (see `presetQuery` above)
  or defined specifically by the user. As mentioned above, this is the only parameter whose multiple values
  cannot be expressed as a comma-separated list.
* `pageLimit`: Maximum number of items per page.
* `filters`: Supplementary filters. These are the same as those available in the Filtered Collections report.
  Please see [/api/contentreport/filteredcollections](contentreport-filteredcollections.md#available-filters) for details.
* `additionalFields`: Fields to add to the basic report for each item included in the report.

The _basic report_ mentioned above includes, for each item:

* Sequential number (order of appearance in the report)
* UUID
* Parent collection
* Handle
* Title
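
As a hedged sketch, the basic report fields listed above could be derived from a parsed response as follows. The mapping is an assumption based on the example response: the title is read from `name`, the parent collection from `owningCollection.name`, and the sequential number is computed client-side from the item's position. The function name and the `first_number` argument are hypothetical.

```python
def basic_report_rows(response, first_number=1):
    """Turn the items of a parsed response into basic report rows."""
    rows = []
    for offset, item in enumerate(response["items"]):
        rows.append({
            # Assumed: numbering follows the order of appearance in the response.
            "number": first_number + offset,
            "uuid": item["uuid"],
            # Assumed mapping: parent collection <- owningCollection.name
            "collection": item["owningCollection"]["name"],
            "handle": item["handle"],
            # Assumed mapping: title <- name
            "title": item["name"],
        })
    return rows


if __name__ == "__main__":
    # Trimmed-down stand-in for a parsed response (e.g. the result of json.loads).
    sample = {
        "items": [
            {
                "uuid": "07e388ff-f22b-4d4f-8275-acab5c3edacc",
                "name": "Example title",
                "handle": "20.500.11794/42",
                "owningCollection": {"name": "Example collection"},
            }
        ]
    }
    for row in basic_report_rows(sample):
        print(row)
```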

Possible response status:

* 200 OK - The specific report data was found, and the data has been properly returned.
* 403 Forbidden - The user session is not authorized to access the report.
2 changes: 2 additions & 0 deletions endpoints.md
@@ -56,6 +56,8 @@
* [/api/authz/features](features.md)
* [/api/statistics](statistics.md)
* [/api/tools/itemrequests](item-requests.md)
* [/api/contentreport/filteredcollections](contentreport-filteredcollections.md)
* [/api/contentreport/filtereditems](contentreport-filtereditems.md)

## Actuator endpoints
The following endpoints are implemented using [Spring Boot Actuator](https://docs.spring.io/spring-boot/docs/current/reference/html/actuator.html#actuator.enabling) and are enabled by default: