Merge pull request #7403 from IQSS/develop

v5.2

kcondon authored Nov 9, 2020
2 parents 559a449 + 5deeec4 commit 4951505
Showing 91 changed files with 3,183 additions and 2,325 deletions.
63 changes: 63 additions & 0 deletions doc/release-notes/5.1.1-release-notes.md
@@ -0,0 +1,63 @@
# Dataverse 5.1.1

This minor release adds important scaling improvements for installations running on AWS S3. It is recommended that 5.1.1 be used in production instead of 5.1.

## Release Highlights

### Connection Pool Size Configuration Option, Connection Optimizations

Dataverse 5.1 improved the efficiency of making S3 connections through the use of an HTTP connection pool. This release adds optimizations around closing streams and channels that may otherwise hold S3 HTTP connections open and exhaust the connection pool. In parallel, this release increases the default pool size from 50 to 256 and makes the pool size configurable, so a larger pool can be set up if needed.

## Major Use Cases

Newly-supported use cases in this release include:

- Administrators of installations using S3 will be able to define the connection pool size, allowing better resource scaling for larger installations (Issue #7309, PR #7313)

## Notes for Dataverse Installation Administrators

### 5.1.1 vs. 5.1 for Production Use

As mentioned above, we encourage 5.1.1 instead of 5.1 for production use.

### New JVM Option for Connection Pool Size

Larger installations may want to increase the number of open S3 connections allowed (default is 256). For example, to set the value to 4096:

`./asadmin create-jvm-options "-Ddataverse.files.<id>.connection-pool-size=4096"`
(where `<id>` is the identifier of your S3 file store, likely `"s3"`). The JVM Options section of the [Configuration Guide](http://guides.dataverse.org/en/5.1.1/installation/config/) has more information.
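
To verify the setting after a restart, you can list the JVM options and look for the new entry (a quick check, assuming the default Payara domain):

`./asadmin list-jvm-options | grep connection-pool-size`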

### New S3 Bucket CORS setting for Direct Upload/Download

When using S3 storage with direct upload and/or download enabled, one must now expose the ETag header as documented in the [updated cors.json example](https://guides.dataverse.org/en/5.1.1/developers/big-data-support.html?highlight=etag#s3-direct-upload-and-download).
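
For reference, a bucket CORS policy that exposes the ETag header might look like the following sketch (based on the linked example; the allowed origins and methods are placeholders to adjust for your installation):

```
{
  "CORSRules": [
    {
      "AllowedOrigins": ["*"],
      "AllowedHeaders": ["*"],
      "AllowedMethods": ["GET", "PUT"],
      "ExposeHeaders": ["ETag"]
    }
  ]
}
```

It can be applied with the AWS CLI, for example: `aws s3api put-bucket-cors --bucket <your-bucket> --cors-configuration file://cors.json`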

## Complete List of Changes

For the complete list of code changes in this release, see the [5.1.1 Milestone](https://github.com/IQSS/dataverse/milestone/91?closed=1) in GitHub.

For help with upgrading, installing, or general questions please post to the [Dataverse Google Group](https://groups.google.com/forum/#!forum/dataverse-community) or email [email protected].

## Installation

If this is a new installation, please see our [Installation Guide](http://guides.dataverse.org/en/5.1.1/installation/).

## Upgrade Instructions

0. These instructions assume that you've already successfully upgraded to Dataverse 5.1 following the instructions in the [Dataverse 5.1 Release Notes](https://github.com/IQSS/dataverse/releases/tag/v5.1).

1. Undeploy the previous version.

`<payara install path>/bin/asadmin list-applications`
`<payara install path>/bin/asadmin undeploy dataverse<-version>`

2. Stop Payara, remove the generated directory, and start Payara again.

- `service payara stop`
- remove the generated directory:
`rm -rf <payara install path>/glassfish/domains/domain1/generated`
- `service payara start`

3. Deploy this version.
`<payara install path>/bin/asadmin deploy dataverse-5.1.1.war`

4. Restart Payara.
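
- `service payara stop`
- `service payara start`
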
105 changes: 105 additions & 0 deletions doc/release-notes/5.2-release-notes.md
@@ -0,0 +1,105 @@
# Dataverse 5.2

This release brings new features, enhancements, and bug fixes to Dataverse. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.

## Release Highlights

### File Preview When Guestbooks or Terms Exist

Previously, file preview was only available when files were publicly downloadable. Now, if a guestbook or terms (or both) are configured for the dataset, they are shown in the Preview tab, and once they have been agreed to, the file preview appears (#6919).

### Preview Only External Tools

A new external tool type called "preview" has been added that prevents the tool from being displayed under "Explore Options" under the "Access File" button on the file landing page (#6919).

### Dataset Page Edit Options Consolidation

As part of the continued effort to redesign the Dataset and File pages, some of the edit options for a file on the dataset page are being moved into a "kebab" menu to allow for better consistency and future scalability.

### Google Cloud Archiver

Dataverse Bags can now be sent to a bucket in Google Cloud, including those in the "Coldline" storage class, which provides less expensive but slower access.

## Major Use Cases

Newly-supported use cases in this release include:

- Users can now preview files that have a guestbook or terms. (Issue #6919, PR #7369)
- External tool developers can indicate that their tool is "preview only". (Issue #6919, PR #7369)
- Dataverse administrators can set up a regular export to Google Cloud so that the installation's data is preserved. (Issue #7140, PR #7292)
- Dataverse administrators can use a regex when defining a group. (Issue #7344, PR #7351)
- External tool developers can use a new API endpoint to retrieve a user's information. (Issue #7307, PR #7345)

## Notes for Dataverse Installation Administrators

### Converting Explore External Tools to Preview Only

When the war file is deployed, a SQL migration script will convert [dataverse-previewers][] to have both "explore" and "preview" types so that they will continue to be displayed in the Preview tab.

If you would prefer that these tools be preview only, you can delete the tools, adjust the JSON manifests (changing "explore" to "preview"), and re-add them.

[dataverse-previewers]: https://github.com/GlobalDataverseCommunityConsortium/dataverse-previewers
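
A possible sequence for converting a tool (the tool id and manifest filename here are placeholders):

```
# Find the numeric id of the tool to convert
curl http://localhost:8080/api/admin/externalTools

# Delete it, replacing 1 with the id from the listing
curl -X DELETE http://localhost:8080/api/admin/externalTools/1

# Edit the manifest so "types" contains only "preview", then re-add it
curl -X POST -H 'Content-type: application/json' http://localhost:8080/api/admin/externalTools --upload-file previewTool.json
```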

### New Database Settings and JVM Options

Installations integrating with Google Cloud Archiver will need to use two new database settings:

- `:GoogleCloudProject` - the name of the project managing the bucket
- `:GoogleCloudBucket` - the name of the bucket to use

For more information, see the Google Cloud Configuration section of the [Installation Guide](https://guides.dataverse.org/en/5.2/installation/).
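
These can be set with the admin API; for example (the project and bucket names are placeholders):

```
curl -X PUT -d "my-gcp-project" http://localhost:8080/api/admin/settings/:GoogleCloudProject
curl -X PUT -d "my-archive-bucket" http://localhost:8080/api/admin/settings/:GoogleCloudBucket
```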

### Automation of Make Data Count Scripts

Scripts have been added to automate Make Data Count processing. For more information, see the Make Data Count section of the [Admin Guide](https://guides.dataverse.org/en/5.2/admin/).
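
The new `counter_daily.sh` and `counter_weekly.sh` scripts (included later in this diff) are meant to be run on a schedule. One possible crontab, with illustrative install paths:

```
# Process yesterday's MDC logs and send the SUSHI report daily at 1:00
0 1 * * * /usr/local/counter-processor-0.0.1/counter_daily.sh
# Refresh citation counts from DataCite weekly on Sundays at 2:00
0 2 * * 0 /usr/local/counter-processor-0.0.1/counter_weekly.sh
```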

## Notes for Tool Developers and Integrators

### Preview Only External Tools, "hasPreviewMode"

A new external tool type called "preview" has been added that prevents the tool from being displayed under "Explore Options" under the "Access File" button on the file landing page (#6919). This "preview" type replaces "hasPreviewMode", which has been removed.

### Multiple Types for External Tools

External tools now support multiple types. In practice, "explore" and "preview" are the only combination that behaves differently in the UI from having just one type or the other (see "Preview Only External Tools" above). Multiple types are specified in the JSON manifest with an array in "types". The older, single "type" is still supported but should be considered deprecated.
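
For example, the updated `fabulousFileTool.json` manifest shown later in this diff declares both types:

```
"types": [
  "explore",
  "preview"
]
```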

### User Information Endpoint

A new API endpoint has been added to retrieve a user's information so that external tools can, for example, email users if needed.
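
A sketch of calling it with an API token (the `/api/users/:me` path is our reading of the linked PR #7345):

`curl -H "X-Dataverse-key: $API_TOKEN" http://localhost:8080/api/users/:me`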

## Complete List of Changes

For the complete list of code changes in this release, see the [5.2 Milestone](https://github.com/IQSS/dataverse/milestone/92?closed=1) in GitHub.

For help with upgrading, installing, or general questions please post to the [Dataverse Google Group](https://groups.google.com/forum/#!forum/dataverse-community) or email [email protected].

## Installation

If this is a new installation, please see our [Installation Guide](https://guides.dataverse.org/en/5.2/installation/).

## Upgrade Instructions

0\. These instructions assume that you've already successfully upgraded from Dataverse 4.x to Dataverse 5 following the instructions in the [Dataverse 5 Release Notes](https://github.com/IQSS/dataverse/releases/tag/v5.0).

1\. Undeploy the previous version.

- `<payara install path>/bin/asadmin list-applications`
- `<payara install path>/bin/asadmin undeploy dataverse<-version>`

(where `<payara install path>` is where Payara 5 is installed, for example: `/usr/local/payara5`)

2\. Stop Payara, remove the generated directory, and start Payara again.

- `service payara stop`
- remove the generated directory:
`rm -rf <payara install path>/glassfish/domains/domain1/generated`
- `service payara start`

3\. Deploy this version.

- `<payara install path>/bin/asadmin deploy dataverse-5.2.war`

4\. Restart Payara.

- `service payara stop`
- `service payara start`
9 changes: 0 additions & 9 deletions doc/release-notes/7308-connection-pool-size.md

This file was deleted.

@@ -2,7 +2,7 @@
# 4-digit year and 2-digit month and day
# /usr/local/payara5/glassfish/domains/domain1/logs/counter_2019-01-11.log
#log_name_pattern: sample_logs/counter_(yyyy-mm-dd).log
log_name_pattern: /usr/local/payara5/glassfish/domains/domain1/logs/counter_(yyyy-mm-dd).log
log_name_pattern: /usr/local/payara5/glassfish/domains/domain1/logs/mdc/counter_(yyyy-mm-dd).log

# path_types regular expressions allow matching to classify page urls as either an investigation or request
# based on specific URL structure for your system.
@@ -1,3 +1,4 @@
Tool Type Scope Description
TwoRavens explore file A system of interlocking statistical tools for data exploration, analysis, and meta-analysis: http://2ra.vn. See the :doc:`/user/data-exploration/tworavens` section of the User Guide for more information on TwoRavens from the user perspective and the :doc:`/installation/r-rapache-tworavens` section of the Installation Guide.
Data Explorer explore file A GUI which lists the variables in a tabular data file allowing searching, charting and cross tabulation analysis. See the README.md file at https://github.com/scholarsportal/Dataverse-Data-Explorer for the instructions on adding Data Explorer to your Dataverse; and the :doc:`/installation/prerequisites` section of the Installation Guide for the instructions on how to set up **basic R configuration required** (specifically, Dataverse uses R to generate .prep metadata files that are needed to run Data Explorer).
Whole Tale explore dataset A platform for the creation of reproducible research packages that allows users to launch containerized interactive analysis environments based on popular tools such as Jupyter and RStudio. Using this integration, Dataverse users can launch Jupyter and RStudio environments to analyze published datasets. For more information, see the `Whole Tale User Guide <https://wholetale.readthedocs.io/en/stable/users_guide/integration.html>`_.
@@ -2,7 +2,9 @@
"displayName": "Dynamic Dataset Tool",
"description": "Dazzles! Dizzying!",
"scope": "dataset",
"type": "explore",
"types": [
"explore"
],
"toolUrl": "https://dynamicdatasettool.com/v2",
"toolParameters": {
"queryParameters": [
@@ -3,7 +3,10 @@
"description": "Fabulous Fun for Files!",
"toolName": "fabulous",
"scope": "file",
"type": "explore",
"types": [
"explore",
"preview"
],
"toolUrl": "https://fabulousfiletool.com",
"contentType": "text/tab-separated-values",
"toolParameters": {
36 changes: 36 additions & 0 deletions doc/sphinx-guides/source/_static/util/counter_daily.sh
@@ -0,0 +1,36 @@
#! /bin/bash

COUNTER_PROCESSOR_DIRECTORY="/usr/local/counter-processor-0.0.1"
MDC_LOG_DIRECTORY="/usr/local/payara5/glassfish/domains/domain1/logs/mdc"

# counter_daily.sh

cd $COUNTER_PROCESSOR_DIRECTORY

echo >>/tmp/counter_daily.log
date >>/tmp/counter_daily.log
echo >>/tmp/counter_daily.log

# "You should run Counter Processor once a day to create reports in SUSHI (JSON) format that are saved to disk for Dataverse to process and that are sent to the DataCite hub."

LAST=$(date -d "yesterday 13:00" '+%Y-%m-%d')
# echo $LAST
YEAR_MONTH=$(date -d "yesterday 13:00" '+%Y-%m')
# echo $YEAR_MONTH
d=$(date -I -d "$YEAR_MONTH-01")
#echo $d
while [ "$(date -d "$d" +%Y%m%d)" -le "$(date -d "$LAST" +%Y%m%d)" ];
do
  # Create an empty log for any day with no traffic so counter-processor
  # does not fail on a missing file
  if [ ! -f "$MDC_LOG_DIRECTORY/counter_$d.log" ]; then
    touch "$MDC_LOG_DIRECTORY/counter_$d.log"
  fi
d=$(date -I -d "$d + 1 day")
done

#run counter-processor as counter user

sudo -u counter YEAR_MONTH=$YEAR_MONTH python3 main.py >>/tmp/counter_daily.log

curl -X POST "http://localhost:8080/api/admin/makeDataCount/addUsageMetricsFromSushiReport?reportOnDisk=/tmp/make-data-count-report.json"
48 changes: 48 additions & 0 deletions doc/sphinx-guides/source/_static/util/counter_weekly.sh
@@ -0,0 +1,48 @@
#!/bin/sh
#counter_weekly.sh

# This script iterates through all published Datasets in all Dataverses and calls the Make Data Count API to update their citations from DataCite
# Note: Requires curl and jq for parsing JSON responses from curl

# A recursive method to process each Dataverse
processDV () {
echo "Processing Dataverse ID#: $1"

#Call the Dataverse API to get the contents of the Dataverse (without credentials, this will only list published datasets and dataverses)
DVCONTENTS=$(curl -s http://localhost:8080/api/dataverses/$1/contents)

# Iterate over all datasets, pulling the value of their DOIs (as part of the persistentUrl) from the json returned
for subds in $(echo "${DVCONTENTS}" | jq -r '.data[] | select(.type == "dataset") | .persistentUrl'); do

#The authority/identifier are preceded by a protocol/host, i.e. https://doi.org/
DOI=$(expr "$subds" : '.*:\/\/doi\.org\/\(.*\)')

# Call the Dataverse API for this dataset and get the response
RESULT=$(curl -s -X POST "http://localhost:8080/api/admin/makeDataCount/:persistentId/updateCitationsForDataset?persistentId=doi:$DOI" )
# Parse the status and number of citations found from the response
STATUS=$(echo "$RESULT" | jq -j '.status' )
CITATIONS=$(echo "$RESULT" | jq -j '.data.citationCount')

# The status for a call that worked
OK='OK'

# Check the status and report
if [ "$STATUS" = "$OK" ]; then
echo "Updated: $CITATIONS citations for doi:$DOI"
else
echo "Failed to update citations for doi:$DOI"
echo "Run curl -s -X POST 'http://localhost:8080/api/admin/makeDataCount/:persistentId/updateCitationsForDataset?persistentId=doi:$DOI ' to retry/see the error message"
fi
#processDV $subds
done

# Now iterate over any child Dataverses and recursively process them
for subdv in $(echo "${DVCONTENTS}" | jq -r '.data[] | select(.type == "dataverse") | .id'); do
echo $subdv
processDV $subdv
done

}

# Call the function on the root dataverse to start processing
processDV 1
27 changes: 17 additions & 10 deletions doc/sphinx-guides/source/admin/external-tools.rst
@@ -1,7 +1,7 @@
External Tools
==============

External tools can provide additional features that are not part of Dataverse itself, such as data exploration.
External tools can provide additional features that are not part of Dataverse itself, such as data file previews, visualization, and curation.

.. contents:: |toctitle|
:local:
@@ -12,7 +12,7 @@ Inventory of External Tools
---------------------------

.. csv-table::
:header: "Tool", "Type", "Scope", "Description"
:header-rows: 1
:widths: 20, 10, 5, 65
:delim: tab
:file: ../_static/admin/dataverse-external-tools.tsv
@@ -31,14 +31,12 @@ To add an external tool to your installation of Dataverse you must first download

Go to :ref:`inventory-of-external-tools` and download a JSON manifest for one of the tools by following links in the description to installation instructions.

In the curl command below, replace the placeholder "fabulousFileTool.json" placeholder for the actual name of the JSON file you downloaded.
Configure the tool with the curl command below, making sure to replace the ``fabulousFileTool.json`` placeholder with the name of the JSON manifest file you downloaded.

.. code-block:: bash

  curl -X POST -H 'Content-type: application/json' http://localhost:8080/api/admin/externalTools --upload-file fabulousFileTool.json
Note that some tools will provide a preview mode, which provides an embedded, simplified view of the tool on the file pages of your installation. This is controlled by the `hasPreviewMode` parameter.

Listing All External Tools in Dataverse
+++++++++++++++++++++++++++++++++++++++

@@ -75,17 +73,26 @@ Testing External Tools

Once you have added an external tool to your installation of Dataverse, you will probably want to test it to make sure it is functioning properly.

File Level vs. Dataset Level
++++++++++++++++++++++++++++

File level tools are specific to the file type (content type or MIME type). For example, a tool may work with PDFs, which have a content type of "application/pdf".

In contrast, dataset level tools are always available no matter what file types are within the dataset.

File Level Explore Tools
++++++++++++++++++++++++

File level explore tools are specific to the file type (content type or MIME type) of the file. For example, Data Explorer is a tool for exploring tabular data files.
File level explore tools provide a variety of features from data visualization to statistical analysis.

An "Explore" button will appear (on both the dataset page and the file landing page) for files that match the type that the tool has been built for. When there are multiple explore tools for a filetype, the button becomes a dropdown.
For each supported file type, file level explore tools appear in the file listing of the dataset page as well as under the "Access" button on each file page.

File Level Preview Tools
++++++++++++++++++++++++

File level explore tools can be set up to display in preview mode, which is a simplified view of an explore tool designed specifically for embedding in the file page.
File level preview tools allow the user to see a preview of the file contents without having to download it.

When a file has a preview available, a preview icon will appear next to that file in the file listing on the dataset page. On the file page itself, the preview will appear in a Preview tab either immediately or once a guestbook has been filled in or terms, if any, have been agreed to.

File Level Configure Tools
++++++++++++++++++++++++++
@@ -95,12 +102,12 @@ File level configure tools are only available when you log in and have write access
Dataset Level Explore Tools
+++++++++++++++++++++++++++

When a dataset level explore tool is added, an "Explore" button on the dataset page will appear. This button becomes a drop down when there are multiple tools.
Dataset level explore tools allow the user to explore all the files in a dataset.

Dataset Level Configure Tools
+++++++++++++++++++++++++++++

Configure tools at the dataset level are not currently supported. No button appears in the GUI if you add this type of tool.
Configure tools at the dataset level are not currently supported.

Writing Your Own External Tool
------------------------------