Merge pull request #7403 from IQSS/develop

v5.2

kcondon authored Nov 9, 2020
2 parents 559a449 + 5deeec4 commit 4951505
Showing 91 changed files with 3,183 additions and 2,325 deletions.
63 changes: 63 additions & 0 deletions doc/release-notes/5.1.1-release-notes.md
@@ -0,0 +1,63 @@
# Dataverse 5.1.1

This minor release adds important scaling improvements for installations running on AWS S3. It is recommended that 5.1.1 be used in production instead of 5.1.

## Release Highlights

### Connection Pool Size Configuration Option, Connection Optimizations

Dataverse 5.1 improved the efficiency of making S3 connections through the use of an HTTP connection pool. This release adds optimizations around closing streams and channels that may otherwise hold S3 HTTP connections open and exhaust the connection pool. In parallel, this release increases the default pool size from 50 to 256 and makes the pool size configurable, so a larger pool can be set up if needed.

## Major Use Cases

Newly-supported use cases in this release include:

- Administrators of installations using S3 will be able to define the connection pool size, allowing better resource scaling for larger installations (Issue #7309, PR #7313)

## Notes for Dataverse Installation Administrators

### 5.1.1 vs. 5.1 for Production Use

As mentioned above, we encourage 5.1.1 instead of 5.1 for production use.

### New JVM Option for Connection Pool Size

Larger installations may want to increase the number of open S3 connections allowed (default is 256). For example, to set the value to 4096:

`./asadmin create-jvm-options "-Ddataverse.files.<id>.connection-pool-size=4096"`
(where `<id>` is the identifier of your S3 file store, likely `"s3"`). The JVM Options section of the [Configuration Guide](http://guides.dataverse.org/en/5.1.1/installation/config/) has more information.
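
To verify the setting after a restart, you can list the JVM options and look for the new entry (a quick check, assuming the default Payara domain):

`./asadmin list-jvm-options | grep connection-pool-size`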

### New S3 Bucket CORS setting for Direct Upload/Download

When using S3 storage with direct upload and/or download enabled, one must now expose the ETag header as documented in the [updated cors.json example](https://guides.dataverse.org/en/5.1.1/developers/big-data-support.html?highlight=etag#s3-direct-upload-and-download).
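
For reference, a bucket CORS policy that exposes the ETag header might look like the following sketch (based on the linked example; the allowed origins and methods are placeholders to adjust for your installation):

```
{
  "CORSRules": [
    {
      "AllowedOrigins": ["*"],
      "AllowedHeaders": ["*"],
      "AllowedMethods": ["GET", "PUT"],
      "ExposeHeaders": ["ETag"]
    }
  ]
}
```

It can be applied with the AWS CLI, for example: `aws s3api put-bucket-cors --bucket <your-bucket> --cors-configuration file://cors.json`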

## Complete List of Changes

For the complete list of code changes in this release, see the [5.1.1 Milestone](https://github.com/IQSS/dataverse/milestone/91?closed=1) in GitHub.

For help with upgrading, installing, or general questions please post to the [Dataverse Google Group](https://groups.google.com/forum/#!forum/dataverse-community) or email [email protected].

## Installation

If this is a new installation, please see our [Installation Guide](http://guides.dataverse.org/en/5.1.1/installation/).

## Upgrade Instructions

0. These instructions assume that you've already successfully upgraded to Dataverse 5.1 following the instructions in the [Dataverse 5.1 Release Notes](https://github.com/IQSS/dataverse/releases/tag/v5.1).

1. Undeploy the previous version.

`<payara install path>/bin/asadmin list-applications`
`<payara install path>/bin/asadmin undeploy dataverse<-version>`

2. Stop Payara, remove the generated directory, and start Payara again.

- `service payara stop`
- remove the generated directory:
`rm -rf <payara install path>/glassfish/domains/domain1/generated`
- `service payara start`

3. Deploy this version.
`<payara install path>/bin/asadmin deploy dataverse-5.1.1.war`

4. Restart Payara.
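
- `service payara stop`
- `service payara start`
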
105 changes: 105 additions & 0 deletions doc/release-notes/5.2-release-notes.md
@@ -0,0 +1,105 @@
# Dataverse 5.2

This release brings new features, enhancements, and bug fixes to Dataverse. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.

## Release Highlights

### File Preview When Guestbooks or Terms Exist

Previously, file preview was only available when files were publicly downloadable. Now, if a guestbook or terms (or both) are configured for the dataset, they are shown in the Preview tab, and once they have been agreed to, the file preview appears (#6919).

### Preview Only External Tools

A new external tool type called "preview" has been added that prevents the tool from being displayed under "Explore Options" under the "Access File" button on the file landing page (#6919).

### Dataset Page Edit Options Consolidation

As part of the continued effort to redesign the Dataset and File pages, some of the edit options for a file on the dataset page are being moved into a "kebab" menu to allow for better consistency and future scalability.

### Google Cloud Archiver

Dataverse Bags can now be sent to a bucket in Google Cloud, including those in the "Coldline" storage class, which provides less expensive but slower access.

## Major Use Cases

Newly-supported use cases in this release include:

- Users can now preview files that have a guestbook or terms. (Issue #6919, PR #7369)
- External tool developers can indicate that their tool is "preview only". (Issue #6919, PR #7369)
- Dataverse administrators can set up a regular export to Google Cloud so that the installation's data is preserved. (Issue #7140, PR #7292)
- Dataverse administrators can use a regex when defining a group. (Issue #7344, PR #7351)
- External tool developers can use a new API endpoint to retrieve a user's information. (Issue #7307, PR #7345)

## Notes for Dataverse Installation Administrators

### Converting Explore External Tools to Preview Only

When the war file is deployed, a SQL migration script will convert [dataverse-previewers][] to have both "explore" and "preview" types so that they will continue to be displayed in the Preview tab.

If you would prefer that these tools be preview only, you can delete the tools, adjust the JSON manifests (changing "explore" to "preview"), and re-add them.

[dataverse-previewers]: https://github.com/GlobalDataverseCommunityConsortium/dataverse-previewers
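
A possible sequence for converting a tool (the tool id and manifest filename here are placeholders):

```
# Find the numeric id of the tool to convert
curl http://localhost:8080/api/admin/externalTools

# Delete it, replacing 1 with the id from the listing
curl -X DELETE http://localhost:8080/api/admin/externalTools/1

# Edit the manifest so "types" contains only "preview", then re-add it
curl -X POST -H 'Content-type: application/json' http://localhost:8080/api/admin/externalTools --upload-file previewTool.json
```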

### New Database Settings and JVM Options

Installations integrating with Google Cloud Archiver will need to use two new database settings:

- `:GoogleCloudProject` - the name of the project managing the bucket
- `:GoogleCloudBucket` - the name of the bucket to use

For more information, see the Google Cloud Configuration section of the [Installation Guide](https://guides.dataverse.org/en/5.2/installation/).
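
These can be set with the admin API; for example (the project and bucket names are placeholders):

```
curl -X PUT -d "my-gcp-project" http://localhost:8080/api/admin/settings/:GoogleCloudProject
curl -X PUT -d "my-archive-bucket" http://localhost:8080/api/admin/settings/:GoogleCloudBucket
```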

### Automation of Make Data Count Scripts

Scripts have been added to automate Make Data Count processing. For more information, see the Make Data Count section of the [Admin Guide](https://guides.dataverse.org/en/5.2/admin/).
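
The new `counter_daily.sh` and `counter_weekly.sh` scripts (included later in this diff) are meant to be run on a schedule. One possible crontab, with illustrative install paths:

```
# Process yesterday's MDC logs and send the SUSHI report daily at 1:00
0 1 * * * /usr/local/counter-processor-0.0.1/counter_daily.sh
# Refresh citation counts from DataCite weekly on Sundays at 2:00
0 2 * * 0 /usr/local/counter-processor-0.0.1/counter_weekly.sh
```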

## Notes for Tool Developers and Integrators

### Preview Only External Tools, "hasPreviewMode"

A new external tool type called "preview" has been added that prevents the tool from being displayed under "Explore Options" under the "Access File" button on the file landing page (#6919). This "preview" type replaces "hasPreviewMode", which has been removed.

### Multiple Types for External Tools

External tools now support multiple types. In practice, "explore" and "preview" are the only combination that behaves differently in the UI from having just one type or the other (see "Preview Only External Tools" above). Multiple types are specified in the JSON manifest with an array in "types". The older, single "type" is still supported but should be considered deprecated.
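
For example, the updated `fabulousFileTool.json` manifest shown later in this diff declares both types:

```
"types": [
  "explore",
  "preview"
]
```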

### User Information Endpoint

A new API endpoint has been added to retrieve a user's information so that external tools can, for example, email users if needed.
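
A sketch of calling it with an API token (the `/api/users/:me` path is our reading of the linked PR #7345):

`curl -H "X-Dataverse-key: $API_TOKEN" http://localhost:8080/api/users/:me`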

## Complete List of Changes

For the complete list of code changes in this release, see the [5.2 Milestone](https://github.com/IQSS/dataverse/milestone/92?closed=1) in GitHub.

For help with upgrading, installing, or general questions please post to the [Dataverse Google Group](https://groups.google.com/forum/#!forum/dataverse-community) or email [email protected].

## Installation

If this is a new installation, please see our [Installation Guide](https://guides.dataverse.org/en/5.2/installation/).

## Upgrade Instructions

0\. These instructions assume that you've already successfully upgraded from Dataverse 4.x to Dataverse 5 following the instructions in the [Dataverse 5 Release Notes](https://github.com/IQSS/dataverse/releases/tag/v5.0).

1\. Undeploy the previous version.

- `<payara install path>/bin/asadmin list-applications`
- `<payara install path>/bin/asadmin undeploy dataverse<-version>`

(where `<payara install path>` is where Payara 5 is installed, for example: `/usr/local/payara5`)

2\. Stop Payara, remove the generated directory, and start Payara again.

- `service payara stop`
- remove the generated directory:
`rm -rf <payara install path>/glassfish/domains/domain1/generated`
- `service payara start`

3\. Deploy this version.

- `<payara install path>/bin/asadmin deploy dataverse-5.2.war`

4\. Restart Payara.

- `service payara stop`
- `service payara start`
9 changes: 0 additions & 9 deletions doc/release-notes/7308-connection-pool-size.md

This file was deleted.

@@ -2,7 +2,7 @@
# 4-digit year and 2-digit month and day
# /usr/local/payara5/glassfish/domains/domain1/logs/counter_2019-01-11.log
#log_name_pattern: sample_logs/counter_(yyyy-mm-dd).log
log_name_pattern: /usr/local/payara5/glassfish/domains/domain1/logs/counter_(yyyy-mm-dd).log
log_name_pattern: /usr/local/payara5/glassfish/domains/domain1/logs/mdc/counter_(yyyy-mm-dd).log

# path_types regular expressions allow matching to classify page urls as either an investigation or request
# based on specific URL structure for your system.
@@ -1,3 +1,4 @@
Tool Type Scope Description
TwoRavens explore file A system of interlocking statistical tools for data exploration, analysis, and meta-analysis: http://2ra.vn. See the :doc:`/user/data-exploration/tworavens` section of the User Guide for more information on TwoRavens from the user perspective and the :doc:`/installation/r-rapache-tworavens` section of the Installation Guide.
Data Explorer explore file A GUI which lists the variables in a tabular data file allowing searching, charting and cross tabulation analysis. See the README.md file at https://github.com/scholarsportal/Dataverse-Data-Explorer for the instructions on adding Data Explorer to your Dataverse; and the :doc:`/installation/prerequisites` section of the Installation Guide for the instructions on how to set up **basic R configuration required** (specifically, Dataverse uses R to generate .prep metadata files that are needed to run Data Explorer).
Whole Tale explore dataset A platform for the creation of reproducible research packages that allows users to launch containerized interactive analysis environments based on popular tools such as Jupyter and RStudio. Using this integration, Dataverse users can launch Jupyter and RStudio environments to analyze published datasets. For more information, see the `Whole Tale User Guide <https://wholetale.readthedocs.io/en/stable/users_guide/integration.html>`_.
@@ -2,7 +2,9 @@
"displayName": "Dynamic Dataset Tool",
"description": "Dazzles! Dizzying!",
"scope": "dataset",
"type": "explore",
"types": [
"explore"
],
"toolUrl": "https://dynamicdatasettool.com/v2",
"toolParameters": {
"queryParameters": [
@@ -3,7 +3,10 @@
"description": "Fabulous Fun for Files!",
"toolName": "fabulous",
"scope": "file",
"type": "explore",
"types": [
"explore",
"preview"
],
"toolUrl": "https://fabulousfiletool.com",
"contentType": "text/tab-separated-values",
"toolParameters": {
36 changes: 36 additions & 0 deletions doc/sphinx-guides/source/_static/util/counter_daily.sh
@@ -0,0 +1,36 @@
#! /bin/bash

COUNTER_PROCESSOR_DIRECTORY="/usr/local/counter-processor-0.0.1"
MDC_LOG_DIRECTORY="/usr/local/payara5/glassfish/domains/domain1/logs/mdc"

# counter_daily.sh

cd $COUNTER_PROCESSOR_DIRECTORY

echo >>/tmp/counter_daily.log
date >>/tmp/counter_daily.log
echo >>/tmp/counter_daily.log

# "You should run Counter Processor once a day to create reports in SUSHI (JSON) format that are saved to disk for Dataverse to process and that are sent to the DataCite hub."

LAST=$(date -d "yesterday 13:00" '+%Y-%m-%d')
# echo $LAST
YEAR_MONTH=$(date -d "yesterday 13:00" '+%Y-%m')
# echo $YEAR_MONTH
d=$(date -I -d "$YEAR_MONTH-01")
#echo $d
while [ "$(date -d "$d" +%Y%m%d)" -le "$(date -d "$LAST" +%Y%m%d)" ];
do
  # Create an empty log for any day with no traffic so counter-processor
  # does not fail on a missing file
  if [ ! -f "$MDC_LOG_DIRECTORY/counter_$d.log" ]; then
    touch "$MDC_LOG_DIRECTORY/counter_$d.log"
  fi
d=$(date -I -d "$d + 1 day")
done

#run counter-processor as counter user

sudo -u counter YEAR_MONTH=$YEAR_MONTH python3 main.py >>/tmp/counter_daily.log

curl -X POST "http://localhost:8080/api/admin/makeDataCount/addUsageMetricsFromSushiReport?reportOnDisk=/tmp/make-data-count-report.json"
48 changes: 48 additions & 0 deletions doc/sphinx-guides/source/_static/util/counter_weekly.sh
@@ -0,0 +1,48 @@
#!/bin/sh
#counter_weekly.sh

# This script iterates through all published Datasets in all Dataverses and calls the Make Data Count API to update their citations from DataCite
# Note: Requires curl and jq for parsing JSON responses from curl

# A recursive method to process each Dataverse
processDV () {
echo "Processing Dataverse ID#: $1"

#Call the Dataverse API to get the contents of the Dataverse (without credentials, this will only list published datasets and dataverses)
DVCONTENTS=$(curl -s http://localhost:8080/api/dataverses/$1/contents)

# Iterate over all datasets, pulling the value of their DOIs (as part of the persistentUrl) from the json returned
for subds in $(echo "${DVCONTENTS}" | jq -r '.data[] | select(.type == "dataset") | .persistentUrl'); do

#The authority/identifier are preceded by a protocol/host, i.e. https://doi.org/
DOI=$(expr "$subds" : '.*:\/\/doi\.org\/\(.*\)')

# Call the Dataverse API for this dataset and get the response
RESULT=$(curl -s -X POST "http://localhost:8080/api/admin/makeDataCount/:persistentId/updateCitationsForDataset?persistentId=doi:$DOI" )
# Parse the status and number of citations found from the response
STATUS=$(echo "$RESULT" | jq -j '.status' )
CITATIONS=$(echo "$RESULT" | jq -j '.data.citationCount')

# The status for a call that worked
OK='OK'

# Check the status and report
if [ "$STATUS" = "$OK" ]; then
echo "Updated: $CITATIONS citations for doi:$DOI"
else
echo "Failed to update citations for doi:$DOI"
echo "Run curl -s -X POST 'http://localhost:8080/api/admin/makeDataCount/:persistentId/updateCitationsForDataset?persistentId=doi:$DOI ' to retry/see the error message"
fi
#processDV $subds
done

# Now iterate over any child Dataverses and recursively process them
for subdv in $(echo "${DVCONTENTS}" | jq -r '.data[] | select(.type == "dataverse") | .id'); do
echo $subdv
processDV $subdv
done

}

# Call the function on the root dataverse to start processing
processDV 1
27 changes: 17 additions & 10 deletions doc/sphinx-guides/source/admin/external-tools.rst
@@ -1,7 +1,7 @@
External Tools
==============

External tools can provide additional features that are not part of Dataverse itself, such as data exploration.
External tools can provide additional features that are not part of Dataverse itself, such as data file previews, visualization, and curation.

.. contents:: |toctitle|
:local:
@@ -12,7 +12,7 @@ Inventory of External Tools
---------------------------

.. csv-table::
:header: "Tool", "Type", "Scope", "Description"
:header-rows: 1
:widths: 20, 10, 5, 65
:delim: tab
:file: ../_static/admin/dataverse-external-tools.tsv
@@ -31,14 +31,12 @@ To add an external tool to your installation of Dataverse you must first download

Go to :ref:`inventory-of-external-tools` and download a JSON manifest for one of the tools by following links in the description to installation instructions.

In the curl command below, replace the placeholder "fabulousFileTool.json" placeholder for the actual name of the JSON file you downloaded.
Configure the tool with the curl command below, making sure to replace the ``fabulousFileTool.json`` placeholder with the name of the JSON manifest file you downloaded.

.. code-block:: bash

  curl -X POST -H 'Content-type: application/json' http://localhost:8080/api/admin/externalTools --upload-file fabulousFileTool.json
Note that some tools will provide a preview mode, which provides an embedded, simplified view of the tool on the file pages of your installation. This is controlled by the `hasPreviewMode` parameter.

Listing All External Tools in Dataverse
+++++++++++++++++++++++++++++++++++++++

@@ -75,17 +73,26 @@ Testing External Tools

Once you have added an external tool to your installation of Dataverse, you will probably want to test it to make sure it is functioning properly.

File Level vs. Dataset Level
++++++++++++++++++++++++++++

File level tools are specific to the file type (content type or MIME type). For example, a tool may work with PDFs, which have a content type of "application/pdf".

In contrast, dataset level tools are always available no matter what file types are within the dataset.

File Level Explore Tools
++++++++++++++++++++++++

File level explore tools are specific to the file type (content type or MIME type) of the file. For example, Data Explorer is a tool for exploring tabular data files.
File level explore tools provide a variety of features from data visualization to statistical analysis.

An "Explore" button will appear (on both the dataset page and the file landing page) for files that match the type that the tool has been built for. When there are multiple explore tools for a filetype, the button becomes a dropdown.
For each supported file type, file level explore tools appear in the file listing of the dataset page as well as under the "Access" button on each file page.

File Level Preview Tools
++++++++++++++++++++++++

File level explore tools can be set up to display in preview mode, which is a simplified view of an explore tool designed specifically for embedding in the file page.
File level preview tools allow the user to see a preview of the file contents without having to download it.

When a file has a preview available, a preview icon will appear next to that file in the file listing on the dataset page. On the file page itself, the preview will appear in a Preview tab either immediately or once a guestbook has been filled in or terms, if any, have been agreed to.

File Level Configure Tools
++++++++++++++++++++++++++
@@ -95,12 +102,12 @@ File level configure tools are only available when you log in and have write access
Dataset Level Explore Tools
+++++++++++++++++++++++++++

When a dataset level explore tool is added, an "Explore" button on the dataset page will appear. This button becomes a drop down when there are multiple tools.
Dataset level explore tools allow the user to explore all the files in a dataset.

Dataset Level Configure Tools
+++++++++++++++++++++++++++++

Configure tools at the dataset level are not currently supported. No button appears in the GUI if you add this type of tool.
Configure tools at the dataset level are not currently supported.

Writing Your Own External Tool
------------------------------