
Releases: IQSS/dataverse

v5.2

09 Nov 21:31
4951505

Dataverse 5.2

This release brings new features, enhancements, and bug fixes to Dataverse. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.

Release Highlights

File Preview When Guestbooks or Terms Exist

Previously, file preview was only available when files were publicly downloadable. Now if a guestbook or terms (or both) are configured for the dataset, they will be shown in the Preview tab and once they are agreed to, the file preview will appear (#6919).

Preview Only External Tools

A new external tool type called "preview" has been added that prevents the tool from being displayed under "Explore Options" under the "Access File" button on the file landing page (#6919).

Dataset Page Edit Options Consolidation

As part of the continued effort to redesign the Dataset and File pages, some of the edit options for a file on the dataset page are being moved to a "kebab" to allow for better consistency and future scalability.

Google Cloud Archiver

Dataverse Bags can now be sent to a bucket in Google Cloud, including those in the "Coldline" storage class, which provides less expensive but slower access.

Major Use Cases

Newly-supported use cases in this release include:

  • Users can now preview files that have a guestbook or terms. (Issue #6919, PR #7369)
  • External tool developers can indicate that their tool is "preview only". (Issue #6919, PR #7369)
  • Dataverse Administrators can set up a regular export to Google Cloud so that the installation's data is preserved (Issue #7140, PR #7292)
  • Dataverse Administrators can use a regex when defining a group (Issue #7344, PR #7351)
  • External Tool Developers can use a new API endpoint to retrieve a user's information (Issue #7307, PR #7345)

Notes for Dataverse Installation Administrators

Converting Explore External Tools to Preview Only

When the war file is deployed, a SQL migration script will convert dataverse-previewers to have both "explore" and "preview" types so that they will continue to be displayed in the Preview tab.

If you would prefer that these tools be preview only, you can delete the tools, adjust the JSON manifests (changing "explore" to "preview"), and re-add them.
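For example, this swap can be done with the external tools admin API. A sketch (the tool id and manifest filename below are placeholders; adjust to your installation):

```shell
# List registered external tools and note the numeric id of the previewer
curl http://localhost:8080/api/admin/externalTools

# Remove the tool (replace 1 with the id from the listing above)
curl -X DELETE http://localhost:8080/api/admin/externalTools/1

# Re-add it from a manifest edited to use "preview" instead of "explore"
curl -X POST -H 'Content-type: application/json' \
  http://localhost:8080/api/admin/externalTools \
  --upload-file previewer-manifest.json
```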

New Database Settings and JVM Options

Installations integrating with Google Cloud Archiver will need to use two new database settings:

  • :GoogleCloudProject - the name of the project managing the bucket
  • :GoogleCloudBucket - the name of the bucket to use

For more information, see the Google Cloud Configuration section of the Installation Guide.
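Both settings can be set via the admin settings API. A sketch with placeholder values:

```shell
# Substitute your own Google Cloud project and bucket names
curl -X PUT -d 'my-gcp-project' \
  http://localhost:8080/api/admin/settings/:GoogleCloudProject
curl -X PUT -d 'my-archive-bucket' \
  http://localhost:8080/api/admin/settings/:GoogleCloudBucket
```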

Automation of Make Data Count Scripts

Scripts have been added in order to automate Make Data Count processing. For more information, see the Make Data Count section of the Admin Guide.

Notes for Tool Developers and Integrators

Preview Only External Tools, "hasPreviewMode"

A new external tool type called "preview" has been added that prevents the tool from being displayed under "Explore Options" under the "Access File" button on the file landing page (#6919). This "preview" type replaces "hasPreviewMode", which has been removed.

Multiple Types for External Tools

External tools now support multiple types. In practice, "explore" and "preview" are the only combination that makes a difference in the UI compared with having just one type or the other (see "preview only" above). Multiple types are specified in the JSON manifest with an array in "types". The older, single "type" is still supported but should be considered deprecated.

User Information Endpoint

A new API endpoint has been added to retrieve a user's information so that external tools can, for example, email users if needed.
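A minimal sketch of calling the new endpoint, assuming the documented /api/users/:me path (the token is a placeholder):

```shell
# Retrieve the calling user's account information
# $API_TOKEN is the Dataverse API token the tool received
curl -H "X-Dataverse-key: $API_TOKEN" http://localhost:8080/api/users/:me
```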

Complete List of Changes

For the complete list of code changes in this release, see the 5.2 Milestone in GitHub.

For help with upgrading, installing, or general questions please post to the Dataverse Google Group or email [email protected].

Installation

If this is a new installation, please see our Installation Guide.

Upgrade Instructions

0. These instructions assume that you've already successfully upgraded from Dataverse 4.x to Dataverse 5 following the instructions in the Dataverse 5 Release Notes.

1. Undeploy the previous version.

  • <payara install path>/bin/asadmin list-applications
  • <payara install path>/bin/asadmin undeploy dataverse<-version>

(where <payara install path> is where Payara 5 is installed, for example: /usr/local/payara5)

2. Stop payara, remove the generated directory, and start it again.

  • service payara stop
  • remove the generated directory:
    rm -rf <payara install path>/payara/domains/domain1/generated
  • service payara start

3. Deploy this version.

  • <payara install path>/bin/asadmin deploy dataverse-5.2.war

4. Restart payara

  • service payara stop
  • service payara start

v5.1.1

08 Oct 18:02
559a449

Dataverse 5.1.1

This minor release adds important scaling improvements for installations running on AWS S3. It is recommended that 5.1.1 be used in production instead of 5.1.

Release Highlights

Connection Pool Size Configuration Option, Connection Optimizations

Dataverse 5.1 improved the efficiency of making S3 connections through use of an http connection pool. This release adds optimizations around closing streams and channels that may hold S3 http connections open and exhaust the connection pool. In parallel, this release increases the default pool size from 50 to 256 and adds the ability to increase the size of the connection pool, so a larger pool can be configured if needed.

Major Use Cases

Newly-supported use cases in this release include:

  • Administrators of installations using S3 will be able to define the connection pool size, allowing better resource scaling for larger installations (Issue #7309, PR #7313)

Notes for Dataverse Installation Administrators

5.1.1 vs. 5.1 for Production Use

As mentioned above, we encourage 5.1.1 instead of 5.1 for production use.

New JVM Option for Connection Pool Size

Larger installations may want to increase the number of open S3 connections allowed (default is 256). For example, to set the value to 4096:

./asadmin create-jvm-options "-Ddataverse.files.<id>.connection-pool-size=4096"
(where <id> is the identifier of your S3 file store, likely "s3"). The JVM Options section of the Configuration Guide has more information.

Complete List of Changes

For the complete list of code changes in this release, see the 5.1.1 Milestone in GitHub.

For help with upgrading, installing, or general questions please post to the Dataverse Google Group or email [email protected].

Installation

If this is a new installation, please see our Installation Guide.

Upgrade Instructions

  1. These instructions assume that you've already successfully upgraded to Dataverse 5.1 following the instructions in the Dataverse 5.1 Release Notes.

  2. Undeploy the previous version.

<payara install path>/bin/asadmin list-applications
<payara install path>/bin/asadmin undeploy dataverse<-version>

  3. Stop payara, remove the generated directory, and start it again.
  • service payara stop
  • remove the generated directory:
    rm -rf <payara install path>/glassfish/domains/domain1/generated
  • service payara start

  4. Deploy this version.
    <payara install path>/bin/asadmin deploy dataverse-5.1.1.war

  5. Restart payara.
  • service payara stop
  • service payara start

Dataverse 5.1

06 Oct 15:55
7a0eef0

Dataverse 5.1

This release brings new features, enhancements, and bug fixes to Dataverse. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.

Release Highlights

Large File Upload for Installations Using AWS S3

The added support for multipart upload through the API and UI (Issue #6763) will allow files larger than 5 GB to be uploaded to Dataverse when an installation is running on AWS S3. Previously, only non-AWS S3 storage configurations would allow uploads larger than 5 GB.

Dataset-Specific Stores

In previous releases, configuration options were added that allow each dataverse to have a specific store enabled. This release adds even more granularity, with the ability to set a dataset-level store.

Major Use Cases

Newly-supported use cases in this release include:

  • Users can now upload files larger than 5 GB on installations running AWS S3 (Issue #6763, PR #6995)
  • Administrators will now be able to specify a store at the dataset level in addition to the Dataverse level (Issue #6872, PR #7272)
  • Users will have their dataset's directory structure retained when uploading a dataset with shapefiles (Issue #6873, PR #7279)
  • Users will now be able to download zip files through the experimental Zipper service when the set of downloaded files have duplicate names (Issue #80, PR #7276)
  • Users will now be able to download zip files with the proper file structure through the experimental Zipper service (Issue #7255, PR #7258)
  • Administrators will be able to use new APIs to keep the Solr index and the DB in sync, allowing easier resolution of an issue that would occasionally cause stale search results. (Issue #4225, PR #7211)

Notes for Dataverse Installation Administrators

New API for setting a Dataset-level Store

  • This release adds a new API for setting a dataset-specific store. Learn more in the Managing Dataverse and Datasets section of the Admin Guide.
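A sketch of such a call, assuming an endpoint of the form shown below (see the Admin Guide for the authoritative path; the PID and store id are placeholders, and the caller must be a superuser):

```shell
# Assign the store with id "s3" to a single dataset
curl -H "X-Dataverse-key: $API_TOKEN" -X PUT -d s3 \
  "http://localhost:8080/api/datasets/:persistentId/storageDriver?persistentId=$DATASET_PID"
```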

Multipart Upload Storage Monitoring, Recommended Use for Multipart Upload

Charges may be incurred for storage reserved for multipart uploads that are not completed or cancelled. Administrators may want to do periodic manual or automated checks for open multipart uploads. Learn more in the Big Data Support section of the Developers Guide.

While multipart uploads can support much larger files, and can have advantages in terms of robust transfer and speed, they are more complex than single part direct uploads. Administrators should consider taking advantage of the options to limit use of multipart uploads to specific users by using multiple stores and configuring access to stores with high file size limits to specific Dataverses (added in 4.20) or Datasets (added in this release).

New APIs for keeping Solr records in sync

This release adds new APIs to keep the Solr index and the DB in sync, allowing easier resolution of an issue that would occasionally cause search results to not load. Learn more in the Solr section of the Admin Guide.

Documentation for Purging the Ingest Queue

At times, it may be necessary to cancel long-running Ingest jobs in the interest of system stability. The Troubleshooting section of the Admin Guide now has specific steps.

Biomedical Metadata Block Updated

The Life Science Metadata block (biomedical.tsv) was updated. "Other Design Type", "Other Factor Type", "Other Technology Type", "Other Technology Platform" boxes were added. See the "Additional Upgrade Steps" below if you use this in your installation.

Notes for Tool Developers and Integrators

Spaces in File Names

Dataverse Installations using S3 storage will no longer replace spaces in file names of downloaded files with the + character. If your tool or integration has any special handling around this, you may need to make further adjustments to maintain backwards compatibility while also supporting Dataverse installations on 5.1+.

Complete List of Changes

For the complete list of code changes in this release, see the 5.1 Milestone in GitHub.

For help with upgrading, installing, or general questions please post to the Dataverse Google Group or email [email protected].

Installation

If this is a new installation, please see our Installation Guide.

Upgrade Instructions

  1. These instructions assume that you've already successfully upgraded from Dataverse 4.x to Dataverse 5 following the instructions in the Dataverse 5 Release Notes.

  2. Undeploy the previous version.

<payara install path>/bin/asadmin list-applications
<payara install path>/bin/asadmin undeploy dataverse<-version>

(where <payara install path> is where Payara 5 is installed, for example: /usr/local/payara5)

  3. Stop payara, remove the generated directory, and start it again.
  • service payara stop
  • remove the generated directory:
    rm -rf <payara install path>/payara/domains/domain1/generated
  • service payara start

  4. Deploy this version.
    <payara install path>/bin/asadmin deploy dataverse-5.1.war

  5. Restart payara.
  • service payara stop
  • service payara start

Additional Upgrade Steps

  1. Update Biomedical Metadata Block (if used), Reload Solr, ReExportAll

    wget https://github.com/IQSS/dataverse/releases/download/v5.1/biomedical.tsv
    curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @biomedical.tsv -H "Content-type: text/tab-separated-values"

  2. Check if your Solr installation is running with the latest schema.xml config file (https://github.com/IQSS/dataverse/releases/download/v5.1/schema.xml), update if needed.

  3. Run the script updateSchemaMDB.sh to generate updated solr schema files and preserve any other custom fields in your Solr configuration.
    For example: (modify the path names as needed)
    cd /usr/local/solr-7.7.2/server/solr/collection1/conf
    wget https://github.com/IQSS/dataverse/releases/download/v5.1/updateSchemaMDB.sh
    chmod +x updateSchemaMDB.sh
    ./updateSchemaMDB.sh -t .
    See http://guides.dataverse.org/en/5.1/admin/metadatacustomization.html?highlight=updateschemamdb for more information.

  4. Run ReExportall to update JSON Exports
    http://guides.dataverse.org/en/5.1/admin/metadataexport.html?highlight=export#batch-exports-through-the-api
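ReExportall can be triggered through the admin API, for example:

```shell
# Re-export metadata for all published datasets (runs asynchronously)
curl http://localhost:8080/api/admin/metadata/reExportAll
```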

Dataverse 5.0

18 Aug 21:34
993d0a3

Dataverse 5.0

This release brings new features, enhancements, and bug fixes to Dataverse. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.

Please note that this is a major release and these are long release notes. We offer no apologies. :)

Release Highlights

Continued Dataset and File Redesign: Dataset and File Button Redesign, Responsive Layout

The buttons available on the Dataset and File pages have been redesigned. This change is to provide more scalability for future expanded options for data access and exploration, and to provide a consistent experience between the two pages. The dataset and file pages have also been redesigned to be more responsive and function better across multiple devices.

This is an important step in the incremental process of the Dataset and File Redesign project, following the release of on-page previews, filtering and sorting options, tree view, and other enhancements. Additional features in support of these redesign efforts will follow in later 5.x releases.

Payara 5

A major upgrade of the application server provides security updates, access to new features like MicroProfile Config API, and will enable upgrades to other core technologies.

Note that moving from Glassfish to Payara will be required as part of the move to Dataverse 5.

Download Dataset

Users can now more easily download all files in a dataset through both the UI and API. If this causes server instability, it's suggested that Dataverse Installation Administrators take advantage of the new Standalone Zipper Service described below.

Download All Option on the Dataset Page

In previous versions of Dataverse, downloading all files from a dataset meant several clicks to select files and initiate the download. The Dataset Page now includes a Download All option for both the original and archival formats of the files in a dataset under the "Access Dataset" button.

Download All Files in a Dataset by API

In previous versions of Dataverse, downloading all files from a dataset via API was a two step process:

  • Find all the database ids of the files.
  • Download all the files, using those ids (comma-separated).

Now you can download all files from a dataset (assuming you have access to them) via API by passing the dataset persistent ID (PID such as DOI or Handle) or the dataset's database id. Versions are also supported, and you can pass :draft, :latest, :latest-published, or numbers (1.1, 2.0) similar to the "download metadata" API.
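For example (the PID below is a placeholder):

```shell
# Download all files (zipped) from the latest published version
curl -L -O -J "http://localhost:8080/api/access/dataset/:persistentId/?persistentId=doi:10.5072/FK2/EXAMPLE"

# The same, pinned to version 1.0 and requesting original file formats
curl -L -O -J "http://localhost:8080/api/access/dataset/:persistentId/versions/1.0?persistentId=doi:10.5072/FK2/EXAMPLE&format=original"
```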

A Multi-File, Zipped Download Optimization

In this release we are offering an experimental optimization for the multi-file, download-as-zip functionality. If this option is enabled, instead of enforcing size limits, Dataverse attempts to serve all the files that the user requested (and is authorized to download), but the request is redirected to a standalone zipper service running as a CGI executable. This moves these potentially long-running jobs completely outside the application server (Payara) and prevents service threads from becoming locked while serving them. Since zipping is also a CPU-intensive task, the service can be run on a different host system, freeing cycles on the main application server. The system running the service needs access to the database as well as to the storage filesystem and/or S3 bucket.

Please consult the scripts/zipdownload/README.md in the Dataverse 5 source tree.

The components of the standalone "zipper tool" can also be downloaded here:

https://github.com/IQSS/dataverse/releases/download/v5.0/zipper.zip

Updated File Handling

Files without extensions can now be uploaded through the UI. This release also changes the way Dataverse handles duplicate (filename or checksum) files in a dataset. Specifically:

  • Files with the same checksum can be included in a dataset, even if the files are in the same directory.
  • Files with the same filename can be included in a dataset as long as the files are in different directories.
  • If a user uploads a file to a directory where a file already exists with that directory/filename combination, Dataverse will adjust the file path and names by adding "-1" or "-2" as applicable. This change will be visible in the list of files being uploaded.
  • If the directory or name of an existing or newly uploaded file is edited in such a way that would create a directory/filename combination that already exists, Dataverse will display an error.
  • If a user attempts to replace a file with another file that has the same checksum, an error message will be displayed and the file will not be able to be replaced.
  • If a user attempts to replace a file with a file that has the same checksum as a different file in the dataset, a warning will be displayed.

Pre-Publish DOI Reservation with DataCite

Dataverse installations using DataCite will be able to reserve the persistent identifiers for datasets with DataCite ahead of publishing time. This allows the DOI to be reserved earlier in the data sharing process and makes the step of publishing datasets simpler and less error-prone.

Primefaces 8

Primefaces, the open source UI framework upon which the Dataverse front end is built, has been updated to the most recent version. This provides security updates and bug fixes and will also allow Dataverse developers to take advantage of new features and enhancements.

Major Use Cases

Newly-supported use cases in this release include:

  • Users will be presented with a new workflow around dataset and file access and exploration. (Issue #6684, PR #6909)
  • Users will experience a UI appropriate across a variety of device sizes. (Issue #6684, PR #6909)
  • Users will be able to download an entire dataset without needing to select all the files in that dataset. (Issue #6564, PR #6262)
  • Users will be able to download all files in a dataset with a single API call. (Issue #4529, PR #7086)
  • Users will have DOIs reserved for their datasets upon dataset create instead of at publish time. (Issue #5093, PR #6901)
  • Users will be able to upload files without extensions. (Issue #6634, PR #6804)
  • Users will be able to upload files with the same name in a dataset, as long as those files are in different file paths. (Issue #4813, PR #6924)
  • Users will be able to upload files with the same checksum in a dataset. (Issue #4813, PR #6924)
  • Users will be less likely to encounter locks during the publishing process due to PID providers being unavailable. (Issue #6918, PR #7118)
  • Users will now have their files validated during publish, and in the unlikely event that anything has happened to the files between deposit and publish, they will be able to take corrective action. (Issue #6558, PR #6790)
  • Administrators will likely see more success with Harvesting, as many minor harvesting issues have been resolved. (Issues #7127, #7128, #4597, #7056, #7052, #7023, #7009, and #7003)
  • Administrators can now enable an external zip service that frees up application server resources and allows the zip download limit to be increased. (Issue #6505, PR #6986)
  • Administrators can now create groups based on users' email domains. (Issue #6936, PR #6974)
  • Administrators can now set date facets to be organized chronologically. (Issue #4977, PR #6958)
  • Administrators can now link harvested datasets using an API. (Issue #5886, PR #6935)
  • Administrators can now destroy datasets with mapped shapefiles. (Issue #4093, PR #6860)

Notes for Dataverse Installation Administrators

Glassfish to Payara

This upgrade requires a few extra steps. See the detailed upgrade instructions below.

Dataverse Installations Using DataCite: Upgrade Action Required

If you are using DataCite as your DOI provider you must add a new JVM option called "doi.dataciterestapiurlstring" with a value of "https://api.datacite.org" for production environments and "https://api.test.datacite.org" for test environments. More information about this JVM option can be found in the Installation Guide.

"doi.mdcbaseurlstring" should be deleted if it was previously set.

Dataverse Installations Using DataCite: Upgrade Action Recommended

For installations that are using DataCite, Dataverse v5.0 introduces a change in the process of registering the Persistent Identifier (DOI) for a dataset. Instead of registering it when the dataset is published for the first time, Dataverse will try to "reserve" the DOI when it's created (by registering it as a "draft", using DataCite terminology). When the user publishes the dataset, the DOI will be publicized as well (by switching the registration status to "findable"). This approach makes the process of publishing datasets simpler and less error-prone.

New APIs have been provided for finding any unreserved DataCite-issued DOIs in your Dataverse, and for reserving them (see below). While not required - the user can still attempt to publish a dataset with an unreserved DOI - having all the identifiers reserved ahead of time is recommended. If you are upgrading an installation that uses DataCite, we specifically recommend that you reserve the DOIs for all your pre-existing unpublished drafts as soon as Dataverse v5.0 is deployed, since none of them were registered at create time. This can be done using the following API calls:

  • /api/pids/unreserved will report the ids of the datasets
  • /api/pids/:persistentId/reserve reserves the assigned DOI with DataCite (will need to be run on every id reported by the first API).

See the Native API Guide for more information.

Scripted, the whole process would look as follows (adj...
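A rough sketch of such a script (this is a reconstruction, not the original; it assumes jq is installed and guesses at the JSON shape of the /api/pids/unreserved response, so inspect the actual output first and adjust):

```shell
export API_TOKEN=xxxxxxxxxxxx   # a superuser's API token (placeholder)
SERVER=http://localhost:8080

# Reserve every DOI reported as unreserved; the jq path is a guess
for pid in $(curl -s -H "X-Dataverse-key: $API_TOKEN" \
    "$SERVER/api/pids/unreserved" | jq -r '.data.pids[]'); do
  curl -X POST -H "X-Dataverse-key: $API_TOKEN" \
    "$SERVER/api/pids/:persistentId/reserve?persistentId=$pid"
done
```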


4.20

01 Apr 20:00
4e07b62

Dataverse 4.20

This release brings new features, enhancements, and bug fixes to Dataverse. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.

Release Highlights

Multiple Store Support

Dataverse can now be configured to store files in more than one place at the same time (multiple file, s3, and/or swift stores).

General information about this capability can be found below and in the Configuration Guide - File Storage section.

S3 Direct Upload support

S3 stores can now optionally be configured to support direct upload of files, as one option for supporting upload of larger files. In the current implementation, each file is uploaded in a single HTTP call. For AWS, this limits file size to 5 GB. With MinIO the theoretical limit should be 5 TB, and 50+ GB file uploads have been tested successfully. (In practice, other factors such as network timeouts may prevent a successful upload of a multi-TB file, and MinIO instances may be configured with a single-HTTP-call limit below 5 TB.) No other S3 service providers have been tested yet; their limits should be the lower of the maximum object size allowed and any single HTTP call upload limit.

General information about this capability can be found in the Big Data Support Guide with specific information about how to enable it in the Configuration Guide - File Storage section.

Integration Test Coverage Reporting

The percentage of code covered by the API-based integration tests is now shown on a badge at the bottom of the README.md file that serves as the homepage of the Dataverse GitHub repository.

New APIs

New APIs for Role Management and Dataset Size have been added. Previously, managing roles at the dataset and file level was only possible through the UI. API users can now also retrieve the size of a dataset through an API call, with specific parameters depending on the type of information needed.

More information can be found in the API Guide.
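As a sketch (the dataset id, assignee, and role alias below are placeholders; consult the API Guide for the exact request shapes):

```shell
# Total size in bytes of the files in a dataset ($ID is the database id)
curl -H "X-Dataverse-key: $API_TOKEN" \
  http://localhost:8080/api/datasets/$ID/storagesize

# Assign a role on a dataset via the new role management API
curl -H "X-Dataverse-key: $API_TOKEN" -X POST \
  -H 'Content-type: application/json' \
  http://localhost:8080/api/datasets/$ID/assignments \
  -d '{"assignee": "@jsmith", "role": "curator"}'
```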

Major Use Cases

Newly-supported use cases in this release include:

  • Users will now be able to see the number of linked datasets and dataverses accurately reflected in the facet counts on the Dataverse search page. (Issue #6564, PR #6262)
  • Users will be able to upload large files directly to S3. (Issue #6489, PR #6490)
  • Users will be able to see the PIDs of datasets and files in the Guestbook export. (Issue #6534, PR #6628)
  • Administrators will be able to configure multiple stores per Dataverse installation, which allow dataverse-level setting of storage location, upload size limits, and supported data transfer methods (Issue #6485, PR #6488)
  • Administrators and integrators will be able to manage roles using a new API. (Issue #6290, PR #6622)
  • Administrators and integrators will be able to determine a dataset's size. (Issue #6524, PR #6609)
  • Integrators will now be able to retrieve the number of files in a dataset as part of a single API call instead of needing to count the number of files in the response. (Issue #6601, PR #6623)

Notes for Dataverse Installation Administrators

Potential Data Integrity Issue

We recently discovered a potential data integrity issue in Dataverse databases. One manifests itself as duplicate DataFile objects created for the same uploaded file (#6522); the other as duplicate DataTable (tabular metadata) objects linked to the same DataFile (#6510). This issue impacted approximately 0.03% of datasets in Harvard's Dataverse.

To see if any datasets in your installation have been impacted by this data integrity issue, we've provided a diagnostic script here:

https://github.com/IQSS/dataverse/raw/develop/scripts/issues/6510/check_datafiles_6522_6510.sh

The script relies on the PostgreSQL utility psql to access the database. You will need to edit the credentials at the top of the script to match your database configuration.

If neither of the two issues is present in your database, you will see a message "... no duplicate DataFile objects in your database" and "no tabular files affected by this issue in your database".

If either, or both kinds of duplicates are detected, the script will provide further instructions. We will need you to send us the produced output. We will then assist you in resolving the issues in your database.

Multiple Store Support Changes

Existing installations will need to make configuration changes to adopt this version, regardless of whether additional stores are to be added or not.

Multistore support requires that each store be assigned a label, id, and type - see the Configuration Guide for a more complete explanation. For an existing store, the recommended upgrade path is to assign the store id based on its type, i.e. a 'file' store would get id 'file', an 's3' store would have the id 's3'.

With this choice, no manual changes to datafile 'storageidentifier' entries are needed in the database. If you do not name your existing store using this convention, you will need to edit the database to maintain access to existing files.

The following set of commands to change the Glassfish JVM options will adapt an existing file or s3 store for this upgrade:
For a file store:

./asadmin create-jvm-options "-Ddataverse.files.file.type=file"
./asadmin create-jvm-options "-Ddataverse.files.file.label=file"
./asadmin create-jvm-options "-Ddataverse.files.file.directory=<your directory>"

For a s3 store:

./asadmin create-jvm-options "-Ddataverse.files.s3.type=s3"
./asadmin create-jvm-options "-Ddataverse.files.s3.label=s3"
./asadmin delete-jvm-options "-Ddataverse.files.s3-bucket-name=<your_bucket_name>"
./asadmin create-jvm-options "-Ddataverse.files.s3.bucket-name=<your_bucket_name>"

Any additional S3 options you have set will need to be replaced as well, following the pattern in the last two lines above: delete the option that includes a '-' after 's3', then create the same option with the '-' replaced by a '.', using the same value you currently have configured.

Once these options are set, restarting the Glassfish service is all that is needed to complete the change.

Note that the "-Ddataverse.files.directory", if defined, continues to control where temporary files are stored (in the /temp subdir of that directory), independent of the location of any 'file' store defined above.

Also note that the :MaxFileUploadSizeInBytes property has a new option to provide independent limits for each store instead of a single value for the whole installation. The default is to apply any existing limit defined by this property to all stores.
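For example, the setting accepts either a single number or a JSON object keyed by store id (the values below are placeholders):

```shell
# One limit (in bytes) applied to all stores
curl -X PUT -d 2147483648 \
  http://localhost:8080/api/admin/settings/:MaxFileUploadSizeInBytes

# Per-store limits with a fallback default
curl -X PUT -d '{"default":"2147483648","s3":"5368709120"}' \
  http://localhost:8080/api/admin/settings/:MaxFileUploadSizeInBytes
```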

Direct S3 Upload Changes

Direct upload to S3 is enabled per store by one new jvm option:

./asadmin create-jvm-options "-Ddataverse.files.<id>.upload-redirect=true"

The existing :MaxFileUploadSizeInBytes property and dataverse.files.<id>.url-expiration-minutes jvm option for the same store also apply to direct upload.

Direct upload via the Dataverse web interface is transparent to the user and handled automatically by the browser. Some minor differences in file upload exist: directly uploaded files are not unzipped and Dataverse does not scan their content to help in assigning a MIME type. Ingest of tabular files and metadata extraction from FITS files will occur, but can be turned off for files above a specified size limit through the new dataverse.files.<id>.ingestsizelimit jvm option.
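For example (the store id and byte limit below are placeholders):

```shell
# Skip ingest and metadata extraction for directly uploaded files
# larger than ~2 GB on the store with id <id> (e.g. "s3")
./asadmin create-jvm-options "-Ddataverse.files.<id>.ingestsizelimit=2147483648"
```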

API calls to support direct upload also exist, and, if direct upload is enabled for a store in Dataverse, the latest DVUploader (v1.0.8) provides a '-directupload' flag that enables its use.

Solr Update

With this release we upgrade to the latest available stable release in the Solr 7.x branch. We recommend a fresh installation of Solr 7.7.2 (the index will be empty) followed by an "index all".

Before you start the "index all", Dataverse will appear to be empty because the search results come from Solr. As indexing progresses, results will appear until indexing is complete.

Dataverse Linking Fix

The fix implemented for #6262 will display the datasets contained in linked dataverses in the linking dataverse. The full reindex described above will correct these counts. Going forward, this will happen automatically whenever a dataverse is linked.

Google Analytics Download Tracking Bug

The button tracking capability discussed in the installation guide (http://guides.dataverse.org/en/4.20/installation/config.html#id88) relies on an analytics-code.html file that must be configured using the :WebAnalyticsCode setting. The example file provided in the installation guide is no longer compatible with recent Dataverse releases (>v4.16). Installations using this feature should update their analytics-code.html file by following the installation instructions using the updated example file. Alternatively, sites can modify their existing files to include the one-line change made in the example file at line 120.

Run ReExportall

We made changes to the JSON Export in this release (Issue #6650, PR #6669). If you'd like these changes to be reflected in your JSON exports, you should run ReExportall as part of the upgrade process. We've included this in the step-by-step instructions below.
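ReExportall can be run through the admin API; a sketch, assuming the batch-export endpoint described in the metadata export guide:

```shell
# Re-export all published datasets in all export formats
curl http://localhost:8080/api/admin/metadata/reExportAll
```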

New JVM Options and Database Settings

New JVM Options for file storage drivers

  • The JVM option dataverse.files.file.directo...

4.19

22 Jan 16:34
affbf4f

Dataverse 4.19

This release brings new features, enhancements, and bug fixes to Dataverse. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.

Release Highlights

Open ID Connect Support

Dataverse now provides basic support for any OpenID Connect (OIDC) compliant authentication provider.

Prior to supporting this standard, new authentication methods needed to be added via pull request. OIDC support provides a standardized way to handle authentication, share user information, and more. You can use any compliant provider simply by loading a configuration file, without touching the codebase. While prominent providers like Google feature OIDC support, there are plenty of other options for attaching your installation to a custom, enterprise-grade authentication provider.

See the OpenID Connect Login Options documentation in the Installation Guide for more details.

Support for attribute mapping, group syncing, and more is planned for future versions of the code.

Python Installer

We are introducing a new installer script, written in Python. It is intended to eventually replace the old installer (written in Perl). For now it is being offered as an (experimental) alternative.

See README_python.txt in scripts/installer and/or in the installer bundle for more information.

Major Use Cases

Newly-supported use cases in this release include:

  • Dataverse installation administrators will be able to experiment with a Python Installer (Issue #3937, PR #6484)
  • Dataverse installation administrators will be able to set up OIDC-compliant login options by editing a configuration file, with no need for a code change (Issue #6432, PR #6433)
  • Following setup by a Dataverse administrator, users will be able to log in using OIDC-compliant methods (Issue #6432, PR #6433)
  • Users of the Search API will see additional fields in the JSON output (Issues #6300, #6396, PR #6441)
  • Users loading the support form will now be presented with the math challenge as expected and will be able to successfully send an email to support (Issue #6307, PR #6462)
  • Users of https://mybinder.org can now spin up Jupyter Notebooks and other computational environments from Dataverse DOIs (Issue #4714, PR #6453)

Notes for Dataverse Installation Administrators

Security vulnerability in Solr

A serious security issue has recently been identified in multiple versions of the Solr search engine, including v7.3, which Dataverse currently uses. Follow the instructions below to verify that your installation is safe from a potential attack. You can also consult the following link for a detailed description of the issue:

RCE in Solr via Velocity Template.

The vulnerability allows an intruder to execute arbitrary code on the system running Solr. Fortunately, it can only be exploited if the Solr API access point is open to direct access from public networks (aka "the outside world"), which is NOT needed in a Dataverse installation.

We have always recommended having Solr (port 8983) firewalled off from public access in our installation guides. But we recommend that you double-check your firewall settings and verify that the port is not accessible from outside networks. The simplest quick test is to try the following URL in your browser:

  `http://<your Solr server address>:8983`

and confirm that you get "access denied", a connection timeout, or the like.
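The same check can be scripted with curl from a machine outside your network (the hostname below is a placeholder):

```shell
# Should fail with "connection refused" or hit the 10-second timeout
curl -m 10 http://solr.example.edu:8983
```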

In most cases, when Solr runs on the same server as the Dataverse web application, you will only want the port accessible from localhost. We also recommend that you add the following argument to the Solr startup command: -j jetty.host=127.0.0.1. This will make Solr accept connections from localhost only, adding redundancy in case of a firewall failure.
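If you start Solr with the bundled bin/solr script, the argument can be passed as in the sketch below; adjust this to however your installation actually launches Solr (an init script or systemd unit may set it elsewhere):

```shell
# Bind Solr's Jetty to the loopback interface only
bin/solr start -j "jetty.host=127.0.0.1"
```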

In a case where Solr needs to run on a different host, make sure that the firewall limits access to the port only to the Dataverse web host(s), by specific ip address(es).

We would also like to reiterate that it is simply never a good idea to run Solr as root! Running the process as a non-privileged user would substantially minimize any potential damage even in the event that the instance is compromised.

Citation and Geospatial Metadata Block Updates

We updated two metadata blocks in this release. Updating these metadata blocks is mentioned in the step-by-step upgrade instructions below.

Run ReExportall

We made changes to the JSON Export in this release (#6426). If you'd like these changes to be reflected in your JSON exports, you should run ReExportall as part of the upgrade process. We've included this in the step-by-step instructions below.

BinderHub

https://mybinder.org now supports spinning up Jupyter Notebooks and other computational environments from Dataverse DOIs.

Widgets update for OpenScholar

We updated the code for widgets so that they will keep working on OpenScholar sites after the upcoming OpenScholar upgrade to Drupal 8. If users of your dataverse have embedded widgets on an OpenScholar site that upgrades to Drupal 8, you will need to run this Dataverse version (or later) for the widgets to keep working.

Payara tech preview

Dataverse 4 has always run on Glassfish 4.1 but changes in this release (PR #6523) should open the door to upgrading to Payara 5 eventually. Production installations of Dataverse should remain on Glassfish 4.1 but feedback from any experiments running Dataverse on Payara 5 is welcome via the usual channels.

Notes for Tool Developers and Integrators

Search API

The boolean parameter query_entities has been removed from the Search API. The former "true" behavior of "whether entities are queried via direct database calls (for developer use)" is now always true.

Additional fields are now available via the Search API, mostly related to information about specific dataset versions.

Complete List of Changes

For the complete list of code changes in this release, see the 4.19 milestone in Github.

For help with upgrading, installing, or general questions please post to the Dataverse Google Group or email [email protected].

Installation

If this is a new installation, please see our Installation Guide.

Upgrade

  1. Undeploy the previous version.
  • <glassfish install path>/glassfish4/bin/asadmin list-applications
  • <glassfish install path>/glassfish4/bin/asadmin undeploy dataverse
  2. Stop glassfish, remove the generated directory, then start glassfish.
  • service glassfish stop
  • remove the generated directory: rm -rf <glassfish install path>/glassfish4/glassfish/domains/domain1/generated
  • service glassfish start
  3. Deploy this version.
  • <glassfish install path>/glassfish4/bin/asadmin deploy <path>dataverse-4.19.war
  4. Restart glassfish.

  5. Update the Geospatial Metadata Block.

  • wget https://github.com/IQSS/dataverse/releases/download/v4.19/geospatial.tsv
  • curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @geospatial.tsv -H "Content-type: text/tab-separated-values"
  6. (Optional) Run ReExportall to update JSON Exports.

    http://guides.dataverse.org/en/4.19/admin/metadataexport.html?highlight=export#batch-exports-through-the-api

4.18.1

20 Nov 23:24
a91d370

Dataverse 4.18.1

This release provides a fix for a regression introduced in 4.18 and implements a few other small changes.

Release Highlights

Proper Validation Messages

When creating or editing dataset metadata, users were not receiving field-level indications about what entries failed validation and were only receiving a message at the top of the page. This fix restores field-level indications.

Major Use Cases

Use cases in this release include:

  • Users will receive the proper messaging when dataset metadata entries are not valid.
  • Users can now view the expiration date of an API token and revoke a token on the API Token tab of the account page.

Complete List of Changes

For the complete list of code changes in this release, see the 4.18.1 milestone in Github.

For help with upgrading, installing, or general questions please post to the Dataverse Google Group or email [email protected].

Installation

If this is a new installation, please see our Installation Guide.

Upgrade

  1. Undeploy the previous version.
  • <glassfish install path>/glassfish4/bin/asadmin list-applications
  • <glassfish install path>/glassfish4/bin/asadmin undeploy dataverse
  2. Stop glassfish, remove the generated directory, then start glassfish.
  • service glassfish stop
  • remove the generated directory: rm -rf <glassfish install path>/glassfish4/glassfish/domains/domain1/generated
  • service glassfish start
  3. Deploy this version.
  • <glassfish install path>/glassfish4/bin/asadmin deploy <path>dataverse-4.18.1.war
  4. Restart glassfish.

4.18

14 Nov 18:13
118aa71

Dataverse 4.18

Note: There is an issue in 4.18 with the display of validation messages on the dataset page (#6380) and we recommend using 4.18.1 for any production environments.

This release brings new features, enhancements, and bug fixes to Dataverse. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.

Release Highlights

File Page Previews and Previewers

File-level External Tools can now be configured to display in a "Preview Mode" designed for embedding within the file landing page.

While not technically part of this release, previewers have been made available for several common file types. The previewers support spreadsheet, image, text, document, audio, video, HTML files, and more. These previewers can be found in the Qualitative Data Repository GitHub repository. The spreadsheet viewer was contributed by the Dataverse SSHOC project.

Microsoft Login

Users can now create Dataverse accounts and login using self-provisioned Microsoft accounts such as live.com and outlook.com. Users can also use Microsoft accounts managed by their institutions. This new feature not only makes it easier to log in to Dataverse but will also streamline the interaction between any external tools that utilize Azure services that require login.

Add Data and Host Dataverse

More workflows to add data have been added across the UI, including a new button on the My Data tab of the Account page, as well as a link in the Dataverse navbar, which displays on every page. This provides users much easier access to start depositing data. By default, the Host Dataverse for these new Add Data workflows is the installation's root dataverse, but there is now a dropdown component allowing creators to select any dataverse in which they have the proper permissions to create a new dataverse or dataset.

Primefaces 7

Primefaces, the open source UI framework upon which the Dataverse front end is built, has been updated to the most recent version. This provides security updates and bug fixes and will also allow Dataverse developers to take advantage of new features and enhancements.

Integration Test Pipeline and Test Health Reporting

As part of the Dataverse Community's ongoing efforts to provide more robust automated testing infrastructure, and in support of the project's desire to keep the develop branch constantly in a "release ready" state, API-based integration tests are now run every time a branch is merged to develop. The status of the last test run is available as a badge at the bottom of the README.md file that serves as the homepage of the Dataverse GitHub repository.

Make Data Count Metrics Updates

A new configuration option has been added that allows Make Data Count metrics to be collected, but not reflected in the front end. This option was designed to allow installations to collect and verify metrics for a period before turning on the display to users. It is suggested that installations turn on Make Data Count as part of the upgrade.

Search API Enhancements

The Dataverse Search API will now display unpublished content when an API token is passed (and appropriate permissions exist).

Additional Dataset Author Identifiers

The following dataset author identifiers are now supported:

Major Use Cases

Newly-supported use cases in this release include:

  • Users can view previews of several common file types, eliminating the need to download or explore a file just to get a quick look.
  • Users can log in using self-provisioned Microsoft accounts and also can log in using Microsoft accounts managed by an organization.
  • Dataverse administrators can now revoke and regenerate API tokens with an API call.
  • Users will receive notifications when their ingests complete, and will be informed if the ingest was a success or failure.
  • Dataverse developers will receive feedback about the health of the develop branch after their pull request was merged.
  • Dataverse tool developers will be able to query the Dataverse API for unpublished data as well as published data.
  • Dataverse administrators will be able to collect Make Data Count metrics without turning on the display for users.
  • Users with a DAI, ResearcherID, or ScopusID can now use these author identifiers in their datasets.

Notes for Dataverse Installation Administrators

API Token Management

  • You can now delete a user's API token, recreate a user's API token, and find a token's expiration date. See the Native API guide for more information.
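As a sketch, the token-management endpoints described in the Native API guide can be exercised with curl; the paths below reflect that guide, but verify them against the documentation for your version:

```shell
export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx

# Show the token's expiration date
curl -H "X-Dataverse-key: $API_TOKEN" http://localhost:8080/api/users/token

# Delete the token
curl -X DELETE -H "X-Dataverse-key: $API_TOKEN" http://localhost:8080/api/users/token

# Recreate (regenerate) the token
curl -X POST -H "X-Dataverse-key: $API_TOKEN" http://localhost:8080/api/users/token/recreate
```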

New JVM Options

:mdcbaseurlstring allows dataverse administrators to use a test base URL for Make Data Count.

New Database Settings

:DisplayMDCMetrics can be set to false to disable display of MDC metrics.
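For example, to collect Make Data Count metrics without displaying them, following the same settings-API pattern used elsewhere in these notes:

```shell
# Hide MDC metrics from the UI while still collecting them
curl -X PUT -d false http://localhost:8080/api/admin/settings/:DisplayMDCMetrics
```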

Notes for Tool Developers and Integrators

Preview Mode

Tool Developers can now add the hasPreviewMode parameter to their file level external tools. This setting provides an embedded, simplified view of the tool on the file pages for any installation that installs the tool. See Building External Tools for more information.

API Token Management

If your tool writes content back to Dataverse, you can now take advantage of administrative endpoints that delete and re-create API tokens. You can also use an endpoint that provides the expiration date of a specific API token. See the Native API guide for more information.

View Unpublished Data Using Search API

If you pass a token, the search API output will include unpublished content.
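For example (the query is a placeholder; the output will include unpublished items that the token's user has permission to see):

```shell
curl "http://localhost:8080/api/search?q=finch&key=$API_TOKEN"
```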

Complete List of Changes

For the complete list of code changes in this release, see the 4.18 milestone in Github.

For help with upgrading, installing, or general questions please post to the Dataverse Google Group or email [email protected].

Installation

If this is a new installation, please see our Installation Guide.

Upgrade

  1. Undeploy the previous version.
  • <glassfish install path>/glassfish4/bin/asadmin list-applications
  • <glassfish install path>/glassfish4/bin/asadmin undeploy dataverse
  1. Stop glassfish and remove the generated directory, start.
  • service glassfish stop
  • remove the generated directory: rm -rf <glassfish install path>glassfish4/glassfish/domains/domain1/generated
  • service glassfish start
  1. Deploy this version.
  • <glassfish install path>/glassfish4/bin/asadmin deploy <path>dataverse-4.18.war
  1. Restart glassfish.

  2. Update Citation Metadata Block

  • wget https://github.com/IQSS/dataverse/releases/download/v4.18/citation.tsv
  • curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @citation.tsv -H "Content-type: text/tab-separated-values"
  1. (Recommended) Enable Make Data Count if your installation plans to make use of it at some point in the future.

4.17

03 Oct 20:41
51739f6

Dataverse 4.17

This release brings new features, enhancements, and bug fixes to Dataverse. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.

Release Highlights

Dataset Level Explore Tools

Tools that integrate with Dataverse can now be launched from the dataset page! This makes it possible to develop and add tools that work across the entire dataset instead of single files. Tools to verify reproducibility and allow researchers to compute on an entire dataset will take advantage of this new infrastructure.

Performance Enhancements

Dataverse now allows installation administrators to configure the session timeout for logged in users using the new :LoginSessionTimeout setting. (Session length for anonymous users has been reduced from 24 hours to 10 minutes.) Setting this lower will release system resources as configured and will result in better performance (less memory use) throughout a Dataverse installation.

Dataverse and Dataset pages have also been optimized to discard more of the objects they allocate immediately after the page load, keeping less memory permanently tied up for the duration of the user's login session. These savings are especially significant on the Dataverse page.

Major Use Cases

Newly-supported use cases in this release include:

  • As a user, I can launch and utilize external tools that allow me to work across the code, data, and other files in a dataset.
  • As a user, I can add a footer to my dataverse to show the logo for a funder or other entity.
  • As a developer, I can build external tools to verify reproducibility or allow computation.
  • As a developer, I can check to see the impact of my proposed changes on memory utilization.
  • As an installation administrator, I can make a quick configuration change to provide a better experience for my installation's users.

Notes for Dataverse Installation Administrators

Configurable User Session Timeout

Idle session timeout for logged-in users has been made configurable in this release.
The default is now set to 8 hours (this is a change from the previous default value of 24 hours).
If you want to change it, set the setting :LoginSessionTimeout to the new value in minutes.
For example, to reduce the timeout to 4 hours:

curl -X PUT -d 240 http://localhost:8080/api/admin/settings/:LoginSessionTimeout

Once again, this is the session timeout for logged-in users only. Anonymous sessions time out after the default session-timeout value (also in minutes) in the web.xml of the Dataverse application, which is set to 10 minutes. You will most likely never need to change this, but if you do, configure it by editing the web.xml file.

Flexible Solr Schema, optionally reconfigure Solr

With this release, we moved all fields in Solr search index that relate to the default metadata schemas from schema.xml to separate files. Custom metadata block configuration of the search index can be more easily automated that way. For details, see admin/metadatacustomization.html#updating-the-solr-schema.

This is optional, but all future changes will go to these files. It might be a good idea to reconfigure Solr now, or at least to watch for changes to these files in future releases. Here's how:

  1. You will need to replace or modify your schema.xml with the recent one (containing XML includes)
  2. Copy schema_dv_mdb_fields.xml and schema_dv_mdb_copies.xml to the same location as the schema.xml
  3. A re-index is not necessary as long as no other changes have happened, as this is only a reorganization of Solr fields from a single schema.xml file into multiple files.

In case you use custom metadata blocks, you might find the new updateSchemaMDB.sh script beneficial. Again,
see http://guides.dataverse.org/en/4.17/admin/metadatacustomization.html#updating-the-solr-schema
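As a sketch, deploying the new files might look like the following; the Solr paths are illustrative, so substitute your installation's actual conf directory and however you restart Solr:

```shell
# Copy the updated schema plus the two new include files into Solr's conf dir
cp schema.xml schema_dv_mdb_fields.xml schema_dv_mdb_copies.xml \
   /usr/local/solr/server/solr/collection1/conf/
service solr restart
```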

Memory Benchmark Test

Developers and installation administrators can take advantage of new scripts to produce graphs of memory usage and garbage collection events. This is helpful for developers to investigate the implications of changes on memory usage and it is helpful for installation administrators to compare graphs across releases or time periods. For details see the scripts/tests/ec2-memory-benchmark directory.

New Database Settings

:LoginSessionTimeout controls the session timeout (in minutes) for logged-in users.

Notes for Tool Developers and Integrators

New Features and Breaking Changes for External Tool Developers

The good news is that external tools can now be defined at the dataset level and there is new and improved documentation for external tool developers, linked below.

Additionally, the reserved words {datasetPid}, {filePid}, and {localeCode} were added. Please consider making it possible to translate your tool into various languages! The reserved word {datasetVersion} has been made more flexible.

The bad news is that there are two breaking changes. First, tools must now define a "scope" of either "file" or "dataset" for the manifest to be successfully loaded into Dataverse. Existing tools in a Dataverse installation will be assigned a scope of "file" automatically by a SQL migration script, but new installations of Dataverse will need to load an updated manifest file with this new "scope" variable.

Second, file-level tools that did not previously define a "contentType" are now required to do so. In previous releases, file-level tools that did not define a contentType were automatically given a contentType of "text/tab-separated-values", but now Dataverse will refuse to load the manifest file if contentType is not specified.
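Putting the two breaking changes together, a minimal file-level manifest sketch might look like the one below. The tool name and URL are placeholders, and the registration endpoint shown is the admin API's externalTools endpoint; check the Building External Tools documentation for the full set of fields:

```shell
# Hypothetical manifest including the now-required "scope" and "contentType" fields
cat > fabulousFileTool.json <<'EOF'
{
  "displayName": "Fabulous File Tool",
  "description": "A hypothetical file-level explore tool.",
  "type": "explore",
  "scope": "file",
  "contentType": "text/tab-separated-values",
  "toolUrl": "https://example.com/fabulousFileTool",
  "toolParameters": {
    "queryParameters": [
      { "fileid": "{fileId}" },
      { "key": "{apiToken}" }
    ]
  }
}
EOF

curl -X POST -H 'Content-type: application/json' \
  http://localhost:8080/api/admin/externalTools --upload-file fabulousFileTool.json
```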

The Dataverse team has been reaching out to tool makers about these breaking changes and getting various tools working in the https://github.com/IQSS/dataverse-ansible repo. Thank you for your patience as the dust settles around the external tool framework.

For more information, check out the new Building External Tools section of the API Guide.

Complete List of Changes

For the complete list of code changes in this release, see the 4.17 milestone in Github.

For help with upgrading, installing, or general questions please post to the Dataverse Google Group or email [email protected].

Installation

If this is a new installation, please see our Installation Guide.

Upgrade

  1. Undeploy the previous version.
  • <glassfish install path>/glassfish4/bin/asadmin list-applications
  • <glassfish install path>/glassfish4/bin/asadmin undeploy dataverse
  2. Stop glassfish, remove the generated directory, then start glassfish.
  • service glassfish stop
  • remove the generated directory: rm -rf <glassfish install path>/glassfish4/glassfish/domains/domain1/generated
  • service glassfish start
  3. Deploy this version.
  • <glassfish install path>/glassfish4/bin/asadmin deploy <path>dataverse-4.17.war
  4. Restart glassfish.

  5. Update the Citation Metadata Block.

  • wget https://github.com/IQSS/dataverse/releases/download/v4.17/citation.tsv
  • curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @citation.tsv -H "Content-type: text/tab-separated-values"

If you have any trouble adding an external tool at the dataset level and see warnings about "contenttype" in server.log, it is recommended that you run the following SQL update from pull request #6460:

 ALTER TABLE externaltool ALTER contenttype DROP NOT NULL;

4.16

28 Aug 17:50
a56d550

Dataverse 4.16

This release brings new features, enhancements, and bug fixes to Dataverse. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.

Release Highlights

Metrics Redesign

The metrics view at both the Dataset and File level has been redesigned. The main driver of this redesign has been the expanded metrics (citations and views) provided through an integration with Make Data Count, but installations that do not adopt Make Data Count will also be able to take advantage of the new metrics panel.

HTML Codebook Export

Users will now be able to download HTML Codebooks as an additional Dataset Export type. This codebook is a more human-readable version of the DDI Codebook 2.5 metadata export; it provides valuable information about the contents and structure of a dataset and will increase the reusability of datasets in Dataverse.

Harvesting Improvements

The Harvesting code will now better handle problematic records during incremental harvests. Fixing this will mean not only fewer manual interventions by installation administrators to keep harvesting running, but it will also mean users can more easily find and access data that is important to their research.

Major Use Cases

Newly-supported use cases in this release include:

  • As a user, I can view the works that have cited a dataset.
  • As a user, I can view the downloads and views for a dataset, based on the Make Data Count standard.
  • As a user, I can export an HTML codebook for a dataset.
  • As a user, I can expect harvested datasets to be made available more regularly.
  • As a user, I'll encounter fewer locks as I go through the publishing process.
  • As an installation administrator, I no longer need to destroy a PID in another system after destroying a dataset in Dataverse.

Notes for Dataverse Installation Administrators

Run ReExportall

We made changes to the citation block in this release that will require installations to run ReExportall as part of the upgrade process. We've included this in the detailed instructions below.

Custom Analytics Code Changes

You should update your custom analytics code to include CDATA sections inside the script tags, around the JavaScript code. We have updated the documentation and the sample analytics code snippet provided in Installation Guide > Configuration > Web Analytics Code to fix a bug that broke the rendering of the 403 and 500 custom error pages (#5967).

Destroy Updates

Destroying Datasets in Dataverse will now unregister/delete the PID with the PID provider. This eliminates the need for an extra step to "clean up" a PID registration after destroying a Dataset.

Deleting Notifications

In making the fix for #5687 we discovered that notifications created prior to 2018 may have been invalidated. With this release we advise that these older notifications be deleted from the database. The following query can be used for this purpose:

delete from usernotification where date_part('year', senddate) < 2018;

Lock Improvements

In 4.15 a new lock was added to prevent parallel edits. After seeing that the lock was not being released as expected, which required administrator intervention, we've adjusted this code to release the lock as expected.

New Database Settings

:AllowCors - Allows Cross-Origin Resource Sharing (CORS). By default this setting is absent and Dataverse assumes it to be true.
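For example, to disable CORS explicitly, following the same settings-API pattern used elsewhere in these notes:

```shell
curl -X PUT -d false http://localhost:8080/api/admin/settings/:AllowCors
```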

Notes for Tool Developers and Integrators

OpenAIRE Export Changes

The OpenAIRE metadata export now correctly expresses information about a dataset's Production Place and GeoSpatial Bounding Box. When users add metadata to Dataverse's Production Place and GeoSpatial Bounding Box fields, those fields are now mapped to separate DataCite geoLocation properties.

Metadata about the software name and version used to create a dataset, Software Name and Software Version, are re-mapped from DataCite's more general descriptionType="Methods" property to descriptionType="TechnicalInfo", which was added in a recent version of the DataCite schema in order to improve discoverability of metadata about the software used to create datasets.

Complete List of Changes

For the complete list of code changes in this release, see the 4.16 milestone in Github.

For help with upgrading, installing, or general questions please post to the Dataverse Google Group or email [email protected].

Installation

If this is a new installation, please see our Installation Guide.

Upgrade

  1. Undeploy the previous version.
  • <glassfish install path>/glassfish4/bin/asadmin list-applications
  • <glassfish install path>/glassfish4/bin/asadmin undeploy dataverse
  2. Stop glassfish, remove the generated directory, then start glassfish.
  • service glassfish stop
  • remove the generated directory: rm -rf <glassfish install path>/glassfish4/glassfish/domains/domain1/generated
  • service glassfish start
  3. Deploy this version.
  • <glassfish install path>/glassfish4/bin/asadmin deploy <path>dataverse-4.16.war
  4. Restart glassfish.

  5. Update the Citation Metadata Block.

    curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @citation.tsv -H "Content-type: text/tab-separated-values"

  6. Run ReExportall to update the citations.

    http://guides.dataverse.org/en/4.16/admin/metadataexport.html?highlight=export#batch-exports-through-the-api