diff --git a/doc/release-notes/6485-multiple-stores.md b/doc/release-notes/6485-multiple-stores.md index ea2d224d612..e9c7e654d96 100644 --- a/doc/release-notes/6485-multiple-stores.md +++ b/doc/release-notes/6485-multiple-stores.md @@ -29,8 +29,4 @@ Any additional S3 options you have set will need to be replaced as well, followi Once these options are set, restarting the glassfish service is all that is needed to complete the change. -<<<<<<< HEAD Note that the "\-Ddataverse.files.directory", if defined, continues to control where temporary files are stored (in the /temp subdir of that directory), independent of the location of any 'file' store defined above. -======= -Note that the "\-Ddataverse.files.directory", if defined, continues to control where temporary files are stored (in the /temp subdir of that directory), independent of the location of any 'file' store defined above. ->>>>>>> branch 'IQSS/6485' of https://github.com/TexasDigitalLibrary/dataverse.git diff --git a/doc/release-notes/6489-release-notes.md b/doc/release-notes/6489-release-notes.md new file mode 100644 index 00000000000..b28174c9a74 --- /dev/null +++ b/doc/release-notes/6489-release-notes.md @@ -0,0 +1,17 @@ +# S3 Direct Upload support + +S3 stores can now optionally be configured to support direct upload of files, as one option for supporting upload of larger files. + +General information about this capability can be found in the Big Data Support Guide, with specific information about how to enable it in the Configuration Guide - File Storage section. + +**Upgrade Information:** + +Direct upload to S3 is enabled per store by one new jvm option: + + ./asadmin create-jvm-options "\-Ddataverse.files.<id>.upload-redirect=true" + +The existing :MaxFileUploadSizeInBytes setting and ```dataverse.files.<id>.url-expiration-minutes``` jvm option for the same store also apply to direct upload. + +Direct upload via the Dataverse web interface is transparent to the user and handled automatically by the browser. Some minor differences in file upload exist: directly uploaded files are not unzipped and Dataverse does not scan their content to help in assigning a MIME type. Ingest of tabular files and metadata extraction from FITS files will occur, but can be turned off for files above a specified size limit through the new dataverse.files.<id>.ingestsizelimit jvm option. + +API calls to support direct upload also exist, and, if direct upload is enabled for a store in Dataverse, the latest DVUploader (v1.0.8) provides a '-directupload' flag that enables its use. diff --git a/doc/sphinx-guides/source/developers/big-data-support.rst b/doc/sphinx-guides/source/developers/big-data-support.rst index bb16dd9133d..c1c2969a60a 100644 --- a/doc/sphinx-guides/source/developers/big-data-support.rst +++ b/doc/sphinx-guides/source/developers/big-data-support.rst @@ -6,7 +6,52 @@ Big data support is highly experimental. Eventually this content will move to th .. contents:: |toctitle| :local: -Various components need to be installed and configured for big data support. +Various components need to be installed and/or configured for big data support. + +S3 Direct Upload and Download +----------------------------- + +A lightweight option for supporting file sizes beyond a few gigabytes - a size that can cause performance issues when uploaded through the Dataverse server itself - is to configure an S3 store to provide direct upload and download via 'pre-signed URLs'.
When these options are configured, file uploads and downloads are made directly to and from a configured S3 store using secure (https) connections that enforce Dataverse's access controls. (The upload and download URLs are signed with a unique key that only allows access for a short time period, and Dataverse will only generate such a URL if the user has permission to upload/download the specific file in question.) + +This option can handle files >40GB and could be appropriate for files up to a TB. Other options can scale farther, but this option has the advantages that it is simple to configure and does not require any user training - uploads and downloads are done via the same interface as normal uploads to Dataverse. + +To configure these options, an administrator must set two JVM options for the Dataverse server using the same process as for other configuration options: + +``./asadmin create-jvm-options "-Ddataverse.files.<id>.download-redirect=true"`` +``./asadmin create-jvm-options "-Ddataverse.files.<id>.upload-redirect=true"`` + + +With multiple stores configured, it is possible to configure one S3 store with direct upload and/or download to support large files (in general or for specific dataverses) while configuring only direct download, or no direct access, for another store. + +It is also possible to set file upload size limits per store. See the :MaxFileUploadSizeInBytes setting described in the :doc:`/installation/config` guide. + +At present, one potential drawback for direct-upload is that files are only partially 'ingested': tabular and FITS files are processed, but zip files are not unzipped, and the file contents are not inspected to evaluate their mimetype. This could be appropriate for large files, or it may be useful to completely turn off ingest processing for performance reasons (ingest processing requires a copy of the file to be retrieved by Dataverse from the S3 store). A store using direct upload can be configured to disable all ingest processing for files above a given size limit: + +``./asadmin create-jvm-options "-Ddataverse.files.<id>.ingestsizelimit=<size in bytes>"`` + + +**IMPORTANT:** One additional step that is required to enable direct download to work with previewers is to allow cross-site (CORS) requests on your S3 store. +The example below shows how to enable the minimum needed CORS rules on a bucket using the AWS CLI command line tool. Note that you may need to add more methods and/or locations if you also need to support certain previewers and external tools. + +``aws s3api put-bucket-cors --bucket <bucket-name> --cors-configuration file://cors.json`` + +with the contents of the file cors.json as follows: + +.. code-block:: json + + { + "CORSRules": [ + { + "AllowedOrigins": ["https://<dataverse-server-url>"], + "AllowedHeaders": ["*"], + "AllowedMethods": ["PUT", "GET"] + } + ] + } + +Alternatively, you can enable CORS using the AWS S3 web interface, using json-encoded rules as in the example above. + +Since the direct upload mechanism creates the final file rather than an intermediate temporary file, user actions, such as closing the browser page without either saving or canceling an upload session, can leave an abandoned file in the store. The direct upload mechanism attempts to use S3 Tags to aid in identifying/removing such files. Upon upload, files are given a "dv-state":"temp" tag which is removed when the dataset changes are saved and the new file(s) are added in Dataverse. Note that not all S3 implementations support Tags: Minio does not. With such stores, direct upload works, but Tags are not used.
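For illustration, here is a minimal Java sketch of what the direct upload step looks like from a client's point of view. The pre-signed URL and local file path are supplied as arguments and are hypothetical; the ``x-amz-tagging`` header mirrors the ``dv-state=temp`` tag that ``S3AccessIO.generateTemporaryS3UploadUrl()`` in this change signs into the URL, so the PUT should include it.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

public class DirectS3UploadSketch {
    public static void main(String[] args) throws Exception {
        String presignedUrl = args[0];      // pre-signed PUT URL obtained from Dataverse
        Path file = Path.of(args[1]);       // local file to upload

        HttpRequest put = HttpRequest.newBuilder(URI.create(presignedUrl))
                // The URL is signed to require this tag header; it marks the object as temporary
                // until the dataset changes are saved and the tag is removed.
                .header("x-amz-tagging", "dv-state=temp")
                .PUT(HttpRequest.BodyPublishers.ofFile(file))
                .build();

        HttpResponse<Void> response = HttpClient.newHttpClient()
                .send(put, HttpResponse.BodyHandlers.discarding());
        System.out.println("S3 responded with HTTP " + response.statusCode());
    }
}
```

Any HTTP or S3-aware client can make the same request; the point is that the file bytes go straight to the S3 store rather than through the Dataverse server.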
Data Capture Module (DCM) ------------------------- diff --git a/doc/sphinx-guides/source/installation/config.rst b/doc/sphinx-guides/source/installation/config.rst index bfa66b97eb1..bb2f5702899 100644 --- a/doc/sphinx-guides/source/installation/config.rst +++ b/doc/sphinx-guides/source/installation/config.rst @@ -244,6 +244,8 @@ If you wish to change which store is used by default, you'll need to delete the ./asadmin $ASADMIN_OPTS delete-jvm-options "-Ddataverse.files.storage-driver-id=file" ./asadmin $ASADMIN_OPTS create-jvm-options "-Ddataverse.files.storage-driver-id=<id>" + + It is also possible to set maximum file upload size limits per store. See the :ref:`:MaxFileUploadSizeInBytes` setting below. File Storage ++++++++++++ @@ -505,7 +507,9 @@ JVM Option Value Description dataverse.files.storage-driver-id Enable <id> as the default storage driver. ``file`` dataverse.files.<id>.bucket-name The bucket name. See above. (none) dataverse.files.<id>.download-redirect ``true``/``false`` Enable direct download or proxy through Dataverse. ``false`` -dataverse.files.<id>.url-expiration-minutes If direct downloads: time until links expire. Optional. 60 +dataverse.files.<id>.upload-redirect ``true``/``false`` Enable direct upload of files added to a dataset to the S3 store. ``false`` +dataverse.files.<id>.ingestsizelimit Maximum size of direct upload files that should be ingested. (none) +dataverse.files.<id>.url-expiration-minutes If direct uploads/downloads: time until links expire. Optional. 60 dataverse.files.<id>.custom-endpoint-url Use custom S3 endpoint. Needs URL either with or without protocol. (none) dataverse.files.<id>.custom-endpoint-region Only used when using custom endpoint. Optional. ``dataverse`` dataverse.files.<id>.path-style-access ``true``/``false`` Use path style buckets instead of subdomains. Optional. ``false`` @@ -1474,7 +1478,14 @@ Alongside the ``:StatusMessageHeader`` you need to add StatusMessageText for the :MaxFileUploadSizeInBytes +++++++++++++++++++++++++ -Set `MaxFileUploadSizeInBytes` to "2147483648", for example, to limit the size of files uploaded to 2 GB. +This setting controls the maximum size of uploaded files. +- To have one limit for all stores, set `MaxFileUploadSizeInBytes` to "2147483648", for example, to limit the size of files uploaded to 2 GB: + +``curl -X PUT -d 2147483648 http://localhost:8080/api/admin/settings/:MaxFileUploadSizeInBytes`` + +- To have limits per store with an optional default, use a serialized JSON object for the value of `MaxFileUploadSizeInBytes` with an entry per store, as in the following example, which maintains a 2 GB default and adds higher limits for stores with ids "fileOne" and "s3" (see the lookup sketch below). + +``curl -X PUT -d '{"default":"2147483648","fileOne":"4000000000","s3":"8000000000"}' http://localhost:8080/api/admin/settings/:MaxFileUploadSizeInBytes`` Notes: @@ -1484,7 +1495,7 @@ Notes: - For larger file upload sizes, you may need to configure your reverse proxy timeout. If using apache2 (httpd) with Shibboleth, add a timeout to the ProxyPass defined in etc/httpd/conf.d/ssl.conf (which is described in the :doc:`/installation/shibboleth` setup).
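To make the per-store behavior of `:MaxFileUploadSizeInBytes` concrete, here is a small self-contained Java sketch of the lookup the setting implies (assumed behavior, not the actual `SystemConfig.getMaxFileUploadSizeForStore()` implementation): a plain number acts as one global limit, a JSON object is consulted by store id with "default" as the fallback, and no applicable entry means unlimited.

```java
import java.io.StringReader;
import javax.json.Json;
import javax.json.JsonObject;

public class MaxUploadSizeSketch {

    /** Resolve the upload limit (in bytes) for a store, or null for unlimited. */
    static Long maxUploadSizeForStore(String setting, String storeId) {
        if (setting == null) {
            return null;                                   // no setting: unlimited
        }
        try {
            return Long.parseLong(setting);                // single global limit, e.g. "2147483648"
        } catch (NumberFormatException nfe) {
            JsonObject limits = Json.createReader(new StringReader(setting)).readObject();
            String value = limits.containsKey(storeId)
                    ? limits.getString(storeId)            // store-specific limit
                    : limits.getString("default", null);   // optional default entry
            return value == null ? null : Long.valueOf(value);
        }
    }

    public static void main(String[] args) {
        String setting = "{\"default\":\"2147483648\",\"fileOne\":\"4000000000\",\"s3\":\"8000000000\"}";
        System.out.println(maxUploadSizeForStore(setting, "s3"));       // 8000000000
        System.out.println(maxUploadSizeForStore(setting, "fileTwo"));  // 2147483648 (default)
    }
}
```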
-``curl -X PUT -d 2147483648 http://localhost:8080/api/admin/settings/:MaxFileUploadSizeInBytes`` + :ZipDownloadLimit +++++++++++++++++ diff --git a/src/main/java/edu/harvard/iq/dataverse/DataFileServiceBean.java b/src/main/java/edu/harvard/iq/dataverse/DataFileServiceBean.java index 54a88c27d91..b510d9686dd 100644 --- a/src/main/java/edu/harvard/iq/dataverse/DataFileServiceBean.java +++ b/src/main/java/edu/harvard/iq/dataverse/DataFileServiceBean.java @@ -1565,7 +1565,6 @@ public void finalizeFileDelete(Long dataFileId, String storageLocation) throws I throw new IOException("Attempted to permanently delete a physical file still associated with an existing DvObject " + "(id: " + dataFileId + ", location: " + storageLocation); } - logger.info("deleting: " + storageLocation); StorageIO directStorageAccess = DataAccess.getDirectStorageIO(storageLocation); directStorageAccess.delete(); } diff --git a/src/main/java/edu/harvard/iq/dataverse/DatasetPage.java b/src/main/java/edu/harvard/iq/dataverse/DatasetPage.java index 690443d4def..0ece7e9c4c2 100644 --- a/src/main/java/edu/harvard/iq/dataverse/DatasetPage.java +++ b/src/main/java/edu/harvard/iq/dataverse/DatasetPage.java @@ -1774,7 +1774,6 @@ public void updateOwnerDataverse() { private String init(boolean initFull) { //System.out.println("_YE_OLDE_QUERY_COUNTER_"); // for debug purposes - this.maxFileUploadSizeInBytes = systemConfig.getMaxFileUploadSize(); setDataverseSiteUrl(systemConfig.getDataverseSiteUrl()); guestbookResponse = new GuestbookResponse(); @@ -1821,7 +1820,9 @@ private String init(boolean initFull) { // Set Working Version and Dataset by DatasaetVersion Id //retrieveDatasetVersionResponse = datasetVersionService.retrieveDatasetVersionByVersionId(versionId); - } + } + this.maxFileUploadSizeInBytes = systemConfig.getMaxFileUploadSizeForStore(dataset.getOwner().getEffectiveStorageDriverId()); + if (retrieveDatasetVersionResponse == null) { return permissionsWrapper.notFound(); @@ -2981,16 +2982,6 @@ public void setLinkingDataverseErrorMessage(String linkingDataverseErrorMessage) this.linkingDataverseErrorMessage = linkingDataverseErrorMessage; } - UIInput selectedLinkingDataverseMenu; - - public UIInput getSelectedDataverseMenu() { - return selectedLinkingDataverseMenu; - } - - public void setSelectedDataverseMenu(UIInput selectedDataverseMenu) { - this.selectedLinkingDataverseMenu = selectedDataverseMenu; - } - private Boolean saveLink(Dataverse dataverse){ boolean retVal = true; if (readOnly) { diff --git a/src/main/java/edu/harvard/iq/dataverse/Dataverse.java b/src/main/java/edu/harvard/iq/dataverse/Dataverse.java index 4b53937a87f..75dbb39e2ca 100644 --- a/src/main/java/edu/harvard/iq/dataverse/Dataverse.java +++ b/src/main/java/edu/harvard/iq/dataverse/Dataverse.java @@ -32,6 +32,8 @@ import javax.validation.constraints.NotNull; import javax.validation.constraints.Pattern; import javax.validation.constraints.Size; + +import org.apache.commons.lang.StringUtils; import org.hibernate.validator.constraints.NotBlank; import org.hibernate.validator.constraints.NotEmpty; @@ -149,7 +151,7 @@ public String getIndexableCategoryName() { private String affiliation; - private String storageDriver=""; + private String storageDriver=null; // Note: We can't have "Remove" here, as there are role assignments that refer // to this role. So, adding it would mean violating a forign key contstraint. 
@@ -762,7 +764,7 @@ public boolean isAncestorOf( DvObject other ) { public String getEffectiveStorageDriverId() { String id = storageDriver; - if(id == null) { + if(StringUtils.isBlank(id)) { if(this.getOwner() != null) { id = this.getOwner().getEffectiveStorageDriverId(); } else { @@ -774,10 +776,17 @@ public String getEffectiveStorageDriverId() { public String getStorageDriverId() { + if(storageDriver==null) { + return DataAccess.UNDEFINED_STORAGE_DRIVER_IDENTIFIER; + } return storageDriver; } public void setStorageDriverId(String storageDriver) { - this.storageDriver = storageDriver; + if(storageDriver!=null&&storageDriver.equals(DataAccess.UNDEFINED_STORAGE_DRIVER_IDENTIFIER)) { + this.storageDriver=null; + } else { + this.storageDriver = storageDriver; + } } } diff --git a/src/main/java/edu/harvard/iq/dataverse/DataversePage.java b/src/main/java/edu/harvard/iq/dataverse/DataversePage.java index 12f398c9c7d..165c1759b5e 100644 --- a/src/main/java/edu/harvard/iq/dataverse/DataversePage.java +++ b/src/main/java/edu/harvard/iq/dataverse/DataversePage.java @@ -1214,19 +1214,24 @@ public Set> getStorageDriverOptions() { HashMap drivers =new HashMap(); drivers.putAll(DataAccess.getStorageDriverLabels()); //Add an entry for the default (inherited from an ancestor or the system default) - drivers.put(getDefaultStorageDriverLabel(), ""); + drivers.put(getDefaultStorageDriverLabel(), DataAccess.UNDEFINED_STORAGE_DRIVER_IDENTIFIER); return drivers.entrySet(); } public String getDefaultStorageDriverLabel() { String storageDriverId = DataAccess.DEFAULT_STORAGE_DRIVER_IDENTIFIER; Dataverse parent = dataverse.getOwner(); + boolean fromAncestor=false; if(parent != null) { storageDriverId = parent.getEffectiveStorageDriverId(); - } - boolean fromAncestor=false; - if(!storageDriverId.equals(DataAccess.DEFAULT_STORAGE_DRIVER_IDENTIFIER)) { - fromAncestor = true; + //recurse dataverse chain to root and if any have a storagedriver set, fromAncestor is true + while(parent!=null) { + if(!parent.getStorageDriverId().equals(DataAccess.UNDEFINED_STORAGE_DRIVER_IDENTIFIER)) { + fromAncestor=true; + break; + } + parent=parent.getOwner(); + } } String label = DataAccess.getStorageDriverLabelFor(storageDriverId); if(fromAncestor) { diff --git a/src/main/java/edu/harvard/iq/dataverse/EditDatafilesPage.java b/src/main/java/edu/harvard/iq/dataverse/EditDatafilesPage.java index 8425af60335..71b74ddc7ae 100644 --- a/src/main/java/edu/harvard/iq/dataverse/EditDatafilesPage.java +++ b/src/main/java/edu/harvard/iq/dataverse/EditDatafilesPage.java @@ -7,13 +7,19 @@ import edu.harvard.iq.dataverse.authorization.users.AuthenticatedUser; import edu.harvard.iq.dataverse.branding.BrandingUtil; import edu.harvard.iq.dataverse.datasetutility.AddReplaceFileHelper; +import edu.harvard.iq.dataverse.datasetutility.FileSizeChecker; import edu.harvard.iq.dataverse.datasetutility.FileReplaceException; import edu.harvard.iq.dataverse.datasetutility.FileReplacePageHelper; +import edu.harvard.iq.dataverse.dataaccess.DataAccess; +import edu.harvard.iq.dataverse.dataaccess.DataAccessOption; import edu.harvard.iq.dataverse.dataaccess.ImageThumbConverter; +import edu.harvard.iq.dataverse.dataaccess.S3AccessIO; +import edu.harvard.iq.dataverse.dataaccess.StorageIO; import edu.harvard.iq.dataverse.datacapturemodule.DataCaptureModuleUtil; import edu.harvard.iq.dataverse.datacapturemodule.ScriptRequestResponse; import edu.harvard.iq.dataverse.dataset.DatasetThumbnail; import edu.harvard.iq.dataverse.engine.command.Command; +import 
edu.harvard.iq.dataverse.engine.command.CommandContext; import edu.harvard.iq.dataverse.engine.command.exception.CommandException; import edu.harvard.iq.dataverse.engine.command.exception.IllegalCommandException; import edu.harvard.iq.dataverse.engine.command.impl.DeleteDataFileCommand; @@ -32,6 +38,8 @@ import edu.harvard.iq.dataverse.util.BundleUtil; import edu.harvard.iq.dataverse.util.EjbUtil; import static edu.harvard.iq.dataverse.util.JsfHelper.JH; +import static edu.harvard.iq.dataverse.util.StringUtil.isEmpty; + import java.io.File; import java.io.FileOutputStream; import java.io.IOException; @@ -306,6 +314,10 @@ public Long getMaxFileUploadSizeInBytes() { return this.maxFileUploadSizeInBytes; } + public String getHumanMaxFileUploadSizeInBytes() { + return FileSizeChecker.bytesToHumanReadable(this.maxFileUploadSizeInBytes); + } + public boolean isUnlimitedUploadFileSize() { return this.maxFileUploadSizeInBytes == null; @@ -350,6 +362,10 @@ public boolean doesSessionUserHaveDataSetPermission(Permission permissionToCheck return hasPermission; } + public boolean directUploadEnabled() { + return Boolean.getBoolean("dataverse.files." + this.dataset.getDataverseContext().getEffectiveStorageDriverId() + ".upload-redirect"); + } + public void reset() { // ? } @@ -439,10 +455,7 @@ public String initCreateMode(String modeToken, DatasetVersion version, List(); selectedFiles = selectedFileMetadatasList; + this.maxFileUploadSizeInBytes = systemConfig.getMaxFileUploadSizeForStore(dataset.getOwner().getEffectiveStorageDriverId()); + this.multipleUploadFilesLimit = systemConfig.getMultipleUploadFilesLimit(); + logger.fine("done"); saveEnabled = true; @@ -462,9 +478,6 @@ public String init() { newFiles = new ArrayList<>(); uploadedFiles = new ArrayList<>(); - this.maxFileUploadSizeInBytes = systemConfig.getMaxFileUploadSize(); - this.multipleUploadFilesLimit = systemConfig.getMultipleUploadFilesLimit(); - if (dataset.getId() != null){ // Set Working Version and Dataset by Datasaet Id and Version //retrieveDatasetVersionResponse = datasetVersionService.retrieveDatasetVersionById(dataset.getId(), null); @@ -479,7 +492,10 @@ public String init() { // that the dataset id is mandatory... But 404 will do for now. return permissionsWrapper.notFound(); } - + + this.maxFileUploadSizeInBytes = systemConfig.getMaxFileUploadSizeForStore(dataset.getOwner().getEffectiveStorageDriverId()); + this.multipleUploadFilesLimit = systemConfig.getMultipleUploadFilesLimit(); + workingVersion = dataset.getEditVersion(); clone = workingVersion.cloneDatasetVersion(); if (workingVersion == null || !workingVersion.isDraft()) { @@ -954,42 +970,64 @@ public void deleteFiles() { } private void deleteTempFile(DataFile dataFile) { - // Before we remove the file from the list and forget about - // it: - // The physical uploaded file is still sitting in the temporary - // directory. If it were saved, it would be moved into its - // permanent location. But since the user chose not to save it, - // we have to delete the temp file too. - // - // Eventually, we will likely add a dedicated mechanism - // for managing temp files, similar to (or part of) the storage - // access framework, that would allow us to handle specialized - // configurations - highly sensitive/private data, that - // has to be kept encrypted even in temp files, and such. 
- // But for now, we just delete the file directly on the - // local filesystem: - - try { - List generatedTempFiles = ingestService.listGeneratedTempFiles( - Paths.get(FileUtil.getFilesTempDirectory()), dataFile.getStorageIdentifier()); - if (generatedTempFiles != null) { - for (Path generated : generatedTempFiles) { - logger.fine("(Deleting generated thumbnail file " + generated.toString() + ")"); - try { - Files.delete(generated); - } catch (IOException ioex) { - logger.warning("Failed to delete generated file " + generated.toString()); - } - } - } - Files.delete(Paths.get(FileUtil.getFilesTempDirectory() + "/" + dataFile.getStorageIdentifier())); - } catch (IOException ioEx) { - // safe to ignore - it's just a temp file. - logger.warning("Failed to delete temporary file " + FileUtil.getFilesTempDirectory() + "/" - + dataFile.getStorageIdentifier()); - } - } - + // Before we remove the file from the list and forget about + // it: + // The physical uploaded file is still sitting in the temporary + // directory. If it were saved, it would be moved into its + // permanent location. But since the user chose not to save it, + // we have to delete the temp file too. + // + // Eventually, we will likely add a dedicated mechanism + // for managing temp files, similar to (or part of) the storage + // access framework, that would allow us to handle specialized + // configurations - highly sensitive/private data, that + // has to be kept encrypted even in temp files, and such. + // But for now, we just delete the file directly on the + // local filesystem: + + try { + List generatedTempFiles = ingestService.listGeneratedTempFiles( + Paths.get(FileUtil.getFilesTempDirectory()), dataFile.getStorageIdentifier()); + if (generatedTempFiles != null) { + for (Path generated : generatedTempFiles) { + logger.fine("(Deleting generated thumbnail file " + generated.toString() + ")"); + try { + Files.delete(generated); + } catch (IOException ioex) { + logger.warning("Failed to delete generated file " + generated.toString()); + } + } + } + String si = dataFile.getStorageIdentifier(); + if (si.contains("://")) { + //Direct upload files will already have a store id in their storageidentifier + //but they need to be associated with a dataset for the overall storagelocation to be calculated + //so we temporarily set the owner + if(dataFile.getOwner()!=null) { + logger.warning("Datafile owner was not null as expected"); + } + dataFile.setOwner(dataset); + //Use one StorageIO to get the storageLocation and then create a direct storage storageIO class to perform the delete + // (since delete is forbidden except for direct storage) + String sl = DataAccess.getStorageIO(dataFile).getStorageLocation(); + DataAccess.getDirectStorageIO(sl).delete(); + dataFile.setOwner(null); + } else { + //Temp files sent to this method have no prefix, not even "tmp://" + Files.delete(Paths.get(FileUtil.getFilesTempDirectory() + "/" + dataFile.getStorageIdentifier())); + } + } catch (IOException ioEx) { + // safe to ignore - it's just a temp file. 
+ logger.warning(ioEx.getMessage()); + if(dataFile.getStorageIdentifier().contains("://")) { + logger.warning("Failed to delete temporary file " + dataFile.getStorageIdentifier()); + } else { + logger.warning("Failed to delete temporary file " + FileUtil.getFilesTempDirectory() + "/" + + dataFile.getStorageIdentifier()); + } + } + } + private void removeFileMetadataFromList(List fmds, FileMetadata fmToDelete) { Iterator fmit = fmds.iterator(); while (fmit.hasNext()) { @@ -1559,7 +1597,7 @@ public void handleDropBoxUpload(ActionEvent event) { // for example, multiple files can be extracted from an uncompressed // zip file. //datafiles = ingestService.createDataFiles(workingVersion, dropBoxStream, fileName, "application/octet-stream"); - datafiles = FileUtil.createDataFiles(workingVersion, dropBoxStream, fileName, "application/octet-stream", systemConfig); + datafiles = FileUtil.createDataFiles(workingVersion, dropBoxStream, fileName, "application/octet-stream", null,null, systemConfig); } catch (IOException ex) { this.logger.log(Level.SEVERE, "Error during ingest of DropBox file {0} from link {1}", new Object[]{fileName, fileLink}); @@ -1717,6 +1755,31 @@ public String getRsyncScriptFilename() { return rsyncScriptFilename; } + public void requestDirectUploadUrl() { + + //Need to assign an identifier at this point if direct upload is used. + if ( isEmpty(dataset.getIdentifier()) ) { + CommandContext ctxt = commandEngine.getContext(); + GlobalIdServiceBean idServiceBean = GlobalIdServiceBean.getBean(ctxt); + dataset.setIdentifier(ctxt.datasets().generateDatasetIdentifier(dataset, idServiceBean)); + } + + S3AccessIO s3io = FileUtil.getS3AccessForDirectUpload(dataset); + if(s3io == null) { + FacesContext.getCurrentInstance().addMessage(uploadComponentId, new FacesMessage(FacesMessage.SEVERITY_ERROR, BundleUtil.getStringFromBundle("dataset.file.uploadWarning"), "Direct upload not supported for this dataset")); + } + String url = null; + String storageIdentifier = null; + try { + url = s3io.generateTemporaryS3UploadUrl(); + storageIdentifier = FileUtil.getStorageIdentifierFromLocation(s3io.getStorageLocation()); + } catch (IOException io) { + logger.warning(io.getMessage()); + FacesContext.getCurrentInstance().addMessage(uploadComponentId, new FacesMessage(FacesMessage.SEVERITY_ERROR, BundleUtil.getStringFromBundle("dataset.file.uploadWarning"), "Issue in connecting to S3 store for direct upload")); + } + + PrimeFaces.current().executeScript("uploadFileDirectly('" + url + "','" + storageIdentifier + "')"); + } public void uploadFinished() { // This method is triggered from the page, by the paramMap = FacesContext.getCurrentInstance().getExternalContext().getRequestParameterMap(); + + this.uploadComponentId = paramMap.get("uploadComponentId"); + String fullStorageIdentifier = paramMap.get("fullStorageIdentifier"); + String fileName = paramMap.get("fileName"); + String contentType = paramMap.get("contentType"); + String checksumType = paramMap.get("checksumType"); + String checksumValue = paramMap.get("checksumValue"); + + int lastColon = fullStorageIdentifier.lastIndexOf(':'); + String storageLocation= fullStorageIdentifier.substring(0,lastColon) + "/" + dataset.getAuthorityForFileStorage() + "/" + dataset.getIdentifierForFileStorage() + "/" + fullStorageIdentifier.substring(lastColon+1); + if (!uploadInProgress) { + uploadInProgress = true; + } + logger.fine("handleExternalUpload"); + + StorageIO sio; + String localWarningMessage = null; + try { + sio = 
DataAccess.getDirectStorageIO(storageLocation); + + //Populate metadata + sio.open(DataAccessOption.READ_ACCESS); + //get file size + long fileSize = sio.getSize(); + + /* ---------------------------- + Check file size + - Max size NOT specified in db: default is unlimited + - Max size specified in db: check too make sure file is within limits + // ---------------------------- */ + if ((!this.isUnlimitedUploadFileSize()) && (fileSize > this.getMaxFileUploadSizeInBytes())) { + String warningMessage = "Uploaded file \"" + fileName + "\" exceeded the limit of " + fileSize + " bytes and was not uploaded."; + sio.delete(); + localWarningMessage = warningMessage; + } else { + // ----------------------------------------------------------- + // Is this a FileReplaceOperation? If so, then diverge! + // ----------------------------------------------------------- + if (this.isFileReplaceOperation()){ + this.handleReplaceFileUpload(storageLocation, fileName, FileUtil.MIME_TYPE_UNDETERMINED_DEFAULT); + this.setFileMetadataSelectedForTagsPopup(fileReplacePageHelper.getNewFileMetadatasBeforeSave().get(0)); + return; + } + // ----------------------------------------------------------- + List datafiles = new ArrayList<>(); + + // ----------------------------------------------------------- + // Send it through the ingest service + // ----------------------------------------------------------- + try { + + // Note: A single uploaded file may produce multiple datafiles - + // for example, multiple files can be extracted from an uncompressed + // zip file. + //datafiles = ingestService.createDataFiles(workingVersion, dropBoxStream, fileName, "application/octet-stream"); + if(StringUtils.isEmpty(contentType)) { + contentType = "application/octet-stream"; + } + if(DataFile.ChecksumType.fromString(checksumType) != DataFile.ChecksumType.MD5 ) { + String warningMessage = "Non-MD5 checksums not yet supported in external uploads"; + localWarningMessage = warningMessage; + } + datafiles = FileUtil.createDataFiles(workingVersion, null, fileName, contentType, fullStorageIdentifier, checksumValue, systemConfig); + } catch (IOException ex) { + logger.log(Level.SEVERE, "Error during ingest of file {0}", new Object[]{fileName}); + } + + if (datafiles == null){ + logger.log(Level.SEVERE, "Failed to create DataFile for file {0}", new Object[]{fileName}); + }else{ + // ----------------------------------------------------------- + // Check if there are duplicate files or ingest warnings + // ----------------------------------------------------------- + uploadWarningMessage = processUploadedFileList(datafiles); + } + if(!uploadInProgress) { + logger.warning("Upload in progress cancelled"); + for (DataFile newFile : datafiles) { + deleteTempFile(newFile); + } + } + } + } catch (IOException e) { + logger.log(Level.SEVERE, "Failed to create DataFile for file {0}: {1}", new Object[]{fileName, e.getMessage()}); + } + if (localWarningMessage != null) { + if (uploadWarningMessage == null) { + uploadWarningMessage = localWarningMessage; + } else { + uploadWarningMessage = localWarningMessage.concat("; " + uploadWarningMessage); + } + } + } + /** * After uploading via the site or Dropbox, * check the list of DataFile objects @@ -1967,7 +2151,6 @@ public void handleFileUpload(FileUploadEvent event) throws IOException { private boolean uploadInProgress = false; private String processUploadedFileList(List dFileList) { - if (dFileList == null) { return null; } diff --git a/src/main/java/edu/harvard/iq/dataverse/api/Admin.java 
b/src/main/java/edu/harvard/iq/dataverse/api/Admin.java index c5f358e8f71..1474104d379 100644 --- a/src/main/java/edu/harvard/iq/dataverse/api/Admin.java +++ b/src/main/java/edu/harvard/iq/dataverse/api/Admin.java @@ -1665,7 +1665,7 @@ public Response getStorageDriver(@PathParam("alias") String alias) throws Wrappe } catch (WrappedResponse wr) { return wr.getResponse(); } - //Note that this returns what's set directly on this dataverse. If null, the user would have to recurse the chain of parents to find the effective storageDriver + //Note that this returns what's set directly on this dataverse. If null/DataAccess.UNDEFINED_STORAGE_DRIVER_IDENTIFIER, the user would have to recurse the chain of parents to find the effective storageDriver return ok(dataverse.getStorageDriverId()); } diff --git a/src/main/java/edu/harvard/iq/dataverse/api/Datasets.java b/src/main/java/edu/harvard/iq/dataverse/api/Datasets.java index 38d3979577d..0b2e25a7f02 100644 --- a/src/main/java/edu/harvard/iq/dataverse/api/Datasets.java +++ b/src/main/java/edu/harvard/iq/dataverse/api/Datasets.java @@ -1477,6 +1477,43 @@ public Response returnToAuthor(@PathParam("id") String idSupplied, String jsonBo } } +@GET +@Path("{id}/uploadsid") +public Response getUploadUrl(@PathParam("id") String idSupplied) { + try { + Dataset dataset = findDatasetOrDie(idSupplied); + + boolean canUpdateDataset = false; + try { + canUpdateDataset = permissionSvc.requestOn(createDataverseRequest(findUserOrDie()), dataset).canIssue(UpdateDatasetVersionCommand.class); + } catch (WrappedResponse ex) { + logger.info("Exception thrown while trying to figure out permissions while getting upload URL for dataset id " + dataset.getId() + ": " + ex.getLocalizedMessage()); + } + if (!canUpdateDataset) { + return error(Response.Status.FORBIDDEN, "You are not permitted to upload files to this dataset."); + } + S3AccessIO s3io = FileUtil.getS3AccessForDirectUpload(dataset); + if(s3io == null) { + return error(Response.Status.NOT_FOUND,"Direct upload not supported for files in this dataset: " + dataset.getId()); + } + String url = null; + String storageIdentifier = null; + try { + url = s3io.generateTemporaryS3UploadUrl(); + storageIdentifier = FileUtil.getStorageIdentifierFromLocation(s3io.getStorageLocation()); + } catch (IOException io) { + logger.warning(io.getMessage()); + throw new WrappedResponse(io, error( Response.Status.INTERNAL_SERVER_ERROR, "Could not create process direct upload request")); + } + + JsonObjectBuilder response = Json.createObjectBuilder() + .add("url", url) + .add("storageIdentifier", storageIdentifier ); + return ok(response); + } catch (WrappedResponse wr) { + return wr.getResponse(); + } +} /** * Add a File to an existing Dataset * @@ -1539,17 +1576,6 @@ public Response addFileToDataset(@PathParam("id") String idSupplied, } } - - // ------------------------------------- - // (3) Get the file name and content type - // ------------------------------------- - if(null == contentDispositionHeader) { - return error(BAD_REQUEST, "You must upload a file."); - } - String newFilename = contentDispositionHeader.getFileName(); - String newFileContentType = formDataBodyPart.getMediaType().toString(); - - // (2a) Load up optional params via JSON //--------------------------------------- OptionalFileParams optionalFileParams = null; @@ -1560,6 +1586,31 @@ public Response addFileToDataset(@PathParam("id") String idSupplied, } catch (DataFileTagException ex) { return error( Response.Status.BAD_REQUEST, ex.getMessage()); } + + // 
------------------------------------- + // (3) Get the file name and content type + // ------------------------------------- + String newFilename = null; + String newFileContentType = null; + String newStorageIdentifier = null; + if (null == contentDispositionHeader) { + if (optionalFileParams.hasStorageIdentifier()) { + newStorageIdentifier = optionalFileParams.getStorageIdentifier(); + // ToDo - check that storageIdentifier is valid + if (optionalFileParams.hasFileName()) { + newFilename = optionalFileParams.getFileName(); + if (optionalFileParams.hasMimetype()) { + newFileContentType = optionalFileParams.getMimeType(); + } + } + } else { + return error(BAD_REQUEST, + "You must upload a file or provide a storageidentifier, filename, and mimetype."); + } + } else { + newFilename = contentDispositionHeader.getFileName(); + newFileContentType = formDataBodyPart.getMediaType().toString(); + } //------------------- @@ -1583,6 +1634,7 @@ public Response addFileToDataset(@PathParam("id") String idSupplied, addFileHelper.runAddFileByDataset(dataset, newFilename, newFileContentType, + newStorageIdentifier, fileInputStream, optionalFileParams); diff --git a/src/main/java/edu/harvard/iq/dataverse/api/datadeposit/MediaResourceManagerImpl.java b/src/main/java/edu/harvard/iq/dataverse/api/datadeposit/MediaResourceManagerImpl.java index f5cf35276d0..6dfe605774f 100644 --- a/src/main/java/edu/harvard/iq/dataverse/api/datadeposit/MediaResourceManagerImpl.java +++ b/src/main/java/edu/harvard/iq/dataverse/api/datadeposit/MediaResourceManagerImpl.java @@ -300,7 +300,7 @@ DepositReceipt replaceOrAddFiles(String uri, Deposit deposit, AuthCredentials au List dataFiles = new ArrayList<>(); try { try { - dataFiles = FileUtil.createDataFiles(editVersion, deposit.getInputStream(), uploadedZipFilename, guessContentTypeForMe, systemConfig); + dataFiles = FileUtil.createDataFiles(editVersion, deposit.getInputStream(), uploadedZipFilename, guessContentTypeForMe, null, null, systemConfig); } catch (EJBException ex) { Throwable cause = ex.getCause(); if (cause != null) { diff --git a/src/main/java/edu/harvard/iq/dataverse/api/datadeposit/SwordConfigurationImpl.java b/src/main/java/edu/harvard/iq/dataverse/api/datadeposit/SwordConfigurationImpl.java index 4eb6e77fe21..ce5f9415fcc 100644 --- a/src/main/java/edu/harvard/iq/dataverse/api/datadeposit/SwordConfigurationImpl.java +++ b/src/main/java/edu/harvard/iq/dataverse/api/datadeposit/SwordConfigurationImpl.java @@ -124,8 +124,11 @@ public String getTempDirectory() { public int getMaxUploadSize() { int unlimited = -1; - - Long maxUploadInBytes = systemConfig.getMaxFileUploadSize(); + /* It doesn't look like we can determine which store will be used here, so we'll go with the default + * (It looks like the collection or study involved is available where this method is called, but the SwordConfiguration.getMaxUploadSize() + * doesn't allow a parameter) + */ + Long maxUploadInBytes = systemConfig.getMaxFileUploadSizeForStore("default"); if (maxUploadInBytes == null){ // (a) No setting, return unlimited diff --git a/src/main/java/edu/harvard/iq/dataverse/dataaccess/DataAccess.java b/src/main/java/edu/harvard/iq/dataverse/dataaccess/DataAccess.java index c2b7cbffd62..db87b1751c6 100644 --- a/src/main/java/edu/harvard/iq/dataverse/dataaccess/DataAccess.java +++ b/src/main/java/edu/harvard/iq/dataverse/dataaccess/DataAccess.java @@ -21,12 +21,11 @@ package edu.harvard.iq.dataverse.dataaccess; import edu.harvard.iq.dataverse.DvObject; -import 
edu.harvard.iq.dataverse.util.StringUtil; - import java.io.IOException; import java.util.HashMap; import java.util.Properties; import java.util.logging.Logger; + import org.apache.commons.lang.StringUtils; /** * @@ -42,9 +41,10 @@ public DataAccess() { }; - //Default is only for tests + //Default to "file" is for tests only public static final String DEFAULT_STORAGE_DRIVER_IDENTIFIER = System.getProperty("dataverse.files.storage-driver-id", "file"); - + public static final String UNDEFINED_STORAGE_DRIVER_IDENTIFIER = "undefined"; //Used in dataverse.xhtml as a non-null selection option value (indicating a null driver/inheriting the default) + // The getStorageIO() methods initialize StorageIO objects for // datafiles that are already saved using one of the supported Dataverse // DataAccess IO drivers. @@ -165,7 +165,7 @@ public static StorageIO createNewStorageIO(T dvObject, S dvObject.setStorageIdentifier(storageTag); - if (StringUtils.isEmpty(storageDriverId)) { + if (StringUtils.isBlank(storageDriverId)) { storageDriverId = DEFAULT_STORAGE_DRIVER_IDENTIFIER; } String storageType = getDriverType(storageDriverId); @@ -196,7 +196,7 @@ public static String getStorageDriverId(String driverLabel) { if (drivers==null) { populateDrivers(); } - if(StringUtil.nonEmpty(driverLabel) && drivers.containsKey(driverLabel)) { + if(!StringUtils.isBlank(driverLabel) && drivers.containsKey(driverLabel)) { return drivers.get(driverLabel); } return DEFAULT_STORAGE_DRIVER_IDENTIFIER; @@ -219,7 +219,6 @@ private static void populateDrivers() { logger.info("Found Storage Driver: " + driverId + " for " + p.get(property).toString()); drivers.put(p.get(property).toString(), driverId); } - } } diff --git a/src/main/java/edu/harvard/iq/dataverse/dataaccess/FileAccessIO.java b/src/main/java/edu/harvard/iq/dataverse/dataaccess/FileAccessIO.java index d7a405d63c7..c338ec8ff93 100644 --- a/src/main/java/edu/harvard/iq/dataverse/dataaccess/FileAccessIO.java +++ b/src/main/java/edu/harvard/iq/dataverse/dataaccess/FileAccessIO.java @@ -633,4 +633,4 @@ private String stripDriverId(String storageIdentifier) { } return storageIdentifier; } -} \ No newline at end of file +} diff --git a/src/main/java/edu/harvard/iq/dataverse/dataaccess/S3AccessIO.java b/src/main/java/edu/harvard/iq/dataverse/dataaccess/S3AccessIO.java index 8194ab80c58..973fa8ee42a 100644 --- a/src/main/java/edu/harvard/iq/dataverse/dataaccess/S3AccessIO.java +++ b/src/main/java/edu/harvard/iq/dataverse/dataaccess/S3AccessIO.java @@ -7,10 +7,12 @@ import com.amazonaws.client.builder.AwsClientBuilder; import com.amazonaws.services.s3.AmazonS3; import com.amazonaws.services.s3.AmazonS3ClientBuilder; +import com.amazonaws.services.s3.Headers; import com.amazonaws.services.s3.model.ObjectMetadata; import com.amazonaws.services.s3.model.PutObjectRequest; import com.amazonaws.services.s3.model.CopyObjectRequest; import com.amazonaws.services.s3.model.DeleteObjectRequest; +import com.amazonaws.services.s3.model.DeleteObjectTaggingRequest; import com.amazonaws.services.s3.model.DeleteObjectsRequest; import com.amazonaws.services.s3.model.DeleteObjectsRequest.KeyVersion; import com.amazonaws.services.s3.model.GeneratePresignedUrlRequest; @@ -125,8 +127,7 @@ public S3AccessIO(T dvObject, DataAccessRequest req, @NotNull AmazonS3 s3client, private boolean s3chunkedEncoding = true; private String s3profile = "default"; private String bucketName = null; - - private String key; + private String key = null; @Override public void open(DataAccessOption... 
options) throws IOException { @@ -212,7 +213,36 @@ public void open(DataAccessOption... options) throws IOException { } else if (dvObject instanceof Dataverse) { throw new IOException("Data Access: Storage driver does not support dvObject type Dataverse yet"); } else { + // Direct access, e.g. for external upload - no associated DVobject yet, but we want to be able to get the size + // With small files, it looks like we may call before S3 says it exists, so try some retries before failing + if(key!=null) { + ObjectMetadata objectMetadata = null; + int retries = 20; + while(retries > 0) { + try { + objectMetadata = s3.getObjectMetadata(bucketName, key); + if(retries != 20) { + logger.warning("Success for key: " + key + " after " + ((20-retries)*3) + " seconds"); + } + retries = 0; + } catch (SdkClientException sce) { + if(retries > 1) { + retries--; + try { + Thread.sleep(3000); + } catch (InterruptedException e) { + e.printStackTrace(); + } + logger.warning("Retrying after: " + sce.getMessage()); + } else { + throw new IOException("Cannot get S3 object " + key + " ("+sce.getMessage()+")"); + } + } + } + this.setSize(objectMetadata.getContentLength()); + }else { throw new IOException("Data Access: Invalid DvObject type"); + } } } @@ -678,6 +708,9 @@ public boolean exists() { String destinationKey = null; if (dvObject instanceof DataFile) { destinationKey = key; + } else if((dvObject==null) && (key !=null)) { + //direct access + destinationKey = key; } else { logger.warning("Trying to check if a path exists is only supported for a data file."); } @@ -781,7 +814,7 @@ public String generateTemporaryS3Url() throws IOException { key = getMainFileKey(); java.util.Date expiration = new java.util.Date(); long msec = expiration.getTime(); - msec += 1000 * getUrlExpirationMinutes(); + msec += 60 * 1000 * getUrlExpirationMinutes(); expiration.setTime(msec); GeneratePresignedUrlRequest generatePresignedUrlRequest = @@ -828,6 +861,40 @@ public String generateTemporaryS3Url() throws IOException { } } + public String generateTemporaryS3UploadUrl() throws IOException { + + key = getMainFileKey(); + java.util.Date expiration = new java.util.Date(); + long msec = expiration.getTime(); + msec += 60 * 1000 * getUrlExpirationMinutes(); + expiration.setTime(msec); + + GeneratePresignedUrlRequest generatePresignedUrlRequest = + new GeneratePresignedUrlRequest(bucketName, key).withMethod(HttpMethod.PUT).withExpiration(expiration); + //Require user to add this header to indicate a temporary file + generatePresignedUrlRequest.putCustomRequestHeader(Headers.S3_TAGGING, "dv-state=temp"); + + URL presignedUrl; + try { + presignedUrl = s3.generatePresignedUrl(generatePresignedUrlRequest); + } catch (SdkClientException sce) { + logger.warning("SdkClientException generating temporary S3 url for "+key+" ("+sce.getMessage()+")"); + presignedUrl = null; + } + String urlString = null; + if (presignedUrl != null) { + String endpoint = System.getProperty("dataverse.files." + driverId + ".custom-endpoint-url"); + String proxy = System.getProperty("dataverse.files." + driverId + ".proxy-url"); + if(proxy!=null) { + urlString = presignedUrl.toString().replace(endpoint, proxy); + } else { + urlString = presignedUrl.toString(); + } + } + + return urlString; + } + int getUrlExpirationMinutes() { String optionValue = System.getProperty("dataverse.files." + this.driverId + ".url-expiration-minutes"); if (optionValue != null) { @@ -877,8 +944,34 @@ private void readSettings() { s3profile = System.getProperty("dataverse.files." 
+ this.driverId + ".profile","default"); bucketName = System.getProperty("dataverse.files." + this.driverId + ".bucket-name"); - - + } + + + public void removeTempTag() throws IOException { + if (!(dvObject instanceof DataFile)) { + logger.warning("Attempt to remove tag from non-file DVObject id: " + dvObject.getId()); + throw new IOException("Attempt to remove temp tag from non-file S3 Object"); + } + try { + + key = getMainFileKey(); + DeleteObjectTaggingRequest deleteObjectTaggingRequest = new DeleteObjectTaggingRequest(bucketName, key); + //NOte - currently we only use one tag so delete is the fastest and cheapest way to get rid of that one tag + //Otherwise you have to get tags, remove the one you don't want and post new tags and get charged for the operations + s3.deleteObjectTagging(deleteObjectTaggingRequest); + } catch (SdkClientException sce) { + if(sce.getMessage().contains("Status Code: 501")) { + // In this case, it's likely that tags are not implemented at all (e.g. by Minio) so no tag was set either and it's just something to be aware of + logger.warning("Temp tag not deleted: Object tags not supported by storage: " + driverId); + } else { + // In this case, the assumption is that adding tags has worked, so not removing it is a problem that should be looked into. + logger.severe("Unable to remove temp tag from : " + bucketName + " : " + key); + } + } catch (IOException e) { + logger.warning("Could not create key for S3 object." ); + e.printStackTrace(); + } + } } diff --git a/src/main/java/edu/harvard/iq/dataverse/dataaccess/StorageIO.java b/src/main/java/edu/harvard/iq/dataverse/dataaccess/StorageIO.java index 9e0cf7e11b8..2f66eec5f4c 100644 --- a/src/main/java/edu/harvard/iq/dataverse/dataaccess/StorageIO.java +++ b/src/main/java/edu/harvard/iq/dataverse/dataaccess/StorageIO.java @@ -533,4 +533,13 @@ protected boolean isWriteAccessRequested(DataAccessOption... options) throws IOE // By default, we open the file in read mode: return false; } + + public boolean isBelowIngestSizeLimit() { + long limit = Long.parseLong(System.getProperty("dataverse.files." 
+ this.driverId + ".ingestsizelimit", "-1")); + if(limit>0 && getSize()>limit) { + return false; + } else { + return true; + } + } } diff --git a/src/main/java/edu/harvard/iq/dataverse/datasetutility/AddReplaceFileHelper.java b/src/main/java/edu/harvard/iq/dataverse/datasetutility/AddReplaceFileHelper.java index f44e33404c9..6716cb15327 100644 --- a/src/main/java/edu/harvard/iq/dataverse/datasetutility/AddReplaceFileHelper.java +++ b/src/main/java/edu/harvard/iq/dataverse/datasetutility/AddReplaceFileHelper.java @@ -120,10 +120,12 @@ public class AddReplaceFileHelper{ private InputStream newFileInputStream; // step 20 private String newFileName; // step 20 private String newFileContentType; // step 20 + private String newStorageIdentifier; // step 20 + private String newCheckSum; // step 20 + // -- Optional private DataFile fileToReplace; // step 25 - // ----------------------------------- // Instance variables derived from other input // ----------------------------------- @@ -258,6 +260,7 @@ public AddReplaceFileHelper(DataverseRequest dvRequest, public boolean runAddFileByDataset(Dataset chosenDataset, String newFileName, String newFileContentType, + String newStorageIdentifier, InputStream newFileInputStream, OptionalFileParams optionalFileParams){ @@ -272,7 +275,7 @@ public boolean runAddFileByDataset(Dataset chosenDataset, } //return this.runAddFile(this.dataset, newFileName, newFileContentType, newFileInputStream, optionalFileParams); - return this.runAddReplaceFile(dataset, newFileName, newFileContentType, newFileInputStream, optionalFileParams); + return this.runAddReplaceFile(dataset, newFileName, newFileContentType, newStorageIdentifier, newFileInputStream, optionalFileParams); } @@ -342,9 +345,7 @@ public boolean runForceReplaceFile(Long oldFileId, } - - - public boolean runReplaceFile(Long oldFileId, + public boolean runReplaceFile(Long oldFileId, String newFileName, String newFileContentType, InputStream newFileInputStream, @@ -386,13 +387,19 @@ public boolean runReplaceFile(Long oldFileId, * * The UI will call Phase 1 on initial upload and * then run Phase 2 if the user chooses to save the changes. 
+ * @param newStorageIdentifier * * * @return */ + private boolean runAddReplaceFile(Dataset owner, String newFileName, String newFileContentType, + InputStream newFileInputStream, OptionalFileParams optionalFileParams) { + return runAddReplaceFile(dataset,newFileName, newFileContentType, null, newFileInputStream, optionalFileParams); + } + private boolean runAddReplaceFile(Dataset dataset, String newFileName, String newFileContentType, - InputStream newFileInputStream, + String newStorageIdentifier, InputStream newFileInputStream, OptionalFileParams optionalFileParams){ // Run "Phase 1" - Initial ingest of file + error check @@ -401,6 +408,7 @@ private boolean runAddReplaceFile(Dataset dataset, boolean phase1Success = runAddReplacePhase1(dataset, newFileName, newFileContentType, + newStorageIdentifier, newFileInputStream, optionalFileParams ); @@ -429,6 +437,7 @@ public boolean runReplaceFromUI_Phase1(Long oldFileId, String newFileName, String newFileContentType, InputStream newFileInputStream, + String fullStorageId, OptionalFileParams optionalFileParams){ @@ -449,7 +458,8 @@ public boolean runReplaceFromUI_Phase1(Long oldFileId, return this.runAddReplacePhase1(fileToReplace.getOwner(), newFileName, - newFileContentType, + newFileContentType, + fullStorageId, newFileInputStream, optionalFileParams); @@ -462,13 +472,14 @@ public boolean runReplaceFromUI_Phase1(Long oldFileId, * * Phase 1 (here): Add/replace the file and make sure there are no errors * But don't update the Dataset (yet) + * @param newStorageIdentifier * * @return */ private boolean runAddReplacePhase1(Dataset dataset, String newFileName, String newFileContentType, - InputStream newFileInputStream, + String newStorageIdentifier, InputStream newFileInputStream, OptionalFileParams optionalFileParams){ if (this.hasError()){ @@ -487,11 +498,14 @@ private boolean runAddReplacePhase1(Dataset dataset, } msgt("step_020_loadNewFile"); - if (!this.step_020_loadNewFile(newFileName, newFileContentType, newFileInputStream)){ + if (!this.step_020_loadNewFile(newFileName, newFileContentType, newStorageIdentifier, newFileInputStream)){ return false; } + if(optionalFileParams.hasCheckSum()) { + newCheckSum = optionalFileParams.getCheckSum(); + } msgt("step_030_createNewFilesViaIngest"); if (!this.step_030_createNewFilesViaIngest()){ return false; @@ -914,7 +928,7 @@ private boolean step_015_auto_check_permissions(Dataset datasetToCheck){ } - private boolean step_020_loadNewFile(String fileName, String fileContentType, InputStream fileInputStream){ + private boolean step_020_loadNewFile(String fileName, String fileContentType, String storageIdentifier, InputStream fileInputStream){ if (this.hasError()){ return false; @@ -932,18 +946,25 @@ private boolean step_020_loadNewFile(String fileName, String fileContentType, In } - if (fileInputStream == null){ - this.addErrorSevere(getBundleErr("file_upload_failed")); - return false; - } - + if (fileInputStream == null) { + if (storageIdentifier == null) { + this.addErrorSevere(getBundleErr("file_upload_failed")); + return false; + } else { + newStorageIdentifier = storageIdentifier; + } + } + newFileName = fileName; newFileContentType = fileContentType; + + //One of these will be null + newStorageIdentifier = storageIdentifier; newFileInputStream = fileInputStream; return true; } - + /** * Optional: old file to replace @@ -1050,6 +1071,8 @@ private boolean step_030_createNewFilesViaIngest(){ this.newFileInputStream, this.newFileName, this.newFileContentType, + this.newStorageIdentifier, + 
this.newCheckSum, this.systemConfig); } catch (IOException ex) { diff --git a/src/main/java/edu/harvard/iq/dataverse/datasetutility/FileReplacePageHelper.java b/src/main/java/edu/harvard/iq/dataverse/datasetutility/FileReplacePageHelper.java index e6d7c1e5ebe..94f247e6419 100644 --- a/src/main/java/edu/harvard/iq/dataverse/datasetutility/FileReplacePageHelper.java +++ b/src/main/java/edu/harvard/iq/dataverse/datasetutility/FileReplacePageHelper.java @@ -96,14 +96,14 @@ public boolean resetReplaceFileHelper(){ * Handle native file replace * @param event */ - public boolean handleNativeFileUpload(InputStream inputStream, String fileName, String fileContentType) { + public boolean handleNativeFileUpload(InputStream inputStream, String fullStorageId, String fileName, String fileContentType) { phase1Success = false; // Preliminary sanity check // - if (inputStream == null){ - throw new NullPointerException("inputStream cannot be null"); + if ((inputStream == null)&&(fullStorageId==null)){ + throw new NullPointerException("inputStream and storageId cannot both be null"); } if (fileName == null){ throw new NullPointerException("fileName cannot be null"); @@ -118,6 +118,7 @@ public boolean handleNativeFileUpload(InputStream inputStream, String fileName, fileName, fileContentType, inputStream, + fullStorageId, null ); diff --git a/src/main/java/edu/harvard/iq/dataverse/datasetutility/FileSizeChecker.java b/src/main/java/edu/harvard/iq/dataverse/datasetutility/FileSizeChecker.java index 8d24270c76c..06b3f467867 100644 --- a/src/main/java/edu/harvard/iq/dataverse/datasetutility/FileSizeChecker.java +++ b/src/main/java/edu/harvard/iq/dataverse/datasetutility/FileSizeChecker.java @@ -6,9 +6,6 @@ package edu.harvard.iq.dataverse.datasetutility; import edu.harvard.iq.dataverse.util.BundleUtil; -import edu.harvard.iq.dataverse.util.SystemConfig; -import java.util.Collections; -import java.util.logging.Logger; /** * Convenience methods for checking max. 
file size @@ -16,94 +13,24 @@ */ public class FileSizeChecker { - private static final Logger logger = Logger.getLogger(FileSizeChecker.class.getCanonicalName()); + /* This method turns a number of bytes into a human readable version + */ + public static String bytesToHumanReadable(long v) { + return bytesToHumanReadable(v, 1); + } + + /* This method turns a number of bytes into a human readable version + * with figs decimal places + */ + public static String bytesToHumanReadable(long v, int figs) { + if (v < 1024) { + return v + " " + BundleUtil.getStringFromBundle("file.addreplace.error.byte_abrev"); + } + // 63 - because long has 63 binary digits + int trailingBin0s = (63 - Long.numberOfLeadingZeros(v))/10; + //String base = "%."+figs+"f %s"+ BundleUtil.getStringFromBundle("file.addreplace.error.byte_abrev"); + return String.format("%."+figs+"f %s"+ BundleUtil.getStringFromBundle("file.addreplace.error.byte_abrev"), (double)v / (1L << (trailingBin0s*10)), + " KMGTPE".charAt(trailingBin0s)); + } - SystemConfig systemConfig; - - /** - * constructor - */ - public FileSizeChecker(SystemConfig systemConfig){ - if (systemConfig == null){ - throw new NullPointerException("systemConfig cannot be null"); - } - this.systemConfig = systemConfig; - } - - public FileSizeResponse isAllowedFileSize(Long filesize){ - - if (filesize == null){ - throw new NullPointerException("filesize cannot be null"); - //return new FileSizeResponse(false, "The file size could not be found!"); - } - - Long maxFileSize = systemConfig.getMaxFileUploadSize(); - - // If no maxFileSize in the database, set it to unlimited! - // - if (maxFileSize == null){ - return new FileSizeResponse(true, - BundleUtil.getStringFromBundle("file.addreplace.file_size_ok") - ); - } - - // Good size! - // - if (filesize <= maxFileSize){ - return new FileSizeResponse(true, - BundleUtil.getStringFromBundle("file.addreplace.file_size_ok") - ); - } - - // Nope! Sorry! 
File is too big - // - String errMsg = BundleUtil.getStringFromBundle("file.addreplace.error.file_exceeds_limit", Collections.singletonList(bytesToHumanReadable(maxFileSize))); - - return new FileSizeResponse(false, errMsg); - - } - - /* This method turns a number of bytes into a human readable version - */ - public static String bytesToHumanReadable(long v) { - return bytesToHumanReadable(v, 1); - } - - /* This method turns a number of bytes into a human readable version - * with figs decimal places - */ - public static String bytesToHumanReadable(long v, int figs) { - if (v < 1024) { - return v + " " + BundleUtil.getStringFromBundle("file.addreplace.error.byte_abrev"); - } - // 63 - because long has 63 binary digits - int trailingBin0s = (63 - Long.numberOfLeadingZeros(v))/10; - //String base = "%."+figs+"f %s"+ BundleUtil.getStringFromBundle("file.addreplace.error.byte_abrev"); - return String.format("%."+figs+"f %s"+ BundleUtil.getStringFromBundle("file.addreplace.error.byte_abrev"), (double)v / (1L << (trailingBin0s*10)), - " KMGTPE".charAt(trailingBin0s)); - } - - /** - * Inner class that can also return an error message - */ - public class FileSizeResponse{ - - public boolean fileSizeOK; - public String userMsg; - - public FileSizeResponse(boolean isOk, String msg){ - - fileSizeOK = isOk; - userMsg = msg; - } - - public boolean isFileSizeOK(){ - return fileSizeOK; - } - - public String getUserMessage(){ - return userMsg; - } - - } // end inner class } diff --git a/src/main/java/edu/harvard/iq/dataverse/datasetutility/OptionalFileParams.java b/src/main/java/edu/harvard/iq/dataverse/datasetutility/OptionalFileParams.java index 6459715e518..11787d25a7e 100644 --- a/src/main/java/edu/harvard/iq/dataverse/datasetutility/OptionalFileParams.java +++ b/src/main/java/edu/harvard/iq/dataverse/datasetutility/OptionalFileParams.java @@ -61,7 +61,15 @@ public class OptionalFileParams { private boolean restrict = false; public static final String RESTRICT_ATTR_NAME = "restrict"; - + + private String storageIdentifier; + public static final String STORAGE_IDENTIFIER_ATTR_NAME = "storageIdentifier"; + private String fileName; + public static final String FILE_NAME_ATTR_NAME = "fileName"; + private String mimeType; + public static final String MIME_TYPE_ATTR_NAME = "mimeType"; + private String checkSum; + public static final String CHECKSUM_ATTR_NAME = "md5Hash"; public OptionalFileParams(String jsonData) throws DataFileTagException{ @@ -184,7 +192,39 @@ public boolean hasProvFreeform(){ } return true; } - + + public boolean hasStorageIdentifier() { + return ((storageIdentifier!=null)&&(!storageIdentifier.isEmpty())); + } + + public String getStorageIdentifier() { + return storageIdentifier; + } + + public boolean hasFileName() { + return ((fileName!=null)&&(!fileName.isEmpty())); + } + + public String getFileName() { + return fileName; + } + + public boolean hasMimetype() { + return ((mimeType!=null)&&(!mimeType.isEmpty())); + } + + public String getMimeType() { + return mimeType; + } + + public boolean hasCheckSum() { + return ((checkSum!=null)&&(!checkSum.isEmpty())); + } + + public String getCheckSum() { + return checkSum; + } + /** * Set tags * @param tags @@ -281,6 +321,38 @@ private void loadParamsFromJson(String jsonData) throws DataFileTagException{ this.restrict = Boolean.valueOf(jsonObj.get(RESTRICT_ATTR_NAME).getAsString()); } + // ------------------------------- + // get storage identifier as string + // ------------------------------- + if ((jsonObj.has(STORAGE_IDENTIFIER_ATTR_NAME)) 
&& (!jsonObj.get(STORAGE_IDENTIFIER_ATTR_NAME).isJsonNull())){ + + this.storageIdentifier = jsonObj.get(STORAGE_IDENTIFIER_ATTR_NAME).getAsString(); + } + + // ------------------------------- + // get file name as string + // ------------------------------- + if ((jsonObj.has(FILE_NAME_ATTR_NAME)) && (!jsonObj.get(FILE_NAME_ATTR_NAME).isJsonNull())){ + + this.fileName = jsonObj.get(FILE_NAME_ATTR_NAME).getAsString(); + } + + // ------------------------------- + // get mimetype as string + // ------------------------------- + if ((jsonObj.has(MIME_TYPE_ATTR_NAME)) && (!jsonObj.get(MIME_TYPE_ATTR_NAME).isJsonNull())){ + + this.mimeType = jsonObj.get(MIME_TYPE_ATTR_NAME).getAsString(); + } + + // ------------------------------- + // get checkSum as string + // ------------------------------- + if ((jsonObj.has(CHECKSUM_ATTR_NAME)) && (!jsonObj.get(CHECKSUM_ATTR_NAME).isJsonNull())){ + + this.checkSum = jsonObj.get(CHECKSUM_ATTR_NAME).getAsString(); + } + // ------------------------------- // get tags // ------------------------------- @@ -516,5 +588,5 @@ private void replaceFileDataTagsInFile(DataFile df) throws DataFileTagException{ } } - + } diff --git a/src/main/java/edu/harvard/iq/dataverse/engine/command/impl/CreateNewDatasetCommand.java b/src/main/java/edu/harvard/iq/dataverse/engine/command/impl/CreateNewDatasetCommand.java index 3ce10e40abe..e97eeb47ab3 100644 --- a/src/main/java/edu/harvard/iq/dataverse/engine/command/impl/CreateNewDatasetCommand.java +++ b/src/main/java/edu/harvard/iq/dataverse/engine/command/impl/CreateNewDatasetCommand.java @@ -69,7 +69,7 @@ public CreateNewDatasetCommand(Dataset theDataset, DataverseRequest aRequest, bo protected void additionalParameterTests(CommandContext ctxt) throws CommandException { if ( nonEmpty(getDataset().getIdentifier()) ) { GlobalIdServiceBean idServiceBean = GlobalIdServiceBean.getBean(getDataset().getProtocol(), ctxt); - if ( ctxt.datasets().isIdentifierUnique(getDataset().getIdentifier(), getDataset(), idServiceBean) ) { + if ( !ctxt.datasets().isIdentifierUnique(getDataset().getIdentifier(), getDataset(), idServiceBean) ) { throw new IllegalCommandException(String.format("Dataset with identifier '%s', protocol '%s' and authority '%s' already exists", getDataset().getIdentifier(), getDataset().getProtocol(), getDataset().getAuthority()), this); diff --git a/src/main/java/edu/harvard/iq/dataverse/engine/command/impl/UpdateDatasetVersionCommand.java b/src/main/java/edu/harvard/iq/dataverse/engine/command/impl/UpdateDatasetVersionCommand.java index 0bcf11d371d..fefa8707c8b 100644 --- a/src/main/java/edu/harvard/iq/dataverse/engine/command/impl/UpdateDatasetVersionCommand.java +++ b/src/main/java/edu/harvard/iq/dataverse/engine/command/impl/UpdateDatasetVersionCommand.java @@ -15,6 +15,9 @@ import java.util.concurrent.Future; import java.util.logging.Level; import java.util.logging.Logger; + +import javax.validation.ConstraintViolationException; + import org.apache.solr.client.solrj.SolrServerException; /** @@ -119,7 +122,13 @@ public Dataset execute(CommandContext ctxt) throws CommandException { if (editVersion.getId() == null || editVersion.getId() == 0L) { ctxt.em().persist(editVersion); } else { - ctxt.em().merge(editVersion); + try { + ctxt.em().merge(editVersion); + } catch (ConstraintViolationException e) { + logger.log(Level.SEVERE,"Exception: "); + e.getConstraintViolations().forEach(err->logger.log(Level.SEVERE,err.toString())); + throw e; + } } for (DataFile dataFile : theDataset.getFiles()) { diff --git 
a/src/main/java/edu/harvard/iq/dataverse/ingest/IngestServiceBean.java b/src/main/java/edu/harvard/iq/dataverse/ingest/IngestServiceBean.java index b06877df2ae..e7ccc7bf1b4 100644 --- a/src/main/java/edu/harvard/iq/dataverse/ingest/IngestServiceBean.java +++ b/src/main/java/edu/harvard/iq/dataverse/ingest/IngestServiceBean.java @@ -44,6 +44,7 @@ import edu.harvard.iq.dataverse.dataaccess.DataAccessOption; import edu.harvard.iq.dataverse.dataaccess.StorageIO; import edu.harvard.iq.dataverse.dataaccess.ImageThumbConverter; +import edu.harvard.iq.dataverse.dataaccess.S3AccessIO; import edu.harvard.iq.dataverse.dataaccess.TabularSubsetGenerator; import edu.harvard.iq.dataverse.datavariable.SummaryStatistic; import edu.harvard.iq.dataverse.datavariable.DataVariable; @@ -72,6 +73,7 @@ import java.io.File; import java.io.FileInputStream; import java.io.IOException; +import java.io.InputStream; import java.io.FileNotFoundException; import java.io.FileOutputStream; import java.nio.channels.FileChannel; @@ -175,16 +177,16 @@ public List saveAndAddFilesToDataset(DatasetVersion version, List saveAndAddFilesToDataset(DatasetVersion version, List dataAccess = DataAccess.getStorageIO(dataFile); @@ -347,18 +302,74 @@ public List saveAndAddFilesToDataset(DatasetVersion version, List)dataAccess).removeTempTag(); + } } catch (IOException ioex) { logger.warning("Failed to get file size, storage id " + dataFile.getStorageIdentifier() + " (" + ioex.getMessage() + ")"); } savedSuccess = true; - logger.info("unattached: " + unattached); - dataFile.setOwner(null); - + dataFile.setOwner(null); } logger.fine("Done! Finished saving new files in permanent storage and adding them to the dataset."); - + boolean belowLimit = false; + + try { + belowLimit = dataFile.getStorageIO().isBelowIngestSizeLimit(); + } catch (IOException e) { + logger.warning("Error getting ingest limit for file: " + dataFile.getIdentifier() + " : " + e.getMessage()); + } + + if (savedSuccess && belowLimit) { + // These are all brand new files, so they should all have + // one filemetadata total. -- L.A. + FileMetadata fileMetadata = dataFile.getFileMetadatas().get(0); + String fileName = fileMetadata.getLabel(); + + boolean metadataExtracted = false; + if (FileUtil.canIngestAsTabular(dataFile)) { + /* + * Note that we don't try to ingest the file right away - instead we mark it as + * "scheduled for ingest", then at the end of the save process it will be queued + * for async. ingest in the background. In the meantime, the file will be + * ingested as a regular, non-tabular file, and appear as such to the user, + * until the ingest job is finished with the Ingest Service. + */ + dataFile.SetIngestScheduled(); + } else if (fileMetadataExtractable(dataFile)) { + + try { + // FITS is the only type supported for metadata + // extraction, as of now. -- L.A. 4.0 + dataFile.setContentType("application/fits"); + metadataExtracted = extractMetadata(tempFileLocation, dataFile, version); + } catch (IOException mex) { + logger.severe("Caught exception trying to extract indexable metadata from file " + + fileName + ", " + mex.getMessage()); + } + if (metadataExtracted) { + logger.fine("Successfully extracted indexable metadata from file " + fileName); + } else { + logger.fine("Failed to extract indexable metadata from file " + fileName); + } + } else if (FileUtil.MIME_TYPE_INGESTED_FILE.equals(dataFile.getContentType())) { + // Make sure no *uningested* tab-delimited files are saved with the type "text/tab-separated-values"! 
+ // "text/tsv" should be used instead: + dataFile.setContentType(FileUtil.MIME_TYPE_TSV); + } + } + // ... and let's delete the main temp file if it exists: + if(tempLocationPath!=null) { + try { + logger.fine("Will attempt to delete the temp file " + tempLocationPath.toString()); + Files.delete(tempLocationPath); + } catch (IOException ex) { + // (non-fatal - it's just a temp file.) + logger.warning("Failed to delete temp file " + tempLocationPath.toString()); + } + } if (savedSuccess) { // temp dbug line // System.out.println("ADDING FILE: " + fileName + "; for dataset: " + @@ -1146,12 +1157,17 @@ public boolean fileMetadataExtractable(DataFile dataFile) { public boolean extractMetadata(String tempFileLocation, DataFile dataFile, DatasetVersion editVersion) throws IOException { boolean ingestSuccessful = false; - FileInputStream tempFileInputStream = null; - - try { - tempFileInputStream = new FileInputStream(new File(tempFileLocation)); - } catch (FileNotFoundException notfoundEx) { - throw new IOException("Could not open temp file "+tempFileLocation); + InputStream tempFileInputStream = null; + if(tempFileLocation == null) { + StorageIO sio = dataFile.getStorageIO(); + sio.open(DataAccessOption.READ_ACCESS); + tempFileInputStream = sio.getInputStream(); + } else { + try { + tempFileInputStream = new FileInputStream(new File(tempFileLocation)); + } catch (FileNotFoundException notfoundEx) { + throw new IOException("Could not open temp file "+tempFileLocation); + } } // Locate metadata extraction plugin for the file format by looking diff --git a/src/main/java/edu/harvard/iq/dataverse/pidproviders/FakePidProviderServiceBean.java b/src/main/java/edu/harvard/iq/dataverse/pidproviders/FakePidProviderServiceBean.java index ce9e281e986..eb313631077 100644 --- a/src/main/java/edu/harvard/iq/dataverse/pidproviders/FakePidProviderServiceBean.java +++ b/src/main/java/edu/harvard/iq/dataverse/pidproviders/FakePidProviderServiceBean.java @@ -15,7 +15,11 @@ public class FakePidProviderServiceBean extends AbstractGlobalIdServiceBean { @Override public boolean alreadyExists(DvObject dvo) throws Exception { - return true; + /* Direct upload creates an identifier prior to calling the CreateNewDatasetCommand - if this is true, that call fails. + * In that case, the local test (DatasetServiceBean.isIdentifierLocallyUnique()) correctly returns false since it tests the database. + * This provider could do the same check or use some other method to test alreadyExists(DvObject) =true failures. 
(no tests found currently) + */ + return false; } @Override diff --git a/src/main/java/edu/harvard/iq/dataverse/settings/SettingsServiceBean.java b/src/main/java/edu/harvard/iq/dataverse/settings/SettingsServiceBean.java index 473bea561b4..ece5eff2843 100644 --- a/src/main/java/edu/harvard/iq/dataverse/settings/SettingsServiceBean.java +++ b/src/main/java/edu/harvard/iq/dataverse/settings/SettingsServiceBean.java @@ -1,9 +1,13 @@ package edu.harvard.iq.dataverse.settings; +import edu.harvard.iq.dataverse.Dataset; import edu.harvard.iq.dataverse.actionlogging.ActionLogRecord; import edu.harvard.iq.dataverse.actionlogging.ActionLogServiceBean; import edu.harvard.iq.dataverse.api.ApiBlockingFilter; +import edu.harvard.iq.dataverse.engine.command.DataverseRequest; import edu.harvard.iq.dataverse.util.StringUtil; + +import java.io.StringReader; import java.util.HashSet; import java.util.List; import java.util.Set; @@ -12,6 +16,8 @@ import javax.ejb.EJB; import javax.ejb.Stateless; import javax.inject.Named; +import javax.json.Json; +import javax.json.JsonObject; import javax.persistence.EntityManager; import javax.persistence.PersistenceContext; @@ -162,7 +168,7 @@ public enum Key { /** Enable full-text indexing in solr up to max file size */ SolrFullTextIndexing, //true or false (default) SolrMaxFileSizeForFullTextIndexing, //long - size in bytes (default unset/no limit) - /** Key for limiting the number of bytes uploaded via the Data Deposit API, UI (web site and . */ + /** Default Key for limiting the number of bytes uploaded via the Data Deposit API, UI (web site and . */ MaxFileUploadSizeInBytes, /** Key for if ScrubMigrationData is enabled or disabled. */ ScrubMigrationData, @@ -477,6 +483,44 @@ public Long getValueForKeyAsLong(Key key){ } + /** + * Attempt to convert a value in a compound key to a long + * - Applicable for keys such as MaxFileUploadSizeInBytes after multistore capabilities were added in ~v4.20 + * backward compatible with a single value. For multi values, the key's value must be an object with param:value pairs. + * A "default":value pair is allowed and will be returned for any param that doesn't have a defined value. + * + * On failure (key not found or string not convertible to a long), returns null + * @param key + * @return + */ + public Long getValueForCompoundKeyAsLong(Key key, String param){ + + String val = this.getValueForKey(key); + + if (val == null){ + return null; + } + + try { + return Long.parseLong(val); + } catch (NumberFormatException ex) { + try ( StringReader rdr = new StringReader(val) ) { + JsonObject settings = Json.createReader(rdr).readObject(); + if(settings.containsKey(param)) { + return Long.parseLong(settings.getString(param)); + } else if(settings.containsKey("default")) { + return Long.parseLong(settings.getString("default")); + } else { + return null; + } + + } catch (Exception e) { + logger.log(Level.WARNING, "Incorrect setting. 
Could not convert \"{0}\" from setting {1} to long: {2}", new Object[]{val, key.toString(), e.getMessage()}); + return null; + } + } + + } /** * Return the value stored, or the default value, in case no setting by that diff --git a/src/main/java/edu/harvard/iq/dataverse/util/FileUtil.java b/src/main/java/edu/harvard/iq/dataverse/util/FileUtil.java index ec77e53d790..a4370c7b38f 100644 --- a/src/main/java/edu/harvard/iq/dataverse/util/FileUtil.java +++ b/src/main/java/edu/harvard/iq/dataverse/util/FileUtil.java @@ -79,7 +79,6 @@ import java.util.zip.GZIPInputStream; import java.util.zip.ZipEntry; import java.util.zip.ZipInputStream; -import static edu.harvard.iq.dataverse.datasetutility.FileSizeChecker.bytesToHumanReadable; import org.apache.commons.io.FilenameUtils; import com.amazonaws.AmazonServiceException; @@ -707,7 +706,7 @@ public static String generateOriginalExtension(String fileType) { return ""; } - public static List createDataFiles(DatasetVersion version, InputStream inputStream, String fileName, String suppliedContentType, SystemConfig systemConfig) throws IOException { + public static List createDataFiles(DatasetVersion version, InputStream inputStream, String fileName, String suppliedContentType, String newStorageIdentifier, String newCheckSum, SystemConfig systemConfig) throws IOException { List datafiles = new ArrayList<>(); String warningMessage = null; @@ -715,381 +714,399 @@ public static List createDataFiles(DatasetVersion version, InputStream // save the file, in the temporary location for now: Path tempFile = null; - Long fileSizeLimit = systemConfig.getMaxFileUploadSize(); - - if (getFilesTempDirectory() != null) { - tempFile = Files.createTempFile(Paths.get(getFilesTempDirectory()), "tmp", "upload"); - // "temporary" location is the key here; this is why we are not using - // the DataStore framework for this - the assumption is that - // temp files will always be stored on the local filesystem. - // -- L.A. Jul. 2014 - logger.fine("Will attempt to save the file as: " + tempFile.toString()); - Files.copy(inputStream, tempFile, StandardCopyOption.REPLACE_EXISTING); - - // A file size check, before we do anything else: - // (note that "no size limit set" = "unlimited") - // (also note, that if this is a zip file, we'll be checking - // the size limit for each of the individual unpacked files) - Long fileSize = tempFile.toFile().length(); - if (fileSizeLimit != null && fileSize > fileSizeLimit) { - try {tempFile.toFile().delete();} catch (Exception ex) {} - throw new IOException (MessageFormat.format(BundleUtil.getStringFromBundle("file.addreplace.error.file_exceeds_limit"), bytesToHumanReadable(fileSize), bytesToHumanReadable(fileSizeLimit))); - } - - } else { - throw new IOException ("Temp directory is not configured."); - } - logger.fine("mime type supplied: "+suppliedContentType); - // Let's try our own utilities (Jhove, etc.) to determine the file type - // of the uploaded file. (We may already have a mime type supplied for this - // file - maybe the type that the browser recognized on upload; or, if - // it's a harvest, maybe the remote server has already given us the type - // for this file... with our own type utility we may or may not do better - // than the type supplied: - // -- L.A. 
- String recognizedType = null; + Long fileSizeLimit = systemConfig.getMaxFileUploadSizeForStore(version.getDataset().getOwner().getEffectiveStorageDriverId()); String finalType = null; - try { - recognizedType = determineFileType(tempFile.toFile(), fileName); - logger.fine("File utility recognized the file as " + recognizedType); - if (recognizedType != null && !recognizedType.equals("")) { - // is it any better than the type that was supplied to us, - // if any? - // This is not as trivial a task as one might expect... - // We may need a list of "good" mime types, that should always - // be chosen over other choices available. Maybe it should - // even be a weighed list... as in, "application/foo" should - // be chosen over "application/foo-with-bells-and-whistles". - - // For now the logic will be as follows: - // - // 1. If the contentType supplied (by the browser, most likely) - // is some form of "unknown", we always discard it in favor of - // whatever our own utilities have determined; - // 2. We should NEVER trust the browser when it comes to the - // following "ingestable" types: Stata, SPSS, R; - // 2a. We are willing to TRUST the browser when it comes to - // the CSV and XSLX ingestable types. - // 3. We should ALWAYS trust our utilities when it comes to - // ingestable types. - - if (suppliedContentType == null + if (newStorageIdentifier == null) { + if (getFilesTempDirectory() != null) { + tempFile = Files.createTempFile(Paths.get(getFilesTempDirectory()), "tmp", "upload"); + // "temporary" location is the key here; this is why we are not using + // the DataStore framework for this - the assumption is that + // temp files will always be stored on the local filesystem. + // -- L.A. Jul. 2014 + logger.fine("Will attempt to save the file as: " + tempFile.toString()); + Files.copy(inputStream, tempFile, StandardCopyOption.REPLACE_EXISTING); + + // A file size check, before we do anything else: + // (note that "no size limit set" = "unlimited") + // (also note, that if this is a zip file, we'll be checking + // the size limit for each of the individual unpacked files) + Long fileSize = tempFile.toFile().length(); + if (fileSizeLimit != null && fileSize > fileSizeLimit) { + try {tempFile.toFile().delete();} catch (Exception ex) {} + throw new IOException (MessageFormat.format(BundleUtil.getStringFromBundle("file.addreplace.error.file_exceeds_limit"), bytesToHumanReadable(fileSize), bytesToHumanReadable(fileSizeLimit))); + } + + } else { + throw new IOException("Temp directory is not configured."); + } + logger.fine("mime type supplied: " + suppliedContentType); + // Let's try our own utilities (Jhove, etc.) to determine the file type + // of the uploaded file. (We may already have a mime type supplied for this + // file - maybe the type that the browser recognized on upload; or, if + // it's a harvest, maybe the remote server has already given us the type + // for this file... with our own type utility we may or may not do better + // than the type supplied: + // -- L.A. + String recognizedType = null; + + try { + recognizedType = determineFileType(tempFile.toFile(), fileName); + logger.fine("File utility recognized the file as " + recognizedType); + if (recognizedType != null && !recognizedType.equals("")) { + // is it any better than the type that was supplied to us, + // if any? + // This is not as trivial a task as one might expect... + // We may need a list of "good" mime types, that should always + // be chosen over other choices available. 
Maybe it should + // even be a weighed list... as in, "application/foo" should + // be chosen over "application/foo-with-bells-and-whistles". + + // For now the logic will be as follows: + // + // 1. If the contentType supplied (by the browser, most likely) + // is some form of "unknown", we always discard it in favor of + // whatever our own utilities have determined; + // 2. We should NEVER trust the browser when it comes to the + // following "ingestable" types: Stata, SPSS, R; + // 2a. We are willing to TRUST the browser when it comes to + // the CSV and XSLX ingestable types. + // 3. We should ALWAYS trust our utilities when it comes to + // ingestable types. + + if (suppliedContentType == null || suppliedContentType.equals("") - || suppliedContentType.equalsIgnoreCase(MIME_TYPE_UNDETERMINED_DEFAULT) - || suppliedContentType.equalsIgnoreCase(MIME_TYPE_UNDETERMINED_BINARY) - || (canIngestAsTabular(suppliedContentType) - && !suppliedContentType.equalsIgnoreCase(MIME_TYPE_CSV) - && !suppliedContentType.equalsIgnoreCase(MIME_TYPE_CSV_ALT) - && !suppliedContentType.equalsIgnoreCase(MIME_TYPE_XLSX)) + || suppliedContentType.equalsIgnoreCase(MIME_TYPE_UNDETERMINED_DEFAULT) + || suppliedContentType.equalsIgnoreCase(MIME_TYPE_UNDETERMINED_BINARY) + || (canIngestAsTabular(suppliedContentType) + && !suppliedContentType.equalsIgnoreCase(MIME_TYPE_CSV) + && !suppliedContentType.equalsIgnoreCase(MIME_TYPE_CSV_ALT) + && !suppliedContentType.equalsIgnoreCase(MIME_TYPE_XLSX)) || canIngestAsTabular(recognizedType) || recognizedType.equals("application/fits-gzipped") - || recognizedType.equalsIgnoreCase(ShapefileHandler.SHAPEFILE_FILE_TYPE) - || recognizedType.equals(MIME_TYPE_ZIP)) { - finalType = recognizedType; - } - } - - } catch (Exception ex) { - logger.warning("Failed to run the file utility mime type check on file " + fileName); - } - - if (finalType == null) { - finalType = (suppliedContentType == null || suppliedContentType.equals("")) - ? MIME_TYPE_UNDETERMINED_DEFAULT - : suppliedContentType; - } - - // A few special cases: - - // if this is a gzipped FITS file, we'll uncompress it, and ingest it as - // a regular FITS file: - - if (finalType.equals("application/fits-gzipped")) { - - InputStream uncompressedIn = null; - String finalFileName = fileName; - // if the file name had the ".gz" extension, remove it, - // since we are going to uncompress it: - if (fileName != null && fileName.matches(".*\\.gz$")) { - finalFileName = fileName.replaceAll("\\.gz$", ""); - } - - DataFile datafile = null; - try { - uncompressedIn = new GZIPInputStream(new FileInputStream(tempFile.toFile())); - File unZippedTempFile = saveInputStreamInTempFile(uncompressedIn, fileSizeLimit); - datafile = createSingleDataFile(version, unZippedTempFile, finalFileName, MIME_TYPE_UNDETERMINED_DEFAULT, systemConfig.getFileFixityChecksumAlgorithm()); - } catch (IOException | FileExceedsMaxSizeException ioex) { - datafile = null; - } finally { - if (uncompressedIn != null) { - try {uncompressedIn.close();} catch (IOException e) {} - } - } - - // If we were able to produce an uncompressed file, we'll use it - // to create and return a final DataFile; if not, we're not going - // to do anything - and then a new DataFile will be created further - // down, from the original, uncompressed file. 
- if (datafile != null) { - // remove the compressed temp file: - try { - tempFile.toFile().delete(); - } catch (SecurityException ex) { - // (this is very non-fatal) - logger.warning("Failed to delete temporary file "+tempFile.toString()); - } - - datafiles.add(datafile); - return datafiles; - } - - // If it's a ZIP file, we are going to unpack it and create multiple - // DataFile objects from its contents: - } else if (finalType.equals("application/zip")) { - - ZipInputStream unZippedIn = null; - ZipEntry zipEntry = null; - - int fileNumberLimit = systemConfig.getZipUploadFilesLimit(); - - try { - Charset charset = null; - /* - TODO: (?) - We may want to investigate somehow letting the user specify - the charset for the filenames in the zip file... - - otherwise, ZipInputStream bails out if it encounteres a file - name that's not valid in the current charest (i.e., UTF-8, in - our case). It would be a bit trickier than what we're doing for - SPSS tabular ingests - with the lang. encoding pulldown menu - - because this encoding needs to be specified *before* we upload and - attempt to unzip the file. - -- L.A. 4.0 beta12 - logger.info("default charset is "+Charset.defaultCharset().name()); - if (Charset.isSupported("US-ASCII")) { - logger.info("charset US-ASCII is supported."); - charset = Charset.forName("US-ASCII"); - if (charset != null) { - logger.info("was able to obtain charset for US-ASCII"); - } - - } - */ - - if (charset != null) { - unZippedIn = new ZipInputStream(new FileInputStream(tempFile.toFile()), charset); - } else { - unZippedIn = new ZipInputStream(new FileInputStream(tempFile.toFile())); - } - - while (true) { - try { - zipEntry = unZippedIn.getNextEntry(); - } catch (IllegalArgumentException iaex) { - // Note: - // ZipInputStream documentation doesn't even mention that - // getNextEntry() throws an IllegalArgumentException! - // but that's what happens if the file name of the next - // entry is not valid in the current CharSet. - // -- L.A. - warningMessage = "Failed to unpack Zip file. (Unknown Character Set used in a file name?) Saving the file as is."; - logger.warning(warningMessage); - throw new IOException(); - } - - if (zipEntry == null) { - break; - } - // Note that some zip entries may be directories - we - // simply skip them: + || recognizedType.equalsIgnoreCase(ShapefileHandler.SHAPEFILE_FILE_TYPE) + || recognizedType.equals(MIME_TYPE_ZIP)) { + finalType = recognizedType; + } + } + + } catch (Exception ex) { + logger.warning("Failed to run the file utility mime type check on file " + fileName); + } + + if (finalType == null) { + finalType = (suppliedContentType == null || suppliedContentType.equals("")) + ? 
MIME_TYPE_UNDETERMINED_DEFAULT + : suppliedContentType; + } + + // A few special cases: + + // if this is a gzipped FITS file, we'll uncompress it, and ingest it as + // a regular FITS file: + + if (finalType.equals("application/fits-gzipped")) { + + InputStream uncompressedIn = null; + String finalFileName = fileName; + // if the file name had the ".gz" extension, remove it, + // since we are going to uncompress it: + if (fileName != null && fileName.matches(".*\\.gz$")) { + finalFileName = fileName.replaceAll("\\.gz$", ""); + } + + DataFile datafile = null; + try { + uncompressedIn = new GZIPInputStream(new FileInputStream(tempFile.toFile())); + File unZippedTempFile = saveInputStreamInTempFile(uncompressedIn, fileSizeLimit); + datafile = createSingleDataFile(version, unZippedTempFile, finalFileName, MIME_TYPE_UNDETERMINED_DEFAULT, systemConfig.getFileFixityChecksumAlgorithm()); + } catch (IOException | FileExceedsMaxSizeException ioex) { + datafile = null; + } finally { + if (uncompressedIn != null) { + try {uncompressedIn.close();} catch (IOException e) {} + } + } + + // If we were able to produce an uncompressed file, we'll use it + // to create and return a final DataFile; if not, we're not going + // to do anything - and then a new DataFile will be created further + // down, from the original, uncompressed file. + if (datafile != null) { + // remove the compressed temp file: + try { + tempFile.toFile().delete(); + } catch (SecurityException ex) { + // (this is very non-fatal) + logger.warning("Failed to delete temporary file " + tempFile.toString()); + } + + datafiles.add(datafile); + return datafiles; + } + + // If it's a ZIP file, we are going to unpack it and create multiple + // DataFile objects from its contents: + } else if (finalType.equals("application/zip")) { + + ZipInputStream unZippedIn = null; + ZipEntry zipEntry = null; + + int fileNumberLimit = systemConfig.getZipUploadFilesLimit(); + + try { + Charset charset = null; + /* + TODO: (?) + We may want to investigate somehow letting the user specify + the charset for the filenames in the zip file... + - otherwise, ZipInputStream bails out if it encounteres a file + name that's not valid in the current charest (i.e., UTF-8, in + our case). It would be a bit trickier than what we're doing for + SPSS tabular ingests - with the lang. encoding pulldown menu - + because this encoding needs to be specified *before* we upload and + attempt to unzip the file. + -- L.A. 
4.0 beta12 + logger.info("default charset is "+Charset.defaultCharset().name()); + if (Charset.isSupported("US-ASCII")) { + logger.info("charset US-ASCII is supported."); + charset = Charset.forName("US-ASCII"); + if (charset != null) { + logger.info("was able to obtain charset for US-ASCII"); + } - if (!zipEntry.isDirectory()) { - if (datafiles.size() > fileNumberLimit) { - logger.warning("Zip upload - too many files."); - warningMessage = "The number of files in the zip archive is over the limit (" + fileNumberLimit + - "); please upload a zip archive with fewer files, if you want them to be ingested " + - "as individual DataFiles."; - throw new IOException(); + } + */ + + if (charset != null) { + unZippedIn = new ZipInputStream(new FileInputStream(tempFile.toFile()), charset); + } else { + unZippedIn = new ZipInputStream(new FileInputStream(tempFile.toFile())); + } + + while (true) { + try { + zipEntry = unZippedIn.getNextEntry(); + } catch (IllegalArgumentException iaex) { + // Note: + // ZipInputStream documentation doesn't even mention that + // getNextEntry() throws an IllegalArgumentException! + // but that's what happens if the file name of the next + // entry is not valid in the current CharSet. + // -- L.A. + warningMessage = "Failed to unpack Zip file. (Unknown Character Set used in a file name?) Saving the file as is."; + logger.warning(warningMessage); + throw new IOException(); + } + + if (zipEntry == null) { + break; + } + // Note that some zip entries may be directories - we + // simply skip them: + + if (!zipEntry.isDirectory()) { + if (datafiles.size() > fileNumberLimit) { + logger.warning("Zip upload - too many files."); + warningMessage = "The number of files in the zip archive is over the limit (" + fileNumberLimit + + "); please upload a zip archive with fewer files, if you want them to be ingested " + + "as individual DataFiles."; + throw new IOException(); + } + + String fileEntryName = zipEntry.getName(); + logger.fine("ZipEntry, file: " + fileEntryName); + + if (fileEntryName != null && !fileEntryName.equals("")) { + + String shortName = fileEntryName.replaceFirst("^.*[\\/]", ""); + + // Check if it's a "fake" file - a zip archive entry + // created for a MacOS X filesystem element: (these + // start with "._") + if (!shortName.startsWith("._") && !shortName.startsWith(".DS_Store") && !"".equals(shortName)) { + // OK, this seems like an OK file entry - we'll try + // to read it and create a DataFile with it: + + File unZippedTempFile = saveInputStreamInTempFile(unZippedIn, fileSizeLimit); + DataFile datafile = createSingleDataFile(version, unZippedTempFile, null, shortName, + MIME_TYPE_UNDETERMINED_DEFAULT, + systemConfig.getFileFixityChecksumAlgorithm(), null, false); + + if (!fileEntryName.equals(shortName)) { + // If the filename looks like a hierarchical folder name (i.e., contains slashes and backslashes), + // we'll extract the directory name; then subject it to some "aggressive sanitizing" - strip all + // the leading, trailing and duplicate slashes; then replace all the characters that + // don't pass our validation rules. 
+ String directoryName = fileEntryName.replaceFirst("[\\\\/][\\\\/]*[^\\\\/]*$", ""); + directoryName = StringUtil.sanitizeFileDirectory(directoryName, true); + // if (!"".equals(directoryName)) { + if (!StringUtil.isEmpty(directoryName)) { + logger.fine("setting the directory label to " + directoryName); + datafile.getFileMetadata().setDirectoryLabel(directoryName); + } + } + + if (datafile != null) { + // We have created this datafile with the mime type "unknown"; + // Now that we have it saved in a temporary location, + // let's try and determine its real type: + + String tempFileName = getFilesTempDirectory() + "/" + datafile.getStorageIdentifier(); + + try { + recognizedType = determineFileType(new File(tempFileName), shortName); + logger.fine("File utility recognized unzipped file as " + recognizedType); + if (recognizedType != null && !recognizedType.equals("")) { + datafile.setContentType(recognizedType); + } + } catch (Exception ex) { + logger.warning("Failed to run the file utility mime type check on file " + fileName); + } + + datafiles.add(datafile); + } + } + } + } + unZippedIn.closeEntry(); + + } + + } catch (IOException ioex) { + // just clear the datafiles list and let + // ingest default to creating a single DataFile out + // of the unzipped file. + logger.warning("Unzipping failed; rolling back to saving the file as is."); + if (warningMessage == null) { + warningMessage = "Failed to unzip the file. Saving the file as is."; + } + + datafiles.clear(); + } catch (FileExceedsMaxSizeException femsx) { + logger.warning("One of the unzipped files exceeds the size limit; resorting to saving the file as is. " + femsx.getMessage()); + warningMessage = femsx.getMessage() + "; saving the zip file as is, unzipped."; + datafiles.clear(); + } finally { + if (unZippedIn != null) { + try {unZippedIn.close();} catch (Exception zEx) {} + } + } + if (datafiles.size() > 0) { + // link the data files to the dataset/version: + // (except we no longer want to do this! -- 4.6) + /*Iterator itf = datafiles.iterator(); + while (itf.hasNext()) { + DataFile datafile = itf.next(); + datafile.setOwner(version.getDataset()); + if (version.getFileMetadatas() == null) { + version.setFileMetadatas(new ArrayList()); } - - String fileEntryName = zipEntry.getName(); - logger.fine("ZipEntry, file: "+fileEntryName); - - if (fileEntryName != null && !fileEntryName.equals("")) { - - String shortName = fileEntryName.replaceFirst("^.*[\\/]", ""); - - // Check if it's a "fake" file - a zip archive entry - // created for a MacOS X filesystem element: (these - // start with "._") - if (!shortName.startsWith("._") && !shortName.startsWith(".DS_Store") && !"".equals(shortName)) { - // OK, this seems like an OK file entry - we'll try - // to read it and create a DataFile with it: - - File unZippedTempFile = saveInputStreamInTempFile(unZippedIn, fileSizeLimit); - DataFile datafile = createSingleDataFile(version, unZippedTempFile, shortName, MIME_TYPE_UNDETERMINED_DEFAULT, systemConfig.getFileFixityChecksumAlgorithm(), false); - - if (!fileEntryName.equals(shortName)) { - // If the filename looks like a hierarchical folder name (i.e., contains slashes and backslashes), - // we'll extract the directory name; then subject it to some "aggressive sanitizing" - strip all - // the leading, trailing and duplicate slashes; then replace all the characters that - // don't pass our validation rules. 
- String directoryName = fileEntryName.replaceFirst("[\\\\/][\\\\/]*[^\\\\/]*$", ""); - directoryName = StringUtil.sanitizeFileDirectory(directoryName, true); - //if (!"".equals(directoryName)) { - if (!StringUtil.isEmpty(directoryName)) { - logger.fine("setting the directory label to " + directoryName); - datafile.getFileMetadata().setDirectoryLabel(directoryName); - } - } - - if (datafile != null) { - // We have created this datafile with the mime type "unknown"; - // Now that we have it saved in a temporary location, - // let's try and determine its real type: - - String tempFileName = getFilesTempDirectory() + "/" + datafile.getStorageIdentifier(); - - try { - recognizedType = determineFileType(new File(tempFileName), shortName); - logger.fine("File utility recognized unzipped file as " + recognizedType); - if (recognizedType != null && !recognizedType.equals("")) { - datafile.setContentType(recognizedType); - } - } catch (Exception ex) { - logger.warning("Failed to run the file utility mime type check on file " + fileName); - } - - datafiles.add(datafile); - } - } - } - } - unZippedIn.closeEntry(); + version.getFileMetadatas().add(datafile.getFileMetadata()); + datafile.getFileMetadata().setDatasetVersion(version); - } - - } catch (IOException ioex) { - // just clear the datafiles list and let - // ingest default to creating a single DataFile out - // of the unzipped file. - logger.warning("Unzipping failed; rolling back to saving the file as is."); - if (warningMessage == null) { - warningMessage = "Failed to unzip the file. Saving the file as is."; - } - - datafiles.clear(); - } catch (FileExceedsMaxSizeException femsx) { - logger.warning("One of the unzipped files exceeds the size limit; resorting to saving the file as is. " + femsx.getMessage()); - warningMessage = femsx.getMessage() + "; saving the zip file as is, unzipped."; - datafiles.clear(); - } finally { - if (unZippedIn != null) { - try {unZippedIn.close();} catch (Exception zEx) {} - } - } - if (datafiles.size() > 0) { - // link the data files to the dataset/version: - // (except we no longer want to do this! -- 4.6) - /*Iterator itf = datafiles.iterator(); - while (itf.hasNext()) { - DataFile datafile = itf.next(); - datafile.setOwner(version.getDataset()); - if (version.getFileMetadatas() == null) { - version.setFileMetadatas(new ArrayList()); - } - version.getFileMetadatas().add(datafile.getFileMetadata()); - datafile.getFileMetadata().setDatasetVersion(version); - - version.getDataset().getFiles().add(datafile); - } */ - // remove the uploaded zip file: - try { - Files.delete(tempFile); - } catch (IOException ioex) { - // do nothing - it's just a temp file. 
- logger.warning("Could not remove temp file "+tempFile.getFileName().toString()); - } - // and return: - return datafiles; - } - - } else if (finalType.equalsIgnoreCase(ShapefileHandler.SHAPEFILE_FILE_TYPE)) { - // Shape files may have to be split into multiple files, - // one zip archive per each complete set of shape files: - - //File rezipFolder = new File(this.getFilesTempDirectory()); - File rezipFolder = getShapefileUnzipTempDirectory(); - - IngestServiceShapefileHelper shpIngestHelper; - shpIngestHelper = new IngestServiceShapefileHelper(tempFile.toFile(), rezipFolder); - - boolean didProcessWork = shpIngestHelper.processFile(); - if (!(didProcessWork)){ - logger.severe("Processing of zipped shapefile failed."); - return null; - } - - try { - for (File finalFile : shpIngestHelper.getFinalRezippedFiles()) { - FileInputStream finalFileInputStream = new FileInputStream(finalFile); - finalType = determineContentType(finalFile); - if (finalType == null) { - logger.warning("Content type is null; but should default to 'MIME_TYPE_UNDETERMINED_DEFAULT'"); - continue; - } - - File unZippedShapeTempFile = saveInputStreamInTempFile(finalFileInputStream, fileSizeLimit); - DataFile new_datafile = createSingleDataFile(version, unZippedShapeTempFile, finalFile.getName(), finalType, systemConfig.getFileFixityChecksumAlgorithm()); - if (new_datafile != null) { - datafiles.add(new_datafile); - } else { - logger.severe("Could not add part of rezipped shapefile. new_datafile was null: " + finalFile.getName()); - } - finalFileInputStream.close(); - - } - } catch (FileExceedsMaxSizeException femsx) { - logger.severe("One of the unzipped shape files exceeded the size limit; giving up. " + femsx.getMessage()); - datafiles.clear(); - } - - // Delete the temp directory used for unzipping - // The try-catch is due to error encountered in using NFS for stocking file, - // cf. https://github.com/IQSS/dataverse/issues/5909 - try { - FileUtils.deleteDirectory(rezipFolder); - } catch (IOException ioex) { - // do nothing - it's a tempo folder. - logger.warning("Could not remove temp folder, error message : " + ioex.getMessage()); - } - - if (datafiles.size() > 0) { - // remove the uploaded zip file: - try { - Files.delete(tempFile); - } catch (IOException ioex) { - // do nothing - it's just a temp file. - logger.warning("Could not remove temp file " + tempFile.getFileName().toString()); - } catch (SecurityException se) { - logger.warning("Unable to delete: " + tempFile.toString() + "due to Security Exception: " - + se.getMessage()); - } - return datafiles; - }else{ - logger.severe("No files added from directory of rezipped shapefiles"); - } - return null; - - } + version.getDataset().getFiles().add(datafile); + } */ + // remove the uploaded zip file: + try { + Files.delete(tempFile); + } catch (IOException ioex) { + // do nothing - it's just a temp file. 
+ logger.warning("Could not remove temp file " + tempFile.getFileName().toString()); + } + // and return: + return datafiles; + } + + } else if (finalType.equalsIgnoreCase(ShapefileHandler.SHAPEFILE_FILE_TYPE)) { + // Shape files may have to be split into multiple files, + // one zip archive per each complete set of shape files: + + // File rezipFolder = new File(this.getFilesTempDirectory()); + File rezipFolder = getShapefileUnzipTempDirectory(); + + IngestServiceShapefileHelper shpIngestHelper; + shpIngestHelper = new IngestServiceShapefileHelper(tempFile.toFile(), rezipFolder); + + boolean didProcessWork = shpIngestHelper.processFile(); + if (!(didProcessWork)) { + logger.severe("Processing of zipped shapefile failed."); + return null; + } + + try { + for (File finalFile : shpIngestHelper.getFinalRezippedFiles()) { + FileInputStream finalFileInputStream = new FileInputStream(finalFile); + finalType = determineContentType(finalFile); + if (finalType == null) { + logger.warning("Content type is null; but should default to 'MIME_TYPE_UNDETERMINED_DEFAULT'"); + continue; + } + + File unZippedShapeTempFile = saveInputStreamInTempFile(finalFileInputStream, fileSizeLimit); + DataFile new_datafile = createSingleDataFile(version, unZippedShapeTempFile, finalFile.getName(), finalType, systemConfig.getFileFixityChecksumAlgorithm()); + if (new_datafile != null) { + datafiles.add(new_datafile); + } else { + logger.severe("Could not add part of rezipped shapefile. new_datafile was null: " + finalFile.getName()); + } + finalFileInputStream.close(); + + } + } catch (FileExceedsMaxSizeException femsx) { + logger.severe("One of the unzipped shape files exceeded the size limit; giving up. " + femsx.getMessage()); + datafiles.clear(); + } + + // Delete the temp directory used for unzipping + // The try-catch is due to error encountered in using NFS for stocking file, + // cf. https://github.com/IQSS/dataverse/issues/5909 + try { + FileUtils.deleteDirectory(rezipFolder); + } catch (IOException ioex) { + // do nothing - it's a tempo folder. + logger.warning("Could not remove temp folder, error message : " + ioex.getMessage()); + } + + if (datafiles.size() > 0) { + // remove the uploaded zip file: + try { + Files.delete(tempFile); + } catch (IOException ioex) { + // do nothing - it's just a temp file. 
+ logger.warning("Could not remove temp file " + tempFile.getFileName().toString()); + } catch (SecurityException se) { + logger.warning("Unable to delete: " + tempFile.toString() + "due to Security Exception: " + + se.getMessage()); + } + return datafiles; + } else { + logger.severe("No files added from directory of rezipped shapefiles"); + } + return null; + + } + } else { + //Remote file, trust supplier + finalType = suppliedContentType; + } // Finally, if none of the special cases above were applicable (or // if we were unable to unpack an uploaded file, etc.), we'll just // create and return a single DataFile: + File newFile = null; + if(tempFile!=null) { + newFile = tempFile.toFile(); + } + ChecksumType checkSumType = DataFile.ChecksumType.MD5; + if(newStorageIdentifier==null) { + checkSumType=systemConfig.getFileFixityChecksumAlgorithm(); + } - DataFile datafile = createSingleDataFile(version, tempFile.toFile(), fileName, finalType, systemConfig.getFileFixityChecksumAlgorithm()); - - if (datafile != null && tempFile.toFile() != null) { + DataFile datafile = createSingleDataFile(version, newFile, newStorageIdentifier, fileName, finalType, checkSumType, newCheckSum); + File f = null; + if(tempFile!=null) { + f=tempFile.toFile(); + } + if (datafile != null && ((f != null) || (newStorageIdentifier!=null))) { if (warningMessage != null) { createIngestFailureReport(datafile, warningMessage); @@ -1133,14 +1150,18 @@ private static File saveInputStreamInTempFile(InputStream inputStream, Long file * individual files, etc., and once the file name and mime type have already * been figured out. */ - + private static DataFile createSingleDataFile(DatasetVersion version, File tempFile, String fileName, String contentType, DataFile.ChecksumType checksumType) { - return createSingleDataFile(version, tempFile, fileName, contentType, checksumType, false); + return createSingleDataFile(version, tempFile, null, fileName, contentType, checksumType, null, false); + } + + private static DataFile createSingleDataFile(DatasetVersion version, File tempFile, String storageIdentifier, String fileName, String contentType, DataFile.ChecksumType checksumType, String checksum) { + return createSingleDataFile(version, tempFile, storageIdentifier, fileName, contentType, checksumType, checksum, false); } - private static DataFile createSingleDataFile(DatasetVersion version, File tempFile, String fileName, String contentType, DataFile.ChecksumType checksumType, boolean addToDataset) { + private static DataFile createSingleDataFile(DatasetVersion version, File tempFile, String storageIdentifier, String fileName, String contentType, DataFile.ChecksumType checksumType, String checksum, boolean addToDataset) { - if (tempFile == null) { + if ((tempFile == null) && (storageIdentifier == null)) { return null; } @@ -1171,20 +1192,27 @@ private static DataFile createSingleDataFile(DatasetVersion version, File tempFi fmd.setDatasetVersion(version); version.getDataset().getFiles().add(datafile); } - + if(storageIdentifier==null) { generateStorageIdentifier(datafile); if (!tempFile.renameTo(new File(getFilesTempDirectory() + "/" + datafile.getStorageIdentifier()))) { return null; } - - try { - // We persist "SHA1" rather than "SHA-1". 
- datafile.setChecksumType(checksumType); - datafile.setChecksumValue(calculateChecksum(getFilesTempDirectory() + "/" + datafile.getStorageIdentifier(), datafile.getChecksumType())); - } catch (Exception cksumEx) { - logger.warning("Could not calculate " + checksumType + " signature for the new file " + fileName); + } else { + datafile.setStorageIdentifier(storageIdentifier); } + if ((checksum !=null)&&(!checksum.isEmpty())) { + datafile.setChecksumType(checksumType); + datafile.setChecksumValue(checksum); + } else { + try { + // We persist "SHA1" rather than "SHA-1". + datafile.setChecksumType(checksumType); + datafile.setChecksumValue(calculateChecksum(getFilesTempDirectory() + "/" + datafile.getStorageIdentifier(), datafile.getChecksumType())); + } catch (Exception cksumEx) { + logger.warning("Could not calculate " + checksumType + " signature for the new file " + fileName); + } + } return datafile; } @@ -1617,10 +1645,40 @@ public static DatasetThumbnail getThumbnail(DataFile file) { public static boolean isPackageFile(DataFile dataFile) { return DataFileServiceBean.MIME_TYPE_PACKAGE_FILE.equalsIgnoreCase(dataFile.getContentType()); } + + public static S3AccessIO getS3AccessForDirectUpload(Dataset dataset) { + String driverId = dataset.getDataverseContext().getEffectiveStorageDriverId(); + boolean directEnabled = Boolean.getBoolean("dataverse.files." + driverId + ".upload-redirect"); + //Should only be requested when it is allowed, but we'll log a warning otherwise + if(!directEnabled) { + logger.warning("Direct upload not supported for files in this dataset: " + dataset.getId()); + return null; + } + S3AccessIO s3io = null; + String bucket = System.getProperty("dataverse.files." + driverId + ".bucket-name") + "/"; + String sid = null; + int i=0; + while (s3io==null && i<5) { + sid = bucket+ dataset.getAuthorityForFileStorage() + "/" + dataset.getIdentifierForFileStorage() + "/" + FileUtil.generateStorageIdentifier(); + try { + s3io = new S3AccessIO(sid, driverId); + if(s3io.exists()) { + s3io=null; + i=i+1; + } + + } catch (Exception e) { + i=i+1; + } + } + return s3io; + } + public static String getStorageIdentifierFromLocation(String location) { int driverEnd = location.indexOf("://") + 3; int bucketEnd = driverEnd + location.substring(driverEnd).indexOf("/"); return location.substring(0,bucketEnd) + ":" + location.substring(location.lastIndexOf("/") + 1); } + } diff --git a/src/main/java/edu/harvard/iq/dataverse/util/SystemConfig.java b/src/main/java/edu/harvard/iq/dataverse/util/SystemConfig.java index 8d0cb276a93..aefb01992f4 100644 --- a/src/main/java/edu/harvard/iq/dataverse/util/SystemConfig.java +++ b/src/main/java/edu/harvard/iq/dataverse/util/SystemConfig.java @@ -539,14 +539,10 @@ public boolean isFilesOnDatasetPageFromSolr() { return settingsService.isTrueForKey(SettingsServiceBean.Key.FilesOnDatasetPageFromSolr, safeDefaultIfKeyNotFound); } - public Long getMaxFileUploadSize(){ - return settingsService.getValueForKeyAsLong(SettingsServiceBean.Key.MaxFileUploadSizeInBytes); + public Long getMaxFileUploadSizeForStore(String driverId){ + return settingsService.getValueForCompoundKeyAsLong(SettingsServiceBean.Key.MaxFileUploadSizeInBytes, driverId); } - public String getHumanMaxFileUploadSize(){ - return bytesToHumanReadable(getMaxFileUploadSize()); - } - public Integer getSearchHighlightFragmentSize() { String fragSize = settingsService.getValueForKey(SettingsServiceBean.Key.SearchHighlightFragmentSize); if (fragSize != null) { diff --git 
a/src/main/java/propertyFiles/Bundle.properties b/src/main/java/propertyFiles/Bundle.properties index 35229c5162d..2f89a3742ae 100755 --- a/src/main/java/propertyFiles/Bundle.properties +++ b/src/main/java/propertyFiles/Bundle.properties @@ -695,7 +695,6 @@ dataverse.host.autocomplete.nomatches=No matches dataverse.identifier.title=Short name used for the URL of this dataverse. dataverse.affiliation.title=The organization with which this dataverse is affiliated. dataverse.storage.title=A storage service to be used for datasets in this dataverse. -dataverse.storage.usedefault=Use Default dataverse.category=Category dataverse.category.title=The type that most closely reflects this dataverse. dataverse.type.selectTab.top=Select one... diff --git a/src/main/webapp/dataset.xhtml b/src/main/webapp/dataset.xhtml index db7620bab71..d907aa5aceb 100644 --- a/src/main/webapp/dataset.xhtml +++ b/src/main/webapp/dataset.xhtml @@ -1363,7 +1363,6 @@
- diff --git a/src/main/webapp/editFilesFragment.xhtml b/src/main/webapp/editFilesFragment.xhtml index 811e6c4d55f..7f7d3e5c594 100644 --- a/src/main/webapp/editFilesFragment.xhtml +++ b/src/main/webapp/editFilesFragment.xhtml @@ -11,7 +11,9 @@ xmlns:o="http://omnifaces.org/ui" xmlns:iqbs="http://xmlns.jcp.org/jsf/composite/iqbs"> - + + +
    @@ -71,8 +73,8 @@ - + rendered="#{!EditDatafilesPage.isUnlimitedUploadFileSize()}"> +

    @@ -90,8 +92,10 @@ function uploadWidgetDropRemoveMsg() { $('div[id$="fileUpload"] div.ui-fileupload-content div#dragdropMsg').remove(); } + $(document).ready(function () { uploadWidgetDropMsg(); + setupDirectUpload(#{EditDatafilesPage.directUploadEnabled()}, #{EditDatafilesPage.workingVersion.dataset.id}); }); //]]> @@ -123,7 +127,7 @@ + + diff --git a/src/main/webapp/resources/css/structure.css b/src/main/webapp/resources/css/structure.css index b5c764c1e15..e6fd0a110b6 100644 --- a/src/main/webapp/resources/css/structure.css +++ b/src/main/webapp/resources/css/structure.css @@ -972,3 +972,12 @@ span.ui-autocomplete input.ui-autocomplete-input {width:100%;} #citation-banner {width:100%; height:45px; position: absolute; z-index: 999999; border-radius: 0; border-width: 0 0 1px 0;} #citation-banner a.close, #citation-banner a.close span.glyphicon {line-height:.2;} #citation-forward {position: absolute; top:45px; height: calc(100% - 65px); border:0; background:url(/resources/images/ajax-loading.gif) no-repeat 50% 50%;} + +/*Direct upload progress bar*/ +progress::-webkit-progress-bar { + background-color:white; +} + +progress::-webkit-progress-value { + background-color:green; +} diff --git a/src/main/webapp/resources/js/fileupload.js b/src/main/webapp/resources/js/fileupload.js index f26218913f3..c47ce524dc7 100644 --- a/src/main/webapp/resources/js/fileupload.js +++ b/src/main/webapp/resources/js/fileupload.js @@ -1,42 +1,201 @@ +var fileList = []; +var observer2=null; +var datasetId=null; +//How many files have started being processed but aren't yet being uploaded +var filesInProgress=0; +//The # of the current file being processed (total number of files for which upload has at least started) +var curFile=0; +//The number of upload ids that have been assigned in the files table +var getUpId = (function () { + var counter = -1; + return function () {counter += 1; return counter} +})(); +//How many files are completely done +var finishFile = (function () { + var counter = 0; + return function () {counter += 1; return counter} +})(); + + +function setupDirectUpload(enabled, theDatasetId) { + if(enabled) { + datasetId=theDatasetId; + $('.ui-fileupload-upload').hide(); + $('.ui-fileupload-cancel').hide(); + //Catch files entered via upload dialog box. 
Since this 'select' widget is replaced by PF, we need to add a listener again when it is replaced + var fileInput=document.getElementById('datasetForm:fileUpload_input'); + fileInput.addEventListener('change', function(event) { + fileList=[]; + for(var i=0;i').attr('class', 'ui-progressbar ui-widget ui-widget-content ui-corner-all')); + $.ajax({ + url: url, + headers: {"x-amz-tagging":"dv-state=temp"}, + type: 'PUT', + data: file, + cache: false, + processData: false, + success: function () { + reportUpload(storageId, file) + }, + error: function(jqXHR, textStatus, errorThrown) { + + console.log('Failure: ' + jqXHR.status); + console.log('Failure: ' + errorThrown); + uploadFailure(jqXHR, thisFile); + }, + xhr: function() { + var myXhr = $.ajaxSettings.xhr(); + if(myXhr.upload) { + myXhr.upload.addEventListener('progress', function(e) { + if(e.lengthComputable) { + var doublelength = 2 * e.total; + progBar.children('progress').attr({ + value:e.loaded, + max:doublelength + }); + } + }); + } + return myXhr; + } + }); +} + +function reportUpload(storageId, file){ + console.log('S3 Upload complete for ' + file.name + ' : ' + storageId); + getMD5( + file, + prog => { + + var current = 1 + prog; + $('progress').attr({ + value:current, + max:2 + }); + } + ).then( + md5 => { + //storageId is not the location - has a : separator and no path elements from dataset + //(String uploadComponentId, String fullStorageIdentifier, String fileName, String contentType, String checksumType, String checksumValue) + handleExternalUpload([{name:'uploadComponentId', value:'datasetForm:fileUpload'}, {name:'fullStorageIdentifier', value:storageId}, {name:'fileName', value:file.name}, {name:'contentType', value:file.type}, {name:'checksumType', value:'MD5'}, {name:'checksumValue', value:md5}]); + }, + err => console.error(err) + ); +} + function removeErrors() { - var errors = document.getElementsByClassName("ui-fileupload-error"); + var errors = document.getElementsByClassName("ui-fileupload-error"); for(i=errors.length-1; i >=0; i--) { errors[i].parentNode.removeChild(errors[i]); } } + var observer=null; + function uploadStarted() { - // If this is not the first upload, remove error messages since - // the upload of any files that failed will be tried again. 
- removeErrors(); - var curId=0; - //Find the upload table body - var files = $('.ui-fileupload-files .ui-fileupload-row'); - //Add an id attribute to each entry so we can later match errors with the right entry - for(i=0;i< files.length;i++) { - files[i].setAttribute('upid', curId); - curId = curId+1; - } - //Setup an observer to watch for additional rows being added - var config={childList: true}; - var callback = function(mutations) { - //Add an id attribute to all new entries - mutations.forEach(function(mutation) { - for(i=0; i= fileSize) { + endCallback(null); + return; + } + readNext(); + }; + + reader.onerror = function(err) { + endCallback(err || {}); + }; + + function readNext() { + var fileSlice = file.slice(offset, offset + chunkSize); + reader.readAsBinaryString(fileSlice); + } + readNext(); +} + +function getMD5(blob, cbProgress) { + return new Promise((resolve, reject) => { + var md5 = CryptoJS.algo.MD5.create(); + readChunked(blob, (chunk, offs, total) => { + md5.update(CryptoJS.enc.Latin1.parse(chunk)); + if (cbProgress) { + cbProgress(offs / total); + } + }, err => { + if (err) { + reject(err); + } else { + // TODO: Handle errors + var hash = md5.finalize(); + var hashHex = hash.toString(CryptoJS.enc.Hex); + resolve(hashHex); + } + }); + }); } diff --git a/src/test/java/edu/harvard/iq/dataverse/datasetutility/FileSizeCheckerTest.java b/src/test/java/edu/harvard/iq/dataverse/datasetutility/FileSizeCheckerTest.java index e562e3f9e0c..824dc6794fe 100644 --- a/src/test/java/edu/harvard/iq/dataverse/datasetutility/FileSizeCheckerTest.java +++ b/src/test/java/edu/harvard/iq/dataverse/datasetutility/FileSizeCheckerTest.java @@ -7,22 +7,13 @@ import static edu.harvard.iq.dataverse.datasetutility.FileSizeChecker.bytesToHumanReadable; -import edu.harvard.iq.dataverse.datasetutility.FileSizeChecker.FileSizeResponse; import edu.harvard.iq.dataverse.util.BundleUtil; -import edu.harvard.iq.dataverse.util.SystemConfig; - import java.util.ArrayList; import java.util.Arrays; import java.util.List; import org.junit.jupiter.api.Test; -import org.junit.jupiter.params.ParameterizedTest; -import org.junit.jupiter.params.provider.ValueSource; - import static org.junit.jupiter.api.Assertions.assertEquals; -import static org.junit.jupiter.api.Assertions.assertFalse; -import static org.junit.jupiter.api.Assertions.assertThrows; -import static org.junit.jupiter.api.Assertions.assertTrue; /** * @@ -44,62 +35,4 @@ public void testBytesToHumanReadable() { assertEquals(expAns, ans); assertEquals(expLongAns, longAns); } - - @Test - public void testIsAllowedFileSize_throwsOnNull() { - FileSizeChecker fileSizeChecker = new FileSizeChecker(new SystemConfig() { - @Override - public Long getMaxFileUploadSize() { - return 1000L; - } - }); - assertThrows(NullPointerException.class, () -> { - fileSizeChecker.isAllowedFileSize(null); - }); - } - - @ParameterizedTest - @ValueSource(longs = { 0L, 999L, 1000L }) - public void testIsAllowedFileSize_allowsSmallerOrEqualFileSize(Long fileSize) { - // initialize a system config and instantiate a file size checker - // override the max file upload side to allow for testing - FileSizeChecker fileSizeChecker = new FileSizeChecker(new SystemConfig() { - @Override - public Long getMaxFileUploadSize() { - return 1000L; - } - }); - FileSizeResponse response = fileSizeChecker.isAllowedFileSize(fileSize); - assertTrue(response.fileSizeOK); - } - - @ParameterizedTest - @ValueSource(longs = { 1001L, Long.MAX_VALUE }) - public void testIsAllowedFileSize_rejectsBiggerFileSize(Long 
fileSize) { - // initialize a system config and instantiate a file size checker - // override the max file upload side to allow for testing - FileSizeChecker fileSizeChecker = new FileSizeChecker(new SystemConfig() { - @Override - public Long getMaxFileUploadSize() { - return 1000L; - } - }); - FileSizeResponse response = fileSizeChecker.isAllowedFileSize(fileSize); - assertFalse(response.fileSizeOK); - } - - @ParameterizedTest - @ValueSource(longs = { 0L, 1000L, Long.MAX_VALUE }) - public void testIsAllowedFileSize_allowsOnUnboundedFileSize(Long fileSize) { - // initialize a system config and instantiate a file size checker - // ensure that a max filesize is not set - FileSizeChecker unboundedFileSizeChecker = new FileSizeChecker(new SystemConfig() { - @Override - public Long getMaxFileUploadSize() { - return null; - } - }); - FileSizeResponse response = unboundedFileSizeChecker.isAllowedFileSize(fileSize); - assertTrue(response.fileSizeOK); - } }
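Reviewer note (not part of the diff): below is a minimal sketch, assuming the Dataverse classes changed above are on the classpath, of the kind of jsonData payload the extended OptionalFileParams is written to parse for a directly uploaded file. The attribute names (storageIdentifier, fileName, mimeType, md5Hash) come from the constants added in this change; the concrete identifier, file name, and checksum values are made up for illustration.

```java
// Sketch only - not part of this change set. Exercises the new OptionalFileParams
// fields added for direct upload. The storage identifier, file name, and checksum
// values are hypothetical examples.
import edu.harvard.iq.dataverse.datasetutility.OptionalFileParams;

public class DirectUploadParamsExample {
    public static void main(String[] args) throws Exception { // constructor may throw DataFileTagException
        String jsonData = "{"
                + "\"storageIdentifier\": \"s3://demo-bucket:176e28068b0-1c3f80357c42\"," // hypothetical id
                + "\"fileName\": \"observations.csv\","
                + "\"mimeType\": \"text/csv\","
                + "\"md5Hash\": \"0bd1c92b2cdf465a9f1c813d4f98e9c7\""
                + "}";

        OptionalFileParams params = new OptionalFileParams(jsonData);

        // Accessors added in this change:
        System.out.println(params.hasStorageIdentifier()); // true
        System.out.println(params.getFileName());          // observations.csv
        System.out.println(params.getMimeType());          // text/csv
        System.out.println(params.getCheckSum());          // the md5 value above
    }
}
```

For comparison, the browser path in fileupload.js above supplies the analogous fields (fullStorageIdentifier, fileName, contentType, checksumType, checksumValue) to handleExternalUpload after the S3 PUT completes.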