rewrote/rearranged the troubleshooting guide for invalid files. (#6558)
landreev committed Apr 3, 2020
1 parent 2a6411a commit bc6e37f
Showing 1 changed file with 11 additions and 4 deletions.
15 changes: 11 additions & 4 deletions doc/sphinx-guides/source/admin/troubleshooting.rst
See the :doc:`/api/intro` section of the API Guide for a high level overview of locks.
A Dataset Is Locked And Cannot Be Edited or Published
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There are several types of dataset locks. Locks can be managed using the locks API, or by accessing them directly in the database. Internally, locks are maintained in the ``DatasetLock`` database table, with the ``dataset_id`` field linking them to specific datasets, and the ``reason`` column specifying the type of lock.
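
For example, the locks on a dataset can be inspected with the locks API, or with a direct database query. The dataset id ``1234``, the server URL, and the database name ``dvndb`` below are placeholders - adjust them for your installation:

.. code-block:: bash

  # List the locks on dataset 1234 via the native API:
  curl "http://localhost:8080/api/datasets/1234/locks"

  # Or look at the DatasetLock table directly in the database:
  sudo -u postgres psql dvndb -c "SELECT dataset_id, reason, info FROM datasetlock;"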

A dataset is locked with a lock of type "finalizePublication" (the lock type appears in the "reason" column of the DatasetLock table) while the persistent identifiers of the datafiles in the dataset are registered or updated, and/or while the physical files are being validated by recalculating the checksums and verifying them against the values stored in the database, before the publication process can be complete (Note that either of the two tasks can be disabled with database options - see :doc:`config`). If a dataset has been stuck in this state for a long period of time, check the "Info" value of the entry in the corresponding DatasetLock table. If it says "FILE VALIDATION ERROR" - it really means that one or more of the files have failed the validation and the problem must be resolved (or the datafile purged from the dataset) before you delete the lock and advice the user to try publishing again. Real issues that have resulted in corrupted datafiles during normal operation of Dataverse in the past: Botched file deletes - while a datafile is in DRAFT, attempting to delete it from the dataset also involved deleting the physical files. In the past we've observed partially successful deletes, that would fail to delete the entry from the database, after having successfully removed the physical files - resulting in a datafile linked to a missing file. We believe we have addressed what was causing this condition so it shouldn't happen again - there may be a datafile in this state in your database. Solving the issue would involve either restoring the file from backups, or, if that is not an option, purging the datafile from the databaes and asking the user to upload the file again. Another real life condition we've seen: a failed tabular data ingest that leaves the datafile un-ingested, BUT with the physical file already replaced by the generated tab-delimited version of the data. This datafile will fail to validate because the checksum in the database is of the original file and will not match that of the tab-delimited version. To fix: luckily, this is easily reversable, since the uploaded original should be saved in your storage, with the .orig extension. Simply swapping the .orig copy with the main file associated with the datafile will fix it. Similarly, we believe this condition should not happen again in Dataverse versions 4.20+, but you may have some legacy cases on your server. The goal of the validation framework is to catch these types of conditions while the dataset is still in DRAFT.
It's normal for the ingest process described in the :doc:`/user/tabulardataingest/ingestprocess` section of the User Guide to take some time, but if hours or days have passed and the dataset is still locked, you might want to inspect the locks and consider deleting some or all of them. It is recommended to restart the application server if you are deleting an ingest lock, to make sure the ingest job is no longer running in the background. Ingest locks are identified by the label ``Ingest`` in the ``reason`` column of the ``DatasetLock`` table in the database.
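
For example, a stuck ``Ingest`` lock can be removed via the locks API (a superuser API token is assumed here, and the dataset id is a placeholder), followed by an application server restart:

.. code-block:: bash

  # Remove the Ingest lock(s) on dataset 1234, then restart the application server:
  curl -X DELETE -H "X-Dataverse-key: $SUPERUSER_API_TOKEN" \
    "http://localhost:8080/api/datasets/1234/locks?type=Ingest"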

A dataset is locked with a lock of type ``finalizePublication`` while the persistent identifiers for the datafiles in the dataset are registered or updated, and/or while the physical files are being validated by recalculating the checksums and verifying them against the values stored in the database, before the publication process can be completed. (Note that either of the two tasks can be disabled via database options - see :doc:`config`.) If a dataset has been in this state for a long period of time (hours or longer), it is somewhat safe to assume that it is stuck (for example, the process may have been interrupted by an application server restart or a system crash), so you may want to remove the lock (to be safe, restart the application server as well, to make sure the job is no longer running in the background) and advise the user to try publishing again. See :doc:`dataverses-datasets` for more information on publishing.
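
If you do decide to remove a stuck ``finalizePublication`` lock, a sketch along these lines can be used (the dataset id is a placeholder, and a superuser API token is assumed):

.. code-block:: bash

  # Check the "info" value of the lock first, then remove it and restart the application server:
  curl "http://localhost:8080/api/datasets/1234/locks"
  curl -X DELETE -H "X-Dataverse-key: $SUPERUSER_API_TOKEN" \
    "http://localhost:8080/api/datasets/1234/locks?type=finalizePublication"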

If any files in the dataset fail the validation above, the dataset will be left locked with a ``DatasetLock.Reason=FileValidationFailed``. The user will be notified that they need to contact their Dataverse support in order to address the issue before another attempt to publish can be made. The admin will have to address and fix the underlying problems (by either restoring the missing or corrupted files, or by purging the affected files from the dataset) before deleting the lock and advising the user to try to publish again. The goal of the validation framework is to catch these types of conditions while the dataset is still in DRAFT.
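
For example, datasets left in this state can be located with a query along these lines (the database name ``dvndb`` is, again, only an example):

.. code-block:: bash

  # Find datasets locked because file validation failed:
  sudo -u postgres psql dvndb -c \
    "SELECT dataset_id, info FROM datasetlock WHERE reason='FileValidationFailed';"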

The following are two real-life examples of problems that have resulted in corrupted datafiles during normal operation of Dataverse:

1. Botched file deletes - while a datafile is in DRAFT, attempting to delete it from the dataset involves deleting both the ``DataFile`` database table entry and the physical file. (Deleting a datafile from a *published* version merely removes it from future versions, but keeps the file in the dataset.) The problem we've observed in early versions of Dataverse was a *partially successful* delete, where the database transaction would fail (for whatever reason), but only after the physical file had already been deleted from the filesystem, resulting in a datafile entry remaining in the dataset with the corresponding physical file missing. We believe we have addressed the issue that was making this condition possible, so it shouldn't happen again - but there may be a datafile in this state in your database. Assuming the user's intent was in fact to delete the file, the easiest solution is simply to confirm it and purge the datafile entity from the database. Otherwise the file needs to be restored from backups, or obtained from the user and copied back into storage.
2. Another issue we've observed: a failed tabular data ingest that leaves the datafile un-ingested, but with the physical file already replaced by the generated tab-delimited version of the data. This datafile will fail the validation because the checksum stored in the database is that of the file in the original format (Stata, SPSS, etc.) as uploaded by the user, and no longer matches the tab-delimited file now in storage. Luckily, this is easily reversible, since the uploaded original should be saved in your storage with the ``.orig`` extension; simply swapping the ``.orig`` copy back in as the main file associated with the datafile will fix it, as shown in the sketch after this list. Similarly, we believe this condition should not happen again in Dataverse versions 4.20+, but you may have some legacy cases on your server.
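
A minimal sketch of that swap, assuming a local filesystem store; the storage directory and the physical file name (derived from the datafile's storage identifier) below are hypothetical and will be different on your server:

.. code-block:: bash

  # Hypothetical paths - locate the dataset directory under your configured storage root:
  cd /usr/local/dvn/data/10.5072/FK2/ABCDEF

  # Keep a copy of the generated tab-delimited file, just in case:
  cp 17ab9f7a8e3-d5c2e1f4a6b7 17ab9f7a8e3-d5c2e1f4a6b7.tab.saved

  # Restore the uploaded original (saved with the .orig extension) as the main file:
  cp 17ab9f7a8e3-d5c2e1f4a6b7.orig 17ab9f7a8e3-d5c2e1f4a6b7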

The validation API will stop after encountering the first file that does not pass the validation. You can consult the server log file for the error messages indicating which file has failed, but you will likely want to review and verify all the files in the dataset before you unlock it.
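
For example, something along these lines can help locate the relevant error messages (the log location varies between installations; adjust the path to your application server domain):

.. code-block:: bash

  # Look for recent checksum/validation errors in the application server log:
  grep -i "checksum" /usr/local/glassfish4/glassfish/domains/domain1/logs/server.log | tail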

Someone Created Spam Datasets and I Need to Delete Them
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~