Ingest: Multiple (repeated) ingests of tabular data files #6510
Comments
@landreev I moved this into the ready column and we'll take it on in a sprint soon. Just to clarify, this would be a code change and some cleanup?
Correct.
This doesn't seem to have affected any older files - meaning, any tabular files ingested prior to the introduction of ingest of
The remaining tasks are mostly cleanup.
All that said, I still have no idea how it was possible for an ingested file to be added to the processing queue again. Our system of status flags (separate flags for "ingest scheduled" and "ingest in progress") was supposed to prevent that. But there must be some ways in which things can go wrong that make it possible: a page operating on a stale copy of the dataset, allowing an existing ingest lock to be bypassed? A system crash leaving the flags and the queue in an inconsistent state? Something specific to how the ingest queue is activated when files are uploaded via the API? Whatever it is, it is apparently possible. But if it does happen again, the added checks, and the fact that the ingested MIME format is no longer considered ingestable, should prevent any repeated processing.
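For illustration only, here is a rough sketch of the kind of guard described above. It is not the actual Dataverse code: the `DataFile` class, its field names, the MIME type strings, and `should_ingest` are all hypothetical, and the set of ingestable input types is an assumption. The idea is that a file is eligible for ingest only if it has no datatable yet, no ingest flag is set, and its content type is an accepted input format, with the format that ingest itself produces left out of that set.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical MIME type strings, used only for this sketch.
MIME_CSV = "text/csv"
MIME_INGESTED_TAB = "text/tab-separated-values"

# Assumption: the formats accepted as ingest input. The type that ingest
# itself produces is deliberately absent, so an already-ingested file can
# never qualify on content type alone.
INGESTABLE_INPUT_TYPES = {MIME_CSV}


@dataclass
class DataFile:
    """Hypothetical stand-in for a stored file record."""
    content_type: str
    data_table: Optional[object] = None   # set once ingest has succeeded
    ingest_scheduled: bool = False
    ingest_in_progress: bool = False


def should_ingest(f: DataFile) -> bool:
    """Return True only if the file may safely be queued for ingest."""
    # Check 1: a file that already has a datatable was ingested before.
    if f.data_table is not None:
        return False
    # Check 2: respect the existing status flags.
    if f.ingest_scheduled or f.ingest_in_progress:
        return False
    # Check 3: the content type must be an accepted input format.
    return f.content_type in INGESTABLE_INPUT_TYPES
```

With checks like these, even if a file somehow ends up on the queue a second time, the datatable check and the content type check each reject it independently of the status flags.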
Back in 2018 we enabled ingest of tab-delimited text files. (This was done as part of the CSV ingest improvement, by reusing the same parsing code with TAB as the delimiter character.)
Apparently, we now have a condition where a successfully ingested file gets picked up for ingest AGAIN, because its content type is tab-delimited, and tab-delimited files are now ingestable. This of course should never happen, because the file already has a datatable object associated with it. But apparently it occasionally does, and an ingested file gets ingested again, corrupting the tab file and the saved original in the process.
I have a list of the affected files. There are relatively few of these cases, but this is still very annoying and I believe we should treat it as urgent.