Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

7527 redetect api error #7631

Closed
wants to merge 4 commits into from
Closed

7527 redetect api error #7631

wants to merge 4 commits into from

Conversation

landreev
Copy link
Contributor

@landreev landreev commented Feb 23, 2021

What this PR does / why we need it:

See the discussion on slack for what's going on in this PR.

Which issue(s) this PR closes:

Closes #7527 (maybe?)

Special notes for your reviewer:

See the discussion on slack.

Suggestions on how to test this:

There are 2 parts of the issue. The first one is trivial - the filename/extension were not being taken into account by the redetect API; this is fairly straightforward to test.

The second part, the strange EJB rollback, is completely counter-intuitive. I have no idea how to test it, seeing how I haven't been able to reproduce it outside of prod. For example, the API consistently fails in prod. with this EJB error for the following datafile:

curl -X POST "http://localhost:8080/api/files/4325124/redetect?key=..."

however, it's working perfectly fine for this id on vm-5, which is also running 5.3 on a copy of the prod. db.

It doesn't happen for any file in prod. either. It was in fact a matter of considerable luck/coincidence that when the remote installation reported the error, Danny tried the API on a random file in our archive, and got the same exact error they were getting. (But at least this means that the condition is not limited to just our server).

I didn't have any other way to fix/investigate this than by deploying a modified 5.3 in prod. on a weekend. Once again, I'm not sure how to QA this.

(edit: I will see if I can find a datafile that is similarly failing on vm-5, by trial and error)

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

Is there a release notes update needed for this change?:

Additional documentation:

@landreev landreev marked this pull request as draft February 23, 2021 20:04
@landreev landreev marked this pull request as ready for review February 23, 2021 20:05
Copy link
Contributor

@scolapasta scolapasta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The specific comments here are small. The general thing is about whether the API class should be an ejb or not. I wrote more extensively in slack (and we are still discussing) but the TLDR is: API classes should operate similar to backing beans and NOT be ejbs. (I think)

if (settingsService != null && settingsService.isTrueForKey(SettingsServiceBean.Key.ExcludeEmailFromExport, false) && DatasetFieldType.FieldType.EMAIL.equals(pv.getDatasetField().getDatasetFieldType().getFieldType())) {

} else if (datasetFieldType.isPrimitive()) {
for (DatasetFieldValue pv : sort(fld.getDatasetFieldValues(), DatasetFieldValue.DisplayOrder)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be fair to remove this and the comment. Any reason to keep it?


public static String determineFileType(File file) throws IOException {
return FileUtil.determineFileType(file, file.getName());
// Question: why do we need this utility? - as opposed to just calling the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good question - I vote we just remove, up to you if you want to handle in this PR though. (as it would require changing the places you call it; maybe moving the test? Though my guess is if we don't now, we never will.

@scolapasta scolapasta assigned landreev and unassigned scolapasta Mar 3, 2021
@djbrooke djbrooke self-assigned this Sep 1, 2021
@donsizemore
Copy link
Contributor

Just a note that Odum has hit this problem in certain of its datafiles. I can provide more information about our datafiles and filemetadata if it would be helpful, but the results are identical to those described by Mr. Ferey.

@kaitlinnewson
Copy link
Contributor

Hi all, what is the status of this PR? We are also having this issue and would love to be able to clean up some of our files marked as "Unknown" formats.

@landreev
Copy link
Contributor Author

I will close this PR in favor of #8835 that fixes the file name issue.
I am concerned about the second part of #7527 - the mysterious TransactionRolledbackLocalException that's triggered by some specific files on some installations. See the description of this PR for more info.

I just checked and it is still happening in our prod., for that specific file (id 4325124). It would be nice to figure out what is happening there. I will open a new issue for just that part, and then close this PR. My gut feeling is that it has got nothing to do with the actual functionality in the redetect command.

@landreev
Copy link
Contributor Author

landreev commented Nov 4, 2022

Opened #9142 to investigate the remaining mystery part (the selectively-happening weird EJB rollback). Closing this.

@landreev landreev closed this Nov 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

Error with Redetect File Type API
6 participants