-
Notifications
You must be signed in to change notification settings - Fork 144
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
update csv importexport tool. ##Godoy0722/stable-3_3_0 #1606
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what the best place is currently for plugin documentation - I wonder if this would be more suited to the Docs Hub, e.g. in the Admin Guide's import/export section? That way it would be easier for users to find.
One extra comment... I've just noticed the number of modified lines is large, and looks like a lot of files are having the line break replaced from I'll create an issue to decrease the chances of it happening again. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @Henrique523! This was a little tricky to review as there was quite a large changeset. When working with a stable branch, I'd suggest trying to avoid introducing changes that will result in a large changeset -- code reformatting, extra refactoring, etc. That way the reviewer will have an easy time pinpointing the exact changes being proposed, and there's less chance of regressions. There's much more leeway for making bigger changes on the main
branch during a major dev cycle. I've made a few minor suggestions for change.
Hello folks, and sorry for the delay about this PR! @kaitlinnewson I read your comments and I'll make other commits to make everything in the same pattern as it should be. For @asmecher most comments and @jonasraoni too. My next commit will be a kind of undo for my auto-format so the changes will be more clear. |
@Henrique523, don't spend too much time undoing the formatting; I don't mind if it gets included here. Just a note for the future, though, so that it's easier for reviewers next time. |
Hello again! After making some updates, I have some news about this PR which I'd like to share with you. Undos:Patterns and Structure:
Bug fixes
Docs and VersionI believe I covered all requests from you in these commits. Despite that, I have one more comment, about the documentation. I'll suggest the main PKP documentation (https://docs.pkp.sfu.ca/admin-guide/3.3/en/) about this tool. This way the docs will be available in both code and the docs. If you need anything else from me, or if you find more things to solve before merging this PR, please let me know! |
I didn't review the PR, but just to get you prepared, updates will need to be forwarded to the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, @Godoy0722, I've just added a couple of comments. There are a few others that still need resolution, though.
Hello folks. I reviewed all @asmecher , @kaitlinnewson, and @jonasraoni comments and I think now all your requests and suggestions are covered. Could you please take one last look at the code? Thank you so much for your patience! |
Thanks, @Godoy0722, I've taken a last skim over it and added just a couple of comments -- we could easily go back and forth over small details like this forever, so take what you like from those and ignore the rest. But @kaitlinnewson, a last test and look over it from you would be much appreciated! |
@Godoy0722 I've added a few additional comments for things that came up when I re-tested. Looking good to me otherwise! |
Hello again! I just made more updates here. I made a rebase until the stable-3_3_0 to maintain all updates in one commit, removed the permission stuff @asmecher told me about and fix the last issues you guys sent me. Because of the rebase, I had to make a force push in my fork branch. If you need anything from me because of it please let me know! And thank you again. |
Hello folks! Any news about my updates from here? @asmecher @kaitlinnewson |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @Godoy0722! I've left some comments too 😁
Hello again, here are all my updates for my last commit:
New Tool Structure/BehaviorThe tool now is working the way as follows below:
Hope this explain helps the Code Review! If you have any questions for me, please let me know! Thanks folks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Done 😁
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Next round 😁
$authorDao->insertObject($author); | ||
} // Authors done. | ||
|
||
$sanitizedAbstract = PKPString::stripUnsafeHtml($data->abstract); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better to insert the content as it is, the sanitization will be done at the presentation layer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was a request from Michael, cause there will be another use cases for the abstract beyond the presentation layer. I asked him about this as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And what was the outcome of your conversation? All the codebase assumes the cleanup will happen at the UI.
Ah, a general comment that I forgot to leave... If you have free time, it would be interesting to break that huge method into pieces. |
Hello @jonasraoni, @kaitlinnewson, and @asmecher, I wanted to inform you of some significant updates to the codebase: I've refactored the code, implementing several improvements including private attributes for caching and static variables. The tool is now thoroughly documented with comments, and each submission process has been organized into small sections for clarity. Additionally, I've updated some deprecated methods to their current equivalents. To enhance reliability, all verifications now occur before processing begins, ensuring field correctness and consistency. As for error handling, any rows that fail validation are now recorded in a separate CSV file. This file is generated in the same directory as the client's CSV and includes an additional column detailing the reason for each row's failure. I would greatly appreciate it if you could review these changes when you have the opportunity. 😄 |
@Godoy0722 I'll aim to give this another test run later today |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Godoy0722 I've left more comments, but they are not important, just suggestions/nitpicks... Also, I didn't test anything, but looks like @kaitlinnewson will do it 🎉
I think it's looking cleaner than before, also working better than other tools (e.g. having a file just with the errored items, almost ready to be re-imported is great), so I'll approve it.
For the stable-3_4_0
some updates will be needed (I didn't mark the comment of Alec regarding the submission_progress
as resolved as a reminder).
private function getCachedDao($daoType) { | ||
return $this->_daos[$daoType] ?? $this->_daos[$daoType] = DAORegistry::getDAO($daoType); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not needed to cache these DAO
s, they are cheap to get. But as it's already implemented, I don't think it's worth to revert. So, this is just a note :)
$this->_dirNames = Application::getFileDirectories(); | ||
$this->_format = trim($this->_dirNames['context'], '/') . '/%d/' . trim($this->_dirNames['submission'], '/') . '/%d'; | ||
|
||
$this->_fileManager = new FileManager(); | ||
$this->_publicFileManager = new PublicFileManager(); | ||
|
||
$this->_fileService = Services::get('file'); | ||
$this->_publicationService = Services::get('publication'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These things are cheap to initialize, but I don't see any problem to cache them (especially if you're using in a lot of places), so I'm just leaving a note.
private function processFailedRow($invalidRowsFile, $fields, $reason, $failedRows) { | ||
$invalidRowsFile->fputcsv(array_merge(array_pad($fields, $this->_expectedRowSize, null), [$reason])); | ||
|
||
return $failedRows + 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you receive the $failedRows
by reference, then you can replace the occurrences of $failedRows = $this->processFailedRow(...)
by $this->processFailedRow(...)
private function processFailedRow($invalidRowsFile, $fields, $reason, $failedRows) { | |
$invalidRowsFile->fputcsv(array_merge(array_pad($fields, $this->_expectedRowSize, null), [$reason])); | |
return $failedRows + 1; | |
private function processFailedRow($invalidRowsFile, $fields, $reason, &$failedRows) { | |
$invalidRowsFile->fputcsv(array_merge(array_pad($fields, $this->_expectedRowSize, null), [$reason])); | |
++$failedRows; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But as you've created a couple of private variables, you could also create one for the $failedRows
and the $invalidRowsFile
, then you won't need to pass them by arguments.
* @param array $args | ||
* @return string[] | ||
*/ | ||
private function parseCommandLineArguments($scriptName, $args) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's important, but if you want to follow the pattern on the stable-3_3_0
, the private methods should be prefixed with _
too. On the stable-3_4_0
and main
, the pattern was changed and the _
is not used anymore).
private function parseCommandLineArguments($scriptName, $args) { | |
private function parseCommandLineArguments($scriptName, $args) { |
*/ | ||
private function createAndValidateCSVFileInvalidRows($csvForInvalidRowsName) { | ||
$invalidRowsFile = $this->createNewFile($csvForInvalidRowsName, 'w'); | ||
$invalidRowsFile->fputcsv(array_merge(array_pad($this->_expectedHeaders, $this->_expectedRowSize, null), ['error'])); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
$invalidRowsFile->fputcsv(array_merge(array_pad($this->_expectedHeaders, $this->_expectedRowSize, null), ['error'])); | |
$invalidRowsFile->fputcsv(array_merge($this->_expectedHeaders, ['error'])); |
// Format is: | ||
// Press Path, Author string, title, abstract, series path, year, is_edited_volume, locale, URL to Asset, doi (optional), keywords list, subjects list, book cover image path, book cover image alt text, categories list |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can be removed, we have this description on the $_expectedHeaders
} | ||
|
||
// All requirements passed. Start processing from here. | ||
$this->initializeStaticVariables(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should stay out of the loop or it ends up not caching anything.
* Retrieves a Genre ID by Press ID. If the Genre doesn't exist, the result | ||
* will be false | ||
* | ||
* @param int $pressId | ||
*/ | ||
private function getGenreIdByPressId($pressId) { | ||
/** @var GenreDAO $genreDao */ | ||
$genreDao = $this->getCachedDao('GenreDAO'); | ||
$genre = $genreDao->getByKey('MANUSCRIPT', $pressId); | ||
|
||
return !$genre ? false : $genre->getId(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I think the genre name should be also a field on the CSV file, and if not found, then a default one could be passed by a command-line argument (if nothing was passed, then we can use the "MANUSCRIPT" as default).
* Retrieves a Genre ID by Press ID. If the Genre doesn't exist, the result | |
* will be false | |
* | |
* @param int $pressId | |
*/ | |
private function getGenreIdByPressId($pressId) { | |
/** @var GenreDAO $genreDao */ | |
$genreDao = $this->getCachedDao('GenreDAO'); | |
$genre = $genreDao->getByKey('MANUSCRIPT', $pressId); | |
return !$genre ? false : $genre->getId(); | |
* Retrieves the "Manuscript" genre's ID of a given Press ID | |
* | |
* @param int $pressId | |
* @return ?int Null if not found | |
*/ | |
private function getManuscriptGenreId($pressId) { | |
/** @var GenreDAO $genreDao */ | |
$genreDao = $this->getCachedDao('GenreDAO'); | |
$genre = $genreDao->getByKey('MANUSCRIPT', $pressId); | |
return $genre ? $genre->getId() : null; |
* | ||
* @param int $pressId | ||
*/ | ||
private function getUserGroupIdByPressId($pressId) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not important, but as we don't have other ways to retrieve (e.g. ByX
, ByY
, ByZ
), I think this can be dropped from the name (there are other occurrences).
$userGroupDao = $this->getCachedDao('UserGroupDAO'); | ||
$userGroup = $userGroupDao->getDefaultByRoleId($pressId, ROLE_ID_AUTHOR); | ||
|
||
return !$userGroup ? false : $userGroup->getId(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are other cases returning false
, I think null
fits better.
return !$userGroup ? false : $userGroup->getId(); | |
return $userGroup ? $userGroup->getId() : null; |
$coverImage['altText'] = $bookCoverImageAltText ?? ''; | ||
|
||
$destFilePath = $this->_publicFileManager->getContextFilesPath($press->getId()) . '/' . $coverImage['uploadName']; | ||
copy($srcFilePath, $destFilePath); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be interesting to check the return of the copy()
and perhaps other places that might fail.
p.s.: We're not using transactions nor checking if the database operations worked fine, so such things don't need to be checked 🙈
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Second comment... I think it's better to use the FileManager::copy()
to setup the expected permissions.
* @param string $srcFilePath | ||
* @param Publication $publication | ||
*/ | ||
private function processBookCoverImage($data, $srcFilePath, $publication) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's ok to type the inputs and returns of the methods, just be careful with the types that can be null
. But maybe it's better to leave this for the stable_3_4_0
.
@Godoy0722 Is this ready for another test or are there more changes planned? I wasn't able to get to it before my vacation but can do another test when it's ready! |
Hi there. Just pinging @kaitlinnewson that an additional review is still needed for this tool. I really appreciate if you could take a look and see if everything is in place now. Best, |
@Godoy0722 I've made a few additional comments - looking good otherwise! |
Hello, @kaitlinnewson! Yes, you were right. I forgot to pass this variable as a parameter to the respective method that processes this info. I updated and tested again the tool and it seems it's working as it should be. If you could please take a look again, I appreciate that! |
P.S.: I just updated the Docs and the "tab delimited" occurrences. The "submissionPdfs" and "coverImages" parts of the path were removed since the behavior of the tool changed. Now it's working in a way that all the assets (both PDFs and cover images) must be put inside the same path as the CSV file. It makes the tool management easier and clearer for the user. A final matter, the tool is ready in my idea, there's nothing to add for now unless the tool is with more bugs. So, if you find everything correct, it's ready to be merged on my idea. Thanks for the review @kaitlinnewson ! |
Looking good to me @Godoy0722! I think @asmecher will need to do the final merge due to the previous requested changes. |
Merged -- thanks, all! |
Updates:
Issues
This PR is intended to solve the following issues: