Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Duplicate Detection ported from DSpace CRIS 7 Angular #1732

Closed

Conversation

kshepherd
Copy link
Member

@kshepherd kshepherd commented Jul 20, 2022

This pull request ports the Duplicate Detection framework from DSpace CRIS 7 into DSpace.

Attribution: Ported from DSpace-CRIS and extended by The Library Code with support of Technische Universität Berlin

DSpace Backend PR: DSpace/DSpace#8415
REST Contract PR: TBD

One outstanding issue
For a new submission, the default behaviour when no matches are found should be to keep the section hidden, but instead, the default message indicates that decisions are all registered.
Fixed, see comment #1732 (comment)

To test

  • Make sure both the dspace-angular and DSpace PRs are applied
  • Enable the ‘dedup’ event consumer in dspace.cfg and add the ‘detect-duplicate’ step to a submission process in item-submission.xml
  • For best results, configure autosave on the dc.title field and any others you’re planning to test (eg. dc.identifier.doi, dc.identifier.isbn or so on). This means that whenever a change is saved to dc.title, the duplicate detection section will update with a new list of potential matches
  • Submit an item to a collection with the detect-duplicate step enabled. Enter some text in the title so that

Notes for PR reviewers

  • There is a new ‘dedup’ table in the database, this should be added by flyway
  • There is a new ‘dedup’ Solr core
  • The ‘dedup’ consumer is not in the default event consumers by default, so it’ll need to be added for practical testing
  • The fuzzy search signature is commented out of the deduplication.xml spring config by default, so uncomment that to test fuzzy matches (the default MD5 signatures for exact metadata matches still work)

Notes for 4Science devs (regarding changes from CRIS)

  • SearchDeduplication was renamed to DeduplicationPluginService as that seemed more fitting to what it actually does
  • tmpMapFilter, tmpFilter, other similar names of maps and lists were renamed to suit their purpose (eg. signatureSearchFilters)
  • A lot of unused methods, variables, classes were stripped out during the port to keep code lean and easier to review. If some were partial implementations of future work, we can bring them in with a subsequent PR

As I’ve added an authorization check (configurable, enabled by default) so that the user needs read access to each item in the potential matches list, I manually disabled it for some integration tests and leave it for others, and explicitly test it in a new integration test

I’ve added javadoc and most is pretty straightforward but please correct me if I misunderstood terms such as ‘fake', ‘reader’

Checklist

This checklist provides a reminder of what we are going to look for when reviewing your PR. You need not complete this checklist prior to creating your PR (draft PRs are always welcome). If you are unsure about an item in the checklist, don't hesitate to ask. We're here to help!

  • My PR is small in size (e.g. less than 1,000 lines of code, not including comments & integration tests). Exceptions may be made if previously agreed upon.
  • My PR passes Checkstyle validation based on the Code Style Guide.
  • My PR includes Javadoc for all new (or modified) public methods and classes. It also includes Javadoc for large or complex private methods.
  • My PR passes all tests and includes new/updated Unit or Integration Tests based on the Code Testing Guide.
  • If my PR modifies the REST API, I've linked to the REST Contract page (or open PR) related to this change.

@lgtm-com
Copy link

lgtm-com bot commented Jul 20, 2022

This pull request introduces 1 alert when merging 0a193b8 into ea67a15 - view on LGTM.com

new alerts:

  • 1 for Unused variable, import, function or class

@lgtm-com
Copy link

lgtm-com bot commented Aug 11, 2022

This pull request introduces 1 alert when merging 9bbbbfa into f6d2014 - view on LGTM.com

new alerts:

  • 1 for Unused variable, import, function or class

@kshepherd
Copy link
Member Author

I solved the display issue that I had noted in my first PR description. Describing my fix below for reviewers' attention:

There is an action + effect combo “CleanDetectDuplicateSection”. It’s only dispatched when the parseSaveResponse() submission effect is called.

I modified the DetectDuplicateService so that if the workspace item is undefined when getDuplicateMatches()1 is first called (ie. the angular component is initialised and asking for matches but the sectionData is not fully populated yet), the Clean action is dispatched.

@lgtm-com
Copy link

lgtm-com bot commented Aug 11, 2022

This pull request introduces 1 alert when merging aa4bbd1 into f6d2014 - view on LGTM.com

new alerts:

  • 1 for Unused variable, import, function or class

@lgtm-com
Copy link

lgtm-com bot commented Oct 30, 2022

This pull request introduces 1 alert when merging 13d17aa into c876055 - view on LGTM.com

new alerts:

  • 1 for Unused variable, import, function or class

@lgtm-com
Copy link

lgtm-com bot commented Nov 2, 2022

This pull request introduces 1 alert when merging 5ab3acd into dc4b4ff - view on LGTM.com

new alerts:

  • 1 for Unused variable, import, function or class

@github-actions
Copy link

github-actions bot commented Dec 9, 2022

Hi @kshepherd,
Conflicts have been detected against the base branch.
Please resolve these conflicts as soon as you can. Thanks!


This message is automatically generated by prince-chrismc/label-merge-conflicts-action so don't hesitate to report issues/improvements there.

@kshepherd kshepherd force-pushed the TLC-277_detect_duplicate_port_7x branch from 593f8fd to 8b8c630 Compare January 18, 2023 21:56
@artlowel
Copy link
Member

Thanks @kshepherd!

Like @benbosman on the backend, I'm not assigned as a reviewer on this PR, but I gave it a quick test. A few comments and questions:

  • It would be good to have a link (in a new window) to the full metadata page for a potential duplicate. In the list view you only have a handful of fixed metadata fields
  • The action buttons on duplicate results are stuck together. I'd add a bootstrap spacing class such as mr-1, or put them in a button group
  • There is a "Submitter decision" field when you look at the workflowitem as an editor, but it seems that if I mark something as a duplicate as the submitter, that duplicate doesn't show up for the editor, and if I mark it as not a duplicate, it shows up but still says "no decision yet".
  • What happens after I've "marked an item to merge" as an editor? Is there a way to merge those items from the UI?

@pnbecker
Copy link
Member

We are willing to follow this up if it is clear who and when it is being reviewed. We rebased and fixed this PR several times. We would do this one more time if there is a concrete plan how this is being reviewed. We will not rebase again and again until it is clear who and when this will be reviewed. Please ping us or contact us otherwise if you are interested in reviewing it.

@github-actions
Copy link

github-actions bot commented Feb 2, 2023

Hi @kshepherd,
Conflicts have been detected against the base branch.
Please resolve these conflicts as soon as you can. Thanks!


This message is automatically generated by prince-chrismc/label-merge-conflicts-action so don't hesitate to report issues/improvements there.

@pnbecker
Copy link
Member

We took a a new simpler and clean approach. Closing this in favor for #2749

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants