Introduce new field 'spelling variants' under all entity types, and include it in searches just as 'label' #2367

Open
davidzbiral opened this issue Oct 3, 2024 · 1 comment

davidzbiral commented Oct 3, 2024

Introduce a new field in the models of all entity types, called 'spelling variants': a multi-value field that accepts individual strings (one or more words each). E.g., we want to create C "democratisation", put "democratization" in 'spelling variants', and then also find C democratisation when searching for democratization.
In all searches, treat any of the spelling variants as the 'label' (I think you call the label 'name' internally in the DB).
If the user confirms as a spelling variant a string identical to the 'label' or to any already existing spelling variant, it should not be added (this would create an unnecessary duplicate; message: 'spelling variant already exists').
The user needs to be able to remove any spelling variant later; design-wise, probably the same way as you remove an entity class from lists of entity classes.
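The add/remove rules above could be sketched roughly as follows. This is only an illustration: the `EntityWithVariants` shape, the `spellingVariants` field name, and both helper functions are hypothetical and not taken from the InkVisitor codebase, and comparing case-insensitively after trimming is an assumption that the proposal does not specify.

```typescript
// Hypothetical entity shape; the real InkVisitor model may differ.
interface EntityWithVariants {
  label: string;
  spellingVariants: string[];
}

type AddResult =
  | { ok: true; entity: EntityWithVariants }
  | { ok: false; message: string };

// Assumption: duplicates are detected ignoring case and surrounding whitespace.
const normalize = (s: string): string => s.trim().toLowerCase();

// Returns a new entity with the variant appended, or a rejection message
// when the candidate equals the label or an existing variant.
function addSpellingVariant(
  entity: EntityWithVariants,
  candidate: string
): AddResult {
  const value = candidate.trim();
  const taken = [entity.label, ...entity.spellingVariants].some(
    (existing) => normalize(existing) === normalize(value)
  );
  if (taken) {
    return { ok: false, message: "spelling variant already exists" };
  }
  return {
    ok: true,
    entity: {
      ...entity,
      spellingVariants: [...entity.spellingVariants, value],
    },
  };
}

// Returns a new entity without the given variant (exact string match).
function removeSpellingVariant(
  entity: EntityWithVariants,
  variant: string
): EntityWithVariants {
  return {
    ...entity,
    spellingVariants: entity.spellingVariants.filter((v) => v !== variant),
  };
}
```

With this sketch, adding "democratization" to an entity labelled "democratisation" succeeds once; a second attempt, or an attempt to add the label itself, is rejected with the message from the proposal.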

More detailed rationale (written esp. for historians):

C and A Entities are lemma-meaning units. Orthographic variation is not a difference of meaning: it should not matter and should not lead to creating the variants as new entities (linked as SYN), but it is needed for querying. (Such a proliferation of entities makes the data quite messy to query and do research on, and in this case quite unnecessarily.) Thus, I propose to add a new field to all entity types: a multifield in which the individual items are individual orthographic variants.
(In Persons, editors can, by perfectionism or doubt, decide that P. Fornerii and Petrus Fornerii will be created separately and linked with IDE; this is not necessarily discouraged and is not related to this proposal. This proposal argues that we need a way not to create A vocatus and A vochatus as two verbs and double the work on the action description, while still being able to find vocatus through vochatus.)
I don’t care whether some borderline cases will lead to more entities even in C and A, esp. if the variation is so massive as to create uncertainty about whether the lemma and/or meaning is the same. This will still be possible. My concern is to solve the normal case, where we have becharius, beccarius, becarius… and want to find the basic variant through the InkVisitor Suggester field when searching for any of them.
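A minimal sketch of the lookup described here, in which a query matches an entity if it matches either the label or any spelling variant. The `SearchableEntity` shape and the `searchEntities` function are hypothetical, and case-insensitive substring matching is an assumption; the actual Suggester may match differently (for example by prefix or on the database side).

```typescript
// Hypothetical entity shape used only for this illustration.
interface SearchableEntity {
  id: string;
  label: string;
  spellingVariants: string[];
}

// Returns entities whose label or any spelling variant contains the query.
// Assumption: case-insensitive substring match; empty queries match nothing.
function searchEntities(
  entities: SearchableEntity[],
  query: string
): SearchableEntity[] {
  const q = query.trim().toLowerCase();
  if (q === "") {
    return [];
  }
  return entities.filter((entity) =>
    [entity.label, ...entity.spellingVariants].some(
      (name) => name.toLowerCase().indexOf(q) !== -1
    )
  );
}

// Example data built from the forms mentioned in this issue.
const sampleEntities: SearchableEntity[] = [
  { id: "c1", label: "becharius", spellingVariants: ["beccarius", "becarius"] },
  { id: "a1", label: "vocatus", spellingVariants: ["vochatus"] },
];

// Searching for a variant returns the single entity that holds the basic form.
const hits = searchEntities(sampleEntities, "beccarius");
console.log(hits.map((e) => e.label));
```

The point of the design is that the variants live on one entity, so no second entity or SYN link is needed for the search to succeed.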

davidzbiral added this to the 1.4.2 milestone Oct 3, 2024
davidzbiral changed the title from "Introduce new field under all entities 'spelling variants'" to "Introduce new field under all entities 'spelling variants', and include it in searches just as 'label'" Oct 3, 2024
davidzbiral changed the title from "Introduce new field under all entities 'spelling variants', and include it in searches just as 'label'" to "Introduce new field 'spelling variants' under all entity types, and include it in searches just as 'label'" Oct 3, 2024
davidzbiral (Collaborator, Author) commented

@adammertel Endorsed by H4 and related to upcoming work. I am prioritizing this, but a working Annotator (user-friendly basic functionalities) remains the top priority.
