Simplify ElasticSearch mapping of metadata #4266

Open
solth opened this issue Mar 12, 2021 · 4 comments
Labels
improvement search search, filter

Comments

@solth
Member

solth commented Mar 12, 2021

Currently, process metadata is saved in the following JSON structure in the ElasticSearch index:
(Screenshot, 2021-03-12 at 10:49:49)
Since Kitodo.Production internally always uses METS as the container format for metadata, the nodes mdWrap, xmlData and kitodo are not really required and unnecessarily complicate dealing with metadata retrieved from the search index.

I would therefore propose removing the nodes mentioned above and saving all metadata for a given structure element directly as a JSON array associated with the structure element's UUID.

@matthias-ronge
Collaborator

We should think twice about this when redesigning the search engine integration. In any case, the user should be able to search metadata specifically, i.e. for a certain metadata key having a certain value. As I understand it, a separate search field would have to be written to the index for each searchable metadata key (unless ElasticSearch offers additional functionality here that I don't know about).

@solth
Member Author

solth commented Jun 24, 2021

I am sorry, perhaps there was a misunderstanding. My idea was to remove just the mdWrap, xmlData and kitodo nodes, not the metadata node or its content. So all key-value pairs concerning metadata entries and groups would be retained.
For example:

meta:
    0:
        mdWrap:
            xmlData:
                kitodo:
                    metadata:
        ID:
    1:
        mdWrap:
            xmlData:
                kitodo:
                    metadata:
        ID:
...

would become:

meta:
    0:
        metadata:
        ID:
    1:
        metadata:
        ID:
...
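
The transformation above could be sketched roughly as follows. This is only an illustration, not Kitodo.Production code: the function names are hypothetical, and it assumes that `meta` is a list of objects shaped like the example, with the wrapper chain mdWrap > xmlData > kitodo present in each entry.

```python
from typing import Any, Dict, List

# Invariant wrapper nodes named in this issue, from outermost to innermost.
WRAPPER_PATH = ("mdWrap", "xmlData", "kitodo")


def flatten_entry(entry: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of one structure-element entry without the wrapper nodes."""
    inner: Any = entry
    for key in WRAPPER_PATH:
        if not isinstance(inner, dict):
            inner = {}
            break
        inner = inner.get(key, {})
    if not isinstance(inner, dict):
        inner = {}
    # Keep everything that sits next to the wrapper (e.g. ID) ...
    flat = {k: v for k, v in entry.items() if k != WRAPPER_PATH[0]}
    # ... and lift the wrapper's content (e.g. metadata) up one level.
    flat.update(inner)
    return flat


def flatten_meta(meta: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Apply flatten_entry to every entry of the (assumed) meta list."""
    return [flatten_entry(entry) for entry in meta]
```

The metadata node and its content are passed through unchanged; only the three wrapper levels disappear.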

@matthias-ronge
Collaborator

My question is: Is metadata indexed in a way that lets you search by individual metadata fields? For example, I want to find "cook", but only in the title, not in the author. Is this already possible, or does your suggested change make it possible?

I am just thinking that we need a change here, but it should be a change that makes the above possible, if it isn't yet. This can go along with the mapping simplification.
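
To make the "cook in the title only" case concrete, here is one possible shape for such a query body in the Elasticsearch query DSL, built as a plain Python dict. It is a sketch under assumptions that this thread does not confirm: that each metadata entry is an object with `name` and `content` fields, that the array lives at the path `meta.metadata` and is mapped with the `nested` field type, and that `name` is indexed as a keyword. `TitleDocMain` is used only as an illustrative key, and `field_query` is a hypothetical helper.

```python
from typing import Any, Dict


def field_query(metadata_key: str, text: str,
                path: str = "meta.metadata") -> Dict[str, Any]:
    """Build a query body matching `text` only in entries named `metadata_key`."""
    return {
        "query": {
            "nested": {
                # Both conditions must hold for the SAME metadata entry.
                "path": path,
                "query": {
                    "bool": {
                        "must": [
                            {"term": {path + ".name": metadata_key}},
                            {"match": {path + ".content": text}},
                        ]
                    }
                },
            }
        }
    }


# Example: find "cook" in the title, but not in the author.
title_query = field_query("TitleDocMain", "cook")
```

With a shorter path to the metadata entries, as in the simplified mapping, only the `path` argument would change.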

@solth
Member Author

solth commented Jun 25, 2021

I totally agree that searching individual metadata fields in the way you described is important and should be possible. In fact, that, together with loading metadata from the index into ProcessDTO objects to make it usable there, was the motivation for this issue. But in my opinion that should be discussed in a separate issue. If anything, the proposed changes should make it easier to work with the metadata stored in the index.

This issue is just for pointing out that currently there are static (or "invariant", if you prefer) parts in the mapping of metadata that do not have any benefit (because they are always the same, for every structure element) and instead make reading the metadata from the index a chore. That's why I propose to remove those static parts.
