[hibernate-search] Introduce Hibernate Search framework and implement indexing page #6218

Open
wants to merge 30 commits into base: hibernate-search

Conversation

@matthias-ronge
Collaborator

matthias-ronge commented Sep 4, 2024

Issue #5760 2a) and 2b)

Follow-up pull request to #6209 (immediate diff)

Recording

The three numbers before the slash in “Indexed entries” represent the number of objects that Hibernate has already loaded from the database, the number of objects that have been prepared as indexable documents (JSONs), and finally the number of indexed documents.

Basic finding: Hibernate Search and lazy loading don't mix. It looks like we have to accept that. As a result, I have deactivated lazy loading wherever the number of members of a set is typically small (< 25). This affects most sets, e.g. the projects of a template, or the tasks, users or properties of a template or a process. If the set can typically be large (> 1000), its elements are not indexed. Example: the processes of a batch. Rationale: if the number of sub-elements indexed with an object is very large, the object stops being selective, because it becomes increasingly likely that it will match any search query. Such indexing also makes the index enormously large. It therefore seems justifiable not to index these fields.

@matthias-ronge changed the base branch from master to hibernate-search September 4, 2024 14:59
@henning-gerhardt
Collaborator

@matthias-ronge: a hopefully short general question: is it possible to use different indices with Hibernate-Search? Currently this is possible through different values of the elasticsearch.index configuration. Is this, or something similar, still possible? I'm asking because I work with different Kitodo.Production versions, which have separate metadata directories on my local file system, different databases in a MariaDB instance and different search prefixes in an Elasticsearch instance. This need not work in the current state of the changes, nor does it have to be a current goal, but maybe it is something for later?

@matthias-ronge
Collaborator Author

is it possible to use different indices with Hibernate-Search?

The index names for the individual objects are contained in the annotations as strings. I cannot tell whether it is even possible to use variables here, or whether these have to be hard-coded strings at compile time; I suspect the latter. Index access is controlled via properties such as the port. You could install several index services on different ports and set the port at runtime before the program starts, or change the index data directory (as a symbolic link).

Such a feature is currently not in the scope of our development.
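
On the compile-time question above: in Java, annotation element values have to be constant expressions, so a prefix read at run time cannot be placed directly in an annotation. The following is a minimal, self-contained sketch of that language rule only. `DemoIndexed`, `IndexNames`, `DemoFolder` and `AnnotationConstantDemo` are hypothetical names invented for this illustration; `DemoIndexed` is a stand-in and not the Hibernate Search `@Indexed` annotation.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical stand-in for an index annotation (NOT the Hibernate Search one).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface DemoIndexed {
    String index();
}

final class IndexNames {
    // A static final String initialised with a literal is a compile-time constant.
    static final String PREFIX = "kitodo-";

    private IndexNames() {
    }
}

// Concatenating two constants is still a constant expression, so this compiles.
@DemoIndexed(index = IndexNames.PREFIX + "folder")
class DemoFolder {
}

// The following would NOT compile, because the value is only known at run time:
// @DemoIndexed(index = System.getProperty("index.prefix") + "folder")

final class AnnotationConstantDemo {
    // Reads the index name back via reflection; returns null for unannotated types.
    static String indexNameOf(Class<?> type) {
        DemoIndexed annotation = type.getAnnotation(DemoIndexed.class);
        return annotation == null ? null : annotation.index();
    }

    public static void main(String[] args) {
        System.out.println(indexNameOf(DemoFolder.class));
    }
}
```

So a per-installation prefix would have to be applied somewhere other than the annotation value itself, for example in the backend configuration.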

@henning-gerhardt
Collaborator

henning-gerhardt commented Sep 9, 2024

Thank you @matthias-ronge for the explanation. I know, and I did not expect the scenario of using different Hibernate Search indices to be part of the current development.

Edit: Maybe indexlayout-strategy-custom is a way to achieve this. But that is not something for now.

Comment on lines +1 to +3
hibernate.search.enabled=true
hibernate.search.backend.hosts=localhost:9200
hibernate.search.backend.protocol=http
Collaborator

Question: shouldn't this content be added to the existing hibernate.cfg.xml file, or do we need two configuration files?

Collaborator Author

As I understand it, this is the configuration file for the Hibernate Search framework. I would be surprised if we could mix the two configurations.

Collaborator

Does this mean that everyone must run Elasticsearch / OpenSearch on localhost, port 9200? If so, I cannot do that on my development system or in a production environment.
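
One way the hard-coded host could be made overridable, sketched with the JDK only: load the three properties from the file as defaults and let an externally supplied value replace `hibernate.search.backend.hosts`. This is an illustration under assumptions, not how the pull request or Hibernate resolves its configuration; `SearchBackendSettings`, `load` and `DEFAULTS` are hypothetical names, and passing `System.getProperties()` as the override source is just one possible choice.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class SearchBackendSettings {
    static final String HOSTS_KEY = "hibernate.search.backend.hosts";

    // The three lines from the reviewed hibernate.properties file, used as defaults.
    static final String DEFAULTS =
            "hibernate.search.enabled=true\n"
            + "hibernate.search.backend.hosts=localhost:9200\n"
            + "hibernate.search.backend.protocol=http\n";

    private SearchBackendSettings() {
    }

    // Parses the file content and replaces the hosts entry if an override is present.
    static Properties load(String fileContent, Properties overrides) {
        Properties settings = new Properties();
        try {
            settings.load(new StringReader(fileContent));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String override = overrides.getProperty(HOSTS_KEY);
        if (override != null && !override.trim().isEmpty()) {
            settings.setProperty(HOSTS_KEY, override.trim());
        }
        return settings;
    }

    public static void main(String[] args) {
        // Assumption: the override is passed as a JVM system property (-Dkey=value).
        Properties settings = load(DEFAULTS, System.getProperties());
        System.out.println(settings.getProperty(HOSTS_KEY));
    }
}
```

Started with `-Dhibernate.search.backend.hosts=search.example.org:9201` (a made-up host), the sketch would use that value; without it, the file default stays in effect.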

@@ -31,9 +32,11 @@
@Table(name = "property")
Collaborator

Isn't an @Indexed(index = "kitodo-property") annotation missing here, as used in the other bean files?

Collaborator Author

My understanding is that the annotation only applies to objects to be indexed. Standalone properties are not indexed as separate objects.

Collaborator

I understand.

Collaborator

Isn't an @Indexed(index = "kitodo-folder") annotation missing here, as used in the other bean files?

@@ -26,19 +26,23 @@
import javax.persistence.OneToMany;
import javax.persistence.Table;

import org.hibernate.search.mapper.pojo.mapping.definition.annotation.GenericField;
import org.kitodo.data.database.persistence.UserDAO;

@Entity
@Table(name = "user")
Collaborator

Isn't an @Indexed(index = "kitodo-user") annotation missing here, as used in the other bean files?

Collaborator Author

Users are not indexed because otherwise you cannot log in before the index is created. Then you cannot create the index because you cannot log in. This was not planned at the very beginning but is now the case.

Collaborator

I see. That is one of the chicken-and-egg problems, and also a matter of privacy restrictions.

Comment on lines +1 to +3
hibernate.search.enabled=true
hibernate.search.backend.hosts=localhost:9205
hibernate.search.backend.protocol=http
Collaborator

See my comment above at the first hibernate.properties file.

*/
class ServerConnectionChecker implements Runnable {
private static final Logger logger = LogManager.getLogger(ServerConnectionChecker.class);
private static final Pattern PATTERN_SERVER = Pattern.compile("cluster_name\\W+([^\"]*).*?number\\W+([^\"]*)",
Collaborator

Does this pattern work with both OpenSearch and Elasticsearch servers?

Collaborator Author

I don't know yet; right now I'm focused on getting the existing code to work. If not, it could be extended. I assume both are the same, but I haven't tried it yet. For now, the focus is on finishing the minimal implementation.

Collaborator

OK, fine with me; hopefully it works without changes.
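
The open question could be checked offline with a small sketch like the one below. It rests on several assumptions: the regular expression is copied from the diff, but the line is cut off after the comma, so the `Pattern.DOTALL` flag is a guess (without it, `.` would not cross the line breaks of a pretty-printed response). The two sample bodies are hand-written approximations of what an Elasticsearch and an OpenSearch root endpoint return, not captured output, and `ServerBannerPatternDemo` and `parse` are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ServerBannerPatternDemo {
    // Regex from the diff; the DOTALL flag is an ASSUMPTION (the original line is truncated).
    static final Pattern PATTERN_SERVER = Pattern.compile(
            "cluster_name\\W+([^\"]*).*?number\\W+([^\"]*)", Pattern.DOTALL);

    // Hand-written approximation of an Elasticsearch root response.
    static final String ELASTICSEARCH_SAMPLE =
            "{\n"
            + "  \"name\" : \"node-1\",\n"
            + "  \"cluster_name\" : \"elasticsearch\",\n"
            + "  \"version\" : {\n"
            + "    \"number\" : \"7.17.0\"\n"
            + "  },\n"
            + "  \"tagline\" : \"You Know, for Search\"\n"
            + "}";

    // Hand-written approximation of an OpenSearch root response.
    static final String OPENSEARCH_SAMPLE =
            "{\n"
            + "  \"name\" : \"node-1\",\n"
            + "  \"cluster_name\" : \"opensearch-cluster\",\n"
            + "  \"version\" : {\n"
            + "    \"distribution\" : \"opensearch\",\n"
            + "    \"number\" : \"2.11.0\"\n"
            + "  },\n"
            + "  \"tagline\" : \"The OpenSearch Project: https://opensearch.org/\"\n"
            + "}";

    private ServerBannerPatternDemo() {
    }

    // Returns {clusterName, versionNumber}, or null if the body does not match.
    static String[] parse(String body) {
        Matcher matcher = PATTERN_SERVER.matcher(body);
        return matcher.find() ? new String[] {matcher.group(1), matcher.group(2)} : null;
    }

    public static void main(String[] args) {
        for (String sample : new String[] {ELASTICSEARCH_SAMPLE, OPENSEARCH_SAMPLE}) {
            String[] info = parse(sample);
            System.out.println(info[0] + " / " + info[1]);
        }
    }
}
```

Under these assumptions the pattern extracts the cluster name and version number from both samples. Responses from real servers of the versions in use would still need to be checked.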

@@ -37,6 +37,9 @@
<property name="hibernate.connection.verifyServerCertificate">false</property>
<property name="hibernate.connection.useSSL">false</property>

<!-- Hibernate search -->
Collaborator

Aren't the other Hibernate Search parameters missing here, such as the URI, port, ..., which were added in the hibernate.properties file?

Collaborator Author

I can confirm that I also notice the similarity. Needs testing.

Collaborator

It would be nice if the Hibernate Search properties were stored in one place / file. If that is not possible, it would be bad, at least for me.

@@ -158,7 +158,7 @@ public void runNotExistingScriptAsync() throws InterruptedException {
String commandString = scriptPath + "not_existing_script" + scriptExtension;
CommandService service = new CommandService();
service.runCommandAsync(commandString);
- Thread.sleep(1000); // wait for async thread to finish;
+ Thread.sleep(2000); // wait for async thread to finish;
Collaborator

Why were this line and the ones below changed from 1 to 2 seconds? Is that really needed?

Collaborator Author

Yes, my development machine is sometimes pretty slow and then the build aborts because of an error here. Maybe this should be handled completely differently than just waiting an arbitrary amount of time, but that would be something for a separate branch.

Collaborator

I understand, but this will increase the build and test time for everyone. Fine for now, though.
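
As a possible alternative to a fixed `Thread.sleep`, a test could poll for the expected state and give up only after an upper bound, so fast machines do not pay the full wait. The following is a minimal JDK-only sketch of that idea; `AwaitCondition` and `waitUntil` are hypothetical names, and the worker thread merely simulates an asynchronous command, it is not the `CommandService` from the diff.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

final class AwaitCondition {
    private AwaitCondition() {
    }

    // Polls the condition until it holds or the timeout elapses.
    // Returns true as soon as the condition is met, false on timeout or interruption.
    static boolean waitUntil(BooleanSupplier condition, Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                return false;
            }
        }
    }

    public static void main(String[] args) {
        AtomicBoolean finished = new AtomicBoolean(false);
        // Simulated asynchronous work that completes after about 100 ms.
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            finished.set(true);
        });
        worker.start();
        boolean ok = waitUntil(finished::get, Duration.ofSeconds(2), Duration.ofMillis(20));
        System.out.println(ok);
    }
}
```

With this shape, the 2-second value becomes an upper bound for slow machines rather than a fixed cost per test.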

Comment on lines +1 to +3
hibernate.search.enabled=true
hibernate.search.backend.hosts=localhost:9205
hibernate.search.backend.protocol=http
Collaborator

See my comment on the first hibernate.properties file.

Comment on lines +1 to +3
hibernate.search.enabled=true
hibernate.search.backend.hosts=localhost:9205
hibernate.search.backend.protocol=http
Collaborator

See my comment on the first hibernate.properties file.

4 participants