SIREn is an extension for Apache Lucene and Solr. SIREn adds new features to Lucene and Solr for processing and searching highly heterogeneous semi-structured data (e.g., RDF). In essence, SIREn adds a new "Field Type" with a set of specific tools such as Analyzers, Query Operators and a Query Parser. If you are looking for a way to:
- have a real schema-less solution, i.e., you don't have to define all of your fields ahead of time, without penalties on the system performance;
- search efficiently over millions of fields;
- create a Lucene Document containing sub-elements (i.e., nested child elements)
then SIREn might be a solution.
SIREn extends Lucene and Solr, meaning that you can still use all the features of Solr in conjunction with the features provided by SIREn.
As SIREn introduces a new "Field Type" with a different data model than what can be found in Lucene, SIREn needs its own implementation of each query type supported by Lucene. Currently, most of the core query types that can be found in Lucene have been implemented for SIREn. The table below summarises the current status.
Query Types | SIREn | Lucene |
---|---|---|
Boolean Query | Yes | Yes |
Phrase Query | Yes | Yes |
Proximity (Span) Query | No | Yes |
Wildcard Query | Yes | Yes |
Prefix Query | Yes | Yes |
Fuzzy Query | Yes | Yes |
Range Query | Yes | Yes |
Numeric Range Query | Yes | Yes |
In addition, SIREn provides new query types such as Tuple Query and Cell Query. These new query types allow more complex "structured queries" than Lucene offers. For example, by using these query types, it is possible to search efficiently over an unlimited number of fields, or to query nested child elements.
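To make the idea of searching over an unlimited number of fields more concrete, here is a minimal in-memory sketch in plain Python. It does not use SIREn's or Lucene's API. It assumes (this layout is an assumption, inferred from the query type names) that a single field value holds a list of tuples, each tuple being an ordered list of text cells, with the attribute name stored in cell 0 rather than in the schema. The helpers `cell_query` and `tuple_query` are hypothetical names chosen for illustration.

```python
from typing import List, Optional, Tuple

Row = List[str]  # one tuple: an ordered list of cell texts (assumed layout)


def cell_matches(cell: str, term: str) -> bool:
    """True if the term occurs as a whole word in the cell (case-insensitive)."""
    return term.lower() in cell.lower().split()


def cell_query(row: Row, term: str, cell_index: Optional[int] = None) -> bool:
    """Match a term in one specific cell, or in any cell when cell_index is None."""
    cells = row if cell_index is None else row[cell_index:cell_index + 1]
    return any(cell_matches(c, term) for c in cells)


def tuple_query(field_value: List[Row],
                constraints: List[Tuple[str, Optional[int]]]) -> bool:
    """True if at least one tuple satisfies every (term, cell_index) constraint."""
    return any(
        all(cell_query(row, term, idx) for term, idx in constraints)
        for row in field_value
    )


# Hypothetical entity: attribute names live in cell 0, so no schema field
# has to be declared for "title", "director" or "actor".
entity = [
    ["title", "Blade Runner"],
    ["director", "Ridley Scott"],
    ["actor", "Harrison Ford"],
]

same_tuple = tuple_query(entity, [("director", 0), ("scott", 1)])
across_tuples = tuple_query(entity, [("director", 0), ("ford", 1)])
```

In this sketch `same_tuple` is true because both terms occur in the same tuple, while `across_tuples` is false because "director" and "ford" only occur in different tuples. Adding a new attribute means adding another tuple, not another field.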
The SIREn query types are compatible with the Lucene Boolean query type, i.e., you can combine SIREn query types using the Lucene BooleanQuery.
In the future, SIREn will offer new query types that are similar to XPath, such as Parent/Child or Ancestor/Descendant query types.
Yes, SIREn returns a list of results that are automatically ranked based on their relevance to your query.
Yes, you can create arbitrary facets by using SIREn queries with the Solr Query Faceting feature.
SIREn also enables powerful faceted search by applying a pivot operation on a particular entity type. For example, let's say you have three types of entities: movie, review and actor. You could start by searching for a particular movie with the help of facets associated with the movie entity type. You can then pivot to another entity type, e.g., actor, and explore all the actors associated with the restricted set of movies. At the same time, a new set of facets is calculated, which allows you to restrict the list of actors by some of their characteristics, such as their birthplaces or their nominations. You can then switch back to the list of movies, which will be restricted by a particular genre and by a particular set of actors. Such functionality is not available as an out-of-the-box solution, but it can be achieved by combining Solr facet features with SIREn querying features. If you are interested in such a solution, please contact us; we can help you.
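The pivot navigation described above can be illustrated with a small in-memory sketch in plain Python. It is only a model of the navigation logic, not a Solr or SIREn implementation; the sample data and the helper names (`facet_counts`, `pivot_to_actors`, `movies_with_actors`) are made up for the example.

```python
from collections import Counter

# Hypothetical sample data: movies reference actors by id.
MOVIES = [
    {"id": "m1", "genre": "drama", "actors": ["a1", "a2"]},
    {"id": "m2", "genre": "comedy", "actors": ["a2", "a3"]},
    {"id": "m3", "genre": "drama", "actors": ["a3"]},
]
ACTORS = {
    "a1": {"birthplace": "Dublin"},
    "a2": {"birthplace": "Paris"},
    "a3": {"birthplace": "Dublin"},
}


def facet_counts(records, field):
    """Count how many records carry each value of the given field."""
    return Counter(r[field] for r in records)


def pivot_to_actors(movies):
    """Pivot from a set of movies to the actors appearing in them."""
    ids = {a for m in movies for a in m["actors"]}
    return {i: ACTORS[i] for i in sorted(ids)}


def movies_with_actors(movies, actor_ids):
    """Keep only the movies featuring at least one of the given actors."""
    return [m for m in movies if actor_ids & set(m["actors"])]


# 1. Restrict movies with a movie facet (genre).
dramas = [m for m in MOVIES if m["genre"] == "drama"]
# 2. Pivot to actors and compute a new set of facets on them.
actors = pivot_to_actors(dramas)
birthplaces = facet_counts(actors.values(), "birthplace")
# 3. Restrict actors by a facet value, then switch back to movies.
paris_actors = {i for i, a in actors.items() if a["birthplace"] == "Paris"}
result = movies_with_actors(dramas, paris_actors)
```

Each step keeps the restrictions from the previous one, so the final list of movies is constrained both by genre and by the selected set of actors.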
Yes, SIREn supports highlighting.
At the moment, SIREn does not support sorting on a particular value of a SIREn field. This might be supported in a future release. However, you can still use sorting on a Lucene field.
Yes. Similarly to Lucene, language agnostic search can be achieved by a careful design of your indexing and querying analysis pipeline, e.g., by using appropriate word stemming filters.
In addition, SIREn provides more flexibility than Lucene/Solr for such a task. In Lucene, a field is restricted to a single analyzer. In SIREn, you can associate one analyzer per field value, i.e., SIREn allows you to associate more than one analyzer with a single field. For example, you can have one field with multiple values, each one in a different language. In this scenario, you can configure SIREn to use a different analyzer based on the language of the value.
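As a rough illustration of per-value analysis, the sketch below selects an analyzer for each value of a multi-valued field based on a language tag. It is plain Python and does not reflect SIREn's actual configuration mechanism; the language tags, the toy analyzers and the `analyze_field` helper are all assumptions made for the example.

```python
import re
from typing import Callable, Dict, List, Tuple

Analyzer = Callable[[str], List[str]]


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def english_analyzer(text: str) -> List[str]:
    # Toy stemmer: strip a trailing plural "s" from tokens longer than 3 chars.
    return [t[:-1] if len(t) > 3 and t.endswith("s") else t
            for t in tokenize(text)]


def french_analyzer(text: str) -> List[str]:
    # Toy stop-word filter for a handful of French articles and prepositions.
    stopwords = {"le", "la", "les", "de", "des", "du"}
    return [t for t in tokenize(text) if t not in stopwords]


# One analyzer per language tag; unknown languages fall back to tokenize.
ANALYZERS: Dict[str, Analyzer] = {"en": english_analyzer, "fr": french_analyzer}


def analyze_field(values: List[Tuple[str, str]]) -> List[List[str]]:
    """Analyze each (language, text) value with the analyzer for its language."""
    return [ANALYZERS.get(lang, tokenize)(text) for lang, text in values]


# A single field holding two values in different languages.
tokens = analyze_field([("en", "The Birds"), ("fr", "Les Oiseaux")])
```

Here the English value goes through the toy stemmer and the French value through the stop-word filter, even though both belong to the same field.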