Commit
Added support for exists query, as defined in Elasticsearch
Field existence does not consider types, only field names. Field capabilities will unfortunately have to be handled differently.

This works by introducing an internal (but otherwise normal) "u64" field that stores a postings list for field existence. For performance and RAM reasons, the field's full path is not stored as a string. Instead, we compute a 64-bit FNV hash of the path from root to leaf. If the hash behaves ideally, collisions are very unlikely, even taking the birthday attack into account.

When dealing with complex JSON with the raw tokenizer, this feature can double the number of tokens we deal with, and it has an impact on performance. For this reason, it is not added as an option in the DocMapper.

Like Elasticsearch, we only store field existence for indexed fields. Also, in order to handle refinements like expand_dots, we work over the built tantivy Document and reuse the existing resolution logic.

On 1.4GB of gharchive (which is close to a worst-case scenario), the performance and index size change is as follows:
- With field_exists enabled: indexing throughput 41 MB/s, index size 701M
- With field_exists disabled: indexing throughput 46 MB/s, index size 698M

That is roughly 11% lower indexing throughput and a 0.4% larger index.
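The collision claim can be quantified with the standard birthday bound, under the message's own assumption that the 64-bit hash behaves like a uniformly random function. For $n$ distinct field paths hashed into $2^{64}$ values:

$$
P(\text{collision}) \approx 1 - e^{-n(n-1)/2^{65}} \le \frac{n(n-1)}{2^{65}}
$$

For example, taking $n = 10^6$ distinct paths (an illustrative value, not a measurement) gives $P \lesssim 10^{12}/2^{65} \approx 2.7 \times 10^{-8}$.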
1 parent 8c2caf5, commit 04cc9d3
Showing 28 changed files with 480 additions and 35 deletions.
`@@ -0,0 +1,59 @@` (new file)

```rust
// Copyright (C) 2023 Quickwit, Inc.
//
// Quickwit is offered under the AGPL v3.0 and as commercial software.
// For commercial licensing, contact us at [email protected].
//
// AGPL:
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License as
// published by the Free Software Foundation, either version 3 of the
// License, or (at your option) any later version.
//
// This program is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
// GNU Affero General Public License for more details.
//
// You should have received a copy of the GNU Affero General Public License
// along with this program. If not, see <http://www.gnu.org/licenses/>.

use std::hash::Hasher;

/// Incrementally hashes a field path, from root to leaf, with 64-bit FNV.
#[derive(Default)]
pub struct PathHasher {
    hasher: fnv::FnvHasher,
}

// The whole FNV state is the `u64` returned by `finish()`, so seeding a new
// hasher with it via `with_key` yields an exact copy of the current state.
impl Clone for PathHasher {
    #[inline(always)]
    fn clone(&self) -> PathHasher {
        PathHasher {
            hasher: fnv::FnvHasher::with_key(self.hasher.finish()),
        }
    }
}

impl PathHasher {
    /// Hashes a complete path given its segments, from root to leaf.
    pub fn hash_path(segments: &[&[u8]]) -> u64 {
        let mut hasher = Self::default();
        for segment in segments {
            hasher.append(segment);
        }
        hasher.harvest()
    }

    /// Appends a new segment to our path.
    ///
    /// In order to avoid natural collisions (e.g. &["ab", "c"] and &["a", "bc"]),
    /// we write a null byte after each segment as a separator.
    #[inline]
    pub fn append(&mut self, payload: &[u8]) {
        self.hasher.write(payload);
        self.hasher.write(&[0u8]);
    }

    /// Returns the hash of the path appended so far.
    #[inline]
    pub fn harvest(&self) -> u64 {
        self.hasher.finish()
    }
}
```
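To illustrate how a hasher like `PathHasher` can be driven from root to leaf, here is a std-only sketch. Assumptions, all hypothetical: `Fnv1a` is a hand-written 64-bit FNV-1a hasher standing in for `fnv::FnvHasher` so no external crate is needed; `Json`, `collect_paths` and `example_doc` are illustrative names, not Quickwit APIs; only leaf paths are recorded, array elements share their parent's path, and duplicates collapse into a set.

```rust
use std::collections::BTreeSet;
use std::hash::Hasher;

/// Hypothetical stand-in for `fnv::FnvHasher`: 64-bit FNV-1a, state is one `u64`.
#[derive(Clone)]
struct Fnv1a(u64);

impl Default for Fnv1a {
    fn default() -> Self {
        Fnv1a(0xcbf2_9ce4_8422_2325) // FNV-1a 64-bit offset basis
    }
}

impl Hasher for Fnv1a {
    fn write(&mut self, bytes: &[u8]) {
        for &byte in bytes {
            self.0 ^= u64::from(byte);
            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3); // FNV 64-bit prime
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// Same shape as the commit's `PathHasher`; `Clone` is derived here because
/// the stand-in hasher is itself `Clone`.
#[derive(Default, Clone)]
struct PathHasher {
    hasher: Fnv1a,
}

impl PathHasher {
    fn hash_path(segments: &[&[u8]]) -> u64 {
        let mut hasher = Self::default();
        for segment in segments {
            hasher.append(segment);
        }
        hasher.harvest()
    }

    /// Writes the segment followed by a 0 byte, so ["ab", "c"] != ["a", "bc"].
    fn append(&mut self, payload: &[u8]) {
        self.hasher.write(payload);
        self.hasher.write(&[0u8]);
    }

    fn harvest(&self) -> u64 {
        self.hasher.finish()
    }
}

/// Hypothetical, simplified JSON value (all scalars collapsed into `Leaf`).
enum Json {
    Leaf,
    Array(Vec<Json>),
    Object(Vec<(String, Json)>),
}

/// Records one hash per distinct root-to-leaf path into `out`.
fn collect_paths(value: &Json, prefix: &PathHasher, out: &mut BTreeSet<u64>) {
    match value {
        Json::Leaf => {
            out.insert(prefix.harvest());
        }
        // Array elements share their parent's path.
        Json::Array(items) => {
            for item in items {
                collect_paths(item, prefix, out);
            }
        }
        // Each key extends a copy of the prefix instead of rehashing from the root.
        Json::Object(fields) => {
            for (key, child) in fields {
                let mut hasher = prefix.clone();
                hasher.append(key.as_bytes());
                collect_paths(child, &hasher, out);
            }
        }
    }
}

/// Builds {"user": {"name": "x", "tags": ["a", "b"]}, "id": 1}.
fn example_doc() -> Json {
    let user = Json::Object(vec![
        ("name".to_string(), Json::Leaf),
        ("tags".to_string(), Json::Array(vec![Json::Leaf, Json::Leaf])),
    ]);
    Json::Object(vec![("user".to_string(), user), ("id".to_string(), Json::Leaf)])
}
```

Running `collect_paths(&example_doc(), &PathHasher::default(), &mut set)` yields three hashes: `user.name`, `user.tags` (both array elements map to the same path) and `id`. Cloning the prefix at each object level hashes each key once per branch rather than rehashing the full path for every leaf, which is one reason a cheap `Clone` on the hasher is useful.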