First attempt at mutable indexes for lunr.js 2.x #315
I need to take a closer look through the change, but I think the general approach might work: a subclass of the immutable index that adds mutability. I especially like the fact that it can then be packaged as an extension that users can opt into if they need it. Let me take a closer look through your implementation to see if there are any gotchas that you need to be aware of.
Sounds good - thanks for the quick response!
Some properties on the builder are shared with the index, so we can save a bit of serialization space by not serializing the exact same data twice
lunr.Builder.fieldTermFrequencies and lunr.Builder.fieldLengths have the exact same keys, and are both keyed by field refs. It saves us some space when serializing a mutable builder to save the list of field refs once, along with the values in these two objects, and recombine them into objects at load time
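The packing idea described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `packFieldData` and `unpackFieldData` are hypothetical helper names, and the sample field refs and values are made up.

```javascript
// Hypothetical helpers illustrating the idea; not taken from the PR.

// Store the shared keys (field refs) once, plus the two value lists
// in the same order as the keys.
function packFieldData(fieldTermFrequencies, fieldLengths) {
  var fieldRefs = Object.keys(fieldTermFrequencies)
  return {
    fieldRefs: fieldRefs,
    termFrequencies: fieldRefs.map(function (ref) { return fieldTermFrequencies[ref] }),
    lengths: fieldRefs.map(function (ref) { return fieldLengths[ref] })
  }
}

// Recombine the key list and the value lists into two objects at load time.
function unpackFieldData(packed) {
  var fieldTermFrequencies = {}
  var fieldLengths = {}
  packed.fieldRefs.forEach(function (ref, i) {
    fieldTermFrequencies[ref] = packed.termFrequencies[i]
    fieldLengths[ref] = packed.lengths[i]
  })
  return { fieldTermFrequencies: fieldTermFrequencies, fieldLengths: fieldLengths }
}

// Made-up sample data: both objects share exactly the same keys.
var sampleFrequencies = { "title/doc1": { green: 1 }, "body/doc1": { plant: 2, green: 1 } }
var sampleLengths = { "title/doc1": 1, "body/doc1": 3 }

var serialized = JSON.stringify(packFieldData(sampleFrequencies, sampleLengths))
var restoredFieldData = unpackFieldData(JSON.parse(serialized))
```

Each field ref string then appears once in the serialized output instead of twice, which is where the saving comes from.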
I added a few new commits - just trimming down the serialized size for mutable indexes a bit.
Otherwise, if you're performing multiple changes to the index in-between queries, you end up doing a lot of redundant work
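One common way to avoid that kind of redundant work is to mark the index as stale on every change and rebuild it only when the next query arrives. The sketch below is a toy illustration of that pattern under assumed names (`LazyIndex`, `rebuildCount`); it does not use lunr and is not necessarily how the PR handles it.

```javascript
// Toy, hypothetical index: changes only set a dirty flag, and the
// (expensive) rebuild happens at most once before the next search.
class LazyIndex {
  constructor() {
    this.documents = new Map() // ref -> text
    this.postings = new Map()  // term -> Set of refs
    this.dirty = false
    this.rebuildCount = 0      // only here to make the saving visible
  }

  add(ref, text) {
    this.documents.set(ref, text)
    this.dirty = true
  }

  remove(ref) {
    this.documents.delete(ref)
    this.dirty = true
  }

  rebuild() {
    this.postings = new Map()
    for (const [ref, text] of this.documents) {
      for (const term of text.toLowerCase().split(/\s+/).filter(Boolean)) {
        if (!this.postings.has(term)) this.postings.set(term, new Set())
        this.postings.get(term).add(ref)
      }
    }
    this.dirty = false
    this.rebuildCount += 1
  }

  search(term) {
    if (this.dirty) this.rebuild() // one rebuild covers all pending changes
    const refs = this.postings.get(term.toLowerCase())
    return refs ? Array.from(refs) : []
  }
}
```

With this shape, three changes followed by one search trigger a single rebuild rather than three.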
This PR looks great - it would be extremely useful for server side applications, or any use case requiring index maintenance. Anything I can do to help get this merged?
@k00p I put this here as a sort of proof-of-concept - I was actually thinking of publishing it as a separate extension for lunr.js. I've been using it on my own for a while now and it seems to work fine - I should probably just package it up on NPM =)
I went ahead and installed it as a node module directly from this pull request:
Note that the Test errors out of the box - the tests need to be added to test/env/file_list.json. I am still not getting it to work yet, but I will keep trying to figure it out.

Update: Have it working now but am still validating updates and removes. If you aren't concerned with the errors for the

Last Update: The tokenizer persistence issue ended up not being a big deal. I was concerned because I was attempting to use metadata to carry document data, but that was a fundamentally flawed strategy. The fix was to maintain a dictionary in addition to the index, and then ornament the search results from the dictionary. For my purposes, the increase in memory footprint is worth the gains in response times.
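The "dictionary in addition to the index" approach might look something like the sketch below. It assumes search results shaped like lunr's (`ref` and `score` per hit); `decorateResults`, `documentsByRef`, and the sample data are hypothetical names introduced for illustration, and the results array is hard-coded rather than produced by a real index.

```javascript
// Hypothetical helper: attach the full document to each search hit by
// looking its ref up in a dictionary kept alongside the index.
function decorateResults(results, documentsByRef) {
  return results.map(function (result) {
    return {
      ref: result.ref,
      score: result.score,
      // null if the dictionary and the index have drifted out of sync
      document: Object.prototype.hasOwnProperty.call(documentsByRef, result.ref)
        ? documentsByRef[result.ref]
        : null
    }
  })
}

// Made-up dictionary, updated whenever a document is added to or
// removed from the index.
var documentsByRef = {
  "1": { id: "1", title: "Green plant", author: "A. Gardener" },
  "2": { id: "2", title: "Green house", author: "B. Builder" }
}

// Stand-in for what a search call would return.
var sampleResults = [{ ref: "2", score: 0.9 }, { ref: "1", score: 0.4 }]

var decorated = decorateResults(sampleResults, documentsByRef)
```

The index then only has to hold what is needed for matching, at the cost of keeping the documents in memory a second time.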
So I finally got around to bundling this PR up into a standalone NPM package: https://www.npmjs.com/package/lunr-mutable-indexes
It's my first NPM package, so any feedback on how I could improve it would be great!
Just wanted to say that I'm using this and finding it extremely valuable. Would be great to see it as part of core Lunr.
I switched my focus as well to @hoelzro's lunr-mutable-indexes project. The benefits for server side index maintenance greatly outweigh the drawbacks, and so far the project "piggybacks" on lunr, so it should continue to be a mutually beneficial relationship between the two repos.
I'm going to go ahead and close this - I think that a separate module is the best place for this functionality, and I can always submit a new PR in the future if @olivernn wants me to fold this into lunr core.
@hoelzro This works great. I have server side code where the index is rather large and often updated. Thanks!
Hi there! I chimed in on a separate issue about mutable indexes for 2.x, and I took a stab at adding them for my own purposes. This PR (which I don't expect you to merge if you don't want to - I just wanted to share my work, get some review, and start a conversation) is the result of that work. I'll be trying this for my FTS TiddlyWiki plugin - if it's something you're not interested in merging in, I can always publish it as a separate module.
I made a few tradeoffs for this initial implementation:
- There is no equivalent of lunr(function() { ... }) for mutable indexes - I might add that later.
- builder.termIndex may build up when documents are deleted - I consider this to be acceptable since, in the kinds of documents I'm working with, terms will seldom disappear permanently.

JavaScript isn't a language that I "speak" fluently, so any feedback on idioms that I may have improperly used and how to correct them would be most welcome. I would appreciate any feedback on this, even "this won't work because of X, Y, and Z" because that would save me some headaches. =)