Duplicate detection when Bloodhound#add(datums) is used #946

Closed
bitcity opened this issue Aug 14, 2014 · 4 comments

@bitcity
Contributor

bitcity commented Aug 14, 2014

The dupDetector is good for detecting duplicates between local and remote matches. However, if I use Bloodhound#add(datum) to occasionally append data to the search index, duplicates show up.

Is that expected behavior? If yes, should it be patched in typeahead.js, or should my code inspect the available list of suggestions and avoid duplicates manually?

@jharding
Contributor

That's the expected behavior. The dupDetector is meant for ensuring suggestions from remote do not duplicate suggestions gathered from local and prefetch. So for what you're doing, you'd have to add logic to prevent the adding of duplicate entries through Bloodhound#add.

> If yes, should it be patched in typeahead.js or should my code inspect the available list of suggestions and avoid duplicate manually?

I'm leaning towards the latter. Otherwise adding datums would take O(n*m) where n is the number of datums being added and m is the number of datums already present.

@bitcity
Contributor Author

bitcity commented Aug 15, 2014

If I'm not mistaken, all suggestions are stored in the datums array in search_index.js. There's no API exposed to access that. Even if we had access to it, the data is stored as an array of objects. Duplicate detection would be inefficient as you mentioned above.

However, if there were an associative array (with suggestions as its keys), the lookup for each entry would be O(1) (O(n) in total, where n is the number of datums being added). That said, I'm a bit hesitant to maintain a redundant data structure and keep it in sync with local/prefetch/remote, simply because maintaining another copy of the data is error-prone, especially as typeahead changes.

Do you have a suggestion on how I should go about implementing this?
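
For illustration, the keyed-lookup idea could be sketched outside the library roughly as follows. This is a minimal sketch, not typeahead.js code: createDedupingAdder, keyOf, and the stand-in fakeEngine are hypothetical names, and the wrapper only knows about datums added through it, so entries coming from local, prefetch, or remote are not tracked (which is exactly the sync concern raised above).

```javascript
// Hypothetical helper, not part of typeahead.js: wraps any object that has an
// add(datums) method (such as a Bloodhound instance) and skips datums whose
// key has already been added through this wrapper.
function createDedupingAdder(engine, keyOf) {
  const seen = new Set(); // keys of every datum added so far

  return function addUnique(datums) {
    const list = Array.isArray(datums) ? datums : [datums];
    const fresh = [];
    for (const datum of list) {
      const key = keyOf(datum);
      if (!seen.has(key)) { // average-case O(1) lookup per datum
        seen.add(key);
        fresh.push(datum);
      }
    }
    if (fresh.length > 0) {
      engine.add(fresh); // forward only the datums not seen before
    }
    return fresh;
  };
}

// Stand-in engine so the sketch runs without typeahead.js loaded.
const fakeEngine = {
  added: [],
  add(datums) { this.added.push(...datums); }
};

const addUnique = createDedupingAdder(fakeEngine, (datum) => datum.value);
addUnique([{ value: 'dog' }, { value: 'cat' }]);
addUnique([{ value: 'dog' }, { value: 'bird' }]); // 'dog' is skipped here
```

With a real engine, the first argument would be the Bloodhound instance, and keyOf would return whatever field makes two datums "the same" for your data.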

P.S. If there was a method Bloodhound#remove complementing Bloodhound#add, something that removes suggestions from the index, that would serve the purpose too.

@jharding
Contributor

> Do you have a suggestion on how should I go about implementing this?

I'd suggest forking typeahead.js, changing this line to this.index = o.index || new SearchIndex({, and then implementing a custom search index that does what you want. Or I suppose if you're going to fork the project, you could also just modify search_index.js directly.

> P.S. If there was a method Bloodhound#remove complementing Bloodhound#add, something that removes suggestions from the index, that would serve the purpose too.

See #652. With how the search index is currently structured, a remove would be a pain. However, I think I'm going to simplify the search index next time I focus on bloodhound and adding remove support will be one of the goals of that effort.

jharding added this to the v1.0.0 milestone Aug 24, 2014
@jharding
Contributor

This is now possible in v0.11 with usage of the identify option.
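
A rough sketch of what that might look like, assuming datums shaped like { id, value } (the field names and sample data are made up for illustration). The identify option is expected to return a unique id for each datum; the assumption here, based on the comment above, is that adding a datum whose id is already in the index replaces the stored entry instead of producing a second suggestion.

```javascript
// Assumption: datums look like { id, value }; adjust identify to your data.
function identify(datum) {
  return datum.id;
}

// Only build the engine when typeahead.js (v0.11+) is loaded on the page.
if (typeof Bloodhound !== 'undefined') {
  const engine = new Bloodhound({
    identify: identify,
    datumTokenizer: Bloodhound.tokenizers.obj.whitespace('value'),
    queryTokenizer: Bloodhound.tokenizers.whitespace,
    local: [{ id: 1, value: 'dog' }]
  });

  // id 1 is already present from local; id 2 is new.
  engine.add([{ id: 1, value: 'dog' }, { id: 2, value: 'cat' }]);
}
```

If your datums have no natural id, any stable string derived from the datum (for example its value) could serve as the return value of identify.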
