-
Notifications
You must be signed in to change notification settings - Fork 7.6k
Quick Open heuristic should prefer contiguous substrings #2068
Comments
Another example: do a quick open for |
Assigning to @peterflynn |
Actually, I think the reason this happens is because we walk right-to-left, on the theory that the rightmost segments are more specific. That makes sense at the segment level, but I don't think it makes sense at the character level. What if we walked right-to-left through segments, but within each segment we walk left-to-right? |
Reviewed. |
Bumping to medium priority--this is really breaking some common cases. For example, if you do Quick Open on "Commands", you get "CommandManager.js" as the first hit instead of "Commands.js". |
Nominating for sprint 17 since the new heuristic has slightly worse behavior in some cases. |
Yeah... the heuristic is definitely improved on the whole (the robustness to typos alone is worth it!), but there are definitely a lot of little cases where the top hit isn't what I'd expect. Here are some more examples:
I like the idea of walking left-to-right within each segment, but I don't think it would fix any of these ranking problems (nor the ones NJ mentioned above). It does seem worth fixing, since it leads to odd highlighting (e.g. "menus" highlights "Menus.js" when you'd expect "Menus.js"). But it seems lower priority. Also note that regardless of direction, because the matching is greedy there will always be cases where we fail to find the longest matching substring. E.g. left-to-right searching for "abcd" in "abcxxxabcd" would yield "abcxxxabcd" which isn't optimal either. |
Moving out to sprint 18. Somewhat risky, not an official feature. |
I've probably spent more time than was desired/anticipated on this, and I'm closer to a "solution" but not there yet. I put "solution" in quotes, because I don't think it's really possible to have the first matched item always be the one you'd anticipate, but it can be the case most of the time. We can probably make things better by tweaking the scoring, but I think relying purely on right-to-left walking is going to leave us with many cases that feel funny. My first thought was to find the longest contiguous substring and build a result around that. Unfortunately, finding the longest contiguous substring is slow (not just in running time complexity but in actual observed time on my machine). Beyond that, while I think having the longest contiguous substring would make things better, I think we'd still end up with suboptimal cases. I observed what other editors do and found a rough comparison order that I like. I say "rough" because I haven't implemented it yet, and I know there are little gremlins. I believe that this would produce fundamentally better results than what we have now.
I don't know if I'll have this done in time for the end of the sprint, especially given review. |
I think it would be ok to move this out to sprint 19. Good feedback, Kevin! |
Good thinking here. I think the general approach of right-to-left for the whole list of segments and left-to-right-with-contiguity for individual segments makes sense. (I'm assuming that by "split the string up into segments", you're talking about the string to be matched, not the query.) A couple of other thoughts:
|
@pthiess I'm making good progress now (TDD FTW in a case like this), but yeah, this looks like a "land early in sprint 19" thing given the timing. I am hoping to have a pull request up today. @njx: I don't know for sure yet, but I don't think we need to account for multiple segments in the query. "/" is considered a "special character" and automatically gets special treatment when searching and scoring in my new algorithm. I totally agree with having a decent set of representative cases and I already have a test function that takes a query and a list of strings to test and verifies that the strings are scored to appear in the order that the test provides. I will feed that a bunch of cases (you and @peterflynn have provided a bunch of great ones here) to ensure sane results. |
@dangoor I moved it to Sprint 19 and deleted it from the Trello card for Sprint 18 |
Another update: yesterday, I got the matching working and now I have the scoring working well. The results seem much better than the old algorithm. There's one bug left to fix and I need to write a lot of comments. |
Sweet! Looking forward to it. |
Fix for #2068: better QuickOpen heuristics
FBNC @njx |
Looks great. The only case that I found that seems a little weird to me is "cm.js" matches "CodeHintManager.js" before "CommandManager.js"--I would expect the latter to come first because the two uppercase characters are contiguous in the query. But we're never going to make Quick Open read my mind in every case :) |
spec/live
Result: The highlighting in the first few entries suggests they're being sorted to the top because of the "v" and "e" being scattered throughout the path, which seems weird. It seems like the heuristic should prefer the contiguous "live" at the beginning of the path entry.
The text was updated successfully, but these errors were encountered: