Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CLOSED] Fix for #2068: better QuickOpen heuristics #2346

Open
core-ai-bot opened this issue Aug 29, 2021 · 24 comments
Open

[CLOSED] Fix for #2068: better QuickOpen heuristics #2346

core-ai-bot opened this issue Aug 29, 2021 · 24 comments

Comments

@core-ai-bot
Copy link
Member

Issue by dangoor
Wednesday Jan 02, 2013 at 17:10 GMT
Originally opened as adobe/brackets#2462


The new heuristics don't relate so much to longest substring as they do to trying to find contiguous matches around characters in "special" positions in the string.


dangoor included the following code: https://github.com/adobe/brackets/pull/2462/commits

@core-ai-bot
Copy link
Member Author

Comment by dangoor
Wednesday Jan 02, 2013 at 17:11 GMT


Adding a link for convenience: Fix for #2068

@core-ai-bot
Copy link
Member Author

Comment by peterflynn
Tuesday Jan 08, 2013 at 01:55 GMT


Almost done reviewing -- will wrap up later tonight.

@core-ai-bot
Copy link
Member Author

Comment by peterflynn
Tuesday Jan 08, 2013 at 08:27 GMT


Overall on the heuristic: definitely feels much improved!

I'm still feeling like camelcase matches come out a little too low, though I won't say it necessarily needs to be addressed right now -- I think I probably use camelcase search queries with Quick Open more heavily than most users. Here are a few example cases:

  • "ECH" ranks "SpecHelper" above "EditorCommandHandlers"/-test
  • "DMan" ranks "CommandManager" above "DocumentManager"
  • "EUtil" lists "FileUtils" before "ExtensionUtils" & "EditorUtils"

I have a few half-formed ideas about how we could fix cases like that. E.g. if two consecutive specials are matched, also give credit for (some fraction of) a contiguous match for the whole range of chars in between. Or, slightly penalize contiguous matches that span a special without starting on one.

Anyway, unless you think there's something very easy we could do here, I'm happy to just spin off a bug and then deal with it later :-)

@core-ai-bot
Copy link
Member Author

Comment by peterflynn
Tuesday Jan 08, 2013 at 08:36 GMT


One final thought -- the match heuristic code is getting pretty big. Should it go into its own module? E.g. utils/StringMatch.js. (If so I'd suggest doing the move as its own commit, so that other changes resulting from the code review are easier to see in a diff).

Ok, whew! Done reviewing.

@core-ai-bot
Copy link
Member Author

Comment by dangoor
Wednesday Jan 09, 2013 at 15:18 GMT


The stringMatch code is a bit over 500 lines (out of a 1360 line file). It doesn't seem too unwieldy to me. But, it's possible that the string matching could be useful for things that are not "QuickOpen" (I'm thinking of a command line :) and it may be cleaner to have it separate from that angle. I agree that moving it should be a separate commit.

@core-ai-bot
Copy link
Member Author

Comment by dangoor
Wednesday Jan 09, 2013 at 16:16 GMT


I've moved StringMatch into its own file. Note that this does break the PHP QuickOpen extension at least:

https://github.com/aonic/brackets-QuickOpenPHP/blob/master/main.js

I think a good practice in a case like this would be to re-export stringMatch from QuickOpen with a deprecation warning and remove it for good in a couple of sprints. But, we may be early enough in the project now that we can just ask the QuickOpenPHP author to change the extension. What do you think?

@core-ai-bot
Copy link
Member Author

Comment by peterflynn
Thursday Jan 10, 2013 at 01:17 GMT


Done re-reviewing -- getting close now! Thanks for all the thorough responses and refactorings!

@core-ai-bot
Copy link
Member Author

Comment by peterflynn
Thursday Jan 10, 2013 at 01:23 GMT


Actually one more thing. I happened to hit an odd bug yesterday that still repros with your latest (if I patch it so Go to Definition works, that is):

  1. Open WorkingSetView
  2. Go to Definition search on "hadledo" --> no results
  3. Go to Definition search on "handledo" --> _handleDocumentSelectionChange() appears in results

It seems like any subset of a query with results should include all those same results. Or at least, I can't think of any reason why "hadledo" wouldn't match "_handleDocumentSelectionChange"...

If this turns out to be tricky we can spin off a separate bug to unblock the pull request though -- whatever the issue is here, I'm not seeing it often...

@core-ai-bot
Copy link
Member Author

Comment by dangoor
Thursday Jan 10, 2013 at 03:10 GMT


It looks to me like your "hadledo" example above is another example of the need for backtracking. I'll take a look into that in the morning.

@core-ai-bot
Copy link
Member Author

Comment by dangoor
Monday Jan 14, 2013 at 17:52 GMT


The latest commit fixes the "hadledo" problem and also makes "sru" prefer SpecRunnerUtils to SpecRunner.js.

nj mentioned forward tracking as an alternative to backtracking, with the possibility of trying multiple paths and locating the one with the best score. I think the new algorithm works well for a large majority of cases, and I believe that it's likely to process many search strings on a single pass through without backtracking. I have spotted one case where testing a couple of paths could make things better (searching for "quickopen" ends up matching QuickOpenCSS/main.js because it searches the last segment first), but it would require a more significant rejiggering to change the way the last segment preference is handled. I did actually consider it when adding the backtracking, but opted to leave that part as is.

@core-ai-bot
Copy link
Member Author

Comment by peterflynn
Wednesday Jan 16, 2013 at 01:45 GMT


Still trying to wrap my head around deadBranches and backtrackTo a little better. Will be back online later tonight to continue reviewing...

@core-ai-bot
Copy link
Member Author

Comment by dangoor
Wednesday Jan 16, 2013 at 02:23 GMT


I wonder if I can come up with some ASCII art to make them clearer. The key is that the normal pattern prefers the special characters, but in so doing it skips over possible matches. So, I used backtrackTo to keep taking us back one special character at a time (while pulling off every other character that may have matched in between), and then we take it forward consecutively from there. We didn't get a match in the specials, but maybe there's one lurking between those last two specials we checked.

But, a problem remains: what happens if we match a consecutive character, then switch back to hitting specials and then hit the end of the string again without a complete match? That's where deadBranches comes in. At the time we set backtrackTo, we also have positively identified that the part of the query after queryCounter does not appear after that special character. If we find ourselves heading down that path again, we need to stop looking for specials because we're not going to find what we're looking for. Without deadBranches, it would keep trying to match up the specials. I did try a simpler approach where it just keeps track of the highest special it should scan to, but it turned out that that approach would actually not do specials scanning when it should.

This is akin to dynamic programming, if not exactly that. stringMatch is not exactly a generalized routine... we have some idea how it's used, and I tried to match the algorithm to that.

@core-ai-bot
Copy link
Member Author

Comment by dangoor
Wednesday Jan 16, 2013 at 20:31 GMT


@peterflynn and I worked through an example on IRC of how the backtracking works and I added that example in the doc for _generateMatchList (in commit 112b206) in hopes that how the matching works will be a bit clearer.

@core-ai-bot
Copy link
Member Author

Comment by peterflynn
Wednesday Jan 16, 2013 at 23:14 GMT


Ok, I think it's all starting to make sense to me :-)

Here are a few things I think we might want to capture in the docs in some way:
(note: I haven't yet read through the big block comment added in 112b206, so apologies if some of this is already in there)

  • We only backtrack() when we're exhausted both special AND normal forward searches past that point, for the query remainder we currently have. For a different query remainder, we may well get further along - hence deadBranches[] being dependent on queryCounter; but in order to get a different query remainder, we must give up one or more current matches by backtracking.
  • Normal "any char" forward search is a superset of special matching mode -- anything that would have been matched in special mode could also be matched by normal mode
  • backtrack() always goes at least as far back as str[backtrackTo-1] before allowing forward searching to resume
  • When deadBranches[qi] = si it means if we're still trying to match queryStr[qi] and we get to str[si], there's no way we can match the remainer of queryStr with the remainder of str -- either using specials-only or full any-char matching.
  • We know this because deadBranches[] is set in backtrack(), and we don't get to backtrack() unless either:
    1. We've already exhausted both special AND normal forward searches past that point
      (i.e. backtrack() due to strCounter >= str.length, yet queryCounter < query.length)
    2. We stopped searching further forward due to a previously set deadBranches[] value
      (i.e. backtrack() due to strCounter > deadBranches[queryCounter], yet queryCounter < query.length)

Hopefully that is all correct! If not lmk...

@core-ai-bot
Copy link
Member Author

Comment by peterflynn
Wednesday Jan 16, 2013 at 23:15 GMT


Also needs a (hopefully trivial) merge with master before this is mergeable

@core-ai-bot
Copy link
Member Author

Comment by dangoor
Thursday Jan 17, 2013 at 02:15 GMT


Yes, that is how it all works. It's not very important, but there is one minor adjustment. You say:

  • Normal "any char" forward search is a superset of special matching mode -- anything that would have been matched in special mode could also be matched by normal mode

could be, because those special characters would be traversed, but it does not happen. It doesn't happen because the specials are traversed first and only after that fails does it resort to matching the normal characters. So, as it progresses forward character-by-character it will compare a special if it hits one, but it won't be a match because it had already compared it.

Otherwise, everything you said is spot on. Maybe I'll just add your points in directly, because I think that someone else's interpretation of that code (plus the comments that I already have there) can only help someone new who's approaching it for the first time.

And, yes, the merge is trivial. I did that merge when I created the branch for the performance improvement yesterday.

@core-ai-bot
Copy link
Member Author

Comment by peterflynn
Thursday Jan 17, 2013 at 04:09 GMT


Hmm, ok I just ran across another case where it seems to fail to find a match: open StringMatch.js and search for "_computerangesa" (or any longer substring of _computeRangesAndScore) -- it won't show any results. If I take out the "s" ("_computeRangeAndScore") then it matches.

@core-ai-bot
Copy link
Member Author

Comment by peterflynn
Thursday Jan 17, 2013 at 04:12 GMT


Also one case of funny scoring: if I search for "jsutil," it ranks JSLintUtils.js above JSUtils.js. It seems like both the longer contiguous match and the preference for shorter strings should work in JSUtils' favor, so I'm not sure what's happening...

I tried turning on DEBUG_SCORES, but I'm only seeing the total number in the Quick Open results list -- not the breakdown of individual scoring attributes. Is there more to it than just changing the initializer from false to true?

@core-ai-bot
Copy link
Member Author

Comment by dangoor
Thursday Jan 17, 2013 at 13:02 GMT


I'll look into the "_computerangesa" case and the scoring. Scoring is easy to tweak and will never be 100% perfect all the time. The current algorithm counts uppercase letters after lowercase ones as special (the camelCase pattern), so JSLintUtils.js has an extra special than JSUtils.js. I'll see if there's something that makes sense to tweak.

When DEBUG_SCORES is on, hover over the score to see the individual parts (it's a tooltip).

@core-ai-bot
Copy link
Member Author

Comment by njx
Thursday Jan 17, 2013 at 15:55 GMT


I haven't been following the algorithm in detail, but I wonder if we should treat all uppercase letters as "special", not just ones after lowercase letters, precisely because of cases like "JSUtils", where conceptually the "U" really is the start of a "word", and arguably the "S" is too (because it's part of an acronymish abbreviation standing in for a word). I think most contiguous strings of uppercase letters in identifier names tend to be abbreviations like this.

@core-ai-bot
Copy link
Member Author

Comment by dangoor
Thursday Jan 17, 2013 at 16:11 GMT


@njx that's certainly an option (which I considered just now, but found another way to accomplish the same thing). I'm guessing that we'll turn the knobs a bit on the scoring parameters over time to improve how the matches feel (because it's pretty subjective). I found a different tweak this morning that I think will work out nicely (see commit f325030 if interested).

@peterflynn good catch on the _computerangesa search. You had asked at one point if there was a need for backtrackTo to be able to move forward. I couldn't think of a case, but this problem was actually an instance of that very problem. As I said in the commit message, backtrackTo was my starting point for the backtracking, but it turned out that it wasn't necessary and was even harmful.

So, the fix here actually made things a bit simpler. When we determine that past point X in the string, the last Y parts of the query can't find a match, we just keep track of that. We won't go hunting beyond that point for a match for that part of the query, and if we need to backtrack we'll make sure that we rewind to the previous special before point X. deadBranches is all of the bookkeeping we need.

@core-ai-bot
Copy link
Member Author

Comment by peterflynn
Friday Jan 18, 2013 at 00:37 GMT


The code changes all look good. I'm just going to run on this branch for a while longer to make sure I don't hit any other cases that seem off -- and if not, I think we're good to merge!

@core-ai-bot
Copy link
Member Author

Comment by dangoor
Friday Jan 18, 2013 at 05:16 GMT


Awesome. Thanks again for digging into this one and sticking with it.

When this merges, I think I'll redo my pull request for the performance improvement. That code is really quite straightforward and I think we should get it in soon, even if not sprint 19.

@core-ai-bot
Copy link
Member Author

Comment by peterflynn
Friday Jan 18, 2013 at 21:47 GMT


Alright, played with it a bunch more all all still seems well -- time to merge!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant