proposal: support matching inside string literals #7640

alevinval · 2023-07-15T19:58:19Z

Matching is in much better shape than what it used to, but there are still really counter-intuitive issues such as: #2110 This patch proposes two changes:

find_matching_bracket_fuzzy should not traverse parents if already on a valid bracket, this was not being respected as traverse parents was always set to true, probably an overlook after the refactor introduced in match pairs which don't form a standalone TS node #7242
Finally I propose a change: plain-text search should run even if the node has children. Why? In Misbehaving bracket-matching highlighting in rust lang #2110, when on top of the inner brackets, when inspecting the tree-sitter scopes you will notice it's a string literal node with 2 escape sequence child nodes, therefore plain-text matching never runs, and the user is confused. I also regularly run into similar issues on my day-to-day, as I work with YAMLs that have interpolation inside strings (eg something: "anything {{ version }}") but for some reason the node has a children (so 1 > 0) and again I can never match the inner {.

In general, the first fix is desirable: if you are lucky to be on top of a matching bracket inside a tree-site node without children, you would never be able to match anyway, because it would be traversing the parents and very likely there will be a matching node somewhere in the parent nodes, so I really think the first should stay no matter what. And this is really bad, because the highlight would correctly pick the inner match, and then when running mm it would behave differently and match something on the parent hierarchy.

I'm not an expert on the code-base and the issues you've run into in general, but I'd like to believe removing the constraint should make the majority of the users happy (I see there are still a few issues open about weird mm behaviour, I hope to address all of them here) and it would be very weird to run into performance problems for most of the people doing regular editing.

When the cursor is already on top of a valid bracket, then it should never traverse the parents. The documentation also states this: ``` // If the cursor is on an opening or closing bracket, the function // behaves equivalent to [`find_matching_bracket`]. ``` This was not true, as it was always beign set to true, regardless of being on top of a bracket or not.

…onsistent for vast majority of helix users

pascalkuthe · 2023-07-15T20:53:18Z

find_matching_bracket_fuzzy should not traverse parents if already on a valid bracket, this was not being respected as traverse parents was always set to true, probably an overlook after the refactor introduced in #7242

that special case isn't necessary anymore because we also look at unnamed nodes now (only named in the past). Every bracket gets its own unnamed TS lexical token. Its a bit subtle but it should work as intended for the most part. The only case I could where it doesn't again involves plaintext (comment) brackets. I honestly never really intended to match plaintext brackets I just do it to work around the c preprocessor. It's very much on purpose since otherwise using mm on arrows (or other stuff that actually isn't a bracket but contains a bracket char) wouldn't actually make it correctly jump to the parent scope. I guess we could try not transversing parents first and if that fails the try again with transversing parents to work around that.

Finally I propose a change: plain-text search should run even if the node has children. Why? In

#2110, when on top of the inner brackets, when inspecting the tree-sitter scopes you will notice it's a string literal node with 2 escape sequence child nodes, therefore plain-text matching never runs, and the user is confused. I also regularly run into similar issues on my day-to-day, as I work with YAMLs that have interpolation inside strings (eg something: "anything {{ version }}") but for some reason the node has a children (so 1 > 0) and again I can never match the inner {.

The condition is working as intended. String literals have multiple unnamed children (the quotes and the actual content). I specifically did not want to match inside string literals. The plaintext matching was only intended for very weird top-level nodes that basically indicate parts of the file that are plaintext only (C preprocessor tokens). (#2110 is about something else not the absence of highlights but the weird presence of them but that's been fixed btw.).

I am not' sure if I want to remove the restriction here. Plaintext matching can cause a ton of weird false positives. We can prevent false positives by descending to the unnamed child (instead of the named child) in which case it could be ok I guess. It might still falsely match on some punction (like a double arrow <=>, but I guess that's niche)

alevinval · 2023-07-15T21:40:09Z

You are absolutely right. I now realize I messed up the order, the first commit I introduced to get the second one working correctly and then lost track of the whole thing. I'm sorry about that, I really don't like distracting you when it's me getting confused.

I could still take a look at doing what you suggested (descending unnamed child as a last resort) and see if something worthwhile could come from it.

alevinval · 2023-07-15T21:45:29Z

Might be better to close this though, it may be confusing if someone stumbles into that mess of a PR.

pascalkuthe · 2023-07-15T22:01:16Z

No worries, I appreciate your contributions! This stuff is definitly a bit subtle.

I think I would prefer the stricter criterion to remain unchanged for mm. Plaintext matching just works a a bit differently and mixing the two kinds of matching can produce unexpected results. The current support is just a hack to support the C preprocessor and exclude everything else. I am actually very tempted to remove it from mm again.

I think what could work instead is to have a seperate key that always does plaintext matching (like mp or something). It's not as smooth but it makes conceptually sense to me.
Matching brackets inside a string literals is just a different operation than syntax based matching.

alevinval · 2023-07-16T18:41:39Z

I suspect mp would not really solve the issue, because Helix would not be showing visual highlights for these matches, the visual hint is essential in letting you know you can mm, without it you would not know if the current position is mp-able or not.

I understand your position that matching should stick to syntax only, for multiple reasons. Whereas I come at you with the PoV of an average user (eg. myself) that gets bamboozled when matching does not do what they expect.

What about a middle ground? Introducing a boolean configuration for matching, something like strict: true|false. On strict matching, you run entirely on TS matching, when not strict you get the plain-text fallbacks for special cases (being on top of a valid bracket, under certain conditions).

The find_pair logic could look like this:

diff --git a/helix-core/src/match_brackets.rs b/helix-core/src/match_brackets.rs
index 7fda6d7e..4f4d1cee 100644
--- a/helix-core/src/match_brackets.rs
+++ b/helix-core/src/match_brackets.rs
@@ -56,9 +56,11 @@ fn find_pair(
     doc: RopeSlice,
     pos_: usize,
     traverse_parents: bool,
+    strict: bool
 ) -> Option<usize> {
     let tree = syntax.tree();
     let pos = doc.char_to_byte(pos_);
+    let cursor_on_bracket = !strict && is_valid_bracket(doc.char(pos))
 
     let mut node = tree.root_node().descendant_for_byte_range(pos, pos)?;
 
@@ -74,7 +76,7 @@ fn find_pair(
 
                 // We return the end char if the cursor is either on the start char
                 // or at some arbitrary position between start and end char.
-                if traverse_parents || start_byte == pos {
+                if (traverse_parents && !cursor_on_bracket) || start_byte == pos {
                     return Some(end_char);
                 }
             }
@@ -89,7 +91,7 @@ fn find_pair(
             }
         }
 
-        if !traverse_parents {
+        if !traverse_parents || cursor_on_bracket {
             // check if we are *on* the opening pair (special cased here as
             // an opptimization since we only care about bracket on the cursor
             // here)
@@ -115,7 +117,10 @@ fn find_pair(
         node = parent;
     }
     let node = tree.root_node().named_descendant_for_byte_range(pos, pos)?;
-    if node.child_count() != 0 {
+    if !node
+        .children(&mut node.walk())
+        .all(|child| child.child_count() == 0)
+    {
         return None;
     }
     let node_start = doc.byte_to_char(node.start_byte());

Note the revisited condition at the end, between removing the constraint entirely, and being super strict, towards allowing any node without children, or nodes whose children are all leaves. I've not found false positives for the smoke testing I did, and it basically gives me what I was looking for in my day-to-day driving of Helix.

I will prepare a draft showing what a configurable solution could look like when I'm back from vacation.

alevinval added 2 commits July 15, 2023 21:45

discussion: if we did that, we would make the behaviour of matching c…

ad8f086

…onsistent for vast majority of helix users

alevinval marked this pull request as draft July 15, 2023 19:58

alevinval changed the title ~~enhancement/fix: intuitive match bracket behavior~~ proposal: support matching inside literals Jul 15, 2023

alevinval changed the title ~~proposal: support matching inside literals~~ proposal: support matching inside string literals Jul 15, 2023

alevinval closed this Jul 15, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

proposal: support matching inside string literals #7640

proposal: support matching inside string literals #7640

alevinval commented Jul 15, 2023 •

edited

Loading

pascalkuthe commented Jul 15, 2023

alevinval commented Jul 15, 2023 •

edited

Loading

alevinval commented Jul 15, 2023 •

edited

Loading

pascalkuthe commented Jul 15, 2023 •

edited

Loading

alevinval commented Jul 16, 2023

proposal: support matching inside string literals #7640

proposal: support matching inside string literals #7640

Conversation

alevinval commented Jul 15, 2023 • edited Loading

pascalkuthe commented Jul 15, 2023

alevinval commented Jul 15, 2023 • edited Loading

alevinval commented Jul 15, 2023 • edited Loading

pascalkuthe commented Jul 15, 2023 • edited Loading

alevinval commented Jul 16, 2023

alevinval commented Jul 15, 2023 •

edited

Loading

alevinval commented Jul 15, 2023 •

edited

Loading

alevinval commented Jul 15, 2023 •

edited

Loading

pascalkuthe commented Jul 15, 2023 •

edited

Loading