
Remove duplicates from file-based history search #741

Merged (1 commit) on Feb 9, 2024

Conversation

@saep (Contributor) commented Feb 1, 2024

I've just started using nushell and I'm using ctrl+r a lot. I was surprised to see duplicates in the suggestions and I can't think of a good reason why there should be any. It is noisy and it might require additional key presses to select the thing I want. So I took a look at it and created a PR. ❤️

[screenshot: duplicate suggestions in the ctrl+r history search]

@fdncred (Collaborator) commented Feb 1, 2024

It seems like there may be two ways to do this.

  1. filter things like you're doing
  2. don't allow dupes to get put into history to begin with

I'm not sure which way is best.

One thing that concerns me is what the impact of filtering this way will be when there are tens of thousands of history entries.

@stormasm (Contributor) commented Feb 2, 2024

don't allow dupes to get put into history to begin with

This is the proper solution IMHO...
I don't really see any reason why someone needs to have duplicates in the history...
I have always hoped someone would come along and fix this...
I believe you could implement that solution with the same idea of using a HashSet.
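
A minimal sketch of that "don't store duplicates" idea over a plain in-memory Vec<String> (reedline's actual FileBackedHistory is file-backed and has its own API, so this is illustration only); instead of a HashSet it simply scans for an earlier occurrence and drops it, so a repeated command keeps only its most recent position:

```rust
/// Append a command to an in-memory history without creating duplicates:
/// any earlier occurrence of the same text is removed first, so the command
/// ends up in the most recent position. Illustration only, not reedline code.
fn push_unique(history: &mut Vec<String>, command: &str) {
    if let Some(pos) = history.iter().position(|c| c.as_str() == command) {
        history.remove(pos);
    }
    history.push(command.to_string());
}

fn main() {
    let mut history = vec!["ls".to_string(), "git status".to_string()];
    push_unique(&mut history, "ls");
    // "ls" now appears exactly once, at the most recent position.
    assert_eq!(history, ["git status", "ls"]);
}
```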

@saep (Contributor, author) commented Feb 2, 2024

The HistoryItem type contains a lot of metadata fields that might be useful for audit logs or statistical analyses. (Personally, I don't care about that and simply use time for commands where I do.)

don't allow dupes to get put into history to begin with

I feel a bit dumb that I haven't looked at the file "format" for the text-based history. For that, it probably makes more sense to not store duplicates. I'll take a look at that this weekend as well.

One thing that concerns me is what the impact will be filtering this way when there are 10s of thousands of history entries.

Reading the history items from the file is probably the performance bottleneck here.

Memory-wise, I don't think this is a serious issue. If you choose to have a history of 1_000_000 items, you probably have a computer that can handle it easily. However, to limit the memory usage in most cases, I have created another commit that filters the duplicates after filtering by the input. The worst-case memory usage is still the same: e.g. if all history items contain an "a" and you filter by "a", the set will have the same size. If there are a lot of duplicates, this might actually save memory.
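
A minimal sketch of the approach being described, using plain Strings rather than reedline's HistoryItem type (illustration only, not the PR's actual code): entries are first narrowed by the user's input and only then deduplicated with a HashSet, so the set never grows beyond the number of matches.

```rust
use std::collections::HashSet;

/// Return the history entries that match `query`, dropping duplicates and
/// keeping the first occurrence encountered. Illustration only: reedline's
/// real HistoryItem type carries extra metadata and lives behind the
/// History trait.
fn search_unique(entries: &[String], query: &str) -> Vec<String> {
    let mut seen = HashSet::new();
    entries
        .iter()
        .filter(|e| e.contains(query)) // filter by the user's input first...
        .filter(|e| seen.insert(e.as_str())) // ...then skip entries already yielded
        .cloned()
        .collect()
}

fn main() {
    let history = vec![
        "git status".to_string(),
        "ls".to_string(),
        "git push".to_string(),
        "git status".to_string(),
    ];
    // The set only ever holds entries that matched the query, so its size is
    // bounded by the number of matches, not the full history length.
    assert_eq!(search_unique(&history, "git"), ["git status", "git push"]);
}
```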

@fdncred (Collaborator) commented Feb 2, 2024

I'm kind of in the camp of fixing the problem vs fixing the symptom. The problem (maybe) is that we have dupes in our history. I'm not exactly sure how to fix that myself, because I think you'd always want the most recent duplicate to stay near the top of the history rather than a long way back.

Having said that, I'm not sure this works correctly with our menus. I'm trying to test with cargo run --example demo after checking out your PR. You can then hit ctrl+x for the history menu. I think this menu now shows unique items; however, the stats at the bottom of the history menu may be wrong. It also seems that we're allowed to page through pages that don't even exist by holding down ctrl+x or ctrl+shift+x. (The paging issue could be a separate bug unrelated to this PR.)

@saep changed the title from "Remove duplicates from history completer" to "Remove duplicates from file-based history search" on Feb 3, 2024
@saep (Contributor, author) commented Feb 3, 2024

I've read through a lot of the source code and I think that this is a reasonable short term solution to remove the duplicates from the file-based history.

If we want to avoid adding duplicates to the history file in the first place, the backend for that has to be architected differently. Since this seems to fall into the scope of #735, I'll add some of my observations and suggestions there.

@stormasm (Contributor) commented Feb 3, 2024

@saep thanks ! this looks great 👍
I tested it out and it appears to work fine with very minimal code changes...
We have a release coming up in a few days --- so we will hold off on landing this fix / change until after the release...
After the release passes on Feb 6th I will circle back and let other folks take a look at this change as well if they want to --- and do any testing as well...
but to me this looks like an excellent change to our code to avoid the history items being repeated....

@ClementNerma (Contributor) commented:

I strongly disagree with the idea of removing duplicates from the history. For a Ctrl-R search, sure, that makes sense. But not in general.

There are multiple reasons for that, the most obvious being backtracing: I want to be able to see what commands I executed, in order, with none missing. It's especially useful when trying to redo a set of instructions in order. That has happened to me multiple times, and I don't think it's that rare a scenario either.

So I think duplicates should not be removed from history, only from the fuzzy-finding search.

@stormasm (Contributor) commented Feb 3, 2024

@ClementNerma point well made and I understand where you are coming from but I am still leaning towards removing the duplicates...

But this shows why we want to give the user the ability to provide their own History implementation --- here is yet another example where removing duplicates would be the default scenario --- while users like yourself who want to keep the duplicates could easily replace the default History implementation with one that keeps them.

Or a History implementation that stores to a database like sled instead of SQLite, etc...

Or whatever other cool History features other developers come up with that are more useful for their scenarios.

@stormasm (Contributor) commented Feb 3, 2024

I want to be able to see what commands I executed in order without any missing.

It already does not do this... If you type ls three times or version three times in a row it is only shown once....

@ClementNerma (Contributor) commented:

Maybe it would be interesting to see what Nushell's choice would be on that matter, implement it as the default behaviour, and make it configurable for whoever wants to customize it?

@fdncred (Collaborator) commented Feb 3, 2024

I want to be able to see what commands I executed in order without any missing ...

In the sqlite history, if we were only keeping unique commands, I'd want to update the timestamp each time a duplicate command is executed and ensure our history pulls from the most recent one.

I don't want to type ls and then hit up arrow and get some other command. Obviously, ls would be duplicated because I use it all the time, but I agree that it's important to keep the history order intact. Although I'm not sure I care about old commands, you make a good point here @ClementNerma. You're kind of convincing me to your way of thinking. Thanks.

@fdncred (Collaborator) commented Feb 7, 2024

We talked about this today in our meeting (Feb 7th, 2024). This is how we decided that we'd like it to work. Maybe it's what you've already done. I'm not sure.

  1. We do not want to remove duplicates from our history.
  2. We want to do a "special" type of filtering that works like the example below.

Example: Here's the items in the history

git stash drop
git add .
git status | something | else
git status
git commit -m 'something'
ls
git push
git status

a. Hitting up with git s<tab> shows git status (since it was the most recent matching command).
b. We decide we don't want the plain "git status", so we hit up again and it shows git status | something | else, but it does not show the older plain git status again. (A sketch of this behaviour follows at the end of this comment.)

  • does that help you understand the direction we'd like to go?
  • do you have any questions about it?
  • would you be willing to modify your PR to work like this?
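
A minimal sketch of the requested behaviour, using plain strings (oldest entry first) rather than reedline's history API (the helper name is hypothetical, not the PR's code): duplicates stay in the history, but they are skipped while walking the matches from newest to oldest.

```rust
use std::collections::HashSet;

/// Walk the history from newest to oldest and yield every entry starting
/// with `prefix`, skipping command texts that were already yielded. The
/// history itself is never modified. Illustration only, not reedline's API.
fn prefix_matches<'a>(history: &'a [&'a str], prefix: &str) -> Vec<&'a str> {
    let mut seen = HashSet::new();
    history
        .iter()
        .rev() // newest entry first
        .filter(|cmd| cmd.starts_with(prefix))
        .filter(|cmd| seen.insert(**cmd)) // duplicates are skipped in the results only
        .copied()
        .collect()
}

fn main() {
    let history = [
        "git stash drop",
        "git add .",
        "git status | something | else",
        "git status",
        "git commit -m 'something'",
        "ls",
        "git push",
        "git status",
    ];
    // First "up" shows the most recent git status; the next shows the piped
    // variant; the older plain git status is never offered again.
    assert_eq!(
        prefix_matches(&history, "git s"),
        ["git status", "git status | something | else", "git stash drop"]
    );
}
```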

@stormasm (Contributor) commented Feb 8, 2024

@fdncred should I go ahead and close this PR?

Now that you have documented how a future PR should behave, the next developer can start from a clean slate...

In case anyone wants to move forward in the future they can review what you have written....

@fdncred (Collaborator) commented Feb 8, 2024

@stormasm I'd leave that up to @saep to decide the best way to move forward.

@fdncred marked this pull request as draft on February 8, 2024 13:11
@saep (Contributor, author) commented Feb 8, 2024

I've rebased on main and added a test specifying the behavior you want. For that, I had to move the duplicate tests back to the history module.

Since I've removed the call to History's count function, this implementation might be less optimal for the sqlite backend. However, it works, and optimizing that could be done in a separate pull request, possibly by adding an option to the search.

@saep marked this pull request as ready for review on February 8, 2024 18:00
@stormasm (Contributor) commented Feb 9, 2024

@saep thanks for making the modifications!
Excellent job 😄

@fdncred I tested this PR and it works as I believe we want it to work...
Would you mind testing it as well, to see if it matches what you explained?
Then I believe we can go ahead and land it...

@fdncred (Collaborator) commented Feb 9, 2024

I tested it with cargo run --example demo and cargo run --example demo --features=sqlite,bashisms, which I think should cover both FileBackedHistory and SqliteBackedHistory, and it all worked well. Thanks for this PR!

@fdncred merged commit ab1b47e into nushell:main on Feb 9, 2024
6 checks passed