
Remove duplicates from file-based history search #741

Merged (1 commit) on Feb 9, 2024

Conversation

@saep (Contributor) commented Feb 1, 2024

I've just started using nushell and I'm using ctrl+r a lot. I was surprised to see duplicates in the suggestions and I can't think of a good reason why there should be any. It is noisy and it might require additional key presses to select the thing I want. So I took a look at it and created a PR. ❤️

[screenshot: duplicate suggestions in the ctrl+r history search]

@fdncred (Collaborator) commented Feb 1, 2024

It seems like there may be two ways to do this.

  1. filter things like you're doing
  2. don't allow dupes to get put into history to begin with

I'm not sure which way is best.

One thing that concerns me is what the impact of filtering this way will be when there are tens of thousands of history entries.

@stormasm (Contributor) commented Feb 2, 2024

don't allow dupes to get put into history to begin with

This is the proper solution IMHO...
I don't really see any reason why someone needs to have duplicates in the history...
I have always hoped someone would come along and fix this...
I believe you could implement that solution with the same idea of using a HashSet.
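
A minimal sketch of that "don't store duplicates" idea over a plain in-memory Vec<String> (reedline's actual FileBackedHistory is file-backed and has its own API, so this is illustration only); instead of a HashSet it simply scans for an earlier occurrence and drops it, so a repeated command keeps only its most recent position:

```rust
/// Append a command to an in-memory history without creating duplicates:
/// any earlier occurrence of the same text is removed first, so the command
/// ends up in the most recent position. Illustration only, not reedline code.
fn push_unique(history: &mut Vec<String>, command: &str) {
    if let Some(pos) = history.iter().position(|c| c.as_str() == command) {
        history.remove(pos);
    }
    history.push(command.to_string());
}

fn main() {
    let mut history = vec!["ls".to_string(), "git status".to_string()];
    push_unique(&mut history, "ls");
    // "ls" now appears exactly once, at the most recent position.
    assert_eq!(history, ["git status", "ls"]);
}
```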

@saep (Contributor, author) commented Feb 2, 2024

The HistoryItem type contains a lot of metadata fields that might be useful for audit logs or statistical analyses. (Personally, I don't care about that and simply use time for commands where I do.)

don't allow dupes to get put into history to begin with

I feel a bit dumb that I haven't looked at the file "format" for the text-based history. For that, it probably makes more sense to not store duplicates. I'll take a look at that this weekend as well.

One thing that concerns me is what the impact will be filtering this way when there are 10s of thousands of history entries.

Reading the history items from the file is probably the performance bottleneck here.

Memory-wise, I don't think this is a serious issue. If you choose to have a history of 1_000_000 items, you probably have a computer that can handle it easily. However, to limit the memory usage in most cases, I have created another commit that filters the duplicates after filtering by the input. The worst-case memory usage is still the same: e.g. if all history items contain an "a" and you filter by "a", the set will have the same size. If there are a lot of duplicates, this might actually save memory.
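
A minimal sketch of the approach being described, using plain Strings rather than reedline's HistoryItem type (illustration only, not the PR's actual code): entries are first narrowed by the user's input and only then deduplicated with a HashSet, so the set never grows beyond the number of matches.

```rust
use std::collections::HashSet;

/// Return the history entries that match `query`, dropping duplicates and
/// keeping the first occurrence encountered. Illustration only: reedline's
/// real HistoryItem type carries extra metadata and lives behind the
/// History trait.
fn search_unique(entries: &[String], query: &str) -> Vec<String> {
    let mut seen = HashSet::new();
    entries
        .iter()
        .filter(|e| e.contains(query)) // filter by the user's input first...
        .filter(|e| seen.insert(e.as_str())) // ...then skip entries already yielded
        .cloned()
        .collect()
}

fn main() {
    let history = vec![
        "git status".to_string(),
        "ls".to_string(),
        "git push".to_string(),
        "git status".to_string(),
    ];
    // The set only ever holds entries that matched the query, so its size is
    // bounded by the number of matches, not the full history length.
    assert_eq!(search_unique(&history, "git"), ["git status", "git push"]);
}
```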

@fdncred (Collaborator) commented Feb 2, 2024

I'm kind of in the camp of fixing the problem vs fixing the symptom. The problem (maybe) is that we have dupes in our history. I'm not exactly sure how to fix that myself, because I think you'd always want the most recent duplicate to stay near the top of the history rather than a long way back.

Having said that, I'm not sure this works correctly with our menus. I'm trying to test with cargo run --example demo after checking out your PR. You can then hit ctrl+x for the history menu. I think this menu now shows unique items; however, the stats at the bottom of the history menu may be wrong. It also seems that we're allowed to page through pages that don't even exist by holding down ctrl+x or ctrl+shift+x. (The paging issue could be a separate bug unrelated to this PR.)

@saep changed the title from "Remove duplicates from history completer" to "Remove duplicates from file-based history search" on Feb 3, 2024
@saep (Contributor, author) commented Feb 3, 2024

I've read through a lot of the source code and I think that this is a reasonable short term solution to remove the duplicates from the file-based history.

If we want to avoid adding duplicates to the history file in the first place, the backend for that has to be architected differently. Since this seems to fall into the scope of #735, I'll add some of my observations and suggestions there.

@stormasm (Contributor) commented Feb 3, 2024

@saep thanks ! this looks great 👍
I tested it out and it appears to work fine with very minimal code changes...
We have a release coming up in a few days --- so we will hold off on landing this fix / change until after the release...
After the release passes on Feb 6th I will circle back and let other folks take a look at this change as well if they want to --- and do any testing as well...
but to me this looks like an excellent change to our code to avoid the history items being repeated....

@ClementNerma (Contributor) commented:

I strongly disagree with the idea of removing duplicates from the history. For a Ctrl-R search, sure, that makes sense. But not in general.

There are multiple reasons for that, the most obvious being backtracing: I want to be able to see what commands I executed, in order, with none missing. It's especially useful when trying to redo a set of instructions in order. That has happened to me multiple times, and I don't think it's that rare a scenario either.

So I think duplicates should not be removed from history, only from the fuzzy-finding search.

@stormasm (Contributor) commented Feb 3, 2024

@ClementNerma point well made and I understand where you are coming from but I am still leaning towards removing the duplicates...

But this shows why we want to give the user the ability to provide their own History implementation --- here is yet another example where removing duplicates would be the default scenario --- while users like yourself who want to keep the duplicates could easily replace the default History implementation with one that keeps them.

Or a History implementation that stores to a database like sled instead of SQLite, etc...

Or whatever other cool History features other developers come up with that are more useful for their scenarios.

@stormasm (Contributor) commented Feb 3, 2024

I want to be able to see what commands I executed in order without any missing.

It already does not do this... If you type ls three times or version three times in a row it is only shown once....

@ClementNerma (Contributor) commented:

Maybe it would be interesting to see what Nushell's choice would be on that matter, implement it as the default behaviour, and make it configurable for whoever wants to customize it?

@fdncred (Collaborator) commented Feb 3, 2024

I want to be able to see what commands I executed in order without any missing ...

In the sqlite history, if we were only keeping unique commands, I'd want to update the timestamp each time a duplicate command is executed and ensure our history pulls from the most recent one.

I don't want to type ls and then hit up arrow and get some other command. Obviously, ls would be duplicated because I use it all the time, but I agree that it's important to keep the history order intact. Although I'm not sure I care about old commands, you make a good point here @ClementNerma. You're kind of convincing me to your way of thinking. Thanks.

@fdncred (Collaborator) commented Feb 7, 2024

We talked about this today in our meeting (Feb 7th, 2024). This is how we decided that we'd like it to work. Maybe it's what you've already done. I'm not sure.

  1. We do not want to remove duplicates from our history.
  2. We want to do a "special" type of filtering that works like the example below.

Example: Here's the items in the history

git stash drop
git add .
git status | something | else
git status
git commit -m 'something'
ls
git push
git status

a. Hitting up with git s<tab> shows git status (since it was the most recent matching command).
b. We decide we don't want the plain "git status", so we hit up again and it shows git status | something | else, but it does not show the older plain git status again. (A sketch of this behaviour follows at the end of this comment.)

  • does that help you understand the direction we'd like to go?
  • do you have any questions about it?
  • would you be willing to modify your PR to work like this?
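
A minimal sketch of the requested behaviour, using plain strings (oldest entry first) rather than reedline's history API (the helper name is hypothetical, not the PR's code): duplicates stay in the history, but they are skipped while walking the matches from newest to oldest.

```rust
use std::collections::HashSet;

/// Walk the history from newest to oldest and yield every entry starting
/// with `prefix`, skipping command texts that were already yielded. The
/// history itself is never modified. Illustration only, not reedline's API.
fn prefix_matches<'a>(history: &'a [&'a str], prefix: &str) -> Vec<&'a str> {
    let mut seen = HashSet::new();
    history
        .iter()
        .rev() // newest entry first
        .filter(|cmd| cmd.starts_with(prefix))
        .filter(|cmd| seen.insert(**cmd)) // duplicates are skipped in the results only
        .copied()
        .collect()
}

fn main() {
    let history = [
        "git stash drop",
        "git add .",
        "git status | something | else",
        "git status",
        "git commit -m 'something'",
        "ls",
        "git push",
        "git status",
    ];
    // First "up" shows the most recent git status; the next shows the piped
    // variant; the older plain git status is never offered again.
    assert_eq!(
        prefix_matches(&history, "git s"),
        ["git status", "git status | something | else", "git stash drop"]
    );
}
```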

@stormasm (Contributor) commented Feb 8, 2024

@fdncred should I go ahead and close this PR?

Now that you have documented how a future PR should behave, the next developer can start from a clean slate...

In case anyone wants to move forward in the future they can review what you have written....

@fdncred (Collaborator) commented Feb 8, 2024

@stormasm I'd leave that up to @saep to decide the best way to move forward.

@fdncred marked this pull request as draft on February 8, 2024 13:11
@saep (Contributor, author) commented Feb 8, 2024

I've rebased on main and added a test specifying the behavior you want. For that, I had to move the duplicate tests back to the history module.

Since I've removed the call to History's count function, this implementation might be less optimal for the sqlite backend. However, it works, and optimizing that could be done in a separate pull request, possibly by adding an option to the search.

@saep marked this pull request as ready for review on February 8, 2024 18:00
@stormasm (Contributor) commented Feb 9, 2024

@saep thanks for making the modifications!
Excellent job 😄

@fdncred I tested this PR and it works as I believe we want it to work...
Would you mind testing it as well, to see if it matches what you explained?
Then I believe we can go ahead and land it...

@fdncred (Collaborator) commented Feb 9, 2024

I tested it with cargo run --example demo and cargo run --example demo --features=sqlite,bashisms, which I think should cover both FileBackedHistory and SqliteBackedHistory, and it all worked well. Thanks for this PR!

@fdncred merged commit ab1b47e into nushell:main on Feb 9, 2024
6 checks passed