Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve history API #680

Closed
wants to merge 18 commits into from

Conversation

ClementNerma
Copy link
Contributor

@ClementNerma ClementNerma commented Dec 7, 2023

This PR reworks the history API, and brings several improvements:

  • In the History trait, adding and replacing entries are now two distinct methods, reducing confusion
  • HistoryItem values always have an ID, reducing confusion
  • History values are responsible for generating IDs for new entries
  • FileBackedHistory now longer has duplicate IDs in the history

FileBackedHistory as well as SqliteBackedHistory now generate IDs randomly, using rand's SmallRng. This RNG is created from a system-provided source of entropy during the history creation, and is designed to generate numbers quickly.

This PR induces two main downsides:

First, creating a FileBackedHistory or SqliteBackedHistory may now cause a slight delay for gathering entropy. I tested on two machines and got less than 1s for 1 million RNG creation, but there may be edge cases where it takes longer).

Also, the History API contains breaking changes. In the current context, I don't consider this to be a large problem as no one seems to use this API anyway - up until two days ago, it was impossible to implement your own History due to HistoryItemId not being constructable from the outside.

Finally, a new dependency is pulled: rand. It only uses the small_rng and getrandom features and so it's pretty minimal, but it's still another dependency. It is used for non-clashing history item IDs generation.

Overall, this PR aims to make the API easier to write, read and use, while also reducing confusion in how the history works. It also removes what could be a bug in the future in FileBackedHistory which previously could have duplicate IDs for entries.

@ClementNerma
Copy link
Contributor Author

Hmmm seems like some tests don't pass because the IDs are expected to be contiguous. This wasn't mentioned in the docs, I'll see what can I do about this.

@fdncred
Copy link
Collaborator

fdncred commented Dec 7, 2023

Thanks for the PR. We're always interested in making reedline better.

However, someone is going to have to convince me that creating a FileBackedHistory or SqliteBackedHistory now generate IDs randomly is going in the right direction.

@sholderbach
Copy link
Member

The goal of this is still a bit unclear to me, you describe changes but not what is problematic with the current system of ids.

Having the IDs private before #677 allowed us to guarantee that those possibly implicit guarantees are upheld.

Maybe worth writing out what the current IDs all mean, what requirements we impose to implement session isolation/cwd aware hints etc. The constraint of the existing text file format only storing the command needs to be upheld for Nushell backwards compatibility unless we have a clear story,

src/engine.rs Show resolved Hide resolved
@ClementNerma
Copy link
Contributor Author

Before I go further and rework the histories' internals, @fdncred may I have your opinion on this?

@ClementNerma
Copy link
Contributor Author

ClementNerma commented Dec 7, 2023

@fdncred and @sholderbach thanks for your answers!

The goal of this is still a bit unclear to me, you describe changes but not what is problematic with the current system of ids.

Having the IDs private before #677 allowed us to guarantee that those possibly implicit guarantees are upheld.

Maybe worth writing out what the current IDs all mean, what requirements we impose to implement session isolation/cwd aware hints etc. The constraint of the existing text file format only storing the command needs to be upheld for Nushell backwards compatibility unless we have a clear story,

Having the IDs private prevented anyone to make a custom History. Which is a problem, because having a basic file or SQLite implementation doesn't cover anyone's needs.

For me, using random IDs is the best choice: we don't make any assumption on what an ID represents - it is only that, an ID. Not an index, or anything else.

It can be generated in a variety of manners depending on the needs. For an in-memory history for instance, it could be a simple counter. In most cases, a random generator (like I used) would be the best choice to avoid ID re-use.

With this PR we have a system where IDs are black boxes, and are only used to refer to a specific history item - nothing more.

Navigation in the history, ordering etc. is the responsibility of the History itself.

This will allow anyone to build a custom History without having to worry about:

  1. How the IDs can or cannot be generated
  2. How they should be handled
  3. What are guarantees they hold
  4. What guarantees the navigation system (History::navigate) must satisfy based on the provided IDs

@fdncred
Copy link
Collaborator

fdncred commented Dec 7, 2023

By the pattern of PRs, I'm beginning to wonder if me merging those 3 other ones was the right thing to do. I'm wondering if you separated them to help us or if you're just sending PRs as you discover the need instead of a holistic approach. This PR in particular seems like quite a big departure. I'm wondering there are specific things that could be improved with our history system without breaking it this way.

I hope that doesn't sound mean or rude, but we have to consider what's best for reedline but also for nushell first because reedline was created for nushell.

As far as the opinion you're wanting from me, I'm not sure what you're asking about. If it's about sholderbach's comments, then, when it comes to reedline, I'm always happy to agree with him due to his vast knowledge of reedline and how reedline is used in nushell along with his general expertise with Rust.

Again, we're excited to have people excited about reedline and trying to make it better. So, thank you for that.

@sholderbach
Copy link
Member

Mhh to me what the capabilities and requirements for your History implementation are, is still very vague.
What would prevent you from implementing that inside reedline itself, where we can more easily control the assumptions of the overall interface, before preeemptively making everything public, making it much harder to reason about what has to hold for all the other components to properly work.

Reedline needs to improve in this area for Nushell as well, but designing against a vague outside target requires us to dedicate time to reason about much more as to not box in our Nushell-internal design space. From our perspective having a locked trait is perfectly fine.
It seems based on your recent PRs that you don't yet have a working History implementation due to the private things, maybe worth spinning this up in your reedline fork for us to take a shared look at after it satisfies your requirements.

@ClementNerma
Copy link
Contributor Author

By the pattern of PRs, I'm beginning to wonder if me merging those 3 other ones was the right thing to do. I'm wondering if you separated them to help us or if you're just sending PRs as you discover the need instead of a holistic approach.

This was indeed the case in the beginning, up until the moment the API was entirely usable from the outside (my last PR).

Now I have a pretty solid idea of what the history's API should look like (in my opinion). This PR isn't the first of many others like the previous ones, it is the final changes (more important this time) to re-design part of the history API as well as its internals.

This PR in particular seems like quite a big departure. I'm wondering there are specific things that could be improved with our history system without breaking it this way.

I don't think so - I personally think using HistoryItemId to hold indexes is wrong and error-prone. Also, the current design with save being used to either insert a new item or replace it depending on weither its id property is set is confusing. Same thing for the fact that serializing / deserializing will require to .unwrap() the id afterwards.

I hope that doesn't sound mean or rude, but we have to consider what's best for reedline but also for nushell first because reedline was created for nushell.

Absolutely, and I think it would benefit nushell as well by making the history API easier to use, which would result in nushell's code related to this part easier as well.

As far as the opinion you're wanting from me, I'm not sure what you're asking about. If it's about sholderbach's comments, then, when it comes to reedline, I'm always happy to agree with him due to his vast knowledge of reedline and how reedline is used in nushell along with his general expertise with Rust.

Oh ok, I thought you were the main maintainer as you merged my last PRs.

Again, we're excited to have people excited about reedline and trying to make it better. So, thank you for that.

You're welcome! I really want to make reedline the best possible both for nushell and for external projects ; I think it's an amazing tool but with great limitations (most related to terminal emulators, like the famous resizing problems). Making it as usable and complete as possible would allow it to become a standard as a line editor crate, and would greatly simplify many projects that require this kind of tools.

I think there are still areas where reedline can improve: auto-inserting closing symbols like quotes and parenthesis could be interesting for instance. But overall I feel like it's mostly a finished "product" which only requires some polishing and one breaking change: the one suggested in this PR.

@ClementNerma
Copy link
Contributor Author

ClementNerma commented Dec 7, 2023

Mhh to me what the capabilities and requirements for your History implementation are, is still very vague. What would prevent you from implementing that inside reedline itself, where we can more easily control the assumptions of the overall interface, before preeemptively making everything public, making it much harder to reason about what has to hold for all the other components to properly work.

Because why keep anything private when we can make it customizable from the outside? The PR I'm proposing doesn't make the history system anymore complex than it already is ; in fact I'd argue it is simpler this way. IDs are black boxes and we don't make any assumption about them - that's simpler than making assumptions everywhere about what they mean and taking the risk of introducing bugs if we handle them the wrong way somewhere.

Reedline needs to improve in this area for Nushell as well, but designing against a vague outside target requires us to dedicate time to reason about much more as to not box in our Nushell-internal design space. From our perspective having a locked trait is perfectly fine. It seems based on your recent PRs that you don't yet have a working History implementation due to the private things, maybe worth spinning this up in your reedline fork for us to take a shared look at after it satisfies your requirements.

This PR will contain the updated code for FIleBackedHistory notably (I'm currently working on it), resolving multiple issues this struct had, notably IDs clashing. I had several situations where the history didn't keep track of some informations because of that.

It's very hard to fix it without changing the underlying file format as well as the way HistoryItemId are handled.

@ClementNerma
Copy link
Contributor Author

ClementNerma commented Dec 7, 2023

Adding to this:

It seems based on your recent PRs that you don't yet have a working History implementation due to the private things, maybe worth spinning this up in your reedline fork for us to take a shared look at after it satisfies your requirements.

You will be able to see it in ReShell (which I briefly talked about in another PR). This one will be based on sled , which I feel like is a LOT better than using a basic file or using an SQLite database.

I'd argue it would even be a better idea to replace both FileBackedHistory and SqliteBackedHistory with one based on sled but that's another matter and I think it wouldn't be a good idea to remove existing history implementations anyway.

Copy link

codecov bot commented Dec 8, 2023

Codecov Report

Merging #680 (eff5630) into main (ff5fdb0) will increase coverage by 0.72%.
The diff coverage is 86.20%.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #680      +/-   ##
==========================================
+ Coverage   49.17%   49.90%   +0.72%     
==========================================
  Files          46       46              
  Lines        7924     8086     +162     
==========================================
+ Hits         3897     4035     +138     
- Misses       4027     4051      +24     
Files Coverage Δ
src/history/base.rs 84.58% <100.00%> (+0.63%) ⬆️
src/history/cursor.rs 99.06% <100.00%> (+0.04%) ⬆️
src/history/item.rs 51.02% <100.00%> (+6.12%) ⬆️
src/history/file_backed.rs 83.09% <95.54%> (+1.15%) ⬆️
src/engine.rs 4.99% <0.00%> (-0.01%) ⬇️
src/history/sqlite_backed.rs 74.40% <77.38%> (-0.35%) ⬇️

@ClementNerma
Copy link
Contributor Author

CI now passes 🥳

So, there is one big problem with this PR: the underlying data (text file for FileBackedHistory and database file for SqliteBackedHistory) have changed. Text file has a slightly different format (to store IDs), and the database has one new column to store the indexes, required to order rows.

This means there needs to be a migration for older files - this can be done easily with a simple function call though. But this needs a migration is required. I'm currently searching for a solution to avoid this.

@ClementNerma
Copy link
Contributor Author

Ok so I managed to resolve the problem for the SQLite database: by default the user_version pragma is always 0 and it is always retrieved during startup.

So I perform a simple check:

  • If user_version = 0 (the default):
    • If the history table does not exist, we create it using the new schema
    • Otherwise we migrate its data to the new schema, keeping existing IDs
    • Then we set user_version to 1
  • If user_version = 1, this means the schema exists and is the latest version
  • If it's another value, we fail (just like before)

This way, existing databases will be migrated seamlessly and there won't be the slightest performance hit once the migration has been done.

@ClementNerma
Copy link
Contributor Author

The PR is now complete. Some tests still need to be added to check edge cases and how the migration behaves in different scenarios, but all the changes and features are fully implemented.

To sum up a little bit: the main change is the History trait changed. This led to changes in how FileBackedHistory and SqliteBackedHistory behave, which then led to a change in their underlying format (text file format / SQL schema).

Concretely, the new system should be invisible to users, apart from the API change (which is trivial to adapt if they want to keep the same behaviour as they currently have).

Both history structs now use an internal rand::rngs::SmallRng whose entropy is gathered from the operating system. This allows to generate random numbers quickly, more than with a cryptographically-secure RNG which is not required in the current context. Given the History trait is responsible for the generation, anyone can use its own RNG if required, but the default should cover almost all possible cases.

A slight initialization cost will be induced by the entropy gathering, but it should be quick enough to be unnoticeable (less than 10 µs (microseconds) average on two of my machines in release mode).

Performance will be slightly lower given the additional need to handle IDs, but it should be low enough to be entirely unnoticeable. Some benchmarks could be conducted on these changes but I'm confident it won't be perceptible to end users.

Safety is increased ; some long-standing bugs have been resolved in this PR, including history truncating creating IDs collision. String escaping has also been added to the SQLite backend when searching with a pattern (using a LIKE pattern).

History backends still mostly behave as they used to: FileBackedHistory still uses a line-based text file format, and SqliteBackedHistory still uses a SQLite database with a single history table and the same columns, aside from one new column used for indexing.

FileBackedHistory gets a compatibility layer ; it now uses a new line format to include the ID inside, but will still support the old file formats without any changes. Old-format and new-format lines can be mixed together inside the same file.

SqliteBackedHistory gets a migration process ; the first time the database is opened, it will migrate the previous history data to a new one using a transaction. This will only happen once, and should be almost instantaneous even with a large history. Interruption of the migration process should not leave the history in a corrupted state, as the transaction will rollback.

Tests are mostly unchanged, they have been slightly updated to reflect the new API but are otherwise identical - so the code coverage doesn't changes and the tests are still as reliable as previously.

An interesting add in the future could be to add, either as a builtin or as an example, an History backed by a sled database - which is perfectly fit for this case, more than both a text file or an SQLite database. The existing structs will be kept anyway for compatibility, especially considering the SqliteBackedHistory is currently used by nushell.

@fdncred and @sholderbach may I have your opinion on this? If you agree with it, I'll write the new tests to ensure everything is working perfectly (notably the migration process), and then we'll be able to merge this PR.

Copy link
Collaborator

@fdncred fdncred left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking the time to try and improve reedline and for the summary at the bottom, it helps.

I see some good work in here that on some level makes dealing with history more intuitive. I also see some things that I don't agree with and/or have questions about.

I'm really concerned about landing this on many levels. Here's some of what I'd need to feel better about in order for me to vote yes to land this PR. I'm not quite sure how to get to a good level of comfort. Suggestions welcome.

  • I need to feel good about how it's NOT going to affect nushell users.
  • I need to feel certain that the changes made here are going in the right direction of where we want to take reedline in the future.
  • I need to feel like this is better than what it was before.
  • I need to feel like our tests cover as much as possible in order to allay my concerns.
  • I think we need to have some consensus of a few nushell core-team members (up/down vote) since this is such a large PR and really changes everything related to history.

// History is "full", so we delete the oldest entry first,
// before adding a new one.
self.entries.pop_front();
self.len_on_disk = self.len_on_disk.saturating_sub(1);
let first_id = *(self.entries.first().unwrap().0);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a fan of unwrap. I'd like to see them all removed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really have an idea of how to get rid of it - a previous instruction checks if we have a last element. Unfortunately, IndexMap doesn't directly implement .pop_front() (which is a decision on their own) so we need to get the first ID somewhere.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not objecting to getting the first ID somewhere. I'm objecting to using unwrap. I'd guess that you can match on it and handle errors and use map_err if the error is a different flavor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be misleading. Using a .map_err() in a situation we've proven to meet the condition required for unwrapping may lead other developers to think there is a case where this could fail - while it can't, that's why I used .unwrap() (which I always use very sparingly).

Writing .first().unwrap() is more readable, less confusing, and more correct as if we don't meet this condition something's very wrong and we cannot continue processing anything because some of the code is completely invalid.

src/history/file_backed.rs Outdated Show resolved Hide resolved
}

/// this history doesn't replace entries
fn replace(&mut self, h: &HistoryItem) -> Result<()> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the point of replace if neither FileBacked or SqlBacked are using it? Also, seems like the name should be update vs replace.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They should use it, but this hasn't been implemented yet. For now I've left these parts untouched.
I feel like changing their behaviour would be a bad idea right now as it would change how Nushell behaves.

The method separation is useful for clarifying behaviour, and here we can clearly see a "feature" is missing, while it wasn't obvious at all previously.

For histories that support replacing, this will be useful. I added it because reedline requires histories to replace history items. It could have been a call to update, but it's how reedline is designed, so I preferred to not change that.

src/history/sqlite_backed.rs Outdated Show resolved Hide resolved
"
create table history (
idx integer primary key autoincrement,
id integer unique not null,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get the point of having an idx primary key with autoincrement which is guaranteed to be unique and also having an id.

Also, if you're going to add new columns, this is why more_info/ExtraInfo exists. It was put there to future-proof the db schema.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The starting point of this PR is that IDs should be random. So we have a query that tells "get the X next entries in the history starting from item Y" and we don't have a timestamp, we need an index for that.

Random IDs don't enable new features, they simplify how we think about IDs, and also solve some longstanding issues like ID clashes in filled FileBackedHistory.

I thought about more_info but 1) I feel like every structured information should be strictly explicit and typed and 2) we can't create an index on this if we put an index and other things together in it. Also, I wasn't sure weither Nushell was using it or not.

Also, this makes it more composable for future updates: if Nushell wants to use this field for other things, it won't have to deal with an index already being contained inside.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, one nice "side-effect" of this PR is that the migration process is re-usage ; meaning that if Nushell decides to store new informations in the future, it will be trivial to add new data by simply re-using the same migration codebase.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The starting point of this PR is that IDs should be random. So we have a query that tells "get the X next entries in the history starting from item Y" and we don't have a timestamp, we need an index for that.

Still don't get it. It seems odd to have a primary key that we essentially don't use because we have a different "primary key" not called a primary key that we're using.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We absolutely use the id column: that's the one used for inserting, updating, removing, and fetching indexes. Everything is based on that column, idx is only used for a specific situation: when we want to query >1 items from an existing one in either direction.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, good. maybe it's the name that i'm hung up on then. id and idx are very similar. maybe query_idx or something similar. it's just not intuitive having such similar names.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah renaming the column makes sense, I'll do that.

src/history/sqlite_backed.rs Outdated Show resolved Hide resolved
@ClementNerma
Copy link
Contributor Author

I need to feel good about how it's NOT going to affect nushell users.

This PR is designed to not affect everyone in a perceptible manner. All the changes are hunder the hood. The only difference is the API revamp which needs a little change (concretely, we're talking propably less than a minute of code update, that's really a detail).

I need to feel certain that the changes made here are going in the right direction of where we want to take reedline in the future.

That entirely depends on your vision of the project, but I feel like reedline would be the best at being both a reference implementation and solid ground for Nushell AND being a nice line editing library to use in custom shells.

I need to feel like this is better than what it was before.

I feel it is way better, but that's a personal take. We now have a clearer ID system, less error-prone, a cleaner API.

I need to feel like our tests cover as much as possible in order to allay my concerns.

Absolutely, which is why I said I would write the new tests if this PR is approved (what I mean by that is if you're okay with merging the PR at the condition of it being correctly tested, I'll write the tests).

I think we need to have some consensus of a few nushell core-team members (up/down vote) since this is such a large PR and really changes everything related to history.

Well, yes and no actually. It mostly works like previously, it's mostly under-the-hood changes. Especially the SQLite backend, which Nushell uses, uses an almost identical system: you just have one more column for indexes, and that's all.

But I understand your concern and given reedline's under the "reponsability" of Nushell it would be a good thing to get an approval from Nushell's team.

@fdncred
Copy link
Collaborator

fdncred commented Dec 8, 2023

Well, yes and no actually.

Clearly, this is a breaking change and changes a ton of guts. The history is all dealt with differently now that you've added a new index column. You try to maintain backwards compatibility as much as possible, which is appreciated, but any reedline user will have to deal with your changes if we accept this PR. There's no getting around it. I understand your point though.

@ClementNerma
Copy link
Contributor Author

Well, yes and no actually.

Clearly, this is a breaking change and changes a ton of guts. The history is all dealt with differently now that you've added a new index column. You try to maintain backwards compatibility as much as possible, which is appreciated, but any reedline user will have to deal with your changes if we accept this PR. There's no getting around it. I understand your point though.

This PR is designed to make changes seamless. Everything that worked previously still works now. If we list all the things that change for users, it is:

  • The new (breaking) API for History
  • The fact that the new SQLite schema and the new text file format can contain new informations
  • The fact that the SQLite database now uses unordered IDs

No one has implemented a custom History before given the previous visibility of some items, so the only real problem that may arise is the fact the text file format changes (the SQLite schema is backwards-compatible after all, aside from the ID column).

But I think it is such a small number of users with such a minimal impact that this isn't a major problem ; it would be a bigger problem if Nushell used the text file format as some tools may have relied on it ; but fortunately it uses the SQLite database instead.

@stormasm
Copy link
Contributor

stormasm commented Dec 8, 2023

@ClementNerma we are happy you are taking the time to review and understand our code better...

It is going to take our team some time to digest and understand the ramifications of your changes as well as have some internal discussions about the work you have done so that we can move forward and give you some feedback.

I will review your code as well and get a better understanding of it...

My first question for you is this ?

In our current history implementation by default we save the same entry numerous times.

For example if I type

version
ls
version
ls
version
ls
history
ls
history
ls

We will save all of those ls entries as well as all of the history entries and version entries.

I would like to have one ls and one history and one version across my entire history file...

How easy would it be to do this with your current solution just out of curiosity ?

Thanks for your work --- we need folks like yourself who are willing to take on the work and stay with it over time...

Our concern is that you come in and make a bunch of changes which we are not totally able to understand the motivation for them and then we are responsible over the longer term maintaining that code...

Hope you understand and are willing to work with us on the reedline communities behalf.

@ClementNerma
Copy link
Contributor Author

@ClementNerma we are happy you are taking the time to review and understand our code better...

It is going to take our team some time to digest and understand the ramifications of your changes as well as have some internal discussions about the work you have done so that we can move forward and give you some feedback.

Of course! It's not a trivial change after all.

We will save all of those ls entries as well as all of the history entries and version entries.
I would like to have one ls and one history and one version across my entire history file...
How easy would it be to do this with your current solution just out of curiosity ?

The SQLite backend can just perform a full-string search - check if the history table already contains a row with the provided command.

The file backend would require to perform a full search on the history. A way to avoid that could be to construct an in-memory HashSet in parallel of the IndexMap used for the entries, but it would increase memory usage up to 2x.

Side note: I looked at a good chunk of reedline's code and saw it was very intelligently optimized in many places. So it seems like performance is really an important part of its design. In which case, adding a third (optional) backend for the history could be greatly beneficial: basing a new one on sled would support both increasing performance compared to SQLite (notably no more SQL parsing to perform) as keeping a low memory usage just like the file backend as most of the data would only stay on the disk.

If we go this way, it will be the subject of another PR, but I think it's good to keep that in mind.

Thanks for your work --- we need folks like yourself who are willing to take on the work and stay with it over time...

Our concern is that you come in and make a bunch of changes which we are not totally able to understand the motivation for them and then we are responsible over the longer term maintaining that code...

Hope you understand and are willing to work with us on the reedline communities behalf.

It's a pleasure to work on this project! Rest assured, if this PR is merged, I will still be here to support it if new features need to be added for instance.

I have three main motivations with this PR:

  • Make reedline's internals and public API cleaner ;
  • Ensure perfect, drop-in replacement for Nushell in terms of internals and make the API migration super easy to do ;
  • Make the history work in a way that's future proof and the least error-prone possible

Also, reedline is the base module I use for the REPL of my own shell, which is my daily driver. As much as like Nushell, I strongly prefer imperative-style programming as well as some syntax tricks and features that Nushell can't (or won't) implement. So I also have an incentive to make it better in order to benefit both Nushell and my own project ;)

@stormasm
Copy link
Contributor

stormasm commented Dec 9, 2023

@ClementNerma I reviewed your PR and now have a better understanding of the work you have done...

First of all let me say that @sholderbach is the lead person on reedline and @fdncred also knows a lot about reedline but will defer to @sholderbach for final approval on anything major like this PR.

I am just another member of the nushell core team but with no expertise on reedline...

As I stated in my previous comments we are happy to have folks like yourself who are willing to contribute to the reedline community but you should understand that we are incredibly stretched in our ability to manage all of the work that we have to deal with both in the reedline community and nushell community as a whole.

With that said the chance of this PR landing is low.

I am not going to close it because that would be up to @sholderbach or @fdncred to make the final call but in order for you to be able to be a valuable member of the community we are going to have to take a different approach.

We want you to help us maintain and make reedline better over the long term so don't take this the wrong way...

Our team is much happier with smaller more focused PRs with an end design goal that has been well thought out and discussed prior...

So I would suggest entering into more discussion in the reedline discord channel with your ideas and possibly maintaining reedline by assisting with and commenting on PRs as they come in...

Overall I like your idea of having swappable History Implementations on top of a well defined History API but in order to get there its going to take a crawl, walk, run approach which is how we like to phrase it...

Also I personally do not like having the HistoryItem IDs locked down and fixed --- The History Items IDs should be created dynamically each time Reedline fires up...

And storing the ItemID's in the history.txt file is just not good IMHO.

It bloats the size of the history.txt file dramatically and makes it almost impossible to post process the history.txt file if someone wants to use it for other purposes.

Having the very clean, nice, simple history.txt file with just the command is the way to go...

@ClementNerma
Copy link
Contributor Author

As I stated in my previous comments we are happy to have folks like yourself who are willing to contribute to the reedline community but you should understand that we are incredibly stretched in our ability to manage all of the work that we have to deal with both in the reedline community and nushell community as a whole.
...
Our team is much happier with smaller more focused PRs with an end design goal that has been well thought out and discussed prior...

I understand that, after all its just a FOSS project, not a paid product.

So I would suggest entering into more discussion in the reedline discord channel with your ideas and possibly maintaining reedline by assisting with and commenting on PRs as they come in...

Overall I like your idea of having swappable History Implementations on top of a well defined History API but in order to get there its going to take a crawl, walk, run approach which is how we like to phrase it...

Seems like a good idea!

Also I personally do not like having the HistoryItem IDs locked down and fixed --- The History Items IDs should be created dynamically each time Reedline fires up...

Allow to me disagree on that. IDs are used to differentiate items, how can you re-generate them on-the-fly? Don't forget entries can be inserted by another concurrent process interacting with the same history file / database.

And storing the ItemID's in the history.txt file is just not good IMHO.

It bloats the size of the history.txt file dramatically and makes it almost impossible to post process the history.txt file if someone wants to use it for other purposes.

Having the very clean, nice, simple history.txt file with just the command is the way to go...

I totally agree with that, but it's hard to keep it simple. How do you identify uniquely a set of entries created during runtime, knowing that you may later have to synchronize the file to disk, discover there are new entries, and insert them at a previous position in the list? We need to have something that is entirely unique to each entry, which is not possible with a plain file.

If for instance you have one ls in memory, synchronize to disk, and find another ls in a previous position - how do you identify each one?

@stormasm stormasm closed this Jan 11, 2024
@ClementNerma
Copy link
Contributor Author

ClementNerma commented Jan 30, 2024

I've thought about this PR a bit lately, and thought that we could do this another way:

What if HistoryItemId were only identifying for a single session? This would allow to greatly simplify data storage.

For instance, let's say we keep the same file format as before. We store entries one by one, and at runtime we associate each entry with an ID (could be an incremented counter). This could also be done for the SQL database (even though I'd like to keep things the way I did them in this PR).

This way we wouldn't have to change the format of the text-based format: simple lines with command text, nothing more.

What do you think about that? Would it make a merge considerable?

EDIT: I've updated my branch accordingly, tests all pass. This PR now uses the exact same file format than before.

@stormasm
Copy link
Contributor

@ClementNerma I am going to give you some preliminary ideas here --- this is just a starting point so please don't start coding yet --- lets just talk about a design going forward...

You might want to open up an issue and we can talk more about this there...

Here are my preliminary thoughts on this...

History - Menus - Completer can be referred to as HMC components

The Big Idea concept is a Complete Refactor / Rewrite of Reedline without having History / Menus / Completer (HMC)
so tightly embedded into the Engine ---- but instead API entry points into the Engine which would enable
decoupled HMC components...

What if HistoryItemId were only identifying for a single session? This would allow to greatly simplify data storage.

Yes --- I like this idea / most importantly greatly simplify the storage...

I want the whole history API to be a drop in replacement to Reedline in the future --- that is the end futuristic goal...

So imagine developers coming along and having the ability to write their own History implementation and it would
attach to our simple History API which is versatile...

Now moving on my "grand vision" for Reedline is to have the ability to have a simple Reedline with

  • No History
  • No Menus
  • No Completers

Based on an API system that allowed you to Shut OFF History / Menus / Completers

This is for example purposes only --- most people would not want to shut it off but it shows
how you can configure Reedline with your own components or a lightweight Reedline with no components...

And or drop in your own History / Menus / Completers based on a well thought out API that enabled gluing on to a core Engine that accepted these components...

That is where I would like to end up...

The History API could be step one --- but it would entail lots of design to be able to pull it out of the Engine
and put back in History API entry points...

You could create another Reedline-History crate in your own repo to do the experimentation or something to that effect...

Please start an issue so we can continue the discussion there....

Not sure if you are up for this project --- but I think it could be a cool longer term effort that would take a while to implement...

Open to all sorts of ideas / strategies / criticisms of this concept --- I am just throwing ideas out there to see what people think...

Nu-cli should eventually be able to accept the alternative Reedline as it is being developed and then eventually
swap out the new Reedline with the current Reedline version...

Just so you can see my ideas....
@maxomatic458 @ysthakur @fdncred

@stormasm
Copy link
Contributor

stormasm commented Jan 30, 2024

Just writing stuff down here as an example...

If you look at the basic example in Reedline...

let sig = line_editor.read_line(&prompt)?;

Why does the read_line method have to take a prompt ---- I never understood that design choice ?

It should just work like this...

let sig = line_editor.read_line()?;

@ysthakur
Copy link
Member

ysthakur commented Jan 30, 2024

@stormasm That would require the prompt to be stored inside the Reedline object, so mutating it from outside would become a bit tougher. It would definitely be convenient if you don't want to mutate the prompt, though.

Edit: Actually, never mind, you could just do line_editor.with_prompt(prompt) before each invocation of read_line()

@stormasm
Copy link
Contributor

@ysthakur thank you ! for thinking about the prompt topic --- and updating the idea 😄

@cactusdualcore
Copy link
Contributor

@ysthakur GIven that Prompt is a trait and not a structure with strings, it feels odd to change the actual prompt object. The trait methods are called every time it's rendered anyway. If you look at examples/custom_prompt.rs. You could mutate internal prompt state and branch the current prompt based on that (or even include such information in the prompt e.g. current time).

Yes, I think the current prompt API is not working to make that easy (you need to use interior mutability, unfortunately), but it probably is "the right thing to do".

@ClementNerma ClementNerma mentioned this pull request Feb 6, 2024
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants