Proof of concept: Lua plugins #2203

Closed
wants to merge 6 commits into from

sharkdp
Owner

@sharkdp sharkdp commented May 29, 2022

This is a proof-of-concept implementation of something that we discussed amongst the maintainers recently: a plugin mechanism for bat. The idea of this PR is to get the discussion started. And to answer a few questions. Most importantly: do we really want/need this?

This whole idea came up because there are a lot of feature requests that keep popping up on bat's issue tracker that we previously always declined, mostly because we wanted to keep bat focused on its core functionality. And because of the additional maintenance overhead:

On the other hand, most of these functionalities are actually quite useful! A lot of workarounds have been suggested. Lots of people wrote their own wrapper functions. And with bat-extras, we even have a dedicated project working on extending bat's functionality. The "problem" with all of these solutions is that they are not directly included in bat, so you need to remember the name of the wrapper scripts. And you need to install them separately.

Here, we try to follow a different approach: instead of forcing users to wrap bat, we enable users to add functionality to bat by providing a plugin mechanism. We could still maintain those plugins here in this repository. But (1) we could easily argue that those plugins are not part of bat's core functionality, especially if all of them were opt-in, (2) it would be easier to maintain this functionality... or to throw it out again, and (3) users could easily customize those plugins or write their own.

In this PoC, plugins can be enabled by adding lines like the following to your bat config file:

--load-plugin "uncompress.lua"
--load-plugin "curl.lua"
--load-plugin "directories.lua"

Plugins are written in Lua. For now, there is just one type of plugin: a "preprocess" plugin that can modify the input to bat. These plugins have to provide a preprocess function that will be called for every input path to bat. The plugins then have the option to return a different path instead. This can be used to hand back paths to temporary files with the output of the preprocessing. For example, directories.lua looks like this:

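-- Called by bat for every input path; returns the path bat should read.
function preprocess(path)
    -- is_dir is not part of standard Lua; presumably bat provides it in this PoC
    if is_dir(path) then
        -- Write the directory listing to a temporary file
        local tmpfile = os.tmpname()
        -- Note: a path containing a single quote would break this quoting
        os.execute("ls -alh --color=always '" .. path .. "' > '" .. tmpfile .. "'")
        return tmpfile
    else
        -- Not a directory: hand the path back unchanged
        return path
    end
end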
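To make the call pattern concrete, here is a minimal host-side sketch in Python rather than Lua. It assumes that plugins are called in load order and that each plugin receives the path returned by the previous one; the PoC may well do this differently. The names `directories_plugin` and `run_plugins` are made up for illustration.

```python
import os
import tempfile
from typing import Callable, Iterable, List

# Hypothetical model: a plugin is just a function mapping a path to a path.
Preprocess = Callable[[str], str]


def directories_plugin(path: str) -> str:
    """Rough Python analogue of directories.lua: replace a directory by a listing."""
    if not os.path.isdir(path):
        return path  # not a directory: hand the path back unchanged
    fd, tmpfile = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as out:
        for name in sorted(os.listdir(path)):
            out.write(name + "\n")
    return tmpfile


def run_plugins(paths: Iterable[str], plugins: List[Preprocess]) -> List[str]:
    """Call every plugin for every input path (assumed: chained, in load order)."""
    result = []
    for path in paths:
        for preprocess in plugins:
            path = preprocess(path)
        result.append(path)
    return result


if __name__ == "__main__":
    # Directories come back as temp-file paths, regular files unchanged.
    print(run_plugins([".", __file__], [directories_plugin]))
```

With three plugins and 22 inputs, this loop makes 3 x 22 = 66 calls, which matches the invocation count quoted in the benchmark section.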

Obviously, this is not a great plugin architecture. It's extremely restrictive. And the interface between bat and the plugins is very limited. I could easily imagine a lot of different things that we would like to expose to the plugins. Or functionality that we would like to provide (e.g. to add new command-line options to bat). But it's somewhat surprising that this simple interface is (up to a few limitations) powerful enough to handle four of the five feature requests listed above:

Showcase

gzip support: [screenshot]

CURL support: [screenshot]

bat <directory> support: [screenshot]

Preview binary files: [screenshot]

Executable size, Benchmarks

I chose Lua because I heard that it is cheap to embed it into existing applications. And also fast. I had never programmed in Lua until today, so I would probably have preferred another scripting language personally. But I don't think we want to ship a Python interpreter with bat.

But obviously, this still comes with some overhead in terms of binary size:

bat-master: 4.69 MB (4,685,824 bytes)
bat-2203:   5.05 MB (5,050,368 bytes)

Concerning startup time: the overhead here seems to be pretty much negligible. With three plugins activated and executed (but in the trivial "don't touch this path" mode) and 22 input files passed to bat (i.e. 3 x 22 = 66 Lua function invocations), we get a slowdown of less than a millisecond:

Command                  Mean [ms]    Min [ms]   Max [ms]   Relative
./bat-master src/*.rs    5.2 ± 0.4    4.5        7.4        1.00
./bat-2203 src/*.rs      5.6 ± 0.4    4.9        7.9        1.08 ± 0.12
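For reference, the overhead can be worked out from the figures above. This is plain arithmetic on the reported numbers. The per-call value is only a rough upper bound, since the 0.4 ms difference also includes interpreter setup and is about the same size as the reported ± 0.4 ms spread.

```python
# Numbers copied from the size listing and the benchmark table above.
size_master = 4_685_824   # bytes
size_pr = 5_050_368       # bytes
mean_master_ms = 5.2
mean_pr_ms = 5.6
invocations = 3 * 22      # three plugins, 22 input files

size_delta = size_pr - size_master
size_growth = size_delta / size_master
time_delta_ms = mean_pr_ms - mean_master_ms
per_call_us = time_delta_ms / invocations * 1000

print(f"binary size: +{size_delta} bytes (+{size_growth:.1%})")
# -> binary size: +364544 bytes (+7.8%)
print(f"startup: +{time_delta_ms:.1f} ms, at most ~{per_call_us:.0f} us per call")
# -> startup: +0.4 ms, at most ~6 us per call
```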

@Enselic
Collaborator

Enselic commented Jun 1, 2022

Very interesting! I think you are onto something very powerful and useful here. Some high level remarks/questions:

  • A traditional plug-in system does not require explicitly enabling plug-ins, but instead uses all plugins in say ~/.config/bat/plugins. Maybe we should also behave like that?
  • I think Lua is a perfectly reasonable choice as a scripting language, but I think we should consider a "binary" based interface, so that any language and runtime could be used to write plugins. For example: plug-ins get the path in argv and print a new path to stdout. Or maybe something even more sophisticated.
  • It can be tricky to make cargo install bat also install plug-ins, which I think is necessary to handle some use cases out-of-the-box. We could maybe make some plug-ins internal/intrinsic, but then a binary architecture is probably out of the question.
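A minimal sketch of what such a "binary" interface could look like from the host side, assuming the simplest protocol mentioned above: the path goes in as the last command-line argument and the replacement path comes back on stdout. The function name `call_plugin` and the fallback behaviour are assumptions of this sketch, not something the PR implements.

```python
import subprocess
import sys
from typing import Sequence


def call_plugin(command: Sequence[str], path: str) -> str:
    """Run one plugin process: path in via argv, replacement path out via stdout."""
    try:
        proc = subprocess.run(
            [*command, path], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return path  # plugin missing or failed: keep the original path
    new_path = proc.stdout.strip()
    return new_path or path  # empty output means "leave this path alone"


if __name__ == "__main__":
    # Stand-in plugin written inline so the example is self-contained;
    # a real plugin could be any executable in any language.
    demo_plugin = [sys.executable, "-c", "import sys; print(sys.argv[1] + '.preview')"]
    print(call_plugin(demo_plugin, "notes.gz"))
```

Every call spawns a process, so the cost per plugin is dominated by the startup time of whatever runtime the plugin uses.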

@keith-hall
Collaborator

I had similar thoughts. I like the idea in general, and have some concerns which I'd like us to discuss before we get too deep:

  • You mention Lua is good for embedding, and it means we don't have to ship a Python interpreter. I'd argue that none of us know Lua very well, so maintaining plugins and reviewing PRs is going to be hard for us. I like Martin's direction of not being tied to a particular scripting language, or even scripting. If we were to use Python plugins by default, as an example, and the user has no Python interpreter, bat could just skip processing the plugins. But I do like how startup time is barely affected in this PoC! :)
  • How will this affect our diagnostics and ability to debug issues? Would --diagnostics also append all plugin code? Seems a bit noisy? Maybe just a diff from the released plugins somehow?
  • Is it really necessary to use temporary files instead of stdout? I realize it may be in some cases, but IMO would be nice to avoid when we can.
  • I didn't quite grok why it was necessary to move from pub(crate) to pub for some struct fields. But it could be useful to pass the syntax name to the plugin, or have a plugin resolve which syntax to highlight with.

@sharkdp
Owner Author

sharkdp commented Jun 4, 2022

A traditional plug-in system does not require explicitly enabling plug-ins, but instead uses all plugins in say ~/.config/bat/plugins. Maybe we should also behave like that?

Hm. I was thinking that plugins would be similar to syntaxes or themes. There could be builtin plugins. And in addition, there could be user plugins in ~/.config/bat/plugins. It might make sense to enable everything in ~/.config/bat/plugins automatically (since the user controls that folder). But I thought it would be good (at least in the beginning), to require opt-in for all builtin plugins. Doing the same for user-plugins would then be more consistent maybe.
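The two policies discussed here can be written down as a small resolution function. This is only a sketch: the function name, the `auto_enable_user` switch and the `*.lua` file pattern are assumptions made for illustration.

```python
from pathlib import Path
from typing import Iterable, List


def resolve_plugins(builtin: Iterable[str], user_dir: Path,
                    opted_in: Iterable[str], auto_enable_user: bool) -> List[str]:
    """Return the plugin file names that would be loaded, in a stable order."""
    opted = set(opted_in)
    # User plugins are whatever *.lua files exist in the user's plugin folder.
    user = sorted(p.name for p in user_dir.glob("*.lua")) if user_dir.is_dir() else []
    # Builtin plugins always require explicit opt-in (e.g. via --load-plugin).
    enabled = [name for name in builtin if name in opted]
    # User plugins are either enabled automatically or follow the same opt-in rule.
    enabled += [name for name in user if auto_enable_user or name in opted]
    return enabled
```

With `auto_enable_user=False`, builtin and user plugins follow the same opt-in rule, which is the "more consistent" variant.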

I think Lua is a perfectly reasonable choice as a scripting language, but I think we should consider a "binary" based interface, so that any language and runtime could be used to write plugins. For example: plug-ins get the path in argv and print a new path to stdout. Or maybe something even more sophisticated.

You mention Lua is good for embedding, and it means we don't have to ship a Python interpreter. I'd argue that none of us know Lua very well, so maintaining plugins and reviewing PRs is going to be hard for us. I like Martin's direction of not being tied to a particular scripting language, or even scripting. If we were to use Python plugins by default, as an example, and the user has no Python interpreter, bat could just skip processing the plugins. But I do like how startup time is barely affected in this PoC! :)

So the main reason why I went for a deep integration with a scripting language as opposed to a "binary interface" is this: I believe that we would quickly want more interaction between bat and the plugins. As described in the initial post, this PR sketches just a very basic interface (path in, path out) that could very easily be implemented with an IO-based approach where we simply communicate with an arbitrary process via Unix pipes. But I could imagine that "plugin authors" would quickly be in need of a much tighter integration. For example:

  • Ways for plugins to query bat's internal list of syntaxes, syntax mappings, themes, etc.
  • Ways for plugins to access bat's command-line arguments
  • Allow plugins to add their own command-line arguments
  • Allow plugins to directly generate output (instead of writing to temporary files)
  • Allow plugins to modify bat's appearance (e.g. set a particular --style for certain file types)
  • Allow plugins to control things like --line-range or --highlight-line. And all other settings.

All of this could also be implemented with a generic IO-based interface (think JSON-RPC), but then we would basically need a full support library for each plugin language.
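To illustrate the JSON-RPC idea, here is a sketch of the host side answering plugin queries, one JSON object per line. The method names `list_syntaxes` and `get_args` and the contents of `HOST_STATE` are invented for this example. Only the envelope fields and the -32601 "Method not found" code come from JSON-RPC 2.0.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical host-side state a plugin might want to query.
HOST_STATE = {
    "syntaxes": ["Rust", "Lua", "Python"],
    "args": ["--style=plain", "src/main.rs"],
}

# Hypothetical method table; each entry is API surface the host has to maintain.
METHODS: Dict[str, Callable[[], Any]] = {
    "list_syntaxes": lambda: HOST_STATE["syntaxes"],
    "get_args": lambda: HOST_STATE["args"],
}


def handle_request(line: str) -> str:
    """Answer one JSON-RPC 2.0 style request given as a single line of JSON."""
    request = json.loads(line)
    response: Dict[str, Any] = {"jsonrpc": "2.0", "id": request.get("id")}
    method = METHODS.get(request.get("method"))
    if method is None:
        response["error"] = {"code": -32601, "message": "Method not found"}
    else:
        response["result"] = method()
    return json.dumps(response)


if __name__ == "__main__":
    print(handle_request('{"jsonrpc": "2.0", "id": 1, "method": "list_syntaxes"}'))
```

The plugin side needs matching code to build requests and parse responses in every plugin language, which is the "support library" cost mentioned above.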

It can be tricky to make cargo install bat also install plug-ins, which I think is necessary to handle some use cases out-of-the-box. We could maybe make some plug-ins internal/intrinsic, but then a binary architecture is probably out of the question.

Good point. I didn't think of that. My feeling is that cargo-based installations are a tiny minority, but it would be nice to further support it, of course. I mean… we already include a lot of assets inside the bat binary. It might seem strange, but we could, in theory, also embed (compressed) Lua source code for the builtin plugins.

  • You mention Lua is good for embedding, and it means we don't have to ship a Python interpreter. I'd argue that none of us know Lua very well, so maintaining plugins and reviewing PRs is going to be hard for us.

I hope that Lua is indeed easy to learn and maybe even easier to review. But it's still a very good and valid argument.

  • I like Martin's direction of not being tied to a particular scripting language, or even scripting. If we were to use Python plugins by default, as an example, and the user has no Python interpreter, bat could just skip processing the plugins. But I do like how startup time is barely affected in this PoC! :)

Hm, yes. I might be overly sensitive to performance topics. But with all of the great improvements concerning startup time in the recent past, it would be a real shame if we fired up a Python interpreter for each plugin. That would add ~ 20 ms for every plugin.
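The per-plugin cost depends on the machine, so it is worth measuring locally. A rough sketch: it times how long it takes to start an interpreter that does nothing, using whichever Python runs the script. A dedicated benchmarking tool would give better statistics.

```python
import subprocess
import sys
import time
from typing import Sequence


def mean_startup_ms(command: Sequence[str], runs: int = 5) -> float:
    """Mean wall-clock time in ms to start `command` and wait for it to exit."""
    start = time.perf_counter()
    for _ in range(runs):
        subprocess.run(command, check=True)
    return (time.perf_counter() - start) * 1000 / runs


if __name__ == "__main__":
    per_plugin = mean_startup_ms([sys.executable, "-c", "pass"])
    print(f"interpreter startup: {per_plugin:.1f} ms per plugin process")
    print(f"three plugins:       {3 * per_plugin:.1f} ms")
```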

How will this affect our diagnostics and ability to debug issues? Would --diagnostics also append all plugin code? Seems a bit noisy? Maybe just a diff from the released plugins somehow?

Another good question. I'd go with including the full source code for a start.

Is it really necessary to use temporary files instead of stdout? I realize it may be in some cases, but IMO would be nice to avoid when we can.

Absolutely. I think we could and should avoid the temporary files completely. The plugins would need some way to directly write to stdin of the bat process, if they need that (see above).
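One way to picture the temp-file-free variant: the plugin returns bytes instead of a path, and the host pipes those bytes into the display stage. Everything here is hypothetical, including the `uncompress_plugin` signature. `ECHO_VIEWER` is a stand-in process that copies stdin to stdout so that the sketch runs anywhere.

```python
import gzip
import subprocess
import sys
from typing import Sequence

# Stand-in for the display stage: copies stdin to stdout unchanged.
# Wiring this to the real viewer is left open; that part is an assumption.
ECHO_VIEWER = [sys.executable, "-c",
               "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())"]


def uncompress_plugin(raw: bytes) -> bytes:
    """Hypothetical 'uncompress' plugin that returns bytes, not a file path."""
    if raw[:2] == b"\x1f\x8b":  # gzip magic number
        return gzip.decompress(raw)
    return raw


def show(raw: bytes, viewer: Sequence[str] = ECHO_VIEWER) -> bytes:
    """Pipe the plugin output into the viewer's stdin; no temporary file."""
    processed = uncompress_plugin(raw)
    done = subprocess.run(viewer, input=processed, capture_output=True, check=True)
    return done.stdout


if __name__ == "__main__":
    print(show(gzip.compress(b"hello from a .gz file\n")))
```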

I didn't quite grok why it was necessary to move from pub(crate) to pub for some struct fields.

The pub(crate) => pub thing was just me being lazy for this prototype. I needed some way to match on different InputKinds and that was the fastest way to achieve that.

But it could be useful to pass the syntax name to the plugin, or have a plugin resolve which syntax to highlight with.

Absolutely. However, instead of passing everything that could be remotely interesting to the plugin, I think it would be cool if the plugins access this information somehow by explicitly asking for it (see the "deeper integration" topic above). The way I thought this would work is that we would expose certain functions via the Rust-Lua FFI.

@sharkdp
Owner Author

sharkdp commented Sep 2, 2022

@Enselic @keith-hall Do you think this is worth exploring further? Honest opinion please :-)

@Enselic
Collaborator

Enselic commented Sep 3, 2022

My 100% honest opinion is that I don't want to increase the maintenance burden of this project with Lua scripts, or anything else that is not Rust :)

I do think that a good plugin-infrastructure has the potential to decrease the long-term maintenance burden of this project, which would be great, but I don't think it should be Lua based.

Sorry for being somewhat discouraging, but I interpreted your request for honesty as a sincere request, so I wanted to be honest :)

@sharkdp
Owner Author

sharkdp commented Sep 4, 2022

Sorry for being somewhat discouraging, but I interpreted your request for honesty as a sincere request, so I wanted to be honest :)

Yes, of course. Thank you.

My 100% honest opinion is that I don't want to increase the maintenance burden of this project with Lua scripts, or anything else that is not Rust :)
I do think that a good plugin-infrastructure has the potential to decrease the long-term maintenance burden of this project, which would be great, but I don't think it should be Lua based.

I don't think that adding plugins will ever decrease the maintenance burden. Plugins will - hopefully - allow us to improve the functionality of the project. But maintenance-wise, they will almost certainly increase the effort. If easier maintenance is the only goal of this project, we should not accept any new features (which is also a valid approach). And certainly not a plugin-architecture that allows for a manifold of new features (and bugs).

So I'm not sure if it's fair to evaluate this purely from a maintenance-related perspective. First and foremost, I would like to evaluate this from a user-perspective:

  • Is this new functionality something that users want? I certainly think so, given the number of linked tickets in the top post.
  • Is a plugin architecture something that users want? Well, I'm not so sure. Probably, some of them would like to see all of those features directly included in the main application. On the other hand, some users might like the ability to create new plugins or modify existing ones. A plugin architecture certainly also allows us to prototype new features more quickly, which should be a positive thing from a user-perspective.
  • Is a scripting-language plugin architecture something that users want? Or would they prefer a Rust-based architecture? Again, not sure. But I think that most users would probably prefer a scripting language (even one they don't know) compared to a much harder-to-learn system programming language. I think that users would also prefer to have an easy way to create/modify plugins (simply edit a text file), without having to install an entire Rust toolchain. And without having to re-compile their plugins (into shared libraries?) on each change.
  • Assuming we want a scripting-language-based plugin architecture: is Lua the best choice? From a user perspective: I don't think so. There are certainly many more people familiar with Python or JavaScript than with Lua. However, the big downside is the execution time (as detailed above). A large delay in bat's startup time is something that I myself - as a user - would not tolerate.

Coming back to the maintenance concern: would it help if we move all plugins to separate repositories (similar to how it works for Sublime syntaxes) and let them be maintained independently, by the respective authors? Instead of directly integrating them in our project, we could simply provide a list/registry of plugins somewhere... and users would simply install them by git cloning the respective plugin repos into their ~/.config/bat/plugins folder.

@Enselic
Collaborator

Enselic commented Sep 4, 2022

What you say makes a lot of sense. What it boils down to for me personally is that I don't want to read or write Lua to solve problems in bat. I sloppily framed this as "increased maintenance". But I know very well that no one expects me to read or write Lua, so I would say my concern should not have too much weight here. It was just some spontaneous feedback. Merging this will not prevent me from continuing to contribute to other parts of bat, so if that's what we end up doing, that's fine.

Thank you for reaching out for feedback and for elaborating on your position.

@keith-hall
Collaborator

  • A large delay in bat's startup time is something that I myself - as a user - would not tolerate.

Agreed. Is there a way to avoid starting the plugin host on each bat invocation - at least for the (common?) case where no plugins would need to do anything with the input? Some type of configuration of when the plugin would apply?

would it help if we move all plugins to separate repositories (similar to how it works for Sublime syntaxes) and let them be maintained independently, by the respective authors? Instead of directly integrating them in our project, we could simply provide a list/registry of plugins somewhere... and users would simply install them by git cloning the respective plugin repos into their ~/.config/bat/plugins folder.

I personally worry about then having no control over the "base" set of plugins, which could make it hard to triage bug reports. Maybe the diagnostics could list all the installed plugins, but that could add a lot of noise.

Those concerns aside, I like the idea in general, it definitely helps solve a lot of user requests. But at the same time, I feel those could also be solved with something external to bat, like rsop.

So my "final verdict": I'm undecided, but I'm happy to go with the flow. Adding plugins will no doubt be an interesting challenge / experience if we go for it.

@sharkdp
Owner Author

sharkdp commented Sep 4, 2022

  • A large delay in bat's startup time is something that I myself - as a user - would not tolerate.

Agreed. Is there a way to avoid starting the plugin host on each bat invocation - at least for the (common?) case where no plugins would need to do anything with the input? Some type of configuration of when the plugin would apply?

I thought about that as well. But that would probably make plugins (a bit?) less powerful. Because the decision of when to call the plugin would have to be made in the core bat application (e.g. via something like --map-extension-to-plugin '*.gz:uncompress'). Or, with a plugin-specific config file, similar to:

[bat-plugin]
name = "uncompress"
description = ""
filetype_filter = "*.gz"

That could work quite well for some plugins. But the "directories" and "curl" plugins above, for example, would not fit this pattern. And immediately call for an extension of the config format.
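A small sketch of how such a filter could gate plugin invocation, assuming `filetype_filter` is a shell-style glob matched against the file name only. `PluginConfig` and `plugins_for` are illustrative names, not part of the PR.

```python
import fnmatch
import os
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class PluginConfig:
    name: str
    filetype_filter: str  # glob pattern, as in the [bat-plugin] example above


def plugins_for(path: str, configs: Iterable[PluginConfig]) -> List[str]:
    """Return the names of plugins whose filter matches the file name of `path`."""
    filename = os.path.basename(path)
    return [c.name for c in configs if fnmatch.fnmatch(filename, c.filetype_filter)]


if __name__ == "__main__":
    configs = [PluginConfig("uncompress", "*.gz")]
    for candidate in ["logs/app.log.gz", "src", "https://example.com/file.txt"]:
        print(candidate, "->", plugins_for(candidate, configs))
```

Only the first input selects a plugin. A directory name or a URL carries nothing that a file-name glob can key on, so those two plugins would need a different kind of trigger.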

I personally worry about then having no control over the "base" set of plugins, which could make it hard to triage bug reports. Maybe the diagnostics could list all the installed plugins, but that could add a lot of noise.

Agreed. We will probably receive lots of falsely targeted bug reports in this repo if we go down that route, similar to bugs in the syntaxes.

Those concerns aside, I like the idea in general, it definitely helps solve a lot of user requests. But at the same time, I feel those could also be solved with something external to bat, like rsop.

True. But requires users to know about two tools. And gives them less control over the specific options passed to bat.

So my "final verdict": I'm undecided, but I'm happy to go with the flow. Adding plugins will no doubt be an interesting challenge / experience if we go for it.

Ok, thank you both for voicing your concerns. I suggest we close this topic for now. Focus on bat's core strengths. And we can always come back to this (or a better) idea, if we believe that we really need a plugin system in bat.

@xeruf
Contributor

xeruf commented Nov 18, 2023

Just a note: if this ever gets implemented, a plugin should be able to return multiple file paths for a single input, so that one can view the contents of files in a compressed directory, for example. This would be amazing!
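A sketch of what a one-to-many `preprocess` could look like, using a zip archive as the "compressed directory". The list-returning signature is the suggested extension, not something the PoC supports, and extracted files are left in a temporary directory that the host would have to clean up.

```python
import os
import tempfile
import zipfile
from typing import List


def preprocess(path: str) -> List[str]:
    """Return one or more paths to display for a single input path."""
    if not zipfile.is_zipfile(path):
        return [path]  # not an archive: hand the path back unchanged
    target = tempfile.mkdtemp(prefix="bat-plugin-")
    with zipfile.ZipFile(path) as archive:
        archive.extractall(target)
        # Skip directory entries; only files can be displayed.
        members = [m for m in archive.namelist() if not m.endswith("/")]
    return [os.path.join(target, m) for m in sorted(members)]
```

Because each archive member comes back as its own path, the viewer can treat them as individual files rather than one concatenated stream.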

I already maintain a script that does lots of this kind of preprocessing, but the pagination often ends up suboptimal because bat then receives the output as a stream rather than individual files: https://github.com/xeruf/dotfiles/blob/main/.local/bin/scripts/b
