OpenSearch Extensibility #2447
Comments
@saratvemulapalli thanks a lot for putting this proposal in place. I do have a couple of questions which I have not seen addressed yet (please correct me if I am wrong).
First, I agree with you that extensions will not become an equivalent replacement for plugins. For example, plugins could tailor the codecs and translog policies or add new index types, which are very core pieces of the OpenSearch engine. Externalizing those would not only be difficult but would kill the engine performance, no doubt.
Secondly, you raised the question of latency but there is no answer to it: OpenSearch could only impose SLA bounds but would have no control over latency. It will also significantly impact the availability and/or accuracy and/or consistency of the data, depending on the nature of the extension (not to forget about cycles here).
Thirdly, we should keep in mind how it plays with other planned features. For example, if #1968 gets in, we are suddenly going to deal with a massively distributed system where complex core / storage / extensions remote communications may render the entire cluster unstable or completely unavailable.
With that being said, I do see extensions as a useful mechanism for doing certain things. In this regard I think about them as webhooks: the engine could use external hooks (extensions?) in order to provide an opportunity to enrich ingested data or notify about index changes. Is that the right analogy to think about?
Thanks @saratvemulapalli for sharing the proposal. This really helps to get the right picture of how we are thinking about this evolution. I had a few questions/clarifications:
Does this also mean we will remove the current Plugin Framework support, or could the new SPIs co-exist? I really see a high value in keeping the new framework co-existent, and deprecating organically (only if necessary) for the reasons below:
Not sure if this can actually be imposed as a necessary tenet for all the extensions. For example, network plugins/extensions such as Jetty/Netty might still require a restart.
I would appreciate it if @saratvemulapalli could provide more details on why the existing plugin framework is so broken that we need a totally different one. Here are some straw-man approaches for the problems to solve in the new framework.
Versioning: Today if I want to make AD 1.3 work with OpenSearch core 1.1, I would revert changes in build.gradle and build the zip using 1.1, even though there are many more changes in other places between 1.1 and 1.3. So plugins are not rigid in terms of compatibility with the OpenSearch core and can easily be switched among OpenSearch cores. Here are the changes I made to make 1.3 AD work for 1.1 OpenSearch core.
then ./gradlew :assemble -Dopensearch.version=1.1.0
For versioning support, we can at least enable configuration to allow a plugin to specify which versions of OpenSearch core it is compatible with. If the versions are compatible, we can match plugins with different versions of OpenSearch cores.
Independence/Sandboxing: The reader in the Performance Analyzer plugin is running in a separate JVM and communicates with OpenSearch via a RESTful API (check bin/performance-analyzer-agent-cli). To do that, the Performance Analyzer plugin writes the code in the plugin and writes extra configuration files to be run by the OpenSearch startup script (check https://opensearch.org/docs/latest/opensearch/install/tar/). Also, we can use cgroups to limit resource usage (e.g., memory) by starting the separate plugin process in a specific cgroup. We can automate this process.
Security: We can create an API in the security plugin like opensearch-project/security#566 to offload all of the checks to the security plugin. Other plugins just have to call this API, or the plugin framework can make the check implicit for all data/setting access by creating a wrapper client for plugins to call.
Inter-plugin communication: This is a solved issue (check opensearch-project/notifications#223)
I don't seem to see the analysis of why not to enhance the current plugin framework to solve the problems stated in this issue. What are the pros/cons of "enhancing current plugin framework" versus "adding a new extension framework"? If we find it's impossible to enhance the current plugin framework to solve these problems, I think it's ok to add a new extension framework, but I agree with @getsaurabh02 on "keeping the new framework co-existent, and deprecate organically (only if necessary)". From the description, extensions will be much more limited, so why not keep the flexibility for the community to build plugins?
I see Kaituo gave solutions for several problems of the plugin framework. For "Discoverability/Dependency Management", why can't we build a discovery mechanism for plugins? For example, we can ask developers to register/onboard their plugin to an OpenSearch "AppStore"; these plugins would then be verified and become eligible to be discovered and downloaded/installed.
For inter-plugin communication, another example is the new
For "Independence/Sandboxing", I think
This looks like building some transport client in multiple languages?
I see we are using the Apple "AppStore" as an analogy and want to allow users to easily install any extension on the fly. That seems challenging. When people download an app from the Apple "AppStore", they install it on just one local iPhone and all apps share the same set of hardware resources. But an OpenSearch cluster may have multiple nodes. When a user downloads a new extension, should we install it on all data nodes or on all nodes? There is still a risk that the user installs too many extensions on a data node even if we limit each extension's resource usage. For example, if extension1 uses 50% of memory at most and extension2 uses 60% of memory at most, it is still risky to install the two extensions on the same data node. So we may ask the user which node they want to install the new extension on. But that may change the cluster load balance by running specific extensions on specific data nodes only. So "easily install extension" seems challenging, especially for a production cluster. It should be ok to install new extensions on a non-production cluster to explore and learn, then install and configure carefully on the production cluster.
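To make the 50% + 60% concern above concrete, here is a minimal sketch of a per-node admission check. The `Node` class, its `install` method and the 100% budget are hypothetical names chosen for illustration; nothing like this exists in OpenSearch today.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A data node tracking worst-case memory reserved per extension (hypothetical model)."""
    name: str
    reserved: dict = field(default_factory=dict)  # extension name -> max memory fraction

    def can_install(self, max_memory_fraction: float, budget: float = 1.0) -> bool:
        # Admit the extension only if the sum of all worst-case limits stays within budget.
        return sum(self.reserved.values()) + max_memory_fraction <= budget

    def install(self, extension: str, max_memory_fraction: float) -> bool:
        if not self.can_install(max_memory_fraction):
            return False
        self.reserved[extension] = max_memory_fraction
        return True


node = Node("data-1")
first = node.install("extension1", 0.5)   # 50% worst case, accepted
second = node.install("extension2", 0.6)  # would bring the total to 110%, rejected
```

A check like this only rejects over-commitment on one node; it does not address the load-balancing question of which node should host an extension.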
Thanks @reta for taking a look and reading through.
With extension we would like to provide an opportunity for developers/cluster operators to choose where to run them.
That's a great question. For latency, we do not have the data at this point and we are marching towards it. The numbers will tell us which direction to go in.
That's a great point. Do you have ideas/suggestions on how to take this on?
Yes and no :). I see extensions as pieces of code which interact with OpenSearch (within the process, outside of the process, or remotely). The framework should support all 3 mechanisms while solving different use cases. And thanks again for these questions. Let's keep the conversation rolling.
Thanks @getsaurabh02 for taking a look and reading through.
No. The plugin architecture will still be supported; in fact it will be enhanced to solve these problems. The new interfaces will still support the traditional way of invoking via extension points while enhancing the framework to solve the problems listed above.
💯 Totally agreed, as I've said above we are enhancing the architecture and as we see the need for deprecating the existing architecture we can discuss.
Absolutely, and this is exactly why we would like to leave the option for developers/cluster operators to choose where to run them.
+1
That's a great point. Mostly these are modules which are loaded by default to make the communication possible. And thanks again :)!
@saratvemulapalli Thanks for putting this together! I'd like to understand more about the sandboxing/separate process idea.
Does this mean it will be the OpenSearch admin who determines where to install the extensions, like on which hosts? Or will the OpenSearch process still handle the extension installation? For instance, say one extension's work is heavy and needs to run on a different host: does the OpenSearch admin need to install the plugin manually on the designated host, or can OpenSearch be aware of the hardware available and do the installation? Meanwhile, I imagine extensions may do some of the work that plugins are doing today. Since most extension points require the core OpenSearch engine to trigger some functions in the extension in a separate (and even remote) process, will the OpenSearch engine manage the plugin topology/routing? Or can we extend and take advantage of the existing plugin framework, by adding a new role of nodes like dedicate
@saratvemulapalli it seems like we have a considerable amount of unknowns / concerns / questions regarding the extensions. The plugins do have a number of issues (see #1707 for example), but since we are not getting rid of them (please correct me if I am wrong), we would end up with 2 problems instead of just one. Maybe we could step back a bit and reassess the deliverables?
The limitations and problems you have described are real, but the solution may not be as clear yet (at least to me and a few other folks). Thank you!
Thanks @saratvemulapalli for the proposal! @reta I think you have it spot on, the limitations and problems you have described are real, but the solution may not be as clear yet. And @ylwu-amzn is absolutely right to ask:
"I don't seem to see the analysis of why not to enhance the current plugin framework to solve the problems stated in this issue. What are the pros/cons of "enhancing current plugin framework" versus "adding a new extension framework"?"
I thought about how to address this concern around "extensions" vs. "plugins". Personally, I am a big believer in incremental evolution rather than radical revolution, too. I only read the proposal as "let's think about what plugins could be if we weren't bound by years of legacy". I am, personally, completely comfortable talking about a new thing called "extensions", and leaving the actual evolution path to (albeit important) implementation details. We can spec this upfront, but I'd prefer to see some PRs that chip away at getting plugins to evolve into what @saratvemulapalli called extensions. The way that's done could possibly be done through a plugin, it's a good idea! We can decide what we merge PR by PR and mark things experimental.
@kaituo I think your example of recompiling a plugin convinced me even more of the problem that plugins are tightly coupled to a specific version of OpenSearch. How can it be OK that in order to upgrade from version X to Y of OpenSearch I need to rebuild and upgrade all plugins? This means that after releasing OpenSearch Y, I also need to wait till all the plugins I use are rebuilt for version Y. In your example there's actually no change in the software, it's all swapping jackson and protostuff dependencies. Why can't version X of the plugin "just work" without any changes with version Y of OpenSearch given that no APIs have changed? Why can't the plugin use its own version of jackson? Why does it have to be the same as OpenSearch?
What kind of ecosystem can we possibly expect when everything has to move in lockstep? The answer is that the only ecosystem is what we have now: a bundle of core engine and plugins that ship together as one giant monolith, and I think it's a huge barrier for more plugins to be developed.
On the other items, when plugins invent their own way of running out-of-process workers (PA), this is telling me that the framework should support it. When notifications takes garbage in and casts it into a type that is copied to both client and server, this is telling me that we need an SDK.
Regarding performance: we shouldn't be optimizing performance of what we cannot measure, yet. I worked on search in the 90s, when we had a hard time standing up an engine that did 8 requests per second. Storing data in an external/remote database seemed to me like an insane idea. Fast forward, S3 can do 5,500 GET/HEAD requests per second with strong consistency. You could be moving 100 Gb/s on a single node, and having small object latencies of 100–200 milliseconds. We're not talking about S3 here, but maybe similar additional latency is an acceptable tradeoff for many scenarios for getting additional security or crash protection? I really like the idea that we can build options and let users (or plugin authors!) decide whether the plugin runs in the same JVM, or on a remote node with total isolation, and document tradeoffs in performance. All I am saying is, don't throw the baby out with the bathwater, and let's compare real numbers when we have some.
There's a ton of unanswered questions! I'm glad people are taking on hard work and questioning the status quo of what we have now. I suggest prototyping some of these ideas, and am looking forward to seeing how some of these can become code we'd all agree is worthy of being merged to main.
One thing that I'd like to understand more is how extensions can interface with OpenSearch Security so that people can build new features that can use access control features, audit logging, etc. Today we already see this need in some plugins on the project (e.g., AD, Alerting, ISM, etc.). Codifying how to integrate would really help drive a consistent security experience.
Also thanks @saratvemulapalli for putting this together!
Hi, would extensions have to run on a separate instance/server? Also, how would versions be tracked: would an extension track which OS versions it supports, or would OS track supported extension versions through extensions.yml(?). Thanks.
No. In fact, for most of our development and testing we're using the same server and localhost/loopback interface. It does provide a lot of flexibility in moving it elsewhere!
Yes, the only issue with running extensions on the same node would be having multiple
We are tracking the progress and milestones for Extensibility here: #1422
I came here from #5768 but some plugins really need to be in the same JVM, mainly the Lucene ones such as
many of the lucene analyzers are simply not in the
But they are still important, e.g. for Chinese, Japanese and Korean we need some data files to do a decent job for the majority of use cases. Unfortunately that means a few megabytes, and maybe an analyzer becomes a plugin because of that. But I hope we don't make things absurdly slow for such languages for no good reason. Please think about the Lucene plugins when designing extensibility here.
Thanks @rmuir for the feedback. I agree that for core workloads like indexing and searching (where analysis plugins are very common use cases) the communication overhead will hurt performance. Understanding how to make this work is next on the radar. If you have suggestions I would love to hear them. Most of the other plugins are really building features on top of OpenSearch, and we'd like to turn them into extensions and make it easy to develop them without worrying about constructs in OpenSearch/Lucene.
IPC overhead is real. Once we have an analysis plugin implemented as an extension we can see what actual numbers look like. I fully expect them to be slower, but I am curious whether it's 10% or 200%. I think if it's 10% we can give users choices: if you trust the code, run it in the same JVM. Otherwise run it out of proc with a performance penalty.
Then, running the processor on a separate JVM may improve the actual processor performance by the fact that you can control heap size and GC pause separately or dedicate active CPU counts. Finally, if a processor can be implemented in completely different technology (Rust? native code?) the serialization/deserialization overhead may turn out to be smaller than the performance improvement. Let's keep an open mind!
I hear similar concerns all the time in AWS services when users say "I can't have network overhead", but then they measure and observe that the entire system performs and scales a lot better when you call Lambda in the middle of a critical path (think truly remote analyzer :).
I've created a proposal for how the language clients can support extensions in OpenSearch here: opensearch-project/opensearch-clients#55
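As a rough illustration of how the "10% or 200%" question could be measured, the sketch below times a toy analyzer called directly against the same analyzer wrapped in a JSON encode/decode round trip. Everything here is an assumption made for illustration: `analyze` is a stand-in rather than a Lucene analyzer, JSON is not necessarily the wire format, and no socket or pipe is involved, so real IPC would cost more than this shows.

```python
import json
import timeit


def analyze(text: str) -> list:
    """Toy stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def analyze_via_serialization(text: str) -> list:
    # Simulates only the encode/decode cost of crossing a process boundary.
    request = json.dumps({"text": text})
    tokens = analyze(json.loads(request)["text"])
    response = json.dumps({"tokens": tokens})
    return json.loads(response)["tokens"]


def overhead_percent(direct_seconds: float, remote_seconds: float) -> float:
    """Relative slowdown of the wrapped call compared with the direct call."""
    return (remote_seconds - direct_seconds) / direct_seconds * 100.0


sample = "The Quick Brown Fox " * 50
direct = timeit.timeit(lambda: analyze(sample), number=2000)
wrapped = timeit.timeit(lambda: analyze_via_serialization(sample), number=2000)
print(f"serialization overhead: {overhead_percent(direct, wrapped):.0f}%")
```

The printed percentage depends on the machine and payload, which is the point of the comment above: the decision should follow from measured numbers.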
I was trying to understand if extensions can be built on top of another extension, like the way one plugin can extend another and leverage another extension's functions to do its work. If this is supported, how can one extension detect another extension, and understand the functions of another extension? Btw, according to the SDK design, it seems that in order for OpenSearch to add a new extension, the OpenSearch admin will need to update the
re: dependencies, the first part is to be able to declare a dependency on something in an extension, starting with OpenSearch - that will be the mechanism to say "this extension runs with OpenSearch ~> 2.x". We will reuse that exact mechanism to declare that "this extension depends on OpenSearch ~> 2.x and notifications ~> 7.1.0". The extensions manager should prevent calls from/to the extension if its dependency is missing, prevent uninstalling a dependency of another extension, etc. This is opensearch-project/opensearch-sdk-java#108.
re: extensions.yml, that's just a crutch, the plan is to have an installation/uninstallation API where the cluster does not need to be restarted to install/uninstall an extension - that's called hot-swap and we plan to support it, see opensearch-project/opensearch-sdk-java#356.
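The dependency rules described above could look roughly like the sketch below. The semantics of `~>` are an assumption (here `~> 2.x` matches any 2.* version and `~> 7.1.0` matches versions from 7.1.0 up to but not including 7.2.0), and `ExtensionManager` is a hypothetical class, not the opensearch-sdk-java API.

```python
def parse(version: str) -> tuple:
    return tuple(int(part) for part in version.split("."))


def satisfies(version: str, constraint: str) -> bool:
    """Assumed semantics: '~> 2.x' matches any 2.*; '~> 7.1.0' matches >= 7.1.0 and < 7.2.0."""
    spec = constraint.replace("~>", "").strip()
    have = parse(version)
    if spec.endswith(".x"):
        prefix = parse(spec[:-2])
        return have[:len(prefix)] == prefix
    want = parse(spec)
    # All but the last component must match; the last may only increase.
    return have[:len(want) - 1] == want[:-1] and have >= want


class ExtensionManager:
    """Hypothetical registry that enforces declared dependencies."""

    def __init__(self, core_version: str):
        self.installed = {"opensearch": core_version}
        self.requires = {}  # extension name -> {dependency name: constraint}

    def install(self, name: str, version: str, requires: dict) -> bool:
        for dep, constraint in requires.items():
            if dep not in self.installed or not satisfies(self.installed[dep], constraint):
                return False  # missing or incompatible dependency
        self.installed[name] = version
        self.requires[name] = requires
        return True

    def uninstall(self, name: str) -> bool:
        if any(name in deps for deps in self.requires.values()):
            return False  # another extension still depends on it
        self.installed.pop(name, None)
        self.requires.pop(name, None)
        return True


manager = ExtensionManager("2.4.0")
manager.install("notifications", "7.1.2", {"opensearch": "~> 2.x"})
manager.install("alerting-ext", "1.0.0",
                {"opensearch": "~> 2.x", "notifications": "~> 7.1.0"})
blocked = manager.uninstall("notifications")  # False: alerting-ext depends on it
```

Because install and uninstall only mutate an in-memory registry, the same shape would fit a hot-swap API where no restart is needed, though that part is not modelled here.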
Introduction
OpenSearch is committed to being a vibrant and welcoming community-developed product. Community development, at its best, lets people with diverse interests have a direct hand in guiding and building products they will use; this results in products that meet their needs better than anything else. Additionally, community development allows the project to scale, as the community is able to find and build new areas of development that they are passionate about beyond what a single person or company could support. This acts as a virtuous cycle where new users and contributors add new features, which in turn draw more users and contributors.
To drive this flywheel, we propose that extensions become the default way to implement new and extend existing features in OpenSearch. To raise the bar for extensibility, we will provide the community with a well-supported OpenSearch catalog for extensions. Our vision is to build the equivalent of “Visual Studio Code” and the “AppStore” for OpenSearch. In the same way that these tools acted as a force multiplier for the number of problems that an iPhone can solve, we want to build an extension ecosystem that enables the community to solve more with OpenSearch. No single organization has the ability to prioritize every problem, so enabling developers to easily build extensions for OpenSearch will allow the project to address a broader range of end user problems.
In the future for OpenSearch, we want to see thousands of new features being quickly and painlessly built by developers. And we want those features to be easily discoverable by the community, who will be able to easily install them with confidence that they’ll be able to use them securely with no impact to their cluster.
What's next?
To reach our goal of making OpenSearch extensible, there are three major areas that we’ll need to make changes in:
API/Versioning
Problem: Plugins are rigid in terms of compatibility and have to be built against a specific x.y.z version of OpenSearch at compile time. This tight coupling slows the software development lifecycle for OpenSearch and plugins because it requires all plugins to release at the same time whenever the version number is raised. An additional side effect is that plugins cannot be installed/uninstalled/upgraded/configured without restarting the cluster.
The underlying problem is lack of versioning support for extension points on which plugins are extended.
These extension points are part of core modules of OpenSearch (like settings etc) which do not support versioning. Plugins rely on these extension points to get notified on the changes in the system.
Working Backwards:
Who are the actors in the community:
a. Extension developer
b. Extension user
c. Clients developer
What would the customer like to see/use:
a. Not worry about updating Extension for every patch version of OpenSearch.
b. Extension is not broken when OpenSearch minor version is upgraded.
c. Install/Update/Remove an extension without restarting OpenSearch.
How we’d like to solve it: OpenSearch#2283
Independence/Sandboxing
Problem: Because plugins currently run in the same process (and JVM), plugins have unrestricted access to various resources across the cluster. Plugins can therefore fatally affect the cluster, impacting core functionalities like indexing and searching to the point that the cluster becomes unavailable.
Working Backwards:
Who are the actors in the community:
a. Extension developer
b. Extension user
What would the customer like to see/use:
a. Not worry about cluster going down due to an extension misbehaving.
b. Run a 3rd party extension and not worry about a 3rd party accessing data and configurations on the cluster to which it should not have permissions.
c. Ability to support granular access control of cluster resources for an extension, e.g. CPU, Memory, etc.
d. Ability to write an extension in any language of choice.
How we’d like to solve it: OpenSearch#1422
Running extensions within the same process/JVM of OpenSearch limits the ability to secure the cluster.
Also it doesn’t scale when we’d like to run many extensions within the same node.
We believe adding support for running extensions outside of the OpenSearch process solves these problems if we can define a common communication protocol and make extensions independent. It enables all extensions to talk via a common interface and not fatally affect the core of OpenSearch.
Build and publish extension SDKs which will translate messages between OpenSearch and an extension. These SDKs should be distributed in multiple languages while keeping the same communication protocol.
Add granular security support for cluster resources (in OpenSearch) and node resources (potentially via extension SDK).
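To illustrate the idea of an SDK that translates messages over a common protocol, here is a minimal sketch of a request/response envelope and an extension-side dispatcher. The JSON envelope, the field names (`id`, `action`, `payload`) and the `Extension` class are all assumptions for illustration; the proposal does not fix a wire format.

```python
import json


def encode(action: str, payload: dict, request_id: int) -> bytes:
    """Serialize a request into a language-neutral wire format (JSON is an assumption)."""
    return json.dumps({"id": request_id, "action": action, "payload": payload}).encode("utf-8")


def decode(raw: bytes) -> dict:
    return json.loads(raw.decode("utf-8"))


class Extension:
    """Extension-side dispatcher: maps action names to handlers (hypothetical)."""

    def __init__(self):
        self.handlers = {}

    def register(self, action, handler):
        self.handlers[action] = handler

    def handle(self, raw: bytes) -> bytes:
        request = decode(raw)
        handler = self.handlers.get(request["action"])
        if handler is None:
            body = {"id": request["id"], "error": "unknown action"}
        else:
            try:
                body = {"id": request["id"], "result": handler(request["payload"])}
            except Exception as exc:  # a failing handler yields an error reply, not a crash
                body = {"id": request["id"], "error": str(exc)}
        return json.dumps(body).encode("utf-8")


ext = Extension()
ext.register("uppercase", lambda payload: payload["text"].upper())
reply = decode(ext.handle(encode("uppercase", {"text": "hello"}, 1)))
```

Since both sides only exchange bytes in an agreed format, an SDK in another language would need to implement the same `encode`/`decode` pair and nothing else from this sketch.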
Discoverability/Dependency Management
Problem: Plugins are not discoverable from the distribution. There is no way for a customer to know what plugins exist in the community and how to install them. Also customers have to understand the versioning compatibilities of OpenSearch and other plugins.
Working Backwards:
Who are the actors in the community:
a. Extension developer
b. Extension user
What would the customer like to see/use:
a. Discover all OpenSearch extensions in one place.
b. Not worry about an extension and its dependencies, but just install it and be ready to go.
How we’d like to solve it:
The extension manifest would contain version, dependencies, security policies etc.
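A manifest along these lines might be modelled as in the sketch below. The field names (`name`, `version`, `dependencies`, `permissions`) and the `load_manifest` helper are hypothetical; the proposal only says the manifest would contain version, dependencies, security policies etc.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ExtensionManifest:
    name: str
    version: str
    dependencies: dict = field(default_factory=dict)  # name -> version constraint
    permissions: tuple = ()                            # requested security policies


def load_manifest(raw: dict) -> ExtensionManifest:
    """Validate a parsed manifest document and build a typed object from it."""
    missing = [key for key in ("name", "version") if key not in raw]
    if missing:
        raise ValueError(f"manifest is missing required fields: {missing}")
    return ExtensionManifest(
        name=raw["name"],
        version=raw["version"],
        dependencies=dict(raw.get("dependencies", {})),
        permissions=tuple(raw.get("permissions", ())),
    )


manifest = load_manifest({
    "name": "anomaly-detection",
    "version": "1.0.0",
    "dependencies": {"opensearch": "~> 2.x"},
    "permissions": ["indices:data/read"],
})
```

A catalog could index such manifests to answer "what exists and what does it need" before anything is installed.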
FAQ
Our goal is to get benchmark numbers to understand how much performance impact we’ll see and are tracking via issue and issue.
Our goal is to support performance intense workloads via extensions. Depending on the benchmark results we will explore different solutions to make the communication light weight (like protobuf etc).
No, existing plugin architecture will be supported and will be just another form of extensions (running within the process of OpenSearch).
We do not know yet, but we are actively working toward getting data (FAQ 1).
Our hunch says it will be, since the communication is synchronous today.
Dashboards (and its plugins) don't rely on the OpenSearch plugin architecture. They communicate via REST APIs.
But Dashboards has similar architectural problems, which also have to be solved.
The goal of extensibility is to make it as easy as possible to develop, build and use extensions. We will strive to make it simple for clusters of all sizes.
We are working on building the entire AD plugin as an extension. The AD extension simply mimics the existing AD plugin.
Our goal is to move all existing plugins to extensions in the near future. Our vision is that all new plugins are built as extensions.
Extensions will support all modes: in-proc, separate JVM, separate process, remote. We will let customers choose how they want to run an extension based on their use case and needs.
We have looked into it and will dive further. For this first phase we decided to go with REST since the existing clients are REST-based. We will definitely look into gRPC.
How can I contribute?
We would love to have your contributions to make OpenSearch extensible. Within the 3 focus areas, we just started scratching the surface with sandboxing but there is a lot more work to make this happen. Feel free to pick up any of these issues and let’s make it better, together!