
OpenSearch Extensibility #2447

Open
saratvemulapalli opened this issue Mar 11, 2022 · 21 comments
Labels
discuss Issues intended to help drive brainstorming and decision making RFC Issues requesting major changes

Comments

@saratvemulapalli
Member

saratvemulapalli commented Mar 11, 2022

Introduction

OpenSearch is committed to being a vibrant and welcoming community-developed product. Community development, at its best, lets people with diverse interests have a direct hand in guiding and building products they will use; this results in products that meet their needs better than anything else. Additionally, community development allows the project to scale, as the community is able to find and build new areas of development that they are passionate about beyond what a single person or company could support. This acts as a virtuous cycle where new users and contributors add new features, which in turn draw more users and contributors.

To drive this flywheel, we propose that extensions become the default way to implement new features in OpenSearch and to extend existing ones. To raise the bar for extensibility, we will provide the community with a well-supported OpenSearch catalog for extensions. Our vision is to build the equivalent of “Visual Studio Code” and the “AppStore” for OpenSearch. In the same way that those ecosystems act as force multipliers for the number of problems an editor or an iPhone can solve, we want to build an extension ecosystem that enables the community to solve more with OpenSearch. No single organization has the ability to prioritize every problem, so enabling developers to easily build extensions for OpenSearch will allow the project to address a broader range of end-user problems.

In the future, we want to see thousands of new features quickly and painlessly built for OpenSearch by developers. And we want those features to be easily discoverable by the community, who will be able to install them with confidence that they can use them securely and with no impact to their cluster.

What's next?

To reach our goal of making OpenSearch extensible, there are three major areas that we’ll need to make changes in:

  • API/Versioning
  • Independence/Sandboxing
  • Discoverability/Dependency Management

API/Versioning

Problem: Plugins are rigid in terms of compatibility and have to be built against a specific x.y.z version of OpenSearch at compile time. This tight coupling slows the software development lifecycle for both OpenSearch and its plugins, because it requires all plugins to release at the same time whenever the version number is raised. An additional side effect is that plugins cannot be installed/uninstalled/upgraded/configured without restarting the cluster.

The underlying problem is the lack of versioning support for the extension points that plugins build on.
These extension points are part of core OpenSearch modules (like settings) which do not support versioning. Plugins rely on these extension points to be notified of changes in the system.

Working Backwards:
Who are the actors in the community:
a. Extension developer
b. Extension user
c. Client developer

What would the customer like to see/use:
a. Not worry about updating an extension for every patch version of OpenSearch.
b. Extensions are not broken when an OpenSearch minor version is upgraded.
c. Install/update/remove an extension without restarting OpenSearch.

How we’d like to solve it: OpenSearch#2283

  1. Add versioning support for all modules which expose extension points (e.g. Settings).
  2. Revamp all extension points to support versioning and backwards compatibility.
  3. Build and publish SPIs, a.k.a. a new extension framework, which supports versioned APIs for developers (a minimal sketch follows this list).
  4. Add support for extensions to introduce API specifications for clients (potentially via a new extension point).
  5. Add dynamic support for all modules which expose extension points, i.e. support registering/removing extension handlers dynamically.
  6. Add supporting tooling to define and consume extension metadata, which includes versioning.
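
To make the versioning idea more concrete, below is a minimal sketch of what a versioned extension point and its metadata might look like. Every type and method name here is hypothetical and exists only for illustration; it is not part of any published OpenSearch API.

    // Hypothetical sketch only: illustrates items 1-3 and 6 above.
    import java.util.Objects;

    // Metadata an extension declares about itself, including the range of
    // OpenSearch versions it was built and tested against.
    final class ExtensionDescriptor {
        final String name;
        final String extensionVersion;
        final String minOpenSearchVersion; // inclusive
        final String maxOpenSearchVersion; // inclusive

        ExtensionDescriptor(String name, String extensionVersion,
                            String minOpenSearchVersion, String maxOpenSearchVersion) {
            this.name = Objects.requireNonNull(name);
            this.extensionVersion = extensionVersion;
            this.minOpenSearchVersion = minOpenSearchVersion;
            this.maxOpenSearchVersion = maxOpenSearchVersion;
        }
    }

    // A versioned extension point: the core only dispatches to extensions whose
    // declared version range covers the API version the core speaks, and passes
    // a versioned, serialized view of the data rather than live core objects.
    interface VersionedExtensionPoint {
        ExtensionDescriptor descriptor();

        // Called by the core when cluster settings change.
        void onSettingsChanged(String settingsJson);
    }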

Independence/Sandboxing

Problem: Because plugins currently run in the same process (and JVM), they have unrestricted access to various resources across the cluster. Plugins can therefore fatally affect the cluster, impacting core functionality like indexing and searching, to the point that the cluster becomes unavailable.

Working Backwards:
Who are the actors in the community:
a. Extension developer
b. Extension user

What would the customer like to see/use:
a. Not worry about the cluster going down due to a misbehaving extension.
b. Run a 3rd-party extension without worrying about it accessing data and configuration on the cluster that it should not have permission to access.
c. Ability to apply granular access control of cluster resources (e.g. CPU, memory) to an extension.
d. Ability to write an extension in any language of choice.

How we’d like to solve it: OpenSearch#1422

  1. Running extensions within the same process/JVM as OpenSearch limits our ability to secure the cluster.
    It also doesn’t scale when we’d like to run many extensions on the same node.

    We believe that adding support to run extensions outside of the OpenSearch process solves these problems, provided we define a common communication protocol and make extensions independent. It enables all extensions to talk via a common interface and prevents them from fatally affecting the core of OpenSearch (a rough sketch of such an interface follows this list).

  2. Build and publish extension SDKs which translate messages between OpenSearch and an extension. These SDKs should be distributed in multiple languages while keeping the same communication protocol.

  3. Add granular security support for cluster resources (in OpenSearch) and node resources (potentially via the extension SDK).
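
As a rough illustration of the transport-agnostic idea in item 1, the same extension code could run in-process, in a separate JVM, or remotely depending on which transport implementation is plugged in. All types below are assumptions made for this sketch, not the actual SDK.

    import java.util.concurrent.CompletableFuture;
    import java.util.function.BiFunction;

    // A single request/response interface that the core and an extension both
    // program against; the wire format (REST, transport protocol, protobuf, ...)
    // is an implementation detail of the chosen transport.
    interface ExtensionTransport {
        CompletableFuture<byte[]> send(String action, byte[] requestBytes);
    }

    // In-process "transport" for trusted extensions: no IPC overhead.
    final class InProcessTransport implements ExtensionTransport {
        private final BiFunction<String, byte[], byte[]> handler;

        InProcessTransport(BiFunction<String, byte[], byte[]> handler) {
            this.handler = handler;
        }

        @Override
        public CompletableFuture<byte[]> send(String action, byte[] requestBytes) {
            return CompletableFuture.completedFuture(handler.apply(action, requestBytes));
        }
    }

    // A separate-process or remote transport would implement the same interface,
    // which is what lets a misbehaving extension fail without taking the core down.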

Discoverability/Dependency Management

Problem: Plugins are not discoverable from the distribution. There is no way for a customer to know which plugins exist in the community or how to install them. Customers also have to understand the version compatibility between OpenSearch and other plugins.

Working Backwards:
Who are the actors in the community:
a. Extension developer
b. Extension user

What would the customer like to see/use:
a. Discover all OpenSearch extensions in one place.
b. Not worry about an extension and its dependencies; just install it and be ready to go.

How we’d like to solve it:

  1. Solving the versioning problem surfaces another problem: dependency management. We should build a package manager which understands the extension manifest and manages all of its dependencies (including dependencies on other plugins).
    The extension manifest would contain the version, dependencies, security policies, etc.
  2. Build and publish the extension manifest, which would be the first step towards the catalog (an illustrative sketch of such a manifest follows this list).
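
For illustration only, the kind of information such a manifest might carry could be modeled as follows; the field names are assumptions, not a finalized format.

    import java.util.List;
    import java.util.Map;

    // Illustrative model of an extension manifest entry in the catalog.
    final class ExtensionManifest {
        String name;                        // e.g. "opensearch-anomaly-detection"
        String version;                     // the extension's own version
        String opensearchCompatibility;     // supported OpenSearch range, e.g. ">= 2.0, < 3.0"
        List<String> dependsOn;             // other extensions this one requires
        Map<String, String> securityPolicy; // requested permissions on cluster resources
    }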

FAQ

  1. What is the latency going to be like for extensions?
    Our goal is to get benchmark numbers to understand how much performance impact we’ll see; we are tracking this via issue and issue.
  2. Can we use extensions for plugins like Index Management and k-NN?
    Our goal is to support performance-intensive workloads via extensions. Depending on the benchmark results we will explore different solutions to make the communication lightweight (protobuf, for example).
  3. Will existing plugins break when we launch extensions?
    No. The existing plugin architecture will still be supported and will be just another form of extension (running within the OpenSearch process).
  4. Will the performance of OpenSearch be impacted?
    We do not know yet, but we are actively working to get data (see FAQ 1).
    Our hunch is that it will be, since the communication is synchronous today.
  5. How do extensions impact Dashboards?
    Dashboards (and its plugins) does not rely on the OpenSearch plugin architecture; they communicate via REST APIs.
    But Dashboards has similar architectural problems that will have to be worked through.
  6. How will I be impacted if I use a small or single-node cluster?
    The goal of extensibility is to make extensions as easy as possible to develop, build and use. We will strive to make it simple for clusters of all sizes.
  7. With the Anomaly Detection plugin as an extension, are we extending any functionality of AD?
    No. We are building the entire AD plugin as an extension; the AD extension simply mimics the existing AD plugin.
  8. Is OpenSearch moving to convert all plugins to extensions?
    Our goal is to move all existing plugins to extensions in the near future. Our vision is that all new plugins are built as extensions.
  9. The AD extension being built today runs on a separate node. How does scaling factor in?
    Extensions will support all modes: in-process, separate JVM, separate process, and remote. We will let customers choose how they want to run an extension based on their use case and needs.
  10. Have you considered using gRPC instead of REST for inter-service communication?
    We have, and we will dive into it further. For this first phase we decided to go with REST since the existing clients are REST-based. We will definitely look into gRPC.

How can I contribute?

We would love to have your contributions to make OpenSearch extensible. Within the three focus areas, we have just started scratching the surface with sandboxing, but there is a lot more work to make this happen. Feel free to pick up any of these issues and let’s make it better, together!

@saratvemulapalli saratvemulapalli added discuss Issues intended to help drive brainstorming and decision making RFC Issues requesting major changes labels Mar 11, 2022
@CEHENKLE CEHENKLE pinned this issue Mar 11, 2022
@reta
Collaborator

reta commented Mar 17, 2022

@saratvemulapalli thanks a lot for putting this proposal in place, I do have a couple of questions which I have not seen addressed yet (please correct me if I am wrong).

First, I agree with you that extensions will not become an equivalent replacement for plugins. For example, plugins can tailor codecs, translog policies, and new index types, which are very core pieces of the OpenSearch engine; externalizing those would not only be difficult but would kill engine performance, no doubt.

Secondly, you raised the question of latency but there is no answer to it: OpenSearch could only impose SLA bounds but would have no control over latency. It would also significantly impact the availability and/or accuracy and/or consistency of the data, depending on the nature of the extension (not to forget about cycles here).

Thirdly, we should keep in mind how it plays with other planned features. For example, if #1968 gets in, we are suddenly going to deal with a massively distributed system where complex core / storage / extensions remote communications may render the entire cluster unstable or completely unavailable.

With that being said, I do see extensions as a useful mechanism for doing certain things; in this regard I think about them as webhooks: the engine could use external hooks (extensions?) to provide an opportunity to enrich ingested data or notify about index changes. Is that the right analogy to think about?

@getsaurabh02
Member

Thanks @saratvemulapalli for sharing the proposal. This really helps to get the right picture of how we are thinking about this evolution. I had a few questions/clarifications:

Build and publish SPI’s a.k.a new extension framework which supports versioned APIs for developers.

Does this also mean we will remove the current Plugin Framework support, or could the new SPIs co-exist?

I really see high value in keeping the new framework co-existent, and deprecating it organically (only if necessary), for the reasons below:

  1. The current plugin architecture allows deeper integration with the core and provides the ability to change core behavior completely. On the other hand, I see extensions built upon SPIs as more intended for solving application use cases while keeping the core engine logic the same.
  2. Process isolation does offer more security, but comes with its own performance overhead due to inter-process communication. For a trusted plugin, a user might want to choose performance over isolation. Hence this also advocates for keeping the plugin framework coexistent with the extension framework.
  3. This will also provide a gradual and smoother transition for developers and partners, letting them see real value in the new framework and migrate organically after evaluating the benefits.

Add dynamic support for all modules which support extension points. i.e add support to register/remove extension handlers dynamically.

Not sure if this can actually be imposed as a necessary tenet for all extensions. For example, network plugins/extensions such as Jetty/Netty might still require a restart.

@kaituo

kaituo commented Mar 22, 2022

I would appreciate it if @saratvemulapalli could provide more details on why the existing plugin framework is so broken that we need a totally different one. Below are some straw-man approaches for the problems the new framework aims to solve.

Versioning:

Today, if I want to make AD 1.3 work with OpenSearch core 1.1, I would revert the changes in build.gradle and build the zip using 1.1, even though there are many more changes in other places between 1.1 and 1.3. So plugins are not rigid in terms of compatibility with the OpenSearch core and can easily be switched among OpenSearch cores. Here are the changes I made to make 1.3 AD work with the 1.1 OpenSearch core.

(base) kaituo@88665a53bc93 anomaly-detection % git diff 6779aef0120990bc1195a675e03887a54956437c
diff --git a/build.gradle b/build.gradle
index 039c708d..cf93b111 100644
--- a/build.gradle
+++ b/build.gradle
@@ -575,9 +575,9 @@ dependencies {

// force Jackson version to avoid version conflict issue
implementation 'software.amazon.randomcutforest:randomcutforest-serialization:2.0.1'
- implementation "com.fasterxml.jackson.core:jackson-core:2.12.6"
- implementation "com.fasterxml.jackson.core:jackson-databind:2.12.6"
- implementation "com.fasterxml.jackson.core:jackson-annotations:2.12.6"
+ implementation "com.fasterxml.jackson.core:jackson-core:${versions.jackson}"
+ implementation "com.fasterxml.jackson.core:jackson-databind:${versions.jackson}"
+ implementation "com.fasterxml.jackson.core:jackson-annotations:${versions.jackson}"
compile files('lib/randomcutforest-parkservices-2.0.1.jar')
compile files('lib/randomcutforest-core-2.0.1.jar')

@@ -586,7 +586,7 @@ dependencies {
compile files('lib/protostuff-core-1.8.0-SNAPSHOT.jar')
compile files('lib/protostuff-collectionschema-1.8.0-SNAPSHOT.jar')
compile files('lib/protostuff-runtime-1.8.0-SNAPSHOT.jar')
- compile group: 'org.apache.commons', name: 'commons-lang3', version: '3.12.0'
+ //compile group: 'org.apache.commons', name: 'commons-lang3', version: '3.12.0'

compile "org.jacoco:org.jacoco.agent:0.8.5"
compile ("org.jacoco:org.jacoco.ant:0.8.5") {

then

./gradlew :assemble -Dopensearch.version=1.1.0

For versioning support, we could at least enable configuration that allows plugins to specify which versions of the OpenSearch core they are compatible with. If the versions are compatible, we can match plugins with different versions of the OpenSearch core.
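
A minimal sketch of that kind of compatibility check, assuming plain numeric x.y.z version strings; the CompatibilityCheck class below is hypothetical, not existing plugin-framework code.

    // Decide whether a plugin built for one core version can be loaded by another,
    // using a declared [minSupported, maxSupported] compatibility range.
    final class CompatibilityCheck {

        // Compare two x.y.z version strings numerically, component by component.
        static int compare(String a, String b) {
            String[] pa = a.split("\\."), pb = b.split("\\.");
            for (int i = 0; i < Math.max(pa.length, pb.length); i++) {
                int na = i < pa.length ? Integer.parseInt(pa[i]) : 0;
                int nb = i < pb.length ? Integer.parseInt(pb[i]) : 0;
                if (na != nb) return Integer.compare(na, nb);
            }
            return 0;
        }

        static boolean isCompatible(String coreVersion, String minSupported, String maxSupported) {
            return compare(coreVersion, minSupported) >= 0 && compare(coreVersion, maxSupported) <= 0;
        }

        public static void main(String[] args) {
            // e.g. a plugin declaring it works with cores 1.1.0 through 1.3.0
            System.out.println(isCompatible("1.1.0", "1.1.0", "1.3.0")); // true
            System.out.println(isCompatible("2.0.0", "1.1.0", "1.3.0")); // false
        }
    }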

Independence/Sandboxing:

The reader in the Performance Analyzer plugin runs in a separate JVM and communicates with OpenSearch via a RESTful API (check bin/performance-analyzer-agent-cli). To do that, the Performance Analyzer plugin writes code in the plugin and writes extra configuration files to be run by the OpenSearch startup script (check https://opensearch.org/docs/latest/opensearch/install/tar/). Also, we can use cgroups to limit resource usage (e.g., memory) by starting the separate plugin process in a specific cgroup. We can automate this process.

security:

We can create an API in the security plugin, like opensearch-project/security#566, to offload all of the checks to the security plugin. Other plugins then just have to call this API, or the plugin framework can make the check implicit for all data/settings access by creating a wrapper client for plugins to call.
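
As a rough illustration of that wrapper-client idea, the sketch below performs a permission check on every call before delegating. All interfaces are invented for this sketch; the actual API proposed in opensearch-project/security#566 may look different.

    // A client wrapper that consults the security plugin before any data access,
    // so individual plugins/extensions do not re-implement authorization.
    interface AccessChecker {
        boolean isAllowed(String user, String action, String index);
    }

    interface IndexClient {
        String get(String index, String id);
    }

    final class SecuredIndexClient implements IndexClient {
        private final IndexClient delegate;
        private final AccessChecker checker;
        private final String user;

        SecuredIndexClient(IndexClient delegate, AccessChecker checker, String user) {
            this.delegate = delegate;
            this.checker = checker;
            this.user = user;
        }

        @Override
        public String get(String index, String id) {
            // The check is implicit for every call made through this client.
            if (!checker.isAllowed(user, "read", index)) {
                throw new SecurityException("user " + user + " may not read index " + index);
            }
            return delegate.get(index, id);
        }
    }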

Inter-plugin communication:

This is a solved issue (check opensearch-project/notifications#223)

@ylwu-amzn
Contributor

ylwu-amzn commented Mar 22, 2022

I don't see an analysis of why we can't enhance the current plugin framework to solve the problems stated in this issue. What are the pros/cons of "enhancing the current plugin framework" versus "adding a new extension framework"? If we find it's impossible to enhance the current plugin framework to solve these problems, I think it's OK to add a new extension framework, but I agree with @getsaurabh02 on "keeping the new framework co-existent, and deprecating organically (only if necessary)". From the description, extensions will be much more limited, so why not keep the flexibility for the community to build plugins?

I see Kaituo gave solutions for several problems of the plugin framework. For "Discoverability/Dependency Management", why can't we build a discovery mechanism for plugins? For example, we can ask developers to register/onboard their plugins to an OpenSearch "AppStore"; these plugins would then be verified and become eligible to be discovered and downloaded/installed.

For inter-plugin communication, another example is the new ml-commons plugin released in 1.3, which communicates with the SQL plugin.

For "Independence/Sandboxing", I think ml-commons and other ML plugins may benefit from this by limiting the resources used by ML models, and even better by running them in a separate JVM/process or on a separate node. But it's hard to tell what the communication overhead would look like. And for distributed ML models, we need to run on multiple nodes, which means we need to know the real-time cluster state, like which nodes run the ML plugin and their resource usage, so we can dispatch ML tasks to the proper nodes.

Build and publish extension SDKs which will translate messages between OpenSearch and an extension. These SDKs should be distributed in multiple languages while keeping the same communication protocol.

This looks like building some transport client in multiple languages?

I see we are using the Apple "AppStore" as an analogy and want to allow users to easily install any extension on the fly. That seems challenging. When people download an app from the Apple "AppStore", they install it on just one local iPhone and all apps share the same set of hardware resources. But an OpenSearch cluster may have multiple nodes. When a user downloads a new extension, should we install it on all data nodes or on all nodes? There is still a risk that users install too many extensions on a data node even if we limit each extension's resource usage; for example, if extension1 can use at most 50% of memory and extension2 at most 60%, it is still risky to install both on the same data node. So we may need to ask users which nodes they want to install a new extension on. But that may change the cluster load balance by running specific extensions on specific data nodes only. So "easily install an extension" seems challenging, especially for a production cluster. It should be OK to install new extensions on a non-production cluster to explore and learn, and then install and configure them carefully on the production cluster.

@saratvemulapalli
Member Author

Thanks @reta for taking a look and reading through.
Answering all questions inline:

First, I agree with you that extensions will not become an equivalent replacement for plugins. For example, plugins can tailor codecs, translog policies, and new index types, which are very core pieces of the OpenSearch engine; externalizing those would not only be difficult but would kill engine performance, no doubt.

With extensions we would like to provide an opportunity for developers/cluster operators to choose where to run them.
We should understand how bad the latency is going to be when plugins interact during the index/query lifecycle. This issue should give us those numbers: #2231.
Our goal is to make sure OpenSearch is as stable as possible and does not have plugins taking down the cluster.

Secondly, you raised the question of latency but there is no answer to it: OpenSearch could only impose SLA bounds but would have no control over latency. It would also significantly impact the availability and/or accuracy and/or consistency of the data, depending on the nature of the extension (not to forget about cycles here).

That's a great question. For latency, we do not have the data at this point, and we are marching towards it. The numbers will tell us which direction to go.
Absolutely, I would love to learn more about the cases where availability, accuracy, or consistency of data would be impacted.
The way I see it, the existing plugin architecture is driven by OpenSearch: all the calls are made by OpenSearch to the plugin and back. We could rely on that for extensions as well (i.e. keep it tight initially) and, as we see use cases, build the systems around them. Let's work through them.

Thirdly, we should keep in mind how it plays with other planned features. For example, if #1968 gets in, we are suddenly going to deal with a massively distributed system where complex core / storage / extensions remote communications may render the entire cluster unstable or completely unavailable.

That's a great point. Do you have ideas/suggestions on how to take this on?
Our thought process is: let's take it as we go, see problems, list them down, and start solving them.

With that being said, I do see extensions as a useful mechanism for doing certain things; in this regard I think about them as webhooks: the engine could use external hooks (extensions?) to provide an opportunity to enrich ingested data or notify about index changes. Is that the right analogy to think about?

A yes and no :). I see extensions as pieces of code which interact with OpenSearch (within the process, outside of the process, or remotely). The framework should support all three mechanisms while solving different use cases.

And thanks again for these questions. Let's keep the conversation rolling.

@saratvemulapalli
Member Author

Thanks @getsaurabh02 for taking a look and reading through.
Answering all questions inline:

Build and publish SPIs, a.k.a. a new extension framework, which supports versioned APIs for developers.

Does this also mean we will remove the current Plugin Framework support, or could the new SPIs co-exist?

No. The plugin architecture will still be supported; in fact, it will be enhanced to solve these problems. The new interfaces will still support the traditional way of invoking via extension points while enhancing the framework to solve the problems listed above.

I really see high value in keeping the new framework co-existent, and deprecating it organically (only if necessary), for the reasons below:

  1. The current plugin architecture allows deeper integration with the core and provides the ability to change core behavior completely. On the other hand, I see extensions built upon SPIs as more intended for solving application use cases while keeping the core engine logic the same.

💯 Totally agreed. As I said above, we are enhancing the architecture, and if we see a need to deprecate the existing architecture we can discuss it.

  2. Process isolation does offer more security, but comes with its own performance overhead due to inter-process communication. For a trusted plugin, a user might want to choose performance over isolation. Hence this also advocates for keeping the plugin framework coexistent with the extension framework.

Absolutely, and this is exactly why we would like to leave the option for developers/cluster operators to choose where to run them.

  3. This will also provide a gradual and smoother transition for developers and partners, letting them see real value in the new framework and migrate organically after evaluating the benefits.

+1

Add dynamic support for all modules which expose extension points, i.e. support registering/removing extension handlers dynamically.

Not sure if this can actually be imposed as a necessary tenet for all extensions. For example, network plugins/extensions such as Jetty/Netty might still require a restart.

That's a great point. Mostly these are modules which are loaded by default to make communication possible.
What this tells me is that not all types of plugins (i.e. extension points) can be dynamic, and I completely agree.
I'll add an item to the FAQ to capture this.

And thanks again :)!

@zengyan-amazon
Member

@saratvemulapalli Thanks for putting this together! I'd like to understand more about the sandboxing/separate-process idea.

We believe that adding support to run extensions outside of the OpenSearch process solves these problems, provided we define a common communication protocol and make extensions independent. It enables all extensions to talk via a common interface and prevents them from fatally affecting the core of OpenSearch.

Does this mean it will be the OpenSearch admin who determines where to install the extensions, i.e. on which hosts? Or will the OpenSearch process still handle the extension installation? For instance, say one extension's work is heavy and it needs to run on a different host: does the OpenSearch admin need to install the extension manually on the designated host, or can OpenSearch be aware of the hardware available and do the installation?

Meanwhile, I imagine extensions may do some of the work that plugins do today. Since most extension points require the core OpenSearch engine to trigger functions in the extension in a separate (and possibly remote) process, will the OpenSearch engine manage the plugin topology/routing? Or can we extend and take advantage of the existing plugin framework by adding a new node role, like dedicated plugin nodes, to achieve such process separation?

@reta
Collaborator

reta commented Mar 24, 2022

@saratvemulapalli it seems like we have a considerable amount of unknowns / concerns / questions regarding extensions. The plugins do have a number of issues (see #1707 for example), but since we are not getting rid of them (please correct me if I am wrong), we would end up with two problems instead of just one. Maybe we could step back a bit and reassess the deliverables?

  • could we bundle extensions through a plugin, e.g. https://github.com/opensearch-project/notifications?
  • could we clearly separate extensions and plugins: when, which and why? (it seems like not at this point)
  • could we share at least 1-2 convincing examples of something we clearly need extensions (not plugins) for?

The limitations and problems you have described are real, but the solution may not be as clear yet (at least to me and a few other folks). Thank you!

@dblock
Member

dblock commented Mar 25, 2022

Thanks @saratvemulapalli for the proposal!

@reta I think you have it spot on: the limitations and problems described are real, but the solution may not be as clear yet. And @ylwu-amzn is absolutely right to ask why we can't enhance the current plugin framework to solve the problems stated in this issue, and what the pros/cons are between "enhancing the current plugin framework" and "adding a new extension framework".

I thought about how to address this concern around "extensions" vs. "plugins". Personally, I am a big believer in incremental evolution rather than radical revolution, too. I read the proposal as "let's think about what plugins could be if we weren't bound by years of legacy". I am, personally, completely comfortable talking about a new thing called "extensions" and leaving the actual evolution path to (albeit important) implementation details. We can spec this upfront, but I'd prefer to see some PRs that chip away at getting plugins to evolve into what @saratvemulapalli called extensions. That could possibly be done through a plugin; it's a good idea! We can decide what we merge PR by PR and mark things experimental.

@kaituo I think your example of recompiling a plugin convinced me even more of the problem that plugins are tightly coupled to a specific version of OpenSearch. How can it be OK that in order to upgrade from version X to Y of OpenSearch I need to rebuild and upgrade all plugins? This means that after releasing OpenSearch Y, I also need to wait until all the plugins I use are rebuilt for version Y. In your example there's actually no change in the software; it's all swapping jackson and protostuff dependencies. Why can't version X of the plugin "just work" without any changes with version Y of OpenSearch, given that no APIs have changed? Why can't the plugin use its own version of jackson? Why does it have to be the same as OpenSearch's? What kind of ecosystem can we possibly expect when everything has to move in lockstep? The answer is that the only ecosystem is what we have now: a bundle of core engine and plugins that ship together as one giant monolith, and I think that's a huge barrier to more plugins being developed.

On the other items: when plugins invent their own way of running out-of-process workers (PA), that tells me that the framework should support it. When notifications takes garbage in and casts it into a type that is copied to both client and server, that tells me that we need an SDK.

Regarding performance: we shouldn't be optimizing the performance of what we cannot measure, yet. I worked on search in the 90s, where we had a hard time standing up an engine that did 8 requests per second. Storing data in an external/remote database seemed to me like an insane idea. Fast forward: S3 can do 5,500 GET/HEAD requests per second with strong consistency. You could be moving 100 Gb/s on a single node, and having small-object latencies of 100–200 milliseconds. We're not talking about S3 here, but maybe similar additional latency is an acceptable tradeoff in many scenarios for getting additional security or crash protection? I really like the idea that we can build options and let users (or plugin authors!) decide whether the plugin runs in the same JVM or on a remote node with total isolation, and document the tradeoffs in performance. All I am saying is, don't throw the baby out with the bathwater, and let's compare real numbers when we have some.

There's a ton of unanswered questions! I'm glad people are taking on hard work and questioning the status quo of what we have now. I suggest prototyping some of these ideas, and am looking forward to seeing how some of these can become code we'd all agree is worthy of being merged to main.

@elfisher

One thing that I'd like to understand more is how extensions can interface with OpenSearch Security, so that people can build new features that use access control, audit logging, etc. Today we already see this need in some plugins in the project (e.g., AD, Alerting, ISM). Codifying how to integrate would really help drive a consistent security experience.

@elfisher

also thanks @saratvemulapalli for putting this together!

@peterzhuamazon
Member

Hi,

Would extensions have to run on a separate instance/server?
If so, how could a single-node cluster work with extensions?
I would assume they are still able to run locally beside the cluster on a single node, right?

Also, what would be the way of tracking versions: does an extension track which OpenSearch version it supports, or does OpenSearch track supported extension versions through extensions.yml(?).

Thanks.

@dbwiddis
Member

Would extensions have to run on a separate instance/server?

No. In fact, for most of our development and testing we're using the same server and the localhost/loopback interface. It does provide a lot of flexibility to move it elsewhere!

If so, how could a single-node cluster work with extensions?
I would assume they are still able to run locally beside the cluster on a single node, right?

Yes. The only issue with running extensions on the same node would be having multiple Java processes running and the associated resource usage. It would be entirely possible to start up all the extensions in a single JVM (other than the OpenSearch JVM) on the same machine/instance.

@owaiskazi19
Member

We are tracking the progress and milestones for Extensibility here: #1422

@rmuir
Contributor

rmuir commented Jan 12, 2023

I came here from #5768, but some plugins really need to be in the same JVM, mainly the Lucene ones such as analysis or any codecs. These are integrated with Lucene in such a way that they really must not suffer the overhead of IPC communication of any sort. E.g. analyzer plugins are engineered not to produce a lot of garbage (ideally none) while processing gazillions of documents, and data is consumed directly by IndexWriter and query parsers.

@rmuir
Contributor

rmuir commented Jan 12, 2023

Many of the Lucene analyzers are simply not in the lucene-core jar, or even the lucene-analysis-common jar, but are provided as separate, optional modules because they consume some additional resources (e.g. 6 MB of RAM+disk) that don't make sense for an embedded deployment (e.g. your IDE).

But they are still important: for Chinese, Japanese, and Korean, for example, we need some data files to do a decent job for the majority of use cases. Unfortunately that means a few megabytes, and such an analyzer maybe becomes a plugin because of that. But I hope we don't make things absurdly slow for those languages for no good reason; please think about the Lucene plugins when trying to design extensibility here.

@saratvemulapalli
Member Author

Thanks @rmuir for the feedback. I agree that for core workloads like indexing and searching (where analysis plugins are a very common use case), the overhead of communication will make performance worse. Understanding how to make this work is next on the radar; if you have suggestions I would love to hear them.

Most of the other plugins, though, are really building features on top of OpenSearch; those are the ones we'd like to turn into extensions and make easy to develop without worrying about constructs in OpenSearch/Lucene.

@dblock
Member

dblock commented Jan 17, 2023

IPC overhead is real. Once we have an analysis plugin implemented as an extension we can see what the actual numbers look like. I fully expect it to be slower, but I am curious whether it's 10% or 200%. I think if it's 10% we can give users choices: if you trust the code, run it in the same JVM; otherwise run it out of proc with a performance penalty.

Then again, running the processor in a separate JVM may improve the actual processor performance, because you can control heap size and GC pauses separately or dedicate CPU to it. Finally, if a processor can be implemented in a completely different technology (Rust? native code?), the serialization/deserialization overhead may turn out to be smaller than the performance improvement.

Let's keep an open mind! I hear similar concerns all the time with AWS services when users say "I can't have network overhead", but then they measure and observe that the entire system performs and scales a lot better when you call Lambda in the middle of a critical path (think of a truly remote analyzer :).

@Xtansia
Contributor

Xtansia commented Mar 23, 2023

I've created a proposal for how the language clients can support extensions in OpenSearch here: opensearch-project/opensearch-clients#55

@zengyan-amazon
Member

I was trying to understand whether extensions can be built on top of another extension, the way one plugin can extend another and leverage its functionality. If this is supported, how can one extension detect another extension and understand its capabilities?
E.g. say we have an alerting extension and a notification extension, and the alerting extension needs to use the notification extension to send alert notifications: how does this extension dependency work? Does the notification extension publish its client, and does the alerting extension need to build with that client as a dependency? And if the notification extension is not present or temporarily down, can the alerting extension be notified?

BTW, according to the SDK design, it seems that in order to add a new extension to OpenSearch, the OpenSearch admin needs to update the extensions.yml config file. Is an OpenSearch restart needed? Does each and every OpenSearch node do the extension registration workflow with all the extensions configured in the config file, or does only the cluster manager do the registration and synchronize the extension info to all nodes in the cluster?

@dblock
Member

dblock commented Apr 13, 2023

re: dependencies, the first part is to be able to declare a dependency on something in an extension, starting with OpenSearch - that will be the mechanism to say "this extension runs with OpenSearch ~> 2.x". We will reuse that exact mechanism to declare that "this extension depends on OpenSearch ~> 2.x and notifications ~> 7.1.0". The extensions manager should prevent calls from/to the extension if its dependency is missing, prevent uninstalling a dependency of another extension, etc. This is opensearch-project/opensearch-sdk-java#108.
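
For illustration, a "~>" (pessimistic) constraint like the ones above could be evaluated roughly as sketched below, assuming plain numeric versions; this is only a sketch, not the actual dependency model being designed in opensearch-project/opensearch-sdk-java#108.

    // Evaluate a pessimistic version constraint such as "~> 2.1" or "~> 7.1.0":
    // every component except the last stated one must match exactly, and the
    // last stated component must be greater than or equal to the constraint's.
    final class PessimisticConstraint {

        static boolean satisfies(String version, String constraint) {
            String[] want = constraint.replace("~>", "").trim().split("\\.");
            String[] have = version.split("\\.");
            for (int i = 0; i < want.length - 1; i++) {
                if (Integer.parseInt(have[i]) != Integer.parseInt(want[i])) return false;
            }
            int last = want.length - 1;
            return Integer.parseInt(have[last]) >= Integer.parseInt(want[last]);
        }

        public static void main(String[] args) {
            System.out.println(satisfies("2.5.0", "~> 2.1")); // true
            System.out.println(satisfies("3.0.0", "~> 2.1")); // false
            System.out.println(satisfies("2.3.1", "~> 2.4")); // false
        }
    }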

re: extensions.yml, that's just a crutch, the plan is to have an installation/uninstallation API where the cluster does not need to be restarted to install/uninstall an extension - that's called hot-swap and we plan to support it, see opensearch-project/opensearch-sdk-java#356.
