[PROPOSAL] Move SDK to OpenSearch repo #616
@dblock @opensearch-project/opensearch-sdk-java would love your thoughts. I might not have thought of all the upsides/downsides, so feel free to chip in.
Downside: dependency management. The Java SDK has api dependencies on OpenSearchClient (the Java client) and OpenSearchAsyncClient. Moving the SDK into OpenSearch would require OpenSearch to take those same dependencies, but opensearch-java itself depends on OpenSearch, creating a cycle. There are ways to work around it with SPI, etc.
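A minimal sketch of the SPI workaround mentioned above (all names here are hypothetical, not actual SDK classes): the SDK exposes a provider interface and discovers a concrete client factory at runtime via `java.util.ServiceLoader`, so it never compiles against opensearch-java directly.

```java
import java.util.ServiceLoader;

// Hypothetical SPI the SDK could expose instead of a compile-time
// dependency on OpenSearchClient/OpenSearchAsyncClient.
interface ClientProvider {
    Object createClient(String endpoint); // a real impl would return an OpenSearchClient
}

class Sdk {
    static Object loadClient(String endpoint) {
        // The concrete provider (backed by opensearch-java) is registered via
        // META-INF/services and discovered at runtime, breaking the dependency cycle.
        for (ClientProvider provider : ServiceLoader.load(ClientProvider.class)) {
            return provider.createClient(endpoint);
        }
        throw new IllegalStateException("no ClientProvider on the classpath");
    }
}
```

The cost of this pattern is that a misconfigured classpath fails at runtime rather than compile time, which is part of the trade-off being debated here.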
Neutral: I would love to see a diagram envisioning how an SDK module (à la JPMS) would interact with other OpenSearch modules in the modular future.
I'm curious why the upstream SDK is planned to take a dependency on the downstream client? I think we should strive for no downstream inheritance in the upstream. |
Good point, @nknize; at this point it's mostly for convenience. I suppose we could remove them and make everyone instantiate and configure their own. We also have |
Downside: makes the
Only for the Java SDK. Isn't one of the primary reasons for moving to the clients repo to have generated code and integration tests that work cross-language? We need to design ITs from the start to recognize that connection.
I think we're talking about different mechanisms: the client SDK/API for communicating with the OpenSearch cluster vs. the extensions API for building new features. IMHO we should restrict extensions (new features) to Java only, as other languages (e.g., Kotlin) have a tendency to "compile" into unexpected JVM behaviors. So in that regard, Java is special.

For the client SDK, I don't think we want any language to be special. However, the core is written in Java, so at some point you may need a multi-language SDK generator. Probably that's in Java? Maybe not? (Protobuf may give us everything we need here.)

What we don't want is what we have today: the urge/need to monitor downstream repos whenever breaking changes are made to core (public or internal API). Core should be isolated. Maybe it's best to start by identifying the needs (so we can establish a common foundation of requirements) and then discuss the implementation details? I'll start by throwing some thoughts here:
Agreed, but possibly with multiple meanings of "isolated". It should not be the responsibility of OpenSearch to monitor downstream repos. OpenSearch should follow semver, and we do. The same work needs to be done to ensure compatibility; the only question is which team is doing the work. What I think we want (and are getting) is a heads-up of major changes on main, but it should be our responsibility to monitor OpenSearch, not the other way around. With regard to your points: 1 - big yes. 2 - yes (extensions, not plugins!). 4 - yes. 5 - yes (part of semver is defining what is API). 6 - yes, but needs a bit more specificity. 7 - yes.
Adding one to your list:
More thoughts: There are a few things we do that somewhat relate to this but not "together":
Final observations: we've been developing the SDK for months and have an integrated "hello world" extension that has given us early warning when we broke things. However, this weekend I tried to build an extension from scratch in a completely different repo and ran into a few user-impacting issues (we didn't declare api transitive dependencies, and didn't include source and javadoc jars in our releases) that we simply wouldn't have (and hadn't) noticed. When you're too tightly integrated, you miss things that are accidentally connected when they aren't supposed to be.
Action and Transport are being refactored out of
Meh. Not really. Code freeze is there for a reason. As long as changes are made by code freeze, the time after freeze is supposed to be "bake and fix" time. Imposing artificial rules like "get your BIG changes in early" creates a slippery slope we don't want to go down.
💯. Exactly why we don't want tight integration with the core. The min distribution should be standalone. Imagine if Lucene had to monitor every downstream (Solr, Elasticsearch, OpenSearch, etc.) when we make an API change. We're not trailblazing here.
Kind of, but not really. Following semver would mean downstream repos could independently version and specify their opensearch dependencies using true semver semantics in
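The independent-versioning idea above could look something like this in a downstream build file (the coordinates and range are illustrative, not actual published artifacts):

```groovy
dependencies {
    // A semver-style range: accept any 2.x core at or above 2.8,
    // trusting minor releases not to break the public API.
    implementation 'org.opensearch:opensearch-core:[2.8,3.0)'
}
```

Under true semver, the downstream would only need to update this range when core cuts a new major version.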
That's where we should be moving. Similar to npm, k-NN should be free to release 2.9 tomorrow even if core is only releasing 2.8 in the coming weeks. But we're still working on this whole "bundle" concept that, quite frankly, just doesn't scale and perpetuates this downstream/upstream issue. We should be working toward a world where OpenSearch consists of a core API and SDK (like Lucene) and the concrete "distributed system" (like Solr) is just one implementation of those APIs. Case in point: Solr only just released 9.2 today, while Lucene is close to releasing 9.6.
Took a day to think about this and summarize my thoughts:
So moving to core will help us not break in the near term (2.x), but I don't think it's the best plan for the longer term. Better for us to establish rigorous CI that runs daily (or more often) to test for breakage... and to have a central communication channel (perhaps just a tag on PRs we can subscribe to) for potential breaking changes.
I'm curious about this in the context of the aggregations scenario in #5910, e.g., defining a new values source, new aggregator, aggregation builder, etc. If you're suggesting we generate all of that from an IDL of some sort (hello CORBA) and have the concrete

Let's simplify. One repo to build the core and validate BWC, not a bunch of separate repos. Orchestrating CI scaffolding across multiple repos is a fun project for some folks, but ultimately it's an open source nightmare where easy projects win over complicated ones. In that vein, the whole idea of a "bundle" needs to go away. With downstreams independently versioning, we enable those repos to release on their own terms. Those downstreams can then exist in their foundation of choice without tightly coupling to the OpenSearch project at all.

Extension installation should leverage the OSS community mechanisms that exist today: Maven repos, arch package repos, Docker Hub, Charmhub. This is the point of open source; the project itself shouldn't own building it all, it should focus on enabling the community to build what they want without requiring heavy scaffolding. It should be as easy as,
That's a small problem. It's fine in the repo, we just need a consistent way of copying over the text files into <insert language of choice code generation> workflow. How are we doing it for the spec-based clients? I'm envisioning something similar, whatever that looks like.
That's fair. But a Python SDK doesn't need the whole (Java) JAR, it just needs a copy of the spec files. If those are easily extracted from a jar (which is really just a zip), that's totally fine.
Sure, I'm not opposed to keeping stuff simple for the Java ecosystem. But the design needs to make it easy for non-Java too.
Thanks @saratvemulapalli for writing this up. Upside: we can detect breaking changes in OpenSearch, fix them in the SDK, and then extensions can consume the same. Downside (which has been called out by @dbwiddis): if we are planning to support multi-language SDKs, where will the other SDKs live? The SDK should be treated as a library which any extension can import to utilize the extension point APIs for hooking into OpenSearch. Keeping it in a separate repository will also bring an overhead of adding

If we are planning to have an SDK just for Java, then this makes sense for sure, but it still raises the question of where the other languages' SDKs will live.
Sorry for the late reply. I feel pretty strongly that the SDK should not depend on OpenSearch core at all. One of the desired features is compatibility with multiple versions of OpenSearch, which is difficult to achieve within a specific version of the upstream for minor versions, and impossible to achieve across major ones. I suggest closing the issue.
I was chatting with @dbwiddis. To get the benefits on both sides, we could build a thin layer (a DSL) that lives in OpenSearch, outside of server, and serves as the contract for all interactions with OpenSearch.
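To make that idea concrete, here is a hedged sketch (all names invented for illustration, not a proposed API) of what such a thin contract layer might look like: extensions compile only against the interface, and the server ships the single implementation.

```java
import java.util.function.Function;

// Hypothetical thin contract: extensions depend only on this interface,
// never on server internals. OpenSearch ships the sole implementation.
interface OpenSearchContract {
    String clusterName();
    void registerRestHandler(String route, Function<String, String> handler);
}

// An extension written purely against the contract, with no core dependency.
class HelloExtension {
    void install(OpenSearchContract contract) {
        contract.registerRestHandler("/_extensions/hello",
            request -> "Hello from " + contract.clusterName());
    }
}
```

Because the extension never sees server classes, core internals could be refactored freely as long as this one contract stays stable.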
Let's address the original issue as written, which suggests merging the SDK into OpenSearch. That sends us back to a tightly coupled, if better organized, core; we would then have to rely on discipline to ensure the SDK doesn't accidentally reuse something from core and pull in the kitchen sink. So I still think we need to close this issue. Now, what problem do we need to solve? How can the SDK code call the server's REST endpoints?
Thanks @dblock and maintainers who chimed in.
What/Why
What are you proposing?
Coming from: opensearch-project/OpenSearch#6470 (comment)
Move the OpenSearch SDK into the OpenSearch core repo to be the de facto interface for all dependencies.
There are obviously upsides and downsides:
Upside:
Downsides:
Maintainers, I would like to hear your thoughts; please vote 👍🏻 / 👎🏻.
cc: @nknize