[FEATURE] Add OpenSearch-Java transport as an option #124

Open
harshavamsi opened this issue Feb 24, 2023 · 6 comments
Labels
enhancement New feature or request


@harshavamsi
Collaborator

harshavamsi commented Feb 24, 2023

Is your feature request related to a problem?

Currently the Hadoop client uses its own custom RestClient to make requests to an OpenSearch cluster. While this works today, we’d like to let users choose between it and the optional ApacheHttpClient5Transport provided by the OpenSearch-Java client. To do this, we would need to add a new transport option and add the Java client as a dependency of the Hadoop client.
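A minimal sketch of what "a new transport option" could look like on the Hadoop side, assuming the choice is driven by a connector setting. The setting name `opensearch.transport.type`, its values, and every class name below are hypothetical placeholders; the two stub classes only stand in for the existing RestClient path and an ApacheHttpClient5Transport-backed path.

```java
import java.util.Locale;
import java.util.Properties;

// Hypothetical common interface that both transport implementations would share.
interface HadoopTransport {
    String name();
}

// Stand-in for the existing custom RestClient path (request logic omitted in this sketch).
final class LegacyRestTransport implements HadoopTransport {
    public String name() { return "rest"; }
}

// Stand-in for a transport backed by opensearch-java's ApacheHttpClient5Transport.
final class JavaClientTransportOption implements HadoopTransport {
    public String name() { return "java-client"; }
}

final class TransportFactory {
    // Hypothetical setting name; the real option name would be decided during implementation.
    static final String TRANSPORT_TYPE = "opensearch.transport.type";

    static HadoopTransport create(Properties settings) {
        // Default to the existing transport so current users see no change in behavior
        String type = settings.getProperty(TRANSPORT_TYPE, "rest").trim().toLowerCase(Locale.ROOT);
        switch (type) {
            case "rest":
                return new LegacyRestTransport();
            case "java-client":
                return new JavaClientTransportOption();
            default:
                throw new IllegalArgumentException("Unknown transport type: " + type);
        }
    }
}
```

Making the existing transport the default keeps the new option strictly opt-in.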

Adding the OpenSearch-Java transport as an option lets us build on top of the upstream client and pick up new features such as backpressure awareness and the others described in opensearch-project/opensearch-clients#27.

What solution would you like?

This diagram illustrates how the Java client and the Hadoop client make requests to OpenSearch today:

[Diagram: how the OpenSearch-Java client and the Hadoop client send requests to OpenSearch today]

The Hadoop client simply translates incoming queries into a SimpleHTTP request, using the URL, path, query parameters, method, and headers to construct it. OpenSearch-Java, on the other hand, uses a builder pattern for its typed Request/Response classes and exposes those typed methods to the client.

Approach 1

One approach would be to add a new abstraction layer in OpenSearch-Java that wraps the raw GET/POST/PUT/DELETE methods. The Hadoop client could then use the appropriate class to construct the request and let the Java client handle everything else. This has also been requested in opensearch-project/opensearch-java#377.

[Diagram: proposed request flow for Approach 1]
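To make Approach 1 more concrete, here is a sketch of the kind of generic request type such a layer might expose: an HTTP method, an endpoint, query parameters, and an optional body, assembled with a builder. `GenericRequest` and all of its members are hypothetical and are not an existing opensearch-java API.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical generic request type for a raw REST layer; not an existing opensearch-java API.
final class GenericRequest {
    enum Method { GET, POST, PUT, DELETE, HEAD }

    final Method method;
    final String endpoint;
    final Map<String, String> queryParams;
    final byte[] body; // null when the request has no body

    private GenericRequest(Builder b) {
        this.method = b.method;
        this.endpoint = b.endpoint;
        // Copy so later builder changes do not leak into a built request; keeps insertion order
        this.queryParams = Collections.unmodifiableMap(new LinkedHashMap<>(b.queryParams));
        this.body = b.body;
    }

    // Endpoint plus URL-encoded query string, e.g. "/my-index/_search?size=10"
    String pathWithQuery() {
        if (queryParams.isEmpty()) {
            return endpoint;
        }
        StringJoiner query = new StringJoiner("&", endpoint + "?", "");
        queryParams.forEach((k, v) -> query.add(
                URLEncoder.encode(k, StandardCharsets.UTF_8) + "="
                        + URLEncoder.encode(v, StandardCharsets.UTF_8)));
        return query.toString();
    }

    static final class Builder {
        private Method method = Method.GET;
        private String endpoint = "/";
        private final Map<String, String> queryParams = new LinkedHashMap<>();
        private byte[] body;

        Builder method(Method method) { this.method = method; return this; }
        Builder endpoint(String endpoint) { this.endpoint = endpoint; return this; }
        Builder param(String key, String value) { queryParams.put(key, value); return this; }
        Builder body(byte[] body) { this.body = body; return this; }
        GenericRequest build() { return new GenericRequest(this); }
    }
}
```

For example, `new GenericRequest.Builder().method(GenericRequest.Method.GET).endpoint("/_plugins/_security/api/account").build()` would describe a call to a plugin endpoint that the typed client does not model, which is the second benefit listed under the pros.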

Pros:

  • Would make it easier for the Hadoop client to leverage OpenSearch-Java without explicitly converting to and from the typed Request and Response classes
  • Would let people keep using the client as is for existing APIs while also reaching OpenSearch endpoints the client does not support today, e.g. plugin APIs

Cons:

  • Providing an easy raw REST layer could overshadow the typed client endpoints and lead developers to stop using them as intended, which adds maintenance overhead and may go against good design patterns.

Approach 2

Parse the incoming request at the Hadoop layer and use the matching OpenSearch-Java request and response classes to send it.
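For Approach 2, the Hadoop layer would first have to recognize which API an untyped request targets. A sketch, assuming the HTTP method plus the trailing path segments are enough to tell the operations apart; `OperationType` and `OperationClassifier` are hypothetical names, and only a handful of endpoints are covered.

```java
import java.util.Locale;

// Hypothetical set of operations the Hadoop layer would need to recognize.
enum OperationType { BULK, SEARCH, SCROLL, COUNT }

final class OperationClassifier {
    // Classify an untyped (method, path) pair by the endpoint suffix and HTTP method.
    static OperationType classify(String method, String path) {
        String m = method.toUpperCase(Locale.ROOT);
        String p = stripQuery(path);
        boolean getOrPost = m.equals("GET") || m.equals("POST");
        if (p.endsWith("/_search/scroll") && getOrPost) return OperationType.SCROLL;
        if (p.endsWith("/_bulk") && (m.equals("POST") || m.equals("PUT"))) return OperationType.BULK;
        if (p.endsWith("/_search") && getOrPost) return OperationType.SEARCH;
        if (p.endsWith("/_count") && getOrPost) return OperationType.COUNT;
        throw new IllegalArgumentException("Unsupported request: " + m + " " + path);
    }

    // Normalize to a leading slash and drop any query string, e.g. "idx/_search?size=0" -> "/idx/_search"
    private static String stripQuery(String path) {
        String p = path.startsWith("/") ? path : "/" + path;
        int q = p.indexOf('?');
        return q >= 0 ? p.substring(0, q) : p;
    }
}
```

Each recognized type would then map to one typed opensearch-java call; anything unrecognized fails fast instead of being sent blindly.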

Implementation questions:

  • What’s a good design pattern for this?

Example design pattern:

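```java
import java.io.IOException;
import org.apache.http.HttpHost;
import org.opensearch.client.RestClient;
import org.opensearch.client.json.JsonData;
import org.opensearch.client.json.jackson.JacksonJsonpMapper;
import org.opensearch.client.opensearch.OpenSearchClient;
import org.opensearch.client.opensearch.core.BulkRequest;
import org.opensearch.client.opensearch.core.SearchRequest;
import org.opensearch.client.transport.rest_client.RestClientTransport;
// Hadoop-side types; package names assumed from the opensearch-hadoop code base
import org.opensearch.hadoop.rest.Request;
import org.opensearch.hadoop.util.ByteSequence;

public class JavaClientTransport {
    private final OpenSearchClient client;

    public JavaClientTransport(String host, int port, String scheme) {
        // Build the RestClient and the typed client once, instead of on every request
        RestClient restClient = RestClient.builder(new HttpHost(host, port, scheme)).build();
        this.client = new OpenSearchClient(new RestClientTransport(restClient, new JacksonJsonpMapper()));
    }

    public Object executeRequest(Request.Method method, CharSequence uri, CharSequence path,
            CharSequence params, ByteSequence body, String operationType) throws IOException {
        switch (operationType) {
            case "bulk":
                // TODO: convert `body` into bulk operations via .operations(...); that list is required
                BulkRequest bulkRequest = new BulkRequest.Builder().index("index").build();
                return client.bulk(bulkRequest);
            case "search":
                SearchRequest searchRequest = new SearchRequest.Builder().index("index").build();
                // The document type is unknown at this layer, so hits are read as generic JsonData
                return client.search(searchRequest, JsonData.class);
            default:
                // Every case returns, so nothing falls through into this branch
                throw new IllegalArgumentException("No matching operation type: " + operationType);
        }
    }
}
```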
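One alternative to a `switch` that grows with every supported API is a registry mapping each operation type to a handler, so adding an API means registering one handler rather than editing a central method. A sketch with hypothetical `Operation`, `OperationHandler`, and `HandlerRegistry` types; in a real implementation each handler would build the typed opensearch-java request and call the client.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical operation types, for illustration only.
enum Operation { BULK, SEARCH }

@FunctionalInterface
interface OperationHandler {
    // A real handler would build a typed request and call the OpenSearchClient,
    // wrapping any IOException in an UncheckedIOException.
    String handle(byte[] body);
}

final class HandlerRegistry {
    private final Map<Operation, OperationHandler> handlers = new EnumMap<>(Operation.class);

    HandlerRegistry register(Operation op, OperationHandler handler) {
        handlers.put(op, handler);
        return this;
    }

    // Look up the handler for the operation and delegate; unknown operations fail fast.
    String dispatch(Operation op, byte[] body) {
        OperationHandler handler = handlers.get(op);
        if (handler == null) {
            throw new UnsupportedOperationException("No handler registered for " + op);
        }
        return handler.handle(body);
    }
}
```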
  • How would we convert the ByteSequence that the Hadoop client uses for the body into either JSON or a body class?

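Potential example to read the body into a String, assuming it can be exposed as an InputStream:

```java
static String readUtf8(InputStream inputStream) throws IOException {
    ByteArrayOutputStream result = new ByteArrayOutputStream();
    byte[] buffer = new byte[1024];
    // Copy in 1 KB chunks until read() signals end of stream with -1
    for (int length; (length = inputStream.read(buffer)) != -1; ) {
        result.write(buffer, 0, length);
    }
    return result.toString("UTF-8");
}
```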
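Once the body is a String, a bulk payload still has to be split into individual operations before it can be mapped onto a typed bulk request. The bulk API body is newline-delimited JSON in which each action line is followed by a source line, except for `delete`, which has none. A sketch with hypothetical `BulkEntry` and `NdjsonBulkSplitter` types; as a simplification it detects `delete` by its leading key instead of parsing the JSON.

```java
import java.util.ArrayList;
import java.util.List;

// One bulk action: the action/metadata line plus its optional source line.
final class BulkEntry {
    final String action;
    final String source; // null for delete actions, which carry no source line

    BulkEntry(String action, String source) {
        this.action = action;
        this.source = source;
    }
}

final class NdjsonBulkSplitter {
    // Split an NDJSON bulk body into action/source pairs.
    // Simplification: a delete is detected by its leading key rather than by parsing the JSON.
    static List<BulkEntry> split(String body) {
        List<BulkEntry> entries = new ArrayList<>();
        String[] lines = body.split("\n");
        for (int i = 0; i < lines.length; i++) {
            String action = lines[i].trim();
            if (action.isEmpty()) {
                continue; // tolerate blank lines
            }
            if (action.replace(" ", "").startsWith("{\"delete\"")) {
                entries.add(new BulkEntry(action, null));
            } else if (i + 1 < lines.length) {
                entries.add(new BulkEntry(action, lines[++i].trim())); // consume the source line
            } else {
                throw new IllegalArgumentException("Missing source line for: " + action);
            }
        }
        return entries;
    }
}
```

Each source string could then be turned into `JsonData` as in the next example and attached to the matching typed bulk operation.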

Potential example to convert a JSON string into the JsonData that .document() in the Java client accepts:

```java
// Reuse the client's own mapper so parsing matches how the client serializes documents
JsonpMapper mapper = client._transport().jsonpMapper();
JsonParser parser = mapper.jsonProvider().createParser(new StringReader(jsonString));
JsonData data = JsonData.from(parser, mapper);
```

A bigger question is, what have I missed in the implementation of the Java client and the hadoop client that would require a third approach?

Do you have any additional context?

This is also a feature request in opensearch-project/spring-data-opensearch#19 and can help consolidate the approaches.

@harshavamsi harshavamsi added enhancement New feature or request untriaged labels Feb 24, 2023
@harshavamsi
Collaborator Author

@wbeckler @VachaShah @nknize @dblock would love any feedback.

@nknize
Collaborator

nknize commented Feb 24, 2023

I'll dig deeper, but my initial reaction would be to refactor the Java client transport into a core library, so that we take a dependency on opensearch-core and a new opensearch-transport library instead of a cross-plugin dependency.

@dblock
Member

dblock commented Feb 28, 2023

I think opensearch-java needs the ability to expose pure HTTP requests so it does not become a bottleneck, and all the implementations of the actual strongly typed methods should use those. For this client, taking a dependency on opensearch-java seems like the right call.
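A sketch of the layering described here, where strongly typed methods are thin wrappers over a single raw HTTP entry point. `RawHttp` and `TypedClient` are hypothetical; the regex-based parsing of the `_count` response only keeps the example dependency-free, and a real client would use its JSON mapper.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical raw layer: send a method, path, and optional body; receive the response body.
// A real implementation would wrap I/O failures in UncheckedIOException.
@FunctionalInterface
interface RawHttp {
    String perform(String method, String path, String body);
}

// Hypothetical typed layer built only on RawHttp, so typed and raw calls share one code path.
final class TypedClient {
    private static final Pattern COUNT = Pattern.compile("\"count\"\\s*:\\s*(\\d+)");
    private final RawHttp raw;

    TypedClient(RawHttp raw) {
        this.raw = raw;
    }

    // Typed wrapper around GET /{index}/_count.
    // Sketch only: the count is extracted with a regex instead of a JSON mapper.
    long count(String index) {
        String response = raw.perform("GET", "/" + index + "/_count", null);
        Matcher m = COUNT.matcher(response);
        if (!m.find()) {
            throw new IllegalStateException("Unexpected response: " + response);
        }
        return Long.parseLong(m.group(1));
    }
}
```

Under this split, an endpoint the typed layer does not cover yet is still reachable through `RawHttp` directly.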

@reta
Contributor

reta commented Mar 7, 2023

@harshavamsi what is preventing us from using opensearch-java as intended, with typed request and response models? (I think that is what you meant in approach #2.)

@wbeckler wbeckler removed the untriaged label Mar 9, 2023
@harshavamsi
Collaborator Author

> @harshavamsi what is preventing us from using opensearch-java as intended, with typed request and response models? (I think that is what you meant in approach #2.)

Yes, I wasn't sure whether we should be using typed requests/responses, given that the Hadoop client today has no way of determining the type of API call being made. Based on the comments in opensearch-project/opensearch-java#377, I think it makes sense for both clients to have this feature, and it would make the client here much easier to implement. What were you thinking of doing in opensearch-project/spring-data-opensearch#19? Were you going to pull in the request/response types from opensearch-java?

@reta
Contributor

reta commented Mar 10, 2023

> What were you thinking of doing in opensearch-project/spring-data-opensearch#19? Were you going to pull in the request/response types from opensearch-java?

Yes, the plan going forward is to recommend opensearch-java as the only official client for communicating with OpenSearch; I think we formalized it here [1].

[1] opensearch-project/OpenSearch#5424
