[FEATURE] Add OpenSearch-Java transport as an option #124

harshavamsi · 2023-02-24T17:53:52Z

Is your feature request related to a problem?

Currently the hadoop client uses its own custom RestClient to make requests to an OpenSearch cluster. While this works today, we'd like to let users choose between that and an optional ApacheHttpClient5Transport that is present in the OpenSearch-Java client. To do this, we would need to add a new transport option and import the java client into hadoop.

Adding the OpenSearch-Java transport as an option lets us build on top of the upstream client and add new features such as backpressure awareness and the others described in opensearch-project/opensearch-clients#27

What solution would you like?

This diagram illustrates how the java client and hadoop client make requests to OpenSearch today:

The hadoop client just translates incoming queries into a SimpleHTTP request, constructing it from the URL, path, query parameters, method, and headers. OpenSearch-Java, on the other hand, uses a Request/Response builder pattern and exposes those builder methods to the caller.

Approach 1

One approach would be to add a new abstraction layer in OpenSearch-Java that abstracts away the GET/POST/PUT/DELETE methods. The hadoop client can then use the appropriate class, construct the request, and let the java client handle everything else. This is also requested in opensearch-project/opensearch-java#377

Pros:

Would make it easier for the hadoop client to leverage OpenSearch-Java without having to explicitly build and parse the Request and Response classes
Would let people use the client as is for existing APIs and also hit other OpenSearch endpoints that the client does not support today, e.g. calling plugin APIs

Cons:

Providing an easy REST layer can overshadow the typed client endpoints: developers may stop using them as intended, which adds maintenance overhead and potentially goes against good design patterns.
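
To make Approach 1 concrete, here is a minimal sketch of what such an untyped layer could look like. Every name in it (GenericRequest, GenericResponse, GenericTransport, EchoTransport) is hypothetical and is not an existing OpenSearch-Java API; the echo implementation only stands in for a real transport so the sketch runs without a cluster.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical untyped request: just enough to describe any REST call. */
final class GenericRequest {
    final String method;
    final String path;
    final Map<String, String> params;
    final byte[] body; // may be null for requests without a body

    GenericRequest(String method, String path, Map<String, String> params, byte[] body) {
        this.method = method;
        this.path = path;
        this.params = Collections.unmodifiableMap(new LinkedHashMap<>(params));
        this.body = body;
    }
}

/** Hypothetical untyped response: status code plus raw body bytes. */
final class GenericResponse {
    final int status;
    final byte[] body;

    GenericResponse(int status, byte[] body) {
        this.status = status;
        this.body = body;
    }
}

/** Hypothetical entry point the java client could expose to callers such as hadoop. */
interface GenericTransport {
    GenericResponse execute(GenericRequest request) throws IOException;
}

/** In-memory stand-in that echoes the request line instead of calling a cluster. */
final class EchoTransport implements GenericTransport {
    @Override
    public GenericResponse execute(GenericRequest request) {
        String line = request.method + " " + request.path;
        return new GenericResponse(200, line.getBytes(StandardCharsets.UTF_8));
    }
}
```

With a layer like this, the hadoop client would keep building requests from method, path, params, and body as it does today, and only the implementation behind GenericTransport would change.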

Approach 2

Parse the incoming request at the hadoop layer and use the appropriate OpenSearch-Java request and response class to send the request.

Implementation questions:

What's a good design pattern for this?

Example design pattern:

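import java.util.Arrays;

// HttpHost from Apache HttpClient 4 is an assumption; it depends on the RestClient version in use.
import org.apache.http.HttpHost;
import org.opensearch.client.RestClient;
import org.opensearch.client.RestClientBuilder;
import org.opensearch.client.json.jackson.JacksonJsonpMapper;
import org.opensearch.client.opensearch.OpenSearchClient;
import org.opensearch.client.opensearch.core.BulkRequest;
import org.opensearch.client.opensearch.core.BulkResponse;
import org.opensearch.client.opensearch.core.SearchRequest;
import org.opensearch.client.opensearch.core.SearchResponse;
import org.opensearch.client.transport.OpenSearchTransport;
import org.opensearch.client.transport.rest_client.RestClientTransport;
// Request.Method and ByteSequence come from the hadoop client.

public class JavaClientTransport {

    public void executeRequest(Request.Method method, CharSequence uri, CharSequence path, CharSequence params, ByteSequence body, String operationType) throws Exception {
        // "endpoint", 9200 and "protocol" are placeholders for the real host, port and scheme.
        RestClientBuilder builder = RestClient.builder(new HttpHost("endpoint", 9200, "protocol"));
        RestClient restClient = builder.build();

        // Create the client
        OpenSearchTransport transport = new RestClientTransport(restClient, new JacksonJsonpMapper());
        OpenSearchClient client = new OpenSearchClient(transport);
        switch (operationType) {
            case "bulk":
                // Build the request right here; we seem to have all the information needed, but the body
                // still has to be converted into BulkRequest operations before this request is usable.
                BulkRequest bulkRequest = new BulkRequest.Builder().index("index").build();
                BulkResponse bulkResponse = client.bulk(bulkRequest);
                break; // without this, execution falls through into the next case
            case "search":
                SearchRequest searchRequest = new SearchRequest.Builder().index(Arrays.asList("index")).build();
                // A document class is required; Object.class is a placeholder for the real document type.
                SearchResponse<Object> searchResponse = client.search(searchRequest, Object.class);
                break; // without this, execution falls through into the default case and throws
            default:
                throw new Exception("No matching path found");
        }
    }
}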
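
The example above takes an operationType string, but the hadoop client only has the method and path of the request. A minimal sketch of deriving the operation from the path, assuming the endpoint is identified by the last path segment (the class name OperationTypes and the "unknown" fallback are hypothetical, and only the two operations from the example are covered):

```java
/** Hypothetical helper that maps a REST path to the operation names used in the switch above. */
final class OperationTypes {
    private OperationTypes() {}

    /** Returns "bulk", "search", or "unknown" based on the last path segment. */
    static String fromPath(CharSequence path) {
        String p = path.toString();

        // Drop a query string if the caller passed one along with the path.
        int query = p.indexOf('?');
        if (query >= 0) {
            p = p.substring(0, query);
        }

        // Drop trailing slashes so "/index/_bulk/" is treated like "/index/_bulk".
        while (p.endsWith("/")) {
            p = p.substring(0, p.length() - 1);
        }

        // lastIndexOf returns -1 when there is no slash, so this takes the whole string in that case.
        String last = p.substring(p.lastIndexOf('/') + 1);
        switch (last) {
            case "_bulk":
                return "bulk";
            case "_search":
                return "search";
            default:
                return "unknown";
        }
    }
}
```

A real implementation would need a much larger table and would also have to consider the HTTP method, which is part of why the typed approach is more work at the hadoop layer.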

How would we convert the ByteSequence that hadoop uses for the body into either JSON or a body class?

Potential example to convert a ByteSequence to a String

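// inputStream is assumed to be an InputStream over the bytes of the ByteSequence.
ByteArrayOutputStream result = new ByteArrayOutputStream();
byte[] buffer = new byte[1024];
// Copy the stream into memory in 1 KB chunks until end of stream (-1).
for (int length; (length = inputStream.read(buffer)) != -1; ) {
    result.write(buffer, 0, length);
}
return result.toString("UTF-8");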

Potential example to convert a JSON string into the JsonData that .document() from the Java client accepts.

JsonpMapper mapper = client._transport().jsonpMapper();
JsonParser parser = mapper.jsonProvider().createParser(new StringReader(jsonString));
JsonData data = JsonData.from(parser, mapper);

A bigger question is, what have I missed in the implementation of the Java client and the hadoop client that would require a third approach?

Do you have any additional context?

This is also a feature request in opensearch-project/spring-data-opensearch#19 and can help consolidate the approaches.

harshavamsi · 2023-02-24T17:55:08Z

@wbeckler @VachaShah @nknize @dblock would love any feedback.

nknize · 2023-02-24T22:52:08Z

I'll dig deeper but my initial reaction would be to refactor the java client transport as a core library so we take the dependency on opensearch-core and a new opensearch-transport library instead of a cross plugin dependency.

dblock · 2023-02-28T20:08:05Z

I think opensearch-java needs the ability to make, and expose, pure HTTP requests to avoid being a bottleneck, and all the implementations of the actual strongly typed methods should use those. For this client, taking a dependency on opensearch-java seems like the right call.

reta · 2023-03-07T18:08:42Z

@harshavamsi what is preventing us from using opensearch-java as intended, with typed request and response models? (I think that's what you meant in approach #2.)

harshavamsi · 2023-03-10T18:04:55Z

@harshavamsi what is preventing us from using opensearch-java as intended, with typed request and response models? (I think that's what you meant in approach #2.)

Yes, I wasn't sure whether we should be using typed requests/responses, given that the hadoop client today has no way of determining the types of the API calls being made. Based on the comments in opensearch-project/opensearch-java#377, I think it's fair for both clients to have this feature. This makes it much easier to implement the client here. What were you thinking about doing in opensearch-project/spring-data-opensearch#19? Were you going to pull in the request/response types from opensearch-java?

reta · 2023-03-10T18:23:11Z

What were you thinking about doing in opensearch-project/spring-data-opensearch#19? Were you going to pull in the request/response types from opensearch-java?

Yes, the plan going forward is to recommend opensearch-java as the only official client for communicating with OpenSearch. I think we formalized it here [1].

[1] opensearch-project/OpenSearch#5424

harshavamsi added enhancement New feature or request untriaged labels Feb 24, 2023

harshavamsi mentioned this issue Feb 24, 2023

[FEATURE] Enable Generic HTTP Actions in Java Client opensearch-project/opensearch-java#377

Closed

wbeckler removed the untriaged label Mar 9, 2023

harshavamsi self-assigned this Apr 10, 2023

dblock mentioned this issue Apr 25, 2023

[FEATURE] Combine support for ES, OS 1.x, 2.x, 3.x into a single branch #183

Closed
