Analyze options to use SdkHttpClient implementations #101

Open
1 task
ptirador opened this issue Oct 31, 2020 · 5 comments

Comments

@ptirador
Contributor

ptirador commented Oct 31, 2020

Task Description

The S3Factory class builds new Amazon S3 client instances, and it currently uses the Apache HTTP client.

As noted in this Pull Request discussion, this locks customers into the ApacheHttpClient, which adds a dependency they may not want. We need to provide an option for other SdkHttpClient implementations.

The UrlConnectionHttpClient is a fairly popular choice in Java-based Lambda functions because it has a faster startup time and therefore less impact on cold starts.
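
As a rough illustration of what "providing an option" could look like, here is a minimal sketch that resolves the desired HTTP client from a configuration map using only the JDK. The class name `HttpClientChoice`, the enum `Kind`, and the property key `s3fs.http.client` are all hypothetical and not part of the current code base; the factory would still have to map each value to the matching SDK client builder.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: none of these names exist in the project today.
public class HttpClientChoice {

    /** One entry per HTTP client that the SDK supports out of the box. */
    public enum Kind { APACHE, NETTY, URL_CONNECTION }

    /** Assumed property key, chosen only for this illustration. */
    public static final String PROPERTY = "s3fs.http.client";

    /** Resolves the configured client kind, defaulting to the current behaviour. */
    public static Kind fromProperties(Map<String, ?> props) {
        Object raw = props.get(PROPERTY);
        if (raw == null) {
            // Nothing configured: keep using the Apache client, as the factory does now.
            return Kind.APACHE;
        }
        // Accept values such as "netty", " Netty " or "url-connection".
        String normalized = raw.toString().trim()
                               .toUpperCase(Locale.ROOT)
                               .replace('-', '_');
        try {
            return Kind.valueOf(normalized);
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException(
                "Unknown HTTP client '" + raw + "', expected apache, netty or url-connection", e);
        }
    }
}
```

Defaulting to `APACHE` keeps existing setups unchanged, so users would only opt in to another client explicitly.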

Tasks

The following tasks will need to be carried out:

  • Analyze options to use SdkHttpClient implementations

Task Relationships

This task:

Useful Links

Help

@carlspring carlspring added this to the 1.0.0 milestone Jan 23, 2021
@ptirador
Contributor Author

Pros

Use the built-in HttpURLConnection client to reduce instantiation time

The AWS Java SDK 2.x includes a pluggable HTTP layer that allows customers to switch to different HTTP implementations. Three HTTP clients are supported out-of-the-box:

  • Apache HTTP client
  • Netty HTTP client
  • Java HTTP URL Connection client.

With the default configuration, Apache HTTP client and Netty HTTP client are used for synchronous clients and asynchronous clients respectively. They are powerful HTTP clients with more features. However, they come at the cost of higher instantiation time.

On the other hand, the JDK built-in HttpURLConnection library:

  • Is more lightweight and has lower instantiation time.
  • Is part of the JDK, so using it brings in no external dependencies. This keeps the deployment package small and thus reduces the time it takes for the package to be downloaded and unpacked.

Hence, using UrlConnectionHttpClient is recommended when configuring the SDK client. Note that it only supports synchronous API calls. If you'd like to see support for asynchronous SDK clients with the JDK 11 built-in HTTP client, please upvote this GitHub issue.

Exclude unused SDK HTTP dependencies

The SDK by default includes the Apache HTTP client and Netty HTTP client dependencies. If startup time is important to your application and you do not need both implementations, excluding the unused SDK HTTP dependencies is recommended to minimize the deployment package size. Below is a sample Maven POM file for an application that only uses url-connection-client and excludes netty-nio-client and apache-client.

    <dependencies>
        <dependency>
            <groupId>software.amazon.awssdk</groupId>
            <artifactId>s3</artifactId>
            <exclusions>
                <exclusion>
                    <groupId>software.amazon.awssdk</groupId>
                    <artifactId>netty-nio-client</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>software.amazon.awssdk</groupId>
                    <artifactId>apache-client</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>software.amazon.awssdk</groupId>
            <artifactId>url-connection-client</artifactId>
        </dependency>
    </dependencies>

Cons

Inconveniences of using the built-in HttpURLConnection client

Because the JDK built-in HttpURLConnection client is more lightweight, it offers fewer configuration options. Compared to the Apache HTTP client, for example, you cannot configure:

  • the maximum number of connections, which would be useful in environments where you may want to share a single connection pool among multiple AWS services.
  • an HTTP/HTTPS proxy connection.

FYI @carlspring @steve-todorov

@carlspring
Owner

Hi @ptirador ,

Thanks for your investigation!

What do you mean by "deployment package"?

In my opinion, we need to have support for both synchronous and asynchronous requests. If we need the Apache + Netty dependencies for this, then so be it. There are many other things that you can't do with HttpURLConnection, like setting up connection pools and so on (if I recall correctly).

How much of a difference is there in terms of instantiation time?

And the other question -- are we using async requests for anything right now? What use cases would we have for this?

My only concern is that, at the moment, we claim to support JDK 11 (which is indeed the case), and whatever we decide must not break our JDK 11 support.

Which one do you advise, and what is your personal preference?

@steve-todorov
Collaborator

Thanks @ptirador for raising this issue and doing the initial research!

How did you come to the conclusion that using the built-in HttpURLConnection client is faster?
Did you run a JMH benchmark that backs this statement with data?

Honestly, if I had to pick one of the three options above, I'd go with netty-nio-client and async connections as the default option. In my experience, using Netty with a proper async implementation results in much better throughput and overall performance than a blocking / sync approach. Also, if you're already using Cassandra or something similar, chances are high that you are already using Netty.

If you are up for the task, we can create a JMH benchmark which tests the different implementations so we can make a decision based on the data.

@ptirador
Contributor Author

Hi @carlspring @steve-todorov,

The conclusions that I wrote are based on this article, which talks about these instantiation times but without providing any benchmark example. We can create this JMH benchmark to test them.

In my opinion, I would also go with Netty and async connections, especially because of the overall performance boost that it provides. Also, a few months ago we switched the NIO implementation to use AsynchronousFileChannel instead of FileChannel, so I think it could be the best way to go.
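
Until a proper harness such as JMH is in place, a first rough comparison could be done with a plain JDK timer like the sketch below. `InstantiationTimer` and `averageNanos` are hypothetical names, and the `Supplier` stands in for whatever builds the client under test. Timings obtained this way are indicative only, since they ignore JIT and GC effects that a real benchmark harness controls for.

```java
import java.util.function.Supplier;

// Hypothetical helper for a quick, non-rigorous timing comparison.
public class InstantiationTimer {

    /**
     * Returns the average time in nanoseconds that the factory needs to
     * create one instance, measured over `runs` calls after `warmups`
     * unmeasured calls.
     */
    public static long averageNanos(Supplier<?> factory, int warmups, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        Object sink = null; // keep a reference so the created instance is actually used
        for (int i = 0; i < warmups; i++) {
            sink = factory.get();
        }
        long total = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            sink = factory.get();
            total += System.nanoTime() - start;
        }
        if (sink == null) {
            throw new IllegalStateException("factory returned null");
        }
        return total / runs;
    }
}
```

For the cold-start question specifically, the first creation in a fresh JVM is what matters, so `warmups` would be 0 and `runs` 1, with one JVM launch per client. In JMH, the single-shot mode (`Mode.SingleShotTime`) with forked JVMs covers the same case.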

@carlspring
Owner

Hi @ptirador ,

I believe you and @steve-todorov are right -- we should use Netty, since indeed we did switch to AsynchronousFileChannel, as you've just reminded me.

How much of an effort will this task be?

@carlspring carlspring removed this from the 1.0.0 milestone May 23, 2023