
Large file size (>250 MB) file is uploaded successfully to azure storage but it fails to return success response to any http client connection e.g. apache http server #18700

Closed
harisingh-highq opened this issue Jan 20, 2021 · 11 comments
Labels
Client — This issue points to a problem in the data-plane of the library.
customer-reported — Issues that are reported by GitHub users external to the Azure organization.
question — The issue doesn't require a change to the product in order to be resolved. Most issues start as that.
Storage — Storage Service (Queues, Blobs, Files)

Comments

harisingh-highq commented Jan 20, 2021

We are facing a new issue when uploading large files through an Apache HTTP Server connection.

Apache HTTP Server terminates the connection because its timeout expires.

When we call our file upload API over an HTTPS (Apache HTTP Server) connection URL to upload a large file (approx. >250 MB) to Azure Storage, it throws the error below:

[screenshot: error response]

When I checked the Apache server error logs, I found the following cause:

[Wed Jan 20 20:24:28.731934 2021] [proxy_ajp:error] [pid 24840:tid 16476] (OS 10060)A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. : AH01030: ajp_ilink_receive() can't receive header
[Wed Jan 20 20:24:28.731934 2021] [proxy_ajp:error] [pid 24840:tid 16476] [client 172.16.247.167:56821] AH00992: ajp_read_header: ajp_ilink_receive failed
[Wed Jan 20 20:24:28.731934 2021] [proxy_ajp:error] [pid 24840:tid 16476] (70007)The timeout specified has expired: [client 172.16.247.167:56821] AH00893: dialog to 172.16.247.167:10777 (tr-6gy1y33.hqdev.highq.com) failed

[screenshot: Apache error log]
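
For context, the AJP proxy timeout behind the AH00893/70007 errors can be raised in the Apache configuration. A minimal sketch, assuming mod_proxy_ajp fronts the application server; the path and the 600-second value are illustrative, not our actual config (the host and port are taken from the log above):

```apache
# Raise the backend timeout so long-running upload requests are not cut off
# while the application is still streaming data to Azure Storage.
# 600 seconds is illustrative; tune to the longest expected upload.
ProxyTimeout 600
ProxyPass "/fileupload" "ajp://tr-6gy1y33.hqdev.highq.com:10777/fileupload" timeout=600
```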

When we call our file upload API over a plain HTTP connection URL (without Apache HTTP Server) to upload the same large file (approx. >250 MB) to Azure Storage, it works fine and returns the expected response:

[screenshot: successful response]

Overall issue description:
In both cases, the file is uploaded successfully to Azure Storage, but the call causes problems for any HTTP client (e.g. Apache HTTP Server, Java's HttpURLConnection, etc.), which hits a timeout as shown in the Apache logs above.

As we understand it, the likely cause is as follows:

  • Any HTTP client that calls our file upload API (as Apache HTTP Server does) has a connection timeout, typically around 3 minutes.
  • When such a client opens a connection to our file upload API via our microservice instance, the connection goes idle because of the Azure SDK's blob upload buffering mechanism.
  • We believe the Azure SDK buffers data in its buffered upload implementation to determine whether to upload the data as a single put or as multiple staged blocks. This decision is based on ParallelTransferOptions.maxSingleUploadSize, whose default value is 256 MB.
  • Because of this buffering, the SDK keeps the connection idle, with no response sent, while it decides between a single put and staged blocks, so Apache (or any caller application) hits its timeout. A possible mitigation is sketched below.
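
If this understanding is correct, one mitigation would be to lower the single-upload threshold so the SDK starts staging blocks right away instead of buffering up to 256 MB first. A minimal sketch, assuming the azure-storage-blob 12.x API; `blobClient` is a placeholder for an existing BlobClient, and the sizes are illustrative, not a recommendation:

```java
import com.azure.storage.blob.models.ParallelTransferOptions;
import com.azure.storage.blob.specialized.BlobOutputStream;
import com.azure.storage.blob.specialized.BlockBlobClient;

// Lower the single-put threshold so data is staged as blocks immediately,
// rather than buffered while the SDK decides between a single put and blocks.
ParallelTransferOptions transferOptions = new ParallelTransferOptions()
        .setMaxSingleUploadSizeLong(8L * 1024 * 1024)  // 8 MB threshold (illustrative)
        .setBlockSizeLong(4L * 1024 * 1024)            // 4 MB stage blocks (illustrative)
        .setMaxConcurrency(4);                         // illustrative

BlockBlobClient blockClient = blobClient.getBlockBlobClient(); // blobClient is a placeholder
BlobOutputStream blobOutStream =
        blockClient.getBlobOutputStream(transferOptions, null, null, null, null);
```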

I hope you understand the issue.
Can you please help us to resolve this issue?

@ghost added the needs-triage, customer-reported, and question labels Jan 20, 2021
@harisingh-highq (Author)

Hi @gapra-msft
The code snippet is the same as the one already described in #18002:
```java
BlobClient blobSyncClient =
        AzureHelper.getBlobContainerClient(AzureHelper.getBlobServiceClient(serviceEndpoint, account, key),
                container).getBlobClient(destFile);

MessageDigest md5Digest = null;
try (BlobOutputStream blobOutStream = blobSyncClient.getBlockBlobClient().getBlobOutputStream(true);
        PipedOutputStream pout = new PipedOutputStream();
        PipedInputStream pin = new PipedInputStream(pout, 1024 * 4 * 6);
        BufferedInputStream bin = prepareEncryptStream(encryptionFlag, is, parametersMap);
        ReadableByteChannel channel = Channels.newChannel(bin)) {
    md5Digest = MessageDigest.getInstance("MD5");
    // Read buffer size from application config (value in KB), defaulting to 1 MB.
    int readBufferSize = CacheManagement.getInstance().getApplicationPropertyByParameter(
                    Constants.HYBRID_INTEGRATION_UPLOAD_FILE_BUFFERSIZE) != null
            ? (Integer.parseInt(CacheManagement.getInstance().getApplicationPropertyByParameter(
                    Constants.HYBRID_INTEGRATION_UPLOAD_FILE_BUFFERSIZE).trim()) * 1024)
            : 1048576; // 1 MB default buffer
    java.nio.ByteBuffer byteBuffer = java.nio.ByteBuffer.allocate(readBufferSize);
    int len;
    int pinLen;
    while ((len = channel.read(byteBuffer)) >= 0) {
        md5Digest.update(byteBuffer.array(), 0, len);
        pout.write(byteBuffer.array(), 0, len);
        // Drain the pipe to the blob stream once enough data has accumulated.
        if (pin.available() > (1024 * 4 * 3)) {
            byteBuffer.clear();
            pinLen = pin.read(byteBuffer.array());
            blobOutStream.write(byteBuffer.array(), 0, pinLen);
        }
        byteBuffer.clear();
    }
    pout.close();
    byteBuffer.clear();
    // Flush whatever is left in the pipe.
    while ((pinLen = pin.read(byteBuffer.array())) > 0) {
        blobOutStream.write(byteBuffer.array(), 0, pinLen);
        byteBuffer.clear();
    }
}

// Hex-encode the MD5 digest.
StringBuilder md5Hash = new StringBuilder();
if (md5Digest != null) {
    byte[] plainBytes = md5Digest.digest();
    for (int i = 0; i < plainBytes.length; i++) {
        md5Hash.append(Integer.toString((plainBytes[i] & 0xff) + 0x100, 16).substring(1));
    }
}

blobSyncClient.setHttpHeadersWithResponse(new BlobHttpHeaders().setContentMd5(decodeHex(md5Hash.toString()))
        .setContentType(FILE_CONTENT_TYPE), null, null, NONE);
BlobProperties properties = blobSyncClient.getProperties();
responseMap.put(HybridConstants.PARAM_FILE_SIZE, String.valueOf(properties.getBlobSize()));

String md5Value = RepositoryHelper.getInstance().convertMD5ByteToHexEncodedString(properties.getContentMd5());
```

Thank you

@harisingh-highq changed the title from "Large file size (>250 MB) file upload on azure storage fails with any http client connection e.g. apache http server" to "Large file size (>250 MB) file is uploaded successfully to azure storage but it fails to return success response to any http client connection e.g. apache http server" Jan 20, 2021
@gapra-msft (Member)

Hi @harisingh-highq, thank you for posting this issue.

Could you please clarify what issue you are seeing with the SDK? From what I read, it looks like the upload jobs succeed.

@Dhaval8951

Hello @gapra-msft
@harisingh-highq and I are working on the same project.

The use case where we face the issue is large file uploads (500 MB, 1 GB, 2 GB) that must complete within a time bound of 120 seconds.
For large file uploads we also need to calculate the MD5, which we do on our end with the code in the previous comment; hence we are using BlobOutputStream.

As suggested in #18425, we are using ParallelTransferOptions.

In this case we are setting the ParallelTransferOptions attributes as follows (mapped onto the API in the sketch after this list):
MaxSingleUploadSizeLong: 32 MB
BlockSize: 4 MB
MaxConcurrency = fileSize(MB)/8 (64 for a 512 MB file, 128 for 1 GB)
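
A minimal sketch of how these values map onto the fluent setters (assuming the azure-storage-blob 12.9.0 API; `fileSizeMb` is a placeholder for the known size of the upload):

```java
import com.azure.storage.blob.models.ParallelTransferOptions;

long fileSizeMb = 512; // placeholder: size of the file being uploaded, in MB

ParallelTransferOptions transferOptions = new ParallelTransferOptions()
        .setMaxSingleUploadSizeLong(32L * 1024 * 1024)  // 32 MB single-put threshold
        .setBlockSizeLong(4L * 1024 * 1024)             // 4 MB stage-block size
        .setMaxConcurrency((int) (fileSizeMb / 8));     // 64 for 512 MB, 128 for 1 GB
```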

With the above values, a 500 MB file works when the transfer runs smoothly (i.e. no connection errors); the response from Azure is received within 120 seconds.

But with a 1 GB file this does not seem to work: the response is not received in time.
With a 1 GB file the concurrency is 128, and I can see lots of connection errors; blocks are then retried, which consumes more time.

  1. Could you please suggest the best use of the above ParallelTransferOptions attributes so that the large-file use case works smoothly and in a timely manner?
  2. Could you also explain the time gap between the point where all 512 MB have been transferred (as visible via the ProgressReceiver of ParallelTransferOptions) and the response from Azure, which can be minutes rather than seconds? Is Azure doing some buffering on the server end, or something else?
  3. If we could resolve the connection errors below during block upload, the operation would run smoothly (a possible mitigation is sketched after the logs).

[WARN ] reactor.netty.http.client.HttpClientConnect [id: 0xc0c9e63c, L:/192.168.1.234:60956 - R:hybridblobtest.blob.core.windows.net/100.64.1.2:443] The connection observed an error
java.util.concurrent.TimeoutException: Channel response timed out after 60000 milliseconds.
    at com.azure.core.http.netty.implementation.ResponseTimeoutHandler.responseTimedOut(ResponseTimeoutHandler.java:54)
    at com.azure.core.http.netty.implementation.ResponseTimeoutHandler.lambda$handlerAdded$0(ResponseTimeoutHandler.java:40)

And this one as well:

reactor.netty.http.client.HttpClientConnect [id: 0x0b4d0fd0, L:0.0.0.0/0.0.0.0:61589] The connection observed an error
reactor.netty.http.client.PrematureCloseException: Connection has been closed BEFORE response, while sending request body
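
For reference, a minimal sketch of raising the 60-second channel response timeout seen in the first log, assuming a version of azure-core-http-netty whose NettyAsyncHttpClientBuilder exposes responseTimeout; the endpoint/credential wiring mirrors the earlier snippet and is illustrative:

```java
import com.azure.core.http.HttpClient;
import com.azure.core.http.netty.NettyAsyncHttpClientBuilder;
import com.azure.storage.blob.BlobServiceClient;
import com.azure.storage.blob.BlobServiceClientBuilder;
import com.azure.storage.common.StorageSharedKeyCredential;

import java.time.Duration;

// Allow slow stage-block responses more than the default 60s before the
// channel is considered timed out.
HttpClient httpClient = new NettyAsyncHttpClientBuilder()
        .responseTimeout(Duration.ofMinutes(5)) // illustrative value
        .build();

BlobServiceClient serviceClient = new BlobServiceClientBuilder()
        .endpoint(serviceEndpoint)                                 // as in the earlier snippet
        .credential(new StorageSharedKeyCredential(account, key))  // as in the earlier snippet
        .httpClient(httpClient)
        .buildClient();
```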

Thank you
Dhavalkumar Chauhan

@gapra-msft (Member)

Hi @Dhaval8951

Thank you for clarifying the issue. I've answered your questions below.

Could you please suggest the best use of the above ParallelTransferOptions attributes so that the large-file use case works smoothly and in a timely manner?

For this one, I'll refer you to @rickle-msft's response on this thread.
"Performance tuning is a difficult question that is highly dependent on your environment. The resources on your machine, the traffic on your network, and a variety of other factors all play a role. We cannot therefore give universally applicable guidance and can only offer general suggestions and tips, else we would have simply picked the most performant values for you. The bulk of the work will be for you to assess your own setup and implement some performance/load tests to experiment with different configurations as sometimes the difference will only be seen at scale and load. We can, however, answer specific questions about how certain options will affect the behavior and request pattern of the sdk that you can use to inform your testing. If you have any such specific questions, please do let us know."

Could you also explain the time gap between the point where all 512 MB have been transferred (as visible via the ProgressReceiver of ParallelTransferOptions) and the response from Azure? Is Azure doing some buffering on the server end, or something else?

Yes. We send information to the progress receiver every time a stage-block request for a chunk of your data is sent (not when the response is received from the service), so there can be some delay between when the ProgressReceiver gets updated and when the upload actually completes. Furthermore, after all the stageBlock requests are sent, a final commitBlockList request must be made to commit the data. Once all of these requests have completed, the upload method returns its final response.
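
To illustrate the reporting behavior described above, a minimal sketch of attaching a ProgressReceiver (assuming the azure-storage-blob 12.9.0 API, where it is a functional interface receiving the cumulative byte count):

```java
import com.azure.storage.blob.ProgressReceiver;
import com.azure.storage.blob.models.ParallelTransferOptions;

// Fires as stage-block requests are *sent*, not when the service has
// acknowledged them, so it can run ahead of actual upload completion.
ProgressReceiver progress = bytesTransferred ->
        System.out.printf("Sent %d bytes so far%n", bytesTransferred);

ParallelTransferOptions transferOptions = new ParallelTransferOptions()
        .setProgressReceiver(progress);
```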

If we could resolve the connection errors below during block upload, the operation would run smoothly.

Could you let me know what version of azure-storage-blob and azure-core you are using?

@Dhaval8951

@gapra-msft
Thank you for your quick response.

Could you let me know what version of azure-storage-blob and azure-core you are using?

We are using azure-storage-blob 12.9.0 and its compile-time dependency azure-core 1.10.0.

harisingh-highq (Author) commented Jan 22, 2021

@gapra-msft
Thank you for your continuous support on this thread.

But regardless of our network traffic and other factors, I can say that large file uploads stop working beyond a certain size.

FYI, we also support other file storage protocols in our product. We are able to upload large files (500 MB, 1 GB, 3 GB, etc.) with the storage protocols below, which we use in the same product, in the same environment (infrastructure, network, speed, etc.), at the same time:

  • SMBJ library (SMB2 protocol)

  • AWS S3 Cloudian storage

  • We have also tried upgrading the azure-storage-blob library to 12.10.0; it still does not work.

  • We have also tried removing the blobSyncClient.getBlockBlobClient().getBlobOutputStream code and replacing it
    with a simple blob sync client upload method call, as in the screenshot below, but that also does not work for files >300 MB.

[screenshot: simple blob sync client upload call]
As I checked in the Azure logs, in this case the SDK also chunks the file data into blocks and issues block requests automatically when the file size is >256 MB.
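
The screenshot is not reproduced here; the replacement call looked roughly like the following sketch (assuming the BlobParallelUploadOptions API available since azure-storage-blob 12.8.0; `bin` and `fileLength` are placeholders for the input stream and its length, and the sizes and timeout are illustrative):

```java
import com.azure.core.util.Context;
import com.azure.storage.blob.models.ParallelTransferOptions;
import com.azure.storage.blob.options.BlobParallelUploadOptions;

import java.time.Duration;

// Single call replacing the BlobOutputStream pipeline; the SDK still splits
// the stream into stage blocks above the single-upload threshold.
blobSyncClient.uploadWithResponse(
        new BlobParallelUploadOptions(bin, fileLength)
                .setParallelTransferOptions(new ParallelTransferOptions()
                        .setMaxSingleUploadSizeLong(32L * 1024 * 1024)
                        .setBlockSizeLong(4L * 1024 * 1024)),
        Duration.ofMinutes(10), // illustrative timeout
        Context.NONE);
```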

### Please note that with the same infrastructure and environment, on the same network and at the same time, we are able to upload large files with the other storage protocols (SMB2, AWS S3 Cloudian, etc.) via the same product, yet it does not work with Azure Blob Storage.
In our investigation, we found that in the other storage libraries, reading the file stream and writing to storage happen in parallel,
but in the Azure Blob Storage library, it first takes time to determine the block size based on the stream received and only then calls the service for the actual write/upload. It looks like there is some discrepancy in the stream caching mechanism, or in the block size calculation based on the stream data, that causes this delay.

Also, with the configuration below suggested by @Dhaval8951, it works in his environment up to 1 GB, but in my environment it does not work even for 500 MB.

MaxSingleUploadSizeLong: 32 MB
BlockSize: 4 MB
MaxConcurrency = fileSize(MB)/8 (64 for a 512 MB file, 128 for 1 GB)

### So my question is: how can we fix this when a given configuration may work for one user but not for another? Such a configuration may also only work up to a certain file size, after which it hits the size limitation again.

<Ashish: Edited out>

I request that you schedule a Microsoft Teams/Zoom call with us, if you don't mind.
On the call we will show you the scenario, and we may come to a conclusion.

It would be a great help to us. We look forward to hearing from you.
Thank you

@amishra-dev

@harisingh-highq
Please be respectful; sending messages in all caps is not acceptable.

@amishra-dev

@Dhaval8951 since there are many moving parts here, and it's not immediately clear that it's a Java SDK issue, can you reach out to Microsoft Support? I am wondering if there is some Apache server setting or something in your environment that they can help you with.

@harisingh-highq (Author)

@gapra-msft @amishra-dev
Sorry for the inconvenience caused to you by my comment.

We are looking for any hint or direction from you experts to resolve this issue.
Thank you

@joshfree added the Client and Storage labels Jan 26, 2021
@ghost removed the needs-triage label Jan 26, 2021
@rickle-msft (Contributor)

@harisingh-highq have you guys been able to reach out to Microsoft Support for some more direction? If so, I will close this issue if there is a case open over there. I believe the reason we suggested reaching out to support is because it looks like the scope of your issue is beyond what we are going to be able to assist with.

@gapra-msft (Member)

Closing due to inactivity. @harisingh-highq please feel free to reopen the issue if you are still hitting the problem.

@github-actions bot locked and limited conversation to collaborators Apr 12, 2023