Sending a long topic message under linux produces timed out exceptions #374

Closed
pizerg opened this issue Jun 28, 2013 · 4 comments

Comments

@pizerg

pizerg commented Jun 28, 2013

Dev : 5
Test : 3
PM :

When sending a BrokeredMessage with a long body to a topic under Linux (for example, a Map<String, String> with many entries; 100 or fewer are usually enough), the send fails with read timed out exceptions because the REST API service does not respond.

@xuezhai
Contributor

xuezhai commented Jun 28, 2013

@gcheng

gcheng commented Jun 28, 2013

I am wondering whether this is Linux-specific or platform-independent.

@gcheng

gcheng commented Jul 17, 2013

I cannot reproduce it on Windows; everything works fine there.

@xuezhai
Contributor

xuezhai commented Aug 2, 2013

unit test code reviewed

@xuezhai xuezhai closed this as completed Aug 2, 2013
conniey added a commit that referenced this issue Mar 10, 2019
* Made some classes/fields private

* Block blob tests

* Changing method of comparing flowable data to buffer list

* Only fromFile and fromBuffer tests remaining

* Finished up transfer manager tests

* Resolved some CR comments

* CR feedback

* Updated some project infrastructure. Fixed some failing tests. Three tests still failing, but they are not critical

* Added some missing licenses to tests

* Started writing retry policy

* Implemented first test scenario. Not tested.

* Fixed up some container tests that weren't actually validating properly. Got the first retry test working.

* Added test for max retries and made fixes that now correctly capture errors

* Got a bit more of the delay test working

* All retry tests passing

* Some cleanup

* Minor fixes

* CR feedback

* Moved some files around. Made getters on RetryOptions package private

* Removed an extra file left over from a merge

* Update README.md

Adding direct link to the API Reference docs for the Preview release and quickstart

* Tried making some swagger changes

* Another iteration of swagger generation

* Everything broken

* Generated swagger with proper (de)serialization

* Added support for setBlobTier. No tests or samples

* Tests for blob tier

* CR feedback

* Make AzureStorageCheckpointLeaseManager sync internally to get around potential deadlock when checkpointing (#349)

* Doc for append blob create

* Updated file reference

* Added apiNote

* Fixed sample ref path again

* Addressing the repo root as ..

* Started on ref samples. Maybe a third of methods done.

* Moved the sample tags to the bottom

* added a few more samples

* More samples

* More samples

* More samples

* PageBlob Samples

* Added lease samples

* api ref samples complete

* soft delete support. no tests or samples yet

* Changelog

* Working on tests

* Soft delete tests

* Changelog and BreakingChanges

* CR feedback

* Added api ref samples for tiers and soft delete

* Fixed swagger README

* Update EPH dependencies. Update EPH version for release. (#353)

* Fixed an api note to generate docs

* Appveyor is running EPH tests that should not run (#356)

* Appveyor is running EPH tests that should not run.
The tests only run if certain environment variables are set.
I don't know how or why they are set. Force Appveyor to unset them.

* Maven 3.5.2 doesn't exist on apache servers anymore, update to 3.5.4.

* PutBlockFromURL

* Improved ContainerURL tests

* Updated api ref samples

* Cleaned up some types in tests that were causing failures

* Appveyor detection fix (#358)

EPH test cases which need actual Azure entities look in environment variables for the connection strings and skip if not found. Those environment variables shouldn't be set in Appveyor but the test cases were trying to run anyway, and failing. Syntax for appveyor.yml allows for setting environment variables but it isn't obvious how to delete them, so change code to also check APPVEYOR environment variable.
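The gating rule described above (skip live tests when connection strings are missing, and always skip on Appveyor) can be sketched as a small check. This is a minimal sketch, assuming hypothetical variable names `EVENT_HUB_CONNECTION_STRING` and `STORAGE_CONNECTION_STRING` and a hypothetical `LiveTestGate` helper; the actual EPH test code may look different. Taking the environment as a `Map` keeps the check testable; a real caller would pass `System.getenv()`.

```java
import java.util.Map;

// Hypothetical helper illustrating the Appveyor detection fix.
// The connection-string variable names are assumptions, not the names EPH uses.
final class LiveTestGate {
    static final String[] REQUIRED_VARS = {"EVENT_HUB_CONNECTION_STRING", "STORAGE_CONNECTION_STRING"};

    private LiveTestGate() { }

    /** Returns true only when every connection string is present and we are not running on Appveyor. */
    static boolean shouldRunLiveTests(Map<String, String> env) {
        // Appveyor defines APPVEYOR in its build environment, so its presence alone
        // disables live tests, regardless of any connection-string variables left set.
        if (env.containsKey("APPVEYOR")) {
            return false;
        }
        for (String name : REQUIRED_VARS) {
            String value = env.get(name);
            if (value == null || value.isEmpty()) {
                return false; // skip rather than fail when no real Azure entity is configured
            }
        }
        return true;
    }
}
```

Checking the CI marker first means a stale or unremovable connection-string variable in `appveyor.yml` can no longer force the tests to run.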

* Fixing API ref links

* 10.0.2-Preview release prep

* Updated dependency and javadoc configuration

* Generated with new autorest version to comply with new runtime

* Updated dependency issue. Releasing 10.0.3-Preview to fix

* EPH 2.0.1 release -- fix deadlocks (#357)

Due to a threading change in client 1.0.2, EPH 2.0.0 has the potential to deadlock due to thread starvation under a particular set of conditions:

* The Event Hub has more partitions than the threadpool has threads
* AND the application is using the default Azure Storage-based checkpoint manager
* AND the IEventProcessor.onEvents implementation checkpoints
* AND a large number of partitions on the same host attempt to checkpoint at almost the same time.

The problem: The threading change in client 1.0.2 means that IEventProcessor.onEvents is executing in the same threadpool where the checkpoint manager runs. To ensure ordering of checkpoints, it is important that the onEvents implementation waits for the checkpoint call to return, which means a thread in the threadpool is blocked. If there are too many checkpoint calls in rapid succession, it is possible for all threads in the threadpool to be blocked, leaving no threads available to perform the checkpointing.

The fix: This condition occurred because the Azure Storage-based checkpoint manager was attempting to be asynchronous internally, even though the API exposed by Azure Storage is synchronous. Thus, writing one checkpoint required two threads: one running onEvents and blocking, and another to run the task that calls Azure Storage. Since the Azure Storage API is synchronous, this is wasteful at best. Refactored the default checkpoint manager to be mostly synchronous and return pre-completed CompletableFutures. This means that onEvents and the Storage calls run on the same thread. That thread still ends up blocked waiting for Storage, but even if every thread in our threadpool ends up blocked that way, the Storage client has its own separate threadpool internally and can always make progress. The Storage calls will always return (successfully or not) and unblock our threads.
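The fix can be illustrated with a small sketch: a checkpoint manager that performs the blocking storage write on the calling thread and returns an already-completed `CompletableFuture`, so no second pool thread is tied up per checkpoint. `BlockingStore` and `SyncCheckpointManager` are hypothetical stand-ins, not the EPH or Azure Storage API.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-in for a synchronous storage client.
interface BlockingStore {
    void write(String partitionId, String offset) throws Exception;
}

// Sketch of a checkpoint manager that stays synchronous internally.
final class SyncCheckpointManager {
    private final BlockingStore store;

    SyncCheckpointManager(BlockingStore store) {
        this.store = store;
    }

    CompletableFuture<Void> updateCheckpoint(String partitionId, String offset) {
        try {
            // Runs on the caller's thread (e.g. the one executing onEvents),
            // instead of handing the blocking call to another pool thread.
            store.write(partitionId, offset);
            return CompletableFuture.completedFuture(null);
        } catch (Exception e) {
            // Report failure through the future, as an async caller would expect.
            CompletableFuture<Void> failed = new CompletableFuture<>();
            failed.completeExceptionally(e);
            return failed;
        }
    }
}
```

Callers can keep their `CompletableFuture`-based code unchanged; the only difference is that the future is already done when it is returned.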

* Distinguish between nonexistent entity and other errors which return amqp:not-found (#354)

* Update how to add properties (#361)

* Websockets support (#362)

1. Supports max_frame_size of 4k (current limitation of the [qpid-proton-j-extensions](https://github.com/Azure/qpid-proton-j-extensions) library)

2. Does not yet support a proxy over WebSockets; this will follow.

AMQP over WebSockets is particularly useful when enterprise policies (e.g., firewall outbound port rules) don't allow traffic on the default AMQP secure port (`5671`).
To send or receive over WebSockets, which use port `443`, set the `TransportType` on `ConnectionStringBuilder` like this:

```java
connectionStringBuilder.setTransportType(TransportType.AmqpWebSockets);
```

related: #264

* Update Overview.md (#363)

* Fix params

* Added worm support

* getAccountInfo apis added

* Improved append blob tests

* Static websites

* Updated blob tests

* Most page blob tests

* Retry fixes

* Fix Overview files & update proton library version (#365)

* update API in the overview docs to work with latest version.
* update proton version.

* Service SAS Signature Values correctly handles nullable fields

* Page blob tests updates

* Fixing links for API reference

* Got some boilerplate going

* Got some preliminary tests written and passing

* Added some tests

* downloadToBuffer complete

* downloadBlobToFile written

* Nothing really

* Basic test passing

* downloadToFileTests complete

* Moved sample tag

* added a cast to prevent a potential overflow

* Added etag locking

* Made the options properties final as possible

* CR feedback

* Nothing special

* Fixed issue in etag locking that messed up access conditions

* Removed memory mapping from TransferManager

* CR feedback

* ServiceSAS tests

* SAS tests complete

* Updated links for api refs

* Logging tests

* Updated changelog

* Added a test to ensure uploadFromFile is replayable

* Last minute cleanup before release

* Fixed some transient failures in incremental copy tests

* RequestResponseChannel should not throw OperationCancelled. (#372)

* Implement retry mechanism for Receiver and Sender creation (#373)

* msgreceiver creation - retry on transient errors
* messagesender creation retry
* always throw TimeoutException on transient failures
* throw schedulererrors to usercode
* verify retry invocation in unittests
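The retry behavior listed above (retry link creation on transient errors, surface a `TimeoutException` when retries are exhausted) can be sketched generically. This is a minimal sketch under stated assumptions: `CreationRetry` and its parameters are hypothetical, the transient-error test is supplied by the caller, and the back-off delay a real `RetryPolicy` would apply between attempts is omitted.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeoutException;
import java.util.function.Predicate;

// Hypothetical helper; not the Event Hubs RetryPolicy API.
final class CreationRetry {
    private CreationRetry() { }

    static <T> T createWithRetry(Callable<T> create, Predicate<Exception> isTransient, int maxRetries)
            throws Exception {
        Exception last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return create.call(); // e.g. open a sender or receiver link
            } catch (Exception e) {
                if (!isTransient.test(e)) {
                    throw e; // non-transient errors go straight to user code
                }
                last = e; // transient: try again (a real policy would back off here)
            }
        }
        // Transient failures that outlast the retry budget surface as TimeoutException.
        TimeoutException timeout = new TimeoutException(
                "Creation did not succeed after " + (maxRetries + 1) + " attempts");
        timeout.initCause(last);
        throw timeout;
    }
}
```

Keeping the original transient exception as the cause preserves the diagnostic detail while giving callers a single exception type to handle.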

* update version for releasing version 1.1.0 (#374)

* Release commit for package com.microsoft.azure.eventhubs version 1.1.0  (#375)

* websocket support (#362)
* fix an issue with ExceptionContract - when request-response channel closes with transient error (#372)
* include PartitionReceiver and PartitionSender creation to participate in RetryPolicy (#373)

* Interfaces for helper types updated

* Added reliable download functionality to DownloadResponse

* Most tests converted to downloadResponse

* Removed RetryReader

* Cleanup

* Reverted to retrying on IOException

* Small doc update

* Added context class

* Switched to autorest generated headers and access conditions

* Fixed tests

* Extracted TransferManager inner classes to top level

* Added hints to user that Flowables for upload must be replayable

* Incorporated new autorest exception type

* Updated some tests

* Changelog breakingChanges

* Generated with context feature

* Added context parameter. need to add tests

* Started context tests

* Infra for context tests

* Halfway through context tests

* Added tests for context

* Changelog breakingChanges

* Deleted some old files

* Fixed a broken test

* Release cleanup

* Added min overloads

* WebSockets Proxy support (#378)

* followup: websockets proxy support (#381)

* Cleaned up and reformatted some docs immediately post-GA. Trying to establish a consistent style. Other misc cleanup

* Added some test infra and logging to help debug errors in CI

* Added package info for blob package for javadoc description

* Fixed up api ref samples link and CI

* Added deep sync copy support

* Added support for progress reporting

* Fix failing unit tests (#381)

Fix some unit tests to no longer depend on specific test environment.

* Fixed some overflow bugs in TransferManager and reliable download setup in BlobURL

* CR feedback and updated Contributing doc

* Added issue template

* Added min overloads

* Cleaned up and reformatted some docs immediately post-GA. Trying to establish a consistent style. Other misc cleanup

* Added some test infra and logging to help debug errors in CI

* Added package info for blob package for javadoc description

* Fixed up api ref samples link and CI

* Added deep sync copy support

* Fixed some overflow bugs in TransferManager and reliable download setup in BlobURL

* Added support for progress reporting

* CR feedback and updated Contributing doc

* Release prep

* Fixed a de-dup of Delivery event issue on proton receive link (#391)

* Updated LoggingFactory to also log request HTTP method, URL, and headers (#394)

* Updated LoggingFactory to also log request HTTP method, URL, and headers

* Updated README to include table

* Update README.md

* Update README.md

* Update version numbers for EPH 2.0.2 release (#394)

* Fixed a flaky block blob test

* Added policy to set request on response object

* Added custom deserializer to fix bug that would sometimes return incomplete listing results

* Added support for slf4j. Added default logging

* 10.3.0 release prep

* Update ChangeLog.txt

* Update ChangeLog.txt

* EPH PartitionPump fixes (#401)

* Handle tmpdir with no trailing separator.

* Fix IO pipe stuck issue due to aggressive timer scheduling (#402)

There is an issue in the logic that schedules the operation timeout timer as part of a receive call. The timer is supposed to be set only once, when there is no pending receive call, but because of the bug it can be scheduled multiple times. Over time, as the timer keeps getting scheduled during receive API calls and scheduled again by the reactor thread while the event is fired and handled, the number of pending IO operations on the pipe grows significantly. This can leave the IO pipe stuck and block a write operation on the pipe, which in turn blocks the receive API. Two changes address the issue: a) the operation timer is scheduled at most twice, to avoid excessive IO operations on the pipe, and b) all bytes are read from the channel when it is signaled, so that no bytes remain in the channel.
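Change b) above, reading everything available when the channel is signaled, can be sketched with `java.nio.channels.Pipe`. This is a minimal sketch, assuming a pipe whose source end may be switched to non-blocking mode; `PipeDrain` is a hypothetical name and does not reproduce the actual reactor code.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;

// Hypothetical helper: drain all currently available bytes from a pipe's source end.
final class PipeDrain {
    private PipeDrain() { }

    /** Reads until the channel has no more data right now; returns the number of bytes drained. */
    static int drainAll(Pipe.SourceChannel source) throws IOException {
        source.configureBlocking(false); // read() returns 0 instead of blocking when empty
        ByteBuffer buffer = ByteBuffer.allocate(64);
        int total = 0;
        int read;
        // Loop instead of a single read, so no leftover bytes stay in the pipe
        // after the wake-up signal has been handled.
        while ((read = source.read(buffer)) > 0) {
            total += read;
            buffer.clear();
        }
        return total;
    }
}
```

A single `read` with a small buffer would leave the remaining bytes queued, which is the accumulation the commit message describes.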

* Prep for releasing client 1.3.0 and EPH 2.1.0 (#403)

* Release client 2.0.0 and EPH 2.2.0 (#416)

* Update Apache Proton-J dependency (0.29.0 --> 0.31.0) (#407)

* PartitionReceiver - add a method that provides an EventPosition which corresponds to an EventData returned last by the receiver (#408)

* Support IsPartitionEmpty property for PartitionRuntimeInformation (#399)

* Move setPrefetchCount API to the ReceiverOptions class from the PartitionReceiver and update the settings of Default & Max Prefetch count (#410)

This pull request includes two major changes related to Prefetch API.

1) Move setPrefetchCount API to the ReceiverOptions class so that the prefetch value specified by a user is used, instead of the default value, when communicating with the service during link open and receiver initialization. This change also addresses the receiver stuck issue caused by a race condition in setPrefetchCount.

2) Change the default value and set the upper bound of the prefetch count. Note that prefetch count should be greater than or equal to maxEventCount which can be set when either a) calling receive() API or b) implementing the getMaxEventCount API of the SessionReceiverHandler interface.
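The constraint in 2) can be expressed as a simple validation step. This is a minimal sketch: `PrefetchValidation` is hypothetical, and because the actual default and maximum prefetch counts are not stated here, the upper bound is passed in as a parameter rather than hard-coded.

```java
// Hypothetical validation mirroring the rule: maxEventCount <= prefetchCount <= upper bound.
final class PrefetchValidation {
    private PrefetchValidation() { }

    static int validatePrefetchCount(int prefetchCount, int maxEventCount, int maxPrefetchCount) {
        if (prefetchCount < maxEventCount) {
            // A smaller prefetch buffer could never satisfy a single receive of maxEventCount events.
            throw new IllegalArgumentException("prefetchCount (" + prefetchCount
                    + ") must be >= maxEventCount (" + maxEventCount + ")");
        }
        if (prefetchCount > maxPrefetchCount) {
            throw new IllegalArgumentException("prefetchCount (" + prefetchCount
                    + ") must be <= " + maxPrefetchCount);
        }
        return prefetchCount;
    }
}
```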

* Fixes several issues in the reactor related components (#411)

This pull request contains the following changes.

1) Finish pending tasks when recreating the reactor, and make sure pending calls scheduled on the old reactor complete.
2) Fix the session open timeout issue which can result in NPE in proton-J engine.
3) Make session open timeout configurable and use the value of OperationTimeout.
4) Update the message of exceptions and include an entity name in the exception message.
5) API change - use ScheduledExecutorService.
6) Improve tracing.

* Implement comparable on EventData (#395)

* Update receive/send link creation logic and improve tracing (#414)

* Prep for releasing client 2.0.0 and EPH 2.2.0 (#415)

* Removed some lingering dev notes

* Changed test behavior to allow for absence of some creds

* Release prep

* Upload from non replayable flowable

* 10.5.0 release prep

* Standardize indentation and publishing (#2974)

* SpotBug P3 Fixes (#2973)

* Adding suppression for having named static inner classes from anonymous.

* Add suppression for redundant null check.

* Move SIC_INNER_SHOULD_BE_STATIC suppression

* Make anonymous class in CachingKeyResolver a named inner static.

* Include missing impl classes for auto-rest when it generates a variable with known null value

* Suppressing unconfirmed casts because we know the cast is correct but the APIs weren't designed for it

* Bumping up resource-manager parent and eventgrid.v2019_01_01 to version 1.0.0 (#2975)

* Bumping up resource-manager parent pom to 1.0.0

* Bumping up eventgrid.v2019_01_01 to version 1.0.0

* code restructure

* [AutoPR privatedns/resource-manager] Add private dns module name for java codegen (#3009)

* Generated from e92e525831933bedf2242fe607984757387be1b2

Add private dns module name for java codegen

* Update pom for private dns

* change private dns version back to beta

* SpotBug P3 fixes (#2976)

* Suppress using Nullable parameter as non-null for KeyVaultKey because it was a scenario in original SDK not thought of.

* Suppress return empty array rather than null

* Add null check for service to ensure it has been initialized

* Checking for null service value before invoking method

* Catching specific exceptions rather than Exception e

* Use 8L so that the integer multiplication is not implicitly cast to long.

* Add exclude for catch Exception in CachingKeyResolver

* Catching specific exceptions rather than Exception e
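The "Use 8L" fix above addresses a classic Java pitfall, which SpotBugs reports as ICAST_INTEGER_MULTIPLY_CAST_TO_LONG: an `int * int` product is computed in 32-bit arithmetic and only widened to `long` afterwards, so it can wrap. A self-contained illustration (the method names are hypothetical):

```java
// Illustrates why a long literal operand matters in the multiplication.
final class OverflowDemo {
    private OverflowDemo() { }

    // int * int is evaluated as int, then widened: the overflow has already happened.
    static long bitsWrong(int bytes) {
        return bytes * 8;
    }

    // 8L promotes the multiplication itself to 64-bit arithmetic, so no wrap occurs.
    static long bitsRight(int bytes) {
        return bytes * 8L;
    }
}
```

For `Integer.MAX_VALUE`, `bitsWrong` returns `-8` while `bitsRight` returns `17179869176`.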

* Update YAML formatting (#3033)

- Update whitespace to match Prettier formatting
- Add .prettierrc.yml
- Part of Azure/azure-sdk#225

* adding impression pixel (#3024)

license/cla override. Check isn't coming back for some reason.

* Add .editorconfig and .gitattributes (#3036)

* Adding .editorconfig

* Add .gitattributes

* Reformat pom.client.xml

* Reformat keyvault/data-plane/pom.xml

* Reformat batch/data-plane/pom.xml

* Enable Checkstyle fail on error / violation for Key Vault (#2969)

* Now that KeyVault data-plane has no more checkstyle errors or violations, we are now enforcing build failure for this project and all child modules.

* Install build tools locally so that checkstyle can run.

* Fix whitespace

* It is not necessary to run checkstyle:check as the build will fail satisfactorily under the analyze job.

* Record playback integration (#2870)

Record/Playback Integration in Azure Batch Java SDK tests. 
Added Test recordings to be used in playback mode.

* Normalising lines (#3050)

* Fix for issue #2874. Add a cache for the tokens in case we are getting tokens from IMDS. (#2875)
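A token cache of the kind described in the #2875 entry can be sketched as a per-resource map with an expiry check. This is a minimal sketch under stated assumptions: `TokenCache`, the `fetcher` function (standing in for the IMDS call), the injectable clock, and the refresh skew are all hypothetical and not taken from the SDK. Two threads may occasionally both refetch an expired token; that is tolerated here for simplicity.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical cache that avoids calling the token endpoint on every request.
final class TokenCache {
    static final class Token {
        final String value;
        final long expiresAtMillis;

        Token(String value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Token> cache = new ConcurrentHashMap<>();
    private final Function<String, Token> fetcher; // stands in for the IMDS request
    private final LongSupplier clock;              // injectable for testing
    private final long refreshSkewMillis;          // refresh slightly before expiry

    TokenCache(Function<String, Token> fetcher, LongSupplier clock, long refreshSkewMillis) {
        this.fetcher = fetcher;
        this.clock = clock;
        this.refreshSkewMillis = refreshSkewMillis;
    }

    String getToken(String resource) {
        Token token = cache.get(resource);
        long now = clock.getAsLong();
        if (token == null || now >= token.expiresAtMillis - refreshSkewMillis) {
            token = fetcher.apply(resource); // only hit the endpoint when missing or near expiry
            cache.put(resource, token);
        }
        return token.value;
    }
}
```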

* Run dependency checker during Analyze (#3075)

* Run dependency checker during Analyze

* Add docs links
@github-actions github-actions bot locked and limited conversation to collaborators Apr 13, 2023

3 participants