-
Notifications
You must be signed in to change notification settings - Fork 582
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Coroutines — Large NOTIFY gets unbearably slow #651
Comments
Hi, |
Yes, I've done some extensive investigation. Initially I thought it was my stupid way of handling the received data (I was recreating the ByteArray by calling GC is also going crazy during these events, even without storing the received ByteArray, at the end running around every 200-300ms and freeing up megabytes of data. I have a feeling there's some background caching going on that causes this issue. |
There is nothing like this with notifications that I know of. Have you checked the connection parameters? What are they set to? How often and how big notifications are? |
Yes I did. PDU is phone-dependent, MTU is always 250. Connection params on connection:
First update (immediately after connection):
After a transfer:
After each transfer (when the unsubscribe from the NOTIFY characteristic happens), the above connection update happens. Based on this, the message interval should be 30ms, which roughly correlates to my previous observation of roughly 10-20ms between messages (which technically was rather around 18-28 after doing some proper math, instead of rough estimate). However a longer transfer still increases the interval from this 30ms to around 150-200ms by the time 4MB is transferred, and even higher if a transfer is ongoing. Connection updates are not present during this. |
Have you monitored the app? What is getting garbage collected — what not? Is this download the only thing that is getting the notifications? |
Yes, I'm monitoring the app - and the RAM usage goes up the moment I start the transfer, and keeps going up. GC calls are mostly explicit, but it does not state the location. There are also some background calls, but they're like 1 out of 100-120 explicit calls. And yes, this is the only flow that receives the notifications - a single instance of the RxBleConnection handler, and a single instance of this class. Did a quick test, just measuring the intervals between received messages, and the intervals clearly increase - we begin at around 10-15ms, and by the time I receive around 10k notifications, it grows to 60ms between messages. Around 20k messages, average interval grows to around 120ms, and it skyrockets higher and higher. Tested the device with other solutions (nRF Connect), and it produced even lower intervals between messages (around 3-6ms!) continuously throughout the transfer, so it's definitely not the device, or the underlying core BLE bits. |
This may be true — easy API usually have performance limitations — basing on RxJava definitely adds some more allocations + additional library abstraction. Your use-case is quite peculiar. You can create an |
That's probably what I'll do, unless I just straight out replace RxAndroidBle with nRF's solution (it fits in better/easier with my current Kotlin Coroutines structure anyway). I also have to write a custom operation to be able to receive the GATT response codes (the device I'm working on has some custom GATT opcodes that I need to process, but not as exceptions - and most of them aren't even thrown as |
I would love to know what will be your results when getting notifications via custom operation. As for the inability to get GATT codes and I had just looked into the source code but all
|
@dariuszseweryn interesting thing with the I'm fairly certain I'm doing the Is it possible that even though I pass the native callback, the (Note: I do plan adding some further error handling to this bit, as there are still a handful of failure points) |
That is the exact funny thing about Android's BLE API. It is not easy to get it right. |
Ah, indeed you're right. Multiple sources suggested that writing the descriptor with |
Report time! The custom implementation, as above, indeed performs as expected - no memory spikes in Profiler, no GC going crazy, and transfer speeds are quite optimal - the previously mentioned 8MB and 16MB transfers complete in about 2 minutes 30 seconds and roughly 5 minutes. Which is much better than the direct RxAndroidBle Something is definitely going wrong in the Rx stack here. I understand that this won't affect most users as |
Glad to hear that using a custom operation helped your case.
That is the usual problem with performant vs. easy-to-use APIs. If you want to get performance you should strive for bare-metal implementations, which in case of Android BLE APIs are not exactly rookie friendly and can lead to a nonfunctional Android BLE stack and/or phone models incompatibility. This library aims for the ease-of-use and is highly reliant on RxJava which on one hand makes it easy to implement thread safe code and on the other adds (quite a lot of) additional allocations.
|
Yes, but shouldn't a transition from a callback to a simple |
It should be linear. I am pretty sure there is no buffering. There is at least one explicit object allocation and a thread switch (which may be the cause of increasing times if it is overwhelmed with events). Other allocations are related to RxJava internals. Maybe it would be possible to optimise current code. I do not know what device/os you have tested on. |
That's what I'd expect too, based on my prior experience with RxJava. I mean, if Rx can generally provide a streamlined access to massive, fast-streaming objects (a previous project of mine, where news articles were broken up into blocks and stored in an SQL database, and Rx was used to stream these objects into the actual article display objects), the same should apply to fast-streaming, but small (250b) objects as well. The thread switching part does sound suspicious though. Testing was done on various devices - Pixel 1 and 2, a bunch of Samsung phones (S6, S7, S8+, Note8, Note9), OnePlus devices (really poor Bluetooth experience in general, but that's a device fault), and some Sony phones. All of them presented the same issue, the transfer time increase between 10% blocks was almost exponential (our best result was a 10-25-60-120-250-280s increase, and that was when I stopped the test at 80% - though I believe the increase started "evening out" at the end, given the small jump from 250s to 280s), and got unbearably slow around the end. |
You may try commenting out the thread switching code ( |
I will see if I have any time - unfortunately I've got a bunch of deadlines before the holidays, and even this three-day party with custom operations have delayed some important work. But it's definitely worth taking a look at, I think - though I understand why you can't test it without any devices that emulate this behaviour. |
I have patched up an nRF52 (PCA10056) based peripheral that spams notifications with MTU 512 Test codeTest code based on
For reference: Your target average throughput is 8MB / 2.5 min = ~56000 Bps. Test results:Samsung Galaxy S8 (SM-G950F) / OS: API 26 (Android 8.0)Profiling did not show energy usage above 10%
OnePlus 3T (A3003) / API 26 (Android 8.0.0)
Samsung Galaxy S6 (SM-G920F) / API 24 (Android 7.0)It is very slow compared to S8 or 3T but it also seem to provide stable performance.
Bottom lineI cannot recreate issues you are facing. I have two ideas on what may potentially be wrong:
QuestionDo you see what may be different between our setups? |
Well, for starters, I'm still using 1.10.4, though I doubt it means much of a difference. Then, the actual difference - I'm using an abstraction layer that takes an RxBleDevice, and wraps its functions with Kotlin Coroutines. A quick excerpt of what I'm doing: https://gist.github.com/fonix232/fef86e6124431d012b2c42734589041b As you can see, I've managed to cobble together a quite performant little Coroutine wrapper (the rest of the app also uses Coroutines, but there's simply no usable solution for BLE on Android that works nice with them, and changing the rest of the app to work Rx-style would be a challenge at this point). However I don't see how Coroutines could cause this issue - there's no thread switching, no heavy computation, and the coroutines-rx conversion is also pretty lightweight on the |
Is this abstraction layer https://github.com/Kotlin/kotlinx.coroutines/tree/master/reactive/kotlinx-coroutines-rx2 ? |
I might have find the culprit of high memory consumption in your code. |
Hmm, that's definitely interesting, and would explain why the slowdown happens only on larger transfers (e.g. an 8MB transfer shows now slowing down up until ~90% at worst, usually none at all). @dariuszseweryn what do you recon, would a batching operation on the Rx end of things possibly mitigate the problem? Given the packet size and frequency, I'm thinking about either using your |
If the issue (and so it seems) is poor performance of data consumer then batching the data should decrease amount of "thread" switches — this may be a good path to take. Batching by time should be better approach I suppose. I have also checked the performance of version This topic is somewhat related to #41 |
What I'm testing now is moving the custom operations' |
I may not be the best person to give advices about kotlin coroutines as I have little experience with them — I just try to rule out that the library is the culprit of bad performance. The thread you use for processing BLE data may be shared with different parts of the code which may indirectly degrade performance of the subject code. Or the processing itself (disc IO?) is slow? Hard to say for me as I am not a part of your project and can only make a stub on my end — sample code seems to work with acceptable performance. |
Except I even tested with no processing - not even storage of the data, just increment a counter - and the slowdown still happens. I will try adapting your flow into a sample app to see if this issue still happens using pure Rx code and nothing else around it. |
We have different results in our tests. You could prepare a repository which I could just copy and install that shows the problem so I can play with it. Without it cannot do much more. |
Two weeks have passed. I am closing this issue. Still — feel free to share a minimal project that displays the issue — I could then start digging /experimenting from there and reopen the topic. Although it seems not related directly to the library, having experience with coroutines could help in the future or give hints whether a coroutines API is feasible for BLE. |
Original topic:
Summary
In my project, I'm connecting to a device that transfers a relatively large amount of data (up to 16MB) via a characteristic's
NOTIFY
. The equivalent code on iOS works fine, however on Android, using RxAndroidBle, it slows down to an unbearably slow speed the more data it receives.MTU is set properly, and ends up being 247 (250 - 3b header), which is the maximum supported on the devices. iOS uses a lower MTU, 162 (165 - 3b header).
I've tried requesting lower MTU to see if it helps, to no avail.
The code is pretty simple, after the connection is established (using
ReplayingShare
), it creates the subscription viasetupNotification
.The transfer starts out with pretty decent speed, 1MB takes around 1 minute to sync, however around 3MB received, the speed drops, and the next MB takes 2 minutes, the next MB then 3-4 minutes, and the transfer time keeps creeping up.
Initially I thought it was my fault, as I was storing the data the worst possible way:
However even after removing the storage of the data, the slow-down still happens.
Library version
1.10.4
Preconditions
Steps to reproduce actual result
Not sure if it's reproducible as I have not seen any devices that use a similar technique to transfer data stored on them.
Actual result
The speed of the transfer drops exponentially (or rather, the amount of time required to transfer the same amount of data increases exponentially). The time between the logged
Transferred $x out of $y
events also increases (initially it's around 10-20ms between messages, which increases to around 100-120ms by the time I transferred ~3MB).Expected result
Speed should be somewhat constant, with minimum variation between the times required to transfer the same sized blocks.
Minimum code snippet reproducing the issue
The text was updated successfully, but these errors were encountered: