-
Notifications
You must be signed in to change notification settings - Fork 113
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rumble-Associated Disconnection #189
Comments
Is it possible to add an option to filter out rumble events and to not forward them to the controller? I'd rather use the controller without rumble than have the device disconnect. It could serve as a temporary workaround. |
This problem is not specific to the Elite controller, it depends on the BT dongle used. I'll leave this open to collect some more feedback but I'm tempted to close it in favor of #171 (which is closed due to being identified as a dongle issue). However, I'm not sure if that's a hardware issue with the dongle or something that's going wrong in the BT driver of Linux. Comparing against Windows could help. In any case: Yes, you can already filter out rumble events. Just pass |
@hibes One idea... Maybe you could run While monitoring, I'd suggest to disconnect all other BT devices, and please turn off the controller before starting the monitor so we can capture the connection init, too. |
This was on a fresh boot, still a pretty enormous file, but there shouldn't be much excess noise and the rumble occurred fairly quickly. |
Okay, it looks like it disconnected after around 26 seconds, then immediately reconnected but probably the rumble was now playing continuously or stopped playing while the controller still reported input events. This sent cmd
Immediately after it, the controller semi-disconnected as btmon exits sniff mode (but I can only guess that's what happening, there's no explicit disconnect):
But eventually btmon entered sniff mode again and we continue seeing input events
Another rumble sent to the controller and the cycle repeats:
We are still receiving input:
The controller seems to mostly work until around after 67 seconds when another rumble command is sent:
It now receives multiple rumble commands in succession and then starts to send bogus event data:
The bytes If you'd watch the logs a little longer you would probably see some axis stuck at particular values, and a few moments later no more inputs are being reported and the controller disconnects, resets after a timeout, then reconnects. @hibes Can you confirm this observation? Since our driver actually does nothing else than sending and receiving HID reports, we have absolutely no control about how the Bluetooth protocol works with the controller. I think this is something that has to be fixed in the Bluetooth drivers or the controller firmware. Something about how Linux uses the Bluetooth protocol seems to negatively impact the stability of the controller firmware. The Linux Bluetooth stack is probably very standards compliant, so it's a quirk of how the firmware works. Maybe it can be fixed in the Linux driver stack, maybe not. It doesn't even look like we are overwhelming the controller with commands, the strange btmon behavior seems to happen out of the blue just on sending Is anyone more familiar with the Bluetooth protocol who may explain why this is happening? Or maybe anyone who could recreate a similar Bluetooth protocol log in Windows? If the issue cannot be reproduced in Windows, maybe we find a difference in how the Bluetooth protocol is handled... |
I've broken down some of the individual assertions and questions that I saw. Let me know if I missed something.
The latter. I think I didn't have functioning inputs until that point.
Yep, this stuck axis behavior occurs almost every time, I think it has to do with the state of the joystick when the rumble occurs.
In past experiences with the controller, what you're describing is exactly what happened. Did you want a log that shows that? It's not hard to reproduce if it'd be helpful. |
If you can confirm with any logs that the rumble problem results from specific joystick positions, then I'll look at such logs. But I was able to reproduce the problem with the directional rumble test program in this repository even with the joysticks idle. Looking at the stale branches of this repository, there seemed to exist an idea that the problem may be related to trigger rumble. Maybe try to disable the trigger rumble. It would be interesting to see a Bluetooth monitor log of Windows sending rumble to the controller... Maybe the controller doesn't like how we handle some combination of rumble values. I don't think it is related in any way to the joystick position but if you can prove otherwise, I'd be fine with that.
I've seen cases during tests where the rumble would stuck (either no more rumble at all out of the blue or continuous rumble that doesn't stop) while I was still able to control the game camera and character. But that situation only lasted a few seconds, maybe a minute or two at best, until the controller completely stopped working with the Xbox logo blinking. After around 30 seconds, the controller would reconnect but rumble no longer works. But the sticks and buttons are working again. |
BTW:
From the HID report descriptor:
This tells us that the maximum axis value is 65534, or This may actually be a bug in the decoder I'm using for the Whatever the case is, the value 65535 is valid for the kernel, so we are having no problem resulting from this. This may indicate that this value is actually no indicator of the firmware stability, still I see values of some axis sometimes stuck just before the controller disconnects. If you want to do some tests, look for data starting with
The Linux mode is a bit strange with 15 bits mapping to 10 buttons (with some bits obviously dead), that's why I remapped the button bits in the driver so it stays compatible with old-school JS interface and matches Windows mode. Ignore
If you are seeing
I'd expect one should see What's the value of |
Hi, I'm using Intel 7260hmw bluetooth (laptop wifi/bt card) connected via adapter to pci express (I'm on desktop PC). |
@Mikaka27 I hope that kernel 5.7 fixes a few problems for us, it seems to mostly stem from how the controller interacts with the Bluetooth stack. And that's in front of our driver. We cannot do anything about it except maybe try reducing the number of BT packets sent (which I've already done) or flipping the order of some protocol interactions (I was thinking about finding a way to send packets only after receiving some so we won't built up buffering in the controller). I'm pretty sure this is a bug in the controller firmware, and maybe it's something like overwriting internal buffers without locking (it's probably single-threaded but interrupt driven, so locks are still needed). I've once worked on a custom USB HID device with a custom-made firmware maybe 10 year ago, and that had similar issues that went away when the firmware was fixed to do proper locking on the IO buffers for the HID protocol by disabling interrupts. Otherwise the stability of the firmware vastly depended on the amount and order of data you received and sent to the device. That was USB directly, tho. We have the Bluetooth and HID drivers in front of us and cannot really steer the packet order during concurrent receive/sent. Seeing some axes stuck while other still work is usually a strong indicator of a bug in an interrupt-driven firmware. But I could not yet actually see a clear and conclusive pattern to support this observation. |
Please refer to #198 if this is a new issue that showed up only after a kernel update. |
I also got random disconnects which occur when some rumbling effect occurs. Info:
I captured a log with I started the log, then I turned the two controllers on. Then I played a game with both controllers. I ended the log taking shortly after one of the controllers rumbled continuously and did not accept any input. I shutdown the controller by pressing the main button for several seconds. Then I turned it on again. (I am not sure if I ended the log taking before or after re restarting of the controller). If there is anything else you would need, please let me know. |
Does the second controller still work when the first fails? |
There's definitely something wrong with the controller's rumble packet handling in the firmware. I ran into a similar issue with my driver for the wireless dongle where the controller vibrates for an unusually long time while ignoring any further rumble commands. I never ran into a case where the input stopped working though. I know this might not be very helpful, just wanted to share my thoughts with you. |
I think we're facing two problems here: The one is a buggy Bluetooth implementation in the controller (maybe as part of the firmware) and there're probably also bugs in the kernel drivers especially with some of the chipset revisions used in the vast amount of cheap dongles. And usually affects packet flow "on the air". The other is a firmware bug with the rumble protocol (which may interact with the Bluetooth/firmware part of the first), probably something with interrupt handling and buffer locking within the firmware. That you're only seeing the second may indicate that the original dongle uses a completely different code path in the firmware. I figured out lately (with the direct hidraw test program in I wonder how the Windows driver programs the rumble motors. Maybe this would be a clue at finally fixing this? Maybe it's wrong to program all motors in parallel with only one command? Did you by coincidence do some packet traces of this and analyzed them? |
Oh wow, that makes it even more likely to hit some kind of race condition while processing rumble packets. Which would make sense because I only ran into that continuous-rumble problem in games that send a lot of force feedback events, thereby increasing the chance of hitting such a case.
If I remember correctly, the Windows driver always programmed all motors with the maximum duration at the same time. Much like XInput's XInputSetState function which doesn't even allow you to specify the duration at all. I can do some additional packet captures if you want, either USB or wireless, they use the same packet format. |
Yes, I'm suspecting this. BTW: This works by setting the motor enable bit in byte 2 of the rumble packet only for the motors you want to program. Leaving the other bits zero will not stop them, they just continue to play their last command. Only setting the rumble strength to 0 makes the motors stop if you'd also set the enable bit of the motor. This protocol is really strange: We have those four enable bits, but we also have a strength per motor. It seems like two different engineering teams worked on this and simply smashed their results together. So if there's a race condition: What could be done to alleviate the problem? I can think of some solutions: Sending only reprograms for motors that actually had their strength changed since the last command. Reducing the running time of a command to the absolute minimum needed so the internal looping in the firmware would stop early. In the kernel, we can expect a command every 50ms as this is the resolution of the ff-memless emulation driver sitting in front of us. But I already tried setting sustain to 5, release to 0 and loop to 1 - and that was kinda choppy, the kernel timing isn't quite exact for this. |
Yeah, especially considering that Microsoft's driver apparently doesn't even make use of the motor-bits. They might be used in this UWP API though. The article recommends setting all the vibration values at once.
That seems like the best solution, if it fixes the underlying problem. I only rely on the |
I've set sustain to 60ms now, release to 0, loops to 1. Choppiness seems to be gone (as opposed to 50ms). But it'll need some time to show the stability of this as I only rarely had this issue. Also, this hack conflicts with my "optimization" of sending rumble commands only when motors changed: If I do not send another command the next 50ms cycle, the rumble will just stop. But if it proves more stable, sending reprograms only for changed motors may solve this.
If your driver registers as |
Oh, I see you just posted that it's not an option. ;-) |
Maybe-fixes: atar-axis#189 Signed-off-by: Kai Krakow <[email protected]>
@medusalix I've taken the opportunity to actually implement this optimization instead of hacking it in - it was super simple after my previous efforts of code redesign. I now see the rumble packets with varying motor enable flags, and as such only sending updates for motors that changed:
See #202 Also, I see you adapted the directional rumble into your driver. I've reworked this as direction with |
Maybe-fixes: atar-axis#189 Signed-off-by: Kai Krakow <[email protected]>
It's always the left/main motor for me. The right one never crashes. import time
import evdev
from evdev import ecodes, InputDevice, ff
for name in evdev.list_devices():
dev = InputDevice(name)
if ecodes.EV_FF in dev.capabilities():
break
effect_id = -1
rumble = ff.Rumble(strong_magnitude=65000, weak_magnitude=0)
while rumble.strong_magnitude > 0:
rumble.strong_magnitude -= 1000
effect = ff.Effect(
ecodes.FF_RUMBLE, effect_id, 0,
ff.Trigger(0, 0),
ff.Replay(0, 0),
ff.EffectType(ff_rumble_effect=rumble)
)
effect_id = dev.upload_effect(effect)
dev.write(ecodes.EV_FF, effect_id, 1)
time.sleep(0.008)
for i in range(1000):
dev.write(ecodes.EV_FF, effect_id, 1)
time.sleep(0.005) The first loop iterates over the full magnitude range of the left motor (from high to low). I've had to play around with the delays a little bit, but it does the trick for me. |
I didn't really notice that but now that you write it: Yes, it was always the left motor. But maybe only because of parsing direction of the payload within the firmware? IOT, if we "fixed" this problem, maybe the right motor shows the same problem. But kudos for creating a reproducer script. Will try later. |
I've gone ahead and captured some rumble packets on Windows. Nothing unusual, expect that the magnitudes indeed only ever reach 100 and not 255 (as you've rightfully noticed in your comment). I'll have to fix that for xow. |
In theory, the kernel ff-memless driver only gives us one rumble command per 20ms... I'll try to log it. Thanks for analyzing. |
@medusalix Okay, testing shows that |
Thanks to @medusalix great findings that the controller crashes are most probably indeed a timing issue, this commit introduces a modified rumble protocol which throttles rumble commands sent to the controller. Also, it seems we cannot rely on `ff-memless` to only send us commands once per 50ms although that's the proposed time resolution of the protocol. It may depend on kernel HZ setting. This commit migrates the rumble worker into a delayed worker. When programming a rumble motor, we now remember the timing and allow the next reprogramming only 10ms+ in the future. Further requests will be buffered and the final command will be replayed to the controller when the delayed worker finally runs. Care was taken that we do not accidentally introduce a fixed lag of 10ms for rumble events. Maybe-fixes: atar-axis#180 Maybe-fixes: atar-axis#189 Signed-off-by: Kai Krakow <[email protected]>
Thanks to @medusalix great findings that the controller crashes are most probably indeed a timing issue, this commit introduces a modified rumble protocol which throttles rumble commands sent to the controller. Also, it seems we cannot rely on `ff-memless` to only send us commands once per 50ms although that's the proposed time resolution of the protocol. It may depend on kernel HZ setting. This commit migrates the rumble worker into a delayed worker. When programming a rumble motor, we now remember the timing and allow the next reprogramming only 10ms+ in the future. Further requests will be buffered and the final command will be replayed to the controller when the delayed worker finally runs. Care was taken that we do not accidentally introduce a fixed lag of 10ms for rumble events. Maybe-fixes: atar-axis#180 Maybe-fixes: atar-axis#189 Signed-off-by: Kai Krakow <[email protected]>
Testing looks good so far with my PR #204, I didn't have a single controller disconnect/rumble crash yet. But I didn't have many issues with that anyways. So anybody affected might want to test that PR? |
I wonder whether dropping all rumble commands in that 10 ms window would cause any issues. Implementing a proper queue that buffers any excess rumbles in user space is a tricky thing to do. |
I do not drop them in that way... I'll buffer them and send only the last command buffered (so it's actually a one-packet buffer). The delay until the buffer is sent is calculated dynamically based on the last time we've sent data. For this, the actual send command will write a timestamp+10ms for the next acceptable earliest time. So the first command may send the rumble immediately, successive commands may be buffered until after 10ms, when we send the latest buffered command. So we get a lag of 10ms max. After multiple successive commands, the rumble eventually stops: The way the implementation works, is that we always send the last command seen, so we actually send a stop command no matter what. I think that matches "a clever way to buffer ... only sending the last one" you mentioned above? |
In your driver you may implement this by splitting the command receiver and command sender into two threads: The receiver will update the buffer (put a spin lock around) and signal the sender. The sender will block on a signal from the receiver and read the buffer (put a spin lock around), send the buffer to hardware, then wait 10ms before it loops back to waiting on the signal. I think for the signal, a semaphore could work. Actually, I wanted to implement it that way but the kernel does not allow me to use |
Yeah, sounds like that might be exactly what Microsoft is doing.
Makes perfect sense, thank you. So I don't even need a queue, I just overwrite the last rumble command until the 10 milliseconds are over and I'm allowed to send another packet. |
Yes, just ensure that you actually do this final send even when there's no more incoming rumble requests for some time. A semaphore and a loop in a different thread (that just waits 10ms right after sending the packet) will do the trick. It's a classical producer/consumer algorithm, you may want to Google that if you're unsure how to start. |
The tricky part is that you're working with two different locks (semaphore and spin locks) which introduces a chance of a deadlock and you need to prevent that... Since you want to update the buffer independent of the semaphore, you need to carefully design the interactions between the two threads: You don't want the receiver update the buffer after the sender reads it but before the sender sent it, so the spin lock must last until after sending the buffer to hardware, but not while the sender is sleeping. Your design must ensure that your buffer state is always known-sent or known-pending when the receiver looks at it. You should create a concurrency diagram of your code to see that there's no point in time where you could lose a packet. I guarded my code with a final PS: The wording here is swapped compared to classical consumer/producer: The receiver is the producer here (it produces the packet received by the API), the sender is the consumer (it consumes the buffer by sending it to hardware). You may thus use the wording consumer/producer in your code and put a little comment in context of API and hardware. |
I've had a similar problem in earlier versions of my driver where I needed to synchronize USB reads from two different endpoints. I had decided to put all USB packets into a queue and protected them with a mutex. Another thread was simultaneously reading from that queue while being notified of new packets using a I think I'll stick to the simpler approach of calculating the delta time and discarding rumble commands. While it might (potentially) miss some packets, it does at least save me from opening up that can of worms again. |
An easier approach would be to just send your commands 10ms late... I think, this is barely noticeable. And it's quite easy to get things started with two threads. Adding the "immediate on first packet" is a simple matter then. I've went this route in my code. It took me around 4 hours to get my code into shape. The main problem is: When I crashed or hung the kernel, I had to reboot, and a clean reboot took 30 minutes sometimes, or I even had to hard-reboot. So in the end, it maybe had taken me just 1.5 to 2 hours. Since you don't need to handle a queue, it's pretty simple. If you'd like, I can mentor you a little bit on the synchronization part but I don't know your code good enough to know where to start with thread init and tear-down. |
You may want to look into designing multi-threaded consumer/producer algorithms. There're some good read on the internet with simple examples. Also, I believe that C++ already ships with a FIFO queue implementation suitable for multi-threaded usage in consumer/producer scenario by subclassing it: https://stackoverflow.com/questions/23661759/creating-a-blocking-queue So your main issue was probably abstracting the code properly. |
The main issue was the fact that I couldn't find a way to stop the consumer (when the driver was shut down) without a horrendous amount of locking. I think my solution back then involved at least three different mutexes (imagine the horror). There is also not a safe way to cancel asynchronous USB transfers and I had to handle EDIT: Oh, I forgot to mention that there was a USB hotplug handler too. Also required locking. |
The link I've posted actually covers all of that - it can signal the end of production, so you can properly catch signals and inform the queue to stop consuming and exit cleanly:
What you describe sounds a lot like an abstraction/design issue. I've done a lot of those wrong decisions in my 20+ years of programming career so I think I can smell it when it reeks. ;-) But the interaction with USB probably bloated everything up, the consumer and producer for a single packet buffer should be much easier and without most of the pitfalls, you just need to guard access to the buffer with a spin-lock. Good that the sync API simplified everything, that's always a good choice. You may also get away with a double buffer by using https://en.cppreference.com/w/cpp/atomic/atomic/exchange and swapping the consumer with the receiver buffer. The http://www.cplusplus.com/reference/condition_variable/condition_variable/ then just waits for a signal in the consumer Another idea: Maybe even a triple buffer... That's how graphic applications prevent using vsync. The producer just swaps two buffers it constantly updates, a front and a back buffer (the latter will be updated in the background). The consumer then swap the front buffer with its own buffer. But this introduces a possible lag of one buffer, you need to ensure that the latest buffer is always sent to the device at some point in the future. This should be easy enough by ensuring a proper order of signal and swapping. The advantage of this: It's completely lock-less. Compared to GPU triple buffering: Your receiver (producer) is the uinput client (the application). It will update the back buffer, then swap it with the front buffer. Your sender (consumer) is the hardware (the display), it will wait on the condition (aka sync signal for an adaptive sync display), then swaps the front buffer with its send buffer (the frame buffer for the display), then returns to waiting on the condition. So for naming I suggest using |
Absolutely. I'm not a software engineer by any means. Things can get messy real quick sometimes.
I like to keep things simple and straightforward, you know. Least amount of threads, simple Makefile, the list goes on... Saves you a lot of trouble ;) |
Then triple buffer is probably the way to go as you do not need thread synchronization for buffer access, just signalling. |
Thanks to @medusalix great findings that the controller crashes are most probably indeed a timing issue, this commit introduces a modified rumble protocol which throttles rumble commands sent to the controller. Also, it seems we cannot rely on `ff-memless` to only send us commands once per 50ms although that's the proposed time resolution of the protocol. It may depend on kernel HZ setting. This commit migrates the rumble worker into a delayed worker. When programming a rumble motor, we now remember the timing and allow the next reprogramming only 10ms+ in the future. Further requests will be buffered and the final command will be replayed to the controller when the delayed worker finally runs. Care was taken that we do not accidentally introduce a fixed lag of 10ms for rumble events. Maybe-fixes: atar-axis#180 Maybe-fixes: atar-axis#189 Signed-off-by: Kai Krakow <[email protected]>
Thanks to @medusalix great findings that the controller crashes are most probably indeed a timing issue, this commit introduces a modified rumble protocol which throttles rumble commands sent to the controller. Also, it seems we cannot rely on `ff-memless` to only send us commands once per 50ms although that's the proposed time resolution of the protocol. It may depend on kernel HZ setting. This commit migrates the rumble worker into a delayed worker. When programming a rumble motor, we now remember the timing and allow the next reprogramming only 10ms+ in the future. Further requests will be buffered and the final command will be replayed to the controller when the delayed worker finally runs. Care was taken that we do not accidentally introduce a fixed lag of 10ms for rumble events. Maybe-fixes: atar-axis#180 Maybe-fixes: atar-axis#189 Signed-off-by: Kai Krakow <[email protected]>
Thanks to @medusalix great findings that the controller crashes are most probably indeed a timing issue, this commit introduces a modified rumble protocol which throttles rumble commands sent to the controller. Also, it seems we cannot rely on `ff-memless` to only send us commands once per 50ms although that's the proposed time resolution of the protocol. It may depend on kernel HZ setting. This commit migrates the rumble worker into a delayed worker. When programming a rumble motor, we now remember the timing and allow the next reprogramming only 10ms+ in the future. Further requests will be buffered and the final command will be replayed to the controller when the delayed worker finally runs. Care was taken that we do not accidentally introduce a fixed lag of 10ms for rumble events. Maybe-fixes: atar-axis#180 Maybe-fixes: atar-axis#189 Signed-off-by: Kai Krakow <[email protected]>
Thanks to @medusalix great findings that the controller crashes are most probably indeed a timing issue, this commit introduces a modified rumble protocol which throttles rumble commands sent to the controller. Also, it seems we cannot rely on `ff-memless` to only send us commands once per 50ms although that's the proposed time resolution of the protocol. It may depend on kernel HZ setting. This commit migrates the rumble worker into a delayed worker. When programming a rumble motor, we now remember the timing and allow the next reprogramming only 10ms+ in the future. Further requests will be buffered and the final command will be replayed to the controller when the delayed worker finally runs. Care was taken that we do not accidentally introduce a fixed lag of 10ms for rumble events. Maybe-fixes: atar-axis#180 Fixes: atar-axis#189 Signed-off-by: Kai Krakow <[email protected]>
Describe the bug
After initial connection and the rumble associated, any subsequent rumble sent to the controller seems to cause the controller to hang and subsequent inputs to the controls no longer produce the desired effect.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
A continuation of input capability after the rumble occurs.
System information
Additional context
I attempted to search for issues which were explicitly focused on this issue, but the only issues which bring it up merely mention it. I hope this ticket is warranted as a focus explicitly on the rumble issue.
A workaround is to disconnect and reconnect the controller, but this takes a considerable amount of time.
The text was updated successfully, but these errors were encountered: