-
Notifications
You must be signed in to change notification settings - Fork 5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Accelerate framebuffer copyarea function with DMA and add ioctl to access it from userspace #313
Conversation
Based on the patch authored by Ali Gholami Rudi at https://lkml.org/lkml/2009/7/13/153 Provide an ioctl for userspace applications, but only if this operation is hardware accelerated (otherwide it does not make any sense). Signed-off-by: Siarhei Siamashka <[email protected]>
Based on http://www.raspberrypi.org/phpBB3/viewtopic.php?p=62425#p62425 Also used Simon's dmaer_master module as a reference for tweaking DMA settings for better performance. For now busylooping only. IRQ support might be added later. With non-overclocked Raspberry Pi, the performance is ~360 MB/s for simple copy or ~260 MB/s for two-pass copy (used when dragging windows to the right). In the case of using DMA channel 0, the performance could improve to ~440 MB/s. But channel 0 is used by SDHCI at the moment. For comparison, VFP optimized CPU copy can only do ~114 MB/s in the same conditions (hindered by reading uncached source buffer). Signed-off-by: Siarhei Siamashka <[email protected]>
Looks interesting. I'll give it a test. |
There's some suggested "general" instructions in |
Yes, the instructions provided by @hglm at ssvb/xf86-video-fbturbo#10 work fine for raspbian. The compile time is not a problem, because there are really few sources (configure step is the most time consuming). You just need to get the right branch for testing (with the copyarea support added): git clone -b rpi-fb-copyarea-20130617 git://github.com/ssvb/xf86-video-sunxifb.git And IMHO it is cleaner to copy the xorg.conf file to /etc/X11/xorg.conf as the final step. Once the system is rebooted, have a look into "/var/log/Xorg.0.log", it should have the following line: 26.125 SUNXIFB(0): enabled fbdev copyarea acceleration Regarding the default raspbian software. LXTerminal is using 32bpp color depth and is relatively slow (run "xwininfo", select a window, click on it, and get all the interesting information about the used visual and color depth). Debian Reference in Dillo gets the most benefits (both dragging its window and also scrolling the text inside of the browser window is accelerated). Midori gets fast window dragging, but unfortunately no scrolling acceleration. Also geany (not installed in raspbian by default) is an interesting application, it gets some speedup from the hardware scrolling, but still needs some optimizations to make it really smooth. |
I think it should also be mentioned that this feature has an important side effect, namely that for console (non-X) framebuffer scrolling, the kernel also starts using the DMA accelerated CopyArea function, as mentioned in the forum thread, because it detects a hardware accelerated CopyArea function is available. It seems that at the moment DMA based console scrolling is as fast or a little faster than the normal CPU "imageblit" method, which uses the CPU to write the text bitmaps to the framebuffer. However, this is also due to the (somewhat unexpectedly) relatively unoptimized nature of the kernel fb imageblit functions defined in drivers/video/cfbimgblt.c. I have implemented a fairly straightforward optimization to this function, which makes console text writing/scrolling 50-70% faster on the Allwinner platform (I am in the process of testing this on the RPi, where the gain might be even greater). If confirmed, this may result in DMA based console scrolling being slower than the optimized CPU blit method, So in short, unless the waiting for DMA function completion is implemented using IRQ instead of busy-waiting, the relative merit of using DMA for framebuffer console scrolling may not yet be fully established for all cases, and it may be wise to make this side-effect optional instead of automatic or at least make it possible to disable it. This only concerns this side-effect, the utilization in the X driver is of course a big win in every way. Edit: On the RPi with the standard 8x16 font, I am seeing a speed-up of 40% at 16bpp and 50% at 32bpp. The patch is available as https://github.com/hglm/patches/blob/master/kernel-cfbimageblt.patch. Note that this patch is not RPi specific and can help a wide range of devices. Also note that the patch is independent from the DMA CopyArea implementation discussed here and can be complimentary to it; when not used for scrolling it will still be used for any other text writing on the console framebuffer. I may do a pull request later after some more testing. |
Have you tried making DMA wait on an IRQ? It should be straightforward. |
Not yet. I spent the last few days trying to decide which ioctl API to use (the existing dmaer module, the module which provides sunxi g2d emulation or entirely new module) before leaning to the standard framebuffer copyarea implementation. Then it took a bit of time to figure out the "one more row than asked" issue for the 2D DMA transfers. Then experiment with the performance settings (channel 0, burst size, etc.). Then for the copyarea function I tried to investigate why it happens to be not used for the framebuffer console right after the start of the system (apparently logo related). The list of "interesting things to look into" kept accumulating and in the end I just decided to leave out IRQ and some other performance tweaks for a bit later. Otherwise it would take more time before I had something ready for testing. One more reason not to have IRQ wait added right from the start is stability testing. There are some rumors in the forum (or maybe confirmed problems) which correlate SD card corruption with the use of DMA. We probably need to run some heavy stress tests involving simultaneous use of DMA copyarea and some disk activity to see if anything breaks. Also now I wonder if there is enough locking around copyarea as it is obviously non-reenterant. |
@hglm I have implemented a fairly straightforward optimization to this function, which makes console text writing/scrolling 50-70% faster on the Allwinner platform (I am in the process of testing this on the RPi, where the gain might be even greater) I like when people don't miss possible optimization opportunities and do fair evaluation of alternative solutions. Looking forward to see your patches (maybe as a separate git branch / pull request). |
I've built and have:
and have:
I've rebooted and ran startx. But:
|
@popcornmix This is strange. Yesterday I tried these step-by-step instructions with the freshly installed 2013-05-25-wheezy-raspbian.img image and it worked. Maybe you have a higher priority config file somewhere in the system, which enforces the use of fbdev driver? It may make sense to check /usr/share/X11/xorg.conf.d/ directory. |
Looks like the config files in /usr/share/X11/xorg.conf.d take precedence over /etc/X11/xorg.conf. I just tested it. Deleting any /usr/share/X11/xorg.conf.d/99-*.conf file works, or copying /etc/X11/xorg.conf to /usr/share/X11/99-sunxifb.conf. |
I've removed I get the "SUNXIFB(0): enabled fbdev copyarea acceleration" line. Dragging windows seems pretty smooth. |
I've added a flag to dma driver to request bulk channel (i.e. dma channel 0) Can you update PR to use that channel, and adjust the burst length down: |
@hglm |
OK, I'll make a pull request for it. |
Fwiw, there's a start at dmaengine support in |
@ssvb Currently this PR causes lockups. |
With lockdep enabled we get: ============================================= [ INFO: possible recursive locking detected ] 3.4.4-Cavium-Octeon+ raspberrypi#313 Not tainted --------------------------------------------- kworker/u:1/36 is trying to acquire lock: (&bus->mdio_lock){+.+...}, at: [<ffffffff813da7e8>] mdio_mux_read+0x38/0xa0 but task is already holding lock: (&bus->mdio_lock){+.+...}, at: [<ffffffff813d79e4>] mdiobus_read+0x44/0x88 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&bus->mdio_lock); lock(&bus->mdio_lock); *** DEADLOCK *** May be due to missing lock nesting notation . . . This is a false positive, since we are indeed using 'nested' locking, we need to use mutex_lock_nested(). Now in theory we can stack multiple MDIO multiplexers, but that would require passing the nesting level (which is difficult to know) to mutex_lock_nested(). Instead we assume the simple case of a single level of nesting. Since these are only warning messages, it isn't so important to solve the general case. Signed-off-by: David Daney <[email protected]> Signed-off-by: David S. Miller <[email protected]>
This has been cherry-picked into 3.12 kernel. |
commit 0a5a988 upstream. __smb2_handle_cancelled_cmd() is called under a spin lock held in cifs_mid_q_entry_release(), so make its memory allocation GFP_ATOMIC. This issue was observed when running xfstests generic/028: [ 1722.589204] CIFS VFS: \\192.168.30.26 Cancelling wait for mid 72064 cmd: 5 [ 1722.590687] CIFS VFS: \\192.168.30.26 Cancelling wait for mid 72065 cmd: 17 [ 1722.593529] CIFS VFS: \\192.168.30.26 Cancelling wait for mid 72066 cmd: 6 [ 1723.039014] BUG: sleeping function called from invalid context at mm/slab.h:565 [ 1723.040710] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 30877, name: cifsd [ 1723.045098] CPU: 3 PID: 30877 Comm: cifsd Not tainted 5.5.0-rc4+ #313 [ 1723.046256] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014 [ 1723.048221] Call Trace: [ 1723.048689] dump_stack+0x97/0xe0 [ 1723.049268] ___might_sleep.cold+0xd1/0xe1 [ 1723.050069] kmem_cache_alloc_trace+0x204/0x2b0 [ 1723.051051] __smb2_handle_cancelled_cmd+0x40/0x140 [cifs] [ 1723.052137] smb2_handle_cancelled_mid+0xf6/0x120 [cifs] [ 1723.053247] cifs_mid_q_entry_release+0x44d/0x630 [cifs] [ 1723.054351] ? cifs_reconnect+0x26a/0x1620 [cifs] [ 1723.055325] cifs_demultiplex_thread+0xad4/0x14a0 [cifs] [ 1723.056458] ? cifs_handle_standard+0x2c0/0x2c0 [cifs] [ 1723.057365] ? kvm_sched_clock_read+0x14/0x30 [ 1723.058197] ? sched_clock+0x5/0x10 [ 1723.058838] ? sched_clock_cpu+0x18/0x110 [ 1723.059629] ? lockdep_hardirqs_on+0x17d/0x250 [ 1723.060456] kthread+0x1ab/0x200 [ 1723.061149] ? cifs_handle_standard+0x2c0/0x2c0 [cifs] [ 1723.062078] ? kthread_create_on_node+0xd0/0xd0 [ 1723.062897] ret_from_fork+0x3a/0x50 Signed-off-by: Paulo Alcantara (SUSE) <[email protected]> Fixes: 9150c3a ("CIFS: Close open handle after interrupted close") Cc: Stable <[email protected]> Signed-off-by: Steve French <[email protected]> Reviewed-by: Pavel Shilovsky <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 0a5a988 upstream. __smb2_handle_cancelled_cmd() is called under a spin lock held in cifs_mid_q_entry_release(), so make its memory allocation GFP_ATOMIC. This issue was observed when running xfstests generic/028: [ 1722.589204] CIFS VFS: \\192.168.30.26 Cancelling wait for mid 72064 cmd: 5 [ 1722.590687] CIFS VFS: \\192.168.30.26 Cancelling wait for mid 72065 cmd: 17 [ 1722.593529] CIFS VFS: \\192.168.30.26 Cancelling wait for mid 72066 cmd: 6 [ 1723.039014] BUG: sleeping function called from invalid context at mm/slab.h:565 [ 1723.040710] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 30877, name: cifsd [ 1723.045098] CPU: 3 PID: 30877 Comm: cifsd Not tainted 5.5.0-rc4+ #313 [ 1723.046256] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014 [ 1723.048221] Call Trace: [ 1723.048689] dump_stack+0x97/0xe0 [ 1723.049268] ___might_sleep.cold+0xd1/0xe1 [ 1723.050069] kmem_cache_alloc_trace+0x204/0x2b0 [ 1723.051051] __smb2_handle_cancelled_cmd+0x40/0x140 [cifs] [ 1723.052137] smb2_handle_cancelled_mid+0xf6/0x120 [cifs] [ 1723.053247] cifs_mid_q_entry_release+0x44d/0x630 [cifs] [ 1723.054351] ? cifs_reconnect+0x26a/0x1620 [cifs] [ 1723.055325] cifs_demultiplex_thread+0xad4/0x14a0 [cifs] [ 1723.056458] ? cifs_handle_standard+0x2c0/0x2c0 [cifs] [ 1723.057365] ? kvm_sched_clock_read+0x14/0x30 [ 1723.058197] ? sched_clock+0x5/0x10 [ 1723.058838] ? sched_clock_cpu+0x18/0x110 [ 1723.059629] ? lockdep_hardirqs_on+0x17d/0x250 [ 1723.060456] kthread+0x1ab/0x200 [ 1723.061149] ? cifs_handle_standard+0x2c0/0x2c0 [cifs] [ 1723.062078] ? kthread_create_on_node+0xd0/0xd0 [ 1723.062897] ret_from_fork+0x3a/0x50 Signed-off-by: Paulo Alcantara (SUSE) <[email protected]> Fixes: 9150c3a ("CIFS: Close open handle after interrupted close") Cc: Stable <[email protected]> Signed-off-by: Steve French <[email protected]> Reviewed-by: Pavel Shilovsky <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 0a5a988 upstream. __smb2_handle_cancelled_cmd() is called under a spin lock held in cifs_mid_q_entry_release(), so make its memory allocation GFP_ATOMIC. This issue was observed when running xfstests generic/028: [ 1722.589204] CIFS VFS: \\192.168.30.26 Cancelling wait for mid 72064 cmd: 5 [ 1722.590687] CIFS VFS: \\192.168.30.26 Cancelling wait for mid 72065 cmd: 17 [ 1722.593529] CIFS VFS: \\192.168.30.26 Cancelling wait for mid 72066 cmd: 6 [ 1723.039014] BUG: sleeping function called from invalid context at mm/slab.h:565 [ 1723.040710] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 30877, name: cifsd [ 1723.045098] CPU: 3 PID: 30877 Comm: cifsd Not tainted 5.5.0-rc4+ #313 [ 1723.046256] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014 [ 1723.048221] Call Trace: [ 1723.048689] dump_stack+0x97/0xe0 [ 1723.049268] ___might_sleep.cold+0xd1/0xe1 [ 1723.050069] kmem_cache_alloc_trace+0x204/0x2b0 [ 1723.051051] __smb2_handle_cancelled_cmd+0x40/0x140 [cifs] [ 1723.052137] smb2_handle_cancelled_mid+0xf6/0x120 [cifs] [ 1723.053247] cifs_mid_q_entry_release+0x44d/0x630 [cifs] [ 1723.054351] ? cifs_reconnect+0x26a/0x1620 [cifs] [ 1723.055325] cifs_demultiplex_thread+0xad4/0x14a0 [cifs] [ 1723.056458] ? cifs_handle_standard+0x2c0/0x2c0 [cifs] [ 1723.057365] ? kvm_sched_clock_read+0x14/0x30 [ 1723.058197] ? sched_clock+0x5/0x10 [ 1723.058838] ? sched_clock_cpu+0x18/0x110 [ 1723.059629] ? lockdep_hardirqs_on+0x17d/0x250 [ 1723.060456] kthread+0x1ab/0x200 [ 1723.061149] ? cifs_handle_standard+0x2c0/0x2c0 [cifs] [ 1723.062078] ? kthread_create_on_node+0xd0/0xd0 [ 1723.062897] ret_from_fork+0x3a/0x50 Signed-off-by: Paulo Alcantara (SUSE) <[email protected]> Fixes: 9150c3a ("CIFS: Close open handle after interrupted close") Cc: Stable <[email protected]> Signed-off-by: Steve French <[email protected]> Reviewed-by: Pavel Shilovsky <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
…format [ Upstream commit 06d213d ] For incoming SCO connection with transparent coding format, alt setting of CVSD is getting applied instead of Transparent. Before fix: < HCI Command: Accept Synchron.. (0x01|0x0029) plen 21 #2196 [hci0] 321.342548 Address: 1C:CC:D6:E2:EA:80 (Xiaomi Communications Co Ltd) Transmit bandwidth: 8000 Receive bandwidth: 8000 Max latency: 13 Setting: 0x0003 Input Coding: Linear Input Data Format: 1's complement Input Sample Size: 8-bit # of bits padding at MSB: 0 Air Coding Format: Transparent Data Retransmission effort: Optimize for link quality (0x02) Packet type: 0x003f HV1 may be used HV2 may be used HV3 may be used EV3 may be used EV4 may be used EV5 may be used > HCI Event: Command Status (0x0f) plen 4 #2197 [hci0] 321.343585 Accept Synchronous Connection Request (0x01|0x0029) ncmd 1 Status: Success (0x00) > HCI Event: Synchronous Connect Comp.. (0x2c) plen 17 #2198 [hci0] 321.351666 Status: Success (0x00) Handle: 257 Address: 1C:CC:D6:E2:EA:80 (Xiaomi Communications Co Ltd) Link type: eSCO (0x02) Transmission interval: 0x0c Retransmission window: 0x04 RX packet length: 60 TX packet length: 60 Air mode: Transparent (0x03) ........ > SCO Data RX: Handle 257 flags 0x00 dlen 48 #2336 [hci0] 321.383655 < SCO Data TX: Handle 257 flags 0x00 dlen 60 #2337 [hci0] 321.389558 > SCO Data RX: Handle 257 flags 0x00 dlen 48 #2338 [hci0] 321.393615 > SCO Data RX: Handle 257 flags 0x00 dlen 48 #2339 [hci0] 321.393618 > SCO Data RX: Handle 257 flags 0x00 dlen 48 #2340 [hci0] 321.393618 < SCO Data TX: Handle 257 flags 0x00 dlen 60 #2341 [hci0] 321.397070 > SCO Data RX: Handle 257 flags 0x00 dlen 48 #2342 [hci0] 321.403622 > SCO Data RX: Handle 257 flags 0x00 dlen 48 #2343 [hci0] 321.403625 > SCO Data RX: Handle 257 flags 0x00 dlen 48 #2344 [hci0] 321.403625 > SCO Data RX: Handle 257 flags 0x00 dlen 48 #2345 [hci0] 321.403625 < SCO Data TX: Handle 257 flags 0x00 dlen 60 #2346 [hci0] 321.404569 < SCO Data TX: Handle 257 flags 0x00 dlen 60 #2347 [hci0] 321.412091 > SCO Data RX: Handle 257 flags 0x00 dlen 48 #2348 [hci0] 321.413626 > SCO Data RX: Handle 257 flags 0x00 dlen 48 #2349 [hci0] 321.413630 > SCO Data RX: Handle 257 flags 0x00 dlen 48 #2350 [hci0] 321.413630 < SCO Data TX: Handle 257 flags 0x00 dlen 60 #2351 [hci0] 321.419674 After fix: < HCI Command: Accept Synchronou.. (0x01|0x0029) plen 21 #309 [hci0] 49.439693 Address: 1C:CC:D6:E2:EA:80 (Xiaomi Communications Co Ltd) Transmit bandwidth: 8000 Receive bandwidth: 8000 Max latency: 13 Setting: 0x0003 Input Coding: Linear Input Data Format: 1's complement Input Sample Size: 8-bit # of bits padding at MSB: 0 Air Coding Format: Transparent Data Retransmission effort: Optimize for link quality (0x02) Packet type: 0x003f HV1 may be used HV2 may be used HV3 may be used EV3 may be used EV4 may be used EV5 may be used > HCI Event: Command Status (0x0f) plen 4 #310 [hci0] 49.440308 Accept Synchronous Connection Request (0x01|0x0029) ncmd 1 Status: Success (0x00) > HCI Event: Synchronous Connect Complete (0x2c) plen 17 #311 [hci0] 49.449308 Status: Success (0x00) Handle: 257 Address: 1C:CC:D6:E2:EA:80 (Xiaomi Communications Co Ltd) Link type: eSCO (0x02) Transmission interval: 0x0c Retransmission window: 0x04 RX packet length: 60 TX packet length: 60 Air mode: Transparent (0x03) < SCO Data TX: Handle 257 flags 0x00 dlen 60 #312 [hci0] 49.450421 < SCO Data TX: Handle 257 flags 0x00 dlen 60 #313 [hci0] 49.457927 > HCI Event: Max Slots Change (0x1b) plen 3 #314 [hci0] 49.460345 Handle: 256 Max slots: 5 < SCO Data TX: Handle 257 flags 0x00 dlen 60 #315 [hci0] 49.465453 > SCO Data RX: Handle 257 flags 0x00 dlen 60 #316 [hci0] 49.470502 > SCO Data RX: Handle 257 flags 0x00 dlen 60 #317 [hci0] 49.470519 < SCO Data TX: Handle 257 flags 0x00 dlen 60 #318 [hci0] 49.472996 > SCO Data RX: Handle 257 flags 0x00 dlen 60 #319 [hci0] 49.480412 < SCO Data TX: Handle 257 flags 0x00 dlen 60 #320 [hci0] 49.480492 < SCO Data TX: Handle 257 flags 0x00 dlen 60 #321 [hci0] 49.487989 > SCO Data RX: Handle 257 flags 0x00 dlen 60 #322 [hci0] 49.490303 < SCO Data TX: Handle 257 flags 0x00 dlen 60 #323 [hci0] 49.495496 > SCO Data RX: Handle 257 flags 0x00 dlen 60 #324 [hci0] 49.500304 > SCO Data RX: Handle 257 flags 0x00 dlen 60 #325 [hci0] 49.500311 Signed-off-by: Kiran K <[email protected]> Signed-off-by: Lokendra Singh <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
…format [ Upstream commit 06d213d ] For incoming SCO connection with transparent coding format, alt setting of CVSD is getting applied instead of Transparent. Before fix: < HCI Command: Accept Synchron.. (0x01|0x0029) plen 21 #2196 [hci0] 321.342548 Address: 1C:CC:D6:E2:EA:80 (Xiaomi Communications Co Ltd) Transmit bandwidth: 8000 Receive bandwidth: 8000 Max latency: 13 Setting: 0x0003 Input Coding: Linear Input Data Format: 1's complement Input Sample Size: 8-bit # of bits padding at MSB: 0 Air Coding Format: Transparent Data Retransmission effort: Optimize for link quality (0x02) Packet type: 0x003f HV1 may be used HV2 may be used HV3 may be used EV3 may be used EV4 may be used EV5 may be used > HCI Event: Command Status (0x0f) plen 4 #2197 [hci0] 321.343585 Accept Synchronous Connection Request (0x01|0x0029) ncmd 1 Status: Success (0x00) > HCI Event: Synchronous Connect Comp.. (0x2c) plen 17 #2198 [hci0] 321.351666 Status: Success (0x00) Handle: 257 Address: 1C:CC:D6:E2:EA:80 (Xiaomi Communications Co Ltd) Link type: eSCO (0x02) Transmission interval: 0x0c Retransmission window: 0x04 RX packet length: 60 TX packet length: 60 Air mode: Transparent (0x03) ........ > SCO Data RX: Handle 257 flags 0x00 dlen 48 #2336 [hci0] 321.383655 < SCO Data TX: Handle 257 flags 0x00 dlen 60 #2337 [hci0] 321.389558 > SCO Data RX: Handle 257 flags 0x00 dlen 48 #2338 [hci0] 321.393615 > SCO Data RX: Handle 257 flags 0x00 dlen 48 #2339 [hci0] 321.393618 > SCO Data RX: Handle 257 flags 0x00 dlen 48 #2340 [hci0] 321.393618 < SCO Data TX: Handle 257 flags 0x00 dlen 60 #2341 [hci0] 321.397070 > SCO Data RX: Handle 257 flags 0x00 dlen 48 #2342 [hci0] 321.403622 > SCO Data RX: Handle 257 flags 0x00 dlen 48 #2343 [hci0] 321.403625 > SCO Data RX: Handle 257 flags 0x00 dlen 48 #2344 [hci0] 321.403625 > SCO Data RX: Handle 257 flags 0x00 dlen 48 #2345 [hci0] 321.403625 < SCO Data TX: Handle 257 flags 0x00 dlen 60 #2346 [hci0] 321.404569 < SCO Data TX: Handle 257 flags 0x00 dlen 60 #2347 [hci0] 321.412091 > SCO Data RX: Handle 257 flags 0x00 dlen 48 #2348 [hci0] 321.413626 > SCO Data RX: Handle 257 flags 0x00 dlen 48 #2349 [hci0] 321.413630 > SCO Data RX: Handle 257 flags 0x00 dlen 48 #2350 [hci0] 321.413630 < SCO Data TX: Handle 257 flags 0x00 dlen 60 #2351 [hci0] 321.419674 After fix: < HCI Command: Accept Synchronou.. (0x01|0x0029) plen 21 #309 [hci0] 49.439693 Address: 1C:CC:D6:E2:EA:80 (Xiaomi Communications Co Ltd) Transmit bandwidth: 8000 Receive bandwidth: 8000 Max latency: 13 Setting: 0x0003 Input Coding: Linear Input Data Format: 1's complement Input Sample Size: 8-bit # of bits padding at MSB: 0 Air Coding Format: Transparent Data Retransmission effort: Optimize for link quality (0x02) Packet type: 0x003f HV1 may be used HV2 may be used HV3 may be used EV3 may be used EV4 may be used EV5 may be used > HCI Event: Command Status (0x0f) plen 4 #310 [hci0] 49.440308 Accept Synchronous Connection Request (0x01|0x0029) ncmd 1 Status: Success (0x00) > HCI Event: Synchronous Connect Complete (0x2c) plen 17 #311 [hci0] 49.449308 Status: Success (0x00) Handle: 257 Address: 1C:CC:D6:E2:EA:80 (Xiaomi Communications Co Ltd) Link type: eSCO (0x02) Transmission interval: 0x0c Retransmission window: 0x04 RX packet length: 60 TX packet length: 60 Air mode: Transparent (0x03) < SCO Data TX: Handle 257 flags 0x00 dlen 60 #312 [hci0] 49.450421 < SCO Data TX: Handle 257 flags 0x00 dlen 60 #313 [hci0] 49.457927 > HCI Event: Max Slots Change (0x1b) plen 3 #314 [hci0] 49.460345 Handle: 256 Max slots: 5 < SCO Data TX: Handle 257 flags 0x00 dlen 60 #315 [hci0] 49.465453 > SCO Data RX: Handle 257 flags 0x00 dlen 60 #316 [hci0] 49.470502 > SCO Data RX: Handle 257 flags 0x00 dlen 60 #317 [hci0] 49.470519 < SCO Data TX: Handle 257 flags 0x00 dlen 60 #318 [hci0] 49.472996 > SCO Data RX: Handle 257 flags 0x00 dlen 60 #319 [hci0] 49.480412 < SCO Data TX: Handle 257 flags 0x00 dlen 60 #320 [hci0] 49.480492 < SCO Data TX: Handle 257 flags 0x00 dlen 60 #321 [hci0] 49.487989 > SCO Data RX: Handle 257 flags 0x00 dlen 60 #322 [hci0] 49.490303 < SCO Data TX: Handle 257 flags 0x00 dlen 60 #323 [hci0] 49.495496 > SCO Data RX: Handle 257 flags 0x00 dlen 60 #324 [hci0] 49.500304 > SCO Data RX: Handle 257 flags 0x00 dlen 60 #325 [hci0] 49.500311 Signed-off-by: Kiran K <[email protected]> Signed-off-by: Lokendra Singh <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
…format [ Upstream commit 06d213d ] For incoming SCO connection with transparent coding format, alt setting of CVSD is getting applied instead of Transparent. Before fix: < HCI Command: Accept Synchron.. (0x01|0x0029) plen 21 #2196 [hci0] 321.342548 Address: 1C:CC:D6:E2:EA:80 (Xiaomi Communications Co Ltd) Transmit bandwidth: 8000 Receive bandwidth: 8000 Max latency: 13 Setting: 0x0003 Input Coding: Linear Input Data Format: 1's complement Input Sample Size: 8-bit # of bits padding at MSB: 0 Air Coding Format: Transparent Data Retransmission effort: Optimize for link quality (0x02) Packet type: 0x003f HV1 may be used HV2 may be used HV3 may be used EV3 may be used EV4 may be used EV5 may be used > HCI Event: Command Status (0x0f) plen 4 #2197 [hci0] 321.343585 Accept Synchronous Connection Request (0x01|0x0029) ncmd 1 Status: Success (0x00) > HCI Event: Synchronous Connect Comp.. (0x2c) plen 17 #2198 [hci0] 321.351666 Status: Success (0x00) Handle: 257 Address: 1C:CC:D6:E2:EA:80 (Xiaomi Communications Co Ltd) Link type: eSCO (0x02) Transmission interval: 0x0c Retransmission window: 0x04 RX packet length: 60 TX packet length: 60 Air mode: Transparent (0x03) ........ > SCO Data RX: Handle 257 flags 0x00 dlen 48 #2336 [hci0] 321.383655 < SCO Data TX: Handle 257 flags 0x00 dlen 60 #2337 [hci0] 321.389558 > SCO Data RX: Handle 257 flags 0x00 dlen 48 #2338 [hci0] 321.393615 > SCO Data RX: Handle 257 flags 0x00 dlen 48 #2339 [hci0] 321.393618 > SCO Data RX: Handle 257 flags 0x00 dlen 48 #2340 [hci0] 321.393618 < SCO Data TX: Handle 257 flags 0x00 dlen 60 #2341 [hci0] 321.397070 > SCO Data RX: Handle 257 flags 0x00 dlen 48 #2342 [hci0] 321.403622 > SCO Data RX: Handle 257 flags 0x00 dlen 48 #2343 [hci0] 321.403625 > SCO Data RX: Handle 257 flags 0x00 dlen 48 #2344 [hci0] 321.403625 > SCO Data RX: Handle 257 flags 0x00 dlen 48 #2345 [hci0] 321.403625 < SCO Data TX: Handle 257 flags 0x00 dlen 60 #2346 [hci0] 321.404569 < SCO Data TX: Handle 257 flags 0x00 dlen 60 #2347 [hci0] 321.412091 > SCO Data RX: Handle 257 flags 0x00 dlen 48 #2348 [hci0] 321.413626 > SCO Data RX: Handle 257 flags 0x00 dlen 48 #2349 [hci0] 321.413630 > SCO Data RX: Handle 257 flags 0x00 dlen 48 #2350 [hci0] 321.413630 < SCO Data TX: Handle 257 flags 0x00 dlen 60 #2351 [hci0] 321.419674 After fix: < HCI Command: Accept Synchronou.. (0x01|0x0029) plen 21 #309 [hci0] 49.439693 Address: 1C:CC:D6:E2:EA:80 (Xiaomi Communications Co Ltd) Transmit bandwidth: 8000 Receive bandwidth: 8000 Max latency: 13 Setting: 0x0003 Input Coding: Linear Input Data Format: 1's complement Input Sample Size: 8-bit # of bits padding at MSB: 0 Air Coding Format: Transparent Data Retransmission effort: Optimize for link quality (0x02) Packet type: 0x003f HV1 may be used HV2 may be used HV3 may be used EV3 may be used EV4 may be used EV5 may be used > HCI Event: Command Status (0x0f) plen 4 #310 [hci0] 49.440308 Accept Synchronous Connection Request (0x01|0x0029) ncmd 1 Status: Success (0x00) > HCI Event: Synchronous Connect Complete (0x2c) plen 17 #311 [hci0] 49.449308 Status: Success (0x00) Handle: 257 Address: 1C:CC:D6:E2:EA:80 (Xiaomi Communications Co Ltd) Link type: eSCO (0x02) Transmission interval: 0x0c Retransmission window: 0x04 RX packet length: 60 TX packet length: 60 Air mode: Transparent (0x03) < SCO Data TX: Handle 257 flags 0x00 dlen 60 #312 [hci0] 49.450421 < SCO Data TX: Handle 257 flags 0x00 dlen 60 #313 [hci0] 49.457927 > HCI Event: Max Slots Change (0x1b) plen 3 #314 [hci0] 49.460345 Handle: 256 Max slots: 5 < SCO Data TX: Handle 257 flags 0x00 dlen 60 #315 [hci0] 49.465453 > SCO Data RX: Handle 257 flags 0x00 dlen 60 #316 [hci0] 49.470502 > SCO Data RX: Handle 257 flags 0x00 dlen 60 #317 [hci0] 49.470519 < SCO Data TX: Handle 257 flags 0x00 dlen 60 #318 [hci0] 49.472996 > SCO Data RX: Handle 257 flags 0x00 dlen 60 #319 [hci0] 49.480412 < SCO Data TX: Handle 257 flags 0x00 dlen 60 #320 [hci0] 49.480492 < SCO Data TX: Handle 257 flags 0x00 dlen 60 #321 [hci0] 49.487989 > SCO Data RX: Handle 257 flags 0x00 dlen 60 #322 [hci0] 49.490303 < SCO Data TX: Handle 257 flags 0x00 dlen 60 #323 [hci0] 49.495496 > SCO Data RX: Handle 257 flags 0x00 dlen 60 #324 [hci0] 49.500304 > SCO Data RX: Handle 257 flags 0x00 dlen 60 #325 [hci0] 49.500311 Signed-off-by: Kiran K <[email protected]> Signed-off-by: Lokendra Singh <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
We can see that "bswap32: Takes an unsigned 32-bit number in either big- or little-endian format and returns the equivalent number with the same bit width but opposite endianness" in BPF Instruction Set Specification, so it should clear the upper 32 bits in "case 32:" for both BPF_ALU and BPF_ALU64. [root@linux fedora]# echo 1 > /proc/sys/net/core/bpf_jit_enable [root@linux fedora]# modprobe test_bpf Before: test_bpf: #313 BSWAP 32: 0x0123456789abcdef -> 0xefcdab89 jited:1 ret 1460850314 != -271733879 (0x5712ce8a != 0xefcdab89)FAIL (1 times) test_bpf: #317 BSWAP 32: 0xfedcba9876543210 -> 0x10325476 jited:1 ret -1460850316 != 271733878 (0xa8ed3174 != 0x10325476)FAIL (1 times) After: test_bpf: #313 BSWAP 32: 0x0123456789abcdef -> 0xefcdab89 jited:1 4 PASS test_bpf: #317 BSWAP 32: 0xfedcba9876543210 -> 0x10325476 jited:1 4 PASS Fixes: 4ebf921 ("LoongArch: BPF: Support unconditional bswap instructions") Acked-by: Hengqi Chen <[email protected]> Signed-off-by: Tiezhu Yang <[email protected]> Signed-off-by: Huacai Chen <[email protected]>
In case when is64 == 1 in emit(A64_REV32(is64, dst, dst), ctx) the generated insn reverses byte order for both high and low 32-bit words, resuling in an incorrect swap as indicated by the jit test: [ 9757.262607] test_bpf: #312 BSWAP 16: 0x0123456789abcdef -> 0xefcd jited:1 8 PASS [ 9757.264435] test_bpf: #313 BSWAP 32: 0x0123456789abcdef -> 0xefcdab89 jited:1 ret 1460850314 != -271733879 (0x5712ce8a != 0xefcdab89)FAIL (1 times) [ 9757.266260] test_bpf: #314 BSWAP 64: 0x0123456789abcdef -> 0x67452301 jited:1 8 PASS [ 9757.268000] test_bpf: #315 BSWAP 64: 0x0123456789abcdef >> 32 -> 0xefcdab89 jited:1 8 PASS [ 9757.269686] test_bpf: #316 BSWAP 16: 0xfedcba9876543210 -> 0x1032 jited:1 8 PASS [ 9757.271380] test_bpf: #317 BSWAP 32: 0xfedcba9876543210 -> 0x10325476 jited:1 ret -1460850316 != 271733878 (0xa8ed3174 != 0x10325476)FAIL (1 times) [ 9757.273022] test_bpf: #318 BSWAP 64: 0xfedcba9876543210 -> 0x98badcfe jited:1 7 PASS [ 9757.274721] test_bpf: #319 BSWAP 64: 0xfedcba9876543210 >> 32 -> 0x10325476 jited:1 9 PASS Fix this by forcing 32bit variant of rev32. Fixes: 1104247 ("bpf, arm64: Support unconditional bswap") Signed-off-by: Artem Savkov <[email protected]> Tested-by: Puranjay Mohan <[email protected]> Acked-by: Puranjay Mohan <[email protected]> Acked-by: Xu Kuohai <[email protected]> Message-ID: <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]>
[ Upstream commit a51cd6b ] In case when is64 == 1 in emit(A64_REV32(is64, dst, dst), ctx) the generated insn reverses byte order for both high and low 32-bit words, resuling in an incorrect swap as indicated by the jit test: [ 9757.262607] test_bpf: #312 BSWAP 16: 0x0123456789abcdef -> 0xefcd jited:1 8 PASS [ 9757.264435] test_bpf: #313 BSWAP 32: 0x0123456789abcdef -> 0xefcdab89 jited:1 ret 1460850314 != -271733879 (0x5712ce8a != 0xefcdab89)FAIL (1 times) [ 9757.266260] test_bpf: #314 BSWAP 64: 0x0123456789abcdef -> 0x67452301 jited:1 8 PASS [ 9757.268000] test_bpf: #315 BSWAP 64: 0x0123456789abcdef >> 32 -> 0xefcdab89 jited:1 8 PASS [ 9757.269686] test_bpf: #316 BSWAP 16: 0xfedcba9876543210 -> 0x1032 jited:1 8 PASS [ 9757.271380] test_bpf: #317 BSWAP 32: 0xfedcba9876543210 -> 0x10325476 jited:1 ret -1460850316 != 271733878 (0xa8ed3174 != 0x10325476)FAIL (1 times) [ 9757.273022] test_bpf: #318 BSWAP 64: 0xfedcba9876543210 -> 0x98badcfe jited:1 7 PASS [ 9757.274721] test_bpf: #319 BSWAP 64: 0xfedcba9876543210 >> 32 -> 0x10325476 jited:1 9 PASS Fix this by forcing 32bit variant of rev32. Fixes: 1104247 ("bpf, arm64: Support unconditional bswap") Signed-off-by: Artem Savkov <[email protected]> Tested-by: Puranjay Mohan <[email protected]> Acked-by: Puranjay Mohan <[email protected]> Acked-by: Xu Kuohai <[email protected]> Message-ID: <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit a51cd6b ] In case when is64 == 1 in emit(A64_REV32(is64, dst, dst), ctx) the generated insn reverses byte order for both high and low 32-bit words, resuling in an incorrect swap as indicated by the jit test: [ 9757.262607] test_bpf: #312 BSWAP 16: 0x0123456789abcdef -> 0xefcd jited:1 8 PASS [ 9757.264435] test_bpf: #313 BSWAP 32: 0x0123456789abcdef -> 0xefcdab89 jited:1 ret 1460850314 != -271733879 (0x5712ce8a != 0xefcdab89)FAIL (1 times) [ 9757.266260] test_bpf: #314 BSWAP 64: 0x0123456789abcdef -> 0x67452301 jited:1 8 PASS [ 9757.268000] test_bpf: #315 BSWAP 64: 0x0123456789abcdef >> 32 -> 0xefcdab89 jited:1 8 PASS [ 9757.269686] test_bpf: #316 BSWAP 16: 0xfedcba9876543210 -> 0x1032 jited:1 8 PASS [ 9757.271380] test_bpf: #317 BSWAP 32: 0xfedcba9876543210 -> 0x10325476 jited:1 ret -1460850316 != 271733878 (0xa8ed3174 != 0x10325476)FAIL (1 times) [ 9757.273022] test_bpf: #318 BSWAP 64: 0xfedcba9876543210 -> 0x98badcfe jited:1 7 PASS [ 9757.274721] test_bpf: #319 BSWAP 64: 0xfedcba9876543210 >> 32 -> 0x10325476 jited:1 9 PASS Fix this by forcing 32bit variant of rev32. Fixes: 1104247 ("bpf, arm64: Support unconditional bswap") Signed-off-by: Artem Savkov <[email protected]> Tested-by: Puranjay Mohan <[email protected]> Acked-by: Puranjay Mohan <[email protected]> Acked-by: Xu Kuohai <[email protected]> Message-ID: <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 8ecf3c1 ] Recent additions in BPF like cpu v4 instructions, test_bpf module exhibits the following failures: test_bpf: #82 ALU_MOVSX | BPF_B jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times) test_bpf: #83 ALU_MOVSX | BPF_H jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times) test_bpf: #84 ALU64_MOVSX | BPF_B jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times) test_bpf: #85 ALU64_MOVSX | BPF_H jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times) test_bpf: #86 ALU64_MOVSX | BPF_W jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times) test_bpf: #165 ALU_SDIV_X: -6 / 2 = -3 jited:1 ret 2147483645 != -3 (0x7ffffffd != 0xfffffffd)FAIL (1 times) test_bpf: #166 ALU_SDIV_K: -6 / 2 = -3 jited:1 ret 2147483645 != -3 (0x7ffffffd != 0xfffffffd)FAIL (1 times) test_bpf: #169 ALU_SMOD_X: -7 % 2 = -1 jited:1 ret 1 != -1 (0x1 != 0xffffffff)FAIL (1 times) test_bpf: #170 ALU_SMOD_K: -7 % 2 = -1 jited:1 ret 1 != -1 (0x1 != 0xffffffff)FAIL (1 times) test_bpf: #172 ALU64_SMOD_K: -7 % 2 = -1 jited:1 ret 1 != -1 (0x1 != 0xffffffff)FAIL (1 times) test_bpf: #313 BSWAP 16: 0x0123456789abcdef -> 0xefcd eBPF filter opcode 00d7 (@2) unsupported jited:0 301 PASS test_bpf: #314 BSWAP 32: 0x0123456789abcdef -> 0xefcdab89 eBPF filter opcode 00d7 (@2) unsupported jited:0 555 PASS test_bpf: #315 BSWAP 64: 0x0123456789abcdef -> 0x67452301 eBPF filter opcode 00d7 (@2) unsupported jited:0 268 PASS test_bpf: #316 BSWAP 64: 0x0123456789abcdef >> 32 -> 0xefcdab89 eBPF filter opcode 00d7 (@2) unsupported jited:0 269 PASS test_bpf: #317 BSWAP 16: 0xfedcba9876543210 -> 0x1032 eBPF filter opcode 00d7 (@2) unsupported jited:0 460 PASS test_bpf: #318 BSWAP 32: 0xfedcba9876543210 -> 0x10325476 eBPF filter opcode 00d7 (@2) unsupported jited:0 320 PASS test_bpf: #319 BSWAP 64: 0xfedcba9876543210 -> 0x98badcfe eBPF filter opcode 00d7 (@2) unsupported jited:0 222 PASS test_bpf: #320 BSWAP 64: 0xfedcba9876543210 >> 32 -> 0x10325476 eBPF filter opcode 00d7 (@2) unsupported jited:0 273 PASS test_bpf: #344 BPF_LDX_MEMSX | BPF_B eBPF filter opcode 0091 (@5) unsupported jited:0 432 PASS test_bpf: #345 BPF_LDX_MEMSX | BPF_H eBPF filter opcode 0089 (@5) unsupported jited:0 381 PASS test_bpf: #346 BPF_LDX_MEMSX | BPF_W eBPF filter opcode 0081 (@5) unsupported jited:0 505 PASS test_bpf: #490 JMP32_JA: Unconditional jump: if (true) return 1 eBPF filter opcode 0006 (@1) unsupported jited:0 261 PASS test_bpf: Summary: 1040 PASSED, 10 FAILED, [924/1038 JIT'ed] Fix them by adding missing processing. Fixes: daabb2b ("bpf/tests: add tests for cpuv4 instructions") Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://msgid.link/91de862dda99d170697eb79ffb478678af7e0b27.1709652689.git.christophe.leroy@csgroup.eu Signed-off-by: Sasha Levin <[email protected]>
It is a faster windows dragging and scrolling enabler for the X server. More details and some discussion can be found in the forum: http://www.raspberrypi.org/phpBB3/viewtopic.php?p=371531#p371531
I believe that these patches are reasonably clean and ready for use. But would be glad to fix/update them as seen necessary, hence posting for early review.