Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix for issue 21 on firmware repo #329

Closed
wants to merge 97 commits into from

Conversation

cleverca22
Copy link

this is a simply fix i found for raspberrypi/firmware#21

popcornmix and others added 30 commits July 3, 2013 01:00
Signed-off-by: popcornmix <[email protected]>
Signed-off-by: popcornmix <[email protected]>
Signed-off-by: popcornmix <[email protected]>
Signed-off-by: popcornmix <[email protected]>
Signed-off-by: popcornmix <[email protected]>
    Signed-off-by: popcornmix <[email protected]>
Experiments show that it doesn't really take that long to sync, so we
can reduce the poll interval slightly. Might improve performance a bit.
The custom clock handling code is redundant and buggy. The MMC/SDHCI
subsystem does a better job than it, so remove it for good.
Some additional quirks are needed for correct operation.
There's no SDHCI capabilities register documented, and it always reads
zero, so add SDHCI_QUIRK_MISSING_CAPS. Apparently
SDHCI_QUIRK_NO_HISPD_BIT is needed for many cards to work correctly in
high-speed mode, so add it as well.
Add a parameter to disable high-speed mode for the few cards that
still might have problems. High-speed mode is enabled by default.
80 MHz clock isnt't suited well to be dividable to get SD clocks of 25
MHz (default mode) or 50 MHz (high speed mode). 50 MHz are perfect to
drive the SD interface at ideal frequencies.
Commit d64b84c by accident reduced the maximum overall DMA sync
timeout. The maximum overall timeout was reduced from 100ms to 30ms,
which isn't enough for many cards. Increase it to 150ms, just to be
extra safe. According to commit 872a8ff in the MMC subsystem, some
cards require crazy long timeouts (3s), but as we're busy-waiting,
and shouldn't delay for such a long time, let's hope 150ms will be
enough for most cards.
…we can reliably recover from, so down_interruptable is not a safe call.
The additional FIFO might speed up transfers in some cases.
There are issues with both single block reads (missed completion)
and writes (data loss in some cases!). Just don't do single block
transfers anymore, and treat them like multiblock transfers. This
adds a quirk for this and uses it.
…. Should give about 10% more ARM performance.

Thanks to Gordon and Costas
popcornmix and others added 26 commits July 3, 2013 01:26
This corrects a bug where if a single active non-periodic endpoint
had at least one transaction in its qh, on frnum == MAX_FRNUM the qh
would get skipped and never get queued again. This would result in
a silent device until error detection (automatic or otherwise) would
either reset the device or flush and requeue the URBs.

Additionally the NAK holdoff was enabled for all transactions - this
would potentially stall a HS endpoint for 1ms if a previous error state
enabled this interrupt and the next response was a NAK. Fix so that
only split transactions get held off.
add mmap support and some cleanups to bcm2835 ALSA driver
This is designed for quick compiling when developing.
No modules are needed and it includes all Pi specific drivers
Especially on platforms with a slower CPU but a relatively high
framebuffer fill bandwidth, like current ARM devices, the existing
console monochrome imageblit function used to draw console text is
suboptimal for common pixel depths such as 16bpp and 32bpp. The existing
code is quite general and can deal with several pixel depths. By creating
special case functions for 16bpp and 32bpp, by far the most common pixel
formats used on modern systems, a significant speed-up is attained
which can be readily felt on ARM-based devices like the Raspberry Pi
and the Allwinner platform, but should help any platform using the
fb layer.

The special case functions allow constant folding, eliminating a number
of instructions including divide operations, and allow the use of an
unrolled loop, eliminating instructions with a variable shift size,
reducing source memory access instructions, and eliminating excessive
branching. These unrolled loops also allow much better code optimization
by the C compiler. The code that selects which optimized variant is used
is also simplified, eliminating integer divide instructions.

The speed-up, measured by timing 'cat file.txt' in the console, varies
between 40% and 70%, when testing on the Raspberry Pi and Allwinner
ARM-based platforms, depending on font size and the pixel depth, with
the greater benefit for 32bpp.

Signed-off-by: Harm Hanemaaijer <[email protected]>
Based on the patch authored by Ali Gholami Rudi at
    https://lkml.org/lkml/2009/7/13/153

Provide an ioctl for userspace applications, but only if this operation
is hardware accelerated (otherwide it does not make any sense).

Signed-off-by: Siarhei Siamashka <[email protected]>
Based on http://www.raspberrypi.org/phpBB3/viewtopic.php?p=62425#p62425
Also used Simon's dmaer_master module as a reference for tweaking DMA
settings for better performance.

For now busylooping only. IRQ support might be added later.
With non-overclocked Raspberry Pi, the performance is ~360 MB/s
for simple copy or ~260 MB/s for two-pass copy (used when dragging
windows to the right).

In the case of using DMA channel 0, the performance improves
to ~440 MB/s.

For comparison, VFP optimized CPU copy can only do ~114 MB/s in
the same conditions (hindered by reading uncached source buffer).

Signed-off-by: Siarhei Siamashka <[email protected]>
… handler

usb_hcd_unlink_urb_from_ep must be called with the HCD lock held.  Calling it
asynchronously in the tasklet was not safe (regression in
c4564d4).

This change unlinks it from the endpoint prior to queueing it for handling in
the tasklet, and also adds a check to ensure the urb is OK to be unlinked
before doing so.

NULL pointer dereference kernel oopses had been observed in usb_hcd_giveback_urb
when a USB device was unplugged/replugged during data transfer.  This effect
was reproduced using automated USB port power control, hundreds of replug
events were performed during active transfers to confirm that the problem was
eliminated.
This commit adds a FIQ implementaion that schedules
the split transactions using a FIQ so we don't get
held off by the interrupt latency of Linux
Fix for deprecated/undefined create_proc_entry in RTL8192cu driver
The current timeout is being hit with some cards that complete successfully with a longer timeout.
The timeout is not handled well, and is believed to be a code path that causes corruption.
872a8ff suggests that crappy cards can take up to 3 seconds to respond
@popcornmix
Copy link
Collaborator

@P33M @ghollingworth
Any comments?

@ghollingworth
Copy link

Happy to just comment it out, it's a useless interface anyway...

Gordon

@popcornmix
Copy link
Collaborator

@cleverca22
The PR needed rebasing, so I just made the edit manually.
It should be in all the source trees now, and the "next" firmware tree has been rebuild.
("master" will be soon).
Thanks.

@popcornmix popcornmix closed this Jul 16, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.