Analyze low batch size timing #538
The above capture was taken by toggling the USART3 RX/TX lines while running with a batch size of 1 (a rough sketch of that instrumentation follows the breakdown below).

As can be seen, the whole DSP process takes approximately 1.9 µs, which corresponds to a maximum sampling rate of approximately 526 kHz. Of that, servicing the DBM DMA transfers for data requires about 420 ns. If no DBM DMA transfer servicing were required, the existing livestream / DSP routines would take 1.48 µs, which corresponds to a maximum sampling rate of ~676 kHz. However, even without DBM DMA, some small amount of time would still be needed to read/write the SPI peripheral data registers, so in reality the overhead would be slightly higher.

Rough breakdown of time requirements (in nanoseconds) within DSP processing for a batch size of 1:

```mermaid
pie title Process time breakout (Batch size = 1)
    "DSP Routines": 900
    "Get DMA Buffers": 440
    "Prepare livestream": 400
    "Update Telemetry": 120
    "Exit": 20
    "Entry": 120
```
Interesting. I seem to remember much less time for DSP. ~1000 insns is a lot. Might be worthwhile to check back against lines 247 to 284 in 0fd442e.
Ah. I think the big difference in DSP load is the signal generator. Also, I generally use nightly and the cortex-m/inline-asm feature. I've found DWT CYCCNT to be a nicer tool for these measurements than GPIO toggling. I think it could well be less overhead.
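For illustration, a minimal sketch of CYCCNT-based measurement with the cortex-m crate (method names assume cortex-m 0.7; older releases used `get_cycle_count()`), not taken from the project itself:

```rust
use cortex_m::peripheral::{DWT, Peripherals};

fn measure() {
    let mut cp = Peripherals::take().unwrap();
    cp.DCB.enable_trace();          // the cycle counter requires trace to be enabled
    cp.DWT.enable_cycle_counter();

    let start = DWT::cycle_count();
    // ... section under test, e.g. the ADC/DAC closure ...
    let elapsed = DWT::cycle_count().wrapping_sub(start); // elapsed core clock cycles
    // report `elapsed` via RTT or telemetry instead of toggling a pin
    let _ = elapsed;
}
```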
My calculations show the DSP section taking approximately 360 insns - the rest of the overhead here is from the various other things we've put into the DSP routine, such as telemetry, signal generation, DMA servicing, etc.
The 1.9 µs you measure is about 760 insns for "DSP Routines". That doesn't include DMA servicing and telemetry, right?
Ah. No. The 1.9 µs you call "DSP process" is not "DSP routines". |
Isn't signal generation part of "DSP Routines" in your measurement? |
"DSP Routines" is inclusive of signal generation - it's the amount of time the closure on the ADCs/DACs run: // Start timer
(adc0, adc1, dac0, adc1).lock(() {
// Stop & Reset timer, this is "Get DMA Buffers"
})
// Stop & Reset timer, this is called "DSP Routines"
telemetry.latest_adcs = [adcs[0][0], adcs[1][0]];
telemetry.latest_dacs = [dacs[0][0], dacs[1][0]];
// Stop timer, this is "Update telemetry" I'll try to get a full diff just to show things. I want to rework it so these calculations just get reported via telemetry instead of manually probing debug pins as well. |
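If the durations end up being reported via telemetry, one possible shape is sketched below; the `Timing` struct and its field names are hypothetical and not part of the existing telemetry schema:

```rust
// Hypothetical container for the measured cycle counts, to be serialized as
// part of the periodic telemetry message instead of probing debug pins.
#[derive(Default, Clone, Copy)]
pub struct Timing {
    pub get_dma_buffers: u32,   // cycles spent acquiring the DBM DMA buffers
    pub dsp_routines: u32,      // cycles spent inside the ADC/DAC closure
    pub update_telemetry: u32,  // cycles spent updating latest_adcs/latest_dacs
}
```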
Analyze the timing requirements when using the DMA sample acquisition architecture for ADC/DAC operations for low batch sizes (e.g. 1 or 2).
If possible, we may want to eliminate the peripheral-data -> RAM DMA operation, as this would remove processing overhead from the loop. Instead, for these low batch sizes, the data can simply be transacted manually with the peripherals directly.
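As a rough illustration of the "no RX/TX DMA" path for a batch size of 1, each sample could be read from and written to the SPI data registers directly inside the process loop. The closures below stand in for the actual register accesses (RXDR/TXDR on the STM32H7 SPI) and are not existing APIs:

```rust
// Hypothetical sketch: one sample transacted directly with the peripherals,
// replacing the peripheral <-> RAM DMA transfers for low batch sizes.
fn process_one_sample(
    read_adc_word: impl Fn() -> u16,   // e.g. a direct read of the SPI RXDR register
    write_dac_word: impl Fn(u16),      // e.g. a direct write of the SPI TXDR register
    dsp: impl Fn(u16) -> u16,          // existing per-sample DSP routine
) {
    let adc_code = read_adc_word();    // replaces the peripheral -> RAM DMA read
    let dac_code = dsp(adc_code);
    write_dac_word(dac_code);          // replaces the RAM -> peripheral DMA write
}
```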