Benchmark for FFT on Kalray's MPPA
kalray/Benchmark_FFT
# Kalray Inc.: http://www.kalrayinc.com/
#
# Bostan Fast Fourier Transform Implementation (64K complex-float)
# Developed by J. Hascoet
#
# Description:
# This distributed FFT implementation uses the 6-step method to split the work
# over the compute clusters of the MPPA processor.
# The 6-step FFT method is well described in [1], page 7.
# Note that our current implementation only supports input sizes that are
# powers of 4, so that the 1D array maps onto a square matrix to transpose
# (the transposes are steps 1, 4 and 6 of the 6-step method).
# In this benchmark, the I/O cluster generates the input buffer in DDR.
# The input is a 1D complex array whose imaginary parts are zero and whose
# real parts are random numbers.
# First, each compute cluster fetches one tile of the 1D array interpreted as
# a 2D matrix (tiling).
# Second, the compute clusters (CCs) all execute the 6-step FFT. All twiddle
# factors are precomputed.
# Finally, the result is written back to DDR, where the I/O cluster runs a
# sequential FFT and checks the correctness of the distributed result.
#
# References:
# [1] 'https://www.nas.nasa.gov/assets/pdf/techreports/1989/rnr-89-004.pdf'
#
# Performance measures:
# We measure both the DDR access time (I/O) and the computation time on the
# MPPA compute clusters.
# There is no batching (batch size 1), so the throughput is simply the inverse
# of the latency: this is a low-latency implementation.
# The time spent initializing the twiddle-factor lookup table (LUT) is not
# included in the measurements (it is part of system initialization).
#
# Requirements:
# This benchmark requires Kalray's AccessCore toolchain and a Kalray MPPA.
# Validated with Kalray's AccessCore >= 2.9.0.
#
# Multi-cluster (matrix topology condition):
# Only nb_cluster = 1, 2, 4, 8 or 16 is supported (selected at build time).
#
# Intra-cluster:
# The number of cores per cluster can range from 1 to 16 (nb_core variable,
# set at build time).
#
# How to execute on MPPA hardware:
# By default, 16 clusters with 16 cores each are used.
# Using JTAG only (no PCIe, standalone mode):
make nb_core=<NUM_CORE> nb_cluster=<NUM_CLUSTER> [stand_alone_board=<ab01|ab04>] run_jtag
#
# Using PCIe:
make nb_core=<NUM_CORE> nb_cluster=<NUM_CLUSTER> run_pcie
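# Sequential reference sketch (not part of the Kalray sources): the six steps
# of [1] applied to a square m x m view of the input, written in plain Python
# rather than the benchmark's C code for the MPPA. The helper names (fft,
# transpose, twiddle_lut, six_step_fft) are hypothetical, and the index
# mapping is our reading of [1]. The row FFTs of steps 2 and 5, which the
# benchmark presumably distributes over the compute clusters, run here one
# after another. The final comparison with a plain FFT mirrors the I/O
# cluster's correctness check.

```python
import cmath
import math
import random


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of 2."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out


def transpose(a):
    """Transpose a square matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def twiddle_lut(m):
    """Twiddle factors w_N^(r*c) with N = m*m, computed once before timing."""
    n = m * m
    return [[cmath.exp(-2j * math.pi * r * c / n) for c in range(m)]
            for r in range(m)]


def six_step_fft(x, lut=None):
    """Six-step FFT of a 1D array whose length N = m*m is a perfect square."""
    n = len(x)
    m = math.isqrt(n)
    if m * m != n:
        raise ValueError("length must be a perfect square, e.g. a power of 4")
    if lut is None:
        lut = twiddle_lut(m)
    a = [list(x[r * m:(r + 1) * m]) for r in range(m)]  # row-major m x m view
    a = transpose(a)                           # step 1: transpose
    a = [fft(row) for row in a]                # step 2: m FFTs of size m
    a = [[v * lut[r][c] for c, v in enumerate(row)]
         for r, row in enumerate(a)]           # step 3: twiddle multiply
    a = transpose(a)                           # step 4: transpose
    a = [fft(row) for row in a]                # step 5: m FFTs of size m
    a = transpose(a)                           # step 6: transpose
    return [v for row in a for v in row]       # flatten, natural order


# Same kind of input as the benchmark: random real parts, zero imaginary parts.
random.seed(0)
N = 4 ** 4  # 256 points here; the benchmark itself uses 4 ** 8 = 65536
x = [complex(random.random(), 0.0) for _ in range(N)]
err = max(abs(a - b) for a, b in zip(six_step_fft(x), fft(x)))
print("max abs error:", err)
```

# Each step works on whole rows of the matrix. That is what makes the method
# fit tiled, distributed memory. As in the benchmark, the twiddle table is
# built once, outside the timed path.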
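# Another hypothetical sketch: one way to tile the 256 x 256 matrix of the
# 64K case. Each of the nb_cluster clusters gets a contiguous block of rows,
# and those rows are then split over the nb_core cores of the cluster. The
# partition_rows helper and this row-block layout are assumptions made for
# illustration. The README does not specify how the Kalray code assigns rows.

```python
VALID_NB_CLUSTER = (1, 2, 4, 8, 16)  # build-time restriction stated above


def partition_rows(m, nb_cluster, nb_core):
    """Hypothetical tiling of an m x m matrix: each cluster gets a contiguous
    block of m // nb_cluster rows, split as evenly as possible over nb_core
    cores. Returns tiles[cluster][core] = (first_row, end_row), half-open."""
    if nb_cluster not in VALID_NB_CLUSTER:
        raise ValueError("nb_cluster must be 1, 2, 4, 8 or 16")
    if not 1 <= nb_core <= 16:
        raise ValueError("nb_core must be between 1 and 16")
    if m % nb_cluster != 0:
        raise ValueError("m must be divisible by nb_cluster")
    rows_per_cluster = m // nb_cluster
    q, r = divmod(rows_per_cluster, nb_core)
    tiles = []
    for c in range(nb_cluster):
        start = c * rows_per_cluster
        ranges = []
        for k in range(nb_core):
            size = q + (1 if k < r else 0)  # the first r cores take one extra row
            ranges.append((start, start + size))
            start += size
        tiles.append(ranges)
    return tiles


# 64K points -> 256 x 256 matrix; the default build uses 16 clusters x 16 cores.
tiles = partition_rows(256, 16, 16)
print(tiles[0][0], tiles[15][15])  # prints (0, 1) (255, 256)
```

# Because nb_cluster is restricted to powers of two up to 16, every cluster
# gets the same number of rows (256 / nb_cluster). nb_core need not divide
# that count, so in this sketch the first cores simply take one extra row.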