Releases: animetosho/ParPar
v0.4.3
v0.4.2
- Various fixes
- Add Apple M1 optimised implementation of the CLMul GF16 kernel
- Add RISC-V Vector implementation of the Shuffle128 GF16 kernel
- Add advanced option to specify exact recovery exponents to use (via --recovery-exponents)
Note: linux-glibc builds are linked to glibc 2.35 (Ubuntu 22.04). linux-static builds don't support OpenCL.
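For readers unfamiliar with what a GF16 kernel computes, the following is a minimal scalar sketch in Python, not ParPar's actual code. It assumes the GF(2^16) field polynomial 0x1100B defined by the PAR2 specification, and the function names (`clmul`, `gf16_reduce`, `gf16_mul`, `gf16_muladd`) are hypothetical. The CLMul and Shuffle128 kernels listed above are vectorised ways of performing the same multiply-accumulate over whole regions.

```python
# Scalar reference for GF(2^16) arithmetic, assuming the PAR2 field
# polynomial x^16 + x^12 + x^3 + x + 1 (0x1100B). Illustrative only.
GF16_POLY = 0x1100B


def clmul(a: int, b: int) -> int:
    """Carry-less (XOR) multiplication of two 16-bit values."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def gf16_reduce(x: int) -> int:
    """Reduce a carry-less product (up to 31 bits) modulo GF16_POLY."""
    for bit in range(30, 15, -1):
        if x & (1 << bit):
            x ^= GF16_POLY << (bit - 16)
    return x


def gf16_mul(a: int, b: int) -> int:
    """Multiply two field elements."""
    return gf16_reduce(clmul(a, b))


def gf16_muladd(dst: list, src: list, coeff: int) -> None:
    """Region multiply-accumulate: dst[i] ^= src[i] * coeff."""
    for i, word in enumerate(src):
        dst[i] ^= gf16_mul(word, coeff)
```

For example, `gf16_mul(2, 0x8000)` overflows into bit 16 and reduces to `0x100B`.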
v0.4.1
v0.4.0 bug fixes
Note: linux-glibc builds are linked to glibc 2.35 (Ubuntu 22.04). linux-static builds don't support OpenCL.
v0.4.0
Note: OpenCL support is not possible on static Linux builds.
Release Highlights
- Overhaul GF16 processing backend
- Remove GF-Complete components, rewrite framework and improve general region handling
- Fully separate ISA compilation units to support dynamic dispatch (also enables static Linux builds)
- Add dot-product, region interleaving, chunk packing and prefetching optimisations
- Add new calculation kernels: CLMul for NEON, Affine AVX variant for x86 (for Alder Lake and later CPUs) and experimental Shuffle2x/Affine2x variants
- Add ARM SVE and SVE2 support
- More optimisations during initialisation for various kernels, coefficient computation, and tweaked loop-tiling parameters
- Improve transposition performance for Xor-Jit kernel, plus add single-use JIT optimisations
- Rework multi-threading and remove OpenMP dependency; threading now manually managed via libuv
- Add experimental OpenCL backend for GPGPU acceleration
- The OpenCL backend is disabled by default and must be manually enabled
- The OpenCL backend has been observed to generate incorrect output, particularly on non-Windows hosts; use with caution!
- Add internal checksumming support to help detect memory errors during GF16 computation
- Improve concurrency when transferring to/from GF backend and hashing
- Redo MD5/CRC32 implementation for better optimisation
- Input hashing now uses a stitched 2xMD5+CRC32 implementation
- Add ASM MD5 implementation for x64/ARMv6/AArch64 (unsupported in MSVC)
- Add ARM NEON and SVE2 MD5 implementations
- Full SIMD width multi-buffer implementations
- Remove node-yencode dependency
- Add support for concurrently processing multiple files to work around bottlenecks with single threaded input hashing
- Support concurrent I/O requests with chunked reading
- Support for compiling under MSVC/Clang-CL for Windows ARM/64 targets
- Separate GUI frontend available
- Improve progress display accuracy
- Various bug fixes
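To illustrate the input hashing item above, here is a rough Python sketch of computing a whole-file MD5 together with a per-slice MD5 and CRC32 in one pass over the data. It is not ParPar's implementation: the function name `hash_slices` is hypothetical, it assumes the two MD5 streams are the file hash and the slice hash, and it assumes the final short slice is zero-padded before its checksums are taken. A stitched implementation interleaves these computations at the instruction level; the sketch only shows which values are produced.

```python
import hashlib
import zlib


def hash_slices(data: bytes, slice_size: int):
    """Return (file_md5, [(slice_md5, slice_crc32), ...]) in a single pass."""
    file_md5 = hashlib.md5()
    slices = []
    for offset in range(0, len(data), slice_size):
        chunk = data[offset:offset + slice_size]
        # The file hash covers the raw bytes only.
        file_md5.update(chunk)
        # Assumption: slice checksums cover the slice zero-padded to full size.
        padded = chunk.ljust(slice_size, b"\0")
        slices.append((hashlib.md5(padded).digest(), zlib.crc32(padded)))
    return file_md5.digest(), slices
```

Reading each chunk once and feeding it to all hashes avoids a second pass over the input, which matters when input hashing is the bottleneck.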
v0.3.2
This release mostly fixes build issues on ARM platforms.
If you're looking for Windows binaries, use v0.3.1 below as it's the same as v0.3.2.
v0.3.1
Update test arguments, add tweaks + support caching par2cmdline results
v0.3.0
Mark v0.3.0
v0.2.1
Fixes & tweaks from previous version.
v0.2.0
Most features available, can be considered an alpha release.
v0.1.0
Early proof-of-concept