Commit

Update documentation & test suite (#11)
* update readme

* update rustdoc homepage

* moved cpp files

* detailed KernelArgs doc

* updated functor docs

* replaced SerialForKernelType in code

* fix doc test

* replaced ForKernel in code

* dispatch module doc

* routines doc

* finished doc of the routines module

* view module doc

* view parameters doc

* misc formatting

* update test CI to include features

* fix ci?
imrn99 authored Dec 15, 2023
1 parent 87ce049 commit 607a99d
Showing 12 changed files with 508 additions and 186 deletions.
8 changes: 8 additions & 0 deletions .github/workflows/simple-ci.yml
@@ -37,6 +37,14 @@ jobs:
- uses: actions-rs/cargo@v1
with:
command: test
- uses: actions-rs/cargo@v1
with:
command: test
args: --features rayon
- uses: actions-rs/cargo@v1
with:
command: test
args: --features threads

fmt:
name: Rustfmt
27 changes: 8 additions & 19 deletions README.md
@@ -9,17 +9,13 @@ proof and verification of that statement.

## Scope of the Project

~~The main focus of this Proof-of-Concept is the architecture and approach used by
Kokkos for data management. While multiple targets support (Serial, [rayon][2], OpenMP)
could be interesting, it is not the priority.~~

Rudimentary data structure implementation being done, the goal is now to write a simple
program using a `parallel_for` statement with satisfying portability as defined by Kokkos.

Additionally, some features of Kokkos are not reproducible in Rust (GPU targeting,
templating); these create limits for the implementation that may or may not be bypassed.
This makes limit-testing a fundamental part of the project.
The goal of this project is not to produce an entire Kokkos implementation nor to
replicate the existing C++ library. While the current C++ source code is interesting
to use as inspiration, the main reference is the model description.

Additionally, because of language-specific features (Rust's strict compilation rules,
C++ templates), you can expect the underlying implementation of concepts to be
vastly different.

## Quickstart

@@ -97,16 +93,9 @@ do.

## References

### View Implementation

- `ndarray` Rust implementation: [link][NDARRAY]
- Const generics documentation from The Rust Reference: [link][CONSTG]
- `move` keyword semantic & implementation: [link][MOVE]
- The Kokkos Wiki: [link][1]
- `rayon` crate documentation: [link][2]


[1]: https://kokkos.github.io/kokkos-core-wiki/index.html
[2]: https://docs.rs/rayon/latest/rayon/

[NDARRAY]: https://docs.rs/ndarray/latest/ndarray/
[CONSTG]: https://doc.rust-lang.org/reference/items/generics.html
[MOVE]: https://stackoverflow.com/questions/30288782/what-are-move-semantics-in-rust
9 changes: 6 additions & 3 deletions build.rs
@@ -17,7 +17,7 @@ fn main() {

cxx_build::bridge("src/lib.rs")
.compiler(compiler)
.file("src/hello.cpp")
.file("src/cpp/hello.cpp")
.flag_if_supported("-std=c++20")
.flag(ompflags) // clang
.compile("poc-cc");
@@ -36,7 +36,10 @@ fn main() {
}
_ => unimplemented!(),
}
// main
println!("cargo:rerun-if-changed=src/main.rs");
println!("cargo:rerun-if-changed=src/hello.cpp");
println!("cargo:rerun-if-changed=include/hello.hpp");
// cpp files
println!("cargo:rerun-if-changed=src/cpp/hello.cpp");
// header files
println!("cargo:rerun-if-changed=src/include/hello.hpp");
}
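
The part of `build.rs` that selects the compiler and the OpenMP flags is collapsed in this diff, so the following is only a minimal sketch of how a build script might read `CXX` and derive a flag from it. The helper names `cxx_compiler` and `openmp_flag`, the default compiler argument, and the name-to-flag mapping are assumptions made for illustration; they are not taken from the crate.

```rust
use std::env;
use std::path::Path;

/// Hypothetical helper: read the `CXX` environment variable,
/// falling back to `default` when it is unset or not valid Unicode.
pub fn cxx_compiler(default: &str) -> String {
    env::var("CXX").unwrap_or_else(|_| default.to_string())
}

/// Hypothetical mapping from a compiler executable to an OpenMP flag.
/// Only the file name of the path is inspected; unknown compilers yield `None`
/// (the real build script falls through to `unimplemented!()` instead).
pub fn openmp_flag(compiler: &str) -> Option<&'static str> {
    match Path::new(compiler).file_name().and_then(|name| name.to_str()) {
        Some("clang++") | Some("g++") => Some("-fopenmp"),
        _ => None,
    }
}
```

Returning an `Option` rather than panicking is a design choice of this sketch: it lets the caller decide whether an unknown compiler is a hard error.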
2 changes: 1 addition & 1 deletion src/hello.cpp → src/cpp/hello.cpp
@@ -1,4 +1,4 @@
#include "poc-kokkos-rs/include/hello.hpp"
#include "poc-kokkos-rs/src/include/hello.hpp"
#include "omp.h"

#include <cstdio>
106 changes: 95 additions & 11 deletions src/functor.rs
@@ -3,36 +3,120 @@
//! This module contains all functor and kernel related code. Its content
//! is highly dependent on the features enabled since the traits that a
//! kernel must satisfy change entirely depending on the backend used.
//!
//! Kernel signatures are handled using `cargo` features. Using conditional
//! compilation, the exact traits that kernels must implement are adjusted according
//! to the backend used to dispatch statements.
//!
//! In order to have actual closures match the required trait implementation,
//! the same mechanism is used to define operations on [`Views`][crate::view].

/// Kernel argument types
#[cfg(doc)]
use crate::routines::parameters::RangePolicy;

/// Kernel argument enum
///
/// In the Kokkos library, there is a finite number of kernel signatures.
/// Each is associated to/determined by a given execution policy.
/// In order to have kernel genericity in Rust, without introducing overhead
/// due to downcasting, the solution was to define kernel arguments as a
/// struct-like enum.
///
/// Until some work is done to have a better solution[^sol1][^sol2], this will
/// be an enum and kernels will be written in an idiomatic way.
/// ### Example
///
/// [^sol1]: Current tracking issue for upcasting implementation: <https://github.com/rust-lang/rust/issues/65991>
/// One-dimensional kernel:
/// ```
/// // Range is defined in the execution policy
/// use poc_kokkos_rs::functor::KernelArgs;
///
/// [^sol2]: Current tracking issue to allow impl trait usage in types aliases: <https://github.com/rust-lang/rust/issues/63063>
/// let kern = |arg: KernelArgs<1>| match arg {
/// KernelArgs::Index1D(i) => {
/// // body of the kernel
/// println!("Hello from iteration {i}")
/// },
/// KernelArgs::IndexND(_) => unimplemented!(),
/// KernelArgs::Handle => unimplemented!(),
/// };
/// ```
///
/// 3D kernel:
/// ```
/// use poc_kokkos_rs::functor::KernelArgs;
///
/// // Use the array
/// let kern = |arg: KernelArgs<3>| match arg {
/// KernelArgs::Index1D(_) => unimplemented!(),
/// KernelArgs::IndexND(idx) => { // idx: [usize; 3]
/// // body of the kernel
/// println!("Hello from iteration {idx:?}")
/// },
/// KernelArgs::Handle => unimplemented!(),
/// };
///
/// // Decompose the array
/// let kern = |arg: KernelArgs<3>| match arg {
/// KernelArgs::Index1D(_) => unimplemented!(),
/// KernelArgs::IndexND([i, j, k]) => { // i,j,k: usize
/// // body of the kernel
/// println!("Hello from iteration {i},{j},{k}");
/// },
/// KernelArgs::Handle => unimplemented!(),
/// };
/// ```
pub enum KernelArgs<const N: usize> {
/// Arguments of a one-dimensional kernel (e.g. a RangePolicy).
/// Arguments of a one-dimensional kernel (e.g. a [RangePolicy][RangePolicy::RangePolicy]).
Index1D(usize),
/// Arguments of a `N`-dimensional kernel (e.g. a MDRangePolicy).
/// Arguments of a `N`-dimensional kernel (e.g. a [MDRangePolicy][RangePolicy::MDRangePolicy]).
IndexND([usize; N]),
/// Arguments of a team-based kernel.
Handle,
}

cfg_if::cfg_if! {
if #[cfg(feature = "rayon")] {
/// `rayon`-specific kernel type.
/// `parallel_for` kernel type. Depends on enabled feature(s).
///
/// This type alias is configured according to the enabled feature(s) in order to adjust
/// the signatures of kernels to match the requirements of the underlying dispatch routines.
///
/// ### Possible Values
/// - `rayon` feature enabled: `Box<dyn Fn(KernelArgs<N>) + Send + Sync + 'a>`
/// - `threads` feature enabled: `Box<dyn Fn(KernelArgs<N>) + Send + 'a>`
/// - no feature enabled: fall back to [`SerialForKernelType`][SerialForKernelType]
///
/// **Current version**: `rayon`
pub type ForKernelType<'a, const N: usize> = Box<dyn Fn(KernelArgs<N>) + Send + Sync + 'a>;
} else if #[cfg(feature = "threads")] {
/// Standard threads specific kernel type.
/// `parallel_for` kernel type. Depends on enabled feature(s).
///
/// This type alias is configured according to the enabled feature(s) in order to adjust
/// the signatures of kernels to match the requirements of the underlying dispatch routines.
///
/// ### Possible Values
/// - `rayon` feature enabled: `Box<dyn Fn(KernelArgs<N>) + Send + Sync + 'a>`
/// - `threads` feature enabled: `Box<dyn Fn(KernelArgs<N>) + Send + 'a>`
/// - no feature enabled: fall back to [`SerialForKernelType`][SerialForKernelType]
///
/// **Current version**: `threads`
pub type ForKernelType<'a, const N: usize> = Box<dyn Fn(KernelArgs<N>) + Send + 'a>;
} else {
/// Fall back kernel type.
/// `parallel_for` kernel type. Depends on enabled feature(s).
///
/// This type alias is configured according to the enabled feature(s) in order to adjust
/// the signatures of kernels to match the requirements of the underlying dispatch routines.
///
/// ### Possible Values
/// - `rayon` feature enabled: `Box<dyn Fn(KernelArgs<N>) + Send + Sync + 'a>`
/// - `threads` feature enabled: `Box<dyn Fn(KernelArgs<N>) + Send + 'a>`
/// - no feature enabled: fall back to [`SerialForKernelType`][SerialForKernelType]
///
/// **Current version**: no feature
pub type ForKernelType<'a, const N: usize> = SerialForKernelType<'a, N>;
}
}
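
To illustrate why the parallel variants of the alias carry `Send` and `Sync` bounds, here is a minimal sketch of a dispatcher that shares one boxed kernel across scoped threads. It is not the crate's dispatch code: `SharedKernel`, `threaded_for_1d`, and `parallel_sum_indices` are hypothetical names, the enum is a local copy so that the example compiles on its own, and the chunking strategy is an assumption. The sketch uses the `Send + Sync` bound listed for the `rayon` feature, because sharing a reference to the kernel between threads requires `Sync`.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Local copy of the enum from the diff, so the sketch is self-contained.
#[allow(dead_code)]
pub enum KernelArgs<const N: usize> {
    Index1D(usize),
    IndexND([usize; N]),
    Handle,
}

/// Same shape as the alias documented for the `rayon` feature (hypothetical name).
pub type SharedKernel<'a, const N: usize> = Box<dyn Fn(KernelArgs<N>) + Send + Sync + 'a>;

/// Hypothetical dispatcher: split `0..len` into contiguous chunks,
/// one per thread, and call the shared kernel once per index.
pub fn threaded_for_1d(len: usize, n_threads: usize, kernel: SharedKernel<'_, 1>) {
    let n_threads = n_threads.max(1);
    let chunk = (len + n_threads - 1) / n_threads; // ceiling division
    let kernel_ref = &kernel; // sharing this reference across threads needs `Sync`
    thread::scope(|s| {
        for t in 0..n_threads {
            let start = (t * chunk).min(len);
            let end = ((t + 1) * chunk).min(len);
            s.spawn(move || {
                for i in start..end {
                    kernel_ref(KernelArgs::Index1D(i));
                }
            });
        }
    }); // every spawned thread is joined when the scope ends
}

/// Example kernel: accumulate indices into an atomic counter.
pub fn parallel_sum_indices(len: usize, n_threads: usize) -> usize {
    let acc = AtomicUsize::new(0);
    let kernel: SharedKernel<'_, 1> = Box::new(|arg: KernelArgs<1>| match arg {
        KernelArgs::Index1D(i) => {
            acc.fetch_add(i, Ordering::Relaxed);
        }
        _ => unimplemented!(),
    });
    threaded_for_1d(len, n_threads, kernel);
    acc.load(Ordering::Relaxed)
}
```

Because the kernel is only `Fn`, it cannot mutate captured state directly; the example goes through an atomic instead, which is the kind of constraint the feature-dependent signature imposes on kernel authors.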

/// Serial kernel type.
/// Serial kernel type. Does not depend on enabled feature(s).
///
/// This is the minimal required trait implementation for closures passed to a
/// `for_each` statement.
pub type SerialForKernelType<'a, const N: usize> = Box<dyn FnMut(KernelArgs<N>) + 'a>;
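
By contrast, the serial alias only asks for `FnMut`, so a kernel may mutate captured state without synchronization. The sketch below shows this under stated assumptions: the enum and alias are local copies of the definitions in the diff, while `serial_for_1d` and `sum_indices` are hypothetical helpers that do not exist in the crate.

```rust
/// Local copy of the enum from the diff, so the sketch is self-contained.
#[allow(dead_code)]
pub enum KernelArgs<const N: usize> {
    Index1D(usize),
    IndexND([usize; N]),
    Handle,
}

/// Same shape as the `SerialForKernelType` alias in the diff.
pub type SerialForKernelType<'a, const N: usize> = Box<dyn FnMut(KernelArgs<N>) + 'a>;

/// Hypothetical helper: call the kernel once per index in `0..len`, in order.
pub fn serial_for_1d(len: usize, mut kernel: SerialForKernelType<'_, 1>) {
    for i in 0..len {
        kernel(KernelArgs::Index1D(i));
    }
}

/// Example kernel: sum the indices by mutating a captured local variable.
pub fn sum_indices(len: usize) -> usize {
    let mut acc = 0;
    let kernel: SerialForKernelType<'_, 1> = Box::new(|arg: KernelArgs<1>| match arg {
        KernelArgs::Index1D(i) => acc += i,
        KernelArgs::IndexND(_) | KernelArgs::Handle => unimplemented!(),
    });
    serial_for_1d(len, kernel); // the boxed kernel is consumed here, ending the borrow of `acc`
    acc
}
```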
File renamed without changes.
68 changes: 18 additions & 50 deletions src/lib.rs
@@ -2,71 +2,39 @@
//!
//! ## Scope of the Project
//!
//! ~~The main focus of this Proof-of-Concept is the architecture and approach used by
//! [Kokkos][1] for data management. While multiple targets support (Serial, [rayon][2],
//! OpenMP) could be interesting, it is not the priority.~~
//! The goal of this project is not to produce an entire Kokkos implementation nor to
//! replicate the existing C++ library. While the current C++ source code is interesting
//! to use as inspiration, the main reference is the model description.
//!
//! Rudimentary data structure implementation being done, the goal is now to write a simple
//! program using a `parallel_for` statement with satisfying portability as defined by Kokkos.
//!
//! Additionally, some features of Kokkos are not reproducible in Rust (GPU targeting,
//! templating); these create limits for the implementation, hence the existence of this PoC.
//! This makes limit-testing a fundamental part of the project.
//! Additionally, because of language-specific features (Rust's strict compilation rules,
//! C++ templates), you can expect the underlying implementation of concepts to be
//! vastly different.
//!
//!
//! ## Quickstart
//!
//! The PoC itself is a library, but you can run benchmarks and examples out of the box.
//!
//! ### Benchmarks
//!
//! Benchmarks can be run using the following command:
//! The PoC itself is a library, but you can run benchmarks and examples out of the box:
//!
//! ```bash
//! # all benchmarks
//! cargo bench
//! # a specific benchmark
//! cargo bench --bench bench_name
//! cargo bench --bench <BENCHMARK>
//! # a specific example
//! cargo run --example <EXAMPLE>
//! ```
//!
//! All results are compiled to the `target/criterion/` folder. The following
//! benchmarks are available:
//!
//! **Layout:**
//! - `layout-comparison`: Bench a Matrix-Matrix product three times, using the worst possible layout,
//! the usual layout, and then the optimal layout for the operation. This shows the importance of layout
//! selection for performances.
//! - `layout-size`: Bench a Matrix-Matrix product using the usual layout and the optimal layout,
//! over a range of sizes for the square matrices. This shows the influence of cache size over
//! layout importance.
//! **Computation:**
//! - `axpy` / `gemv` / `gemm`: Measure speedup on basic BLAS implementations by running the same kernel
//! in serial mode first, then using parallelization on CPU. _Meant to be executed using features_.
//! - `hardcoded_gemm`: Compute the same operations as the `gemm` benchmark, but using a hardcoded implementation
//! instead of methods from the PoC. Used to assess the additional cost induced by the library.
//! **Library overhead:**
//! - `view_init`: Compare initialization performances of regular vectors to [Views][view]; This
//! is used to spot potential scaling issues induced by the more complex structure of Views.
//! - `view_access`: Compare data access performances of regular vectors to [Views][view]; This
//! is used to spot potential scaling issues induced by the more complex structure of Views.
//!
//! Additionally, a kokkos-equivalent of the blas kernels can be found in the `blas-speedup-kokkos/`
//! subdirectory. These are far from being the most optimized implementation, instead they are written
//! as close-ish counterparts to the Rust benchmarks.
//!
//! ### Examples
//! Generate local documentation:
//!
//! ```bash
//! cargo run --example hello-world
//! cargo doc --no-deps --open
//! ```
//!
//! The following examples are available:
//!
//! - `hello_world`: ...
//! - `hello_world_omp`: ...
//! Note that some elements of the documentation are feature specific.
//!
//! ## Compilation
//!
//! ## Features
//! ### Features
//!
//! Using `features`, the crate can be compiled to use different backends for the execution of
//! parallel sections. These can also be enabled in benchmarks.
@@ -81,13 +49,13 @@
//! - `threads` : Uses [`std::thread`] methods to handle parallelization on CPU.
//! - `gpu`: Currently used as a way to gate GPU usage as this cannot be done in pure Rust.
//!
//! ## Compilation
//! ### C++ Interoperability
//!
//! The build script will read the `CXX` environment variable to choose which C++ compiler to use
//! for Rust/C++ interop. Note that the crate itself does not currently use C++ code, only examples
//! do.
//!
//! ### Known issues
//! #### Known issues
//!
//! - On macOS: Does not work with Apple Clang
//! - Solution: Homebrew Clang or tinker with flags to get OpenMP to work
@@ -106,7 +74,7 @@
pub mod ffi {
// C++ types and signatures exposed to Rust.
unsafe extern "C++" {
include!("poc-kokkos-rs/include/hello.hpp");
include!("poc-kokkos-rs/src/include/hello.hpp");

fn say_hello();
