Native impls #328

Merged 8 commits on Oct 20, 2023
1 change: 1 addition & 0 deletions .gitignore
@@ -7,3 +7,4 @@ proptest-regressions
.current
.cargo
.vscode
rust-toolchain
13 changes: 10 additions & 3 deletions Cargo.toml
@@ -15,12 +15,12 @@ rust-version = "1.61"
getrandom = { version = "0.2", features = ["js"] }

[dependencies]
simdutf8 = { version = "0.1.4", features = ["public_imp", "aarch64_neon"] }

lexical-core = { version = "0.8", features = ["format"] }
beef = { version = "0.5", optional = true }
halfbrown = "0.2"
value-trait = { version = "0.6.1" }
simdutf8 = { version = "0.1.4", features = ["public_imp", "aarch64_neon"] }

# ahash known key
once_cell = { version = "1.17", optional = true }
ahash = { version = "0.8", optional = true }
@@ -53,7 +53,7 @@ name = "parse"
harness = false

[features]
default = ["swar-number-parsing", "serde_impl"]
default = ["swar-number-parsing", "serde_impl", "runtime-detection"]

arraybackend = ["halfbrown/arraybackend"]

@@ -102,6 +102,13 @@ perf = ["perfcnt", "getopts", "colored", "serde_json"]
# for documentation
docsrs = []

# portable simd support (as of rust 1.73 nightly only)
# portable = ["simdutf8/portable"]


# use runtime detection of the CPU features where possible instead of enforcing an instruction set
runtime-detection = []

[[example]]
name = "perf"

24 changes: 22 additions & 2 deletions README.md
@@ -24,13 +24,33 @@

To take advantage of `simd-json`, your system needs to be SIMD capable. On `x86` it will select the best SIMD feature set (`avx2` or `sse4.2`) at runtime. If `simd-json` is compiled with SIMD support, it will disable runtime detection.

`simd-json` supports AVX2, SSE4.2 and NEON.
`simd-json` supports AVX2, SSE4.2, NEON, and simd128 (wasm) natively. It also includes an unoptimized fallback implementation in native Rust for other platforms; however, this is a last-resort measure and nothing we'd recommend relying on.


### Performance characteristics

- Compiling for the native CPU results in the best performance.
- Runtime CPU detection for AVX2 and SSE4.2 is the second fastest (on x86_* only).
- Portable `std::simd` is the next fastest implementation when compiled with a native CPU target.
- `std::simd` or the native Rust implementation is the least performant.
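
To make the first bullet concrete, here is a minimal std-only sketch that reports which of the SIMD target features named above were enabled when the program was compiled. The function name `compiled_simd_features` is hypothetical and not part of `simd-json`; the sketch only illustrates the standard `cfg!(target_feature = "...")` check.

```rust
/// Lists which of the SIMD target features named in the README were enabled
/// at compile time. This reflects compiler flags, not what the running CPU
/// actually supports. (Hypothetical helper, not part of simd-json.)
fn compiled_simd_features() -> Vec<&'static str> {
    let mut enabled = Vec::new();
    if cfg!(target_feature = "avx2") {
        enabled.push("avx2");
    }
    if cfg!(target_feature = "sse4.2") {
        enabled.push("sse4.2");
    }
    if cfg!(target_feature = "neon") {
        enabled.push("neon");
    }
    if cfg!(target_feature = "simd128") {
        enabled.push("simd128");
    }
    enabled
}

fn main() {
    let enabled = compiled_simd_features();
    if enabled.is_empty() {
        println!("no SIMD target features enabled at compile time");
    } else {
        println!("compiled with: {}", enabled.join(", "));
    }
}
```

Building with `RUSTFLAGS="-C target-cpu=native"` typically changes what this prints, because the compiler is then allowed to assume the build machine's instruction set everywhere, not just in the SIMD kernels.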

### allocator

For best performance we highly suggest using [mimalloc](https://crates.io/crates/mimalloc) or [jemalloc](https://crates.io/crates/jemalloc) instead of the system allocator used by default. Another recent allocator that works well (but that we have yet to test in a production setting) is [snmalloc](https://github.com/microsoft/snmalloc).

## `serde`
### `runtime-detection`

This feature allows selecting the optimal algorithm based on the CPU features available at runtime. It has no effect on platforms other than x86 or x86_64. When neither `AVX2` nor `SSE4.2` is supported, it will fall back to a native Rust implementation.

Note that an application compiled with `runtime-detection` will not run as fast as an application compiled for a specific CPU. The reason is that Rust can't optimize as far for a specific instruction set when it targets the generic one, and the non-SIMD parts of the code won't be optimized for the given instruction set either.
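
The selection order described above can be sketched with the standard library alone. The `Algo` enum and `detect` function are hypothetical stand-ins for illustration, not the crate's actual types; only `std::is_x86_feature_detected!` is a real API here.

```rust
/// Hypothetical stand-in for the implementation chosen at runtime.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Algo {
    Avx2,
    Sse42,
    Native,
}

/// Picks the best available implementation: AVX2 first, then SSE4.2,
/// then the plain Rust fallback. On non-x86 targets the detection block
/// is compiled out, so the fallback is always returned.
fn detect() -> Algo {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::is_x86_feature_detected!("avx2") {
            return Algo::Avx2;
        }
        if std::is_x86_feature_detected!("sse4.2") {
            return Algo::Sse42;
        }
    }
    Algo::Native
}

fn main() {
    println!("selected implementation: {:?}", detect());
}
```

Because the choice is made while the program runs, the surrounding code still has to be compiled for the generic instruction set, which is the cost the note above describes.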

### `portable`

**Currently disabled**

An implementation of the algorithm using `std::simd` and up to 512 byte wide registers. It is currently disabled due to dependencies and is highly experimental.

### `serde_impl`

`simd-json` is compatible with serde and `serde-json`. The Value types provided implement serializers and deserializers. In addition, `simd-json` implements the `Deserializer` trait for the parser, so it can deserialize anything that implements the serde `Deserialize` trait. Note that serde provides both a `Deserializer` and a `Deserialize` trait.

24 changes: 18 additions & 6 deletions examples/perf.rs
@@ -8,14 +8,24 @@ mod int {
use perfcnt::linux::{HardwareEventType, PerfCounterBuilderLinux};
use perfcnt::{AbstractPerfCounter, PerfCounter};
use serde::{Deserialize, Serialize};
use simd_json::{Deserializer, Implementation};
use std::io::BufReader;

#[derive(Default, Serialize, Deserialize)]
struct Stats {
algo: String,
best: Stat,
total: Stat,
iters: u64,
}
impl Stats {
fn new(algo: Implementation) -> Self {
Stats {
algo: algo.to_string(),
..Default::default()
}
}
}

#[derive(Default, Serialize, Deserialize)]
struct Stat {
@@ -96,15 +106,16 @@ mod int {
let branch_instructions = self.total.branch_instructions / self.iters;

println!(
"{:20} {:10} {:10} {:10} {:10} {:10} {:10.3} {:10.3}",
"{:20} {:10} {:10} {:10} {:10} {:10} {:10.3} {:10.3} {:21}",
name,
cycles,
instructions,
branch_instructions,
cache_misses,
cache_references,
((self.best.cycles as f64) / bytes as f64),
((cycles as f64) / bytes as f64)
((cycles as f64) / bytes as f64),
self.algo
);
}
pub fn print_diff(&self, baseline: &Stats, name: &str, bytes: usize) {
@@ -135,7 +146,7 @@ mod int {
}

println!(
"{:20} {:>10} {:>10} {:>10} {:>10} {:>10} {:10} {:10}",
"{:20} {:>10} {:>10} {:>10} {:>10} {:>10} {:10} {:10} {:21}",
format!("{}(+/-)", name),
d((1.0 - cycles_b as f64 / cycles as f64) * 100.0),
d((1.0 - instructions_b as f64 / instructions as f64) * 100.0),
@@ -144,6 +155,7 @@ mod int {
d((1.0 - cache_references_b as f64 / cache_references as f64) * 100.0),
d((1.0 - best_cycles_per_byte_b as f64 / best_cycles_per_byte as f64) * 100.0),
d((1.0 - cycles_per_byte_b as f64 / cycles_per_byte as f64) * 100.0),
baseline.algo
);
}
}
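
The hunks above add an `algo` field that is set once in `Stats::new` and printed as an extra `{:21}` column. A reduced, std-only sketch of that pattern follows; the two-field `Stats`, the `row` helper, and the `"AVX2"` label are assumptions for illustration, not the example's real definitions.

```rust
/// Reduced stand-in for the `Stats` struct in examples/perf.rs.
#[derive(Default, Debug)]
struct Stats {
    algo: String,
    iters: u64,
}

impl Stats {
    /// Sets only `algo` and fills every other field from `Default`,
    /// using struct update syntax as in the diff.
    fn new(algo: impl ToString) -> Self {
        Stats {
            algo: algo.to_string(),
            ..Default::default()
        }
    }

    /// Formats one table row with the algorithm as a trailing column
    /// that is 21 characters wide (hypothetical helper).
    fn row(&self, name: &str) -> String {
        format!("{:20} {:10} {:21}", name, self.iters, self.algo)
    }
}

fn main() {
    let stats = Stats::new("AVX2");
    println!("{}", stats.row("twitter"));
}
```

Struct update syntax keeps the constructor short when new counters are added to the struct later, since only the explicitly named field needs to be listed.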
@@ -166,7 +178,7 @@ mod int {
for mut bytes in &mut data_entries[..WARMUP as usize] {
simd_json::to_borrowed_value(&mut bytes).unwrap();
}
let mut stats = Stats::default();
let mut stats = Stats::new(Deserializer::algorithm());
for mut bytes in &mut data_entries[WARMUP as usize..] {
// Set up counters
let pc = stats.start();
@@ -219,8 +231,8 @@ fn main() {
let matches = opts.parse(&args[1..]).unwrap();

println!(
"{:^20} {:^10} {:^21} {:^21} {:^21}",
" ", "", "Instructions", "Cache.", "Cycle/byte"
"{:^20} {:^10} {:^21} {:^21} {:^21} {:21}",
" ", "", "Instructions", "Cache.", "Cycle/byte", "Algorithm"
);
println!(
"{:^20} {:^10} {:^10} {:^10} {:^10} {:^10} {:^10} {:^10}",
2 changes: 0 additions & 2 deletions src/avx2/mod.rs

This file was deleted.

27 changes: 14 additions & 13 deletions src/charutils.rs
@@ -22,12 +22,12 @@ const STRUCTURAL_OR_WHITESPACE: [u32; 256] = [
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
];

#[cfg_attr(not(feature = "no-inline"), inline(always))]
#[cfg_attr(not(feature = "no-inline"), inline)]
pub fn is_not_structural_or_whitespace(c: u8) -> u32 {
unsafe { *STRUCTURAL_OR_WHITESPACE_NEGATED.get_kinda_unchecked(c as usize) }
}

#[cfg_attr(not(feature = "no-inline"), inline(always))]
#[cfg_attr(not(feature = "no-inline"), inline)]
pub fn is_structural_or_whitespace(c: u8) -> u32 {
unsafe { *STRUCTURAL_OR_WHITESPACE.get_kinda_unchecked(c as usize) }
}
@@ -87,29 +87,30 @@ pub fn codepoint_to_utf8(cp: u32, c: &mut [u8]) -> usize {
unsafe {
if cp <= 0x7F {
*c.get_kinda_unchecked_mut(0) = cp as u8;
return 1; // ascii
}
if cp <= 0x7FF {
1 // ascii
} else if cp <= 0x7FF {
*c.get_kinda_unchecked_mut(0) = ((cp >> 6) + 192) as u8;
*c.get_kinda_unchecked_mut(1) = ((cp & 63) + 128) as u8;
return 2; // universal plane
// Surrogates are treated elsewhere...
//} //else if (0xd800 <= cp && cp <= 0xdfff) {
// return 0; // surrogates // could put assert here
2
// universal plane
// Surrogates are treated elsewhere...
//} //else if (0xd800 <= cp && cp <= 0xdfff) {
// return 0; // surrogates // could put assert here
} else if cp <= 0xFFFF {
*c.get_kinda_unchecked_mut(0) = ((cp >> 12) + 224) as u8;
*c.get_kinda_unchecked_mut(1) = (((cp >> 6) & 63) + 128) as u8;
*c.get_kinda_unchecked_mut(2) = ((cp & 63) + 128) as u8;
return 3;
3
} else if cp <= 0x0010_FFFF {
// if you know you have a valid code point, this is not needed
*c.get_kinda_unchecked_mut(0) = ((cp >> 18) + 240) as u8;
*c.get_kinda_unchecked_mut(1) = (((cp >> 12) & 63) + 128) as u8;
*c.get_kinda_unchecked_mut(2) = (((cp >> 6) & 63) + 128) as u8;
*c.get_kinda_unchecked_mut(3) = ((cp & 63) + 128) as u8;
return 4;
4
} else {
// will return 0 when the code point was too large.
0
}
}
// will return 0 when the code point was too large.
0
}
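
For readers who want to check the arithmetic in `codepoint_to_utf8`, here is a safe, bounds-checked sketch of the same branch structure, compared against the standard library's `char::encode_utf8`. The name `encode_codepoint` and the fixed four-byte buffer are assumptions for illustration; the crate's version uses unchecked indexing instead.

```rust
/// Encodes `cp` as UTF-8 into `out` and returns the number of bytes written,
/// or 0 if the code point is above U+10FFFF. Like the original, surrogates
/// are not rejected here. (Hypothetical safe variant for illustration.)
fn encode_codepoint(cp: u32, out: &mut [u8; 4]) -> usize {
    if cp <= 0x7F {
        out[0] = cp as u8;
        1 // ASCII
    } else if cp <= 0x7FF {
        out[0] = ((cp >> 6) + 192) as u8;
        out[1] = ((cp & 63) + 128) as u8;
        2
    } else if cp <= 0xFFFF {
        out[0] = ((cp >> 12) + 224) as u8;
        out[1] = (((cp >> 6) & 63) + 128) as u8;
        out[2] = ((cp & 63) + 128) as u8;
        3
    } else if cp <= 0x0010_FFFF {
        out[0] = ((cp >> 18) + 240) as u8;
        out[1] = (((cp >> 12) & 63) + 128) as u8;
        out[2] = (((cp >> 6) & 63) + 128) as u8;
        out[3] = ((cp & 63) + 128) as u8;
        4
    } else {
        0 // code point too large
    }
}

fn main() {
    let mut buf = [0u8; 4];
    // One sample per encoded length: 1, 2, 3 and 4 bytes.
    for ch in ['A', '\u{e9}', '\u{20ac}', '\u{1f600}'] {
        let n = encode_codepoint(ch as u32, &mut buf);
        let mut expected = [0u8; 4];
        let s = ch.encode_utf8(&mut expected);
        assert_eq!(&buf[..n], s.as_bytes());
        println!("U+{:04X} -> {} byte(s)", ch as u32, n);
    }
}
```

The refactor in the diff changes only control flow (early `return`s become a single `if`/`else if` expression), so the byte arithmetic shown here is the same before and after.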
21 changes: 16 additions & 5 deletions src/error.rs
@@ -43,7 +43,7 @@ pub enum ErrorType {
/// Expected an unsigned number
ExpectedUnsigned,
/// Internal error
InternalError,
InternalError(InternalError),
/// Invalid escape sequence
InvalidEscape,
/// Invalid exponent in a floating point number
@@ -88,6 +88,12 @@ pub enum ErrorType {
Io(std::io::Error),
}

#[derive(Debug, PartialEq)]
pub enum InternalError {
InvalidStrucutralIndexes,
TapeError,
}

impl From<std::io::Error> for Error {
fn from(e: std::io::Error) -> Self {
Self::generic(ErrorType::Io(e))
@@ -116,7 +122,6 @@ impl PartialEq for ErrorType {
| (Self::ExpectedSigned, Self::ExpectedSigned)
| (Self::ExpectedString, Self::ExpectedString)
| (Self::ExpectedUnsigned, Self::ExpectedUnsigned)
| (Self::InternalError, Self::InternalError)
| (Self::InvalidEscape, Self::InvalidEscape)
| (Self::InvalidExponent, Self::InvalidExponent)
| (Self::InvalidNumber, Self::InvalidNumber)
@@ -136,6 +141,7 @@ impl PartialEq for ErrorType {
| (Self::ExpectedObjectKey, Self::ExpectedObjectKey)
| (Self::Overflow, Self::Overflow) => true,
(Self::Serde(s1), Self::Serde(s2)) => s1 == s2,
(Self::InternalError(e1), Self::InternalError(e2)) => e1 == e2,
_ => false,
}
}
@@ -195,10 +201,15 @@ impl From<Error> for std::io::Error {

#[cfg(test)]
mod test {
use super::{Error, ErrorType};
use super::{Error, ErrorType, InternalError};
#[test]
fn fmt() {
let e = Error::generic(ErrorType::InternalError);
assert_eq!(e.to_string(), "InternalError at character 0");
let e = Error::generic(ErrorType::InternalError(
InternalError::InvalidStrucutralIndexes,
));
assert_eq!(
e.to_string(),
"InternalError(InvalidStrucutralIndexes) at character 0"
);
}
}
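
The change from a unit variant to `InternalError(InternalError)` affects both equality and the printed message. The following reduced sketch shows why: the two-variant `ErrorType` and the `describe` helper are assumptions for illustration, and the message format is inferred from the expected string in the test above rather than taken from the crate's `Display` implementation.

```rust
/// Payload enum as introduced in the diff (identifier spelling kept as-is).
#[derive(Debug, PartialEq)]
enum InternalError {
    InvalidStrucutralIndexes,
    TapeError,
}

/// Reduced stand-in for the crate's much larger `ErrorType`.
#[derive(Debug)]
enum ErrorType {
    InternalError(InternalError),
    InvalidEscape,
}

impl PartialEq for ErrorType {
    fn eq(&self, other: &Self) -> bool {
        match (self, other) {
            (Self::InvalidEscape, Self::InvalidEscape) => true,
            // Variants with a payload compare their payloads.
            (Self::InternalError(e1), Self::InternalError(e2)) => e1 == e2,
            _ => false,
        }
    }
}

/// Hypothetical helper: builds the message from the `Debug` form of the variant.
fn describe(error: &ErrorType, character: usize) -> String {
    format!("{:?} at character {}", error, character)
}

fn main() {
    let e = ErrorType::InternalError(InternalError::InvalidStrucutralIndexes);
    println!("{}", describe(&e, 0));
}
```

With the payload in place, two internal errors are equal only when they report the same cause, which is why the variant moves out of the `|` list and into its own match arm.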
29 changes: 16 additions & 13 deletions src/avx2/deser.rs → src/impls/avx2/deser.rs
@@ -9,32 +9,29 @@ use std::arch::x86_64::{
_mm256_storeu_si256,
};

use std::mem;

pub use crate::error::{Error, ErrorType};
use crate::safer_unchecked::GetSaferUnchecked;
use crate::stringparse::{handle_unicode_codepoint, ESCAPE_MAP};
use crate::Deserializer;
pub use crate::Result;
use crate::{
error::ErrorType,
safer_unchecked::GetSaferUnchecked,
stringparse::{handle_unicode_codepoint, ESCAPE_MAP},
Deserializer, Result, SillyWrapper,
};

#[target_feature(enable = "avx2")]
#[allow(
clippy::if_not_else,
clippy::transmute_ptr_to_ptr,
clippy::too_many_lines,
clippy::cast_ptr_alignment,
clippy::cast_possible_wrap,
clippy::if_not_else,
clippy::too_many_lines
)]
#[cfg_attr(not(feature = "no-inline"), inline)]
pub(crate) unsafe fn parse_str_avx<'invoke, 'de>(
input: *mut u8,
pub(crate) unsafe fn parse_str<'invoke, 'de>(
input: SillyWrapper<'de>,
data: &'invoke [u8],
buffer: &'invoke mut [u8],
mut idx: usize,
) -> Result<&'de str> {
use ErrorType::{InvalidEscape, InvalidUnicodeCodepoint};

let input = input.input;
// Add 1 to skip the initial "
idx += 1;
//let mut read: usize = 0;
@@ -47,6 +44,8 @@ pub(crate) unsafe fn parse_str_avx<'invoke, 'de>(
let mut src_i: usize = 0;
let mut len = src_i;
loop {
// _mm256_loadu_si256 does not require alignment
#[allow(clippy::cast_ptr_alignment)]
let v: __m256i =
_mm256_loadu_si256(src.as_ptr().add(src_i).cast::<std::arch::x86_64::__m256i>());

@@ -96,9 +95,13 @@ pub(crate) unsafe fn parse_str_avx<'invoke, 'de>(

// To be more conform with upstream
loop {
// _mm256_loadu_si256 does not require alignment
#[allow(clippy::cast_ptr_alignment)]
let v: __m256i =
_mm256_loadu_si256(src.as_ptr().add(src_i).cast::<std::arch::x86_64::__m256i>());

// _mm256_storeu_si256 does not require alignment
#[allow(clippy::cast_ptr_alignment)]
_mm256_storeu_si256(
buffer
.as_mut_ptr()
6 changes: 6 additions & 0 deletions src/impls/avx2/mod.rs
@@ -0,0 +1,6 @@
#![allow(unused_imports, dead_code)]
mod deser;
mod stage1;

pub(crate) use deser::parse_str;
pub(crate) use stage1::SimdInput;