Skip to content

Commit

Permalink
Meshlet continuous LOD (#12755)
Browse files Browse the repository at this point in the history
Adds a basic level of detail system to meshlets. An extremely brief
summary is as follows:
* In `from_mesh.rs`, once we've built the first level of clusters, we
group clusters, simplify the new mega-clusters, and then split the
simplified groups back into regular sized clusters. Repeat several times
(ideally until you can't anymore). This forms a directed acyclic graph
(DAG), where the children are the meshlets from the previous level, and
the parents are the more simplified versions of their children. The leaf
nodes are meshlets formed from the original mesh.
* In `cull_meshlets.wgsl`, each cluster selects whether to render or not
based on the LOD bounding sphere (different than the culling bounding
sphere) of the current meshlet, the LOD bounding sphere of its parent
(the meshlet group from simplification), and the simplification error
relative to its children of both the current meshlet and its parent
meshlet. This kind of breaks two pass occlusion culling, which will be
fixed in a future PR by using an HZB from the previous frame to get the
initial list of occluders.

Many, _many_ improvements to be done in the future
#11518, not least of which is
code quality and speed. I don't even expect this to work on many types
of input meshes. This is just a basic implementation/draft for
collaboration.

Arguable how much we want to do in this PR, I'll leave that up to
maintainers. I've erred on the side of "as basic as possible".

References:
* Slides 27-77 (video available on youtube)
https://advances.realtimerendering.com/s2021/Karis_Nanite_SIGGRAPH_Advances_2021_final.pdf
*
https://blog.traverseresearch.nl/creating-a-directed-acyclic-graph-from-a-mesh-1329e57286e5
*
https://jglrxavpok.github.io/2024/01/19/recreating-nanite-lod-generation.html,
https://jglrxavpok.github.io/2024/03/12/recreating-nanite-faster-lod-generation.html,
https://jglrxavpok.github.io/2024/04/02/recreating-nanite-runtime-lod-selection.html,
and https://github.com/jglrxavpok/Carrot
*
https://github.com/gents83/INOX/tree/master/crates/plugins/binarizer/src
* https://cs418.cs.illinois.edu/website/text/nanite.html


![image](https://github.com/bevyengine/bevy/assets/47158642/e40bff9b-7d0c-4a19-a3cc-2aad24965977)

![image](https://github.com/bevyengine/bevy/assets/47158642/442c7da3-7761-4da7-9acd-37f15dd13e26)

---------

Co-authored-by: Ricky Taylor <[email protected]>
Co-authored-by: vero <[email protected]>
Co-authored-by: François <[email protected]>
Co-authored-by: atlas dostal <[email protected]>
Co-authored-by: Patrick Walton <[email protected]>
  • Loading branch information
6 people authored Apr 23, 2024
1 parent 17633c1 commit 6d6810c
Show file tree
Hide file tree
Showing 12 changed files with 491 additions and 159 deletions.
Binary file removed assets/models/bunny.meshlet_mesh
Binary file not shown.
27 changes: 20 additions & 7 deletions crates/bevy_pbr/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -16,9 +16,15 @@ shader_format_glsl = ["bevy_render/shader_format_glsl"]
trace = ["bevy_render/trace"]
ios_simulator = ["bevy_render/ios_simulator"]
# Enables the meshlet renderer for dense high-poly scenes (experimental)
meshlet = ["dep:range-alloc", "dep:bincode"]
meshlet = [
"dep:lz4_flex",
"dep:serde",
"dep:bincode",
"dep:thiserror",
"dep:range-alloc",
]
# Enables processing meshes into meshlet meshes
meshlet_processor = ["dep:meshopt", "dep:thiserror"]
meshlet_processor = ["meshlet", "dep:meshopt", "dep:metis", "dep:itertools"]

[dependencies]
# bevy
Expand All @@ -37,18 +43,25 @@ bevy_utils = { path = "../bevy_utils", version = "0.14.0-dev" }
bevy_window = { path = "../bevy_window", version = "0.14.0-dev" }
bevy_derive = { path = "../bevy_derive", version = "0.14.0-dev" }


# other
meshopt = { version = "0.2", optional = true }
thiserror = { version = "1", optional = true }
bitflags = "2.3"
fixedbitset = "0.5"
# meshlet
lz4_flex = { version = "0.11", default-features = false, features = [
"frame",
], optional = true }
serde = { version = "1", features = ["derive", "rc"], optional = true }
bincode = { version = "1", optional = true }
thiserror = { version = "1", optional = true }
range-alloc = { version = "0.1", optional = true }
meshopt = { version = "0.2", optional = true }
metis = { version = "0.2", optional = true }
itertools = { version = "0.12", optional = true }
# direct dependency required for derive macro
bytemuck = { version = "1", features = ["derive", "must_cast"] }
radsort = "0.1"
smallvec = "1.6"
serde = { version = "1", features = ["derive", "rc"] }
bincode = { version = "1", optional = true }
range-alloc = { version = "0.1", optional = true }
nonmax = "0.5"
static_assertions = "1"

Expand Down
2 changes: 1 addition & 1 deletion crates/bevy_pbr/src/lib.rs
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
// FIXME(3492): remove once docs are ready
#![allow(missing_docs)]
#![cfg_attr(docsrs, feature(doc_auto_cfg))]
#![forbid(unsafe_code)]
#![deny(unsafe_code)]
#![doc(
html_logo_url = "https://bevyengine.org/assets/icon.png",
html_favicon_url = "https://bevyengine.org/assets/icon.png"
Expand Down
73 changes: 62 additions & 11 deletions crates/bevy_pbr/src/meshlet/asset.rs
Original file line number Diff line number Diff line change
@@ -1,13 +1,17 @@
use bevy_asset::{
io::{Reader, Writer},
io::{AsyncReadAndSeek, Reader, Writer},
saver::{AssetSaver, SavedAsset},
Asset, AssetLoader, AsyncReadExt, AsyncWriteExt, LoadContext,
};
use bevy_math::Vec3;
use bevy_reflect::TypePath;
use bytemuck::{Pod, Zeroable};
use lz4_flex::frame::{FrameDecoder, FrameEncoder};
use serde::{Deserialize, Serialize};
use std::sync::Arc;
use std::{io::Cursor, sync::Arc};

/// The current version of the [`MeshletMesh`] asset format.
pub const MESHLET_MESH_ASSET_VERSION: u64 = 0;

/// A mesh that has been pre-processed into multiple small clusters of triangles called meshlets.
///
Expand All @@ -25,8 +29,8 @@ use std::sync::Arc;
/// See also [`super::MaterialMeshletMeshBundle`] and [`super::MeshletPlugin`].
#[derive(Asset, TypePath, Serialize, Deserialize, Clone)]
pub struct MeshletMesh {
/// The total amount of triangles summed across all meshlets in the mesh.
pub total_meshlet_triangles: u64,
/// The total amount of triangles summed across all LOD 0 meshlets in the mesh.
pub worst_case_meshlet_triangles: u64,
/// Raw vertex data bytes for the overall mesh.
pub vertex_data: Arc<[u8]>,
/// Indices into `vertex_data`.
Expand All @@ -35,8 +39,8 @@ pub struct MeshletMesh {
pub indices: Arc<[u8]>,
/// The list of meshlets making up this mesh.
pub meshlets: Arc<[Meshlet]>,
/// A list of spherical bounding volumes, 1 per meshlet.
pub meshlet_bounding_spheres: Arc<[MeshletBoundingSphere]>,
/// Spherical bounding volumes.
pub bounding_spheres: Arc<[MeshletBoundingSpheres]>,
}

/// A single meshlet within a [`MeshletMesh`].
Expand All @@ -51,7 +55,19 @@ pub struct Meshlet {
pub triangle_count: u32,
}

/// A spherical bounding volume used for culling a [`Meshlet`].
/// Bounding spheres used for culling and choosing level of detail for a [`Meshlet`].
#[derive(Serialize, Deserialize, Copy, Clone, Pod, Zeroable)]
#[repr(C)]
pub struct MeshletBoundingSpheres {
/// The bounding sphere used for frustum and occlusion culling for this meshlet.
pub self_culling: MeshletBoundingSphere,
/// The bounding sphere used for determining if this meshlet is at the correct level of detail for a given view.
pub self_lod: MeshletBoundingSphere,
/// The bounding sphere used for determining if this meshlet's parent is at the correct level of detail for a given view.
pub parent_lod: MeshletBoundingSphere,
}

/// A spherical bounding volume used for a [`Meshlet`].
#[derive(Serialize, Deserialize, Copy, Clone, Pod, Zeroable)]
#[repr(C)]
pub struct MeshletBoundingSphere {
Expand All @@ -65,17 +81,24 @@ pub struct MeshletMeshSaverLoad;
impl AssetLoader for MeshletMeshSaverLoad {
type Asset = MeshletMesh;
type Settings = ();
type Error = bincode::Error;
type Error = MeshletMeshSaveOrLoadError;

async fn load<'a>(
&'a self,
reader: &'a mut Reader<'_>,
_settings: &'a Self::Settings,
_load_context: &'a mut LoadContext<'_>,
) -> Result<Self::Asset, Self::Error> {
let version = read_u64(reader).await?;
if version != MESHLET_MESH_ASSET_VERSION {
return Err(MeshletMeshSaveOrLoadError::WrongVersion { found: version });
}

let mut bytes = Vec::new();
reader.read_to_end(&mut bytes).await?;
bincode::deserialize(&bytes)
let asset = bincode::deserialize_from(FrameDecoder::new(Cursor::new(bytes)))?;

Ok(asset)
}

fn extensions(&self) -> &[&str] {
Expand All @@ -87,16 +110,44 @@ impl AssetSaver for MeshletMeshSaverLoad {
type Asset = MeshletMesh;
type Settings = ();
type OutputLoader = Self;
type Error = bincode::Error;
type Error = MeshletMeshSaveOrLoadError;

async fn save<'a>(
&'a self,
writer: &'a mut Writer,
asset: SavedAsset<'a, Self::Asset>,
_settings: &'a Self::Settings,
) -> Result<(), Self::Error> {
let bytes = bincode::serialize(asset.get())?;
writer
.write_all(&MESHLET_MESH_ASSET_VERSION.to_le_bytes())
.await?;

let mut bytes = Vec::new();
let mut sync_writer = FrameEncoder::new(&mut bytes);
bincode::serialize_into(&mut sync_writer, asset.get())?;
sync_writer.finish()?;
writer.write_all(&bytes).await?;

Ok(())
}
}

#[derive(thiserror::Error, Debug)]
pub enum MeshletMeshSaveOrLoadError {
#[error("expected asset version {MESHLET_MESH_ASSET_VERSION} but found version {found}")]
WrongVersion { found: u64 },
#[error("failed to serialize or deserialize asset data")]
SerializationOrDeserialization(#[from] bincode::Error),
#[error("failed to compress or decompress asset data")]
CompressionOrDecompression(#[from] lz4_flex::frame::Error),
#[error("failed to read or write asset data")]
Io(#[from] std::io::Error),
}

async fn read_u64(
reader: &mut (dyn AsyncReadAndSeek + Sync + Send + Unpin),
) -> Result<u64, bincode::Error> {
let mut bytes = [0u8; 8];
reader.read_exact(&mut bytes).await?;
Ok(u64::from_le_bytes(bytes))
}
47 changes: 38 additions & 9 deletions crates/bevy_pbr/src/meshlet/cull_meshlets.wgsl
Original file line number Diff line number Diff line change
Expand Up @@ -20,19 +20,34 @@
@compute
@workgroup_size(128, 1, 1) // 128 threads per workgroup, 1 instanced meshlet per thread
fn cull_meshlets(@builtin(global_invocation_id) cluster_id: vec3<u32>) {
// Fetch the instanced meshlet data
// Fetch the instance data and check for instance culling
if cluster_id.x >= arrayLength(&meshlet_thread_meshlet_ids) { return; }
let instance_id = meshlet_thread_instance_ids[cluster_id.x];
if should_cull_instance(instance_id) {
return;
}

// Fetch other meshlet data
let meshlet_id = meshlet_thread_meshlet_ids[cluster_id.x];
let bounding_sphere = meshlet_bounding_spheres[meshlet_id];
let instance_uniform = meshlet_instance_uniforms[instance_id];
let model = affine3_to_square(instance_uniform.model);
let model_scale = max(length(model[0]), max(length(model[1]), length(model[2])));
let bounding_sphere_center = model * vec4(bounding_sphere.center, 1.0);
let bounding_sphere_radius = model_scale * bounding_sphere.radius;
let bounding_spheres = meshlet_bounding_spheres[meshlet_id];

// Calculate view-space LOD bounding sphere for the meshlet
let lod_bounding_sphere_center = model * vec4(bounding_spheres.self_lod.center, 1.0);
let lod_bounding_sphere_radius = model_scale * bounding_spheres.self_lod.radius;
let lod_bounding_sphere_center_view_space = (view.inverse_view * vec4(lod_bounding_sphere_center.xyz, 1.0)).xyz;

// Calculate view-space LOD bounding sphere for the meshlet's parent
let parent_lod_bounding_sphere_center = model * vec4(bounding_spheres.parent_lod.center, 1.0);
let parent_lod_bounding_sphere_radius = model_scale * bounding_spheres.parent_lod.radius;
let parent_lod_bounding_sphere_center_view_space = (view.inverse_view * vec4(parent_lod_bounding_sphere_center.xyz, 1.0)).xyz;

// Check LOD cut (meshlet error imperceptible, and parent error not imperceptible)
let lod_is_ok = lod_error_is_imperceptible(lod_bounding_sphere_center_view_space, lod_bounding_sphere_radius);
let parent_lod_is_ok = lod_error_is_imperceptible(parent_lod_bounding_sphere_center_view_space, parent_lod_bounding_sphere_radius);
if !lod_is_ok || parent_lod_is_ok { return; }

// In the first pass, operate only on the clusters visible last frame. In the second pass, operate on all clusters.
#ifdef MESHLET_SECOND_CULLING_PASS
Expand All @@ -42,18 +57,22 @@ fn cull_meshlets(@builtin(global_invocation_id) cluster_id: vec3<u32>) {
if !meshlet_visible { return; }
#endif

// Calculate world-space culling bounding sphere for the cluster
let culling_bounding_sphere_center = model * vec4(bounding_spheres.self_culling.center, 1.0);
let culling_bounding_sphere_radius = model_scale * bounding_spheres.self_culling.radius;

// Frustum culling
// TODO: Faster method from https://vkguide.dev/docs/gpudriven/compute_culling/#frustum-culling-function
for (var i = 0u; i < 6u; i++) {
if !meshlet_visible { break; }
meshlet_visible &= dot(view.frustum[i], bounding_sphere_center) > -bounding_sphere_radius;
meshlet_visible &= dot(view.frustum[i], culling_bounding_sphere_center) > -culling_bounding_sphere_radius;
}

#ifdef MESHLET_SECOND_CULLING_PASS
// In the second culling pass, cull against the depth pyramid generated from the first pass
if meshlet_visible {
let bounding_sphere_center_view_space = (view.inverse_view * vec4(bounding_sphere_center.xyz, 1.0)).xyz;
let aabb = project_view_space_sphere_to_screen_space_aabb(bounding_sphere_center_view_space, bounding_sphere_radius);
let culling_bounding_sphere_center_view_space = (view.inverse_view * vec4(culling_bounding_sphere_center.xyz, 1.0)).xyz;
let aabb = project_view_space_sphere_to_screen_space_aabb(culling_bounding_sphere_center_view_space, culling_bounding_sphere_radius);

// Halve the AABB size because the first depth mip resampling pass cut the full screen resolution into a power of two conservatively
let depth_pyramid_size_mip_0 = vec2<f32>(textureDimensions(depth_pyramid, 0)) * 0.5;
Expand All @@ -71,11 +90,11 @@ fn cull_meshlets(@builtin(global_invocation_id) cluster_id: vec3<u32>) {
let occluder_depth = min(min(depth_quad_a, depth_quad_b), min(depth_quad_c, depth_quad_d));
if view.projection[3][3] == 1.0 {
// Orthographic
let sphere_depth = view.projection[3][2] + (bounding_sphere_center_view_space.z + bounding_sphere_radius) * view.projection[2][2];
let sphere_depth = view.projection[3][2] + (culling_bounding_sphere_center_view_space.z + culling_bounding_sphere_radius) * view.projection[2][2];
meshlet_visible &= sphere_depth >= occluder_depth;
} else {
// Perspective
let sphere_depth = -view.projection[3][2] / (bounding_sphere_center_view_space.z + bounding_sphere_radius);
let sphere_depth = -view.projection[3][2] / (culling_bounding_sphere_center_view_space.z + culling_bounding_sphere_radius);
meshlet_visible &= sphere_depth >= occluder_depth;
}
}
Expand All @@ -86,6 +105,16 @@ fn cull_meshlets(@builtin(global_invocation_id) cluster_id: vec3<u32>) {
atomicOr(&meshlet_occlusion[cluster_id.x / 32u], occlusion_bit);
}

// https://stackoverflow.com/questions/21648630/radius-of-projected-sphere-in-screen-space/21649403#21649403
fn lod_error_is_imperceptible(cp: vec3<f32>, r: f32) -> bool {
let d2 = dot(cp, cp);
let r2 = r * r;
let sphere_diameter_uv = view.projection[0][0] * r / sqrt(d2 - r2);
let view_size = f32(max(view.viewport.z, view.viewport.w));
let sphere_diameter_pixels = sphere_diameter_uv * view_size;
return sphere_diameter_pixels < 1.0;
}

// https://zeux.io/2023/01/12/approximate-projected-bounds
fn project_view_space_sphere_to_screen_space_aabb(cp: vec3<f32>, r: f32) -> vec4<f32> {
let inv_width = view.projection[0][0] * 0.5;
Expand Down
Loading

0 comments on commit 6d6810c

Please sign in to comment.