Per-meshlet compressed vertex data (bevyengine#15643)

# Objective

- Prepare for streaming by storing vertex data per-meshlet, rather than per-mesh (this means duplicating vertices per-meshlet)
- Compress vertex data to reduce the cost of this

## Solution

The important parts are in from_mesh.rs, the changes to the Meshlet type in asset.rs, and the changes in meshlet_bindings.wgsl. Everything else is pretty secondary/boilerplate/straightforward changes.

- Positions are quantized in centimeters with a user-provided power of 2 factor (ideally auto-determined, but that's a TODO for the future), encoded as an offset relative to the minimum value within the meshlet, and then stored as a packed list of bits using the minimum number of bits needed for each vertex position channel for that meshlet
  - E.g. quantize positions (lossily; this throws away precision that's not needed, so the bitstream encoding uses fewer bits)
  - Get the min/max quantized value of each X/Y/Z channel of the quantized positions within a meshlet
  - Encode values relative to the min value of the meshlet. E.g. convert from [min, max] to [0, max - min]
  - The new max value in the meshlet is (max - min), which only takes N bits, so we only need N bits to store each channel within the meshlet (lossless)
  - We can store the min value, and the fact that it takes N bits per channel, in the meshlet metadata, and reconstruct the position from the bitstream
- Normals are octahedral encoded and then snorm2x16 packed and stored as a single u32.
  - Would be better to implement the precise variant of octahedral encoding for extra precision (no extra decode cost), but decided to keep it simple for now and leave that as a followup
  - Tried doing a quantizing and bitstream encoding scheme like I did for positions, but struggled to get it smaller. Decided to go with this for simplicity for now
- UVs are uncompressed and take a full 64 bits per vertex, which is expensive
  - In the future this should be improved
- Tangents, as of the previous PR, are not explicitly stored and are instead derived from screen space gradients
- While I'm here, split up MeshletMeshSaverLoader into two separate types

Other future changes include implementing a smaller encoding of triangle data (3 u8 indices = 24 bits per triangle currently), and more disk-oriented compression schemes.

References:
* "A Deep Dive into UE5's Nanite Virtualized Geometry" https://advances.realtimerendering.com/s2021/Karis_Nanite_SIGGRAPH_Advances_2021_final.pdf#page=128 (also available on YouTube)
* "Towards Practical Meshlet Compression" https://arxiv.org/pdf/2404.06359
* "Vertex quantization in Omniforce Game Engine" https://daniilvinn.github.io/2024/05/04/omniforce-vertex-quantization.html

## Testing

- Did you test these changes? If so, how?
  - Converted the Stanford bunny, and rendered it with a debug material showing normals, and confirmed that it's identical to what's on main. EDIT: See additional testing in the comments below.
- Are there any parts that need more testing?
  - Could use some more size comparisons on various meshes, and testing different quantization factors. Not sure if 4 is a good default. EDIT: See additional testing in the comments below.
  - Also did not test runtime performance of the shaders. EDIT: See additional testing in the comments below.
- How can other people (reviewers) test your changes? Is there anything specific they need to know?
  - Use my unholy script, replacing the meshlet example https://paste.rs/7xQHk.rs (must make MeshletMesh fields pub instead of pub crate, must add lz4_flex as a dev-dependency) (must compile with meshlet and meshlet_processor features, mesh must have only positions, normals, and UVs, no vertex colors or tangents)

---

## Migration Guide

- TBD by JMS55 at the end of the release
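
The position scheme described under Solution can be sketched roughly as follows. This is not the code from from_mesh.rs: every function name here is hypothetical, it uses plain std instead of bitvec, it assumes positions are in meters and that the factor means a step of 1 / 2^factor centimeters, and it handles a single channel (say, all X values of one meshlet) rather than interleaved X/Y/Z.

```rust
/// Assumption: input positions are in meters.
const CENTIMETERS_PER_METER: f32 = 100.0;

/// Quantize one coordinate to a grid of 1 / 2^factor centimeters (lossy).
fn quantize(value: f32, factor: u8) -> i32 {
    (value * CENTIMETERS_PER_METER * (1u32 << factor) as f32).round() as i32
}

/// Number of bits needed to store any value in [0, range].
fn bits_needed(range: u32) -> u32 {
    u32::BITS - range.leading_zeros()
}

/// Append the low `bits` bits of `value` to a u32-backed bitstream, LSB first.
fn push_bits(words: &mut Vec<u32>, bit_len: &mut usize, value: u32, bits: u32) {
    for i in 0..bits {
        if *bit_len % 32 == 0 {
            words.push(0);
        }
        let bit = (value >> i) & 1;
        words[*bit_len / 32] |= bit << (*bit_len % 32);
        *bit_len += 1;
    }
}

/// Read `bits` bits starting at bit offset `start`.
fn read_bits(words: &[u32], start: usize, bits: u32) -> u32 {
    let mut value: u32 = 0;
    for i in 0..bits as usize {
        let pos = start + i;
        let bit = (words[pos / 32] >> (pos % 32)) & 1;
        value |= bit << i;
    }
    value
}

/// Encode one channel of one meshlet. Panics on an empty slice.
/// Returns the metadata needed for decoding: (min quantized value, bits per value).
fn encode_channel(
    values: &[f32],
    factor: u8,
    words: &mut Vec<u32>,
    bit_len: &mut usize,
) -> (i32, u32) {
    let quantized: Vec<i32> = values.iter().map(|&v| quantize(v, factor)).collect();
    let min = *quantized.iter().min().unwrap();
    let max = *quantized.iter().max().unwrap();
    // After subtracting min, values lie in [0, max - min].
    let bits = bits_needed((max - min) as u32);
    for q in quantized {
        push_bits(words, bit_len, (q - min) as u32, bits);
    }
    (min, bits)
}

/// Reconstruct one value from the bitstream and the per-meshlet metadata.
fn decode_value(words: &[u32], start: usize, bits: u32, min: i32, factor: u8) -> f32 {
    let offset = read_bits(words, start, bits) as i32;
    (offset + min) as f32 / (1u32 << factor) as f32 / CENTIMETERS_PER_METER
}
```

With a factor of 4, the inputs 0.10, 0.25 and -0.05 quantize to 160, 400 and -80. The range is 480, so each value needs 9 bits and the whole channel takes 27 bits instead of 96.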
ickshonpe · Oct 8, 2024 · aa626e4 · aa626e4
1 parent f6cd6a4
commit aa626e4
Showing 14 changed files with 465 additions and 170 deletions.
diff --git a/Cargo.toml b/Cargo.toml
@@ -1209,7 +1209,7 @@ setup = [
     "curl",
     "-o",
     "assets/models/bunny.meshlet_mesh",
-    "https://raw.githubusercontent.com/JMS55/bevy_meshlet_asset/854eb98353ad94aea1104f355fc24dbe4fda679d/bunny.meshlet_mesh",
+    "https://raw.githubusercontent.com/JMS55/bevy_meshlet_asset/8443bbdee0bf517e6c297dede7f6a46ab712ee4c/bunny.meshlet_mesh",
   ],
 ]
 

diff --git a/crates/bevy_pbr/Cargo.toml b/crates/bevy_pbr/Cargo.toml
@@ -20,7 +20,13 @@ ios_simulator = ["bevy_render/ios_simulator"]
 # Enables the meshlet renderer for dense high-poly scenes (experimental)
 meshlet = ["dep:lz4_flex", "dep:thiserror", "dep:range-alloc", "dep:bevy_tasks"]
 # Enables processing meshes into meshlet meshes
-meshlet_processor = ["meshlet", "dep:meshopt", "dep:metis", "dep:itertools"]
+meshlet_processor = [
+  "meshlet",
+  "dep:meshopt",
+  "dep:metis",
+  "dep:itertools",
+  "dep:bitvec",
+]
 
 [dependencies]
 # bevy
@@ -53,6 +59,7 @@ range-alloc = { version = "0.1.3", optional = true }
 meshopt = { version = "0.3.0", optional = true }
 metis = { version = "0.2", optional = true }
 itertools = { version = "0.13", optional = true }
+bitvec = { version = "1", optional = true }
 # direct dependency required for derive macro
 bytemuck = { version = "1", features = ["derive", "must_cast"] }
 radsort = "0.1"
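
For orientation before the asset.rs changes: the commit message says normals are octahedral encoded, snorm2x16 packed, and stored as one u32 per vertex. A minimal sketch of what that could look like is below. It is the simple (non-precise) octahedral variant, the helper names are hypothetical, and the packing is written in the spirit of WGSL's `pack2x16snorm` rather than copied from this commit.

```rust
/// Sign helper that treats zero as positive.
fn sign(v: f32) -> f32 {
    if v >= 0.0 { 1.0 } else { -1.0 }
}

/// Map a unit normal onto the octahedron and unfold it to two values in [-1, 1].
fn octahedral_encode(n: [f32; 3]) -> [f32; 2] {
    let l1 = n[0].abs() + n[1].abs() + n[2].abs();
    let (x, y) = (n[0] / l1, n[1] / l1);
    if n[2] >= 0.0 {
        [x, y]
    } else {
        // Fold the lower hemisphere over the diagonals.
        [(1.0 - y.abs()) * sign(x), (1.0 - x.abs()) * sign(y)]
    }
}

/// Inverse of `octahedral_encode`; returns a normalized vector.
fn octahedral_decode(e: [f32; 2]) -> [f32; 3] {
    let z = 1.0 - e[0].abs() - e[1].abs();
    let t = (-z).max(0.0);
    let x = e[0] - t * sign(e[0]);
    let y = e[1] - t * sign(e[1]);
    let len = (x * x + y * y + z * z).sqrt();
    [x / len, y / len, z / len]
}

/// Pack two values in [-1, 1] as signed 16-bit normalized integers,
/// first component in the low 16 bits.
fn pack2x16snorm(v: [f32; 2]) -> u32 {
    let to_snorm = |f: f32| (f.clamp(-1.0, 1.0) * 32767.0).round() as i16 as u16 as u32;
    to_snorm(v[0]) | (to_snorm(v[1]) << 16)
}

/// Inverse of `pack2x16snorm`.
fn unpack2x16snorm(p: u32) -> [f32; 2] {
    let from_snorm = |bits: u32| ((bits as u16 as i16) as f32 / 32767.0).max(-1.0);
    [from_snorm(p & 0xffff), from_snorm(p >> 16)]
}
```

Encoding is `pack2x16snorm(octahedral_encode(normal))` and decoding is `octahedral_decode(unpack2x16snorm(packed))`, so each normal costs 32 bits instead of the 96 bits of three f32 components.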

diff --git a/crates/bevy_pbr/src/meshlet/asset.rs b/crates/bevy_pbr/src/meshlet/asset.rs
@@ -4,7 +4,7 @@ use bevy_asset::{
     saver::{AssetSaver, SavedAsset},
     Asset, AssetLoader, AsyncReadExt, AsyncWriteExt, LoadContext,
 };
-use bevy_math::Vec3;
+use bevy_math::{Vec2, Vec3};
 use bevy_reflect::TypePath;
 use bevy_tasks::block_on;
 use bytemuck::{Pod, Zeroable};
@@ -38,30 +38,51 @@ pub const MESHLET_MESH_ASSET_VERSION: u64 = 1;
 /// See also [`super::MaterialMeshletMeshBundle`] and [`super::MeshletPlugin`].
 #[derive(Asset, TypePath, Clone)]
 pub struct MeshletMesh {
-    /// Raw vertex data bytes for the overall mesh.
-    pub(crate) vertex_data: Arc<[u8]>,
-    /// Indices into `vertex_data`.
-    pub(crate) vertex_ids: Arc<[u32]>,
-    /// Indices into `vertex_ids`.
+    /// Quantized and bitstream-packed vertex positions for meshlet vertices.
+    pub(crate) vertex_positions: Arc<[u32]>,
+    /// Octahedral-encoded and 2x16snorm packed normals for meshlet vertices.
+    pub(crate) vertex_normals: Arc<[u32]>,
+    /// Uncompressed vertex texture coordinates for meshlet vertices.
+    pub(crate) vertex_uvs: Arc<[Vec2]>,
+    /// Triangle indices for meshlets.
     pub(crate) indices: Arc<[u8]>,
     /// The list of meshlets making up this mesh.
     pub(crate) meshlets: Arc<[Meshlet]>,
     /// Spherical bounding volumes.
-    pub(crate) bounding_spheres: Arc<[MeshletBoundingSpheres]>,
+    pub(crate) meshlet_bounding_spheres: Arc<[MeshletBoundingSpheres]>,
 }
 
 /// A single meshlet within a [`MeshletMesh`].
 #[derive(Copy, Clone, Pod, Zeroable)]
 #[repr(C)]
 pub struct Meshlet {
-    /// The offset within the parent mesh's [`MeshletMesh::vertex_ids`] buffer where the indices for this meshlet begin.
-    pub start_vertex_id: u32,
+    /// The bit offset within the parent mesh's [`MeshletMesh::vertex_positions`] buffer where the vertex positions for this meshlet begin.
+    pub start_vertex_position_bit: u32,
+    /// The offset within the parent mesh's [`MeshletMesh::vertex_normals`] and [`MeshletMesh::vertex_uvs`] buffers
+    /// where non-position vertex attributes for this meshlet begin.
+    pub start_vertex_attribute_id: u32,
     /// The offset within the parent mesh's [`MeshletMesh::indices`] buffer where the indices for this meshlet begin.
     pub start_index_id: u32,
     /// The amount of vertices in this meshlet.
-    pub vertex_count: u32,
+    pub vertex_count: u8,
     /// The amount of triangles in this meshlet.
-    pub triangle_count: u32,
+    pub triangle_count: u8,
+    /// Unused.
+    pub padding: u16,
+    /// Number of bits used to store the X channel of vertex positions within this meshlet.
+    pub bits_per_vertex_position_channel_x: u8,
+    /// Number of bits used to store the Y channel of vertex positions within this meshlet.
+    pub bits_per_vertex_position_channel_y: u8,
+    /// Number of bits used to store the Z channel of vertex positions within this meshlet.
+    pub bits_per_vertex_position_channel_z: u8,
+    /// Power of 2 factor used to quantize vertex positions within this meshlet.
+    pub vertex_position_quantization_factor: u8,
+    /// Minimum quantized X channel value of vertex positions within this meshlet.
+    pub min_vertex_position_channel_x: f32,
+    /// Minimum quantized Y channel value of vertex positions within this meshlet.
+    pub min_vertex_position_channel_y: f32,
+    /// Minimum quantized Z channel value of vertex positions within this meshlet.
+    pub min_vertex_position_channel_z: f32,
 }
 
 /// Bounding spheres used for culling and choosing level of detail for a [`Meshlet`].
@@ -84,13 +105,13 @@ pub struct MeshletBoundingSphere {
     pub radius: f32,
 }
 
-/// An [`AssetLoader`] and [`AssetSaver`] for `.meshlet_mesh` [`MeshletMesh`] assets.
-pub struct MeshletMeshSaverLoader;
+/// An [`AssetSaver`] for `.meshlet_mesh` [`MeshletMesh`] assets.
+pub struct MeshletMeshSaver;
 
-impl AssetSaver for MeshletMeshSaverLoader {
+impl AssetSaver for MeshletMeshSaver {
     type Asset = MeshletMesh;
     type Settings = ();
-    type OutputLoader = Self;
+    type OutputLoader = MeshletMeshLoader;
     type Error = MeshletMeshSaveOrLoadError;
 
     async fn save(
@@ -111,18 +132,22 @@ impl AssetSaver for MeshletMeshSaverLoader {
 
         // Compress and write asset data
         let mut writer = FrameEncoder::new(AsyncWriteSyncAdapter(writer));
-        write_slice(&asset.vertex_data, &mut writer)?;
-        write_slice(&asset.vertex_ids, &mut writer)?;
+        write_slice(&asset.vertex_positions, &mut writer)?;
+        write_slice(&asset.vertex_normals, &mut writer)?;
+        write_slice(&asset.vertex_uvs, &mut writer)?;
         write_slice(&asset.indices, &mut writer)?;
         write_slice(&asset.meshlets, &mut writer)?;
-        write_slice(&asset.bounding_spheres, &mut writer)?;
+        write_slice(&asset.meshlet_bounding_spheres, &mut writer)?;
         writer.finish()?;
 
         Ok(())
     }
 }
 
-impl AssetLoader for MeshletMeshSaverLoader {
+/// An [`AssetLoader`] for `.meshlet_mesh` [`MeshletMesh`] assets.
+pub struct MeshletMeshLoader;
+
+impl AssetLoader for MeshletMeshLoader {
     type Asset = MeshletMesh;
     type Settings = ();
     type Error = MeshletMeshSaveOrLoadError;
@@ -147,18 +172,20 @@ impl AssetLoader for MeshletMeshSaverLoader {
 
         // Load and decompress asset data
         let reader = &mut FrameDecoder::new(AsyncReadSyncAdapter(reader));
-        let vertex_data = read_slice(reader)?;
-        let vertex_ids = read_slice(reader)?;
+        let vertex_positions = read_slice(reader)?;
+        let vertex_normals = read_slice(reader)?;
+        let vertex_uvs = read_slice(reader)?;
         let indices = read_slice(reader)?;
         let meshlets = read_slice(reader)?;
-        let bounding_spheres = read_slice(reader)?;
+        let meshlet_bounding_spheres = read_slice(reader)?;
 
         Ok(MeshletMesh {
-            vertex_data,
-            vertex_ids,
+            vertex_positions,
+            vertex_normals,
+            vertex_uvs,
             indices,
             meshlets,
-            bounding_spheres,
+            meshlet_bounding_spheres,
         })
     }