-
Notifications
You must be signed in to change notification settings - Fork 547
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[dx11] add memory flush/invalidate & image/buffer copies #2149
Conversation
Makes `Memory` keep track of ranges that buffers are bound to and allows for multiple buffers to be bound to the same `memory` now. This gets us around 50% green in `dEQP-VK.api.buffer`
|
||
uint Uint4ToUint(uint4 data) | ||
{ | ||
data.x = min(data.x, 0x000000ff); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: could probably vectorize this better, e.g.
return dot(min(data, 0xFF), 1 << uint4(0, 8, 16, 24));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Beautiful! 😄
|
||
uint4 UintToUint4(uint data) | ||
{ | ||
return uint4((data & 0xff000000) >> 24, (data & 0xff0000) >> 16, (data & 0xff00) >> 8, data & 0xff); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe even simpler:
return ((data >> uint4(24, 16, 8, 0)) & 0xFF;
RWBuffer<uint> BufferCopyDst: register(u0); | ||
|
||
// R32 | ||
Texture2D<uint> ImageCopySrcR32 : register(t0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should totally share these definitions between different bit sizes (R16/R32/R8/etc)
@@ -24,62 +24,84 @@ pub fn map_index_type(ty: IndexType) -> DXGI_FORMAT { | |||
} | |||
} | |||
|
|||
pub fn typeless_format(format: DXGI_FORMAT) -> Option<DXGI_FORMAT> { | |||
pub fn typeless_format(format: DXGI_FORMAT) -> Option<(DXGI_FORMAT, DXGI_FORMAT)> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the return value semantics now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_TYPELESS
and fully qualified UAV format (eg. R32_UINT
)
src/backend/dx11/src/device.rs
Outdated
// go through every range we wrote to | ||
for range in ranges.into_iter() { | ||
let &(memory, ref range) = range.borrow(); | ||
let range = *range.start().unwrap_or(&0)..*range.end().unwrap_or(&memory.size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need a helper memory.resolve(range)
function for ranges, like we do in Metal.
src/backend/dx11/src/internal.rs
Outdated
assert_eq!(true, winerror::SUCCEEDED(hr)); | ||
fn compile(device: ComPtr<d3d11::ID3D11Device>, entrypoint: &str) -> ComPtr<d3d11::ID3D11ComputeShader> { | ||
let shader_src = include_bytes!("../shaders/copy.hlsl"); | ||
let bytecode = unsafe { ComPtr::from_raw(shader::compile_hlsl_shader(Stage::Compute, spirv_cross::hlsl::ShaderModel::V5_0, entrypoint, shader_src).unwrap()) }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: let's split into multiple lines
@@ -1363,7 +1530,7 @@ pub struct DescriptorSet { | |||
#[derivative(Debug="ignore")] | |||
srv_handles: RefCell<Vec<(u32, ComPtr<d3d11::ID3D11ShaderResourceView>)>>, | |||
#[derivative(Debug="ignore")] | |||
cbv_handles: RefCell<Vec<(u32, ComPtr<d3d11::ID3D11Buffer>)>>, | |||
cbv_handles: RefCell<Vec<(u32, *mut d3d11::ID3D11Buffer)>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why the raw pointer here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was to avoid having a Drop
impl for the Buffer
struct, since portability creates a zeroed struct which gets dropped (and tries to free null ComPtr)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like a bug in portability
than that we need to fix
|
||
[numthreads(1, 1, 1)] | ||
void cs_copy_buffer_image2d_r16(uint3 dispatch_thread_id : SV_DispatchThreadID) { | ||
uint src_idx = BufferImageCopies.BufferVars.x + dispatch_thread_id.x + dispatch_thread_id.y * BufferImageCopies.BufferVars.y / 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why isn't GetBufferSrc
used here?
|
||
uint2 data = UintToUint2(BufferCopySrc[src_idx]); | ||
|
||
ImageCopyDstR16[GetImageDst(uint3(2, 1, 0) * dispatch_thread_id + uint3(0, 0, 0))] = data.y; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure I understand this. The image is R16
, why do we need to write into 2 different texels?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
R32_UINT
is the only format which has guaranteed UAV typed load support[0], so I was aiming for 1 buffer (source) access per thread, so in the case of R16
one thread would have to do 2 writes. The dispatch call is also scaled depending on the format.
0: https://msdn.microsoft.com/en-us/library/windows/desktop/mt427455(v=vs.85).aspx
Fixes previous `Memory` implementation. Now works like the following: ``` 0.........................size +----------------------------+ | Memory | +----------------------------+ A..B C.....D E...F 1 fixed-size `STAGING` buffer which gets used for reading back from resources.(and should be used to copy from/to on flush/invalidate): (0..size, ComPtr<Buffer>) 1 `Vec<u8>` which covers the whole memory range (0..size). This is pointer we hand out to users. flush/invalidate moves the affected regions into our `STAGING` buffer to get read/uploaded. *N* Resources: (A..B, ComPtr<Resource>), (C..D, ComPtr<Resource>), (E..F, ComPtr<Resource>), ``` Implements copying between images and buffers. Image<->Image copies are mostly handled by `CopySubresourceRegion` but some formats, while same size, cant be copied with this method: > Cannot invoke CopySubresourceRegion when the Formats of each Resource are not the same or at least castable to each other, unless one format is compressed (DXGI_FORMAT_R9G9B9E5_SHAREDEXP, or DXGI_FORMAT_BC[1,2,3,4,5]_* ) and the source format is similar to the dest according to: BC[1|4] ~= R16G16B16A16|R32G32, BC[2|3|5] ~= R32G32B32A32, R9G9B9E5_SHAREDEXP ~= R32. [ RESOURCE_MANIPULATION ERROR gfx-rs#281: ] These has to be done through compute shaders instead. Image->Buffer & Buffer->Image copies also have to be done through compute shaders, as `CopySubresourceRegion` can only copy between resources of same type (Image<->Image, Buffer<->Buffer). The following formats have Buffer->Image and Image->Buffer copies implemented with these changes: * `R8` * `Rg8` * `R16` * `Rg16` * `R32`
Updated with some changes, most of the other comments are addressed in the blitting stuff I have in another branch |
bors r+ |
2071: Remaping descriptor sets in the gl backend r=kvark a=ZeGentzy I'll rebase off master when I'm done. Uniforms in gl only have a bindings field, not a set one. This means that for shaders that use multiple sets to work, we must change where we are binding them. See page 14 for what I mean: https://www.khronos.org/assets/uploads/developers/library/2016-vulkan-devday-uk/4-Using-spir-v-with-spirv-cross.pdf PR checklist: - [ ] `make` succeeds (on *nix) - [ ] `make reftests` succeeds - [ ] tested examples with the following backends: 2149: [dx11] add memory flush/invalidate & image/buffer copies r=kvark a=fkaa Main changes are adding more robust implementation of `Memory`, adding flush/invalidate and adding image/buffer copies. Implements `Memory` like the following (for `HOST_VISIBLE` memory): ``` 0.........................size +----------------------------+ | Memory | +----------------------------+ A..B C.....D E...F 1 fixed-size `STAGING` buffer which gets used for reading back from resources.(and should be used to copy from/to on flush/invalidate): (0..size, ComPtr<Buffer>) 1 `Vec<u8>` which covers the whole memory range (0..size). This is pointer we hand out to users. flush/invalidate moves the affected regions into our `STAGING` buffer to get read/uploaded. *N* Resources: (A..B, ComPtr<Resource>), (C..D, ComPtr<Resource>), (E..F, ComPtr<Resource>), ``` Implements copying between images and buffers. Image<->Image copies are mostly handled by `CopySubresourceRegion` but some formats, while same size, cant be copied with this method: > Cannot invoke CopySubresourceRegion when the Formats of each Resource are not the same or at least castable to each other, unless one format is compressed (DXGI_FORMAT_R9G9B9E5_SHAREDEXP, or DXGI_FORMAT_BC[1,2,3,4,5]_* ) and the source format is similar to the dest according to: BC[1|4] ~= R16G16B16A16|R32G32, BC[2|3|5] ~= R32G32B32A32, R9G9B9E5_SHAREDEXP ~= R32. [ RESOURCE_MANIPULATION ERROR #281: ] These has to be done through compute shaders instead. Image->Buffer & Buffer->Image copies also have to be done through compute shaders, as `CopySubresourceRegion` can only copy between resources of same type (Image<->Image, Buffer<->Buffer). The following formats have Buffer->Image and Image->Buffer copies implemented with these changes: * `R8` * `Rg8` * `R16` * `Rg16` * `R32` Gets about 400 tests passed and equal amount failed in `dEQP-VK.api.copy_and_blit.core.image_to_image.all_formats.color` (mostly because `CopySubresourceRegion` failing to copy between some formats as mentioned above) Fixes #issue PR checklist: - [ ] `make` succeeds (on *nix) - [ ] `make reftests` succeeds - [ ] tested examples with the following backends: 2154: hal: Improve buffer documentation and cleanup error handling r=kvark a=msiglreith Fixes #issue PR checklist: - [ ] `make` succeeds (on *nix) - [ ] `make reftests` succeeds - [ ] tested examples with the following backends: Co-authored-by: Hal Gentz <[email protected]> Co-authored-by: Felix Kaaman <[email protected]> Co-authored-by: msiglreith <[email protected]>
Main changes are adding more robust implementation of
Memory
, adding flush/invalidate and adding image/buffer copies. ImplementsMemory
like the following (forHOST_VISIBLE
memory):Implements copying between images and buffers. Image<->Image copies are mostly handled by
CopySubresourceRegion
but some formats, while same size, cant be copied with this method:These has to be done through compute shaders instead. Image->Buffer & Buffer->Image copies also have to be done through compute shaders, as
CopySubresourceRegion
can only copy between resources of same type (Image<->Image, Buffer<->Buffer). The following formats have Buffer->Image and Image->Buffer copies implemented with these changes:R8
Rg8
R16
Rg16
R32
Gets about 400 tests passed and equal amount failed in
dEQP-VK.api.copy_and_blit.core.image_to_image.all_formats.color
(mostly becauseCopySubresourceRegion
failing to copy between some formats as mentioned above)Fixes #issue
PR checklist:
make
succeeds (on *nix)make reftests
succeeds