Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[mtl] Borrowed commands #2164

Merged
merged 1 commit into from
Jun 23, 2018
Merged

[mtl] Borrowed commands #2164

merged 1 commit into from
Jun 23, 2018

Conversation

kvark
Copy link
Member

@kvark kvark commented Jun 21, 2018

PR checklist:

  • make succeeds (on *nix)
  • make reftests succeeds
  • tested examples with the following backends:

r? @gfx-rs/metallists

This PR attempts to have lightweight software commands that don't take any heap space or own ObjC objects. In most cases, where a command list is live-recorded and executed once, this should reduce the amount of work we do per command, which is especially important if those commands are thrown away (e.g. because we are not inside a render pass).

My expectation would be to see an improvement in #2161 due to us doing less work. The actual results are somewhat shocking: with v-sync enabled I'm getting the same 59-60 fps as usual. With v-sync OFF, I'm getting between 25 and 50 fps now (which is lower than the previous 50-70). Not sure what's going on, the instrumental profile doesn't give a clue. Please check out the code.

@kvark kvark changed the title [wip][mtl] Borrowed commands [mtl] Borrowed commands Jun 21, 2018
@kvark
Copy link
Member Author

kvark commented Jun 21, 2018

It might be the power management playing tricks with me. When plugged in (as opposed to 50% battery), I see stable 60-70fps. Just in case, I double-checked the version before my PR, and it doesn't appear faster. It's roughly the same (which is also kinda concerning), maybe a few frames per second less. With this in mind, removing the WIP tag, open for reviews.

@kvark kvark mentioned this pull request Jun 21, 2018
3 tasks
if aspects.intersects(Aspects::DEPTH | Aspects::STENCIL) {
commands.extend(com.map(soft::RenderCommand::SetDepthStencilDesc));
}
let render_resources = Some(&self.resources_vs).into_iter().chain(Some(&self.resources_fs));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just iter::once instead of the Option here?

let com_buffers = RENDER_STAGES
.iter()
.zip(render_resources.clone())
.flat_map(|(&stage, resources)| {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would guess there is some extra overhead caused by repeating these identical RENDER_STAGES.iter().zip().flat_map(). Could we chain inside instead?

width: vp.rect.w as _,
height: vp.rect.h as _,
znear: vp.depth.start as _,
zfar: if disabilities.broken_viewport_near_depth {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this has anything to do with the z-fighting?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can check it locally, I suppose, since this flag is only enabled on Intel, and you got AMD IIRC

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm on Intel HD Graphics 6000 1536 MB

);
}

fn set_stencil_mask_values<'a>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The GitHub diff falls apart for a lot of these functions – let me know if there are any significant changes here, for now I'll assume the code has just been relocated

match stage {
pso::Stage::Vertex =>
encoder.set_vertex_bytes(index as _, bytes.len() as _, bytes.as_ptr() as _),
encoder.set_vertex_bytes(index as _, words.len() as u64 * 4, words.as_ptr() as _),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: 4 -> WORD_SIZE or similar? We use this a few times

offset: hal::buffer::Offset,
},
BindBufferData {
stage: hal::pso::Stage,
index: usize,
bytes: Vec<u8>,
words: R::Data,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general I don't think 4-byte alignment is necessary in most places for iOS and tvOS, so we may want to reconsider eventually, or set a WORD_SIZE based on the OS

mem::size_of::<u32>()
).to_owned()
&(r.size as u32) as *const u32,
1,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: mem::size_of::<u32>() / WORD_SIZE if we ever change this (as mentioned in other comment)

.borrow_mut()
.sink()
.pre_render_commands(commands);
if stages.contains(pso::ShaderStageFlags::VERTEX) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The old approach avoids two calls to pre_render_commands here – not sure how the Option compares to this trade-off

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to go through quite a few hoops with the borrow checker...

}


//#[derive(Clone)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: comment and extra newline

Texture(&'a metal::TextureRef),
Frame {
swapchain: RwLockReadGuard<'a, SwapchainInner>,
index: hal::FrameImage,
index: hal::FrameImage
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: comma removed

@kvark
Copy link
Member Author

kvark commented Jun 22, 2018

Thank you for a detailed review! ❤️
Concerns are addressed, let's proceed.
bors r=grovesNL

bors bot added a commit that referenced this pull request Jun 22, 2018
2164: [mtl] Borrowed commands r=grovesNL a=kvark

PR checklist:
- [ ] `make` succeeds (on *nix)
- [x] `make reftests` succeeds
- [x] tested examples with the following backends:

r? @gfx-rs/metallists 

This PR attempts to have lightweight software commands that don't take any heap space or own ObjC objects. In most cases, where a command list is live-recorded and executed once, this should reduce the amount of work we do per command, which is especially important if those commands are thrown away (e.g. because we are not inside a render pass).

My expectation would be to see an improvement in #2161 due to us doing less work. The actual results are somewhat shocking: with v-sync enabled I'm getting the same 59-60 fps as usual. With v-sync OFF, I'm getting between 25 and 50 fps now (which is lower than the previous 50-70). Not sure what's going on, the instrumental profile doesn't give a clue. Please check out the code.

Co-authored-by: Dzmitry Malyshau <[email protected]>
@bors
Copy link
Contributor

bors bot commented Jun 22, 2018

Build failed

Copy link
Contributor

@grovesNL grovesNL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI failure is unrelated:

info: downloading component 'rustc'
error: failed to parse url: 1.28.0-nightly (662c70a59 2018-06-21)/2018-06-22/rustc-nightly-x86_64-unknown-linux-gnu.tar.xz
info: caused by: relative URL without a base
The command "curl -sSf https://sh.rustup.rs | sh -s -- --default-toolchain=$TRAVIS_RUST_VERSION -y" failed and exited with 1 during .

@kvark
Copy link
Member Author

kvark commented Jun 22, 2018 via email

bors bot added a commit that referenced this pull request Jun 22, 2018
2164: [mtl] Borrowed commands r=grovesNL a=kvark

PR checklist:
- [ ] `make` succeeds (on *nix)
- [x] `make reftests` succeeds
- [x] tested examples with the following backends:

r? @gfx-rs/metallists 

This PR attempts to have lightweight software commands that don't take any heap space or own ObjC objects. In most cases, where a command list is live-recorded and executed once, this should reduce the amount of work we do per command, which is especially important if those commands are thrown away (e.g. because we are not inside a render pass).

My expectation would be to see an improvement in #2161 due to us doing less work. The actual results are somewhat shocking: with v-sync enabled I'm getting the same 59-60 fps as usual. With v-sync OFF, I'm getting between 25 and 50 fps now (which is lower than the previous 50-70). Not sure what's going on, the instrumental profile doesn't give a clue. Please check out the code.

Co-authored-by: Dzmitry Malyshau <[email protected]>
@bors
Copy link
Contributor

bors bot commented Jun 22, 2018

Build failed

@kvark
Copy link
Member Author

kvark commented Jun 22, 2018

bors retry

@kvark
Copy link
Member Author

kvark commented Jun 22, 2018

bors r=grovesNL

@bors
Copy link
Contributor

bors bot commented Jun 22, 2018

Not awaiting review

bors bot added a commit that referenced this pull request Jun 22, 2018
2071: Remaping descriptor sets in the gl backend r=kvark a=ZeGentzy

I'll rebase off master when I'm done.

Uniforms in gl only have a bindings field, not a set one. This means that for shaders that use multiple sets to work, we must change where we are binding them.

See page 14 for what I mean: https://www.khronos.org/assets/uploads/developers/library/2016-vulkan-devday-uk/4-Using-spir-v-with-spirv-cross.pdf

PR checklist:
- [ ] `make` succeeds (on *nix)
- [ ] `make reftests` succeeds
- [ ] tested examples with the following backends:


2164: [mtl] Borrowed commands r=grovesNL a=kvark

PR checklist:
- [ ] `make` succeeds (on *nix)
- [x] `make reftests` succeeds
- [x] tested examples with the following backends:

r? @gfx-rs/metallists 

This PR attempts to have lightweight software commands that don't take any heap space or own ObjC objects. In most cases, where a command list is live-recorded and executed once, this should reduce the amount of work we do per command, which is especially important if those commands are thrown away (e.g. because we are not inside a render pass).

My expectation would be to see an improvement in #2161 due to us doing less work. The actual results are somewhat shocking: with v-sync enabled I'm getting the same 59-60 fps as usual. With v-sync OFF, I'm getting between 25 and 50 fps now (which is lower than the previous 50-70). Not sure what's going on, the instrumental profile doesn't give a clue. Please check out the code.

2166: Update example instructions in README.md r=kvark a=king6cong



Co-authored-by: Hal Gentz <[email protected]>
Co-authored-by: Dzmitry Malyshau <[email protected]>
Co-authored-by: king6cong <[email protected]>
@bors
Copy link
Contributor

bors bot commented Jun 22, 2018

Build failed (retrying...)

bors bot added a commit that referenced this pull request Jun 22, 2018
2164: [mtl] Borrowed commands r=grovesNL a=kvark

PR checklist:
- [ ] `make` succeeds (on *nix)
- [x] `make reftests` succeeds
- [x] tested examples with the following backends:

r? @gfx-rs/metallists 

This PR attempts to have lightweight software commands that don't take any heap space or own ObjC objects. In most cases, where a command list is live-recorded and executed once, this should reduce the amount of work we do per command, which is especially important if those commands are thrown away (e.g. because we are not inside a render pass).

My expectation would be to see an improvement in #2161 due to us doing less work. The actual results are somewhat shocking: with v-sync enabled I'm getting the same 59-60 fps as usual. With v-sync OFF, I'm getting between 25 and 50 fps now (which is lower than the previous 50-70). Not sure what's going on, the instrumental profile doesn't give a clue. Please check out the code.

2166: Update example instructions in README.md r=kvark a=king6cong



Co-authored-by: Dzmitry Malyshau <[email protected]>
Co-authored-by: king6cong <[email protected]>
@bors
Copy link
Contributor

bors bot commented Jun 22, 2018

Build failed (retrying...)

bors bot added a commit that referenced this pull request Jun 22, 2018
2164: [mtl] Borrowed commands r=grovesNL a=kvark

PR checklist:
- [ ] `make` succeeds (on *nix)
- [x] `make reftests` succeeds
- [x] tested examples with the following backends:

r? @gfx-rs/metallists 

This PR attempts to have lightweight software commands that don't take any heap space or own ObjC objects. In most cases, where a command list is live-recorded and executed once, this should reduce the amount of work we do per command, which is especially important if those commands are thrown away (e.g. because we are not inside a render pass).

My expectation would be to see an improvement in #2161 due to us doing less work. The actual results are somewhat shocking: with v-sync enabled I'm getting the same 59-60 fps as usual. With v-sync OFF, I'm getting between 25 and 50 fps now (which is lower than the previous 50-70). Not sure what's going on, the instrumental profile doesn't give a clue. Please check out the code.

Co-authored-by: Dzmitry Malyshau <[email protected]>
@bors
Copy link
Contributor

bors bot commented Jun 22, 2018

Build failed

@kvark
Copy link
Member Author

kvark commented Jun 23, 2018 via email

@kvark
Copy link
Member Author

kvark commented Jun 23, 2018 via email

@bors
Copy link
Contributor

bors bot commented Jun 23, 2018

Not awaiting review

bors bot added a commit that referenced this pull request Jun 23, 2018
2164: [mtl] Borrowed commands r=grovesNL a=kvark

PR checklist:
- [ ] `make` succeeds (on *nix)
- [x] `make reftests` succeeds
- [x] tested examples with the following backends:

r? @gfx-rs/metallists 

This PR attempts to have lightweight software commands that don't take any heap space or own ObjC objects. In most cases, where a command list is live-recorded and executed once, this should reduce the amount of work we do per command, which is especially important if those commands are thrown away (e.g. because we are not inside a render pass).

My expectation would be to see an improvement in #2161 due to us doing less work. The actual results are somewhat shocking: with v-sync enabled I'm getting the same 59-60 fps as usual. With v-sync OFF, I'm getting between 25 and 50 fps now (which is lower than the previous 50-70). Not sure what's going on, the instrumental profile doesn't give a clue. Please check out the code.

Co-authored-by: Dzmitry Malyshau <[email protected]>
@bors
Copy link
Contributor

bors bot commented Jun 23, 2018

Build failed

@kvark
Copy link
Member Author

kvark commented Jun 23, 2018

bors retry

@kvark
Copy link
Member Author

kvark commented Jun 23, 2018

bors r=grovesNL

@bors
Copy link
Contributor

bors bot commented Jun 23, 2018

Not awaiting review

bors bot added a commit that referenced this pull request Jun 23, 2018
2164: [mtl] Borrowed commands r=grovesNL a=kvark

PR checklist:
- [ ] `make` succeeds (on *nix)
- [x] `make reftests` succeeds
- [x] tested examples with the following backends:

r? @gfx-rs/metallists 

This PR attempts to have lightweight software commands that don't take any heap space or own ObjC objects. In most cases, where a command list is live-recorded and executed once, this should reduce the amount of work we do per command, which is especially important if those commands are thrown away (e.g. because we are not inside a render pass).

My expectation would be to see an improvement in #2161 due to us doing less work. The actual results are somewhat shocking: with v-sync enabled I'm getting the same 59-60 fps as usual. With v-sync OFF, I'm getting between 25 and 50 fps now (which is lower than the previous 50-70). Not sure what's going on, the instrumental profile doesn't give a clue. Please check out the code.

Co-authored-by: Dzmitry Malyshau <[email protected]>
@bors
Copy link
Contributor

bors bot commented Jun 23, 2018

@bors bors bot merged commit 90fd193 into gfx-rs:master Jun 23, 2018
@kvark kvark deleted the mtl-copy branch June 23, 2018 03:31
bors bot added a commit that referenced this pull request Jun 23, 2018
2169: Persistent Metal swapchain r=grovesNL a=kvark

Includes #2164
Fixes #2168
Fixes #2143
Removes deferred image resolution (yay!), and also renames some of the outdated concepts.
We have no flickering, and proper frame sync now.

PR checklist:
- [x] `make` succeeds (on *nix)
- [x] `make reftests` succeeds
- [x] tested examples with the following backends:


Co-authored-by: Dzmitry Malyshau <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants