Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Assorted helpers used for texture checking #1068

Merged
merged 5 commits into from
Mar 17, 2022

Conversation

kainino0x
Copy link
Collaborator

@kainino0x kainino0x commented Mar 16, 2022

These are some assorted helpers that are used in the texture checking helpers in #1055 (WIP).
The commits are separate changes and may be reviewed separately if desired.

Issue: #881


Requirements for PR author:

  • All missing test coverage is tracked with "TODO" or .unimplemented().
  • New helpers are /** documented */ and new helper files are found in helper_index.txt.
  • Test behaves as expected in a WebGPU implementation. (If not passing, explain above.)

Requirements for reviewer sign-off:

  • Tests are properly located in the test tree.
  • Test descriptions allow a reader to "read only the test plans and evaluate coverage completeness", and accurately reflect the test code.
  • Tests provide complete coverage (including validation control cases). Missing coverage MUST be covered by TODOs.
  • Helpers and types promote readability and maintainability.

When landing this PR, be sure to make any necessary issue status updates.

@github-actions
Copy link

Previews, as seen when this build job started (ca02348):
Run tests | View tsdoc

@kainino0x
Copy link
Collaborator Author

@shaoboyan PTAL at this first round of helpers.

The commits are separate changes and may be reviewed separately if desired.

I'm going to add some tests for floatBitsToNumber since that's difficult to be sure is correct.

@github-actions
Copy link

Previews, as seen when this build job started (a3418d9):
Run tests | View tsdoc

@github-actions
Copy link

Previews, as seen when this build job started (75e6df4):
Run tests | View tsdoc

* Subnormal values are flushed to 0.
* Positive and negative 0 are both considered to be 0 ULPs from 0.
*/
export function floatBitsToNormalULPFromZero(bits: number, fmt: FloatFormat): number {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For copy_to_texture 16-bit float(and 32-bit float) result comparasion. With this helper function, it seems that we could check the result by:

// Assume expect is larger than actual
floatBitsToNormalULPFromZero(Uint16(expected) - Uint16(actual), kFloat16Format) < constant?

Am I right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is the direction, I'm a bit worry about the case running time.
As you may know, current compare logic is simple and hack
But it still took longer time because it requires a buffer view reinterpretation.
If we add an extra ops floatBitsToNormalULPFromZero, I think it took longer time.

So maybe a bit hack but do you think it is possible that we took everything as Uint8 as input and do the bit ops? It will save the buffer view reinterpretation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And another option is to save time on the other place rather than the float compare.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Performance is definitely a potential issue and I haven't investigated it enough yet, thanks for highlighting it.
It's even worse than your example code, because it's more like floatBitsToNormalULPFromZero(expected) - floatBitsToNormalULPFromZero(actual).

We have a diffULP helper already for directly determining the ULPs between two values without computing them relative to zero. I'll investigate the performance and see what can be done.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a point of comparison,
webgpu:web_platform,copyToTexture,ImageBitmap:from_ImageData:alpha="none";orientation="none";srcDoFlipYDuringCopy=true;dstColorFormat="rgba16float";*
before: 2500ms each
after: 4300ms each

Not quite as bad as I expected, but could probably be better

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at 7481681, I'm guessing it was float16BitsToFloat32/float16BitsToFloat32. Which is probably a little more expensive than floatBitsToNormalULPFromZero though I wouldn't expect it to be that much worse.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at 7481681, I'm guessing it was float16BitsToFloat32/float16BitsToFloat32. Which is probably a little more expensive than floatBitsToNormalULPFromZero though I wouldn't expect it to be that much worse.

Yes, removing these two helper functions help accelerated the tests a lot but it is still worse than the Uint8 comparation a lot (on my machine) but the same performance as float32 comparation. So I suspect this is due to the reinterpretation (But I think it shouldn't took long time).

webgpu:web_platform,copyToTexture,ImageBitmap:from_ImageData:alpha="none";orientation="none";srcDoFlipYDuringCopy=true;dstColorFormat="rgba16float";*
before: 2500ms each
after: 4300ms each

Thanks for testing! I understand that 4300ms is the time that applying diffULP, right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dug down into the performance of the ImageBitmap:from_ImageData test and found that it was a simple matter of implementing this optimization I had left for myself (in #1055):

    // MAINTENANCE_TODO: Could be faster to actually implement numberToBits directly.
    numberToBits: (components: PerTexelComponent<number>) =>
      ret.unpackBits(new Uint8Array(ret.pack(encode(components)))),

before: 2500ms
draft: 4300ms
after: 2030ms!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool!

Copy link
Contributor

@shaoboyan shaoboyan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reviewed float helpers and getTextureSubCopyLayout implementations. Looks great!

@github-actions
Copy link

Previews, as seen when this build job started (2c7eea0):
Run tests | View tsdoc

@kainino0x
Copy link
Collaborator Author

A few more perf numbers:
Optimizing floatBitsToNumber brought one testcase's total runtime from 4250ms to 3500ms. This was before I did other optimizations. The performance was extremely close to the original float16BitsToFloat32 even though that was specialized to float16, so I didn't add back the specialization.

@github-actions
Copy link

Previews, as seen when this build job started (3382eae):
Run tests | View tsdoc

@shaoboyan shaoboyan self-requested a review March 17, 2022 01:32
const workingDataU32 = new Uint32Array(workingData);
const workingDataF32 = new Float32Array(workingData);
export function float32BitsToNumber(bits: number): number {
workingDataU32[0] = bits;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the take away here is don't create temporary TypedArrayBuffer for reinterpretation.

Copy link
Collaborator Author

@kainino0x kainino0x Mar 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I measured the performance of this briefly while testing a bunch of other things. It didn't have a very large effect, but it was enough that it seemed worth using.

However I tried measuring it against just now (by just moving workingData* inside these functions) and I wasn't able to measure a difference... ~2020ms either way. Maybe it got optimized better somehow when written this way?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, the test case I was using is no longer bottlenecked on this function. I tested a different test case (rgba32float) which is, and the results are good.
webgpu:web_platform,copyToTexture,ImageBitmap:from_ImageData:alpha="premultiply";orientation="flipY";srcDoFlipYDuringCopy=false;dstColorFormat="rgba32float";dstPremultiplied=true

preallocated (this PR): 1640ms
late allocated (same but workingData moved inside the function): 2130ms
array-initialized (new Float32Array(new Uint32Array([bits]).buffer)[0]): 2260ms

incidentally I realized one of these functions is implemented wrong, so fixing that.

Copy link
Contributor

@shaoboyan shaoboyan Mar 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so it seems that the take away is still correct! Thanks for resolving this performance issue!

Copy link
Contributor

@shaoboyan shaoboyan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the iteration!

@kainino0x kainino0x enabled auto-merge (rebase) March 17, 2022 02:18
@kainino0x kainino0x merged commit 14b988a into gpuweb:main Mar 17, 2022
@kainino0x kainino0x deleted the pre-texture-checking branch March 17, 2022 02:19
@austinEng
Copy link
Collaborator

@kainino0x after this PR, webgpu:api,operation,command_buffer,image_copy:mip_levels: tests are hitting an assert

1 | + This is a testharness.js-based test.
2 | + FAIL ;dimension="2d" assert_unreached:
3 | +  - INFO: subcase: copySizeInBlocks={"width":5,"height":4,"depthOrArrayLayers":1};originInBlocks={"x":3,"y":2,"z":0};mipLevel=1;textureSize=[64,48,1]
4 | +  - INFO: subcase: copySizeInBlocks={"width":5,"height":4,"depthOrArrayLayers":1};originInBlocks={"x":3,"y":2,"z":0};mipLevel=1;textureSize=[60,48,1]
5 | +  - EXCEPTION: copySize must be a multiple of the block size
6 | +  at assert (https://web-platform.test:8444/gen/third_party/webgpu-cts/src/common/util/util.js:21:15)
7 | +  at getTextureSubCopyLayout (https://web-platform.test:8444/gen/third_party/webgpu-cts/src/webgpu/util/texture/layout.js:32:5)
8 | +  at getTextureCopyLayout (https://web-platform.test:8444/gen/third_party/webgpu-cts/src/webgpu/util/texture/layout.js:19:20)
9 | +  at ImageCopyTest.uploadTextureAndVerifyCopy (https://web-platform.test:8444/gen/third_party/webgpu-cts/src/webgpu/api/operation/command_buffer/image_copy.spec.js:298:47)
10 | +  at RunCaseSpecific.fn (https://web-platform.test:8444/gen/third_party/webgpu-cts/src/webgpu/api/operation/command_buffer/image_copy.spec.js:1095:7)

@kainino0x
Copy link
Collaborator Author

shoot, guess I should have dry-run these changes as well. Thanks for reporting.

kainino0x added a commit to kainino0x/cts that referenced this pull request Mar 17, 2022
@kainino0x kainino0x mentioned this pull request Mar 17, 2022
7 tasks
@kainino0x
Copy link
Collaborator Author

Fix for at least that bug in #1077

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants