Assorted helpers used for texture checking #1068
@shaoboyan PTAL at this first round of helpers.
I'm going to add some tests for floatBitsToNumber, since it's difficult to be sure that it is correct.
ca02348
to
a3418d9
Compare
 * Subnormal values are flushed to 0.
 * Positive and negative 0 are both considered to be 0 ULPs from 0.
 */
export function floatBitsToNormalULPFromZero(bits: number, fmt: FloatFormat): number {
For the copy_to_texture 16-bit float (and 32-bit float) result comparison: with this helper function, it seems that we could check the result with:
// Assume expected is larger than actual
floatBitsToNormalULPFromZero(Uint16(expected) - Uint16(actual), kFloat16Format) < constant?
Am I right?
If this is the direction, I'm a bit worried about the test case running time.
As you may know, the current comparison logic is simple and hacky,
but it still takes longer because it requires a buffer view reinterpretation.
If we add an extra floatBitsToNormalULPFromZero operation, I think it will take even longer.
So it may be a bit hacky, but do you think it is possible to take everything as Uint8 input and do the bit ops? That would save the buffer view reinterpretation.
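As a rough illustration of the Uint8 idea, the 16-bit pattern of a texel component can be assembled from two bytes with shifts instead of creating a second typed-array view. This is only a sketch: `readU16LE` is a hypothetical name, and it assumes little-endian byte order and an in-bounds offset.

```typescript
// Hypothetical helper: read a 16-bit value straight out of a byte view,
// avoiding a Uint16Array reinterpretation of the same buffer.
// Assumes little-endian layout (low byte first).
function readU16LE(bytes: Uint8Array, byteOffset: number): number {
  return bytes[byteOffset] | (bytes[byteOffset + 1] << 8);
}

// Two float16 components stored as raw bytes: 0x3c00 followed by 0x3c01.
const texelBytes = new Uint8Array([0x00, 0x3c, 0x01, 0x3c]);
const firstComponentBits = readU16LE(texelBytes, 0);
const secondComponentBits = readU16LE(texelBytes, 2);
```

Whether this is actually faster than a view reinterpretation would need to be measured on the test in question.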
Another option is to save time somewhere other than the float comparison.
Performance is definitely a potential issue and I haven't investigated it enough yet, thanks for highlighting it.
It's even worse than your example code, because it's more like floatBitsToNormalULPFromZero(expected) - floatBitsToNormalULPFromZero(actual).
We have a diffULP helper already for directly determining the ULPs between two values without computing them relative to zero. I'll investigate the performance and see what can be done.
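To make the "ULPs from zero" comparison concrete, here is a minimal float16-only sketch. It is not the helper from this PR: the names `f16BitsToUlpFromZero` and `f16UlpDiff` are hypothetical, and the format is hard-coded rather than passed as a `FloatFormat`. It follows the documented behavior that subnormals are flushed to 0 and that both zeros are 0 ULPs from 0.

```typescript
// Hypothetical float16-only sketch; not the CTS implementation.
const F16_MANT_MASK = 0x03ff; // 10 mantissa bits
const F16_EXP_MASK = 0x7c00; // 5 exponent bits
const F16_SIGN_MASK = 0x8000; // 1 sign bit

function f16BitsToUlpFromZero(bits: number): number {
  const exponent = bits & F16_EXP_MASK;
  if (exponent === F16_EXP_MASK) {
    throw new Error('infinity/NaN has no ULP representation');
  }
  // Zero (either sign) and subnormals are flushed to 0.
  if (exponent === 0) return 0;
  // The smallest normal value has bits F16_MANT_MASK + 1, so subtracting
  // F16_MANT_MASK places it exactly 1 ULP away from zero.
  const magnitude = (bits & (F16_EXP_MASK | F16_MANT_MASK)) - F16_MANT_MASK;
  return (bits & F16_SIGN_MASK) !== 0 ? -magnitude : magnitude;
}

// Distance in ULPs between two float16 bit patterns.
function f16UlpDiff(expectedBits: number, actualBits: number): number {
  return Math.abs(f16BitsToUlpFromZero(expectedBits) - f16BitsToUlpFromZero(actualBits));
}

// 0x3c00 is 1.0 in float16; 0x3c01 is the next representable value.
const oneUlpApart = f16UlpDiff(0x3c00, 0x3c01) <= 1;
```

Because each value is mapped to a signed ULP count first, the subtraction also works across the sign boundary, which a raw subtraction of the bit patterns would not.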
As a point of comparison,
webgpu:web_platform,copyToTexture,ImageBitmap:from_ImageData:alpha="none";orientation="none";srcDoFlipYDuringCopy=true;dstColorFormat="rgba16float";*
before: 2500ms each
after: 4300ms each
Not quite as bad as I expected, but could probably be better
Looking at 7481681, I'm guessing it was float16BitsToFloat32/float16BitsToFloat32. Which is probably a little more expensive than floatBitsToNormalULPFromZero though I wouldn't expect it to be that much worse.
Looking at 7481681, I'm guessing it was float16BitsToFloat32/float16BitsToFloat32. Which is probably a little more expensive than floatBitsToNormalULPFromZero though I wouldn't expect it to be that much worse.
Yes, removing these two helper functions accelerated the tests a lot, but it is still much slower than the Uint8 comparison (on my machine) and performs the same as the float32 comparison. So I suspect this is due to the reinterpretation (though I don't think that should take long).
webgpu:web_platform,copyToTexture,ImageBitmap:from_ImageData:alpha="none";orientation="none";srcDoFlipYDuringCopy=true;dstColorFormat="rgba16float";*
before: 2500ms each
after: 4300ms each
Thanks for testing! I understand that 4300ms is the time with diffULP applied, right?
I dug down into the performance of the ImageBitmap:from_ImageData test and found that it was a simple matter of implementing this optimization I had left for myself (in #1055):
// MAINTENANCE_TODO: Could be faster to actually implement numberToBits directly.
numberToBits: (components: PerTexelComponent<number>) =>
ret.unpackBits(new Uint8Array(ret.pack(encode(components)))),
before: 2500ms
draft: 4300ms
after: 2030ms!
before: 1490ms
after: 1410ms
Cool!
I've reviewed the float helpers and getTextureSubCopyLayout implementations. Looks great!
a4403b9
to
3382eae
Compare
A few more perf numbers:
const workingDataU32 = new Uint32Array(workingData);
const workingDataF32 = new Float32Array(workingData);
export function float32BitsToNumber(bits: number): number {
workingDataU32[0] = bits;
I think the takeaway here is: don't create a temporary TypedArray buffer for reinterpretation.
I measured the performance of this briefly while testing a bunch of other things. It didn't have a very large effect, but it was enough that it seemed worth using.
However, I tried measuring it again just now (by just moving workingData* inside these functions) and I wasn't able to measure a difference... ~2020ms either way. Maybe it got optimized better somehow when written this way?
Oh, the test case I was using is no longer bottlenecked on this function. I tested a different test case (rgba32float) which is, and the results are good.
webgpu:web_platform,copyToTexture,ImageBitmap:from_ImageData:alpha="premultiply";orientation="flipY";srcDoFlipYDuringCopy=false;dstColorFormat="rgba32float";dstPremultiplied=true
preallocated (this PR): 1640ms
late allocated (same but workingData moved inside the function): 2130ms
array-initialized (new Float32Array(new Uint32Array([bits]).buffer)[0]): 2260ms
Incidentally, I realized one of these functions is implemented wrong, so I'm fixing that.
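For reference, the preallocated and array-initialized variants being compared can be sketched side by side as below. The definition of `workingData` as a 4-byte `ArrayBuffer` is an assumption (the diff excerpt does not show it), and the function names with suffixes are hypothetical, used here only to tell the variants apart.

```typescript
// Assumed definition: one shared 4-byte buffer with two views over it.
const workingData = new ArrayBuffer(4);
const workingDataU32 = new Uint32Array(workingData);
const workingDataF32 = new Float32Array(workingData);

// Preallocated variant: no allocation per call, only a write and a read
// through two views of the same bytes.
function float32BitsToNumberPreallocated(bits: number): number {
  workingDataU32[0] = bits;
  return workingDataF32[0];
}

// Array-initialized variant: allocates a fresh buffer and two views per call.
function float32BitsToNumberArrayInitialized(bits: number): number {
  return new Float32Array(new Uint32Array([bits]).buffer)[0];
}

// Both variants should agree on every bit pattern; a few spot checks:
const sampleBits = [0x00000000, 0x3f800000, 0x40000000, 0xc0000000];
const variantsAgree = sampleBits.every(
  b => float32BitsToNumberPreallocated(b) === float32BitsToNumberArrayInitialized(b)
);
```

The shared buffer makes the preallocated variant non-reentrant, which is acceptable for synchronous single-threaded use like this but worth keeping in mind.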
OK, so it seems that the takeaway is still correct! Thanks for resolving this performance issue!
LGTM, thanks for the iteration!
3382eae
to
2d74be6
Compare
2d74be6
to
0e8bf3d
Compare
@kainino0x after this PR, webgpu:api,operation,command_buffer,image_copy:mip_levels: tests are hitting an 1 | + This is a testharness.js-based test. |
Shoot, guess I should have dry-run these changes as well. Thanks for reporting.
Fix for at least that bug in #1077
These are some assorted helpers that are used in the texture checking helpers in #1055 (WIP).
The commits are separate changes and may be reviewed separately if desired.
Issue: #881