Assorted helpers used for texture checking #1068
```diff
@@ -101,11 +101,76 @@ export function float32ToFloat16Bits(n: number) {
  * Decodes an IEEE754 16 bit floating point number into a JS `number` and returns.
  */
 export function float16BitsToFloat32(float16Bits: number): number {
-  const buf = new DataView(new ArrayBuffer(Float32Array.BYTES_PER_ELEMENT));
-  // shift exponent and mantissa bits and fill with 0 on right, shift sign bit
-  buf.setUint32(0, ((float16Bits & 0x7fff) << 13) | ((float16Bits & 0x8000) << 16), true);
-  // shifting for bias difference: f16 uses a bias of 15, f32 uses a bias of 127
-  return buf.getFloat32(0, true) * 2 ** (127 - 15);
+  return floatBitsToNumber(float16Bits, kFloat16Format);
 }
+
+type FloatFormat = { signed: 0 | 1; exponentBits: number; mantissaBits: number; bias: number };
+
+/** FloatFormat defining IEEE754 32-bit float. */
+export const kFloat32Format = { signed: 1, exponentBits: 8, mantissaBits: 23, bias: 127 } as const;
+/** FloatFormat defining IEEE754 16-bit float. */
+export const kFloat16Format = { signed: 1, exponentBits: 5, mantissaBits: 10, bias: 15 } as const;
+
+const workingData = new ArrayBuffer(4);
+const workingDataU32 = new Uint32Array(workingData);
+const workingDataF32 = new Float32Array(workingData);
+/** Bitcast u32 (represented as integer Number) to f32 (represented as floating-point Number). */
+export function float32BitsToNumber(bits: number): number {
+  workingDataU32[0] = bits;
+  return workingDataF32[0];
+}
+/** Bitcast f32 (represented as floating-point Number) to u32 (represented as integer Number). */
+export function numberToFloat32Bits(number: number): number {
+  workingDataF32[0] = number;
+  return workingDataU32[0];
+}
```
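As a quick sanity check on the bitcast helpers above, here is a self-contained sketch (illustrative, not part of the PR) that repeats the preallocated-view pattern and round-trips some well-known f32 bit patterns:

```typescript
// Single preallocated 4-byte buffer with two views over the same memory,
// mirroring the workingData pattern in the diff above.
const workingData = new ArrayBuffer(4);
const workingDataU32 = new Uint32Array(workingData);
const workingDataF32 = new Float32Array(workingData);

/** Bitcast u32 bits to the f32 value they encode. */
function float32BitsToNumber(bits: number): number {
  workingDataU32[0] = bits;
  return workingDataF32[0];
}

/** Bitcast an f32 value to its u32 bit pattern. */
function numberToFloat32Bits(n: number): number {
  workingDataF32[0] = n;
  return workingDataU32[0];
}

// 0x3f800000 is the IEEE754 binary32 encoding of 1.0.
console.log(float32BitsToNumber(0x3f800000)); // 1
console.log(numberToFloat32Bits(-1).toString(16)); // bf800000
```

Since this is pure reinterpretation (no numeric conversion), any finite bit pattern round-trips exactly through the pair.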
```diff
+
+/**
+ * Decodes an IEEE754 float with the supplied format specification into a JS number.
+ *
+ * The format MUST be no larger than a 32-bit float.
+ */
+export function floatBitsToNumber(bits: number, fmt: FloatFormat): number {
+  // Pad the provided bits out to f32, then convert to a `number` with the wrong bias.
+  // E.g. for f16 to f32:
+  // - f16: S    EEEEE          MMMMMMMMMM
+  //        ^ 000^^^^^ ^^^^^^^^^^0000000000000
+  // - f32: S eeeEEEEE MMMMMMMMMMmmmmmmmmmmmmm
+
+  const kNonSignBits = fmt.exponentBits + fmt.mantissaBits;
+  const kNonSignBitsMask = (1 << kNonSignBits) - 1;
+  const expAndMantBits = bits & kNonSignBitsMask;
+  let f32BitsWithWrongBias = expAndMantBits << (kFloat32Format.mantissaBits - fmt.mantissaBits);
+  f32BitsWithWrongBias |= (bits << (31 - kNonSignBits)) & 0x8000_0000;
+  const numberWithWrongBias = float32BitsToNumber(f32BitsWithWrongBias);
+  return numberWithWrongBias * 2 ** (kFloat32Format.bias - fmt.bias);
+}
```
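To make the bias trick concrete, the following standalone sketch inlines the pieces above and decodes a few well-known f16 bit patterns (illustrative code, not the CTS source):

```typescript
type FloatFormat = { signed: 0 | 1; exponentBits: number; mantissaBits: number; bias: number };
const kFloat32Format = { signed: 1, exponentBits: 8, mantissaBits: 23, bias: 127 } as const;
const kFloat16Format = { signed: 1, exponentBits: 5, mantissaBits: 10, bias: 15 } as const;

// Preallocated views for the u32 -> f32 bitcast.
const workingData = new ArrayBuffer(4);
const workingDataU32 = new Uint32Array(workingData);
const workingDataF32 = new Float32Array(workingData);
function float32BitsToNumber(bits: number): number {
  workingDataU32[0] = bits;
  return workingDataF32[0];
}

/** Decode `bits` of a float in format `fmt` (no wider than f32) into a JS number. */
function floatBitsToNumber(bits: number, fmt: FloatFormat): number {
  const kNonSignBits = fmt.exponentBits + fmt.mantissaBits;
  const kNonSignBitsMask = (1 << kNonSignBits) - 1;
  const expAndMantBits = bits & kNonSignBitsMask;
  // Left-align exponent+mantissa into the f32 layout, then place the sign bit.
  let f32BitsWithWrongBias = expAndMantBits << (kFloat32Format.mantissaBits - fmt.mantissaBits);
  f32BitsWithWrongBias |= (bits << (31 - kNonSignBits)) & 0x8000_0000;
  const numberWithWrongBias = float32BitsToNumber(f32BitsWithWrongBias);
  // The exponent was decoded with f32's bias (127); rescale to fmt's bias.
  return numberWithWrongBias * 2 ** (kFloat32Format.bias - fmt.bias);
}

console.log(floatBitsToNumber(0x3c00, kFloat16Format)); // 1
console.log(floatBitsToNumber(0xc000, kFloat16Format)); // -2
console.log(floatBitsToNumber(0x7bff, kFloat16Format)); // 65504 (max finite f16)
```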
```diff
+
+/**
+ * Given a floating point number (as an integer representing its bits), computes how many ULPs it is
+ * from zero.
+ *
+ * Subnormal numbers are skipped, so that 0 is one ULP from the minimum normal number.
+ * Subnormal values are flushed to 0.
+ * Positive and negative 0 are both considered to be 0 ULPs from 0.
+ */
+export function floatBitsToNormalULPFromZero(bits: number, fmt: FloatFormat): number {
```
For copy_to_texture 16-bit float (and 32-bit float) result comparison: with this helper function, it seems that we could check the result by: […] Am I right?

If this is the direction, I'm a bit worried about the test-case running time. So, maybe a bit of a hack, but do you think it is possible that we take everything as Uint8 input and do the bit ops? It would save the buffer-view reinterpretation.

And another option is to save time somewhere other than the float compare.

Performance is definitely a potential issue and I haven't investigated it enough yet; thanks for highlighting it. We have a […]

As a point of comparison, […] Not quite as bad as I expected, but could probably be better.

Looking at 7481681, I'm guessing it was […]

Yes, removing these two helper functions accelerated the tests a lot, but it is still much slower than the Uint8 comparison (on my machine), and the same speed as the float32 comparison. So I suspect this is due to the reinterpretation (but I don't think that should take a long time).

Thanks for testing! I understand that 4300ms is the time after applying […]

I dug into the performance of the ImageBitmap:from_ImageData test and found that it was a simple matter of implementing this optimization I had left for myself (in #1055):

```ts
// MAINTENANCE_TODO: Could be faster to actually implement numberToBits directly.
numberToBits: (components: PerTexelComponent<number>) =>
  ret.unpackBits(new Uint8Array(ret.pack(encode(components)))),
```

before: 2500ms […] before: 1490ms […]

Cool!
```diff
+  const mask_sign = fmt.signed << (fmt.exponentBits + fmt.mantissaBits);
+  const mask_expt = ((1 << fmt.exponentBits) - 1) << fmt.mantissaBits;
+  const mask_mant = (1 << fmt.mantissaBits) - 1;
+  const mask_rest = mask_expt | mask_mant;
+
+  assert(fmt.exponentBits + fmt.mantissaBits <= 31);
+
+  const sign = bits & mask_sign ? -1 : 1;
+  const rest = bits & mask_rest;
+  const subnormal_or_zero = (bits & mask_expt) === 0;
+  const infinity_or_nan = (bits & mask_expt) === mask_expt;
+  assert(!infinity_or_nan, 'no ulp representation for infinity/nan');
+
+  // The first normal number is mask_mant+1, so subtract mask_mant to make min_normal - zero = 1ULP.
+  const abs_ulp_from_zero = subnormal_or_zero ? 0 : rest - mask_mant;
+  return sign * abs_ulp_from_zero;
+}
```
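The ULP-from-zero helper turns float comparison into integer comparison: two values are "close" if their ULP distances from zero differ by at most some tolerance. A standalone sketch, with a hypothetical `withinULP` helper added purely for illustration (it is not in the PR):

```typescript
type FloatFormat = { signed: 0 | 1; exponentBits: number; mantissaBits: number; bias: number };
const kFloat16Format: FloatFormat = { signed: 1, exponentBits: 5, mantissaBits: 10, bias: 15 };

/** ULP distance from zero, flushing subnormals to 0 (as in the diff above). */
function floatBitsToNormalULPFromZero(bits: number, fmt: FloatFormat): number {
  const mask_sign = fmt.signed << (fmt.exponentBits + fmt.mantissaBits);
  const mask_expt = ((1 << fmt.exponentBits) - 1) << fmt.mantissaBits;
  const mask_mant = (1 << fmt.mantissaBits) - 1;
  const mask_rest = mask_expt | mask_mant;

  const sign = bits & mask_sign ? -1 : 1;
  const rest = bits & mask_rest;
  const subnormal_or_zero = (bits & mask_expt) === 0;
  const infinity_or_nan = (bits & mask_expt) === mask_expt;
  if (infinity_or_nan) throw new Error('no ulp representation for infinity/nan');

  // The first normal number has bits mask_mant+1, so min_normal - zero === 1 ULP.
  const abs_ulp_from_zero = subnormal_or_zero ? 0 : rest - mask_mant;
  return sign * abs_ulp_from_zero;
}

// Hypothetical tolerance check: are two f16 bit patterns within n ULPs of each other?
function withinULP(aBits: number, bBits: number, n: number): boolean {
  const d =
    floatBitsToNormalULPFromZero(aBits, kFloat16Format) -
    floatBitsToNormalULPFromZero(bBits, kFloat16Format);
  return Math.abs(d) <= n;
}

console.log(floatBitsToNormalULPFromZero(0x0400, kFloat16Format)); // 1 (min normal f16)
console.log(floatBitsToNormalULPFromZero(0x0001, kFloat16Format)); // 0 (subnormal, flushed)
console.log(withinULP(0x3c00, 0x3c01, 1)); // true (1.0 vs the next representable f16)
```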
New file (hunk `@@ -0,0 +1,51 @@`):

```ts
import { range } from '../../common/util/util.js';

/**
 * Pretty-prints a "table" of cell values (each being `number | string`), right-aligned.
 * Each row may be any iterator, including lazily-generated (potentially infinite) rows.
 *
 * The first argument is the printing options:
 * - fillToWidth: Keep printing columns (as long as there is data) until this width is passed.
 *   If there is more data, "..." is appended.
 * - numberToString: if a cell value is a number, this is used to stringify it.
 *
 * Each remaining argument provides one row for the table.
 */
export function generatePrettyTable(
  { fillToWidth, numberToString }: { fillToWidth: number; numberToString: (n: number) => string },
  rows: ReadonlyArray<Iterable<string | number>>
): string {
  const rowStrings = range(rows.length, () => '');
  let totalTableWidth = 0;
  const iters = rows.map(row => row[Symbol.iterator]());

  // Loop over columns
  for (;;) {
    const cellsForColumn = iters.map(iter => {
      const r = iter.next(); // Advance the iterator for each row, in lock-step.
      return r.done ? undefined : typeof r.value === 'number' ? numberToString(r.value) : r.value;
    });
    if (cellsForColumn.every(cell => cell === undefined)) break;

    // Maximum width of any cell in this column, plus one for space between columns
    // (also inserts a space at the left of the first column).
    const colWidth = Math.max(...cellsForColumn.map(c => (c === undefined ? 0 : c.length))) + 1;
    for (let row = 0; row < rowStrings.length; ++row) {
      const cell = cellsForColumn[row];
      if (cell !== undefined) {
        rowStrings[row] += cell.padStart(colWidth);
      }
    }

    totalTableWidth += colWidth;
    if (totalTableWidth >= fillToWidth) {
      for (let row = 0; row < rowStrings.length; ++row) {
        if (cellsForColumn[row] !== undefined) {
          rowStrings[row] += ' ...';
        }
      }
      break;
    }
  }
  return rowStrings.join('\n');
}
```
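A quick usage sketch of the new helper. The body is copied from the diff above; the `range` import is replaced by a local stand-in since the CTS util module isn't available here, and the demo data is made up:

```typescript
/** Local stand-in for the CTS `range` helper: [fn(0), ..., fn(n-1)]. */
function range<T>(n: number, fn: (i: number) => T): T[] {
  return Array.from({ length: n }, (_, i) => fn(i));
}

/** Pretty-prints right-aligned rows, stopping with "..." once fillToWidth is passed. */
function generatePrettyTable(
  { fillToWidth, numberToString }: { fillToWidth: number; numberToString: (n: number) => string },
  rows: ReadonlyArray<Iterable<string | number>>
): string {
  const rowStrings = range(rows.length, () => '');
  let totalTableWidth = 0;
  const iters = rows.map(row => row[Symbol.iterator]());

  // Loop over columns, advancing all row iterators in lock-step.
  for (;;) {
    const cellsForColumn = iters.map(iter => {
      const r = iter.next();
      return r.done ? undefined : typeof r.value === 'number' ? numberToString(r.value) : r.value;
    });
    if (cellsForColumn.every(cell => cell === undefined)) break;

    // Widest cell in this column, plus one space of separation.
    const colWidth = Math.max(...cellsForColumn.map(c => (c === undefined ? 0 : c.length))) + 1;
    for (let row = 0; row < rowStrings.length; ++row) {
      const cell = cellsForColumn[row];
      if (cell !== undefined) rowStrings[row] += cell.padStart(colWidth);
    }

    totalTableWidth += colWidth;
    if (totalTableWidth >= fillToWidth) {
      for (let row = 0; row < rowStrings.length; ++row) {
        if (cellsForColumn[row] !== undefined) rowStrings[row] += ' ...';
      }
      break;
    }
  }
  return rowStrings.join('\n');
}

// Hypothetical expected-vs-actual texel data for the demo.
const table = generatePrettyTable({ fillToWidth: 60, numberToString: n => n.toFixed(2) }, [
  ['component', 'R', 'G', 'B'],
  ['expected', 0.5, 0.25, 1],
  ['actual', 0.5, 0.25, 0.99],
]);
console.log(table);
```

Because the rows are plain iterables, the same call works with lazily-generated rows, e.g. a generator yielding one cell per texel.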
I think the takeaway here is: don't create a temporary TypedArray/ArrayBuffer just for reinterpretation.
I measured the performance of this briefly while testing a bunch of other things. It didn't have a very large effect, but it was enough that it seemed worth using. However, I tried measuring it again just now (by just moving the `workingData*` declarations inside these functions) and I wasn't able to measure a difference: ~2020ms either way. Maybe it got optimized better somehow when written this way?

Oh, the test case I was using is no longer bottlenecked on this function. I tested a different test case (rgba32float) which is, and the results are good:

`webgpu:web_platform,copyToTexture,ImageBitmap:from_ImageData:alpha="premultiply";orientation="flipY";srcDoFlipYDuringCopy=false;dstColorFormat="rgba32float";dstPremultiplied=true`

- preallocated (this PR): 1640ms
- late-allocated (same, but `workingData` moved inside the function): 2130ms
- array-initialized (`new Float32Array(new Uint32Array([bits]).buffer)[0]`): 2260ms

Incidentally, I realized one of these functions is implemented wrong, so I'm fixing that.
OK, so it seems that the takeaway is still correct! Thanks for resolving this performance issue!