-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
improve perf of packed depthwise conv2d #4909
Conversation
…he accuracy issue, enabled the flag by default
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wowwwww ping i'm so impressed... and really glad to see this flag finally get turned on... 2 years later!!!
Reviewable status: complete! 1 of 1 approvals obtained (waiting on @lina128 and @pyu10055)
tfjs-backend-webgl/src/conv_packed_gpu_depthwise.ts, line 182 at r1 (raw file):
} else { if (nextTexelOffset === 1) { mainLoop += `
Could add a note about the logic behind this branching?
tfjs-backend-webgl/src/conv_packed_gpu_depthwise.ts, line 282 at r1 (raw file):
mainLoop += ` vec4 wTexelR${r}C${c + 1} = getW(${r}, ${c + 1}, d1, q); dotProd += vec4(xR${r}C${c + 1}.xy * wTexelR${r}C${c + 1}.xz, xR${
I guess creating temporary vars for the xy
/zw
didn't resolve the windows issue?
tfjs-backend-webgl/src/flags_webgl.ts, line 66 at r1 (raw file):
/** Whether we will pack the depthwise conv op. */ // TODO: https://github.com/tensorflow/tfjs/issues/1679 ENV.registerFlag('WEBGL_PACK_DEPTHWISECONV', () => true);
🥳
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 1 approvals obtained (waiting on @annxingyuan, @lina128, and @pyu10055)
tfjs-backend-webgl/src/conv_packed_gpu_depthwise.ts, line 182 at r1 (raw file):
Previously, annxingyuan (Ann Yuan) wrote…
Could add a note about the logic behind this branching?
added
tfjs-backend-webgl/src/conv_packed_gpu_depthwise.ts, line 282 at r1 (raw file):
Previously, annxingyuan (Ann Yuan) wrote…
I guess creating temporary vars for the
xy
/zw
didn't resolve the windows issue?
no, I tried many combinations, temp variable, even accumulate the 4 channels separately with 4 floats. I have not yet find a way to improve accuracy.
but there are some interesting observation, I have used a fixed weight, when the weight is power of 2, no accuracy loss is observed. I am not sure
if that is caused by the multiplication or the addition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you Ping, this is great!
Reviewable status: complete! 2 of 1 approvals obtained (waiting on @annxingyuan and @pyu10055)
tfjs-backend-webgl/src/conv_packed_gpu_depthwise.ts, line 66 at r1 (raw file):
*/ for (let r = 0; r < filterHeight; r++) { for (let texelC = 0; texelC < (texelsAcross / 2 + 1); texelC++) {
Will this condition be calculated in every iteration?
tfjs-backend-webgl/src/conv_packed_gpu_depthwise.ts, line 182 at r1 (raw file):
Previously, pyu10055 (Ping Yu) wrote…
added
I couldn't see the note.
tfjs-backend-webgl/src/conv_packed_gpu_depthwise.ts, line 282 at r1 (raw file):
Previously, pyu10055 (Ping Yu) wrote…
no, I tried many combinations, temp variable, even accumulate the 4 channels separately with 4 floats. I have not yet find a way to improve accuracy.
but there are some interesting observation, I have used a fixed weight, when the weight is power of 2, no accuracy loss is observed. I am not sure
if that is caused by the multiplication or the addition.
Can you add a note about why do the localized dot production, I'm guessing this may help other shaders that have similar logic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 2 of 1 approvals obtained (waiting on @annxingyuan and @pyu10055)
tfjs-backend-webgl/src/conv_packed_gpu_depthwise.ts, line 360 at r2 (raw file):
//intialize dotProd with a small epsilon seems to reduce GPU accuracy loss. vec4 dotProd = vec4(0.000000000000001);
This is counterintuitive, accuracy is improved by imposing a small error. Curious why :)
This is achieved in two ways:
The accuracy issue is still happening, which on mac both pack and unpack shaders have the same accuracy issues comparing to CPU, but their result matches.
On windows, the unpack shader matches with CPU result, which pack version differs.
But the error rate seems to be compatible with the conv2d vs cpu error rate.
refer #1679
To see the logs from the Cloud Build CI, please join either our discussion or announcement mailing list.
This change is