Conversation
Force-pushed from 076c547 to cb17503.
```diff
@@ -20,16 +20,14 @@ class ImageClassificationModel {
         throw Error('Fails to initialize neural network context');
       }
       this._nn = nnNative;
-    } else if (this._backend === 'WASM' || this._backend === 'WebGL2') {
+    } else if (this._backend === 'WASM' || this._backend === 'WebGL') {
```
I renamed all WebGL2 to WebGL in polyfill and examples.
```diff
-      if (this._backend === 'WebGL2') {
-        options.useWebGL2 = true;
-      }
+      options.backend = this._backend;
       this._model = await this._nn.createModel(options);
```
Replaced `options.useWebGL2` with `options.backend`. For polyfill, `options.backend` should be the string `'WASM'` or `'WebGL'`.
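The new option scheme can be sketched with a small helper. This is a hypothetical illustration, not code from the PR; `buildModelOptions` is an assumed name.

```javascript
// Hypothetical helper mirroring the new option scheme: map a polyfill
// backend name to the options object handed to createModel.
function buildModelOptions(backend) {
  if (backend !== 'WASM' && backend !== 'WebGL') {
    throw new Error(`Unsupported polyfill backend: ${backend}`);
  }
  // The old scheme set options.useWebGL2 = true; the new one simply
  // passes the backend name through.
  return { backend };
}

console.log(buildModelOptions('WebGL')); // { backend: 'WebGL' }
```

Passing `'WebGL2'` now throws, which matches the rename described above.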
package.json
Outdated
```diff
@@ -35,7 +35,8 @@
     "ndarray-ops": "^1.2.2",
     "ndarray-squeeze": "^1.0.2",
     "selenium-webdriver": "^3.1.0",
-    "webpack": "^3.5.5"
+    "webpack": "^3.5.5",
+    "@tensorflow/tfjs-core": "^0.13.6"
```
```sh
cd webml-polyfill
npm install
```
I changed the backend name to 'WebGL' and revised some interfaces.
@GreyZzzzzzXh yes, we will update the test cases & benchmark tests for testing the WASM / WebGL polyfill backends, thanks.
@GreyZzzzzzXh Seems inception_v3 and squeezenet for image_classification_tflite have some issues on my machine:
inception_v3:
But the result was correct on @GreyZzzzzzXh's and @pinzhenx's machines with the same code.
And the results differ between the PC and a mobile phone (OnePlus 3T, Chrome) with the WebGL backend. For example, with mobilenet_v1:
Mobile phone:
It seems like the same issue as what Wenyao wrote in his graduation paper:
@Wenzhao-Xiang, thanks for testing. There are still some issues on mobile devices.
Maybe, but if so we would have to revise the source code of tfjs. It may take some time to figure out, so perhaps we should keep the WebGL2 backend and name this new backend TFJS temporarily.
Can we make a reproducible test case for the TensorFlow.js team? For example, put it at a publicly accessible link on your GitHub pages. We can file a bug to TF.js, and I can help bridge TF.js folks to look into it.
I host the modified code on my github pages. Test by CTS: https://greyzzzzzzxh.github.io/webml/test/index.html?backend=webgl&grep=CTS |
Thanks for doing that! Besides the examples link, could you please give the steps to reproduce the issue? For example: which example, which backend, which model? Please also give the expected results, and the incorrect result when the issue happens. Also please give the test configuration, e.g. Chrome version (from chrome://version/), GPU driver version (from chrome://gpu/), etc. Please consult @BruceDai for this kind of bug reporting. Thanks!
@huningxin, ok, I'll do more tests and provide detailed info next week, thanks!
The result on the mobile end is different from that on the PC side. Test Env: Expected Result:
Mobilenet v1:
Mobilenet v2:
Inception v3:
Squeezenet:
Actual Result: The output doesn't match the expected result. Mobilenet v1:
Mobilenet v2:
Inception v3:
Squeezenet:
How to Reproduce:
Results and reproduction steps are described above, please take a look @huningxin.
@GreyZzzzzzXh, thanks! How about the expected results? And please list the devices which deliver the expected results. That would also be helpful.
The data in the first four tables is the expected result.
PC platform, e.g.
Besides, visit https://greyzzzzzzxh.github.io/webml/test/index.html?backend=webgl&grep=CTS for case testing. Computer side:
But many cases fail on the mobile device:
@GreyZzzzzzXh, please complete the actual results in #297 (comment). Thanks! BTW, when testing on the device with incorrect results, are there any errors reported in the console?
Done. No errors or warnings reported in the console.
Besides, tested on the tfjs-converter mobilenet demo, tensorflow.js still shows different precision on the computer side and the mobile phone side. Test Env: Results:
Thanks for these details!
@GreyZzzzzzXh, could you please take a look at tensorflow/tfjs#265. It sounds like tfjs uses float16 on mobile. Is that the root cause?
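The float16 hypothesis can be illustrated numerically. The sketch below is an assumption for illustration (not tfjs code): it emulates float16's 10-bit mantissa to show how small differences in activations vanish at 16-bit precision.

```javascript
// Sketch (assumed, simplified): round a number to the nearest value
// representable in IEEE float16, which has a 10-bit mantissa.
function roundToFloat16(x) {
  if (x === 0) return 0;
  const f32 = Math.fround(x); // start from the float32 value
  // Step size between representable float16 values near |x|.
  const exp = Math.floor(Math.log2(Math.abs(f32)));
  const step = Math.pow(2, exp - 10); // 10 mantissa bits in float16
  return Math.round(f32 / step) * step;
}

console.log(roundToFloat16(1.0001)); // 1 -- the 0.0001 difference is lost
console.log(roundToFloat16(1.5));    // 1.5 -- exactly representable
```

If intermediate tensors are quantized like this on mobile GPUs, small score differences between classes can collapse, which would explain the diverging classification results.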
Thanks! I will do some investigation on this.
This precision issue can be fixed by upgrading the GLSL version to 300 es.
Force-pushed from cb17503 to 8f087a2.
Force-pushed from 8f087a2 to 0403bf2.
Force-pushed from 6988770 to bb84007.
According to GreyZzzzzzXh/tfjs-core@39a56bf, could you please explain the root cause and your method of "upgrading GLSL to version 300 es"? I found you checked code into tfjs-core; I would suggest avoiding that. Let's figure out what needs to be fixed in tfjs-core, then propose a fix to the tfjs-core repo.
In general, WebGL 1.0 and 2.0 use different versions of the shading language (GLSL 100 for WebGL 1.0 and GLSL 300 es for WebGL 2.0). But in tfjs, only GLSL 100 is used as the shading language for both WebGL 1.0 and 2.0. The inaccuracy on the phone seems to be because the GLSL version also has an impact on precision. After I changed GLSL 100 to 300 es, floats achieved 32-bit precision on the phone, where originally they had only 16 bits (tested with
I added some comments about how to upgrade GLSL version in GreyZzzzzzXh/tfjs-core@39a56bf.
Besides, I set
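The shape of such an upgrade can be sketched as below. This is an assumed, simplified illustration, not the actual change in GreyZzzzzzXh/tfjs-core@39a56bf: it prepends the GLSL ES 3.00 header and requests `highp` floats, which ES 3.00 guarantees are 32-bit, whereas a GLSL ES 1.00 fragment shader may fall back to 16-bit precision on mobile GPUs.

```javascript
// Sketch (assumed helper name): rewrite a GLSL ES 1.00 fragment shader
// source string toward GLSL ES 3.00.
function upgradeFragmentShaderTo300es(glsl100Source) {
  const header = '#version 300 es\nprecision highp float;\n';
  // GLSL ES 3.00 renames 'varying' to 'in' in fragment shaders.
  return header + glsl100Source.replace(/\bvarying\b/g, 'in');
}

const src = 'varying vec2 resultUV;\nvoid main() { }';
console.log(upgradeFragmentShaderTo300es(src).split('\n')[0]); // #version 300 es
```

A real port also has to replace `gl_FragColor` with a declared `out` variable and update texture sampling calls, which this sketch omits.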
Yeah, the best way is for the tfjs team to solve this problem. But if it takes a while to fix, I suggest importing the modified tfjs for temporary use.
I found something that might help:
Please go ahead to file a bug and open a PR to tfjs-core with your solution.
Please don't import source code. If we want to maintain a version of tfjs-core, you can publish your version in npm and npm install from there.
Probably, we can report this out. Then test cases can handle the lower precision backend differently. @BruceDai |
I published the modified tfjs-core on npm, so now we can install it from there. Next, I will improve this fix and open a PR to tfjs-core.
src/nn/webgl/WebGLModel.js
Outdated
```js
  prepareModel() {
    this._model._operands.forEach(operand => {
      if (utils.isTensor(operand.type)) {
        let type = this._getOperandType(operand.type);
```
`const`?
src/nn/webgl/WebGLModel.js
Outdated
```js
      output.buffer.set(operand.dataSync());
    });

    // console.log(tf.memory());
```
remove this comment?
src/nn/webgl/WebGLModel.js
Outdated
```js
    }

    inputs.forEach(input => {
      let operand = this._operands[input.index];
```
const?
src/nn/webgl/WebGLModel.js
Outdated
```js
    inputs.forEach(input => {
      let operand = this._operands[input.index];
      let inputTensor = tf.tensor(input.buffer, operand.shape, operand.dtype);
```
const?
src/nn/webgl/WebGLModel.js
Outdated
```js
    switch (op) {
      case OperationCode.ADD: {
        let in1 = operands[inputs[0]];
```
const?
src/nn/webgl/WebGLModel.js
Outdated
```js
        let input = operands[inputs[0]];
        let targetShape = operands[inputs[1]];
        let output = operands[outputs[0]];
        output.assign(input.reshape(targetShape.dataSync()));
```
do we need to do `dataSync`? I understand it leads to a memory read back from GPU to CPU, which is bad for performance.
According to the definition of RESHAPE in the NN API, targetShape is a 1-D tensor whose value is stored on the GPU. But for tf.reshape, the target shape should be an array of integers.
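The mismatch described here can be shown in a pure-JS sketch (hypothetical names, with a stub standing in for a real tensor): the target shape arrives as a tensor-like operand whose values live on the GPU, while a `tf.reshape`-style call wants a plain integer array, so a `dataSync()`-style read-back is hard to avoid.

```javascript
// Hypothetical helper: convert a shape operand into the integer array
// that a tf.reshape-style API expects. dataSync() stands in for the
// GPU -> CPU read-back flagged in the review comment above.
function targetShapeAsArray(shapeOperand) {
  return Array.from(shapeOperand.dataSync());
}

// Stub operand simulating a 1-D int32 tensor holding the new shape.
const fakeShapeOperand = { dataSync: () => new Int32Array([3, 2]) };
console.log(targetShapeAsArray(fakeShapeOperand)); // [ 3, 2 ]
```

One possible mitigation, since RESHAPE's shape input is usually a constant, is to read it back once at model-preparation time instead of on every inference.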
src/nn/webgl/WebGLModel.js
Outdated
```js
        output.assign(input.reshape(targetShape.dataSync()));
      } break;
      case OperationCode.CONCATENATION: {
        if (outputs.length < 1 || inputs.length < 2) {
```
simplify the condition?
since tfjs will give detailed reasons for errors, can we remove this check?
src/nn/webgl/WebGLModel.js
Outdated
```js
        let bias = operands[inputs[2]];
        let activation = FuseFunctionMap.get(operands[inputs[3]].value[0]);
        let output = operands[outputs[0]];
        let batchSize = input.shape[0];
```
`batchSize` should be `product(input.shape) / weights.shape[1]`, according to the Android NN API spec: https://developer.android.com/ndk/reference/group/neural-networks#group___neural_networks_1ggaabbe492c60331b13038e39d4207940e0aaada7a3dbaf4676aba560c933ff610c5
@GreyZzzzzzXh, I've finished my review with some comments. Please take a look. Thanks!
I made some changes, PTAL, thanks! @huningxin
Travis CI fails due to a "chrome installation error". @ibelem, could you or someone please have a check?
@huningxin Please feel free to merge this PR, since the failure is the Travis CI issue and the build passed with AppVeyor. We have reported the Travis CI issue upstream and are also trying other workarounds.
Thanks for the great work. Looks good to me!
No description provided.