This repository has been archived by the owner on Apr 18, 2023. It is now read-only.

support new webgl backend. #297

Merged: 11 commits, Dec 7, 2018

Conversation

@GreyZzzzzzXh (Author)

No description provided.

@@ -20,16 +20,14 @@ class ImageClassificationModel {
throw Error('Fails to initialize neural network context');
}
this._nn = nnNative;
} else if (this._backend === 'WASM' || this._backend === 'WebGL2') {
} else if (this._backend === 'WASM' || this._backend === 'WebGL') {
Author
I renamed all WebGL2 to WebGL in polyfill and examples.

if (this._backend === 'WebGL2') {
options.useWebGL2 = true;
}
options.backend = this._backend;
this._model = await this._nn.createModel(options);
Author
Replaced options.useWebGL2 with options.backend.
For polyfill, options.backend should be the string 'WASM' or 'WebGL'.
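A minimal sketch of the revised option handling described above (the function name is hypothetical, not from the PR): the old boolean useWebGL2 flag is replaced by a backend string that the polyfill expects to be 'WASM' or 'WebGL'.

```javascript
// Hypothetical sketch: build polyfill options from a backend name.
// Replaces the old `options.useWebGL2 = true` flag with a string field.
function buildPolyfillOptions(backend) {
  const options = {};
  if (backend === 'WASM' || backend === 'WebGL') {
    options.backend = backend; // polyfill expects 'WASM' or 'WebGL'
  }
  return options;
}
```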

package.json Outdated
@@ -35,7 +35,8 @@
"ndarray-ops": "^1.2.2",
"ndarray-squeeze": "^1.0.2",
"selenium-webdriver": "^3.1.0",
"webpack": "^3.5.5"
"webpack": "^3.5.5",
"@tensorflow/tfjs-core": "^0.13.6"
Author
cd webml-polyfill
npm install

@GreyZzzzzzXh (Author)

I changed the backend name to 'WebGL' and revised some interfaces.
@huningxin, @Wenzhao-Xiang, please help review, thanks!
These changes may affect the tests and benchmark, please take a look, @BruceDai @ibelem.

@BruceDai (Contributor)

BruceDai commented Nov 7, 2018

@GreyZzzzzzXh yes, we will update the test cases & benchmark tests for the WASM / WebGL polyfill backends, thanks

@Wenzhao-Xiang (Contributor)

@GreyZzzzzzXh Seems inception_v3 and squeezenet for image_classification_tflite have some issues on my machine:
squeezenet:

Label Probability
admiral 30.07%
refrigerator 24.34%
wine bottle 10.36%

inception_v3:

Label Probability
fountain 100.00%
toilet tissue 0.00%
bolete 0.00%

But the results were correct on @GreyZzzzzzXh's and @pinzhenx's machines with the same code.
Does tensorflow.js have specific hardware requirements?

@Wenzhao-Xiang (Contributor)

Also, the results differ between the PC and the mobile phone (OnePlus 3T, Chrome) with the WebGL backend. For example, with mobilenet_v1:
PC:

Label Probability
bee eater 95.59%
partridge 1.73%
indigo bunting 1.71%

Mobile phone:

Label Probability
bee eater 95.41%
partridge 1.69%
indigo bunting 1.67%

It seems like the same issue Wenyao described in his graduation paper:

On mobile devices, WebGL uses 16-bit signed integers to store data by default, while on the computer,
WebGL uses 32-bit signed integers to store data by default.

@Wenzhao-Xiang Wenzhao-Xiang requested review from Wenzhao-Xiang and removed request for Wenzhao-Xiang November 7, 2018 07:02
@GreyZzzzzzXh (Author)

@Wenzhao-Xiang, thanks for testing. There are still some issues on mobile devices.

On mobile devices, WebGL uses 16-bit signed integers to store data by default, while on the computer,
WebGL uses 32-bit signed integers to store data by default.

Maybe it is like this, but if so we would have to revise the tfjs source code.

It may take some time to figure out, so maybe we should keep the WebGL2 backend and name this new backend TFJS temporarily.

@huningxin (Contributor)

It may take some time to figure out, so.. maybe we should still remain WebGL2 backend and name this new backend TFJS temporarily.

Can we make a reproducible test case for the TensorFlow.js team? For example, put it at a publicly accessible link on your GitHub pages. We can file a bug with TF.js, and I can help bridge to the TF.js folks to look into it.

@GreyZzzzzzXh (Author)

GreyZzzzzzXh commented Nov 8, 2018

I hosted the modified code on my GitHub pages.
Visit https://greyzzzzzzxh.github.io/webml to see the results.

Test by CTS: https://greyzzzzzzxh.github.io/webml/test/index.html?backend=webgl&grep=CTS

@huningxin (Contributor)

I host the modified code on my github pages.
Visit https://greyzzzzzzxh.github.io/webml-examples to see the results.

Thanks for doing that!

Besides the examples link, could you please give the steps to reproduce the issue? For example: which example, which backend, which model? Please also give the expected results, the incorrect result when the issue happens, and the test configuration, e.g. Chrome version (from chrome://version/), GPU driver version (from chrome://gpu/), etc. Please consult @BruceDai for this kind of bug reporting. Thanks!

@GreyZzzzzzXh (Author)

@huningxin, OK, I'll do more testing and provide detailed info next week, thanks!

@GreyZzzzzzXh (Author)

GreyZzzzzzXh commented Nov 14, 2018

The results on the mobile side differ from those on the PC side.

Test Env:
Chrome Version: 70.0.3538.80 (official build) (32-bit)
Platform: Android 8.1.0, Pixel 2
GPU driver version: 258.0
tfjs-core version: 0.13.10

Expected Result:
tested on PC platform, e.g.

  • Linux ubuntu 16.04, Chrome 69.0.3497.100 (Official Build) (64-bit)
  • Windows 10, Chrome 70.0.3538.102 (Official Build) (32-bit)

Mobilenet v1:

# Label Probability
1 bee eater 95.59%
2 jacamar 1.73%
3 brambling 1.71%

Mobilenet v2:

# Label Probability
1 bee eater 84.14%
2 indigo bunting 1.07%
3 brambling 0.76%

Inception v3:

# Label Probability
1 bee eater 96.31%
2 partridge 0.11%
3 indigo bunting 0.04%

Squeezenet:

# Label Probability
1 bee eater 96.71%
2 goldfinch 1.77%
3 ladybug 0.45%

[screenshot from 2018-11-21 17-02-28]

Actual Result:

The output doesn't match the expected result.

Mobilenet v1:

# Label Probability
1 bee eater 95.41%
2 brambling 1.69%
3 jacamar 1.67%

Mobilenet v2:

# Label Probability
1 bee eater 84.86%
2 indigo bunting 0.94%
3 brambling 0.83%

Inception v3:

# Label Probability
1 bee eater 99.22%
2 partridge 0.11%
3 indigo bunting 0.04%

Squeezenet:

# Label Probability
1 bee eater 96.88%
2 goldfinch 1.80%
3 ladybug 0.39%

[screenshot from 2018-11-21 17-02-42]

How to Reproduce:

@GreyZzzzzzXh (Author)

Results and reproduction steps are described above, please take a look @huningxin .

@huningxin (Contributor)

@GreyZzzzzzXh, thanks! How about the expected results? And please list the devices that can deliver the expected results. That would also be helpful.

@GreyZzzzzzXh (Author)

How about the expected results

The data in the first four tables is the expected result.

the devices which can deliver expected results

PC platform, e.g.

  • Linux ubuntu 16.04, Chrome 69.0.3497.100 (Official Build) (64-bit)
  • Windows 10, Chrome 70.0.3538.102 (Official Build) (32-bit)

@GreyZzzzzzXh (Author)

Besides, visit https://greyzzzzzzxh.github.io/webml/test/index.html?backend=webgl&grep=CTS for case testing.

computer side:

passes: 127
failures: 6

But many cases fail on the mobile device:

passes: 71
failures: 62

@huningxin (Contributor)

@GreyZzzzzzXh , please complete the actual results in #297 (comment). Thanks!

BTW, when testing on the device with incorrect results, are there any errors reported in console?

@GreyZzzzzzXh (Author)

please complete the actual results in #297 (comment)

Done.

when testing on the device with incorrect results, are there any errors reported in console?

No errors or warnings are reported in the console.

@GreyZzzzzzXh (Author)

GreyZzzzzzXh commented Nov 21, 2018

Besides, tested on the tfjs-converter mobilenet demo, tensorflow.js still shows different precision between the computer side and the mobile phone side.

Test Env:
tfjs version: 0.13.3
tfjs-core version: 0.13.8
Windows 10, Chrome 70.0.3538.77 (official build) (64-bit)
Android 8.1.0, Pixel 2, Chrome 70.0.3538.80 (official build) (32-bit)

Results:

Windows 10:
[image]

Android 8.1.0:
[screenshot_20181121-212017]

@huningxin (Contributor)

Thanks for these details!

@huningxin (Contributor)

@GreyZzzzzzXh , could you please take a look at tensorflow/tfjs#265. It sounds like tfjs uses float16 on mobile. Is that the root cause?

@GreyZzzzzzXh (Author)

It sounds like tfjs uses float16 on mobile.

Thanks! I will investigate this.

@GreyZzzzzzXh (Author)

This precision issue can be fixed by upgrading the GLSL version to 300 es.
WebGL 1.0 doesn't work for now, so we just use the WebGL 2.0 backend and still call it WebGL2.

@huningxin (Contributor)

yes i'll record the changes in README.md, and now i import modified tfjs-core in src/nn/webgl2/tfjs-core for use.

According to GreyZzzzzzXh/tfjs-core@39a56bf, could you please explain the root cause and your method of "upgrade GLSL to version 300 es"?

I found you checked tfjs-core into the repo; I would suggest avoiding that. Let's figure out what needs to be fixed in tfjs-core, then propose a fix to the tfjs-core repo.

@GreyZzzzzzXh (Author)

GreyZzzzzzXh commented Nov 29, 2018

the root cause

In general, WebGL 1.0 and WebGL 2.0 use different versions of the shading language (GLSL 100 for WebGL 1.0 and GLSL 300 es for WebGL 2.0). But in tfjs, only GLSL 100 is used as the shading language for both WebGL 1.0 and 2.0.

The inaccuracy on the phone seems to be because the GLSL version also has an impact on precision. After I changed GLSL 100 to 300 es, floats achieve 32-bit precision on the phone, where originally it was only 16 bits (tested with tf.ENV.backend.floatPrecision()).

your method of "upgrade GLSL to version 300 es"

I added some comments about how to upgrade GLSL version in GreyZzzzzzXh/tfjs-core@39a56bf.
There are five main places that need to be changed:

  1. Declare the shading language version in the shader code as #version 300 es.
  2. Replace attribute with in.
  3. Replace varying with in/out.
  4. Replace texture2D with texture.
  5. There is no built-in variable gl_FragColor in GLSL300es, so we need to define an out variable for the output.

Besides, I set precision highp sampler2D; and made some changes related to the round() function.
See GreyZzzzzzXh/tfjs-core@39a56bf for more detail.
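As a hedged illustration (the shader below is a made-up minimal example, not taken from the tfjs source), the five steps above applied to a fragment shader look like this:

```javascript
// Hypothetical minimal fragment shader in GLSL 100, as WebGL 1.0 expects:
const glsl100 = `precision highp float;
varying vec2 resultUV;
uniform sampler2D source;
void main() {
  gl_FragColor = texture2D(source, resultUV);
}`;

// The same shader upgraded to GLSL 300 es, following steps 1-5 above:
const glsl300 = `#version 300 es
precision highp float;
precision highp sampler2D;
in vec2 resultUV;
uniform sampler2D source;
out vec4 outputColor;
void main() {
  outputColor = texture(source, resultUV);
}`;
```

Note the version directive on the first line, varying replaced by in, texture2D replaced by texture, and the explicit out variable standing in for gl_FragColor.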

@GreyZzzzzzXh (Author)

Let's figure out what need to be fixed in tfjs-core. Then propose a fix to tfjs-core repo.

Yeah, the best way is for the tfjs team to solve this problem. But if it takes a while to fix, I suggest importing the modified tfjs for temporary use.

@pinzhenx (Contributor)

I found something that might help:
https://medium.com/@invicticide/patching-an-npm-dependency-without-going-completely-insane-aa0b110a80c

@huningxin (Contributor)

Yeah, the best way is to solve this problem by tfjs team

Please go ahead and file a bug and open a PR to tfjs-core with your solution.

I suggest to import the modified tfjs for temporary use.

Please don't import the source code. If we want to maintain a version of tfjs-core, you can publish your version on npm and npm install from there.

originally only 16 bits (tested with tf.ENV.backend.floatPrecision()).

Probably we can report this out; then test cases can handle the lower-precision backend differently. @BruceDai
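A minimal sketch of what "handle the lower-precision backend differently" could look like (the helper name and epsilon values are assumptions, not from the PR): tf.ENV.backend.floatPrecision(), mentioned above, reports 32 on desktop and 16 on many mobile GPUs, so test cases could loosen their comparison tolerance accordingly.

```javascript
// Hypothetical helper: pick a comparison tolerance based on the backend's
// reported float precision (32 or 16 bits).
function toleranceFor(precisionBits) {
  // Looser epsilon on float16 devices, tight epsilon on float32 devices.
  return precisionBits >= 32 ? 1e-5 : 1e-2;
}
```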

@GreyZzzzzzXh (Author)

I published the modified tfjs-core on npm. Now we can install it via npm install. Thanks @huningxin and @pinzhenx.

Next I will improve this fix and open a PR to tfjs-core.

@GreyZzzzzzXh GreyZzzzzzXh changed the title support new webgl2 backend. support new webgl backend. Dec 3, 2018
prepareModel() {
this._model._operands.forEach(operand => {
if (utils.isTensor(operand.type)) {
let type = this._getOperandType(operand.type);
Contributor
const?

output.buffer.set(operand.dataSync());
});

// console.log(tf.memory());
Contributor
remove this comment?

}

inputs.forEach(input => {
let operand = this._operands[input.index];
Contributor
const?


inputs.forEach(input => {
let operand = this._operands[input.index];
let inputTensor = tf.tensor(input.buffer, operand.shape, operand.dtype);
Contributor
const?


switch(op) {
case OperationCode.ADD: {
let in1 = operands[inputs[0]];
Contributor
const?

let input = operands[inputs[0]];
let targetShape = operands[inputs[1]];
let output = operands[outputs[0]];
output.assign(input.reshape(targetShape.dataSync()));
Contributor
Do we need to do dataSync? I understand it leads to a memory read-back from GPU to CPU, which is bad for performance.

Author
According to the definition of reshape in the NN API, targetShape is a 1-D tensor whose values are stored on the GPU. But for tf.reshape, the target shape should be an array of integers.
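A small self-contained sketch of the mismatch described above (the tensor stand-in is hypothetical): the NN API delivers the target shape as a 1-D tensor whose values must be read back with dataSync() before they can be passed to tf.reshape, which expects a plain integer array.

```javascript
// Hypothetical stand-in for a 1-D int32 shape tensor living on the GPU.
const targetShapeTensor = {
  dataSync: () => new Int32Array([3, 2]), // read values back to the CPU
};

// tf.reshape wants a plain array of integers, hence the dataSync() call:
const targetShape = Array.from(targetShapeTensor.dataSync());
```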

output.assign(input.reshape(targetShape.dataSync()));
} break;
case OperationCode.CONCATENATION: {
if (outputs.length < 1 || inputs.length < 2) {
Contributor
simplify the condition?

Author

Since tfjs gives detailed error messages, can we remove this check?

let bias = operands[inputs[2]];
let activation = FuseFunctionMap.get(operands[inputs[3]].value[0]);
let output = operands[outputs[0]];
let batchSize = input.shape[0];
Contributor
src/nn/webgl/WebGLModel.js: resolved review threads
@huningxin (Contributor)

@GreyZzzzzzXh, I finished my review with some comments. Please take a look. Thanks!

@GreyZzzzzzXh (Author)

I made some changes, PTAL, thanks! @huningxin

@huningxin (Contributor)

The Travis CI failure is due to a "chrome installation error". @ibelem, could you or someone please take a look?

@ibelem (Member)

ibelem commented Dec 7, 2018

@huningxin Please feel free to merge this PR, since the failure is a Travis CI issue and the build passed with AppVeyor. We have reported the Travis CI issue upstream and are also trying other workarounds.

@huningxin (Contributor)

Thanks for the great work. Looks good to me!

6 participants