1.17.3 cherry-picks for ORT Web changes #19926

fs-eire · 2024-03-15T01:19:53Z

Description

This PR is a preview of cherry-picks for ort-web to rel-1.17.3 based on rel-1.17.2.

Changes of ort-web to cherry-pick

The following commits are from main branch.

o stands for pick, and x stands for skip.

o   2e0a388c36 [js/webgpu] Add HardSigmoid support (#19215)
o   d226e40856 [js/webgpu] set query type in onRunStart (#19202)
o   61610ff986 [js/webgpu] Add FusedConv clip test case (#18900)
o   a33b5bd1fa [JS/WebGPU] Added Uniforms to SkipLayerNorm. (#18788)
o   591f90c0b9 [js/webgpu] Fix issue of timestamp query (#19258)
o   7252c6e747 [WebNN EP] Support WebNN async API with Asyncify (#19145)
o   5b06505073 [js/webgpu] Fix Tanh explosion (#19201)
o   656ca66186 [js/webgpu] Support uniforms for conv, conv transpose, conv grouped (#18753)
o   a3f0e2422b [js/webgpu] Support f16 uniform (#19098)
o   9e69606360 fix f16 for attention, enable slice and flatten for more types (#19262)
o   624b4e2063 [js/webgpu] Remove enableShapesUniforms (#19279)
o   90883a366a [js/webgpu] Add hardSigmoid activation for fusedConv (#19233)
o   85cef0af8c [js/webgpu] Support capture and replay for jsep (#18989)
o   d73131cf0f [js/webgpu] Use DataType as uniform cpu type (#19281)
o   dd1f6ccc45 [js/webgpu] resolve codescan alert (#19343)
o   3a2ab1963a [js/webgpu] Refactor createTensorShapeVariables (#18883)
o   efc17e79de [js/webgpu] Fix the undefined push error (#19366)
 x  50806a7dd5 [js/web] support external data in npm test (#19377)
o   ccbe264a39 [js/webgpu] Add LeakyRelu activation for fusedConv (#19369)
o   5ff27ef02a [js/webgpu] support customop FastGelu (#19392)
 x  03be65e064 [js/web] fix types exports in package.json (#19458)
o   06269a3952 [js/webgpu] allow uint8 tensors for webgpu (#19545)
o   dfeda9019c [JS/WebGPU] Add MatMulNBits (#19446)
o   1b48054e1b [js/webgpu] Create Split indices helpers by rank, not by shape (#19554)
o   3fe2c137ee [js] small fix to workaround formatter (#19400)
 x  70567a4b3a [js/web] use ApiTensor insteadof onnxjs Tensor in TensorResultValidator (#19358)
o   6e04e36e3f [js/common] upgrade tsc in common from 4.9.5 to 5.2.2 (#19317)
o   58f4921686 [js] changes to allow Float16Array if any polyfill is available (#19305)
o   57d6819212 [js/web] Fix fused-conv is not included in npm test (#19581)
o   ebd220b073 Misspelling in README.md (#19433)
o   38c3432393 Bump ip from 1.1.8 to 1.1.9 in /js/react_native (#19582)
o   fe82fccf1a [js/webgpu] Fix Conv2DTransposeMatMul f16 compilation failure (#19596)
o   76a2a487a1 Bump ip from 1.1.8 to 1.1.9 in /js/react_native/e2e (#19583)
o   29b1106033 [node] Switch to setImmediate to avoid starving the Node.js event loop (#19610)
o   ae3d73c981 [JS/WebGPU] Fix Split and Where to handle corner cases. (#19613)
o   aec2389ad0 [js/webgpu] allows a ProgramInfo's RunData to use zero sized output (#19614)
o   bb43a0f133 [js/webgpu] minor fixes to make tinyllama work (#19564)
o   0edb035808 [js/web] fix suite test list for zero sized tensor (#19638)
o   3cb81cdde2 [js/common] move 'env.wasm.trace' to 'env.trace' (#19617)
o   e30618d055 [js/webgpu] use Headless for webgpu test by default (#19702)
o   f06164ef8b [js/web] transfer input buffer back to caller thread (#19677)
 x  a788514027 [js/web] dump debug logs for karma for diagnose purpose (#19785)
o   24b72d2613 [JS/WebGPU] Preserve zero size input tensor dims. (#19737)
o   4538d31a8b [js/webgpu] expose a few properties in WebGPU API (#19857)
o   53de2d8cb0 [js/webgpu] Enable GroupedConvVectorize path (#19791)
o   ed250b88c3 [JS/WebGPU] Optimize MatMulNBits (#19852)
 x  e771a763c3 [js/test] align web test runner flags with ort.env (#19790)
o   79e50aeef3 [js/web] rewrite backend resolve to allow multiple EPs (#19735)
o   acb0df2280 Fix #19931 broken Get Started link of "ONNX Runtime JavaScript API" page (#19932)
o   b29849a287 [js/common] fix typedoc warnings (#19933)
o   afdab62f53 Bump follow-redirects from 1.15.4 to 1.15.6 in /js/web (#19949)
o   28ad6c3955 Bump follow-redirects from 1.15.4 to 1.15.6 in /js/node (#19951)
o   7e0d424934 accumulate in fp32 for Reduce* (#19868)
o   4c6a6a37f7 [js/webgpu] Fix NAN caused by un-initialized buffer in instance-norm (#19387)
o   01c7aaf6aa [js/webgpu] allow setting env.webgpu.adapter (#19940)
o   c45cff60cf [js/webgpu] fix maxpool / fp16 (#19981)

Cherry-pick commandlines

git cherry-pick 2e0a388c36
git cherry-pick d226e40856
git cherry-pick 61610ff986
git cherry-pick a33b5bd1fa
git cherry-pick 591f90c0b9
git cherry-pick 7252c6e747
git cherry-pick 5b06505073
git cherry-pick 656ca66186
git cherry-pick a3f0e2422b
git cherry-pick 9e69606360
git cherry-pick 624b4e2063
git cherry-pick 90883a366a
git cherry-pick 85cef0af8c  #<<<<< Note: conflicts
git cherry-pick d73131cf0f
git cherry-pick dd1f6ccc45
git cherry-pick 3a2ab1963a
git cherry-pick efc17e79de
git cherry-pick ccbe264a39
git cherry-pick 5ff27ef02a
git cherry-pick 06269a3952
git cherry-pick dfeda9019c
git cherry-pick 1b48054e1b
git cherry-pick 3fe2c137ee
git cherry-pick 6e04e36e3f
git cherry-pick 58f4921686
git cherry-pick 57d6819212
git cherry-pick ebd220b073
git cherry-pick 38c3432393
git cherry-pick fe82fccf1a
git cherry-pick 76a2a487a1
git cherry-pick 29b1106033
git cherry-pick ae3d73c981
git cherry-pick aec2389ad0
git cherry-pick bb43a0f133
git cherry-pick 0edb035808
git cherry-pick 3cb81cdde2
git cherry-pick e30618d055
git cherry-pick f06164ef8b
git cherry-pick 24b72d2613
git cherry-pick 4538d31a8b
git cherry-pick 53de2d8cb0
git cherry-pick ed250b88c3
git cherry-pick 79e50aeef3
git cherry-pick acb0df2280
git cherry-pick b29849a287
git cherry-pick afdab62f53
git cherry-pick 28ad6c3955
git cherry-pick 7e0d424934
git cherry-pick 4c6a6a37f7
git cherry-pick 01c7aaf6aa
git cherry-pick c45cff60cf

Cherry-pick conflicts

85cef0a [js/webgpu] Support capture and replay for jsep #18989
this change is for enabling graph capture feature for JSEP, and it is done after ROCM EP enabled graph capture feature. However, the ROCM EP graph capture feature is not cherry-picked in rel-1.17.2.

### Description This op is required in mobilenetv3-small-100. With this PR, mobilenetv3-small-100 model becomes less than 10 ms from over 100 ms on ADL.

### Description  `env.webgpu.profiling` is a global flag. It may change before each session.run. So the best place is to update it in `onRunStart` event. After this, we can directly check `this.queryType`'s value. Without this pr, we need to make sure that `getCommandEncoder()` is called before checking `this.queryType`. Otherwise, it may happen that `pendingKernels`'s length is not equal to `pendingDispatchNumber`'s length. See the two ugly workarounds [1)](e630dbf#diff-006fc84d3997f96a29b8033bd2075d6a0a9509211bd5812a6b934fc74fedfd9dR267-R268) and [2)](e630dbf#diff-618fe297fbe7a1da586380163b8fd2627311ccc217640a3c5cdc9c17a33472c1R73-R80) if we don't introduce `onRunStart`. Or we need to call `setQueryType` in each kernel run.

Bug: #18899

### Description Added Uniforms to SkipLayerNorm ### Motivation and Context  Improve performance --------- Co-authored-by: Yulong Wang <[email protected]>

When we enable webgpu profiling mode between session.create and session.run, current implementation has a problem to create querySet (and also queryResolveBuffer) if we share the commandEncoder with inputs upload. This PR fixes this by moving the querySet creation to the place we set queryType.

### Description ```math \tanh(x)=\frac{e^x-e^{-x}}{e^x+e^{-x}}= \left\{ \begin{array}{cc} -\frac{1-e^{-2\cdot(-x)}}{1+e^{-2\cdot(-x)}}, & x<0 \\ 0, & x=0 \\ \frac{1-e^{-2x}}{1+e^{-2x}}, & x>0 \end{array} \right. ``` ### Motivation and Context On some platforms, $$\tanh(1000)=\frac{e^{1000}-e^{-1000}}{e^{1000}+e^{-1000}}$$ would produce NaN instead of 0.999... or 1 (imagine $e^{1000}=\infty$ and $\frac{\infty}{\infty}$ explodes).

…18753)

### Description  ### Motivation and Context

### Description Add hardSigmoid activation for fusedConv. It will be used by mobilenetv3-small-100 model.

This PR expands the graph capture capability to JS EP, which is similar to #16081. But for JS EP, we don't use the CUDA Graph, instead, we records all gpu commands and replay them, which removes most of the cpu overhead to avoid the the situation that gpu waiting for cpu. mobilenetv2-12 becomes 3.7ms from 6ms on NV 3090 and becomes 3.38ms from 4.58ms on Intel A770. All limitations are similar with CUDA EP: 1. Models with control-flow ops (i.e. If, Loop and Scan ops) are not supported. 2. Usage of graph capture is limited to models where-in all ops in the model can be partitioned to the JS EP or CPU EP and no memory copy between them. 3. Shapes of inputs/outputs cannot change across inference calls. 4. IObinding is required. The usage is like below: Method 1: specify outputs buffers explicitly. ``` const sessionOptions = { executionProviders: [ { name: "webgpu", }, ], enableGraphCapture: true, }; const session = await ort.InferenceSession.create('./models/mobilenetv2-12.onnx', sessionOptions); // prepare the inputBuffer/outputBuffer ... ... const feeds = { 'input': ort.Tensor.fromGpuBuffer(inputBuffer, { dataType: 'float32', dims }) }; const fetches = { 'output': ort.Tensor.fromGpuBuffer(outputBuffer, { dataType: 'float32', dims: [1, 1000] }) }; let results = await session.run(feeds, fetches); // The first run will begin to capture the graph. // update inputBuffer content ... ... results = = await session.run(feeds, fetches); // The 2ed run and after will directly call replay to execute the graph. ... ... session.release(); ``` Method 2: Don't specify outputs buffers explicitly. Internally, when graph capture is enabled, it will set all outputs location to 'gpu-buffer'. ``` const sessionOptions = { executionProviders: [ { name: "webgpu", }, ], enableGraphCapture: true, }; const session = await ort.InferenceSession.create('./models/mobilenetv2-12.onnx', sessionOptions); // prepare the inputBuffer ... ... const feeds = { 'input': ort.Tensor.fromGpuBuffer(inputBuffer, { dataType: 'float32', dims }) }; let results = await session.run(feeds); // The first run will begin to capture the graph. // update inputBuffer content ... ... results = = await session.run(feeds); // The 2ed run and after will directly call replay to execute the graph. ... ... session.release();

This saves turning data type to string by tensorDataTypeEnumToString.

### Description resolve codescan alert: https://github.com/microsoft/onnxruntime/security/code-scanning/17687

### Description This PR fixes below errors when enable webgpu profiling: ``` TypeError: Cannot read properties of undefined (reading 'push') ```

### Description This PR 1) adds LeakyRelu activation for fusedConv; 2) makes `vec4<f16>` value work with `float32` uniforms attributes. For example: `clamp(value, vec4<f16>(uniforms.clip_min), vec4<f16>(uniforms.clip_max)` will throw compilation errors since `uniforms.clip_min` and `uniforms.clip_min` are `f32` not `f16`. So we need to change it to `clamp(value, vec4<f16>(f16(uniforms.clip_min)), vec4<f16>(f16(uniforms.clip_max))` And above problem was introduced when we make activation attributes as uniforms instead of constant. BTW, after adding LeakyRelu, `realesrgan-t256` model can pass.

### Description Support WebGPU custom operator FastGelu.

### Description allow uint8 tensors for webgpu

### Description Add MatMulNBits to support MatMul using 4-bit quantized weights ### Motivation and Context

### Description This is required to make shape uniforms really work. ### Motivation and Context The bug was unveiled in a model with multiple Split nodes. The later nodes would try to reuse a previous pipeline cache, while the old shapes were hardcoded as constants in cache.

### Description Rename shader variable names to snake_case naming and also to avoid formatter behaving inconsistently in win/linux.

### Description upgrade tsc in common from 4.9.5 to 5.2.2

### Description This change adds only necessary code to enable ort-web works with any Float16Array polyfill. Unlike #19302, in this PR, ort-web does not include any specific polyfill; instead, it's user's choice for how to use a polyfill. ORT-web uses Float16Array if it's available; otherwise, fallback to use Uint16Array. ```js // case 1: user does not use polyfill: import * as ort from 'onnxruntime-web'; const myF16Data = new Uint16Array(...); // need to use Uint16Array const myF16tensor = new ort.Tensor('float16', myF16Data, dims); ``` ```js // case 2: user use polyfill: import * as ort from 'onnxruntime-web'; import { Float16Array, isFloat16Array, isTypedArray, getFloat16, setFloat16, f16round, } from "@petamoriken/float16"; globalThis.Float16Array = Float16Array; // ort-web will pick the global Float16Array const myF16Data = new Float16Array(...); // Use the polyfilled Float16Array type const myF16tensor = new ort.Tensor('float16', myF16Data, dims); ```

BUG: #18855 ### Description  ### Motivation and Context

Fixed a misspelling.

@fs-eire

Bumps [ip](https://github.com/indutny/node-ip) from 1.1.8 to 1.1.9. <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/indutny/node-ip/commit/1ecbf2fd8c0cc85e44c3b587d2de641f50dc0217"><code>1ecbf2f</code></a> 1.1.9</li> <li><a href="https://github.com/indutny/node-ip/commit/6a3ada9b471b09d5f0f5be264911ab564bf67894"><code>6a3ada9</code></a> lib: fixed CVE-2023-42282 and added unit test</li> <li>See full diff in <a href="https://github.com/indutny/node-ip/compare/v1.1.8...v1.1.9">compare view</a></li> </ul> </details> [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=ip&package-manager=npm_and_yarn&previous-version=1.1.8&new-version=1.1.9)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) Dependabot will merge this PR once CI passes on it, as requested by @fs-eire. [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/microsoft/onnxruntime/network/alerts). </details> Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

This is used in sam-h-decoder-f16. ### Description  ### Motivation and Context

@fs-eire

Bumps [ip](https://github.com/indutny/node-ip) from 1.1.8 to 1.1.9. <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/indutny/node-ip/commit/1ecbf2fd8c0cc85e44c3b587d2de641f50dc0217"><code>1ecbf2f</code></a> 1.1.9</li> <li><a href="https://github.com/indutny/node-ip/commit/6a3ada9b471b09d5f0f5be264911ab564bf67894"><code>6a3ada9</code></a> lib: fixed CVE-2023-42282 and added unit test</li> <li>See full diff in <a href="https://github.com/indutny/node-ip/compare/v1.1.8...v1.1.9">compare view</a></li> </ul> </details> [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=ip&package-manager=npm_and_yarn&previous-version=1.1.8&new-version=1.1.9)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) Dependabot will merge this PR once CI passes on it, as requested by @fs-eire. [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/microsoft/onnxruntime/network/alerts). </details> Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

### Description  1. Fix Where operator to handle Boolean input less than 4 bytes. 2. Fix JSEP test harness to use tensor names consistently. ### Motivation and Context

…19614) ### Description This PR allows zero-sized output. To make the implementation simple, it does not support partial zero-sized tensor. Which means, either all outputs are zero-sized, or an error will be reported. added 2 tests: - op test of `Add` with input T[2,0] T[2,1], and - test_split_zero_size_splits

### Description Fixes build break brought by #19614 Currently WebGL backend does not support zero sized tensor. This change split test data into 2 parts, and only enable zero sized tensor tests for WebGPU.

### Description Try to move 'env.wasm.trace' to 'env.trace' to make it less confusing, because it also works in webgpu. Marked 'env.wasm.trace' as deprecated.

### Description use Chromium Headless for webgpu test by default. Still use normal Chromium with window when debug=true or perfMode=true. Use the [`--headless=new`](https://developer.chrome.com/docs/chromium/new-headless) mode. ### Motivation and Context try to use a more stable way to launch npm tests to avoid a "chrome not found" issue in pipeline, which may potentially caused by windowed application.

### Description When using proxy worker, input buffers should be transferred back to the caller thread after `run()` call is done. Fixes #19488

### Description For Concat operation, the zero-size input tensor shape need to be preserved and, unlike non-zero tensors, the dims are not constrained to match other input tensors' dims. ### Motivation and Context

@xenova

### Description This change exposes a few properties in `ort.env.webgpu` to resolve feature requirement mentioned in properties in #14579 (comment). - Add `powerPreference` and `forceFallbackAdapter` in `ort.env.webgpu`, to allow users to set the value of the properties before the first inference session is created. - Add readonly property `adapter` in `ort.env.webgpu` to allow users to get the adapter instance. Now users can access `ort.env.webgpu.device` and `ort.env.webgpu.adapter`. @xenova @beaufortfrancois

Vectorize met 2 failed cases in a CI bot with NVIDIA GPU, but we couldn't repro with all the GPUs at hand, including NVIDIA GPUs. This PR introduces GPUAdapterInfo and enables this opt on non-NVIDIA GPUs to make the bots happy. No obivous perf gain can be seen if we enable vectorize on NVIDIA. However, it shows big perf improvement on Intel. On my Gen12 Intel GPU, mobilenetv2-12 perf was improved from 11.14ms to 7.1ms.

### Description Use vec<2> or vec<4>, operands in MatMulNBits ### Motivation and Context Improve performance

### Description This PR rewrite the backend resolve logic to support specifying multiple EPs. #### Backend The first version of ONNX Runtime Web actually carried some existing code from [ONNX.js](https://github.com/microsoft/onnxjs), which includes the "backend" concept. The original "backend" in ONNX.js is designed in a way assuming there is only one backend from user's backend hint list will be used. For example, in ONNX.js, if user specify a backend hint as `['webgl', 'wasm']`, ONNX.js will first try to use WebGL backend - if it loads successfully (the browser supports webgl), then "webgl" backend will be used and "wasm" will be ignored; otherwise, "webgl" will be ignored and try to load "wasm" backend. In short: only one backend will be used when initializing a session. #### Execution Provider Execution Provider, or EP, in ONNX Runtime is a different concept. One of the differences is that users are allow to specify multiple EPs, and if one does not support a particular kernel, it can fallback to other EP. This is a very common case when using a GPU EP in ONNX Runtime. #### Current Status: Backend v.s. EP Because of the history reasons mentioned above, the current status is quite confusing. There are **real backend**s, which means it's different implementation in code; and there are **backend hint**s, which are used as string names for backend hint; and there are **EP**s of the ONNX Runtime concepts. currently there are only 2 **backend**s in our code base: The "onnxjs backend", and the "wasm backend". The "onnxjs backend" currently only powers backend hint "webgl", which go into the old onnx.js code path. All other backend hints including "wasm", "cpu"(alias to wasm), "webgpu" and "webnn" are all powered by "wasm backend". And because ORT Web treat "backend" as an internal concept and want to align with ONNX Runtime, so those names of backend hints are becoming EP names. The following table shows today's status: | Execution Provider Name (public) / Backend Hint (internal) | Backend | EP in ORT | -------- | ------- | ------- | | "wasm"/"cpu" | WasmBackend | CPU EP | "webgl" | OnnxjsBackend | \* technically not an EP | "webgpu" | WasmBackend | JSEP | "webnn" | WasmBackend | WebNN EP #### Problem While the API allows to specify multiple EPs, the backend resolving only allows one backend. This causes issues when user specify multiple EP names in session options, the backend resolve behavior and EP registration behavior is inconsistent. Specifically, in this issue: #15796 (comment): EP list `['webgpu', 'wasm']` on a browser without WebGPU support resolves to 'wasm' backend, but the full EP list is passed in session options, so JSEP is still enabled, causing the runtime error. #### Solution Since we still need WebGL backend, we cannot totally remove the backend register/resolve system. In this PR I made the following changes: - initialize every backend from the EP list, instead of only do that for the first successful one. - for the first resolved backend, filter all EP using the exact same backend. Remove all EPs not using this backend from session options - for every explicitly specified EP, if it's removed, show a warning message in console

…age (#19932) ### Description Fix #19931 broken Get Started link HTTP 404 for "Get Started" link in "ONNX Runtime JavaScript API" page Co-authored-by: Yulong Wang <[email protected]>

@param

### Description Fix a few warnings in typedoc (for generating JS API): ``` [warning] The signature TrainingSession.loadParametersBuffer has an @param with name "buffer", which was not used. [warning] NonTensorType, defined in ./lib/onnx-value.ts, is referenced by OnnxValue but not included in the documentation. [warning] TensorFactory, defined in ./lib/tensor-factory.ts, is referenced by Tensor but not included in the documentation. [warning] ExternalDataFileType, defined in ./lib/onnx-model.ts, is referenced by InferenceSession.SessionOptions.externalData but not included in the documentation. [warning] TensorToDataUrlOptions, defined in ./lib/tensor-conversion.ts, is referenced by Tensor.toDataURL.toDataURL.options but not included in the documentation. [warning] TensorToImageDataOptions, defined in ./lib/tensor-conversion.ts, is referenced by Tensor.toImageData.toImageData.options but not included in the documentation. [warning] Failed to resolve link to "GpuBufferType" in comment for Env.WebGpuFlags.adapter. [warning] Failed to resolve link to "GpuBufferType" in comment for Env.WebGpuFlags.device. ``` Changes highlighted: - Merge `CoreMlExecutionProviderOption` and `CoreMLExecutionProviderOption`. They expose 2 set of different options for React-native and ORT nodejs binding. This should be fixed in future. - Fix a few inconsistency of names between JSDoc and parameters - Fix broken type links - Exclude trace functions

Bumps [follow-redirects](https://github.com/follow-redirects/follow-redirects) from 1.15.4 to 1.15.6. <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/follow-redirects/follow-redirects/commit/35a517c5861d79dc8bff7db8626013d20b711b06"><code>35a517c</code></a> Release version 1.15.6 of the npm package.</li> <li><a href="https://github.com/follow-redirects/follow-redirects/commit/c4f847f85176991f95ab9c88af63b1294de8649b"><code>c4f847f</code></a> Drop Proxy-Authorization across hosts.</li> <li><a href="https://github.com/follow-redirects/follow-redirects/commit/8526b4a1b2ab3a2e4044299377df623a661caa76"><code>8526b4a</code></a> Use GitHub for disclosure.</li> <li><a href="https://github.com/follow-redirects/follow-redirects/commit/b1677ce00110ee50dc5da576751d39b281fc4944"><code>b1677ce</code></a> Release version 1.15.5 of the npm package.</li> <li><a href="https://github.com/follow-redirects/follow-redirects/commit/d8914f7982403ea096b39bd594a00ee9d3b7e224"><code>d8914f7</code></a> Preserve fragment in responseUrl.</li> <li>See full diff in <a href="https://github.com/follow-redirects/follow-redirects/compare/v1.15.4...v1.15.6">compare view</a></li> </ul> </details> [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=follow-redirects&package-manager=npm_and_yarn&previous-version=1.15.4&new-version=1.15.6)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/microsoft/onnxruntime/network/alerts). </details> Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

…19387) The added case will be NAN because of the un-initialized buffer.

@xenova

### Description Allow user to set `env.webgpu.adapter` before creating the first inference session. Feature request: #19857 (comment) @xenova

fs-eire · 2024-03-22T01:09:06Z

/azp run Linux GPU TensorRT CI Pipeline

azure-pipelines · 2024-03-22T01:09:20Z

Azure Pipelines successfully started running 1 pipeline(s).

qjia7 and others added 30 commits March 14, 2024 17:23

[js/webgpu] Add HardSigmoid support (#19215)

a24273e

### Description This op is required in mobilenetv3-small-100. With this PR, mobilenetv3-small-100 model becomes less than 10 ms from over 100 ms on ADL.

[js/webgpu] Add FusedConv clip test case (#18900)

254b543

Bug: #18899

[WebNN EP] Support WebNN async API with Asyncify (#19145)

4eafe73

[js/webgpu] Support uniforms for conv, conv transpose, conv grouped (#…

f02accb

…18753)

[js/webgpu] Support f16 uniform (#19098)

3abc3db

### Description  ### Motivation and Context

fix f16 for attention, enable slice and flatten for more types (#19262)

5ef244a

[js/webgpu] Remove enableShapesUniforms (#19279)

55cede9

[js/webgpu] Add hardSigmoid activation for fusedConv (#19233)

c61a8e5

### Description Add hardSigmoid activation for fusedConv. It will be used by mobilenetv3-small-100 model.

[js/webgpu] Use DataType as uniform cpu type (#19281)

d2db872

This saves turning data type to string by tensorDataTypeEnumToString.

[js/webgpu] resolve codescan alert (#19343)

e305794

### Description resolve codescan alert: https://github.com/microsoft/onnxruntime/security/code-scanning/17687

[js/webgpu] Refactor createTensorShapeVariables (#18883)

fcefc67

[js/webgpu] Fix the undefined push error (#19366)

bff2f5b

### Description This PR fixes below errors when enable webgpu profiling: ``` TypeError: Cannot read properties of undefined (reading 'push') ```

[js/webgpu] support customop FastGelu (#19392)

257cf5e

### Description Support WebGPU custom operator FastGelu.

[js/webgpu] allow uint8 tensors for webgpu (#19545)

2eb5f3b

### Description allow uint8 tensors for webgpu

[JS/WebGPU] Add MatMulNBits (#19446)

0f97b5b

### Description Add MatMulNBits to support MatMul using 4-bit quantized weights ### Motivation and Context

[js] small fix to workaround formatter (#19400)

9641002

### Description Rename shader variable names to snake_case naming and also to avoid formatter behaving inconsistently in win/linux.

[js/common] upgrade tsc in common from 4.9.5 to 5.2.2 (#19317)

a750c39

### Description upgrade tsc in common from 4.9.5 to 5.2.2

[js/web] Fix fused-conv is not included in npm test (#19581)

4d0a685

BUG: #18855 ### Description  ### Motivation and Context

Misspelling in README.md (#19433)

fb9d285

Fixed a misspelling.

satyajandhyala and others added 11 commits March 14, 2024 18:27

[js/webgpu] minor fixes to make tinyllama work (#19564)

16c5546

[js/web] fix suite test list for zero sized tensor (#19638)

d6a7bd6

### Description Fixes build break brought by #19614 Currently WebGL backend does not support zero sized tensor. This change split test data into 2 parts, and only enable zero sized tensor tests for WebGPU.

[js/common] move 'env.wasm.trace' to 'env.trace' (#19617)

d61ef6f

### Description Try to move 'env.wasm.trace' to 'env.trace' to make it less confusing, because it also works in webgpu. Marked 'env.wasm.trace' as deprecated.

[js/web] transfer input buffer back to caller thread (#19677)

ce612c7

### Description When using proxy worker, input buffers should be transferred back to the caller thread after `run()` call is done. Fixes #19488

[JS/WebGPU] Optimize MatMulNBits (#19852)

1df9911

### Description Use vec<2> or vec<4>, operands in MatMulNBits ### Motivation and Context Improve performance

fs-eire force-pushed the fs-eire/rel-1.17.3 branch from 2334a89 to 1df9911 Compare March 15, 2024 01:27

fs-eire and others added 9 commits March 19, 2024 18:02

Fix #19931 broken Get Started link of "ONNX Runtime JavaScript API" p…

0208b1e

…age (#19932) ### Description Fix #19931 broken Get Started link HTTP 404 for "Get Started" link in "ONNX Runtime JavaScript API" page Co-authored-by: Yulong Wang <[email protected]>

accumulate in fp32 for Reduce* (#19868)

b2c48b9

[js/webgpu] Fix NAN caused by un-initialized buffer in instance-norm (#…

333efa0

…19387) The added case will be NAN because of the un-initialized buffer.

[js/webgpu] allow setting env.webgpu.adapter (#19940)

53377a3

### Description Allow user to set `env.webgpu.adapter` before creating the first inference session. Feature request: #19857 (comment) @xenova

[js/webgpu] fix maxpool / fp16 (#19981)

310c099

fs-eire changed the title ~~1.17.3 cherry-picks for ORT Web changes (round 1)~~ 1.17.3 cherry-picks for ORT Web changes Mar 20, 2024

fs-eire marked this pull request as ready for review March 20, 2024 01:05

fs-eire requested a review from a team as a code owner March 20, 2024 01:05

snnn approved these changes Mar 20, 2024

View reviewed changes

YUNQIUGUO approved these changes Mar 29, 2024

View reviewed changes

YUNQIUGUO merged commit 45ff957 into rel-1.17.3 Mar 29, 2024
100 of 107 checks passed

YUNQIUGUO deleted the fs-eire/rel-1.17.3 branch March 29, 2024 20:13

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

1.17.3 cherry-picks for ORT Web changes #19926

1.17.3 cherry-picks for ORT Web changes #19926

fs-eire commented Mar 15, 2024 •

edited

Loading

fs-eire commented Mar 22, 2024

azure-pipelines bot commented Mar 22, 2024

1.17.3 cherry-picks for ORT Web changes #19926

1.17.3 cherry-picks for ORT Web changes #19926

Conversation

fs-eire commented Mar 15, 2024 • edited Loading

Description

fs-eire commented Mar 22, 2024

azure-pipelines bot commented Mar 22, 2024

fs-eire commented Mar 15, 2024 •

edited

Loading