[UT] regression in test_subprocess.py with the PTDB 0.5.3 #800

Open
pbchekin opened this issue Apr 3, 2024 · 14 comments · Fixed by #1488 · May be fixed by #2505

@pbchekin
Contributor

pbchekin commented Apr 3, 2024

12 test cases are failing:

2024-04-02T21:44:57.4181029Z =========================== short test summary info ============================
2024-04-02T21:44:57.4181419Z FAILED language/test_subprocess.py::test_print[device_print-int16] - assert False
2024-04-02T21:44:57.4182039Z FAILED language/test_subprocess.py::test_print[device_print-long] - assert False
2024-04-02T21:44:57.4182626Z FAILED language/test_subprocess.py::test_print[print-int32] - assert False
2024-04-02T21:44:57.4183238Z FAILED language/test_subprocess.py::test_print[device_print-float32] - assert False
2024-04-02T21:44:57.4183857Z FAILED language/test_subprocess.py::test_print[device_print-int8] - assert False
2024-04-02T21:44:57.4184439Z FAILED language/test_subprocess.py::test_print[device_print-int32] - assert False
2024-04-02T21:44:57.4185023Z FAILED language/test_subprocess.py::test_print[device_print-float16] - assert False
2024-04-02T21:44:57.4185590Z FAILED language/test_subprocess.py::test_print[device_print-float64] - assert False
2024-04-02T21:44:57.4186209Z FAILED language/test_subprocess.py::test_print[device_print-uint8] - assert False
2024-04-02T21:44:57.4186820Z FAILED language/test_subprocess.py::test_print[device_print_hex-int16] - assert False
2024-04-02T21:44:57.4187416Z FAILED language/test_subprocess.py::test_print[device_print_hex-int32] - assert False
2024-04-02T21:44:57.4188022Z FAILED language/test_subprocess.py::test_print[device_print_hex-int64] - assert False
2024-04-02T21:44:57.4188461Z ======================== 12 failed, 21 passed in 24.07s ========================
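
When triaging logs like the one above, it can help to pull the failing test IDs out of the timestamped CI output. A minimal Python sketch, assuming each failure appears on a line containing `FAILED <node-id>` and that the node ID contains no whitespace; `failed_ids` is a hypothetical helper, not part of the repository:

```python
import re

# Hypothetical helper: matches "FAILED <node-id>" anywhere in a log line, e.g.
# "...Z FAILED language/test_subprocess.py::test_print[print-int32] - assert False".
# Assumes the node ID contains no whitespace; the lowercase "failed" in the
# summary line is not matched because the pattern is case-sensitive.
FAILED_RE = re.compile(r"\bFAILED\s+(\S+)")


def failed_ids(log_text: str) -> list[str]:
    """Return the node IDs of all FAILED entries, in log order."""
    return [m.group(1) for m in FAILED_RE.finditer(log_text)]


if __name__ == "__main__":
    sample = (
        "2024-04-02T21:44:57.4181419Z FAILED language/test_subprocess.py"
        "::test_print[device_print-int16] - assert False\n"
        "2024-04-02T21:44:57.4188461Z ===== 12 failed, 21 passed in 24.07s =====\n"
    )
    print(failed_ids(sample))
```
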
pbchekin added a commit that referenced this issue Apr 3, 2024
Build a new runner image with packages from the recent rolling stable
release 821.
Newly skipped tests are tracked in #797, #800.
Pass rate: 92.76% -> 92.6%
@vlad-penkin vlad-penkin added bug Something isn't working tests: ut labels Apr 3, 2024
@whitneywhtsang
Contributor

Continues to fail with Agama 821.32.

@quintinwang5
Contributor

Blocked by a bug in the new driver. A JIRA ticket has already been filed.

@quintinwang5
Contributor

This is probably not a driver bug, because the driver team cannot reproduce it with oneAPI 2024.0. I confirmed that in the same environment (presumably driver 821.30), oneAPI 2024.0 works but 2024.1 fails, so this may be a compiler regression. I will file a new JIRA ticket with the compiler team.

pbchekin added a commit that referenced this issue Apr 17, 2024
Introducing a new approach for skipping tests. Skip lists are located in
the directory `scripts/skiplist`. Currently there are two skip lists:

* `scripts/skiplist/default` - the default list: tests to skip in the main
workflow and in the `test-triton.sh` script.
* `scripts/skiplist/conda` - tests to skip in the conda workflow
(currently it requires more tests to skip).

More skip lists can be added in the future. To use a custom skip list,
set `TRITON_TEST_SKIPLIST_DIR` before running `test-triton.sh`. For
example, the conda workflow sets it to use its own skip list:

```
TRITON_TEST_SKIPLIST_DIR=scripts/skiplist/conda
```
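
The directory lookup described above can be sketched in Python as follows. This is a hypothetical illustration, not the actual logic of `test-triton.sh` (which is a shell script); it assumes that an unset or empty `TRITON_TEST_SKIPLIST_DIR` falls back to `scripts/skiplist/default`:

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Assumed default location, taken from the description above.
DEFAULT_SKIPLIST_DIR = "scripts/skiplist/default"


def skiplist_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the skip-list directory.

    Uses TRITON_TEST_SKIPLIST_DIR when it is set to a non-empty value,
    otherwise falls back to the default list.
    """
    env = os.environ if env is None else env
    return Path(env.get("TRITON_TEST_SKIPLIST_DIR") or DEFAULT_SKIPLIST_DIR)
```

For example, `skiplist_dir({"TRITON_TEST_SKIPLIST_DIR": "scripts/skiplist/conda"})` returns `Path("scripts/skiplist/conda")`, while `skiplist_dir({})` returns the default directory.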

Each skip list can contain zero or more `.txt` files; each file
corresponds to a "test suite". The full list of existing test suites
can be found by searching `test-triton.sh` for the different values
of `TRITON_TEST_SUITE`, for example:

```
TRITON_DISABLE_LINE_INFO=0 TRITON_TEST_SUITE=line_info \
pytest --verbose --device xpu language/test_line_info.py
```

Currently there are 7 test suites:

* `language`
* `subprocess`
* `runtime`
* `line_info`
* `interpreter`
* `operators`
* `regression`
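
Each suite name maps to one optional `.txt` file inside the selected skip-list directory. A minimal sketch of that mapping, assuming the file is named `<suite>.txt` (as with `subprocess.txt` below); `suite_skiplist_file` is a hypothetical helper:

```python
from pathlib import Path

# The seven suites listed above; each may have an optional <suite>.txt file.
TEST_SUITES = (
    "language", "subprocess", "runtime", "line_info",
    "interpreter", "operators", "regression",
)


def suite_skiplist_file(skip_dir, suite: str) -> Path:
    """Return the path of the skip-list file for `suite` (it may not exist)."""
    if suite not in TEST_SUITES:
        raise ValueError(f"unknown test suite: {suite!r}")
    return Path(skip_dir) / f"{suite}.txt"
```
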

For example, if you want to skip a test in the "subprocess" test suite,
add a line with the fully qualified test name to
`scripts/skiplist/default/subprocess.txt`:

```
# This is a comment. Please use comments to explain why the following tests are skipped, for example
# #800
test/unit/language/test_subprocess.py::test_print[print-int32]
```
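
A skip-list file is read line by line. A minimal parsing sketch, assuming comments always occupy a whole line starting with `#` (as in the example above) and that blank lines are ignored; `parse_skiplist` is a hypothetical helper:

```python
def parse_skiplist(text: str) -> list[str]:
    """Return the test IDs listed in skip-list text.

    Blank lines and whole-line '#' comments are ignored; surrounding
    whitespace is stripped from each entry.
    """
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        entries.append(line)
    return entries
```
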

If a test suite's `.txt` file does not exist or is empty, no tests are
skipped. To get the full list of available tests for a test suite, use
`pytest` with `--collect-only`, for example:

```
$ pytest --collect-only language/test_subprocess.py -q | sort
33 tests collected in 0.01s
test/unit/language/test_subprocess.py::test_assert[assert]
test/unit/language/test_subprocess.py::test_assert[device_assert]
...
```
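
The missing-or-empty rule can be expressed as a small loader. This sketch is an assumption about the behavior described above, not the code used by `test-triton.sh`; `load_skiplist` is a hypothetical helper and, like the parser sketch, treats only whole-line `#` comments as comments:

```python
from pathlib import Path


def load_skiplist(path) -> list[str]:
    """Load skip-list entries from `path`.

    A missing file or a file with no entries yields an empty list,
    meaning no tests are skipped for that suite.
    """
    path = Path(path)
    if not path.is_file():
        return []  # no skip list for this suite
    entries = []
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if line and not line.startswith("#"):  # ignore blanks and comments
            entries.append(line)
    return entries
```
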

This PR adds only a few tests to the skip list (see
`scripts/skiplist/default/subprocess.txt` and #800). The corresponding
Python code that skipped these tests programmatically has been removed
from `test_subprocess.py`. If this approach works, we can gradually
populate the default skip list while removing the skip code from the
Python files.

Note that an error will be raised if you try to skip a test that does
not exist. This is intended behavior (we use `--select-fail-on-missing`
with [pytest-select](https://pypi.org/project/pytest-select/)) and is
required to calculate the pass rate.
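
The effect of `--select-fail-on-missing` is roughly a set-difference check between the skip list and the collected test IDs. The sketch below only illustrates that idea; it is not pytest-select's implementation, and `check_skiplist` is a hypothetical name:

```python
from typing import Iterable


def check_skiplist(skipped: Iterable[str], collected: Iterable[str]) -> None:
    """Raise ValueError if any skip-list entry is not a collected test ID."""
    missing = sorted(set(skipped) - set(collected))
    if missing:
        raise ValueError(
            "skip list contains tests that were not collected:\n"
            + "\n".join(missing)
        )
```

Failing loudly means a renamed or removed test cannot silently linger in a skip list.
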
@vlad-penkin vlad-penkin self-assigned this Apr 22, 2024
@vlad-penkin vlad-penkin removed their assignment May 28, 2024
@pbchekin pbchekin assigned vlad-penkin and unassigned AshburnLee May 28, 2024
@vlad-penkin
Contributor

This issue needs to be rechecked after the June rolling driver release.

@vlad-penkin vlad-penkin assigned AshburnLee and unassigned vlad-penkin Jun 6, 2024
@AshburnLee
Contributor

AshburnLee commented Jun 12, 2024

Continues to fail with Agama 821.35 and 881.19 (6/12/2024).

@AshburnLee
Contributor

Continues to fail with Agama 821 (6/17/2024).

@AshburnLee
Contributor

Continues to fail with Agama 821.

@vlad-penkin vlad-penkin changed the title [UT] regression in test_subprocess.py with the rolling stable release 821 [UT] regression in test_subprocess.py with the oneAPI 2024.1.0 and PTDB 0.5.1 Jun 26, 2024
@vlad-penkin
Contributor

@AshburnLee could you please retest with Agama 914?

@AshburnLee
Contributor

Continues to fail with Agama 914.

etiotto pushed a commit that referenced this issue Jul 9, 2024
Closes #800

I accidentally discovered this effect while working on #1082. It
seems that accessing the tensor elements via the built-in Python `repr`
function triggers execution of the kernel and retrieval of its output.

Although this behavior differs from other backends, it at least lets us
test this feature for now.

---------

Signed-off-by: Anatoly Myachev <[email protected]>
@whitneywhtsang whitneywhtsang reopened this Jul 9, 2024
@anmyachev
Contributor

Reminder: don't forget to remove:

    if device == "xpu":
        # FIXME: remove trigger to get output from kernel
        repr(x)
        repr(y)
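
One way to keep the workaround easy to remove is to isolate it in a single helper, so the FIXME lives in one place. A hypothetical sketch (the name `trigger_kernel_output` is not in the code base); it only calls `repr()` on the given tensors when `device == "xpu"` and does nothing otherwise:

```python
def trigger_kernel_output(device: str, *tensors) -> None:
    """Hypothetical wrapper around the XPU-only repr() workaround.

    On "xpu", calling repr() on the tensors appears to make the kernel run
    and emit its output (the effect described in the commit above); other
    devices are left untouched.
    """
    if device == "xpu":
        # FIXME: remove trigger to get output from kernel
        for t in tensors:
            repr(t)
```

In a test this would be called as `trigger_kernel_output(device, x, y)`, and removing the workaround later means deleting the helper and its call sites.
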

@vlad-penkin
Contributor

vlad-penkin commented Aug 17, 2024

With PTDB 0.5.3 and Agama 950, this issue is still reproducible without the `repr` fix; 29 test variants fail:

FAILED language/test_subprocess.py::test_print[device_print-int8]
FAILED language/test_subprocess.py::test_print[device_print-uint8]
FAILED language/test_subprocess.py::test_print[device_print-int16]
FAILED language/test_subprocess.py::test_print[device_print-int32]
FAILED language/test_subprocess.py::test_print[device_print-long]
FAILED language/test_subprocess.py::test_print[device_print-float16]
FAILED language/test_subprocess.py::test_print[device_print-float32]
FAILED language/test_subprocess.py::test_print[device_print-float64]
FAILED language/test_subprocess.py::test_print[device_print_scalar-int8]
FAILED language/test_subprocess.py::test_print[device_print_scalar-uint8]
FAILED language/test_subprocess.py::test_print[device_print_scalar-int16]
FAILED language/test_subprocess.py::test_print[device_print_scalar-int32]
FAILED language/test_subprocess.py::test_print[device_print_scalar-long]
FAILED language/test_subprocess.py::test_print[device_print_scalar-float16]
FAILED language/test_subprocess.py::test_print[device_print_scalar-float32]
FAILED language/test_subprocess.py::test_print[device_print_scalar-float64]
FAILED language/test_subprocess.py::test_print[print-int32]
FAILED language/test_subprocess.py::test_print[static_print-int32]
FAILED language/test_subprocess.py::test_print[no_arg_print-int32]
FAILED language/test_subprocess.py::test_print[print_no_arg-int32]
FAILED language/test_subprocess.py::test_print[device_print_large-int32]
FAILED language/test_subprocess.py::test_print[print_multiple_args-int32]
FAILED language/test_subprocess.py::test_print[device_print_multiple_args-int32]
FAILED language/test_subprocess.py::test_print[device_print_hex-int16]
FAILED language/test_subprocess.py::test_print[device_print_hex-int32]
FAILED language/test_subprocess.py::test_print[device_print_hex-int64]
FAILED language/test_subprocess.py::test_print[device_print_pointer-int32]
FAILED language/test_subprocess.py::test_print[device_print_negative-int32]
FAILED language/test_subprocess.py::test_print[device_print_uint-uint32]

With the `repr` fix, 6 test variants fail. All 6 are included in the default skip list:

test/unit/language/test_subprocess.py::test_print[device_print-float16]
test/unit/language/test_subprocess.py::test_print[device_print-float32]
test/unit/language/test_subprocess.py::test_print[device_print-float64]
test/unit/language/test_subprocess.py::test_print[device_print_scalar-float16]
test/unit/language/test_subprocess.py::test_print[device_print_scalar-float64]
test/unit/language/test_subprocess.py::test_print[device_print_scalar-float32]
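
All six remaining failures are the cross product of two print variants (`device_print`, `device_print_scalar`) and three float dtypes. A small sketch that generates these skip-list entries, assuming the `test/unit/language/test_subprocess.py::test_print[<variant>-<dtype>]` ID format shown above; `float_print_ids` is a hypothetical helper:

```python
from itertools import product

# ID format copied from the skip-list entries above.
PREFIX = "test/unit/language/test_subprocess.py::test_print"
VARIANTS = ("device_print", "device_print_scalar")
FLOAT_DTYPES = ("float16", "float32", "float64")


def float_print_ids() -> list[str]:
    """Return the 6 skip-list entries: every variant paired with every float dtype."""
    return [f"{PREFIX}[{v}-{d}]" for v, d in product(VARIANTS, FLOAT_DTYPES)]


if __name__ == "__main__":
    print("\n".join(float_print_ids()))
```
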

@vlad-penkin vlad-penkin changed the title [UT] regression in test_subprocess.py with the oneAPI 2024.1.0 and PTDB 0.5.1 [UT] regression in test_subprocess.py with the PTDB 0.5.3 Aug 17, 2024
@vlad-penkin
Contributor

@etiotto and @whitneywhtsang what are the next steps to resolve the issue?

@whitneywhtsang
Contributor

> @etiotto and @whitneywhtsang what are the next steps to resolve the issue?

A CMPLRLLVM ticket has been opened for this issue, and we should continue to follow up there to get it fixed.

Retribution98 added a commit to Retribution98/intel-xpu-backend-for-triton that referenced this issue Oct 17, 2024
Closes intel#800

Signed-off-by: Kirill Suvorov <[email protected]>
@Retribution98 Retribution98 linked a pull request Oct 17, 2024 that will close this issue