Enable more tests for CPU in CI #51

int3 · 2024-07-15T20:36:17Z

Instead of having all tests marked with pytest.mark.cpu, I've taken to enabling all tests in a file by default. I think test_core.py can be the exception for now since it has an order of magnitude more tests than the other files.

Summary: This is still mostly boilerplate and basic lowering for the first milestone (compiling vector addition). This PR first lowers `tt.func` and `tt.return`.

Test Plan: It can safely compile an empty kernel.

```
@triton.jit
def add_kernel(x_ptr, y_ptr, output_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    return
```

> TRITON_ENABLE_LLVM_DEBUG=1 TRITON_CPU_BACKEND=1 python3 empty_kerne.py

```
//===-------------------------------------------===//
Legalizing operation : 'tt.func'(0x73be2a0) {
  * Fold {
  } -> FAILURE : unable to fold
  * Pattern : 'tt.func -> ()' {
Trying to match "(anonymous namespace)::FuncOpConversion"
    ** Insert : 'llvm.func'(0x6c04c70)
    ** Insert Block into : 'llvm.func'(0x6c04c70)
    ** Insert Block into : 'llvm.func'(0x6c04c70)
    ** Erase : 'tt.func'(0x73be2a0)
"(anonymous namespace)::FuncOpConversion" result 1
    //===-------------------------------------------===//
    Legalizing operation : 'llvm.func'(0x6c04c70) {
    } -> SUCCESS : operation marked legal by the target
    //===-------------------------------------------===//
...
//===-------------------------------------------===//
Legalizing operation : 'tt.return'(0x73efeb0) {
  "tt.return"() : () -> ()
  * Fold {
  } -> FAILURE : unable to fold
  * Pattern : 'tt.return -> ()' {
Trying to match "(anonymous namespace)::ReturnOpConversion"
    ** Insert : 'llvm.return'(0x73c0f00)
    ** Replace : 'tt.return'(0x73efeb0)
"(anonymous namespace)::ReturnOpConversion" result 1
    //===-------------------------------------------===//
    Legalizing operation : 'llvm.return'(0x73c0f00) {
      "llvm.return"() : () -> ()
    } -> SUCCESS : operation marked legal by the target
    //===-------------------------------------------===//
  } -> SUCCESS : pattern applied successfully
```

…riton-lang#1)

Summary: As title, `tl.program_id` needs to be supported first. As of now, we think pid will be provided as additional function arguments to the kernel. So, getting program_id is mapped to reading one of the last three arguments. I also quickly implemented `tl.device_print` or `print`, only for scalar types for a quick "Hello, World!" testing.

Test Plan: Tested with a simple example:

```
@triton.jit
def add_kernel(...):
    pid = tl.program_id(axis=0)  # We use a 1D launch grid so axis is 0.
    foo = pid + 42
    tl.device_print("Hello, World!", foo, pid)
```

The resulting .llir is valid:

```
@printfFormat_1 = internal constant [31 x i8] c"pid (%u, %u, %u) test: %u, %u\0A\00"

declare !dbg !3 i32 @printf(ptr, ...)

define void @add_kernel(ptr addrspace(1) %0, ptr addrspace(1) %1, ptr addrspace(1) %2, i32 %3, i32 %4, i32 %5, i32 %6) !dbg !7 {
  %8 = add i32 %4, 42, !dbg !8
  %9 = call i32 (ptr, ...) @printf(ptr @printfFormat_0, i32 %4, i32 %5, i32 %6, i32 %8, i32 %4)
  ret void, !dbg !9
}
```

Tried to compile with a fake main function:

```
> % cat main.c
extern void add_kernel(float*, float*, float*, int, int, int, int);

int main() {
    add_kernel(0, 0, 0, 4, 5, 6, 7);
}

> % llc -filetype=obj add_kernel.llir && clang -o a.out add_kernel.llir.o main.c
> % ./a.out
pid (5, 6, 7) Hello, World!: 47, 5
```

…ion flows (triton-lang#6)

* Support basic lowering through vector dialect in CPU backend. Signed-off-by: Ilya Enkovich <[email protected]>
* Use axis info in memory op lowering. Signed-off-by: Ilya Enkovich <[email protected]>
* Mark test_ptx_cast as enabled for CPU. Signed-off-by: Ilya Enkovich <[email protected]>
* Support umulhi operation. Signed-off-by: Ilya Enkovich <[email protected]>
* Support tl.clamp, tl.minimum, tl.maximum. Signed-off-by: Ilya Enkovich <[email protected]>
* Add enable_fp_fusion opt for CPU (only affects ASM dump now). Signed-off-by: Ilya Enkovich <[email protected]>
* Fix kernel args passing for propagated constants. Signed-off-by: Ilya Enkovich <[email protected]>
* Add permutations support. Signed-off-by: Ilya Enkovich <[email protected]>
* Support 2-D transfer_read/transfer_write lowering. Signed-off-by: Ilya Enkovich <[email protected]>
* Introduce shape info analysis and use it for loads/stores by block pointers. Delay scalar pointers lowering. Signed-off-by: Ilya Enkovich <[email protected]>
* Support 'other' arg for loads. Signed-off-by: Ilya Enkovich <[email protected]>
* Support tl.join. Signed-off-by: Ilya Enkovich <[email protected]>
* Minor renaming. Signed-off-by: Ilya Enkovich <[email protected]>

---------

Signed-off-by: Ilya Enkovich <[email protected]>

…ng#17)

* Fixed yaml syntax Signed-off-by: Gregory Shimansky <[email protected]>
* Removed cpu label from run-on Signed-off-by: Gregory Shimansky <[email protected]>
* Added missing zlib-dev Signed-off-by: Gregory Shimansky <[email protected]>
* Added missing apt-get update Signed-off-by: Gregory Shimansky <[email protected]>
* Remove pip cache because on self-hosted runner it slows things down Signed-off-by: Gregory Shimansky <[email protected]>
* Corrected path to tests Signed-off-by: Gregory Shimansky <[email protected]>
* Added installation of torch==2.1.2 Signed-off-by: Gregory Shimansky <[email protected]>

---------

Signed-off-by: Gregory Shimansky <[email protected]>

* Fix RelWithDebInfo build. Signed-off-by: Ilya Enkovich <[email protected]>
* Skip fp8 cast tests on CPU. Signed-off-by: Ilya Enkovich <[email protected]>
* Fix segfault. Signed-off-by: Ilya Enkovich <[email protected]>
* [BACKEND] Update LLVM version to llvm/llvm-project@765206e (triton-lang#4059)
* Add -s option to pytest run. Signed-off-by: Ilya Enkovich <[email protected]>
* Add a workaround for LLVM bug causing test failure on Skylake CPU. Signed-off-by: Ilya Enkovich <[email protected]>
* Add a workaround for LLVM fpext bug causing test failure on Skylake CPU. Signed-off-by: Ilya Enkovich <[email protected]>
* Fix formatting. Signed-off-by: Ilya Enkovich <[email protected]>

---------

Signed-off-by: Ilya Enkovich <[email protected]> Co-authored-by: Pablo Zimmermann <[email protected]>

int3 · 2024-07-15T21:26:12Z

.github/workflows/build-test.yml

+          python -m pytest -s -n 32 --device cpu python/test/unit/language/test_compile_errors.py
+          python -m pytest -s -n 32 --device cpu python/test/unit/runtime/test_bindings.py
+          python -m pytest -s -n 32 --device cpu python/test/unit/runtime/test_cache.py
+          python -m pytest -s -n 32 --device cpu python/test/unit/runtime/test_launch.py


maybe I should combine all these commands and run their tests under a single command
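A rough sketch of what the batched invocation could look like, assuming the four test files and the flags from the diff above (`-n` comes from pytest-xdist, `--device` is the option these test suites already accept). The helper name `build_pytest_command` is hypothetical, and the snippet only assembles the command without running it.

```python
import shlex
import sys

# Test files taken from the workflow diff above.
CPU_TEST_FILES = [
    "python/test/unit/language/test_compile_errors.py",
    "python/test/unit/runtime/test_bindings.py",
    "python/test/unit/runtime/test_cache.py",
    "python/test/unit/runtime/test_launch.py",
]


def build_pytest_command(test_files, device="cpu", workers=32):
    """Return a single pytest argv covering every file (hypothetical helper)."""
    return [sys.executable, "-m", "pytest", "-s", "-n", str(workers),
            "--device", device, *test_files]


if __name__ == "__main__":
    # Print the command; pass the list to subprocess.run(...) to execute it.
    print(shlex.join(build_pytest_command(CPU_TEST_FILES)))
```

Since pytest accepts several paths in one call, a single invocation lets xdist spread all the tests over the same pool of workers instead of starting a new pool per file.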

int3 · 2024-07-15T23:09:31Z

Let me know if you would like me to break out the commits into individual PRs

int3 · 2024-07-15T23:22:01Z

third_party/cpu/backend/driver.py

@@ -125,11 +125,10 @@ def format_of(ty):

    args_format = ''.join([format_of(_extracted_type(ty)) for ty in signature.values()])
    format = "iiiOKOOOO" + args_format
-    arg_ptrs_list = ', '.join(f"&arg{i}" for i, ty in signature.items()) if len(signature) > 0 else ''


The if check here was redundant
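For reference, a small standalone sketch of why the guard is unnecessary: `str.join` over an empty iterable already returns an empty string. The function name `format_arg_ptrs` and the sample signature values are made up for illustration.

```python
def format_arg_ptrs(signature):
    """Build the '&arg0, &arg1, ...' list without a length check."""
    return ", ".join(f"&arg{i}" for i in signature)


if __name__ == "__main__":
    # Non-empty signature: one pointer expression per argument index.
    print(repr(format_arg_ptrs({0: "*fp32", 1: "i32"})))
    # Empty signature: join() yields '' on its own.
    print(repr(format_arg_ptrs({})))
```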

Devjiu · 2024-07-16T13:42:49Z

python/test/unit/language/test_core.py

 @pytest.mark.interpreter
 @pytest.mark.parametrize("debug", [False, True])
 def test_interleave(device, debug):

+    if device == "cpu" and debug:
+        pytest.skip("Test aborts for device=cpu and debug=True")


This happens because device_assert is not supported yet.
We can skip it for now, but it should be fixed with #35.

yep, @minjang actually suggested that I look at tackling that (:

Yup, I'm porting my previous implementation of device_print to the new repo, but I'm hitting some crashes :) It should be working soon. device_assert is also very similar.

…dd to CI The previous test implementation would break when run under pytest-xdist as it reused & removed the same temporary directory for the triton cache.
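One possible way to keep parallel workers from sharing a cache directory, shown as a sketch rather than the change made in this commit: give each test a fresh temporary directory and point `TRITON_CACHE_DIR` at it. The context manager name `isolated_cache_dir` is hypothetical, and it is an assumption that the code under test picks up its cache location from that environment variable.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def isolated_cache_dir(env_var="TRITON_CACHE_DIR"):
    """Point env_var at a fresh directory and restore the old value on exit."""
    previous = os.environ.get(env_var)
    with tempfile.TemporaryDirectory() as tmpdir:
        os.environ[env_var] = tmpdir
        try:
            yield tmpdir
        finally:
            # Restore whatever the caller had before entering the block.
            if previous is None:
                os.environ.pop(env_var, None)
            else:
                os.environ[env_var] = previous
```

Inside a pytest suite, the built-in `tmp_path` and `monkeypatch` fixtures give the same isolation with less code.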

Most of the tests are skipped for now...

The only test that needed fixing was `test_unknown_annotations`, where we were generating invalid code for the launcher. In particular, when `kernel_fn_args` was empty, we would get the following error:

```
/var/folders/_z/88s630fd3d9fx72mbmx90qvw0000gn/T/tmpy481mz0l/main.cpp:37:29: error: expected ';' before '(' token
   37 | using kernel_ptr_t = void(*)(, uint32_t, uint32_t, uint32_t, uint32_t, uint32_t, uint32_t);
      |                             ^
      |                             ;
```
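A sketch of the general shape of such a fix, not the launcher code itself: build the full parameter list first and join it once, so an empty argument list cannot leave a dangling comma. The function name `kernel_ptr_typedef` is hypothetical; the six trailing `uint32_t` parameters are copied from the error message.

```python
def kernel_ptr_typedef(kernel_fn_arg_types):
    """Emit the C++ function-pointer typedef without a dangling comma."""
    grid_params = ["uint32_t"] * 6
    # Joining one combined list handles the empty-argument case naturally.
    params = ", ".join([*kernel_fn_arg_types, *grid_params])
    return f"using kernel_ptr_t = void(*)({params});"


if __name__ == "__main__":
    print(kernel_ptr_typedef([]))
    print(kernel_ptr_typedef(["float*", "int32_t"]))
```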

It was giving "not compiled with CUDA" errors because the test was querying for the CUDA compute capability, but it was not actually using this information. It looks like the first iteration of the test needed to know the compute capability, but that changed somewhere along the way.

ienkovich

Thanks for the patch, it looks great!

I have a couple of minor comments:

Please try to avoid CPU-unrelated changes; they can complicate our rebases on the upstream repo and the subsequent merge of the backend. You can submit such changes directly to the upstream Triton repo. For example, the changes in conftest.py don't seem to be backend-specific.
Could you please create issues in the Feature Support section of our project for tests that are skipped for now and are supposed to be enabled in the future for CPU? And also for test suites (if any) that are not covered yet by your patch? It would help a lot in tracking missing features for CPU.

ienkovich · 2024-07-17T16:26:00Z

python/test/unit/language/test_compile_errors.py

@@ -294,7 +294,9 @@ def kernel():
    triton.compile(triton.compiler.ASTSource(fn=kernel, signature={}, constants={}))


-def test_global_access_in_fn_default_arg():
+def test_global_access_in_fn_default_arg(device):
+    if device == "cpu":


I don't know which device is used for interpreter tests, but all other device checks in tests check for the interpreter mode first. Let's follow the same is_cpu scheme everywhere.

hm it looks like that's no longer the case in upstream? https://github.com/triton-lang/triton/blob/0dd9029abf61c949471eb512a0b1e0da55339859/python/test/unit/language/test_core.py#L32-L34

oh wait, I see the check was just folded into get_current_target(). tbh I think it's cleaner to have interpreter-only tests marked explicitly as such, rather than conflating the interpreter check with the platform check... like, what if we have some tests that pass for both the interpreter and the CPU?

I don't know which device is used for the interpreter, guess it might be CPU. So this check might disable the test for the interpreter.
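To make the two schemes concrete, here is a minimal sketch of an `is_cpu`-style helper that checks for interpreter mode first. It assumes interpreter mode is signalled through the `TRITON_INTERPRET` environment variable and that the caller passes in the backend name (for example, taken from `get_current_target()`); both helpers are illustrative, not copied from the test suite.

```python
import os


def is_interpreter():
    """True when interpreter mode is requested via the environment (assumed)."""
    return os.environ.get("TRITON_INTERPRET", "0") == "1"


def is_cpu(backend):
    """True only for the real CPU backend, never for interpreter runs."""
    return not is_interpreter() and backend == "cpu"
```

With a helper like this, a skip written as `if is_cpu(...)` leaves interpreter runs alone, whereas a bare `device == "cpu"` comparison may also skip them if the interpreter reports "cpu" as its device.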

ienkovich · 2024-07-17T16:28:00Z

python/test/unit/language/test_conversions.py

@@ -282,6 +282,10 @@ def upcast_test(src_dtype, dst_dtype, exponent_bits, mantissa_bits, exponent_bia
 ])
 def test_typeconvert_upcast(src_dtype, dst_dtype, device):

+    if device == "cpu":


These tests are covered in #40

ienkovich · 2024-07-17T16:29:56Z

python/test/unit/runtime/test_cache.py

-tmpdir = ".tmp"
+
+def get_device_key(device):
+    if device == "cuda":


Can we get it from the active driver as in jit.py?

int3 · 2024-07-17T17:44:51Z

> Please try to avoid CPU-unrelated changes

Yeah, I did kind of realize that :) I actually ended up splitting out some of the changes into triton-lang#4334 and triton-lang#4341 -- will rebase upon them once they're landed.

> Let's follow the same is_cpu scheme everywhere.

Yeah I'm actually not sure why we have two different schemes. Let me ask @ptillet in triton-lang#4334 since I have the same changes there.

ienkovich · 2024-07-17T20:27:27Z

> Yeah, I did kind of realize that :) I actually ended up splitting out some of the changes into triton-lang#4334 and triton-lang#4341 -- will rebase upon them once they're landed.

That's perfect. Thanks!

int3 · 2024-07-19T17:41:57Z

python/test/unit/language/test_conversions.py

+    if device == "cpu":
+        if "float8" in src_dtype or "float8" in dst_dtype:
+            pytest.skip("float8 tests are not supported on CPU")
+        pytest.skip("FIXME: Test aborts for device=cpu")


this is #58

int3 · 2024-07-19T17:43:25Z

python/test/unit/runtime/test_cache.py


-def test_jit_debug() -> None:
+    if device == "cpu":
+        pytest.skip("FIXME: Test aborts for device=cpu")


this is device_assert

int3 · 2024-07-19T17:43:34Z

python/test/unit/runtime/test_launch.py

-def test_metadata() -> None:
+def test_metadata(device) -> None:
+    if device == "cpu":
+        pytest.skip("FIXME: Test aborts for device=cpu")


this is #57

int3 · 2024-07-19T17:47:09Z

python/test/unit/language/test_compile_errors.py

-def test_global_access_in_fn_default_arg():
+def test_global_access_in_fn_default_arg(device):
+    if device == "cpu":
+        pytest.skip("Test aborts for device=cpu")


This is also #57

int3 · 2024-08-31T17:15:57Z

Obsoleted by #106 and #131.

bertmaher and others added 30 commits June 24, 2024 00:31

Short preamble for the README, explaining why this clone exists

1ab0593

OSS Automated Fix: Addition of Code of Conduct (triton-lang#1)

b4369d0

[BACKEND][CPU] Initial plumbing for cpu backend (triton-lang#2)

5fb04c1

* [BACKEND][CPU] Implement the empty cpu backend * Run clang-format * Fix yadf error Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags:

[BACKEND][CPU] Create TritonCPU and conversion dialects (triton-lang#3)

ae4908e

Update README.md

427a4ee

A quick addition on how to use it.

Quick patches to make it work after rebasing (triton-lang#3)

4bcc918

Support basic lowering through vector dialect in CPU backend.

9c523f6

Signed-off-by: Ilya Enkovich <[email protected]>

Revert unreviewed changes. (triton-lang#5)

df024dd

Co-authored-by: Shane Nay <[email protected]>

Add a workaround for LLVM bug in codegen for bf16 vector cast. (trito…

c2d2b4e

…n-lang#4) Signed-off-by: Ilya Enkovich <[email protected]>

Add support for tl.cat operation. (triton-lang#9)

6480563

Signed-off-by: Ilya Enkovich <[email protected]>

[BACKEND][CPU] Make it buildable and runnable in a different environm…

0419e8f

…ent (triton-lang#8) * [BACKEND][CPU] Make it buildable and runnable in a different environment * Revert seemingly inconsistent python code formatting

Add support for simple reductions. (triton-lang#10)

08eff50

Signed-off-by: Ilya Enkovich <[email protected]>

Support tl.histogram for CPU. (triton-lang#12)

fd49147

Signed-off-by: Ilya Enkovich <[email protected]> Co-authored-by: Minjang Kim <[email protected]>

Fix merge and compile errors (triton-lang#13)

e4eeea8

[CPU] Support flexible active driver + update vector-add tutorial (tr…

193a0b6

…iton-lang#11) * [CPU] Support flexible active driver + update vector-add tutorial * Update vector-add to run CPU always + optional GPU * Update do_bench for CPU

Added a simple workflow to run on self-hosted intel runner (triton-la…

b78ddce

…ng#16) Signed-off-by: Gregory Shimansky <[email protected]>

[CPU] Add an OpenMP-based CPU launcher (triton-lang#15)

74f111f

* [CPU] Add OpenMP launcher * Address the comments * Fix induction variable type * Always use preallocated output buffer for CPU with torch.add

Support generic reduction and scan cases. (triton-lang#14)

0f9a0cf

Signed-off-by: Ilya Enkovich <[email protected]>

[CPU] Dump human-readable asm code in TRITON_CACHE_DIR (triton-lang#19)

975c640

* [CPU] Dump human-readable asm code in TRITON_CACHE_DIR * Don't touch the main compiler.py

Added g++ installation after switching to ubuntu-22.04 (triton-lang#21)

3ff6130

Signed-off-by: Gregory Shimansky <[email protected]>

Support atomic ops for CPU. (triton-lang#20)

d36444e

Signed-off-by: Ilya Enkovich <[email protected]>

[TUTORIAL] Add unmasked matrix multiply example to triton-cpu (triton…

a11dae4

…-lang#23) * add un-masked tiled matrix-multiplication for triton-cpu * clean and add comment * move test under tutorials

Update matrix-multiplication-cpu tutorial, use preallocated output bu…

d39ed0f

…ffer for CPU. (triton-lang#24)

Use static compilation for kernels. (triton-lang#29)

1b9997c

Signed-off-by: Ilya Enkovich <[email protected]>

Move byte manipulation ops from elwise ops conversion. (triton-lang#28)

f3093c3

Signed-off-by: Ilya Enkovich <[email protected]>

int3 requested a review from digantdesai July 15, 2024 21:15

int3 commented Jul 15, 2024

int3 force-pushed the more-tests branch 2 times, most recently from 4edd05f to 5a769b4 Compare July 16, 2024 02:02

Devjiu reviewed Jul 16, 2024

int3 force-pushed the more-tests branch from 5a769b4 to 88aef62 Compare July 16, 2024 16:44

int3 added 10 commits July 16, 2024 18:40

[cpu] Fix preload device key + make test_cache.py concurrent-safe & a…

5bcb00c

…dd to CI The previous test implementation would break when run under pytest-xdist as it reused & removed the same temporary directory for the triton cache.

[cpu] Add test_launch.py to CI

e135f1c

[cpu] Add test_bindings.py to CI

f92ae07

[cpu] Add test_conversions.py to CI

8301cff

Most of the tests are skipped for now...

[cpu] Run CI tests as a batched command

fa06c02

[cpu] Add pytest.mark.cpu to a few more core tests

d5130b1

[cpu] Enable test_debug_dump.py in CI

78a55d3

[cpu] Add test_standard.py and test_decorator.py to CI

ceb1f88

int3 force-pushed the more-tests branch from 88aef62 to ceb1f88 Compare July 16, 2024 22:41

ienkovich reviewed Jul 17, 2024

int3 marked this pull request as draft July 17, 2024 17:59

int3 commented Jul 19, 2024

Devjiu force-pushed the main branch from 4d53f02 to d71038d Compare August 13, 2024 15:49

int3 force-pushed the main branch 2 times, most recently from 1a50ae8 to fb7ec84 Compare August 29, 2024 15:32

int3 closed this Aug 31, 2024
