fix[next][dace]: Fix translation of if statement from tasklet to inter-state condition #1469

edopao · 2024-02-26T07:15:33Z

This is a 0-day bug that finally manifested itself as an invalid memory access error on a GPU node in gt4py programs.

The bug is that if-nodes were translated to tasklets. However, tasklets assume that all inputs are evaluated. For if-nodes, we need to enforce exclusive execution of one of the two branches. That means that only one of the two arguments will be evaluated at runtime. We achieve this by implementing the true/false branches as separate states and checking the if-statement as condition on the inter-state edge.
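
The difference between the two lowerings can be illustrated with a small plain-Python sketch. It does not use DaCe; `field`, `deref`, `if_as_tasklet` and `if_as_states` are hypothetical names introduced only for this illustration, and the out-of-bounds read is modelled as an `IndexError` rather than a GPU fault.

```python
# Hypothetical stand-in for a field: reading outside the valid range fails,
# much like an out-of-bounds load would on the GPU.
field = [10.0, 20.0, 30.0]


def deref(idx):
    if not 0 <= idx < len(field):
        raise IndexError(f"invalid access at index {idx}")
    return field[idx]


def if_as_tasklet(cond, true_value, false_value):
    # Tasklet-like: both inputs have already been evaluated by the caller.
    return true_value if cond else false_value


def if_as_states(cond, true_branch, false_branch):
    # State-like: the condition sits on the edge, so only the selected
    # branch is ever entered and evaluated.
    return true_branch() if cond else false_branch()


idx = 5
in_bounds = idx < len(field)

# Branches as separate "states": the invalid access is never executed.
safe = if_as_states(in_bounds, lambda: deref(idx), lambda: 0.0)

# Tasklet-style: evaluating the inputs already triggers the invalid access.
try:
    if_as_tasklet(in_bounds, deref(idx), 0.0)
    tasklet_failed = False
except IndexError:
    tasklet_failed = True
```

In this sketch the guarded branch is skipped in the state-style variant, while the tasklet-style variant fails before the selection is even made, which mirrors the failure mode described above.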

This reverts commit 2e401c0.

philip-paul-mueller · 2024-02-26T11:22:22Z

src/gt4py/next/program_processors/runners/dace_iterator/itir_to_tasklet.py

+    # make the result of the if-statement evaluation available inside current state
+    ctx_stmt_node = ValueExpr(current_state.add_access(stmt_node.value.data), stmt_node.dtype)
+
+    # we distinguish between select if-statements, where both true and false branches are symbolic expressions,


If I remember correctly, a symbolic expression can also involve a memory access, can't it?
I mean, on inter-state edges you can generate expressions such as a = array[idx], or array[idx] == 1 in the condition.
So, in my view, even a fully symbolic expression could potentially perform an invalid access, or am I wrong about that?

Good point, but according to the DaCe documentation, symbols are supposed to remain constant within the state scope and only change on inter-state edges.

A particular reason that makes symbols useful is the fact they stay constant throughout their defined scope. A symbol defined in a scope (e.g., map parameter) cannot change at all, and symbols that are defined outside an SDFG state cannot be modified inside a state, only in assignments of state transitions.

I assume it should be DaCe's responsibility during lowering to C++/CUDA code to ensure that symbolic expressions are evaluated and assigned to scope variables before entering the code-block of each state.

I see now that I was contradicting myself with what I wrote above. I still assume that the SDFG is correct, because the new logic checks that the true/false branches are SymbolExpr. I do not know how to guarantee that these symbolic expressions are not the result of some invalid operation. Hopefully this should not happen, because the main tasklet, including field access, is entirely represented inside the SDFG scope (transformer.context.body).

What I was saying is that if you do an invalid memory access, it does not matter whether it happens in a symbol assignment expression or in a Tasklet.
If idx is out of bounds for the array array, the expression array[idx] is invalid regardless of whether it happens in a symbol access or not.
Here is an example where an invalid access happens:

    import dace
    import numpy as np

    sdfg = dace.SDFG("Test")
    sdfg.add_array("A", shape=(1000,), dtype=dace.float64, transient=False)
    sdfg.add_array("__return", shape=(1000,), dtype=dace.float64, transient=False)

    init_state = sdfg.add_state("init", is_start_state=True)
    comp_state = sdfg.add_state("comp_state")

    # The symbol assignment on the inter-state edge reads A out of bounds (index 1001).
    sdfg.add_edge(
        init_state,
        comp_state,
        dace.InterstateEdge(assignments={'invalid_symbol': 'A[999 + 2]'}),
    )

    comp_state.add_mapped_tasklet(
        "invalid_access",
        map_ranges=[('__i0', '0:1000')],
        inputs=dict(__in=dace.Memlet(data='A', subset='__i0')),
        code='__out = __in + invalid_symbol',
        outputs=dict(__out=dace.Memlet(data='__return', subset='__i0')),
        external_edges=True,
    )

    A = np.ones(1000)
    csdfg = sdfg.compile()
    res = csdfg(A=A)

I understood that, though in a second pass :) and I totally agree with you. Based on the way we lower ITIR to SDFG, we do not access arrays on inter-state edges to assign state symbols, so hopefully this is not a problem. Otherwise I have no idea how to ensure that symbol values are the result of valid symbolic expressions.

src/gt4py/next/program_processors/runners/dace_iterator/itir_to_tasklet.py

philip-paul-mueller

LGTM

philip-paul-mueller

Although I am not convinced that it works, I will now approve the PR, but I suggest reconsidering whether to merge it.

philip-paul-mueller · 2024-02-26T14:00:30Z

src/gt4py/next/program_processors/runners/dace_iterator/itir_to_tasklet.py

+    # make the result of the if-statement evaluation available inside current state
+    ctx_stmt_node = ValueExpr(current_state.add_access(stmt_node.value.data), stmt_node.dtype)
+
+    # we distinguish between select if-statements, where both true and false branches are symbolic expressions,


I am not sure if you can do that.
It would boil down to the assumption that a symbolic expression is always valid and a ValueExpr is not. Where in the code is this assumption established?

I mean it would make sense to establish this guarantee, because it would allow further optimizations, but I do not see that this is currently done.

edopao · 2024-02-26T14:37:04Z

The SDFG works under the assumption that symbol values do not depend on field access. This is compatible with the observation that the lowering of ITIR to SDFG puts field access at the innermost level of SDFG nesting, using a deref tasklet.

This is an important invariant to preserve, and we agreed on stating it as a code comment.
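
As an illustration of what checking this invariant could look like, here is a minimal sketch in plain Python. It is not part of gt4py or DaCe: `symbols_touch_arrays` is a hypothetical helper, and it assumes that inter-state assignments are available as a dictionary from symbol name to a Python-syntax expression string, as in the `A[999 + 2]` example from the review thread.

```python
import ast


def symbols_touch_arrays(assignments, array_names):
    """Return the symbols whose assignment expression subscripts a known array.

    Hypothetical helper: `assignments` maps symbol names to expression strings,
    `array_names` is the set of array (field) names to look for.
    """
    offending = []
    for symbol, expr in assignments.items():
        tree = ast.parse(expr, mode="eval")
        for node in ast.walk(tree):
            if (
                isinstance(node, ast.Subscript)
                and isinstance(node.value, ast.Name)
                and node.value.id in array_names
            ):
                offending.append(symbol)
                break
    return offending


arrays = {"A", "__return"}
ok = symbols_touch_arrays({"n": "size - 1"}, arrays)
bad = symbols_touch_arrays({"invalid_symbol": "A[999 + 2]"}, arrays)
```

A check of this kind only detects that a symbol depends on an array access; it does not decide whether the access itself is in bounds.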

edopao added 18 commits February 19, 2024 23:32

[dace] Fix for GPU illegal memory access

8b95f7b

[dace] Fix formatting

8009c8c

[dace] Minor edit

e06d719

Merge remote-tracking branch 'origin/main' into dace-fix_gpu_crash

32dc3c1

[dace] Revert changes on lift expressions

fc14a82

[dace] Fix for if-branch exclusive execution

a25a5ea

[dace] Fix previous commit

4d4af5c

[dace] Fix previous commit (1)

fdad203

[dace] Fix previous commit (2)

28a4bbf

[dace] Update comment

e84c246

[dace] Update comment (1)

dd06b78

[dace] Remove empty states

57391e2

[dace] Revert change

2e401c0

Revert "[dace] Revert change"

f1eac2e

This reverts commit 2e401c0.

[dace] Re-apply change + fix

8c1c15d

[dace] Update test case

6e1a0c7

[dace] Update test case (1)

b8bcd46

[dace] Revert change in test code

00b65e0

edopao marked this pull request as ready for review February 26, 2024 11:03

edopao requested a review from philip-paul-mueller February 26, 2024 11:04

philip-paul-mueller reviewed Feb 26, 2024

[dace] Code formatting

60970d3

edopao requested a review from philip-paul-mueller February 26, 2024 13:56

philip-paul-mueller reviewed Feb 26, 2024

philip-paul-mueller approved these changes Feb 26, 2024

edopao merged commit b86a347 into GridTools:main Feb 26, 2024
31 checks passed

edopao deleted the dace-fix_gpu_crash branch February 26, 2024 14:17
