Design of specific backends (GPU, OpenMP, Coarrays, etc.) #1996

certik · 2023-06-21T23:26:37Z

So far, most of the code that our backends (C, LLVM, WASM) generate does not depend on any third-party API. It just uses native operations of the given platform (such as arithmetic), sometimes libc, and sometimes it calls into our own runtime library.

We now have to figure out how to target backends that make heavy use of a custom third-party API (typically C) to do all the operations. Examples of such backends:

- SymEngine symbolic backend
- GPU and other accelerator backends
- OpenMP
- Coarrays
- pthreads
- CPython interoperability

The two approaches are:

1. We represent the operations in ASR, either with explicit backend-specific nodes or with higher-level operations (such as a parallel do concurrent). Each backend then has to implement the translation of each operation into specific API calls (say OpenMP or SymEngine).
2. We do this translation as an ASR->ASR pass. The input is, say, do concurrent, and the output is ASR with specific calls to OpenMP (if OpenMP is used) or to a GPU API (if GPU offloading is used). All backends then work with the result.

We can use a combination of the two approaches, but the second one is preferable: we can see what the code looks like after the transformation (of "do concurrent" into OpenMP or CUDA) and optionally apply more ASR->ASR passes to optimize it further; we can use our verify() to check correctness; and all backends will work with no special support.
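To make the second approach concrete, here is a minimal sketch in Python of what such an ASR->ASR lowering could look like. Everything in it is an assumption made for illustration: the node classes (`Assign`, `Call`, `Loop`, `DoConcurrent`) are toy stand-ins and not the real ASR definitions, the runtime entry points (`rt_omp_parallel_begin`, `rt_gpu_offload_begin`, and so on) are hypothetical names, and the `verify` here is a toy check, not the project's verify().

```python
from dataclasses import dataclass, field
from typing import List, Union


# Toy stand-ins for ASR nodes (hypothetical, NOT the real ASR classes).
@dataclass
class Assign:
    target: str
    value: str


@dataclass
class Call:
    name: str
    args: List[str] = field(default_factory=list)


@dataclass
class Loop:
    var: str
    start: str
    stop: str
    body: List["Stmt"]


@dataclass
class DoConcurrent:
    var: str
    start: str
    stop: str
    body: List["Stmt"]


Stmt = Union[Assign, Call, Loop, DoConcurrent]

# Hypothetical runtime API names, one (begin, end) pair per target.
RUNTIME_API = {
    "openmp": ("rt_omp_parallel_begin", "rt_omp_parallel_end"),
    "gpu": ("rt_gpu_offload_begin", "rt_gpu_offload_end"),
}


def lower_do_concurrent(stmts: List[Stmt], target: str) -> List[Stmt]:
    """Rewrite each DoConcurrent into a plain Loop wrapped in runtime calls."""
    begin, end = RUNTIME_API[target]
    out: List[Stmt] = []
    for s in stmts:
        if isinstance(s, DoConcurrent):
            body = lower_do_concurrent(s.body, target)
            out.append(Call(begin, [s.var, s.start, s.stop]))
            out.append(Loop(s.var, s.start, s.stop, body))
            out.append(Call(end))
        elif isinstance(s, Loop):
            out.append(Loop(s.var, s.start, s.stop,
                            lower_do_concurrent(s.body, target)))
        else:
            out.append(s)
    return out


def verify(stmts: List[Stmt]) -> bool:
    """Toy check: the lowered tree contains no DoConcurrent node."""
    for s in stmts:
        if isinstance(s, DoConcurrent):
            return False
        if isinstance(s, Loop) and not verify(s.body):
            return False
    return True


program = [DoConcurrent("i", "1", "n", [Assign("a(i)", "b(i) + c(i)")])]
lowered = lower_do_concurrent(program, "openmp")
```

After the pass, `lowered` holds only `Call`, `Loop` and `Assign` nodes, so a backend that knows nothing about do concurrent can emit it as ordinary code, and the result can be inspected or fed to further passes.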

rebcabin · 2023-06-22T13:59:43Z

Yes: delaying commitments to API details to the last possible phase preserves options in the earlier phases. Once those commitments are made, all downstream phases are stuck with them :)

certik · 2023-06-23T03:45:58Z

I think we can always write an ASR->ASR pass that implements a feature using some low-level API. We don't have to use it, but when we do, the backends just work. The downside is possibly slower compilation. Later, we can "fuse" it with the backend (that is, implement it in the backend directly), as long as that can be done in a maintainable way. The ASR passes thus act as the quickest and cleanest way to get in all the features that we need. Later, we can decide not to run some of them and instead handle those features directly in the backend, as a compile-speed optimization.
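One way to picture the "run the pass unless the backend handles it directly" idea is a pipeline that skips a lowering pass when the backend declares native support for that feature. The sketch below is only an illustration under stated assumptions: the IR is reduced to a list of string tags, and the names `PASSES`, `make_lowering` and `run_pipeline` are hypothetical, not part of the compiler.

```python
from typing import Callable, Dict, List, Set

# A "pass" here maps a toy IR (a list of feature tags) to a new list.
Pass = Callable[[List[str]], List[str]]


def make_lowering(feature: str, replacement: str) -> Pass:
    """Build a pass that replaces one high-level feature with a runtime call."""
    def run(ir: List[str]) -> List[str]:
        return [replacement if node == feature else node for node in ir]
    return run


# Hypothetical registry: one lowering pass per high-level feature.
PASSES: Dict[str, Pass] = {
    "do_concurrent": make_lowering("do_concurrent", "call:parallel_rt"),
    "symbolic": make_lowering("symbolic", "call:symbolic_rt"),
}


def run_pipeline(ir: List[str], backend_native: Set[str]) -> List[str]:
    """Run every lowering pass except those the backend implements itself."""
    for feature, lowering in PASSES.items():
        if feature in backend_native:
            continue  # "fused": the backend handles this feature directly
        ir = lowering(ir)
    return ir


# A backend with no special support gets fully lowered input.
generic = run_pipeline(["do_concurrent", "symbolic", "add"], set())
# A backend that handles do_concurrent natively skips that pass.
fused = run_pipeline(["do_concurrent", "symbolic", "add"], {"do_concurrent"})
```

With this shape, every feature works on every backend from the start, and fusing a feature into a backend later amounts to adding it to that backend's native set.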

certik mentioned this issue Jun 22, 2023

Symbolic TODO #1987 (closed, 9 tasks)

This was referenced Jun 23, 2023

Initial PythonCallable implementation #1984 (merged)
Deliver LPython MVP #1704 (closed)

This was referenced Jul 1, 2023

ASR refactoring lfortran/lfortran#1303 (open)
Feature request: allow to compile CuPy code #2075 (open)
