\chapter{Implementation}
\label{cha:implementation}

Somewhere in here, explain why one kernel is generated per expression rather than a single kernel for all expressions.
\section{Technologies}
Short section covering CUDA, PTX, Julia, and CUDA.jl.

Probably reference the performance evaluation papers for Julia and CUDA.jl.
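Possibly include a minimal CUDA.jl example here to show how a Julia kernel is compiled to PTX and launched on the GPU; the kernel, its name, and the launch configuration below are only illustrative placeholders, not part of the actual implementation:

\begin{verbatim}
using CUDA

# Toy kernel: each thread squares one element of the input vector.
function square_kernel!(out, x)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(x)
        @inbounds out[i] = x[i] * x[i]
    end
    return nothing
end

x   = CUDA.rand(Float32, 1024)
out = CUDA.zeros(Float32, 1024)

# CUDA.jl compiles the Julia kernel down to PTX and launches it;
# prefixing the call with @device_code_ptx shows the generated PTX.
@cuda threads=256 blocks=4 square_kernel!(out, x)
\end{verbatim}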
\section{Expression Processing}
Talk about why this step is needed and how it is done (the why: it basically simplifies the evaluation and transpilation process; the how is in ExpressionProcessing.jl).
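One way to illustrate the processing step would be a small sketch that flattens a Julia Expr into a postfix token stream, which a stack-based evaluator can then consume without recursion. This is only an assumed scheme with placeholder types; the actual representation is defined in ExpressionProcessing.jl:

\begin{verbatim}
# Hypothetical sketch: flatten a Julia Expr into postfix (reverse Polish) order.
abstract type Token end
struct Constant <: Token; value::Float32; end
struct Variable <: Token; index::Int32;  end   # index into the variable matrix
struct Operator <: Token; op::Symbol;    end   # e.g. :+, :-, :*

function to_postfix!(tokens::Vector{Token}, ex)
    if ex isa Number
        push!(tokens, Constant(Float32(ex)))
    elseif ex isa Symbol
        # assumes variables are named x1, x2, ... (placeholder convention)
        push!(tokens, Variable(parse(Int32, String(ex)[2:end])))
    elseif ex isa Expr && ex.head == :call
        for arg in ex.args[2:end]               # operands first ...
            to_postfix!(tokens, arg)
        end
        push!(tokens, Operator(ex.args[1]))     # ... operator last
    else
        error("unsupported expression: $ex")
    end
    return tokens
end

# :(x1 * x2 + 2.0f0) becomes [x1, x2, *, 2.0, +]
postfix = to_postfix!(Token[], :(x1 * x2 + 2.0f0))
\end{verbatim}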
\section{Interpreter}
Talk about how the interpreter has been developed.

UML activity diagram.

Main loop; the kernel is compiled by CUDA.jl to PTX and then executed.

Memory access (currently global memory only).

No dynamic memory allocation as on the CPU (the evaluation stack needs to have a fixed size).
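A sketch of the fixed-size-stack idea could look like the following; the opcode encoding, stack depth, and data layout are placeholders rather than the actual interpreter, and the per-thread stack is an MVector from StaticArrays because device code cannot allocate memory dynamically:

\begin{verbatim}
using CUDA, StaticArrays

# Hypothetical opcode encoding: positive values index a variable,
# negative values select an operator.
const OP_ADD = Int32(-1)
const OP_MUL = Int32(-2)

# Each thread evaluates the same postfix program for one row of `data`.
function interpret_kernel!(results, code, data)
    row = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if row <= size(data, 1)
        stack = MVector{8, Float32}(undef)   # fixed maximum stack depth
        top = 0
        for i in eachindex(code)
            op = code[i]
            if op > 0                        # push a variable value
                top += 1
                stack[top] = data[row, op]
            elseif op == OP_ADD
                stack[top - 1] += stack[top]
                top -= 1
            elseif op == OP_MUL
                stack[top - 1] *= stack[top]
                top -= 1
            end
        end
        results[row] = stack[1]
    end
    return nothing
end

data    = CUDA.rand(Float32, 1024, 2)
code    = CuArray(Int32[1, 2, OP_MUL, 1, OP_ADD])   # x1 * x2 + x1 in postfix
results = CUDA.zeros(Float32, 1024)

@cuda threads=256 blocks=4 interpret_kernel!(results, code, data)
\end{verbatim}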
\section{Transpiler}
Talk about how the transpiler has been developed (probably the largest section, because it simply has more interesting parts).

UML activity diagram.
Front-end and back-end.

Caching of back-end results.
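The caching could be illustrated as memoization of the compiled kernel per expression, roughly as in the following hypothetical sketch (the back-end is passed in as a function, and the entry-point name is a placeholder):

\begin{verbatim}
using CUDA

# Hypothetical cache: map an expression to its already-compiled kernel,
# so the back-end and the PTX JIT run only once per expression.
const KERNEL_CACHE = Dict{Expr, CuFunction}()

function get_kernel(backend, ex::Expr)
    get!(KERNEL_CACHE, ex) do
        ptx = backend(ex)   # back-end generates PTX for this expression
        CuFunction(CuModule(ptx), "evaluate")
    end
end
\end{verbatim}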
The PTX code is generated, then compiled and executed using CUDA.jl (essentially via the driver API).
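A sketch of how the generated PTX might be loaded and launched through the driver-API wrappers of CUDA.jl; the entry-point name, its parameter list, and the launch configuration are assumptions and have to match whatever the back-end actually emits:

\begin{verbatim}
using CUDA

# Hypothetical helper: JIT-compile the PTX produced for one expression
# and launch its entry point (here assumed to be called "evaluate").
function launch_expression(ptx::String, data::CuMatrix{Float32})
    md  = CuModule(ptx)                 # driver-level JIT compilation
    fun = CuFunction(md, "evaluate")

    nrows   = size(data, 1)
    results = CUDA.zeros(Float32, nrows)

    # The type tuple must match the parameter list of the PTX entry function.
    cudacall(fun, Tuple{CuPtr{Float32}, CuPtr{Float32}, Int32},
             results, data, Int32(nrows);
             threads = 256, blocks = cld(nrows, 256))
    return results
end
\end{verbatim}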
Memory access (global memory and register management, especially register management).