
\chapter{Implementation}
\label{cha:implementation}
Somewhere in this chapter, explain why one kernel is generated per expression rather than a single kernel for all expressions.
Go into detail on why this implementation is tuned towards performance and argue why it should be close to the optimum in that regard.
\section{Technologies}
Short section covering CUDA, PTX, Julia and CUDA.jl.
Probably reference the performance-evaluation papers for Julia and CUDA.jl.
\section{Expression Processing}
Explain why this step is needed and how it is done. The why: it simplifies the evaluation and transpilation process. The how: implemented in ExpressionProcessing.jl.
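The actual conversion lives in ExpressionProcessing.jl; as a minimal sketch of the idea, a Julia expression tree can be flattened into a postfix token list, which is the representation both the interpreter and the transpiler can consume linearly. All names here are illustrative, not the thesis implementation:

```julia
# Hypothetical sketch: convert a Julia expression tree (Expr) into a
# flat postfix token list. Operands are visited before their operator,
# so the result can be evaluated with a simple stack machine.
function to_postfix!(tokens::Vector{Any}, ex)
    if ex isa Expr && ex.head == :call
        for arg in ex.args[2:end]   # visit operands first
            to_postfix!(tokens, arg)
        end
        push!(tokens, ex.args[1])   # then emit the operator, e.g. :+
    else
        push!(tokens, ex)           # constant or variable symbol
    end
    return tokens
end

to_postfix(ex) = to_postfix!(Any[], ex)

# :(x + 2 * y) becomes Any[:x, 2, :y, :*, :+]
```

A flat, allocation-free representation like this is what makes GPU-side evaluation feasible in the first place, since the device code cannot walk a pointer-based tree efficiently.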
\section{Interpreter}
Describe how the interpreter has been developed.
UML activity diagram.
Main loop; the kernel is compiled by CUDA.jl into PTX and then executed.
Memory access (currently global memory only).
No dynamic memory allocation as on the CPU; the stack needs to have a fixed size.
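The fixed-size-stack constraint mentioned above can be illustrated with a CPU-side model of the interpreter's evaluation loop. On the device the same logic would run inside a CUDA.jl kernel; since dynamic allocation is unavailable there, the operand stack must have a capacity fixed at kernel compile time. The names below are hypothetical, not the thesis code:

```julia
# Illustrative CPU model of a GPU postfix interpreter loop.
# The operand stack has a fixed capacity, mirroring the restriction
# that device code cannot allocate memory dynamically.
const MAX_STACK = 8  # fixed stack size, known at kernel compile time

function interpret(tokens, vars::Dict{Symbol,Float64})
    stack = zeros(Float64, MAX_STACK)
    top = 0
    for tok in tokens
        if tok === :+ || tok === :- || tok === :* || tok === :/
            b = stack[top]; a = stack[top - 1]; top -= 1
            stack[top] = tok === :+ ? a + b :
                         tok === :- ? a - b :
                         tok === :* ? a * b : a / b
        elseif tok isa Symbol
            top += 1; stack[top] = vars[tok]     # variable lookup
        else
            top += 1; stack[top] = Float64(tok)  # numeric constant
        end
    end
    return stack[1]  # final result sits at the bottom of the stack
end

# interpret(Any[:x, 2, :y, :*, :+], Dict(:x => 1.0, :y => 3.0)) == 7.0
```

In the real kernel the variable lookup would read from global device memory instead of a `Dict`, which is why the memory-access discussion above matters for performance.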
\section{Transpiler}
Describe how the transpiler has been developed (probably the largest section, since it has more interesting parts).
UML activity diagram.
Front end and back end.
Caching of back-end results.
PTX code is generated, compiled using CUDA.jl (so essentially through the driver) and then executed.
Memory access (global memory and register management, especially register management).
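The register-management aspect can be sketched as follows: the back end walks the postfix tokens once and emits one PTX instruction per operator, tracking which register holds each intermediate result. This naive version allocates a fresh register for every value; register reuse in the real transpiler is more involved, and all names are illustrative:

```julia
# Hypothetical back-end sketch: emit PTX-like instructions from a
# postfix token list, with a stack of register names standing in for
# the interpreter's stack of values.
function transpile_ptx(tokens)
    ops = Dict(:+ => "add.f32", :- => "sub.f32",
               :* => "mul.f32", :/ => "div.rn.f32")
    stack = String[]   # registers holding intermediate results
    code  = String[]
    nregs = 0
    newreg() = (nregs += 1; "%f$(nregs)")
    for tok in tokens
        if haskey(ops, tok)
            b = pop!(stack); a = pop!(stack)
            r = newreg()
            push!(code, "$(ops[tok]) $r, $a, $b;")
            push!(stack, r)
        elseif tok isa Symbol
            r = newreg()
            push!(code, "ld.global.f32 $r, [$(tok)];")  # variable load
            push!(stack, r)
        else
            # encode the constant as a PTX hex float immediate
            hex = string(reinterpret(UInt32, Float32(tok)), base = 16, pad = 8)
            r = newreg()
            push!(code, "mov.f32 $r, 0f$(hex);")
            push!(stack, r)
        end
    end
    return join(code, "\n"), last(stack)  # code and result register
end
```

Because every operand stays in a register until consumed, the register pressure of an expression equals its maximum stack depth, which is one reason register management deserves its own discussion in this section.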