evaluation: updated notes for chapter

Daniel 2025-04-13 14:20:16 +02:00
parent a5c34a53b7
commit ef721b13e0
2 changed files with 2 additions and 0 deletions


@ -23,6 +23,7 @@ Initial: CPU-Side single-threaded; up to 1024 threads per block; bounds-checking
2.) Using @inbounds -> noticeable improvement in 2 out of 3
3.) Tuned block size with Nsight Compute -> slight improvement
4.) Used Int32 everywhere to reduce register usage -> significant performance drop (probably because of much longer waiting times, i.e. latency hiding basically not working, or because more type conversions happen on the GPU; look at the generated PTX code and use it as an argument to describe why it is slower)
5.) Reverted the previous change; used fastmath instead -> improvement (the large variable set is now faster than with the transpiler)
\subsection{Transpiler}
Results only for the Transpiler (also contains the final kernel configuration and probably a quick overview/recap of the implementation used and described in the Implementation section)
@ -35,6 +36,7 @@ Initial: CPU-Side single-threaded; up to 1024 threads per block; bounds-checking
2.) Using @inbounds -> small improvement only on CPU side code
3.) Tuned block size with Nsight Compute -> slight improvement
4.) Only changed things on interpreter side
5.) Only changed things on interpreter side
\subsection{Comparison}
Comparison of the Interpreter and the Transpiler with each other, as well as of both with the CPU interpreter

Binary file not shown.