evaluation: updated notes for chapter
Some checks failed
CI / Julia ${{ matrix.version }} - ${{ matrix.os }} - ${{ matrix.arch }} - ${{ github.event_name }} (x64, ubuntu-latest, 1.10) (push) Has been cancelled
CI / Julia ${{ matrix.version }} - ${{ matrix.os }} - ${{ matrix.arch }} - ${{ github.event_name }} (x64, ubuntu-latest, 1.6) (push) Has been cancelled
CI / Julia ${{ matrix.version }} - ${{ matrix.os }} - ${{ matrix.arch }} - ${{ github.event_name }} (x64, ubuntu-latest, pre) (push) Has been cancelled
Some checks failed
CI / Julia ${{ matrix.version }} - ${{ matrix.os }} - ${{ matrix.arch }} - ${{ github.event_name }} (x64, ubuntu-latest, 1.10) (push) Has been cancelled
CI / Julia ${{ matrix.version }} - ${{ matrix.os }} - ${{ matrix.arch }} - ${{ github.event_name }} (x64, ubuntu-latest, 1.6) (push) Has been cancelled
CI / Julia ${{ matrix.version }} - ${{ matrix.os }} - ${{ matrix.arch }} - ${{ github.event_name }} (x64, ubuntu-latest, pre) (push) Has been cancelled
This commit is contained in:
parent
a5c34a53b7
commit
ef721b13e0
|
@ -23,6 +23,7 @@ Initial: CPU-Side single-threaded; up to 1024 threads per block; bounds-checking
|
|||
2.) Using @inbounds -> noticeable improvement in 2 out of 3
|
||||
3.) Tuned blocksize with NSight compute -> slight improvement
|
||||
4.) used int32 everywhere to reduce register usage -> significant performance drop (probably because a lot more waiting time "latency hiding not working basically", or more type conversions happening on GPU? look at generated PTX code and use that as an argument to describe why it is slower)
|
||||
5.) reverted previous; used fastmath instead -> imporvement (large var set is now faster than on transpiler)
|
||||
|
||||
\subsection{Transpiler}
|
||||
Results only for Transpiler (also contains final kernel configuration and probably quick overview/recap of the implementation used and described in Implementation section
|
||||
|
@ -35,6 +36,7 @@ Initial: CPU-Side single-threaded; up to 1024 threads per block; bounds-checking
|
|||
2.) Using @inbounds -> small improvement only on CPU side code
|
||||
3.) Tuned blocksize with NSight compute -> slight improvement
|
||||
4.) Only changed things on interpreter side
|
||||
5.) Only changed things on interpreter side
|
||||
|
||||
\subsection{Comparison}
|
||||
Comparison of Interpreter and Transpiler as well as Comparing the two with CPU interpreter
|
||||
|
|
BIN
thesis/main.pdf
BIN
thesis/main.pdf
Binary file not shown.
Loading…
Reference in New Issue
Block a user