benchmarking: tuned interpreter blocksize

This commit is contained in:
2025-05-20 09:05:35 +02:00
parent a9ffd5da63
commit 250deb334c
5 changed files with 26 additions and 15 deletions

@@ -62,6 +62,7 @@ Document the process of performance tuning
Initial state: no cache; blocksize of 256; exprs pre-processed and sent to the GPU on every call; vars sent on every call; frontend + dispatch are multithreaded
1.) Done before the parameter optimisation loop: frontend processing, transmitting Exprs and Variables (improved runtime)
2.) Tuned the blocksize to waste as few threads as possible (new blocksize 121 -> 3 blocks -> 363 threads launched, while 362 threads are needed per expression); see the sketch below
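To make the thread-waste arithmetic behind this step concrete, here is a minimal sketch; the helper name and standalone form are illustrative only and not part of the repository, but the numbers match the ones in this commit:

```julia
# Illustrative helper (not part of the repository): for a given blocksize,
# how many blocks must be launched and how many threads sit idle?
function thread_waste(needed_threads::Int, blocksize::Int)
    blocks = cld(needed_threads, blocksize)   # blocks needed to cover all required threads
    launched = blocks * blocksize
    return (blocks = blocks, launched = launched, wasted = launched - needed_threads)
end

thread_waste(362, 256)  # old default: (blocks = 2, launched = 512, wasted = 150)
thread_waste(362, 121)  # tuned value: (blocks = 3, launched = 363, wasted = 1)
```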
\subsection{Transpiler}
@@ -75,6 +76,7 @@ Document the process of performance tuning
Initial state: no cache; blocksize of 256; exprs pre-processed and transpiled on every call; vars sent on every call; frontend + transpilation + dispatch are multithreaded
1.) Done before the parameter optimisation loop: frontend processing, transmitting Exprs and Variables (improved runtime)
2.) All expressions to be executed are now transpiled up front (previously they were transpiled for every execution, even in parameter optimisation scenarios). Compilation is still done on every call because too little RAM was available to cache the compiled kernels (compilation takes the most time, so this is only a minor boost); see the sketch below
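A minimal sketch of the dispatch structure this step implies; `transpile` and `compile_and_launch` are placeholder names assumed for illustration, not the project's actual API:

```julia
# Sketch with hypothetical placeholder functions (transpile, compile_and_launch):
# transpile every expression exactly once, outside the parameter-optimisation loop,
# but still compile per call because caching all compiled kernels exceeded the
# available RAM.
function dispatch_all(exprs, variables, parameter_sets)
    kernel_sources = [transpile(ex) for ex in exprs]     # transpiled once, up front
    for params in parameter_sets
        for src in kernel_sources
            compile_and_launch(src, variables, params)   # compilation repeated every call
        end
    end
end
```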
\subsection{Comparison}
Comparison of the Interpreter and the Transpiler, as well as comparing the two with the CPU interpreter