evaluation: found that benchmark 2 cannot be executed by any implementation due to RAM constraints
@@ -49,6 +49,7 @@ Since only the evaluators are benchmarked, the expressions to be evaluated must
The first benchmark involves a very large set of roughly $250\,000$ expressions. This means that all $250\,000$ expressions are evaluated in a single generation when using GP. In a typical generation, significantly fewer expressions would be evaluated. However, this benchmark is designed to show how the evaluators can handle large volumes of data.
TODO: Remove this benchmark, as it uses too much RAM
A second benchmark, with slight modifications to the first, is also conducted. Given that GPUs are very good at executing work in parallel, the number of variable sets is increased in this benchmark. The second benchmark therefore consists of the same $250\,000$ expressions, but the number of variable sets has been increased by a factor of 30 to a total of roughly $10\,000$. This benchmark aims to demonstrate that the GPU is best used with a larger number of variable sets. A higher number of variable sets is also more representative of the scenarios in which the evaluators will be employed.
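To make the RAM concern concrete, the following is a rough back-of-the-envelope sketch and not a measurement. It assumes that one result is stored per pair of expression and variable set, that each result is a 4-byte float, and that all other buffers can be ignored. None of these assumptions is taken from the implementations, and the names `BYTES_PER_RESULT` and `result_matrix_bytes` are purely illustrative.

```python
# Rough estimate of the result-matrix size per benchmark.
# ASSUMPTIONS (not from the thesis): one result per (expression, variable set)
# pair, stored as a 4-byte float; expressions, variables and temporaries ignored.
BYTES_PER_RESULT = 4  # assumed 32-bit float


def result_matrix_bytes(num_expressions: int, num_variable_sets: int) -> int:
    """Bytes needed to hold one result per (expression, variable set) pair."""
    return num_expressions * num_variable_sets * BYTES_PER_RESULT


base_variable_sets = 362
scaled_variable_sets = base_variable_sets * 30  # factor of 30 -> 10860

benchmark_1 = result_matrix_bytes(250_000, base_variable_sets)
benchmark_2 = result_matrix_bytes(250_000, scaled_variable_sets)

print(f"benchmark 1: {benchmark_1 / 1e9:.2f} GB")
print(f"benchmark 2: {benchmark_2 / 1e9:.2f} GB")
```

Under these assumptions the results alone grow by the same factor of 30 as the variable sets, so the actual footprint of each implementation would have to be measured before drawing conclusions.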
The third benchmark is conducted to demonstrate how the evaluators will perform in more realistic scenarios. For this benchmark, the number of expressions has been reduced to roughly $10\,000$, and the number of variable sets is again $362$. The purpose of this benchmark is to demonstrate how the evaluators are likely to perform in a typical scenario.
@@ -85,6 +86,7 @@ The first benchmark consisted of $250\,000$ expressions and $362$ variable sets
% talk about kernel configuration (along the lines of: results achieved with block size of X) etc. Also include that CPU and GPU utilisation was 100% the entire time. If this is too short, just add it to the above paragraph and make the 4 benchmark sections relatively short, as the most interesting information is in the performance tuning and comparison sections anyway
\subsubsection{Benchmark 2}
TODO: Remove this benchmark, none of the implementations had enough RAM available
\subsubsection{Benchmark 3}
std of 750.1 ms
@@ -97,6 +99,9 @@ std of 750.1 ms
\subsubsection{Benchmark 4}
block size 128: 84.84 blocks, fast (probably because fewer threads are wasted)
block size 192: 56.56 blocks, very slow
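The fractional block counts noted above can be reproduced with a short calculation. The sketch below assumes one GPU thread per variable set and $362 \cdot 30 = 10\,860$ variable sets; both are assumptions, chosen because they yield exactly the 84.84 and 56.56 blocks from the notes. The helper `launch_stats` is hypothetical and not part of any implementation.

```python
import math

# ASSUMPTION: one thread per variable set, with 362 * 30 = 10860 variable sets.
NUM_THREADS_NEEDED = 362 * 30


def launch_stats(block_size: int) -> tuple[float, int, int]:
    """Return (fractional blocks, blocks actually launched, idle threads)."""
    fractional_blocks = NUM_THREADS_NEEDED / block_size
    # A partially filled block still has to be launched as a whole block.
    blocks_launched = math.ceil(fractional_blocks)
    idle_threads = blocks_launched * block_size - NUM_THREADS_NEEDED
    return fractional_blocks, blocks_launched, idle_threads


for block_size in (128, 192):
    frac, blocks, idle = launch_stats(block_size)
    print(f"block size {block_size}: {frac:.2f} -> {blocks} blocks, {idle} idle threads")
```

With these numbers, a block size of 128 leaves fewer idle threads in the last block than a block size of 192. The difference is small relative to the total thread count, so it may not fully explain the slowdown; other factors such as occupancy or register pressure would need to be profiled.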
\subsubsection{Performance Tuning} % either subsubSection or change the title to "Performance Tuning Interpreter"
Document the process of performance tuning (mostly GPU, but also talk about CPU, especially the rearranging of data transfer and the decision not to use a cache)
@@ -116,11 +121,15 @@ Results only for Transpiler (also contains final kernel configuration and probab
\subsubsection{Benchmark 1}
\subsubsection{Benchmark 2}
TODO: Remove this benchmark
\subsubsection{Benchmark 3}
Kernels can now be compiled at the same time as they are generated (should drastically improve performance)
\subsubsection{Benchmark 4}
An even larger number of variable sets would be ideal. 10k is rather small, and the GPU barely has any work to do
\subsection{Performance Tuning}
Document the process of performance tuning
@@ -140,6 +149,7 @@ talk about that compute portion is just too little. Only more complex expression
\subsubsection{Benchmark 2}
TODO: Remove this benchmark
CPU did not finish due to RAM constraints
\subsubsection{Benchmark 3}
BIN
thesis/main.pdf
Binary file not shown.