A typical GP generation generates multiple expressions at once. If, for example, a population of $300$ expressions is generated per generation, each of these expressions must be evaluated against the data.
Each expression is part of a search space of all possible expressions consisting of the defined operators, variables and constants up to a defined maximum length. With the help of GP, this search space is explored; however, the generated expressions might not perfectly fit the data. To further refine the generated expressions, the concept of parameter optimisation can be used as described by \textcite{kommenda_local_2018}. Parameter optimisation is a kind of local search where parameters $p$ are introduced into the generated equations. In Equation \ref{eq:example} the parameter $p_1$ is modified over a number of iterations. This modification should assist in finding a local or even the global optimum by better fitting the expressions to the data. For example, $50$ local search steps can be used, meaning that each expression needs to be evaluated $50$ times with the same variables, but different parameters. As a result, one GP generation requires a total of $300 * 50 = 15\,000$ evaluations of the expressions. However, typically more than one GP generation is needed to find a good solution. While the exact number of generations is problem-specific, for this example a total of $100$ generations can be assumed. Each generation again generates $300$ expressions and needs to perform $50$ local search steps. This results in a total of $300 * 50 * 100 = 1\,500\,000$ evaluations which need to be performed during the entire runtime of the GP algorithm. These values have been taken from the GP algorithm for predicting discharge voltage curves of batteries as described by \textcite{kronberger_symbolic_2024}. Their GP algorithm converged after $54$ generations, resulting in $300 * 50 * 54 \approx 800\,000$ evaluations. This calculation omits the number of data points, which is the main contributor to the total runtime.
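The parameter-optimisation step can be illustrated with a minimal sketch. The expression \texttt{f}, the synthetic data, and the random hill-climbing search below are illustrative assumptions, not the method of \textcite{kommenda_local_2018}, whose approach relies on gradient-based optimisation; the sketch only shows why each of the $50$ local search steps costs a full re-evaluation of the expression.

```python
import random

# Hypothetical parametrised expression standing in for Equation (eq:example)
# with a single parameter p_1; the name f and its form are illustrative only.
def f(x, p):
    return p * x * x

# Synthetic data generated with the "true" parameter value 3.0.
xs = [0.5 * i for i in range(1, 21)]
ys = [3.0 * x * x for x in xs]

def sse(p):
    """Sum of squared errors of the expression against all data points."""
    return sum((f(x, p) - y) ** 2 for x, y in zip(xs, ys))

# 50 local search steps, mirroring the 50 evaluations per expression in the
# text: perturb p and keep the candidate only if the fit improves. Note that
# every step re-evaluates the expression over the whole data set.
random.seed(0)
p = 1.0  # initial guess
for _ in range(50):
    candidate = p + random.uniform(-0.5, 0.5)
    if sse(candidate) < sse(p):
        p = candidate

print(p)  # p has moved from the initial guess towards 3.0
```

Each iteration calls \texttt{sse}, which touches every data point once; this is the per-expression cost that the evaluation counts in the text multiply up.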
For each generated expression, every data point needs to be used to parametrise the variables, drastically increasing the number of evaluations. They used a total of $11\,000$ data points, resulting in a total of $800\,000 * 11\,000 = 8.8$ billion evaluations. Their results took over two days to compute on an eight-core desktop CPU. While they did not provide runtime information for all problems they tested, the voltage curve prediction was the slowest; the other problems ranged from a few seconds up to a day. Especially the problems that took several hours or even days to finish show that there is still room for performance improvements. While a better CPU with more cores could be used, it is interesting to determine whether using GPUs can yield noticeably better performance.
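To make the counting above concrete, the short sketch below reproduces the arithmetic with the numbers taken from the text. Note that $300 * 50 * 54$ is exactly $810\,000$; the text rounds this to $800\,000$ before multiplying by the number of data points, which yields the quoted $8.8$ billion.

```python
# Evaluation-count arithmetic from the text; all inputs are quoted there.
population = 300      # expressions per GP generation
local_steps = 50      # local search steps per expression
generations = 100     # assumed number of generations
data_points = 11_000  # data points in the voltage-curve problem

per_generation = population * local_steps
print(per_generation)                  # 15000 evaluations per generation
print(per_generation * generations)    # 1500000 for the assumed 100 generations

# The converged run of Kronberger et al. needed 54 generations.
converged = population * local_steps * 54
print(converged)                       # 810000 (rounded to 800000 in the text)
print(converged * data_points)         # 8910000000, i.e. roughly 8.8 billion
```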
In his master's thesis, \textcite{weinberger_vektoroperationen_2018} explored the possibility of utilising vector operations in the field of GP. He focused mainly on vectorising the evaluation on the CPU and on utilising the GPU to evaluate the expression trees generated by a GP algorithm. Using OpenCL and an AMD GPU, he achieved a speed-up of two with vectorisation on the CPU and a speed-up of 116 with the GPU. This shows that the GPU also has great potential in the more specific case of symbolic regression with the above-described parameter optimisation.
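The speed-ups reported above come from evaluating one expression over many data points at once. The toy interpreter below is a sketch with a made-up tuple encoding for tree nodes (not Weinberger's implementation); it shows the batched evaluation structure whose inner per-point loops are exactly what CPU vector instructions or GPU threads parallelise.

```python
# Minimal, hypothetical expression-tree interpreter evaluated over a whole
# batch of data points at once. Node encoding is illustrative:
#   ("var", i)    -> the i-th input variable of each data point
#   ("const", v)  -> the constant v
#   ("+", l, r) / ("*", l, r) -> binary operators
def evaluate(node, batch):
    """Evaluate an expression tree for every data point in `batch`."""
    kind = node[0]
    if kind == "var":
        _, i = node
        return [point[i] for point in batch]   # one value per data point
    if kind == "const":
        _, v = node
        return [v] * len(batch)
    _, left, right = node
    ls, rs = evaluate(left, batch), evaluate(right, batch)
    op = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}[kind]
    return [op(a, b) for a, b in zip(ls, rs)]  # elementwise, parallelisable

# x0 * x0 + 2 evaluated over three data points
tree = ("+", ("*", ("var", 0), ("var", 0)), ("const", 2.0))
print(evaluate(tree, [[1.0], [2.0], [3.0]]))   # [3.0, 6.0, 11.0]
```

Each list comprehension is an independent elementwise pass over the batch, which is why this evaluation scheme maps well onto SIMD lanes and GPU kernels.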
\section[GPGPU]{General Purpose Computation on Graphics Processing Units}
\label{sec:gpgpu}
Graphics processing units (GPUs) are commonly used to increase the performance of many different applications. Originally, they were designed to improve performance and visual quality in games. \textcite{dokken_gpu_2005} first described the usage of GPUs for general-purpose programming (GPGPU), showing how the graphics pipeline can be used for GPGPU programming. Because this approach requires the programmer to also understand graphics terminology, it was not an ideal solution. Therefore, Nvidia released CUDA\footnote{\url{https://developer.nvidia.com/cuda-toolkit}} in 2007 with the goal of allowing developers to program GPUs independently of the graphics pipeline and its terminology. A study of the programmability of GPUs with CUDA and the resulting performance was conducted by \textcite{huang_gpu_2008}. They found that GPGPU programming has potential, even for non-embarrassingly parallel problems.