thesis: added abstract and kurzfassung; re-read conclusion and evaluation to iron out mistakes etc.

2025-06-09 14:11:58 +02:00
parent b494803611
commit 3efd8a6c26
10 changed files with 255417 additions and 78 deletions


@@ -1,21 +1,21 @@
\chapter{Introduction}
\label{cha:Introduction}
-This chapter provides an entry point for this thesis. First the motivation of exploring this topic is presented. In addition, the research questions of this thesis are outlined. Lastly the methodology on how to answer these questions will be explained. This master thesis is associated with the FFG COMET project ProMetHeus (\#904919). The developed software is used and further developed for modelling in the ProMetHeus project.
+This chapter provides an entry point for this thesis. First, the motivation for exploring this topic is presented. In addition, the research questions of this thesis are outlined. Finally, the structure of this thesis is described, explaining how each part contributes to answering the research questions.
\section{Background and Motivation}
%
% Not totally happy with this yet
%
-Optimisation and acceleration of program code is a crucial part in many fields. For example video games need optimisation to lower the minimum hardware requirements which allows more people to run the game, increasing sales. Another example where optimisation is important are computer simulations. For those, optimisation is even more crucial, as this allows the scientists to run more detailed simulations or get the simulation results faster. Equation learning or symbolic regression is another field that can heavily benefit from optimisation. One part of equation learning, is to evaluate the expressions generated by a search algorithm which can make up a significant portion of the runtime. This thesis is concerned with optimising the evaluation part to increase the overall performance of equation learning algorithms.
+Optimisation and acceleration of program code are crucial in many fields. For example, video games need optimisation to lower the minimum hardware requirements, which allows more people to run the game, increasing sales. Computer simulations are another example where optimisation is important; there, optimisation is even more crucial, as it allows scientists to run more detailed simulations or to obtain the simulation results faster. Equation learning, or symbolic regression, is another field that can benefit heavily from optimisation. One part of equation learning is to evaluate the expressions generated by a search algorithm, which can make up a significant portion of the runtime. This thesis is concerned with optimising this evaluation step to increase the overall performance of equation learning algorithms.
-The following expression $5 - \text{abs}(x_1) \, \sqrt{p_1} / 10 + 2^{x_2}$ which contains simple mathematical operations as well as variables $x_n$ and parameters $p_n$ is one example that can be generated by the equation learning algorithm, Usually an equation learning algorithm generates multiple of such expressions per iteration. Out of these expressions all possibly relevant ones have to be evaluated. Additionally, multiple different values need to be inserted for all variables and parameters, drastically increasing the amount of evaluations that need to be performed.
+The following expression $5 - \text{abs}(x_1) \, \sqrt{p_1} / 10 + 2^{x_2}$, which contains simple mathematical operations as well as variables $x_n$ and parameters $p_n$, is one example of what an equation learning algorithm can generate. Usually such an algorithm generates hundreds or even thousands of expressions per iteration, all of which have to be evaluated. Additionally, multiple different values must be inserted for the variables and parameters, drastically increasing the number of evaluations that need to be performed.
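+For illustration, inserting one arbitrary set of values, for instance $x_1 = -2$, $x_2 = 3$ and $p_1 = 4$, into the expression above yields $5 - \text{abs}(-2) \, \sqrt{4} / 10 + 2^{3} = 5 - 0.4 + 8 = 12.6$. Every such substitution constitutes one evaluation, and it has to be repeated for every expression and for every set of variable and parameter values.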
-In his blog, \textcite{sutter_free_2004} described how the free lunch is over in terms of the ever-increasing performance of hardware like the CPU. He states that to gain additional performance, developers need to start developing software for multiple cores and not just hope that on the next generation of CPUs the program magically runs faster. While this approach means more development overhead, a much greater speed-up can be achieved. However, in some cases the speed-up achieved by this is still not large enough and another approach is needed. One of these approaches is the utilisation of Graphics Processing Units (GPUs) as an easy and affordable option as compared to compute clusters. Especially when talking about performance per dollar, GPUs are very inexpensive as found by \textcite{brodtkorb_graphics_2013}. \textcite{michalakes_gpu_2008} have shown a noticeable speed-up when using GPUs for weather simulation. In addition to computer simulations, GPU acceleration also can be found in other places such as networking \parencite{han_packetshader_2010} or structural analysis of buildings \parencite{georgescu_gpu_2013}. These solutions were all developed using CUDA\footnote{\url{https://developer.nvidia.com/cuda-toolkit}}. However, it is also possible to develop assembly like code for GPUs using Parallel Thread Execution (PTX)\footnote{\url{https://docs.nvidia.com/cuda/parallel-thread-execution/}} to gain more control.
+In his blog, \textcite{sutter_free_2004} described how the free lunch is over in terms of the ever-increasing performance of hardware like the CPU. He states that to gain additional performance, developers need to start developing software for multiple cores and not just hope that on the next generation of CPUs the program magically runs faster. While this approach means more development overhead, a much greater speed-up can be achieved. However, in some cases the speed-up achieved by this is still not large enough, and another approach is needed. One of these approaches is the utilisation of Graphics Processing Units (GPUs) as an easy and affordable option compared to compute clusters. Especially when talking about performance per dollar, GPUs are very inexpensive, as found by \textcite{brodtkorb_graphics_2013}. \textcite{michalakes_gpu_2008} have shown a noticeable speed-up when using GPUs for weather simulation. In addition to computer simulations, GPU acceleration can also be found in other places such as networking \parencite{han_packetshader_2010} or structural analysis of buildings \parencite{georgescu_gpu_2013}. These solutions were all developed using CUDA\footnote{\url{https://developer.nvidia.com/cuda-toolkit}}. However, it is also possible to develop assembly-like code for GPUs using Parallel Thread Execution (PTX)\footnote{\url{https://docs.nvidia.com/cuda/parallel-thread-execution/}} to gain more control.
\section{Research Question}
-With these successful implementations of GPU acceleration, this thesis also attempts to improve the performance of evaluating mathematical equations, generated at runtime for symbolic regression using GPUs. Therefore, the following research questions are formulated:
+Given these successful applications of GPU acceleration, the aim of this thesis is to use GPUs to improve the performance of evaluating mathematical expressions that are generated at runtime for symbolic regression. Therefore, the following research questions are formulated:
\begin{itemize}
\item How can simple arithmetic expressions that are generated at runtime be efficiently evaluated on GPUs?
@@ -23,7 +23,7 @@ With these successful implementations of GPU acceleration, this thesis also atte
\item Under which circumstances is the interpretation of the expressions on the GPU or the translation to the intermediate language Parallel Thread Execution (PTX) more efficient?
\end{itemize}
-Answering the first question is necessary to ensure the approach of this thesis is actually feasible. If it is feasible, it is important to evaluate if evaluating the expressions on the GPU actually improves the performance over a parallelised CPU evaluator. To answer if the GPU evaluator is faster than the CPU evaluator, the last research question is important. As there are two major ways of implementing an evaluator on the GPU, they need to be implemented and evaluated to finally state if evaluating expressions on the GPU is faster and if so, which type of implementation results in the best performance.
+Answering the first question is necessary to ensure the approach of this thesis is feasible. If it is, it is important to determine whether evaluating the expressions on the GPU actually improves performance over a parallelised CPU evaluator. The last research question is important in this regard: as there are two major ways of implementing an evaluator on the GPU, both need to be implemented and evaluated in order to state whether evaluating expressions on the GPU is faster and, if so, which type of implementation performs best under which circumstances.
\section{Thesis Structure}