started implementing feedback
@@ -7,26 +7,30 @@ This chapter provides an entry point for this thesis. First the motivation of ex
%
% Not totally happy with this yet
%

Optimisation and acceleration of program code are crucial in many fields. For example, video games need optimisation to lower their minimum hardware requirements, which allows more people to run the game and thereby increases sales. Computer simulations are another example where optimisation is important; there it is even more crucial, as it allows scientists to run more detailed simulations or to obtain results faster. Equation learning, also known as symbolic regression, is a further field that can benefit heavily from optimisation. One part of equation learning is evaluating the expressions generated by a search algorithm, which can make up a significant portion of the total runtime. This thesis is concerned with optimising this evaluation step to increase the overall performance of equation learning algorithms.

The expression $x_1 + 5 - \text{abs}(p_1) * \text{sqrt}(x_2) / 10 + 2 \char`^ x_3$, which contains simple mathematical operations as well as variables $x_n$ and parameters $p_n$, is one example of what an equation learning algorithm can generate. Usually such an algorithm generates multiple expressions per iteration, and all potentially relevant ones have to be evaluated. Additionally, many different values need to be inserted for all variables and parameters, drastically increasing the number of evaluations that need to be performed.
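
To make the scale of this workload concrete, the following minimal Julia sketch evaluates the example expression for many rows of variable values and several candidate parameter sets on the CPU. It is purely illustrative: the names \texttt{evaluate}, \texttt{X} and \texttt{P} are placeholders and are not part of the implementation presented later in this thesis.
\begin{verbatim}
# Illustrative sketch only: evaluating the example expression on the
# CPU for many variable rows and several candidate parameter sets.
evaluate(x, p) = x[1] + 5 - abs(p[1]) * sqrt(x[2]) / 10 + 2 ^ x[3]

X = rand(1000, 3)            # 1000 rows of values for x_1, x_2, x_3
P = [rand(1) for _ in 1:10]  # 10 candidate values for the parameter p_1

# One result per (parameter set, data row) pair: 10 x 1000 evaluations.
results = [evaluate(X[i, :], p) for p in P, i in 1:size(X, 1)]
\end{verbatim}
Even this small sketch already performs $10 \times 1000$ evaluations for a single expression, which illustrates why the evaluation step can make up such a significant portion of the runtime of an equation learning algorithm.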

In his blog, \textcite{sutter_free_2004} described how the free lunch is over in terms of the ever-increasing performance of hardware like the CPU. He states that to gain additional performance, developers need to start developing software for multiple cores and not just hope that the program magically runs faster on the next generation of CPUs. While this approach means more development overhead, a much greater speed-up can be achieved. However, in some cases the speed-up achieved this way is still not large enough and another approach is needed. One of these approaches is the utilisation of consumer Graphics Processing Units (GPUs) as an easy and affordable option compared to compute clusters. Enterprise GPUs like the B200\footnote{\url{https://www.nvidia.com/de-de/data-center/dgx-b200/}} cost at least \$30000 and are only available as a full system with 8 GPUs, power delivery etc.\ \parencite{bajwa_microsoft_2024}. Data centres specialised for artificial intelligence workloads often cost billions of dollars \parencite{bajwa_microsoft_2024}. Despite these costs for enterprise GPUs, cheaper consumer GPUs can also deliver a great performance uplift. \textcite{michalakes_gpu_2008} have shown a noticeable speed-up when using GPUs for weather simulation. In addition to computer simulations, GPU acceleration can also be found in other places such as networking \parencite{han_packetshader_2010} or the structural analysis of buildings \parencite{georgescu_gpu_2013}.

%The free lunch theorem as described by \textcite{adam_no_2019} states that to gain additional performance, a developer cannot just hope for future hardware to be faster, especially on a single core.
\section{Research Questions}
Motivated by these successful applications of GPU acceleration, this thesis attempts to improve the performance of evaluating mathematical equations on GPUs as well. Therefore, the following research questions are formulated:
\begin{itemize}
\item How can simple arithmetic expressions that are generated at runtime be efficiently evaluated on GPUs?
\item Under what circumstances is the evaluation of simple arithmetic expressions faster on a GPU than on a CPU?
\item Under which circumstances is interpreting the expressions on the GPU the more efficient approach, and under which is translating them to the intermediate language Parallel Thread Execution (PTX)?
\end{itemize}

Answering the first question is necessary to ensure that the approach of this thesis is actually feasible. If it is, the second question becomes important: does evaluating the expressions on the GPU actually improve performance over a parallelised CPU evaluator? The last research question addresses the fact that there are two major ways of implementing an evaluator on the GPU. Both need to be implemented and evaluated in order to state whether evaluating expressions on the GPU is faster and, if so, which type of implementation results in the best performance.
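
To make the distinction behind the last research question more tangible, the following Julia sketch contrasts the two approaches on a purely conceptual, CPU-only level. The functions \texttt{interpret} and \texttt{transpile} and the postfix token format are illustrative assumptions, not the GPU implementations described later: an interpreter inspects a representation of the expression on every evaluation, whereas a transpiler emits source code (in this thesis, PTX) once, which is then compiled and reused for every evaluation.
\begin{verbatim}
# Illustrative only: the two evaluation approaches, sketched on the CPU.
# Expressions are assumed to be given in postfix form, e.g. x1 x2 +
# is written as [:x1, :x2, :+].

# Approach 1: interpretation -- walk the tokens at every evaluation.
function interpret(postfix, x)
    stack = Float64[]
    for token in postfix
        if token == :+
            b = pop!(stack); a = pop!(stack); push!(stack, a + b)
        elseif token == :*
            b = pop!(stack); a = pop!(stack); push!(stack, a * b)
        elseif token isa Symbol   # variable reference such as :x1
            push!(stack, x[parse(Int, String(token)[2:end])])
        else                      # numeric constant
            push!(stack, Float64(token))
        end
    end
    return only(stack)
end

# Approach 2: transpilation -- emit source code once, compile it, and
# reuse the compiled function for every evaluation.
function transpile(postfix)
    stack = String[]
    for token in postfix
        if token == :+ || token == :*
            b = pop!(stack); a = pop!(stack)
            push!(stack, "($a $token $b)")
        elseif token isa Symbol
            push!(stack, "x[" * String(token)[2:end] * "]")
        else
            push!(stack, string(token))
        end
    end
    return "x -> " * only(stack)   # e.g. "x -> (x[1] + x[2])"
end
\end{verbatim}
The string returned by \texttt{transpile} can be parsed and compiled once (for example with \texttt{Meta.parse} and \texttt{eval}) and then applied to many data rows, while \texttt{interpret} has to inspect every token again for every single evaluation. This trade-off between a one-time compilation cost and a per-evaluation overhead is, at a high level, what the last research question investigates on the GPU.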
\section{Methodology}
\section{Thesis Structure}
In order to answer the research questions, this thesis is divided into the following chapters:
\begin{description}
@@ -37,7 +41,7 @@ In order to answer the research questions, this thesis is divided into the follo
\item[Chapter 4: Implementation] \mbox{} \\
This chapter explains the implementation of the GPU interpreter and transpiler. The details of the implementation and the technologies used are covered, such as the interpretation process and the transpilation of the expressions into Parallel Thread Execution (PTX) code.
\item[Chapter 5: Evaluation] \mbox{} \\
The software and hardware requirements as well as the evaluation environment are introduced in this chapter. All three evaluators are compared against each other, and the form of the expressions used for the comparisons is outlined. Finally, the results of the comparison of the GPU and CPU evaluators are presented to show which of them yields the best performance.
\item[Chapter 6: Conclusion] \mbox{} \\
In the final chapter, the entire work is summarised. A brief overview of the implementation as well as the evaluation results is provided. Additionally, an outlook on possible future research is given.
\end{description}