started implementing feedback

This commit is contained in:
Daniel 2025-03-13 13:18:50 +01:00
parent fddfa23b4f
commit ed9d8766be
6 changed files with 100 additions and 12 deletions


@ -1 +1,2 @@
{"rule":"OXFORD_SPELLING_Z_NOT_S","sentence":"^\\QOptimisation of software\\E$"}
{"rule":"TO_TOO","sentence":"^\\QThey introduced the division operator, which led to much better results.\\E$"}


@ -7,26 +7,30 @@ This chapter provides an entry point for this thesis. First the motivation of ex
%
% Not totally happy with this yet
%
Optimisation and acceleration of program code is a crucial part in many fields. For example video games need optimisation to lower the minimum hardware requirements which allows more people to run the game, increasing sales. Another example where optimisation is important are computer simulations. For those, optimisation is even more crucial, as this allows the scientists to run more detailed simulations or get the simulation results faster. Equation learning is another field that can heavily benefit from optimisation. One part of equation learning, is to evaluate the expressions generated by the algorithm which can make up a significant portion of the runtime of the algorithm. This thesis is concerned with optimising the evaluation part to increase the overall performance of the equation learning algorithm.
Optimisation and acceleration of program code are crucial in many fields. For example, video games need optimisation to lower their minimum hardware requirements, which allows more people to run the game, increasing sales. Another example where optimisation is important is computer simulations. For those, optimisation is even more crucial, as it allows scientists to run more detailed simulations or to get the simulation results faster. Equation learning, or symbolic regression, is another field that can benefit heavily from optimisation. One part of equation learning is evaluating the expressions generated by a search algorithm, which can make up a significant portion of the runtime. This thesis is concerned with optimising the evaluation part to increase the overall performance of equation learning algorithms.
Considering the following expression $x_1 + 5 - \text{abs}(p_1) * \text{sqrt}(x_2) / 10 + 2 \char`^ 3$ which contains simple mathematical operations as well as variables $x_n$ and parameters $p_n$. This expression is one example that can be generated by the equation learning algorithm and needs to be evaluated for the next iteration. Usually multiple expressions are generated per iteration, which also need to be evaluated. Additionally, multiple different values need to be inserted for all variables and parameters, drastically increasing the amount of evaluations that need to be performed.
The expression $x_1 + 5 - \text{abs}(p_1) * \text{sqrt}(x_2) / 10 + 2 \char`^ x_3$, which contains simple mathematical operations as well as variables $x_n$ and parameters $p_n$, is one example of what an equation learning algorithm can generate. Usually, an equation learning algorithm generates many such expressions per iteration, out of which all potentially relevant ones have to be evaluated. Additionally, multiple different values need to be inserted for all variables and parameters, drastically increasing the number of evaluations that need to be performed.
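To give a concrete, made-up example of a single such evaluation, inserting the values $x_1 = 1$, $x_2 = 4$, $x_3 = 2$ and $p_1 = -2.5$ into the expression above yields
\[
1 + 5 - \text{abs}(-2.5) * \text{sqrt}(4) / 10 + 2 \char`^ 2 = 1 + 5 - 0.5 + 4 = 9.5 .
\]
Every additional set of values and every additional generated expression adds another evaluation of this kind, which is what makes the evaluation step costly.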
In his blog post, \textcite{sutter_free_2004} described how the free lunch is over in terms of the ever-increasing performance of hardware like the CPU. He states that to gain additional performance, developers need to start developing software for multiple cores and not just hope that the next generation of CPUs will magically run the program faster. While this approach means more development overhead, a much greater speed-up can be achieved. However, in some cases the speed-up achieved this way is still not large enough, and another approach is needed. One of these approaches is the utilisation of consumer Graphics Processing Units (GPUs) as an easy and affordable option compared to compute clusters. Enterprise GPUs like the B200\footnote{\url{https://www.nvidia.com/de-de/data-center/dgx-b200/}} cost at least \$30,000 and are only available as a full system with 8 GPUs, power delivery, etc.\ \parencite{bajwa_microsoft_2024}. Data centres specialised for artificial intelligence workloads often cost billions of dollars \parencite{bajwa_microsoft_2024}. However, despite these costs for enterprise GPUs, cheaper consumer GPUs can also deliver a great performance uplift. \textcite{michalakes_gpu_2008} have shown a noticeable speed-up when using GPUs for weather simulation. In addition to computer simulations, GPU acceleration can also be found in other areas such as networking \parencite{han_packetshader_2010} or structural analysis of buildings \parencite{georgescu_gpu_2013}.
%The free lunch theorem as described by \textcite{adam_no_2019} states that to gain additional performance, a developer cannot just hope for future hardware to be faster, especially on a single core.
The free lunch theorem as described by \textcite{adam_no_2019} states that to gain additional performance, a developer cannot just hope for future hardware to be faster, especially on a single core. Therefore, algorithms need to utilise the other cores on a processor to further acceleration. While this approach means more development overhead, a much greater speed-up can be achieved. However, in some cases the speed-up achieved by this is still not large enough and another approach is needed. One of these approaches is the utilisation of a Graphics Processing Unit (GPU) as an easy and affordable option as compared to compute clusters. \textcite{michalakes_gpu_2008} have shown a noticeable speed-up when using the GPU for weather simulation. In addition to computer simulations GPU acceleration also can be found in other places like networking \parencite{han_packetshader_2010} or structural analysis of buildings \parencite{georgescu_gpu_2013}.
\section{Research Question}
Given these successful applications of GPU acceleration, this thesis also attempts to improve the performance of evaluating mathematical equations using GPUs. Therefore, the following research questions are formulated:
\begin{itemize}
\item How can simple arithmetic expressions that are generated at runtime be efficiently evaluated on graphics cards?
\item Under what circumstances is the evaluation of simple arithmetic expressions faster on a graphics card than on a CPU?
\item How can simple arithmetic expressions that are generated at runtime be efficiently evaluated on GPUs?
\item Under what circumstances is the evaluation of simple arithmetic expressions faster on a GPU than on a CPU?
\item Under which circumstances is interpreting the expressions on the GPU, or translating them to the intermediate language Parallel Thread Execution (PTX), the more efficient approach?
\end{itemize}
Answering the first question is necessary to ensure that the approach of this thesis is feasible at all. If it is, it is important to evaluate whether evaluating the expressions on the GPU actually improves performance over a parallelised CPU evaluator, which the second question addresses. The last research question is relevant because there are two major ways of implementing an evaluator on the GPU; both need to be implemented and evaluated in order to state whether evaluating expressions on the GPU is faster and, if so, which type of implementation results in the best performance.
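The conceptual difference between the two approaches can be sketched on the CPU with a few lines of Julia. This is a purely illustrative, hypothetical sketch and not the GPU evaluator developed in this thesis; the helper \texttt{evaltree}, the example expression and the values are made up for this illustration.
\begin{verbatim}
# Hypothetical CPU-side sketch contrasting the two approaches on a tiny expression.

# Interpretation: the expression tree is walked for every single evaluation.
evaltree(node::Number, vars) = node
evaltree(node::Symbol, vars) = vars[node]
function evaltree(node::Expr, vars)
    args = [evaltree(a, vars) for a in node.args[2:end]]
    return getfield(Base, node.args[1])(args...)   # e.g. :+ resolves to Base.:+
end

expr = :(x1 + 5 * sqrt(x2))
vars = Dict(:x1 => 2.0, :x2 => 9.0)
evaltree(expr, vars)                       # walks the tree on every call

# Translation: the expression is turned into executable code once,
# analogous to transpiling it to PTX, and is then called many times.
compiled = eval(:( (x1, x2) -> $expr ))
Base.invokelatest(compiled, 2.0, 9.0)      # no tree traversal at call time
\end{verbatim}
The interpreting variant pays the cost of walking the expression tree on every evaluation, while the translated variant pays a one-time code-generation cost and is cheaper per call; which trade-off wins on the GPU is what the third research question investigates.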
\section{Methodology}
\section{Thesis Structure}
In order to answer the research questions, this thesis is divided into the following chapters:
\begin{description}
@ -37,7 +41,7 @@ In order to answer the research questions, this thesis is divided into the follo
\item[Chapter 4: Implementation] \mbox{} \\
This chapter explains the implementation of the GPU interpreter and transpiler. The details of the implementation, together with the technologies used, are covered, such as the interpretation process and the transpilation of the expressions into Parallel Thread Execution (PTX) code.
\item[Chapter 5: Evaluation] \mbox{} \\
The software and hardware requirements and the evaluation environment are introduced in this chapter. Furthermore, the results of the comparison of the GPU and CPU evaluators are presented to show which of these yields the best performance.
The software and hardware requirements and the evaluation environment are introduced in this chapter. All three evaluators will be compared against each other, and the form of the expressions used for the comparisons is outlined. Finally, the results of the comparison of the GPU and CPU evaluators are presented to show which of these yields the best performance.
\item[Chapter 6: Conclusion] \mbox{} \\
In the final chapter, the entire work is summarised. A brief overview of the implementation as well as of the evaluation results is provided. Additionally, an outlook on possible future research is given.
\end{description}


@ -1,14 +1,15 @@
\chapter{Fundamentals and Related Work}
\label{cha:relwork}
The goal of this chapter is to provide an overview of equation learning to establish common knowledge of the topic and problem this thesis is trying to solve. First the field of equation learning is explored which helps to contextualise the topic of this thesis. The main part of this chapter is split into two sub-parts. The first part is exploring research that has been done in the field of general purpose computations on the GPU (GPGPU) as well as the fundamentals of it. Focus lies on exploring how graphics processing units (GPUs) are used to achieve substantial speed-ups and when and where they can be effectively employed. The second part describes the basics of how interpreters and compilers are built and how they can be adapted to the workflow of programming GPUs. When discussing GPU programming concepts, the terminology used is that of Nvidia and may differ from that used for AMD GPUs.
The goal of this chapter is to provide an overview of equation learning, or symbolic regression, to establish common knowledge of the topic and the problem this thesis is trying to solve. First, the field of equation learning is explored, which helps to contextualise the topic of this thesis. The main part of this chapter is split into two sub-parts. The first part explores research that has been done in the field of general purpose computations on the GPU (GPGPU) as well as its fundamentals. The focus lies on how graphics processing units (GPUs) are used to achieve substantial speed-ups, and when and where they can be employed effectively. The second part describes the basics of how interpreters and compilers are built and how they can be adapted to the workflow of programming GPUs. When discussing GPU programming concepts, the terminology used is that of Nvidia and may differ from that used for AMD GPUs.
\section{Equation learning}
% Section describing what equation learning is and why it is relevant for the thesis
% !!!!!!!!!!!!! TODO: More sources. I know I have read this but I guess I forgot to mention the source. TODO: Search for source
Equation learning is a field of research that aims at understanding and discovering equations from a set of data from various fields like mathematics and physics. Data is usually much more abundant, while models are often elusive. Because of this, generating equations with a computer can more easily lead to discovering equations that describe the observed data. \textcite{brunton_discovering_2016} describe an algorithm that leverages equation learning to discover equations for physical systems. A more literal interpretation of equation learning is demonstrated by \textcite{pfahler_semantic_2020}. They use machine learning to learn the form of equations. Their aim was to simplify the discovery of relevant publications by the equations they use and not by technical terms, as these may differ between fields of research. However, this kind of equation learning is not relevant for this thesis.
Symbolic regression is a subset of equation learning, that specialises more towards discovering mathematical equations. A lot of research is done in this field. \textcite{keijzer_scaled_2004} and \textcite{korns_accuracy_2011} presented ways of improving the quality of symbolic regression algorithms, making symbolic regression more feasible for problem-solving. Additionally, \textcite{jin_bayesian_2020} proposed an alternative to genetic programming (GP) for the use in symbolic regression. Their approach increased the quality of the results noticeably compared to GP alternatives. The first two approaches are more concerned with the quality of the output, while the third is also concerned with interpretability and reducing memory consumption. \textcite{bartlett_exhaustive_2024} also describe an approach to generate simpler and higher quality equations while being faster than GP algorithms. Heuristics like GP or neural networks as used by \textcite{werner_informed_2021} in their equation learner can help with finding good solutions faster, accelerating scientific progress. As seen by these publications, increasing the quality of generated equations but also increasing the speed of finding these equations is a central part in symbolic regression and equation learning in general. This means research in improving the computational performance of these algorithms is desired.
Symbolic regression is a subset of equation learning that specialises in discovering mathematical equations. A lot of research has been done in this field. Using genetic programming (GP) for different problems, including symbolic regression, was first described by \textcite{koza_genetic_1994}. He described how finding a computer program that solves a problem for a given input and output can be done by traversing the search space of all possible solutions. This fits the goal of symbolic regression very well, where a mathematical expression needs to be found that describes a problem with specific inputs and outputs. In 2010, \textcite{koza_human-competitive_2010} provided an overview of results that were generated with the help of GP and were competitive with human solutions. \textcite{keijzer_scaled_2004} and \textcite{korns_accuracy_2011} presented ways of improving the quality of symbolic regression algorithms, making symbolic regression more feasible for problem-solving. \textcite{bartlett_exhaustive_2024} describe an exhaustive approach for symbolic regression which can find the true optimum for perfectly optimised parameters while retaining simple and interpretable results. Alternatives to GP for symbolic regression also exist, one of which was proposed by \textcite{jin_bayesian_2020}. Their approach increased the quality of the results noticeably compared to GP alternatives. Another alternative to heuristics like GP is the use of neural networks. One such alternative was introduced by \textcite{martius_extrapolation_2016}, where a neural network was used for their equation learner, with mixed results. Later, an extension was provided by \textcite{sahoo_learning_2018}. They introduced the division operator, which led to much better results. Further improvements have been described by \textcite{werner_informed_2021} with their informed equation learner. By incorporating domain expert knowledge, they could limit the search space and find better solutions for particular domains. As these publications show, increasing the quality of the generated equations as well as the speed of finding them is a central part of symbolic regression and equation learning in general. This means that research into improving the computational performance of these algorithms is desired.
The expressions generated by an equation learning algorithm can look like this $x_1 + 5 - \text{abs}(p_1) * \text{sqrt}(x_2) / 10 + 2 \char`^ 3$. They consist of several unary and binary operators but also of constants, variables and parameters and expressions mostly differ in length and the kind of terms in the expressions. Per iteration many of these expressions are generated and in addition, matrices of values for the variables and parameters are also created. One row of the variable matrix corresponds to one instantiation of all expressions and this matrix contains multiple rows. This leads to a drastic increase of instantiated expressions that need to be evaluated. Parameters are a bit simpler, as they can be treated as constants for one iteration but can have a different value on another iteration. This means that parameters do not increase the number of expressions that need to be evaluated. However, the increase in evaluations introduced by the variables is still drastic and therefore increases the algorithm runtime significantly.
%% !!!!!!!!! TODO: Continue here with implementing kronberger feedback
The expressions generated by an equation learning algorithm can look like this: $x_1 + 5 - \text{abs}(p_1) * \text{sqrt}(x_2) / 10 + 2 \char`^ x_3$. They consist of several unary and binary operators, but also of constants, variables and parameters; expressions mostly differ in length and in the kinds of terms they contain. Per iteration, many of these expressions are generated and, in addition, matrices of values for the variables and parameters are created. One row of the variable matrix corresponds to one instantiation of all expressions, and this matrix contains multiple rows. This leads to a drastic increase in the number of instantiated expressions that need to be evaluated. Parameters are a bit simpler, as they can be treated as constants for one iteration but can have a different value in another iteration. This means that parameters do not increase the number of expressions that need to be evaluated. However, the increase in evaluations introduced by the variables is still drastic and therefore increases the algorithm runtime significantly.
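As a rough, hypothetical illustration of this workload (the expressions, sizes and values below are made up and this is not the evaluator developed in this thesis), a small set of expressions can be evaluated in Julia for every row of a variable matrix while the parameters stay fixed within the iteration:
\begin{verbatim}
# Hypothetical sketch of the evaluation workload; expressions, sizes and
# values are made up for this illustration.
exprs = [(x, p) -> x[1] + 5 - abs(p[1]) * sqrt(x[2]) / 10 + 2 ^ x[3],
         (x, p) -> p[1] * x[1] + x[2]]   # second, made-up expression

X = rand(1000, 3)   # 1000 variable instantiations, one per row (x1, x2, x3)
p = [0.7]           # parameters: treated as constants within one iteration

# every expression is evaluated for every row:
# length(exprs) * size(X, 1) evaluations per iteration
results = [f(X[row, :], p) for f in exprs, row in 1:size(X, 1)]
\end{verbatim}
With many expressions and thousands of rows per iteration, the number of evaluations grows quickly, which is why this step makes up a significant portion of the algorithm runtime.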
\section[GPGPU]{General Purpose Computation on Graphics Processing Units}

Binary file not shown.


@ -32,7 +32,7 @@
%%%-----------------------------------------------------------------------------
\title{Interpreter and Transpiler for simple expressions on Nvidia GPUs using Julia}
\author{Daniel Wiplinger}
\author{Daniel Roth}
\programname{Software Engineering}
%\programtype{Fachhochschul-Bachelorstudiengang} % select/edit


@ -628,3 +628,85 @@ Publisher: Multidisciplinary Digital Publishing Institute},
date = {2025-03},
file = {HIP programming model — HIP 6.3.42134 Documentation:C\:\\Users\\danwi\\Zotero\\storage\\6KRNU6PG\\programming_model.html:text/html},
}
@online{sutter_free_2004,
title = {The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software},
url = {http://www.gotw.ca/publications/concurrency-ddj.htm},
author = {Sutter, Herb},
urldate = {2025-03-13},
date = {2004-12},
file = {The Free Lunch Is Over\: A Fundamental Turn Toward Concurrency in Software:C\:\\Users\\danwi\\Zotero\\storage\\UU2CZWUR\\concurrency-ddj.html:text/html},
}
@article{bajwa_microsoft_2024,
title = {Microsoft, {OpenAI} plan \$100 billion data-center project, media report says},
url = {https://www.reuters.com/technology/microsoft-openai-planning-100-billion-data-center-project-information-reports-2024-03-29/},
abstract = {Microsoft and {OpenAI} are working on plans for a data center project that could cost as much as \$100 billion and include an artificial intelligence supercomputer called "Stargate" set to launch in 2028, The Information reported on Friday.},
journaltitle = {Reuters},
author = {Bajwa, Arsheeya},
urldate = {2025-03-13},
date = {2024-03-29},
langid = {english},
file = {Snapshot:C\:\\Users\\danwi\\Zotero\\storage\\G7PGNJJJ\\microsoft-openai-planning-100-billion-data-center-project-information-reports-2024-03-29.html:text/html},
}
@article{koza_genetic_1994,
title = {Genetic programming as a means for programming computers by natural selection},
volume = {4},
rights = {http://www.springer.com/tdm},
issn = {0960-3174, 1573-1375},
url = {http://link.springer.com/10.1007/BF00175355},
doi = {10.1007/BF00175355},
number = {2},
journaltitle = {Statistics and Computing},
shortjournal = {Stat Comput},
author = {Koza, John R.},
urldate = {2025-03-13},
date = {1994-06},
langid = {english},
}
@article{koza_human-competitive_2010,
title = {Human-competitive results produced by genetic programming},
volume = {11},
issn = {1389-2576, 1573-7632},
url = {http://link.springer.com/10.1007/s10710-010-9112-3},
doi = {10.1007/s10710-010-9112-3},
pages = {251--284},
number = {3},
journaltitle = {Genetic Programming and Evolvable Machines},
shortjournal = {Genet Program Evolvable Mach},
author = {Koza, John R.},
urldate = {2025-03-13},
date = {2010-09},
langid = {english},
file = {Full Text:C\:\\Users\\danwi\\Zotero\\storage\\Y32QERP5\\Koza - 2010 - Human-competitive results produced by genetic programming.pdf:application/pdf},
}
@misc{martius_extrapolation_2016,
title = {Extrapolation and learning equations},
rights = {{arXiv}.org perpetual, non-exclusive license},
url = {https://arxiv.org/abs/1610.02995},
doi = {10.48550/ARXIV.1610.02995},
abstract = {In classical machine learning, regression is treated as a black box process of identifying a suitable function from a hypothesis set without attempting to gain insight into the mechanism connecting inputs and outputs. In the natural sciences, however, finding an interpretable function for a phenomenon is the prime goal as it allows to understand and generalize results. This paper proposes a novel type of function learning network, called equation learner ({EQL}), that can learn analytical expressions and is able to extrapolate to unseen domains. It is implemented as an end-to-end differentiable feed-forward network and allows for efficient gradient based training. Due to sparsity regularization concise interpretable expressions can be obtained. Often the true underlying source expression is identified.},
publisher = {{arXiv}},
author = {Martius, Georg and Lampert, Christoph H.},
urldate = {2025-03-13},
date = {2016},
note = {Version Number: 1},
keywords = {68T05, 68T30, 68T40, 62J02, 65D15, Artificial Intelligence (cs.{AI}), {FOS}: Computer and information sciences, I.2.6; I.2.8, Machine Learning (cs.{LG})},
}
@misc{sahoo_learning_2018,
title = {Learning Equations for Extrapolation and Control},
rights = {{arXiv}.org perpetual, non-exclusive license},
url = {https://arxiv.org/abs/1806.07259},
doi = {10.48550/ARXIV.1806.07259},
abstract = {We present an approach to identify concise equations from data using a shallow neural network approach. In contrast to ordinary black-box regression, this approach allows understanding functional relations and generalizing them from observed data to unseen parts of the parameter space. We show how to extend the class of learnable equations for a recently proposed equation learning network to include divisions, and we improve the learning and model selection strategy to be useful for challenging real-world data. For systems governed by analytical expressions, our method can in many cases identify the true underlying equation and extrapolate to unseen domains. We demonstrate its effectiveness by experiments on a cart-pendulum system, where only 2 random rollouts are required to learn the forward dynamics and successfully achieve the swing-up task.},
publisher = {{arXiv}},
author = {Sahoo, Subham S. and Lampert, Christoph H. and Martius, Georg},
urldate = {2025-03-13},
date = {2018},
note = {Version Number: 1},
keywords = {68T05, 68T30, 68T40, 62M20, 62J02, 65D15, 70E60, 93C40, {FOS}: Computer and information sciences, I.2.6; I.2.8, Machine Learning (cs.{LG}), Machine Learning (stat.{ML})},
}