finished implementing thesis feedback

This commit is contained in:
Daniel 2025-03-14 16:11:25 +01:00
parent ed9d8766be
commit f3446a2b11
5 changed files with 173 additions and 63 deletions


@@ -9,9 +9,9 @@ This chapter provides an entry point for this thesis. First the motivation of ex
%
Optimisation and acceleration of program code is a crucial part in many fields. For example, video games need optimisation to lower the minimum hardware requirements, which allows more people to run the game, increasing sales. Another example where optimisation is important is computer simulations. For those, optimisation is even more crucial, as it allows scientists to run more detailed simulations or get the simulation results faster. Equation learning or symbolic regression is another field that can heavily benefit from optimisation. One part of equation learning is to evaluate the expressions generated by a search algorithm, which can make up a significant portion of the runtime. This thesis is concerned with optimising the evaluation part to increase the overall performance of equation learning algorithms.
The following expression $5 - \text{abs}(x_1) * \text{sqrt}(x_2) / 10 + 2 \char`^ x_3$, which contains simple mathematical operations as well as variables $x_n$, is one example that can be generated by an equation learning algorithm. Usually an equation learning algorithm generates multiple such expressions per iteration. Out of these expressions, all possibly relevant ones have to be evaluated. Additionally, multiple different values need to be inserted for all variables and parameters, drastically increasing the number of evaluations that need to be performed.
In his blog, \textcite{sutter_free_2004} described how the free lunch is over in terms of the ever-increasing performance of hardware like the CPU. He states that to gain additional performance, developers need to start developing software for multiple cores and not just hope that the program magically runs faster on the next generation of CPUs. While this approach means more development overhead, a much greater speed-up can be achieved. However, in some cases the speed-up achieved this way is still not large enough and another approach is needed. One of these approaches is the utilisation of Graphics Processing Units (GPUs) as an easy and affordable option compared to compute clusters. Especially in terms of performance per dollar, GPUs are very inexpensive, as found by \textcite{brodtkorb_graphics_2013}. \textcite{michalakes_gpu_2008} have shown a noticeable speed-up when using GPUs for weather simulation. In addition to computer simulations, GPU acceleration can also be found in other places such as networking \parencite{han_packetshader_2010} or structural analysis of buildings \parencite{georgescu_gpu_2013}.
%The free lunch theorem as described by \textcite{adam_no_2019} states that to gain additional performance, a developer cannot just hope for future hardware to be faster, especially on a single core.


@@ -3,23 +3,29 @@
The goal of this chapter is to provide an overview of equation learning or symbolic regression to establish common knowledge of the topic and problem this thesis is trying to solve. First, the field of equation learning is explored, which helps to contextualise the topic of this thesis. The main part of this chapter is split into two sub-parts. The first part explores research that has been done in the field of general purpose computations on the GPU (GPGPU) as well as its fundamentals. The focus lies on exploring how graphics processing units (GPUs) are used to achieve substantial speed-ups and when and where they can be effectively employed. The second part describes the basics of how interpreters and compilers are built and how they can be adapted to the workflow of programming GPUs. When discussing GPU programming concepts, the terminology used is that of Nvidia and may differ from that used for AMD GPUs.
\section{Equation learning}
Equation learning is a field of research that can be used for understanding and discovering equations from a set of data from various fields like mathematics and physics. Data is usually much more abundant, while models often are elusive, as demonstrated by \textcite{guillemot_climate_2022}, who explain how validating the models against large amounts of data is a big part of creating such models. Because of this effort, generating equations with a computer can more easily lead to discovering equations that describe the observed data. \textcite{brunton_discovering_2016} describe an algorithm that leverages equation learning to discover equations for physical systems. A more literal interpretation of equation learning is demonstrated by \textcite{pfahler_semantic_2020}. They use machine learning to learn the form of equations. Their aim was to simplify the discovery of relevant publications by the equations they use and not by technical terms, as these may differ by the field of research. However, this kind of equation learning is not relevant for this thesis.
Symbolic regression is a subset of equation learning that specialises more towards discovering mathematical equations. A lot of research is done in this field. Using genetic programming (GP) for different problems, including symbolic regression, was first described by \textcite{koza_genetic_1994}. He described that finding a computer program to solve a problem for a given input and output can be done by traversing the search space of all solutions. This fits well with the goal of symbolic regression, where a mathematical expression needs to be found to describe a problem with specific inputs and outputs. Later, \textcite{koza_human-competitive_2010} provided an overview of results that were generated with the help of GP and were competitive with human solutions, showing how symbolic regression is a useful tool. In their book Symbolic Regression, \textcite{kronberger_symbolic_2024} show how symbolic regression can be applied to real-world scenarios. They also describe symbolic regression in great detail, while catering to both beginners and experts.
\textcite{keijzer_scaled_2004} and \textcite{korns_accuracy_2011} presented ways of improving the quality of symbolic regression algorithms, making symbolic regression more feasible for problem-solving. \textcite{bartlett_exhaustive_2024} describe an exhaustive approach for symbolic regression which can find the true optimum for perfectly optimised parameters while retaining simple and interpretable results. Alternatives to GP for symbolic regression also exist, with one proposed by \textcite{jin_bayesian_2020}. Their approach increased the quality of the results noticeably compared to GP alternatives. Another alternative to heuristics like GP is the usage of neural networks. One such alternative has been introduced by \textcite{martius_extrapolation_2016}, where they used a neural network for their equation learner with mixed results. Later, an extension has been provided by \textcite{sahoo_learning_2018}. They introduced the division operator, which led to much better results. Further improvements have been described by \textcite{werner_informed_2021} with their informed equation learner. By incorporating domain expert knowledge they could limit the search space and find better solutions for particular domains. One drawback of these three implementations is the fact that their neural networks are fixed. An equation learner which can change the network at runtime and therefore evolve over time is proposed by \textcite{dong_evolving_2024}. Their approach further improved the results of neural network equation learners. In their work, \textcite{lemos_rediscovering_2022} also used a neural network for symbolic regression. They were able to find an equivalent to Newton's law of gravitation and rediscovered Newton's second and third laws using only trajectory data of bodies of our solar system. Although these laws were already known, this research has shown how neural networks and machine learning in general have great potential. An implementation of an equation learner in the physics domain is proposed by \textcite{sun_symbolic_2023}. Their algorithm was specifically designed for nonlinear dynamics often occurring in physical systems. When compared to other implementations, their equation learner was able to create better results but has the main drawback of high computational cost. As seen by these publications, increasing the quality of generated equations and also increasing the speed of finding these equations is a central part of symbolic regression and equation learning in general.
The expressions generated by an equation learning algorithm can look like this: $x_1 + 5 - \text{abs}(p_1) * \text{sqrt}(x_2) / 10 + 2 \char`^ x_3$. They consist of several unary and binary operators, but also of constants, variables and parameters; expressions mostly differ in their length and the kinds of terms they contain. Per iteration, many of these expressions are generated and, in addition, matrices of values for the variables and parameters are also created. One row of the variable matrix corresponds to one instantiation of all expressions, and this matrix contains multiple rows. This leads to a drastic increase in instantiated expressions that need to be evaluated. Parameters are a bit simpler, as they can be treated as constants for one iteration but can have a different value in another iteration. This means that parameters do not increase the number of expressions that need to be evaluated. However, the increase in evaluations introduced by the variables is still drastic and therefore increases the algorithm runtime significantly.
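To make this concrete, the following minimal Julia sketch (names and data are illustrative, not code from an actual equation learner) evaluates the expression above once for every row of a variable matrix:
\begin{verbatim}
# The example expression x1 + 5 - abs(p1) * sqrt(x2) / 10 + 2^x3
evaluate(x, p) = x[1] + 5 - abs(p[1]) * sqrt(x[2]) / 10 + 2^x[3]

X = rand(1000, 3)  # variable matrix: one instantiation per row
p = [0.5]          # parameters: treated as constants for one iteration

# one evaluation per row; with many expressions per iteration,
# this loop is repeated for each of them
results = [evaluate(X[i, :], p) for i in axes(X, 1)]
\end{verbatim}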
As described earlier, the goal of equation learning is to find an expression that fits a given set of data. The data usually consists of a set of inputs that have been applied to the unknown expression and the outputs observed after applying those inputs. An example of such data is described by \textcite{werner_informed_2021}. In one instance they want to find the power loss formula for an electric machine. They used four inputs, direct and quadratic current as well as temperature and motor speed, and they have an observed output, which is the power loss. Now, for an arbitrary problem with different inputs and outputs, the equation learner tries to find an expression that fits this data \parencite{koza_genetic_1994}. Fitting in this context means that when the input is applied to the expression, the result will be the same as the observed output. To avoid overfitting, \textcite{bomarito_bayesian_2022} have proposed a way of using Bayesian model selection to reduce the complexity of the generated expressions. This also helps with making the expressions more generalisable and therefore applicable to unseen inputs. A survey conducted by \textcite{dabhi_survey_2012} shows how overfitting is not desirable and why more generalisable solutions are preferred. To generate an equation, first the operators that make up the equation need to be defined. It is also possible to define a maximum length for an expression, as proposed by \textcite{bartlett_exhaustive_2024}. Expressions also consist of constants as well as variables which represent the inputs. Assuming that a given problem has three variables, the equation learner could generate an expression as seen in Equation \ref{eq:example}, where $x_n$ are the variables and $O$ is the output which should correspond to the observed output for the given variables.
\begin{equation} \label{eq:example}
O = 5 - \text{abs}(x_1) * \text{sqrt}(x_2) / 10 + 2 \char`^ x_3
\end{equation}
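As a sketch of what fitting means operationally, the following illustrative Julia snippet scores the candidate from Equation \ref{eq:example} against observed data; the mean squared error criterion used here is an assumption, one common choice among several:
\begin{verbatim}
# Candidate expression from the equation above
candidate(x) = 5 - abs(x[1]) * sqrt(x[2]) / 10 + 2^x[3]

X = rand(100, 3)  # observed inputs, one sample per row
O = rand(100)     # observed outputs (placeholder data)

predicted = [candidate(X[i, :]) for i in axes(X, 1)]
mse = sum((predicted .- O) .^ 2) / length(O)  # smaller = better fit
\end{verbatim}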
A typical equation learner generates multiple expressions at once. If the equation learner generates $300$ expressions and each expression needs to be evaluated $50$ times to get the best parametrisation for this expression, the total number of evaluations is $300 * 50 = 15\,000$. However, it is likely that multiple runs or, in the context of GP, generations need to be performed. The number of generations is dependent on the problem, but assuming a maximum of $100$ generations, the total number of evaluations is equal to $300 * 50 * 100 = 1\,500\,000$. These values have been taken from the equation learner for predicting discharge voltage curves of batteries as described by \textcite{kronberger_symbolic_2024}. Their equation learner converged after 54 generations, resulting in evaluating $800\,000$ expressions. Depending on the complexity of the generated expressions, performing all of these evaluations takes up a lot of the runtime. Their results took over two days on an eight-core desktop CPU. While they did not provide runtime information for all problems they tested, the voltage curve prediction was the slowest. The other problems were in the range of a few seconds and up to a day. Especially the problems that took several hours to days to finish show that there is still room for performance improvements. While a better CPU with more cores can be used, it is interesting to determine if using graphics cards can yield noticeably better performance, which is the goal of this thesis.
\section[GPGPU]{General Purpose Computation on Graphics Processing Units}
\label{sec:gpgpu}
Graphics cards (GPUs) are commonly used to increase the performance of many different applications. Originally they were designed to improve performance and visual quality in games. \textcite{dokken_gpu_2005} first described the usage of GPUs for general purpose programming (GPGPU). They have shown how the graphics pipeline can be used for GPGPU programming. Because this approach also requires the programmer to understand the graphics terminology, this was not a great solution. Therefore, Nvidia released CUDA\footnote{\url{https://developer.nvidia.com/cuda-toolkit}} in 2007 with the goal of allowing developers to program GPUs independent of the graphics pipeline and terminology. A study of the programmability of GPUs with CUDA and the resulting performance has been conducted by \textcite{huang_gpu_2008}. They found that GPGPU programming has potential, even for non-embarrassingly parallel problems. Research is also done in making low-level CUDA development simpler. \textcite{han_hicuda_2011} have described a directive-based language to make development simpler and less error-prone, while retaining the performance of handwritten code.
To drastically simplify CUDA development, \textcite{besard_effective_2019} showed that it is possible to develop with CUDA in the high-level programming language Julia\footnote{\url{https://julialang.org/}} with similar performance to CUDA written in C. In a subsequent study, \textcite{lin_comparing_2021} found that high performance computing (HPC) on the CPU and GPU in Julia performs similarly to HPC development in C. This means that Julia can be a viable alternative to Fortran, C and C++ in the HPC field, with the additional benefit of developer comfort, since it is a high-level language with modern features such as a garbage collector. \textcite{besard_rapid_2019} have also shown how the combination of Julia and CUDA helps in rapidly developing HPC software. While this thesis in general revolves around CUDA, there also exist alternatives by AMD called ROCm\footnote{\url{https://www.amd.com/de/products/software/rocm.html}} and a vendor independent alternative called OpenCL\footnote{\url{https://www.khronos.org/opencl/}}.
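To give a flavour of this workflow, the following minimal sketch (assuming the CUDA.jl package and an Nvidia GPU; kernel and variable names are illustrative) launches a simple element-wise kernel from Julia:
\begin{verbatim}
using CUDA  # assumes the CUDA.jl package and an Nvidia GPU

# Each thread squares one element of x and stores it in y.
function square!(y, x)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(x)
        @inbounds y[i] = x[i] * x[i]
    end
    return nothing
end

x = CUDA.rand(1024)
y = similar(x)
@cuda threads=256 blocks=cld(length(x), 256) square!(y, x)
\end{verbatim}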
While in the early days of GPGPU programming a lot of research has been done to assess if this approach is feasible, it now seems obvious to use GPUs to accelerate algorithms. GPUs have been used early to speed up weather simulation models. \textcite{michalakes_gpu_2008} proposed a method for simulating weather with the Weather Research and Forecast (WRF) model on a GPU. With their approach, they reached a speed-up of 5 to 20 for the most compute-intensive task, with little GPU optimisation effort. They also found that the GPU usage was low, meaning there are resources and potential for more detailed simulations. Generally, simulations are great candidates for using GPUs, as they can benefit heavily from a high degree of parallelism and data throughput. \textcite{koster_high-performance_2020} have developed a way of using adaptive time steps on the GPU to considerably improve the performance of numerical and discrete simulations. In addition to the performance gains, they were able to retain the precision and constraint correctness of the simulation. Black hole simulations are crucial for science and education, as they provide a better understanding of our world. \textcite{verbraeck_interactive_2021} have shown that simulating complex Kerr (rotating) black holes can be done on consumer hardware in a few seconds.
Schwarzschild black hole simulations can be performed in real-time with GPUs, as described by \textcite{hissbach_overview_2022}, which is especially helpful for educational scenarios. While both approaches do not have the same accuracy as detailed simulations on supercomputers, they show how a single GPU can yield similar accuracy at a fraction of the cost. Software network routing can also heavily benefit from GPU acceleration, as shown by \textcite{han_packetshader_2010}, where they achieved a significantly higher throughput than with a CPU-only implementation. Finite element structural analysis is an essential tool for many branches of engineering and can also heavily benefit from the usage of GPUs, as demonstrated by \textcite{georgescu_gpu_2013}. However, it also needs to be noted that GPUs are not always better performing than CPUs, as illustrated by \textcite{lee_debunking_2010}, but they can still lead to performance improvements nonetheless.
\subsection{Programming GPUs}
The development process on a GPU is vastly different from a CPU. A CPU has tens or hundreds of complex cores, with the AMD Epyc 9965\footnote{\url{https://www.amd.com/en/products/processors/server/epyc/9005-series/amd-epyc-9965.html}} having a staggering $192$ of those complex cores and twice as many threads. A guide for a simple one-core 8-bit CPU has been published by \textcite{schuurman_step-by-step_2013}. He describes the different and complex parts of a CPU core. Modern CPUs are even more complex, with dedicated fast integer and floating-point arithmetic gates as well as logic gates, sophisticated branch prediction and much more. This makes a CPU perfect for handling complex control flows on a single program strand and, on modern CPUs, even multiple strands simultaneously. However, as seen in section \ref{sec:gpgpu}, this often is not enough. On the other hand, a GPU contains thousands or even tens of thousands of cores. For example, the GeForce RTX 5090\footnote{\url{https://www.nvidia.com/en-us/geforce/graphics-cards/50-series/rtx-5090/}} contains a total of $21760$ CUDA cores. To achieve this enormous core count, a single GPU core has to be much simpler than one CPU core. As described by \textcite{nvidia_cuda_2024}, a GPU dedicates many more transistors towards floating-point computations. This results in less efficient integer arithmetic and control flow handling. There is also less cache available per core, and clock speeds are usually much lower than those on a CPU. An overview of the differences between a CPU and a GPU architecture can be seen in figure \ref{fig:cpu_vs_gpu}.
\begin{figure}
\centering


@@ -14,7 +14,7 @@
\RequirePackage{xifthen}
%\usepackage[style=numeric-comp,backend=biber,bibencoding=auto]{biblatex}
\usepackage[style=\@bibstyle,backend=biber,uniquelist=false]{biblatex}
\ExecuteBibliographyOptions{
bibencoding=auto,
bibwarn=true,



@@ -269,7 +269,6 @@ Publisher: Multidisciplinary Digital Publishing Institute},
author = {Michalakes, John and Vachharajani, Manish},
urldate = {2025-02-14},
date = {2008-04},
keywords = {Acceleration, Bandwidth, Computer architecture, Concurrent computing, Graphics, Large-scale systems, Parallel processing, Predictive models, Weather forecasting, Yarn},
file = {Full Text PDF:C\:\\Users\\danwi\\Zotero\\storage\\ZFEVRLEZ\\Michalakes und Vachharajani - 2008 - GPU acceleration of numerical weather prediction.pdf:application/pdf;IEEE Xplore Abstract Record:C\:\\Users\\danwi\\Zotero\\storage\\PYY4F7JB\\4536351.html:text/html},
}
@@ -321,7 +320,6 @@ Publisher: Multidisciplinary Digital Publishing Institute},
author = {Brunton, Steven L. and Proctor, Joshua L. and Kutz, J. Nathan},
urldate = {2025-02-26},
date = {2016-04-12},
file = {Full Text PDF:C\:\\Users\\danwi\\Zotero\\storage\\6R643NFZ\\Brunton et al. - 2016 - Discovering governing equations from data by sparse identification of nonlinear dynamical systems.pdf:application/pdf},
}
@@ -423,7 +421,6 @@ Publisher: Multidisciplinary Digital Publishing Institute},
author = {Bartlett, Deaglan J. and Desmond, Harry and Ferreira, Pedro G.},
urldate = {2025-02-28},
date = {2024-08},
keywords = {Optimization, Complexity theory, Mathematical models, Biological system modeling, Cosmology data analysis, minimum description length, model selection, Numerical models, Search problems, Standards, symbolic regression ({SR})},
file = {Eingereichte Version:C\:\\Users\\danwi\\Zotero\\storage\\Y6LFWDH2\\Bartlett et al. - 2024 - Exhaustive Symbolic Regression.pdf:application/pdf;IEEE Xplore Abstract Record:C\:\\Users\\danwi\\Zotero\\storage\\2HU5A8RL\\10136815.html:text/html},
}
@@ -455,27 +452,10 @@ Publisher: Multidisciplinary Digital Publishing Institute},
author = {Huang, Qihang and Huang, Zhiyi and Werstein, Paul and Purvis, Martin},
urldate = {2025-03-01},
date = {2008-12},
keywords = {Computer architecture, Application software, Central Processing Unit, Computer graphics, Distributed computing, Grid computing, Multicore processing, Pipelines, Programming profession, Rendering (computer graphics)},
file = {IEEE Xplore Abstract Record:C\:\\Users\\danwi\\Zotero\\storage\\2FJP9K25\\references.html:text/html},
}
@article{verbraeck_interactive_2021,
title = {Interactive Black-Hole Visualization},
volume = {27},
@@ -489,24 +469,10 @@ Publisher: Multidisciplinary Digital Publishing Institute},
author = {Verbraeck, Annemieke and Eisemann, Elmar},
urldate = {2025-03-02},
date = {2021-02},
keywords = {Rendering (computer graphics), Algorithms, Cameras, Computer Graphics Techniques, Distortion, Engineering, Mathematics, Observers, Physical \& Environmental Sciences, Ray tracing, Real-time systems, Visualization},
file = {PDF:C\:\\Users\\danwi\\Zotero\\storage\\HDASRGYN\\Verbraeck und Eisemann - 2021 - Interactive Black-Hole Visualization.pdf:application/pdf},
}
@inproceedings{schuurman_step-by-step_2013,
location = {New York, {NY}, {USA}},
title = {Step-by-step design and simulation of a simple {CPU} architecture},
@@ -537,7 +503,6 @@ Publisher: Multidisciplinary Digital Publishing Institute},
author = {Franchetti, F. and Kral, S. and Lorenz, J. and Ueberhuber, C.W.},
urldate = {2025-03-08},
date = {2005-02},
keywords = {Concurrent computing, Parallel processing, Automatic vectorization, Boosting, Computer aided instruction, Computer applications, Digital signal processing, digital signal processing ({DSP}), fast Fourier transform ({FFT}), Kernel, Registers, short vector single instruction, multiple data ({SIMD}), Signal processing algorithms, Spirals, symbolic vectorization},
file = {Eingereichte Version:C\:\\Users\\danwi\\Zotero\\storage\\J48HM9VD\\Franchetti et al. - 2005 - Efficient Utilization of SIMD Extensions.pdf:application/pdf;IEEE Xplore Abstract Record:C\:\\Users\\danwi\\Zotero\\storage\\W6PT75CV\\1386659.html:text/html},
}
@@ -598,7 +563,6 @@ Publisher: Multidisciplinary Digital Publishing Institute},
institution = {{ENS} Lyon},
type = {Research Report},
author = {Collange, Caroline},
date = {2011-09},
keywords = {{GPU}, {SIMD}, Control-flow reconvergence, {SIMT}},
file = {HAL PDF Full Text:C\:\\Users\\danwi\\Zotero\\storage\\M2WPWNXF\\Collange - 2011 - Stack-less SIMT reconvergence at low cost.pdf:application/pdf}, file = {HAL PDF Full Text:C\:\\Users\\danwi\\Zotero\\storage\\M2WPWNXF\\Collange - 2011 - Stack-less SIMT reconvergence at low cost.pdf:application/pdf},
@@ -615,7 +579,6 @@ Publisher: Multidisciplinary Digital Publishing Institute},
author = {Fung, Wilson W. L. and Aamodt, Tor M.},
urldate = {2025-03-08},
date = {2011-02},
keywords = {Pipelines, Kernel, Graphics processing unit, Hardware, Instruction sets, Compaction, Random access memory},
file = {Full Text PDF:C\:\\Users\\danwi\\Zotero\\storage\\TRPWUTI6\\Fung und Aamodt - 2011 - Thread block compaction for efficient SIMT control flow.pdf:application/pdf;IEEE Xplore Abstract Record:C\:\\Users\\danwi\\Zotero\\storage\\LYPYEA8U\\5749714.html:text/html},
}
@@ -635,19 +598,7 @@ Publisher: Multidisciplinary Digital Publishing Institute},
author = {Sutter, Herb},
urldate = {2025-03-13},
date = {2004-12},
file = {Free_Lunch.pdf:C\:\\Users\\danwi\\Zotero\\storage\\ICE8KXP8\\Free_Lunch.pdf:application/pdf;The Free Lunch Is Over\: A Fundamental Turn Toward Concurrency in Software:C\:\\Users\\danwi\\Zotero\\storage\\UU2CZWUR\\concurrency-ddj.html:text/html},
}
@article{koza_genetic_1994,
@@ -664,6 +615,7 @@ Publisher: Multidisciplinary Digital Publishing Institute},
urldate = {2025-03-13},
date = {1994-06},
langid = {english},
file = {PDF:C\:\\Users\\danwi\\Zotero\\storage\\SAHSU45G\\Koza - 1994 - Genetic programming as a means for programming computers by natural selection.pdf:application/pdf},
}
@article{koza_human-competitive_2010,
@@ -707,6 +659,158 @@ Publisher: Multidisciplinary Digital Publishing Institute},
author = {Sahoo, Subham S. and Lampert, Christoph H. and Martius, Georg},
urldate = {2025-03-13},
date = {2018},
keywords = {{FOS}: Computer and information sciences, I.2.6; I.2.8, Machine Learning (cs.{LG}), 68T05, 68T30, 68T40, 62M20, 62J02, 65D15, 70E60, 93C40, Machine Learning (stat.{ML})},
}
@article{han_hicuda_2011,
title = {{hiCUDA}: High-Level {GPGPU} Programming},
volume = {22},
rights = {https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/{IEEE}.html},
issn = {1045-9219},
url = {http://ieeexplore.ieee.org/document/5445082/},
doi = {10.1109/TPDS.2010.62},
shorttitle = {{hiCUDA}},
pages = {78--90},
number = {1},
journaltitle = {{IEEE} Transactions on Parallel and Distributed Systems},
shortjournal = {{IEEE} Trans. Parallel Distrib. Syst.},
author = {Han, Tianyi David and Abdelrahman, Tarek S.},
urldate = {2025-03-13},
date = {2011-01},
file = {PDF:C\:\\Users\\danwi\\Zotero\\storage\\PTANK4EC\\Han and Abdelrahman - 2011 - hiCUDA High-Level GPGPU Programming.pdf:application/pdf},
}
@article{brodtkorb_graphics_2013,
title = {Graphics processing unit ({GPU}) programming strategies and trends in {GPU} computing},
volume = {73},
rights = {https://www.elsevier.com/tdm/userlicense/1.0/},
issn = {07437315},
url = {https://linkinghub.elsevier.com/retrieve/pii/S0743731512000998},
doi = {10.1016/j.jpdc.2012.04.003},
pages = {4--13},
number = {1},
journaltitle = {Journal of Parallel and Distributed Computing},
shortjournal = {Journal of Parallel and Distributed Computing},
author = {Brodtkorb, André R. and Hagen, Trond R. and Sætra, Martin L.},
date = {2013-01},
langid = {english},
file = {Full Text:C\:\\Users\\danwi\\Zotero\\storage\\GZVCZUFG\\Brodtkorb et al. - 2013 - Graphics processing unit (GPU) programming strategies and trends in GPU computing.pdf:application/pdf},
}
@inproceedings{hissbach_overview_2022,
title = {An Overview of Techniques for Egocentric Black Hole Visualization and Their Suitability for Planetarium Applications},
isbn = {978-3-03868-189-2},
doi = {10.2312/vmv.20221207},
booktitle = {Vision, Modeling, and Visualization},
publisher = {The Eurographics Association},
author = {Hissbach, Anny-Marleen and Dick, Christian and Lawonn, Kai},
editor = {Bender, Jan and Botsch, Mario and Keim, Daniel A.},
date = {2022},
file = {Full Text PDF:C\:\\Users\\danwi\\Zotero\\storage\\TBBLEZ5N\\Hissbach et al. - 2022 - An Overview of Techniques for Egocentric Black Hole Visualization and Their Suitability for Planetar.pdf:application/pdf},
}
@inbook{guillemot_climate_2022,
edition = {1},
title = {Climate Models},
isbn = {978-1-009-08209-9 978-1-316-51427-6},
url = {https://www.cambridge.org/core/product/identifier/9781009082099%23CN-bp-14/type/book_part},
pages = {126--136},
booktitle = {A Critical Assessment of the Intergovernmental Panel on Climate Change},
publisher = {Cambridge University Press},
author = {Guillemot, Hélène},
bookauthor = {Hulme, Mike},
editor = {De Pryck, Kari},
urldate = {2025-03-14},
date = {2022-12-31},
file = {Full Text:C\:\\Users\\danwi\\Zotero\\storage\\MUKXXCV9\\Guillemot - 2022 - Climate Models.pdf:application/pdf},
}
@inproceedings{bomarito_bayesian_2022,
location = {Boston Massachusetts},
title = {Bayesian model selection for reducing bloat and overfitting in genetic programming for symbolic regression},
isbn = {978-1-4503-9268-6},
url = {https://dl.acm.org/doi/10.1145/3520304.3528899},
doi = {10.1145/3520304.3528899},
eventtitle = {{GECCO} '22: Genetic and Evolutionary Computation Conference},
pages = {526--529},
booktitle = {Proceedings of the Genetic and Evolutionary Computation Conference Companion},
publisher = {{ACM}},
author = {Bomarito, G. F. and Leser, P. E. and Strauss, N. C. M. and Garbrecht, K. M. and Hochhalter, J. D.},
urldate = {2025-03-14},
date = {2022-07-09},
langid = {english},
file = {PDF:C\:\\Users\\danwi\\Zotero\\storage\\ZPS5ZYYQ\\Bomarito et al. - 2022 - Bayesian model selection for reducing bloat and overfitting in genetic programming for symbolic regr.pdf:application/pdf},
}
@article{dabhi_survey_2012,
title = {A Survey on Techniques of Improving Generalization Ability of Genetic Programming Solutions},
rights = {{arXiv}.org perpetual, non-exclusive license},
url = {https://arxiv.org/abs/1211.1119},
doi = {10.48550/ARXIV.1211.1119},
abstract = {In the field of empirical modeling using Genetic Programming ({GP}), it is important to evolve solution with good generalization ability. Generalization ability of {GP} solutions get affected by two important issues: bloat and over-fitting. We surveyed and classified existing literature related to different techniques used by {GP} research community to deal with these issues. We also point out limitation of these techniques, if any. Moreover, the classification of different bloat control approaches and measures for bloat and over-fitting are also discussed. We believe that this work will be useful to {GP} practitioners in following ways: (i) to better understand concepts of generalization in {GP} (ii) comparing existing bloat and over-fitting control techniques and (iii) selecting appropriate approach to improve generalization ability of {GP} evolved solutions.},
author = {Dabhi, Vipul K. and Chaudhary, Sanjay},
urldate = {2025-03-14},
date = {2012},
keywords = {{FOS}: Computer and information sciences, Neural and Evolutionary Computing (cs.{NE})},
file = {PDF:C\:\\Users\\danwi\\Zotero\\storage\\JCULR888\\Dabhi and Chaudhary - 2012 - A Survey on Techniques of Improving Generalization Ability of Genetic Programming Solutions.pdf:application/pdf},
}
@book{kronberger_symbolic_2024,
title = {Symbolic Regression},
isbn = {978-1-315-16640-7},
url = {http://dx.doi.org/10.1201/9781315166407},
pagetotal = {308},
publisher = {Chapman and Hall/{CRC}},
author = {Kronberger, Gabriel and Burlacu, Bogdan and Kommenda, Michael and Winkler, Stephan M. and Affenzeller, Michael},
date = {2024-07},
file = {PDF:C\:\\Users\\danwi\\Zotero\\storage\\43RPG26H\\Kronberger et al. - 2024 - Symbolic Regression.pdf:application/pdf},
}
@misc{sun_symbolic_2023,
title = {Symbolic Physics Learner: Discovering governing equations via Monte Carlo tree search},
url = {http://arxiv.org/abs/2205.13134},
doi = {10.48550/arXiv.2205.13134},
shorttitle = {Symbolic Physics Learner},
abstract = {Nonlinear dynamics is ubiquitous in nature and commonly seen in various science and engineering disciplines. Distilling analytical expressions that govern nonlinear dynamics from limited data remains vital but challenging. To tackle this fundamental issue, we propose a novel Symbolic Physics Learner ({SPL}) machine to discover the mathematical structure of nonlinear dynamics. The key concept is to interpret mathematical operations and system state variables by computational rules and symbols, establish symbolic reasoning of mathematical formulas via expression trees, and employ a Monte Carlo tree search ({MCTS}) agent to explore optimal expression trees based on measurement data. The {MCTS} agent obtains an optimistic selection policy through the traversal of expression trees, featuring the one that maps to the arithmetic expression of underlying physics. Salient features of the proposed framework include search flexibility and enforcement of parsimony for discovered equations. The efficacy and superiority of the {SPL} machine are demonstrated by numerical examples, compared with state-of-the-art baselines.},
number = {{arXiv}:2205.13134},
publisher = {{arXiv}},
author = {Sun, Fangzheng and Liu, Yang and Wang, Jian-Xun and Sun, Hao},
urldate = {2025-03-14},
date = {2023-02-02},
keywords = {Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Computer Science - Symbolic Computation, Nonlinear Sciences - Chaotic Dynamics, Physics - Computational Physics},
file = {Preprint PDF:C\:\\Users\\danwi\\Zotero\\storage\\YBXYH5D6\\Sun et al. - 2023 - Symbolic Physics Learner Discovering governing equations via Monte Carlo tree search.pdf:application/pdf;Snapshot:C\:\\Users\\danwi\\Zotero\\storage\\D9SDYVT3\\2205.html:text/html},
}
@article{makke_interpretable_2024,
title = {Interpretable scientific discovery with symbolic regression: a review},
volume = {57},
issn = {1573-7462},
url = {https://doi.org/10.1007/s10462-023-10622-0},
doi = {10.1007/s10462-023-10622-0},
shorttitle = {Interpretable scientific discovery with symbolic regression},
abstract = {Symbolic regression is emerging as a promising machine learning method for learning succinct underlying interpretable mathematical expressions directly from data. Whereas it has been traditionally tackled with genetic programming, it has recently gained a growing interest in deep learning as a data-driven model discovery tool, achieving significant advances in various application domains ranging from fundamental to applied sciences. In this survey, we present a structured and comprehensive overview of symbolic regression methods, review the adoption of these methods for model discovery in various areas, and assess their effectiveness. We have also grouped state-of-the-art symbolic regression applications in a categorized manner in a living review.},
pages = {2},
number = {1},
journaltitle = {Artificial Intelligence Review},
shortjournal = {Artif Intell Rev},
author = {Makke, Nour and Chawla, Sanjay},
urldate = {2025-03-14},
date = {2024-01-02},
langid = {english},
keywords = {Artificial Intelligence, Automated Scientific Discovery, Interpretable {AI}, Symbolic Regression},
file = {Full Text PDF:C\:\\Users\\danwi\\Zotero\\storage\\7PFYYUJZ\\Makke and Chawla - 2024 - Interpretable scientific discovery with symbolic regression a review.pdf:application/pdf},
}
@misc{lemos_rediscovering_2022,
title = {Rediscovering orbital mechanics with machine learning},
url = {http://arxiv.org/abs/2202.02306},
doi = {10.48550/arXiv.2202.02306},
abstract = {We present an approach for using machine learning to automatically discover the governing equations and hidden properties of real physical systems from observations. We train a "graph neural network" to simulate the dynamics of our solar system's Sun, planets, and large moons from 30 years of trajectory data. We then use symbolic regression to discover an analytical expression for the force law implicitly learned by the neural network, which our results showed is equivalent to Newton's law of gravitation. The key assumptions that were required were translational and rotational equivariance, and Newton's second and third laws of motion. Our approach correctly discovered the form of the symbolic force law. Furthermore, our approach did not require any assumptions about the masses of planets and moons or physical constants. They, too, were accurately inferred through our methods. Though, of course, the classical law of gravitation has been known since Isaac Newton, our result serves as a validation that our method can discover unknown laws and hidden properties from observed data. More broadly this work represents a key step toward realizing the potential of machine learning for accelerating scientific discovery.},
number = {{arXiv}:2202.02306},
publisher = {{arXiv}},
author = {Lemos, Pablo and Jeffrey, Niall and Cranmer, Miles and Ho, Shirley and Battaglia, Peter},
urldate = {2025-03-14},
date = {2022-02-04},
keywords = {Astrophysics - Earth and Planetary Astrophysics, Astrophysics - Instrumentation and Methods for Astrophysics, Computer Science - Machine Learning},
file = {Preprint PDF:C\:\\Users\\danwi\\Zotero\\storage\\9YPFHHRY\\Lemos et al. - 2022 - Rediscovering orbital mechanics with machine learning.pdf:application/pdf;Snapshot:C\:\\Users\\danwi\\Zotero\\storage\\YIFHYWCY\\2202.html:text/html},
} }