Related Work: finished equation learning section; started GPGPU section

This commit is contained in:
Daniel 2025-03-01 13:14:37 +01:00
parent 28ef6b121e
commit 34d98f9997
3 changed files with 92 additions and 4 deletions


@ -6,18 +6,23 @@ The goal of this chapter is to provide an overview of equation learning to estab
% Section describing what equation learning is and why it is relevant for the thesis
Equation learning is a field of research that aims at discovering equations from data in fields such as mathematics and physics. Data is usually abundant, while models are often elusive; because of this, generating equations with a computer can more easily lead to discovering equations that describe the observed data. \textcite{brunton_discovering_2016} describe an algorithm that leverages equation learning to discover equations for physical systems. A more literal interpretation of equation learning is demonstrated by \textcite{pfahler_semantic_2020}, who use machine learning to learn the form of equations. Their aim was to simplify the discovery of relevant publications via the equations they use rather than via technical terms, as the latter may differ between fields of research. However, this kind of equation learning is not relevant for this thesis.
% probably a quick detour to show how a generated equation might look and why evaluating them is expensive
Symbolic regression is a subset of equation learning that specialises in discovering mathematical equations, and a lot of research is being done in this field. \textcite{keijzer_scaled_2004} and \textcite{korns_accuracy_2011} presented ways of improving the quality of symbolic regression algorithms, making symbolic regression more feasible for problem-solving. Additionally, \textcite{jin_bayesian_2020} proposed an alternative to genetic programming (GP) for use in symbolic regression. Their approach noticeably increased the quality of the results compared to GP alternatives. The first two approaches are primarily concerned with the quality of the output, while the third is also concerned with interpretability and reducing memory consumption. \textcite{bartlett_exhaustive_2024} also describe an approach that generates simpler and higher-quality equations while being faster than GP algorithms. Heuristics like GP, or neural networks as used by \textcite{werner_informed_2021} in their equation learner, can help find good solutions faster, accelerating scientific progress. As these publications show, increasing the quality of the generated equations as well as the speed of finding them is a central concern in symbolic regression and equation learning in general. Research into improving the computational performance of these algorithms is therefore desirable.
% talk about cases where porting algorithms to gpus helped increase performance. This will be the transition the the below sections
The expressions generated by an equation learning algorithm can look like this: $x_1 + 5 - \text{abs}(p_1) \cdot \text{sqrt}(x_2) / 10 + 2^3$. They consist of unary and binary operators as well as constants, variables and parameters; expressions mostly differ in their length and in the kinds of terms they contain. In each iteration, many of these expressions are generated, and in addition, matrices of values for the variables and parameters are created. Each row of the variable matrix corresponds to one instantiation of all expressions, and this matrix contains many rows. This leads to a drastic increase in the number of instantiated expressions that need to be evaluated. Parameters are simpler: they can be treated as constants within one iteration but can take a different value in another iteration, so they do not increase the number of expressions that need to be evaluated. The increase in evaluations introduced by the variables, however, is still drastic and therefore increases the algorithm's runtime significantly.
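To make the cost concrete, the following minimal Python sketch evaluates the example expression from above for every row of a small variable matrix. The function name, the matrix values and the parameter value are illustrative assumptions, not taken from the thesis; the point is only that with $E$ expressions and $R$ rows, $E \cdot R$ evaluations are needed per iteration.

```python
import math

def evaluate(x, p):
    # corresponds to the example expression:
    # x1 + 5 - abs(p1) * sqrt(x2) / 10 + 2^3
    return x[0] + 5 - abs(p[0]) * math.sqrt(x[1]) / 10 + 2 ** 3

# each row is one instantiation of the variables (illustrative values)
variables = [
    [1.0, 4.0],
    [2.5, 9.0],
    [0.0, 16.0],
]
# parameters act as constants within one iteration
parameters = [3.0]

# one evaluation per (expression, row) pair; here: 1 expression, 3 rows
results = [evaluate(row, parameters) for row in variables]
```

With many generated expressions and a variable matrix of thousands of rows, this inner loop dominates the runtime, which is why accelerating it is worthwhile.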
\section[GPGPU]{General Purpose Computation on Graphics Processing Units}
% Describe what GPGPU is and how it differs from classical programming. Talk about architecture (SIMD/SIMT) and some scientific papers on how they use GPUs to accelerate tasks
Graphics processing units (GPUs) are commonly used to increase the performance of many different applications. Originally they were designed to improve performance and visual quality in games. \textcite{dokken_gpu_2005} first described the usage of GPUs for general purpose programming, showing how the graphics pipeline can be used for GPGPU programming. Because this approach requires the programmer to understand graphics terminology, it was not a great solution. Therefore, Nvidia released CUDA\footnote{\url{https://developer.nvidia.com/cuda-toolkit}} in 2007 with the goal of allowing developers to program GPUs independently of the graphics pipeline and its terminology. A study of the programmability of GPUs with CUDA and the resulting performance was conducted by \textcite{huang_gpu_2008}. They found that GPGPU programming has potential, even for non-embarrassingly parallel problems. Research is also being done on making low-level CUDA development simpler. \textcite{han_hicuda_2011} described a directive-based language to make development simpler and less error-prone while retaining the performance of handwritten code. To drastically simplify CUDA development, \textcite{besard_effective_2019} showed that it is possible to develop with CUDA in the high-level programming language Julia\footnote{\url{https://julialang.org/}} while performing similarly to CUDA written in C. In a subsequent study, \textcite{lin_comparing_2021} found that high performance computing (HPC) on the CPU and GPU in Julia performs similarly to HPC development in C. This means that Julia can be a viable alternative to Fortran, C and C++ in the HPC field, with the additional benefit of developer comfort, since it is a high-level language with modern features such as garbage collection. \textcite{besard_rapid_2019} have also shown how the combination of Julia and CUDA helps in rapidly developing HPC software.
While this section and this thesis in general focus on CUDA, as it is a widely used framework for GPGPU programming, alternatives exist: ROCm\footnote{\url{https://www.amd.com/de/products/software/rocm.html}} by AMD and the vendor-independent OpenCL\footnote{\url{https://www.khronos.org/opencl/}}.
% Talk about the fields where GPGPU really helped make performance improvements (weather simulations etc.). Then describe how it differs from classical programming. Talk about architecture (SIMD/SIMT; a lot of "slow" cores).
\subsection[PTX]{Parallel Thread Execution}
% Describe what PTX is to get a common ground for the implementation chapter. Probably a short section
% Maybe make this instead of what is there below:
% \section{Compilers}
% \subsection{Transpilers}
% \subsection{Interpreters}
\section{GPU Interpretation}
% Different sources on how to do interpretation on the GPU (and maybe interpretation in general too?)
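A common representation when interpreting expressions without recursion, as one would want on a GPU, is a postfix instruction stream processed with a small value stack. The following CPU-side Python sketch illustrates the idea; the opcode names and the tuple-based instruction format are illustrative assumptions, not the thesis's actual design. On a GPU, each thread would run the same loop on a different row of the variable matrix.

```python
import math

def interpret(program, variables):
    # program: list of (kind, value) tuples in postfix order
    # variables: one row of the variable matrix
    stack = []
    for kind, value in program:
        if kind == "const":
            stack.append(value)
        elif kind == "var":
            stack.append(variables[value])  # value is a variable index
        elif kind == "op":
            if value == "+":
                b, a = stack.pop(), stack.pop()
                stack.append(a + b)
            elif value == "*":
                b, a = stack.pop(), stack.pop()
                stack.append(a * b)
            elif value == "sqrt":
                stack.append(math.sqrt(stack.pop()))
    return stack[0]

# x1 * sqrt(x2) + 3 in postfix notation: x1 x2 sqrt * 3 +
program = [("var", 0), ("var", 1), ("op", "sqrt"), ("op", "*"),
           ("const", 3.0), ("op", "+")]
result = interpret(program, [2.0, 9.0])
```

The flat instruction stream avoids pointer-chasing through a tree, which is one reason this representation is attractive for GPU interpretation.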

Binary file not shown.


@ -392,3 +392,86 @@ Publisher: Multidisciplinary Digital Publishing Institute},
keywords = {Statistics - Methodology},
}
@inproceedings{winter_are_2021,
location = {New York, {NY}, {USA}},
title = {Are dynamic memory managers on {GPUs} slow? a survey and benchmarks},
isbn = {978-1-4503-8294-6},
url = {https://doi.org/10.1145/3437801.3441612},
doi = {10.1145/3437801.3441612},
series = {{PPoPP} '21},
shorttitle = {Are dynamic memory managers on {GPUs} slow?},
abstract = {Dynamic memory management on {GPUs} is generally understood to be a challenging topic. On current {GPUs}, hundreds of thousands of threads might concurrently allocate new memory or free previously allocated memory. This leads to problems with thread contention, synchronization overhead and fragmentation. Various approaches have been proposed in the last ten years and we set out to evaluate them on a level playing field on modern hardware to answer the question, if dynamic memory managers are as slow as commonly thought of. In this survey paper, we provide a consistent framework to evaluate all publicly available memory managers in a large set of scenarios. We summarize each approach and thoroughly evaluate allocation performance (thread-based as well as warp-based), and look at performance scaling, fragmentation and real-world performance considering a synthetic workload as well as updating dynamic graphs. We discuss the strengths and weaknesses of each approach and provide guidelines for the respective best usage scenario. We provide a unified interface to integrate any of the tested memory managers into an application and switch between them for benchmarking purposes. Given our results, we can dispel some of the dread associated with dynamic memory managers on the {GPU}.},
pages = {219--233},
booktitle = {Proceedings of the 26th {ACM} {SIGPLAN} Symposium on Principles and Practice of Parallel Programming},
publisher = {Association for Computing Machinery},
author = {Winter, Martin and Parger, Mathias and Mlakar, Daniel and Steinberger, Markus},
urldate = {2025-02-27},
date = {2021-02-17},
}
@article{bartlett_exhaustive_2024,
title = {Exhaustive Symbolic Regression},
volume = {28},
issn = {1941-0026},
url = {https://ieeexplore.ieee.org/abstract/document/10136815},
doi = {10.1109/TEVC.2023.3280250},
abstract = {Symbolic regression ({SR}) algorithms attempt to learn analytic expressions which fit data accurately and in a highly interpretable manner. Conventional {SR} suffers from two fundamental issues which we address here. First, these methods search the space stochastically (typically using genetic programming) and hence do not necessarily find the best function. Second, the criteria used to select the equation optimally balancing accuracy with simplicity have been variable and subjective. To address these issues we introduce exhaustive {SR} ({ESR}), which systematically and efficiently considers all possible equations—made with a given basis set of operators and up to a specified maximum complexity—and is therefore guaranteed to find the true optimum (if parameters are perfectly optimized) and a complete function ranking subject to these constraints. We implement the minimum description length principle as a rigorous method for combining these preferences into a single objective. To illustrate the power of {ESR} we apply it to a catalog of cosmic chronometers and the Pantheon+ sample of supernovae to learn the Hubble rate as a function of redshift, finding 40 functions (out of 5.2 million trial functions) that fit the data more economically than the Friedmann equation. These low-redshift data therefore do not uniquely prefer the expansion history of the standard model of cosmology. We make our code and full equation sets publicly available.},
pages = {950--964},
number = {4},
journaltitle = {{IEEE} Transactions on Evolutionary Computation},
author = {Bartlett, Deaglan J. and Desmond, Harry and Ferreira, Pedro G.},
urldate = {2025-02-28},
date = {2024-08},
note = {Conference Name: {IEEE} Transactions on Evolutionary Computation},
keywords = {Optimization, Complexity theory, Mathematical models, Biological system modeling, Cosmology data analysis, minimum description length, model selection, Numerical models, Search problems, Standards, symbolic regression ({SR})},
}
@inproceedings{dokken_gpu_2005,
location = {New York, {NY}, {USA}},
title = {The {GPU} as a high performance computational resource},
isbn = {978-1-59593-204-4},
url = {https://doi.org/10.1145/1090122.1090126},
doi = {10.1145/1090122.1090126},
series = {{SCCG} '05},
abstract = {With the introduction in 2003 of standard {GPUs} with 32 bit floating point numbers and programmable Vertex and Fragment processors, the processing power of the {GPU} was made available to non-graphics applications. As the {GPU} is aimed at computer graphics, the concepts in {GPU}-programming are based on computer graphics terminology, and the strategies for programming have to be based on the architecture of the graphics pipeline. At {SINTEF} in Norway a 4-year strategic institute project (2004-2007) "Graphics hardware as a high-end computational resource", http://www.math.sintef.no/gpu/ aims at making {GPUs} available as a computational resource both to academia and industry. This paper addresses the challenges of {GPU}-programming and results of the project's first year.},
pages = {21--26},
booktitle = {Proceedings of the 21st Spring Conference on Computer Graphics},
publisher = {Association for Computing Machinery},
author = {Dokken, Tor and Hagen, Trond R. and Hjelmervik, Jon M.},
urldate = {2025-03-01},
date = {2005-05-12},
}
@inproceedings{huang_gpu_2008,
title = {{GPU} as a General Purpose Computing Resource},
url = {https://ieeexplore.ieee.org/abstract/document/4710975/references#references},
doi = {10.1109/PDCAT.2008.38},
abstract = {In the last few years, {GPUs}(Graphics Processing Units) have made rapid development. Their ever-increasing computing power and decreasing cost have attracted attention from both industry and academia. In addition to graphics applications, researchers are interested in using them for general purpose computing. Recently, {NVIDIA} released a new computing architecture, {CUDA} (compute united device architecture), for its {GeForce} 8 series, Quadro {FX}, and Tesla {GPU} products. This new architecture can change fundamentally the way in which {GPUs} are used. In this paper, we study the programmability of {CUDA} and its {GeForce} 8 {GPU} and compare its performance with general purpose processors, in order to investigate its suitability for general purpose computation.},
eventtitle = {2008 Ninth International Conference on Parallel and Distributed Computing, Applications and Technologies},
pages = {151--158},
booktitle = {2008 Ninth International Conference on Parallel and Distributed Computing, Applications and Technologies},
author = {Huang, Qihang and Huang, Zhiyi and Werstein, Paul and Purvis, Martin},
urldate = {2025-03-01},
date = {2008-12},
note = {{ISSN}: 2379-5352},
keywords = {Application software, Central Processing Unit, Computer architecture, Computer graphics, Distributed computing, Grid computing, Multicore processing, Pipelines, Programming profession, Rendering (computer graphics)},
}
@article{han_hicuda_2011,
title = {{hiCUDA}: High-Level {GPGPU} Programming},
volume = {22},
url = {https://ieeexplore.ieee.org/abstract/document/5445082},
shorttitle = {{hiCUDA}},
abstract = {Graphics Processing Units ({GPUs}) have become a competitive accelerator for applications outside the graphics domain, mainly driven by the improvements in {GPU} programmability. Although the Compute Unified Device Architecture ({CUDA}) is a simple C-like interface for programming {NVIDIA} {GPUs}, porting applications to {CUDA} remains a challenge to average programmers. In particular, {CUDA} places on the programmer the burden of packaging {GPU} code in separate functions, of explicitly managing data transfer between the host and {GPU} memories, and of manually optimizing the utilization of the {GPU} memory. Practical experience shows that the programmer needs to make significant code changes, often tedious and error-prone, before getting an optimized program. We have designed {hiCUDA}},
pages = {78--90},
number = {1},
journaltitle = {{IEEE} Transactions on Parallel and Distributed Systems},
author = {Han, Tianyi David and Abdelrahman, Tarek S.},
urldate = {2025-03-01},
date = {2011},
note = {Conference Name: {IEEE} Transactions on Parallel and Distributed Systems},
}