updated introduction: background and motivation

2025-02-14 12:42:42 +01:00
parent 4afc15a737
commit 250da02353
5 changed files with 282 additions and 505 deletions
--- a/thesis/.vscode/ltex.disabledRules.en-GB.txt
+++ b/thesis/.vscode/ltex.disabledRules.en-GB.txt
@ -0,0 +1 @@
+OXFORD_SPELLING_Z_NOT_S
--- a/thesis/.vscode/ltex.hiddenFalsePositives.en-GB.txt
+++ b/thesis/.vscode/ltex.hiddenFalsePositives.en-GB.txt
@ -0,0 +1 @@
+{"rule":"OXFORD_SPELLING_Z_NOT_S","sentence":"^\\QOptimisation of software\\E$"}
--- a/thesis/chapters/introduction.tex
+++ b/thesis/chapters/introduction.tex
@ -1,10 +1,12 @@
 \chapter{Introduction}
 \label{cha:Introduction}

-This chapter provides an entry point for this thesis. First the motivation of exploring this topic is presented. Next are the research questions that will be answered in this thesis are outlined. Lastly the methodology on how to answer these questions will be explained.
+This chapter provides an entry point to this thesis. First, the motivation for exploring this topic is presented. Next, the research questions of this thesis are outlined. Lastly, the methodology used to answer these questions is explained.

 \section{Background and Motivation}
+Optimisation of program code is crucial in many different fields. For example, video games require extensive optimisation to lower their minimum hardware requirements, which allows more people to run them. Computer simulations are another example. For those, optimisation is even more crucial, as it allows scientists to run more detailed simulations or to obtain simulation results faster. Equation learning is another field that can benefit heavily from optimisation. One part of equation learning is the evaluation of the expressions generated by the algorithm. This thesis is concerned with optimising this part to increase the overall performance of the equation learning algorithm. The No Free Lunch theorem, as reviewed by \textcite{adam_no_2019}, states that, averaged over all optimisation problems, all optimisation algorithms perform equally well, so no single optimisation approach is best for every problem. Moreover, optimising a program that runs on a single core eventually leads to a performance plateau, as few further optimisations remain possible. Therefore, such programs need to utilise the other cores of a processor. In some cases, the speed-up achieved by this is still not large enough and another approach is needed. One of these approaches is the utilisation of a Graphics Processing Unit (GPU) to further increase performance. \textcite{michalakes_gpu_2008} have shown a noticeable speed-up when using the GPU for weather simulation. In addition to simulation, GPU acceleration can also be found in other areas such as networking \parencite{han_packetshader_2010} or the structural analysis of buildings \parencite{georgescu_gpu_2013}.

+Given these successful applications of GPU acceleration, this thesis attempts to improve the performance of evaluating mathematical expressions by using GPUs. The baseline for comparison is an expression evaluator running in parallel on the CPU.
+% TODO: Describe in more detail what will be attempted; look at other papers to see what they write in this section.

 \section{Research Question}
 What the research questions are and how they will be answered
--- a/thesis/main.pdf
+++ b/thesis/main.pdf
--- a/thesis/references.bib
+++ b/thesis/references.bib
@ -1,544 +1,317 @@
-%%% Biber accepts '%' for marking comments

-@book{BachBWV988,
-	author={Bach, Johann Sebastian},
-	title={Goldberg-Variationen für Streichquartett, BWV 988},
-	editor={Anka, Dana},
-	publisher={Musikverlag Hans Sikorski},
-	location={Hamburg},
-	date={2017},
-	langid={ngerman}
+@article{besard_rapid_2019,
+	title = {Rapid software prototyping for heterogeneous and distributed platforms},
+	volume = {132},
+	issn = {09659978},
+	url = {https://linkinghub.elsevier.com/retrieve/pii/S0965997818310123},
+	doi = {10.1016/j.advengsoft.2019.02.002},
+	pages = {29--46},
+	journaltitle = {Advances in Engineering Software},
+	shortjournal = {Advances in Engineering Software},
+	author = {Besard, Tim and Churavy, Valentin and Edelman, Alan and De Sutter, Bjorn},
+	urldate = {2024-11-22},
+	date = {2019-06},
+	langid = {english},
+	file = {Volltext:C\:\\Users\\danwi\\Zotero\\storage\\VNWQAR9Q\\Besard et al. - 2019 - Rapid software prototyping for heterogeneous and distributed platforms.pdf:application/pdf},
 }

-@thesis{Bacher2004,
-	author={Bacher, Florian},
-	title={Interaktionsmöglichkeiten mit Bildschirmen und großflächigen Projektionen},
-	type={bathesis},
-	date={2004-06},
-	institution={University of Applied Sciences Upper Austria, Medientechnik und {-design}},
-	location={Hagenberg, Austria},
-	langid={ngerman}
+@article{besard_effective_2019,
+	title = {Effective Extensible Programming: Unleashing Julia on {GPUs}},
+	volume = {30},
+	rights = {https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/{IEEE}.html},
+	issn = {1045-9219, 1558-2183, 2161-9883},
+	url = {https://ieeexplore.ieee.org/document/8471188/},
+	doi = {10.1109/TPDS.2018.2872064},
+	shorttitle = {Effective Extensible Programming},
+	pages = {827--841},
+	number = {4},
+	journaltitle = {{IEEE} Transactions on Parallel and Distributed Systems},
+	shortjournal = {{IEEE} Trans. Parallel Distrib. Syst.},
+	author = {Besard, Tim and Foket, Christophe and De Sutter, Bjorn},
+	urldate = {2024-11-22},
+	date = {2019-04-01},
+	file = {Eingereichte Version:C\:\\Users\\danwi\\Zotero\\storage\\T34I73BI\\Besard et al. - 2019 - Effective Extensible Programming Unleashing Julia on GPUs.pdf:application/pdf},
 }

-@manual{Bezos2023,
-	author={Bezos, Javier and Braams, Johannes L.},
-	title={Babel},
-	subtitle={Localization and internationalization},
-	date={2023-10-25},
-	version={3.96},
-	url={http://mirrors.ctan.org/macros/latex/required/babel/base/babel.pdf},
-	langid={english}
+@inproceedings{lin_comparing_2021,
+	location = {St. Louis, {MO}, {USA}},
+	title = {Comparing Julia to Performance Portable Parallel Programming Models for {HPC}},
+	rights = {https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/{IEEE}.html},
+	isbn = {978-1-6654-1118-9},
+	url = {https://ieeexplore.ieee.org/document/9652798/},
+	doi = {10.1109/PMBS54543.2021.00016},
+	eventtitle = {2021 International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems ({PMBS})},
+	pages = {94--105},
+	booktitle = {2021 International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems ({PMBS})},
+	publisher = {{IEEE}},
+	author = {Lin, Wei-Chen and {McIntosh}-Smith, Simon},
+	urldate = {2024-11-22},
+	date = {2021-11},
+	file = {Eingereichte Version:C\:\\Users\\danwi\\Zotero\\storage\\U6EQPD62\\Lin und McIntosh-Smith - 2021 - Comparing Julia to Performance Portable Parallel Programming Models for HPC.pdf:application/pdf},
 }

-@incollection{BurgeBurger1999,
-	author={Burge, Mark and Burger, Wilhelm},
-	title={Ear Biometrics},
-	booktitle={Biometrics},
-	booksubtitle={Personal Identification in Networked Society},
-	publisher={Kluwer Academic Publishers},
-	date={1999},
-	location={Boston},
-	editor={Jain, Anil K. and Bolle, Ruud and Pankanti, Sharath},
-	chapter={13},
-	pages={273-285},
-	doi={10.1007/0-306-47044-6_13},
-	langid={english}
+@online{nvidia_cuda_2024,
+	title = {{CUDA} C++ Programming Guide},
+	url = {https://docs.nvidia.com/cuda/cuda-c-programming-guide/},
+	author = {{Nvidia}},
+	urldate = {2024-11-22},
+	date = {2024-11},
 }

-@inproceedings{Burger1987,
-	author={Burger, Wilhelm and Bhanu, Bir},
-	title={Qualitative Motion Understanding},
-	booktitle={Proceedings of the Tenth International Joint Conference on Artificial Intelligence},
-	date={1987-08},
-	editor={McDermott, John P.},
-	eventdate={1987-08-23/1987-08-28},
-	venue={Milano},
-	publisher={Morgan Kaufmann Publishers},
-	location={San Francisco},
-	pages={819-821},
-	doi={10.1007/978-1-4615-3566-9},
-	langid={english}
+@article{koster_massively_2020,
+	title = {Massively Parallel Rule-Based Interpreter Execution on {GPUs} Using Thread Compaction},
+	volume = {48},
+	issn = {1573-7640},
+	url = {https://doi.org/10.1007/s10766-020-00670-2},
+	doi = {10.1007/s10766-020-00670-2},
+	abstract = {Interpreters are well researched in the field of compiler construction and program generation. They are typically used to realize program execution of different programming languages without a compilation step. However, they can also be used to model complex rule-based simulations: The interpreter applies all rules one after another. These can be iteratively applied on a globally updated state in order to get the final simulation result. Many simulations for domain-specific problems already leverage the parallel processing capabilities of Graphics Processing Units ({GPUs}). They use hardware-specific tuned rule implementations to achieve maximum performance. However, every interpreter-based system requires a high-level algorithm that detects active rules and determines when they are evaluated. A common approach in this context is the use of different interpreter routines for every problem domain. Executing such functions in an efficient way mainly involves dealing with hardware peculiarities like thread divergences, {ALU} computations and memory operations. Furthermore, the interpreter is often executed on multiple states in parallel these days. This is particularly important for heuristic search or what-if analyses, for instance. In this paper, we present a novel and easy-to-implement method based on thread compaction to realize generic rule-based interpreters in an efficient way on {GPUs}. It is optimized for many states using a specially designed memory layout. Benchmarks on our evaluation scenarios show that the performance can be significantly increased in comparison to existing commonly-used implementations.},
+	pages = {675--691},
+	number = {4},
+	journaltitle = {International Journal of Parallel Programming},
+	shortjournal = {Int J Parallel Prog},
+	author = {Köster, Marcel and Groß, Julian and Krüger, Antonio},
+	urldate = {2024-11-29},
+	date = {2020-08-01},
+	langid = {english},
+	keywords = {{GPU}, Interpreter execution, Memory layout, Thread compaction},
+	file = {Full Text PDF:C\:\\Users\\danwi\\Zotero\\storage\\8ETAIXGL\\Köster et al. - 2020 - Massively Parallel Rule-Based Interpreter Execution on GPUs Using Thread Compaction.pdf:application/pdf},
 }

-@book{BurgerBurge2022,
-	author={Burger, Wilhelm and Burge, Mark James},
-	title={Digital Image Processing},
-	subtitle={An Algorithmic Introduction},
-	publisher={Springer},
-	location={Cham},
-	edition={3},
-	date={2022},
-	doi={10.1007/978-3-031-05744-1},
-	langid={english}
+@inproceedings{krolik_r3d3_2021,
+	title = {r3d3: Optimized Query Compilation on {GPUs}},
+	url = {https://ieeexplore.ieee.org/document/9370323},
+	doi = {10.1109/CGO51591.2021.9370323},
+	shorttitle = {r3d3},
+	abstract = {Query compilation is an effective approach to improve the performance of repeated database queries. {GPU}-based approaches have significant promise, but face difficulties in managing compilation time, data transfer costs, and in addressing a reasonably comprehensive range of {SQL} operations. In this work we describe a hybrid {AoT}/{JIT} approach to {GPU}-based query compilation. We use multiple optimizations to reduce execution, compile, and data transfer times, improving performance over both other {GPU}-based approaches and {CPU}-based query compilers as well. Our design addresses a wide range of {SQL} queries, sufficient to demonstrate the practicality of using {GPUs} for query optimization.},
+	eventtitle = {2021 {IEEE}/{ACM} International Symposium on Code Generation and Optimization ({CGO})},
+	pages = {277--288},
+	booktitle = {2021 {IEEE}/{ACM} International Symposium on Code Generation and Optimization ({CGO})},
+	author = {Krolik, Alexander and Verbrugge, Clark and Hendren, Laurie},
+	urldate = {2024-11-29},
+	date = {2021-02},
+	keywords = {Compilers, Data transfer, Databases, {GPUs}, Graphics processing units, Memory management, Optimization, Query processing, Runtime, {SQL} database queries},
+	file = {Full Text PDF:C\:\\Users\\danwi\\Zotero\\storage\\NJM2FK56\\Krolik et al. - 2021 - r3d3 Optimized Query Compilation on GPUs.pdf:application/pdf;IEEE Xplore Abstract Record:C\:\\Users\\danwi\\Zotero\\storage\\F6KDT83Y\\9370323.html:text/html},
 }

-@manual{Carlisle2021,
-	author={Carlisle, David P.},
-	title={Packages in the {`graphics'} bundle},
-	date={2021-03-05},
-	url={http://mirrors.ctan.org/macros/latex/required/graphics/grfguide.pdf},
-	langid={english}
+@inproceedings{koster_high-performance_2020,
+	location = {Cham},
+	title = {High-Performance Simulations on {GPUs} Using Adaptive Time Steps},
+	isbn = {978-3-030-60245-1},
+	doi = {10.1007/978-3-030-60245-1_26},
+	abstract = {Graphics Processing Units ({GPUs}) are widely spread nowadays due to their parallel processing capabilities. Leveraging these hardware features is particularly important for computationally expensive tasks and workloads. Prominent use cases are optimization problems and simulations that can be parallelized and tuned for these architectures. In the general domain of simulations (numerical and discrete), the overall logic is split into several components that are executed one after another. These components need step-size information which determines the number of steps (e.g. the elapsed time) they have to perform. Small step sizes are often required to ensure a valid simulation result with respect to precision and constraint correctness. Unfortunately, they are often the main bottleneck of the simulation. In this paper, we introduce a new and generic way of realizing high-performance simulations with multiple components using adaptive time steps on {GPUs}. Our method relies on a code-analysis phase that resolves data dependencies between different components. This knowledge is used to generate specially-tuned execution kernels that encapsulate the underlying component logic. An evaluation on our simulation benchmarks shows that we are able to considerably improve runtime performance compared to prior work.},
+	pages = {369--385},
+	booktitle = {Algorithms and Architectures for Parallel Processing},
+	publisher = {Springer International Publishing},
+	author = {Köster, Marcel and Groß, Julian and Krüger, Antonio},
+	editor = {Qiu, Meikang},
+	date = {2020},
+	langid = {english},
+	keywords = {Related-Work},
 }

-@image{CocaCola1940,
-	author={Wolcott, Marion Post},
-	title={Natchez, Miss.},
-	note={Library of Congress Prints and Photographs Division Washington, Farm Security Administration/Office of War Information Color Photographs},
-	date={1940-08},
-	url={https://www.loc.gov/pictures/item/2017877479/},
-	langid={english}
+@inproceedings{koster_macsq_2022,
+	location = {Cham},
+	title = {{MACSQ}: Massively Accelerated {DeepQ} Learning on {GPUs} Using On-the-fly State Construction},
+	isbn = {978-3-030-96772-7},
+	doi = {10.1007/978-3-030-96772-7_35},
+	shorttitle = {{MACSQ}},
+	abstract = {The current trend of using artificial neural networks to solve computationally intensive problems is omnipresent. In this scope, {DeepQ} learning is a common choice for agent-based problems. {DeepQ} combines the concept of Q-Learning with (deep) neural networks to learn different Q-values/matrices based on environmental conditions. Unfortunately, {DeepQ} learning requires hundreds of thousands of iterations/Q-samples that must be generated and learned for large-scale problems. Gathering data sets for such challenging tasks is extremely time consuming and requires large data-storage containers. Consequently, a common solution is the automatic generation of input samples for agent-based {DeepQ} networks. However, a usual workflow is to create the samples separately from the training process in either a (set of) pre-processing step(s) or interleaved with the training process. This requires the input Q-samples to be materialized in order to be fed into the training step of the attached neural network. In this paper, we propose a new {GPU}-focussed method for on-the-fly generation of training samples tightly coupled with the training process itself. This allows us to skip the materialization process of all samples (e.g. avoid dumping them disk), as they are (re)constructed when needed. Our method significantly outperforms usual workflows that generate the input samples on the {CPU} in terms of runtime performance and memory/storage consumption.},
+	pages = {383--395},
+	booktitle = {Parallel and Distributed Computing, Applications and Technologies},
+	publisher = {Springer International Publishing},
+	author = {Köster, Marcel and Groß, Julian and Krüger, Antonio},
+	editor = {Shen, Hong and Sang, Yingpeng and Zhang, Yong and Xiao, Nong and Arabnia, Hamid R. and Fox, Geoffrey and Gupta, Ajay and Malek, Manu},
+	date = {2022},
+	langid = {english},
+	keywords = {Related Work},
 }

-@unpublished{Dai2016,
-	author={Dai, Jifeng and Li,Yi and He, Kaiming and Sun, Jian},
-	title={{R-FCN:} Object Detection via Region-Based Fully Convolutional Networks},
-	date={2016},
-	pubstate={prepublished},
-	doi={10.48550/arXiv.1605.06409},
-	langid={english}
+@inproceedings{dietz_mimd_2010,
+	location = {Berlin, Heidelberg},
+	title = {{MIMD} Interpretation on a {GPU}},
+	isbn = {978-3-642-13374-9},
+	doi = {10.1007/978-3-642-13374-9_5},
+	abstract = {Programming heterogeneous parallel computer systems is notoriously difficult, but {MIMD} models have proven to be portable across multi-core processors, clusters, and massively parallel systems. It would be highly desirable for {GPUs} (Graphics Processing Units) also to be able to leverage algorithms and programming tools designed for {MIMD} targets. Unfortunately, most {GPU} hardware implements a very restrictive multi-threaded {SIMD}-based execution model.},
+	pages = {65--79},
+	booktitle = {Languages and Compilers for Parallel Computing},
+	publisher = {Springer},
+	author = {Dietz, Henry G. and Young, B. Dalton},
+	editor = {Gao, Guang R. and Pollock, Lori L. and Cavazos, John and Li, Xiaoming},
+	date = {2010},
+	langid = {english},
+	keywords = {Hilfreich},
 }

-@manual{Daniel2018,
-	author={Daniel, Marco and Gundlach, Patrick and Schmidt, Walter and Knappen, Jörg and Partl, Hubert and Hyna, Irene},
-	title={\LaTeX2e-Kurzbeschreibung},
-	date={2018-04-08},
-	version={3.0c},
-	url={http://mirrors.ctan.org/info/lshort/german/l2kurz.pdf},
-	langid={ngerman}
+@inproceedings{langdon_simd_2008,
+	location = {Berlin, Heidelberg},
+	title = {A {SIMD} Interpreter for Genetic Programming on {GPU} Graphics Cards},
+	isbn = {978-3-540-78671-9},
+	doi = {10.1007/978-3-540-78671-9_7},
+	abstract = {Mackey-Glass chaotic time series prediction and nuclear protein classification show the feasibility of evaluating genetic programming populations directly on parallel consumer gaming graphics processing units. Using a Linux {KDE} computer equipped with an {nVidia} {GeForce} 8800 {GTX} graphics processing unit card the C++ {SPMD} interpretter evolves programs at Giga {GP} operations per second (895 million {GPops}). We use the {RapidMind} general processing on {GPU} ({GPGPU}) framework to evaluate an entire population of a quarter of a million individual programs on a non-trivial problem in 4 seconds. An efficient reverse polish notation ({RPN}) tree based {GP} is given.},
+	pages = {73--85},
+	booktitle = {Genetic Programming},
+	publisher = {Springer},
+	author = {Langdon, W. B. and Banzhaf, Wolfgang},
+	editor = {O’Neill, Michael and Vanneschi, Leonardo and Gustafson, Steven and Esparcia Alcázar, Anna Isabel and De Falco, Ivanoe and Della Cioppa, Antonio and Tarantino, Ernesto},
+	date = {2008},
+	langid = {english},
+	keywords = {Hilfreich},
 }

-@report{Drake1948,
-	author={Drake, Hubert M. and McLaughlin, Milton D. and Goodman, Harold R.},
-	title={Results obtained during accelerated transonic tests of the {Bell} {XS-1} airplane in flights to a {MACH} number of 0.92},
-	type={techreport},
-	institution={NASA Dryden Flight Research Center},
-	date={1948-01},
-	location={Edwards, CA},
-	number={NACA-RM-L8A05A},
-	url={https://www.nasa.gov/centers/dryden/pdf/87528main_RM-L8A05A.pdf},
-	langid={english}
+@inproceedings{cano_gpu-parallel_2014,
+	location = {New York, {NY}, {USA}},
+	title = {{GPU}-parallel subtree interpreter for genetic programming},
+	isbn = {978-1-4503-2662-9},
+	url = {https://dl.acm.org/doi/10.1145/2576768.2598272},
+	doi = {10.1145/2576768.2598272},
+	series = {{GECCO} '14},
+	abstract = {Genetic Programming ({GP}) is a computationally intensive technique but its nature is embarrassingly parallel. Graphic Processing Units ({GPUs}) are many-core architectures which have been widely employed to speed up the evaluation of {GP}. In recent years, many works have shown the high performance and efficiency of {GPUs} on evaluating both the individuals and the fitness cases in parallel. These approaches are known as population parallel and data parallel. This paper presents a parallel {GP} interpreter which extends these approaches and adds a new parallelization level based on the concurrent evaluation of the individual's subtrees. A {GP} individual defined by a tree structure with nodes and branches comprises different depth levels in which there are independent subtrees which can be evaluated concurrently. Threads can cooperate to evaluate different subtrees and share the results via {GPU}'s shared memory. The experimental results show the better performance of the proposal in terms of the {GP} operations per second ({GPops}/s) that the {GP} interpreter is capable of processing, achieving up to 21 billion {GPops}/s using a {NVIDIA} 480 {GPU}. However, some issues raised due to limitations of currently available hardware are to be overcomed by the dynamic parallelization capabilities of the next generation of {GPUs}.},
+	pages = {887--894},
+	booktitle = {Proceedings of the 2014 Annual Conference on Genetic and Evolutionary Computation},
+	publisher = {Association for Computing Machinery},
+	author = {Cano, Alberto and Ventura, Sebastian},
+	urldate = {2024-11-28},
+	date = {2014-07-12},
+	keywords = {Hilfreich},
+	file = {Full Text PDF:C\:\\Users\\danwi\\Zotero\\storage\\NYV739K8\\Cano und Ventura - 2014 - GPU-parallel subtree interpreter for genetic programming.pdf:application/pdf},
 }

-@book{Duden1997,
-	author={Friedrich, Christoph},
-	title={Schriftliche Arbeiten im technisch-natur\-wissen\-schaft\-lichen Studium}, 
-	subtitle={Ein Leitfaden zur effektiven Erstellung und zum Einsatz moderner Arbeitsmethoden},
-	publisher={Bibliographisches Institut},
-	series={Duden Taschenbücher},
-	volume={27},
-	location={Mannheim},
-	date={1997},
-	langid={ngerman}
+@inproceedings{pfahler_semantic_2020,
+	location = {New York, {NY}, {USA}},
+	title = {Semantic Search in Millions of Equations},
+	isbn = {978-1-4503-7998-4},
+	url = {https://dl.acm.org/doi/10.1145/3394486.3403056},
+	doi = {10.1145/3394486.3403056},
+	series = {{KDD} '20},
+	abstract = {Given the increase of publications, search for relevant papers becomes tedious. In particular, search across disciplines or schools of thinking is not supported. This is mainly due to the retrieval with keyword queries: technical terms differ in different sciences or at different times. Relevant articles might better be identified by their mathematical problem descriptions. Just looking at the equations in a paper already gives a hint to whether the paper is relevant. Hence, we propose a new approach for retrieval of mathematical expressions based on machine learning. We design an unsupervised representation learning task that combines embedding learning with self-supervised learning. Using graph convolutional neural networks we embed mathematical expression into low-dimensional vector spaces that allow efficient nearest neighbor queries. To train our models, we collect a huge dataset with over 29 million mathematical expressions from over 900,000 publications published on {arXiv}.org. The math is converted into an {XML} format, which we view as graph data. Our empirical evaluations involving a new dataset of manually annotated search queries show the benefits of using embedding models for mathematical retrieval.},
+	pages = {135--143},
+	booktitle = {Proceedings of the 26th {ACM} {SIGKDD} International Conference on Knowledge Discovery \& Data Mining},
+	publisher = {Association for Computing Machinery},
+	author = {Pfahler, Lukas and Morik, Katharina},
+	urldate = {2024-11-30},
+	date = {2020-08-20},
+	file = {Full Text PDF:C\:\\Users\\danwi\\Zotero\\storage\\TQBLKG25\\Pfahler und Morik - 2020 - Semantic Search in Millions of Equations.pdf:application/pdf},
 }

-@thesis{Eberl1987,
-	author={Eberl, Gerhard},
-	title={Automatischer Landeanflug durch Rechnersehen},
-	type={phdthesis},
-	date={1987-08},
-	institution={Universität der Bundeswehr, Fakultät für Raum- und Luftfahrttechnik},
-	location={München},
-	langid={ngerman}
+@misc{werner_informed_2021,
+	title = {Informed Equation Learning},
+	url = {http://arxiv.org/abs/2105.06331},
+	doi = {10.48550/arXiv.2105.06331},
+	abstract = {Distilling data into compact and interpretable analytic equations is one of the goals of science. Instead, contemporary supervised machine learning methods mostly produce unstructured and dense maps from input to output. Particularly in deep learning, this property is owed to the generic nature of simple standard link functions. To learn equations rather than maps, standard non-linearities can be replaced with structured building blocks of atomic functions. However, without strong priors on sparsity and structure, representational complexity and numerical conditioning limit this direct approach. To scale to realistic settings in science and engineering, we propose an informed equation learning system. It provides a way to incorporate expert knowledge about what are permitted or prohibited equation components, as well as a domain-dependent structured sparsity prior. Our system then utilizes a robust method to learn equations with atomic functions exhibiting singularities, as e.g. logarithm and division. We demonstrate several artificial and real-world experiments from the engineering domain, in which our system learns interpretable models of high predictive power.},
+	number = {{arXiv}:2105.06331},
+	publisher = {{arXiv}},
+	author = {Werner, Matthias and Junginger, Andrej and Hennig, Philipp and Martius, Georg},
+	urldate = {2024-11-30},
+	date = {2021-05-13},
+	eprinttype = {arxiv},
+	eprint = {2105.06331},
+	keywords = {Computer Science - Machine Learning},
+	file = {Preprint PDF:C\:\\Users\\danwi\\Zotero\\storage\\HEYBR254\\Werner et al. - 2021 - Informed Equation Learning.pdf:application/pdf},
 }

-@legislation{EuRichtlinie2000,
-	author={{Europäische Union}},
-	title={Richtline 2000/14/EG des Europäischen Parlaments und des Rates vom 8.\ Mai 2000 zur 
-		Angleichung der Rechtsvorschriften der Mitgliedstaaten über umweltbelastende Geräuschemissionen 
-		von zur Verwendung im Freien vorgesehenen Geräten und Maschinen},
-	howpublished={Amtsblatt der Europäischen Gemeinschaften, L 162},
-	date={2000-05-08},
-	url={https://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=CONSLEG:2000L0014:20051227:de:PDF},
-	langid={ngerman}
+@article{memarzia_-depth_2015,
+	title = {An In-depth Study on the Performance Impact of {CUDA}, {OpenCL}, and {PTX} Code},
+	volume = {10},
+	abstract = {In recent years, the rise of {GPGPU} as a viable solution for high performance computing has been accompanied by fresh challenges for developers. Chief among these challenges is efficiently harnessing the formidable power of the {GPU} and finding performance bottlenecks. Many factors play a role in a {GPU} application’s performance. This creates the need for studies performance comparisons, and ways to analyze programs from a fundamental level. With that in mind, our goal is to present an in-depth performance comparison of the {CUDA} and {OpenCL} platforms, and study how {PTX} code can affect performance. In order to achieve this goal, we explore the subject from three different angles: kernel execution times, data transfers that occur between the host and device, and the {PTX} code that is generated by each platform’s compiler. We carry out our experiments using ten real-world {GPU} kernels from the digital image processing domain, a selection of variable input data sizes, and a pair of {GPUs} based on the Nvidia Fermi and Kepler architectures. We show how {PTX} statistics and analysis can be used to provide further insight on performance discrepancies and bottlenecks. Our results indicate that, in an unbiased comparison such as this one, the {OpenCL} and {CUDA} platforms are essentially similar in terms of performance.},
+	author = {Memarzia, Puya and Khunjush, Farshad},
+	date = {2015},
+	langid = {english},
+	file = {PDF:C\:\\Users\\danwi\\Zotero\\storage\\GKAYMMNN\\Memarzia und Khunjush - 2015 - An In-depth Study on the Performance Impact of CUDA, OpenCL, and PTX Code.pdf:application/pdf},
 }

-@manual{Fear2020,
-	author={Fear, Simon},
-	title={Publication quality tables in \LaTeX},
-	date={2020-01-14},
-	version={v1.61803398},
-	url={http://mirrors.ctan.org/macros/latex/contrib/booktabs/booktabs.pdf},
-	langid={english}
+@online{noauthor_-depth_nodate,
+	title = {An In-depth Study on the Performance Impact of {CUDA}, {OpenCL}, and {PTX} Code},
+	url = {https://www.global-sci.org/intro/article_detail.html?journal=undefined&article_id=22555},
+	urldate = {2024-12-01},
+	file = {An In-depth Study on the Performance Impact of CUDA, OpenCL, and PTX Code:C\:\\Users\\danwi\\Zotero\\storage\\7CPIZPCF\\article_detail.html:text/html},
 }

-@book{Faires1934,
-	author={Faires, Virgil Moring},
-	title={Design of Machine Elements},
-	publisher={The Macmillan Company},
-	date={1934},
-	origdate={1920},
-	note={Originalausgabe 1920},
-	langid={english}
+@article{bastidas_fuertes_transpiler-based_2023,
+	title = {Transpiler-Based Architecture Design Model for Back-End Layers in Software Development},
+	volume = {13},
+	rights = {http://creativecommons.org/licenses/by/3.0/},
+	issn = {2076-3417},
+	url = {https://www.mdpi.com/2076-3417/13/20/11371},
+	doi = {10.3390/app132011371},
+	abstract = {The utilization of software architectures and designs is widespread in software development, offering conceptual frameworks to address recurring challenges. A transpiler is a tool that automatically converts source code from one high-level programming language to another, ensuring algorithmic equivalence. This study introduces an innovative software architecture design model that integrates transpilers into the back-end layer, enabling the automatic transformation of business logic and back-end components from a single source code (the coding artifact) into diverse equivalent versions using distinct programming languages (the automatically produced code). This work encompasses both abstract and detailed design aspects, covering the proposal, automated processes, layered design, development environment, nest implementations, and cross-cutting components. In addition, it defines the main target audiences, discusses pros and cons, examines their relationships with prevalent design paradigms, addresses considerations about compatibility and debugging, and emphasizes the pivotal role of the transpiler. An empirical experiment involving the practical application of this model was conducted by implementing a collaborative to-do list application. This paper comprehensively outlines the relevant methodological approach, strategic planning, precise execution, observed outcomes, and insightful reflections while underscoring the the model’s pragmatic viability and highlighting its relevance across various software development contexts. Our contribution aims to enrich the field of software architecture design by introducing a new way of designing multi-programming-language software.},
+	pages = {11371},
+	number = {20},
+	journaltitle = {Applied Sciences},
+	author = {Bastidas Fuertes, Andrés and Pérez, María and Meza, Jaime},
+	urldate = {2025-01-03},
+	date = {2023-01},
+	langid = {english},
+	note = {Number: 20
+Publisher: Multidisciplinary Digital Publishing Institute},
+	keywords = {back-end layers, design model, software architecture, software development, source-to-source transformations, transpiler},
+	file = {Full Text PDF:C\:\\Users\\danwi\\Zotero\\storage\\AD55DPJ4\\Bastidas Fuertes et al. - 2023 - Transpiler-Based Architecture Design Model for Back-End Layers in Software Development.pdf:application/pdf},
 }

-@online{Feder2006,
-	author={Feder, Alexander},
-	title={{BibTeX.org}},
-	date={2006},
-	url={https://www.bibtex.org/},
-	urldate={2023-11-06},
-	langid={english}
+@incollection{adam_no_2019,
+	location = {Cham},
+	title = {No Free Lunch Theorem: A Review},
+	isbn = {978-3-030-12767-1},
+	url = {https://doi.org/10.1007/978-3-030-12767-1_5},
+	shorttitle = {No Free Lunch Theorem},
+	abstract = {The “No Free Lunch” theorem states that, averaged over all optimization problems, without re-sampling, all optimization algorithms perform equally well. Optimization, search, and supervised learning are the areas that have benefited more from this important theoretical concept. Formulation of the initial No Free Lunch theorem, very soon, gave rise to a number of research works which resulted in a suite of theorems that define an entire research field with significant results in other scientific areas where successfully exploring a search space is an essential and critical task. The objective of this paper is to go through the main research efforts that contributed to this research field, reveal the main issues, and disclose those points that are helpful in understanding the hypotheses, the restrictions, or even the inability of applying No Free Lunch theorems.},
+	pages = {57--82},
+	booktitle = {Approximation and Optimization: Algorithms, Complexity and Applications},
+	publisher = {Springer International Publishing},
+	author = {Adam, Stavros P. and Alexandropoulos, Stamatios-Aggelos N. and Pardalos, Panos M. and Vrahatis, Michael N.},
+	editor = {Demetriou, Ioannis C. and Pardalos, Panos M.},
+	urldate = {2025-02-14},
+	date = {2019},
+	langid = {english},
+	doi = {10.1007/978-3-030-12767-1_5},
 }

-@legislation{FhStG1993,
-	title={Bundesgesetz über Fachhochschulen},
-	titleaddon={Fachhochschulgesetz – FHG},
-	howpublished={BGBl.\ Nr.\ 340/1993, zuletzt geändert mit Bundesgesetz BGBl.\ I Nr.\ 177/2021},
-	date={1993-05-28},
-	url={https://www.ris.bka.gv.at/GeltendeFassung.wxe?Abfrage=Bundesnormen&Gesetzesnummer=10009895},
-	langid={ngerman}
+@inproceedings{michalakes_gpu_2008,
+	title = {{GPU} acceleration of numerical weather prediction},
+	url = {https://ieeexplore.ieee.org/abstract/document/4536351},
+	doi = {10.1109/IPDPS.2008.4536351},
+	abstract = {Weather and climate prediction software has enjoyed the benefits of exponentially increasing processor power for almost 50 years. Even with the advent of large-scale parallelism in weather models, much of the performance increase has come from increasing processor speed rather than increased parallelism. This free ride is nearly over. Recent results also indicate that simply increasing the use of large-scale parallelism will prove ineffective for many scenarios. We present an alternative method of scaling model performance by exploiting emerging architectures using the fine-grain parallelism once used in vector machines. The paper shows the promise of this approach by demonstrating a 20 times speedup for a computationally intensive portion of the Weather Research and Forecast ({WRF}) model on an {NVIDIA} 8800 {GTX} graphics processing unit ({GPU}). We expect an overall 1.3 times speedup from this change alone.},
+	eventtitle = {2008 {IEEE} International Symposium on Parallel and Distributed Processing},
+	pages = {1--7},
+	booktitle = {2008 {IEEE} International Symposium on Parallel and Distributed Processing},
+	author = {Michalakes, John and Vachharajani, Manish},
+	urldate = {2025-02-14},
+	date = {2008-04},
+	note = {{ISSN}: 1530-2075},
+	keywords = {Acceleration, Bandwidth, Computer architecture, Concurrent computing, Graphics, Large-scale systems, Parallel processing, Predictive models, Weather forecasting, Yarn},
+	file = {Full Text PDF:C\:\\Users\\danwi\\Zotero\\storage\\ZFEVRLEZ\\Michalakes und Vachharajani - 2008 - GPU acceleration of numerical weather prediction.pdf:application/pdf;IEEE Xplore Abstract Record:C\:\\Users\\danwi\\Zotero\\storage\\PYY4F7JB\\4536351.html:text/html},
 }

-@video{Futurama1999,
-	author={Groening, Matt},
-	title={Futurama},
-	titleaddon={Season 1 Collection},
-	howpublished={DVD},
-	date={2002-02},
-	organization={Twentieth Century Fox Home Entertainment},
-	langid={english}
+@article{han_packetshader_2010,
+	title = {{PacketShader}: a {GPU}-accelerated software router},
+	volume = {40},
+	issn = {0146-4833},
+	url = {https://doi.org/10.1145/1851275.1851207},
+	doi = {10.1145/1851275.1851207},
+	shorttitle = {{PacketShader}},
+	abstract = {We present {PacketShader}, a high-performance software router framework for general packet processing with Graphics Processing Unit ({GPU}) acceleration. {PacketShader} exploits the massively-parallel processing power of {GPU} to address the {CPU} bottleneck in current software routers. Combined with our high-performance packet I/O engine, {PacketShader} outperforms existing software routers by more than a factor of four, forwarding 64B {IPv}4 packets at 39 Gbps on a single commodity {PC}. We have implemented {IPv}4 and {IPv}6 forwarding, {OpenFlow} switching, and {IPsec} tunneling to demonstrate the flexibility and performance advantage of {PacketShader}. The evaluation results show that {GPU} brings significantly higher throughput over the {CPU}-only implementation, confirming the effectiveness of {GPU} for computation and memory-intensive operations in packet processing.},
+	pages = {195--206},
+	number = {4},
+	journaltitle = {{SIGCOMM} Comput. Commun. Rev.},
+	author = {Han, Sangjin and Jang, Keon and Park, {KyoungSoo} and Moon, Sue},
+	urldate = {2025-02-14},
+	date = {2010-08-30},
 }

-@incollection{GershwinSummertime,
-	author={Gershwin, George and Heyward, DuBose},
-	title={Summertime},
-	booktitle={The Greatest Songs of George Gershwin},
-	publisher={Chappel Music},
-	location={London},
-	pages={40-43},
-	date={1979},
-	langid={english}
-}
-
-@book{HaydnCelloConcerto2,
-	author={Haydn, Josef},
-	title={Konzert für Violoncello No.\ 2 in D-Dur, Hob.VIIb:2},
-	editor={Soldan,Kurt},
-	publisher={C. F. Peters},
-	location={Leipzig},
-	date={1920},
-	langid={ngerman}
-}
-
-@book{Hemleben1969,
-	author={Hemleben, Johannes},
-	title={Galilei, Galileo},
-	publisher={rororo},
-	date={1969},
-	edition={20},
-	langid={ngerman}
-}
-
-@book{Higham2020,
-	author={Higham, Nicholas J.},
-	title={Handbook of Writing for the Mathematical Sciences},
-	publisher={Society for Industrial and Applied Mathematics (SIAM)},
-	location={Philadelphia},
-	edition={3},
-	date={2020},
-	url={https://www.maths.manchester.ac.uk/~higham/hwms/},
-	langid={english}
-}
-
-@video{HistoryOfComputers2008,
-	title={History of Computers},
-	date={2008-09-24},
-	url={https://www.youtube.com/watch?v=LvKxJ3bQRKE},
-	langid={english}
-}
-
-@online{IBM360,
-	author={IBM},
-	title={System 360},
-	subtitle={From Computers to Computer Systems},
-	date={2012-03-07},
-	url={https://www.ibm.com/ibm/history/ibm100/us/en/icons/system360/impacts/},
-	urldate={2023-11-06},
-	langid={english}
-}
-
-@manual{Kime2023,
-	author={Kime, Philip and Wemheuer, Moritz and Lehman, Philipp},
-	title={The \texttt{biblatex} Package},
-	subtitle={Programmable Bibliographies and Citations},
-	date={2023-03-05},
-	version={3.19},
-	url={http://mirrors.ctan.org/macros/latex/contrib/biblatex/doc/biblatex.pdf},
-	langid={english}
-}
-
-@book{Kopka2003,
-	author={Kopka, Helmut and Daly, Patrick William},
-	title={Guide to \LaTeX},
-	series={Tools and Techniques for Computer Typesetting},
-	publisher={Addison-Wesley},
-	location={Reading, MA},
-	date={2003},
-	edition={4},
-	langid={english}
-}
-
-@book{Lamport1994,
-	author={Lamport, Leslie},
-	title={{LaTeX}, A Document Preparation System},
-	subtitle={User's Guide and Reference Manual},
-	publisher={Addison-Wesley},
-	location={Reading, MA},
-	date={1994},
-	edition={2},
-	langid={english}
-}
-
-@book{Lamport1995,
-	author={Lamport, Leslie},
-	title={Das {LaTeX}-Handbuch},
-	publisher={Addison-Wesley},
-	location={Reading, MA},
-	date={1995},
-	edition={3},
-	langid={ngerman}
-}
-
-@software{LegendOfZelda1998,
-	author={Miyamoto, Shigeru and Aonuma, Eiji and Koizumi, Yoshiaki},
-	title={The Legend of Zelda: Ocarina of Time},
-	howpublished={N64 Cartridge},
-	publisher={Nintendo},
-	date={1998-11},
-	langid={english}
-}
-
-@thesis{Loimayr2019,
-	author={Loimayr, Nora},
-	title={Utilization of GPU-Based Smoothed Particle Hydrodynamics for Immersive Audiovisal Experiences},
-	type={mathesis},
-	date={2019-11-26},
-	institution={University of Applied Sciences Upper Austria, Interactive Media},
-	location={Hagenberg, Austria},
-	url={https://theses.fh-hagenberg.at/thesis/Loimayr19},
-	langid={english}
-}
-
-@article{Mermin1989,
-	author={Mermin, Nathaniel David},
-	title={What's Wrong with these Equations?},
-	journaltitle={Physics Today},
-	volume={42},
-	number={10},
-	date={1989},
-	pages={9-11},
-	doi={10.1063/1.2811173},
-	langid={english}
-}
-
-@manual{Mittelbach2023,
-	author={Mittelbach, Frank and Schöpf, Rainer and Downes, Michael and Jones, David M. and Carlisle, David},
-	title={The \texttt{amsmath} package},
-	date={2023-05-13},
-	version={2.17o},
-	url={http://mirrors.ctan.org/macros/latex/required/amsmath/amsmath.pdf},
-	langid={english}
-}
-
-@movie{Nosferatu1922,
-	title={Nosferatu -- A Symphony of Horrors},
-	howpublished={Film},
-	date={1922},
-	note={Drehbuch/Regie: F.\ W.\ Murnau. Mit Max Schreck, Gustav von Wangenheim, Greta Schröder.},
-	langid={english}
-}
-
-@manual{Oetiker2021,
-	author={Oetiker, Tobias and Partl, Hubert and Hyna, Irene and Schlegl, Elisabeth},
-	title={The Not So Short Introduction to \LaTeXe},
-	subtitle={Or \LaTeXe in 139 minutes},
-	date={2021-03-09},
-	version={6.4},
-	url={http://mirrors.ctan.org/info/lshort/english/lshort.pdf},
-	langid={english}
-}
-
-@legislation{OoeRaumordnungsgesetz1994,
-	title={Landesgesetz vom 6. Oktober 1993 über die Raumordnung im Land Oberösterreich},
-	titleaddon={Oö. Raumordnungsgesetz 1994 - Oö. ROG 1994},
-	howpublished={LGBl.Nr. 114/1993 zuletzt geändert durch LGBl.Nr. 125/2020},
-	date={1993-12-23},
-	url={https://www.ris.bka.gv.at/GeltendeFassung.wxe?Abfrage=LrOO&Gesetzesnummer=10000370},
-	langid={ngerman}
-}
-
-@manual{Oostrum2022,
-	author={Oostrum, {Pieter van}},
-	title={The \textrm{fancyhdr} and \textrm{extramarks} packages},
-	date={2022-11-09},
-	version={4.1},
-	url={http://mirrors.ctan.org/macros/latex/contrib/fancyhdr/fancyhdr.pdf},
-	langid={english}
-}
-
-@manual{Pakin2021,
-	author={Pakin, Scott},
-	title={The Comprehensive {\LaTeX} Symbol List},
-	date={2021-05-05},
-	url={http://mirrors.ctan.org/info/symbols/comprehensive/symbols-a4.pdf},
-	langid={english}
-}
-
-@manual{Patashnik1988,
-	author={Patashnik, Oren},
-	title={{BiBTeXing}},
-	date={1988-02-08},
-	url={http://mirrors.ctan.org/biblio/bibtex/base/btxdoc.pdf},
-	langid={english}
-}
-
-@patent{Pike2008,
-	author={Pike, Dion},
-	title={Master-slave communications system and method for a network element},
-	type={US Patent},
-	holder={Alcatel-Lucent SAS},
-	number={7,460,482},
-	date={2008-12-02},
-	url={https://patents.google.com/patent/US7460482},
-	langid={english}
-}
-
-@movie{Psycho1960,
-	title={Psycho},
-	howpublished={Film},
-	date={1960},
-	note={Regie: Alfred Hitchcock, 
-		Drehbuch: Joseph Stefano.
-		Nach dem Roman von Robert Bloch. 
-		Mit Anthony Perkins, Vera Miles, Janet Leigh.},
-	langid={english}
-}
-
-@book{Sedgewick2011,
-	author={Sedgewick, Robert and Wayne, Kevin},
-	title={Algorithms},
-	publisher={Addison-Wesley},
-	location={Reading, MA},
-	date={2011},
-	edition={4},
-	langid={english}
-}
-
-@book{ShostakovichOp110,
-	author={Shostakovich, Dimitri},
-	title={Streichquartett Nr.\ 8 in c-Moll, Op.\ 110},
-	editor={Hans Sikorski},
-	publisher={G. Schirmer},
-	location={New York},
-	date={1960},
-	langid={ngerman}
-}
-
-@manual{Sommerfeldt2023,
-	author={Sommerfeldt, Axel},
-	title={Customizing captions of floating environments},
-	date={2023-07-10},
-	version={3.6},
-	url={http://mirrors.ctan.org/macros/latex/contrib/caption/caption.pdf},
-	langid={english}
-}
-
-@software{SpringFramework,
-	title={Spring Framework},
-	url={https://github.com/spring-projects/spring-framework},
-	langid={english}
-}
-
-@article{Vardavoulia2001,
-	author={Vardavoulia, Maria I. and Andreadis, Ioannis and Tsalides, Phillipos},
-	title={A new vector median filter for colour image processing},
-	journaltitle={Pattern Recognition Letters},
-	volume={22},
-	number={6-7},
-	pages={675-689},
-	date={2001},
-	doi={10.1016/S0167-8655(00)00141-0},
-	langid={english}
-}
-
-@manual{Voss2014,
-	author={Voß, Herbert},
-	title={Math mode},
-	date={2014-01-30},
-	version={2.47},
-	url={http://mirrors.ctan.org/obsolete/info/math/voss/mathmode/Mathmode.pdf},
-	langid={english}
-}
-
-@standard{WHATWGHTMLLivingStandard,
-	author={{Web Hypertext Application Technology Working Group}},
-	shortauthor={WHATWG},
-	title={HTML},
-	titleaddon={Living Standard},
-	date={2023-11-06},
-	url={https://html.spec.whatwg.org/multipage/},
-	langid={english}
-}
-
-@online{WikiReliquienschrein2023,
-	title={Reliquienschrein},
-	url={https://de.wikipedia.org/wiki/Reliquienschrein},
-	date={2023-09-22},
-	urldate={2023-11-06},
-	langid={ngerman}
-}
-
-@online{WikibooksLaTeXLengths2018,
-	title={LaTeX/Lengths},
-	url={https://en.wikibooks.org/wiki/LaTeX/Lengths},
-	date={2018-08-04},
-	urldate={2023-11-06},
-	langid={english}
-}
-
-@audio{Zappa1995,
-	author={Zappa, Frank},
-	title={Freak Out!},
-	type={audiocd},
-	date={1995-05},
-	organization={Rykodisc, New York},
-	langid={english}
-}
-
-%% used in thesis proposal example:
-
-@inproceedings{Finke2008,
-	author={Finke, Matthias and Tang, Anthony and Leung, Rock and Blackstock, Michael},
-	title={Lessons Learned: Game Design for Large Public Displays},
-	booktitle={Proceedings of the 3rd International Conference on Digital Interactive Media in Entertainment and Arts},
-	series={DIMEA '08},
-	pages={26-33},
-	date={2008},
-	publisher={ACM},
-	location={New York, NY, USA},
-	doi={10.1145/1413634.1413644},
-	langid={english}
-}
-
-@inproceedings{Hochleitner2013,
-	author={Hochleitner, Wolfgang and Lankes, Michael and Diephuis, Jeremiah and Hochleitner, Christina},
-	title={Limelight -- Fostering Sociability in a Co-located Game},
-	booktitle={Proceedings of the CHI 2013 Workshop on Designing and Evaluating Sociability in Online Video Games},
-	series={CHI '13},
-	pages={23-28},
-	date={2013},
-	location={Paris, France},
-	langid={english}
-}
-
-@book{Schell2019,
-	author={Schell, Jesse},
-	title={The Art of Game Design},
-	subtitle={A Book of Lenses},
-	publisher={CRC Press},
-	date={2019},
-	edition={3},
-	location={Boca Raton, FL, USA},
-	doi={10.1201/b22101},
-	langid={english}
+@article{georgescu_gpu_2013,
+	title = {{GPU} Acceleration for {FEM}-Based Structural Analysis},
+	volume = {20},
+	issn = {1886-1784},
+	url = {https://doi.org/10.1007/s11831-013-9082-8},
+	doi = {10.1007/s11831-013-9082-8},
+	abstract = {Graphic Processing Units ({GPUs}) have greatly exceeded their initial role of graphics accelerators and have taken a new role of co-processors for computation-heavy tasks. Both hardware and software ecosystems have now matured, with fully {IEEE} compliant double precision and memory correction being supported and a rich set of software tools and libraries being available. This in turn has led to their increased adoption in a growing number of fields, both in academia and, more recently, in industry. In this review we investigate the adoption of {GPUs} as accelerators in the field of Finite Element Structural Analysis, a design tool that is now essential in many branches of engineering. We survey the work that has been done in accelerating the most time consuming steps of the analysis, indicate the speedup that has been achieved and, where available, highlight software libraries and packages that will enable the reader to take advantage of such acceleration. Overall, we try to draw a high level picture of where the state of the art is currently at.},
+	pages = {111--121},
+	number = {2},
+	journaltitle = {Archives of Computational Methods in Engineering},
+	shortjournal = {Arch Computat Methods Eng},
+	author = {Georgescu, Serban and Chow, Peter and Okuda, Hiroshi},
+	urldate = {2025-02-14},
+	date = {2013-06-01},
+	langid = {english},
+	keywords = {Compute Unify Device Architecture, Element Stiffness Matrice, Global Stiffness Matrix, Iterative Solver, Matrix Solver},
+	file = {Full Text PDF:C\:\\Users\\danwi\\Zotero\\storage\\352VGH3Y\\Georgescu et al. - 2013 - GPU Acceleration for FEM-Based Structural Analysis.pdf:application/pdf},
 }