implementation: finished pre-processing section; updated code

Daniel 2025-04-26 13:46:23 +02:00
parent ad2eab2e0a
commit e571fa5bd6
10 changed files with 238 additions and 46 deletions


@@ -0,0 +1,174 @@
<mxfile host="app.diagrams.net" agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:137.0) Gecko/20100101 Firefox/137.0" version="26.2.14">
<diagram name="Page-1" id="6PRo98IcIigsbWnrE1av">
<mxGraphModel dx="1181" dy="655" grid="1" gridSize="10" guides="1" tooltips="1" connect="1" arrows="1" fold="1" page="1" pageScale="1" pageWidth="1169" pageHeight="827" math="0" shadow="0">
<root>
<mxCell id="0" />
<mxCell id="1" parent="0" />
<mxCell id="gfXG8frgiKgzaB5gouxS-22" value="Interpreter" style="rounded=0;whiteSpace=wrap;html=1;" parent="1" vertex="1">
<mxGeometry x="250" y="60" width="100" height="40" as="geometry" />
</mxCell>
<mxCell id="hKyrbmUfddmyC9NB2b_t-1" value="Pre-Processing" style="rounded=0;whiteSpace=wrap;html=1;" vertex="1" parent="1">
<mxGeometry x="500" y="60" width="90" height="40" as="geometry" />
</mxCell>
<mxCell id="hKyrbmUfddmyC9NB2b_t-2" value="GPU" style="rounded=0;whiteSpace=wrap;html=1;" vertex="1" parent="1">
<mxGeometry x="640" y="60" width="90" height="40" as="geometry" />
</mxCell>
<mxCell id="hKyrbmUfddmyC9NB2b_t-3" value="" style="html=1;points=[[0,0,0,0,5],[0,1,0,0,-5],[1,0,0,0,5],[1,1,0,0,-5]];perimeter=orthogonalPerimeter;outlineConnect=0;targetShapes=umlLifeline;portConstraint=eastwest;newEdgeStyle={&quot;curved&quot;:0,&quot;rounded&quot;:0};" vertex="1" parent="1">
<mxGeometry x="295" y="100" width="10" height="440" as="geometry" />
</mxCell>
<mxCell id="hKyrbmUfddmyC9NB2b_t-7" value="" style="html=1;points=[[0,0,0,0,5],[0,1,0,0,-5],[1,0,0,0,5],[1,1,0,0,-5]];perimeter=orthogonalPerimeter;outlineConnect=0;targetShapes=umlLifeline;portConstraint=eastwest;newEdgeStyle={&quot;curved&quot;:0,&quot;rounded&quot;:0};" vertex="1" parent="1">
<mxGeometry x="540" y="160" width="10" height="40" as="geometry" />
</mxCell>
<mxCell id="hKyrbmUfddmyC9NB2b_t-8" value="expr_to_postfix(expr): ExpressionElement[]" style="html=1;verticalAlign=bottom;endArrow=block;curved=0;rounded=0;entryX=0;entryY=0;entryDx=0;entryDy=5;" edge="1" target="hKyrbmUfddmyC9NB2b_t-7" parent="1" source="hKyrbmUfddmyC9NB2b_t-3">
<mxGeometry relative="1" as="geometry">
<mxPoint x="420" y="185" as="sourcePoint" />
</mxGeometry>
</mxCell>
<mxCell id="hKyrbmUfddmyC9NB2b_t-9" value="" style="html=1;verticalAlign=bottom;endArrow=open;endSize=8;curved=0;rounded=0;exitX=0;exitY=1;exitDx=0;exitDy=-5;dashed=1;" edge="1" source="hKyrbmUfddmyC9NB2b_t-7" parent="1" target="hKyrbmUfddmyC9NB2b_t-3">
<mxGeometry relative="1" as="geometry">
<mxPoint x="420" y="255" as="targetPoint" />
</mxGeometry>
</mxCell>
<mxCell id="hKyrbmUfddmyC9NB2b_t-16" value="intermediate_representations" style="edgeLabel;html=1;align=center;verticalAlign=middle;resizable=0;points=[];" vertex="1" connectable="0" parent="hKyrbmUfddmyC9NB2b_t-9">
<mxGeometry x="-0.008" y="-1" relative="1" as="geometry">
<mxPoint y="-9" as="offset" />
</mxGeometry>
</mxCell>
<mxCell id="hKyrbmUfddmyC9NB2b_t-10" value="" style="endArrow=none;dashed=1;html=1;rounded=0;entryX=0.5;entryY=1;entryDx=0;entryDy=0;" edge="1" parent="1" source="hKyrbmUfddmyC9NB2b_t-7" target="hKyrbmUfddmyC9NB2b_t-1">
<mxGeometry width="50" height="50" relative="1" as="geometry">
<mxPoint x="550" y="150" as="sourcePoint" />
<mxPoint x="780" y="260" as="targetPoint" />
</mxGeometry>
</mxCell>
<mxCell id="hKyrbmUfddmyC9NB2b_t-11" value="" style="endArrow=none;dashed=1;html=1;rounded=0;" edge="1" parent="1" target="hKyrbmUfddmyC9NB2b_t-7">
<mxGeometry width="50" height="50" relative="1" as="geometry">
<mxPoint x="545" y="540" as="sourcePoint" />
<mxPoint x="539.76" y="260" as="targetPoint" />
</mxGeometry>
</mxCell>
<mxCell id="hKyrbmUfddmyC9NB2b_t-12" value="loop" style="shape=umlFrame;whiteSpace=wrap;html=1;pointerEvents=0;" vertex="1" parent="1">
<mxGeometry x="170" y="140" width="420" height="80" as="geometry" />
</mxCell>
<mxCell id="hKyrbmUfddmyC9NB2b_t-14" value="&lt;font style=&quot;font-size: 9px;&quot;&gt;[for each expression]&lt;/font&gt;" style="text;html=1;align=center;verticalAlign=middle;whiteSpace=wrap;rounded=0;" vertex="1" parent="1">
<mxGeometry x="170" y="170" width="90" height="20" as="geometry" />
</mxCell>
<mxCell id="hKyrbmUfddmyC9NB2b_t-17" value="loop" style="shape=umlFrame;whiteSpace=wrap;html=1;pointerEvents=0;" vertex="1" parent="1">
<mxGeometry x="170" y="360" width="560" height="60" as="geometry" />
</mxCell>
<mxCell id="hKyrbmUfddmyC9NB2b_t-18" value="&lt;font style=&quot;font-size: 9px;&quot;&gt;[for each intermediate_representation]&lt;/font&gt;" style="text;html=1;align=center;verticalAlign=middle;whiteSpace=wrap;rounded=0;" vertex="1" parent="1">
<mxGeometry x="172" y="393" width="120" height="20" as="geometry" />
</mxCell>
<mxCell id="hKyrbmUfddmyC9NB2b_t-20" value="kernel(intermediate_representation, variables, parameters)" style="html=1;verticalAlign=bottom;endArrow=open;curved=0;rounded=0;endFill=0;" edge="1" parent="1">
<mxGeometry relative="1" as="geometry">
<mxPoint x="305" y="393" as="sourcePoint" />
<mxPoint x="685" y="393" as="targetPoint" />
</mxGeometry>
</mxCell>
<mxCell id="hKyrbmUfddmyC9NB2b_t-23" value="" style="endArrow=none;dashed=1;html=1;rounded=0;entryX=0.5;entryY=1;entryDx=0;entryDy=0;" edge="1" parent="1" source="hKyrbmUfddmyC9NB2b_t-34" target="hKyrbmUfddmyC9NB2b_t-2">
<mxGeometry width="50" height="50" relative="1" as="geometry">
<mxPoint x="610" y="250" as="sourcePoint" />
<mxPoint x="660" y="200" as="targetPoint" />
</mxGeometry>
</mxCell>
<mxCell id="hKyrbmUfddmyC9NB2b_t-26" value="" style="html=1;points=[[0,0,0,0,5],[0,1,0,0,-5],[1,0,0,0,5],[1,1,0,0,-5]];perimeter=orthogonalPerimeter;outlineConnect=0;targetShapes=umlLifeline;portConstraint=eastwest;newEdgeStyle={&quot;curved&quot;:0,&quot;rounded&quot;:0};" vertex="1" parent="1">
<mxGeometry x="680" y="450" width="10" height="50" as="geometry" />
</mxCell>
<mxCell id="hKyrbmUfddmyC9NB2b_t-27" value="read_results()" style="html=1;verticalAlign=bottom;endArrow=block;curved=0;rounded=0;entryX=0;entryY=0;entryDx=0;entryDy=5;" edge="1" target="hKyrbmUfddmyC9NB2b_t-26" parent="1" source="hKyrbmUfddmyC9NB2b_t-3">
<mxGeometry relative="1" as="geometry">
<mxPoint x="305" y="444" as="sourcePoint" />
</mxGeometry>
</mxCell>
<mxCell id="hKyrbmUfddmyC9NB2b_t-28" value="resultMatrix" style="html=1;verticalAlign=bottom;endArrow=open;dashed=1;endSize=8;curved=0;rounded=0;exitX=0;exitY=1;exitDx=0;exitDy=-5;" edge="1" source="hKyrbmUfddmyC9NB2b_t-26" parent="1" target="hKyrbmUfddmyC9NB2b_t-3">
<mxGeometry x="0.0012" relative="1" as="geometry">
<mxPoint x="305" y="494.0000000000001" as="targetPoint" />
<mxPoint as="offset" />
</mxGeometry>
</mxCell>
<mxCell id="hKyrbmUfddmyC9NB2b_t-30" value="" style="endArrow=none;dashed=1;html=1;rounded=0;" edge="1" parent="1" target="hKyrbmUfddmyC9NB2b_t-26">
<mxGeometry width="50" height="50" relative="1" as="geometry">
<mxPoint x="685" y="540" as="sourcePoint" />
<mxPoint x="710" y="390" as="targetPoint" />
</mxGeometry>
</mxCell>
<mxCell id="hKyrbmUfddmyC9NB2b_t-35" value="send_data(variables)" style="html=1;verticalAlign=bottom;endArrow=block;curved=0;rounded=0;entryX=0;entryY=0;entryDx=0;entryDy=5;" edge="1" target="hKyrbmUfddmyC9NB2b_t-34" parent="1" source="hKyrbmUfddmyC9NB2b_t-3">
<mxGeometry relative="1" as="geometry">
<mxPoint x="720" y="225" as="sourcePoint" />
</mxGeometry>
</mxCell>
<mxCell id="hKyrbmUfddmyC9NB2b_t-36" value="" style="html=1;verticalAlign=bottom;endArrow=open;dashed=1;endSize=8;curved=0;rounded=0;exitX=0;exitY=1;exitDx=0;exitDy=-5;" edge="1" source="hKyrbmUfddmyC9NB2b_t-34" parent="1" target="hKyrbmUfddmyC9NB2b_t-3">
<mxGeometry relative="1" as="geometry">
<mxPoint x="720" y="255" as="targetPoint" />
</mxGeometry>
</mxCell>
<mxCell id="hKyrbmUfddmyC9NB2b_t-37" value="" style="endArrow=none;dashed=1;html=1;rounded=0;entryX=0.5;entryY=1;entryDx=0;entryDy=0;" edge="1" parent="1" source="hKyrbmUfddmyC9NB2b_t-38" target="hKyrbmUfddmyC9NB2b_t-34">
<mxGeometry width="50" height="50" relative="1" as="geometry">
<mxPoint x="700" y="349" as="sourcePoint" />
<mxPoint x="700" y="120" as="targetPoint" />
</mxGeometry>
</mxCell>
<mxCell id="hKyrbmUfddmyC9NB2b_t-34" value="" style="html=1;points=[[0,0,0,0,5],[0,1,0,0,-5],[1,0,0,0,5],[1,1,0,0,-5]];perimeter=orthogonalPerimeter;outlineConnect=0;targetShapes=umlLifeline;portConstraint=eastwest;newEdgeStyle={&quot;curved&quot;:0,&quot;rounded&quot;:0};" vertex="1" parent="1">
<mxGeometry x="680" y="240" width="10" height="20" as="geometry" />
</mxCell>
<mxCell id="hKyrbmUfddmyC9NB2b_t-39" value="send_data(parameters)" style="html=1;verticalAlign=bottom;endArrow=block;curved=0;rounded=0;entryX=0;entryY=0;entryDx=0;entryDy=5;" edge="1" target="hKyrbmUfddmyC9NB2b_t-38" parent="1" source="hKyrbmUfddmyC9NB2b_t-3">
<mxGeometry relative="1" as="geometry">
<mxPoint x="750" y="288" as="sourcePoint" />
</mxGeometry>
</mxCell>
<mxCell id="hKyrbmUfddmyC9NB2b_t-40" value="" style="html=1;verticalAlign=bottom;endArrow=open;dashed=1;endSize=8;curved=0;rounded=0;exitX=0;exitY=1;exitDx=0;exitDy=-5;" edge="1" source="hKyrbmUfddmyC9NB2b_t-38" parent="1" target="hKyrbmUfddmyC9NB2b_t-3">
<mxGeometry relative="1" as="geometry">
<mxPoint x="750" y="358" as="targetPoint" />
</mxGeometry>
</mxCell>
<mxCell id="hKyrbmUfddmyC9NB2b_t-38" value="" style="html=1;points=[[0,0,0,0,5],[0,1,0,0,-5],[1,0,0,0,5],[1,1,0,0,-5]];perimeter=orthogonalPerimeter;outlineConnect=0;targetShapes=umlLifeline;portConstraint=eastwest;newEdgeStyle={&quot;curved&quot;:0,&quot;rounded&quot;:0};" vertex="1" parent="1">
<mxGeometry x="680" y="280" width="10" height="20" as="geometry" />
</mxCell>
<mxCell id="hKyrbmUfddmyC9NB2b_t-42" value="" style="html=1;points=[[0,0,0,0,5],[0,1,0,0,-5],[1,0,0,0,5],[1,1,0,0,-5]];perimeter=orthogonalPerimeter;outlineConnect=0;targetShapes=umlLifeline;portConstraint=eastwest;newEdgeStyle={&quot;curved&quot;:0,&quot;rounded&quot;:0};" vertex="1" parent="1">
<mxGeometry x="680" y="320" width="10" height="21" as="geometry" />
</mxCell>
<mxCell id="hKyrbmUfddmyC9NB2b_t-43" value="send_data(intermediate_representations)" style="html=1;verticalAlign=bottom;endArrow=block;curved=0;rounded=0;entryX=0;entryY=0;entryDx=0;entryDy=5;" edge="1" target="hKyrbmUfddmyC9NB2b_t-42" parent="1" source="hKyrbmUfddmyC9NB2b_t-3">
<mxGeometry relative="1" as="geometry">
<mxPoint x="820" y="325" as="sourcePoint" />
</mxGeometry>
</mxCell>
<mxCell id="hKyrbmUfddmyC9NB2b_t-44" value="" style="html=1;verticalAlign=bottom;endArrow=open;dashed=1;endSize=8;curved=0;rounded=0;exitX=0;exitY=1;exitDx=0;exitDy=-5;" edge="1" source="hKyrbmUfddmyC9NB2b_t-42" parent="1" target="hKyrbmUfddmyC9NB2b_t-3">
<mxGeometry relative="1" as="geometry">
<mxPoint x="820" y="395" as="targetPoint" />
</mxGeometry>
</mxCell>
<mxCell id="hKyrbmUfddmyC9NB2b_t-45" value="" style="endArrow=none;dashed=1;html=1;rounded=0;" edge="1" parent="1" source="hKyrbmUfddmyC9NB2b_t-42" target="hKyrbmUfddmyC9NB2b_t-38">
<mxGeometry width="50" height="50" relative="1" as="geometry">
<mxPoint x="710" y="310" as="sourcePoint" />
<mxPoint x="710" y="290" as="targetPoint" />
</mxGeometry>
</mxCell>
<mxCell id="hKyrbmUfddmyC9NB2b_t-46" value="" style="endArrow=none;dashed=1;html=1;rounded=0;" edge="1" parent="1" source="hKyrbmUfddmyC9NB2b_t-42" target="hKyrbmUfddmyC9NB2b_t-26">
<mxGeometry width="50" height="50" relative="1" as="geometry">
<mxPoint x="710" y="363" as="sourcePoint" />
<mxPoint x="710" y="330" as="targetPoint" />
</mxGeometry>
</mxCell>
<mxCell id="hKyrbmUfddmyC9NB2b_t-49" value="interpret(expressions)" style="html=1;verticalAlign=bottom;startArrow=circle;startFill=1;endArrow=open;startSize=6;endSize=8;curved=0;rounded=0;" edge="1" parent="1">
<mxGeometry x="0.1057" width="80" relative="1" as="geometry">
<mxPoint x="172" y="124" as="sourcePoint" />
<mxPoint x="295" y="124" as="targetPoint" />
<mxPoint as="offset" />
</mxGeometry>
</mxCell>
<mxCell id="hKyrbmUfddmyC9NB2b_t-50" value="" style="ellipse;html=1;shape=endState;fillColor=#000000;strokeColor=default;" vertex="1" parent="1">
<mxGeometry x="180" y="520" width="20" height="20" as="geometry" />
</mxCell>
<mxCell id="hKyrbmUfddmyC9NB2b_t-51" value="" style="endArrow=open;html=1;rounded=0;entryX=1;entryY=0.5;entryDx=0;entryDy=0;dashed=1;endFill=0;" edge="1" parent="1" source="hKyrbmUfddmyC9NB2b_t-3" target="hKyrbmUfddmyC9NB2b_t-50">
<mxGeometry width="50" height="50" relative="1" as="geometry">
<mxPoint x="230" y="640" as="sourcePoint" />
<mxPoint x="280" y="590" as="targetPoint" />
</mxGeometry>
</mxCell>
<mxCell id="hKyrbmUfddmyC9NB2b_t-52" value="resultMatrix" style="edgeLabel;html=1;align=center;verticalAlign=middle;resizable=0;points=[];" vertex="1" connectable="0" parent="hKyrbmUfddmyC9NB2b_t-51">
<mxGeometry x="0.1271" relative="1" as="geometry">
<mxPoint x="8" y="-10" as="offset" />
</mxGeometry>
</mxCell>
</root>
</mxGraphModel>
</diagram>
</mxfile>


@@ -26,8 +26,9 @@ function interpret_gpu(exprs::Vector{Expr}, X::Matrix{Float32}, p::Vector{Vector
ncols = size(X, 2)
results = Matrix{Float32}(undef, ncols, length(exprs))
# TODO: create CuArray for variables here already, as they never change
for i in 1:repetitions # Simulate parameter tuning -> local search (X remains the same, p gets changed in small steps and must be performed sequentially)
for i in 1:repetitions # Simulate parameter tuning -> local search (X remains the same, p gets changed in small steps and must be performed sequentially, which it is with this impl)
results = Interpreter.interpret(exprs, X, p)
end
@@ -40,8 +41,9 @@ function evaluate_gpu(exprs::Vector{Expr}, X::Matrix{Float32}, p::Vector{Vector{
ncols = size(X, 2)
results = Matrix{Float32}(undef, ncols, length(exprs))
# TODO: create CuArray for variables here already, as they never change
for i in 1:repetitions # Simulate parameter tuning -> local search (X remains the same, p gets changed in small steps and must be performed sequentially)
for i in 1:repetitions # Simulate parameter tuning -> local search (X remains the same, p gets changed in small steps and must be performed sequentially, which it is with this impl)
results = Transpiler.evaluate(exprs, X, p)
end


@@ -9,6 +9,7 @@ export ExpressionElement
@enum Operator ADD=1 SUBTRACT=2 MULTIPLY=3 DIVIDE=4 POWER=5 ABS=6 LOG=7 EXP=8 SQRT=9
@enum ElementType EMPTY=0 FLOAT32=1 OPERATOR=2 INDEX=3
const binary_operators = [ADD, SUBTRACT, MULTIPLY, DIVIDE, POWER]
const unary_operators = [ABS, LOG, EXP, SQRT]
struct ExpressionElement
@@ -17,12 +18,13 @@ struct ExpressionElement
end
const PostfixType = Vector{ExpressionElement}
const cache = Dict{Expr, PostfixType}()
"
Converts a julia expression to its postfix notation.
NOTE: All 64-Bit values will be converted to 32-Bit. Be aware of the lost precision
NOTE: All 64-Bit values will be converted to 32-Bit. Be aware of the lost precision.
NOTE: This function is not thread safe; in particular, cache access is not thread safe
"
function expr_to_postfix(expr::Expr)::PostfixType
function expr_to_postfix(expr::Expr, cache::Dict{Expr, PostfixType})::PostfixType
if haskey(cache, expr)
return cache[expr]
end
@@ -34,7 +36,7 @@ function expr_to_postfix(expr::Expr)::PostfixType
arg = expr.args[j]
if typeof(arg) === Expr
append!(postfix, expr_to_postfix(arg))
append!(postfix, expr_to_postfix(arg, cache))
elseif typeof(arg) === Symbol # variables/parameters
# maybe TODO: replace the parameters with their respective values, as this might make the expr evaluation faster
exprElement = convert_to_ExpressionElement(convert_var_to_int(arg))
@@ -56,6 +58,8 @@ function expr_to_postfix(expr::Expr)::PostfixType
if operator in unary_operators
push!(postfix, convert_to_ExpressionElement(operator))
end
cache[expr] = postfix
return postfix
end
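The caller-supplied cache introduced in this hunk can be sketched as follows. This is a minimal, hypothetical illustration (not the library's real `PostfixType` or conversion logic, and it assumes plain `:call` expressions): the point is that the caller now owns the `Dict`, which `expr_to_postfix` consults before converting.

```julia
# Hypothetical sketch of the memoization pattern this commit introduces:
# the cache is passed in by the caller, so each frontend keeps its own.
function to_postfix!(expr::Expr, cache::Dict{Expr,Vector{Any}})::Vector{Any}
    haskey(cache, expr) && return cache[expr]   # cache hit: skip re-conversion
    postfix = Any[]
    for arg in expr.args[2:end]                 # operands come first in postfix
        arg isa Expr ? append!(postfix, to_postfix!(arg, cache)) : push!(postfix, arg)
    end
    push!(postfix, expr.args[1])                # operator goes last
    cache[expr] = postfix                       # store for subsequent runs
    return postfix
end

cache = Dict{Expr,Vector{Any}}()
to_postfix!(:(x + 2.5 * y), cache)              # Any[:x, 2.5, :y, :*, :+]
```

As the note above warns, this lookup-then-insert sequence is exactly the part that is not thread safe without additional locking.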


@@ -6,6 +6,8 @@ using ..Utils
export interpret
const cacheFrontend = Dict{Expr, PostfixType}()
"Interprets the given expressions with the values provided.
# Arguments
- expressions::Vector{ExpressionProcessing.PostfixType} : The expressions to execute in postfix form
@@ -13,10 +15,9 @@ export interpret
- parameters::Vector{Vector{Float32}} : The parameters to use. Each Vector contains the values for the parameters p1..pn. The number of parameters can be different for every expression
"
function interpret(expressions::Vector{Expr}, variables::Matrix{Float32}, parameters::Vector{Vector{Float32}})::Matrix{Float32}
exprs = Vector{ExpressionProcessing.PostfixType}(undef, length(expressions))
@inbounds for i in eachindex(expressions)
exprs[i] = ExpressionProcessing.expr_to_postfix(expressions[i])
exprs[i] = ExpressionProcessing.expr_to_postfix(expressions[i], cacheFrontend)
end
variableCols = size(variables, 2) # number of variable sets to use for each expression


@@ -7,21 +7,23 @@ using ..Utils
const BYTES = sizeof(Float32)
const Operand = Union{Float32, String} # Operand is either fixed value or register
cache = Dict{Expr, CuFunction}() # needed if multiple runs with the same expr but different parameters are performed
const cacheFrontend = Dict{Expr, PostfixType}()
const transpilerCache = Dict{Expr, CuFunction}() # needed if multiple runs with the same expr but different parameters are performed
function evaluate(expressions::Vector{Expr}, variables::Matrix{Float32}, parameters::Vector{Vector{Float32}})::Matrix{Float32}
varRows = size(variables, 1)
variableCols = size(variables, 2)
kernels = Vector{CuFunction}(undef, length(expressions))
# TODO: test this again with multiple threads. The first time I tried, I was using only one thread
# Test this parallel version again when doing performance tests. With the simple "functionality" tests this took 0.03 seconds while sequential took "0.00009" seconds
# Threads.@threads for i in eachindex(expressions)
# cacheLock = ReentrantLock()
# cacheHit = false
# lock(cacheLock) do
# if haskey(cache, expressions[i])
# kernels[i] = cache[expressions[i]]
# if haskey(transpilerCache, expressions[i])
# kernels[i] = transpilerCache[expressions[i]]
# cacheHit = true
# end
# end
@@ -42,16 +44,16 @@ function evaluate(expressions::Vector{Expr}, variables::Matrix{Float32}, paramet
# mod = CuModule(image)
# kernels[i] = CuFunction(mod, "ExpressionProcessing")
# @lock cacheLock cache[expressions[i]] = kernels[i]
# @lock cacheLock transpilerCache[expressions[i]] = kernels[i]
# end
@inbounds for i in eachindex(expressions)
if haskey(cache, expressions[i])
kernels[i] = cache[expressions[i]]
if haskey(transpilerCache, expressions[i])
kernels[i] = transpilerCache[expressions[i]]
continue
end
formattedExpr = ExpressionProcessing.expr_to_postfix(expressions[i])
formattedExpr = ExpressionProcessing.expr_to_postfix(expressions[i], cacheFrontend)
kernel = transpile(formattedExpr, varRows, Utils.get_max_inner_length(parameters), variableCols, i-1) # i-1 because julia is 1-based but PTX needs 0-based indexing
linker = CuLink()
@@ -61,7 +63,7 @@ function evaluate(expressions::Vector{Expr}, variables::Matrix{Float32}, paramet
mod = CuModule(image)
kernels[i] = CuFunction(mod, "ExpressionProcessing")
cache[expressions[i]] = kernels[i]
transpilerCache[expressions[i]] = kernels[i]
end
cudaVars = CuArray(variables) # maybe put in shared memory (see PerformanceTests.jl for more info)
@@ -78,7 +80,7 @@ function evaluate(expressions::Vector{Expr}, variables::Matrix{Float32}, paramet
cudacall(kernel, (CuPtr{Float32},CuPtr{Float32},CuPtr{Float32}), cudaVars, cudaParams, cudaResults; threads=threads, blocks=blocks)
end
return cudaResults
end


@@ -73,16 +73,17 @@ end
# Add /usr/local/cuda/bin in .bashrc to PATH to access ncu and nsys (depending how well this works with my 1080 do it on my machine, otherwise re do the tests and perform them on FH PCs)
# University setup at 10.20.1.7 if needed
compareWithCPU = true
compareWithCPU = false
suite = BenchmarkGroup()
suite["CPU"] = BenchmarkGroup(["CPUInterpreter"])
suite["GPUI"] = BenchmarkGroup(["GPUInterpreter"])
suite["GPUT"] = BenchmarkGroup(["GPUTranspiler"])
varsets_small = 100
varsets_medium = 1000
varsets_large = 10000
# TODO: see CpuInterpreterTests.jl to see how all data is loaded and implement this here
varsets_small = 1000 # 1k should be absolute minimum
varsets_medium = 10000
varsets_large = 100000 # 100k should be absolute maximum (although not as strict as minimum)
if compareWithCPU
X_small = randn(Float32, varsets_small, 5)
@@ -112,7 +113,7 @@ suite["GPUT"]["large varset"] = @benchmarkable evaluate_gpu(exprsGPU, X_large_GP
loadparams!(suite, BenchmarkTools.load("params.json")[1], :samples, :evals, :gctrial, :time_tolerance, :evals_set, :gcsample, :seconds, :overhead, :memory_tolerance)
results = run(suite, verbose=true, seconds=180)
results = run(suite, verbose=true, seconds=3600) # 1 hour because of CPU. let's see if more is needed
if compareWithCPU
medianCPU = median(results["CPU"])


@@ -3,7 +3,7 @@ using CUDA
using .Transpiler
using .Interpreter
varsets_medium = 1000
varsets_medium = 10000
X = randn(Float32, 5, varsets_medium)
exprsGPU = [


@@ -27,14 +27,14 @@ Again, the question arises if the performance of CUDA.jl is sufficient to be use
\section{Pre-Processing}
% Talk about why this needs to be done and how it is done (the why is basically: simplifies evaluation/transpilation process; the how is in ExpressionProcessing.jl (the why is probably not needed because it is explained in concept and design))
The pre-processing or frontend step is very important. As already explained in Chapter \ref{cha:conceptdesign} it is responsible for ensuring the given expressions are valid and that they are transformed into an intermediate representation. This section aims at explaining how the intermediate representation is implemented, as well as how it is generated from a mathematical expression.
The pre-processing or frontend step is very important. As already explained in Chapter \ref{cha:conceptdesign}, it is responsible for ensuring that the given expressions are valid and that they are transformed into an intermediate representation. This section aims to explain how the intermediate representation is implemented, as well as how it is generated from a mathematical expression.
\subsection{Intermediate Representation}
\label{sec:ir}
% Talk about how it looks and why it was chosen to look like this
The intermediate representation is mainly designed to be lightweight and easily transferrable to the GPU. Since the interpreter is running on the GPU this was a very important consideration. Since the transpilation process is performed on the CPU and is therefore very flexible in terms of the intermediate representation, the focus lied mostly on being efficient for the interpreter.
The intermediate representation is mainly designed to be lightweight and easily transferrable to the GPU. Since the interpreter runs on the GPU, this was a very important consideration. Because the transpilation process is done on the CPU, and is therefore very flexible in terms of the intermediate representation, the focus was mainly on being efficient for the interpreter.
The intermediate representation can not take on any form. While it has already been defined that expressions are converted to postfix notation, there are different ways of storing the data. The first logical decision is to create an array where each entry represents a token. On the CPU it would be possible to define each entry to be a pointer to the token object. Each of these objects could be of a different type, for example an object holding a constant value while another object holds an operator. Additionally, each of these objects could include its own logic on what to do when it is encountered during the evaluation process. However, on the GPU, this is not possible, as an array entry must hold a value and not a pointer to another memory location. Furthermore, if this would be possible, it would be a bad idea. As explained in Section \ref{sec:memory_model}, when loading data from memory, larger chunks are retrieved at once. If the data is scattered around the GPUs memory, a lot of unwanted data is transferred. This can be seen in figure \ref{fig:excessive-memory-transfer}, where if the data is stored consecutive, much fewer data operations and much less data in general needs to be transferred.
The intermediate representation cannot take an arbitrary form. While it has already been defined that expressions are converted to postfix notation, there are several ways to store the data. The first logical choice is to create an array where each entry represents a token. On the CPU it would be possible to define each entry as a pointer to a token object. Each of these objects could be of a different type, for example one object holding a constant value while another holds an operator. In addition, each of these objects could contain its own logic for what to do when it is encountered during the evaluation process. However, on the GPU this is not possible, as an array entry must hold a value and not a pointer to another memory location. Furthermore, even if it were possible, it would be a bad idea. As explained in Section \ref{sec:memory_model}, when loading data from global memory, larger chunks are retrieved at once. If the data is scattered across the GPU's global memory, a lot of unwanted data is transferred. This can be seen in Figure \ref{fig:excessive-memory-transfer}: if the data is stored sequentially, far fewer memory operations and far less data in general need to be transferred.
\begin{figure}
\centering
@@ -43,15 +43,17 @@ The intermediate representation can not take on any form. While it has already b
\label{fig:excessive-memory-transfer}
\end{figure}
Because of this and because the GPU does not allow pointers, another solution is required. Instead of storing pointers to objects of different types in an array, it is possible to store one object with meta information. The object therefore contains the type of the value stored, and the value itself as described in \ref{sec:pre-processing}. The four types that need to be stored in this object, differ significantly in the value they represent.
Because of this and because the GPU does not allow pointers, another solution is required. Instead of storing pointers to objects of different types in an array, it is possible to store one object with meta information. The object thus contains the type of the stored value and the value itself, as described in Section \ref{sec:pre-processing}. The four types that need to be stored in this object differ significantly in the values they represent.
Variables and parameters are very simple to store. Because they represent indices to the variable matrix or the parameter vector, this (integer) index can be stored as is in the value property of the object.
Variables and parameters are very simple to store. Because they represent indices to the variable matrix or the parameter vector, this (integer) index can be stored as is in the value property of the object. The type can then be used to determine whether it is an index to a variable or a parameter access.
Constants are also very simple, as they represent a single 32-bit floating point value. However, because of the variables and parameters, the value property is already defined as an integer and not as a floating point number. Unlike languages like Python, where every number is a floating point number, in Julia they are different and can therefore not be stored in the same property. Creating a second property only for constants is not feasible, as this would introduce 4 bytes per object that need to be sent to the GPU which most of the time does not contain a value. To avoid sending unnecessary bytes, a mechanism provided by Julia called reinterpret can be used. This allows the bits of a variable of one type, to be treated as the bits of another type. The bits used to represent a floating point value are then interpreted as an integer and can be stored in the same property. On the GPU, the same concept can be applied to interpret the integer value as a floating point value again for further computations. This is also the reason why the original type of the value needs to be stored alongside the value, to correctly interpret the stored bits and in turn correctly evaluate the expressions.
Constants are also very simple, as they represent a single 32-bit floating point value. However, because of the variables and parameters, the value property is already defined as an integer and not as a floating point number. Unlike languages like Python, where every number is a floating point number, in Julia integers and floats are distinct types and therefore cannot be stored in the same property. Creating a second property for constants only is not feasible, as this would add 4 bytes per object that must be sent to the GPU but most of the time would not contain a defined value.
Operators are very different from variables, parameters and constants. Because they represent an operation, rather than a value, another option is needed to store them. An operator can be mapped to a number, to identify the operation. For example if the addition operator is mapped to the integer number one, if during evaluation, the evaluator comes across an object of type operator and a value of one, it knows which operation it needs to perform. This can be done for all operators which means it is possible to store them in the same object with the same property and only the type must be specified. The mapping of an operator to a value is often called an operation code or opcode, and each operator is represented as one opcode.
To avoid sending unnecessary bytes, a mechanism provided by Julia called reinterpret can be used. This allows the bits of a variable of one type to be treated as the bits of another type. The bits used to represent a floating point number are then interpreted as an integer and can be stored in the same property. On the GPU, the same concept can be applied to reinterpret the integer value as a floating point value for further calculations. This is also the reason why the original type of the value needs to be stored alongside the value: it ensures the stored bits are interpreted correctly and the expressions are evaluated correctly.
With this, the intermediate representation is defined. Figure \ref{fig:pre-processing-result-impl} shows how a simple expression would look after the pre-processing step.
Operators are very different from variables, parameters and constants. Because they represent an operation rather than a value, a different way of storing them is required. An operator can be mapped to a number to identify the operation. For example, if the addition operator is mapped to the integer $1$, then when the evaluator encounters an object of type operator and a value of $1$, it knows which operation to perform. This can be done for all operators, which means it is possible to store them all in the same object with the same property; only the type needs to be specified. The mapping of an operator to a value is often called an operation code, or opcode, and each operator is represented as one opcode.
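The opcode mapping described here is what the `@enum Operator` declaration in `ExpressionProcessing.jl` (visible earlier in this commit) already provides; a short illustration of how an opcode round-trips between the enum and its integer value:

```julia
# The same @enum Operator declaration as in ExpressionProcessing.jl:
# each operator is assigned an integer opcode.
@enum Operator ADD=1 SUBTRACT=2 MULTIPLY=3 DIVIDE=4 POWER=5 ABS=6 LOG=7 EXP=8 SQRT=9

opcode = Int(ADD)        # 1 — this integer is what the value property stores
Operator(opcode) === ADD # the evaluator recovers the operation from the opcode
```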
With this, the intermediate representation is defined. Figure \ref{fig:pre-processing-result-impl} shows how a simple expression would look after the pre-processing step. Note that the value $2.5$ has been reinterpreted as an integer, resulting in the seemingly random value.
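The reinterpretation of the constant $2.5$ mentioned above can be shown directly (assuming `Float32`/`Int32`, as in the library):

```julia
# The reinterpret mechanism: the bit pattern of a Float32 constant is stored
# as an Int32 and later read back without any loss of precision.
stored = reinterpret(Int32, 2.5f0)        # 1075838976 — the "seemingly random" integer
recovered = reinterpret(Float32, stored)  # same bits, read as a float again
@assert recovered == 2.5f0
```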
\begin{figure}
\centering
\includegraphics[width=.9\textwidth]{pre-processing_result_impl.png}
\caption{The intermediate representation of a simple expression after the pre-processing step.}
\label{fig:pre-processing-result-impl}
\end{figure}
\subsection{Processing}
Now that the intermediate representation has been defined, the processing step can be implemented. This section describes the structure of the expressions and how they are processed. It also explains the process of parsing the expressions to ensure their validity and converting them into the intermediate representation.
\subsubsection{Expressions}
With the pre-processing step, the first modern feature of Julia has been used. As already mentioned, Julia provides extensive support for meta-programming, which is important for this step. Julia represents its own code as a data structure, which allows a developer to manipulate the code at runtime. The code is stored in the so-called Expr object as an Abstract Syntax Tree (AST), which is the most minimal tree representation of a given expression. As a result, mathematical expressions can also be represented as such an Expr object instead of a simple string, which is a major benefit because these expressions can then be easily manipulated by the symbolic regression algorithm. This is the main reason why the pre-processing step requires the expressions to be provided as an Expr object instead of a string.
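A brief illustration of this (the variable name is arbitrary):

```julia
# Quoting Julia code yields an Expr object, a data structure that can be
# inspected and rewritten at runtime instead of being handled as a string.
ex = :(x1 + 1)
head = ex.head   # :call — an operator/function call
args = ex.args   # Any[:+, :x1, 1] — the operator first, then its operands
ex.args[1] = :*  # programmatically rewrite the expression to x1 * 1
```

This mutability is exactly what a symbolic regression algorithm exploits when it recombines expressions.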
Another major benefit of the expressions being stored in the Expr object, and therefore as an AST, is the included operator precedence. Because it is a tree where the leaves are the constants, variables or parameters (also called terminal symbols) and the inner nodes are the operators, the correct result is calculated when the tree is evaluated from bottom to top. As can be seen in Figure \ref{fig:expr-ast}, the expression $1 + x_1 \, \log(p_1)$, when parsed as an AST, contains the correct operator precedence: the bottommost subtree $\log(p_1)$ must be evaluated first, then the multiplication, and finally the addition.
It should be noted, however, that Julia stores the tree as a list of arrays to allow a node to have as many children as necessary. For example, the expression $1+2+\dots+n$ contains only additions, which is a commutative operation, meaning that the order of operations is irrelevant. The AST for this expression contains the operator at the first position in the array and the values at the following positions. This ensures that the AST is as minimal as possible.
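This flattening, and the nesting produced by operator precedence, can be observed directly:

```julia
# Chained calls to the same commutative operator become a single node with n
# children, while lower-precedence sub-expressions remain nested subtrees.
flat = :(1 + 2 + 3 + 4)
nested = :(1 + x1 * log(p1))
flat_children = flat.args  # Any[:+, 1, 2, 3, 4] — one addition node, four children
subtree = nested.args[3]   # :(x1 * log(p1)) — must be evaluated before the +
```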
\begin{figure}
\centering
\caption{The AST of the expression $1 + x_1 \, \log(p_1)$.}
\label{fig:expr-ast}
\end{figure}
To convert the AST of an expression into the intermediate representation, a top-down traversal of the tree is required. The steps for this are as follows:
\begin{enumerate}
	\item Extract the operator and convert it to its opcode for later use.
	\item Convert all constants, variables, parameters and operators to the object (expression element) described in Section \ref{sec:ir}.
\item Append the expression elements to the postfix expression.
\item If the operator is a binary operator and there are more than two expression elements, append the operator after the first two elements and then after each element.
\item If a subtree exists, apply all previous steps and append it to the existing postfix expression.
\item Return the generated postfix expression/intermediate representation.
\end{enumerate}
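The traversal described above can be sketched as a recursive Julia function. The opcode values, the tuple encoding of operators and all names are assumptions for illustration, not the actual implementation:

```julia
# Illustrative opcode table; the real mapping may differ.
const OPCODES = Dict(:+ => 1, :- => 2, :* => 3, :/ => 4, :log => 5)

# Convert an Expr (AST) into a postfix sequence. Terminal symbols are appended
# as-is; operators are appended as (:op, opcode) tuples. The n-ary handling of
# step 4 emits the operator after the first two operands and then after every
# further operand; step 5 is realised through recursion.
function to_postfix!(out, ex)
    if ex isa Expr
        ex.head === :call || error("unsupported expression: $(ex.head)")
        op = ex.args[1]
        haskey(OPCODES, op) || error("invalid operator: $op")  # step 1 validation
        operands = ex.args[2:end]
        for (i, child) in enumerate(operands)
            to_postfix!(out, child)                   # recurse into subtrees
            i >= 2 && push!(out, (:op, OPCODES[op]))  # step 4: n-ary operators
        end
        length(operands) == 1 && push!(out, (:op, OPCODES[op]))  # unary, e.g. log
    else
        push!(out, ex)  # constant, variable or parameter (terminal symbol)
    end
    return out
end

to_postfix(ex) = to_postfix!(Any[], ex)
```

Applied to $1+2+3+4$, this sketch yields the postfix sequence $1\,2\,+\,3\,+\,4\,+$ discussed below.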
The expression is validated throughout the parsing process. Validating that only supported operators are used happens in step 1: to convert an operator to its corresponding opcode, an opcode must exist for it, which establishes whether the operator is valid. Similarly, converting the tokens into expression element objects ensures that only valid variables and parameters are present in the expression. This is handled in step 2.
As explained above, a node of a binary operator can have $n$ children. In these cases, additional handling is required to ensure correct conversion. This handling is summarised in step 4. Essentially, the operator must be added after the first two elements, and for each subsequent element, the operator must also be added. The expression $1+2+3+4$ is converted to the AST $+\,1\,2\,3\,4$, and without step 4, the postfix expression would be $1\,2\,3\,4\,+$. If the operator is added after the first two elements and then after each subsequent element, the correct postfix expression $1\,2\,+\,3\,+\,4\,+$ is generated.
Each subtree of the AST is itself a separate AST, which can be converted to postfix notation in the same way as the whole AST. This means that the algorithm only needs to handle leaf nodes directly; when it encounters a subtree, it recursively calls itself to parse the remaining AST. Step 5 indicates this recursive behaviour.
While the same expression usually occurs only once, sub-expressions can occur multiple times. In the example in Figure \ref{fig:expr-ast}, the whole expression $1 + x_1 \, \log(p_1)$ is unlikely to be generated more than once by the symbolic regression algorithm. However, the sub-expression $\log(p_1)$ is much more likely to be generated multiple times. This means that the intermediate representation for this subtree only needs to be generated once, and the result can be reused later. Therefore, a cache can be used to store the intermediate representation of this sub-expression and retrieve it later, eliminating the parsing overhead.
Caching can be applied both to individual sub-expressions and to the entire expression. While it is unlikely that the whole expression recurs frequently, either on its own or as part of a larger expression, implementing a cache will not degrade performance and will in fact improve it if repetitions do occur. In the context of parameter optimisation, where the evaluators are employed, expressions do recur, making full-expression caching advantageous. The primary drawback of caching is the increased use of RAM; however, given that RAM is plentiful in modern systems, this should not pose a significant issue.
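A minimal sketch of such a cache, using a stand-in conversion function (the real conversion routine and key type may differ):

```julia
# Cache the intermediate representation per expression: the conversion runs
# only on the first request; later requests reuse the stored result.
const IR_CACHE = Dict{Expr,Vector{Any}}()
const CONVERSIONS = Ref(0)  # counts how often the conversion actually runs

function convert_to_ir(ex::Expr)  # placeholder for the real AST-to-postfix step
    CONVERSIONS[] += 1
    return Any[ex]  # stand-in result
end

cached_ir(ex::Expr) = get!(() -> convert_to_ir(ex), IR_CACHE, ex)
```

Because Expr implements hashing and equality structurally, a re-generated sub-expression such as $\log(p_1)$ hits the cache even though it is a distinct object in memory.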
\section{Interpreter}
This section describes how the interpreter has been developed. The overall flow is shown in the sequence diagram in Figure \ref{fig:interpreter-sequence}.
\begin{figure}
\centering
\includegraphics[width=.95\textwidth]{interpreter_sequence_diagram.png}
\caption{The sequence diagram of the interpreter.}
\label{fig:interpreter-sequence}
\end{figure}
The main loop of the interpreter is implemented as a kernel, which CUDA.jl transpiles into PTX code that is then executed on the GPU.
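As a CPU-side sketch, this main loop is a stack machine over the postfix expression; inside the kernel the same loop runs per data point. The element encoding and opcode values are illustrative assumptions, not the thesis implementation:

```julia
# Stack-based evaluation of a postfix expression. Tuples encode non-constant
# elements; assumed opcodes: 1 = +, 3 = *, 5 = log (illustration only).
function eval_postfix(postfix, x, p)
    stack = Float64[]
    for elem in postfix
        if elem isa Tuple && elem[1] === :op
            if elem[2] == 5                   # unary operator: log
                push!(stack, log(pop!(stack)))
            else                              # binary operators
                b = pop!(stack); a = pop!(stack)
                elem[2] == 1 && push!(stack, a + b)
                elem[2] == 3 && push!(stack, a * b)
            end
        elseif elem isa Tuple && elem[1] === :var
            push!(stack, x[elem[2]])          # variable by index
        elseif elem isa Tuple && elem[1] === :param
            push!(stack, p[elem[2]])          # parameter by index
        else
            push!(stack, Float64(elem))       # constant
        end
    end
    return only(stack)                        # exactly one value must remain
end
```

Evaluating the postfix form of $1 + x_1 \, \log(p_1)$ with $x_1 = 2$ and $p_1 = e$ yields $3$.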
