@InCollection{barett2009smt,
  author    = {Clark W. Barrett and Roberto Sebastiani and Sanjit A. Seshia and Cesare Tinelli},
  booktitle = {Handbook of Satisfiability},
  publisher = {{IOS} Press},
  title     = {Satisfiability Modulo Theories},
  year      = {2009},
  editor    = {Armin Biere and Marijn Heule and Hans van Maaren and Toby Walsh},
  pages     = {825--885},
  series    = {Frontiers in Artificial Intelligence and Applications},
  volume    = {185},
  doi       = {10.3233/978-1-58603-929-5-825},
  url       = {https://doi.org/10.3233/978-1-58603-929-5-825},
}

@InProceedings{abraham17smt,
  author    = {Erika {\'{A}}brah{\'{a}}m and Gereon Kremer},
  booktitle = {19th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing, {SYNASC} 2017, Timisoara, Romania, September 21-24, 2017},
  title     = {{SMT} Solving for Arithmetic Theories: Theory and Tool Support},
  year      = {2017},
  editor    = {Tudor Jebelean and Viorel Negru and Dana Petcu and Daniela Zaharie and Tetsuo Ida and Stephen M. Watt},
  pages     = {1--8},
  publisher = {{IEEE} Computer Society},
  doi       = {10.1109/SYNASC.2017.00009},
  url       = {https://doi.org/10.1109/SYNASC.2017.00009},
}

@Book{walsh2009smt,
  editor    = {Armin Biere and Marijn Heule and Hans van Maaren and Toby Walsh},
  publisher = {{IOS} Press},
  title     = {Handbook of Satisfiability},
  year      = {2009},
  isbn      = {978-1-58603-929-5},
  series    = {Frontiers in Artificial Intelligence and Applications},
  volume    = {185},
}
This chapter presents the implementation of a (sub-\gls{scc}) partial evaluation with property-based abstraction as described in \Cref{ch:theory}. In \Sref{sec:implementation} the implementation is described using pseudo-code, focusing mostly on the practical aspects of the partial evaluation. \Sref{sec:abstract_domain} briefly discusses the possible abstract domains and the reasons why we decided to use the abstract domain of polyhedra. \Sref{sec:loop_heads} focuses on the different possibilities to mark locations for abstraction, which is part of the history-dependent aspect of the abstraction layer. \Sref{sec:technical_details} describes the technical details of the actual implementation, and finally in \Sref{sec:performance} the correctness and performance of the implementation are evaluated in practice.
Recall Definition \ref{def:runtime_complexity}; the newly constructed refined program shall have the same worst-case expected runtime complexity as the original program, which implies that runtime-complexity bounds found for the refined program are also runtime-complexity bounds for the original one. Naturally, the control-flow refinement will add and replace transitions and locations; hence the number of runtime-complexity bounds and size-bounds of each transition is not preserved. However, we will show that each transition in the refined program is similar to a transition in the original program, and hence every run in the new program has a run of equal runtime-complexity and probability in the original program and vice-versa.
are temporary variables whose values are reassigned by the scheduler on every transition and whose values do not propagate throughout the program.
complex, since the transition taken depends not only on the current state but also on the answer of the scheduler and some random event. Instead of simulating every single possible run, it would be nice to treat \emph{similar} states as one. Instead of simulating a single assignment, the set of all possible assignments after a step is described by a set of constraints. The resulting graph is called a \emph{partial evaluation graph}. It was formalized for \glspl{chc} by \citeauthor{gallagher2019eptcs}~\cite{gallagher2019eptcs}. In this thesis the definitions are adapted to better fit the algorithm and the formalism of \Glspl{pip}.
For any $\varphi\in\Phi_\PV$ and probabilistic update $\eta : \PV \rightarrow \Z[\V\union\D]$, if there exists an $s\in \Sigma$ with $s\models\varphi$ then
\textit{true}\rangle$ is evaluated over the transition $t_1$, resulting in a single new version $\langle \ell_1, \textit{true}\rangle$. The second unfolding step unfolds $\langle \ell_1, \textit{true}\rangle$ over $t_3$ to $\langle \ell_2, x\leq0\rangle$ and over $t_2$ to $\langle \ell_1, x>1\rangle$. $\ell_2$ has no outgoing transitions; hence no new transitions are added. $\langle \ell_1, x>1\rangle$ is unfolded to $\langle \ell_1, x>2\rangle$ over $t_2$ only, since the guard of $t_3$ is not satisfiable from within $x > 1$. The new version is unfolded over and over again, indefinitely.
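This unfolding chain can be mimicked with a minimal one-variable sketch. It is only an illustration: it assumes that $t_2$ has guard $x > 0$ with update $x := x + 1$ and $t_3$ has guard $x \leq 0$, which matches the constraints appearing in the example, and it tracks each version as a lower bound on $x$.

```python
# One-variable sketch of the unfolding chain, assuming (hypothetically)
#   t2: l1 -> l1 with guard x > 0 and update x := x + 1,
#   t3: l1 -> l2 with guard x <= 0 and identity update.
# A version <l1, phi> is represented by a lower bound: None means "true",
# an integer lb means the constraint x >= lb over the integers.

def unfold_t2(lb):
    """Intersect with the guard x > 0 (x >= 1 over Z), then shift by the update."""
    guarded = 1 if lb is None else max(lb, 1)
    return guarded + 1  # after x := x + 1

def t3_enabled(lb):
    """The guard x <= 0 is satisfiable iff the version does not force x >= 1."""
    return lb is None or lb <= 0

version, chain = None, []
for _ in range(3):  # without abstraction this chain never terminates
    version = unfold_t2(version)
    chain.append(version)
print(chain)             # [2, 3, 4], i.e. x > 1, x > 2, x > 3
print(t3_enabled(None))  # True: <l1, true> also unfolds over t3
print(t3_enabled(2))     # False: from x > 1 the exit guard is unsatisfiable
```

The ever-growing lower bounds are exactly the versions $x>1, x>2, \dots$ of the example; an abstraction step is what eventually forces a repeated version and lets the evaluation backtrack.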
\textbf{Claim:} Let $s_0\in\Sigma$ be an initial state. For any finite prefix $\varf \in \fpath_{\Prog'}$ with probability $\prSs(\varf) > 0$ there exists a finite prefix $\varf' \in \fpath_\Prog$ for which
\textbf{Induction start:} $n = 0$. Let $f_0 = c_0 = (\langle \ell_0, \texttt{true}\rangle, t_\text{in}, s_0) \in \fpath_{\Prog'}$, which is the only initial configuration with non-zero probability $\prSs(f_0) = 1$. We set $\varf'_0 = c'_0 = (\ell_0, t_\text{in}, s_0)$, which is by construction the only valid starting configuration of $\Prog$ with $\prSns(c'_0) = 1$.
\textbf{Induction step:} Assume the induction hypothesis holds for $n \in \N$. Let $f_{n+1} = c_0\dots{}c_nc_{n+1} \in \fpath_{\Prog'}$ of length $n+1 \in \N$ be an admissible prefix with probability $\prSs(f_{n+1})>0$; then $f_n = c_0\dots{}c_{n}$ of length $n$ is also admissible with $\prSs(f_n) > 0$. By the induction hypothesis a finite prefix $\varf'_n = c'_0\dots{}c'_{n} \in \fpath_{\Prog}$ of equal length and probability exists. Let $c_{n+1} = (\langle \ell_{n+1}, \_ \rangle, t_{n+1}, s_{n+1}) \in \confs_{\Prog'}$. We set $c'_{n+1} = (\ell_{n+1}, \bar{t}_{n+1}, s_{n+1})$ to be the last configuration of $\varf'_{n+1}=\varf'_{n}c'_{n+1}$. Let $g\in\GTPn$ and $\tilde{s} \in \Sigma$ be the general transition and assignment selected by the scheduler, $\scheduler(c_{n+1}) = (g, \tilde{s})$.\todo{continue here}
$s_0 \in \Sigma$ be any initial state. For any scheduler $\scheduler \in \MDS_{\Prog'}$ let $\scheduler' \in \HDS_\Prog$ be constructed as in Lemma \ref{lem:constrfgeq}. Analogously to Theorem \ref{thm:correctness} the following relation holds:
It is known from the literature that for a Markov decision process the best expected reward using a history-dependent scheduler is the same as the best expected reward using a Markovian scheduler (see Theorem \todo{cite literature} of \cite{puterman1994markov}). Since a run of the program is equivalent to a Markov decision process, as explained in \Sref{ssec:markov}, the theorem applies to the expected value and the following holds:
By Lemma \ref{lem:hdsvsmds}, the use of a history-dependent scheduler does not matter in the supremum, and we get
\sup_{\scheduler \in \HDS_\Prog}(\ESs(\Rt_\Prog)) \leq \sup_{\scheduler \in \HDS_{\Prog}}(\ESs(\Rt_{\Prog'})) \\
new program has a worst-case expected runtime at least as large as the original program. Hence any bounds found for the partial evaluation are bounds for the original program. Theorem \ref{thm:thightness} has additionally shown that the worst-case expected runtime and costs do not get worse with a partial evaluation.

\section{Abstraction}\label{sec:abstraction}
In \Sref{sec:partialevaluation} the abstraction was represented by an oracle that decides if the location is abstracted, and an abstraction function. It was stated that Algorithm \ref{alg:evaluate_abstr} terminates if
\begin{enumerate}
\item the oracle eventually always decides to abstract, and
\item the abstraction has only a finite number of results.
\end{enumerate}
First, we will discuss some possibilities to select the locations for abstraction, and then adapt the property-based abstraction developed by \citeauthor{Domenech19}~\cite{Domenech19} for probabilistic integer programs.
\begin{comment}
pro: easy to implement
contra:
- abstractions are expensive
- abstractions lose precision
open question:
- how to select properties
\end{comment}
The naive approach would be to abstract at every location, as is also stated in \cite{Domenech19}. The advantage is that the implementation is easy. The evaluated program is small and the analysis thereof fast. The disadvantage is that a lot of evaluated control-flow is lost to the abstraction. The property-based abstraction presented in \Sref{sec:propertybasedabstraction} requires $\Theta(n)$ calls to the \gls{smt}-solver, which is computationally expensive. The primary goal of the presented \gls{cfr} via partial evaluation is to evaluate overlapping loops in order to find smaller loops that are easier to analyse. In practice, the control-flow of a loop is mostly linear. Abstracting the linear control-flow could lead to two linear sections merging together which would otherwise stay separated, thus making it harder to find tight bounds.
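To make the cost and finiteness of property-based abstraction concrete, the following toy sketch works over a hypothetical one-variable domain: each constraint $x > c$ is represented by the integer $c$, and the \texttt{entails} check stands in for the \gls{smt} entailment query that a real implementation would issue per property.

```python
# Toy domain: the constraint "x > c" is represented by the integer c,
# and None represents "true". phi entails "x > p" iff phi's bound >= p.
def entails(phi, prop):
    # stand-in for one SMT entailment check
    return phi is not None and phi >= prop

def alpha(phi, properties):
    """Property-based abstraction: keep exactly the entailed properties.
    One entailment check per property -> Theta(n) solver calls."""
    return frozenset(p for p in properties if entails(phi, p))

props = [0, 10]          # the properties x > 0 and x > 10
print(alpha(2, props))   # frozenset({0}): x > 2 entails only x > 0
print(alpha(15, props))  # frozenset({0, 10})
# Only 2^len(props) abstract values exist, which is what forces
# repeated versions and hence termination of the evaluation.
```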
\section{History Dependent Abstractions}
In \Sref{sec:partialevaluation} the details of the abstraction were omitted. Recall Definition \ref{def:abstraction}: an abstraction space must be finite and the abstraction function must map any constraint to elements of this abstraction space. The abstraction function was kept simple in order to keep the proofs concise. In this section we will expand the definition to take the evaluation history into account, which opens the possibility for more refined abstraction functions.
\subsection{Abstract Loop heads}\label{ssec:loopheads}
The second method proposed by \cite{Domenech19} is to abstract only on loop heads. Termination is guaranteed since every infinite path in the evaluation tree of a finite program must visit some loop (and in turn its head) an infinite number of times. With only finitely many possible abstraction results, it is guaranteed that at some point some version repeats and the evaluation backtracks. The exact definition of a loop head is unclear. For the sake of termination it is sufficient that every loop contains at least one (but possibly arbitrary) location marked for abstraction. The intuitive candidate is the head of the loop, since that is the location where most branching will probably occur in practice. For this implementation a loop head is defined as follows.
\begin{definition}
A \textit{loop head} of a loop is the first location of the loop encountered during a depth-first search starting at the start location $\ell_0$.
\begin{definition}[History dependent abstraction]A history dependent abstraction function is a function $\G \rightarrow$
When selecting all loop heads for abstraction, multiple loop heads can end up on a single loop, for example when the loop overlaps with another loop.
\begin{figure}
\input{figures/ch3_loopheads}
\end{figure}
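The loop-head definition can be sketched as a plain depth-first search that records the target of every back edge, i.e. the first location of each loop reached from $\ell_0$. The adjacency-list encoding below is illustrative; the second example graph contains two overlapping loops sharing locations, which yields two loop heads.

```python
def loop_heads(graph, start):
    """Return the targets of all back edges found by a DFS from `start`:
    each is the first location of its loop encountered during the search."""
    heads, visited, on_stack = set(), set(), set()

    def dfs(node):
        visited.add(node)
        on_stack.add(node)
        for succ in graph.get(node, []):
            if succ in on_stack:      # back edge: succ was reached first
                heads.add(succ)
            elif succ not in visited:
                dfs(succ)
        on_stack.discard(node)

    dfs(start)
    return heads

# Overlapping loops l1->l2->l1 and l2->l3->l2: both l1 and l2 become heads.
g = {"l0": ["l1"], "l1": ["l2"], "l2": ["l1", "l3"], "l3": ["l2", "l4"]}
print(loop_heads(g, "l0"))  # {'l1', 'l2'}
```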
In this section we will present the property-based abstraction proposed by \citeauthor{Domenech19}~\cite{Domenech19} for non-probabilistic programs, which can be applied equally to probabilistic programs.
(see examples \todo{} \todo{}). Detecting loops and loop heads is further discussed in \Sref{sec:findingloops}.\todo{prove upper bound on program size}
% Since
% In addition control-flow refinement was introduced to unroll loops in such a way
% that hidden branching is resolved. The heuristics presented by \cite{Domenech19}
% are focused on loops and the properties are selected for the loop heads.
% Constructing a heuristic that is well suited for every location in the program
% is not obvious and is left open for future work.
\subsection{Property based abstraction}\label{sec:propertybasedabstraction}
\section{Sub-\gls{scc} level evaluation}
For the sub-\gls{scc} level evaluation, Algorithm \ref{alg:abstr_eval} is adapted to stop the evaluation when leaving the component, and instead of starting only from the initial location of the program, the evaluation starts from every entry transition to the component. The exact algorithm is displayed in Algorithm \ref{alg:eval_component}.
\begin{algorithm}
\caption{Evaluate the component $S \subseteq \T_\Prog$ of a program $\Prog$ with abstraction $\alpha$.\label{alg:eval_component}}
\KwData{A \gls{pip} $\Prog$, a component $S$, an abstraction function $\alpha$, and an abstraction oracle $S_\alpha$}
\KwResult{A graph $G$ and a set of pairwise disjoint general transitions $\GTG \subset \GT_\text{PE}$}
$\ET \leftarrow \Tout(S)$\;
\SetKwFunction{FEvaluate}{evaluate}
\SetKwProg{Fn}{Function}{:}{end}
$\GTG \leftarrow \emptyset$\;
\Fn{\FEvaluate{$G$, $v$}}{
$\langle \ell, \varphi\rangle \leftarrow v$\;
\For{$g \in \GTout(\ell)$} {
$g' \leftarrow \unfold^*(v, g)$\label{alg:abstr_line_unfold_g}\;
\If {$g' \neq \emptyset$ \label{alg:abstr_line_filteremptyg}}{
$g^* \leftarrow \emptyset$\;
\For{$t = (v, p, \tau, \eta, v') \in g'$ with $v' = \langle \ell', \varphi'\rangle$}{
\uIf {$g \in \ET$} {
$t \leftarrow (v, p, \tau, \eta, \langle \ell',\texttt{true}\rangle)$\label{alg:abstr_line_exit}\;
}
\ElseIf {$S_\alpha$} {
$t \leftarrow (v, p, \tau, \eta, \langle \ell',\alpha(\varphi')\rangle)$\label{alg:abstr_line_abstraction}\;
}
$g^* \leftarrow g^* + t$\label{alg:abstr_line_addtog}\;
\uIf {$v' \not\in V(G)$}{
$G \leftarrow G + t$\;
$G \leftarrow \FEvaluate{G, v'}$\;
}
\Else {
$G \leftarrow G + t$\label{alg:abstr_line_addt}\;
}
}
$\GTG \leftarrow \GTG + g^*$\;
}
}
\Return {$G$}
}
$G \leftarrow \text{graph of } \Prog \text{ lifted to a partial evaluation graph}$\;
$G \leftarrow G - S$\;
\For{$t = (\ell, p, \tau, \eta, \ell')\in \Tin(S)$} {
$G \leftarrow \FEvaluate{$G$, $\langle\ell',\texttt{true}\rangle$}$\;
}
\Return $G, \GTG$
\end{algorithm}
The partial evaluation starts at every entry transition $\Tin(S)$ of the component. When unfolding an exit transition $t = (\ell, \_, \_, \_, \ell') \in \Tout(S)$ from a version $\langle \ell, \varphi\rangle$ to $\unfold^*(\langle \ell, \varphi\rangle, t) = \{(\langle\ell,\varphi\rangle, \_, \_, \_, \langle\ell', \varphi'\rangle)\}$, the target version $\langle\ell',\varphi'\rangle$ is replaced by the trivial overapproximation $\langle\ell',\texttt{true}\rangle$. This target is by construction always already contained in the program, and the algorithm always backtracks.

Let us consider an admissible finite prefix $f \in \fpath_\Prog$ with
\begin{align*}
f &= c_0c_1\dots{}c_k\dots{}c_m\dots{}c_n \\
&= (\ell_0,t_\text{in},s_0)(\ell_1,t_1,s_1)\dots\underbrace{(\ell_k,t_k,s_k)\dots(\ell_m,t_m,s_m)}_{t_k,\dots,t_m\in S}\dots(\ell_n,t_n,s_n)
\end{align*}
with an infix $c_k\dots{}c_m$ going through the component. By construction, up to the component the partial evaluation $\Prog'$ contains a similar prefix $(\langle\ell_0,\texttt{true}\rangle,t_\text{in},s_0)(\langle\ell_1,\texttt{true}\rangle,t_1,s_1)\dots(\langle\ell_{k-1},\texttt{true}\rangle,t_{k-1},s_{k-1}) \in \fpath_{\Prog'}$. The component is entered via the transition $t_k\in\Tin(S)$, which was used as a starting point for the evaluation with the trivial version $\langle\ell_{k-1}, \texttt{true}\rangle$ in Algorithm \ref{alg:eval_component}.
By the same argument as in Theorem \ref{thm:correctness}, a similar prefix
$(\langle\ell_0,\texttt{true}\rangle,t_\text{in},s_0)(\langle\ell_1,\texttt{true}\rangle,t_1,s_1)\dots(\langle\ell_{k-1},\texttt{true}\rangle,t_{k-1},s_{k-1})(\langle\ell_k,\varphi_k\rangle,t_k,s_k)\dots(\langle\ell_{m-1},\varphi_{m-1}\rangle,t_{m-1},s_{m-1}) \in \fpath_{\Prog'}$
through the component exists in $\Prog'$. The exit transition $t_m$ must have been unfolded before backtracking, and afterwards the similar prefix trivially follows the lifted versions again, yielding a similar finite prefix
$(\langle\ell_0,\texttt{true}\rangle,t_\text{in},s_0)(\langle\ell_1,\texttt{true}\rangle,t_1,s_1)\dots(\langle\ell_{k-1},\texttt{true}\rangle,t_{k-1},s_{k-1})(\langle\ell_k,\varphi_k\rangle,t_k,s_k)\dots(\langle\ell_{m-1},\varphi_{m-1}\rangle,t_{m-1},s_{m-1})(\langle\ell_m,\texttt{true}\rangle,t_m,s_m)\dots(\langle\ell_n,\texttt{true}\rangle,t_n,s_n) \in \fpath_{\Prog'}$.
For any finite prefix visiting the component more than once, the similar finite prefix of $\Prog'$ is found by applying the argument above multiple times.
\section{Property-based Abstraction}
\citeauthor{Domenech19} described the property-based abstraction that will be used for this algorithm. The idea is the following: the solution space is cut into pieces by a finite number of constraints, the so-called properties. Then the abstraction computes which of those properties are entailed by the version. Those are chosen as the abstract representation of the version. Since the number of properties is finite, the powerset of properties is finite as well, and hence so is the number of all possible abstractions. Also, the abstraction is entailed by the version. \todo{rewrite}
\begin{definition}[Property based abstraction].\todo{definition}
\end{definition}
\begin{example}
\todo{picture with properties}
\end{example}
Every abstraction is expensive, because it requires numerous entailment checks (depending on the number of properties), which each require a call to an \gls{smt}-solver.
The more properties are chosen, the more precise the abstraction gets, but the more expensive the computation is. Choosing the relevant properties for the loops is done heuristically and will be described in the following subsection.
\begin{comment}
motivate abstraction
\end{comment}
Selecting the locations to abstract is crucial for a good implementation. From a theoretical point of view the selection of locations to abstract guarantees termination of the control-flow refinement. Yet, with every abstraction the refined program loses precision compared to the evaluated control-flow. In the following, some different possibilities to select the locations marked for abstraction are discussed. Finding loops in a program can be done with the algorithm described by \citeauthor{johnson1975}~\cite{johnson1975}. The algorithm is linear in the size of the graph per cycle, specifically $\mathcal{O}((n+e)(c+1))$ for $n$ nodes, $e$ edges, and $c$ cycles.
\subsection{Feedback Vertex Set}
As stated in the previous section \Sref{ssec:loopheads}, it is sufficient to find only a set of locations that cover every loop in the program. The problem is one of Karp's 21 NP-complete problems, commonly called \textit{Feedback Vertex Set} or \textit{Feedback Node Set} \cite{karp2010reducibility}.
\begin{definition}[Feedback Vertex Set Problem]
Given a directed graph $G = (V, E)$, find the minimal subset $R \subseteq V$ such that every (directed) cycle of $G$ contains a node in $R$.
\end{definition}
Detecting all loops (cycles) in a graph is easily done with a \gls{dfs} on the graph, and the problem can instead be formulated as a hitting set problem.
\begin{definition}[Hitting Set Problem]
Given a finite collection of finite sets $S = \{S_1, \dots, S_n\}$, where each $S_i \subseteq U$ for some set $U$, find a minimal set $R$ such that $R\intersect S_i \neq \emptyset$ for all $i = 1,\dots,n$.
\end{definition}
A hitting set problem can be formulated as an \gls{ilp} and solved by readily available \gls{smt} solvers.
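The hitting-set view can be illustrated with a small brute-force sketch over the detected loops (each loop given as its set of locations); an actual implementation would encode the problem as an \gls{ilp} for an \gls{smt} or \gls{ilp} solver instead of enumerating subsets.

```python
from itertools import combinations

def feedback_vertex_set(loops):
    """Smallest set of locations hitting every loop, by exhaustive search
    over candidate sets of increasing size. Exponential in general; a real
    implementation would hand the equivalent ILP to a solver."""
    universe = sorted(set().union(*loops))
    for k in range(len(universe) + 1):
        for cand in combinations(universe, k):
            if all(set(cand) & set(loop) for loop in loops):
                return set(cand)

# Three pairwise-overlapping loops: no single location hits all of them.
loops = [["l1", "l2"], ["l2", "l3"], ["l1", "l3"]]
print(feedback_vertex_set(loops))  # a minimum hitting set of size 2
```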
Recall Definition \ref{def:loop}. In the following, the \gls{ilp} used for finding the feedback vertex set of an integer program with locations $L$ and a total of $N$ loops $a_1, \dots, a_N$ is constructed. It is assumed that all loops have been found by a \gls{dfs} as described in \Sref{sec:findingloops}.
\begin{mini!}{}{\sum_{i=1}^{n} x_i\label{eq:lip1-goal}}{\label{eq:ilp-vfs}}{}
\addConstraint{\sum_{i : l_i \in \text{Loc}(a_j)} x_i}{\geq 1,\protect\quad\label{eq:lip1-hit}}{j = 1,\dots,N}
\addConstraint{ x_i }{\leq 1,\protect\quad\label{eq:ilp-leq1}}{i = 1,\dots,n}
\addConstraint{ x_i }{\geq 0,\protect\quad\label{eq:ilp-geq0}}{i = 1,\dots,n}
\end{mini!}
Every variable $x_i$ of the \gls{ilp} \ref{eq:ilp-vfs} indicates if a location is part of the hitting set, in which case the variable is assigned the value 1, or not, in which case the variable is assigned the value 0. The constraints \ref{eq:lip1-hit} guarantee that every loop is \enquote{hit}, while \ref{eq:ilp-leq1} and \ref{eq:ilp-geq0} restrict every variable to the values 0 and 1. The optimization goal \ref{eq:lip1-goal} is to select the fewest locations possible. Computing the feedback vertex set of a program is independent of the partial evaluation and can be done a-priori. It is thus considered to be an offline method. It shall be noted that the locations inside the feedback vertex set do not need to be the closest locations to the start location of a loop. For the determination of properties it might be necessary to rotate the loop so that the marked location is the first location in the loop.\todo{prove upper bound on program size}
\subsection{$k$-encounters}
We have seen two offline strategies to select locations for abstraction. In the following we will introduce a technique that decides on abstraction during the partial evaluation, using the path evaluated. The technique counts the number of occurrences on the path for every location in the program.
Instead of abstracting on the first re-occurrence, one can decide on a parameter $k \in \N$, resulting in the first $k$ re-occurrences being ignored before abstracting. The idea is to unroll loops up to a constant number of times, so that loops with constant bounds $b \leq k$ are perfectly unrolled without any abstraction.
\begin{example}
\todo{example with constantly bounded loop}
\end{example}
\todo{formalize, prove upper bound on program size}
\begin{comment}
technique:
- online
- dijkstra during traversal
pro:
- takes unreachable loops into account
- allows for some constantly bounded loops to be perfectly unrolled.
contra:
- hard to implement
- hard to reason on
\end{comment}
\section{Choosing properties}
\section{Correctness}
\begin{definition}[Equivalent \glspl{pip}]
Let $\Prog_1,\Prog_2$ be \glspl{pip} with $\Prog_1 = (\PV, \Loc_1, \GT_1, l_0)$, $\Prog_2 = (\PV, \Loc_2, \GT_2, l_0)$ and their respective probability spaces $(\runs_1, \F_1, \PrSs^1)$ and $(\runs_2, \F_2, \PrSs^2)$. Moreover, $\T_1$ and $\T_2$ denote the sets of transitions of $\Prog_1$ and $\Prog_2$; $\fpath_1$ and $\fpath_2$ the sets of finite paths, etc. The two programs are equivalent if the two random variables $\Rt_{\Prog_1}$ and $\Rt_{\Prog_2}$ have the same probability distributions $\mu_1$ and $\mu_2$ with $\mu_1(k) = \mu_2(k)$ for all $k\in\overline{\N}$.
\end{definition}
\begin{theorem}
If $\Prog_1$ and $\Prog_2$ are equivalent then:
\begin{enumerate}
\item A total runtime-complexity bound for $\Prog_1$ is a total runtime-complexity bound for $\Prog_2$ and vice-versa.
\item $\Prog_1$ and $\Prog_2$ have the same expected runtime-complexity.
\end{enumerate}
\proof{Let $\RB_P$}
\end{theorem}
\begin{rem}
The probability distribution does not depend on the underlying probability space, making it the perfect tool to compare the semantics of two different \glspl{pip}.
\end{rem}
We will show that the partial evaluation $\Prog'$ is equivalent to the original \gls{pip} $\Prog$.
In a first step we will show that for every admissible finite prefix in $\Prog$ there exists an admissible finite prefix in $\Prog'$ with the same probability and intermediate states, and vice versa. This in turn will be expanded to all admissible runs, and finally we will show that this proves an equal probability distribution of the total runtime complexity.
\subsection{Considering probability}
So far in this section, the fact that programs are probabilistic has been completely ignored. One could wonder if loops with zero probability should be removed in order to further shrink the problem. We will assume that all transitions with zero probability have been removed before the analysis. All other transitions with a non-zero probability are walkable in theory and will be unfolded by the partial evaluation. Hence they must be considered for the set of abstracted locations in order to guarantee termination of the algorithm. If a path results in probability zero, it can be filtered during the partial evaluation.
\section{Partial evaluation of \glspl{scc}}
During \gls{koat}'s analysis the program is split into \glspl{scc}. Bounds are found for every \gls{scc} separately and then composed into an overall bound at the end. Control-flow refinement is only needed for those \glspl{scc} where the analysis fails to find tight bounds. In practice, linear bounds are considered tight enough, whereas polynomial and exponential complexity bounds warrant refinement in the hope of tighter bounds. The previously presented control-flow refinement starts at the start location and evaluates the whole program. The size of the refined program is limited by the number of locations and transitions, as well as the number of properties on each abstracted location. The smaller the initial program, the faster the control-flow refinement, the refined program, and the subsequent analysis. Instead of evaluating the whole program, one can adapt the partial evaluation to only evaluate a single \gls{scc} of the program.
This section presents the partial evaluation of an \gls{scc}. The result is again an entire program, where only the transitions of the \gls{scc} and the incoming and outgoing transitions of this \gls{scc} are changed. In a first step the program is copied as a whole, lifted to a trivial evaluation graph, and the \gls{scc} as well as all incoming and outgoing transitions are removed from the copy. Then the evaluation algorithm is executed from every incoming transition, but with the additional condition to backtrack after unfolding an exiting transition. The new transitions are added into the copied evaluation graph, and therefore progress made by earlier evaluations of entry transitions can be reused, shortening the evaluation of succeeding incoming transitions. Finally, the evaluation graph is converted back to a \gls{pip} and returned as the result of the partial evaluation of the \gls{scc}.
\begin{figure}[h]
\centering
\begin{subcaptionblock}[t]{0.45\textwidth}
\centering
\input{figures/ch4_scc_prog}
\caption{The original program}
\end{subcaptionblock}
\begin{subcaptionblock}[t]{0.45\textwidth}
\centering
\input{figures/ch4_scc_lift}
\caption{Copy and lift}
\end{subcaptionblock}
\begin{subcaptionblock}[t]{0.45\textwidth}
\centering
\input{figures/ch4_scc_remove}
\caption{Remove SCC}
\end{subcaptionblock}
\begin{subcaptionblock}[t]{0.45\textwidth}
\centering
\input{figures/ch4_scc_evaluate}
\caption{Evaluate entry transitions}
\end{subcaptionblock}
\caption{Visualization of partial evaluation for a single SCC\label{fig:evaluate_scc}}
\end{figure}
\begin{algorithm}
\caption{$\text{evaluate}_\text{SCC}(P, A)$}
\KwData{A \gls{pip} $P$, an \gls{scc} $A\subset\T$}
\KwResult{A graph $G$}
$\Tin \leftarrow \Tin(A)$\;
$\Tout \leftarrow \Tout(A)$\;
\SetKwFunction{FEvaluate}{evaluate}
\SetKwProg{Fn}{Function}{:}{end}
\Fn{\FEvaluate{$G$, $v$}}{
$N \leftarrow \unfold^*(v)$\;
\For{$t = (v, \tau, \eta, p, v') \in N$}{
\uIf {$(l_v, \tau, \eta, p, l_{v'}) \in \Tout$}{
$G \leftarrow G + (v, \tau, \eta, p, \langle l_{v'}, \textit{true}\rangle)$\;
}
\uElseIf {$v' \not\in G$}{
$G \leftarrow G + t$\;
$G \leftarrow \FEvaluate{$G$, $v'$}$\;
}
\Else {
$G \leftarrow G + t$\;
}
}
\Return {$G$}
}
$G \leftarrow \text{lift}(P - A - \Tin - \Tout)$\;
\For{$t_i = (l_i, \_, \_, \_, l_i') \in \Tin$}{
$G \leftarrow \FEvaluate{$G$, $\langle l_i', \textit{true}\rangle$}$\;
}
\Return {$G$}\;
\end{algorithm}
\begin{comment}
- probability is not important for finding loops.
- all transitions that exist have a non-zero probability and hence are walkable in theory
\end{comment}
\begin{comment}
termination:
- assume the abstraction method cuts every cycle
- only finitely many abstractions exist
-> the partial evaluation graph is finite
-> the algorithm terminates
correctness:
- the abstraction is entailed by the state
- every walkable path in the program/scc exists in the partial evaluation graph
\end{comment}
\section{Refining invariants}
\begin{comment}
because we use polyhedra instead of boxes, our invariants are probably tighter than the original ones. When this is the case, replace them in the program.
\end{comment}
\todo{continue here}
% Let $\mathfrak{I} = ()$ be the
% structure of
% \begin{definition}[Satisfiability\cite{barwise1977}]\label{def:satisfiability}
%   For a domain $A$ and a structure $\mathfrak{A} = (A, +, \cdot, 0, 1, \leq,
%   <, =)$ we say a variable assignment $\beta: \V \rightarrow A$
%   \textit{satisfies} the formula $\phi \in \Phi$ when $\mathfrak{A}, \beta
%   \models \phi$.
%   $\phi$ is \textit{satisfiable} when an assignment $\beta: \V \rightarrow
%   A$ exists such that $\mathfrak{A},\beta\models \phi$. When $\phi$ is
%   satisfiable for all assignments, one writes $\mathfrak{A} \models \phi$.
% \end{definition}
\gls{fo} logic~\cite{barwise1977}. In this thesis we will only consider integer arithmetic. Let $\Zs := (\Z, +, \cdot, 0, 1, \leq, <, =)$ be the structure of standard integer arithmetic. For an assignment $\beta : \V \rightarrow \Z$ and a formula $\varphi \in \Phi$ we write $\beta \models \varphi$ instead of $\Zs, \beta \models \varphi$. Finally, we define the two trivial formulas $\texttt{true} = (0 = 0)$ and $\texttt{false} = (1 < 0)$.
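To make the satisfaction relation concrete, the following is a minimal sketch (not part of the thesis formalism) that evaluates whether an assignment $\beta$ satisfies a quantifier-free formula over $\Zs$; formulas are represented simply as Python predicates over the assignment, and the example formula is an arbitrary illustration.

```python
def satisfies(beta, phi):
    """Check Z, beta |= phi for a formula given as a predicate on assignments."""
    return phi(beta)

# Example formula: (x + 1 <= y) /\ (y < 2x)  (illustrative, not from the thesis)
phi = lambda b: b["x"] + 1 <= b["y"] and b["y"] < 2 * b["x"]

# The trivial formulas true = (0 = 0) and false = (1 < 0):
true_ = lambda b: 0 == 0
false_ = lambda b: 1 < 0

beta = {"x": 3, "y": 4}
print(satisfies(beta, phi))     # -> True  (4 <= 4 and 4 < 6)
print(satisfies(beta, true_))   # -> True
print(satisfies(beta, false_))  # -> False
```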
valid variable assignments for a given formula is part of \gls{smt} and is a well-studied field in computer science~\cite{walsh2009smt,abraham17smt}. The theories relevant for this thesis are quantifier-free linear integer arithmetic (usually called \texttt{QF\_LIA}) and quantifier-free non-linear integer arithmetic (\texttt{QF\_NIA}).
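As a toy illustration of what an \gls{smt} solver decides for these theories, the sketch below searches for a satisfying assignment by enumerating a bounded integer range; the formulas are arbitrary examples, and real solvers use dedicated decision procedures rather than enumeration.

```python
from itertools import product

def find_model(phi, variables, lo=-10, hi=10):
    """Return some assignment over [lo, hi]^n satisfying phi, or None."""
    for values in product(range(lo, hi + 1), repeat=len(variables)):
        beta = dict(zip(variables, values))
        if phi(beta):
            return beta
    return None

# A QF_LIA-style formula: 2x + 3y = 12 /\ x > 0
lia = lambda b: 2 * b["x"] + 3 * b["y"] == 12 and b["x"] > 0
# A QF_NIA-style formula (non-linear term x*y): x * y = 6 /\ x < y
nia = lambda b: b["x"] * b["y"] == 6 and b["x"] < b["y"]

print(find_model(lia, ["x", "y"]))  # first model in search order
print(find_model(nia, ["x", "y"]))  # first model in search order
```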
We use a graphical representation of programs, as shown in \fref{fig:ex_pip_nd}. Updates assign a value to every program variable; however, for better readability, updates that preserve the value of a program variable are omitted.
transition $t_2$ contains an update with a temporary variable $u$. Furthermore, the temporary variable is constrained to $1 \leq u \leq 3$. The value of the temporary variable $u$ in the state $s: \PV \rightarrow \Z$ is ignored; it can take an arbitrary value for the purpose of fulfilling the guard and when being used in the update. Whenever the temporary variable is sampled to a value not satisfying the guard of $t_2$ while $x$ is larger than zero, the program terminates.
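The role of the temporary variable can be sketched as a single program step. Only the guard $1 \leq u \leq 3$ is taken from the text; the guard on $x$ and the update $x := x - u$ are hypothetical stand-ins for the transition in the figure.

```python
def step_t2(x, u):
    """One step of a transition with a temporary variable u.

    Guard: x > 0 and 1 <= u <= 3 (the bound on u is from the text,
    the rest is a hypothetical example). Update: x := x - u.
    """
    if x > 0 and 1 <= u <= 3:
        return x - u   # the update may use the sampled temporary variable
    return None        # guard violated: the transition is not walkable

# The scheduler may pick any integer for u; only some choices satisfy the guard.
print(step_t2(5, 2))   # -> 3 (guard satisfied)
print(step_t2(5, 7))   # -> None (u outside [1, 3]; the program terminates)
```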
For a \gls{pip} one can define a probability space $(\runs_\Prog, \F, \PrSs)$, where the outcomes of the probabilistic experiment are the runs of the program. Every distinct run is measurable; hence the $\sigma$-algebra $\F$ is defined to contain every set $\{r\}$ for $r \in \runs_\Prog$. The probability measure $\PrSs$ describes the probability of a given run for a scheduler $\scheduler$ and the input given to the program as the initial state $s_0$. Formally, the probability space is defined by a cylindrical construction expanding the length of finite prefixes up to infinity. For the detailed construction we refer to \citeauthor{meyer2021tacas}~\cite{meyer2021tacas}.
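Quantities over this probability space can be approximated by sampling runs. The sketch below simulates a simple probabilistic loop (while $x > 0$, decrement $x$ with probability $\nicefrac{1}{2}$) and estimates its expected runtime; this is only a simulation illustration, not the cylindrical construction itself, and the example program is not from the thesis.

```python
import random

def run_length(x0, rng):
    """Number of transition steps of one sampled run until termination."""
    x, steps = x0, 0
    while x > 0:
        if rng.random() < 0.5:  # the probabilistic branch of the update
            x -= 1
        steps += 1
    return steps

rng = random.Random(42)  # fixed seed for reproducibility
samples = [run_length(5, rng) for _ in range(10_000)]
print(sum(samples) / len(samples))  # close to the expected runtime 2 * 5 = 10
```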
variable $x\in\PV$ is updated to the value $s'(x)$ by the probabilistic update -- recall Definition~\ref{def:prob_update} -- and third, that the temporary variables are sampled by the scheduler to match the target state. The result is the
% \begin{definition}[Reachability]\label{def:reachability}
%   Let $l, l' \in \Loc_\Prog$, $t,t' \in \T_\Prog$, and $s,s' \in \Sigma$. A
%   configuration $c' = (l', t', s')$ is \textit{reachable} from $c = (l, t, s)$
%   if and only if there exists a scheduler $\scheduler \in \MDS$ and a starting
%   state $s_0 \in \Sigma$ for which $\prSs(c \rightarrow c') > 0$.
% \end{definition}

% Let $l, l' \in \Loc_\Prog$, $t,t' \in \T_\Prog$, $s,s' \in \Sigma$, and $c'
% = (l', t', s')$ as well as $c = (l, t, s)$. Let $t = (\_, \_, \tau, \eta,
% \_)$. When $c'$ is reachable from $c$ this implies:
% \begin{enumerate}
%   \item $s \models \tau$
% \end{enumerate}
% \begin{proof}
% \end{proof}

% \begin{lemma}
%   If a
% \end{lemma}
The primary goal of \gls{koat}'s analysis is to find bounds for the runtime complexity. A bound is an expression that describes an upper bound on the cost or runtime complexity of the program.
$\Prog$ with locations $V(\S) = \set{\ell, \ell'}{(\ell, \_, \_, \_, \ell') \in \S}$. A component $\S \subseteq \T_\Prog$ is \emph{strongly connected} when for every two locations $\ell, \ell' \in V(\S)$ there exists a path between the two using only the transitions from $\S$.
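The definition above can be sketched directly in code: reduce transitions to (source, target) pairs and check that every location of $V(\S)$ reaches every other using only transitions in $\S$. This is an illustrative check, not the algorithm \gls{koat} uses to compute \glspl{scc}.

```python
def locations(S):
    """V(S): all source and target locations of the transitions in S."""
    return {l for (l, lp) in S} | {lp for (l, lp) in S}

def reachable(src, S):
    """All locations reachable from src using only transitions in S."""
    seen, stack = {src}, [src]
    while stack:
        l = stack.pop()
        for (a, b) in S:
            if a == l and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen

def strongly_connected(S):
    """A component S is strongly connected iff every location reaches all of V(S)."""
    V = locations(S)
    return all(V <= reachable(l, S) for l in V)

loop = {("l1", "l2"), ("l2", "l1")}    # l1 <-> l2
chain = {("l1", "l2"), ("l2", "l3")}   # no path back from l3
print(strongly_connected(loop))   # -> True
print(strongly_connected(chain))  # -> False
```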
strongly related to the search for upper and lower runtime complexity bounds, which is equally undecidable. Nevertheless, the answers to those questions are very important in practice. For example, a compiler might want to warn the programmer about faulty code sections that were not explicitly marked to run indefinitely, or a developer could be required to prove the efficiency of their program in critical scenarios. In an ever more complex world, with ever more complex algorithms, the need for automatic tools arose. Even though the question of runtime complexity cannot be answered in the general case, many tools were developed to automatically analyze the complexity of various programming paradigms as tightly and quickly as possible~\cite{giesl2017aprove,montoya2014aplas,alarcon2017muterm,irankfinder2018wst}.