@InCollection{karp2010reducibility,
  author    = {Richard M. Karp},
  title     = {Reducibility Among Combinatorial Problems},
  booktitle = {50 Years of Integer Programming 1958-2008 - From the Early Years to the State-of-the-Art},
  publisher = {Springer},
  year      = {2010},
  editor    = {Michael J{\"{u}}nger and Thomas M. Liebling and Denis Naddef and George L. Nemhauser and William R. Pulleyblank and Gerhard Reinelt and Giovanni Rinaldi and Laurence A. Wolsey},
  pages     = {219--241},
  doi       = {10.1007/978-3-540-68279-0_8},
}
% TeX root = ../main.tex
\todo{unfinished}
\begin{tikzpicture}[program]
  \node[state,initial] (l0) {$l_0$};
  \node[state] (l1) [below=of l0] {$l_1$};
  \node[state] (l2) [right=of l1] {$l_2$};
  \node[state] (l3) [right=of l2] {$l_3$};
  \node[state] (l4) [below=of l1] {$l_4$};
  \path[->]
    (l0) edge node[swap] {$\textit{true}$} (l1)
    (l1) edge node[swap] {$x > 0$} (l2)
         edge node[swap] {$x \leq 0$} (l4)
    (l2) edge node[swap] {$y > 0$} (l3)
         edge node[swap] {$\begin{aligned}
             y &\leq 0 \\
             x &:= x-1 \\
             z &:= z+2
           \end{aligned}$} (l1)
    (l3) edge [loop right] node[swap] {$\begin{aligned}
             %x &:= x+y \\
             y &:= y-1
           \end{aligned}$} (l3);
\end{tikzpicture}
Selecting the locations to abstract is crucial for a good implementation. From a
theoretical point of view, the selection of locations to abstract guarantees
termination of the control-flow refinement. Yet, with every abstraction the
refined program loses precision compared to the evaluated control-flow. In the
following, different possibilities to select the locations marked for
abstraction are discussed.
\begin{comment}
technique:
- offline
- dijkstra
- FVS
pro:
- probably close to optimal
- heuristic from domenech19
- unclear how to propagate for overlapping loops.
contra:
- FVS is NP-complete [citation needed]
- possibly cheaper for SCC instead of whole program
- unclear what a loop-head is.
\end{comment}
The naive solution would be to abstract at every location, as is also stated
in \cite{Domenech19}. The advantage is that the implementation is easy, and the
resulting program and its analysis are quick. The disadvantage is that a lot of
evaluated control-flow is lost to the abstraction. Every abstraction requires
$\Theta(n)$ calls to the SMT-solver, which is computationally expensive. In
addition, control-flow refinement was introduced to unroll loops in such a way
that hidden branching is resolved. The heuristics presented by \cite{Domenech19}
are focused on loops, and the properties are selected for the loop heads.
Constructing a heuristic that is well suited for every location in the program
is not obvious and is left open for future work.

\subsection{Abstract Loop heads}\label{ssec:loopheads}

The second method proposed by \cite{Domenech19} is to abstract only on loop
heads. Termination is guaranteed since every infinite path in the evaluation
tree of a finite program must visit some loop (and in turn its head) an infinite
number of times. With only finitely many abstraction candidates (versions), it
is guaranteed that at some point some version repeats and the evaluation
backtracks. The exact definition of a loop head is unclear. For the sake of
termination it is sufficient that every loop contains at least one (but possibly
arbitrary) location marked for abstraction.
The intuitive candidate is the head of the loop, since that is the location
where most branching will probably occur in practice. For this implementation a
loop head is defined as follows.
\begin{definition}
    The \textit{loop heads} of a loop are the entrance locations to the loop
    encountered during a depth-first search starting at the start location $l_0$.
\end{definition}
A loop can have multiple loop heads when it can be entered from different
locations, or when it overlaps with other loops (see examples \todo{} \todo{}).
Detecting loops and loop heads is further discussed in \Sref{sec:findingloops}.
\todo{prove upper bound on program size}

\subsection{Feedback Vertex Set}

As stated in the previous section \Sref{ssec:loopheads}, it is sufficient to
find a set of locations that covers every loop in the program. This problem is
one of Karp's 21 NP-complete problems, commonly called \textit{Feedback Vertex
Set} or \textit{Feedback Node Set} \cite{karp2010reducibility}.
\begin{definition}[Feedback Vertex Set Problem]
    Given a directed graph $G = (V, E)$, find a minimal subset $R \subseteq V$
    such that every (directed) cycle of $G$ contains a node in $R$.
\end{definition}
Detecting all loops (cycles) in a graph is easily done by a \gls{dfs} on the
graph, and the problem can instead be formulated as a hitting set problem.
\begin{definition}[Hitting Set Problem]
    Given a finite collection of finite sets $S = \{S_1, \dots, S_n\}$, where
    each $S_i \subseteq U$ for some set $U$, find a minimal set $R$ such that
    $R \intersect S_i \neq \emptyset$ for all $i = 1,\dots,n$.
\end{definition}
A hitting set problem can be formulated as a \gls{lip} and solved by readily
available \gls{smt} solvers. Recall definition \ref{def:loop}. In the following,
the \gls{lip} used for finding the feedback vertex set of an integer program
with locations $L$ and a total of $N$ loops $a_1, \dots, a_N$ is constructed.
It is assumed that all loops have been found by a \gls{dfs} as described in
\Sref{sec:findingloops}.
\begin{mini!}{}{\sum_{i=1}^{n} x_i\label{eq:lip1-goal}}{\label{eq:lip-vfs}}{}
    \addConstraint{\sum_{i : l_i \in \text{Loc}(a_j)} x_i}{\geq 1,\protect\quad\label{eq:lip1-hit}}{j = 1,\dots,N}
    \addConstraint{ x_i }{\leq 1,\protect\quad\label{eq:lip1-leq1}}{i = 1,\dots,n}
    \addConstraint{ x_i }{\geq 0,\protect\quad\label{eq:lip1-geq0}}{i = 1,\dots,n}
\end{mini!}
Every variable $x_i$ of the \gls{lip} \ref{eq:lip-vfs} indicates whether a
location is part of the hitting set, in which case the variable is assigned the
value 1, or not, in which case it is assigned the value 0. Constraint
\ref{eq:lip1-hit} guarantees that every loop is \enquote{hit}, while
\ref{eq:lip1-leq1} and \ref{eq:lip1-geq0} enforce that every location is
selected at most once. The optimization goal \ref{eq:lip1-goal} is to select the
fewest locations possible.

Computing the feedback vertex set of a program is independent of the partial
evaluation and can be done a-priori. It is thus considered to be an offline
method. It shall be noted that the locations in the feedback vertex set need not
be the closest locations to the start of a loop. For the determination of
properties it might be necessary to rotate the loop so that the marked location
is the first location in the loop.
\todo{prove upper bound on program size}
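To illustrate the idea, the following sketch enumerates the cycles of a small control-flow graph and then solves the hitting set problem by brute force over candidate sets of increasing size. The helper names \texttt{simple\_cycles} and \texttt{feedback\_vertex\_set} are hypothetical, the graph encoding (adjacency dictionaries) is an assumption for illustration, and the exponential search stands in for the \gls{lip} above, which an SMT/ILP solver would handle on larger instances.

```python
from itertools import combinations

def simple_cycles(graph):
    """Enumerate the node sets of all simple cycles of a directed graph
    given as an adjacency mapping {node: [successors]}.  Naive DFS
    enumeration: each cycle is found once, rooted at its smallest node."""
    cycles = []

    def dfs(start, node, path, visited):
        for succ in graph.get(node, []):
            if succ == start:
                cycles.append(set(path))
            elif succ not in visited and succ > start:
                dfs(start, succ, path + [succ], visited | {succ})

    for v in graph:
        dfs(v, v, [v], {v})
    return cycles

def feedback_vertex_set(graph):
    """Minimal set of nodes hitting every cycle, found by brute force
    over candidate sets of increasing size.  The LIP in the text encodes
    the same hitting set problem for a solver."""
    cycles = simple_cycles(graph)
    nodes = sorted(graph)
    for k in range(len(nodes) + 1):
        for cand in combinations(nodes, k):
            if all(set(cand) & cycle for cycle in cycles):
                return set(cand)

# The example program: l1 <-> l2 form a loop, l3 has a self-loop.
g = {'l0': ['l1'], 'l1': ['l2', 'l4'], 'l2': ['l1', 'l3'], 'l3': ['l3'], 'l4': []}
```

On this graph the two cycles are $\{l_1, l_2\}$ and $\{l_3\}$, so a minimal feedback vertex set has size two, e.g. $\{l_1, l_3\}$.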
We have seen two offline strategies to select locations for abstraction. In the
following we will introduce a technique that decides on abstraction during the
partial evaluation, using the path evaluated so far. The technique counts the
number of occurrences on the path for every location in the program. Instead of
abstracting on the first re-occurrence, one can decide on a parameter $k \in
\N$, causing the first $k$ re-occurrences to be ignored before abstracting. The
idea is to unroll loops up to a constant number of times, so that loops with
constant bounds $b \leq k$ are perfectly unrolled without any abstraction.
\begin{example}\todo{example with constantly bounded loop}\end{example}
\todo{formalize, prove upper bound on program size}
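The counting decision can be sketched as a small online check. The function name \texttt{should\_abstract} and the representation of the path as a plain list of visited locations are illustrative assumptions, not the implementation:

```python
from collections import Counter

def should_abstract(path, location, k):
    """Online heuristic sketch: given the locations visited on the
    current path (including the current visit), ignore the first k
    re-occurrences of `location` and abstract on the (k+1)-th."""
    reoccurrences = Counter(path)[location] - 1
    return reoccurrences >= k + 1

# A path that unrolls the loop over l1 twice (two re-occurrences of l1):
path = ['l0', 'l1', 'l2', 'l1', 'l2', 'l1']
```

With $k = 1$ the first re-occurrence of $l_1$ is ignored and the second triggers abstraction; with $k = 2$ both re-occurrences are still evaluated exactly.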
Throughout this section, the fact that programs are probabilistic has been
completely ignored. One could wonder if transitions with zero probability should
be removed in order to further shrink the problem. We will assume that all
transitions with zero probability have been removed before the analysis. All
other transitions, with non-zero probability, can be taken in theory and will be
unfolded by the partial evaluation. Hence they must be considered for the set of
abstracted locations in order to guarantee termination of the algorithm. If a
path should result in a zero probability, it can be filtered during partial
evaluation.
Let $\Zs := (\Z, +, \cdot, 0, 1, <)$ be the structure of ordered standard
integer arithmetic. Let $\V = \{v_1, \dots, v_n\}$ be a finite set of variables.
\begin{definition}[Constraint]
    For a given finite set of variables $\V$, the set of \textit{linear}
    constraints $\L$ over the variables $\V$ is defined as the set $\set{a_0 +
    \sum_{i=1}^{|\V|} a_i v_i < 0}{a_i \in \Z}$. The set of
    \textit{polynomial} constraints $\C$ is defined as the set $\set{e < 0}{e
    \in \Z[\V]}$.
\begin{definition}[Constraints]
    For a given finite set of variables $\V$, the set of \textit{linear
    non-strict inequality} constraints is defined as the set $\L^{\leq}$:
    \begin{align}
        \L^{\leq} := \set{a_0 + \sum_{i=1}^{n} a_i v_i \leq 0}{a_i \in \R}
    \end{align}
    Analogously, the set of \textit{linear strict inequality} constraints is
    defined as the set $\L^{<}$:
    \begin{align}
        \L^{<} := \set{a_0 + \sum_{i=1}^{n} a_i v_i < 0}{a_i \in \R}
    \end{align}
\begin{definition}[Satisfiability]
    A constraint $c \in \C$ is \textit{satisfiable} when the formula $c$ is
    satisfiable in $\Zs$, i.e. $\Zs \models c$. We say a variable assignment
    $\beta: \V \rightarrow \Z$ \textit{satisfies} the constraint $c$ when
    $(\Zs, \beta) \models c$.
Constraints are a subset of first-order formulas, and thus their semantics shall
be defined accordingly.
Analogously, satisfiability of a set of constraints is defined by the
satisfiability of its conjunction. A set of constraints $C \subseteq \C$ is
\textit{satisfiable} when $\Zs \models \LAnd_{c_i \in C} c_i$. Moreover, an
assignment $\beta$ \textit{satisfies} the set of constraints $C$ when $(\Zs,
\beta) \models \LAnd_{c_i \in C} c_i$.
\begin{definition}[Satisfiability]
    For a domain $D$ and a structure $\mathfrak{I} = (D, +, \cdot, 0, 1, \leq,
    <, =)$, a constraint $\psi \in \L$ is \textit{satisfiable} in
    $\mathfrak{I}$ when $\mathfrak{I} \models \psi$. We say a variable
    assignment $\beta: \V \rightarrow D$ \textit{satisfies} the constraint
    $\psi$ when $(\mathfrak{I}, \beta) \models \psi$.
We also write $\Zs \models C$ and $(\Zs, \beta) \models C$ instead of $\Zs
\models \LAnd_{c_i \in C} c_i$ and $(\Zs, \beta) \models \LAnd_{c_i \in C} c_i$,
respectively.
A set of constraints $C \subseteq \L$ is \textit{satisfiable} when its
conjunction is satisfiable, and we write $\mathfrak{I} \models C$ instead of
$\mathfrak{I} \models \LAnd_{\psi_i \in C} \psi_i$. In addition, whenever the
structure is evident from the context, we write $\models \psi$ and $\beta
\models \psi$ instead of $\mathfrak{I} \models \psi$ and $(\mathfrak{I}, \beta)
\models \psi$, respectively.
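Checking $(\mathfrak{I}, \beta) \models C$ for linear constraints amounts to evaluating each left-hand side under $\beta$. A minimal sketch, assuming each constraint $a_0 + \sum_i a_i v_i < 0$ is encoded as a pair of a constant and a coefficient dictionary (an illustrative encoding, not \gls{koat}'s internal one):

```python
def satisfies(beta, constraints, strict=True):
    """Check (I, beta) |= C for a set of linear constraints.  Each
    constraint is a pair (a0, coeffs) encoding a0 + sum_i a_i * v_i < 0
    (or <= 0 with strict=False)."""
    for a0, coeffs in constraints:
        value = a0 + sum(a * beta[v] for v, a in coeffs.items())
        satisfied = value < 0 if strict else value <= 0
        if not satisfied:
            return False
    return True

# C encodes x > 0 and y <= 0 as strict integer constraints:
# -x < 0  and  y - 1 < 0.
C = [(0, {'x': -1}), (-1, {'y': 1})]
```

Under $\beta = \{x \mapsto 2, y \mapsto 0\}$ both constraints evaluate to a negative value, so $\beta$ satisfies $C$.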
\llbracket c_1 \Land c_2 \rrbracket = \llbracket c_1 \rrbracket \intersect \llbracket c_2 \rrbracket \\
\llbracket C \rrbracket = \llbracket \LAnd_{c_i \in C} c_i \rrbracket = \Intersect_{c_i \in C} \llbracket c_i \rrbracket
Two formulas, constraints, or sets of constraints are considered
\textit{equivalent}, denoted $\varphi_1 \equiv \varphi_2$, when their solutions
are equal.
Note that by definition constraints are restricted to the form $e < 0$, because
\gls{koat} uses this representation internally and it makes proofs more concise.
However, every constraint $e_1 \diamond e_2$ with an operator $\diamond \in \{>,
<, \leq, \geq\}$ can be transformed into an equivalent constraint $e' < 0 \in
\C$:
In practice, constraints are often provided in a more general form. Restricting
the constraints to a specific form does not limit their practicality, since they
can easily be converted into one of those forms. Constraints of the form $e_1
\diamond e_2$ with an operator $\diamond \in \{=, >, <, \leq, \geq\}$ can be
transformed as follows:
Equality can be simulated by two constraints: $e_1 = e_2 \equiv e_1 \leq e_2
\Land e_2 \leq e_1$.
For real arithmetic it is important to distinguish between strict and non-strict
constraints; for integer arithmetic, however, it is enough to consider only
strict inequalities, as one can be transformed into the other:
\begin{align*}
    e_1 \geq e_2 &\equiv e_2 - e_1 - 1 < 0 \\
    e_1 \leq e_2 &\equiv e_1 - e_2 - 1 < 0
\end{align*}
When talking about integer arithmetic we will thus only refer to linear
constraints.
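These transformations can be sketched as a small normalization helper. The representation of expressions as coefficient dictionaries (constant term under the key `''`) and the function name `normalize` are assumptions for illustration only:

```python
def normalize(e1, e2, op):
    """Rewrite an integer constraint  e1 op e2  into the normal form
    e < 0.  Expressions are coefficient dictionaries with the constant
    term stored under the key ''.  '=' is simulated by two constraints."""
    def diff(a, b, shift=0):
        # Compute a - b (+ shift on the constant term) coefficient-wise.
        keys = set(a) | set(b) | {''}
        e = {key: a.get(key, 0) - b.get(key, 0) for key in keys}
        e[''] += shift
        return e
    if op == '<':
        return [diff(e1, e2)]        # e1 - e2 < 0
    if op == '>':
        return [diff(e2, e1)]        # e2 - e1 < 0
    if op == '<=':
        return [diff(e1, e2, -1)]    # e1 - e2 - 1 < 0  (integers only)
    if op == '>=':
        return [diff(e2, e1, -1)]    # e2 - e1 - 1 < 0  (integers only)
    if op == '=':
        return normalize(e1, e2, '<=') + normalize(e2, e1, '<=')
    raise ValueError(f"unknown operator {op!r}")
```

For example, $x \leq 5$ becomes the single constraint $x - 6 < 0$, and $x = y$ becomes the pair $x - y - 1 < 0$ and $y - x - 1 < 0$.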
% \subsection{Convex Polyhedron}\label{sec:polyhedron}
Among others, the following operations are important to our use-case:
\begin{enumerate}
    \item Emptiness test
    \item Intersection
    \item Inclusion test
    \item Orthogonal projection
    \item Upper and lower bounds
\end{enumerate}

\subsection{Not necessarily closed convex polyhedra}\label{ssec:nncpolyhedra}

Following the definitions from \cite{bagnara2005convexpolyhedron}, a set of
linear constraints exactly defines a polyhedron. Every linear non-strict
inequality constraint $\psi \in \L^{\leq}$ describes a closed affine half-space,
and every linear strict inequality constraint $\psi \in \L^{<}$ describes an
open affine half-space in the euclidean space $\R^n$.
\begin{definition}[Not necessarily closed polyhedra]
    $\mathcal{P} \subseteq \R^n$ is a not necessarily closed (convex)
    polyhedron if and only if it can be expressed as an intersection of a
    finite number of not necessarily closed affine half-spaces.
\end{definition}

\subsection{Octagons}\label{ssec:octagons}

The octagon domain was constructed to bridge the performance gap between
intervals (see \Sref{ssec:intervals} below) and polyhedra.
An octagon can be described by a set of constraints of the form $\pm v_i \pm v_j
\leq b$ \cite{mine2001wcre}. Obviously, such constraints are linear constraints,
and hence octagons are a specialization of polyhedra. Octagons are used in
\gls{koat} during the computation of invariants. They are faster to work with
than polyhedra but less precise. \todo{cite runtime for various operations} As
shown in \Sref{sec:invariant}, too wide constraints are not a problem for
invariants.

\subsection{Intervals}\label{ssec:intervals}

For completeness' sake, let us introduce a third abstract domain: the interval
domain. A closed interval $[a,b]$ with $a, b \in \R$ for a variable $v \in \V$
can be expressed as a pair of linear constraints of the form $a \leq v$ and $v
\leq b$.
\begin{definition}[Box]
    $\mathcal{B} \subseteq \R^n$ is a box if it can be expressed by a set of
    intervals.
\end{definition}
By construction, every box is an octagon and in turn a polyhedron. For every
polyhedron $P$ one can find a box $B$ that includes the polyhedron. Whenever it
is acceptable to arbitrarily widen the polyhedron, one can work with boxes
instead. Boxes are computationally very efficient, although rather imprecise,
for most applications.
\todo{beautify}\todo{example picture}

\subsection{Operations}
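On boxes, the emptiness, intersection, and inclusion operations listed earlier become simple coordinate-wise checks. A minimal sketch, assuming boxes are encoded as dictionaries mapping each variable to a closed interval (an illustrative encoding; unbounded sides are omitted):

```python
def box_intersect(b1, b2):
    """Intersection of two boxes over the same variables, each given
    as {var: (lo, hi)} with closed bounds: intersect interval-wise."""
    return {v: (max(b1[v][0], b2[v][0]), min(b1[v][1], b2[v][1])) for v in b1}

def box_is_empty(box):
    """Emptiness test: some interval has crossed bounds."""
    return any(lo > hi for lo, hi in box.values())

def box_includes(outer, inner):
    """Inclusion test: every point of `inner` lies in `outer`."""
    return all(outer[v][0] <= inner[v][0] and inner[v][1] <= outer[v][1]
               for v in outer)
```

The coordinate-wise structure is exactly why boxes are so cheap: every operation is linear in the number of variables, whereas the corresponding polyhedra operations work on arbitrary half-space intersections.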
\newacronym{dfs}{DFS}{depth-first search}
\newacronym{bfs}{BFS}{breadth-first search}
\newacronym{lip}{LIP}{linear integer program}
\newacronym{lp}{LP}{linear program}