@InCollection{karp2010reducibility,
  author    = {Richard M. Karp},
  title     = {Reducibility Among Combinatorial Problems},
  booktitle = {50 Years of Integer Programming 1958-2008 - From the Early Years to the State-of-the-Art},
  editor    = {Michael J{\"{u}}nger and Thomas M. Liebling and Denis Naddef and George L. Nemhauser and William R. Pulleyblank and Gerhard Reinelt and Giovanni Rinaldi and Laurence A. Wolsey},
  publisher = {Springer},
  year      = {2010},
  pages     = {219--241},
  doi       = {10.1007/978-3-540-68279-0_8},
}
% TeX root = ../main.tex
\todo{unfinished}
\begin{tikzpicture}[program]
\node[state,initial] (l0) {$l_0$};
\node[state] (l1) [below=of l0] {$l_1$};
\node[state] (l2) [right=of l1] {$l_2$};
\node[state] (l3) [right=of l2] {$l_3$};
\node[state] (l4) [below=of l1] {$l_4$};
\path[->] (l0) edge node[swap] {$\textit{true}$} (l1)
(l1) edge node[swap] {$x > 0$} (l2)
edge node[swap] {$x \leq 0$} (l4)
(l2) edge node[swap] {$y > 0$} (l3)
              edge node[swap] {$\begin{aligned}
                      y &\leq 0 \\
                      x &:= x-1 \\
                      z &:= z+2
                  \end{aligned}$} (l1)
          (l3) edge node[swap] {$\begin{aligned}%
                      x &:= x+y \\
                      y &:= y-1
                  \end{aligned}$} (l3);
\end{tikzpicture}
Selecting the locations to abstract is crucial for a good implementation. From a
theoretical point of view, the selection of locations to abstract guarantees
termination of the control-flow refinement. Yet, with every abstraction the
refined program loses precision compared to the evaluated control-flow. In the
following, different possibilities to select the locations marked for
abstraction are discussed.
\subsection{Abstract Every Location}
\begin{comment}
technique:
- offline
- daikstra
- FVS
pro:
- probably close to optimal
- heuristic from domenech19
- unclear how to propagate for overlapping loops.
contra:
- fvs is NP-complete [citation needed]
- possibly cheaper for scc instead of whole program
- unclear what a loop-head is.
\end{comment}
The naive solution would be to abstract at every location, as is also stated
in \cite{Domenech19}. The advantage is that the implementation is easy, and
the resulting program and its analysis are obtained quickly.
The disadvantage is that a lot of evaluated control-flow is lost to the
abstraction. Every abstraction requires $\Theta(n)$ calls to the SMT-solver,
which is computationally expensive.
In addition, control-flow refinement was introduced to unroll loops in such a
way that hidden branching is resolved. The heuristics presented by
\cite{Domenech19} are focused on loops, and the properties are selected for the
loop heads. Constructing a heuristic that is well suited for every location in
the program is not obvious and is left open for future work.
\subsection{Abstract Loop heads}\label{ssec:loopheads}
The second method proposed by \cite{Domenech19} is to abstract only on loop
heads. Termination is guaranteed since every infinite path in the evaluation
tree of a finite program must visit some loop (and in turn its head) an
infinite number of times. With only finitely many abstraction candidates
(versions), it is guaranteed that at some point some version repeats and the
evaluation backtracks.
The exact definition of a loop head is unclear. For the sake of termination it
is sufficient that every loop contains at least one (but possibly arbitrary)
location marked for abstraction. The intuitive candidate is the head of the
loop, since that is the location where most branching will probably occur in
practice. For this implementation a loop head is defined as follows.
\begin{definition}
    The \textit{loop heads} of a loop are the entrance locations to the loop
    encountered during a depth-first search starting at the start location
    $l_0$.
\end{definition}
A loop can have multiple loop heads when it can be entered from different
locations, or when it overlaps with other loops (see examples \todo{} \todo{}).
Detecting loops and loop-heads is further discussed in \Sref{sec:findingloops}.
\todo{prove upper bound on program size}
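The definition above can be sketched as a depth-first search that records the targets of back edges. The following minimal Python sketch assumes the program graph is given as a successor map; location names mirror the example program and are illustrative only.

```python
def loop_heads(edges, start):
    """Collect loop heads: targets of back edges found by a DFS from start.

    `edges` maps a location to the list of its successor locations.
    """
    heads, on_stack, visited = set(), set(), set()

    def dfs(loc):
        visited.add(loc)
        on_stack.add(loc)
        for succ in edges.get(loc, []):
            if succ in on_stack:       # back edge: loop entered again here
                heads.add(succ)
            elif succ not in visited:
                dfs(succ)
        on_stack.remove(loc)

    dfs(start)
    return heads
```

On the example program's graph this finds $l_1$ (head of the outer loop through $l_2$) and $l_3$ (head of its self-loop).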
\subsection{Feedback Vertex Set}
As stated in \Sref{ssec:loopheads}, it is sufficient to find a set of
locations that covers every loop in the program. This problem is one of Karp's
21 NP-complete problems and is commonly called \textit{Feedback Vertex Set} or
\textit{Feedback Node Set}~\cite{karp2010reducibility}.
\begin{definition}[Feedback Vertex Set Problem]
    Given a directed graph $G = (V, E)$, find a minimal subset $R \subseteq
    V$ such that every (directed) cycle of $G$ contains a node in $R$.
\end{definition}
Detecting all loops (cycles) in a graph is easily done by a \gls{dfs} on the
graph, and the problem can then be formulated as a hitting set problem.
\begin{definition}[Hitting Set Problem]
Given a finite collection of finite sets $S = \{S_1, \dots, S_n\}$, where
each $S_i \subseteq U$ for some set $U$, find a minimal set $R$ such that
$R\intersect S_i \neq \emptyset$ for all $i = 1,\dots,n$.
\end{definition}
A hitting set problem can be formulated as a \gls{lip} and solved by readily
available \gls{smt} solvers. Recall Definition \ref{def:loop}. In the
following, the \gls{lip} used for finding the feedback vertex set of an
integer program with locations $L = \{l_1, \dots, l_n\}$ and a total of $N$
loops $a_1, \dots, a_N$ is constructed. It is assumed that all loops have been
found by a \gls{dfs} as described in \Sref{sec:findingloops}.
\begin{mini!}
{}{\sum_{i=1}^{n} x_i\label{eq:lip1-goal}}{\label{eq:lip-vfs}}{}
    \addConstraint{\sum_{i : l_i \in \text{Loc}(a_j)} x_i}{\geq
    1,\protect\quad\label{eq:lip1-hit}}{j = 1,\dots,N}
\addConstraint{ x_i }{\leq 1,\protect\quad\label{eq:lip1-leq1}}{i = 1,\dots,n}
\addConstraint{ x_i }{\geq 0,\protect\quad\label{eq:lip1-geq0}}{i = 1,\dots,n}
\end{mini!}
Every variable $x_i$ of the \gls{lip} \ref{eq:lip-vfs} indicates whether the
location $l_i$ is part of the hitting set, in which case the variable is
assigned the value 1, or not, in which case it is assigned the value 0. The
constraints \ref{eq:lip1-hit} guarantee that every loop is \enquote{hit},
while \ref{eq:lip1-leq1} and \ref{eq:lip1-geq0} restrict every variable to the
values 0 and 1. The optimization goal \ref{eq:lip1-goal} is to select as few
locations as possible.
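To illustrate the hitting-set formulation, the following Python sketch computes a smallest feedback vertex set by brute force over candidate sets. This mirrors the optimization goal of the \gls{lip}; a real implementation would hand the \gls{lip} above to an SMT or ILP solver instead. All names are illustrative.

```python
from itertools import combinations

def feedback_vertex_set(loops):
    """Smallest set of locations hitting every loop, by brute force.

    `loops` is a list of sets of locations, one set per loop, as found
    by a DFS. Exponential in the number of locations; sketch only.
    """
    universe = sorted(set().union(*loops))
    # Try candidate sets in order of increasing size: the first hitting
    # set found is therefore minimal.
    for k in range(len(universe) + 1):
        for cand in combinations(universe, k):
            if all(set(cand) & loop for loop in loops):
                return set(cand)
```

For the example program with loops $\{l_1, l_2\}$ and $\{l_3\}$, a minimal hitting set has two locations, e.g. $\{l_1, l_3\}$.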
Computing the feedback vertex set of a program is independent of the partial
evaluation and can be done a-priori. It is thus considered to be an offline
method. It shall be noted that the locations in the feedback vertex set need
not be the closest locations to the start location of a loop. For the
determination of properties it might be necessary to rotate the loop so that
the marked location is the first location in the loop.
\todo{prove upper bound on program size}
We have seen two offline strategies to select locations for abstraction. In
the following, we will introduce a technique that decides on abstraction
during partial evaluation using the path evaluated. The technique counts the
number of occurrences on the path for every location in the program. Instead
of abstracting on the first re-occurrence, one can choose a parameter $k \in
\N$ so that the first $k$ re-occurrences are ignored before abstracting.
The idea is to unroll loops up to a constant number of times so that loops with
constant bounds $b \leq k$ are perfectly unrolled without any abstraction.
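The counting scheme can be sketched as follows; `path` stands for the sequence of locations visited so far during partial evaluation, and the decision point is its last location. This is a sketch under these assumptions, not the actual implementation.

```python
from collections import Counter

def abstract_now(path, k):
    """Decide whether to abstract at the last location of `path`.

    A location that occurs m times on the path has m - 1 re-occurrences;
    the first k of them are ignored, so loops with a constant bound
    b <= k unroll without abstraction.
    """
    counts = Counter(path)
    loc = path[-1]
    return counts[loc] - 1 > k
```

For example, on the path $l_0\, l_1\, l_2\, l_1\, l_2\, l_1$ the location $l_1$ has two re-occurrences, so with $k = 1$ abstraction triggers while with $k = 2$ the loop is unrolled once more.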
\begin{example}
\todo{example with constantly bounded loop}
\end{example}
\todo{formalize, prove upper bound on program size}
Throughout this section, the fact that programs are probabilistic has been
completely ignored. One could wonder if loops with zero probability should be
removed in order to further shrink the problem. We will assume that all
transitions with zero probability have been removed before the analysis. All
other transitions with a non-zero probability are walkable in theory and will
be unfolded by the partial evaluation. Hence, they must be considered for the
set of abstracted locations in order to guarantee termination of the
algorithm. If a path should turn out to have probability zero, it can be
filtered during partial evaluation.
Let $\Zs := (\Z, +, \cdot, 0, 1, <)$ be the structure of ordered standard
integer arithmetic. Let $\V = \{v_1, \dots, v_n\}$ be a finite set of variables.
\begin{definition}[Constraint]
    For a given finite set of variables $\V$, the set of \textit{linear}
    constraints $\L$ over the variables $\V$ is defined as the set $\set{a_0 +
    \sum_{i=1}^{|\V|} a_i v_i < 0}{a_i \in \Z}$. The set of
    \textit{polynomial} constraints $\C$ is defined as the set $\set{e < 0}{e
    \in \Z[\V]}$.
\end{definition}
\begin{definition}[Constraints]
    For a given finite set of variables $\V$, the set of \textit{linear
    non-strict inequality constraints} is defined as the set $\L^{\leq}$:
    \begin{align}
        \L^{\leq} := \set{a_0 + \sum_{i=1}^{n} a_i v_i \leq 0}{a_i \in \R}
    \end{align}
    Analogously, the set of \textit{linear strict inequality} constraints is
    defined as the set $\L^{<}$:
    \begin{align}
        \L^{<} := \set{a_0 + \sum_{i=1}^{n} a_i v_i < 0}{a_i \in \R}
    \end{align}
\end{definition}
\begin{definition}[Satisfiability]
    A constraint $c \in \C$ is \textit{satisfiable} when the formula $c$ is
    satisfiable in $\Zs$, that is, $\Zs \models c$. We say a variable
    assignment $\beta: \V \rightarrow \Z$ \textit{satisfies} the constraint
    $c$ when $(\Zs, \beta) \models c$.
\end{definition}
Constraints are a subset of first-order formulas and thus their semantics are
defined accordingly.
Satisfiability of a set of constraints is defined by the satisfiability of its
conjunction: a set of constraints $C \subseteq \C$ is \textit{satisfiable}
when $\Zs \models \LAnd_{c_i \in C} c_i$, and an assignment $\beta$
\textit{satisfies} the set of constraints $C$ when $(\Zs, \beta) \models
\LAnd_{c_i \in C} c_i$.
\begin{definition}[Satisfiability]
    For a domain $D$ and a structure $\mathfrak{I} = (D, +, \cdot, 0, 1, \leq,
    <, =)$, a constraint $\psi \in \L$ is \textit{satisfiable} in
    $\mathfrak{I}$ when $\mathfrak{I} \models \psi$. We say a variable
    assignment $\beta: \V \rightarrow D$ \textit{satisfies} the constraint
    $\psi$ when $(\mathfrak{I}, \beta) \models \psi$.
\end{definition}
A set of constraints $C \subseteq \L$ is satisfiable when its conjunction is
satisfiable, and we write $\mathfrak{I} \models C$ instead of $\mathfrak{I}
\models \LAnd_{\psi_i \in C} \psi_i$. We also write $\Zs \models C$ and $(\Zs,
\beta) \models C$ instead of $\Zs \models \LAnd_{c_i \in C} c_i$ and $(\Zs,
\beta) \models \LAnd_{c_i \in C} c_i$, respectively. Whenever the structure is
evident from context, we write $\models \psi$ and $\beta \models \psi$ instead
of $\mathfrak{I} \models \psi$ and $(\mathfrak{I}, \beta) \models \psi$,
respectively.
The solution set of a conjunction of constraints is the intersection of the
individual solution sets:
\begin{align*}
    \llbracket \psi_1 \Land \psi_2 \rrbracket &= \llbracket \psi_1 \rrbracket
    \intersect \llbracket \psi_2 \rrbracket \\
    \llbracket C \rrbracket = \llbracket \LAnd_{\psi_i \in C} \psi_i \rrbracket
    &= \Intersect_{\psi_i \in C} \llbracket \psi_i \rrbracket
\end{align*}
Two formulas, constraints, or sets of constraints are considered
\textit{equivalent}, denoted $\varphi_1 \equiv \varphi_2$, when their solution
sets are equal.
Note that by definition constraints are restricted to the form $e < 0$,
because \gls{koat} uses this representation internally and it makes proofs
more concise. This restriction does not limit practicality, since constraints
given in the more general form $e_1 \diamond e_2$ with $\diamond \in \{=, >,
<, \leq, \geq\}$ can easily be transformed into equivalent constraints of the
form $e' < 0 \in \C$. Equality can be simulated by two constraints: $e_1 = e_2
\equiv e_1 \leq e_2 \Land e_2 \leq e_1$.
For real arithmetic it is important to distinguish between strict and
non-strict constraints; for integer arithmetic, however, it is enough to
consider only strict inequalities, as one can be transformed into the other:
\begin{align*}
    e_1 \geq e_2 &\equiv e_2 - e_1 - 1 < 0 \\
    e_1 \leq e_2 &\equiv e_1 - e_2 - 1 < 0
\end{align*}
When talking about integer arithmetic we will thus only refer to the strict
linear constraints $\L$.
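A sketch of these transformations in Python, representing a linear expression as a map from variable names to integer coefficients with the constant stored under the key `1`; the representation and names are illustrative, not \gls{koat}'s internal one.

```python
def normalize(e1, e2, op):
    """Rewrite the integer constraint `e1 op e2` into strict form e < 0.

    Returns a list of expressions e, each read as e < 0 (equality
    yields two constraints).
    """
    def sub(a, b, shift=0):
        # Compute a - b + shift as a coefficient map.
        keys = set(a) | set(b) | {1}
        e = {k: a.get(k, 0) - b.get(k, 0) for k in keys}
        e[1] += shift
        return e

    if op == '<':
        return [sub(e1, e2)]
    if op == '>':
        return [sub(e2, e1)]
    if op == '<=':                 # e1 <= e2  <=>  e1 - e2 - 1 < 0
        return [sub(e1, e2, -1)]
    if op == '>=':                 # e1 >= e2  <=>  e2 - e1 - 1 < 0
        return [sub(e2, e1, -1)]
    if op == '=':                  # equality as two inequalities
        return normalize(e1, e2, '<=') + normalize(e1, e2, '>=')
    raise ValueError(op)
```

For instance, `x <= 5` over the integers becomes `x - 6 < 0`.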
% \subsection{Convex Polyhedron}\label{sec:polyhedron}
Among others, the following operations are important to our use-case:
\begin{enumerate}
    \item emptiness test
    \item intersection
    \item inclusion test
    \item orthogonal projection
    \item upper and lower bounds
\end{enumerate}
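To make two of these operations concrete, the following Python sketch implements orthogonal projection via Fourier-Motzkin elimination and an emptiness test (over the rationals) obtained by eliminating all variables. It handles non-strict constraints with integer coefficients only; the representation and names are illustrative.

```python
def eliminate(constraints, j):
    """Fourier-Motzkin elimination of variable x_j.

    A constraint (coeffs, c) encodes sum(coeffs[i] * x_i) + c <= 0.
    """
    pos = [c for c in constraints if c[0][j] > 0]
    neg = [c for c in constraints if c[0][j] < 0]
    out = [c for c in constraints if c[0][j] == 0]
    for ap, cp in pos:
        for an, cn in neg:
            sp, sn = -an[j], ap[j]   # positive scale factors
            coeffs = tuple(sp * a + sn * b for a, b in zip(ap, an))
            out.append((coeffs, sp * cp + sn * cn))  # x_j cancels
    return out

def is_empty(constraints, n):
    """Emptiness test over the rationals: eliminate all n variables and
    check whether a contradictory constant constraint c <= 0 remains."""
    cs = constraints
    for j in range(n):
        cs = eliminate(cs, j)
    return any(c > 0 for _, c in cs)
```

Intersection of two polyhedra in this representation is simply the union of their constraint lists, which is one reason the constraint representation is convenient here.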
\subsection{Not necessarily closed convex polyhedra}\label{ssec:nncpolyhedra}
Following the definitions from \cite{bagnara2005convexpolyhedron}, a set of
linear constraints exactly defines a polyhedron. Every linear non-strict
inequality constraint $\psi \in \L^{\leq}$ describes a closed affine
half-space, and every linear strict inequality constraint $\psi \in \L^{<}$
describes an open affine half-space in the Euclidean space $\R^n$.
\begin{definition}[Not necessarily closed Polyhedra]
    $\mathcal{P} \subseteq \R^n$ is a \textit{not necessarily closed (convex)
    polyhedron} if and only if it can be expressed by an intersection of a
    finite number of not necessarily closed affine half-spaces.
\end{definition}
\subsection{Octagons}\label{ssec:octagons}
The octagon domain was constructed to bridge the performance gap between
intervals (see \Sref{ssec:intervals} below) and polyhedra.
An octagon can be described by a set of constraints of the form $\pm v_i \pm
v_j \leq b$~\cite{mine2001wcre}. Obviously, such constraints are linear
constraints and hence octagons are a specialization of polyhedra.
Octagons are used in \gls{koat} during the computation of invariants. They are
faster to work with than polyhedra, but less precise. \todo{cite runtime for
various operations} As shown in \Sref{sec:invariant}, overly wide constraints
are not a problem for invariants.
\subsection{Intervals}\label{ssec:intervals}
For completeness' sake, let us introduce a third abstract domain: the interval
domain. A closed interval $[a,b]$ with $a, b \in \R$ for a variable $v \in \V$
can be expressed as a pair of linear constraints of the form $a \leq v$ and $v
\leq b$.
\begin{definition}[Box]
    $\mathcal{B} \subseteq \R^n$ is a box if it can be expressed by a set of
    intervals.
\end{definition}
By construction, every box is an octagon and in turn a polyhedron. For every
polyhedron $P$ one can find a box $B$ that includes the polyhedron. Whenever
it is acceptable to arbitrarily widen the polyhedron, one can work with boxes
instead. Boxes are computationally very efficient, but rather imprecise for
most applications.\todo{beautify}
\todo{example picture}
\subsection{Operations}
\newacronym{dfs}{DFS}{depth-first search}
\newacronym{bfs}{BFS}{breadth-first search}
\newacronym{lip}{LIP}{linear integer program}
\newacronym{lp}{LP}{linear program}