\gls{koat}'s analysis goal is to find tight upper bounds for \glspl{pip} and
for non-probabilistic, non-deterministic integer programs, which are a
specialization of the former. The analysis techniques used, such as
\gls{mprf},\todo{more} have their limits and cannot always find tight bounds
for every program. This motivates control-flow refinement: constructing a
program equivalent to the original one that opens the possibility for the
analysis to find tighter bounds.
Recall \ref{def:overallcomplexity}; the newly constructed refined program shall
have the same overall runtime complexity as the original program, which implies
that complexity bounds found for the refined program are also complexity bounds
for the original one.
Naturally, the control-flow refinement adds and replaces transitions and
locations, hence the runtime and size complexity of each individual transition
is not preserved. However, we will show that each transition in the refined
program is a copy of a transition in the original program, and that the
complexity of an original transition is equal to the sum over its copies.
The control-flow refinement presented in this thesis adapts the partial
evaluation presented by \citeauthor{domenech2019arxiv}\cite{domenech2019arxiv}
to \acrlongpl{pip}. The partial evaluation is constructed in a similar way, but
explicitly takes probability and general transitions into account.
The construction is done in three steps. First, the single evaluation step
(unfolding) of transitions in a probabilistic program is presented. Second, a
possibly infinite evaluation tree is constructed. Third, this tree is collapsed
into a finite evaluation graph by an abstraction layer.
\section{Program equivalence}
\begin{definition}[Program equivalence]
    Two \glspl{pip} $P = (\V, L, \GT, l_0)$ and $P' = (\V, L', \GT', l_0)$
    are equivalent if and only if the following properties hold:
\begin{enumerate}
\item \todo{continue}
\end{enumerate}
\end{definition}
Afterwards we will show that the partial evaluation is sound for probabilistic
integer programs and prove that the overall expected runtime complexity
(\cref{def:past}) is preserved in the same way as the overall (worst-case)
runtime complexity (\cref{def:runtimebounds}).
During the evaluation, the set of possible assignments at every step is
described by a set of constraints (a polyhedron).
This approach is very similar to the computation of invariants, where the set
of possible assignments at each location in a given program is iteratively
updated by propagating the transition constraints through the program until a
fixpoint is found. \todo{citation} However, during evaluation one starts at a
specific configuration and \textit{unfolds} transitions recursively, without
influencing neighboring paths. The result is a (possibly still infinite) tree,
representing all possible runs.
The tree is collapsed into a finite graph by abstracting versions, where the
possible assignments are described by a set of constraints. The resulting
graph is called a \textit{partial evaluation graph}. It was formalized for
\gls{chc} by \citeauthor{gallagher2019eptcs}\cite{gallagher2019eptcs}. In this
thesis the definitions are adapted to \Glspl{pip} instead, giving the
opportunity to explicitly handle costs and probability, and to stay closer to
\gls{koat}'s formalisms.
Given a \gls{pip} $P$ with locations $L$, transitions $\T$ and general
transitions $\GT$, versions are the vertices of the partial evaluation graph.
They contain the location at each step of the evaluation and a representation
of the possible assignments at that step. Versions can contain plain
constraints, but in order to guarantee a finite graph later on, they can also
contain a value from an abstraction space (see \Sref{sec:abstraction_layer};
not to be confused with the abstract domains).
\begin{definition}[Version]\label{def:version}
The set of \textit{constraint versions} is the set of tuples over the
locations $L$ from $P$ and constraints $\C$. The set of \textit{abstract
versions} is the set of tuples over the locations $L$ from $P$ and some
finite abstraction space $A$. Since the set of locations and the abstraction
space are both finite, so are the abstract versions. The set of versions is
the union of constraint and abstract versions.
\begin{equation}
\vers = (\Loc \times \C) \union (\Loc \times A)
\end{equation}
    For clarity's sake, versions will be denoted with angle brackets. The set
    notation with curly braces is used for non-abstracted constraints, as in
    $\langle l, \{\psi_1, \dots\}\rangle$, and the list notation with square
    brackets is used for abstracted values, as in $\langle l, [\psi_1,
    \dots]\rangle$.
\end{definition}
The edges of the partial evaluation graph are transitions $\T_{PE}$ analogously
defined to Definition \ref{def:pip}, except that they connect versions and not
locations.
\begin{definition}[Partial Evaluation]
A graph $G = (\vers, \T_{PE})$ is a \textit{partial evaluation} of a
\gls{pip} $P$ when for every admissible path $c_0c_1\dots{}c_n \in \fpath$
of P with $c_i = (l_i, s_i, t_i)$ and $t_i = (l_{i-1}, \tau_i, \eta_i, p_i,
l_i)$ for all $i = 1, \dots, n$, there exists a path $v_0'v_1'\dots{}v_n'$
in G with the following properties:
\begin{enumerate}
\item $v'_i = (\langle l_i, \varphi_i\rangle, t_i')$ with some $\varphi_i
\in \C$ and $t_i' \in E(G)$,
        \item $t'_i = (\langle l_{i-1}, \varphi_{i-1}\rangle, p_i, \tau_i, \eta_i,
            \langle l_i, \varphi_i\rangle)$, and
\item $s_i \models \varphi_i$
\end{enumerate}
\end{definition}
A \Gls{pip} $P$ can be lifted to a trivial partial evaluation graph by replacing
every location $l \in L$ by the trivial version $\langle l,
\textit{true}\rangle$. Obviously, the lift of a program is not useful for the
analysis, as it doesn't capture any interesting properties of the program
beyond what is already known in the original program. In the following we
present a construction for a partial evaluation graph using unfolding and
abstraction.
The partial evaluation is done step-wise. On every step the next version
$v' = \langle l', \phi'\rangle$ is computed by unfolding a transition $t \in \T$
from the original \gls{pip} that starts in the same location $l \in \L$ as the
version $v = \langle l, \phi\rangle$. The resulting constraint $\phi'$ contains
all states $s' \in \Sigma$ with $s' \models \phi'$ reachable via the transition
from a state in the source version, i.e. $\prSs((l, s, \_) \rightarrow (l',s',t))
> 0$.
The non-determinism is taken into account by the implicit existential
quantification of temporary variables in constraints. In contrast, the
probabilistic sampling is over-approximated with a non-probabilistic update
$\tilde{\eta}: \PV \rightarrow \Z[\V]$ and a constraint denoted as
$\lfloor\eta\rceil$. The probability distributions in the probabilistic update
$\eta$ are replaced by fresh temporary variables that are then constrained by
$\lfloor\eta\rceil$.
\begin{definition}[Non-probabilistic overapproximation\label{def:nonprobapprox}]
Let $\eta: \PV \rightarrow \Z[\V\union D]$ be a probabilistic update
function, with random distributions $D_1,\dots,D_n$ used within. A
\textit{non-probabilistic over-approximation} is a pair $(\lfloor\eta\rceil,
\tilde{\eta}) \in \C \times (\PV \rightarrow \Z[\V])$ where
\begin{equation}
        \lfloor\eta\rceil = \LAnd_{x\in\PV} (x' = \eta(x)[D_i\backslash{}d_i])
        \land \LAnd_{i=1,\dots,n} \lfloor D_i \rceil
\end{equation}
\begin{equation}
\tilde{\eta}(x) = x' \hspace{1cm} \text{for all } x\in\PV
\end{equation}
    for some fresh temporary variables $x', d_i\in \TV$.
\end{definition}
\begin{lemma}
    For any assignment $s\in\Sigma$, probabilistic update $\eta: \PV
    \rightarrow \Z[\V \union D]$, and values $k_i \in
    \text{supp}([\eta(x_i)](s))$, the assignment $s'(x_i)=k_i$ satisfies the
    non-probabilistic over-approximation. \todo{todo}
\end{lemma}
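The replacement of distributions by constrained fresh variables can be sketched in code. The following is a toy model under illustrative assumptions: updates are lists of summands, distributions are uniform intervals, and the names (\texttt{Unif}, \texttt{overapprox}) are hypothetical, not \gls{koat}'s actual API.

```python
from itertools import count

# Toy sketch of the non-probabilistic over-approximation: every distribution
# in a probabilistic update is replaced by a fresh temporary variable d_i,
# constrained to the distribution's support.

class Unif:
    """Uniform distribution on the integers lo..hi (illustrative)."""
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

def overapprox(update):
    """update maps a program variable to a list of summands, each either a
    polynomial part (string) or a distribution. Returns the constraint
    (a list of strings) and the non-probabilistic update."""
    fresh = count(1)
    constraints, tilde = [], {}
    for x, summands in update.items():
        parts = []
        for term in summands:
            if isinstance(term, Unif):
                d = f"d{next(fresh)}"  # fresh temporary variable
                # constrain d to the support of the distribution
                constraints += [f"{term.lo} <= {d}", f"{d} <= {term.hi}"]
                parts.append(d)
            else:
                parts.append(term)
        constraints.append(f"{x}' = " + " + ".join(parts))
        tilde[x] = f"{x}'"             # the over-approximated update
    return constraints, tilde

# eta(x) = x + Unif(0, 2) yields the constraints 0 <= d1, d1 <= 2, x' = x + d1
cs, tilde = overapprox({"x": ["x", Unif(0, 2)]})
```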
\begin{definition}[Unfolding constraints]
Let $\varphi \in \C$ be a constraint and $t = (\_, \tau, \eta, \_, \_) \in
\T$. An unfolding of $\varphi$ via $t$ is defined as follows.
    \begin{align}
        \unfold(\varphi, t) = \begin{cases}
            \tilde{\eta}(\varphi \Land \tau \Land \lfloor\eta\rceil)|_{\PV} &
            \text{if } \varphi \Land \tau \Land \lfloor\eta\rceil \text{ is
            SAT}\\
            \textit{false} & \text{otherwise}
        \end{cases}
    \end{align}
\end{definition}
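The effect of unfolding can be sketched with a toy model. Here constraints are Python predicates over a small bounded integer domain, so satisfiability can be checked by enumeration; a real implementation would query an SMT solver or a polyhedra library instead. All names and the representation are illustrative assumptions, not the thesis's formalism.

```python
from itertools import product

DOMAIN = range(-8, 9)  # small bounded domain so SAT is decidable by search

def states(variables):
    """Enumerate all assignments of the bounded domain to the variables."""
    for values in product(DOMAIN, repeat=len(variables)):
        yield dict(zip(variables, values))

def unfold(phi, guard, update, variables):
    """Return a predicate describing the post-states reachable from phi via
    the guard and the (already de-probabilized) update, or None when the
    conjunction of phi and the guard is unsatisfiable."""
    if not any(phi(s) and guard(s) for s in states(variables)):
        return None  # corresponds to unfold(phi, t) = false
    def phi_new(s2):
        # s' satisfies the unfolding iff some s satisfying phi and the
        # guard is mapped to s' by the update (projection onto the PV)
        return any(phi(s) and guard(s)
                   and all(s2[x] == update[x](s) for x in variables)
                   for s in states(variables))
    return phi_new

# x > 0 unfolded over the guard x < 5 with update x -> x + 1: new states 2..5
phi2 = unfold(lambda s: s["x"] > 0, lambda s: s["x"] < 5,
              {"x": lambda s: s["x"] + 1}, ["x"])
```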
The exactness is relaxed in order to work with abstract domains. Instead, the
result of unfolding an abstract value equivalent to $\varphi$ shall contain
all values of the constraint unfolding, but possibly more.
Given an unfolding one can derive an unfolding operation $\unfold^*: \C \times
\T \rightarrow 2^{\T_{PE}}$ that constructs a new transition between the
versions copying the guard $\tau$, update $\eta$, and probability $p$ from the
original transition. Since unsatisfiable versions cannot be reached they are
ignored and no transition is created.
\begin{align}
    \unfold^*(\varphi, t) = \begin{cases}
        \{(\langle l, \varphi\rangle, \tau, \eta, p, \langle l',
        \unfold(\varphi, t)\rangle)\} & \text{if } \unfold(\varphi, t)
        \text{ is SAT} \\
        \{\} & \text{otherwise}
    \end{cases}
\end{align}
where $t = (l, \tau, \eta, p, l') \in \T$.
The operation is overloaded with a set function $\unfold^*: \C \times \GT
\rightarrow 2^{\T_{PE}}$ unfolding all transitions in a general transition.
\begin{align}
\unfold^*(\varphi, g) = \Union_{t \in g}\unfold^*(\varphi, t)
\end{align}
Again the operation is overloaded with a function $\unfold^* : \vers \rightarrow
2^{\T_{PE}}$ unfolding all outgoing general transitions in the \gls{pip} $P$.
\begin{align}
    \unfold^*(\langle l, \varphi\rangle) = \Union_{g \in \GT_{out}(l)} \unfold^*(\varphi, g)
\end{align}
\begin{example}
\end{example}
\begin{algorithm}
    \caption{$\textit{evaluate}(P, G, v)$}
    \KwData{
        A \gls{pip} $P$, a graph $G$, and a source version $v = \langle l, \varphi\rangle$
    }
    \KwResult{A graph $G$}
    $N := \unfold^*(v)$\;
    \For{$t = (v, \tau, \eta, p, v') \in N $}{
        \If {$v' \not\in G$}{
            $G := G \union \{t\}$\;
            $G := \textit{evaluate}(P, G, v')$\;
        }
        \Else {
            $G := G \union \{t\}$\;
        }
    }
    \Return {$G$}
\end{algorithm}
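The recursion of the algorithm can be sketched as follows. This is a minimal model, assuming an \texttt{unfold\_star} callback supplied by the caller and versions/transitions represented as plain tuples; the names are illustrative, not \gls{koat}'s implementation.

```python
# Sketch of evaluate(P, G, v): transitions found by unfolding are always
# added to the graph, but the target version is evaluated recursively only
# if it is new, so the recursion terminates once the (finite) version space
# is exhausted.

def evaluate(unfold_star, graph, v):
    """graph is a pair (versions, transitions) of mutable sets."""
    versions, transitions = graph
    for t in unfold_star(v):
        target = t[-1]                  # t = (v, tau, eta, p, v')
        transitions.add(t)
        if target not in versions:      # unseen version: recurse
            versions.add(target)
            evaluate(unfold_star, graph, target)
    return graph

def toy_unfold(v):
    """A single self-loop that counts up to 3 and then stops."""
    l, k = v
    return [((l, k), "tau", "eta", 1, (l, k + 1))] if k < 3 else []

versions, transitions = evaluate(toy_unfold, ({("l", 0)}, set()), ("l", 0))
# versions now holds ("l", 0) .. ("l", 3); transitions holds the 3 steps
```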
\section{Partial evaluation of \glspl{scc}}
During \gls{koat}'s analysis the program is split into \glspl{scc}. Bounds are
found for every \gls{scc} separately and then composed into an overall bound at
the end. Control-flow refinement is only needed for those \glspl{scc} where the
analysis fails to find tight bounds. In practice, linear bounds are considered
tight enough, whereas polynomial and exponential complexity bounds warrant
refinement in the hope of finding tighter bounds.
The previously presented control-flow refinement starts at the start location
and evaluates the whole program. The size of the refined program is limited by
the number of locations and transitions as well as the number of properties on
each abstracted location. The smaller the initial program, the faster the
control-flow refinement and the subsequent analysis, and the smaller the
refined program. Instead of evaluating the whole program, one can adapt the
partial evaluation to only evaluate a single \gls{scc} of the program. This
section presents the partial evaluation of an \gls{scc}.
The result is again an entire program, where only the transitions of the
\gls{scc} and the transitions entering or leaving the \gls{scc} are changed.
In a first step the program is copied as a whole and lifted to a trivial
evaluation graph; then the \gls{scc} and all its incoming and outgoing
transitions are removed from the copy.
Then the evaluation algorithm is executed from every incoming transition, but
with the additional condition to backtrack after unfolding an exiting
transition. The new transitions are added to the copied evaluation graph, so
progress made by earlier evaluations of entry transitions can be reused,
shortening the evaluation of subsequent incoming transitions.
Finally, the evaluation graph is converted back to a \gls{pip} and returned as
result of the partial evaluation of the \gls{scc}.
\newcommand{\RB}[0]{\mathcal{RB}} % Runtime bound
\newcommand{\SB}[0]{\mathcal{SB}} % Size bound
\newcommand{\B}[0]{\mathcal{B}} %Bounds
\newcommand{\Rt}[0]{\mathcal{R}} %Runtime
\newcommand{\E}[0]{\mathbb{E}} % Expected
\newcommand{\scheduler}[0]{\mathfrak{S}}
\newcommand{\ESs}[0]{\mathbb{E}_{\scheduler,s_0}} % Expected
\newcommand{\Prog}[0]{\mathcal{P}} % A PIP
\newcommand{\GRV}[0]{\mathcal{GRV}} % General result variables
\renewcommand{\S}[0]{\mathcal{S}} % grv variable
\newcommand{\RBe}[0]{\mathcal{RB}_\E} % Expected Runtime bound
\newcommand{\SBe}[0]{\mathcal{SB}_\E} % Expected Size bound
\newcommand{\unfold}[0]{\text{unfold}}
\newcommand{\vers}[0]{\textbf{\textsf{Version}}}
\newcommand{\pr}[0]{\textit{pr}}
\newcommand{\prSs}[0]{\textit{pr}_{\scheduler, s_0}}
\newcommand{\PrSs}[0]{\mathbb{P}_{\scheduler, s_0}}
\newcommand{\G}[0]{\mathcal{G}}
\newcommand{\Tin}[0]{\T_\text{in}}
\newcommand{\Tout}[0]{\T_\text{out}}
\newcommand{\GTin}[0]{{\mathcal{G}\mathcal{T}_\text{in}}}
\newcommand{\GTout}[0]{{\mathcal{G}\mathcal{T}_\text{out}}}
Every linear non-strict inequality constraint $\psi \in \Lin^{\leq}$ describes
a closed affine half-space, and every linear strict inequality constraint
describes an open affine half-space.
\todo{define support, and constraints}
$\Z[\V\union D]$ is the set of integer polynomials over the variables $\V$ and
the distributions. They behave exactly like normal polynomials, except that the
values of the distributions are sampled probabilistically instead of being
given as arguments. As such we can define a function $[\cdot]: \Z[\V\union D]
\rightarrow \Sigma \rightarrow \Z \rightarrow [0,1]$, where $[f](s)(k)$ is the
probability that the polynomial $f \in \Z[\V \union D]$ under a state $s \in
\Sigma$ evaluates to the value $k\in\Z$.
The values of the variables are fixed by a syntactical substitution, where every
variable $x_i\in\V$ is replaced by the value fixed in the assignment $s(x_i)$,
we write $[x_i\backslash s(x_i)]$.
The resulting polynomial $f[x_i\backslash s(x_i)]$ describes a random variable
$[f](s)$ that takes a value at random depending only on the distributions $D_1,
\dots, D_m$ within. The random variable has the following probability
distribution:
\begin{align}
    [f](s)(k) &= \sum_{\substack{d_i \in \Z \\ i = 1,\dots,m}} D_1(d_1)
    \cdot D_2(d_2) \cdots D_m(d_m) \cdot \begin{cases}
        1, & \text{if } f[x_i \backslash s(x_i)][D_i\backslash d_i] = k \\
        0, & \text{otherwise}
    \end{cases}\\
    & = P([f](s) = k)
\end{align}
where $f[D_i\backslash d_i]$ is a syntactical substitution of every distribution
$D_i$ by an integer value $d_i \in \Z$ for all $i = 1,\dots,m$.
By abuse of notation we write $f(s)$ to denote the unique value $v$ with
$[f](s)(v) = 1$, when it exists. \todo{check if actually used}
The support of polynomial $f$ with an assignment $s$ is the set of values that
can be achieved with the polynomial by sampling the random distributions.
\begin{example}
Consider the polynomial $f = x^2 + \textsc{Unif}(3,5)$ and the assignment
$s(x) = 2$. $[f](s)$ is the random variable described by the polynomial $f[x/2]
=2^2 + \textsc{Unif}(3,5)$.
It can take three different values with the following probability
\begin{equation}
[f](s)(k) = \begin{cases}
\frac{1}{3}, & \text{if } k \in \{7,8,9\}\\
0, & \text{otherwise}
\end{cases}
\end{equation}
\end{example}
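The distribution $[f](s)$ from the example can be computed by exhaustively sampling the distributions and summing the probabilities of joint samples that produce the same value. The following sketch assumes a toy representation (a polynomial is a callable, a distribution a value-to-probability map); the name \texttt{distribution\_of} is illustrative.

```python
from fractions import Fraction
from itertools import product

def unif(a, b):
    """Uniform distribution on the integers a..b."""
    return {k: Fraction(1, b - a + 1) for k in range(a, b + 1)}

def distribution_of(poly, dists, state):
    """Distribution of poly under the state: enumerate every joint sample
    of the distributions and accumulate the probability of each value."""
    names = list(dists)
    result = {}
    for joint in product(*(dists[n].items() for n in names)):
        samples = {n: value for n, (value, _) in zip(names, joint)}
        p = Fraction(1)
        for _, prob in joint:
            p *= prob
        k = poly(state, samples)
        result[k] = result.get(k, Fraction(0)) + p
    return result

# The example above: f = x^2 + Unif(3, 5) with s(x) = 2
f = lambda s, d: s["x"] ** 2 + d["u"]
dist = distribution_of(f, {"u": unif(3, 5)}, {"x": 2})
# dist == {7: 1/3, 8: 1/3, 9: 1/3}
```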
\begin{definition}[Probabilistic Integer Program\cite{meyer2021tacas}]\label{def:pip}
    $(\PV, \Loc, \GT, l_0)$ is a probabilistic integer transition system where
    \begin{enumerate}
        \item \todo{continue}
    \end{enumerate}
\end{definition}
\subsection{Probability space of a program\label{ssec:probabilityspace}}
Before defining the semantics of a \gls{pip} a new special location $l_\bot$ is
introduced that will be used to indicate the termination of a program as well as
a virtual transition $t_\bot$ and general transition $g_\bot = \{t_\bot\}$ with
the same purpose. Instead of just vaguely \enquote{terminating}, a program
takes the transition $g_\bot$ to the location $l_\bot$ if and only if no other
transition can be taken. Since no normal transition leaves the location $l_\bot$,
the run loops indefinitely on the location $l_\bot$ after termination. In
addition a new transition $t_\text{in}$ will be used for the start of the
program.
A state $s: \V \rightarrow \Z$ is a variable assignment for the variables of a
program; $\Sigma$ is the set of all states. The set of configurations is
defined as $\confs = (\Loc \union \{l_\bot\}) \times (\T \union
\{t_{\text{in}}, t_\bot\}) \times \Sigma$.
Note that additionally to the location and state, the last transition taken to
the current location is captured as well. This won't be relevant for the partial
evaluation but is important to define the probabilistic properties of a
\gls{pip}.
A run of the program is an \textit{infinite} sequence of configurations, and the
set of all runs is defined as $\runs = \confs^\omega$. In contrast a finite
prefix is a finite sequence of configurations. The set of all finite prefixes is
defined as $\fpath = \confs^*$.
\begin{rem}
The set of configurations does not exclude invalid combinations of
locations and transitions, that is a configuration with a location that is
not the target of the incoming transition, or a state that is impossible to
reach via the incoming transition. Conversely, the set of runs and finite
prefixes does not exclude invalid paths, e.g. two consecutive configurations
that cannot be reached via the given transition, etc. The validity of a path
will be described by a probability measure where invalid paths have a zero
probability.
\end{rem}
For a \gls{pip} one can define a probability space $(\runs, \F, \PrSs)$, where
the outcomes of the probabilistic event are the runs of the program. Every
distinct run is measurable, hence the $\sigma$-algebra $\F$ is defined to
contain every set $\{r\}$ for $r \in \runs$. The probability measure $\PrSs$
describes the probability of a given run for a scheduler $\scheduler$ and the
input given to the program as the initial state $s_0$. Formally, the
probability space is defined by a cylindrical construction expanding the length
of finite prefixes up to infinity. For the detailed construction we refer to
\cite{meyer20arxiv}.
In any given configuration, a scheduler resolves the non-determinism first. The
probability measure will use the result of the scheduler instead of the
previous configuration. The scheduler keeps the values of all program variables
from the previous configuration. It selects a general transition $g$ and a
variable assignment $\tilde{s}$ satisfying the guard of the general transition.
If no such transition is found, or the program is already in the terminating
location $l_\bot$, it takes the special transition $g_\bot$ indicating
termination.
\begin{definition}[Scheduler]\label{def:scheduler} \cite{meyer20arxiv}
    A function $\scheduler : \confs \rightarrow ((\GT \uplus \{g_\bot\})
    \times \Sigma)$ is a scheduler if for every configuration $c = (l,t,s)
    \in \confs$, $\scheduler(c) = (g, s')$ implies:
    \begin{enumerate}
        \item \label{itm:scheduler1} $s'(x) = s(x)$ for all $x \in \PV$,
        \item $l$ is the start location $l_g$ of the general transition $g$,
        \item \label{itm:scheduler3} $s' \models \tau_g$ for the guard
            $\tau_g$ of the general transition $g$,
        \item $g = g_\bot$ and $s' = s$, if $l=l_\bot$, $t=t_\bot$, or no $g'
            \in \GT$ and $s'\in \Sigma$ satisfy \cref{itm:scheduler1} to
            \cref{itm:scheduler3}.
    \end{enumerate}
\end{definition}
Let $\scheduler$ be a scheduler and $s_0$ be an initial state. First, the
probability for a program to start in a configuration $c$ is 1 only for the
configuration that has the initial state $s_0$ at the start location $l_0$,
coming from the initial transition $t_\text{in}$. All other configurations are
invalid starting configurations and are assigned a zero probability by the
probability measure $\prSs : \confs \rightarrow [0,1]$:
\begin{equation}
    \prSs(c) = \begin{cases}
        1 & \text{ if } c = (l_0, t_\text{in}, s_0) \\
        0 & \text{ otherwise}
    \end{cases}
\end{equation}
Suppose the program ran up to a configuration $c=(l,t,s)$. The scheduler
resolves the non-deterministic sampling and branching to $\scheduler(c) = (g,
\tilde{s})$. The probability for the event $c \rightarrow c'$ that $c'$ follows
the configuration $c$, with $c' = (l', t', s')$ and $t' = (l, p, \tau, \eta,
l') \in g$, depends on three things: first, that the transition $t' \in g$ is
taken with probability $p$; second, the probability that each program variable
$x\in\PV$ is updated to the value $s'(x)$ by the probabilistic update (recall
definition \ref{def:polyindeterminates}); third, that the temporary variables
are sampled by the scheduler equal to the target state. The result is the
probability measure $\prSs: \confs^2 \rightarrow [0,1]$.
\begin{align}
    \prSs(c \rightarrow c') = p \cdot \prod_{x\in\PV} [\eta(x)](\tilde{s})(s'(x))
    \cdot \prod_{u\in\V\backslash\PV} \delta_{\tilde{s}(u),s'(u)},
\end{align}
where for any $z_1, z_2 \in \Z$:
\begin{equation}
\delta_{z_1,z_2} = \begin{cases}
1, & \text{if } z_1 = z_2 \\
0, & \text{if } z_1 \neq z_2.
\end{cases}
\end{equation}
Naturally, for all other configurations $c'$, such as configurations that are
not reachable by the given transition, or whose transitions were not selected
by the scheduler, we define $\prSs(c \rightarrow c') = 0$.
The probability of a finite prefix $f = c_0c_1\dots{}c_n \in \fpath$ is the
product of its single steps:
\begin{align}
\prSs(f) = \prSs(c_0) \cdot \prod_{i=1}^n \prSs(c_{i-1} \rightarrow c_i)
\end{align}
By a cylindrical construction one gets the final probability measure for the
infinite runs $\PrSs : \runs \rightarrow [0,1]$ and the probability space
$(\runs, \F, \PrSs)$.
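The step and prefix measures above can be sketched numerically. This is a minimal model under stated assumptions: the scheduler's choices are fixed, and each step is supplied directly as its branch probability $p$ together with the probabilities of the observed update outcomes; the function names are illustrative.

```python
from fractions import Fraction

def step_probability(p, update_probs):
    """One step c -> c': the branch probability p of the taken transition
    times, for each program variable, the probability that the
    probabilistic update produced the observed new value."""
    result = Fraction(p)
    for q in update_probs.values():
        result *= q
    return result

def prefix_probability(steps):
    """pr(f): the valid start configuration has probability 1, and every
    further step contributes its step probability as a factor."""
    result = Fraction(1)
    for p, update_probs in steps:
        result *= step_probability(p, update_probs)
    return result

# two steps of a transition taken with branch probability 1/2 whose update
# flips a fair coin for x: each step has probability 1/2 * 1/2 = 1/4
pf = prefix_probability([(Fraction(1, 2), {"x": Fraction(1, 2)})] * 2)
# pf == 1/16
```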
\begin{definition}[Admissible Run]\label{def:admissible}
    A finite prefix $f \in \fpath$ is \textit{admissible} if and only if
    $\prSs(f) > 0$. A run $c_0c_1\dots \in \runs$ is \textit{admissible} if and
    only if every finite prefix $c_0\dots c_n \in\fpath$ is admissible.
\end{definition}
\begin{rem}
    An admissible run can very well have a zero probability if the
    probability of its prefixes approaches zero for $n \rightarrow \infty$.
\end{rem}
\begin{definition}[Terminating Run]\label{def:termination}
    A run is terminating if it is admissible and has a finite prefix
    ending in the location $l_\bot$.
\end{definition}
By definition, only the terminating transition $t_\bot$ is admissible when the
program is in the terminal location $l_\bot$, and the following configuration
must again be in the terminal location $l_\bot$. Hence a terminating run ends
with infinitely many repetitions of a terminating configuration
$c_\bot=(l_\bot,t_\bot,s)$ for some assignment $s\in\Sigma$.
\subsection{Bounds}
The whole goal of \gls{koat}'s analysis is to find bounds. A bound is an
expression that describes the upper limit of a property of the program.
\begin{definition}[Bounds\cite{meyer2021tacas}]\label{def:bounds}
The set of bounds $\B$ is the smallest set with $\PV \union \R_{\geq 0}
\subseteq \B$, and where $b_1,b_2 \in \B$ and $v\in \R_{\geq 1}$ imply $b_1 +
b_2, b_1 \cdot b_2 \in \B$ and $v^{b_1} \in \B$.
\end{definition}
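The closure properties of $\B$ can be illustrated by a small evaluator for $|s_0|(b)$. This is a sketch under illustrative assumptions: bounds are modeled as nested tuples, program variables enter via the absolute values of $s_0$, and the name \texttt{eval\_bound} is hypothetical.

```python
# Bounds are generated from program variables and non-negative constants,
# closed under +, *, and exponentiation v^b with a constant v >= 1.

def eval_bound(b, s0):
    if isinstance(b, str):              # a program variable x in PV
        return abs(s0[b])               # |s_0| substitutes absolute values
    if isinstance(b, (int, float)):     # a constant from R_{>=0}
        return b
    op, lhs, rhs = b
    if op == "+":
        return eval_bound(lhs, s0) + eval_bound(rhs, s0)
    if op == "*":
        return eval_bound(lhs, s0) * eval_bound(rhs, s0)
    if op == "^":                       # v^{b_1} with a constant base v >= 1
        return lhs ** eval_bound(rhs, s0)
    raise ValueError(f"not a bound: {b!r}")

# |s_0|(2^x + 3*y) for s_0 = {x: -2, y: 1}: 2^|-2| + 3*|1| = 7
val = eval_bound(("+", ("^", 2, "x"), ("*", 3, "y")), {"x": -2, "y": 1})
# val == 7
```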
During the analysis of classical integer programs, one is especially interested
in runtime and size bounds. A runtime bound is an upper bound on the number of
visits to a transition in the program over all admissible runs, whereas a size
bound describes the highest value of a program variable observed at a
transition over all admissible runs. Because the values observed in a run are
highly dependent on the decisions made by the scheduler $\scheduler$, we take
the supremum over all schedulers $\scheduler$. One can also picture the
scheduler as an adversary that always takes the worst decision possible for
runtime and size bounds.
\begin{definition}[Runtime and Size Bounds\cite{meyer2021tacas}\label{def:runtimebounds}]
$\RB: \T \rightarrow \B$ is a runtime bound and $\SB: \T \times \V
\rightarrow \B$ is a size bound if for all transitions $t\in\T$, all
variables $x\in\V$, all schedulers $\scheduler$, and all states $s_0 \in
\Sigma$, we have
\begin{align*}
&|s_0|(\RB(t)) \geq \sup \set{ |\set{i}{t_i =
t}|}{f=(\_,t_0,\_)\dots(\_,t_n,\_), \prSs(f) > 0} \\
&|s_0|(\SB(t, x)) \geq \sup \set{ |s(x)| }{f=\dots(\_,t,s), \prSs(f) >
0}.
\end{align*}
\end{definition}
In probabilistic programs not only the worst case is of interest, but also the
expected case. For example, a program might run indefinitely in the worst case,
but with a worst-case probability approaching zero. The expected bounds are
defined using the expected value of random variables in the probability space
$(\runs, \F, \PrSs)$ defined in \Sref{ssec:probabilityspace}. Moreover, recall
definition \ref{def:expected_value}.
\begin{definition}[Expected Runtime Complexity, \acrshort{past}
\cite{meyer2021tacas}\label{def:past}]
    For $g \in \GT$ its runtime is the random variable $\Rt(g)$ where $\Rt: \GT
    \rightarrow \runs \rightarrow \overline{\N}$ with \begin{equation}
    \Rt(g)((\_,t_0,\_)(\_,t_1,\_)\dots) = |\set{i}{t_i \in g}|. \end{equation}
    For a scheduler $\scheduler$ and starting state $s_0\in\Sigma$, the expected
    runtime complexity of $g\in\GT$ is $\E_{\scheduler,s_0}(\Rt(g))$ and the
    expected runtime complexity of $\Prog$ is $\sum_{g\in\GT}\E_{\scheduler,
    s_0}(\Rt(g))$.
    If $\Prog$'s expected runtime complexity is finite for every scheduler
    $\scheduler$ and every initial state $s_0$, then $\Prog$ is called
    \acrfull{past}.
\end{definition}
The expected size bound is defined similarly.
\begin{definition}[Expected Size
Complexity\cite{meyer2021tacas}\label{def:expectedcomplexity}]
The set of general result variables is $\GRV = \set{(g,l,x)}{g\in\GT,
x\in\PV, (\_,\_,\_,\_,l) \in g}$. The size of $\alpha = (g,l,x) \in \GRV$ is
the random variable $\S(\alpha)$ where $\S : \GRV \rightarrow (\runs
    \rightarrow \overline{\N})$ with
\begin{equation}
\S(g,l,x)((l_0,t_0,s_0)(l_1,t_1,s_1)\dots) = \sup \set{|s_i(x)|}{l_i =
l, t_i \in g}.
\end{equation}
For a scheduler $\scheduler$ and starting state $s_0$, the expected size
complexity of $\alpha \in \GRV$ is $\E_{\scheduler,s_0}(\S(\alpha))$.
\end{definition}
\begin{definition}[Expected Runtime and Size
Bounds\cite{meyer2021tacas}\label{def:expectedbounds}]\phantom{ }
\begin{itemize}
        \item $\RBe : \GT \rightarrow \B$ is an expected runtime bound if for
            all $g\in\GT$, all schedulers $\scheduler$, and all $s_0 \in
            \Sigma$, we have $|s_0|(\RBe(g)) \geq \ESs(\Rt(g))$.
        \item $\SBe : \GRV \rightarrow \B$ is an expected size bound if for
            all $\alpha\in\GRV$, all schedulers $\scheduler$, and all $s_0 \in
            \Sigma$, we have $|s_0|(\SBe(\alpha)) \geq \ESs(\S(\alpha))$.
\end{itemize}
\end{definition}
\begin{example}
\end{example}
\subsection{Costs}
It is useful to add a notion of costs to transitions to reflect their varying
complexity. The accumulative costs of a run are the sum of costs for each
transition taken. Accumulative costs can be simulated by an additional program
variable $\mathbf{c} \in \PV$ that is updated only by polynomials of the form
$\eta(\mathbf{c}) = \mathbf{c} + c_t$, where $c_t \in \Z$ is the cost
associated with the transition $t\in\T$.
For a given size bound $\SB$ for the \gls{pip} $\Prog$ and a starting
assignment $s_0$, the cost bound is defined as $|s_0|(\SB(t_\bot,\mathbf{c}))$,
and given an expected size bound $\SBe$, the expected cost bound is defined as
$|s_0|(\SBe(t_\bot, \mathbf{c}))$.
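The cost-variable encoding can be sketched in a few lines, assuming a toy run given as the list of constant transition costs; the name \texttt{run\_with\_costs} is illustrative.

```python
# Simulating accumulative costs with a dedicated program variable c that
# every transition updates by eta(c) = c + c_t.

def run_with_costs(costs, state):
    """costs: the constant costs c_t of the transitions taken, in order;
    returns the final value of the cost variable c."""
    for c_t in costs:
        state["c"] = state["c"] + c_t   # eta(c) = c + c_t
    return state["c"]

# a run taking three transitions with costs 1, 3 and 1:
total = run_with_costs([1, 3, 1], {"c": 0})
# total == 5
```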
\subsection{Graph operations}
A \gls{pip} contains a directed graph with vertices $\L$ and edges $\T$. We will
define some graph operations that will become useful during the thesis. The set
of directed graphs over locations $\L$ and transitions $\T$ is denoted by $\G$.
% The graph $G(P)$ denotes the graph contained within a \gls{pip} $P$ and is
% defined as $G(P) = (L, \T)$. The following graph operations will be defined on
% graphs only, but work equally well for \glspl{pip} by using the contained graph
% operation first. However, the result might not be a valid integer program, per-se.
A set of transitions $T \subseteq \T$ induces a sub-graph $G(T)$ in a program
$P$, with $G: 2^\T \rightarrow \G$. Similarly, a set of locations $L \subseteq
\Loc$ induces a sub-graph $G(L)$ in $P$, with $G: 2^\Loc \rightarrow \G$.
\begin{rem}
    The resulting graphs cannot always be lifted to a valid \gls{pip}, since
    the cumulative probabilities of general transitions are not necessarily
    preserved.
\end{rem}
The operations $\Tin : \Loc \rightarrow 2^\T$, $\Tout : \Loc \rightarrow 2^\T$,
$\GTin : \Loc \rightarrow 2^\GT$, and $\GTout: \Loc \rightarrow 2^\GT$ return the
set of (general) transitions starting or ending in the given location.
Adding a transition, and possibly the new locations therein, to a graph is
denoted by the plus operator $+: \G \times \T \rightarrow \G$.
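These graph operations can be sketched over a toy representation, where transitions are tuples $(l, \tau, \eta, p, l')$; the names (\texttt{t\_in}, \texttt{t\_out}, \texttt{add\_transition}) are illustrative, not \gls{koat}'s implementation.

```python
# T_in / T_out collect the transitions ending or starting in a location;
# the + operator adds a transition together with any new locations.

def t_in(transitions, location):
    return {t for t in transitions if t[-1] == location}

def t_out(transitions, location):
    return {t for t in transitions if t[0] == location}

def add_transition(graph, t):
    locations, transitions = graph
    return (locations | {t[0], t[-1]}, transitions | {t})

T = {("a", "tau1", "eta1", 1, "b"), ("b", "tau2", "eta2", 1, "a")}
# t_in(T, "a") yields the b -> a transition, t_out(T, "a") the a -> b one
```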
\subsection{Semantics}
\todo{open question regarding temporary variables}
% the original definition allowed states defining all variables, but this lets
% temporary variables carry-over to the next transition. Is this what we
% want?
For a \gls{pip} one can define a probability space $(\runs, \F, \pr)$. The
outcomes of the probabilistic event in this space are the runs of the program.
Every distinct run is measurable, hence the $\sigma$-algebra $\F$ is defined to
contain every set $\{r\}$ for $r \in \runs$. The probability measure $\pr$
describes the probability of a given run. Formally, the probability space is
defined by a cylindrical construction. \todo{elaborate} Since the construction
is rather involved, we will only summarize the properties relevant to this
thesis.
First, the probability of a run is separated from the non-determinism: for all
following considerations the non-determinism is assumed to be decided by a
scheduler, and the probability measure depends on that scheduler.
The probability of a specific run in the probability space is equal to the
probability that every transition in the run is taken. Naturally, invalid
transitions (either because they do not exist in the program, or because the
guard is not satisfied) have a zero probability.
The probability space will be used to define the semantics of a \gls{pip}.
\subsection{Notes about the probability measure}
Probability that a variable $x$ takes the value $s(x)$ when taking the
transition $t$ from an (non-deterministically resolved) assignment $\tilde{s}$:
\todo{redefine with polynomials over indeterminates}
\begin{align}
q_{\tilde{s},t,s(x)}(x)
\end{align}
(Trivial) probability that the temporary variables got assigned their expected
values, given a (non-deterministically resolved) assignment $\tilde{s}$ and the
resulting assignment $s$:
\begin{align}
    \delta_{\tilde{s}(u),s(u)}
\end{align}
Probability of a transition after a finite prefix: for $f = \dots (l', t', s')
\in \fpath$, $(l, t, s) \in \confs$, $t = (l', p, \tau, \eta, l)$:
\begin{align}
    pr_{\mathfrak{S},s_0}((l', t',s') \rightarrow (l, t, s)) :=
    pr_{\mathfrak{S},s_0}(f \rightarrow (l,t,s)) = p \cdot \prod_{x\in\PV}
    q_{\tilde{s},t,s(x)}(x) \cdot \prod_{u \in \TV} \delta_{\tilde{s}(u),s(u)}
\end{align}
\begin{rem}
The probability depends on the scheduler $\mathfrak{S}$ and the starting
state $s_0$.
\end{rem}
Probability of a finite prefix: for $f = c_0, \dots, c_n \in \fpath$:
\begin{align}
    pr_{\mathfrak{S},s_0}(f) = pr_{\mathfrak{S},s_0}(c_0) \cdot \prod_{i=1}^n pr_{\mathfrak{S},s_0}(c_{i-1} \rightarrow c_i)
\end{align}
The cylindrical construction yields the unique probability measure
$\mathbf{P}_{\mathfrak{S},s_0} : \F \rightarrow [0,1]$ for infinite runs.
\subsection{Runtime}
\begin{comment}
for a transition:
- constant cost
- most simple case cost = 1 for every transition
- combining transitions and adding costs
- costs depending on variables
for a prefix:
- finite sequence of configurations
- runtime = sum of all costs over all steps in the prefix
- probability = product over all transition probabilities in the prefix
- legal prefix = non-zero probability
for a path:
- infinite sequence of configurations
- runtime = runtime of the largest non-zero probability prefix
- pro
legal paths = paths with a non-zero
for a program:
- supremum over runtime of all paths
\end{comment}
\begin{definition}[Runtime of a run]
\end{definition}
\begin{definition}[Worst-case runtime of a Program]
\end{definition}
\begin{definition}[Expected runtime of a Program]
\end{definition}
\newacronym{past}{PAST}{positively almost surely terminating}