paradyx/ma-tex: ch1_introduction.tex

% TeX root = main.tex
\section{Background}\label{sec:background}
The question of whether a program terminates is a famous problem at the heart of
computer science and it is known to be undecidable. The so-called Halting
Problem is strongly related to the search for upper and lower runtime complexity
bounds, and it is equally undecidable. Nevertheless, the answers to these
questions are very important in practice. For example, a compiler might want to
warn the programmer about faulty code sections that were not marked explicitly
to run indefinitely, or the developer could be forced to prove the efficiency of
their program in critical scenarios. In an ever more complex world, with ever more
complex algorithms, the need for automatic tools arise. Even though the
question of runtime complexity cannot be answered in a general case, many
tools have been developed to automatically analyse the complexity of various
programming paradigms as tightly and quickly as possible \cite{giesl2017aprove,
montoya2014aplas,alarcon2017muterm,irankfinder2018wst}.

Those tools are being constantly improved on and they are regularly compared to each
other in the \gls{termcomp} \cite{termcomp2015,termcomp2019}, the goal being to
find as many, and as tight bounds as possible for a set of test programs. 

\section{Motivation}

Consider the program $\Prog_1$ displayed in \fref{fig:classic_c}. It contains
two program variables $x$ and $y$, and one temporary variable $u$, which is used
to non-deterministically sample $x$ on the first line of the program. The
program clearly terminates. The first loop increments $x$ up to 3 if $x$ is
positive to begin with. The second loop decrements $y$ while $y$ is positive.
However, classical analysis using only \gls{mprf}\cite{benamram2017} runtime
complexity bound for this program \cite{giesl2022lncs}. With the help of
\gls{cfr} on sub-\gls{scc} level, the program can be transformed into an
equivalent program (see \fref{fig:classic_pe_c}) where the analysis with
\gls{mprf} succeeds at finding a finite runtime complexity bound, because now
the variable $x$ is constrained when entering the loop.

\begin{figure}
    \centering
    \begin{subcaptionblock}{0.46\textwidth}
        \centering
        \raisebox{1cm}{%
        \input{figures/ch1_classic_c}
        }
        \caption{Classic Integer Program $\Prog_1$\label{fig:classic_c}}
    \end{subcaptionblock}
    \begin{subcaptionblock}{0.46\textwidth}
        \centering
        \input{figures/ch1_classic_pe_c}
        \caption{Partial evaluation of $\Prog_1$\label{fig:classic_pe_c}}
    \end{subcaptionblock}
    \caption{Partial evaluation of a classic integer program with hard to find
    runtime complexity bounds using only \gls{mprf}.\cite{giesl2022lncs}}
\end{figure}

This technique used for the transformation was first introduced by
\citeauthor{Domenech19} \cite{Domenech19} and it is called \gls{cfr} via partial
evaluation. It was implemented in their analysis tool iRankfinder
\cite{irankfinder2018wst}. It achieves very similar results to \gls{mprf} when
used on entire programs or whole \glspl{scc}. Its real strength comes from
applying it to a sub-\gls{scc} level and refining specifically the loops where
\glspl{mprf} fail to find bounds. This technique was presented by
\citeauthor{giesl2022lncs} \cite{giesl2022lncs} and it is implemented in
\gls{koat} \cite{giesl2014ijcar} the complexity analysis tool developed by
\citeauthor{giesl2014ijcar} at RWTH University. 

Underneath, \gls{koat} uses iRankFinder for the \gls{cfr}. Recently, \gls{koat}
was extended to probabilistic integer programs \cite{meyer2021tacas}. For
probabilistic programs, \gls{koat} uses probabilistic ranking functions, that
work similarly to linear ranking functions, but take probability into account.
They fail for the same reasons as \gls{mprf} for certain kinds of loops.
Unfortunately, by using iRankFinder which is limited to non-probabilistic
integer programs, the technique of \gls{cfr} via partial evaluation remained out
of reach for probabilistic programs.

\begin{figure}
    \centering
    \begin{subcaptionblock}{0.46\textwidth}
        \centering
        \raisebox{2cm}{%
            \input{figures/ch1_prob_c}
        }
        \caption{Probabilistic Integer Program $\Prog_2$\label{fig:prob_c}}
    \end{subcaptionblock}
    \begin{subcaptionblock}{0.46\textwidth}
        \centering
        \input{figures/ch1_prob_pe_c}
        \caption{Partial evaluation of $\Prog_2$\label{fig:prob_pe_c}}
    \end{subcaptionblock}
    \caption{Partial evaluation of a Probabilistic Integer Program with hard to
    find runtime complexity bounds using only probabilistic ranking functions.}
\end{figure}

Consider the very similar probabilistic integer program $\Prog_2$ which is
displayed in \fref{fig:prob_c}. The variables $x$ and $y$ are now only
in/-decremented with 50\% probability in every iteration. The program will
probably always terminate, since a longer runtime becomes increasingly unlikely.
The current version of \gls{koat} fails to find a finite runtime and expected
complexity bound for this program. It would be useful to transform $\Prog_2$
into an equivalent program like the one shown in \fref{fig:prob_pe_c}, similar
to the \gls{cfr} technique presented by \citeauthor{giesl2022lncs}
\cite{giesl2022lncs} and \citeauthor{Domenech19} \cite{Domenech19}.

\section{Related Work}
\subsection{\acrshort{aprove}, \acrshort{koat}, and \acrshort{loat}} 
The tools \gls{loat} \cite{frohn2022ijcar} and \gls{koat}
\cite{brockschmidt2016acm} were developed at RWTH University and they aim to
find lower and upper (respectively) runtime complexity bounds on transition
systems. At its core, \gls{koat} computes bounds for subprograms using
\gls{mprf}\cite{benamram2017} or a \gls{twn}\cite{lommen2022twn}, and then puts the subprograms
together in order to generate upper runtime complexity and size bounds for the
whole program. Optionally to improve the analysis, \gls{koat} can resort to
partial evaluation as a control-flow refinement technique on a sub-\gls{scc}
level \cite{giesl2022lncs}. \gls{aprove} is the overall tool, and it has taken
part in competitions. It works on various input formats and transforms them
accordingly so that they can be used with specialised tools such as \gls{koat}
and \gls{loat}.

\subsection{iRankFinder}
The tool iRankFinder \cite{irankfinder2018wst} is another tool taking part in
\gls{termcomp}. It uses linear and lexicographic ranking functions for the
analysis and the program is internally represented as \gls{chc} program, which
is very similar to integer transition systems in the regard that integer
transition systems can be transformed into equivalent \gls{chc} systems.
Control-flow refinement is a technique where a program is transformed into a
semantically equivalent program, which is easier to analyse with the existing
analysis techniques (e.g. linear, lexicographic, and multiphase ranking
functions). The technique was put forward by \citeauthor{gulwani2009}
\cite{gulwani2009}. iRankFinder implemented the control-flow refinement
technique by partial evaluation proposed by \citeauthor{gallagher2019eptcs}
\cite{gallagher2019eptcs}.

\section{Goal}\label{sec:goal}
\gls{koat} borrowed the implementation of iRankFinder as a black box, which came
with a number of limitations \cite{giesl2022lncs}. Besides code-quality and
performance issues, the external library would not be able to profit from future
improvements to the algorithms. This thesis improves on the latest version of
\gls{koat} with regard to non-prob\-a\-bilis\-tic integer programs by reimplementing
the control-flow refinement technique of iRankFinder \cite{irankfinder2018wst}
natively in KoAT instead of calling the external library. By adapting the method
of partial evaluation to the paradigm of probabilistic programs and proving its
correctness, partial evaluation is made available for probabilistic programs.

The well-known basics required to understand this thesis are recapitulated in
\Cref{ch:preliminaries}, focusing especially on constraints, probability theory,
and the formalism behind (probabilistic) integer programs. 
In \Cref{ch:theory} the \gls{cfr} techniques from \citeauthor{Domenech19} are
adapted to \Glspl{pip}. We prove the theoretical correctness of the
construction. Furthermore, the technique is extended to refine only a part of
the original program.
\Cref{ch:implementation} focuses on the implementation and the integration into
the existing analysis. The performance of the new implementation is analysed.
Finally in \Cref{ch:conclusion}, we recapitulate the results of this thesis and
give an outlook on future research.