In his seminal work of 1971, Stephen Cook defined the complexity classes P - problems that are solvable by an algorithm with a polynomial running time - and NP - problems that can be verified by an algorithm running in polynomial time. The class P is often taken to be a good abstraction for the problems that are **efficiently solvable**. Cook also established that a specific problem, known as **boolean satisfiability**, also denoted simply by **SAT**, is a *hardest* problem in NP in the sense that any other problem in NP can be solved by reducing it to SAT.

SAT is the problem of deciding, given a so-called *propositional formula* $\varphi$ of the form

\[ \left((p\lor q)\land ((p\Rightarrow r\lor s)\land \neg s)\right)\lor \dots\]

whose variables $p,q,r,s,\dots$ can take yes/no as values, whether *there exists *a function mapping the variables $\{p,q,r,s,\dots\}$ to true/false in such a way that the formula becomes true. In this case, we say that the formula is

\[ (p\lor q\lor s)\land (p \lor \neg q)\land (p\lor \neg s) \land \dots\]

is as hard as SAT itself. Each *conjunct* $(p\lor q\lor s), (p \lor \neg q), (p\lor \neg s)$ is called a *clause *of the formula.

Cook's result marks the beginning of the field of **complexity theory**, in which one describes various abstract models of computation and the relationship between them.

Cook's problem has the following formulation: given a set of **variables** with true/false values, and a set of arbitrary **constraints **on these variables (the clauses of the formula), does there exist an assignment satisfying **all** the constraints?

Suppose that the constraints appearing in the input are not arbitrary, but satisfy some additional assumptions. For example, imagine that every individual constraint in the input has the property that they are satisfied by the assignment given value **false** to all the variables. Then clearly, on such inputs, one can solve the problem efficiently: every instance becomes true under the assignment taking all the variables to **false**!

To formalize this notion, let $\Gamma$ be a set of constraints. For example, $\Gamma$ might consist of all the clauses $(p_1\lor p_2\lor\dots)$ where no variable appears negated (such clauses are called *true-valid*, since setting all variables to true results in a valid solution), or of all the clauses containing at most 2 variables. Let SAT$(\Gamma)$ be the restriction of the problem SAT where the instances can only use constraints from the set $\Gamma$.

Schaefer realised that there are exactly **six** types of restrictions that can be imposed on the constraints in a way that the problem becomes solvable in polynomial time, and that any class of inputs that does not satisfy any of the six restrictions is able to encode Cook's original SAT problem.

**Theorem (Schaefer, 1978).** If any of the following conditions hold, then the problem SAT($\Gamma$) can be solved in polynomial time.

- $\Gamma$ is true-valid,
- $\Gamma$ is false-valid,
- $\Gamma$ consists of Horn relations,
- $\Gamma$ consists of dual-Horn relations,
- $\Gamma$ is definable by $2$-CNF,
- every constraint in $\Gamma$ is definable by linear equations modulo $2$.

Otherwise, SAT($\Gamma$) is NP-complete.

Schaefer's theorem is a **dichotomy** theorem: it separates the class of boolean satisfiability problems into those that are efficiently solvable and those that are as hard as SAT.

In 1990, Hell and Nešetril discovered a similar dichotomy, related to the complexity of **coloring graphs**. An $\mathbb H$-coloring of a graph $\mathbb G$ is a map $f$ from the vertices of $\mathbb G$ to the vertices of $\mathbb H$ such that if $\{u,v\}$ is an edge in $\mathbb G$, then $\{f(u),f(v)\}$ is an edge in $\mathbb H$. The **$\mathbb H$-coloring problem **takes as input a graph $\mathbb G$ and asks whether there exists an $\mathbb H$-coloring of $\mathbb G$.

For example, if $\mathbb H$ consists of a complete graph on $n$ vertices, then the $\mathbb H$-coloring problem is the common $n$-colorability problem, where one is asked to color the vertices of a graph in a way that two neighbor vertices receive different colors. The problem of $n$-colorability has received attention in computer science, where it can be used to model a variety of problems about networks, and in discrete mathematics.

Hell and Nešetril proved that the following **dichotomy** holds: either $\mathbb H$ contains a loop or is bipartite, i.e., its vertices can be partitioned into two parts $H_1$ and $H_2$, each containing no edge, in which case the $\mathbb H$-coloring problem is solvable in polynomial time, or the $\mathbb H$-coloring problem is as hard as SAT.

Feder and Vardi noticed that the $\mathbb H$-coloring problem, as well as the SAT($\Gamma$) problem, are particular instances of problems called *constraint satisfaction problems* (CSPs). A CSP has the following parameters: a fixed set $A$, and a list of relations $R_1,\dots,R_r$ on $A$. The *template* of the CSP is then $\mathbb A=(A;R_1,\dots,R_r)$, and CSP($\mathbb A$) is defined as the problem of recognizing those formulas of the form

\[ R_1(x,x,y) \land R_2(y,z) \land R_2(x,z)\land \dots \]

that admit a valid solution $s\colon\{x,y,z,\dots\}\to A$, similarly as in the SAT($\Gamma$) case.

They conjectured that this class of problems for finite sets $D$ obey a complexity dichotomy as above: every problem in this class is solvable in polynomial time, or is as hard as SAT. Feder and Vardi's seminal paper was the beginning of a massive project, that of classifying the complexity of all constraint satisfaction problems with a finite template. This project eventually succeeded in 2017, when Bulatov and Zhuk independently gave proofs of the Feder-Vardi dichotomy conjecture.

The proofs of the Feder-Vardi conjecture did not signal an end of the field of constraint satisfaction, on the contrary. Nowadays, the area of constraint satisfaction has been growing at a rapid pace and has branched out into several subareas, that cover a greater variety of problems than the CSPs over finite domains.

Particularly active are the fields of promise constraint satisfaction, related to the complexity of approximation problems, and the field of infinite-domain constraint satisfaction. A dedicated article about these areas will appear on our website in the future.

The field of constraint satisfaction has a lot to offer to students that are interested in algorithms or in the application of mathematics to theoretical computer science. The range of techniques that are used is large, going from algebra, combinatorics, logic, topology, or category theory. If you are interested in writing a thesis on this topic, simply write us!