A computer-checked proof of the Four Colour Theorem
Georges Gonthier
Microsoft Research Cambridge

This report gives an account of a successful formalization of the proof of the Four Colour Theorem, which was fully checked by the Coq v7.3.1 proof assistant [13]. This proof is largely based on the mixed mathematics/computer proof [26] of Robertson et al., but contains original contributions as well. This document is organized as follows: section 1 gives a historical introduction to the problem and positions our work in this setting; section 2 defines more precisely what was proved; section 3 explains the broad outline of the proof; section 4 explains how we exploited the features of the Coq assistant to conduct the proof, and gives a brief description of the tactic shell that we used to write our proof scripts; section 5 is a detailed account of the formal proof (for even more details the actual scripts can be consulted); section 6 is a chronological account of how the formal proof was developed; finally, we draw some general conclusions in section 7.

1 The story

The Four Colour Theorem is famous for being the first long-standing mathematical problem to be resolved using a computer program. The theorem was first conjectured in 1852 by Francis Guthrie, and after over a century of work by many famous mathematicians [36,28] (including De Morgan, Peirce, Hamilton, Cayley, Birkhoff, and Lebesgue), and many incorrect "proofs", it was finally proved by Appel and Haken in 1976 [3]. This proof broke new ground because it involved using computer programs to carry out a gigantic case analysis, which could not, in any real sense, be performed by hand: it covered, quite literally, a billion cases.

The Appel and Haken proof attracted a fair amount of criticism. Part of it concerned the proof style: the statement of the Four Colour Theorem is simple and elegant, so many mathematicians expected a simple and elegant proof that would explain, at least informally, why the theorem was true — not opaque IBM 370 assembly language programs [5]. Another part, however, was more rational skepticism: computer programming is known to be error-prone, and difficult to relate precisely to the formal statement of a mathematical theorem. The fact that the proof also involved an initial manual case analysis [4] that was large (10,000 cases), difficult to verify, and in which several small errors were detected, also contributed to the uncertainty.

In 1995, Robertson, Sanders, Seymour, and Thomas published a more streamlined proof of the theorem [26], using an argument quite similar to the one used by Appel and Haken, but employing C programs to check both the giant and the large case analyses. This second proof, along with the edition of a definitive monograph for the original proof [6], cleared up most doubts concerning the truth of the theorem. Furthermore, Robertson et al. used the larger computational resources available to them to search for and find a significantly (four times) smaller set of cases for the second analysis, by systematically exploring variants in the initial case analysis.

Our work can be seen as an ultimate step in this clarification effort, completely removing the two weakest links of the proof: the manual verification of combinatorial arguments, and the manual verification that custom computer programs correctly fill in parts of those arguments. To achieve this, we have written a formal proof script that covers both the mathematical and computational parts of the proof.
We have run this script through the Coq proof checking system [13,9], which mechanically verified its correctness in all respects. Hence, even though the correctness of our proof still depends on the correct operation of several computer hardware and software components (the processor, its operating system, the Coq proof checker, and the Ocaml compiler that compiled it), none of these components are specific to the proof of the Four Colour Theorem. All of them come off-the-shelf, fulfill a more general purpose, and can be (and are) tested extensively on numerous other jobs, probably much more than the mind of an individual mathematician reviewing a proof manuscript could ever be. In addition, the most specific component we use — the Coq system, to which the script is tuned — can output a proof witness, i.e., a longhand detailed description of the chain of formal logical steps that was used in the proof. This witness can, in principle, be checked independently (technically, it is a term in a higher-order lambda calculus). Because this witness records only logical steps, and not computation steps, its size remains reasonable, despite the large amount of computation needed for actually checking the proof.

Although this work is purportedly about using computer programming to help doing mathematics, we expect that most of its fallout will be in the reverse direction — using mathematics to help programming computers. The approach that proved successful for this proof was to turn almost every mathematical concept into a data structure or a program, thereby converting the entire enterprise into one of program verification. Most of our work consisted in experimenting with and developing proof techniques for dealing with such "higher-level" programs, leading to a proof style that departs significantly from common practice in machine-assisted theorem proving. For instance, we have almost no use for decision procedures or complex proof tactics, and insert relatively few explicit intermediate statements. In fact, many of our proofs look more like debugger or testing scripts than mathematical arguments. We believe that the fact that the "highbrow" mathematics involved in the Four Colour Theorem can be dealt with effectively in this fashion indicates that there is considerable scope for using mathematical abstraction more effectively in software engineering.

2 The statement of the theorem

Part of the attractiveness of the Four Colour Theorem is its nutshell-sized statement:

Four Colour Theorem: Any planar map can be coloured with only four colours.

Of course, this is somewhat of a cheat, and we should be more explicit about what we mean by "coloured map":

Four Colour Theorem: The regions of any simple planar map can be coloured with only four colours, in such a way that any two adjacent regions have different colours.

This still leaves the terms simple planar map, region, and adjacent open; in formal mathematics, even such intuitive graphical notions must be precisely defined. It is possible to give a quite general definition, using some basic notions in abstract topology:

Definition: A planar map is a set of pairwise disjoint subsets of the plane, called regions. A simple map is one whose regions are connected open sets.

Definition: Two regions of a map are adjacent if their respective closures have a common point that is not a corner of the map.

Definition: A point is a corner of a map iff it belongs to the closures of at least three regions.
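These definitions can be rendered schematically in Coq; the following sketch is ours and only illustrates the shape of the statements (the actual file realmap.v works over the real_model structure and differs in detail; all the names below are made up for the illustration).

(* Points and topological closure are kept abstract in this sketch. *)
Section MapSketch.
Variable point : Type.
Variable closure : (point -> Prop) -> (point -> Prop).

Definition region := point -> Prop.
Definition map := region -> Prop.    (* a map is a set of regions *)

(* A corner is a point in the closures of at least three distinct regions. *)
Definition corner (m : map) (p : point) : Prop :=
  exists r1 r2 r3, m r1 /\ m r2 /\ m r3 /\
    r1 <> r2 /\ r2 <> r3 /\ r1 <> r3 /\
    closure r1 p /\ closure r2 p /\ closure r3 p.

(* Two regions are adjacent if their closures meet outside the corners. *)
Definition adjacent (m : map) (r1 r2 : region) : Prop :=
  exists p, closure r1 p /\ closure r2 p /\ ~ corner m p.
End MapSketch.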
Without the restriction to simple maps in the first definition we would have the Empire Colouring Problem, where regions can consist of two or more separate pieces; such disjointed maps can require more than four colours (up to 6r+1, to be exact, when regions are allowed to have up to r disjoint open parts each). The second definition is just a formal way of saying that regions are adjacent when they "touch" at points other than corners. For example, consider a map consisting of four regions a, b, c, d that are the interiors of the four squares below.

[Figure: four square regions a, b, c, d; P is a point shared only by the closures of a and b, and Q is a point shared by all four closures.]

Then the closure of a is the union of a with its framing square, and similarly for b, c, and d. The closures of a and b have a common point P that is not in the closure of c or d, so a and b are adjacent. On the other hand, a and d are not adjacent, since the only common point of their respective closures is Q, and Q also belongs to the closures of b and c. Note again that without this restriction more than four colours could be needed: e.g., if a, b, c, and d were all mutually adjacent, then the map obtained by adding the outside of the large square would require five colours. Note also that "corners" are not necessarily isolated points: replace the horizontal line segments in the map above with two arcs of the curve y = sin 1/x, and there is an entire line segment [Q1Q2] of "corners" all touching each of a, b, c, and d.

[Figure: the same map with the horizontal segments replaced by arcs of y = sin 1/x; the whole segment [Q1Q2] consists of corners.]

The other notions we used in the above – open sets, connected sets, closures – are all standard basic undergraduate material. Fully formalizing these notions along with the definitions above takes up about 30 lines of our formal proof, in the file realmap.v. But, precisely because this is a formal proof, these few lines, along with the formal definition of the real numbers in file real.v (about 200 lines in total), are all that one needs to read and understand in order to ascertain that the statement below in file fourcolor.v is indeed a proof of the Four Colour Theorem:

Variable R : real_model.

Theorem four_color : (m : (map R)) (simple_map m) -> (map_colorable (4) m).
Proof. Exact (compactness_extension four_color_finite). Qed.

The other 60,000 or so lines of the proof can be read for insight or even entertainment, but need not be reviewed for correctness. That is the job of the Coq proof assistant, a job for computers.

The definition we use for the real numbers is both axiomatic and constructive, following the approach used in the C-CoRN library for intuitionistic real numbers [17]. We define a real_model as a type of objects with operations that satisfy a fairly standard set of axioms for the real numbers (see footnote 1) and prove the Four Colour Theorem for an arbitrary real_model. However, we also provide an explicit construction of a model (file dedekind.v) and a proof that this model is unique up to isomorphism (file realcategorical.v). Thus, it is not necessary to read through the somewhat involved Dedekind cut construction of the real model to ascertain that the R in the statement of four_color indeed stands for the usual real numbers. Nevertheless, the existence of the construction ensures that the real axioms we use are consistent with the Coq type theory (see footnote 2) [13,2,35,8,24].

Footnote 1: There are a few twists due to the fact that Coq is based on intuitionistic type theory rather than classical set theory. The main consequence is that we have to state all the arithmetic axioms, such as the commutativity of +, using the extensional equality of the structure (x = y defined as x ≤ y ∧ y ≤ x), rather than Coq's primitive intensional equality; we would need additional extensionality axioms to construct the quotient type in Coq. Also, we had to pick a form of the supremum axiom that implies the excluded middle P ∨ ¬P.

Footnote 2: The axiomatization of the reals in the Coq standard library includes a ceiling function that maps the reals to the inductive integers, and whose consistency with the impredicative type theory of Coq v7 is questionable. Our construction relies only on an excluded middle assumption, whose consistency has been well studied.

Although the statement of the Four Colour Theorem uses notions from Analysis, the Four Colour Theorem is essentially a result in Combinatorics: the essence of the theorem is a statement of properties of certain finite arrangements of finite objects.
Indeed, most mathematical papers on the subject pay only lip service to the continuous statement and quickly (and informally) rephrase the problem in graph theory: colouring a map is trivially equivalent to colouring the graph obtained by taking the regions of the map as nodes, and linking every pair of adjacent regions. In principle, it is even possible to completely forget that regions are sets of points in the graph problem, but this is never done in practice, because the purely graph-theoretic characterizations of graphs generated by planar maps, such as the Kuratowski minor exclusion [23] or the Hopcroft-Tarjan planarity test [20], are not intuitively compelling and are difficult to use in proofs. Instead, the planarity of a graph is characterized by means of a planar embedding that maps nodes to points and edges to pairwise disjoint continuous arcs (the justification for this mapping is often informal). It is then possible to apply the Jordan Curve Theorem:

  The complement of any Jordan (i.e., closed continuous) curve is the disjoint union of two components, each of whose frontier is the whole curve.

to make use of the planarity assumption, as every cycle in the graph is mapped to a Jordan curve by the embedding. However, this approach results in proofs that, while convincing for humans, are difficult to formalize because they contain an informal mix of combinatorics and topology.

We therefore decided to use a different kind of combinatorial structure, known as hypermaps [15,31,32,16], to carry out the combinatorial core of the proof (hypermaps will be described in detail in section 5.1). This design decision had several advantages:
• We could formulate a self-contained combinatorial theorem, and prove it under weaker logical assumptions (using only intuitionistic logic).
• The use of Analysis was circumscribed to the discretization of the problem, and even this could be carried out directly by elementary methods, without appealing to the Jordan Curve Theorem (in fact, the discretization does not follow from that Theorem, so it is somewhat off topic).

The second point is important because the Jordan Curve Theorem is a rather difficult result (see footnote 3), whose known proofs rely heavily on either complex analysis and/or homology. The approach we have outlined above allows us to avoid the Jordan Curve Theorem, although we do use a combinatorial analog (see Section 5.1).

Footnote 3: The machine formalisation of the Jordan Curve Theorem was the subject of several projects; T. Hales has just announced that he had completed the proof using HOL Light. There seemed to be little point in trying to replicate these works.
3 A historical overview of the proof

Aside from the hypermap formulation and the preliminary discretization lemma, our proof follows very closely the 1994 proof by Robertson, Sanders, Seymour and Thomas [26,27]; in turn, that proof follows the same outline as the 1976 proof by Appel and Haken [4,5], whose general principles go back to the original (incorrect) 1879 proof by Kempe [22], with important contributions by Birkhoff [10], Heesch [18], and many others (see [36] for a full history). In presenting the general ideas of the proof, we can only pay tribute to the ingenuity of our predecessors; our actual work was to find a systematic way of getting the details right – all of them.

Our (combinatorial) hypermap colouring theorem can be interpreted as a special case of the general Four Colour Theorem for map colouring, so we shall continue to use the map colouring formulation here, even though most of the works we reference use the graph formulation introduced in Section 2. We shall make the combinatorial interpretation evident by focusing on a particular class of maps:

Definition: A polygonal outline is the pairwise disjoint union of a finite number of open line segments, called edges, with a set of nodes that contains the endpoints of the edges. The regions of a finite polygonal planar map, called faces, are the connected components of the complement of a polygonal outline.

Finite polygonal maps correspond quite closely to hypermaps, and indeed the construction used to prove our discretization lemma can also be used to show that the restriction to finite polygonal maps entails no loss of generality. On the contrary, the definition of polygonal maps is slightly too broad. It allows isolated nodes, disconnected outlines, and extraneous edges that are bordered by the same face on either side, called bridges.

[Figure: a degenerate polygonal map, showing bridges and an isolated node.]

The discretization construction avoids such degeneracies; it generates polygonal maps that are bridgeless and connected. These properties play an important rôle in the combinatorial proof; indeed, the exact statement of our combinatorial theorem is

Theorem four_color_hypermap : (g : hypermap) (planar_bridgeless g) -> (four_colorable g).

The faces of a finite bridgeless connected polygonal map are the interiors of simple polygons (and are thus bounded), except one, which is the exterior of a simple polygon (and is thus unbounded). The edges and nodes of such a map are the sides and vertices of these polygons, respectively. Thus the map is the planar projection of a polyhedron, hence the following

Definition: A polyhedral map is a finite bridgeless connected polygonal map.

The numbers N of nodes, F of faces, and E of edges of a polyhedral map are related by the well-known Euler formula

  N + F – E = 2

This formula characterizes planarity. We use it to define planar hypermaps; in the discretization construction, it is established by counting points, lines and squares on a rectangular grid. In the rest of this section we give an informal outline of the combinatorial proof, transposed to polyhedral maps. To avoid spurious computational geometry considerations, we will allow edges to be broken (as opposed to straight) line segments; this is only apparent in the "digon" case in c) below.
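As a quick sanity check of the Euler formula, and to anticipate the arithmetic used in step b) of the outline below, here is the standard computation (our summary, not part of the original text):
\[ N + F - E = 2 \qquad\text{(e.g., for the cube: } 8 + 6 - 12 = 2\text{).} \]
In a cubic map every node accounts for exactly three edge ends, so \(3N = 2E\); substituting \(N = \tfrac{2}{3}E\) into the Euler formula gives
\[ \tfrac{2}{3}E + F - E = 2 \;\Longleftrightarrow\; 2E = 6F - 12 , \]
and since \(2E\) also counts the sides of all faces, the average number of sides per face is \(2E/F = 6 - 12/F < 6\).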
The basic proof outline of the Four Colour Theorem, oddly, has not changed since the initial, incorrect proof attempt [22] by Kempe in 1879. Kempe's proof went as follows:

a) It is enough to consider only cubic maps, where exactly three edges meet at each node: just cover each node with a new polygonal face; if the resulting cubic map can be coloured, then simply deleting the faces that were added yields a colouring of the original map.

b) In a cubic map, we have 3N = 2E edge endpoints, so the Euler formula can be rewritten 2E = 6F – 12. Since 2E is also the total number of sides of all faces, this states that every face has on average almost six sides, but that globally there are 12 missing sides (this is why it takes 12 pentagons, along with the hexagons, to stitch a football together).

c) Consider a minimal cubic counter example to the Four Colour Theorem: assuming there are polyhedral maps that require at least five colours, pick a cubic one with the smallest number of faces.

d) By b) this minimal counter example map must have a face with five sides or less, so the neighborhood around that face must look like one of these:

[Figure: the five possible central faces, with one to five sides.]

If we erase one or two well chosen sides of the central face, we get a smaller cubic polyhedral map, which must therefore be four colourable. (For the square and pentagon, we erase two sides x and y, chosen so that the faces on the other side of x and y, respectively, are neither equal nor adjacent.)

e) If the central face is not a pentagon, we can immediately find a colour for it (recalling that for a square, we erase two opposite sides).

f) If the central face is a pentagon, we may need to modify the map colouring so that a colour is free for the pentagon. Roughly (see footnote 4), this is done by locally interchanging two colours in any group of faces that is surrounded by the other two colours. Such two-toned groups (called "Kempe chains") cannot cross each other, since the map is planar. Kempe enumerated all the possible ways in which chains could connect the faces in the ring surrounding the central pentagon to show that recolouring was always possible.

The error that took ten years to spot and almost a century to fix occurred in the last step — some cases were missing in the enumeration (Kempe failed to note that interchanging colours in one chain may scramble other chains). The correct proof follows the same principle, but considers larger map fragments, called configurations. A configuration consists of a connected group of (whole) faces, called its kernel, surrounded by a ring of partial faces. The first major steps towards a correct proof were made by Birkhoff in 1913 [10]. He showed that

g) For some reducible configurations, such as the one below whose kernel is a group of four pentagons (known as the Birkhoff diamond), the Kempe argument f) that failed for single pentagons could be carried out soundly.

[Figure: the Birkhoff diamond configuration.]

h) The Kempe argument is sound for any configuration surrounded by a ring of at most five faces — except the one containing a single pentagon.

Let us say that a ring is trivial when its interior is either empty or consists of a single pentagon; then it follows from the above that a minimal counter example can contain no non-trivial rings of length 5 or less: it must be internally 5-connected. We refer to this result as the "Birkhoff lemma" in the sequel.

Footnote 4: See section 5.2 for a precise description of the version of this operation that is used in the formal proof.
i) As a consequence of h), in a minimal counter example, every face is surrounded by two concentric simple rings, i.e., its second neighborhood must have a shape similar to

[Figure: a central face surrounded by two concentric simple rings of faces.]

This excludes, for instance, maps where two non-consecutive neighbors of the central face are adjacent, as in the figure below, because such maps contain a configuration with a three-face ring (indicated by the heavy mixed dash).

[Figure: a central face with two non-consecutive adjacent neighbors, which creates a three-face ring.]

We refer to this result as the "Birkhoff theorem" in the sequel.

These results suggest the following strategy for proving the Four Colour Theorem: find an explicit list R of configurations such that one can establish
I. reducibility: every configuration in R is reducible, using some variant of g).
II. unavoidability: every minimal counter example must contain at least one of the configurations in R, using b), e), h), and i).

The Four Colour Theorem then follows by contradiction, since no reducible configuration should occur in a minimal counter example.

Although much of the work that followed focused on finding a suitable R, for us it was a given: we just used the list of 633 configurations devised by Robertson et al. [26]. The methods used in g), with some generalizations and minor improvements, are sufficient for part I, reducibility. It was only in 1969, however, that a systematic method for part II, unavoidability, was published by Heesch (see footnote 5) [18]. The method, called discharging, can be outlined as follows:

1) Find a formula for computing the "average" arity (number of sides) of the polygons in the second neighborhood of a face, such that the sum of these averages over all faces is equal to 2E, the total number of sides in the graph.

2) Show that in a minimal counter example, every second neighborhood whose average arity, computed using the formula of 1), is strictly less than 6, contains one of the reducible configurations in R.

Unavoidability follows from 2), since by b) and 1) any cubic map must contain some second neighborhood whose average arity is less than 6. By i), step 2) can be carried out by enumerating the possible arities of the faces in a second neighborhood. The average arity constraint usually provides an upper bound for this enumeration, and this bound is quite tight, since by e) all arities are at least 5. Thus, the total size of the enumeration is relatively small (in the 10,000 range); furthermore, the neighborhoods produced by the enumeration will have a high density of pentagons and hexagons, and thus stand a good chance of being reducible.

Footnote 5: Heesch had been developing and using the methods since the 1940s, however.

Heesch also gave a method for constructing suitable averaging formulas for step 1). To ensure that the sum-of-averages constraint in 1) is always met, the averaging process is described in terms of explicit transfers of (fractions of) arity between neighboring faces: the average arity δ*(a) of the neighborhood of a face a is computed as

  δ*(a) = δ(a) + ∑_b T_ba – ∑_b T_ab

where δ(a) is the arity of a, and T_ba is the fraction of arity transferred to a from one of its neighbors b. Heesch defined the value of T_ba with a set of explicit patterns, called discharge rules. Each rule specifies that a fraction of arity should be transferred from b to a if their common neighborhood fits its pattern, and T_ba is the sum of all these fractions. For instance (following Robertson et al.) we use the following rule:

[Figure: a discharge rule transferring 1/5 of a side from a face b with more than five sides to an adjacent pentagon a.]

This transfers to a pentagon one fifth of a side from any adjacent face that has more than five sides. Therefore, the neighborhood of any pentagon that is not adjacent to another pentagon will have an average arity of exactly 6.
Robertson et al. showed [26] that with a handful of similar rules, this property can be extended to all pentagons and hexagons: unless they are part of a reducible clump of pentagons and hexagons, their average arity is exactly 6. In other terms, the "missing side" of each pentagon is exactly borrowed from the heptagons, octagons, and polygons of higher arity in its second neighborhood. Since each rule transfers only a fraction of a side, the second neighborhood of, say, an octagon with an average arity of less than 6 must contain many pentagons, which increases the likelihood that it will contain a reducible configuration.

Unlike a weighted arithmetic average, the average arity obtained by discharging might not depend on the arities of all the faces in the neighborhood, so it may not be possible to enumerate the arities of all the faces in a neighborhood with an average arity less than 6. This is not a problem in practice, since the enumeration can be stopped as soon as a reducible configuration occurs, and the set of discharge rules can be adjusted so that this is always the case. Discharging also gives bounds on the size of neighborhoods that need to be enumerated. The rules proposed by Robertson et al. never transfer more than half a side between two faces, so there is no need to enumerate the neighborhoods of dodecagons and larger polygons, since their average arity is never less than 6.

Although Heesch had correctly devised the plan of the proof of the Four Colour Theorem, he was unable to actually carry it out because he missed a crucial element: computing power. The discharge rules he tried gave him a set R containing configurations with a ring of size 18, for which checking reducibility was beyond the reach of computers at the time. However, there was hope, since both the set of discharge rules and the set R could be adjusted arbitrarily in order to make every step succeed. Appel and Haken cracked the problem in 1976 by focusing their efforts on adjusting the discharge rules rather than extending R, using a heuristic due to Heesch for predicting whether a configuration would be reducible (with 90% accuracy), without performing a full check. By trial and error they arrived at a set R, containing only configurations of ring size at most 14, for which they barely had enough computing resources to do the reducibility computation. Indeed the discharging formula had to be complicated in many ways in order to work around computational limits. In particular the formula had to transfer arity between non-adjacent faces, and to accommodate this extension unavoidability had to be checked manually. It was only with the 1994 proof by Robertson et al. that the simple discharging formula that Heesch had sought was found.

We took the work of Robertson et al. as our starting point, reusing their optimized catalog of 633 reducible configurations, their cleverly crafted set of 32 discharge rules, and their branch-and-bound enumeration of second neighborhoods [27].
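To summarize the discharging arithmetic used in steps b) and 1) above (our notation, matching the formula for δ*(a)): the transfers cancel globally, so the sum-of-averages constraint of step 1) holds by construction,
\[ \sum_a \delta^*(a) \;=\; \sum_a \delta(a) \;+\; \sum_{a,b} T_{ba} \;-\; \sum_{a,b} T_{ab} \;=\; \sum_a \delta(a) \;=\; 2E , \]
and since no rule of Robertson et al. sends more than half a side from a face to a neighbour, a face \(a\) with \(\delta(a) \ge 12\) satisfies
\[ \delta^*(a) \;\ge\; \delta(a) - \tfrac{1}{2}\,\delta(a) \;=\; \tfrac{\delta(a)}{2} \;\ge\; 6 , \]
which is why only faces of arity at most 11 need to be considered in the enumeration.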
However, we had to devise new algorithms and data structures for performing reducibility checks on configurations and for matching them in second neighborhoods, as the C integer programming coding tricks they used could not be efficiently replicated in the context of a theorem prover, which only allows pure, side effect free data structures (e.g., no arrays). And performance was an issue: the version of the Coq system we used needed three days to check our proof, whereas Robertson et al. only needed three hours… ten years ago! (Future releases of Coq should cut our time back to a few hours, however.) We compensated in part for this performance lag by using more sophisticated algorithms: multiway decision diagrams (MDDs) [1,12] for the reducibility computation, concrete construction programs for representing configurations, and tree walks over a circular zipper [21] to do configuration matching. This sophistication was in part possible because it was backed by formal verification; we didn't have to "dumb down" computations or recheck their outcome to facilitate reviewing the algorithms, as Robertson et al. did for their C programs [27].

Even with the added sophistication, the program verification part was the easiest, most straightforward part of this project. It turned out to be much more difficult to find effective ways of stating and proving "obvious" geometrical properties of planar maps. The approach that succeeded was to turn as many mathematical problems as possible into program verification problems.

4 Prover technology

The scripts of this proof were developed and checked using the Coq v7.3.1 system [13], under the Proof General front end [7]. This combination of tools provided a very flexible basis for trying out different approaches to the problem. In particular, as the proof advanced, we were able to extend the Coq scripting language to directly support the more effective approach that emerged. These extensions enabled progress and ultimately completion of the project, but unfortunately tie down the scripts to the 7.3.1 version of Coq; the user-level syntax extension facilities of the more recent releases of the system, although easier to use, do not support our extensions (and finishing the proof was more of a priority than porting it).

However, the most important feature of the Coq system for this proof was its logic, not its syntax. Specifically, the entire enterprise hinged on the fact that the Coq logical language includes a small but practical programming language, and that the computation of programs embedded in statements is an integral part of the Coq logic. In the Coq logic, the two statements P(2 + 2) and P(4) are not just equivalent, they are identical, meaning that the two statements can be used interchangeably at any point in a formal demonstration. As a consequence, one can use any proof of P(4) as a proof of P(2+2) — e.g., if P(x) is defined as 4 = x, the instance of the reflexivity of = that proves 4 = 4 also proves directly 4 = 2 + 2. This feature is called computational reflection.

From the trivial 2+2 example above one may get the impression that computational reflection is neither sound nor useful:
A. The correctness of a demonstration using computational reflection depends on the irks and quirks of an arcane programming language.
B. If all the feature can do is prove 2 + 2 = 4, there's not much point to it.

Both impressions are utterly wrong. As regards impression A, correctness, the embedded programming language has been carefully trimmed of all the traps and trappings commonly found in general purpose programming languages, to ensure that every expression and statement has a definite, context-independent meaning.
The list of features excluded by this principle is long: there are no side effects, since they would make the outcome of a computation depend on its timing; there are no floating point or primitive integers, since their rounding/overflow behaviour is machine dependent; consequently there are no constant-time access arrays; there are no exceptions, virtual methods, unbounded loops, … In fact the most complex feature of the language to implement is the ability to call functions with arguments. Since definitions with parameters are the main building block for mathematical theories, and thus any practical formal proof system must include support for replacing parameters with values, admitting computational reflection requires no great leap of faith. On the contrary, the theory of pure functions with arguments, called the λ-calculus, has been studied for over 75 years. The logical metatheory of Coq, which was established almost 20 years ago, is in fact directly based on the λ-calculus: in Coq, the consistency of the logic follows from the consistency of computations [13,2,8].

Of course, the fact that the embedded language is so spartan rather strengthens impression B: it seems difficult to write useful code in such a restricted language, despite the well-known fact that in theory, any computation can be described in the pure λ-calculus. We believe our work provides ample evidence that this is definitely not the case, and that on the contrary computational reflection is a very effective proof technique, both in the large and in the small:
• In the large, we use computational reflection to implement major parts of the Four Colour Theorem proof, with programs of an unprecedented size and complexity. Most of the previous applications of computational reflection used only relatively modest implementations of formal rewrite systems [11].
• In the small, we use computational reflection to robustly automate many "small steps" of our proof, so we have reasonably short proofs without having to resort to the fickle domain-specific "decision procedures" traditionally used to handle these simple steps.

The flagship application of computational reflection "in the large" occurs in the reducibility part of the proof of the Four Colour Theorem. We define a common prologue, in file cfreducible.v, that includes the following:

Variable cf : config.
Definition check_reducible : bool := …
Definition cfreducible : Prop := …
Lemma check_reducible_valid : check_reducible -> cfreducible.

The variable cf is a parameter of the definitions and lemmas that follow. It stands for the explicit data representation of a configuration map, basically a string of letters and numbers. We define elsewhere a standard interpretation of that string as an explicit program for constructing a mathematical hypermap object (see section 5.3 for an explanation of that construction). The two definitions that follow are of a very different nature; the first is a program, while the second is a logical predicate:
• check_reducible is a boolean expression that performs a complex combinatorial reducibility check, using a nonstandard interpretation of the map construction program cf, and the MDD computations mentioned in Section 3.
• cfreducible is a mathematical proposition that asserts that the hypermap constructed by the standard interpretation of cf is indeed reducible.
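The overall shape of this checker-plus-correctness-lemma pattern can be seen on a toy example. The sketch below is ours, written in modern Coq syntax rather than the v7 syntax of the scripts, and all the names (evenb, even_spec, evenb_valid, even_1000) are invented for the illustration.

Require Import Lia.

(* The "program": a boolean checker, executable inside the logic. *)
Fixpoint evenb (n : nat) : bool :=
  match n with
  | 0 => true
  | 1 => false
  | S (S n') => evenb n'
  end.

(* The "specification": a logical predicate. *)
Definition even_spec (n : nat) : Prop := exists k, n = 2 * k.

(* Partial correctness: if the checker returns true, the spec holds. *)
Lemma evenb_valid : forall n, evenb n = true -> even_spec n.
Proof.
  unfold even_spec.
  fix IH 1.
  intros n; destruct n as [|[|n']]; simpl; intros H.
  - exists 0; reflexivity.
  - discriminate H.
  - destruct (IH n' H) as [k Hk]. exists (S k). lia.
Qed.

(* Computational reflection at work: a concrete instance is obtained by
   running the checker (reflexivity) and applying the correctness lemma. *)
Lemma even_1000 : even_spec 1000.
Proof. apply evenb_valid. reflexivity. Qed.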
Using these definitions of check_reducible and cfreducible, we then prove the check_reducible_valid lemma, which asserts the partial correctness of the check_reducible program with respect to the mathematical specification cfreducible: if check_reducible returns true, then cfreducible holds. Since cf is a parameter of this proof, the result applies for any value of cf. Using this, we can prove for example

Lemma cfred232 : (cfreducible (Config 11 33 37 H 2 H 13 Y 5 H 10 H 1 H 1 Y 3 H 11 Y 4 H 9 H 1 Y 3 H 9 Y 6 Y 1 Y 1 Y 3 Y 1 Y Y 1 Y)).

in just two logical steps, by applying check_reducible_valid to the concrete configuration above and the trivial proof of true = true, even though the configuration map represented by (Config 11 33 …) has a ring size of 14 and a longhand demonstration would need to go over 20 million cases. Of course the complexity does not disappear altogether — Coq 7.3.1 needs an hour to check the validity of this "trivial" proof.

The fact that evaluation is transparent in Coq's logic has another important consequence for us: it allows us to freely use multiple levels of intermediate definitions to derive specific properties from general ones in a modular way. Because such definitions are automatically expanded and specialized by evaluation, we can directly use the lemmas associated with the generic properties on the derived properties, without having to manually unfold the property definitions or provide new specialized lemmas. For example, when in the scripts above we use the boolean value check_reducible as a logical statement, Coq silently (see footnote 6) interprets this as (is_true check_reducible), where (is_true b) is defined as b = true — this is why we could directly use reflexivity in the proof of cfred232. We use this "trick" extensively in our proofs, because it allows us to handle a wide range of concepts using a relatively modest set of basic lemmas — we never needed to resort to the lemma search command of Coq.

Footnote 6: is_true is declared as a coercion from bool to sorts.

For example, we model paths with a boolean function recursively defined over sequences:

Variable e : (rel d).
Fixpoint path [x : d; p : (seq d)] : bool :=
  if p is (Adds y p') then (andb (e x y) (path y p')) else true.

This allows us to directly use the standard lemmas for the boolean "and" function andb to reason about (path x (Adds y p)), (path x (Adds y (Adds z q))), etc. Using boolean functions in lieu of logical predicates has other benefits: we can use rewriting of boolean expressions to apply an equivalence deep inside a complex statement, e.g., to "push" the path predicate over a sequence concatenation, using

Lemma path_cat : (x : d; p1, p2 : (seq d)) (path x (cat p1 p2)) = (andb (path x p1) (path (last x p1) p2)).

Booleans also make it easy to reason by contradiction, even in Coq's intuitionistic logic. In particular, if a boolean assumption computes or rewrites to false, then the standard tactic Discriminate can be used to conclude the proof. This turned out to be so common that most of our proof tactics attempt to use Discriminate automatically, and this is so effective that almost all our case splits leave at most two nontrivial subgoals. Thus we can make the script layout reflect the structure of the proof just by indenting the script for the first subgoal.
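As an aside, the path example above can be replayed in a self-contained way in modern Coq syntax; the transcription below is ours, uses standard lists instead of the proof's seq type, and its lemma mirrors path_cat.

Require Import List Bool.

Section Paths.
Variable d : Type.
Variable e : d -> d -> bool.   (* the edge relation, as a boolean function *)

Fixpoint path (x : d) (p : list d) : bool :=
  match p with
  | y :: p' => andb (e x y) (path y p')
  | nil => true
  end.

Fixpoint last (x : d) (p : list d) : d :=
  match p with
  | y :: p' => last y p'
  | nil => x
  end.

(* "Pushing" path over a concatenation, as a rewritable boolean equation. *)
Lemma path_cat (x : d) (p1 p2 : list d) :
  path x (p1 ++ p2) = andb (path x p1) (path (last x p1) p2).
Proof.
  revert x; induction p1 as [|y p1 IH]; simpl; intros x.
  - reflexivity.
  - rewrite IH, andb_assoc. reflexivity.
Qed.

End Paths.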
Because we employed mostly computable definitions, we were able to rely mainly on a "generate-and-test" style of proof. Most of our proofs follow this routine:
• Do a case split on relevant variables or expressions, based on their type.
• Compute out the proof goal, possibly along with some assumptions.

Quite often this was enough to complete the (sub)proof; most of the remaining cases could be resolved just by nudging the computation with selective rewriting (e.g., using the path_cat lemma above). Explicit logical steps came only a distant third (7000 tactic calls, compared to 21,000 stepping/case splitting/rewriting tactic calls); about two thirds of them were backward inference (explicitly forcing a proof schema, spinning off new proof obligations), a third were forward inference (steps that explicitly add new facts). Although there are only 1500 of them, declarative forward steps explicitly inserting a subgoal were very important because they helped structure the longer proof scripts.

We found this approach to machine-assisted theorem proving quite effective, enough so that we made very little use of the (admittedly modest) proof automation facilities of the Coq system. This approach provides a very natural way of harnessing the computational resources of the proof assistant: we write and have Coq execute a set of simple, deterministic programs that compute with concrete (but partial) data. This compares rather favorably with trying to control the assistant with tactics, which are metaprograms manipulating terms, or with a set of nondeterministic rewrite rules. Of course this conclusion is relative to the problem at hand — there is not much symbolic algebra in the proof of the Four Colour Theorem, and even so there were many subproofs that would have benefited from better rewriting support. Despite these caveats, it remains that our simple-minded approach succeeded on a challenging problem.

We should point out that the liberal use of intermediate definitions (over 1000) also greatly helped in keeping the development tractable: there are only 2500 lemmas in the entire proof, more than half of which are trivial rewrite equations for integers and list operations. Unfortunately, combining definitions also makes much of the Coq automation ineffective: unless definitions are unfolded just at the right level, the primitive Apply, Auto, AutoRewrite and most of the more elaborate tactics will fail to match the goal; user guidance is almost always needed. Moreover it is unclear that more aggressive unfolding in automated tactics would help, since the size of the goal increases geometrically when several layers of definitions are unfolded.

The size of the individual lemma proofs varies widely: over half of them are one-liners, 75% are under 5 lines, 90% under 10 lines, but at the other end 40 are longer than a screen page, up to almost 700 lines for the proof of correctness of the compilation of configuration reducibility (lemma cfctr_correct). We have such large proofs because we deliberately avoid the clutter of ad hoc lemmas that often plague formal developments; if an obscure result is only needed as an intermediate step of a higher-level lemma, we use forward reasoning to introduce it and prove it at the point at which it is relevant. This often saves a significant amount of bookkeeping, since we do not need to explicitly establish and introduce the context in which the result holds. There is a tradeoff, however: having longer, more significant lemmas greatly helps bookkeeping at the global proof level, but can create bookkeeping problems for individual proofs.
Indeed, the proof context (the set of variables, definitions, and assumptions that can be referenced) can easily become uncomfortably large even in a 10-line script if it is not managed. We found it difficult to do systematic context management using only the standard set of Coq tactics. For example, it takes two separate commands to discharge an assumption H (moving H from the context to the goal G, so that following case, rewrite or reverse chaining commands will operate on H→G). Although this may appear to be a minor annoyance, it makes it very difficult to maintain consistency in a script that explicitly moves assumptions to and from the context — but that is largely what context management consists of. It also didn't help that several useful variants of this operation required rather different, sometimes arcane sequences of commands.

This prompted us to develop our own tactic "shell", so that we could make better progress with the project. Fortunately, the Coq v6-v7 systems included a rather powerful, if somewhat unwieldy, user syntax mechanism that was sufficient for our purposes (unfortunately, it has been replaced by an easier to use but less expressive notation facility in recent releases of the Coq system). Although we only put in features when they occurred repeatedly in our scripts, we ended up with a complete and rich language. We should point out that such extensions have no impact on the trustworthiness of the proof that was carried out, for two reasons:
• There is a small, well-defined kernel of the system that checks the logical validity of proofs carried out with the Coq assistant. The bulk of the Coq code base, and in particular the entire tactic and input syntax subsystems, can only prepare candidate proofs and submit them to the kernel for validation. For additional safety, Coq actually stores the proofs that have been approved, so that they can be independently rechecked.
• Our extensions only involved modifying the tables of the Coq extensible parser. We did not add any ML code to the system, so our extensions cannot possibly interfere with the kernel code through side effects, language-dependent behaviour, or unsafe code.

Our design follows the philosophy of a command shell: we offer a small number of commands, but each of them can perform a variety of related functions, according to its arguments; all commands have identical or at least very similar argument conventions. Nearly 85% of the commands in our scripts are one of the following five (see footnote 7):

• Move: t1 t2 t3 … => p1 p2 p3 …
This is the basic bookkeeping command, for moving assumptions and values from the context to the goal and conversely. It combines the functions of the primitive Coq commands Generalize and Intros, mixed with Clear and Pattern. The arguments t1 t2 t3 are moved to the goal; if ti is a value rather than a logical assumption, "moving ti to the goal" implies first generalizing ti, that is, replacing ti with a new variable xi. After the =>, p1 p2 p3 provide names for the assumptions or values that are moved from the goal to the context; p1 p2 p3 can actually be Coq "intro patterns", allowing tuples, lists, and conjunctions to be decomposed on the fly. The Move command has several other options for deleting additional assumptions, selecting the exact occurrences to generalize, and performing on-the-fly simplification.

• Case: t1 t2 t3 … => [p11 | p12 | …] p2 p3 …
The Case command is a version of Move specialized for the analysis of data structures: the first pattern to the right of => must be a destructuring "intro pattern" [13]. The Case command provides additional facilities for dealing with Coq's dependent datatypes, but is otherwise interchangeable with the Move command.

• Apply: P… => [p11 | p12 | …] …
This is the basic reverse chaining command; it is similar to Coq's primitive Apply, with two important differences: it is insensitive to intermediate definitions, and the proof method P can contain holes "?" that will be filled by matching the current proof goal. The other command arguments are similar to those of the Case command, except that the patterns p11, p12 are used on the first, second, etc. subgoal, respectively.

• Step p: t By …
This is the forward chaining command, similar to Coq's primitive Assert, except that it can be used to introduce several values (p can be a pattern), and the By introduces a tactic, not a term. The By can be omitted, in which case the command generates a subgoal for the justification of t (only 25% of the Step commands have a By in the four colour proof script).

• Rewrite: t1 t2 t3...
This command is the main workhorse in our scripts; it is called nearly 10,000 times, almost as many times as Move and Case combined. It merges the functionality of most of Coq's rewriting/computing commands (Rewrite, Unfold, Fold, and Simpl) with a coherent, unified notation, also adding important functionality such as selection of the rewrite occurrences. Most importantly, however, it applies a sequence of rewrite rules rather than a single rule; this feature alone about halves the size of our scripts!

Footnote 7: We reuse the keywords of some of the primitive Coq commands, but ensure backward compatibility by always including a symbol (one of : / =>) in the list of arguments.

These commands and a handful of related ones (Elim, Clear, Injection, Congr, and Def) are defined and extensively documented in the prelude file tacticext.v that is loaded by every file in the development. The Move, Case and Apply commands also have a "view" feature that provides a tie-in for the reflect inductive predicate defined in the boolean prelude file boolprop.v. Briefly, (reflect P b) states that P and b are equivalent (P ↔ b = true). We prove such properties for all boolean predicates that have a useful logical counterpart; for a predicate foob, the corresponding lemma is fooP. For example, in boolprop.v we prove (see footnote 8)

Lemma andP : (reflect b1 /\ b2 (andb b1 b2)).

Then we can write Move/andP=> [Hb1 Hb2] to decompose an assumption of the form (andb b1 b2) as if it were b1 /\ b2, that is, introducing assumptions Hb1 : b1 and Hb2 : b2. Conversely, we can use the sequence Apply/andP; Split to split a proof of (andb b1 b2) into two subproofs of b1 and b2, respectively.

Footnote 8: The actual lemma is called andPx; andP is the corresponding syntactic form with holes "?" for b1 and b2.

Since we adopted a "command shell" design, our scripts do not read as mathematical text. However, given our proof style based on computation and expansion, we don't think they possibly could, as we tend to get many large, unwieldy subgoals. The important thing for us was that this language allowed us to tackle effectively complex proofs, even when we didn't have a detailed text proof to guide us.
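The reflect predicate and the andP view can be replayed in a few lines of modern, self-contained Coq; the sketch below is ours (the ssreflect library that grew out of this work provides polished versions of both), and the lemma names are chosen for the illustration.

(* reflect P b states that the proposition P and the boolean b agree. *)
Inductive reflect (P : Prop) : bool -> Prop :=
  | ReflectT : P -> reflect P true
  | ReflectF : ~ P -> reflect P false.

(* The view lemma relating boolean conjunction to logical conjunction. *)
Lemma andP (b1 b2 : bool) :
  reflect (b1 = true /\ b2 = true) (andb b1 b2).
Proof.
  destruct b1, b2; simpl;
    [ apply ReflectT; split; reflexivity
    | apply ReflectF; intros [_ H]; discriminate H
    | apply ReflectF; intros [H _]; discriminate H
    | apply ReflectF; intros [H _]; discriminate H ].
Qed.

(* Using the view: case analysis on (andP b1 b2) turns a boolean fact
   into an ordinary conjunction, as Move/andP does in the scripts. *)
Lemma andb_true_left (b1 b2 : bool) : andb b1 b2 = true -> b1 = true.
Proof.
  destruct (andP b1 b2) as [[H1 H2] | Hn]; intros E.
  - exact H1.
  - discriminate E.
Qed.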
We can present this piece of anecdotal evidence as to the efficiency of our command language: it turned out we needed a variant of Streicher's axiom K [19], which states that there is only one proof of x = x, for our dataSet structures. Axiom K is derivable in this case, because equality is decidable by definition on dataSet; in fact this derivation is in the Coq standard library (in Logic/Eqdep_dec.v). We tried rolling out our own proof, and came up with a four-line, 50-word proof script; the standard library proof has over 100 non-comment source lines, and 500 words, so we can claim an order-of-magnitude productivity improvement on this particular example.

5 The computer-checked proof

The exercise of formalizing a famous theorem can easily degenerate into the following (somewhat depressing) routine: translate a nice, crisp, mathematical text into computer lingo, at the rate of xx days per page, balk at the long list of trivial mathematical prerequisites that ought to be known to the proof system (but aren't), and write a large ad hoc collection of lemmas to fill the deficiencies. However, our experience with the Four Colour Theorem wasn't like this. Because we were faced with rather different implementation and performance constraints, we had to come up with our own programs for checking reducibility and unavoidability (although we did reuse as much of the Robertson et al. proof as we could). Because most graph theory proofs rely on the visual analysis faculties of the reader, and a proof assistant like Coq is fully devoid of such faculties, we mostly had to come up with our own proofs (sometimes using visual intuition as guidance, but not always). In addition, as we noted in section 2, we were working with a different, more combinatorial notion of map; while this notion was generally more cumbersome than the more familiar topological one, on several occasions its additional precision provided a clean argument where the "intuitive" situation seemed muddled at best. In particular we found we could do without the notion of "ring completion" and the "omitted straightforward but lengthy proof" of the "folklore theorem (3.3)" in the Robertson et al. paper [26]. Indeed, the biggest payoff was that we could weaken the definition of "appears", which involves graph isomorphism, into one that involves only morphism, thereby simplifying considerably the configuration occurrence check and its correctness proof.

The remainder of this section gives an account of the mathematics and the algorithms involved in the actual computer proof: the theory of planar hypermaps, the reducibility computation, the construction programs for configuration maps and their interpretations, the Birkhoff lemma and the embedding of configurations in a minimal counter example, the enumeration of second neighborhoods, and the discretization of the continuous Four Colour problem to the hypermap one.

5.1 Hypermaps

Having ruled out topological definitions of planar maps, we needed a combinatorial replacement for them in order to state (and hopefully prove) the Four Colour Theorem. That structure would have to
a) explicitly represent all local geometrical connections,
b) support a clearly recognizable definition of "planar", and
c) be easy to manipulate in Coq.

This combination of requirements ruled out all the structures that had been used in previous formal developments of graph theory: e.g., simplicial representations, which have separate sorts for edges, faces, and nodes, are too complex to meet c), and graphs verifying the Kuratowski minor exclusion [23] fail both a) and c).
Rather than combing the literature in search of an appropriate definition, we tried to roll out our own from first principles, and ended up rediscovering hypermaps, which are well-known combinatorial structures [15,31,32,33,29,16]. We reasoned as follows:

1. Since an essential part of our approach was to define properties by computable programs, we wanted a representation on which graph traversal would be easy to program. A natural starting point was therefore the basic data structure for directed graphs, an array G of lists of integers, where G[i] lists the immediate neighbors of node i (nodes are represented by integer indices).

2. Each cell in these lists corresponds to an edge in the directed graph. Additional edge attributes, such as labels, are often added to the cells; in particular, for symmetric graphs, it is common to add reciprocal pointers between the two cells that correspond to the same undirected edge in the graph. More precisely, adding an edge e = {i,j} to a symmetric graph, that is, adding j to G[i] and i to G[j], means inserting a cell cj in G[i] and a cell ci in G[j], and putting extra pointers cj → ci and ci → cj in ci and cj, respectively. In this way, the edge e is represented by the linked pair ci ↔ cj: each of the cells ci, cj represents a "half-edge", called a dart, in the hypermap terminology (some authors use the term flag).

3. The structure can encode geometrical information at no extra cost, simply by adopting the convention that G[i] lists the darts of node i in geometrical order (say, counterclockwise). Since this is a cyclic order, it is then natural to make the cell lists circular, by having the last cell point to the first. This simplifies considerably geometrical traversals: to go from any dart to the next dart on the same face, clockwise, simply follow, successively, the "edge reciprocal" and "node list" pointers. Moreover, each non-isolated node of the graph is now represented by its circular list, just as the edges are represented by the reciprocal dart pairs. This allows us to further simplify the structure by removing the G array and the integer indices altogether, as the "edge" pointer of a cell cj always points into the same list as G[j]. We assume that we also have some way of iterating over all cells.

4. The resulting structure is easy to represent in Coq: there is only one type of objects, d (for darts), whose only properties are that it has an equality test and is finite. Pointer attributes like the "edge" and "node" pointers above are just functions e, n : d → d, and in general any additional attribute (e.g., "colour") can be represented by a function on d. Here n and e must be, respectively, a permutation and an involution on d. Weakening the requirement for e to just being a permutation, that is, allowing "edges" to have more than two sides, yields the traditional definition of hypermaps: a pair of permutations of a finite set. Hypermaps have been extensively used in the study of graph enumeration [16].

5. However, even this simplified definition turned out to be impractical in Coq, because the "being a permutation" property is moderately difficult to prove and use: both the "there exists an inverse" and the "injective" characterizations require deductive steps to use; in our approach, this is five to ten times more expensive than an equational characterization, because a single Rewrite: command can perform five to ten rewrite steps.
For this reason, we extended the structure with a third dart attribute, a pointer to the next counterclockwise dart on the same face, represented by a function f : d → d. The navigation trick pointed out in 3. above turns into an equation

  e ◦ n ◦ f = Id

Because d is finite, this identity characterizes hypermaps; it implies that e, n, and f are all permutations, and that we also have n ◦ f ◦ e = Id and f ◦ e ◦ n = Id. As Tutte pointed out [31], this representation is circularly symmetrical with respect to edges, nodes, and faces.

6. Although their name suggests drawing "darts" as arrows on the original map, this leads to horrendously confused figures when one adds arrows for the e, n, and f functions. In the figures below, we therefore always depict darts as points (we use small hexagonal bullets). When we draw an ordinary map together with the corresponding hypermap, we arrange the darts in circles around the corresponding nodes, halfway between adjacent edges. This way, each node appears at the center of the corresponding n cycle, each f cycle is inset in the corresponding face, and each e cycle is the diagonal of an n-f quadrilateral centered on the corresponding edge, as in the figure below.

[Figure: an ordinary map and the corresponding hypermap, with darts drawn as points; legend: dart, node, edge, map.]

However, we rarely find it necessary to draw such complex figures; most of the time, we only need to consider the diagram around a single dart, which has the following fixed, regular shape:

[Figure: the fixed local diagram around a single dart, with its e, n, and f successors.]

The hypermaps that correspond to plain maps have the property that all e cycles have length 2; this implies the additional identities e = e⁻¹ = n ◦ f. We call such hypermaps plain hypermaps. Although the central part of the proof of the Four Colour Theorem uses only plain hypermaps, most of the basic results on hypermaps are easier to establish for general hypermaps: many of the constructions described in this section create general hypermaps.

Although we primarily selected the "one domain, three functions, one equation" definition of hypermaps for its simplicity, its threefold symmetry allows significant reuse of definitions and lemmas. For example, the Euler formula takes a completely symmetrical form for this presentation of hypermaps:

  #e + #n + #f = #d + 2#(e∪n∪f)

where #e, #n and #f denote the number of distinct cycles of e, n, and f, respectively, #d the total number of darts, and #(e∪n∪f) denotes the number of connected components of the relation obtained by taking the union of the graphs of all three functions. Because of this symmetry, the graph/map duality is completely trivial in our setting, so there is no point for us in switching from map colouring to graph colouring, as is traditionally done; all the lemmas in our development are therefore phrased in terms of map colouring.

We actually chose to define planar hypermaps as those that satisfy this generalized Euler formula, since this property is readily computable. Much of the Four Colour Theorem proof can be carried out using only this formula. In particular Benjamin Werner, who worked with us on the early part of the proof, found out that the proof of correctness of the reducibility part (step I) was most naturally carried out using the Euler formula only. As the other part of the proof, unavoidability (step II), is explicitly based on the Euler formula, one could be misled into thinking that the whole theorem is a direct consequence of the Euler formula. This is not the case, however, because unavoidability also depends on the Birkhoff theorem giving the shape of second neighborhoods.
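To make the "one domain, three functions, one equation" presentation concrete, here is a minimal Coq sketch of the data involved; it is ours, not the proof's actual definition, which additionally requires the dart type to be a finite dataSet with a decidable equality (that finiteness is what makes the Euler formula computable and forces e, n, f to be permutations).

Record hypermap_sketch : Type := Hypermap {
  dart : Type;            (* the single domain of darts *)
  e : dart -> dart;       (* edge successor *)
  n : dart -> dart;       (* node successor *)
  f : dart -> dart;       (* face successor *)
  (* the one equation: e ∘ n ∘ f = Id; on a finite dart type this also
     yields n ∘ f ∘ e = Id and f ∘ e ∘ n = Id *)
  cancel3 : forall x : dart, e (n (f x)) = x
}.

(* Plain hypermaps: every e-cycle has length 2, i.e. e is an involution;
   then e = e⁻¹ = n ∘ f. *)
Definition plain (h : hypermap_sketch) : Prop :=
  forall x : dart h, e h (e h x) = x.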
Part of the proof of the Birkhoff theorem requires cutting out the submap inside an arbitrary simple ring of 2 to 5 faces in the (hypothesized) counterexample map. Identifying the inside of a ring is exactly what the Jordan Curve Theorem is about, so this calls for a combinatorial analog of that famous theorem. As a side benefit, we will get additional assurance that our definition of planarity is correct by proving that the hypermap Jordan property is actually equivalent to the hypermap Euler formula (about half of that equivalence proof is a lemma required for the correctness of the reducibility check). Unfortunately, the naïve transposition of the Jordan Curve Theorem from the continuous plane to discrete maps fails. Simply removing a ring from a hypermap, even a connected one, can leave behind any number of components: both the “inside” and the “outside” may turn out to be empty or disconnected. A possible solution, proposed by Stahl [29], is to consider paths (called chords below) that go from one face of the ring to another (loops are allowed). The Jordan Curve Theorem then tells us that such paths cannot start from the “inner half” of a ring face and end at the “outer half” of a ring face⁹, i.e., there are no “Moebius rings” such as the one below. Given careful definitions of the “inner” and “outer” halves of ring faces, this can be made to work (indeed, we have done this in file rjordan.v, as an addendum to the proof), but the resulting definitions are too unwieldy and complex to be practical for machine-checked inductive proofs. Fortunately, a much simpler and slightly stronger version of the Jordan property can be defined directly on the hypermap structure, by considering contours of the hypermap three-function graph, rather than rings in an underlying map. The simplification comes from the fixed local structure of hypermaps, which allows “inner” and “outer” to be defined locally, provided we restrict ourselves to a certain pattern of traversal. Specifically, we exclude one of the three functions (the e permutation), and pick fixed, opposite directions of travel on the other two: from a dart x we can go to either n⁻¹(x) or f(x). We define contour paths as dart paths that only take steps in this n⁻¹ ∪ f relation. A contour cycle follows the inside border of a face ring, clockwise, listing explicitly all the darts in this border. Note that n or n⁻¹ steps from a contour cycle always go inside the contour, while f or f⁻¹ steps always go outside. Therefore the Jordan property for hypermap contours is: “no contour path that starts and ends on a duplicate-free contour cycle, without otherwise intersecting it, can start with an n⁻¹ step and end with an f step”, or, more symmetrically, “any such path must start and end with the same type of step”. For example, the figure below is forbidden.
[Figure: a forbidden contour path, starting on a duplicate-free contour cycle with an n⁻¹ step and ending on it with an f step.]
This definition is much simpler than the ring one, not only because it avoids the delicate definition of the inner and outer “halves” of a face cycle, but also because “nonintersecting” (resp. “duplicate-free”) has its literal meaning here: not containing the same dart (resp. twice the same dart), whereas for rings we also need to exclude darts in the same f cycle.
⁹ More precisely, this follows from the Jordan separation theorem; the non-separation part of the theorem, which implies that each edge borders at most two regions, is implicitly used to turn the problem of colouring the continuous plane into one of colouring a discrete data structure.
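To make the contour-step relation concrete, it could be phrased in Coq as a two-case predicate. The sketch below uses made-up names and a propositional relation, whereas the actual development works with boolean relations over its finite dart type.

  (* Illustrative sketch: one contour step from x goes either to n⁻¹(x)
     (stated as "y is a dart whose n-image is x") or to f(x). *)
  Section ContourSketch.

  Variables (d : Type) (n f : d -> d).

  Inductive contour_step (x y : d) : Prop :=
    | cstep_rev_n : n y = x -> contour_step x y    (* an n⁻¹ step *)
    | cstep_f     : f x = y -> contour_step x y.   (* an f step   *)

  (* Contour paths take all their steps in contour_step; a contour cycle
     is a closed contour path. *)

  End ContourSketch.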
The literal reading of “nonintersecting” also implies that the contour-based property is slightly stronger than the ring property, since it applies to contour cycles that are not derived from simple rings (different darts from the same f cycle can occur in two different sections of a contour cycle). Such contours actually occur in the Four Colour Theorem proof, but only around explicitly matched configuration maps, for which we do not need the Jordan property. Although it is much simpler than the ring property, we found that the definition above was still too complex to carry out inductive proofs in Coq; it still involves three different values (the path, its starting point, and the cycle), along with five to six separate properties. However, we observed that by splicing together the cycle and the path, we could form an equivalent statement that only involves a single value and three properties. We therefore used the following:
Theorem (the Jordan Curve Theorem for hypermaps): A hypermap is planar iff it has no duplicate-free “Moebius contours” of the form
[Figure: a Moebius contour — a duplicate-free (n⁻¹ ∪ f)* path involving darts x, y, n(x), and n(y), with x ≠ y and n(x) ≠ n(y).]
The x ≠ y condition rules out contour cycles; note however that we do allow y = n(x). As far as we know this is a new combinatorial definition of planarity. Perhaps it has escaped attention because a crucial detail, reversing one of the permutations, is obscured for plain maps (where e⁻¹ = e), or when considering only the cycles of that permutation. Since, as we show in the Coq development, this Jordan property is equivalent to the Euler identity, it is symmetrical with respect to the choice of the two permutations that define “contours”, despite appearances (we use this in the reducibility proof). Oddly enough, we know of no simple direct proof of this fact (the best we can do is to derive it from the equivalence between the “ring” and “contour” versions of the Jordan property, which is hardly “simple”). We show that our Jordan property is equivalent to the Euler identity by induction on the number of darts. At each induction step we remove some dart z from the hypermap structure. In doing so, we must take care to redefine the permutations so that they avoid z. For two of them we can do this in the obvious way by suppressing z from the cycle in which it occurs: for example, we can define permutations n’ and f’ on the smaller hypermap by n’(n⁻¹(z)) = n(z), f’(f⁻¹(z)) = f(z), and n’(x) = n(x) and f’(x) = f(x) otherwise. For the third permutation, however, the triangular identity of hypermaps leaves us no choice, and we have to either merge two cycles, or split a cycle of that permutation. For example, if that third permutation is e, we do the following transformation on the portion of hypermap surrounding z:
[Figure: the Walkup_e transformation, removing dart z and reconnecting its e, n, and f neighbors.]
Following Stahl [29], we call this operation the Walkup transformation [34]. More precisely, the figure above illustrates the Walkup_e transformation; by symmetry, we also have Walkup_n and Walkup_f transformations. In general, the three transformations yield different hypermaps, and all three prove to be useful. However, in the degenerate case where z is a fixpoint of any one of the three permutations, all three transformations give the same result (z has at most 3 neighbors to connect); for example, if e(z) = z, the outcome is one of the two degenerate cases pictured below.
[Figure: the degenerate Walkup transformations at a dart z with e(z) = z.]
Of course we only need to define and study one of the Walkup transformations, as we then get the others (and their properties) for free by symmetry. We chose the Walkup_e transformation.
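The “suppress z from its cycle” redefinitions of n’ and f’ above have a one-line functional rendering. The following is an illustrative sketch only: the name skip and the explicit decidable equality are assumptions of the sketch, not the code of the development.

  (* Illustrative sketch: remove z from the cycles of a permutation g by
     sending the g-predecessor of z directly to g(z); on every other dart
     the new function agrees with g, as in the n', f' definitions above. *)
  Section WalkupSketch.

  Variable d : Type.
  Hypothesis eq_dart_dec : forall x y : d, {x = y} + {x <> y}.
  Variables (g : d -> d) (z : d).

  Definition skip (x : d) : d :=
    if eq_dart_dec (g x) z then g z else g x.

  End WalkupSketch.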
Apart from the removal of z, the Walkup_e transformation leaves the cycles of n and f unchanged; however, except in the degenerate cases above, it has a nontrivial effect on the cycles of e: if z and n(z) are on different e cycles, the Walkup_e transformation merges the two cycles; if z and n(z) are on the same e cycle, then the Walkup_e transformation splits this cycle.
[Figure: the merge and split cases of the Walkup_e transformation at z, involving the darts n(z) and e(z).]
The degenerate and merge forms of the Walkup transformation clearly leave the validity of the hypermap Euler equation
#e + #n + #f = #d + 2#(e∪n∪f)
unchanged (both sides decrease by 1 or 3). The split form only preserves the Euler equation if it disconnects the hypermap; otherwise, it increases the left hand side relative to the right hand side by 2. Since repeatedly applying the transformation eventually yields the empty hypermap, for which the Euler equation is trivially valid, and since no transformation ever decreases the left hand side relative to the right hand side, we immediately see that all hypermaps satisfy the inequality
#e + #n + #f ≤ #d + 2#(e∪n∪f)
Planar hypermaps are thus those that maximize the left hand side, so applying any of the Walkup transformations to a planar map always yields a planar map; in the split case, the resulting map is always disconnected (as is obvious from the figure above). Thus, to prove the implication Euler → Jordan we are free to apply any sequence of Walkup transformations to reduce a planar hypermap containing a Moebius contour to the map below, for which the Euler equality obviously fails. We use Walkup_e to eliminate darts outside the contour, as this leaves the n⁻¹ and f steps of the contour unchanged, and Walkup_f and Walkup_n to contract n⁻¹ and f steps on the contour, respectively. In the Jordan → Euler direction, we use only Walkup_e transformations, as they leave the contour steps mostly unchanged. We carefully select the removed dart to make sure that we are not in the split case: we use the Jordan property to show that any e cycle C that is closed under n (n(z)∈C for all z∈C) contains a fixpoint of either n or f. We named the latter statement the Euler tree lemma, because it is the hypermap analog of the following: if a planar connected graph has only one face, then it is a tree; this property is used implicitly in the standard “flooding” proof of the Euler formula. We also use all three transformations in the main body of the Four Colour Theorem proof. Since at this point we are restricting ourselves to plain maps (all e cycles have length 2), we always perform two Walkup transformations in succession; the first one always has the merge form, the second one is always degenerate, and the result is always a plain map. Each variant of this double Walkup transformation has a different geometric interpretation, and is used in a different part of the proof:
• The double Walkup_f transformation erases an edge in the map, merging the two adjoining faces. It is used in the main induction of the Four Colour Theorem proof, to replace an identified reducible configuration with a smaller submap.
• The double Walkup_e transformation concatenates two successive edges in the map; we only apply it at nodes that have only two incident edges, to remove edge subdivisions left over after erasing edges (the node degree condition ensures that the resulting map is plain).
• The double Walkup_n transformation contracts an edge in the map, merging its two endpoints together. It is used to prove the correctness of the reducibility check, by induction on the size of the remainder of the graph.
We use the above-cited Euler tree lemma from the Jordan → Euler proof to find an appropriate dart at which to apply the transformation, so as to avoid the split case of the Walkup transformation.
Contours play a central part in our formal development of the Four Colour Theorem proof, because they are the basis for a precise definition of the patch operation, which pastes two maps along a border ring to generate a larger map. This operation is the cornerstone of the proof, since it defines the three-way relation between a configuration submap, whether explicitly identified or computed using the Jordan property, the map in which it occurs, and the remainder of that map. Mathematical articles and books rarely define this operation precisely, probably on the grounds that it is “intuitively obvious”. We have found, however, that getting the details of this operation exactly right was critical to the successful proof of several key lemmas. Ultimately, this attention to detail paid off handsomely, as it allowed us to remove injectivity checks when matching configurations, and to omit the analysis of the structure of the second neighborhood altogether. A key observation is that, despite appearances, the patch operation is not symmetrical in its arguments. It does paste two hypermaps along a ring, but in a slightly asymmetrical manner, and requires different assumptions on the two maps:
• For one of the submaps, which we shall call the disk map, the ring is an e cycle, that is, a hyperedge. This cycle must be simple, i.e., no two darts on it can belong to the same face.
• For the other submap, the remainder map, the ring is an arbitrary n cycle.
The gluing operation itself consists in merging pairwise the darts on the e cycle of the disk map with those on the n cycle of the remainder map, reversed (the two cycles must have the same length). On the merged cycle, the e function is defined as in the remainder map and the n function is defined as in the disk map. The definition of the f function is then adjusted so as to satisfy the triangular identity, as illustrated below.
[Figure: the patch operation pasting the disk map and the remainder map into the full map; the contour cycle delineates the disk map.]
Let us point out that although the darts on the border rings were linked by the e and n permutations in the disk and remainder map, respectively, they are not directly connected in the full map. Indeed, they need not even form a simple ring in the full map. However, because the e cycle is simple in the disk map, it is a subcycle of a contour that delineates the entire disk map (indicated by the mixed dash in the figure above). This contour is preserved by the construction, which, because of this, is reversible: the disk map can be extracted, using the Jordan property, from this contour. This allows us to divide the proof load by two, using equational reasoning to show that geometrical properties in the full map are equivalent to the conjunction of similar properties in the disk and remainder maps:
• The full map is planar (resp. connected) iff both submaps are.
• The full map is plain iff the remainder map is plain, and the disk map is plain except for the border ring.
• The full map is cubic (all n cycles of length 3) iff the disk map is cubic, and the remainder map is cubic except for the border ring.
• If the full map is bridgeless, both submaps are; the converse only holds if the disk map ring is chordless (the only e links between the f cycles of the darts in the e cycle ring are those of the cycle itself).
• The full map is four colourable iff both submaps are four colourable with colourings that match on the rings (with a reversal).
All the theory presented so far applies to general hypermaps. However, for the rest of the proof of the Four Colour Theorem we need to specialize to plain (all e cycles of length 2) and cubic (all n cycles of length 3) maps: these properties are required both for the enumeration strategy of the unavoidability part and for the computation of the reducibility checks. To carry out this specialization we show that the task of colouring an arbitrary hypermap can be solved by colouring the hypermap obtained by covering every node and hyperedge with a new face. As depicted in the figure below, the construction of this new map is completely regular and straightforward; the new map has six times as many darts as the original one.
[Figure: the original hypermap; expansion of the central dart; expansion of the other darts.]
5.2 Reducibility
The bulk of the reducibility computation for a given configuration consists in iterating a formalized version of the Kempe chain argument, in order to compute a lower bound for the set of colourings that can be “fitted” to match a colouring of the configuration border, using colour swaps. The actual check consists in verifying that this lower bound contains all the colourings of the map obtained by erasing 1 to 4 specific edges in the complete configuration map; this smaller map is called the contract map of the configuration. This final check does not require a significant amount of computation (in many cases, the lower bound yields the set of all possible colourings, whence the check is trivial). Although the Kempe chain argument is usually presented for the dual graph colouring problem, we have found that the form that is actually used for the proof of the Four Colour Theorem is easier to justify for the map colouring problem. The argument uses an alternative formulation of the colouring problem, proposed by Tait [30] in 1880 (one year after Kempe’s original proof, along with a variant of Kempe’s proof that turned out to be equally wrong). Tait suggested replacing the face colouring problem with an edge colouring problem. Suppose we use the integers 0, 1, 2, and 3 to “colour” the map faces; given such a colouring k, we can compute another colouring, this time on the edges of the map: we colour an edge x that separates two faces a and b with the bitwise sum (a.k.a. exclusive or) k(a) ⊕ k(b) of the colours of a and b. This edge colouring has two obvious properties:
1) Since a and b are adjacent we have k(a) ≠ k(b), so k(x) takes only the values 1, 2, or 3.
2) The bitwise sum of k(y) for all edges y that are incident to any given node p is 0: expanding the definition of k(y), we see that the colour of every face incident to p occurs twice in the summation.
Conversely, any edge colouring that satisfies 1) and 2) can be turned into a face colouring, by choosing an arbitrary colour for one face a0 in each connected component of the map, then, for each edge x bounding a0, colouring the face across x with k(a0) ⊕ k(x), and so on (the formal proof of correctness involves the Jordan property). Tait further observed that for cubic maps, 2) is equivalent to the much simpler
3) For any node p, the colours of the three edges incident to p are a permutation of {1,2,3}.
He concluded that four colouring the faces of a