Even More Category Theory: The Elementary Topos

In More Category Theory: The Grothendieck Topos, we defined the Grothendieck topos as something like a generalization of the concept of sheaves on a topological space. In this post we generalize it even further into a concept so far-reaching it can even be used as a foundation for mathematics.

I. Definition of the Elementary Topos

We start by discussing the generalization of the universal constructions we defined in More Category Theory: The Grothendieck Topos, called limits and colimits.

Given categories \mathbf{J} and \mathbf{C}, we refer to a functor F: \mathbf{J}\rightarrow \mathbf{C} as a diagram in \mathbf{C} of type \mathbf{J}, and we refer to \mathbf{J} as an indexing category. We write the functor category of all diagrams in \mathbf{C} of type \mathbf{J} as \mathbf{C^{J}}.

Given a diagram F: \mathbf{J}\rightarrow \mathbf{C}, a cone to F is an object N of \mathbf{C} together with morphisms \psi_{X}: N\rightarrow F(X) indexed by the objects X of \mathbf{J} such that for every morphism f: X\rightarrow Y in \mathbf{J}, we have F(f)\circ \psi_{X}=\psi_{Y}.

A limit of a diagram F: \mathbf{J}\rightarrow \mathbf{C} is a cone (L, \varphi) to F such that for any other cone (N, \psi) to F there exists a unique morphism u: N\rightarrow L such that \varphi_{X}\circ u=\psi_{X} for all X in \mathbf{J}.

For example, when \mathbf{J} is a category with only two objects A and B and whose only morphisms are the identity morphisms on each of these objects, the limit of the diagram F: \mathbf{J}\rightarrow \mathbf{C} is just the product. Similarly, the pullback is the limit of the diagram F: \mathbf{J}\rightarrow \mathbf{C} when \mathbf{J} is the category with three objects A, B, and C, and the only morphisms aside from the identity morphisms are one morphism A\xrightarrow{f}C and another morphism B\xrightarrow{g}C. The terminal object is the limit of the diagram F: \mathbf{J}\rightarrow \mathbf{C} when \mathbf{J} is the empty category, and the equalizer is the limit of the diagram F: \mathbf{J}\rightarrow \mathbf{C} when \mathbf{J} is the category with two objects A and B and whose only morphisms aside from the identity morphisms are two morphisms A\xrightarrow{f}B and A\xrightarrow{g}B.
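These special cases can be made concrete in the category of finite sets. The following is a minimal sketch, assuming Python sets as objects and Python functions as morphisms; the names product, pullback, equalizer, and mediating are illustrative choices, not standard library functions.

```python
from itertools import product as cartesian

def product(A, B):
    """Limit of the discrete two-object diagram: all pairs (a, b)."""
    return {(a, b) for a, b in cartesian(A, B)}

def pullback(A, B, f, g):
    """Limit of A --f--> C <--g-- B: pairs that agree after mapping into C."""
    return {(a, b) for a, b in cartesian(A, B) if f(a) == g(b)}

def equalizer(A, f, g):
    """Limit of a parallel pair f, g: A -> B: elements on which f and g agree."""
    return {a for a in A if f(a) == g(a)}

def mediating(p, q):
    """The unique u: N -> A x B induced by a cone (p: N -> A, q: N -> B)."""
    return lambda n: (p(n), q(n))

# The limit of the empty diagram: a one-element set (the empty tuple).
TERMINAL = {()}

A = {0, 1, 2, 3}
B = {0, 1, 2}
f = lambda a: a % 2          # A -> {0, 1}
g = lambda b: b % 2          # B -> {0, 1}

P = pullback(A, B, f, g)     # pairs (a, b) with the same parity
E = equalizer(A, lambda a: a % 2, lambda a: a % 3)

# A cone over the pullback diagram with apex N = {0, 2}: p and q agree in C.
N = {0, 2}
p = lambda n: n              # N -> A
q = lambda n: n              # N -> B
u = mediating(p, q)          # lands inside P because f(p(n)) == g(q(n))
```

Composing u with the two projections of P returns p and q, which is the equation \varphi_{X}\circ u=\psi_{X} from the definition of a limit.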

A colimit is the dual concept to a limit, obtained by reversing the directions of all the morphisms in the definition. In the same way that the limit generalizes the concepts of product, pullback, terminal object, and equalizer, the colimit generalizes the concepts of coproduct, pushout, initial object, and coequalizer.

Next we discuss the concept of adjoint functors. Consider two categories \mathbf{C} and \mathbf{D}, and two functors F: \mathbf{C}\rightarrow \mathbf{D} and G: \mathbf{D}\rightarrow \mathbf{C}. We say that F is left adjoint to G, and that G is right adjoint to F, if for all objects C in \mathbf{C} and D in \mathbf{D} there exist bijections

\theta: \text{Hom}_{\mathbf{C}}(C, G(D))\xrightarrow{\sim}\text{Hom}_{\mathbf{D}}(F(C), D)

which are natural in the sense that given morphisms \alpha: C'\rightarrow C in \mathbf{C} and \xi: D\rightarrow D' in \mathbf{D}, we have, for every f: C\rightarrow G(D),

\theta(G(\xi)\circ f\circ \alpha)=\xi\circ \theta(f)\circ F(\alpha).

Suppose that products exist in \mathbf{C}. For a fixed object A of \mathbf{C}, consider the functor

A\times - : \mathbf{C}\rightarrow \mathbf{C}

which sends an object C of \mathbf{C} to the product A\times C in \mathbf{C}. If this functor has a right adjoint, we denote it by

(-)^{A}: \mathbf{C}\rightarrow \mathbf{C}.

We refer to the object A as an exponentiable object. We refer to the object B^{A} for some B in \mathbf{C} as an exponential object in \mathbf{C}. A category is called Cartesian closed if it has a terminal object and binary products, and if every object is an exponentiable object.

In the category \mathbf{Sets}, the exponential object B^{A} corresponds to the set of all functions from A to B. This also explains our notation for functor categories such as \mathbf{Sets^{C^{op}}} and \mathbf{C^{J}}.
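For finite sets the adjunction between A\times - and (-)^{A} is ordinary currying, and it can be checked by brute force. The sketch below assumes functions are encoded as frozensets of (input, output) pairs; all_functions and curry are hypothetical helper names introduced only for this illustration.

```python
from itertools import product

def all_functions(dom, cod):
    """Enumerate every function dom -> cod as a frozenset of (input, output) pairs."""
    dom, cod = list(dom), list(cod)
    return [frozenset(zip(dom, values)) for values in product(cod, repeat=len(dom))]

def curry(h, A, C):
    """Send h: A x C -> B to the function C -> B^A given by c |-> h(-, c)."""
    table = dict(h)
    return frozenset(
        (c, frozenset((a, table[(a, c)]) for a in A)) for c in C
    )

A, B, C = {0, 1}, {"x", "y", "z"}, {"p", "q"}

AxC = set(product(A, C))
hom_AxC_B = all_functions(AxC, B)     # Hom(A x C, B)
B_to_A = all_functions(A, B)          # the exponential object B^A
curried = {curry(h, A, C) for h in hom_AxC_B}

# Currying is injective, and |Hom(C, B^A)| = |B^A| ** |C|, so matching
# counts mean that curry is a bijection Hom(A x C, B) -> Hom(C, B^A).
same_size = len(curried) == len(hom_AxC_B) == len(B_to_A) ** len(C)
```

Here both hom-sets have 81 elements, since 3^{4}=9^{2}.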

Finally, we discuss the concept of subobject classifiers. We start by defining two important kinds of morphisms, monomorphisms and epimorphisms. A monomorphism (also called a mono, or monic) is a morphism f: X\rightarrow Y such that for all morphisms g_{1}: Z\rightarrow X and g_{2}: Z\rightarrow X, whenever the compositions f\circ g_{1} and f\circ g_{2} are equal, then it is guaranteed that g_{1} and g_{2} are also equal. An epimorphism (also called an epi, or epic) is the dual of this concept, obtained by reversing the directions of all the morphisms in the definition of a monomorphism.

Two monomorphisms f: A\rightarrow D and g: B\rightarrow D are called equivalent if there is an isomorphism h: A\rightarrow B such that g\circ h=f. A subobject of D is then defined as an equivalence class of monomorphisms with codomain D.

A subobject classifier is an object \Omega and a monomorphism \text{true}: 1\rightarrow \Omega such that to every monic j: U\rightarrow X there is a unique arrow \chi_{j}: X\rightarrow \Omega for which, writing u: U\rightarrow 1 for the unique morphism from U to the terminal object 1, the square expressed by the following equation is a pullback square:

\chi_{j}\circ j=\text{true}\circ u.

The significance of the subobject classifier can perhaps best be understood by considering the category \mathbf{Sets}. The characteristic function \chi_{j} of a subset U of X is defined as the function on X that gives the value 1 if x\in U and gives the value 0 if x\notin U. Then we can set the terminal object 1 to be the set \{0\} and the object \Omega as the set \{0,1\}. The morphism \text{true} then sends 0\in \{0\} to 1\in \{0,1\}. The idea is that subobjects, i.e. subsets of sets in \mathbf{Sets}, can be obtained as pullbacks of \text{true} along the characteristic function \chi_{j}.
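The correspondence between subsets and characteristic functions can be checked directly for a small set. This is a minimal sketch, assuming that \text{true} picks out the value 1 in \Omega=\{0,1\} (so that it agrees with the characteristic function being 1 on U); the names chi and pullback_of_true are illustrative.

```python
from itertools import product

X = [1, 2, 3, 4, 5, 6]
U = {2, 4, 6}
OMEGA = (0, 1)
TRUE = 1  # the element of OMEGA selected by true: 1 -> OMEGA

def chi(subset):
    """Characteristic function X -> OMEGA of a subset of X."""
    return lambda x: 1 if x in subset else 0

def pullback_of_true(chi_j):
    """Pullback of true along chi_j: the points of X that chi_j sends to TRUE."""
    return {x for x in X if chi_j(x) == TRUE}

recovered = pullback_of_true(chi(U))

# Every function X -> OMEGA arises from exactly one subset of X.
all_maps = list(product(OMEGA, repeat=len(X)))
subsets_from_maps = {
    frozenset(x for x, value in zip(X, values) if value == TRUE)
    for values in all_maps
}
```

The set recovered equals U, and the 64 functions X\rightarrow \Omega yield 64 distinct subsets, which is the uniqueness of \chi_{j} in this case.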

For the category \text{Sh }(X) of sheaves on a topological space X, the subobject classifier is the sheaf \Omega on X where for each open subset U of X the set \Omega(U) is given by the set of open subsets of U. The morphism \text{true} then “selects” the “maximal” open subset of U, namely U itself.

Now we define our generalization of the Grothendieck topos. An elementary topos is a category \mathcal{E} satisfying the following conditions.

(i) \mathcal{E} has all finite limits and colimits.

(ii) \mathcal{E} is Cartesian closed.

(iii) \mathcal{E} has a subobject classifier.

A Grothendieck topos satisfies all these conditions and is an example of an elementary topos. However, the elementary topos is a much more general idea, and whereas the Grothendieck topos can be considered as a “generalized space”, the elementary topos can be considered as a “generalized universe of sets”. The term “universe”, as used in mathematics, refers to the entirety of where our discourse takes place, such that any concept or construction that we will ever need to consider or discuss can be found in this universe.

Perhaps the most basic example of an elementary topos is the category \mathbf{Sets}. It is actually also a Grothendieck topos, with its underlying category the category with one object and one morphism, which is the identity morphism on its one object. An example of an elementary topos that is not a Grothendieck topos is the category \mathbf{FinSets} of finite sets. It is worth noting, however, that despite the elementary topos being more general, the Grothendieck topos still continues to occupy somewhat of a special place in topos theory, including its applications to logic and other branches of mathematics beyond its origins in algebraic geometry.

II. Logic and the Elementary Topos

Mathematics is formalized, as a language, using what is known as first-order logic (also known as predicate logic or predicate calculus). This involves constants and variables of different “sorts” or “types”, such as x or y, strung together by relations, usually written Q(x, y), expressing a statement such as x=y. We also have functions, usually written g(x, y), expressing something such as x+y. The variables and functions are terms, and when these terms are strung together by relations, they form formulas. These formulas in turn are strung together by connectives such as “and”, “or”, “not”, “implies” and quantifiers such as “for all” and “there exists” to form more complicated formulas.

We can associate with an elementary topos a “language”. The “types” of this language are given by the objects of the topos. “Functions” are given by morphisms of objects. “Relations” are given by the subobjects of an object. In addition to these we need a notion of quantifiers, “for all” (written \forall) and “there exists” (written \exists). For a morphism f: X\rightarrow Y, pulling back subobjects along f gives a functor \text{Sub }(Y)\rightarrow \text{Sub }(X), and the quantifiers are given by its left and right adjoints \exists_{f}, \forall_{f}: \text{Sub }(X)\rightarrow \text{Sub }(Y) respectively. For the connectives such as “and”, “or”, “not”, and “implies”, we rely on the Heyting algebra structure on the subobjects of an object of an elementary topos.
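In \mathbf{Sets} these adjoints are the familiar image and “universal image” of a subset, and the adjunctions can be verified exhaustively for a small function. The sketch below assumes a hypothetical function f between two small sets, stored as a dictionary; preimage, exists_f, and forall_f are illustrative names.

```python
from itertools import chain, combinations

def subsets(s):
    """All subsets of s, as frozensets."""
    s = list(s)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

X = {0, 1, 2, 3}
Y = {"a", "b", "c"}
f = {0: "a", 1: "a", 2: "b", 3: "b"}   # note: nothing is sent to "c"

def preimage(T):
    """The pullback functor Sub(Y) -> Sub(X)."""
    return frozenset(x for x in X if f[x] in T)

def exists_f(S):
    """Left adjoint: y is in the result if SOME x over y lies in S."""
    return frozenset(f[x] for x in S)

def forall_f(S):
    """Right adjoint: y is in the result if ALL x over y lie in S."""
    return frozenset(y for y in Y if all(x in S for x in X if f[x] == y))

# exists_f(S) <= T  iff  S <= preimage(T)
left_ok = all((exists_f(S) <= T) == (S <= preimage(T))
              for S in subsets(X) for T in subsets(Y))
# T <= forall_f(S)  iff  preimage(T) <= S
right_ok = all((T <= forall_f(S)) == (preimage(T) <= S)
               for S in subsets(X) for T in subsets(Y))
```

In a poset such as \text{Sub }(X), a hom-set is either empty or a single point, so the adjunction bijections reduce to the two “if and only if” statements checked above.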

The existence of a Heyting algebra structure means that there exist operations, called join (written \vee) and meet (written \wedge), generalizing unions and intersections of sets, supremum and infimum of elements, or the connectives “or” and “and”, a least element (written 0), a greatest element (written 1), and an implication operation such that

z\leq(x\Rightarrow y) if and only if z\wedge x\leq y.

We also have the negation of an element x

\neg x=(x\Rightarrow 0).
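The open sets of a topological space are a standard example of a Heyting algebra, with the implication given by the largest open set satisfying the defining condition. The following sketch assumes the two-point Sierpinski space as a toy example; OPENS, implies, and neg are names chosen for illustration.

```python
# Open sets of the Sierpinski space on the points {0, 1}.
OPENS = [frozenset(), frozenset({1}), frozenset({0, 1})]
BOTTOM = frozenset()
TOP = frozenset({0, 1})

def meet(u, v):
    return u & v

def join(u, v):
    return u | v

def implies(u, v):
    """Largest open w with (w meet u) <= v: the union of all such opens."""
    return frozenset().union(*[w for w in OPENS if (w & u) <= v])

def neg(u):
    return implies(u, BOTTOM)

# The defining property: z <= (x => y) if and only if (z meet x) <= y.
adjunction_ok = all(
    (z <= implies(x, y)) == (meet(z, x) <= y)
    for x in OPENS for y in OPENS for z in OPENS
)

point = frozenset({1})
double_negation = neg(neg(point))   # the whole space, not {1}
```

In this example \neg\neg x is strictly larger than x for the open set \{1\}, so this Heyting algebra is not a Boolean algebra.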

This Heyting algebra structure for subobjects \text{Sub }(A) of an object A of an elementary topos is provided by taking pullbacks (for the meet) and images of coproducts (for the join), with 0\rightarrow A as the least element, A\rightarrow A as the greatest element, and the implication given by the exponential.

We have shown one way in which topos theory is related to logic. Now we show how topos theory is related to the most commonly accepted foundations of mathematics, set theory. More technically, these foundations come from a handful of axioms called the ZFC axioms. The letters Z and F come from the names of the mathematicians who developed them, Ernst Zermelo and Abraham Fraenkel, while the letter C stands for one particular axiom, the axiom of choice.

The elementary topos, with some additional conditions, can be used to construct a version of the ZFC axioms. The first condition is that whenever there are two morphisms f: A\rightarrow B and g: A\rightarrow B such that f\circ x=g\circ x for every morphism x: 1\rightarrow A from the terminal object 1 to A, we have f=g. In this case we say that the topos is well-pointed. The second condition is that we have a natural numbers object, which is an object \mathbf{N} and morphisms 0:1\rightarrow \mathbf{N} and s:\mathbf{N}\rightarrow \mathbf{N}, such that for any other object X and morphisms x:1\rightarrow X and f:X\rightarrow X, we have a unique morphism h: \mathbf{N}\rightarrow X such that h\circ 0=x and h\circ s=f\circ h. The third condition is the axiom of choice; this is equivalent to the statement that for every epimorphism p:X\rightarrow I there exists s:I\rightarrow X such that p\circ s=1_{I}.
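In \mathbf{Sets}, the universal property of the natural numbers object is definition by recursion: the morphism h is determined by h(0)=x and h(s(n))=f(h(n)). The sketch below assumes Python non-negative integers stand in for \mathbf{N}; recursor is a hypothetical helper name.

```python
def recursor(x, f):
    """Return the h: N -> X determined by h(0) = x and h(s(n)) = f(h(n))."""
    def h(n):
        value = x
        for _ in range(n):      # apply f exactly n times to x
            value = f(value)
        return value
    return h

def succ(n):
    """The successor morphism s: N -> N."""
    return n + 1

# Example: X is the integers, x = 1, and f doubles, so h(n) = 2 ** n.
double = lambda v: 2 * v
h = recursor(1, double)

commutes = all(h(succ(n)) == double(h(n)) for n in range(20))
```

Both equations from the definition hold here: h(0) returns the chosen point x, and h after s agrees with f after h on every tested input.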

One of the issues that hounded set theory in the early days after the ZFC axioms were formulated was whether the axiom of choice could be derived from the other axioms (these axioms were simply called the ZF axioms) or whether it needed to be put in separately. Another issue concerned what was known as the continuum hypothesis, a statement concerning the cardinality of the natural numbers and the real numbers, and whether this statement could be proved or disproved from the ZFC axioms alone. The mathematician Paul Cohen showed that both the axiom of choice and the continuum hypothesis are independent of ZF and ZFC respectively. A topos-theoretic version of Cohen’s proof of the independence of the continuum hypothesis was then later developed by the mathematicians William Lawvere and Myles Tierney (both of whom also developed much of the original theory of elementary toposes).

We now discuss certain aspects of topos theory related to Cohen’s proof. First we introduce a construction in an elementary topos that generalizes the Grothendieck topology discussed in More Category Theory: The Grothendieck Topos. A Lawvere-Tierney topology on \mathcal{E} is a map j: \Omega\rightarrow \Omega such that

(a) j\circ \text{true}=\text{true}

(b) j\circ j=j

(c) j\circ \wedge=\wedge \circ (j\times j)
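As a rough external illustration, the three equations can be checked for the double negation operator on the Heyting algebra of open sets of a small space. This is only a sketch of the analogous conditions on a Heyting algebra, not a construction inside a topos; the five-element topology below and the names neg and j are assumptions made for the example.

```python
# A topology on the points {0, 1, 2}.
OPENS = [frozenset(s) for s in ([], [0], [1], [0, 1], [0, 1, 2])]
TOP = frozenset({0, 1, 2})

def neg(u):
    """Largest open set disjoint from u."""
    return frozenset().union(*[w for w in OPENS if not (w & u)])

def j(u):
    """Double negation."""
    return neg(neg(u))

cond_a = j(TOP) == TOP                                  # j o true = true
cond_b = all(j(j(u)) == j(u) for u in OPENS)            # j o j = j
cond_c = all(j(u & v) == j(u) & j(v)                    # j preserves meets
             for u in OPENS for v in OPENS)

# The elements fixed by j are the candidates for "double negation closed" opens.
fixed = [u for u in OPENS if j(u) == u]
```

In this example the open set \{0,1\} is sent to the whole space, and the four fixed elements satisfy \neg\neg x=x by construction.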

The Lawvere-Tierney topology allows us to construct sheaves on the topos, and together with the Heyting algebra structure on the subobject classifier \Omega, allows us to construct double negation sheaves, which themselves form toposes that have the special property that they are Boolean, i.e. the Heyting algebra structure of the subobject classifier satisfies the additional property \neg \neg x=x. This is important because a well-pointed topos, which is necessary to formulate a topos-theoretic version of ZFC, is necessarily Boolean. Another condition for the topos to be well-pointed is for it to be two-valued, which means that there are only two morphisms from the terminal object 1 to \Omega. We can obtain such a two-valued topos from any other topos using the concept of a filter, which essentially allows us to take “quotients” of the Heyting algebra structure on \Omega.

There is yet another condition for an elementary topos to be well-pointed, namely that its “supports split” in the topos. This condition is automatically satisfied whenever the topos satisfies the axiom of choice.

It turns out that the topos of double negation sheaves over a partially ordered set is Boolean (as discussed earlier) and satisfies the axiom of choice. For proving the independence of the continuum hypothesis, a partially ordered set was constructed by Cohen, representing “finite states of knowledge”, and we can use this to form a topos of double negation sheaves known as the Cohen topos. Using the concept of a filter we then obtain a two-valued topos and therefore satisfy all the requirements for a topos-theoretic version of ZFC. However, the continuum hypothesis does not hold in the Cohen topos, thus proving its independence of ZFC.

A similar strategy involving double negation sheaves was used by the mathematician Peter Freyd to develop a topos-theoretic version of Cohen’s proof of the independence of the axiom of choice from the other axioms ZF, using a different underlying category (since the double negation sheaves over a partially ordered set would automatically satisfy the axiom of choice). In both cases the theory of elementary toposes provides a more “natural” language for Cohen’s original proofs.

III. Geometric Morphisms

We now discuss morphisms between toposes. The elementary topos was inspired by the Grothendieck topos, which was in turn inspired by sheaves on a topological space, so we turn to the classical theory once more and look at morphisms between sheaves. Given a continuous function f: X\rightarrow Y, and a sheaf \mathcal{F} on X, we can define a sheaf, called the direct image sheaf, f_{*}\mathcal{F} on Y by setting f_{*}\mathcal{F}(V)=\mathcal{F}(f^{-1}(V)) for every open subset V\subseteq Y. Similarly, given a sheaf \mathcal{G} on Y we also have the inverse image sheaf; however, it cannot similarly be defined as f^{*}\mathcal{G}(U)=\mathcal{G}(f(U)) for an open subset U\subseteq X, since the image of U in Y may not be an open subset of Y.
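Both halves of this observation can be seen on a tiny example. The sketch below assumes X is a one-point space, Y is the Sierpinski space on \{a, b\} with \{b\} open, and f sends the point of X to a; the sheaf on X is taken to be “all functions into \{0,1\}”. All names are illustrative.

```python
from itertools import product

OPENS_X = [frozenset(), frozenset({"*"})]
OPENS_Y = [frozenset(), frozenset({"b"}), frozenset({"a", "b"})]
f = {"*": "a"}   # the single point of X goes to the non-open point a

def preimage(V):
    return frozenset(x for x in f if f[x] in V)

def image(U):
    return frozenset(f[x] for x in U)

def sections(U):
    """F(U): all functions U -> {0, 1}, encoded as tuples of (point, value) pairs."""
    points = sorted(U)
    return {tuple(zip(points, values))
            for values in product((0, 1), repeat=len(points))}

def direct_image_sections(V):
    """(f_* F)(V) = F(f^{-1}(V)) for an open subset V of Y."""
    return sections(preimage(V))

# f is continuous: preimages of opens are open, so f_* F is defined on every V.
continuous = all(preimage(V) in OPENS_X for V in OPENS_Y)

# But the image of the open set X is {a}, which is not open in Y.
image_is_open = image(frozenset({"*"})) in OPENS_Y
```

Here the direct image has one section over \{b\} and two sections over all of Y, while the naive formula for the inverse image would require evaluating a sheaf on the non-open set \{a\}.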

This can be remedied by the process of “sheafification”; we think instead in terms of the “stalks” of the sheaf \mathcal{G}, i.e. sets that are in some way “parametrized” by the points y of Y. Then we can obtain sets “parametrized” by the points f(x); these sets then form the inverse image sheaf f^{*}\mathcal{G} on X. The points of a space are of course not open sets in the usual topologies that we use, so the definition of a stalk involves the “direct limit” of open sets containing the point. It is worth noting that the inverse image “preserves” finite limits.

The process of taking the direct image sheaf can be expressed as a functor from the category \text{Sh }(X) of sheaves on X to the category \text{Sh }(Y) of sheaves on Y. The inverse image functor is then the left adjoint to the direct image functor, and it has the property that it preserves finite limits.

A geometric morphism is a pair of adjoint functors between toposes such that the left adjoint preserves finite limits. This allows us to form the category \mathfrak{Top} whose objects are elementary toposes and whose morphisms are geometric morphisms. The natural transformations between geometric morphisms, called geometric transformations, give the category \mathfrak{Top} the extra structure of a 2-category. There are also logical morphisms between toposes, which preserve all structure, and with them and their natural transformations we can form the 2-category \mathfrak{Log}.

We can also define \mathfrak{Top}/\mathcal{S} as the category whose objects are geometric morphisms p: \mathcal{E}\rightarrow \mathcal{S} and whose morphisms (p: \mathcal{F}\rightarrow \mathcal{S})\rightarrow (q: \mathcal{E}\rightarrow \mathcal{S}) are pairs (f, \alpha) where f: \mathcal{F}\rightarrow \mathcal{E} is a geometric morphism and \alpha: p\cong q\circ f is a geometric transformation. Together with “2-cells” (f, \alpha)\rightarrow (g, \beta) given by geometric transformations f\rightarrow g that are “compatible” in some sense with \alpha and \beta, \mathfrak{Top}/\mathcal{S} also forms a 2-category.

Geometric morphisms can now be used to define the points of a topos. In the category of sets, we can use the morphisms from the set consisting of only one element to all the other sets to indicate the elements of these other sets. The same goes for topological spaces and their points. We have mentioned earlier the category \mathbf{Sets} as the topos of sheaves on a point. Therefore, we define the points of a topos \mathcal{E} as the geometric morphisms from \mathbf{Sets} to \mathcal{E}.

There exist, however, toposes (including Grothendieck toposes) without points. Since sheaves are defined using only open sets, to deal with such toposes satisfactorily we can make use of the concept of locales, which abstract the properties of open sets and the study of topological spaces, while “forgetting” the underlying sets of points. A topos which is equivalent to the category of sheaves on some locale is called a localic topos.

An important result in the theory of localic toposes is Barr’s theorem, which states that for every Grothendieck topos \mathcal{E} there exists a topos \text{Sh }(\mathbf{B}) of sheaves on a locale \mathbf{B} with a “complete” Boolean algebra structure, together with a surjective geometric morphism \text{Sh }(\mathbf{B})\rightarrow \mathcal{E} (one whose inverse image functor is faithful). Another important result is Deligne’s theorem, which states that a coherent topos, i.e. a topos \mathcal{E} for which there is a site (\mathbf{C}, J) where \mathbf{C} has finite limits and the Grothendieck topology has a “basis” which consists of finite covering families, has “enough points”, i.e. for any two distinct arrows \alpha: E\rightarrow D and \beta: E\rightarrow D in \mathcal{E} there exists a point p: \mathbf{Sets}\rightarrow \mathcal{E} such that the stalk p^{*}(\alpha) is not equal to the stalk p^{*}(\beta).

We can also use geometric morphisms to define the idea of a classifying topos. A classifying topos is an elementary topos such that objects of a given kind in any other topos can be “classified” by the geometric morphisms from that topos to the classifying topos. For example, ring objects in any topos \mathcal{E} are classified by the topos \mathbf{Sets}^{\mathbf{fp}\textbf{-}\mathbf{rings}} of presheaves on the opposite category \mathbf{fp}\textbf{-}\mathbf{rings^{op}} of the category of “finitely presented” rings. The object of this topos represented by the polynomial ring \mathbf{Z}[X] is then a universal object, such that any ring object in \mathcal{E} can be obtained by pulling this universal ring back along a geometric morphism \mathcal{E}\rightarrow \mathbf{Sets}^{\mathbf{fp}\textbf{-}\mathbf{rings}}, i.e. by applying the inverse image functor of that geometric morphism to it.

We now combine the idea of classifying toposes (which was inspired by the idea of classifying spaces in algebraic topology) with the applications of topos theory to first-order logic discussed earlier. A theory \mathbb{T} is a set of formulas, called the axioms of the theory, and a model of \mathbb{T} in a topos \mathcal{E} is an interpretation in which the axioms hold, i.e. an assignment of an object of \mathcal{E} to every type of the first-order language, a subobject of the appropriate object of \mathcal{E} to every relation, and a morphism of \mathcal{E} to every function, with quantifiers and connectives provided by the corresponding adjoint functors and Heyting algebra structures respectively.

A theory is called a coherent theory if its axioms are of the form \forall x (\phi(x)\Rightarrow \psi(x)), where \phi(x) and \psi(x) are coherent formulas, i.e. formulas which are built up using only the operations of finitary conjunction \wedge, finitary disjunction \vee, and existential quantification \exists. If we also allow the operation of infinitary disjunction \bigvee, then we obtain a geometric formula, and a theory whose axioms are of the form \forall x (\phi(x)\Rightarrow \psi(x)), where \phi(x) and \psi(x) are geometric formulas, is called a geometric theory.
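Two standard axioms illustrate the difference between the two kinds of formulas. They are offered here only as examples, and are not taken from the discussion above; \top denotes the empty conjunction (“true”), and e denotes the identity element of a group.

```latex
% Local rings (a coherent axiom): for every x, either x or 1 - x is invertible.
\forall x \, \bigl( \top \Rightarrow (\exists y \; x \cdot y = 1) \vee (\exists y \; (1 - x) \cdot y = 1) \bigr)

% Torsion groups (a geometric axiom): some positive power of x is the identity.
\forall x \, \Bigl( \top \Rightarrow \bigvee_{n \geq 1} x^{n} = e \Bigr)
```

The first axiom uses only a finite disjunction and existential quantifiers, while the second needs a disjunction over infinitely many formulas, which is what makes it geometric rather than coherent as written.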

Most theories in mathematics are coherent theories. For those which are not, however, there is a certain process called Morleyization which associates to those theories a coherent theory.

For any coherent theory \mathbb{T}, there exists a classifying topos \mathcal{E}_\mathbb{T} and a universal object U in \mathcal{E}_\mathbb{T} (in this context also called a universal model) such that any model of \mathbb{T} in a topos \mathcal{E} can be obtained as a pullback of U along a geometric morphism \mathcal{E}\rightarrow \mathcal{E}_\mathbb{T}.

We mention yet another aspect of topos theory where logic and geometry combine. We have earlier mentioned the theorems of Deligne and Barr in the context of studying toposes as sheaves on locales. Combined with the logical aspects of the toposes, and the theory of classifying toposes, Deligne’s theorem implies that a statement of the form \forall x (\phi(x)\Rightarrow \psi(x)) where \phi(x) and \psi(x) are coherent formulas holds in all models of the coherent theory \mathbb{T} in any topos if and only if it holds in all models of \mathbb{T} in \mathbf{Sets}.

Meanwhile, Barr’s theorem implies that a statement of the form \forall x (\phi(x)\Rightarrow \psi(x)) where \phi(x) and \psi(x) are geometric formulas holds in all models of the geometric theory \mathbb{T} in any topos if and only if it holds in all models of \mathbb{T} in Boolean toposes.

In this context, Deligne’s theorem and Barr’s theorem respectively correspond to finitary and infinitary versions of a famous theorem in classical logic called Gödel’s completeness theorem.
