Rotating and Reflecting Vectors Using Matrices

In Vector Spaces, Modules, and Linear Algebra we learned about vectors, and defined them as elements of a set that is closed under addition and scalar multiplication. This is a pretty abstract concept, and in that post we used an example of “apples and oranges” to express it. However we also mentioned that many other things are vectors; for instance, states in quantum mechanics, and quantities with a magnitude and direction, such as forces. It is these quantities with a magnitude and direction that we will focus on in this post.

We will use the language that we developed in Matrices in order to make things more concrete. We will focus on two dimensions only in this post, in order to simplify things, although it will not be difficult to generalize to higher dimensions. We develop first a convention. The vector

\displaystyle \left(\begin{array}{c}1\\0\end{array}\right)

represents a quantity with magnitude “1” (meter, or meter per second, or Newton, etc.) going to the right (or east). Similarly, the vector

\displaystyle \left(\begin{array}{c}-1\\0\end{array}\right)

represents a quantity with magnitude 1 going to the left (or west). Meanwhile, the vector

\displaystyle \left(\begin{array}{c}0\\1\end{array}\right)

represents a quantity with magnitude 1 going upward (or to the north). Finally, the vector

\displaystyle \left(\begin{array}{c}0\\-1\end{array}\right)

represents a quantity with magnitude 1 going downward (or to the south). These vectors we have enumerated all have magnitude 1, therefore they are also called unit vectors. Since they are vectors, we can “scale” them or add or subtract them from each other to form new vectors. For example, we can “double” the upward-pointing unit vector,

\displaystyle 2\left(\begin{array}{c}0\\1\end{array}\right)=\left(\begin{array}{c}0\\2\end{array}\right)

to obtain a vector again pointing upward but with a magnitude of 2. We can also “add” the right-pointing unit vector to the upward-pointing unit vector, as follows:

\displaystyle \left(\begin{array}{c}1\\0\end{array}\right)+\left(\begin{array}{c}0\\1\end{array}\right)=\left(\begin{array}{c}1\\1\end{array}\right)

We can easily infer that this vector will point “diagonally” upward and to the right (or to the northwest). But what will be its magnitude? For this we introduce the concept of the transpose. The transpose of a matrix is just another matrix but with its rows and columns interchanged. For a column matrix, we have only one column, so its transpose is a matrix with only one row, as follows:

\displaystyle \left(\begin{array}{c}a\\b\end{array}\right)^{T}=\left(\begin{array}{cc}a&b\end{array}\right)

Now, to take the magnitude of a vector, we take the square root of the product of the transpose of a vector and the vector itself. Note that the multiplication of matrices is not commutative, so it is important that the row matrix be on the left and the column matrix (the vector) be on the right. It is the only way we will obtain an ordinary number from the matrices.

Applying the rules of matrix multiplication, we see that for a vector

\displaystyle \left(\begin{array}{c}a\\b\end{array}\right)

the magnitude will be given by the square root of

\displaystyle \left(\begin{array}{cc}a&b\end{array}\right) \left(\begin{array}{c}a\\b\end{array}\right)=a^{2}+b^{2}

This should be reminiscent of the Pythagorean theorem. As we have already seen in From Pythagoras to Einstein, this ancient theorem always shows up in many aspects of modern mathematics and physics. Going back to our example of the vector

\displaystyle \left(\begin{array}{c}1\\1\end{array}\right)

we can now compute for its magnitude. Multiplying the transpose of this vector and the vector itself, in the proper order, we obtain

\displaystyle \left(\begin{array}{cc}1&1\end{array}\right) \left(\begin{array}{c}1\\1\end{array}\right)=1^{2}+1^{2}=2

and taking the square root of this number, we see that the magnitude of our vector is equal to \sqrt{2}.

In Matrices we mentioned that a square matrix may be used to describe linear transformations between vectors. Now that we have used the language of vectors to describe quantities with magnitude and direction, we also show a very special kind of linear transformation – one that sends a vector to another vector with the same value of the magnitude, but “rotated” or “reflected”, i.e. with a different direction. We may say that this linear transformation describes the “operation” of rotation or reflection. This analogy is the reason why linear transformations from a vector space to itself are also often referred to as linear operators, especially in quantum mechanics.

We make this idea clearer with an explicit example. Consider the matrix

\displaystyle \left(\begin{array}{cc}0&-1\\ 1&0\end{array}\right)

We look at its effect on some vectors:

\displaystyle \left(\begin{array}{cc}0&-1\\ 1&0\end{array}\right)\left(\begin{array}{c}1\\0\end{array}\right)=\left(\begin{array}{c}0\\1\end{array}\right)

\displaystyle \left(\begin{array}{cc}0&-1\\ 1&0\end{array}\right)\left(\begin{array}{c}0\\1\end{array}\right)=\left(\begin{array}{c}-1\\0\end{array}\right)

\displaystyle \left(\begin{array}{cc}0&-1\\ 1&0\end{array}\right)\left(\begin{array}{c}-1\\0\end{array}\right)=\left(\begin{array}{c}0\\-1\end{array}\right)

\displaystyle \left(\begin{array}{cc}0&-1\\ 1&0\end{array}\right)\left(\begin{array}{c}0\\-1\end{array}\right)=\left(\begin{array}{c}1\\0\end{array}\right)

From these basic examples one may infer that our matrix represents a counterclockwise “rotation” of ninety degrees. The reader is encouraged to visualize (or better yet draw) how this is so. In fact, we can express a counterclockwise rotation of any angle \theta using the matrix

\displaystyle \left(\begin{array}{cc}\text{cos }\theta&-\text{sin }\theta\\ \text{sin }\theta&\text{cos }\theta\end{array}\right)

We consider next another matrix, given by

\displaystyle \left(\begin{array}{cc}1&0\\ 0&-1\end{array}\right)

We likewise look at its effect on some vectors:

\displaystyle \left(\begin{array}{cc}1&0\\ 0&-1\end{array}\right)\left(\begin{array}{c}1\\0\end{array}\right)=\left(\begin{array}{c}1\\0\end{array}\right)

\displaystyle \left(\begin{array}{cc}1&0\\ 0&-1\end{array}\right)\left(\begin{array}{c}0\\1\end{array}\right)=\left(\begin{array}{c}0\\-1\end{array}\right)

\displaystyle \left(\begin{array}{cc}1&0\\ 0&-1\end{array}\right)\left(\begin{array}{c}-1\\0\end{array}\right)=\left(\begin{array}{c}-1\\0\end{array}\right)

\displaystyle \left(\begin{array}{cc}1&0\\ 0&-1\end{array}\right)\left(\begin{array}{c}0\\-1\end{array}\right)=\left(\begin{array}{c}0\\1\end{array}\right)

What we see now is that this matrix represents a “reflection” along the horizontal axis. Any reflection along a line specified by an angle of \frac{\theta}{2} is represented by the matrix

\displaystyle \left(\begin{array}{cc}\text{cos }\theta&\text{sin }\theta\\ \text{sin }\theta&-\text{cos }\theta\end{array}\right)

The matrices representing rotations and reflections form a group (see Groups) called the orthogonal group. Since we are only looking at rotations in the plane, i.e. in two dimensions, it is also more properly referred to as the orthogonal group in dimension 2, written \text{O}(2). The matrices representing rotations form a subgroup (a subset of a group that is itself also a group) of the orthogonal group in dimension 2, called the special orthogonal group in dimension 2 and written \text{SO}(2).

The reader is encouraged to review the concept of a group as discussed in Groups, but intuitively what this means is that by multiplying two matrices, for instance, representing counterclockwise rotations of angles \alpha and \beta, then we will get a matrix which represents a counterclockwise rotation of angle \alpha+\beta. In other words, we can “compose” rotations; and the composition is associative, possesses an “identity” (a rotation of zero degrees) and for every counterclockwise rotation of angle \theta there is an “inverse” (a clockwise rotation of angle \theta, which is also represented as a counterclockwise rotation of angle -\theta).


\displaystyle \left(\begin{array}{cc}\text{cos }\alpha&-\text{sin }\alpha\\ \text{sin }\alpha&\text{cos }\alpha\end{array}\right)\left(\begin{array}{cc}\text{cos }\beta&-\text{sin }\beta\\ \text{sin }\beta&\text{cos }\beta\end{array}\right)=\left(\begin{array}{cc}\text{cos}(\alpha+\beta)&-\text{sin}(\alpha+\beta)\\ \text{sin}(\alpha+\beta)&\text{cos}(\alpha+\beta)\end{array}\right)

It can be a fun exercise to derive this equation using the laws of matrix multiplication and the addition formulas for the sine and cosine functions from basic trigonometry.

This is what it means for \text{SO}(2), the matrices representing rotations, to form a group. Reflections can also be considered in addition to rotations, and reflections and rotations can be composed with each other. This is what it means for \text{O}(2), the matrices representing rotations and reflections, to form a group. The matrices representing reflections alone do not form a group however, since the composition of two reflections is not a reflection, but a rotation.

Technically, the distinction between the matrices representing rotations and the matrices representing reflections can be seen by examining the determinant, which is a concept we will leave to the references for now.

It is worth repeating how we defined the orthogonal group \text{O}(2) technically – it is the group of matrices that preserve the magnitudes of vectors. This gives us some intuition as to why they are so special. There are other equivalent definitions of \text{O}(2). For example, they can also be defined as the matrices A which satisfy the equation

\displaystyle AA^{T}=A^{T}A=I

where the matrix A^{T} is the transpose of  the matrix A, which is given by interchanging the rows and the columns of A, as discussed earlier, and

\displaystyle I=\left(\begin{array}{cc}1&0\\ 0&1\end{array}\right)

is the “identity” matrix, which multiplied to any other matrix A (on either side) just gives back A. This may also be expressed by saying that the group \text{O}(2) is made up of the matrices whose transpose is also its inverse (and vice versa).

In summary, we have shown in this post one specific aspect of vector spaces and linear transformations between vector spaces, and “fleshed out” the rather skeletal framework of sets that are closed under addition and scalar multiplication, and functions that respect this structure. It is important to note of course, that the applications of vector spaces and linear transformations are by no means limited to describing quantities with magnitude and direction.

Another concept that we have “fleshed out” in this post is the concept of groups, which we have only treated rather abstractly in Groups. We have also been using the concept of groups in algebraic topology, in particular homotopy groups in Homotopy Theory and homology groups and cohomology groups in Homology and Cohomology, but it is perhaps the example of the orthogonal group, or even better the special orthogonal group, where we have intuitive and concrete examples of the concept. Rotations can be composed, the composition is associative, there exists an “identity”, and there exists an “inverse” for every element. The same holds for rotations and reflections together.

These two subjects that we have discussed in this post, namely linear algebra and group theory, are in fact closely related. The subject that studies these two subjects in relation to one another is called representation theory, and it is a very important part of modern mathematics.


Orthogonal Matrix on Wikipedia

Orthogonal Group on Wikipedia

Algebra by Michael Artin



We discussed linear algebra in Vector Spaces, Modules, and Linear Algebra, and there we focused on “finite-dimensional” vector spaces (the concept of dimension for vector spaces was discussed in More on Vector Spaces and Modules), writing vectors in the form

\displaystyle \left(\begin{array}{c}a\\b\end{array}\right)

Vectors need not be written in this way, since the definition of the concept of vector space only required that it be a set closed under addition and scalar multiplication. For example, we could have just denoted vectors by v, or, in quantum mechanics, we use what we call “Dirac notation”, writing vectors as |\psi\rangle.

However, the notation that we used in Vector Spaces, Modules, and Linear Algebra is quite convenient; it allowed us to display explicitly the “components”; if we declare that our scalars, for example, be the set of real numbers \mathbb{R}, and that our vector space is the set of all vectors of the form

\displaystyle \left(\begin{array}{c}a\\b\end{array}\right)

where a,b\in \mathbb{R}, then we already know that we can use the following vectors for our basis:

\displaystyle \left(\begin{array}{c}1\\0\end{array}\right)


\displaystyle \left(\begin{array}{c}0\\1\end{array}\right)

since any vector can be expressed uniquely as a linear combination

\displaystyle \left(\begin{array}{c}a\\b\end{array}\right)=a\left(\begin{array}{c}1\\0\end{array}\right)+b\left(\begin{array}{c}0\\1\end{array}\right)

It is also quite easy to see that our vector space here has dimension 2. What we have done is express our vector as a matrix, more specifically a column matrix. A matrix is a rectangular array of numbers (which we refer to as its “entries”), with some specific properties as we will discuss later. If a matrix has m rows and n columns, we refer to it as an m\times n matrix. A matrix that has only one row is often referred to as a row matrix, and a matrix with only one column, as we have been using to express our vectors up to now, is referred to as a column matrix. A matrix with the same number of columns and rows is referred to as a square matrix.

Here are some examples of matrices (with real number entries):

\displaystyle \left(\begin{array}{cc}1&-0.25\\ 100&0\\2&-5\end{array}\right)        (3\times 2 matrix)

\displaystyle \left(\begin{array}{cc}1&0\\ 0&\frac{3}{2}\end{array}\right)        (2\times 2 square matrix)

\displaystyle \left(\begin{array}{cccc}1&27&-\frac{4}{5}&10\\ \end{array}\right)       (1\times 4 row matrix)

We will often adopt the notation that the entry in the first row and first column of a matrix A will be labeled by A_{1,1}, the entry in the second row and first column of the same matrix will be labeled A_{2,1}, and so on. Since we often denote vectors by v, we will denote its first component (the entry in the first row) by v_{1}, the second component by v_{2}, and so on.

We can perform operations on matrices. The set of m\times n matrices, for fixed m and n form a vector space, which means we can “scale” them or multiply them by a “scalar”, and we can also add or subtract them from each other. This is done so “componentwise”, i.e.

\displaystyle c\left(\begin{array}{cc}A_{1,1}&A_{1,2}\\ A_{2,1}&A_{2,2}\end{array}\right)=\left(\begin{array}{cc}cA_{1,1}&cA_{1,2}\\ cA_{2,1}&cA_{2,2}\end{array}\right)

\displaystyle \left(\begin{array}{cc}A_{1,1}&A_{1,2}\\ A_{2,1}&A_{2,2}\end{array}\right)+\left(\begin{array}{cc}B_{1,1}&B_{1,2}\\ B_{2,1}&B_{2,2}\end{array}\right)=\left(\begin{array}{cc}A_{1,1}+B_{1,1}&A_{1,2}+B_{1,2}\\ A_{2,1}+B_{2,1}&A_{2,2}+B_{2,2}\end{array}\right)

\displaystyle \left(\begin{array}{cc}A_{1,1}&A_{1,2}\\ A_{2,1}&A_{2,2}\end{array}\right)-\left(\begin{array}{cc}B_{1,1}&B_{1,2}\\ B_{2,1}&B_{2,2}\end{array}\right)=\left(\begin{array}{cc}A_{1,1}-B_{1,1}&A_{1,2}-B_{1,2}\\ A_{2,1}-B_{2,1}&A_{2,2}-B_{2,2}\end{array}\right)

Multiplication of matrices is more complicated. A j\times k matrix can be multiplied by a k\times l matrix to form a j\times l matrix. Note that the number of columns of the first matrix must be equal to the number of rows of the second matrix. Note also that multiplication of matrices is not commutative; a product AB of two matrices A and B may not be equal to the product BA of the same matrices, contrary to what we find in the multiplication of ordinary numbers.

The procedure to obtaining the entries of this product matrix is as follows: Let’s denote the product of the j\times k matrix A and the k\times l matrix B by AB (this is a j\times l matrix, as we have mentioned above) and let AB_{m,n} be its entry in the m-th row and n-th column. Then

\displaystyle AB_{m,n}=\sum_{i=1}^{k}A_{m,i}B_{i,n}

For example, we may have

\displaystyle \left(\begin{array}{cc}1&-3\\ 2&0\\-2&6\end{array}\right) \left(\begin{array}{cccc}5&-2&0&1\\ 0&1&-1&4\end{array}\right)=\left(\begin{array}{cccc}(1)(5)+(-3)(0)&(1)(-2)+(-3)(1)&(1)(0)+(-3)(-1)&(1)(1)+(-3)(4)\\ (2)(5)+(0)(0)&(2)(-2)+(0)(1)&(2)(0)+(0)(-1)&(2)(1)+(0)(4)\\(-2)(5)+(6)(0)&(-2)(-2)+(6)(1)&(-2)(0)+(6)(-1)&(-2)(1)+(6)(4)\end{array}\right)

\displaystyle \left(\begin{array}{cc}1&-3\\ 2&0\\-2&6\end{array}\right) \left(\begin{array}{cccc}5&-2&0&1\\ 0&1&-1&4\end{array}\right)=\left(\begin{array}{cccc}5&-5&3&-11\\ 10&-4&0&2\\-10&10&-6&22\end{array}\right)

We highlight the following step to obtain the entry in the first row and first column:

\displaystyle \left(\begin{array}{cc}\mathbf{1}&\mathbf{-3}\\ 2&0\\-2&6\end{array}\right) \left(\begin{array}{cccc}\mathbf{5}&-2&0&1\\ \mathbf{0}&1&-1&4\end{array}\right)=\left(\begin{array}{cccc}\mathbf{(1)(5)+(-3)(0)}&(1)(-2)+(-3)(1)&(1)(0)+(-3)(-1)&(1)(1)+(-3)(4)\\ (2)(5)+(0)(0)&(2)(-2)+(0)(1)&(2)(0)+(0)(-1)&(2)(1)+(0)(4)\\(-2)(5)+(6)(0)&(-2)(-2)+(6)(1)&(-2)(0)+(6)(-1)&(-2)(1)+(6)(4)\end{array}\right)

Now that we know how to multiply matrices, we now go back to vectors, which can always be written as column matrices. For the sake of simplicity we continue to restrict ourselves to finite-dimensional vector spaces. We have seen that writing vectors as column matrices provides us with several conveniences. Other kinds of matrices are also useful in studying vector spaces.

For instance, we noted in Vector Spaces, Modules, and Linear Algebra that an important kind of function between vector spaces (of the same dimension) are the linear transformations, which are functions f(v) such that f(av)=af(v) and f(u+v)=f(u)+f(v). We note that if A is an n\times n square matrix, and v is an n\times 1 column matrix, then the product Av is another n\times 1 column matrix. It is a theorem that all linear transformations between n-dimensional vector spaces can be written as an n\times n square matrix.

 We also have functions from a vector space to the set of its scalars, which are sometimes referred to as functionals. The set of linear functionals, i.e. the set of functionals f(v) such that f(av)=af(v) and f(u+v)=f(u)+f(v), are represented by multiplying any column matrix by a row matrix (the number of their entries must be the same, as per the rules of matrix multiplication). For instance, we may have

\displaystyle \left(\begin{array}{cccc}u_{1}&u_{2}&u_{3}&u_{4} \end{array}\right)\left(\begin{array}{c}v_{1}\\v_{2}\\v_{3}\\v_{4}\end{array}\right) = u_{1}v_{1}+u_{2}v_{2}+u_{3}v_{3}+u_{4}v_{4}

Note that the right side is just a real number (or complex number, or perhaps most generally, an element of the field of scalars of our vector space).

Matrices are rather ubiquitous in mathematics (and also in physics). In fact, some might teach the subject of linear algebra with the focus on matrices first. Here, however, we have taken the view of introducing first the abstract idea of vector spaces, and with matrices being viewed as a method of making these abstract ideas of vectors, linear transformations, and linear functionals more “concrete”. At the very heart of linear algebra still remains the idea of a set whose elements can be added and scaled, and functions between these sets that “respect” the addition and scaling. But when we want to actually compute things, matrices will often come in handy.


 Matrix on Wikipedia

Linear Algebra Done Right by Sheldon Axler

Algebra by Michael Artin

Abstract Algebra by David S. Dummit and Richard M. Foote

More on Chain Complexes

In Homology and Cohomology we used the concept of chain complexes to investigate topological spaces. In Exact Sequences we saw examples of chain complexes generalized to abelian groups other than that made out of topological spaces. In this post we study chain complexes in the context of linear algebra (see Vector Spaces, Modules, and Linear Algebra).

We start with some definitions regarding modules. In More on Vector Spaces and Modules we gave the definition of a basis of a vector space. It is known that any vector space can always have a basis. However, the same is not true for modules. It is only a certain special kind of module called a free module which has the property that one can always find a basis for it.

Alternatively, a free module over a ring R may be thought of as being a module that is isomorphic to a direct sum of several copies of the ring R.

An example of a module that is not free is the module \mathbb{Z}/2\mathbb{Z} over the ring \mathbb{Z}. It is a module over \mathbb{Z} since it is closed under addition and under multiplication by any element of \mathbb{Z}, however a basis that will allow it to be written as a unique linear combination of elements of the basis cannot be found, nor is it a direct sum of copies of \mathbb{Z}.

Although not all modules are free, it is actually a theorem that any module is a quotient of a free module. Let A be a module over a ring R. The theorem says that this module is the quotient of some free module, which we denote by F_{0}, by some other module which we denote by K_{1}. In other words,


We can write this as the following chain complex, which also happens to be an exact sequence (see Exact Sequences):

0\rightarrow K_{1}\xrightarrow{i_{1}} F_{0}\xrightarrow{\epsilon} A\rightarrow 0

We know that the module F is free. However, we do not know if the same holds true for K_{1}. Regardless, the theorem says that any module is a quotient of a free module. Therefore we can write

0\rightarrow K_{2}\xrightarrow{i_{2}} F_{1}\xrightarrow{\epsilon_{1}} K_{1}\rightarrow 0

We can therefore put these chain complexes together to get

0\rightarrow K_{2}\xrightarrow{i_{2}} F_{1}\xrightarrow{\epsilon_{1}} K_{1}\xrightarrow{i_{1}} F_{0}\xrightarrow{\epsilon} A\rightarrow 0

However, this sequence of modules and morphisms is not a chain complex since the image of \epsilon_{1} is not contained in the kernel of i_{1}. But if we compose these two maps together, we obtain

0\rightarrow K_{2}\xrightarrow{i_{2}} F_{1}\xrightarrow{d_{1} }F_{0}\xrightarrow{\epsilon} A\rightarrow 0

where d_{1}=i_{1}\circ \epsilon_{1}. This is a chain complex as one may check. We can keep repeating the process indefinitely to obtain

...\xrightarrow{d_{3}} F_{2}\xrightarrow{d_{2} } F_{1}\xrightarrow{d_{1} } F_{0}\xrightarrow{\epsilon} A\rightarrow 0

This chain complex is called a free resolution of A. A free resolution is another example of an exact sequence.

We now introduce two more special kinds of modules.

A projective module is a module P such for any surjective morphism p: A\rightarrow A'' between two modules A and A'' and morphism h: P\rightarrow A'', there exists a morphism g: P\rightarrow A such that p\circ g=h.

It is a theorem that a module is projective if and only if it is a direct summand of a free module. This also means that a free module is automatically also projective.

An injective module is a module E such for any injective morphism i: A\rightarrow B between two modules A and B and morphism f: A\rightarrow E, there exists a morphism g: B\rightarrow E such that g\circ i=f.

Similar to our discussion regarding free resolutions earlier, we can also have projective resolutions and injective resolutions. A projective resolution is a chain complex

...\xrightarrow{d_{3}} P_{2}\xrightarrow{d_{2} } P_{1}\xrightarrow{d_{1} } P_{0}\xrightarrow{\epsilon} A\rightarrow 0

such that the P_{n} are projective modules.

Meanwhile, an injective resolution is a chain complex

...0\rightarrow A\xrightarrow{\eta} E^{0}\xrightarrow{d^{0} } E^{1}\xrightarrow{d^{1}} E^{2}\xrightarrow{d^{2}} ...

such that the E^{n} are injective modules.

Since projective and injective resolutions are chain complexes, we can use the methods of homology and cohomology to study them (Homology and Cohomology) even though they may not be made up of topological spaces. However, the usual procedure is to consider these chain complexes as forming an “abelian category” and then applying certain functors (see Category Theory) such as what are called the “Tensor” and “Hom” functors before applying the methods of homology and cohomology, resulting in what are known as “derived functors“. This is all part of the subject known as homological algebra.


Free Module on Wikipedia

Projective Module on Wikipedia

Injective Module on Wikipedia

Resolution on Wikipedia

An Introduction to Homological Algebra by Joseph J. Rotman

Abstract Algebra by David S. Dummit and Richard M. Foote

Exact Sequences

In Homology and Cohomology we introduced the idea of chain complexes to help us obtain information about topological spaces. We recall that a chain complex is made up of abelian groups of spaces C_{n} and boundary homomorphisms \partial_{n}: C_{n}\rightarrow C_{n-1} such that for all n the composition of successive boundary homomorphisms \partial_{n-1}\circ \partial_{n}: C_{n}\rightarrow C_{n-2} sends every element in C_{n} to the zero element in C_{n-2}.

Chain complexes can be expressed using the following diagram:


We now abstract this idea, generalizing it so that the groups C_{n} do not necessarily have to be topological spaces, and show an example of a chain complex that is ubiquitous in mathematics.

First we recall some ideas from Homology and Cohomology. Our “important principle” was summarized in the following statement:

All boundaries are cycles.

Boundaries in C_{n} are elements of the image of the boundary homomorphism \partial_{n+1}. Cycles in C_{n} are elements of the kernel of the boundary homomorphism \partial_{n}. Therefore, we can also state our “important principle” as follows:

\text{Im }\partial_{n+1}\subseteq \text{Ker }\partial_{n} for all n

This is of course just another restatement of the defining property of all chain complexes that two successive boundary functions when composed send every element of its domain to the zero element of its range.

There is an important kind of chain complex with the following property:

\text{Im }\partial_{n+1}=\text{Ker }\partial_{n} for all n

Such a chain complex is called an exact sequence. Sometimes we just say that the chain complex is exact. We will show some simple examples of exact sequences, but for these examples we will drop the notation of the boundary homomorphism \partial_{n} to show that many properties of ordinary functions can be expressed in terms of exact sequences.

Consider, for example, abelian groups A, B, and C. The identity elements of A, B, and C will be denoted by 0, writing 0\in A, 0\in B, and 0\in C if necessary. We will also write 0 to denote the trivial abelian group consisting only of the single element 0. Let us now look at the exact sequence

0\rightarrow A\xrightarrow{f} B

where 0\rightarrow A is the inclusion function sending 0\in 0 to 0\in A. The image of this inclusion function is therefore 0\in A. By the defining property of exact sequences, this is also the kernel of the function f:A\rightarrow B. In other words, f sends 0\in A to 0\in B. It is a property of group homomorphisms that whenever the kernel consists of only one element, the homomorphism is an injective, or one-to-one, function. This means that no more than one element of the domain gets sent to the same element in the range. Since this function is also a homomorphism, it is also called a monomorphism.

Meanwhile, let us also consider the exact sequence

B\xrightarrow{g} C\rightarrow 0

where C\rightarrow 0 is the “constant” function that sends any element in C to 0. The kernel of this constant function is therefore the entirety of C. By the defining property of exact sequences, this is also the image of the function B\rightarrow C. In other words, the image of the function g is the entirety of C, or we can also say that every element of C is assigned by g to some element of B. Such a function is called surjective, or onto. Since this function is also a homomorphism, it is also called an epimorphism.

The exact sequence

0\rightarrow A\xrightarrow{f} B\xrightarrow{g} C\rightarrow 0

is important in many branches of mathematics, and is called a short exact sequence. This means that f is a monomorphism, g is an epimorphism, and that \text{im }f=\text{ker g} in B. As an example of a short exact sequence of abelian groups, we have

0\rightarrow 2\mathbb{Z}\xrightarrow{f} \mathbb{Z}\xrightarrow{g} \mathbb{Z}/2\mathbb{Z}\rightarrow 0

(see also Modular Arithmetic and Quotient Sets). The monomorphism f takes the abelian group of even integers 2\mathbb{Z} and “embeds” them into the abelian group of the integers \mathbb{Z}. The epimorphism g then sends the integers in \mathbb{Z} to the element 0 in \mathbb{Z}/2\mathbb{Z} if they are even, and to the element 1 in  \mathbb{Z}/2\mathbb{Z} if they are odd. We see that every element in \mathbb{Z} that comes from 2\mathbb{Z}, i.e. the even integers, gets sent to the identity element or zero element 0 of the abelian group \mathbb{Z}/2\mathbb{Z}.

In the exact sequence

0\rightarrow A\xrightarrow{f} B\xrightarrow{g} C\rightarrow 0

The abelian group B is sometimes referred to as the extension of the abelian group C by the abelian group A.

We recall the definition of the homology groups H_{n}:

H_{n}=\text{Ker }\partial_{n}/\text{Im }\partial_{n+1}.

We can see from this definition that a chain complex is an exact sequence (we can also say that the chain complex is acyclic) if all of its homology groups are zero. So in a way, the homology groups “measure” how much a chain complex “deviates” from being an exact sequence.

We also have the idea of a long exact complex, which usually comes from the homology groups of chain complexes which themselves form a short exact sequence. In order to discuss this we first need the notion of a chain map between chain complexes. If we have a chain complex

...\xrightarrow{\partial_{A, n+3}}A_{n+2}\xrightarrow{\partial_{A, n+2}}A_{n+1}\xrightarrow{\partial_{A, n+1}}A_{n}\xrightarrow{\partial_{A, n}}A_{n-1}\xrightarrow{\partial_{A, n-1}}A_{n-2}\xrightarrow{\partial_{A, n-2}}...

and another chain complex

...\xrightarrow{\partial_{B, n+3}}B_{n+2}\xrightarrow{\partial_{B, n+2}}B_{n+1}\xrightarrow{\partial_{B, n+1}}B_{n}\xrightarrow{\partial_{B, n}}B_{n-1}\xrightarrow{\partial_{B, n-1}}B_{n-2}\xrightarrow{\partial_{B, n-2}}...

a chain map is given by homomorphisms

f_{n}: A_{n}\rightarrow B_{n} for all n

such that the homomorphisms f_{n} commute with the boundary homomorphisms \partial_{A, n} and \partial_{B, n}, i.e.

\partial_{B, n}\circ f_{n}=f_{n-1}\circ \partial_{A, n} for all n.

A short exact sequence of chain complexes is then a short exact sequence

0\rightarrow A_{n}\xrightarrow{f_{n}} B_{n}\xrightarrow{g_{n}} C_{n}\rightarrow 0 for all n

where the homomorphisms f_{n} and g_{n} satisfy the conditions for them to form a chain map, i.e. they commute with the boundary homomorphisms in the sense shown above.

In the case that we have a short exact sequence of chain complexes, their homology groups will then form a long exact sequence:

...\rightarrow H_{n}(A)\xrightarrow{f_{*}}H_{n}(B)\xrightarrow{g_{*}}H_{n}(C)\xrightarrow{\partial}H_{n-1}(A)\xrightarrow{f_{*}}...

Long exact sequences are often used for calculating the homology groups of complicated topological spaces related in some way to simpler topological spaces whose homology groups are already known.


Chain Complex on Wikipedia

Exact Sequence on Wikipedia

Algebraic Topology by Allen Hatcher

A Concise Course in Algebraic Topology by J. P. May

Abstract Algebra by David S. Dummit and Richard M. Foote

Eilenberg-MacLane Spaces, Spectra, and Generalized Cohomology Theories

In Homotopy Theory we explained the concept of homotopy and homotopy groups to study topological spaces and continuous functions from one topological space to another. Meanwhile, in Homology and Cohomology, we introduced the idea of homology and cohomology for the same purpose. In this post, we discuss one certain way in which homotopy and cohomology may be related to each other. Since homology and cohomology are related, being “duals” of each other in some sense, by extension this also relates homotopy and homology.

We recall first how we defined homology: Given a space X, we construct a chain complex out of its subspaces. For subspaces of the same dimension, we must have the structure of an abelian group, so that these subspaces form chains on X. The abelian group of n-dimensional chains then have boundary functions or boundary maps (or boundary morphisms, since these chains form abelian groups) to the abelian group of n-1-dimensional chains.

The homology groups H_{n}(X) are then given as the quotient of the group of cycles (kernels of the n-th boundary morphisms) by the group of boundaries (images of the n+1-th boundary morphisms).

We also recall how we defined cohomology in terms of the homology. We define cochains on X as the abelian group of the functions from chains on X to some other abelian group G. Together with the appropriate coboundary morphisms, they form what is called a cochain complex.

The cohomology groups H^{n}(X,G) are defined, dually to the homology groups, as the quotient of the the group of cocycles (kernels of the n-th coboundary morphisms) by the group of coboundaries (images of the n-1-th coboundary morphisms).

We now review some important concepts in homotopy theory. The n-th homotopy group \pi_{n}(X) of a space X is defined as the group [S^{n},X] of homotopy classes of functions from the n-dimensional sphere (also simply called the n-sphere) to X. equipped with appropriate basepoints. The homotopy group \pi_{1}(X) is called the fundamental group of X.

An Eilenberg-MacLane space, written K(G,n), for a certain group G and a certain natural number n, is a topological space characterized by the property that its n-th homotopy group is G and all its other homotopy groups are trivial. In Category Theory we mentioned that the fundamental group of a circle is the group of integers \mathbb{Z}, corresponding to the number of times a loop winds around the circle before returning to its basepoint, with the direction taken into account. All the other homotopy groups of the circle are actually trivial, which makes it an Eilenberg-Maclane space K(\mathbb{Z},1).

For certain kinds of topological spaces called CW-complexes (spaces which can be built out of “cells” which are topological spaces homeomorphic to n-dimensional open or closed balls) we have the following interesting relationship between homotopy and cohomology:

\displaystyle \tilde{H}^{n}(X, G)\cong [X, K(G,n)]

where the symbol “\cong” denotes an isomorphism of groups and the \tilde{H}^{n}(X,G) are the reduced cohomology groups, obtained by “dualizing” the augmented chain complex of reduced homology, which has

\displaystyle ...\xrightarrow{\partial_{2}} C_{1}\xrightarrow{\partial_{1}} C_{0}\xrightarrow{\epsilon} \mathbb{Z}\rightarrow 0

instead of the usual

\displaystyle ...\xrightarrow{\partial_{2}} C_{1}\xrightarrow{\partial_{1}} C_{0}\xrightarrow{\partial_{0}} 0

for the purpose of making the homology groups of a topological space * consisting of a single point trivial. We have the following relation between the homology groups H_{n}(X) and the reduced homology groups \tilde{H}_{n}(X):

\displaystyle H_{n}(X)=\tilde{H}_{n}(X)\oplus H_{n}(*)

Let X_{+} be the topological space obtained by adjoining a disjoint basepoint to X. Then

\displaystyle H_{n}(X)=\tilde{H}_{n}(X_{+})

Similarly, for cohomology, we have the following relations:

\displaystyle H^{n}(X)=\tilde{H}^{n}(X)\oplus H^{n}(*)

\displaystyle H^{n}(X)=\tilde{H}^{n}(X_{+})

The idea of using the homotopy classes of functions from one space to another to define some kind of “generalized cohomology theory” leads to the theory of “spectra” in algebraic topology (it should be noted that the word “spectra” or “spectrum” has many different meanings in mathematics, and in this post we are referring specifically to the concept in algebraic topology).

We first introduce some definitions. The wedge sum of two topological spaces X and Y with respective basepoints x_{0} and y_{0}, is the topological space, written X\wedge Y, given by the quotient space (X\coprod Y)/\sim under the identification x_{0}\sim y_{0}. One can think of the space X\wedge Y as being obtained from X and Y by gluing their basepoints together. For example, the wedge sum S^{1}\wedge S^{1} of two circles can be thought of as the “figure eight”.

The smash product of two topological spaces X and Y, again with respective basepoints x_{0} and y_{0}, is the topological space, written X\vee Y, given by the quotient space (X\times Y)/(X\wedge Y). What this means is that we take the cartesian product of X and Y, and then we collapse a copy of the wedge sum of X and Y containing the basepoint into the basepoint.

The suspension of a topological space X is the topological space, written SX, given by the quotient space (X\times I)/\sim under the identifications (x_{1},0)\sim (x_{2},0) and (x_{1},1)\sim (x_{2},1) for any x_{1}, x_{2}\in X, where I is the unit interval [0,1]. When X is the circle S^{1}, then X\times I is the cylinder, and SX is the cylinder with both ends collapsed into points. The space SX also looks like two cones with their bases glued together.

The reduced suspension of a topological space X with basepoint x_{0} is the topological space, written \Sigma X, given by the quotient space (X\times I)/(X\times \{0\}\cup X\times \{1\}\cup\{x_{0}\}\times I). This can be thought of as taking the suspension SX of X and then collapsing a copy of the unit interval containing the basepoint into the basepoint.

We note a couple of results regarding smash products and reduced suspensions. First, the smash product S^{1}\vee X of the circle S^{1} and a topological space X is homeomorphic to the reduced suspension \Sigma X of X. Second, the smash product S^{m}\vee S^{n} of an m-dimensional sphere S^{m} and an n-dimensional sphere S^{n} is homeomorphic to an m+n-dimensional sphere S^{m+n}.

The loop space \Omega X of a space X with a chosen basepoint x_{0} is the topological space whose underlying set is the set of loops beginning and ending at  x_{0} and equipped with an appropriate topology called the compact-open topology.

We have the following relation between the concepts of reduced suspension and loop space:

\displaystyle [\Sigma X, Y]\cong [X, \Omega Y]

By the definition of homotopy groups and of Eilenberg-MacLane spaces, together with the properties of the reduced suspension and of the smash product, this implies that

\Omega K(G,n)\cong K(G, n-1)

where the symbol \cong in this context denotes homeomorphism of topological spaces. We can repeat the process of taking the loop space to obtain

\Omega^{2} K(G,n)=\Omega (\Omega K(G,n))\cong K(G, n-2)

More generally, we will have

\Omega^{m} K(G,n)=\Omega K(G,n)\cong K(G, n-m) for m<n

We now define the concepts of prespectra and spectra. A prespectrum T is a sequence of spaces T_{n} with basepoints and basepoint-preserving maps \sigma: \Sigma T_{n}\rightarrow T_{n+1}. A spectrum E is a prespectrum with adjoints \tilde{\sigma}: E_{n}\rightarrow \Omega E_{n+1} which are homeomorphisms.

The Eilenberg-MacLane spaces form a spectrum, called the Eilenberg-MacLane spectrum. There are other spectra that will result in other generalized cohomology theories, which are defined to be functors (see Category Theory) from the category of pairs from pairs of topological spaces to the category of abelian groups, together with a natural transformation corresponding to a generalization of the boundary map, required to satisfy a set of conditions called the Eilenberg-Steenrod axioms. It follows from a certain theorem, called the Brown representability theorem, that every generalized cohomology theory comes from some spectrum, similar to how ordinary cohomology comes from the Eilenberg-MacLane spectrum.


Eilenberg-MacLane Space on Wikipedia

Spectrum on Wikipedia

Eilenberg-Steenrod Axioms on Wikipedia

Algebraic Topology by Allen Hatcher

A Concise Course in Algebraic Topology by J. P. May

My Favorite Equation in Physics

My favorite equation in physics is none other than Newton’s second law of motion, often written as

\displaystyle F=ma.

I like to call it the “Nokia 3310 of Physics” – the Nokia 3310 was a popular cellular phone model, back in the older days before smartphones became the norm, and which to this day is still well-known for its reliability and its durability. In the same way, Newton’s second law of motion, although superseded in modern physics by relativity and quantum mechanics, is still quite reliable for its purposes, was historical and groundbreaking for its time, and remains the “gold standard” of physical theories for its simplicity and elegance.

In fact, much of modern physics might be said to be just one long quest to “replace” Newton’s second law of motion when it was found out that it didn’t always hold, for example when things were extremely small, extremely fast, or extremely massive, or any combination of the above. Therefore quantum mechanics was developed to describe the physics of the extremely small, special relativity was developed to describe the physics of the extremely fast, and general relativity was developed to describe the physics of the extremely massive. However, a physical theory that could deal with all of the above – a so-called “theory of everything” – has not been developed yet, although a great deal of research is dedicated to this goal.

This so-called “theory of everything” is of course not literally a theory of “everything”. One can think of it instead as just a really high-powered, upgraded version of Newton’s second law of motion that holds even when things were extremely small, extremely fast, and extremely massive.

(Side note: There’s usually other things we might ask for in a “theory of everything” too. For instance, we usually want the theory to “unify” the four fundamental forces of electromagnetism, the weak nuclear force, the strong nuclear force, and gravity. As far as we currently understand, all the ordinary forces we encounter in everyday life, in fact all the forces we know of in the universe, are just manifestations of these four fundamental forces. It’s a pretty elegant scientific fact, and we want our theory to be even more elegant by unifying all these forces under one concept.)

All this being said, we look at a few aspects of Newton’s second law of motion. Even those who are more interested in the more modern theories of physics, or who want to pursue the quest for the “theory of everything”, might be expected to have a reasonably solid understanding, and more importantly an appreciation, for Newton’s second law.

The meaning of the equation is familiar from high school physics: The acceleration (the change in the velocity with respect to time) of an object is directly proportional to the force applied, in the same direction, and is inversely proportional to the mass of the object. Let’s simplify things for a moment and focus only on one dimension of motion, so we don’t have to worry too much about the direction (except forward/backward or upward/downward, and so on). We also assume that the mass of the object is constant.

First of all, we note that, given the definition of acceleration, Newton’s second law of motion is really a differential equation, expressible in the following form:

\displaystyle F=m\frac{dv}{dt}

or, since the velocity v is the derivative \frac{dx}{dt} of the position x with respect to the time t, we can also express it as

\displaystyle F=m\frac{d(\frac{dx}{dt})}{dt}

or, in more compact notation,

\displaystyle F=m\frac{d^{2}x}{dt^2}.

We first discuss this form, which is a differential equation for the position. We will go back to the first form later.

The force F itself may have different forms. One particularly simple form is for the force of gravity exerted by the Earth on objects near its surface. In this case we can use the approximate form F=-mg, where g is a constant with a value of around 9.81 \text{m}/\text{s}^{2}. The minus sign is there for purposes of convention, since this force is always in a downward direction, and we take “up” to be the positive direction. We can then take x to be the height of the object above the ground.

We have in this specific case (called “free fall”) the following expression of Newton’s second law:

\displaystyle -mg=m\frac{d^{2}x}{dt^2}

\displaystyle -g=\frac{d^{2}x}{dt^2}

We can then apply our knowledge of calculus so that we can obtain an expression telling us how the height object changes over time. We skip the steps and just give the answer here:


where x_{0} and v_{0} are constants, respectively called the initial position and the initial velocity, which need to be specified before we can give the height of the object above the ground at any time t.

We go back to the first form we wrote down above to express Newton’s second law of motion as the following differential equation for the velocity:

\displaystyle F=m\frac{dv}{dt}

In the case of free fall, this is

\displaystyle -mg=m\frac{dv}{dt}

\displaystyle -g=\frac{dv}{dt}

This can be solved to obtain the following expression for the velocity at any time t:


We collect our results here, and summarize. By solving Newton’s second law of motion for this particular system, using the methods of calculus, we have the two equations for the position and velocity at any time t.



But to obtain the position and velocity at any time t, we also need two constants x_{0} and v_{0}, which we respectively call the initial position and the initial velocity.

In other words, when we know the “specifications” of a system, such as the law of physics that governs it, and the form of the force in our case, and we know the initial position and the initial velocity, then it is possible to know the position and the velocity at any other point in time.

This is a special case of the following question that always appears in physics, whether it is classical mechanics, quantum mechanics, or most other branches of modern physics:

“Given the state of a system at a particular time, in what state will it be at some other time?”

In classical mechanics, the “state” of a system is given by its position and velocity. Equivalently, it may also be given by its position and momentum. In quantum mechanics it is a little different, and there is this concept often referred to as the “wavefunction” which gives the “state” of a system. In elementary contexts the wavefunction may be thought of as giving the probability of a particle to be found at a specific position (more precisely, this probability is the “amplitude squared” of the wavefunction). Since quantum mechanics involves probabilities, a variant of this question is the following:

“Given that a system is in a particular state at some particular time, what is the probability that it will be in some other specified state at some other specified time?”

We now go back to classical mechanics and Newton’s second law, and focus on some historical developments. It is perhaps worth mentioning that before Isaac Newton, Galileo Galilei already had ideas on force and acceleration, evident in his book Two New Sciences. Anyway, Newton’s magnum opus Mathematical Principles of Natural Philosophy was where it was first stated in its most familiar form, and where it was used as one of the ingredients needed to put together Newton’s theory of universal gravitation. It was around this time that the study of mechanics became popular among many thinkers of the era.

Meanwhile, also around this time, another branch of physics was gaining ground in popularity. This was the field of optics, which studied the motion of light just as mechanics studied the motion of more ordinary material objects. Just as Newton’s second law of motion, along with the first and third laws, made up the basics of mechanics, the basic law in optics was given by Fermat’s principle, which is given by the following statement:

Light always moves in such a way that it minimizes its time of travel.

It was the goal of the scientists and mathematicians of that time to somehow “unify” these two seemingly separate branches of physics; this was especially inviting since Fermat’s principle seemed even more elegant than Newton’s second law of motion.

While the physical relationship between light and matter would only be revealed with the advent of quantum mechanics, the scientists and mathematicians of the time were at least able to come up with a language for mechanics analogous to the very elegant statement of Fermat’s principle. This was developed over a long period of time by many historical figures such as Pierre de Maupertuis, Leonhard Euler, and Joseph Louis Lagrange. This was fully accomplished in the 19th century by the mathematician William Rowan Hamilton.

This quest gave us many alternative formulations of Newton’s second law; what it says in terms of physics is exactly the same, but it is written in a more elegant language. Although during the time people had no idea about quantum mechanics or relativity, these formulations would become very useful for expressing these newly discovered laws of physics later on. The first of these is called the Lagrangian formulation, and its statement is the following:

An object always moves in such a way that it minimizes its action.

This “action” is a quantity defined as the integral over time of another quantity called the “Lagrangian” which is usually defined as the difference of the expressions for the kinetic energy and the potential energy, both concepts which are related to the more familiar formulation of Newton (although developed by his often rival Gottfried Wilhelm Liebniz).

Another formulation is called the Hamiltonian formulation, and what it does is give us a way to imagine the time evolution of the “state” of the object (given by its position and momentum) in a “space of states” called the “phase space“. This time evolution is given by a quantity called the Hamiltonian, which is usually defined in terms of the Lagrangian.

The Lagrangian and Hamiltonian formulations of classical mechanics, as we stated earlier, contain no new physics. It is still Newton’s second law of motion. It is still F=ma. It is just stated in a new language. However, relativity and quantum mechanics, which do contain new physics, can also be stated in this language. In quantum mechanics, for example, the state at a later time is given by applying a “time evolution operator” defined using the Hamiltonian to a “state vector” representing the current state. Meanwhile, the probability that a certain state will be found in some other specified state can be found using the path integral, which is defined using the action, or in other words, the Lagrangian.

We have thus reviewed Newton’s second law of motion, one of the oldest laws of physics humankind has discovered since the classical age, and looked at it in the light of newer theories. There will always be new theories, as such is the nature of physics and science as a whole, to evolve, to improve. But there are some ideas in our history that have stood the test of time, and in the cases where they had to be replaced, they have paved the way for their own successors. Such ideas, in my opinion, will always be worth studying no matter how old they become.


Classical Mechanics on Wikipedia

Lagrangian Mechanics on Wikipedia

Hamiltonian Mechanics on Wikipedia

Mechanics by Lev Landau and Evgeny Lifshitz

Classical Mechanics by Herbert Goldstein

An Intuitive Introduction to Calculus

In this post, we dial back a bit on the mathematical sophistication and take on a subject that should be familiar to most people in STEM (Science, Technology, Engineering and Mathematics) fields but largely possessing a scary reputation to those with less mathematical training – calculus. For that matter, there are even those who can wield calculus as a tool for producing numbers, but with little appreciation for the ideas at work behind it.

This post will be dependent largely on intuition as opposed to rigor; references will be provided at the end of the post for those who want a look at the inner workings behind calculus. However, we will not shy away from notation either, in the hopes that people who find the symbols intimidating will instead appreciate them as helpful tools expressing elegant ideas.

I. The Derivative

We will begin with a somewhat whimsical example. Consider a house, and suppose in this house the living room is twice as large (in area for example) as the dining room. Now suppose we hit the entire house with a “shrinking ray”, shrinking it down so that it can no longer be seen with the naked eye nor measured with our usual measuring instruments. So if someone asks for the size of the house now, we might as well say that it’s negligible. What about the size of the living room? Negligible as well. The size of the dining room? Also negligible.

However, as long as this “shrinking ray” works the way it does in the movies, we can ask how big the living room is compared to the dining room, and the answer is the same as it was before. It is twice as big, even though both are now of essentially negligible size.

It is this idea that lies at the heart of calculus; an expression built out of negligible quantities may turn out to not be negligible at all. In our example, the particular expression in question is a ratio of two negligible quantities. This will lead us to the notion of a derivative.

But first, we will consider another example, which at first glance will seem unrelated to our idea of negligible quantities. Suppose that we want to find out about the speed of a certain car as it travels a distance of 40 kilometers from City A to City B. Suppose the car has no speedometer, or that the speedometer is busted.  The only way we can figure out the speed is from its definition as the ratio of the distance traveled to the time of travel. Suppose the journey from City A to City B takes place in an hour. This would mean that the car is travelling at a rather leisurely (or slow) speed of 40 kilometers per hour.

But suppose we provide additional information about the trip. Let’s say that the first half of the distance between City A and City B took up 45 minutes, due to heavy traffic. Therefore the second half of the distance was traversed in only fifteen minutes. Then we can say that the car was actually travelling pretty slowly at around 27 kilometers per hour for the first half of the distance. For the second half of the distance, the car was actually travelling pretty fast at 80 kilometers per hour.

Let’s provide even more information about the trip. Let’s say that the last quarter of the distance took up ten minutes of the trip. So the car was traveling slower again, at 60 kilometers per hour. This means the third quarter of the distance, a distance of 10 kilometers, was traversed in only 5 minutes. In other words, the car was travelling quite fast at 120 kilometers per hour.

Although the car is travelling at a rather slow speed on the average, there is a part of the trip where the car travels pretty fast. We only learn about this if we look at parts of the trip instead of just averaging on the whole. And we learn more if we look at smaller and smaller parts – perhaps we can learn most if we look at parts so small, that they might as well be negligible. And so we make contact, once again, with the core idea that we are trying to discuss.

As may be familiar from high school physics, the average speed (we symbolize it here by v) is often written as the ratio between differences in the distances (symbolized here by x) and the times (symbolized here by t). In mathematics, a quantity which is a difference of two other quantities is often written using the symbol \Delta. Therefore, the formula for the velocity may be written as follows:

\displaystyle v=\frac{\Delta x}{\Delta t}

However, as we have demonstrated in our example above, we learn more by dividing the quantities into smaller and smaller parts.

When our quantities, in this particular example the differences of distances and times, are so small that they might as well be negligible, instead of using the symbols \Delta x and \Delta t we instead write dx and dt. Therefore we write

\displaystyle v=\frac{dx}{dt}

We review some of the concepts we have discussed. What is the value of dx? It’s essentially negligible; it’s not quite zero, but very close to it, that we can’t really say anything about it anymore. What about dt? Again, essentially negligible. But what about v? Despite it being a ratio of two “essentially negligible” quantities, v itself is not negligible!

As demonstrated earlier in our example, v will be different depending on which part of the journey we are considering. To make this more explicit, we can write

\displaystyle v=\frac{dx}{dt}|_{t=t_{0}}


\displaystyle v=\frac{dx}{dt} (t_{0})

to mean that we mean v at the specific instant of time t_{0}. Because we are looking at specific instants of time, we refer to this speed as the instantaneous speed. If we had a speedometer, we could simply take a reading from it at the specific instant of time t_{0}, and the reading would be the instantaneous speed. However, we assumed that we could not do this, therefore to figure out the instantaneous speed we need to take extremely small measurements of distance and extremely small measurements of time, somewhere around the  specific instant of time t_{0}, and take their ratio.

The problem, of course, is that the quantities are so small that they are essentially negligible. We may not be able to measure them, since they may be so much smaller than the limits of accuracy of our measuring instruments. So how could we get such a ratio, which is not negligible, from two essentially negligible quantites that we cannot even measure? Luckily, if we can express one quantity as a function of the other, we can have a way of calculating this ratio. We discuss this method next.

In mathematics, we often use the concept of functions to express how one quantity depends on another. In this case, given a particular form of a function f(x), we can obtain \frac{df}{dx} as another function of x. We need our function f(x) to be “continuous” so that we always know that df is extremely small whenever dx is extremely small.

Consider the quantity

\displaystyle \frac{f(x+\epsilon)-f(x)}{(x+\epsilon)-(x)}

This is a ratio of differences; the numerator is a difference, and so is the denominator; therefore we can also write this as

\displaystyle \frac{\Delta f}{\Delta x}

Now suppose the quantity \epsilon is extremely small. In this case, the denominator may be rewritten; instead of writing \Delta x, we write dx, since the difference between x+\epsilon and x is extremely small (in fact it is just \epsilon). As for the numerator, if the function is continuous as described above, then we know automatically that it is also extremely small, and we may therefore also write df instead of \Delta f. Therefore, we have

\displaystyle \frac{df}{dx}=\frac{f(x+\epsilon)-f(x)}{(x+\epsilon)-(x)} when \epsilon is extremely small (essentially negligible)

Let’s see an explicit example of this in action. Suppose we have f(x)=x^{2}. Then we have

\displaystyle \frac{\Delta f}{\Delta x}=\frac{(x+\epsilon)^{2}-x^{2}}{(x+\epsilon)-(x)}

Using some high school algebra, we can expand and simplify the right hand side:

\displaystyle \frac{\Delta f}{\Delta x}=\frac{x^{2}+2x\epsilon+\epsilon^{2}-x^{2}}{x+\epsilon-x}

\displaystyle \frac{\Delta f}{\Delta x}=\frac{2x\epsilon+\epsilon^{2}}{\epsilon}

\displaystyle \frac{\Delta f}{\Delta x}=\frac{\epsilon(2x+\epsilon)}{\epsilon}

\displaystyle \frac{\Delta f}{\Delta x}=\frac{(2x+\epsilon)}{1}

\displaystyle \frac{\Delta f}{\Delta x}=2x+\epsilon

Now to obtain \frac{df}{dx}, we just need \epsilon to be extremely small, essentially negligible; in any case, it should be much smaller than any other term in the expression that we might as well chalk it up to measurement error, like the difference between the weight of a sack of flour and the weight of a sack of flour with one extra grain of flour. In other words, 2x+\epsilon and 2x are essentially the same; and \epsilon is so small that we might as well set it to zero. Therefore we have

\displaystyle \frac{df}{dx}=2x

Note that df by itself is essentially negligible; in this case it is equal to 2x\epsilon+\epsilon^{2}, and since \epsilon is extremely small, the entire expression itself is also extremely small and essentially negligible. Of course, dx is just \epsilon, and is also extremely small and essentially negligible. But in accordance with our “important idea” stated earlier, the ratio \frac{df}{dx}, called the derivative of f with respect to x, is not neglible. The derivative of  f with respect to x is also often written f'(x).

So going back to our example of the car going from City A to City B, if we could have an expression that gives us the distance traveled by the car in terms of its time of travel, we could calculate the instantaneous speed at any time by taking the derivative of that expression.

If a certain quantity depends on several other quantities, for example, if it is a function f(x,y,z) of several independent variables x, y, and z, the derivative of f with respect to any one of the independent variables, suppose x, is called the partial derivative of f with respect to x, and written \frac{\partial f}{\partial x}, or sometimes \partial_{x} f.

II. The Integral

We next discuss another expression, aside from a ratio, that is made out of essentially negligible quantities but is not itself negligible. Consider the weight of a grain of flour; like stated in an earlier example, we often think of it as essentially negligible. The weight of a sack of flour, on the other hand, is certainly not often thought of as negligible. But the sack of flour itself is made up of grains of flour; therefore these “essentially” negligible things, when there are many enough of them, may combine into something that is not itself negligible.

We consider a summation of a certain number of terms, and we also consider an interval of real numbers from the real number a to the real number b. If we will sum two terms then we will divide this interval into two, if we will sum three terms we will divide this interval into three, and so on. Consider now the summation of five terms


where f(x) is a function of real numbers, defined on the interval from a to b and x_{1}, x_{2}, x_{3}, andx_{4} are quantities between a and b dividing the interval from a to b into five equal parts. If we have, for example a=0 and b=100, then x_{1}=20x_{2}=40x_{3}=60, and x_{4}=80.

For further simplification we can also write x_{5}=b and x_{0}=a. We can then write the same sum as


This can be expressed in more compact notation as

\displaystyle \sum_{i=1}^{5} f(x_{i})(x_{i}-x_{i-1})

or, to keep it general, we divide the interval between a and b into n subdivisions where n is any positive integer instead of just five, so that we have

\displaystyle \sum_{i=1}^{n} f(x_{i})(x_{i}-x_{i-1})

Note that as the number n of subdivisions of the interval between a and b increases, the quantity (x_{i}-x_{i-1}) decreases. We consider another sum, namely

\displaystyle \sum_{i=1}^{n} f(x_{i-1})(x_{i}-x_{i-1})

This is different from the other sum above. However, we note that as we increase the number of subdivisions, the quantity (x_{i}-x_{i-1}) decreases, and the quantities x_{i} and  x_{i-1} become essentially equal to each other. If our function f(x) is “continuous”, then f(x_{i}) and f(x_{i-1}) become essentially equal to each other too. When we reach so many subdivisions that the quantity (x_{i}-x_{i-1}) becomes extremely small or essentially negligible, and the quantities f(x_{i}) and f(x_{i-1}) become essentially equal to each other, we write

\displaystyle \int_{a}^{b}f(x)dx=\sum_{i=1}^{n} f(x_{i})(x_{i}-x_{i-1})=\sum_{i=1}^{n} f(x_{i-1})(x_{i}-x_{i-1})

The summation \int_{a}^{b}f(x)dx is called the integral of f(x) from a to b.

The integral is a sum of terms of the form f(x)dx, which is the product of f(x) multiplied by dx. In fact, the integral symbol itself \int, is simply an old version of the letter “s”, to show that it stands for sum, in the same way that the letter “d” in dx represents a very small difference.

If we think of dx as a very small “width” and f(x) as some sort of “height”, then f(x)dx is some kind of very small “area” of a very “thin” rectangle, and the integral \int_{a}^{b}f(x)dx gives the total area under the curve f(x) from a to b, taken by dividing this area into very “thin” rectangles and adding up the areas of each of these rectangles.

But there are also other quantities of the form f(x)dx. For example, if we think of f(x)dx as the probability that a certain quantity, say, the height of a random person one may meet on the street, has a value very close to x, then the integral \int_{a}^{b}f(x)dx gives us the total probability that this quantity has a value between a and b.

III. The Fundamental Theorem of Calculus

Unlike the derivative, the integral is actually rather difficult to compute. It is a sum of very many terms, each of which is a product of one quantity with an essentially negligible quantity. However, there is a shortcut to computing the integral, which involves the derivative, and the discovery of this “shortcut” is one of the greatest achievements in the history of mathematics.

This “shortcut” which relates the integral and the derivative is so important and so monumental that it is called the fundamental theorem of calculus, and its statement is as follows:

\displaystyle \int_{a}^{b}\frac{df}{dx}dx=f(b)-f(a)

The notation itself is already very suggestive as to the intuition between this theorem. It is also perhaps worth noting that the integral is a sum of products, while the derivative is a ratio of differences. The function f(x) is defined in the interval from a to b, so if we sum all the tiny differences df from f(a) to f(b) we would get f(b)-f(a). Of course the rigorous proof of this theorem is much more involved than this “plausibility argument” that we have presented here.

In practice what this means if we want to solve for the integral of a function we only need to “reverse” what we did to solve for the derivative. This is why the integral (more precisely the “indefinite” integral) is also sometimes called the antiderivative. Earlier we solved for the derivative of the function f(x)=x^{2} with respect to x and obtained \frac{df}{dx}=2x. If we now ask what is the integral of 2x from a to b, we will find, using the fundamental theorem of calculus, that \int_{a}^{b}2xdx=b^{2}-a^{2}.

We now summarize, in an effort to show that there’s nothing to be scared of when it comes to derivatives and integrals:

\displaystyle \frac{df}{dx} is a ratio of extremely small differences \displaystyle df and \displaystyle dx

\displaystyle \int_{a}^{b}f(x)dx is a sum of the products of \displaystyle f(x) and the extremely small differences \displaystyle dx

That being said, despite the rather simple intuition we have laid out here, making the language of calculus “precise”, or rather “rigorous”, is a much bigger task that has taken centuries of development, and even after mathematicians have agreed on how to develop calculus it has been “resurrected” time and time again to come up with even more powerful versions of calculus. For example, there is a version of the integral called the Lebesgue integral, in which instead of dx we use a quantity called the “measure” which may not necessarily be extremely small. Still, this integral is still a sum of products, and the concept of extremely small and essentially negligible quantities still shows up, since there will be parts where the measure will become extremely small and essentially negligible.

As for dealing with the extremely small and essentially negligible, this is made more precise using the language of limits. In older times, another concept was used, called infinitesimals, however there were so many questions from philosophers that the concept was basically abandoned in favor of the language of limits. The concept of infinitesimals itself is the subject of much research in modern times, since it was found that along with modern developments in mathematics they can still become useful and now made more precise. The modern study of calculus and its related subjects goes by the much simpler-sounding name of analysis.

Finally, even though we attempted an answer to the question “What is calculus?” we did not really explain how to do calculus or how to apply it, although there are tons and tons of applications of calculus in the modern world.For all these we will direct the reader to the references at the end of the post, which will help those who want to learn more about the subject.

In my opinion however, the best way to learn calculus is to master first the language of limits, or more generally learn how to deal with extremely small and essentially negligible quantities, without necessarily going to derivatives and integrals, since derivatives and integrals themselves are merely specific applications of the philosophy, once more recalled here, that an expression built out of negligible quantities may turn out to not be negligible at all. And it is best to learn how to deal with this in the most general case. In some schools this makes up the subject of precalculus. I would recommend very much the classic book of one of the greatest mathematicians of all time, Leonhard Euler, entitled Introduction to Analysis of the Infinite, of which translations abound on the internet.


Calculus on Wikipedia

Introduction to Analysis of the Infinite by Leonhard Euler (translated by Ian Bruce)

Calculus by James Stewart

Principles of Mathematical Analysis by Walter Rudin

Projective Geometry

Infinity is quite the tricky concept in mathematics. While there’s a basic concept of what “infinity” means in the foundations of mathematics, even non-mathematicians will probably recognize that it’s a very challenging concept to grapple with. Still, some notion related to infinity might be convenient to have in many branches of mathematics, so these different branches will usually come up with some “variant” of the concept of infinity that will be easier to deal with yet still provide the convenience we desire from such a concept.

In this post we discuss one way in which we incorporate the concept of infinity in algebraic geometry. We often hear, for example, of how parallel lines “meet at infinity”; it is the goal of this post to somehow give the idea of what that statement means. In addition, this concept can be convenient for many purposes, for instance in the study of elliptic curves.

Consider lines in the xy plane, with the additional condition that they must pass through the origin. The equations of these lines are always of the form


since these lines must have zero x-intercept or y-intercept in order for them to pass through the origin. A horizontal line corresponds to the equation


i.e. m=0, while a diagonal line has


i.e. m=1. Meanwhile, a vertical line corresponds to the equation


which one may express, alternatively, as “m=\infty“.

Let us now consider another line, a vertical line at x=1, and look at where our lines which pass through the origin meet this vertical line. The horizontal line y=0 meets the vertical line x=1 at the point (1,0). The diagonal line y=x meets the vertical line x=1 at the point (1,1). The vertical line 0=x does not meet the vertical line x=1 at all. More generally, every line that passes through the origin, except the vertical line 0=x, meets the vertical line x=1 at a single point. Conversely, at every point on the vertical line x=1 there is one line that passes through the origin and also passes through that point on the vertical line x=1.

So the set of lines on the plane that pass through the origin and the set of points on the line are almost in one-to-one correspondence with one another, except that there is one line, the vertical line 0=x, which does not correspond to any point on the line.

This gives us a method of obtaining a version of the (real) line with a “point at infinity”. We declare the “points” of this “line” to be the lines in the plane that pass through the origin, and the vertical line that passes through the origin is declared to be the “point at infinity”. This “line” is called the (real) projective line. Note that it is not really a line with a set of points, but a set of lines in the plane that pass through the origin.

The procedure we performed earlier, setting up a vertical line at x=1 and looking at the points where it is met by the lines passing through the origin, is analogous to looking at the one-dimensional “shadows” or “projections” of these lines, and gives us the reasoning behind the term “projective line”. The “point at infinity” is that one line that has no shadow.

We can generalize this idea to form the (real) projective plane. The “points” of this “plane” are lines in three-dimensional space that pass through the origin. The “lines” of this “plane” are planes that pass that pass through the origin. Note that two parallel lines are merely “shadows” or “projections” of two planes that pass through the origin, and their intersection is a line through the origin, i.e. a point in the projective plane, which happens to be a “point at infinity”. If we set up another plane such that the two parallel lines show up as the “shadows” of planes, then the line where the two planes intersect will have no “shadow”. This is what we mean when we say that two parallel lines intersect at infinity.

We can actually generalize this to any dimension n+1; the set of lines through the origin in this n+1-dimensional space form what we call (real) projective space, written \mathbb{RP}^{n}.

We can also generalize this construction to the complex numbers; we simply take the set of all lines w=mz where z, w, and m are all complex numbers. Since the complex numbers form a plane, adding a “point at infinity” makes the complex projective line into a sphere. One can perhaps imagine this as “rolling up” the plane into a sphere, and the “point at infinity” “closes” the sphere. A more formal method of visualizing this relation is provided by the technique called “stereographic projection“, which is used by mapmakers to relate a globe to a flat map.

The complex projective line, written \mathbb{CP}^{1} has a special name; it is called the Riemann sphere. We also have higher-dimensional generalizations, called complex projective space, and written \mathbb{CP}^{n}.

There are other ways of defining projective spaces. For instance, the real projective line \mathbb{RP}^{1} can also be defined as the set of points of a circle centered at the origin with “antipodal” points “identified” via an equivalence relation (see Modular Arithmetic and Quotient Sets). Since a line in the plane through the origin crosses a circle at two points which are “antipodal” (“opposite”, in some sense), this is equivalent to our earlier definition in terms of lines through the origin. For higher-dimensional generalizations, we simply replace the circle by higher-dimensional spheres centered at the origin.

We mention one more method of defining projective spaces which is useful in algebraic geometry (see Basics of Algebraic Geometry). We define the n-dimensional projective space as the set of equivalence classes of points specified by n+1 (real or complex) coordinates under the equivalence relation

(a_{0}, a_{1}, ..., a_{n})\sim (\lambda a_{0}, \lambda a_{1}, ..., \lambda a_{n})

where \lambda is any nonzero constant. The idea is that the points (a_{0}, a_{1}, ..., a_{n}) and (\lambda a_{0}, \lambda a_{1}, ..., \lambda a_{n}) for all nonzero constants \lambda all lie on a single line which happens to pass through the origin.

In order to do algebraic geometry on projective space, the polynomials being studied need to be homogeneous, which means that all the terms of the polynomial must have the same degree. For instance, we cannot have


in projective space; instead we can only use something like


which is homogeneous. This is necessary to protect the defining property

(a_{0}, a_{1}, ..., a_{n})\sim (\lambda a_{0}, \lambda a_{1}, ..., \lambda a_{n})

of our projective space. However, there is a procedure for obtaining the ordinary (called “affine“) space from a projective space, similar to the “projection” construction described above, and in this case the homogeneous polynomial ty=x^2 in projective space becomes y=x^2 in affine space.


Projective Line on Wikipedia

Projective Plane on Wikipedia

Projective Space on Wikipedia

Algebraic Geometry by Robin Hartshorne

Basics of Algebraic Geometry

Update: A more technical discussion of varieties and schemes can be found in Varieties and Schemes Revisited.

Consider polynomial functions (of positive degree) on the complex plane. By the fundamental theorem of algebra, any polynomial function can be factored, up to a constant, into linear factors over the complex numbers:


The a_{i} are of course referred to as the roots of the polynomial. They are also called the zeroes of the polynomial, because we always have f(a_{i})=0 for any of the a_{i}. These linear factors are in one-to-one correspondence with the points of the complex plane \mathbb{C}; the factor {z-a_{i}} of course corresponds to the point  a_{i} in \mathbb{C}.

Making an analogy with the factorization of the integers, the linear factors can be thought of as the “primes” of polynomial functions with complex coefficients (the constant factors play the role of the “units” –  see The Fundamental Theorem of Arithmetic and Unique Factorization).

Therefore, one can see some kind of relation between the “primes” of a ring (see Rings, Fields, and Ideals) of polynomial functions on a space and the “points” of this space.

A linear factor z-a_{i} “generates” an ideal in the ring of polynomial functions with coefficients in \mathbb{C}, namely the set of “multiples” of z-a_{i}, written (z-a_{i}) alternatively, one may also describe this ideal as the set of polynomial functions that “vanish” at the point a_{i} in the complex plane \mathbb{C}. This ideal (z-a_{i}) is actually a maximal ideal (see More on Ideals) in this particular ring.

By a famous theorem called Hilbert’s Nullstellensatz (it is named after the German mathematician David Hilbert, while “nullstellensatz” is German for “theorem of the zeroes”), the maximal ideals generated by the linear factors in the ring of polynomial functions with complex coefficients are in one-to-one correspondence with the points of the complex plane \mathbb{C}.

We leave the complex plane for a while and recall some concepts from high school mathematics. We learned that there are certain “shapes” that can be described by polynomial equations. For example, in the xy plane, the equation y=x describes a line, y=x^2 describes a parabola, and the equation x^2+y^2=1 describes a circle. What this means is that the points comprising these “shapes” have (real number) coordinates x and y which satisfy their respective polynomial equations. We rewrite these equations as

y-x=0 (line)

y-x^2=0 (parabola)

x^2+y^2-1=0 (circle)

We can then say that the “line”, the “parabola”, and the “circle” are the sets of points whose coordinates x and y make their respective equations




equal to zero. Hence we also refer to these sets of points as the zero sets of their respective equations.

We can also have polynomial functions “on” these “shapes”, or rather, zero sets. These are just like ordinary polynomial functions, but we recall that these functions must also obey the equations that describe these sets. For instance, we can have the polynomial function y^2-5x^3+1 on the line which is the zero set of the equation y=x, but since all points on this line must have coordinates that satisfy the equation y=x, we can also write the polynomial function as x^2-5x^3+1 or y^2-5y^3+1. Note that a polynomial function such as 5x^3-5y^3 on this line is also the same as the polynomial function 0.

Using what we know from Modular Arithmetic and Quotient Sets, we can also rewrite the set of polynomial functions (with real coefficients) on the line described by the equation y=x as the quotient set (or quotient ring)


We now go back to the complex plane. The complex plane itself can also be considered as some kind of “shape”, and one that is described by an equation, namely one that is trivial, such as 1=1 or z=z. We can describe more complicated “shapes” using polynomial equations involving more than one complex variable. On these “shapes” there are also polynomial functions (with complex coefficients), and as in the case above, we can express the set of these polynomial functions as a quotient ring. For instance, if our “shape” is described by a single polynomial f(w,z) in two complex variables w and z, the polynomial functions in two complex variables on this “shape” form the quotient ring


Even in this more general case, Hilbert’s Nullstellensatz still provides us with a one-to-one correspondence between the points of this “shape” (the elements of the zero set of f(w,z)) and the maximal ideals of the quotient ring \mathbb{C}[w,z]/(f(w,z)).

A “shape” which is the zero set of a polynomial (or polynomials, for higher dimensions) is also referred to technically as an algebraic variety, or simply a variety. We can put a topology on a variety, called the Zariski topology (see Basics of Topology and Continuous Functions and Presheaves) by declaring the zero sets of polynomial functions on the variety to be the closed sets. For example, when the variety is the complex plane, the closed sets are the finite sets of points, which are the zero sets of polynomial functions on the complex plane.

Because of the one-to-one correspondence between the maximal ideals of a ring of polynomial functions on the variety and the points of the variety, we can take the point of view that the points of the variety are given by the maximal ideals of the variety. In other words, we need not draw or visualize anything anymore; the only thing we need to do is study the ideals of the ring we are given. This also means that the only thing we need to study the variety is the ring. One can also perhaps say that we are doing geometry, i.e. studying shapes, not by looking at the shapes themselves, but by looking at the functions on the shapes.

The idea of a variety can be further generalized by thinking of a “shape” whose points are given not only by the maximal ideals of a ring, but by its prime ideals. This leads into the concept of a scheme, which has “generic points” in addition to its “ordinary points” (those which are given by the maximal ideals).

The study of varieties and schemes, as well as the polynomial functions on them, are part of the branch of mathematics called algebraic geometry. The name comes from the use of concepts from abstract algebra, such as rings, fields, and ideals, to study geometry, but it should also be reminiscent of the algebra that is more familiar from high school mathematics, for example in the motivating example of polynomials and coordinates. One may also recognize algebraic geometry as a high-powered version of Cartesian geometry, once again from high school mathematics, also known as analytic geometry; however, in higher-level mathematics the word “analytic” has another meaning (related to calculus, in particular complex calculus), therefore we are further justified in using the name “algebraic geometry”.


Algebraic Geometry on Wikipedia

Algebraic Variety on Wikipedia

Scheme on Wikipedia

Algebra by Michael Artin

Algebraic Geometry by Robin Hartshorne

Even More Category Theory: The Elementary Topos

In More Category Theory: The Grothendieck Topos, we defined the Grothendieck topos as something like a generalization of the concept of sheaves on a topological space. In this post we generalize it even further into a concept so far-reaching it can even be used as a foundation for mathematics.

I. Definition of the Elementary Topos

We start by discussing the generalization of the universal constructions we defined in More Category Theory: The Grothendieck Topos, called limits and colimits.

Given categories \mathbf{J} and \mathbf{C}, we refer to a functor F: \mathbf{J}\rightarrow \mathbf{C} as a diagram in \mathbf{C} of type \mathbf{J}, and we refer to \mathbf{J} as an indexing category. We write the functor category of all diagrams in \mathbf{C} of type \mathbf{J} as \mathbf{C^{J}}.

Given a diagram F: \mathbf{J}\rightarrow \mathbf{C}, a cone to F is an object N of \mathbf{C} together with morphisms \psi_{X}: N\rightarrow F(X) indexed by the objects X of \mathbf{J} such that for every morphism f: X\rightarrow Y in  \mathbf{J}, we have F(f)\circ \psi_{X}=\psi_{Y}.

A limit of a diagram F: \mathbf{J}\rightarrow \mathbf{C} is a cone (L, \varphi) to F such that for any other cone (N, \psi)  to F there exists a unique morphism u: N\rightarrow L such that \varphi_{X}\circ \psi_{X} for all X in J.

For example, when \mathbf{J} is a category with only two objects A and B and whose only morphisms are the identity morphisms on each of these objects, the limit of the diagram F: \mathbf{J}\rightarrow \mathbf{C} is just the product. Similarly, the pullback is the limit of the diagram F: \mathbf{J}\rightarrow \mathbf{C} when \mathbf{J} is the category with three objects A, B, and C, and the only morphisms aside from the identity morphisms are one morphism A\xrightarrow{f}C and another morphism B\xrightarrow{g}C. The terminal object is the limit of the diagram F: \mathbf{J}\rightarrow \mathbf{C} when \mathbf{J} is the empty category, and the equalizer is the limit of the diagram F: \mathbf{J}\rightarrow \mathbf{C} when \mathbf{J} is the category with two objects A and B and whose only morphisms aside from the identity morphisms are two morphisms A\xrightarrow{f}B and A\xrightarrow{g}B.

A colimit is the dual concept to a limit, obtained by reversing the directions of all the morphisms in the definition. In the same way that the limit generalizes the concepts of product, pullback, terminal object, and equalizer, the colimit generalizes the concepts of coproduct, pushout, initial object, and coequalizer.

Next we discuss the concept of adjoint functors. Consider two categories \mathbf{C} and \mathbf{D}, and two functors F: \mathbf{C}\rightarrow \mathbf{D} and G: \mathbf{D}\rightarrow \mathbf{C}. We say that F is right adjoint to G, and that G is left adjoint to F, if for all objects C in \mathbf{C} and D in \mathbf{D} there exist bijections

\theta: \text{Hom}_{\mathbf{C}}(C, G(D))\xrightarrow{\sim}\text{Hom}_{\mathbf{D}}(F(C), D)

which are natural in the sense that given morphisms \alpha: C\rightarrow C' in \mathbf{C} and \xi: D'\rightarrow D in \mathbf{D}, we have

\theta(G(\alpha)\circ f\circ \xi)=\alpha\circ \theta(f)\circ F(\xi).

Suppose that products exist in \mathbf{C}. For a fixed object A of \mathbf{C}, consider the functor

A\times - : \mathbf{C}\rightarrow \mathbf{C}

which sends an object C of \mathbf{C} to the product A\times C in \mathbf{C}. If this functor has a right adjoint, we denote it by

(-)^{A}: \mathbf{C}\rightarrow \mathbf{C}.

We refer to the object A as an exponentiable object. We refer to the object B^{A} for some B in \mathbf{C} as an exponential object in \mathbf{C}. A category is called Cartesian closed if it has a terminal object and binary products, and if every object is an exponentiable object.

In the category \mathbf{Sets}, the exponential object B^{A} corresponds to the set of all functions from A to B. This also explains our notation for functor categories such as \mathbf{Sets^{C^{op}}} and \mathbf{C^{J}}.

Finally, we discuss the concept of subobject classifiers. We start by defining two important kinds of morphisms, monomorphisms and epimorphisms. A monomorphism (also called a mono, or monic) is a morphism f: X\rightarrow Y such that for all morphisms g_{1}: Y\rightarrow Z and g_{2}: Y\rightarrow Z, whenever the compositions f\circ g_{1} and f\circ g_{2} are equal, then it is guaranteed that g_{1} and g_{2} are also equal. An epimorphism (also called an epi, or epic)  is the dual of this concept, obtained by reversing the directions of all the morphisms in the definition of a monomorphism.

Two monomorphisms f: A\rightarrow D and g: B\rightarrow D are called equivalent if there is an isomorphism h: A\rightarrow B such that g\circ h=f. A subobject of D is then defined as an equivalence class of monomorphisms with domain D.

A subobject classifier is an object \Omega and a monomorphism \text{true}: 1\rightarrow \Omega such that to every monic j: U\rightarrow X there is a unique arrow \chi_{j}: X\rightarrow \Omega such that if u: U\rightarrow 1 is the unique morphism from U to the terminal object 1, then we have

\chi_{j}\circ j=\text{true}\circ u.

The significance of the subject classifier can perhaps best be understood by considering the category \mathbf{Sets}. The characteristic function \chi_{j} of the subset U of X is defined as the function on X that gives the value 1 if x\in U and gives the value 0 if x\notin U. Then we can set the terminal object 1 to be the set \{0\} and the object \Omega as the set \{0,1\}. The morphism \text{true} then sends 0\in \{0\} to 0\in \{0,1\}. The idea is that subobjects, i.e. subsets of sets in \mathbf{Sets}, can be obtained as pullbacks of \text{true} along the characteristic function \chi_{j}.

For the category \text{Sh }(X) of sheaves on a topological space X, the subobject classifier is the sheaf on X where for each open subset U of X the set \mathcal{F} (U) is given by the set of open subsets of U. The morphism \text{true} then “selects” the “maximal” open subset U of U.

Now we define our generalization of the Grothendieck topos. An elementary topos is a category \mathcal{E} satisfying the following conditions.

(i) \mathcal{E} has all finite limits and colimits.

(ii) \mathcal{E} is Cartesian closed.

(iii) \mathcal{E} has a subobject classifier.

A Grothendieck topos satisfies all these conditions and is an example of an elementary topos. However, the elementary topos is a much more general idea, and whereas the Grothendieck topos can be considered as a “generalized space”, the elementary topos can be considered as a “generalized universe of sets”. The term “universe”, as used in mathematics, refers to the entirety of where our discourse takes place, such that any concept or construction that we will ever need to consider or discuss can be found in this universe.

Perhaps the most basic example of an elementary topos is the category \mathbf{Sets}. It is actually also a Grothendieck topos, with its underlying category the category with one object and one morphism, which is the identity morphism on its one object. An example of an elementary topos that is not a Grothendieck topos is the category \mathbf{FinSets} of finite sets. It is worth noting, however, that despite the elementary topos being more general, the Grothendieck topos still continues to occupy somewhat of a special place in topos theory, including its applications to logic and other branches of mathematics beyond its origins in algebraic geometry.

II. Logic and the Elementary Topos

Mathematics is formalized, as a language, using what is known as first-order logic (also known as predicate logic or predicate calculus). This involves constants and variables of different “sorts” or “types”, such as x or y, strung together by relations, usually written Q(x, y), expressing a statement such as x=y. We also have functions, usually written g(x, y) expressing something such as x+y. The variables and functions are terms, and when these terms and strung together by relations, they form formulas. These formulas in turn are strung together by binary connectives such as “and”, “or”, “not”, “implies” and quantifiers such as “for all” and “there exists” to form more complicated formulas.

We can associate with an elementary topos a “language”. The “types” of this language are given by the objects of the topos. “Functions” are given by morphisms of objects. “Relations” are given by the subobjects of the object. In addition to these we need a notion of quantifiers, “for all” (written \forall) and “there exists” (written \exists). These quantifiers are given, for the functors \text{Sub }(Y)\rightarrow \text{Sub }(X), by left and right adjoints \exists_{f}, \forall_{f}: \text{Sub }(X)\rightarrow \text{Sub }(Y). For the binary connectives such as “and”, or”, “not”, and “implies”, we rely on the Heyting algebra structure on the subobjects of an elementary topos.

The existence of a Heyting algebra structure means that there exist operations, called join (written \vee) and meet (written \wedge), generalizing unions and intersections of sets, supremum and infimum of elements, or binary connectives “and” and “or”, a least element (written 0), a greatest element (written 1), and an implication operation such that

z\leq(x\Rightarrow y) if and only if z\wedge x\leq y.

We also have the negation of an element x

\neg x=(x\Rightarrow 0).

This Heyting algebra structure for subobjects \text{Sub }(A) of an object A of an elementary topos is provided by taking pullbacks (for the meet) and coproducts (for the join), with 0\rightarrow A as the least element, A\rightarrow A as the greatest element, and the implication given by the exponential.

We have shown one way in which topos theory is related to logic. Now we show how topos theory is related to the most commonly accepted foundations of mathematics, set theory. More technically, these foundations come from a handful of axioms called the ZFC axioms. The letters Z and F come from the names of the mathematicians who developed it, Ernst Zermelo and Abraham Fraenkel, while the letter C comes from another axiom called the axiom of choice.

The elementary topos, with some additional conditions, can be used to construct a version of the ZFC axioms. The first condition is that whenever there are two morphisms f: A\rightarrow B and g: A\rightarrow B, and a morphism x: 1\rightarrow X from the terminal object 1 to A, we only have f\circ x=g\circ x if f=g. In this case we say that the topos is well-pointed. The second condition is that we have a natural numbers object, which is an object \mathbf{N} and morphisms 0:1\rightarrow \mathbf{N} ands:\mathbf{N}\rightarrow \mathbf{N}, such that for any other object X and morphisms x:1\rightarrow X and f:X\rightarrow X, we have a unique morphism h: \mathbf{N}\rightarrow X such that h\circ 0=x and h\circ s=f . The third condition is the axiom of choice; this is equivalent to the statement that for every epimorphism p:X\rightarrow I there exists s:I\rightarrow X such that s\circ p=1.

One of the issues that hounded set theory in the early days after the ZFC axioms were formulated where whether the axiom of choice could be derived from the other axioms (these axioms were simply called the ZF axioms) or whether it needed to be put in separately. Another issue concerned what was known as the continuum hypothesis, a statement concerning the cardinality of the natural numbers and the real numbers, and whether this statement could be proved or disproved from the ZFC axioms alone. The mathematician Paul Cohen showed that both the axiom of choice and the continuum hypothesis are independent of ZF and ZFC respectively. A topos-theoretic version of Cohen’s proof of the independence of the continuum hypothesis was then later developed by the mathematicians William Lawvere and Myles Tierney (both of whom also developed much of the original theory of elementary toposes).

We now discuss certain aspects of topos theory related to Cohen’s proof. First we introduce a construction in an elementary topos that generalizes the Grothendieck topology discussed in More Category Theory: The Grothendieck Topos. A Lawvere-Tierney topology on \mathcal{E} is a map: j: \Omega\rightarrow \Omega such that

(a) j\circ \text{true}=\text{true}

(b) j\circ j=j

(c) j\circ \wedge=\wedge \circ (j\times j)

The Lawvere-Tierney topology allows us to construct sheaves on the topos, and together with the Heyting algebra structure on the subobject classifier \Omega, allows us to construct double negation sheaves, which themselves form toposes that have the special property that they are Boolean, i.e. the Heyting algebra structure of its subobject classifier satisfies the additional property \neg \neg x=x. This is important because a well-pointed topos, which is necessary to formulate a topos-theoretic version of ZFC, is necessarily Boolean. Another condition for the topos to be well-pointed is for it to be two-valued, which means that there are only two morphisms from the terminal object 1 to \Omega. We can obtain such a two-valued topos from any other topos using the concept of a filter, which essentially allows us to take “quotients” of the Heyting algebra structure on \Omega.

There is yet another condition for an elementary topos to be well-pointed, namely that its “supports split” in the topos. This condition is automatically satisfied whenever the topos satisfies the axiom of choice.

It turns out that the topos of double negation sheaves over a partially ordered set is Boolean (as discussed earlier) and satisfies the axiom of choice. For proving the independence of the continuum hypothesis, a partially ordered set was constructed by Cohen, representing  “finite states of knowledge”, and we can use this to form a topos of double negation sheaves known as the Cohen topos. Using the concept of a filter we then obtain a two-valued topos and therefore satisfy all the requirements for a topos-theoretic version of ZFC. However, the continuum hypothesis does not hold in the Cohen topos, thus proving its independence of ZFC.

A similar strategy involving double negation sheaves was used by the mathematician Peter Freyd to develop a topos-theoretic version of Cohen’s proof of the independence of the axiom of choice from the other axioms ZF, using a different underlying category (since a partially ordered set would automatically satisfy the axiom of choice). In both cases the theory of elementary toposes would provide a more “natural” language for Cohen’s original proofs.

III. Geometric Morphisms

We now discuss morphisms between toposes. The elementary topos was inspired by the Grothendieck topos, which was in turn inspired by sheaves on a topological space, so we turn to the classical theory once more and look at morphisms between sheaves. Given a continuous function f: X\rightarrow Y, and a sheaf \mathcal{F} on X, we can define a sheaf, called the direct image sheaf, f_{*}\mathcal{F} on Y by setting f_{*}\mathcal{F}(V)=\mathcal{F}(f^{-1}(V)) for every open subset V\subseteq Y. Similarly, given a sheaf \mathcal{G} on Ywe also have the inverse image sheaf, however it cannot similarly be defined as f^{*}\mathcal{G}(U)=\mathcal{G}(f(U)) for an open subset U\subseteq X, since the image of U in Y may not be an open subset of Y.

This can be remedied by the process of “sheafification”; we think instead in terms of the “stalks” of the sheaf \mathcal{G}, i.e. sets that are in some way “parametrized” by the points y of Y. Then we can obtain sets “parametrized” by the points f(x); these sets then form the inverse image sheaf f^{*}\mathcal{G} on X. The points of a space are of course not open sets in the usual topologies that we use, so the definition of a stalk involves the “direct limit” of open sets containing the point. It is worth noting that the inverse image “preserves” finite limits.

The process of taking the direct image sheaf can be expressed as a functor between the category \text{Sh }(X) of sheaves on X to the category \text{Sh }(Y) of sheaves on Y. The inverse image sheaf is then the right adjoint to the direct image functor, and it has the property that it preserves finite limits.

A geometric morphism is a pair of adjoint functors between toposes such that the left adjoint preserves finite limits. This allows us to form the category \mathfrak{Top} whose objects are elementary toposes and whose morphisms are geometric morphisms. The natural transformations between geometric morphisms, called geometric transformations, give the category \mathfrak{Top} the extra structure of a 2-category. There are also logical morphisms between toposes, which preserve all structure, and with them and their natural transformations we can form the 2-category \mathfrak{Log}.

We can also define the topos \mathfrak{Top}/\mathcal{S} as the category whose objects are geometric morphisms p: \mathcal{E}\rightarrow \mathcal{S} and whose morphisms (p: \mathcal{F}\rightarrow \mathcal{S})\rightarrow (q: \mathcal{E}\rightarrow \mathcal{S}) are pairs (f, \alpha) where f: \mathcal{F}\rightarrow \mathcal{E} is a geometric morphism and \alpha: q\cong p\circ f is a geometric transformation. Together with “2-cells” (f, \alpha)\rightarrow (g, \beta) given by geometric transformations f\rightarrow g that are “compatible” in some sense with \alpha and \beta\mathfrak{Top}/\mathcal{S} also forms a 2-category.

Geometric morphisms can now be used to define the points of a topos. In the category of sets, we can use the morphisms of the set consisting of only one element to all the other sets to indicate the elements of these other sets. The same goes for topological spaces and their points. We have mentioned earlier the category \mathbf{Sets} as the topos of sheaves on a point. Therefore, we define the points of a topos \mathcal{E} as the geometric morphisms from \mathbf{Sets} to \mathcal{E}.

There exist, however, toposes (including Grothendieck toposes) without points. Sheaves, however, are defined only using open sets, therefore to deal with toposes satisfactorily we can make use of the concept of locales, which abstract the properties of open sets and the study of topological spaces, while “forgetting” the underlying sets of points. A topos which is equivalent to the category of sheaves on some locale is called a localic topos.

An important result in the theory of localic toposes is Barr’s theorem, which states that for every Grothendieck topos \mathcal{E} there exists a sheaf \text{Sh }(\mathbf{B}) on a locale \mathbf{B} with a “complete” Boolean algebra structure and an epimorphism \text{Sh }(\mathbf{B})\rightarrow \mathcal{E}. Another important results is Deligne’s theorem, which states that a coherent topos, i.e. a topos \mathcal{E} for which there is a  site (\mathbf{C}, J) where \mathbf{C} has finite limits and the Grothendieck topology has a “basis” which consists of finite covering families, has “enough points“, i.e. for any two arrows \alpha: E\rightarrow D and \alpha: E\rightarrow D in \mathcal{E} there exists a point p: \mathbf{Sets}\rightarrow \mathcal{E} such that the stalk p^{*}(\alpha) is not equal to the stalk p^{*}(\beta) .

We can also use geometric morphisms to define the idea of a classifying topos. A classifying topos is an elementary topos such that objects in any other topos can be “classified” by the geometric morphisms of the topos to the classifying topos. For example, ring objects in any topos \mathcal{E} are classified by the topos given by the opposite category of the category of “finitely presented” rings \mathbf{fp}\textbf{-}\mathbf{rings^{op}}. The object in \mathbf{fp}\textbf{-}\mathbf{rings^{op}} given by the polynomial ring \mathbf{Z}[X] is then a universal object, such that any ring object in \mathcal{E} can be obtained by constructing the pullback of \mathbf{Z}[X]\rightarrow \mathbf{fp}\textbf{-}\mathbf{rings^{op}} along \mathcal{E}\rightarrow \mathbf{fp}\textbf{-}\mathbf{rings^{op}}.

We now combine the idea of classifying toposes (which was inspired by the idea of classifying spaces in algebraic topology) with the applications of topos theory to first-order logic discussed earlier. A theory \mathbb{T} is a set of formulas, called the axioms of the theory, and a model of \mathbb{T} in a topos \mathcal{E} is an interpretation, i.e. an assignment of an object of \mathcal{E} to every type of the first-order language, a subobject of \mathcal{E} to every relation, and a morphism of \mathcal{E} to every function, with quantifiers and binary connectives provided by the corresponding adjoint functors and Heyting algebra structures respectively.

A theory is called a coherent theory if it is of the form \forall x (\phi(x)\Rightarrow \psi(x)), where \phi(x) and \psi(x) are coherent formulas, i.e. formulas which are built up using only the operations of finitary conjunction \wedge, finitary disjunction \vee, and existential quantification \exists. If we also allow as well the operation of infinitary disjunction \bigvee, then we will obtain a geometric formula, and a theory of the form \forall x (\phi(x)\Rightarrow \psi(x)), where \phi(x) and \psi(x) are geometric formulas is called a geometric theory.

Most theories in mathematics are coherent theories. For those which are not, however, there is a certain process called Morleyization which associates to those theories a coherent theory.

For any model of a coherent theory \mathbb{T} in an elementary topos \mathcal{E}, there exists a classifying topos \mathcal{E}_\mathbb{T} and a universal object (in this context also called a universal model) such that said model can be obtained as a pullback of U\rightarrow \mathcal{E}_\mathbb{T} along the geometric morphism \mathcal{E}\rightarrow \mathcal{E}_\mathbb{T}.

We mention yet another aspect of topos theory where logic and geometry combine. We have earlier mentioned the theorems of Deligne and Barr in the context of studying toposes as sheaves on locales. Combined with the logical aspects of the toposes, and the theory of classifying toposes, Deligne’s theorem implies that a statement of the form \forall x (\phi(x)\Rightarrow \psi(x)) where \phi(x) and \psi(x) are coherent formulas holds in all models of the coherent theory \mathbb{T} in any topos if and only if it holds in all models of \mathbb{T} in \mathbf{Sets}.

Meanwhile, Barr’s theorem implies that a statement of the form \forall x (\phi(x)\Rightarrow \psi(x)) where \phi(x) and \psi(x) are geometric formulas holds in all models of the geometric theory \mathbb{T} in any topos if  and only if it holds in all models of \mathbb{T} in Boolean toposes.

In this context, Deligne’s theorem and Barr’s theorem respectively correspond to finitary and infinitary versions of a famous theorem in classical logic called Godel’s completeness theorem.


Topos on Wikipedia

Topos on the nLab

What is … a Topos? by Zhen Lin Low

An Informal Introduction to Topos Theory by Tom Leinster

Topos Theory by Peter T. Johnstone

Sketches of an Elephant: A Compendium of Topos Theory by Peter T. Johnstone

Handbook of Categorical Algebra 3: Categories of Sheaves by Francis Borceux

Sheaves in Geometry and Logic: A First Introduction to Topos Theory by Saunders Mac Lane and Ieke Moerdijk