Rotations in Three Dimensions

In Rotating and Reflecting Vectors Using Matrices we learned how to express rotations in 2-dimensional space using certain special 2\times 2 matrices which form a group (see Groups) we call the special orthogonal group in dimension 2, or \text{SO}(2) (together with other matrices which express reflections, they form a bigger group that we call the orthogonal group in 2 dimensions, or \text{O}(2)).

In this post, we will discuss rotations in 3-dimensional space. As we will soon see, rotations in 3-dimensional space have certain interesting features not present in the 2-dimensional case, and despite being seemingly simple and mundane, play very important roles in some of the deepest aspects of fundamental physics.

We will first discuss rotations in 3-dimensional space as represented by the special orthogonal group in dimension 3, written as \text{SO}(3).

We recall some relevant terminology from Rotating and Reflecting Vectors Using Matrices. A matrix is called orthogonal if it preserves the magnitude of (real) vectors: for a matrix A to be orthogonal, the magnitude of the vector Av must be equal to the magnitude of the vector v, for every vector v. Alternatively, we may require, for the matrix A to be orthogonal, that it satisfy the condition

\displaystyle AA^{T}=A^{T}A=I

where A^{T} is the transpose of A and I is the identity matrix. The word “special” denotes that our matrices must have determinant equal to 1. Therefore, the group \text{SO}(3) consists of the 3\times3 orthogonal matrices whose determinant is equal to 1.
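The two conditions defining \text{SO}(3) can be checked numerically. The following is a minimal sketch in plain Python; the helper names (transpose, matmul, det3, is_special_orthogonal) and the tolerance are our own choices for illustration, not part of any library.

```python
import math

def transpose(a):
    return [list(row) for row in zip(*a)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def det3(a):
    # Cofactor expansion of a 3x3 determinant along the first row.
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))

def is_special_orthogonal(a, tol=1e-12):
    # For a square matrix, A A^T = I already implies A^T A = I.
    product = matmul(a, transpose(a))
    orthogonal = all(abs(product[i][j] - (1.0 if i == j else 0.0)) < tol
                     for i in range(3) for j in range(3))
    return orthogonal and abs(det3(a) - 1.0) < tol

t = math.radians(30)
rotation_z = [[math.cos(t), -math.sin(t), 0.0],
              [math.sin(t), math.cos(t), 0.0],
              [0.0, 0.0, 1.0]]
# A reflection is orthogonal but has determinant -1.
reflection = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, -1.0]]

print(is_special_orthogonal(rotation_z))
print(is_special_orthogonal(reflection))
```

The reflection passes the orthogonality test but fails the determinant test, so it belongs to \text{O}(3) but not to \text{SO}(3).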

The idea of using the group \text{SO}(3) to express rotations in 3-dimensional space may be made more concrete using several different formalisms.

One popular formalism is given by the so-called Euler angles. In this formalism, we break down any arbitrary rotation in 3-dimensional space into three separate rotations. The first, which we write here by \varphi, is expressed as a counterclockwise rotation about the z-axis. The second, \theta, is a counterclockwise rotation about an x-axis that rotates along with the object. Finally, the third, \psi, is expressed as a counterclockwise rotation about a z-axis that, once again, has rotated along with the object. For readers who may be confused, animations of these steps can be found among the references listed at the end of this post.

The matrix which expresses the rotation which is the product of these three rotations can then be written as

\displaystyle g(\varphi,\theta,\psi) = \left(\begin{array}{ccc} \text{cos}(\varphi)\text{cos}(\psi)-\text{cos}(\theta)\text{sin}(\varphi)\text{sin}(\psi) & -\text{cos}(\varphi)\text{sin}(\psi)-\text{cos}(\theta)\text{sin}(\varphi)\text{cos}(\psi) & \text{sin}(\varphi)\text{sin}(\theta) \\ \text{sin}(\varphi)\text{cos}(\psi)+\text{cos}(\theta)\text{cos}(\varphi)\text{sin}(\psi) & -\text{sin}(\varphi)\text{sin}(\psi)+\text{cos}(\theta)\text{cos}(\varphi)\text{cos}(\psi) & -\text{cos}(\varphi)\text{sin}(\theta) \\ \text{sin}(\psi)\text{sin}(\theta) & \text{cos}(\psi)\text{sin}(\theta) & \text{cos}(\theta) \end{array}\right).

The reader may check that, in the case that the rotation is strictly in the xy plane, i.e. \theta and \psi are zero, we will obtain

\displaystyle g(\varphi,\theta,\psi) = \left(\begin{array}{ccc} \text{cos}(\varphi) & -\text{sin}(\varphi) & 0 \\ \text{sin}(\varphi) & \text{cos}(\varphi) & 0 \\ 0 & 0 & 1 \end{array}\right).

Note how the upper left part is an element of \text{SO}(2), expressing a counterclockwise rotation by an angle \varphi, as we might expect.
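As a sanity check, the matrix g(\varphi,\theta,\psi) can be compared with a product of three elementary rotations. The sketch below assumes the ordering R_{z}(\varphi)R_{x}(\theta)R_{z}(\psi), which is the ordering that reproduces the displayed matrix; the function names are ours.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rot_z(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_x(t):
    c, s = math.cos(t), math.sin(t)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def euler(phi, theta, psi):
    # Assumed ordering: R_z(phi) R_x(theta) R_z(psi).
    return matmul(rot_z(phi), matmul(rot_x(theta), rot_z(psi)))

def closed_form(phi, theta, psi):
    # The matrix g(phi, theta, psi) written out entry by entry.
    cf, sf = math.cos(phi), math.sin(phi)
    ct, st = math.cos(theta), math.sin(theta)
    cp, sp = math.cos(psi), math.sin(psi)
    return [[cf * cp - ct * sf * sp, -cf * sp - ct * sf * cp, sf * st],
            [sf * cp + ct * cf * sp, -sf * sp + ct * cf * cp, -cf * st],
            [sp * st, cp * st, ct]]

def max_difference(a, b):
    return max(abs(a[i][j] - b[i][j]) for i in range(3) for j in range(3))

print(max_difference(euler(0.3, 1.1, -0.7), closed_form(0.3, 1.1, -0.7)))
print(max_difference(euler(0.3, 0.0, 0.0), rot_z(0.3)))
```

Both printed differences should be zero up to floating-point rounding; the second one is the special case \theta=\psi=0 discussed above.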

Contrary to the case of \text{SO}(2), which is an abelian group, the group \text{SO}(3) is not an abelian group. This means that for two elements a and b of \text{SO}(3), the product ab may not always be equal to the product ba. One can check this explicitly, or simply consider rotating an object about different axes; for example, rotating an object first counterclockwise by 90 degrees about the z-axis, and then counterclockwise again by 90 degrees about the x-axis, will not end with the same result as performing the same operations in the opposite order.
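The example with two 90-degree rotations can be checked with exact integer matrices. This sketch assumes both rotations are taken about fixed coordinate axes and that matrices act on column vectors, so "first z, then x" is the product R_{x}R_{z}.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

# Counterclockwise rotations by 90 degrees about the z-axis and the x-axis.
rot_z_90 = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]
rot_x_90 = [[1, 0, 0], [0, 0, -1], [0, 1, 0]]

z_then_x = matmul(rot_x_90, rot_z_90)  # apply the z-rotation first
x_then_z = matmul(rot_z_90, rot_x_90)  # apply the x-rotation first

print(z_then_x)  # [[0, -1, 0], [0, 0, -1], [1, 0, 0]]
print(x_then_z)  # [[0, 0, 1], [1, 0, 0], [0, 1, 0]]
print(z_then_x == x_then_z)  # False
```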

We now know how to express rotations in 3-dimensional space using 3\times 3 orthogonal matrices. Now we discuss another way of expressing the same concept, but using “unitary”, instead of orthogonal, matrices. However, first we must revisit rotations in 2 dimensions.

The group \text{SO}(2) is not the only way we have of expressing rotations in 2-dimensions. For example, we can also make use of the unitary (we will explain the meaning of this word shortly) group in 1-dimension, also written \text{U}(1). It is the group formed by the complex numbers with magnitude equal to 1. The elements of this group can always be written in the form e^{i\theta}, where \theta is the angle of our rotation. As we have seen in Connection and Curvature in Riemannian Geometry, this group is related to quantum electrodynamics, as it expresses the gauge symmetry of the theory.

The groups \text{SO}(2) and \text{U}(1) are actually isomorphic. There is a one-to-one correspondence between the elements of \text{SO}(2) and the elements of \text{U}(1) which respects the group operation. In other words, there is a bijective function f:\text{SO}(2)\rightarrow\text{U}(1), which satisfies f(ab)=f(a)f(b) for a, b elements of \text{SO}(2). When two groups are isomorphic, we may consider them as being essentially the same group. For this reason, both \text{SO}(2) and \text{U}(1) are often referred to as the circle group.
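One concrete choice of such a function sends the rotation matrix by an angle \theta to the complex number e^{i\theta}. The sketch below checks numerically that this map respects the group operation, i.e. f(ab)=f(a)f(b); the function names are ours.

```python
import cmath
import math

def rotation(t):
    return [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def to_u1(r):
    # Read off cos(t) + i sin(t) from the first column of the rotation matrix.
    return complex(r[0][0], r[1][0])

a, b = 0.4, 2.5
lhs = to_u1(matmul(rotation(a), rotation(b)))    # f(ab)
rhs = to_u1(rotation(a)) * to_u1(rotation(b))    # f(a) f(b)

print(abs(lhs - rhs))                            # zero up to rounding
print(abs(lhs - cmath.exp(1j * (a + b))))        # zero up to rounding
```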

We can now go back to rotations in 3 dimensions and discuss the group \text{SU}(2), the special unitary group in dimension 2. The word “unitary” is in some way analogous to “orthogonal”, but applies to vectors with complex number entries.

Consider an arbitrary vector

\displaystyle v=\left(\begin{array}{c}v_{1}\\v_{2}\\v_{3}\end{array}\right).

An orthogonal matrix, as we have discussed above, preserves the quantity (which is the square of what we have referred to earlier as the “magnitude” for vectors with real number entries)

\displaystyle v_{1}^{2}+v_{2}^{2}+v_{3}^{2}

while a unitary matrix preserves

\displaystyle v_{1}^{*}v_{1}+v_{2}^{*}v_{2}+v_{3}^{*}v_{3}

where v_{i}^{*} denotes the complex conjugate of the complex number v_{i}. This is the square of the analogous notion of “magnitude” for vectors with complex number entries.

Just as orthogonal matrices must satisfy the condition

\displaystyle AA^{T}=A^{T}A=I,

unitary matrices are required to satisfy the condition

\displaystyle AA^{\dagger}=A^{\dagger}A=I

where A^{\dagger} is the Hermitian conjugate of A, a matrix whose entries are the complex conjugates of the entries of the transpose A^{T} of A.

An element of the group \text{SU}(2) is therefore a 2\times 2 unitary matrix whose determinant is equal to 1. Like the group \text{SO}(3), the group \text{SU}(2) is also a group which is not abelian.

Unlike the analogous case in 2 dimensions, the groups \text{SO}(3) and \text{SU}(2) are not isomorphic. There is no one-to-one correspondence between them. However, there is a homomorphism from \text{SU}(2) to \text{SO}(3) that is “two-to-one”, i.e. there are always two elements of \text{SU}(2) that get mapped to the same element of \text{SO}(3) under this homomorphism. Hence, \text{SU}(2) is often referred to as a “double cover” of \text{SO}(3).
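The two-to-one behavior can be seen in a small computation. The post does not write down the homomorphism explicitly, so the sketch below assumes one standard choice: an element of \text{SU}(2) with first row (a+bi, c+di) is identified with the unit quaternion (a,b,c,d), which is then sent to the usual quaternion rotation matrix. The function names are ours.

```python
import cmath
import math

def su2(alpha, beta):
    # The matrix [[alpha, beta], [-conj(beta), conj(alpha)]],
    # assuming |alpha|^2 + |beta|^2 = 1.
    return [[alpha, beta], [-beta.conjugate(), alpha.conjugate()]]

def to_so3(u):
    # Assumed convention: alpha = a + bi, beta = c + di, and (a, b, c, d)
    # is treated as a unit quaternion.
    a, b = u[0][0].real, u[0][0].imag
    c, d = u[0][1].real, u[0][1].imag
    return [
        [a * a + b * b - c * c - d * d, 2 * (b * c - a * d), 2 * (b * d + a * c)],
        [2 * (b * c + a * d), a * a - b * b + c * c - d * d, 2 * (c * d - a * b)],
        [2 * (b * d - a * c), 2 * (c * d + a * b), a * a - b * b - c * c + d * d],
    ]

def negate(u):
    return [[-entry for entry in row] for row in u]

u = su2(math.cos(0.6) * cmath.exp(0.3j), math.sin(0.6) * cmath.exp(-1.1j))
minus_identity = su2(complex(-1.0, 0.0), complex(0.0, 0.0))

# u and -u are different elements of SU(2) but give the same rotation.
print(to_so3(u) == to_so3(negate(u)))
# In particular, minus the identity of SU(2) maps to the identity rotation.
print(to_so3(minus_identity))
```

Every entry of the rotation matrix is quadratic in (a,b,c,d), which is why flipping the sign of the \text{SU}(2) element leaves the rotation unchanged.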

In physics, this concept underlies the weird behavior of quantum-mechanical objects called spinors (such as electrons), which require a rotation of 720, not 360, degrees to return to their original state!

The groups we have so far discussed are not “merely” groups. They also possess another kind of mathematical structure. They describe certain shapes which happen to have no sharp corners or edges. Technically, such a shape is called a manifold, and it is the object of study of the branch of mathematics called differential geometry, certain basic aspects of which we have discussed in Geometry on Curved Spaces and Connection and Curvature in Riemannian Geometry.

For the circle group, the manifold that it describes is itself a circle. The elements of the circle group correspond to the points of the circle. The group \text{SU}(2) is the surface of the 4-dimensional sphere, or what we call a 3-sphere (for those who might be confused by the terminology, recall that we are only considering the surface of the sphere, not the entire volume, and this surface is a 3-dimensional, not a 4-dimensional, object). The group \text{SO}(3) is 3-dimensional real projective space, written \mathbb{RP}^{3}. It is a manifold which can be described using the concepts of projective geometry (see Projective Geometry).

A group that is also a manifold is called a Lie group (pronounced like “lee”) in honor of the mathematician Marius Sophus Lie who pioneered much of their study. Lie groups are very interesting objects of study in mathematics because they bring together the techniques of group theory and differential geometry; this teaches us about Lie groups on the one hand, and on the other hand also teaches us more about both group theory and differential geometry themselves.


Orthogonal Group on Wikipedia

Rotation Group SO(3) on Wikipedia

Euler Angles on Wikipedia

Unitary Group on Wikipedia

Spinor on Wikipedia

Lie Group on Wikipedia

Real Projective Space on Wikipedia

Algebra by Michael Artin



In Presheaves we have compared functions on a topological space (as an example we considered the complex plane \mathbb{C} with the Zariski topology) and the functions on open subsets of this space (which in our example would be the complex plane \mathbb{C} with a finite number of points removed).

In this post we take on this topic again, with an emphasis on the functions which can be expressed in terms of polynomials; in Presheaves we saw that on the entire complex plane we could not admit \frac{1}{x} as a function (we will refer to these functions defined on a space as regular functions on the space) on the complex plane \mathbb{C} as it was undefined at the point x=0. It can, however, be admitted as a (regular) function on the open subset \mathbb{C}-\{0\}. We will restrict our topological spaces to the case of varieties (see Basics of Algebraic Geometry).

Note that if we are considering the entire complex plane, the regular functions are only those whose denominators are constants. But on the open subset \mathbb{C}-\{0\}, we may have polynomials in the denominators as long as their zeroes are not in the open subset, in this case 0, which is not in \mathbb{C}-\{0\}. If we take another open subset, one that is itself a subset of \mathbb{C}-\{0\}, such as \mathbb{C}-\{0,1\}, we can admit even more regular functions on this open subset.

The difference between the properties of a topological space and an open subset of such a space is related to the difference between “local” properties and “global” properties. “Local” means it holds on a smaller part of the space, while “global” means it holds on the entire space. For example, “locally”, the Earth appears flat. Of course, “globally”, we know that the Earth is round. However, ideally we should be able to “patch together” local information to obtain global information. This is what the concept of sheaves (see Sheaves) is for.

We may think about what we will see if we only “look at” a single point; for example, in \mathbb{C}, we may look only at 0. We can look at the set of all ratios of polynomials that are defined at 0, which means that the polynomial in the denominator is not allowed to have a zero at 0. There are many such functions aside from those that are already regular on all of \mathbb{C}; for example, \frac{1}{x-1}, \frac{1}{(x-1)^{2}}, \frac{1}{(x-1)(x-2)}, and so on. The set of all these functions, which form a ring, is called the local ring at 0. The local ring at any point P of a variety X is written \mathcal{O}_{X,P}. Taking the local ring at P is an example of the process of localization.

A single point is not an open subset in our topology, so this does not fit into our definition of a sheaf or a presheaf. Instead, we say that the local ring at a point is the stalk of the sheaf of regular functions at that point. More technically, the stalk of a sheaf (or presheaf) at a point P is the set of equivalence classes (see Modular Arithmetic and Quotient Sets) of pairs (U,\varphi), where U is an open subset containing P and \varphi is a section over U (in our case, a regular function on U), under the equivalence relation (U,\varphi)\sim(U',\varphi') if there exists an open subset V containing P in the intersection U\cap U' for which \varphi |_{V}=\varphi' |_{V}. The elements of the stalk are called the germs of the sheaf (or presheaf).

An important property of a local ring at a point P is that it has only one maximal ideal (see More on Ideals), which is made up of the functions in the local ring that vanish at P. This maximal ideal we will write as \mathfrak{m}_{X,P}. The quotient (again see Modular Arithmetic and Quotient Sets) \mathcal{O}_{X,P}/\mathfrak{m}_{X,P} is called the residue field.

We recall Hilbert’s Nullstellensatz and the definition of varieties and schemes in Basics of Algebraic Geometry. There we established a correspondence between the points of a variety (resp. scheme) and the maximal ideals (resp. prime ideals) of its “ring of functions”. We can use the ideas discussed here concerning locality, via the concept of presheaves and sheaves, to construct more general varieties and schemes.

One of the great things about algebraic geometry is that it is kind of a “synthesis” of ideas from both abstract algebra and geometry, and ideas can be exchanged between both. For example, we have already mentioned in Basics of Algebraic Geometry that we can start with a ring R and look at the set of its maximal (resp. prime) ideals as forming a space. If we look at the set of its prime ideals (usually also referred to as its spectrum, and denoted \text{Spec } R – again we note that the word spectrum has many meanings in mathematics) then we have a scheme. This ring R may not even be a ring of polynomials – we may even consider the ring of integers \mathbb{Z}, and do algebraic “geometry” on the space \text{Spec }\mathbb{Z}!

We can also extract the idea of only looking at local information, an idea which has geometric origins, and apply it to abstract algebra. We can then define local rings completely algebraically, without reference to geometric ideas, as a ring with a unique maximal ideal.

A local ring which is also a principal ideal domain (a ring in which every ideal is a principal ideal, again see More on Ideals) and is not a field is called a discrete valuation ring. Discrete valuation rings are localizations of Dedekind domains, which are important in number theory, as we have discussed in Algebraic Numbers; for instance, in Dedekind domains, even though elements may not factor uniquely into irreducibles, ideals will always factor uniquely into prime ideals.

For the ring of integers \mathbb{Z}, an example of a local ring is given by the ring of fractions whose denominator is an integer not divisible by a certain prime number p. We denote this local ring by \mathbb{Z}_{(p)}. For p=2, \mathbb{Z}_{(2)} is composed of all fractions whose denominator is an odd number. The unique maximal ideal of this ring is given by the fractions whose numerator is an even number. Since \mathbb{Z} is a Dedekind domain, \mathbb{Z}_{(p)} is also a discrete valuation ring. We refer to the local ring \mathbb{Z}_{(p)} as the localization of \mathbb{Z} at the point (prime ideal) (p).
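The description of \mathbb{Z}_{(p)} translates directly into a membership test on fractions. The following is a small sketch using Python's exact Fraction type, which always stores fractions in lowest terms; the function names are ours.

```python
from fractions import Fraction

def in_localization(x, p):
    # x lies in Z_(p) if its denominator (in lowest terms) is not divisible by p.
    return x.denominator % p != 0

def in_maximal_ideal(x, p):
    # The maximal ideal consists of the elements whose numerator is divisible by p.
    return in_localization(x, p) and x.numerator % p == 0

def is_unit(x, p):
    # Everything in Z_(p) outside the maximal ideal is invertible in Z_(p).
    return in_localization(x, p) and x.numerator % p != 0

print(in_localization(Fraction(3, 5), 2))   # True: odd denominator
print(in_localization(Fraction(1, 2), 2))   # False: even denominator
print(in_maximal_ideal(Fraction(4, 7), 2))  # True: even numerator
print(is_unit(Fraction(3, 5), 2))           # True: 5/3 also lies in Z_(2)
```

The last function illustrates why the maximal ideal is unique: every element outside it is a unit.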

We started with the idea of “local” and “global” in geometry, in particular algebraic geometry, and ended up with ideas important to number theory. This is once more an example of how the exchange of ideas between different branches of mathematics leads to much fruitful development of each branch and of mathematics as a whole.


Localization on Wikipedia

Localization of a Ring on Wikipedia

Local Ring on Wikipedia

Stalk on Wikipedia

Algebraic Geometry by Andreas Gathmann

Algebraic Geometry by J.S. Milne

Algebraic Geometry by Robin Hartshorne

Algebraic Number Theory by Jurgen Neukirch

The Hom and Tensor Functors

We discussed functors in Category Theory, and in this post we discuss certain functors important to the study of rings and modules. Moreover, we look at these functors and how they affect exact sequences, whose importance was discussed in Exact Sequences. Our discussion in this post will also be related to some things that we discussed in More on Chain Complexes.

If M and N are two modules whose ring of scalars is the ring R (we refer to M and N as R-modules), then we denote by \text{Hom}_{R}(M,N) the set of linear transformations (see Vector Spaces, Modules, and Linear Algebra) from M to N. It is worth noting that this set has an abelian group structure (see Groups).

We define the functor \text{Hom}_{R}(M,-) as the functor that assigns to an R-module N the abelian group \text{Hom}_{R}(M,N) of linear transformations from M to N. Similarly, the functor \text{Hom}_{R}(-,N) assigns to the R-module M the abelian group \text{Hom}_{R}(M,N) of linear transformations from M to N.

These functors \text{Hom}_{R}(M,-) and \text{Hom}_{R}(-,N), combined with the idea of exact sequences, give us new definitions of projective and injective modules, which are equivalent to the old ones we gave in More on Chain Complexes.

We say that a functor is an exact functor if it preserves exact sequences. In the case of \text{Hom}_{R}(M,-), we say that it is exact if for an exact sequence of modules

0\rightarrow A\rightarrow B\rightarrow C\rightarrow 0

the sequence

0\rightarrow \text{Hom}_{R}(M,A)\rightarrow \text{Hom}_{R}(M,B)\rightarrow \text{Hom}_{R}(M,C)\rightarrow 0

is also exact. The concept of an exact sequence of sets of linear transformations of R-modules makes sense because of the abelian group structure on these sets. In this case we also say that the R-module M is projective.

Similarly, an R-module N is injective if the functor \text{Hom}_{R}(-,N) is exact, i.e. if for an exact sequence of modules

0\rightarrow A\rightarrow B\rightarrow C\rightarrow 0

the sequence (with the arrows reversed, since this functor reverses the direction of linear transformations)

0\rightarrow \text{Hom}_{R}(C,N)\rightarrow \text{Hom}_{R}(B,N)\rightarrow \text{Hom}_{R}(A,N)\rightarrow 0

is also exact.

We introduce another functor, which we write M\otimes_{R}-. This functor assigns to an R-module N the tensor product (see More on Vector Spaces and Modules) M\otimes_{R}N. Similarly, we also have the functor -\otimes_{R}N, which assigns to an R-module M the tensor product M\otimes_{R}N. If our ring R is commutative, then there will be no distinction between the functors M\otimes_{R}- and -\otimes_{R}M. We will continue assuming that our rings are commutative (an example of a noncommutative ring is the ring of n\times n matrices).

We say that a module N is flat if the functor -\otimes_{R}N is exact, i.e. if for an exact sequence of modules

0\rightarrow A\rightarrow B\rightarrow C\rightarrow 0

the sequence

0\rightarrow A\otimes_{R}N\rightarrow B\otimes_{R}N\rightarrow C\otimes_{R}N\rightarrow 0

is also exact.

We make a little digression to introduce the concept of an algebra. The word “algebra” has a lot of meanings in mathematics, but in our context, as a mathematical object in the subject of abstract algebra and linear algebra, it means a set with both a ring and a module structure. More technically, for a ring A, an A-algebra is a ring B and a ring homomorphism f:A\rightarrow B, which makes B into an A-module via the following definition of the scalar multiplication:

ab=f(a)b for a\in A, b\in B.

The notion of an algebra will be useful in defining the notion of a flat morphism. A ring homomorphism f: A\rightarrow B is a flat morphism if the functor -\otimes_{A}B is exact. Since B is an A-algebra, and an A-algebra is also an A-module, this means that f: A\rightarrow B is a flat morphism if B is flat as an A-module. The notion of a flat morphism is important in algebraic geometry, where the “points” of schemes are given by the prime ideals of a ring, since it corresponds to a “continuous” family of schemes parametrized by the “points” of another scheme.

Finally, the functors \text{Hom}_{R}(M,-), \text{Hom}_{R}(-,N), and -\otimes_{R}N, which we will also refer to as the “Hom” and “Tensor” functors, can be used to define the derived functors “Ext” and “Tor”, to which we have given a passing mention in More on Chain Complexes. We now elaborate on these constructions.

The Ext functor, written \text{Ext}_{R}^{n}(M,N) for a fixed R-module M, is calculated by taking an injective resolution of N,

0\rightarrow N\rightarrow E^{0}\rightarrow E^{1}\rightarrow ...

then applying the functor \text{Hom}_{R}(M,-):

0 \rightarrow \text{Hom}_{R}(M,N)\rightarrow \text{Hom}_{R}(M,E^{0})\rightarrow \text{Hom}_{R}(M,E^{1})\rightarrow ...

we “remove” \text{Hom}_{R}(M,N) to obtain the chain complex

0 \rightarrow \text{Hom}_{R}(M,E^{0})\rightarrow \text{Hom}_{R}(M,E^{1})\rightarrow ...

Then \text{Ext}_{R}^{n}(M,N) is the n-th homology group (see Homology and Cohomology) of this chain complex.

Alternatively, we can also define the Ext functor \text{Ext}_{R}^{n}(M,N) for a fixed R-module N by taking a projective resolution of M,

...\rightarrow P_{1}\rightarrow P_{0}\rightarrow M\rightarrow 0

then applying the functor \text{Hom}_{R}(-,N), which “dualizes” the chain complex:

0 \rightarrow \text{Hom}_{R}(M,N)\rightarrow \text{Hom}_{R}(P_{0},N)\rightarrow \text{Hom}_{R}(P_{1},N)\rightarrow ...

we again “remove” \text{Hom}_{R}(M,N) to obtain the chain complex

0 \rightarrow \text{Hom}_{R}(P_{0},N)\rightarrow \text{Hom}_{R}(P_{1},N)\rightarrow ...

and \text{Ext}_{R}^{n}(M,N) is once again given by the n-th homology group of this chain complex.
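To make the recipe with projective resolutions concrete, here is a small worked case that is not taken from the post: R=\mathbb{Z}, M=\mathbb{Z}/m, and N=\mathbb{Z}/n. A projective resolution of \mathbb{Z}/m is 0\rightarrow\mathbb{Z}\rightarrow\mathbb{Z}\rightarrow\mathbb{Z}/m\rightarrow 0, where the first map is multiplication by m. Applying \text{Hom}_{\mathbb{Z}}(-,\mathbb{Z}/n) and removing the first term leaves the complex 0\rightarrow\mathbb{Z}/n\rightarrow\mathbb{Z}/n\rightarrow 0, whose only nontrivial map is again multiplication by m. The sketch below (function name ours) counts the elements of the kernel, which is \text{Ext}^{0}, and of the cokernel, which is \text{Ext}^{1}.

```python
from math import gcd

def ext_orders(m, n):
    # Multiplication by m on Z/n, stored as a lookup table.
    times_m = {x: (m * x) % n for x in range(n)}
    kernel = [x for x, y in times_m.items() if y == 0]
    image = set(times_m.values())
    # Ext^0 is the kernel; Ext^1 is the cokernel (Z/n modulo the image).
    return len(kernel), n // len(image)

print(ext_orders(4, 6))  # (2, 2)
print(gcd(4, 6))         # 2
```

In this example both groups have gcd(m, n) elements, so \text{Ext}^{1} is nonzero exactly when m and n share a common factor.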

The Tor functor, meanwhile, written \text{Tor}_{n}^{R}(M,N) for a fixed R-module N, is calculated by taking a projective resolution of M and applying the functor -\otimes_{R}N, followed by “removing” M\otimes_{R}N:

...\rightarrow P_{1}\otimes_{R}N\rightarrow P_{0}\otimes_{R}N\rightarrow 0

\text{Tor}_{n}^{R}(M,N) is then given by the n-th homology group of this chain complex.
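The same kind of small example works for Tor; again it is our own illustration, with R=\mathbb{Z}, M=\mathbb{Z}/m, and N=\mathbb{Z}/n. Tensoring the projective resolution 0\rightarrow\mathbb{Z}\rightarrow\mathbb{Z}\rightarrow\mathbb{Z}/m\rightarrow 0 with \mathbb{Z}/n and removing the last term gives 0\rightarrow\mathbb{Z}/n\rightarrow\mathbb{Z}/n\rightarrow 0, with the map being multiplication by m. Here \text{Tor}_{1} is the kernel of that map and \text{Tor}_{0} is its cokernel. The function name below is hypothetical.

```python
from math import gcd

def tor_orders(m, n):
    # Images of 0, 1, ..., n-1 under multiplication by m on Z/n.
    images = [(m * x) % n for x in range(n)]
    kernel_size = images.count(0)           # number of elements of Tor_1
    cokernel_size = n // len(set(images))   # number of elements of Tor_0
    return cokernel_size, kernel_size

print(tor_orders(2, 2))  # (2, 2)
print(tor_orders(3, 5))  # (1, 1)
```

The first call shows that multiplication by 2, which is injective on \mathbb{Z}, becomes the zero map after tensoring with \mathbb{Z}/2; this is a concrete way of seeing that \mathbb{Z}/2 is not flat as a \mathbb{Z}-module.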

The Ext and Tor functors were originally developed to study the concepts of “extension” and “torsion” of groups in abstract algebra, hence the names, but they have since then found utility in many other subjects, in particular algebraic topology, algebraic geometry, and algebraic number theory. Our exposition here has been quite abstract; to find more motivation, aside from checking out the references listed below, the reader may also compare with the ordinary homology and cohomology theories in algebraic topology. Hopefully we will be able to flesh out more aspects of what we have discussed here in future posts.


Hom Functor on Wikipedia

Tensor Product of Modules on Wikipedia

Flat Module on Wikipedia

Associative Algebra on Wikipedia

Derived Functor on Wikipedia

Ext Functor on Wikipedia

Tor Functor on Wikipedia

Abstract Algebra by David S. Dummit and Richard B. Foote

Commutative Algebra by M. F. Atiyah and I. G. MacDonald

An Introduction to Homological Algebra by Joseph J. Rotman

Galois Groups

In Algebraic Numbers we discussed algebraic number fields and a very important group associated to an algebraic number field called its ideal class group. In this post we define another very important group called the Galois group. Galois groups are named after the mathematician Evariste Galois, who lived in the early 19th century and developed the theory before his early death in a duel (under mysterious circumstances) at the age of 20.

The problem that motivated the development of Galois groups was the solution of polynomial equations of higher degree. We know that for quadratic equations (equations of degree 2) there exists a “quadratic formula” that allows us to solve for the roots of any quadratic equation. For cubic equations (equations of degree 3) and quartic equations (equations of degree 4), there is also a similar “cubic formula” and a “quartic formula”, although they are not as well-known as the quadratic formula.

However, for quintic equations (equations of degree 5) there is no “quintic formula”. What this means is that not every quintic equation can be solved by a finite number of additions, subtractions, multiplications, divisions, and extractions of roots. Some quintic equations, of course, can be easily solved using these operations, such as x^{5}-1=0. However, this does not hold true for all quintic equations. This was proven by another mathematician, Niels Henrik Abel, but it was Galois who gave the conditions needed to determine whether a quintic equation could be solved using the aforementioned operations or not.

The groundbreaking strategy that Galois employed was to study the permutations of roots of polynomial equations. These permutations are the same as the field automorphisms of the smallest field extension (see Algebraic Numbers for the definition of a field extension) that contains these roots (called the splitting field of the polynomial equation) which also fix the field of coefficients of the polynomial.

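By a “field automorphism” we mean a bijective function f from a field to itself such that the following conditions are satisfied for all elements a and b of the field:

\displaystyle f(a+b)=f(a)+f(b)

\displaystyle f(ab)=f(a)f(b)

By “fix” we mean that if a is an element of the field of coefficients of the polynomial equation, then we must have

\displaystyle f(a)=a.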

We might perhaps do better by discussing an example. We do not delve straight into quintic equations, and consider first the much simpler case of a quadratic equation such as x^{2}+1=0. We consider the polynomial x^{2}+1 as having coefficients in the field \mathbb{Q} of rational numbers. The roots of this equation are i and -i, and the splitting field is the field \mathbb{Q}[i].

Since there are only two roots, we only have two permutations of these roots. One is the identity permutation, which sends i to i and -i to -i, and the other is the permutation that exchanges the two, sending i to -i and -i to i. The first one corresponds to the identity field automorphism of \mathbb{Q}[i], while the second one corresponds to the complex conjugation field automorphism of \mathbb{Q}[i]. Both these permutations preserve \mathbb{Q}.
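That complex conjugation really is a field automorphism of \mathbb{Q}[i] preserving \mathbb{Q} can be spot-checked with exact arithmetic. In this sketch an element a+bi of \mathbb{Q}[i] is stored as the pair (a, b) of rational numbers; the representation and function names are our own.

```python
from fractions import Fraction

def add(x, y):
    return (x[0] + y[0], x[1] + y[1])

def mul(x, y):
    # (a + bi)(c + di) = (ac - bd) + (ad + bc)i
    return (x[0] * y[0] - x[1] * y[1], x[0] * y[1] + x[1] * y[0])

def conjugate(x):
    return (x[0], -x[1])

u = (Fraction(1, 2), Fraction(3))
v = (Fraction(-2, 3), Fraction(5, 7))
rational = (Fraction(4, 9), Fraction(0))
i = (Fraction(0), Fraction(1))

print(conjugate(add(u, v)) == add(conjugate(u), conjugate(v)))  # True
print(conjugate(mul(u, v)) == mul(conjugate(u), conjugate(v)))  # True
print(conjugate(rational) == rational)                          # True
print(conjugate(i))  # (Fraction(0, 1), Fraction(-1, 1)), i.e. -i
```

A check on two sample elements is of course not a proof, but the identities hold for all elements by the same algebra.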

These permutations (or field automorphisms) form a group (see Groups), which is what we refer to as the Galois group of the field extension (the splitting field, considered as a field extension of the field of coefficients of the polynomial) or the polynomial.

The idea is that the “structure” of the Galois group, as a group, is related to the “structure” of the field extension. For example, the subgroups of the Galois groups correspond to the “intermediate fields” contained in the splitting field but containing the field of coefficients of the polynomial.

Using this idea, Galois showed that whenever the Galois group of an irreducible quintic polynomial is the symmetric group S_{5} (the group of permutations of the set with 5 elements) or the alternating group A_{5} (the group of “even” permutations of the set with 5 elements), then the polynomial cannot be solved using a finite number of additions, subtractions, multiplications, divisions, and extractions of roots. This happens, for example, when the irreducible quintic polynomial has rational coefficients and exactly three real roots, as in the case of x^{5}-16x+2. More details of the proof can be found in the last chapter of the book Algebra by Michael Artin.
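The claim about the real roots of x^{5}-16x+2 can be checked by hand or with a few lines of code. The sketch below evaluates the polynomial at a handful of integers and counts sign changes; each sign change marks a real root in between. The sample range is our own choice.

```python
def f(x):
    return x**5 - 16 * x + 2

def sign_changes(values):
    signs = [v > 0 for v in values if v != 0]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)

samples = [f(x) for x in range(-3, 4)]
print(samples)                # [-193, 2, 17, 2, -13, 2, 197]
print(sign_changes(samples))  # 3
```

This shows there are at least three real roots. There cannot be more, since the derivative 5x^{4}-16 has only two real zeroes, so the remaining two roots are complex.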

Although Galois groups were initially developed to deal with problems regarding the solvability of polynomial equations, they have found applications beyond this original purpose and have become a very important part of many aspects of modern mathematics, especially in (but not limited to, rather surprisingly) number theory.

For example, the study of “representations” of Galois groups in terms of linear transformations of vector spaces (see Vector Spaces, Modules, and Linear Algebra) is an important part of the proof of the famous Fermat’s Last Theorem by the mathematician Andrew Wiles in 1994. A very active field of research in the present day related to representations of Galois groups is called the Langlands program. In particular, what is usually being studied is the “absolute” Galois group – the group of field automorphisms of the field of all algebraic numbers that fix the field \mathbb{Q} of rational numbers. A book that makes these ideas accessible to a more general audience is Fearless Symmetry: Exposing the Hidden Patterns of Numbers by Avner Ash and Robert Gross.


Galois Theory on Wikipedia

Galois Group on Wikipedia

Wiles’ Proof of Fermat’s Last Theorem on Wikipedia

Langlands Program on Wikipedia

Fearless Symmetry: Exposing the Hidden Patterns of Numbers by Avner Ash and Robert Gross

Algebra by Michael Artin

Rotating and Reflecting Vectors Using Matrices

In Vector Spaces, Modules, and Linear Algebra we learned about vectors, and defined them as elements of a set that is closed under addition and scalar multiplication. This is a pretty abstract concept, and in that post we used an example of “apples and oranges” to express it. However we also mentioned that many other things are vectors; for instance, states in quantum mechanics, and quantities with a magnitude and direction, such as forces. It is these quantities with a magnitude and direction that we will focus on in this post.

We will use the language that we developed in Matrices in order to make things more concrete. We will focus on two dimensions only in this post, in order to simplify things, although it will not be difficult to generalize to higher dimensions. We develop first a convention. The vector

\displaystyle \left(\begin{array}{c}1\\0\end{array}\right)

represents a quantity with magnitude “1” (meter, or meter per second, or Newton, etc.) going to the right (or east). Similarly, the vector

\displaystyle \left(\begin{array}{c}-1\\0\end{array}\right)

represents a quantity with magnitude 1 going to the left (or west). Meanwhile, the vector

\displaystyle \left(\begin{array}{c}0\\1\end{array}\right)

represents a quantity with magnitude 1 going upward (or to the north). Finally, the vector

\displaystyle \left(\begin{array}{c}0\\-1\end{array}\right)

represents a quantity with magnitude 1 going downward (or to the south). These vectors we have enumerated all have magnitude 1, therefore they are also called unit vectors. Since they are vectors, we can “scale” them or add or subtract them from each other to form new vectors. For example, we can “double” the upward-pointing unit vector,

\displaystyle 2\left(\begin{array}{c}0\\1\end{array}\right)=\left(\begin{array}{c}0\\2\end{array}\right)

to obtain a vector again pointing upward but with a magnitude of 2. We can also “add” the right-pointing unit vector to the upward-pointing unit vector, as follows:

\displaystyle \left(\begin{array}{c}1\\0\end{array}\right)+\left(\begin{array}{c}0\\1\end{array}\right)=\left(\begin{array}{c}1\\1\end{array}\right)

We can easily infer that this vector will point “diagonally” upward and to the right (or to the northeast). But what will be its magnitude? For this we introduce the concept of the transpose. The transpose of a matrix is just another matrix but with its rows and columns interchanged. For a column matrix, we have only one column, so its transpose is a matrix with only one row, as follows:

\displaystyle \left(\begin{array}{c}a\\b\end{array}\right)^{T}=\left(\begin{array}{cc}a&b\end{array}\right)

Now, to take the magnitude of a vector, we take the square root of the product of the transpose of a vector and the vector itself. Note that the multiplication of matrices is not commutative, so it is important that the row matrix be on the left and the column matrix (the vector) be on the right. It is the only way we will obtain an ordinary number from the matrices.

Applying the rules of matrix multiplication, we see that for a vector

\displaystyle \left(\begin{array}{c}a\\b\end{array}\right)

the magnitude will be given by the square root of

\displaystyle \left(\begin{array}{cc}a&b\end{array}\right) \left(\begin{array}{c}a\\b\end{array}\right)=a^{2}+b^{2}

This should be reminiscent of the Pythagorean theorem. As we have already seen in From Pythagoras to Einstein, this ancient theorem always shows up in many aspects of modern mathematics and physics. Going back to our example of the vector

\displaystyle \left(\begin{array}{c}1\\1\end{array}\right)

we can now compute its magnitude. Multiplying the transpose of this vector and the vector itself, in the proper order, we obtain

\displaystyle \left(\begin{array}{cc}1&1\end{array}\right) \left(\begin{array}{c}1\\1\end{array}\right)=1^{2}+1^{2}=2

and taking the square root of this number, we see that the magnitude of our vector is equal to \sqrt{2}.
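The recipe above (multiply the row matrix by the column matrix, then take the square root) can be sketched in a few lines of Python. This is only an illustration; the function name `magnitude` and the use of plain lists for vectors are assumptions made here, not notation from the post.

```python
import math

def magnitude(v):
    """Return the magnitude of a vector given as a list of components.

    This mirrors the row-times-column product v^T v: multiply matching
    components, add the results, then take the square root.
    """
    return math.sqrt(sum(component * component for component in v))

diagonal = [1, 1]            # the vector (1, 1) from the text
doubled_up = [0, 2]          # the doubled upward-pointing unit vector

print(magnitude(diagonal))   # the square root of 2
print(magnitude(doubled_up)) # 2.0
```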

In Matrices we mentioned that a square matrix may be used to describe linear transformations between vectors. Now that we have used the language of vectors to describe quantities with magnitude and direction, we also show a very special kind of linear transformation – one that sends a vector to another vector with the same value of the magnitude, but “rotated” or “reflected”, i.e. with a different direction. We may say that this linear transformation describes the “operation” of rotation or reflection. This analogy is the reason why linear transformations from a vector space to itself are also often referred to as linear operators, especially in quantum mechanics.

We make this idea clearer with an explicit example. Consider the matrix

\displaystyle \left(\begin{array}{cc}0&-1\\ 1&0\end{array}\right)

We look at its effect on some vectors:

\displaystyle \left(\begin{array}{cc}0&-1\\ 1&0\end{array}\right)\left(\begin{array}{c}1\\0\end{array}\right)=\left(\begin{array}{c}0\\1\end{array}\right)

\displaystyle \left(\begin{array}{cc}0&-1\\ 1&0\end{array}\right)\left(\begin{array}{c}0\\1\end{array}\right)=\left(\begin{array}{c}-1\\0\end{array}\right)

\displaystyle \left(\begin{array}{cc}0&-1\\ 1&0\end{array}\right)\left(\begin{array}{c}-1\\0\end{array}\right)=\left(\begin{array}{c}0\\-1\end{array}\right)

\displaystyle \left(\begin{array}{cc}0&-1\\ 1&0\end{array}\right)\left(\begin{array}{c}0\\-1\end{array}\right)=\left(\begin{array}{c}1\\0\end{array}\right)

From these basic examples one may infer that our matrix represents a counterclockwise “rotation” of ninety degrees. The reader is encouraged to visualize (or better yet draw) how this is so. In fact, we can express a counterclockwise rotation of any angle \theta using the matrix

\displaystyle \left(\begin{array}{cc}\text{cos }\theta&-\text{sin }\theta\\ \text{sin }\theta&\text{cos }\theta\end{array}\right)
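As a quick numerical check of this general formula, the following Python sketch builds the matrix for a given angle and applies it to the right-pointing unit vector. The helper names `rotation` and `apply` are hypothetical, and angles are assumed to be in radians.

```python
import math

def rotation(theta):
    """2x2 matrix for a counterclockwise rotation by theta (in radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s],
            [s,  c]]

def apply(matrix, v):
    """Multiply a 2x2 matrix by a column vector given as [x, y]."""
    return [matrix[0][0] * v[0] + matrix[0][1] * v[1],
            matrix[1][0] * v[0] + matrix[1][1] * v[1]]

quarter_turn = rotation(math.pi / 2)   # ninety degrees
east = [1, 0]
rotated = apply(quarter_turn, east)

# Floating-point cosines are only approximately zero, so round for display.
print([round(x, 10) for x in rotated])
```

Up to rounding error, the result is the upward-pointing unit vector, in agreement with the first of the four products computed earlier.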

We consider next another matrix, given by

\displaystyle \left(\begin{array}{cc}1&0\\ 0&-1\end{array}\right)

We likewise look at its effect on some vectors:

\displaystyle \left(\begin{array}{cc}1&0\\ 0&-1\end{array}\right)\left(\begin{array}{c}1\\0\end{array}\right)=\left(\begin{array}{c}1\\0\end{array}\right)

\displaystyle \left(\begin{array}{cc}1&0\\ 0&-1\end{array}\right)\left(\begin{array}{c}0\\1\end{array}\right)=\left(\begin{array}{c}0\\-1\end{array}\right)

\displaystyle \left(\begin{array}{cc}1&0\\ 0&-1\end{array}\right)\left(\begin{array}{c}-1\\0\end{array}\right)=\left(\begin{array}{c}-1\\0\end{array}\right)

\displaystyle \left(\begin{array}{cc}1&0\\ 0&-1\end{array}\right)\left(\begin{array}{c}0\\-1\end{array}\right)=\left(\begin{array}{c}0\\1\end{array}\right)

What we see now is that this matrix represents a “reflection” across the horizontal axis. More generally, a reflection across the line through the origin that makes an angle of \frac{\theta}{2} with the horizontal axis is represented by the matrix

\displaystyle \left(\begin{array}{cc}\text{cos }\theta&\text{sin }\theta\\ \text{sin }\theta&-\text{cos }\theta\end{array}\right)
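One way to convince oneself that this matrix reflects across the line at angle \frac{\theta}{2} is to check that a vector lying on that line is left unchanged, while a vector perpendicular to it is negated. The Python sketch below does this for one arbitrarily chosen angle; the helper names are hypothetical.

```python
import math

def reflection(theta):
    """2x2 matrix reflecting across the line at angle theta/2 from the horizontal."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c,  s],
            [s, -c]]

def apply(matrix, v):
    """Multiply a 2x2 matrix by a column vector given as [x, y]."""
    return [matrix[0][0] * v[0] + matrix[0][1] * v[1],
            matrix[1][0] * v[0] + matrix[1][1] * v[1]]

theta = math.pi / 3                                     # mirror line at 30 degrees (arbitrary choice)
mirror = reflection(theta)
on_line = [math.cos(theta / 2), math.sin(theta / 2)]    # lies on the mirror line
across = [-math.sin(theta / 2), math.cos(theta / 2)]    # perpendicular to the mirror line

fixed = apply(mirror, on_line)     # expected to equal on_line
flipped = apply(mirror, across)    # expected to equal the negative of across
```

Setting \theta=0 recovers the matrix of the previous example, whose mirror line is the horizontal axis.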

The matrices representing rotations and reflections form a group (see Groups) called the orthogonal group. Since we are only looking at rotations and reflections in the plane, i.e. in two dimensions, it is more properly referred to as the orthogonal group in dimension 2, written \text{O}(2). The matrices representing rotations form a subgroup (a subset of a group that is itself also a group) of the orthogonal group in dimension 2, called the special orthogonal group in dimension 2 and written \text{SO}(2).

The reader is encouraged to review the concept of a group as discussed in Groups, but intuitively what this means is that if we multiply two matrices representing counterclockwise rotations of angles \alpha and \beta, we get a matrix which represents a counterclockwise rotation of angle \alpha+\beta. In other words, we can “compose” rotations; the composition is associative, possesses an “identity” (a rotation of zero degrees), and for every counterclockwise rotation of angle \theta there is an “inverse” (a clockwise rotation of angle \theta, which is also represented as a counterclockwise rotation of angle -\theta).


\displaystyle \left(\begin{array}{cc}\text{cos }\alpha&-\text{sin }\alpha\\ \text{sin }\alpha&\text{cos }\alpha\end{array}\right)\left(\begin{array}{cc}\text{cos }\beta&-\text{sin }\beta\\ \text{sin }\beta&\text{cos }\beta\end{array}\right)=\left(\begin{array}{cc}\text{cos}(\alpha+\beta)&-\text{sin}(\alpha+\beta)\\ \text{sin}(\alpha+\beta)&\text{cos}(\alpha+\beta)\end{array}\right)

It can be a fun exercise to derive this equation using the laws of matrix multiplication and the addition formulas for the sine and cosine functions from basic trigonometry.
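Before deriving the identity by hand, it can be reassuring to test it numerically. The following Python sketch multiplies two rotation matrices and compares the result, entry by entry, with the rotation matrix for the sum of the angles. The angles 0.7 and 1.9 (in radians) are arbitrary, and the helper names are hypothetical.

```python
import math

def rotation(theta):
    """2x2 matrix for a counterclockwise rotation by theta (in radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s],
            [s,  c]]

def matmul2(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

alpha, beta = 0.7, 1.9
product = matmul2(rotation(alpha), rotation(beta))
combined = rotation(alpha + beta)

# Compare with a small tolerance, since floating-point arithmetic is inexact.
agree = all(math.isclose(product[i][j], combined[i][j], abs_tol=1e-12)
            for i in range(2) for j in range(2))
print(agree)  # True
```

A numerical check like this is of course not a proof; the derivation with the addition formulas is what establishes the identity for all angles.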

This is what it means for \text{SO}(2), the matrices representing rotations, to form a group. Reflections can also be considered in addition to rotations, and reflections and rotations can be composed with each other. This is what it means for \text{O}(2), the matrices representing rotations and reflections, to form a group. The matrices representing reflections alone do not form a group however, since the composition of two reflections is not a reflection, but a rotation.
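The last claim can also be checked numerically. Multiplying out the reflection matrices for angles \theta_{1} and \theta_{2} (symbols introduced here only for this illustration) and using the subtraction formulas for sine and cosine gives the rotation matrix for the angle \theta_{1}-\theta_{2}. The Python sketch below confirms this for one arbitrary pair of angles; the helper names are hypothetical.

```python
import math

def rotation(theta):
    """2x2 matrix for a counterclockwise rotation by theta (in radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s],
            [s,  c]]

def reflection(theta):
    """2x2 matrix reflecting across the line at angle theta/2 from the horizontal."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c,  s],
            [s, -c]]

def matmul2(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

theta1, theta2 = 1.1, 0.3                      # arbitrary angles in radians
composite = matmul2(reflection(theta1), reflection(theta2))
expected = rotation(theta1 - theta2)

is_rotation = all(math.isclose(composite[i][j], expected[i][j], abs_tol=1e-12)
                  for i in range(2) for j in range(2))
print(is_rotation)  # True
```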

Technically, the distinction between the matrices representing rotations and the matrices representing reflections can be seen by examining the determinant, which is a concept we will leave to the references for now.

It is worth repeating how we defined the orthogonal group \text{O}(2) technically – it is the group of matrices that preserve the magnitudes of vectors. This gives us some intuition as to why they are so special. There are other equivalent definitions of \text{O}(2). For example, they can also be defined as the matrices A which satisfy the equation

\displaystyle AA^{T}=A^{T}A=I

where the matrix A^{T} is the transpose of the matrix A, which is given by interchanging the rows and the columns of A, as discussed earlier, and

\displaystyle I=\left(\begin{array}{cc}1&0\\ 0&1\end{array}\right)

is the “identity” matrix, which multiplied to any other matrix A (on either side) just gives back A. This may also be expressed by saying that the group \text{O}(2) is made up of exactly those matrices whose transpose is equal to their inverse.
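The condition AA^{T}=I is easy to test by machine. The Python sketch below checks it for a rotation matrix, a reflection matrix, and, for contrast, a “shear” matrix that does not preserve magnitudes. The function names, the angle 0.4, and the shear example are assumptions chosen for this illustration.

```python
import math

def transpose(a):
    """Interchange the rows and columns of a matrix given as nested lists."""
    return [list(row) for row in zip(*a)]

def matmul(a, b):
    """Matrix product of a (j x k) and b (k x l), both given as nested lists."""
    return [[sum(a[i][n] * b[n][j] for n in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def is_orthogonal(a, tol=1e-12):
    """Check A A^T = I up to a tolerance (for square A this also gives A^T A = I)."""
    size = len(a)
    product = matmul(a, transpose(a))
    return all(math.isclose(product[i][j], 1.0 if i == j else 0.0, abs_tol=tol)
               for i in range(size) for j in range(size))

theta = 0.4
c, s = math.cos(theta), math.sin(theta)
rotation = [[c, -s], [s, c]]
reflection = [[c, s], [s, -c]]
shear = [[1, 1], [0, 1]]        # not orthogonal: it stretches the vector (0, 1)

print(is_orthogonal(rotation), is_orthogonal(reflection), is_orthogonal(shear))
```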

In summary, we have shown in this post one specific aspect of vector spaces and linear transformations between vector spaces, and “fleshed out” the rather skeletal framework of sets that are closed under addition and scalar multiplication, and functions that respect this structure. It is important to note of course, that the applications of vector spaces and linear transformations are by no means limited to describing quantities with magnitude and direction.

Another concept that we have “fleshed out” in this post is the concept of groups, which we have only treated rather abstractly in Groups. We have also been using the concept of groups in algebraic topology, in particular homotopy groups in Homotopy Theory and homology groups and cohomology groups in Homology and Cohomology, but it is perhaps the example of the orthogonal group, or even better the special orthogonal group, where we have intuitive and concrete examples of the concept. Rotations can be composed, the composition is associative, there exists an “identity”, and there exists an “inverse” for every element. The same holds for rotations and reflections together.

These two subjects that we have discussed in this post, namely linear algebra and group theory, are in fact closely related. The subject that studies these two subjects in relation to one another is called representation theory, and it is a very important part of modern mathematics.


Orthogonal Matrix on Wikipedia

Orthogonal Group on Wikipedia

Algebra by Michael Artin


We discussed linear algebra in Vector Spaces, Modules, and Linear Algebra, and there we focused on “finite-dimensional” vector spaces (the concept of dimension for vector spaces was discussed in More on Vector Spaces and Modules), writing vectors in the form

\displaystyle \left(\begin{array}{c}a\\b\end{array}\right)

Vectors need not be written in this way, since the definition of the concept of vector space only required that it be a set closed under addition and scalar multiplication. For example, we could have just denoted vectors by v, or, in quantum mechanics, we use what we call “Dirac notation”, writing vectors as |\psi\rangle.

However, the notation that we used in Vector Spaces, Modules, and Linear Algebra is quite convenient; it allowed us to display explicitly the “components”; if we declare that our scalars, for example, be the set of real numbers \mathbb{R}, and that our vector space is the set of all vectors of the form

\displaystyle \left(\begin{array}{c}a\\b\end{array}\right)

where a,b\in \mathbb{R}, then we already know that we can use the following vectors for our basis:

\displaystyle \left(\begin{array}{c}1\\0\end{array}\right)


\displaystyle \left(\begin{array}{c}0\\1\end{array}\right)

since any vector can be expressed uniquely as a linear combination

\displaystyle \left(\begin{array}{c}a\\b\end{array}\right)=a\left(\begin{array}{c}1\\0\end{array}\right)+b\left(\begin{array}{c}0\\1\end{array}\right)

It is also quite easy to see that our vector space here has dimension 2. What we have done is express our vector as a matrix, more specifically a column matrix. A matrix is a rectangular array of numbers (which we refer to as its “entries”), with some specific properties as we will discuss later. If a matrix has m rows and n columns, we refer to it as an m\times n matrix. A matrix that has only one row is often referred to as a row matrix, and a matrix with only one column, as we have been using to express our vectors up to now, is referred to as a column matrix. A matrix with the same number of columns and rows is referred to as a square matrix.

Here are some examples of matrices (with real number entries):

\displaystyle \left(\begin{array}{cc}1&-0.25\\ 100&0\\2&-5\end{array}\right)        (3\times 2 matrix)

\displaystyle \left(\begin{array}{cc}1&0\\ 0&\frac{3}{2}\end{array}\right)        (2\times 2 square matrix)

\displaystyle \left(\begin{array}{cccc}1&27&-\frac{4}{5}&10\\ \end{array}\right)       (1\times 4 row matrix)

We will often adopt the notation that the entry in the first row and first column of a matrix A will be labeled by A_{1,1}, the entry in the second row and first column of the same matrix will be labeled A_{2,1}, and so on. Since we often denote vectors by v, we will denote its first component (the entry in the first row) by v_{1}, the second component by v_{2}, and so on.

We can perform operations on matrices. The set of m\times n matrices, for fixed m and n, forms a vector space, which means we can “scale” them or multiply them by a “scalar”, and we can also add or subtract them from each other. This is done “componentwise”, i.e.

\displaystyle c\left(\begin{array}{cc}A_{1,1}&A_{1,2}\\ A_{2,1}&A_{2,2}\end{array}\right)=\left(\begin{array}{cc}cA_{1,1}&cA_{1,2}\\ cA_{2,1}&cA_{2,2}\end{array}\right)

\displaystyle \left(\begin{array}{cc}A_{1,1}&A_{1,2}\\ A_{2,1}&A_{2,2}\end{array}\right)+\left(\begin{array}{cc}B_{1,1}&B_{1,2}\\ B_{2,1}&B_{2,2}\end{array}\right)=\left(\begin{array}{cc}A_{1,1}+B_{1,1}&A_{1,2}+B_{1,2}\\ A_{2,1}+B_{2,1}&A_{2,2}+B_{2,2}\end{array}\right)

\displaystyle \left(\begin{array}{cc}A_{1,1}&A_{1,2}\\ A_{2,1}&A_{2,2}\end{array}\right)-\left(\begin{array}{cc}B_{1,1}&B_{1,2}\\ B_{2,1}&B_{2,2}\end{array}\right)=\left(\begin{array}{cc}A_{1,1}-B_{1,1}&A_{1,2}-B_{1,2}\\ A_{2,1}-B_{2,1}&A_{2,2}-B_{2,2}\end{array}\right)

Multiplication of matrices is more complicated. A j\times k matrix can be multiplied by a k\times l matrix to form a j\times l matrix. Note that the number of columns of the first matrix must be equal to the number of rows of the second matrix. Note also that multiplication of matrices is not commutative; a product AB of two matrices A and B may not be equal to the product BA of the same matrices, contrary to what we find in the multiplication of ordinary numbers.

The procedure for obtaining the entries of this product matrix is as follows: Let’s denote the product of the j\times k matrix A and the k\times l matrix B by AB (this is a j\times l matrix, as we have mentioned above) and let (AB)_{m,n} be its entry in the m-th row and n-th column. Then

\displaystyle (AB)_{m,n}=\sum_{i=1}^{k}A_{m,i}B_{i,n}

For example, we may have

\displaystyle \left(\begin{array}{cc}1&-3\\ 2&0\\-2&6\end{array}\right) \left(\begin{array}{cccc}5&-2&0&1\\ 0&1&-1&4\end{array}\right)=\left(\begin{array}{cccc}(1)(5)+(-3)(0)&(1)(-2)+(-3)(1)&(1)(0)+(-3)(-1)&(1)(1)+(-3)(4)\\ (2)(5)+(0)(0)&(2)(-2)+(0)(1)&(2)(0)+(0)(-1)&(2)(1)+(0)(4)\\(-2)(5)+(6)(0)&(-2)(-2)+(6)(1)&(-2)(0)+(6)(-1)&(-2)(1)+(6)(4)\end{array}\right)

\displaystyle \left(\begin{array}{cc}1&-3\\ 2&0\\-2&6\end{array}\right) \left(\begin{array}{cccc}5&-2&0&1\\ 0&1&-1&4\end{array}\right)=\left(\begin{array}{cccc}5&-5&3&-11\\ 10&-4&0&2\\-10&10&-6&22\end{array}\right)

We highlight the following step to obtain the entry in the first row and first column:

\displaystyle \left(\begin{array}{cc}\mathbf{1}&\mathbf{-3}\\ 2&0\\-2&6\end{array}\right) \left(\begin{array}{cccc}\mathbf{5}&-2&0&1\\ \mathbf{0}&1&-1&4\end{array}\right)=\left(\begin{array}{cccc}\mathbf{(1)(5)+(-3)(0)}&(1)(-2)+(-3)(1)&(1)(0)+(-3)(-1)&(1)(1)+(-3)(4)\\ (2)(5)+(0)(0)&(2)(-2)+(0)(1)&(2)(0)+(0)(-1)&(2)(1)+(0)(4)\\(-2)(5)+(6)(0)&(-2)(-2)+(6)(1)&(-2)(0)+(6)(-1)&(-2)(1)+(6)(4)\end{array}\right)
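The summation formula translates almost directly into code. The Python sketch below is a minimal, unoptimized implementation using nested lists (the function name `matmul` is chosen here for illustration), applied to the same pair of matrices as in the worked example.

```python
def matmul(a, b):
    """Product of a (j x k) matrix and a (k x l) matrix given as nested lists.

    The entry in row m and column n of the product is the sum over i of
    a[m][i] * b[i][n], exactly as in the summation formula.
    """
    k = len(b)
    if any(len(row) != k for row in a):
        raise ValueError("columns of the first matrix must match rows of the second")
    return [[sum(a[m][i] * b[i][n] for i in range(k))
             for n in range(len(b[0]))]
            for m in range(len(a))]

A = [[1, -3],
     [2, 0],
     [-2, 6]]
B = [[5, -2, 0, 1],
     [0, 1, -1, 4]]

for row in matmul(A, B):
    print(row)
# [5, -5, 3, -11]
# [10, -4, 0, 2]
# [-10, 10, -6, 22]
```

Note that `matmul(B, A)` raises an error here: B has 4 columns while A has 3 rows, so the product in the opposite order is not even defined.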

Now that we know how to multiply matrices, we now go back to vectors, which can always be written as column matrices. For the sake of simplicity we continue to restrict ourselves to finite-dimensional vector spaces. We have seen that writing vectors as column matrices provides us with several conveniences. Other kinds of matrices are also useful in studying vector spaces.

For instance, we noted in Vector Spaces, Modules, and Linear Algebra that an important kind of function between vector spaces (of the same dimension) are the linear transformations, which are functions f(v) such that f(av)=af(v) and f(u+v)=f(u)+f(v). We note that if A is an n\times n square matrix, and v is an n\times 1 column matrix, then the product Av is another n\times 1 column matrix. It is a theorem that all linear transformations between n-dimensional vector spaces can be written as an n\times n square matrix.

We also have functions from a vector space to the set of its scalars, which are sometimes referred to as functionals. The linear functionals, i.e. the functionals f(v) such that f(av)=af(v) and f(u+v)=f(u)+f(v), are represented by multiplying a column matrix on the left by a row matrix (the number of their entries must be the same, as per the rules of matrix multiplication). For instance, we may have

\displaystyle \left(\begin{array}{cccc}u_{1}&u_{2}&u_{3}&u_{4} \end{array}\right)\left(\begin{array}{c}v_{1}\\v_{2}\\v_{3}\\v_{4}\end{array}\right) = u_{1}v_{1}+u_{2}v_{2}+u_{3}v_{3}+u_{4}v_{4}
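To see that a row matrix really does act as a linear functional, one can check the two defining properties on sample vectors. In the Python sketch below, the row matrix and the vectors are arbitrary numbers picked for illustration, and the helper name `functional` is hypothetical.

```python
def functional(u):
    """Return the linear functional v -> u^T v determined by the row matrix u."""
    def evaluate(v):
        if len(u) != len(v):
            raise ValueError("row and column must have the same number of entries")
        return sum(ui * vi for ui, vi in zip(u, v))
    return evaluate

f = functional([1, 0, -2, 3])
v = [4, 5, 6, 7]
w = [1, 1, 1, 1]

v_plus_w = [vi + wi for vi, wi in zip(v, w)]
three_v = [3 * vi for vi in v]

print(f(v), f(w))              # 13 2
print(f(v_plus_w))             # 15, which equals f(v) + f(w)
print(f(three_v))              # 39, which equals 3 * f(v)
```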

Note that the right side is just a real number (or complex number, or perhaps most generally, an element of the field of scalars of our vector space).

Matrices are rather ubiquitous in mathematics (and also in physics). In fact, some might teach the subject of linear algebra with the focus on matrices first. Here, however, we have taken the view of introducing first the abstract idea of vector spaces, and with matrices being viewed as a method of making these abstract ideas of vectors, linear transformations, and linear functionals more “concrete”. At the very heart of linear algebra still remains the idea of a set whose elements can be added and scaled, and functions between these sets that “respect” the addition and scaling. But when we want to actually compute things, matrices will often come in handy.


Matrix on Wikipedia

Linear Algebra Done Right by Sheldon Axler

Algebra by Michael Artin

Abstract Algebra by David S. Dummit and Richard M. Foote

More on Chain Complexes

In Homology and Cohomology we used the concept of chain complexes to investigate topological spaces. In Exact Sequences we saw examples of chain complexes generalized to abelian groups other than that made out of topological spaces. In this post we study chain complexes in the context of linear algebra (see Vector Spaces, Modules, and Linear Algebra).

We start with some definitions regarding modules. In More on Vector Spaces and Modules we gave the definition of a basis of a vector space. It is known that any vector space can always have a basis. However, the same is not true for modules. It is only a certain special kind of module called a free module which has the property that one can always find a basis for it.

Alternatively, a free module over a ring R may be thought of as being a module that is isomorphic to a direct sum of several copies of the ring R.

An example of a module that is not free is the module \mathbb{Z}/2\mathbb{Z} over the ring \mathbb{Z}. It is a module over \mathbb{Z} since it is closed under addition and under multiplication by any element of \mathbb{Z}. However, it has no basis, i.e. no set of elements in terms of which every element can be written as a unique linear combination, nor is it a direct sum of copies of \mathbb{Z}.

Although not all modules are free, it is actually a theorem that any module is a quotient of a free module. Let A be a module over a ring R. The theorem says that this module is the quotient of some free module, which we denote by F_{0}, by some other module which we denote by K_{1}. In other words,

A\cong F_{0}/K_{1}

We can write this as the following chain complex, which also happens to be an exact sequence (see Exact Sequences):

0\rightarrow K_{1}\xrightarrow{i_{1}} F_{0}\xrightarrow{\epsilon} A\rightarrow 0

We know that the module F_{0} is free. However, we do not know if the same holds true for K_{1}. Regardless, the theorem says that any module is a quotient of a free module. Therefore we can write

0\rightarrow K_{2}\xrightarrow{i_{2}} F_{1}\xrightarrow{\epsilon_{1}} K_{1}\rightarrow 0

We can therefore put these chain complexes together to get

0\rightarrow K_{2}\xrightarrow{i_{2}} F_{1}\xrightarrow{\epsilon_{1}} K_{1}\xrightarrow{i_{1}} F_{0}\xrightarrow{\epsilon} A\rightarrow 0

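However, this sequence of modules and morphisms is not in general a chain complex, since the image of \epsilon_{1} is not contained in the kernel of i_{1}: the morphism \epsilon_{1} is surjective, so its image is all of K_{1}, while i_{1} is injective, so its kernel is zero. But if we compose these two maps together, we obtain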

0\rightarrow K_{2}\xrightarrow{i_{2}} F_{1}\xrightarrow{d_{1} }F_{0}\xrightarrow{\epsilon} A\rightarrow 0

where d_{1}=i_{1}\circ \epsilon_{1}. This is a chain complex as one may check. We can keep repeating the process indefinitely to obtain

...\xrightarrow{d_{3}} F_{2}\xrightarrow{d_{2} } F_{1}\xrightarrow{d_{1} } F_{0}\xrightarrow{\epsilon} A\rightarrow 0

This chain complex is called a free resolution of A. A free resolution is another example of an exact sequence.
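As a small worked example (added here for illustration), consider again the module \mathbb{Z}/2\mathbb{Z} over the ring \mathbb{Z}. We can take F_{0}=\mathbb{Z}, with \epsilon sending every integer to its remainder when divided by 2. The kernel K_{1} is the set of even integers 2\mathbb{Z}, which is itself a free module, since every even integer can be written uniquely as a multiple of 2. Taking F_{1}=\mathbb{Z} and letting d_{1} be multiplication by 2, the process stops after one step and we obtain the free resolution

0\rightarrow \mathbb{Z}\xrightarrow{d_{1}} \mathbb{Z}\xrightarrow{\epsilon} \mathbb{Z}/2\mathbb{Z}\rightarrow 0

where all the free modules F_{n} for n greater than 1 are zero.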

We now introduce two more special kinds of modules.

A projective module is a module P such that for any surjective morphism p: A\rightarrow A'' between two modules A and A'' and any morphism h: P\rightarrow A'', there exists a morphism g: P\rightarrow A such that p\circ g=h.

It is a theorem that a module is projective if and only if it is a direct summand of a free module. This also means that a free module is automatically also projective.

An injective module is a module E such that for any injective morphism i: A\rightarrow B between two modules A and B and any morphism f: A\rightarrow E, there exists a morphism g: B\rightarrow E such that g\circ i=f.

Similar to our discussion regarding free resolutions earlier, we can also have projective resolutions and injective resolutions. A projective resolution is a chain complex

...\xrightarrow{d_{3}} P_{2}\xrightarrow{d_{2} } P_{1}\xrightarrow{d_{1} } P_{0}\xrightarrow{\epsilon} A\rightarrow 0

such that the P_{n} are projective modules.

Meanwhile, an injective resolution is a chain complex

0\rightarrow A\xrightarrow{\eta} E^{0}\xrightarrow{d^{0} } E^{1}\xrightarrow{d^{1}} E^{2}\xrightarrow{d^{2}} ...

such that the E^{n} are injective modules.

Since projective and injective resolutions are chain complexes, we can use the methods of homology and cohomology to study them (see Homology and Cohomology) even though they may not come from topological spaces. However, the usual procedure is to consider these chain complexes as forming an “abelian category” and then to apply certain functors (see Category Theory), such as the “Tensor” and “Hom” functors, before applying the methods of homology and cohomology, resulting in what are known as “derived functors”. This is all part of the subject known as homological algebra.


Free Module on Wikipedia

Projective Module on Wikipedia

Injective Module on Wikipedia

Resolution on Wikipedia

An Introduction to Homological Algebra by Joseph J. Rotman

Abstract Algebra by David S. Dummit and Richard M. Foote

Exact Sequences

In Homology and Cohomology we introduced the idea of chain complexes to help us obtain information about topological spaces. We recall that a chain complex is made up of abelian groups C_{n} and boundary homomorphisms \partial_{n}: C_{n}\rightarrow C_{n-1} such that for all n the composition of successive boundary homomorphisms \partial_{n-1}\circ \partial_{n}: C_{n}\rightarrow C_{n-2} sends every element in C_{n} to the zero element in C_{n-2}.

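Chain complexes can be expressed using the following diagram:

...\xrightarrow{\partial_{n+2}}C_{n+1}\xrightarrow{\partial_{n+1}}C_{n}\xrightarrow{\partial_{n}}C_{n-1}\xrightarrow{\partial_{n-1}}...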

We now abstract this idea, generalizing it so that the groups C_{n} do not necessarily have to come from topological spaces, and show an example of a chain complex that is ubiquitous in mathematics.

First we recall some ideas from Homology and Cohomology. Our “important principle” was summarized in the following statement:

All boundaries are cycles.

Boundaries in C_{n} are elements of the image of the boundary homomorphism \partial_{n+1}. Cycles in C_{n} are elements of the kernel of the boundary homomorphism \partial_{n}. Therefore, we can also state our “important principle” as follows:

\text{Im }\partial_{n+1}\subseteq \text{Ker }\partial_{n} for all n

This is of course just another restatement of the defining property of all chain complexes that two successive boundary functions when composed send every element of its domain to the zero element of its range.

There is an important kind of chain complex with the following property:

\text{Im }\partial_{n+1}=\text{Ker }\partial_{n} for all n

Such a chain complex is called an exact sequence. Sometimes we just say that the chain complex is exact. We will show some simple examples of exact sequences, but for these examples we will drop the notation of the boundary homomorphism \partial_{n} to show that many properties of ordinary functions can be expressed in terms of exact sequences.

Consider, for example, abelian groups A, B, and C. The identity elements of A, B, and C will be denoted by 0, writing 0\in A, 0\in B, and 0\in C if necessary. We will also write 0 to denote the trivial abelian group consisting only of the single element 0. Let us now look at the exact sequence

0\rightarrow A\xrightarrow{f} B

where 0\rightarrow A is the inclusion function sending 0\in 0 to 0\in A. The image of this inclusion function is therefore the single element 0\in A. By the defining property of exact sequences, this is also the kernel of the function f:A\rightarrow B. In other words, 0\in A is the only element that f sends to 0\in B. It is a property of group homomorphisms that whenever the kernel consists of only one element, the homomorphism is an injective, or one-to-one, function. This means that no two distinct elements of the domain get sent to the same element in the range. Since this function is also a homomorphism, it is also called a monomorphism.

Meanwhile, let us also consider the exact sequence

B\xrightarrow{g} C\rightarrow 0

where C\rightarrow 0 is the “constant” function that sends any element in C to 0. The kernel of this constant function is therefore the entirety of C. By the defining property of exact sequences, this is also the image of the function g:B\rightarrow C. In other words, the image of the function g is the entirety of C, or we can also say that every element of C is the image under g of some element of B. Such a function is called surjective, or onto. Since this function is also a homomorphism, it is also called an epimorphism.

The exact sequence

0\rightarrow A\xrightarrow{f} B\xrightarrow{g} C\rightarrow 0

is important in many branches of mathematics, and is called a short exact sequence. This means that f is a monomorphism, g is an epimorphism, and that \text{Im }f=\text{Ker }g in B. As an example of a short exact sequence of abelian groups, we have

0\rightarrow 2\mathbb{Z}\xrightarrow{f} \mathbb{Z}\xrightarrow{g} \mathbb{Z}/2\mathbb{Z}\rightarrow 0

(see also Modular Arithmetic and Quotient Sets). The monomorphism f takes the abelian group of even integers 2\mathbb{Z} and “embeds” them into the abelian group of the integers \mathbb{Z}. The epimorphism g then sends the integers in \mathbb{Z} to the element 0 in \mathbb{Z}/2\mathbb{Z} if they are even, and to the element 1 in  \mathbb{Z}/2\mathbb{Z} if they are odd. We see that every element in \mathbb{Z} that comes from 2\mathbb{Z}, i.e. the even integers, gets sent to the identity element or zero element 0 of the abelian group \mathbb{Z}/2\mathbb{Z}.
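The exactness of this sequence can be spot-checked on a finite window of integers. The Python sketch below is only an illustration under that restriction (a computer cannot loop over all of \mathbb{Z}); the window from -20 to 20 and the variable names are arbitrary choices.

```python
def f(n):
    """Inclusion of the even integers 2Z into Z; only defined on even input."""
    if n % 2 != 0:
        raise ValueError("f is only defined on even integers")
    return n

def g(n):
    """Reduction Z -> Z/2Z, with classes represented by 0 (even) and 1 (odd)."""
    return n % 2

window = range(-20, 21)                       # finite sample of Z
evens = [n for n in window if n % 2 == 0]     # finite sample of 2Z

image_of_f = {f(n) for n in evens}
kernel_of_g = {n for n in window if g(n) == 0}
values_of_g = {g(n) for n in window}

print(image_of_f == kernel_of_g)   # True: Im f = Ker g within the window
print(values_of_g == {0, 1})       # True: g reaches every element of Z/2Z
```

That f is one-to-one is clear, since it sends each even integer to itself.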

In the exact sequence

0\rightarrow A\xrightarrow{f} B\xrightarrow{g} C\rightarrow 0

the abelian group B is sometimes referred to as the extension of the abelian group C by the abelian group A.

We recall the definition of the homology groups H_{n}:

H_{n}=\text{Ker }\partial_{n}/\text{Im }\partial_{n+1}.

We can see from this definition that a chain complex is an exact sequence (we can also say that the chain complex is acyclic) if and only if all of its homology groups are zero. So in a way, the homology groups “measure” how much a chain complex “deviates” from being an exact sequence.

We also have the idea of a long exact sequence, which usually comes from the homology groups of chain complexes which themselves form a short exact sequence. In order to discuss this we first need the notion of a chain map between chain complexes. If we have a chain complex

...\xrightarrow{\partial_{A, n+3}}A_{n+2}\xrightarrow{\partial_{A, n+2}}A_{n+1}\xrightarrow{\partial_{A, n+1}}A_{n}\xrightarrow{\partial_{A, n}}A_{n-1}\xrightarrow{\partial_{A, n-1}}A_{n-2}\xrightarrow{\partial_{A, n-2}}...

and another chain complex

...\xrightarrow{\partial_{B, n+3}}B_{n+2}\xrightarrow{\partial_{B, n+2}}B_{n+1}\xrightarrow{\partial_{B, n+1}}B_{n}\xrightarrow{\partial_{B, n}}B_{n-1}\xrightarrow{\partial_{B, n-1}}B_{n-2}\xrightarrow{\partial_{B, n-2}}...

a chain map is given by homomorphisms

f_{n}: A_{n}\rightarrow B_{n} for all n

such that the homomorphisms f_{n} commute with the boundary homomorphisms \partial_{A, n} and \partial_{B, n}, i.e.

\partial_{B, n}\circ f_{n}=f_{n-1}\circ \partial_{A, n} for all n.

A short exact sequence of chain complexes is then a short exact sequence

0\rightarrow A_{n}\xrightarrow{f_{n}} B_{n}\xrightarrow{g_{n}} C_{n}\rightarrow 0 for all n

where the homomorphisms f_{n} and g_{n} satisfy the conditions for them to form a chain map, i.e. they commute with the boundary homomorphisms in the sense shown above.

In the case that we have a short exact sequence of chain complexes, their homology groups will then form a long exact sequence:

...\rightarrow H_{n}(A)\xrightarrow{f_{*}}H_{n}(B)\xrightarrow{g_{*}}H_{n}(C)\xrightarrow{\partial}H_{n-1}(A)\xrightarrow{f_{*}}...

Long exact sequences are often used for calculating the homology groups of complicated topological spaces related in some way to simpler topological spaces whose homology groups are already known.


Chain Complex on Wikipedia

Exact Sequence on Wikipedia

Algebraic Topology by Allen Hatcher

A Concise Course in Algebraic Topology by J. P. May

Abstract Algebra by David S. Dummit and Richard M. Foote

Modular Arithmetic and Quotient Sets

There is more than one way of counting. The one we are most familiar with goes like this:

\displaystyle 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,...

and so on getting to bigger and bigger numbers. The numbers are infinite of course, so with every new count we will be naming a new different number bigger than the previous one.

Another way, also familiar to us but one we don’t often pause to think about, goes like this:

\displaystyle 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3,...

I’m talking about the hours on a clock. This way of counting repeats itself, and there is only a finite set of numbers that it goes over.

If we can do arithmetic with ordinary numbers, so can we with the numbers on a clock. What is 11+2? In ordinary arithmetic, it is 13, but on a clock, it is 1. 1 is the remainder of 11+2 when divided by 12. This kind of arithmetic is called modular arithmetic, and it is often associated with one of the greatest mathematicians of all time, Carl Friedrich Gauss.

If the hour hand of a clock now points to 5, where will it point after 100 hours? We follow the same procedure as before, and get the remainder when 5+100=105 is divided by 12. We will then get 9. It is strange to talk of multiplication when referring to a clock, but we can do multiplication also in the same way if we want to. As for subtraction, we can ask, what is 5 o’clock minus say, 7 hours? We don’t say “-2 o’clock”. Instead we say that it is 10 o’clock. So there is a way of keeping the numbers positive: Just keep adding 12 until we get a positive number. This is similar to the remainder procedure above. Essentially we just add or subtract 12 until we get a positive number less than or equal to 12. Later we will change our notation and instead choose non-negative numbers less than 12.
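These clock computations can be reproduced with the remainder operator. In the Python sketch below, the function name `clock` is hypothetical; it reports 12 rather than 0 so that the answers match a clock face.

```python
def clock(hour):
    """Reduce any integer hour to the clock face 1, 2, ..., 12."""
    remainder = hour % 12      # in Python this is always one of 0, 1, ..., 11
    return 12 if remainder == 0 else remainder

print(clock(11 + 2))    # 1
print(clock(5 + 100))   # 9
print(clock(5 - 7))     # 10
print(clock(3 * 8))     # 12, since 24 is a multiple of 12
```

Python's `%` operator already returns a non-negative remainder for a positive modulus, so the step of adding 12 to a negative number happens automatically; some other languages behave differently for negative inputs.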

Division is too complicated to speak about for now. Instead I’ll just try to link what I said with the more formal aspects of mathematics. This set of “numbers on a clock” we will call \mathbb{Z}/12\mathbb{Z}. 12\mathbb{Z} means the set of integer multiples of 12 like ...,-36, -24, -12, 0, 12, 24,36,... and so on. \mathbb{Z}/12\mathbb{Z} means that if two numbers differ by any number in the set 12\mathbb{Z}, we should consider them equivalent. The rule that specifies which numbers are to be considered equivalent to each other is called an equivalence relation.

So 13 o’clock is equivalent to 1 o’clock (not using military time here by the way) since they differ by 12, while 100 is equivalent to 4 since they differ by 96 which is a multiple of 12. For the purposes of notation, we write 13\sim 1 and 100\sim 4. Our equivalence relation in this case can be expressed by writing n+12\sim n for any integer n.

All the numbers that are equivalent to each other form an equivalence class. We can think of \mathbb{Z}/12\mathbb{Z} as the set of equivalence classes under the notion of equivalence that we have defined here. We can select “representatives” for every equivalence class for ease of notation; we choose, for convenience, that \displaystyle 0,1, 2, 3, 4, 5, 6, 7, 8, 9, 10, and 11 represent the respective equivalence classes which they belong to. Note that we chose 0 instead of 12 to represent the equivalence class which they belong to – while we’re used to saying 12 o’clock, mathematicians will usually choose 0 to “represent” all its other buddies that are equivalent to it.

We can think of the process of going from the set of integers \mathbb{Z} to the set of equivalence classes \mathbb{Z}/12\mathbb{Z} as being mediated by a function. A function simply assigns to every element in its domain another element from its range. So here the function assigns to every integer in \mathbb{Z} an equivalence class in \mathbb{Z}/12\mathbb{Z}. The set of integers that get sent to the equivalence class of 0, i.e. the set of integer multiples of 12, is called the kernel of this function.
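To make the function and its kernel concrete, here is a small Python sketch. The names `to_class` and `equivalent` are hypothetical, each equivalence class is stored by its representative among 0 to 11, and since we cannot list all of \mathbb{Z} we only look at a finite sample of integers.

```python
def to_class(n, modulus=12):
    """Send the integer n to the representative of its equivalence class."""
    return n % modulus

def equivalent(a, b, modulus=12):
    """Two integers are equivalent when they differ by a multiple of the modulus."""
    return (a - b) % modulus == 0

# The examples from the text: 13 ~ 1 and 100 ~ 4.
print(equivalent(13, 1), equivalent(100, 4))

# Integers in a finite sample that are sent to the class of 0,
# i.e. the part of the kernel that lies between -36 and 36.
kernel_sample = [n for n in range(-36, 37) if to_class(n) == 0]
print(kernel_sample)
```

Two integers are equivalent exactly when `to_class` sends them to the same representative, which is why the function captures the equivalence relation.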

\mathbb{Z}/12\mathbb{Z} is an example of a so-called quotient set. The rather confusing terminology comes from the fact that we used the group operation of addition to define our equivalence relation; since group operations often use multiplicative notation, the term quotient set makes sense in that context. In this case since our set also forms a group we refer to it also as a quotient group. If we discuss it together with multiplication, i.e. in the context of its structure as a ring, we can also refer to it as a quotient ring. (See also the previous posts Groups and Rings, Fields, and Ideals).

There are many important examples of quotient sets: \mathbb{Z}/2\mathbb{Z} can be thought of as just 0 and 1, reminiscent of “bits” in computer science and engineering. Alternatively, one may think of \mathbb{Z}/2\mathbb{Z} as a set of two equivalence classes; one is made up of all even numbers and the other is made up of all odd numbers. We also have \mathbb{R}/\mathbb{Z}, where \mathbb{R} is the real line. \mathbb{R}/\mathbb{Z} can be thought of as the circle; I won’t explain now why but one can have a fairly nice mental exercise trying to figure it out (or just check it out on one of the helpful references listed below).


Equivalence Class on Wikipedia

Quotient Group on Wikipedia

Quotient Ring on Wikipedia

Algebra by Michael Artin


Groups are some of the most basic concepts in mathematics. They are even more basic than the things we discussed in Rings, Fields, and Ideals. In fact, all these things require the concept of groups before they can even be defined rigorously. But apart from being a basic stepping stone toward other concepts, groups are also extremely useful on their own. They can be used to represent the permutations of a set. They can also be used to describe the symmetries of an object. Since symmetries are so important in physics, groups also play an important part in describing physical phenomena. The standard model of particle physics, for example, which describes the fundamental building blocks of our physical world such as quarks, electrons, and photons, is expressed as a “gauge theory” with symmetry group U(1)\times SU(2)\times SU(3).

We will not discuss something of this magnitude for now, although perhaps in the future we will (at least electromagnetism, which is a gauge theory with symmetry group U(1)). Our intention in this post will be to define rigorously the abstract concept of groups, and to give a few simple examples. Whatever application we have in mind when we use the concept of groups, it will have the same rigorous definition, and perhaps express the same idea at its very core.

First we will define what a law of composition means. We have been using this concept implicitly in previous posts, in concepts such as addition, subtraction, and multiplication. The law of composition makes these concepts more formal. We quote from the book Algebra by Michael Artin:

A law of composition is a function of two variables, or a map

\displaystyle S\times S\rightarrow S

Here S\times S denotes, as always, the product set, whose elements are pairs a, b of elements of S.

There are many ways to express a law of composition. The familiar ones include

\displaystyle a+b=c

\displaystyle a\circ b=c

\displaystyle a\times b=c

\displaystyle ab=c

From the same book we now quote the definition of a group:

A group is a set G together with a law of composition that has the following properties:

  • The law of composition is associative: (ab)c=a(bc) for all a, b, and c in G.
  • G contains an identity element 1, such that 1a=a and a1=a for all a in G.
  • Every element a of G has an inverse, an element b such that ab=1 and ba=1.

Note that the definition has used one particular notation for the law of composition, but we can use different symbols for the sake of convenience or clarity. This is merely notation and the definition of a group does not change depending on the notation that we use.
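For a finite set, the three properties can be checked by brute force. The following Python sketch is our own illustration, not something from Artin's book: the function `is_group` and the two example operations are hypothetical names. It also checks that the law of composition really lands back in the set, which the definition builds in by requiring a map S\times S\rightarrow S.

```python
from itertools import product

def is_group(elements, op):
    """Brute-force check of the group properties for a finite set."""
    elements = list(elements)
    # The law of composition must send pairs of elements back into the set.
    if any(op(a, b) not in elements for a, b in product(elements, repeat=2)):
        return False
    # Associativity: (ab)c = a(bc) for all a, b, c.
    if any(op(op(a, b), c) != op(a, op(b, c))
           for a, b, c in product(elements, repeat=3)):
        return False
    # Identity: some e with ea = a and ae = a for all a.
    identities = [e for e in elements
                  if all(op(e, a) == a and op(a, e) == a for a in elements)]
    if not identities:
        return False
    e = identities[0]
    # Inverses: every a has some b with ab = e and ba = e.
    return all(any(op(a, b) == e and op(b, a) == e for b in elements)
               for a in elements)

def add_mod_12(a, b):
    return (a + b) % 12

def mul_mod_12(a, b):
    return (a * b) % 12

print(is_group(range(12), add_mod_12))  # the clock numbers under addition
print(is_group(range(12), mul_mod_12))  # fails: 0 has no multiplicative inverse
```

The first example is written with + rather than multiplicative notation, which illustrates the point that the notation does not affect the definition.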

All this is rather abstract. Perhaps things will be made clearer by considering a few examples. For our first example, we will consider the set of permutations of the set with three elements which we label 1, 2, and 3. The first permutation is what we shall refer to as the identity permutation. This sends the element 1 to 1, the element 2 to 2, and the element 3 to 3.

Another permutation sends the element 1 to 2, the element 2 to 1, and the element 3 to 3. In other words, it exchanges the elements 1 and 2 while keeping the element 3 fixed. There are two other permutations which are similar in a way, one which exchanges 2 and 3 while keeping 1 fixed, and another permutation which exchanges 1 and 3 while keeping 2 fixed. To more easily keep track of these three permutations, we shall refer to them as “reflections”.

We have now enumerated four permutations. There are two more. One permutation sends 1 to 2, 2 to 3, and 3 to 1. The last permutation sends 1 to 3, 2 to 1, and 3 to 2. Just as we have referred to the earlier three permutations as “reflections”, we shall now refer to these last two permutations as “rotations”.

We now have a total of six permutations, which agrees with the result one can find from combinatorics. Our claim is that these six permutations form a group, with the law of composition given by performing first one permutation followed by the other. Therefore the reflection that exchanges 2 and 3, followed by the reflection that exchanges 1 and 3, is the same as the rotation that sends 1 to 3, 2 to 1, and 3 to 2, as one may check.
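The check suggested above can be carried out in Python. In this sketch a permutation is stored as a dictionary sending each element to its image; the names `compose`, `swap_23`, `swap_13`, and `rotation` are our own choices.

```python
from itertools import permutations

def compose(first, second):
    """The permutation obtained by performing `first`, followed by `second`."""
    return {x: second[first[x]] for x in first}

swap_23 = {1: 1, 2: 3, 3: 2}   # exchanges 2 and 3, keeps 1 fixed
swap_13 = {1: 3, 2: 2, 3: 1}   # exchanges 1 and 3, keeps 2 fixed
rotation = {1: 3, 2: 1, 3: 2}  # sends 1 to 3, 2 to 1, and 3 to 2

# Following 1: swap_23 keeps it at 1, then swap_13 sends it to 3.
# Following 2: swap_23 sends it to 3, then swap_13 sends it to 1.
# Following 3: swap_23 sends it to 2, then swap_13 keeps it at 2.
print(compose(swap_23, swap_13) == rotation)

# Counting all rearrangements of (1, 2, 3) gives 3! = 6 permutations.
number_of_permutations = len(list(permutations((1, 2, 3))))
print(number_of_permutations)
```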

We can easily verify two of the properties required for a set to form a group. There exists an identity element in our set of permutations, namely the identity permutation. Permuting the three elements 1, 2, and 3 via the identity permutation (i.e. doing nothing) followed by a rotation or reflection is the same as just applying the rotation or reflection alone. Similarly, applying a rotation or reflection, and then applying the identity permutation is the same as applying just the rotation or reflection alone.

Next we show that every element has an inverse. The rotation that sends 1 to 2, 2 to 3, and 3 to 1 followed by the rotation that sends 1 to 3, 2 to 1, and 3 to 2 results in the identity permutation. Also the rotation that sends 1 to 3, 2 to 1, and 3 to 2 followed by the rotation that sends 1 to 2, 2 to 3, and 3 to 1 results in the identity permutation once again. Therefore we see that the two rotations are inverses of each other. As for the reflections, we can see that doing the same reflection twice results in the identity permutation. Every reflection has itself as its inverse, and of course the same thing holds for the identity permutation.
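These statements about inverses can be verified directly. As a sketch, we again store each permutation as a Python dictionary; the helper names `compose` and `inverse` and the labels for the permutations are our own.

```python
def compose(first, second):
    """The permutation obtained by performing `first`, followed by `second`."""
    return {x: second[first[x]] for x in first}

def inverse(p):
    """Reverse every arrow of the permutation p."""
    return {image: x for x, image in p.items()}

identity = {1: 1, 2: 2, 3: 3}
rotation_a = {1: 2, 2: 3, 3: 1}
rotation_b = {1: 3, 2: 1, 3: 2}
reflections = [
    {1: 2, 2: 1, 3: 3},  # exchanges 1 and 2
    {1: 1, 2: 3, 3: 2},  # exchanges 2 and 3
    {1: 3, 2: 2, 3: 1},  # exchanges 1 and 3
]

# The two rotations undo each other, in either order.
print(compose(rotation_a, rotation_b) == identity)
print(compose(rotation_b, rotation_a) == identity)

# Each reflection, performed twice, gives the identity permutation.
print(all(compose(s, s) == identity for s in reflections))
```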

The associative property holds for the set of permutations of three elements, but we will not prove this statement explicitly in this post, as it is perhaps best done by figuring out the law of composition for all the permutations, i.e. by figuring out which permutations result from performing two permutations successively. This will result in something that is analogous to a “multiplication table”. With all three properties shown to hold, the set of permutations of three elements forms a group, called the symmetric group S_{3}.
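For readers who would like to see the “multiplication table” without working it out by hand, here is a Python sketch that prints it and then checks associativity by brute force over all 6\times 6\times 6=216 triples. The labels `e`, `s12`, `s23`, `s13`, `r1`, `r2` are our own shorthand, and each permutation is stored as a tuple whose entries are the images of 1, 2, and 3 in that order.

```python
from itertools import product

# A permutation is a tuple p, where p[i - 1] is the image of i.
names = {
    (1, 2, 3): "e",    # identity
    (2, 1, 3): "s12",  # exchanges 1 and 2
    (1, 3, 2): "s23",  # exchanges 2 and 3
    (3, 2, 1): "s13",  # exchanges 1 and 3
    (2, 3, 1): "r1",   # sends 1 to 2, 2 to 3, and 3 to 1
    (3, 1, 2): "r2",   # sends 1 to 3, 2 to 1, and 3 to 2
}
elements = list(names)

def compose(first, second):
    """The permutation obtained by performing `first`, followed by `second`."""
    return tuple(second[first[i] - 1] for i in range(3))

# Row label: the permutation performed first. Column label: the one performed second.
print("     " + " ".join(f"{names[q]:>4}" for q in elements))
for p in elements:
    row = " ".join(f"{names[compose(p, q)]:>4}" for q in elements)
    print(f"{names[p]:>4} " + row)

associative = all(
    compose(compose(a, b), c) == compose(a, compose(b, c))
    for a, b, c in product(elements, repeat=3)
)
print(associative)
```

A brute-force check like this is not a proof in general, but for a finite set it does cover every case.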

Although the definition of a group requires the law of composition to be associative, it does not require it to be commutative; for our example, two successive permutations might not give the same result when performed in the reverse order. When the law of composition of a group is commutative, the group is called an abelian group.
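As a concrete instance of this failure of commutativity, we can compose two of the reflections in both orders. This is a small Python sketch with our own names, storing permutations as dictionaries.

```python
def compose(first, second):
    """The permutation obtained by performing `first`, followed by `second`."""
    return {x: second[first[x]] for x in first}

swap_23 = {1: 1, 2: 3, 3: 2}  # exchanges 2 and 3
swap_13 = {1: 3, 2: 2, 3: 1}  # exchanges 1 and 3

one_order = compose(swap_23, swap_13)    # sends 1 to 3, 2 to 1, and 3 to 2
other_order = compose(swap_13, swap_23)  # sends 1 to 2, 2 to 3, and 3 to 1

print(one_order)
print(other_order)
print(one_order == other_order)
```

The two orders produce the two different rotations, so this group of permutations is not abelian.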

An example of an abelian group is provided by the integers, with the law of composition given by addition. Appropriately, we use the symbol + to denote this law of composition. The identity element is provided by 0, and the inverse of an integer n is provided by the integer -n. We already know from basic arithmetic that addition is both associative and commutative, so this guarantees that under addition the integers form a group and moreover form an abelian group (sometimes called the additive group of integers).

That’s it for now, but the reader is encouraged to explore more about groups since the concept can be found essentially everywhere in mathematics. For example, the positive real numbers form a group under multiplication. The reader might want to check that they really do satisfy the three properties required for a set to form a group. Another thing to think about is the group of permutations of the set with three elements, and how it relates to the symmetries of an equilateral triangle. Once again Artin’s book provides a very reliable technical discussion of groups, but one more accessible book that stands out in its discussion of groups is Love and Math: The Heart of Hidden Reality by Edward Frenkel, which is part exposition and part autobiography. The connections between groups, symmetries, and physics are extensively explored in that book, as the author’s research explores the connection between quantum mechanics and the Langlands program, an active field of mathematical research where groups once again play a very important role. More on groups is planned for future posts on this blog.


Groups on Wikipedia

Symmetric Group on Wikipedia

Dihedral Group on Wikipedia

Abelian Group on Wikipedia

Algebra by Michael Artin

Love and Math: The Heart of Hidden Reality by Edward Frenkel