Homotopy Theory

In Basics of Topology and Continuous Functions, we discussed how to put an “arrangement” or “organization” on sets using the concept of a topology. We also mentioned the concept of a continuous function, which expresses the idea of a mapping between sets that in some way preserves the arrangement or organization.

What we think of as a “shape” – a circle, a sphere, a cube, or some other complicated shape, like the shape of a living organism – can be thought of as the set of the points that make it up. Using Cartesian coordinates on a plane, for example, a circle with radius R is the set of all points whose coordinates are pairs of numbers (x, y) that satisfy the equation x^2+y^2=R^2. Similarly, in a three-dimensional space, a sphere with radius R is the set of points whose coordinates are triples of numbers (x, y, z) that satisfy the equation x^2+y^2+z^2=R^2. The plane itself and the three-dimensional space, in a sense, are also shapes, or geometric objects.

Since shapes are sets, we can put a topology on them. To accomplish this we usually use the concept of a metric, which is a way of expressing an idea of a distance between points. The so-called metric topology then puts an arrangement on this set by grouping together points which are close together into “neighborhoods”, a concept we have also already discussed. We can then think of a continuous function between shapes as a function that sends points close together on one shape to points that are also close together on the other shape.

We also defined a homeomorphism as a continuous function with a continuous inverse. When a function has an inverse, it is also called a bijective function, or a bijection. A bijection between two sets expresses the idea that every element of one set can be paired with exactly one element from the other set, and vice-versa, with no elements of either set left unpaired. If a homeomorphism exists between two topological spaces, we say that the two topological spaces are homeomorphic.

All this means that a homeomorphism can then be thought of as a deformation between two shapes, without any gluing or tearing involved, since gluing or tearing would ruin our arrangement of neighborhoods of points on the shapes. Simply stretching or compressing the shape would change distances between points, but the points would still belong to the same neighborhood. If we had, say, a class of students, and we made them change seats so as to leave one empty chair between each pair of neighboring students, the distances between them would change, but if Bob was seated between Alice and Charlie before, he would still be seated between them after. His closest neighbors would still remain his closest neighbors, although they would be a little farther from him than they were before.

Related to the notion of homeomorphism are the notions of homotopy and homotopy equivalence. A homotopy can be thought of as a deformation between functions. First one needs an “interval”; formally this is chosen to be the closed interval [0,1]. This can be thought of as the “time” for the deformation to take place: We have a function which is f at the “start time” 0, g at the “end time” 1 and as the “time” runs from 0 to 1 the function is “deformed” continuously from f to g. We say that there is a homotopy from f to g, or that f and g are homotopic.
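One standard way to make this precise is to package the whole deformation into a single continuous function H from X\times [0,1] to Y, with H(x,0)=f(x) and H(x,1)=g(x). The Python sketch below illustrates the idea for two functions from the real line to itself, using the “straight-line” homotopy. The particular functions f and g and the name H are illustrative assumptions, not something taken from the discussion above.

```python
def f(x):
    """The function at the "start time" t = 0 (an arbitrary example)."""
    return x * x


def g(x):
    """The function at the "end time" t = 1 (an arbitrary example)."""
    return 2.0 * x + 1.0


def H(x, t):
    """Straight-line homotopy: slides f(x) towards g(x) as t runs from 0 to 1."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in the closed interval [0, 1]")
    return (1.0 - t) * f(x) + t * g(x)


if __name__ == "__main__":
    # At t = 0 we see the values of f, at t = 1 the values of g,
    # and in between a weighted average of the two.
    for t in (0.0, 0.5, 1.0):
        print(t, [H(x, t) for x in (-1.0, 0.0, 2.0)])
```

This formula works only because the values live on the real line, where weighted averages make sense; on a general space Y there is no such formula, which is why two functions need not be homotopic.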

If we have a continuous function f from the topological space X to the topological space Y, and another continuous function g from Y to X, we can form the function g\circ f from X to X by composition. Similarly we can form the function f\circ g from Y to Y. If the function g\circ f is homotopic to the identity function (the function that sends every element to itself) on X, and if the function f\circ g is homotopic to the identity function on Y, then we say that f and g are homotopy equivalences, and that X and Y are homotopy equivalent.

Topological spaces that are homeomorphic are automatically homotopy equivalent. However, it is possible for topological spaces to be homotopy equivalent but not homeomorphic. Examples of spaces that are homotopy equivalent but not homeomorphic can be found in the references listed at the end of this post.

The continuous functions from X to Y which are homotopic to each other can be considered equivalent (see also the discussion in Modular Arithmetic and Quotient Sets), and they can form equivalence classes. The set of all equivalence classes of continuous functions from X to Y is then denoted [X,Y].

Of particular importance is the set formed when X is the n-dimensional sphere, written S^{n}. In this notation the ordinary circle is S^{1}, and the ordinary sphere is S^{2}. We are used to thinking of the circle as being embedded in a two-dimensional space and the sphere as being embedded in a three-dimensional space, but since we are only looking at the “surface” of these shapes and not their interior, their dimensions are reduced by one from the spaces they are embedded in. There is also the zero-dimensional sphere S^{0}; this is just a set with two points. One can see this by considering the equations for the circle and sphere above; we go down one variable and set the zero-dimensional sphere of radius R to be the points on the line which satisfy the equation x^2=R^2, which gives us the two points x=R and x=-R.

After fixing “basepoints” on S^{n} and Y, the set [S^{n},Y] forms a group (see Groups). It is called the n-th homotopy group of Y, and is written \pi_{n}(Y). The first homotopy group of Y, \pi_{1}(Y), has a special name; it is called the fundamental group of Y. The fundamental group of a space can be thought of as the group of equivalence classes of loops on Y which begin and end at the chosen basepoint, with loops considered equivalent if they can be deformed into each other. The identity element of the fundamental group is the equivalence class of the loops on Y which can be deformed into a point. There is also the zeroth homotopy group, \pi_{0}(Y) (strictly speaking, for n=0 we in general only get a set with a distinguished element rather than a group); although we do not have a special name for this homotopy group, it serves an important role, since it keeps track of how many “pieces” make up our space Y.

A space Y for which \pi_{0}(Y) is the trivial group (the group with only one element, which is the identity element) is called path connected. When a space is path connected, any two points p and q may be connected by a path, which is a continuous function f from the interval [0,1] to the space Y for which f(0)=p and f(1)=q. In a way, we may think of a path connected space as being made up of only one “piece”.

A path connected space Y for which \pi_{1}(Y) is also the trivial group is called simply connected. A simply connected space is one in which all loops can be deformed into a point. A plane is a simply connected space; however, if we punch a “hole” in it, it is no longer simply connected. Similarly, the surface of a sphere is also a simply connected space; the surface of a donut, called a torus, however, is not simply connected.

In a path connected space, any two points can be connected by a path. Similarly, in a simply connected space, any two paths with the same endpoints can be connected by a “path of paths”. More on this point of view of homotopy theory can be found in Vladimir Voevodsky’s lecture at the 2002 Clay Mathematics Institute annual meeting which is listed among the references below. The lecture also discusses some basic category theory and algebraic geometry, using fairly intuitive language, without sacrificing the important ideas behind both subjects.

Homotopy theory, which we have barely scratched the surface of in this post, is just one part of the subject called algebraic topology. The name of the subject comes from the use of concepts from abstract algebra, such as groups, to study topological spaces. Other parts of the subject are homology theory and cohomology theory. All three are related to each other and have found applications in other branches of mathematics aside from studying topological spaces. They have been used, for example, to study aspects of linear algebra, via the subjects of homological algebra and algebraic K-theory. In physics, topological concepts applied to the study of phase transitions in condensed matter physics earned the trio of David J. Thouless, F. Duncan M. Haldane, and J. Michael Kosterlitz the 2016 Nobel Prize in Physics. There are also many promising applications of topology to more fundamental aspects of theoretical physics.

More on homotopy theory and algebraic topology can be found in the books Algebraic Topology by Allen Hatcher and A Concise Course in Algebraic Topology by J. P. May, both freely and legally available online and also listed among the references below.


An Intuitive Introduction to Motivic Homotopy Theory by Vladimir Voevodsky (Video)

An Intuitive Introduction to Motivic Homotopy Theory by Vladimir Voevodsky (Notes)

Topology on Wikipedia

Homotopy on Wikipedia

Homotopy Group on Wikipedia

Algebraic Topology by Allen Hatcher

A Concise Course in Algebraic Topology by J. P. May

Modular Arithmetic and Quotient Sets

There is more than one way of counting. The one we are most familiar with goes like this:

\displaystyle 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,...

and so on, getting to bigger and bigger numbers. There are infinitely many numbers, of course, so with every new count we name a new number bigger than the previous one.

Another way, also familiar to us but one we don’t often pause to think about, goes like this:

\displaystyle 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3,...

I’m talking about the hours on a clock. This way of counting repeats itself, and there is only a finite set of numbers that it goes over.

If we can do arithmetic with ordinary numbers, so can we with the numbers on a clock. What is 11+2? In ordinary arithmetic, it is 13, but on a clock, it is 1. 1 is the remainder of 11+2 when divided by 12. This kind of arithmetic is called modular arithmetic, and it is often associated with one of the greatest mathematicians of all time, Carl Friedrich Gauss.

If the hand of a clock now points to 5, where will it point after 100 hours? We repeat the earlier procedure and take the remainder when 5+100=105 is divided by 12, which gives 9. It is strange to talk of multiplication when referring to a clock, but we can do multiplication in the same way if we want to. As for subtraction, we can ask: what is 5 o’clock minus, say, 7 hours? We don’t say “-2 o’clock”. Instead we say that it is 10 o’clock. So there is a way of keeping the numbers positive: just keep adding 12 until we get a positive number no greater than 12. This is similar to the remainder procedure above. Essentially we just add or subtract 12 until we get a positive number less than or equal to 12. Later we will change our notation and instead choose non-negative numbers less than 12.
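The clock calculations above can be tried out with a few lines of Python. This is only a small sketch: the function name clock is made up for illustration, and it relies on the fact that Python's % operator returns a non-negative remainder when the divisor is positive.

```python
def clock(n):
    """Reduce an integer number of hours to a clock reading from 1 to 12."""
    # Shift by one so that multiples of 12 come out as 12 rather than 0.
    return (n - 1) % 12 + 1


if __name__ == "__main__":
    print(clock(11 + 2))    # 1
    print(clock(5 + 100))   # 9
    print(clock(5 - 7))     # 10
    print(clock(5 * 7))     # 11
    print(clock(12))        # 12
```

Dropping the shift by one, that is, using n % 12 directly, gives the convention with representatives from 0 to 11 instead of 1 to 12.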

Division is too complicated to speak about for now. Instead I’ll just try to link what I said with the more formal aspects of mathematics. This set of “numbers on a clock” we will call \mathbb{Z}/12\mathbb{Z}. Here 12\mathbb{Z} denotes the set of integer multiples of 12, such as ...,-36, -24, -12, 0, 12, 24, 36,... and so on. \mathbb{Z}/12\mathbb{Z} means that if two numbers differ by any number in the set 12\mathbb{Z}, we should consider them equivalent. The rule that specifies which numbers are to be considered equivalent to each other is called an equivalence relation.

So 13 o’clock is equivalent to 1 o’clock (not using military time here by the way) since they differ by 12, while 100 is equivalent to 4 since they differ by 96 which is a multiple of 12. For the purposes of notation, we write 13\sim 1 and 100\sim 4. Our equivalence relation in this case can be expressed by writing n+12\sim n for any integer n.

All the numbers that are equivalent to each other form an equivalence class. We can think of \mathbb{Z}/12\mathbb{Z} as the set of equivalence classes under the notion of equivalence that we have defined here. We can select “representatives” for every equivalence class for ease of notation; we choose, for convenience, that \displaystyle 0,1, 2, 3, 4, 5, 6, 7, 8, 9, 10, and 11 represent the respective equivalence classes which they belong to. Note that we chose 0 instead of 12 to represent the equivalence class to which both belong – while we’re used to saying 12 o’clock, mathematicians will usually choose 0 to “represent” all its other buddies that are equivalent to it.

We can think of the process of going from the set of integers \mathbb{Z} to the set of equivalence classes \mathbb{Z}/12\mathbb{Z} as being mediated by a function. A function simply assigns to every element in its domain another element from its range. So here the function assigns to every integer in \mathbb{Z} an equivalence class in \mathbb{Z}/12\mathbb{Z}. The set of integers that get sent to the equivalence class of 0, i.e. the set of integer multiples of 12, is called the kernel of this function.
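The function from \mathbb{Z} to \mathbb{Z}/12\mathbb{Z} can be sketched in Python as follows, using the representatives 0 to 11. The names quotient_map, equivalent, and classes are chosen here purely for illustration, and since we cannot list all the integers, the example only groups a finite sample of them.

```python
from collections import defaultdict

MODULUS = 12


def quotient_map(n):
    """Send an integer to the representative (0 to 11) of its equivalence class."""
    return n % MODULUS


def equivalent(a, b):
    """Two integers are equivalent when their difference is a multiple of 12."""
    return (a - b) % MODULUS == 0


def classes(integers):
    """Group a finite sample of integers by equivalence class."""
    grouped = defaultdict(list)
    for n in integers:
        grouped[quotient_map(n)].append(n)
    return dict(grouped)


if __name__ == "__main__":
    grouped = classes(range(-24, 25))
    # The integers sent to the class of 0: the part of the kernel in our sample.
    print(grouped[0])
    # The integers in the sample that are equivalent to 1.
    print(grouped[1])
    print(equivalent(13, 1), equivalent(100, 4))
```

One can also check that adding two integers and then applying quotient_map gives the same class as applying quotient_map first and then adding the representatives, which is what makes arithmetic on equivalence classes well defined.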

\mathbb{Z}/12\mathbb{Z} is an example of a so-called quotient set. The rather confusing terminology comes from the fact that we used the group operation of addition to define our equivalence relation; since group operations often use multiplicative notation, the term quotient set makes sense in that context. In this case since our set also forms a group we refer to it also as a quotient group. If we discuss it together with multiplication, i.e. in the context of its structure as a ring, we can also refer to it as a quotient ring. (See also the previous posts Groups and Rings, Fields, and Ideals).

There are many important examples of quotient sets: \mathbb{Z}/2\mathbb{Z} can be thought of as just 0 and 1, reminiscent of “bits” in computer science and engineering. Alternatively, one may think of \mathbb{Z}/2\mathbb{Z} as a set of two equivalence classes; one is made up of all even numbers and the other is made up of all odd numbers. We also have \mathbb{R}/\mathbb{Z}, where \mathbb{R} is the real line. \mathbb{R}/\mathbb{Z} can be thought of as the circle; I won’t explain now why but one can have a fairly nice mental exercise trying to figure it out (or just check it out on one of the helpful references listed below).


Equivalence Class on Wikipedia

Quotient Group on Wikipedia

Quotient Ring on Wikipedia

Algebra by Michael Artin


Groups

Groups are some of the most basic concepts in mathematics. They are even more basic than the things we discussed in Rings, Fields, and Ideals. In fact, all these things require the concept of groups before they can even be defined rigorously. But apart from being a basic stepping stone toward other concepts, groups are also extremely useful on their own. They can be used to represent the permutations of a set. They can also be used to describe the symmetries of an object. Since symmetries are so important in physics, groups also play an important part in describing physical phenomena. The standard model of particle physics, for example, which describes the fundamental building blocks of our physical world such as quarks, electrons, and photons, is expressed as a “gauge theory” with symmetry group U(1)\times SU(2)\times SU(3).

We will not discuss something of this magnitude for now, although perhaps in the future we will (at least electromagnetism, which is a gauge theory with symmetry group U(1)). Our intention in this post will be to define rigorously the abstract concept of groups, and to give a few simple examples. Whatever application we have in mind when we use the concept of groups, it will have the same rigorous definition, and perhaps express the same idea at its very core.

First we will define what a law of composition means. We have been using this concept implicitly in previous posts, in concepts such as addition, subtraction, and multiplication. The law of composition makes these concepts more formal. We quote from the book Algebra by Michael Artin:

A law of composition is a function of two variables, or a map

\displaystyle S\times S\rightarrow S

Here S\times S denotes, as always, the product set, whose elements are pairs a, b of elements of S.

There are many ways to express a law of composition. The familiar ones include

\displaystyle a+b=c

\displaystyle a\circ b=c

\displaystyle a\times b=c

\displaystyle ab=c

From the same book we now quote the definition of a group:

A group is a set G together with a law of composition that has the following properties:

  • The law of composition is associative: (ab)c=a(bc) for all a, b, and c.
  • G contains an identity element 1, such that 1a=a and a1=a for all a in G.
  • Every element a of G has an inverse, an element b such that ab=1 and ba=1.

Note that the definition has used one particular notation for the law of composition, but we can use different symbols for the sake of convenience or clarity. This is merely notation and the definition of a group does not change depending on the notation that we use.

All this is rather abstract. Perhaps things will be made clearer by considering a few examples. For our first example, we will consider the set of permutations of the set with three elements which we label 1, 2, and 3. The first permutation is what we shall refer to as the identity permutation. This sends the element 1 to 1, the element 2 to 2, and the element 3 to 3.

Another permutation sends the element 1 to 2, the element 2 to 1, and the element 3 to 3. In other words, it exchanges the elements 1 and 2 while keeping the element 3 fixed. There are two other permutations which are similar in a way, one which exchanges 2 and 3 while keeping 1 fixed, and another permutation which exchanges 1 and 3 while keeping 2 fixed. To more easily keep track of these three permutations, we shall refer to them as “reflections”.

We have now enumerated four permutations. There are two more. One permutation sends 1 to 2, 2 to 3, and 3 to 1. The last permutation sends 1 to 3, 2 to 1, and 3 to 2. Just as we have referred to the earlier three permutations as “reflections”, we shall now refer to these last two permutations as “rotations”.

We now have a total of six permutations, which agrees with the result one can find from combinatorics. Our claim is that these six permutations form a group, with the law of composition given by performing first one permutation followed by the other. Therefore the reflection that exchanges 2 and 3, followed by the reflection that exchanges 1 and 3, is the same as the rotation that sends 1 to 3, 2 to 1, and 3 to 2, as one may check.

We can easily verify two of the properties required for a set to form a group. There exists an identity element in our set of permutations, namely the identity permutation. Permuting the three elements 1, 2, and 3 via the identity permutation (i.e. doing nothing) followed by a rotation or reflection is the same as just applying the rotation or reflection alone. Similarly, applying a rotation or reflection, and then applying the identity permutation is the same as applying just the rotation or reflection alone.

Next we show that every element has an inverse. The rotation that sends 1 to 2, 2 to 3, and 3 to 1 followed by the rotation that sends 1 to 3, 2 to 1, and 3 to 2 results in the identity permutation. Also the rotation that sends 1 to 3, 2 to 1, and 3 to 2 followed by the rotation that sends 1 to 2, 2 to 3, and 3 to 1 results in the identity permutation once again. Therefore we see that the two rotations are inverses of each other. As for the reflections, we can see that doing the same reflection twice results in the identity permutation. Every reflection has itself as its inverse, and of course the same thing holds for the identity permutation.

The associative property holds for the set of permutations of three elements, but we will not prove this statement explicitly in this post, as it is perhaps best done by figuring out the law of composition for all the permutations, i.e. by figuring out which permutations result from performing two permutations successively. This will result in something that is analogous to a “multiplication table”. With all three properties shown to hold, the set of permutations of three elements forms a group, called the symmetric group S_{3}.

Although the definition of a group requires the law of composition to be associative, it does not require it to be commutative; for our example, two successive permutations might not give the same result when performed in the reverse order. When the law of composition of a group is commutative, the group is called an abelian group.
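For a group this small, the three properties can simply be checked by brute force over all six permutations. The following Python sketch does so. The representation of a permutation as a tuple and the helper names then, inverse, is_group, and is_abelian are choices made here for illustration; they do not come from Artin's book.

```python
from itertools import permutations, product

# A permutation of {1, 2, 3} is stored as a tuple p, where p[i - 1] is the image of i.
ELEMENTS = list(permutations((1, 2, 3)))
IDENTITY = (1, 2, 3)


def then(p, q):
    """Law of composition: perform p first, followed by q."""
    return tuple(q[p[i] - 1] for i in range(3))


def inverse(p):
    """Return the permutation that undoes p."""
    result = [0, 0, 0]
    for i in range(3):
        result[p[i] - 1] = i + 1
    return tuple(result)


def is_group():
    """Check closure, identity, inverses, and associativity exhaustively."""
    closed = all(then(p, q) in ELEMENTS for p, q in product(ELEMENTS, repeat=2))
    has_identity = all(then(IDENTITY, p) == p == then(p, IDENTITY) for p in ELEMENTS)
    has_inverses = all(
        then(p, inverse(p)) == IDENTITY == then(inverse(p), p) for p in ELEMENTS
    )
    associative = all(
        then(then(p, q), r) == then(p, then(q, r))
        for p, q, r in product(ELEMENTS, repeat=3)
    )
    return closed and has_identity and has_inverses and associative


def is_abelian():
    """Check whether the order of composition never matters."""
    return all(then(p, q) == then(q, p) for p, q in product(ELEMENTS, repeat=2))


if __name__ == "__main__":
    swap_2_3 = (1, 3, 2)
    swap_1_3 = (3, 2, 1)
    print(then(swap_2_3, swap_1_3))  # (3, 1, 2): sends 1 to 3, 2 to 1, 3 to 2
    print(then(swap_1_3, swap_2_3))  # (2, 3, 1): the other rotation
    print(is_group(), is_abelian())  # True False
```

The two reflections composed in opposite orders give the two different rotations, which is a concrete instance of the law of composition failing to be commutative.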

An example of an abelian group is provided by the integers, with the law of composition given by addition. Appropriately, we use the symbol + to denote this law of composition. The identity element is provided by 0, and the inverse of an integer n is provided by the integer -n. We already know from basic arithmetic that addition is both associative and commutative, so this guarantees that under addition the integers form a group and moreover form an abelian group (sometimes called the additive group of integers).

That’s it for now, but the reader is encouraged to explore more about groups since the concept can be found essentially everywhere in mathematics. For example, the positive real numbers form a group under multiplication. The reader might want to check if they really do satisfy the three properties required for a set to form a group. Another thing to think about is the group of permutations of the set with three elements, and how they relate to the symmetries of an equilateral triangle. Once again the book of Artin provides a very reliable technical discussion of groups, but one more accessible book that stands out in its discussion of groups is Love and Math: The Heart of Hidden Reality by Edward Frenkel, which is part exposition and part autobiography. The connections between groups, symmetries, and physics are extensively explored in that book, as the author’s research explores the connection between quantum mechanics and the Langlands program, an active field of mathematical research where groups once again play a very important role. More on groups is planned for future posts on this blog.


Groups on Wikipedia

Symmetric Group on Wikipedia

Dihedral Group on Wikipedia

Abelian Group on Wikipedia

Algebra by Michael Artin

Love and Math: The Heart of Hidden Reality by Edward Frenkel

Vector Spaces, Modules, and Linear Algebra

Let’s take a little trip back in time to grade school mathematics. What is five apples plus three apples? Easy, the answer is eight apples. What about two oranges plus two oranges? The answer is four oranges. What about three apples plus two oranges? Wait, that question is literally “apples and oranges”! But we can still answer that question of course. Three apples plus two oranges is three apples and two oranges. Does that sound too easy? We ramp it up just a little bit: What is three apples and two oranges, plus one apple and five oranges? The answer is four apples and seven oranges. Even if we’re dealing with two objects we’re not supposed to mix together, we can still do mathematics with them, as long as we treat each object separately.

Such an idea can be treated with the concept of vector spaces. Another application of this concept is to quantities with magnitude and direction in physics, where the concept actually originated. Yet another application is to quantum mechanics, where things can be simultaneously on and off, or simultaneously pointing up and down, or simultaneously be in a whole bunch of different states we would never think of being capable of existing together simultaneously. But what, really, is a vector space?

We can think of vector spaces as sets of things that can be added to or subtracted from each other, or scaled up or scaled down, or combinations of all these. To make all these a little easier, we stay in the realm of what are called “finite dimensional” vector spaces, and we develop for this purpose a little notation. We go back to the example we set out at the start of this post, that of the apples and oranges. Say for example that we have three apples and two oranges. We will write this as

\displaystyle \left(\begin{array}{c}3\\2\end{array}\right)

Now, say we want to add to this quantity, one more apple and five oranges. We write

\displaystyle \left(\begin{array}{c}3\\2\end{array}\right)+\left(\begin{array}{c}1\\5\end{array}\right)

Of course this is easy to solve, and we have already done the calculation earlier. We have

\displaystyle \left(\begin{array}{c}3\\2\end{array}\right)+\left(\begin{array}{c}1\\5\end{array}\right)=\left(\begin{array}{c}4\\7\end{array}\right)

But we also said we can “scale” such a quantity. So suppose again that we have three apples and two oranges. If we were to double this quantity, what would we have? We would have six apples and four oranges. We write this operation as

\displaystyle 2\left(\begin{array}{c}3\\2\end{array}\right)=\left(\begin{array}{c}6\\4\end{array}\right)

We can also “scale down” such a quantity. Suppose we want to cut in half our amount of three apples and two oranges. We would have one and a half apples (or three halves of an apple) and one orange:

\displaystyle \frac{1}{2}\left(\begin{array}{c}3\\2\end{array}\right)=\left(\begin{array}{c}\frac{3}{2}\\1\end{array}\right)

We can also apply what we know of negative numbers – we can for example think of a negative amount of something as being like a “debt”. With this we can now add subtraction to the operations that we can do to vector spaces. For example, let us subtract from our quantity of three apples and two oranges the quantity of one apple and five oranges. We will be left with two apples and a “debt” of three oranges. We write

\displaystyle \left(\begin{array}{c}3\\2\end{array}\right)-\left(\begin{array}{c}1\\5\end{array}\right)=\left(\begin{array}{c}2\\-3\end{array}\right)

Finally, we can combine all these operations:

\displaystyle 2\left(\left(\begin{array}{c}3\\2\end{array}\right)+\left(\begin{array}{c}1\\5\end{array}\right)\right)=2\left(\begin{array}{c}4\\7\end{array}\right)=\left(\begin{array}{c}8\\14\end{array}\right)

For vector spaces, the “scaling” operation possesses a property analogous to the distributive property of multiplication over addition. So if we wanted to, we could also have performed the previous operation in another way, which gives the same answer:

\displaystyle 2\left(\left(\begin{array}{c}3\\2\end{array}\right)+\left(\begin{array}{c}1\\5\end{array}\right)\right)=2\left(\begin{array}{c}3\\2\end{array}\right)+2\left(\begin{array}{c}1\\5\end{array}\right)=\left(\begin{array}{c}6\\4\end{array}\right)+\left(\begin{array}{c}2\\10\end{array}\right)=\left(\begin{array}{c}8\\14\end{array}\right)
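The apples-and-oranges calculations above can be reproduced with a short Python sketch, storing each quantity as a pair (apples, oranges). The helper names add, scale, and subtract are illustrative assumptions, and Fraction is used so that halving three apples gives exactly three halves.

```python
from fractions import Fraction


def add(u, v):
    """Add two vectors entry by entry."""
    return tuple(a + b for a, b in zip(u, v))


def scale(c, v):
    """Multiply every entry of the vector v by the scalar c."""
    return tuple(c * a for a in v)


def subtract(u, v):
    """Subtract v from u by adding the vector scaled by -1."""
    return add(u, scale(-1, v))


if __name__ == "__main__":
    u = (3, 2)  # three apples and two oranges
    v = (1, 5)  # one apple and five oranges

    print(add(u, v))                # (4, 7)
    print(scale(2, u))              # (6, 4)
    print(scale(Fraction(1, 2), u))
    print(subtract(u, v))           # (2, -3)

    # Scaling a sum gives the same result as summing the scaled vectors.
    print(scale(2, add(u, v)) == add(scale(2, u), scale(2, v)))  # True
```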

We can also apply this notation to problems in physics. Suppose a rigid object is acted on by a force of one newton to the east and another force of one newton to the north. Then adopting a convention of Cartesian coordinates with the positive x-axis oriented towards the east and the positive y-axis towards the north, we can calculate the resultant force acting on the object as follows:

\displaystyle \left(\begin{array}{c}1\\0\end{array}\right)+\left(\begin{array}{c}0\\1\end{array}\right)=\left(\begin{array}{c}1\\1\end{array}\right)

This is actually a force with a magnitude of around 1.414 newtons (the square root of 2), with a direction pointing towards the northeast, but a discussion of such calculations will perhaps be best left for future posts. For now, we want to focus on the two important properties of vector spaces: they are closed under the operations of addition and multiplication by a scaling factor, or “scalar”.
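Although the general discussion is deferred, readers who want to check the quoted magnitude and direction right away can do so with the following Python sketch. It assumes the usual Pythagorean formula for the length of a vector and measures the angle counterclockwise from the positive x-axis; the function names are made up for illustration.

```python
import math


def resultant(*forces):
    """Add force vectors given as (east, north) components."""
    return (sum(f[0] for f in forces), sum(f[1] for f in forces))


def magnitude(force):
    """Length of the force vector, by the Pythagorean theorem."""
    return math.hypot(force[0], force[1])


def angle_from_east(force):
    """Angle in degrees, measured counterclockwise from the positive x-axis."""
    return math.degrees(math.atan2(force[1], force[0]))


if __name__ == "__main__":
    total = resultant((1.0, 0.0), (0.0, 1.0))
    print(total)
    print(round(magnitude(total), 3))        # 1.414
    print(round(angle_from_east(total), 1))  # 45.0, halfway between east and north
```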

In Rings, Fields, and Ideals, we discussed what it means for a set to be closed under certain operations. A vector space is therefore a set that is closed under addition among its own elements and under multiplication by a “scalar”, which is an element of a field, a concept we discussed in the same post linked to above. A set that is closed under addition among its own elements and under multiplication by a scalar which is an element of a ring instead of a field is called a module. Another concept we discussed in Rings, Fields, and Ideals and also in More on Ideals is the concept of an ideal. An ideal is a module which is also a subset of its ring of scalars.

Whenever we talk about sets, it is always important to also talk about the functions between such sets. A vector space (or a module) is just a set with special properties, namely closure under addition and scalar multiplication; therefore we want to talk about functions that are related to these properties. A linear transformation is a function between two vector spaces or modules that “respects” addition and scalar multiplication. Let u and v be any two elements of a vector space or a module, and let a be any element of their field or ring of scalars. By the properties defining vector spaces and modules, u+v and av are also elements of the same vector space or module. A function f between two vector spaces or modules is called a linear transformation if

\displaystyle f(u+v)=f(u)+f(v)

\displaystyle f(av)=af(v)
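To see the two conditions in action, the Python sketch below tests them on sample vectors for one function that satisfies them and one that does not. The functions f and shift are examples invented for this illustration, and checking a few sample values can only expose a failure of linearity; it does not prove that a function is linear.

```python
def add(u, v):
    """Add two vectors entry by entry."""
    return tuple(a + b for a, b in zip(u, v))


def scale(c, v):
    """Multiply every entry of the vector v by the scalar c."""
    return tuple(c * a for a in v)


def f(v):
    """An example function built only from scaling and adding coordinates."""
    x, y = v
    return (2 * x + y, x - 3 * y)


def shift(v):
    """An example that is not linear: it moves every vector one unit sideways."""
    x, y = v
    return (x + 1, y)


def respects_linearity(func, u, v, a):
    """Spot-check both defining conditions on the given sample values."""
    additive = func(add(u, v)) == add(func(u), func(v))
    homogeneous = func(scale(a, v)) == scale(a, func(v))
    return additive and homogeneous


if __name__ == "__main__":
    u, v, a = (3, 2), (1, 5), 2
    print(respects_linearity(f, u, v, a))      # True
    print(respects_linearity(shift, u, v, a))  # False
```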

Linear transformations are related to the equation of a line in Cartesian geometry, and they give the study of vector spaces and modules its name, linear algebra. For certain types of vector spaces or modules, linear transformations can be represented by nifty little gadgets called matrices, which are rectangular arrays of elements of the field or ring of scalars. The vectors (elements of vector spaces) which we have featured in this post can be thought of as matrices with only a single column, sometimes called column matrices. We will not discuss matrices in this post, although perhaps in the future we will; they can be found, along with many other deeper aspects of linear algebra, in most textbooks on linear algebra or abstract algebra such as Linear Algebra Done Right by Sheldon Axler or Algebra by Michael Artin.


Vector Space on Wikipedia

Module on Wikipedia

Linear Algebra Done Right by Sheldon Axler

Algebra by Michael Artin

Basics of Topology and Continuous Functions

Informally, a topology is a kind of “arrangement” or “organization” that we put on a set. One can think of an analogy with an army, which is made up of soldiers organized into squads, which are in turn organized into platoons, and so forth. Topology accomplishes this by organizing subsets of a set into “open sets” and “closed sets”. We quote here the rigorous definition of a topology following the book Topology by James R. Munkres:

A topology on a set X is a collection \mathcal{T} of subsets of X having the following properties:

(1)  \varnothing and X are in \mathcal{T}.

(2) The union of the elements of any subcollection of \mathcal{T} is in \mathcal{T}.

(3) The intersection of the elements of any finite subcollection of \mathcal{T} is in \mathcal{T}.

A set X for which a topology \mathcal{T} has been specified is called a topological space.

From the same book, we have the following definition of an open set:

If X is a topological space with a topology \mathcal{T}, we say that a subset U of X is an open set of X if U belongs to the collection \mathcal{T}.

We also have the following definition of a closed set:

A subset A of a topological space X is said to be closed if the set  X-A is open.

We note that the notation X-A here refers to the complement of A in X, i.e., the set X-A is the set of all elements of the set X that are not also elements of the set A.

Our definition of open sets and closed sets has some results that seem rather weird at first glance. By definition, both the entire set X and the empty set \varnothing are open. But X-\varnothing is just X, which is open; therefore, by our definition of closed sets, \varnothing is closed. Similarly, since X-X is the empty set \varnothing, which is again open, we find that the entire set X is also closed. Therefore, the sets X and \varnothing are both open and closed!

This only seems paradoxical because we are used to thinking of the words open and closed as being opposites; such may be the case in real life, but for our purposes these words are merely terminology that we use in order to organize our set; therefore, it should not be troubling for us to find a set both closed and open (some refer to such a set as a “clopen” set). There also exist examples of sets in some topologies being neither closed nor open.

Now we show one example of putting a topology on a set. We consider the set with two elements, which we shall refer to as 0 and 1. This set has the following subsets:

\displaystyle \varnothing

\displaystyle \{0\}

\displaystyle \{1\}

\displaystyle \{0,1\}

We shall now put a topology on this set. By the definition of a topology, the subset \{0,1\}, which is the entire set, and the empty set \varnothing have to be open. By the result we discovered earlier, they must also both be closed. We now have a choice of what to do with the remaining subsets, \{0\} and \{1\}. If we declare them to both be open, then all the subsets are open sets. We call a topology where all subsets are open the discrete topology. It so happens that if we do this, both \{0\} and \{1\} will also be closed, by the definition of a closed set: each one is the complement of the other, and both are open. So putting the discrete topology on this set with two elements makes all subsets both open and closed.

We can also choose not to declare either of the two sets \{0\} and \{1\} to be open; this will make them neither open nor closed, and only the entire set and the empty set are open (they also happen to be closed). Such a topology where only the entire set and the empty set are open (which they are forced to be, by definition) is called the trivial topology.

Finally, we can declare just one of \{0\} and \{1\} to be open. There are two different ways of doing this; we can declare either \{0\} to be open, in which case \{1\} will be closed, or we can declare \{1\} to be open, in which case \{0\} will be closed. A two-element set where one of the one-element subsets is declared to be open, rendering the other one-element subset closed, is called a Sierpinski space.
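To make the examples above concrete, here is a small Python sketch (not part of the original discussion) that goes through every collection of subsets of the two-element set and keeps those satisfying the three axioms of a topology. The helper names `powerset`, `is_topology`, and `all_topologies` are our own, and the check assumes the set is finite, so that closure under pairwise unions and intersections is enough.

```python
from itertools import combinations

def powerset(points):
    """Return every subset of `points` as a frozenset."""
    items = list(points)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def is_topology(points, collection):
    """Check the three axioms for a collection of subsets of a finite set."""
    X = frozenset(points)
    T = set(collection)
    if frozenset() not in T or X not in T:
        return False
    # On a finite set, closure under pairwise unions and intersections
    # already gives closure under all unions and finite intersections.
    return all((A | B) in T and (A & B) in T for A in T for B in T)

def all_topologies(points):
    """Brute-force every collection of subsets and keep the topologies."""
    subsets = powerset(points)
    found = []
    for r in range(len(subsets) + 1):
        for collection in combinations(subsets, r):
            if is_topology(points, collection):
                found.append(set(collection))
    return found

topologies = all_topologies({0, 1})
for T in topologies:
    print(sorted(sorted(U) for U in T))
```

The search should report four topologies: the trivial one, the discrete one, and the two Sierpinski topologies.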

We tackle one more example. Consider the set of all real numbers \mathbb{R}, also called the real line. We will put a topology on the real line, but first there is one more concept that we need to define. Let a and b be two real numbers, where b is greater than a (also written, of course, as a<b). The set of all real numbers which are greater than a but less than b is denoted by (a, b). Note that a and b themselves are not included in the set (a, b). We call sets such as these open intervals. If instead we consider the set of real numbers greater than or equal to a but less than or equal to b, then we write [a, b], and both a and b are now included in [a, b]. Such sets are called closed intervals.

We now go back to putting a topology on the real line. As may be suggested by the naming, we now declare all sets that are unions of open intervals, including of course the open intervals themselves, to be the open sets of our topology. This will make the closed intervals, including the sets consisting of only one real number, into closed sets. We will not explain why in this post, but it always goes back to the definitions of topology, open set, and closed set. This topology that we have defined on the real line is called its standard topology.

Now that we know the concept of an open interval, there is another related concept that we will introduce in this post. We stay in the context of real numbers and the real line. Let x be a real number, and let \epsilon be a positive real number. The open interval (x-\epsilon, x+\epsilon) is an example of what we call a neighborhood of x. It consists of all real numbers whose difference from x is less than \epsilon, or we may think of them as being less than a distance \epsilon away from x. This motivates the terminology of “neighborhood”, even though \epsilon can be as small or as big as we want.

The concept of a neighborhood plays a big role in a very common kind of topology called a metric topology. It plays a big role in the modern foundations of calculus and geometry.

Recall that a function is a mapping between sets, in the sense that it assigns to every element in a set called its domain an element of another set called its range. As per our definitions above, a topological space is just a set  for which a topology is specified, so we can talk about functions between topological spaces. The topologies on the sets involved will allow us to define an important kind of function between topological spaces, called a continuous function. Once more we refer to the book of Munkres:

Let X and Y be topological spaces. A function f: X \rightarrow Y is said to be continuous if for each open subset V of Y, the set f^{-1}(V) is an open subset of X

Recall that f^{-1}(V) is the set of all points x of X for which f(x)\in V; it is empty if V does not intersect the image set f(X) of f.

Continuity of a function depends not only upon the function f itself, but also on the topologies specified for its domain and range. If we wish to emphasize this fact, we can say that f is continuous relative to specific topologies on X and Y.

A continuous function with a continuous inverse is called a homeomorphism.
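For finite topological spaces, the definition of continuity can be checked directly by computing preimages. The following Python sketch does this for the identity map on the two-element set, once from the discrete topology to a Sierpinski topology and once in the other direction. The function names `preimage` and `is_continuous` are our own choices for illustration.

```python
def preimage(f, domain, V):
    """Set of points x in `domain` with f(x) in V."""
    return frozenset(x for x in domain if f(x) in V)

def is_continuous(f, X, TX, TY):
    """f is continuous if the preimage of every open set of Y is open in X."""
    return all(preimage(f, X, V) in TX for V in TY)

def identity(x):
    return x

X = frozenset({0, 1})
discrete = {frozenset(), frozenset({0}), frozenset({1}), X}
sierpinski = {frozenset(), frozenset({0}), X}

# Identity map from the discrete space to the Sierpinski space.
discrete_to_sierpinski = is_continuous(identity, X, discrete, sierpinski)
# Identity map from the Sierpinski space to the discrete space.
sierpinski_to_discrete = is_continuous(identity, X, sierpinski, discrete)

print(discrete_to_sierpinski)  # True
print(sierpinski_to_discrete)  # False
```

The identity map is a bijection, but it is continuous in only one of the two directions, so it is not a homeomorphism between these two spaces. This also illustrates that continuity depends on the topologies and not only on the function.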

This is the most basic definition of continuity of a function. However, depending on the topologies on the domain and range, there may be several equivalent definitions, all deriving from this one most basic definition, that will shed light on certain concepts of importance for the topological spaces that we are studying. We state here one important equivalent definition for the case of functions from real numbers to real numbers, with the set of real numbers equipped with the standard topology discussed earlier.

A function f: \mathbb{R} \rightarrow \mathbb{R} is said to be continuous if for every real number x and every positive real number \epsilon, there exists a positive real number \delta such that for every real number y, whenever |x-y| is less than \delta, it is guaranteed that |f(x)-f(y)| is less than \epsilon. Note that \delta is allowed to depend on both x and \epsilon. The notation |x-y| stands for the absolute value of x-y; if x is greater than y then it is simply equal to x-y, but if y is greater than x then it is instead equal to y-x. The same applies to |f(x)-f(y)|. If f(x) is greater than f(y) then it is equal to f(x)-f(y), but if f(y) is greater than f(x) then it is equal to f(y)-f(x).

The idea that this definition of continuity is supposed to communicate is that we can always keep the change in the “output” of the function as small as we want, as long as we make the change in the “input” sufficiently small. We can think of functions that are not continuous as having abrupt “jumps”, such that no matter how small a change we make in the input, the output may still change by a large amount.
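The following Python sketch gives a rough numerical feel for this definition. For a given x and \epsilon it tries smaller and smaller values of \delta and tests sample points within \delta of x. The functions `delta_works` and `find_delta`, the two example functions, and the halving strategy are all our own assumptions; testing finitely many sample points is only a spot check and not a proof of continuity.

```python
def delta_works(f, x, eps, delta, samples=1000):
    """Sample points y with |x - y| < delta and test |f(x) - f(y)| < eps.

    This is only a numerical spot check on finitely many points.
    """
    for k in range(1, samples + 1):
        offset = delta * k / (samples + 1)  # strictly between 0 and delta
        for y in (x - offset, x + offset):
            if abs(f(x) - f(y)) >= eps:
                return False
    return True

def find_delta(f, x, eps, tries=40):
    """Halve delta repeatedly; return the first one that passes, else None."""
    delta = 1.0
    for _ in range(tries):
        if delta_works(f, x, eps, delta):
            return delta
        delta /= 2
    return None

def square(t):
    return t * t

def step(t):
    """A function with a jump at t = 0."""
    return 0.0 if t < 0 else 1.0

delta_square = find_delta(square, 1.0, 0.1)
delta_step = find_delta(step, 0.0, 0.5)

print(delta_square)  # some small positive number
print(delta_step)    # None: no delta was found at the jump
```

For the squaring function a suitable \delta is found, while for the step function at the jump no \delta passes the test, however small we make it.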

It is important to remind ourselves, once again, that this latter definition of continuity follows from the one most basic definition of continuity we have given earlier; we have simply specialized it to the case where the domain and range are both the set of real numbers, and we have equipped this set with its standard topology. We have not discussed explicitly how exactly to relate the two definitions here, but the inquisitive reader can find it and much more in the book of Munkres.


General Topology on Wikipedia

Sierpinski Space on Wikipedia

Continuous Function on Wikipedia

Topology by James R. Munkres

From Pythagoras to Einstein

The Pythagorean theorem is one of the most famous theorems in all of mathematics. Even people who are not very familiar with much of mathematics are at least familiar with the Pythagorean theorem, especially since it’s grade school level stuff. It’s also one of the most ancient theorems, obviously known to the ancient Greeks, although it may not have been invented by Pythagoras himself – it may go back to an even earlier time in history.

We recall the statement of the Pythagorean theorem. Suppose we have a right triangle, a triangle where one of the three angles is a right angle. The side opposite the right angle is called the “hypotenuse”, and we will use the symbol c to signify its length. It is always the longest among the three sides. The other two sides are called the altitude, whose length we symbolize by a, and the base, whose length we symbolize by b. The Pythagorean theorem relates the length of these three sides, so that given the lengths of two sides we can calculate the length of the remaining side.

\displaystyle a^2+b^2=c^2

Later on, when we learn about Cartesian coordinates, the Pythagorean theorem is used to derive the so-called “distance formula”. Let’s say we have a point A with coordinates \displaystyle (x_{1}, y_{1}), and another point B with coordinates \displaystyle (x_{2}, y_{2}). The distance between point A and point B is given by

\displaystyle \text{distance}=\sqrt{(x_{2}-x_{1})^2+(y_{2}-y_{1})^2}.

There is a very important aspect of the distance formula that will play an important role in the rest of the things that we will discuss in this post. Namely, we can change the coordinate system, so that the points A and B have different coordinates, by translating the origin, or by rotating the coordinate axes. However, even though the coordinates of the two points will be different, the distance given by the distance formula will be the same.

This might be a little technical, so let’s have a more down-to-earth example. I live in Southeast Asia, so I will say, for example, that the Great Pyramid of Giza, in Egypt, is very far away. But someone who lives in Egypt will perhaps say that no, the Great Pyramid is just nearby. In that case, because we live in different places, we will disagree on how far away the Great Pyramid is. But if, instead, I ask for the distance between the Sphinx and the Great Pyramid, then that is something we can agree on even if we live in different places (Google tells me they are about a kilometer apart).

We disagree on how close or far away something is, because the answer to that question depends on where we are. I measure distance from myself based on my own location, and the same is true for the other person, and that is why we disagree. But the distance between the two objects in my example, the Sphinx and the Great Pyramid, is an invariant quantity. It does not depend on where we are. This invariance makes it a very important quantity.
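The invariance of the distance can be checked numerically. The Python sketch below moves the origin and rotates the axes, then compares the distance between two points before and after the change. The function names, the sample points, and the particular shift and angle are arbitrary choices of ours for illustration.

```python
import math

def distance(p, q):
    """Distance formula for two points given as (x, y) pairs."""
    return math.sqrt((q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)

def change_coordinates(p, new_origin, angle):
    """Coordinates of p in a system whose origin is moved to `new_origin`
    and whose axes are rotated counterclockwise by `angle` radians."""
    x, y = p[0] - new_origin[0], p[1] - new_origin[1]
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * y, -s * x + c * y)

A, B = (1.0, 2.0), (4.0, 6.0)
A_new = change_coordinates(A, (10.0, -3.0), math.radians(30))
B_new = change_coordinates(B, (10.0, -3.0), math.radians(30))

print(A_new, B_new)                              # the coordinates have changed
print(distance(A, B), distance(A_new, B_new))    # the distances should agree
```

The two printed distances should agree up to floating-point rounding, even though the coordinates of both points are different in the new system.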

We will rewrite the distance formula using the following symbols to simplify the notation:

\displaystyle \Delta x=x_{2}-x_{1}

\displaystyle \Delta y=y_{2}-y_{1}

Furthermore we will use the symbol \Delta s to represent the distance. The distance formula is now written

\displaystyle \Delta s=\sqrt{(\Delta x)^2+(\Delta y)^2}

That giant square root over the right hand side does not look very nice, so we square both sides of the distance formula, giving us

\displaystyle (\Delta s)^2=(\Delta x)^2+(\Delta y)^2.

Finally, we switch the left hand side and the right hand side so that the analogy with the Pythagorean theorem becomes more visible. So our distance formula now becomes

\displaystyle (\Delta x)^2+(\Delta y)^2=(\Delta s)^2.

Again we did all of this so that we have the same form for the distance formula and the Pythagorean theorem. Here they are again, for comparison:

Pythagorean Theorem: \displaystyle a^2+b^2=c^2

Distance Formula: \displaystyle (\Delta x)^2+(\Delta y)^2=(\Delta s)^2.

Of course, real life does not exist on a plane, and if we wanted a distance formula with three dimensions, as may be quite useful for applications in engineering where we work with three-dimensional objects, we have the three-dimensional distance formula:

\displaystyle (\Delta x)^2+(\Delta y)^2+(\Delta z)^2=(\Delta s)^2

Following now the pattern, if we wanted to extend this to some sort of a space with four dimensions, for whatever reason, we just need to add another variable w, as follows:

\displaystyle (\Delta w)^2+(\Delta x)^2+(\Delta y)^2+(\Delta z)^2=(\Delta s)^2

As far as we know, in real life space only has three dimensions. However, we do live in something four-dimensional; not a four-dimensional space, but a four-dimensional “spacetime”. The originator of the idea of spacetime was a certain Hermann Minkowski, a mathematician who specialized in number theory but made this gigantic contribution to physics before he tragically died of appendicitis at the age of 44. But Minkowski’s idea was taken up by his former student, a rising physicist who was working on a theory unifying two apparently conflicting areas of physics at the time, classical mechanics and electromagnetism. This young physicist’s name was Albert Einstein, and the theory he was working on was called “relativity”. And Minkowski’s idea of spacetime would play a central role in it.

People have been putting space and time together since ancient times. When we set an event, for example, like a meeting or a party, we need to specify a place and a time. But Minkowski’s idea was far more radical. He wanted to think of space and time as parts of a single entity called spacetime, in the same way that the x-axis and the y-axis are parts of a single entity called the x-y plane. If this was true, then just as there was an invariant “distance” between two points in the x-y plane, then there should be an invariant “interval” between two events in spacetime. This would simplify and explain many phenomena already suggested by the work of Einstein and his predecessors such as the measurement of lengths and the passing of time being different for different observers.

However, the formula for this interval was different from the distance in that there was a minus sign for the one coordinate which was different from the rest, time. This was needed for the theory to agree with the electromagnetic phenomena that we observe in everyday life.

\displaystyle -(\Delta t)^2+(\Delta x)^2+(\Delta y)^2+(\Delta z)^2=(\Delta s)^2

There’s still a little problem with this formula, however. We measure time and distance using different units. For time we usually use seconds, minutes, hours, days, and so on. For distance we use centimeters, meters, kilometers, and so on. When we add or subtract quantities they need to have the same units. But we already have experience in adding or subtracting quantities with different units – for example, let’s stick with distance and imagine that we need to add two different lengths; however, one is measured in meters while another is measured in centimeters. All we need to do is to “convert” one of them so that they have the same units; a hundred centimeters make one meter, so we can use this fact to convert to either centimeters or meters, and then we can add the two lengths. More technically, we say that we need a conversion factor, a constant quantity of 100 centimeters per meter.

The same goes for our problem in calculating the spacetime interval. We need to convert some of the quantities so that they will have the same units. What we need is a conversion factor, a constant quantity measured in units that involve a ratio of units of time and distance, or vice-versa. Such a quantity was found suitable, and it is the speed of light in a vacuum c, which has a value of around 300,000,000 meters per second. This allows us to write

\displaystyle -(c\Delta t)^2+(\Delta x)^2+(\Delta y)^2+(\Delta z)^2=(\Delta s)^2

This formula is at the heart of the theory of relativity. For those who have seen the 2014 movie “Interstellar”, one may recall (spoiler alert) how the main character aged so much more slowly than his daughter, because of the effects of the geometry of spacetime he experienced during his mission, and when he met up with her again she was already so much older than him. All of this can really be traced back to the idea of a single unified spacetime with an invariant interval as shown above. If space and time were two separate entities instead of being parts of a single spacetime, there would be no such effects. But if they form a single spacetime, then neither time nor distance is invariant; the invariant quantity is the spacetime interval. Time and distance are relative. Hence, “relativity”. Hence, contraction of length and dilation of time. Such effects in real life are already being observed in the GPS satellites that orbit our planet.
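We can check the invariance of the interval numerically, in the same spirit as for the distance formula. The change of coordinates used in the Python sketch below is the standard Lorentz boost along the x-axis, which relates two observers moving at constant velocity v relative to each other. This transformation is not derived anywhere in this post, so it should be read as an assumption brought in from the references. The sample separations and the velocity are arbitrary.

```python
import math

C = 299_792_458.0  # speed of light in vacuum, in meters per second

def interval_squared(dt, dx, dy, dz):
    """The quantity -(c dt)^2 + dx^2 + dy^2 + dz^2."""
    return -(C * dt) ** 2 + dx ** 2 + dy ** 2 + dz ** 2

def boost_x(dt, dx, dy, dz, v):
    """Standard Lorentz boost along x with speed v (assumed, not derived here)."""
    gamma = 1.0 / math.sqrt(1.0 - (v / C) ** 2)
    dt_new = gamma * (dt - v * dx / C ** 2)
    dx_new = gamma * (dx - v * dt)
    return dt_new, dx_new, dy, dz

# Separation between two events: one microsecond apart, 150 meters apart in x.
event = (1e-6, 150.0, 20.0, -5.0)
boosted = boost_x(*event, v=0.6 * C)

print(event[0], boosted[0])   # the time separations differ
print(event[1], boosted[1])   # the x separations differ
print(interval_squared(*event), interval_squared(*boosted))  # should agree
```

The two observers disagree on the time elapsed and on the distance between the events, but the two values of the interval should agree up to floating-point rounding.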

The theory of relativity is by no means a “complete” theory, because there are still so many questions, involving black holes for example. Like most of science, there’s always room for improvement. But what we currently have is a very beautiful, very elegant theory that explains many phenomena we would otherwise be unable to explain, and all of it comes back to some very ancient mathematics we are all familiar with from grade school.


Theory of Relativity on Wikipedia

Spacetime on Wikipedia

Spacetime Physics by Edwin F. Taylor and John Archibald Wheeler

Spacetime and Geometry by Sean Carroll

More on Ideals

In Rings, Fields, and Ideals, we talked about the concept of an ideal, which we described as a subset of a ring which is closed under addition among its own elements and under multiplication by the elements of the ring. Here is a more rigorous definition, lifted verbatim from the book Algebra by Michael Artin:

An ideal I of a ring R is a nonempty subset of R with these properties:

(i) I is closed under addition.

(ii) If s is in I and r is in R, then rs is in I.

Now we show two important examples of ideals that all rings have:

The first one we will present here is called the zero ideal, which we also write as (0). It is the ideal consisting only of one element, which is the zero element, which we will write as 0 regardless of whether or not we are working in the ring of integers \mathbb{Z}. It is good to mention at this point two important properties of 0. First, it is the identity element under addition and subtraction; in other words, 0 added to or subtracted from any other element gives back that element. Second, any number multiplied to or by 0 gives back 0. A familiar and intuitive example of these concepts is given in the arithmetic of \mathbb{Z}.

Now to show that the set consisting of the zero element alone is an ideal, we check if the properties that define the concept of an ideal hold:

For property (i) above, we see from

\displaystyle 0+0=0


\displaystyle 0-0=0

that whenever the left hand side contains only elements from (0), namely only 0, then the right hand side also contains only elements from (0), which is, again, 0.

As for property (ii), this is satisfied since any element of R multiplied to or by 0 is once again 0, which is the one and only element of the set (0).

Therefore (0) is indeed an ideal.

The second example of an ideal is called the unit ideal, which we write down using the symbol (1). It is none other than the entire ring R itself. Obviously, since the definition of a ring requires that it be closed under addition, subtraction and multiplication, the entire ring R is an ideal of itself, but in its context as an ideal we refer to it as the unit ideal.

Notice that (0) can be considered to be the subset of R consisting of all multiples of 0 and that (1) can be considered to be the subset of R consisting of all multiples of 1. These are special cases of a more general kind of ideal, called a principal ideal. Once again we take the rigorous definition almost verbatim from Artin’s Algebra:

In any R, the multiples of a particular element a form an ideal called the principal ideal generated by a. An element b of R is in this ideal if and only if b is a multiple of a, which is to say, if and only if a divides b in R.

There are several notations for this principal ideal:

(a) = aR = Ra = \{ra|r\in R\}.

Principal ideals are the most basic kind of ideals, but they are not the only kind. As with most of mathematics, most of the interesting stuff happens when we go beyond the basics. Examples of ideals that are not principal will be tackled in future posts. For now, we will mention and take note of the fact that in the ring \mathbb{Z}, every ideal is principal. We will discuss in this post two more special cases of ideals, maximal ideals and prime ideals, and we show concrete examples in the ring \mathbb{Z}.

Ideals are sets, and one of the most basic concepts involving sets is when one set is a subset of another. A set B is said to be a subset of another set A when every element of B is also an element of A. In symbols we write B \subseteq A or A \supseteq B. For example, we can say that the set of all reptiles is a subset of the set of all animals, since every reptile is also an animal. We have been using this concept implicitly (hoping that intuition will carry the day) in the definition of ideals of a ring.

When all elements of A are also elements of B, and at the same time all elements of B are also elements of A, then we may say that A and B have the same elements, and that A=B. In more symbolic notation, we say that whenever both B \subseteq A and A \subseteq B, then A=B. Whenever we can write B \subseteq A but not A = B, then we can write B \subset A and say that B is a proper subset of A. Intuitively we can think of A as containing B and being bigger than B.

We now show examples of ideals containing other ideals. Let us stay in the ring of integers \mathbb{Z}. Consider the ideal made up of integer multiples of 10, i.e. (10). Its elements include

\displaystyle ..., -40, -30, -20, -10, 0, 10 , 20, 30, 40, ...

Consider also the ideal (5). Its elements include

\displaystyle ...,-20, -15, -10, -5, 0, 5 , 10, 15, 20, ...

Not all elements are displayed (since there are an infinite number of them) but one can see that every element of (10) is also an element of (5). We can therefore say that (10) \subseteq (5). But since not all elements of (5) are elements of (10), we cannot also write (10)=(5). Instead we say that (10) \subset (5) . In other words, (10)  is a proper subset of (5).
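Since the ideals have infinitely many elements, a computer can only compare finite pieces of them. The Python sketch below compares the elements of (10) and (5) between -100 and 100 using Python's built-in set comparisons. The function name `ideal_window` and the bound are our own choices, and the comparison on a finite window is an illustration and not a proof.

```python
def ideal_window(n, bound=100):
    """Elements of the principal ideal (n) of Z between -bound and bound.

    Assumes n is a nonzero integer.
    """
    return {k for k in range(-bound, bound + 1) if k % n == 0}

ten = ideal_window(10)
five = ideal_window(5)

print(ten <= five)   # True: every listed element of (10) is in (5)
print(ten < five)    # True: and the containment is proper
print(five <= ten)   # False: for example, 5 is in (5) but not in (10)
```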

A maximal ideal is an ideal that is a proper subset of the unit ideal (1) (the entire ring itself) but is not a proper subset of any other ideal. So the ideal (10) is not maximal, because it is a proper subset of the ideal (5). The ideal (5), however, is maximal. And so are all the principal ideals of \mathbb{Z} of the form (p) where p is a prime number. Any other ideal apart from the unit ideal that is not of this form, for example (10), (24), or (25), is a proper subset of an ideal of this form. One may now see some sort of an analogy with prime and composite numbers, since any natural number greater than 1 is either a prime or a product of primes.
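Because every ideal of \mathbb{Z} is principal, the search for maximal ideals can be phrased entirely in terms of divisibility: the ideal (m) is contained in the ideal (n) exactly when n divides m. The Python sketch below uses this to test which ideals (n) are maximal. The function names `contains` and `is_maximal` are our own, and only positive generators are considered.

```python
def contains(n, m):
    """True when the ideal (m) is a subset of the ideal (n), for positive n, m.

    This happens exactly when n divides m.
    """
    return m % n == 0

def is_maximal(n):
    """Test whether (n), with n >= 2, is a maximal ideal of Z.

    (n) fails to be maximal when some (d) with 1 < d < n contains it, since
    such a (d) is neither (n) itself nor the unit ideal.
    """
    return n >= 2 and all(not contains(d, n) for d in range(2, n))

maximal_generators = [n for n in range(2, 30) if is_maximal(n)]
print(maximal_generators)  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

The generators that survive are exactly the prime numbers in the range that was searched.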

Speaking of primes, a prime ideal is an ideal, once again not the unit ideal (1) (therefore also a proper subset of the unit ideal), which we will now define. Let a and b be elements of the ring R. Let their product, the element ab, be one of the elements of the ideal I. If this fact ensures that either a or b is also an element of the ideal I, then we say that I is a prime ideal.

Still staying in the ring \mathbb{Z}, the ideal (10) is not a prime ideal since 10 is the product of 2 and 5 and yet neither 2 nor 5 is an element of (10). Once again, principal ideals of the form (p), where p is a prime number, are prime ideals: if a product ab is in (p), then p divides ab, and a prime number that divides a product must divide at least one of its factors, so either a or b is in (p). Therefore all maximal ideals in \mathbb{Z} are also prime ideals.

There is, however, one prime ideal in \mathbb{Z} which is not a maximal ideal. It is the zero ideal (0). It is a prime ideal since a product ab of two integers can only be 0 if a or b is itself 0, so whenever ab is in (0), either a or b is in (0). It is not, however, a maximal ideal since its only element, the integer 0, is also an element of every other ideal in \mathbb{Z}; therefore, the ideal (0) is a proper subset of every other ideal, including proper ideals such as (5), and is not maximal. The prime ideals in \mathbb{Z} are those of the form (p) where p is a prime number, i.e. the maximal ideals, as well as the zero ideal (0).
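The defining condition of a prime ideal can be tested by brute force on a finite range of integers. The Python sketch below does this for the ideals (n) with n from 0 to 19. The function name `is_prime_ideal` and the search bound are our own choices, and since only finitely many pairs a and b are tried, a positive answer is a spot check and not a proof.

```python
def is_prime_ideal(n, bound=30):
    """Brute-force test of the prime-ideal condition for (n) in Z, n >= 0.

    Only pairs a, b with |a|, |b| <= bound are tried.
    """
    if n == 1:
        return False  # the unit ideal is excluded by definition

    def member(k):
        # Membership in (n); the zero ideal contains only 0.
        return k == 0 if n == 0 else k % n == 0

    for a in range(-bound, bound + 1):
        for b in range(-bound, bound + 1):
            if member(a * b) and not (member(a) or member(b)):
                return False
    return True

prime_generators = [n for n in range(0, 20) if is_prime_ideal(n)]
print(prime_generators)  # [0, 2, 3, 5, 7, 11, 13, 17, 19]
```

The list contains 0 together with the prime numbers, matching the description of the prime ideals of \mathbb{Z} given above.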


Ideal on Wikipedia

Maximal Ideal on Wikipedia

Prime Ideal on Wikipedia

Algebra by Michael Artin

Rings, Fields, and Ideals

Rings are mathematical objects which can be thought of as sets closed under the familiar operations of addition, subtraction, and multiplication. The technical definition involves the concept of a group, which I’ll take on maybe in a future post, so for now instead of defining rings rigorously I’ll instead rely on intuition and examples (also, in this post all rings considered will be commutative – note that this is not the most general case). The first, and perhaps most intuitive, example of a ring is the set of ordinary integers (symbolized by \mathbb{Z}). When we add, subtract, or multiply two integers, we get another integer. Here are a few examples:

\displaystyle 2+3=5

\displaystyle 4-7=-3

\displaystyle 6\times 5=30

In each of these examples, whenever the two operands on the left hand side are integers, then the right hand side is also an integer.

But we also know of a fourth arithmetic operation, the operation of division, which we did not require rings to have. Surely, in a ring, some elements may divide others, for example, once again in the ring of integers:

\displaystyle 20\div 10=2

But it doesn’t hold for all elements (obviously it will not hold for any number being divided by zero, so when we speak of division, we mean only division by nonzero elements). For example,

\displaystyle 10\div 20=0.5

The right hand side, 0.5, is not an integer, despite 10 and 20 being both integers. This is what we mean when we say that a ring (here the ring of integers) need not be closed under the operation of division. When a ring is closed under the operation of division by nonzero elements, we call it a field. The rational numbers \mathbb{Q} form a field. The sum, difference, product, and quotient (with a nonzero divisor) of two rational numbers is always another rational number. The same also applies to the real numbers \mathbb{R} and the complex numbers \mathbb{C}, so they also form fields.

Another very important example of a ring is the ring of polynomials (including the constant polynomials) in one variable x, familiar from high school mathematics:

\displaystyle x^2+2x+1

\displaystyle 3x^3-5x+2

For rings of polynomials in one variable x, we shall use the symbols \mathbb{Z}[x], \mathbb{Q}[x], \mathbb{R}[x] and \mathbb{C}[x] for polynomials with integer, rational, real, and complex coefficients respectively. All of these form rings – they are closed under addition, subtraction, and multiplication.

We now mention an important concept related to rings, called ideals of a ring. An ideal is a subset of a ring which is closed under addition and subtraction among its own elements, and under multiplication by elements of the ring to which it belongs. We shall demonstrate with an example. Consider the set of integer multiples of 5, in the ring of integers \mathbb{Z}. We will symbolize this set by (5). Some of its elements include the following:

\displaystyle ..., -20, -15, -10, -5, 0, 5, 10, 15, 20, ...

Now we show a few examples of what it means for (5) to be closed under addition and subtraction among its own elements and multiplication by elements of the ring \mathbb{Z} to which it belongs. First, we take two elements of (5), i.e. two multiples of 5, and add or subtract them:

\displaystyle 5+10=15

\displaystyle 15-35=-20

We can see that whenever two multiples of 5 are added or subtracted from each other, the result is always another multiple of 5. Now we show examples of how (5) is closed under multiplication by any integer, i.e. any element of the ring \mathbb{Z}:

\displaystyle 3\times 10=30

\displaystyle (-4)\times 5=-20

\displaystyle 7\times 15=105

Note that the integer (the first factor on the left hand side) which multiplies the multiple of 5 (the second factor on the left hand side) does not need to be a multiple of 5 itself; however their product (on the right hand side) is always a multiple of 5. This is reminiscent of the concept of vector spaces which are very useful not only in mathematics but also in physics and engineering – vector spaces are sets closed under addition among their own elements and under multiplication by a “scalar”. In fact, vector spaces and ideals of a ring are both special cases of more general mathematical objects called modules.
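The closure properties of (5) can also be spot-checked by machine on a finite range of integers. In the Python sketch below, the ranges and the names `in_ideal`, `closed_under_add_sub`, and `closed_under_mul` are our own choices; a check over finitely many numbers illustrates the properties but does not prove them.

```python
from itertools import product

def in_ideal(n):
    """Membership test for (5): n is a multiple of 5."""
    return n % 5 == 0

multiples_of_5 = [5 * k for k in range(-10, 11)]   # a finite piece of (5)
integers = range(-10, 11)                          # a finite piece of Z

# Sums and differences of multiples of 5 are again multiples of 5.
closed_under_add_sub = all(
    in_ideal(a + b) and in_ideal(a - b)
    for a, b in product(multiples_of_5, repeat=2)
)

# Any integer times a multiple of 5 is again a multiple of 5.
closed_under_mul = all(
    in_ideal(r * a) for r, a in product(integers, multiples_of_5)
)

print(closed_under_add_sub, closed_under_mul)  # True True
```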

And for now that’s it. I would like to note that the way this post was written is not the way we do mathematics at higher levels. Mathematics is really built more on logic, rigor, and definitions, not just intuition and examples like what was done here (although intuition and examples can be useful). But I guess I wanted this post to be more accessible to people who don’t have formal mathematical training; I even used the division symbol “\div” which is almost never used in higher level mathematics. Ultimately it’s part of what I wanted to do in this blog – to write in a more accessible language and hopefully show more of what’s in higher level mathematics (and also physics) to the layperson. Since the other (and primary) purpose of this blog is to help me learn, many future posts might not be as accessible as this one, and hopefully feature more rigor; nevertheless I still want to continue writing expository posts such as these from time to time.

Also, I want to say that I’m still very much in the process of learning, and there might be mistakes in my expositions (aside from the confusing language). That’s why I always include references, which are written by people with more expertise (and more time to proofread). A reader of this blog may take the content of the posts as merely an invitation to read the references. In any case, reading the references I usually list at the end of my posts is always very much encouraged.


Ring on Wikipedia

Field on Wikipedia

Ideal on Wikipedia

Module on Wikipedia

Algebra by Michael Artin

The Fundamental Theorem of Arithmetic and Unique Factorization

We know, from basic arithmetic, that any natural number greater than 1 is either a prime number or can be factored uniquely into a product of prime numbers. For example, consider the number 150:

\displaystyle 150=2\cdot3\cdot5\cdot5

This is only unique, of course, up to the order in which multiplication is performed (since multiplication in the natural numbers is commutative), and up to a unit, i.e.

\displaystyle 150=2\cdot3\cdot5\cdot5=1\cdot2\cdot3\cdot5\cdot5=1\cdot1\cdot2\cdot3\cdot5\cdot5=...

so we just consider all these different factorizations to be the same. This fact is so important that it is referred to as the fundamental theorem of arithmetic.

Now let us consider other kinds of numbers. We start with the integers, which are almost like the natural numbers except that there are negative numbers. The most important thing we should take note of here is that aside from the unit 1, we also have another unit, namely -1.

Let’s consider the number -28:

\displaystyle -28=(-1)\cdot2\cdot2\cdot7

This is also

\displaystyle -28=(-1)\cdot2\cdot2\cdot7=(-1)\cdot(-1)\cdot(-1)\cdot2\cdot2\cdot7=...

Once again, we consider these factorizations to be the same. Aside from these differences involving units, and also the order of multiplication, the factorization is unique.
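The factorizations above can be reproduced with a short Python sketch that uses trial division, which is a simple but slow method and only one of many ways to factor an integer. The function name `factor_integer` and the choice to return the unit separately from the primes are our own conventions.

```python
def factor_integer(n):
    """Return (unit, primes) with n == unit * product(primes), unit in {1, -1}."""
    if n == 0:
        raise ValueError("0 has no prime factorization")
    unit = 1 if n > 0 else -1
    n = abs(n)
    primes = []
    d = 2
    while d * d <= n:
        # Divide out each prime factor as many times as it occurs.
        while n % d == 0:
            primes.append(d)
            n //= d
        d += 1
    if n > 1:
        primes.append(n)  # whatever remains is itself prime
    return unit, primes

print(factor_integer(150))   # (1, [2, 3, 5, 5])
print(factor_integer(-28))   # (-1, [2, 2, 7])
```

Listing the primes in increasing order and collecting the sign into a single unit is one way of picking a representative among all the factorizations that we consider to be the same.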

Now we move on to less familiar numbers. Consider the Gaussian integers, complex numbers whose real and imaginary parts are both integers. In more technical terms, we write

\displaystyle \mathbb{Z}[i]=\{a+bi|a,b\in\mathbb{Z}\}.

The units here are, in addition to 1 and -1, i and -i.

For Gaussian integers, the number 5 is not a prime number anymore, but instead factorizes as follows:

\displaystyle 5=(2+i)(2-i)

Note that this is also the same as

\displaystyle 5=(1+2i)(1-2i)

since these two factorizations differ only by units:

\displaystyle 1+2i=i(2-i)

\displaystyle 1-2i=(-i)(2+i)

Meanwhile, the number 3 is still a prime number.

It is a theorem, which I shall not prove in this post, that the “ordinary” prime numbers that leave a remainder of 3 when divided by 4 “remain prime” for Gaussian integers. Those that leave a remainder of 1 when divided by 4 are no longer prime for Gaussian integers. We say that they “split”. The number 2 is a special case. We say that it “ramifies”; this means it factorizes, again up to units, as a power:

\displaystyle 2=-i(1+i)^{2}

But through all this, even though factorization in the Gaussian integers is very different from factorization in the natural numbers or the integers, the factorization in the Gaussian integers remains unique, of course up to the order of the factors and up to units.
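The rule for which ordinary primes stay prime can be checked by hand for small cases. A prime p factorizes in the Gaussian integers as (a+bi)(a-bi) exactly when p=a^2+b^2 for some integers a and b, so the sketch below simply searches for such a pair. The function name two_squares and the list of primes are choices made for this example only.

```python
from math import isqrt


def two_squares(p):
    """Return (a, b) with a*a + b*b == p and 0 < a <= b, or None if no such pair exists."""
    for a in range(1, isqrt(p) + 1):
        b_squared = p - a * a
        b = isqrt(b_squared)
        if b >= a and b * b == b_squared:
            return (a, b)
    return None


for p in [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]:
    pair = two_squares(p)
    if pair is None:
        print(f"{p} remains prime (remainder {p % 4} when divided by 4)")
    else:
        a, b = pair
        print(f"{p} = ({a}+{b}i)({a}-{b}i) (remainder {p % 4} when divided by 4)")
```

For the primes in this list, the ones with no such pair are exactly those that leave a remainder of 3 when divided by 4.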

We now show an example of factorization no longer being unique. Consider the (complex) numbers

\displaystyle \mathbb{Z}[\sqrt{-5}]=\{a+b\sqrt{-5}|a,b\in\mathbb{Z}\}.

We now have, for the number 6, the following two factorizations which are truly different, not just in order or up to units:

\displaystyle 6=2\cdot3

\displaystyle 6=(1+\sqrt{-5})(1-\sqrt{-5})

Unique factorization has failed. And in numbers like these, the failure of unique factorization can be measured by something called the class number. When unique factorization holds, the class number is 1. The class number is one of the most important quantities in all of algebraic number theory and will be tackled in later posts.
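Here is a minimal Python sketch of why the two factorizations of 6 are truly different. It represents the number a+b\sqrt{-5} as the pair (a, b), and uses the norm a^2+5b^2, a standard tool which is not introduced in this post: the norm of a product is the product of the norms. The numbers 2, 3, 1+\sqrt{-5}, and 1-\sqrt{-5} have norms 4, 9, 6, and 6, so breaking any of them up further would need a factor of norm 2 or 3. The names mul and norm are chosen for this example only.

```python
def mul(x, y):
    """Multiply a + b*sqrt(-5) and c + d*sqrt(-5), each given as an integer pair."""
    a, b = x
    c, d = y
    return (a * c - 5 * b * d, a * d + b * c)


def norm(x):
    """Norm of a + b*sqrt(-5), namely a^2 + 5*b^2."""
    a, b = x
    return a * a + 5 * b * b


print(mul((2, 0), (3, 0)))   # (6, 0)
print(mul((1, 1), (1, -1)))  # (6, 0)

# Any element of norm 2 or 3 would need |a| <= 1 and b == 0, so a small search suffices.
small = range(-3, 4)
norms = {norm((a, b)) for a in small for b in small}
print(2 in norms, 3 in norms)  # False False
```

Since no element has norm 2 or 3, none of the four factors can be broken up further, and since the only elements of norm 1 are 1 and -1, the two factorizations do not differ merely by units.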


Fundamental Theorem of Arithmetic on Wikipedia

Algebra by Michael Artin

Zeta Functions and L-Functions

This is my first post on this blog with actual content, and it’s an attempt to strike a balance between the two kinds of posts I want to make on this blog: expository articles, and personal notes on concepts I’m still trying to understand. It’s also going to feature a bit more of history as compared to actual technical stuff. As this blog continues to grow, I hope to write more of the latter. As of the moment, however, I hope understanding will be given for the lack of rigor on this blog. I will try to post lots of references to make up for it.

The Riemann zeta function is probably known to many people. After all, there is a million dollar prize attached to a problem involving it, known as the Riemann hypothesis, which I will not discuss yet in this post. Instead, first I will note that despite the name, it was not Bernhard Riemann who first discovered the Riemann zeta function. It was already known to Leonhard Euler, who used it to study the prime numbers. Bernhard Riemann’s later work with the Riemann zeta function was for a similar purpose to Euler’s – Riemann used it to give a formula for the number of primes less than a given number. Now that we have given a bit of history of the Riemann zeta function and what it is useful for, we now present its most basic form:

\displaystyle \zeta(s)=\sum_{n=1}^{\infty} \frac{1}{n^{s}}=\frac{1}{1^{s}}+\frac{1}{2^{s}}+\frac{1}{3^{s}}+...

The connection with prime numbers comes from the fact that aside from being expressed as an infinite sum, it can also be expressed as an infinite product, called the Euler product:

\displaystyle \zeta(s)=\prod_{p} \frac{1}{1-p^{-s}}=\bigg(\frac{1}{1-2^{-s}}\bigg)\bigg(\frac{1}{1-3^{-s}}\bigg)\bigg(\frac{1}{1-5^{-s}}\bigg)...

where p runs over all prime numbers.
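To see where the product comes from, here is a brief sketch (not a rigorous proof, and assuming s is large enough for everything to converge). Each factor is a geometric series in p^{-s}:

\displaystyle \frac{1}{1-p^{-s}}=1+\frac{1}{p^{s}}+\frac{1}{p^{2s}}+\frac{1}{p^{3s}}+...

Multiplying these series together over all primes, every term of the result is of the form

\displaystyle \frac{1}{(p_{1}^{k_{1}}p_{2}^{k_{2}}...p_{r}^{k_{r}})^{s}}

and by unique factorization every natural number n appears as such a product of prime powers exactly once. The product of all the factors is therefore the sum of 1/n^{s} over all natural numbers n.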

For s=2 for example, we have

\displaystyle \zeta(2)=\sum_{n=1}^{\infty} \frac{1}{n^{2}}=\frac{1}{1^{2}}+\frac{1}{2^{2}}+\frac{1}{3^{2}}+...=1+\frac{1}{4}+\frac{1}{9}+...

This series converges to the value

\displaystyle \zeta(2)=\frac{\pi^{2}}{6}.
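This value can be checked numerically. The following is a rough Python sketch that compares a partial sum of the series, a partial Euler product over the primes up to some limit, and \pi^2/6. The cutoffs (100,000 terms and primes up to 100,000) and the function names are arbitrary choices for this example, and both computations are only approximations of the infinite sum and product.

```python
from math import isqrt, pi


def zeta_partial_sum(s, terms):
    """Sum of 1/n^s for n = 1, ..., terms."""
    return sum(1 / n**s for n in range(1, terms + 1))


def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes p <= limit."""
    sieve = [True] * (limit + 1)
    sieve[0:2] = [False, False]
    for i in range(2, isqrt(limit) + 1):
        if sieve[i]:
            for j in range(i * i, limit + 1, i):
                sieve[j] = False
    return [i for i, is_p in enumerate(sieve) if is_p]


def euler_partial_product(s, limit):
    """Product of 1/(1 - p^(-s)) over all primes p <= limit."""
    product = 1.0
    for p in primes_up_to(limit):
        product *= 1 / (1 - p**-s)
    return product


print(zeta_partial_sum(2, 100_000))
print(euler_partial_product(2, 100_000))
print(pi**2 / 6)
```

All three printed numbers should agree to several decimal places, with the first two approaching \pi^2/6 from below as the cutoffs are increased.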

The infinite series converges only when the real part of s is greater than 1; for other values of s it does not converge (for s=1, for example, it becomes the harmonic series, which diverges). However, from complex analysis we have the concept of analytic continuation, which gives a “substitute” for the Riemann zeta function which is equal to the “original” Riemann zeta function wherever it was defined, and yet “continues” it to where it was not defined before. When we have performed this analytic continuation, the “new” Riemann zeta function (which, by the way, is usually written as a complicated expression involving integrals and gamma functions) will then be defined for all values of s except for s=1.

So while the “original” (infinite series) Riemann zeta function diverges for s=0 and s=-1, with the “new” (integrals and gamma functions) Riemann zeta function obtained from analytic continuation we have

\displaystyle \zeta(0)=-\frac{1}{2}

\displaystyle \zeta(-1)=-\frac{1}{12}

If we insist on writing the Riemann zeta function in the “original” infinite series form, this leads to the following very weird results:

\displaystyle \zeta(0)=\frac{1}{1^{0}}+\frac{1}{2^{0}}+\frac{1}{3^{0}}+...=1+1+1+...=-\frac{1}{2}

\displaystyle \zeta(-1)=\frac{1}{1^{-1}}+\frac{1}{2^{-1}}+\frac{1}{3^{-1}}+...=1+2+3+...=-\frac{1}{12}

Perhaps in some future post I will expound more on these weird results and the method of analytic continuation from which they originate.

Even before Riemann did his work, another great mathematician, Peter Gustav Lejeune Dirichlet, had generalized the zeta function to solve another problem involving primes. Namely, what Dirichlet proved was that in an arithmetic progression of positive integers where the first term and the increment are mutually prime, an infinite number of the terms of the progression are prime numbers. For example, consider the infinite arithmetic progression

\displaystyle 5, 8, 11, 14, 17, 20, 23, 26, 29, 32,...

The prime numbers here are

\displaystyle 5, 11, 17, 23, 29,....

What Dirichlet proved is that there are an infinite number of these prime numbers in the infinite arithmetic progression, even though I wrote down only five. Besides, it is quite difficult to identify whether a number is prime or composite when the number is very large. But Dirichlet was able to prove, for sure, that there is an infinite number of them in the progression. And in order to perform this feat Dirichlet had to “generalize” the Riemann zeta function.
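For small numbers, of course, we can simply search. Here is a minimal Python sketch that lists the first few primes in the progression above, which has first term 5 and increment 3. The function names are chosen for this example, and the primality test is plain trial division, which is only practical for small numbers. Such a search can never show that there are infinitely many primes in the progression; that is what Dirichlet's proof is for.

```python
def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def primes_in_progression(first, step, count):
    """Return the first `count` primes among first, first + step, first + 2*step, ...

    This loops forever if the progression contains fewer than `count` primes,
    which can happen when `first` and `step` share a common factor.
    """
    found = []
    term = first
    while len(found) < count:
        if is_prime(term):
            found.append(term)
        term += step
    return found


print(primes_in_progression(5, 3, 10))
# [5, 11, 17, 23, 29, 41, 47, 53, 59, 71]
```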

But what do we mean by Dirichlet “generalizing” the Riemann zeta function? First we make an analogy between geometric series and power series.

An (infinite) geometric series is a series of the form

\displaystyle \sum_{n=0}^{\infty}x^{n}

while a power series is a series of the form

\displaystyle \sum_{n=0}^{\infty}a_{n}x^{n}

where the coefficients a_{n} are to be determined by what kind of power series we have. For a Taylor series of a function f(x) at x=0, for example,

\displaystyle a_{n}=\frac{f^{(n)}(0)}{n!}

where f^{(n)}(0) is the n-th derivative of f(x) at x=0.

Now we can see that a geometric series is just a power series where a_{n}=1 for all n.
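As a quick check that the two descriptions agree, take the function f(x)=1/(1-x), chosen here just as an example. Differentiating n times gives

\displaystyle f^{(n)}(x)=\frac{n!}{(1-x)^{n+1}}

so that f^{(n)}(0)=n! and

\displaystyle a_{n}=\frac{f^{(n)}(0)}{n!}=\frac{n!}{n!}=1.

The Taylor series of 1/(1-x) at x=0 is therefore exactly the geometric series.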

Now we go back to Dirichlet’s generalization of the Riemann zeta function, which we now refer to as “L-functions”. An L-function starts out as a Dirichlet series, an infinite series of the form

\displaystyle \sum_{n=1}^{\infty} \frac{a_{n}}{n^{s}}.

So the Riemann zeta function is just a Dirichlet series where a_{n}=1 for all n.

Just like in the case of power series, the coefficients a_{n} are determined by special methods. For the problem Dirichlet was considering, what he needed was what we now refer to as a Dirichlet character, which we write using the symbol \chi(n). The Dirichlet character has some special properties. For example, the Dirichlet character is periodic, i.e. \chi(n+k)=\chi(n) for some number k which we call the period, and \chi(n)=0 whenever n and k are not mutually prime, i.e. they have a common factor aside from 1. Also, the Dirichlet character is multiplicative, which means that \chi(mn)=\chi(m)\chi(n).

So now we express the above series, which we now refer to as an “L-series”, as

\displaystyle L(s,\chi)=\sum_{n=1}^{\infty} \frac{\chi(n)}{n^{s}}.

L-series, like the Riemann zeta function, can also be expressed as an Euler product. The analytic continuation of the L-series is then what we call an L-function.
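To make this a little more concrete, here is a minimal Python sketch using one specific Dirichlet character, chosen for this example and not taken from the discussion above: the character with period 4 that is 1 when n leaves a remainder of 1 when divided by 4, -1 when n leaves a remainder of 3, and 0 when n is even. For this character the L-series at s=1 is the series 1-1/3+1/5-1/7+..., which is known to converge to \pi/4. The function names and the cutoff of 100,000 terms are arbitrary choices.

```python
from math import pi


def chi(n):
    """The non-trivial Dirichlet character with period 4."""
    r = n % 4
    if r == 1:
        return 1
    if r == 3:
        return -1
    return 0  # n is even, so n and 4 share the common factor 2


def l_partial_sum(s, terms):
    """Sum of chi(n)/n^s for n = 1, ..., terms."""
    return sum(chi(n) / n**s for n in range(1, terms + 1))


# Check the two properties of a Dirichlet character on a range of small numbers.
periodic = all(chi(n + 4) == chi(n) for n in range(1, 100))
multiplicative = all(
    chi(m * n) == chi(m) * chi(n) for m in range(1, 50) for n in range(1, 50)
)
print(periodic, multiplicative)  # True True

print(l_partial_sum(1, 100_000))
print(pi / 4)
```

The last two printed numbers should agree to about four decimal places.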

I guess that’s it for now, even though we hardly accomplished anything except revisit a little history and get a little sample of the properties of the Riemann zeta function and the Dirichlet L-functions. In the future I guess I’ll make more posts regarding zeta functions and L-functions (I’ll mention at this point that there is another important generalization of the Riemann zeta function, the Dedekind zeta function) including perhaps how they were used to perform the accomplishments that brought fame to the mathematicians these functions are now named after.

Finally, some references:

Riemann Zeta Function on Wikipedia

Dirichlet L-Function on Wikipedia

Riemann’s Zeta Function by Harold M. Edwards

A Classical Introduction to Modern Number Theory by Kenneth Ireland and Michael Rosen