Some Useful Links: Knots in Physics and Number Theory

In modern times, “knots” have been important objects of study in mathematics. These “knots” are akin to the ones we encounter in ordinary life, except that they don’t have loose ends. For a better idea of what I mean, consider the following picture of what is known as a “trefoil knot”:


More technically, a knot is defined as the embedding of a circle in 3-dimensional space. For more details on the theory of knots, the reader is referred to the following Wikipedia pages:

Knot on Wikipedia

Knot Theory on Wikipedia

One reason knots have become such a major part of modern mathematical research is the work of mathematical physicists such as Edward Witten, who has related them to the Feynman path integral in quantum mechanics (see Lagrangians and Hamiltonians).

Witten, who is very famous for his work on string theory (see An Intuitive Introduction to String Theory and (Homological) Mirror Symmetry) and for being the first, and so far only, physicist to win the prestigious Fields medal, himself explains the relationship between knot theory and quantum mechanics in the following article:

Knots and Quantum Theory by Edward Witten

But knots have also appeared in other branches of mathematics. For example, in number theory, the result in etale cohomology known as Artin-Verdier duality states that the integers are similar to a 3-dimensional object in some sense. In particular, because they have a trivial etale fundamental group (which is kind of an algebraic analogue of the fundamental group discussed in Homotopy Theory and Covering Spaces), they are similar to a 3-sphere (recall the common but somewhat confusing notation that the ordinary sphere we encounter in everyday life is called the 2-sphere, while a circle is also called the 1-sphere).

Note: The fact that a closed 3-dimensional space with a trivial fundamental group is a 3-sphere is the content of a very famous conjecture known as the Poincare conjecture, proved by Grigori Perelman in the early 2000s. Perelman refused the million-dollar prize that was supposed to be his reward, as well as the Fields medal.

The prime numbers, because their associated finite fields have one cover for every integer, are like circles, and recalling the definition of knots mentioned above, are therefore like knots on this 3-sphere. This analogy, originally developed by David Mumford and Barry Mazur, is better explained in the following post by Lieven le Bruyn on his blog neverendingbooks:

What is the Knot Associated to a Prime on neverendingbooks

Finally, given what we have discussed, could it be that knot theory can “tie together” (pun intended) physics and number theory? This is the motivation behind the new subject called “arithmetic Chern-Simons theory” which is introduced in the following paper by Minhyong Kim:

Arithmetic Chern-Simons Theory I by Minhyong Kim

Of course, it must also be clarified that this is not the only way by which physics and number theory are related. It is merely another way, a new and not yet thoroughly explored one, by which the unity of mathematics manifests itself via its many different branches helping one another.


Some Useful Links: Quantum Gravity Seminar by John Baez

I have not been able to make posts tackling physics in a while, since I have lately been focusing my efforts on some purely mathematical stuff which I’m trying very hard to understand. Hence my last few posts have focused mostly on algebraic geometry and category theory. Such might perhaps be the trend in the coming days, although of course I still want to make more posts on physics at some point.

Of course, the “purely mathematical” stuff I’ve been posting about is still very much related to physics. For instance, in this post I’m going to link to a webpage collecting notes from seminars by mathematical physicist John Baez on the subject of quantum gravity – and much of it involves concepts from subjects like category theory and algebraic topology (for more on the basics of these subjects from this blog, see Category Theory, Homotopy Theory, and Homology and Cohomology).

Here’s the link:

Seminar by John Baez

As Baez himself says on the page, however, quantum gravity is not the only subject tackled in his seminars. Other subjects include topological quantum field theory, quantization, and gauge theory, among many others.

John Baez also has lots of other useful stuff on his website. One of the earliest mathematics and mathematical physics blogs on the internet is This Week’s Finds in Mathematical Physics, which apparently goes back all the way to 1993, and is one of the inspirations for this blog:

This Week’s Finds in Mathematical Physics by John Baez

Many of the posts on This Week’s Finds in Mathematical Physics show the countless fruitful, productive, and beautiful interactions between mathematics and physics. This is also one of the main goals of this blog – reflected even by the posts which have been focused on mostly “purely mathematical” stuff.

Metric, Norm, and Inner Product

In Vector Spaces, Modules, and Linear Algebra, we defined vector spaces as sets closed under addition and scalar multiplication (in this case the scalars are the elements of a field; if they are elements of a ring which is not a field, we have not a vector space but a module). We have seen since then that the study of vector spaces, linear algebra, is very useful, interesting, and ubiquitous in mathematics.

In this post we discuss vector spaces with some more additional structure – which will give them a topology (Basics of Topology and Continuous Functions), giving rise to topological vector spaces. This also leads to the branch of mathematics called functional analysis, which has applications to subjects such as quantum mechanics, aside from being an interesting subject in itself. Two of the important objects of study in functional analysis that we will introduce by the end of this post are Banach spaces and Hilbert spaces.

I. Metric

We start with the concept of a metric. We have to get two things out of the way. First, this is not the same as the metric tensor in differential geometry, although it also gives us a notion of a “distance”. Second, the concept of metric is not limited to vector spaces only, unlike the other two concepts we will discuss in this post. It is actually something that we can put on a set to define a topology, called the metric topology.

As we discussed in Basics of Topology and Continuous Functions, we may think of a topology as an “arrangement”. The notion of “distance” provided by the metric gives us an intuitive such arrangement. We will make this concrete shortly, but first we give the technical definition of the metric. We quote from the book Topology by James R. Munkres:

A metric on a set X is a function

\displaystyle d: X\times X\rightarrow \mathbb{R}

having the following properties:

1) d(x, y)\geq 0 for all x,y \in X; equality holds if and only if x=y.

2) d(x,y)=d(y,x) for all x,y \in X.

3) (Triangle inequality) d(x,y)+d(y,z)\geq d(x,z), for all x,y,z \in X.

We quote from the same book another important definition:

Given a metric d on X, the number d(x, y) is often called the distance between x and y in the metric d. Given \epsilon >0, consider the set

\displaystyle B_{d}(x,\epsilon)=\{y|d(x,y)<\epsilon\}

of all points y whose distance from x is less than \epsilon. It is called the \epsilon-ball centered at x. Sometimes we omit the metric d from the notation and write this ball simply as B(x,\epsilon) when no confusion will arise.

Finally, once more from the same book, we have the definition of the metric topology:

If d is a metric on the set X, then the collection of all \epsilon-balls B_{d}(x,\epsilon), for x\in X and \epsilon>0, is a basis for a topology on X, called the metric topology induced by d.

We recall that the basis of a topology is a collection of open sets such that every other open set can be described as a union of the elements of this collection. A set with a specific metric that makes it into a topological space with the metric topology is called a metric space.

An example of a metric on the set \mathbb{R}^{n} is given by the ordinary “distance formula”:

\displaystyle d(x,y)=\sqrt{\sum_{i=1}^{n}(x_{i}-y_{i})^{2}}

Note: We have followed the notation of the book of Munkres, which may be different from the usual notation. Here x and y are two different points on \mathbb{R}^{n}, and x_{i} and y_{i} are their respective coordinates.

The above metric is not the only one possible however. There are many others. For instance, we may simply put

\displaystyle d(x,y)=0 if \displaystyle x=y

\displaystyle d(x,y)=1 if \displaystyle x\neq y.

This is called the discrete metric, and one may check that it satisfies the definition of a metric. One may think of it as something that simply specifies the distance from a point to itself as “near”, and the distance to any other point that is not itself as “far”. There is also the taxicab metric, given by the following formula:

\displaystyle d(x,y)=\sum_{i=1}^{n}|x_{i}-y_{i}|

One way to think of the taxicab metric, which reflects the origins of the name, is that it is the “distance” important to taxi drivers (needed to calculate the fare) in a certain city with perpendicular roads. The ordinary distance formula is not very helpful since one needs to stay on the roads – therefore, for example, if one needs to go from point x to point y which are on opposite corners of a square, the distance traversed is not equal to the length of the diagonal, but is instead equal to the length of two sides. Again, one may check that the taxicab metric satisfies the definition of a metric.
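To make the three examples concrete, here is a minimal Python sketch that computes each metric on points of \mathbb{R}^{n} and spot-checks the three properties in the definition on a handful of sample points. The function names and the sample points are my own choices for illustration, and a spot check on finitely many points is of course not a proof.

```python
import math
from itertools import product


def euclidean(x, y):
    """Ordinary distance formula on R^n."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def discrete(x, y):
    """Discrete metric: 0 if the points coincide, 1 otherwise."""
    return 0.0 if tuple(x) == tuple(y) else 1.0


def taxicab(x, y):
    """Taxicab metric: sum of absolute coordinate differences."""
    return float(sum(abs(a - b) for a, b in zip(x, y)))


def satisfies_axioms(d, points, tol=1e-12):
    """Spot-check the three metric properties on a finite sample of points."""
    for x, y, z in product(points, repeat=3):
        # Property 1: nonnegative, and zero exactly when x = y.
        if d(x, y) < 0 or (d(x, y) == 0) != (x == y):
            return False
        # Property 2: symmetry.
        if abs(d(x, y) - d(y, x)) > tol:
            return False
        # Property 3: triangle inequality.
        if d(x, y) + d(y, z) < d(x, z) - tol:
            return False
    return True


sample = [(0.0, 0.0), (1.0, 1.0), (3.0, -2.0), (0.5, 4.0)]
for name, d in [("euclidean", euclidean), ("discrete", discrete), ("taxicab", taxicab)]:
    print(name, satisfies_axioms(d, sample))

# Opposite corners of a unit square: length of the diagonal versus two sides.
print(euclidean((0.0, 0.0), (1.0, 1.0)), taxicab((0.0, 0.0), (1.0, 1.0)))
```

The last line is the taxi driver's situation described above: the same two corners are closer in the ordinary metric than in the taxicab metric.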

II. Norm

Now we move on to vector spaces (we will consider in this post only vector spaces over the real or complex numbers), and some mathematical concepts that we can associate with them, as suggested in the beginning of this post. Being a set closed under addition and scalar multiplication is already a useful concept, as we have seen, but we can still add on some ideas that would make them even more interesting. The notion of metric that we have discussed earlier will show up repeatedly over this discussion.

We first discuss the notion of a norm, which gives us a notion of a “magnitude” of a vector. We quote from the book Introductory Functional Analysis with Applications by Erwin Kreyszig for the definition:

A norm on a (real or complex) vector space X is a real valued function on X whose value at an x\in X is denoted by

\displaystyle \|x\|    (read “norm of x“)

and which has the properties

(N1) \|x\|\geq 0

(N2) \|x\|=0\iff x=0

(N3) \|\alpha x\|=|\alpha|\|x\|

(N4) \|x+y\|\leq\|x\|+\|y\|    (triangle inequality)

here x and y are arbitrary vectors in X and \alpha is any scalar.

A vector space with a specified norm is called a normed space.

A norm automatically provides a vector space with a metric; in other words, a normed space is always a metric space. The metric is given in terms of the norm by the following equation:

\displaystyle d(x,y)=\|x-y\|

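However, not all metrics come from a norm. An example is the discrete metric on a vector space: a metric induced by a norm must satisfy d(\alpha x,\alpha y)=|\alpha|d(x,y), but the discrete distance between two distinct points remains 1 no matter how they are scaled.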
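The relationship between norms and metrics can be sketched in a few lines of Python. The helper names below are hypothetical, and the Euclidean norm is used only as a convenient example of a norm. The sketch builds d(x,y)=\|x-y\| from a norm and then compares how that metric and the discrete metric behave when both points are scaled by the same scalar; by property (N3), a metric coming from a norm must scale by |\alpha|.

```python
import math


def euclidean_norm(x):
    """Euclidean norm on R^n, used here only as an example of a norm."""
    return math.sqrt(sum(a * a for a in x))


def metric_from_norm(norm):
    """Build the metric d(x, y) = ||x - y|| from a given norm."""
    def d(x, y):
        return norm([a - b for a, b in zip(x, y)])
    return d


def discrete(x, y):
    """Discrete metric: 0 if the points coincide, 1 otherwise."""
    return 0.0 if list(x) == list(y) else 1.0


def scale(alpha, x):
    """Scalar multiplication of a vector."""
    return [alpha * a for a in x]


x, y, alpha = [1.0, 2.0], [4.0, 6.0], 2.0
d = metric_from_norm(euclidean_norm)

# A norm-induced metric scales by |alpha| when both points are scaled.
print(d(scale(alpha, x), scale(alpha, y)), abs(alpha) * d(x, y))

# The discrete metric does not: the two numbers printed here differ.
print(discrete(scale(alpha, x), scale(alpha, y)), abs(alpha) * discrete(x, y))
```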

III. Inner Product

Next we discuss the inner product. The inner product gives us a notion of “orthogonality”, a concept which we already saw in action in Some Basics of Fourier Analysis. Intuitively, when two vectors are “orthogonal”, they are “perpendicular” in some sense. However, our geometric intuition may not be as useful when we are discussing, say, the infinite-dimensional vector space whose elements are functions. For this we need a more abstract notion of orthogonality, which is embodied by the inner product. Again, for the technical definition we quote from the book of Kreyszig:

With every pair of vectors x and y there is associated a scalar which is written

\displaystyle \langle x,y\rangle

and is called the inner product of x and y, such that for all vectors x, y, z and scalars \alpha we have

(IP1) \langle x+y,z\rangle=\langle x,z\rangle+\langle y,z\rangle

(IP2) \langle \alpha x,y\rangle=\alpha\langle x,y\rangle

(IP3) \langle x,y\rangle=\overline{\langle y,x\rangle}

(IP4) \langle x,x\rangle\geq 0,    \langle x,x\rangle=0 \iff x=0

A vector space with a specified inner product is called an inner product space.

One of the most basic examples, in the case of a finite-dimensional vector space, is given by the following procedure. Let x and y be elements (vectors) of some n-dimensional real vector space X, with respective components x_{1}, x_{2},...,x_{n} and y_{1},y_{2},...,y_{n} in some basis. Then we can set

\displaystyle \langle x,y\rangle=x_{1}y_{1}+x_{2}y_{2}+...+x_{n}y_{n}

This is the familiar “dot product” taught in introductory university-level mathematics courses.

Let us now see how the inner product gives us a notion of “orthogonality”. To make things even easier to visualize, let us set n=2, so that we are dealing with vectors (which we can now think of as quantities with magnitude and direction) in the plane. A unit vector x pointing “east” has components x_{1}=1, x_{2}=0, while a unit vector y pointing “north” has components y_{1}=0, y_{2}=1. These two vectors are perpendicular, or orthogonal. Computing the inner product we discussed earlier, we have

\displaystyle \langle x,y\rangle=(1)(0)+(0)(1)=0.

We say, therefore, that two vectors are orthogonal when their inner product is zero. As we have mentioned earlier, we can extend this to cases where our geometric intuition may no longer be as useful to us. For example, consider the infinite dimensional vector space of (real-valued) functions which are “square integrable” over some interval (if we square them and integrate over this interval, we have a finite answer), say [0,1]. We set our inner product to be

\displaystyle \langle f,g\rangle=\int_{0}^{1}f(x)g(x)dx.

As an example, let f(x)=\text{cos}(2\pi x) and g(x)=\text{sin}(2\pi x). We say that these functions are “orthogonal”, but it is hard to imagine in what way. But if we take the inner product, we will see that

\displaystyle \int_{0}^{1}\text{cos}(2\pi x)\text{sin}(2\pi x)dx=0.

Hence we see that \text{cos}(2\pi x) and \text{sin}(2\pi x) are orthogonal. Similarly, we have

\displaystyle \int_{0}^{1}\text{cos}(2\pi x)\text{cos}(4\pi x)dx=0

and \text{cos}(2\pi x) and \text{cos}(4\pi x) are also orthogonal. We have discussed this in more detail in Some Basics of Fourier Analysis. We have also seen in that post that orthogonality plays a big role in the subject of Fourier analysis.
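These integrals can be checked numerically. The following Python sketch approximates the inner product on [0,1] with the midpoint rule; the choice of quadrature rule, the number of subintervals, and the function names are my own assumptions, and the printed values are approximations rather than exact integrals.

```python
import math


def inner_product(f, g, n=10000):
    """Approximate the integral of f(x) g(x) over [0, 1] by the midpoint rule."""
    h = 1.0 / n
    return h * sum(f((k + 0.5) * h) * g((k + 0.5) * h) for k in range(n))


def cos1(x):
    return math.cos(2 * math.pi * x)


def sin1(x):
    return math.sin(2 * math.pi * x)


def cos2(x):
    return math.cos(4 * math.pi * x)


# Both of these should be numerically indistinguishable from zero.
print(inner_product(cos1, sin1))
print(inner_product(cos1, cos2))

# For contrast, a function is not orthogonal to itself.
print(inner_product(cos1, cos1))
```

The last value is the square of the norm of \text{cos}(2\pi x) with respect to this inner product, which is nonzero as property (IP4) requires.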

Just as a norm always induces a metric, an inner product also induces a norm, and by extension also a metric. In other words, an inner product space is also a normed space, and also a metric space. The norm is given in terms of the inner product by the following expression:

\displaystyle \|x\|=\sqrt{\langle x,x\rangle}

Just as with the norm and the metric, although an inner product always induces a norm, not every norm is induced by an inner product.
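One standard test for whether a norm comes from an inner product, which is not stated above, is the parallelogram law \|x+y\|^{2}+\|x-y\|^{2}=2\|x\|^{2}+2\|y\|^{2}: every norm induced by an inner product satisfies it. The Python sketch below, with function names of my own choosing, evaluates the difference between the two sides for the Euclidean norm and for the norm \|x\|=\sum_{i}|x_{i}| that induces the taxicab metric.

```python
import math


def euclidean_norm(x):
    """Norm induced by the dot product."""
    return math.sqrt(sum(a * a for a in x))


def taxicab_norm(x):
    """Norm that induces the taxicab metric."""
    return float(sum(abs(a) for a in x))


def parallelogram_defect(norm, x, y):
    """Left side minus right side of the parallelogram law; zero when it holds."""
    plus = [a + b for a, b in zip(x, y)]
    minus = [a - b for a, b in zip(x, y)]
    return norm(plus) ** 2 + norm(minus) ** 2 - 2 * norm(x) ** 2 - 2 * norm(y) ** 2


x, y = [1.0, 0.0], [0.0, 1.0]
print(parallelogram_defect(euclidean_norm, x, y))  # zero up to rounding
print(parallelogram_defect(taxicab_norm, x, y))    # nonzero
```

Since the taxicab norm violates the law for this pair of vectors, it cannot be induced by any inner product.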

IV. Banach Spaces and Hilbert Spaces

There is one more concept I want to discuss in this post. In Valuations and Completions, we discussed Cauchy sequences and completions. Those concepts still carry on here, because they are actually part of the study of metric spaces (in fact, the valuations discussed in that post actually serve as a metric on the fields that were discussed, showing how in number theory the concepts of metric and metric space still make an appearance). If every Cauchy sequence in a metric space X converges to an element in X, then we say that X is a complete metric space.
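As an illustration of a metric space that is not complete, consider the rational numbers with the metric d(x,y)=|x-y|. The Python sketch below is my own example, not taken from the posts referred to above. It generates rational approximations to \sqrt{2} using Newton's method with exact fractions. The gaps between consecutive terms shrink rapidly (the sequence is in fact Cauchy), but no rational number squares to 2, so the sequence has no limit among the rationals.

```python
from fractions import Fraction


def newton_sqrt2(steps):
    """Rational Newton iterates x_{n+1} = (x_n + 2 / x_n) / 2, starting at 1."""
    x = Fraction(1)
    terms = [x]
    for _ in range(steps):
        x = (x + 2 / x) / 2
        terms.append(x)
    return terms


terms = newton_sqrt2(6)
gaps = [abs(b - a) for a, b in zip(terms, terms[1:])]

for x, gap in zip(terms[1:], gaps):
    # Each term is an exact rational; its square approaches 2 without reaching it.
    print(x.numerator, "/", x.denominator, float(gap), float(x * x - 2))
```

Passing to the completion (here, the real numbers) is exactly what supplies the missing limit.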

Since normed spaces and inner product spaces are also metric spaces, the notion of a complete metric space still makes sense, and we have special names for them. A normed space which is also a complete metric space is called a Banach space, while an inner product space which is also a complete metric space is called a Hilbert space. Finite-dimensional vector spaces (over the real or complex numbers) are always complete, and therefore we only really need the distinction when we are dealing with infinite dimensional vector spaces.

Banach spaces and Hilbert spaces are important in quantum mechanics. We recall in Some Basics of Quantum Mechanics that the possible states of a system in quantum mechanics form a vector space. However, more is true – they actually form a Hilbert space, and the states that we can observe “classically” are orthogonal to each other. The Dirac “bra-ket” notation that we have discussed makes use of the inner product to express probabilities.

Meanwhile, Banach spaces often arise when studying operators, which correspond to observables such as position and momentum. Of course the states form Banach spaces too, since all Hilbert spaces are Banach spaces, but there is much motivation to study the Banach spaces formed by the operators as well instead of just that formed by the states. This is an important aspect of the more mathematically involved treatments of quantum mechanics.


Topological Vector Space on Wikipedia

Functional Analysis on Wikipedia

Metric on Wikipedia

Norm on Wikipedia

Inner Product Space on Wikipedia

Complete Metric Space on Wikipedia

Banach Space on Wikipedia

Hilbert Space on Wikipedia

A Functional Analysis Primer on Bahcemizi Yetistermeliyiz

Topology by James R. Munkres

Introductory Functional Analysis with Applications by Erwin Kreyszig

Real Analysis by Halsey Royden

Differentiable Manifolds Revisited

In many posts on this blog, such as Geometry on Curved Spaces and Connection and Curvature in Riemannian Geometry, we have discussed the subject of differential geometry, usually in the context of physics. We have discussed what is probably its most famous application to date, as the mathematical framework of general relativity, which in turn is the foundation of modern day astrophysics. We have also seen its other applications to gauge theory in particle physics, and in describing the phase space, whose points correspond to the “states” (described by the position and momentum of particles) of a physical system in the Hamiltonian formulation of classical mechanics.

In this post, similar to what we have done in Varieties and Schemes Revisited for the subject of algebraic geometry, we take on the objects of study of differential geometry in more technical terms. These objects correspond to our everyday intuition, but we must develop some technical language in order to treat them “rigorously”, and also to be able to generalize them into other interesting objects. As we give the technical definitions, we will also discuss the intuitive inspiration for these definitions.

Just as varieties and schemes are the main objects of study in algebraic geometry (that is until the ideas discussed in Grothendieck’s Relative Point of View were formulated), in differential geometry the main objects of study are the differentiable manifolds. Before we give the technical definition, we first discuss the intuitive idea of a manifold.

A manifold is some kind of space that “locally” looks like Euclidean space \mathbb{R}^{n}. 1-dimensional Euclidean space is just the line \mathbb{R}, 2-dimensional Euclidean space is the plane \mathbb{R}^{2}, and so on. Obviously, Euclidean space itself is a manifold, but we want to look at more interesting examples, i.e. spaces that “locally” look like Euclidean space but “globally” are very different from it.

As an example, consider the surface of the Earth. “Locally”, that is, on small regions, the surface of the Earth appears flat. However, “globally”, we know that it is actually round.

Another way to think about things is that any small region on the surface of the Earth can be put on a flat map (possibly with some distortion of distances). However, there is no flat map that can include every point on the surface of the Earth while continuing to make sense. The best we can do is use several maps with some overlaps between them, transitioning between different maps when we change the regions we are looking at. We want these overlaps and transitions to make sense in some way.

In differential geometry, what we want is to be able to do calculus on these more general manifolds the way we can do calculus on the line, on the plane, and so on. In order to do this, we require that the “transitions” alluded to in the previous paragraph are given by differentiable functions.

Summarizing the above discussion in technical terms, an n-dimensional differentiable manifold is a topological space X with homeomorphisms \varphi_{\alpha} from open subsets U_{\alpha} covering X to open subsets of \mathbb{R}^{n}, such that each composition \varphi_{\alpha}\circ\varphi_{\beta}^{-1} is a differentiable function on \varphi_{\beta}(U_{\alpha}\cap U_{\beta})\subset\mathbb{R}^{n}.

Following the analogy with maps we discussed earlier, the pair \{U_{\alpha}, \varphi_{\alpha}\} is called a chart, and the collection of all these charts that cover the manifold is called an atlas. The map \varphi_{\alpha}\circ\varphi_{\beta}^{-1}|_{\varphi_{\beta}(U_{\alpha}\cap U_{\beta})} is called a transition map.

Now that we have defined what a manifold technically is, we discuss some related concepts, in particular the objects that “live” on our manifold. Perhaps the most basic of these objects are the functions on the manifold; however, we won’t discuss the functions themselves too much since there are not that many new concepts regarding them.

Instead, we will use one of the most useful concepts when it comes to discussing objects that “live” on manifolds – fiber bundles (see Vector Fields, Vector Bundles, and Fiber Bundles). A fiber bundle is given by a topological space E with a projection \pi from E to a base space B, with the requirement that every point of B has a neighborhood U such that the space \pi^{-1}(U) is homeomorphic to the product space U\times F, where F is the fiber, defined as \pi^{-1}(x) for any point x of B. When the fiber F is also a vector space, we refer to E as a vector bundle. In differential geometry, we also require that the relevant maps be diffeomorphisms, i.e. differentiable bijections whose inverses are also differentiable.

One of the most important kinds of vector bundles in differential geometry are the tangent bundles, which can be thought of as the collection of all the tangent spaces of a manifold at every point, for all the points of the manifold. We have already made use of these concepts in Geometry on Curved Spaces, and Connection and Curvature in Riemannian Geometry. We needed it, for example, to discuss the notion of parallel transport and the covariant derivative in Riemannian geometry. We will now discuss these concepts more technically.

Let \mathcal{O}_{p} be the ring of real-valued differentiable functions defined in a neighborhood of a point p in a differentiable manifold X. We define the real tangent space at p, written T_{\mathbb{R},p}(X), to be the vector space of p-centered \mathbb{R}-linear derivations, which are \mathbb{R}-linear maps D: \mathcal{O}_{p}\rightarrow\mathbb{R} satisfying Leibniz’s rule D(fg)=f(p)Dg+g(p)Df. Any such derivation D can be written in the following form:

\displaystyle D=\sum_{i}a_{i}\frac{\partial}{\partial x_{i}}\bigg\rvert_{p}

This means that the operators \frac{\partial}{\partial x_{i}}, evaluated at p, form a basis for the real tangent space at p. It might be a little jarring to see “differential operators” serving as a basis for a vector space, but it might perhaps be helpful to think of tangent vectors as giving “how fast” functions on the manifold are changing at a certain point. See the following picture:


The manifold is M, and its tangent space at the point x is T_{x}M. One of the tangent vectors, v, is shown. The parametrized curve \gamma(t) is often used to define the tangent vector, although that is not the approach we have given here (it may be found in the references, and is closely related to the definition we have given).
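Returning to the description of tangent vectors as derivations, the following Python sketch treats the simplest case, where the manifold is \mathbb{R}^{2} with its usual coordinates. It builds D=\sum_{i}a_{i}\frac{\partial}{\partial x_{i}} at a point p using central finite differences and checks the product (Leibniz) rule numerically. The test functions, the point p, the coefficients a_{i}, and the step size are all arbitrary choices of mine, and finite differences only approximate the true derivatives.

```python
import math


def derivation(a, p, h=1e-5):
    """Return D = sum_i a_i d/dx_i at the point p, via central differences."""
    def D(f):
        total = 0.0
        for i, a_i in enumerate(a):
            forward = list(p)
            backward = list(p)
            forward[i] += h
            backward[i] -= h
            total += a_i * (f(forward) - f(backward)) / (2 * h)
        return total
    return D


def f(x):
    return x[0] ** 2 + math.sin(x[1])


def g(x):
    return x[0] * x[1]


def fg(x):
    return f(x) * g(x)


p = [1.0, 0.5]
D = derivation([2.0, -1.0], p)

lhs = D(fg)                          # D applied to the product
rhs = f(p) * D(g) + g(p) * D(f)      # product rule evaluated at p
print(lhs, rhs)
```

The two printed numbers agree up to the finite-difference error, and D(f) measures how fast f changes at p in the direction with components (2,-1).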

Another concept that we will need is the concept of 1-forms. A 1-form at a particular point on the manifold takes a single tangent vector (an element of the tangent space at that particular point) as an input and gives a number as an output. Just as we have the notion of tangent vectors, tangent spaces, and tangent bundles, we also have the “dual” notion of 1-forms, cotangent spaces, and cotangent bundles, and just as a basis of the tangent space is given by the \frac{\partial}{\partial x_{i}}, we also have a basis of 1-forms given by the dx_{i}.

Aside from 1-forms, we also have mathematical objects that take two elements of the tangent space at a point (i.e. two tangent vectors at that point) as an input and give a number as an output.

An example that we have already discussed in this blog is the metric tensor, which we refer to sometimes as simply the metric (calling it the metric tensor, however, helps prevent confusion as there are many different concepts in mathematics also referred to as a metric). We have been thinking of the metric tensor as expressing the “infinitesimal distance formula” at a certain point on the manifold.

The metric tensor is defined as a symmetric, nondegenerate, bilinear form. “Symmetric” means that we can interchange the two inputs (the tangent vectors) and get the same output. “Nondegenerate” means that, holding one of the inputs fixed and letting the other vary, having an output of zero for all the varying inputs means that the fixed input must be zero. “Bilinear form” means that it is linear in either input – it respects addition of vectors and multiplication by scalars. If we hold one input fixed, it is then a linear transformation of the other input.

In the case of our previous discussions on Riemannian geometry, the output of the metric tensor is a positive real number, expressing the infinitesimal distance. Hence, a metric tensor on a differentiable manifold which always gives a positive real number as an output is called a Riemannian metric. A manifold with a Riemannian metric is of course called a Riemannian manifold.

In general relativity, the spacetime interval, unlike the distance, may not necessarily be positive. More technically, spacetime in general relativity is an example of a pseudo-Riemannian (or semi-Riemannian) manifold, which does not require the metric to be positive (more specifically it is a Lorentzian manifold – we will leave the details of these definitions to the references for now). As we have seen though, many concepts from the study of Riemannian manifolds carry over to the pseudo-Riemannian case.

Another example of these kinds of objects are the differential forms (see Differential Forms). One important example of these objects is the symplectic form in symplectic geometry (see An Intuitive Introduction to String Theory and (Homological) Mirror Symmetry), which is used as the mathematical framework of the Hamiltonian formulation of classical mechanics. Just as the metric tensor is related to the “infinitesimal distance”, the symplectic form is related to the “infinitesimal area”.

As an example of the symplectic form, the “phase space” in the Hamiltonian formulation of classical mechanics is made up of points which correspond to a “state” of a system as given by the position and momentum of its particles. For the simple case of one particle constrained to move in a line, the symplectic form (written \omega) is given by

\displaystyle \omega=\displaystyle dq\wedge dp

where q is the position and p is the momentum, serving as the coordinates of the phase space (by the way, the phase space is itself already the cotangent bundle of the configuration space, the space whose points are the different “configurations” of the system, which we can think of as a generalization of the concept of position).

Technically, the symplectic form is defined as a closed, nondegenerate 2-form. By “2-form”, we mean that it is a differential form, obeying the properties we gave in Differential Forms, such as antisymmetry. The notion of a differential form being “closed”, also already discussed in the same blog post, means that its exterior derivative is zero. “Nondegenerate” of course was already defined in the preceding paragraphs. The symplectic form is also a bilinear form, although this is a property of all 2-forms, considered as functions of two tangent vectors at some point on the manifold. More generally, all differential forms are examples of multilinear forms. A manifold with a symplectic form is called a symplectic manifold.
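For the one-particle example \omega=dq\wedge dp, these properties can be checked directly. In the Python sketch below, a tangent vector at a point of the phase space is represented by its pair of components (u_{q},u_{p}); this representation and the sample vectors are assumptions I make for illustration. On such vectors, \omega(u,v)=u_{q}v_{p}-u_{p}v_{q}, which is the signed area of the parallelogram spanned by u and v.

```python
def omega(u, v):
    """Evaluate dq ^ dp on tangent vectors u = (u_q, u_p) and v = (v_q, v_p)."""
    return u[0] * v[1] - u[1] * v[0]


e_q, e_p = (1.0, 0.0), (0.0, 1.0)
u, v, w = (2.0, 1.0), (-1.0, 3.0), (0.5, -4.0)
v_plus_w = (v[0] + w[0], v[1] + w[1])

# Signed area of the unit square spanned by the coordinate directions.
print(omega(e_q, e_p))

# Antisymmetry: swapping the inputs flips the sign.
print(omega(u, v), omega(v, u))

# Linearity in the second input (one half of bilinearity).
print(omega(u, v_plus_w), omega(u, v) + omega(u, w))

# Nondegeneracy: the matrix of omega in the basis (e_q, e_p) is invertible.
matrix = [[omega(e_q, e_q), omega(e_q, e_p)],
          [omega(e_p, e_q), omega(e_p, e_p)]]
determinant = matrix[0][0] * matrix[1][1] - matrix[0][1] * matrix[1][0]
print(determinant)
```

A nonzero determinant means that no nonzero vector pairs to zero with every other vector, which is the nondegeneracy condition described earlier.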

There is still so much more to differential geometry, but for now, we have at least accomplished the task of defining some of its most basic concepts in a more technical manner. The language we have discussed here is important to deeper discussions of differential geometry.


Differential Geometry on Wikipedia

Differentiable Manifold on Wikipedia

Tangent Space on Wikipedia

Tangent Bundle on Wikipedia

Cotangent Space on Wikipedia

Cotangent Bundle on Wikipedia

Riemannian Manifold on Wikipedia

Pseudo-Riemannian Manifold on Wikipedia

Symplectic Manifold on Wikipedia

Differential Geometry of Curves and Surfaces by Manfredo P. do Carmo

Differential Geometry: Bundles, Connections, Metrics and Curvature by Clifford Henry Taubes

Foundations of Differential Geometry by Shoshichi Kobayashi and Katsumi Nomizu

Geometry, Topology, and Physics by Mikio Nakahara

Rotations in Three Dimensions

In Rotating and Reflecting Vectors Using Matrices we learned how to express rotations in 2-dimensional space using certain special 2\times 2 matrices which form a group (see Groups) we call the special orthogonal group in dimension 2, or \text{SO}(2) (together with other matrices which express reflections, they form a bigger group that we call the orthogonal group in 2 dimensions, or \text{O}(2)).

In this post, we will discuss rotations in 3-dimensional space. As we will soon see, rotations in 3-dimensional space have certain interesting features not present in the 2-dimensional case, and despite being seemingly simple and mundane, play very important roles in some of the deepest aspects of fundamental physics.

We will first discuss rotations in 3-dimensional space as represented by the special orthogonal group in dimension 3, written as \text{SO}(3).

We recall some relevant terminology from Rotating and Reflecting Vectors Using Matrices. A matrix is called orthogonal if it preserves the magnitude of (real) vectors. That is, for a matrix A to be orthogonal, the magnitude of the vector Av must be equal to the magnitude of the vector v for every vector v. Alternatively, we may require, for the matrix A to be orthogonal, that it satisfy the condition

\displaystyle AA^{T}=A^{T}A=I

where A^{T} is the transpose of A and I is the identity matrix. The word “special” denotes that our matrices must have determinant equal to 1. Therefore, the group \text{SO}(3) consists of the 3\times3 orthogonal matrices whose determinant is equal to 1.

The idea of using the group \text{SO}(3) to express rotations in 3-dimensional space may be made more concrete using several different formalisms.

One popular formalism is given by the so-called Euler angles. In this formalism, we break down any arbitrary rotation in 3-dimensional space into three separate rotations. The first, which we write here by \varphi, is expressed as a counterclockwise rotation about the z-axis. The second, \theta, is a counterclockwise rotation about an x-axis that rotates along with the object. Finally, the third, \psi, is expressed as a counterclockwise rotation about a z-axis that, once again, has rotated along with the object. For readers who may be confused, animations of these steps can be found among the references listed at the end of this post.

The matrix which expresses the rotation which is the product of these three rotations can then be written as

\displaystyle g(\varphi,\theta,\psi) = \left(\begin{array}{ccc} \text{cos}(\varphi)\text{cos}(\psi)-\text{cos}(\theta)\text{sin}(\varphi)\text{sin}(\psi) & -\text{cos}(\varphi)\text{sin}(\psi)-\text{cos}(\theta)\text{sin}(\varphi)\text{cos}(\psi) & \text{sin}(\varphi)\text{sin}(\theta) \\ \text{sin}(\varphi)\text{cos}(\psi)+\text{cos}(\theta)\text{cos}(\varphi)\text{sin}(\psi) & -\text{sin}(\varphi)\text{sin}(\psi)+\text{cos}(\theta)\text{cos}(\varphi)\text{cos}(\psi) & -\text{cos}(\varphi)\text{sin}(\theta) \\ \text{sin}(\psi)\text{sin}(\theta) & \text{cos}(\psi)\text{sin}(\theta) & \text{cos}(\theta) \end{array}\right).

The reader may check that, in the case that the rotation is strictly in the xy plane, i.e. \theta and \psi are zero, we will obtain

\displaystyle g(\varphi,\theta,\psi) = \left(\begin{array}{ccc} \text{cos}(\varphi) & -\text{sin}(\varphi) & 0 \\ \text{sin}(\varphi) & \text{cos}(\varphi) & 0 \\ 0 & 0 & 1 \end{array}\right).

Note how the upper left part is an element of \text{SO}(2), expressing a counterclockwise rotation by an angle \varphi, as we might expect.
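As a sanity check, the following sketch builds the Euler angle matrix in two ways: as the product R_{z}(\varphi)R_{x}(\theta)R_{z}(\psi) of three elementary rotations, and from the entries of the matrix written above. The helper names (rot_z, rot_x, euler_product, euler_closed_form) are our own, and the ordering of the product is an assumption on our part; the comparison at the end suggests it is consistent with the formula in the text.

```python
import math

def matmul(a, b):
    """Ordinary matrix product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def rot_z(angle):
    """Counterclockwise rotation by `angle` about the z-axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_x(angle):
    """Counterclockwise rotation by `angle` about the x-axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def euler_product(phi, theta, psi):
    """Assumed ordering: R_z(phi) R_x(theta) R_z(psi)."""
    return matmul(rot_z(phi), matmul(rot_x(theta), rot_z(psi)))

def euler_closed_form(phi, theta, psi):
    """The matrix g(phi, theta, psi), entry by entry as written in the text."""
    c, s = math.cos, math.sin
    return [
        [c(phi) * c(psi) - c(theta) * s(phi) * s(psi),
         -c(phi) * s(psi) - c(theta) * s(phi) * c(psi),
         s(phi) * s(theta)],
        [s(phi) * c(psi) + c(theta) * c(phi) * s(psi),
         -s(phi) * s(psi) + c(theta) * c(phi) * c(psi),
         -c(phi) * s(theta)],
        [s(psi) * s(theta), c(psi) * s(theta), c(theta)],
    ]

def max_difference(a, b):
    """Largest absolute difference between corresponding entries."""
    return max(abs(a[i][j] - b[i][j]) for i in range(3) for j in range(3))

phi, theta, psi = 0.3, 1.1, -0.7
print(max_difference(euler_product(phi, theta, psi),
                     euler_closed_form(phi, theta, psi)))
# With theta = psi = 0 only the rotation about the z-axis survives.
print(max_difference(euler_closed_form(phi, 0.0, 0.0), rot_z(phi)))
```

Both printed differences should be at the level of floating-point rounding error.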

Contrary to the case of \text{SO}(2), which is an abelian group, the group \text{SO}(3) is not an abelian group. This means that for two elements a and b of \text{SO}(3), the product ab may not always be equal to the product ba. One can check this explicitly, or simply consider rotating an object about different axes; for example, rotating an object first counterclockwise by 90 degrees about the z-axis, and then counterclockwise again by 90 degrees about the x-axis, will not end with the same result as performing the same operations in the opposite order.
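The explicit check can be sketched in a few lines of Python. The 90 degree rotations have integer entries, so we can write them down exactly and avoid rounding errors; the names ROT_Z_90, ROT_X_90, matmul, and apply are ours. We assume here that both rotations are taken about fixed axes and that matrices act on column vectors, so the rotation applied first appears on the right of the product.

```python
def matmul(a, b):
    """Ordinary matrix product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def apply(m, v):
    """Apply the matrix m to the column vector v."""
    return [sum(m[i][k] * v[k] for k in range(len(v))) for i in range(len(m))]

# Counterclockwise rotations by 90 degrees, written with exact integer entries.
ROT_Z_90 = [[0, -1, 0],
            [1,  0, 0],
            [0,  0, 1]]
ROT_X_90 = [[1, 0,  0],
            [0, 0, -1],
            [0, 1,  0]]

z_then_x = matmul(ROT_X_90, ROT_Z_90)  # rotate about z first, then about x
x_then_z = matmul(ROT_Z_90, ROT_X_90)  # rotate about x first, then about z

unit_x = [1, 0, 0]
print(apply(z_then_x, unit_x))  # the x unit vector ends up along the z-axis
print(apply(x_then_z, unit_x))  # the x unit vector ends up along the y-axis
print(z_then_x == x_then_z)
```

The same unit vector lands in two different places depending on the order, so the two products cannot be the same matrix.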

We now know how to express rotations in 3-dimensional space using 3\times 3 orthogonal matrices. Now we discuss another way of expressing the same concept, but using “unitary”, instead of orthogonal, matrices. However, first we must revisit rotations in 2 dimensions.

The group \text{SO}(2) is not the only way we have of expressing rotations in 2-dimensions. For example, we can also make use of the unitary (we will explain the meaning of this word shortly) group in 1-dimension, also written \text{U}(1). It is the group formed by the complex numbers with magnitude equal to 1. The elements of this group can always be written in the form e^{i\theta}, where \theta is the angle of our rotation. As we have seen in Connection and Curvature in Riemannian Geometry, this group is related to quantum electrodynamics, as it expresses the gauge symmetry of the theory.

The groups \text{SO}(2) and \text{U}(1) are actually isomorphic. There is a one-to-one correspondence between the elements of \text{SO}(2) and the elements of \text{U}(1) which respects the group operation. In other words, there is a bijective function f:\text{SO}(2)\rightarrow\text{U}(1), which satisfies f(ab)=f(a)f(b) for a, b elements of \text{SO}(2). When two groups are isomorphic, we may consider them as being essentially the same group. For this reason, both \text{SO}(2) and \text{U}(1) are often referred to as the circle group.
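One possible choice of the function f sends the rotation matrix for an angle \theta to the complex number e^{i\theta}. The sketch below checks numerically that this choice respects the group operation, i.e. that f(ab)=f(a)f(b). The names so2, matmul2, and f are ours, and reading the complex number off the first column of the matrix is just one convenient way to define the map.

```python
import cmath
import math

def so2(angle):
    """The 2x2 matrix of a counterclockwise rotation by `angle`."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s], [s, c]]

def matmul2(a, b):
    """Product of two 2x2 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def f(rotation):
    """Send a rotation matrix to cos(angle) + i sin(angle), read off its first column."""
    return complex(rotation[0][0], rotation[1][0])

a, b = so2(0.7), so2(1.9)
print(abs(f(matmul2(a, b)) - f(a) * f(b)))  # f(ab) - f(a) f(b), up to rounding
print(abs(f(a) - cmath.exp(0.7j)))          # f(a) agrees with e^{i * 0.7}
print(abs(f(a)))                            # complex number of magnitude 1
```

Multiplying rotation matrices adds their angles, and so does multiplying the corresponding complex numbers, which is why the correspondence works.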

We can now go back to rotations in 3 dimensions and discuss the group \text{SU}(2), the special unitary group in dimension 2. The word “unitary” is in some way analogous to “orthogonal”, but applies to vectors with complex number entries.

Consider an arbitrary vector

\displaystyle v=\left(\begin{array}{c}v_{1}\\v_{2}\\v_{3}\end{array}\right).

An orthogonal matrix, as we have discussed above, preserves the quantity (which is the square of what we have referred to earlier as the “magnitude” for vectors with real number entries)

\displaystyle v_{1}^{2}+v_{2}^{2}+v_{3}^{2}

while a unitary matrix preserves

\displaystyle v_{1}^{*}v_{1}+v_{2}^{*}v_{2}+v_{3}^{*}v_{3}

where v_{i}^{*} denotes the complex conjugate of the complex number v_{i}. This is the square of the analogous notion of “magnitude” for vectors with complex number entries.

Just as orthogonal matrices must satisfy the condition

\displaystyle AA^{T}=A^{T}A=I,

unitary matrices are required to satisfy the condition

\displaystyle AA^{\dagger}=A^{\dagger}A=I

where A^{\dagger} is the Hermitian conjugate of A, a matrix whose entries are the complex conjugates of the entries of the transpose A^{T} of A.

An element of the group \text{SU}(2) is therefore a 2\times 2 unitary matrix whose determinant is equal to 1. Like the group \text{SO}(3), the group \text{SU}(2) is also a group which is not abelian.
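Here is a small sketch that tests these conditions for a specific 2\times2 matrix, and also checks that it preserves the “magnitude” of a vector with complex entries. The matrix A, the vector v, and all of the function names are our own illustrative choices.

```python
import math

def dagger(m):
    """Hermitian conjugate: transpose, then complex-conjugate every entry."""
    return [[complex(m[j][i]).conjugate() for j in range(len(m))]
            for i in range(len(m[0]))]

def matmul(a, b):
    """Ordinary matrix product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def apply(m, v):
    """Apply the matrix m to the column vector v."""
    return [sum(m[i][k] * v[k] for k in range(len(v))) for i in range(len(m))]

def magnitude_squared(v):
    """The sum of v_i^* v_i over all entries of v."""
    return sum((complex(x).conjugate() * x).real for x in v)

def is_special_unitary(m, tol=1e-12):
    """Check A A^dagger = I and det(A) = 1 for a 2x2 matrix."""
    product = matmul(m, dagger(m))
    unitary = all(abs(product[i][j] - (1 if i == j else 0)) < tol
                  for i in range(2) for j in range(2))
    determinant = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return unitary and abs(determinant - 1) < tol

s = 1 / math.sqrt(2)
A = [[s, 1j * s], [1j * s, s]]  # unitary, with determinant 1
B = [[1j, 0], [0, 1j]]          # unitary, but with determinant -1
v = [1 + 2j, 3 - 1j]

print(is_special_unitary(A), is_special_unitary(B))
print(magnitude_squared(v), magnitude_squared(apply(A, v)))
```

The two printed magnitudes should agree up to rounding, while B shows that a unitary matrix need not be “special”.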

Unlike the analogous case in 2 dimensions, the groups \text{SO}(3) and \text{SU}(2) are not isomorphic. There is no one-to-one correspondence between them. However, there is a homomorphism from \text{SU}(2) to \text{SO}(3) that is “two-to-one”, i.e. there are always two elements of \text{SU}(2) that get mapped to the same element of \text{SO}(3) under this homomorphism. Hence, \text{SU}(2) is often referred to as a “double cover” of \text{SO}(3).
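One standard way to write such a homomorphism uses the Pauli matrices \sigma_{1}, \sigma_{2}, \sigma_{3}: the element U of \text{SU}(2) is sent to the 3\times3 matrix with entries R_{ij}=\frac{1}{2}\text{Tr}(\sigma_{i}U\sigma_{j}U^{\dagger}). This formula is not derived in this post, so the following should be read as a sketch under that assumption; the function names are ours. It shows that U and -U are sent to the same rotation.

```python
import cmath
import math

# The three Pauli matrices.
SIGMA = [
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]

def matmul(a, b):
    """Product of two 2x2 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dagger(m):
    """Hermitian conjugate of a 2x2 matrix."""
    return [[complex(m[j][i]).conjugate() for j in range(2)] for i in range(2)]

def su2_to_so3(u):
    """R_ij = (1/2) Tr(sigma_i U sigma_j U^dagger), assumed form of the map."""
    u_dag = dagger(u)
    rotation = []
    for i in range(3):
        row = []
        for j in range(3):
            m = matmul(SIGMA[i], matmul(u, matmul(SIGMA[j], u_dag)))
            row.append(0.5 * (m[0][0] + m[1][1]).real)
        rotation.append(row)
    return rotation

def max_difference(a, b):
    """Largest absolute difference between corresponding entries."""
    return max(abs(a[i][j] - b[i][j]) for i in range(3) for j in range(3))

angle = 0.8
u = [[cmath.exp(-0.5j * angle), 0j], [0j, cmath.exp(0.5j * angle)]]
minus_u = [[-entry for entry in row] for row in u]
rot_z = [[math.cos(angle), -math.sin(angle), 0.0],
         [math.sin(angle),  math.cos(angle), 0.0],
         [0.0, 0.0, 1.0]]

print(max_difference(su2_to_so3(u), su2_to_so3(minus_u)))  # U and -U agree
print(max_difference(su2_to_so3(u), rot_z))                # a rotation about z
```

Since U appears twice in the formula, the two minus signs in -U cancel, which is one way to see where the “two-to-one” behavior comes from.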

In physics, this concept underlies the weird behavior of quantum-mechanical objects called spinors (such as electrons), which require a rotation of 720, not 360, degrees to return to their original state!
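This behavior can be seen in a short numerical sketch. We assume the common convention in which a rotation by an angle about the z-axis is represented on spinors by the diagonal matrix with entries e^{-i\cdot\text{angle}/2} and e^{i\cdot\text{angle}/2}; this convention is not introduced in this post, and the function name spin_rotation_z is ours.

```python
import cmath
import math

def spin_rotation_z(angle):
    """Assumed SU(2) matrix for a rotation by `angle` about the z-axis."""
    return [[cmath.exp(-0.5j * angle), 0j],
            [0j, cmath.exp(0.5j * angle)]]

one_turn = spin_rotation_z(2 * math.pi)   # 360 degrees
two_turns = spin_rotation_z(4 * math.pi)  # 720 degrees

# Up to rounding, one full turn gives minus the identity matrix,
# and only two full turns give back the identity matrix.
print(one_turn[0][0], one_turn[1][1])
print(two_turns[0][0], two_turns[1][1])
```

The half-angle in the exponent is what makes a single full turn flip the sign of the spinor.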

The groups we have so far discussed are not “merely” groups. They also possess another kind of mathematical structure. They describe certain shapes which happen to have no sharp corners or edges. Technically, such a shape is called a manifold, and it is the object of study of the branch of mathematics called differential geometry, certain basic aspects of which we have discussed in Geometry on Curved Spaces and Connection and Curvature in Riemannian Geometry.

For the circle group, the manifold that it describes is itself a circle. The elements of the circle group correspond to the points of the circle. The group \text{SU}(2) is the surface of the 4-dimensional sphere, or what we call a 3-sphere (for those who might be confused by the terminology, recall that we are only considering the surface of the sphere, not the entire volume, and this surface is a 3-dimensional, not a 4-dimensional, object). The group \text{SO}(3) is 3-dimensional real projective space, written \mathbb{RP}^{3}. It is a manifold which can be described using the concepts of projective geometry (see Projective Geometry).

A group that is also a manifold is called a Lie group (pronounced like “lee”) in honor of the mathematician Marius Sophus Lie who pioneered much of their study. Lie groups are very interesting objects of study in mathematics because they bring together the techniques of group theory and differential geometry, which teaches us about Lie groups on one hand, and on the other hand also teaches us more about both group theory and differential geometry themselves.


Orthogonal Group on Wikipedia

Rotation Group SO(3) on Wikipedia

Euler Angles on Wikipedia

Unitary Group on Wikipedia

Spinor on Wikipedia

Lie Group on Wikipedia

Real Projective Space on Wikipedia

Algebra by Michael Artin

An Intuitive Introduction to String Theory and (Homological) Mirror Symmetry

String theory is by far the most popular of the current proposals to unify the as of now still incompatible theories of quantum mechanics and general relativity. In this post we will give a short overview of the concepts involved in string theory, but not with the goal of discussing the theory itself in depth (hopefully there will be more posts in the future working towards this task). Instead, we will focus on introducing a very interesting and very beautiful branch of mathematics that arose out of string theory called mirror symmetry. In particular, we will focus on a version of it originally formulated by the mathematician Maxim Kontsevich in 1994 called homological mirror symmetry.

We will start with string theory. String theory started out as a theory of the nuclear forces that held together the protons and neutrons in the nucleus of an atom. It was abandoned later on, due to a more successful theory called quantum chromodynamics taking its place. However, it was soon found out that string theory could model the elusive graviton, a particle “carrier” of gravity in the same way that a photon is a particle “carrier” of electromagnetism (the photon is more popularly referred to as a particle of light, but because light itself is an electromagnetic wave, it is also a manifestation of an electromagnetic field), and since then physicists have started developing string theory, no longer in the sole context of nuclear forces, but as a possible candidate for a working theory of quantum gravity.

The incompatibility of quantum mechanics and general relativity (which is currently our accepted theory of gravity) arises from the nonrenormalizability of gravity. In calculations in quantum field theory (see Some Basics of Relativistic Quantum Field Theory and Some Basics of (Quantum) Electrodynamics), there appear certain “nonsensical” quantities which are made sense of via a “corrective” procedure called renormalization (not to be confused with some other procedures called “normalization”). While the way that renormalization works is not really completely understood at the moment, it is known that this procedure at least “works” – this means that it produces the correct values of quantities, as can be checked via experiment.

Renormalization, while it works for the other forces, fails for gravity. Roughly, this is sometimes described as gravity “wildly fluctuating” at the smallest scales. What we know is that this signals, for us, a lack of knowledge of what physics is like at these extremely small scales (much smaller than the current scale of quantum mechanics).

String theory attempts to solve this conundrum by proposing that particles, at the very smallest scales, are not “particles” at all, but “strings”. This takes care of the problem of fluctuations at the smallest scales, since there is a limit to how small the scale can be, set by the length of the strings. It is perhaps worth noting at this point that the next most popular contender to string theory, loop quantum gravity, tackles this problem by postulating that space itself is not continuous, but “discretized” into units of a certain length. For both theories, this length is predicted to be around 10^{-35} meters, a constant quantity which is known as the Planck length.
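For readers who want to see where the number 10^{-35} comes from, the Planck length is usually defined as \sqrt{\hbar G/c^{3}}, where \hbar is the reduced Planck constant, G is Newton's gravitational constant, and c is the speed of light. This formula is not given in this post, so the sketch below is only an illustration; the constants are the CODATA 2018 recommended values in SI units, and the variable names are ours.

```python
import math

# CODATA 2018 recommended values, in SI units.
REDUCED_PLANCK_CONSTANT = 1.054571817e-34  # joule seconds
GRAVITATIONAL_CONSTANT = 6.67430e-11       # cubic meters per kilogram per second squared
SPEED_OF_LIGHT = 299792458.0               # meters per second (exact by definition)

planck_length = math.sqrt(
    REDUCED_PLANCK_CONSTANT * GRAVITATIONAL_CONSTANT / SPEED_OF_LIGHT ** 3
)
print(planck_length)  # in meters, of the order of 10**-35
```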

Over time, as string theory was developed, it became more ambitious, aiming to provide not only the unification of quantum mechanics and general relativity, but also the unification of the four fundamental forces – electromagnetism, the weak nuclear force, the strong nuclear force, and gravity, under one “theory of everything“. At the same time, it needed more ingredients – to be able to account for bosons, the particles carrying “forces”, such as photons and gravitons, and the fermions, particles that make up matter, such as electrons, protons, and neutrons, a new ingredient had to be added, called supersymmetry. In addition, it worked not in the four dimensions of spacetime that we are used to, but instead required ten dimensions (for the “bosonic” string theory, before supersymmetry, the number of dimensions required was a staggering twenty-six)!

How do we explain spacetime having ten dimensions, when we experience only four? It turns out, even before string theory, the idea of extra dimensions was already explored by the physicists Theodor Kaluza and Oskar Klein. They proposed a theory unifying electromagnetism and gravity by postulating an “extra” dimension which was “curled up” into a loop so small we could never notice it. The usual analogy is that of an ant crossing a wire – when the radius of the wire is big, the ant realizes that it can go sideways along the wire, but when the radius of the wire is small, it is as if there is only one dimension that the ant can move along.

So we now have this idea of six curled up dimensions of spacetime, in addition to the usual four. It turns out that there are so many ways that these dimensions can be curled up. This phenomenon is called the string theory landscape, and it is one of the biggest problems facing string theory today. What could be the specific “shape” in which these dimensions are curled up, and why are they not curled up in some other way? Some string theorists answer this by resorting to the controversial idea of a multiverse, so that there are actually several existing universes, each with its own way of how the extra six dimensions are curled up, and we just happen to be in this one because, perhaps, this is the only one where the laws of physics (determined by the way the dimensions are curled up) are able to support life. This kind of reasoning is called the anthropic principle.

In addition to the string theory landscape, there was also the problem of having several different versions of string theory. These problems were perhaps alleviated by the discovery of mysterious dualities. For example, there is the so-called T-duality, where a compactification (a “curling up”) with a bigger radius gives the same laws of physics as a compactification with a smaller, “reciprocal” radius. Not only does the concept of dualities connect the different ways in which the extra dimensions are curled up, it also connects the several different versions of string theory! In 1995, the physicist Edward Witten conjectured that this is perhaps because all these different versions of string theory come from a single “mother theory”, which he called “M-theory“.
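To get a feeling for T-duality, consider a closed string on a circle of radius R. In the standard textbook treatment (not derived in this post), a string with momentum number n and winding number w receives a contribution (n/R)^{2}+(wR/\alpha')^{2} to its squared mass, where \alpha' is a constant that sets the string length scale. The following rough sketch, with \alpha' set to 1 and with function and variable names of our own choosing, compares the list of these contributions for a radius R and for the “reciprocal” radius \alpha'/R. It ignores the oscillator contributions to the mass entirely.

```python
def momentum_winding_terms(radius, alpha_prime=1.0, max_number=3):
    """Sorted values of (n / R)^2 + (w R / alpha')^2 for |n|, |w| <= max_number.

    Only the momentum and winding parts of the squared mass are included;
    this is a simplification made for illustration.
    """
    numbers = range(-max_number, max_number + 1)
    return sorted((n / radius) ** 2 + (w * radius / alpha_prime) ** 2
                  for n in numbers for w in numbers)

big_radius = 2.0
small_radius = 1.0 / big_radius  # alpha' / R with alpha' = 1

print(momentum_winding_terms(big_radius) == momentum_winding_terms(small_radius))
```

The two lists coincide because exchanging R with \alpha'/R has the same effect as exchanging the roles of n and w.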

In 1991, physicists Philip Candelas, Xenia de la Ossa, Paul Green, and Linda Parkes used these dualities to solve a mathematical problem that had occupied mathematicians for decades, that of counting curves on a certain manifold (a manifold is a shape without sharp corners or edges) known as a Calabi-Yau manifold. In the context of Calabi-Yau manifolds, which are some of the shapes in which the extra dimensions of spacetime are postulated to be curled up, these dualities are known as mirror symmetry. With the success of Candelas, de la Ossa, Green, and Parkes, mathematicians would take notice of mirror symmetry and begin to study it as a subject of its own.

Calabi-Yau manifolds are but special cases of Kahler manifolds, which themselves are very interesting mathematical objects because they can be studied using three aspects of differential geometry – Riemannian geometry, symplectic geometry, and complex geometry.

We have already encountered examples of Kahler manifolds on this blog – they are the elliptic curves (see Elliptic Curves and The Moduli Space of Elliptic Curves). In fact elliptic curves are not only Kahler manifolds but also Calabi-Yau manifolds, and they are the only two-dimensional Calabi-Yau manifolds (we sometimes refer to them as “one-dimensional” when we are considering “complex dimensions”, as is common practice in algebraic geometry – this apparent “discrepancy” in counting dimensions arises because we need two real numbers to specify a complex number). In string theory of course we consider six-dimensional (three-dimensional when considering complex dimensions) Calabi-Yau manifolds, since there are six extra curled up dimensions of spacetime, but often it is also fruitful to study the other cases, especially the simpler ones, since they can serve as our guide for the study of the more complicated cases.

Riemannian geometry studies Riemannian manifolds, which are manifolds equipped with a metric tensor, which intuitively corresponds to an “infinitesimal distance formula” dependent on where we are on the manifold. We have already encountered Riemannian geometry before in Geometry on Curved Spaces and Connection and Curvature in Riemannian Geometry. There we have seen that Riemannian geometry is very important in the mathematical formulation of general relativity, since in this theory gravity is just the curvature of spacetime, and the metric tensor expresses this curvature by showing how the formula for the infinitesimal distance between two points (actually the infinitesimal spacetime interval between two events) changes as we move around the manifold.

Symplectic geometry, meanwhile, studies symplectic manifolds. If Riemannian manifolds are equipped with a metric tensor that measures “distances”, symplectic manifolds are equipped with a symplectic form that measures “areas”. The origins of symplectic geometry are actually related to William Rowan Hamilton’s formulation of classical mechanics (see Lagrangians and Hamiltonians), as developed later on by Henri Poincare. There the object of study is phase space, which gives the state of a system based on the position and momentum of the objects that comprise it. It is this phase space that is expressed as a symplectic manifold.

Complex geometry, following our pattern, studies complex manifolds. These are manifolds which locally look like \mathbb{C}^{n}, in the same way that ordinary differentiable manifolds locally look like \mathbb{R}^{n}. Just as Riemannian geometry has metric tensors and symplectic geometry has symplectic forms, complex geometry has complex structures, mappings of tangent spaces with the property that applying them twice is the same as multiplication by -1, mimicking the usual multiplication by the imaginary unit i on the complex plane.
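The simplest example of such a mapping is on the plane \mathbb{R}^{2}, where the complex structure may be written as a 2\times2 matrix J. The sketch below, with names of our own choosing, checks that applying J twice is the same as multiplication by -1, and that J acts on a pair (x, y) in the same way that multiplication by i acts on the complex number x+iy.

```python
def matmul2(a, b):
    """Product of two 2x2 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def apply(m, v):
    """Apply the 2x2 matrix m to the column vector v."""
    return [sum(m[i][k] * v[k] for k in range(2)) for i in range(2)]

# The standard complex structure on the plane.
J = [[0, -1],
     [1,  0]]

print(matmul2(J, J))  # minus the identity matrix

x, y = 3, 4
rotated = apply(J, [x, y])
product = 1j * complex(x, y)
print(rotated, product)  # (-4, 3) in both descriptions
```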

Complex manifolds are not only part of differential geometry, they are also often studied using the methods of algebraic geometry! We recall (see Basics of Algebraic Geometry) that algebraic geometry studies varieties and schemes, which are shapes such as lines, conic sections (parabolas, hyperbolas, ellipses, and circles), and elliptic curves, that can be described by polynomials (their modern definitions are generalizations of this concept). In fact, all Calabi-Yau manifolds can be described by polynomials, such as the following example, due to user Andrew J. Hanson of Wikipedia:


This is a visualization (actually a sort of “cross section”, since we can only display two dimensions and this object is actually six-dimensional) of the Calabi-Yau manifold described by the following polynomial equation:

\displaystyle V^{5}+W^{5}+X^{5}+Y^{5}+Z^{5}=0

This polynomial equation (known as the Fermat quintic) actually describes the Calabi-Yau manifold in projective space using homogeneous coordinates. This means that we are using the concepts of projective geometry (see Projective Geometry) to include “points at infinity“.
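The reason homogeneous coordinates make sense here is that every term of the polynomial has the same degree: multiplying all five coordinates by the same nonzero number \lambda multiplies the left-hand side by \lambda^{5}, so a solution stays a solution. The following small sketch illustrates this with a sample point of our own choosing; the function name fermat_quintic is also ours.

```python
import cmath
import math

def fermat_quintic(point):
    """Evaluate V^5 + W^5 + X^5 + Y^5 + Z^5 at a point (V, W, X, Y, Z)."""
    return sum(z ** 5 for z in point)

# zeta is a complex number whose fifth power is -1.
zeta = cmath.exp(1j * math.pi / 5)
point = (1, zeta, 0, 0, 0)

# Rescaling every coordinate by the same nonzero number gives another solution.
scale = 2 - 3j
scaled_point = tuple(scale * z for z in point)

print(abs(fermat_quintic(point)))         # zero, up to rounding
print(abs(fermat_quintic(scaled_point)))  # still zero, up to rounding
print(fermat_quintic((1, -1, 0, 0, 0)))   # an exact integer solution
```

In projective space the point and its rescaled version are considered to be the same point, which is consistent precisely because both satisfy the equation.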

We note at this point that Kahler manifolds and Calabi-Yau manifolds are interesting in their own right, even outside of the context of string theory. For instance, we have briefly mentioned in Algebraic Cycles and Intersection Theory the Hodge conjecture, one of seven “Millennium Problems” for which the Clay Mathematics Institute is currently offering a million-dollar prize, and it concerns Kahler manifolds. Perhaps most importantly, the study of these manifolds “unifies” several different branches of mathematics; as we have already seen, it involves Riemannian geometry, symplectic geometry, complex geometry, and algebraic geometry. The more recent version of mirror symmetry called homological mirror symmetry further adds category theory and homological algebra to the mix.

Now what mirror symmetry more specifically states is that a version of string theory called Type IIA string theory, on a spacetime with extra dimensions compactified onto a certain Calabi-Yau manifold V, is the same as another version of string theory, called Type IIB string theory, on a spacetime with extra dimensions compactified onto another Calabi-Yau manifold W, which is “mirror” to the Calabi-Yau manifold V.

The statement of homological mirror symmetry (which is still conjectural, but mathematically proven in certain special cases) expresses the idea of the previous paragraph as follows (quoted verbatim from the paper Homological Algebra of Mirror Symmetry by Maxim Kontsevich):

Let (V,\omega) be a 2n-dimensional symplectic manifold with c_{1}(V)=0 and W be a dual n-dimensional complex algebraic manifold.

The derived category constructed from the Fukaya category F(V) (or a suitably enlarged one) is equivalent to the derived category of coherent sheaves on a complex algebraic variety W.

The statement makes use of the language of category theory and homological algebra (see Category Theory, More Category Theory: The Grothendieck Topos, Even More Category Theory: The Elementary Topos, Exact Sequences, More on Chain Complexes, and The Hom and Tensor Functors), but the idea that it basically expresses is that there exists a relation between the symplectic aspects of the Calabi-Yau manifold V, as encoded in its Fukaya category, and the complex aspects of the Calabi-Yau manifold W, as encoded in its category of coherent sheaves (see Sheaves and More on Sheaves). As we have said earlier, the subjects of algebraic geometry and complex geometry are closely related, and hence the language of sheaves shows up in (and is an important part of) both subjects. The concept of derived categories, which generalizes derived functors like the Ext and Tor functors, allows us to relate the two categories, which otherwise would be expressing different concepts. Inspired by string theory, therefore, we have now a deep and beautiful idea in geometry, relating its different aspects.

Is string theory the correct way towards a complete theory of quantum gravity, or the so-called “theory of everything”? As of the moment, we don’t know. Quantum gravity is a very difficult problem, and the scales involved are still far out of our reach – in order to probe smaller and smaller scales we need particle accelerators with higher and higher energies, and right now the technologies that we have are still very, very far from the scales which are relevant to quantum gravity. Still, it is hoped that whatever we find in experiments in the near future, not only in the particle accelerators but also in the radio telescopes that look out into space, will at least guide us towards the correct path.

There are some who believe that, in the absence of definitive experimental evidence, mathematical beauty is our next best guide. And, without a doubt, string theory is related to, and has inspired, some very beautiful and very interesting mathematics, including that which we have discussed in this post. Still, physics, like all natural science, is empirical (based on evidence and observation), and hence it is ultimately physical evidence that will be the judge of correctness. It may yet turn out that string theory is wrong, and that it is a different theory which describes the fundamental physical laws of nature, or that it needs drastic modifications to its ideas. This will not invalidate the mathematics that we have described here, any more than the discoveries of Copernicus invalidated the mathematics behind the astronomical model of Ptolemy – in fact this mathematics not only outlived the astronomy of Ptolemy, but served the theories of Copernicus, and his successors, just as well. Hence we cannot really say that the efforts of Ptolemy were wasted, since even though his scientific ideas were shown to be wrong, still his mathematical methods were found very useful by those who succeeded him. Thus, while our current technological limitations prohibit us from confirming or ruling out proposals for a theory of quantum gravity such as string theory, there is still much to be gained from such continued efforts on the part of theory, while experiment is still in the process of catching up.

Our search for truth continues. Meanwhile, we have beauty to cultivate.


String Theory on Wikipedia

Mirror Symmetry on Wikipedia

Homological Mirror Symmetry on Wikipedia

Calabi-Yau Manifold on Wikipedia

Kahler Manifold on Wikipedia

Riemannian Geometry on Wikipedia

Symplectic Geometry on Wikipedia

Complex Geometry on Wikipedia

Fukaya Category on Wikipedia

Coherent Sheaf on Wikipedia

Derived Category on Wikipedia

Image by User Andrew J. Hanson of Wikipedia

Homological Algebra of Mirror Symmetry by Maxim Kontsevich

The Elegant Universe: Superstrings, Hidden Dimensions, and the Quest for the Ultimate Theory by Brian Greene

String Theory by Joseph Polchinski

String Theory and M-Theory: A Modern Introduction by Katrin Becker, Melanie Becker, and John Schwarz

Some Basics of Statistical Mechanics

The branch of physics now known as statistical mechanics started out as thermodynamics, the study of heat and related concepts. The relation of thermodynamics to the rest of physics, i.e. the relation of heat and motion, was studied by scientists like James Prescott Joule in the 19th century. Due to their efforts, we have the idea that what they used to refer to as “heat” is a form of energy which is transferred from one object to another, manifesting in ways other than the bulk motion of the objects (in particular, as a change in the “internal energy” of the objects involved).

Energy, a concept that was already associated to the motion of objects (see also Lagrangians and Hamiltonians), can be transferred from one object to another, or one system to another, and in the case of heat, this transfer involves the concept of temperature. Temperature is what we measure on a thermometer, and when we say something is “hot” or “cold”, we are usually referring to its temperature.

The way by which temperature dictates the direction in which heat is transferred is summarized in the second law of thermodynamics (here we give one of its many equivalent statements):

Heat flows from a hotter object to a colder one.

This process of transfer of heat will continue, decreasing the internal energy of the hotter object and increasing the internal energy of the cooler one, until the two objects have equal temperatures, in which case we say that they are in thermal equilibrium.

But if heat is a transfer of energy, and energy is associated to motion, then what was it, exactly, that was moving (or had the capacity to cause something to move)? What is this “internal energy?” For us, who have been taught about atoms and molecules since childhood, the answer might come rather easily. Internal energy is the energy that comes from the motion of the atoms and molecules that comprise the object. But for the scientists who were developing the subject during the 19th century, the concept of atoms and molecules was still in its very early stages, with many of them facing severe criticism for adopting ideas that at the time were still not completely verified.

Still, these scientists continued to take the point of view that the subject of thermodynamics was just the same physics that had already been applied to, say, the motion of cannonballs and pendulums and other objects, except that now they had to be applied to a very large quantity of very small particles (quantum mechanics would later have much to contribute also, but even before the introduction of that theory the concept of atoms and molecules was already starting to become very fruitful in thermodynamics).

Now we have an explanation for what internal energy is in terms of the motion of the particles that make up an object. But what about temperature? Is it possible to explain temperature (and therefore the laws that decide the direction of the transfer of heat) using more “basic” concepts such as Newton’s laws of motion, like we have done for the internal energy?

It was the revolutionary ideas of Ludwig Boltzmann that provided the solution. It indeed involved a more “basic” concept, but not one we would usually think of as belonging to the realm of physics or the study of motion. The idea of Boltzmann was to relate temperature to the concepts of information, probability, and statistics, via the notion of entropy. We may therefore think of this era as the time when “thermodynamics” became “statistical mechanics”.

In order to discuss the idea of entropy, for a moment we step away from physics, and discuss instead cards. It is not cards themselves that we are interested in, but information. Entropy is really about information, which is why it also shows up, for instance, when discussing computer passwords. Cards will give us a simple but concrete way to discuss information.

Consider now, therefore, a deck of 52 ordinary playing cards. A hand, of course, consists of five cards. Using the rules of combinatorics, we can find that there are 2,598,960 different hands (combinations of 52 different playing cards taken five at a time, in any order). In the game of poker, there are certain special combinations, the rarest (and highest-ranking) of which is called the “royal flush”. There are only four possible ways to get a royal flush (one for each suit). In contrast, the most common kind of hand is one which has no special combination (sometimes called “no pair”), and there are 1,302,540 different combinations which fit this description.
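These counts can be reproduced with a few lines of Python. The way we count the “no pair” hands here is one common approach: choose five different ranks that do not form a straight, and then choose suits so that the cards are not all of the same suit. The variable names are ours, and we assume the usual convention that there are ten possible straights (with the ace counting as either low or high).

```python
from math import comb

# Any five cards out of 52, ignoring the order in which they are dealt.
total_hands = comb(52, 5)

# One royal flush for each of the four suits.
royal_flushes = 4

# No pair: five distinct ranks that are not a straight,
# with suits that are not all the same (so not a flush either).
rank_choices = comb(13, 5) - 10  # remove the ten possible straights
suit_choices = 4 ** 5 - 4        # remove the four single-suit assignments
no_pair_hands = rank_choices * suit_choices

print(total_hands)    # 2598960
print(no_pair_hands)  # 1302540
print(no_pair_hands / total_hands, royal_flushes / total_hands)
```

The last line gives the probabilities of the two kinds of hands, assuming every hand is equally likely; roughly half of all hands are a “no pair”.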

Now suppose the deck is shuffled and we are dealt a hand. The shuffling process is not entirely random (not in the way that quantum mechanics is), but there are so many things going on that it is near-impossible for us to follow and determine what kind of hand we are going to get. The most we can do is make use of what we know about probability and statistics. We know that it is more likely for us to obtain a no pair rather than a royal flush, simply because there are so many more combinations that are considered a no pair than there are combinations that are considered a royal flush. There are no laws of physics involved in making this prediction; there is only the intuitive idea that an event with more ways of happening is more likely to happen compared to an event with fewer ways of happening, in the absence of any more information regarding the system.

We now go back to physics. Let us consider a system made up of a very large number of particles. The state of a single particle is specified by its position and momentum, and the state of the entire system is specified by the position and momentum of every one of its particles. This state is almost impossible for us to determine, because there are simply too many particles to keep track of.

However, we may be able to determine properties of the system without having to look at every single particle. Such properties may involve the total energy, pressure, volume, and so on. These properties determine the “macrostate” of a system. The “true” state that may only be specified by the position and momentum of every single particle is called the “microstate” of the system. There may be several different microstates that correspond to a single macrostate, just like there are four different combinations that correspond to a royal flush, or 1,302,540 different combinations that correspond to a no pair.

Let the system be in a certain macrostate, and let the number of microstates that correspond to this macrostate be denoted by \Omega. The entropy of the system is then defined as

\displaystyle S=k_{B}\text{ln }\Omega

where k_{B} is a constant known as Boltzmann’s constant. We may think of this constant and the logarithm as merely convenient ways (in terms of calculation, and in terms of making contact with older ideas in thermodynamics) to express the idea that the higher the number of microstates, the higher the entropy.
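As a playful illustration, we can apply this formula to the card example, treating “royal flush” and “no pair” as macrostates and the individual hands as microstates. This is only an analogy and not a physical calculation. The value of Boltzmann's constant below is the exact SI value, and the function name entropy is ours. The sketch also shows one reason the logarithm is convenient: when two independent systems are combined, their numbers of microstates multiply, while their entropies add.

```python
import math

BOLTZMANN_CONSTANT = 1.380649e-23  # joules per kelvin (exact in SI units)

def entropy(number_of_microstates):
    """S = k_B ln(Omega), for a macrostate with Omega microstates."""
    return BOLTZMANN_CONSTANT * math.log(number_of_microstates)

royal_flush_entropy = entropy(4)
no_pair_entropy = entropy(1302540)
print(royal_flush_entropy, no_pair_entropy)

# Two independent systems: microstates multiply, entropies add.
combined = entropy(4 * 1302540)
print(combined, royal_flush_entropy + no_pair_entropy)
```

A macrostate with only one microstate has entropy zero, since the logarithm of 1 is zero.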

Now even though the system may not seem to be changing, imperceptible to us, there may be many things that happen on a microscopic level. Molecules may be moving around in many directions, in motions that are too difficult for us to keep track of, not only because the particles are very small but also because there are just too many of them. This is analogous to the shuffling of cards. All that we have at our disposal are the tools of probability and statistics. Hence the term, “statistical mechanics“.

What have we learned from the example of the shuffling of cards? Even though we could not keep track of things and determine results, we could still make predictions. And the predictions we made were simply of the nature that an event with more ways of happening was more likely to happen than an event with fewer ways of happening.

Therefore, we have the following restatement of the second law of thermodynamics:

The entropy of a closed system never decreases.

This simply reflects the idea that under these processes we cannot keep track of, the system is more likely to adopt a configuration with more ways of happening, compared to one with fewer ways of happening. In other words, it will be in a macrostate that will have more microstates. Microscopically, it may happen that “miraculously” the entropy decreases; but given how many particles there are, and how many processes happen, this is extremely unlikely to be a sustained phenomenon, and macroscopically, the second law of thermodynamics is always satisfied. This is like obtaining a royal flush on one deal of cards; but if we are going to reshuffle multiple times, it is unlikely that we keep getting royal flushes for a sustained period of time.

The “closed system” requirement is there to ensure that the system is “left to its own devices” so to speak, or that there is no “outside interference”.

Considering that the entirety of the universe is an example of a “closed system” (there is nothing outside of it, since by definition the universe means the collection of everything that exists), the second law of thermodynamics has some interesting (perhaps disturbing, to some people) implications. What we usually consider to be an “ordered” configuration is very specific; for example, a room is only in order when all of the furniture is upright, all the trash is in the wastebasket, and so on. There are fewer such configurations compared to the “disordered” ones, since there are so many ways in which the furniture can be “not upright”, and so many ways in which the trash may be outside of the wastebasket, etc. In other words, disordered configurations have higher entropy. All of these things considered, what the second law of thermodynamics implies is that the entropy of the universe is ever increasing, moving toward an increasing state of disorder, away from the delicate state of order that we now enjoy.

We now want to derive the “macroscopic” from the “microscopic”. We want to connect the “microscopic” concept of entropy to the “macroscopic” concept of temperature. We do this by defining “temperature” as the following relationship between the entropy and the energy (in this case the internal energy, as the system may have other kinds of energy, for example arising from its motion in bulk):

\displaystyle T=\frac{\partial E}{\partial S}

Although we will not discuss the specifics in this post, we make the following claim: the entropy of the system is at its maximum when the system is in thermal equilibrium. Or perhaps more properly, the state of “thermal equilibrium” may be defined as the macrostate which has the largest number of microstates corresponding to it. This in turn explains why heat flows from a hotter object to a cooler one.

We have now discussed some of the most basic concepts in thermodynamics and statistical mechanics. We now briefly discuss certain technical and calculational aspects of the theory. Aside from making the theory more concrete, this is important also because there are many analogies to be made outside of thermodynamics and statistical mechanics. For example, in the Feynman path integral formulation of quantum field theory (see Some Basics of Relativistic Quantum Field Theory) we calculate correlation functions, which mathematically have a form very similar to some of the quantities that we will discuss.

In modern formulations of statistical mechanics, a central role is played by the partition function Z, which is defined as

\displaystyle Z=\sum_{i}e^{-\beta E_{i}}

where \beta, often simply referred to as the “thermodynamic beta”, is defined as

\displaystyle \beta=\frac{1}{k_{B}T}.

The partition function is a very convenient way to package information about the system we are studying, and many quantities of interest can be obtained from it. One of the most important ones is the probability P_{i} for the system to be in a microstate with energy E_{i}:

\displaystyle P_{i}=\frac{1}{Z}e^{-\beta E_{i}}.

Knowing this formula for the probabilities of the microstates allows us to derive the formulas for expectation values of quantities that may be of interest to us, such as the average energy of the system:

\displaystyle \langle E\rangle=\frac{1}{Z}\sum_{i}E_{i}e^{-\beta E_{i}}.

After some manipulation we may find that the expectation value of the energy is also equal to the following more compact expression:

\displaystyle \langle E\rangle=-\frac{\partial \text{ln }Z}{\partial \beta} .
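As a numerical sanity check, the following sketch computes the partition function, the probabilities, and the average energy for a small system. The two-level system, its energies, and the temperature are hypothetical values chosen only for illustration. It compares the probability-weighted sum of the energies with minus the derivative of ln Z with respect to \beta, estimated by a central difference.

```python
import math

K_B = 1.380649e-23  # Boltzmann's constant in J/K

def partition_function(energies, beta):
    """Z = sum_i exp(-beta * E_i)."""
    return sum(math.exp(-beta * e) for e in energies)

def probabilities(energies, beta):
    """P_i = exp(-beta * E_i) / Z."""
    z = partition_function(energies, beta)
    return [math.exp(-beta * e) / z for e in energies]

def average_energy(energies, beta):
    """<E> as the probability-weighted sum of the energies."""
    return sum(p * e for p, e in zip(probabilities(energies, beta), energies))

def minus_dlnZ_dbeta(energies, beta, rel_step=1e-6):
    """-d(ln Z)/d(beta), estimated with a central difference."""
    h = beta * rel_step
    upper = math.log(partition_function(energies, beta + h))
    lower = math.log(partition_function(energies, beta - h))
    return -(upper - lower) / (2 * h)

# Hypothetical two-level system: energies 0 and 1e-21 J at T = 300 K.
energies = [0.0, 1.0e-21]
beta = 1.0 / (K_B * 300.0)

print(probabilities(energies, beta))
print(average_energy(energies, beta), minus_dlnZ_dbeta(energies, beta))
```

The two printed energies should agree to within the accuracy of the finite difference, and the lower energy level should come out as the more probable one.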

Another familiar quantity that we can obtain from the partition function is the entropy of the system:

\displaystyle S=\frac{\partial (k_{B}T\text{ln }Z)}{\partial T} .
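The entropy formula can be checked in the same spirit. The sketch below compares the derivative of k_{B}T ln Z with respect to T against the standard expression -k_{B}\sum_{i}P_{i}\text{ln }P_{i}, which is not derived in this post and is used here only as an independent comparison. The three-level system and the choice of units with k_{B}=1 are assumptions made for simplicity.

```python
import math

K_B = 1.0  # units chosen so that Boltzmann's constant equals 1 (an assumption)

def log_Z(energies, temperature):
    """ln Z at the given temperature."""
    beta = 1.0 / (K_B * temperature)
    return math.log(sum(math.exp(-beta * e) for e in energies))

def entropy_from_Z(energies, temperature, step=1e-5):
    """S = d(k_B T ln Z)/dT, estimated with a central difference."""
    def g(t):
        return K_B * t * log_Z(energies, t)
    return (g(temperature + step) - g(temperature - step)) / (2 * step)

def entropy_from_probabilities(energies, temperature):
    """S = -k_B * sum_i P_i ln P_i with P_i = exp(-beta E_i) / Z."""
    beta = 1.0 / (K_B * temperature)
    z = sum(math.exp(-beta * e) for e in energies)
    probs = [math.exp(-beta * e) / z for e in energies]
    return -K_B * sum(p * math.log(p) for p in probs)

energies = [0.0, 1.0, 2.0]  # hypothetical three-level system

for temperature in (0.5, 1.5, 5.0):
    print(temperature,
          entropy_from_Z(energies, temperature),
          entropy_from_probabilities(energies, temperature))
```

At very high temperature all three levels become equally likely, and the entropy approaches k_{B} ln 3, the logarithm of the number of available states.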

There are various other quantities that can be obtained from the partition function, such as the variance of the energy (or energy fluctuations), the heat capacity, and the so-called Helmholtz free energy. We note that for “continuous” systems, expressions involving sums are replaced by expressions involving integrals. Also, for quantum mechanical systems, there are some modifications, as well as for systems which exchange particles with the environment.

The development of statistical mechanics, and the introduction of the concept of entropy, is perhaps a rather understated revolution in physics. Before Boltzmann’s redefinition of these concepts, physics was thought of as studying only motion, in the classical sense of Newton and his contemporaries. Information has since then taken just as central a role in modern physics as motion.

The mathematician and engineer Claude Elwood Shannon further modernized the notion of entropy by applying it to systems we would not have ordinarily thought of as part of physics, for example the bits on a computer. According to some accounts, Shannon was studying a certain quantity he wanted to name “information”; however, the physicist and mathematician John von Neumann told him that a version of his concept had already been developed before in physics, and was called “entropy”. With von Neumann’s encouragement, Shannon adopted the name, symbolically unifying subjects formerly thought of as separate.

Information theory, the subject which Shannon founded, has together with quantum mechanics led to quantum information theory, which not only has many potential applications in technology but also is one of the methods by which we attempt to figure out deep questions regarding the universe.

Another way in which the concept of entropy is involved in modern issues in physics is in the concept of entropic gravity, where gravity, as expressed in Einstein’s general theory of relativity, is derived from more fundamental concepts similar to how the simple statistical concept of entropy gives rise to something that manifests macroscopically as a law of physics. Another part of modern physics where information, quantum mechanics, and general relativity meet is the open problem called the black hole information paradox, which concerns the way in which black holes seemingly do not conserve information, and is a point of contention among many physicists even today.

Finally, we mention another very interesting aspect of statistical mechanics, perhaps, on the surface, a little more mundane compared to what we have mentioned in the preceding paragraphs, but not the slightest bit less interesting: phase transitions. Phase transitions are “abrupt” changes in the property of an object brought about by some seemingly continuous process, like, for example, the freezing of water into ice. We “cool” water, taking away heat from it by some process, and for a long time it seems that nothing happens except that the water becomes colder and colder, but at some point it freezes, an abrupt change, even though we have done just the same thing we did to it before. What really happens, microscopically, is that the molecules have arranged themselves into some sort of structure, and the material loses some of its symmetry (the “disordered” molecules of water were more symmetric than the “ordered” molecules in ice), a process known as symmetry breaking. Phase transitions and symmetry breaking are ubiquitous in the sciences, and have applications from studying magnets to tackling the problem of why we have observed so much more matter compared to antimatter.


Thermodynamics on Wikipedia

Statistical Mechanics on Wikipedia

Entropy on Wikipedia

Partition Function on Wikipedia

Entropy in Thermodynamics and Information Theory on Wikipedia

Quantum Information on Wikipedia

Black Hole Information Paradox on Wikipedia

Phase Transition on Wikipedia

Symmetry Breaking on Wikipedia

It From Bit – Entropic Gravity for Pedestrians on Science 2.0

Black Hole Information Paradox: An Introduction on Of Particular Significance

Thermal Physics by Charles Kittel and Herbert Kroemer

Fundamentals of Statistical and Thermal Physics by Frederick Reif

A Modern Course in Statistical Physics by Linda Reichl

Book List

There’s a new page on the blog: Book List. It’s far from comprehensive, but I hope to be able to update it from time to time. I don’t intend to put every book on mathematics and physics on the list of course, just the ones I have read and liked, or heavily recommended by other people. I hope to strike a balance between being somewhat comprehensive, with more than one book on the same subject if they happen to complement each other, and yet not listing too many so as not to confuse people with an overwhelming list of multiple books on the same subjects. Links to older, and more comprehensive, book lists (with helpful reviews) are included at the bottom of the page.

Some Basics of Fourier Analysis

Why do we study sine and cosine waves so much? Most waves, like most water waves and most sound waves, do not resemble sine and cosine waves at all (we will henceforth refer to sine and cosine waves as sinusoidal waves).

Well, it turns out that while most waves are not sinusoidal waves, all of them are actually combinations of sinusoidal waves of different sizes and frequencies. Hence we can understand much about essentially any wave simply by studying sinusoidal waves. This idea that any wave is a combination of multiple sinusoidal waves is part of the branch of mathematics called Fourier analysis.

Here’s a suggestion for an experiment from the book Vibrations and Waves by A.P. French: If you speak into the strings of a piano (I believe one of the pedals has to be held down first) the strings will vibrate, and since each string corresponds to a sine wave of a certain frequency, it will give you the breakdown of the sine wave components that make up your voice. If a string vibrates more strongly than the others it means there’s a bigger part of that in your voice, i.e. that sine wave component has a bigger amplitude.

More technically, we can express these concepts in the following manner. Let f(x) be a function that is integrable over some interval from x_{0} to x_{0}+P (for a wave, we can take P to be the “period” over which the wave repeats itself). Then over this interval the function can be expressed as the sum of sine and cosine waves of different sizes and frequencies, as follows:

\displaystyle f(x)=\frac{a_{0}}{2}+\sum_{n=1}^{\infty}\bigg(a_{n}\text{cos}\bigg(\frac{2\pi nx}{P}\bigg)+b_{n}\text{sin}\bigg(\frac{2\pi nx}{P}\bigg)\bigg)

This expression is called the Fourier series expansion of the function f(x). The coefficient \frac{a_{0}}{2} is the “level” around which the waves oscillate; the other coefficients a_{n} and b_{n} refer to the amplitude, or the “size” of the respective waves whose frequencies are equal to n. Of course, the bigger the frequency, the “faster” these waves oscillate.

Now given a function f(x) that satisfies the condition given earlier, how do we know what sine and cosine waves make it up? For this we must know what the coefficients a_{n} and b_{n} are.

In order to solve for a_{n} and b_{n}, we will make use of the property of the sine and cosine functions called orthogonality (the rest of the post will make heavy use of the language of calculus, therefore the reader might want to look at An Intuitive Introduction to Calculus):

\displaystyle \frac{1}{\pi}\int_{x_{0}}^{x_{0}+2\pi}\text{cos}(mx)\text{cos}(nx)dx=0    if m\neq n

\displaystyle \frac{1}{\pi}\int_{x_{0}}^{x_{0}+2\pi}\text{cos}(mx)\text{cos}(nx)dx=1    if m=n

\displaystyle \frac{1}{\pi}\int_{x_{0}}^{x_{0}+2\pi}\text{sin}(mx)\text{sin}(nx)dx=0    if m\neq n

\displaystyle \frac{1}{\pi}\int_{x_{0}}^{x_{0}+2\pi}\text{sin}(mx)\text{sin}( nx)dx=1    if m=n

\displaystyle \frac{1}{\pi}\int_{x_{0}}^{x_{0}+2\pi}\text{cos}(mx)\text{sin}(nx)dx=0    for all m,n

What this means is that when a sine or cosine function is not properly “paired” then its integral over an interval equal to its period will always be zero. It will only give a nonzero value if it is properly paired, and we can “rescale” this value to make it equal to 1.
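These orthogonality relations are easy to check numerically. The sketch below approximates each integral with the midpoint rule; the helper names, the number of sample points, and the starting point x_{0}=0.3 are arbitrary choices made for this illustration.

```python
import math

def integrate(func, start, length, n=2000):
    """Midpoint-rule approximation of the integral of func over [start, start + length]."""
    h = length / n
    return h * sum(func(start + (i + 0.5) * h) for i in range(n))

def cos_cos(m, n, x0=0.3):
    """(1/pi) * integral of cos(mx) cos(nx) over one interval of length 2*pi."""
    return integrate(lambda x: math.cos(m * x) * math.cos(n * x), x0, 2 * math.pi) / math.pi

def sin_sin(m, n, x0=0.3):
    """(1/pi) * integral of sin(mx) sin(nx) over one interval of length 2*pi."""
    return integrate(lambda x: math.sin(m * x) * math.sin(n * x), x0, 2 * math.pi) / math.pi

def cos_sin(m, n, x0=0.3):
    """(1/pi) * integral of cos(mx) sin(nx) over one interval of length 2*pi."""
    return integrate(lambda x: math.cos(m * x) * math.sin(n * x), x0, 2 * math.pi) / math.pi

for m, n in [(2, 3), (3, 3), (1, 4)]:
    print(m, n, round(cos_cos(m, n), 9), round(sin_sin(m, n), 9), round(cos_sin(m, n), 9))
```

Only the "properly paired" cases with m=n (and m, n at least 1) give a result close to 1; every other combination is zero up to rounding error.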

Now we can look at the following expression:

\displaystyle \frac{2}{P}\int_{x_{0}}^{x_{0}+P}f(x)\text{cos}(\frac{2\pi x}{P})dx

Knowing that the function f(x) has a Fourier series expansion as above, we now have

\displaystyle \frac{2}{P}\int_{x_{0}}^{x_{0}+P}f(x)\text{cos}\bigg(\frac{2\pi x}{P}\bigg)dx=\frac{2}{P}\int_{x_{0}}^{x_{0}+P}\bigg(\frac{a_{0}}{2}+\sum_{n=1}^{\infty}\bigg(a_{n}\text{cos}\bigg(\frac{2\pi nx}{P}\bigg)+b_{n}\text{sin}\bigg(\frac{2\pi nx}{P}\bigg)\bigg)\bigg)\text{cos}\bigg(\frac{2\pi x}{P}\bigg)dx.

But we know that integrals involving the cosine function will always be zero unless it is properly paired; therefore it will be zero for all terms of the infinite series except for one, in which case it will yield (the constants are all there to properly scale the result)

\displaystyle \frac{2}{P}\int_{x_{0}}^{x_{0}+P}f(x)\text{cos}\bigg(\frac{2\pi x}{P}\bigg)dx=\frac{2}{P}\int_{x_{0}}^{x_{0}+P}a_{1}\text{cos}\bigg(\frac{2\pi x}{P}\bigg)\text{cos}\bigg(\frac{2\pi x}{P}\bigg)dx

\displaystyle \frac{2}{P}\int_{x_{0}}^{x_{0}+P}f(x)\text{cos}(\frac{2\pi x}{P})dx=a_{1}.

We have therefore used the orthogonality property of the cosine function to “filter” a single frequency component out of the many that make up our function.

Next we might use \text{cos}(\frac{4\pi x}{P}) instead of \text{cos}(\frac{2\pi x}{P}). This will give us

\displaystyle \frac{2}{P}\int_{x_{0}}^{x_{0}+P}f(x)\text{cos}(\frac{4\pi x}{P})dx=a_{2}.

We can continue the procedure to solve for the coefficients a_{3}, a_{4}, and so on, and we can replace the cosine function by the sine function to solve for the coefficients b_{1}, b_{2}, and so on. Of course, the coefficient a_{0} can also be obtained by using \text{cos}(0)=1.

In summary, we can solve for the coefficients using the following formulas:

\displaystyle a_{n}=\frac{2}{P}\int_{x_{0}}^{x_{0}+P}f(x)\text{cos}(\frac{2\pi nx}{P})dx

\displaystyle b_{n}=\frac{2}{P}\int_{x_{0}}^{x_{0}+P}f(x)\text{sin}(\frac{2\pi nx}{P})dx
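As an illustration of these formulas, the following sketch computes the coefficients of a square wave numerically and then rebuilds the function from a finite number of terms. The square wave, its period, and the midpoint-rule integration are assumptions chosen for this example. For this particular wave the exact values are known to be a_{n}=0, and b_{n}=\frac{4}{n\pi} for odd n and 0 for even n.

```python
import math

P = 2.0  # period of a hypothetical square wave

def square_wave(x):
    """+1 on the first half of each period, -1 on the second half."""
    return 1.0 if (x % P) < P / 2 else -1.0

def integrate(func, start, length, n=4000):
    """Midpoint-rule approximation of the integral over [start, start + length]."""
    h = length / n
    return h * sum(func(start + (i + 0.5) * h) for i in range(n))

def a(n, f=square_wave, x0=0.0):
    """Cosine coefficient a_n."""
    return (2.0 / P) * integrate(lambda x: f(x) * math.cos(2 * math.pi * n * x / P), x0, P)

def b(n, f=square_wave, x0=0.0):
    """Sine coefficient b_n."""
    return (2.0 / P) * integrate(lambda x: f(x) * math.sin(2 * math.pi * n * x / P), x0, P)

def partial_sum(x, n_max, f=square_wave):
    """Fourier series truncated after the terms with n = n_max."""
    total = a(0, f) / 2.0
    for n in range(1, n_max + 1):
        angle = 2 * math.pi * n * x / P
        total += a(n, f) * math.cos(angle) + b(n, f) * math.sin(angle)
    return total

for n in range(1, 6):
    print(n, round(a(n), 6), round(b(n), 6))
print(partial_sum(P / 4, 25))
```

The last line evaluates the truncated series in the middle of the "high" part of the wave, where it should be close to 1. Adding more terms brings it closer.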

Now that we have shown how a function can be “broken down” or “decomposed” into a (possibly infinite) sum of sine and cosine waves of different amplitudes and frequencies, we now revisit the relationship between the sine and cosine functions and the exponential function (see “The Most Important Function in Mathematics”) in order to give us yet another expression for the Fourier series. We recall that, combining the concepts of the exponential function and complex numbers we have the beautiful and important equation

\displaystyle e^{ix}=\text{cos}(x)+i\text{sin}(x)

which can also be expressed in the following forms:

\displaystyle \text{cos}(x)=\frac{e^{ix}+e^{-ix}}{2}

\displaystyle \text{sin}(x)=\frac{e^{ix}-e^{-ix}}{2i}.

Using these expressions, we can rewrite the Fourier series of a function in a more “shorthand” form:

\displaystyle f(x)=\sum_{n=-\infty}^{\infty}c_{n}e^{\frac{2\pi i nx}{P}}

where

\displaystyle c_{n}=\frac{1}{P}\int_{x_{0}}^{x_{0}+P}f(x)e^{-\frac{2\pi i nx}{P}}dx.
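Substituting the exponential forms of the sine and cosine into the original series shows that the two sets of coefficients are related by c_{0}=\frac{a_{0}}{2}, c_{n}=\frac{a_{n}-ib_{n}}{2}, and c_{-n}=\frac{a_{n}+ib_{n}}{2} for n\geq 1. The sketch below checks this numerically for a hypothetical test function whose only nonzero coefficients are a_{0}=2, a_{1}=2, and b_{2}=3. The function, the period, and the integration scheme are assumptions made for this illustration.

```python
import cmath
import math

P = 3.0  # hypothetical period

def f(x):
    """Hypothetical test function with a_0 = 2, a_1 = 2, b_2 = 3 and no other components."""
    return 1.0 + 2.0 * math.cos(2 * math.pi * x / P) + 3.0 * math.sin(4 * math.pi * x / P)

def c(n, samples=1000, x0=0.0):
    """c_n = (1/P) * integral of f(x) exp(-2 pi i n x / P) over one period (midpoint rule)."""
    h = P / samples
    total = 0.0
    for j in range(samples):
        x = x0 + (j + 0.5) * h
        total += f(x) * cmath.exp(-2j * math.pi * n * x / P)
    return total * h / P

for n in range(-2, 3):
    print(n, c(n))
```

The output should be close to c_{0}=1, c_{\pm 1}=1, c_{2}=-1.5i, and c_{-2}=1.5i, in agreement with the relations above.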

Finally, we discuss more concepts related to the process we used in solving for the coefficients a_{n}, b_{n}, and c_{n}. As we have already discussed, these coefficients express “how much” of the waves with frequency equal to n are in the function f(x). We can now abstract this idea to define the Fourier transform \hat{f}(k) of a function f(x) as follows:

\displaystyle \hat{f}(k)=\int_{-\infty}^{\infty}f(x)e^{-2\pi i kx}dx

There are of course versions of the Fourier transform that use the sine and cosine functions instead of the exponential function, but the form written above is more common in the literature. Roughly, the Fourier transform \hat{f}(k) also expresses “how much” of the waves with frequency equal to k are in the function f(x). The difference lies in the interval over which we are integrating; however, we may consider the formula for obtaining the coefficients of the Fourier series as taking the Fourier transform of a single cycle of a periodic function, with its value set to 0 outside of the interval occupied by the cycle, and with variables appropriately rescaled.

The Fourier transform has an “inverse”, which allows us to recover f(x) from \hat{f}(k):

\displaystyle f(x)=\int_{-\infty}^{\infty}\hat{f}(k)e^{2\pi i kx}dk.

Fourier analysis, aside from being an interesting subject in itself, has many applications not only in other branches of mathematics but also in the natural sciences and in engineering. For example, in physics, the Heisenberg uncertainty principle of quantum mechanics (see More Quantum Mechanics: Wavefunctions and Operators) comes from the result in Fourier analysis that the more a function is “localized” around a small area, the more its Fourier transform will be spread out over all of space, and vice-versa. Since the probability amplitudes for the position and the momentum are related to each other as the Fourier transform and inverse Fourier transform of each other (a result of the de Broglie relations), this manifests in the famous principle that the more we know about the position, the less we know about the momentum, and vice-versa.
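The trade-off between a function and its Fourier transform can be seen numerically with Gaussians of different widths. In the sketch below, the transform is approximated by a plain Riemann sum, and the "spread" is measured as the standard deviation of |f(x)|^{2} and of |\hat{f}(k)|^{2}. This way of measuring the spread, the grids, and the function names are choices made for this illustration, by analogy with probability densities in quantum mechanics. For Gaussians the product of the two spreads is known to equal \frac{1}{4\pi} in the convention used in this post.

```python
import cmath
import math

def gaussian(x, sigma):
    """Unnormalized Gaussian of width parameter sigma."""
    return math.exp(-x * x / (2.0 * sigma * sigma))

def fourier_transform(f, k, x_max=10.0, n=400):
    """Midpoint-rule approximation of the integral of f(x) exp(-2 pi i k x) over [-x_max, x_max]."""
    h = 2.0 * x_max / n
    total = 0.0
    for i in range(n):
        x = -x_max + (i + 0.5) * h
        total += f(x) * cmath.exp(-2j * math.pi * k * x)
    return total * h

def rms_width(points, weights):
    """Standard deviation of `points`, weighted by the non-negative `weights`."""
    norm = sum(weights)
    mean = sum(p * w for p, w in zip(points, weights)) / norm
    variance = sum(w * (p - mean) ** 2 for p, w in zip(points, weights)) / norm
    return math.sqrt(variance)

def position_width(sigma, x_max=10.0, n=400):
    """Spread of |f(x)|^2 sampled on a uniform grid."""
    h = 2.0 * x_max / n
    xs = [-x_max + (i + 0.5) * h for i in range(n)]
    return rms_width(xs, [gaussian(x, sigma) ** 2 for x in xs])

def frequency_width(sigma, k_max=2.0, n_k=161):
    """Spread of |f_hat(k)|^2 sampled on a uniform grid."""
    dk = 2.0 * k_max / (n_k - 1)
    ks = [-k_max + j * dk for j in range(n_k)]
    weights = [abs(fourier_transform(lambda x: gaussian(x, sigma), k)) ** 2 for k in ks]
    return rms_width(ks, weights)

results = {}
for sigma in (0.5, 1.0, 2.0):
    results[sigma] = (position_width(sigma), frequency_width(sigma))
    print(sigma, results[sigma], results[sigma][0] * results[sigma][1])
```

Making the Gaussian wider in x makes its transform narrower in k, and the product of the two spreads stays the same.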

Fourier analysis can even be used to explain the distinctive “distorted” sound of electric guitars in rock and heavy metal music. Usually, plucking a guitar string produces a sound wave which is sinusoidal. For electric guitars, the sound is amplified using transistors; however, there is a limit to how much amplification can be done, and at a certain point (technically, this is when the transistor is operating outside of the “linear region”), the sound wave looks like a sine function with its peaks and troughs “clipped”. In Fourier analysis this corresponds to an addition of higher-frequency components, and this results in the distinctive sound of that genre of music.
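The effect of clipping on the frequency content can be sketched in a few lines. The code below computes sine coefficients b_{n} of a pure sine wave and of a sine wave clipped at half its amplitude. The clipping level and the function names are hypothetical choices for this illustration, and real amplifier distortion is of course more complicated than a hard clip.

```python
import math

def clipped_sine(x, limit):
    """sin(x) with its peaks and troughs cut off at +limit and -limit."""
    return max(-limit, min(limit, math.sin(x)))

def sine_coefficient(f, n, samples=2000):
    """b_n for a function of period 2*pi, using the midpoint rule."""
    h = 2 * math.pi / samples
    total = sum(f((i + 0.5) * h) * math.sin(n * (i + 0.5) * h) for i in range(samples))
    return total * h / math.pi

def pure(x):
    return math.sin(x)

def clipped(x):
    return clipped_sine(x, 0.5)

for n in range(1, 6):
    print(n, round(sine_coefficient(pure, n), 6), round(sine_coefficient(clipped, n), 6))
```

The pure sine wave has only the n=1 component, while the clipped wave acquires additional components at the odd multiples of the original frequency (n=3, 5, and so on).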

Yet another application of Fourier analysis, and in fact its original application, is the study of differential equations. The mathematician Joseph Fourier, after whom Fourier analysis is named, developed the techniques we have discussed in this post in order to study the differential equation expressing the flow of heat in a material. It so happens that difficult calculations, for example differentiation, involving a function correspond to easier ones, such as simple multiplication, involving its Fourier transform. Therefore it is a common technique to convert a difficult problem to a simple one using the Fourier transform, and after the problem has been solved, we use the inverse Fourier transform to get the solution to the original problem.
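The statement that differentiation turns into multiplication can be illustrated with the complex Fourier coefficients of a periodic function: the coefficient of the derivative with index n equals \frac{2\pi i n}{P} times the coefficient of the original function. The sketch below checks this for a hypothetical smooth periodic function. The function, the period, and the integration scheme are assumptions made for this example.

```python
import cmath
import math

P = 1.0  # hypothetical period

def f(x):
    """A smooth periodic test function."""
    return math.exp(math.sin(2 * math.pi * x / P))

def f_prime(x):
    """Its derivative, computed by hand with the chain rule."""
    return (2 * math.pi / P) * math.cos(2 * math.pi * x / P) * f(x)

def coefficient(func, n, samples=512):
    """Complex Fourier coefficient c_n of func over one period (midpoint rule)."""
    h = P / samples
    total = 0.0
    for j in range(samples):
        x = (j + 0.5) * h
        total += func(x) * cmath.exp(-2j * math.pi * n * x / P)
    return total * h / P

for n in range(-3, 4):
    direct = coefficient(f_prime, n)
    multiplied = (2j * math.pi * n / P) * coefficient(f, n)
    print(n, direct, multiplied)
```

The two columns agree, so differentiating f and then analyzing it gives the same result as analyzing f and then multiplying each component by a simple factor.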

Despite the crude simplifications we have assumed in order to discuss Fourier analysis in this post, the reader should know that it remains a deep and interesting subject in modern mathematics. A more general and more advanced form of the subject is called harmonic analysis, and it is one of the areas where there is much research, both on its own, and in connection to other subjects.


Fourier Analysis on Wikipedia

Fourier Series on Wikipedia

Fourier Transform on Wikipedia

Harmonic Analysis on Wikipedia

Vibrations and Waves by A.P. French

Fourier Analysis: An Introduction by Elias M. Stein and Rami Shakarchi

Connection and Curvature in Riemannian Geometry

In Geometry on Curved Spaces, we showed how different geometry can be when we are working on curved space instead of flat space, which we are usually more familiar with. We used the concept of a metric to express how the distance formula changes depending on where we are on this curved space. This gives us some way to “measure” the curvature of the space.

We also described the concept of parallel transport, which is in some way even more general than the metric, and can also be used to provide us with some measure of the curvature of a space. Although we can use concepts analogous to parallel transport even without the metric, if we do have a metric on the space and an expression for it, we can relate the concept of parallel transport to the metric, which is perhaps more intuitive. In this post, we formalize the concept of parallel transport by defining the Christoffel symbol and the Riemann curvature tensor, both of which we can obtain given the form of the metric. The Christoffel symbol and the Riemann curvature tensor are examples of the more general concepts of a connection and a curvature form, respectively, which need not be obtained from the metric.

Some Basics of Tensor Notation

First we establish some notation. We have already seen some tensor notation in Some Basics of (Quantum) Electrodynamics, but we explain a little bit more of that notation here, since it will be the language we will work in. Many of the ordinary vectors we are used to, such as the position, will be indexed by superscripts. We refer to these vectors as contravariant vectors. A common convention is to use Latin letters, such as i or j, as indices when we are working with space, and Greek letters, such as \mu and \nu, as indices when we are working with spacetime. Let us consider, for example, spacetime. An event in this spacetime is specified by its 4-position x^{\mu}, where x^{0}=ct, x^{1}=x, x^{2}=y, and x^{3}=z.

We will use the symbol g_{\mu\nu} for our metric, and we will also often express it as a matrix. For the case of flat spacetime, our metric is given by the Minkowski metric \eta_{\mu\nu}:

\displaystyle \eta_{\mu\nu}=\left(\begin{array}{cccc}-1&0&0&0\\0&1&0&0\\0&0&1&0\\ 0&0&0&1\end{array}\right)

We can use the metric to “raise” and “lower” indices. This is done by multiplying the metric and a vector, and summing over a common index (one will be a superscript and the other a subscript). We have introduced the Einstein summation convention in Some Basics of (Quantum) Electrodynamics, where repeated indices always imply summation, unless explicitly stated otherwise, and we will continue to use this convention for posts discussing differential geometry and the theory of relativity.

Here is an example of “lowering” the index of x^{\nu} in flat spacetime using the metric \eta_{\mu\nu} to obtain a new quantity x_{\mu}:

\displaystyle x_{\mu}=\eta_{\mu\nu}x^{\nu}

Explicitly, the components of the quantity x_{\mu} are given by x_{0}=-ct, x_{1}=x, x_{2}=y, and x_{3}=z. Note that the “time” component x_{0} has changed sign; this is because \eta_{00}=-1. A quantity such as x_{\mu}, which has a subscript index, is called a covariant vector.
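For readers who like to see the summation convention written out, here is a minimal sketch of lowering an index with the Minkowski metric. The event coordinates and the function name are hypothetical, chosen only for this illustration.

```python
C = 299_792_458.0  # speed of light in m/s

# Minkowski metric eta_{mu nu} as a nested list (rows mu, columns nu).
ETA = [
    [-1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]

def lower_index(metric, vector):
    """x_mu = g_{mu nu} x^nu, with the implied sum over nu written out explicitly."""
    size = len(vector)
    return [sum(metric[mu][nu] * vector[nu] for nu in range(size)) for mu in range(size)]

t, x, y, z = 2.0, 1.0, 3.0, 4.0  # hypothetical event
x_upper = [C * t, x, y, z]       # contravariant components x^mu
x_lower = lower_index(ETA, x_upper)

print(x_upper)
print(x_lower)
```

Only the first component changes sign, as expected from \eta_{00}=-1.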

In order to “raise” indices, we need the “inverse metric” g^{\mu\nu}. For the Minkowski metric \eta_{\mu\nu}, the inverse metric \eta^{\mu\nu} has the exact same components as \eta_{\mu\nu}, but for more general metrics this may not be the case. The general procedure for obtaining the inverse metric is to consider the expression

\displaystyle g_{\mu\nu}g^{\nu\rho}=\delta_{\mu}^{\rho}

where \delta_{\mu}^{\rho} is the Kronecker delta, a quantity that can be expressed as the matrix

\displaystyle \delta_{\mu}^{\rho}=\left(\begin{array}{cccc}1&0&0&0\\0&1&0&0\\0&0&1&0\\ 0&0&0&1\end{array}\right).
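The defining property of the inverse metric, namely that contracting it with the metric gives the Kronecker delta, can be checked with a short sketch. The 2-dimensional metric with off-diagonal entries used below is purely hypothetical, chosen to show a case where the inverse does not have the same components as the metric.

```python
ETA = [
    [-1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]

def contract(g_lower, g_upper):
    """Compute g_{mu nu} g^{nu rho}, summing over the repeated index nu."""
    size = len(g_lower)
    return [
        [sum(g_lower[mu][nu] * g_upper[nu][rho] for nu in range(size)) for rho in range(size)]
        for mu in range(size)
    ]

def inverse_2x2(g):
    """Inverse of a 2x2 matrix from the closed-form formula."""
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    return [[g[1][1] / det, -g[0][1] / det], [-g[1][0] / det, g[0][0] / det]]

# The Minkowski metric is its own inverse.
print(contract(ETA, ETA))

# A hypothetical 2-dimensional metric whose inverse has different components.
g = [[2.0, 1.0], [1.0, 3.0]]
g_inv = inverse_2x2(g)
print(g_inv)
print(contract(g, g_inv))
```

Both contractions give the identity matrix, up to rounding in the second case.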

As a demonstration of what our notation can do, we recall the formula for the invariant spacetime interval:

\displaystyle (ds)^2=-(cdt)^2+(dx)^2+(dy)^2+(dz)^2

Using tensor notation combined with the Einstein summation convention, this can be written simply as

\displaystyle (ds)^2=\eta_{\mu\nu}dx^{\mu}dx^{\nu}.
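The equivalence of the two ways of writing the interval can be spelled out in code, with both implied sums written as explicit loops. The separation used below is a hypothetical example, with the first component standing for c\,dt.

```python
ETA = [
    [-1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]

def interval_squared(dx):
    """(ds)^2 = eta_{mu nu} dx^mu dx^nu, summing over both repeated indices."""
    return sum(ETA[mu][nu] * dx[mu] * dx[nu] for mu in range(4) for nu in range(4))

def interval_squared_explicit(dx):
    """The same quantity written out term by term."""
    c_dt, d_x, d_y, d_z = dx
    return -c_dt ** 2 + d_x ** 2 + d_y ** 2 + d_z ** 2

separation = [3.0, 1.0, 2.0, 2.0]  # hypothetical (c dt, dx, dy, dz)

print(interval_squared(separation), interval_squared_explicit(separation))
```

For this particular separation the interval vanishes, which is the situation for two events connected by a ray of light.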

The Christoffel Symbol and the Covariant Derivative

We now come back to the Christoffel symbol \Gamma^{\mu}_{\nu\lambda}. The idea behind the Christoffel symbol is that it is used to define the covariant derivative \nabla_{\nu}V^{\mu} of a vector V^{\mu}.

The covariant derivative is a very important concept in differential geometry (and not just in Riemannian geometry). When we take derivatives, we are actually comparing two vectors. To further explain what we mean, we recall that individually the components of the vectors can be thought of as functions on the space, and we recall the expression for the derivative from An Intuitive Introduction to Calculus:

\displaystyle \frac{df}{dx}=\frac{f(x+\epsilon)-f(x)}{(x+\epsilon)-(x)} when \epsilon is extremely small (essentially negligible)

More formally, we can write

\displaystyle \frac{df}{dx}=\lim_{\epsilon\to 0}\frac{f(x+\epsilon)-f(x)}{(x+\epsilon)-(x)}.

Therefore, employing the language of partial derivatives, we could have written the following partial derivative of the \mu-th component of an m-dimensional vector V^{\mu} on an m-dimensional space with respect to the coordinate x^{\nu}:

\displaystyle \frac{\partial V^{\mu}}{\partial x^{\nu}}=\lim_{\Delta x^{\nu}\to 0}\frac{V^{\mu}(x^{1},...,x^{\nu}+\Delta x^{\nu},...,x^{m})-V^{\mu}(x^{1},...,x^{\nu},...,x^{m})}{(x^{\nu}+\Delta x^{\nu})-(x^{\nu})}

The problem is that we are comparing vectors from different vector spaces. Recall from Vector Fields, Vector Bundles, and Fiber Bundles that we can think of a vector bundle as having a vector space for every point on the base space. The vector V^{\mu}(x^{1},...,x^{\nu}+\Delta x^{\nu},...,x^{m}) belongs to the vector space on the point (x^{1},...,x^{\nu}+\Delta x^{\nu},...,x^{m}), while the vector V^{\mu}(x^{1},...,x^{\nu},...,x^{m}) belongs to the vector space on the point (x^{1},...,x^{\nu},...,x^{m}). To be able to compare the two vectors we need to “transport” one to the other in the “correct” way, by which we mean parallel transport. Now we have seen in Geometry on Curved Spaces that parallel transport can have weird effects on vectors, and these weird effects are what the Christoffel symbol expresses.

Let \tilde{V}^{\mu}(x^{1},...,x^{\nu}+\Delta x^{\nu},...,x^{m}) denote the vector V^{\mu}(x^{1},...,x^{\nu},...,x^{m}) parallel transported from its original vector space on (x^{1},...,x^{\nu},...,x^{m}) to the vector space on (x^{1},...,x^{\nu}+\Delta x^{\nu},...,x^{m}). The vector \tilde{V}^{\mu}(x^{1},...,x^{\nu}+\Delta x^{\nu},...,x^{m}) is given by the following expression:

\displaystyle \tilde{V}^{\mu}(x^{1},...,x^{\nu}+\Delta x^{\nu},...,x^{m})=V^{\mu}(x^{1},...,x^{\nu},...,x^{m})-V^{\lambda}(x^{1},...,x^{\nu},...,x^{m})\Gamma^{\mu}_{\nu\lambda}(x^{1},...,x^{\nu},...,x^{m})\Delta x^{\nu}

Therefore the Christoffel symbol provides a “correction” for what happens when we parallel transport a vector from one point to another. This is an example of the concept of a connection, which, like the covariant derivative, is part of more general differential geometry beyond Riemannian geometry. The object that is to be parallel transported may not be a vector, for example when we have more general fiber bundles instead of vector bundles. However, in Riemannian geometry we will usually focus on vector bundles, in particular a special kind of vector bundle called the tangent bundle, which consists of the tangent vectors at a point.

Now there is more than one way to parallel transport a mathematical object, which means that there are many choices of a connection. However, in Riemannian geometry there is a special kind of connection that we will prefer. This is the connection that satisfies the following two properties:

\displaystyle \Gamma^{\mu}_{\nu\lambda}=\Gamma^{\mu}_{\lambda\nu}    (torsion-free)

\displaystyle \nabla_{\rho}g_{\mu\nu}=0    (metric compatibility)

The connection that satisfies these two properties is the one that can be obtained from the metric via the following formula:

\displaystyle \Gamma^{\mu}_{\nu\lambda}=\frac{1}{2}g^{\mu\sigma}(\partial_{\nu}g_{\sigma\lambda}+\partial_{\lambda}g_{\sigma\nu}-\partial_{\sigma}g_{\nu\lambda}).

The covariant derivative is then defined as

\displaystyle \nabla_{\nu}V^{\mu}=\lim_{\Delta x^{\nu}\to 0}\frac{V^{\mu}(x^{1},...,x^{\nu}+\Delta x^{\nu},...,x^{m})-\tilde{V}^{\mu}(x^{1},...,x^{\nu}+\Delta x^{\nu},...,x^{m})}{(x^{\nu}+\Delta x^{\nu})-(x^{\nu})}.

We are now comparing vectors belonging to the same vector space, and evaluating the expression above leads to the formula for the covariant derivative:

\displaystyle \nabla_{\nu}V^{\mu}=\partial_{\nu}V^{\mu}+\Gamma^{\mu}_{\nu\lambda}V^{\lambda}.
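To see how the limit produces this formula, one can substitute the expression for the parallel transported vector into the definition of the covariant derivative. The following is a sketch of that step, where for brevity we write x for the point (x^{1},...,x^{m}) and x+\Delta x for the point displaced along the coordinate x^{\nu} (with no sum over \nu in this step):

\displaystyle V^{\mu}(x+\Delta x)-\tilde{V}^{\mu}(x+\Delta x)=V^{\mu}(x+\Delta x)-V^{\mu}(x)+\Gamma^{\mu}_{\nu\lambda}(x)V^{\lambda}(x)\Delta x^{\nu}

Dividing by \Delta x^{\nu} and taking the limit, the first two terms on the right hand side give the ordinary partial derivative \partial_{\nu}V^{\mu}, while the last term gives \Gamma^{\mu}_{\nu\lambda}V^{\lambda}.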

The Riemann Curvature Tensor

Next we consider the quantity known as the Riemann curvature tensor. It is once again related to parallel transport, in the following manner. Consider parallel transporting a vector V^{\sigma} through an “infinitesimal” distance specified by another vector A^{\mu}, and after that, through another infinitesimal distance specified by yet another vector B^{\nu}. Then we parallel transport it again in the opposite direction to A^{\mu}, then finally in the opposite direction to B^{\nu}. The path forms a parallelogram, and when the vector V^{\sigma} returns to its starting point it will then be changed by an amount \delta V^{\rho}. We can think of the Riemann curvature tensor as the quantity that relates all of these:

\displaystyle \delta V^{\rho}=R^{\rho}_{\ \sigma\mu\nu}V^{\sigma}A^{\mu}B^{\nu}.

Another way to put this is to consider taking the covariant derivative of the vector V^{\rho} along the same path as described above. The Riemann curvature tensor is then related to this quantity as follows:

\displaystyle \nabla_{\mu}\nabla_{\nu}V^{\rho}-\nabla_{\nu}\nabla_{\mu}V^{\rho}=R^{\rho}_{\ \sigma\mu\nu}V^{\sigma}.

Expanding the left hand side, and using the torsion-free property of the Christoffel symbol, we will find that

\displaystyle R^{\rho}_{\ \sigma\mu\nu}=\partial_{\mu}\Gamma^{\rho}_{\nu\sigma}-\partial_{\nu}\Gamma^{\rho}_{\mu\sigma}+\Gamma^{\rho}_{\mu\lambda}\Gamma^{\lambda}_{\nu\sigma}-\Gamma^{\rho}_{\nu\lambda}\Gamma^{\lambda}_{\mu\sigma}.

For connections other than the torsion-free one that we chose, there will be another part of the expansion of the expression \nabla_{\mu}\nabla_{\nu}-\nabla_{\nu}\nabla_{\mu} called the torsion tensor. For our case, however, we need not worry about it and we can focus on the Riemann curvature tensor.

There is another quantity that can be obtained from the Riemann curvature tensor called the Ricci tensor, denoted by R_{\mu\nu}. It is given by

\displaystyle R_{\mu\nu}=R^{\lambda}_{\ \mu\lambda\nu}.

Following the Einstein summation convention, we sum over the repeated index \lambda, and therefore the resulting quantity will have only two indices instead of four. This is an example of the operation on tensors called contraction. If we raise one index using the metric and contract again, we obtain a quantity called the Ricci scalar, denoted R:

\displaystyle R=R^{\mu}_{\ \mu}

Example: The 2-Sphere

To provide an explicit example of the concepts discussed, we show their specific expressions for the case of a 2-sphere. We will only give the final results here. The explicit computations can be found among the references, but the reader may gain some practice, especially on manipulating tensors, by performing the calculations and checking only the answers here. In any case, since the metric is given, it is only a matter of substituting the relevant quantities into the formulas already given above.

We have already given the expression for the metric of the 2-sphere in Geometry on Curved Spaces. We recall that, in matrix form, it is given by (we change our notation for the radius of the 2-sphere to R_{0} to avoid confusion with the symbol for the Ricci scalar)

\displaystyle g_{mn}= \left(\begin{array}{cc}R_{0}^{2}&0\\ 0&R_{0}^{2}(\text{sin}(\theta))^{2}\end{array}\right)

Individually, the components are (we will use \theta and \varphi instead of the numbers 1 and 2 for the indices)

\displaystyle g_{\theta\theta}=R_{0}^{2}

\displaystyle g_{\varphi\varphi}=R_{0}^{2}(\text{sin}(\theta))^{2}

The other components (g_{\theta\varphi} and g_{\varphi\theta}) are all equal to zero.

The Christoffel symbols are therefore given by

\displaystyle \Gamma^{\theta}_{\varphi\varphi}=-\text{sin}(\theta)\text{cos}(\theta)

\displaystyle \Gamma^{\varphi}_{\theta\varphi}=\text{cot}(\theta)

\displaystyle \Gamma^{\varphi}_{\varphi\theta}=\text{cot}(\theta)

The other components (\Gamma^{\theta}_{\theta\theta}, \Gamma^{\theta}_{\theta\varphi}, \Gamma^{\theta}_{\varphi\theta}, \Gamma^{\varphi}_{\theta\theta}, and \Gamma^{\varphi}_{\varphi\varphi}) are all equal to zero.
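These values can be checked numerically without doing any algebra. The sketch below evaluates the standard formula for the connection obtained from the metric, \Gamma^{\mu}_{\nu\lambda}=\frac{1}{2}g^{\mu\sigma}(\partial_{\nu}g_{\sigma\lambda}+\partial_{\lambda}g_{\sigma\nu}-\partial_{\sigma}g_{\nu\lambda}), using finite differences for the derivatives of the metric. The radius, the step size, and the function names are assumptions made for this illustration; index 0 stands for \theta and index 1 for \varphi.

```python
import math

R0 = 2.0  # hypothetical radius of the 2-sphere

def metric(theta, phi):
    """g_{ab} of the 2-sphere in the coordinates (theta, phi)."""
    return [[R0 ** 2, 0.0], [0.0, (R0 * math.sin(theta)) ** 2]]

def inverse_metric(theta, phi):
    """g^{ab}; inverting entry by entry is valid because this metric is diagonal."""
    g = metric(theta, phi)
    return [[1.0 / g[0][0], 0.0], [0.0, 1.0 / g[1][1]]]

def d_metric(theta, phi, h=1e-5):
    """dg[s][a][b] = partial derivative of g_{ab} along coordinate s (central difference)."""
    plus = [metric(theta + h, phi), metric(theta, phi + h)]
    minus = [metric(theta - h, phi), metric(theta, phi - h)]
    return [
        [[(plus[s][a][b] - minus[s][a][b]) / (2 * h) for b in range(2)] for a in range(2)]
        for s in range(2)
    ]

def christoffel(theta, phi):
    """gamma[m][n][l] = Gamma^m_{n l}."""
    g_inv = inverse_metric(theta, phi)
    dg = d_metric(theta, phi)
    idx = range(2)
    return [
        [
            [
                0.5 * sum(g_inv[m][s] * (dg[n][s][l] + dg[l][s][n] - dg[s][n][l]) for s in idx)
                for l in idx
            ]
            for n in idx
        ]
        for m in idx
    ]

theta = 1.0
gamma = christoffel(theta, 0.0)
print(gamma[0][1][1], -math.sin(theta) * math.cos(theta))
print(gamma[1][0][1], gamma[1][1][0], 1.0 / math.tan(theta))
```

Each printed line should show numerical values that agree with the closed-form expressions, and the radius drops out of the Christoffel symbols entirely.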

The components of the Riemann curvature tensor are given by

\displaystyle R^{\theta}_{\ \varphi\theta\varphi}=(\text{sin}(\theta))^{2}

\displaystyle R^{\theta}_{\ \varphi\varphi\theta}=-(\text{sin}(\theta))^{2}

\displaystyle R^{\varphi}_{\ \theta\theta\varphi}=-1

\displaystyle R^{\varphi}_{\ \theta\varphi\theta}=1

The other components (there are still twelve of them, so I won’t bother writing all their symbols down here anymore) are all equal to zero.

The components of the Ricci tensor are

\displaystyle R_{\theta\theta}=1

\displaystyle R_{\varphi\varphi}=(\text{sin}(\theta))^{2}

The other components (R_{\theta\varphi} and R_{\varphi\theta}) are both equal to zero.

Finally, the Ricci scalar is

\displaystyle R=\frac{2}{R_{0}^{2}}
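
As a quick check of this last step: since the metric is diagonal, its inverse has components g^{\theta\theta}=1/R_{0}^{2} and g^{\varphi\varphi}=1/(R_{0}^{2}(\text{sin}(\theta))^{2}), so raising an index on the Ricci tensor and contracting gives

\displaystyle R=g^{\theta\theta}R_{\theta\theta}+g^{\varphi\varphi}R_{\varphi\varphi}=\frac{1}{R_{0}^{2}}+\frac{(\text{sin}(\theta))^{2}}{R_{0}^{2}(\text{sin}(\theta))^{2}}=\frac{2}{R_{0}^{2}}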

We note that the larger the radius of the 2-sphere, the smaller the curvature. We can see this intuitively, for example, when it comes to the surface of our planet, which appears flat because the radius is so large. If our planet were much smaller, this would not be the case.
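
For readers who would like to check the results of this example without doing the algebra by hand, the following is a minimal numerical sketch in Python, using only the standard library. All the function names are our own, the derivatives are approximated by central finite differences (so the answers are only accurate to several decimal places), and we assume the index convention R^{\rho}_{\ \sigma\mu\nu}=\partial_{\mu}\Gamma^{\rho}_{\nu\sigma}-\partial_{\nu}\Gamma^{\rho}_{\mu\sigma}+\Gamma^{\rho}_{\mu\lambda}\Gamma^{\lambda}_{\nu\sigma}-\Gamma^{\rho}_{\nu\lambda}\Gamma^{\lambda}_{\mu\sigma} with R_{\sigma\nu}=R^{\lambda}_{\ \sigma\lambda\nu}, as in the book of Carroll listed among the references, which reproduces the components listed above. The indices 0 and 1 stand for \theta and \varphi.

```python
import math

H = 1e-4  # finite-difference step (an arbitrary but reasonable choice)


def metric(x, r0):
    """Metric g_{mn} of a 2-sphere of radius r0 at the point x = (theta, phi)."""
    theta = x[0]
    return [[r0 ** 2, 0.0], [0.0, (r0 * math.sin(theta)) ** 2]]


def inverse_metric(x, r0):
    """Inverse metric g^{mn}; this shortcut is valid only because g is diagonal."""
    g = metric(x, r0)
    return [[1.0 / g[0][0], 0.0], [0.0, 1.0 / g[1][1]]]


def shifted(x, k, h):
    """Return a copy of the point x with coordinate k shifted by h."""
    y = list(x)
    y[k] += h
    return y


def christoffel(i, j, k, x, r0):
    """Gamma^i_{jk} = (1/2) g^{il} (d_j g_{lk} + d_k g_{lj} - d_l g_{jk})."""
    ginv = inverse_metric(x, r0)

    def dg(a, b, c):  # numerical partial derivative d_c g_{ab}
        up, dn = metric(shifted(x, c, H), r0), metric(shifted(x, c, -H), r0)
        return (up[a][b] - dn[a][b]) / (2 * H)

    return 0.5 * sum(
        ginv[i][l] * (dg(l, k, j) + dg(l, j, k) - dg(j, k, l)) for l in range(2)
    )


def riemann(rho, sigma, mu, nu, x, r0):
    """R^rho_{sigma mu nu} in the convention stated in the text above."""

    def dgamma(a, b, c, k):  # numerical partial derivative d_k Gamma^a_{bc}
        up = christoffel(a, b, c, shifted(x, k, H), r0)
        dn = christoffel(a, b, c, shifted(x, k, -H), r0)
        return (up - dn) / (2 * H)

    def gam(a, b, c):
        return christoffel(a, b, c, x, r0)

    quadratic = sum(
        gam(rho, mu, l) * gam(l, nu, sigma) - gam(rho, nu, l) * gam(l, mu, sigma)
        for l in range(2)
    )
    return dgamma(rho, nu, sigma, mu) - dgamma(rho, mu, sigma, nu) + quadratic


def ricci(sigma, nu, x, r0):
    """Ricci tensor R_{sigma nu}, obtained by contracting the Riemann tensor."""
    return sum(riemann(l, sigma, l, nu, x, r0) for l in range(2))


def ricci_scalar(x, r0):
    """Ricci scalar R = g^{sigma nu} R_{sigma nu}."""
    ginv = inverse_metric(x, r0)
    return sum(ginv[s][n] * ricci(s, n, x, r0) for s in range(2) for n in range(2))


if __name__ == "__main__":
    point = (1.0, 0.5)  # an arbitrary point away from the poles
    for radius in (1.0, 2.0, 10.0):
        print(radius, ricci_scalar(point, radius), 2.0 / radius ** 2)
```

Running the script prints, for each radius, the numerically computed Ricci scalar next to the exact value 2/R_{0}^{2}, which makes the decrease of the curvature with increasing radius easy to see. The point should be kept away from the poles \theta=0 and \theta=\pi, where these coordinates break down and \text{cot}(\theta) blows up.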

Bonus: The Einstein Field Equations of General Relativity

Given what we have discussed in this post, we can now write down the Einstein field equations (also known simply as Einstein’s equations) of general relativity. They are given in terms of the Ricci tensor, the Ricci scalar, and the metric (of spacetime) via the following equation:

\displaystyle R_{\mu\nu}-\frac{1}{2}Rg_{\mu\nu}+\Lambda g_{\mu\nu}=\frac{8\pi}{c^{4}} GT_{\mu\nu}

where G is the gravitational constant, the same constant that appears in Newton’s law of universal gravitation (which emerges from Einstein’s equations in the limit of weak gravitational fields and slow speeds), c is the speed of light in a vacuum, and T_{\mu\nu} is the energy-momentum tensor (also known as the stress-energy tensor), which gives the “density” of energy and momentum, as well as certain other related quantities, such as the pressure and shear stress. The symbol \Lambda refers to what is known as the cosmological constant, which was not there in Einstein’s original formulation but was later added to support his view of an unchanging universe. With the rise of Georges Lemaître’s theory of an expanding universe, which came to be known as the Big Bang theory, the cosmological constant was abandoned. More recently, the universe was found to be not only expanding, but expanding at an accelerating rate, necessitating the return of the cosmological constant, with an interpretation in terms of the “vacuum energy”, also known as “dark energy”. Today the nature of the cosmological constant remains one of the great mysteries of modern physics.
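
To get a feeling for the size of the constant multiplying T_{\mu\nu}, we can simply evaluate it. The following short Python sketch does this in SI units; the function name is our own, and the value of G is the CODATA 2018 recommended value, which is an input we supply and not something taken from this post.

```python
import math

G = 6.67430e-11    # gravitational constant in m^3 kg^-1 s^-2 (CODATA 2018 value)
C = 299_792_458.0  # speed of light in a vacuum in m/s (exact by definition)


def einstein_coupling(g=G, c=C):
    """Return 8*pi*G/c^4, the factor multiplying T_{mu nu}, in s^2 m^-1 kg^-1."""
    return 8 * math.pi * g / c ** 4


if __name__ == "__main__":
    print(einstein_coupling())  # of the order of 1e-43 in SI units
```

The result is extremely small in everyday units, which is one way of seeing why enormous concentrations of energy and momentum are needed to curve spacetime appreciably.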

Bonus: Connection and Curvature in Quantum Electrodynamics

The concepts of connection and curvature also appear in quantum field theory, in particular quantum electrodynamics (see Some Basics of (Quantum) Electrodynamics). They are the underlying concepts of gauge theory, of which quantum electrodynamics is probably the simplest example. However, this is an example of differential geometry which does not make use of the metric. We consider a fiber bundle, where the base space is flat spacetime (also known as Minkowski spacetime), and the fiber is \text{U}(1), which is the group formed by the complex numbers with magnitude equal to 1, with law of composition given by multiplication (we can also think of this as a circle).

We want the group \text{U}(1) to act on the wave function (or field operator) \psi(x), so that the wave function has a “phase”, i.e. we have e^{i\phi(x)}\psi(x), where e^{i\phi(x)} is a complex number of magnitude 1 which depends on the location x in spacetime. This means that the values of the wave function at different points in spacetime will in general carry different values of the “phase”. In order to compare them, we need a connection and a covariant derivative.

The connection we want is given by

\displaystyle i\frac{q}{\hbar c}A_{\mu}

where q is the charge of the electron, \hbar is the reduced Planck constant, c is the speed of light in a vacuum, and A_{\mu} is the four-potential of electrodynamics.

The covariant derivative (here written using the symbol D_{\mu}) is

\displaystyle D_{\mu}\psi(x)=\partial_{\mu}\psi(x)+i\frac{q}{\hbar c}A_{\mu}\psi(x)
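
To see why this combination is the right one, we can check how it behaves when the phase is changed. This is only a sketch, under the assumption (standard in electrodynamics, but not stated above) that the four-potential is shifted along with the phase:

\displaystyle \psi(x)\rightarrow e^{i\phi(x)}\psi(x),\qquad A_{\mu}\rightarrow A_{\mu}-\frac{\hbar c}{q}\partial_{\mu}\phi(x)

The extra term coming from the derivative of the phase factor is then cancelled by the shift in the four-potential:

\displaystyle D_{\mu}\psi(x)\rightarrow e^{i\phi(x)}\left(\partial_{\mu}\psi(x)+i(\partial_{\mu}\phi(x))\psi(x)+i\frac{q}{\hbar c}A_{\mu}\psi(x)-i(\partial_{\mu}\phi(x))\psi(x)\right)=e^{i\phi(x)}D_{\mu}\psi(x)

so that D_{\mu}\psi(x) picks up the same phase as \psi(x) itself. The ordinary derivative \partial_{\mu}\psi(x) alone does not have this property.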

We will also have a concept analogous to the Riemann curvature tensor, called the field strength tensor, denoted F_{\mu\nu}. Of course, our “curvature” in this case is not the literal curvature of spacetime, as we have already specified that our spacetime is flat, but an abstract notion of “curvature” that specifies how the phase of our wave function changes as we move around the spacetime. This field strength tensor is given by the following expression:

\displaystyle F_{\mu\nu}=\partial_{\mu}A_{\nu}-\partial_{\nu}A_{\mu}

This may be compared to the expression for the Riemann curvature tensor, where the connection is given by the Christoffel symbols. The first two terms of both expressions are very similar. The difference is that the expression for the Riemann curvature tensor has some extra terms that the expression for the field strength tensor does not have. However, a generalization of this procedure for quantum electrodynamics to groups other than \text{U}(1), called Yang-Mills theory, does feature extra terms in the expression for the field strength tensor that perhaps make the two more similar.
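
One way to make the analogy with curvature more concrete is through the standard fact that curvature measures the failure of covariant derivatives to commute. For the covariant derivative above, this reads [D_{\mu},D_{\nu}]\psi=i\frac{q}{\hbar c}F_{\mu\nu}\psi, with F_{\mu\nu}=\partial_{\mu}A_{\nu}-\partial_{\nu}A_{\mu}. The following Python sketch checks this numerically in a toy setting with only two spacetime coordinates (t, x). The four-potential and the wave function chosen here are arbitrary smooth functions made up purely for illustration, the constant K stands in for q/(\hbar c) and is set to 1, and all derivatives are central finite differences.

```python
import cmath
import math

K = 1.0   # stands in for q/(hbar c); set to 1 for illustration (an assumption)
H = 1e-3  # finite-difference step


def A(mu, p):
    """A made-up smooth potential A_mu at the point p = (t, x)."""
    t, x = p
    return [t * math.sin(x), x * x + math.cos(t)][mu]


def psi(p):
    """A made-up smooth complex wave function at the point p = (t, x)."""
    t, x = p
    return cmath.exp(1j * (0.7 * t + 0.3 * x)) * (1 + 0.5 * x * t)


def partial(f, mu, p):
    """Central-difference approximation to the partial derivative d_mu f at p."""
    up, dn = list(p), list(p)
    up[mu] += H
    dn[mu] -= H
    return (f(tuple(up)) - f(tuple(dn))) / (2 * H)


def D(f, mu):
    """Return the function p -> D_mu f(p) = d_mu f(p) + i K A_mu(p) f(p)."""
    return lambda p: partial(f, mu, p) + 1j * K * A(mu, p) * f(p)


def field_strength(mu, nu, p):
    """F_{mu nu} = d_mu A_nu - d_nu A_mu at the point p."""
    return partial(lambda q: A(nu, q), mu, p) - partial(lambda q: A(mu, q), nu, p)


def commutator(mu, nu, p):
    """[D_mu, D_nu] psi evaluated at the point p."""
    return D(D(psi, nu), mu)(p) - D(D(psi, mu), nu)(p)


if __name__ == "__main__":
    point = (0.4, 1.1)
    print(commutator(0, 1, point))
    print(1j * K * field_strength(0, 1, point) * psi(point))
```

The two printed complex numbers should agree up to the small error of the finite-difference approximation. All the terms involving derivatives of \psi cancel in the commutator, leaving only the two terms in F_{\mu\nu}; for the Christoffel symbols, and for the groups used in Yang-Mills theory, the analogous computation leaves additional terms quadratic in the connection.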

The concepts we have discussed here can be used to derive the theory of quantum electrodynamics simply from requiring that the Lagrangian (from which we can obtain the equations of motion, see also Lagrangians and Hamiltonians) be invariant under \text{U}(1) transformations, i.e. even if we change the “phase” of the wave function at every point, the Lagrangian remains the same. This is an example of what is known as gauge symmetry. Generalized to other groups such as \text{SU}(2) and \text{SU}(3), this is the idea behind gauge theories, which include Yang-Mills theory and lead to the standard model of particle physics.


Christoffel Symbols on Wikipedia

Riemannian Curvature Tensor on Wikipedia

Einstein Field Equations on Wikipedia

Gauge Theory on Wikipedia

Riemann Tensor for Surface of a Sphere on Physics Pages

Ricci Tensor and Curvature Scalar for a Sphere on Physics Pages

Spacetime and Geometry by Sean Carroll

Geometry, Topology, and Physics by Mikio Nakahara

Introduction to Elementary Particle Physics by David J. Griffiths

Introduction to Quantum Field Theory by Michael Peskin and Daniel V. Schroeder