Some Useful Links: Knots in Physics and Number Theory

In modern times, “knots” have been important objects of study in mathematics. These “knots” are akin to the ones we encounter in ordinary life, except that they have no loose ends. For a better idea of what I mean, consider the following picture of what is known as a “trefoil knot”:


More technically, a knot is defined as an embedding of a circle in 3-dimensional space. For more details on the theory of knots, the reader is referred to the following Wikipedia pages:

Knot on Wikipedia

Knot Theory on Wikipedia

One of the reasons why knots have become such a major part of modern mathematical research is because of the work of mathematical physicists such as Edward Witten, who has related them to the Feynman path integral in quantum mechanics (see Lagrangians and Hamiltonians).

Witten, who is very famous for his work on string theory (see An Intuitive Introduction to String Theory and (Homological) Mirror Symmetry) and for being the first, and so far only, physicist to win the prestigious Fields medal, himself explains the relationship between knot theory and quantum mechanics in the following article:

Knots and Quantum Theory by Edward Witten

But knots have also appeared in other branches of mathematics. For example, in number theory, the result in etale cohomology known as Artin-Verdier duality states that the integers are, in some sense, similar to a 3-dimensional object. In particular, because the integers have a trivial etale fundamental group (which is a kind of algebraic analogue of the fundamental group discussed in Homotopy Theory and Covering Spaces), they are similar to a 3-sphere (recall the common but somewhat confusing convention that the ordinary sphere we encounter in everyday life is called the 2-sphere, while a circle is also called the 1-sphere).

Note: The fact that a closed 3-dimensional space with a trivial fundamental group is a 3-sphere is the content of a very famous conjecture known as the Poincare conjecture, proved by Grigori Perelman in the early 2000s. Perelman refused the million-dollar prize that was supposed to be his reward, as well as the Fields medal.

The prime numbers, because their associated finite fields have one cover for every integer, are like circles, and recalling the definition of knots mentioned above, are therefore like knots on this 3-sphere. This analogy, originally developed by David Mumford and Barry Mazur, is better explained in the following post by Lieven le Bruyn on his blog neverendingbooks:

What is the Knot Associated to a Prime on neverendingbooks

Finally, given what we have discussed, could it be that knot theory can “tie together” (pun intended) physics and number theory? This is the motivation behind the new subject called “arithmetic Chern-Simons theory” which is introduced in the following paper by Minhyong Kim:

Arithmetic Chern-Simons Theory I by Minhyong Kim

Of course, it must also be clarified that this is not the only way by which physics and number theory are related. It is merely another way, a new and not yet thoroughly explored one, by which the unity of mathematics manifests itself via its many different branches helping one another.


Some Useful Links: Quantum Gravity Seminar by John Baez

I have not been able to make posts tackling physics in a while, since I have lately been focusing my efforts on some purely mathematical stuff which I’m trying very hard to understand. Hence my last few posts have been quite focused mostly on algebraic geometry and category theory. Such might perhaps be the trend in the coming days, although of course I still want to make more posts on physics at some point.

Of course, the “purely mathematical” stuff I’ve been posting about is still very much related to physics. For instance, in this post I’m going to link to a webpage collecting notes from seminars by mathematical physicist John Baez on the subject of quantum gravity – and much of it involves concepts from subjects like category theory and algebraic topology (for more on the basics of these subjects from this blog, see Category Theory, Homotopy Theory, and Homology and Cohomology).

Here’s the link:

Seminar by John Baez

As Baez himself says on the page, however, quantum gravity is not the only subject tackled in his seminars. Other subjects include topological quantum field theory, quantization, and gauge theory, among many others.

John Baez also has lots of other useful stuff on his website. One of the earliest mathematics and mathematical physics blogs on the internet is This Week’s Finds in Mathematical Physics, which apparently goes back all the way to 1995, and is one of the inspirations for this blog:

This Week’s Finds in Mathematical Physics by John Baez

Many of the posts on This Week’s Finds in Mathematical Physics show the countless fruitful, productive, and beautiful interactions between mathematics and physics. This is also one of the main goals of this blog – reflected even by the posts which have been focused on mostly “purely mathematical” stuff.

Connection and Curvature in Riemannian Geometry

In Geometry on Curved Spaces, we showed how different geometry can be when we are working on curved space instead of flat space, which we are usually more familiar with. We used the concept of a metric to express how the distance formula changes depending on where we are on this curved space. This gives us some way to “measure” the curvature of the space.

We also described the concept of parallel transport, which is in some way even more general than the metric, and can also be used to provide us with some measure of the curvature of a space. Although we can use concepts analogous to parallel transport even without the metric, if we do have a metric on the space and an expression for it, we can relate the concept of parallel transport to the metric, which is perhaps more intuitive. In this post, we formalize the concept of parallel transport by defining the Christoffel symbol and the Riemann curvature tensor, both of which we can obtain given the form of the metric. The Christoffel symbol and the Riemann curvature tensor are examples of the more general concepts of a connection and a curvature form, respectively, which need not be obtained from the metric.

Some Basics of Tensor Notation

First we establish some notation. We have already seen some tensor notation in Some Basics of (Quantum) Electrodynamics, but we explain a little bit more of that notation here, since it will be the language we will work in. Many of the ordinary vectors we are used to, such as the position, will be indexed by superscripts. We refer to these vectors as contravariant vectors. A common convention is to use Latin letters, such as i or j, as indices when we are working with space, and Greek letters, such as \mu and \nu, as indices when we are working with spacetime. Let us consider, for example, spacetime. An event in this spacetime is specified by its 4-position x^{\mu}, where x^{0}=ct, x^{1}=x, x^{2}=y, and x^{3}=z.

We will use the symbol g_{\mu\nu} for our metric, and we will also often express it as a matrix. For the case of flat spacetime, our metric is given by the Minkowski metric \eta_{\mu\nu}:

\displaystyle \eta_{\mu\nu}=\left(\begin{array}{cccc}-1&0&0&0\\0&1&0&0\\0&0&1&0\\ 0&0&0&1\end{array}\right)

We can use the metric to “raise” and “lower” indices. This is done by multiplying the metric and a vector, and summing over a common index (one will be a superscript and the other a subscript). We have introduced the Einstein summation convention in Some Basics of (Quantum) Electrodynamics, where repeated indices always imply summation, unless explicitly stated otherwise, and we will continue to use this convention for posts discussing differential geometry and the theory of relativity.

Here is an example of “lowering” the index of x^{\nu} in flat spacetime using the metric \eta_{\mu\nu} to obtain a new quantity x_{\mu}:

\displaystyle x_{\mu}=\eta_{\mu\nu}x^{\nu}

Explicitly, the components of the quantity x_{\mu} are given by x_{0}=-ct, x_{1}=x, x_{2}=y, and x_{3}=z. Note that the “time” component x_{0} has changed sign; this is because \eta_{00}=-1. A quantity such as x_{\mu}, which has a subscript index, is called a covariant vector.

In order to “raise” indices, we need the “inverse metric” g^{\mu\nu}. For the Minkowski metric \eta_{\mu\nu}, the inverse metric \eta^{\mu\nu} has the exact same components as \eta_{\mu\nu}, but for more general metrics this may not be the case. The general procedure for obtaining the inverse metric is to consider the expression

\displaystyle g_{\mu\nu}g^{\nu\rho}=\delta_{\mu}^{\rho}

where \delta_{\mu}^{\rho} is the Kronecker delta, a quantity that can be expressed as the matrix

\displaystyle \delta_{\mu}^{\rho}=\left(\begin{array}{cccc}1&0&0&0\\0&1&0&0\\0&0&1&0\\ 0&0&0&1\end{array}\right).
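Since the metric here is just a matrix of numbers, this index gymnastics can be checked numerically. The following is a minimal sketch in Python (the variable names and sample values are my own, purely for illustration):

```python
import numpy as np

# Minkowski metric eta_{mu nu} with signature (-, +, +, +)
eta = np.diag([-1.0, 1.0, 1.0, 1.0])

# a contravariant 4-position x^mu = (ct, x, y, z); the values are arbitrary
x_upper = np.array([2.0, 1.0, 3.0, 5.0])

# lowering the index: x_mu = eta_{mu nu} x^nu (sum over the repeated nu)
x_lower = eta @ x_upper
print(x_lower)  # [-2.  1.  3.  5.] -- only the time component changes sign

# the inverse metric satisfies eta^{mu nu} eta_{nu rho} = delta^mu_rho
eta_inv = np.linalg.inv(eta)
print(np.allclose(eta_inv @ eta, np.eye(4)))  # True
```

For the Minkowski metric the inverse is of course the same diagonal matrix, but the `np.linalg.inv` call is the honest general procedure.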

As a demonstration of what our notation can do, we recall the formula for the invariant spacetime interval:

\displaystyle (ds)^2=-(cdt)^2+(dx)^2+(dy)^2+(dz)^2

Using tensor notation combined with the Einstein summation convention, this can be written simply as

\displaystyle (ds)^2=\eta_{\mu\nu}dx^{\mu}dx^{\nu}.
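As a quick numerical sketch (the displacement values below are made up for illustration), `np.einsum` mirrors the index notation almost symbol for symbol:

```python
import numpy as np

eta = np.diag([-1.0, 1.0, 1.0, 1.0])
# an example displacement dx^mu = (c dt, dx, dy, dz); the numbers are arbitrary
dx = np.array([3.0, 1.0, 2.0, 2.0])

# (ds)^2 = eta_{mu nu} dx^mu dx^nu, with the double sum written as an einsum
ds2 = np.einsum('mn,m,n->', eta, dx, dx)
print(ds2)  # -9 + 1 + 4 + 4 = 0.0, a lightlike interval
```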

The Christoffel Symbol and the Covariant Derivative

We now come back to the Christoffel symbol \Gamma^{\mu}_{\nu\lambda}. The idea behind the Christoffel symbol is that it is used to define the covariant derivative \nabla_{\nu}V^{\mu} of a vector V^{\mu}.

The covariant derivative is a very important concept in differential geometry (and not just in Riemannian geometry). When we take derivatives, we are actually comparing two vectors. To further explain what we mean, we recall that individually the components of the vectors can be thought of as functions on the space, and we recall the expression for the derivative from An Intuitive Introduction to Calculus:

\displaystyle \frac{df}{dx}=\frac{f(x+\epsilon)-f(x)}{(x+\epsilon)-(x)} when \epsilon is extremely small (essentially negligible)

More formally, we can write

\displaystyle \frac{df}{dx}=\lim_{\epsilon\to 0}\frac{f(x+\epsilon)-f(x)}{(x+\epsilon)-(x)}.
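As a small numerical sketch (the function and the numbers are my own choices), we can approximate this limit by taking \epsilon small but finite:

```python
# finite-difference approximation of df/dx for f(x) = x**2 at x = 3
f = lambda x: x**2
eps = 1e-6
slope = (f(3 + eps) - f(3)) / ((3 + eps) - 3)
print(slope)  # approximately 6, the exact value of the derivative
```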

Therefore, employing the language of partial derivatives, we could have written the following partial derivative of the \mu-th component of an m-dimensional vector V^{\mu} on an m-dimensional space with respect to the coordinate x^{\nu}:

\displaystyle \frac{\partial V^{\mu}}{\partial x^{\nu}}=\lim_{\Delta x^{\nu}\to 0}\frac{V^{\mu}(x^{1},...,x^{\nu}+\Delta x^{\nu},...,x^{m})-V^{\mu}(x^{1},...,x^{\nu},...,x^{m})}{(x^{\nu}+\Delta x^{\nu})-(x^{\nu})}

The problem is that we are comparing vectors from different vector spaces. Recall from Vector Fields, Vector Bundles, and Fiber Bundles that we can think of a vector bundle as having a vector space for every point on the base space. The vector V^{\mu}(x^{1},...,x^{\nu}+\Delta x^{\nu},...,x^{m}) belongs to the vector space on the point (x^{1},...,x^{\nu}+\Delta x^{\nu},...,x^{m}), while the vector V^{\mu}(x^{1},...,x^{\nu},...,x^{m}) belongs to the vector space on the point (x^{1},...,x^{\nu},...,x^{m}). To be able to compare the two vectors we need to “transport” one to the other in the “correct” way, by which we mean parallel transport. Now we have seen in Geometry on Curved Spaces that parallel transport can have weird effects on vectors, and these weird effects are what the Christoffel symbol expresses.

Let \tilde{V}^{\mu}(x^{1},...,x^{\nu}+\Delta x^{\nu},...,x^{m}) denote the vector V^{\mu}(x^{1},...,x^{\nu},...,x^{m}) parallel transported from its original vector space on (x^{1},...,x^{\nu},...,x^{m}) to the vector space on (x^{1},...,x^{\nu}+\Delta x^{\nu},...,x^{m}). The vector \tilde{V}^{\mu}(x^{1},...,x^{\nu}+\Delta x^{\nu},...,x^{m}) is given by the following expression:

\displaystyle \tilde{V}^{\mu}(x^{1},...,x^{\nu}+\Delta x^{\nu},...,x^{m})=V^{\mu}(x^{1},...,x^{\nu},...,x^{m})-V^{\lambda}(x^{1},...,x^{\nu},...,x^{m})\Gamma^{\mu}_{\nu\lambda}(x^{1},...,x^{\nu},...,x^{m})\Delta x^{\nu}

Therefore the Christoffel symbol provides a “correction” for what happens when we parallel transport a vector from one point to another. This is an example of the concept of a connection, which, like the covariant derivative, is part of more general differential geometry beyond Riemannian geometry. The object that is to be parallel transported may not be a vector, for example when we have more general fiber bundles instead of vector bundles. However, in Riemannian geometry we will usually focus on vector bundles, in particular a special kind of vector bundle called the tangent bundle, which consists of the tangent vectors at a point.

Now there is more than one way to parallel transport a mathematical object, which means that there are many choices of a connection. However, in Riemannian geometry there is a special kind of connection that we will prefer. This is the connection that satisfies the following two properties:

\displaystyle \Gamma^{\mu}_{\nu\lambda}=\Gamma^{\mu}_{\lambda\nu}    (torsion-free)

\displaystyle \nabla_{\rho}g_{\mu\nu}=0    (metric compatibility)

The connection that satisfies these two properties is the one that can be obtained from the metric via the following formula:

\displaystyle \Gamma^{\mu}_{\nu\lambda}=\frac{1}{2}g^{\mu\sigma}(\partial_{\nu}g_{\sigma\lambda}+\partial_{\lambda}g_{\sigma\nu}-\partial_{\sigma}g_{\nu\lambda}).
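For a metric given in closed form, this formula can be evaluated symbolically. Here is a sketch using sympy (the helper function `christoffel` is a name of my own choosing), checked against the 2-sphere metric that appears later in this post:

```python
import sympy as sp

def christoffel(g, coords):
    """Gamma^mu_{nu lambda} = (1/2) g^{mu sigma} (d_nu g_{sigma lambda}
    + d_lambda g_{sigma nu} - d_sigma g_{nu lambda})"""
    n = len(coords)
    g_inv = g.inv()
    return [[[sp.simplify(sum(
        sp.Rational(1, 2) * g_inv[mu, s]
        * (sp.diff(g[s, lam], coords[nu]) + sp.diff(g[s, nu], coords[lam])
           - sp.diff(g[nu, lam], coords[s]))
        for s in range(n)))
        for lam in range(n)] for nu in range(n)] for mu in range(n)]

# check against the 2-sphere metric diag(R0^2, R0^2 sin^2(theta))
theta, phi, R0 = sp.symbols('theta phi R_0', positive=True)
Gamma = christoffel(sp.diag(R0**2, R0**2 * sp.sin(theta)**2), [theta, phi])
print(Gamma[0][1][1])  # Gamma^theta_{phi phi}, equal to -sin(theta)*cos(theta)
```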

The covariant derivative is then defined as

\displaystyle \nabla_{\nu}V^{\mu}=\lim_{\Delta x^{\nu}\to 0}\frac{V^{\mu}(x^{1},...,x^{\nu}+\Delta x^{\nu},...,x^{m})-\tilde{V}^{\mu}(x^{1},...,x^{\nu}+\Delta x^{\nu},...,x^{m})}{(x^{\nu}+\Delta x^{\nu})-(x^{\nu})}.

We are now comparing vectors belonging to the same vector space, and evaluating the expression above leads to the formula for the covariant derivative:

\displaystyle \nabla_{\nu}V^{\mu}=\partial_{\nu}V^{\mu}+\Gamma^{\mu}_{\nu\lambda}V^{\lambda}.

The Riemann Curvature Tensor

Next we consider the quantity known as the Riemann curvature tensor. It is once again related to parallel transport, in the following manner. Consider parallel transporting a vector V^{\sigma} through an “infinitesimal” distance specified by another vector A^{\mu}, and after that, through another infinitesimal distance specified by yet another vector B^{\nu}. Then we parallel transport it in the opposite direction to A^{\mu}, and finally in the opposite direction to B^{\nu}. The path forms a parallelogram, and when the vector V^{\sigma} returns to its starting point it will have changed by an amount \delta V^{\rho}. We can think of the Riemann curvature tensor as the quantity that relates all of these:

\displaystyle \delta V^{\rho}=R^{\rho}_{\ \sigma\mu\nu}V^{\sigma}A^{\mu}B^{\nu}.

Another way to put this is to consider taking the covariant derivative of the vector V^{\rho} along the same path as described above. The Riemann curvature tensor is then related to this quantity as follows:

\displaystyle \nabla_{\mu}\nabla_{\nu}V^{\rho}-\nabla_{\nu}\nabla_{\mu}V^{\rho}=R^{\rho}_{\ \sigma\mu\nu}V^{\sigma}.

Expanding the left hand side, and using the torsion-free property of the Christoffel symbol, we will find that

\displaystyle R^{\rho}_{\ \sigma\mu\nu}=\partial_{\mu}\Gamma^{\rho}_{\nu\sigma}-\partial_{\nu}\Gamma^{\rho}_{\mu\sigma}+\Gamma^{\rho}_{\mu\lambda}\Gamma^{\lambda}_{\nu\sigma}-\Gamma^{\rho}_{\nu\lambda}\Gamma^{\lambda}_{\mu\sigma}.

For connections other than the torsion-free one that we chose, there will be another part of the expansion of the expression \nabla_{\mu}\nabla_{\nu}-\nabla_{\nu}\nabla_{\mu} called the torsion tensor. For our case, however, we need not worry about it and we can focus on the Riemann curvature tensor.

There is another quantity that can be obtained from the Riemann curvature tensor called the Ricci tensor, denoted by R_{\mu\nu}. It is given by

\displaystyle R_{\mu\nu}=R^{\lambda}_{\ \mu\lambda\nu}.

Following the Einstein summation convention, we sum over the repeated index \lambda, and therefore the resulting quantity will have only two indices instead of four. This is an example of the operation on tensors called contraction. If we raise one index using the metric and contract again, we obtain a quantity called the Ricci scalar, denoted R:

\displaystyle R=R^{\mu}_{\ \mu}
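Contraction is straightforward to express numerically. As a sketch of the index bookkeeping (the array below is filled with random values, standing in for an actual curvature tensor):

```python
import numpy as np

rng = np.random.default_rng(0)
riemann = rng.standard_normal((4, 4, 4, 4))  # stand-in for R^rho_{sigma mu nu}

# R_{mu nu} = R^lambda_{mu lambda nu}: set the 1st and 3rd slots equal and sum
ricci = np.einsum('lmln->mn', riemann)
print(ricci.shape)  # (4, 4): contraction removes two of the four indices
```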

Example: The 2-Sphere

To provide an explicit example of the concepts discussed, we show their specific expressions for the case of a 2-sphere. We will only give the final results here. The explicit computations can be found among the references, but the reader may gain some practice, especially on manipulating tensors, by performing the calculations and checking only the answers here. In any case, since the metric is given, it is only a matter of substituting the relevant quantities into the formulas already given above.

We have already given the expression for the metric of the 2-sphere in Geometry on Curved Spaces. We recall that, in matrix form, it is given by (we change our notation for the radius of the 2-sphere to R_{0} to avoid confusion with the symbol for the Ricci scalar)

\displaystyle g_{mn}= \left(\begin{array}{cc}R_{0}^{2}&0\\ 0&R_{0}^{2}\text{sin}(\theta)^{2}\end{array}\right)

Individually, the components are (we will use \theta and \varphi instead of the numbers 1 and 2 for the indices)

\displaystyle g_{\theta\theta}=R_{0}^{2}

\displaystyle g_{\varphi\varphi}=R_{0}^{2}(\text{sin}(\theta))^{2}

The other components (g_{\theta\varphi} and g_{\varphi\theta}) are all equal to zero.

The Christoffel symbols are therefore given by

\displaystyle \Gamma^{\theta}_{\varphi\varphi}=-\text{sin}(\theta)\text{cos}(\theta)

\displaystyle \Gamma^{\varphi}_{\theta\varphi}=\text{cot}(\theta)

\displaystyle \Gamma^{\varphi}_{\varphi\theta}=\text{cot}(\theta)

The other components (\Gamma^{\theta}_{\theta\theta}, \Gamma^{\theta}_{\theta\varphi}, \Gamma^{\theta}_{\varphi\theta}, \Gamma^{\varphi}_{\theta\theta}, and \Gamma^{\varphi}_{\varphi\varphi}) are all equal to zero.

The components of the Riemann curvature tensor are given by

\displaystyle R^{\theta}_{\ \varphi\theta\varphi}=(\text{sin}(\theta))^{2}

\displaystyle R^{\theta}_{\ \varphi\varphi\theta}=-(\text{sin}(\theta))^{2}

\displaystyle R^{\varphi}_{\ \theta\theta\varphi}=-1

\displaystyle R^{\varphi}_{\ \theta\varphi\theta}=1

The other components (there are still twelve of them, so I won’t bother writing all their symbols down here) are all equal to zero.

The components of the Ricci tensor are

\displaystyle R_{\theta\theta}=1

\displaystyle R_{\varphi\varphi}=(\text{sin}(\theta))^{2}

The other components (R_{\theta\varphi} and R_{\varphi\theta}) are all equal to zero.

Finally, the Ricci scalar is

\displaystyle R=\frac{2}{R_{0}^{2}}

We note that the larger the radius of the 2-sphere, the smaller the curvature. We can see this intuitively, for example, in the surface of our planet, which appears flat because its radius is so large. If our planet were much smaller, this would not be the case.
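As a sketch, all of the 2-sphere results above can be verified symbolically with sympy, using the formulas for the Christoffel symbols, the Riemann curvature tensor, the Ricci tensor, and the Ricci scalar given earlier in this post:

```python
import sympy as sp

theta, phi, R0 = sp.symbols('theta phi R_0', positive=True)
coords = [theta, phi]
n = 2
g = sp.diag(R0**2, R0**2 * sp.sin(theta)**2)  # the 2-sphere metric
g_inv = g.inv()

# Christoffel symbols Gamma^a_{b c} from the metric
Gamma = [[[sp.simplify(sum(
    sp.Rational(1, 2) * g_inv[a, s]
    * (sp.diff(g[s, c], coords[b]) + sp.diff(g[s, b], coords[c])
       - sp.diff(g[b, c], coords[s]))
    for s in range(n)))
    for c in range(n)] for b in range(n)] for a in range(n)]

# Riemann curvature tensor R^rho_{sigma mu nu}
def riemann(r, s, m, v):
    expr = sp.diff(Gamma[r][v][s], coords[m]) - sp.diff(Gamma[r][m][s], coords[v])
    expr += sum(Gamma[r][m][l] * Gamma[l][v][s]
                - Gamma[r][v][l] * Gamma[l][m][s] for l in range(n))
    return sp.simplify(expr)

# Ricci tensor R_{mu nu} = R^lambda_{mu lambda nu} and Ricci scalar R
ricci = [[sp.simplify(sum(riemann(l, m, l, v) for l in range(n)))
          for v in range(n)] for m in range(n)]
R_scalar = sp.simplify(sum(g_inv[m, v] * ricci[m][v]
                           for m in range(n) for v in range(n)))
print(ricci[0][0], R_scalar)  # R_{theta theta} = 1 and R = 2/R_0**2
```

The reader performing the calculations by hand can use a script like this to check intermediate steps as well as the final answers.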

Bonus: The Einstein Field Equations of General Relativity

Given what we have discussed in this post, we can now write down here the expression for the Einstein field equations (also known simply as Einstein’s equations) of general relativity. It is given in terms of the Ricci tensor and the metric (of spacetime) via the following equation:

\displaystyle R_{\mu\nu}-\frac{1}{2}Rg_{\mu\nu}+\Lambda g_{\mu\nu}=\frac{8\pi G}{c^{4}}T_{\mu\nu}

where G is the gravitational constant, the same constant that appears in Newton’s law of universal gravitation (which Einstein’s equations approximate in certain limiting conditions), c is the speed of light in a vacuum, and T_{\mu\nu} is the energy-momentum tensor (also known as the stress-energy tensor), which gives the “density” of energy and momentum, as well as certain other related quantities, such as the pressure and shear stress. The symbol \Lambda refers to what is known as the cosmological constant, which was not present in Einstein’s original formulation but was later added to support his view of an unchanging universe. With the advent of Georges Lemaitre’s theory of an expanding universe, later known as the Big Bang theory, the cosmological constant was abandoned. More recently, the universe was found to be not only expanding, but expanding at an accelerating rate, necessitating the return of the cosmological constant, now interpreted in terms of the “vacuum energy”, also known as “dark energy”. Today the nature of the cosmological constant remains one of the great mysteries of modern physics.

Bonus: Connection and Curvature in Quantum Electrodynamics

The concepts of connection and curvature also appear in quantum field theory, in particular quantum electrodynamics (see Some Basics of (Quantum) Electrodynamics). They are the underlying concepts in gauge theory, of which quantum electrodynamics is probably the simplest example. However, it is an example of differential geometry which does not make use of the metric. We consider a fiber bundle, where the base space is flat spacetime (also known as Minkowski spacetime), and the fiber is \text{U}(1), which is the group formed by the complex numbers with magnitude equal to 1, with law of composition given by multiplication (we can also think of this as a circle).

We want the group \text{U}(1) to act on the wave function (or field operator) \psi(x), so that the wave function has a “phase”, i.e. we have e^{i\phi(x)}\psi(x), where e^{i\phi(x)} is a complex number which depends on the location x in spacetime. Note that the values of the wave function at different points in spacetime will therefore have different values of the “phase”. In order to compare them, we need a connection and a covariant derivative.

The connection we want is given by

\displaystyle i\frac{q}{\hbar c}A_{\mu}

where q is the charge of the electron, \hbar is the normalized Planck’s constant, c is the speed of light in a vacuum, and A_{\mu} is the four-potential of electrodynamics.

The covariant derivative (here written using the symbol D_{\mu}) is

\displaystyle D_{\mu}\psi(x)=\partial_{\mu}\psi(x)+i\frac{q}{\hbar c}A_{\mu}\psi(x)
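The point of this covariant derivative is that a local change of phase in \psi can be absorbed by a compensating shift of A_{\mu}, so that D_{\mu}\psi picks up the same phase as \psi itself. Here is a one-dimensional symbolic sketch; the particular compensating shift of the potential is an assumption here (it is the standard gauge transformation, not derived in this post):

```python
import sympy as sp

x = sp.symbols('x', real=True)
q, hbar, c = sp.symbols('q hbar c', positive=True)
psi = sp.Function('psi')(x)
phase = sp.Function('phi')(x)
A = sp.Function('A')(x)

def D(f, A_field):
    # covariant derivative D = d/dx + i (q/(hbar c)) A
    return sp.diff(f, x) + sp.I * q / (hbar * c) * A_field * f

# shift the phase of psi and the potential together
A_shifted = A - (hbar * c / q) * sp.diff(phase, x)
difference = (D(sp.exp(sp.I * phase) * psi, A_shifted)
              - sp.exp(sp.I * phase) * D(psi, A))
print(sp.expand(difference))  # 0: the phase passes through the covariant derivative
```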

We will also have a concept analogous to the Riemann curvature tensor, called the field strength tensor, denoted F_{\mu\nu}. Of course, our “curvature” in this case is not the literal curvature of spacetime, as we have already specified that our spacetime is flat, but an abstract notion of “curvature” that specifies how the phase of our wavefunction changes as we move around the spacetime. This field strength tensor is given by the following expression:

\displaystyle F_{\mu\nu}=\partial_{\mu}A_{\nu}-\partial_{\nu}A_{\mu}
This may be compared to the expression for the Riemann curvature tensor, where the connection is given by the Christoffel symbols. The first two terms of both expressions are very similar. The difference is that the expression for the Riemann curvature tensor has some extra terms that the expression for the field strength tensor does not have. However, a generalization of this procedure for quantum electrodynamics to groups other than \text{U}(1), called Yang-Mills theory, does feature extra terms in the expression for the field strength tensor that perhaps makes the two more similar.

The concepts we have discussed here can be used to derive the theory of quantum electrodynamics simply from requiring that the Lagrangian (from which we can obtain the equations of motion, see also Lagrangians and Hamiltonians) be invariant under \text{U}(1) transformations, i.e. even if we change the “phase” of the wave function at every point the Lagrangian remains the same. This is an example of what is known as gauge symmetry. Generalized to other groups such as \text{SU}(2) and \text{SU}(3), this is the idea behind gauge theories, which include Yang-Mills theory and leads to the standard model of particle physics.


Christoffel Symbols on Wikipedia

Riemann Curvature Tensor on Wikipedia

Einstein Field Equations on Wikipedia

Gauge Theory on Wikipedia

Riemann Tensor for Surface of a Sphere on Physics Pages

Ricci Tensor and Curvature Scalar for a Sphere on Physics Pages

Spacetime and Geometry by Sean Carroll

Geometry, Topology, and Physics by Mikio Nakahara

Introduction to Elementary Particles by David J. Griffiths

An Introduction to Quantum Field Theory by Michael Peskin and Daniel V. Schroeder

Some Basics of (Quantum) Electrodynamics

There are only four fundamental forces as far as we know, and every force that we know of can ultimately be considered as manifestations of these four. These four are electromagnetism, the weak nuclear force, the strong nuclear force, and gravity. Among them, the one we are most familiar with is electromagnetism, both in terms of our everyday experience (where it is somewhat on par with gravity) and in terms of our physical theories (where our understanding of electrodynamics is far ahead of our understanding of the other three forces, including, and especially, gravity).

Electromagnetism is dominant in everyday life because the weak and strong nuclear forces have a very short range, and because gravity is very weak. Now gravity doesn’t seem weak at all, especially if we have experienced falling on our face at some point in our lives. But that’s only because the “source” of this gravity, our planet, is very large. Imagine, however, a small pocket-sized magnet lifting, say, an iron nail against the force exerted by the Earth’s gravity. This shows how much stronger the electromagnetic force is compared to gravity. Maybe we should be thankful that gravity is not on the same level of strength, or falling on our face would be so much more painful.

It is also important to note that atoms, which make up everyday matter, are themselves made up of charged particles – electrons and protons (there are also neutrons, which are uncharged). Electromagnetism therefore plays an important part, not only in keeping the “parts” of an atom together, but also in “joining” different atoms together to form molecules, and other larger structures like crystals. It might be gravity that keeps our planet together, but for less massive objects like a house, or a car, or a human body, it is electromagnetism that keeps them from falling apart.

Aside from electromagnetism being the one fundamental force we are most familiar with, another reason to study it is that it is the “template” for our understanding of the rest of the fundamental forces, including gravity. In Einstein’s general theory of relativity, gravity is the curvature of spacetime; it appears that this gives it a nature different from the other fundamental forces. But even then, the expression for this curvature, in terms of the Riemann curvature tensor, is very similar in form to the equation for the electromagnetic fields in terms of the field strength tensor.

The electromagnetic fields, which we shall divide into the electric field and the magnetic field, are vector fields (see Vector Fields, Vector Bundles, and Fiber Bundles), which means that they have a value (both magnitude and direction) at every point in space. A charged particle in an electric or magnetic field (or both) will experience a force according to the Lorentz force law:

\displaystyle F_{x}=q(E_{x}+v_{y}B_{z}-v_{z}B_{y})

\displaystyle F_{y}=q(E_{y}+v_{z}B_{x}-v_{x}B_{z})

\displaystyle F_{z}=q(E_{z}+v_{x}B_{y}-v_{y}B_{x})

where F_{x}, F_{y}, and F_{z} are the three components of the force, in the x, y, and z direction, respectively; E_{x}, E_{y}, and E_{z} are the three components of the electric field; B_{x}, B_{y}, and B_{z} are the three components of the magnetic field; v_{x}, v_{y}, and v_{z} are the three components of the velocity of the particle; and q is its charge. Newton’s second law (see My Favorite Equation in Physics) gives us the motion of an object given the force acting on it (and its mass), so together with the Lorentz force law, we can determine the motion of charged particles in electric and magnetic fields.
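The three component equations are just the expansion of a single cross product. A minimal numerical sketch (the function name and sample values are my own, and the units are chosen so that no factor of c appears, matching the component equations above):

```python
import numpy as np

def lorentz_force(q, E, v, B):
    # F = q (E + v x B); np.cross reproduces the three component equations above
    return q * (E + np.cross(v, B))

# example: charge q = 2 in E along x, B along z, moving along y
F = lorentz_force(2.0, np.array([1.0, 0.0, 0.0]),
                  np.array([0.0, 3.0, 0.0]), np.array([0.0, 0.0, 1.0]))
print(F)  # [8. 0. 0.]
```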

The Lorentz force law is extremely important in electrodynamics and we will keep the following point in mind throughout this discussion:

The Lorentz force law tells us how charges move under the influence of electric and magnetic fields.

Instead of discussing electrodynamics in terms of these fields, however, we will instead focus on the electric and magnetic potentials, which together form what is called the four-potential and are related to the fields in terms of the following equations:

\displaystyle E_{x}=-\frac{1}{c}\frac{\partial A_{x}}{\partial t}-\frac{\partial A_{t}}{\partial x}

\displaystyle E_{y}=-\frac{1}{c}\frac{\partial A_{y}}{\partial t}-\frac{\partial A_{t}}{\partial y}

\displaystyle E_{z}=-\frac{1}{c}\frac{\partial A_{z}}{\partial t}-\frac{\partial A_{t}}{\partial z}

\displaystyle B_{x}=\frac{\partial A_{z}}{\partial y}-\frac{\partial A_{y}}{\partial z}

\displaystyle B_{y}=\frac{\partial A_{x}}{\partial z}-\frac{\partial A_{z}}{\partial x}

\displaystyle B_{z}=\frac{\partial A_{y}}{\partial x}-\frac{\partial A_{x}}{\partial y}
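As a symbolic sketch of these relations, consider the sample four-potential A_{t}=0, (A_{x},A_{y},A_{z})=(-B_{0}y/2, B_{0}x/2, 0), chosen by hand (it is not taken from this post) and expected to describe a uniform magnetic field along z:

```python
import sympy as sp

t, x, y, z, c, B0 = sp.symbols('t x y z c B_0')

# a sample four-potential (chosen by hand for illustration)
A_t = sp.Integer(0)
A_x, A_y, A_z = -B0 * y / 2, B0 * x / 2, sp.Integer(0)

# fields from the potentials, using the relations above
E_x = -sp.diff(A_x, t) / c - sp.diff(A_t, x)
B_z = sp.diff(A_y, x) - sp.diff(A_x, y)
print(E_x, B_z)  # 0 B_0: no electric field, a uniform magnetic field along z
```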

The values of the potentials, as functions of space and time, are related to the distribution of charges and currents by the very famous set of equations called Maxwell’s equations:

\displaystyle -\frac{\partial^{2} A_{t}}{\partial x^{2}}-\frac{\partial^{2} A_{t}}{\partial y^{2}}-\frac{\partial^{2} A_{t}}{\partial z^{2}}-\frac{1}{c}\frac{\partial^{2} A_{x}}{\partial t\partial x}-\frac{1}{c}\frac{\partial^{2} A_{y}}{\partial t\partial y}-\frac{1}{c}\frac{\partial^{2} A_{z}}{\partial t\partial z}=\frac{4\pi}{c}J_{t}

\displaystyle \frac{1}{c^{2}}\frac{\partial^{2} A_{x}}{\partial t^{2}}-\frac{\partial^{2} A_{x}}{\partial y^{2}}-\frac{\partial^{2} A_{x}}{\partial z^{2}}+\frac{1}{c}\frac{\partial^{2} A_{t}}{\partial x\partial t}+\frac{\partial^{2} A_{y}}{\partial x\partial y}+\frac{\partial^{2} A_{z}}{\partial x\partial z}=\frac{4\pi}{c}J_{x}

\displaystyle -\frac{\partial^{2} A_{y}}{\partial x^{2}}+\frac{1}{c^{2}}\frac{\partial^{2} A_{y}}{\partial t^{2}}-\frac{\partial^{2} A_{y}}{\partial z^{2}}+\frac{\partial^{2} A_{x}}{\partial y\partial x}+\frac{1}{c}\frac{\partial^{2} A_{t}}{\partial t\partial y}+\frac{\partial^{2} A_{z}}{\partial y\partial z}=\frac{4\pi}{c}J_{y}

\displaystyle -\frac{\partial^{2} A_{z}}{\partial x^{2}}-\frac{\partial^{2} A_{z}}{\partial y^{2}}+\frac{1}{c^{2}}\frac{\partial^{2} A_{z}}{\partial t^{2}}+\frac{\partial^{2} A_{x}}{\partial z\partial x}+\frac{\partial^{2} A_{y}}{\partial z\partial y}+\frac{1}{c}\frac{\partial^{2} A_{t}}{\partial z\partial t}=\frac{4\pi}{c}J_{z}

Some readers may be more familiar with Maxwell’s equations written in terms of the electric and magnetic fields; in that case, they have individual names: Gauss’ law, Gauss’ law for magnetism, Faraday’s law, and Ampere’s law (with Maxwell’s addition). When written down in terms of the fields, they can offer more physical intuition – for instance, Gauss’ law for magnetism tells us that the magnetic field has no “divergence”, and is always “solenoidal”. However, we leave this approach to the references for the moment, and focus on the potentials, which will be more useful for us when we relate our discussion to quantum mechanics later on. We will, however, always remind ourselves of the following important point:

Maxwell’s equations tell us the configuration and evolution of the electric and magnetic fields (possibly via the potentials) under the influence of sources (charge and current distributions).

There is one catch (an extremely interesting one) that comes about when dealing with potentials instead of fields. It is called gauge freedom, and it is one of the foundations of modern particle physics. However, we will not discuss it in this post. Our equations will remain correct, so the reader need not worry; gauge freedom is not a constraint, but is instead a kind of “symmetry” that has some very interesting consequences. This concept is left to the references for now; it is hoped, however, that it will be discussed on this blog at some future time.

The way we have written down Maxwell’s equations is rather messy. However, we can introduce some notation to write them in a more elegant form. We use what is known as tensor notation; however we will not discuss the concept of tensors in full here. We will just note that because the formula for the spacetime interval contains a sign different from the others, we need two different types of indices for our vectors. The so-called contravariant vectors will be indexed by a superscript, while the so-called covariant vectors will be indexed by a subscript. “Raising” and “lowering” these indices will involve a change in sign for some quantities; we will indicate them explicitly here.

Let x^{0}=ct, x^{1}=x, x^{2}=y, x^{3}=z. Then we will adopt the following notation:

\displaystyle \partial_{\mu}=\frac{\partial}{\partial x^{\mu}}

\displaystyle \partial^{\mu}=\frac{\partial}{\partial x^{\mu}} for \mu=0

\displaystyle \partial^{\mu}=-\frac{\partial}{\partial x^{\mu}} for \mu\neq 0

Let A^{0}=A_{t}, A^{1}=A_{x}, A^{2}=A_{y}, A^{3}=A_{z}. Then Maxwell’s equations can be written as

\displaystyle \sum_{\mu=0}^{3}\partial_{\mu}(\partial^{\mu}A^{\nu}-\partial^{\nu}A^{\mu})=\frac{4\pi}{c}J^{\nu}.

We now introduce the so-called Einstein summation convention. Note that the summation above is performed over the repeated index, and that one of these indices is a superscript while the other is a subscript. Albert Einstein noticed that almost all summations in his calculations happened in this way, so he adopted the convention that, instead of explicitly writing out the summation sign, a repeated index (one superscript and one subscript) would instead indicate that a summation should be performed. Like most modern references, we adopt this notation, and only explicitly say so when there is an exception. This allows us to write Maxwell’s equations as

\displaystyle \partial_{\mu}(\partial^{\mu}A^{\nu}-\partial^{\nu}A^{\mu})=\frac{4\pi}{c}J^{\nu}.
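As an aside, the mechanics of the summation convention are easy to play with on a computer. The following short Python sketch (using the numpy library; the matrix and vector here are made up purely for illustration, and have nothing to do with the physical fields) shows that summing explicitly over a repeated index agrees with numpy’s einsum function, which implements exactly this convention:

```python
import numpy as np

# A made-up array of coefficients M[mu][nu] and a made-up 4-vector v[nu],
# purely to illustrate summation over a repeated index.
M = np.arange(16.0).reshape(4, 4)
v = np.array([1.0, 2.0, 3.0, 4.0])

# Explicit summation over the repeated index nu...
w_explicit = np.array([sum(M[mu][nu] * v[nu] for nu in range(4))
                       for mu in range(4)])

# ...is exactly what the Einstein convention abbreviates; in einsum,
# a repeated label ('n' here) is summed over automatically.
w_einsum = np.einsum('mn,n->m', M, v)

assert np.allclose(w_explicit, w_einsum)
print(w_einsum)
```

In the subscript string 'mn,n->m', the label n appears twice and is therefore summed over, while m survives as a free index, mirroring how \nu remains free in Maxwell’s equations above.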

We can also use the Einstein summation convention to rewrite other important expressions in physics in more compact form. In particular, it allows us to rewrite the Dirac equation (see Some Basics of Relativistic Quantum Field Theory) as follows:

\displaystyle i\hbar\gamma^{\mu}\partial_{\mu}\psi-mc\psi=0

We now go to the quantum realm and discuss the equations of motion of quantum electrodynamics. Let A_{0}=A_{t}, A_{1}=-A_{x}, A_{2}=-A_{y}, A_{3}=-A_{z}. These equations are given by

\displaystyle i\hbar\gamma^{\mu}\partial_{\mu}\psi-mc\psi=\frac{q}{c}\gamma^{\mu}A_{\mu}\psi

\displaystyle \partial_{\mu}(\partial^{\mu}A^{\nu}-\partial^{\nu}A^{\mu})=4\pi q\bar{\psi}\gamma^{\nu}\psi

What do these two equations mean?

The first equation looks like the Dirac equation, except that on the right hand side we have a term with both the “potential” (which we now call the Maxwell field, or the Maxwell field operator), the Dirac “wave function” for a particle such as an electron (which, as we have discussed in Some Basics of Relativistic Quantum Field Theory, is actually the Dirac field operator which operates on the “vacuum” state to describe a state with a single electron), as well as the charge. It describes the “motion” of the Dirac field under the influence of the Maxwell field. Hence, this is the quantum mechanical version of the Lorentz force law.

The second equation is none other than our shorthand version of Maxwell’s equations, and on the right hand side is an explicit expression for the current in terms of the Dirac field and some constants. The symbol \bar{\psi} refers to the “adjoint” of the Dirac field; the Dirac field itself actually has four components, although, because of the way it transforms under rotations, we usually do not refer to it as a vector. Hence it can be written as a column matrix (see Matrices) and has a “transpose”, which is a row matrix; the “adjoint” involves the “conjugate transpose”, the row matrix whose entries are the complex conjugates of the entries of the transpose of the Dirac field (more precisely, the Dirac adjoint as used here also includes a factor of \gamma^{0}, i.e. \bar{\psi}=\psi^{\dagger}\gamma^{0}).

In general relativity there is this quote, from the physicist John Archibald Wheeler: “Spacetime tells matter how to move; matter tells spacetime how to curve”. One can perhaps think of electrodynamics, whether classical or quantum, in a similar way. Fields tell charges and currents how to move, charges and currents tell fields how they are supposed to be “shaped”. And this is succinctly summarized by the Lorentz force law and Maxwell’s equations, again whether in its classical or quantum version.

As we have seen in Lagrangians and Hamiltonians, the equations of motion are not the only way we can express a physical theory. We can also use the language of Lagrangians and Hamiltonians. In particular, an important quantity in quantum mechanics that involves the Lagrangian and Hamiltonian is the probability amplitude. In order to calculate the probability amplitude, the physicist Richard Feynman developed a method involving the now famous Feynman diagrams, which can be thought of as expanding the exponential function (see “The Most Important Function in Mathematics”) in the expression for the probability amplitude and expressing the different terms using diagrams. Just as we have associated the Dirac field to electrons, the Maxwell field is similarly associated to photons. Expressions involving the Dirac field and the Maxwell field can be thought of as electrons “emitting” or “absorbing” photons, or electrons and positrons (the antimatter counterpart of electrons) annihilating each other and creating a photon. The calculated probability amplitudes can then be used to obtain quantities that can be compared to results obtained from experiment, in order to verify the theory.


Lorentz Force on Wikipedia

Electromagnetic Four-Potential on Wikipedia

Maxwell’s Equations on Wikipedia

Quantum Electrodynamics on Wikipedia

Featured Image Produced by CERN

The Douglas Robb Memorial Lectures by Richard Feynman

QED: The Strange Theory of Light and Matter by Richard Feynman

Introduction to Electrodynamics by David J. Griffiths

Introduction to Elementary Particles by David J. Griffiths

Quantum Field Theory by Fritz Mandl and Graham Shaw

Introduction to Quantum Field Theory by Michael Peskin and Daniel V. Schroeder

Some Basics of Relativistic Quantum Field Theory

So far, on this blog, we have introduced the two great pillars of modern physics, relativity (see From Pythagoras to Einstein) and quantum mechanics (see Some Basics of Quantum Mechanics and More Quantum Mechanics: Wavefunctions and Operators). Although a complete unification between these two pillars is yet to be achieved, there already exists such a unified theory in the special case when gravity is weak, i.e. spacetime is flat. This unification of relativity (in this case special relativity) and quantum mechanics is called relativistic quantum field theory, and we discuss its basic concepts in this post.

In From Pythagoras to Einstein, we introduced the formula at the heart of Einstein’s theory of relativity. It is very important to modern physics and is worth writing here again:

\displaystyle -(c\Delta t)^2+(\Delta x)^2+(\Delta y)^2+(\Delta z)^2=(\Delta s)^2

This holds only for flat spacetime; however, even in general relativity, where spacetime may be curved, a “local” version still holds:

\displaystyle -(cdt)^2+(dx)^2+(dy)^2+(dz)^2=(ds)^2

The notation comes from calculus (see An Intuitive Introduction to Calculus), and means that this equation holds when the quantities involved are very small.

In this post, however, we shall consider flat spacetime only. Aside from being “locally” true, as far as we know, in regions where gravity is not very strong (like on our planet), spacetime is pretty much actually flat.

We recall how we obtained the important equation above; we made an analogy with the distance between two objects in 3D space, and noted how this distance does not change with translation and rotation; if we are using different coordinate systems, we may disagree about the coordinates of the two objects, but even then we will always agree on the distance between them. This distance is therefore “invariant”. But we live not only in a 3D space but in a 4D spacetime, and instead of an invariant distance we have an invariant spacetime interval.

But even in nonrelativistic mechanics, the distance is not the only “invariant”. We have the concept of the velocity of an object. Again, if we are positioned and oriented differently in space, we may disagree about the velocity of the object; for me it may be going to the right and away from me, while for you it may be in front of you and going straight towards you. However, we will always agree about the magnitude of this velocity, also called its speed.

The quantity we call the momentum is related to the velocity of the object; in fact for simple cases it is simply the mass of the object multiplied by the velocity. Once again, two observers may disagree about the momentum, since it involves direction; however they will always agree about the magnitude of the momentum. This magnitude is therefore also invariant.
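This invariance is easy to see concretely. The following Python sketch (using numpy; the particular momentum components and rotation angle are arbitrary choices for illustration) rotates a momentum vector and checks that its magnitude is unchanged:

```python
import numpy as np

# Two observers related by a rotation disagree on the components of a
# momentum vector, but always agree on its magnitude.
p = np.array([3.0, 4.0, 0.0])  # momentum components seen by observer 1

theta = 0.7  # an arbitrary rotation angle about the z axis
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])

p_rotated = R @ p  # components seen by observer 2: different numbers...

# ...but the magnitude (the "invariant") is the same for both.
assert np.isclose(np.linalg.norm(p), np.linalg.norm(p_rotated))
print(np.linalg.norm(p))  # prints 5.0
```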

The velocity, and by extension the momentum, has three components, one for each dimension of space. We write them as v_{x}, v_{y}, and v_{z} for the velocity and p_{x}, p_{y}, and p_{z} for the momentum.

What we want now is a 4D version of the momentum. Three of its components will be the components we already know of, p_{x}, p_{y}, and p_{z}. So we just need its “time” component, and the “magnitude” of this momentum is going to be an invariant.

It turns out that the equation we are looking for is the following (note the similarity of its form to the equation for the spacetime interval):

\displaystyle -\frac{E^{2}}{c^{2}}+p_{x}^{2}+p_{y}^{2}+p_{z}^{2}=-m^{2}c^{2}

The quantity m is the invariant we are looking for (the factors of c are just constants anyway), and it is called the “rest mass” of the object. As an effect of the unity of spacetime, the mass of an object as seen by an observer actually changes depending on its motion with respect to the observer; however, by definition, the rest mass is the mass of an object as seen by the observer when it is not moving with respect to the observer, therefore, it is an invariant. The quantity E stands for the energy.

Also, when the object is not moving with respect to us, we see no momentum in the x, y, or z direction, and the equation becomes E=mc^{2}, the very famous mass-energy equivalence published by Albert Einstein during his “miracle year” in 1905.
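We can also check this invariance numerically. The following Python sketch (using numpy, in units where c=1 for simplicity; the mass and boost velocity are arbitrary values chosen for illustration) starts with a particle at rest, boosts its energy and momentum along the x direction, and verifies that the combination above equals -m^{2}c^{2} in both frames:

```python
import numpy as np

c = 1.0  # working in units where c = 1, an assumption of this sketch
m = 2.0  # rest mass of some hypothetical particle

# A particle at rest: E = mc^2, zero momentum.
E, px = m * c**2, 0.0

def boost(E, px, v):
    """Lorentz boost along x with velocity v (units where c = 1)."""
    gamma = 1.0 / np.sqrt(1.0 - v**2 / c**2)
    return gamma * (E + v * px), gamma * (px + v * E / c**2)

E2, px2 = boost(E, px, 0.6)  # the boosted observer sees E and p change...

# ...but -E^2/c^2 + p^2 = -m^2 c^2 holds in both frames.
inv1 = -E**2 / c**2 + px**2
inv2 = -E2**2 / c**2 + px2**2
assert np.isclose(inv1, -m**2 * c**2)
assert np.isclose(inv2, -m**2 * c**2)
```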

We now move on to quantum mechanics. In quantum mechanics our observables, such as the position, momentum, and energy, correspond to self-adjoint operators (see More Quantum Mechanics: Wavefunctions and Operators), whose eigenvalues are the values that we obtain when we perform a measurement of the observable corresponding to the operator.

The “momentum operator” (to avoid confusion between ordinary quantities and operators, we will introduce here the “hat” symbol on our operators) corresponding to the x component of the momentum is given by

\displaystyle \hat{p_{x}}=-i\hbar\frac{\partial}{\partial x}

The eigenvalue equation means that when we measure the x component of the momentum of a quantum system in the state represented by the wave function \psi(x,y,z,t), which is an eigenvector of the momentum operator, then the measurement will yield the value p_{x}, where p_{x} is the eigenvalue corresponding to \psi(x,y,z,t) (see Eigenvalues and Eigenvectors), i.e.

\displaystyle -i\hbar\frac{\partial \psi(x,y,z,t)}{\partial x}=p_{x}\psi(x,y,z,t)

Analogues exist of course for the y and z components of the momentum.
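As a quick check, one can verify symbolically (here using the sympy library; this is just an illustrative computation, not part of the standard exposition) that a plane wave is an eigenfunction of the momentum operator with eigenvalue p:

```python
import sympy as sp

x = sp.symbols('x', real=True)
p, hbar = sp.symbols('p hbar', real=True, positive=True)

# A plane wave, the standard eigenfunction of the momentum operator
psi = sp.exp(sp.I * p * x / hbar)

# Apply p_hat = -i*hbar*d/dx and compare with p*psi
result = -sp.I * hbar * sp.diff(psi, x)
assert sp.simplify(result - p * psi) == 0
```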

Meanwhile, we also have an energy operator given by

\displaystyle \hat{E}=i\hbar\frac{\partial}{\partial t}

To obtain a quantum version of the important equation above relating the energy, momentum, and the mass, we need to replace the relevant quantities by the corresponding operators acting on the wave function. Therefore, from

\displaystyle -\frac{E^{2}}{c^{2}}+p_{x}^{2}+p_{y}^{2}+p_{z}^{2}=-m^{2}c^{2}

we obtain an equation in terms of operators

\displaystyle -\frac{\hat{E}^{2}}{c^{2}}+\hat{p}_{x}^{2}+\hat{p}_{y}^{2}+\hat{p}_{z}^{2}=-m^{2}c^{2}

or explicitly, with the wavefunction,

\displaystyle \frac{\hbar^{2}}{c^{2}}\frac{\partial^{2}\psi}{\partial t^{2}}-\hbar^{2}\frac{\partial^{2}\psi}{\partial x^{2}}-\hbar^{2}\frac{\partial^{2}\psi}{\partial y^{2}}-\hbar^{2}\frac{\partial^{2}\psi}{\partial z^{2}}=-m^{2}c^{2}\psi.

This equation is called the Klein-Gordon equation.
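One can verify, again symbolically with sympy (an illustrative check under the plane-wave assumption, not part of the original derivation), that a plane wave whose energy and momentum satisfy the relativistic relation above solves the Klein-Gordon equation:

```python
import sympy as sp

t, x, y, z = sp.symbols('t x y z', real=True)
hbar, c, m, px, py, pz = sp.symbols('hbar c m p_x p_y p_z', positive=True)

# Energy fixed by the relativistic relation E^2 = c^2 p^2 + m^2 c^4
E = sp.sqrt(c**2 * (px**2 + py**2 + pz**2) + m**2 * c**4)

# A plane wave with this energy and momentum
psi = sp.exp(sp.I * (px*x + py*y + pz*z - E*t) / hbar)

# Left-hand side of the Klein-Gordon equation as written above
lhs = (hbar**2 / c**2) * sp.diff(psi, t, 2) \
      - hbar**2 * (sp.diff(psi, x, 2) + sp.diff(psi, y, 2)
                   + sp.diff(psi, z, 2))

# It equals -m^2 c^2 psi, so the plane wave is a solution
assert sp.simplify(lhs + m**2 * c**2 * psi) == 0
```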

The Klein-Gordon equation is a second-order differential equation. It can be “factored” in order to obtain two first-order differential equations, both of which are called the Dirac equation.

We elaborate more on what we mean by “factoring”. Suppose we have a quantity which can be written as a^{2}-b^{2}. From basic high school algebra, we know that we can “factor” it as (a+b)(a-b). Now suppose we have p_{x}=p_{y}=p_{z}=0. We can then write the Klein-Gordon equation as

\displaystyle \frac{\hat{E}^{2}}{c^{2}}-m^{2}c^{2}=0

which factors into

\displaystyle \left(\frac{\hat{E}}{c}+mc\right)\left(\frac{\hat{E}}{c}-mc\right)=0

giving us the two first-order equations

\displaystyle \frac{\hat{E}}{c}+mc=0

\displaystyle \frac{\hat{E}}{c}-mc=0
These are the kinds of equations that we want. However, the case where the momentum is nonzero complicates things. The solution of the physicist Paul Dirac was to introduce matrices (see Matrices) as coefficients. These matrices (there are four of them) are 4\times 4 matrices with complex coefficients, and are explicitly written down as follows:

\displaystyle \gamma^{0}=\left(\begin{array}{cccc}1&0&0&0\\ 0&1&0&0\\0&0&-1&0\\0&0&0&-1\end{array}\right)

\displaystyle \gamma^{1}=\left(\begin{array}{cccc}0&0&0&1\\ 0&0&1&0\\0&-1&0&0\\-1&0&0&0\end{array}\right)

\displaystyle \gamma^{2}=\left(\begin{array}{cccc}0&0&0&-i\\ 0&0&i&0\\0&i&0&0\\-i&0&0&0\end{array}\right)

\displaystyle \gamma^{3}=\left(\begin{array}{cccc}0&0&1&0\\ 0&0&0&-1\\-1&0&0&0\\0&1&0&0\end{array}\right).

Using the laws of matrix multiplication, one can verify the following properties of these matrices (usually called gamma matrices):

\displaystyle (\gamma^{0})^{2}=1,\quad (\gamma^{1})^{2}=(\gamma^{2})^{2}=(\gamma^{3})^{2}=-1

\gamma^{\mu}\gamma^{\nu}=-\gamma^{\nu}\gamma^{\mu} for \mu\neq\nu

where 1 here refers to the 4\times 4 identity matrix.
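These properties can be verified directly; the following Python sketch (using numpy, purely as a computational check) does so for the explicit matrices written above:

```python
import numpy as np

# The four gamma matrices, copied from the explicit forms above
g0 = np.diag([1, 1, -1, -1]).astype(complex)
g1 = np.array([[0, 0, 0, 1], [0, 0, 1, 0],
               [0, -1, 0, 0], [-1, 0, 0, 0]], dtype=complex)
g2 = np.array([[0, 0, 0, -1j], [0, 0, 1j, 0],
               [0, 1j, 0, 0], [-1j, 0, 0, 0]], dtype=complex)
g3 = np.array([[0, 0, 1, 0], [0, 0, 0, -1],
               [-1, 0, 0, 0], [0, 1, 0, 0]], dtype=complex)

gammas = [g0, g1, g2, g3]
I4 = np.eye(4, dtype=complex)

# (gamma^0)^2 = 1 and (gamma^k)^2 = -1 for k = 1, 2, 3
assert np.allclose(g0 @ g0, I4)
for g in (g1, g2, g3):
    assert np.allclose(g @ g, -I4)

# Distinct gamma matrices anticommute
for mu in range(4):
    for nu in range(4):
        if mu != nu:
            assert np.allclose(gammas[mu] @ gammas[nu],
                               -gammas[nu] @ gammas[mu])
```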

With the help of these properties, we can now factor the Klein-Gordon equation as follows:

\displaystyle \frac{\hat{E}^{2}}{c^{2}}-\hat{p}_{x}^{2}-\hat{p}_{y}^{2}-\hat{p}_{z}^{2}-m^{2}c^{2}=0

\displaystyle (\gamma^{0}\frac{\hat{E}}{c}-\gamma^{1}\hat{p}_{x}-\gamma^{2}\hat{p}_{y}-\gamma^{3}\hat{p}_{z}+mc)(\gamma^{0}\frac{\hat{E}}{c}-\gamma^{1}\hat{p}_{x}-\gamma^{2}\hat{p}_{y}-\gamma^{3}\hat{p}_{z}-mc)=0

\displaystyle \gamma^{0}\frac{\hat{E}}{c}-\gamma^{1}\hat{p}_{x}-\gamma^{2}\hat{p}_{y}-\gamma^{3}\hat{p}_{z}+mc=0

\displaystyle \gamma^{0}\frac{\hat{E}}{c}-\gamma^{1}\hat{p}_{x}-\gamma^{2}\hat{p}_{y}-\gamma^{3}\hat{p}_{z}-mc=0

Both of the last two equations are known as the Dirac equation, although for purposes of convention, we usually use the last one. Writing the operators and the wave function explicitly, this is

\displaystyle i\hbar\gamma^{0}\frac{\partial\psi}{c\partial t}+i\hbar\gamma^{1}\frac{\partial\psi}{\partial x}+i\hbar\gamma^{2}\frac{\partial\psi}{\partial y}+i\hbar\gamma^{3}\frac{\partial\psi}{\partial z}-mc\psi=0
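As a simple consistency check, consider a particle at rest, so that all the spatial derivatives vanish. One can verify with sympy that the wave function \psi=ue^{-imc^{2}t/\hbar}, where u is the column matrix (1,0,0,0)^{T} (one particular solution, chosen here for illustration), satisfies the Dirac equation:

```python
import sympy as sp

t = sp.symbols('t', real=True)
hbar, c, m = sp.symbols('hbar c m', positive=True)

g0 = sp.diag(1, 1, -1, -1)  # the gamma^0 matrix from above

# A particle at rest: no spatial dependence, u = (1, 0, 0, 0)^T
u = sp.Matrix([1, 0, 0, 0])
psi = u * sp.exp(-sp.I * m * c**2 * t / hbar)

# The Dirac equation with the spatial terms dropped:
# i*hbar*gamma^0 dpsi/(c dt) - m*c*psi should vanish
lhs = sp.I * hbar * g0 * sp.diff(psi, t) / c - m * c * psi
assert sp.simplify(lhs) == sp.zeros(4, 1)
```

The time derivative brings down a factor of -imc^{2}/\hbar, and since \gamma^{0}u=u for this choice of u, the two terms cancel exactly.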

We now have the Klein-Gordon equation and the Dirac equation, both of which are important in relativistic quantum field theory. In particular, the Klein-Gordon equation is used for “scalar” fields while the Dirac equation is used for “spinor” fields. This is related to how they “transform” under rotations (which, in relativity, includes “boosts” – rotations that involve both space and time). A detailed discussion of these concepts will be left to the references for now and will perhaps be tackled in future posts.

We will, however, mention one more important (and interesting) phenomenon in relativistic quantum mechanics. The equation E=mc^{2} allows for the “creation” of particle-antiparticle pairs out of seemingly nothing! Even when there seems to be “not enough energy”, there exists an “energy-time uncertainty principle”, which allows such particle-antiparticle pairs to exist, even for only a very short time. This phenomenon of “creation” (and the related phenomenon of “annihilation”) means we cannot take the number of particles in our system to be fixed.

With this, we need to modify our language to be able to describe a system with varying numbers of particles. We will still use the language of linear algebra, but we will define our “states” differently. In earlier posts in the blog, where we only dealt with a single particle, the “state” of the particle simply gave us information about the position. In the relativistic case (and in other cases where there are varying numbers of particles – for instance, when the system “gains” or “loses” particles from the environment), the number (and kind) of particles need to be taken into account.

We will do this as follows. We first define a state with no particles, which we shall call the “vacuum”. We write it as |0\rangle. Recall that an operator is a function from state vectors to state vectors, hence, an operator acting on a state is another state. We now define a new kind of operator, called the “field” operator \psi, such that the state with a single particle of a certain type, which would have been given by  the wave function \psi in the old language, is now described by the state vector \psi|0\rangle.

Important note: The symbol \psi no longer refers to a state vector, but an operator! The state vector is \psi|0\rangle.

The Klein-Gordon and the Dirac equations still hold of course (otherwise we wouldn’t even have bothered to write them here). It is just important to take note that the symbol \psi now refers to an operator and not a state vector. We might as well write it as \hat{\psi}, but this is usually not done in the literature, since we will not use \psi for anything else other than to refer to the field operator. Further, if we have a state with several particles, we can write \psi\phi...\theta|0\rangle. This new language is called second quantization, which does not mean “quantize for a second time”, but rather a second version of quantization, since the first version did not have the means to deal with varying numbers of particles.

We have barely scratched the surface of relativistic quantum field theory in this post. Even though much has been made about the quest to unify quantum mechanics and general relativity, there is so much that also needs to be studied in relativistic quantum field theory, and still many questions that need to be answered. Still, relativistic quantum field theory has had many impressive successes – one striking example is the theoretical formulation of the so-called Higgs mechanism, and its experimental verification almost half a century later. The success of relativistic quantum field theory also gives us a guide on how to formulate new theories of physics in the same way that F=ma guided the development of the very theories that eventually replaced it.

The reader is encouraged to supplement what little exposition has been provided in this post by reading the references. The books are listed in increasing order of sophistication, so it is perhaps best to read them in that order too, although The Road to Reality: A Complete Guide to the Laws of the Universe by Roger Penrose is a high-level popular exposition and not a textbook, so it is perhaps best read in tandem with Introduction to Elementary Particles by David J. Griffiths, which is a textbook, although it does have special relativity and basic quantum mechanics as prerequisites. One may check the references listed in the blog posts discussing these respective subjects.


Quantum Field Theory on Wikipedia

Klein-Gordon Equation on Wikipedia

Dirac Equation on Wikipedia

Second Quantization on Wikipedia

Featured Image Produced by CERN

The Road to Reality: A Complete Guide to the Laws of the Universe by Roger Penrose

Introduction to Elementary Particles by David J. Griffiths

Quantum Field Theory by Fritz Mandl and Graham Shaw

Introduction to Quantum Field Theory by Michael Peskin and Daniel V. Schroeder