Connection and Curvature in Riemannian Geometry

In Geometry on Curved Spaces, we showed how different geometry can be when we are working on curved space instead of flat space, which we are usually more familiar with. We used the concept of a metric to express how the distance formula changes depending on where we are on this curved space. This gives us some way to “measure” the curvature of the space.

We also described the concept of parallel transport, which is in some way even more general than the metric, and can also be used to provide us with some measure of the curvature of a space. Although we can use concepts analogous to parallel transport even without the metric, if we do have a metric on the space and an expression for it, we can relate the concept of parallel transport to the metric, which is perhaps more intuitive. In this post, we formalize the concept of parallel transport by defining the Christoffel symbol and the Riemann curvature tensor, both of which we can obtain given the form of the metric. The Christoffel symbol and the Riemann curvature tensor are examples of the more general concepts of a connection and a curvature form, respectively, which need not be obtained from the metric.

Some Basics of Tensor Notation

First we establish some notation. We have already seen some tensor notation in Some Basics of (Quantum) Electrodynamics, but we explain a little more of that notation here, since it will be the language we work in. Many of the ordinary vectors we are used to, such as the position, will be indexed by superscripts. We refer to these vectors as contravariant vectors. A common convention is to use Latin letters, such as $i$ or $j$, as indices when we are working with space, and Greek letters, such as $\mu$ and $\nu$, as indices when we are working with spacetime. Consider, for example, spacetime. An event in this spacetime is specified by its $4$-position $x^{\mu}$, where $x^{0}=ct$, $x^{1}=x$, $x^{2}=y$, and $x^{3}=z$.

We will use the symbol $g_{\mu\nu}$ for our metric, and we will also often express it as a matrix. For the case of flat spacetime, our metric is given by the Minkowski metric $\eta_{\mu\nu}$:

$\displaystyle \eta_{\mu\nu}=\left(\begin{array}{cccc}-1&0&0&0\\0&1&0&0\\0&0&1&0\\ 0&0&0&1\end{array}\right)$

We can use the metric to “raise” and “lower” indices. This is done by multiplying the metric and a vector, and summing over a common index (one will be a superscript and the other a subscript). We have introduced the Einstein summation convention in Some Basics of (Quantum) Electrodynamics, where repeated indices always imply summation, unless explicitly stated otherwise, and we will continue to use this convention for posts discussing differential geometry and the theory of relativity.

Here is an example of “lowering” the index of $x^{\nu}$ in flat spacetime using the metric $\eta_{\mu\nu}$ to obtain a new quantity $x_{\mu}$:

$\displaystyle x_{\mu}=\eta_{\mu\nu}x^{\nu}$

Explicitly, the components of the quantity $x_{\mu}$ are given by $x_{0}=-ct$, $x_{1}=x$, $x_{2}=y$, and $x_{3}=z$. Note that the “time” component $x_{0}$ has changed sign; this is because $\eta_{00}=-1$. A quantity such as $x_{\mu}$, which has a subscript index, is called a covariant vector.
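As a small illustration of the index lowering above, here is a Python sketch that stores $\eta_{\mu\nu}$ as a nested list and carries out the sum over $\nu$ explicitly. The function name `lower_index` and the sample event are hypothetical choices made for illustration.

```python
# Minkowski metric eta_{mu nu} with signature (-, +, +, +), as in the text.
ETA = [
    [-1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]


def lower_index(metric, upper):
    """Return x_mu = g_{mu nu} x^nu, summing over the repeated index nu."""
    n = len(upper)
    return [sum(metric[mu][nu] * upper[nu] for nu in range(n)) for mu in range(n)]


# Hypothetical event with ct = 5, x = 1, y = 2, z = 3 (arbitrary units).
x_upper = [5, 1, 2, 3]
x_lower = lower_index(ETA, x_upper)
print(x_lower)  # [-5, 1, 2, 3]: only the time component changes sign
```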

In order to “raise” indices, we need the “inverse metric” $g^{\mu\nu}$. For the Minkowski metric $\eta_{\mu\nu}$, the inverse metric $\eta^{\mu\nu}$ has exactly the same components as $\eta_{\mu\nu}$, but for more general metrics this need not be the case. The general procedure for obtaining the inverse metric is to solve the equation

$\displaystyle g_{\mu\nu}g^{\nu\rho}=\delta_{\mu}^{\rho}$

where $\delta_{\mu}^{\rho}$ is the Kronecker delta, a quantity that can be expressed as the matrix

$\displaystyle \delta_{\mu}^{\rho}=\left(\begin{array}{cccc}1&0&0&0\\0&1&0&0\\0&0&1&0\\ 0&0&0&1\end{array}\right)$.
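A sketch of this procedure, assuming the metric is given as a square matrix of numbers: we solve $g_{\mu\nu}g^{\nu\rho}=\delta_{\mu}^{\rho}$ by Gauss-Jordan elimination (a standard matrix-inversion method, used here as one possible choice) and then check that contracting the metric with the result gives the Kronecker delta. The helper names `inverse_metric` and `contract` are hypothetical.

```python
from fractions import Fraction


def inverse_metric(g):
    """Invert a square metric matrix g_{mu nu} by Gauss-Jordan elimination.

    Exact Fraction arithmetic avoids rounding; g is assumed to be invertible,
    as any metric is.
    """
    n = len(g)
    # Augment g with the identity matrix: [g | I].
    aug = [[Fraction(g[i][j]) for j in range(n)] + [Fraction(int(i == j)) for j in range(n)]
           for i in range(n)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and swap it into place.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row so that the pivot becomes 1.
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [u - factor * w for u, w in zip(aug[r], aug[col])]
    # The right half of the augmented matrix is now g^{mu nu}.
    return [row[n:] for row in aug]


def contract(g, g_inv):
    """Return g_{mu nu} g^{nu rho}, which should be the Kronecker delta."""
    n = len(g)
    return [[sum(g[mu][nu] * g_inv[nu][rho] for nu in range(n)) for rho in range(n)]
            for mu in range(n)]


eta = [[-1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
eta_inv = inverse_metric(eta)
print(eta_inv == eta)  # True: same components as eta
print(contract(eta, eta_inv) == [[int(i == j) for j in range(4)] for i in range(4)])  # True
```

For a non-diagonal (hypothetical) metric such as `[[2, 1], [1, 3]]` the same function returns components that differ from those of the metric, which is the “more general” case mentioned above.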

As a demonstration of what our notation can do, we recall the formula for the invariant spacetime interval:

$\displaystyle (ds)^2=-(cdt)^2+(dx)^2+(dy)^2+(dz)^2$

Using tensor notation combined with the Einstein summation convention, this can be written simply as

$\displaystyle (ds)^2=\eta_{\mu\nu}dx^{\mu}dx^{\nu}$.
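To see the summation convention at work, here is a minimal Python sketch that evaluates $\eta_{\mu\nu}dx^{\mu}dx^{\nu}$ as an explicit double sum over both repeated indices. The displacement values are hypothetical and chosen so that the arithmetic is easy to follow.

```python
ETA = [[-1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]


def interval_squared(metric, dx):
    """Return (ds)^2 = g_{mu nu} dx^mu dx^nu, summing over both repeated indices."""
    n = len(dx)
    return sum(metric[mu][nu] * dx[mu] * dx[nu] for mu in range(n) for nu in range(n))


# Hypothetical displacement: c dt = 5, dx = 3, dy = 4, dz = 0 (arbitrary units).
dx = [5, 3, 4, 0]
print(interval_squared(ETA, dx))  # -25 + 9 + 16 + 0 = 0
```

Only the diagonal terms contribute here because $\eta_{\mu\nu}$ is diagonal; the same function works unchanged for a non-diagonal metric.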

The Christoffel Symbol and the Covariant Derivative

We now come back to the Christoffel symbol $\Gamma^{\mu}_{\nu\lambda}$. The idea behind the Christoffel symbol is that it is used to define the covariant derivative $\nabla_{\nu}V^{\mu}$ of a vector $V^{\mu}$.

The covariant derivative is a very important concept in differential geometry (and not just in Riemannian geometry). When we take derivatives, we are actually comparing two vectors. To further explain what we mean, we recall that individually the components of the vectors can be thought of as functions on the space, and we recall the expression for the derivative from An Intuitive Introduction to Calculus:

$\displaystyle \frac{df}{dx}\approx\frac{f(x+\epsilon)-f(x)}{(x+\epsilon)-(x)}$ when $\epsilon$ is extremely small (essentially negligible)

More formally, we can write

$\displaystyle \frac{df}{dx}=\lim_{\epsilon\to 0}\frac{f(x+\epsilon)-f(x)}{(x+\epsilon)-(x)}$.

Therefore, in the language of partial derivatives, the partial derivative of the $\mu$-th component of an $m$-dimensional vector $V^{\mu}$ on an $m$-dimensional space with respect to the coordinate $x^{\nu}$ would be written as

$\displaystyle \frac{\partial V^{\mu}}{\partial x^{\nu}}=\lim_{\Delta x^{\nu}\to 0}\frac{V^{\mu}(x^{1},...,x^{\nu}+\Delta x^{\nu},...,x^{m})-V^{\mu}(x^{1},...,x^{\nu},...,x^{m})}{(x^{\nu}+\Delta x^{\nu})-(x^{\nu})}$

The problem is that we are comparing vectors from different vector spaces. Recall from Vector Fields, Vector Bundles, and Fiber Bundles that we can think of a vector bundle as having a vector space for every point on the base space. The vector $V^{\mu}(x^{1},...,x^{\nu}+\Delta x^{\nu},...,x^{m})$ belongs to the vector space at the point $(x^{1},...,x^{\nu}+\Delta x^{\nu},...,x^{m})$, while the vector $V^{\mu}(x^{1},...,x^{\nu},...,x^{m})$ belongs to the vector space at the point $(x^{1},...,x^{\nu},...,x^{m})$. To compare the two vectors we need to “transport” one to the other in the “correct” way, by which we mean parallel transport. We have seen in Geometry on Curved Spaces that parallel transport can have weird effects on vectors, and these weird effects are what the Christoffel symbol expresses.

Let $\tilde{V}^{\mu}(x^{1},...,x^{\nu}+\Delta x^{\nu},...,x^{m})$ denote the vector $V^{\mu}(x^{1},...,x^{\nu},...,x^{m})$ parallel transported from its original vector space at $(x^{1},...,x^{\nu},...,x^{m})$ to the vector space at $(x^{1},...,x^{\nu}+\Delta x^{\nu},...,x^{m})$. To first order in $\Delta x^{\nu}$, the vector $\tilde{V}^{\mu}(x^{1},...,x^{\nu}+\Delta x^{\nu},...,x^{m})$ is given by the following expression:

$\displaystyle \tilde{V}^{\mu}(x^{1},...,x^{\nu}+\Delta x^{\nu},...,x^{m})=V^{\mu}(x^{1},...,x^{\nu},...,x^{m})-V^{\lambda}(x^{1},...,x^{\nu},...,x^{m})\Gamma^{\mu}_{\nu\lambda}(x^{1},...,x^{\nu},...,x^{m})\Delta x^{\nu}$

Therefore the Christoffel symbol provides a “correction” for what happens when we parallel transport a vector from one point to another. This is an example of the concept of a connection, which, like the covariant derivative, is part of differential geometry more generally, beyond Riemannian geometry. The object to be parallel transported need not be a vector, for example when we have more general fiber bundles instead of vector bundles. However, in Riemannian geometry we will usually focus on vector bundles, in particular a special kind of vector bundle called the tangent bundle, whose fiber at each point is the space of tangent vectors at that point.
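The transport formula above can be applied over and over with small steps to carry a vector along a whole path. The sketch below does this around a circle of constant colatitude $\theta_{0}$ on a $2$-sphere, using the $2$-sphere Christoffel symbols listed in the example section of this post ($\Gamma^{\theta}_{\varphi\varphi}=-\sin\theta\cos\theta$, $\Gamma^{\varphi}_{\theta\varphi}=\Gamma^{\varphi}_{\varphi\theta}=\cot\theta$). The step count, the radius, and the choice $\theta_{0}=\pi/3$ are hypothetical, and the simple step-by-step (Euler) update is an illustrative choice rather than a precise numerical method.

```python
import math


def christoffel_sphere(theta):
    """Nonzero Christoffel symbols Gamma^mu_{nu lam} of a 2-sphere at colatitude theta.

    Keys are (mu, nu, lam); index 0 stands for theta and index 1 for phi.
    The values do not depend on the radius of the sphere.
    """
    cot = math.cos(theta) / math.sin(theta)
    return {
        (0, 1, 1): -math.sin(theta) * math.cos(theta),
        (1, 0, 1): cot,
        (1, 1, 0): cot,
    }


def transport_step(v, gamma, nu, dx):
    """One small step: V~^mu = V^mu - V^lam Gamma^mu_{nu lam} dx^nu, for a fixed direction nu."""
    return [
        v[mu] - sum(v[lam] * gamma.get((mu, nu, lam), 0.0) for lam in range(2)) * dx
        for mu in range(2)
    ]


def transport_around_latitude(v, theta0, steps=20_000):
    """Parallel transport the components (V^theta, V^phi) once around theta = theta0."""
    gamma = christoffel_sphere(theta0)  # theta stays equal to theta0 along the whole loop
    dphi = 2 * math.pi / steps
    for _ in range(steps):
        v = transport_step(v, gamma, nu=1, dx=dphi)  # nu = 1: moving in the phi direction
    return v


def orthonormal(v, theta0, r0):
    """Components along unit vectors: (R0 V^theta, R0 sin(theta0) V^phi)."""
    return r0 * v[0], r0 * math.sin(theta0) * v[1]


r0 = 1.0                 # hypothetical sphere radius
theta0 = math.pi / 3     # hypothetical loop at colatitude 60 degrees
start = [1.0 / r0, 0.0]  # unit vector pointing along the theta direction
a, b = orthonormal(transport_around_latitude(start, theta0), theta0, r0)
print(a, b)
```

For this loop the vector comes back with $a$ close to $-1$ and $b$ close to $0$: it has turned by 180 degrees even though every step kept it “parallel”, much like the spear in Geometry on Curved Spaces. Solving the same transport equations by hand along the circle gives a rotation by the angle $2\pi\cos\theta_{0}$, so on the equator ($\theta_{0}=\pi/2$) the vector returns unchanged.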

Now there is more than one way to parallel transport a mathematical object, which means that there are many choices of a connection. However, in Riemannian geometry there is a special kind of connection that we will prefer. This is the connection that satisfies the following two properties:

$\displaystyle \Gamma^{\mu}_{\nu\lambda}=\Gamma^{\mu}_{\lambda\nu}$    (torsion-free)

$\displaystyle \nabla_{\rho}g_{\mu\nu}=0$    (metric compatibility)

The connection that satisfies these two properties is the one that can be obtained from the metric via the following formula:

$\displaystyle \Gamma^{\mu}_{\nu\lambda}=\frac{1}{2}g^{\mu\sigma}(\partial_{\nu}g_{\sigma\lambda}+\partial_{\lambda}g_{\nu\sigma}-\partial_{\sigma}g_{\nu\lambda})$.
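A rough numerical sketch of this formula, restricted to 2D metrics for brevity: the partial derivatives of the metric are approximated by central finite differences (a numerical shortcut chosen here in place of symbolic differentiation), and the inverse metric is computed with the $2\times 2$ adjugate formula. The test metric is the $2$-sphere metric from Geometry on Curved Spaces; all function names are hypothetical.

```python
import math


def sphere_metric(x, r0=1.0):
    """g_{mu nu} of a 2-sphere of radius r0 at x = (theta, phi)."""
    theta = x[0]
    return [[r0 ** 2, 0.0], [0.0, (r0 * math.sin(theta)) ** 2]]


def inverse_2x2(g):
    """g^{mu nu} for a 2x2 metric, via the adjugate formula."""
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    return [[g[1][1] / det, -g[0][1] / det], [-g[1][0] / det, g[0][0] / det]]


def metric_derivatives(metric, x, h=1e-5):
    """dg[rho][mu][nu] = partial_rho g_{mu nu}, by central finite differences."""
    dg = []
    for rho in range(2):
        xp = list(x)
        xp[rho] += h
        xm = list(x)
        xm[rho] -= h
        gp, gm = metric(xp), metric(xm)
        dg.append([[(gp[m][n] - gm[m][n]) / (2 * h) for n in range(2)] for m in range(2)])
    return dg


def christoffel(metric, x):
    """gam[mu][nu][lam] = 1/2 g^{mu s} (d_nu g_{s lam} + d_lam g_{nu s} - d_s g_{nu lam})."""
    g_inv = inverse_2x2(metric(x))
    dg = metric_derivatives(metric, x)
    return [[[0.5 * sum(g_inv[mu][s] * (dg[nu][s][lam] + dg[lam][nu][s] - dg[s][nu][lam])
                        for s in range(2))
              for lam in range(2)]
             for nu in range(2)]
            for mu in range(2)]


theta = 1.0  # hypothetical sample point (radians); the phi value does not matter here
gam = christoffel(sphere_metric, [theta, 0.3])
print(gam[0][1][1], -math.sin(theta) * math.cos(theta))  # Gamma^theta_{phi phi} vs exact
print(gam[1][0][1], math.cos(theta) / math.sin(theta))   # Gamma^phi_{theta phi} vs exact
```

The numbers agree with the $2$-sphere Christoffel symbols listed in the example section, and changing the radius `r0` leaves them unchanged, since the factors of $R_{0}^{2}$ cancel between $g^{\mu\sigma}$ and the derivatives of $g_{\mu\nu}$.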

The covariant derivative is then defined as

$\displaystyle \nabla_{\nu}V^{\mu}=\lim_{\Delta x^{\nu}\to 0}\frac{V^{\mu}(x^{1},...,x^{\nu}+\Delta x^{\nu},...,x^{m})-\tilde{V}^{\mu}(x^{1},...,x^{\nu}+\Delta x^{\nu},...,x^{m})}{(x^{\nu}+\Delta x^{\nu})-(x^{\nu})}$.

We are now comparing vectors belonging to the same vector space, and evaluating the expression above leads to the formula for the covariant derivative:

$\displaystyle \nabla_{\nu}V^{\mu}=\partial_{\nu}V^{\mu}+\Gamma^{\mu}_{\nu\lambda}V^{\lambda}$.
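As a concrete check of this formula, the sketch below evaluates $\nabla_{\nu}V^{\mu}$ on a $2$-sphere for a hypothetical vector field whose components are constant, $V^{\theta}=0$ and $V^{\varphi}=1$. The Christoffel symbols are the $2$-sphere values listed in the example section of this post, and $\partial_{\nu}V^{\mu}$ is approximated by central finite differences (our choice for the sketch). Even though the ordinary partial derivatives all vanish, the Christoffel “correction” makes the covariant derivative nonzero.

```python
import math


def sphere_christoffel(x):
    """gam[mu][nu][lam] = Gamma^mu_{nu lam} of a 2-sphere; index 0 is theta, 1 is phi."""
    theta = x[0]
    cot = math.cos(theta) / math.sin(theta)
    return [
        [[0.0, 0.0], [0.0, -math.sin(theta) * math.cos(theta)]],  # Gamma^theta_{nu lam}
        [[0.0, cot], [cot, 0.0]],                                  # Gamma^phi_{nu lam}
    ]


def covariant_derivative(v_field, gamma_field, x, h=1e-6):
    """nab[nu][mu] = partial_nu V^mu + Gamma^mu_{nu lam} V^lam at the point x."""
    n = len(x)
    v = v_field(x)
    gam = gamma_field(x)
    nab = []
    for nu in range(n):
        xp = list(x)
        xp[nu] += h
        xm = list(x)
        xm[nu] -= h
        vp, vm = v_field(xp), v_field(xm)
        nab.append([(vp[mu] - vm[mu]) / (2 * h)
                    + sum(gam[mu][nu][lam] * v[lam] for lam in range(n))
                    for mu in range(n)])
    return nab


def phi_field(x):
    """Hypothetical vector field with constant components V^theta = 0, V^phi = 1."""
    return [0.0, 1.0]


theta = 0.8  # hypothetical sample point
nab = covariant_derivative(phi_field, sphere_christoffel, [theta, 0.0])
print(nab)  # nab[0][1] = cot(theta), nab[1][0] = -sin(theta) cos(theta), others 0
```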

The Riemann Curvature Tensor

Next we consider the quantity known as the Riemann curvature tensor. It is once again related to parallel transport, in the following manner. Consider parallel transporting a vector $V^{\sigma}$ through an “infinitesimal” distance specified by another vector $A^{\mu}$, and after that, through another infinitesimal distance specified by yet another vector $B^{\nu}$. Then we parallel transport it again in the direction opposite to $A^{\mu}$, and finally in the direction opposite to $B^{\nu}$. The path forms a parallelogram, and when the vector $V^{\sigma}$ returns to its starting point it will have changed by an amount $\delta V^{\rho}$. We can think of the Riemann curvature tensor as the quantity that relates all of these:

$\displaystyle \delta V^{\rho}=R^{\rho}_{\ \sigma\mu\nu}V^{\sigma}A^{\mu}B^{\nu}$.

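Another way to put this is to take covariant derivatives of the vector $V^{\rho}$ in the two directions of the same parallelogram, in both possible orders (first along $\nu$ and then along $\mu$, or the other way around), and look at the difference. The Riemann curvature tensor is related to this difference as follows: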

$\displaystyle \nabla_{\mu}\nabla_{\nu}V^{\rho}-\nabla_{\nu}\nabla_{\mu}V^{\rho}=R^{\rho}_{\ \sigma\mu\nu}V^{\sigma}$.

Expanding the left hand side, and using the torsion-free property of the Christoffel symbol, we will find that

$\displaystyle R^{\rho}_{\ \sigma\mu\nu}=\partial_{\mu}\Gamma^{\rho}_{\nu\sigma}-\partial_{\nu}\Gamma^{\rho}_{\mu\sigma}+\Gamma^{\rho}_{\mu\lambda}\Gamma^{\lambda}_{\nu\sigma}-\Gamma^{\rho}_{\nu\lambda}\Gamma^{\lambda}_{\mu\sigma}$.
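The formula can be evaluated numerically once the Christoffel symbols are known. This sketch takes the $2$-sphere Christoffel symbols listed in the example section of this post as input and approximates their partial derivatives by central finite differences (again a numerical shortcut chosen for the sketch, not the method used in the references). The function names and the sample point are hypothetical.

```python
import math


def sphere_christoffel(x):
    """gam[rho][mu][nu] = Gamma^rho_{mu nu} of a 2-sphere; index 0 is theta, 1 is phi."""
    theta = x[0]
    cot = math.cos(theta) / math.sin(theta)
    return [
        [[0.0, 0.0], [0.0, -math.sin(theta) * math.cos(theta)]],  # Gamma^theta_{mu nu}
        [[0.0, cot], [cot, 0.0]],                                  # Gamma^phi_{mu nu}
    ]


def riemann(gamma_field, x, h=1e-5):
    """R[rho][sigma][mu][nu] = d_mu G^rho_{nu sigma} - d_nu G^rho_{mu sigma}
    + G^rho_{mu lam} G^lam_{nu sigma} - G^rho_{nu lam} G^lam_{mu sigma}."""
    n = len(x)
    gam = gamma_field(x)
    # dgam[a][rho][b][c] = partial_a Gamma^rho_{b c}, by central differences.
    dgam = []
    for a in range(n):
        xp = list(x)
        xp[a] += h
        xm = list(x)
        xm[a] -= h
        gp, gm = gamma_field(xp), gamma_field(xm)
        dgam.append([[[(gp[r][b][c] - gm[r][b][c]) / (2 * h) for c in range(n)]
                      for b in range(n)] for r in range(n)])
    idx = range(n)
    return [[[[dgam[mu][rho][nu][sigma] - dgam[nu][rho][mu][sigma]
               + sum(gam[rho][mu][lam] * gam[lam][nu][sigma]
                     - gam[rho][nu][lam] * gam[lam][mu][sigma] for lam in idx)
               for nu in idx] for mu in idx] for sigma in idx] for rho in idx]


th = 0.9  # hypothetical sample point
riem = riemann(sphere_christoffel, [th, 0.0])
print(riem[0][1][0][1], math.sin(th) ** 2)  # R^theta_{phi theta phi} vs sin^2(theta)
print(riem[1][0][1][0])                     # R^phi_{theta phi theta}, close to 1
```

These values match the components of the Riemann curvature tensor of the $2$-sphere listed in the example section; note that the construction is antisymmetric in the last two indices, as the formula requires.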

For connections other than the torsion-free one that we chose, the expansion of $\nabla_{\mu}\nabla_{\nu}V^{\rho}-\nabla_{\nu}\nabla_{\mu}V^{\rho}$ contains an additional term involving the torsion tensor. For our case, however, the torsion vanishes, so we need not worry about it and can focus on the Riemann curvature tensor.

There is another quantity that can be obtained from the Riemann curvature tensor called the Ricci tensor, denoted by $R_{\mu\nu}$. It is given by

$\displaystyle R_{\mu\nu}=R^{\lambda}_{\ \mu\lambda\nu}$.

Following the Einstein summation convention, we sum over the repeated index $\lambda$, and therefore the resulting quantity will have only two indices instead of four. This is an example of the operation on tensors called contraction. If we raise one index using the metric and contract again, we obtain a quantity called the Ricci scalar, denoted $R$:

$\displaystyle R=R^{\mu}_{\ \mu}=g^{\mu\nu}R_{\mu\nu}$

Example: The $2$-Sphere

To provide an explicit example of the concepts discussed, we show their specific expressions for the case of a $2$-sphere. We will only give the final results here. The explicit computations can be found among the references, but the reader may gain some practice, especially on manipulating tensors, by performing the calculations and checking only the answers here. In any case, since the metric is given, it is only a matter of substituting the relevant quantities into the formulas already given above.

We have already given the expression for the metric of the $2$-sphere in Geometry on Curved Spaces. We recall that in matrix form it is given by (we change our notation for the radius of the $2$-sphere to $R_{0}$ to avoid confusion with the symbol for the Ricci scalar)

$\displaystyle g_{mn}= \left(\begin{array}{cc}R_{0}^{2}&0\\ 0&R_{0}^{2}\text{sin}(\theta)^{2}\end{array}\right)$

Individually, the components are (we will use $\theta$ and $\varphi$ instead of the numbers $1$ and $2$ for the indices)

$\displaystyle g_{\theta\theta}=R_{0}^{2}$

$\displaystyle g_{\varphi\varphi}=R_{0}^{2}(\text{sin}(\theta))^{2}$

The other components ($g_{\theta\varphi}$ and $g_{\varphi\theta}$) are both equal to zero.

The Christoffel symbols are therefore given by

$\displaystyle \Gamma^{\theta}_{\varphi\varphi}=-\text{sin}(\theta)\text{cos}(\theta)$

$\displaystyle \Gamma^{\varphi}_{\theta\varphi}=\text{cot}(\theta)$

$\displaystyle \Gamma^{\varphi}_{\varphi\theta}=\text{cot}(\theta)$

The other components ($\Gamma^{\theta}_{\theta\theta}$, $\Gamma^{\theta}_{\theta\varphi}$, $\Gamma^{\theta}_{\varphi\theta}$, $\Gamma^{\varphi}_{\theta\theta}$, and $\Gamma^{\varphi}_{\varphi\varphi}$) are all equal to zero.

The components of the Riemann curvature tensor are given by

$\displaystyle R^{\theta}_{\ \varphi\theta\varphi}=(\text{sin}(\theta))^{2}$

$\displaystyle R^{\theta}_{\ \varphi\varphi\theta}=-(\text{sin}(\theta))^{2}$

$\displaystyle R^{\varphi}_{\ \theta\theta\varphi}=-1$

$\displaystyle R^{\varphi}_{\ \theta\varphi\theta}=1$

The other components (there are still twelve of them, so I won’t bother writing all their symbols down here anymore) are all equal to zero.

The components of the Ricci tensor are

$\displaystyle R_{\theta\theta}=1$

$\displaystyle R_{\varphi\varphi}=(\text{sin}(\theta))^{2}$

The other components ($R_{\theta\varphi}$ and $R_{\varphi\theta}$) are both equal to zero.

Finally, the Ricci scalar is

$\displaystyle R=\frac{2}{R_{0}^{2}}$
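The last two steps are just contractions, which this sketch carries out explicitly: starting from the four nonzero Riemann components listed above, it forms $R_{\mu\nu}=R^{\lambda}_{\ \mu\lambda\nu}$ and then $R=g^{\mu\nu}R_{\mu\nu}$ using the inverse of the $2$-sphere metric. The sample point and radius are hypothetical.

```python
import math


def sphere_riemann(theta):
    """R[rho][sigma][mu][nu] of the 2-sphere, from the components listed above.

    Index 0 is theta, 1 is phi; every component not listed above is zero.
    """
    s2 = math.sin(theta) ** 2
    r = [[[[0.0] * 2 for _ in range(2)] for _ in range(2)] for _ in range(2)]
    r[0][1][0][1] = s2
    r[0][1][1][0] = -s2
    r[1][0][0][1] = -1.0
    r[1][0][1][0] = 1.0
    return r


def ricci_tensor(riem):
    """R_{mu nu} = R^lam_{mu lam nu}: contract the first and third indices."""
    n = len(riem)
    return [[sum(riem[lam][mu][lam][nu] for lam in range(n)) for nu in range(n)]
            for mu in range(n)]


def ricci_scalar(ric, g_inv):
    """R = g^{mu nu} R_{mu nu}: raise one index with the inverse metric, then contract."""
    n = len(ric)
    return sum(g_inv[mu][nu] * ric[mu][nu] for mu in range(n) for nu in range(n))


theta, r0 = 0.7, 2.0  # hypothetical sample point and radius
ric = ricci_tensor(sphere_riemann(theta))
g_inv = [[1 / r0 ** 2, 0.0], [0.0, 1 / (r0 * math.sin(theta)) ** 2]]
print(ric)                       # [[1, 0], [0, sin(theta)^2]]
print(ricci_scalar(ric, g_inv))  # approximately 2 / r0^2 = 0.5
```

Repeating the last step with a larger `r0` gives a smaller Ricci scalar, in line with the remark that follows.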

We note that the larger the radius of the $2$-sphere, the smaller the curvature. We can see this intuitively, for example, on the surface of our planet, which appears flat because its radius is so large. If our planet were much smaller, this would not be the case.

Bonus: The Einstein Field Equations of General Relativity

Given what we have discussed in this post, we can now write down the Einstein field equations (also known simply as Einstein’s equations) of general relativity. They are expressed in terms of the Ricci tensor, the Ricci scalar, and the metric (of spacetime) via the following equation:

$\displaystyle R_{\mu\nu}-\frac{1}{2}Rg_{\mu\nu}+\Lambda g_{\mu\nu}=\frac{8\pi}{c^{4}} GT_{\mu\nu}$

where $G$ is the gravitational constant, the same constant that appears in Newton’s law of universal gravitation (which Einstein’s equations reduce to under certain limiting conditions), $c$ is the speed of light in a vacuum, and $T_{\mu\nu}$ is the energy-momentum tensor (also known as the stress-energy tensor), which gives the “density” of energy and momentum, as well as related quantities such as pressure and shear stress. The symbol $\Lambda$ refers to what is known as the cosmological constant, which was not part of Einstein’s original formulation but was added later to support his view of an unchanging universe. With the rise of Georges Lemaître’s theory of an expanding universe, which later became known as the Big Bang theory, the cosmological constant was abandoned. More recently, the universe was found not only to be expanding, but to be expanding at an accelerating rate, which brought back the cosmological constant, now interpreted in terms of “vacuum energy”, also known as “dark energy”. Today the nature of the cosmological constant remains one of the great mysteries of modern physics.

Bonus: Connection and Curvature in Quantum Electrodynamics

The concepts of connection and curvature also appear in quantum field theory, in particular quantum electrodynamics (see Some Basics of (Quantum) Electrodynamics). It is the underlying concept in gauge theory, of which quantum electrodynamics is probably the simplest example. However, it is an example of differential geometry which does not make use of the metric. We consider a fiber bundle, where the base space is flat spacetime (also known as Minkowski spacetime), and the fiber is $\text{U}(1)$, which is the group formed by the complex numbers with magnitude equal to $1$, with law of composition given by multiplication (we can also think of this as a circle).

We want the group $\text{U}(1)$ to act on the wave function (or field operator) $\psi(x)$, so that the wave function has a “phase”, i.e. we have $e^{i\phi(x)}\psi(x)$, where $e^{i\phi(x)}$ is a complex number which depends on the location $x$ in spacetime. This means that the values of the wave function at different points in spacetime can carry different values of the “phase”. In order to compare them, we need a connection and a covariant derivative.

The connection we want is given by

$\displaystyle i\frac{q}{\hbar c}A_{\mu}$

where $q$ is the charge of the electron, $\hbar$ is the reduced Planck constant, $c$ is the speed of light in a vacuum, and $A_{\mu}$ is the four-potential of electrodynamics.

The covariant derivative (here written using the symbol $D_{\mu}$) is

$\displaystyle D_{\mu}\psi(x)=\partial_{\mu}\psi(x)+i\frac{q}{\hbar c}A_{\mu}\psi(x)$
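Why this combination is useful can be checked numerically. The sketch below works in one coordinate only and in units where $q/(\hbar c)=1$ (both simplifying assumptions). It changes the phase of $\psi$ by a position-dependent $e^{i\alpha(x)}$ and, at the same time, shifts the potential by $-(\hbar c/q)\,\partial\alpha$; this compensating change of the connection is the standard one for the sign convention above, but it is our addition and not stated in the text. The functions `psi`, `a`, and `alpha` are hypothetical, and the derivative is taken by central finite differences.

```python
import cmath
import math

# Assumption for brevity: units in which q / (hbar c) = 1, and a single coordinate x.
COUPLING = 1.0


def covariant_derivative(wave, potential, x, h=1e-6):
    """D psi = d psi / dx + i (q / (hbar c)) A psi, with d/dx by central differences."""
    dpsi = (wave(x + h) - wave(x - h)) / (2 * h)
    return dpsi + 1j * COUPLING * potential(x) * wave(x)


def psi(x):
    """Hypothetical wave function."""
    return cmath.exp(2j * x) * math.exp(-x * x)


def a(x):
    """Hypothetical potential component A_x."""
    return 0.5 * x


def alpha(x):
    """Hypothetical position-dependent phase."""
    return math.sin(x)


def psi_new(x):
    """Phase change at every point: psi -> e^{i alpha(x)} psi."""
    return cmath.exp(1j * alpha(x)) * psi(x)


def a_new(x):
    """Compensating change of the connection: A -> A - (hbar c / q) d alpha / dx."""
    return a(x) - math.cos(x) / COUPLING  # d/dx sin(x) = cos(x)


x0 = 0.4
lhs = covariant_derivative(psi_new, a_new, x0)
rhs = cmath.exp(1j * alpha(x0)) * covariant_derivative(psi, a, x0)
print(abs(lhs - rhs))  # tiny: D psi picks up the same phase as psi itself
```

The ordinary derivative alone does not behave this way: it picks up an extra term proportional to $\partial\alpha$, which is exactly what the connection term cancels.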

We will also have a concept analogous to the Riemann curvature tensor, called the field strength tensor, denoted $F_{\mu\nu}$. Of course, our “curvature” in this case is not the literal curvature of spacetime, as we have already specified that our spacetime is flat, but an abstract notion of “curvature” that specifies how the phase of our wavefunction changes as we move around the spacetime. This field strength tensor is given by the following expression:

$\displaystyle F_{\mu\nu}=\partial_{\mu}A_{\nu}-\partial_{\nu}A_{\mu}$

This may be compared to the expression for the Riemann curvature tensor, where the connection is given by the Christoffel symbols. The first two terms of both expressions are very similar. The difference is that the expression for the Riemann curvature tensor has extra terms, quadratic in the connection, that the expression for the field strength tensor does not have. However, a generalization of this procedure for quantum electrodynamics to groups other than $\text{U}(1)$, called Yang-Mills theory, does feature extra terms in the expression for the field strength tensor, which perhaps make the two more similar.
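A minimal numerical sketch of the field strength formula: it approximates $\partial_{\mu}A_{\nu}$ by central finite differences (our choice) and forms $F_{\mu\nu}=\partial_{\mu}A_{\nu}-\partial_{\nu}A_{\mu}$ at a point with coordinates $(ct,x,y,z)$. The four-potential $A_{\mu}=(0,-By/2,Bx/2,0)$ and the value of $B$ are hypothetical; for this choice the $xy$ component comes out equal to $B$.

```python
def field_strength(a_field, x, h=1e-6):
    """F[mu][nu] = d_mu A_nu - d_nu A_mu at the spacetime point x = (ct, x, y, z)."""
    n = len(x)
    # da[mu][nu] = partial_mu A_nu, by central differences.
    da = []
    for mu in range(n):
        xp = list(x)
        xp[mu] += h
        xm = list(x)
        xm[mu] -= h
        ap, am = a_field(xp), a_field(xm)
        da.append([(ap[nu] - am[nu]) / (2 * h) for nu in range(n)])
    return [[da[mu][nu] - da[nu][mu] for nu in range(n)] for mu in range(n)]


B = 3.0  # hypothetical constant


def a_example(x):
    """Hypothetical four-potential A_mu = (0, -B y / 2, B x / 2, 0); x = (ct, x, y, z)."""
    return [0.0, -0.5 * B * x[2], 0.5 * B * x[1], 0.0]


f = field_strength(a_example, [0.0, 1.2, -0.7, 0.5])
print(f[1][2], f[2][1])  # approximately B and -B
```

The result is antisymmetric, $F_{\mu\nu}=-F_{\nu\mu}$, by construction, which is why its diagonal entries vanish.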

The concepts we have discussed here can be used to derive the theory of quantum electrodynamics simply from requiring that the Lagrangian (from which we can obtain the equations of motion, see also Lagrangians and Hamiltonians) be invariant under $\text{U}(1)$ transformations, i.e. even if we change the “phase” of the wave function at every point, the Lagrangian remains the same. This is an example of what is known as gauge symmetry. Generalized to other groups such as $\text{SU}(2)$ and $\text{SU}(3)$, this is the idea behind gauge theories, which include Yang-Mills theory and lead to the standard model of particle physics.

References:

Christoffel Symbols on Wikipedia

Riemannian Curvature Tensor on Wikipedia

Einstein Field Equations on Wikipedia

Gauge Theory on Wikipedia

Riemann Tensor for Surface of a Sphere on Physics Pages

Ricci Tensor and Curvature Scalar for a Sphere on Physics Pages

Spacetime and Geometry by Sean Carroll

Geometry, Topology, and Physics by Mikio Nakahara

Introduction to Elementary Particle Physics by David J. Griffiths

Introduction to Quantum Field Theory by Michael Peskin and Daniel V. Schroeder

Geometry on Curved Spaces

Differential geometry is the branch of mathematics used by Albert Einstein when he formulated the general theory of relativity, where gravity is the curvature of spacetime. It was originally invented by Carl Friedrich Gauss to study the curvature of hills and valleys in the Kingdom of Hanover.

From what I described, one may guess that differential geometry has something to do with curvature. The geometry we learn in high school only occurs on a flat surface. There we can put coordinates $x$ and $y$ and compute distances, angles, areas, and so on.

To imagine what geometry on curved spaces looks like, imagine a globe. Instead of $x$ and $y$ coordinates, we can use latitude and longitude. One can now see just how different geometry is on this globe. Vertical lines (the lines of constant $x$) on a flat surface are always the same distance apart. On a globe, the analogues of these vertical lines, the lines of constant longitude, are closer near the poles than they are near the equator.

Other weird things happen on our globe: one can have triangles with angles that sum to more than 180 degrees. Run two perpendicular line segments from the north pole to the equator. They will meet the equator at right angles and form a triangle with three right angles, for a total of 270 degrees. Also, on the globe the ratio of the circumference of a circle to its diameter might no longer be equal to the number $\pi$.

To make things more explicit, we will introduce the concept of a metric (the word “metric” refers to a variety of mathematical concepts related to the notion of distance – in this post we use it in the sense of differential geometry to refer to what is also called the metric tensor). The metric is an example of a mathematical object called a tensor, which we will not discuss in much detail in this post. Instead, we will think of the metric as expressing a kind of “distance formula” for our space, which may be curved. The part of differential geometry that makes use of the metric is called Riemannian geometry, named after the mathematician Bernhard Riemann, a student of Gauss who extended his results on curved spaces to higher dimensions.

We recall from From Pythagoras to Einstein several important versions of the “distance formula”, from the case of 2D space to the case of 4D spacetime. We will focus on the simple case of 2D space in this post, since it is much easier to visualize; in fact, we have already given an example of a 2D space earlier, the globe, which we shall henceforth technically refer to as the $2$-sphere. As we have learned in From Pythagoras to Einstein, knowledge of the simplest cases can go very far toward understanding more complicated ones.

We will make a little change in our notation so as to stay consistent with the literature. Instead of the latitude, we will make use of the colatitude, written using the symbol $\theta$, and defined as the complementary angle to the latitude, i.e. the colatitude is 90 degrees minus the latitude. We will keep using the longitude, and we write it using the symbol $\varphi$. Note that even though we colloquially express our angles in degrees, for calculations we will always use radians, as is usual practice in mathematics and physics.

On a flat 2D space, the distance formula is given by

$\displaystyle (\Delta x)^{2}+(\Delta y)^{2}=(\Delta s)^{2}$.

It will be productive for us to work with extremely small quantities for now; from them we can obtain larger quantities later on using the language of calculus (see An Intuitive Introduction to Calculus). Adopting the notation of this language, we write

$\displaystyle (dx)^{2}+(dy)^{2}=(ds)^{2}$

We now give the distance formula for a $2$-sphere:

$\displaystyle R^{2}(d\theta)^{2}+R^{2}\text{sin}(\theta)^{2}(d\varphi)^{2}=(ds)^{2}$

where $R$ is the radius of the $2$-sphere. This formula agrees with our intuition: the same difference in latitude and longitude results in a bigger distance on a bigger $2$-sphere than on a smaller one, and the same difference in longitude results in a bigger distance for points near the equator than for points near the poles.
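To connect this formula with the earlier remark about circles on the globe, here is a sketch that adds up many small distances $ds$ along a curve using the $2$-sphere distance formula. It measures a circle of constant colatitude $\theta_{0}$ centered on the north pole, with its radius measured on the sphere along a meridian, and reports the ratio of circumference to diameter. The function names, step count, and sample circles are hypothetical.

```python
import math


def sphere_curve_length(theta_of_t, phi_of_t, r, steps=10_000):
    """Length of a curve on a sphere of radius r, with t running from 0 to 1.

    Sums ds = sqrt(R^2 dtheta^2 + R^2 sin(theta)^2 dphi^2) over small steps.
    """
    total = 0.0
    for k in range(steps):
        t0, t1 = k / steps, (k + 1) / steps
        theta_mid = theta_of_t((t0 + t1) / 2)
        dtheta = theta_of_t(t1) - theta_of_t(t0)
        dphi = phi_of_t(t1) - phi_of_t(t0)
        total += math.sqrt((r * dtheta) ** 2 + (r * math.sin(theta_mid) * dphi) ** 2)
    return total


def circle_ratio(theta0, r=1.0):
    """Circumference / diameter for the circle of constant colatitude theta0."""
    circumference = sphere_curve_length(lambda t: theta0, lambda t: 2 * math.pi * t, r)
    # Radius measured on the sphere: along a meridian from the north pole down to theta0.
    radius = sphere_curve_length(lambda t: theta0 * t, lambda t: 0.0, r)
    return circumference / (2 * radius)


print(circle_ratio(math.pi / 2))  # the equator: 2 pi R / (pi R), close to 2
print(circle_ratio(0.01))         # a tiny circle: slightly less than pi
```

Small circles behave almost as on a flat surface, while larger ones have a ratio noticeably smaller than $\pi$; for this family of circles the ratio works out to $\pi\sin(\theta_{0})/\theta_{0}$.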

The idea behind the concept of the metric is that it gives how the distance formula changes depending on the coordinates. It is often written as a matrix (see Matrices) whose entries are the “coefficients” of the distance formula. Hence, for a flat 2D space it is given by

$\displaystyle \left(\begin{array}{cc}1&0\\ 0&1\end{array}\right)$

while for a $2$-sphere it is given by

$\displaystyle \left(\begin{array}{cc}R^{2}&0\\ 0&R^{2}\text{sin}(\theta)^{2}\end{array}\right)$.

We have seen that the metric can express how a space is curved. There are several other quantities related to the metric (and which can be derived from it), such as the Christoffel symbol and the Riemann curvature tensor, which express ideas related to curvature – however, unlike the metric which expresses curvature in terms of the distance formula, the Christoffel symbol and the Riemann curvature tensor express curvature in terms of how vectors (see Vector Fields, Vector Bundles, and Fiber Bundles) change as they move around the space.

The main equations of Einstein’s general theory of relativity, called the Einstein equations, relate the curvature of 4D spacetime, expressed through quantities derived from the Riemann curvature tensor, to the distribution of mass (or, more properly, the distribution of energy and momentum), expressed via the so-called energy-momentum tensor (also known as the stress-energy tensor).

The application of differential geometry is not limited to general relativity of course, and its objects of study are not limited to the metric. For example, in particle physics, gauge theories such as electrodynamics (see Some Basics of (Quantum) Electrodynamics) use the language of differential geometry to express forces like the electromagnetic force as a kind of “curvature”, even though a metric is not used to express this more “abstract” kind of curvature. Instead, a generalization of the concept of “parallel transport” is used. Parallel transport is the idea behind objects like the Christoffel symbol and the Riemann curvature tensor – it studies how vectors change as they move around the space. To generalize this, we replace vector bundles by more general fiber bundles (see Vector Fields, Vector Bundles, and Fiber Bundles).

To give a rough idea of parallel transport, we give a simple example again in 2D space – this 2D space will be the surface of our planet. Now space itself is 3D (with time it forms a 4D spacetime). But we will ignore the up/down dimension for now and focus only on the north/south and east/west dimensions. In other words, we will imagine ourselves as 2D beings, like the characters in the novel Flatland by Edwin Abbott. The discussion below will not make references to the third up/down dimension.

Imagine that you are somewhere on the Equator, holding a spear straight in front of you, facing north. Now imagine you take a step forward with this spear. The spear will therefore remain parallel to its previous direction. You take another step, and another, walking forward (ignoring obstacles and bodies of water) until you reach the North Pole. Now at the North Pole, without turning, you take a step to the right. The spear is still parallel to its previous direction, because you did not turn. You just keep stepping to the right until you reach the Equator again. You are not at your previous location, of course. To go back you need to walk backwards, which once again keeps the spear parallel to its previous direction.

When you finally come back to your starting location, you will find that you are not facing the same direction as when you first started. In fact, you (and the spear) will be facing the east, which is offset by 90 degrees clockwise from the direction you were facing at the beginning, despite the fact that you were keeping the spear parallel all the time.

This would not have happened on a flat space; this “turning” is an indicator that the space (the surface of our planet) is curved. The amount of turning depends, among other things, on the curvature of the space. Hence the idea of parallel transport gives us a way to actually measure this curvature. It is this idea, generalized to mathematical objects other than vectors, which leads to the abstract notion of curvature – it is a measure of the changes that occur in certain mathematical objects when you move around a space in a certain way, which would not have happened if you were on a flat space.

In closing, I would like to note that although differential geometry is probably most famous for its applications in physics (another interesting application in physics, by the way, is the so-called Berry’s phase in quantum mechanics), it is by no means limited to these applications alone, as already reflected in its historical origins, which barely have anything to do with physics. It has even found applications in number theory, via Arakelov theory. Still, it has an especially important role in physics, with much of modern physics written in its language, and many prospects for future theories depending on it. Whether in pure mathematics or theoretical physics, it is one of the most fruitful and active fields of research in modern times.

Bonus:

Since we have restricted ourselves to 2D spaces in this post, here is an example of a metric in 4D spacetime – this is the Schwarzschild metric, which describes the curved spacetime around objects like stars or black holes (it makes use of spherical polar coordinates):

$\displaystyle \left(\begin{array}{cccc}-(1-\frac{2GM}{rc^{2}})&0&0&0\\0&(1-\frac{2GM}{rc^{2}})^{-1}&0&0\\0&0&r^{2}&0\\ 0&0&0&r^{2}\text{sin}(\theta)^{2}\end{array}\right)$

In other words, the “infinitesimal distance formula” for this curved spacetime is given by

$\displaystyle -(1-\frac{2GM}{rc^{2}})(d(ct))^{2}+(1-\frac{2GM}{rc^{2}})^{-1}(dr)^{2}+r^{2}(d\theta)^{2}+r^{2}\text{sin}(\theta)^{2}(d\varphi)^{2}=(ds)^{2}$

where $G$ is the gravitational constant and $M$ is the mass of the central object (the star or black hole). Note also that, as a matter of convention, the time coordinate is “scaled” by the constant $c$ (the speed of light in a vacuum).
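To get a feeling for how far this metric is from flat spacetime in everyday conditions, here is a sketch that evaluates its diagonal entries at the surface of the Earth, treating the Earth as the central mass (an assumption made for illustration). The constants are rounded textbook values, and the function name is hypothetical.

```python
import math

# Rounded SI values, used here only for illustration.
G = 6.674e-11       # gravitational constant, m^3 kg^-1 s^-2
C = 2.998e8         # speed of light in a vacuum, m/s
M_EARTH = 5.972e24  # mass of the Earth, kg
R_EARTH = 6.371e6   # mean radius of the Earth, m


def schwarzschild_diagonal(r, theta, mass):
    """Diagonal entries of the Schwarzschild metric in (ct, r, theta, phi) coordinates."""
    f = 1 - 2 * G * mass / (r * C ** 2)
    return [-f, 1 / f, r ** 2, (r * math.sin(theta)) ** 2]


g = schwarzschild_diagonal(R_EARTH, math.pi / 2, M_EARTH)
print(1 + g[0])  # how far g_00 is from the flat-spacetime value -1
```

The printed value is roughly $1.4\times 10^{-9}$, so at the surface of the Earth the time-time entry of this metric differs from the Minkowski value $-1$ by only about one part in a billion.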


Some Basics of (Quantum) Electrodynamics

There are only four fundamental forces as far as we know, and every force that we know of can ultimately be considered as manifestations of these four. These four are electromagnetism, the weak nuclear force, the strong nuclear force, and gravity. Among them, the one we are most familiar with is electromagnetism, both in terms of our everyday experience (where it is somewhat on par with gravity) and in terms of our physical theories (where our understanding of electrodynamics is far ahead of our understanding of the other three forces, including, and especially, gravity).

Electromagnetism is dominant in everyday life because the weak and strong nuclear forces have a very short range, and because gravity is very weak. Now gravity doesn’t seem weak at all, especially if we have ever fallen on our face. But that’s only because the “source” of this gravity, our planet, is very large. Imagine instead a small pocket-sized magnet lifting, say, an iron nail against the force exerted by the Earth’s gravity. This shows how much stronger the electromagnetic force is compared to gravity. Maybe we should be thankful that gravity is not on the same level of strength, or falling on our face would be much more painful.

It is also important to note that atoms, which make up everyday matter, are themselves made up of charged particles: electrons and protons (there are also neutrons, which are uncharged). Electromagnetism therefore plays an important part, not only in keeping the “parts” of an atom together, but also in “joining” different atoms together to form molecules and other larger structures like crystals. It might be gravity that keeps our planet together, but for less massive objects like a house, a car, or a human body, it is electromagnetism that keeps them from falling apart.

Aside from electromagnetism being the one fundamental force we are most familiar with, another reason to study it is that it is the “template” for our understanding of the rest of the fundamental forces, including gravity. In Einstein’s general theory of relativity, gravity is the curvature of spacetime; it appears that this gives it a nature different from the other fundamental forces. But even then, the expression for this curvature, in terms of the Riemann curvature tensor, is very similar in form to the equation for the electromagnetic fields in terms of the field strength tensor.

The electromagnetic fields, which we shall divide into the electric field and the magnetic field, are vector fields (see Vector Fields, Vector Bundles, and Fiber Bundles), which means that they have a value (both magnitude and direction) at every point in space. A charged particle in an electric or magnetic field (or both) will experience a force according to the Lorentz force law:

$\displaystyle F_{x}=q\left(E_{x}+\frac{1}{c}(v_{y}B_{z}-v_{z}B_{y})\right)$

$\displaystyle F_{y}=q\left(E_{y}+\frac{1}{c}(v_{z}B_{x}-v_{x}B_{z})\right)$

$\displaystyle F_{z}=q\left(E_{z}+\frac{1}{c}(v_{x}B_{y}-v_{y}B_{x})\right)$

where $F_{x}$, $F_{y}$, and $F_{z}$ are the three components of the force in the $x$, $y$, and $z$ directions, respectively; $E_{x}$, $E_{y}$, and $E_{z}$ are the three components of the electric field; $B_{x}$, $B_{y}$, and $B_{z}$ are the three components of the magnetic field; $v_{x}$, $v_{y}$, and $v_{z}$ are the three components of the velocity of the particle; $q$ is its charge; and $c$ is the speed of light. We use Gaussian units throughout, which is why the factor $\frac{1}{c}$ appears here and in the equations for the potentials below (in SI units it is absent from the force law). Newton’s second law (see My Favorite Equation in Physics) gives us the motion of an object given the force acting on it (and its mass), so together with the Lorentz force law, we can determine the motion of charged particles in electric and magnetic fields.
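As a quick numerical illustration, here is a minimal Python sketch that evaluates the three component equations of the Lorentz force law in the Gaussian-unit form, with a factor $\frac{1}{c}$ on the magnetic term (for SI units, drop that factor). The function name lorentz_force and the default c=1.0 are choices made only for this illustration.

```python
def lorentz_force(q, E, B, v, c=1.0):
    """Force on a charge q with velocity v in fields E and B (Gaussian units).

    E, B and v are 3-tuples (x, y, z); c is the speed of light in the chosen units.
    """
    Ex, Ey, Ez = E
    Bx, By, Bz = B
    vx, vy, vz = v
    # Components of F = q (E + (v x B) / c), written out as in the text
    Fx = q * (Ex + (vy * Bz - vz * By) / c)
    Fy = q * (Ey + (vz * Bx - vx * Bz) / c)
    Fz = q * (Ez + (vx * By - vy * Bx) / c)
    return (Fx, Fy, Fz)


# A positive charge moving along x through a magnetic field along z
print(lorentz_force(q=1.0, E=(0.0, 0.0, 0.0), B=(0.0, 0.0, 1.0), v=(1.0, 0.0, 0.0)))
# prints (0.0, -1.0, 0.0)
```

The example shows the sideways push of the magnetic term: the charge moving along $x$ through a field along $z$ is deflected towards $-y$, perpendicular to both its velocity and the field.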

The Lorentz force law is extremely important in electrodynamics and we will keep the following point in mind throughout this discussion:

The Lorentz force law tells us how charges move under the influence of electric and magnetic fields.

Instead of discussing electrodynamics in terms of these fields, however, we will focus on the electric and magnetic potentials, which together form what is called the four-potential and are related to the fields by the following equations:

$\displaystyle E_{x}=-\frac{1}{c}\frac{\partial A_{x}}{\partial t}-\frac{\partial A_{t}}{\partial x}$

$\displaystyle E_{y}=-\frac{1}{c}\frac{\partial A_{y}}{\partial t}-\frac{\partial A_{t}}{\partial y}$

$\displaystyle E_{z}=-\frac{1}{c}\frac{\partial A_{z}}{\partial t}-\frac{\partial A_{t}}{\partial z}$

$\displaystyle B_{x}=\frac{\partial A_{z}}{\partial y}-\frac{\partial A_{y}}{\partial z}$

$\displaystyle B_{y}=\frac{\partial A_{x}}{\partial z}-\frac{\partial A_{z}}{\partial x}$

$\displaystyle B_{z}=\frac{\partial A_{y}}{\partial x}-\frac{\partial A_{x}}{\partial y}$
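These relations can be checked numerically. The following Python sketch, a hypothetical helper written only for illustration, approximates each partial derivative by a central difference and assembles $E$ and $B$ at a single event from a user-supplied four-potential A(t, x, y, z) that returns $(A_{t},A_{x},A_{y},A_{z})$. It assumes the same Gaussian-unit form as the equations above, with $c$ set to 1 by default.

```python
def fields_from_potentials(A, t, x, y, z, c=1.0, h=1e-5):
    """Numerically evaluate E and B from the four-potential at one event.

    A(t, x, y, z) must return the tuple (A_t, A_x, A_y, A_z).
    Partial derivatives are approximated by central differences with step h.
    """
    def d(component, axis):
        # Central difference of one potential component along one coordinate
        plus = [t, x, y, z]
        minus = [t, x, y, z]
        plus[axis] += h
        minus[axis] -= h
        return (A(*plus)[component] - A(*minus)[component]) / (2 * h)

    T, X, Y, Z = 0, 1, 2, 3  # indices for (t, x, y, z) and for (A_t, A_x, A_y, A_z)
    E = (
        -d(X, T) / c - d(T, X),  # E_x = -(1/c) dA_x/dt - dA_t/dx
        -d(Y, T) / c - d(T, Y),
        -d(Z, T) / c - d(T, Z),
    )
    B = (
        d(Z, Y) - d(Y, Z),  # B_x = dA_z/dy - dA_y/dz
        d(X, Z) - d(Z, X),
        d(Y, X) - d(X, Y),
    )
    return E, B


E0, B0 = 2.0, 3.0


def uniform_fields(t, x, y, z):
    # A_t = -E0 x gives E = (E0, 0, 0); (A_x, A_y, A_z) = (B0/2)(-y, x, 0) gives B = (0, 0, B0)
    return (-E0 * x, -0.5 * B0 * y, 0.5 * B0 * x, 0.0)


E, B = fields_from_potentials(uniform_fields, t=0.0, x=0.3, y=-0.7, z=1.1)
print(E, B)
```

With these potentials the printed fields should be close to a uniform electric field $(E_{0},0,0)$ and a uniform magnetic field $(0,0,B_{0})$, up to rounding in the finite differences.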

The values of the potentials, as functions of space and time, are related to the distribution of charges and currents by the very famous set of equations called Maxwell’s equations:

$\displaystyle -\frac{\partial^{2} A_{t}}{\partial x^{2}}-\frac{\partial^{2} A_{t}}{\partial y^{2}}-\frac{\partial^{2} A_{t}}{\partial z^{2}}-\frac{1}{c}\frac{\partial^{2} A_{x}}{\partial t\partial x}-\frac{1}{c}\frac{\partial^{2} A_{y}}{\partial t\partial y}-\frac{1}{c}\frac{\partial^{2} A_{z}}{\partial t\partial z}=\frac{4\pi}{c}J_{t}$

$\displaystyle \frac{1}{c^{2}}\frac{\partial^{2} A_{x}}{\partial t^{2}}-\frac{\partial^{2} A_{x}}{\partial y^{2}}-\frac{\partial^{2} A_{x}}{\partial z^{2}}+\frac{1}{c}\frac{\partial^{2} A_{t}}{\partial x\partial t}+\frac{\partial^{2} A_{y}}{\partial x\partial y}+\frac{\partial^{2} A_{z}}{\partial x\partial z}=\frac{4\pi}{c}J_{x}$

$\displaystyle -\frac{\partial^{2} A_{y}}{\partial x^{2}}+\frac{1}{c^{2}}\frac{\partial^{2} A_{y}}{\partial t^{2}}-\frac{\partial^{2} A_{y}}{\partial z^{2}}+\frac{\partial^{2} A_{x}}{\partial y\partial x}+\frac{1}{c}\frac{\partial^{2} A_{t}}{\partial t\partial y}+\frac{\partial^{2} A_{z}}{\partial y\partial z}=\frac{4\pi}{c}J_{y}$

$\displaystyle -\frac{\partial^{2} A_{z}}{\partial x^{2}}-\frac{\partial^{2} A_{z}}{\partial y^{2}}+\frac{1}{c^{2}}\frac{\partial^{2} A_{z}}{\partial t^{2}}+\frac{\partial^{2} A_{x}}{\partial z\partial x}+\frac{\partial^{2} A_{y}}{\partial z\partial y}+\frac{1}{c}\frac{\partial^{2} A_{t}}{\partial z\partial t}=\frac{4\pi}{c}J_{z}$

Some readers may be more familiar with Maxwell’s equations written in terms of the electric and magnetic fields; in that case, they have individual names: Gauss’ law, Gauss’ law for magnetism, Faraday’s law, and Ampere’s law (with Maxwell’s addition). When written down in terms of the fields, they can offer more physical intuition – for instance, Gauss’ law for magnetism tells us that the magnetic field has no “divergence”, and is always “solenoidal”. However, we leave this approach to the references for the moment, and focus on the potentials, which will be more useful for us when we relate our discussion to quantum mechanics later on. We will, however, always remind ourselves of the following important point:

Maxwell’s equations tell us the configuration and evolution of the electric and magnetic fields (possibly via the potentials) under the influence of sources (charge and current distributions).

There is one catch (an extremely interesting one) that comes about when dealing with potentials instead of fields. It is called gauge freedom, and it is one of the foundations of modern particle physics. However, we will not discuss it in this post. Our equations will remain correct, so the reader need not worry; gauge freedom is not a constraint, but is instead a kind of “symmetry” that will have some very interesting consequences. This concept is left to the references for now; however, it is hoped that it will be discussed in this blog at some point.

The way we have written down Maxwell’s equations is rather messy. However, we can introduce some notation to write them in a more elegant form. We use what is known as tensor notation; however, we will not discuss the concept of tensors in full here. We will just note that because the formula for the spacetime interval contains a sign different from the others, we need two different types of indices for our vectors. The so-called contravariant vectors will be indexed by a superscript, while the so-called covariant vectors will be indexed by a subscript. “Raising” and “lowering” these indices will involve a change in sign for some quantities; we will indicate them explicitly here.

Let $x^{0}=ct$, $x^{1}=x$, $x^{2}=y$, and $x^{3}=z$. Then we will adopt the following notation:

$\displaystyle \partial_{\mu}=\frac{\partial}{\partial x^{\mu}}$

$\displaystyle \partial^{\mu}=\frac{\partial}{\partial x^{\mu}}$ for $\mu=0$

$\displaystyle \partial^{\mu}=-\frac{\partial}{\partial x^{\mu}}$ for $\mu\neq 0$

Let $A^{0}=A_{t}$, $A^{1}=A_{x}$, $A^{2}=A_{y}$, and $A^{3}=A_{z}$. Then Maxwell’s equations can be written as

$\displaystyle \sum_{\mu=0}^{3}\partial_{\mu}(\partial^{\mu}A^{\nu}-\partial^{\nu}A^{\mu})=\frac{4\pi}{c}J^{\nu}$.

We now introduce the so-called Einstein summation convention. Note that the summation is performed over the index that is repeated, and that one of these indices is a superscript while the other is a subscript. Albert Einstein noticed that almost all summations in his calculations happen in this way, so he adopted the convention that instead of explicitly writing out the summation sign, repeated indices (one superscript and one subscript) would indicate that a summation should be performed. Like most modern references, we adopt this notation, and only explicitly say so when there is an exception. This allows us to write Maxwell’s equations as

$\displaystyle \partial_{\mu}(\partial^{\mu}A^{\nu}-\partial^{\nu}A^{\mu})=\frac{4\pi}{c}J^{\nu}$.
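To make the index bookkeeping concrete, here is a small Python sketch, assuming the metric $\mathrm{diag}(1,-1,-1,-1)$ that corresponds to the sign rule for $\partial^{\mu}$ given above. Lowering an index flips the sign of the three spatial components, and a repeated upper/lower index pair is simply a sum over $\mu=0,1,2,3$. The helper names lower and contract are hypothetical.

```python
# Minkowski metric with signature (+, -, -, -): no sign change for index 0,
# a sign change for indices 1, 2, 3, matching the rule for the derivatives above
ETA = [[1, 0, 0, 0],
       [0, -1, 0, 0],
       [0, 0, -1, 0],
       [0, 0, 0, -1]]


def lower(v_upper):
    """Lower an index: v_mu = eta_{mu nu} v^nu, summed over nu."""
    return [sum(ETA[mu][nu] * v_upper[nu] for nu in range(4)) for mu in range(4)]


def contract(a_upper, b_lower):
    """a^mu b_mu: the repeated index mu is summed, as in the Einstein convention."""
    return sum(a_upper[mu] * b_lower[mu] for mu in range(4))


A_upper = [1.0, 2.0, 3.0, 4.0]      # (A^0, A^1, A^2, A^3)
A_lower = lower(A_upper)            # (A_0, A_1, A_2, A_3) = (1, -2, -3, -4)
print(contract(A_upper, A_lower))   # 1 - 4 - 9 - 16, prints -28.0
```

The same contraction with a vector such as $(2,0,0,2)$ gives zero, which is how a “light-like” four-vector shows up in this notation.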

We can also use the Einstein summation convention to rewrite other important expressions in physics in more compact form. In particular, it allows us to rewrite the Dirac equation (see Some Basics of Relativistic Quantum Field Theory) as follows:

$\displaystyle i\hbar\gamma^{\mu}\partial_{\mu}\psi-mc\psi=0$

We now go to the quantum realm and discuss the equations of motion of quantum electrodynamics. Let $A_{0}=A_{t}$, $A_{1}=-A_{x}$, $A_{2}=-A_{y}$, and $A_{3}=-A_{z}$. These equations are given by

$\displaystyle i\hbar\gamma^{\mu}\partial_{\mu}\psi-mc\psi=\frac{q}{c}\gamma^{\mu}A_{\mu}\psi$

$\displaystyle \partial_{\mu}(\partial^{\mu}A^{\nu}-\partial^{\nu}A^{\mu})=4\pi q\bar{\psi}\gamma^{\nu}\psi$

What do these two equations mean?

The first equation looks like the Dirac equation, except that on the right-hand side we have a term involving the “potential” (which we now call the Maxwell field, or the Maxwell field operator), the Dirac “wave function” for a particle such as an electron (which, as we have discussed in Some Basics of Relativistic Quantum Field Theory, is actually the Dirac field operator, which acts on the “vacuum” state to describe a state with a single electron), and the charge. It describes the “motion” of the Dirac field under the influence of the Maxwell field. Hence, this is the quantum mechanical version of the Lorentz force law.

The second equation is none other than our shorthand version of Maxwell’s equations, and on the right-hand side is an explicit expression for the current in terms of the Dirac field and some constants. The symbol $\bar{\psi}$ refers to the “adjoint” of the Dirac field. The Dirac field itself has four components, although, because of the way it transforms under rotations, we usually do not refer to it as a vector. Hence it can be written as a column matrix (see Matrices), and has a “transpose”, which is a row matrix; its “conjugate transpose” $\psi^{\dagger}$ is the row matrix whose entries are the complex conjugates of the entries of the transpose. The Dirac adjoint is this conjugate transpose multiplied on the right by the first gamma matrix, $\bar{\psi}=\psi^{\dagger}\gamma^{0}$.
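Here is a minimal Python sketch of the Dirac adjoint, assuming the standard definition $\bar{\psi}=\psi^{\dagger}\gamma^{0}$ and the Dirac-representation $\gamma^{0}=\mathrm{diag}(1,1,-1,-1)$ (written out in Some Basics of Relativistic Quantum Field Theory). Because $(\gamma^{0})^{2}$ is the identity, the time component $\bar{\psi}\gamma^{0}\psi$ equals $\psi^{\dagger}\psi$, the sum of the squared magnitudes of the components, which is never negative. The helper names are hypothetical, and the spinor is an arbitrary example rather than a solution of the Dirac equation.

```python
# gamma^0 in the Dirac representation
GAMMA0 = [[1, 0, 0, 0],
          [0, 1, 0, 0],
          [0, 0, -1, 0],
          [0, 0, 0, -1]]


def dirac_adjoint(psi):
    """Row vector psi-bar = psi^dagger gamma^0 for a 4-component spinor psi."""
    psi_dagger = [z.conjugate() for z in psi]
    return [sum(psi_dagger[i] * GAMMA0[i][j] for i in range(4)) for j in range(4)]


def bilinear(row, matrix, col):
    """row . matrix . col for a 4x4 matrix given as nested lists."""
    return sum(row[i] * matrix[i][j] * col[j] for i in range(4) for j in range(4))


psi = [1 + 0j, 1j, 0j, 2 + 0j]
density = bilinear(dirac_adjoint(psi), GAMMA0, psi)
print(density.real)  # |1|^2 + |i|^2 + 0 + |2|^2, prints 6.0
```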

In general relativity there is a famous quote from the physicist John Archibald Wheeler: “Spacetime tells matter how to move; matter tells spacetime how to curve”. One can perhaps think of electrodynamics, whether classical or quantum, in a similar way: fields tell charges and currents how to move, and charges and currents tell fields how they are supposed to be “shaped”. This is succinctly summarized by the Lorentz force law and Maxwell’s equations, again in either their classical or quantum versions.

As we have seen in Lagrangians and Hamiltonians, the equations of motion are not the only way we can express a physical theory. We can also use the language of Lagrangians and Hamiltonians. In particular, an important quantity in quantum mechanics that involves the Lagrangian and Hamiltonian is the probability amplitude. In order to calculate the probability amplitude, the physicist Richard Feynman developed a method involving the now-famous Feynman diagrams, which can be thought of as expanding the exponential function (see “The Most Important Function in Mathematics”) in the expression for the probability amplitude and expressing the different terms using diagrams. Just as we have associated the Dirac field with electrons, the Maxwell field is similarly associated with photons. Expressions involving the Dirac field and the Maxwell field can be thought of as electrons “emitting” or “absorbing” photons, or electrons and positrons (the antimatter counterpart of electrons) annihilating each other and creating a photon. The calculated probability amplitudes can then be used to obtain quantities that can be compared to results obtained from experiment, in order to verify the theory.

References:

Lorentz Force on Wikipedia

Electromagnetic Four-Potential on Wikipedia

Maxwell’s Equations on Wikipedia

Quantum Electrodynamics on Wikipedia

Featured Image Produced by CERN

The Douglas Robb Memorial Lectures by Richard Feynman

QED: The Strange Theory of Light and Matter by Richard Feynman

Introduction to Electrodynamics by David J. Griffiths

Introduction to Elementary Particle Physics by David J. Griffiths

Quantum Field Theory by Fritz Mandl and Graham Shaw

Introduction to Quantum Field Theory by Michael Peskin and Daniel V. Schroeder

Some Basics of Relativistic Quantum Field Theory

So far, on this blog, we have introduced the two great pillars of modern physics, relativity (see From Pythagoras to Einstein) and quantum mechanics (see Some Basics of Quantum Mechanics and More Quantum Mechanics: Wavefunctions and Operators). Although a complete unification between these two pillars is yet to be achieved, there already exists such a unified theory in the special case when gravity is weak, i.e. spacetime is flat. This unification of relativity (in this case special relativity) and quantum mechanics is called relativistic quantum field theory, and we discuss the basic concepts of it in this post.

In From Pythagoras to Einstein, we introduced the formula at the heart of Einstein’s theory of relativity. It is very important to modern physics and is worth writing here again:

$\displaystyle -(c\Delta t)^2+(\Delta x)^2+(\Delta y)^2+(\Delta z)^2=(\Delta s)^2$

This holds only for flat spacetime; however, even in general relativity, where spacetime may be curved, a “local” version still holds:

$\displaystyle -(cdt)^2+(dx)^2+(dy)^2+(dz)^2=(ds)^2$

The notation comes from calculus (see An Intuitive Introduction to Calculus), and means that this equation holds when the quantities involved are very small.

In this post, however, we shall consider only flat spacetime. Aside from this being “locally” true, as far as we know, spacetime is very nearly flat in regions where gravity is not very strong (like on our planet).

We recall how we obtained the important equation above; we made an analogy with the distance between two objects in 3D space, and noted how this distance does not change with translation and rotation; if we are using different coordinate systems, we may disagree about the coordinates of the two objects, but even then we will always agree on the distance between them. This distance is therefore “invariant”. But we live not only in a 3D space but in a 4D spacetime, and instead of an invariant distance we have an invariant spacetime interval.
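The invariance of the interval can be checked numerically. This Python sketch assumes the standard Lorentz boost along $x$ with speed $v$, namely $\Delta t'=\gamma(\Delta t-v\Delta x/c^{2})$ and $\Delta x'=\gamma(\Delta x-v\Delta t)$ with $\gamma=1/\sqrt{1-v^{2}/c^{2}}$, a formula that is not derived in this post. The two observers disagree about $\Delta t$ and $\Delta x$ but agree on $(\Delta s)^{2}$. The function names and the sample event are hypothetical, and $c=1$ by default.

```python
import math


def interval_squared(dt, dx, dy, dz, c=1.0):
    """(Delta s)^2 = -(c Delta t)^2 + (Delta x)^2 + (Delta y)^2 + (Delta z)^2."""
    return -(c * dt) ** 2 + dx ** 2 + dy ** 2 + dz ** 2


def boost_x(dt, dx, dy, dz, v, c=1.0):
    """Coordinate differences seen by an observer moving at speed v along x."""
    gamma = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    return (gamma * (dt - v * dx / c ** 2), gamma * (dx - v * dt), dy, dz)


event = (2.0, 1.0, 0.5, -0.3)        # (Delta t, Delta x, Delta y, Delta z)
moved = boost_x(*event, v=0.6)       # (1.75, -0.25, 0.5, -0.3) up to rounding
print(interval_squared(*event), interval_squared(*moved))  # both about -2.66
```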

But even in nonrelativistic mechanics, the distance is not the only “invariant”. We have the concept of velocity of an object. Again, if we are positioned and oriented differently in space, we may disagree about the velocity of the object: for me it may be going to the right and forward, away from me; for you it may be in front of you and going straight towards you. However, we will always agree about the magnitude of this velocity, also called its speed.

The quantity we call the momentum is related to the velocity of the object; in fact for simple cases it is simply the mass of the object multiplied by the velocity. Once again, two observers may disagree about the momentum, since it involves direction; however they will always agree about the magnitude of the momentum. This magnitude is therefore also invariant.

The velocity, and by extension the momentum, has three components, one for each dimension of space. We write them as $v_{x}$, $v_{y}$, and $v_{z}$ for the velocity and $p_{x}$, $p_{y}$, and $p_{z}$ for the momentum.

What we want now is a 4D version of the momentum. Three of its components will be the components we already know of, $p_{x}$, $p_{y}$, and $p_{z}$. So we just need its “time” component, and the “magnitude” of this momentum is going to be an invariant.

It turns out that the equation we are looking for is the following (note the similarity of its form to the equation for the spacetime interval):

$\displaystyle -\frac{E^{2}}{c^{2}}+p_{x}^{2}+p_{y}^{2}+p_{z}^{2}=-m^{2}c^{2}$

The quantity $m$ is the invariant we are looking for (the factors of $c$ are just constants anyway), and it is called the “rest mass” of the object. As an effect of the unity of spacetime, the mass of an object as seen by an observer actually changes depending on its motion with respect to the observer; however, by definition, the rest mass is the mass of an object as seen by an observer with respect to whom it is not moving, and therefore it is an invariant. The quantity $E$ stands for the energy.

Also, when the object is not moving with respect to us, we see no momentum in the $x$, $y$, or $z$ direction, and the equation becomes $E=mc^{2}$, which is the very famous mass-energy equivalence which was published by Albert Einstein during his “miracle year” in 1905.
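Solving the energy-momentum relation for the energy (taking the positive root) gives $E=\sqrt{p^{2}c^{2}+m^{2}c^{4}}$, where $p$ is the magnitude of the momentum. The short Python sketch below, with hypothetical function names and $c=1$ by default, evaluates this and also recovers the rest mass from a given energy and momentum, which is the sense in which $m$ is the quantity every observer agrees on.

```python
import math


def energy(p, m, c=1.0):
    """Positive root of -E^2/c^2 + p^2 = -m^2 c^2, with p the momentum magnitude."""
    return math.sqrt((p * c) ** 2 + (m * c ** 2) ** 2)


def rest_mass(E, px, py, pz, c=1.0):
    """Invariant (rest) mass recovered from the energy and momentum components."""
    return math.sqrt(E ** 2 / c ** 2 - (px ** 2 + py ** 2 + pz ** 2)) / c


print(energy(p=0.0, m=2.0, c=3.0))  # at rest: m c^2, prints 18.0
```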

We now move on to quantum mechanics. In quantum mechanics our observables, such as the position, momentum, and energy, correspond to self-adjoint operators (see More Quantum Mechanics: Wavefunctions and Operators), whose eigenvalues are the values that we obtain when we perform a measurement of the observable corresponding to the operator.

The “momentum operator” (to avoid confusion between ordinary quantities and operators, we will introduce here the “hat” symbol on our operators) corresponding to the $x$ component of the momentum is given by

$\displaystyle \hat{p}_{x}=-i\hbar\frac{\partial}{\partial x}$

The eigenvalue equation means that when we measure the $x$ component of the momentum of a quantum system in the state represented by the wave function $\psi(x,y,z,t)$, which is an eigenvector of the momentum operator, the measurement will yield the value $p_{x}$, where $p_{x}$ is the eigenvalue corresponding to $\psi(x,y,z,t)$ (see Eigenvalues and Eigenvectors), i.e.

$\displaystyle -i\hbar\frac{\partial \psi(x,y,z,t)}{\partial x}=p_{x}\psi(x,y,z,t)$

Analogues exist of course for the $y$ and $z$ components of the momentum.
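For a concrete eigenvector, the plane wave $e^{ip_{x}x/\hbar}$ satisfies the eigenvalue equation above, since differentiating it brings down a factor $\frac{ip_{x}}{\hbar}$. The Python sketch below checks this numerically, assuming for simplicity a wave function of $x$ alone (the dependence on $y$, $z$, $t$ held fixed), units with $\hbar=1$, and a central difference in place of the exact derivative; the function names are hypothetical.

```python
import cmath


def momentum_operator_x(psi, x, hbar=1.0, h=1e-6):
    """Apply -i hbar d/dx to psi at the point x using a central difference."""
    return -1j * hbar * (psi(x + h) - psi(x - h)) / (2 * h)


p, hbar = 2.5, 1.0


def plane_wave(x):
    # exp(i p x / hbar) is an eigenfunction of the x-momentum operator
    return cmath.exp(1j * p * x / hbar)


x0 = 0.4
print(momentum_operator_x(plane_wave, x0, hbar) / plane_wave(x0))  # close to (2.5+0j)
```

For a function that is not an eigenfunction, such as a Gaussian $e^{-x^{2}}$, the same ratio changes from point to point instead of being a constant.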

Meanwhile, we also have an energy operator given by

$\displaystyle \hat{E}=i\hbar\frac{\partial}{\partial t}$

To obtain a quantum version of the important equation above relating the energy, momentum, and the mass, we need to replace the relevant quantities by the corresponding operators acting on the wave function. Therefore, from

$\displaystyle -\frac{E^{2}}{c^{2}}+p_{x}^{2}+p_{y}^{2}+p_{z}^{2}=-m^{2}c^{2}$

we obtain an equation in terms of operators

$\displaystyle -\frac{\hat{E}^{2}}{c^{2}}+\hat{p}_{x}^{2}+\hat{p}_{y}^{2}+\hat{p}_{z}^{2}=-m^{2}c^{2}$

or explicitly, with the wavefunction,

$\displaystyle \frac{\hbar^{2}}{c^{2}}\frac{\partial^{2}\psi}{\partial t^{2}}-\hbar^{2}\frac{\partial^{2}\psi}{\partial x^{2}}-\hbar^{2}\frac{\partial^{2}\psi}{\partial y^{2}}-\hbar^{2}\frac{\partial^{2}\psi}{\partial z^{2}}=-m^{2}c^{2}\psi$.

This equation is called the Klein-Gordon equation.
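Plane waves $\psi=e^{i(p_{x}x-Et)/\hbar}$ solve the Klein-Gordon equation exactly when $E$ and $p_{x}$ satisfy the energy-momentum relation, because each derivative brings down the corresponding eigenvalue. The Python sketch below, restricted to one spatial dimension and using units with $\hbar=c=1$, evaluates the residual of the equation with finite differences; the names and sample numbers are hypothetical.

```python
import cmath
import math


def klein_gordon_residual(psi, t, x, m, c=1.0, hbar=1.0, h=1e-3):
    """(hbar^2/c^2) psi_tt - hbar^2 psi_xx + m^2 c^2 psi at (t, x), in 1D.

    Second derivatives are approximated by central second differences.
    """
    psi_tt = (psi(t + h, x) - 2 * psi(t, x) + psi(t - h, x)) / h ** 2
    psi_xx = (psi(t, x + h) - 2 * psi(t, x) + psi(t, x - h)) / h ** 2
    return (hbar ** 2 / c ** 2) * psi_tt - hbar ** 2 * psi_xx + m ** 2 * c ** 2 * psi(t, x)


m, p, c = 1.0, 0.75, 1.0
E = math.sqrt((p * c) ** 2 + (m * c ** 2) ** 2)  # 1.25 for these numbers


def plane_wave(t, x):
    # hbar = 1: psi = exp(i (p x - E t)) with E from the energy-momentum relation
    return cmath.exp(1j * (p * x - E * t))


def wrong_wave(t, x):
    # Same momentum, but an energy that violates the energy-momentum relation
    return cmath.exp(1j * (p * x - 2.0 * t))


print(abs(klein_gordon_residual(plane_wave, 0.2, 0.3, m)))  # tiny (finite-difference error only)
print(abs(klein_gordon_residual(wrong_wave, 0.2, 0.3, m)))  # about 2.44
```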

The Klein-Gordon equation is a second-order differential equation. It can be “factored” in order to obtain two first-order differential equations, both of which are called the Dirac equation.

We elaborate more on what we mean by “factoring”. Suppose we have a quantity which can be written as $a^{2}-b^{2}$. From basic high school algebra, we know that we can “factor” it as $(a+b)(a-b)$. Now suppose we have $p_{x}=p_{y}=p_{z}=0$. We can then write the Klein-Gordon equation as

$\frac{E^{2}}{c^{2}}-m^{2}c^{2}=0$

which factors into

$(\frac{E}{c}-mc)(\frac{E}{c}+mc)=0$

or

$\frac{E}{c}-mc=0$

$\frac{E}{c}+mc=0$

These are the kinds of equations that we want. However, the case where the momentum is nonzero complicates things. The solution of the physicist Paul Dirac was to introduce matrices (see Matrices) as coefficients. These matrices (there are four of them) are $4\times 4$ matrices with complex coefficients, and are explicitly written down as follows:

$\displaystyle \gamma^{0}=\left(\begin{array}{cccc}1&0&0&0\\ 0&1&0&0\\0&0&-1&0\\0&0&0&-1\end{array}\right)$

$\displaystyle \gamma^{1}=\left(\begin{array}{cccc}0&0&0&1\\ 0&0&1&0\\0&-1&0&0\\-1&0&0&0\end{array}\right)$

$\displaystyle \gamma^{2}=\left(\begin{array}{cccc}0&0&0&-i\\ 0&0&i&0\\0&i&0&0\\-i&0&0&0\end{array}\right)$

$\displaystyle \gamma^{3}=\left(\begin{array}{cccc}0&0&1&0\\ 0&0&0&-1\\-1&0&0&0\\0&1&0&0\end{array}\right)$.

Using the laws of matrix multiplication, one can verify the following properties of these matrices (usually called gamma matrices), where $1$ denotes the $4\times 4$ identity matrix:

$(\gamma^{0})^{2}=1$

$(\gamma^{1})^{2}=(\gamma^{2})^{2}=(\gamma^{3})^{2}=-1$

$\gamma^{\mu}\gamma^{\nu}=-\gamma^{\nu}\gamma^{\mu}$ for $\mu\neq\nu$.
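These properties can be verified mechanically. The Python sketch below multiplies the gamma matrices written out above and checks that $\gamma^{\mu}\gamma^{\nu}+\gamma^{\nu}\gamma^{\mu}$ equals $2$ times the identity when $\mu=\nu=0$, $-2$ times the identity when $\mu=\nu\neq 0$, and zero when $\mu\neq\nu$ (distinct gamma matrices anticommute), which packages all three properties into a single statement. It uses plain nested lists rather than a matrix library, and the helper names are hypothetical.

```python
# Gamma matrices in the Dirac representation, as written out in the text
G0 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, -1, 0], [0, 0, 0, -1]]
G1 = [[0, 0, 0, 1], [0, 0, 1, 0], [0, -1, 0, 0], [-1, 0, 0, 0]]
G2 = [[0, 0, 0, -1j], [0, 0, 1j, 0], [0, 1j, 0, 0], [-1j, 0, 0, 0]]
G3 = [[0, 0, 1, 0], [0, 0, 0, -1], [-1, 0, 0, 0], [0, 1, 0, 0]]
GAMMAS = [G0, G1, G2, G3]
ETA = [1, -1, -1, -1]  # diagonal of the metric with signature (+, -, -, -)


def matmul(a, b):
    """Product of two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def anticommutator(a, b):
    """a b + b a."""
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] + ba[i][j] for j in range(4)] for i in range(4)]


def check_anticommutation():
    """True if gamma^mu gamma^nu + gamma^nu gamma^mu = 2 eta^{mu nu} times the identity."""
    for mu in range(4):
        for nu in range(4):
            expected = 2 * ETA[mu] if mu == nu else 0
            ac = anticommutator(GAMMAS[mu], GAMMAS[nu])
            for i in range(4):
                for j in range(4):
                    target = expected if i == j else 0
                    if abs(ac[i][j] - target) > 1e-12:
                        return False
    return True


print(check_anticommutation())  # True
```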

With the help of these properties, we can now factor the Klein-Gordon equation as follows:

$\displaystyle \frac{\hat{E}^{2}}{c^{2}}-\hat{p}_{x}^{2}-\hat{p}_{y}^{2}-\hat{p}_{z}^{2}-m^{2}c^{2}=0$

$\displaystyle (\gamma^{0}\frac{\hat{E}}{c}-\gamma^{1}\hat{p}_{x}-\gamma^{2}\hat{p}_{y}-\gamma^{3}\hat{p}_{z}+mc)(\gamma^{0}\frac{\hat{E}}{c}-\gamma^{1}\hat{p}_{x}-\gamma^{2}\hat{p}_{y}-\gamma^{3}\hat{p}_{z}-mc)=0$
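To see why this factorization works, it may help to expand the product. Writing $P$ as a shorthand (introduced here only for this check) for $\gamma^{0}\frac{\hat{E}}{c}-\gamma^{1}\hat{p}_{x}-\gamma^{2}\hat{p}_{y}-\gamma^{3}\hat{p}_{z}$, the left-hand side is

$\displaystyle (P+mc)(P-mc)=P^{2}-m^{2}c^{2}$

because $mc$ is just a number, so the cross terms $-mcP$ and $+mcP$ cancel. When $P^{2}$ is multiplied out, the squared terms use $(\gamma^{0})^{2}=1$ and $(\gamma^{1})^{2}=(\gamma^{2})^{2}=(\gamma^{3})^{2}=-1$:

$\displaystyle (\gamma^{0})^{2}\frac{\hat{E}^{2}}{c^{2}}+(\gamma^{1})^{2}\hat{p}_{x}^{2}+(\gamma^{2})^{2}\hat{p}_{y}^{2}+(\gamma^{3})^{2}\hat{p}_{z}^{2}=\frac{\hat{E}^{2}}{c^{2}}-\hat{p}_{x}^{2}-\hat{p}_{y}^{2}-\hat{p}_{z}^{2}$

while the mixed terms come in pairs such as

$\displaystyle -(\gamma^{0}\gamma^{1}+\gamma^{1}\gamma^{0})\frac{\hat{E}}{c}\hat{p}_{x}$

which vanish because distinct gamma matrices anticommute ($\gamma^{\mu}\gamma^{\nu}=-\gamma^{\nu}\gamma^{\mu}$ for $\mu\neq\nu$), and because the operators $\hat{E}$, $\hat{p}_{x}$, $\hat{p}_{y}$, $\hat{p}_{z}$ commute with one another (partial derivatives commute). What remains is exactly the Klein-Gordon operator on the first line. Setting either factor equal to zero then gives the two first-order equations: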

$\displaystyle \gamma^{0}\frac{\hat{E}}{c}-\gamma^{1}\hat{p}_{x}-\gamma^{2}\hat{p}_{y}-\gamma^{3}\hat{p}_{z}+mc=0$

$\displaystyle \gamma^{0}\frac{\hat{E}}{c}-\gamma^{1}\hat{p}_{x}-\gamma^{2}\hat{p}_{y}-\gamma^{3}\hat{p}_{z}-mc=0$

Both of the last two equations are known as the Dirac equation, although by convention we usually use the last one. Writing the operators and the wave function explicitly, this is

$\displaystyle i\hbar\gamma^{0}\frac{\partial\psi}{c\partial t}+i\hbar\gamma^{1}\frac{\partial\psi}{\partial x}+i\hbar\gamma^{2}\frac{\partial\psi}{\partial y}+i\hbar\gamma^{3}\frac{\partial\psi}{\partial z}-mc\psi=0$

We now have the Klein-Gordon equation and the Dirac equation, both of which are important in relativistic quantum field theory. In particular, the Klein-Gordon equation is used for “scalar” fields while the Dirac equation is used for “spinor” fields. This is related to how they “transform” under rotations (which, in relativity, includes “boosts” – rotations that involve both space and time). A detailed discussion of these concepts will be left to the references for now and will perhaps be tackled in future posts.

We will, however, mention one more important (and interesting) phenomenon in relativistic quantum mechanics. The equation $E=mc^{2}$ allows for the “creation” of particle-antiparticle pairs out of seemingly nothing! Even when there seems to be “not enough energy”, there exists an “energy-time uncertainty principle”, which allows such particle-antiparticle pairs to exist, even for only a very short time. This phenomenon of “creation” (and the related phenomenon of “annihilation”) means we cannot take the number of particles in our system to be fixed.

With this, we need to modify our language to be able to describe a system with varying numbers of particles. We will still use the language of linear algebra, but we will define our “states” differently. In earlier posts in the blog, where we only dealt with a single particle, the “state” of the particle simply gave us information about the position. In the relativistic case (and in other cases where there are varying numbers of particles – for instance, when the system “gains” or “loses” particles to or from the environment), the number (and kind) of particles needs to be taken into account.

We will do this as follows. We first define a state with no particles, which we shall call the “vacuum”. We write it as $|0\rangle$. Recall that an operator is a function from state vectors to state vectors; hence, an operator acting on a state gives another state. We now define a new kind of operator, called the “field” operator $\psi$, such that the state with a single particle of a certain type, which would have been given by the wave function $\psi$ in the old language, is now described by the state vector $\psi|0\rangle$.

Important note: The symbol $\psi$ no longer refers to a state vector, but an operator! The state vector is $\psi|0\rangle$.

The Klein-Gordon and the Dirac equations still hold, of course (otherwise we wouldn’t even have bothered to write them here). It is just important to take note that the symbol $\psi$ now refers to an operator and not a state vector. We might as well write it as $\hat{\psi}$, but this is usually not done in the literature, since $\psi$ is not used for anything other than the field operator. Further, if we have a state with several particles, we can write $\psi\phi...\theta|0\rangle$. This new language is called second quantization, which does not mean “quantize for a second time”, but rather a second version of quantization, since the first version did not have the means to deal with varying numbers of particles.

We have barely scratched the surface of relativistic quantum field theory in this post. Even though much has been made about the quest to unify quantum mechanics and general relativity, there is so much that also needs to be studied in relativistic quantum field theory, and still many questions that need to be answered. Still, relativistic quantum field theory has had many impressive successes – one striking example is the theoretical formulation of the so-called Higgs mechanism, and its experimental verification almost half a century later. The success of relativistic quantum field theory also gives us a guide on how to formulate new theories of physics in the same way that $F=ma$ guided the development of the very theories that eventually replaced it.

The reader is encouraged to supplement the brief exposition in this post by reading the references. The books are listed in increasing order of sophistication, so it is perhaps best to read them in that order too. Note, however, that The Road to Reality: A Complete Guide to the Laws of Reality by Roger Penrose is a high-level popular exposition and not a textbook, so it is perhaps best read in tandem with Introduction to Elementary Particles by David J. Griffiths, which is a textbook, although it does have special relativity and basic quantum mechanics as prerequisites. For these prerequisites, one may check the references listed in the blog posts discussing these respective subjects.

References:

Quantum Field Theory on Wikipedia

Klein-Gordon Equation on Wikipedia

Dirac Equation on Wikipedia

Second Quantization on Wikipedia

Featured Image Produced by CERN

The Road to Reality: A Complete Guide to the Laws of Reality by Roger Penrose

Introduction to Elementary Particles by David J. Griffiths

Quantum Field Theory by Fritz Mandl and Graham Shaw

Introduction to Quantum Field Theory by Michael Peskin and Daniel V. Schroeder

Lagrangians and Hamiltonians

We discussed the Lagrangian and Hamiltonian formulations of physics in My Favorite Equation in Physics, in our discussion of the historical development of classical physics right before the dawn of the revolutionary ideas of relativity and quantum mechanics at the turn of the 20th century. In this post we discuss them further, and more importantly, we provide some examples.

In order to discuss Lagrangians and Hamiltonians, we first need to discuss the concept of energy. Energy is a rather abstract concept, but it can perhaps best be described as a certain conserved quantity – historically, this was how energy was thought of, and the motivation for its development by Rene Descartes and Gottfried Wilhelm Leibniz.

Consider for example, a stone at some height $h$ above the ground. From this we can compute a quantity called the potential energy (which we will symbolize by $V$), which is going to be, in our case, given by

$\displaystyle V=mgh$

where $m$ is the mass of the stone and $g$ is the acceleration due to gravity, which close to the surface of the earth can be considered a constant roughly equal to $9.81$ meters per second per second.

As the stone is dropped from that height, it starts to pick up speed. As its height decreases, its potential energy also decreases. However, it gains a certain quantity called the kinetic energy, which we will write as $T$ and define as

$\displaystyle T=\frac{1}{2}mv^{2}$

where $v$ is the magnitude of the velocity. In our case, since we are considering only motion in one dimension, this is simply given by the speed of the stone. At any point in the motion of the stone, however, the sum of the potential energy and the kinetic energy, called the total mechanical energy, stays at the same value. This is because the amount by which the potential energy decreases is the same as the amount by which the kinetic energy increases.
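This bookkeeping can be checked with a short Python sketch. It assumes the stone is released from rest at height $h$ and ignores air resistance, so after a time $t$ its speed is $gt$ and its height is $h-\frac{1}{2}gt^{2}$ (standard results for constant acceleration, not derived here). The potential energy falls, the kinetic energy rises, and their sum stays at $mgh$; the function name and the sample numbers are hypothetical.

```python
def stone_energies(m, g, h, t):
    """Potential, kinetic and total mechanical energy of a stone dropped from rest.

    Valid while the stone is still in the air (t <= sqrt(2 h / g)); no air resistance.
    """
    v = g * t                      # speed after falling for a time t
    height = h - 0.5 * g * t ** 2  # current height above the ground
    V = m * g * height             # potential energy, V = m g h
    T = 0.5 * m * v ** 2           # kinetic energy, T = (1/2) m v^2
    return V, T, V + T


for t in (0.0, 0.5, 1.0, 1.4):
    V, T, total = stone_energies(m=2.0, g=9.81, h=10.0, t=t)
    print(f"t={t:.1f} s  V={V:7.2f} J  T={T:7.2f} J  V+T={total:7.2f} J")
```

The last column stays at $mgh$ while the first two trade off against each other.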

The expression for kinetic energy remains the same for any nonrelativistic system. The expression for the potential energy depends on the system, however, and is related to the force as follows:

$\displaystyle F=-\frac{dV}{dx}$.

We now give the definition of the quantity called the Lagrangian (denoted by $L$). It is simply given by

$\displaystyle L=T-V$.

There is a related quantity to the Lagrangian, called the action (denoted by $S$). It is defined as

$\displaystyle S=\int_{t_{1}}^{t_{2}}L dt$.

For a single particle, the Lagrangian depends on the position and the velocity of the particle. More generally, it will depend on the so-called “configuration” of the system, as well as the “rate of change” of this configuration. We will represent these variables by $q$ and $\dot{q}$ respectively (the “dot” notation is the one developed by Isaac Newton to represent the derivative with respect to time; in the notation of Leibniz, which we have used up to now, this is also written as $\frac{dq}{dt}$).

To explicitly show this dependence, we write the Lagrangian as $L(q,\dot{q})$. Therefore we shall write the action as follows:

$\displaystyle S=\int_{t_{1}}^{t_{2}}L(q,\dot{q}) dt$.

The Lagrangian formulation is important because it allows us to make a connection with Fermat’s principle in optics, which is the following statement:

Light always moves in such a way that it minimizes its time of travel.

Essentially, the Lagrangian formulation allows us to restate the good old Newton’s second law of motion as follows:

An object always moves in such a way that it minimizes its action.
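This principle can be explored numerically. The Python sketch below discretizes the action for a stone thrown straight up that returns to height $0$ after a time $T$; its Newtonian path is $x(t)=\frac{1}{2}gt(T-t)$, which has constant acceleration $-g$. It then bends the path by $\epsilon\sin(\pi t/T)$ while keeping both endpoints fixed, and compares actions. The discretization (forward-difference velocity, potential evaluated at segment midpoints), the sample numbers, and the helper names are assumptions made for this illustration.

```python
import math


def action(path, m, g, T):
    """Discretized action: sum over segments of (kinetic - potential) * dt.

    path holds heights x_0 ... x_N sampled at equally spaced times on [0, T].
    The velocity on each segment is a forward difference, and the potential
    m g x is evaluated at the segment midpoint.
    """
    n = len(path) - 1
    dt = T / n
    S = 0.0
    for i in range(n):
        v = (path[i + 1] - path[i]) / dt
        x_mid = 0.5 * (path[i] + path[i + 1])
        S += (0.5 * m * v ** 2 - m * g * x_mid) * dt
    return S


m, g, T, N = 1.0, 9.81, 1.0, 1000
times = [T * i / N for i in range(N + 1)]

# Newtonian path: thrown up at t = 0 from height 0, back at height 0 at t = T
true_path = [0.5 * g * t * (T - t) for t in times]


def perturbed(eps):
    """The true path bent by eps * sin(pi t / T); the endpoints stay fixed up to rounding."""
    return [x + eps * math.sin(math.pi * t / T) for x, t in zip(true_path, times)]


S_true = action(true_path, m, g, T)
for eps in (-0.2, -0.05, 0.05, 0.2):
    # Excess action of the bent path over the Newtonian one
    print(eps, action(perturbed(eps), m, g, T) - S_true)
```

In this example every bent path, whether pushed up or down, has a larger action than the Newtonian one, and the excess grows like $\epsilon^{2}$.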

In order to make calculations out of this “principle” (strictly speaking, the action need only be stationary, which is usually, but not always, a minimum), we have to make use of the branch of mathematics called the calculus of variations, which was specifically developed to deal with problems such as these. The calculations are fairly involved, but we will end up with the so-called Euler-Lagrange equations:

$\displaystyle \frac{\partial L}{\partial q}-\frac{d}{dt}\frac{\partial L}{\partial\dot{q}}=0$

We are using the notation $\frac{d}{dt}\frac{\partial L}{\partial\dot{q}}$ instead of the otherwise cumbersome notation $\frac{d\frac{\partial L}{\partial\dot{q}}}{dt}$. It is very common notation in physics to write $\frac{d}{dt}$ to refer to the derivative “operator” (see also More Quantum Mechanics: Wavefunctions and Operators).

For a nonrelativistic system, the Euler-Lagrange equations are merely a restatement of Newton’s second law; in fact, we can plug in the expressions for the Lagrangian, the kinetic energy, and the potential energy we wrote down earlier and end up exactly with $F=ma$.
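As a worked example of this claim, take a single particle moving in one direction, so that the configuration $q$ is just its position $x$, with $L=T-V=\frac{1}{2}m\dot{x}^{2}-V(x)$. The two terms of the Euler-Lagrange equation are

$\displaystyle \frac{\partial L}{\partial x}=-\frac{dV}{dx}=F$

$\displaystyle \frac{d}{dt}\frac{\partial L}{\partial\dot{x}}=\frac{d}{dt}(m\dot{x})=m\ddot{x}=ma$

so the Euler-Lagrange equation reads $F-ma=0$, which is $F=ma$ (for a constant mass). For the falling stone, with $x$ the height and $V=mgx$, this gives $F=-mg$ and hence $a=-g$.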

Why, then, go to all the trouble of formulating this new language, just to express something that we are already familiar with? Well, aside from the “aesthetically pleasing” connection with the very elegant Fermat’s principle, there are also numerous advantages to using the Lagrangian formulation. For instance, it exposes the symmetries of the system, as well as its conserved quantities (both of which are very important in modern physics). Also, the configuration is not always just the position, which means that the formulation can describe systems more complicated than a single particle. Using the concept of a Lagrangian density, it can also describe fields like the electromagnetic field.

We briefly mention the role of the Lagrangian formulation in quantum mechanics. The probability that a system will be found in a certain state (which we write as $|\phi\rangle$) at time $t_{2}$, given that it was in a state $|\psi\rangle$ at time $t_{1}$, is given by (see More Quantum Mechanics: Wavefunctions and Operators)

$\displaystyle |\langle\phi|e^{-iH(t_{2}-t_{1})}|\psi\rangle|^{2}$

where $H$ is the Hamiltonian (more on this later), and, here and below, we use units in which $\hbar=1$. The quantity

$\displaystyle \langle\phi|e^{-iH(t_{2}-t_{1})}|\psi\rangle$

is called the transition amplitude and can be expressed in terms of the Feynman path integral

$\displaystyle \int e^{iS}Dq$.

This is not an ordinary integral, as may be inferred from the different notation using $Dq$ instead of $dq$. What this means is that we sum the quantity inside the integral, $e^{iS}$, over all “paths” taken by our system. This has the rather mind blowing interpretation that in going from one point to another, a particle takes all paths. One of the best places to learn more about this concept is in the book QED: The Strange Theory of Light and Matter by Richard Feynman. This book is adapted from Feynman’s lectures at the University of Auckland, videos of which are freely and legally available online (see the references below).

We now discuss the Hamiltonian. The Hamiltonian is defined in terms of the Lagrangian $L$ by first defining the conjugate momentum $p$:

$\displaystyle p=\frac{\partial L}{\partial\dot{q}}$.

Then the Hamiltonian $H$ is given by the formula

$\displaystyle H=p\dot{q}-L$.

In contrast to the Lagrangian, which is a function of $q$ and $\dot{q}$, the Hamiltonian is expressed as a function of $q$ and $p$. For many basic examples the Hamiltonian is simply the total mechanical energy, with the kinetic energy $T$ now written in terms of $p$ instead of $\dot{q}$ as follows:

$\displaystyle T=\frac{p^{2}}{2m}$.
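
For the same one-dimensional particle, assuming again $L=\frac{1}{2}m\dot{q}^{2}-V(q)$, these definitions give the familiar kinetic-plus-potential energy:

$\displaystyle p=\frac{\partial L}{\partial\dot{q}}=m\dot{q}\quad\Longrightarrow\quad \dot{q}=\frac{p}{m}$

$\displaystyle H=p\dot{q}-L=\frac{p^{2}}{m}-\left(\frac{p^{2}}{2m}-V(q)\right)=\frac{p^{2}}{2m}+V(q)$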

One advantage of the Hamiltonian formulation is that it shows directly how the state of the system “evolves” over time. This is given by Hamilton’s equations:

$\displaystyle \dot{q}=\frac{\partial H}{\partial p}$

$\displaystyle \dot{p}=-\frac{\partial H}{\partial q}$

These are differential equations which can be solved to find $q$ and $p$ at any instant of time $t$. One can visualize this better by imagining a “phase space” whose coordinates are $q$ and $p$. The state of the system is then given by a point in this phase space, and this point “moves” across the phase space according to Hamilton’s equations.
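
To make the picture of a point moving through phase space concrete, here is a minimal sketch that steps Hamilton’s equations forward in time for a harmonic oscillator, assuming $H=\frac{p^{2}}{2m}+\frac{1}{2}kq^{2}$ with $m=k=1$. The integrator is the leapfrog (velocity Verlet) scheme, which is our choice rather than anything in the text. It is a common method for Hamiltonian systems because it keeps the energy error bounded over long runs. The function names `evolve` and `energy` and all parameter values are illustrative.

```python
import math

def evolve(q, p, m=1.0, k=1.0, dt=0.001, steps=1000):
    """Advance (q, p) under H = p^2/(2m) + k q^2/2 using leapfrog steps.

    Hamilton's equations give dq/dt = dH/dp = p/m and dp/dt = -dH/dq = -k q.
    Returns the list of phase-space points visited, including the start.
    """
    path = [(q, p)]
    for _ in range(steps):
        p -= 0.5 * dt * k * q   # half kick:  dp/dt = -dH/dq
        q += dt * p / m         # drift:      dq/dt =  dH/dp
        p -= 0.5 * dt * k * q   # second half kick
        path.append((q, p))
    return path

def energy(q, p, m=1.0, k=1.0):
    """Value of the Hamiltonian at the phase-space point (q, p)."""
    return p * p / (2 * m) + 0.5 * k * q * q

# Start at q = 1, p = 0 and evolve for one full period (2*pi when m = k = 1).
period = 2 * math.pi
steps = 10_000
path = evolve(1.0, 0.0, dt=period / steps, steps=steps)
q_end, p_end = path[-1]
print(f"after one period: q = {q_end:.4f}, p = {p_end:.4f}")
print("largest energy drift:", max(abs(energy(q, p) - 0.5) for q, p in path))
```

The point traces a closed curve (an ellipse) in the $(q,p)$ plane and returns close to where it started after one period, while $H$ stays nearly constant.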

The Lagrangian and Hamiltonian formulations of classical mechanics may be easily generalized to more than one dimension. We will then have several different coordinates $q_{i}$ for the configuration; in the simplest examples, these may be the Cartesian coordinates of 3-dimensional space, i.e. $q_{1}=x$, $q_{2}=y$, $q_{3}=z$. We summarize the important formulas here:

$\displaystyle \frac{\partial L}{\partial q_{i}}-\frac{d}{dt}\frac{\partial L}{\partial\dot{q_{i}}}=0$

$\displaystyle H=\sum_{i}p_{i}\dot{q_{i}}-L$

$\displaystyle \dot{q_{i}}=\frac{\partial H}{\partial p_{i}}$

$\displaystyle \dot{p_{i}}=-\frac{\partial H}{\partial q_{i}}$

In quantum mechanics, the Hamiltonian formulation still plays an important role. As described in More Quantum Mechanics: Wavefunctions and Operators, the Schrodinger equation describes the time evolution of the state of a quantum system in terms of the Hamiltonian. However, in quantum mechanics the Hamiltonian is not just a quantity but an operator, whose eigenvalues usually correspond to the observable values of the energy of the system.

Most publications on modern physics use the Lagrangian and Hamiltonian formulations, in large part because of these advantages. Although we have limited this discussion to nonrelativistic mechanics, both formulations remain very important in relativity. The equations of general relativity, also known as Einstein’s equations, may be obtained by requiring the Einstein-Hilbert action to be stationary. Meanwhile, there also exists a Hamiltonian formulation of general relativity called the Arnowitt-Deser-Misner formalism. Even the proposed candidates for a theory of quantum gravity, string theory and loop quantum gravity, make use of these formulations (the Lagrangian formulation seems to be more dominant in string theory, while the Hamiltonian formulation is more dominant in loop quantum gravity). It is therefore vital that anyone interested in learning about modern physics be at least comfortable with this language.

References:

Lagrangian Mechanics on Wikipedia

Hamiltonian Mechanics on Wikipedia

Path Integral Formulation on Wikipedia

The Douglas Robb Memorial Lectures by Richard Feynman

QED: The Strange Theory of Light and Matter by Richard Feynman

Mechanics by Lev Landau and Evgeny Lifshitz

Classical Mechanics by Herbert Goldstein

More Quantum Mechanics: Wavefunctions and Operators

In Some Basics of Quantum Mechanics, we explained the role of vector spaces (which we first discussed in Vector Spaces, Modules, and Linear Algebra) in quantum mechanics. Linear transformations, which are functions between vector spaces, would naturally be expected to also play an important role in quantum mechanics. In particular, we would like to focus on the linear transformations from a vector space to itself. In this context, they are also referred to as linear operators.

But first, we explore a little more the role of infinite-dimensional vector spaces in quantum mechanics. In Some Basics of Quantum Mechanics, we limited our discussion to “two-state” systems, which are also referred to as “qubits”. We can imagine a system with more “classical states”. For example, consider a row of seats in a movie theater. One can sit in the leftmost chair, the second chair from the left, the third chair from the left, and so on. But if it were a quantum system, one could sit in all chairs simultaneously, at least until one is “measured”, in which case one will be found sitting in one seat only. The probability of being found in a certain seat is the “absolute square” of the probability amplitude, which is the coefficient of the component of the “state vector” corresponding to that seat.

The number of “classical states” of the system previously discussed is the number of chairs in the row. But if we consider, for example, just “space”, and a system composed of a single particle in this space, whose classical state is specified by the position of the particle, the number of states of the system is infinite, even if we only consider one dimension. The particle can be here, there, a meter from here, $0.1$ meters from here, and so on. Even if the particle is constrained to, say, a one meter interval, there is still an infinite number of positions it could be in, since there are infinitely many numbers between $0$ and $1$. Hence the need for infinite-dimensional vector spaces.

As we have explained in Eigenvalues and Eigenvectors, sets of functions can provide us with an example of an infinite-dimensional vector space. We elaborate more on why functions are well suited to describing a quantum system like the one above. Let’s say, for example, that the particle is constrained to be on the interval from $0$ to $1$. For every point on the interval, there is a corresponding value of the probability amplitude. This is exactly the definition of a function from the interval $[0,1]$ to the set of complex numbers $\mathbb{C}$. We would also have to normalize later on, although normalization for infinite-dimensional vector spaces is somewhat different, involving the notion of an integral (see An Intuitive Introduction to Calculus). For that matter, the absolute square of the probability amplitude is not the probability, but the probability density.
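
A minimal numerical sketch of this normalization, assuming we approximate the interval $[0,1]$ by a uniform grid and the integral by a midpoint Riemann sum (our choice of discretization, not something from the text). The example amplitude $\psi(x)\propto\sin(\pi x)e^{2\pi ix}$ is purely illustrative. After normalization the total probability $\int_{0}^{1}|\psi(x)|^{2}\,dx$ is $1$, and integrating the density $|\psi(x)|^{2}$ over a sub-interval gives the probability of finding the particle there.

```python
import cmath
import math

N = 1000                                  # number of grid cells on [0, 1]
dx = 1.0 / N
xs = [(j + 0.5) * dx for j in range(N)]   # midpoints of the cells

# An unnormalized complex amplitude; the phase factor is only there to show
# that normalization uses |psi|^2 = psi* psi, not psi^2.
raw = [math.sin(math.pi * x) * cmath.exp(2j * math.pi * x) for x in xs]

norm_sq = sum(abs(a) ** 2 for a in raw) * dx   # approximates the integral of |psi|^2
psi = [a / math.sqrt(norm_sq) for a in raw]

density = [abs(a) ** 2 for a in psi]           # a probability density, not a probability
total = sum(density) * dx
left_half = sum(d for x, d in zip(xs, density) if x < 0.5) * dx
print(f"total probability = {total:.6f}")
print(f"P(0 <= x < 0.5)   = {left_half:.6f}")
```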

The function that we have described is called the wave function. It is also a vector, an element of an infinite-dimensional vector space. It is most often written using the symbol $\psi(x)$, reflecting its nature as a function. However, since it is also a vector, we can also still use Dirac notation and write it as $|\psi\rangle$. The wave function is responsible for the so-called wave-particle duality of quantum mechanics, as demonstrated in the famous double-slit experiment.

We have noted in My Favorite Equation in Physics that in classical mechanics the state of a one-particle system is given by the position and momentum of that particle, while in quantum mechanics the wave function is enough. How can this be, since the wave function only contains information about the position? Well, actually the wave function also contains information about the momentum – this is because of the so-called de Broglie relations, which relate the momentum of a particle in quantum mechanics to its wavelength as a wave.

Actually, the wave function is a function, and does not always have to look like what we normally think of as a wave. But whatever the shape of the wave function, even if it does not look like a wave, it is always a combination of different waves. This statement is part of the branch of mathematics called Fourier analysis. The wavelengths of the different waves are related to the momentum of the corresponding particle, and we should note that like the position, they are also in quantum superposition.

There is one thing to note about this. Suppose our wave function is really a wave (technically we mean a sinusoidal wave). This wave gives us information about where we are likely to find the particle if we make a measurement – it is near the “peaks” and the “troughs” of the wave. But there are many “peaks” and “troughs” in the wave, and so it is difficult to determine where the particle will be when we measure it. On the other hand, since the wave function is composed of only one wave, we can easily determine what the momentum is.

We can also put several different waves together, resulting in a function that is “peaked” only at one place. This means there is only one place where the particle is most likely to be. But since we have combined different waves together, there will not be a single wavelength, hence, the momentum cannot be determined easily! To summarize, if we know more about the position, we know less about the momentum – and if we know more about the momentum, we know less about the position. This observation leads to the very famous Heisenberg uncertainty principle.
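
A small numerical illustration of this trade-off, assuming a Gaussian wave packet (our choice; Gaussians are the standard example because they make the product of the spreads as small as it can be). The sketch decomposes $\psi(x)$ into sinusoidal waves with a discrete Fourier sum and measures the spread in position, $\Delta x$, and in wavenumber, $\Delta k$. By the de Broglie relation $p=\hbar k$, the momentum spread is $\hbar\,\Delta k$. Narrowing the packet in position widens it in $k$, and the product stays at $\frac{1}{2}$, i.e. $\Delta x\,\Delta p=\frac{\hbar}{2}$. The function name `spreads` and the grid parameters are illustrative.

```python
import cmath
import math

def spreads(sigma, k0=2.0, length=40.0, n=256):
    """Return (delta_x, delta_k) for a Gaussian packet with width parameter sigma.

    psi(x) = exp(-x^2 / (4 sigma^2)) * exp(i k0 x), sampled on a grid.
    The amplitude of each wave is phi(k) = sum_j psi(x_j) exp(-i k x_j),
    i.e. a plain discrete Fourier sum over the grid.
    """
    dx = length / n
    xs = [-length / 2 + j * dx for j in range(n)]
    ks = [2 * math.pi * m / length for m in range(-n // 2, n // 2)]
    psi = [math.exp(-x * x / (4 * sigma ** 2)) * cmath.exp(1j * k0 * x) for x in xs]
    phi = [sum(a * cmath.exp(-1j * k * x) for a, x in zip(psi, xs)) for k in ks]

    def std(points, amplitudes):
        # Treat |amplitude|^2 as an (unnormalized) probability distribution.
        weights = [abs(a) ** 2 for a in amplitudes]
        total = sum(weights)
        mean = sum(p * w for p, w in zip(points, weights)) / total
        var = sum((p - mean) ** 2 * w for p, w in zip(points, weights)) / total
        return math.sqrt(var)

    return std(xs, psi), std(ks, phi)

for sigma in (0.5, 1.0, 2.0):
    d_x, d_k = spreads(sigma)
    print(f"sigma={sigma}: delta_x={d_x:.4f}, delta_k={d_k:.4f}, product={d_x * d_k:.4f}")
```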

The many technicalities of the wave function we leave to the references for now, and proceed to the role of linear transformations, or linear operators, in quantum mechanics. We have already encountered one special role of certain kinds of linear transformations in Eigenvalues and Eigenvectors. Observables are represented by self-adjoint operators. A self-adjoint operator $A$ is a linear operator that satisfies the condition

$\displaystyle \langle Au|v\rangle=\langle u|Av\rangle$.

for every vector $|v\rangle$ and every linear functional $\langle u|$, where $\langle u|$ corresponds to the vector $|u\rangle$. The notation $|Av\rangle$ refers to the image of the vector $|v\rangle$ under the linear transformation $A$, while $\langle Au|$ refers to the linear functional corresponding to the vector $|Au\rangle$, which is the image of $|u\rangle$ under $A$. The role of linear functionals in quantum mechanics was discussed in Some Basics of Quantum Mechanics.
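
A finite-dimensional sketch of this condition, assuming vectors are stored as Python lists of complex components and the inner product is $\langle a|b\rangle=\sum_{i}a_{i}^{*}b_{i}$ (conjugating the first slot, as the correspondence between kets and bras requires). In finite dimensions the condition holds exactly when the matrix equals its conjugate transpose. Because both sides are linear in $|v\rangle$ and antilinear in $|u\rangle$, checking a basis suffices; the extra vectors are only for illustration. The matrices and helper names are hypothetical, and observables such as position act on infinite-dimensional spaces, where this simple matrix picture needs more care.

```python
def inner(a, b):
    """<a|b> = sum_i conj(a_i) b_i (antilinear in the first slot)."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

def apply(m, v):
    """Image of the vector v under the matrix m."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def is_self_adjoint(m, vectors, tol=1e-12):
    """Check <Au|v> == <u|Av> for every pair drawn from `vectors`."""
    return all(abs(inner(apply(m, u), v) - inner(u, apply(m, v))) < tol
               for u in vectors for v in vectors)

A = [[2, 1 - 1j],
     [1 + 1j, 3]]      # equal to its own conjugate transpose
B = [[2, 1j],
     [1j, 3]]          # symmetric, but not equal to its conjugate transpose
vectors = [[1, 0], [0, 1], [1 + 2j, -1j], [0.5, 3 - 1j]]
print("A self-adjoint:", is_self_adjoint(A, vectors))
print("B self-adjoint:", is_self_adjoint(B, vectors))
```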

There is, for example, an operator corresponding to the position, another corresponding to the momentum, another corresponding to the energy, and so on. If we measure any of these observables for a certain quantum system in the state $|\psi\rangle$, we are certain to obtain one of the eigenvalues of that observable, with the probability of obtaining the eigenvalue $\lambda_{n}$ given by

$\displaystyle |\langle \psi_{n}|\psi\rangle|^{2}$

where $\langle \psi_{n}|$ is the linear functional corresponding to the vector $|\psi_{n}\rangle$, which is the eigenvector corresponding to the eigenvalue $\lambda_{n}$. For systems like our particle in space, whose states form an infinite-dimensional vector space, the quantity above gives the probability density instead of the probability. After measurement, the state of the system “collapses” to the state given by the vector $|\psi_{n}\rangle$.
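
The measurement rule can be sketched for a two-state system. We assume a hypothetical $2\times 2$ Hermitian observable and compute its eigenvalues and eigenvectors with the closed-form formula for $2\times 2$ matrices, a choice made to keep the example free of external libraries. Each probability is $|\langle\psi_{n}|\psi\rangle|^{2}$. As a consistency check, the probabilities sum to $1$, and the eigenvalues weighted by them give the average measured value $\langle\psi|A|\psi\rangle$.

```python
import math

def inner(a, b):
    """<a|b> = sum_i conj(a_i) b_i."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

def eigen_2x2_hermitian(m):
    """Eigenvalues and normalized eigenvectors of a 2x2 Hermitian matrix
    [[a, b], [conj(b), d]] (a, d real), using the closed-form formula."""
    a, b, d = m[0][0].real, m[0][1], m[1][1].real
    if abs(b) < 1e-15:                          # already diagonal
        return [(a, [1, 0]), (d, [0, 1])]
    mean = (a + d) / 2
    half_gap = math.sqrt(((a - d) / 2) ** 2 + abs(b) ** 2)
    pairs = []
    for lam in (mean - half_gap, mean + half_gap):
        vec = [b, lam - a]                      # solves (a - lam) x + b y = 0
        n = math.sqrt(inner(vec, vec).real)
        pairs.append((lam, [c / n for c in vec]))
    return pairs

observable = [[1, 1j],
              [-1j, -1]]        # hypothetical Hermitian observable A
psi = [1, 0]                    # normalized state |psi>

results = []
for lam, vec in eigen_2x2_hermitian(observable):
    prob = abs(inner(vec, psi)) ** 2            # |<psi_n|psi>|^2
    results.append((lam, prob))
    print(f"eigenvalue {lam:+.4f}  probability {prob:.4f}")

# Probabilities add up to 1; the weighted eigenvalues give <psi|A|psi>.
print("total probability:", sum(p for _, p in results))
print("average value:", sum(lam * p for lam, p in results))
```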

Another very important kind of linear operator in quantum mechanics is a unitary operator. Unitary operators are analogous to the orthogonal matrices that represent rotations (see Rotating and Reflecting Vectors Using Matrices); in fact, an orthogonal matrix is a special kind of unitary operator. We note that orthogonal matrices have the special property that they preserve the “magnitude” of vectors; unitary operators do the same, but they are more general, since the coefficients of vectors (the scalars) in this context are complex.

More technically, a unitary operator is a linear operator $U$ that satisfies the following condition:

$\displaystyle \langle u|v\rangle=\langle Uu|Uv\rangle$

with the same conventions as earlier. What this means is that the probability of finding the system in the state given by the vector $|u\rangle$ after measurement, given that it was in the state $|v\rangle$ before measurement, remains the same if we rotate the system, or perform other “operations” represented by unitary operators, such as letting time pass (time evolution) or “translating” the system to a different location.
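
A quick numerical check of this property, assuming $2\times 2$ matrices stored as nested Python lists. The sketch tests whether $\langle Uu|Uv\rangle=\langle u|v\rangle$ for a handful of vectors: a complex unitary matrix and a real rotation (an orthogonal matrix) pass, while a matrix that stretches one direction fails. Since the transition probability is $|\langle u|v\rangle|^{2}$, passing the test means those probabilities are unchanged. The specific matrices and helper names are illustrative.

```python
import math

def inner(a, b):
    """<a|b> = sum_i conj(a_i) b_i."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

def apply(m, v):
    """Image of the vector v under the matrix m."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def preserves_inner_products(m, vectors, tol=1e-12):
    """Check <Mu|Mv> == <u|v> for every pair drawn from `vectors`."""
    return all(abs(inner(apply(m, u), apply(m, v)) - inner(u, v)) < tol
               for u in vectors for v in vectors)

s = 1 / math.sqrt(2)
theta = 0.3
U = [[s, s * 1j],
     [s * 1j, s]]                                   # a complex unitary matrix
R = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta), math.cos(theta)]]            # rotation: orthogonal, hence unitary
S = [[2, 0],
     [0, 1]]                                        # stretches vectors: not unitary
vectors = [[1, 0], [0, 1], [1 + 2j, -1j], [0.5, 3 - 1j]]
for name, m in (("U", U), ("R", R), ("S", S)):
    print(name, preserves_inner_products(m, vectors))
```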

So now we know that in quantum mechanics observables correspond to self-adjoint operators, and the “operations” of rotation, translation, and time evolution correspond to unitary operators. We might as well give a passing mention to one of the most beautiful laws of physics, Noether’s theorem, which states that the familiar “conservation laws” of physics (conservation of linear momentum, conservation of angular momentum, and conservation of energy) arise because the laws of physics do not change with translation, rotation, or time evolution. So Noether’s theorem in some way connects some of our “observables” and our “operations”.

We now revisit one of the “guiding questions” of physics, which we stated in My Favorite Equation in Physics:

“Given the state of a system at a particular time, in what state will it be at some other time?”

For classical mechanics, we can obtain the answer by solving the differential equation $F=ma$ (Newton’s second law of motion). In quantum mechanics, we have instead the Schrodinger equation, which is the “$F=ma$” of the quantum realm. The Schrodinger equation can be written in the form

$\displaystyle i\hbar\frac{d}{dt}|\psi(t)\rangle=H|\psi(t)\rangle$

where $i=\sqrt{-1}$ as usual, $\hbar$ is a constant called the reduced Planck’s constant (its value is around $1.054571800\times 10^{-34}$ Joule-seconds), and $H$ is a linear operator called the Hamiltonian. The Hamiltonian is a self-adjoint operator and in many cases corresponds to the energy observable. In the case where the Hamiltonian is time-independent, this differential equation can be solved directly to obtain the equation

$\displaystyle |\psi(t)\rangle=e^{-\frac{i}{\hbar}Ht}|\psi(0)\rangle$.

Since $H$ is a linear operator, $e^{-\frac{i}{\hbar}Ht}$ is also a linear operator (actually a unitary operator) and is the explicit form of the time evolution operator. For a Hamiltonian with time-dependence, one must use other methods to obtain the time evolution operator, such as making use of the so-called interaction picture or Dirac picture. But in any case, it is the Schrodinger equation, and the time evolution operator we can obtain from it, that provides us with the answer to the “guiding question” we asked above.
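
A minimal sketch of the time-evolution operator for a two-state system, assuming units with $\hbar=1$ and a hypothetical Hamiltonian $H=\Omega\begin{pmatrix}0&1\\1&0\end{pmatrix}$. We chose this $H$ so that the answer is known in closed form, $e^{-iHt}=\cos(\Omega t)\,I-i\sin(\Omega t)\,H/\Omega$, and a system starting in the first basis state is found in the second with probability $\sin^{2}(\Omega t)$. The matrix exponential is computed by summing its Taylor series, which is adequate for small matrices like this one but is not how numerical libraries do it. The helper names are ours.

```python
import math

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def expm(m, terms=60):
    """Matrix exponential via the Taylor series sum_k m^k / k!.
    Adequate for small matrices of modest norm."""
    n = len(m)
    result = [[1 + 0j if i == j else 0j for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, m)]   # m^k / k!
        result = [[r + t for r, t in zip(rr, tt)] for rr, tt in zip(result, term)]
    return result

def evolve(h, psi0, t, hbar=1.0):
    """|psi(t)> = exp(-i H t / hbar) |psi(0)> for a time-independent Hamiltonian."""
    u = expm([[-1j * x * t / hbar for x in row] for row in h])
    return [sum(u[i][j] * psi0[j] for j in range(len(psi0))) for i in range(len(u))]

omega = 1.0
H = [[0.0, omega], [omega, 0.0]]     # hypothetical two-state Hamiltonian
psi0 = [1 + 0j, 0j]                  # start in the first basis state

for t in (0.0, 0.5, 1.0, math.pi / 2):
    psi = evolve(H, psi0, t)
    norm = sum(abs(c) ** 2 for c in psi)            # stays 1 because U is unitary
    print(f"t={t:.3f}  P(second state)={abs(psi[1]) ** 2:.6f}  "
          f"sin^2(omega t)={math.sin(omega * t) ** 2:.6f}  norm={norm:.6f}")
```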

References:

Wave Function on Wikipedia

Matter Wave on Wikipedia

Uncertainty Principle on Wikipedia

Unitary Operator on Wikipedia

Noether’s Theorem on Wikipedia

Schrodinger Equation on Wikipedia

Introduction to Quantum Mechanics by David J. Griffiths

Modern Quantum Mechanics by Jun John Sakurai

Quantum Mechanics by Eugen Merzbacher

Vector Fields, Vector Bundles, and Fiber Bundles

In physics we have the concept of a vector field. Intuitively, a vector field is given by specifying a vector (in the sense of a quantity with magnitude and direction) at every point in a certain “space”. For instance, the wind velocity on the surface of our planet is a vector field. If we neglect the upward or downward dimension, and look only at the northward, southward, eastward, and westward directions, we have what we usually see on weather maps on the news. In one city the wind might be blowing strongly to the north, in another city it might be blowing weakly to the east, and in a third city it might be blowing moderately to the southwest.

If we specify a vector space (see Vector Spaces, Modules, and Linear Algebra) at every point, instead of just a single vector, we obtain the concept of a vector bundle. Given a vector bundle, we can obtain a vector field by choosing just one vector from the vector space at each point. More technically, we say that a vector field is a section of the vector bundle.

A vector space can be thought of as just a certain kind of space; in our example of wind velocities on the surface of the Earth, the vector space that we attach to every point is the plane $\mathbb{R}^{2}$ endowed with an intuitive vector space structure. Given a point on the plane, we draw an “arrow” with its “tail” at the chosen origin of the plane and its “head” at the given point. We can then add and scale these arrows to obtain other arrows, hence, these arrows form a vector space. This “graphical” method of studying vectors (again in the sense of a quantity with magnitude and direction) is in fact one of the most common ways of introducing the concept of vectors in physics.

If, instead of a vector space such as the plane $\mathbb{R}^{2}$, we generalize to other kinds of spaces such as the circle $S^{1}$, we obtain the notion of a fiber bundle. A vector bundle is therefore just a special case of a fiber bundle. In Category Theory, we described the torus as a fiber bundle, obtained by “gluing” a circle to every point of another circle. The shape that is glued is called the “fiber”, and the shape to which the fibers are glued is called the “base”.

Simply gluing spaces to the points of another space does not automatically mean that the space obtained is a fiber bundle, however. There is another requirement. Consider, for example, a cylinder. This can be described as a fiber bundle, with the fibers given by lines, and the base given by a circle (this can also be done the other way around, but we use this description for the moment because we will use it to describe an important condition for a space to be a fiber bundle). However, another fiber bundle can be obtained from lines (as the fibers) and a circle (as the base). This other fiber bundle can be obtained by “twisting” the lines as we “glue” them to the points of a circle, resulting in the very famous shape known as the Mobius strip.

The cylinder, which exhibits no “twisting”, is the simplest kind of fiber bundle, called a trivial bundle. Still, even if the Mobius strip has some kind of “twisting”, if we look at them “locally”, i.e. only on small enough areas, there is no difference between the cylinder and the Mobius strip. It is only when we look at them “globally” that we can distinguish the two. This is the important requirement for a space to be a fiber bundle. Locally, they must “look like” the trivial bundle. This condition is related to the notion of continuity (see Basics of Topology and Continuous Functions).

The concept of fiber bundles can be found everywhere in physics, and forms the language for many of its branches. We have already given an example, with vector fields on a space. Aside from wind velocities (and the velocities of other fluids), the concept of a vector field is also used to express quantities such as electric and magnetic fields.

Fiber bundles can also be used to express ideas that are not so easily visualized. For example, in My Favorite Equation in Physics we mentioned the concept of a phase space, whose coordinates represent the position and momentum of a system, which is used in the Hamiltonian formulation of classical mechanics. The phase space of a system is an example of a kind of fiber bundle called a cotangent bundle. Meanwhile, in Einstein’s general theory of relativity, the concept of a tangent bundle is used to study the curvature of spacetime (which in the theory is what we know as gravity, and is related to mass, or more generally, energy and momentum).

More generally, the tangent bundle can be used to study the curvature of objects aside from spacetime, including more ordinary objects like a sphere, or hills and valleys on a landscape. This leads to a further generalization of the notion of “curvature” involving other kinds of fiber bundles aside from tangent bundles. This more general idea of curvature is important in the study of gauge theories, which form an important part of the standard model of particle physics. A good place to start for those who want to understand curvature in the context of tangent bundles and fiber bundles is the idea of parallel transport.

Meanwhile, in mathematics, fiber bundles are also very interesting in their own right. For example, vector bundles on a space can be used to study the topology of that space. One famous result involving this idea is the “hairy ball theorem”, which is related to the observation that on our planet every typhoon must have an “eye”. However, on something shaped like a torus instead of a sphere (say, a space station with an artificial atmosphere), one can have a typhoon with no eye, simply by running the wind along the walls of the torus. With magnetic fields in place of wind velocities, this is the reason why fusion reactors that use magnetic fields to contain the very hot plasma are shaped like a torus instead of a sphere. We recall, of course, that the sphere and the torus are topologically inequivalent, and this is reflected in the very different characteristics of vector fields on them.
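
To see explicitly that the torus admits a wind pattern with no calm point, we can parametrize it with the standard embedding, assuming radii $R>r>0$, and let the wind blow around the large circle, i.e. along the tangent direction $\partial/\partial\theta$:

$\displaystyle \mathbf{r}(\theta,\phi)=\big((R+r\cos\phi)\cos\theta,\ (R+r\cos\phi)\sin\theta,\ r\sin\phi\big)$

$\displaystyle \mathbf{w}=\frac{\partial\mathbf{r}}{\partial\theta}=\big(-(R+r\cos\phi)\sin\theta,\ (R+r\cos\phi)\cos\theta,\ 0\big)$

$\displaystyle |\mathbf{w}|=R+r\cos\phi\geq R-r>0$

Since the length of $\mathbf{w}$ never vanishes, this vector field has no “eye”. On the sphere, by contrast, the hairy ball theorem guarantees that every continuous tangent vector field vanishes somewhere.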

The use of vector bundles in topology has led to such subjects of mathematics as the study of characteristic classes and K-theory. The concept of mathematical objects “living on” spaces should be reminiscent of the ideas discussed in Presheaves and Sheaves; in fact, in algebraic geometry the two ideas are very much related. Since algebraic geometry serves as a “bridge” between ideas from geometry and ideas from abstract algebra, this then leads to the subject called algebraic K-theory, where ideas from topology get carried over to abstract algebra and linear algebra (even number theory).

References:

Fiber Bundle on Wikipedia

Vector Bundle on Wikipedia

Vector Field on Wikipedia

Parallel Transport on Wikipedia

What is a Field? at Rationalizing the Universe

Algebraic Geometry by Andreas Gathmann

Algebraic Topology by Allen Hatcher

A Concise Course in Algebraic Topology by J. P. May

Geometrical Methods of Mathematical Physics by Bernard F. Schutz

Geometry, Topology and Physics by Mikio Nakahara

Some Basics of Quantum Mechanics

In My Favorite Equation in Physics we discussed a little bit of classical mechanics, the prevailing view of physics from the time of Galileo Galilei up to the start of the 20th century. Keeping in mind the ideas we introduced in that post, we now move on to one of the most groundbreaking ideas in the history of physics since that time (along with Einstein’s theory of relativity, which we touched on in From Pythagoras to Einstein): the theory of quantum mechanics (also known as quantum physics).

We recall one of the “guiding” questions of physics that we mentioned in My Favorite Equation in Physics:

“Given the state of a system at a particular time, in what state will it be at some other time?”

This emphasizes the importance of the concept of “states” in physics. We recall that the state of a system (for simplicity, we consider a system made up of only one object whose internal structure we ignore – it may be a stone, a wooden block, a planet – but we may refer to this object as a “particle”) in classical mechanics is given by its position and velocity (or alternatively its position and momentum).

A system consisting of a single particle, whose state is specified by its position and velocity, or its position and momentum, might just be the simplest system that we can study in classical mechanics. But in this post, discussing quantum mechanics, we will start with something even simpler.

Consider a light switch. It can be in a “state” of “on” or “off”. Or perhaps we might consider a coin. This coin can be in a “state” of “heads” or “tails”. We consider a similar system for reasons of simplicity. In real life, there also exist such systems with two states, and they are being studied, for example, in cutting-edge research on quantum computing. In the context of quantum mechanics, such systems are called “qubits”, which is short for “quantum bits”.

Now an ordinary light switch may only be in a state of “on” or “off”, and an ordinary coin may be in a state of “heads” or “tails”, but we cannot have a state that is some sort of “combination” of these states. It would be unthinkable in our daily life. But a quantum mechanical system which can be in any of two states may also be in some combination of these states! This is the idea at the very heart of quantum mechanics, and it is called the principle of quantum superposition. Its basic statement can be expressed as follows:

If a system can exist in any number of classical states, it can also exist in any linear combination of these states.

This means that the space of states of a quantum mechanical system forms a vector space. The concept of vector spaces, and the branch of mathematics that studies them, called linear algebra, can be found in Vector Spaces, Modules, and Linear Algebra. Linear algebra (and its infinite-dimensional variant called functional analysis) is the language of quantum mechanics. We have to mention that the field of “scalars” of this vector space is the set of complex numbers $\mathbb{C}$.

There is one more mathematical procedure that we have to apply to these states, called “normalization”, which we will learn about later on in this post. First we have to explain what it means if we have a state that is in a “linear combination” of other states.

We write our quantum state in the so-called “Dirac notation”. Consider the “quantum light switch” we described earlier (in real life, we would have something like an electron in “spin up” or “spin down” states, or perhaps a photon in “horizontally polarized” or “vertically polarized” states). We write the “on” state as

$|\text{on}\rangle$

and the “off” state as

$|\text{off}\rangle$

The principle of quantum superposition states that we may have a state such as

$|\text{on}\rangle+|\text{off}\rangle$

This state is made up of equal parts “on” and “off”. Quantum-mechanically, such a state may exist, but when we, classical beings as we are (in the sense that we are very big), interact with or make measurements of this system, we only ever find it either in a state of “on” or “off”, and never in a state that is a linear combination of both. What, then, does it mean for it to be in a state that is a linear combination of both “on” and “off”, if we can never even find it in such a state?

If a system is in the state $|\text{on}\rangle+|\text{off}\rangle$ before we make our measurement, then there are equal chances that we will find it in the “on” state or in the “off” state after measurement. We do not know beforehand whether we will get an “on” state or “off” state, which implies that there is a certain kind of “randomness” involved in quantum-mechanical systems.

It is at this point that we reflect on the nature of randomness. Let us consider a process we would ordinarily suppose to be “random”, for example the flipping of a coin, or the throwing of a die. We consider these processes random because we do not know all the factors at play, but if we had all the information, such as the force of the flip or the throw, the air resistance and its effects, and so on, and we made all the necessary calculations, at least “theoretically” we would be able to predict the result. Such a process is not really random; we only consider it random because we lack certain knowledge which, if we possessed it, would let us determine the result with absolute certainty.

The “randomness” in quantum mechanics involves no such knowledge; we could know everything that is possible for us to know about the system, and yet, we could never predict with absolute certainty whether we would get an “on” or an “off” state when we make a measurement on the system. We might perhaps say that this randomness is “truly random”. All we can conclude, from our knowledge that the state of the system before measurement is $|\text{on}\rangle+|\text{off}\rangle$, is that there are equal chances of finding it in the “on” or “off” state after measurement.

If the state of the system before measurement is $|\text{on}\rangle$, then after measurement it will also be in the state $|\text{on}\rangle$. If we had some state like $1000|\text{on}\rangle+5|\text{off}\rangle$ before measurement, then there will be a much greater chance that it will be in the state $|\text{on}\rangle$ after measurement, although there is still a small chance that it will be in the state $|\text{off}\rangle$.

We now introduce the concept of normalization. We have seen that the “coefficients” of the components of our “state vector” correspond to certain probabilities, although we have not been very explicit as to how these coefficients are related to the probabilities. We have a well-developed mathematical language to deal with probabilities. When an event is absolutely certain to occur, for instance, we say that the event has a probability of 100%, or that it has a probability of $1$. We want to use this language in our study of quantum mechanics.

We discussed in Matrices the concept of a linear functional, which assigns a real or complex number (or more generally an element of the field of scalars) to any vector. For vectors expressed as column matrices, the linear functionals were expressed as row matrices. In Dirac notation, we also call our “state vectors”, such as $|\text{on}\rangle$ and $|\text{off}\rangle$, “kets”, and we will have special linear functionals $\langle \text{on}|$ and $\langle \text{off}|$, called “bras” (the words “bra” and “ket” come from the word “bracket”, with the fate of the letter “c” unknown; this notation was developed by the physicist Paul Dirac, who made great contributions to the development of quantum mechanics).

The linear functional $\langle \text{on}|$ assigns to any “state vector” representing the state of the system before measurement a certain number which when squared (or rather “absolute squared” for complex numbers) gives the probability that it will be found in the state $|\text{on}\rangle$ after measurement. We have said earlier that if the system is known to be in the state $|\text{on}\rangle$ before the measurement, then after the measurement the system will also be in the state $|\text{on}\rangle$. In other words, given that the system is in the state $|\text{on}\rangle$ before measurement, the probability of finding it in the state $|\text{on}\rangle$ after measurement is equal to $1$. We express this explicitly as

$|\langle \text{on}|\text{on}\rangle|^{2}=1$

From this observation, we make the requirement that

$\langle \psi |\psi \rangle=1$

for any state $|\psi\rangle$. This will lead to the requirement that if we have the state $C_{1}|\text{on}\rangle+C_{2}|\text{off}\rangle$, the coefficients $C_{1}$ and $C_{2}$ must satisfy the equation

$|C_{1}|^{2}+|C_{2}|^{2}=1$

or

$C_{1}^{*}C_{1}+C_{2}^{*}C_{2}=1$

The second expression is to remind the reader that these coefficients are complex. Since we express probabilities as real numbers, it is necessary that we use the “absolute square” of these coefficients, given by multiplying each coefficient by its complex conjugate.

So, in order to express the state where there are equal chances of finding the system in the state $|\text{on}\rangle$ or in the state $|\text{off}\rangle$ after measurement, we do not write it anymore as $|\text{on}\rangle+|\text{off}\rangle$, but instead as

$\frac{1}{\sqrt{2}}|\text{on}\rangle+\frac{1}{\sqrt{2}}|\text{off}\rangle$

The factors of $\frac{1}{\sqrt{2}}$ are there to make our notation agree with the notation in use in the mathematical theory of probabilities, where an event which is certain has a probability of $1$. They are called “normalizing factors”, and this process is what is known as normalization.
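
A short sketch of normalization, assuming the state is stored as a list of complex coefficients in the basis $|\text{on}\rangle$, $|\text{off}\rangle$ (the helper names are ours). It rescales any nonzero coefficients so that $|C_{1}|^{2}+|C_{2}|^{2}=1$ and then reads off the probabilities. This includes the state $1000|\text{on}\rangle+5|\text{off}\rangle$ mentioned earlier, where the chance of finding $|\text{off}\rangle$ works out to $25/1000025\approx 2.5\times 10^{-5}$.

```python
import math

def normalize(coeffs):
    """Scale complex coefficients so that sum_i |C_i|^2 = 1."""
    norm = math.sqrt(sum(abs(c) ** 2 for c in coeffs))
    return [c / norm for c in coeffs]

def probabilities(coeffs):
    """Measurement probabilities C_i^* C_i of the normalized state."""
    return [(c.conjugate() * c).real for c in normalize(coeffs)]

equal = normalize([1, 1])                  # |on> + |off>
print(equal)                               # both entries equal 1/sqrt(2)
print(probabilities([1, 1]))
print(probabilities([1000, 5]))            # the lopsided state 1000|on> + 5|off>
print(probabilities([1j, 1 - 1j]))         # complex coefficients work the same way
```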

We may ask, therefore, what is the probability of finding our system in the state $|\text{on}\rangle$ after measurement, given that before the measurement it was in the state $\frac{1}{\sqrt{2}}|\text{on}\rangle+\frac{1}{\sqrt{2}}|\text{off}\rangle$. We already know the answer; since there are equal chances of finding it in the state $|\text{on}\rangle$ or $|\text{off}\rangle$, then we should have a 50% probability of finding it in the state $|\text{on}\rangle$ after measurement, or that this result has probability $0.5$. Nevertheless, we show how we use Dirac notation and normalization to compute this probability:

$|\langle \text{on}|(\frac{1}{\sqrt{2}}|\text{on}\rangle+\frac{1}{\sqrt{2}}|\text{off}\rangle)|^{2}$

$|\langle \text{on}|\frac{1}{\sqrt{2}}|\text{on}\rangle+\langle \text{on}|\frac{1}{\sqrt{2}}|\text{off}\rangle|^{2}$

$|\frac{1}{\sqrt{2}}\langle \text{on}|\text{on}\rangle+\frac{1}{\sqrt{2}}\langle \text{on}|\text{off}\rangle|^{2}$

We know that $\langle \text{on}|\text{on}\rangle=1$, and that $\langle \text{on}|\text{off}\rangle=0$, which leads to

$|\frac{1}{\sqrt{2}}|^{2}$

$\frac{1}{2}$

as we expected. We have used the “linear” property of the linear functionals here, emphasizing once again how important the language of linear algebra is to describing quantum mechanics.
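The same computation can be reproduced numerically by representing kets as column vectors. This is a minimal sketch using NumPy, assuming the representation $|\text{on}\rangle=(1,0)$ and $|\text{off}\rangle=(0,1)$ (our choice for illustration, not something fixed by the discussion above); the helper `probability` is a hypothetical name. Note that `np.vdot` complex-conjugates its first argument, which plays the role of turning a ket into a bra.

```python
import numpy as np

# Basis kets as vectors (an assumption of this sketch).
on = np.array([1, 0], dtype=complex)
off = np.array([0, 1], dtype=complex)

# The normalized equal superposition (1/sqrt(2))|on> + (1/sqrt(2))|off>.
psi = (on + off) / np.sqrt(2)


def probability(bra_state, ket_state):
    """|<bra|ket>|^2; np.vdot conjugates its first argument, like forming a bra."""
    return abs(np.vdot(bra_state, ket_state)) ** 2


print(probability(on, psi))   # approximately 0.5
print(probability(off, psi))  # approximately 0.5
print(probability(on, on))    # 1.0
print(probability(on, off))   # 0.0
```

The last two lines are the facts $\langle \text{on}|\text{on}\rangle=1$ and $\langle \text{on}|\text{off}\rangle=0$ used in the derivation.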

For now, that’s about it for this post. We have glossed over many aspects of quantum mechanics in favor of introducing and emphasizing how linear algebra is used as the foundation for its language; linear algebra is chosen because it fits the principle at the very heart of quantum mechanics, the principle of quantum superposition.

So much of the notorious “weirdness” of quantum mechanics comes from the principle of quantum superposition, and this “weirdness” has found many applications both in explaining why our world is the way it is, and also in improving our quality of life through technological inventions such as semiconductor electronics.

I’ll make an important clarification at this point: we do not really “measure” the state of the system. What we really measure are “observables”, which tell us something about the state of the system. These observables are represented by linear transformations (more precisely, Hermitian operators), but to understand them better we need the concepts of eigenvectors and eigenvalues, which I have not yet discussed in this blog and did not want to discuss too much in this particular post. Perhaps we will discuss them in the future; for now the reader is directed to the references listed at the end of the post. What we have discussed here, the probability of finding the system in a certain state after measurement given that it is in some other state before measurement, is related to the phenomenon known as “collapse”.

Also, despite the fact that we have only tackled two-state (or qubit) systems in this post, it is not too difficult to generalize, at least conceptually, to systems with more states, or even systems with an infinite number of states. The case where the states are given by the position of a particle leads to the famous wave-particle duality. The reader is encouraged once again to read about it in the references below, and at the same time try to think about how one should generalize what we have discussed in here to that case. Such cases will hopefully be tackled in future posts.

(Side remark: I had originally intended to cover quite a bit of ground in at least the basics of quantum mechanics in this post, but before I knew it, it had already become quite a hefty post. I have not even gotten to the Schrödinger equation. Hopefully I can make more posts on this subject in the future; there’s so much one can write about when it comes to quantum mechanics.)

References:

Quantum Mechanics on Wikipedia

Quantum Superposition on Wikipedia

Bra-Ket Notation on Wikipedia

Wave Function Collapse on Wikipedia

Parallel Universes #1 – Basic Copenhagen Quantum Mechanics at Passion for STEM

If You’re Losing Your Grip on Quantum Mechanics, You May Want to Read Up on This at quant-ph/math-ph

The Feynman Lectures on Physics by Richard P. Feynman

Introduction to Quantum Mechanics by David J. Griffiths

Modern Quantum Mechanics by Jun John Sakurai

My Favorite Equation in Physics

My favorite equation in physics is none other than Newton’s second law of motion, often written as

$\displaystyle F=ma$.

I like to call it the “Nokia 3310 of Physics” – the Nokia 3310 was a popular cellular phone model from the days before smartphones became the norm, and to this day it is well-known for its reliability and durability. In the same way, Newton’s second law of motion, although superseded in modern physics by relativity and quantum mechanics, is still quite reliable for its purposes, was groundbreaking for its time, and remains the “gold standard” of physical theories for its simplicity and elegance.

In fact, much of modern physics might be said to be one long quest to “replace” Newton’s second law of motion after it was found that it doesn’t always hold, for example when things are extremely small, extremely fast, extremely massive, or any combination of these. Quantum mechanics was developed to describe the physics of the extremely small, special relativity was developed to describe the physics of the extremely fast, and general relativity was developed to describe the physics of the extremely massive. However, a physical theory that can deal with all of the above, a so-called “theory of everything”, has not been developed yet, although a great deal of research is dedicated to this goal.

This so-called “theory of everything” is of course not literally a theory of “everything”. One can think of it instead as a really high-powered, upgraded version of Newton’s second law of motion that holds even when things are extremely small, extremely fast, and extremely massive.

(Side note: There are usually other things we might ask for in a “theory of everything” too. For instance, we usually want the theory to “unify” the four fundamental forces of electromagnetism, the weak nuclear force, the strong nuclear force, and gravity. As far as we currently understand, all the ordinary forces we encounter in everyday life, in fact all the forces we know of in the universe, are just manifestations of these four fundamental forces. It’s a pretty elegant scientific fact, and we want our theory to be even more elegant by unifying all these forces under one concept.)

All this being said, we now look at a few aspects of Newton’s second law of motion. Even those who are more interested in the more modern theories of physics, or who want to pursue the quest for the “theory of everything”, should have a reasonably solid understanding of, and more importantly an appreciation for, Newton’s second law.

The meaning of the equation is familiar from high school physics: The acceleration (the change in the velocity with respect to time) of an object is directly proportional to the force applied, in the same direction, and is inversely proportional to the mass of the object. Let’s simplify things for a moment and focus only on one dimension of motion, so we don’t have to worry too much about the direction (except forward/backward or upward/downward, and so on). We also assume that the mass of the object is constant.

First of all, we note that, given the definition of acceleration, Newton’s second law of motion is really a differential equation, expressible in the following form:

$\displaystyle F=m\frac{dv}{dt}$

or, since the velocity $v$ is the derivative $\frac{dx}{dt}$ of the position $x$ with respect to the time $t$, we can also express it as

$\displaystyle F=m\frac{d(\frac{dx}{dt})}{dt}$

or, in more compact notation,

$\displaystyle F=m\frac{d^{2}x}{dt^2}$.

We first discuss this form, which is a differential equation for the position. We will go back to the first form later.

The force $F$ itself may have different forms. One particularly simple form is for the force of gravity exerted by the Earth on objects near its surface. In this case we can use the approximate form $F=-mg$, where $g$ is a constant with a value of around $9.81\ \text{m}/\text{s}^{2}$. The minus sign is there for purposes of convention, since this force is always in a downward direction, and we take “up” to be the positive direction. We can then take $x$ to be the height of the object above the ground.

We have in this specific case (called “free fall”) the following expression of Newton’s second law:

$\displaystyle -mg=m\frac{d^{2}x}{dt^2}$

$\displaystyle -g=\frac{d^{2}x}{dt^2}$

We can then apply our knowledge of calculus to obtain an expression telling us how the height of the object changes over time. We skip the steps and just give the answer here:

$x(t)=-\frac{1}{2}gt^{2}+v_{0}t+x_{0}$

where $x_{0}$ and $v_{0}$ are constants, respectively called the initial position and the initial velocity, which need to be specified before we can give the height of the object above the ground at any time $t$.
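For readers who want the skipped steps, here is one way to fill them in. Integrating $-g=\frac{d^{2}x}{dt^{2}}$ once with respect to time gives the velocity, with an integration constant that we call $v_{0}$ because it is the velocity at $t=0$; integrating again gives the position, with a second constant $x_{0}$, the position at $t=0$:

$\displaystyle \frac{dx}{dt}=\int -g\,dt=-gt+v_{0}$

$\displaystyle x(t)=\int (-gt+v_{0})\,dt=-\frac{1}{2}gt^{2}+v_{0}t+x_{0}$

Differentiating $x(t)$ twice brings us back to $\frac{d^{2}x}{dt^{2}}=-g$, which confirms the solution.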

We go back to the first form we wrote down above to express Newton’s second law of motion as the following differential equation for the velocity:

$\displaystyle F=m\frac{dv}{dt}$

In the case of free fall, this is

$\displaystyle -mg=m\frac{dv}{dt}$

$\displaystyle -g=\frac{dv}{dt}$

This can be solved to obtain the following expression for the velocity at any time $t$:

$v(t)=-gt+v_{0}$

We collect our results here, and summarize. By solving Newton’s second law of motion for this particular system, using the methods of calculus, we have the two equations for the position and velocity at any time $t$.

$x(t)=-\frac{1}{2}gt^{2}+v_{0}t+x_{0}$

$v(t)=-gt+v_{0}$

But to obtain the position and velocity at any time $t$, we also need two constants $x_{0}$ and $v_{0}$, which we respectively call the initial position and the initial velocity.

In other words, when we know the “specifications” of a system, such as the law of physics that governs it and, in our case, the form of the force, and we know the initial position and the initial velocity, then we can determine the position and the velocity at any other point in time.

This is a special case of the following question that always appears in physics, whether it is classical mechanics, quantum mechanics, or most other branches of modern physics:

“Given the state of a system at a particular time, in what state will it be at some other time?”
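For free fall, the answer to this question can be written as a rule that takes the state at one time and returns the state a time $t$ later. The sketch below is a minimal illustration, assuming $g=9.81\ \text{m}/\text{s}^{2}$ as above; the names `State` and `evolve` are hypothetical, chosen only for this example. It also checks a property we expect of such a rule: evolving for $0.3$ s and then for $0.7$ s gives the same state as evolving for $1$ s in one go.

```python
from dataclasses import dataclass

G = 9.81  # m/s^2, the approximate value of g used in the text


@dataclass(frozen=True)
class State:
    """Classical state of the falling object (hypothetical helper type)."""
    x: float  # height above the ground, in meters
    v: float  # velocity in m/s, with "up" positive


def evolve(state: State, t: float, g: float = G) -> State:
    """State a time t later, from x(t) = -g t^2/2 + v0 t + x0 and v(t) = -g t + v0,
    with the current state playing the role of (x0, v0)."""
    return State(
        x=-0.5 * g * t**2 + state.v * t + state.x,
        v=-g * t + state.v,
    )


start = State(x=10.0, v=5.0)  # 10 m up, moving upward at 5 m/s
one_step = evolve(start, 1.0)
two_steps = evolve(evolve(start, 0.3), 0.7)
print(one_step)
print(two_steps)  # the same state as one_step, up to rounding
```

Nothing beyond the initial state and the law of motion is needed to produce the later state, which is the point of the question above.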

In classical mechanics, the “state” of a system is given by its position and velocity. Equivalently, it may also be given by its position and momentum. In quantum mechanics it is a little different: the “state” of a system is given by what is often referred to as the “wavefunction”. In elementary contexts the wavefunction may be thought of as giving the probability of finding a particle at a specific position (more precisely, this probability is given by the absolute square of the wavefunction). Since quantum mechanics involves probabilities, a variant of this question is the following:

“Given that a system is in a particular state at some particular time, what is the probability that it will be in some other specified state at some other specified time?”

We now go back to classical mechanics and Newton’s second law, and focus on some historical developments. It is perhaps worth mentioning that before Isaac Newton, Galileo Galilei already had ideas on force and acceleration, evident in his book Two New Sciences. Newton’s masterpiece, Mathematical Principles of Natural Philosophy, was where the second law was first formally stated, and where it was used as one of the ingredients of Newton’s theory of universal gravitation. It was around this time that the study of mechanics became popular among that era’s greatest thinkers.

Meanwhile, also around this time, another branch of physics was gaining popularity. This was the field of optics, which studied the motion of light just as mechanics studied the motion of more ordinary material objects. Just as Newton’s second law of motion, along with the first and third laws, made up the basics of mechanics, the basic law of optics was Fermat’s principle, which states the following:

Light always moves in such a way that it minimizes its time of travel.

It was the goal of the scientists and mathematicians of that time to somehow “unify” these two seemingly separate branches of physics; this was especially inviting since Fermat’s principle seemed even more elegant than Newton’s second law of motion.

While the physical relationship between light and matter would only be revealed with the advent of quantum mechanics, the scientists and mathematicians of the time were at least able to come up with a language for mechanics analogous to the very elegant statement of Fermat’s principle. This language was developed over a long period of time by many historical figures such as Pierre de Maupertuis, Leonhard Euler, and Joseph Louis Lagrange, and it was brought to completion in the 19th century by the mathematician William Rowan Hamilton.

This quest gave us several alternative formulations of Newton’s second law; what they say in terms of physics is exactly the same, but they are written in a more elegant language. Although at the time people had no idea about quantum mechanics or relativity, these formulations would later become very useful for expressing these newly discovered laws of physics. The first of these is called the Lagrangian formulation, and its statement is the following:

An object always moves in such a way that it minimizes its action.

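This “action” is a quantity defined as the integral over time of another quantity called the “Lagrangian”, which is usually defined as the kinetic energy minus the potential energy. Both concepts are related to the more familiar formulation of Newton (although the forerunner of kinetic energy, the “vis viva”, was introduced by Newton’s frequent rival Gottfried Wilhelm Leibniz). Strictly speaking, this principle, like Fermat’s, only requires the action to be stationary rather than a minimum, although in simple cases such as free fall it is indeed a minimum.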
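For free fall we can check numerically that the actual motion has a smaller action than nearby motions with the same starting and ending points. This is a minimal sketch with illustrative values ($m=1$ kg, $T=1$ s, and an initial velocity chosen so the object returns to its starting height at $t=T$); the perturbation $\epsilon\sin(\pi t/T)$ is an arbitrary choice that vanishes at both endpoints, and the action is approximated by the midpoint rule rather than computed exactly.

```python
import math

# Illustrative values (assumptions of this sketch): mass, g, and duration.
M, G, T = 1.0, 9.81, 1.0
X0, V0 = 0.0, G * T / 2  # thrown upward so that it is back at x = 0 at t = T


def true_path(t):
    return -0.5 * G * t**2 + V0 * t + X0


def true_velocity(t):
    return -G * t + V0


def action(path, velocity, n=10_000):
    """Midpoint-rule approximation of S = integral from 0 to T of
    (m v^2 / 2 - m g x) dt, i.e. kinetic minus potential energy."""
    dt = T / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * dt
        total += (0.5 * M * velocity(t) ** 2 - M * G * path(t)) * dt
    return total


def perturbed(eps):
    """A nearby path: the true one plus eps*sin(pi t/T), with the same endpoints."""
    def path(t):
        return true_path(t) + eps * math.sin(math.pi * t / T)

    def velocity(t):
        return true_velocity(t) + eps * (math.pi / T) * math.cos(math.pi * t / T)

    return path, velocity


s_true = action(true_path, true_velocity)
for eps in (-0.1, 0.05, 0.1):
    excess = action(*perturbed(eps)) - s_true
    print(f"eps = {eps:+.2f}: extra action = {excess:.6f}")  # positive in each case
```

For this particular perturbation the excess action works out exactly to $m\epsilon^{2}\pi^{2}/(4T)$, about $0.0247$ for $\epsilon=0.1$, so the printed values can be compared against that.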

Another formulation is called the Hamiltonian formulation, which gives us a way to picture the time evolution of the “state” of the object (given by its position and momentum) in a “space of states” called the “phase space”. This time evolution is governed by a quantity called the Hamiltonian, which is usually defined in terms of the Lagrangian (through what is called a Legendre transformation); in simple cases it is just the total energy, kinetic plus potential.
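Hamilton’s equations make this picture concrete: for a Hamiltonian $H(x,p)$, the state moves through phase space according to $\frac{dx}{dt}=\frac{\partial H}{\partial p}$ and $\frac{dp}{dt}=-\frac{\partial H}{\partial x}$. Below is a minimal sketch for free fall, assuming $H=\frac{p^{2}}{2m}+mgx$ and illustrative values. It steps the state forward with the leapfrog method, a standard numerical scheme we chose here (it is not part of the Hamiltonian formulation itself). Because the force is constant, leapfrog reproduces the exact free-fall solution up to rounding.

```python
# Illustrative values (assumptions of this sketch).
M, G = 1.0, 9.81  # mass in kg, g in m/s^2


def hamiltonian(x, p):
    """H = p^2/(2m) + m g x: kinetic plus potential energy in free fall."""
    return p**2 / (2 * M) + M * G * x


def leapfrog(x, p, dt, steps):
    """Step through phase space with Hamilton's equations,
    dx/dt = dH/dp = p/m and dp/dt = -dH/dx = -m g,
    using the leapfrog (kick-drift-kick) scheme."""
    trajectory = [(x, p)]
    for _ in range(steps):
        p -= 0.5 * dt * M * G  # half kick: momentum changes under the force
        x += dt * p / M        # drift: position changes with the momentum
        p -= 0.5 * dt * M * G  # second half kick
        trajectory.append((x, p))
    return trajectory


# Start 10 m up, moving upward with p = m v = 5 kg m/s, and follow it for 1 s.
points = leapfrog(10.0, 5.0, dt=0.01, steps=100)
x_end, p_end = points[-1]
print(x_end, p_end / M)  # compare with x(1) = 10.095 m and v(1) = -4.81 m/s

energies = [hamiltonian(x, p) for x, p in points]
print(max(energies) - min(energies))  # essentially zero: H is conserved along the path
```

The list `points` is the trajectory traced out in phase space; each entry is a full “state” $(x,p)$.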

The Lagrangian and Hamiltonian formulations of classical mechanics, as we stated earlier, contain no new physics. It is still Newton’s second law of motion. It is still $F=ma$. It is just stated in a new language. However, relativity and quantum mechanics, which do contain new physics, can also be stated in this language. In quantum mechanics, for example, the state at a later time is given by applying a “time evolution operator”, defined using the Hamiltonian, to a “state vector” representing the current state. Meanwhile, the probability that a system in a certain state will be found in some other specified state can be computed from a probability amplitude given by the Feynman path integral, which is defined using the action, or in other words, the Lagrangian.
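As a rough illustration of the quantum-mechanical statement, here is a sketch for a two-state system like the on/off system from the quantum mechanics discussion earlier. It assumes units with $\hbar=1$ and a made-up Hamiltonian $H=\begin{pmatrix}0&1\\1&0\end{pmatrix}$ in the basis $|\text{on}\rangle$, $|\text{off}\rangle$; the time evolution operator $U(t)=e^{-iHt/\hbar}$ is built from the eigenvalues and eigenvectors of $H$ with NumPy. Because $U(t)$ is unitary, the evolved state stays normalized. The path integral side is not shown.

```python
import numpy as np

HBAR = 1.0  # units with hbar = 1 (an assumption of this sketch)

# A made-up two-state Hamiltonian in the basis |on>, |off>.
H = np.array([[0.0, 1.0],
              [1.0, 0.0]])


def time_evolution(hamiltonian, t):
    """U(t) = exp(-i H t / hbar), built from the eigendecomposition of the
    Hermitian matrix H: U = V diag(exp(-i E t / hbar)) V^dagger."""
    energies, vectors = np.linalg.eigh(hamiltonian)
    phases = np.exp(-1j * energies * t / HBAR)
    return vectors @ np.diag(phases) @ vectors.conj().T


on = np.array([1, 0], dtype=complex)
t = np.pi / 4
psi_t = time_evolution(H, t) @ on  # the state at time t, starting from |on>

print(abs(np.vdot(on, psi_t)) ** 2)  # probability of finding |on> at time t
print(np.vdot(psi_t, psi_t).real)    # stays 1: U(t) preserves normalization
```

For this particular $H$ the probability of still finding $|\text{on}\rangle$ is $\cos^{2}t$, so at $t=\pi/4$ the system is in an equal superposition.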

We have thus reviewed Newton’s second law of motion, one of the oldest laws of physics humankind has discovered since the classical age, and looked at it in the light of newer theories. There will always be new theories; it is the nature of physics, and of science as a whole, to evolve and to improve. But there are some ideas in our history that have stood the test of time, and in the cases where they had to be replaced, they have paved the way for their own successors. Such ideas, in my opinion, will always be worth studying no matter how old they become.

From Pythagoras to Einstein

The Pythagorean theorem is one of the most famous theorems in all of mathematics. Even people who are not very familiar with mathematics know the Pythagorean theorem, especially since it is taught in grade school. It is also one of the most ancient theorems; it was known to the ancient Greeks, although it may not have been discovered by Pythagoras himself, and it may go back to an even earlier time in history.

We recall the statement of the Pythagorean theorem. Suppose we have a right triangle, a triangle where one of the three angles is a right angle. The side opposite the right angle is called the “hypotenuse”, and we will use the symbol $c$ to signify its length. It is always the longest of the three sides. The other two sides are called the altitude, whose length we symbolize by $a$, and the base, whose length we symbolize by $b$. The Pythagorean theorem relates the lengths of these three sides, so that given the lengths of any two sides we can calculate the length of the remaining side:

$\displaystyle a^2+b^2=c^2$

Later on, when we learn about Cartesian coordinates, the Pythagorean theorem is used to derive the so-called “distance formula”. Let’s say we have a point A with coordinates $\displaystyle (x_{1}, y_{1})$, and another point B with coordinates $\displaystyle (x_{2}, y_{2})$. The distance between point A and point B is given by

$\displaystyle \text{distance}=\sqrt{(x_{2}-x_{1})^2+(y_{2}-y_{1})^2}$.

One aspect of the distance formula will play a central role in the rest of this post. Namely, we can change the coordinate system, by translating the origin or by rotating the coordinate axes, so that the points A and B have different coordinates. However, even though the coordinates of the two points will be different, the distance given by the distance formula will be the same.

This might be a little technical, so let’s have a more down-to-earth example. I live in Southeast Asia, so I will say, for example, that the Great Pyramid of Giza, in Egypt, is very far away. But someone who lives in Egypt will perhaps say that no, the Great Pyramid is just nearby. In that case, because we live in different places, we will disagree on how far away the Great Pyramid is. But if, instead, I ask for the distance between the Sphinx and the Great Pyramid, then that is something we can agree on even if we live in different places (Google tells me they are about a kilometer apart).

We disagree on how close or far away something is, because the answer to that question depends on where we are. I measure distance from myself based on my own location, and the same is true for the other person, and that is why we disagree. But the distance between the two objects in my example, the Sphinx and the Great Pyramid, is an invariant quantity. It does not depend on where we are. This invariance makes it a very important quantity.
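This invariance can be checked numerically. Below is a minimal Python sketch (the helpers `translate` and `rotate` are hypothetical names for this illustration): it computes the distance between two points, then describes the same two points in a coordinate system with a shifted origin and axes rotated by $30^{\circ}$.

```python
import math


def distance(a, b):
    """Distance formula: sqrt((x2 - x1)^2 + (y2 - y1)^2)."""
    return math.hypot(b[0] - a[0], b[1] - a[1])


def translate(point, ox, oy):
    """Coordinates of the same point after moving the origin to (ox, oy)."""
    return (point[0] - ox, point[1] - oy)


def rotate(point, angle):
    """Coordinates of the same point in axes rotated by `angle` radians."""
    x, y = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * y, -s * x + c * y)


A, B = (1.0, 2.0), (4.0, 6.0)
print(distance(A, B))  # 5.0

# New coordinate system: shift the origin to (3, -1), then rotate the axes by 30 degrees.
A2 = rotate(translate(A, 3.0, -1.0), math.radians(30))
B2 = rotate(translate(B, 3.0, -1.0), math.radians(30))
print(A2, B2)            # different coordinates...
print(distance(A2, B2))  # ...but the same distance, up to rounding
```

In the language of the example above, `A2` and `B2` are the coordinates someone standing elsewhere and facing another direction would assign to the Sphinx and the Great Pyramid.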

We will rewrite the distance formula using the following symbols to simplify the notation:

$\displaystyle \Delta x=x_{2}-x_{1}$

$\displaystyle \Delta y=y_{2}-y_{1}$

Furthermore we will use the symbol $\Delta s$ to represent the distance. The distance formula is now written

$\displaystyle \Delta s=\sqrt{(\Delta x)^2+(\Delta y)^2}$

That giant square root over the right hand side does not look very nice, so we square both sides of the distance formula, giving us

$\displaystyle (\Delta s)^2=(\Delta x)^2+(\Delta y)^2$.

Finally, we switch the left hand side and the right hand side so that the analogy with the Pythagorean theorem becomes more visible. So our distance formula now becomes

$\displaystyle (\Delta x)^2+(\Delta y)^2=(\Delta s)^2$.

Again we did all of this so that we have the same form for the distance formula and the Pythagorean theorem. Here they are again, for comparison:

Pythagorean Theorem: $\displaystyle a^2+b^2=c^2$

Distance Formula: $\displaystyle (\Delta x)^2+(\Delta y)^2=(\Delta s)^2$.

Of course, real life does not exist on a plane, and if we wanted a distance formula with three dimensions, as may be quite useful for applications in engineering where we work with three-dimensional objects, we have the three-dimensional distance formula:

$\displaystyle (\Delta x)^2+(\Delta y)^2+(\Delta z)^2=(\Delta s)^2$

Following the pattern, if we wanted to extend this to a space with four dimensions, for whatever reason, we just need to add another variable $w$, as follows:

$\displaystyle (\Delta w)^2+(\Delta x)^2+(\Delta y)^2+(\Delta z)^2=(\Delta s)^2$
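The pattern carries over to any number of dimensions: $(\Delta s)^{2}$ is the sum of the squared coordinate differences. A short sketch (the function name is ours, for illustration):

```python
def squared_distance(p, q):
    """(Δs)^2 as the sum of (Δx_i)^2 over however many coordinates the points have."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    return sum((b - a) ** 2 for a, b in zip(p, q))


print(squared_distance((0, 0), (3, 4)))              # 25, the 3-4-5 right triangle
print(squared_distance((0, 0, 0), (1, 2, 2)))        # 9, so Δs = 3
print(squared_distance((0, 0, 0, 0), (1, 1, 1, 1)))  # 4, so Δs = 2 in four dimensions
```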

As far as we know, space in real life only has three dimensions. However, we do live in something four-dimensional; not a four-dimensional space, but a four-dimensional “spacetime”. The originator of the idea of spacetime was Hermann Minkowski, a mathematician who specialized in number theory but made this gigantic contribution to physics before he tragically died of appendicitis at the age of 44. Minkowski’s legacy was passed on to his former student, a rising physicist who was working on a theory reconciling two apparently conflicting areas of physics at the time, classical mechanics and electromagnetism. This young physicist’s name was Albert Einstein, and the theory he was working on was called “relativity”. Minkowski’s idea of spacetime would come to play a central role in it.

People have been putting space and time together since ancient times. When we arrange an event, such as a meeting or a party, we need to specify a place and a time. But Minkowski’s idea was far more radical. He wanted to think of space and time as parts of a single entity called spacetime, in the same way that the x-axis and the y-axis are parts of a single entity called the x-y plane. If this were true, then just as there is an invariant “distance” between two points in the x-y plane, there should be an invariant “interval” between two events in spacetime. This would simplify and explain many phenomena already suggested by the work of Einstein and his predecessors, such as lengths and durations being measured differently by different observers.

However, the formula for this interval differs from the distance formula in one respect: the coordinate that is different from the rest, time, enters with a minus sign. This is needed for the theory to agree with the electromagnetic phenomena that we observe, such as the speed of light in a vacuum being the same for all observers.

$\displaystyle -(\Delta t)^2+(\Delta x)^2+(\Delta y)^2+(\Delta z)^2=(\Delta s)^2$

There’s still a little problem with this formula, however. We measure time and distance using different units. For time we usually use seconds, minutes, hours, days, and so on. For distance we use centimeters, meters, kilometers, and so on. When we add or subtract quantities they need to have the same units. But we already have experience in adding or subtracting quantities with different units – for example, let’s stick with distance and imagine that we need to add two different lengths; however, one is measured in meters while another is measured in centimeters. All we need to do is to “convert” one of them so that they have the same units; a hundred centimeters make one meter, so we can use this fact to convert to either centimeters or meters, and then we can add the two lengths. More technically, we say that we need a conversion factor, a constant quantity of 100 centimeters per meter.

The same goes for our problem in calculating the spacetime interval. We need to convert some of the quantities so that they have the same units. What we need is a conversion factor, a constant quantity measured in units of distance per unit time (or time per unit distance). A suitable quantity is the speed of light in a vacuum, $c$, which is about 300,000,000 meters per second (exactly 299,792,458 meters per second, by the definition of the meter). This allows us to write

$\displaystyle -(c\Delta t)^2+(\Delta x)^2+(\Delta y)^2+(\Delta z)^2=(\Delta s)^2$

This formula is at the heart of the theory of relativity. Those who have seen the 2014 movie “Interstellar” may recall (spoiler alert) how the main character aged so much more slowly than his daughter because of the effects of the geometry of spacetime he experienced during his mission, so that when he met up with her again she was already much older than him. All of this can really be traced back to the idea of a single unified spacetime with an invariant interval as shown above. If space and time were two separate entities instead of being parts of a single spacetime, there would be no such effects. But if they form a single spacetime, then neither time nor distance is invariant; the invariant quantity is the spacetime interval. Time and distance are relative. Hence, “relativity”. Hence, contraction of length and dilation of time. Such effects are already observed in real life, for example in the clocks of the GPS satellites that orbit our planet, which must be corrected for them.
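The invariance of the interval can also be checked numerically. This is a minimal sketch, assuming the standard Lorentz boost along the $x$-axis (the rule for how $\Delta t$ and $\Delta x$ change between observers in relative motion, which we have not derived here) and the sign convention of the formula above. It covers special relativity in flat spacetime only; the gravitational part of effects like those in “Interstellar” or in GPS timing requires general relativity and is not modeled. The function `proper_time` shows time dilation: a clock moving at speed $v$ covers $\Delta x=v\Delta t$, so the time it records, $\sqrt{-(\Delta s)^{2}}/c$, is shorter than $\Delta t$.

```python
import math

C = 299_792_458.0  # speed of light in a vacuum, m/s


def interval_squared(dt, dx, dy=0.0, dz=0.0):
    """(Δs)^2 = -(c Δt)^2 + (Δx)^2 + (Δy)^2 + (Δz)^2, with the signs used above."""
    return -(C * dt) ** 2 + dx**2 + dy**2 + dz**2


def boost(dt, dx, v):
    """Δt and Δx as measured by an observer moving at speed v along x
    (standard Lorentz boost; Δy and Δz would be unchanged)."""
    gamma = 1.0 / math.sqrt(1.0 - (v / C) ** 2)
    return gamma * (dt - v * dx / C**2), gamma * (dx - v * dt)


def proper_time(dt, v):
    """Time recorded by a clock moving at speed v over a coordinate time dt:
    it covers Δx = v Δt, so the elapsed time is sqrt(-(Δs)^2) / c."""
    return math.sqrt(-interval_squared(dt, v * dt)) / C


# Two events 1 s and 100,000 km apart, as seen by an observer moving at 0.6c.
dt, dx = 1.0, 1.0e8
dt2, dx2 = boost(dt, dx, 0.6 * C)
print(dt, dx, "->", dt2, dx2)                                # time and distance both change...
print(interval_squared(dt, dx), interval_squared(dt2, dx2))  # ...the interval does not

print(proper_time(1.0, 0.6 * C))  # about 0.8 s for 1 s of coordinate time
```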

The theory of relativity is by no means a “complete” theory, because there are still so many questions, involving black holes for example. Like most of science, there’s always room for improvement. But what we currently have is a very beautiful, very elegant theory that explains many phenomena we would otherwise be unable to explain, and all of it comes back to some very ancient mathematics we are all familiar with from grade school.
