Some Basics of (Quantum) Electrodynamics

There are only four fundamental forces as far as we know, and every force that we know of can ultimately be considered a manifestation of these four. These four are electromagnetism, the weak nuclear force, the strong nuclear force, and gravity. Among them, the one we are most familiar with is electromagnetism, both in terms of our everyday experience (where it is somewhat on par with gravity) and in terms of our physical theories (where our understanding of electrodynamics is far ahead of our understanding of the other three forces, including, and especially, gravity).

Electromagnetism is dominant in everyday life because the weak and strong nuclear forces have a very short range, and because gravity is very weak. Now gravity doesn’t seem weak at all, especially if we have experienced falling on our face at some point in our lives. But that’s only because the “source” of this gravity, our planet, is very large. Imagine a small pocket-sized magnet lifting, say, an iron nail against the force exerted by the Earth’s gravity. This shows how much stronger the electromagnetic force is compared to gravity. Maybe we should be thankful that gravity is not on the same level of strength, or falling on our face would be so much more painful.

It is important to note also, that atoms, which make up everyday matter, are themselves made up of charged particles – electrons and protons (there are also neutrons, which are uncharged). Electromagnetism therefore plays an important part, not only in keeping the “parts” of an atom together, but also in “joining” different atoms together to form molecules, and other larger structures like crystals. It might be gravity that keeps our planet together, but for less massive objects like a house, or a car, or a human body, it is electromagnetism that keeps them from falling apart.

Aside from electromagnetism being the one fundamental force we are most familiar with, another reason to study it is that it is the “template” for our understanding of the rest of the fundamental forces, including gravity. In Einstein’s general theory of relativity, gravity is the curvature of spacetime; it appears that this gives it a nature different from the other fundamental forces. But even then, the expression for this curvature, in terms of the Riemann curvature tensor, is very similar in form to the equation for the electromagnetic fields in terms of the field strength tensor.

The electromagnetic fields, which we shall divide into the electric field and the magnetic field, are vector fields (see Vector Fields, Vector Bundles, and Fiber Bundles), which means that they have a value (both magnitude and direction) at every point in space. A charged particle in an electric or magnetic field (or both) will experience a force according to the Lorentz force law:

\displaystyle F_{x}=q(E_{x}+\frac{1}{c}(v_{y}B_{z}-v_{z}B_{y}))

\displaystyle F_{y}=q(E_{y}+\frac{1}{c}(v_{z}B_{x}-v_{x}B_{z}))

\displaystyle F_{z}=q(E_{z}+\frac{1}{c}(v_{x}B_{y}-v_{y}B_{x}))

where F_{x}, F_{y}, and F_{z} are the three components of the force, in the x, y, and z direction, respectively; E_{x}, E_{y}, and E_{z} are the three components of the electric field; B_{x}, B_{y}, and B_{z} are the three components of the magnetic field; v_{x}, v_{y}, and v_{z} are the three components of the velocity of the particle; q is its charge; and c is the speed of light (the factor of \frac{1}{c} appears because we use Gaussian units throughout, as in the equations for the potentials below). Newton’s second law (see My Favorite Equation in Physics) gives us the motion of an object given the force acting on it (and its mass), so together with the Lorentz force law, we can determine the motion of charged particles in electric and magnetic fields.

The Lorentz force law is extremely important in electrodynamics and we will keep the following point in mind throughout this discussion:

The Lorentz force law tells us how charges move under the influence of electric and magnetic fields.
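
As a small illustration of the force law, the following Python sketch evaluates the three components of the force for given values of the fields and the velocity. The function name `lorentz_force` and the sample numbers are ours and purely illustrative, and we assume units in which c = 1 so that no factors of the speed of light need to be tracked.

```python
def lorentz_force(q, E, v, B):
    """Return (F_x, F_y, F_z) for a charge q with velocity v in fields E and B.

    E, v, and B are 3-tuples (x, y, z). Units with c = 1 are assumed.
    """
    Ex, Ey, Ez = E
    vx, vy, vz = v
    Bx, By, Bz = B
    Fx = q * (Ex + vy * Bz - vz * By)
    Fy = q * (Ey + vz * Bx - vx * Bz)
    Fz = q * (Ez + vx * By - vy * Bx)
    return (Fx, Fy, Fz)


# Hypothetical example: a unit charge moving along x in a magnetic field along z.
velocity = (1.0, 0.0, 0.0)
force = lorentz_force(1.0, (0.0, 0.0, 0.0), velocity, (0.0, 0.0, 1.0))
print(force)

# The magnetic part of the force is always perpendicular to the velocity.
print(sum(f * v for f, v in zip(force, velocity)))
```

In this example the force points along the negative y direction, perpendicular to both the velocity and the magnetic field, which is why a charge in a uniform magnetic field moves in a circle.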

Instead of discussing electrodynamics in terms of these fields, however, we will focus on the electric and magnetic potentials, which together form what is called the four-potential and are related to the fields by the following equations:

\displaystyle E_{x}=-\frac{1}{c}\frac{\partial A_{x}}{\partial t}-\frac{\partial A_{t}}{\partial x}

\displaystyle E_{y}=-\frac{1}{c}\frac{\partial A_{y}}{\partial t}-\frac{\partial A_{t}}{\partial y}

\displaystyle E_{z}=-\frac{1}{c}\frac{\partial A_{z}}{\partial t}-\frac{\partial A_{t}}{\partial z}

\displaystyle B_{x}=\frac{\partial A_{z}}{\partial y}-\frac{\partial A_{y}}{\partial z}

\displaystyle B_{y}=\frac{\partial A_{x}}{\partial z}-\frac{\partial A_{z}}{\partial x}

\displaystyle B_{z}=\frac{\partial A_{y}}{\partial x}-\frac{\partial A_{x}}{\partial y}
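
To see the last three formulas at work, here is a minimal Python sketch that recovers a uniform magnetic field from a potential using finite differences. The particular potential A = (-B_0 y/2, B_0 x/2, 0), the value of `B0`, and the helper names `partial` and `magnetic_field` are our own choices for illustration, not something fixed by the discussion above.

```python
def partial(f, point, axis, h=1e-5):
    """Central-difference estimate of the partial derivative of f along axis."""
    forward = list(point)
    backward = list(point)
    forward[axis] += h
    backward[axis] -= h
    return (f(*forward) - f(*backward)) / (2 * h)


B0 = 2.0  # hypothetical field strength


# Components of the potential A = (-B0*y/2, B0*x/2, 0) as functions of (x, y, z).
def A_x(x, y, z):
    return -B0 * y / 2


def A_y(x, y, z):
    return B0 * x / 2


def A_z(x, y, z):
    return 0.0


def magnetic_field(point):
    """Apply the three formulas for B_x, B_y, B_z at the point (x, y, z)."""
    X, Y, Z = 0, 1, 2
    Bx = partial(A_z, point, Y) - partial(A_y, point, Z)
    By = partial(A_x, point, Z) - partial(A_z, point, X)
    Bz = partial(A_y, point, X) - partial(A_x, point, Y)
    return (Bx, By, Bz)


field = magnetic_field((0.3, -1.2, 0.7))
print(field)  # approximately (0, 0, B0), whichever point is chosen
```

The result is a field of strength B_0 pointing in the z direction at every point, even though the potential itself varies from place to place.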

The values of the potentials, as functions of space and time, are related to the distribution of charges and currents by the very famous set of equations called Maxwell’s equations:

\displaystyle -\frac{\partial^{2} A_{t}}{\partial x^{2}}-\frac{\partial^{2} A_{t}}{\partial y^{2}}-\frac{\partial^{2} A_{t}}{\partial z^{2}}-\frac{1}{c}\frac{\partial^{2} A_{x}}{\partial t\partial x}-\frac{1}{c}\frac{\partial^{2} A_{y}}{\partial t\partial y}-\frac{1}{c}\frac{\partial^{2} A_{z}}{\partial t\partial z}=\frac{4\pi}{c}J_{t}

\displaystyle \frac{1}{c^{2}}\frac{\partial^{2} A_{x}}{\partial t^{2}}-\frac{\partial^{2} A_{x}}{\partial y^{2}}-\frac{\partial^{2} A_{x}}{\partial z^{2}}+\frac{1}{c}\frac{\partial^{2} A_{t}}{\partial x\partial t}+\frac{\partial^{2} A_{y}}{\partial x\partial y}+\frac{\partial^{2} A_{z}}{\partial x\partial z}=\frac{4\pi}{c}J_{x}

\displaystyle -\frac{\partial^{2} A_{y}}{\partial x^{2}}+\frac{1}{c^{2}}\frac{\partial^{2} A_{y}}{\partial t^{2}}-\frac{\partial^{2} A_{y}}{\partial z^{2}}+\frac{\partial^{2} A_{x}}{\partial y\partial x}+\frac{1}{c}\frac{\partial^{2} A_{t}}{\partial t\partial y}+\frac{\partial^{2} A_{z}}{\partial y\partial z}=\frac{4\pi}{c}J_{y}

\displaystyle -\frac{\partial^{2} A_{z}}{\partial x^{2}}-\frac{\partial^{2} A_{z}}{\partial y^{2}}+\frac{1}{c^{2}}\frac{\partial^{2} A_{z}}{\partial t^{2}}+\frac{\partial^{2} A_{x}}{\partial z\partial x}+\frac{\partial^{2} A_{y}}{\partial z\partial y}+\frac{1}{c}\frac{\partial^{2} A_{t}}{\partial z\partial t}=\frac{4\pi}{c}J_{z}

Some readers may be more familiar with Maxwell’s equations written in terms of the electric and magnetic fields; in that case, they have individual names: Gauss’ law, Gauss’ law for magnetism, Faraday’s law, and Ampere’s law (with Maxwell’s addition). When written down in terms of the fields, they can offer more physical intuition – for instance, Gauss’ law for magnetism tells us that the magnetic field has no “divergence”, and is always “solenoidal”. However, we leave this approach to the references for the moment, and focus on the potentials, which will be more useful for us when we relate our discussion to quantum mechanics later on. We will, however, always remind ourselves of the following important point:

Maxwell’s equations tell us the configuration and evolution of the electric and magnetic fields (possibly via the potentials) under the influence of sources (charge and current distributions).

There is one catch (an extremely interesting one) that comes about when dealing with potentials instead of fields. It is called gauge freedom, and is one of the foundations of modern particle physics. However, we will not discuss it in this post. Our equations will remain correct, so the reader need not worry; gauge freedom is not a constraint, but is instead a kind of “symmetry” that will have some very interesting consequences. This concept is left to the references for now; however, it is hoped that it will at some time be discussed in this blog.

The way we have written down Maxwell’s equations is rather messy. However, we can introduce some notation to write them in a more elegant form. We use what is known as tensor notation; however, we will not discuss the concept of tensors in full here. We will just note that because the formula for the spacetime interval contains a sign different from the others, we need two different types of indices for our vectors. The so-called contravariant vectors will be indexed by a superscript, while the so-called covariant vectors will be indexed by a subscript. “Raising” and “lowering” these indices will involve a change in sign for some quantities; we will indicate them explicitly here.

Let x^{0}=ct, x^{1}=x, x^{2}=y, x^{3}=z. Then we will adopt the following notation:

\displaystyle \partial_{\mu}=\frac{\partial}{\partial x^{\mu}}

\displaystyle \partial^{\mu}=\frac{\partial}{\partial x^{\mu}} for \mu=0

\displaystyle \partial^{\mu}=-\frac{\partial}{\partial x^{\mu}} for \mu\neq 0

Let A^{0}=A_{t}, A^{1}=A_{x}, A^{2}=A_{y}, A^{3}=A_{z}. Then Maxwell’s equations can be written as

\displaystyle \sum_{\mu=0}^{3}\partial_{\mu}(\partial^{\mu}A^{\nu}-\partial^{\nu}A^{\mu})=\frac{4\pi}{c}J^{\nu}.

We now introduce the so-called Einstein summation convention. Note that the summation is performed over the index that is repeated, and that one of these indices is a superscript and the other is a subscript. Albert Einstein noticed that almost all summations in his calculations happen in this way, so he adopted the convention that instead of explicitly writing out the summation sign, repeated indices (one superscript and one subscript) would instead indicate that a summation should be performed. Like most modern references, we adopt this notation, and only explicitly say so when there is an exception. This allows us to write Maxwell’s equations as

\displaystyle \partial_{\mu}(\partial^{\mu}A^{\nu}-\partial^{\nu}A^{\mu})=\frac{4\pi}{c}J^{\nu}.
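
To see how the compact form unpacks, here is a sketch of the \nu=1 component written out term by term, using only the definitions above (x^{0}=ct, A^{0}=A_{t}, A^{1}=A_{x}, and the sign rule for \partial^{\mu}). The first term on the left-hand side gives

\displaystyle \partial_{\mu}\partial^{\mu}A^{1}=\frac{1}{c^{2}}\frac{\partial^{2} A_{x}}{\partial t^{2}}-\frac{\partial^{2} A_{x}}{\partial x^{2}}-\frac{\partial^{2} A_{x}}{\partial y^{2}}-\frac{\partial^{2} A_{x}}{\partial z^{2}}

while the second term, since \partial^{1}=-\frac{\partial}{\partial x}, gives

\displaystyle -\partial_{\mu}\partial^{1}A^{\mu}=\frac{1}{c}\frac{\partial^{2} A_{t}}{\partial x\partial t}+\frac{\partial^{2} A_{x}}{\partial x^{2}}+\frac{\partial^{2} A_{y}}{\partial x\partial y}+\frac{\partial^{2} A_{z}}{\partial x\partial z}

Adding the two, the terms with \frac{\partial^{2} A_{x}}{\partial x^{2}} cancel, and what remains is the left-hand side of the equation for J_{x} in the long form of Maxwell’s equations, with J^{1}=J_{x} on the right-hand side.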

We can also use the Einstein summation convention to rewrite other important expressions in physics in more compact form. In particular, it allows us to rewrite the Dirac equation (see Some Basics of Relativistic Quantum Field Theory) as follows:

\displaystyle i\hbar\gamma^{\mu}\partial_{\mu}\psi-mc\psi=0

We now go to the quantum realm and discuss the equations of motion of quantum electrodynamics. Let A_{0}=A_{t}, A_{1}=-A_{x}, A_{2}=-A_{y}, A_{3}=-A_{z}. These equations are given by

\displaystyle i\hbar\gamma^{\mu}\partial_{\mu}\psi-mc\psi=\frac{q}{c}\gamma^{\mu}A_{\mu}\psi

\displaystyle \partial_{\mu}(\partial^{\mu}A^{\nu}-\partial^{\nu}A^{\mu})=4\pi q\bar{\psi}\gamma^{\nu}\psi

What do these two equations mean?

The first equation looks like the Dirac equation, except that on the right hand side we have a term containing the “potential” (which we now call the Maxwell field, or the Maxwell field operator), the Dirac “wave function” for a particle such as an electron (which, as we have discussed in Some Basics of Relativistic Quantum Field Theory, is actually the Dirac field operator which operates on the “vacuum” state to describe a state with a single electron), and the charge. It describes the “motion” of the Dirac field under the influence of the Maxwell field. Hence, this is the quantum mechanical version of the Lorentz force law.

The second equation is none other than our shorthand version of Maxwell’s equations, and on the right hand side is an explicit expression for the current in terms of the Dirac field and some constants. The symbol \bar{\psi} refers to the “adjoint” of the Dirac field. The Dirac field itself has components, although, because of the way it transforms under rotations, we usually do not refer to it as a vector. Hence it can be written as a column matrix (see Matrices), and it has a “conjugate transpose”, which is a row matrix whose entries are the complex conjugates of the entries of the Dirac field; the “adjoint” \bar{\psi} is this conjugate transpose multiplied on the right by the matrix \gamma^{0} (see Some Basics of Relativistic Quantum Field Theory).

In general relativity there is this quote, from the physicist John Archibald Wheeler: “Spacetime tells matter how to move; matter tells spacetime how to curve”. One can perhaps think of electrodynamics, whether classical or quantum, in a similar way. Fields tell charges and currents how to move, charges and currents tell fields how they are supposed to be “shaped”. And this is succinctly summarized by the Lorentz force law and Maxwell’s equations, again whether in its classical or quantum version.

As we have seen in Lagrangians and Hamiltonians, the equations of motion are not the only way we can express a physical theory. We can also use the language of Lagrangians and Hamiltonians. In particular, an important quantity in quantum mechanics that involves the Lagrangian and Hamiltonian is the probability amplitude. In order to calculate the probability amplitude, the physicist Richard Feynman developed a method involving the now famous Feynman diagrams, which can be thought of as expanding the exponential function (see “The Most Important Function in Mathematics”) in the expression for the probability amplitude and expressing the different terms using diagrams. Just as we have associated the Dirac field to electrons, the Maxwell field is similarly associated to photons. Expressions involving the Dirac field and the Maxwell field can be thought of as electrons “emitting” or “absorbing” photons, or electrons and positrons (the antimatter counterpart of electrons) annihilating each other and creating a photon. The calculated probability amplitudes can then be used to obtain quantities that can be compared to results obtained from experiment, in order to verify the theory.


Lorentz Force on Wikipedia

Electromagnetic Four-Potential on Wikipedia

Maxwell’s Equations on Wikipedia

Quantum Electrodynamics on Wikipedia

Featured Image Produced by CERN

The Douglas Robb Memorial Lectures by Richard Feynman

QED: The Strange Theory of Light and Matter by Richard Feynman

Introduction to Electrodynamics by David J. Griffiths

Introduction to Elementary Particles by David J. Griffiths

Quantum Field Theory by Fritz Mandl and Graham Shaw

An Introduction to Quantum Field Theory by Michael E. Peskin and Daniel V. Schroeder

Some Basics of Relativistic Quantum Field Theory

So far, on this blog, we have introduced the two great pillars of modern physics, relativity (see From Pythagoras to Einstein) and quantum mechanics (see Some Basics of Quantum Mechanics and More Quantum Mechanics: Wavefunctions and Operators). Although a complete unification between these two pillars is yet to be achieved, there already exists such a unified theory in the special case when gravity is weak, i.e. spacetime is flat. This unification of relativity (in this case special relativity) and quantum mechanics is called relativistic quantum field theory, and we discuss the basic concepts of it in this post.

In From Pythagoras to Einstein, we introduced the formula at the heart of Einstein’s theory of relativity. It is very important to modern physics and is worth writing here again:

\displaystyle -(c\Delta t)^2+(\Delta x)^2+(\Delta y)^2+(\Delta z)^2=(\Delta s)^2

This holds only for flat spacetime; however, even in general relativity, where spacetime may be curved, a “local” version still holds:

\displaystyle -(cdt)^2+(dx)^2+(dy)^2+(dz)^2=(ds)^2

The notation comes from calculus (see An Intuitive Introduction to Calculus), and means that this equation holds when the quantities involved are very small.

In this post, however, we shall consider a flat spacetime only. Aside from being “locally” true, as far as we know, in regions where the gravity is not very strong (like on our planet), spacetime is pretty much actually flat.

We recall how we obtained the important equation above; we made an analogy with the distance between two objects in 3D space, and noted how this distance does not change with translation and rotation; if we are using different coordinate systems, we may disagree about the coordinates of the two objects, but even then we will always agree on the distance between them. This distance is therefore “invariant”. But we live not only in a 3D space but in a 4D spacetime, and instead of an invariant distance we have an invariant spacetime interval.

But even in nonrelativistic mechanics, the distance is not the only “invariant”. We have the concept of velocity of an object. Again, if we are positioned and oriented differently in space, we may disagree about the velocity of the object; for me it may be going to the right, and forward away from me; for you it may be in front of you and going straight towards you. However, we will always agree about the magnitude of this velocity, also called its speed.

The quantity we call the momentum is related to the velocity of the object; in fact for simple cases it is simply the mass of the object multiplied by the velocity. Once again, two observers may disagree about the momentum, since it involves direction; however they will always agree about the magnitude of the momentum. This magnitude is therefore also invariant.

The velocity, and by extension the momentum, has three components, one for each dimension of space. We write them as v_{x}, v_{y}, and v_{z} for the velocity and p_{x}, p_{y}, and p_{z} for the momentum.

What we want now is a 4D version of the momentum. Three of its components will be the components we already know of, p_{x}, p_{y}, and p_{z}. So we just need its “time” component, and the “magnitude” of this momentum is going to be an invariant.

It turns out that the equation we are looking for is the following (note the similarity of its form to the equation for the spacetime interval):

\displaystyle -\frac{E^{2}}{c^{2}}+p_{x}^{2}+p_{y}^{2}+p_{z}^{2}=-m^{2}c^{2}

The quantity m is the invariant we are looking for (the factors of c are just constants anyway), and it is called the “rest mass” of the object. As an effect of the unity of spacetime, the mass of an object as seen by an observer actually changes depending on its motion with respect to the observer; however, by definition, the rest mass is the mass of an object as seen by the observer when it is not moving with respect to the observer, and it is therefore an invariant. The quantity E stands for the energy.

Also, when the object is not moving with respect to us, we see no momentum in the x, y, or z direction, and the equation becomes E=mc^{2}, which is the very famous mass-energy equivalence which was published by Albert Einstein during his “miracle year” in 1905.
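
The claim that the combination of energy and momentum above is the same for all observers can be checked numerically. The Python sketch below starts with an object at rest, computes its energy and momentum as seen by an observer moving along the x direction, and compares the two values of the invariant. The boost formulas in `boost_x` are the standard Lorentz transformation for energy and momentum, which we have not derived in this post; the function names and the sample mass and speed are ours, for illustration only.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def invariant(E, px, py, pz):
    """Left-hand side of the relation: -E^2/c^2 + px^2 + py^2 + pz^2."""
    return -(E / C) ** 2 + px ** 2 + py ** 2 + pz ** 2


def boost_x(E, px, py, pz, v):
    """Energy and momentum as seen by an observer moving with speed v along x."""
    gamma = 1.0 / math.sqrt(1.0 - (v / C) ** 2)
    E_new = gamma * (E - v * px)
    px_new = gamma * (px - v * E / C ** 2)
    return (E_new, px_new, py, pz)


m = 1.0                                # hypothetical rest mass in kg
at_rest = (m * C ** 2, 0.0, 0.0, 0.0)  # E = mc^2 and no momentum
moving = boost_x(*at_rest, v=0.6 * C)

print(invariant(*at_rest))  # equals -m^2 c^2
print(invariant(*moving))   # the same value, up to rounding
```

The two observers disagree about the energy and about the momentum, but the combination on the left-hand side of the equation comes out the same for both.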

We now move on to quantum mechanics. In quantum mechanics our observables, such as the position, momentum, and energy, correspond to self-adjoint operators (see More Quantum Mechanics: Wavefunctions and Operators), whose eigenvalues are the values that we obtain when we perform a measurement of the observable corresponding to the operator.

The “momentum operator” (to avoid confusion between ordinary quantities and operators, we will introduce here the “hat” symbol on our operators) corresponding to the x component of the momentum is given by

\displaystyle \hat{p}_{x}=-i\hbar\frac{\partial}{\partial x}

The eigenvalue equation means that when we measure the x component of the momentum of a quantum system in the state represented by the wave function \psi(x,y,z,t), which is an eigenvector of the momentum operator, then the measurement will yield the value p_{x}, where p_{x} is the eigenvalue corresponding to \psi(x,y,z,t) (see Eigenvalues and Eigenvectors), i.e.

\displaystyle -i\hbar\frac{\partial \psi(x,y,z,t)}{\partial x}=p_{x}\psi(x,y,z,t)

Analogues exist of course for the y and z components of the momentum.
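
A concrete eigenvector of the momentum operator is the plane wave e^{ip_{x}x/\hbar}. The following Python sketch checks the eigenvalue equation for it numerically, approximating the derivative by a finite difference. The choice of units with \hbar=1, the value of `P_X`, and the function names are assumptions made only for this illustration.

```python
import cmath

HBAR = 1.0  # units with hbar = 1, assumed for simplicity
P_X = 2.5   # hypothetical momentum eigenvalue


def psi(x):
    """Plane wave exp(i p_x x / hbar), depending on x only."""
    return cmath.exp(1j * P_X * x / HBAR)


def momentum_operator(f, x, h=1e-5):
    """Apply -i hbar d/dx to f at the point x using a central difference."""
    return -1j * HBAR * (f(x + h) - f(x - h)) / (2 * h)


x0 = 0.4
lhs = momentum_operator(psi, x0)  # the operator acting on the wave function
rhs = P_X * psi(x0)               # the eigenvalue times the wave function
print(abs(lhs - rhs))             # small: the two sides agree up to discretization error
```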

Meanwhile, we also have an energy operator given by

\displaystyle \hat{E}=i\hbar\frac{\partial}{\partial t}

To obtain a quantum version of the important equation above relating the energy, momentum, and the mass, we need to replace the relevant quantities by the corresponding operators acting on the wave function. Therefore, from

\displaystyle -\frac{E^{2}}{c^{2}}+p_{x}^{2}+p_{y}^{2}+p_{z}^{2}=-m^{2}c^{2}

we obtain an equation in terms of operators

\displaystyle -\frac{\hat{E}^{2}}{c^{2}}+\hat{p}_{x}^{2}+\hat{p}_{y}^{2}+\hat{p}_{z}^{2}=-m^{2}c^{2}

or explicitly, with the wavefunction,

\displaystyle \frac{\hbar^{2}}{c^{2}}\frac{\partial^{2}\psi}{\partial t^{2}}-\hbar^{2}\frac{\partial^{2}\psi}{\partial x^{2}}-\hbar^{2}\frac{\partial^{2}\psi}{\partial y^{2}}-\hbar^{2}\frac{\partial^{2}\psi}{\partial z^{2}}=-m^{2}c^{2}\psi.

This equation is called the Klein-Gordon equation.
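
As a quick consistency check, one can substitute a plane wave into the Klein-Gordon equation. The trial solution below is a standard choice, and is shown here only as a sketch:

\displaystyle \psi=e^{i(p_{x}x+p_{y}y+p_{z}z-Et)/\hbar}

Each derivative with respect to t brings down a factor of -\frac{iE}{\hbar}, and each derivative with respect to x brings down a factor of \frac{ip_{x}}{\hbar} (similarly for y and z), so the Klein-Gordon equation becomes

\displaystyle \left(-\frac{E^{2}}{c^{2}}+p_{x}^{2}+p_{y}^{2}+p_{z}^{2}\right)\psi=-m^{2}c^{2}\psi

which holds exactly when E and the components of the momentum satisfy the relation between the energy, momentum, and mass that we started from.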

The Klein-Gordon equation is a second-order differential equation. It can be “factored” in order to obtain two first-order differential equations, both of which are called the Dirac equation.

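We elaborate more on what we mean by “factoring”. Suppose we have a quantity which can be written as a^{2}-b^{2}. From basic high school algebra, we know that we can “factor” it as (a+b)(a-b). Now suppose we have p_{x}=p_{y}=p_{z}=0. We can then write the Klein-Gordon equation as

\displaystyle \frac{\hat{E}^{2}}{c^{2}}-m^{2}c^{2}=0

which factors into

\displaystyle (\frac{\hat{E}}{c}+mc)(\frac{\hat{E}}{c}-mc)=0

\displaystyle \frac{\hat{E}}{c}+mc=0

\displaystyle \frac{\hat{E}}{c}-mc=0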

These are the kinds of equations that we want. However, the case where the momentum is nonzero complicates things. The solution of the physicist Paul Dirac was to introduce matrices (see Matrices) as coefficients. These matrices (there are four of them) are 4\times 4 matrices with complex coefficients, and are explicitly written down as follows:

\displaystyle \gamma^{0}=\left(\begin{array}{cccc}1&0&0&0\\ 0&1&0&0\\0&0&-1&0\\0&0&0&-1\end{array}\right)

\displaystyle \gamma^{1}=\left(\begin{array}{cccc}0&0&0&1\\ 0&0&1&0\\0&-1&0&0\\-1&0&0&0\end{array}\right)

\displaystyle \gamma^{2}=\left(\begin{array}{cccc}0&0&0&-i\\ 0&0&i&0\\0&i&0&0\\-i&0&0&0\end{array}\right)

\displaystyle \gamma^{3}=\left(\begin{array}{cccc}0&0&1&0\\ 0&0&0&-1\\-1&0&0&0\\0&1&0&0\end{array}\right).

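Using the laws of matrix multiplication, one can verify the following properties of these matrices (usually called gamma matrices), where 1 stands for the 4\times 4 identity matrix:

\displaystyle (\gamma^{0})^{2}=1

\displaystyle (\gamma^{1})^{2}=(\gamma^{2})^{2}=(\gamma^{3})^{2}=-1

\displaystyle \gamma^{\mu}\gamma^{\nu}=-\gamma^{\nu}\gamma^{\mu} for \mu\neq\nu.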
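
Readers who prefer to let a computer do the matrix multiplication can use a short Python sketch like the one below. It checks that \gamma^{0} squares to the identity, that the other three matrices square to minus the identity, and that any two different gamma matrices anticommute, i.e. \gamma^{\mu}\gamma^{\nu}=-\gamma^{\nu}\gamma^{\mu}. The matrices are copied from the expressions above; the helper names are ours.

```python
# The four gamma matrices as nested lists, copied from the expressions above.
I = 1j
GAMMA = [
    [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, -1, 0], [0, 0, 0, -1]],
    [[0, 0, 0, 1], [0, 0, 1, 0], [0, -1, 0, 0], [-1, 0, 0, 0]],
    [[0, 0, 0, -I], [0, 0, I, 0], [0, I, 0, 0], [-I, 0, 0, 0]],
    [[0, 0, 1, 0], [0, 0, 0, -1], [-1, 0, 0, 0], [0, 1, 0, 0]],
]


def matmul(a, b):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]


def scaled_identity(s):
    """The 4x4 identity matrix multiplied by the number s."""
    return [[s if r == c else 0 for c in range(4)] for r in range(4)]


def negate(a):
    """The matrix a with every entry multiplied by -1."""
    return [[-entry for entry in row] for row in a]


# gamma^0 squares to +1, the other three square to -1.
squares_ok = (matmul(GAMMA[0], GAMMA[0]) == scaled_identity(1)
              and all(matmul(GAMMA[k], GAMMA[k]) == scaled_identity(-1)
                      for k in (1, 2, 3)))

# Different gamma matrices anticommute.
anticommute_ok = all(
    matmul(GAMMA[m], GAMMA[n]) == negate(matmul(GAMMA[n], GAMMA[m]))
    for m in range(4) for n in range(4) if m != n
)

print(squares_ok, anticommute_ok)
```

Since all the entries are 0, 1, -1, i, or -i, the arithmetic is exact and the comparisons do not need any numerical tolerance.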

With the help of these properties, we can now factor the Klein-Gordon equation as follows:

\displaystyle \frac{\hat{E}^{2}}{c^{2}}-\hat{p}_{x}^{2}-\hat{p}_{y}^{2}-\hat{p}_{z}^{2}-m^{2}c^{2}=0

\displaystyle (\gamma^{0}\frac{\hat{E}}{c}-\gamma^{1}\hat{p}_{x}-\gamma^{2}\hat{p}_{y}-\gamma^{3}\hat{p}_{z}+mc)(\gamma^{0}\frac{\hat{E}}{c}-\gamma^{1}\hat{p}_{x}-\gamma^{2}\hat{p}_{y}-\gamma^{3}\hat{p}_{z}-mc)=0

\displaystyle \gamma^{0}\frac{\hat{E}}{c}-\gamma^{1}\hat{p}_{x}-\gamma^{2}\hat{p}_{y}-\gamma^{3}\hat{p}_{z}+mc=0

\displaystyle \gamma^{0}\frac{\hat{E}}{c}-\gamma^{1}\hat{p}_{x}-\gamma^{2}\hat{p}_{y}-\gamma^{3}\hat{p}_{z}-mc=0

Both of the last two equations are known as the Dirac equation, although for purposes of convention, we usually use the last one. Writing the operators and the wave function explicitly, this is

\displaystyle i\hbar\gamma^{0}\frac{\partial\psi}{c\partial t}+i\hbar\gamma^{1}\frac{\partial\psi}{\partial x}+i\hbar\gamma^{2}\frac{\partial\psi}{\partial y}+i\hbar\gamma^{3}\frac{\partial\psi}{\partial z}-mc\psi=0

We now have the Klein-Gordon equation and the Dirac equation, both of which are important in relativistic quantum field theory. In particular, the Klein-Gordon equation is used for “scalar” fields while the Dirac equation is used for “spinor” fields. This is related to how they “transform” under rotations (which, in relativity, includes “boosts” – rotations that involve both space and time). A detailed discussion of these concepts will be left to the references for now and will perhaps be tackled in future posts.

We will, however, mention one more important (and interesting) phenomenon in relativistic quantum mechanics. The equation E=mc^{2} allows for the “creation” of particle-antiparticle pairs out of seemingly nothing! Even when there seems to be “not enough energy”, there exists an “energy-time uncertainty principle”, which allows such particle-antiparticle pairs to exist, even for only a very short time. This phenomenon of “creation” (and the related phenomenon of “annihilation”) means we cannot take the number of particles in our system to be fixed.

With this, we need to modify our language to be able to describe a system with varying numbers of particles. We will still use the language of linear algebra, but we will define our “states” differently. In earlier posts in the blog, where we only dealt with a single particle, the “state” of the particle simply gave us information about the position. In the relativistic case (and in other cases where there are varying numbers of particles – for instance, when the system “gains” or “loses” particles from the environment), the number (and kind) of particles needs to be taken into account.

We will do this as follows. We first define a state with no particles, which we shall call the “vacuum”. We write it as |0\rangle. Recall that an operator is a function from state vectors to state vectors, hence, an operator acting on a state is another state. We now define a new kind of operator, called the “field” operator \psi, such that the state with a single particle of a certain type, which would have been given by  the wave function \psi in the old language, is now described by the state vector \psi|0\rangle.

Important note: The symbol \psi no longer refers to a state vector, but an operator! The state vector is \psi|0\rangle.

The Klein-Gordon and the Dirac equations still hold of course (otherwise we wouldn’t even have bothered to write them here). It is just important to take note that the symbol \psi now refers to an operator and not a state vector. We might as well write it as \hat{\psi}, but this is usually not done in the literature since we will not use \psi for anything else other than to refer to the field operator. Further, if we have a state with several particles, we can write \psi\phi...\theta|0\rangle. This new language is called second quantization, which does not mean “quantize for a second time”, but rather a second version of quantization, since the first version did not have the means to deal with varying numbers of particles.

We have barely scratched the surface of relativistic quantum field theory in this post. Even though much has been made about the quest to unify quantum mechanics and general relativity, there is so much that also needs to be studied in relativistic quantum field theory, and still many questions that need to be answered. Still, relativistic quantum field theory has had many impressive successes – one striking example is the theoretical formulation of the so-called Higgs mechanism, and its experimental verification almost half a century later. The success of relativistic quantum field theory also gives us a guide on how to formulate new theories of physics in the same way that F=ma guided the development of the very theories that eventually replaced it.

The reader is encouraged to supplement what little exposition has been provided in this post by reading the references. The books are listed in increasing order of sophistication, so it is perhaps best to read them in that order too. However, The Road to Reality: A Complete Guide to the Laws of the Universe by Roger Penrose is a high-level popular exposition and not a textbook, so it is perhaps best read in tandem with Introduction to Elementary Particles by David J. Griffiths, which is a textbook, although it does have special relativity and basic quantum mechanics as prerequisites. One may check the references listed in the blog posts discussing these respective subjects.


Quantum Field Theory on Wikipedia

Klein-Gordon Equation on Wikipedia

Dirac Equation on Wikipedia

Second Quantization on Wikipedia

Featured Image Produced by CERN

The Road to Reality: A Complete Guide to the Laws of the Universe by Roger Penrose

Introduction to Elementary Particles by David J. Griffiths

Quantum Field Theory by Fritz Mandl and Graham Shaw

An Introduction to Quantum Field Theory by Michael E. Peskin and Daniel V. Schroeder

Lagrangians and Hamiltonians

We discussed the Lagrangian and Hamiltonian formulations of physics in My Favorite Equation in Physics, in our discussion of the historical development of classical physics right before the dawn of the revolutionary ideas of relativity and quantum mechanics at the turn of the 20th century. In this post we discuss them further, and more importantly, we provide some examples.

In order to discuss Lagrangians and Hamiltonians we first need to discuss the concept of energy. Energy is a rather abstract concept, but it can perhaps best be described as a certain conserved quantity – historically, this was how energy was thought of, and the motivation for its development under Rene Descartes and Gottfried Wilhelm Leibniz.

Consider for example, a stone at some height h above the ground. From this we can compute a quantity called the potential energy (which we will symbolize by V), which is going to be, in our case, given by

\displaystyle V=mgh

where m is the mass of the stone and g is the acceleration due to gravity, which close to the surface of the earth can be considered a constant roughly equal to 9.81 meters per second per second.

As the stone is dropped from that height, it starts to pick up speed. As its height decreases, its potential energy will also decrease. However, there will be an increase in a certain quantity called the kinetic energy, which we will write as T and define as

\displaystyle T=\frac{1}{2}mv^{2}

where v is the magnitude of the velocity. In our case, since we are considering only motion in one dimension, this is simply given by the speed of the stone. At any point in the motion of the stone, however, the sum of the potential energy and the kinetic energy, called the total mechanical energy, stays at the same value. This is because the amount by which the potential energy decreases is the same as the amount by which the kinetic energy increases.
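
The trade-off between the two kinds of energy can be tabulated with a few lines of Python. In the sketch below, the speed of the stone at a given height is taken from the constant-acceleration formula v^{2}=2g(h_{0}-h), where h_{0} is the starting height; this formula is an assumption we bring in from elementary kinematics, and the mass and heights are hypothetical values.

```python
import math

G = 9.81  # acceleration due to gravity in m/s^2


def energies(mass, start_height, height):
    """Potential, kinetic, and total energy of a stone dropped from rest.

    The speed at a given height follows from constant acceleration:
    v^2 = 2 g (start_height - height).
    """
    speed = math.sqrt(2 * G * (start_height - height))
    potential = mass * G * height
    kinetic = 0.5 * mass * speed ** 2
    return potential, kinetic, potential + kinetic


# Hypothetical stone: 2 kg dropped from a height of 10 m.
for h in (10.0, 7.5, 5.0, 2.5, 0.0):
    V, T, total = energies(2.0, 10.0, h)
    print(f"h={h:5.1f}  V={V:7.2f}  T={T:7.2f}  V+T={total:7.2f}")
```

As the height goes down, V shrinks and T grows by the same amount, so the last column stays at the initial potential energy mgh_{0}.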

The expression for kinetic energy remains the same for any nonrelativistic system. The expression for the potential energy depends on the system, however, and is related to the force as follows:

\displaystyle F=-\frac{dV}{dx}.

We now give the definition of the quantity called the Lagrangian (denoted by L). It is simply given by

\displaystyle L=T-V.

There is a related quantity to the Lagrangian, called the action (denoted by S). It is defined as

\displaystyle S=\int_{t_{1}}^{t_{2}}L dt.

For a single particle, the Lagrangian depends on the position and the velocity of the particle. More generally, it will depend on the so-called “configuration” of the system, as well as the “rate of change” of this configuration. We will represent these variables by q and \dot{q} respectively (the “dot” notation is the one developed by Isaac Newton to represent the derivative with respect to time; in the notation of Leibniz, which we have used up to now, this is also written as \frac{dq}{dt}).

To explicitly show this dependence, we write the Lagrangian as L(q,\dot{q}). Therefore we shall write the action as follows:

\displaystyle S=\int_{t_{1}}^{t_{2}}L(q,\dot{q}) dt.

The Lagrangian formulation is important because it allows us to make a connection with Fermat’s principle in optics, which is the following statement:

Light always moves in such a way that it minimizes its time of travel.

Essentially, the Lagrangian formulation allows us to restate the good old Newton’s second law of motion as follows:

An object always moves in such a way that it minimizes its action.
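
We can get a feeling for this statement with a numerical experiment. The Python sketch below computes the action of a falling stone along the path predicted by Newton’s second law, and along a “wiggled” path that starts and ends at the same points. The masses, times, the shape of the wiggle, and the simple midpoint-rule integration are all choices of ours for illustration; the sketch only compares these two particular paths and is not a proof of the principle.

```python
import math

M = 1.0      # mass in kg (hypothetical)
G = 9.81     # acceleration due to gravity in m/s^2
T_END = 1.0  # duration of the fall in seconds
H0 = 10.0    # starting height in m


def true_path(t):
    """Height of a stone dropped from rest, as given by Newton's second law."""
    return H0 - 0.5 * G * t ** 2


def wiggled_path(t, amplitude=0.5):
    """The true path plus a bump that vanishes at both endpoints."""
    return true_path(t) + amplitude * math.sin(math.pi * t / T_END)


def action(path, steps=1000):
    """Approximate S = integral of (T - V) dt with the midpoint rule."""
    dt = T_END / steps
    total = 0.0
    for k in range(steps):
        t0, t1 = k * dt, (k + 1) * dt
        velocity = (path(t1) - path(t0)) / dt
        height = path(0.5 * (t0 + t1))
        lagrangian = 0.5 * M * velocity ** 2 - M * G * height
        total += lagrangian * dt
    return total


S_true = action(true_path)
S_wiggled = action(wiggled_path)
print(S_true, S_wiggled)
print(S_true < S_wiggled)  # the path obeying Newton's law has the smaller action
```

Changing the amplitude of the wiggle, or its sign, gives the same conclusion in this example: any departure from the true path costs extra action.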

In order to make calculations out of this “principle”, we have to make use of the branch of mathematics called the calculus of variations, which was specifically developed to deal with problems such as these. The calculations are fairly involved, but we will end up with the so-called Euler-Lagrange equations:

\displaystyle \frac{\partial L}{\partial q}-\frac{d}{dt}\frac{\partial L}{\partial\dot{q}}=0

We are using the notation \frac{d}{dt}\frac{\partial L}{\partial\dot{q}} instead of the otherwise cumbersome notation \frac{d\frac{\partial L}{\partial\dot{q}}}{dt}. It is very common notation in physics to write \frac{d}{dt} to refer to the derivative “operator” (see also More Quantum Mechanics: Wavefunctions and Operators).

For a nonrelativistic system, Euler-Lagrange equations are merely a restatement of Newton’s second law; in fact we can plug in the expressions for the Lagrangian, the kinetic energy, and the potential energy we wrote down earlier and end up exactly with F=ma.
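
To make this concrete, here is a sketch of the computation for a single particle moving in one dimension, where q is the position and the potential energy V depends only on q (an assumption made for simplicity). The Lagrangian is

\displaystyle L=\frac{1}{2}m\dot{q}^{2}-V(q)

and the two terms of the Euler-Lagrange equation are

\displaystyle \frac{\partial L}{\partial q}=-\frac{dV}{dq}=F

\displaystyle \frac{d}{dt}\frac{\partial L}{\partial\dot{q}}=\frac{d}{dt}(m\dot{q})=m\ddot{q}

Substituting these into the Euler-Lagrange equation gives F-m\ddot{q}=0, which is F=ma, with the acceleration a given by \ddot{q}.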

Why then, go to all the trouble of formulating this new language, just to express something that we are already familiar with? Well, aside from the “aesthetically pleasing” connection with the very elegant Fermat’s principle, there are also numerous advantages to using the Lagrangian formulation. For instance, it exposes the symmetries of the system, as well as its conserved quantities (both of which are very important in modern physics). Also, the configuration is not always simply just the position, which means that it can be used to describe systems more complicated than just a single particle. Using the concept of a Lagrangian density, it can also describe fields like the electromagnetic field.

We also mention the role of the Lagrangian formulation in quantum mechanics. The probability that a system will be found in a certain state (which we write as |\phi\rangle) at time t_{2}, given that it was in a state |\psi\rangle at time t_{1}, is given by (see More Quantum Mechanics: Wavefunctions and Operators)

\displaystyle |\langle\phi|e^{-iH(t_{2}-t_{1})}|\psi\rangle|^{2}

where H is the Hamiltonian (more on this later). The quantity

\displaystyle \langle\phi|e^{-iH(t_{2}-t_{1})}|\psi\rangle

is called the transition amplitude and can be expressed in terms of the Feynman path integral

\displaystyle \int e^{iS}Dq.

This is not an ordinary integral, as may be inferred from the different notation using Dq instead of dq. What this means is that we sum the quantity inside the integral, e^{iS}, over all “paths” taken by our system. This has the rather mind blowing interpretation that in going from one point to another, a particle takes all paths. One of the best places to learn more about this concept is in the book QED: The Strange Theory of Light and Matter by Richard Feynman. This book is adapted from Feynman’s lectures at the University of Auckland, videos of which are freely and legally available online (see the references below).

We now discuss the Hamiltonian. The Hamiltonian is defined in terms of the Lagrangian L by first defining the conjugate momentum p:

\displaystyle p=\frac{\partial L}{\partial\dot{q}}.

Then the Hamiltonian H is given by the formula

\displaystyle H=p\dot{q}-L.

In contrast to the Lagrangian, which is a function of q and \dot{q}, the Hamiltonian is expressed as a function of q and p. For many basic examples the Hamiltonian is simply the total mechanical energy, with the kinetic energy T now written in terms of p instead of \dot{q} as follows:

\displaystyle T=\frac{p^{2}}{2m}.
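To see how this comes about in one basic case, here is a sketch assuming a single particle with Lagrangian L=\frac{1}{2}m\dot{q}^{2}-V(q), where V(q) is a potential energy depending only on position (an assumption made for illustration):

```latex
\begin{aligned}
p &= \frac{\partial L}{\partial\dot{q}}=m\dot{q}
  \quad\Rightarrow\quad \dot{q}=\frac{p}{m}\\
H &= p\dot{q}-L
   = \frac{p^{2}}{m}-\left(\frac{p^{2}}{2m}-V(q)\right)
   = \frac{p^{2}}{2m}+V(q)=T+V
\end{aligned}
```

Under this assumption the Hamiltonian is the sum of the kinetic and potential energies, written in terms of q and p.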

The advantage of the Hamiltonian formulation is that it shows how the state of the system “evolves” over time. This is given by Hamilton’s equations:

\displaystyle \dot{q}=\frac{\partial H}{\partial p}

\displaystyle \dot{p}=-\frac{\partial H}{\partial q}

These are differential equations which can be solved to know the value of q and p at any instant of time t. One can visualize this better by imagining a “phase space” whose coordinates are q and p. The state of the system is then given by a point in this phase space, and this point “moves” across the phase space according to Hamilton’s equations.
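The motion of the point in phase space can be sketched numerically. The example below is a minimal illustration, not a definitive method: it assumes a harmonic oscillator with H=\frac{p^{2}}{2m}+\frac{1}{2}kq^{2}, sets m=k=1, and uses a "leapfrog" stepping scheme. The choice of system, the parameter values, and the function names are all assumptions made for this example.

```python
import math


def hamilton_step(q, p, dt, m=1.0, k=1.0):
    """One leapfrog step for the assumed H = p^2/(2m) + k q^2/2.

    Uses Hamilton's equations: dq/dt = dH/dp = p/m, dp/dt = -dH/dq = -k q.
    """
    p_half = p - 0.5 * dt * k * q          # half step in p
    q_new = q + dt * p_half / m            # full step in q
    p_new = p_half - 0.5 * dt * k * q_new  # second half step in p
    return q_new, p_new


def energy(q, p, m=1.0, k=1.0):
    """Value of the assumed Hamiltonian at the phase space point (q, p)."""
    return p * p / (2.0 * m) + 0.5 * k * q * q


def evolve(q0, p0, t_final, steps, m=1.0, k=1.0):
    """Move the point (q0, p0) through phase space up to time t_final."""
    dt = t_final / steps
    q, p = q0, p0
    for _ in range(steps):
        q, p = hamilton_step(q, p, dt, m, k)
    return q, p


if __name__ == "__main__":
    # With m = k = 1 the period is 2*pi, so the point should return
    # close to where it started.
    q, p = evolve(1.0, 0.0, 2.0 * math.pi, 10000)
    print(q, p, energy(q, p))
```

For this system the point traces a closed curve in phase space (a circle when m=k=1), and the value of H stays approximately constant along the curve.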

The Lagrangian and Hamiltonian formulations of classical mechanics may be easily generalized to more than one dimension. We will therefore have several different coordinates q_{i} for the configuration; for the simplest examples, these may refer to the Cartesian coordinates of 3-dimensional space, i.e. q_{1}=x, q_{2}=y, q_{3}=z. We summarize the important formulas here:

\displaystyle \frac{\partial L}{\partial q_{i}}-\frac{d}{dt}\frac{\partial L}{\partial\dot{q_{i}}}=0

\displaystyle H=\sum_{i}p_{i}\dot{q_{i}}-L

\displaystyle \dot{q_{i}}=\frac{\partial H}{\partial p_{i}}

\displaystyle \dot{p_{i}}=-\frac{\partial H}{\partial q_{i}}

In quantum mechanics, the Hamiltonian formulation still plays an important role. As described in More Quantum Mechanics: Wavefunctions and Operators, the Schrodinger equation describes the time evolution of the state of a quantum system in terms of the Hamiltonian. However, in quantum mechanics the Hamiltonian is not just a quantity but an operator, whose eigenvalues usually correspond to the observable values of the energy of the system.

Most modern publications in physics use the Lagrangian and Hamiltonian formulations because of these advantages. Although we have limited this discussion to nonrelativistic mechanics, in relativity both formulations are still very important. The equations of general relativity, also known as Einstein’s equations, may be obtained by extremizing the Einstein-Hilbert action. Meanwhile, there also exists a Hamiltonian formulation of general relativity called the Arnowitt-Deser-Misner formalism. Even the proposed candidates for a theory of quantum gravity, string theory and loop quantum gravity, make use of these formulations (the Lagrangian formulation seems to be more dominant in string theory, while the Hamiltonian formulation is more dominant in loop quantum gravity). It is therefore vital that anyone interested in learning about modern physics be at least comfortable in the use of this language.


Lagrangian Mechanics on Wikipedia

Hamiltonian Mechanics on Wikipedia

Path Integral Formulation on Wikipedia

The Douglas Robb Memorial Lectures by Richard Feynman

QED: The Strange Theory of Light and Matter by Richard Feynman

Mechanics by Lev Landau and Evgeny Lifshitz

Classical Mechanics by Herbert Goldstein

Etale Cohomology of Fields and Galois Cohomology

In Cohomology in Algebraic Geometry we have introduced sheaf cohomology and Cech cohomology as well as the concept of etale morphisms, and the Grothendieck topology (see More Category Theory: The Grothendieck Topos) that it defines. In this post, we give one important application of these ideas, related to the ideas discussed in Galois Groups.

Let K be a field (see Rings, Fields, and Ideals). A field has only two ideals: (0) and (1), the latter of which is the unit ideal and is therefore the entire field itself as well. Its only prime ideal (which is also a maximal ideal) is (0); recall that in algebraic geometry (see Basics of Algebraic Geometry), the “points” of the mathematical object we call a scheme correspond (locally, at least) to the prime ideals of a ring R, and we refer to this set of “points” as \text{Spec }R. Therefore, for the field K, \text{Spec }K=\{(0)\}, in other words, \text{Spec }K is made up of a single point.

Now we need to define sheaves on \text{Spec }K. Using ordinary concepts of topology will not be very productive, since our topological space consists only of a single point; therefore, we will not be able to obtain any interesting open covers out of this topological space. However, using the ideas in More Category Theory: The Grothendieck Topos, we can “expand” our idea of open covers. Instead of inclusions of open sets, we will instead make use of etale morphisms, as we have discussed in Cohomology in Algebraic Geometry.

Let K\rightarrow L be an etale morphism. This also means that L is an etale K-algebra (see also The Hom and Tensor Functors for the definition of algebra in our context). It is a theorem that an etale K-algebra is a direct product of finitely many separable field extensions of K (see Algebraic Numbers).

The definition of presheaf and sheaf remains the same, however the sheaf conditions can be restated in our case as the following (perhaps easier to understand) statement, which we copy verbatim from the book Etale Cohomology and the Weil Conjecture by Eberhard Freitag and Reinhardt Kiehl:

The elements s\in \mathcal{F}(B) correspond one-to-one to families of elements

s_{i}\in\mathcal{F}(B_{i}), i\in I

having the property

\text{Image }(s_{i})=\text{Image }(s_{j}), in (B_{i}\otimes_{B}B_{j})

This condition must also hold for i=j!

A separable closure \bar{K} of K is the largest separable field extension of K (see Cohomology in Algebraic Geometry) contained in an algebraic closure of K. An algebraic closure of K is an algebraic extension (see Algebraic Numbers) of K which is algebraically closed, i.e., it contains all the roots of polynomials with coefficients in this algebraic extension. Both the algebraic closure and the separable closure of K are unique up to isomorphism. In the case of the field of rational numbers \mathbb{Q}, the separable closure and the algebraic closure coincide and they are both equal to the field of algebraic numbers.

Given the separable closure \bar{K} of K, we define \mathcal{F}(\bar{K}) as the stalk (see Localization) of the sheaf at \bar{K}. It is also written using the language of direct limits (also called an inductive limit):

\displaystyle \mathcal{F}(\bar{K})=\varinjlim\mathcal{F}(L)

We digress slightly in order to explain what this means. Direct limits and inverse limits (the latter are also called projective limits) are ubiquitous in abstract algebra, algebraic geometry, and algebraic number theory, and are special cases of the notion of limits we have discussed in Even More Category Theory: The Elementary Topos.

A directed set I is an ordered set in which for every pair i,j there exists k such that i\leq k,j\leq k. A direct, resp. inverse system over I is a family \{A_{i},f_{ij}|i,j\in I,i\leq j\} of objects A_{i} and morphisms f_{ij}: A_{i}\rightarrow A_{j}, resp. f_{ij}: A_{j}\rightarrow A_{i} such that

\displaystyle f_{ii} is the identity map of A_{i}, and

\displaystyle f_{ik}=f_{jk}\circ f_{ij} resp. f_{ik}=f_{ij}\circ f_{jk}

for all i\leq j\leq k .

The direct limit of a direct system is then defined as the quotient

\displaystyle \varinjlim_{i\in I} A_{i}=\coprod_{i\in I} A_{i}/\sim

where two elements x_{i}\in A_{i} and x_{j}\in A_{j} are considered equivalent, x_{i}\sim x_{j} if there exists k such that f_{ik}(x_{i})=f_{jk}(x_{j}).

Meanwhile, the inverse limit of an inverse system is the subset

\displaystyle \varprojlim_{i\in I} A_{i}=\{(x_{i})_{i\in I}\in \prod_{i\in I}A_{i}|f_{ij}(x_{j})=x_{i}\text{ for }i\leq j \}

of the product \displaystyle \prod_{i\in I}A_{i}.
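As a concrete illustration of the inverse limit, consider the inverse system A_{n}=\mathbb{Z}/p^{n}\mathbb{Z} for a prime p, with the maps given by reduction modulo the smaller power of p; its inverse limit is the ring of p-adic integers. The sketch below only looks at finitely many levels of this system, and the function names are hypothetical ones chosen for this example.

```python
def truncations(x, p, depth):
    """Return the family (x mod p, x mod p^2, ..., x mod p^depth)."""
    return [x % p ** n for n in range(1, depth + 1)]


def is_compatible(seq, p):
    """Check the inverse limit condition f_ij(x_j) = x_i for all i <= j.

    Here seq[i] is an element of Z/p^(i+1), and f_ij is reduction
    modulo p^(i+1).
    """
    return all(
        seq[j] % p ** (i + 1) == seq[i]
        for j in range(len(seq))
        for i in range(j + 1)
    )


if __name__ == "__main__":
    family = truncations(-1, 5, 4)
    print(family, is_compatible(family, 5))
    # Changing one entry breaks the compatibility condition.
    print(is_compatible([4, 23, 124, 624], 5))
```

Every integer gives a compatible family in this way, but the inverse limit also contains compatible families that do not come from any single integer.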

The classical definition of the stalk of a sheaf \mathcal{F} can then also be expressed as the direct limit of the direct system given by the sets (or abelian groups, or modules, etc.) \mathcal{F}(U) and the restriction maps \rho_{UV}: \mathcal{F}(U)\rightarrow \mathcal{F}(V) for open sets V\subseteq U. In our case, of course, instead of inclusion maps V\subseteq U we have more general maps induced by etale morphisms.

An example of an etale sheaf over \text{Spec }K is given by the following: Let

\displaystyle \mathcal{G}_{m}(B)=B^{*} where B^{*} is the multiplicative group of the etale K-algebra B.

In this case we have \mathcal{F}(\bar{K})=\bar{K}^{*}, the multiplicative group of the separable closure \bar{K} of K. We note that the multiplicative group of a field F is just the group F-\{0\}, with the law of composition given by multiplication.

In order to make contact with the theory of Galois groups, we now define the concept of G-modules, where G is a group. A left G-module is given by an abelian group M and a map \rho: G\times M\rightarrow M such that

\displaystyle \rho(e,a)=a,

\displaystyle \rho (gh,a)=\rho(g,\rho(h,a)),


\displaystyle \rho(g,(ab))=\rho(g,a)\rho(g,b).

Instead of \rho(g,a) we usually just write g\cdot a. A right G-module may be similarly defined, and may be obtained from a left G-module by defining a\cdot g=g^{-1}\cdot a.
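As a small sanity check of these conditions, the sketch below takes G to be the two-element group consisting of the identity and complex conjugation, acting on nonzero elements of \mathbb{Q}(i) under multiplication. The conditions checked are that the identity acts trivially, that the action is compatible with the group law, and that it respects the product in M. The choice of group, the handful of sample elements, and the function names are assumptions made purely for illustration; checking a few samples is of course not a proof.

```python
from itertools import product

G = ("e", "c")  # "e" is the identity, "c" is complex conjugation


def compose(g, h):
    """Group law of the two-element group G."""
    return "e" if g == h else "c"


def act(g, a):
    """The action rho(g, a) on a nonzero element a of Q(i)."""
    return a if g == "e" else a.conjugate()


def check_module_axioms(samples):
    """Check the three G-module conditions on the given sample elements."""
    # The identity element acts trivially.
    if any(act("e", a) != a for a in samples):
        return False
    # rho(gh, a) = rho(g, rho(h, a)).
    for g, h, a in product(G, G, samples):
        if act(compose(g, h), a) != act(g, act(h, a)):
            return False
    # rho(g, ab) = rho(g, a) rho(g, b).
    for g, a, b in product(G, samples, samples):
        if act(g, a * b) != act(g, a) * act(g, b):
            return False
    return True


def fixed_elements(samples):
    """Elements a with g.a = a for every g in G."""
    return [a for a in samples if all(act(g, a) == a for g in G)]


SAMPLES = [complex(1, 2), complex(3, -1), complex(0, 1), complex(-2, 0)]

if __name__ == "__main__":
    print(check_module_axioms(SAMPLES))
    print(fixed_elements(SAMPLES))
```

The elements fixed by G in this example are exactly the samples with zero imaginary part, that is, the ones lying in \mathbb{Q}.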

The abelian group \mathcal{F}(\bar{K}) has the structure of a G-module, where G is the Galois group \text{Gal}(\bar{K}/K) (also written as G(\bar{K}/K)), the group of field automorphisms of \bar{K} that keep K fixed.

We see now that there is a connection between Galois theory and etale sheaves over a field. More generally, there is a connection between the etale cohomology of a field and “Galois cohomology”, an important part of algebraic number theory that we now define. Galois cohomology is the derived functor (see More on Chain Complexes and The Hom and Tensor Functors) of the fixed module functor.

First we construct the standard resolution of the Galois module (a G-module where G is the Galois group of some field extension) A. It is given by X^{n}(G,A), the abelian group of all functions from the direct product G^{n+1} to A, and the coboundary map

\displaystyle \partial^{n}: X^{n-1}\rightarrow X^{n}

given by

\displaystyle \partial^{n}x(\sigma_{0},...,\sigma_{n})=\sum_{i=0}^{n}(-1)^{i}x(\sigma_{0},...,\hat{\sigma_{i}},...,\sigma_{n})

where \hat{\sigma_{i}} signifies that \sigma_{i} is to be omitted.
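For these groups and maps to form a cochain complex, applying the coboundary map twice must give zero. The sketch below checks this numerically for one arbitrarily chosen function, taking G to be a set with three elements and A to be the integers. Only the alternating-sum formula is used, so the group structure of G does not enter. The sample function and all names are assumptions made for illustration.

```python
from itertools import product


def coboundary(x, n):
    """Return the function on (n+1)-tuples obtained from x on n-tuples.

    This is the alternating sum over omitting one argument at a time,
    i.e. the map from X^(n-1) to X^n.
    """
    def dx(*sigmas):
        if len(sigmas) != n + 1:
            raise ValueError("expected %d arguments" % (n + 1))
        return sum(
            (-1) ** i * x(*(sigmas[:i] + sigmas[i + 1:]))
            for i in range(n + 1)
        )
    return dx


G = (0, 1, 2)  # labels for the elements of a three-element group


def x(s0, s1):
    """An arbitrary sample element of X^1: a function from G^2 to Z."""
    return s0 + 2 * s1 ** 2


d2x = coboundary(x, 2)      # element of X^2, takes three arguments
d3d2x = coboundary(d2x, 3)  # element of X^3, takes four arguments


def squares_to_zero():
    """Check that the double coboundary vanishes on all of G^4."""
    return all(d3d2x(*s) == 0 for s in product(G, repeat=4))


if __name__ == "__main__":
    print(squares_to_zero())
```

The cancellation happens because each term with two arguments omitted appears twice in the double sum, with opposite signs.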

We now apply the fixed module functor to obtain the cochain complex

\displaystyle C^{n}(G,A)=X^{n}(G,A)^{G}.

The elements of C^{n}(G,A) are the functions x: G^{n+1}\rightarrow A such that

\displaystyle x(\sigma\sigma_{0},...,\sigma\sigma_{n})=\sigma x(\sigma_{0},...,\sigma_{n})

for all \sigma\in G.

The Galois cohomology groups H^{n}(G,A) are then obtained by taking the cohomology of this cochain complex, i.e.

\displaystyle H^{n}(G,A)=\text{Ker }\partial^{n+1}/\text{Im }\partial^{n}

Note: We have adopted here the notation of the book Cohomology of Number Fields by Jurgen Neukirch, Alexander Schmidt, and Kay Wingberg. Some references use a different notation; for instance X^{n} may be defined as the abelian group of functions from G^{n} to A instead of from G^{n+1} to A. This results in different notation for the cochain complexes and their boundary operators; however, the Galois cohomology groups themselves will remain the same.

It is a basic result of Galois cohomology that H^{0}(G,A) gives A^{G}, the subset of A such that \sigma\cdot a=a for all \sigma\in G. In other words, A^{G} is the subset of A that is fixed by G.

We have the following connection between Etale cohomology for fields and Galois cohomology:

\displaystyle H^{n}(K,\mathcal{F})=H^{n}(G,\mathcal{F}(\bar{K}))

We now mention some other basic results of the theory. In analogy with sheaf cohomology, the group H^{0}(K,\mathcal{F}) is just the set of “global sections” \Gamma(K,\mathcal{F})=\mathcal{F}(K) of \mathcal{F}. Letting \mathcal{F}=\mathcal{G}_{m} which we have defined earlier, we have

\displaystyle H^{0}(K,\mathcal{G}_{m})=\mathcal{G}_{m}(K)=K^{*}

In the language of Galois cohomology,

\displaystyle H^{0}(G,\mathcal{G}_{m}(\bar{K}))=(\bar{K}^{*})^{G}=K^{*}

Meanwhile, for H^{1}, we have the following result, called Hilbert’s Theorem 90:

\displaystyle H^{1}(K,\mathcal{G}_{m})=H^{1}(G,\bar{K}^{*})=\{1\}.

The group H^{2}(K,\mathcal{G}_{m})=H^{2}(G,\bar{K}^{*}) is called the Brauer group and also plays an important part in algebraic number theory. The etale cohomology of fields, or equivalently, Galois cohomology, is the topic of famous problems in modern mathematics such as the Milnor conjecture and its generalization, the Bloch-Kato conjecture, which was solved by Vladimir Voevodsky in 2009. It also plays an important part in the etale cohomology of more general rings.


Etale Cohomology on Wikipedia

Stalk on Wikipedia

Direct Limit on Wikipedia

Inverse Limit on Wikipedia

Hilbert’s Theorem 90 on Wikipedia

Group Cohomology on Wikipedia

Galois Cohomology on Wikipedia

Milnor Conjecture on Wikipedia

Norm Residue Isomorphism Theorem

Etale Cohomology and the Weil Conjecture by Eberhard Freitag and Reinhardt Kiehl

Cohomology of Number Fields by Jurgen Neukirch, Alexander Schmidt, and Kay Wingberg

Cohomology in Algebraic Geometry

In Homology and Cohomology we discussed cohomology as used to study topological spaces. In this post we study cohomology in the context of algebraic geometry. We will need the concepts we have discussed in Presheaves, Sheaves, More on Chain Complexes, and The Hom and Tensor Functors.

Sheaf cohomology is simply the derived functor (see More on Chain Complexes and The Hom and Tensor Functors) of the global section functor \Gamma (X,-), which assigns to a sheaf \mathcal{F} its set of global sections \Gamma (X,\mathcal{F})=\mathcal{F}(X).

One thing to note here, in constructing our resolutions, is that we are dealing with sheaves of modules, instead of just modules. The morphisms of sheaves of modules on a topological space X are defined as homomorphisms of modules \mathcal{F}(U)\rightarrow \mathcal{G}(U) for every open set U of X.

Since our definition is quite abstract, we also discuss here Cech cohomology, which is more concrete. Let X be a topological space, and let \mathfrak{U}=(U_{i})_{i\in I} be an open covering of X. We let

\displaystyle C^{p}(\mathfrak{U},\mathcal{F})=\prod_{i_{0}<...<i_{p}}\mathcal{F}(U_{i_{0},...,i_{p}})

where

\displaystyle U_{i_{0},...,i_{p}}=U_{i_{0}}\cap ...\cap U_{i_{p}}.

The coboundary maps d^{p}:C^{p}\rightarrow C^{p+1} are given by

\displaystyle (d^{p}\alpha)_{i_{0},...,i_{p+1}}=\sum_{k=0}^{p+1}(-1)^{k}\alpha_{i_{0},...,\hat{i_{k}},...,i_{p+1}}|_{U_{i_{0},...,i_{p+1}}}

where \hat{i_{k}} means that the index i_{k} is to be omitted. The Cech cohomology is then given by the cohomology of this complex.
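To get a feel for this definition, here is a toy computation that is topological rather than algebro-geometric. We assume a circle covered by three open arcs whose pairwise intersections are nonempty and connected and whose triple intersection is empty, and we take the constant sheaf with values in \mathbb{Q}, so that every \mathcal{F}(U_{i}) and \mathcal{F}(U_{i,j}) is a copy of \mathbb{Q}. The cover, the sheaf, and the function names are all assumptions made for this sketch.

```python
from fractions import Fraction
from itertools import combinations


def rank(matrix):
    """Rank of a matrix over the rationals, by Gaussian elimination."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for c in range(ncols):
        pivot = next(
            (i for i in range(r, len(rows)) if rows[i][c] != 0), None
        )
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def d0_matrix(n_sets):
    """Matrix of d^0 from C^0 to C^1: (d^0 a)_{ij} = a_j - a_i for i < j.

    Assumes every pairwise intersection is nonempty and connected.
    """
    matrix = []
    for i, j in combinations(range(n_sets), 2):
        row = [0] * n_sets
        row[j] += 1
        row[i] -= 1
        matrix.append(row)
    return matrix


def cech_dimensions_circle():
    """Dimensions of H^0 and H^1 for the assumed three-arc cover."""
    rank_d0 = rank(d0_matrix(3))
    dim_c0, dim_c1 = 3, 3
    h0 = dim_c0 - rank_d0   # kernel of d^0
    h1 = dim_c1 - rank_d0   # d^1 = 0 because C^2 = 0 for this cover
    return h0, h1


if __name__ == "__main__":
    print(cech_dimensions_circle())
```

Both dimensions come out to 1, matching the one connected component and the one "hole" of the circle.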

The Cech cohomology is equivalent to the sheaf cohomology, if the sheaf is quasi-coherent (see More on Sheaves). The injective resolution of the sheaf \mathcal{F} is given by a “sheafified” version of the chain complex we constructed earlier. For an open subset V of X, let f:V\rightarrow X be the inclusion map. We define

\displaystyle \mathcal{C}^{p}(\mathfrak{U},\mathcal{F})=\prod_{i_{0}<...<i_{p}}f_{*}(\mathcal{F}|_{U_{i_{0},...,i_{p}}})

with the same definition for the coboundary map as earlier. Then

\displaystyle \Gamma(X,\mathcal{C}^{p}(\mathfrak{U},\mathcal{F}))=C^{p}(\mathfrak{U},\mathcal{F})

from which it can be seen that the Cech cohomology is indeed the derived functor of the global section functor \Gamma (X,-).

Up to now, for our topological spaces we have always used the Zariski topology. We now introduce another kind of topology called the Etale topology. The Etale topology is not a topology in the sense of Basics of Topology and Continuous Functions, but a Grothendieck topology, which we have discussed in More Category Theory: The Grothendieck Topos. Our underlying category will be written \text{Et}/X, and its objects will be etale morphisms (to be explained later) h:U\rightarrow X, and its morphisms will be etale morphisms f:U\rightarrow U' such that if g:U'\rightarrow X then g\circ f=h.

An etale morphism is a morphism of schemes that is both flat (see The Hom and Tensor Functors) and unramified. A morphism f:Y\rightarrow X is said to be unramified if for all points y of Y the morphism \mathcal{O}_{X,f(y)}\rightarrow\mathcal{O}_{Y,y} of local rings (see Localization) has the property that \mathfrak{m}_{Y,y}=\mathfrak{m}_{X,f(y)}\cdot \mathcal{O}_{Y,y} and the residue field \mathcal{O}_{Y,y}/\mathfrak{m}_{Y,y} is a finite separable field extension of \mathcal{O}_{X,f(y)}/\mathfrak{m}_{X,f(y)}.

The concept of field extensions was discussed in Algebraic Numbers. We have explored in that same post how field extensions F\subset K may be “generated” by the roots of polynomials with coefficients in F. The field extension is called separable if the aforementioned polynomial (called the minimal polynomial) has distinct roots.

An unramified morphism f:Y\rightarrow X is also required to be locally of finite type, which means that for every open subset \text{Spec }A (we recall that in Localization we have updated our definition of schemes to mean something that “locally” looks like our “old” definition of schemes – the mathematical objects referred to by this “old” definition will henceforth be referred to as affine schemes) of X and every open subset \text{Spec }B of f^{-1}(\text{Spec }A) the induced morphism A\rightarrow B makes B into a “finitely generated” A-algebra.

Using the etale topology to define the sheaves to be used in cohomology results in etale cohomology, the original driving force for the development of the concept of the Grothendieck topos. Hopefully we will be able to flesh out more of this interesting theory in future posts.


Sheaf Cohomology on Wikipedia

Cech Cohomology on Wikipedia

Etale Morphism on Wikipedia

Etale Cohomology on Wikipedia

Algebraic Geometry by Andreas Gathmann

The Rising Sea: Foundations of Algebraic Geometry by Ravi Vakil

Lectures on Etale Cohomology by J.S. Milne

Algebraic Geometry by Robin Hartshorne

More Quantum Mechanics: Wavefunctions and Operators

In Some Basics of Quantum Mechanics, we explained the role of vector spaces (which we first discussed in Vector Spaces, Modules, and Linear Algebra) in quantum mechanics. Linear transformations, which are functions between vector spaces, would naturally be expected to also play an important role in quantum mechanics. In particular, we would like to focus on the linear transformations from a vector space to itself. In this context, they are also referred to as linear operators.

But first, we explore a little bit more the role of infinite-dimensional vector spaces in quantum mechanics. In Some Basics of Quantum Mechanics, we limited our discussion to “two-state” systems, which are also referred to as “qubits”. We can imagine a system with more “classical states”. For example, consider a row of seats in a movie theater. One can sit in the leftmost chair, the second chair from the left, the third chair from the left, and so on. But if it was a quantum system, one can sit in all chairs simultaneously, at least until one is “measured”, in which case one will be found sitting in one seat only, and the probability of being found in a certain seat is the “absolute square” of the probability amplitude, which is the coefficient of the component of the “state vector” corresponding to that seat.

The number of “classical states” of the system previously discussed is the number of chairs in the row. But if we consider, for example, just “space”, and a system composed of a single particle in this space, and whose classical state is specified by the position of the particle, the number of states of the system is infinite, even if we only consider one dimension. It can be here, there, a meter from here, 0.1 meters from here, and so on. Even if the particle is constrained on, say, a one meter interval, there is still an infinite number of positions it could be in, since there are an infinite number of numbers between 0 and 1. Hence the need for infinite-dimensional vector spaces.

As we have explained in Eigenvalues and Eigenvectors, sets of functions can provide us with an example of an infinite-dimensional vector space. We elaborate more on why functions would do well to describe a quantum system like the one we have described above. Let’s say for example that the particle is constrained to be on an interval from 0 to 1. For every point on the interval, there is a corresponding value of the probability amplitude. This is exactly the definition of a function from the interval [0,1] to the set of complex numbers \mathbb{C}. We would also have to normalize later on, although the definition of normalization for infinite-dimensional vector spaces is kind of different, involving the notion of an integral (see An Intuitive Introduction to Calculus). For that matter, the absolute square of the probability amplitude is not the probability, but the probability density.
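To illustrate normalization by an integral, the sketch below takes one particular function on the interval [0,1], namely \psi(x)=\sqrt{2}\sin(\pi x), and checks numerically that the integral of its absolute square over the whole interval is 1. The choice of function, the simple midpoint rule used for the integral, and the function names are assumptions made for this example.

```python
import math


def psi(x):
    """A sample wave function on the interval [0, 1]."""
    return math.sqrt(2.0) * math.sin(math.pi * x)


def integrate(f, a, b, n=10000):
    """Approximate the integral of f from a to b with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


def probability(a, b):
    """Probability of finding the particle between a and b.

    This is the integral of the probability density |psi(x)|^2.
    """
    return integrate(lambda x: abs(psi(x)) ** 2, a, b)


if __name__ == "__main__":
    print(probability(0.0, 1.0))   # total probability
    print(probability(0.0, 0.5))   # probability of the left half
```

The density at a single point is not itself a probability; only its integral over an interval is.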

The function that we have described is called the wave function. It is also a vector, an element of an infinite-dimensional vector space. It is most often written using the symbol \psi(x), reflecting its nature as a function. However, since it is also a vector, we can also still use Dirac notation and write it as |\psi\rangle. The wave function is responsible for the so-called wave-particle duality of quantum mechanics, as demonstrated in the famous double-slit experiment.

We have noted in My Favorite Equation in Physics that in classical mechanics the state of a one-particle system is given by the position and momentum of that particle, while in quantum mechanics the wave function is enough. How can this be, since the wave function only contains information about the position? Well, actually the wave function also contains information about the momentum – this is because of the so-called de Broglie relations, which relate the momentum of a particle in quantum mechanics to its wavelength as a wave.

Actually, the wave function is a function, and does not always have to look like what we normally think of as a wave. But whatever the shape of the wave function, even if it does not look like a wave, it is always a combination of different waves. This statement is part of the branch of mathematics called Fourier analysis. The wavelengths of the different waves are related to the momentum of the corresponding particle, and we should note that like the position, they are also in quantum superposition.

There is one thing to note about this. Suppose our wave function is really a wave (technically we mean a sinusoidal wave). This wave gives us information about where we are likely to find the particle if we make a measurement – it is near the “peaks” and the “troughs” of the wave. But there are many “peaks” and “troughs” in the wave, and so it is difficult to determine where the particle will be when we measure it. On the other hand, since the wave function is composed of only one wave, we can easily determine what the momentum is.

We can also put several different waves together, resulting in a function that is “peaked” only at one place. This means there is only one place where the particle is most likely to be. But since we have combined different waves together, there will not be a single wavelength, hence, the momentum cannot be determined easily! To summarize, if we know more about the position, we know less about the momentum – and if we know more about the momentum, we know less about the position. This observation leads to the very famous Heisenberg uncertainty principle.

The many technicalities of the wave function we leave to the references for now, and proceed to the role of linear transformations, or linear operators, in quantum mechanics. We have already encountered one special role of certain kinds of linear transformations in Eigenvalues and Eigenvectors. Observables are represented by self-adjoint operators. A self-adjoint operator A is a linear operator that satisfies the condition

\displaystyle \langle Au|v\rangle=\langle u|Av\rangle

for a vector |v\rangle and linear functional \langle u| corresponding to the vector |u\rangle. The notation |Av\rangle refers to the image of the vector |v\rangle under the linear transformation A, while \langle Au| refers to the linear functional corresponding to the vector |Au\rangle, which is the image of |u\rangle under A. The role of linear functionals in quantum mechanics was discussed in Some Basics of Quantum Mechanics.

There is, for example, an operator corresponding to the position, another corresponding to the momentum, another corresponding to the energy, and so on. If we measure any of these observables for a certain quantum system in the state |\psi\rangle, we are certain to obtain one of the eigenvalues of that observable, with the probability of obtaining the eigenvalue \lambda_{n} given by

\displaystyle |\langle \psi_{n}|\psi\rangle|^{2}

where \langle \psi_{n}| is the linear functional corresponding to the vector |\psi_{n}\rangle , which is the eigenvector corresponding to the eigenvalue \lambda_{n}. For systems like our particle in space, whose states form an infinite-dimensional vector space, the quantity above gives the probability density instead of the probability. After measurement, the state of the system “collapses” to the state given by the vector |\psi_{n}\rangle.

Another very important kind of linear operator in quantum mechanics is a unitary operator. Unitary operators are kind of like the orthogonal matrices that represent rotations (see Rotating and Reflecting Vectors Using Matrices); in fact an orthogonal matrix is a special kind of unitary operator. We note that the orthogonal matrices had the special property that they preserved the “magnitude” of vectors; unitary operators are the same, except that they are more general, since the coefficients of vectors (the scalars) in this context are complex.

More technically, a unitary operator is a linear operator U that satisfies the following condition:

\displaystyle \langle u|v\rangle=\langle Uu|Uv\rangle

with the same conventions as earlier. What this means is that the probability of finding the system in the state given by the vector |u\rangle after measurement, given that it was in the state |v\rangle before measurement remains the same if we rotate the system – or perform other “operations” represented by unitary operators such as letting time pass (time evolution), or “translating” the system to a different location.
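The defining condition can be checked directly for a small example. The sketch below uses one particular 2\times 2 unitary matrix and two arbitrary vectors with complex coefficients; the matrix, the vectors, and the function names are assumptions chosen for illustration.

```python
import math


def inner(u, v):
    """The inner product <u|v>: conjugate the coefficients of u."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def apply(matrix, v):
    """Apply a matrix (given as a list of rows) to the vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in matrix]


S = 1.0 / math.sqrt(2.0)
# A sample unitary matrix: its columns are orthonormal.
U = [[S, S * 1j],
     [S * 1j, S]]

u = [1 + 0j, 2 - 1j]
v = [0.5j, 1 + 1j]

if __name__ == "__main__":
    print(inner(u, v))
    print(inner(apply(U, u), apply(U, v)))
```

Up to floating point rounding, the two printed values agree, and in particular their absolute squares agree.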

So now we know that in quantum mechanics observables correspond to self-adjoint operators, and the “operations” of rotation, translation, and time evolution correspond to unitary operators. We might as well give a passing mention to one of the most beautiful laws of physics, Noether’s theorem, which states that the familiar “conservation laws” of physics (conservation of linear momentum, conservation of angular momentum, and conservation of energy) arise because the laws of physics do not change with translation, rotation, or time evolution. So Noether’s theorem in some way connects some of our “observables” and our “operations”.

We now revisit one of the “guiding questions” of physics, which we stated in My Favorite Equation in Physics:

“Given the state of a system at a particular time, in what state will it be at some other time?”

For classical mechanics, we can obtain the answer by solving the differential equation F=ma (Newton’s second law of motion). In quantum mechanics, we have instead the Schrodinger equation, which is the “F=ma” of the quantum realm. The Schrodinger equation can be written in the form

\displaystyle i\hbar\frac{d}{dt}|\psi(t)\rangle=H|\psi(t)\rangle

where i=\sqrt{-1} as usual, \hbar is a constant called the reduced Planck’s constant (its value is around 1.054571800\times 10^{-34} Joule-seconds), and H is a linear operator called the Hamiltonian. The Hamiltonian is a self-adjoint operator and in many cases corresponds to the energy observable. In the case where the Hamiltonian is time-independent, this differential equation can be solved directly to obtain the equation

\displaystyle |\psi(t)\rangle=e^{-\frac{i}{\hbar}Ht}|\psi(0)\rangle.

Since H is a linear operator, e^{-\frac{i}{\hbar}Ht} is also a linear operator (actually a unitary operator) and is the explicit form of the time evolution operator. For a Hamiltonian with time-dependence, one must use other methods to obtain the time evolution operator, such as making use of the so-called interaction picture or Dirac picture. But in any case, it is the Schrodinger equation, and the time evolution operator we can obtain from it, that provides us with the answer to the “guiding question” we asked above.
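As a sketch of the time-independent case, consider a two-state system whose Hamiltonian is E times the matrix with 0 on the diagonal and 1 off the diagonal. Since the square of that matrix is the identity, the exponential can be written out in closed form as \cos(Et/\hbar) times the identity minus i\sin(Et/\hbar) times the matrix. The choice of Hamiltonian, the use of units where \hbar=1 by default, and the function names are assumptions made for this example.

```python
import math


def time_evolve(state, energy, t, hbar=1.0):
    """Apply exp(-i H t / hbar) for the assumed H = energy * [[0, 1], [1, 0]].

    Uses exp(-i theta X) = cos(theta) I - i sin(theta) X, valid because
    X squared is the identity.
    """
    theta = energy * t / hbar
    c, s = math.cos(theta), math.sin(theta)
    a, b = state
    return (c * a - 1j * s * b, -1j * s * a + c * b)


def probabilities(state):
    """Absolute squares of the two probability amplitudes."""
    return tuple(abs(amplitude) ** 2 for amplitude in state)


if __name__ == "__main__":
    start = (1 + 0j, 0j)  # the system starts in the first state
    for t in (0.0, math.pi / 4, math.pi / 2):
        print(t, probabilities(time_evolve(start, 1.0, t)))
```

The two probabilities always add up to 1, as expected for a unitary operator, while the system oscillates between the two states.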


Wave Function on Wikipedia

Matter Wave on Wikipedia

Uncertainty Principle on Wikipedia

Self-Adjoint Operator on Wikipedia

Unitary Operator on Wikipedia

Noether’s Theorem on Wikipedia

Schrodinger Equation on Wikipedia

Introduction to Quantum Mechanics by David J. Griffiths

Modern Quantum Mechanics by Jun John Sakurai

Quantum Mechanics by Eugen Merzbacher

Eigenvalues and Eigenvectors

Given a vector (see Vector Spaces, Modules, and Linear Algebra), we have seen that one of the things we can do to it is to “scale” it (in fact, it is one of the defining properties of a vector). We can also use a matrix (see Matrices) to scale vectors. Consider, for example, the matrix

\displaystyle \left(\begin{array}{cc}2&0\\ 0&2\end{array}\right).

Applying this matrix to any vector “doubles” the magnitude of the vector:

\displaystyle \left(\begin{array}{cc}2&0\\ 0&2\end{array}\right)\left(\begin{array}{c}1\\ 0\end{array}\right)=\left(\begin{array}{c}2\\ 0\end{array}\right)=2\left(\begin{array}{c}1\\ 0\end{array}\right)

\displaystyle \left(\begin{array}{cc}2&0\\ 0&2\end{array}\right)\left(\begin{array}{c}0\\ 5\end{array}\right)=\left(\begin{array}{c}0\\ 10\end{array}\right)=2\left(\begin{array}{c}0\\ 5\end{array}\right)

\displaystyle \left(\begin{array}{cc}2&0\\ 0&2\end{array}\right)\left(\begin{array}{c}-2\\ 3\end{array}\right)=\left(\begin{array}{c}-4\\ 6\end{array}\right)=2\left(\begin{array}{c}-2\\ 3\end{array}\right)

This is applicable to any vector except, of course, the zero vector, which cannot be scaled and is therefore excluded in our discussion in this post.

The interesting case, however, is when the matrix “scales” only a few special vectors. Consider for example, the matrix

\displaystyle \left(\begin{array}{cc}2&1\\ 1&2\end{array}\right).

Applying it to the vector

\displaystyle \left(\begin{array}{c}1\\ 0\end{array}\right)

gives us

\displaystyle \left(\begin{array}{cc}2&1\\ 1&2\end{array}\right) \left(\begin{array}{c}1\\ 0\end{array}\right)=\left(\begin{array}{c}2\\ 1\end{array}\right).

This is, of course, not an example of “scaling”. However, for the vector

\displaystyle \left(\begin{array}{c}1\\ 1\end{array}\right)

we get

\displaystyle \left(\begin{array}{cc}2&1\\ 1&2\end{array}\right) \left(\begin{array}{c}1\\ 1\end{array}\right)=\left(\begin{array}{c}3\\ 3\end{array}\right).

This is a scaling, since

\left(\begin{array}{c}3\\ 3\end{array}\right)=3\left(\begin{array}{c}1\\ 1\end{array}\right).

The same holds true for the vector

\displaystyle \left(\begin{array}{c}-1\\ 1\end{array}\right)

from which we obtain

\displaystyle \left(\begin{array}{cc}2&1\\ 1&2\end{array}\right) \left(\begin{array}{c}-1\\ 1\end{array}\right)=\left(\begin{array}{c}-1\\ 1\end{array}\right)

which is also a “scaling”, this time by a factor of 1. Finally, this also holds true for scalar multiples of the two vectors we have enumerated. These vectors, the only “special” ones that are scaled by our linear transformation (represented by our matrix), are called the eigenvectors of the linear transformation, and the factors by which they are scaled are called the corresponding eigenvalues.
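The computations above can be checked with a few lines of code. The following is a small sketch for 2\times 2 matrices only; it uses the standard fact that the eigenvalues are the roots of \lambda^{2}-(\text{trace})\lambda+(\text{determinant})=0, and it assumes that these roots are real. The function names are our own.

```python
import math


def eigenvalues_2x2(m):
    """Eigenvalues of a 2x2 matrix, assuming both of them are real."""
    (a, b), (c, d) = m
    trace, det = a + d, a * d - b * c
    disc = math.sqrt(trace * trace - 4 * det)
    return (trace + disc) / 2, (trace - disc) / 2


def apply(m, v):
    """Apply the 2x2 matrix m to the vector v."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]


M = [[2, 1], [1, 2]]
print(eigenvalues_2x2(M))  # (3.0, 1.0)
print(apply(M, [1, 1]))    # [3, 3], which is 3 times [1, 1]
print(apply(M, [-1, 1]))   # [-1, 1], which is 1 times [-1, 1]
print(apply(M, [1, 0]))    # [2, 1], not a multiple of [1, 0]
```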

So far we have focused on finite-dimensional vector spaces, which give us a lot of convenience; for instance, we can express finite-dimensional vectors as column matrices. But there are also infinite-dimensional vector spaces; recall that the conditions for a set to be a vector space are that its elements can be added or subtracted, and scaled. An example of an infinite-dimensional vector space is the set of all continuous real-valued functions of the real numbers (with the real numbers serving as the field of scalars).

Given two continuous real-valued functions of the real numbers f and g, the functions f+g and f-g are also continuous real-valued functions of the real numbers, and the same is true for af, for any real number a. Thus we can see that the set of continuous real-valued functions of the real numbers forms a vector space.

Matrices are not usually used to express linear transformations when it comes to infinite-dimensional vector spaces, but we still retain the concept of eigenvalues and eigenvectors. Note that a linear transformation is a function f from a vector space to another (possibly itself) which satisfies the conditions f(u+v)=f(u)+f(v) and f(av)=af(v).

Since our vector spaces in the infinite-dimensional case may be composed of functions, we may think of linear transformations as “functions from functions to functions” that satisfy the conditions earlier stated.

Consider the “operation” of taking the derivative (see An Intuitive Introduction to Calculus). The rules of calculus concerning derivatives (which can be derived from the basic definition of the derivative) state that we must have

\displaystyle \frac{d(f+g)}{dx}=\frac{df}{dx}+\frac{dg}{dx}


\displaystyle \frac{d(af)}{dx}=a\frac{df}{dx}

where a is a constant. This holds true for “higher-order” derivatives as well. This means that the “derivative operator” \frac{d}{dx} is an example of a linear transformation from an infinite-dimensional vector space to another (note that the functions that comprise our vector space must be “differentiable”, and that the derivatives of our functions must possess the same defining properties we required for our vector space).

We now show an example of eigenvalues and eigenvectors in the context of infinite-dimensional vector spaces. Let our linear transformation be

\displaystyle \frac{d^{2}}{dx^{2}}

which stands for the “operation” of taking the second derivative with respect to x. We state again some of the rules of calculus pertaining to the derivatives of trigonometric functions (once again, they can be derived from the basic definitions, which is a fruitful exercise, or they can be looked up in tables):

\displaystyle \frac{d(\text{sin}(x))}{dx}=\text{cos}(x)

\displaystyle \frac{d(\text{cos}(x))}{dx}=-\text{sin}(x)

which means that

\displaystyle \frac{d^{2}(\text{sin}(x))}{dx^{2}}=\frac{d(\frac{d(\text{sin}(x))}{dx})}{dx}

\displaystyle \frac{d^{2}(\text{sin}(x))}{dx^{2}}=\frac{d(\text{cos}(x))}{dx}

\displaystyle \frac{d^{2}(\text{sin}(x))}{dx^{2}}=-\text{sin}(x)

We can see now that the function \text{sin}(x) is an eigenvector of the linear transformation \frac{d^{2}}{dx^{2}}, with eigenvalue equal to -1.
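We can also check this numerically. The sketch below approximates the second derivative by a central finite difference (an approximation chosen here for illustration, with a step size `h` picked by hand) and divides the result by the value of the function. For an eigenvector this ratio should be the same at every point, and equal to the eigenvalue.

```python
import math


def second_derivative(f, x, h=1e-4):
    """Central finite-difference approximation of the second derivative."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)


def eigenvalue_estimate(f, x):
    """Ratio f''(x) / f(x); constant in x if f is an eigenvector."""
    return second_derivative(f, x) / f(x)


for x in (0.3, 1.0, 2.5):
    # Both ratios should be close to -1 at each sample point.
    print(x, eigenvalue_estimate(math.sin, x), eigenvalue_estimate(math.cos, x))
```

The same check suggests that \text{cos}(x) is also an eigenvector with eigenvalue -1, in agreement with the derivative rules stated above.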

Eigenvalues and eigenvectors play many important roles in linear algebra (and its infinite-dimensional version, which is called functional analysis). We will mention here something we have left off of our discussion in Some Basics of Quantum Mechanics. In quantum mechanics, “observables”, like the position, momentum, or energy of a system, correspond to certain kinds of linear transformations whose eigenvalues are real numbers (note that our field of scalars in quantum mechanics is the field of complex numbers \mathbb{C}). These eigenvalues correspond to the only values that we can obtain after measurement; we cannot measure values that are not eigenvalues.


Eigenvalues and Eigenvectors on Wikipedia

Observable on Wikipedia

Linear Algebra Done Right by Sheldon Axler

Algebra by Michael Artin

Calculus by James Stewart

Introductory Functional Analysis with Applications by Erwin Kreyszig

Introduction to Quantum Mechanics by David J. Griffiths

“The Most Important Function in Mathematics”

The title is in quotation marks because it comes from the book Real and Complex Analysis by Walter Rudin. One should always be cautious around superlative statements of course, not just in mathematics but also in life, but in this case I think the wording is not without good reason. Rudin is referring to the function

\displaystyle e^{x}

which may be thought of as the constant

\displaystyle e=2.71828182845904523536...

raised to the power of the argument x. However, there is an even better definition, which is the one Rudin gives in the book:

\displaystyle e^{x}=1+x+\frac{x^{2}}{2}+\frac{x^{3}}{6}+\frac{x^{4}}{24}+\frac{x^{5}}{120}+...

written in more compact notation, this is

\displaystyle e^{x}=\sum_{n=0}^{\infty}\frac{x^{n}}{n!}

It can be shown that these two notions coincide. To emphasize its role as a function, e^{x} is also often written as \text{exp}(x). It is also known as the exponential function.
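As a quick sanity check that the two notions agree, we can add up the first few terms of the series and compare with the exponential function from a standard library. This is only a sketch: the number of terms (30) is an arbitrary choice that happens to be more than enough for small arguments.

```python
import math


def exp_series(x, terms=30):
    """Partial sum of the series: sum of x^n / n! for n = 0, ..., terms - 1."""
    total, term = 0.0, 1.0  # term holds x^n / n!, starting with n = 0
    for n in range(terms):
        total += term
        term *= x / (n + 1)
    return total


for x in (1.0, -2.5, 0.1):
    print(x, exp_series(x), math.exp(x))
```

For x=1 the partial sums approach the constant e itself.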

We now explore some properties of e^{x}. Let us start off with the case where x is a real number. The function e^{x} can be used to express any other function of the form

\displaystyle a^{x}

where a is some positive real constant not necessarily equal to e. By letting

y=x\text{ ln }a

where \text{ln }a is the logarithm of a to base e (also known as the natural logarithm of a), we will then have

\displaystyle a^{x}=e^{x\text{ ln }a}=e^{y}.

In other words, any function of the form a^{x} where a is some positive real constant can be expressed in the form e^{y} simply by “rescaling” the argument by a constant.

For x a real number, the function e^{x}, which, as we have seen, encompasses the other cases a^{x} where a is any positive real constant, is often used to express “growth” and “decay”. For instance, if we have a population A which doubles every year, then after x years we will have a population of 2^{x}A, which, using what we have discussed earlier, can also be expressed as Ae^{x\text{ ln }2}. If the population gets cut into half instead of doubling every year, we would then write it as Ae^{-x\text{ ln }2}.
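The population example can be written out as a short computation. In this sketch the function name `population` and the starting population of 100 are made up for illustration; the only thing being demonstrated is that a^{x} and e^{x\text{ ln }a} give the same numbers.

```python
import math


def population(initial, years, factor=2.0):
    """Population after `years`, if it is multiplied by `factor` every year."""
    return initial * math.exp(years * math.log(factor))


print(population(100, 3))       # doubling every year, close to 100 * 2**3
print(population(100, 3, 0.5))  # halving every year, close to 100 * 0.5**3
```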

But the truly amazing stuff happens when the argument of the exponential function is a complex number. Let us start with the case where it is purely imaginary. In this case we have the following very important equation known as Euler’s formula:

e^{ix}=\text{cos }x+i\text{ sin }x

which for x=\pi gives the result

\displaystyle e^{i\pi}+1=0

The second equation, known as Euler’s identity, is often referred to by many as the most beautiful equation in mathematics, as it displays five of the most important mathematical constants in one equation: e, \pi, i, 0, and 1.

For more general complex numbers with nonzero real and imaginary parts, we can use the rule for exponents

e^{x+iy}=e^{x}e^{iy}=e^{x}(\text{cos }y+i\text{ sin }y)

and treat them separately. The sine and cosine functions, aside from their original significance in trigonometry, are also used to represent oscillatory or periodic behavior. They are therefore useful in analyzing waves, which are a very important subject in both science and engineering. Equations such as the one above, combining growth and decay and oscillations, are used, for example, in designing shock absorbers in vehicles, which consist of a spring (which “fights back” the movement and oscillates) and a damper (which makes the movement “decay”).
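Both Euler's formula and the rule for general complex arguments are easy to test numerically. The sketch below uses Python's built-in complex numbers; the decay rate and the frequency in `damped` are arbitrary values chosen for illustration, not parameters of any real shock absorber.

```python
import cmath
import math


def euler_rhs(x):
    """Right-hand side of Euler's formula: cos x + i sin x."""
    return complex(math.cos(x), math.sin(x))


def damped(t, decay=0.5, omega=3.0):
    """e^{(-decay + i omega) t}: a decaying amplitude times an oscillation."""
    return cmath.exp(complex(-decay, omega) * t)


# Euler's formula at a sample point, and the special case x = pi.
print(cmath.exp(1j * 0.7), euler_rhs(0.7))
print(abs(cmath.exp(1j * math.pi) + 1))  # zero up to rounding error

# The real part is a cosine wave whose amplitude decays over time.
for t in (0.0, 1.0, 2.0):
    print(t, damped(t).real, math.exp(-0.5 * t) * math.cos(3.0 * t))
```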

There are certain technicalities regarding “multi-valuedness” that one must be wary of when dealing with complex arguments, but we will not discuss them for the time being (references are provided at the end of the post). Instead we will discuss a couple more properties of the exponential function.

First, we have the following expression for the exponential function as a product:

\displaystyle e^{x}\approx \bigg(1+\frac{x}{n}\bigg)^{n} where n is very big (which also means that \frac{x}{n} is very small)

Using the language of limits in calculus, we can actually write

\displaystyle e^{x}=\lim_{n\to\infty} \bigg(1+\frac{x}{n}\bigg)^{n}

Historically, this is the motivation for the development of the exponential function, and the constant e. Suppose that somewhere there is this greedy loan shark who loans someone an amount of one million dollars, at a rate of 100% interest per year. So our loan shark finds out that he can make more money by “compounding” the interest at the middle of the year, computing for 50% interest and adding it to the money owed to him, and then computing 50% of that amount again at the end of the year (reasoning that “technically” it is still 100% interest per year). So instead of one million dollars, he would be owed 1.5 million dollars by the second half of the year, and by the end of the year he would be owed 1.5 million dollars plus half of that, which is 0.75 million dollars, which makes for a total of 2.25 million dollars, much bigger than the 2 million he would have been owed without “compounding.”

So the greedy loan shark computes further and discovers that he can make even more money by compounding further; perhaps he can compound every quarter, computing 25% interest after the first three months and adding it to the amount owed to him, then computing 25% interest again after another three months, and so on. He could even compound a hundred times in the year, with 1% added every time he compounds the interest. So in his infinite greediness, let’s say our loan shark compounds an infinite number of times, in infinitely small amounts. What would be the amount owed to him at the end of the year?

It turns out, no matter how many times he compounds the interest, the money owed to him will never be greater than $2,718,281.83, or roughly “e” million dollars, although it will approach that amount if he compounds it enough times. This quantity can be computed using the techniques of calculus, and in equation form it is exactly the expression for e^{x} as a product that we have written above.
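The loan shark's computation can be reproduced in a few lines. This is a sketch of the story above, with the function name `amount_owed` invented for the purpose: the yearly rate is split evenly over the compounding periods, and the amount is multiplied by the same factor each period.

```python
import math


def amount_owed(principal, periods, rate=1.0):
    """Amount owed after one year, compounding `periods` times at a yearly rate."""
    return principal * (1 + rate / periods) ** periods


for periods in (1, 2, 4, 100, 10**6):
    print(periods, amount_owed(1_000_000, periods))

print(math.e * 1_000_000)  # the ceiling that the amounts approach
```

Compounding once gives 2 million, compounding twice gives 2.25 million, and the amounts keep growing with the number of periods while staying below e million dollars.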

Finally, we give an important property of e^{x} once again related to calculus, in particular to differential equations. We have discussed the notion of a derivative in An Intuitive Introduction to Calculus. An important property of the exponential function e^{x} is that its derivative is equal to itself. In other words, if f(x)=e^{x}, then

\displaystyle \frac{df}{dx}=e^{x}.

This property plays a very important part in the study of differential equations. As we have seen in My Favorite Equation in Physics, differential equations permeate even the most basic aspects of science and engineering. The special property of the exponential function related to differential equations means that it appears in many laws of physics (as well as other “laws” unrelated to physics), and therefore its study is important to understanding these subjects as well.
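To see what is special about the base e, we can compare the numerical derivative of e^{x} with that of 2^{x}. The sketch below uses a central finite difference with a hand-picked step size, so the agreement is only approximate. For 2^{x}=e^{x\text{ ln }2} the derivative is not the function itself but \text{ln }2 times the function.

```python
import math


def derivative(f, x, h=1e-6):
    """Central finite-difference approximation of the first derivative."""
    return (f(x + h) - f(x - h)) / (2 * h)


def two_to_the(x):
    """The function 2^x, for comparison with e^x."""
    return 2.0 ** x


x = 1.0
print(derivative(math.exp, x), math.exp(x))              # nearly equal
print(derivative(two_to_the, x), two_to_the(x))          # not equal
print(derivative(two_to_the, x) / two_to_the(x), math.log(2))
```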


Exponential Function on Wikipedia

Exponentiation on Wikipedia

Euler’s Formula on Wikipedia

Euler’s Identity on Wikipedia

Introduction to Analysis of the Infinite by Leonhard Euler (translated by Ian Bruce)

Calculus by James Stewart

Real and Complex Analysis by Walter Rudin


In Presheaves we have compared functions on a topological space (as an example we considered the complex plane \mathbb{C} with the Zariski topology) and the functions on open subsets of this space (which in our example would be the complex plane \mathbb{C} with a finite number of points removed).

In this post we take on this topic again, with an emphasis on the functions which can be expressed in terms of polynomials. We will refer to these functions defined on a space as regular functions on the space. In Presheaves we saw that we could not admit \frac{1}{x} as a regular function on the entire complex plane \mathbb{C}, as it is undefined at the point x=0. It can, however, be admitted as a regular function on the open subset \mathbb{C}-\{0\}. We will restrict our topological spaces to the case of varieties (see Basics of Algebraic Geometry).

Note that if we are considering the entire complex plane, the regular functions are only those whose denominators are constants. But on the open subset \mathbb{C}-\{0\}, we may have polynomials in the denominators as long as their zeroes are not in the open subset; in this case the only zero allowed is 0, which is not in \mathbb{C}-\{0\}. If we take another open subset, one that is itself a subset of \mathbb{C}-\{0\}, such as \mathbb{C}-\{0,1\}, we can admit even more regular functions on this open subset.

The difference between the properties of a topological space and an open subset of such a space is related to the difference between “local” properties and “global” properties. “Local” means it holds on a smaller part of the space, while “global” means it holds on the entire space. For example, “locally”, the Earth appears flat. Of course, “globally”, we know that the Earth is round. However, ideally we should be able to “patch together” local information to obtain global information. This is what the concept of sheaves (see Sheaves) is for.

We may think about what we will see if we only “look at” a single point, for example, in \mathbb{C}, we may only look at 0. We can look at the set of all ratios of polynomials that are always defined at 0, which means that the polynomial in the denominator is not allowed to have a zero at 0. However, there are many functions that we can have – for example \frac{1}{x-1}, \frac{1}{(x-1)^{2}}, \frac{1}{(x-1)(x-2)}, and so many others aside from those that are already regular on all of \mathbb{C}. The set of all these functions, which form a ring, is called the local ring at 0. The local ring at any point P of a variety X is written \mathcal{O}_{X,P}. Taking the local ring at P is an example of the process of localization.

A single point is not an open subset in our topology, so this does not fit into our definition of a sheaf or a presheaf. Instead, we say that the local ring at a point is the stalk of the sheaf of regular functions at that point. More technically, the stalk of a sheaf (or presheaf) at a point is the set of equivalence classes (see Modular Arithmetic and Quotient Sets) of pairs (U,\varphi), where U is an open subset containing the point and \varphi is an element of the sheaf (or presheaf) on U, under the equivalence relation (U,\varphi)\sim(U',\varphi') if there exists an open subset V, containing the point and contained in the intersection U\cap U', for which \varphi |_{V}=\varphi'  |_{V}. The elements of the stalk are called the germs of the sheaf (or presheaf).

An important property of a local ring at a point P is that it has only one maximal ideal (see More on Ideals), which is made up of the functions in the local ring that vanish at P. This maximal ideal we will write as \mathfrak{m}_{X,P}. The quotient (again see Modular Arithmetic and Quotient Sets) \mathcal{O}_{X,P}/\mathfrak{m}_{X,P} is called the residue field.

We recall the Hilbert Nullstellensatz and the definition of varieties and schemes in Basics of Algebraic Geometry. There we established a correspondence between the points of a variety (resp. scheme) and the maximal ideals (resp. prime ideals) of its “ring of functions”. We can use the ideas discussed here concerning locality, via the concept of presheaves and sheaves, to construct more general varieties and schemes.

One of the great things about algebraic geometry is that it is kind of a “synthesis” of ideas from both abstract algebra and geometry, and ideas can be exchanged between both. For example, we have already mentioned in Basics of Algebraic Geometry that we can start with a ring R and look at the set of its maximal (resp. prime) ideals as forming a space. If we look at the set of its prime ideals (usually also referred to as its spectrum, and denoted \text{Spec } R – again we note that the word spectrum has many meanings in mathematics) then we have a scheme. This ring R may not even be a ring of polynomials – we may even consider the ring of integers \mathbb{Z}, and do algebraic “geometry” on the space \text{Spec }\mathbb{Z}!

We can also extract the idea of only looking at local information, an idea which has geometric origins, and apply it to abstract algebra. We can then define local rings completely algebraically, without reference to geometric ideas, as a ring with a unique maximal ideal.

A local ring which is also a principal ideal domain (a ring in which every ideal is a principal ideal, again see More on Ideals) and is not a field is called a discrete valuation ring. Discrete valuation rings arise as localizations of Dedekind domains at their nonzero prime ideals. Dedekind domains are important in number theory, as we have discussed in Algebraic Numbers; for instance, in Dedekind domains, even though elements may not factor uniquely into irreducibles, ideals will always factor uniquely into prime ideals.

For the ring of integers \mathbb{Z}, an example of a local ring is given by the ring of fractions whose denominator is an integer not divisible by a certain prime number p. We denote this local ring by \mathbb{Z}_{(p)}. For p=2, \mathbb{Z}_{(2)} is composed of all fractions whose denominator is an odd number. The unique maximal ideal of this ring is given by the fractions whose numerator is an even number. Since \mathbb{Z} is a Dedekind domain, \mathbb{Z}_{(p)} is also a discrete valuation ring. We refer to the local ring \mathbb{Z}_{(p)} as the localization of \mathbb{Z} at the point (prime ideal) (p).
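The local ring \mathbb{Z}_{(p)} is concrete enough to experiment with. The following sketch uses Python's `fractions.Fraction`, which always stores a fraction in lowest terms; the three function names are our own. It tests membership in \mathbb{Z}_{(p)}, membership in its maximal ideal, and whether an element is a unit (i.e. has an inverse that is also in \mathbb{Z}_{(p)}).

```python
from fractions import Fraction


def in_local_ring(q, p):
    """True if q is in Z_(p): its lowest-terms denominator is not divisible by p."""
    return q.denominator % p != 0


def in_maximal_ideal(q, p):
    """True if q is in Z_(p) and its numerator is divisible by p."""
    return in_local_ring(q, p) and q.numerator % p == 0


def is_unit(q, p):
    """True if q is in Z_(p) and its inverse is also in Z_(p)."""
    return in_local_ring(q, p) and q.numerator % p != 0


p = 2
for q in (Fraction(3, 5), Fraction(4, 7), Fraction(1, 2), Fraction(0)):
    print(q, in_local_ring(q, p), in_maximal_ideal(q, p), is_unit(q, p))
```

Every element of \mathbb{Z}_{(p)} is either a unit or in the maximal ideal, never both. This is exactly why there is no room for a second maximal ideal.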

We started with the idea of “local” and “global” in geometry, in particular algebraic geometry, and ended up with ideas important to number theory. This is once more an example of how the exchange of ideas between different branches of mathematics leads to much fruitful development of each branch and of mathematics as a whole.


Localization on Wikipedia

Localization of a Ring on Wikipedia

Local Ring on Wikipedia

Stalk on Wikipedia

Algebraic Geometry by Andreas Gathmann

Algebraic Geometry by J.S. Milne

Algebraic Geometry by Robin Hartshorne

Algebraic Number Theory by Jurgen Neukirch

The Hom and Tensor Functors

We discussed functors in Category Theory, and in this post we discuss certain functors important to the study of rings and modules. Moreover, we look at these functors and how they affect exact sequences, whose importance was discussed in Exact Sequences. Our discussion in this post will also be related to some things that we discussed in More on Chain Complexes.

If M and N are two modules whose ring of scalars is the ring R (we refer to M and N as R-modules), then we denote by \text{Hom}_{R}(M,N) the set of linear transformations (see Vector Spaces, Modules, and Linear Algebra) from M to N. It is worth noting that this set has an abelian group structure (see Groups).

We define the functor \text{Hom}_{R}(M,-) as the functor that assigns to an R-module N the abelian group \text{Hom}_{R}(M,N) of linear transformations from M to N. Similarly, the functor \text{Hom}_{R}(-,N) assigns to the R-module M the abelian group \text{Hom}_{R}(M,N) of linear transformations from M to N.

These functors \text{Hom}_{R}(M,-) and \text{Hom}_{R}(-,N), combined with the idea of exact sequences, give us new definitions of projective and injective modules, which are equivalent to the old ones we gave in More on Chain Complexes.

We say that a functor is an exact functor if it preserves exact sequences. In the case of \text{Hom}_{R}(M,-), we say that it is exact if for an exact sequence of modules

0\rightarrow A\rightarrow B\rightarrow C\rightarrow 0

the sequence

0\rightarrow \text{Hom}_{R}(M,A)\rightarrow \text{Hom}_{R}(M,B)\rightarrow \text{Hom}_{R}(M,C)\rightarrow 0

is also exact. The concept of an exact sequence of sets of linear transformations of R-modules makes sense because of the abelian group structure on these sets. In this case we also say that the R-module M is projective.

Similarly, an R-module N is injective if the functor \text{Hom}_{R}(-,N) is exact, i.e. if for an exact sequence of modules

0\rightarrow A\rightarrow B\rightarrow C\rightarrow 0

the sequence (note that the functor \text{Hom}_{R}(-,N) reverses the direction of the arrows)

0\rightarrow \text{Hom}_{R}(C,N)\rightarrow \text{Hom}_{R}(B,N)\rightarrow \text{Hom}_{R}(A,N)\rightarrow 0

is also exact.

We introduce another functor, which we write M\otimes_{R}-. This functor assigns to an R-module N the tensor product (see More on Vector Spaces and Modules) M\otimes_{R}N. Similarly, we also have the functor -\otimes_{R}N, which assigns to an R-module M the tensor product M\otimes_{R}N. If our ring R is commutative, then there will be no distinction between the functors M\otimes_{R}- and -\otimes_{R}M. We will continue assuming that our rings are commutative (an example of a noncommutative ring is the ring of n\times n matrices).

We say that a module N is flat if the functor -\otimes_{R}N is exact, i.e. if for an exact sequence of modules

0\rightarrow A\rightarrow B\rightarrow C\rightarrow 0

the sequence

0\rightarrow A\otimes_{R}N\rightarrow B\otimes_{R}N\rightarrow C\otimes_{R}N\rightarrow 0

is also exact.
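A standard example of a module that is not flat is \mathbb{Z}/2 as a module over \mathbb{Z}. Multiplication by 2 is an injective map from \mathbb{Z} to \mathbb{Z}, but after tensoring with \mathbb{Z}/2 (and using the identification of \mathbb{Z}\otimes_{\mathbb{Z}}\mathbb{Z}/n with \mathbb{Z}/n) it becomes multiplication by 2 on \mathbb{Z}/2, which sends everything to zero. The sketch below checks this by brute force; the function name is our own and it only handles this one family of maps.

```python
def tensored_map_is_injective(m, n):
    """Is multiplication by m still injective after tensoring with Z/n?

    After tensoring, the injective map "multiply by m" from Z to Z becomes
    "multiply by m" from Z/n to Z/n, which we test on all n elements.
    """
    images = {(m * x) % n for x in range(n)}
    return len(images) == n


print(tensored_map_is_injective(2, 2))  # False: Z/2 is not flat over Z
print(tensored_map_is_injective(3, 2))  # True: this particular map stays injective
print(tensored_map_is_injective(4, 6))  # False
```

A single map that stops being injective is enough to show that a module is not flat. Showing that a module is flat requires checking every exact sequence, which a brute-force test like this cannot do.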

We make a little digression to introduce the concept of an algebra. The word “algebra” has a lot of meanings in mathematics, but in our context, as a mathematical object in the subject of abstract algebra and linear algebra, it means a set with both a ring and a module structure. More technically, for a ring A, an A-algebra is a ring B and a ring homomorphism f:A\rightarrow B, which makes B into an A-module via the following definition of the scalar multiplication:

ab=f(a)b for a\in A, b\in B.

The notion of an algebra will be useful in defining the notion of a flat morphism. A ring homomorphism f: A\rightarrow B is a flat morphism if the functor -\otimes_{A}B is exact. Since B is an A-algebra, and an A-algebra is also an A-module, this means that f: A\rightarrow B is a flat morphism if B is flat as an A-module. The notion of a flat morphism is important in algebraic geometry, where the “points” of schemes are given by the prime ideals of a ring, since it corresponds to a “continuous” family of schemes parametrized by the “points” of another scheme.

Finally, the functors \text{Hom}_{R}(M,-), \text{Hom}_{R}(-,N), and -\otimes_{R}N, which we will also refer to as the “Hom” and “Tensor” functors, can be used to define the derived functors “Ext” and “Tor”, to which we have given a passing mention in More on Chain Complexes. We now elaborate on these constructions.

The Ext functor, written \text{Ext}_{R}^{n}(M,N) for a fixed R-module M, is calculated by taking an injective resolution of N,

0\rightarrow N\rightarrow E^{0}\rightarrow E^{1}\rightarrow ...

then applying the functor \text{Hom}_{R}(M,-):

0 \rightarrow \text{Hom}_{R}(M,N)\rightarrow \text{Hom}_{R}(M,E^{0})\rightarrow \text{Hom}_{R}(M,E^{1})\rightarrow ...

we “remove” \text{Hom}_{R}(M,N) to obtain the chain complex

0 \rightarrow \text{Hom}_{R}(M,E^{0})\rightarrow \text{Hom}_{R}(M,E^{1})\rightarrow ...

Then \text{Ext}_{R}^{n}(M,N) is the n-th homology group (see Homology and Cohomology) of this chain complex.

Alternatively, we can also define the Ext functor \text{Ext}_{R}^{n}(M,N) for a fixed R-module N by taking a projective resolution of M,

...\rightarrow P_{1}\rightarrow P_{0}\rightarrow M\rightarrow 0

then applying the functor \text{Hom}_{R}(-,N), which “dualizes” the chain complex:

0 \rightarrow \text{Hom}_{R}(M,N)\rightarrow \text{Hom}_{R}(P_{0},N)\rightarrow \text{Hom}_{R}(P_{1},N)\rightarrow ...

we again “remove” \text{Hom}_{R}(M,N) to obtain the chain complex

0 \rightarrow \text{Hom}_{R}(P_{0},N)\rightarrow \text{Hom}_{R}(P_{1},N)\rightarrow ...

and \text{Ext}_{R}^{n}(M,N) is once again given by the n-th homology group of this chain complex.

The Tor functor, meanwhile, written \text{Tor}_{n}^{R}(M,N) for a fixed R-module N, is calculated by taking a projective resolution of M and applying the functor -\otimes_{R}N, followed by “removing” M\otimes_{R}N:

...\rightarrow P_{1}\otimes_{R}N\rightarrow P_{0}\otimes_{R}N\rightarrow 0

\text{Tor}_{n}^{R}(M,N) is then given by the n-th homology group of this chain complex.
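As a small worked example, take R=\mathbb{Z}, M=\mathbb{Z}/m and N=\mathbb{Z}/n. A projective resolution of \mathbb{Z}/m is 0\rightarrow\mathbb{Z}\rightarrow\mathbb{Z}\rightarrow\mathbb{Z}/m\rightarrow 0, where the first map is multiplication by m. Tensoring with \mathbb{Z}/n and removing the last term leaves the complex 0\rightarrow\mathbb{Z}/n\rightarrow\mathbb{Z}/n\rightarrow 0, where the map is again multiplication by m. Then \text{Tor}_{1} is the kernel of this map and \text{Tor}_{0} is its cokernel. The sketch below (with a function name of our own choosing) computes the sizes of these two groups by brute force.

```python
from math import gcd


def tor_orders(m, n):
    """Orders of Tor_0 and Tor_1 of (Z/m, Z/n) over Z, by brute force.

    Both are read off from the map "multiply by m" on Z/n:
    Tor_1 is its kernel and Tor_0 is its cokernel.
    """
    elements = range(n)
    kernel = [x for x in elements if (m * x) % n == 0]
    image = {(m * x) % n for x in elements}
    tor0_order = n // len(image)  # order of the quotient of Z/n by the image
    tor1_order = len(kernel)
    return tor0_order, tor1_order


for m, n in ((4, 6), (3, 5), (6, 6)):
    print(m, n, tor_orders(m, n), gcd(m, n))
```

In each case both orders come out equal to the greatest common divisor of m and n. In particular \text{Tor}_{1} vanishes when m and n are coprime.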

The Ext and Tor functors were originally developed to study the concepts of “extension” and “torsion” of groups in abstract algebra, hence the names, but they have since then found utility in many other subjects, in particular algebraic topology, algebraic geometry, and algebraic number theory. Our exposition here has been quite abstract; to find more motivation, aside from checking out the references listed below, the reader may also compare with the ordinary homology and cohomology theories in algebraic topology. Hopefully we will be able to flesh out more aspects of what we have discussed here in future posts.


Hom Functor on Wikipedia

Tensor Product of Modules on Wikipedia

Flat Module on Wikipedia

Associative Algebra on Wikipedia

Derived Functor on Wikipedia

Ext Functor on Wikipedia

Tor Functor on Wikipedia

Abstract Algebra by David S. Dummit and Richard B. Foote

Commutative Algebra by M. F. Atiyah and I. G. MacDonald

An Introduction to Homological Algebra by Joseph J. Rotman