Connection and Curvature in Riemannian Geometry

In Geometry on Curved Spaces, we showed how different geometry can be when we are working on curved space instead of flat space, which we are usually more familiar with. We used the concept of a metric to express how the distance formula changes depending on where we are on this curved space. This gives us some way to “measure” the curvature of the space.

We also described the concept of parallel transport, which is in some way even more general than the metric, and can also be used to provide us with some measure of the curvature of a space. Although we can use concepts analogous to parallel transport even without the metric, if we do have a metric on the space and an expression for it, we can relate the concept of parallel transport to the metric, which is perhaps more intuitive. In this post, we formalize the concept of parallel transport by defining the Christoffel symbol and the Riemann curvature tensor, both of which we can obtain given the form of the metric. The Christoffel symbol and the Riemann curvature tensor are examples of the more general concepts of a connection and a curvature form, respectively, which need not be obtained from the metric.

Some Basics of Tensor Notation

First we establish some notation. We have already seen some tensor notation in Some Basics of (Quantum) Electrodynamics, but we explain a little bit more of that notation here, since it will be the language we will work in. Many of the ordinary vectors we are used to, such as the position, will be indexed by superscripts. We refer to these vectors as contravariant vectors. A common convention is to use Latin letters, such as i or j, as indices when we are working with space, and Greek letters, such as \mu and \nu, as indices when we are working with spacetime. Let us consider, for example, spacetime. An event in this spacetime is specified by its 4-position x^{\mu}, where x^{0}=ct, x^{1}=x, x^{2}=y, and x^{3}=z.

We will use the symbol g_{\mu\nu} for our metric, and we will also often express it as a matrix. For the case of flat spacetime, our metric is given by the Minkowski metric \eta_{\mu\nu}:

\displaystyle \eta_{\mu\nu}=\left(\begin{array}{cccc}-1&0&0&0\\0&1&0&0\\0&0&1&0\\ 0&0&0&1\end{array}\right)

We can use the metric to “raise” and “lower” indices. This is done by multiplying the metric and a vector, and summing over a common index (one will be a superscript and the other a subscript). We have introduced the Einstein summation convention in Some Basics of (Quantum) Electrodynamics, where repeated indices always imply summation, unless explicitly stated otherwise, and we will continue to use this convention for posts discussing differential geometry and the theory of relativity.

Here is an example of “lowering” the index of x^{\nu} in flat spacetime using the metric \eta_{\mu\nu} to obtain a new quantity x_{\mu}:

\displaystyle x_{\mu}=\eta_{\mu\nu}x^{\nu}

Explicitly, the components of the quantity x_{\mu} are given by x_{0}=-ct, x_{1}=x, x_{2}=y, and x_{3}=z. Note that the “time” component x_{0} has changed sign; this is because \eta_{00}=-1. A quantity such as x_{\mu}, which has a subscript index, is called a covariant vector.
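As a minimal sketch of this index lowering, we can carry out the sum over \nu explicitly in Python. The helper name lower_index and the sample event are hypothetical and chosen only for illustration.

```python
# Minkowski metric eta_{mu nu} with signature (-, +, +, +), as in the text.
ETA = [
    [-1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]


def lower_index(metric, vector):
    """Return x_mu = g_{mu nu} x^nu, summing over the repeated index nu."""
    n = len(vector)
    return [sum(metric[mu][nu] * vector[nu] for nu in range(n)) for mu in range(n)]


# Hypothetical event with ct = 2, x = 3, y = 4, z = 5 (arbitrary units).
x_upper = [2, 3, 4, 5]
print(lower_index(ETA, x_upper))  # [-2, 3, 4, 5]
```

Only the time component changes sign, as expected from \eta_{00}=-1.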

In order to “raise” indices, we need the “inverse metric” g^{\mu\nu}. For the Minkowski metric \eta_{\mu\nu}, the inverse metric \eta^{\mu\nu} has the exact same components as \eta_{\mu\nu}, but for more general metrics this may not be the case. The general procedure for obtaining the inverse metric is to solve the expression

\displaystyle g_{\mu\nu}g^{\nu\rho}=\delta_{\mu}^{\rho}

where \delta_{\mu}^{\rho} is the Kronecker delta, a quantity that can be expressed as the matrix

\displaystyle \delta_{\mu}^{\rho}=\left(\begin{array}{cccc}1&0&0&0\\0&1&0&0\\0&0&1&0\\ 0&0&0&1\end{array}\right).

As a demonstration of what our notation can do, we recall the formula for the invariant spacetime interval:

\displaystyle (ds)^2=-(cdt)^2+(dx)^2+(dy)^2+(dz)^2

Using tensor notation combined with the Einstein summation convention, this can be written simply as

\displaystyle (ds)^2=\eta_{\mu\nu}dx^{\mu}dx^{\nu}.
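To see the double sum hidden in this compact expression, here is a small Python sketch. The function name interval_squared and the sample displacement are assumptions made for illustration.

```python
# Minkowski metric eta_{mu nu} with signature (-, +, +, +).
ETA = [
    [-1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]


def interval_squared(metric, dx):
    """Return (ds)^2 = g_{mu nu} dx^mu dx^nu, summing over both repeated indices."""
    n = len(dx)
    return sum(metric[mu][nu] * dx[mu] * dx[nu] for mu in range(n) for nu in range(n))


# Hypothetical displacement (c dt, dx, dy, dz) = (5, 3, 0, 4): since 3^2 + 4^2 = 5^2,
# this is a light-like separation and the interval vanishes.
print(interval_squared(ETA, [5, 3, 0, 4]))  # 0
```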

The Christoffel Symbol and the Covariant Derivative

We now come back to the Christoffel symbol \Gamma^{\mu}_{\nu\lambda}. The idea behind the Christoffel symbol is that it is used to define the covariant derivative \nabla_{\nu}V^{\mu} of a vector V^{\mu}.

The covariant derivative is a very important concept in differential geometry (and not just in Riemannian geometry). When we take derivatives, we are actually comparing two vectors. To further explain what we mean, we recall that individually the components of the vectors can be thought of as functions on the space, and we recall the expression for the derivative from An Intuitive Introduction to Calculus:

\displaystyle \frac{df}{dx}=\frac{f(x+\epsilon)-f(x)}{(x+\epsilon)-(x)} when \epsilon is extremely small (essentially negligible)

More formally, we can write

\displaystyle \frac{df}{dx}=\lim_{\epsilon\to 0}\frac{f(x+\epsilon)-f(x)}{(x+\epsilon)-(x)}.

Therefore, employing the language of partial derivatives, we could have written the following partial derivative of the \mu-th component of an m-dimensional vector V^{\mu} on an m-dimensional space with respect to the coordinate x^{\nu}:

\displaystyle \frac{\partial V^{\mu}}{\partial x^{\nu}}=\lim_{\Delta x^{\nu}\to 0}\frac{V^{\mu}(x^{1},...,x^{\nu}+\Delta x^{\nu},...,x^{m})-V^{\mu}(x^{1},...,x^{\nu},...,x^{m})}{(x^{\nu}+\Delta x^{\nu})-(x^{\nu})}

The problem is that we are comparing vectors from different vector spaces. Recall from Vector Fields, Vector Bundles, and Fiber Bundles that we can think of a vector bundle as having a vector space for every point on the base space. The vector V^{\mu}(x^{1},...,x^{\nu}+\Delta x^{\nu},...,x^{m}) belongs to the vector space on the point (x^{1},...,x^{\nu}+\Delta x^{\nu},...,x^{m}), while the vector V^{\mu}(x^{1},...,x^{\nu},...,x^{m}) belongs to the vector space on the point (x^{1},...,x^{\nu},...,x^{m}). To be able to compare the two vectors we need to “transport” one to the other in the “correct” way, by which we mean parallel transport. Now we have seen in Geometry on Curved Spaces that parallel transport can have weird effects on vectors, and these weird effects are what the Christoffel symbol expresses.

Let \tilde{V}^{\mu}(x^{1},...,x^{\nu}+\Delta x^{\nu},...,x^{m}) denote the vector V^{\mu}(x^{1},...,x^{\nu},...,x^{m}) parallel transported from its original vector space on (x^{1},...,x^{\nu},...,x^{m}) to the vector space on (x^{1},...,x^{\nu}+\Delta x^{\nu},...,x^{m}). The vector \tilde{V}^{\mu}(x^{1},...,x^{\nu}+\Delta x^{\nu},...,x^{m}) is given by the following expression:

\displaystyle \tilde{V}^{\mu}(x^{1},...,x^{\nu}+\Delta x^{\nu},...,x^{m})=V^{\mu}(x^{1},...,x^{\nu},...,x^{m})-V^{\lambda}(x^{1},...,x^{\nu},...,x^{m})\Gamma^{\mu}_{\nu\lambda}(x^{1},...,x^{\nu},...,x^{m})\Delta x^{\nu}

Therefore the Christoffel symbol provides a “correction” for what happens when we parallel transport a vector from one point to another. This is an example of the concept of a connection, which, like the covariant derivative, is part of more general differential geometry beyond Riemannian geometry. The object that is to be parallel transported may not be a vector, for example when we have more general fiber bundles instead of vector bundles. However, in Riemannian geometry we will usually focus on vector bundles, in particular a special kind of vector bundle called the tangent bundle, whose vector space at each point consists of the tangent vectors at that point.

Now there is more than one way to parallel transport a mathematical object, which means that there are many choices of a connection. However, in Riemannian geometry there is a special kind of connection that we will prefer. This is the connection that satisfies the following two properties:

\displaystyle \Gamma^{\mu}_{\nu\lambda}=\Gamma^{\mu}_{\lambda\nu}    (torsion-free)

\displaystyle \nabla_{\rho}g_{\mu\nu}=0    (metric compatibility)

The connection that satisfies these two properties is the one that can be obtained from the metric via the following formula:

\displaystyle \Gamma^{\mu}_{\nu\lambda}=\frac{1}{2}g^{\mu\sigma}(\partial_{\nu}g_{\sigma\lambda}+\partial_{\lambda}g_{\nu\sigma}-\partial_{\sigma}g_{\nu\lambda}).

The covariant derivative is then defined as

\displaystyle \nabla_{\nu}V^{\mu}=\lim_{\Delta x^{\nu}\to 0}\frac{V^{\mu}(x^{1},...,x^{\nu}+\Delta x^{\nu},...,x^{m})-\tilde{V}^{\mu}(x^{1},...,x^{\nu}+\Delta x^{\nu},...,x^{m})}{(x^{\nu}+\Delta x^{\nu})-(x^{\nu})}.

We are now comparing vectors belonging to the same vector space, and evaluating the expression above leads to the formula for the covariant derivative:

\displaystyle \nabla_{\nu}V^{\mu}=\partial_{\nu}V^{\mu}+\Gamma^{\mu}_{\nu\lambda}V^{\lambda}.
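As a sketch of the evaluation just mentioned, we substitute the expression for the parallel transported vector \tilde{V}^{\mu} into the limit. Here the index \nu is fixed by the direction of the displacement and is not summed over, the dots stand for the coordinates that are held fixed, and the Christoffel symbol is evaluated at the original point.

```latex
\nabla_{\nu}V^{\mu}
  = \lim_{\Delta x^{\nu}\to 0}
    \frac{V^{\mu}(\dots,x^{\nu}+\Delta x^{\nu},\dots)-V^{\mu}(\dots,x^{\nu},\dots)
          +\Gamma^{\mu}_{\nu\lambda}V^{\lambda}\,\Delta x^{\nu}}{\Delta x^{\nu}}
  = \partial_{\nu}V^{\mu}+\Gamma^{\mu}_{\nu\lambda}V^{\lambda}
```

The first two terms in the numerator reproduce the ordinary partial derivative, while the third term, which comes from the “correction” due to parallel transport, survives the limit because it is proportional to \Delta x^{\nu}.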

The Riemann Curvature Tensor

Next we consider the quantity known as the Riemann curvature tensor. It is once again related to parallel transport, in the following manner. Consider parallel transporting a vector V^{\sigma} through an “infinitesimal” distance specified by another vector A^{\mu}, and after that, through another infinitesimal distance specified by yet another vector B^{\nu}. Then we parallel transport it again in the opposite direction to A^{\mu}, then finally in the opposite direction to B^{\nu}. The path forms a parallelogram, and when the vector V^{\sigma} returns to its starting point it will have changed by an amount \delta V^{\rho}. We can think of the Riemann curvature tensor as the quantity that relates all of these:

\displaystyle \delta V^{\rho}=R^{\rho}_{\ \sigma\mu\nu}V^{\sigma}A^{\mu}B^{\nu}.

Another way to put this is to consider taking the covariant derivative of the vector V^{\rho} along the same path as described above. The Riemann curvature tensor is then related to this quantity as follows:

\displaystyle \nabla_{\mu}\nabla_{\nu}V^{\rho}-\nabla_{\nu}\nabla_{\mu}V^{\rho}=R^{\rho}_{\ \sigma\mu\nu}V^{\sigma}.

Expanding the left hand side, and using the torsion-free property of the Christoffel symbol, we will find that

\displaystyle R^{\rho}_{\ \sigma\mu\nu}=\partial_{\mu}\Gamma^{\rho}_{\nu\sigma}-\partial_{\nu}\Gamma^{\rho}_{\mu\sigma}+\Gamma^{\rho}_{\mu\lambda}\Gamma^{\lambda}_{\nu\sigma}-\Gamma^{\rho}_{\nu\lambda}\Gamma^{\lambda}_{\mu\sigma}.

For connections other than the torsion-free one that we chose, there will be another part of the expansion of the expression \nabla_{\mu}\nabla_{\nu}-\nabla_{\nu}\nabla_{\mu} called the torsion tensor. For our case, however, we need not worry about it and we can focus on the Riemann curvature tensor.

There is another quantity that can be obtained from the Riemann curvature tensor called the Ricci tensor, denoted by R_{\mu\nu}. It is given by

\displaystyle R_{\mu\nu}=R^{\lambda}_{\ \mu\lambda\nu}.

Following the Einstein summation convention, we sum over the repeated index \lambda, and therefore the resulting quantity will have only two indices instead of four. This is an example of the operation on tensors called contraction. If we raise one index using the metric and contract again, we obtain a quantity called the Ricci scalar, denoted R:

\displaystyle R=R^{\mu}_{\ \mu}

Example: The 2-Sphere

To provide an explicit example of the concepts discussed, we show their specific expressions for the case of a 2-sphere. We will only give the final results here. The explicit computations can be found among the references, but the reader may gain some practice, especially on manipulating tensors, by performing the calculations and checking only the answers here. In any case, since the metric is given, it is only a matter of substituting the relevant quantities into the formulas already given above.

We have already given the expression for the metric of the 2-sphere in Geometry on Curved Spaces. We recall that, in matrix form, it is given by (we change our notation for the radius of the 2-sphere to R_{0} to avoid confusion with the symbol for the Ricci scalar)

\displaystyle g_{mn}= \left(\begin{array}{cc}R_{0}^{2}&0\\ 0&R_{0}^{2}\text{sin}(\theta)^{2}\end{array}\right)

Individually, the components are (we will use \theta and \varphi instead of the numbers 1 and 2 for the indices)

\displaystyle g_{\theta\theta}=R_{0}^{2}

\displaystyle g_{\varphi\varphi}=R_{0}^{2}(\text{sin}(\theta))^{2}

The other components (g_{\theta\varphi} and g_{\varphi\theta}) are all equal to zero.

The Christoffel symbols are therefore given by

\displaystyle \Gamma^{\theta}_{\varphi\varphi}=-\text{sin}(\theta)\text{cos}(\theta)

\displaystyle \Gamma^{\varphi}_{\theta\varphi}=\text{cot}(\theta)

\displaystyle \Gamma^{\varphi}_{\varphi\theta}=\text{cot}(\theta)

The other components (\Gamma^{\theta}_{\theta\theta}, \Gamma^{\theta}_{\theta\varphi}, \Gamma^{\theta}_{\varphi\theta}, \Gamma^{\varphi}_{\theta\theta}, and \Gamma^{\varphi}_{\varphi\varphi}) are all equal to zero.
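The reader who wants a quick numerical check of these values can use the following Python sketch. It evaluates \Gamma^{\mu}_{\nu\lambda}=\frac{1}{2}g^{\mu\sigma}(\partial_{\nu}g_{\sigma\lambda}+\partial_{\lambda}g_{\nu\sigma}-\partial_{\sigma}g_{\nu\lambda}) with the derivatives of the metric approximated by central differences. The radius, the step size h, the sample point, and all function names are assumptions made for illustration; index 0 stands for \theta and index 1 for \varphi.

```python
import math

R0 = 1.0  # hypothetical radius of the 2-sphere


def metric(coords):
    """Metric g_{ij} of the 2-sphere at coords = (theta, phi)."""
    theta, _phi = coords
    return [[R0**2, 0.0], [0.0, R0**2 * math.sin(theta) ** 2]]


def d_metric(coords, h=1e-6):
    """Return dg with dg[k][i][j] ~ partial_k g_{ij}, using central differences."""
    derivs = []
    for k in range(2):
        plus, minus = list(coords), list(coords)
        plus[k] += h
        minus[k] -= h
        gp, gm = metric(plus), metric(minus)
        derivs.append([[(gp[i][j] - gm[i][j]) / (2 * h) for j in range(2)] for i in range(2)])
    return derivs


def christoffel(coords):
    """Return gamma with gamma[m][n][l] ~ Gamma^m_{n l}."""
    g = metric(coords)
    g_inv = [[1.0 / g[0][0], 0.0], [0.0, 1.0 / g[1][1]]]  # valid because g is diagonal
    dg = d_metric(coords)
    idx = range(2)
    return [[[0.5 * sum(g_inv[m][s] * (dg[n][s][l] + dg[l][n][s] - dg[s][n][l]) for s in idx)
              for l in idx] for n in idx] for m in idx]


theta = 1.0
gamma = christoffel((theta, 0.3))
print(gamma[0][1][1], -math.sin(theta) * math.cos(theta))  # Gamma^theta_{phi phi}
print(gamma[1][0][1], 1.0 / math.tan(theta))               # Gamma^phi_{theta phi}
```

The two numbers on each printed line should agree up to the error of the finite-difference approximation.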

The components of the Riemann curvature tensor are given by

\displaystyle R^{\theta}_{\ \varphi\theta\varphi}=(\text{sin}(\theta))^{2}

\displaystyle R^{\theta}_{\ \varphi\varphi\theta}=-(\text{sin}(\theta))^{2}

\displaystyle R^{\varphi}_{\ \theta\theta\varphi}=-1

\displaystyle R^{\varphi}_{\ \theta\varphi\theta}=1

The other components (there are still twelve of them, so I won’t bother writing all their symbols down here anymore) are all equal to zero.

The components of the Ricci tensor are

\displaystyle R_{\theta\theta}=1

\displaystyle R_{\varphi\varphi}=(\text{sin}(\theta))^{2}

The other components (R_{\theta\varphi} and R_{\varphi\theta}) are all equal to zero.

Finally, the Ricci scalar is

\displaystyle R=\frac{2}{R_{0}^{2}}
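As a rough check of this chain of results, the following Python sketch starts from the Christoffel symbols of the 2-sphere listed above, builds the Riemann curvature tensor from R^{\rho}_{\ \sigma\mu\nu}=\partial_{\mu}\Gamma^{\rho}_{\nu\sigma}-\partial_{\nu}\Gamma^{\rho}_{\mu\sigma}+\Gamma^{\rho}_{\mu\lambda}\Gamma^{\lambda}_{\nu\sigma}-\Gamma^{\rho}_{\nu\lambda}\Gamma^{\lambda}_{\mu\sigma}, and contracts it down to the Ricci scalar. The derivatives are approximated by central differences; the function names, the step size, and the sample values are assumptions made for illustration. Index 0 stands for \theta and index 1 for \varphi.

```python
import math

IDX = range(2)


def christoffel(theta):
    """gamma[r][m][s] = Gamma^r_{m s} on the 2-sphere (independent of the radius)."""
    gamma = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
    gamma[0][1][1] = -math.sin(theta) * math.cos(theta)
    gamma[1][0][1] = gamma[1][1][0] = math.cos(theta) / math.sin(theta)
    return gamma


def d_christoffel(theta, h=1e-5):
    """dgamma[k][r][m][s] ~ partial_k Gamma^r_{m s}; the phi-derivatives vanish."""
    gp, gm = christoffel(theta + h), christoffel(theta - h)
    d_theta = [[[(gp[r][m][s] - gm[r][m][s]) / (2 * h) for s in IDX] for m in IDX] for r in IDX]
    d_phi = [[[0.0 for _ in IDX] for _ in IDX] for _ in IDX]
    return [d_theta, d_phi]


def riemann(theta):
    """riem[r][s][m][n] ~ R^r_{s m n}."""
    g, dg = christoffel(theta), d_christoffel(theta)
    return [[[[dg[m][r][n][s] - dg[n][r][m][s]
               + sum(g[r][m][l] * g[l][n][s] - g[r][n][l] * g[l][m][s] for l in IDX)
               for n in IDX] for m in IDX] for s in IDX] for r in IDX]


def ricci_scalar(theta, radius):
    """Contract R^l_{m l n} to the Ricci tensor, then with the inverse metric."""
    riem = riemann(theta)
    ricci = [[sum(riem[l][m][l][n] for l in IDX) for n in IDX] for m in IDX]
    g_inv_diag = [1.0 / radius**2, 1.0 / (radius**2 * math.sin(theta) ** 2)]
    return sum(g_inv_diag[m] * ricci[m][m] for m in IDX)


radius = 2.0  # hypothetical value of R_0
print(ricci_scalar(1.0, radius), 2.0 / radius**2)
```

The two printed numbers should agree up to the finite-difference error, and changing the sample value of \theta should not change the result, since the 2-sphere is curved the same way everywhere.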

We note that the larger the radius of the 2-sphere, the smaller the curvature. We can see this intuitively, for example, when it comes to the surface of our planet, which appears flat because the radius is so large. If our planet were much smaller, this would not be the case.

Bonus: The Einstein Field Equations of General Relativity

Given what we have discussed in this post, we can now write down here the expression for the Einstein field equations (also known simply as Einstein’s equations) of general relativity. It is given in terms of the Ricci tensor and the metric (of spacetime) via the following equation:

\displaystyle R_{\mu\nu}-\frac{1}{2}Rg_{\mu\nu}+\Lambda g_{\mu\nu}=\frac{8\pi}{c^{4}} GT_{\mu\nu}

where G is the gravitational constant, the same constant that appears in Newton’s law of universal gravitation (which is approximated by Einstein’s equations at certain limiting conditions), c is the speed of light in a vacuum, and T_{\mu\nu} is the energy-momentum tensor (also known as the stress-energy tensor), which gives the “density” of energy and momentum, as well as certain other related concepts, such as the pressure and shear stress. The symbol \Lambda refers to what is known as the cosmological constant, which was not there in Einstein’s original formulation but was later added to support his view of an unchanging universe. Later, with the dawn of Georges Lemaître’s theory of an expanding universe, later known as the Big Bang theory, the cosmological constant was abandoned. More recently, the universe was found to not only be expanding, but expanding at an accelerating rate, necessitating the return of the cosmological constant, with an interpretation in terms of the “vacuum energy”, also known as “dark energy”. Today the nature of the cosmological constant remains one of the great mysteries of modern physics.

Bonus: Connection and Curvature in Quantum Electrodynamics

The concepts of connection and curvature also appear in quantum field theory, in particular quantum electrodynamics (see Some Basics of (Quantum) Electrodynamics). It is the underlying concept in gauge theory, of which quantum electrodynamics is probably the simplest example. However, it is an example of differential geometry which does not make use of the metric. We consider a fiber bundle, where the base space is flat spacetime (also known as Minkowski spacetime), and the fiber is \text{U}(1), which is the group formed by the complex numbers with magnitude equal to 1, with law of composition given by multiplication (we can also think of this as a circle).

We want the group \text{U}(1) to act on the wave function (or field operator) \psi(x), so that the wave function has a “phase”, i.e. we have e^{i\phi(x)}\psi(x), where e^{i\phi(x)} is a complex number which depends on the location x in spacetime. Note that therefore the values of the wave function at different points in spacetime will have different values of the “phase”. In order to compare them, we need a connection and a covariant derivative.

The connection we want is given by

\displaystyle i\frac{q}{\hbar c}A_{\mu}

where q is the charge of the electron, \hbar is the reduced Planck constant, c is the speed of light in a vacuum, and A_{\mu} is the four-potential of electrodynamics.

The covariant derivative (here written using the symbol D_{\mu}) is

\displaystyle D_{\mu}\psi(x)=\partial_{\mu}\psi(x)+i\frac{q}{\hbar c}A_{\mu}\psi(x)

We will also have a concept analogous to the Riemann curvature tensor, called the field strength tensor, denoted F_{\mu\nu}. Of course, our “curvature” in this case is not the literal curvature of spacetime, as we have already specified that our spacetime is flat, but an abstract notion of “curvature” that specifies how the phase of our wavefunction changes as we move around the spacetime. This field strength tensor is given by the following expression:

F_{\mu\nu}=\partial_{\mu}A_{\nu}-\partial_{\nu}A_{\mu}

This may be compared to the expression for the Riemann curvature tensor, where the connection is given by the Christoffel symbols. The first two terms of both expressions are very similar. The difference is that the expression for the Riemann curvature tensor has some extra terms that the expression for the field strength tensor does not have. However, a generalization of this procedure for quantum electrodynamics to groups other than \text{U}(1), called Yang-Mills theory, does feature extra terms in the expression for the field strength tensor that perhaps make the two more similar.
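The parallel with the Riemann curvature tensor can be made more explicit by computing the commutator of two covariant derivatives acting on \psi, in the same way that the Riemann curvature tensor arises from \nabla_{\mu}\nabla_{\nu}-\nabla_{\nu}\nabla_{\mu}. The following is a sketch of that computation using only the definition of D_{\mu} given above. When the products are expanded, the second derivatives of \psi, the cross terms A_{\mu}\partial_{\nu}\psi+A_{\nu}\partial_{\mu}\psi, and the term proportional to A_{\mu}A_{\nu}\psi are all symmetric in \mu and \nu, so they cancel in the difference.

```latex
D_{\mu}D_{\nu}\psi-D_{\nu}D_{\mu}\psi
  = i\frac{q}{\hbar c}\left(\partial_{\mu}A_{\nu}-\partial_{\nu}A_{\mu}\right)\psi
  = i\frac{q}{\hbar c}F_{\mu\nu}\psi
```

The term A_{\mu}A_{\nu} drops out here because the components of the four-potential are ordinary functions that commute with each other, which is why no “extra terms” appear for \text{U}(1).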

The concepts we have discussed here can be used to derive the theory of quantum electrodynamics simply from requiring that the Lagrangian (from which we can obtain the equations of motion, see also Lagrangians and Hamiltonians) be invariant under \text{U}(1) transformations, i.e. even if we change the “phase” of the wave function at every point the Lagrangian remains the same. This is an example of what is known as gauge symmetry. Generalized to other groups such as \text{SU}(2) and \text{SU}(3), this is the idea behind gauge theories, which include Yang-Mills theory and lead to the standard model of particle physics.

References:

Christoffel Symbols on Wikipedia

Riemannian Curvature Tensor on Wikipedia

Einstein Field Equations on Wikipedia

Gauge Theory on Wikipedia

Riemann Tensor for Surface of a Sphere on Physics Pages

Ricci Tensor and Curvature Scalar for a Sphere on Physics Pages

Spacetime and Geometry by Sean Carroll

Geometry, Topology, and Physics by Mikio Nakahara

Introduction to Elementary Particle Physics by David J. Griffiths

Introduction to Quantum Field Theory by Michael Peskin and Daniel V. Schroeder

The Riemann Hypothesis for Curves over Finite Fields

The Riemann hypothesis is one of the most famous open problems in mathematics. Not only is there a million dollar prize currently being offered by the Clay Mathematics Institute for its solution, it also has a very long and interesting history spanning over a century and a half. It is part of many famous “lists” of open problems such as the famous 23 problems of David Hilbert, the 18 problems of Stephen Smale, and the 7 “millennium” problems of the aforementioned Clay Mathematics Institute.

The attention and reverence given to the Riemann hypothesis by the mathematical community is not without good reason. The problem originated in the paper “On the Number of Primes Less Than a Given Magnitude” by the mathematician Bernhard Riemann, where he applied the recently developed theory of complex analysis to number theory, in particular to come up with a formula for the function \pi(x) that counts the number of prime numbers less than x. The zeroes of the Riemann zeta function figure into the formula for this “prime counting function” \pi(x), and the Riemann hypothesis is a conjecture that concerns these zeroes. Aside from the knowledge about the prime numbers that a solution of the Riemann hypothesis will give us, it is hoped that efforts toward this solution will lead to developments in mathematics that may be of interest to us for reasons much bigger than, and perhaps outside of, the original motivations.

In the 1940s, the mathematician André Weil solved a version of the Riemann hypothesis, which applies to the Riemann zeta function over finite fields. The ideas that Weil developed for solving this version of the Riemann hypothesis have led to many important developments in modern mathematics, whose applications are not limited to the original problem only. It is these ideas that we discuss in this post. But before we can give the statement of the Riemann hypothesis over finite fields (which is almost identical to that of the original Riemann hypothesis), we first review some concepts regarding zeta functions.

We have discussed zeta functions before in Zeta Functions and L-Functions. We recall that the Riemann zeta function is given by the formula

\displaystyle \zeta(s)=\sum_{n=1}^{\infty}\frac{1}{n^{s}}

or, in Euler product form,

\displaystyle \zeta(s)=\prod_{p}\frac{1}{1-p^{-s}}.
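To see that the two forms agree, at least numerically, we can truncate both the sum and the product and compare them at s=2, where the value of the zeta function is known to be \frac{\pi^{2}}{6}. The following Python sketch does this; the cutoffs and the function names are arbitrary choices made for illustration.

```python
import math


def zeta_sum(s, terms):
    """Partial sum of 1/n^s for n = 1, ..., terms."""
    return sum(1.0 / n**s for n in range(1, terms + 1))


def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes p <= limit."""
    sieve = [True] * (limit + 1)
    sieve[0:2] = [False, False]
    for p in range(2, int(limit**0.5) + 1):
        if sieve[p]:
            for multiple in range(p * p, limit + 1, p):
                sieve[multiple] = False
    return [p for p, is_prime in enumerate(sieve) if is_prime]


def zeta_euler(s, limit):
    """Partial Euler product over the primes p <= limit."""
    product = 1.0
    for p in primes_up_to(limit):
        product *= 1.0 / (1.0 - p ** (-s))
    return product


print(zeta_sum(2, 10000))    # truncated sum
print(zeta_euler(2, 10000))  # truncated Euler product
print(math.pi**2 / 6)        # exact value of zeta(2)
```

Both truncations approach the same value from below as the cutoffs are increased.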

We now generalize the Riemann zeta function to any finitely generated ring \mathcal{O}_{K} with field of fractions K by writing it in the following form (this zeta function \zeta(K,s) is also called the arithmetic zeta function):

\displaystyle \zeta(K,s)=\prod_{\mathfrak{m}}\frac{1}{1-(\# \mathcal{O}_{K}/\mathfrak{m})^{-s}}

where \mathfrak{m} runs over all the maximal ideals of the ring \mathcal{O}_{K}, \mathcal{O}_{K}/\mathfrak{m} is the residue field, and the expression \#\mathcal{O}_{K}/\mathfrak{m} stands for the number of elements of this residue field. In the case that \mathcal{O}_{K}=\mathbb{Z}, we get back our usual expression for the Riemann zeta function in its Euler product form, which we have written above, since the maximal ideals of \mathbb{Z} are the principal ideals (p) generated by the prime numbers, and the residue fields \mathbb{Z}/(p) are the fields \{0,1,...,p-1\}, therefore the number \# \mathbb{Z}/(p) is equal to p.

Next we discuss finite fields. All finite fields have a number of elements equal to some positive power of a prime number p; if this number is equal to q=p^{n}, we write the finite field as \mathbb{F}_{q} or \mathbb{F}_{p^{n}}. In the case that n=1, then \mathbb{F}_{q}=\mathbb{F}_{p} is isomorphic to \mathbb{Z}/p\mathbb{Z}.

Let C be a nonsingular projective curve defined over the finite field \mathbb{F}_{q}. “Nonsingular” roughly refers to the curve being “smooth” or “differentiable”; “projective” roughly means that the curve is part, or a subset, of some projective space. We will not be dwelling too much on these technicalities in this post. “Defined over the finite field \mathbb{F}_{q}” means that the polynomial equation that defines the curve has coefficients which are elements of the finite field \mathbb{F}_{q}. We know that in algebraic geometry (see Basics of Algebraic Geometry), the points of a curve (or more general varieties) correspond to maximal ideals of a “ring of functions” \mathcal{O}_{K} on the curve C. For a point P on a curve over a finite field \mathbb{F}_{q}, the residue field \mathcal{O}_{K}/\mathfrak{m}, where \mathfrak{m} is the maximal ideal corresponding to P, is also a finite field of the form \mathbb{F}_{q^{m}}. The number m is called the degree of P and written \text{deg}(P), and we now define another zeta function (also called the local zeta function and written Z(C,t)) via the following formula:

\displaystyle Z(C,t)=\prod_{P\in C}\frac{1}{1-t^{\text{deg}(P)}}

or equivalently,

\displaystyle Z(C,t)=\prod_{\mathfrak{m}}\frac{1}{1-t^{\text{deg}(\mathfrak{m})}}.

Note that this zeta function Z(C,t) is related to the other zeta function \zeta(K,s) by the following relation:

\displaystyle \zeta(K,s)=Z(C,q^{-s}).

Next we take the “logarithm” of the zeta function Z(C,t). Using the familiar rules for taking the logarithms of products, we will obtain

\displaystyle \text{log}(Z(C,t))=\text{log}\bigg(\prod_{\mathfrak{m}}\frac{1}{1-t^{\text{deg}(\mathfrak{m})}}\bigg)

\displaystyle \text{log}(Z(C,t))=\sum_{\mathfrak{m}}\text{log}\bigg(\frac{1}{1-t^{\text{deg}(\mathfrak{m})}}\bigg)

\displaystyle \text{log}(Z(C,t))=-\sum_{\mathfrak{m}}\text{log}\bigg(1-t^{\text{deg}(\mathfrak{m})}\bigg)

Next we will need the following series expansion for logarithms:

\displaystyle \text{log}(1-a)=-\sum_{k=1}^{\infty}\frac{a^{k}}{k}.

This allows us to write the logarithm of the zeta function as follows:

\displaystyle \text{log}(Z(C,t))=\sum_{\mathfrak{m}}\sum_{k=1}^{\infty}\frac{(t^{\text{deg}(\mathfrak{m})})^{k}}{k}

\displaystyle \text{log}(Z(C,t))=\sum_{\mathfrak{m}}\sum_{k=1}^{\infty}\frac{(t^{\text{deg}(\mathfrak{m})})^{k}}{k\text{deg}(\mathfrak{m})}\text{deg}(\mathfrak{m})

We can condense this expression by writing

\displaystyle \text{log}(Z(C,t))=\sum_{n=1}^{\infty}N_{n}\frac{t^{n}}{n}

where

\displaystyle N_{n}=\sum_{d|n}d(\#\{\mathfrak{m}\subset \mathcal{O}_{K}|\text{deg}(\mathfrak{m})=d\}).

The expression d|n means “n is divisible by d”, or “d divides n”, which means that the sum is taken over all d that divide n.

The numbers N_{n} can be thought of as the number of points on the curve C whose coordinates are elements of the finite field \mathbb{F}_{q^{n}}. In fact, we can actually define the zeta function Z(C,t) starting with the numbers N_{n}, i.e.

\displaystyle Z(C,t)=\text{exp}\bigg(\sum_{n=1}^{\infty}N_{n}\frac{t^{n}}{n}\bigg)

but we chose to start from the more familiar Riemann zeta function \zeta(s) and generalize to get the form we want for curves over finite fields.
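The simplest example is the projective line over \mathbb{F}_{q}, which is not discussed above but makes for a convenient check. Its points of degree d correspond to the monic irreducible polynomials of degree d over \mathbb{F}_{q}, of which there are \frac{1}{d}\sum_{e|d}\mu(d/e)q^{e} (where \mu is the Möbius function), together with one extra point of degree 1 “at infinity”. The following Python sketch, with hypothetical function names and sample values, uses the formula for N_{n} above to confirm that N_{n}=q^{n}+1, and that the exponential form of the zeta function then agrees numerically with \frac{1}{(1-t)(1-qt)}.

```python
import math


def mobius(n):
    """Moebius function mu(n), computed by trial division."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0  # n has a squared prime factor
            result = -result
        p += 1
    if n > 1:
        result = -result
    return result


def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]


def points_of_degree(q, d):
    """Number of points of degree d on the projective line over F_q."""
    irreducibles = sum(mobius(d // e) * q**e for e in divisors(d)) // d
    return irreducibles + (1 if d == 1 else 0)  # add the point at infinity


def point_count(q, n):
    """N_n = sum over d | n of d * (number of points of degree d)."""
    return sum(d * points_of_degree(q, d) for d in divisors(n))


def zeta_from_counts(q, t, terms=60):
    """exp(sum of N_n t^n / n), truncated; assumes |q t| < 1 for convergence."""
    return math.exp(sum(point_count(q, n) * t**n / n for n in range(1, terms + 1)))


q, t = 3, 0.1
print([point_count(q, n) for n in range(1, 5)])  # [4, 10, 28, 82]
print(zeta_from_counts(q, t), 1.0 / ((1 - t) * (1 - q * t)))
```

In the language of Assumption 1 further on, this is the case g=0, where there are no \alpha_{i} at all.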

We recall that the zeroes of a function f(z) are those z_{i} such that f(z_{i})=0.

We can now give the statement of the Riemann hypothesis for curves over finite fields:

The zeroes of the zeta function \zeta(K,s)=Z(C,q^{-s}) all have real part equal to \frac{1}{2}.

We will not discuss the entirety of Weil’s proof in this post, although the reader may consult the references provided for such a discussion. Instead we will give a rough overview of Weil’s strategy, which rests on three important assumptions. We will show, roughly, how these assumptions lead to the proof of the Riemann hypothesis, and although we will not prove the assumptions themselves, we will also give a kind of preview of the ideas involved in their respective proofs. It is these ideas, which may now be considered to have developed into entire areas of research in themselves, which are perhaps the most enduring legacy of Weil’s proof.

Assumption 1 (Rationality): The zeta function Z(C,t) can be written in the following form:

\displaystyle Z(C,t)=\frac{\prod_{i=1}^{2g}(1-\alpha_{i}t)}{(1-t)(1-qt)}

Given that this assumption holds, we can take the logarithm of the above expression,

\displaystyle \text{log}(Z(C,t))=\text{log}\bigg(\frac{\prod_{i=1}^{2g}(1-\alpha_{i}t)}{(1-t)(1-qt)}\bigg)

\displaystyle \text{log}(Z(C,t))=\sum_{i=1}^{2g}\text{log}(1-\alpha_{i}t)-\text{log}(1-t)-\text{log}(1-qt)

and we can then apply the series expansion for the logarithm that we have applied earlier to obtain the following expression,

\displaystyle \text{log}(Z(C,t))=\sum_{n=1}^{\infty}(-\sum_{i=1}^{2g}\alpha_{i}^{n}+1+q^{n})\frac{t^{n}}{n}

which we can now compare to the expression we obtained earlier for \text{log}(Z(C,t)) in terms of the number N_{n} of points with coordinates in \mathbb{F}_{q^{n}}:

\displaystyle \sum_{n=1}^{\infty}(-\sum_{i=1}^{2g}\alpha_{i}^{n}+1+q^{n})\frac{t^{n}}{n}=\sum_{n=1}^{\infty}N_{n}\frac{t^{n}}{n}.

Comparing the coefficients of \frac{t^{n}}{n}, we obtain, for each n,

\displaystyle -\sum_{i=1}^{2g}\alpha_{i}^{n}+1+q^{n}=N_{n}.

With a little algebraic manipulation we have

\displaystyle -\sum_{i=1}^{2g}\alpha_{i}^{n}=N_{n}-q^{n}-1

and taking the absolute value of both sides gives us

\displaystyle |\sum_{i=1}^{2g}\alpha_{i}^{n}|=|N_{n}-q^{n}-1|

Assumption 2 (Hasse-Weil Inequality):

\displaystyle |N_{n}-q^{n}-1|\leq 2gq^{\frac{n}{2}}

This assumption, together with the earlier discussion, means that

\displaystyle |\sum_{i=1}^{2g}\alpha_{i}^{n}|\leq 2gq^{\frac{n}{2}}
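The Hasse-Weil inequality of Assumption 2 can be checked by brute force in small cases. The following Python sketch does so for n=1 and an elliptic curve, for which g=1. The curve y^{2}=x^{3}+x+1 and the list of primes are arbitrary choices made for illustration (this curve is nonsingular for every prime other than 2 and 31), and we only look at n=1 because counting points over \mathbb{F}_{p^{n}} for larger n would require arithmetic in extension fields.

```python
import math


def count_points(a, b, p):
    """Number of points of y^2 = x^3 + a x + b over F_p, including the point at infinity."""
    # square_roots[r] is the number of y in F_p with y^2 = r.
    square_roots = [0] * p
    for y in range(p):
        square_roots[(y * y) % p] += 1
    affine = sum(square_roots[(x**3 + a * x + b) % p] for x in range(p))
    return affine + 1  # the single projective point at infinity


def satisfies_hasse_weil(a, b, p, genus=1):
    """Check |N_1 - p - 1| <= 2 g sqrt(p)."""
    n_1 = count_points(a, b, p)
    return abs(n_1 - p - 1) <= 2 * genus * math.sqrt(p)


for p in [5, 7, 11, 13, 101]:
    print(p, count_points(1, 1, p), satisfies_hasse_weil(1, 1, p))
```

For p=5, for example, the count is 9, and indeed |9-5-1|=3 is less than 2\sqrt{5}.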

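We can then make use of the following expansion, which is a sum of geometric series in a variable t:

\displaystyle \sum_{i=1}^{2g}\frac{1}{1-\alpha_{i}t}=\sum_{n=0}^{\infty}(\sum_{i=1}^{2g}\alpha_{i}^{n})t^{n}

By the bound on |\sum_{i=1}^{2g}\alpha_{i}^{n}| obtained from Assumption 2, the right-hand side converges whenever |t|<q^{-\frac{1}{2}}, so the left-hand side cannot have a pole in that region. Since its poles are at t=\frac{1}{\alpha_{i}}, this in turn implies that

|\alpha_{i}|\leq q^{\frac{1}{2}}    for all i from 1 to 2g.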

Assumption 3 (Functional Equation):

\displaystyle Z\bigg(C,\frac{1}{qt}\bigg)=q^{1-g}t^{2-2g}Z(C,t)

Given this assumption, and writing the zeta function Z(C,t) explicitly, we have:

\displaystyle \frac{\prod_{i=1}^{2g}(1-\alpha_{i}\frac{1}{qt})}{(1-\frac{1}{qt})(1-q\frac{1}{qt})}=q^{1-g}t^{2-2g}\frac{\prod_{i=1}^{2g}(1-\alpha_{i}t)}{(1-t)(1-qt)}

With a little algebraic manipulation we can obtain the following equation:

\displaystyle q^{g}t^{2g}\prod_{i=1}^{2g}(1-\alpha_{i}\frac{1}{qt})=\prod_{i=1}^{2g}(1-\alpha_{i}t)

Let us write the product explicitly, and make the left side zero by letting t=\frac{\alpha_{1}}{q}:

\displaystyle q^{g}(\frac{\alpha_{1}}{q})^{2g}(0)(1-\alpha_{2}\frac{1}{q}\frac{q}{\alpha_{1}})...(1-\alpha_{2g}\frac{1}{q}\frac{q}{\alpha_{1}})=(1-\alpha_{1}\frac{\alpha_{1}}{q})(1-\alpha_{2}\frac{\alpha_{1}}{q})...(1-\alpha_{2g}\frac{\alpha_{1}}{q})

Now since the left side is zero, the right side also must be zero. Therefore one of the factors in the product must be zero. This means that for some i from 1 to 2g, we have

\displaystyle 1-\alpha_{i}\frac{\alpha_{1}}{q}=0

In other words,

\displaystyle \alpha_{i}\alpha_{1}=q

The same argument applies with any \alpha_{j} in place of \alpha_{1}, therefore more generally we must have

\displaystyle \alpha_{i}\alpha_{j}=q    for every j from 1 to 2g and some i (depending on j).

If we combine this result with our earlier result that

\displaystyle |\alpha_{i}|\leq q^{\frac{1}{2}}    for all i from 1 to 2g,

we get |\alpha_{j}|=\frac{q}{|\alpha_{i}|}\geq q^{\frac{1}{2}} for every j, and this means that

\displaystyle |\alpha_{i}|=q^{\frac{1}{2}}    for all i from 1 to 2g.

With this last result, we know that the zeroes of Z(C,t) must have absolute value equal to q^{-\frac{1}{2}}. Since Z(C,q^{-s})=\zeta(K,s), this implies that the real part of s must be equal to \frac{1}{2}, and this proves the Riemann hypothesis for curves over finite fields. More explicitly, let t_{0}=q^{-s} be a zero of the zeta function Z(C,t). We then have

\displaystyle |t_{0}|=q^{-\frac{1}{2}}

\displaystyle |q^{-s}|=q^{-\frac{1}{2}}

\displaystyle |q^{-(\text{Re}(s)+i\text{Im}(s))}|=q^{-\frac{1}{2}}

\displaystyle q^{-(\text{Re}(s))}=q^{-\frac{1}{2}}

\displaystyle \text{Re}(s)=\frac{1}{2}
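We can watch this conclusion emerge in a small example. For a curve with g=1 there are only two numbers \alpha_{1} and \alpha_{2}; the relation between the \alpha_{i} and N_{n} at n=1 gives \alpha_{1}+\alpha_{2}=q+1-N_{1}, and the pairing obtained from the functional equation gives \alpha_{1}\alpha_{2}=q, so they are the two roots of a quadratic equation. The following Python sketch computes them for the elliptic curve y^{2}=x^{3}+x+1 over \mathbb{F}_{5}; the curve, the prime, and the function names are arbitrary choices made for illustration.

```python
import cmath
import math


def count_points(a, b, p):
    """Number of points of y^2 = x^3 + a x + b over F_p, including the point at infinity."""
    square_roots = [0] * p
    for y in range(p):
        square_roots[(y * y) % p] += 1
    return sum(square_roots[(x**3 + a * x + b) % p] for x in range(p)) + 1


def alpha_roots(a, b, p):
    """Roots of x^2 - (p + 1 - N_1) x + p, i.e. alpha_1 and alpha_2 for g = 1."""
    trace = p + 1 - count_points(a, b, p)
    root = cmath.sqrt(trace * trace - 4 * p)
    return (trace + root) / 2, (trace - root) / 2


def real_part_of_zero(alpha, q):
    """Re(s) for the zero t_0 = 1/alpha of Z(C,t), using |q^{-s}| = q^{-Re(s)}."""
    return math.log(abs(alpha)) / math.log(q)


p = 5
for alpha in alpha_roots(1, 1, p):
    print(alpha, abs(alpha), real_part_of_zero(alpha, p))
```

Both roots should have absolute value \sqrt{5}, and the corresponding zeroes should have real part \frac{1}{2}, up to floating-point error.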

The proof of the rationality of the zeta function Z(C,t) and the functional equation makes use of the theory of divisors (see Divisors and the Picard Group) and a very important theorem in algebraic geometry called the Riemann-Roch theorem. The Riemann-Roch theorem originates from complex analysis, which was a kind of “specialty” of Bernhard Riemann (“On the Number of Primes Less Than a Given Magnitude” was his only paper on number theory, and it concerns the application of complex analysis to number theory). In its original formulation, the Riemann-Roch theorem gives the dimension of the vector space formed by the functions whose zeroes and poles (for a function which can be expressed as the ratio of two polynomials, the poles can be thought of as the zeroes of the denominator), and their “order of vanishing”, are specified. The Riemann-Roch theorem has since been generalized to aspects of algebraic geometry not necessarily directly concerned with complex analysis, and it is this generalization that allows us to make use of it for the case at hand.

In addition to the theory of divisors and the Riemann-Roch theorem, to prove the Hasse-Weil inequality, one must make use of the theory of fixed points, applied to what is known as the Frobenius morphism, which sends a point of C with coordinates a_{i} to the point with coordinates a_{i}^{q}. The theory of fixed points is related to the part of algebraic geometry known as intersection theory. Roughly, given a function f(x), we can think of its fixed points as the values of x for which f(x)=x. One way to obtain these fixed points is to draw the graph of y=x, and the graph of y=f(x), on the xy plane; the fixed points of f(x) are then given by the points where the two graphs intersect.

For the Frobenius morphism, the fixed points correspond to those points whose coordinates are elements of the finite field \mathbb{F}_{q}. Similarly, the fixed points of the n-th power of the Frobenius morphism (which we can think of as the Frobenius morphism applied n times) correspond to those points whose coordinates are elements of the finite field \mathbb{F}_{q^{n}}. Hence we can obtain the numbers N_{n} that go into the expression of the zeta function Z(C,t) using the Frobenius morphism. Combined with results from intersection theory such as the Castelnuovo-Severi inequality and the Hodge index theorem, this allows us to prove the Hasse-Weil inequality.
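To make the counting concrete, here is a rough sketch in Python. It rests on several assumptions that are not taken from the discussion above: the curve is an elliptic curve (genus g=1) written as y^{2}=x^{3}+ax+b, the field is \mathbb{F}_{p} for a prime p (so n=1 only), and the Hasse-Weil inequality is used in the form |N_{1}-(p+1)|\leq 2g\sqrt{p}. The function names are hypothetical.

```python
import math


def frobenius_fixes_all(p: int) -> bool:
    """Check that x -> x^p fixes every element of F_p (p prime)."""
    return all(pow(x, p, p) == x for x in range(p))


def count_points(a: int, b: int, p: int) -> int:
    """Count points of y^2 = x^3 + a*x + b over F_p, plus the point at infinity."""
    # squares[v] = number of y in F_p with y^2 = v
    squares = [0] * p
    for y in range(p):
        squares[(y * y) % p] += 1
    affine = sum(squares[(x ** 3 + a * x + b) % p] for x in range(p))
    return affine + 1  # add the single point at infinity


if __name__ == "__main__":
    a, b = -1, 1  # the curve y^2 = x^3 - x + 1; 4a^3 + 27b^2 = 23
    for p in (5, 7, 11, 13, 17, 19):  # primes other than 2, 3, and 23
        n1 = count_points(a, b, p)
        bound = 2 * math.sqrt(p)  # 2 * g * sqrt(p) with g = 1
        print(p, n1, abs(n1 - (p + 1)) <= bound, frobenius_fixes_all(p))
```

Brute-force counting like this only works for small p; it is meant to illustrate what the numbers N_{n} are, not how the inequality is proved.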

In algebraic geometry, curves are one-dimensional varieties, and just as there is a version of the Riemann hypothesis for curves over finite fields, there is also a version of the Riemann hypothesis for higher-dimensional varieties over finite fields, called the Weil conjectures, since they were proposed by Weil himself after he proved the case for curves. The Weil conjectures themselves follow the important assumptions involved in proving the Riemann hypothesis for curves over finite fields, such as the rationality of the zeta function and the functional equation. In addition, part of the Weil conjectures suggests a connection with the theory of cohomology (see Homology and Cohomology and Cohomology in Algebraic Geometry), which has significant implications for the connections between algebraic geometry and methods originally developed for algebraic topology.

The Weil conjectures were proved by Bernard Dwork, Alexander Grothendieck, and Pierre Deligne. In his efforts to prove the Weil conjectures, Grothendieck developed the notion of topos (see More Category Theory: The Grothendieck Topos), as well as etale cohomology. As a further part of his approach, Grothendieck also proposed conjectures, known as the standard conjectures on algebraic cycles, which remain open to this day. Grothendieck’s student, Pierre Deligne, was able to complete the proof of the Weil conjectures while bypassing the standard conjectures on algebraic cycles, by developing ingenious methods of his own. Still, the standard conjectures on algebraic cycles, as well as the related theory of motives, remain very interesting in their own right and continue to be a subject of modern mathematical research.

References:

Riemann Hypothesis on Wikipedia

Weil Conjectures on Wikipedia

Arithmetic Zeta Function on Wikipedia

Local Zeta Function on Wikipedia

The Weil Conjectures for Curves by Sam Raskin

Algebraic Geometry by Bas Edixhoven and Lenny Taelman

The Riemann Hypothesis over Finite Fields: From Weil to the Present Day by J.S. Milne

Algebraic Geometry by Robin Hartshorne

The Moduli Space of Elliptic Curves

A moduli space is a kind of “parameter space” that “classifies” mathematical objects. Every point of the moduli space stands for a mathematical object, in such a way that mathematical objects which are more similar to each other are closer and those that are more different from each other are farther apart. We may use the notion of equivalence relations (see Modular Arithmetic and Quotient Sets) to assign several objects which are in some sense “isomorphic” to each other to a single point.

We have discussed on this blog before one example of a moduli space – the projective line (see Projective Geometry). Every point on the projective line corresponds to a geometric object, a line through the origin. Two lines which have almost the same value of the slope will be closer on the projective line compared to two lines which are almost perpendicular.

Another example of a moduli space is that for circles on a plane – such a circle is specified by three real numbers, two coordinates for the center and one positive real number for the radius. Therefore the moduli space for circles on a plane will consist of a “half-volume” of some sort, like 3D space except that one coordinate is restricted to be strictly positive. But if we only care about the circles up to “congruence”, we can ignore the coordinates for the center – or we can also think of it as simply sending circles with the same radius to a single point, even if they are centered at different points. This moduli space is just the positive real line. Every point on this moduli space, which is a positive real number, corresponds to all the circles with radius equal to that positive real number.

We now want to construct the moduli space of elliptic curves. In order to do this we will need to first understand the meaning of the following statement:

Over the complex numbers, an elliptic curve is a torus.

We have already seen in Elliptic Curves what an elliptic curve looks like when graphed in the xy plane, where x and y are real numbers. This gives us a look at the points of the elliptic curve whose coordinates are real numbers, or to put it in another way, these are the real numbers x and y which satisfy the equation of the elliptic curve.

When we look at the points of the elliptic curve with complex coordinates, or in other words the complex numbers which satisfy the equation of the elliptic curve, the situation is more complicated. First off, what we actually have is not what we usually think of as a curve, but rather a surface, in the same way that the complex numbers do not form a line like the real numbers do, but instead form a plane. However, even though it is not easy to visualize, there is a function called the Weierstrass elliptic function which provides a correspondence between the (complex) points of an elliptic curve and the points in the “fundamental parallelogram” of a lattice in the complex plane. We can think of “gluing” the opposite sides of this fundamental parallelogram to obtain a torus. This is what we mean when we say that an elliptic curve is a torus. This also means that there is a correspondence between elliptic curves and lattices in the complex plane.

We will discuss more about lattices later on in this post, but first, just in case the preceding discussion seems a little contrived, we elaborate a bit on the Weierstrass elliptic function. We must first discuss the concept of a holomorphic function. We have discussed in An Intuitive Introduction to Calculus the concept of the derivative of a function. Now not all functions have derivatives that exist at all points; in the case that the derivative of the function does exist at all points, we refer to the function as a differentiable function.

The concept of a holomorphic function in complex analysis (analysis is the term usually used in modern mathematics to refer to calculus and its related subjects) is akin to the concept of a differentiable function in real analysis. The derivative is defined as the limit of a certain ratio as the numerator and the denominator both approach zero; on the real line, there are limited ways in which these quantities can approach zero, but on the complex plane, they can approach zero from several different directions; for a function to be holomorphic, the expression for its derivative must remain the same regardless of the direction by which we approach zero.

In previous posts on topology on this blog we have been treating two different topological spaces as essentially the same whenever we can find a bijective continuous function whose inverse is also continuous (also known as a homeomorphism) between them; similarly, we have been treating different algebraic structures such as groups, rings, modules, and vector spaces as essentially the same whenever we can find a bijective homomorphism (an isomorphism) between two such structures. Following these ideas and applying them to complex analysis, we may treat two spaces as essentially the same if we can find a bijective holomorphic function between them.

The Weierstrass elliptic function is not quite holomorphic, but is meromorphic – this means that it would have been holomorphic everywhere if not for the “lattice points” where there exist “poles”. But it is alright for us, because such a lattice point is to be mapped to the “point at infinity”. All in all, this allows us to think of the complex points of the elliptic curve as being essentially the same as a torus, following the ideas discussed in the preceding paragraph.

Moreover, the torus has a group structure of its own, considered as the direct product group \text{U}(1)\times\text{U}(1) where \text{U}(1) is the group of complex numbers of magnitude equal to 1 with the law of composition given by the multiplication of complex numbers. When the complex points of the elliptic curve get mapped by the Weierstrass elliptic function to the points of the torus, the group structure provided by the “tangent and chord” or “tangent and secant” construction becomes the group structure of the torus. In other words, the Weierstrass elliptic function provides us with a group isomorphism.

All this discussion means that the study of elliptic curves becomes the study of lattices in the complex plane. Therefore, what we want to construct is the moduli space of lattices in the complex plane, up to a certain equivalence relation – two lattices are to be considered equivalent if one can be obtained by multiplying the other by a nonzero complex number (this equivalence relation is called homothety). Going back to elliptic curves, this corresponds to an isomorphism of elliptic curves in the sense of algebraic geometry.

Now given two complex numbers \omega_{1} and \omega_{2}, a lattice \Lambda in the complex plane is given by

\Lambda=\{m\omega_{1}+n\omega_{2}|m,n\in\mathbb{Z}\}

For example, setting \omega_{1}=1 and \omega_{2}=i gives a “square” lattice. This lattice is also the set of all Gaussian integers. The fundamental parallelogram is the parallelogram formed by the vertices 0, \omega_{1}, \omega_{2}, and \omega_{1}+\omega_{2}. Here is an example of a lattice, courtesy of Alvaro Lozano-Robledo:

[Image: a lattice in the complex plane and its fundamental parallelogram]

The fundamental parallelogram is in blue. Here is another, courtesy of Sam Derbyshire:

[Image: a lattice and its torsion points]

Because we only care about lattices up to homothety, we can “rescale” the lattice by multiplying it with a complex number equal to \frac{1}{\omega_{1}}, so that we have a new lattice equivalent under homothety to the old one, given by

\Lambda=\{m+n\tau|m,n\in\mathbb{Z}\}

where

\displaystyle \tau=\frac{\omega_{2}}{\omega_{1}}.

We can always interchange \omega_{1} and \omega_{2}, but we will fix our convention so that the complex number \tau=\frac{\omega_{2}}{\omega_{1}}, when written in polar form \tau=re^{i\theta}, always has an angle \theta strictly between 0 and 180 degrees. If we cannot obtain this using our choice of \omega_{1} and \omega_{2}, then we switch the two.
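This rescaling and the switching convention can be sketched in a few lines of Python using the built-in complex type. The function name tau_from_basis is hypothetical, and the check that the ratio is not real (so that the two numbers really span a lattice) is an added assumption.

```python
def tau_from_basis(w1: complex, w2: complex) -> complex:
    """Return tau = w2/w1, switching w1 and w2 if needed so that Im(tau) > 0."""
    if w1 == 0 or w2 == 0:
        raise ValueError("basis elements must be nonzero")
    tau = w2 / w1
    if tau.imag == 0:
        # A real ratio means w1 and w2 lie on one line through the origin.
        raise ValueError("w1 and w2 do not span a lattice")
    if tau.imag < 0:
        # Switching the two basis elements replaces tau by 1/tau,
        # which flips the sign of the imaginary part.
        tau = w1 / w2
    return tau


if __name__ == "__main__":
    print(tau_from_basis(1, 1j))       # the square lattice
    print(tau_from_basis(1j, 1))       # same lattice, basis switched
    print(tau_from_basis(2 + 0j, 1 + 3j))
```

Both orderings of the basis of the square lattice give a value in the upper half plane, as the convention requires.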

Now what this means is that a complex number \tau, which lies in the upper half plane \mathbb{H}=\{z\in \mathbb{C}|\text{Im}(z)>0\} because of our convention in choosing \omega_{1} and \omega_{2}, uniquely specifies a homothety class of lattices \Lambda. However, a homothety class of lattices may not always uniquely specify such a complex number \tau. Several such complex numbers may refer to the same homothety class of lattices.

What \omega_{1} and \omega_{2} specify is a choice of basis (see More on Vector Spaces and Modules) for the lattice \Lambda; we may choose several different bases to refer to the same lattice. Hence, the upper half plane is not yet the moduli space of all lattices in the complex plane (up to homothety); instead it is an example of what is called a Teichmuller space. To obtain the moduli space from the Teichmuller space, we need to figure out when two different bases specify lattices that are homothetic.

We will just write down the answer here; two complex numbers \tau and \tau' refer to homothetic lattices if there exists the following relation between them:

\displaystyle \tau'=\frac{a\tau+b}{c\tau+d}

for integers a, b, c, and d satisfying the identity

\displaystyle ad-bc=1.

We can “encode” this information into a 2\times 2 matrix (see Matrices) which is an element of the group (see Groups) called \text{SL}(2,\mathbb{Z}). It is the group of 2\times 2 matrices with integer entries and determinant equal to 1. Actually, the matrix with entries a, b, c, and d and the matrix with entries -a, -b, -c, and -d specify the same transformation; therefore what we actually want is the group called \text{PSL}(2,\mathbb{Z}), also known as the modular group, and also written \Gamma(1), obtained from the group \text{SL}(2,\mathbb{Z}) by considering two matrices to be equivalent if one is the negative of the other.
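As a quick illustration, the action of such a matrix on \tau can be written as a short Python function. This is only a sketch; the name mobius and the representation of a matrix as a pair of pairs are assumptions made for the example.

```python
def mobius(matrix, tau: complex) -> complex:
    """Apply tau -> (a*tau + b) / (c*tau + d) for an integer matrix of determinant 1."""
    (a, b), (c, d) = matrix
    if a * d - b * c != 1:
        raise ValueError("matrix must have determinant 1")
    return (a * tau + b) / (c * tau + d)


if __name__ == "__main__":
    tau = 0.3 + 1.2j
    m = ((2, 1), (1, 1))          # ad - bc = 2*1 - 1*1 = 1
    minus_m = ((-2, -1), (-1, -1))
    # A matrix and its negative give the same transformation.
    print(mobius(m, tau), mobius(minus_m, tau))
    # The image stays in the upper half plane.
    print(mobius(m, tau).imag > 0)
```

The sign cancels between numerator and denominator, which is why only the class of the matrix in \text{PSL}(2,\mathbb{Z}) matters.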

We now have the moduli space that we want – we start with the upper half plane \mathbb{H}, and then we identify two points if we can map one point into the other via the action of an element of the modular group, as we have discussed earlier. In technical language, we say that they belong to the same orbit. We can write our moduli space as \Gamma(1)\backslash\mathbb{H} (the notation means that the group \Gamma(1) acts on \mathbb{H} “on the left”).

When dealing with quotient sets, which are sets of equivalence classes, we have seen in Modular Arithmetic and Quotient Sets that we can choose from an equivalence class one element to serve as the “representative” of this equivalence class. For our moduli space \Gamma(1)\backslash\mathbb{H}, we can choose for the representative of an equivalence class a point from the “fundamental domain” for the modular group. Any point on the upper half plane can be obtained by acting on a point from the fundamental domain with an element of the modular group. The following diagram, courtesy of user Fropuff on Wikipedia, shows the fundamental domain in gray:

[Image: the fundamental domain of the modular group in the upper half plane]

The other parts of the diagram show where the fundamental domain gets mapped to by certain special elements, in particular the “generators” of the modular group, which are the two elements where a=0, b=-1, c=1, and d=0, and a=1, b=1, c=0, and d=1. We will not discuss too much of these concepts for now. Instead we will give a preview of some concepts related to this moduli space. Topologically, this moduli space looks like a sphere with a missing point; in order to make the moduli space into a sphere (topologically), we take the union of the upper half plane \mathbb{H} with the projective line (see Projective Geometry) \mathbb{P}^{1}(\mathbb{Q}). This projective line may be thought of as the set of all rational numbers \mathbb{Q} together with a “point at infinity.” The modular group also acts on this projective line, so we can now take the quotient of \mathbb{H}\cup\mathbb{P}^{1}(\mathbb{Q}) (denoted \mathbb{H}^{*}) by the same equivalence relation as earlier; this new space, topologically equivalent to the sphere, is called the modular curve X(1).
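The statement that every point of the upper half plane can be moved into the fundamental domain can be turned into a small procedure. The sketch below assumes the standard choices, which are not spelled out above: the generators act as \tau\mapsto -1/\tau and \tau\mapsto\tau+1, and the fundamental domain is the region with |\text{Re}(\tau)|\leq\frac{1}{2} and |\tau|\geq 1. The function name and the floating point tolerance are hypothetical.

```python
def reduce_to_fundamental_domain(tau: complex, max_steps: int = 10_000) -> complex:
    """Move tau into the region |Re(tau)| <= 1/2, |tau| >= 1 using the two generators."""
    if tau.imag <= 0:
        raise ValueError("tau must lie in the upper half plane")
    eps = 1e-12  # tolerance for floating point comparisons
    for _ in range(max_steps):
        # Apply tau -> tau + 1 (or its inverse) repeatedly: shift the real
        # part by an integer so that it lands in [-1/2, 1/2].
        tau = complex(tau.real - round(tau.real), tau.imag)
        if abs(tau) >= 1 - eps:
            return tau
        # Apply tau -> -1/tau, which increases the imaginary part
        # whenever |tau| < 1.
        tau = -1 / tau
    raise RuntimeError("reduction did not finish within max_steps")


if __name__ == "__main__":
    for start in (0.3 + 0.01j, 5 + 1j, -2.7 + 0.4j):
        print(start, "->", reduce_to_fundamental_domain(start))
```

The input and the output of this function describe homothetic lattices, since they differ by an element of the modular group.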

The functions and “differential forms” on the modular curve X(1) are of special interest. They can be obtained from functions on the upper half plane (with the “point at infinity”) satisfying certain conditions related to the modular group. If they are holomorphic everywhere, including the “point at infinity”, they are called modular forms. Modular forms are an interesting object of study in themselves, and their generalizations, automorphic forms, are a very active part of modern mathematical research.

References:

Moduli Space on Wikipedia

Elliptic Curve on Wikipedia

Weierstrass’s Elliptic Functions on Wikipedia

Fundamental Pair of Periods on Wikipedia

Modular Group on Wikipedia

Fundamental Domain on Wikipedia

Modular Form on Wikipedia

Automorphic Form on Wikipedia

Image by Alvaro Lozano-Robledo on Wikipedia

Image by Sam Derbyshire on Wikipedia

Image by User Fropuff of Wikipedia

Advanced Topics in the Arithmetic of Elliptic Curves by Joseph H. Silverman

A First Course in Modular Forms by Fred Diamond and Jerry Shurman

Elliptic Curves

An elliptic curve (not to be confused with an ellipse) is a certain kind of polynomial equation which can usually be expressed in the form

\displaystyle y^{2}=x^{3}+ax+b

where a and b are numbers (more precisely, elements of some field) which satisfy the condition that the quantity

\displaystyle 4a^{3}+27b^{2}

is not equal to zero. This is not the most general form of an elliptic curve, as it will not hold for coefficients of “finite characteristic” equal to 2 or 3; however, for our present purposes, this definition will suffice.

Examples of elliptic curves are the following:

\displaystyle y^{2}=x^{3}-x

\displaystyle y^{2}=x^{3}-x+1

which, for real x and y may be graphed in the “Cartesian” or “xy” plane as follows (image courtesy of user YassineMrabet of Wikipedia):

[Image: graphs of the two elliptic curves in the xy plane]

This rather simple mathematical object has very interesting properties which make it a central object of study in many areas of modern mathematical research.

In this post we focus mainly on one of these many interesting properties, which is the following:

The points of an elliptic curve form a group.

A group is a set with a law of composition which is associative, and the set contains an “identity element” under this law of composition, and every element of this set has an “inverse” (see Groups). Now this law of composition applies whether the points of the elliptic curve have rational numbers, real numbers, or complex numbers for coordinates, and it is always given by the same formula. It is perhaps most visible if we consider real numbers, since in that case we can plot it on the xy plane as we have done earlier. The law of composition is also often called the “tangent and chord” or “tangent and secant” construction.

We now expound on this construction. Given two points P and Q on the elliptic curve, we draw a line passing through both of them. In most cases, this line will pass through another point R on the curve. Then we draw a vertical line that passes through the point R. This vertical line will pass through another point R' on the curve. This gives us the law of composition of the points of the elliptic curve, and we write P+Q=R'. Here is an image courtesy of user SuperManu of Wikipedia:

[Image: four cases of the “tangent and chord” construction on an elliptic curve]

The usual case that we have described is on the left; the other three images show other different cases where the line drawn does not necessarily go through three points. This happens, for example, when the line is tangent to the curve at some point Q, as in the second picture; in this case, we think of the line as passing through Q twice. Therefore, when we compute P+Q, the third point is Q itself, and it is through Q that we draw our vertical line to locate Q', which is equal to P+Q.

The second picture also shows another computation, that of Q+Q, or 2Q. Again, since this necessitates taking a line that passes through the point Q twice, this means that the line must be tangent to the elliptic curve at Q. The third point that it passes through is the point P, and we draw the vertical line through P to find the point P', which is equal to 2Q.

Now we discuss the case described by the third picture, where the line going through the two points P and Q which we want to “add” is a vertical line. To explain what happens, we need the notion of a “point at infinity” (see Projective Geometry). We write the point at infinity as 0, expressing the idea that it is the identity element of our group. We cannot find this point at infinity in the xy plane, but we can think of it as the third point that the vertical line passes through aside from P and Q. In this case, of course, there is no need to draw another vertical line – we simply write P+Q=0.

Finally we come to the case described by the fourth picture; this is simply a combination of the earlier cases we have described above. The vertical line is tangent to the curve at the point P, so we can think of it as passing through P twice, and the third point it passes through is the point at infinity 0, so we can write 2P=0.

We will not prove explicitly that the points form a group under this law of composition, i.e. that the conditions for a set to form a group are satisfied by our procedure, but it is an interesting exercise to attempt to do so; readers may try it out for themselves or consult the references provided at the end of the post. It is worth mentioning that our group is also an abelian group, i.e. we have P+Q=Q+P, and hence we have written our law of composition “additively”.

Now, to make the group law apply even when x and y are not real numbers, we need to write this procedure algebraically. This is a very powerful approach, since this allows us to operate with mathematical concepts even when we cannot visualize them.

Let x_{P} and y_{P} be the x and y coordinates of a point P, and let x_{Q} and y_{Q} be the x and y coordinates of another point Q. Let

\displaystyle m=\frac{y_{Q}-y_{P}}{x_{Q}-x_{P}}

be the slope of the line that connects the points P and Q. Then the point P+Q has x and y coordinates given by the following formulas:

\displaystyle x_{P+Q}=m^{2}-x_{P}-x_{Q}

\displaystyle y_{P+Q}=-y_{P}-m(x_{P+Q}-x_{P})

In the case that Q is the same point as P, then we define the slope of the tangent line to the elliptic curve at the point P using the formula

\displaystyle m=\frac{3x_{P}^{2}+a}{2y_{P}}

where a is the coefficient of x in the formula of the elliptic curve, i.e.

\displaystyle y^{2}=x^{3}+ax+b.

Then the x and y coordinates of the point 2P are given by the same formulas as above, appropriately modified to reflect the fact that now the points P and Q are the same:

\displaystyle x_{2P}=m^{2}-2x_{P}

\displaystyle y_{2P}=-y_{P}-m(x_{2P}-x_{P})

This covers the first two cases in the image above; for the third case, when P and Q are distinct points with x_{P}=x_{Q} and y_{P}=-y_{Q}, we simply set P+Q=0. For the fourth case, when P and Q refer to the same point, and y_{P}=0, we set 2P=0. The point at infinity itself can be treated as a mere point and play into our computations, by setting P+0=P, reflecting its role as the identity element of the group.
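The formulas and special cases above can be collected into one short function. The following is a sketch in Python using exact rational arithmetic from the standard fractions module; representing the point at infinity by None and the names INFINITY, point, on_curve, and add are choices made for this example only.

```python
from fractions import Fraction

INFINITY = None  # stands for the point at infinity, the identity element 0


def point(x, y):
    """Build a point with exact rational coordinates."""
    return (Fraction(x), Fraction(y))


def on_curve(P, a, b):
    """Check whether P satisfies y^2 = x^3 + a*x + b."""
    if P is INFINITY:
        return True
    x, y = P
    return y * y == x ** 3 + a * x + b


def add(P, Q, a):
    """Add two points of y^2 = x^3 + a*x + b using the tangent and chord formulas."""
    if P is INFINITY:
        return Q
    if Q is INFINITY:
        return P
    xP, yP = P
    xQ, yQ = Q
    if xP == xQ and yP == -yQ:
        # Vertical line: covers P + Q = 0 and also 2P = 0 when y_P = 0.
        return INFINITY
    if P == Q:
        m = (3 * xP * xP + a) / (2 * yP)  # slope of the tangent line
    else:
        m = (yQ - yP) / (xQ - xP)          # slope of the chord
    x = m * m - xP - xQ
    y = -yP - m * (x - xP)
    return (x, y)


if __name__ == "__main__":
    a, b = -1, 1  # the curve y^2 = x^3 - x + 1
    P, Q = point(1, 1), point(0, 1)
    print(add(P, Q, a), on_curve(add(P, Q, a), a, b))
    print(add(P, P, a), on_curve(add(P, P, a), a, b))
```

Because nothing in the function depends on drawing a picture, the same code would work over any field in which the divisions make sense, which is the point of writing the construction algebraically.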

The group structure on the points of elliptic curves has practical applications in cryptography, which is the study of “encrypting” information so that it cannot be deciphered by parties other than the intended recipients, for example in military applications, or when performing financial transactions over the internet.

On the purely mathematical side, the study of the group structure is currently a very active field of research. An important theorem called the Mordell-Weil theorem states that even though there may be an infinite number of points whose coordinates are given by rational numbers (called rational points), these points may all be obtained by performing the “tangent and chord” or “tangent and secant” construction on a finite number of points. In more technical terms, the group of rational points on an elliptic curve is finitely generated.

There is a theorem concerning finitely generated abelian groups stating that any finitely generated abelian group G is isomorphic to the direct sum of r copies of the integers and a finite abelian group called the torsion subgroup of G. The number r is called the rank of G. The famous Birch and Swinnerton-Dyer conjecture, which currently carries a million dollar prize for its proof (or disproof), concerns the rank of the finitely generated abelian group of rational points on an elliptic curve.

Another thing that we can do with elliptic curves is use them to obtain representations of Galois groups (see Galois Groups). A representation of a group G on a vector space V over a field K is a homomorphism from G to GL(V), the group of bijective linear transformations of the vector space V to itself. We know of course from Matrices that linear transformations of vector spaces can always be written as matrices (in our case the matrices must have nonzero determinant to ensure that the linear transformations are bijective). Representation theory allows us to study the objects of abstract algebra using the methods of linear algebra.

To any elliptic curve we can associate a certain algebraic number field (see Algebraic Numbers). The elements of these algebraic number fields are “generated” by the algebraic numbers that provide the coordinates of “p-torsion” points of the elliptic curve, i.e. those points P for which pP=0 for some prime number p.

The set of p-torsion points of the elliptic curve is a 2-dimensional vector space over the finite field \mathbb{Z}/p\mathbb{Z} (see Modular Arithmetic and Quotient Sets), also written as \mathbb{F}_{p}. Among other things this means that we can choose two p-torsion points P and Q of the elliptic curve such that any other p-torsion point can be written as aP+bQ for integers a and b between 0 and p-1. When an element of the Galois group of the algebraic number field generated by the coordinates of the p-torsion points of the elliptic curve permutes the elements of the algebraic number field, it also permutes the p-torsion points of the elliptic curve. This permutation can then be represented by a 2\times 2 matrix with coefficients in \mathbb{F}_{p}.

The connection between Galois groups and elliptic curves is a concept that is central to many developments and open problems in mathematics. It plays a part, for example, in the proof of the famous problem called Fermat’s Last Theorem. It is also related to the open problem called the Kronecker Jugendtraum (which is German for Kronecker’s Childhood Dream, and named after the mathematician Leopold Kronecker), also known as Hilbert’s Twelfth Problem, which seeks a procedure for obtaining all field extensions of algebraic number fields whose Galois group is an abelian group. This problem has been solved only in the special case of imaginary quadratic fields, and the solution involves special kinds of “symmetries” of elliptic curves called complex multiplication (not to be confused with the multiplication of complex numbers). David Hilbert, who is one of the most revered mathematicians in history, is said to have referred to the theory of complex multiplication as “…not only the most beautiful part of mathematics but of all science.”

References:

Elliptic Curve on Wikipedia

Mordell-Weil Theorem on Wikipedia

Birch and Swinnerton-Dyer Conjecture on Wikipedia

Wiles’ Proof of Fermat’s Last Theorem on Wikipedia

Hilbert’s Twelfth Problem on Wikipedia

Complex Multiplication on Wikipedia

Image by User YassineMrabet of Wikipedia

Image by User SuperManu of Wikipedia

Fearless Symmetry: Exposing the Hidden Patterns of Numbers by Avner Ash and Robert Gross

Elliptic Tales: Curves, Counting, and Number Theory by Avner Ash and Robert Gross

Rational Points on Elliptic Curves by Joseph H. Silverman

Basics of Arithmetic Geometry

Here is a mathematics problem well-known since ancient times: Find integers a, b, and c that solve the famous equation in the Pythagorean theorem,

\displaystyle a^{2}+b^{2}=c^{2}

Examples are a=3, b=4, c=5, and a=5, b=12, c=13 (a and b are of course interchangeable).

The general solution was already known to the ancient Greek mathematician Euclid. Let m and n be integers; then a, b, and c are given by

\displaystyle a=m^{2}-n^{2}
\displaystyle b=2mn
\displaystyle c=m^{2}+n^{2}

Direct substitution and a little algebra show that these values satisfy the equation.
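The substitution can also be checked mechanically. Here is a minimal Python sketch, taking a=m^{2}-n^{2}, b=2mn, and c=m^{2}+n^{2}; the function name euclid_triple is hypothetical.

```python
def euclid_triple(m: int, n: int):
    """Return (a, b, c) with a = m^2 - n^2, b = 2mn, c = m^2 + n^2."""
    return (m * m - n * n, 2 * m * n, m * m + n * n)


if __name__ == "__main__":
    for m in range(2, 5):
        for n in range(1, m):
            a, b, c = euclid_triple(m, n)
            # Each line should end with True.
            print((m, n), (a, b, c), a * a + b * b == c * c)
```

Taking m=2, n=1 and m=3, n=2 recovers the two examples given earlier.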

Now for some geometry. If we divide both sides of the equation by c^{2}, and let x=\frac{a}{c}, and y=\frac{b}{c}, then the equation becomes

x^{2}+y^{2}=1

which is the equation of a circle of radius 1 centered at the origin. The problem of finding integer solutions to the equation of the Pythagorean theorem now becomes the problem of finding points on the unit circle whose coordinates are rational numbers.
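A short Python sketch of this translation, using the standard fractions module so that the coordinates stay exact rational numbers; the function name rational_point is an assumption made for the example.

```python
from fractions import Fraction


def rational_point(a: int, b: int, c: int):
    """Turn an integer solution of a^2 + b^2 = c^2 into the point (a/c, b/c)."""
    return (Fraction(a, c), Fraction(b, c))


if __name__ == "__main__":
    for triple in ((3, 4, 5), (5, 12, 13)):
        x, y = rational_point(*triple)
        # The last printed value checks x^2 + y^2 = 1 exactly.
        print(triple, x, y, x * x + y * y == 1)
```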

There are analogous problems of finding “rational points” in “shapes” other than circles (the technical term for shapes described by polynomial equations is “variety”). For the other quadratic equations like the conic sections (parabola, hyperbola, and ellipse) this problem has already been solved.

However for cubic equations (like the so-called “elliptic curves”) and equations with an even higher degree this is still a very fruitful area of research, part of a field of mathematics called arithmetic geometry (also called Diophantine geometry).

One famous theorem in this field is Faltings’ theorem (formerly the Mordell conjecture): The number of rational points on a curve (a curve is a one-dimensional variety – take note that over the complex numbers this is actually a surface) with rational coefficients and genus greater than one (the genus is a number related to the degree) is finite.

References:

Diophantine Geometry on Wikipedia

Diophantine Equation on Wikipedia

Elliptic Curve on Wikipedia

Faltings’s Theorem on Wikipedia

Rational Points on Elliptic Curves by Joseph H. Silverman

Geometry on Curved Spaces

Differential geometry is the branch of mathematics used by Albert Einstein when he formulated the general theory of relativity, where gravity is the curvature of spacetime. It was originally invented by Carl Friedrich Gauss to study the curvature of hills and valleys in the Kingdom of Hanover.

From what I described, one may guess that differential geometry has something to do with curvature. The geometry we learn in high school only occurs on a flat surface. There we can put coordinates x and y and compute distances, angles, areas, and so on.

To imagine what geometry on curved spaces looks like, imagine a globe. Instead of x and y coordinates, we can use latitude and longitude. One can now see just how different geometry is on this globe. Vertical lines (the lines of constant x) on a flat surface are always the same distance apart. On a globe, the analogues of these vertical lines, the lines of constant longitude, are closer near the poles than they are near the equator.

Other weird things happen on our globe: One can have triangles with angles that sum to more than 180 degrees. Run two perpendicular line segments from the north pole to the equator. They will meet the equator at a right angle and form a triangle with three right angles for a total of 270 degrees. Also on the globe the ratio of the circumference of a circle to its diameter might no longer be equal to the number \pi.

To make things more explicit, we will introduce the concept of a metric (the word “metric” refers to a variety of mathematical concepts related to the notion of distance – in this post we use it in the sense of differential geometry to refer to what is also called the metric tensor). The metric is an example of a mathematical object called a tensor, which we will not discuss much of in this post. Instead, we will think of the metric as expressing a kind of “distance formula” for our space, which may be curved. The part of differential geometry that makes use of the metric is called Riemannian geometry, named after the mathematician Bernhard Riemann, a student of Gauss who extended his results on curved spaces to higher dimensions.

We recall from From Pythagoras to Einstein several important versions of the “distance formula”, from the case of 2D space, to the case of 4D spacetime. We will focus on the simple case of 2D space in this post, since it is much easier to visualize; in fact, we have already given an example of a 2D space earlier, the globe, which we shall henceforth technically refer to as the 2-sphere. As we have learned in From Pythagoras to Einstein, a knowledge of the most simple cases can go very far toward the understanding of more complicated ones.

We will make a little change in our notation so as to stay consistent with the literature. Instead of the latitude, we will make use of the colatitude, written using the symbol \theta, and defined as the complementary angle to the latitude, i.e. the colatitude is 90 degrees minus the latitude. We will keep using the longitude, and we write it using the symbol \varphi. Note that even though we colloquially express our angles in degrees, for calculations we will always use radians, as is usual practice in mathematics and physics.

On a flat 2D space, the distance formula is given by

\displaystyle (\Delta x)^{2}+(\Delta y)^{2}=(\Delta s)^{2}.

It will be productive for us to work with extremely small quantities for now; from them we can obtain larger quantities later on using the language of calculus (see An Intuitive Introduction to Calculus). Adopting the notation of this language, we write

\displaystyle (dx)^{2}+(dy)^{2}=(ds)^{2}

We now give the distance formula for a 2-sphere:

\displaystyle R^{2}(d\theta)^{2}+R^{2}\text{sin}(\theta)^{2}(d\varphi)^{2}=(ds)^{2}

where R is the radius of the 2-sphere. This formula agrees with our intuition; the same difference in latitude and longitude results in a bigger distance for a bigger 2-sphere than for a smaller one, and the same difference in longitude results in a bigger distance for points near the equator than for points near the poles.
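To see the formula at work, here is a minimal Python sketch that adds up the small distances ds along a curve on the 2-sphere. The function name curve_length, the step count, and the choice of colatitude \pi/3 are our own illustrative assumptions. It measures a circle centered on the north pole and checks the earlier claim that the ratio of circumference to diameter need not be \pi.

```python
import math

def curve_length(theta, phi, R=1.0, steps=10_000):
    """Approximate the length of the curve t -> (theta(t), phi(t)), 0 <= t <= 1,
    on a 2-sphere of radius R by summing ds over many small steps."""
    total = 0.0
    for k in range(steps):
        t0, t1 = k / steps, (k + 1) / steps
        d_theta = theta(t1) - theta(t0)
        d_phi = phi(t1) - phi(t0)
        theta_mid = 0.5 * (theta(t0) + theta(t1))
        # (ds)^2 = R^2 (d theta)^2 + R^2 sin(theta)^2 (d phi)^2
        total += math.sqrt((R * d_theta) ** 2
                           + (R * math.sin(theta_mid) * d_phi) ** 2)
    return total

R = 1.0
theta0 = math.pi / 3  # colatitude of the circle (illustrative choice)

# The circle: constant colatitude, longitude running once around.
circumference = curve_length(lambda t: theta0, lambda t: 2 * math.pi * t, R)

# Its radius as measured on the sphere: along a meridian from the pole.
radius = curve_length(lambda t: theta0 * t, lambda t: 0.0, R)

ratio = circumference / (2 * radius)
print(ratio, math.pi)  # the ratio comes out smaller than pi
```

For this circle the ratio works out to \pi\,\text{sin}(\theta_{0})/\theta_{0}, which approaches \pi only when the circle is very small compared to the sphere.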

The idea behind the concept of the metric is that it gives how the distance formula changes depending on the coordinates. It is often written as a matrix (see Matrices) whose entries are the “coefficients” of the distance formula. Hence, for a flat 2D space it is given by

\displaystyle \left(\begin{array}{cc}1&0\\ 0&1\end{array}\right)

while for a 2-sphere it is given by

\displaystyle \left(\begin{array}{cc}R^{2}&0\\ 0&R^{2}\text{sin}(\theta)^{2}\end{array}\right).

We have seen that the metric can express how a space is curved. There are several other quantities related to the metric (and which can be derived from it), such as the Christoffel symbol and the Riemann curvature tensor, which express ideas related to curvature – however, unlike the metric which expresses curvature in terms of the distance formula, the Christoffel symbol and the Riemann curvature tensor express curvature in terms of how vectors (see Vector Fields, Vector Bundles, and Fiber Bundles) change as they move around the space.

The main equations of Einstein’s general theory of relativity, called the Einstein equations, relate the Riemann curvature tensor of 4D spacetime to the distribution of mass (or, more properly, the distribution of energy and momentum), expressed via the so-called energy-momentum tensor (also known as the stress-energy tensor).

The application of differential geometry is not limited to general relativity of course, and its objects of study are not limited to the metric. For example, in particle physics, gauge theories such as electrodynamics (see Some Basics of (Quantum) Electrodynamics) use the language of differential geometry to express forces like the electromagnetic force as a kind of “curvature”, even though a metric is not used to express this more “abstract” kind of curvature. Instead, a generalization of the concept of “parallel transport” is used. Parallel transport is the idea behind objects like the Christoffel symbol and the Riemann curvature tensor – it studies how vectors change as they move around the space. To generalize this, we replace vector bundles by more general fiber bundles (see Vector Fields, Vector Bundles, and Fiber Bundles).

To give a rough idea of parallel transport, we give a simple example again in 2D space – this 2D space will be the surface of our planet. Now space itself is 3D (with time it forms a 4D spacetime). But we will ignore the up/down dimension for now and focus only on the north/south and east/west dimensions. In other words, we will imagine ourselves as 2D beings, like the characters in the novel Flatland by Edwin Abbott. The discussion below will not make references to the third up/down dimension.

Imagine that you are somewhere at the Equator, holding a spear straight in front of you, facing north. Now imagine you take a step forward with this spear. The spear will therefore remain parallel to its previous direction. You take another step, and another, walking forward (ignoring obstacles and bodies of water) until you reach the North Pole. Now at the North Pole, without turning, you take a step to the right. The spear is still parallel to its previous direction, because you did not turn. You just keep stepping to the right until you reach the Equator again. You are not at your previous location of course. To go back you need to walk backwards, which once again keeps the spear parallel to its previous direction.

When you finally come back to your starting location, you will find that you are not facing the same direction as when you first started. In fact, you (and the spear) will be facing the east, which is offset by 90 degrees clockwise from the direction you were facing at the beginning, despite the fact that you were keeping the spear parallel all the time.

This would not have happened on a flat space; this “turning” is an indicator that the space (the surface of our planet) is curved. The amount of turning depends, among other things, on the curvature of the space. Hence the idea of parallel transport gives us a way to actually measure this curvature. It is this idea, generalized to mathematical objects other than vectors, which leads to the abstract notion of curvature – it is a measure of the changes that occur in certain mathematical objects when you move around a space in a certain way, which would not have happened if you were on a flat space.
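For a sphere, the amount of turning can be computed directly: a standard result is that a vector carried around a closed loop rotates by an angle equal to the enclosed area divided by R^{2}. The short Python sketch below applies this to the walk described above, whose path encloses one eighth of the globe. The function name and the value used for the radius are our own assumptions, and the radius drops out of the answer anyway.

```python
import math

def turning_angle(area, R):
    """Rotation angle (in radians) of a vector parallel transported around
    a loop on a sphere of radius R that encloses the given area."""
    return area / R ** 2

R = 6371.0  # assumed radius of the planet in km; it cancels out below

# Equator -> North Pole -> Equator -> back along the Equator:
# the path bounds one eighth of the sphere's total area 4 pi R^2.
octant_area = 4 * math.pi * R ** 2 / 8

turn = math.degrees(turning_angle(octant_area, R))

# Cross-check: three right angles minus the flat-space total of 180 degrees.
angle_excess = 3 * 90 - 180

print(turn, angle_excess)
```

Both numbers agree with the 90 degrees found in the story, and they tie the turning of the spear to the triangle with three right angles mentioned earlier.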

In closing, I would like to note that although differential geometry is probably most famous for its applications in physics (another interesting application in physics, by the way, is the so-called Berry’s phase in quantum mechanics), it is by no means limited to these applications alone, as already reflected in its historical origins, which barely have anything to do with physics. It has even found applications in number theory, via Arakelov theory. Still, it has an especially important role in physics, with much of modern physics written in its language, and many prospects for future theories depending on it. Whether in pure mathematics or theoretical physics, it is one of the most fruitful and active fields of research in modern times.

Bonus:

Since we have restricted ourselves to 2D spaces in this post, here is an example of a metric in 4D spacetime – this is the Schwarzschild metric, which describes the curved spacetime around objects like stars or black holes (it makes use of spherical polar coordinates):

\displaystyle \left(\begin{array}{cccc}-(1-\frac{2GM}{rc^{2}})&0&0&0\\0&(1-\frac{2GM}{rc^{2}})^{-1}&0&0\\0&0&r^{2}&0\\ 0&0&0&r^{2}\text{sin}(\theta)^{2}\end{array}\right)

In other words, the “infinitesimal distance formula” for this curved spacetime is given by

\displaystyle -(1-\frac{2GM}{rc^{2}})(d(ct))^{2}+(1-\frac{2GM}{rc^{2}})^{-1}(dr)^{2}+r^{2}(d\theta)^{2}+r^{2}\text{sin}(\theta)^{2}(d\varphi)^{2}=(ds)^{2}

where G is the gravitational constant and M is the mass. Note also that as a matter of convention the time coordinate is “scaled” by the constant c (the speed of light in a vacuum).
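To get a feeling for the sizes involved, the following Python sketch evaluates the diagonal entries of this metric at the surface of the Sun. The numerical constants are rounded textbook values that we supply as assumptions, and the function name is our own.

```python
import math

G = 6.674e-11     # gravitational constant in m^3 kg^-1 s^-2 (rounded)
c = 2.998e8       # speed of light in vacuum in m/s (rounded)
M_sun = 1.989e30  # mass of the Sun in kg (approximate)
r_sun = 6.957e8   # radius of the Sun in m (approximate)

def schwarzschild_diagonal(r, theta, M):
    """Diagonal entries of the Schwarzschild metric in the
    coordinates (ct, r, theta, phi)."""
    f = 1 - 2 * G * M / (r * c ** 2)
    return (-f, 1 / f, r ** 2, (r * math.sin(theta)) ** 2)

g_tt, g_rr, g_thth, g_phph = schwarzschild_diagonal(r_sun, math.pi / 2, M_sun)

# How far the time-time entry is from its flat-spacetime value of -1.
deviation = 1 + g_tt
print(deviation)
```

The deviation is only a few parts in a million, which is one way of seeing why spacetime near the Sun looks almost flat; the last two entries are the same as for a 2-sphere of radius r.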

References:

Differential Geometry on Wikipedia

Riemannian Geometry on Wikipedia

Metric Tensor on Wikipedia

Parallel Transport on Wikipedia

Differential Geometry of Curves and Surfaces by Manfredo P. do Carmo

Geometry, Topology, and Physics by Mikio Nakahara

Valuations and Completions

In ordinary everyday life, there are several notions of closeness. There is for example, a physical notion of distance, and we say, for instance, that we are close to our next-door neighbors. But there is another sense of closeness, such that we can say that we are “close” to our relatives, or to our friends, even though physically they may be far away.

There is also a similar notion of “closeness” between numbers. The most basic method is provided by the familiar “absolute value”. Given three numbers x, x_{1}, and x_{2}, to say that x is closer to x_{1} than to x_{2} means that |x-x_{1}|<|x-x_{2}|. So for example, since |(-1)-(2)|=3 and |(8)-(2)|=6, we therefore say that the number 2 is “closer” to the number -1 than to the number 8. In other words, the smaller the value of |x-y|, the closer x and y are to each other.

But there are also other notions of “closeness” for numbers, just as we have explained above, that with our relatives or friends we may be “close” even if we are far away from each other. Consider the numbers 1 and 10001. Simply by looking, they can perhaps be said to be “relatives” or “friends”, which makes them in some way closer than, say, 1 and 18. The same may be said for 5 and 3000005, that they are perhaps members of the same “family”. This is, of course, because their difference is divisible by a large power of 10, and since we use the decimal system to write our numbers, there is some sort of visual cue that these numbers are “family members”.

But in number theory, 10 is not really very special. Perhaps it just so happens that we have 10 fingers which we use for counting, so we used 10 as a base for our number system. What is really special in number theory are the prime numbers. So for our notion of closeness we choose a prime, and define our measure of closeness so that two numbers are closer together whenever their difference is divisible by a large power of that prime number. For our chosen prime p, we want an analogue of the absolute value, which we will call the p-adic absolute value, and written |x-y|_{p}, which is smaller if the difference of x and y is divisible by a large power of p. The “ordinary” absolute value will now be denoted by |x-y|_{\infty}.

We want to define this for rational numbers as follows. Given a rational number a, we express it as

a=p^{m}\frac{b}{c}

such that b, c, and p are mutually prime, i.e. they have no factors in common except 1. Then we set

|a|_{p}=\frac{1}{p^{m}}.

We can see that this definition gives us the properties we are looking for – the value of |a|_{p} is indeed smaller if a is divisible by a large power of p.
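The definition translates almost directly into a few lines of Python. The following is only a sketch: the function names are ours, and we adopt the usual convention |0|_{p}=0, which the definition above does not cover.

```python
from fractions import Fraction

def exponent_of_p(a, p):
    """The exponent m in a = p^m * b/c, where b and c are not divisible
    by the prime p. The rational number a must be nonzero."""
    a = Fraction(a)
    if a == 0:
        raise ValueError("m is not defined for a = 0")
    num, den = a.numerator, a.denominator
    m = 0
    while num % p == 0:   # powers of p in the numerator raise m
        num //= p
        m += 1
    while den % p == 0:   # powers of p in the denominator lower m
        den //= p
        m -= 1
    return m

def padic_abs(a, p):
    """The p-adic absolute value |a|_p = 1/p^m."""
    a = Fraction(a)
    if a == 0:
        return Fraction(0)  # assumed convention: |0|_p = 0
    return Fraction(1, p) ** exponent_of_p(a, p)

# 1 and 10001 are 5-adically much closer than 1 and 18.
print(padic_abs(10001 - 1, 5))  # 1/625
print(padic_abs(18 - 1, 5))     # 1
```

Exact fractions are used instead of floating point numbers so that divisibility by p can be tested reliably.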

The absolute value (both the “ordinary” absolute value and the p-adic absolute value) is also called the multiplicative valuation. There is also a related notion called the exponential valuation, which, in the p-adic case, we denote by v_{p}(a) for a rational number a. The exponential valuation is obtained from the multiplicative valuation by setting

v_{p}(a)=-\text{log}_{p}|a|_{p}.

In the case above, where a=p^{m}\frac{b}{c} and b, c, and p are mutually prime, we simply have

v_{p}(a)=m.

For the ordinary absolute value, we just set

v_{\infty}(a)=-\text{ln}|a|_{\infty}

where \text{ln } of course stands for the natural logarithm.

The concept of “closeness” between numbers, even just the “ordinary” one, was used to discover something interesting about the number line. If it was merely composed of the rational numbers, then there would be “gaps” in the line. To make a “true” number line, one must fill in these gaps, and this led to the construction of the real numbers by the mathematician Richard Dedekind in the 19th century.

We elaborate on the nature of these “gaps”. The construction we describe uses sequences of rational numbers, an approach usually credited to Georg Cantor; Dedekind’s own construction used what are now called Dedekind cuts, but both fill in the same gaps. Consider the real number \sqrt{2}. It has been known since ancient times that this number cannot be written as a ratio of two integers and is therefore not a rational number. However, we can construct an infinite sequence of rational numbers such that every successive rational number in the sequence is “closer” to \sqrt{2}, compared to the one before it.

The mathematician Leopold Kronecker once claimed, “God made the integers, all else is the work of man.” We know how to construct the rational numbers from the integers (for those who would like to think of the natural numbers as being even more basic than the integers, it is also easy to construct the integers from the natural numbers), by taking pairs of integers, and considering sets of equivalence classes (see Modular Arithmetic and Quotient Sets) of these pairs; for example, we set \frac{1}{2} and \frac{2}{4} as equivalent, because “cross multiplication” on the numerators and denominators gives us the same result. So the rational numbers are really equivalence classes of pairs of integers.

The problem we face now is how to construct the real numbers from the rational numbers. We have seen that we can construct sequences which “converge” in some sense to some value that is not a rational number. By “converge”, we mean that successive terms become closer and closer to each other late in the sequence. Technically, we do not refer to such a sequence as a convergent sequence, since it is a sequence of rational numbers but it does not converge to a rational number. Instead, we refer to it as a Cauchy sequence.
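A concrete sequence of this kind can be produced with a few lines of Python. The sketch below uses the classical averaging rule x\mapsto (x+2/x)/2, which is our choice of example and not the only possible one; every term is an exact rational number, successive terms get closer to each other, and their squares get closer to 2 without ever being equal to it.

```python
from fractions import Fraction

def sqrt2_approximations(n_terms):
    """First n_terms of the sequence x -> (x + 2/x)/2 starting at 1,
    computed with exact rational arithmetic."""
    x = Fraction(1)
    terms = [x]
    for _ in range(n_terms - 1):
        x = (x + 2 / x) / 2
        terms.append(x)
    return terms

terms = sqrt2_approximations(6)

# Distance between successive terms (the Cauchy property in action).
gaps = [abs(b - a) for a, b in zip(terms, terms[1:])]

# How far the square of each term is from 2.
errors = [abs(t * t - 2) for t in terms]

for t, e in zip(terms, errors):
    print(t, float(e))
```

The first few terms are 1, 3/2, 17/12, and 577/408. None of them squares to exactly 2, so within the rational numbers the sequence has nothing to converge to.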

And this gives us a possible solution to our problem above – we could simply define the real numbers as the set of all Cauchy sequences. Those that converge to a rational number “represent” that rational number, and those that do not “represent” an irrational number such as \sqrt{2}. However, there is still one more problem that we have to take care of. There may be more than one Cauchy sequence that “represents” a certain rational or irrational number.

Consider, for instance, the sequence

\displaystyle 5,5,5,5,5,...

which obviously converges to the rational number 5, and consider another sequence

\displaystyle 6,5,5,5,5,...

which is different in the first term but similarly converges to the rational number 5. They are different sequences, but they “represent” the same rational number. We would like to have a method of “identifying” these two sequences under some equivalence relation. In order to do this, we consider the “difference” of these two sequences:

\displaystyle 1,0,0,0,0,...

We see that it converges to 0. Such a sequence is called a nullsequence, and this gives us our equivalence relation – two Cauchy sequences are to be considered equivalent if they differ by a nullsequence. The set of real numbers \mathbb{R} is then defined as the set of equivalence classes of Cauchy sequences under this equivalence relation.

The process of “filling in” the “gaps” between the rational numbers is called completion. Note that a notion of “closeness” is important in the process of completion. If we had a different notion of closeness, for example, by using the p-adic absolute value instead of the ordinary absolute value, we would obtain a different kind of completion. Instead of the real numbers \mathbb{R}, we would have instead the p-adic numbers \mathbb{Q}_{p}. The p-adic numbers play an important role in number theory, as they encode information related to primes.
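To illustrate how much the notion of closeness matters, consider the partial sums 1, 1+5, 1+5+5^{2}, and so on. With the ordinary absolute value they run off to infinity, but with the 5-adic absolute value they form a Cauchy sequence and approach the rational number -\frac{1}{4}. The Python sketch below checks this; the helper padic_abs is our own illustrative implementation of the p-adic absolute value defined earlier, with the usual convention |0|_{p}=0 added as an assumption.

```python
from fractions import Fraction

def padic_abs(a, p):
    """p-adic absolute value of a rational number a (with |0|_p = 0)."""
    a = Fraction(a)
    if a == 0:
        return Fraction(0)
    num, den = a.numerator, a.denominator
    m = 0
    while num % p == 0:
        num //= p
        m += 1
    while den % p == 0:
        den //= p
        m -= 1
    return Fraction(1, p) ** m

p = 5
limit = Fraction(1, 1 - p)  # -1/4

# Partial sums 1, 1 + 5, 1 + 5 + 25, ...
partial_sums = [sum(Fraction(p) ** k for k in range(n)) for n in range(1, 8)]

padic_distances = [padic_abs(s - limit, p) for s in partial_sums]
ordinary_distances = [abs(s - limit) for s in partial_sums]

print(padic_distances)     # shrinking: 1/5, 1/25, 1/125, ...
print(ordinary_distances)  # growing without bound
```

The same sequence is therefore convergent for one notion of closeness and divergent for the other, which is why the two completions are so different.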

References:

Valuation on Wikipedia

Complete Metric Space on Wikipedia

p-adic Number on Wikipedia

Algebraic Number Theory by Jurgen Neukirch

Algebraic Number Theory by J. W. S. Cassels and A. Frohlich

Divisors and the Picard Group

In this post, once again focusing on the subject of algebraic geometry, we will consider a “curve”, which, confusingly, refers to what we usually think of as a surface. The reason for this is that if we are considering complex numbers x and y, an equation such as y^{2}=x^{3}-x, which we would normally think of as a “curve” if x and y were real numbers, actually refers to something that looks like a surface, in the same way the real numbers form a line and complex numbers form a plane. We will rely on this intuition and leave the more formal definitions of curves, surfaces, and dimension to the references for now.

A divisor is a finite “linear combination” of points on the curve, with integer coefficients. For example, if we have points P_{1} and P_{2} on the curve, we can have something like

\displaystyle 5P_{1}-3P_{2}.

The degree of a divisor is the sum of its coefficients. For the example above, the degree is equal to 2.

A special kind of divisor called a principal divisor comes from so-called “rational functions” (which, despite the name, may not really be “functions” in the set-theoretic sense but merely expressions involving a “fraction” whose numerator and denominator are both polynomials) on the curve. We let the coefficient of each point denote the “order of vanishing” of the function at that point, which is positive for a zero and negative for a pole. For instance, the function

\displaystyle f=\frac{x(x-1)^{2}}{(x-3)^5}

gives rise to the principal divisor

(f)=P_{1}+2P_{2}-5P_{3}

where P_{1} is the point x=0, P_{2} is the point x=1, and P_{3} is the point x=3.
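The bookkeeping in this example can be sketched in Python by storing a divisor as a dictionary from points to integer coefficients. The representation and the function names are our own assumptions, and, as in the example above, only the points at finite values of x are recorded.

```python
def principal_divisor(zeros, poles):
    """Divisor of a rational function given in factored form.

    zeros and poles map a point (labelled here by its x value) to the
    multiplicity of the matching linear factor in the numerator or
    denominator, respectively.
    """
    divisor = {}
    for point, multiplicity in zeros.items():
        divisor[point] = divisor.get(point, 0) + multiplicity
    for point, multiplicity in poles.items():
        divisor[point] = divisor.get(point, 0) - multiplicity
    # Points with coefficient 0 do not appear in the divisor.
    return {pt: n for pt, n in divisor.items() if n != 0}

def degree(divisor):
    """The degree of a divisor is the sum of its coefficients."""
    return sum(divisor.values())

# f = x (x - 1)^2 / (x - 3)^5
f_divisor = principal_divisor(zeros={0: 1, 1: 2}, poles={3: 5})
print(f_divisor)  # {0: 1, 1: 2, 3: -5}

# The earlier example 5 P_1 - 3 P_2 has degree 2.
print(degree({"P1": 5, "P2": -3}))
```

With this representation, adding two divisors amounts to adding the coefficients point by point, which is the law of composition used for the Picard group.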

The Picard group of a curve is a group (whose law of composition is given by addition – see also Groups) obtained from the divisors by considering two divisors D and D' equivalent (see Modular Arithmetic and Quotient Sets) if their difference D-D' is a principal divisor. An element of the Picard group is also called a divisor class.

The Picard group of a curve can say a lot of things about the curve. For instance, it can be used to prove that on the curve y^{2}=x^{3}-x, which is an example of what is called an elliptic curve, the points form a group. The group structure on the elliptic curve, along with other properties such as its being a Riemann surface (a surface which “locally” looks like the complex plane), makes it one of the most interesting objects in mathematics.

The Picard group is also important because its elements, the divisor classes on the curve, correspond to line bundles (vector bundles of dimension 1 – see Vector Fields, Vector Bundles, and Fiber Bundles – but do keep in mind our discussion earlier regarding complex numbers and how this changes our conventions regarding dimension, as in the case of the line and the plane, and curves and surfaces) on the curve. Line bundles are also related to sheaves, in particular those called “locally free sheaves of rank 1” (more general vector bundles correspond to locally free sheaves of finite rank). There is, therefore, a relation between the concept of divisors, the concept of vector bundles, and the concept of sheaves.

We now relate the theory of divisors and the Picard group to number theory. We have mentioned in Localization that we can obtain a scheme out of the integers \mathbb{Z}; the points of this scheme are the prime ideals of \mathbb{Z}, and the set of all these points (prime ideals) we call \text{Spec }\mathbb{Z}. As we can make a scheme out of a more general ring, we can therefore make a scheme out of the ring of integers \mathcal{O}_{K} of an algebraic number field K (see Algebraic Numbers); its points will be the prime ideals of \mathcal{O}_{K}, and the “rational functions” on this scheme will be the elements of K.

In this case, the divisors are made up of “linear combinations” of prime ideals. The principal divisors, which come from rational functions, then correspond, accordingly, to principal fractional ideals, ideals which are generated by a single element of K, which as we have mentioned above correspond to the rational functions. Finally, the Picard group is none other than the ideal class group, which “measures” the failure of unique factorization in an algebraic number field!

More explicitly, an example of a divisor may be written in this way:

\displaystyle 5\mathfrak{p}_{1}-3\mathfrak{p}_{2}

for prime ideals \mathfrak{p}_{1} and \mathfrak{p}_{2}, which as we have said correspond to points. For a principal divisor, we may have, for example, the following element of the rational numbers \mathbb{Q}

\displaystyle \frac{63}{64}

which generates the principal fractional ideal

\displaystyle (\frac{63}{64})=\{...,-\frac{189}{64},-\frac{126}{64},-\frac{63}{64},0,\frac{63}{64},\frac{126}{64},\frac{189}{64},...\}

which in turn gives us the principal divisor

\displaystyle 2\mathfrak{p}_{1}+\mathfrak{p}_{2}-6\mathfrak{p}_{3}

where \mathfrak{p}_{1}=(3), \mathfrak{p}_{2}=(7), and \mathfrak{p}_{3}=(2), the principal ideals generated by 3, 7, and 2 respectively. Note that if we “factorize” the numerator and denominator of \frac{63}{64}, we obtain

\displaystyle \frac{63}{64}=\frac{3^{2}\cdot 7}{2^{6}}.

More generally, we should “factorize” in terms of ideals, in case we don’t have unique factorization:

\displaystyle (\frac{63}{64})=\frac{(3)^{2}(7)}{(2)^{6}}.

The coefficients of a principal divisor, measuring “how much” of a certain prime is in the factorization of the principal fractional ideal it corresponds to, are called valuations. The theory of valuations offers us another way to develop the entire field of algebraic number theory under a new perspective.
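For the rational numbers, where every ideal is principal, these coefficients can be read off from the prime factorization of the numerator and the denominator. Here is a minimal Python sketch using trial division; the function name is ours and the method is only meant for small numbers.

```python
from fractions import Fraction

def prime_valuations(a):
    """Map each prime p dividing the numerator or denominator of the
    nonzero rational number a to its exponent in the factorization of a."""
    a = Fraction(a)
    if a == 0:
        raise ValueError("a must be nonzero")
    valuations = {}
    # Primes in the numerator count positively, in the denominator negatively.
    for sign, n in ((1, abs(a.numerator)), (-1, a.denominator)):
        p = 2
        while p * p <= n:
            while n % p == 0:
                valuations[p] = valuations.get(p, 0) + sign
                n //= p
            p += 1
        if n > 1:  # whatever is left over is itself a prime
            valuations[n] = valuations.get(n, 0) + sign
    return valuations

print(prime_valuations(Fraction(63, 64)))  # {3: 2, 7: 1, 2: -6}
```

The output reproduces the coefficients 2, 1, and -6 of the principal divisor of \frac{63}{64} found above.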

References:

Divisor on Wikipedia

Picard Group on Wikipedia

Algebraic Geometry by Robin Hartshorne

Algebraic Number Theory by Jurgen Neukirch

Direct Images and Inverse Images of Sheaves

In this post we will be working with the etale topology once again, so we start by formalizing some concepts. We want to first define the category (see Category Theory) called \text{Et}(X) or \text{Et}/X in the literature. The objects of this category are etale morphisms (see Cohomology in Algebraic Geometry) \varphi:U\rightarrow X of schemes to X, while the morphisms are etale morphisms \psi:U\rightarrow U' such that if \varphi ':U'\rightarrow X is another etale morphism to X, then \varphi' \circ \psi=\varphi.

Presheaves of sets, abelian groups, etc. on the category \text{Et}(X) are defined as contravariant functors from the category \text{Et}(X) to sets, abelian groups, etc. They are sheaves if they satisfy the sheaf conditions commonly referred to as local identity and gluing (see Even More Category Theory: The Elementary Topos). We will refer to presheaves (resp. sheaves) on \text{Et}(X) as etale presheaves (resp. etale sheaves) or simply as presheaves (resp. sheaves) on X.

Let f:X\rightarrow Y be a morphism of schemes, and let \mathcal{F} be a sheaf on X. There is a sheaf on Y determined by f and \mathcal{F}, called the direct image sheaf, written f_{*}\mathcal{F} and defined by

\displaystyle f_{*}\mathcal{F}(U)=\mathcal{F}(X\times_{Y}U)

for an etale morphism U\rightarrow Y.
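The etale setting is hard to put on a computer, but the shape of this definition can be seen in a much simpler toy model, which is entirely our own assumption: finite sets with the discrete topology, where for an open subset U of Y the fiber product X\times_{Y}U is just the preimage of U. In the Python sketch below a sheaf is recorded only through the rank of its group of sections.

```python
def preimage(f, V):
    """Preimage of the subset V of Y under the map f, given as a dict."""
    return frozenset(x for x in f if f[x] in V)

def direct_image(F, f):
    """Toy direct image: (f_* F)(V) = F(preimage of V)."""
    return lambda V: F(preimage(f, frozenset(V)))

# A map f: X -> Y with X = {a, b, c} and Y = {0, 1}.
f = {"a": 0, "b": 0, "c": 1}

# Toy sheaf on X: integer-valued functions on U, recorded by its rank,
# which is simply the number of points of U.
F = lambda U: len(U)

pushforward = direct_image(F, f)
print(pushforward({0}))     # 2, since two points of X lie over 0
print(pushforward({1}))     # 1
print(pushforward({0, 1}))  # 3
```

In the actual definition the subsets V are replaced by etale morphisms U\rightarrow Y and the preimage by the fiber product, but the pattern of evaluating \mathcal{F} on the pullback is the same.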

The direct image functor f_{*} is the functor that assigns to a sheaf \mathcal{F} the direct image sheaf f_{*}\mathcal{F}. The right derived functors (see More on Chain Complexes and The Hom and Tensor Functors) of f_{*} are called the higher direct image functors and written R^{n}f_{*}.

On the other hand, a morphism f:X\rightarrow Y of schemes and a sheaf \mathcal{G} on Y also determine a sheaf on X called the inverse image sheaf, written f^{*}\mathcal{G} and obtained via the following construction:

Let U\rightarrow X be an etale morphism of schemes. The presheaf \mathcal{G}' is given by

\mathcal{G}'(U)=\varinjlim \mathcal{G}(V)

where the direct limit (see Etale Cohomology of Fields and Galois Cohomology) is taken over all V\rightarrow Y such that the morphisms commute (i.e. the composition of morphisms U\rightarrow V and V\rightarrow Y is equal to the composition of morphisms U\rightarrow X and X\rightarrow Y).

We then define the sheaf f^{*}\mathcal{G} as the sheaf associated to the presheaf \mathcal{G}' (the process of associating a sheaf to a presheaf, also known as sheafification, is left to the references for now).

We now introduce the notions of open subschemes and closed subschemes, and open immersions and closed immersions. We quote directly from the book Algebraic Geometry by Robin Hartshorne:

An open subscheme of a scheme X is a scheme U whose topological space is an open subset of X, and whose structure sheaf \mathcal{O}_{U} is isomorphic to the restriction \mathcal{O}_{X}|_{U} of the structure sheaf of X. An open immersion is a morphism f:X\rightarrow Y which induces an isomorphism of X with an open subscheme of Y.

Note that every open subset of a scheme carries a unique structure of open subscheme.

A closed immersion is a morphism f: Y\rightarrow X of schemes such that f induces a homeomorphism of \text{sp}(Y) onto a closed subset of \text{sp}(X) and furthermore the induced map f^{\#}:\mathcal{O}_{X}\rightarrow f_{*}\mathcal{O}_{Y} of sheaves on X is surjective. A closed subscheme of a scheme X is an equivalence class of closed immersions, where we say f:Y\rightarrow X and f':Y'\rightarrow X are equivalent if there is an isomorphism i: Y'\rightarrow Y such that f'=f\circ i.

Now it may happen that given an open immersion j:U\rightarrow X and a sheaf \mathcal{F} on U, the stalks of j_{*}\mathcal{F} are not zero at points outside U. Therefore we define another sheaf j_{!}\mathcal{F} on X, given by the following construction:

Given a sheaf \mathcal{F} on U, and an etale morphism \varphi:V\rightarrow X, let

\displaystyle \mathcal{F}_{!}(V)=\mathcal{F}(V) if \varphi(V)\subseteq U

\displaystyle \mathcal{F}_{!}(V)=0 if \varphi(V)\nsubseteq U

Once again, \mathcal{F}_{!} is a presheaf on X, but it need not be a sheaf, therefore we define instead j_{!}\mathcal{F} to be the sheaf associated to the presheaf \mathcal{F}_{!}.

One concept related to this “extension by zero” functor j_{!} is cohomology with compact support, written H_{c}^{n}(U,\mathcal{F})=H^{n}(X,j_{!}\mathcal{F}).

The functors f_{*}, f^{*}, and the generalization of j_{!}, called the direct image with compact support and denoted f_{!}, are part of the so-called “six operations” which play an important role in modern algebraic geometry.

References:

Image Functors for Sheaves on Wikipedia

Direct Image Functor on Wikipedia

Inverse Image Functor on Wikipedia

Direct Image with Compact Support on Wikipedia

Six Operations on Wikipedia

Six Operations on the nLab

Lectures on Etale Cohomology by J.S. Milne

Algebraic Geometry by Robin Hartshorne

Etale Cohomology and the Weil Conjecture by Eberhard Freitag and Reinhardt Kiehl