Galois Representations Coming From Weight 2 Eigenforms

In Galois Representations we mentioned briefly that Galois representations can be obtained from modular forms. In this post we elaborate more on this construction, in the case that the modular form is a weight 2 eigenform (a weight 2 cusp form that is a simultaneous eigenfunction for all Hecke operators T_{n} with n coprime to the level N). This specific case is also known as the Shimura construction, after Goro Shimura.

Let f be a weight 2 Hecke eigenform, of some level \Gamma_{0}(N) (this also works with other level structures). We want to construct a p-adic Galois representation associated to this Hecke eigenform, such that the two are going to be related in the following manner. For every prime \ell not dividing N and not equal to p, the characteristic polynomial of the image of the Frobenius element associated to \ell under this Galois representation will be of the form

\displaystyle x^{2}-a_{\ell}x+\ell\chi(\ell)

where a_{\ell} is the eigenvalue of the Hecke operator T_{\ell} and \chi is a Dirichlet character associated to another kind of Hecke operator called the diamond operator \langle \ell\rangle. This diamond operator acts on the argument of the modular form by an element of \mathrm{SL}_{2}(\mathbb{Z}) that is upper triangular mod N and whose bottom right entry is \ell mod N. On an eigenform, this action is multiplication by the value \chi(\ell) of a Dirichlet character \chi:(\mathbb{Z}/N\mathbb{Z})^{\times}\to\mathbb{C}^{\times}. The above polynomial is also known as the Hecke polynomial.

The first thing that we will need is the identification of the weight 2 cusp forms with the holomorphic differentials on the modular curve (as mentioned in Modular Forms in the case of \mathrm{SL}_{2}(\mathbb{Z}), although this can be done more generally).

The second thing that we will need is the Jacobian. One can think of the Jacobian as the space given by the equivalence classes of all path integrals on a curve (in general we can do this for any algebraic curve, not just modular curves), where two path integrals are to be considered equivalent if they differ by integration along a loop. Since path integration can be considered as a linear functional from holomorphic differentials to the complex numbers, we consider such path integrals as the dual space to the space of holomorphic differentials. However, the loops we wanted to quotient out by can also be expressed as elements of the homology group of the curve (see also Homology and Cohomology)!

Therefore we now define the Jacobian of a curve X as

\displaystyle J(X)=\Omega^{\vee}/H_{1}(X,\mathbb{Z})

where \Omega denotes the holomorphic differentials on X. The notation \Omega^{\vee} denotes the dual to \Omega, since as we said the path integrals form the dual to the holomorphic differentials. The Jacobian can also be described in other ways – for instance it is also the connected component of the identity of the Picard group (see also Divisors and the Picard Group), and the connection to the description given here is an important classical theorem called the Abel-Jacobi theorem.

The Jacobian is a higher-dimensional complex torus, and actually more is true – it is also an abelian variety, i.e. a projective variety whose points form a group (and hence a generalization of elliptic curves). Note that every one-dimensional complex torus is an elliptic curve, but the analogous statement is not true in higher dimensions – only certain special kinds of higher dimensional complex tori (namely those with a polarization) are abelian varieties. In this vein the Jacobian of a curve has yet another description – it is “universal” among abelian varieties in that, if there is a morphism from a curve to any abelian variety, it can be expressed as a morphism from the curve to its Jacobian, followed by a morphism to that other abelian variety.

Now we go back to the case of modular curves. Denoting by S_{2}(\Gamma_{0}(N)) the space of cusp forms of weight two for the level structure \Gamma_{0}(N), which as discussed above is isomorphic to the space of holomorphic differentials on the corresponding modular curve X(\Gamma_{0}(N)), we can now define the Jacobian J(\Gamma_{0}(N)) as

\displaystyle J(\Gamma_{0}(N))=S_{2}(\Gamma_{0}(N))^{\vee}/H_{1}(X,\mathbb{Z})

The third ingredient that we need is a certain ideal of the Hecke algebra (the ring of endomorphisms of S_{2}(\Gamma_{0}(N)) generated by the actions of the Hecke operators and diamond operators) corresponding to the weight 2 Hecke eigenform f (let us denote this ideal by \mathbb{I}_{f}) that we want to obtain our Galois representation from. This ideal \mathbb{I}_{f} is defined to be the one generated by all elements of the Hecke algebra whose eigenvalue when acting on f is zero.

Since the Hecke operators and diamond operators act on the Jacobian (one way to see this: since the Jacobian is a quotient of the space of linear functionals on S_{2}(\Gamma_{0}(N)), the action is obtained by first applying the Hecke operator or diamond operator to the weight 2 cusp form, then applying the linear functional), we can use the ideal \mathbb{I}_{f} to cut out a quotient of the Jacobian which is another abelian variety A_{f}:

\displaystyle A_{f}=J(\Gamma_{0}(N))/\mathbb{I}_{f}J(\Gamma_{0}(N))

Finally, we can take the Tate module of A_{f}, and this will give us precisely the Galois representation that we want. The abelian variety A_{f} will have dimension equal to the degree of the number field generated by the eigenvalues of the Hecke operators.

If the eigenvalues are all rational, then A_{f} will actually be an elliptic curve – in other words, given an eigenform of weight 2 whose Hecke eigenvalues are all rational, we can always use it to construct an elliptic curve! This also gives us a map from the modular curve X(\Gamma_{0}(N)) to this elliptic curve, called a modular parametrization. The resulting elliptic curve will have the property that its L-function, built from point counts when it is reduced modulo primes, is the same as the L-function of the modular form which is built from its Fourier coefficients! This is because the Frobenius and the Fourier coefficients (which are also the eigenvalues of the Hecke operators) are related, as discussed above. The question of whether, given an elliptic curve, it comes from a modular form in this way, is another restatement of the question of modularity. The affirmative answer to this question, at least for certain elliptic curves over \mathbb{Q}, led to the proof of Fermat’s Last Theorem.
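As a sanity check on the claim that point counts and Fourier coefficients agree, here is a minimal Python sketch. It assumes a standard example that does not appear in the text above: the weight 2 eigenform of level 11 given by the product q\prod_{n\geq 1}(1-q^{n})^{2}(1-q^{11n})^{2}, and the elliptic curve y^{2}+y=x^{3}-x^{2}, which has conductor 11. The function names are purely illustrative.

```python
from collections import Counter

def eta_product_coeffs(level, nmax):
    """Return [a_0, ..., a_nmax] for q * prod_{n>=1} (1-q^n)^2 (1-q^(level*n))^2."""
    series = [0] * (nmax + 1)
    series[0] = 1

    def times_one_minus_q_pow(m):
        # Multiply the truncated series in place by (1 - q^m).
        for i in range(nmax, m - 1, -1):
            series[i] -= series[i - m]

    for n in range(1, nmax + 1):
        for m in (n, level * n):
            if m <= nmax:
                times_one_minus_q_pow(m)
                times_one_minus_q_pow(m)
    # The leading factor q shifts every coefficient up by one.
    return [0] + series[:nmax]

def count_points(p):
    """Number of points on y^2 + y = x^3 - x^2 over F_p, including infinity."""
    lhs_values = Counter((y * y + y) % p for y in range(p))
    affine = sum(lhs_values[(x**3 - x * x) % p] for x in range(p))
    return affine + 1

coeffs = eta_product_coeffs(11, 30)
for p in [2, 3, 5, 7, 13, 17, 19, 23, 29]:  # primes of good reduction below 30
    a_p = p + 1 - count_points(p)
    print(p, coeffs[p], a_p)
```

For each prime in the list, the second and third printed columns should coincide: the Fourier coefficient a_{p} of the eigenform equals p+1 minus the number of points of the curve mod p. The prime 11 is skipped because the curve has bad reduction there.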

This theory, which is only very roughly sketched here, is just a very special case – one can also obtain, for instance, Galois representations from modular forms which are not of weight 2. We leave this for the future.

References:

Jacobian variety on Wikipedia

Abel-Jacobi map on Wikipedia

Modularity theorem on Wikipedia

Course on Mazur’s Theorem Lecture 10: Jacobians by Andrew Snowden

Course on Mazur’s Theorem Lecture 17: Eichler-Shimura by Andrew Snowden

A First Course in Modular Forms by Fred Diamond and Jerry Shurman

More on Galois Deformation Rings

In Galois Deformation Rings we introduced the concept of a Galois deformation ring, and how it is used to prove “R=T” theorems. In this post we will look at a very simple example to help make things more concrete. Then we will explore more about the structure of Galois deformation rings, in particular we want to relate the tangent space of such a Galois deformation ring to the Selmer group in Galois cohomology (which also shows up in a lot of contexts all over arithmetic geometry and number theory).

Let F be a finite extension of \mathbb{Q}, and let k be some finite field, with ring of Witt vectors W(k) (for example if k=\mathbb{F}_{p} then W(k)=\mathbb{Z}_{p}). Let our residual representation \overline{\rho}:\mathrm{Gal}(\overline{F}/F)\to GL_{1}(k) be the trivial representation, i.e. the group acts as the identity. A lift will be a Galois representation \rho:\mathrm{Gal}(\overline{F}/F)\to GL_{1}(A), where A is a complete Noetherian local algebra over W(k), which reduces to \overline{\rho} modulo the maximal ideal of A. Then our Galois deformation ring is given by the completed group ring

\displaystyle R _{\overline{\rho}}=W(k)[[\mathrm{Gal}(\overline{F}/F)^{\mathrm{ab,p}}]]

where \mathrm{Gal}(\overline{F}/F)^{\mathrm{ab,p}} means the pro-p completion of the abelianization of the Galois group \mathrm{Gal}(\overline{F}/F). Using local class field theory, we can express this even more explicitly as

\displaystyle R_{\overline{\rho}}=W(k)[\mu_{p^{\infty}}(F)][[X_{1},\ldots,X_{[F:\mathbb{Q}]}]]

Let us now consider a useful fact about the tangent space (see also Tangent Spaces in Algebraic Geometry) of such a deformation ring. Let us first consider the framed deformation ring R_{\overline{\rho}}^{\Box}. It is local, and has a unique maximal ideal \mathfrak{m}. There is only one tangent space, defined to be the dual of \mathfrak{m}/\mathfrak{m}^{2}, but this can also be expressed as the set of its dual number-valued points, i.e. \mathrm{Hom}(R_{\overline{\rho}}^{\Box},k[\epsilon]), which by the definition of the framed deformation functor, is also D_{\overline{\rho}}^{\Box}(k[\epsilon]). Any such deformation must be of the form

\displaystyle \rho(\sigma)=(1+\varepsilon c(\sigma))\overline{\rho}(\sigma)

where c(\sigma) is some n\times n matrix with coefficients in k. If \sigma and \tau are elements of \mathrm{Gal}(\overline{F}/F), then substituting the above form of \rho into the equation \rho(\sigma\tau)=\rho(\sigma)\rho(\tau) gives

\displaystyle  (1+\varepsilon c(\sigma\tau))\overline{\rho}(\sigma\tau) = (1+\varepsilon c(\sigma))\overline{\rho}(\sigma) (1+\varepsilon c(\tau))\overline{\rho}(\tau)

from which we can see that

\displaystyle  c(\sigma\tau)\overline{\rho}(\sigma\tau) = c(\sigma)\overline{\rho}(\sigma)\overline{\rho}(\tau)+\overline{\rho}(\sigma)c(\tau)\overline{\rho}(\tau)

and, multiplying by \overline{\rho}(\sigma\tau)^{-1}= \overline{\rho}(\tau)^{-1}\overline{\rho}(\sigma)^{-1} on the right,

\displaystyle  c(\sigma\tau)=c(\sigma)+\overline{\rho}(\sigma)c(\tau)\overline{\rho}(\sigma)^{-1}

In the language of Galois cohomology, we say that c is a 1-cocycle, if we take the n\times n matrices to be a Galois module coming from the “Lie algebra” of GL_{n}(k). We call this Galois module \mathrm{Ad}\overline{\rho}.

Now consider two different lifts (framed deformations) \rho_{1} and \rho_{2} which give rise to the same deformation of \overline{\rho}. Then there exists some n\times n matrix X such that

\displaystyle \rho_{1}(\sigma)=(1+\varepsilon X)\rho_{2}(\sigma)(1-\varepsilon X)

Plugging in \rho_{1}=(1+\varepsilon c_{1})\overline{\rho} and \rho_{2}=(1+\varepsilon c_{2})\overline{\rho} we obtain

\displaystyle  (1+\varepsilon c_{1})\overline{\rho}=(1+\varepsilon X) (1+\varepsilon c_{2})\overline{\rho}(1-\varepsilon X)

which will imply that

\displaystyle  c_{1}(\sigma)=c_{2}(\sigma)+X-\overline{\rho}(\sigma)X\overline{\rho}(\sigma)^{-1}

In the language of Galois cohomology (see also Etale Cohomology of Fields and Galois Cohomology) we say that c_{1} and c_{2} differ by a coboundary. This means that the tangent space of the Galois deformation ring is given by the first Galois cohomology with coefficients in \mathrm{Ad}\overline{\rho}:

\displaystyle D_{\overline{\rho}}(k[\epsilon])\simeq H^{1}(\mathrm{Gal}(\overline{F}/F),\mathrm{Ad}\overline{\rho})
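The cocycle and coboundary computations can be checked numerically on a small example. The following Python sketch makes several assumptions that are not in the text: it replaces the Galois group by a finite group (a copy of S_{3} inside GL_{2}(\mathbb{F}_{5}), with \overline{\rho} the inclusion), and it takes c to be the coboundary attached to an arbitrary matrix X. It then verifies that c satisfies the 1-cocycle condition c(\sigma\tau)=c(\sigma)+\overline{\rho}(\sigma)c(\tau)\overline{\rho}(\sigma)^{-1}, and that (1+\varepsilon c(\sigma))\overline{\rho}(\sigma) is multiplicative over the dual numbers.

```python
from itertools import product

P = 5  # work over the finite field F_5 (an arbitrary choice)

def mat_mul(a, b):
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) % P for j in range(2))
        for i in range(2)
    )

def mat_add(a, b):
    return tuple(tuple((a[i][j] + b[i][j]) % P for j in range(2)) for i in range(2))

def mat_sub(a, b):
    return tuple(tuple((a[i][j] - b[i][j]) % P for j in range(2)) for i in range(2))

def mat_inv(a):
    det = (a[0][0] * a[1][1] - a[0][1] * a[1][0]) % P
    det_inv = pow(det, -1, P)  # modular inverse, needs Python 3.8+
    adjugate = ((a[1][1], -a[0][1]), (-a[1][0], a[0][0]))
    return tuple(tuple(det_inv * x % P for x in row) for row in adjugate)

def conjugate(g, x):
    return mat_mul(mat_mul(g, x), mat_inv(g))

def generate_group(gens):
    """Close a set of invertible matrices under multiplication."""
    identity = ((1, 0), (0, 1))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = mat_mul(g, s)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group

# A swap of order 2 and a rotation of order 3 generate a copy of S_3.
G = generate_group([((0, 1), (1, 0)), ((0, P - 1), (1, P - 1))])
X = ((1, 2), (3, 4))

def c(g):
    # Coboundary attached to X: c(sigma) = X - rho(sigma) X rho(sigma)^{-1}.
    return mat_sub(X, conjugate(g, X))

def lift(g):
    # (1 + eps c(g)) g, stored as the pair (g, c(g) g) meaning g + eps c(g) g.
    return (g, mat_mul(c(g), g))

def dual_mul(u, v):
    # (A + eps B)(C + eps D) = AC + eps (AD + BC), since eps^2 = 0.
    return (mat_mul(u[0], v[0]), mat_add(mat_mul(u[0], v[1]), mat_mul(u[1], v[0])))

cocycle_ok = all(
    c(mat_mul(s, t)) == mat_add(c(s), conjugate(s, c(t)))
    for s, t in product(G, repeat=2)
)
hom_ok = all(
    dual_mul(lift(s), lift(t)) == lift(mat_mul(s, t))
    for s, t in product(G, repeat=2)
)
print(len(G), cocycle_ok, hom_ok)
```

Since c here is a coboundary, the lift it defines is just a conjugate of \overline{\rho} by 1+\varepsilon X, so it represents the trivial class in H^{1}; the point of the sketch is only to see the two identities hold on actual matrices.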

More generally, when our Galois deformation ring is subject to conditions, its tangent space will be given by a subgroup of the first Galois cohomology known as the Selmer group (note that the Selmer group shows up in many places in arithmetic geometry and number theory, for instance, in the proof of the Mordell-Weil theorem where the Galois module used comes from the torsion points of an elliptic curve – in this post we are considering the case where the Galois module is \mathrm{Ad}\overline{\rho}, as stated earlier). The advantage of expressing the tangent space of the Galois deformation ring in the language of Galois cohomology is that in Galois cohomology there are certain formulas such as Tate duality and the Euler characteristic formula that we can use to perform computations.

Finally to end this post we remark that under certain conditions (namely that for every open subgroup H of \mathrm{Gal}(\overline{F}/F) the space of continuous homomorphisms from H to \mathbb{F}_{p} has finite dimension) this tangent space is going to be a finite-dimensional vector space over k. Then the Galois deformation ring has the following form

\displaystyle R_{\overline{\rho}}=W(k)[[x_{1},\ldots,x_{g}]]/(f_{1},\ldots,f_{r})

i.e. it is a quotient of a power series ring over W(k) in g variables, where the number g is given by the dimension of H^{1}(\mathrm{Gal}(\overline{F}/F),\mathrm{Ad}\overline{\rho}) as a k-vector space, while the number of relations r is given by the dimension of H^{2}(\mathrm{Gal}(\overline{F}/F),\mathrm{Ad}\overline{\rho}) as a k-vector space.

Knowing the structure of Galois deformation rings is going to be important in proving R=T theorems, since such proofs often reduce to commutative algebra involving these rings. More details will be discussed in future posts on this blog.

References:

Group cohomology on Wikipedia

Galois cohomology on Wikipedia

Selmer group on Wikipedia

Tate duality on Wikipedia

Modularity Lifting Theorems by Toby Gee

Modularity Lifting (Course Notes) by Patrick Allen

Motives and L-functions by Frank Calegari

Beyond the Taylor-Wiles Method by Jack Thorne

Galois Deformation Rings

In Galois Representations we talked about obtaining continuous Galois representations for example from the \ell-adic etale cohomology of algebraic varieties, and hinted at being able to obtain such Galois representations from modular forms as well. While we postpone the discussion of how to obtain such a Galois representation to some future blog post (hopefully), we now mention the very important topic of modularity – which investigates, given some Galois representation, whether it comes from a modular form, and furthermore whether it provides some other information about the modular form that it comes from.

The topic of modularity is composed of two parts. The first is residual modularity – where we are given a Galois representation over a finite field (we call such a Galois representation a residual representation, in reference to the finite field being the residue field of some other ring) and figure out whether it comes from a modular form (in which case we also say that it is modular). The second part is modularity lifting, where, given a residual representation we know to be modular, we figure out whether a Galois representation over \mathbb{Q}_{\ell} that “lifts” it is also modular.

In this post, we focus only on one small ingredient of the approach to proving modularity lifting. Proofs of modularity lifting rely on “R=T” theorems, where R refers to a Galois deformation ring and T comes from a (localization of) a Hecke algebra (see also Hecke Operators). The small ingredient we will focus on in this post is the R, the Galois deformation ring.

A “deformation” in our context is an equivalence class of “lifts” and before we give the precise definitions we give a little bit of intuition about why we are interested in lifts. Roughly, in our context, a lift of some field \overline{R} is a local ring R such that \overline{R} is the residue field of R, i.e. \overline{R}=R/\mathfrak{m} where \mathfrak{m} is the unique maximal ideal of R (since R is a local ring by definition it has a unique maximal ideal).

So now for the intuition. Consider the real numbers \mathbb{R}. The “dual numbers” are defined to be \mathbb{R}[x]/(x^{2}). Its elements are of the form a+bx where a and b are real numbers. We can consider x here to be an “infinitesimal element”. So we may think of an element of the dual numbers as a number, given by a, but with a “tangent vector” given by the number b. Another way to think about it is that it is at “position a”, but it also has a “velocity b”. It’s like numbers, but with a little “wiggle”. Now that we know about the dual numbers \mathbb{R}[x]/(x^{2}), what about elements of \mathbb{R}[x]/(x^{3})? We may think of such an element, which is of the form a+bx+cx^{2}, as a “position a”, with “velocity b”, and “acceleration c”, a kind of “higher wiggle”.

If we continue including higher and higher derivatives, then we have something whose elements are formal power series a+bx+cx^2+dx^3+\ldots. This is the ring \mathbb{R}[[x]], which is the inverse limit of the rings \mathbb{R}[x]/(x^{n}). Now the ring \mathbb{R}[[x]] is a local ring with maximal ideal (x), and modding out by this maximal ideal gives \mathbb{R}. So this power series ring is a lift of \mathbb{R}, a kind of numbers with “higher wiggles”. This is what the term “deformation” is supposed to bring to mind.
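The “position, velocity, acceleration” picture can be made concrete with a few lines of Python. This is only an illustrative sketch: the class name Truncated is made up here, and integer coefficients are used instead of real numbers so that the arithmetic is exact. Cubing 2+x in the ring with x^{2}=0 returns the value and first derivative of t^{3} at t=2, and working with x^{3}=0 also returns half the second derivative.

```python
class Truncated:
    """Element of Z[x]/(x^n), stored as its list of n coefficients."""

    def __init__(self, coeffs, n):
        self.n = n
        self.coeffs = (list(coeffs) + [0] * n)[:n]

    def __mul__(self, other):
        out = [0] * self.n
        for i, a in enumerate(self.coeffs):
            for j, b in enumerate(other.coeffs):
                if i + j < self.n:  # x^n and all higher powers are zero
                    out[i + j] += a * b
        return Truncated(out, self.n)

    def __repr__(self):
        return f"Truncated({self.coeffs})"

def cube(t):
    return t * t * t

# Dual numbers (n = 2): position 2 with velocity 1.
print(cube(Truncated([2, 1], 2)).coeffs)     # [8, 12]
# One order higher (n = 3): position 2, velocity 1, no acceleration.
print(cube(Truncated([2, 1, 0], 3)).coeffs)  # [8, 12, 6]
```

Letting n grow without bound recovers the coefficients of the power series of (2+x)^{3}, which is the sense in which the power series ring is the limit of these truncations.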

We now give more precise definitions. Let F be a finite extension of \mathbb{Q}, and let k be a finite field. A Galois representation \overline{\rho}:\text{Gal}(\overline{F}/F)\to \text{GL}_{n}(k) is also called a residual representation. Now let W(k) be the ring of Witt vectors of k; for example, if k=\mathbb{F}_{p}, then W(k)=\mathbb{Z}_{p}. A lift, or framed deformation, of the residual representation \overline{\rho} is a Galois representation \rho:\text{Gal}(\overline{F}/F)\to \text{GL}_{n}(A) where A is a complete Noetherian local W(k)-algebra, such that modding out by the unique maximal ideal of A gives the residual representation \overline{\rho}. A deformation of \overline{\rho} is an equivalence class of lifts of \overline{\rho}, where two lifts are considered equivalent if they are conjugate by an element of the kernel of the reduction map \text{GL}_{n}(A)\to \text{GL}_{n}(k).

Consider the functor \text{Def}_{\overline{\rho}}^{\Box} from the category of complete Noetherian local W(k)-algebras to the category of sets, which assigns to a complete Noetherian local W(k)-algebra A the set of all lifts of \overline{\rho} to A. This functor happens to be representable, i.e. there is a Galois representation \rho^{\Box}:\text{Gal}(\overline{F}/F)\to \text{GL}_{n}(R_{\overline{\rho}}^{\Box}) over some ring R_{\overline{\rho}}^{\Box} called the universal framed deformation ring, such that the lifts of \overline{\rho} to A correspond to maps R_{\overline{\rho}}^{\Box}\to A: every lift is obtained from the universal lift \rho^{\Box} by composing with the map \text{GL}_{n}(R_{\overline{\rho}}^{\Box})\to \text{GL}_{n}(A) induced by such a map.

We can also do the same for deformations instead of framed deformations, as long as our residual representation satisfies a condition called “Schur’s condition”.

We can also impose conditions on our deformations – for instance, we may want to consider only lifts with a certain fixed determinant. These conditions are also called deformation problems and they are important because it is conjectured that Galois representations coming from modular forms have certain properties, and we want to match up these Galois representations with modular forms.

Roughly, the way these are matched up goes in the following manner. We have said above that deformations of a certain fixed Galois representation \overline{\rho} to A, possibly with some conditions, correspond to maps R_{\overline{\rho},\mathrm{conditions}}\to A. We state that, given an isomorphism between the complex numbers and the p-adic complex numbers we can always construct a map R_{\overline{\rho}, \mathrm{conditions} }\to \mathbb{C} from the preceding map.

Now a Hecke algebra \mathbb{T} acts on Hecke eigenforms (which we want to match up with the Galois representations, to show that these Galois representations come from them), and each eigenform therefore has an associated system of eigenvalues. It is known that any such system of eigenvalues comes from some Hecke eigenform.

We choose only a localization of the Hecke algebra, which we call \mathbb{T}_{\mathfrak{m}}, corresponding to only the modular forms that are expected to give rise to the Galois representations we are considering (the Eichler-Shimura theorem gives relations between the Fourier coefficients of the Hecke eigenform and the form of the characteristic polynomial of the Frobenius under the Galois representation, restricting it). On the other hand, these systems of eigenvalues correspond to maps \mathbb{T}_{\mathfrak{m}}\to \mathbb{C}.

So if we can show that R_{\overline{\rho}, \mathrm{conditions} }=\mathbb{T}_{\mathfrak{m}}, then these two sets of maps to \mathbb{C} match up, and we can conclude that these Galois representations come from modular forms. Showing that R_{\overline{\rho}, \mathrm{conditions} }=\mathbb{T}_{\mathfrak{m}} is itself an elaborate process that involves a fascinating strategy pioneered by Richard Taylor and Andrew Wiles known as patching. We will hopefully discuss R=T theorems, and the method of patching, on this blog in more detail in the future.

References:

Deformation on Wikipedia

Modularity Lifting Theorems by Toby Gee

Modularity Lifting (Course Notes) by Patrick Allen

Motives and L-functions by Frank Calegari

Beyond the Taylor-Wiles Method by Jack Thorne

Perturbations, Deformations, and Variations (and “Near-Misses”) in Geometry, Physics, and Number Theory by Barry Mazur

Hecke Operators

A Hecke operator is a certain kind of linear transformation on the space of modular forms or cusp forms (see also Modular Forms) of a certain fixed weight k. They were originally used by (and are now named after) Erich Hecke, who used them to study L-functions (see also Zeta Functions and L-Functions) and in particular to determine the conditions for whether an L-series \sum_{n=1}^{\infty}a_{n}n^{-s} has an Euler product. Together with the meromorphic continuation and the functional equation, the Euler product is one of the important properties of the Riemann zeta function, which L-functions are supposed to be generalizations of. Hecke’s study was inspired by the work of Bernhard Riemann on the zeta function.

An example of a Hecke operator is the one commonly denoted T_{p}, for p a prime number. To understand it conceptually, we must take the view of modular forms as functions on lattices. This is equivalent to the definition of modular forms as functions on the upper half-plane, if we recall that a lattice \Lambda can also be expressed as \mathbb{Z}+\tau\mathbb{Z} where \tau is a complex number in the upper half-plane (see also The Moduli Space of Elliptic Curves).

In this view a modular form is a function on the space of lattices on \mathbb{C} such that

  • f(\mathbb{Z}+\tau\mathbb{Z}) is holomorphic as a function on the upper half-plane
  • f(\mathbb{Z}+\tau\mathbb{Z}) is bounded as \tau goes to i\infty
  • f(\mu\Lambda)=\mu^{-k}f(\Lambda) for every nonzero complex number \mu, where k is the weight of the modular form

Now we define the Hecke operator T_{p} by what it does to a modular form f(\Lambda) of weight k as follows:

\displaystyle T_{p}f(\Lambda)=p^{k-1}\sum_{\Lambda'\subset \Lambda}f(\Lambda')

where \Lambda' runs over the sublattices of \Lambda of index p. In other words, applying T_{p} to a modular form gives back a modular form whose value on a lattice \Lambda is the sum of the values of the original modular form on the sublattices of \Lambda  of index p, times some factor that depends on the Hecke operator and the weight of the modular form.
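To make “the sublattices of index p” concrete, here is a small Python sketch that lists them for the lattice \mathbb{Z}+\tau\mathbb{Z}. It writes a lattice element x+y\tau as the pair (x, y), and relies on the standard fact (an ingredient not spelled out in the text) that every sublattice of index n has a unique basis of the form (a, 0), (b, d) with ad=n and 0\leq b<a. The function name is illustrative.

```python
def index_n_sublattices(n):
    """Bases of all sublattices of Z^2 of index n.

    Each sublattice has a unique basis of the form (a, 0), (b, d)
    with a * d = n and 0 <= b < a.
    """
    return [
        ((a, 0), (b, n // a))
        for a in range(1, n + 1)
        if n % a == 0
        for b in range(a)
    ]

for p in [2, 3, 5, 7]:
    print(p, len(index_n_sublattices(p)))  # p + 1 sublattices each time

# For p = 3: the basis (1, 0), (0, 3) is Z + 3*tau*Z, and the bases
# (3, 0), (j, 1) for j = 0, 1, 2 are 3Z + (j + tau)Z.
print(index_n_sublattices(3))
```

For a prime p this gives p+1 sublattices, so the sum defining T_{p} has p+1 terms; for general n the count is the sum of the divisors of n.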

Hecke operators are also often defined via their effect on the Fourier expansion of a modular form. Let f(\tau) be a modular form of weight k whose Fourier expansion is given by \sum_{n=0}^{\infty}a_{n}q^{n}, where we have adopted the convention q=e^{2\pi i \tau} which is common in the theory of modular forms (hence this Fourier expansion is also known as a q-expansion). Then the effect of a Hecke operator T_{p} is as follows:

\displaystyle T_{p}f(\tau)=\sum_{n=0}^{\infty}(a_{pn}+p^{k-1}a_{n/p})q^{n}

where a_{n/p}=0 when p does not divide n. To see why this follows from our first definition of the Hecke operator, we note that if our lattice is given by \mathbb{Z}+\tau\mathbb{Z}, there are p+1 sublattices of index p: There are p of these sublattices given by p\mathbb{Z}+(j+\tau)\mathbb{Z} for j ranging from 0 to p-1, and another one given by \mathbb{Z}+(p\tau)\mathbb{Z}. Let us split up the Hecke operators as follows:

\displaystyle T_{p}f(\mathbb{Z}+\tau\mathbb{Z})=p^{k-1}\sum_{j=0}^{p-1}f(p\mathbb{Z}+(j+\tau)\mathbb{Z})+p^{k-1}f(\mathbb{Z}+p\tau\mathbb{Z})=\Sigma_{1}+\Sigma_{2}

where \Sigma_{1}=p^{k-1}\sum_{j=0}^{p-1}f(p\mathbb{Z}+(j+\tau)\mathbb{Z}) and \Sigma_{2}=p^{k-1}f(\mathbb{Z}+p\tau\mathbb{Z}). Let us focus on the former first. We have

\displaystyle \Sigma_{1}=p^{k-1}\sum_{j=0}^{p-1}f(p\mathbb{Z}+(j+\tau)\mathbb{Z})

But applying the third property of modular forms above, namely that f(\mu\Lambda)=\mu^{-k}f(\Lambda) with \mu=p, we have

\displaystyle \Sigma_{1}=p^{-1}\sum_{j=0}^{p-1}f(\mathbb{Z}+((j+\tau)/p)\mathbb{Z})

Now the arguments of the modular forms being summed are written in the usual way, except that instead of \tau we have (j+\tau)/p, so we expand them as Fourier series

\displaystyle \Sigma_{1}=p^{-1}\sum_{j=0}^{p-1}\sum_{n=0}^{\infty}a_{n}e^{2\pi i n((j+\tau)/p)}

We can switch the summations since one of them is finite

\displaystyle \Sigma_{1}=p^{-1}\sum_{n=0}^{\infty}\sum_{j=0}^{p-1}a_{n}e^{2\pi i n((j+\tau)/p)}

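The inner sum over j of the factors e^{2\pi i nj/p} is zero unless p divides n, in which case the sum is equal to p, which cancels the factor p^{-1}. In the surviving terms we write n=pm, so that e^{2\pi i n\tau/p}=q^{m}; renaming m back to n, this gives us

\displaystyle \Sigma_{1}=\sum_{n=0}^{\infty}a_{pn}q^{n}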

where again q=e^{2\pi i \tau}. Now consider \Sigma_{2}. We have

\displaystyle \Sigma_{2}=p^{k-1}f(\mathbb{Z}+p\tau\mathbb{Z})

Expanding the right hand side into a Fourier series, we have

\displaystyle \Sigma_{2}=p^{k-1}\sum_{n}a_{n}e^{2\pi i n p\tau}

Reindexing, we have

\displaystyle \Sigma_{2}=p^{k-1}\sum_{n}a_{n/p}q^{n}

and adding together \Sigma_{1} and \Sigma_{2} gives us our result.
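The formula for T_{p} on q-expansions is easy to try out. The sketch below is an illustration under assumptions not made in the text: it uses the weight 12 cusp form \Delta=q\prod_{n\geq 1}(1-q^{n})^{24} as a test case, and the function names are made up for this example. It applies T_{2} and T_{3} to a truncated q-expansion of \Delta and compares the result with multiples of \Delta.

```python
def delta_coeffs(nmax):
    """Coefficients [a_0, ..., a_nmax] of Delta = q * prod_{n>=1} (1 - q^n)^24."""
    series = [0] * (nmax + 1)
    series[0] = 1
    for n in range(1, nmax + 1):
        for _ in range(24):
            # Multiply the truncated series in place by (1 - q^n).
            for i in range(nmax, n - 1, -1):
                series[i] -= series[i - n]
    # The leading factor q shifts every coefficient up by one.
    return [0] + series[:nmax]

def hecke_tp(a, p, k):
    """Apply T_p to a weight-k q-expansion with coefficients a[0..M].

    The n-th output coefficient is a[p*n] + p^(k-1) * a[n/p], where the
    second term is omitted when p does not divide n. Only the coefficients
    up to M // p are determined by the truncated input.
    """
    M = len(a) - 1
    return [
        a[p * n] + (p ** (k - 1) * a[n // p] if n % p == 0 else 0)
        for n in range(M // p + 1)
    ]

delta = delta_coeffs(30)
print(delta[1:7])  # [1, -24, 252, -1472, 4830, -6048]

t2 = hecke_tp(delta, 2, 12)
t3 = hecke_tp(delta, 3, 12)
print(t2 == [-24 * c for c in delta[: len(t2)]])
print(t3 == [252 * c for c in delta[: len(t3)]])
```

Both comparisons should print True: \Delta is an eigenvector for T_{2} and T_{3}, and the eigenvalues -24 and 252 are its Fourier coefficients a_{2} and a_{3}.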

The Hecke operators can be defined not only for prime numbers, but for all natural numbers, and any two Hecke operators T_{m} and T_{n} commute with each other. They preserve the weight of a modular form, and take cusp forms to cusp forms (this can be seen via their effect on the Fourier series). We can also define Hecke operators for modular forms with level structure, but it is more complicated and has some subtleties when for the Hecke operator T_{n} we have n sharing a common factor with the level.

If a cusp form f is an eigenvector for a Hecke operator T_{n}, and it is normalized, i.e. its Fourier coefficient a_{1} is equal to 1, then the corresponding eigenvalue of the Hecke operator T_{n} on f is precisely the Fourier coefficient a_{n}.

Now the Hecke operators satisfy the following multiplicativity properties:

  • T_{m}T_{n}=T_{mn} for m and n mutually prime
  • T_{p^{n}}T_{p}=T_{p^{n+1}}+p^{k-1}T_{p^{n-1}} for p prime

Suppose we have an L-series \sum_{n}a_{n}n^{-s}. This L-series will have an Euler product if and only if the coefficients a_{n} satisfy the following:

  • a_{m}a_{n}=a_{mn} for m and n mutually prime
  • a_{p^{n}}a_{p}=a_{p^{n+1}}+p^{k-1}a_{p^{n-1}} for p prime

Given that the Fourier coefficients of a normalized Hecke eigenform (a normalized cusp form that is a simultaneous eigenvector for all the Hecke operators) are the eigenvalues of the Hecke operators, we see that the L-series of a normalized Hecke eigenform has an Euler product.
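The link between the prime-power recurrence and the Euler product can be seen in a short computation. If a_{p^{n+1}}=a_{p}a_{p^{n}}-p^{k-1}a_{p^{n-1}}, then multiplying \sum_{n}a_{p^{n}}X^{n} by 1-a_{p}X+p^{k-1}X^{2} leaves just 1, so the local factor at p is (1-a_{p}p^{-s}+p^{k-1-2s})^{-1} after setting X=p^{-s}. The Python sketch below checks this on truncated series; the choice a_{2}=-24 with k=12 (the coefficient of the weight 12 cusp form \Delta) and the function names are assumptions made for illustration.

```python
def prime_power_coeffs(a_p, p, k, count):
    """Return [a_1, a_p, a_{p^2}, ...] (count terms) from the recurrence
    a_{p^(n+1)} = a_p * a_{p^n} - p^(k-1) * a_{p^(n-1)}, starting from a_1 = 1."""
    coeffs = [1, a_p]
    while len(coeffs) < count:
        coeffs.append(a_p * coeffs[-1] - p ** (k - 1) * coeffs[-2])
    return coeffs[:count]

def times_euler_denominator(coeffs, a_p, p, k):
    """Multiply sum_n coeffs[n] X^n by 1 - a_p X + p^(k-1) X^2, truncated
    to the same number of terms as coeffs."""
    out = []
    for n, c in enumerate(coeffs):
        term = c
        if n >= 1:
            term -= a_p * coeffs[n - 1]
        if n >= 2:
            term += p ** (k - 1) * coeffs[n - 2]
        out.append(term)
    return out

tau_powers = prime_power_coeffs(-24, 2, 12, 6)
print(tau_powers[:3])  # [1, -24, -1472]
print(times_euler_denominator(tau_powers, -24, 2, 12))  # [1, 0, 0, 0, 0, 0]
```

Together with multiplicativity for coprime indices, this is what lets the whole L-series factor as a product over primes.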

In addition to the Hecke operators T_{n}, there are also other closely related operators such as the diamond operator \langle n\rangle and another operator denoted U_{p}. These and more on Hecke operators, such as other ways to define them with double coset operators or Hecke correspondences will hopefully be discussed in future posts.

References:

Hecke Operator on Wikipedia

Modular Forms by Andrew Snowden

Congruences between Modular Forms by Frank Calegari

A First Course in Modular Forms by Fred Diamond and Jerry Shurman

Advanced Topics in the Arithmetic of Elliptic Curves by Joseph H. Silverman

Iwasawa theory, p-adic L-functions, and p-adic modular forms

In Bernoulli Numbers, Fermat’s Last Theorem, and the Riemann Zeta Function, we introduced the Kubota-Leopoldt p-adic L-function, which encodes the congruences discovered by Kummer between special values of the Riemann zeta function. In this post, we will connect them to Iwasawa theory and p-adic modular forms.

Let us start with a little introduction to Iwasawa theory. Consider the Galois group \text{Gal}(\mathbb{Q}(\mu_{p^{\infty}})/\mathbb{Q}), where \mathbb{Q}(\mu_{p^{\infty}}) is the extension of the rational numbers \mathbb{Q} obtained by adjoining all the p-th-power roots of unity to \mathbb{Q}. This Galois group is isomorphic to \mathbb{Z}_{p}^{\times}, the group of units of the p-adic integers \mathbb{Z}_p.

The group \mathbb{Z}_{p}^{\times} decomposes (for p odd) into the product of a group isomorphic to 1+p\mathbb{Z}_{p} and a group isomorphic to the (p-1)-th roots of unity. Let \Gamma be the subgroup of this Galois group isomorphic to 1+p\mathbb{Z}_{p}. The Iwasawa algebra is defined to be the completed group ring \mathbb{Z}_{p}[[\Gamma]], which also happens to be isomorphic to the power series ring \mathbb{Z}_{p}[[T]].
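The decomposition of \mathbb{Z}_{p}^{\times} can be computed explicitly modulo a power of p. The following Python sketch assumes p is odd and works modulo p^{n}; the root-of-unity part is the Teichmüller representative, obtained by raising to the power p^{n-1}. The function name and the sample values are only for illustration.

```python
def decompose_unit(a, p, n):
    """Split a unit a mod p^n as omega * u, with omega^(p-1) = 1 and u = 1 mod p.

    Assumes p is an odd prime and a is not divisible by p.
    """
    modulus = p ** n
    omega = pow(a, p ** (n - 1), modulus)      # Teichmuller representative mod p^n
    u = a * pow(omega, -1, modulus) % modulus  # the part lying in 1 + p Z_p
    return omega, u

p, n = 5, 4
for a in [2, 3, 7, 11]:
    omega, u = decompose_unit(a, p, n)
    # The last two columns should be 1 and 1: omega is a (p-1)-th root of
    # unity mod p^n, and u is congruent to 1 mod p.
    print(a, omega, u, pow(omega, p - 1, p ** n), u % p)
```

Note that pow with exponent -1 computes a modular inverse and needs Python 3.8 or later.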

The interest in the Iwasawa algebra comes from the fact that many important objects of interest in number theory are modules over the Iwasawa algebra, and such modules have a structure that makes them easy to study. For instance, the inverse limit of the p-part of the ideal class groups of cyclotomic fields is such a module. The “main conjecture of Iwasawa theory”, a high-powered version of Kummer’s theorem that relates ideal class groups and Bernoulli numbers, describes this module. Namely, the main conjecture of Iwasawa theory states that as a module over the Iwasawa algebra, the inverse limit of the p-part of the ideal class groups of cyclotomic fields has a characteristic ideal generated by none other than the Kubota-Leopoldt p-adic L-function!

Let us describe more the relation between the Iwasawa algebra and the Kubota-Leopoldt zeta function by relating them to measures. Our measure here takes functions on the group \mathbb{Z}_p^{\times} and gives an element of \mathbb{Z}_{p}. This should remind us of measures and integrals in real analysis, except instead of our functions being on \mathbb{R}, they are on the group \mathbb{Z}_{p}^{\times}, and instead of taking values in \mathbb{R}, they take values in \mathbb{Z}_{p}. This is just an example of a more general kind of measure.

Now these measures are actually in one-to-one correspondence with the elements of the Iwasawa algebra!

The Iwasawa algebra is \mathbb{Z}_{p}[[\Gamma]], and note that \Gamma is a subgroup of \mathbb{Z}_{p}^{\times}. Suppose we have an element of the Iwasawa algebra. We define the corresponding measure by saying what it does to a function f on \mathbb{Z}_{p}^{\times}. Note that if we extend this function linearly, we can evaluate it on the element of the Iwasawa algebra and get an element of \mathbb{Z}_{p}. Thus we define our measure by evaluation. The other direction is a bit more involved, but given the measure, we build an element of the Iwasawa algebra by exploiting the profinite nature of \mathbb{Z}_{p}^{\times}, which means the measure was built from functions on the finite pieces of it.
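At a finite level this “measure by evaluation” is very concrete. The Python sketch below is a simplified model under stated assumptions: it replaces the profinite group by the finite group (\mathbb{Z}/p^{n}\mathbb{Z})^{\times}, represents a group ring element as a dictionary from group elements to integer coefficients, and uses illustrative names and sample values. It integrates a function by extending it linearly, and checks that integrating a multiplicative function turns the group ring product (convolution of measures) into an ordinary product mod p^{n}.

```python
from collections import defaultdict

def integrate(f, mu):
    """Integrate f against mu by extending f linearly to the group ring."""
    return sum(coeff * f(g) for g, coeff in mu.items())

def convolve(mu, nu, modulus):
    """Product in the group ring of (Z/modulus)^x, i.e. convolution of measures."""
    out = defaultdict(int)
    for g, c in mu.items():
        for h, d in nu.items():
            out[g * h % modulus] += c * d
    return dict(out)

p, n = 5, 3
modulus = p ** n
mu = {1: 2, 7: -1, 24: 3}  # 2[1] - [7] + 3[24], an arbitrary group ring element
nu = {2: 1, 13: 4}         # [2] + 4[13]

def square(g):
    # A multiplicative function on the units mod p^n (a character, mod p^n).
    return pow(g, 2, modulus)

lhs = integrate(square, convolve(mu, nu, modulus)) % modulus
rhs = integrate(square, mu) * integrate(square, nu) % modulus
print(lhs == rhs)
```

Passing to the limit over n is what turns these finite group rings into the completed group ring, and these finite sums into p-adic integrals.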

Now that we know how the Iwasawa algebra and measures are related, what about the Kubota-Leopoldt zeta function? For that we must now take a detour through p-adic modular forms, in particular p-adic Eisenstein series.

The reason modular forms are brought into this is that the value of the zeta function at 1-k shows up in the constant term in the Fourier expansion of the Eisenstein series G_{k}:

\displaystyle G_{k}(\tau):=\frac{\zeta(1-k)}{2}+\sum_{n=1}^{\infty}\left(\sum_{d\vert n}d^{k-1}\right)q^{n}

where q=e^{2\pi i \tau}, as is common convention in the theory (hence the Fourier expansion is also known as the q-expansion). This Eisenstein series G_{k} is a modular form of weight k. A similar relationship holds between the Kubota-Leopoldt p-adic L-function and p-adic Eisenstein series, the latter of which is an example of a p-adic modular form. We will define this now. Let f be a modular form defined over \mathbb{Q}. This means that, when we consider its Fourier expansion

\displaystyle f(\tau)=\sum_{n=0}^{\infty}a_{n}q^{n},

the coefficients a_{n} are rational numbers. We define a p-adic valuation on the space of modular forms by taking the smallest p-adic valuation among the coefficients a_{n} (equivalently, the biggest power of p dividing all of them), i.e.

\displaystyle v_{p}(f)=\inf_{n} v_{p}(a_{n})

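We recall that the bigger the power of p dividing a rational number, the bigger its p-adic valuation, and hence the smaller its p-adic absolute value. This gives us a notion of two modular forms being close to each other, which lets us consider the limit of a sequence. A p-adic modular form is the limit of a sequence of classical modular forms.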

The weight of a p-adic modular form is the limit of the weights of the classical ones of which it is the limit. Serre showed that for classical modular forms f and g, if the p-adic valuation

\displaystyle v_{p}(f-g)\geq v_{p}(f)+m

for some m\geq 1, then the weights of f and g will be congruent mod (p-1)p^{m-1} (for odd p).
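As a sanity check on the valuation v_{p}(f) and on the relation between congruences of forms and congruences of weights, the following sketch compares truncated q-expansions of the Eisenstein series G_{6} and G_{10} for p=5. Their weights are congruent mod p-1=4, and the two forms turn out to be congruent mod 5. The helper names are made up for this example, and since only finitely many coefficients are inspected, the computed valuation of the difference is strictly speaking only an upper bound on the true infimum.

```python
from fractions import Fraction
from math import comb, inf


def bernoulli(n_max):
    """Bernoulli numbers B_0..B_{n_max} (B_1 = -1/2) from the usual recurrence."""
    B = [Fraction(1)]
    for n in range(1, n_max + 1):
        B.append(-sum(comb(n + 1, j) * B[j] for j in range(n)) / (n + 1))
    return B


def vp(x, p):
    """p-adic valuation of a rational number (infinity for 0)."""
    x = Fraction(x)
    if x == 0:
        return inf
    v, num, den = 0, x.numerator, x.denominator
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return v


def sigma(power, n):
    """Sum of d**power over the positive divisors d of n."""
    return sum(d ** power for d in range(1, n + 1) if n % d == 0)


def eisenstein_q_expansion(k, terms):
    """First `terms` coefficients of G_k, with constant term zeta(1-k)/2 = -B_k/(2k)."""
    B = bernoulli(k)
    return [-B[k] / (2 * k)] + [Fraction(sigma(k - 1, n)) for n in range(1, terms)]


def vp_series(coeffs, p):
    """Smallest p-adic valuation among the given coefficients."""
    return min(vp(a, p) for a in coeffs)


p, terms = 5, 30
G6 = eisenstein_q_expansion(6, terms)
G10 = eisenstein_q_expansion(10, terms)
difference = [a - b for a, b in zip(G6, G10)]
print(vp_series(G6, p), vp_series(difference, p))  # prints: 0 1
```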

This implies that the weight of a p-adic modular form takes values in the inverse limit of \mathbb{Z}/(p-1)p^{m}\mathbb{Z}, which is isomorphic to the product of \mathbb{Z}_{p} and \mathbb{Z}/(p-1)\mathbb{Z}. Here is where measures come in – this space of weights can be identified with characters of \mathbb{Z}_{p}^{\times}, i.e. a weight k is a function on \mathbb{Z}_{p}^{\times} and being such a function, it is an input for a measure!

Now, we will create a measure, with a bit of a twist. Given a weight k, we can build a p-adic Eisenstein series of weight k (recall that this is a limit of classical Eisenstein series, whose integer weights k_{i} converge to k in the space of weights):

\displaystyle G_{k}^{*}:=\lim_{i\to\infty}G_{k_{i}}

We think of this as a “measure” that takes a weight k (again recall that the weight k is a character, i.e. a function on \mathbb{Z}_{p}^{\times}) and gives a weight k Eisenstein series, i.e. an “Eisenstein measure”. But the value of the Kubota-Leopoldt zeta function at 1-k is the constant term in the Fourier expansion! Therefore, if we take the constant term of this p-adic Eisenstein series, we have a good old measure, a recipe for taking a function on \mathbb{Z}_{p}^{\times} (the weight k) and giving us an element of \mathbb{Z}_{p}. But by our earlier discussion, this is an element of the Iwasawa algebra!

There are some subtleties I swept under the rug, but to summarize – important objects in number theory are modules over the Iwasawa algebra. p-adic L-functions which interpolate L-functions at special values are elements of the Iwasawa algebra.

This is a modern, high-powered version of Kummer’s discovery that relates certain ideal class groups and Bernoulli numbers (which are special values of the Riemann zeta function). The Eisenstein measure, which gives a p-adic modular form when evaluated at a certain weight, leads to the notion of a “Hida family”, a “p-adic family” of p-adic modular forms. But that discussion is for another time!

References:

Iwasawa theory on Wikipedia

Iwasawa algebra on Wikipedia

p-adic L-function on Wikipedia

Main conjecture of Iwasawa theory on Wikipedia

An introduction to Eisenstein measures by E. E. Eischen

Modular curves and cyclotomic fields by Romyar Sharifi

Desde Fermat, Lamé y Kummer hasta Iwasawa: Una introducción a la teoría de Iwasawa (in Spanish) by Álvaro Lozano-Robledo

Bernoulli Numbers, Fermat’s Last Theorem, and the Riemann Zeta Function

The Bernoulli numbers B_{n} are defined by the Taylor series expansion

\displaystyle \frac{x}{e^{x}-1}=\sum_{n=0}^{\infty}B_{n}\frac{x^{n}}{n!}.

The n-th Bernoulli number B_{n} is zero for odd n, except for n=1, where it is equal to -1/2. For the first few even numbers, we have

\displaystyle B_0=1,\; B_{2}=\frac{1}{6}, \; B_{4}=-\frac{1}{30}, \; B_6=\frac{1}{42}, \; B_{8}=-\frac{1}{30}, \; B_{10}=\frac{5}{66}.
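These values can be reproduced with exact rational arithmetic. The sketch below uses the standard recurrence \sum_{j=0}^{n}\binom{n+1}{j}B_{j}=0 for n\geq 1, which follows from multiplying the generating function by e^{x}-1; the function name bernoulli is just a label chosen for this example.

```python
from fractions import Fraction
from math import comb


def bernoulli(n_max):
    """Return [B_0, ..., B_{n_max}] with the convention B_1 = -1/2."""
    B = [Fraction(1)]
    for n in range(1, n_max + 1):
        # Solve sum_{j=0}^{n} C(n+1, j) B_j = 0 for B_n.
        B.append(-sum(comb(n + 1, j) * B[j] for j in range(n)) / (n + 1))
    return B


B = bernoulli(10)
for n in (0, 1, 2, 4, 6, 8, 10):
    print(n, B[n])
```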

Bernoulli numbers have many interesting properties, and many mathematicians have studied them for a very long time. They are named after Jacob Bernoulli, but were also studied by Seki Takakazu in Japan at around the same time (end of 17th/beginning of 18th century). In this post I want to focus more on the work of Ernst Eduard Kummer, more than a century after Bernoulli and Seki.

We’re going to come back to Bernoulli numbers later, but for now let’s talk about something completely different – Fermat’s Last Theorem, which Kummer was working on. In the time of Kummer, there was a proposal to study Fermat’s Last Theorem by factoring the left-hand side of the famous equation into linear terms. Just as x^2+y^2 factors into

\displaystyle x^2+y^2=(x+iy)(x-iy),

we would have that, for an odd prime p, x^{p}+y^{p} also factors into

\displaystyle x^{p}+y^{p}=(x+y)(x+\zeta_{p}y)(x+\zeta_{p}^{2} y)\cdots(x+\zeta_{p}^{p-1} y)

where \zeta_{p} is a primitive p-th root of unity.
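A quick numerical check of this factorization, for an odd prime p, can be done with floating-point complex numbers. In the sketch below the product runs over all p powers of \zeta_{p}, including the k=0 factor x+y; the function name cyclotomic_product is made up for illustration.

```python
import cmath


def cyclotomic_product(x, y, p):
    """Multiply the factors x + zeta^k * y for k = 0, ..., p-1."""
    zeta = cmath.exp(2j * cmath.pi / p)  # a primitive p-th root of unity
    product = 1
    for k in range(p):
        product *= x + zeta ** k * y
    return product


# For p = 5 the product should agree with 2^5 + 3^5 up to rounding error.
value = cyclotomic_product(2, 3, 5)
print(abs(value - (2 ** 5 + 3 ** 5)) < 1e-9)
```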

However, there is a problem. In these kinds of numbers where p-th roots of unity are adjoined, factorization may not be unique! Hence Kummer developed the theory of “ideals” to study this (see also The Fundamental Theorem of Arithmetic and Unique Factorization).

Unique factorization does not work with the numbers themselves, but it works with ideals (this is true for number fields, since their rings of integers are what is called a “Dedekind domain”). Hence the original name of ideals was “ideal numbers”. To number fields we associate an “ideal class group”. If this group has only one element, unique factorization holds. If not, then things can get complicated. The ideal class group (together with the Galois group) is probably the most important group in number theory.

Kummer found that if p is a “regular prime”, i.e. if p does not divide the number of elements of the ideal class group (also known as the class number) of the “p-th cyclotomic field” (the rational numbers with p-th roots of unity adjoined), then Fermat’s Last Theorem is true for p.

Let’s go back to Bernoulli numbers now – Kummer also found that a prime p is regular if and only if it does not divide the numerator of the n-th Bernoulli number B_{n}, for all even n with 2\leq n\leq p-3. In other words, Kummer proved Fermat’s Last Theorem for prime exponents not dividing the numerators of these Bernoulli numbers! Fermat’s Last Theorem has now been proved in all cases, but the work of Kummer remains influential.
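Kummer's criterion is easy to try out by computer. The following sketch lists the primes below a bound that fail it, i.e. the irregular primes, by testing divisibility of the numerators of B_{n} for even n with 2\leq n\leq p-3. The helper names are chosen just for this example, and the approach is only meant for small bounds.

```python
from fractions import Fraction
from math import comb


def bernoulli(n_max):
    """Return [B_0, ..., B_{n_max}] with the convention B_1 = -1/2."""
    B = [Fraction(1)]
    for n in range(1, n_max + 1):
        B.append(-sum(comb(n + 1, j) * B[j] for j in range(n)) / (n + 1))
    return B


def is_prime(n):
    """Trial division, adequate for the small range used here."""
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))


def irregular_primes(bound):
    """Odd primes p < bound dividing the numerator of some B_n, n even, 2 <= n <= p-3."""
    B = bernoulli(bound)
    found = []
    for p in range(3, bound):
        if is_prime(p) and any(B[n].numerator % p == 0 for n in range(2, p - 2, 2)):
            found.append(p)
    return found


print(irregular_primes(100))  # prints: [37, 59, 67]
```

So Kummer's theorem covers every odd prime exponent below 100 except 37, 59, and 67.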

So we’ve related Bernoulli numbers to ideal class groups and the very famous Fermat’s Last Theorem. Now let us relate Bernoulli numbers to another very famous thing in math – the Riemann zeta function (see also Zeta Functions and L-Functions).

It is known that the Bernoulli numbers are related to values of the Riemann zeta function at the negative integers (so we need the analytic continuation to do this) by the following equation: B_n=-n \zeta(1-n) for n greater than or equal to 2.

Now, Kummer also discovered that Bernoulli numbers satisfy certain congruences modulo powers of a prime p, in particular

\displaystyle \frac{B_{m}}{m}\equiv \frac{B_{n}}{n} \mod p

where m and n are positive even integers neither of which is divisible by (p-1), and m\equiv n \mod (p-1). Here congruence for two rational numbers \frac{a}{b} and \frac{c}{d} (with b and d not divisible by p) means that ad is congruent to bc mod p.

We also have a more general congruence for bigger powers of p:

\displaystyle (1-p^{m-1})\frac{B_{m}}{m}\equiv (1-p^{n-1})\frac{B_{n}}{n} \mod p^{a+1}

where m and n are positive even integers neither of which is divisible by (p-1), and m\equiv n \mod \varphi(p^{a+1}), \varphi(p^{a+1}) being the number of positive integers less than p^{a+1} that are also mutually prime to it.
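Both congruences can be tested on small cases with exact fractions. In the sketch below, two rational numbers whose denominators are not divisible by p are treated as congruent mod p^{e} when the p-adic valuation of their difference is at least e, and the general congruence is taken modulo p^{a+1} for exponents congruent modulo \varphi(p^{a+1})=(p-1)p^{a}. The function names are made up for illustration.

```python
from fractions import Fraction
from math import comb, inf


def bernoulli(n_max):
    """Return [B_0, ..., B_{n_max}] with the convention B_1 = -1/2."""
    B = [Fraction(1)]
    for n in range(1, n_max + 1):
        B.append(-sum(comb(n + 1, j) * B[j] for j in range(n)) / (n + 1))
    return B


def vp(x, p):
    """p-adic valuation of a rational number (infinity for 0)."""
    x = Fraction(x)
    if x == 0:
        return inf
    v, num, den = 0, x.numerator, x.denominator
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return v


def congruent(x, y, p, exponent=1):
    """True if x and y agree modulo p**exponent as p-adic numbers."""
    return vp(x - y, p) >= exponent


def kummer_term(B, m, p):
    """The quantity (1 - p^(m-1)) * B_m / m from the general congruence."""
    return (1 - Fraction(p) ** (m - 1)) * B[m] / m


B = bernoulli(22)

# Simple form: m = 2, n = 8 are congruent mod p - 1 = 6 for p = 7.
print(congruent(B[2] / 2, B[8] / 8, 7))

# General form with p = 5, a = 1: m = 2, n = 22 are congruent mod phi(25) = 20.
print(congruent(kummer_term(B, 2, 5), kummer_term(B, 22, 5), 5, exponent=2))
```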

By our earlier discussion, this means the special values of the Riemann zeta function also satisfy congruences modulo powers of p.

Congruences modulo powers of p are encoded in modern language by the “p-adic numbers” (see also Valuations and Completions) introduced by Kurt Hensel near the end of the 19th century. The congruences between the special values of the Riemann zeta function are now similarly encoded in a p-adic analytic function known as the Kubota-Leopoldt p-adic L-function.

So again, to summarize the story so far – Bernoulli numbers are related to the ideal class group and also to the special values of the Riemann zeta function, and bridge the two subjects.

If this reminds you of the analytic class number formula, well in fact that is one of the ingredients in the proof of Kummer’s result relating regular primes and the Bernoulli numbers. Moreover, the information that they encode is related to divisibility or congruence modulo primes or their powers. This is where the p-adic L-functions come in.

The Bernoulli numbers also appear in the constant term of the Fourier expansion of Eisenstein series. The Eisenstein series is an example of a modular form (see also Modular Forms), and modular forms give us Galois representations. The Galois group, on the other hand, is related to the ideal class group by class field theory (see also Some Basics of Class Field Theory). So this is one way to create the bridge between the two concepts. In fact, this was used to prove the Herbrand-Ribet theorem, a stronger version of Kummer’s result.

So we also have modular forms in the picture. In modern research all of these are deeply intertwined – ideal class groups, zeta functions, congruences, and modular forms.

References:

Bernoulli number on Wikipedia

Riemann zeta function on Wikipedia

Kummer’s congruence on Wikipedia

p-adic L-function on Wikipedia

Herbrand-Ribet theorem on Wikipedia

Bernoulli numbers, Hurwitz numbers, p-adic L-functions and Kummer’s criterion by Alvaro Lozano-Robledo

An introduction to Eisenstein measures by E. E. Eischen

How can we construct abelian Galois extensions of basic number fields? by Barry Mazur

Modular Forms

We have previously mentioned modular forms in The Moduli Space of Elliptic Curves and discussed them very briefly in the context of modular curves in Shimura Varieties. In this post, we will discuss this very important and central concept in modern number theory in more detail.

First we recall some facts about the group \text{SL}_{2}(\mathbb{Z}), which is so important that it is given the special name of the modular group. It is defined as the group of 2\times 2 matrices with integer coefficients and determinant equal to 1, and it acts on the upper half-plane (the set of complex numbers with positive imaginary part) in the following manner. Suppose an element \gamma of \text{SL}_{2}(\mathbb{Z}) is written in the form \left(\begin{array}{cc}a&b\\ c&d\end{array}\right). Then for \tau an element of the upper half-plane we write

\displaystyle \gamma(\tau)=\frac{a\tau+b}{c\tau+d}

A modular form (with respect to \text{SL}_{2}(\mathbb{Z})) is a holomorphic function on the upper half-plane such that

\displaystyle f(\gamma(\tau))=(c\tau+d)^{k}f(\tau)

for some integer k and all \gamma in \text{SL}_{2}(\mathbb{Z}), and such that f(\tau) is bounded as the imaginary part of \tau goes to infinity. The number k is called the weight of the modular form. If the function is not required to be bounded as the imaginary part of \tau goes to infinity it is a weakly modular form, and if furthermore it is merely required to be meromorphic, it is a meromorphic modular form. A meromorphic modular form of weight 0 is just a meromorphic function on the upper half-plane which is invariant under the action of \text{SL}_{2}(\mathbb{Z}) (and meromorphic at infinity, i.e. with at worst a pole as the imaginary part of its argument goes to infinity) – we also call it a modular function.

We denote the set of modular forms of weight k with respect to \text{SL}_{2}(\mathbb{Z}) by \mathcal{M}_{k}(\text{SL}_{2}(\mathbb{Z})). Adding together two modular forms of the same weight gives another modular form of the same weight, and modular forms can be scaled by a complex number, so \mathcal{M}_{k}(\text{SL}_{2}(\mathbb{Z})) actually forms a vector space. We can also multiply a modular form of weight k with a modular form of weight l to get a modular form of weight k+l, so modular forms of a certain weight form a graded piece of a graded ring \mathcal{M}(\text{SL}_{2}(\mathbb{Z})):

\displaystyle \mathcal{M}(\text{SL}_{2}(\mathbb{Z}))=\bigoplus_{k}\mathcal{M}_{k}(\text{SL}_{2}(\mathbb{Z}))

Modular functions are actually functions on the moduli space of elliptic curves – but what about modular forms of higher weight? It turns out that the modular forms of weight 2 correspond to coefficients of differential forms on this space. To see this, consider d\tau and how the group \text{SL}_{2}(\mathbb{Z}) acts on it:

\displaystyle d\gamma(\tau)=\gamma'(\tau)d\tau=(c\tau+d)^{-2}d\tau

where \gamma'(\tau) is just the usual derivative of the action of \gamma as described earlier (the numerator of the derivative is ad-bc=1). For a general differential form given by f(\tau)d\tau to be invariant under the action of \text{SL}_{2}(\mathbb{Z}) we must therefore have

\displaystyle f(\gamma(\tau))=(c\tau+d)^{2}f(\tau).

The modular forms of weight greater than 2 arise when we consider products of these differential forms. More technically, modular forms are sections of line bundles on modular curves, which come about when we compactify moduli spaces of elliptic curves (possibly with extra structure).

Let us now look at some examples of modular forms. Since modular forms “live on” moduli spaces of elliptic curves, we will keep in mind elliptic curves as we look at these examples. Our first family of examples are the Eisenstein series of weight k (for even k\geq 4), denoted by G_{k}(\tau), which are of the form

\displaystyle G_{k}(\tau)=\sum_{(m,n)\in\mathbb{Z}^{2}\setminus \{(0,0)\}}\frac{1}{(m+n\tau)^{k}}

Any modular form with respect to \text{SL}_{2}(\mathbb{Z}) can in fact be written as a polynomial in the Eisenstein series G_{4}(\tau) and G_{6}(\tau).
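The modularity of the lattice sum G_{k} can be checked numerically. The sketch below truncates the sum to the square |m|,|n|\leq N (the cutoff N and the function name are choices made for this example, not part of the definition). Because this square is preserved by the substitution (m,n)\mapsto(-n,m), the truncated sum already satisfies the transformation law for \tau\mapsto -1/\tau up to rounding error; the law for \tau\mapsto\tau+1 only holds approximately after truncation.

```python
def eisenstein_G(k, tau, N=30):
    """Truncated lattice sum: 1/(m + n*tau)^k over |m|, |n| <= N, (m, n) != (0, 0)."""
    total = 0j
    for m in range(-N, N + 1):
        for n in range(-N, N + 1):
            if m == 0 and n == 0:
                continue
            total += 1 / (m + n * tau) ** k
    return total


tau = 0.3 + 1.2j
for k in (4, 6):
    lhs = eisenstein_G(k, -1 / tau)
    rhs = tau ** k * eisenstein_G(k, tau)
    print(k, abs(lhs - rhs) < 1e-9)

# At tau = i the law reads G_6(i) = i^6 G_6(i) = -G_6(i), which forces G_6(i) = 0.
print(abs(eisenstein_G(6, 1j)) < 1e-9)
```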

Now, let us relate this to elliptic curves. An elliptic curve over the complex numbers may be written as a Weierstrass equation

\displaystyle y^{2}=4x^{3}-g_{2}x-g_{3}

The coefficients on the right-hand side g_{2} and g_{3} are in fact modular forms, of weight 4 and weight 6 respectively, given in terms of the Eisenstein series by g_{2}(\tau)=60G_{4}(\tau) and g_{3}(\tau)=140G_{6}(\tau).

Another example of a modular form is the modular discriminant of an elliptic curve, as a modular form denoted \Delta(\tau). It is a modular form of weight 12, and can be expressed via the elliptic curve coefficients that we defined earlier:

\displaystyle \Delta(\tau)=(g_{2}(\tau))^{3}-27(g_{3}(\tau))^{2}.

Our final example in this post is not of a modular form, but a meromorphic modular form of weight 0, i.e. a modular function. It is holomorphic on the upper half-plane, but goes to infinity as the imaginary part of \tau goes to infinity. It is the j-invariant associated to an elliptic curve. Once again we may express it in terms of the elliptic curve coefficients g_{2} and g_{3}:

\displaystyle j(\tau)=1728\frac{(g_{2}(\tau))^{3}}{(g_{2}(\tau))^{3}-27(g_{3}(\tau))^{2}}

Note that the denominator is also the modular discriminant.  The points of the moduli space of elliptic curves correspond to isomorphism classes of elliptic curves, and since the j-invariant is an honest-to-goodness holomorphic function on the moduli space of elliptic curves over \mathbb{C}, we can see that isomorphic elliptic curves will have the same j-invariant. This is not the case for the other modular forms we described above, which are not modular functions, i.e. they have nonzero weight! Why is this so? Let us recall that an elliptic curve over \mathbb{C} corresponds to a lattice. Acting on a basis of this lattice by an element of \text{SL}_{2}(\mathbb{Z}) changes the basis, but preserves the lattice. This will be reflected as “admissible changes of coordinates” in the Weierstrass equations, and also changes these modular forms associated to the elliptic curves even though the elliptic curves are still isomorphic. But they change in a predictable way, according to the definition of modular forms.
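To see the contrast between weight 0 and nonzero weight numerically, here is a small sketch that builds g_{2}, g_{3} and j from truncated Eisenstein lattice sums. The truncation cutoff N and the function names are assumptions made for this example, so the values are only approximations of the true ones. Under \tau\mapsto -1/\tau the value of g_{2} changes by the factor \tau^{4}, while j stays the same.

```python
def eisenstein_G(k, tau, N=30):
    """Truncated lattice sum: 1/(m + n*tau)^k over |m|, |n| <= N, (m, n) != (0, 0)."""
    total = 0j
    for m in range(-N, N + 1):
        for n in range(-N, N + 1):
            if m == 0 and n == 0:
                continue
            total += 1 / (m + n * tau) ** k
    return total


def g2(tau):
    return 60 * eisenstein_G(4, tau)


def g3(tau):
    return 140 * eisenstein_G(6, tau)


def j_invariant(tau):
    """j = 1728 * g2^3 / (g2^3 - 27 * g3^2), using the truncated sums above."""
    a, b = g2(tau) ** 3, 27 * g3(tau) ** 2
    return 1728 * a / (a - b)


tau = 0.3 + 1.2j
print(abs(g2(-1 / tau) - tau ** 4 * g2(tau)) < 1e-6)        # weight 4: picks up tau^4
print(abs(j_invariant(-1 / tau) - j_invariant(tau)) < 1e-6)  # weight 0: unchanged

# The square lattice (tau = i) has g3 = 0, hence j = 1728.
print(abs(j_invariant(1j) - 1728) < 1e-6)
```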

A modular form f(\tau) is also called a cusp form if the limit of f(\tau) is zero as the imaginary part of \tau approaches infinity. We denote the set of cusp forms of weight k by \mathcal{S}_{k}(\text{SL}_{2}(\mathbb{Z})). They are a vector subspace of \mathcal{M}_{k}(\text{SL}_{2}(\mathbb{Z})) and the direct sum of these subspaces over all k, denoted \mathcal{S}(\text{SL}_{2}(\mathbb{Z})), is an ideal of the graded ring \mathcal{M}(\text{SL}_{2}(\mathbb{Z})). Cusp forms form a very important part of modern research, but we will not discuss them much in this introductory post and leave them for the future.

Let us now discuss congruence subgroups of \text{SL}_{2}(\mathbb{Z}) (we have also discussed this briefly in Shimura Varieties), so that we can define more general modular forms with respect to such a congruence subgroup instead of just \text{SL}_{2}(\mathbb{Z}). Given an integer N, the principal congruence subgroup \Gamma(N) of \text{SL}_{2}(\mathbb{Z}) is the subgroup consisting of the elements which reduce to the identity when we reduce the entries modulo N. A congruence subgroup is any subgroup \Gamma that contains the principal congruence subgroup \Gamma(N) for some N. We refer to N as the level of the congruence subgroup.

There are two important kinds of congruence subgroups of \text{SL}_{2}(\mathbb{Z}), denoted by \Gamma_{0}(N) and \Gamma_{1}(N). The subgroup \Gamma_{0}(N) consists of the elements that become upper triangular after reduction modulo N, while the subgroup \Gamma_{1}(N) consists of the elements that become upper triangular with ones on the diagonal after reduction modulo N. As we discussed in Shimura Varieties, these are related to moduli spaces of “elliptic curves with level structure”.
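The three definitions translate directly into conditions on the entries of a matrix modulo N. Here is a minimal sketch, with a matrix written as a tuple (a, b, c, d); the function names are made up for this example.

```python
def in_sl2z(g):
    """g = (a, b, c, d) has integer entries and determinant 1."""
    a, b, c, d = g
    return a * d - b * c == 1


def in_gamma0(g, N):
    """Upper triangular mod N: the lower left entry c vanishes mod N."""
    a, b, c, d = g
    return in_sl2z(g) and c % N == 0


def in_gamma1(g, N):
    """Upper triangular mod N with ones on the diagonal."""
    a, b, c, d = g
    return in_gamma0(g, N) and (a - 1) % N == 0 and (d - 1) % N == 0


def in_gamma(g, N):
    """Principal congruence subgroup: reduces to the identity mod N."""
    a, b, c, d = g
    return in_gamma1(g, N) and b % N == 0


examples = [(2, 1, 3, 2), (7, 2, 3, 1), (4, 3, 9, 7), (0, -1, 1, 0)]
for g in examples:
    print(g, in_gamma0(g, 3), in_gamma1(g, 3), in_gamma(g, 3))
```

The examples illustrate the inclusions \Gamma(N)\subseteq\Gamma_{1}(N)\subseteq\Gamma_{0}(N)\subseteq\text{SL}_{2}(\mathbb{Z}), each of which is strict for N=3.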

Now we can define the modular forms of weight k with respect to such a congruence subgroup \Gamma. We shall once again require them to be holomorphic functions on the upper half-plane, and we require that for \gamma\in \Gamma written as \left(\begin{array}{cc}a&b\\ c&d\end{array}\right) we must have

\displaystyle f(\gamma(\tau))=(c\tau+d)^{k}f(\tau).

However, the condition that the function be bounded as the imaginary part of \tau goes to infinity must be modified. The reason is that the “point at infinity” is a cusp, a point of the modular curve that does not correspond to an elliptic curve over \mathbb{C} but rather to a “degeneration” of it (this point is therefore not a part of the usual moduli space of elliptic curves –  we can think of it as a “puncture” in this space).

We recall that the construction of the moduli space of elliptic curves over \mathbb{C} starts with the upper half-plane, then we quotient out by the action of \text{SL}_{2}(\mathbb{Z}). The cusps come from taking the union of the rational numbers with the upper half-plane, as well as the point at infinity. When we take the quotient by \text{SL}_{2}(\mathbb{Z}) this all gets sent to the same point, therefore the usual moduli space has only one cusp. But if we take the quotient by a congruence subgroup, we may have several cusps. Therefore, what we really require is for the modular form to be “holomorphic at the cusps”. We can still express this condition in familiar terms by requiring that not just f(\tau), but rather (c\tau+d)^{-k}f(\gamma(\tau)) for every \gamma\in \text{SL}_{2}(\mathbb{Z}) be bounded as the imaginary part of \tau goes to infinity. We can then define cusp forms with respect to \Gamma by requiring vanishing at the cusps instead. The set of modular forms (resp. cusp forms) of weight k with respect to \Gamma is denoted \mathcal{M}_{k}(\Gamma) (resp. \mathcal{S}_{k}(\Gamma)), and they also have the same structures of being vector spaces and being graded pieces of graded rings as the ones for \text{SL}_{2}(\mathbb{Z}).

Having only discussed the very basics of modular forms we end the post here, with the hope that in the near future we will be able to discuss things such as Hecke operators, modular curves and their Jacobians, and their associated Galois representations. We redirect the interested reader to the references for now.

References:

Modular Form on Wikipedia

Eisenstein Series on Wikipedia

j-invariant on Wikipedia

Congruence Subgroups on Wikipedia

A First Course in Modular Forms by Fred Diamond and Jerry Shurman

Advanced Topics in the Arithmetic of Elliptic Curves by Joseph H. Silverman

The Moduli Space of Elliptic Curves

A moduli space is a kind of “parameter space” that “classifies” mathematical objects. Every point of the moduli space stands for a mathematical object, in such a way that mathematical objects which are more similar to each other are closer and those that are more different from each other are farther apart. We may use the notion of equivalence relations (see Modular Arithmetic and Quotient Sets) to assign several objects which are in some sense “isomorphic” to each other to a single point.

We have discussed on this blog before one example of a moduli space – the projective line (see Projective Geometry). Every point on the projective line corresponds to a geometric object, a line through the origin. Two lines which have almost the same value of the slope will be closer on the projective line compared to two lines which are almost perpendicular.

Another example of a moduli space is that for circles on a plane – such a circle is specified by three real numbers, two coordinates for the center and one positive real number for the radius. Therefore the moduli space for circles on a plane will consist of a “half-volume” of some sort, like 3D space except that one coordinate is restricted to be strictly positive. But if we only care about the circles up to “congruence”, we can ignore the coordinates for the center – or we can also think of it as simply sending circles with the same radius to a single point, even if they are centered at different points. This moduli space is just the positive real line. Every point on this moduli space, which is a positive real number, corresponds to all the circles with radius equal to that positive real number.

We now want to construct the moduli space of elliptic curves. In order to do this we will need to first understand the meaning of the following statement:

Over the complex numbers, an elliptic curve is a torus.

We have already seen in Elliptic Curves what an elliptic curve looks like when graphed in the xy plane, where x and y are real numbers. This gives us a look at the points of the elliptic curve whose coordinates are real numbers, or to put it in another way, these are the real numbers x and y which satisfy the equation of the elliptic curve.

When we look at the points of the elliptic curve with complex coordinates, or in other words the complex numbers which satisfy the equation of the elliptic curve, the situation is more complicated. First off, what we actually have is not what we usually think of as a curve, but rather a surface, in the same way that the complex numbers do not form a line like the real numbers do, but instead form a plane. However, even though it is not easy to visualize, there is a function called the Weierstrass elliptic function which provides a correspondence between the (complex) points of an elliptic curve and the points in the “fundamental parallelogram” of a lattice in the complex plane. We can think of “gluing” the opposite sides of this fundamental parallelogram to obtain a torus. This is what we mean when we say that an elliptic curve is a torus. This also means that there is a correspondence between elliptic curves and lattices in the complex plane.

We will discuss more about lattices later on in this post, but first, just in case the preceding discussion seems a little contrived, we elaborate a bit on the Weierstrass elliptic function. We must first discuss the concept of a holomorphic function. We have discussed in An Intuitive Introduction to Calculus the concept of the derivative of a function. Now not all functions have derivatives that exist at all points; in the case that the derivative of the function does exist at all points, we refer to the function as a differentiable function.

The concept of a holomorphic function in complex analysis (analysis is the term usually used in modern mathematics to refer to calculus and its related subjects) is akin to the concept of a differentiable function in real analysis. The derivative is defined as the limit of a certain ratio as the numerator and the denominator both approach zero; on the real line, there are limited ways in which these quantities can approach zero, but on the complex plane, they can approach zero from several different directions; for a function to be holomorphic, the expression for its derivative must remain the same regardless of the direction by which we approach zero.

In previous posts on topology on this blog we have been treating two different topological spaces as essentially the same whenever we can find a bijective and continuous function (also known as a homeomorphism) between them; similarly, we have been treating different algebraic structures such as groups, rings, modules, and vector spaces as essentially the same whenever we can find a bijective homomorphism (an isomorphism) between two such structures. Following these ideas and applying them to complex analysis, we may treat two spaces as essentially the same if we can find a bijective holomorphic function between them.

The Weierstrass elliptic function is not quite holomorphic, but is meromorphic – this means that it would have been holomorphic everywhere if not for the “lattice points” where there exist “poles”. But it is alright for us, because such a lattice point is to be mapped to the “point at infinity”. All in all, this allows us to think of the complex points of the elliptic curve as being essentially the same as a torus, following the ideas discussed in the preceding paragraph.

Moreover, the torus has a group structure of its own, considered as the direct product group \text{U}(1)\times\text{U}(1) where \text{U}(1) is the group of complex numbers of magnitude equal to 1 with the law of composition given by the multiplication of complex numbers. When the complex points of the elliptic curve get mapped by the Weierstrass elliptic function to the points of the torus, the group structure provided by the “tangent and chord” or “tangent and secant” construction becomes the group structure of the torus. In other words, the Weierstrass elliptic function provides us with a group isomorphism.

All this discussion means that the study of elliptic curves becomes the study of lattices in the complex plane. Therefore, what we want to construct is the moduli space of lattices in the complex plane, up to a certain equivalence relation – two lattices are to be considered equivalent if one can be obtained by multiplying the other by a nonzero complex number (this equivalence relation is called homothety). Going back to elliptic curves, this corresponds to an isomorphism of elliptic curves in the sense of algebraic geometry.

Now given two complex numbers \omega_{1} and \omega_{2} that are linearly independent over the real numbers (i.e. neither is a real multiple of the other), a lattice \Lambda in the complex plane is given by

\Lambda=\{m\omega_{1}+n\omega_{2}|m,n\in\mathbb{Z}\}

For example, setting \omega_{1}=1 and \omega_{2}=i gives a “square” lattice. This lattice is also the set of all Gaussian integers. The fundamental parallelogram is the parallelogram formed by the vertices 0, \omega_{1}, \omega_{2}, and \omega_{1}+\omega_{2}. Here is an example of a lattice, courtesy of user Alvaro Lozano Robledo of Wikipedia:

[Image: a lattice in the complex plane with its fundamental parallelogram]

The fundamental parallelogram is in blue. Here is another, courtesy of user Sam Derbyshire of Wikipedia:

[Image: a lattice in the complex plane with torsion points]

Because we only care about lattices up to homothety, we can “rescale” the lattice by multiplying it with a complex number equal to \frac{1}{\omega_{1}}, so that we have a new lattice equivalent under homothety to the old one, given by

\Lambda=\{m+n\tau|m,n\in\mathbb{Z}\}

where

\displaystyle \tau=\frac{\omega_{2}}{\omega_{1}}.

We can always interchange \omega_{1} and \omega_{2}, but we will fix our convention so that the complex number \tau=\frac{\omega_{2}}{\omega_{1}}, when written in polar form \tau=re^{i\theta} always has a positive angle \theta between 0 and 180 degrees. If we cannot obtain this using our choice of \omega_{1} and \omega_{2}, then we switch the two.

Now what this means is that a complex number \tau, which we note is a complex number in the upper half plane \mathbb{H}=\{z\in \mathbb{C}|\text{Im}(z)>0\}, because of our convention in choosing \omega_{1} and \omega_{2}, uniquely specifies a homothety class of lattices \Lambda. However, a homothety class of lattices may not always uniquely specify such a complex number \tau. Several such complex numbers may refer to the same homothety class of lattices.

What \omega_{1} and \omega_{2} specify is a choice of basis (see More on Vector Spaces and Modules) for the lattice \Lambda; we may choose several different bases to refer to the same lattice. Hence, the upper half plane is not yet the moduli space of all lattices in the complex plane (up to homothety); instead it is an example of what is called a Teichmüller space. To obtain the moduli space from the Teichmüller space, we need to figure out when two different bases specify lattices that are homothetic.

We will just write down the answer here; two complex numbers \tau and \tau' refer to homothetic lattices if there exists the following relation between them:

\displaystyle \tau'=\frac{a\tau+b}{c\tau+d}

for integers a, b, c, and d satisfying the identity

\displaystyle ad-bc=1.

We can “encode” this information into a 2\times 2 matrix (see Matrices) which is an element of the group (see Groups) called \text{SL}(2,\mathbb{Z}). It is the group of 2\times 2 matrices with integer entries and determinant equal to 1. Actually, the matrix with entries a, b, c, and d and the matrix with entries -a, -b, -c, and -d specify the same transformation, therefore what we actually want is the group called \text{PSL}(2,\mathbb{Z}), also known as the modular group, and also written \Gamma(1), obtained from the group \text{SL}(2,\mathbb{Z}) by considering two matrices to be equivalent if one is the negative of the other.
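The claim that \tau and \tau' give homothetic lattices can be checked directly: multiplying the basis 1, \tau' by the complex number c\tau+d gives d+c\tau and b+a\tau, which again form a basis of the lattice generated by 1 and \tau because ad-bc=1. The sketch below verifies this numerically for one sample matrix; the function name mobius and the sample values are chosen just for this example.

```python
def mobius(gamma, tau):
    """Apply gamma = (a, b, c, d) with ad - bc = 1 to tau in the upper half plane."""
    a, b, c, d = gamma
    if a * d - b * c != 1:
        raise ValueError("gamma must have determinant 1")
    return (a * tau + b) / (c * tau + d)


gamma = (2, 1, 5, 3)  # a sample matrix with determinant 2*3 - 1*5 = 1
a, b, c, d = gamma
tau = 0.25 + 0.8j
tau_new = mobius(gamma, tau)

# Rescaling the basis (1, tau_new) by (c*tau + d) lands on (d + c*tau, b + a*tau).
scale = c * tau + d
print(abs(scale * tau_new - (a * tau + b)) < 1e-12)

# The image stays in the upper half plane: Im(tau') = Im(tau) / |c*tau + d|^2.
print(abs(tau_new.imag - tau.imag / abs(scale) ** 2) < 1e-12)

# A matrix and its negative give the same transformation.
negated = tuple(-entry for entry in gamma)
print(abs(mobius(negated, tau) - tau_new) < 1e-12)
```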

We now have the moduli space that we want – we start with the upper half plane \mathbb{H}, and then we identify two points if we can map one point into the other via the action of an element of the modular group, as we have discussed earlier. In technical language, we say that they belong to the same orbit. We can write our moduli space as \Gamma(1)\backslash\mathbb{H} (the notation means that the group \Gamma(1) acts on \mathbb{H} “on the left”).

When dealing with quotient sets, which are sets of equivalence classes, we have seen in Modular Arithmetic and Quotient Sets that we can choose from an equivalence class one element to serve as the “representative” of this equivalence class. For our moduli space \Gamma(1)\backslash\mathbb{H}, we can choose for the representative of an equivalence class a point from the “fundamental domain” for the modular group. Any point on the upper half plane can be obtained by acting on a point from the fundamental domain with an element of the modular group. The following diagram, courtesy of user Fropuff on Wikipedia, shows the fundamental domain in gray:

[Image: fundamental domain of the modular group in the upper half plane]

The other parts of the diagram show where the fundamental domain gets mapped to by certain special elements, in particular the “generators” of the modular group, which are the two elements where a=0, b=-1, c=1, and d=0 (sending \tau to -1/\tau), and a=1, b=1, c=0, and d=1 (sending \tau to \tau+1). We will not discuss too much of these concepts for now. Instead we will give a preview of some concepts related to this moduli space. Topologically, this moduli space looks like a sphere with a missing point; in order to make the moduli space into a sphere (topologically), we take the union of the upper half plane \mathbb{H} with the projective line (see Projective Geometry) \mathbb{P}^{1}(\mathbb{Q}). This projective line may be thought of as the set of all rational numbers \mathbb{Q} together with a “point at infinity.” The modular group also acts on this projective line, so we can now take the quotient of \mathbb{H}\cup\mathbb{P}^{1}(\mathbb{Q}) (denoted \mathbb{H}^{*}) by the same equivalence relation as earlier; this new space, topologically equivalent to the sphere, is called the modular curve X(1).

The functions and “differential forms” on the modular curve X(1) are of special interest. They can be obtained from functions on the upper half plane (with the “point at infinity”) satisfying certain conditions related to the modular group. If they are holomorphic everywhere, including the “point at infinity”, they are called modular forms. Modular forms are an interesting object of study in themselves, and their generalizations, automorphic forms, are a very active part of modern mathematical research.

References:

Moduli Space on Wikipedia

Elliptic Curve on Wikipedia

Weierstrass’s Elliptic Functions on Wikipedia

Fundamental Pair of Periods on Wikipedia

Modular Group on Wikipedia

Fundamental Domain on Wikipedia

Modular Form on Wikipedia

Automorphic Form on Wikipedia

Image by User Alvaro Lozano Robledo of Wikipedia

Image by User Sam Derbyshire of Wikipedia

Image by User Fropuff of Wikipedia

Advanced Topics in the Arithmetic of Elliptic Curves by Joseph H. Silverman

A First Course in Modular Forms by Fred Diamond and Jerry Shurman