Differential Forms

Differential forms are important concepts in differential geometry and mathematical physics. For example, they can be used to express Maxwell’s equations (see Some Basics of (Quantum) Electrodynamics) in a very elegant form. In this post, however, we will introduce these mathematical objects as generalizing certain aspects of integral calculus (see An Intuitive Introduction to Calculus), allowing us to perform integration over surfaces, volumes, or their higher-dimensional analogues.

We recall from An Intuitive Introduction to Calculus the statement of the fundamental theorem of calculus:

\displaystyle \int_{a}^{b}\frac{df}{dx}dx=f(b)-f(a).

Regarding the left hand side of this equation, we usually say that we integrate over the interval from a to b; we may therefore also write it more suggestively as

\displaystyle \int_{[a,b]}\frac{df}{dx}dx=f(b)-f(a).

We note that a and b form the boundary of the interval [a,b]. We denote the boundary of some “shape” M by \partial M. Therefore, in this case, \partial [a,b]=\{a\}\cup\{b\}.

Next we are going to perform some manipulations on the notation which, while not thoroughly justified in this post, are meant to be suggestive and to provide intuition for the discussion of differential forms. First we need the notion of orientation. We can imagine, for example, an “arrow” pointing from a to b; this would determine one orientation. Another would be determined by an “arrow” pointing from b to a. This is important because we need a notion of integration “from a to b” or “from b to a”, and the two are not the same. In fact,

\displaystyle \int_{a}^{b}\frac{df}{dx}dx=-\int_{b}^{a}\frac{df}{dx}dx

i.e. there is a change of sign if we “flip” the orientation. Although an interval such as [a,b] is one-dimensional, the notion of orientation continues to make sense in higher dimensions. If we have a surface, for example, we may consider going “clockwise” or “counterclockwise” around the surface. Alternatively we may consider an “arrow” indicating which “side” of the surface we are on. For three dimensions or higher it is harder to visualize, but we will be able to make this notion more concrete later on with differential forms.

Given the notion of orientation, let us now denote the boundary of the interval [a,b], taken with orientation, for instance, “from a to b“, by \{a\}^{-}\cup\{b\}^{+}.

Let us now write

\displaystyle \frac{df}{dx}dx=df

and then we can write the fundamental theorem of calculus as

\displaystyle \int_{[a,b]}df=f(b)-f(a).

Then we consider the idea of “integration over points”, by which we refer to simply evaluating the function at those points, with the orientation taken into account, such that we have

\displaystyle \int_{\{a\}^{-}\cup\{b\}^{+}}f=f(b)-f(a)

Recalling that \partial [a,b]=\{a\}^{-}\cup\{b\}^{+}, this now gives us the following expression for the fundamental theorem of calculus:

\displaystyle \int_{[a,b]}df=\int_{\{a\}^{-}\cup\{b\}^{+}}f

\displaystyle \int_{[a,b]}df=\int_{\partial [a,b]}f

Things may still be confusing to the reader at this point – for instance, the integral on the right hand side looks rather strange – but we will make things more concrete shortly. For now, the rough idea that we want to keep in mind is the following:

The integral of a “differential” of some function over some shape is equal to the integral of the function over the boundary of the shape.

In one dimension, this is of course the fundamental theorem of calculus as we have stated it earlier. In two dimensions, there is a famous theorem called Green’s theorem. In three dimensions, there are two manifestations of this idea, known as Stokes’ theorem and the divergence theorem. The more “concrete” version of this statement, which we want to discuss in this post, is the following:

The integral of the exterior derivative of a differential form over a manifold with boundary is equal to the integral of the differential form over the boundary.

We now discuss what these differential forms are. Instead of giving the formal definitions, we will start with special cases, develop intuition with examples, and attempt to generalize; the more formal definitions will be left to the references. We will start with the so-called 1-forms, which are “linear combinations” of the “differentials” of the coordinates, with coefficients which are functions:

\displaystyle f_{1}dx+f_{2}dy+f_{3}dz

We can think of these “differentials” as merely symbols for now, or perhaps as analogous to the “infinitesimal quantities” of calculus. In differential geometry, however, they are actually “dual” to vectors, mapping vectors to numbers in the same way that row matrices map column matrices to the numbers which serve as their scalars (see Matrices).

From now on, to generalize, instead of the coordinates x, y, and z we will use x^{1}, x^{2}, x^{3}, and so on. We will write exponents as (x^{1})^{2}, to hopefully avoid confusion.

From these 1-forms we can form 2-forms by taking the wedge product. In ordinary multivariable calculus, the following expression

\displaystyle dxdy

represents an “infinitesimal area”, and so for example the integral

\displaystyle \int_{0}^{1}\int_{0}^{1}dxdy

gives us the area of a square with vertices at (0,0), (1,0), (0,1), and (1,1). The wedge product expresses this same idea (in fact the wedge product dx\wedge dy is often also called the area form, mirroring the idea expressed by dxdy earlier), except that we want to include the concept of orientation that we discussed earlier. Therefore, in order to express this idea of orientation, we require the wedge product to satisfy the following property called antisymmetry:

\displaystyle dx^{1}\wedge dx^{2}=-dx^{2}\wedge dx^{1}

Note that antisymmetry implies the following relation:

\displaystyle dx^{i}\wedge dx^{i}=-dx^{i}\wedge dx^{i}

\displaystyle dx^{i}\wedge dx^{i}=0

In other words, the wedge product of such a differential form with itself is equal to zero.

We can also form 3-forms, 4-forms, etc. using the wedge product. The collection of all these n-forms, for every n, is the algebra of differential forms. This means that we can add, subtract, and form wedge products of differential forms. Ordinary functions themselves form the 0-forms.

We can also take what is called the exterior derivative of differential forms. If, for example, we have a differential form \omega given by the following expression,

\displaystyle \omega=f dx^{a}

then the exterior derivative of \omega, written d\omega, is given by

\displaystyle d\omega=\sum_{i=1}^{n}\frac{\partial f}{\partial x^{i}}dx^{i}\wedge dx^{a}.
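As a small worked example (our own illustrative choice, in three coordinates), let \omega=x^{2}dx^{1}, i.e. f=x^{2} and dx^{a}=dx^{1}. The i=1 term vanishes since dx^{1}\wedge dx^{1}=0, and the i=3 term vanishes since \partial x^{2}/\partial x^{3}=0, leaving

\displaystyle d\omega=dx^{2}\wedge dx^{1}=-dx^{1}\wedge dx^{2}.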

We note that the exterior derivative of an n-form is an (n+1)-form. We also note that the exterior derivative of an exterior derivative is always zero, i.e. d(d\omega)=0 for any differential form \omega. A differential form which is the exterior derivative of some other differential form is called exact. A differential form whose exterior derivative is zero is called closed. The statement d(d\omega)=0 can also be expressed as follows:

All exact forms are closed.

However, not all closed forms are exact. This is reminiscent of the discussion in Homology and Cohomology, and in fact the study of closed forms which are not exact leads to the theory of de Rham cohomology, which is a very important part of modern mathematics and mathematical physics.
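To make these rules concrete, here is a small computational sketch in Python using SymPy. The representation of a form as a dictionary mapping sorted tuples of coordinate indices to coefficient functions is an ad hoc choice for illustration (not a standard SymPy API); antisymmetry enters through the sign picked up when sorting dx^{i} into place. The sketch verifies d(d\omega)=0 for a sample 1-form.

```python
import sympy

x1, x2, x3 = coords = sympy.symbols('x1 x2 x3')

def d(form):
    """Exterior derivative: d(f dx_I) = sum_i (df/dx^i) dx^i ^ dx_I."""
    result = {}
    for idx, f in form.items():
        for i, xi in enumerate(coords):
            if i in idx:
                continue                      # dx^i ^ dx^i = 0
            new = tuple(sorted((i,) + idx))
            sign = (-1) ** new.index(i)       # swaps needed to sort dx^i into place
            result[new] = result.get(new, 0) + sign * sympy.diff(f, xi)
    return {k: v for k, v in result.items() if sympy.simplify(v) != 0}

omega = {(0,): x1 * x2, (1,): x3**2}   # omega = x1*x2 dx1 + x3^2 dx2
print(d(omega))                        # a 2-form: {(0, 1): -x1, (1, 2): -2*x3}
print(d(d(omega)))                     # {} -- i.e., d(d(omega)) = 0
```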

Given the idea of the exterior derivative, the general form of the fundamental theorem of calculus is now given by the generalized Stokes’ theorem (sometimes simply called Stokes’ theorem; historically, however, as alluded to earlier, the original Stokes’ theorem refers only to a special case in three dimensions):

\displaystyle \int_{M}d\omega=\int_{\partial M}\omega

This is the idea we alluded to earlier, relating the integral of the exterior derivative of a differential form over some “shape” to the integral of the differential form (where functions are included as 0-forms) over the boundary of that “shape”.
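In one dimension, we can verify this numerically; the following quick sketch in Python (using NumPy, with the arbitrarily chosen function f(x)=x^{3}-2x on the interval [0,2]) checks that integrating df over the interval agrees with evaluating f on the oriented boundary.

```python
import numpy as np

a, b = 0.0, 2.0
x = np.linspace(a, b, 100_001)
f = x**3 - 2*x                 # f(x) = x^3 - 2x, an arbitrary choice
df_dx = 3*x**2 - 2             # its derivative

lhs = np.trapz(df_dx, x)       # integral of df over the "shape" [a, b]
rhs = f[-1] - f[0]             # "integral" of f over the oriented boundary
print(lhs, rhs)                # both are 4.0, up to numerical error
```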

There is much more to the theory of differential forms than we have discussed here. For example, although we have referred to these “shapes” as manifolds with boundary, more generally they are “chains” (see also Homology and Cohomology – the similarities are not coincidental!). There are restrictions on these chains in order for the integral to give a function; for example, an n-form must be integrated over an n-dimensional chain (or simply n-chain) to give a function, otherwise the result is some other differential form: an m-form integrated over an n-chain gives an (m-n)-form. Also, more rigorously, the concept of integration on more complicated spaces involves the notion of “pullback”. We will leave these concepts to the references for now, contenting ourselves with the discussion of the wedge product and exterior derivative in this post. The application of differential forms to physics is discussed in the very readable book Gauge Fields, Knots and Gravity by John Baez and Javier P. Muniain.

References:

Differential Forms on Wikipedia

Green’s Theorem on Wikipedia

Divergence Theorem on Wikipedia

Stokes’ Theorem on Wikipedia

De Rham Cohomology on Wikipedia

Calculus on Manifolds by Michael Spivak

Gauge Fields, Knots and Gravity by John Baez and Javier P. Muniain

Geometry, Topology, and Physics by Mikio Nakahara

Some Basics of Statistical Mechanics

The branch of physics now known as statistical mechanics started out as thermodynamics, the study of heat and related concepts. The relation of thermodynamics to the rest of physics, i.e. the relation of heat and motion, was studied by scientists like James Prescott Joule in the 19th century. Due to their efforts, we have the idea that what they used to refer to as “heat” is a form of energy which is transferred from one object to another, manifesting in ways other than the bulk motion of the objects (in particular, as a change in the “internal energy” of the objects involved).

Energy, a concept that was already associated to the motion of objects (see also Lagrangians and Hamiltonians), can be transferred from one object to another, or one system to another, and in the case of heat, this transfer involves the concept of temperature. Temperature is what we measure on a thermometer, and when we say something is “hot” or “cold”, we are usually referring to its temperature.

The way by which temperature dictates the direction in which heat is transferred is summarized in the second law of thermodynamics (here we give one of its many equivalent statements):

Heat flows from a hotter object to a colder one.

This process of transfer of heat will continue, decreasing the internal energy of the hotter object and increasing the internal energy of the cooler one, until the two objects have equal temperatures, in which case we say that they are in thermal equilibrium.

But if heat is a transfer of energy, and energy is associated to motion, then what was it, exactly, that was moving (or had the capacity to cause something to move)? What is this “internal energy”? For us, who have been taught about atoms and molecules since childhood, the answer might come rather easily. Internal energy is the energy that comes from the motion of the atoms and molecules that comprise the object. But for the scientists who were developing the subject during the 19th century, the concept of atoms and molecules was still in its very early stages, with many of them facing severe criticism for adopting ideas that at the time were still not completely verified.

Still, these scientists continued to take the point of view that the subject of thermodynamics was just the same physics that had already been applied to, say, the motion of cannonballs and pendulums and other objects, except that now they had to be applied to a very large quantity of very small particles (quantum mechanics would later have much to contribute also, but even before the introduction of that theory the concept of atoms and molecules was already starting to become very fruitful in thermodynamics).

Now we have an explanation for what internal energy is in terms of the motion of the particles that make up an object. But what about temperature? Is it possible to explain temperature (and therefore the laws that decide the direction of the transfer of heat) using more “basic” concepts such as Newton’s laws of motion, as we have done for the internal energy?

It was the revolutionary ideas of Ludwig Boltzmann that provided the solution. It indeed involved a more “basic” concept, but not one we would usually think of as belonging to the realm of physics or the study of motion. The idea of Boltzmann was to relate temperature to the concepts of information, probability, and statistics, via the notion of entropy. We may therefore think of this era as the time when “thermodynamics” became “statistical mechanics”.

In order to discuss the idea of entropy, for a moment we step away from physics, and discuss instead cards. It is not cards themselves that we are interested in, but information. Entropy is really about information, which is why it also shows up, for instance, when discussing computer passwords. Cards will give us a simple but concrete way to discuss information.

Consider now, therefore, a deck of 52 ordinary playing cards. A hand, of course, consists of five cards. Using the rules of combinatorics, we can find that there are 2,598,960 different hands (combinations of 52 different playing cards taken five at a time, in any order). In the game of poker, there are certain special combinations, the rarest (and highest-ranking) of which is called the “royal flush”. There are only four possible ways to get a royal flush (one for each suit). In contrast, the most common kind of hand is one which has no special combination (sometimes called “no pair”), and there are 1,302,540 different combinations which fit this description.

Now suppose the deck is shuffled and we are dealt a hand. The shuffling process is not entirely random (not in the way that quantum mechanics is), but there are so many things going on that it is near-impossible for us to follow and determine what kind of hand we are going to get. The most we can do is make use of what we know about probability and statistics. We know that it is more likely for us to obtain a no pair rather than a royal flush, simply because there are so many more combinations that are considered a no pair than there are combinations that are considered a royal flush. There are no laws of physics involved in making this prediction; there is only the intuitive idea that an event with more ways of happening is more likely to happen than an event with fewer ways of happening, in the absence of any more information regarding the system.
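These counts are easy to check with a few lines of Python (a quick sketch using the standard library’s math.comb for binomial coefficients):

```python
from math import comb

hands = comb(52, 5)        # number of five-card hands from a 52-card deck
print(hands)               # 2598960
print(4 / hands)           # probability of a royal flush, about 1.5e-06
print(1302540 / hands)     # probability of "no pair", about 0.50
```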

We now go back to physics. Let us consider a system made up of a very large number of particles. The state of a single particle is specified by its position and momentum, and the state of the entire system is specified by the position and momentum of every one of its particles. This state is almost impossible for us to determine, because there are simply too many particles to keep track of.

However, we may be able to determine properties of the system without having to look at every single particle. Such properties may involve the total energy, pressure, volume, and so on. These properties determine the “macrostate” of a system. The “true” state that may only be specified by the position and momentum of every single particle is called the “microstate” of the system. There may be several different microstates that correspond to a single macrostate, just like there are four different combinations that correspond to a royal flush, or 1,302,540 different combinations that correspond to a no pair.

Let the system be in a certain macrostate, and let the number of microstates that correspond to this macrostate be denoted by \Omega. The entropy of the system is then defined as

\displaystyle S=k_{B}\text{ln }\Omega,

where k_{B} is a constant known as Boltzmann’s constant. We may think of this constant and the logarithm as merely convenient ways (in terms of calculation, and in terms of making contact with older ideas in thermodynamics) to express the idea that the higher the number of microstates, the higher the entropy.
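A toy example may make this concrete. Suppose N labeled particles are each placed in the left or right half of a box, and take the macrostate to be the number n of particles in the left half; then \Omega is a binomial coefficient. The following Python sketch (a hypothetical model of our own, with k_{B} set to 1) computes the entropy of a few macrostates:

```python
from math import comb, log

N = 100                    # number of particles
for n in (0, 25, 50):      # macrostate: n particles in the left half
    omega = comb(N, n)     # number of microstates for this macrostate
    S = log(omega)         # S = k_B ln(Omega), with k_B = 1
    print(n, omega, round(S, 2))
# n = 50 has by far the most microstates (highest entropy), which is why
# the particles are overwhelmingly likely to be found evenly spread out.
```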

Now even though the system may not seem to be changing, there may be many things happening on a microscopic level, imperceptible to us. Molecules may be moving around in many directions, in motions that are too difficult for us to keep track of, not only because the particles are very small but also because there are just too many of them. This is analogous to the shuffling of cards. All that we have at our disposal are the tools of probability and statistics. Hence the term, “statistical mechanics”.

What have we learned from the example of the shuffling of cards? Even though we could not keep track of things and determine results, we could still make predictions. And the predictions we made were simply of the nature that an event with more ways of happening was more likely to happen than an event with fewer ways of happening.

Therefore, we have the following restatement of the second law of thermodynamics:

The entropy of a closed system never decreases.

This simply reflects the idea that under these processes we cannot keep track of, the system is more likely to adopt a configuration with more ways of happening, compared to one with fewer ways of happening. In other words, it will be in a macrostate that has more microstates. Microscopically, it may happen that “miraculously” the entropy decreases; but given how many particles there are, and how many processes happen, this is extremely unlikely to be a sustained phenomenon, and macroscopically, the second law of thermodynamics is always satisfied. This is like obtaining a royal flush on one deal of cards; if we reshuffle and deal multiple times, it is unlikely that we keep getting royal flushes for a sustained period of time.

The “closed system” requirement is there to ensure that the system is “left to its own devices” so to speak, or that there is no “outside interference”.

Considering that the entirety of the universe is an example of a “closed system” (there is nothing outside of it, since by definition the universe means the collection of everything that exists), the second law of thermodynamics has some interesting (perhaps disturbing, to some people) implications. What we usually consider to be an “ordered” configuration is very specific; for example, a room is only in order when all of the furniture is upright, all the trash is in the wastebasket, and so on. There are fewer such configurations compared to the “disordered” ones, since there are so many ways in which the furniture can be “not upright”, and so many ways in which the trash may be outside of the wastebasket, etc. In other words, disordered configurations have higher entropy. All of these things considered, what the second law of thermodynamics implies is that the entropy of the universe is ever increasing, moving toward an increasing state of disorder, away from the delicate state of order that we now enjoy.

We now want to derive the “macroscopic” from the “microscopic”. We want to connect the “microscopic” concept of entropy to the “macroscopic” concept of temperature. We do this by defining “temperature” as the following relationship between the entropy and the energy (in this case the internal energy, as the system may have other kinds of energy, for example arising from its motion in bulk):

\displaystyle T=\frac{\partial E}{\partial S}

Although we will not discuss the specifics in this post, we make the following claim – the entropy of the system is at its maximum when the system is in thermal equilibrium. Or perhaps more properly, the state of “thermal equilibrium” may be defined as the macrostate which has the greatest number of microstates corresponding to it. This in turn explains why heat flows from a hotter object to a cooler one.

We have now discussed some of the most basic concepts in thermodynamics and statistical mechanics. We now briefly discuss certain technical and calculational aspects of the theory. Aside from making the theory more concrete, this is important also because there are many analogies to be made outside of thermodynamics and statistical mechanics. For example, in the Feynman path integral formulation of quantum field theory (see Some Basics of Relativistic Quantum Field Theory) we calculate correlation functions, which mathematically have a form very similar to some of the quantities that we will discuss.

In modern formulations of statistical mechanics, a central role is played by the partition function Z, which is defined as

\displaystyle Z=\sum_{i}e^{-\beta E_{i}}

where \beta, often simply referred to as the “thermodynamic beta”, is defined as

\displaystyle \beta=\frac{1}{k_{B}T}.

The partition function is a very convenient way to package information about the system we are studying, and many quantities of interest can be obtained from it. One of the most important is the probability P_{i} for the system to be in a microstate with energy E_{i}:

\displaystyle P_{i}=\frac{1}{Z}e^{-\beta E_{i}}.

Knowing this formula for the probabilities of certain macrostates allows us to derive the formulas for expectation values of quantities that may be of interest to us, such as the average energy of the system:

\displaystyle \langle E\rangle=\frac{1}{Z}\sum_{i}E_{i}e^{-\beta E_{i}}.

After some manipulation we may find that the expectation value of the energy is also equal to the following more compact expression:

\displaystyle \langle E\rangle=-\frac{\partial \text{ln }Z}{\partial \beta}.

Another familiar quantity that we can obtain from the partition function is the entropy of the system:

\displaystyle S=\frac{\partial (k_{B}T\text{ln }Z)}{\partial T}.

There are various other quantities that can be obtained from the partition function, such as the variance of the energy (or energy fluctuations), the heat capacity, and the so-called Helmholtz free energy. We note that for “continuous” systems, expressions involving sums are replaced by expressions involving integrals. Also, for quantum mechanical systems, there are some modifications, as well as for systems which exchange particles with the environment.
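As a concrete check of these formulas, here is a Python sketch for a hypothetical two-level system with energies 0 and \epsilon (in units where k_{B}=1), comparing the directly computed average energy with -\partial \text{ln }Z/\partial \beta evaluated numerically:

```python
import numpy as np

eps, T = 1.0, 0.5
beta = 1.0 / T                     # thermodynamic beta, with k_B = 1
E = np.array([0.0, eps])           # the two energy levels

Z = np.exp(-beta * E).sum()        # partition function
P = np.exp(-beta * E) / Z          # Boltzmann probabilities
E_avg = (P * E).sum()              # <E> computed directly

lnZ = lambda b: np.log(np.exp(-b * E).sum())
h = 1e-6                           # step for a numerical derivative
print(E_avg, -(lnZ(beta + h) - lnZ(beta - h)) / (2 * h))   # the two agree
```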

The development of statistical mechanics, and the introduction of the concept of entropy, is perhaps a rather understated revolution in physics. Before Boltzmann’s redefinition of these concepts, physics was thought of as studying only motion, in the classical sense of Newton and his contemporaries. Information has since then taken just as central a role in modern physics as motion.

The mathematician and engineer Claude Elwood Shannon further modernized the notion of entropy by applying it to systems we would not ordinarily think of as part of physics, for example the bits on a computer. According to some accounts, Shannon was studying a certain quantity he wanted to name “information”; however, the physicist and mathematician John von Neumann told him that a version of his concept had already been developed in physics, where it was called “entropy”. With von Neumann’s encouragement, Shannon adopted the name, symbolically unifying subjects formerly thought of as separate.

Information theory, the subject which Shannon founded, has together with quantum mechanics led to quantum information theory, which not only has many potential applications in technology but also is one of the methods by which we attempt to figure out deep questions regarding the universe.

Another way in which the concept of entropy is involved in modern issues in physics is in the concept of entropic gravity, where gravity, as expressed in Einstein’s general theory of relativity, is derived from more fundamental concepts similar to how the simple statistical concept of entropy gives rise to something that manifests macroscopically as a law of physics. Another part of modern physics where information, quantum mechanics, and general relativity meet is the open problem called the black hole information paradox, which concerns the way in which black holes seemingly do not conserve information, and is a point of contention among many physicists even today.

Finally, we mention another very interesting aspect of statistical mechanics, perhaps, on the surface, a little more mundane compared to what we have mentioned in the preceding paragraphs, but not the slightest bit less interesting – phase transitions. Phase transitions are “abrupt” changes in the properties of an object brought about by some seemingly continuous process, like, for example, the freezing of water into ice. We “cool” water, taking away heat from it by some process, and for a long time it seems that nothing happens except that the water becomes colder and colder, but at some point it freezes – an abrupt change, even though we have done just the same thing we did to it before. What really happens, microscopically, is that the molecules have arranged themselves into some sort of structure, and the material loses some of its symmetry (the “disordered” molecules of water were more symmetric than the “ordered” molecules in ice) – a process known as symmetry breaking. Phase transitions and symmetry breaking are ubiquitous in the sciences, and have applications from studying magnets to tackling the problem of why we have observed so much more matter than antimatter.

References:

Thermodynamics on Wikipedia

Statistical Mechanics on Wikipedia

Entropy on Wikipedia

Partition Function on Wikipedia

Entropy in Thermodynamics and Information Theory on Wikipedia

Quantum Information on Wikipedia

Black Hole Information Paradox on Wikipedia

Phase Transition on Wikipedia

Symmetry Breaking on Wikipedia

It From Bit – Entropic Gravity for Pedestrians on Science 2.0

Black Hole Information Paradox: An Introduction on Of Particular Significance

Thermal Physics by Charles Kittel and Herbert Kroemer

Fundamentals of Statistical and Thermal Physics by Frederick Reif

A Modern Course in Statistical Physics by Linda Reichl

Algebraic Cycles and Intersection Theory

In this post, we will take on intersection theory – which is pretty much just what it sounds like, except for a few modifications which we will discuss later. For example, we may ask: where do the curve y=x^{2} (a parabola) and the line y=1 (a horizontal line) intersect? It only takes a little high school algebra and analytic geometry (which is really a more elementary form of what we now more properly call algebraic geometry) to find that they intersect at the two points (-1,1) and (1,1).

Suppose, instead, that we want to take the intersection of the parabola y=x^{2} and the horizontal line y=-1. If the coordinates x and y are real numbers, we would have no intersection. But if they are complex numbers, then we will find that they do intersect, once again at two points, namely (-i,-1) and (i,-1). When complex numbers are involved, it may be difficult to visualize things – for example, the complex numbers are often visualized as a plane, not as a line – but we will continue to refer, say, to y=-1 as a “line”. This is rather common practice in algebraic geometry – recall that we have also been referring to a torus as an elliptic curve (see The Moduli Space of Elliptic Curves)!

Consider now the intersection of the parabola y=x^{2} and the horizontal line y=0. This time, contrary to the earlier two cases, the curves intersect only at one point, namely at (0,0). But we would like to think of them as intersecting “twice”, even though the intersection occurs only at a single point. Hence, we say that the point (0,0) has intersection multiplicity equal to 2.

The notion of intersection multiplicity makes sense – generally speaking, for instance, a random parabola and a random line in the xy plane will intersect at two points, except in certain instances, such as when the line is tangent to the parabola – this is of course the special, less general, case. In order to make our counting of intersections consistent, even in these special cases, we need this idea of “intersection multiplicities”.
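For intersections of a curve y=f(x) with a horizontal line, these multiplicities can be read off from the roots of a single polynomial, as in the following SymPy sketch (the general definition, via the Tor functor, appears later in this post):

```python
from sympy import symbols, roots

x = symbols('x')

# y = x^2 meets y = 1: substitute to get x^2 - 1 = 0
print(roots(x**2 - 1, x))   # {1: 1, -1: 1} -- two points of multiplicity 1
# y = x^2 meets y = -1: x^2 + 1 = 0, intersections over the complex numbers
print(roots(x**2 + 1, x))   # {I: 1, -I: 1}
# y = x^2 meets y = 0: x^2 = 0, a single point of multiplicity 2
print(roots(x**2, x))       # {0: 2}
```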

Another way to think of the previous example is that the parabola and the line having only one intersection is such a special case that simply “displacing” or “moving” either curve by a little bit results in them having two intersections. Consider, for example, the following diagram courtesy of user Jakob.scholbach of Wikipedia:

[Figure: Intersection_number, by user Jakob.scholbach of Wikipedia]

Equipped with the idea of intersection multiplicities (which we will explicitly give the formula for later), we have Bezout’s theorem, which states that the number of intersections of two curves, counted with their intersection multiplicities, is equal to the product of the degrees of the polynomials that define them (for this count to come out exactly right, we must work in the projective plane, over an algebraically closed field such as the complex numbers). For example, two parabolas will generally intersect at four points, except in special cases where their intersections have multiplicities greater than 1.
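We can check Bezout’s theorem in a small case using resultants, which eliminate one variable and yield a polynomial whose roots (counted with multiplicity) locate the intersections. The circle and parabola below are an arbitrary choice of two degree-2 curves that happen to meet in four affine points:

```python
from sympy import symbols, resultant, roots

x, y = symbols('x y')
circle = x**2 + y**2 - 2        # degree 2
parabola = y - x**2             # degree 2

r = resultant(circle, parabola, y)   # eliminate y: r = x^4 + x^2 - 2
print(roots(r, x))   # four roots counted with multiplicity: 2 * 2 = 4,
                     # namely x = 1, -1, sqrt(2)*I, -sqrt(2)*I
```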

For higher-dimensional varieties, the intersections need not be points, but other kinds of varieties. For an n-dimensional variety W embedded in a larger m-dimensional variety V (which we may think of as the space the n-dimensional variety is living in), the codimension of W in V is given by m-n. If the codimension of the intersection of two varieties is equal to the sum of the codimensions of the intersecting varieties, then we say that the intersection is proper.

Proper intersections correspond to our intuition. For example, consider again curves such as parabolas and lines in the plane. The plane is 2-dimensional, while curves are 1-dimensional. Therefore their codimension in the plane is equal to 2-1=1. Proper intersections will then be points, which have dimension equal to 0 and therefore have codimension in the plane equal to 2. Similarly, the proper intersection of two surfaces, for example two planes, in some 3-dimensional space, is a curve (a line in the case of two planes), since surfaces have codimension equal to 1 inside the 3-dimensional space, while curves have codimension equal to 2.

We can now give the definition of the intersection multiplicity. It is quite technical, involving the Tor functor (see The Hom and Tensor Functors), but we will also give the special case for curves, which is a little less technical compared to the general case. Let V and W be two subvarieties of some smooth variety X (for a discussion of smoothness and singularities see Reduction of Elliptic Curves Modulo Primes) which intersect properly and let Z be their set-theoretic intersection. Then the intersection multiplicity \mu(Z;V,W) is given by

\displaystyle \mu(Z;V,W)=\sum_{i=0}^{\infty}(-1)^{i}\text{length}_{\mathcal{O}_{X,z}}(\text{Tor}_{i}^{\mathcal{O}_{X,z}}(\mathcal{O}_{X,z}/I, \mathcal{O}_{X,z}/J))

where I and J are the ideals corresponding to the varieties V and W respectively, and z is the generic point of the variety Z (we are using here the definition of a variety as a scheme satisfying certain conditions, which we have not actually discussed in this blog yet – we will leave this to the references for now). The concept of length is a generalization of the concept of dimension in algebraic geometry, and refers to the length (in the ordinary sense of the term) of the longest chain of submodules contained in one another (while dimension refers to the length of the longest chain of prime ideals contained in one another).

If V and W are curves on a surface X then the above formula reduces to

\displaystyle \mu(Z;V,W)=\text{length}_{\mathcal{O}_{X,z}}(\mathcal{O}_{X,z}/I\otimes_{\mathcal{O}_{X,z}} \mathcal{O}_{X,z}/J).

Another concept in algebraic geometry closely related to intersection theory is that of an algebraic cycle. Algebraic cycles generalize the idea of divisors (see Divisors and the Picard Group). Algebraic cycles on a variety X can be thought of as “linear combinations” of the subvarieties (satisfying certain conditions, such as being closed, reduced, and irreducible, so that they are not unions of other subvarieties) on X. Divisors themselves are just algebraic cycles of codimension 1; in other words, they are algebraic cycles whose dimension is 1 less than the variety in which they are embedded. In Divisors and the Picard Group, we considered curves, which are varieties of dimension 1, hence the divisors on the curves were linear combinations (with integer coefficients) of points, i.e. subvarieties of dimension equal to 0.

Analogous to the Picard group for divisors we have the Chow group of algebraic cycles modulo rational equivalence. Two algebraic cycles V and W on a variety Y are said to be rationally equivalent if there is a rational function f:Y\rightarrow \mathbb{P}^{1} such that V-W=f^{-1}(0)-f^{-1}(\infty), counting multiplicities. Chow’s moving lemma states that for any two algebraic cycles V and W on a smooth, quasi-projective (quasi-projective means it is the intersection of a Zariski-open and a Zariski-closed subset in some projective space) variety X, there exists another algebraic cycle W' rationally equivalent to W such that V and W' intersect properly. Besides rational equivalence, there are also other notions of equivalence for algebraic cycles, such as algebraic, homological, and numerical equivalence; all of these are important in the study of algebraic cycles, but we will leave them to the references for now.

Taking intersections of subvarieties gives the Chow group a ring structure (we therefore have the concept of an intersection product). In this context we may also refer to the Chow group as the Chow ring. The Chow ring is also an example of a graded ring, which means that the intersection product is a mapping that sends a pair of equivalence classes of algebraic cycles, one with codimension i and another with codimension j, to an equivalence class of algebraic cycles with codimension i+j:

\displaystyle CH^{i}\times CH^{j}\rightarrow CH^{i+j}.

Algebraic cycles on a smooth variety are related to cohomology (see Homology and Cohomology and Cohomology in Algebraic Geometry) via the notion of a cycle map:

\displaystyle \text{cl}:CH^{j}(X)\rightarrow H^{2j}(X).

The intersection product carries over into cohomology, corresponding to the so-called cup product of cohomology classes. Actually, there are many cohomology theories, but the ones considered to be “good” cohomology theories (more technically, they are the ones referred to as the Weil cohomology theories) are required to have a cycle map. Related to the notion of the cycle map is the famous Hodge conjecture in complex algebraic geometry, which states that under a certain well-known decomposition of the cohomology groups H^{k}=\oplus_{p+q=k}H^{p,q}, all cohomology classes of a certain kind (the so-called Hodge classes) come from algebraic cycles. Another similar conjecture is the Tate conjecture, which relates the cohomology classes coming from algebraic cycles to the elements that are fixed by the action of the Galois group (see Galois Groups). Other important conjectures in the study of algebraic cycles are the so-called standard conjectures formulated by Alexander Grothendieck as part of his strategy to prove the Weil conjectures (see The Riemann Hypothesis for Curves over Finite Fields). The Weil conjectures were proved without the need to prove the standard conjectures, but the standard conjectures themselves continue to be the object of modern mathematical research.

References:

Intersection Theory on Wikipedia

Bezout’s Theorem on Wikipedia

Algebraic Cycle on Wikipedia

Chow group on Wikipedia

Chow’s Moving Lemma on Wikipedia

Motives – Grothendieck’s Dream by James S. Milne

The Riemann Hypothesis over Finite Fields: From Weil to the Present Day by J. S. Milne

Algebraic Geometry by Andreas Gathmann

Algebraic Geometry by Robin Hartshorne

Simplices

In Homology and Cohomology, we showed how to study the topology of spaces using homology and cohomology groups, which are obtained via the construction of chain complexes out of abelian groups of subspaces of the topological space. However, we have not elaborated on how this is achieved. In this post we introduce the notion of a simplex, which gives us a method of “triangulating” the space so that we can construct the chains that make up our chain complex.

An n-simplex can be thought of as the n-dimensional analogue of a triangle. More technically, it is the smallest convex set in some Euclidean space \mathbb{R}^{m} containing n+1 points v_{0},...,v_{n}, called its vertices, such that the “difference vectors” defined by v_{1}-v_{0},...,v_{n}-v_{0} are linearly independent. We will use the notation [v_{0},...,v_{n}] to denote a simplex. We will keep track of the ordering of the vertices of the simplex, and we will always make use of the convention that the subscripts indexing the vertices are to be written in increasing order.

To make things more concrete, we discuss one of the most basic examples of an n-simplex, the standard n-simplex. It is defined to be the subset of n+1-dimensional Euclidean space \mathbb{R}^{n+1} given by

\displaystyle \Delta^{n}=\{(t_{0},...,t_{n})\in\mathbb{R}^{n+1}|\sum_{i=0}^{n}t_{i}=1\text{ and }t_{i}\geq 0\text{ for all }i\}

The standard 0-simplex is a point (actually the point x=1 on the real line \mathbb{R}), the standard 1-simplex is a line segment connecting the points (1,0) and (0,1) in the xy plane, the standard 2-simplex is a triangle (including its interior) whose vertices are located at (1,0,0), (0,1,0), and (0,0,1) in 3-dimensional Euclidean space, and the standard 3-simplex is a tetrahedron (again including its interior) whose vertices are located at (1,0,0,0), (0,1,0,0), (0,0,1,0), and (0,0,0,1) in 4-dimensional Euclidean space. The standard higher-dimensional simplices have analogous descriptions. Here is an image depicting the standard 2-simplex, courtesy of user Tosha of Wikipedia:

[Figure: the standard 2-simplex (2D-simplex.svg)]

Consider now a 2-simplex [v_{0},v_{1},v_{2}]. This is of course a triangle. To this 2-simplex there are three related 1-simplices, namely [v_{0},v_{1}], [v_{1},v_{2}], and [v_{0},v_{2}]. They can be thought of as the “edges” of the 2-simplex [v_{0},v_{1},v_{2}], and together they form the boundary of the 2-simplex, written \partial [v_{0},v_{1},v_{2}].

We want to use the concept of simplices in order to construct chains, which in turn form chain complexes, so that we can make use of the techniques of homology and cohomology. Crucial to the notion of chains is the abelian group structure, so that we can “add” and “subtract” n-chains, for a given n, to form new n-chains. This abelian group structure will also help us in making the idea of a boundary of a simplex more concrete, and at the same time provide us with an explicit expression for the boundary operator (also known as the boundary map, or boundary function) also crucial to the idea of a chain complex and its homology. This boundary operator (written \partial) is given by

\displaystyle \partial [v_{0},...,v_{n}]=\sum_{i=0}^{n}(-1)^{i}[v_{0},...,\hat{v_{i}},...,v_{n}]

where \hat{v_{i}} means that the vertex v_{i} is to be omitted. Therefore, for the 2-simplex [v_{0},v_{1},v_{2}], our boundary \partial[v_{0},v_{1},v_{2}] is given by

\displaystyle \partial [v_{0},v_{1},v_{2}]=[v_{0},v_{1}]-[v_{0},v_{2}]+[v_{1},v_{2}].

Simplices, with the boundary operators, can therefore be used to form chain complexes. The chains in this chain complex consist of “linear combinations” of simplices. We can then apply the notions of cycles and boundaries, and the principle that a space that is the boundary of another space has itself no boundary (but not all spaces that have no boundaries are the boundaries of other spaces – this is what the homology groups express), to study topology.
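The boundary operator is simple enough to implement directly. In the following Python sketch (an ad hoc representation of our own, with chains as dictionaries mapping vertex tuples to integer coefficients), we compute the boundary of the 2-simplex above and verify that the boundary of a boundary is zero:

```python
from collections import defaultdict

def boundary(chain):
    """Boundary of a chain {simplex (tuple of vertices): integer coefficient}."""
    result = defaultdict(int)
    for simplex, coeff in chain.items():
        for i in range(len(simplex)):
            face = simplex[:i] + simplex[i+1:]   # omit the i-th vertex
            result[face] += (-1)**i * coeff
    return {s: c for s, c in result.items() if c != 0}

sigma = {(0, 1, 2): 1}            # the 2-simplex [v0, v1, v2]
print(boundary(sigma))            # {(1, 2): 1, (0, 2): -1, (0, 1): 1}
print(boundary(boundary(sigma)))  # {} -- every boundary is a cycle
```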

Of course, not all spaces look like simplices. But for many spaces, we can map simplices to them via a homeomorphism. Intuitively, this corresponds to “triangulating” the space. For example, we may map the boundary of a tetrahedron (made up of four triangles – 2-simplices – with certain edges and vertices in common) homeomorphically onto the sphere. What this means is that we can essentially take the techniques that we have developed for the tetrahedron and apply them to the sphere.

Similarly, we can take a square with a diagonal (made up of two 2-simplices, again with certain edges and vertices in common), identify opposite edges of the boundary, and map it homeomorphically to the torus. This allows us to calculate the homology groups of the torus.

The use of simplices to construct chain complexes for taking the homology groups of a topological space is called simplicial homology. A generalization that involves maps that need not be homeomorphisms is called singular homology. There are also other ways to construct chain complexes; for instance, we also have cellular homology, which makes use of “cells” instead of simplices. Just as simplices are generalizations of triangles and tetrahedrons, cells are generalizations of discs and balls. A space made up of simplices is called a simplicial complex, while a space made up of cells is called a CW-complex.

Aside from algebraic topology in the usual sense, the notion of a simplex is also useful in higher category theory. We recall from Category Theory that a category is made up of objects and morphisms, sometimes also called arrows, between these objects. In higher category theory, we also consider “morphisms between morphisms”, “morphisms between morphisms between morphisms”, and so on. This is reminiscent of simplices, in which we have vertices, edges, faces, and higher-dimensional analogues. Hence, the idea of simplices can be abstracted so that they can be used for the constructions of higher category theory. This leads to the theory of simplicial categories.

References:

Simplex on Wikipedia

Simplicial Complex on Wikipedia

Simplicial Homology on Wikipedia

Singular Homology on Wikipedia

Higher Category Theory on Wikipedia

Image by User Tomruen of Wikipedia (Image Created by Robert Webb’s Stella Software)

Image by User Tosha of Wikipedia

Algebraic Topology by Allen Hatcher

A Concise Course in Algebraic Topology by J. P. May

Book List

There’s a new page on the blog: Book List. It’s far from comprehensive, but I hope to be able to update it from time to time. I don’t intend to put every book on mathematics and physics on the list of course, just the ones I have read and liked, or heavily recommended by other people. I hope to strike a balance between being somewhat comprehensive, with more than one book on the same subject if they happen to complement each other, and yet not listing too many so as not to confuse people with an overwhelming list of multiple books on the same subjects. Links to older, and more comprehensive, book lists (with helpful reviews) are included at the bottom of the page.

Some Basics of Fourier Analysis

Why do we study sine and cosine waves so much? Most waves, like most water waves and most sound waves, do not resemble sine and cosine waves at all (we will henceforth refer to sine and cosine waves as sinusoidal waves).

Well, it turns out that while most waves are not sinusoidal waves, all of them are actually combinations of sinusoidal waves of different sizes and frequencies. Hence we can understand much about essentially any wave simply by studying sinusoidal waves. This idea that any wave is a combination of multiple sinusoidal waves is part of the branch of mathematics called Fourier analysis.

Here’s a suggestion for an experiment from the book Vibrations and Waves by A.P. French: If you speak into the strings of a piano (I believe one of the pedals has to be held down first), the strings will vibrate, and since each string corresponds to a sine wave of a certain frequency, this will give you the breakdown of the sine wave components that make up your voice. If a string vibrates more strongly than the others, it means there’s a bigger part of that component in your voice, i.e. that sine wave component has a bigger amplitude.

More technically, we can express these concepts in the following manner. Let f(x) be a function that is integrable over some interval from x_{0} to x_{0}+P (for a wave, we can take P to be the “period” over which the wave repeats itself). Then over this interval the function can be expressed as the sum of sine and cosine waves of different sizes and frequencies, as follows:

\displaystyle f(x)=\frac{a_{0}}{2}+\sum_{n=1}^{\infty}\bigg(a_{n}\text{cos}\bigg(\frac{2\pi nx}{P}\bigg)+b_{n}\text{sin}\bigg(\frac{2\pi nx}{P}\bigg)\bigg)

This expression is called the Fourier series expansion of the function f(x). The coefficient \frac{a_{0}}{2} is the “level” around which the waves oscillate; the other coefficients a_{n} and b_{n} refer to the amplitude, or the “size”, of the respective waves, whose frequencies are equal to n/P (n oscillations over an interval of length P). Of course, the bigger the frequency, the “faster” these waves oscillate.

Now given a function f(x) that satisfies the condition given earlier, how do we know what sine and cosine waves make it up? For this we must know what the coefficients a_{n} and b_{n} are.

In order to solve for a_{n} and b_{n}, we will make use of the property of the sine and cosine functions called orthogonality (the rest of the post will make heavy use of the language of calculus, therefore the reader might want to look at An Intuitive Introduction to Calculus):

\displaystyle \frac{1}{\pi}\int_{x_{0}}^{x_{0}+2\pi}\text{cos}(mx)\text{cos}(nx)dx=0    if m\neq n

\displaystyle \frac{1}{\pi}\int_{x_{0}}^{x_{0}+2\pi}\text{cos}(mx)\text{cos}(nx)dx=1    if m=n

\displaystyle \frac{1}{\pi}\int_{x_{0}}^{x_{0}+2\pi}\text{sin}(mx)\text{sin}(nx)dx=0    if m\neq n

\displaystyle \frac{1}{\pi}\int_{x_{0}}^{x_{0}+2\pi}\text{sin}(mx)\text{sin}(nx)dx=1    if m=n

\displaystyle \frac{1}{\pi}\int_{x_{0}}^{x_{0}+2\pi}\text{cos}(mx)\text{sin}(nx)dx=0    for all m,n

What this means is that when a sine or cosine function is not properly “paired” then its integral over an interval equal to its period will always be zero. It will only give a nonzero value if it is properly paired, and we can “rescale” this value to make it equal to 1.

Now we can look at the following expression:

\displaystyle \frac{2}{P}\int_{x_{0}}^{x_{0}+P}f(x)\text{cos}(\frac{2\pi x}{P})dx

Knowing that the function f(x) has a Fourier series expansion as above, we now have

\displaystyle \frac{2}{P}\int_{x_{0}}^{x_{0}+P}f(x)\text{cos}(\frac{2\pi x}{P})dx=\frac{2}{P}\int_{x_{0}}^{x_{0}+P}\bigg(\frac{a_{0}}{2}+\sum_{n=1}^{\infty}\bigg(a_{n}\text{cos}\bigg(\frac{2\pi nx}{P}\bigg)+b_{n}\text{sin}\bigg(\frac{2\pi nx}{P}\bigg)\bigg)\bigg)\text{cos}(\frac{2\pi x}{P})dx.

But we know that integrals involving the cosine function will always be zero unless it is properly paired; therefore it will be zero for all terms of the infinite series except for one, in which case it will yield (the constants are all there to properly scale the result)

\displaystyle \frac{2}{P}\int_{x_{0}}^{x_{0}+P}f(x)\text{cos}(\frac{2\pi x}{P})dx=\frac{2}{P}\int_{x_{0}}^{x_{0}+P}a_{1}\text{cos}\bigg(\frac{2\pi x}{P}\bigg)\text{cos}(\frac{2\pi x}{P})dx

\displaystyle \frac{2}{P}\int_{x_{0}}^{x_{0}+P}f(x)\text{cos}(\frac{2\pi x}{P})dx=a_{1}.

We have therefore used the orthogonality property of the cosine function to “filter” a single frequency component out of the many that make up our function.

Next we might use \text{cos}(\frac{4\pi x}{P}) instead of \text{cos}(\frac{2\pi x}{P}). This will give us

\displaystyle \frac{2}{P}\int_{x_{0}}^{x_{0}+P}f(x)\text{cos}(\frac{4\pi x}{P})dx=a_{2}.

We can continue the procedure to solve for the coefficients a_{3}, a_{4}, and so on, and we can replace the cosine function with the sine function to solve for the coefficients b_{1}, b_{2}, and so on. Of course, the coefficient a_{0} can also be obtained by using \text{cos}(0)=1.

In summary, we can solve for the coefficients using the following formulas:

\displaystyle a_{n}=\frac{2}{P}\int_{x_{0}}^{x_{0}+P}f(x)\text{cos}(\frac{2\pi nx}{P})dx

\displaystyle b_{n}=\frac{2}{P}\int_{x_{0}}^{x_{0}+P}f(x)\text{sin}(\frac{2\pi nx}{P})dx
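As a sanity check, these integrals can be approximated numerically. The following Python sketch (using NumPy, with a square wave as an arbitrary test function) recovers the square wave’s well-known coefficients: all a_{n} vanish, while b_{n}\approx 4/(\pi n) for odd n and 0 for even n.

```python
import numpy as np

P = 2 * np.pi
x = np.linspace(0, P, 100_001)
f = np.sign(np.sin(x))          # a square wave with period P

for n in range(1, 6):
    a_n = (2 / P) * np.trapz(f * np.cos(2 * np.pi * n * x / P), x)
    b_n = (2 / P) * np.trapz(f * np.sin(2 * np.pi * n * x / P), x)
    print(n, round(a_n, 4), round(b_n, 4))
# n odd:  b_n is approximately 4/(pi*n) = 1.2732, 0.4244, 0.2546, ...
# n even: both coefficients vanish
```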

Now that we have shown how a function can be “broken down” or “decomposed” into a (possibly infinite) sum of sine and cosine waves of different amplitudes and frequencies, we now revisit the relationship between the sine and cosine functions and the exponential function (see “The Most Important Function in Mathematics”) in order to give us yet another expression for the Fourier series. We recall that, combining the concepts of the exponential function and complex numbers we have the beautiful and important equation

\displaystyle e^{ix}=\text{cos}(x)+i\text{sin}(x)

which can also be expressed in the following forms:

\displaystyle \text{cos}(x)=\frac{e^{ix}+e^{-ix}}{2}

\displaystyle \text{sin}(x)=\frac{e^{ix}-e^{-ix}}{2i}.

Using these expressions, we can rewrite the Fourier series of a function in a more “shorthand” form:

\displaystyle f(x)=\sum_{n=-\infty}^{\infty}c_{n}e^{\frac{2\pi i nx}{P}}

where

\displaystyle c_{n}=\frac{1}{P}\int_{x_{0}}^{x_{0}+P}f(x)e^{-\frac{2\pi i nx}{P}}dx.
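The discrete analogue of this formula is computed by the fast Fourier transform. As a sketch, NumPy’s FFT recovers the coefficients c_{n} of a sampled signal built from two known components (a hypothetical example, with P=1 and 64 sample points):

```python
import numpy as np

N = 64
x = np.arange(N) / N                       # one period, P = 1
f = 3 * np.cos(2 * np.pi * 2 * x) + np.sin(2 * np.pi * 5 * x)

c = np.fft.fft(f) / N                      # c[n] approximates c_n
print(np.round(c[2], 3))                   # (1.5+0j): 3*cos gives c_2 = 3/2
print(np.round(c[5], 3))                   # -0.5j:    sin gives c_5 = 1/(2i)
```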

Finally, we discuss more concepts related to the process we used in solving for the coefficients a_{n}, b_{n}, and c_{n}. As we have already discussed, these coefficients express “how much” of the waves with frequency equal to n are in the function f(x). We can now abstract this idea to define the Fourier transform \hat{f}(k) of a function f(x) as follows:

\displaystyle \hat{f}(k)=\int_{-\infty}^{\infty}f(x)e^{-2\pi i kx}dx

There are of course versions of the Fourier transform that use the sine and cosine functions instead of the exponential function, but the form written above is more common in the literature. Roughly, the Fourier transform \hat{f}(k) also expresses “how much” of the waves with frequency equal to k are in the function f(x). The difference lies in the interval over which we are integrating; however, we may consider the formula for obtaining the coefficients of the Fourier series as taking the Fourier transform of a single cycle of a periodic function, with its value set to 0 outside of the interval occupied by the cycle, and with variables appropriately rescaled.

The Fourier transform has an “inverse”, which allows us to recover f(x) from \hat{f}(k):

\displaystyle f(x)=\int_{-\infty}^{\infty}\hat{f}(k)e^{2\pi i kx}dk.

Fourier analysis, aside from being an interesting subject in itself, has many applications not only in other branches of mathematics but also in the natural sciences and in engineering. For example, in physics, the Heisenberg uncertainty principle of quantum mechanics (see More Quantum Mechanics: Wavefunctions and Operators) comes from the result in Fourier analysis that the more a function is “localized” around a small area, the more its Fourier transform will be spread out over all of space, and vice-versa. Since the probability amplitudes for the position and the momentum are related to each other as the Fourier transform and inverse Fourier transform of each other (a result of the de Broglie relations), this manifests in the famous principle that the more we know about the position, the less we know about the momentum, and vice-versa.

Fourier analysis can even be used to explain the distinctive “distorted” sound of electric guitars in rock and heavy metal music. Usually, plucking a guitar string produces a sound wave which is roughly sinusoidal. For electric guitars, the sound is amplified using transistors; however, there is a limit to how much amplification can be done, and at a certain point (technically, this is when the transistor is operating outside of the “linear region”), the sound wave looks like a sine function with its peaks and troughs “clipped”. In Fourier analysis this corresponds to the addition of higher-frequency components, and this results in the distinctive sound of that genre of music.

Yet another application of Fourier analysis, and in fact its original application, is the study of differential equations. The mathematician Joseph Fourier, after whom Fourier analysis is named, developed the techniques we have discussed in this post in order to study the differential equation expressing the flow of heat in a material. It so happens that difficult calculations, for example differentiation, involving a function correspond to easier ones, such as simple multiplication, involving its Fourier transform. Therefore it is a common technique to convert a difficult problem to a simple one using the Fourier transform, and after the problem has been solved, we use the inverse Fourier transform to get the solution to the original problem.

Despite the crude simplifications we have assumed in order to discuss Fourier analysis in this post, the reader should know that it remains a deep and interesting subject in modern mathematics. A more general and more advanced form of the subject is called harmonic analysis, and it is one of the areas where there is much research, both on its own, and in connection to other subjects.

References:

Fourier Analysis on Wikipedia

Fourier Series on Wikipedia

Fourier Transform on Wikipedia

Harmonic Analysis on Wikipedia

Vibrations and Waves by A.P. French

Fourier Analysis: An Introduction by Elias M. Stein and Rami Shakarchi

Reduction of Elliptic Curves Modulo Primes

We have discussed elliptic curves over the rational numbers, the real numbers, and the complex numbers in Elliptic Curves. In this post, we discuss elliptic curves over finite fields of the form \mathbb{F}_{p}, where p is a prime, obtained by “reducing” an elliptic curve over the integers modulo p (see Modular Arithmetic and Quotient Sets).

We recall that in Elliptic Curves we gave the definition of an elliptic curve as a polynomial equation that we may write as

\displaystyle y^{2}=x^{3}+ax+b

with a and b satisfying the condition that

\displaystyle 4a^{3}+27b^{2}\neq 0.

Still, we claimed that we would not be able to write the equation of the elliptic curve in this form when the field of coefficients has characteristic equal to 2 or 3, as is the case for the finite fields \mathbb{F}_{2} or \mathbb{F}_{3}; therefore we will give more general forms for the equation of the elliptic curve later, along with the appropriate conditions. To help us with the latter, we will first look at the case of curves over the real numbers, where we can still make use of the equations above, and see what happens when the conditions on a and b are not satisfied.

Let a and b both be equal to 0, in which case the condition is not satisfied. Then our curve (which is not an elliptic curve) is given by the equation

\displaystyle y^{2}=x^{3}

whose graph in the xy plane is given by the following figure (plotted using the WolframAlpha software):

[Figure: graph of y^{2}=x^{3}, showing a cusp at the origin]

Next let a=-3 and b=2. Once again the condition is not satisfied. Our curve is given by

\displaystyle y^{2}=x^{3}-3x+2

and whose graph is given by the following figure (again plotted using WolframAlpha):

[Figure: graph of y^{2}=x^{3}-3x+2, showing a node at (1,0)]

Note also that in both cases, the right hand side of the equation of the curve is a polynomial in x with a double or triple root; for y^{2}=x^{3}, the right hand side, x^{3}, has a triple root at x=0, while for y^{2}=x^{3}-3x+2, the right hand side, x^{3}-3x+2, factors into (x-1)^{2}(x+2) and therefore has a double root at x=1.

The two curves, y^{2}=x^{3} and y^{2}=x^{3}-3x+2, are examples of singular curves. It is therefore a requirement for a curve to be an elliptic curve, that it must be nonsingular.

We now introduce the general form of an elliptic curve, applicable even when the coefficients belong to fields of characteristic 2 or 3, along with the general condition for it to be nonsingular. We note that the elliptic curve has a “point at infinity”; in order to make this idea explicit, we make use of the notion of projective space (see Projective Geometry) and write our equation in homogeneous coordinates X, Y, and Z:

\displaystyle Y^{2}Z+a_{1}XYZ+a_{3}YZ^{2}=X^{3}+a_{2}X^{2}Z+a_{4}XZ^{2}+a_{6}Z^{3}

This equation is called the long Weierstrass equation. We may also say that it is in long Weierstrass form.

We can now define what it means for a curve to be singular. Let

\displaystyle F=Y^{2}Z+a_{1}XYZ+a_{3}YZ^{2}-X^{3}-a_{2}X^{2}Z-a_{4}XZ^{2}-a_{6}Z^{3}

Then a singular point on this curve F is a point with coordinates a, b, and c such that

\displaystyle \frac{\partial F}{\partial X}(a,b,c)=\frac{\partial F}{\partial Y}(a,b,c)=\frac{\partial F}{\partial Z}(a,b,c)=0

It might be difficult to think of calculus when we are considering, for example, curves over finite fields, where there are a finite number of points on the curve, so we might instead just think of the partial derivatives of the curve as being obtained “algebraically” using the “power rule” of basic calculus,

\displaystyle \frac{d(x^{n})}{dx}=nx^{n-1}

and applying it, along with the usual rules for partial derivatives and constant factors, to every term of the curve. Such is the power of algebraic geometry; it allows us to “import” techniques from calculus and other areas of mathematics which we would not ordinarily think of as being applicable to cases such as curves over finite fields.
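As a small illustration, here is a computational sketch in Python (using the sympy library; the setup and variable names are our own, chosen for illustration) that computes these formal partial derivatives for the curve y^{2}=x^{3} in homogeneous coordinates and verifies that the point with homogeneous coordinates [0:0:1], i.e. the origin in affine coordinates, is a singular point:

from sympy import symbols, diff

# The curve y^2 = x^3 in homogeneous coordinates, written as F = 0.
X, Y, Z = symbols('X Y Z')
F = Y**2 * Z - X**3

# Formal partial derivatives, obtained purely algebraically via the power rule.
FX = diff(F, X)  # -3*X**2
FY = diff(F, Y)  # 2*Y*Z
FZ = diff(F, Z)  # Y**2

# All three partial derivatives vanish at [0 : 0 : 1], a point on the curve,
# so it is a singular point (the cusp at the origin in affine coordinates).
point = {X: 0, Y: 0, Z: 1}
print(F.subs(point), FX.subs(point), FY.subs(point), FZ.subs(point))  # 0 0 0 0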

If a curve has no singular points, then it is called a nonsingular curve. We may also say that the curve is smooth. In order for a curve written in long Weierstrass form to be an elliptic curve, we require that it be a nonsingular curve as well.

If the coefficients of the curve belong to a field of characteristic not equal to 2, we can make a projective transformation of variables to write its equation in a simpler form, known as the short Weierstrass equation, or short Weierstrass form:

\displaystyle Y^{2}Z=X^{3}+a_{2}X^{2}Z+a_{4}XZ^{2}+a_{6}Z^{3}

In this case the condition for the curve to be nonsingular can be written in the following form:

\displaystyle -4a_{2}^{3}a_{6}+a_{2}^{2}a_{4}^{2}+18a_{4}a_{2}a_{6}-4a_{4}^{3}-27a_{6}^{2}\neq 0

The quantity

\displaystyle D=-4a_{2}^{3}a_{6}+a_{2}^{2}a_{4}^{2}+18a_{4}a_{2}a_{6}-4a_{4}^{3}-27a_{6}^{2}

is called the discriminant of the curve.
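As a check, for the singular curve y^{2}=x^{3}-3x+2 that we encountered earlier, we have a_{2}=0, a_{4}=-3, and a_{6}=2, and the discriminant indeed vanishes:

\displaystyle D=-4(0)^{3}(2)+(0)^{2}(-3)^{2}+18(-3)(0)(2)-4(-3)^{3}-27(2)^{2}=108-108=0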

We note now, of course, that the usual expressions for the elliptic curve, in what we call affine coordinates x and y, can be recovered from our expression in terms of homogeneous coordinates X, Y, and Z simply by setting x=\frac{X}{Z} and y=\frac{Y}{Z}. The case Z=0 then corresponds to the “point at infinity”.
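In particular, in affine coordinates the long Weierstrass equation takes the form

\displaystyle y^{2}+a_{1}xy+a_{3}y=x^{3}+a_{2}x^{2}+a_{4}x+a_{6}.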

We now consider an elliptic curve whose equation has coefficients which are rational numbers. We can make a projective transformation of variables to rewrite the equation into one which has integers as coefficients. Then we can reduce the coefficients modulo a prime p and investigate the points of the elliptic curve considered as having coordinates in the finite field \mathbb{F}_{p}.

It may happen that when we reduce an elliptic curve modulo p, the resulting curve over the finite field \mathbb{F}_{p} is no longer nonsingular. In this case we say that it has bad reduction at p. Consider, for example, the following elliptic curve (written in affine coordinates):

\displaystyle y^{2}=x^{3}-4x^{2}+16

Let us reduce this modulo the prime p=11. Then, since -4\equiv 7\text{ mod }11 and 16\equiv 5\text{ mod }11, we obtain the curve

\displaystyle y^{2}=x^{3}+7x^{2}+5

over \mathbb{F}_{11}. The right hand side actually factors as (x+1)^{2}(x+5) over \mathbb{F}_{11}, which means that it has a double root at x=10 (which is equivalent to x=-1 modulo 11), and the discriminant is equal to zero over \mathbb{F}_{11}; hence, this curve over \mathbb{F}_{11} is singular, and the elliptic curve y^{2}=x^{3}-4x^{2}+16 has bad reduction at p=11. It also has bad reduction at p=2; in fact, we mentioned earlier that we cannot even write an elliptic curve in the form y^{2}=x^{3}+a_{2}x^{2}+a_{4}x+a_{6} when the field of coefficients has characteristic equal to 2, because such a curve will always be singular over such a field. The curve y^{2}=x^{3}-4x^{2}+16 remains nonsingular when reduced modulo every other prime, however; we also say that the curve has good reduction at all primes p except p=2 and p=11.
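To make this concrete, here is a small computational sketch in Python (the function name and the list of primes checked are our own choices for illustration) which evaluates the discriminant of y^{2}=x^{3}-4x^{2}+16 and checks at which primes it vanishes modulo p:

def discriminant(a2, a4, a6):
    # D = -4*a2^3*a6 + a2^2*a4^2 + 18*a2*a4*a6 - 4*a4^3 - 27*a6^2
    return (-4 * a2**3 * a6 + a2**2 * a4**2 + 18 * a2 * a4 * a6
            - 4 * a4**3 - 27 * a6**2)

D = discriminant(-4, 0, 16)
print(D)  # -2816, which factors as -(2**8)*11

# The discriminant vanishes modulo p exactly at the primes of bad reduction.
for p in [2, 3, 5, 7, 11, 13]:
    print(p, "bad reduction" if D % p == 0 else "good reduction")

Only p=2 and p=11 are reported as primes of bad reduction, as expected.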

In the case that an elliptic curve has bad reduction at p, we say that it has additive reduction if there is only one tangent line at the singular point (we also say that the singular point is a cusp), as in the case of the curve y^{2}=x^{3}, and we say that it has multiplicative reduction if there are two distinct tangent lines at the singular point (in this case we say that the singular point is a node), as in the case of the curve y^{2}=x^{3}-3x+2. If the slopes of these tangent lines are given by elements of the same field as the coefficients of the reduced curve (in our case, the finite field \mathbb{F}_{p}), we say that it has split multiplicative reduction; otherwise, we say that it has nonsplit multiplicative reduction. We note that since we are working with finite fields, what we describe as “tangent lines” are objects that we must define “algebraically”, as we have done earlier when describing the notion of a curve being singular.

As we have already seen in The Riemann Hypothesis for Curves over Finite Fields, whenever we have a curve over some finite field \mathbb{F}_{q} (where q=p^{n} for some natural number n), our curve will have only a finite number of points, and these points will have coordinates in \mathbb{F}_{q}. We denote the number of these points by N_{q}. In our case, we are interested in the case n=1, so that q=p. When our elliptic curve has good reduction at p, we define a quantity a_{p}, sometimes called the p-defect, or also known as the trace of Frobenius, as

\displaystyle a_{p}=p+1-N_{p}.
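As a rough computational sketch (in Python; the brute-force method and the example curve are our own choices for illustration), we can obtain N_{p} for a curve y^{2}=x^{3}+a_{4}x+a_{6} over \mathbb{F}_{p} by counting, for each x, the number of y with y^{2} equal to the right hand side, remembering to include the point at infinity:

def count_points(a4, a6, p):
    # For each residue r mod p, count the number of y in F_p with y^2 = r.
    sqrt_count = {}
    for y in range(p):
        r = (y * y) % p
        sqrt_count[r] = sqrt_count.get(r, 0) + 1
    n = 1  # start with the point at infinity
    for x in range(p):
        rhs = (x**3 + a4 * x + a6) % p
        n += sqrt_count.get(rhs, 0)
    return n

N = count_points(1, 1, 5)  # the curve y^2 = x^3 + x + 1 over F_5
print(N, 5 + 1 - N)  # N_5 = 9, so a_5 = -3

For the curve y^{2}=x^{3}+x+1 over \mathbb{F}_{5}, for instance, this gives N_{5}=9 and hence a_{5}=-3.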

We can now define the Hasse-Weil L-function of an elliptic curve E as follows:

\displaystyle L_{E}(s)=\prod_{p}L_{p}(s)

where p runs over all prime numbers, and

\displaystyle L_{p}(s)=\frac{1}{(1-a_{p}p^{-s}+p^{1-2s})}    if E has good reduction at p

\displaystyle L_{p}(s)=\frac{1}{(1-p^{-s})}    if E has split multiplicative reduction at p

\displaystyle L_{p}(s)=\frac{1}{(1+p^{-s})}    if E has nonsplit multiplicative reduction at p

\displaystyle L_{p}(s)=1    if E has additive reduction at p.
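We can get a feel for the information carried by these factors by formally expanding the good-reduction factor as a geometric series in p^{-s}:

\displaystyle \frac{1}{1-a_{p}p^{-s}+p^{1-2s}}=1+a_{p}p^{-s}+(a_{p}^{2}-p)p^{-2s}+\ldots

so the factor at p determines coefficients not only at p itself but at all powers of p; this will be relevant when we express the L-function as a series below.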

The Hasse-Weil L-function encodes number-theoretic information related to the elliptic curve, and much of modern mathematical research involves this function. For example, the Birch and Swinnerton-Dyer conjecture says that the rank of the group formed by the rational points of the elliptic curve (see Elliptic Curves), also known as the Mordell-Weil group, is equal to the order of the zero of the Hasse-Weil L-function at s=1, i.e. we have the following Taylor series expansion of the Hasse-Weil L-function at s=1:

\displaystyle L_{E}(s)=c(s-1)^{r}+\text{higher order terms}

where c is a constant and r is the rank of the elliptic curve.

Meanwhile, the Shimura-Taniyama-Weil conjecture, now known as the modularity theorem after its proof was completed, and which was central to Andrew Wiles’s proof of Fermat’s Last Theorem, states that the Hasse-Weil L-function can be expressed as the following series:

\displaystyle L_{E}(s)=\sum_{n=1}^{\infty}\frac{a_{n}}{n^{s}}

and the coefficients a_{n} are also the coefficients of the Fourier series expansion of some modular form f(E,\tau) (see The Moduli Space of Elliptic Curves):

\displaystyle f(E,\tau)=\sum_{n=1}^{\infty}a_{n}e^{2\pi i n\tau}.

For more on the modularity theorem and Wiles’s proof of Fermat’s Last Theorem, the reader is encouraged to read the award-winning article A Marvelous Proof by Fernando Q. Gouvea, which is freely and legally available online. A link to this article (hosted on the website of the Mathematical Association of America) is provided among the list of references below.

References:

Elliptic Curve on Wikipedia

Hasse-Weil Zeta Function on Wikipedia

Birch and Swinnerton-Dyer Conjecture on Wikipedia

Modularity Theorem on Wikipedia

Wiles’s Proof of Fermat’s Last Theorem on Wikipedia

The Birch and Swinnerton-Dyer Conjecture by Andrew Wiles

A Marvelous Proof by Fernando Q. Gouvea

A Friendly Introduction to Number Theory by Joseph H. Silverman

The Arithmetic of Elliptic Curves by Joseph H. Silverman

Advanced Topics in the Arithmetic of Elliptic Curves by Joseph H. Silverman

Invitation to the Mathematics of Fermat-Wiles by Yves Hellegouarch

A First Course in Modular Forms by Fred Diamond and Jerry Shurman

More on Sheaves

In Sheaves we introduced the concept of sheaves as mathematical objects that “live on” a space and can be “patched together” in a certain way. As an example we introduced the concept of the sheaf of regular functions on the complex plane \mathbb{C}. The complex plane is one example of a variety, which we defined in Basics of Algebraic Geometry as a “shape” that is described by the zero set of polynomial equations. The concept of the sheaf of regular functions can also be generalized to varieties other than \mathbb{C}. The sheaf of regular functions on a variety X is also called the structure sheaf on X and is written \mathcal{O}_{X}.

In this post we will give some more important kinds of sheaves.

Twisting Sheaves

We will start with the twisting sheaf. For this we need the notion of projective space, which we introduced in Projective Geometry. We know that projective space provides us with many advantages, in particular “points at infinity”, but these advantages come at the cost of some new language – for example, we require that our polynomials be homogeneous, which means that every term of such a polynomial must be of the same degree. The zero set of such a polynomial then defines a projective variety.

The definition of the sheaf of regular functions on a projective variety also has some differences compared to that on an affine space. Since the homogeneous coordinates of a point in projective space are only defined up to a common scaling factor, for a quotient \frac{f}{g} of homogeneous polynomials to give a well-defined function we need the numerator and the denominator to have the same degree. This has the effect that the only regular functions defined everywhere on a projective variety are the constant functions.

The twisting sheaves, written \mathcal{O}_{X}(n) for an integer n, are made up of expressions \frac{f}{g} where f and g are homogeneous polynomials and the degree of f is equal to d+n, where d is the degree of g. We also require for each open set U that g never be zero on any point of U, as in the definition of the regular functions on U. The sheaf of regular functions on X is then just the twisting sheaf when n=0. Twisting sheaves are isomorphic to the sheaf of regular functions “locally”, i.e. on open sets of the space, but not “globally”.
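For example, on the projective line \mathbb{P}^{1} with homogeneous coordinates X and Y, every homogeneous polynomial of degree n in X and Y (taking g=1, which has degree 0) gives a global section of \mathcal{O}_{\mathbb{P}^{1}}(n) for n\geq 0, and in fact these are all of the global sections:

\displaystyle \Gamma(\mathbb{P}^{1},\mathcal{O}_{\mathbb{P}^{1}}(n))=\{c_{0}X^{n}+c_{1}X^{n-1}Y+\ldots+c_{n}Y^{n}\}

This is a vector space of dimension n+1, while the global sections of \mathcal{O}_{\mathbb{P}^{1}} itself are only the constants; this shows concretely how the twisting sheaves differ from the sheaf of regular functions “globally” even though they agree “locally”.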

Sheaves of Modules and Quasi-Coherent Sheaves

Twisting sheaves can be thought of as sheaves of modules, with the sheaf of regular functions serving as their “scalars”. More generally, sheaves of modules play an important part in algebraic geometry. In the same way that a ring R determines the sheaf of regular functions \mathcal{O}_{X} on the affine scheme X=\text{Spec}(R), R-modules always give rise to sheaves of \mathcal{O}_{X}-modules on X. However, not all sheaves of \mathcal{O}_{X}-modules come from R-modules; in the special case that they do, they are referred to as quasi-coherent sheaves. Quasi-coherent sheaves are interesting because we have ways of constructing new modules from old ones, for instance using the tensor product or the direct sum; hence, we can also construct new sheaves of modules from old ones.

Locally Free Sheaves, Vector Bundles, and Line Bundles

A quasi-coherent sheaf \mathcal{F} for which there is an open cover \{U_{i}\} of X such that each restriction \mathcal{F}|_{U_{i}} is isomorphic to \mathcal{O}_{U_{i}}^{\oplus r}, i.e. a direct sum of r copies of the sheaf of regular functions, is called a locally free sheaf of rank r. Locally free sheaves correspond to vector bundles, which we have already discussed in the context of differential geometry and algebraic topology (see Vector Fields, Vector Bundles, and Fiber Bundles). A locally free sheaf of rank 1 is also known as a line bundle. As we have mentioned earlier, a twisting sheaf is locally isomorphic to the sheaf of regular functions; therefore, it is an example of a line bundle.

Sheaves of Differentials and the Cotangent Bundle

We now discuss the concept of differentials. As may be inferred from the name, this concept is somewhat related to concepts in calculus, such as tangents. However, in algebraic geometry we want to be able to define things algebraically, as this contributes to the strength of algebraic geometry in relating algebra and geometry. In addition, in algebraic geometry we may consider not only real and complex numbers but also rational numbers or even finite fields, and some of the methods we have developed in calculus may not always be immediately applicable to the case at hand. Therefore we must “redefine” these objects algebraically, even if they are going to be conceptually inspired by the objects we are already familiar with from calculus.

We now give the definition of differentials (which in the context of algebraic geometry are also called Kahler differentials). Given a homomorphism of rings S\rightarrow R, we define the module of relative differentials, denoted \Omega_{R/S}, to be the free R-module generated by the formal symbols \{dr|r\in R\}, modulo the following relations:

\displaystyle d(r_{1}+r_{2})=dr_{1}+dr_{2} for r_{1},r_{2}\in R

\displaystyle d(r_{1}r_{2})=r_{1}dr_{2}+r_{2}dr_{1} for r_{1},r_{2}\in R

\displaystyle ds=0 for s\in S
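For example, if S=k is a field and R=k[x] is the polynomial ring in one variable, then \Omega_{R/k} is the free R-module of rank 1 generated by dx, and repeatedly applying the second relation recovers the power rule algebraically:

\displaystyle d(x^{n})=nx^{n-1}dx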

If we have schemes X and Y, covered by open subsets each given by the spectrum \text{Spec}(R) of some ring R, and a morphism X\rightarrow Y, we have for each of these open subsets a module of relative differentials, and we can “glue these together” to form a quasi-coherent sheaf called the sheaf of relative differentials, written using the symbol \Omega_{X/Y}. If Y is a point, i.e. it is the spectrum \text{Spec }k of some field k, we simply write \Omega_{X}.

The sheaf of relative differentials \Omega_{X} is also known as the cotangent bundle, since it is dual to the tangent bundle. From the cotangent bundle we can form the canonical bundle by taking exterior products. The exterior product x\wedge y is obtained from the tensor product of x and y by imposing the relation x\wedge y=-y\wedge x. The canonical bundle is then the top exterior power of the cotangent bundle, i.e. \wedge^{\text{dim}(X)}\Omega_{X}. It is yet another example of a line bundle.
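For example, if X is a smooth curve, then \text{dim}(X)=1 and the canonical bundle is just the cotangent bundle \Omega_{X} itself; for the projective line, it is known that the canonical bundle is isomorphic to one of the twisting sheaves we encountered earlier:

\displaystyle \Omega_{\mathbb{P}^{1}}\cong\mathcal{O}_{\mathbb{P}^{1}}(-2)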

Line Bundles and Divisor Classes

Line bundles (including the canonical bundle) on curves are closely related to divisors (see Divisors and the Picard Group). In fact, the set of all line bundles on a curve X is the same as the Picard group (the group of divisor classes) of X. We will not prove this, but we will elaborate a little on the construction that gives the correspondence between line bundles and divisor classes. Since a line bundle is locally isomorphic to the sheaf of regular functions, a section s of the line bundle corresponds, at least on some open set U, to some regular function on U that we denote by \psi(s). Let P be a point in U. We define the order of vanishing \text{ord}_{P}s of the section s as the order of vanishing of the regular function \psi(s) at P.

A rational section of a line bundle is a section of the bundle possibly multiplied by a rational function (which may not necessarily be a function in the set-theoretic sense but merely an expression which is a “fraction” of polynomials). Similar to the case of ordinary sections of the line bundle and regular functions, there is also a correspondence between rational sections and rational functions. We then define the divisor (s) associated to a rational section s by

\displaystyle (s)=\sum_{P\in X}\text{ord}_{P}s\cdot P

On the other hand, given a divisor D, we may obtain a line bundle by associating to the divisor D the set of all rational functions \psi, with divisor (\psi), such that

\displaystyle (\psi)+D\geq 0.

The notation means that when we formally add the divisors (\psi) and D, the resulting sum has coefficients which are all greater than or equal to 0. We refer to such a divisor as an effective divisor. Thus we have a means of associating divisors to line bundles and vice-versa, and it is a theorem, which we will not prove, that this gives a correspondence between line bundles and divisor classes.
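For example, let X=\mathbb{P}^{1} with affine coordinate x, and let D=2\cdot P where P is the point x=0. Then the rational functions \psi with (\psi)+D\geq 0 are exactly those with at most a double pole at P and no poles elsewhere, i.e. the functions of the form

\displaystyle \psi=c_{0}+\frac{c_{1}}{x}+\frac{c_{2}}{x^{2}}

These form a vector space of dimension 3; we will see shortly that this agrees with the Riemann-Roch theorem.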

Preview of the Riemann-Roch Theorem

The correspondence between line bundles and divisor classes will allow us to state the Riemann-Roch theorem (once again, without proof, for now) for the case of complex smooth projective curves. Let h^{0}(D) denote the dimension of the vector space of global sections of the line bundle \mathcal{L} corresponding to the divisor D. We recall that the degree \text{deg}(D) of a divisor D is the sum of its coefficients. The genus g of a curve roughly gives the “number of holes” of the curve as a space whose points have coordinates that are complex numbers (recall that the complex points of a curve actually form a surface – for example, an elliptic curve is actually a torus, which has genus equal to 1). The Riemann-Roch theorem relates all these concepts. Let K_{X} denote the divisor corresponding to the canonical bundle of the curve X, and let D be any divisor on X. Then the Riemann-Roch theorem is the following statement:

\displaystyle h^{0}(D)-h^{0}(K_{X}-D)=\text{deg}(D)+1-g
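Even without the proof, we can already extract some classical consequences. Setting D=0, and noting that h^{0}(0)=1 since the only everywhere-regular functions are the constants, we obtain h^{0}(K_{X})=g; setting D=K_{X} then gives

\displaystyle \text{deg}(K_{X})=2g-2.

For the example on \mathbb{P}^{1} given earlier, where g=0 and D=2\cdot P, the divisor K_{X}-D has negative degree and therefore h^{0}(K_{X}-D)=0, so the theorem gives h^{0}(D)=2+1-0=3, matching the dimension we found.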

More on the Riemann-Roch theorem, including its proof, examples of its applications, and generalization to varieties other than curves, will be left to the references for now. It is intended and hoped for, however, that these subjects will be tackled at some later time on this blog.

References:

Sheaf of Modules on Wikipedia

Coherent Sheaf on Wikipedia

Kahler Differential on Wikipedia

Cotangent Sheaf on Wikipedia

Cotangent Bundle on Wikipedia

Canonical Bundle on Wikipedia

Sheaves of Modules by Charles Siegel on Rigorous Trivialities

Locally Free Sheaves and Vector Bundles by Charles Siegel on Rigorous Trivialities

Line Bundles and the Picard Group by Charles Siegel on Rigorous Trivialities

Differential Forms and the Canonical Bundle by Charles Siegel on Rigorous Trivialities

Riemann-Roch Theorem for Curves by Charles Siegel on Rigorous Trivialities

Algebraic Geometry by Andreas Gathmann

Algebraic Geometry by Robin Hartshorne