Charles Siegel and Matt DeLand’s AG from the Beginning

Algebraic geometry is one of the subjects I’m most interested in, as might be inferred from many of my posts on this blog. I’ve discussed the idea of what algebraic geometry is all about, and hopefully for the readers of this blog this will suffice as motivation and introduction to this beautiful and grand subject whose developments have become a central ingredient to the story of modern mathematics, following the lead of one of the most influential mathematicians in modern times, Alexander Grothendieck.

The subject of algebraic geometry is vast, and many ideas are pretty abstract. On top of that, as I myself am still learning the subject, it might not be a very good idea for me to try and take on the subject, even just the basic aspects of it, in its entirety. One of my goals has been to present modern mathematics (and physics) to readers in a more accessible way, “bridging” perhaps the gap between the subject as taught in a university, and a layperson or a beginner who might not have had access to more formal education in the subject. For this reason I have tried to make this blog as self-contained as possible, although I always provide references for those who want to follow up on the introductory discussions made on the blog.

However, this goal would perhaps be too ambitious for me, or at least too inconvenient; it is also my goal (in fact my primary goal) to make this blog a sort of repository of notes that I make while studying, and I cannot accomplish this goal if I am bogged down too much in the exposition of the basics.

With all that said, there is actually a blog which has accomplished this ambitious goal of presenting algebraic geometry “from the beginning”. I therefore present the series AG from the Beginning at the blog Rigorous Trivialities, written mostly by Charles Siegel with contributions from Matt DeLand. My experience writing on my own blog has made me appreciate how much of an effort is put into a series like this, and as I continue to study I continue to read the posts to help with my own understanding. In fact, along with the blog This Week’s Finds in Mathematical Physics by John Baez (which goes back all the way to 1993), it was one of the blogs I looked up to for inspiration when starting this blog. From there I got the idea of trying to make things as self-contained as possible, but as I have said above with certain things I seem to have reached my limit, and with the existence of such a project, another attempt would perhaps be redundant.

As to the content of this blog, things will mostly remain the same, but I’ll take some of the load off my shoulders by simply linking to this series in my posts. This will allow me to not get bogged down too much in exposition, as I have said earlier, and I can take on certain subjects with much more freedom. Also, things are poised to get busier for me in the coming weeks, and I want to keep writing on this blog, so I think I could do with less mental pressure trying to discuss an entire subject in logical order (this, especially, has not historically been my strong suit). With some (but not all) of the subjects I have already discussed on this blog I am quite confident that I have already provided a reasonable amount of motivation and introduction for the reader; perhaps I can now also do more of what the tagline of the blog says, “ramblings” on math and physics (and stuff).

Vector Fields, Vector Bundles, and Fiber Bundles

In physics we have the concept of a vector field. Intuitively, a vector field is given by specifying a vector (in the sense of a quantity with magnitude and direction) at every point in a certain “space”. For instance, the wind velocity on the surface of our planet is a vector field. If we neglect the upward or downward dimension, and look only at the northward, southward, eastward, and westward directions, we have what we usually see on weather maps on the news. In one city the wind might be blowing strongly to the north, in another city it might be blowing weakly to the east, and in a third city it might be blowing moderately to the southwest.
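
To make this concrete, here is a minimal Python sketch of the idea: a vector field is just a rule assigning a vector to every point of the space. The particular formula for the “wind” below is made up purely for illustration.

```python
import numpy as np

# Sample points on a small grid, standing in for patches of the Earth's surface.
xs, ys = np.meshgrid(np.linspace(-2, 2, 5), np.linspace(-2, 2, 5))

# A vector field assigns a vector (here, a wind velocity) to every point.
# This particular rule, a circulating wind, is invented purely for illustration.
def wind(x, y):
    return -y, x  # (eastward component, northward component)

u, v = wind(xs, ys)

# At each point (x, y) we now have a vector (u, v): a magnitude and a direction.
for x, y, wx, wy in zip(xs.ravel(), ys.ravel(), u.ravel(), v.ravel()):
    print(f"at ({x:+.1f}, {y:+.1f}) the wind vector is ({wx:+.1f}, {wy:+.1f})")
```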

If, at every point, we specify an entire vector space (see Vector Spaces, Modules, and Linear Algebra) instead of just a single vector, we obtain the concept of a vector bundle. Given a vector bundle, we can recover a vector field by choosing one vector from the vector space attached to each point. More technically, we say that a vector field is a section of the vector bundle.

A vector space can be thought of as just a certain kind of space; in our example of wind velocities on the surface of the Earth, the vector space that we attach to every point is the plane \mathbb{R}^{2} endowed with an intuitive vector space structure. Given a point on the plane, we draw an “arrow” with its “tail” at the chosen origin of the plane and its “head” at the given point. We can then add and scale these arrows to obtain other arrows, hence, these arrows form a vector space. This “graphical” method of studying vectors (again in the sense of a quantity with magnitude and direction) is in fact one of the most common ways of introducing the concept of vectors in physics.

If, instead of a vector space such as the plane \mathbb{R}^{2}, we generalize to other kinds of spaces such as the circle S^{1}, we obtain the notion of a fiber bundle. A vector bundle is therefore just a special case of a fiber bundle. In Category Theory, we described the torus as a fiber bundle, obtained by “gluing” a circle to every point of another circle. The shape that is glued is called the “fiber”, and the shape to which the fibers are glued is called the “base”.

Simply gluing spaces to the points of another space does not automatically mean that the space obtained is a fiber bundle, however. There is another requirement. Consider, for example, a cylinder. This can be described as a fiber bundle, with the fibers given by lines, and the base given by a circle (this can also be done the other way around, but we use this description for the moment because we will use it to describe an important condition for a space to be a fiber bundle). However, another fiber bundle can be obtained from lines (as the fibers) and a circle (as the base). This other fiber bundle can be obtained by “twisting” the lines as we “glue” them to the points of a circle, resulting in the very famous shape known as the Mobius strip.
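
To make the “twisting” concrete, here is a short numpy sketch (purely illustrative) parametrizing both surfaces as points in three-dimensional space: the cylinder attaches a copy of a line segment to each point of a circle without any rotation, while the Mobius strip rotates the segment by half a turn over one full trip around the circle.

```python
import numpy as np

theta = np.linspace(0, 2 * np.pi, 60)   # parameter along the base circle
t = np.linspace(-0.5, 0.5, 10)          # parameter along the fiber (a line segment)
theta, t = np.meshgrid(theta, t)
R = 2.0                                 # radius of the base circle

# Cylinder: the fiber is glued to each point of the circle with no twisting.
cylinder = np.stack([R * np.cos(theta),
                     R * np.sin(theta),
                     t])

# Mobius strip: the fiber is rotated by theta/2 as we go around the circle,
# so after a full trip (theta = 2*pi) the fiber has made exactly a half-turn.
mobius = np.stack([(R + t * np.cos(theta / 2)) * np.cos(theta),
                   (R + t * np.cos(theta / 2)) * np.sin(theta),
                   t * np.sin(theta / 2)])

print(cylinder.shape, mobius.shape)  # grids of points in 3-dimensional space
```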

The cylinder, which exhibits no “twisting”, is the simplest kind of fiber bundle, called a trivial bundle. Still, even though the Mobius strip has some kind of “twisting”, if we look at the two spaces “locally”, i.e. only on small enough areas, there is no difference between the cylinder and the Mobius strip. It is only when we look at them “globally” that we can distinguish the two. This is the important requirement for a space to be a fiber bundle: locally, it must “look like” the trivial bundle. This condition is related to the notion of continuity (see Basics of Topology and Continuous Functions).
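
In more formal language (this is the standard definition, recorded here for reference), a fiber bundle with fiber F consists of a continuous surjective map

\pi: E\rightarrow B

from a “total space” E onto the base B, such that every point of B has an open neighborhood U over which the bundle looks like a product, i.e. there is a homeomorphism

\pi^{-1}(U)\cong U\times F

compatible with \pi. A section is then a continuous map s: B\rightarrow E satisfying \pi(s(p))=p for every point p of B; when E is a vector bundle, a section is exactly a vector field, as described earlier.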

The concept of fiber bundles can be found everywhere in physics, and forms the language for many of its branches. We have already stated an example, with vector fields on a space. Aside from wind velocities (and the velocities of other fluids), vector fields are also used to express quantities such as electric and magnetic fields.

Fiber bundles can also be used to express ideas that are not so easily visualized. For example, in My Favorite Equation in Physics we mentioned the concept of a phase space, whose coordinates represent the position and momentum of a system, which is used in the Hamiltonian formulation of classical mechanics. The phase space of a system is an example of a kind of fiber bundle called a cotangent bundle. Meanwhile, in Einstein’s general theory of relativity, the concept of a tangent bundle is used to study the curvature of spacetime (which in the theory is what we know as gravity, and is related to mass, or more generally, energy and momentum).

More generally, the tangent bundle can be used to study the curvature of objects aside from spacetime, including more ordinary objects like a sphere, or hills and valleys on a landscape. This leads to a further generalization of the notion of “curvature” involving other kinds of fiber bundles aside from tangent bundles. This more general idea of curvature is important in the study of gauge theories, which is an important part of the standard model of particle physics. A good place to start for those who want to understand curvature in the context of tangent bundles and fiber bundles is by looking up the idea of parallel transport.

Meanwhile, in mathematics, fiber bundles are also very interesting in their own right. For example, vector bundles on a space can be used to study the topology of a space. One famous result involving this idea is the “hairy ball theorem”, which is related to the observation that on our planet every typhoon must have an “eye”. However, on something that is shaped like a torus instead of a sphere (like, say, a space station with an artificial atmosphere), one can have a typhoon with no eye, simply by running the wind along the walls of the torus. Replacing wind velocities by magnetic fields, this becomes the reason why fusion reactors that use magnetic fields to contain the very hot plasma are shaped like a torus instead of like a sphere. We recall, of course, that the sphere and the torus are topologically inequivalent, and this is reflected in the very different characteristics of vector fields on them.

The use of vector bundles in topology has led to subjects of mathematics such as the study of characteristic classes and K-theory. The concept of mathematical objects “living on” spaces should be reminiscent of the ideas discussed in Presheaves and Sheaves; in fact, in algebraic geometry the two ideas are very much related. Since algebraic geometry serves as a “bridge” between ideas from geometry and ideas from abstract algebra, this then leads to the subject called algebraic K-theory, where ideas from topology get carried over to abstract algebra and linear algebra (even number theory).

References:

Fiber Bundle on Wikipedia

Vector Bundle on Wikipedia

Vector Field on Wikipedia

Parallel Transport on Wikipedia

What is a Field? at Rationalizing the Universe

Algebraic Geometry by Andreas Gathmann

Algebraic Topology by Allen Hatcher

A Concise Course in Algebraic Topology by J. P. May

Geometrical Methods of Mathematical Physics by Bernard F. Schutz

Geometry, Topology and Physics by Mikio Nakahara

Galois Groups

In Algebraic Numbers we discussed algebraic number fields and a very important group associated to an algebraic number field called its ideal class group. In this post we define another very important group called the Galois group. Galois groups are named after the mathematician Evariste Galois, who lived in the early 19th century and developed the theory before his early death in a duel (under mysterious circumstances) at the age of 20.

The problem that motivated the development of Galois groups was the solution of polynomial equations of higher degree. We know that for quadratic equations (equations of degree 2) there exists a “quadratic formula” that allows us to solve for the roots of any quadratic equation. For cubic equations (equations of degree 3) and quartic equations (equations of degree 4), there are also similar “cubic” and “quartic” formulas, although they are not as well-known as the quadratic formula.

However, for quintic equations (equations of degree 5) there is no “quintic formula”. What this means is that not every quintic equation can be solved by a finite number of additions, subtractions, multiplications, divisions, and extractions of roots. Some quintic equations, of course, can be easily solved using these operations, such as x^{5}-1=0. However, this does not hold true for all quintic equations. This was proven by another mathematician, Niels Henrik Abel, but it was Galois who gave the conditions needed to determine whether a quintic equation could be solved using the aforementioned operations or not.

The groundbreaking strategy that Galois employed was to study the permutations of roots of polynomial equations. These permutations are the same as the field automorphisms of the smallest field extension (see Algebraic Numbers for the definition of a field extension) that contains these roots (called the splitting field of the polynomial equation) which also fix the field of coefficients of the polynomial.

By a “field automorphism” we mean a bijective function f from a field to itself such that the following conditions are satisfied:

f(a+b)=f(a)+f(b)

f(ab)=f(a)f(b)

By “fix” we mean that if a is an element of the field of coefficients of the polynomial equation, then we must have

f(a)=a

We might perhaps do better by discussing an example. We do not delve straight into quintic equations, and consider first the much simpler case of a quadratic equation such as x^{2}+1=0. We consider the polynomial x^{2}+1 as having coefficients in the field \mathbb{Q} of rational numbers. The roots of this equation are i and -i, and the splitting field is the field \mathbb{Q}[i].

Since there are only two roots, we only have two permutations of these roots. One is the identity permutation, which sends i to i and -i to -i, and the other is the permutation that exchanges the two, sending i to -i and -i to i. The first one corresponds to the identity field automorphism of \mathbb{Q}[i], while the second one corresponds to the complex conjugation field automorphism of \mathbb{Q}[i]. Both these permutations preserve \mathbb{Q}.
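
As a quick sanity check (a minimal Python sketch, with elements of \mathbb{Q}[i] represented for simplicity as complex numbers with small integer parts), we can verify on sample elements that complex conjugation satisfies the automorphism conditions above and fixes the rationals:

```python
import random

def conj(z):
    """Complex conjugation, sending a + bi to a - bi."""
    return z.conjugate()

# Check the field automorphism conditions on randomly chosen elements.
for _ in range(1000):
    a = complex(random.randint(-9, 9), random.randint(-9, 9))
    b = complex(random.randint(-9, 9), random.randint(-9, 9))
    assert conj(a + b) == conj(a) + conj(b)   # preserves addition
    assert conj(a * b) == conj(a) * conj(b)   # preserves multiplication

# Conjugation fixes the field of coefficients Q (no imaginary part to flip)...
assert conj(complex(7, 0)) == complex(7, 0)

# ...and it exchanges the two roots of x^2 + 1.
assert conj(1j) == -1j and conj(-1j) == 1j
print("complex conjugation is a field automorphism of Q[i] fixing Q")
```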

These permutations (or field automorphisms) form a group (see Groups), which is what we refer to as the Galois group of the field extension (the splitting field, considered as a field extension of the field of coefficients of the polynomial) or the polynomial.

The idea is that the “structure” of the Galois group, as a group, is related to the “structure” of the field extension. For example, the subgroups of the Galois groups correspond to the “intermediate fields” contained in the splitting field but containing the field of coefficients of the polynomial.

Using this idea, Galois showed that whenever the Galois group of an irreducible quintic polynomial is the symmetric group S_{5} (the group of permutations of the set with 5 elements) or the alternating group A_{5} (the group of “even” permutations of the set with 5 elements), the polynomial cannot be solved using a finite number of additions, subtractions, multiplications, divisions, and extractions of roots. This happens, for example, when the irreducible quintic polynomial has exactly three real roots, as in the case of x^{5}-16x+2. More details of the proof can be found in the last chapter of the book Algebra by Michael Artin.
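
As a quick numerical check of that claim (a sketch using numpy; the roots are computed in floating point, so the count is approximate in principle), we can count the real roots of x^{5}-16x+2:

```python
import numpy as np

# Coefficients of x^5 - 16x + 2, the irreducible quintic mentioned above.
coefficients = [1, 0, 0, 0, -16, 2]
roots = np.roots(coefficients)

# Count the real roots, i.e. those with (numerically) negligible imaginary part.
real_roots = np.sort(roots[np.abs(roots.imag) < 1e-9].real)
print(len(real_roots), "real roots:", real_roots)
# Three real roots and one complex conjugate pair; for an irreducible quintic
# this forces the Galois group to be all of S_5, so the roots cannot be
# expressed by radicals.
```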

Although Galois groups were initially developed to deal with problems regarding the solvability of polynomial equations, they have found applications beyond this original purpose and have become a very important part of many aspects of modern mathematics, especially in (but not limited to, rather surprisingly) number theory.

For example, the study of “representations” of Galois groups in terms of linear transformations of vector spaces (see Vector Spaces, Modules, and Linear Algebra) is an important part of the proof of the very famous result called Fermat’s Last Theorem, given by the mathematician Andrew Wiles in 1994. A very active field of research in the present day related to representations of Galois groups is called the Langlands program. In particular, what is usually being studied is the “absolute” Galois group – the group of field automorphisms of the field of all algebraic numbers that fix the field \mathbb{Q} of rational numbers. A book that makes these ideas accessible to a more general audience is Fearless Symmetry: Exposing the Hidden Patterns of Numbers by Avner Ash and Robert Gross.

References:

Galois Theory on Wikipedia

Galois Group on Wikipedia

Wiles’ Proof of Fermat’s Last Theorem on Wikipedia

Langlands Program on Wikipedia

Fearless Symmetry: Exposing the Hidden Patterns of Numbers by Avner Ash and Robert Gross

Algebra by Michael Artin

Algebraic Numbers

In this post we revisit certain topics discussed in one of the earliest posts on this blog, namely, The Fundamental Theorem of Arithmetic and Unique Factorization. In that post we introduced certain “numbers” such as \mathbb{Z}[i], also referred to as the Gaussian integers, and \mathbb{Z}[\sqrt{-5}], which I currently do not know the name of, despite it being one of the most basic examples of “numbers” displaying “weird” behavior such as the failure of unique factorization.

In that post we have been quite vague, and it is the intention of this post to start taking on the same topics with a little more clarity and rigor.

We define two important concepts – algebraic numbers and finite degree field extensions of the field of rational numbers \mathbb{Q}. These two concepts are the objects of study of the branch of mathematics called algebraic number theory.

An algebraic number is a complex number which is the root of a polynomial with integer coefficients. The square root of -1, which we of course write as i, is an example of an algebraic number. It is a root of the equation

x^{2}+1=0

Numbers that are not algebraic numbers are called transcendental numbers. Examples of transcendental numbers are the constants \pi and e.

Given a field (see Rings, Fields, and Ideals) F, a field extension of F is another field K that contains F as a subset (or rather, a subfield). The degree of a field extension of F is its dimension (see More on Vector Spaces and Modules) as a vector space whose field of scalars is F.

It is known that every element of a finite degree field extension of the field of rational numbers \mathbb{Q} is an algebraic number. Hence, such a field extension is also called an algebraic number field.

An algebraic number which is the root of a monic polynomial with integer coefficients is called an algebraic integer. A monic polynomial is a polynomial where the term with the highest degree has a coefficient of 1. Hence, i is not only an algebraic number, but is also an algebraic integer, since the polynomial x^{2}+1 is monic. The algebraic integers in an algebraic number field form a ring. They are related to the elements of the algebraic number field in an analogous way to how ordinary integers are related to rational numbers.
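
Here is a small sketch illustrating the distinction, assuming sympy and its minimal_polynomial helper: an algebraic integer has a monic minimal polynomial with integer coefficients, while a general algebraic number need not.

```python
from sympy import I, Rational, Symbol, minimal_polynomial, sqrt

x = Symbol('x')

# i is a root of the monic integer polynomial x^2 + 1: an algebraic integer.
print(minimal_polynomial(I, x))                  # x**2 + 1

# The golden ratio (1 + sqrt(5))/2 is also an algebraic integer,
# even though it is not an ordinary integer.
print(minimal_polynomial((1 + sqrt(5)) / 2, x))  # x**2 - x - 1

# 1/2 is an algebraic number but not an algebraic integer: its minimal
# polynomial with integer coefficients, 2x - 1, is not monic.
print(minimal_polynomial(Rational(1, 2), x))     # 2*x - 1
```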

The ring of Gaussian integers \mathbb{Z}[i] is the ring of algebraic integers of the algebraic number field \mathbb{Q}[i], which is made up of complex numbers whose real and imaginary parts are both rational numbers, while the ring \mathbb{Z}[\sqrt{-5}] is the ring of algebraic integers of the algebraic number field \mathbb{Q}[\sqrt{-5}], which is made up of complex numbers which can be written in the form a+b\sqrt{-5}, where a and b are rational numbers.

A unit is an element of the ring of algebraic integers of an algebraic number field which has a multiplicative inverse. As we have already seen in previous posts, it is important to identify the units in the ring of algebraic integers because we have to exclude them when we talk about unique factorization.

One of the things we can do with an algebraic number field is to study factorization in its ring of algebraic integers. We have explored a little bit of this in The Fundamental Theorem of Arithmetic and Unique Factorization, and we have seen that in the ring \mathbb{Z}[\sqrt{-5}] factorization into irreducible elements fails to be unique. For example, we may have

6=2\cdot 3=(1+\sqrt{-5})(1-\sqrt{-5})

The numbers 2, 3, 1+\sqrt{-5}, and 1-\sqrt{-5} are all irreducible in the ring \mathbb{Z}[\sqrt{-5}].
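
As a quick check (a sketch assuming sympy), we can verify that the two factorizations really do give the same number; the comments record the norm argument behind the irreducibility of the four factors.

```python
from sympy import expand, sqrt

r = sqrt(-5)   # sympy writes this as sqrt(5)*I

# The two factorizations of 6 in Z[sqrt(-5)] agree:
print(expand(2 * 3))              # 6
print(expand((1 + r) * (1 - r)))  # 6

# Irreducibility via the norm N(a + b*sqrt(-5)) = a^2 + 5b^2, which is
# multiplicative: N(2) = 4, N(3) = 9, N(1 + r) = N(1 - r) = 6.  A proper
# factor of any of these would need norm 2 or 3, and a^2 + 5b^2 never
# takes the values 2 or 3, so all four factors are irreducible.
```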

However, for certain rings called Dedekind domains, even if unique factorization into irreducible elements does not hold, the ideals of the ring may still be factored uniquely as a product of the prime ideals (see More on Ideals) of the ring. The ring of algebraic integers of an algebraic number field happens to be a Dedekind domain. We will discuss this factorization of ideals next.

We recall that an ideal of a ring is a subset of the ring which is closed under addition and under multiplication by elements of the ring. In other words, it is a subset of the ring which is also a module with the ring itself as its ring of scalars. Perhaps the simplest kind of ideal is a principal ideal, written (a) for an element a of the ring, which consists of all products of a with the other elements of the ring. We may also say that the ideal (a) is the set of all multiples of a.

In the ring of ordinary integers \mathbb{Z}, all ideals are principal ideals. However, this may not be true for more general rings. For example, in the ring \mathbb{Z}[\sqrt{-5}], consider the set of linear combinations of 2 and 1+\sqrt{-5}, i.e. the set of elements of \mathbb{Z}[\sqrt{-5}] which can be written as a(2)+b(1+\sqrt{-5}), where a and b are elements of \mathbb{Z}[\sqrt{-5}]. This set, written (2, 1+\sqrt{-5}), forms an ideal, but this ideal is not a principal ideal; it is not the set of multiples of any single element. However, it is closed under addition and multiplication by any element of \mathbb{Z}[\sqrt{-5}].
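
Why can this ideal not be principal? A single generator would have to divide both 2 and 1+\sqrt{-5}, and the norm N(a+b\sqrt{-5})=a^{2}+5b^{2} of a divisor must divide the norm of what it divides, so a generator would need norm dividing both N(2)=4 and N(1+\sqrt{-5})=6, i.e. norm 1 or 2. Norm 1 would make it a unit (and the ideal would then be the whole ring, which it is not; one can check that 1 is not a linear combination of the two generators). The following minimal Python sketch checks by brute force that norm 2 is impossible:

```python
# The norm of a + b*sqrt(-5) in Z[sqrt(-5)].
def norm(a, b):
    return a * a + 5 * b * b

# For |a| > 1 we already have a^2 > 2, and for |b| >= 1 we have 5b^2 >= 5,
# so this small search range covers every candidate element of norm 2.
solutions = [(a, b) for a in range(-2, 3) for b in range(-1, 2)
             if norm(a, b) == 2]
print(solutions)   # [] -- no element of Z[sqrt(-5)] has norm 2
```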

Given two ideals \mathfrak{a} and \mathfrak{b} in some ring, the product \mathfrak{a}\mathfrak{b} is the set of all elements of the ring which can be written as finite sums a_{1}b_{1}+a_{2}b_{2}+\cdots+a_{n}b_{n}, where a_{1}, a_{2},\ldots,a_{n} are elements of the ideal \mathfrak{a} and b_{1}, b_{2},\ldots,b_{n} are elements of the ideal \mathfrak{b}.

We can now state the following “ideal-theoretic” analogue of the fundamental theorem of arithmetic (quoted from the book Algebraic Number Theory by Jurgen Neukirch):

Every ideal of \mathcal{O} different from (0) and (1) admits a factorization

\mathfrak{a}=\mathfrak{p}_{1}...\mathfrak{p}_{r}

into nonzero prime ideals \mathfrak{p}_{i} of \mathcal{O} which is unique up to the order of the factors.

Here the symbol \mathcal{O} refers to the ring of algebraic integers of an algebraic number field.

We recall once again our example showing the failure of unique factorization in \mathbb{Z}[\sqrt{-5}]:

6=2\cdot 3=(1+\sqrt{-5})(1-\sqrt{-5})

If we consider ideals instead of individual elements, we have

(6)=(2)(3)=(1+\sqrt{-5})(1-\sqrt{-5})

(Note: Parentheses are used to denote principal ideals in abstract algebra and algebraic number theory. However, they are also used to denote multiplication of expressions, as in basic arithmetic and algebra. Hopefully the intended purpose of the parentheses will be obvious from the context and will not cause too much confusion for the reader. In the examples above, we have first used them for individual elements of the ring, and later on, for ideals, which are sets of elements of the ring.)

But the ideals in the last expression can be factored even further:

(2)=(2,1+\sqrt{-5})(2,1-\sqrt{-5})

(3)=(3,1+\sqrt{-5})(3,1-\sqrt{-5})

(1+\sqrt{-5})=(2,1+\sqrt{-5})(3,1+\sqrt{-5})

(1-\sqrt{-5})=(2,1-\sqrt{-5})(3,1-\sqrt{-5})

Therefore, the principal ideal (6) admits a unique factorization as a product of ideals as follows:

(6)=(2,1+\sqrt{-5})(2,1-\sqrt{-5})(3,1+\sqrt{-5})(3,1-\sqrt{-5})
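
We can check the first of these factorizations at the level of generators (a sketch assuming sympy): the product of two ideals is generated by the pairwise products of their generators, and for (2,1+\sqrt{-5})(2,1-\sqrt{-5}) these pairwise products generate exactly the multiples of 2.

```python
from sympy import expand, sqrt

r = sqrt(-5)

# The product ideal (2, 1+r)(2, 1-r) is generated by the four pairwise
# products of the generators.
products = [expand(x * y) for x in (2, 1 + r) for y in (2, 1 - r)]
print(products)   # [4, 2 - 2*sqrt(5)*I, 2 + 2*sqrt(5)*I, 6]

# Each product is a multiple of 2, so the product ideal is contained in (2);
# conversely, 2 = 6 - 4 is a difference of two of the products, so (2) is
# contained in the product ideal.  Hence (2) = (2, 1+r)(2, 1-r).
print(expand(products[3] - products[0]))   # 2
```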

We turn next to the definition of the class number of an algebraic number field, which was given a passing mention in The Fundamental Theorem of Arithmetic and Unique Factorization. The class number “measures” in some way the failure of unique factorization, and if its value is equal to 1, then unique factorization holds (this also means that all ideals in the ring of algebraic integers of the algebraic number field are principal ideals).

To define the class number, we first have to introduce the concept of a fractional ideal. A fractional ideal is a module obtained by taking all linear combinations of a finite number of elements of an algebraic number field, with coefficients in its ring of algebraic integers. Note that these generating elements need not be algebraic integers themselves. For example, the set

\ldots, -\frac{3}{2}, -1, -\frac{1}{2}, 0, \frac{1}{2}, 1, \frac{3}{2}, \ldots

is obtained by taking the products of the rational number \frac{1}{2} with the ordinary integers. We write it as (\frac{1}{2}), and, in analogy with principal ideals, we refer to such fractional ideals which are “generated” by a single element as principal fractional ideals. It is a property of fractional ideals that one can multiply them by a suitable algebraic integer (an element of the ring of algebraic integers of the algebraic number field to which the fractional ideal belongs) and obtain an ordinary ideal of that ring of algebraic integers. For the example above, we can multiply each element by 2 and get back the ordinary integers.

The fractional ideals, including the principal fractional ideals, form a group (see Groups) under multiplication. The ideal class group is then the group obtained by taking the quotient (see Modular Arithmetic and Quotient Sets) of the group of fractional ideals by the group of principal fractional ideals. The ideal class group only has a finite number of elements (called ideal classes), and this number is called the class number.

There is another way to define the ideal classes. We will say that two ideals \mathfrak{a} and \mathfrak{b} are equivalent, written \mathfrak{a}\sim \mathfrak{b}, if there exist principal ideals (a) and (b) such that (a)\mathfrak{a}=(b)\mathfrak{b}. The ideals that are equivalent to each other then form an equivalence class, and these equivalence classes are the ideal classes. The set of ideal classes forms a group, which is the ideal class group.

The ring \mathbb{Z}[\sqrt{-5}], which does not possess unique factorization (of elements), has two ideal classes: the class of principal fractional ideals, and another class, which includes the ideal (2, 1+\sqrt{-5}). Hence its class number is 2.

In summary, the rings of algebraic integers of algebraic number fields do not always admit unique factorization into irreducible elements. The class number (which requires the concept of ideals to be properly defined) allows us to somehow “measure” the failure of unique factorization. However, despite the failure of unique factorization of elements, factorization into prime ideals is always unique.

This makes up the basics of the subject of algebraic number theory, which I find to be interestingly named – on one hand, it is “algebraic” number theory, which means that it uses concepts from abstract algebra to study numbers. On the other hand, it is “algebraic number” theory, which means that it is the study of algebraic numbers, which we have defined above as the numbers that are zeroes of polynomials with integer coefficients. Algebraic number theory is one of the oldest and most revered branches of mathematics, and has developed consistently and grown in beauty and elegance throughout history – including in modern times.

References:

Algebraic Number Theory on Wikipedia

Algebra by Michael Artin

A Classical Introduction to Modern Number Theory by Kenneth Ireland and Michael Rosen

Algebraic Number Theory by Jurgen Neukirch

Some Basics of Quantum Mechanics

In My Favorite Equation in Physics we discussed a little bit of classical mechanics, the prevailing view of physics from the time of Galileo Galilei up to the start of the 20th century. Keeping in mind the ideas we introduced in that post, we now move on to one of the most groundbreaking ideas in the history of physics since that time (along with Einstein’s theory of relativity, which we have also discussed a little bit of in From Pythagoras to Einstein), the theory of quantum mechanics (also known as quantum physics).

We recall one of the “guiding” questions of physics that we mentioned in My Favorite Equation in Physics:

“Given the state of a system at a particular time, in what state will it be at some other time?”

This emphasizes the importance of the concept of “states” in physics. We recall that the state of a system (for simplicity, we consider a system made up of only one object whose internal structure we ignore – it may be a stone, a wooden block, a planet – but we may refer to this object as a “particle”) in classical mechanics is given by its position and velocity (or alternatively its position and momentum).

A system consisting of a single particle, whose state is specified by its position and velocity, or its position and momentum, might just be the simplest system that we can study in classical mechanics. But in this post, discussing quantum mechanics, we will start with something even simpler.

Consider a light switch. It can be in a “state” of “on” or “off”. Or perhaps we might consider a coin. This coin can be in a “state” of “heads” or “tails”. We will consider a similar two-state system, for reasons of simplicity. In real life there do exist such two-state systems, and they are being studied, for example, in cutting-edge research on quantum computing. In the context of quantum mechanics, such systems are called “qubits”, which is short for “quantum bits”.

Now an ordinary light switch may only be in a state of “on” or “off”, and an ordinary coin may be in a state of “heads” or “tails”, but we cannot have a state that is some sort of “combination” of these states. It would be unthinkable in our daily life. But a quantum mechanical system which can be in any of two states may also be in some combination of these states! This is the idea at the very heart of quantum mechanics, and it is called the principle of quantum superposition. Its basic statement can be expressed as follows:

If a system can exist in any number of classical states, it can also exist in any linear combination of these states.

This means that the space of states of a quantum mechanical system forms a vector space. The concept of vector spaces, and the branch of mathematics that studies them, called linear algebra, can be found in Vector Spaces, Modules, and Linear Algebra. Linear algebra (and its infinite-dimensional variant called functional analysis) is the language of quantum mechanics. We have to mention that the field of “scalars” of this vector space is the field of complex numbers \mathbb{C}.
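
As a minimal sketch (an illustration using numpy, not any standard physics library), we can represent the two classical states as basis vectors of the two-dimensional complex vector space, and a superposition as a linear combination:

```python
import numpy as np

# The two classical states of the "quantum light switch" as basis vectors
# of the two-dimensional complex vector space C^2.
on = np.array([1, 0], dtype=complex)
off = np.array([0, 1], dtype=complex)

# The principle of quantum superposition: any linear combination, with
# complex coefficients, is also an allowed state.
psi = on + off
print(psi)   # [1.+0.j 1.+0.j]
```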

There is one more mathematical procedure that we have to apply to these states, called “normalization”, which we will learn about later on in this post. First we have to explain what it means if we have a state that is in a “linear combination” of other states.

We write our quantum state in the so-called “Dirac notation”. Consider the “quantum light switch” we described earlier (in real life, we would have something like an electron in “spin up” or “spin down” states, or perhaps a photon in “horizontally polarized” or “vertically polarized” states). We write the “on” state as

|\text{on}\rangle

and the “off” state as

|\text{off}\rangle

The principle of quantum superposition states that we may have a state such as

|\text{on}\rangle+|\text{off}\rangle

This state is made up of equal parts “on” and “off”. Quantum-mechanically, such a state may exist, but when we, classical beings that we are (in the sense that we are very big), interact with or make measurements of this system, we only ever find it in a state of “on” or “off”, and never in a state that is a linear combination of both. What then does it mean for it to be in a state that is a linear combination of both “on” and “off”, if we can never even find it in such a state?

If a system is in the state |\text{on}\rangle+|\text{off}\rangle before we make our measurement, then there are equal chances that we will find it in the “on” state or in the “off” state after measurement. We do not know beforehand whether we will get an “on” state or “off” state, which implies that there is a certain kind of “randomness” involved in quantum-mechanical systems.

It is at this point that we reflect on the nature of randomness. Let us consider a process we would ordinarily suppose to be “random”, for example the flipping of a coin, or the throwing of a die. We consider these processes random because we do not know all the factors at play, but if we had all the information, such as the force of the flip or the throw, the air resistance and its effects, and so on, and we made all the necessary calculations, at least “theoretically” we would be able to predict the result. Such a process is not really random; we only consider it random because we lack a certain knowledge which, if we only possessed it, we could use to determine the result with absolute certainty.

The “randomness” in quantum mechanics involves no such knowledge; we could know everything that is possible for us to know about the system, and yet, we could never predict with absolute certainty whether we would get an “on” or an “off” state when we make a measurement on the system. We might perhaps say that this randomness is “truly random”. All we can conclude, from our knowledge that the state of the system before measurement is |\text{on}\rangle+|\text{off}\rangle, is that there are equal chances of finding it in the “on” or “off” state after measurement.

If the state of the system before measurement is |\text{on}\rangle, then after measurement it will also be in the state |\text{on}\rangle. If we had some state like 1000|\text{on}\rangle+5|\text{off}\rangle before measurement, then there will be a much greater chance that it will be in the state |\text{on}\rangle after measurement, although there is still a small chance that it will be in the state |\text{off}\rangle.

We now introduce the concept of normalization. We have seen that the “coefficients” of the components of our “state vector” correspond to certain probabilities, although we have not been very explicit as to how these coefficients are related to the probabilities. We have a well-developed mathematical language to deal with probabilities. When an event is absolutely certain to occur, for instance, we say that the event has a probability of 100%, or that it has a probability of 1. We want to use this language in our study of quantum mechanics.

We discussed in Matrices the concept of a linear functional, which assigns a real or complex number (or more generally an element of the field of scalars) to any vector. For vectors expressed as column matrices, the linear functionals were expressed as row matrices. In Dirac notation, we also call our “state vectors”, such as |\text{on}\rangle and |\text{off}\rangle, “kets”, and we will have special linear functionals \langle \text{on}| and \langle \text{off}|, called “bras” (the words “bra” and “ket” come from the word “bracket”, with the fate of the letter “c” unknown; this notation was developed by the physicist Paul Dirac, who made great contributions to the development of quantum mechanics).

The linear functional \langle \text{on}| assigns to any “state vector” representing the state of the system before measurement a certain number which, when squared (or rather, “absolute squared” for complex numbers), gives the probability that it will be found in the state |\text{on}\rangle after measurement. We have said earlier that if the system is known to be in the state |\text{on}\rangle before the measurement, then after the measurement the system will also be in the state |\text{on}\rangle. In other words, given that the system is in the state |\text{on}\rangle before measurement, the probability of finding it in the state |\text{on}\rangle after measurement is equal to 1. We express this explicitly as

|\langle \text{on}|\text{on}\rangle|^{2}=1

From this observation, we make the requirement that

\langle \psi |\psi \rangle=1

for any state |\psi\rangle. This will lead to the requirement that if we have the state C_{1}|\text{on}\rangle+C_{2}|\text{off}\rangle, the coefficients C_{1} and C_{2} must satisfy the equation

|C_{1}|^{2}+|C_{2}|^{2}=1

or

C_{1}^{*}C_{1}+C_{2}^{*}C_{2}=1

The second expression is to remind the reader that these coefficients are complex. Since we express probabilities as real numbers, it is necessary that we use the “absolute square” of these coefficients, given by multiplying each coefficient by its complex conjugate.

So, in order to express the state where there are equal chances of finding the system in the state |\text{on}\rangle or in the state |\text{off}\rangle after measurement, we do not write it anymore as |\text{on}\rangle+|\text{off}\rangle, but instead as

\frac{1}{\sqrt{2}}|\text{on}\rangle+\frac{1}{\sqrt{2}}|\text{off}\rangle

The factors of \frac{1}{\sqrt{2}} are there to make our notation agree with the notation in use in the mathematical theory of probabilities, where an event which is certain has a probability of 1. They are called “normalizing factors”, and this process is what is known as normalization.
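
As a sketch of this procedure (continuing the numpy representation from the earlier sketch), we can normalize the lopsided state 1000|\text{on}\rangle+5|\text{off}\rangle mentioned earlier and read off the probabilities:

```python
import numpy as np

on = np.array([1, 0], dtype=complex)
off = np.array([0, 1], dtype=complex)

# The unnormalized state 1000|on> + 5|off> from earlier in the post.
psi = 1000 * on + 5 * off

# Normalize: divide by sqrt(|C1|^2 + |C2|^2) so that <psi|psi> = 1,
# matching the convention that a certain event has probability 1.
psi = psi / np.linalg.norm(psi)

# The probabilities are the absolute squares of the coefficients.
probabilities = np.abs(psi) ** 2
print(probabilities)        # approximately [0.999975, 0.000025]
print(probabilities.sum())  # 1.0
```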

We may ask, therefore, what is the probability of finding our system in the state |\text{on}\rangle after measurement, given that before the measurement it was in the state \frac{1}{\sqrt{2}}|\text{on}\rangle+\frac{1}{\sqrt{2}}|\text{off}\rangle. We already know the answer; since there are equal chances of finding it in the state |\text{on}\rangle or |\text{off}\rangle, then we should have a 50% probability of finding it in the state |\text{on}\rangle after measurement, or that this result has probability 0.5. Nevertheless, we show how we use Dirac notation and normalization to compute this probability:

|\langle \text{on}|(\frac{1}{\sqrt{2}}|\text{on}\rangle+\frac{1}{\sqrt{2}}|\text{off}\rangle)|^{2}

|\langle \text{on}|\frac{1}{\sqrt{2}}|\text{on}\rangle+\langle \text{on}|\frac{1}{\sqrt{2}}|\text{off}\rangle|^{2}

|\frac{1}{\sqrt{2}}\langle \text{on}|\text{on}\rangle+\frac{1}{\sqrt{2}}\langle \text{on}|\text{off}\rangle|^{2}

We know that \langle \text{on}|\text{on}\rangle=1, and that \langle \text{on}|\text{off}\rangle=0, which leads to

|\frac{1}{\sqrt{2}}|^{2}

\frac{1}{2}

as we expected. We have used the “linear” property of the linear functionals here, emphasizing once again how important the language of linear algebra is to describing quantum mechanics.
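
The same computation can be carried out numerically (again in the numpy representation from the sketches above; np.vdot conjugates its first argument, which is exactly what turning a ket into a bra requires):

```python
import numpy as np

on = np.array([1, 0], dtype=complex)
off = np.array([0, 1], dtype=complex)

# The normalized equal superposition (1/sqrt(2))|on> + (1/sqrt(2))|off>.
psi = (on + off) / np.sqrt(2)

# np.vdot(on, psi) computes <on|psi>: the bra acts as the conjugate transpose.
amplitude = np.vdot(on, psi)
probability = abs(amplitude) ** 2
print(probability)   # 0.4999999999999999, i.e. 1/2 up to rounding
```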

For now that’s about it for this post. We have glossed over many aspects of quantum mechanics in favor of introducing and emphasizing how linear algebra is used as the foundation for its language; and the reason why linear algebra is chosen is because it fits with the principle at the very heart of quantum mechanics, the principle of quantum superposition.

So much of the notorious “weirdness” of quantum mechanics comes from the principle of quantum superposition, and this “weirdness” has found many applications both in explaining why our world is the way it is, and also in improving our quality of life through technological inventions such as semiconductor electronics.

I’ll make an important clarification at this point: we do not really “measure” the state of the system. What we really measure are “observables” which tell us something about the state of the system. These observables are represented by linear transformations, but to understand them better we need the concept of eigenvectors and eigenvalues, which I have not yet discussed in this blog, and did not want to discuss too much in this particular post. In the future perhaps we will discuss it; for now the reader is directed to the references listed at the end of the post. What we have discussed here, the probability of finding the system in a certain state after measurement given that it is in some other state before measurement, is related to the phenomenon known as “collapse”.

Also, despite the fact that we have only tackled two-state (or qubit) systems in this post, it is not too difficult to generalize, at least conceptually, to systems with more states, or even systems with an infinite number of states. The case where the states are given by the position of a particle leads to the famous wave-particle duality. The reader is encouraged once again to read about it in the references below, and at the same time try to think about how one should generalize what we have discussed in here to that case. Such cases will hopefully be tackled in future posts.

(Side remark: I had originally intended to cover quite a bit of ground in at least the basics of quantum mechanics in this post; but before I noticed it had already become quite a hefty post. I have not even gotten to the Schrodinger equation. Well, hopefully I can make more posts on this subject in the future. There’s so much one can make a post about when it comes to quantum mechanics.)

References:

Quantum Mechanics on Wikipedia

Quantum Superposition on Wikipedia

Bra-Ket Notation on Wikipedia

Wave Function Collapse on Wikipedia

Parallel Universes #1 – Basic Copenhagen Quantum Mechanics at Passion for STEM

If You’re Losing Your Grip on Quantum Mechanics, You May Want to Read Up on This at quant-ph/math-ph

The Feynman Lectures on Physics by Richard P. Feynman

Introduction to Quantum Mechanics by David J. Griffiths

Modern Quantum Mechanics by Jun John Sakurai