Differential Forms

Differential forms are important concepts in differential geometry and mathematical physics. For example, they can be used to express Maxwell’s equations (see Some Basics of (Quantum) Electrodynamics) in a very elegant form. In this post, however, we will introduce these mathematical objects as generalizing certain aspects of integral calculus (see An Intuitive Introduction to Calculus), allowing us to perform integration over surfaces, volumes, or their higher-dimensional analogues.

We recall from An Intuitive Introduction to Calculus the statement of the fundamental theorem of calculus:

\displaystyle \int_{a}^{b}\frac{df}{dx}dx=f(b)-f(a).

Regarding the left-hand side of this equation, we usually say that we integrate over the interval from a to b; we may therefore also write it more suggestively as

\displaystyle \int_{[a,b]}\frac{df}{dx}dx=f(b)-f(a).

We note that a and b form the boundary of the interval [a,b]. We denote the boundary of some “shape” M by \partial M. Therefore, in this case, \partial [a,b]=\{a\}\cup\{b\}.

Next we are going to perform some manipulations on the notation. We will not thoroughly justify them in this post, but they are meant to be suggestive and to provide intuition for the discussion on differential forms. First we need the notion of orientation. We can imagine, for example, an “arrow” pointing from a to b; this would determine one orientation. Another would be determined by an “arrow” pointing from b to a. This is important because we need a notion of integration “from a to b” or “from b to a”, and the two are not the same. In fact,

\displaystyle \int_{a}^{b}\frac{df}{dx}dx=-\int_{b}^{a}\frac{df}{dx}dx

i.e. there is a change of sign if we “flip” the orientation. Although an interval such as [a,b] is one-dimensional, the notion of orientation continues to make sense in higher dimensions. If we have a surface, for example, we may consider going “clockwise” or “counterclockwise” around the surface. Alternatively we may consider an “arrow” indicating which “side” of the surface we are on. For three dimensions or higher it is harder to visualize, but we will be able to make this notion more concrete later on with differential forms.

Given the notion of orientation, let us now denote the boundary of the interval [a,b], taken with orientation, for instance, “from a to b”, by \{a\}^{-}\cup\{b\}^{+}.

Let us now write

\displaystyle \frac{df}{dx}dx=df

and then we can write the fundamental theorem of calculus as

\displaystyle \int_{[a,b]}df=f(b)-f(a).

Then we consider the idea of “integration over points”, by which we refer to simply evaluating the function at those points, with the orientation taken into account, such that we have

\displaystyle \int_{\{a\}^{-}\cup\{b\}^{+}}f=f(b)-f(a)

Recalling that \partial [a,b]=\{a\}^{-}\cup\{b\}^{+}, this now gives us the following expression for the fundamental theorem of calculus:

\displaystyle \int_{[a,b]}df=\int_{\{a\}^{-}\cup\{b\}^{+}}f

\displaystyle \int_{[a,b]}df=\int_{\partial [a,b]}f
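As a quick numerical sanity check of the one-dimensional statement, here is a minimal Python sketch. The function f(x) = sin(x), the endpoints, and the helper name `integrate` are our own illustrative choices, and the midpoint rule is only an approximation of the integral.

```python
import math

def integrate(g, a, b, steps=100_000):
    """Approximate the integral of g from a to b with the midpoint rule.

    If b < a the step is negative, so the result changes sign,
    mirroring a flip of orientation.
    """
    h = (b - a) / steps
    return h * sum(g(a + (k + 0.5) * h) for k in range(steps))

# Illustrative choice (an assumption): f(x) = sin(x), so df/dx = cos(x).
f = math.sin
df_dx = math.cos
a, b = 0.0, 2.0

bulk = integrate(df_dx, a, b)      # integral of df over [a, b]
boundary = f(b) - f(a)             # f evaluated on the oriented boundary
flipped = integrate(df_dx, b, a)   # same interval, opposite orientation
```

Up to the error of the approximation, `bulk` agrees with `boundary`, and `flipped` is the negative of `bulk`.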

Things may still be confusing to the reader at this point – for instance, that integral on the right hand side looks rather weird – we will hopefully make things more concrete shortly. For now, the rough idea that we want to keep in mind is the following:

The integral of a “differential” of some function over some shape is equal to the integral of the function over the boundary of the shape.

In one dimension, this is of course the fundamental theorem of calculus as we have stated it earlier. For two dimensions, there is a famous theorem called Green’s theorem. In three dimensions, there are two manifestations of this idea, known as Stokes’ theorem and the divergence theorem. The more “concrete” version of this statement, which we want to discuss in this post, is the following:

The integral of the exterior derivative of a differential form over a manifold with boundary is equal to the integral of the differential form over the boundary.

We now discuss what these differential forms are. Instead of the formal definitions, we will start with special cases, develop intuition with examples, and attempt to generalize. The more formal definitions will be left to the references. We will start with the so-called 1-forms, which are “linear combinations” of the “differentials” of the coordinates, with coefficients which are functions:

\displaystyle f_{1}dx+f_{2}dy+f_{3}dz

We can think of these “differentials” as merely symbols for now, or perhaps consider them analogous to “infinitesimal quantities” in calculus. In differential geometry, however, they are actually “dual” to vectors, mapping vectors to numbers in the same way that row matrices map column matrices to the numbers which serve as their scalars (see Matrices).

From now on, to generalize, instead of the coordinates x, y, and z we will use x^{1}, x^{2}, x^{3}, and so on. We will write exponents as (x^{1})^{2}, to hopefully avoid confusion.

From these 1-forms we can form 2-forms by taking the wedge product. In ordinary multivariable calculus, the following expression

\displaystyle dxdy

represents an “infinitesimal area”, and so for example the integral

\displaystyle \int_{0}^{1}\int_{0}^{1}dxdy

gives us the area of a square with vertices at (0,0), (1,0), (0,1), and (1,1). The wedge product expresses this same idea (in fact the wedge product dx\wedge dy is often also called the area form, mirroring the idea expressed by dxdy earlier), except that we want to include the concept of orientation that we discussed earlier. Therefore, in order to express this idea of orientation, we require the wedge product to satisfy the following property called antisymmetry:

\displaystyle dx^{1}\wedge dx^{2}=-dx^{2}\wedge dx^{1}

Note that antisymmetry implies the following relation:

\displaystyle dx^{i}\wedge dx^{i}=-dx^{i}\wedge dx^{i}

\displaystyle dx^{i}\wedge dx^{i}=0

In other words, the wedge product of such a differential form with itself is equal to zero.
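The bookkeeping implied by antisymmetry can be sketched in a few lines of Python. The function name `wedge_basis` and the representation of dx^{i} by the integer i are our own assumptions for illustration; this is not a full implementation of differential forms, only of the sign rule for wedge products of the basic differentials.

```python
def wedge_basis(indices):
    """Normalize dx^{i1} ∧ ... ∧ dx^{ik} to increasing index order.

    Returns (sign, sorted_indices). The sign is 0 when an index repeats
    (since dx^i ∧ dx^i = 0); otherwise it is +1 or -1 according to the
    number of adjacent swaps needed to sort the indices.
    """
    idx = list(indices)
    if len(set(idx)) != len(idx):
        return 0, ()
    sign = 1
    # Bubble sort: every adjacent swap uses antisymmetry once.
    for i in range(len(idx)):
        for j in range(len(idx) - 1 - i):
            if idx[j] > idx[j + 1]:
                idx[j], idx[j + 1] = idx[j + 1], idx[j]
                sign = -sign
    return sign, tuple(idx)
```

For instance, `wedge_basis((2, 1))` returns `(-1, (1, 2))`, which encodes dx^{2}\wedge dx^{1}=-dx^{1}\wedge dx^{2}, while `wedge_basis((1, 1))` returns a sign of 0.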

We can also form 3-forms, 4-forms, etc. using the wedge product. The collection of all these n-forms, for every n, is the algebra of differential forms. This means that we can add, subtract, and form wedge products of differential forms. Ordinary functions themselves form the 0-forms.

We can also take what is called the exterior derivative of differential forms. If, for example, we have a differential form \omega given by the following expression,

\displaystyle \omega=f dx^{a}

then the exterior derivative of \omega, written d\omega, is given by

\displaystyle d\omega=\sum_{i=1}^{n}\frac{\partial f}{\partial x^{i}}dx^{i}\wedge dx^{a}.
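As a small worked example (our own illustration, using only the formula above), take two coordinates x^{1}, x^{2} and the 1-form \omega=x^{1}dx^{2}, so that the coefficient function is x^{1} and a=2. Then

\displaystyle d\omega=\frac{\partial x^{1}}{\partial x^{1}}dx^{1}\wedge dx^{2}+\frac{\partial x^{1}}{\partial x^{2}}dx^{2}\wedge dx^{2}=dx^{1}\wedge dx^{2}

The second term vanishes for two reasons: the partial derivative is zero, and dx^{2}\wedge dx^{2}=0 by antisymmetry. The result is the area form discussed earlier.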

We note that the exterior derivative of an n-form is an (n+1)-form. We also note that the exterior derivative of an exterior derivative is always zero, i.e. d(d\omega)=0 for any differential form \omega. A differential form which is the exterior derivative of some other differential form is called exact. A differential form whose exterior derivative is zero is called closed. The statement d(d\omega)=0 can also be expressed as follows:

All exact forms are closed.

However, not all closed forms are exact. This is reminiscent of the discussion in Homology and Cohomology, and in fact the study of closed forms which are not exact leads to the theory of de Rham cohomology, which is a very important part of modern mathematics and mathematical physics.

Given the idea of the exterior derivative, the general form of the fundamental theorem of calculus is now given by the generalized Stokes’ theorem (sometimes simply called the Stokes’ theorem; historically however, as alluded to earlier, the original Stokes’ theorem only refers to a special case in three dimensions):

\displaystyle \int_{M}d\omega=\int_{\partial M}\omega

This is the idea we alluded to earlier, relating the integral of the exterior derivative of a differential form (which includes functions as 0-forms) over some “shape” to the integral of the differential form itself over the boundary of that “shape”.
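To see the theorem at work in two dimensions, here is a hedged numerical sketch in Python. The choice of the 1-form \omega=x\,dy, the unit disk as M, and the function names are our own assumptions for illustration. Since d\omega=dx\wedge dy, the left-hand side is just the area of the disk, while the right-hand side is an integral around the unit circle taken counterclockwise.

```python
import math

def boundary_integral(steps=10_000):
    """Integrate omega = x dy around the unit circle, counterclockwise."""
    h = 2.0 * math.pi / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        x = math.cos(t)        # x-coordinate on the circle
        dy_dt = math.cos(t)    # y = sin(t), so dy = cos(t) dt
        total += x * dy_dt * h
    return total

def bulk_integral(steps=1_000):
    """Integrate d(omega) = dx ∧ dy over the unit disk in polar coordinates.

    The integrand r dr dtheta does not depend on theta, so the angular
    integral contributes a factor of 2*pi.
    """
    h = 1.0 / steps
    return 2.0 * math.pi * sum((k + 0.5) * h * h for k in range(steps))
```

Both functions return values close to \pi, the area of the unit disk.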

There is much more to the theory of differential forms than we have discussed here. For example, although we have referred to these “shapes” as manifolds with boundary, more generally they are “chains” (see also Homology and Cohomology; the similarities are not coincidental!). There are restrictions on these chains in order for the integral to give a function; for example, an n-form must be integrated over an n-dimensional chain (or simply n-chain) to give a function, otherwise it will give some other differential form. An m-form integrated over an n-chain gives an (m-n)-form. Also, more rigorously, the concept of integration on more complicated spaces involves the notion of “pullback”. We will leave these concepts to the references for now, contenting ourselves with the discussion of the wedge product and exterior derivative in this post. The application of differential forms to physics is discussed in the very readable book Gauge Fields, Knots and Gravity by John Baez and Javier P. Muniain.

References:

Differential Forms on Wikipedia

Green’s Theorem on Wikipedia

Divergence Theorem on Wikipedia

Stokes’ Theorem on Wikipedia

De Rham Cohomology on Wikipedia

Calculus on Manifolds by Michael Spivak

Gauge Fields, Knots and Gravity by John Baez and Javier P. Muniain

Geometry, Topology, and Physics by Mikio Nakahara


Some Basics of Fourier Analysis

Why do we study sine and cosine waves so much? Most waves, like most water waves and most sound waves, do not resemble sine and cosine waves at all (we will henceforth refer to sine and cosine waves as sinusoidal waves).

Well, it turns out that while most waves are not sinusoidal waves, all of them are actually combinations of sinusoidal waves of different sizes and frequencies. Hence we can understand much about essentially any wave simply by studying sinusoidal waves. This idea that any wave is a combination of multiple sinusoidal waves is part of the branch of mathematics called Fourier analysis.

Here’s a suggestion for an experiment from the book Vibrations and Waves by A.P. French: If you speak into the strings of a piano (I believe one of the pedals has to be held down first) the strings will vibrate, and since each string corresponds to a sine wave of a certain frequency, it will give you the breakdown of the sine wave components that make up your voice. If a string vibrates more strongly than others it means there’s a bigger part of that in your voice, i.e. that sine wave component has a bigger amplitude.

More technically, we can express these concepts in the following manner. Let f(x) be a function that is integrable over some interval from x_{0} to x_{0}+P (for a wave, we can take P to be the “period” over which the wave repeats itself). Then over this interval the function can be expressed as the sum of sine and cosine waves of different sizes and frequencies, as follows:

\displaystyle f(x)=\frac{a_{0}}{2}+\sum_{n=1}^{\infty}\bigg(a_{n}\text{cos}\bigg(\frac{2\pi nx}{P}\bigg)+b_{n}\text{sin}\bigg(\frac{2\pi nx}{P}\bigg)\bigg)

This expression is called the Fourier series expansion of the function f(x). The coefficient \frac{a_{0}}{2} is the “level” around which the waves oscillate; the other coefficients a_{n} and b_{n} refer to the amplitude, or the “size” of the respective waves whose frequencies are equal to n. Of course, the bigger the frequency, the “faster” these waves oscillate.

Now given a function f(x) that satisfies the condition given earlier, how do we know what sine and cosine waves make it up? For this we must know what the coefficients a_{n} and b_{n} are.

In order to solve for a_{n} and b_{n}, we will make use of the property of the sine and cosine functions called orthogonality (the rest of the post will make heavy use of the language of calculus, therefore the reader might want to look at An Intuitive Introduction to Calculus):

\displaystyle \frac{1}{\pi}\int_{x_{0}}^{x_{0}+2\pi}\text{cos}(mx)\text{cos}(nx)dx=0    if m\neq n

\displaystyle \frac{1}{\pi}\int_{x_{0}}^{x_{0}+2\pi}\text{cos}(mx)\text{cos}(nx)dx=1    if m=n\neq 0

\displaystyle \frac{1}{\pi}\int_{x_{0}}^{x_{0}+2\pi}\text{sin}(mx)\text{sin}(nx)dx=0    if m\neq n

\displaystyle \frac{1}{\pi}\int_{x_{0}}^{x_{0}+2\pi}\text{sin}(mx)\text{sin}(nx)dx=1    if m=n\neq 0

\displaystyle \frac{1}{\pi}\int_{x_{0}}^{x_{0}+2\pi}\text{cos}(mx)\text{sin}(nx)dx=0    for all m,n

What this means is that when a sine or cosine function is not properly “paired” then its integral over an interval equal to its period will always be zero. It will only give a nonzero value if it is properly paired, and we can “rescale” this value to make it equal to 1.
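The orthogonality relations can be checked numerically. The following Python sketch uses a midpoint-rule approximation; the helper names `inner`, `cos_n`, and `sin_n` are our own and are only meant for illustration, with m and n taken to be positive integers.

```python
import math

def inner(u, v, x0=0.0, steps=20_000):
    """Approximate (1/pi) times the integral of u(x) v(x) over [x0, x0 + 2*pi]."""
    h = 2.0 * math.pi / steps
    total = 0.0
    for k in range(steps):
        x = x0 + (k + 0.5) * h
        total += u(x) * v(x)
    return total * h / math.pi

def cos_n(n):
    """Return the function x -> cos(n x)."""
    return lambda x: math.cos(n * x)

def sin_n(n):
    """Return the function x -> sin(n x)."""
    return lambda x: math.sin(n * x)
```

For example, `inner(cos_n(2), cos_n(3))` is close to 0, while `inner(cos_n(2), cos_n(2))` is close to 1, and the starting point `x0` does not affect the result.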

Now we can look at the following expression:

\displaystyle \frac{2}{P}\int_{x_{0}}^{x_{0}+P}f(x)\text{cos}(\frac{2\pi x}{P})dx

Knowing that the function f(x) has a Fourier series expansion as above, we now have

\displaystyle \frac{2}{P}\int_{x_{0}}^{x_{0}+P}f(x)\text{cos}(\frac{2\pi x}{P})dx=\frac{2}{P}\int_{x_{0}}^{x_{0}+P}\bigg(\frac{a_{0}}{2}+\sum_{n=1}^{\infty}\bigg(a_{n}\text{cos}\bigg(\frac{2\pi nx}{P}\bigg)+b_{n}\text{sin}\bigg(\frac{2\pi nx}{P}\bigg)\bigg)\bigg)\text{cos}(\frac{2\pi x}{P})dx.

But we know that integrals involving the cosine function will always be zero unless it is properly paired; therefore it will be zero for all terms of the infinite series except for one, in which case it will yield (the constants are all there to properly scale the result)

\displaystyle \frac{2}{P}\int_{x_{0}}^{x_{0}+P}f(x)\text{cos}(\frac{2\pi x}{P})dx=\frac{2}{P}\int_{x_{0}}^{x_{0}+P}a_{1}\text{cos}\bigg(\frac{2\pi x}{P}\bigg)\text{cos}(\frac{2\pi x}{P})dx

\displaystyle \frac{2}{P}\int_{x_{0}}^{x_{0}+P}f(x)\text{cos}(\frac{2\pi x}{P})dx=a_{1}.

We have therefore used the orthogonality property of the cosine function to “filter” a single frequency component out of the many that make up our function.

Next we might use \text{cos}(\frac{4\pi x}{P}) instead of \text{cos}(\frac{2\pi x}{P}). This will give us

\displaystyle \frac{2}{P}\int_{x_{0}}^{x_{0}+P}f(x)\text{cos}(\frac{4\pi x}{P})dx=a_{2}.

We can continue the procedure to solve for the coefficients a_{3}, a_{4}, and so on, and we can replace the cosine function by the sine function to solve for the coefficients b_{1}, b_{2}, and so on. Of course, the coefficient a_{0} can also be obtained by using \text{cos}(0)=1.

In summary, we can solve for the coefficients using the following formulas:

\displaystyle a_{n}=\frac{2}{P}\int_{x_{0}}^{x_{0}+P}f(x)\text{cos}(\frac{2\pi nx}{P})dx

\displaystyle b_{n}=\frac{2}{P}\int_{x_{0}}^{x_{0}+P}f(x)\text{sin}(\frac{2\pi nx}{P})dx
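These formulas translate directly into a numerical recipe. Below is a minimal Python sketch that approximates a_{n} and b_{n} with the midpoint rule. The function names and the square wave used as a test signal are our own assumptions for illustration. For this particular square wave (equal to 1 on the first half of each period and -1 on the second half, with P=2\pi), a direct calculation gives a_{n}=0 for all n, and b_{n}=\frac{4}{n\pi} for odd n and 0 for even n.

```python
import math

def fourier_coefficients(f, n, P=2.0 * math.pi, x0=0.0, steps=100_000):
    """Approximate (a_n, b_n) for a function f on [x0, x0 + P]."""
    h = P / steps
    a_sum = 0.0
    b_sum = 0.0
    for k in range(steps):
        x = x0 + (k + 0.5) * h
        a_sum += f(x) * math.cos(2.0 * math.pi * n * x / P)
        b_sum += f(x) * math.sin(2.0 * math.pi * n * x / P)
    return (2.0 / P) * a_sum * h, (2.0 / P) * b_sum * h

def square_wave(x):
    """Illustrative test signal: +1 on [0, pi), -1 on [pi, 2*pi), repeated."""
    return 1.0 if (x % (2.0 * math.pi)) < math.pi else -1.0

a1, b1 = fourier_coefficients(square_wave, 1)
a2, b2 = fourier_coefficients(square_wave, 2)
a3, b3 = fourier_coefficients(square_wave, 3)
```

Here `b1` and `b3` come out close to 4/\pi and 4/(3\pi) respectively, while the remaining values are close to zero, which is the “filtering” of individual frequency components described above.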

Now that we have shown how a function can be “broken down” or “decomposed” into a (possibly infinite) sum of sine and cosine waves of different amplitudes and frequencies, we now revisit the relationship between the sine and cosine functions and the exponential function (see “The Most Important Function in Mathematics”) in order to give us yet another expression for the Fourier series. We recall that, combining the concepts of the exponential function and complex numbers we have the beautiful and important equation

\displaystyle e^{ix}=\text{cos}(x)+i\text{sin}(x)

which can also be expressed in the following forms:

\displaystyle \text{cos}(x)=\frac{e^{ix}+e^{-ix}}{2}

\displaystyle \text{sin}(x)=\frac{e^{ix}-e^{-ix}}{2i}.

Using these expressions, we can rewrite the Fourier series of a function in a more “shorthand” form:

\displaystyle f(x)=\sum_{n=-\infty}^{\infty}c_{n}e^{\frac{2\pi i nx}{P}}

where

\displaystyle c_{n}=\frac{1}{P}\int_{x_{0}}^{x_{0}+P}f(x)e^{-\frac{2\pi i nx}{P}}dx.
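To connect the two sets of coefficients, we can substitute the exponential expressions for the cosine and sine into a single term of the series, writing \theta=\frac{2\pi nx}{P} for brevity (a shorthand we introduce only for this step):

\displaystyle a_{n}\text{cos}(\theta)+b_{n}\text{sin}(\theta)=\frac{a_{n}-ib_{n}}{2}e^{i\theta}+\frac{a_{n}+ib_{n}}{2}e^{-i\theta}

Comparing with the exponential form of the series, this gives, for n\geq 1,

\displaystyle c_{n}=\frac{a_{n}-ib_{n}}{2},\quad c_{-n}=\frac{a_{n}+ib_{n}}{2},\quad c_{0}=\frac{a_{0}}{2}.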

Finally, we discuss more concepts related to the process we used in solving for the coefficients a_{n}, b_{n}, and c_{n}. As we have already discussed, these coefficients express “how much” of the waves with frequency equal to n are in the function f(x). We can now abstract this idea to define the Fourier transform \hat{f}(k) of a function f(x) as follows:

\displaystyle \hat{f}(k)=\int_{-\infty}^{\infty}f(x)e^{-2\pi i kx}dx

There are of course versions of the Fourier transform that use the sine and cosine functions instead of the exponential function, but the form written above is more common in the literature. Roughly, the Fourier transform \hat{f}(k) also expresses “how much” of the waves with frequency equal to k are in the function f(x). The difference lies in the interval over which we are integrating; however, we may consider the formula for obtaining the coefficients of the Fourier series as taking the Fourier transform of a single cycle of a periodic function, with its value set to 0 outside of the interval occupied by the cycle, and with variables appropriately rescaled.

The Fourier transform has an “inverse”, which allows us to recover f(x) from \hat{f}(k):

\displaystyle f(x)=\int_{-\infty}^{\infty}\hat{f}(k)e^{2\pi i kx}dk.

Fourier analysis, aside from being an interesting subject in itself, has many applications not only in other branches of mathematics but also in the natural sciences and in engineering. For example, in physics, the Heisenberg uncertainty principle of quantum mechanics (see More Quantum Mechanics: Wavefunctions and Operators) comes from the result in Fourier analysis that the more a function is “localized” around a small area, the more its Fourier transform will be spread out over all of space, and vice-versa. Since the probability amplitudes for the position and the momentum are related to each other as the Fourier transform and inverse Fourier transform of each other (a result of the de Broglie relations), this manifests in the famous principle that the more we know about the position, the less we know about the momentum, and vice-versa.

Fourier analysis can even be used to explain the distinctive “distorted” sound of electric guitars in rock and heavy metal music. Usually, plucking a guitar string produces a sound wave which is sinusoidal. For electric guitars, the sound is amplified using transistors; however, there is a limit to how much amplification can be done, and at a certain point (technically, this is when the transistor is operating outside of the “linear region”), the sound wave looks like a sine function with its peaks and troughs “clipped”. In Fourier analysis this corresponds to an addition of higher-frequency components, and this results in the distinctive sound of that genre of music.

Yet another application of Fourier analysis, and in fact its original application, is the study of differential equations. The mathematician Joseph Fourier, after whom Fourier analysis is named, developed the techniques we have discussed in this post in order to study the differential equation expressing the flow of heat in a material. It so happens that difficult calculations, for example differentiation, involving a function correspond to easier ones, such as simple multiplication, involving its Fourier transform. Therefore it is a common technique to convert a difficult problem to a simple one using the Fourier transform, and after the problem has been solved, we use the inverse Fourier transform to get the solution to the original problem.
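The correspondence between differentiation and multiplication can be made explicit with the convention for the Fourier transform used in this post. Assuming that f is differentiable and that f(x) goes to zero as x goes to \pm\infty (assumptions we add here so that the boundary term vanishes), integration by parts gives

\displaystyle \int_{-\infty}^{\infty}\frac{df}{dx}e^{-2\pi i kx}dx=2\pi i k\int_{-\infty}^{\infty}f(x)e^{-2\pi i kx}dx=2\pi i k\hat{f}(k)

so that taking the derivative of f(x) corresponds to multiplying \hat{f}(k) by 2\pi i k.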

Despite the crude simplifications we have assumed in order to discuss Fourier analysis in this post, the reader should know that it remains a deep and interesting subject in modern mathematics. A more general and more advanced form of the subject is called harmonic analysis, and it is one of the areas where there is much research, both on its own, and in connection to other subjects.

References:

Fourier Analysis on Wikipedia

Fourier Series on Wikipedia

Fourier Transform on Wikipedia

Harmonic Analysis on Wikipedia

Vibrations and Waves by A.P. French

Fourier Analysis: An Introduction by Elias M. Stein and Rami Shakarchi

Eigenvalues and Eigenvectors

Given a vector (see Vector Spaces, Modules, and Linear Algebra), we have seen that one of the things we can do to it is to “scale” it (in fact, it is one of the defining properties of a vector). We can also use a matrix (see Matrices) to scale vectors. Consider, for example, the matrix

\displaystyle \left(\begin{array}{cc}2&0\\ 0&2\end{array}\right).

Applying this matrix to any vector “doubles” the magnitude of the vector:

\displaystyle \left(\begin{array}{cc}2&0\\ 0&2\end{array}\right)\left(\begin{array}{c}1\\ 0\end{array}\right)=\left(\begin{array}{c}2\\ 0\end{array}\right)=2\left(\begin{array}{c}1\\ 0\end{array}\right)

\displaystyle \left(\begin{array}{cc}2&0\\ 0&2\end{array}\right)\left(\begin{array}{c}0\\ 5\end{array}\right)=\left(\begin{array}{c}0\\ 10\end{array}\right)=2\left(\begin{array}{c}0\\ 5\end{array}\right)

\displaystyle \left(\begin{array}{cc}2&0\\ 0&2\end{array}\right)\left(\begin{array}{c}-2\\ 3\end{array}\right)=\left(\begin{array}{c}-4\\ 6\end{array}\right)=2\left(\begin{array}{c}-2\\ 3\end{array}\right)

This is applicable to any vector except, of course, the zero vector, which cannot be scaled and is therefore excluded in our discussion in this post.

The interesting case, however, is when the matrix “scales” only a few special vectors. Consider for example, the matrix

\displaystyle \left(\begin{array}{cc}2&1\\ 1&2\end{array}\right).

Applying it to the vector

\displaystyle \left(\begin{array}{c}1\\ 0\end{array}\right)

gives us

\displaystyle \left(\begin{array}{cc}2&1\\ 1&2\end{array}\right) \left(\begin{array}{c}1\\ 0\end{array}\right)=\left(\begin{array}{c}2\\ 1\end{array}\right).

This is, of course, not an example of “scaling”. However, for the vector

\displaystyle \left(\begin{array}{c}1\\ 1\end{array}\right)

we get

\displaystyle \left(\begin{array}{cc}2&1\\ 1&2\end{array}\right) \left(\begin{array}{c}1\\ 1\end{array}\right)=\left(\begin{array}{c}3\\ 3\end{array}\right).

This is a scaling, since

\left(\begin{array}{c}3\\ 3\end{array}\right)=3\left(\begin{array}{c}1\\ 1\end{array}\right).

The same holds true for the vector

\displaystyle \left(\begin{array}{c}-1\\ 1\end{array}\right)

from which we obtain

\displaystyle \left(\begin{array}{cc}2&1\\ 1&2\end{array}\right) \left(\begin{array}{c}-1\\ 1\end{array}\right)=\left(\begin{array}{c}-1\\ 1\end{array}\right)

which is also a “scaling” by a factor of 1. Finally, this also holds true for scalar multiples of the two vectors we have enumerated. These vectors, the only “special” ones that are scaled by our linear transformation (represented by our matrix), are called the eigenvectors of the linear transformation, and the factors by which they are scaled are called the corresponding eigenvalues.
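The computations above are easy to reproduce. The following Python sketch, with helper names of our own choosing, multiplies a matrix by a vector and reports the scaling factor when the result is a multiple of the original vector.

```python
def mat_vec(A, v):
    """Multiply the matrix A (a list of rows) by the vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def scaling_factor(A, v, tol=1e-12):
    """Return the factor lam with A v = lam v, or None if v is not scaled.

    The vector v is assumed to be nonzero, as in the discussion above.
    """
    w = mat_vec(A, v)
    # Use the first nonzero component of v to get a candidate factor.
    i = next(k for k, x in enumerate(v) if x != 0)
    lam = w[i] / v[i]
    if all(abs(wk - lam * vk) < tol for wk, vk in zip(w, v)):
        return lam
    return None

A = [[2, 1], [1, 2]]
```

With this matrix, `scaling_factor(A, [1, 1])` gives 3.0, `scaling_factor(A, [-1, 1])` gives 1.0, and `scaling_factor(A, [1, 0])` gives `None`, matching the three cases worked out above.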

So far we have focused on finite-dimensional vector spaces, which give us a lot of convenience; for instance, we can express finite-dimensional vectors as column matrices. But there are also infinite-dimensional vector spaces; recall that the conditions for a set to be a vector space are that its elements can be added or subtracted, and scaled. An example of an infinite-dimensional vector space is the set of all continuous real-valued functions of the real numbers (with the real numbers serving as the field of scalars).

Given two continuous real-valued functions of the real numbers f and g, the functions f+g and f-g are also continuous real-valued functions of the real numbers, and the same is true for af, for any real number a. Thus we can see that the set of continuous real-valued functions of the real numbers forms a vector space.

Matrices are not usually used to express linear transformations when it comes to infinite-dimensional vector spaces, but we still retain the concept of eigenvalues and eigenvectors. Note that a linear transformation is a function f from a vector space to another (possibly itself) which satisfies the conditions f(u+v)=f(u)+f(v) and f(av)=af(v).

Since our vector spaces in the infinite-dimensional case may be composed of functions, we may think of linear transformations as “functions from functions to functions” that satisfy the conditions earlier stated.

Consider the “operation” of taking the derivative (see An Intuitive Introduction to Calculus). The rules of calculus concerning derivatives (which can be derived from the basic definition of the derivative) state that we must have

\displaystyle \frac{d(f+g)}{dx}=\frac{df}{dx}+\frac{dg}{dx}

and

\displaystyle \frac{d(af)}{dx}=a\frac{df}{dx}

where a is a constant. This holds true for “higher-order” derivatives as well. This means that the “derivative operator” \frac{d}{dx} is an example of a linear transformation from an infinite-dimensional vector space to another (note that the functions that comprise our vector space must be “differentiable”, and that the derivatives of our functions must possess the same defining properties we required for our vector space).

We now show an example of eigenvalues and eigenvectors in the context of infinite-dimensional vector spaces. Let our linear transformation be

\displaystyle \frac{d^{2}}{dx^{2}}

which stands for the “operation” of taking the second derivative with respect to x. We state again some of the rules of calculus pertaining to the derivatives of trigonometric functions (once again, they can be derived from the basic definitions, which is a fruitful exercise, or they can be looked up in tables):

\displaystyle \frac{d(\text{sin}(x))}{dx}=\text{cos}(x)

\displaystyle \frac{d(\text{cos}(x))}{dx}=-\text{sin}(x)

which means that

\displaystyle \frac{d^{2}(\text{sin}(x))}{dx^{2}}=\frac{d(\frac{d(\text{sin}(x))}{dx})}{dx}

\displaystyle \frac{d^{2}(\text{sin}(x))}{dx^{2}}=\frac{d(\text{cos}(x))}{dx}

\displaystyle \frac{d^{2}(\text{sin}(x))}{dx^{2}}=-\text{sin}(x)

We can see now that the function \text{sin}(x) is an eigenvector of the linear transformation \frac{d^{2}}{dx^{2}}, with eigenvalue equal to -1.
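A rough numerical check of this statement is sketched below in Python. It approximates the second derivative with a central finite difference, which is our own choice of method, and the function names are hypothetical. If a function is an eigenvector of \frac{d^{2}}{dx^{2}}, the ratio of its second derivative to its value should be the same constant at every point where the function is nonzero.

```python
import math

def second_derivative(f, x, h=1e-4):
    """Central finite-difference approximation of the second derivative of f at x."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

def eigenvalue_estimate(f, x):
    """Ratio f''(x) / f(x); meaningful only where f(x) is nonzero."""
    return second_derivative(f, x) / f(x)
```

Evaluating `eigenvalue_estimate(math.sin, x)` at several points gives values close to -1. Trying it on `lambda x: math.sin(2 * x)` gives values close to -4, and on `math.exp` values close to 1, which suggests that these functions are eigenvectors as well, with different eigenvalues.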

Eigenvalues and eigenvectors play many important roles in linear algebra (and its infinite-dimensional version, which is called functional analysis). We will mention here something we have left off of our discussion in Some Basics of Quantum Mechanics. In quantum mechanics, “observables”, like the position, momentum, or energy of a system, correspond to certain kinds of linear transformations whose eigenvalues are real numbers (note that our field of scalars in quantum mechanics is the field of complex numbers \mathbb{C}). These eigenvalues correspond to the only values that we can obtain after measurement; we cannot measure values that are not eigenvalues.

References:

Eigenvalues and Eigenvectors on Wikipedia

Observable on Wikipedia

Linear Algebra Done Right by Sheldon Axler

Algebra by Michael Artin

Calculus by James Stewart

Introductory Functional Analysis with Applications by Erwin Kreyszig

Introduction to Quantum Mechanics by David J. Griffiths

“The Most Important Function in Mathematics”

The title is in quotation marks because it comes from the book Real and Complex Analysis by Walter Rudin. One should always be cautious around superlative statements of course, not just in mathematics but also in life, but in this case I think the wording is not without good reason. Rudin is referring to the function

\displaystyle e^{x}

which may be thought of as the constant

\displaystyle e=2.71828182845904523536...

raised to the power of the argument x. However, there is an even better definition, which is the one Rudin gives in the book:

\displaystyle e^{x}=1+x+\frac{x^{2}}{2}+\frac{x^{3}}{6}+\frac{x^{4}}{24}+\frac{x^{5}}{120}+...

written in more compact notation, this is

\displaystyle e^{x}=\sum_{n=0}^{\infty}\frac{x^{n}}{n!}

It can be shown that these two notions coincide. To emphasize its role as a function, e^{x} is also often written as \text{exp}(x). It is also known as the exponential function.
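The series definition is also a practical way to compute the function. The following Python sketch sums the first few terms; the function name `exp_series` and the default number of terms are our own choices, and the result is compared against the standard library's `math.exp` only as a sanity check.

```python
import math

def exp_series(x, terms=30):
    """Partial sum of the series for e^x: the sum of x^n / n! for n < terms."""
    total = 0.0
    term = 1.0                 # x^0 / 0!
    for n in range(terms):
        total += term
        term *= x / (n + 1)    # turns x^n / n! into x^(n+1) / (n+1)!
    return total
```

For moderate values of x, thirty terms already agree with `math.exp(x)` to many decimal places; larger values of x need more terms.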

We now explore some properties of e^{x}. Let us start off with the case where x is a real number. The function e^{x} can be used to express any other function of the form

a^{x}

where a is some positive real constant not necessarily equal to e. By letting

y=x\text{ ln }a

where \text{ln }a is the logarithm of a to base e (also known as the natural logarithm of a), we will then have

a^{x}=e^{y}

In other words, any function of the form a^{x} where a is some positive real constant can be expressed in the form e^{y} simply by “rescaling” the argument by a constant.

For x a real number, the function e^{x}, which, as we have seen, encompasses the other cases a^{x} where a is any positive real constant, is often used to express “growth” and “decay”. For instance, if we have a population A which doubles every year, then after x years we will have a population of 2^{x}A, which, using what we have discussed earlier, can also be expressed as Ae^{x\text{ ln }2}. If the population gets cut in half instead of doubling every year, we would then write it as Ae^{-x\text{ ln }2}.
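The rescaling can be sketched in Python as follows. The function names are hypothetical, and we assume that a is positive so that its natural logarithm is a real number.

```python
import math

def power_via_exp(a, x):
    """Compute a^x as e^(x ln a), assuming a > 0."""
    if a <= 0:
        raise ValueError("a must be positive for ln(a) to be real")
    return math.exp(x * math.log(a))

def population(initial, years, factor_per_year=2.0):
    """Population after the given number of years, multiplied by factor_per_year each year."""
    return initial * power_via_exp(factor_per_year, years)
```

Here `population(100, 3)` is close to 800 (three doublings), and `population(100, 3, factor_per_year=0.5)` is close to 12.5 (three halvings).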

But the truly amazing stuff happens when the argument of the exponential function is a complex number. Let us start with the case where it is purely imaginary. In this case we have the following very important equation known as Euler’s formula:

e^{ix}=\text{cos }x+i\text{ sin }x

which for x=\pi gives the result

e^{i\pi}=-1

e^{i\pi}+1=0

The second equation, known as Euler’s identity, is often referred to by many as the most beautiful equation in mathematics, as it displays five of the most important mathematical constants in one equation: e, \pi, i, 0, and 1.

For more general complex numbers with nonzero real and imaginary parts, we can use the rule for exponents

e^{x+iy}=e^{x}e^{iy}=e^{x}(\text{cos }y+i\text{ sin }y)

and treat them separately. The sine and cosine functions, aside from their original significance in trigonometry, are also used to represent oscillatory or periodic behavior. They are therefore useful in analyzing waves, which are a very important subject in both science and engineering. Equations such as the one above, combining growth and decay and oscillations, are used, for example, in designing shock absorbers in vehicles, which consist of a spring (which “fights back” the movement and oscillates) and a damper (which makes the movement “decay”).
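Euler's formula and the rule above can be checked numerically with Python's standard `cmath` module. The helper name `exp_by_parts` is our own; it evaluates the right-hand side of the equation above so that it can be compared with the library's complex exponential.

```python
import cmath
import math

def exp_by_parts(z):
    """Evaluate e^(x+iy) as e^x (cos y + i sin y) for a complex number z = x + iy."""
    return math.exp(z.real) * complex(math.cos(z.imag), math.sin(z.imag))

z = complex(0.5, 2.0)                 # an arbitrary illustrative value
direct = cmath.exp(z)                 # library complex exponential
by_parts = exp_by_parts(z)            # growth factor times oscillation
euler_identity = cmath.exp(1j * math.pi) + 1
```

The values `direct` and `by_parts` agree up to rounding, and `euler_identity` is zero up to rounding error.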

There are certain technicalities regarding “multi-valuedness” that one must be wary of when dealing with complex arguments, but we will not discuss them for the time being (references are provided at the end of the post). Instead we will discuss a couple more properties of the exponential function.

First, we have the following expression for the exponential function as a product:

\displaystyle e^{x}\approx \bigg(1+\frac{x}{n}\bigg)^{n} where n is very big (which also means that \frac{x}{n} is very small)

Using the language of limits in calculus, we can actually write

\displaystyle e^{x}=\lim_{n\to\infty} \bigg(1+\frac{x}{n}\bigg)^{n}

Historically, this is the motivation for the development of the exponential function, and the constant e. Suppose that somewhere there is this greedy loan shark who loans someone an amount of one million dollars, at a rate of 100% interest per year. So our loan shark finds out that he can make more money by “compounding” the interest at the middle of the year, computing for 50% interest and adding it to the money owed to him, and then computing 50% of that amount again at the end of the year (reasoning that “technically” it is still 100% interest per year). So instead of one million dollars, he would be owed 1.5 million dollars by the second half of the year, and by the end of the year he would be owed 1.5 million dollars plus half of that, which is 0.75 million dollars, which makes for a total of 2.25 million dollars, much bigger than the 2 million he would have been owed without “compounding.”

So the greedy loan shark computes further and discovers that he can make even more money by compounding further; perhaps he can compound every quarter, computing for 25% interest after the first three months, adding it to the amount owed to him, then after another three months again computing for 25% interest, and so on. He could even compound a hundred times in the year, with 1% added every time he compounds the interest. So in his infinite greediness, let’s say our loan shark compounds an infinite number of times, in infinitely small amounts. What would be the amount owed to him at the end of the year?

It turns out, no matter how many times he compounds the interest, the money owed to him will never be greater than $2,718,281.83, or roughly “e” million dollars, although it will approach that amount if he compounds it enough times. This quantity can be computed using the techniques of calculus, and in equation form it is exactly the expression for e^{x} as a product that we have written above.
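To make the loan shark's arithmetic concrete, here is a minimal Python sketch. The function name `amount_owed` and the particular compounding counts are our own choices for illustration; the only thing taken from the discussion above is the rule of adding 100%/n interest n times over the year.

```python
import math

def amount_owed(principal, periods):
    """Amount owed after one year at 100% annual interest,
    compounded `periods` times (hypothetical helper for illustration)."""
    return principal * (1 + 1 / periods) ** periods

principal = 1_000_000

# Compound once, twice, quarterly, a hundred times, and so on.
for periods in (1, 2, 4, 100, 10_000, 1_000_000):
    print(f"{periods:>9} compoundings: {amount_owed(principal, periods):,.2f}")

# The amount never exceeds e million dollars, but it gets closer and closer.
print(f"e million dollars:      {principal * math.e:,.2f}")
```

The amounts keep growing as the number of compoundings increases, but the growth slows down and the values stay below e million dollars.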

Finally, we give an important property of e^{x} once again related to calculus, in particular to differential equations. We have discussed the notion of a derivative in An Intuitive Introduction to Calculus. An important property of the exponential function e^{x} is that its derivative is equal to itself. In other words, if f(x)=e^{x}, then

\frac{df}{dx}=f(x)

This property plays a very important part in the study of differential equations. As we have seen in My Favorite Equation in Physics, differential equations permeate even the most basic aspects of science and engineering. The special property of the exponential function related to differential equations means that it appears in many laws of physics (as well as other “laws” unrelated to physics), and therefore its study is important to understanding these subjects as well.
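We can check this property numerically with a rough sketch. The helper `difference_quotient` and the step size `eps` below are assumptions made for illustration; the idea is simply to compare the ratio of a small change in e^{x} to a small change in x against the value of e^{x} itself.

```python
import math

def difference_quotient(f, x, eps=1e-6):
    """Ratio of a small change in f to a small change eps in x
    (hypothetical helper; eps is an arbitrary small number)."""
    return (f(x + eps) - f(x)) / eps

for x in (0.0, 1.0, 2.5):
    value = math.exp(x)
    slope = difference_quotient(math.exp, x)
    # The two printed columns should agree to several decimal places.
    print(f"x = {x}: e^x = {value:.6f}, approximate derivative = {slope:.6f}")
```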

References:

Exponential Function on Wikipedia

Exponentiation on Wikipedia

Euler’s Formula on Wikipedia

Euler’s Identity on Wikipedia

Introduction to Analysis of the Infinite by Leonhard Euler (translated by Ian Bruce)

Calculus by James Stewart

Real and Complex Analysis by Walter Rudin

An Intuitive Introduction to Calculus

In this post, we dial back a bit on the mathematical sophistication and take on a subject that should be familiar to most people in STEM (Science, Technology, Engineering and Mathematics) fields, but that has a rather scary reputation among those with less mathematical training – calculus. For that matter, there are even those who can wield calculus as a tool for producing numbers, but with little appreciation for the ideas at work behind it.

This post will be dependent largely on intuition as opposed to rigor; references will be provided at the end of the post for those who want a look at the inner workings behind calculus. However, we will not shy away from notation either, in the hopes that people who find the symbols intimidating will instead appreciate them as helpful tools expressing elegant ideas.

I. The Derivative

We will begin with a somewhat whimsical example. Consider a house, and suppose in this house the living room is twice as large (in area for example) as the dining room. Now suppose we hit the entire house with a “shrinking ray”, shrinking it down so that it can no longer be seen with the naked eye nor measured with our usual measuring instruments. So if someone asks for the size of the house now, we might as well say that it’s negligible. What about the size of the living room? Negligible as well. The size of the dining room? Also negligible.

However, as long as this “shrinking ray” works the way it does in the movies, we can ask how big the living room is compared to the dining room, and the answer is the same as it was before. It is twice as big, even though both are now of essentially negligible size.

It is this idea that lies at the heart of calculus; an expression built out of negligible quantities may turn out to not be negligible at all. In our example, the particular expression in question is a ratio of two negligible quantities. This will lead us to the notion of a derivative.

But first, we will consider another example, which at first glance will seem unrelated to our idea of negligible quantities. Suppose that we want to find out about the speed of a certain car as it travels a distance of 40 kilometers from City A to City B. Suppose the car has no speedometer, or that the speedometer is busted.  The only way we can figure out the speed is from its definition as the ratio of the distance traveled to the time of travel. Suppose the journey from City A to City B takes place in an hour. This would mean that the car is travelling at a rather leisurely (or slow) speed of 40 kilometers per hour.

But suppose we provide additional information about the trip. Let’s say that the first half of the distance between City A and City B took up 45 minutes, due to heavy traffic. Therefore the second half of the distance was traversed in only fifteen minutes. Then we can say that the car was actually travelling pretty slowly at around 27 kilometers per hour for the first half of the distance. For the second half of the distance, the car was actually travelling pretty fast at 80 kilometers per hour.

Let’s provide even more information about the trip. Let’s say that the last quarter of the distance took up ten minutes of the trip. So the car was traveling slower again, at 60 kilometers per hour. This means the third quarter of the distance, a distance of 10 kilometers, was traversed in only 5 minutes. In other words, the car was travelling quite fast at 120 kilometers per hour.

Although the car is travelling at a rather slow speed on the average, there is a part of the trip where the car travels pretty fast. We only learn about this if we look at parts of the trip instead of just averaging on the whole. And we learn more if we look at smaller and smaller parts – perhaps we can learn most if we look at parts so small, that they might as well be negligible. And so we make contact, once again, with the core idea that we are trying to discuss.
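The arithmetic of the trip can be laid out in a short Python sketch. The segment distances and times are the ones from the example above; the function name `average_speed` and the way the data is arranged are our own choices.

```python
def average_speed(distance_km, time_min):
    """Average speed in kilometers per hour, given kilometers and minutes."""
    return distance_km / (time_min / 60)

# (part of the trip, distance in km, time in minutes)
segments = [
    ("first half", 20, 45),
    ("third quarter", 10, 5),
    ("last quarter", 10, 10),
]

for name, distance_km, time_min in segments:
    print(f"{name}: about {average_speed(distance_km, time_min):.0f} km/h")

# Looking at the whole trip at once hides all of this detail.
total_distance = sum(d for _, d, _ in segments)
total_time = sum(t for _, _, t in segments)
print(f"whole trip: about {average_speed(total_distance, total_time):.0f} km/h")
```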

As may be familiar from high school physics, the average speed (we symbolize it here by v) is often written as the ratio between differences in the distances (symbolized here by x) and the times (symbolized here by t). In mathematics, a quantity which is a difference of two other quantities is often written using the symbol \Delta. Therefore, the formula for the average speed may be written as follows:

\displaystyle v=\frac{\Delta x}{\Delta t}

However, as we have demonstrated in our example above, we learn more by dividing the quantities into smaller and smaller parts.

When our quantities, in this particular example the differences of distances and times, are so small that they might as well be negligible, instead of using the symbols \Delta x and \Delta t we instead write dx and dt. Therefore we write

\displaystyle v=\frac{dx}{dt}

We review some of the concepts we have discussed. What is the value of dx? It’s essentially negligible; it’s not quite zero, but so close to it that we can’t really say anything about it anymore. What about dt? Again, essentially negligible. But what about v? Despite it being a ratio of two “essentially negligible” quantities, v itself is not negligible!

As demonstrated earlier in our example, v will be different depending on which part of the journey we are considering. To make this more explicit, we can write

\displaystyle v=\frac{dx}{dt}|_{t=t_{0}}

or

\displaystyle v=\frac{dx}{dt} (t_{0})

to indicate that we mean v at the specific instant of time t_{0}. Because we are looking at specific instants of time, we refer to this speed as the instantaneous speed. If we had a speedometer, we could simply take a reading from it at the specific instant of time t_{0}, and the reading would be the instantaneous speed. However, we assumed that we could not do this; therefore, to figure out the instantaneous speed we need to take extremely small measurements of distance and extremely small measurements of time, somewhere around the specific instant of time t_{0}, and take their ratio.

The problem, of course, is that the quantities are so small that they are essentially negligible. We may not be able to measure them, since they may be so much smaller than the limits of accuracy of our measuring instruments. So how could we get such a ratio, which is not negligible, from two essentially negligible quantities that we cannot even measure? Luckily, if we can express one quantity as a function of the other, we can have a way of calculating this ratio. We discuss this method next.

In mathematics, we often use the concept of functions to express how one quantity depends on another. In this case, given a particular form of a function f(x), we can obtain \frac{df}{dx} as another function of x. We need our function f(x) to be “continuous” so that we always know that df is extremely small whenever dx is extremely small.

Consider the quantity

\displaystyle \frac{f(x+\epsilon)-f(x)}{(x+\epsilon)-(x)}

This is a ratio of differences; the numerator is a difference, and so is the denominator; therefore we can also write this as

\displaystyle \frac{\Delta f}{\Delta x}

Now suppose the quantity \epsilon is extremely small. In this case, the denominator may be rewritten; instead of writing \Delta x, we write dx, since the difference between x+\epsilon and x is extremely small (in fact it is just \epsilon). As for the numerator, if the function is continuous as described above, then we know automatically that it is also extremely small, and we may therefore also write df instead of \Delta f. Therefore, we have

\displaystyle \frac{df}{dx}=\frac{f(x+\epsilon)-f(x)}{(x+\epsilon)-(x)} when \epsilon is extremely small (essentially negligible)

Let’s see an explicit example of this in action. Suppose we have f(x)=x^{2}. Then we have

\displaystyle \frac{\Delta f}{\Delta x}=\frac{(x+\epsilon)^{2}-x^{2}}{(x+\epsilon)-(x)}

Using some high school algebra, we can expand and simplify the right hand side:

\displaystyle \frac{\Delta f}{\Delta x}=\frac{x^{2}+2x\epsilon+\epsilon^{2}-x^{2}}{x+\epsilon-x}

\displaystyle \frac{\Delta f}{\Delta x}=\frac{2x\epsilon+\epsilon^{2}}{\epsilon}

\displaystyle \frac{\Delta f}{\Delta x}=\frac{\epsilon(2x+\epsilon)}{\epsilon}

\displaystyle \frac{\Delta f}{\Delta x}=\frac{(2x+\epsilon)}{1}

\displaystyle \frac{\Delta f}{\Delta x}=2x+\epsilon

Now to obtain \frac{df}{dx}, we just need \epsilon to be extremely small, essentially negligible; in any case, it should be much smaller than any other term in the expression that we might as well chalk it up to measurement error, like the difference between the weight of a sack of flour and the weight of a sack of flour with one extra grain of flour. In other words, 2x+\epsilon and 2x are essentially the same; and \epsilon is so small that we might as well set it to zero. Therefore we have

\displaystyle \frac{df}{dx}=2x
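The same computation can be tried numerically. In the sketch below, the point x = 3 and the list of values for \epsilon are arbitrary choices, and `difference_quotient` is a hypothetical helper; the point is only to watch the ratio \frac{\Delta f}{\Delta x} approach 2x as \epsilon shrinks.

```python
def f(x):
    return x ** 2

def difference_quotient(f, x, eps):
    """The ratio (f(x + eps) - f(x)) / ((x + eps) - x)."""
    return (f(x + eps) - f(x)) / eps

x = 3.0
for eps in (1.0, 0.1, 0.001, 1e-6):
    ratio = difference_quotient(f, x, eps)
    # Algebra says the ratio is 2x + eps, up to rounding in the computer.
    print(f"eps = {eps}: ratio = {ratio}, 2x + eps = {2 * x + eps}")

print(f"2x = {2 * x}")
```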

Note that df by itself is essentially negligible; in this case it is equal to 2x\epsilon+\epsilon^{2}, and since \epsilon is extremely small, the entire expression itself is also extremely small and essentially negligible. Of course, dx is just \epsilon, and is also extremely small and essentially negligible. But in accordance with our “important idea” stated earlier, the ratio \frac{df}{dx}, called the derivative of f with respect to x, is not negligible. The derivative of f with respect to x is also often written f'(x).

So going back to our example of the car going from City A to City B, if we could have an expression that gives us the distance traveled by the car in terms of its time of travel, we could calculate the instantaneous speed at any time by taking the derivative of that expression.

If a certain quantity depends on several other quantities, for example, if it is a function f(x,y,z) of several independent variables x, y, and z, the derivative of f with respect to any one of the independent variables, suppose x, is called the partial derivative of f with respect to x, and written \frac{\partial f}{\partial x}, or sometimes \partial_{x} f.
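As a small illustration of a partial derivative, the sketch below uses a made-up function f(x,y,z)=x^{2}y+z, which is not from the discussion above, and a hypothetical helper `partial_x`. Only x is nudged by a small amount while y and z are held fixed, and the resulting ratio should be close to 2xy.

```python
def f(x, y, z):
    # A made-up function of three independent variables, for illustration only.
    return x ** 2 * y + z

def partial_x(f, x, y, z, eps=1e-6):
    """Change x by a small amount eps, keep y and z fixed, and take the ratio."""
    return (f(x + eps, y, z) - f(x, y, z)) / eps

x, y, z = 3.0, 2.0, 5.0
print(f"approximate partial derivative: {partial_x(f, x, y, z):.4f}")
print(f"2xy = {2 * x * y}")
```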

II. The Integral

We next discuss another expression, aside from a ratio, that is made out of essentially negligible quantities but is not itself negligible. Consider the weight of a grain of flour; as stated in an earlier example, we often think of it as essentially negligible. The weight of a sack of flour, on the other hand, is certainly not often thought of as negligible. But the sack of flour itself is made up of grains of flour; therefore these “essentially” negligible things, when there are enough of them, may combine into something that is not itself negligible.

We consider a summation of a certain number of terms, and we also consider an interval of real numbers from the real number a to the real number b. If we sum two terms, we divide this interval into two; if we sum three terms, we divide this interval into three; and so on. Consider now the summation of five terms

f(b)(b-x_{4})+f(x_{4})(x_{4}-x_{3})+f(x_{3})(x_{3}-x_{2})+f(x_{2})(x_{2}-x_{1})+f(x_{1})(x_{1}-a)

where f(x) is a function of real numbers, defined on the interval from a to b, and x_{1}, x_{2}, x_{3}, and x_{4} are quantities between a and b dividing the interval from a to b into five equal parts. If we have, for example, a=0 and b=100, then x_{1}=20, x_{2}=40, x_{3}=60, and x_{4}=80.

For further simplification we can also write x_{5}=b and x_{0}=a. We can then write the same sum as

f(x_{5})(x_{5}-x_{4})+f(x_{4})(x_{4}-x_{3})+f(x_{3})(x_{3}-x_{2})+f(x_{2})(x_{2}-x_{1})+f(x_{1})(x_{1}-x_{0}).

This can be expressed in more compact notation as

\displaystyle \sum_{i=1}^{5} f(x_{i})(x_{i}-x_{i-1})

or, to keep it general, we divide the interval between a and b into n subdivisions where n is any positive integer instead of just five, so that we have

\displaystyle \sum_{i=1}^{n} f(x_{i})(x_{i}-x_{i-1})

Note that as the number n of subdivisions of the interval between a and b increases, the quantity (x_{i}-x_{i-1}) decreases. We consider another sum, namely

\displaystyle \sum_{i=1}^{n} f(x_{i-1})(x_{i}-x_{i-1})

This is different from the other sum above. However, we note that as we increase the number of subdivisions, the quantity (x_{i}-x_{i-1}) decreases, and the quantities x_{i} and  x_{i-1} become essentially equal to each other. If our function f(x) is “continuous”, then f(x_{i}) and f(x_{i-1}) become essentially equal to each other too. When we reach so many subdivisions that the quantity (x_{i}-x_{i-1}) becomes extremely small or essentially negligible, and the quantities f(x_{i}) and f(x_{i-1}) become essentially equal to each other, we write

\displaystyle \int_{a}^{b}f(x)dx=\sum_{i=1}^{n} f(x_{i})(x_{i}-x_{i-1})=\sum_{i=1}^{n} f(x_{i-1})(x_{i}-x_{i-1})

The summation \int_{a}^{b}f(x)dx is called the integral of f(x) from a to b.
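To see the two sums approach each other, we can compute them for a concrete case. In the sketch below, the function f(x)=x^{2}, the interval from 0 to 3, and the names `right_sum` and `left_sum` are our own choices for illustration; the interval is divided into n equal parts, as in the discussion above.

```python
def f(x):
    return x ** 2

def right_sum(f, a, b, n):
    """Sum of f(x_i) (x_i - x_{i-1}) for i = 1, ..., n, with n equal parts."""
    width = (b - a) / n
    return sum(f(a + i * width) * width for i in range(1, n + 1))

def left_sum(f, a, b, n):
    """Sum of f(x_{i-1}) (x_i - x_{i-1}) for i = 1, ..., n, with n equal parts."""
    width = (b - a) / n
    return sum(f(a + (i - 1) * width) * width for i in range(1, n + 1))

a, b = 0.0, 3.0
for n in (5, 50, 500, 5000):
    # The gap between the two sums shrinks as n grows.
    print(f"n = {n}: left sum = {left_sum(f, a, b, n):.5f}, "
          f"right sum = {right_sum(f, a, b, n):.5f}")
```

As n grows, both sums close in on the same value from either side, which is the value we assign to the integral.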

The integral is a sum of terms of the form f(x)dx, which is the product of f(x) multiplied by dx. In fact, the integral symbol itself, \int, is simply an old version of the letter “s”, to show that it stands for sum, in the same way that the letter “d” in dx represents a very small difference.

If we think of dx as a very small “width” and f(x) as some sort of “height”, then f(x)dx is some kind of very small “area” of a very “thin” rectangle, and the integral \int_{a}^{b}f(x)dx gives the total area under the curve f(x) from a to b, taken by dividing this area into very “thin” rectangles and adding up the areas of each of these rectangles.

But there are also other quantities of the form f(x)dx. For example, if we think of f(x)dx as the probability that a certain quantity, say, the height of a random person one may meet on the street, has a value very close to x, then the integral \int_{a}^{b}f(x)dx gives us the total probability that this quantity has a value between a and b.

III. The Fundamental Theorem of Calculus

Unlike the derivative, the integral is actually rather difficult to compute. It is a sum of very many terms, each of which is a product of one quantity with an essentially negligible quantity. However, there is a shortcut to computing the integral, which involves the derivative, and the discovery of this “shortcut” is one of the greatest achievements in the history of mathematics.

This “shortcut” which relates the integral and the derivative is so important and so monumental that it is called the fundamental theorem of calculus, and its statement is as follows:

\displaystyle \int_{a}^{b}\frac{df}{dx}dx=f(b)-f(a)

The notation itself is already very suggestive as to the intuition behind this theorem. It is also perhaps worth noting that the integral is a sum of products, while the derivative is a ratio of differences. The function f(x) is defined in the interval from a to b, so if we sum all the tiny differences df from f(a) to f(b) we would get f(b)-f(a). Of course the rigorous proof of this theorem is much more involved than this “plausibility argument” that we have presented here.

In practice, this means that if we want to solve for the integral of a function, we only need to “reverse” what we did to solve for the derivative. This is why the integral (more precisely the “indefinite” integral) is also sometimes called the antiderivative. Earlier we solved for the derivative of the function f(x)=x^{2} with respect to x and obtained \frac{df}{dx}=2x. If we now ask what is the integral of 2x from a to b, we will find, using the fundamental theorem of calculus, that \int_{a}^{b}2xdx=b^{2}-a^{2}.
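We can compare the “long way” and the “shortcut” in a small sketch. The endpoints a=1 and b=4, the number of subdivisions, and the helper `riemann_sum` are assumptions chosen for illustration; the sum of products is computed directly and then compared with b^{2}-a^{2}.

```python
def riemann_sum(f, a, b, n):
    """Sum of f(x_i) (x_i - x_{i-1}) over n equal subdivisions of [a, b]."""
    width = (b - a) / n
    return sum(f(a + i * width) * width for i in range(1, n + 1))

def two_x(x):
    return 2 * x

a, b = 1.0, 4.0
long_way = riemann_sum(two_x, a, b, 100_000)  # adding up many thin rectangles
shortcut = b ** 2 - a ** 2                    # fundamental theorem of calculus

print(f"sum of products: {long_way:.4f}")
print(f"b^2 - a^2:       {shortcut:.4f}")
```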

We now summarize, in an effort to show that there’s nothing to be scared of when it comes to derivatives and integrals:

\displaystyle \frac{df}{dx} is a ratio of extremely small differences \displaystyle df and \displaystyle dx

\displaystyle \int_{a}^{b}f(x)dx is a sum of the products of \displaystyle f(x) and the extremely small differences \displaystyle dx

That being said, despite the rather simple intuition we have laid out here, making the language of calculus “precise”, or rather “rigorous”, is a much bigger task that has taken centuries of development, and even after mathematicians agreed on how to develop calculus, it has been “resurrected” time and time again to come up with even more powerful versions of calculus. For example, there is a version of the integral called the Lebesgue integral, in which instead of dx we use a quantity called the “measure”, which may not necessarily be extremely small. Even so, this integral is still a sum of products, and the concept of extremely small and essentially negligible quantities still shows up, since there will be parts where the measure will become extremely small and essentially negligible.

As for dealing with the extremely small and essentially negligible, this is made more precise using the language of limits. In older times, another concept, called infinitesimals, was used; however, there were so many questions from philosophers that the concept was basically abandoned in favor of the language of limits. Infinitesimals are themselves the subject of much research in modern times, since it was found that, with modern developments in mathematics, they can still be useful and can now be made more precise. The modern study of calculus and its related subjects goes by the much simpler-sounding name of analysis.

Finally, even though we attempted an answer to the question “What is calculus?” we did not really explain how to do calculus or how to apply it, although there are tons and tons of applications of calculus in the modern world. For all these we will direct the reader to the references at the end of the post, which will help those who want to learn more about the subject.

In my opinion however, the best way to learn calculus is to master first the language of limits, or more generally learn how to deal with extremely small and essentially negligible quantities, without necessarily going to derivatives and integrals, since derivatives and integrals themselves are merely specific applications of the philosophy, once more recalled here, that an expression built out of negligible quantities may turn out to not be negligible at all. And it is best to learn how to deal with this in the most general case. In some schools this makes up the subject of precalculus. I would recommend very much the classic book of one of the greatest mathematicians of all time, Leonhard Euler, entitled Introduction to Analysis of the Infinite, of which translations abound on the internet.

References:

Calculus on Wikipedia

Introduction to Analysis of the Infinite by Leonhard Euler (translated by Ian Bruce)

Calculus by James Stewart

Principles of Mathematical Analysis by Walter Rudin