Eigenvalues and Eigenvectors

Given a vector (see Vector Spaces, Modules, and Linear Algebra), we have seen that one of the things we can do to it is to “scale” it (in fact, it is one of the defining properties of a vector). We can also use a matrix (see Matrices) to scale vectors. Consider, for example, the matrix

\displaystyle \left(\begin{array}{cc}2&0\\ 0&2\end{array}\right).

Applying this matrix to any vector “doubles” the magnitude of the vector:

\displaystyle \left(\begin{array}{cc}2&0\\ 0&2\end{array}\right)\left(\begin{array}{c}1\\ 0\end{array}\right)=\left(\begin{array}{c}2\\ 0\end{array}\right)=2\left(\begin{array}{c}1\\ 0\end{array}\right)

\displaystyle \left(\begin{array}{cc}2&0\\ 0&2\end{array}\right)\left(\begin{array}{c}0\\ 5\end{array}\right)=\left(\begin{array}{c}0\\ 10\end{array}\right)=2\left(\begin{array}{c}0\\ 5\end{array}\right)

\displaystyle \left(\begin{array}{cc}2&0\\ 0&2\end{array}\right)\left(\begin{array}{c}-2\\ 3\end{array}\right)=\left(\begin{array}{c}-4\\ 6\end{array}\right)=2\left(\begin{array}{c}-2\\ 3\end{array}\right)

This is applicable to any vector except, of course, the zero vector; scaling the zero vector by any factor just gives back the zero vector, so it is excluded from our discussion in this post.

The interesting case, however, is when the matrix “scales” only a few special vectors. Consider, for example, the matrix

\displaystyle \left(\begin{array}{cc}2&1\\ 1&2\end{array}\right).

Applying it to the vector

\displaystyle \left(\begin{array}{c}1\\ 0\end{array}\right)

gives us

\displaystyle \left(\begin{array}{cc}2&1\\ 1&2\end{array}\right) \left(\begin{array}{c}1\\ 0\end{array}\right)=\left(\begin{array}{c}2\\ 1\end{array}\right).

This is, of course, not an example of “scaling”. However, for the vector

\displaystyle \left(\begin{array}{c}1\\ 1\end{array}\right)

we get

\displaystyle \left(\begin{array}{cc}2&1\\ 1&2\end{array}\right) \left(\begin{array}{c}1\\ 1\end{array}\right)=\left(\begin{array}{c}3\\ 3\end{array}\right).

This is a scaling, since

\left(\begin{array}{c}3\\ 3\end{array}\right)=3\left(\begin{array}{c}1\\ 1\end{array}\right).

The same holds true for the vector

\displaystyle \left(\begin{array}{c}-1\\ 1\end{array}\right)

from which we obtain

\displaystyle \left(\begin{array}{cc}2&1\\ 1&2\end{array}\right) \left(\begin{array}{c}-1\\ 1\end{array}\right)=\left(\begin{array}{c}-1\\ 1\end{array}\right)

which is also a “scaling” by a factor of 1. Finally, this also holds true for scalar multiples of the two vectors we have enumerated. These vectors, the only “special” ones that are scaled by our linear transformation (represented by our matrix), are called the eigenvectors of the linear transformation, and the factors by which they are scaled are called the corresponding eigenvalues.
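
For readers who want to verify these computations numerically, here is a minimal sketch in Python using NumPy (the choice of library, and all variable names, are ours rather than part of the discussion above); np.linalg.eig returns the eigenvalues and normalized eigenvectors of a matrix.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Check the two "special" vectors found above: A v should equal (eigenvalue) * v.
for v, eigenvalue in [(np.array([1.0, 1.0]), 3.0),
                      (np.array([-1.0, 1.0]), 1.0)]:
    print(A @ v, np.allclose(A @ v, eigenvalue * v))   # True in both cases

# np.linalg.eig recovers the eigenvalues and eigenvectors directly.
eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)    # approximately [3. 1.] (the order is not guaranteed)
print(eigenvectors)   # the columns are unit-length eigenvectors
```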

So far we have focused on finite-dimensional vector spaces, which give us a lot of convenience; for instance, we can express finite-dimensional vectors as column matrices. But there are also infinite-dimensional vector spaces; recall that the conditions for a set to be a vector space are that its elements can be added or subtracted, and scaled. An example of an infinite-dimensional vector space is the set of all continuous real-valued functions of the real numbers (with the real numbers serving as the field of scalars).

Given two continuous real-valued functions of the real numbers f and g, the functions f+g and f-g are also continuous real-valued functions of the real numbers, and the same is true for af, for any real number a. Thus we can see that the set of continuous real-valued functions of the real numbers forms a vector space.

Matrices are not usually used to express linear transformations when it comes to infinite-dimensional vector spaces, but we still retain the concept of eigenvalues and eigenvectors. Note that a linear transformation is a function f from a vector space to another (possibly itself) which satisfies the conditions f(u+v)=f(u)+f(v) and f(av)=af(v).

Since our vector spaces in the infinite-dimensional case may be composed of functions, we may think of linear transformations as “functions from functions to functions” that satisfy the conditions earlier stated.

Consider the “operation” of taking the derivative (see An Intuitive Introduction to Calculus). The rules of calculus concerning derivatives (which can be derived from the basic definition of the derivative) state that we have

\displaystyle \frac{d(f+g)}{dx}=\frac{df}{dx}+\frac{dg}{dx}

and

\displaystyle \frac{d(af)}{dx}=a\frac{df}{dx}

where a is a constant. This holds true for “higher-order” derivatives as well. This means that the “derivative operator” \frac{d}{dx} is an example of a linear transformation from an infinite-dimensional vector space to another (note that the functions that comprise our vector space must be “differentiable”, and that the derivatives of our functions must possess the same defining properties we required for our vector space).
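
As a quick symbolic check of these two rules, here is a sketch using SymPy (the library and the particular sample functions are our own choices, not part of the original text):

```python
import sympy as sp

x, a = sp.symbols('x a')
f = sp.sin(x)   # two sample differentiable functions; any would do
g = sp.exp(x)

# d(f+g)/dx equals df/dx + dg/dx
print(sp.simplify(sp.diff(f + g, x) - (sp.diff(f, x) + sp.diff(g, x))))  # 0

# d(a*f)/dx equals a * df/dx, where a is treated as a constant
print(sp.simplify(sp.diff(a * f, x) - a * sp.diff(f, x)))  # 0
```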

We now show an example of eigenvalues and eigenvectors in the context of infinite-dimensional vector spaces. Let our linear transformation be

\displaystyle \frac{d^{2}}{dx^{2}}

which stands for the “operation” of taking the second derivative with respect to x. We state again some of the rules of calculus pertaining to the derivatives of trigonometric functions (once again, they can be derived from the basic definitions, which is a fruitful exercise, or they can be looked up in tables):

\displaystyle \frac{d(\text{sin}(x))}{dx}=\text{cos}(x)

\displaystyle \frac{d(\text{cos}(x))}{dx}=-\text{sin}(x)

which means that

\displaystyle \frac{d^{2}(\text{sin}(x))}{dx^{2}}=\frac{d(\frac{d(\text{sin}(x))}{dx})}{dx}

\displaystyle \frac{d^{2}(\text{sin}(x))}{dx^{2}}=\frac{d(\text{cos}(x))}{dx}

\displaystyle \frac{d^{2}(\text{sin}(x))}{dx^{2}}=-\text{sin}(x)

We can now see that the function \text{sin}(x) is an eigenvector of the linear transformation \frac{d^{2}}{dx^{2}}, with eigenvalue equal to -1.
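
To make this concrete, here is a small SymPy sketch (again our own illustration, not part of the original exposition) confirming that applying \frac{d^{2}}{dx^{2}} to \text{sin}(x) gives back -1 times \text{sin}(x):

```python
import sympy as sp

x = sp.symbols('x')
f = sp.sin(x)

second_derivative = sp.diff(f, x, 2)       # apply d^2/dx^2
print(second_derivative)                   # -sin(x)
print(sp.simplify(second_derivative + f))  # 0, i.e. d^2 f/dx^2 = (-1) * f
```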

Eigenvalues and eigenvectors play many important roles in linear algebra (and its infinite-dimensional version, which is called functional analysis). We will mention here something we left out of our discussion in Some Basics of Quantum Mechanics. In quantum mechanics, “observables”, like the position, momentum, or energy of a system, correspond to certain kinds of linear transformations whose eigenvalues are real numbers (note that our field of scalars in quantum mechanics is the field of complex numbers \mathbb{C}). These eigenvalues correspond to the only values that we can obtain after measurement; we cannot measure values that are not eigenvalues.

References:

Eigenvalues and Eigenvectors on Wikipedia

Observable on Wikipedia

Linear Algebra Done Right by Sheldon Axler

Algebra by Michael Artin

Calculus by James Stewart

Introductory Functional Analysis with Applications by Erwin Kreyszig

Introduction to Quantum Mechanics by David J. Griffiths

The Hom and Tensor Functors

We discussed functors in Category Theory, and in this post we discuss certain functors important to the study of rings and modules. Moreover, we look at these functors and how they affect exact sequences, whose importance was discussed in Exact Sequences. Our discussion in this post will also be related to some things that we discussed in More on Chain Complexes.

If M and N are two modules whose ring of scalars is the ring R (we refer to M and N as R-modules), then we denote by \text{Hom}_{R}(M,N) the set of linear transformations (see Vector Spaces, Modules, and Linear Algebra) from M to N. It is worth noting that this set has an abelian group structure (see Groups).

We define the functor \text{Hom}_{R}(M,-) as the functor that assigns to an R-module N the abelian group \text{Hom}_{R}(M,N) of linear transformations from M to N. Similarly, the functor \text{Hom}_{R}(-,N) assigns to the R-module M the abelian group \text{Hom}_{R}(M,N) of linear transformations from M to N.

These functors \text{Hom}_{R}(M,-) and \text{Hom}_{R}(-,N), combined with the idea of exact sequences, give us new definitions of projective and injective modules, which are equivalent to the old ones we gave in More on Chain Complexes.

We say that a functor is an exact functor if it preserves exact sequences. In the case of \text{Hom}_{R}(M,-), we say that it is exact if for an exact sequence of modules

0\rightarrow A\rightarrow B\rightarrow C\rightarrow 0

the sequence

0\rightarrow \text{Hom}_{R}(M,A)\rightarrow \text{Hom}_{R}(M,B)\rightarrow \text{Hom}_{R}(M,C)\rightarrow 0

is also exact. The concept of an exact sequence of sets of linear transformations of R-modules makes sense because of the abelian group structure on these sets. In this case we also say that the R-module M is projective.

Similarly, an R-module N is injective if the functor \text{Hom}_{R}(-,N) is exact, i.e. if for an exact sequence of modules

0\rightarrow A\rightarrow B\rightarrow C\rightarrow 0

the sequence

0\rightarrow \text{Hom}_{R}(A,N)\rightarrow \text{Hom}_{R}(B,N)\rightarrow \text{Hom}_{R}(C,N)\rightarrow 0

is also exact.

We introduce another functor, which we write M\otimes_{R}-. This functor assigns to an R-module N the tensor product (see More on Vector Spaces and Modules) M\otimes_{R}N. Similarly, we also have the functor -\otimes_{R}N, which assigns to an R-module M the tensor product M\otimes_{R}N. If our ring R is commutative, then there will be no distinction between the functors M\otimes_{R}- and -\otimes_{R}M. We will continue assuming that our rings are commutative (an example of a noncommutative ring is the ring of n\times n matrices for n\geq 2).

We say that a module N is flat if the functor -\otimes_{R}N is exact, i.e. if for an exact sequence of modules

0\rightarrow A\rightarrow B\rightarrow C\rightarrow 0

the sequence

0\rightarrow A\otimes_{R}N\rightarrow B\otimes_{R}N\rightarrow C\otimes_{R}N\rightarrow 0

is also exact.

We make a little digression to introduce the concept of an algebra. The word “algebra” has a lot of meanings in mathematics, but in our context, as a mathematical object in the subject of abstract algebra and linear algebra, it means a set with both a ring and a module structure. More technically, for a ring A, an A-algebra is a ring B and a ring homomorphism f:A\rightarrow B, which makes B into an A-module via the following definition of the scalar multiplication:

ab=f(a)b for a\in A, b\in B.

The notion of an algebra will be useful in defining the notion of a flat morphism. A ring homomorphism f: A\rightarrow B is a flat morphism if the functor -\otimes_{A}B is exact. Since B is an A-algebra, and an A-algebra is also an A-module, this means that f: A\rightarrow B is a flat morphism if B is flat as an A-module. The notion of a flat morphism is important in algebraic geometry, where the “points” of schemes are given by the prime ideals of a ring, since it corresponds to a “continuous” family of schemes parametrized by the “points” of another scheme.

Finally, the functors \text{Hom}_{R}(M,-), \text{Hom}_{R}(-,N), and -\otimes_{R}N, which we will also refer to as the “Hom” and “Tensor” functors, can be used to define the derived functors “Ext” and “Tor”, to which we have given a passing mention in More on Chain Complexes. We now elaborate on these constructions.

The Ext functor, written \text{Ext}_{R}^{n}(M,N) for a fixed R-module M, is calculated by taking an injective resolution of N,

0\rightarrow N\rightarrow E^{0}\rightarrow E^{1}\rightarrow ...

then applying the functor \text{Hom}_{R}(M,-):

0 \rightarrow \text{Hom}_{R}(M,N)\rightarrow \text{Hom}_{R}(M,E^{0})\rightarrow \text{Hom}_{R}(M,E^{1})\rightarrow ...

we “remove” \text{Hom}_{R}(M,N) to obtain the chain complex

0 \rightarrow \text{Hom}_{R}(M,E^{0})\rightarrow \text{Hom}_{R}(M,E^{1})\rightarrow ...

Then \text{Ext}_{R}^{n}(M,N) is the n-th cohomology group (see Homology and Cohomology) of this complex.

Alternatively, we can also define the Ext functor \text{Ext}_{R}^{n}(M,N) for a fixed R-module N by taking a projective resolution of M,

...\rightarrow P_{1}\rightarrow P_{0}\rightarrow M\rightarrow 0

then applying the functor \text{Hom}_{R}(-,N), which “dualizes” the chain complex:

0 \rightarrow \text{Hom}_{R}(M,N)\rightarrow \text{Hom}_{R}(P_{0},N)\rightarrow \text{Hom}_{R}(P_{1},N)\rightarrow ...

we again “remove” \text{Hom}_{R}(M,N) to obtain the chain complex

0 \rightarrow \text{Hom}_{R}(P_{0},N)\rightarrow \text{Hom}_{R}(P_{1},N)\rightarrow ...

and \text{Ext}_{R}^{n}(M,N) is once again given by the n-th cohomology group of this complex.

The Tor functor, meanwhile, written \text{Tor}_{n}^{R}(M,N) for a fixed R-module N, is calculated by taking a projective resolution of M and applying the functor -\otimes_{R}N, followed by “removing” M\otimes_{R}N:

...\rightarrow P_{1}\otimes_{R}N\rightarrow P_{0}\otimes_{R}N\rightarrow 0

\text{Tor}_{n}^{R}(M,N) is then given by the n-th homology group of this chain complex.

The Ext and Tor functors were originally developed to study the concepts of “extension” and “torsion” of groups in abstract algebra, hence the names, but they have since then found utility in many other subjects, in particular algebraic topology, algebraic geometry, and algebraic number theory. Our exposition here has been quite abstract; to find more motivation, aside from checking out the references listed below, the reader may also compare with the ordinary homology and cohomology theories in algebraic topology. Hopefully we will be able to flesh out more aspects of what we have discussed here in future posts.

References:

Hom Functor on Wikipedia

Tensor Product of Modules on Wikipedia

Flat Module on Wikipedia

Associative Algebra on Wikipedia

Derived Functor on Wikipedia

Ext Functor on Wikipedia

Tor Functor on Wikipedia

Abstract Algebra by David S. Dummit and Richard M. Foote

Commutative Algebra by M. F. Atiyah and I. G. MacDonald

An Introduction to Homological Algebra by Joseph J. Rotman

Rotating and Reflecting Vectors Using Matrices

In Vector Spaces, Modules, and Linear Algebra we learned about vectors, and defined them as elements of a set that is closed under addition and scalar multiplication. This is a pretty abstract concept, and in that post we used an example of “apples and oranges” to express it. However we also mentioned that many other things are vectors; for instance, states in quantum mechanics, and quantities with a magnitude and direction, such as forces. It is these quantities with a magnitude and direction that we will focus on in this post.

We will use the language that we developed in Matrices in order to make things more concrete. We will focus on two dimensions only in this post, in order to simplify things, although it will not be difficult to generalize to higher dimensions. We develop first a convention. The vector

\displaystyle \left(\begin{array}{c}1\\0\end{array}\right)

represents a quantity with magnitude “1” (meter, or meter per second, or Newton, etc.) going to the right (or east). Similarly, the vector

\displaystyle \left(\begin{array}{c}-1\\0\end{array}\right)

represents a quantity with magnitude 1 going to the left (or west). Meanwhile, the vector

\displaystyle \left(\begin{array}{c}0\\1\end{array}\right)

represents a quantity with magnitude 1 going upward (or to the north). Finally, the vector

\displaystyle \left(\begin{array}{c}0\\-1\end{array}\right)

represents a quantity with magnitude 1 going downward (or to the south). These vectors we have enumerated all have magnitude 1, therefore they are also called unit vectors. Since they are vectors, we can “scale” them or add or subtract them from each other to form new vectors. For example, we can “double” the upward-pointing unit vector,

\displaystyle 2\left(\begin{array}{c}0\\1\end{array}\right)=\left(\begin{array}{c}0\\2\end{array}\right)

to obtain a vector again pointing upward but with a magnitude of 2. We can also “add” the right-pointing unit vector to the upward-pointing unit vector, as follows:

\displaystyle \left(\begin{array}{c}1\\0\end{array}\right)+\left(\begin{array}{c}0\\1\end{array}\right)=\left(\begin{array}{c}1\\1\end{array}\right)

We can easily infer that this vector will point “diagonally” upward and to the right (or to the northeast). But what will be its magnitude? For this we introduce the concept of the transpose. The transpose of a matrix is just another matrix but with its rows and columns interchanged. For a column matrix, we have only one column, so its transpose is a matrix with only one row, as follows:

\displaystyle \left(\begin{array}{c}a\\b\end{array}\right)^{T}=\left(\begin{array}{cc}a&b\end{array}\right)

Now, to take the magnitude of a vector, we take the square root of the product of the transpose of a vector and the vector itself. Note that the multiplication of matrices is not commutative, so it is important that the row matrix be on the left and the column matrix (the vector) be on the right. It is the only way we will obtain an ordinary number from the matrices.

Applying the rules of matrix multiplication, we see that for a vector

\displaystyle \left(\begin{array}{c}a\\b\end{array}\right)

the magnitude will be given by the square root of

\displaystyle \left(\begin{array}{cc}a&b\end{array}\right) \left(\begin{array}{c}a\\b\end{array}\right)=a^{2}+b^{2}

This should be reminiscent of the Pythagorean theorem. As we have already seen in From Pythagoras to Einstein, this ancient theorem always shows up in many aspects of modern mathematics and physics. Going back to our example of the vector

\displaystyle \left(\begin{array}{c}1\\1\end{array}\right)

we can now compute for its magnitude. Multiplying the transpose of this vector and the vector itself, in the proper order, we obtain

\displaystyle \left(\begin{array}{cc}1&1\end{array}\right) \left(\begin{array}{c}1\\1\end{array}\right)=1^{2}+1^{2}=2

and taking the square root of this number, we see that the magnitude of our vector is equal to \sqrt{2}.
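
As a numerical sanity check of this recipe, here is a sketch in Python with NumPy (the library choice is ours); it computes the magnitude exactly as described, by multiplying the transpose of the column matrix by the column matrix itself and taking the square root:

```python
import numpy as np

v = np.array([[1.0],
              [1.0]])                 # the column vector (1, 1)

squared_magnitude = (v.T @ v).item()  # transpose times vector gives a 1x1 matrix
magnitude = np.sqrt(squared_magnitude)
print(magnitude)                      # 1.4142..., i.e. sqrt(2)
print(np.linalg.norm(v))              # NumPy's built-in norm agrees
```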

In Matrices we mentioned that a square matrix may be used to describe linear transformations between vectors. Now that we have used the language of vectors to describe quantities with magnitude and direction, we also show a very special kind of linear transformation – one that sends a vector to another vector with the same value of the magnitude, but “rotated” or “reflected”, i.e. with a different direction. We may say that this linear transformation describes the “operation” of rotation or reflection. This analogy is the reason why linear transformations from a vector space to itself are also often referred to as linear operators, especially in quantum mechanics.

We make this idea clearer with an explicit example. Consider the matrix

\displaystyle \left(\begin{array}{cc}0&-1\\ 1&0\end{array}\right)

We look at its effect on some vectors:

\displaystyle \left(\begin{array}{cc}0&-1\\ 1&0\end{array}\right)\left(\begin{array}{c}1\\0\end{array}\right)=\left(\begin{array}{c}0\\1\end{array}\right)

\displaystyle \left(\begin{array}{cc}0&-1\\ 1&0\end{array}\right)\left(\begin{array}{c}0\\1\end{array}\right)=\left(\begin{array}{c}-1\\0\end{array}\right)

\displaystyle \left(\begin{array}{cc}0&-1\\ 1&0\end{array}\right)\left(\begin{array}{c}-1\\0\end{array}\right)=\left(\begin{array}{c}0\\-1\end{array}\right)

\displaystyle \left(\begin{array}{cc}0&-1\\ 1&0\end{array}\right)\left(\begin{array}{c}0\\-1\end{array}\right)=\left(\begin{array}{c}1\\0\end{array}\right)

From these basic examples one may infer that our matrix represents a counterclockwise “rotation” of ninety degrees. The reader is encouraged to visualize (or better yet draw) how this is so. In fact, we can express a counterclockwise rotation of any angle \theta using the matrix

\displaystyle \left(\begin{array}{cc}\text{cos }\theta&-\text{sin }\theta\\ \text{sin }\theta&\text{cos }\theta\end{array}\right)
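
Here is a small sketch of this general rotation matrix in Python with NumPy (the helper name rotation, like the library itself, is our own choice for illustration); with \theta equal to ninety degrees it reproduces, up to rounding, the matrix considered above:

```python
import numpy as np

def rotation(theta):
    """Counterclockwise rotation of the plane by the angle theta (in radians)."""
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

R90 = rotation(np.pi / 2)
print(np.round(R90))                          # [[ 0. -1.]  [ 1.  0.]]
print(np.round(R90 @ np.array([1.0, 0.0])))   # [0. 1.]: east is rotated to north
```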

We consider next another matrix, given by

\displaystyle \left(\begin{array}{cc}1&0\\ 0&-1\end{array}\right)

We likewise look at its effect on some vectors:

\displaystyle \left(\begin{array}{cc}1&0\\ 0&-1\end{array}\right)\left(\begin{array}{c}1\\0\end{array}\right)=\left(\begin{array}{c}1\\0\end{array}\right)

\displaystyle \left(\begin{array}{cc}1&0\\ 0&-1\end{array}\right)\left(\begin{array}{c}0\\1\end{array}\right)=\left(\begin{array}{c}0\\-1\end{array}\right)

\displaystyle \left(\begin{array}{cc}1&0\\ 0&-1\end{array}\right)\left(\begin{array}{c}-1\\0\end{array}\right)=\left(\begin{array}{c}-1\\0\end{array}\right)

\displaystyle \left(\begin{array}{cc}1&0\\ 0&-1\end{array}\right)\left(\begin{array}{c}0\\-1\end{array}\right)=\left(\begin{array}{c}0\\1\end{array}\right)

What we see now is that this matrix represents a “reflection” across the horizontal axis. Any reflection across a line making an angle of \frac{\theta}{2} with the horizontal axis is represented by the matrix

\displaystyle \left(\begin{array}{cc}\text{cos }\theta&\text{sin }\theta\\ \text{sin }\theta&-\text{cos }\theta\end{array}\right)

The matrices representing rotations and reflections form a group (see Groups) called the orthogonal group. Since we are only looking at rotations in the plane, i.e. in two dimensions, it is also more properly referred to as the orthogonal group in dimension 2, written \text{O}(2). The matrices representing rotations form a subgroup (a subset of a group that is itself also a group) of the orthogonal group in dimension 2, called the special orthogonal group in dimension 2 and written \text{SO}(2).

The reader is encouraged to review the concept of a group as discussed in Groups, but intuitively what this means is that if we multiply two matrices representing, say, counterclockwise rotations by angles \alpha and \beta, we obtain a matrix representing a counterclockwise rotation by the angle \alpha+\beta. In other words, we can “compose” rotations; the composition is associative, possesses an “identity” (a rotation of zero degrees), and every counterclockwise rotation of angle \theta has an “inverse” (a clockwise rotation of angle \theta, which is also represented as a counterclockwise rotation of angle -\theta).

Explicitly,

\displaystyle \left(\begin{array}{cc}\text{cos }\alpha&-\text{sin }\alpha\\ \text{sin }\alpha&\text{cos }\alpha\end{array}\right)\left(\begin{array}{cc}\text{cos }\beta&-\text{sin }\beta\\ \text{sin }\beta&\text{cos }\beta\end{array}\right)=\left(\begin{array}{cc}\text{cos}(\alpha+\beta)&-\text{sin}(\alpha+\beta)\\ \text{sin}(\alpha+\beta)&\text{cos}(\alpha+\beta)\end{array}\right)

It can be a fun exercise to derive this equation using the laws of matrix multiplication and the addition formulas for the sine and cosine functions from basic trigonometry.
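
For those who prefer a numerical check to the trigonometric derivation, the following sketch (our own, in Python with NumPy) verifies the composition rule for a pair of arbitrary angles:

```python
import numpy as np

def rotation(theta):
    """Counterclockwise rotation matrix for the angle theta (in radians)."""
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

alpha, beta = 0.7, 1.1   # two arbitrary angles in radians

# Composing rotations by alpha and beta equals a single rotation by alpha + beta.
print(np.allclose(rotation(alpha) @ rotation(beta), rotation(alpha + beta)))  # True
```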

This is what it means for \text{SO}(2), the matrices representing rotations, to form a group. Reflections can also be considered in addition to rotations, and reflections and rotations can be composed with each other. This is what it means for \text{O}(2), the matrices representing rotations and reflections, to form a group. The matrices representing reflections alone do not form a group however, since the composition of two reflections is not a reflection, but a rotation.

Technically, the distinction between the matrices representing rotations and the matrices representing reflections can be seen by examining the determinant, which is a concept we will leave to the references for now.

We now state a technical characterization of the orthogonal group \text{O}(2): it is the group of matrices that preserve the magnitudes of vectors. This gives us some intuition as to why these matrices are so special. There are other equivalent definitions of \text{O}(2). For example, it can also be described as the set of matrices A which satisfy the equation

\displaystyle AA^{T}=A^{T}A=I

where the matrix A^{T} is the transpose of the matrix A, which is given by interchanging the rows and the columns of A, as discussed earlier, and

\displaystyle I=\left(\begin{array}{cc}1&0\\ 0&1\end{array}\right)

is the “identity” matrix, which, when multiplied with any other matrix (on either side), just gives back that same matrix. This may also be expressed by saying that the group \text{O}(2) is made up of the matrices whose transpose is also their inverse (and vice versa).
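
A quick numerical check of this defining equation, for one rotation matrix and one reflection matrix, is given in the sketch below (in Python with NumPy, chosen here purely for illustration):

```python
import numpy as np

theta = 0.3   # an arbitrary angle in radians
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
reflection = np.array([[np.cos(theta),  np.sin(theta)],
                       [np.sin(theta), -np.cos(theta)]])

identity = np.eye(2)
for A in (rotation, reflection):
    # Both A A^T and A^T A should give the identity matrix.
    print(np.allclose(A @ A.T, identity), np.allclose(A.T @ A, identity))  # True True
```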

In summary, we have shown in this post one specific aspect of vector spaces and linear transformations between vector spaces, and “fleshed out” the rather skeletal framework of sets that are closed under addition and scalar multiplication, and functions that respect this structure. It is important to note of course, that the applications of vector spaces and linear transformations are by no means limited to describing quantities with magnitude and direction.

Another concept that we have “fleshed out” in this post is the concept of groups, which we have only treated rather abstractly in Groups. We have also been using the concept of groups in algebraic topology, in particular homotopy groups in Homotopy Theory and homology groups and cohomology groups in Homology and Cohomology, but it is perhaps the example of the orthogonal group, or even better the special orthogonal group, where we have intuitive and concrete examples of the concept. Rotations can be composed, the composition is associative, there exists an “identity”, and there exists an “inverse” for every element. The same holds for rotations and reflections together.

These two subjects that we have discussed in this post, namely linear algebra and group theory, are in fact closely related. The subject that studies these two subjects in relation to one another is called representation theory, and it is a very important part of modern mathematics.

References:

Orthogonal Matrix on Wikipedia

Orthogonal Group on Wikipedia

Algebra by Michael Artin

Matrices

We discussed linear algebra in Vector Spaces, Modules, and Linear Algebra, and there we focused on “finite-dimensional” vector spaces (the concept of dimension for vector spaces was discussed in More on Vector Spaces and Modules), writing vectors in the form

\displaystyle \left(\begin{array}{c}a\\b\end{array}\right)

Vectors need not be written in this way, since the definition of the concept of vector space only requires that it be a set closed under addition and scalar multiplication. For example, we could simply denote vectors by v, or, as in quantum mechanics, use what is called “Dirac notation”, writing vectors as |\psi\rangle.

However, the notation that we used in Vector Spaces, Modules, and Linear Algebra is quite convenient; it allows us to display the “components” explicitly. If we declare, for example, that our scalars are the set of real numbers \mathbb{R}, and that our vector space is the set of all vectors of the form

\displaystyle \left(\begin{array}{c}a\\b\end{array}\right)

where a,b\in \mathbb{R}, then we already know that we can use the following vectors for our basis:

\displaystyle \left(\begin{array}{c}1\\0\end{array}\right)

and

\displaystyle \left(\begin{array}{c}0\\1\end{array}\right)

since any vector can be expressed uniquely as a linear combination

\displaystyle \left(\begin{array}{c}a\\b\end{array}\right)=a\left(\begin{array}{c}1\\0\end{array}\right)+b\left(\begin{array}{c}0\\1\end{array}\right)

It is also quite easy to see that our vector space here has dimension 2. What we have done is express our vector as a matrix, more specifically a column matrix. A matrix is a rectangular array of numbers (which we refer to as its “entries”), together with specific rules for operations such as addition and multiplication, which we will discuss later. If a matrix has m rows and n columns, we refer to it as an m\times n matrix. A matrix that has only one row is often referred to as a row matrix, and a matrix with only one column, as we have been using to express our vectors up to now, is referred to as a column matrix. A matrix with the same number of columns and rows is referred to as a square matrix.

Here are some examples of matrices (with real number entries):

\displaystyle \left(\begin{array}{cc}1&-0.25\\ 100&0\\2&-5\end{array}\right)        (3\times 2 matrix)

\displaystyle \left(\begin{array}{cc}1&0\\ 0&\frac{3}{2}\end{array}\right)        (2\times 2 square matrix)

\displaystyle \left(\begin{array}{cccc}1&27&-\frac{4}{5}&10\\ \end{array}\right)       (1\times 4 row matrix)

We will often adopt the notation that the entry in the first row and first column of a matrix A will be labeled by A_{1,1}, the entry in the second row and first column of the same matrix will be labeled A_{2,1}, and so on. Since we often denote vectors by v, we will denote its first component (the entry in the first row) by v_{1}, the second component by v_{2}, and so on.

We can perform operations on matrices. The set of m\times n matrices, for fixed m and n, forms a vector space, which means we can “scale” them (multiply them by a “scalar”), and we can also add or subtract them from each other. This is done “componentwise”, i.e.

\displaystyle c\left(\begin{array}{cc}A_{1,1}&A_{1,2}\\ A_{2,1}&A_{2,2}\end{array}\right)=\left(\begin{array}{cc}cA_{1,1}&cA_{1,2}\\ cA_{2,1}&cA_{2,2}\end{array}\right)

\displaystyle \left(\begin{array}{cc}A_{1,1}&A_{1,2}\\ A_{2,1}&A_{2,2}\end{array}\right)+\left(\begin{array}{cc}B_{1,1}&B_{1,2}\\ B_{2,1}&B_{2,2}\end{array}\right)=\left(\begin{array}{cc}A_{1,1}+B_{1,1}&A_{1,2}+B_{1,2}\\ A_{2,1}+B_{2,1}&A_{2,2}+B_{2,2}\end{array}\right)

\displaystyle \left(\begin{array}{cc}A_{1,1}&A_{1,2}\\ A_{2,1}&A_{2,2}\end{array}\right)-\left(\begin{array}{cc}B_{1,1}&B_{1,2}\\ B_{2,1}&B_{2,2}\end{array}\right)=\left(\begin{array}{cc}A_{1,1}-B_{1,1}&A_{1,2}-B_{1,2}\\ A_{2,1}-B_{2,1}&A_{2,2}-B_{2,2}\end{array}\right)

Multiplication of matrices is more complicated. A j\times k matrix can be multiplied by a k\times l matrix to form a j\times l matrix. Note that the number of columns of the first matrix must be equal to the number of rows of the second matrix. Note also that multiplication of matrices is not commutative; a product AB of two matrices A and B may not be equal to the product BA of the same matrices, contrary to what we find in the multiplication of ordinary numbers.

The procedure for obtaining the entries of this product matrix is as follows: Let’s denote the product of the j\times k matrix A and the k\times l matrix B by AB (this is a j\times l matrix, as we have mentioned above) and let AB_{m,n} be its entry in the m-th row and n-th column. Then

\displaystyle AB_{m,n}=\sum_{i=1}^{k}A_{m,i}B_{i,n}

For example, we may have

\displaystyle \left(\begin{array}{cc}1&-3\\ 2&0\\-2&6\end{array}\right) \left(\begin{array}{cccc}5&-2&0&1\\ 0&1&-1&4\end{array}\right)=\left(\begin{array}{cccc}(1)(5)+(-3)(0)&(1)(-2)+(-3)(1)&(1)(0)+(-3)(-1)&(1)(1)+(-3)(4)\\ (2)(5)+(0)(0)&(2)(-2)+(0)(1)&(2)(0)+(0)(-1)&(2)(1)+(0)(4)\\(-2)(5)+(6)(0)&(-2)(-2)+(6)(1)&(-2)(0)+(6)(-1)&(-2)(1)+(6)(4)\end{array}\right)

\displaystyle \left(\begin{array}{cc}1&-3\\ 2&0\\-2&6\end{array}\right) \left(\begin{array}{cccc}5&-2&0&1\\ 0&1&-1&4\end{array}\right)=\left(\begin{array}{cccc}5&-5&3&-11\\ 10&-4&0&2\\-10&10&-6&22\end{array}\right)

We highlight the following step to obtain the entry in the first row and first column:

\displaystyle \left(\begin{array}{cc}\mathbf{1}&\mathbf{-3}\\ 2&0\\-2&6\end{array}\right) \left(\begin{array}{cccc}\mathbf{5}&-2&0&1\\ \mathbf{0}&1&-1&4\end{array}\right)=\left(\begin{array}{cccc}\mathbf{(1)(5)+(-3)(0)}&(1)(-2)+(-3)(1)&(1)(0)+(-3)(-1)&(1)(1)+(-3)(4)\\ (2)(5)+(0)(0)&(2)(-2)+(0)(1)&(2)(0)+(0)(-1)&(2)(1)+(0)(4)\\(-2)(5)+(6)(0)&(-2)(-2)+(6)(1)&(-2)(0)+(6)(-1)&(-2)(1)+(6)(4)\end{array}\right)
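
The same product can be checked in a few lines of Python (a sketch using NumPy, our own choice of tool); the @ operator implements exactly the sum-over-i rule given above:

```python
import numpy as np

A = np.array([[ 1, -3],
              [ 2,  0],
              [-2,  6]])            # a 3x2 matrix

B = np.array([[5, -2,  0, 1],
              [0,  1, -1, 4]])      # a 2x4 matrix

print(A @ B)                        # the 3x4 product computed entry by entry
# [[  5  -5   3 -11]
#  [ 10  -4   0   2]
#  [-10  10  -6  22]]
```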

Now that we know how to multiply matrices, we now go back to vectors, which can always be written as column matrices. For the sake of simplicity we continue to restrict ourselves to finite-dimensional vector spaces. We have seen that writing vectors as column matrices provides us with several conveniences. Other kinds of matrices are also useful in studying vector spaces.

For instance, we noted in Vector Spaces, Modules, and Linear Algebra that an important kind of function between vector spaces (of the same dimension) are the linear transformations, which are functions f(v) such that f(av)=af(v) and f(u+v)=f(u)+f(v). We note that if A is an n\times n square matrix, and v is an n\times 1 column matrix, then the product Av is another n\times 1 column matrix. It is a theorem that every linear transformation between n-dimensional vector spaces can be written as an n\times n square matrix once bases have been chosen.

We also have functions from a vector space to the set of its scalars, which are sometimes referred to as functionals. Linear functionals, i.e. functionals f(v) such that f(av)=af(v) and f(u+v)=f(u)+f(v), are represented by row matrices; applying a linear functional to a vector corresponds to multiplying the column matrix by a row matrix on the left (the number of their entries must be the same, as per the rules of matrix multiplication). For instance, we may have

\displaystyle \left(\begin{array}{cccc}u_{1}&u_{2}&u_{3}&u_{4} \end{array}\right)\left(\begin{array}{c}v_{1}\\v_{2}\\v_{3}\\v_{4}\end{array}\right) = u_{1}v_{1}+u_{2}v_{2}+u_{3}v_{3}+u_{4}v_{4}

Note that the right side is just a real number (or complex number, or perhaps most generally, an element of the field of scalars of our vector space).
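
In code, this is just the product of a 1\times n row matrix and an n\times 1 column matrix; the sketch below (in Python with NumPy, with made-up numerical entries of our own) shows that the result is an ordinary number:

```python
import numpy as np

u = np.array([[1.0, 2.0, 3.0, 4.0]])   # a row matrix representing a linear functional
v = np.array([[5.0],
              [6.0],
              [7.0],
              [8.0]])                   # a column matrix representing a vector

print(u @ v)           # [[70.]]: a 1x1 matrix
print((u @ v).item())  # 70.0, an ordinary number
```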

Matrices are rather ubiquitous in mathematics (and also in physics). In fact, some might teach the subject of linear algebra with the focus on matrices first. Here, however, we have taken the view of introducing first the abstract idea of vector spaces, and with matrices being viewed as a method of making these abstract ideas of vectors, linear transformations, and linear functionals more “concrete”. At the very heart of linear algebra still remains the idea of a set whose elements can be added and scaled, and functions between these sets that “respect” the addition and scaling. But when we want to actually compute things, matrices will often come in handy.

References:

 Matrix on Wikipedia

Linear Algebra Done Right by Sheldon Axler

Algebra by Michael Artin

Abstract Algebra by David S. Dummit and Richard M. Foote

More on Chain Complexes

In Homology and Cohomology we used the concept of chain complexes to investigate topological spaces. In Exact Sequences we saw examples of chain complexes generalized to abelian groups other than that made out of topological spaces. In this post we study chain complexes in the context of linear algebra (see Vector Spaces, Modules, and Linear Algebra).

We start with some definitions regarding modules. In More on Vector Spaces and Modules we gave the definition of a basis of a vector space. It is known that any vector space always has a basis. However, the same is not true for modules. Only a certain special kind of module, called a free module, has the property that one can always find a basis for it.

Alternatively, a free module over a ring R may be thought of as being a module that is isomorphic to a direct sum of several copies of the ring R.

An example of a module that is not free is the module \mathbb{Z}/2\mathbb{Z} over the ring \mathbb{Z}. It is a module over \mathbb{Z} since it is closed under addition and under multiplication by any element of \mathbb{Z}; however, no basis can be found that would allow its elements to be written as unique linear combinations of basis elements (for instance, its nonzero element x satisfies 2x=0=0x, so coefficients are never unique), nor is it a direct sum of copies of \mathbb{Z}.

Although not all modules are free, it is actually a theorem that any module is a quotient of a free module. Let A be a module over a ring R. The theorem says that this module is the quotient of some free module, which we denote by F_{0}, by some other module which we denote by K_{1}. In other words,

A=F_{0}/K_{1}

We can write this as the following chain complex, which also happens to be an exact sequence (see Exact Sequences):

0\rightarrow K_{1}\xrightarrow{i_{1}} F_{0}\xrightarrow{\epsilon} A\rightarrow 0

We know that the module F_{0} is free. However, we do not know if the same holds true for K_{1}. Regardless, the theorem says that any module is a quotient of a free module. Therefore we can write

0\rightarrow K_{2}\xrightarrow{i_{2}} F_{1}\xrightarrow{\epsilon_{1}} K_{1}\rightarrow 0

We can therefore put these chain complexes together to get

0\rightarrow K_{2}\xrightarrow{i_{2}} F_{1}\xrightarrow{\epsilon_{1}} K_{1}\xrightarrow{i_{1}} F_{0}\xrightarrow{\epsilon} A\rightarrow 0

However, this sequence of modules and morphisms is not a chain complex since the image of \epsilon_{1} is not contained in the kernel of i_{1}. But if we compose these two maps together, we obtain

0\rightarrow K_{2}\xrightarrow{i_{2}} F_{1}\xrightarrow{d_{1} }F_{0}\xrightarrow{\epsilon} A\rightarrow 0

where d_{1}=i_{1}\circ \epsilon_{1}. This is a chain complex as one may check. We can keep repeating the process indefinitely to obtain

...\xrightarrow{d_{3}} F_{2}\xrightarrow{d_{2} } F_{1}\xrightarrow{d_{1} } F_{0}\xrightarrow{\epsilon} A\rightarrow 0

This chain complex is called a free resolution of A. A free resolution is another example of an exact sequence.

We now introduce two more special kinds of modules.

A projective module is a module P such that for any surjective morphism p: A\rightarrow A'' between two modules A and A'' and any morphism h: P\rightarrow A'', there exists a morphism g: P\rightarrow A such that p\circ g=h.

It is a theorem that a module is projective if and only if it is a direct summand of a free module. This also means that a free module is automatically also projective.

An injective module is a module E such that for any injective morphism i: A\rightarrow B between two modules A and B and any morphism f: A\rightarrow E, there exists a morphism g: B\rightarrow E such that g\circ i=f.

Similar to our discussion regarding free resolutions earlier, we can also have projective resolutions and injective resolutions. A projective resolution is a chain complex

...\xrightarrow{d_{3}} P_{2}\xrightarrow{d_{2} } P_{1}\xrightarrow{d_{1} } P_{0}\xrightarrow{\epsilon} A\rightarrow 0

such that the P_{n} are projective modules.

Meanwhile, an injective resolution is a chain complex

0\rightarrow A\xrightarrow{\eta} E^{0}\xrightarrow{d^{0}} E^{1}\xrightarrow{d^{1}} E^{2}\xrightarrow{d^{2}} ...

such that the E^{n} are injective modules.

Since projective and injective resolutions are chain complexes, we can use the methods of homology and cohomology to study them (Homology and Cohomology) even though they may not be made up of topological spaces. However, the usual procedure is to consider these chain complexes as forming an “abelian category” and then applying certain functors (see Category Theory) such as what are called the “Tensor” and “Hom” functors before applying the methods of homology and cohomology, resulting in what are known as “derived functors“. This is all part of the subject known as homological algebra.

References:

Free Module on Wikipedia

Projective Module on Wikipedia

Injective Module on Wikipedia

Resolution on Wikipedia

An Introduction to Homological Algebra by Joseph J. Rotman

Abstract Algebra by David S. Dummit and Richard M. Foote

More on Vector Spaces and Modules

In this short post, we show a method of constructing new vector spaces (see Vector Spaces, Modules, and Linear Algebra) from old ones. But first we introduce some more definitions important to the study of vector spaces. We will refer to the elements of vector spaces as the familiar vectors. A basis of a vector space is a set of vectors v_{1}, v_{2},...,v_{n} such that any vector v in the vector space can be written uniquely as a linear combination

v=c_{1}v_{1}+c_{2}v_{2}+...+c_{n}v_{n}.

where c_{1}, c_{2},...,c_{n} are elements of the set of “scalars” of the vector space. The number of elements n of the basis is called the dimension of the vector space.

The tensor product V\otimes W of two vector spaces V and W is the quotient of the set of formal linear combinations of ordered pairs of vectors (v, w), where v\in V and w\in W, under the following equivalence relations (see Modular Arithmetic and Quotient Sets):

(v_{1}, w)+(v_{2}, w)\sim (v_{1}+v_{2}, w)

(v, w_{1})+(v, w_{2})\sim (v, w_{1}+w_{2})

c(v, w)\sim (cv, w)

c(v, w)\sim (v, cw)

where c is a scalar.

The last two equivalence relations together also imply that:

(cv, w)\sim (v, cw)

Denoting the set of scalars by F, so that c\in F, we also sometimes write V\otimes_{F} W.
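
For finite-dimensional vector spaces over the real numbers, the tensor product of two coordinate vectors can be modeled concretely by the Kronecker product of their coordinate columns. The following sketch (in Python with NumPy; this concrete model and the sample vectors are our own choices, not something introduced in the post) checks the scaling and addition relations above in that model:

```python
import numpy as np

v = np.array([1.0, 2.0])   # coordinates of v in V
w = np.array([3.0, 5.0])   # coordinates of w in W
c = 7.0                    # a scalar

# In coordinates, the tensor product is the Kronecker product,
# a vector of length dim(V) * dim(W).
vw = np.kron(v, w)

# The scaling relations: c(v (x) w) = (cv) (x) w = v (x) (cw).
print(np.allclose(c * vw, np.kron(c * v, w)))   # True
print(np.allclose(c * vw, np.kron(v, c * w)))   # True

# Additivity in the first slot: (v1 + v2) (x) w = v1 (x) w + v2 (x) w.
v1, v2 = np.array([1.0, 0.0]), np.array([0.0, 2.0])
print(np.allclose(np.kron(v1 + v2, w), np.kron(v1, w) + np.kron(v2, w)))  # True
```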

Tensor products also exist for modules. In physics, vector spaces provide us with the language we use to study quantum mechanics, and tensor products are important for expressing the phenomenon of quantum entanglement.

This post is quite a bit shorter than most of my previous ones and could perhaps be seen as just an addendum to Vector Spaces, Modules, and Linear Algebra. More on tensor products shall perhaps be discussed in future posts, including examples and applications.

References:

Vector Space on Wikipedia

Tensor Product in Wikipedia

Quantum Entanglement on Wikipedia

Abstract Algebra by David S. Dummit and Richard M. Foote

Vector Spaces, Modules, and Linear Algebra

Let’s take a little trip back in time to grade school mathematics. What is five apples plus three apples? Easy, the answer is eight apples. What about two oranges plus two oranges? The answer is four oranges. What about three apples plus two oranges? Wait, that question is literally “apples and oranges”! But we can still answer that question of course. Three apples plus two oranges is three apples and two oranges. Does that sound too easy? We ramp it up just a little bit: What is three apples and two oranges, plus one apple and five oranges? The answer is four apples and seven oranges. Even if we’re dealing with two objects we’re not supposed to mix together, we can still do mathematics with them, as long as we treat each object separately.

Such an idea can be treated with the concept of vector spaces. Another application of this concept is to quantities with magnitude and direction in physics, where the concept actually originated. Yet another application is to quantum mechanics, where things can be simultaneously on and off, or simultaneously pointing up and down, or simultaneously be in a whole bunch of different states we would never think of being capable of existing together simultaneously. But what, really, is a vector space?

We can think of vector spaces as sets of things that can be added to or subtracted from each other, or scaled up or scaled down, or combinations of all these. To make all these a little easier, we stay in the realm of what are called “finite dimensional” vector spaces, and we develop for this purpose a little notation. We go back to the example we set out at the start of this post, that of the apples and oranges. Say for example that we have three apples and two oranges. We will write this as

\displaystyle \left(\begin{array}{c}3\\2\end{array}\right)

Now, say we want to add to this quantity, one more apple and five oranges. We write

\displaystyle \left(\begin{array}{c}3\\2\end{array}\right)+\left(\begin{array}{c}1\\5\end{array}\right)

Of course this is easy to solve, and we have already done the calculation earlier. We have

\displaystyle \left(\begin{array}{c}3\\2\end{array}\right)+\left(\begin{array}{c}1\\5\end{array}\right)=\left(\begin{array}{c}4\\7\end{array}\right)

But we also said we can “scale” such a quantity. So suppose again that we have three apples and two oranges. If we were to double this quantity, what would we have? We would have six apples and four oranges. We write this operation as

\displaystyle 2\left(\begin{array}{c}3\\2\end{array}\right)=\left(\begin{array}{c}6\\4\end{array}\right)

We can also “scale down” such a quantity. Suppose we want to cut in half our amount of three apples and two oranges. We would have one and a half apples (or three halves of an apple) and one orange:

\displaystyle \frac{1}{2}\left(\begin{array}{c}3\\2\end{array}\right)=\left(\begin{array}{c}\frac{3}{2}\\1\end{array}\right)

We can also apply what we know of negative numbers – we can for example think of a negative amount of something as being like a “debt”. With this we can now add subtraction to the operations that we can do to vector spaces. For example, let us subtract from our quantity of three apples and two oranges the quantity of one apple and five oranges. We will be left with two apples and a “debt” of three oranges. We write

\displaystyle \left(\begin{array}{c}3\\2\end{array}\right)-\left(\begin{array}{c}1\\5\end{array}\right)=\left(\begin{array}{c}2\\-3\end{array}\right)

Finally, we can combine all these operations:

\displaystyle 2\left(\left(\begin{array}{c}3\\2\end{array}\right)+\left(\begin{array}{c}1\\5\end{array}\right)\right)=2\left(\begin{array}{c}4\\7\end{array}\right)=\left(\begin{array}{c}8\\14\end{array}\right)

For vector spaces, the “scaling” operation possesses a property analogous to the distributive property of multiplication over addition. So if we wanted to, we could also have performed the previous operation in another way, which gives the same answer:

\displaystyle 2\left(\left(\begin{array}{c}3\\2\end{array}\right)+\left(\begin{array}{c}1\\5\end{array}\right)\right)=2\left(\begin{array}{c}3\\2\end{array}\right)+2\left(\begin{array}{c}1\\5\end{array}\right)=\left(\begin{array}{c}6\\4\end{array}\right)+\left(\begin{array}{c}2\\10\end{array}\right)=\left(\begin{array}{c}8\\14\end{array}\right)
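
These operations translate directly into code; the short sketch below (in Python with NumPy, which is simply one convenient way of writing such column vectors) reproduces the calculations above, including the distributive property:

```python
import numpy as np

apples_oranges = np.array([3, 2])   # three apples and two oranges
more_fruit = np.array([1, 5])       # one apple and five oranges

print(apples_oranges + more_fruit)          # [4 7]
print(2 * apples_oranges)                   # [6 4]
print(apples_oranges - more_fruit)          # [ 2 -3]
print(2 * (apples_oranges + more_fruit))    # [ 8 14]
print(2 * apples_oranges + 2 * more_fruit)  # [ 8 14], the distributive property
```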

We can also apply this notation to problems in physics. Suppose a rigid object is acted on by a force of one Newton to the north and another force of one Newton to the east. Then, adopting a convention of Cartesian coordinates with the positive x-axis oriented towards the east and the positive y-axis towards the north, we can calculate the resultant force acting on the object as follows

\displaystyle \left(\begin{array}{c}1\\0\end{array}\right)+\left(\begin{array}{c}0\\1\end{array}\right)=\left(\begin{array}{c}1\\1\end{array}\right)

This is actually a force with a magnitude of around 1.414 Newtons, with a direction pointing towards the northeast, but a discussion of such calculations will perhaps be best left for future posts. For now, we want to focus on the two important properties of vector spaces, its being closed under the operations of addition and multiplication by a scaling factor, or “scalar”.

In Rings, Fields, and Ideals, we discussed what it means for a set to be closed under certain operations. A vector space is therefore a set that is closed under addition among its own elements and under multiplication by a “scalar”, which is an element of a field, a concept we discussed in the same post linked to above. A set that is closed under addition among its own elements and multiplication by a scalar which is a ring instead of a field is called a module. Another concept we discussed in Rings, Fields, and Ideals and also in More on Ideals is the concept of an ideal. An ideal is a module which is also a subset of its ring of scalars.

Whenever we talk about sets, it is always important to also talk about the functions between such sets. A vector space (or a module) is just a set with special properties, namely closure under addition and scalar multiplication, therefore we want to talk about functions that are related to these properties. A linear transformation is a function between two vector spaces or modules that “respects” addition and scalar multiplication. Let u and v be any two elements of a vector space or a module, and let a be any element of their field or ring of scalars. By the properties defining vector spaces and modules, u+v and av are also elements of the same vector space or module. A function f between two vector spaces or modules is called a linear transformation if

\displaystyle f(u+v)=f(u)+f(v)

\displaystyle f(av)=af(v)

Linear transformations are related to the equation of a line in Cartesian geometry, and they give the study of vector spaces and modules its name, linear algebra. For certain types of vector spaces or modules, linear transformations can be represented by nifty little gadgets called matrices, which are rectangular arrays of elements of the field or ring of scalars. The vectors (elements of vector spaces) which we have featured in this post can be thought of as matrices with only a single column, or sometimes called column matrices. We will not discuss matrices in this post, although perhaps in the future we will; they can be found, along with many other deeper aspects of linear algebra, in most textbooks on linear algebra or abstract algebra such as Linear Algebra Done Right by Sheldon Axler or Algebra by Michael Artin.

References:

Vector Space in Wikipedia

Module on Wikipedia

Linear Algebra Done Right by Sheldon Axler

Algebra by Michael Artin