Metric, Norm, and Inner Product

In Vector Spaces, Modules, and Linear Algebra, we defined vector spaces as sets closed under addition and scalar multiplication (in this case the scalars are the elements of a field; if they are elements of a ring which is not a field, we have not a vector space but a module). We have seen since then that the study of vector spaces, linear algebra, is very useful, interesting, and ubiquitous in mathematics.

In this post we discuss vector spaces with some additional structure, which will give them a topology (Basics of Topology and Continuous Functions), giving rise to topological vector spaces. This also leads to the branch of mathematics called functional analysis, which has applications to subjects such as quantum mechanics, aside from being an interesting subject in itself. Two of the important objects of study in functional analysis that we will introduce by the end of this post are Banach spaces and Hilbert spaces.

I. Metric

We start with the concept of a metric. We have to get two things out of the way. First, this is not the same as the metric tensor in differential geometry, although it also gives us a notion of a “distance”. Second, the concept of metric is not limited to vector spaces only, unlike the other two concepts we will discuss in this post. It is actually something that we can put on a set to define a topology, called the metric topology.

As we discussed in Basics of Topology and Continuous Functions, we may think of a topology as an “arrangement”. The notion of “distance” provided by the metric gives us an intuitive such arrangement. We will make this concrete shortly, but first we give the technical definition of the metric. We quote from the book Topology by James R. Munkres:

A metric on a set X is a function

\displaystyle d: X\times X\rightarrow \mathbb{R}

having the following properties:

1) d(x, y)\geq 0 for all x,y \in X; equality holds if and only if x=y.

2) d(x,y)=d(y,x) for all x,y \in X.

3) (Triangle inequality) d(x,y)+d(y,z)\geq d(x,z), for all x,y,z \in X.

We quote from the same book another important definition:

Given a metric d on X, the number d(x, y) is often called the distance between x and y in the metric d. Given \epsilon >0, consider the set

\displaystyle B_{d}(x,\epsilon)=\{y|d(x,y)<\epsilon\}

of all points y whose distance from x is less than \epsilon. It is called the \epsilon-ball centered at x. Sometimes we omit the metric d from the notation and write this ball simply as B(x,\epsilon) when no confusion will arise.

Finally, once more from the same book, we have the definition of the metric topology:

If d is a metric on the set X, then the collection of all \epsilon-balls B_{d}(x,\epsilon), for x\in X and \epsilon>0, is a basis for a topology on X, called the metric topology induced by d.

We recall that the basis of a topology is a collection of open sets such that every other open set can be described as a union of the elements of this collection. A set with a specific metric that makes it into a topological space with the metric topology is called a metric space.

An example of a metric on the set \mathbb{R}^{n} is given by the ordinary “distance formula”:

\displaystyle d(x,y)=\sqrt{\sum_{i=1}^{n}(x_{i}-y_{i})^{2}}

Note: We have followed the notation of the book of Munkres, which may be different from the usual notation. Here x and y are two points of \mathbb{R}^{n}, and x_{i} and y_{i} are their respective coordinates.

The above metric is not the only one possible however. There are many others. For instance, we may simply put

\displaystyle d(x,y)=0 if \displaystyle x=y

\displaystyle d(x,y)=1 if \displaystyle x\neq y.

This is called the discrete metric, and one may check that it satisfies the definition of a metric. One may think of it as something that simply specifies the distance from a point to itself as “near”, and the distance to any other point that is not itself as “far”. There is also the taxicab metric, given by the following formula:

\displaystyle d(x,y)=\sum_{i=1}^{n}|x_{i}-y_{i}|

One way to think of the taxicab metric, which reflects the origins of the name, is that it is the “distance” important to taxi drivers (needed to calculate the fare) in a certain city with perpendicular roads. The ordinary distance formula is not very helpful since one needs to stay on the roads – therefore, for example, if one needs to go from point x to point y which are on opposite corners of a square, the distance traversed is not equal to the length of the diagonal, but is instead equal to the length of two sides. Again, one may check that the taxicab metric satisfies the definition of a metric.
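The three metrics above are easy to experiment with. The following is a minimal Python sketch, not a proof: it checks the three properties of a metric only on a small hand-picked sample of points in the plane, and the function names (euclidean, discrete, taxicab, satisfies_metric_axioms, in_ball) are our own choices for illustration.

```python
import itertools
import math


def euclidean(x, y):
    """Ordinary distance formula on R^n."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def discrete(x, y):
    """Discrete metric: 0 for equal points, 1 otherwise."""
    return 0.0 if x == y else 1.0


def taxicab(x, y):
    """Taxicab metric: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def satisfies_metric_axioms(d, points, tol=1e-12):
    """Check the three metric properties on a finite sample of points."""
    for x, y in itertools.product(points, repeat=2):
        if d(x, y) < 0 or (d(x, y) == 0) != (x == y):
            return False  # property 1 fails
        if abs(d(x, y) - d(y, x)) > tol:
            return False  # property 2 (symmetry) fails
    for x, y, z in itertools.product(points, repeat=3):
        if d(x, y) + d(y, z) < d(x, z) - tol:
            return False  # property 3 (triangle inequality) fails
    return True


def in_ball(d, center, eps, y):
    """True when y lies in the epsilon-ball B_d(center, eps)."""
    return d(center, y) < eps


# Hand-picked sample points in the plane (an assumption of this sketch).
sample = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (-2.0, 3.5)]
for name, d in [("euclidean", euclidean), ("discrete", discrete), ("taxicab", taxicab)]:
    print(name, satisfies_metric_axioms(d, sample))

# Opposite corners of the unit square: the diagonal versus two sides.
print(euclidean((0.0, 0.0), (1.0, 1.0)))  # about 1.414
print(taxicab((0.0, 0.0), (1.0, 1.0)))    # 2.0
```

The same point can lie inside a ball for one metric and outside it for another: (1, 1) is in the ball of radius 1.5 around the origin for the ordinary distance, but not for the taxicab metric.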

II. Norm

Now we move on to vector spaces (we will consider in this post only vector spaces over the real or complex numbers), and some mathematical concepts that we can associate with them, as suggested in the beginning of this post. Being a set closed under addition and scalar multiplication is already a useful concept, as we have seen, but we can still add on some ideas that would make them even more interesting. The notion of metric that we have discussed earlier will show up repeatedly over this discussion.

We first discuss the notion of a norm, which gives us a notion of a “magnitude” of a vector. We quote from the book Introductory Functional Analysis with Applications by Erwin Kreyszig for the definition:

A norm on a (real or complex) vector space X is a real valued function on X whose value at an x\in X is denoted by

\displaystyle \|x\|    (read “norm of x“)

and which has the properties

(N1) \|x\|\geq 0

(N2) \|x\|=0\iff x=0

(N3) \|\alpha x\|=|\alpha|\|x\|

(N4) \|x+y\|\leq\|x\|+\|y\|    (triangle inequality)

here x and y are arbitrary vectors in X and \alpha is any scalar.

A vector space with a specified norm is called a normed space.

A norm automatically provides a vector space with a metric; in other words, a normed space is always a metric space. The metric is given in terms of the norm by the following equation:

\displaystyle d(x,y)=\|x-y\|

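However, not all metrics come from a norm. An example is the discrete metric: a metric induced by a norm satisfies d(\alpha x,\alpha y)=|\alpha|d(x,y) by (N3), whereas the discrete distance between two distinct points remains 1 under any nonzero scaling.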
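Property (N3) gives a quick numerical test for whether a metric can come from a norm, since it forces d(\alpha x,\alpha y)=|\alpha|d(x,y). The sketch below compares the metric induced by the taxicab norm with the discrete metric on one pair of vectors. The vectors, the scalar, and the helper names are assumptions chosen for illustration.

```python
def taxicab_norm(x):
    """Sum of the absolute values of the components."""
    return sum(abs(a) for a in x)


def induced_metric(norm):
    """Build the metric d(x, y) = norm(x - y) from a norm."""
    def d(x, y):
        return norm(tuple(a - b for a, b in zip(x, y)))
    return d


def discrete(x, y):
    """Discrete metric: 0 for equal points, 1 otherwise."""
    return 0 if x == y else 1


def scale(alpha, x):
    """Scalar multiplication of a vector given as a tuple."""
    return tuple(alpha * a for a in x)


x, y, alpha = (1.0, 2.0), (4.0, -2.0), 3.0
d = induced_metric(taxicab_norm)

# Norm-induced metric: scaling both vectors scales the distance by |alpha|.
print(d(scale(alpha, x), scale(alpha, y)), abs(alpha) * d(x, y))  # 21.0 21.0

# Discrete metric: the distance stays 1, so the two sides disagree.
print(discrete(scale(alpha, x), scale(alpha, y)), abs(alpha) * discrete(x, y))  # 1 3.0
```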

III. Inner Product

Next we discuss the inner product. The inner product gives us a notion of “orthogonality”, a concept which we already saw in action in Some Basics of Fourier Analysis. Intuitively, when two vectors are “orthogonal”, they are “perpendicular” in some sense. However, our geometric intuition may not be as useful when we are discussing, say, the infinite-dimensional vector space whose elements are functions. For this we need a more abstract notion of orthogonality, which is embodied by the inner product. Again, for the technical definition we quote from the book of Kreyszig:

With every pair of vectors x and y there is associated a scalar which is written

\displaystyle \langle x,y\rangle

and is called the inner product of x and y, such that for all vectors x, y, z and scalars \alpha we have

(IP1) \langle x+y,z\rangle=\langle x,z\rangle+\langle y,z\rangle

(IP2) \langle \alpha x,y\rangle=\alpha\langle x,y\rangle

(IP3) \langle x,y\rangle=\overline{\langle y,x\rangle}

(IP4) \langle x,x\rangle\geq 0,    \langle x,x\rangle=0 \iff x=0

A vector space with a specified inner product is called an inner product space.

One of the most basic examples, in the case of a finite-dimensional vector space, is given by the following procedure. Let x and y be elements (vectors) of some n-dimensional real vector space X, with respective components x_{1}, x_{2},...,x_{n} and y_{1},y_{2},...,y_{n} in some basis. Then we can set

\displaystyle \langle x,y\rangle=x_{1}y_{1}+x_{2}y_{2}+...+x_{n}y_{n}

This is the familiar “dot product” taught in introductory university-level mathematics courses.

Let us now see how the inner product gives us a notion of “orthogonality”. To make things even easier to visualize, let us set n=2, so that we are dealing with vectors (which we can now think of as quantities with magnitude and direction) in the plane. A unit vector x pointing “east” has components x_{1}=1, x_{2}=0, while a unit vector y pointing “north” has components y_{1}=0, y_{2}=1. These two vectors are perpendicular, or orthogonal. Computing the inner product we discussed earlier, we have

\displaystyle \langle x,y\rangle=(1)(0)+(0)(1)=0.

We say, therefore, that two vectors are orthogonal when their inner product is zero. As we have mentioned earlier, we can extend this to cases where our geometric intuition may no longer be as useful to us. For example, consider the infinite dimensional vector space of (real-valued) functions which are “square integrable” over some interval (if we square them and integrate over this interval, we have a finite answer), say [0,1]. We set our inner product to be

\displaystyle \langle f,g\rangle=\int_{0}^{1}f(x)g(x)dx.

As an example, let f(x)=\text{cos}(2\pi x) and g(x)=\text{sin}(2\pi x). We say that these functions are “orthogonal”, but it is hard to imagine in what way. But if we take the inner product, we will see that

\displaystyle \int_{0}^{1}\text{cos}(2\pi x)\text{sin}(2\pi x)dx=0.

Hence we see that \text{cos}(2\pi x) and \text{sin}(2\pi x) are orthogonal. Similarly, we have

\displaystyle \int_{0}^{1}\text{cos}(2\pi x)\text{cos}(4\pi x)dx=0

and \text{cos}(2\pi x) and \text{cos}(4\pi x) are also orthogonal. We have discussed this in more detail in Some Basics of Fourier Analysis. We have also seen in that post that orthogonality plays a big role in the subject of Fourier analysis.
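These integrals can also be checked numerically. The following sketch approximates the inner product on [0,1] with a midpoint rule. The function name inner_product and the number of subintervals are our own choices, and the results are approximations rather than exact values.

```python
import math


def inner_product(f, g, n=10_000):
    """Midpoint-rule approximation of the integral of f(x) g(x) over [0, 1]."""
    h = 1.0 / n
    return h * sum(f((k + 0.5) * h) * g((k + 0.5) * h) for k in range(n))


def cos1(x):
    return math.cos(2 * math.pi * x)


def sin1(x):
    return math.sin(2 * math.pi * x)


def cos2(x):
    return math.cos(4 * math.pi * x)


print(inner_product(cos1, sin1))  # very close to 0
print(inner_product(cos1, cos2))  # very close to 0
print(inner_product(cos1, cos1))  # close to 0.5, so cos(2 pi x) is not orthogonal to itself
```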

Just as a norm always induces a metric, an inner product also induces a norm, and by extension also a metric. In other words, an inner product space is also a normed space, and also a metric space. The norm is given in terms of the inner product by the following expression:

\displaystyle \|x\|=\sqrt{\langle x,x\rangle}

Just as with the norm and the metric, although an inner product always induces a norm, not every norm is induced by an inner product.
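One standard way to see this, which we have not discussed in this post, is the parallelogram law: every norm induced by an inner product satisfies \|x+y\|^{2}+\|x-y\|^{2}=2\|x\|^{2}+2\|y\|^{2}. The sketch below evaluates the difference of the two sides for the ordinary Euclidean norm and for the norm behind the taxicab metric. The helper names are hypothetical, and a single pair of vectors is of course only a spot check.

```python
import math


def euclidean_norm(x):
    """Norm induced by the dot product."""
    return math.sqrt(sum(a * a for a in x))


def taxicab_norm(x):
    """Sum of the absolute values of the components."""
    return sum(abs(a) for a in x)


def parallelogram_gap(norm, x, y):
    """Left side minus right side of the parallelogram law."""
    s = tuple(a + b for a, b in zip(x, y))
    t = tuple(a - b for a, b in zip(x, y))
    return norm(s) ** 2 + norm(t) ** 2 - 2 * norm(x) ** 2 - 2 * norm(y) ** 2


x, y = (1.0, 0.0), (0.0, 1.0)
print(parallelogram_gap(euclidean_norm, x, y))  # approximately 0
print(parallelogram_gap(taxicab_norm, x, y))    # 4.0
```

Since the gap is nonzero for the taxicab norm, no inner product can induce it.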

IV. Banach Spaces and Hilbert Spaces

There is one more concept I want to discuss in this post. In Valuations and Completions, we discussed Cauchy sequences and completions. Those concepts carry over here, because they are actually part of the study of metric spaces (in fact, the valuations discussed in that post give rise to metrics on the fields that were discussed, showing how metrics and metric spaces also make an appearance in number theory). If every Cauchy sequence in a metric space X converges to an element in X, then we say that X is a complete metric space.

Since normed spaces and inner product spaces are also metric spaces, the notion of a complete metric space still makes sense, and we have special names for them. A normed space which is also a complete metric space is called a Banach space, while an inner product space which is also a complete metric space is called a Hilbert space. Finite-dimensional normed spaces (over the real or complex numbers) are always complete, and therefore we only really need the distinction when we are dealing with infinite-dimensional vector spaces.
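To get a feeling for what can go wrong without completeness, consider the rational numbers with the metric d(x,y)=|x-y|. The sketch below, an illustration of our own rather than an example from the references, generates rational approximations to \sqrt{2} by Newton's iteration using exact fractions. Consecutive terms get closer and closer together, as the terms of a Cauchy sequence must, yet no term squares to exactly 2, and the limit \sqrt{2} is not a rational number.

```python
from fractions import Fraction


def newton_sqrt2(steps):
    """Rational iterates x -> (x + 2/x) / 2, starting from 1."""
    x = Fraction(1)
    terms = [x]
    for _ in range(steps):
        x = (x + 2 / x) / 2
        terms.append(x)
    return terms


terms = newton_sqrt2(6)
for a, b in zip(terms, terms[1:]):
    # Distance between consecutive terms, and how far b*b is from 2.
    print(float(abs(b - a)), float(abs(b * b - 2)))
```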

Banach spaces and Hilbert spaces are important in quantum mechanics. We recall in Some Basics of Quantum Mechanics that the possible states of a system in quantum mechanics form a vector space. However, more is true – they actually form a Hilbert space, and the states that we can observe “classically” are orthogonal to each other. The Dirac “bra-ket” notation that we have discussed makes use of the inner product to express probabilities.

Meanwhile, Banach spaces often arise when studying operators, which correspond to observables such as position and momentum. Of course the states form Banach spaces too, since all Hilbert spaces are Banach spaces, but there is much motivation to study the Banach spaces formed by the operators as well instead of just that formed by the states. This is an important aspect of the more mathematically involved treatments of quantum mechanics.

References:

Topological Vector Space on Wikipedia

Functional Analysis on Wikipedia

Metric on Wikipedia

Norm on Wikipedia

Inner Product Space on Wikipedia

Complete Metric Space on Wikipedia

Banach Space on Wikipedia

Hilbert Space on Wikipedia

A Functional Analysis Primer on Bahcemizi Yetistermeliyiz

Topology by James R. Munkres

Introductory Functional Analysis with Applications by Erwin Kreyszig

Real Analysis by Halsey Royden

Basics of Topology and Continuous Functions

Informally, a topology is a kind of “arrangement” or “organization” that we put on a set. One can think of an analogy with an army, which is made up of soldiers organized into squads, which are in turn organized into platoons, and so forth. Topology accomplishes this by organizing subsets of a set into “open sets” and “closed sets”. We quote here the rigorous definition of a topology following the book Topology by James R. Munkres:

A topology on a set X is a collection \mathcal{T} of subsets of X having the following properties:

(1)  \varnothing and X are in \mathcal{T}.

(2) The union of the elements of any subcollection of \mathcal{T} is in \mathcal{T}.

(3) The intersection of the elements of any finite subcollection of \mathcal{T} is in \mathcal{T}.

A set X for which a topology \mathcal{T} has been specified is called a topological space.

From the same book, we have the following definition of an open set:

If X is a topological space with a topology \mathcal{T}, we say that a subset U of X is an open set of X if U belongs to the collection \mathcal{T}.

We also have the following definition of a closed set:

A subset A of a topological space X is said to be closed if the set  X-A is open.

We note that the notation X-A here refers to the complement of A in X, i.e., the set X-A is the set of all elements of the set X that are not also elements of the set A.

Our definition of open sets and closed sets has some results that seem rather weird at first glance. By definition, both the entire set X and the empty set \varnothing are open. But X-\varnothing is just X, which is open; therefore, by our definition of closed sets, \varnothing is closed. Similarly, since X-X is the empty set \varnothing, which is again open, we find that the entire set X is also closed. Therefore, the sets X and \varnothing are both open and closed!

This only seems paradoxical because we are used to thinking of the words open and closed as being opposites; such may be the case in real life, but for our purposes these words are merely terminology that we use in order to organize our set; therefore, it should not be troubling for us to find a set both closed and open (some refer to such a set as a “clopen” set). There also exist examples of sets in some topologies being neither closed nor open.

Now we show one example of putting a topology on a set. We consider the set with two elements, which we shall refer to as 0 and 1. This set has the following subsets:

\displaystyle \varnothing

\displaystyle \{0\}

\displaystyle \{1\}

\displaystyle \{0,1\}

We shall now put a topology on this set. By the definition of a topology, the subset \{0,1\}, which is the entire set, along with the empty set \varnothing have to be open. By the result we discovered earlier, they must also both be closed. We now have a choice of what to do with the remaining subsets, \{0\} and \{1\}. If we declare them to both be open, then all the subsets are open sets. We call a topology where all subsets are open the discrete topology. It so happens that if we do this, both \{0\} and \{1\} will also be closed by the definition of a closed set, since each is the complement of the other. So putting the discrete topology on this set with two elements makes all subsets both open and closed.

We can also decline to declare either of the two sets \{0\} and \{1\} open; this will make them neither open nor closed, and only the entire set and the empty set are declared open (they also happen to be closed). Such a topology where only the entire set and empty set are declared to be open (which they are forced to be, by definition) is called the trivial topology.

Finally, we can declare just one of \{0\} and \{1\} to be open. There are two different ways of doing this; we can declare either \{0\} to be open, in which case \{1\} will be closed, or we can declare \{1\} to be open, in which case \{0\} will be closed. A two-element set where one of the one-element subsets is declared to be open, rendering the other one-element subset closed, is called a Sierpinski space.
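Since the set with two elements has only four subsets, we can let a computer confirm that the topologies described above are the only ones. The following is a small brute-force sketch. The function names are our own, and the check of unions and intersections in pairs is sufficient only because the collections involved are finite.

```python
from itertools import combinations


def powerset(s):
    """All subsets of s, as frozensets."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def is_topology(X, collection):
    """Check the defining properties of a topology for a finite collection."""
    T = set(collection)
    if frozenset() not in T or frozenset(X) not in T:
        return False
    # For finite collections, closure under pairwise unions and
    # intersections implies closure under all unions and finite intersections.
    return all((U | V) in T and (U & V) in T for U in T for V in T)


def closed_sets(X, T):
    """Closed sets are the complements of the open sets."""
    return {frozenset(X) - U for U in T}


X = frozenset({0, 1})
subsets = powerset(X)
candidates = [frozenset(c) for r in range(len(subsets) + 1) for c in combinations(subsets, r)]
topologies = [T for T in candidates if is_topology(X, T)]

for T in topologies:
    print("open:", sorted(sorted(U) for U in T),
          "closed:", sorted(sorted(A) for A in closed_sets(X, T)))
```

The loop prints four collections: the trivial topology, the two Sierpinski topologies, and the discrete topology.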

We tackle one more example. Consider the set of all real numbers \mathbb{R}, also called the real line. We will put a topology on the real line, but first there is one more concept that we need to define. Let a and b be two real numbers, where b is greater than a (also written, of course, as a<b). The set of all real numbers which are greater than a but less than b is denoted by (a, b). Note that a and b themselves are not included in the set (a, b). We call sets such as these open intervals. If instead we consider the set of real numbers greater than or equal to a but less than or equal to b, then we write [a, b], and both a and b are now included in [a, b]. Such sets are called closed intervals.

We now go back to putting a topology on the real line. As may be suggested by the naming, we now declare all sets that are unions of open intervals, including of course the open intervals themselves, to be the open sets of our topology. This will make the closed intervals, including the sets consisting of only one real number, into closed sets. We will not explain why in this post, but it always goes back to the definitions of topology, open set, and closed set. This topology that we have defined on the real line is called its standard topology.

Now that we know the concept of an open interval, there is another related concept that we will introduce in this post. We stay in the context of real numbers and the real line. Let x be a real number, and let \epsilon be a positive real number. The open interval (x-\epsilon, x+\epsilon) is an example of what we call a neighborhood of x. It consists of all real numbers whose difference from x is less than \epsilon, or we may think of them as being less than a distance \epsilon away from x. This motivates the terminology of “neighborhood”, even though \epsilon can be as small or as big as we want.

The concept of a neighborhood plays a big role in a very common kind of topology called a metric topology. It plays a big role in the modern foundations of calculus and geometry.

Recall that a function is a mapping between sets, in the sense that it assigns to every element in a set called its domain an element of another set called its range. As per our definitions above, a topological space is just a set  for which a topology is specified, so we can talk about functions between topological spaces. The topologies on the sets involved will allow us to define an important kind of function between topological spaces, called a continuous function. Once more we refer to the book of Munkres:

Let X and Y be topological spaces. A function f: X \rightarrow Y is said to be continuous if for each open subset V of Y, the set f^{-1}(V) is an open subset of X.

Recall that f^{-1}(V) is the set of all points x of X for which f(x)\in V; it is empty if V does not intersect the image set f(X) of f.

Continuity of a function depends not only upon the function f itself, but also on the topologies specified for its domain and range. If we wish to emphasize this fact, we can say that f is continuous relative to specific topologies on X and Y.
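For finite topological spaces, this definition can be checked directly by computing preimages. The sketch below uses the two-element set with the discrete, trivial, and Sierpinski topologies discussed earlier. Functions are represented as Python dictionaries, and all names are assumptions made for the illustration.

```python
def preimage(f, domain, V):
    """The set of points x in the domain with f(x) in V."""
    return frozenset(x for x in domain if f[x] in V)


def is_continuous(f, X, TX, TY):
    """f is continuous when the preimage of every open set of Y is open in X."""
    return all(preimage(f, X, V) in TX for V in TY)


X = frozenset({0, 1})
empty = frozenset()
discrete = {empty, frozenset({0}), frozenset({1}), X}
trivial = {empty, X}
sierpinski = {empty, frozenset({0}), X}  # {0} is declared open

identity = {0: 0, 1: 1}
swap = {0: 1, 1: 0}

print(is_continuous(identity, X, discrete, trivial))     # True
print(is_continuous(identity, X, trivial, discrete))     # False
print(is_continuous(swap, X, sierpinski, sierpinski))    # False
```

The identity function appears twice with different answers, which shows concretely that continuity depends on the topologies and not only on the function.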

A continuous function with a continuous inverse is called a homeomorphism.

This is the most basic definition of continuity of a function. However, depending on the topologies on the domain and range, there may be several equivalent definitions, all deriving from this one most basic definition, that will shed light on certain concepts of importance for the topological spaces that we are studying. We state here one important equivalent definition for the case of functions from real numbers to real numbers, with the set of real numbers equipped with the standard topology discussed earlier.

A function f: \mathbb{R} \rightarrow \mathbb{R} is said to be continuous if for every real number x and every positive real number \epsilon, there exists a positive real number \delta such that for every real number y, whenever |x-y| is less than \delta, it is guaranteed that |f(x)-f(y)| is less than \epsilon. The notation |x-y| stands for the absolute value of x-y; if x is greater than y then it is simply equal to x-y, but if y is greater than x then it is instead equal to y-x. The same applies to |f(x)-f(y)|. If f(x) is greater than f(y) then it is equal to f(x)-f(y) but if f(y) is greater than f(x) then it is equal to f(y)-f(x).

The idea that this definition of continuity is supposed to communicate is that we can always produce as small a change as we want in the “output” of the function as long as we make a change in the “input” that is sufficiently small. We can think of functions that are not continuous as having abrupt “jumps” such that even if we make the smallest of changes in the input we still cannot make the output change slowly enough with respect to this change in the input.
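We can probe this numerically. The sketch below samples inputs y closer than \delta to x and tests whether the outputs stay within \epsilon of f(x). It is only evidence from finitely many sample points, not a proof, and the two test functions (a straight line and a step function with a jump at 0) are our own choices.

```python
def delta_works(f, x, eps, delta, samples=1000):
    """Sample points y with 0 < |x - y| < delta and test |f(x) - f(y)| < eps."""
    for k in range(1, samples + 1):
        offset = delta * k / (samples + 1)  # strictly between 0 and delta
        for y in (x - offset, x + offset):
            if abs(f(x) - f(y)) >= eps:
                return False
    return True


def line(x):
    return 2 * x + 1


def step(x):
    """Jumps from 0 to 1 at x = 0."""
    return 0.0 if x < 0 else 1.0


eps = 0.01
# For the line, delta = eps / 2 works at every point, here tested at x = 3.
print(delta_works(line, 3.0, eps, eps / 2))  # True

# For the step function at x = 0 with eps = 0.5, no tested delta works.
print(any(delta_works(step, 0.0, 0.5, 10.0 ** -k) for k in range(1, 10)))  # False
```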

It is important to remind ourselves, once again, that this latter definition of continuity follows from the one most basic definition of continuity we have given earlier; we have simply specialized it to the case where the domain and range are both the set of real numbers, equipped with its standard topology. We have not discussed explicitly how exactly to relate the two definitions here, but the inquisitive reader can find it and much more in the book of Munkres.

References:

General Topology on Wikipedia

Sierpinski Space on Wikipedia

Continuous Function on Wikipedia

Topology by James R. Munkres