
Section 2.6 Linear Transformations II

In Section 2.4 we saw that an arbitrary matrix equation can be solved completely by putting the matrix into reduced echelon form. As we will now see, for vector spaces with chosen bases, we can write any abstract linear transformation between them as a matrix, and leverage our knowledge of matrix equations to better understand the transformation. This also carries a tremendous computational advantage: we can then parametrize all manner of linear structures with linear (or, more generally, affine) transformations.
Suppose \(U\) and \(V\) are vector spaces over \(K\) and
\begin{align*} \mathcal{B} \amp = \{\mb{u}_1 , \ldots, \mb{u}_n \}, \\ \mathcal{C} \amp = \{ \mb{v}_1, \ldots, \mb{v}_m\} \end{align*}
are bases of \(U\) and \(V\) respectively. Finally, suppose that
\begin{equation*} T: U \to V \end{equation*}
is a linear transformation. Then for any \(1 \leq j \leq n\text{,}\) we have that \(T(\mb{u}_j)\) is a vector in \(V\) and can thus be written uniquely as a linear combination of basis vectors in \(\mathcal{C}\text{.}\) We write such a combination out here
\begin{equation} T (\mb{u}_j) = a_{1j}\mb{v}_1 + a_{2j} \mb{v}_2 + \cdots + a_{mj} \mb{v}_m. \tag{2.6.1} \end{equation}
Notice that this gives us scalars \(a_{ij}\) for all \(1 \leq i \leq m\) and \(1 \leq j \leq n\) which we can place in an \(m \times n\) matrix
\begin{equation*} \cob{T}{\mathcal{B}}{\mathcal{C}} = \left[ \begin{matrix} a_{11} \amp a_{12} \amp \cdots \amp a_{1n} \\ a_{21} \amp a_{22} \amp \cdots \amp a_{2n} \\ \vdots \amp \vdots \amp \ddots \amp \vdots \\ a_{m1} \amp a_{m2} \amp \cdots \amp a_{mn} \end{matrix} \right] \end{equation*}
This matrix is called the matrix representation of \(T\text{.}\) While this may certainly seem like awkward notation, it aligns well with what is known as ‘tensor notation’, commonly used in various branches of physics. What is important though is that it encodes all of the needed data (the transformation \(T\) and the bases \(\mathcal{B}\) and \(\mathcal{C}\)). But how is this matrix used to compute with \(T\text{?}\) Well, we have not yet used that \(\mathcal{B}\) is a basis, so let’s do that. We know that any vector \(\mb{u}\) in \(U\) has a unique expression as
\begin{equation*} \mb{u} = x_1 \mb{u}_1 + x_2 \mb{u}_2 + \cdots + x_n \mb{u}_n. \end{equation*}
We can feed this into the transformation and use linearity to see
\begin{align*} T (\mb{u}) \amp = T (x_1 \mb{u}_1 + \cdots + x_n \mb{u}_n ) ,\\ \amp = x_1 T (\mb{u}_1) + \cdots + x_n T (\mb{u}_n), \\ \amp = x_1 \left(a_{11}\mb{v}_1 + a_{21} \mb{v}_2 + \cdots + a_{m1} \mb{v}_m \right) + \cdots + x_n \left(a_{1n}\mb{v}_1 + a_{2n} \mb{v}_2 + \cdots + a_{mn} \mb{v}_m \right), \\ \amp = \left( a_{11} x_1 + \cdots + a_{1n} x_n \right) \mb{v}_1 + \cdots + \left( a_{m1} x_1 + \cdots + a_{mn} x_n \right) \mb{v}_m. \end{align*}
The coefficients of the vectors \(\mb{v}_i\) may look familiar: they are exactly the entries produced by the formula for matrix multiplication in equation (2.3.1). In fact, we can record this observation as an important equation
\begin{equation} \coord{T(\mb{u})}{\mathcal{C}} = \cob{T}{\mathcal{B}}{\mathcal{C}} \coord{\mb{u}}{\mathcal{B}} .\tag{2.6.2} \end{equation}
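Concretely, equation (2.6.2) is just a matrix-vector product, which we can sketch in a few lines of Python. This is a minimal illustration: the matrix, the coordinates, and the helper name `mat_vec` are all made-up for the sketch, not notation from the text.

```python
# A minimal sketch of equation (2.6.2): the C-coordinates of T(u) are
# obtained by multiplying the matrix representation by the B-coordinates
# of u.  The matrix and coordinates below are made-up illustrative data.

def mat_vec(A, x):
    """Multiply an m x n matrix A (a list of rows) by a length-n vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

T_BC = [[1, 0, 2],      # a 2 x 3 matrix representation of some T : U -> V
        [0, 3, 1]]      # with dim(U) = 3, dim(V) = 2
u_B = [1, 1, 1]         # coordinates of u relative to the basis B

print(mat_vec(T_BC, u_B))  # C-coordinates of T(u): [3, 4]
```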
Alternatively, we can give the elegant definition
\begin{equation} \cob{T}{\mathcal{B}}{\mathcal{C}} = \coord{}{\mathcal{C}} \circ T \circ \coord{}{\mathcal{B}}^{-1}. \tag{2.6.3} \end{equation}
While important, it is easy to sympathize with a student who sees this equation as confusing gibberish. So we shall tilt our heads for a moment and understand it from the perspective of the diagram
\begin{equation} \begin{CD} U @>T>> V\\ @VV\coord{}{\mathcal{B}}V @VV\coord{}{\mathcal{C}}V\\ K^n @>\cob{T}{\mathcal{B}}{\mathcal{C}}>> K^m \end{CD}\tag{2.6.4} \end{equation}
You should look at this diagram in the following way. What we are interested in is the linear transformation \(T\) from \(U\) to \(V\text{,}\) so the top row of the diagram is what we care about. But it is abstract and difficult to compute with, so we consider the linear isomorphisms \(\coord{}{\mathcal{B}}\) and \(\coord{}{\mathcal{C}}\) that place coordinates on \(U\) and \(V\) respectively and bring us down to earth on the bottom row of the diagram. Down here, \(U\) and \(V\) have been replaced with column vectors of numbers which we can manipulate with basic arithmetic. But equation (2.6.2) tells us that we can also work with \(T\) here as well! In particular, we can write \(T\) as the concrete matrix of numbers \(\cob{T}{\mathcal{B}}{\mathcal{C}}\) and compute \(\coord{T(\mb{u})}{\mathcal{C}}\) as matrix multiplication \(\cob{T}{\mathcal{B}}{\mathcal{C}} \coord{\mb{u}}{\mathcal{B}}\text{.}\) Let’s see this worked out in an example.

Example 2.6.1. The matrix of a derivative.

Consider \(P_n\) as the polynomials of degree less than or equal to \(n\) with coefficients in \(\mathbb{R}\text{.}\) Take
\begin{equation*} D: P_2 \to P_1 \end{equation*}
to be the linear transformation obtained by taking the derivative. In other words,
\begin{equation*} D(f) = f^\prime . \end{equation*}
In order to write out \(D\) as a matrix, we need to choose bases for \(P_2\) and \(P_1\text{.}\) The natural candidate for \(P_n\) is \(\{1, x, x^2, \ldots, x^n\}\text{,}\) so we take
\begin{align*} \mathcal{B} \amp = \{1, x, x^2\}, \\ \mathcal{C} \amp = \{1, x\}. \end{align*}
To find \(\cob{D}{\mathcal{B}}{\mathcal{C}}\) we will simply need to find the coefficients from equation (2.6.1). In other words, we need to take the derivative of our polynomials from \(\mathcal{B}\) and write them out as linear combinations of the polynomials in \(\mathcal{C}\text{.}\)
\begin{align*} D(1) \amp = 0 = 0\cdot 1 + 0 \cdot x, \\ D(x) \amp = 1 = 1\cdot 1 + 0 \cdot x, \\ D(x^2) \amp = 2x = 0\cdot 1 + 2 \cdot x. \end{align*}
Placing these coefficients into the matrix (appropriately!) gives
\begin{equation*} \cob{D}{\mathcal{B}}{\mathcal{C}} = \begin{bmatrix} {0}\amp {1}\amp {0}\\ {0}\amp {0} \amp {2} \end{bmatrix} . \end{equation*}
Of course, in this example, since we all know how to take derivatives, the matrix representation of \(D\) is of limited usefulness. Nonetheless, let us show how equation (2.6.2) works in this case. Take the arbitrary quadratic polynomial
\begin{equation*} f = ax^2 + bx + c \end{equation*}
in \(P_2\) and observe that
\begin{equation*} \coord{f}{\mathcal{B}} = \threevec{c}{b}{a} . \end{equation*}
Then multiplying this column vector on the left by \(\cob{D}{\mathcal{B}}{\mathcal{C}}\) gives
\begin{align*} \cob{D}{\mathcal{B}}{\mathcal{C}} \coord{f}{\mathcal{B}} \amp = \begin{bmatrix} {0}\amp {1}\amp {0}\\ {0}\amp {0} \amp {2} \end{bmatrix} \threevec{c}{b}{a} ,\\ \amp = \twovec{b}{2a} . \end{align*}
But this vector represents the element
\begin{equation*} b\cdot 1 + 2a \cdot x = 2ax + b \end{equation*}
which we recognize as the derivative of \(f\text{.}\)
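The example can also be checked mechanically. Here is a short Python sketch (plain lists stand in for coordinate vectors; the helper name is ours) applying \(\cob{D}{\mathcal{B}}{\mathcal{C}}\) to the coordinates of \(f = ax^2 + bx + c\text{:}\)

```python
# Verifying Example 2.6.1: the matrix of the derivative D : P2 -> P1
# relative to B = {1, x, x^2} and C = {1, x}.

D_BC = [[0, 1, 0],
        [0, 0, 2]]

def mat_vec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

# f = a x^2 + b x + c has B-coordinates (c, b, a); take f = 5x^2 + 3x + 7.
c, b, a = 7, 3, 5
f_prime_C = mat_vec(D_BC, [c, b, a])
print(f_prime_C)  # [3, 10], the C-coordinates of f' = 10x + 3
```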
The technique of representing a linear transformation as a matrix already gives us some important results. First though, we define the following notions.

Definition 2.6.2.

If \(U\) and \(V\) are finite dimensional vector spaces over \(K\text{,}\) the rank of a linear transformation \(T : U \to V\text{,}\) denoted \(\rk (T)\text{,}\) is the dimension of \(\im (T)\text{.}\) The nullity of \(T\) is the dimension of \(\ker (T)\text{.}\)
The following theorem gives us a good amount of qualitative information about a linear transformation.

Theorem 2.6.3. Rank-Nullity.

If \(T : U \to V\) is a linear transformation between finite dimensional vector spaces, then
\begin{equation*} \rk (T) + \nullity (T) = \dim (U) . \end{equation*}

Proof.

To see this, suppose \(\dim (U) = n\text{,}\) \(\dim (V) = m\) and let \(A\) be the \(m \times n\) matrix representing \(T\) relative to some bases on \(U\) and \(V\text{.}\) Then the kernel of \(T\) is isomorphic to the null space of \(A\) (which is the kernel of multiplying by \(A\)) and the image of \(T\) is isomorphic to the column space of \(A\) (which is the image of multiplying by \(A\)). Now, the null space of \(A\) is the linear subspace of solutions to the matrix equation
\begin{equation*} A \mb{x} = \mb{0}. \end{equation*}
We saw in Theorem 2.4.7 that the solutions to these equations are parameterized by \(K^r\text{,}\) where \(r\) is the number of free columns of \(A\text{,}\) so the nullity of \(T\) is precisely \(r\text{.}\) On the other hand, Proposition 2.5.11 showed that the dimension of the column space equals the number of basic columns of \(A\text{.}\) Since every column of \(A\) is either free or basic (but not both), the sum of these two numbers is precisely \(n = \dim (U)\text{,}\) which proves the theorem.
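The counting in this proof is easy to test numerically. The sketch below (the helper is ours, not from the text) row-reduces a sample matrix using exact fractions, counting basic columns for the rank; the remaining columns are free, and the two counts sum to the number of columns:

```python
# Checking rank + nullity = number of columns on a sample matrix,
# with exact rational arithmetic to avoid floating point issues.
from fractions import Fraction

def pivot_count(A):
    """Return the number of basic (pivot) columns of A via row reduction."""
    A = [[Fraction(x) for x in row] for row in A]
    m, n = len(A), len(A[0])
    pivots = 0
    for col in range(n):
        # look for a nonzero entry in this column at or below row `pivots`
        pivot_row = next((r for r in range(pivots, m) if A[r][col] != 0), None)
        if pivot_row is None:
            continue  # no pivot here: a free column
        A[pivots], A[pivot_row] = A[pivot_row], A[pivots]
        for r in range(m):
            if r != pivots and A[r][col] != 0:
                factor = A[r][col] / A[pivots][col]
                A[r] = [a - factor * p for a, p in zip(A[r], A[pivots])]
        pivots += 1
    return pivots

A = [[1, 2, 0, 1],
     [2, 4, 1, 3],
     [3, 6, 1, 4]]
rank = pivot_count(A)       # dimension of the column space
nullity = len(A[0]) - rank  # number of free columns
print(rank, nullity)        # 2 2 -- and indeed 2 + 2 = 4 columns
```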
From this we obtain the corollary

Corollary 2.6.4.

Suppose \(T : U \to V\) is a linear transformation between finite dimensional vector spaces. If \(T\) is one-to-one, then \(\dim (U) \leq \dim (V)\text{.}\) If \(T\) is onto, then \(\dim (U) \geq \dim (V)\text{.}\) If \(T\) is a linear isomorphism, then \(\dim (U) = \dim (V)\text{.}\)

Proof.

For the first claim, represent \(T\) by a matrix and apply Corollary 2.5.6. For the second, if \(T\) is onto then \(\im (T) = V\text{,}\) so that \(\rk (T) = \dim (V)\text{.}\) By Theorem 2.6.3, \(\nullity (T) + \dim (V) = \dim (U)\text{,}\) implying \(\dim (U) \geq \dim (V)\text{.}\) The last claim follows from the fact that a linear isomorphism is a one-to-one correspondence by definition.
We can also use our theorem to make it easier to detect linear isomorphisms.

Proposition 2.6.5.

Suppose \(U\) and \(V\) are finite dimensional vector spaces with \(\dim (U) = \dim (V)\) and \(T : U \to V\) is a linear transformation. Then the following are equivalent: \(T\) is one-to-one, \(T\) is onto, and \(T\) is a linear isomorphism.

Proof.

Note that \(T\) is one-to-one if and only if \(\nullity (T) = 0\) which by Theorem 2.6.3 holds if and only if \(\rk (T) = \dim U = \dim (V)\text{.}\) But then \(\im (T) = V\) since otherwise \(\im (T)\) would be a proper subspace of \(V\) and Corollary 2.5.9 would give that \(\rk (T) = \dim (\im (T)) \lt \dim (V)\text{.}\) Thus \(T\) is onto and a linear isomorphism.
Now, if \(T\) is onto then \(\rk (T) = \dim (V) = \dim (U)\) which again implies by Theorem 2.6.3 that \(\nullity (T) = 0\text{.}\) This would give that \(\ker (T) = \{\mb{0} \}\) so that \(T\) is one-to-one and a linear isomorphism.
Clearly, if \(T\) is a linear isomorphism then it is both one-to-one and onto by definition.
The ability to represent a linear transformation as a matrix is especially important when we consider linear transformations from a vector space to itself
\begin{equation*} T : V \to V. \end{equation*}
In this case, we need only choose one basis \(\mathcal{B}\) of \(V\) to get a matrix representation
\begin{equation*} B = \cob{T}{\mathcal{B}}{\mathcal{B}} \end{equation*}
because the domain and the codomain are equal. However, if we chose a different basis \(\mathcal{C}\text{,}\) it would be good to know how to change this matrix to obtain the matrix representation
\begin{equation*} C = \cob{T}{\mathcal{C}}{\mathcal{C}}. \end{equation*}
In fact, it is a nice exercise to show that one can do this with the following invertible matrix
\begin{equation*} P = \cob{1_V}{\mathcal{B}}{\mathcal{C}} \end{equation*}
by using the simple matrix equation
\begin{equation} C = P B P^{-1} .\tag{2.6.5} \end{equation}
The matrix \(P\) is called a change of basis matrix from \(\mathcal{B}\) to \(\mathcal{C}\text{,}\) and it allows us to transfer information relative to the basis \(\mathcal{B}\) to information relative to the basis \(\mathcal{C}\) (and, upon taking its inverse, vice versa).
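Equation (2.6.5) can be checked in a small example. The sketch below uses made-up \(2 \times 2\) data with exact fractions: `B_mat` plays the role of \(B\text{,}\) `P` is an invertible change of basis matrix, and we form \(P B P^{-1}\text{.}\)

```python
# Computing C = P B P^{-1} from equation (2.6.5) on illustrative data.
from fractions import Fraction

def mat_mul(A, B):
    """Multiply compatible matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def inv2(M):
    """Inverse of an invertible 2 x 2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

F = Fraction
B_mat = [[F(2), F(1)], [F(0), F(3)]]  # matrix of T relative to the basis B
P = [[F(1), F(1)], [F(1), F(2)]]      # invertible change of basis matrix
C_mat = mat_mul(mat_mul(P, B_mat), inv2(P))
print(C_mat)  # the matrix of T relative to the basis C
```

As a sanity check, \(C\) and \(B\) are similar matrices, so they share the same trace, which you can confirm on the output.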

Exercises

1.

Using the standard basis \(\mathcal{B} = \{\mb{e}_1, \mb{e}_2\}\text{,}\) represent the linear transformation of the plane which first rotates counter-clockwise by \(\pi / 2\) and then reflects over the \(x\)-axis.
Hint.
To do this, just ask yourself where the two basis vectors are sent and write the results as the columns of your matrix.

2.

Suppose \(T : U \to V\) is represented by the matrix \(A\text{.}\) Show that
(a)
if \(T\) is one-to-one then multiplication by \(A\) is one-to-one.
(b)
if \(T\) is onto then multiplication by \(A\) is onto.
(c)
the dimension of \(\ker (T)\) equals the dimension of the null space of \(A\) (which is the solution space to \(A \mb{x} = \mb{0}\)).
(d)
there is a linear isomorphism from the image of \(T\) to the column space of \(A\text{.}\)
Hint.
Consider using \(\coord{}{\mathcal{C}}\) where \(\mathcal{C}\) is the basis for \(V\) used in the representation \(A\text{.}\)

3.

Let \(P_2\) be the real vector space of polynomials of degree \(2\) or less. Let \(\mathcal{B} = \{1 , x , x^2 \}\) and \(\mathcal{C} = \{1, (x - 2), (x - 2)^2 \}\text{.}\) As in the example, let \(D: P_2 \to P_2\) be the derivative. Calculate
(a)
\(\cob{D}{\mathcal{C}}{\mathcal{C}}\text{,}\)
(b)
\(\cob{D}{\mathcal{B}}{\mathcal{C}}\text{.}\)
Expanding out functions in the basis of \(\mathcal{C}\) versus \(\mathcal{B}\) can be thought of as taking second order approximations to functions near \(2\) instead of \(0\) (i.e. examining the first three terms of the Taylor series near different points).

4.

Use equation (2.6.3) to show that if \(T : U \to V\) and \(S : V \to W\) are linear transformations and \(\mathcal{B}, \mathcal{C}\) and \(\mathcal{D}\) are bases for \(U, V\) and \(W\) respectively, then
\begin{equation*} \cob{S}{\mathcal{C}}{\mathcal{D}} \, \cob{T}{\mathcal{B}}{\mathcal{C}} = \cob{S \circ T}{\mathcal{B}}{\mathcal{D}}. \end{equation*}
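A numeric sanity check of this identity (not a proof) can be made with the derivative matrix from Example 2.6.1 together with the analogous matrix of the derivative \(P_1 \to P_0\) relative to the bases \(\{1, x\}\) and \(\{1\}\text{:}\)

```python
# Checking the composition identity on the derivative matrices:
# multiplying the two first-derivative matrices should give the matrix
# of the second derivative P2 -> P0.

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

T_BC = [[0, 1, 0],   # derivative P2 -> P1 from Example 2.6.1
        [0, 0, 2]]
S_CD = [[0, 1]]      # derivative P1 -> P0: D(1) = 0, D(x) = 1
print(mat_mul(S_CD, T_BC))  # [[0, 0, 2]]: (ax^2 + bx + c)'' = 2a
```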