
Section 4.1 Diagonalization

We now return to linear transformations and tackle the problem of diagonalization. Let us first put ourselves in the right context. The linear transformations we want to consider here have domain equal to the codomain so that they are functions
\begin{equation*} T : V \to V . \end{equation*}
Such a transformation is often called a linear operator. One reason this is important is that we can then form equations relating vectors to their values. In particular, we may want to understand solutions to
\begin{equation*} T (\mb{v} ) = \mb{v}, \end{equation*}
or
\begin{equation*} T (\mb{w} ) = - \mb{w}. \end{equation*}
Vectors satisfying the first equation are not changed by \(T\) and those satisfying the second are ‘flipped around’. As it turns out, if some conditions are satisfied, we already have the tools to solve these equations and their generalizations.

Definition 4.1.1.

Let \(T : V \to V\) be a linear operator. A non-zero vector \(\mb{v}\) is called an eigenvector of \(T\) if there is a scalar \(\lambda\) in \(K\) for which
\begin{equation} T(\mb{v} ) = \lambda \mb{v} . \tag{4.1.1} \end{equation}
Any such scalar \(\lambda\) will be called an eigenvalue of \(T\text{.}\) For a given \(\lambda\text{,}\) the vector subspace of solutions to equation (4.1.1) is called the \(\lambda\)-eigenspace of \(T\text{.}\)
We will often refer to eigenvectors and eigenvalues of matrices as well. When we do, we mean eigenvectors and eigenvalues of the linear transformation obtained by multiplying on the left by the matrix. Let us take a look at a few examples.

Example 4.1.2. Diagonal matrices.

Multiplying column vectors in \(K^n\) by a diagonal matrix \(\text{Diag} (\lambda_1 , \ldots, \lambda_n)\) will give us \(n\) eigenvalues \(\lambda_1, \ldots, \lambda_n\) with eigenvectors equal to the standard basis vectors \(\mb{e}_1 , \ldots , \mb{e}_n\text{.}\)
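As a quick numerical sanity check (a NumPy sketch, with arbitrary sample diagonal entries not taken from the text), each standard basis vector is indeed an eigenvector of a diagonal matrix:

```python
import numpy as np

# Sample diagonal matrix Diag(3, -1, 5); the entries are arbitrary choices.
lams = [3.0, -1.0, 5.0]
D = np.diag(lams)

# Each standard basis vector e_i is an eigenvector with eigenvalue lams[i].
for i, lam in enumerate(lams):
    e_i = np.eye(3)[:, i]
    assert np.allclose(D @ e_i, lam * e_i)
```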

Example 4.1.3. Eigenvalues of a \(2 \times 2\) matrix.

More generally, matrices often have eigenvalues that cannot be detected by merely looking at their entries. For example, the matrix
\begin{equation*} A = \left[ \begin{matrix} 1 \amp 1 \\ 1 \amp 1 \end{matrix} \right] \end{equation*}
has eigenvalues \(0\) and \(2\) (which don’t seem to me like numbers that pop right out). Indeed,
\begin{equation*} A \twovec{1}{1} = \left[ \begin{matrix} 1 \amp 1 \\ 1 \amp 1 \end{matrix} \right] \twovec{1}{1} = \twovec{2}{2} = 2 \, \twovec{1}{1} \end{equation*}
and
\begin{equation*} A \twovec{1}{-1} = \left[ \begin{matrix} 1 \amp 1 \\ 1 \amp 1 \end{matrix} \right] \twovec{1}{-1} = \twovec{0}{0} = 0 \, \twovec{1}{-1}. \end{equation*}
While the zero vector appears on the right-hand side of this last equation, be very careful not to place it on the left: by definition, the zero vector is not an eigenvector (otherwise, every scalar would be an eigenvalue!).
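A quick check of this example (a NumPy sketch, not part of the original text) confirms both eigenvector equations and recovers the eigenvalues \(0\) and \(2\) numerically:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0]])

# (1, 1) is a 2-eigenvector and (1, -1) is a 0-eigenvector.
v = np.array([1.0, 1.0])
w = np.array([1.0, -1.0])
assert np.allclose(A @ v, 2 * v)
assert np.allclose(A @ w, 0 * w)

# The eigenvalues 0 and 2 "pop out" numerically as well.
assert np.allclose(np.sort(np.linalg.eigvals(A)), [0.0, 2.0])
```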

Example 4.1.4. Complex eigenvalues.

The plot thickens even more when we consider our number system. For example, the very innocuous looking matrix
\begin{equation*} A = \left[ \begin{matrix} 0 \amp -1 \\ 1 \amp 0 \end{matrix} \right] \end{equation*}
has no eigenvalues or eigenvectors when thought of as a real matrix. In other words, there are no real numbers \(a, b\text{,}\) not both zero, and no real \(\lambda\) for which
\begin{equation*} \left[ \begin{matrix} 0 \amp -1 \\ 1 \amp 0 \end{matrix} \right] \twovec{a}{b} = \lambda \twovec{a}{b}. \end{equation*}
However, if we use the same matrix, but work over \(\mathbb{C}\) we see that
\begin{equation*} A \twovec{1}{i} = \left[ \begin{matrix} 0 \amp -1 \\ 1 \amp 0 \end{matrix} \right] \twovec{1}{i} = \twovec{-i}{1} = -i \, \twovec{1}{i}, \end{equation*}
and
\begin{equation*} A \twovec{1}{-i} = \left[ \begin{matrix} 0 \amp -1 \\ 1 \amp 0 \end{matrix} \right] \twovec{1}{-i} = \twovec{i}{1} = i \, \twovec{1}{-i}. \end{equation*}
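We can verify these complex computations numerically (a NumPy sketch, using `1j` for \(i\)):

```python
import numpy as np

A = np.array([[0.0, -1.0],
              [1.0,  0.0]])

# Over C, (1, i) is a (-i)-eigenvector and (1, -i) is an i-eigenvector.
v = np.array([1.0, 1.0j])
w = np.array([1.0, -1.0j])
assert np.allclose(A @ v, -1j * v)
assert np.allclose(A @ w, 1j * w)

# As a real matrix, A has no real eigenvalues: every computed eigenvalue
# has a non-zero imaginary part.
assert all(abs(lam.imag) > 0.5 for lam in np.linalg.eigvals(A))
```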
These examples may lead the student to throw up their hands and exclaim that this whole business is too complicated and not worth the effort. However, I encourage them not to give up: what we get out of solving these problems is a slew of amazing and important applications! Let us first, though, state the problems to be solved:
Given a linear transformation \(T : V \to V\) on a finite dimensional vector space \(V\text{:}\)
Eigenvalue Problem
Find all eigenvalues of \(T\text{.}\)
Eigenvector Problem
For an eigenvalue \(\lambda\text{,}\) find all \(\lambda\)-eigenvectors.
Diagonalization Problem
What conditions ensure that there is a basis of \(V\) consisting of eigenvectors of \(T\text{?}\)

Subsection 4.1.1 Eigenvalue Problem

To solve this problem, we first leverage the fact that the domain and codomain of \(T\) are the same to define the determinant of \(T\text{.}\)

Definition 4.1.5.

Given a finite dimensional vector space \(V\) and a linear transformation \(T : V \to V\text{,}\) the determinant of \(T\text{,}\) denoted \(\det (T)\text{,}\) is the determinant of any matrix representing \(T\) with respect to the same basis on both sides. That is, \(\det (T) := \det (\cob{T}{\mathcal{B}}{\mathcal{B}})\) for any basis \(\mathcal{B}\) of \(V\text{.}\)
Of course, this definition may seem suspicious at first. As I have been emphasizing, an abstract vector space does not come with a basis, but rather one must choose a basis. So what happens if one person chooses a basis \(\mathcal{B}\) to compute \(\det (T)\) and another chooses a different basis \(\mathcal{C}\text{?}\) Students with great memories will recall equation (2.6.5) and Exercise 2.6.5 which showed that if \(B = \cob{T}{\mathcal{B}}{\mathcal{B}}\) and \(C = \cob{T}{\mathcal{C}}{\mathcal{C}}\) were two different matrix representations of the same linear transformation \(T\text{,}\) then
\begin{equation*} C = P^{-1} B P. \end{equation*}
But then by Proposition 2.7.16 we have
\begin{align*} \det (B) \amp = \frac{\det (P)}{\det (P)} \det (B), \\ \amp = \det (P^{-1}) \det (B) \det (P), \\ \amp = \det (P^{-1} B P), \\ \amp = \det (C). \end{align*}
So indeed, defining \(\det (T)\) with either matrix gives the same quantity.
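The basis-independence of \(\det(T)\) is easy to test numerically; here is a sketch with random matrices (the seed and sizes are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))   # a matrix representing T in one basis
P = rng.standard_normal((4, 4))   # a (generically invertible) change of basis
C = np.linalg.inv(P) @ B @ P      # the representation in the other basis

# det(C) = det(P^{-1}) det(B) det(P) = det(B)
assert np.isclose(np.linalg.det(C), np.linalg.det(B))
```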
For the insightful philosophical student, simply justifying that this definition gives a well defined number may not be satisfying. They may reasonably ask why, if \(\det (A)\) is a computation of volume and volume is a measure that can only be given in an inner product space, does it make sense to talk about \(\det (T)\) for a linear transformation on an abstract vector space? What does this have to do with volume!? Well, the answer is that \(\det (T)\) does not specify the volume of anything at all in \(V\text{,}\) but it tells you exactly how much the volume of something changes if you apply \(T\text{.}\) In particular, if you have a box \(\mb{B}\) in a real vector space \(V\text{,}\) you may assign many different inner products to \(V\) to produce many different values of \(\text{Vol} (\mb{B})\text{.}\) However, no matter how you do this, you will always get the equation
\begin{equation} \text{Vol} (T (\mb{B})) = | \det (T) | \text{Vol} (\mb{B}). \tag{4.1.2} \end{equation}
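Equation (4.1.2) can also be illustrated numerically: whatever parallelepiped we start with, its volume scales by \(|\det(T)|\text{.}\) A NumPy sketch (with arbitrary random data):

```python
import numpy as np

rng = np.random.default_rng(1)
T = rng.standard_normal((3, 3))    # a sample linear operator on R^3
box = rng.standard_normal((3, 3))  # columns span a parallelepiped B

vol_B = abs(np.linalg.det(box))        # Vol(B) in the standard inner product
vol_TB = abs(np.linalg.det(T @ box))   # Vol(T(B))

# Vol(T(B)) = |det(T)| Vol(B)
assert np.isclose(vol_TB, abs(np.linalg.det(T)) * vol_B)
```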
For the impatient student, going over all of this may be quite annoying, and they may ask: why are you bothering me about this now ... what does this have to do with eigenvalues!? To which I would say:

Proposition 4.1.6.

Let \(T : V \to V\) be a linear operator on a finite dimensional vector space \(V\text{.}\) A scalar \(\lambda\) in \(K\) is an eigenvalue of \(T\) if and only if \(\det (\lambda I - T) = 0\text{.}\)

Proof.

We have that \(\lambda\) is an eigenvalue if and only if there is a non-zero \(\mb{v}\) in \(V\) for which \(T (\mb{v} ) = \lambda \mb{v}\) or equivalently if \((\lambda I - T) ( \mb{v} ) = \mb{0}\text{.}\) But this is equivalent to saying that \(\mb{v}\) is in the kernel of \(\lambda I - T\text{.}\) We know that this would happen if and only if \(\lambda I - T\) is not a one-to-one transformation. Since
\begin{equation*} (\lambda I - T) : V \to V \end{equation*}
is a linear transformation between spaces of the same dimension, Corollary 2.6.5 shows that \(\lambda I - T\) is not one-to-one if and only if it is not invertible. But this is the case if and only if any matrix \(A\) representing it is not invertible, which, by Lemma 2.7.15, is true if and only if \(\det (\lambda I - T ) = \det (A) = 0\text{.}\)
This proposition suggests the following definition.

Definition 4.1.7.

If \(V\) is a finite dimensional vector space, the characteristic polynomial of a linear transformation \(T : V \to V\) is given by
\begin{equation*} p_T ( t) = \det (t I - T). \end{equation*}
Here the variable \(t\) is just that, a variable. You will show in the exercises that if \(\dim V = n\) then \(p_T (t)\) is always a degree \(n\) polynomial. Thus Proposition 4.1.6 immediately yields:

Corollary 4.1.8.

The eigenvalues of \(T\) are precisely the roots of the characteristic polynomial \(p_T (t)\text{.}\)
As before, when \(T\) is given by a representing matrix \(A\text{,}\) we will write \(p_A (t)\) and talk about the characteristic polynomial of the matrix. To be certain we are not lost in abstraction, let us see that this polynomial can easily be computed.

Example 4.1.9. Characteristic polynomial of a \(2 \times 2\) matrix I.

Let us reconsider the case of multiplying by
\begin{equation*} A = \left[ \begin{matrix} 1 \amp 1 \\ 1 \amp 1 \end{matrix} \right] . \end{equation*}
Subtracting from \(t\) times the identity gives
\begin{equation*} tI - A = \left[ \begin{matrix} t - 1 \amp -1 \\ -1 \amp t - 1 \end{matrix} \right] \end{equation*}
and taking determinant then produces
\begin{equation*} p_A (t) = \det (t I - A) = (t - 1)^2 - 1 = t^2 - 2t . \end{equation*}
For those that did not imagine how we could find the eigenvalues of \(0\) and \(2\) before, this polynomial should light a bit of a spark! By Proposition 4.1.6, the eigenvalues must solve the equation \(p_A (t) = 0\text{,}\) or equivalently, be roots of \(p_A (t)\text{.}\)
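This computation is easy to check by machine; `np.poly` returns the coefficients of \(\det(tI - A)\) and `np.roots` finds its roots (a NumPy sketch, not part of the text):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0]])

# Coefficients of p_A(t) = t^2 - 2t + 0, highest degree first.
coeffs = np.poly(A)
assert np.allclose(coeffs, [1.0, -2.0, 0.0])

# The roots of p_A are exactly the eigenvalues 0 and 2.
assert np.allclose(np.sort(np.roots(coeffs)), [0.0, 2.0])
```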

Example 4.1.10. Characteristic polynomial of a \(2 \times 2\) matrix II.

On the other hand, we may consider multiplying by
\begin{equation*} A = \left[ \begin{matrix} 0 \amp -1 \\ 1 \amp 0 \end{matrix} \right] . \end{equation*}
Here we get
\begin{equation*} p_A (t) = \det \left( \left[ \begin{matrix} t \amp 1 \\ -1 \amp t \end{matrix} \right] \right) = t^2 + 1. \end{equation*}
Solving the equation \(p_A (t) = 0\) leads to the unimaginable
\begin{equation*} t^2 = -1 \end{equation*}
which no one can really solve. Except complex people.
Now, we should mention that while what we have learned is progress, it also has limitations. The problem is that we have replaced our problem of finding eigenvalues with another problem: finding roots of a polynomial. For small matrices, this problem can be solved with complete accuracy and we will be pleased. However, larger matrices give higher degree polynomials, and finding exact roots of such polynomials can be an impossible task (although approximation methods exist). Nevertheless, we should not neglect the fact that we now have a much better understanding of what can happen. In particular, we cannot have infinitely many eigenvalues (in fact the number is bounded by the dimension), and they all occur as roots of a polynomial coming directly from the transformation.
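To see those approximation methods in action, we can compare the roots of the characteristic polynomial with the eigenvalues computed directly; a NumPy sketch on a random \(6 \times 6\) matrix (the size and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 6))

# No general closed-form formula exists for the roots of this degree-6
# polynomial, but numerical methods approximate them well: the roots of
# np.poly(A) agree with the eigenvalues computed by np.linalg.eigvals.
from_roots = np.sort_complex(np.roots(np.poly(A)))
direct = np.sort_complex(np.linalg.eigvals(A))
assert np.allclose(from_roots, direct)

# As the theory predicts, there are at most dim V = 6 eigenvalues.
assert direct.size == 6
```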

Subsection 4.1.2 Eigenvector Problem

Having made great progress with our eigenvalue problem, we may ask some questions about the vectors that accompany them. A student with applications in mind may quickly ask: ‘How do we find the \(\lambda\)-eigenvectors?’ to which I would respond : ‘Solve a matrix equation!’

Example 4.1.11. Eigenvectors of a \(3 \times 3\) matrix.

Let’s consider a new computational example of multiplying by
\begin{equation*} A = \left[ \begin{matrix} 2 \amp -1 \amp -2 \\ 0 \amp 4 \amp 4 \\ 2 \amp 1 \amp 2 \end{matrix} \right] . \end{equation*}
The first step to finding eigenvectors is to find the eigenvalues. To do this, we compute the characteristic polynomial
\begin{align*} p_A (t) \amp = \det \left( \left[ \begin{matrix} t -2 \amp 1 \amp 2 \\ 0 \amp t-4\amp -4 \\ -2 \amp -1 \amp t - 2 \end{matrix} \right] \right), \\ \amp = (t - 2) \det \left( \left[ \begin{matrix} t - 4 \amp - 4 \\ - 1 \amp t - 2 \end{matrix} \right] \right) - \det \left( \left[ \begin{matrix} 0 \amp -4 \\ -2 \amp t - 2\end{matrix} \right] \right) + 2 \det \left( \left[ \begin{matrix} 0 \amp t - 4 \\ -2 \amp -1 \end{matrix} \right] \right), \\ \amp = (t - 2) (t^2 - 6t + 4) + 8 + 4 (t - 4),\\ \amp = t^3 - 8t^2 + 20t - 16. \end{align*}
One can check that \(2\) is a root, divide \(p_A (t)\) by \((t - 2)\) and factor to see that
\begin{equation*} p_A(t) = (t - 2) (t^2 - 6t + 8) = (t - 2) (t - 2) (t - 4). \end{equation*}
Thus we see that our eigenvalues are \(2\) and \(4\text{.}\) It is interesting to note that the root \(2\) occurs with multiplicity here, which means that \((t - 2)^2\) divides the polynomial. When you see this, your eyebrows should be raised and you should be on alert for unexpected phenomena.
To find the \(2\)-eigenvectors, we simply solve the equation \((2I - A )\mb{x} = \mb{0}\text{.}\) Writing this out we are solving
\begin{equation*} \left[ \begin{matrix} 0 \amp 1 \amp 2 \\ 0 \amp -2 \amp -4 \\ -2 \amp -1 \amp 0 \end{matrix} \right] \threevec{x_1}{x_2}{x_3} = \threevec{0}{0}{0}. \end{equation*}
The reduced row echelon form of this matrix equation is just
\begin{equation*} \left[ \begin{matrix} 1 \amp 0 \amp -1 \\ 0 \amp 1 \amp 2\\ 0 \amp 0 \amp 0 \end{matrix} \right] \threevec{x_1}{x_2}{x_3} = \threevec{0}{0}{0}. \end{equation*}
So we write a parametric solution with parameter \(z\) (so as to not confuse it with the \(t\) in the characteristic polynomial)
\begin{equation*} \mb{x} (z) = \threevec{z}{-2z}{z}. \end{equation*}
In particular
\begin{equation*} \threevec{1}{-2}{1} \end{equation*}
is a \(2\)-eigenvector.
Repeating this process with the eigenvalue \(4\) gives the equation
\begin{equation*} \left[ \begin{matrix} 2 \amp 1 \amp 2 \\ 0 \amp 0 \amp -4 \\ -2 \amp -1 \amp 2 \end{matrix} \right] \threevec{x_1}{x_2}{x_3} = \threevec{0}{0}{0} \end{equation*}
with reduced row echelon form
\begin{equation*} \left[ \begin{matrix} 1 \amp 1/2 \amp 0 \\ 0 \amp 0 \amp 1\\ 0 \amp 0 \amp 0 \end{matrix} \right] \threevec{x_1}{x_2}{x_3} = \threevec{0}{0}{0}. \end{equation*}
So we write a parametric solution
\begin{equation*} \mb{x} (z) = \threevec{z}{-2z}{0}. \end{equation*}
In particular
\begin{equation*} \threevec{1}{-2}{0} \end{equation*}
is a \(4\)-eigenvector.
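The whole example can be double-checked numerically (a NumPy sketch):

```python
import numpy as np

A = np.array([[2.0, -1.0, -2.0],
              [0.0,  4.0,  4.0],
              [2.0,  1.0,  2.0]])

# p_A(t) = t^3 - 8t^2 + 20t - 16, as computed above.
assert np.allclose(np.poly(A), [1.0, -8.0, 20.0, -16.0])

# The eigenvectors we found satisfy their eigenvalue equations.
v2 = np.array([1.0, -2.0, 1.0])   # a 2-eigenvector
v4 = np.array([1.0, -2.0, 0.0])   # a 4-eigenvector
assert np.allclose(A @ v2, 2 * v2)
assert np.allclose(A @ v4, 4 * v4)
```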
While this example was a delight to work through, it did raise a question. In general, we expect to obtain \(3\) different roots to a degree \(3\) polynomial. This would give us \(3\) different eigenvectors in a \(3\) dimensional space. It is natural to ask whether we get a basis from these vectors or not. It is also natural to ask what happened in this example... we only got two vectors!? These types of questions are all about diagonalization.

Subsection 4.1.3 Diagonalization Problem

We now pick up the question of the eigenvectors of \(T\) and whether we can form a basis from them. First, let’s give such a collection a name.

Definition 4.1.12.

A collection \(\mathcal{B} = \{\mb{v}_1 , \ldots, \mb{v}_n \}\) is called an eigenbasis for \(T\) if it is a basis of \(V\) consisting of eigenvectors of \(T\text{.}\)
While it may seem at first that you will have to work hard to find an eigenbasis, the following lemma shows that, in many cases, we already know how to obtain one.

Lemma 4.1.13.

Let \(\mb{v}_1, \ldots, \mb{v}_k\) be eigenvectors of \(T\) with distinct eigenvalues \(\lambda_1, \ldots, \lambda_k\text{.}\) Then \(\{\mb{v}_1, \ldots, \mb{v}_k\}\) is a linearly independent set.

Proof.

Since \(\mb{v}_1, \ldots, \mb{v}_k\) are eigenvectors, they are all non-zero, so \(\{\mb{v}_1\}\) is a linearly independent set. If \(0\) is an eigenvalue, we will assume (after reordering) that \(\mb{v}_1\) is its eigenvector. Now suppose, for contradiction, that the whole collection is linearly dependent, and let \(j\) be the largest number for which \(\{\mb{v}_1, \ldots, \mb{v}_j\}\) is linearly independent, so that \(\{\mb{v}_1, \ldots, \mb{v}_j, \mb{v}_{j + 1}\}\) is linearly dependent. Then we know that there are numbers \(a_1, \ldots, a_j\) for which
\begin{equation} \mb{v}_{j + 1} = a_1 \mb{v}_1 + \cdots + a_j \mb{v}_j .\tag{4.1.3} \end{equation}
We note that these numbers must be unique, for otherwise, we could subtract the other relation
\begin{equation*} \mb{v}_{j + 1} = a_1^\prime \mb{v}_1 + \cdots + a_j^\prime \mb{v}_j \end{equation*}
and obtain
\begin{equation*} \mb{0} = (a_1 - a_1^\prime) \mb{v}_1 + \cdots + (a_j - a_j^\prime) \mb{v}_j . \end{equation*}
Since the vectors \(\{\mb{v}_1, \ldots, \mb{v}_j\}\) are linearly independent, this would mean \(a_i = a_i^\prime\) for each \(1 \leq i \leq j\) (showing they are unique).
However, we can apply \(T\) to both sides of equation (4.1.3) to obtain
\begin{align*} \lambda_{j + 1} \mb{v}_{j + 1} \amp = T (\mb{v}_{j+1}) , \\ \amp = T (a_1 \mb{v}_1 + \cdots + a_j \mb{v}_j ), \\ \amp = a_1 T (\mb{v}_1 ) + \cdots + a_j T (\mb{v}_j ), \\ \amp = a_1 \lambda_1 \mb{v}_1 + \cdots + a_j \lambda_j \mb{v}_j . \end{align*}
Since \(\lambda_{j +1} \ne 0\text{,}\) we may divide and obtain
\begin{equation*} \mb{v}_{j + 1} = a_1 \frac{\lambda_1}{\lambda_{j + 1}} \mb{v}_1 + \cdots + a_j \frac{\lambda_j}{\lambda_{j + 1}} \mb{v}_j . \end{equation*}
Now, since \(\mb{v}_{j + 1} \ne \mb{0}\text{,}\) not all of the \(a_i\) are zero. By the uniqueness of the coefficients in equation (4.1.3), for any \(i\) with \(a_i \ne 0\) we must have \(a_i = a_i \frac{\lambda_i}{\lambda_{j + 1}}\text{,}\) and hence \(\lambda_i = \lambda_{j + 1}\text{,}\) which contradicts the assumption that all eigenvalues were distinct. This proves the lemma.
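The lemma is easy to witness numerically: stacking eigenvectors for distinct eigenvalues as columns yields a full-rank matrix. A small NumPy sketch using the eigenvectors found in Example 4.1.11 (eigenvalues \(2\) and \(4\)):

```python
import numpy as np

# Eigenvectors of the matrix from Example 4.1.11 for the distinct
# eigenvalues 2 and 4; by the lemma they must be linearly independent.
v2 = np.array([1.0, -2.0, 1.0])
v4 = np.array([1.0, -2.0, 0.0])
V = np.column_stack([v2, v4])
assert np.linalg.matrix_rank(V) == 2
```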
This lemma gives us a useful corollary.

Corollary 4.1.14.

If \(\dim V = n\) and the linear operator \(T : V \to V\) has \(n\) distinct eigenvalues, then \(V\) has an eigenbasis for \(T\text{.}\)
The converse of this corollary is definitely false. For example, any basis of \(V\) is an eigenbasis for the identity transformation which has \(p_T (t) = (t - 1)^n\text{.}\)
One may ask why an eigenbasis is so useful. The answer is that if you have an eigenbasis, your linear transformation becomes very easy to understand: all the transformation does is scale each coordinate corresponding to your basis vectors. We can make this precise by representing \(T\) with respect to an eigenbasis \(\mathcal{B}\) as the matrix \(\cob{T}{\mathcal{B}}{\mathcal{B}}\text{.}\)

Proposition 4.1.15.

Let \(\mathcal{B} = \{\mb{v}_1, \ldots, \mb{v}_n\}\) be an eigenbasis for \(T\) with \(T(\mb{v}_i) = \lambda_i \mb{v}_i\text{.}\) Then \(\cob{T}{\mathcal{B}}{\mathcal{B}} = \text{Diag} (\lambda_1, \ldots, \lambda_n)\text{.}\)

Proof.

This follows directly from the definition of \(\cob{T}{\mathcal{B}}{\mathcal{B}}\text{.}\) Indeed, we have
\begin{align*} \cob{T}{\mathcal{B}}{\mathcal{B}} \,\mb{e}_i \amp = \coord{T([\mb{e}_i]_\mathcal{B})} {\mathcal{B}}, \\ \amp = \coord{T(\mb{v}_i)}{\mathcal{B}}, \\ \amp = \coord{\lambda_i \mb{v}_i} {\mathcal{B}}, \\ \amp = \lambda_i \coord{\mb{v}_i}{\mathcal{B}}, \\ \amp = \lambda_i \mb{e}_i. \end{align*}
But this means that the \(i\)-th column of \(\cob{T}{\mathcal{B}}{\mathcal{B}}\) is \(\lambda_i \mb{e}_i\text{,}\) or, that \(\cob{T}{\mathcal{B}}{\mathcal{B}} = \text{Diag} (\lambda_1, \ldots, \lambda_n)\text{.}\)
This proposition and indeed the idea of representing linear transformations as matrices via bases, leads to a connection between eigenbases and diagonalization. Let us first define a diagonalizable matrix.

Definition 4.1.16.

A square \(n \times n\) matrix \(A\) is diagonalizable if there is an invertible matrix \(P\) such that
\begin{equation*} P^{-1} A P = \textnormal{Diag} (\lambda_1 , \ldots, \lambda_n ) . \end{equation*}
This definition may well appear meaningless to the uninitiated, as it leaves murky the main idea behind diagonalizable matrices: namely, that there is a change of coordinates \(P\) for which the linear transformation induced by \(A\) becomes extremely simple. Let us pose this as a proposition.

Proposition 4.1.17.

An \(n \times n\) matrix \(A\) is diagonalizable if and only if the linear transformation \(T_A : K^n \to K^n\) given by multiplication by \(A\) has an eigenbasis.

Proof.

If \(A\) is diagonalizable, then there is an invertible matrix \(P\) which satisfies the equation in the definition. Consider the set \(\mathcal{B} = \{ \mb{v}_1, \ldots, \mb{v}_n \}\) where \(\mb{v}_i = P \mb{e}_i\text{,}\) and notice that it is a basis of \(K^n\) (since it is the image of the standard basis under the invertible matrix \(P\)). We then have
\begin{align*} T_A ( \mb{v}_i) \amp = A \mb{v}_i, \\ \amp = \left( P \, \textnormal{Diag} (\lambda_1, \ldots, \lambda_n)\, P^{-1} \right) (P \mb{e}_i) , \\ \amp = P \,\textnormal{Diag} (\lambda_1, \ldots, \lambda_n) \mb{e}_i \\ \amp = P (\lambda_i \mb{e}_i) , \\ \amp = \lambda_i P \mb{e}_i, \\ \amp = \lambda_i \mb{v}_i. \end{align*}
This shows that \(\mathcal{B}\) is an eigenbasis for \(T_A\text{.}\)
Conversely, if \(T_A\) has an eigenbasis \(\mathcal{B}\text{,}\) then Proposition 4.1.15 shows that the representing matrix \(\cob{T_A}{\mathcal{B}}{\mathcal{B}} = \textnormal{Diag} (\lambda_1, \ldots, \lambda_n)\text{.}\) Now, \(A\) is also a matrix representing \(T_A\text{,}\) but relative to the standard basis, so that if we write \(\mathcal{C} = \{\mb{e}_1, \ldots, \mb{e}_n\}\) then \(\cob{T_A}{\mathcal{C}}{\mathcal{C}} = A\text{.}\) Taking \(P = \cob{1_{K^n}}{\mathcal{B}}{\mathcal{C}}\) to be the change of basis matrix from the eigenbasis to the standard basis, and using equation (2.6.5) we see
\begin{equation*} A = \cob{T_A}{\mathcal{C}}{\mathcal{C}} = P \, \cob{T_A}{\mathcal{B}}{\mathcal{B}} \, P^{-1} = P \, \text{Diag} (\lambda_1, \ldots, \lambda_n)\, P^{-1} \end{equation*}
so that \(A\) is diagonalizable.
Let us give an example that illustrates this proposition.

Example 4.1.18.

Consider diagonalizing the matrix
\begin{equation*} A = \left[ \begin{matrix} -2 \amp 0 \amp -2 \\ -1 \amp -1 \amp -2 \\ 4 \amp 0 \amp 4 \end{matrix} \right]. \end{equation*}
This is in fact a bit of an undertaking, but we now know all of the steps. First, let us find the eigenvalues by obtaining the characteristic polynomial
\begin{align*} p_A (t) \amp = \det \left( \left[ \begin{matrix} t + 2 \amp 0 \amp 2 \\ 1 \amp t + 1 \amp 2 \\ -4 \amp 0 \amp t - 4\end{matrix} \right] \right) , \\ \amp = (t + 2)(t + 1) (t - 4) + 0 + (-2) [-4 (t + 1)] , \\ \amp = t^3 - t^2 - 2t,\\ \amp = t (t - 2) (t + 1). \end{align*}
Thus the eigenvalues are the roots \(-1, 0, 2\) of \(p_A (t)\text{.}\) Now, generally at this point one may have to worry about the existence of an eigenbasis, but in our case we have \(3\) distinct eigenvalues so that Corollary 4.1.14 reassures us that we do indeed have an eigenbasis. Now we need only solve three linear equations to find it (as an aside: one could try to solve these simultaneously by row reducing with rational functions... but we will keep to our basic approach). First, we take \(t = -1\) and solve
\begin{equation*} \left[ \begin{matrix} 1 \amp 0 \amp 2 \\ 1 \amp 0 \amp 2 \\ -4 \amp 0 \amp -5 \end{matrix} \right] \threevec{x_1}{x_2}{x_3} = \threevec{0}{0}{0} \end{equation*}
which has reduced row echelon form
\begin{equation*} \left[ \begin{matrix} 1 \amp 0 \amp 0 \\ 0 \amp 0 \amp 1 \\ 0 \amp 0 \amp 0 \end{matrix} \right] \threevec{x_1}{x_2}{x_3} = \threevec{0}{0}{0} \end{equation*}
leading to the \((-1)\)-eigenvector
\begin{equation*} \mb{v}_1 = \threevec{0}{1}{0}. \end{equation*}
Now taking \(t = 0\) gives
\begin{equation*} \left[ \begin{matrix} 2 \amp 0 \amp 2 \\ 1 \amp 1 \amp 2 \\ -4 \amp 0 \amp -4 \end{matrix} \right] \threevec{x_1}{x_2}{x_3} = \threevec{0}{0}{0} \end{equation*}
which has reduced row echelon form
\begin{equation*} \left[ \begin{matrix} 1 \amp 0 \amp 1 \\ 0 \amp 1 \amp 1 \\ 0 \amp 0 \amp 0 \end{matrix} \right] \threevec{x_1}{x_2}{x_3} = \threevec{0}{0}{0} \end{equation*}
leading to the \(0\)-eigenvector
\begin{equation*} \mb{v}_2 = \threevec{1}{1}{-1}. \end{equation*}
Finally taking \(t = 2\) gives
\begin{equation*} \left[ \begin{matrix} 4 \amp 0 \amp 2 \\ 1 \amp 3 \amp 2 \\ -4 \amp 0 \amp -2 \end{matrix} \right] \threevec{x_1}{x_2}{x_3} = \threevec{0}{0}{0} \end{equation*}
which has reduced row echelon form
\begin{equation*} \left[ \begin{matrix} 1 \amp 0 \amp 1/2 \\ 0 \amp 1 \amp 1/2 \\ 0 \amp 0 \amp 0 \end{matrix} \right] \threevec{x_1}{x_2}{x_3} = \threevec{0}{0}{0} \end{equation*}
leading to the \(2\)-eigenvector
\begin{equation*} \mb{v}_3 = \threevec{1}{1}{-2}. \end{equation*}
So we have achieved the goal of finding an eigenbasis!
\begin{equation*} \mathcal{B} = \left\{\mb{v}_1 , \mb{v}_2, \mb{v}_3 \right\} = \left\{ \threevec{0}{1}{0} , \threevec{1}{1}{-1} , \threevec{1}{1}{-2} \right\} . \end{equation*}
But what about diagonalizing \(A\text{?}\) Well, here we apply Proposition 4.1.17, and in particular the last sentence where \(P^{-1}\) is identified as the change of basis matrix from the standard basis to the eigenbasis. But this means that \(P\) is the matrix whose columns are the eigenvectors, so that
\begin{equation*} P = \left[ \begin{matrix} 0 \amp 1 \amp 1 \\ 1 \amp 1 \amp 1 \\ 0 \amp -1 \amp -2 \end{matrix} \right]. \end{equation*}
Either using our determinant formula or through an augmented row reduction, we can calculate the inverse
\begin{equation*} P^{-1} = \left[ \begin{matrix} -1 \amp 1 \amp 0 \\ 2 \amp 0 \amp 1 \\ -1 \amp 0 \amp -1 \end{matrix} \right]. \end{equation*}
Finally, we encourage the student to compute \(P^{-1} A P\) and confirm
\begin{equation*} P^{-1} A P = \left[ \begin{matrix} -1 \amp 0 \amp 0 \\ 0 \amp 0 \amp 0 \\ 0 \amp 0 \amp 2 \end{matrix} \right]. \end{equation*}
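As a final numerical check of this example (a NumPy sketch):

```python
import numpy as np

A = np.array([[-2.0,  0.0, -2.0],
              [-1.0, -1.0, -2.0],
              [ 4.0,  0.0,  4.0]])
P = np.array([[0.0,  1.0,  1.0],   # columns are the eigenvectors v1, v2, v3
              [1.0,  1.0,  1.0],
              [0.0, -1.0, -2.0]])

# P^{-1} A P = Diag(-1, 0, 2)
D = np.linalg.inv(P) @ A @ P
assert np.allclose(D, np.diag([-1.0, 0.0, 2.0]))
```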
We end this section in blissful ignorance, with a vague false hope that we can always diagonalize. Our dreams will be crushed next section, but a nuanced understanding will replace our Pollyannaish viewpoint!

Exercises 4.1.4 Exercises

1.

Let \(T : V \to V\) be a linear transformation of the vector space \(V\) over \(K\text{.}\) Explain your responses to:
(a)
True or False : If \(K = \mathbb{R}\) then \(T\) has an eigenvalue.
(b)
True or False : If \(K = \mathbb{C}\) then \(T\) has an eigenvalue.
(c)
True or False : This exercise is one of the main reasons to study complex numbers in this course.

2.

Recall that rotation matrices in \(\mathbb{R}^2\) are of the form
\begin{equation*} A_\theta = \left[ \begin{matrix} \cos \theta \amp - \sin \theta \\ \sin \theta \amp \cos \theta \end{matrix} \right] \end{equation*}
Besides the identity matrix, are there any rotations which have real eigenvalues? Explain your response.

3.

Note that if \(a\) is any number in \(K\) and \(f(t)\) is a degree \((n - 1)\) polynomial, then \((t - a) f(t)\) is a degree \(n\) polynomial and \(b f(t)\) has degree less than \(n\) for any constant \(b\text{.}\) Using this, explain why \(p_A (t)\) is a degree \(n\) polynomial for an \(n \times n\) matrix \(A\text{.}\)

4.

Suppose \(a_0, a_1, \ldots, a_{n - 1}\) are numbers in \(K\text{.}\) Find the characteristic polynomial \(p_A (t)\) of the \(n \times n\)-matrix
\begin{equation*} A = \left[ \begin{matrix} 0 \amp 1 \amp 0 \amp \cdots \amp 0 \\ 0 \amp 0 \amp 1 \amp \cdots \amp 0 \\ \vdots \amp \ddots \amp \ddots \amp \ddots \amp \vdots \\ 0 \amp \cdots \amp 0 \amp 0 \amp 1 \\ - a_0 \amp - a_1 \amp -a_2 \amp \cdots \amp - a_{n - 1} \end{matrix} \right]. \end{equation*}

5.

Let
\begin{equation*} A = \left[ \begin{matrix} 0 \amp 3 \amp 2 \\ -2 \amp -7 \amp -4 \\ 2 \amp 7 \amp 4 \end{matrix} \right]. \end{equation*}
(a)
Find the eigenvalues of \(A\text{.}\)
(b)
Is there an eigenbasis for \(A\text{?}\) Explain your response.
(c)
Find an eigenvector for each eigenvalue.
(d)
Find a matrix \(P\) for which \(P A P^{-1}\) is a diagonal matrix (in other words, diagonalize \(A\)).

6.

Diagonalize the matrix \(A_\theta\) from Exercise 4.1.4.2 when considered as a complex matrix.
Hint.
Feel free to use numbers like \(e^{i\theta}\) and \(e^{-i \theta}\text{...}\) that’s what they’re there for!