Jordan Normal Form

Section 4.2 Jordan Normal Form

As it turns out, not every linear transformation has an eigenbasis, but there is something quite close. To describe it, we need to see that our direct sum is compatible with linear transformations in the following sense.

Definition 4.2.1.

If \(U = U_1 \oplus U_2\text{,}\) \(V = V_1 \oplus V_2\text{,}\) \(T_1 : U_1 \to V_1\) and \(T_2 : U_2 \to V_2\) we write

\begin{equation*} T_1 \oplus T_2 : U \to V \end{equation*}

to be the linear transformation which takes \(\mb{u} = \mb{u}_1 + \mb{u}_2\) to \(T( \mb{u} ) = T_1 (\mb{u}_1 ) + T_2 (\mb{u}_2 )\text{.}\) Here the vectors \(\mb{u}_1\) and \(\mb{u}_2\) are the unique vectors in \(U_1\) and \(U_2\) respectively that add to \(\mb{u}\text{.}\)

It is fair to object to the messy look of this definition, but indeed, to understand the motivation for it, we consider what happens when we represent a direct sum of linear transformations as a matrix.

Proposition 4.2.2.

Suppose \(\mathcal{B}_1\text{,}\) \(\mathcal{B}_2\text{,}\) \(\mathcal{C}_1\) and \(\mathcal{C}_2\) are bases for \(U_1\text{,}\) \(U_2\text{,}\) \(V_1\) and \(V_2\) respectively, and let \(\mathcal{B} = \mathcal{B}_1 \cup \mathcal{B}_2\) and \(\mathcal{C} = \mathcal{C}_1 \cup \mathcal{C}_2\text{.}\) Then \(\mathcal{B}\) is a basis for \(U\text{,}\) \(\mathcal{C}\) is a basis for \(V\) and

\begin{equation*} \cob{T_1 \oplus T_2}{\mathcal{B}}{\mathcal{C}} = \left[ \begin{matrix} \cob{T_1}{\mathcal{B}_1}{\mathcal{C}_1} \amp 0 \\ 0 \amp \cob{T_2}{\mathcal{B}_2}{\mathcal{C}_2} \end{matrix} \right]. \end{equation*}

Here the matrix on the right is a matrix of matrices (so the zeros are matrices as well).

The proof of this proposition is elementary and is left to the student. However, the idea here is still important. We introduce direct sums because it allows us to decompose vector spaces into more elementary pieces (vector space summands) as well as decomposing linear transformations into more elementary linear transformations (blocks in the matrix).

Let us now return to the question of diagonalization and reconsider Example 4.1.11. Recall that there we examined the matrix

\begin{equation*} A = \left[ \begin{matrix} 2 \amp -1 \amp -2 \\ 0 \amp 4 \amp 4 \\ 2 \amp 1 \amp 2 \end{matrix} \right] . \end{equation*}

We found the characteristic polynomial was

\begin{equation*} p_A(t) = (t - 2) (t^2 - 6t + 8) = (t - 2) (t - 2) (t - 4) \end{equation*}

so that there were eigenvalues \(2\) and \(4\text{.}\) However, since we don’t have \(3\) distinct roots of \(p_A(t)\text{,}\) we are no longer in the case of Corollary 4.1.14. This raises the question as to whether we can diagonalize \(A\) at all or, equivalently, whether \(A\) has an eigenbasis. Recall that we found the \(2\) and \(4\) eigenvectors

\begin{equation*} \threevec{1}{-2}{1}, \threevec{1}{-2}{0} \end{equation*}

by solving the eigenvector equations. If you look closely at those solutions though, you will see that any other eigenvector must be a multiple of one of these. But this means that the span of the eigenvectors is \(2\)-dimensional, and we must conclude that there is no eigenbasis!

Let us look at another example where this occurs.

Example 4.2.3. A \(2 \times 2\) nilpotent matrix.

Consider the matrix

\begin{equation*} A = \left[ \begin{matrix} 0 \amp 1 \\ 0 \amp 0 \end{matrix} \right] . \end{equation*}

One easily computes the characteristic polynomial to be \(p_A(t) = t^2\text{.}\) However, solving the equation \(A \mb{x} = \mb{0}\) gives that the only \(0\)-eigenvectors are multiples of

\begin{equation*} \mb{e}_1 = \twovec{1}{0} . \end{equation*}

On the other hand, any other vector does satisfy the equation

\begin{equation*} A^2 \mb{x} = 0. \end{equation*}

This last equation leads to a definition.

Definition 4.2.4.

A linear operator \(N : V \to V\) is called nilpotent if there is a positive integer \(n\) for which \(N^n\) is the zero linear operator (i.e. \(N^n (\mb{v}) = \mb{0}\) for every vector \(\mb{v}\)). We say \(n\) is the order of \(N\) if \(N^{n - 1}\) is not zero but \(N^n\) is zero.

Nilpotent linear transformations can be quite useful, although they are also fairly rare. In one sense, though, they play nicely with invertible matrices (although they are, of course not invertible).

Lemma 4.2.5.

If \(S : V \to V\) is a linear isomorphism, \(N : V \to V\) is nilpotent and \(S\) commutes with \(N\text{,}\) then \(S + N\) is a linear isomorphism.

Proof.

The key to this observation is the geometric series

\begin{equation*} ( 1 - x)^{-1} = \frac{1}{1 - x} = 1 + x + x^2 + \cdots \end{equation*}

that you learn in a first year calculus course. We can manipulate this slightly to see that

\begin{equation*} (y + x)^{-1} = y^{-1} \frac{1}{1 - (-y^{-1} x)} = y^{-1} \left[ 1 + (-y^{-1} x) + (-y^{-1} x)^2 + \cdots \right] \end{equation*}

Now, if we try to put random matrices (or linear transformations) in for \(x\) and \(y\text{,}\) we may run into trouble. First, we would of course need the \(y\) matrix to be invertible to make sense of \(y^{-1}\text{,}\) but more importantly, we would need to know this series converged in some reasonable sense. However, if the \(y\) matrix commuted with the \(x\) matrix (which would imply \(y^{-1}\) did as well), then the terms

\begin{equation*} (-y^{-1} x)^k = (-1)^k y^{-k} x^k . \end{equation*}

Better yet, if the \(x\) matrix were nilpotent, we would have no concern over convergence because these terms would all be zero once \(k\) is large enough. Taking the \(y\) matrix to represent \(S\) and the \(x\) matrix to represent \(N\) thus shows that \(S + N\) is invertible (and in fact the formula above can be used to compute the inverse).

The following lemma will be used to classify a normal form for nilpotent linear transformations. The proof of this lemma is fairly technical and can be skipped on first (or second) reading.

Lemma 4.2.6.

Suppose \(N: V \to V\) is nilpotent of order \(n\) and \(\mb{v}\) is a vector for which \(N^{n - 1} (\mb{v}) \ne \mb{0}\text{.}\) Let \(\mathcal{B} = \{ N^{n - 1} (\mb{v} ), \ldots , N (\mb{v} ) , \mb{v}\}\) and \(V_1 = \textnormal{span} \mathcal{B}\text{.}\) Then

\(\mathcal{B}\) is linearly independent and thus a basis for \(V_1\text{,}\)
there is a complementary subspace \(V_2\) of \(V\) for which
\begin{equation*} N = N_1 \oplus N_2 : V_1 \oplus V_2 \to V_1 \oplus V_2 \end{equation*}
where \(N_1\) and \(N_2\) are the restrictions of \(N\) to \(V_1\) and \(V_2\) respectively.

Proof.

To see the first claim, suppose

\begin{equation*} a_1 \mb{v} + a_2 N(\mb{v}) + \cdots + a_n N^{n - 1} (\mb{v}) = \mb{0}. \end{equation*}

Then by taking \(N^{n - 1}\) of both sides we see \(a_1 = 0\text{.}\) Repeating with \(N^{n - 2}, N^{n - 3} , \ldots\) one sees that \(a_2, a_3, \ldots\) all must be zero as well implying \(\mathcal{B}\) is linearly independent.

For the second claim, let we argue by induction on \(n\text{.}\) If \(n = 1\) then \(V = \ker (N)\) and one can take any complementary subspace to the span of \(\mb{v}\text{.}\) Since \(N\) is zero on all of \(V\text{,}\) it can be written as \(0 \oplus 0\text{.}\)

So assume the statement is true for \((n - 1)\) and let \(W = \ker (N^{n - 1})\text{.}\) Then by the induction hypothesis we can find a complementary subspace \(W_2\) to \(W_1 = \textnormal{span} \{N (\mb{v} ) , \ldots , N^{n - 1} (\mb{v}) \}\) for which \(N = N_1 \oplus N_2\text{.}\)

Let \(U = \{\mb{w} : N (\mb{w} ) \in W_2 \}\) be the vector subspace of \(V\) consisting of all vectors sent to \(W_2\) and \(V_2\) any complementary subspace of \(\span \{N^{n - 1} (\mb{v}) \}\) in \(U\) which contains \(W_2\text{.}\) We claim that \(V_2\) satisfies the second statement. To check this claim we must verify that \(V_2\) maps to itself under \(N\) and that \(V_1\) and \(V_2\) are complementary. As \(V_2\) is contained in \(U\text{,}\) it maps to \(W_2\) under \(N\) which, by construction, is contained in \(V_2\text{.}\)

Verifying that that \(V_1\) and \(V_2\) are complementary we first check that their intersection is zero. Note if \(\mb{u}\) is in both \(V_1\) and \(V_2\) then \(N (\mb{u} )\) must be in \(W_1\) and \(W_2\text{.}\) But since these are complementary, we have \(N(\mb{u}) = \mb{0}\) and \(\mb{u}\) is in the kernel of \(N\text{.}\) As \(n > 1\) we have \(\mb{u} \in \ker (N^{n - 1} ) = W\text{.}\) This implies \(\mb{u}\) is in \(W_1 = V_1 \cap \ker (N^{n - 1})\) and \(W_2 = V_2 \cap \ker (N^{n - 1} )\) which consists only of \(\mb{0}\text{.}\)

To see that \(V_1\) and \(V_2\) span \(V\text{,}\) suppose \(\mb{u}\) is any vector and consider \(N ( \mb{u} ) \in W\text{.}\) Then by the induction hypothesis, there is a unique decomposition \(N (\mb{u}) = \mb{w}_1 + \mb{w}_2\) with \(\mb{w}_1 \in W_1\) and \(\mb{w}_2 \in W_2\text{.}\) As \(\mb{w}_1 = a_1 N (\mb{v}) + \cdots a_{n - 1} N^{n - 1} (\mb{v}) = N (a_1 \mb{v} + \cdots + a_{n - 1} N^{n - 2} (\mb{v}) )\) we can take \(\tilde{\mb{v}}_1 = a_1 \mb{v} + \cdots + a_{n - 1} N^{n - 2} (\mb{v}) \in V_1\) so that \(N (\tilde{\mb{v}}_1) = \mb{w}_1\text{.}\) Subtracting, this shows that \(\mb{w}_2 = N ( \mb{u} - \tilde{\mb{v}}_1 )\) so that \(\mb{u} - \tilde{\mb{v}}_1\) is in \(U\text{.}\) Thus \(\mb{u} - \tilde{\mb{v}}_1 = a N^{n - 1} (\mb{v}) + \mb{v}_2\) for some number \(a\) and \(\mb{v}_2 \in V_2\) (since \(U = V_2 \oplus \span \{N^{n - 1} (\mb{v}) \}\)). Taking \(\mb{v}_1 = \tilde{\mb{v}}_1 + a N^{n - 1} (\mb{v}) \in V_1\) we get that \(\mb{u}\) is in the span finishing the proof.

Inductively applying this lemma gives the following corollary.

Corollary 4.2.7.

If \(V\) is a finite dimensional vector space and \(N : V \to V\) a nilpotent linear transformation. Then there are vectors \(\mb{v}_1, \ldots, \mb{v}_k\) and positive integers \(r_1, \ldots, r_k\) such that

\begin{equation*} \left\{N^{r_1 - 1} (\mb{v}_1 ), \ldots, N (\mb{v}_1) , \mb{v}_1 ,N^{r_2 -1} (\mb{v}_2 ), \ldots, N (\mb{v}_2) ,\mb{v}_2 , \ldots, N^{r_k -1} ( \mb{v}_k ) , \ldots, N (\mb{v}_k ) ,\mb{v}_k \right\} \end{equation*}

is a basis and \(N^{r_i} (\mb{v}_i) = 0\) for \(1 \leq i \leq k\text{.}\)

Proof.

This follows from repeatedly applying Lemma 4.2.6 to the summand \(V_2\text{.}\)

Now we return to the general problem of finding an eigenbasis for \(T: V \to V\text{.}\) Since we cannot always find a \(\lambda\)-eigenvectors of \(T\text{,}\) we propose a broader notion in the following definition.

Definition 4.2.8.

Suppose \(V\) is a vector space and \(T : V \to V\) is a linear transformation. For a number \(\lambda \in K\text{,}\) the generalized \(\lambda\)-eigenspace is the vector subspace

\begin{equation*} V_\lambda = \left\{ \mb{v} : (\lambda I - T)^r (\mb{v}) = 0 \textnormal{ for sufficiently large } r \right\}. \end{equation*}

A non-zero vector \(\mb{v}\) in \(V_\lambda\) is called a generalized eigenvector.

One can check that \(V_\lambda\) is indeed a vector subspace of \(V\text{.}\) The following theorem can be used to see that the generalized eigenspaces are ever present.

Theorem 4.2.9. Cayley-Hamilton.

If \(V\) is a finite dimensional vector space and \(T: V \to V\) is a linear transformation then \(p_T (T) = 0\text{.}\)

Proof.

We show this by taking \(A\) to represent \(T\) and verifying \(p_A (A) = 0\) as a matrix equation. Suppose

\begin{equation*} p_A (t) = t^n + a_{n - 1} t^{n - 1} + \cdots + a_1 t + a_0 . \end{equation*}

Letting \(B = (tI - A)\) be the matrix with polynomial entries, we can take its adjugate matrix \(\text{adj} (B)\) since the entries are cofactors which can be defined on matrices with polynomial entries (since there is no division). Now, by equation (2.7.5) we have that

\begin{equation} (tA - I) \cdot \text{adj} (B) = \det (B) \cdot I = p_A (t) I . \tag{4.2.1} \end{equation}

On the other hand, we can expand \(\text{adj} (B)\) as matrices multiplied by monomials \(t^k\text{,}\) i.e.

\begin{equation*} \text{adj} (B) = t^{n - 1} B_{n - 1} + \cdots t B_1 + B_0 . \end{equation*}

Now, multiplying the left hand side of equation (4.2.1) gives

\begin{equation*} t^n B_{n - 1} + t^{n - 1} (B_{n - 2} - A B_{n - 1} ) + \cdots + t ( B_0 - A B_1) - A B_0 . \end{equation*}

Setting the coefficients of \(t^k\) equal to the right hand side for each \(k\) gives the matrix equations

\begin{align*} B_{n - 1} \amp = I, \\ B_{n - 2} - A B_{n - 1} \amp = a_{n - 1} I, \\ \vdots \amp \\ B_0 - A B_1 \amp = a_1, \\ - A B_0 \amp = a_0. \end{align*}

Multiplying the first equation on the left by \(A^n\text{,}\) the second by \(A^{n - 1}\) and so on gives

\begin{align*} A^n B_{n - 1} \amp = A^n, \\ A^{n - 1} B_{n - 2} - A^n B_{n - 1} \amp = a_{n - 1} A^{n - 1}, \\ \vdots \amp \\ A B_0 - A^2 B_1 \amp = a_1 A, \\ - A B_0 \amp = a_0. \end{align*}

Adding the left and right sides then gives

\begin{equation*} 0 = p_A (A) . \end{equation*}

Let us interpret this Theorem. Since \(p_T (t)\) is a polynomial, it involves taking powers of \(t\text{,}\) multiplying it by a scalar and adding the results together. However, all of these operations can be made on \(T\) itself, so the equation \(p_T (T) = 0\) says that we obtain the zero linear transformation when we put \(T\) into its own characteristic polynomial.

To connect this definition to our discussion of nilpotent matrices, we observe the following simple Lemma.

Lemma 4.2.10.

Suppose \(V\) is a finite dimensional vector space over \(K\) and \(T : V \to V\) is a linear transformation. If \(\lambda\) and \(\tilde{\lambda}\) are two numbers in \(K\) then

\begin{equation*} (\tilde{\lambda} I - T) (V_\lambda) \subseteq V_\lambda \end{equation*}

and

\begin{equation*} \tilde{\lambda}I - T : V_\lambda \to V_\lambda \end{equation*}

is nilpotent if \(\tilde{\lambda} = \lambda\) and a linear isomorphism otherwise.

Proof.

To see the first statement, just note that \((\tilde{\lambda} I - T)\) commutes with \(({\lambda} I - T)\) so that if \(({\lambda} I - T)^n (\mb{v}) = 0\) then

\begin{align*} ({\lambda} I - T)^n [(\tilde{\lambda} I - T) (\mb{v})] \amp = (\tilde{\lambda} I - T) [({\lambda} I - T)^n (\mb{v} )] , \\ \amp = (\tilde{\lambda} I - T) ( \mb{0} ), \\ \amp = \mb{0}. \end{align*}

So if \(\mb{v}\) is in \(V_\lambda\) then so is \((\tilde{\lambda} I - T) (\mb{v})\text{.}\)

For the second statement, if \(\lambda = \tilde{\lambda}\) then \(V_\lambda\) is defined to be the vector subspace on which \((\tilde{\lambda} I - T)\) is nilpotent. There is a subtlety here which relies on the fact that \(V\) is finite dimensional. In particular, there must be a finite \(n\) for which \((\tilde{\lambda} I - T)^n (\mb{v}) = \mb{0}\) for all \(\mb{v}\) in \(V_\lambda\) owing to the fact that there is a finite basis for \(V_\lambda\) (check this). Thus \((\tilde{\lambda} I - T)\) is nilpotent on \(V_\lambda\text{.}\)

Now, if \(\tilde{\lambda} \ne \lambda\) then we take \(N = (\lambda I - T)\) and \(S = (\tilde{\lambda} - \lambda) I\text{.}\) Then \(S\) and \(N\) commute, \(S\) is invertible and \(N\) is nilpotent on \(V_\lambda\text{.}\) Applying Lemma 4.2.5 gives that \((\tilde{\lambda} I - T) = S + N\) is invertible.

Using this lemma, we obtain a generalized version of Lemma 4.1.13.

Lemma 4.2.11.

Let \(V\) be a finite dimensional vector space and \(T: V \to V\) be a linear transformation. If \(\mb{v}_1 , \ldots , \mb{v}_k\) are generalized \(\lambda_i\)-eigenvectors with distinct \(\lambda_i\text{,}\) then they are linearly independent.

Proof.

Assuming that this were false, we may choose a smallest set of eigenvalues for which there is a linear dependence of respective generalized eigenvectors. Relabel the vectors and eigenvalues so that this set consists of the first \(j\) values \(\lambda_1, \ldots, \lambda_j\text{.}\) Then there are generalized \(\lambda_i\)-eigenvectors \(\mb{w}_1, \ldots, \mb{w}_j\) that have a non-trivial linear relation

\begin{equation*} \mb{0} = a_1 \mb{w}_1 + \cdots + a_{j} \mb{w}_{j}. \end{equation*}

By Lemma 4.2.10, there is some \(n\) for which \((\lambda_j I - T)^n\) is zero on \(V_{\lambda_j}\) and a linear isomorphism for each \(V_{\lambda_i}\) with \(i \ne j\text{.}\) Write \(S = (\lambda_j I - T)^n\) and apply this to both sides of the linear relation so that

\begin{align*} \mb{0} \amp = S ( \mb{0} ), \\ \amp = S ( a_1 \mb{w}_1 + \cdots + a_{j} \mb{w}_{j} ), \\ \amp = a_1 S( \mb{w}_1 ) + \cdots + a_{j - 1} S( \mb{w}_{j - 1} ) + a_j S ( \mb{w}_j ), \\ \amp = a_1 S( \mb{w}_1 ) + \cdots + a_{j - 1} S( \mb{w}_{j - 1} ). \end{align*}

We note that \(S (\mb{w}_i) \ne \mb{0}\) for all \(1 \leq i \leq j - 1\) since \(S\) is a linear isomorphism on \(V_{\lambda_i}\text{.}\) But this is a non-trivial linear dependence with fewer than \(j\) generalized eigenvectors contradicting our choice of the smallest set of linearly dependent vectors.

From these lemmas we are able to show the following important theorem.

Theorem 4.2.12.

\begin{equation*} V = V_{\lambda_1} \oplus \cdots \oplus V_{\lambda_m} \end{equation*}

where \(\dim V_{\lambda_i} = k_i\) and \(T\) also decomposes as a direct sum of its restrictions

\begin{equation*} T = T_1 \oplus \cdots \oplus T_m . \end{equation*}

Proof.

It is clear that \(T\) maps each generalized eigenspace to itself. Taking

\begin{equation*} W = \span ( V_{\lambda_1} \cup \cdots \cup V_{\lambda_m} ) \end{equation*}

Lemma 4.2.11 implies that

\begin{equation*} W = V_{\lambda_1} \oplus \cdots \oplus V_{\lambda_m} . \end{equation*}

Indeed, to check this one needs only show that there is no non-zero vector of \(V_{\lambda_i}\) equal to a sum of vectors from the other generalized eigenspaces. But any such equation would give a non-trivial linear relation.

To see the decomposition, all that is left to show is that \(W = V\text{.}\) For this we use the Cayley-Hamilton Theorem. Indeed, if \(\mb{v}\) is a vector not in \(W\) then because \(p_T (T) = 0\) we have

\begin{equation*} (\lambda_1 I - T)^{k_1} \cdots (\lambda_m I - T)^{k_m} (\mb{v}) = \mb{0} . \end{equation*}

Write \(S_i = (\lambda_i I - T)^{k_i}\) and recall that Lemma 4.2.10 gives that \(S_i\) is invertible on \(V_{\lambda_j}\) for \(i \ne j\text{.}\) Take \(\mb{w}_m = S_1 \cdots S_{m - 1} (\mb{v})\) and observe it is in \(V_{\lambda_m}\) (since \(S_m (\mb{w}_m) = \mb{0}\)) we can define \(\mb{u}_m = S^{-1}_{m - 1} \cdots S_1^{-1} (\mb{w}_m)\) in \(V_{\lambda_m}\text{.}\) But then it is clear that \(S_1 \cdots S_{m - 1} (\mb{v} - \mb{u}_m) = \mb{0}\text{.}\) Repeating this process gives vectors \(\mb{u}_1, \ldots, \mb{u}_m\) in \(V_{\lambda_1}, \ldots, V_{\lambda_m}\) with \(\mb{v} = \mb{u}_1 + \cdots + \mb{u}_m\text{.}\)

Restricting \(T\) to \(V_{\lambda_i}\) to obtain \(T_i\) one sees that \(p_{T_i} (t) = (t - \lambda_i)^r\) where \(r = \dim V_{\lambda_i}\) (since \(T_i\) has no other eigenvalues). But it is an exercise to see that the direct sum decomposition gives a factorization

\begin{equation*} p_T (t) = p_{T_1} (t) \cdots p_{T_m} (t) \end{equation*}

so we may conclude that \(\dim V_{\lambda_i} = r = k_i\text{.}\)

It is time to take a very deep breath and sigh loudly with relief. We have now proven the main theorem that allows us to represent any linear transformation as a completely understandable matrix! What type of matrices you ask?

Definition 4.2.13.

For a number \(\lambda \in K\text{,}\) a Jordan matrix is the \(n \times n\) matrix

\begin{equation*} J_{\lambda ,n } = \left[ \begin{matrix} \lambda \amp 1 \amp 0 \amp \cdots \amp 0 \\ 0 \amp \lambda \amp 1 \amp\cdots \amp 0 \\ \vdots \amp \ddots \amp \ddots \amp \ddots \amp \vdots \\ 0 \amp \cdots \amp 0 \amp \lambda \amp 1 \\ 0 \amp \cdots \amp 0 \amp 0 \amp \lambda \end{matrix} \right]. \end{equation*}

And now for our classification.

Theorem 4.2.14. Jordan Normal Form.

Suppose \(V\) is an \(n\)-dimensional vector space and \(T : V \to V\) is a linear transformation. Suppose \(p_T (t)\) factors as \((t - \lambda_1)^{k_1} \cdots (t - \lambda_m)^{k_m}\) where \(\lambda_i \ne \lambda_j\) for \(i \ne j\text{.}\) Then there is a basis \(\mathcal{B}\) of \(V\) consisting of generalized eigenvectors for which the representing matrix is a block diagonal matrix

\begin{equation} \cob{T}{\mathcal{B}}{\mathcal{B}} = \left[ \begin{matrix} B_{\lambda_1} \amp 0 \amp \cdots \amp 0 \\ 0 \amp B_{\lambda_2} \amp \cdots \amp 0 \\ \vdots \amp \ddots \amp \ddots \amp \vdots \\ 0 \amp \cdots \amp 0 \amp B_{\lambda_m} \end{matrix} \right].\tag{4.2.2} \end{equation}

The matrix \(B_{\lambda_i}\) represents the restriction of \(T\) to the generalized eigenspace \(V_{\lambda_i}\) and itself has the block diagonal form

\begin{equation} B_{\lambda_i} = \left[ \begin{matrix} J_{\lambda_i, n^i_1} \amp 0 \amp \cdots \amp 0 \\ 0 \amp J_{\lambda_i, n^i_2} \amp \cdots \amp 0 \\ \vdots \amp \ddots \amp \ddots \amp \vdots \\ 0 \amp \cdots \amp 0 \amp J_{\lambda_i, n^i_{k_i}} \end{matrix} \right].\tag{4.2.3} \end{equation}

Up to reordering the blocks, this form is unique.

The proof of the existence of such a basis for equation (4.2.2) follows immediately from Theorem 4.2.12. On the other hand, one can verify equation (4.2.3) by repeatedly applying Lemma 4.2.6 to the nilpotent transformation \((\lambda_i I - T)\) on \(V_{\lambda_i}\text{.}\) The uniqueness claim is worth some attention, but will be left to the students and office hours!

Example 4.2.15. Jordan normal form of a \(4 \times 4\) matrix.

Let us now endeavor to work through an example with a little bit of nuance. Take

\begin{equation*} A = \left[ \begin{matrix} -1 \amp 1 \amp 0 \amp 0 \\ -1 \amp -3 \amp 0 \amp 0 \\ 0 \amp 0 \amp -3 \amp -1 \\ 1 \amp 1 \amp 2 \amp 0 \end{matrix} \right]. \end{equation*}

One can compute the characteristic polynomial of this matrix as usual, or they can observe that it is a block lower triangular matrix with diagonal blocks

\begin{equation*} C_1 = \left[ \begin{matrix} -1 \amp 1 \\ -1 \amp -3 \end{matrix} \right] \end{equation*}

and

\begin{equation*} C_2 = \left[ \begin{matrix} -3 \amp -1 \\ 2 \amp 0 \end{matrix} \right] . \end{equation*}

This implies that \(p_A (t) = p_{C_1} (t) p_{C_2} (t)\) which simplifies our computation. We check that

\begin{equation*} p_{C_1} (t) = (t + 1) (t + 3) + 1 = (t + 2)^2 \end{equation*}

and

\begin{equation*} p_{C_2} (t) = (t + 3) t + 2 = (t + 2) (t + 1) \end{equation*}

so that

\begin{align*} p_A (t) \amp = p_{C_1} (t) p_{C_2} (t), \\ \amp = (t + 2)^3 (t + 1). \end{align*}

Thus the eigenvalues of \(A\) are \(-2\) and \(-1\text{.}\) Theorem 4.2.12 gives us that \(\dim V_{-2} = 3\) and \(\dim V_{-1} = 1\text{.}\) So we first find a \((-1)\)-eigenvector by solving the equation \((-I - A) \mb{x} = \mb{0}\) or

\begin{equation*} \left[ \begin{matrix} 0 \amp -1 \amp 0 \amp 0 \\ 1 \amp 2 \amp 0 \amp 0 \\ 0 \amp 0 \amp 2 \amp 1 \\ -1 \amp -1 \amp -2 \amp -1 \end{matrix} \right] \left[ \begin{matrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{matrix} \right] = \left[ \begin{matrix} 0 \\ 0 \\ 0 \\ 0 \end{matrix} \right] . \end{equation*}

One can find here that

\begin{equation*} \left[ \begin{matrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{matrix} \right] = \left[ \begin{matrix} 0 \\ 0 \\ 1 \\ -2 \end{matrix} \right] \end{equation*}

gives a non-trivial solution. For the generalized \((-2)\)-eigenspace we consider the matrix \((-2I - A)\) which is

\begin{equation*} -2I - A = \left[ \begin{matrix} -1 \amp -1 \amp 0 \amp 0 \\ 1 \amp 1 \amp 0 \amp 0 \\ 0 \amp 0 \amp 1 \amp 1 \\ -1 \amp -1 \amp -2 \amp -2 \end{matrix} \right] \end{equation*}

The generalized \((-2)\)-eigenspace \(V_{-2}\) has dimension \(3\text{,}\) so \((-2I - A)^3\) is zero on this space (by the Cayley-Hamilton Theorem) and we can find a basis for it by simply solving the equation \((-2 I - A)^3 \mb{x} = 0\text{.}\) However, this is not the most effective way at seeing the Jordan Normal Form. Instead, we will first find our \((-2)\)-eigenspace by solving

\begin{equation} \left[ \begin{matrix} -1 \amp -1 \amp 0 \amp 0 \\ 1 \amp 1 \amp 0 \amp 0 \\ 0 \amp 0 \amp 1 \amp 1 \\ -1 \amp -1 \amp -2 \amp -2 \end{matrix} \right] \left[ \begin{matrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{matrix} \right] = \left[ \begin{matrix} 0 \\ 0 \\ 0 \\ 0 \end{matrix} \right] .\tag{4.2.4} \end{equation}

We can see that

\begin{equation*} \left[ \begin{matrix} -1 \\ 1 \\ -1 \\ 1 \end{matrix} \right] \hspace{.1in} \text{and} \hspace{.1in} \left[ \begin{matrix} -1 \\ 1 \\ 0 \\ 0 \end{matrix} \right] \end{equation*}

are linearly independent \((-2)\)-eigenvectors. We can also see that these span our solution space to equation (4.2.4). This means that \(A\) is not diagonalizable, but that there is a non-trivial Jordan block. To find it, we just need some vector that would be sent to one of the two \((-2)\)-eigenvectors above by \(( A - (-2)I)\text{.}\) Had I chosen my solutions above at random, there may not be such a vector and we would have to adjust the two eigenvectors so that one of them is in the image of \((A - (-2)I)\text{.}\) However, I have been judicious in my choice and we see that

\begin{equation*} \left[ \begin{matrix} -1 \\ 0 \\ 1 \\ 0 \end{matrix} \right] \end{equation*}

is indeed such a vector. Thus the basis

\begin{equation*} \mathcal{B} = \left\{ \left[ \begin{matrix} 0 \\ 0 \\ 1 \\ -2 \end{matrix} \right] , \left[ \begin{matrix} -1 \\ 1 \\ -1 \\ 1 \end{matrix} \right] , \left[ \begin{matrix} -1 \\ 0 \\ 1 \\ 0 \end{matrix} \right] , \left[ \begin{matrix} -1 \\ 1 \\ 0 \\ 0 \end{matrix} \right] \right\} \end{equation*}

will satisfy the requirements of Theorem 4.2.14. Indeed, taking \(P^{-1}\) to be the matrix with columns given by these vectors, we have

\begin{equation*} P A P^{-1} = \left[ \begin{matrix} -1 \amp 0 \amp 0 \amp 0 \\ 0 \amp -2 \amp 1 \amp 0 \\ 0 \amp 0 \amp -2 \amp 0 \\ 0 \amp 0 \amp 0 \amp -2 \end{matrix} \right]. \end{equation*}

Here we have two block matrices \(B_{-1}\) and \(B_{-2}\) with three Jordan matrices, \(J_{-1,1}\) in \(B_1\) and \(J_{-2,2}, J_{-2,1}\) in \(B_2\text{.}\)

Exercises Exercises

1.

Let

\begin{equation*} A = \left[ \begin{matrix} 0 \amp 1 \amp 1 \\ 0 \amp 0 \amp 0 \\ 0 \amp 1 \amp 0 \end{matrix} \right]. \end{equation*}

Without using row reduction or the determinant / adjugate formula, find the inverse of \(I + A\text{.}\)

2.

Let \(N_1\) be \(N\) restricted to the subspace \(V_1\) in Lemma 4.2.6. Describe the matrix representing \(N_1\) using the basis \(\mathcal{B}\text{.}\)

3.

Give an example of a nilpotent \(3 \times 3\) matrix which has

(a)

a one dimensional kernel.

(b)

a two dimensional kernel.

4.

True or False (with explanation) : If two linear transformations have the same characteristic polynomial, then they can be represented by the same matrix.

5.

Let

\begin{equation*} A = \left[ \begin{matrix} 0 \amp -2 \amp 1 \\ -1 \amp 0 \amp 0 \\ -5 \amp 7 \amp -3 \end{matrix} \right]. \end{equation*}

(a)

Find the characteristic polynomial \(p_A (t)\text{.}\) What are the eigenvalues of \(A\text{?}\)

(b)

Find a maximal collection of linearly independent eigenvectors.

(c)

Is \(A\) diagonalizable? Explain your response.

(d)

Find a basis \(\mathcal{B}\) for which \(\cob{A}{\mathcal{B}}{\mathcal{B}} = P^{-1} A P\) is in Jordan Normal Form where \(P\) is a change of basis matrix for \(\mathcal{B}\text{.}\)

Prev Top Next