Matrices III

Section 2.7 Matrices III

In this section on matrices, we will be primarily concerned with square matrices.

Definition 2.7.1.

A matrix is said to be invertible if it has it has a multiplicative inverse.

If \(A\) is an \(m \times n\) matrix, then multiplication by \(A\) gives a linear transformation

\begin{equation*} T_A : K^n \to K^m \end{equation*}

Notice that \(A\) represents \(T_A\) relative to the standard bases so that Exercise 2.6.4 shows that \(T_A \circ T_B = T_{AB}\text{.}\) This implies that \(A\) is an invertible matrix if and only if \(T_A\) is a linear isomorphism. In particular, Corollary 2.6.4 says that this can only be the case if \(A\) is a square matrix. We state this as a proposition for the record.

Proposition 2.7.2.

If a matrix is invertible, then it is a square matrix.

Of course, we can actually use our strong results to give a very concrete criterion for checking invertibility of a square matrix.

Proposition 2.7.3.

A square matrix is invertible if and only if its reduced row echelon form is the identity.

Proof.

To see this, let \(A\) be a square matrix. Then by Proposition 1.2.11 \(A\) is invertible implies that multiplying by \(A\) is a one-to-one operation. Conversely, if it is one-to-one, Theorem 2.6.3 implies that onto as well and thus an isomorphism. This means that \(A \mb{x} = \mb{0}\) has the unique solution \(\mb{0}\text{.}\) But by Theorem 2.4.7, this only happens if there are no free columns of \(A\text{,}\) implying that all columns are basic. So there are \(n\) leading coefficients implying that there must be a leading coefficient on every row. But the only reduced row echelon square matrix of this type is the identity matrix (check this!).

In fact, using row reduction we can do much better than simply determining if a matrix is invertible. We can in fact compute the inverse. To do this, we simply augment our matrix on the right with the identity matrix and row reduce the left hand side. If our matrix is invertible, what we end up with on the right at the end of the row reduction process in fact the inverse!

Let us perform this exercise before showing why it works.

Example 2.7.4. Computing the inverse with row reduction.

Let us find the inverse of

\begin{equation*} A = \left[ \begin{matrix} -1 \amp 2 \amp -4 \\ -2 \amp 3 \amp -7 \\ -1 \amp 1 \amp -2 \end{matrix} \right]. \end{equation*}

We augment and reduce

\begin{align*} \left[ \begin{array}{ccc|ccc} -1 \amp 2 \amp -4 \amp 1 \amp 0 \amp 0 \\ -2 \amp 3 \amp -7 \amp 0 \amp 1 \amp 0 \\ -1 \amp 1 \amp -2 \amp 0 \amp 0 \amp 1 \end{array} \right] \amp \stackrel{(-1)\mb{r}_1}{\longrightarrow} \left[ \begin{array}{ccc|ccc} 1 \amp -2 \amp 4 \amp -1 \amp 0 \amp 0 \\ -2 \amp 3 \amp -7 \amp 0 \amp 1 \amp 0 \\ -1 \amp 1 \amp -2 \amp 0 \amp 0 \amp 1 \end{array} \right] , \\ \amp \stackrel{\mb{r}_2 + 2\mb{r}_1}{\longrightarrow} \left[ \begin{array}{ccc|ccc} 1 \amp -2 \amp 4 \amp -1 \amp 0 \amp 0 \\ 0 \amp -1 \amp 1 \amp -2 \amp 1 \amp 0 \\ -1 \amp 1 \amp -2 \amp 0 \amp 0 \amp 1 \end{array} \right] , \\ \amp \stackrel{\mb{r}_3 + \mb{r}_1}{\longrightarrow} \left[ \begin{array}{ccc|ccc} 1 \amp -2 \amp 4 \amp -1 \amp 0 \amp 0 \\ 0 \amp -1 \amp 1 \amp -2 \amp 1 \amp 0 \\ 0 \amp -1 \amp 2 \amp -1 \amp 0 \amp 1 \end{array} \right] , \\ \amp \stackrel{(-1)\mb{r}_2}{\longrightarrow} \left[ \begin{array}{ccc|ccc} 1 \amp -2 \amp 4 \amp -1 \amp 0 \amp 0 \\ 0 \amp 1 \amp -1 \amp 2 \amp -1 \amp 0 \\ 0 \amp -1 \amp 2 \amp -1 \amp 0 \amp 1 \end{array} \right] , \\ \amp \stackrel{\mb{r}_3 + \mb{r}_2}{\longrightarrow} \left[ \begin{array}{ccc|ccc} 1 \amp -2 \amp 4 \amp -1 \amp 0 \amp 0 \\ 0 \amp 1 \amp -1 \amp 2 \amp -1 \amp 0 \\ 0 \amp 0 \amp 1 \amp 1 \amp -1 \amp 1 \end{array} \right] , \\ \amp \stackrel{\mb{r}_1 + 2\mb{r}_2}{\longrightarrow} \left[ \begin{array}{ccc|ccc} 1 \amp 0 \amp 2 \amp 3 \amp -2 \amp 0 \\ 0 \amp 1 \amp -1 \amp 2 \amp -1 \amp 0 \\ 0 \amp 0 \amp 1 \amp 1 \amp -1 \amp 1 \end{array} \right] , \\ \amp \stackrel{\mb{r}_1 - 2\mb{r}_3}{\longrightarrow} \left[ \begin{array}{ccc|ccc} 1 \amp 0 \amp 0 \amp 1 \amp 0 \amp -2 \\ 0 \amp 1 \amp -1 \amp 2 \amp -1 \amp 0 \\ 0 \amp 0 \amp 1 \amp 1 \amp -1 \amp 1 \end{array} \right] , \\ \amp \stackrel{\mb{r}_2 + \mb{r}_3}{\longrightarrow} \left[ \begin{array}{ccc|ccc} 1 \amp 0 \amp 0 \amp 1 \amp 0 \amp -2 \\ 0 \amp 1 \amp 0 \amp 3 \amp -2 \amp 1 \\ 0 \amp 0 \amp 1 \amp 1 \amp -1 \amp 1 \end{array} \right] , \end{align*}

One can easily check that the resulting matrix on the right

\begin{equation*} B = \left[ \begin{matrix} 1 \amp 0 \amp -2 \\ 3 \amp -2 \amp 1 \\ 1 \amp -1 \amp 1 \end{matrix} \right] \end{equation*}

satisfies \(AB = I = BA\) verifying that it is the inverse of \(A\text{.}\)

To understand why this process works, we observe that the process of row reduction itself is already matrix arithmetic.

Subsection 2.7.1 Elementary Matrices

To each of the elementary row operations, we can construct an elementary square matrix. To write formula for these matrices, we will use the following notation. The \(n \times n\) matrix with \((i,j)\)-entry \(1\) and all other entries \(0\) will be denoted \(\mb{e}_{ij}\text{.}\)

Type I: This elementary matrix is obtained by switching two rows of the identity matrix. It can be written as
\begin{equation*} E_{ij} = I_n - \mb{e}_{ii} - \mb{e}_{jj} + \mb{e}_{ij} + \mb{e}_{ji}. \end{equation*}
To explain this equation, one sees that subtracting the first two \(\mb{e}\) matrices makes the \(i\) and \(j\)-th diagonal entries \(0\) while adding the last two puts two off diagonal \(1\)’s in the \((i,j)\) and \((j,i)\) spots.
Type II: This elementary matrix is obtained by taking the identity matrix and replacing the \((i,i)\)-entry of \(1\) with some non-zero constant \(c\text{.}\)
\begin{equation*} E_i (c) = I_n + (c - 1) \mb{e}_{ii}. \end{equation*}
Type III: This elementary matrix is obtained by placing exactly one non-zero number \(c\) in the \((i,j)\)-entry where \(i \ne j\) giving
\begin{equation*} E_{ij} (c) = I_n + c \mb{e}_{ij}. \end{equation*}

We record the following fact which can be checked by hand.

Proposition 2.7.5.

Performing a Type I, II or III elementary row operation on a matrix \(A\) is equivalent to multiplying \(A\) on the left by a Type I, II or III elementary matrix. Furthermore, each elementary matrix is invertible and

\begin{align*} E_{ij}^{-1} \amp = E_{ij},\\ E_i (c)^{-1} \amp = E_i (c^{-1}),\\ E_{ij} (c)^{-1} \amp = E_{ij} (-c). \end{align*}

What we obtain from this viewpoint on row reduction along with our previous results is the following very nice proposition.

Proposition 2.7.6.

A matrix is invertible if and only if it is the product of elementary matrices.

Proof.

Let \(A\) be an \(n \times n\) matrix. We saw in Proposition 2.7.3 that it is invertible if and only if its reduced row echelon form is the identity. We see from Proposition 2.7.5 that this is the case if and only if there are elementary matrices \(E_1, E_2, \ldots, E_r\) so that

\begin{equation} E_r E_{r - 1} \cdots E_2 E_1 A = I. \tag{2.7.1} \end{equation}

But since every elementary matrix has inverse that is an elementary matrix, such an equation is possible if and only if

\begin{equation*} A = E_1^{-1} \cdots E_r^{-1}. \end{equation*}

We also see from this proposition why our augmented matrix approach to computing inverse matrices works. As we row reduce \(A\text{,}\) the augmented reduction of the identity is keeping track of the products of the elementary matrices \(E_j \cdots E_1\text{.}\) Thus when we are finished, on the right hand side we simply obtain the full product \((E_r \cdots E_1)\) which is, by equation (2.7.1) the inverse of \(A\text{.}\)

Subsection 2.7.2 Determinants

The first thing to understand about determinants is that, despite being included in every linear algebra course ever given, they really don’t belong to the subject of linear algebra at all. Rather, they naturally sit in a subject known as multi-linear algebra. That is because they are, in fact, multi-linear functions, not linear functions. While no one should be worried, I say this only to warn the student that the determinant is not linear and may reasonably appear rather strange. In fact, it is helpful in this context to view an \(m \times n\)-matrix \(A\) as a column of rows

\begin{equation*} A = \left[ \begin{matrix} - \amp \mb{r}_1 \amp - \\ \amp \vdots \amp \\ - \amp \mb{r}_i \amp - \\ \amp\vdots \amp \\ - \amp \mb{r}_m \amp - \end{matrix} \right] \end{equation*}

and if \(f : M_{m, n} (K) \to K\) is a function on matrices, write

\begin{equation*} f(\mb{r}_1 , \ldots, \mb{r}_m) := f(A). \end{equation*}

This notation is much cleaner than writing out the matrix each time and we will use it frequently in this section.

Definition 2.7.7.

Let \(M_{m,n} (K)\) be the set of all \(m \times n\) matrices with entries in \(K\text{.}\) A function

\begin{equation*} f : M_{m,n} (K) \to K \end{equation*}

is called:

row linear: if, keeping all rows except the \(i\)-th row constant, \(f\) is linear on the \(i\)-th row (for any \(i\)). In other words if
\begin{align*} f(\mb{r}_1 , \ldots, a \mb{r}_i \amp + b \mb{r}_i^\prime, \ldots ,\mb{r}_m) \\ \amp \parallel \\ a f(\mb{r}_1 , \ldots, \mb{r}_i , \ldots, \mb{r}_m) \amp + b f(\mb{r}_1 , \ldots, \mb{r}_i^\prime, \ldots, \mb{r}_m) \end{align*}
alternating: if \(f\) reverses sign whenever two rows are switched. In other words
\begin{align*} f(\mb{r}_1 , \ldots,\mb{r}_i, \amp \ldots , \mb{r}_j, \ldots , \mb{r}_m) \\ \parallel \\ - f(\mb{r}_1 , \ldots,\mb{r}_j, \amp \ldots , \mb{r}_i, \ldots ,\mb{r}_m) \end{align*}

Now let us define the determinant.

Definition 2.7.8.

The determinant is the unique function

\begin{equation*} \det : M_{n,n} (K) \to K \end{equation*}

which is row linear, alternating and satisfies \(\det (I) = 1\text{.}\)

Actually, this definition is also a theorem in the sense that it is saying both that such a function exists and that it is unique. Before showing either of these two facts, we mention an extremely important point.

Remark 2.7.9.

While we define the determinant abstractly so that we may find many of its important properties, the determinant should be known as a way to correctly compute and work with volume in \(n\)-dimensions. We will see this shortly in \(2\) and \(3\) dimensions. This is of particular importance when we consider integrals in higher dimensions.

Now we give the common inductive construction of the determinant. We do this in a step-by-step manner.

Step 1: Notice that the function \(D_1 : M_{1,1} (K) \to K\) given by \(\det ([a]) = a\) satisfies the properties of Definition 2.7.8.
Step 2: Now assume there is a function \(D_{n - 1} : M_{n - 1, n - 1} (K) \to K\text{.}\) If \(A = (a_{ij})\) is an \(n \times n\)-matrix write \(A_{ij}\) for the \((n - 1) \times (n - 1)\)-matrix obtained by forgetting the \(i\)-th row and \(j\)-th column. The \((i,j)\)-minor, denoted \(M_{ij}\text{,}\) is the \(D_{n - 1} (A_{ij})\text{.}\) The \((i,j)\)-cofactor is simply \(C_{ij} = (-1)^{i + j} M_{ij}\text{.}\)
Step 3: With this notation in mind, define
\begin{equation*} D_n ( A ) = a_{11} C_{11} + a_{12} C_{12} + \cdots + a_{1n} C_{1n} . \end{equation*}

The following theorem establishes that this function \(D\) is the determinant.

Theorem 2.7.10.

The function \(D\) given is the unique function satisfying the properties of the determinant in Definition 2.7.8. It is thus written as \(\det\text{.}\)

Repeating steps 1 through 3 will repeatedly will give a function from \(M_{n,n} (K)\) to \(K\text{.}\) Before proving this theorem, let us use this procedure to make some computations and get a feel for it.

Example 2.7.11. Determinants of \(2\times 2\) matrices.

For \(2 \times 2\) matrices, we can write down a simple formula which many of you already know. Take

\begin{equation*} A = \left[ \begin{matrix} a \amp b \\ c \amp d \end{matrix} \right] \end{equation*}

and observe that \(C_{11} = (-1)^{1 + 1} D([d]) = d\) while \(C_{12} = (-1)^{1 + 2} D([c]) = -c\text{.}\) So

\begin{equation*} \det (A) = a d - bc. \end{equation*}

It is common in multivariable calculus courses to also learn the formula (or variants thereof) for \(3 \times 3\) matrices. For now, let us use the previous example and compute a numerical case.

Example 2.7.12. Determinants of \(3\times 3\) matrices.

Finding the determinant of a \(3 \times 3\) involves finding three \(2 \times 2\) cofactors. Taking

\begin{equation*} A = \left[ \begin{matrix} 3 \amp 1 \amp -2 \\ 5 \amp 2 \amp 1 \\ -1 \amp 1 \amp 0 \end{matrix} \right] \end{equation*}

we can compute the minors \(M_{1,1}\text{,}\) \(M_{1,2}\) and \(M_{1,3}\) using the formula in Example 2.7.11 as

\begin{align*} M_{1,1} \amp = \det \left( \left[ \begin{matrix} 2 \amp 1 \\ 1 \amp 0 \end{matrix} \right] \right)= -1,\\ M_{1,2} \amp = \det \left( \left[ \begin{matrix} 5 \amp 1 \\ -1 \amp 0 \end{matrix} \right] \right) = 1,\\ M_{1,3} \amp = \det \left( \left[ \begin{matrix} 5 \amp 2 \\ -1 \amp 1 \end{matrix} \right] \right) = 7. \end{align*}

Then our inductive formula gives

\begin{align*} \det (A) \amp = 3 \cdot (-1)^{1 + 1} M_{1,1} + 1 \cdot(-1)^{1 + 2} M_{1,2} + (-2) \cdot(-1)^{1 + 3} M_{1,3}, \\ \amp = -3 - 1 - 14, \\ \amp = -18. \end{align*}

In fact, there is a formula for \(3 \times 3\) determinants that, were one so inclined, could be memorized.

\begin{equation*} \det \left( \left[ \begin{matrix} a_1 \amp a_2 \amp a_3 \\ b_1 \amp b_2 \amp b_3 \\ c_1 \amp c_2 \amp c_3 \end{matrix} \right] \right) = a_1 b_2 c_3 - a_1 b_3 c_2 + a_2 b_3 c_1 - a_2 b_1 c_3 + a_3 b_1 c_2 - a_3 b_2 c_1. \end{equation*}

You might observe that the formula for \(2 \times 2\) determinants had two terms and \(3 \times 3\) had six. Were you to write out the formula for \(4 \times 4\) matrices, you’d find there are \(24\) terms. In fact, the number of terms in the \(n \times n\) determinant is \(n!\text{,}\) which grows quite quickly. These formulas can be of great use to computers and humans alike for smaller matrices, but for large ones (which occur frequently in applications), they are much less helpful. Nevertheless, there are other ways to compute the determinant of a matrix (for example, it can be done by row reduction) that are faster. More importantly, what we learn by understanding the properties of the determinant gives us great insight into many linear algebra problems.

Subsection 2.7.3 Proof of Theorem 2.7.10

Before verifying that this function satisfies the properties in Definition 2.7.8, we give an alternative characterization of the alternating property.

Lemma 2.7.13.

A row linear function \(f : M_{m, n} (K) \to K\) is alternating if and only if \(f (A) = 0\) whenever \(A\) has two identical rows.

Proof.

Assume \(f\) is alternating and \(A\) has two identical rows. Then switching these rows still gives \(A\) which implies \(f(A) = - f(A)\) or \(f(A) = 0\text{.}\) Conversely, suppose \(f\) is row linear and \(f(A)\) is zero whenever \(A\) has two identical rows. Then

\begin{align*} 0 \amp = f(\mb{r}_1 , \ldots,\mb{r}_i + \mb{r}_j, \ldots ,\mb{r}_i + \mb{r}_j, \ldots \mb{r}_n),\\ \amp = f(\mb{r}_1 , \ldots,\mb{r}_i, \ldots , \mb{r}_i, \ldots \mb{r}_n) + f(\mb{r}_1 , \ldots,\mb{r}_i, \ldots , \mb{r}_j, \ldots \mb{r}_n) + \cdots,\\ \amp \cdots + f(\mb{r}_1 , \ldots,\mb{r}_j, \ldots , \mb{r}_i, \ldots \mb{r}_n) + f(\mb{r}_1 , \ldots,\mb{r}_j, \ldots , \mb{r}_j, \ldots \mb{r}_n),\\ \amp = f(\mb{r}_1 , \ldots,\mb{r}_i, \ldots , \mb{r}_j, \ldots \mb{r}_n) + f(\mb{r}_1 , \ldots,\mb{r}_j, \ldots , \mb{r}_i, \ldots \mb{r}_n) \end{align*}

Thus

\begin{equation*} f(\mb{r}_1 , \ldots,\mb{r}_i, \ldots , \mb{r}_j, \ldots \mb{r}_n) = - f(\mb{r}_1 , \ldots,\mb{r}_j, \ldots , \mb{r}_i, \ldots \mb{r}_n) \end{equation*}

and \(f\) is alternating.

Proof of the existence portion of Theorem 2.7.10.

We will use induction on \(n\) to show that \(D_n\) satisfies the properties in Definition 2.7.8.

Row Linear: To show row linearity, we notice that since \(D_{n - 1}\) is row linear, the cofactors \(C_{1j}\) are row linear. By this we mean that if we denote the rows of \(n \times n\) matrices as \(\mb{r}_1 , \ldots, \mb{r}_n\text{,}\) then

\begin{equation*} C_{1j} (\mb{r}_2, \ldots, \mb{r}_{n}) := (-1)^{1 + j} D_{n - 1} (\mb{r}_2, \ldots, \mb{r}_{n}) \end{equation*}

is row linear. This implies that the formula

\begin{equation*} D_n (\mb{r}_1, \mb{r}_2, \ldots, \mb{r}_{n}) = a_{11} C_{11} (\mb{r}_2, \ldots, \mb{r}_{n}) + \cdots + a_{1n} C_{1j} (\mb{r}_2, \ldots, \mb{r}_{n}) \end{equation*}

is row linear in the last \((n - 1)\) rows. However, we can notice that this formula can also be written as the matrix product

\begin{equation*} D_n (\mb{r}_1, \mb{r}_2, \ldots, \mb{r}_{n}) = \mb{r}_1 \cdot \left[ \begin{matrix} C_{11} \\ C_{12} \\ \vdots \\ C_{1n} \end{matrix} \right] \end{equation*}

which is clearly linear in the first row.
Normalized: We call the property that \(D_n (I_n) = 1\) a normalization property. To prove it, we can assume that it holds for \(D_{n - 1}\text{.}\) Now, notice that the minor \((I_n)_{11} = I_{n - 1}\) while the minors \((I_n)_{1j}\) has all zeros in the \((j - 1)\)-st row. But if there is a row equal to zero, say \(\mb{r}_i = \mb{0}\text{,}\) and \(f\) is row linear, then

\begin{align} f(\mb{r}_1, \ldots , \mb{0} , \ldots, \mb{r}_n) \amp = f(\mb{r}_1, \ldots , \mb{0} + \mb{0} , \ldots, \mb{r}_n),\tag{2.7.2}\\ \amp = f(\mb{r}_1, \ldots , \mb{0} , \ldots, \mb{r}_n) + f(\mb{r}_1, \ldots , \mb{0} , \ldots, \mb{r}_n). \tag{2.7.3} \end{align}

So subtracting gives \(f(\mb{r}_1, \ldots, \mb{0}, \mb{r}_n) = 0\text{.}\) Thus, since we have shown that \(D_n\) is row linear, we have that the cofactors \(C_{1j}\) of the identity are \(1\) for \(j = 1\) and \(0\) otherwise. But then

\begin{equation*} D_n (I_n) = 1 \cdot C_{11} + 0 \cdot C_{12} + \cdots + 0 \cdot C_{1n} = 1. \end{equation*}
Alternating: To show the alternating property, we may assume inductively that \(D_{n -1}\) satisfies this property. This implies the cofactors

\begin{equation*} C_{1i} = (-1)^{1 + i} D_{n - 1} (\mb{r}_2 , \ldots, \mb{r}_n) \end{equation*}

are also alternating and in turn, switching any two of the last \(n - 1\) rows for \(D_n\) does result in multiplying by \(-1\text{.}\) Thus, we need only show that if we switch \(\mb{r}_i\) with the first row \(\mb{r}_1\) then \(D_n\) switches signs. Using the alternating property for the last \(n - 1\) rows, we may assume that \(i = 2\text{.}\) Furthermore, using the formula in Example 2.7.11 we see that the alternating property holds for \(2 \times 2\) matrices, so we may assume \(n > 2\text{.}\) Finally, using row linearity, it suffices to show this property when \(\mb{r}_1 = \mb{e}_j\) and \(\mb{r}_2 = \mb{e}_k\) are standard basis row vectors.

In the case where \(j = k\text{,}\) one sees that the minor \(M_{1j}\) is the \(D_{n - 1}\) of a matrix with first row zero. So by the argument given above in equation (2.7.3), we have \(C_{1j} = (-1)^{1 + j} M_{1j} = 0\text{.}\) But as all other entries on the first row \(\mb{r}_1 = \mb{e}_j\) are zero, we then have \(D_n (\mb{r}_1, \ldots, \mb{r}_n) = 0\) so that switching the first two rows does act as multiplying the number \(0\) by \(-1\text{.}\)

Now assume we have two distinct indices, \(1 \leq j \leq n\) and \(1 \leq k \leq n\) and take \(A (j,k)\) to be the submatrix of \(A\) obtained by eliminating the first two rows and the \(j\)-th and \(k\)-the columns. A quick look shows then that if \(j \lt k\)

\begin{equation*} D_n (\mb{e}_j, \mb{e}_k, \mb{r}_3 , \ldots, \mb{r}_n) = (-1)^{1 + j} (-1)^{1 + (k - 1)} D_{n - 2} (A (j,k)). \end{equation*}

Indeed, one must subtract \(1\) from \(k\) because, in the submatrix \(A_{1j}\) obtained by eliminating the first row and the \(j\)-th column, the index of the entry \(a_{2k}\) goes down by one. On the other hand, this does not occur if we switch the order (again assuming \(j \lt k\) ) so we get

\begin{equation*} D_n (\mb{e}_k, \mb{e}_j, \mb{r}_3 , \ldots, \mb{r}_n) = (-1)^{1 + k} (-1)^{1 + j} D_{n - 2} (A (j,k)). \end{equation*}

The difference in the sign justifies that switching the first two rows does indeed multiply the value of \(D_n\) by \(-1\text{.}\)

Some comments on this proof are in order. First, we note that at this point, we have only shown that a function satisfying the properties exists, but not that there is only one. Second, we could have quite easily used the last row instead of the first row when defining the \(D_n\text{.}\) In fact, we can use any row and write the formula

\begin{equation} D_n (A) = a_{i1} C_{i1} + \cdots + a_{in} C_{in} . \tag{2.7.4} \end{equation}

This is still a valid formula and can be obtained by using the alternating property. Once we prove uniqueness, we see that each of these formulas yields the same number.

To see the uniqueness of the determinant, we return to our elementary matrices and row reduction.

Lemma 2.7.14.

If \(D\) is a function satisfying the conditions of Definition 2.7.8, then

\begin{align*} D (E_{ij} ) \amp = -1,\\ D (E_{i} (c)) \amp = c,\\ D (E_{ij} (c)) \amp = 1. \end{align*}

Furthermore, if \(A\) is any square matrix and \(E\) is any elementary matrix of the same size,

\begin{equation*} D( EA) = D(E) D (A). \end{equation*}

Proof.

We leave it to the student to work out the first set of equations. For the second, notice that if \(E\) is Type I then \(EA\) is just \(A\) with two rows switched so by the alternating property and the first equation \(D(EA) = - D(A) = D(E) D(A)\text{.}\) On the other hand, if \(E\) is Type II so that \(E = E_i (c)\) then \(E A\) multiplies the \(i\)-th row by \(c\text{.}\) So row linearity and the second equation gives \(D(E A ) = c D(A) = D(E) D(A)\text{.}\) Finally, if \(E\) is Type III and equals \(E_{ij} (c)\) we have that the \(i\)-th row of \(E A\) is \(\mb{r}_i + c\mb{r}_j\text{.}\) So using row linearity, Lemma 2.7.13, and the third equation we get

\begin{align*} D (EA) \amp = D(\mb{r}_1, \ldots, \mb{r}_i + c\mb{r}_j, \ldots , \mb{r}_j, \ldots, \mb{r}_n ), \\ \amp = D(\mb{r}_1, \ldots, \mb{r}_i , \ldots , \mb{r}_j, \ldots, \mb{r}_n ) + c D(\mb{r}_1, \ldots, \mb{r}_j, \ldots , \mb{r}_j, \ldots, \mb{r}_n ),\\ \amp = D(\mb{r}_1, \ldots, \mb{r}_i , \ldots , \mb{r}_j, \ldots, \mb{r}_n ), \\ \amp = D (A), \\ \amp = D(E) D(A). \end{align*}

We may follow this lemma up with one of the most important results concerning the determinant.

Lemma 2.7.15.

Suppose \(D\) satisfies the properties of Definition 2.7.8. Then \(A\) is invertible if and only if \(D(A) \ne 0\text{.}\)

Proof.

Let \(A^\prime\) be the reduced row echelon form of \(A\) so that there are elementary matrices \(E_1, \ldots, E_r\) so that

\begin{equation*} E_r \cdots E_1 A = A^\prime. \end{equation*}

Taking \(D\) of both sides and repeatedly applying Lemma 2.7.14 gives

\begin{equation*} D(E_r) \cdots D(E_1) D(A) = D(A^\prime). \end{equation*}

Noting that \(D (E_i) \ne 0\) we also have

\begin{equation*} D(A) = \frac{D(A^\prime)}{ D(E_r) \cdots D(E_1)}. \end{equation*}

By Proposition 2.7.3, \(A\) is invertible if and only if \(A^\prime = I\text{.}\) If this is the case, then the normalization property says the right hand side is \(1 / D(E_r) \cdots D(E_1) \ne 0\text{.}\) If not, then \(A^\prime\) must have a row of zeros (because it is a square matrix) which implies by the argument in equation (2.7.3) that \(D(A^\prime) = 0\text{.}\) This proves the statement.

These two lemmas immediately give our uniqueness claim.

Proof of uniqueness of Theorem 2.7.10.

If \(A\) is not invertible, then we must have \(D(A) = 0\) by Lemma 2.7.15. On the other hand, the proof of the same lemma showed that if \(A\) is invertible, then

\begin{equation*} D(A) = \frac{1}{D(E_r) \cdots D(E_1)} \end{equation*}

for a collection of an elementary matrices which row reduce \(A\text{.}\) However, Lemma 2.7.14 shows that for elementary matrices \(E\text{,}\) \(D(E)\) is determined by the properties of Definition 2.7.8. Thus there is no choice in defining \(D\) for an arbitrary \(A\text{.}\)

From now on, we will just use \(\det\) instead of \(D\text{.}\) We notice that the last proof also shows that if we row reduce a matrix, and keep track of our operations, we can compute the determinant. This is generally a quicker way to get determinants than using formulas for large matrices.

We end this section with a useful result concerning the determinant.

Proposition 2.7.16.

Given two \(n \times n\) matrices \(A\) and \(B\) then

\begin{equation*} \det (A B) = \det (A) \det (B). \end{equation*}

Proof.

If either \(A\) or \(B\) is not invertible, then \(AB\) is not invertible. Indeed, if \(C\) were an inverse of \((AB)\) then \(A (BC) = (AB) C = I\) and \((CA) B = C(AB) = I\) would show that \(A\) or \(B\) had an inverse (here you would need to observe that, for square matrices, a left or right inverse is an inverse which follows from Lemma 2.7.17 proven independently below). Thus in the non-invertible case both sides are zero. On the other hand, if \(A\) and \(B\) are invertible, then by Proposition 2.7.6, they are both products of elementary matrices

\begin{align*} A \amp = E_1 \cdots E_r,\\ B \amp = \tilde{E}_1 \cdots \tilde{E}_s. \end{align*}

So repeatedly using Lemma 2.7.14, we have

\begin{align*} \det (AB) \amp = \det (E_1 \cdots E_r\tilde{E}_1 \cdots \tilde{E}_s ) ,\\ \amp = \det (E_1) \cdots \det (E_r) \det (\tilde{E}_1 \cdots \tilde{E}_s ) ,\\ \amp = \det (E_1 \cdots E_r) \det (\tilde{E}_1 \cdots \tilde{E}_s ) ,\\ \amp = \det (A) \det (B). \end{align*}

Subsection 2.7.4 Formula for the Inverse

As a final result of this section, we note that the determinant also allows us to write out a formula for the inverse matrix when it exists. First we take the adjugate matrix of a square matrix \(A\) (which is sometimes incorrectly referred to as the adjoint) as the transpose matrix of cofactors

\begin{equation*} \textnormal{adj} (A) = \left[ \begin{matrix} C_{11} \amp C_{21} \amp \cdots \amp C_{n1} \\ C_{12} \amp C_{22} \amp \cdots \amp C_{n2} \\ \vdots \amp \amp \ddots \amp \vdots \\ C_{1n} \amp C_{2n} \amp \cdots \amp C_{nn} \end{matrix} \right] . \end{equation*}

One notes that if we multiply \(\text{adj} (A)\) on the left by the \(i\)-th row \(\mb{r}_i\) of \(A\text{,}\) we obtain the row vector \(\det (A) \mb{e}_i\) (check this). But this implies that

\begin{equation} A \, \cdot \textnormal{adj} (A) = \det (A) I .\tag{2.7.5} \end{equation}

Of course, if \(\det (A) = 0\text{,}\) this may not give us much to work with when it comes to finding an inverse. But otherwise we would like to conclude that

\begin{equation*} A^{-1} = \frac{1}{\det (A)} \textnormal{adj} (A) \hspace{.2in} \text{if }\det (A) \ne 0. \end{equation*}

However, one might worry whether this really works, since we only have that \(\frac{1}{\det (A)} \text{adj} (A)\) is a right inverse in the equation above. Happily, there is a simple argument to show that left and right inverses of square matrices must be the same.

Lemma 2.7.17.

If \(B\) is a right inverse or left inverse of a square matrix \(A\text{,}\) then it is the inverse of \(A\text{.}\)

Proof.

Since \(A\) is a square matrix, it represents a linear transformation from \(K^n\) to \(K^n\text{.}\) If it has a left inverse \(B\) so that \(BA = I\) then that transformation is one-to-one. But by Corollary 2.6.5 we have that this means it is an isomorphism. Thus it has an inverse linear transformation which can be represented by a two sided matrix inverse \(A^{-1}\text{.}\) But then since \(BA = I\) we may multiply on the right by \(A^{-1}\) and obtain \(B = A^{-1}\text{.}\) A similar argument applies for the right inverse.

Exercises 2.7.5 Exercises

1.

Determine if the matrix is invertible using row reduction (augment with the identity). If it is, find the inverse.

(a)

\begin{equation*} \left[ \begin{matrix} 0 \amp 1 \\ 1 \amp -2 \end{matrix} \right] \end{equation*}

(b)

\begin{equation*} \left[ \begin{matrix} 1 \amp 0 \amp 1 \\ 0 \amp -1 \amp 1 \\ 0 \amp 1 \amp -1 \end{matrix} \right] \end{equation*}

2.

Compute the determinants of the following matrices.

(a)

\begin{equation*} \left[ \begin{matrix} 1 \amp 2 \\ 3 \amp -1 \end{matrix} \right] \end{equation*}

(b)

\begin{equation*} \left[ \begin{matrix} 1 \amp 1 \amp 0 \\ -2 \amp 2 \amp 1 \\ 0 \amp 1 \amp -1 \end{matrix} \right] \end{equation*}

(c)

\begin{equation*} \left[ \begin{matrix} 1 \amp 0 \amp 1 \amp 0\\ 1 \amp 1 \amp 1 \amp 1 \\ 0 \amp 1 \amp 0 \amp 1 \\ 1 \amp 2 \amp 3 \amp 4 \end{matrix} \right] \end{equation*}

3.

Verify the first three equations in Lemma 2.7.14.

4.

Use Proposition 2.7.16 to show that if \(A\) is an invertible matrix then

\begin{equation*} \det (A^{-1} ) = \frac{1}{\det (A)}. \end{equation*}

5.

Let

\begin{equation*} A = \left[ \begin{matrix} a \amp b \\ c \amp d \end{matrix} \right] \end{equation*}

be the a \(2 \times 2\) matrix. Consider the column vectors \(\mb{u} = \twovec{a}{c}\) and \(\mb{v} = \twovec{b}{d}\) in \(\mathbb{R}^2\) and write \(\mathcal{P}\) for the parallelogram in \(\mathbb{R}^2\) spanned by \(\mb{u}\) and \(\mb{v}\) (this is the parallelogram with sides \(\mb{u}\text{,}\) \(\mb{v}\) and their translations). Show that

\begin{equation*} | \det (A) | = \text{Area} (\mathcal{P}). \end{equation*}

Hint.

First rotate \(\mb{u}\) and \(\mb{v}\) by multiplying by a rotation matrix so that \(\mb{u}\) is of the form \(\twovec{r}{0}\text{.}\) Use Proposition 2.7.16 to show this won’t change the determinant of \(A\) and observe that this does not change the area of the parallelogram. However, the new area and determinant is easy to calculate and compare.

Prev Top Next