A square $n \times n$ matrix $A$ with entries in a field $F$ is called diagonalizable or nondefective if there exists an $n \times n$ invertible matrix $P$ (i.e., an element of the general linear group $\mathrm{GL}_n(F)$) such that $P^{-1}AP$ is a diagonal matrix.
The fundamental fact about diagonalizable maps and matrices is expressed by the following:
The following sufficient (but not necessary) condition is often useful.
Let $A$ be a matrix over $F$. If $A$ is diagonalizable, then so is any power of it. Conversely, if $A$ is invertible, $F$ is algebraically closed, and $A^n$ is diagonalizable for some $n$ that is not an integer multiple of the characteristic of $F$, then $A$ is diagonalizable. Proof: If $A^n$ is diagonalizable, then $A$ is annihilated by some polynomial $\left(x^n - \lambda_1\right) \cdots \left(x^n - \lambda_k\right)$, where $\lambda_1, \dots, \lambda_k$ are the distinct eigenvalues of $A^n$. This polynomial has no multiple root (since $\lambda_j \neq 0$ and $n$ is not a multiple of the characteristic) and is divisible by the minimal polynomial of $A$. Hence the minimal polynomial of $A$ has no multiple root either, and since $F$ is algebraically closed it splits into distinct linear factors, so $A$ is diagonalizable.
Over the complex numbers $\mathbb{C}$, almost every matrix is diagonalizable. More precisely: the set of complex $n \times n$ matrices that are not diagonalizable over $\mathbb{C}$, considered as a subset of $\mathbb{C}^{n \times n}$, has Lebesgue measure zero. One can also say that the diagonalizable matrices form a dense subset with respect to the Zariski topology: the non-diagonalizable matrices lie inside the vanishing set of the discriminant of the characteristic polynomial, which is a hypersurface. Density in the usual (strong) topology given by a norm follows from this as well. The same is not true over $\mathbb{R}$.
The Jordan–Chevalley decomposition expresses an operator as the sum of its semisimple (i.e., diagonalizable) part and its nilpotent part. Hence, a matrix is diagonalizable if and only if its nilpotent part is zero. Put another way, a matrix is diagonalizable if and only if each block in its Jordan form has no nilpotent part; i.e., each block is a one-by-one matrix.
See also: Eigendecomposition of a matrix
Consider two arbitrary bases $E = \{\boldsymbol{e}_i \mid i \in [n]\}$ and $F = \{\boldsymbol{\alpha}_i \mid i \in [n]\}$. Suppose that a linear transformation is represented by a matrix $A_E$ written with respect to the basis $E$. Suppose also that the following eigen-equation holds:
$$A_E \boldsymbol{\alpha}_{E,i} = \lambda_i \boldsymbol{\alpha}_{E,i}$$
The eigenvectors $\boldsymbol{\alpha}_{E,i}$ are also written with respect to the basis $E$. Since the set $F$ consists of eigenvectors of $A$ and is a basis of the vector space, there exists a diagonal matrix $D_F$ that is similar to $A_E$. In other words, $A_E$ is diagonalizable: the transformation it represents has a diagonal matrix when written in the basis $F$. We perform the change of basis using the transition matrix $S$, which changes basis from $E$ to $F$, as follows:
$$D_F = S_E^F \, A_E \, \left(S_E^F\right)^{-1},$$
where $S_E^F$ is the transition matrix from the $E$-basis to the $F$-basis. The inverse can then be equated to a new transition matrix $P$, which changes basis from $F$ to $E$ instead, so we have the following relationship:
$$\left(S_E^F\right)^{-1} = P_F^E$$
Both transition matrices $S$ and $P$ are invertible. Thus we can manipulate the matrices in the following fashion:
$$\begin{aligned} D &= S \, A_E \, S^{-1} \\ D &= P^{-1} \, A_E \, P \end{aligned}$$
The matrix $A_E$ will be denoted as $A$, which is still in the $E$-basis. Similarly, the diagonal matrix is in the $F$-basis.
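If a matrix $A$ can be diagonalized, that is,
$$P^{-1}AP = D = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix},$$
then:
$$AP = PD.$$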
The transition matrix $S$ has the $E$-basis vectors as columns written in the basis $F$. Conversely, the inverse transition matrix $P$ has the $F$-basis vectors $\boldsymbol{\alpha}_i$, written in the basis $E$, as its columns, so we can represent $P$ in block matrix form in the following manner:
$$P = \begin{bmatrix} \boldsymbol{\alpha}_{E,1} & \boldsymbol{\alpha}_{E,2} & \cdots & \boldsymbol{\alpha}_{E,n} \end{bmatrix}.$$
As a result, we can write:
$$A \begin{bmatrix} \boldsymbol{\alpha}_{E,1} & \boldsymbol{\alpha}_{E,2} & \cdots & \boldsymbol{\alpha}_{E,n} \end{bmatrix} = \begin{bmatrix} \boldsymbol{\alpha}_{E,1} & \boldsymbol{\alpha}_{E,2} & \cdots & \boldsymbol{\alpha}_{E,n} \end{bmatrix} D.$$
In block matrix form, we can consider the matrix $A$ to be a matrix of dimensions $1 \times 1$, whilst $P$ is a $1 \times n$ matrix. The matrix $D$ can be written in full form with all the diagonal elements as an $n \times n$ matrix:
$$A \begin{bmatrix} \boldsymbol{\alpha}_{E,1} & \boldsymbol{\alpha}_{E,2} & \cdots & \boldsymbol{\alpha}_{E,n} \end{bmatrix} = \begin{bmatrix} \boldsymbol{\alpha}_{E,1} & \boldsymbol{\alpha}_{E,2} & \cdots & \boldsymbol{\alpha}_{E,n} \end{bmatrix} \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix}.$$
Performing the above matrix multiplication, we end up with the following result:
$$A \begin{bmatrix} \boldsymbol{\alpha}_1 & \boldsymbol{\alpha}_2 & \cdots & \boldsymbol{\alpha}_n \end{bmatrix} = \begin{bmatrix} \lambda_1 \boldsymbol{\alpha}_1 & \lambda_2 \boldsymbol{\alpha}_2 & \cdots & \lambda_n \boldsymbol{\alpha}_n \end{bmatrix}$$
Taking each component of the block matrix individually on both sides, we end up with the following:
$$A \boldsymbol{\alpha}_i = \lambda_i \boldsymbol{\alpha}_i \qquad (i = 1, 2, \dots, n).$$
So the column vectors of $P$ are right eigenvectors of $A$, and the corresponding diagonal entry of $D$ is the corresponding eigenvalue. The invertibility of $P$ also implies that the eigenvectors are linearly independent and form a basis of $F^n$. This is the necessary and sufficient condition for diagonalizability and the canonical approach to diagonalization. The row vectors of $P^{-1}$ are the left eigenvectors of $A$.
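As a minimal numerical sketch of this correspondence, the snippet below applies NumPy's `numpy.linalg.eig` to a small matrix chosen purely for illustration (the matrix `A` and all variable names are assumptions, not part of the text). It checks that the columns of $P$ are right eigenvectors and that the rows of $P^{-1}$ are left eigenvectors.

```python
import numpy as np

# Illustrative 2x2 matrix (an assumption for this sketch); its eigenvalues are 2 and 5.
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

# eig returns the eigenvalues and a matrix whose columns are right eigenvectors.
eigenvalues, P = np.linalg.eig(A)
D = np.diag(eigenvalues)
P_inv = np.linalg.inv(P)

# Columns of P are right eigenvectors: A P = P D.
right_ok = np.allclose(A @ P, P @ D)

# Rows of P^{-1} are left eigenvectors: P^{-1} A = D P^{-1}.
left_ok = np.allclose(P_inv @ A, D @ P_inv)

# Consequently P^{-1} A P is the diagonal matrix of eigenvalues.
diag_ok = np.allclose(P_inv @ A @ P, D)
```

The order of the eigenvalues returned by `eig` is not guaranteed, so the checks compare against `D` built from the same call rather than against a fixed diagonal.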
When a complex matrix $A \in \mathbb{C}^{n \times n}$ is a Hermitian matrix (or more generally a normal matrix), eigenvectors of $A$ can be chosen to form an orthonormal basis of $\mathbb{C}^n$, and $P$ can be chosen to be a unitary matrix. If, in addition, $A \in \mathbb{R}^{n \times n}$ is a real symmetric matrix, then its eigenvectors can be chosen to be an orthonormal basis of $\mathbb{R}^n$ and $P$ can be chosen to be an orthogonal matrix.
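The Hermitian case can be sketched numerically with `numpy.linalg.eigh`, which is intended for Hermitian or real symmetric input. The matrix `H` below is an arbitrary illustrative choice, not one taken from the text.

```python
import numpy as np

# Illustrative Hermitian matrix (an assumption for this sketch).
H = np.array([[2.0, 1.0 - 1.0j],
              [1.0 + 1.0j, 3.0]])

# eigh returns real eigenvalues in ascending order and orthonormal
# eigenvectors as the columns of U.
eigenvalues, U = np.linalg.eigh(H)

# U is unitary: U* U = I.
is_unitary = np.allclose(U.conj().T @ U, np.eye(2))

# U* H U is the diagonal matrix of eigenvalues.
is_diagonalized = np.allclose(U.conj().T @ H @ U, np.diag(eigenvalues))
```

For a real symmetric input the same call returns a real orthogonal matrix, so `U.T` plays the role of the inverse.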
For most practical work matrices are diagonalized numerically using computer software. Many algorithms exist to accomplish this.
See also: Simultaneous triangularisability, Weight (representation theory), and Positive definite matrix
A set of matrices is said to be simultaneously diagonalizable if there exists a single invertible matrix $P$ such that $P^{-1}AP$ is a diagonal matrix for every $A$ in the set. The following theorem characterizes simultaneously diagonalizable matrices: A set of diagonalizable matrices commutes if and only if the set is simultaneously diagonalizable.[1]: p. 64
The set of all $n \times n$ diagonalizable matrices (over $\mathbb{C}$) with $n > 1$ is not simultaneously diagonalizable. For instance, the matrices
are diagonalizable but not simultaneously diagonalizable because they do not commute.
A set consists of commuting normal matrices if and only if it is simultaneously diagonalizable by a unitary matrix; that is, there exists a unitary matrix $U$ such that $U^{*}AU$ is diagonal for every $A$ in the set.
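A small numerical sketch of simultaneous diagonalization, under the assumption of two hypothetical commuting real symmetric matrices `A` and `B` (chosen for illustration only): because `A` has distinct eigenvalues, its orthonormal eigenvectors also diagonalize `B`.

```python
import numpy as np

# Two real symmetric (hence normal) matrices, chosen for illustration only.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
B = np.array([[3.0, -1.0],
              [-1.0, 3.0]])

# They commute, so the theorem predicts a common unitary diagonalizer.
commute = np.allclose(A @ B, B @ A)

# A has distinct eigenvalues (1 and 3), so its orthonormal eigenvectors are
# determined up to sign and must also be eigenvectors of B.
_, U = np.linalg.eigh(A)

def is_diagonal(M):
    """Return True if all off-diagonal entries of M are numerically zero."""
    return np.allclose(M, np.diag(np.diag(M)))

A_diag = is_diagonal(U.T @ A @ U)
B_diag = is_diagonal(U.T @ B @ U)
```

This shortcut relies on `A` having distinct eigenvalues. When eigenvalues repeat, the eigenvectors returned for one matrix need not diagonalize the other, and a common eigenbasis has to be constructed within each shared eigenspace.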
In the language of Lie theory, a set of simultaneously diagonalizable matrices generates a toral Lie algebra.
In general, a rotation matrix is not diagonalizable over the reals, but all rotation matrices are diagonalizable over the complex field. Even if a matrix is not diagonalizable, it is always possible to "do the best one can", and find a matrix with the same properties consisting of eigenvalues on the leading diagonal, and either ones or zeroes on the superdiagonal – known as Jordan normal form.
Some matrices are not diagonalizable over any field, most notably nonzero nilpotent matrices. This happens more generally if the algebraic and geometric multiplicities of an eigenvalue do not coincide. For instance, consider
This matrix is not diagonalizable: there is no matrix $U$ such that $U^{-1}CU$ is a diagonal matrix. Indeed, $C$ has one eigenvalue (namely zero) and this eigenvalue has algebraic multiplicity 2 and geometric multiplicity 1.
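The multiplicity argument can be checked numerically. The sketch below assumes one particular nonzero nilpotent $2 \times 2$ matrix as a stand-in (any matrix fitting the description would behave the same way) and computes the geometric multiplicity of the eigenvalue zero as $n - \operatorname{rank}(C)$.

```python
import numpy as np

# A nonzero nilpotent 2x2 matrix, assumed here as a stand-in example.
C = np.array([[0.0, 1.0],
              [0.0, 0.0]])
n = C.shape[0]

# C is nilpotent: C^2 = 0, so its only eigenvalue is 0, with algebraic multiplicity 2.
is_nilpotent = np.allclose(C @ C, np.zeros((n, n)))
algebraic_multiplicity = n  # the characteristic polynomial is lambda^2

# Geometric multiplicity of eigenvalue 0 = dim ker(C - 0*I) = n - rank(C).
geometric_multiplicity = n - np.linalg.matrix_rank(C)

# Diagonalizable only if the two multiplicities agree for every eigenvalue.
diagonalizable = geometric_multiplicity == algebraic_multiplicity
```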
Some real matrices are not diagonalizable over the reals. Consider for instance the matrix
$$B = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}.$$
The matrix $B$ does not have any real eigenvalues, so there is no real matrix $Q$ such that $Q^{-1}BQ$ is a diagonal matrix. However, we can diagonalize $B$ if we allow complex numbers. Indeed, if we take
then $Q^{-1}BQ$ is diagonal. One can check that $B$ is the rotation matrix which rotates counterclockwise by the angle $\theta = -\frac{\pi}{2}$.
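A short numerical sketch of this example: $B$ is built from the stated rotation angle $\theta = -\frac{\pi}{2}$, and a complex diagonalizing matrix is obtained from `numpy.linalg.eig`. The matrix `Q` computed here is whatever NumPy returns and is not claimed to equal the particular $Q$ intended in the text.

```python
import numpy as np

theta = -np.pi / 2
# Counterclockwise rotation by theta; for theta = -pi/2 this is [[0, 1], [-1, 0]]
# up to floating-point rounding.
B = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# For this real input, eig returns a complex-conjugate pair of eigenvalues.
eigenvalues, Q = np.linalg.eig(B)

# Neither eigenvalue is real: they are +i and -i.
no_real_eigenvalues = bool(np.all(np.abs(eigenvalues.imag) > 0.5))

# Over the complex numbers, Q^{-1} B Q is diagonal.
D = np.linalg.inv(Q) @ B @ Q
is_diagonal = np.allclose(D, np.diag(eigenvalues))
```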
Note that the above examples show that the sum of diagonalizable matrices need not be diagonalizable.
Diagonalizing a matrix is the same process as finding its eigenvalues and eigenvectors, in the case that the eigenvectors form a basis. For example, consider the matrix
$$A = \begin{bmatrix} 0 & 1 & -2 \\ 0 & 1 & 0 \\ 1 & -1 & 3 \end{bmatrix}.$$
The roots of the characteristic polynomial $p(\lambda) = \det(\lambda I - A)$ are the eigenvalues $\lambda_1 = 1, \lambda_2 = 1, \lambda_3 = 2$. Solving the linear system $(1I - A)\mathbf{v} = \mathbf{0}$ gives the eigenvectors $\mathbf{v}_1 = (1, 1, 0)$ and $\mathbf{v}_2 = (0, 2, 1)$, while $(2I - A)\mathbf{v} = \mathbf{0}$ gives $\mathbf{v}_3 = (1, 0, -1)$; that is, $A\mathbf{v}_i = \lambda_i \mathbf{v}_i$ for $i = 1, 2, 3$. These vectors form a basis of $V = \mathbb{R}^3$, so we can assemble them as the column vectors of a change-of-basis matrix $P$ to get:
$$P^{-1}AP = \begin{bmatrix} 1 & 0 & 1 \\ 1 & 2 & 0 \\ 0 & 1 & -1 \end{bmatrix}^{-1} \begin{bmatrix} 0 & 1 & -2 \\ 0 & 1 & 0 \\ 1 & -1 & 3 \end{bmatrix} \begin{bmatrix} 1 & 0 & 1 \\ 1 & 2 & 0 \\ 0 & 1 & -1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{bmatrix} = D.$$
We may see this equation in terms of transformations: $P$ takes the standard basis to the eigenbasis, $P\mathbf{e}_i = \mathbf{v}_i$, so we have:
$$P^{-1}AP\mathbf{e}_i = P^{-1}A\mathbf{v}_i = P^{-1}(\lambda_i \mathbf{v}_i) = \lambda_i \mathbf{e}_i,$$
so that $P^{-1}AP$ has the standard basis as its eigenvectors, which is the defining property of $D$.
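The worked example can be verified numerically. The sketch below takes $A$, the eigenvectors, and the eigenvalues exactly as given in the equation above; only the variable names are assumptions.

```python
import numpy as np

# Matrix A and eigenvector matrix P taken from the worked example.
A = np.array([[0.0, 1.0, -2.0],
              [0.0, 1.0, 0.0],
              [1.0, -1.0, 3.0]])
P = np.array([[1.0, 0.0, 1.0],
              [1.0, 2.0, 0.0],
              [0.0, 1.0, -1.0]])
eigenvalues = np.array([1.0, 1.0, 2.0])

# Each column v_i of P satisfies A v_i = lambda_i v_i.
columns_are_eigenvectors = all(
    np.allclose(A @ P[:, i], eigenvalues[i] * P[:, i]) for i in range(3)
)

# P is invertible (nonzero determinant), so the eigenvectors form a basis.
P_invertible = not np.isclose(np.linalg.det(P), 0.0)

# Solving P X = A P gives X = P^{-1} A P without forming the inverse explicitly.
D = np.linalg.solve(P, A @ P)
```

Using `numpy.linalg.solve` instead of an explicit inverse is a common numerical habit; for a matrix this small either approach gives the same result.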
Note that there is no preferred order of the eigenvectors in $P$; changing the order of the eigenvectors in $P$ just changes the order of the eigenvalues in the diagonalized form of $A$.[2]
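Diagonalization can be used to efficiently compute the powers of a matrix $A = PDP^{-1}$:
$$A^k = \left(PDP^{-1}\right)^k = \left(PDP^{-1}\right)\left(PDP^{-1}\right) \cdots \left(PDP^{-1}\right) = PD\left(P^{-1}P\right)D\left(P^{-1}P\right) \cdots \left(P^{-1}P\right)DP^{-1} = PD^kP^{-1},$$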
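and the latter is easy to calculate since it only involves the powers of a diagonal matrix. For example, for the matrix $A$ with eigenvalues $\lambda = 1, 1, 2$ in the example above we compute:
$$A^k = PD^kP^{-1} = \begin{bmatrix} 1 & 0 & 1 \\ 1 & 2 & 0 \\ 0 & 1 & -1 \end{bmatrix} \begin{bmatrix} 1^k & 0 & 0 \\ 0 & 1^k & 0 \\ 0 & 0 & 2^k \end{bmatrix} \begin{bmatrix} 1 & 0 & 1 \\ 1 & 2 & 0 \\ 0 & 1 & -1 \end{bmatrix}^{-1} = \begin{bmatrix} 2 - 2^k & -1 + 2^k & 2 - 2^{k+1} \\ 0 & 1 & 0 \\ -1 + 2^k & 1 - 2^k & -1 + 2^{k+1} \end{bmatrix}.$$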
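The power formula $A^k = PD^kP^{-1}$ can be checked numerically for the same example. In the sketch below, the helper name `power_via_diagonalization` and the choice $k = 10$ are illustrative assumptions; the result is compared against NumPy's repeated-multiplication routine `numpy.linalg.matrix_power`.

```python
import numpy as np

# A, P and the eigenvalues from the worked example.
A = np.array([[0.0, 1.0, -2.0],
              [0.0, 1.0, 0.0],
              [1.0, -1.0, 3.0]])
P = np.array([[1.0, 0.0, 1.0],
              [1.0, 2.0, 0.0],
              [0.0, 1.0, -1.0]])
eigenvalues = np.array([1.0, 1.0, 2.0])

def power_via_diagonalization(k):
    """Compute A^k as P D^k P^{-1}, where D = diag(1, 1, 2)."""
    D_k = np.diag(eigenvalues ** k)  # a diagonal matrix is raised to a power entrywise
    return P @ D_k @ np.linalg.inv(P)

k = 10
A_k = power_via_diagonalization(k)
reference = np.linalg.matrix_power(A, k)  # repeated multiplication, for comparison
```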
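This approach can be generalized to the matrix exponential and other matrix functions that can be defined as power series. For example, defining $\exp(A) = I + A + \frac{1}{2!}A^2 + \frac{1}{3!}A^3 + \cdots$, we have:
$$\exp(A) = P\exp(D)P^{-1} = \begin{bmatrix} 1 & 0 & 1 \\ 1 & 2 & 0 \\ 0 & 1 & -1 \end{bmatrix} \begin{bmatrix} e^1 & 0 & 0 \\ 0 & e^1 & 0 \\ 0 & 0 & e^2 \end{bmatrix} \begin{bmatrix} 1 & 0 & 1 \\ 1 & 2 & 0 \\ 0 & 1 & -1 \end{bmatrix}^{-1} = \begin{bmatrix} 2e - e^2 & -e + e^2 & 2e - 2e^2 \\ 0 & e & 0 \\ -e + e^2 & e - e^2 & -e + 2e^2 \end{bmatrix}.$$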
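As a rough check of the exponential formula for the same example matrix, the sketch below compares $P\exp(D)P^{-1}$ with a truncated power series. The helper `exp_series` and its cutoff of 30 terms are assumptions made for this illustration, not a recommended way to compute matrix exponentials in general.

```python
import numpy as np

# A, P and the eigenvalues from the worked example.
A = np.array([[0.0, 1.0, -2.0],
              [0.0, 1.0, 0.0],
              [1.0, -1.0, 3.0]])
P = np.array([[1.0, 0.0, 1.0],
              [1.0, 2.0, 0.0],
              [0.0, 1.0, -1.0]])
eigenvalues = np.array([1.0, 1.0, 2.0])

# exp(A) = P exp(D) P^{-1}; the exponential of a diagonal matrix
# exponentiates each diagonal entry.
exp_A = P @ np.diag(np.exp(eigenvalues)) @ np.linalg.inv(P)

def exp_series(M, terms=30):
    """Truncated power series I + M + M^2/2! + ... (for comparison only)."""
    result = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for j in range(1, terms):
        term = term @ M / j  # term now holds M^j / j!
        result = result + term
    return result

series_A = exp_series(A)
```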
This is particularly useful in finding closed form expressions for terms of linear recursive sequences, such as the Fibonacci numbers.
For example, consider the following matrix:
Calculating the various powers of $M$ reveals a surprising pattern:
The above phenomenon can be explained by diagonalizing $M$. To accomplish this, we need a basis of $\mathbb{R}^2$ consisting of eigenvectors of $M$. One such eigenvector basis is given by
where $\mathbf{e}_i$ denotes the standard basis of $\mathbb{R}^n$. The reverse change of basis is given by
Straightforward calculations show that
Thus, $a$ and $b$ are the eigenvalues corresponding to $\mathbf{u}$ and $\mathbf{v}$, respectively. By linearity of matrix multiplication, we have that
Switching back to the standard basis, we have
The preceding relations, expressed in matrix form, are
thereby explaining the above phenomenon.
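The same technique yields closed forms for linear recurrences such as the Fibonacci numbers mentioned earlier. The sketch below is one way to illustrate this, assuming the standard matrix form of the recurrence, $\begin{bmatrix} F_{n+1} \\ F_n \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} F_n \\ F_{n-1} \end{bmatrix}$; the function names are illustrative.

```python
import numpy as np

# Fibonacci step matrix: its n-th power is [[F(n+1), F(n)], [F(n), F(n-1)]].
M = np.array([[1.0, 1.0],
              [1.0, 0.0]])

# M is real symmetric, so eigh applies; the eigenvalues are (1 - sqrt(5))/2
# and (1 + sqrt(5))/2, and the eigenvector matrix P is orthogonal.
eigenvalues, P = np.linalg.eigh(M)

def fib_diag(n):
    """F(n) read off from M^n = P D^n P^T (P orthogonal, so P^{-1} = P^T)."""
    M_n = P @ np.diag(eigenvalues ** n) @ P.T
    return int(round(M_n[0, 1]))

def fib_iter(n):
    """Plain iteration, used only as a reference."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a
```

Because the computation uses floating-point arithmetic, rounding to the nearest integer is only reliable for moderate `n`; exact arithmetic would be needed for very large indices.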
In quantum mechanical and quantum chemical computations, matrix diagonalization is one of the most frequently applied numerical processes. The basic reason is that the time-independent Schrödinger equation is an eigenvalue equation, albeit in most physical situations on an infinite-dimensional Hilbert space.
A very common approximation is to truncate (or project) the Hilbert space to finite dimension, after which the Schrödinger equation can be formulated as an eigenvalue problem of a real symmetric or complex Hermitian matrix. Formally, this approximation is founded on the variational principle, valid for Hamiltonians that are bounded from below.
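As a rough sketch of such a truncation, consider a hypothetical anharmonic oscillator $H = \tfrac{1}{2}p^2 + \tfrac{1}{2}x^2 + g x^4$ in units with $\hbar = m = \omega = 1$. This model, the coupling `g`, and the basis size `N` are assumptions chosen for illustration and do not come from the text. Written in the first $N$ harmonic-oscillator states, the Hamiltonian becomes a real symmetric matrix whose lowest eigenvalue approximates the ground-state energy.

```python
import numpy as np

def truncated_hamiltonian(N, g):
    """Matrix of H = p^2/2 + x^2/2 + g x^4 in the first N oscillator states.

    Assumes hbar = m = omega = 1. The x^4 term is built from the truncated
    x matrix, which is itself only approximate near the cutoff.
    """
    # Annihilation operator: a|n> = sqrt(n)|n-1>, i.e. sqrt(1..N-1) on the superdiagonal.
    a = np.diag(np.sqrt(np.arange(1, N)), k=1)
    x = (a + a.T) / np.sqrt(2.0)
    H0 = np.diag(np.arange(N) + 0.5)  # the harmonic part is diagonal: n + 1/2
    return H0 + g * np.linalg.matrix_power(x, 4)

H = truncated_hamiltonian(N=40, g=0.1)
is_symmetric = np.allclose(H, H.T)

# eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
energies = np.linalg.eigvalsh(H)
ground_state_energy = energies[0]
```

Increasing `N` and watching the lowest eigenvalues settle is a simple way to judge whether the truncation is adequate for the states of interest.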
First-order perturbation theory also leads to a matrix eigenvalue problem for degenerate states.
1. Horn, Roger A.; Johnson, Charles R. (2013). Matrix Analysis, second edition. Cambridge University Press. ISBN 9780521839402.
2. Anton, H.; Rorres, C. (22 Feb 2000). Elementary Linear Algebra (Applications Version) (8th ed.). John Wiley & Sons. ISBN 978-0-471-17052-5.