Any square matrix A with entries in a field F has characteristic polynomial $p(x)=\det(xI-A)$, which in turn has companion matrix $C(p)$. These matrices are related as follows.
The following statements are equivalent:
If the above hold, one says that A is non-derogatory.
Not every square matrix is similar to a companion matrix, but every square matrix is similar to a block diagonal matrix made of companion matrices. If we also demand that the polynomial of each diagonal block divides the next one, they are uniquely determined by A, and this gives the rational canonical form of A.
The roots of the characteristic polynomial $p(x)$ are the eigenvalues of $C(p)$. If there are n distinct eigenvalues $\lambda_1,\ldots,\lambda_n$, then $C(p)$ is diagonalizable as $C(p)=V^{-1}DV$, where D is the diagonal matrix of the eigenvalues and V is the Vandermonde matrix corresponding to the λ's:
$$D=\begin{bmatrix}\lambda_1&0&\cdots&0\\0&\lambda_2&\cdots&0\\\vdots&\vdots&\ddots&\vdots\\0&0&\cdots&\lambda_n\end{bmatrix},\qquad V=\begin{bmatrix}1&\lambda_1&\lambda_1^2&\cdots&\lambda_1^{n-1}\\1&\lambda_2&\lambda_2^2&\cdots&\lambda_2^{n-1}\\\vdots&\vdots&\vdots&\ddots&\vdots\\1&\lambda_n&\lambda_n^2&\cdots&\lambda_n^{n-1}\end{bmatrix}.$$
Indeed, a direct computation shows that the transpose $C(p)^T$ has eigenvectors $v_i=(1,\lambda_i,\ldots,\lambda_i^{n-1})$ with $C(p)^T v_i=\lambda_i v_i$, which follows from $p(\lambda_i)=c_0+c_1\lambda_i+\cdots+c_{n-1}\lambda_i^{n-1}+\lambda_i^n=0$. Thus, its diagonalizing change of basis matrix is $V^T=[v_1^T\,\ldots\,v_n^T]$, meaning $C(p)^T=V^T D\,(V^T)^{-1}$, and taking the transpose of both sides gives $C(p)=V^{-1}DV$.
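The diagonalization can be checked numerically. The following is a minimal sketch, assuming the convention that $C(p)$ has ones on the subdiagonal and $-c_0,\ldots,-c_{n-1}$ in its last column; the helper name `companion` and the cubic with roots 1, 2, 3 are illustrative choices, not part of the text above.

```python
import numpy as np

def companion(c):
    """Companion matrix C(p) of p(x) = c[0] + c[1] x + ... + c[n-1] x^(n-1) + x^n.

    Assumed convention: ones on the subdiagonal, -c in the last column.
    """
    n = len(c)
    C = np.zeros((n, n))
    C[1:, :-1] = np.eye(n - 1)
    C[:, -1] = -np.asarray(c, dtype=float)
    return C

# Illustrative choice: p(x) = (x - 1)(x - 2)(x - 3) = x^3 - 6x^2 + 11x - 6
lam = np.array([1.0, 2.0, 3.0])
C = companion([-6.0, 11.0, -6.0])

V = np.vander(lam, increasing=True)   # row i is (1, lam_i, lam_i^2)
D = np.diag(lam)

# C(p) = V^{-1} D V, computed as the solution X of V X = D V
diagonalizes = np.allclose(C, np.linalg.solve(V, D @ V))

# Each Vandermonde row v_i is an eigenvector of the transpose C(p)^T
rows_are_eigvecs = all(
    np.allclose(C.T @ V[i], lam[i] * V[i]) for i in range(len(lam))
)
print(diagonalizes, rows_are_eigvecs)
```

Solving $VX=DV$ instead of forming $V^{-1}$ explicitly is a numerical convenience only; Vandermonde matrices become ill-conditioned quickly as n grows, so this kind of check is only meaningful for small examples.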
We can read the eigenvectors of $C(p)$, with $C(p)\,w_i=\lambda_i w_i$, from the equation $C(p)=V^{-1}DV$: they are the column vectors of the inverse Vandermonde matrix $V^{-1}=[w_1^T\cdots w_n^T]$. This matrix is known explicitly, giving the eigenvectors $w_i=(L_{0i},\ldots,L_{(n-1)i})$, with coordinates equal to the coefficients of the Lagrange polynomials
$$L_i(x)=L_{0i}+L_{1i}x+\cdots+L_{(n-1)i}x^{n-1}=\prod_{j\neq i}\frac{x-\lambda_j}{\lambda_i-\lambda_j}=\frac{p(x)}{(x-\lambda_i)\,p'(\lambda_i)}.$$
Alternatively, the scaled eigenvectors $\tilde{w}_i=p'(\lambda_i)\,w_i$ have simpler coordinates, namely the coefficients of $p(x)/(x-\lambda_i)=\prod_{j\neq i}(x-\lambda_j)$.
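As a hedged numerical illustration, the sketch below compares the columns of $V^{-1}$ with the Lagrange coefficients and checks the scaled eigenvectors. The roots 1, 2, 3 and all variable names are assumptions made for the example, and $C(p)$ is built with ones on the subdiagonal and $-c_i$ in the last column.

```python
import numpy as np

lam = np.array([1.0, 2.0, 3.0])      # illustrative distinct roots
n = len(lam)

# np.poly returns monic coefficients, highest degree first; reverse and drop
# the leading 1 to get c_0, ..., c_{n-1}
c = np.poly(lam)[::-1][:-1]
C = np.zeros((n, n))
C[1:, :-1] = np.eye(n - 1)
C[:, -1] = -c

V_inv = np.linalg.inv(np.vander(lam, increasing=True))

checks = []
for i in range(n):
    others = np.delete(lam, i)
    w_scaled = np.poly(others)[::-1]       # coefficients of p(x)/(x - lam_i), ascending
    p_prime = np.prod(lam[i] - others)     # p'(lam_i) = prod_{j != i} (lam_i - lam_j)
    w = w_scaled / p_prime                 # Lagrange coefficients L_{0i}, ..., L_{(n-1)i}
    checks.append(np.allclose(w, V_inv[:, i]))                  # column i of V^{-1}
    checks.append(np.allclose(C @ w_scaled, lam[i] * w_scaled)) # eigenvector of C(p)
print(all(checks))
```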
If $p(x)$ has a repeated root, then $C(p)$ is not diagonalizable. Rather, the Jordan canonical form of $C(p)$ contains one Jordan block for each distinct root; if the multiplicity of the root is m, then the block is an m × m matrix with $\lambda$ on the diagonal and 1 in the entries just above the diagonal. In this case, V becomes a confluent Vandermonde matrix.[2]
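The single Jordan block per root can be observed numerically through kernel dimensions. This is a rough sketch for the illustrative polynomial $(x-1)^2(x-2)$, which is an assumption of the example; it relies on floating-point rank computation, which is only trustworthy for small, well-scaled matrices like this one.

```python
import numpy as np

# Illustrative choice: p(x) = (x - 1)^2 (x - 2) = x^3 - 4x^2 + 5x - 2
c = np.array([-2.0, 5.0, -4.0])
n = len(c)
C = np.zeros((n, n))
C[1:, :-1] = np.eye(n - 1)   # ones on the subdiagonal
C[:, -1] = -c                # -c_0, ..., -c_{n-1} in the last column

I = np.eye(n)
# Dimension of the eigenspace for the double root 1
geometric_mult = n - np.linalg.matrix_rank(C - I)
# Dimension of the generalized eigenspace, i.e. the kernel of (C - I)^2
generalized_dim = n - np.linalg.matrix_rank((C - I) @ (C - I))
print(geometric_mult, generalized_dim)
```

An eigenspace of dimension 1 together with a generalized eigenspace of dimension 2 corresponds to a single 2 × 2 Jordan block for the root 1.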
A linear recursive sequence defined by $a_{k+n}=-c_0a_k-c_1a_{k+1}-\cdots-c_{n-1}a_{k+n-1}$ for $k\geq 0$ has the characteristic polynomial $p(x)=c_0+c_1x+\cdots+c_{n-1}x^{n-1}+x^n$, whose transpose companion matrix $C(p)^T$ generates the sequence:
$$\begin{bmatrix}a_{k+1}\\a_{k+2}\\\vdots\\a_{k+n-1}\\a_{k+n}\end{bmatrix}=\begin{bmatrix}0&1&0&\cdots&0\\0&0&1&\cdots&0\\\vdots&\vdots&\vdots&\ddots&\vdots\\0&0&0&\cdots&1\\-c_0&-c_1&-c_2&\cdots&-c_{n-1}\end{bmatrix}\begin{bmatrix}a_k\\a_{k+1}\\\vdots\\a_{k+n-2}\\a_{k+n-1}\end{bmatrix}.$$
The vector $v=(1,\lambda,\lambda^2,\ldots,\lambda^{n-1})$ is an eigenvector of this matrix, where the eigenvalue $\lambda$ is a root of $p(x)$. Setting the initial values of the sequence equal to this vector produces a geometric sequence $a_k=\lambda^k$ which satisfies the recurrence. In the case of n distinct eigenvalues, an arbitrary solution $a_k$ can be written as a linear combination of such geometric solutions, and the eigenvalues of largest absolute value give an asymptotic approximation.
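For a concrete instance, the Fibonacci recurrence $a_{k+2}=a_k+a_{k+1}$ corresponds to $c_0=c_1=-1$, that is, $p(x)=x^2-x-1$. The sketch below iterates the transpose companion matrix on the state vector; the function names `companion_transpose` and `run_recurrence` are hypothetical helpers introduced for this example.

```python
import numpy as np

def companion_transpose(c):
    """C(p)^T: ones on the superdiagonal, -c_0, ..., -c_{n-1} in the last row."""
    n = len(c)
    M = np.zeros((n, n), dtype=int)
    M[:-1, 1:] = np.eye(n - 1, dtype=int)
    M[-1, :] = -np.asarray(c)
    return M

def run_recurrence(c, initial, count):
    """Return the first `count` terms, starting from the state (a_0, ..., a_{n-1})."""
    M = companion_transpose(c)
    state = np.array(initial)
    terms = []
    for _ in range(count):
        terms.append(int(state[0]))
        state = M @ state        # (a_k, ..., a_{k+n-1}) -> (a_{k+1}, ..., a_{k+n})
    return terms

fib = run_recurrence([-1, -1], [0, 1], 10)

# The eigenvalue of largest absolute value controls the asymptotic growth rate
growth = max(abs(np.linalg.eigvals(companion_transpose([-1, -1]))))
print(fib, growth)
```

Here the dominant eigenvalue is the golden ratio, so the ratio of consecutive terms approaches $(1+\sqrt{5})/2$.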
Similarly to the above case of linear recursions, consider a homogeneous linear ODE of order n for the scalar function $y=y(t)$:
$$y^{(n)}+c_{n-1}y^{(n-1)}+\dots+c_1y^{(1)}+c_0y=0.$$
This can be equivalently described as a coupled system of homogeneous linear ODEs of order 1 for the vector function $z(t)=(y(t),y'(t),\ldots,y^{(n-1)}(t))$:
$$z'=C(p)^Tz,$$
where $C(p)^T$ is the transpose companion matrix for the characteristic polynomial
$$p(x)=x^n+c_{n-1}x^{n-1}+\cdots+c_1x+c_0.$$
Here the coefficients $c_i=c_i(t)$ may also be functions, not just constants.
If $C(p)^T$ is diagonalizable, then a diagonalizing change of basis will transform this into a decoupled system equivalent to one scalar homogeneous first-order linear ODE in each coordinate.
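The decoupling can be sketched for constant coefficients as follows. In the eigenbasis each coordinate evolves as $e^{\mu t}$, so the solution is obtained by changing basis, scaling, and changing back. The function name `solve_linear_ode` and the test equation $y''+y=0$ are assumptions of this example, and the approach presumes that $C(p)^T$ is diagonalizable and that the data are real.

```python
import numpy as np

def solve_linear_ode(c, z0, t):
    """Solve z' = C(p)^T z with constant coefficients c_0, ..., c_{n-1}.

    Assumes C(p)^T is diagonalizable and the initial state z0 is real.
    """
    n = len(c)
    M = np.zeros((n, n))
    M[:-1, 1:] = np.eye(n - 1)               # ones on the superdiagonal
    M[-1, :] = -np.asarray(c, dtype=float)   # last row -c_0, ..., -c_{n-1}

    mu, P = np.linalg.eig(M)                 # M = P diag(mu) P^{-1}
    coeffs = np.linalg.solve(P, np.asarray(z0, dtype=complex))
    z = P @ (np.exp(mu * t) * coeffs)        # each decoupled coordinate scales by exp(mu t)
    return z.real

# y'' + y = 0 with y(0) = 1, y'(0) = 0 has the solution y(t) = cos(t)
z = solve_linear_ode([1.0, 0.0], [1.0, 0.0], 0.7)
print(z, np.cos(0.7), -np.sin(0.7))
```

The returned vector holds $(y(t), y'(t))$, so its entries should agree with $\cos t$ and $-\sin t$ up to rounding.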
An inhomogeneous equation
$$y^{(n)}+c_{n-1}y^{(n-1)}+\dots+c_1y^{(1)}+c_0y=f(t)$$
is equivalent to the system
$$z'=C(p)^Tz+F(t)$$
with the inhomogeneity term $F(t)=(0,\ldots,0,f(t))$.
Again, a diagonalizing change of basis will transform this into a decoupled system of scalar inhomogeneous first-order linear ODEs.
In the case of $p(x)=x^n-1$, when the eigenvalues are the complex roots of unity, the companion matrix and its transpose both reduce to Sylvester's cyclic shift matrix, a circulant matrix.
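A small numerical check of the cyclic shift behaviour is sketched below. The size n = 5 is an arbitrary illustrative choice, and the companion matrix is built with ones on the subdiagonal and $-c_i$ in the last column.

```python
import numpy as np

n = 5                        # illustrative size
c = np.zeros(n)
c[0] = -1.0                  # p(x) = x^n - 1
C = np.zeros((n, n))
C[1:, :-1] = np.eye(n - 1)
C[:, -1] = -c

x = np.arange(n, dtype=float)

# C(p) moves every coordinate one place down and wraps the last one to the top
shifts_cyclically = np.array_equal(C @ x, np.roll(x, 1))

# Applying the shift n times gives the identity
has_order_n = np.array_equal(np.linalg.matrix_power(C, n), np.eye(n))

# The eigenvalues are n-th roots of unity
eigs = np.linalg.eigvals(C)
roots_of_unity = np.allclose(eigs ** n, 1.0)
print(shifts_cyclically, has_order_n, roots_of_unity)
```

The transpose performs the same shift in the opposite direction.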
Consider a polynomial $p(x)=x^n+c_{n-1}x^{n-1}+\cdots+c_1x+c_0$ with coefficients in a field $F$, and suppose $p(x)$ is irreducible in the polynomial ring $F[x]$. Then adjoining a root $\lambda$ of $p(x)$ produces a field extension $K=F(\lambda)\cong F[x]/(p(x))$, which is also a vector space over $F$ with standard basis $\{1,\lambda,\lambda^2,\ldots,\lambda^{n-1}\}$. Then the $F$-linear multiplication mapping
$$m_\lambda:K\to K,\qquad m_\lambda(\alpha)=\lambda\alpha$$
has an n × n matrix $[m_\lambda]$ with respect to the standard basis. Since $m_\lambda(\lambda^i)=\lambda^{i+1}$ for $i<n-1$ and $m_\lambda(\lambda^{n-1})=\lambda^n=-c_0-\cdots-c_{n-1}\lambda^{n-1}$, this is the companion matrix of $p(x)$:
$$[m_\lambda]=C(p).$$
Assuming this extension is separable (for example if $F$ has characteristic zero or is a finite field), $p(x)$ has distinct roots $\lambda_1,\ldots,\lambda_n$ with $\lambda_1=\lambda$, so that
$$p(x)=(x-\lambda_1)\cdots(x-\lambda_n),$$
and it has splitting field $L=F(\lambda_1,\ldots,\lambda_n)$. Now $m_\lambda$ is not diagonalizable over $F$; rather, we must extend it to an $L$-linear map on $L^n\cong L\otimes_F K$, a vector space over $L$ with standard basis $\{1\otimes 1,\,1\otimes\lambda,\,1\otimes\lambda^2,\ldots,1\otimes\lambda^{n-1}\}$, containing vectors $w=(\beta_1,\ldots,\beta_n)=\beta_1\otimes 1+\cdots+\beta_n\otimes\lambda^{n-1}$. The extended mapping is defined by $m_\lambda(\beta\otimes\alpha)=\beta\otimes(\lambda\alpha)$.
The matrix $[m_\lambda]=C(p)$ is unchanged, but as above, it can be diagonalized by matrices with entries in $L$:
$$[m_\lambda]=C(p)=V^{-1}DV,$$
for the diagonal matrix $D=\operatorname{diag}(\lambda_1,\ldots,\lambda_n)$ and the Vandermonde matrix V corresponding to $\lambda_1,\ldots,\lambda_n\in L$. The explicit formula for the eigenvectors (the scaled column vectors of the inverse Vandermonde matrix $V^{-1}$) can be written as:
$$\tilde{w}_i=\beta_{0i}\otimes 1+\beta_{1i}\otimes\lambda+\cdots+\beta_{(n-1)i}\otimes\lambda^{n-1}=\prod_{j\neq i}(1\otimes\lambda-\lambda_j\otimes 1)$$
where $\beta_{ij}\in L$ are the coefficients of the scaled Lagrange polynomial
$$\frac{p(x)}{x-\lambda_i}=\prod_{j\neq i}(x-\lambda_j)=\beta_{0i}+\beta_{1i}x+\cdots+\beta_{(n-1)i}x^{n-1}.$$
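To make the multiplication map concrete, the sketch below takes the illustrative case $F=\mathbb{Q}$, $p(x)=x^3-2$, so that $\lambda^3=2$. Elements of $K$ are represented by their coordinate vectors in the basis $1,\lambda,\lambda^2$; the helper `mult_matrix` is a hypothetical name, and the splitting field is only approximated by complex floating-point numbers.

```python
import numpy as np

# Illustrative choice: p(x) = x^3 - 2 over Q, so K = Q(lambda) with lambda^3 = 2
c = np.array([-2.0, 0.0, 0.0])
n = len(c)
C = np.zeros((n, n))
C[1:, :-1] = np.eye(n - 1)   # m_lambda sends lambda^i to lambda^(i+1)
C[:, -1] = -c                # and lambda^(n-1) to -c_0 - ... - c_{n-1} lambda^(n-1)

def mult_matrix(a):
    """Matrix of multiplication by a[0] + a[1]*lambda + a[2]*lambda^2 on K."""
    return sum(a_k * np.linalg.matrix_power(C, k) for k, a_k in enumerate(a))

# (1 + lambda) * lambda^2 = lambda^2 + lambda^3 = 2 + lambda^2
product = mult_matrix([1.0, 1.0, 0.0]) @ np.array([0.0, 0.0, 1.0])

# Over the splitting field, the coefficients of p(x)/(x - lambda_i) form an
# eigenvector of C(p) for the eigenvalue lambda_i
roots = np.roots([1.0, 0.0, 0.0, -2.0])
eigen_ok = []
for i in range(n):
    w = np.poly(np.delete(roots, i))[::-1]   # ascending coefficients
    eigen_ok.append(np.allclose(C @ w, roots[i] * w))
print(product, all(eigen_ok))
```

Two of the three roots are non-real, which reflects the fact that the eigenvectors live over $L$ rather than over $F$.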
[1] Horn, Roger A.; Johnson, Charles R. (1985). Matrix Analysis. Cambridge, UK: Cambridge University Press. pp. 146–147. ISBN 0-521-30586-1.
[2] Turnbull, H. W.; Aitken, A. C. (1961). An Introduction to the Theory of Canonical Matrices. New York: Dover. p. 60. ISBN 978-0486441689.