In statistical inference, there are several approaches to estimation theory, each of which can be used to decide immediately what estimators should be used under that approach. For example, ideas from Bayesian inference lead directly to Bayesian estimators. Similarly, the theory of classical statistical inference can sometimes lead to strong conclusions about which estimator should be used. However, the usefulness of these theories depends on having a fully prescribed statistical model and may also depend on having a relevant loss function to determine the estimator. Thus a Bayesian analysis might be undertaken, leading to a posterior distribution for the relevant parameters, but the use of a specific utility or loss function may be unclear. Ideas of invariance can then be applied to the task of summarising the posterior distribution. In other cases, statistical analyses are undertaken without a fully defined statistical model, or the classical theory of statistical inference cannot be readily applied because the family of models being considered is not amenable to such treatment. In addition to these cases where general theory does not prescribe an estimator, the concept of invariance of an estimator can be applied when seeking estimators of alternative forms, either for the sake of simplicity of application of the estimator or so that the estimator is robust.
The concept of invariance is sometimes used on its own as a way of choosing between estimators, but this is not necessarily definitive. For example, a requirement of invariance may be incompatible with the requirement that the estimator be mean-unbiased; on the other hand, the criterion of median-unbiasedness is defined in terms of the estimator's sampling distribution and so is invariant under many transformations.
One use of the concept of invariance is where a class or family of estimators is proposed and a particular formulation must be selected amongst these. One procedure is to impose relevant invariance properties and then to find the formulation within this class that has the best properties, leading to what is called the optimal invariant estimator.
There are several types of transformations that are usefully considered when dealing with invariant estimators. Each gives rise to a class of estimators which are invariant to those particular types of transformation.
The combination of permutation invariance and location invariance for estimating a location parameter from an independent and identically distributed dataset using a weighted average implies that the weights should be identical and sum to one. Of course, estimators other than a weighted average may be preferable.
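To see why, consider a weighted-average estimator $\hat{\theta}(x)=\sum_{i=1}^{n}w_{i}x_{i}$ (the notation $\hat{\theta}$ and $w_{i}$ is introduced here only for illustration). Invariance under permutations of an i.i.d. sample requires $\sum_{i}w_{i}x_{\pi(i)}=\sum_{i}w_{i}x_{i}$ for every permutation $\pi$ and every $x$, which forces the weights to be equal, say $w_{i}=w$. Invariance under a location shift requires
$$\sum_{i=1}^{n}w(x_{i}+c)=\sum_{i=1}^{n}wx_{i}+c\quad\text{for all }c,$$
which gives $nwc=c$ and hence $w=1/n$, so the weights are identical and sum to one.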
Under this setting, we are given a set of measurements $x$ which contains information about an unknown parameter $\theta$. The measurements $x$ are modelled as a vector random variable having a probability density function $f(x\mid\theta)$ which depends on a parameter vector $\theta$.
The problem is to estimate $\theta$ given $x$. The estimate, denoted by $a$, is a function of the measurements and belongs to a set $A$. The quality of the result is defined by a loss function $L=L(a,\theta)$ which determines a risk function $R=R(a,\theta)=\operatorname{E}[L(a,\theta)\mid\theta]$. The sets of possible values of $x$, $\theta$, and $a$ are denoted by $X$, $\Theta$, and $A$, respectively.
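As an illustration of how a risk function can be evaluated in practice, the following sketch approximates $R=\operatorname{E}[L(\delta(x),\theta)\mid\theta]$ by Monte Carlo for a chosen estimator and loss. The model (unit-variance normal observations), the estimator (the sample mean) and the function names are assumptions made for this example, not part of the definition above.

```python
import numpy as np

def monte_carlo_risk(estimator, loss, theta, n_obs=10, n_rep=100_000, rng=None):
    """Approximate R = E[ loss(estimator(x), theta) | theta ]
    for x drawn from an assumed N(theta, 1) model."""
    rng = np.random.default_rng(rng)
    x = rng.normal(loc=theta, scale=1.0, size=(n_rep, n_obs))  # simulated datasets
    estimates = np.apply_along_axis(estimator, 1, x)           # delta(x) for each dataset
    return loss(estimates, theta).mean()                       # average loss approximates the risk

# Example: squared-error risk of the sample mean; should be close to 1/n_obs.
risk = monte_carlo_risk(np.mean, lambda a, t: (a - t) ** 2, theta=2.0)
print(risk)
```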
In statistical classification, the rule which assigns a class to a new data-item can be considered to be a special type of estimator. A number of invariance-type considerations can be brought to bear in formulating prior knowledge for pattern recognition.
An invariant estimator is an estimator which obeys the following two rules:
1. Principle of rational invariance: the action taken in a decision problem should not depend on arbitrary transformations of the measurements used.
2. Invariance principle: if two decision problems have the same formal structure (in terms of $X$, $\Theta$, $f(x\mid\theta)$ and $L$), then the same decision rule should be used in each problem.
To define an invariant or equivariant estimator formally, some definitions related to groups of transformations are needed first. Let $X$ denote the set of possible data-samples. A group of transformations of $X$, to be denoted by $G$, is a set of (measurable) one-to-one and onto transformations of $X$ into itself which satisfies the following conditions:
1. If $g_{1}\in G$ and $g_{2}\in G$ then the composition $g_{1}g_{2}\in G$;
2. If $g\in G$ then $g^{-1}\in G$;
3. The identity transformation $e(x)=x$ belongs to $G$.
Datasets $x_{1}$ and $x_{2}$ in $X$ are equivalent if $x_{1}=g(x_{2})$ for some $g\in G$. All the equivalent points form an equivalence class. Such an equivalence class is called an orbit (in $X$). The $x_{0}$ orbit, $X(x_{0})$, is the set $X(x_{0})=\{g(x_{0}):g\in G\}$. If $X$ consists of a single orbit then $G$ is said to be transitive.
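As a small concrete illustration (the group, data values and function name below are chosen only for this example), the following sketch enumerates the orbit of a dataset under the finite group of coordinate permutations; under this group two datasets are equivalent exactly when one is a reordering of the other, and the action is not transitive on $\mathbb{R}^{n}$.

```python
from itertools import permutations

def permutation_orbit(x):
    """Orbit of the data vector x under the group of coordinate permutations."""
    return {tuple(x[i] for i in p) for p in permutations(range(len(x)))}

x0 = (1.0, 2.0, 2.0)
print(permutation_orbit(x0))
# {(1.0, 2.0, 2.0), (2.0, 1.0, 2.0), (2.0, 2.0, 1.0)} -- the equivalence class of x0;
# a dataset such as (0.0, 0.0, 0.0) lies in a different orbit, so this group is not transitive.
```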
A family of densities $F$ is said to be invariant under the group $G$ if, for every $g\in G$ and $\theta\in\Theta$, there exists a unique $\theta^{*}\in\Theta$ such that $Y=g(x)$ has density $f(y\mid\theta^{*})$. $\theta^{*}$ will be denoted $\bar{g}(\theta)$.
If $F$ is invariant under the group $G$ then the loss function $L(\theta,a)$ is said to be invariant under $G$ if for every $g\in G$ and $a\in A$ there exists an $a^{*}\in A$ such that $L(\theta,a)=L(\bar{g}(\theta),a^{*})$ for all $\theta\in\Theta$. The transformed value $a^{*}$ will be denoted by $\tilde{g}(a)$.
In the above, $\bar{G}=\{\bar{g}:g\in G\}$ is a group of transformations from $\Theta$ to itself and $\tilde{G}=\{\tilde{g}:g\in G\}$ is a group of transformations from $A$ to itself.
An estimation problem is invariant (equivariant) under $G$ if there exist three groups $G$, $\bar{G}$, $\tilde{G}$ as defined above.
For an estimation problem that is invariant under $G$, an estimator $\delta(x)$ is an invariant estimator under $G$ if, for all $x\in X$ and $g\in G$,
$$\delta(g(x))=\tilde{g}(\delta(x)).$$
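As an illustrative numerical check (the model, estimators and function name here are assumptions made for this sketch), the following verifies the condition $\delta(g(x))=\tilde{g}(\delta(x))$ for the sample mean under the location-shift group $g_{c}(x)=x+c$, for which $\tilde{g}_{c}(a)=a+c$.

```python
import numpy as np

def is_location_equivariant(delta, x, shifts, tol=1e-12):
    """Check delta(x + c) == delta(x) + c for each shift c (location-shift group)."""
    return all(abs(delta(x + c) - (delta(x) + c)) < tol for c in shifts)

rng = np.random.default_rng(0)
x = rng.normal(size=20)
print(is_location_equivariant(np.mean, x, shifts=[-3.0, 0.5, 10.0]))              # True
print(is_location_equivariant(lambda v: v.max() + 1.0, x, shifts=[2.0]))          # True: max + constant is also equivariant
print(is_location_equivariant(lambda v: float(np.mean(v**2)), x, shifts=[2.0]))   # False: not location-equivariant
```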
For a given problem, the invariant estimator with the lowest risk is termed the "best invariant estimator". A best invariant estimator cannot always be achieved. A special case for which it can be achieved is the case when $\bar{G}$ is transitive.
Suppose $\theta$ is a location parameter, so that the density of $X$ is of the form $f(x-\theta)$. For $\Theta=A=\mathbb{R}^{1}$ and $L=L(a-\theta)$, the problem is invariant under $G=\bar{G}=\tilde{G}=\{g_{c}:g_{c}(x)=x+c,\ c\in\mathbb{R}\}$. The invariant estimator in this case must satisfy
$$\delta(x+c)=\delta(x)+c,\qquad\text{for all }c\in\mathbb{R},$$
thus it is of the form $\delta(x)=x+K$ ($K\in\mathbb{R}$). $\bar{G}$ is transitive on $\Theta$, so the risk does not vary with $\theta$: that is, $R(\theta,\delta)=R(0,\delta)=\operatorname{E}[L(X+K)\mid\theta=0]$. The best invariant estimator is the one that brings the risk $R(\theta,\delta)$ to a minimum.
In the case that $L$ is the squared error, the best invariant estimator is $\delta(x)=x-\operatorname{E}[X\mid\theta=0]$.
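As a numerical sketch of this result (the choice of a standard exponential density for $f$ and all names are assumptions made for illustration), the following finds the constant $K$ minimizing the squared-error risk $\operatorname{E}[(X+K)^{2}\mid\theta=0]$ by simulation and compares it with $-\operatorname{E}[X\mid\theta=0]$.

```python
import numpy as np

rng = np.random.default_rng(1)
x0 = rng.exponential(scale=1.0, size=200_000)   # draws from the assumed f with theta = 0 (Exp(1))

def risk(K):
    """Squared-error risk of delta(x) = x + K, evaluated at theta = 0."""
    return np.mean((x0 + K) ** 2)

K_grid = np.linspace(-3.0, 1.0, 401)
K_best = K_grid[np.argmin([risk(K) for K in K_grid])]
print(K_best)            # close to -1.0 for Exp(1)
print(-np.mean(x0))      # the theoretical best constant, -E[X | theta = 0]
```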
The estimation problem is that $X=(X_{1},\dots,X_{n})$ has density $f(x_{1}-\theta,\dots,x_{n}-\theta)$, where $\theta$ is a parameter to be estimated, and where the loss function is $L(|a-\theta|)$. This problem is invariant with the following (additive) transformation groups:
$$G=\{g_{c}:g_{c}(x)=(x_{1}+c,\dots,x_{n}+c),\ c\in\mathbb{R}^{1}\},$$
$$\bar{G}=\{g_{c}:g_{c}(\theta)=\theta+c,\ c\in\mathbb{R}^{1}\},$$
$$\tilde{G}=\{g_{c}:g_{c}(a)=a+c,\ c\in\mathbb{R}^{1}\}.$$
The best invariant estimator $\delta(x)$ is the one that minimizes
$$\frac{\int_{-\infty}^{\infty}L(|\delta(x)-\theta|)\,f(x_{1}-\theta,\dots,x_{n}-\theta)\,d\theta}{\int_{-\infty}^{\infty}f(x_{1}-\theta,\dots,x_{n}-\theta)\,d\theta},$$
and this is Pitman's estimator (1939).
For the squared-error loss case, the result is
$$\delta(x)=\frac{\int_{-\infty}^{\infty}\theta\,f(x_{1}-\theta,\dots,x_{n}-\theta)\,d\theta}{\int_{-\infty}^{\infty}f(x_{1}-\theta,\dots,x_{n}-\theta)\,d\theta}.$$
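The following sketch evaluates this ratio of integrals numerically for a user-supplied density; the chosen density (standard normal components), the data values and the function names are assumptions made for illustration, and for that choice the output should agree with the sample mean, as noted below.

```python
import numpy as np
from scipy import integrate, stats

def pitman_estimator(x, f_marginal, pad=10.0):
    """Pitman (squared-error) estimator: a ratio of two one-dimensional integrals over theta.
    f_marginal(u) is the density of a single centred observation (theta = 0)."""
    x = np.asarray(x, dtype=float)
    joint = lambda theta: np.prod(f_marginal(x - theta))
    lo, hi = x.min() - pad, x.max() + pad          # range wide enough to cover the bulk of the integrand
    num, _ = integrate.quad(lambda t: t * joint(t), lo, hi)
    den, _ = integrate.quad(joint, lo, hi)
    return num / den

x = np.array([1.2, -0.3, 0.7, 2.1, 0.4])
print(pitman_estimator(x, stats.norm.pdf))   # matches the sample mean for normal data
print(x.mean())
```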
If $x\sim N(\theta 1_{n},I)$ (i.e. a multivariate normal distribution with independent, unit-variance components) then the Pitman estimator coincides with the maximum-likelihood estimator: $\delta_{\text{Pitman}}=\delta_{ML}=\bar{x}=\tfrac{1}{n}\sum_{k=1}^{n}x_{k}$, the sample mean.
If $x\sim C(\theta 1_{n},I\sigma^{2})$ (independent components having a Cauchy distribution with scale parameter $\sigma$) then $\delta_{\text{Pitman}}\neq\delta_{ML}$; for $n>1$ the Pitman estimator nevertheless admits a closed-form expression (see Gouriéroux and Monfort, 1995, section 5.2.1).
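To illustrate the difference numerically (the sample values, the scale $\sigma=1$, the integration range and all function names here are assumptions made for this example), the following sketch computes the Pitman estimator by numerical integration and the maximum-likelihood estimate by numerical optimization for a small Cauchy sample; the two estimates generally differ.

```python
import numpy as np
from scipy import integrate, optimize, stats

x = np.array([1.4, -0.9, 0.3, 5.2])            # assumed sample; sigma = 1

def joint(theta):
    """Joint density f(x_1 - theta, ..., x_n - theta) for i.i.d. standard Cauchy components."""
    return np.prod(stats.cauchy.pdf(x - theta))

lo, hi = x.min() - 30.0, x.max() + 30.0        # finite range chosen to cover the heavy tails adequately
num, _ = integrate.quad(lambda t: t * joint(t), lo, hi, points=x.tolist())
den, _ = integrate.quad(joint, lo, hi, points=x.tolist())
pitman = num / den

neg_loglik = lambda t: -np.sum(stats.cauchy.logpdf(x - t))
ml = optimize.minimize_scalar(neg_loglik, bounds=(x.min(), x.max()), method="bounded").x

print(pitman, ml)   # the Pitman and ML estimates differ in general for Cauchy data
```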
Gouriéroux, C. and Monfort, A. (1995). Statistics and Econometric Models, volume 1, section 5.2.1. Cambridge University Press.