In statistics, the Cramér-Rao bound (abbreviated CRB), or Cramér-Rao lower bound (CRLB), named in honor of Harald Cramér and Calyampudi Radhakrishna Rao, expresses a lower bound on the variance of an unbiased estimator in terms of the Fisher information.
Figure: Illustration of the Cramér-Rao bound. No unbiased estimator can estimate the (two-dimensional) parameter with less variance than the Cramér-Rao bound, shown here as a standard-deviation ellipse.
It states that the multiplicative inverse of the Fisher information of a parameter $\theta$, $\mathcal{I}(\theta)$, is a lower bound on the variance of an unbiased estimator of the parameter (denoted $\widehat{\theta}$). Here $f$ denotes the likelihood function:
$$\mathrm{var}\left(\widehat{\theta}\right) \geq \frac{1}{\mathcal{I}(\theta)} = \frac{1}{\mathrm{E}\left[\left(\frac{\partial}{\partial\theta}\ln f(X;\theta)\right)^{2}\right]}$$
In some cases, no unbiased estimator attains the lower bound. This bound is also known as the Cramér-Rao inequality or as the information inequality.
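As a concrete illustration, the following Python sketch (assuming NumPy and an illustrative Bernoulli($p$) model, neither of which is part of the original statement) compares the Monte Carlo variance of the sample mean against $1/\mathcal{I}(p)$; for $n$ independent observations, $\mathcal{I}(p) = n/(p(1-p))$, and the sample mean attains the bound.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, trials = 0.3, 100, 200_000  # illustrative values

# The sample mean of n Bernoulli(p) draws is an unbiased estimator of p.
samples = rng.binomial(1, p, size=(trials, n))
p_hat = samples.mean(axis=1)

# Fisher information of one Bernoulli(p) observation is 1/(p(1-p)),
# so for n i.i.d. observations I(p) = n/(p(1-p)) and the bound is p(1-p)/n.
crlb = p * (1 - p) / n

print(f"empirical var(p_hat): {p_hat.var():.6f}")
print(f"Cramer-Rao bound:     {crlb:.6f}")  # the sample mean attains it
```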
Multivariate case
To extend the Cramér-Rao bound to multiple parameters, define the column vector of parameters $\boldsymbol{\theta} = \left[\theta_{1}, \theta_{2}, \dots, \theta_{d}\right]^{T} \in \mathbb{R}^{d}$ with probability density function $f(x;\boldsymbol{\theta})$ satisfying the two regularity conditions defined above.
The Fisher information matrix is a $d \times d$ matrix whose elements $\mathcal{I}_{m,k}$ are defined by
$$\mathcal{I}_{m,k} = \mathrm{E}\left[\frac{\partial}{\partial\theta_{m}}\log f\left(x;\boldsymbol{\theta}\right)\,\frac{\partial}{\partial\theta_{k}}\log f\left(x;\boldsymbol{\theta}\right)\right]$$
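This expectation can be approximated numerically by averaging outer products of the score vector over simulated data. A minimal sketch, assuming NumPy and an illustrative $N(\mu, \sigma^{2})$ model whose score vector has the closed form used below:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma2 = 2.0, 4.0
m = 1_000_000  # Monte Carlo draws

x = rng.normal(mu, np.sqrt(sigma2), size=m)

# Score vector of N(mu, sigma^2) for a single observation:
#   d/d mu     log f = (x - mu) / sigma^2
#   d/d sigma2 log f = -1/(2 sigma^2) + (x - mu)^2 / (2 sigma^4)
score = np.stack([
    (x - mu) / sigma2,
    -0.5 / sigma2 + (x - mu) ** 2 / (2 * sigma2 ** 2),
])

# I_{m,k} = E[score_m * score_k], estimated by averaging outer products.
I_mc = score @ score.T / m
I_exact = np.diag([1 / sigma2, 1 / (2 * sigma2 ** 2)])

print("Monte Carlo I:\n", I_mc)
print("Exact I:\n", I_exact)
```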
The Cramér-Rao bound then states that
$$\mathrm{cov}_{\boldsymbol{\theta}}\left(\boldsymbol{T}(X)\right) \geq \frac{\partial\boldsymbol{\psi}\left(\boldsymbol{\theta}\right)}{\partial\boldsymbol{\theta}^{T}}\,\mathcal{I}\left(\boldsymbol{\theta}\right)^{-1}\,\frac{\partial\boldsymbol{\psi}\left(\boldsymbol{\theta}\right)^{T}}{\partial\boldsymbol{\theta}}$$
where
$$\boldsymbol{T}(X) = \begin{bmatrix} T_{1}(X) & T_{2}(X) & \cdots & T_{d}(X) \end{bmatrix}^{T}$$
$$\boldsymbol{\psi} = \mathrm{E}\left[\boldsymbol{T}(X)\right] = \begin{bmatrix} \psi_{1}\left(\boldsymbol{\theta}\right) & \psi_{2}\left(\boldsymbol{\theta}\right) & \cdots & \psi_{d}\left(\boldsymbol{\theta}\right) \end{bmatrix}^{T}$$
$$\frac{\partial\boldsymbol{\psi}\left(\boldsymbol{\theta}\right)}{\partial\boldsymbol{\theta}^{T}} = \begin{bmatrix} \psi_{1}\left(\boldsymbol{\theta}\right) \\ \psi_{2}\left(\boldsymbol{\theta}\right) \\ \vdots \\ \psi_{d}\left(\boldsymbol{\theta}\right) \end{bmatrix} \begin{bmatrix} \frac{\partial}{\partial\theta_{1}} & \frac{\partial}{\partial\theta_{2}} & \cdots & \frac{\partial}{\partial\theta_{d}} \end{bmatrix} = \begin{bmatrix} \frac{\partial\psi_{1}(\boldsymbol{\theta})}{\partial\theta_{1}} & \frac{\partial\psi_{1}(\boldsymbol{\theta})}{\partial\theta_{2}} & \cdots & \frac{\partial\psi_{1}(\boldsymbol{\theta})}{\partial\theta_{d}} \\ \frac{\partial\psi_{2}(\boldsymbol{\theta})}{\partial\theta_{1}} & \frac{\partial\psi_{2}(\boldsymbol{\theta})}{\partial\theta_{2}} & \cdots & \frac{\partial\psi_{2}(\boldsymbol{\theta})}{\partial\theta_{d}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial\psi_{d}(\boldsymbol{\theta})}{\partial\theta_{1}} & \frac{\partial\psi_{d}(\boldsymbol{\theta})}{\partial\theta_{2}} & \cdots & \frac{\partial\psi_{d}(\boldsymbol{\theta})}{\partial\theta_{d}} \end{bmatrix}$$
$$\frac{\partial\boldsymbol{\psi}\left(\boldsymbol{\theta}\right)^{T}}{\partial\boldsymbol{\theta}} = \begin{bmatrix} \frac{\partial}{\partial\theta_{1}} \\ \frac{\partial}{\partial\theta_{2}} \\ \vdots \\ \frac{\partial}{\partial\theta_{d}} \end{bmatrix} \begin{bmatrix} \psi_{1}\left(\boldsymbol{\theta}\right) & \psi_{2}\left(\boldsymbol{\theta}\right) & \cdots & \psi_{d}\left(\boldsymbol{\theta}\right) \end{bmatrix} = \begin{bmatrix} \frac{\partial\psi_{1}(\boldsymbol{\theta})}{\partial\theta_{1}} & \frac{\partial\psi_{2}(\boldsymbol{\theta})}{\partial\theta_{1}} & \cdots & \frac{\partial\psi_{d}(\boldsymbol{\theta})}{\partial\theta_{1}} \\ \frac{\partial\psi_{1}(\boldsymbol{\theta})}{\partial\theta_{2}} & \frac{\partial\psi_{2}(\boldsymbol{\theta})}{\partial\theta_{2}} & \cdots & \frac{\partial\psi_{d}(\boldsymbol{\theta})}{\partial\theta_{2}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial\psi_{1}(\boldsymbol{\theta})}{\partial\theta_{d}} & \frac{\partial\psi_{2}(\boldsymbol{\theta})}{\partial\theta_{d}} & \cdots & \frac{\partial\psi_{d}(\boldsymbol{\theta})}{\partial\theta_{d}} \end{bmatrix}$$
and $\mathrm{cov}_{\boldsymbol{\theta}}\left(\boldsymbol{T}(X)\right)$ is a positive semi-definite matrix, that is,
$$x^{T}\,\mathrm{cov}_{\boldsymbol{\theta}}\left(\boldsymbol{T}(X)\right)\,x \geq 0 \quad \forall x \in \mathbb{R}^{d}$$
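When $\boldsymbol{\psi}$ is only available as a black-box function, the Jacobian $\partial\boldsymbol{\psi}(\boldsymbol{\theta})/\partial\boldsymbol{\theta}^{T}$ appearing in the bound can be approximated by central finite differences. A minimal sketch; the function `psi` and the evaluation point are hypothetical:

```python
import numpy as np

def jacobian(psi, theta, eps=1e-6):
    """Central-difference approximation of d psi / d theta^T.

    Row i, column j holds d psi_i / d theta_j, matching the layout above.
    """
    theta = np.asarray(theta, dtype=float)
    d = theta.size
    p = np.atleast_1d(psi(theta)).size
    J = np.empty((p, d))
    for j in range(d):
        step = np.zeros(d)
        step[j] = eps
        J[:, j] = (psi(theta + step) - psi(theta - step)) / (2 * eps)
    return J

# Hypothetical psi mapping theta = (a, b) to (a*b, a + b**2).
psi = lambda t: np.array([t[0] * t[1], t[0] + t[1] ** 2])
print(jacobian(psi, [1.0, 2.0]))  # expect [[2, 1], [1, 4]]
```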
If $\boldsymbol{T}(X) = \begin{bmatrix} T_{1}(X) & T_{2}(X) & \cdots & T_{d}(X) \end{bmatrix}^{T}$ is an unbiased estimator (that is, $\boldsymbol{\psi}\left(\boldsymbol{\theta}\right) = \boldsymbol{\theta}$), then the Cramér-Rao bound reduces to
$$\mathrm{cov}_{\boldsymbol{\theta}}\left(\boldsymbol{T}(X)\right) \geq \mathcal{I}\left(\boldsymbol{\theta}\right)^{-1}$$
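As a numerical check of this matrix inequality, the sketch below (illustrative model and sample sizes, assuming NumPy) simulates the unbiased estimators $(\bar{x}, s^{2})$ of $(\mu, \sigma^{2})$ for a normal sample and inspects the eigenvalues of the empirical covariance minus $\mathcal{I}(\boldsymbol{\theta})^{-1}$:

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma2, n, trials = 0.0, 1.0, 50, 100_000

x = rng.normal(mu, np.sqrt(sigma2), size=(trials, n))
T = np.stack([x.mean(axis=1), x.var(axis=1, ddof=1)])  # unbiased (x_bar, s^2)

cov_T = np.cov(T)  # empirical 2x2 covariance of the estimator

# Fisher information of n i.i.d. N(mu, sigma^2) observations.
I = np.diag([n / sigma2, n / (2 * sigma2 ** 2)])

diff = cov_T - np.linalg.inv(I)
print("eigenvalues of cov(T) - inv(I):", np.linalg.eigvalsh(diff))
# Both eigenvalues should be >= 0 up to Monte Carlo noise: the sample mean
# attains the bound exactly, so its eigenvalue hovers near zero, while
# var(s^2) = 2 sigma^4/(n-1) strictly exceeds the bound 2 sigma^4/n.
```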
Multivariate normal distribution
For the case of a $d$-dimensional multivariate normal distribution $\boldsymbol{x} \sim N_{d}\left(\boldsymbol{\mu}\left(\boldsymbol{\theta}\right), C\left(\boldsymbol{\theta}\right)\right)$ with probability density function
$$f\left(\boldsymbol{x};\boldsymbol{\theta}\right) = \frac{1}{\sqrt{(2\pi)^{d}\left|C\right|}}\exp\left(-\frac{1}{2}\left(\boldsymbol{x}-\boldsymbol{\mu}\right)^{T}C^{-1}\left(\boldsymbol{x}-\boldsymbol{\mu}\right)\right),$$
the Fisher information matrix has entries
$$\mathcal{I}_{m,k} = \frac{\partial\boldsymbol{\mu}^{T}}{\partial\theta_{m}}C^{-1}\frac{\partial\boldsymbol{\mu}}{\partial\theta_{k}} + \frac{1}{2}\mathrm{tr}\left(C^{-1}\frac{\partial C}{\partial\theta_{m}}C^{-1}\frac{\partial C}{\partial\theta_{k}}\right)$$
where $\mathrm{tr}$ denotes the trace of a matrix.
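This formula lends itself to direct numerical evaluation from user-supplied functions $\boldsymbol{\mu}(\boldsymbol{\theta})$ and $C(\boldsymbol{\theta})$, using finite differences for the derivatives. A minimal sketch with hypothetical mean and covariance functions:

```python
import numpy as np

def fisher_gaussian(mu_fn, C_fn, theta, eps=1e-6):
    """Fisher information of N(mu(theta), C(theta)) via the formula above,
    with central-difference derivatives of mu and C."""
    theta = np.asarray(theta, dtype=float)
    d = theta.size
    C_inv = np.linalg.inv(C_fn(theta))

    def d_theta(fn, k):  # central difference: d fn / d theta_k
        step = np.zeros(d)
        step[k] = eps
        return (fn(theta + step) - fn(theta - step)) / (2 * eps)

    I = np.empty((d, d))
    for m in range(d):
        for k in range(d):
            dmu_m, dmu_k = d_theta(mu_fn, m), d_theta(mu_fn, k)
            dC_m, dC_k = d_theta(C_fn, m), d_theta(C_fn, k)
            I[m, k] = (dmu_m @ C_inv @ dmu_k
                       + 0.5 * np.trace(C_inv @ dC_m @ C_inv @ dC_k))
    return I

# Hypothetical model: theta = (a, s2), mean a*ones(3), covariance s2*eye(3).
mu_fn = lambda t: t[0] * np.ones(3)
C_fn = lambda t: t[1] * np.eye(3)
print(fisher_gaussian(mu_fn, C_fn, [1.0, 2.0]))
# Expect diag(3/s2, 3/(2 s2^2)) = diag(1.5, 0.375).
```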
In particular, if $w[n]$ is white Gaussian noise (a sample of $N$ independent observations) with known variance $\sigma^{2}$, that is,
$$w[n] \sim N_{N}\left(\boldsymbol{\mu}(\theta), \sigma^{2}I\right),$$
and $\theta$ is a scalar entering only as a constant level, $\boldsymbol{\mu}(\theta) = \theta\mathbf{1}$ (so that each $\partial\mu_{i}/\partial\theta = 1$), then the Fisher information matrix is a $1 \times 1$ matrix:
$$\mathcal{I}(\theta) = \left(\frac{\partial\boldsymbol{\mu}(\theta)}{\partial\theta}\right)^{T}C^{-1}\left(\frac{\partial\boldsymbol{\mu}(\theta)}{\partial\theta}\right) = \sum_{i=1}^{N}\frac{1}{\sigma^{2}} = \frac{N}{\sigma^{2}},$$
and therefore the Cramér-Rao bound is
$$\mathrm{var}\left(\widehat{\theta}\right) \geq \frac{\sigma^{2}}{N}.$$
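A quick simulation (illustrative values, assuming NumPy) confirms that the sample mean, which is unbiased for $\theta$, attains this bound:

```python
import numpy as np

rng = np.random.default_rng(3)
theta, sigma2, N, trials = 5.0, 2.0, 100, 200_000

# N observations of the level theta in white Gaussian noise of variance sigma^2.
x = theta + rng.normal(0.0, np.sqrt(sigma2), size=(trials, N))
theta_hat = x.mean(axis=1)  # the sample mean is unbiased for theta

print(f"empirical var(theta_hat): {theta_hat.var():.6f}")
print(f"CRLB sigma^2 / N:         {sigma2 / N:.6f}")
```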