Allan Birnbaum. A unified theory of estimation. 1. (Rev. & extended Feb. 1960)


NEW YORK UNIVERSITY

INSTITUTE OF

MATHEMATICAL SCIENCES

IMM-NYU 266

APRIL 1960

A UNIFIED THEORY OF ESTIMATION. I

(Revised and extended, February 1960)

ALLAN BIRNBAUM

REPRODUCTION IN WHOLE OR IN PART

IS PERMITTED FOR ANY PURPOSE

OF THE UNITED STATES GOVERNMENT.

PREPARED UNDER

CONTRACT NO. NONR-285(38)

WITH THE

OFFICE OF NAVAL RESEARCH

UNITED STATES NAVY


This report represents results obtained at the Institute of Mathematical Sciences, New York University, under the sponsorship of the Office of Naval Research, Contract No. Nonr-285(38). Some sections include results previously reported under the same title, obtained at Columbia University under the sponsorship of the Office of Naval Research, Contract No. Nonr-266(33).


0. Introduction and Summary. This paper extends and unifies some previous formulations and theories of estimation for one-parameter problems. The basic criterion used is admissibility of a point estimator, defined with reference to its full distribution rather than special loss functions such as squared error. Theoretical methods of characterizing admissible estimators are given, and practical computational methods for their use are illustrated in a variety of examples.

Point, confidence limit, and confidence interval estimation are included in a single theoretical formulation, and incorporated into estimators of an "omnibus" form called "confidence curves." The usefulness of the latter for some applications as well as theoretical purposes is illustrated.

Fisher's maximum likelihood principle of estimation is generalized, given exact (non-asymptotic) justification, and unified with the theory of tests and confidence regions of Neyman and Pearson. Relations between exact and asymptotic results are discussed.

An application of the general theory gives optimal sequential estimators having prescribed precision in a specified interval. Further developments, including multiparameter and nuisance parameter problems, problems of choice among admissible estimators, formal and informal criteria for optimality, and related problems in the foundations of statistical inference, will be presented subsequently.


1. A broad formulation of the problem of point estimation. We consider problems of estimation with reference to a specified experiment $E$, leaving aside here questions of experimental design including those of choice of a sample size or a sequential sampling rule; some definite sampling rule, possibly sequential, is assumed specified as part of $E$. Let $S = \{x\}$ denote the sample space of possible outcomes $x$ of the experiment. Let $f(x,\theta)$ denote one of the elementary probability functions on $S$ which are specified as possibly true. Let $\Omega = \{\theta\}$ denote the specified parameter space. For each $\theta$ in $\Omega$ and for each subset $A$ of $S$, the probability that $E$ yields an outcome $x$ in $A$ is given by

\[ \operatorname{Prob}\{X \in A \mid \theta\} = \int_A f(x,\theta)\, d\mu(x), \]

where $\mu$ is a specified $\sigma$-finite measure on $S$. (We assume tacitly here and below that consideration is appropriately restricted to measurable sets and functions only.)

If $\gamma = \gamma(\theta)$ is any function defined on $\Omega$ (e.g. $\gamma(\theta) = \theta$ or $\gamma(\theta) = \ldots$), with range $\Gamma$, a point estimator of $\gamma$ is any measurable function $g = g(x)$ taking values in $\Gamma$ (or in $\bar{\Gamma}$, its closure, if, for example, $\Gamma$ is an open interval). The problem of choosing a good estimator, that is, an estimator which tends to take values close to the true unknown value of $\gamma$, has been formulated mathematically in various ways. Most formulations achieve mathematical definiteness by introducing criteria of closeness which appear somewhat arbitrary from some standpoints of application and undesirably schematic as expressions of the intuitive notion of closeness.

If $\Omega$ is given no specific (parametric) structure, then the latter features can be fully avoided only by a very broad formulation which specifies only that if $\gamma$ is true, then an exactly correct estimate ($g = \gamma$) is closer than any incorrect estimate ($g \neq \gamma$). If $\Omega$ is finite, $\Omega = \{\theta_1, \ldots, \theta_k\}$, and $\gamma(\theta) = \theta$, this leads to the formulation of Lindley [1] in which estimators are compared only on the basis of their error probabilities

\[ p_{ij} = \operatorname{Prob}[\theta^*(X) = \theta_j \mid \theta_i], \qquad i, j = 1, \ldots, k, \quad i \neq j, \]

where $\theta^*(x)$ is any estimator of $\theta$. This formulation has no very useful extension to typical estimation problems in which, for example, $\Omega$ is an interval, and in which the event $\theta^*(X) = \theta$ exactly has typically negligible probability and little interest.
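As an illustration of these error probabilities, the following minimal sketch computes the full matrix $p_{ij}$ for a small finite parameter space. The binomial model, the three parameter values, the sample size, and the maximum likelihood rule used as $\theta^*$ are all assumptions chosen for the example and do not come from the report.

```python
from math import comb

# Hypothetical setting: X ~ Binomial(N, theta), theta restricted to OMEGA.
OMEGA = [0.2, 0.5, 0.8]
N = 10


def binom_pmf(x, n, theta):
    """Probability of x successes in n trials with success probability theta."""
    return comb(n, x) * theta**x * (1 - theta) ** (n - x)


def ml_estimator(x):
    """Pick the element of OMEGA that maximizes the likelihood of x."""
    return max(OMEGA, key=lambda theta: binom_pmf(x, N, theta))


def error_probabilities(estimator):
    """Return p[i][j] = Prob[estimator(X) = OMEGA[j] | OMEGA[i]]."""
    k = len(OMEGA)
    p = [[0.0] * k for _ in range(k)]
    for i, theta_i in enumerate(OMEGA):
        for x in range(N + 1):
            j = OMEGA.index(estimator(x))
            p[i][j] += binom_pmf(x, N, theta_i)
    return p


p = error_probabilities(ml_estimator)
for row in p:
    print([round(value, 4) for value in row])
```

The off-diagonal entries are the error probabilities of the formulation above; the diagonal entries are the probabilities of an exactly correct estimate, so each row sums to one.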

The case in which $\Omega$ is any set of real numbers, for example an interval, and $\gamma(\theta) = \theta$, may be termed the central problem of the theory of point estimation, although very important generalizations of this problem have been treated extensively. For this problem, closeness of $\theta^*$ to $\theta$ has been specified by the introduction of specific loss functions: The absolute error criterion, $|\theta^* - \theta|$, was introduced by Laplace. Gauss replaced this by the squared error criterion $(\theta^* - \theta)^2$, which proved mathematically much more tractable and provided a definite formulation of the problem which seemed equally reasonable. A generalized squared error criterion, $c(\theta)\,(\theta^* - \theta)^2$, where $c(\theta)$ is any specified positive function, is used in some work in modern statistical decision theory. Such criteria are sometimes used in conjunction with the requirement of unbiasedness, $E(\theta^*(X) \mid \theta) \equiv \theta$; this is done (evidently primarily to facilitate mathematical developments) particularly in the theory of linear estimation due to Gauss; this reduces the mean squared error criterion to a criterion of variance: $E[(\theta^* - \theta)^2 \mid \theta] \equiv \operatorname{Var}(\theta^* \mid \theta)$. (For a brief account of the history of the theory of point estimation, cf. Neyman [2], pp. 9-14.)
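The reduction of mean squared error to variance under unbiasedness follows from the standard decomposition of mean squared error. A brief sketch, in the notation of this section, is given below; the symbol $b(\theta)$ for the bias is introduced here only for illustration and does not appear in the report.

```latex
\[
E\left[(\theta^* - \theta)^2 \mid \theta\right]
  = \operatorname{Var}(\theta^* \mid \theta) + b(\theta)^2,
\qquad
b(\theta) = E(\theta^*(X) \mid \theta) - \theta .
\]
% Under unbiasedness, b(theta) = 0 for every theta, so the second term vanishes:
\[
E(\theta^*(X) \mid \theta) \equiv \theta
\quad\Longrightarrow\quad
E\left[(\theta^* - \theta)^2 \mid \theta\right] \equiv \operatorname{Var}(\theta^* \mid \theta).
\]
```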

Each such definite specification of closeness can be criticized as somewhat arbitrary, except in a context where one postulates the reality of the indicated costs of errors of each possible kind. To avoid such features it is evidently necessary and sufficient to adopt the following weak specification of closeness: If ...

For formal convenience, we also define $a(\theta, \theta, \theta^*) = 0$. When reference to a given estimator $\theta^*$ is understood, we may write simply $a(u,\theta)$, $a(\theta-,\theta)$, or $a(\theta+,\theta)$. The functions $a(\theta-,\theta)$ and $a(\theta+,\theta)$ of $\theta$ play a useful technical role, and will be called respectively the lower and upper location functions of $\theta^*$.

In many problems, estimators for which $\operatorname{Prob}[\theta^*(X) = \theta \mid \theta] > 0$ for some $\theta$ are found not useful. The remaining estimators have continuous c.d.f.'s, and have $a(\theta-,\theta) \equiv 1 - a(\theta+,\theta)$. No two such estimators, having different location functions, can be comparable; for $a(\theta-,\theta,\theta^*) < a(\theta-,\theta,\theta^{**})$ is equivalent to $a(\theta+,\theta,\theta^*) > a(\theta+,\theta,\theta^{**})$; this shows that neither estimator is at least as good as the other.
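The non-comparability argument can be checked numerically. The sketch below assumes a normal observation $X$ with mean $\theta$ and variance 1 and considers estimators of the hypothetical form $\theta^*(X) = X + \text{shift}$; both the family of estimators and the particular shifts are choices made for this illustration only.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, variance 1


def location_functions(shift):
    """Return (a(theta-, theta), a(theta+, theta)) for theta*(X) = X + shift.

    With X ~ N(theta, 1), X - theta is standard normal, so neither
    probability depends on theta.
    """
    lower = STD_NORMAL.cdf(-shift)  # Prob[X + shift < theta | theta]
    upper = 1.0 - lower             # Prob[X + shift > theta | theta]
    return lower, upper


lower_a, upper_a = location_functions(0.0)
lower_b, upper_b = location_functions(0.5)

# The shifted estimator underestimates less often but overestimates more
# often, so neither estimator is at least as good as the other.
not_comparable = (lower_b < lower_a) and (upper_b > upper_a)
print(lower_a, upper_a)
print(lower_b, upper_b)
print(not_comparable)
```

Because the c.d.f. is continuous, the two location functions of each estimator sum to one, which is exactly why a gain on one side must be paid for on the other.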

The broad and "weak" definition of admissibility adopted here leads to very large admissible classes in typical problems. However, it does not seem unreasonable to conceive of the problem of point estimation as one in which the investigator chooses an estimator on the basis of consideration of the risk curves of all estimators in some essentially complete class. In principle this consideration should be complete, but of course the practical counterpart of this can be at most a more or less extensive familiarity with an essentially complete class, developed by study of the risk-curves of a variety of specific estimators, possibly strengthened by some general theoretical considerations (including envelope risk-curves, discussed below), and perhaps also by reference to one or several loss functions and criteria of optimality which may seem more or less appropriate in specific applications. Such an approach is not so difficult to carry out as might be anticipated, as will be illustrated. Of course difficulties of computation or complexity may sometimes dictate that an inadmissible estimator must be adopted; even in such cases, the most general basis on which any particular estimator might be justified as not too inefficient is evidently the comparison of its risk-curves with those of other estimators, especially admissible ones.

Example. Let $X$ be normally distributed with unknown mean $\theta$ and variance 1, with $\Omega = \{\theta \mid -\infty < \theta < \infty\}$. ...

... interval estimator, this is taken as evidence for the conclusion that the true unknown value of the parameter $\theta$ lies in the closed interval $[\theta', \theta'']$.

The probability properties of any interval estimator $J$ may be described in the following terms: It is natural to call $a(\theta-,\theta,\theta'')$ the lower location function of $J$ (as well as of $\theta''$), and to denote it when convenient by $a(\theta-,\theta,J)$; similarly $a(\theta+,\theta,J) \equiv a(\theta+,\theta,\theta')$ is the upper location function of $J$. As with point estimators, these functions give respectively the probabilities of underestimation and of overestimation when a given interval estimator $J$ is used. For example, it is natural to call $J$ a median-unbiased interval estimator if for each $\theta$ we have equal probabilities of overestimation and underestimation: $a(\theta-,\theta,J) = a(\theta+,\theta,J)$. This usage is compatible with the definition of a median-unbiased point estimator.

A quantity of primary interest is the probability that the conclusion indicated by any interval estimator $J$ ("$\theta$ lies in $[\theta', \theta'']$") will be incorrect, for each possible true value $\theta$. This probability is just the sum of the location functions of $J$:

\[ \operatorname{Prob}[\theta \text{ not covered by } J(X) \mid \theta] = \operatorname{Prob}[\theta''(X) < \theta \mid \theta] + \operatorname{Prob}[\theta'(X) > \theta \mid \theta] = a(\theta-,\theta,J) + a(\theta+,\theta,J). \]
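The identity above can be verified in a simple case. The following sketch assumes $X$ normal with mean $\theta$ and variance 1 and an interval estimator of the hypothetical form $J(X) = [X - c_1, X + c_2]$; the constants and function names are illustrative assumptions, not part of the report.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, variance 1


def interval_location_functions(c_lower, c_upper):
    """Location functions of J(X) = [X - c_lower, X + c_upper], X ~ N(theta, 1).

    Returns (a(theta-, theta, J), a(theta+, theta, J)); neither depends on theta.
    """
    # Underestimation: the upper limit theta'' = X + c_upper falls below theta.
    under = STD_NORMAL.cdf(-c_upper)
    # Overestimation: the lower limit theta' = X - c_lower falls above theta.
    over = STD_NORMAL.cdf(-c_lower)
    return under, over


def coverage(c_lower, c_upper):
    """Prob[theta' <= theta <= theta'' | theta], computed directly."""
    # X - c_lower <= theta <= X + c_upper  iff  -c_upper <= X - theta <= c_lower
    return STD_NORMAL.cdf(c_lower) - STD_NORMAL.cdf(-c_upper)


under, over = interval_location_functions(1.96, 1.96)
non_coverage = under + over
print(under, over, non_coverage)
print(1.0 - coverage(1.96, 1.96))  # agrees with the sum of location functions
```

With unequal constants, for instance `interval_location_functions(1.0, 2.0)`, the two location functions differ, so the interval is no longer median-unbiased even though the same identity holds.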

If this probability equals $\alpha$ for each $\theta$, then $J$ is a $(1-\alpha)$ confidence interval; if in addition $J$ is median-unbiased, then $\theta'$ and $\theta''$ are $(1 - \tfrac{1}{2}\alpha)$ confidence limits. As with point and confidence limit estimators, it is of interest in general to consider the probabilities of errors of under-estimation and of over-estimation of various magnitudes in interval estimation; we denote these probabilities by

\[ a(u,\theta,J) = \begin{cases} a(u,\theta,\theta') & \text{for each } u > \theta, \\ a(u,\theta,\theta'') & \text{for each } u < \theta. \end{cases} \]
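A small numerical sketch of these error probabilities follows. It assumes $X$ normal with mean $\theta$ and variance 1, a symmetric interval $J(X) = [X - c, X + c]$, and that $a(u,\theta,\theta^*)$ denotes $\operatorname{Prob}[\theta^*(X) \geq u \mid \theta]$ for $u > \theta$ and $\operatorname{Prob}[\theta^*(X) \leq u \mid \theta]$ for $u < \theta$. That reading is an assumption made here, chosen to agree with the limiting location functions used in the text; the function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, variance 1


def half_width(alpha):
    """Half-width c making [X - c, X + c] a (1 - alpha) confidence interval."""
    return STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)


def error_probability(u, theta, alpha):
    """a(u, theta, J) for J(X) = [X - c, X + c] with X ~ N(theta, 1), u != theta."""
    c = half_width(alpha)
    if u > theta:
        # Lower limit at or above u: Prob[X - c >= u | theta]
        return 1.0 - STD_NORMAL.cdf(u - theta + c)
    if u < theta:
        # Upper limit at or below u: Prob[X + c <= u | theta]
        return STD_NORMAL.cdf(u - theta - c)
    raise ValueError("u must differ from theta")


alpha = 0.05
for u in (0.001, 0.5, 1.0):
    print(u, error_probability(u, 0.0, alpha), error_probability(-u, 0.0, alpha))
```

As $u$ approaches $\theta$ from either side, both probabilities approach $\tfrac{1}{2}\alpha$, which is the sense in which each limit of a median-unbiased $(1-\alpha)$ interval is a $(1-\tfrac{1}{2}\alpha)$ confidence limit; for larger $|u - \theta|$ the probabilities decrease.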

In a formal sense, a point estimator may be regarded as an interval estimator $J = (\theta', \theta'')$ having the special form: $\theta'(x) =$
