Stata weights

Source: 百度文库 (Baidu Wenku)  Editor: 神马文学网  Date: 2024/05/05 06:00:21
weight -- Weights
Remarks

    Most Stata commands can deal with weighted data.  Stata allows four kinds
    of weights:

    1.  fweights, or frequency weights, are weights that indicate the number
        of duplicated observations.

    2.  pweights, or sampling weights, are weights that denote the inverse of
        the probability that the observation is included because of the
        sampling design.

    3.  aweights, or analytic weights, are weights that are inversely
        proportional to the variance of an observation; i.e., the variance of
        the jth observation is assumed to be sigma^2/w_j, where w_j are the
        weights.  Typically, the observations represent averages and the
        weights are the number of elements that gave rise to the average.
        For most Stata commands, the recorded scale of aweights is
        irrelevant; Stata internally rescales them to sum to N, the number of
        observations in your data, when it uses them.

    4.  iweights, or importance weights, are weights that indicate the
        "importance" of the observation in some vague sense.  iweights have
        no formal statistical definition; any command that supports iweights
        will define exactly how they are treated.  Usually, they are intended
        for use by programmers who want to produce a certain computation.

    The general syntax is

            command ... [weightword=exp] ...

    For example:

        . anova y x1 x2 x1*x2 [fweight=pop]
        . regress avgy avgx1 avgx2 [aweight=cellpop]
        . regress y x1 x2 x3 [pweight=1/prob]
        . scatter y x [aweight=y2], mfcolor(none)

    You type the square brackets.

    Stata allows abbreviations: fw for fweight, aw for aweight, and so on.
    You could type

        . anova y x1 x2 x1*x2 [fw=pop]
        . regress avgy avgx1 avgx2 [aw=cellpop]
        . regress y x1 x2 x3 [pw=1/prob]
        . scatter y x [aw=y2], mfcolor(none)

    Also, each command has its own idea of the "natural" kind of weight.  If
    you type

        . regress avgy avgx1 avgx2 [w=cellpop]

    the command will tell you what kind of weight it is assuming and perform
    the request as if you specified that kind of weight.

    There are synonyms for some of the weight types.  fweight can also be
    referred to as frequency (abbreviation freq).  aweight can be referred to
    as cellsize (abbreviation cell):

        . anova y x1 x2 x1*x2 [freq=pop]
        . regress avgy avgx1 avgx2 [cell=cellpop]
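
    The keyword rules above amount to a lookup table.  The Python sketch
    below only illustrates that table; it is not how Stata parses weights.
    resolve_weightword and its natural parameter are hypothetical names, and
    the table covers only the spellings named in this section.  A bare w
    defers to the command's natural weight, which the caller must supply
    because each command chooses its own.

```python
# Hypothetical lookup table: only the keyword spellings named in this help
# text are listed; Stata itself may accept other abbreviations.
CANONICAL = {
    "fweight": "fweight", "fw": "fweight",
    "frequency": "fweight", "freq": "fweight",
    "aweight": "aweight", "aw": "aweight",
    "cellsize": "aweight", "cell": "aweight",
    "pweight": "pweight", "pw": "pweight",
    "iweight": "iweight", "iw": "iweight",
}


def resolve_weightword(word: str, natural: str) -> str:
    """Map a weight keyword such as 'fw' or 'cell' to its canonical type.

    A bare 'w' means "the command's natural kind of weight"; each command
    decides that for itself, so the caller passes it in as `natural`.
    """
    if word == "w":
        return natural
    if word not in CANONICAL:
        raise ValueError(f"unknown weight keyword: {word!r}")
    return CANONICAL[word]


# [freq=pop] and [fw=pop] both mean fweight
print(resolve_weightword("freq", natural="aweight"))
print(resolve_weightword("fw", natural="aweight"))
# [w=cellpop]: the result depends on the command's natural weight
print(resolve_weightword("w", natural="aweight"))
```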
fweights

    Frequency fweights indicate replicated data.  The weight tells the
    command how many observations each observation really represents.
    fweights allow data to be stored more parsimoniously.  The weighting
    variable contains positive integers.  The result of the command is the
    same as if you duplicated each observation however many times and then
    ran the command unweighted.
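
    The duplication equivalence can be checked numerically.  This Python
    sketch (hypothetical data and helper names) computes a mean and sample
    variance directly from values and their fweights, then compares them with
    the same statistics computed on the expanded, unweighted data.  The
    variance divisor is the sum of the weights minus one, exactly as if the
    rows had been duplicated.

```python
from statistics import mean, variance


def expand_rows(values, fweights):
    """Replicate each value fweights[i] times (the data fweights stand in for)."""
    expanded = []
    for v, w in zip(values, fweights):
        if not isinstance(w, int) or w <= 0:
            raise ValueError("fweights must be positive integers")
        expanded.extend([v] * w)
    return expanded


def fweighted_mean_var(values, fweights):
    """Mean and sample variance computed directly from the compact data."""
    n = sum(fweights)  # effective number of observations
    m = sum(w * v for v, w in zip(values, fweights)) / n
    ss = sum(w * (v - m) ** 2 for v, w in zip(values, fweights))
    return m, ss / (n - 1)  # divisor based on the sum of the weights


values = [2.0, 5.0, 7.0]
pop = [3, 1, 2]  # first row stands for 3 identical records, and so on

compact = fweighted_mean_var(values, pop)
dup = expand_rows(values, pop)
print(compact)
print(mean(dup), variance(dup))
```

    Both lines report the same mean and variance; the weighted form simply
    avoids storing the six duplicated rows.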
pweights

    Sampling pweights indicate the inverse of the probability that this
    observation was sampled.  Commands that allow pweights typically provide
    a cluster() option.  These can be combined to produce estimates for
    unstratified cluster-sampled data.  If you must also deal with issues of
    stratification, see [SVY] survey.
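
    As a rough illustration of why the weight is 1/prob: each sampled unit
    stands for 1/prob units of the population, so weighted sums estimate
    population totals.  The Python sketch below uses made-up data and a
    hypothetical helper and computes point estimates only.  When pweights
    are specified, Stata also reports robust standard errors (adjusted
    further by cluster()), which this sketch does not attempt.

```python
def pweighted_estimates(y, prob):
    """Point estimates using pweights w = 1/prob.

    total: sum of w*y  (each sampled unit represents 1/prob population units)
    size:  sum of w    (estimated population size)
    mean:  total / size
    No standard errors are computed here.
    """
    w = [1.0 / p for p in prob]
    total = sum(wi * yi for wi, yi in zip(w, y))
    size = sum(w)
    return total, total / size, size


# Made-up sample: the first two units were drawn with probability 0.5,
# the last two with probability 0.1 (each represents 10 population units).
y = [10.0, 12.0, 30.0, 34.0]
prob = [0.5, 0.5, 0.1, 0.1]
total, mean_est, size = pweighted_estimates(y, prob)
print(total, mean_est, size)
```

    The weighted mean is pulled toward the rarely sampled units, because
    each of them represents more of the population than the unweighted mean
    would credit.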
aweights

    Analytic aweights are typically appropriate when you are dealing with
    data containing averages.  For instance, you have average income and
    average characteristics on a group of people.  The weighting variable
    contains the number of persons over which the average was calculated (or
    a number proportional to that amount).
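
    The following Python sketch (hypothetical data and helper names) shows
    the typical aweight use case: when each observation is a group average
    and the weight is the group size, the weighted mean of the averages
    equals the mean of the underlying individual data.  It also applies the
    rescaling to sum to N described in the Remarks and shows that the scale
    of the weights does not change the result.  Only the point estimate is
    illustrated; the sigma^2/w_j variance assumption is not modeled.

```python
def rescale_aweights(w):
    """Rescale analytic weights so they sum to N, the number of observations."""
    n = len(w)
    total = sum(w)
    return [wi * n / total for wi in w]


def aweighted_mean(avg, w):
    """Weighted mean of group averages."""
    return sum(wi * a for wi, a in zip(w, avg)) / sum(w)


# Hypothetical individual incomes in three groups
groups = [[20.0, 30.0], [40.0, 50.0, 60.0], [90.0]]
avg_income = [sum(g) / len(g) for g in groups]  # the data you actually have
cellpop = [len(g) for g in groups]              # persons behind each average

overall = sum(sum(g) for g in groups) / sum(cellpop)
print(aweighted_mean(avg_income, cellpop), overall)

# Rescaling to sum to N, or multiplying by any constant, leaves it unchanged
print(aweighted_mean(avg_income, rescale_aweights(cellpop)))
print(aweighted_mean(avg_income, [100 * c for c in cellpop]))
```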
iweights

    This weight has no formal statistical definition and is a catch-all
    category.  The weight somehow reflects the importance of the observation
    and any command that supports such weights will define exactly how such
    weights are treated.