
How to get weighted counts with PROC SUMMARY?

Simon

When analyzing clinical or survey data, it is common to use weighting variables such as IPW (Inverse Probability Weighting) to make the sample representative of the population of interest.

A common confusion arises when trying to obtain a "weighted number of subjects" using the PROC SUMMARY procedure. Unlike PROC FREQ, using the WEIGHT statement in PROC SUMMARY does not automatically modify the frequency count (_FREQ_) in the output table.

Here's how to understand this behavior and get the desired result.


The problem: the WEIGHT statement does not change _FREQ_

If you simply add a WEIGHT statement to your procedure, you will notice that the _FREQ_ variable in the output table continues to count the number of physical observations (the rows), not the sum of the weights.

Code that does not produce the expected result:

PROC SUMMARY DATA=a1 chartype completetypes;
  class treatment;
  types treatment;
  weight IPW; /* This statement does not alter the output _FREQ_ */
  OUTPUT out=ae_1;   /* _FREQ_ = number of rows per treatment level */
RUN;

In this example, SAS would apply the weights to statistics such as the mean or variance of any analysis variables, but the raw count (_FREQ_) remains the number of rows.
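For reference, here is a sketch of how a WEIGHT variable w enters the mean and variance of an analysis variable x over the n nonmissing rows of a class level, following the PROC MEANS/SUMMARY documentation. The variance divisor d is set by the VARDEF= option; note that the default, VARDEF=DF, uses the unweighted row count n, not the sum of the weights.

```latex
\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i},
\qquad
s_w^2 = \frac{\sum_{i=1}^{n} w_i \,(x_i - \bar{x}_w)^2}{d},
\qquad
d = \begin{cases}
n - 1 & \text{VARDEF=DF (default)} \\
n & \text{VARDEF=N} \\
\left(\sum_{i} w_i\right) - 1 & \text{VARDEF=WDF} \\
\sum_{i} w_i & \text{VARDEF=WEIGHT}
\end{cases}
```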

Solution 1: Sum the weighting variable

The most direct and often clearest approach to obtain a "weighted count" is to treat your weight variable as a standard analysis variable and request its sum.

By summing the weights, you get the equivalent of a weighted population count.
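A worked example with made-up numbers: suppose treatment level A has three rows whose IPW values are 1.25, 2.00 and 4.00. _FREQ_ counts the rows, while the sum of the weights gives the weighted count:

```latex
\text{\_FREQ\_}_A = n_A = 3,
\qquad
\hat{N}_A = \sum_{i \in A} \mathrm{IPW}_i = 1.25 + 2.00 + 4.00 = 7.25
```

Rows with a missing IPW are still counted in _FREQ_ but add nothing to the sum.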

Recommended code:

PROC SUMMARY DATA=a1 chartype completetypes;
  class treatment;
  types treatment;
  var IPW; /* We declare the weight variable as an analysis variable */
  OUTPUT out=ae_1 sum=sum_weights; /* The sum of weights = weighted count */
RUN;
This method is simple and avoids any confusion about the nature of the statistics produced.

Solution 2: Use the SUMWGT statistic

If you need the WEIGHT statement anyway (for example, because you are also calculating weighted means of other variables), you must explicitly request the SUMWGT (sum of weights) statistic in the OUTPUT statement.

Alternative code:

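PROC SUMMARY DATA=a1 chartype;
  class treatment;
  weight IPW;
  /* SUMWGT is computed per analysis variable, so a VAR statement is needed. */
  /* "age" is a hypothetical analysis variable: replace it with your own.    */
  var age;
  /* We specifically request the sumwgt statistic (plus the weighted mean) */
  OUTPUT out=ae_1 mean=age_wmean sumwgt=sum_weights;
RUN;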
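SUMWGT is reported per analysis variable: for a variable x listed in the VAR statement, it adds up the weights of the rows in class level g where x is not missing. A sketch of the definition:

```latex
\mathrm{SUMWGT}_g(x) = \sum_{\substack{i \in g \\ x_i \text{ nonmissing}}} w_i
```

So when x has missing values, SUMWGT can be smaller than the plain sum of IPW returned by Solution 1.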

Important note: FREQ vs WEIGHT

It is crucial to distinguish the FREQ statement from the WEIGHT statement:

  • FREQ: use it when the variable holds an integer number of occurrences (e.g., "this row represents 5 identical patients"). It changes the sample size (N).

  • WEIGHT: use it for statistical weights (often non-integers, like IPW). It affects the calculation of means and variances, but it does not "duplicate" the physical observations.
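To make the contrast concrete: a row with a FREQ value of 5 counts as 5 observations, so it adds 5 to N (and to the divisor under the default VARDEF=DF), whereas the same row with a WEIGHT value of 5 is still a single observation. With f_i the FREQ values and w_i the WEIGHT values of the n rows:

```latex
\text{FREQ: } N = \sum_{i=1}^{n} f_i,
\qquad
\text{WEIGHT: } N = n,\quad \bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}
```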

In summary, to get a weighted total in PROC SUMMARY, simply calculate the sum of your weighting variable.