When analyzing clinical or survey data, it is common to use weighting variables, such as IPW (Inverse Probability Weighting), to adjust the representativeness of the sample.
A common confusion arises when trying to obtain a "weighted number of subjects" with PROC SUMMARY. Unlike in PROC FREQ, where the WEIGHT statement changes the reported counts, the WEIGHT statement in PROC SUMMARY does not modify the frequency count (_FREQ_) in the output table.
Here's how to understand this behavior and get the desired result.
The problem: The WEIGHT statement does not change _FREQ_
If you simply add a WEIGHT statement to your procedure, you will notice that the _FREQ_ variable in the output table continues to count the number of physical observations (the rows), not the sum of the weights.
Code that does not produce the expected result:
proc summary data=a1 chartype completetypes;
    class treatment;
    types treatment;
    weight IPW; /* This statement does not alter the output _FREQ_ */
    output out=ae_1;
run;
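To make the behavior concrete, here is a minimal sketch on a toy data set. The data set name a1_demo, the output name demo_freq, and the weight values are assumptions made up for illustration; only the treatment and IPW variable names come from the example above.

```sas
/* Hypothetical toy data: 3 rows for treatment A, 2 rows for treatment B */
data a1_demo;
    input treatment $ IPW;
    datalines;
A 1.5
A 2.0
A 0.5
B 1.25
B 0.75
;
run;

/* Same call as above, pointed at the toy data */
proc summary data=a1_demo chartype completetypes;
    class treatment;
    types treatment;
    weight IPW;
    output out=demo_freq;
run;

proc print data=demo_freq noobs;
run;
/* _FREQ_ is 3 for A and 2 for B (row counts),
   whereas the weights sum to 4 for A and 2 for B */
```

The gap between the row counts and the sums of the weights is exactly what the rest of this article addresses.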
Solution 1: Sum the weight variable
The most direct and often clearest approach to obtain a "weighted count" is to treat your weight variable as a standard analysis variable and request its sum.
By summing the weights, you get the equivalent of a weighted population count.
Recommended code:
proc summary data=a1 chartype completetypes;
    class treatment;
    types treatment;
    var IPW; /* We declare the weight variable as an analysis variable */
    output out=ae_1 sum=sum_weights; /* The sum of weights = weighted count */
run;
This method is simple and avoids any confusion about the nature of the statistics produced.
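One way to sanity-check the result is to recompute the same totals with PROC SQL. This is only a cross-check sketch, assuming the same a1 data set with its treatment and IPW variables; the column aliases n_rows and sum_weights are chosen here for illustration.

```sas
/* Cross-check: physical row count and sum of weights per treatment group */
proc sql;
    select treatment,
           count(*) as n_rows,        /* should match _FREQ_ */
           sum(IPW) as sum_weights    /* should match the SUM statistic */
    from a1
    group by treatment;
quit;
```

Keep in mind that PROC SUMMARY drops rows with a missing CLASS value unless the MISSING option is specified, while GROUP BY keeps them as a separate group, so the two outputs can differ when treatment contains missing values.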
Solution 2: Use the SUMWGT statistic
If you insist on using the WEIGHT statement (for example, if you are simultaneously calculating weighted averages of other variables), you must explicitly request the SUMWGT (Sum of Weights) statistic in the OUTPUT statement.
Alternative code:
proc summary data=a1 chartype;
    class treatment;
    weight IPW;
    /* We specifically request the sumwgt statistic */
    output out=ae_1 sumwgt=sum_weights;
run;
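As a sketch of the combined use case mentioned above (weighted means plus the sum of weights in one pass), the example below adds an analysis variable. The variable age and the output names ae_2 and wmean_age are hypothetical and are not part of the original example. A VAR statement is included because PROC SUMMARY computes statistics only for the variables listed there.

```sas
/* age is a hypothetical analysis variable; replace it with a real one from a1 */
proc summary data=a1 chartype;
    class treatment;
    types treatment;
    var age;
    weight IPW;
    output out=ae_2
           mean(age)=wmean_age        /* weighted mean of age */
           sumwgt(age)=sum_weights;   /* sum of weights over rows with non-missing age */
run;
```

Because SUMWGT is computed per analysis variable, rows where age is missing do not contribute to sum_weights. If the analysis variable can be missing, Solution 1 may be the safer way to get the total of the weights.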
Important note: FREQ vs WEIGHT
It is crucial to distinguish the FREQ statement from the WEIGHT statement:
FREQ: Is used when the variable represents an integer number of occurrences (e.g., "this row represents 5 identical patients"). This changes the sample size (N).
WEIGHT: Is used for statistical weights (often non-integers like IPW). This affects the calculation of variance and means, but does not "duplicate" the physical observations.
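The effect of the FREQ statement can be illustrated with a small sketch. The data set agg_demo, its variables score and n_patients, and the output names are all hypothetical and exist only for this illustration.

```sas
/* Hypothetical aggregated data: each row stands for n_patients identical patients */
data agg_demo;
    input treatment $ score n_patients;
    datalines;
A 10 5
A 20 3
B 15 4
;
run;

proc summary data=agg_demo chartype;
    class treatment;
    types treatment;
    var score;
    freq n_patients;   /* each row is counted n_patients times */
    output out=freq_demo n=n_score mean=mean_score;
run;

proc print data=freq_demo noobs;
run;
/* n_score is 8 for A and 4 for B: N counts each row n_patients times.
   mean_score is 13.75 for A, i.e. (10*5 + 20*3) / 8, and 15 for B. */
```

Replacing the FREQ statement with a WEIGHT statement on the same variable would leave N at the number of physical rows (2 for A, 1 for B) while still producing the same weighted mean.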
In summary, to get a weighted total in PROC SUMMARY, simply calculate the sum of your weighting variable.