When analyzing clinical or survey data, it is common to use a weighting variable, such as inverse probability weights (IPW), to make the sample more representative of the target population.
A common point of confusion arises when trying to obtain a "weighted number of subjects" with PROC SUMMARY. Unlike in PROC FREQ, the WEIGHT statement in PROC SUMMARY does not change the frequency count (_FREQ_) in the output data set.
Here's how to understand this behavior and get the desired result.
The problem: The WEIGHT statement does not change _FREQ_
If you simply add a WEIGHT statement to your procedure, you will notice that the _FREQ_ variable in the output table continues to count the number of physical observations (the rows), not the sum of the weights.
Code that does not produce the expected result:
proc summary data=a1 chartype completetypes;
    class treatment;
    types treatment;
    weight IPW; /* This statement does not alter the output _FREQ_ */
    output out=ae_1;
run; /* PROC SUMMARY is not an interactive procedure, so the step ends with RUN */
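To see this behavior on something small, the sketch below builds a hypothetical five-row data set and runs the same kind of step on it. The data set names demo_ipw and demo_out and the weight values are invented for illustration; they are not part of the original example.

```sas
/* Hypothetical toy data: 3 rows in arm A, 2 rows in arm B */
data demo_ipw;
    input treatment $ IPW;
    datalines;
A 1.5
A 2.0
A 0.5
B 1.2
B 0.8
;
run;

proc summary data=demo_ipw chartype completetypes;
    class treatment;
    types treatment;
    weight IPW;            /* used for weighted statistics, not for _FREQ_ */
    output out=demo_out;   /* no VAR statement: only _TYPE_ and _FREQ_ are written */
run;

proc print data=demo_out noobs;
run;
```

In this toy data, _FREQ_ should read 3 for A and 2 for B (the row counts), even though the weights sum to 4.0 and 2.0.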
Solution 1: Sum the weight variable
The most direct and usually clearest way to obtain a "weighted count" is to treat the weight variable as an ordinary analysis variable and request its sum.
Summing the weights gives the equivalent of a weighted population count.
Recommended code:
proc summary data=a1 chartype completetypes;
    class treatment;
    types treatment;
    var IPW; /* We declare the weight variable as an analysis variable */
    output out=ae_1 sum=sum_weights; /* The sum of weights = weighted count */
run;
This method is simple and avoids any confusion about the nature of the statistics produced.
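As a cross-check, the summed weights can be compared with what PROC FREQ reports when it is given the same WEIGHT variable. The following is a minimal sketch on hypothetical toy data; the names demo_ipw, demo_sum and demo_freq and all values are assumptions made for the illustration.

```sas
/* Hypothetical toy data: 3 rows in arm A, 2 rows in arm B */
data demo_ipw;
    input treatment $ IPW;
    datalines;
A 1.5
A 2.0
A 0.5
B 1.2
B 0.8
;
run;

/* Sum the weight variable per treatment arm */
proc summary data=demo_ipw chartype;
    class treatment;
    types treatment;
    var IPW;
    output out=demo_sum sum=sum_weights;
run;

/* PROC FREQ: with a WEIGHT statement, COUNT is the sum of the weights */
proc freq data=demo_ipw noprint;
    weight IPW;
    tables treatment / out=demo_freq;
run;

proc print data=demo_sum noobs;
run;

proc print data=demo_freq noobs;
run;
```

sum_weights in demo_sum and COUNT in demo_freq should agree (4.0 for A and 2.0 for B in this toy data), while _FREQ_ in demo_sum still shows the row counts.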
Solution 2: Use the SUMWGT statistic
If you need to keep the WEIGHT statement (for example, because you are computing weighted means of other variables in the same step), explicitly request the SUMWGT (sum of weights) statistic in the OUTPUT statement.
Alternative code:
proc summary data=a1 chartype;
    class treatment;
    weight IPW;
    /* We specifically request the sumwgt statistic */
    output out=ae_1 sumwgt=sum_weights;
run;
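SUMWGT is a statistic computed on analysis variables, and PROC SUMMARY without a VAR statement is documented to produce only a simple count of observations. The sketch below therefore pairs the WEIGHT statement with a VAR statement. The analysis variable age, the data set names and all values are hypothetical and only serve to illustrate the weighted-mean use case mentioned above.

```sas
/* Hypothetical toy data with one analysis variable (age) */
data demo_ipw;
    input treatment $ IPW age;
    datalines;
A 1.5 60
A 2.0 50
A 0.5 70
B 1.2 55
B 0.8 65
;
run;

proc summary data=demo_ipw chartype;
    class treatment;
    types treatment;
    var age;                  /* hypothetical analysis variable */
    weight IPW;
    output out=demo_wgt
        mean=wmean_age        /* weighted mean of age */
        sumwgt=sum_weights;   /* sum of IPW over rows where age is nonmissing */
run;

proc print data=demo_wgt noobs;
run;
```

Here sum_weights should be 4.0 for A and 2.0 for B. Because SUMWGT only covers rows where the analysis variable is nonmissing, a missing age would remove that row's weight from the total, which is one reason summing the weight variable directly can be the safer way to get a pure weighted count.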
Important note: FREQ vs WEIGHT
It is crucial to distinguish the FREQ statement from the WEIGHT statement:
FREQ: used when the variable holds an integer number of occurrences (e.g., "this row represents 5 identical patients"). This changes the sample size (N).
WEIGHT: used for statistical weights (often non-integer, like IPW). This affects weighted statistics such as means and variances, but does not "duplicate" the physical observations.
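The contrast can be illustrated with a small sketch that uses a FREQ statement instead of WEIGHT. The data set demo_counts and the variable n_patients are hypothetical names chosen for this example.

```sas
/* Hypothetical toy data: each row stands for n_patients identical subjects */
data demo_counts;
    input treatment $ n_patients;
    datalines;
A 5
A 2
B 3
;
run;

proc summary data=demo_counts chartype;
    class treatment;
    types treatment;
    freq n_patients;       /* integer counts; non-integer values are truncated */
    output out=demo_out;
run;

proc print data=demo_out noobs;
run;
```

With FREQ, _FREQ_ should reflect the represented subjects (7 for A and 3 for B here) rather than the number of rows. That is why FREQ is not a substitute for WEIGHT with non-integer weights such as IPW: the fractional part would be dropped.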
In summary, to get a weighted total in PROC SUMMARY, simply calculate the sum of your weighting variable.