When analyzing clinical or survey data, it is common to use weighting variables, such as IPW (Inverse Probability Weighting), to adjust the representativeness of the sample.
A common confusion arises when trying to obtain a "weighted number of subjects" with PROC SUMMARY. Unlike in PROC FREQ, where the WEIGHT statement changes the reported counts, the WEIGHT statement in PROC SUMMARY does not modify the frequency count (_FREQ_) in the output table.
Here's how to understand this behavior and get the desired result.
The problem: The WEIGHT statement does not change _FREQ_
If you simply add a WEIGHT statement to your procedure, you will notice that the _FREQ_ variable in the output table continues to count the number of physical observations (the rows), not the sum of the weights.
Code that does not produce the expected result:
proc summary data=a1 chartype completetypes;
    class treatment;
    types treatment;
    weight IPW; /* This statement does not alter the output _FREQ_ */
    output out=ae_1;
run;
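To reproduce the behavior on a small scale, here is a minimal sketch. The dataset name and variable names follow the example above, but the five rows and their weight values are invented purely for illustration.

```sas
/* Hypothetical toy data: the rows and weights are invented for illustration */
data a1;
    input treatment $ IPW;
    datalines;
A 1.5
A 2.0
A 0.5
B 1.2
B 0.8
;
run;

proc summary data=a1 chartype completetypes;
    class treatment;
    types treatment;
    weight IPW;          /* weights are read, but _FREQ_ still counts rows */
    output out=ae_1;
run;

/* _FREQ_ is 3 for A and 2 for B (row counts),          */
/* not 4.0 and 2.0 (the sums of the weights per group). */
proc print data=ae_1 noobs;
run;
```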
Solution 1: Sum the weight variable
The most direct and often clearest approach to obtain a "weighted count" is to treat your weight variable as a standard analysis variable and request its sum.
By summing the weights, you get the equivalent of a weighted population count.
Recommended code:
proc summary data=a1 chartype completetypes;
    class treatment;
    types treatment;
    var IPW; /* We declare the weight variable as an analysis variable */
    output out=ae_1 sum=sum_weights; /* The sum of weights = weighted count */
run;
This method is simple and avoids any confusion about the nature of the statistics produced.
Solution 2: Use the SUMWGT statistic
If you insist on using the WEIGHT statement (for example, if you are simultaneously calculating weighted averages of other variables), you must explicitly request the SUMWGT (Sum of Weights) statistic in the OUTPUT statement.
Alternative code:
proc summary data=a1 chartype;
    class treatment;
    weight IPW;
    /* We specifically request the sumwgt statistic */
    output out=ae_1 sumwgt=sum_weights;
run;
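When the WEIGHT statement is kept in order to compute weighted means, the same step can return both the mean and the sum of weights. The sketch below assumes a hypothetical numeric analysis variable named age in a1 (it is not part of the original example). SUMWGT is computed per analysis variable, over the rows where that variable is non-missing, so the sketch names one in a VAR statement.

```sas
proc summary data=a1 chartype;
    class treatment;
    var age;             /* hypothetical analysis variable, assumed to exist in a1 */
    weight IPW;
    /* Weighted mean of age and sum of the weights, per treatment level */
    output out=ae_2 mean=wmean_age sumwgt=sum_weights;
run;
```

If age has missing values, sum_weights will exclude the weights of those rows, so it can differ from the plain sum of IPW obtained with Solution 1.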
Important note: FREQ vs WEIGHT
It is crucial to distinguish the FREQ statement from the WEIGHT statement:
FREQ: used when the variable holds an integer number of occurrences (e.g., "this row represents 5 identical patients"). It changes the sample size ($N$).
WEIGHT: used for statistical weights (often non-integer values like IPW). It affects weighted statistics such as means and variances, but does not "duplicate" the physical observations, so $N$ is unchanged.
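The distinction can be sketched on a two-row dataset. The dataset demo and the variables x and w are hypothetical. The same integer column w is used once as a frequency and once as a weight, so that only the statement differs.

```sas
/* Hypothetical two-row dataset, invented for illustration */
data demo;
    input x w;
    datalines;
10 2
20 3
;
run;

/* FREQ: each row stands for w identical observations */
proc summary data=demo;
    var x;
    freq w;
    output out=by_freq n=n mean=mean;
run;
/* n = 5, mean = 16 */

/* WEIGHT: each row is one observation carrying weight w */
proc summary data=demo;
    var x;
    weight w;
    output out=by_weight n=n mean=mean sumwgt=sumwgt;
run;
/* n = 2, mean = 16, sumwgt = 5 */
```

Both steps give the same mean, (10*2 + 20*3) / 5 = 16, but only FREQ changes n. With WEIGHT, the total of 5 is available only through SUMWGT.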
In summary, to get a weighted total in PROC SUMMARY, simply calculate the sum of your weighting variable.