Test Scenario & Use Case
Create a dataset with zero-mean variables and near-perfect multicollinearity (gene_B is almost exactly 2 × gene_A) to test how the PCA handles a nearly singular matrix.
```sas
/* Assumes an active CAS session in which the casuser libref is assigned */
DATA casuser.genetics;
   call streaminit(789);                               /* Fixed seed for reproducibility */
   DO sample_id = 1 to 200;
      gene_A = rand('Normal', 0, 1);
      gene_B = gene_A * 2 + rand('Normal', 0, 0.000001); /* High correlation: 2*gene_A plus noise with SD 1E-06 */
      gene_C = rand('Normal', 0, 2);                   /* Independent variable, SD 2 */
      OUTPUT;
   END;
RUN;
```
```sas
PROC CAS;
   pca.eig /
      TABLE={name='genetics', caslib='casuser'}
      inputs={'gene_A', 'gene_B', 'gene_C'}
      noInt=true        /* No intercept: the variables are not centered */
      singular=1E-09    /* Singularity threshold under test */
      outStat={casOut={name='gene_stats', caslib='casuser', replace=true}};
RUN;
```
```sas
PROC CAS;
   /* Display the statistics table written by pca.eig */
   TABLE.fetch / TABLE={name='gene_stats', caslib='casuser'};
RUN;
```
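The PCA runs on uncentered data (noInt=true, no intercept); because the variables are generated with mean zero, skipping centering should change little. Because gene_B is almost an exact multiple of gene_A, the smallest eigenvalue should be extremely close to zero, which tests the singular=1E-09 threshold.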
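A back-of-the-envelope bound shows how far below the threshold the smallest eigenvalue should land. The symbols ($A_i$, $\varepsilon_i$, $\sigma$, $S$, $v$) are introduced here for illustration, and the bound assumes the eigenvalues come from the uncentered cross-product matrix scaled by $1/n$. If pca.eig instead standardizes the variables or divides by $n-1$, the constant shifts slightly, but the value stays several orders of magnitude below 1E-09.

```latex
% Data-generating process from the DATA step (sigma = noise SD on gene_B)
A_i \sim N(0,1), \quad \varepsilon_i \sim N(0,\sigma^2),\ \sigma = 10^{-6}, \quad
B_i = 2A_i + \varepsilon_i, \quad C_i \sim N(0, 2^2)

% Uncentered cross-product matrix (noInt=true), assumed scaled by 1/n
S = \frac{1}{n} X^\top X, \qquad X = [\,A \;\; B \;\; C\,] \in \mathbb{R}^{n \times 3},\ n = 200

% Rayleigh-quotient bound along the unit vector v = (-2, 1, 0)^\top / \sqrt{5}
X v = \frac{-2A + B}{\sqrt{5}} = \frac{\varepsilon}{\sqrt{5}}
\quad\Longrightarrow\quad
\lambda_{\min}(S) \le v^\top S v = \frac{1}{5n}\sum_{i=1}^{n} \varepsilon_i^2
\approx \frac{\sigma^2}{5} = 2 \times 10^{-13} \ll 10^{-9}
```

Because $v$ has no gene_C component, the independent gene_C column does not enter the bound. Only the gene_A/gene_B collinearity drives the near-zero eigenvalue.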