pca eig

Genomic Data with Singularity and Origin Forcing

Test Scenario & Use Case

Business Context

Researchers are analyzing gene expression data that has been pre-normalized so that it is centered on zero. They want to run PCA without an intercept (forcing the model through the origin). The data also contains highly collinear (redundant) genes, which tests the solver's singularity handling.

Data Preparation

Create a dataset with zero-centered data and near-perfect multicollinearity to test the singularity limits.

DATA casuser.genetics;
    call streaminit(789);
    DO sample_id = 1 to 200;
        gene_A = rand('Normal', 0, 1);
        /* High correlation: gene_B is almost exactly 2 * gene_A */
        gene_B = gene_A * 2 + rand('Normal', 0, 0.000001);
        /* Independent gene with standard deviation 2 */
        gene_C = rand('Normal', 0, 2);
        OUTPUT;
    END;
RUN;
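
Before running the PCA, it can help to confirm that the generated table has the two properties the scenario relies on. The following is a minimal sketch, assuming the `casuser` libref used in the DATA step is still assigned and that PROC MEANS and PROC CORR are available in the session; neither step is part of the original scenario. Because the data is only 200 random draws, the column means will be close to zero but not exactly zero, which matters once centering is turned off.

```sas
/* Sanity check (not part of the original scenario).                  */
/* Assumes the casuser libref from the DATA step is still assigned.   */

/* Column means: expected to be near zero, but not exactly zero. */
proc means data=casuser.genetics n mean std;
    var gene_A gene_B gene_C;
run;

/* Pairwise correlations: gene_A and gene_B should be almost perfectly */
/* correlated, while gene_C should be nearly uncorrelated with both.   */
proc corr data=casuser.genetics nosimple;
    var gene_A gene_B gene_C;
run;
```

If the gene_A/gene_B correlation is not essentially 1, the table was probably not generated as intended and the singularity test will not be meaningful.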

Implementation Steps

1. Run PCA with no intercept (noInt=true) and a strict singularity criterion (singular=1E-09).

PROC CAS;
    pca.eig /
        table={name='genetics', caslib='casuser'}
        inputs={'gene_A', 'gene_B', 'gene_C'}
        noInt=true
        singular=1E-09
        outStat={casOut={name='gene_stats', caslib='casuser', replace=true}};
RUN;

2. Verify the results in the statistics table.

PROC CAS;
    table.fetch / table={name='gene_stats', caslib='casuser'};
RUN;

Expected Result


The PCA runs without centering the data (no intercept). Because gene_A and gene_B are almost perfectly correlated, the smallest eigenvalue should be extremely close to zero, which exercises the singularity threshold.
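
To see roughly how small that eigenvalue should be, the population second-moment matrix implied by the DATA step can be worked out by hand. This is a sketch under assumptions that are not stated in the source: it uses population moments rather than the 200-sample estimates, and it does not assume whether pca.eig decomposes a covariance-type or a correlation-type matrix, so both cases are shown. The symbols $a$, $b$, $c$, $\varepsilon$, $\sigma$, $M$ and $\rho$ are introduced here for illustration only.

```latex
% a = gene_A, b = gene_B, c = gene_C; \varepsilon is the noise added to gene_B.
% In rand('Normal', mu, sigma) the third argument is a standard deviation.
\begin{aligned}
a &\sim N(0, 1), \qquad \varepsilon \sim N(0, \sigma^2), \quad \sigma = 10^{-6}, \qquad
b = 2a + \varepsilon, \qquad c \sim N(0, 2^2) \\[4pt]
% Uncentered second moments (no intercept), with x = (a, b, c)^T
M &= \mathrm{E}\!\left[ x x^{\top} \right]
   = \begin{pmatrix} 1 & 2 & 0 \\ 2 & 4 + \sigma^2 & 0 \\ 0 & 0 & 4 \end{pmatrix} \\[4pt]
% Characteristic equation of the upper-left 2x2 block (trace 5 + sigma^2, determinant sigma^2)
0 &= \lambda^2 - (5 + \sigma^2)\,\lambda + \sigma^2
   \;\Longrightarrow\;
   \lambda_{\min} \approx \frac{\sigma^2}{5} = 2 \times 10^{-13} \\[4pt]
% Same block rescaled to unit diagonal (correlation-type matrix)
\rho &= \frac{2}{\sqrt{4 + \sigma^2}} \approx 1 - \frac{\sigma^2}{8}
   \;\Longrightarrow\;
   \lambda_{\min} = 1 - \rho \approx \frac{\sigma^2}{8} = 1.25 \times 10^{-13}
\end{aligned}
```

Under either scaling, the expected smallest eigenvalue is several orders of magnitude below singular=1E-09, and in the unscaled case its direction is approximately proportional to $(2, -1, 0)$, that is, gene_B minus twice gene_A. Values this small sit close to the limits of double-precision arithmetic, so the figure reported for a particular 200-row sample may differ noticeably from these population estimates.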