pca

eig

Description

The `eig` action performs principal component analysis (PCA) by eigenvalue decomposition. PCA is a fundamental statistical technique for dimensionality reduction and data exploration. The action analyzes the covariance or correlation matrix of the numeric variables and computes its eigenvalues and eigenvectors, which transform the original correlated variables into a smaller set of uncorrelated variables called principal components. The action supports weight and frequency variables and can generate output tables containing component scores and statistical summaries.

pca.eig <result=results> <status=rc> /
   attributes={{format="string", label="string", name="variable-name", ...}, ...}
   code={casOut={...}}
   cov=TRUE | FALSE
   display={caseSensitive=TRUE|FALSE, exclude=TRUE|FALSE, names={"string-1", ...}, ...}
   freq="variable-name"
   gpu={enable=TRUE | FALSE}
   groupbyLimit=64-bit-integer
   inputs={{name="variable-name", ...}, ...}
   n=integer
   noInt=TRUE | FALSE
   outStat={casOut={name="table-name", ...}, rPrefix="string"}
   output={casOut={name="table-name", ...}, copyVars={"variable-name", ...}, residual="string", score="string"}
   outputTables={names={"string-1", ...}, replace=TRUE|FALSE}
   partial={"variable-name-1", ...}
   prefix="string"
   singular=double
   std=TRUE | FALSE
   store={name="table-name", ...}
   table={name="table-name", caslib="string", ...}
   varDef="DF" | "N" | "WDF" | "WEIGHT" | "WGT"
   weight="variable-name";
Settings
| Parameter | Description |
|---|---|
| `table` | Specifies the settings for the input CAS table to be analyzed. |
| `inputs` | Specifies the list of numeric variables to use for the analysis. If omitted, all numeric variables are used. |
| `n` | Specifies the number of principal components to be computed. If set to 0, all components are computed. |
| `cov` | If set to TRUE, computes the principal components from the covariance matrix. If FALSE (default), the correlation matrix is used. |
| `std` | If set to TRUE, standardizes the principal component scores in the output table to unit variance. |
| `output` | Specifies the output table to contain observation-wise statistics, such as component scores. |
| `outStat` | Specifies the output table to contain statistics like means, standard deviations, eigenvalues, and eigenvectors. |
| `noInt` | If set to TRUE, suppresses the intercept (fits the model through the origin). |
| `prefix` | Specifies a prefix string for naming the principal component variables (default is 'Prin'). |
| `freq` | Specifies a numeric variable that contains the frequency of occurrence for each observation. |
| `weight` | Specifies a numeric variable to use as a weight for performing a weighted analysis. |
| `code` | Generates SAS DATA step code to compute predicted values (scores) based on the fitted model. |
| `store` | Saves the model fit information to a CAS table (analytic store) for use in scoring. |
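
As a rough sketch of how the weighting, frequency, and intercept settings above might combine in a single call, the following uses only parameters that appear in the syntax listing. The table `sensor_readings` and the variables `temp`, `vibration`, `pressure`, `obs_weight`, and `obs_count` are hypothetical names chosen for illustration; they do not come from the original examples.

```sas
proc cas;
   pca.eig /
      /* Hypothetical input table and variables (assumptions) */
      table={name="sensor_readings", caslib="casuser"}
      inputs={"temp", "vibration", "pressure"}
      /* Weighted analysis with a frequency count per observation */
      weight="obs_weight"
      freq="obs_count"
      /* Analyze the covariance matrix and suppress the intercept */
      cov=true
      noInt=true
      /* n=0 requests all components */
      n=0;
run;
quit;
```

Setting `cov=true` keeps the original scale of each variable, which matters when the magnitude of a measurement is itself informative; leaving it at the default analyzes the correlation matrix instead.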
Data Preparation

Loads the sample 'Iris' dataset into a CAS table named 'iris' in the 'casuser' library.

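/* Assumes an active CAS session; binds the libref casuser to the casuser caslib */
libname casuser cas caslib="casuser";

/* Load SASHELP.IRIS into CAS memory (a DATA step runs outside PROC CAS) */
data casuser.iris;
   set sashelp.iris;
run;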
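
Before running the analysis, it can help to confirm that the table actually landed in CAS. The following is a minimal check, assuming the table was loaded as `iris` in the `casuser` caslib as described above; it uses the `table.tableInfo` and `table.columnInfo` actions from the `table` action set.

```sas
proc cas;
   /* Confirm the table exists in the casuser caslib */
   table.tableInfo / caslib="casuser" name="iris";
   /* List column names and types to check the numeric inputs */
   table.columnInfo / table={name="iris", caslib="casuser"};
run;
quit;
```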

Examples

Performs a standard Principal Component Analysis on the Iris dataset to extract the top 2 components based on the correlation matrix.

SAS® / CAS Code (awaiting community validation)
PROC CAS;
   pca.eig /
      table={name="iris", caslib="casuser"}
      inputs={"SepalLength", "SepalWidth", "PetalLength", "PetalWidth"}
      n=2;
RUN;
QUIT;
Result:
The action returns the Eigenvalues table (showing variance explained) and Eigenvectors table for the first 2 components.
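
The syntax listing shows optional `result=` and `status=` arguments. As a hedged sketch, the same call can capture its results in a CASL variable and inspect them only when the action succeeds. The variable names `r` and `rc` are arbitrary choices here, and the exact keys inside `r` are not confirmed by this page, so the sketch uses `describe` to list them rather than assuming a table name.

```sas
proc cas;
   pca.eig result=r status=rc /
      table={name="iris", caslib="casuser"}
      inputs={"SepalLength", "SepalWidth", "PetalLength", "PetalWidth"}
      n=2;
   /* A severity of 0 indicates the action finished without warnings or errors */
   if (rc.severity == 0) then describe r;
run;
quit;
```

Once the keys are known from the `describe` output, an individual result table can be printed or saved by referencing it through `r`.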

Performs PCA using the covariance matrix, standardizes the output scores, creates specific output tables for statistics and scores, and saves the scoring code.

SAS® / CAS Code (awaiting community validation)
PROC CAS;
   pca.eig /
      table={name="iris", caslib="casuser"}
      /* Use specific numeric inputs */
      inputs={"SepalLength", "SepalWidth", "PetalLength", "PetalWidth"}
      /* Use the covariance matrix instead of the correlation matrix */
      cov=true
      /* Standardize scores to unit variance */
      std=true
      /* Custom prefix for component names */
      prefix="PC"
      /* Output table for eigenvalues and eigenvectors */
      outStat={casOut={name="eigen_stats", caslib="casuser", replace=true}}
      /* Output table for scores, copying the Species variable */
      output={casOut={name="iris_scores", caslib="casuser", replace=true},
              score="Score",
              copyVars={"Species"}}
      /* Generate scoring code */
      code={casOut={name="score_code", caslib="casuser", replace=true}};
RUN;
QUIT;
Result:
Generates the 'eigen_stats' table with statistical summaries and the 'iris_scores' table containing the original 'Species' column and the new 'Score1', 'Score2', etc. columns. Also creates the 'score_code' table containing the DATA step scoring logic.
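
To look at what the run produced, the output tables can be previewed with the `table.fetch` action. This is a minimal sketch, assuming the tables were created under the names used in the example above; the row limits are arbitrary.

```sas
proc cas;
   /* Preview the first five scored rows (Species plus the score columns) */
   table.fetch / table={name="iris_scores", caslib="casuser"} to=5;
   /* Preview the statistics table (means, standard deviations, eigenvalues, eigenvectors) */
   table.fetch / table={name="eigen_stats", caslib="casuser"} to=20;
run;
quit;
```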

FAQ

What is the primary purpose of the eig action?
How can I calculate principal components using the covariance matrix instead of the correlation matrix?
How do I specify the number of principal components to compute?
Can I omit the intercept from the model?
How can I perform a weighted analysis of the data?
Is it possible to accelerate the computation using a GPU?
How do I save the model fit information for future scoring?
What does the "singular" parameter control?
How can I obtain standard deviations and eigenvalues in an output table?

Associated Scenarios

Use Case
Standard Customer Behavior Dimensionality Reduction

A retail bank wants to segment its customer base for a new credit card offer. They have multiple correlated variables related to spending habits (groceries, travel, entertainmen...

Use Case
High-Volume Sensor Analysis with Weighting and Covariance

A manufacturing plant monitors heavy machinery using dozens of sensors. They need to analyze the raw variance (Covariance) rather than correlation, because the magnitude of vibr...

Use Case
Genomic Data with Singularity and Origin Forcing

Researchers are analyzing gene expression data that has been pre-normalized to be centered around zero. They want to perform PCA without an intercept (forcing the model through ...