fastKnn

fastknn

Description

The fastknn action performs a k-nearest neighbor search. It calculates the distances between observations in a query table and an input table to identify the k-nearest neighbors. It supports both exact and approximate search methods using various distance metrics (Euclidean, Cosine, Inner Product) and allows for missing value imputation.
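As a rough illustration of the three metrics named above, the following Python sketch computes each one for a single pair of points. It is not the action's implementation: the function names are hypothetical, and treating the inner product as a similarity score (larger means closer) is an assumption, since this page does not say how the action converts it into a distance.

```python
import math

def l2_distance(a, b):
    """Euclidean (L2) distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """One minus the cosine similarity; near 0 when both vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def inner_product(a, b):
    """Plain inner product (assumed here to act as a similarity, not a distance)."""
    return sum(x * y for x, y in zip(a, b))

query = (1.5, 1.5)
point = (5.0, 5.0)
print("L2:", l2_distance(query, point))
print("COSINE:", cosine_distance(query, point))
print("IP:", inner_product(query, point))
```

The two points lie on the same line through the origin, so they are far apart under L2 but essentially identical under the cosine metric, which ignores vector length.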

Settings
Parameter: Description
table: Specifies the settings for the input table containing the data to be searched.
query: Specifies the query input data table containing the observations for which to find neighbors.
k: Specifies the number of nearest neighbors to be returned. The default is 2.
method: Specifies the k-nearest neighbor search method to use. Values can be 'EXACT' (default) or 'APPROXIMATE'.
distanceMetric: Specifies the metric to measure the distance between points. Options are 'L2' (Euclidean, default), 'COSINE', or 'IP' (Inner Product).
inputs: Specifies the variables to use in the analysis for distance calculation.
id: Specifies one or more variables to use as record identifiers.
output: Specifies the output data table in which to save the computed neighbors.
outDist: Specifies the output data table in which to save the computed distances.
outImpute: Specifies the output data table in which to save the query data after imputing missing values.
impute: When set to True, enables imputation of missing values in the query table using the k-nearest neighbors method.
efConstruction: Specifies the number of neighbors to consider during graph construction (for approximate search).
efSearch: Specifies the number of candidate nodes to explore during the graph search phase (for approximate search).
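The impute setting fills missing query values from the nearest neighbors, but this page does not describe the exact rule. The Python sketch below shows one common variant, offered purely as an assumption: distances are computed over the coordinates that are present, and each missing coordinate is replaced by the mean of that coordinate across the k nearest reference rows. The function name and the sample rows are hypothetical.

```python
import math

def knn_impute(query_row, reference_rows, k):
    """Fill None entries in query_row with the mean of the k nearest reference rows.

    Assumption: distance is Euclidean over the observed coordinates only.
    This is a generic kNN imputation rule, not the documented fastknn rule.
    """
    observed = [i for i, value in enumerate(query_row) if value is not None]

    def partial_distance(row):
        return math.sqrt(sum((query_row[i] - row[i]) ** 2 for i in observed))

    nearest = sorted(reference_rows, key=partial_distance)[:k]
    return [
        value if value is not None else sum(row[i] for row in nearest) / len(nearest)
        for i, value in enumerate(query_row)
    ]

# Hypothetical reference rows (x, y) and a query row whose y value is missing.
reference = [(1, 1), (1, 2), (2, 1), (2, 2), (5, 5), (5, 6), (6, 5), (6, 6)]
print(knn_impute((5.4, None), reference, k=2))
```

Here the two reference rows with x = 5 are closest to the query on the observed coordinate, so the missing y is filled with the mean of their y values.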
Data Preparation
Create Input and Query Data

Create two datasets: one serving as the reference data and one as the query data.

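/* Assumes the mycas libref already points to a caslib in an active CAS session. */
/* Reference data: eight points forming two clusters. */
DATA mycas.inputData;
   INPUT id x y;
   CARDS;
1 1 1
2 1 2
3 2 1
4 2 2
5 5 5
6 5 6
7 6 5
8 6 6
;
RUN;

/* Query data: one point at the center of each cluster. */
DATA mycas.queryData;
   INPUT id x y;
   CARDS;
1 1.5 1.5
2 5.5 5.5
;
RUN;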

Examples

Perform an exact search to find the 2 nearest neighbors for each query point using Euclidean distance.

SAS® / CAS Code (awaiting community validation)
PROC CAS;
   fastKnn.fastknn /
      table={name='inputData'}
      query={name='queryData'}
      k=2
      id={'id'}
      inputs={'x', 'y'}
      output={casOut={name='knn_results', replace=true}};
RUN;
Result:
The 'knn_results' table will contain the 2 nearest neighbors from 'inputData' for each observation in 'queryData'.
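To make the expected neighbors concrete, the following Python sketch runs a brute-force exact search on the same values as the inputData and queryData tables. It only illustrates what an exact L2 search computes and is not the action's code; the function name is hypothetical. Each query point sits at the center of a square of four reference points, so all four are equally distant and which two are returned depends on tie-breaking. The sketch keeps the first two in insertion order, which is an assumption and may differ from what the action returns.

```python
import math

# Same values as the inputData and queryData tables: id -> (x, y).
input_data = {1: (1, 1), 2: (1, 2), 3: (2, 1), 4: (2, 2),
              5: (5, 5), 6: (5, 6), 7: (6, 5), 8: (6, 6)}
query_data = {1: (1.5, 1.5), 2: (5.5, 5.5)}

def exact_knn(query_point, points, k):
    """Return the k (id, distance) pairs closest to query_point by L2 distance."""
    scored = [(pid, math.dist(query_point, p)) for pid, p in points.items()]
    # list.sort() is stable, so tied points keep their insertion order.
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]

for qid, q in query_data.items():
    print(qid, exact_knn(q, input_data, k=2))
```

For both query points every returned neighbor is at distance sqrt(0.5), about 0.707, from the query.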

Perform an approximate search for 3 neighbors using the Cosine distance metric, and output the calculated distances to a separate table.

SAS® / CAS Code (awaiting community validation)
PROC CAS;
   fastKnn.fastknn /
      table={name='inputData'}
      query={name='queryData'}
      k=3
      method='APPROXIMATE'
      distanceMetric='COSINE'
      id={'id'}
      inputs={'x', 'y'}
      outDist={name='dist_results', replace=true}
      output={casOut={name='knn_approx_results', replace=true}};
RUN;
Result:
Two output tables are created: 'knn_approx_results' containing the neighbors and 'dist_results' containing the distances between the query points and their neighbors.
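The Python sketch below shows how the cosine metric changes which points count as nearest on this data. It is an exact brute-force ranking, not the graph-based approximate search the action performs, and the function names and the rounding step are assumptions made for illustration. It returns neighbor ids and distances as two separate lists, loosely mirroring the two output tables.

```python
import math

# Same values as the inputData and queryData tables: id -> (x, y).
input_data = {1: (1, 1), 2: (1, 2), 3: (2, 1), 4: (2, 2),
              5: (5, 5), 6: (5, 6), 7: (6, 5), 8: (6, 6)}
query_data = {1: (1.5, 1.5), 2: (5.5, 5.5)}

def cosine_distance(a, b):
    """One minus cosine similarity, rounded to suppress floating-point noise."""
    dot = sum(x * y for x, y in zip(a, b))
    return round(1.0 - dot / (math.hypot(*a) * math.hypot(*b)), 12)

def knn_with_distances(query_point, points, k):
    """Return (neighbor ids, distances) for the k smallest cosine distances."""
    # Sorting (distance, id) tuples breaks ties by the smaller id.
    scored = sorted((cosine_distance(query_point, p), pid) for pid, p in points.items())
    neighbors = [pid for _, pid in scored[:k]]
    distances = [d for d, _ in scored[:k]]
    return neighbors, distances

for qid, q in query_data.items():
    print(qid, knn_with_distances(q, input_data, k=3))
```

Both query points lie on the diagonal x = y, as do the reference points with ids 1, 4, 5, and 8, so those four have a cosine distance of zero to either query regardless of how far away they are in Euclidean terms. The three returned neighbors therefore come from that set and can mix points from both clusters.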

FAQ

What is the primary function of the fastknn action?
Which input tables are required for the fastknn action?
What are the available distance metrics for the k-nearest neighbor calculation?
How can missing values in the query data be handled?
What search methods are available in the fastknn action?
What does the efConstruction parameter control?
How can I specify the number of neighbors to return?
What output tables can be created by the fastknn action?
What is the purpose of the useTopKOutDist parameter?