fastKnn

fastknn

Description

The fastknn action performs a k-nearest neighbor search. It calculates the distances between observations in a query table and an input table to identify the k-nearest neighbors. It supports both exact and approximate search methods using various distance metrics (Euclidean, Cosine, Inner Product) and allows for missing value imputation.

Settings
Parameter        Description
table            Specifies the settings for the input table containing the data to be searched.
query            Specifies the query input data table containing the observations for which to find neighbors.
k                Specifies the number of nearest neighbors to be returned. The default is 2.
method           Specifies the k-nearest neighbor search method to use. Values can be 'EXACT' (default) or 'APPROXIMATE'.
distanceMetric   Specifies the metric to measure the distance between points. Options are 'L2' (Euclidean, default), 'COSINE', or 'IP' (Inner Product).
inputs           Specifies the variables to use in the analysis for distance calculation.
id               Specifies one or more variables to use as record identifiers.
output           Specifies the output data table in which to save the computed neighbors.
outDist          Specifies the output data table in which to save the computed distances.
outImpute        Specifies the output data table in which to save the query data after imputing missing values.
impute           When set to True, enables imputation of missing values in the query table using the k-nearest neighbors method.
efConstruction   Specifies the number of neighbors to consider during graph construction (for approximate search).
efSearch         Specifies the number of candidate nodes to explore during the graph search phase (for approximate search).
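
The three distanceMetric options correspond to standard vector measures. The Python sketch below shows their textbook definitions only; it is not the action's internal code. Treating cosine distance as 1 minus cosine similarity is an assumption, since the exact convention used by the action is not stated here, and the function names are hypothetical.

```python
import math

def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def inner_product(a, b):
    """Inner (dot) product; larger values mean more similar vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):
    """1 - cosine similarity (assumed convention); undefined for zero vectors."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return 1.0 - inner_product(a, b) / (norm_a * norm_b)

# Compare one query point with one reference point under each measure.
q = (1.5, 1.5)
p = (2.0, 1.0)
print("L2:", l2_distance(q, p))
print("IP:", inner_product(q, p))
print("COSINE:", cosine_distance(q, p))
```

L2 and cosine are distances (smaller is closer), whereas the inner product is a similarity (larger is closer), so the choice of metric changes which observations count as nearest.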
Data Preparation
Create Input and Query Data

Create two datasets: one serving as the reference data and one as the query data.

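/* Reference data: two clusters of four points each. */
/* Assumes a CAS libref named mycas is already assigned. */
DATA mycas.inputData;
   INPUT id x y;
   CARDS;
1 1 1
2 1 2
3 2 1
4 2 2
5 5 5
6 5 6
7 6 5
8 6 6
;
RUN;

/* Query data: one point at the center of each cluster. */
DATA mycas.queryData;
   INPUT id x y;
   CARDS;
1 1.5 1.5
2 5.5 5.5
;
RUN;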

Examples

Perform an exact search to find the 2 nearest neighbors for each query point using Euclidean distance.

SAS® / CAS Code (awaiting community validation)

PROC CAS;
   fastKnn.fastknn /
      table={name='inputData'}
      query={name='queryData'}
      k=2
      id={'id'}
      inputs={'x', 'y'}
      output={casOut={name='knn_results', replace=true}};
RUN;

Result:
The 'knn_results' table will contain the 2 nearest neighbors from 'inputData' for each observation in 'queryData'.
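
To see which neighbors an exact Euclidean search should find for this sample data, the search can be reproduced by brute force. The following Python sketch is an illustration only and does not use SAS; the function name exact_knn is hypothetical, and breaking ties by the lowest id is an assumption, because the tie-breaking rule of the action is not stated here.

```python
import math

# Same points as the inputData and queryData tables, keyed by id.
input_data = {1: (1, 1), 2: (1, 2), 3: (2, 1), 4: (2, 2),
              5: (5, 5), 6: (5, 6), 7: (6, 5), 8: (6, 6)}
query_data = {1: (1.5, 1.5), 2: (5.5, 5.5)}

def exact_knn(query_point, reference, k):
    """Return the k (id, distance) pairs closest to query_point.

    Every reference point is scored, so the result is exact.
    Ties are broken by the lowest id (an assumption for this sketch).
    """
    scored = [(ref_id, math.dist(query_point, point))
              for ref_id, point in reference.items()]
    scored.sort(key=lambda pair: (pair[1], pair[0]))
    return scored[:k]

for query_id, point in query_data.items():
    print(query_id, exact_knn(point, input_data, k=2))
```

Each query point here is equidistant from all four points in its own cluster, so with k=2 the two ids returned depend on how ties are resolved. Any two ids from the matching cluster are equally valid nearest neighbors.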

Perform an approximate search for 3 neighbors using the Cosine distance metric, and output the calculated distances to a separate table.

SAS® / CAS Code (awaiting community validation)

PROC CAS;
   fastKnn.fastknn /
      table={name='inputData'}
      query={name='queryData'}
      k=3 method='APPROXIMATE' distanceMetric='COSINE'
      id={'id'}
      inputs={'x', 'y'}
      outDist={name='dist_results', replace=true}
      output={casOut={name='knn_approx_results', replace=true}};
RUN;

Result:
Two output tables are created: 'knn_approx_results' containing the neighbors and 'dist_results' containing the distances between the query points and their neighbors.
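
Cosine distance compares direction rather than magnitude, which matters for this sample data. The Python sketch below ranks the reference points for the first query point, assuming cosine distance is defined as 1 minus cosine similarity (the action's exact convention is not stated here). It is an exact calculation for illustration, so an approximate search may return a different set.

```python
import math

# Same points as the inputData table, keyed by id.
input_data = {1: (1, 1), 2: (1, 2), 3: (2, 1), 4: (2, 2),
              5: (5, 5), 6: (5, 6), 7: (6, 5), 8: (6, 6)}
query = (1.5, 1.5)  # first observation in queryData

def cosine_distance(a, b):
    """1 - cosine similarity (assumed convention), clamped at 0 for rounding noise."""
    dot = sum(x * y for x, y in zip(a, b))
    return max(0.0, 1.0 - dot / (math.hypot(*a) * math.hypot(*b)))

distances = {ref_id: cosine_distance(query, point)
             for ref_id, point in input_data.items()}

# Rank by distance; rounding makes numerically equal distances tie,
# and ties are broken by the lowest id (an assumption for this sketch).
ranked = sorted(distances, key=lambda ref_id: (round(distances[ref_id], 9), ref_id))
for ref_id in ranked:
    print(ref_id, round(distances[ref_id], 4))
```

Under this definition, every point on the diagonal (ids 1, 4, 5 and 8) is at distance 0 from the query, so the three nearest neighbors can include points from the distant cluster. If spatial proximity is what matters, the default L2 metric is the more natural choice for data like this.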

FAQ

What is the primary function of the fastknn action?
Which input tables are required for the fastknn action?
What are the available distance metrics for the k-nearest neighbor calculation?
How can missing values in the query data be handled?
What search methods are available in the fastknn action?
What does the efConstruction parameter control?
How can I specify the number of neighbors to return?
What output tables can be created by the fastknn action?
What is the purpose of the useTopKOutDist parameter?