The fastknn action performs a k-nearest neighbor search. It computes the distance from each observation in a query table to the observations in an input table and returns the k closest input observations. It supports exact and approximate search, three distance metrics (Euclidean, cosine, and inner product), and optional imputation of missing values in the query table.
| Parameter | Description |
|---|---|
| table | Specifies the settings for the input table containing the data to be searched. |
| query | Specifies the query input data table containing the observations for which to find neighbors. |
| k | Specifies the number of nearest neighbors to be returned. The default is 2. |
| method | Specifies the k-nearest neighbor search method to use. Values can be 'EXACT' (default) or 'APPROXIMATE'. |
| distanceMetric | Specifies the metric to measure the distance between points. Options are 'L2' (Euclidean, default), 'COSINE', or 'IP' (Inner Product). |
| inputs | Specifies the variables to use in the analysis for distance calculation. |
| id | Specifies one or more variables to use as record identifiers. |
| output | Specifies the output data table in which to save the computed neighbors. |
| outDist | Specifies the output data table in which to save the computed distances. |
| outImpute | Specifies the output data table in which to save the query data after imputing missing values. |
| impute | When set to True, enables imputation of missing values in the query table using the k-nearest neighbors method. |
| efConstruction | Specifies the number of neighbors to consider during graph construction (for approximate search). |
| efSearch | Specifies the number of candidate nodes to explore during the graph search phase (for approximate search). |
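For reference, the three distanceMetric names correspond to the following standard quantities for a query vector $q$ and an input vector $x$ over the $n$ variables listed in inputs. These are textbook definitions. The text above does not state how the action turns cosine similarity or inner product into a reported distance value, so the $1 - \cos$ form below is an assumption (a common convention), not a description of the action's output.

```latex
\begin{aligned}
d_{\mathrm{L2}}(q, x) &= \sqrt{\sum_{i=1}^{n} (q_i - x_i)^2} \\
d_{\mathrm{COSINE}}(q, x) &= 1 - \frac{q \cdot x}{\lVert q \rVert \, \lVert x \rVert} \\
\mathrm{IP}(q, x) &= q \cdot x = \sum_{i=1}^{n} q_i x_i
\end{aligned}
```

Euclidean distance depends on both direction and magnitude, cosine distance depends only on direction, and the inner product grows with magnitude, so the three metrics can rank the same neighbors differently.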
Create two CAS tables: one serving as the reference (input) data and one as the query data.

    /* Assumes the libref mycas is already assigned to a CAS library. */
    DATA mycas.inputData;
       INPUT id x y;
       CARDS;
    1 1 1
    2 1 2
    3 2 1
    4 2 2
    5 5 5
    6 5 6
    7 6 5
    8 6 6
    ;
    RUN;

    DATA mycas.queryData;
       INPUT id x y;
       CARDS;
    1 1.5 1.5
    2 5.5 5.5
    ;
    RUN;
Perform an exact search to find the 2 nearest neighbors for each query point using Euclidean distance (the default distanceMetric, 'L2').

    PROC CAS;
       fastKnn.fastknn /
          table={name='inputData'}
          query={name='queryData'}
          k=2
          id={'id'}
          inputs={'x', 'y'}
          output={casOut={name='knn_results', replace=true}};
    RUN;
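To sanity-check what an exact Euclidean search should return for the sample tables above, the following is a minimal brute-force sketch in plain Python. It is not the action's implementation. The function name `exact_knn` and the dictionaries are hypothetical helpers introduced only for this illustration, and the tie-breaking rule (input order) is a choice made here.

```python
import math

# Sample rows copied from the inputData and queryData tables: id -> (x, y).
input_data = {1: (1, 1), 2: (1, 2), 3: (2, 1), 4: (2, 2),
              5: (5, 5), 6: (5, 6), 7: (6, 5), 8: (6, 6)}
query_data = {1: (1.5, 1.5), 2: (5.5, 5.5)}


def exact_knn(query_point, table, k):
    """Return the k (id, distance) pairs closest to query_point (Euclidean)."""
    scored = [(row_id, math.dist(query_point, point))
              for row_id, point in table.items()]
    # sorted() is stable, so tied rows keep their input order.
    # This tie-breaking rule is an assumption of the sketch, not of fastknn.
    return sorted(scored, key=lambda pair: pair[1])[:k]


for query_id, point in query_data.items():
    neighbors = exact_knn(point, input_data, k=2)
    print(query_id, [(row_id, round(dist, 4)) for row_id, dist in neighbors])
```

Each query point sits at the center of a 2 x 2 cluster, so all four points of that cluster are equally distant (about 0.7071). With k=2, which two of the four are returned therefore depends on how ties are broken, which the text above does not specify for the action.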
Perform an approximate search for 3 neighbors using the Cosine distance metric, and output the calculated distances to a separate table.

    PROC CAS;
       fastKnn.fastknn /
          table={name='inputData'}
          query={name='queryData'}
          k=3
          method='APPROXIMATE'
          distanceMetric='COSINE'
          id={'id'}
          inputs={'x', 'y'}
          outDist={name='dist_results', replace=true}
          output={casOut={name='knn_approx_results', replace=true}};
    RUN;
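The effect of switching to the cosine metric on the sample tables can be previewed with a short Python sketch. It assumes cosine distance is defined as 1 minus cosine similarity, which is a common convention but is not confirmed by the text above. `cosine_distance` and `same_direction_ids` are hypothetical helper names, and the sketch performs an exact scan rather than an approximate search.

```python
import math

# Sample rows copied from the inputData and queryData tables: id -> (x, y).
input_data = {1: (1, 1), 2: (1, 2), 3: (2, 1), 4: (2, 2),
              5: (5, 5), 6: (5, 6), 7: (6, 5), 8: (6, 6)}
query_data = {1: (1.5, 1.5), 2: (5.5, 5.5)}


def cosine_distance(a, b):
    """Return 1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    return 1.0 - dot / (norm_a * norm_b)


def same_direction_ids(query_point, table, tol=1e-9):
    """Ids whose cosine distance to query_point is zero up to rounding error."""
    return [row_id for row_id, point in table.items()
            if cosine_distance(query_point, point) < tol]


for query_id, point in query_data.items():
    distances = {row_id: round(cosine_distance(point, xy), 4)
                 for row_id, xy in input_data.items()}
    print(query_id, distances, same_direction_ids(point, input_data))
```

Both query points lie on the line x = y, so under this definition they have identical cosine distances to every input row. Rows 1, 4, 5, and 8 also lie on that line and are at distance zero. With k=3 and four rows tied at zero, the three neighbors returned depend on tie-breaking and, for method='APPROXIMATE', on the approximate search itself.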