Index Columns in a CAS Table

Indexing columns in a CAS table creates an optimized data structure that allows faster searching and access to rows based on the values of those columns. This is particularly useful for queries that use WHERE clauses or joins on indexed columns. This process creates a new in-memory table with the index applied. The examples below show how to create indexed tables and how to use them efficiently.

Data Analysis

Type : INTERNAL_CREATION

The examples use simulated Iris data (generated via datalines) or internally generated data (DATA step) to create in-memory CAS tables. This data is then used to demonstrate indexing functionalities.

1 Code Block

DATA STEP / table.index Data

Explanation :
This example creates an in-memory table 'iris_data' from simulated Iris data. Then, the 'table.index' action is used to create a new table 'iris_indexed_basic' with an index on the 'SepalLength' column. 'table.columnInfo' confirms the index creation, and 'table.fetch' demonstrates using the filter on the indexed column for quick data access.

Copied!

1	/* Préparation des données dans une session CAS */
2	PROC CAS;
3	SESSION casauto;
4	/* Création d'une table Iris simulée */
5	DATA casuser.iris_data;
6	INFILE DATALINES dsd;
7	INPUT SepalLength SepalWidth PetalLength PetalWidth Species $;
8	DATALINES;
9	5.1,3.5,1.4,0.2,setosa
10	4.9,3.0,1.4,0.2,setosa
11	4.7,3.2,1.3,0.2,setosa
12	4.6,3.1,1.5,0.2,setosa
13	5.0,3.6,1.4,0.2,setosa
14	5.4,3.9,1.7,0.4,setosa
15	4.6,3.4,1.4,0.3,setosa
16	5.0,3.4,1.5,0.2,setosa
17	4.4,2.9,1.4,0.2,setosa
18	4.9,3.1,1.5,0.1,setosa
19	7.0,3.2,4.7,1.4,versicolor
20	6.4,3.2,4.5,1.5,versicolor
21	6.9,3.1,4.9,1.5,versicolor
22	5.5,2.3,4.0,1.3,versicolor
23	6.5,2.8,4.6,1.5,versicolor
24	5.7,2.8,4.5,1.3,versicolor
25	6.3,3.3,6.0,2.5,virginica
26	5.8,2.7,5.1,1.9,virginica
27	7.1,3.0,5.9,2.1,virginica
28	6.3,2.9,5.6,1.8,virginica
29	7.6,3.0,6.6,2.1,virginica
30	6.1,2.8,5.0,2.4,virginica
31	;
32	RUN;
33
34	/* Indexer la colonne 'SepalLength' */
35	TABLE.index /
36	TABLE='iris_data',
37	casOut={
38	name='iris_indexed_basic',
39	indexVars={'SepalLength'},
40	replace=TRUE
41	};
42
43	/* Afficher les informations de la table pour confirmer l'index */
44	TABLE.columnInfo /
45	TABLE='iris_indexed_basic';
46
47	/* Récupérer des données en filtrant sur la colonne indexée */
48	TABLE.fetch /
49	TABLE={
50	name='iris_indexed_basic',
51	where='SepalLength = 5.0'
52	};
53	QUIT;

2 Code Block

table.index

Explanation :
This case extends the basic example by indexing two columns, 'SepalLength' and 'Species', and explicitly specifying the 'indexType' as 'Btree' (which is often the default but made explicit here for demonstration of options). The 'table.fetch' query uses these two columns in its WHERE clause to take advantage of composite indexes.

Copied!

1	PROC CAS;
2	SESSION casauto;
3	/* La table 'iris_data' est supposée exister de l'exemple précédent */
4
5	/* Indexer plusieurs colonnes et spécifier un type d'index Btree */
6	TABLE.index /
7	TABLE='iris_data',
8	casOut={
9	name='iris_indexed_multi',
10	indexVars={'SepalLength', 'Species'},
11	indexType='Btree',
12	replace=TRUE
13	};
14
15	/* Afficher les informations de la table pour confirmer les index */
16	TABLE.columnInfo /
17	TABLE='iris_indexed_multi';
18
19	/* Récupérer des données en filtrant sur les colonnes indexées */
20	TABLE.fetch /
21	TABLE={
22	name='iris_indexed_multi',
23	where='SepalLength > 5.0 and Species = "virginica"'
24	};
25	QUIT;

3 Code Block

table.index / table.tableDetails

Explanation :
This example demonstrates how to create an initial index on a column ('PetalLength'), then how to 'rebuild' (recreate) an index on a new table with additional columns ('PetalLength' and 'Species') using the `rebuild=TRUE` option. It also uses the `table.tableDetails` action to inspect the size of the index and data, which is crucial for CAS performance and memory management considerations.

Copied!

1	PROC CAS;
2	SESSION casauto;
3	/* La table 'iris_data' est supposée exister de l'exemple précédent */
4
5	/* Créer un index sur la colonne 'PetalLength' */
6	TABLE.index /
7	TABLE='iris_data',
8	casOut={
9	name='iris_petal_indexed',
10	indexVars={'PetalLength'},
11	replace=TRUE
12	};
13
14	/* Analyser les détails de la table pour voir la taille de l'index */
15	TABLE.tableDetails RESULT=r_details / TABLE='iris_petal_indexed';
16	PRINT r_details.TableDetails[,{'Active', 'Rows', 'IndexSize', 'DataSize'}];
17
18	/* Reconstruire l'index avec une colonne supplémentaire */
19	TABLE.index /
20	TABLE='iris_data',
21	casOut={
22	name='iris_petal_species_indexed',
23	indexVars={'PetalLength', 'Species'},
24	rebuild=TRUE,
25	replace=TRUE
26	};
27
28	/* Analyser la nouvelle table indexée */
29	TABLE.tableDetails RESULT=r_rebuild_details / TABLE='iris_petal_species_indexed';
30	PRINT r_rebuild_details.TableDetails[,{'Active', 'Rows', 'IndexSize', 'DataSize'}];
31
32	/* Afficher les informations de colonne */
33	TABLE.columnInfo / TABLE='iris_petal_species_indexed';
34	QUIT;

4 Code Block

table.index / table.dropTable

Explanation :
This example addresses error handling and index deletion in the CAS environment. It shows (by commenting out the erroneous code) how attempting to index a non-existent column would produce an error. Then, it creates a temporary index on 'PetalWidth' and demonstrates using 'table.dropTable' to delete the indexed table. Deleting the indexed table also implicitly deletes the associated index, which is good practice for freeing up resources.

Copied!

1	PROC CAS;
2	SESSION casauto;
3	/* La table 'iris_data' est supposée exister de l'exemple précédent */
4
5	/* Tentative de créer un index sur une colonne inexistante (gestion d'erreur) */
6	/* Note: CASL générera un avertissement ou une erreur, le script continuera si possible. */
7	/* La ligne suivante est commentée car elle provoquerait une erreur dans un environnement réel */
8	/*
9	table.index /
10	table='iris_data',
11	casOut={
12	name='iris_error_index',
13	indexVars={'NonExistentColumn'},
14	replace=TRUE
15	};
16	*/
17
18	/* Créer un index temporaire pour la démonstration de suppression */
19	TABLE.index /
20	TABLE='iris_data',
21	casOut={
22	name='iris_temp_indexed',
23	indexVars={'PetalWidth'},
24	replace=TRUE
25	};
26
27	/* Vérifier que l'index temporaire a été créé */
28	TABLE.columnInfo / TABLE='iris_temp_indexed';
29
30	/* Supprimer la table indexée (ce qui supprime l'index implicitement) */
31	TABLE.dropTable /
32	caslib='casuser',
33	tableName='iris_temp_indexed',
34	quiet=TRUE;
35
36	/* Tenter de récupérer des informations sur la table supprimée (devrait échouer) */
37	/* La ligne suivante est commentée car elle provoquerait une erreur dans un environnement réel */
38	/*
39	table.columnInfo / table='iris_temp_indexed';
40	*/
41	QUIT;

This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.

Retour à la liste

Expert Advice

Michael

Responsable de l'infrastructure Viya.

« Indexes in CAS are primary for Read-Heavy workloads. If you are constantly updating or appending to a table, the overhead of maintaining the index can negate the performance gains. Use indexing strategically on "Gold" or "Marts" tables that serve dashboards and end-user queries. »