SVMACHINE Procedure

The SVMACHINE procedure trains support vector machine (SVM) models. An SVM separates classes with a hyperplane chosen to maximize the margin to the closest data points (the support vectors). Kernel functions (linear, polynomial, RBF, sigmoid) let the model capture non-linear relationships in the data. The procedure runs on the CAS server, which enables distributed, parallel processing of large data volumes. It is particularly effective for binary classification problems and can be adapted to regression problems.
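
For reference, the margin and kernel ideas above are usually written as follows in textbooks. This is the generic SVM formulation, not a description of how PROC SVMACHINE parameterizes its kernels; the symbols \gamma, r, and d are generic kernel parameters assumed here for illustration.

```latex
% Decision function: a new point x is classified by the sign of a kernel
% expansion over the support vectors x_i with labels y_i \in \{-1, +1\}.
\[
  f(x) = \operatorname{sign}\Big( \sum_{i \in \mathrm{SV}} \alpha_i \, y_i \, K(x_i, x) + b \Big)
\]
% Common kernel choices
\begin{align*}
  K_{\text{linear}}(u, v)  &= u^{\top} v \\
  K_{\text{poly}}(u, v)    &= (\gamma \, u^{\top} v + r)^{d} \\
  K_{\text{RBF}}(u, v)     &= \exp\!\big(-\gamma \, \lVert u - v \rVert^{2}\big) \\
  K_{\text{sigmoid}}(u, v) &= \tanh(\gamma \, u^{\top} v + r)
\end{align*}
```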

Data Analysis

Examples use generated data (datalines) or SASHELP data, loaded into the CAS session.

1 Code Block

PROC SVMACHINE Data

Explanation :
This example illustrates a simple binary classification using the SVMACHINE procedure with a linear kernel. A subset of the SASHELP.IRIS dataset is created in CAS, keeping only two species. The 'target' variable is binarized (0 or 1). The SVMACHINE procedure is then executed to train a model, and fit statistics are displayed.

/* Assumes an active CAS session; assigns librefs for existing caslibs */
caslib _all_ assign;
/* Libref used by the examples; points at the CASUSER caslib */
libname mycas cas caslib=casuser;

/* Keep two species and encode them as a 0/1 target */
data mycas.iris_subset;
   set sashelp.iris;
   where species in ('Setosa', 'Versicolor');
   if species = 'Setosa' then target = 0;
   else target = 1;
   drop species;
run;

proc svmachine data=mycas.iris_subset;
   input sepalwidth sepallength petalwidth petallength / level=interval;
   target target;
   kernel linear;
   ods output FitStatistics=FitStat;
run;

proc print data=FitStat;
   title 'Fit statistics for the linear SVM model';
run;

2 Code Block

PROC SVMACHINE Data

Explanation :
This example uses a simulated two-dimensional dataset with a non-linear decision boundary. The SVMACHINE procedure is configured with an RBF (Radial Basis Function) kernel and a cost parameter 'C' of 10. The 'C' parameter controls the penalty for misclassified data points, influencing the margin width and model complexity.
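
The role of C is easiest to see in the standard soft-margin objective, sketched here in generic textbook notation (labels y_i in {-1, +1}, slack variables \xi_i, and a feature map \phi induced by the kernel). The exact scaling used inside PROC SVMACHINE is not stated in this article and may differ.

```latex
\begin{align*}
  \min_{w,\, b,\, \xi} \quad & \tfrac{1}{2} \lVert w \rVert^{2} + C \sum_{i=1}^{n} \xi_i \\
  \text{subject to} \quad & y_i \big( w^{\top} \phi(x_i) + b \big) \ge 1 - \xi_i,
      \qquad \xi_i \ge 0, \quad i = 1, \dots, n
\end{align*}
```

In this formulation the margin width is 2 / ||w||. A large C makes slack expensive, so the optimizer accepts a narrower margin to reduce training violations; a small C tolerates more violations in exchange for a wider margin.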

/* Assumes an active CAS session; assigns librefs for existing caslibs */
caslib _all_ assign;
libname mycas cas caslib=casuser;

/* Simulate 100 points; class 0 lies inside a circle centered at (5, 5) */
data mycas.simulated_data;
   call streaminit(123);
   do i = 1 to 100;
      x1 = rand('Uniform') * 10;
      x2 = rand('Uniform') * 10;
      if (x1 - 5)**2 + (x2 - 5)**2 < 10 then target = 0;
      else target = 1;
      output;
   end;
   drop i;
run;

/* C= is the cost (penalty) parameter, set on the PROC statement */
proc svmachine data=mycas.simulated_data c=10;
   input x1 x2 / level=interval;
   target target;
   kernel rbf;
   ods output FitStatistics=FitStat;
run;

proc print data=FitStat;
   title 'Fit statistics for the RBF SVM model';
run;
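
To make the RBF kernel used above more concrete, the following DATA step sketch evaluates K(u, v) = exp(-gamma * ||u - v||**2) by hand for a few point pairs. The table name work.rbf_demo, the variable names, the points, and gamma_val = 0.5 are all illustrative assumptions; they are not defaults of PROC SVMACHINE, whose kernel parameter may be scaled differently.

```sas
/* Sketch: evaluate an RBF kernel by hand for a few point pairs.          */
/* gamma_val and the points are illustrative, not PROC SVMACHINE defaults. */
data work.rbf_demo;
   gamma_val = 0.5;
   input u1 u2 v1 v2;
   dist2 = (u1 - v1)**2 + (u2 - v2)**2;   /* squared Euclidean distance   */
   k_rbf = exp(-gamma_val * dist2);       /* 1 when u = v, decays toward 0 */
   datalines;
5 5 5 5
5 5 6 5
5 5 9 9
;
run;

proc print data=work.rbf_demo noobs;
   var u1 u2 v1 v2 dist2 k_rbf;
run;
```

Identical points give a kernel value of 1, and the value shrinks quickly as the points move apart, which is what lets an RBF model draw a closed, circular boundary like the one in the simulated data.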

3 Code Block

PROC SVMACHINE Data

Explanation :
This example demonstrates the use of a polynomial kernel of degree 2 to capture more complex relationships in the data. It also incorporates a data partitioning step into training (70%) and validation (30%) sets to evaluate model generalization. The output includes fit statistics and a confusion matrix.

/* Assumes an active CAS session; assigns librefs for existing caslibs */
caslib _all_ assign;
libname mycas cas caslib=casuser;

/* Simulate 200 points; class 0 is the right half of a disk of radius 2 */
data mycas.complex_data;
   call streaminit(456);
   do i = 1 to 200;
      x1 = rand('Normal', 0, 2);
      x2 = rand('Normal', 0, 2);
      if x1**2 + x2**2 < 4 and x1 > 0 then target = 0;
      else target = 1;
      output;
   end;
   drop i;
run;

proc svmachine data=mycas.complex_data c=1;
   input x1 x2 / level=interval;
   target target;
   kernel polynom / degree=2;   /* Polynomial kernel of degree 2 */
   /* Hold out 30% for validation; the remaining 70% is used for training */
   partition fraction(validate=0.3 seed=789);
   ods output FitStatistics=FitStat
              Misclassification=MisClass;
run;

proc print data=FitStat;
   title 'Fit statistics for the polynomial SVM model';
run;

proc print data=MisClass;
   title 'Confusion matrix for the polynomial SVM model';
run;
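
Before relying on a 70/30 split, it can help to check how many observations of each class land in each part. The sketch below builds an explicit split flag by hand and cross-tabulates it against the target. The names work.complex_split and is_train are hypothetical, and the flag is independent of whatever sampling the procedure's own PARTITION statement performs.

```sas
/* Sketch: explicit 70/30 split flag on simulated data (names are illustrative) */
data work.complex_split;
   call streaminit(789);
   do i = 1 to 200;
      x1 = rand('Normal', 0, 2);
      x2 = rand('Normal', 0, 2);
      if x1**2 + x2**2 < 4 and x1 > 0 then target = 0;
      else target = 1;
      /* is_train = 1 for roughly 70% of rows, 0 for the rest */
      is_train = (rand('Uniform') < 0.7);
      output;
   end;
   drop i;
run;

/* Counts per split and class; very small cells signal an unreliable validation set */
proc freq data=work.complex_split;
   tables is_train * target / norow nocol nopercent;
run;
```

Because the flag is drawn at random per row, the split is only approximately 70/30, and a rare class can end up with very few validation rows in a table this small.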

4 Code Block

PROC CAS / SVMACHINE Data

Explanation :
This advanced example shows how to train and score an SVM model from the CASL scripting language (CAS Language) within PROC CAS. DATA steps first load a small training table and a table of new observations into CAS. The model is trained with the `svm.svmTrain` action and saved as an analytic store through `savestate`. The `astore.score` action then applies the saved model to the new data, and the predictions are displayed.

/* Assumes an active CAS session; assigns librefs for existing caslibs, including CASUSER */
caslib _all_ assign;

/* Training data, written to the CASUSER caslib (DATA steps run outside PROC CAS) */
data casuser.score_data;
   input x1 x2 target;
   datalines;
1 1 0
9 9 1
5 5 0
2 8 1
7 3 0
;
run;

/* New observations to score */
data casuser.new_data;
   input x1 x2;
   datalines;
1.5 1.5
8.5 8.5
4.8 5.2
3.0 7.0
6.5 2.5
;
run;

proc cas;
   loadactionset 'svm';
   loadactionset 'astore';

   /* Train a simple linear model and save it as an analytic store */
   svm.svmTrain /
      table={name='score_data', caslib='casuser'},
      inputs={'x1', 'x2'},
      target='target',
      kernel='linear',
      savestate={name='my_svm_model', replace=true};

   /* Score the new data with the saved model */
   astore.score /
      table={name='new_data', caslib='casuser'},
      rstore={name='my_svm_model'},
      out={name='predictions', caslib='casuser', replace=true},
      copyVars={'x1', 'x2'};

   /* Display the predictions */
   table.fetch / table={name='predictions', caslib='casuser'};
quit;

This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.

Expert Advice

Michael

Viya Infrastructure Manager.

« In SAS Viya, PROC SVMACHINE implements the Support Vector Machine (SVM) algorithm, a gold standard for high-accuracy binary classification. By finding the optimal hyperplane that maximizes the margin between classes, SVMs are uniquely robust against overfitting, even in high-dimensional feature spaces. »