Computes features for audio files loaded into a CAS table. This action is a key step in preparing audio for machine learning because it transforms raw audio signals into structured feature vectors that models can consume. Features such as MFCCs and FBank features summarize the short-term spectral envelope of the signal, which reflects characteristics such as timbre and phonetic content. They are widely used for tasks such as speech recognition and sound classification.
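As general background, the formulas below are the standard textbook definitions. They are not necessarily the exact formulas this action implements. The mel scale commonly used to place filter-bank bins maps a frequency $f$ in Hz to

$$
\mathrm{mel}(f) = 2595 \log_{10}\!\left(1 + \frac{f}{700}\right)
$$

FBank features are the logarithms of the filter-bank energies $E_m$, $m = 1, \dots, M$. MFCCs are obtained by applying a discrete cosine transform to those log energies:

$$
c_n = \sum_{m=1}^{M} \log E_m \, \cos\!\left(\frac{\pi n \left(m - \tfrac{1}{2}\right)}{M}\right), \qquad n = 0, \dots, N_{\text{ceps}} - 1
$$

Keeping only the first $N_{\text{ceps}}$ coefficients retains the smooth spectral envelope and discards fine harmonic structure. This is why MFCCs carry little pitch information. Implementations often add refinements such as liftering or replacing $c_0$ with a log-energy term.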
| Parameter | Description |
|---|---|
| audioColumn | Specifies the name of the column in the input table that contains the audio data. |
| casOut | Specifies the output table to store the computed features. This table will contain the feature vectors for each audio frame. |
| copyVars | Specifies a list of variables to be transferred from the input table to the output table, preserving metadata. |
| fbankOptions | Specifies settings for FBank (Filter Bank) feature computations, a common representation for audio signals. |
| featureScalingMethod | Specifies the feature scaling method to apply to the computed feature vectors, such as standardization (mean removal and variance scaling). |
| frameExtractionOptions | Specifies settings for dividing the audio signal into frames, including frame length, shift, and windowing function. |
| melBanksOptions | Specifies settings for determining the mel-frequency banks, which are crucial for creating perceptually relevant audio features. |
| mfccOptions | Specifies settings for MFCC (Mel-Frequency Cepstral Coefficients) feature computations, a standard feature set in speech recognition. |
| nContextFrames | Specifies the number of context frames to append before and after the current audio frame, providing temporal context. |
| nOutputFrames | Specifies the exact number of frames to include in the output table, useful for creating fixed-size inputs for models. |
| table | Specifies the input table that contains the audio data to be processed. |
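For `featureScalingMethod='STANDARDIZATION'`, the usual definition rescales each feature dimension to zero mean and unit variance. The formula below is a sketch that assumes the statistics are computed per dimension $j$ over the $T$ frames being scaled. Whether this action computes them per audio file or over the whole table is not stated here. In the formula, $x_{t,j}$ is dimension $j$ of frame $t$:

$$
\tilde{x}_{t,j} = \frac{x_{t,j} - \mu_j}{\sigma_j}, \qquad
\mu_j = \frac{1}{T}\sum_{t=1}^{T} x_{t,j}, \qquad
\sigma_j = \sqrt{\frac{1}{T}\sum_{t=1}^{T} \left(x_{t,j} - \mu_j\right)^2}
$$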
First, we load audio files into a CAS table. The `loadAudio` action reads the audio files identified by `path`, which is interpreted relative to the caslib named in `caslib`. It stores them in the table specified by `casOut`, and this table then serves as the input for feature computation.
```sas
PROC CAS;
   LOADACTIONSET 'audio';
   audio.loadAudio /
      caslib='your_caslib'
      path='path/to/your/audio_files/'
      casOut={name='my_audio_table', replace=true};
RUN;
```
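Before computing features, you may want to check which columns `loadAudio` actually created. The examples that follow assume an audio column named `_audio_` and metadata columns `_path_` and `_id_`. The following is a minimal check using the standard `table` action set. It restricts `fetch` to `_path_` so that the binary audio data is not printed:

```sas
PROC CAS;
   /* List the names and types of all columns in the loaded audio table */
   table.columnInfo / table={name='my_audio_table'};
   /* Show the first 5 rows, printing only the file path column */
   table.fetch / table={name='my_audio_table'} fetchVars={'_path_'} to=5;
RUN;
```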
This example computes standard MFCC (Mel-Frequency Cepstral Coefficients) features from the audio data in `my_audio_table` and stores the results in `my_features_table`. This is a common first step for many audio analysis tasks.
```sas
PROC CAS;
   audio.computeFeatures /
      table={name='my_audio_table'}
      audioColumn='_audio_'
      copyVars={'_path_', '_id_'}
      casOut={name='my_features_table', replace=true};
RUN;
```
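Many models, such as convolutional networks, expect every sample to have the same number of frames. The following sketch adds `nOutputFrames` to the basic call for that purpose. The value 500 and the output table name `my_fixed_length_features` are illustrative choices, not recommendations. How the action pads or truncates files whose length differs from the requested frame count is not described here:

```sas
PROC CAS;
   audio.computeFeatures /
      table={name='my_audio_table'}
      audioColumn='_audio_'
      copyVars={'_path_', '_id_'}
      nOutputFrames=500   /* illustrative: fixed number of frames per audio file */
      casOut={name='my_fixed_length_features', replace=true};  /* hypothetical table name */
RUN;
```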
This example demonstrates a more advanced feature computation. It customizes the frame extraction by setting a frame length of 30ms and a frame shift of 15ms. It also adjusts the MFCC computation to generate 20 cepstral coefficients and applies standardization to scale the features.
```sas
PROC CAS;
   audio.computeFeatures /
      table={name='my_audio_table'}
      audioColumn='_audio_'
      copyVars={'_path_', '_id_'}
      frameExtractionOptions={frameLength=30, frameShift=15, windowType='HANNING'}
      mfccOptions={nCeps=20, useEnergy=true}
      featureScalingMethod='STANDARDIZATION'
      casOut={name='my_detailed_features_table', replace=true};
RUN;
```
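These frame settings determine how many frames each file produces. The count below assumes the common convention that only frames lying entirely within the signal are kept. Under that assumption, a signal of duration $D$ with frame length $L$ and frame shift $S$ yields

$$
N_{\text{frames}} = 1 + \left\lfloor \frac{D - L}{S} \right\rfloor
$$

For one second of audio with the settings above ($L = 30$ ms, $S = 15$ ms), consecutive frames overlap by 15 ms (50%), and the count is

$$
N_{\text{frames}} = 1 + \left\lfloor \frac{1000 - 30}{15} \right\rfloor = 1 + \lfloor 64.67 \rfloor = 65
$$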
This example computes FBank (Filter Bank) features instead of MFCCs. It specifies 40 mel-frequency bins. Additionally, it adds a context window of 5 frames before and 5 frames after each central frame, which can improve model performance by providing more temporal information.
```sas
PROC CAS;
   audio.computeFeatures /
      table={name='my_audio_table'}
      audioColumn='_audio_'
      copyVars={'_path_', '_id_'}
      fbankOptions={useLogFbank=true, usePower=true}
      melBanksOptions={nBins=40}
      nContextFrames=5
      casOut={name='my_fbank_features_table', replace=true};
RUN;
```
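The context window enlarges each feature vector. The calculation below makes two assumptions. First, the context frames are concatenated with the central frame, as "append" in the `nContextFrames` description suggests. Second, each frame contributes exactly one value per mel bin. Under these assumptions, the dimensionality of each output vector in this example is

$$
d_{\text{out}} = \left(2 \cdot n_{\text{context}} + 1\right) \cdot n_{\text{bins}} = (2 \cdot 5 + 1) \cdot 40 = 440
$$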