audio

computeFeatures

Description

Computes features for audio files that are loaded into a CAS table. This action is a fundamental step in audio processing for machine learning because it transforms raw audio signals into structured numeric features that models can consume. Features such as MFCC and FBank summarize the short-time spectral envelope of the audio, which reflects timbre and phonetic content, and they are widely used for tasks such as speech recognition and sound classification.

audio.computeFeatures { audioColumn="string", casOut={casouttable}, copyVars={"variable-name-1", ...}, fbankOptions={fbankOptions}, featureScalingMethod="NONE"|"STANDARDIZATION", frameExtractionOptions={frameExtractionOptions}, melBanksOptions={melBanksOptions}, mfccOptions={mfccOptions}, nContextFrames=integer, nOutputFrames=integer, table={castable} };
Settings
Parameter | Description
audioColumn | Specifies the name of the column in the input table that contains the audio data.
casOut | Specifies the output table in which to store the computed features. This table contains the feature vectors for each audio frame.
copyVars | Specifies a list of variables to copy from the input table to the output table, for example to preserve identifiers and other metadata.
fbankOptions | Specifies settings for FBank (filter bank) feature computation, a common representation for audio signals.
featureScalingMethod | Specifies the scaling method to apply to the computed feature vectors: NONE (no scaling) or STANDARDIZATION (mean removal and variance scaling).
frameExtractionOptions | Specifies settings for dividing the audio signal into frames, including the frame length, frame shift, and window function.
melBanksOptions | Specifies settings for the mel-frequency filter banks, which group spectral energy on the perceptually motivated mel scale.
mfccOptions | Specifies settings for MFCC (mel-frequency cepstral coefficient) feature computation, a standard feature set in speech recognition.
nContextFrames | Specifies the number of context frames to append before and after the current audio frame, providing temporal context.
nOutputFrames | Specifies the exact number of frames to include in the output table, which is useful for creating fixed-size model inputs.
table | Specifies the input table that contains the audio data to process.
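The melBanksOptions and fbankOptions settings control a mel filter bank. The following Python sketch illustrates the general technique, not the computeFeatures implementation: it places nBins triangular filters evenly on the mel scale and takes the log of each filter's energy for one frame. It assumes the HTK-style mel formula (1127 ln(1 + f/700)), an FFT size of 512, a 16 kHz sampling rate, and a 20 Hz lower edge; the function names and these defaults are hypothetical and are not taken from the SAS documentation.

```python
import math

def hz_to_mel(hz: float) -> float:
    """HTK-style mel scale (assumed): 1127 * ln(1 + f / 700)."""
    return 1127.0 * math.log(1.0 + hz / 700.0)

def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (math.exp(mel / 1127.0) - 1.0)

def mel_filterbank(n_bins, n_fft, sample_rate, low_hz=20.0, high_hz=None):
    """Return n_bins triangular filters, each a list of weights over n_fft // 2 + 1 FFT bins.

    Hypothetical helper: low_hz/high_hz play the role of a frequency range for the mel bins.
    """
    if high_hz is None:
        high_hz = sample_rate / 2.0  # default to the Nyquist frequency
    low_mel, high_mel = hz_to_mel(low_hz), hz_to_mel(high_hz)
    # n_bins + 2 equally spaced mel points: each filter spans edges[m] .. edges[m + 2]
    step = (high_mel - low_mel) / (n_bins + 1)
    edges = [low_mel + i * step for i in range(n_bins + 2)]
    # Mel value at the center frequency of every FFT bin
    bin_mels = [hz_to_mel(k * sample_rate / n_fft) for k in range(n_fft // 2 + 1)]
    filters = []
    for m in range(n_bins):
        left, center, right = edges[m], edges[m + 1], edges[m + 2]
        weights = []
        for mel in bin_mels:
            if left < mel <= center:      # rising slope of the triangle
                weights.append((mel - left) / (center - left))
            elif center < mel < right:    # falling slope of the triangle
                weights.append((right - mel) / (right - center))
            else:                         # outside this filter
                weights.append(0.0)
        filters.append(weights)
    return filters

def log_fbank(power_spectrum, filters, floor=1e-10):
    """Log filter-bank energies for one frame; the floor avoids log(0)."""
    return [
        math.log(max(sum(w * p for w, p in zip(f, power_spectrum)), floor))
        for f in filters
    ]
```

For example, `mel_filterbank(40, 512, 16000)` builds 40 filters (matching nBins=40) over 257 FFT bins, and `log_fbank` turns one frame's power spectrum into 40 log energies. Spacing the filters evenly in mel rather than in Hz makes them narrow at low frequencies and wide at high frequencies.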
Data Preparation
Data Creation: Loading Audio Files

First, we load audio files into a CAS table. The `loadAudio` action scans a directory for audio files and loads them into the specified CAS table. This table will then be used as input for feature computation.

PROC CAS;
  LOADACTIONSET 'audio';
  audio.loadAudio /
    caslib='your_caslib'
    path='path/to/your/audio_files/'
    casOut={name='my_audio_table', replace=true};
RUN;

Examples

This example computes standard MFCC (Mel-Frequency Cepstral Coefficients) features from the audio data in `my_audio_table` and stores the results in `my_features_table`. This is a common first step for many audio analysis tasks.

PROC CAS;
  audio.computeFeatures /
    TABLE={name='my_audio_table'}
    audioColumn='_audio_'
    copyVars={'_path_', '_id_'}
    casOut={name='my_features_table', replace=true};
RUN;
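MFCCs are conventionally derived from log mel filter-bank energies by applying a discrete cosine transform (DCT) and keeping the first few coefficients, which is what the nCeps setting in mfccOptions selects. The Python sketch below illustrates that step with an orthonormal DCT-II. The function name is hypothetical, and the sketch omits refinements such as cepstral liftering or replacing the first coefficient with frame energy, which a production implementation, possibly including computeFeatures, may apply.

```python
import math

def mfcc_from_log_mel(log_mel, n_ceps=13):
    """Orthonormal DCT-II of one frame's log mel energies, keeping the first n_ceps coefficients."""
    n = len(log_mel)
    if not 0 < n_ceps <= n:
        raise ValueError("n_ceps must be between 1 and the number of mel bins")
    ceps = []
    for k in range(n_ceps):
        # Orthonormal scaling: sqrt(1/n) for k = 0, sqrt(2/n) otherwise
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(x * math.cos(math.pi * k * (i + 0.5) / n) for i, x in enumerate(log_mel))
        ceps.append(scale * total)
    return ceps
```

Because the DCT decorrelates neighboring mel bins, the low-order coefficients describe the overall shape of the spectral envelope; a flat input, for instance, puts all of its energy into the first coefficient.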

This example demonstrates a more advanced feature computation. It customizes frame extraction by setting a frame length of 30 ms, a frame shift of 15 ms, and a HANNING window. It also configures the MFCC computation to produce 20 cepstral coefficients with the useEnergy option enabled, and it applies standardization to scale the features.

PROC CAS;
  audio.computeFeatures /
    TABLE={name='my_audio_table'}
    audioColumn='_audio_'
    copyVars={'_path_', '_id_'}
    frameExtractionOptions={frameLength=30, frameShift=15, windowType='HANNING'}
    mfccOptions={nCeps=20, useEnergy=true}
    featureScalingMethod='STANDARDIZATION'
    casOut={name='my_detailed_features_table', replace=true};
RUN;
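To make the framing settings concrete: at an assumed 16 kHz sampling rate, frameLength=30 and frameShift=15 correspond to 480-sample frames that start every 240 samples, so consecutive frames overlap by half. The Python sketch below frames a signal, applies a Hann window, and standardizes feature vectors per dimension. Dropping an incomplete final frame, the symmetric Hann formula, and standardizing across the frames of a single file are assumptions for illustration; the function names are hypothetical, and this page does not specify how computeFeatures handles these details.

```python
import math

def frame_signal(samples, sample_rate, frame_length_ms=30, frame_shift_ms=15):
    """Split samples into overlapping frames; an incomplete trailing frame is dropped (assumed)."""
    frame_len = sample_rate * frame_length_ms // 1000   # 480 samples at 16 kHz
    shift = sample_rate * frame_shift_ms // 1000         # 240 samples at 16 kHz
    if len(samples) < frame_len:
        return []
    n_frames = 1 + (len(samples) - frame_len) // shift
    return [samples[i * shift : i * shift + frame_len] for i in range(n_frames)]

def hanning(n):
    """Symmetric Hann window: 0.5 - 0.5 * cos(2 * pi * i / (n - 1))."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]

def apply_window(frame, window):
    """Taper the frame edges to reduce spectral leakage before the FFT."""
    return [x * w for x, w in zip(frame, window)]

def standardize(vectors):
    """Per-dimension standardization across frames: subtract the mean, divide by the std."""
    n = len(vectors)
    dims = len(vectors[0])
    means = [sum(v[d] for v in vectors) / n for d in range(dims)]
    # A zero standard deviation falls back to 1.0 to avoid division by zero
    stds = [math.sqrt(sum((v[d] - means[d]) ** 2 for v in vectors) / n) or 1.0
            for d in range(dims)]
    return [[(v[d] - means[d]) / stds[d] for d in range(dims)] for v in vectors]
```

With these settings, one second of 16 kHz audio yields 1 + (16000 - 480) // 240 = 65 frames. Standardization puts every feature dimension on a comparable scale, which typically helps gradient-based models train.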

This example computes FBank (filter bank) features instead of MFCCs. It sets useLogFbank=true and usePower=true in fbankOptions and specifies 40 mel-frequency bins. Additionally, it adds a context window of 5 frames before and 5 frames after each central frame, which can improve model performance by giving the model more temporal information.

PROC CAS;
  audio.computeFeatures /
    TABLE={name='my_audio_table'}
    audioColumn='_audio_'
    copyVars={'_path_', '_id_'}
    fbankOptions={useLogFbank=true, usePower=true}
    melBanksOptions={nBins=40}
    nContextFrames=5
    casOut={name='my_fbank_features_table', replace=true};
RUN;
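With 40 mel bins and nContextFrames=5, each output frame concatenates 5 + 1 + 5 = 11 frames, giving 11 × 40 = 440 values per frame. The Python sketch below shows context stacking and, for nOutputFrames, truncating or zero-padding a sequence to a fixed frame count. The function names are hypothetical, and replicating the first and last frames at the edges and padding with zeros are assumptions; computeFeatures may handle boundaries differently.

```python
def stack_context(frames, n_context):
    """Concatenate each frame with n_context neighbors on each side.

    Edge frames reuse the first or last frame (edge replication, an assumption).
    """
    n = len(frames)
    stacked = []
    for t in range(n):
        row = []
        for offset in range(-n_context, n_context + 1):
            idx = min(max(t + offset, 0), n - 1)  # clamp to the valid frame range
            row.extend(frames[idx])
        stacked.append(row)
    return stacked

def fix_num_frames(frames, n_output, dim):
    """Truncate or zero-pad a list of frames to exactly n_output frames of length dim."""
    if len(frames) >= n_output:
        return frames[:n_output]
    return frames + [[0.0] * dim for _ in range(n_output - len(frames))]
```

Stacking lets a frame-level model see about 11 frames of surrounding audio at once, and fixing the frame count gives every file the same input shape, at the cost of discarding audio beyond the limit or adding padding.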

FAQ

What is the purpose of the audio.computeFeatures action?
What does the 'audioColumn' parameter specify?
What is the 'casOut' parameter used for?
How can I transfer variables from the input table to the output table?
What are the 'fbankOptions' used for in the computeFeatures action?
In 'fbankOptions', what does 'energyFloor' control?
What is the function of the 'rawEnergy' option within 'fbankOptions'?
How does the 'useEnergy' option in 'fbankOptions' affect the output?
What is the difference between linear and log-filterbank values in FBank computations?
What does the 'usePower' option in 'fbankOptions' do?
What feature scaling methods are available through the 'featureScalingMethod' parameter?
What is the purpose of the 'frameExtractionOptions' parameter?
What does the 'frameLength' option within 'frameExtractionOptions' define?
What is the 'frameShift' option used for in frame extraction?
What window types can be applied during frame extraction using the 'windowType' option?
What are the 'melBanksOptions' used for?
How do you define the frequency range for mel-frequency bins?
What does the 'nBins' option in 'melBanksOptions' control?
What is the purpose of the 'mfccOptions' parameter?
What does the 'nCeps' option in 'mfccOptions' specify?
What is the 'nContextFrames' parameter?
How can I ensure the output table has a specific number of frames?
What is the 'table' parameter used for?