computeFeatures - WeAreCAS

Q: What is the purpose of the audio.computeFeatures action?

The audio.computeFeatures action computes various features for audio files that have been loaded into a CAS table.

Q: What does the 'audioColumn' parameter specify?

The 'audioColumn' parameter specifies the name of the column in the input table that contains the audio data.

Q: What is the 'casOut' parameter used for?

The 'casOut' parameter is required and specifies the settings for the output table where the computed features will be stored.

Q: How can I transfer variables from the input table to the output table?

Use the 'copyVars' parameter to specify a list of variable names to transfer from the input table to the output table.

Q: What are the 'fbankOptions' used for in the computeFeatures action?

The 'fbankOptions' parameter specifies the settings to determine how to perform the Filter Bank (FBank) feature computations.

Q: In 'fbankOptions', what does 'energyFloor' control?

The 'energyFloor' option specifies the linear floor on energy (absolute, not relative) for the FBank feature computations. The default value is 0.

Q: What is the function of the 'rawEnergy' option within 'fbankOptions'?

When set to True, the 'rawEnergy' option specifies that energy should be computed before preemphasis and windowing. The default is True.

Q: How does the 'useEnergy' option in 'fbankOptions' affect the output?

If 'useEnergy' is set to True, an extra dimension containing the computed energy is appended to each FBank feature frame. The default is False.

Q: What is the difference between linear and log-filterbank values in FBank computations?

The 'useLogFbank' option controls this. When set to True (the default), the output contains log-filterbank values; otherwise, the values are linear.

Q: What does the 'usePower' option in 'fbankOptions' do?

When 'usePower' is set to True (the default), it specifies that power should be used in the FBank feature computations; otherwise, the magnitude is used.
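To make the FBank options above concrete, here is a minimal sketch that sets each of them explicitly in a single call. The table names and the `_audio_` column are placeholders borrowed from the examples further down this page, the `energyFloor` value is purely illustrative, and the code has not been run against a CAS server.

```sas
proc cas;
   audio.computeFeatures /
      table={name='my_audio_table'}      /* placeholder input table */
      audioColumn='_audio_'              /* assumed audio column name */
      fbankOptions={
         energyFloor=1.0,    /* illustrative value; the default is 0 */
         rawEnergy=false,    /* do not compute energy before preemphasis and windowing */
         useEnergy=true,     /* append an energy dimension to each FBank frame */
         useLogFbank=true,   /* default: log-filterbank output */
         usePower=true       /* default: use power rather than magnitude */
      }
      casOut={name='my_fbank_energy_table', replace=true};  /* hypothetical output name */
run;
```

Only `energyFloor`, `rawEnergy`, and `useEnergy` differ from the defaults listed above; the last two options are spelled out only to show where they go.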

Description

Computes features for audio files that have been loaded into a CAS table. Feature extraction is a standard first step in audio machine learning: it turns a raw waveform into a sequence of feature vectors that a model can consume. Features such as MFCC and FBank summarize the short-term spectral shape of the signal (roughly, its timbre), which is the information that tasks such as speech recognition and sound classification rely on.

audio.computeFeatures {
    audioColumn="string",
    casOut={casouttable},
    copyVars={"variable-name-1", ...},
    fbankOptions={fbankOptions},
    featureScalingMethod="NONE"|"STANDARDIZATION",
    frameExtractionOptions={frameExtractionOptions},
    melBanksOptions={melBanksOptions},
    mfccOptions={mfccOptions},
    nContextFrames=integer,
    nOutputFrames=integer,
    table={castable}
};

Settings

Parameter	Description
audioColumn	Specifies the name of the column in the input table that contains the audio data.
casOut	Specifies the output table to store the computed features. This table will contain the feature vectors for each audio frame.
copyVars	Specifies a list of variables to be transferred from the input table to the output table, preserving metadata.
fbankOptions	Specifies settings for FBank (Filter Bank) feature computations, a common representation for audio signals.
featureScalingMethod	Specifies the feature scaling method to apply to the computed feature vectors, such as standardization (mean removal and variance scaling).
frameExtractionOptions	Specifies settings for dividing the audio signal into frames, including frame length, shift, and windowing function.
melBanksOptions	Specifies settings for determining the mel-frequency banks, which are crucial for creating perceptually relevant audio features.
mfccOptions	Specifies settings for MFCC (Mel-Frequency Cepstral Coefficients) feature computations, a standard feature set in speech recognition.
nContextFrames	Specifies the number of context frames to append before and after the current audio frame, providing temporal context.
nOutputFrames	Specifies the exact number of frames to include in the output table, useful for creating fixed-size inputs for models.
table	Specifies the input table that contains the audio data to be processed.
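
As an illustration of `nOutputFrames`, the sketch below requests a fixed number of frames per audio file, the kind of setup the keyword-spotting scenario on this page calls for. The value 50 and the table names are assumptions for illustration. How the action handles clips that are shorter or longer than the requested length is not described here, so inspect the output before relying on it.

```sas
proc cas;
   audio.computeFeatures /
      table={name='my_audio_table'}      /* placeholder input table */
      audioColumn='_audio_'              /* assumed audio column name */
      nOutputFrames=50                   /* illustrative fixed frame count */
      casOut={name='my_fixed_size_features', replace=true};  /* hypothetical output name */
run;
```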

Data Preparation

Data Creation: Loading Audio Files

First, we load audio files into a CAS table. The `loadAudio` action scans a directory for audio files and loads them into the specified CAS table. This table will then be used as input for feature computation.

    PROC CAS;
        /* Load the audio action set before calling its actions */
        LOADACTIONSET 'audio';
        audio.loadAudio /
            caslib='your_caslib'                /* placeholder caslib */
            path='path/to/your/audio_files/'    /* placeholder path */
            casOut={name='my_audio_table', replace=true};
    RUN;
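
Before computing features, it can help to confirm what the loaded table actually contains, since the examples on this page assume columns named `_audio_`, `_path_`, and `_id_`. A minimal sketch using the general-purpose `table.tableInfo` and `table.columnInfo` actions, assuming the table name used in the step above:

```sas
proc cas;
   /* Row count and basic metadata for the loaded table */
   table.tableInfo / name='my_audio_table';

   /* Column names and types, to confirm the audio column name */
   table.columnInfo / table={name='my_audio_table'};
run;
```

If the column names differ from the ones assumed here, adjust `audioColumn` and `copyVars` in the calls that follow.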

Examples

Basic Feature Computation (MFCC)

This example computes standard MFCC (Mel-Frequency Cepstral Coefficients) features from the audio data in `my_audio_table` and stores the results in `my_features_table`. No feature-specific options are set, so the action's defaults apply. This is a common first step for many audio analysis tasks.

SAS® / CAS code (awaiting community validation)

    PROC CAS;
        audio.computeFeatures /
            TABLE={name='my_audio_table'}
            audioColumn='_audio_'
            copyVars={'_path_', '_id_'}    /* carry file metadata into the output */
            casOut={name='my_features_table', replace=true};
    RUN;

Detailed MFCC Computation with Custom Options

This example customizes the feature computation. Frame extraction uses a frame length of 30 ms, a frame shift of 15 ms, and a Hanning window. The MFCC computation is set to produce 20 cepstral coefficients with `useEnergy=true`, and the resulting features are scaled by standardization.

SAS® / CAS code (awaiting community validation)

    PROC CAS;
        audio.computeFeatures /
            TABLE={name='my_audio_table'}
            audioColumn='_audio_'
            copyVars={'_path_', '_id_'}
            /* 30 ms frames, shifted by 15 ms, with a Hanning window */
            frameExtractionOptions={frameLength=30, frameShift=15, windowType='HANNING'}
            mfccOptions={nCeps=20, useEnergy=true}
            featureScalingMethod='STANDARDIZATION'
            casOut={name='my_detailed_features_table', replace=true};
    RUN;
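
To get a feel for what these frame settings imply, the arithmetic below estimates how many frames a clip produces. It assumes the common framing convention in which only complete frames are kept, so a clip of T ms yields 1 + floor((T - frameLength) / frameShift) frames. The action's actual edge handling may differ, the 1000 ms clip length is a made-up example, and the sketch relies on the SAS `floor` function being callable from CASL.

```sas
proc cas;
   clipMs      = 1000;   /* hypothetical clip duration in ms */
   frameLength = 30;     /* ms, as in the example above */
   frameShift  = 15;     /* ms, as in the example above */

   /* Complete frames only: 1 + floor((1000 - 30) / 15) */
   nFrames = 1 + floor((clipMs - frameLength) / frameShift);
   print "Estimated frames: " nFrames;
run;
```

Under these assumptions a one-second clip yields 1 + floor(970 / 15) = 65 frames, which gives a rough idea of how many rows to expect per audio file.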

Computing FBank Features with Context Windows

This example computes FBank (Filter Bank) features instead of MFCCs, using 40 mel-frequency bins. The `fbankOptions` shown restate the defaults (log-filterbank values computed from power). The example also adds a context window of 5 frames before and 5 frames after each central frame, which can improve model performance by providing more temporal information.

SAS® / CAS code (awaiting community validation)

    PROC CAS;
        audio.computeFeatures /
            TABLE={name='my_audio_table'}
            audioColumn='_audio_'
            copyVars={'_path_', '_id_'}
            fbankOptions={useLogFbank=true, usePower=true}
            melBanksOptions={nBins=40}
            nContextFrames=5    /* 5 frames before and 5 after each central frame */
            casOut={name='my_fbank_features_table', replace=true};
    RUN;
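
One way to check the effect of `nContextFrames` is to look at the shape of the output table. If each output row stacks the central frame with its 5 preceding and 5 following frames, one would expect (2 × 5 + 1) × 40 = 440 feature values per row, but that layout is an assumption rather than something this page states. The sketch below simply lists the columns and fetches a few rows using the general-purpose `table.columnInfo` and `table.fetch` actions.

```sas
proc cas;
   /* List the output columns to see how many feature values each row holds */
   table.columnInfo / table={name='my_fbank_features_table'};

   /* Look at the first few rows of computed features */
   table.fetch / table={name='my_fbank_features_table'} to=5;
run;
```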

FAQ

What feature scaling methods are available through the 'featureScalingMethod' parameter?

What is the purpose of the 'frameExtractionOptions' parameter?

What does the 'frameLength' option within 'frameExtractionOptions' define?

What is the 'frameShift' option used for in frame extraction?

What window types can be applied during frame extraction using the 'windowType' option?

What are the 'melBanksOptions' used for?

How do you define the frequency range for mel-frequency bins?

What does the 'nBins' option in 'melBanksOptions' control?

What is the purpose of the 'mfccOptions' parameter?

What does the 'nCeps' option in 'mfccOptions' specify?

What is the 'nContextFrames' parameter?

How can I ensure the output table has a specific number of frames?

What is the 'table' parameter used for?

Associated Scenarios

Use Case

Standard MFCC Extraction for Call Center Transcription

A banking call center wants to automate the transcription of customer support calls to analyze sentiment and intent. The speech-to-text model requires standard Mel-Frequency Cep...


Use Case

High-Volume FBank Computation with Context for Noise Monitoring

A smart city project deploys thousands of sensors to monitor urban noise pollution. The objective is to classify sound events (sirens, drilling, traffic) using a Deep Neural Net...


Use Case

Fixed-Size Input Generation for Edge AI Keyword Spotting

Deploying a lightweight 'Wake Word' detection model (e.g., 'Hello SAS') on edge devices. The Convolutional Neural Network (CNN) requires a strictly fixed input size of 50 frames...


Related Actions

audio

loadAudio

The loadAudio action loads audio files from a specified path and caslib into ...
