dataShaping

longToWide

Description

Reshapes a table from a long format (multiple rows per subject) to a wide format (one row per subject with multiple columns for variables). This action is useful for preparing data for analyses that require a wide data structure.

proc cas; datashaping.longToWide / table={name='<long_table>', groupBy={'<variable_for_new_columns>'}} id={'<variable_for_rows>'} inputs={'<variable_to_populate_cells>'} casOut={name='<wide_table>', replace=true}; run;
Settings
ParameterDescription
tableSpecifies the input long-format table. Use its `groupBy` sub-parameter to specify the classification variable whose values will define the new columns in the output wide table.
idSpecifies the variable(s) that identify the observations. Each unique combination of ID variable values will form a single row in the output wide table.
inputsSpecifies the numeric variable(s) whose values will populate the cells of the new columns in the wide table.
casOutSpecifies the output wide-format table.
attributesSpecifies attributes for the variables, such as formats and labels.
charSeparatorCharSpecifies a character to use as a separator in the names of new variables when concatenating character values.
numSeparatorNumSpecifies a character to use as a separator in the names of new variables when concatenating numeric values.
cumFreqNameSpecifies the variable name for the cumulative frequency in the output table.
frequencyNameSpecifies the variable name for the frequency in the output table.
groupIdNameSpecifies the variable in the output table that contains the group ID.
keyModifySpecifies modifications to character key values, such as converting to uppercase (U) or compressing blanks (C).
maxPositionSpecifies the maximum value of the position variable to consider. Records with a position value greater than this are ignored.
noPrefixWhen set to True, prevents prefixing the statistic name to the variable name in the output table (e.g., '_sum' instead of 'var_sum').
sumSpecifies numeric variables for which to compute the sum for each ID group.
meanSpecifies numeric variables for which to compute the mean for each ID group.
minSpecifies numeric variables for which to compute the minimum value for each ID group.
maxSpecifies numeric variables for which to compute the maximum value for each ID group.
rangeSpecifies numeric variables for which to compute the range of values for each ID group.
nMissSpecifies variables for which to count the number of missing values for each ID group.
orderByTableSpecifies a pre-sorted and grouped table to improve performance, typically from the groupBy or groupByInfo actions.
Data Preparation View data prep sheet
Create a Sample Long-Format Dataset

This SAS code creates a sample table named 'sales_long' in the active caslib. The table is in a long format, with each row representing the sales of a single product for a single month. This format is ideal for reshaping into a wide format.

Copied!
1DATA mycas.sales_long;
2 LENGTH product $ 10 month $ 3;
3 INFILE DATALINES;
4 INPUT product $ month $ sales;
5 DATALINES;
6 apple jan 100
7 apple feb 110
8 apple mar 120
9 orange jan 80
10 orange feb 85
11 orange mar 95
12 banana jan 120
13 banana feb 125
14 banana mar 130
15 ;
16 RUN;

Examples

This example converts the 'sales_long' table to a wide format. 'product' becomes the key row identifier, the unique values of 'month' become new columns, and the 'sales' values fill the cells of these new columns.

SAS® / CAS Code Code awaiting community validation
Copied!
1PROC CAS;
2 datashaping.longToWide /
3 TABLE={name='sales_long', groupBy={'month'}}
4 id={'product'}
5 inputs={'sales'}
6 casOut={name='sales_wide', replace=true};
7 RUN;
8
9 PROC PRINT DATA=mycas.sales_wide; RUN;
Result :
The output table 'sales_wide' contains one row for each product. It has columns named 'product', 'sales_jan', 'sales_feb', and 'sales_mar' populated with the corresponding sales data.

This example reshapes the long table while also generating summary statistics. The 'sum' and 'mean' parameters compute the total and average sales for each product. The 'numSeparatorNum' parameter specifies an underscore '_' to separate the original input variable name ('sales') from the group-by variable values ('jan', 'feb', 'mar') in the new column names.

SAS® / CAS Code Code awaiting community validation
Copied!
1PROC CAS;
2 datashaping.longToWide /
3 TABLE={name='sales_long', groupBy={'month'}}
4 id={'product'}
5 inputs={'sales'}
6 sum={'sales'}
7 mean={'sales'}
8 numSeparatorNum='_'
9 casOut={name='sales_wide_stats', replace=true};
10 RUN;
11
12 PROC PRINT DATA=mycas.sales_wide_stats; RUN;
Result :
The output table 'sales_wide_stats' contains columns for monthly sales (e.g., 'sales_jan'), plus two additional columns: 'sales_sum' and 'sales_mean', which show the total and average sales across all months for each product.

FAQ

What is the purpose of the longToWide action?
What does the 'id' parameter specify in the longToWide action?
How do you specify the output table for the longToWide action?
What are the 'inputs' variables in the context of the longToWide action?
How can you compute statistics like sum, mean, or max during the reshaping process?
What is the function of the 'groupBy' parameter?