longToWide - WeAreCAS

Q: What is the purpose of the longToWide action?

The longToWide action reshapes a table containing thin records (long format) into a table containing wide records (wide format).

Q: What does the 'id' parameter specify in the longToWide action?

The 'id' parameter specifies the ID variables to copy from the input table to the output casOut table.

Q: How do you specify the output table for the longToWide action?

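You use the 'casOut' parameter to specify the settings for the output table, including its name and the caslib where it should be stored. For example, in Python (SWAT) dictionary syntax: casOut={'name':'wide_table', 'caslib':'casuser'}. The CASL equivalent is casOut={name='wide_table', caslib='casuser'}.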

Q: What are the 'inputs' variables in the context of the longToWide action?

The 'inputs' parameter specifies the variables from the long-format table that will be transposed into new variables in the wide-format table.

Q: How can you compute statistics like sum, mean, or max during the reshaping process?

You can use the 'sum', 'mean', 'max', 'min', 'range', or 'nMiss' parameters to specify the variables for which you want to compute these statistics. The results are added as new variables to the output table.
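The plain-Python sketch below shows what these per-group statistics contain. It computes all six for one ID group's values and is a conceptual model only, not a call to CAS. Missing values are represented as `None` (an assumption). Every statistic except the missing count skips them, which mirrors usual SAS behavior but has not been checked against longToWide itself. The helper name `group_stats` is hypothetical.

```python
from typing import Optional, Sequence


def group_stats(values: Sequence[Optional[float]]) -> dict:
    """Summarize one ID group's values (hypothetical helper, not a CAS API).

    Missing values are modeled as None (an assumption): they are counted
    by nMiss and ignored by the other statistics.
    """
    present = [v for v in values if v is not None]
    n_miss = len(values) - len(present)
    if not present:
        # No non-missing values: every statistic except nMiss is missing.
        return {"sum": None, "mean": None, "min": None,
                "max": None, "range": None, "nMiss": n_miss}
    total = sum(present)
    lo, hi = min(present), max(present)
    return {
        "sum": total,
        "mean": total / len(present),
        "min": lo,
        "max": hi,
        "range": hi - lo,
        "nMiss": n_miss,
    }


# One product's monthly sales, with one month missing.
print(group_stats([100, None, 120]))
# {'sum': 220, 'mean': 110.0, 'min': 100, 'max': 120, 'range': 20, 'nMiss': 1}
```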

Q: What is the function of the 'groupBy' parameter?

The 'groupBy' parameter specifies the variables to use for grouping results. Each unique combination of the groupBy variables will form a single row in the output wide-format table.

Description

Reshapes a table from a long format (multiple rows per subject) to a wide format (one row per subject with multiple columns for variables). This action is useful for preparing data for analyses that require a wide data structure.

proc cas;
   datashaping.longToWide /
      table={name='<long_table>', groupBy={'<variable_for_new_columns>'}}
      id={'<variable_for_rows>'}
      inputs={'<variable_to_populate_cells>'}
      casOut={name='<wide_table>', replace=true};
run;
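The plain-Python sketch below models the reshaping in this template. It follows the role assignment given in the Settings table: ID variables define the rows, the `groupBy` variable's values define the new columns, and `inputs` fill the cells. It is a conceptual model, not how CAS implements the action.

The names `long_to_wide`, `id_keys`, `column_key`, and `value_key` are hypothetical, and the `<input>_<value>` column naming is assumed from the examples. Cells with no matching long record are left as `None`. Duplicate (ID, column) pairs raise an error, which is a choice made for this sketch rather than documented action behavior.

```python
from typing import Any, Dict, List, Sequence, Tuple


def long_to_wide(rows: Sequence[Dict[str, Any]], id_keys: Sequence[str],
                 column_key: str, value_key: str) -> List[Dict[str, Any]]:
    """Pivot long records into one wide record per unique ID combination."""
    columns: List[Any] = []  # distinct column-key values, in first-seen order
    wide: Dict[Tuple[Any, ...], Dict[str, Any]] = {}
    for row in rows:
        ident = tuple(row[k] for k in id_keys)
        col = row[column_key]
        if col not in columns:
            columns.append(col)
        # Start a new wide record (holding the ID values) on first sight of an ID.
        record = wide.setdefault(ident, dict(zip(id_keys, ident)))
        name = f"{value_key}_{col}"  # naming pattern assumed from the examples
        if name in record:
            raise ValueError(f"duplicate value for {ident} / {col}")
        record[name] = row[value_key]
    # Cells with no long record stay missing (None).
    for record in wide.values():
        for col in columns:
            record.setdefault(f"{value_key}_{col}", None)
    return list(wide.values())


long_rows = [
    {"store": "A", "product": "apple", "month": "jan", "sales": 100},
    {"store": "A", "product": "apple", "month": "feb", "sales": 110},
    {"store": "B", "product": "apple", "month": "jan", "sales": 90},
]
for r in long_to_wide(long_rows, ["store", "product"], "month", "sales"):
    print(r)
```

With two ID variables, each unique (store, product) pair becomes one row. Store B has no February record, so its `sales_feb` cell is `None`.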

Settings

Parameter	Description
table	Specifies the input long-format table. Use its `groupBy` sub-parameter to specify the classification variable whose values will define the new columns in the output wide table.
id	Specifies the variable(s) that identify the observations. Each unique combination of ID variable values will form a single row in the output wide table.
inputs	Specifies the numeric variable(s) whose values will populate the cells of the new columns in the wide table.
casOut	Specifies the output wide-format table.
attributes	Specifies attributes for the variables, such as formats and labels.
charSeparatorChar	Specifies a character to use as a separator in the names of new variables when concatenating character values.
numSeparatorNum	Specifies a character to use as a separator in the names of new variables when concatenating numeric values.
cumFreqName	Specifies the variable name for the cumulative frequency in the output table.
frequencyName	Specifies the variable name for the frequency in the output table.
groupIdName	Specifies the variable in the output table that contains the group ID.
keyModify	Specifies modifications to character key values, such as converting to uppercase (U) or compressing blanks (C).
maxPosition	Specifies the maximum value of the position variable to consider. Records with a position value greater than this are ignored.
noPrefix	When set to True, prevents prefixing the statistic name to the variable name in the output table (e.g., '_sum' instead of 'var_sum').
sum	Specifies numeric variables for which to compute the sum for each ID group.
mean	Specifies numeric variables for which to compute the mean for each ID group.
min	Specifies numeric variables for which to compute the minimum value for each ID group.
max	Specifies numeric variables for which to compute the maximum value for each ID group.
range	Specifies numeric variables for which to compute the range of values for each ID group.
nMiss	Specifies variables for which to count the number of missing values for each ID group.
orderByTable	Specifies a pre-sorted and grouped table to improve performance, typically from the groupBy or groupByInfo actions.
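Two of these settings, the separator parameters and `maxPosition`, are easiest to see on concrete values. The plain-Python sketch below is a hypothetical model based only on the descriptions in the table above:

- `column_name` joins an input variable name and a key value with a separator.
- `drop_beyond_max_position` numbers records within each ID group from 1, in input order, and discards records past the limit.

Both helper names are hypothetical. How longToWide actually orders records and builds names may differ, so compare with the output of a real run.

```python
from itertools import groupby
from typing import Any, Dict, List, Sequence


def column_name(input_name: str, key_value: Any, separator: str = "_") -> str:
    """Build a new wide-column name such as 'sales_jan' (assumed pattern)."""
    return f"{input_name}{separator}{key_value}"


def drop_beyond_max_position(rows: Sequence[Dict[str, Any]], id_key: str,
                             max_position: int) -> List[Dict[str, Any]]:
    """Keep at most max_position records per ID group, in input order.

    Assumes rows are already grouped by id_key (consecutive), as a
    pre-sorted input would be.
    """
    kept: List[Dict[str, Any]] = []
    for _, group in groupby(rows, key=lambda r: r[id_key]):
        for position, row in enumerate(group, start=1):
            if position <= max_position:
                kept.append(row)
    return kept


print(column_name("sales", "jan"))   # sales_jan
print(column_name("sales", 1, "."))  # sales.1
rows = [{"product": "apple", "sales": s} for s in (100, 110, 120)]
print(len(drop_beyond_max_position(rows, "product", 2)))  # 2
```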

Data Preparation

Create a Sample Long-Format Dataset

This SAS code creates a sample table named 'sales_long' in the active caslib. The table is in a long format, with each row representing the sales of a single product for a single month. This format is ideal for reshaping into a wide format.


/* Assumes a CAS server is available: start a session and point the */
/* mycas libref at the active caslib (skip if these already exist).   */
CAS mysess;
LIBNAME mycas CAS;

DATA mycas.sales_long;
   LENGTH product $ 10 month $ 3;
   INFILE DATALINES;
   INPUT product $ month $ sales;
   DATALINES;
apple jan 100
apple feb 110
apple mar 120
orange jan 80
orange feb 85
orange mar 95
banana jan 120
banana feb 125
banana mar 130
;
RUN;

Examples

Basic Reshaping from Long to Wide

This example converts the 'sales_long' table to a wide format. 'product' identifies the rows, the unique values of 'month' become new columns, and the 'sales' values fill the cells of these new columns.

SAS® / CAS code (awaiting community validation)


PROC CAS;
   /* Rows: product; new columns: values of month; cells: sales */
   datashaping.longToWide /
      TABLE={name='sales_long', groupBy={'month'}}
      id={'product'}
      inputs={'sales'}
      casOut={name='sales_wide', replace=true};
RUN;
QUIT;

PROC PRINT DATA=mycas.sales_wide; RUN;

Result:
The output table 'sales_wide' contains one row for each product. It has columns named 'product', 'sales_jan', 'sales_feb', and 'sales_mar' populated with the corresponding sales data.
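The plain-Python sketch below checks the expected result without a CAS session by applying the same reshaping to the nine `sales_long` records created above. It models the behavior described for this example (one row per `product`, one `sales_<month>` column per month) and is not output from the action itself. The helper name `pivot_sales` is hypothetical.

```python
from typing import Dict, List, Tuple

# The nine records from the sales_long DATA step: (product, month, sales).
SALES_LONG: List[Tuple[str, str, float]] = [
    ("apple", "jan", 100), ("apple", "feb", 110), ("apple", "mar", 120),
    ("orange", "jan", 80), ("orange", "feb", 85), ("orange", "mar", 95),
    ("banana", "jan", 120), ("banana", "feb", 125), ("banana", "mar", 130),
]


def pivot_sales(records: List[Tuple[str, str, float]]) -> Dict[str, Dict[str, float]]:
    """Return {product: {'sales_jan': ..., 'sales_feb': ..., 'sales_mar': ...}}."""
    wide: Dict[str, Dict[str, float]] = {}
    for product, month, sales in records:
        # One output row per product; each month value becomes a column.
        wide.setdefault(product, {})[f"sales_{month}"] = sales
    return wide


for product, cells in pivot_sales(SALES_LONG).items():
    print(product, cells)
```

Rows in a CAS table have no guaranteed order, so PROC PRINT may list the products in a different order than this sketch does.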

Reshaping with Custom Column Naming and Statistics

This example reshapes the long table and also generates summary statistics. The 'sum' and 'mean' parameters compute the total and average sales for each product. The 'numSeparatorNum' parameter specifies an underscore ('_') to separate the original input variable name ('sales') from the group-by variable values ('jan', 'feb', 'mar') in the new column names.

SAS® / CAS code (awaiting community validation)


PROC CAS;
   /* Same reshape as the basic example, plus per-product sum and mean of sales */
   datashaping.longToWide /
      TABLE={name='sales_long', groupBy={'month'}}
      id={'product'}
      inputs={'sales'}
      sum={'sales'}
      mean={'sales'}
      numSeparatorNum='_'
      casOut={name='sales_wide_stats', replace=true};
RUN;
QUIT;

PROC PRINT DATA=mycas.sales_wide_stats; RUN;

Result:
The output table 'sales_wide_stats' contains columns for monthly sales (e.g., 'sales_jan'), plus two additional columns: 'sales_sum' and 'sales_mean', which show the total and average sales across all months for each product.
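The two statistic columns can be checked by hand the same way. The plain-Python sketch below recomputes `sales_sum` and `sales_mean` for each product from the sample data. It mirrors the result described above rather than calling the action, and the helper name `sales_stats` is hypothetical.

```python
from typing import Dict, List, Tuple

# The nine records from the sales_long DATA step: (product, month, sales).
SALES_LONG: List[Tuple[str, str, float]] = [
    ("apple", "jan", 100), ("apple", "feb", 110), ("apple", "mar", 120),
    ("orange", "jan", 80), ("orange", "feb", 85), ("orange", "mar", 95),
    ("banana", "jan", 120), ("banana", "feb", 125), ("banana", "mar", 130),
]


def sales_stats(records: List[Tuple[str, str, float]]) -> Dict[str, Dict[str, float]]:
    """Return {product: {'sales_sum': ..., 'sales_mean': ...}} (hypothetical helper)."""
    by_product: Dict[str, List[float]] = {}
    for product, _month, sales in records:
        by_product.setdefault(product, []).append(sales)
    # Total and average across all months, one entry per product.
    return {
        product: {"sales_sum": sum(values), "sales_mean": sum(values) / len(values)}
        for product, values in by_product.items()
    }


for product, stats in sales_stats(SALES_LONG).items():
    print(f"{product}: sum={stats['sales_sum']}, mean={stats['sales_mean']:.2f}")
```

This prints `apple: sum=330, mean=110.00`, `orange: sum=260, mean=86.67`, and `banana: sum=375, mean=125.00`.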
