Creating Variables/Columns in a SAS Dataset

In the world of data manipulation and statistical programming, SAS (Statistical Analysis System) is a powerhouse. One of the foundational skills when working with SAS datasets is the ability to create new variables or columns. Whether you're preparing data for analysis or creating flags and indicators, understanding how to add new variables efficiently is crucial.

In this post, we’ll walk through several ways to create new variables in a SAS dataset using the DATA step and some handy functions.


Why Create New Variables?

Creating new variables allows you to:

  • Transform raw data into usable metrics (e.g., calculate age from a date of birth).
  • Create indicators or flags for filtering or analysis.
  • Aggregate or categorize continuous data.
  • Simplify complex logic into a readable format.


Method 1: Using the DATA Step

The most common way to create a new variable is within a DATA step. Here’s a simple example:

data work.sales;
set work.orders;
total_price = unit_price * quantity;
discount_flag = (discount > 0);
run;

Explanation:

  • total_price is a new numeric variable calculated by multiplying unit_price and quantity.
  • discount_flag is a new variable that will be 1 if there is a discount and 0 otherwise.


Method 2: Conditional Logic with IF-THEN/ELSE

Sometimes you need more control over how a variable is created:

data work.sales;
set work.orders;
if quantity > 10 then bulk_order = 'Yes';
else bulk_order = 'No';
run;

Note:

  • bulk_order is a character variable. In SAS, always ensure character variables are assigned character values (quoted strings), or define their length beforehand using a LENGTH statement.

Length bulk_order $3;

Method 3: Using SAS Functions

SAS has a wide array of built-in functions for working with dates, strings, math, and more.

data work.customer_info;
set work.customers;
age = int((today() - birth_date) / 365.25);
full_name = catx(' ', first_name, last_name);
run;

Functions used:

  • today() returns the current date.
  • catx() concatenates strings with a delimiter.


Method 4: Creating Dummy Variables or Indicators

For modeling or filtering, you often need binary variables:

data work.customers;
set work.customers;
is_senior = (age >= 65);
run;

This creates a variable is_senior that is 1 if age is 65 or above, and 0 otherwise.


Best Practices

  • Name wisely: Use meaningful, concise variable names.
  • Use LENGTH for character variables: Define the length explicitly to prevent truncation or memory waste.
  • Check your data: Use PROC PRINT or PROC CONTENTS to verify your new columns.

proc contents data=work.sales; run;
proc print data=work.sales (obs=10); run;

Conclusion

Creating new variables in SAS is straightforward once you understand the DATA step and basic functions. Whether you're computing totals, creating indicators, or manipulating text, SAS provides powerful tools to transform and enrich your data.

Mastering this foundational skill will significantly improve your data preparation and analysis capabilities in SAS.