Subsetting in SAS: Vertical and Horizontal Techniques Using IF, WHERE, KEEP, and DROP

When working with large datasets in SAS, it's often necessary to reduce the data to just what you need. This is where subsetting comes in. SAS allows you to subset data both vertically (columns) and horizontally (rows) using simple but powerful statements like IF, WHERE, KEEP, and DROP.

In this blog post, we'll explore how to perform vertical and horizontal subsetting in SAS, with practical examples to help you understand how and when to use each method.

🔍 What is Subsetting in SAS?

Subsetting means selecting a portion of your dataset. It can be of two types:

  • Horizontal Subsetting: Selecting specific rows (observations).
  • Vertical Subsetting: Selecting specific columns (variables).

By using subsetting, you can improve program efficiency and focus your analysis on relevant data.

📌 Horizontal Subsetting in SAS (Rows)

Horizontal subsetting is used when you want to filter rows based on conditions. SAS provides two main tools for this:

1. Using the IF Statement

The IF statement is placed inside a DATA step and processes after the data is read into the program.

data work.high_sales;
set work.sales; if amount > 1000; run;

Use Case: When you need to subset after reading the dataset.


2. Using the WHERE Statement

The WHERE statement can be used in both DATA steps and PROC steps, and it filters before the data is read.

data work.high_sales;
set work.sales; where amount > 1000; run;

Use Case: More efficient for large datasets, especially when reading from external sources.

🧠 Key Difference: IF vs WHERE

FeatureIF StatementWHERE Statement
LocationDATA step onlyDATA & PROC steps
EvaluationAfter reading dataBefore reading data
Expression TypeCan use any SAS expressionLimited to dataset variables

📊 Vertical Subsetting in SAS (Columns)

Vertical subsetting is used when you want to select or exclude variables (columns). This is done using the KEEP and DROP statements or options.

1. Using KEEP

The KEEP statement or option specifies which variables to retain in the output dataset.

data work.customer_subset;
set work.customers(keep=customer_id name email); run;

Or:

data work.customer_subset;
set work.customers; keep customer_id name email; run;

2. Using DROP

The DROP statement or option specifies which variables to exclude from the output dataset.

data work.customer_subset;
set work.customers(drop=address phone); run;

Or:

data work.customer_subset;
set work.customers; drop address phone; run;

🧠 Tip: Use KEEP and DROP as Options for Efficiency

Using keep= or drop= as dataset options (inside parentheses after set) can enhance performance by reading only the necessary columns.


🔄 Combining IF, WHERE, KEEP, and DROP

You can combine these statements for efficient, focused data processing:

data work.filtered_customers;
set work.customers(keep=customer_id age gender); where age >= 30; run;

Or:

data work.filtered_customers;
set work.customers; if age >= 30; drop address phone; run;

✅ Best Practices for Subsetting in SAS

  • Use WHERE for faster filtering during data input.
  • Use KEEP/DROP as dataset options to limit variables early.
  • Combine vertical and horizontal subsetting to optimize performance.
  • Avoid unnecessary variables in your output dataset to reduce file size and memory usage.


🧾 Conclusion

Subsetting data in SAS using IF, WHERE, KEEP, and DROP is a fundamental skill for any data professional. By mastering both horizontal (row) and vertical (column) subsetting techniques, you can write cleaner, faster, and more efficient SAS programs.

Whether you're working with millions of records or just prepping data for analysis, these tools give you the control to get exactly the data you need.

    Post a Comment

    0 Comments