When working with large datasets in SAS, it's often necessary to reduce the data to just what you need. This is where subsetting comes in. SAS allows you to subset data both vertically (columns) and horizontally (rows) using simple but powerful statements like IF
, WHERE
, KEEP
, and DROP
.
In this blog post, we'll explore how to perform vertical and horizontal subsetting in SAS, with practical examples to help you understand how and when to use each method.
🔍 What is Subsetting in SAS?
Subsetting means selecting a portion of your dataset. It can be of two types:
- Horizontal Subsetting: Selecting specific rows (observations).
- Vertical Subsetting: Selecting specific columns (variables).
By using subsetting, you can improve program efficiency and focus your analysis on relevant data.
📌 Horizontal Subsetting in SAS (Rows)
Horizontal subsetting is used when you want to filter rows based on conditions. SAS provides two main tools for this:
1. Using the IF
Statement
The IF
statement is placed inside a DATA step and processes after the data is read into the program.
✅ Use Case: When you need to subset after reading the dataset.
2. Using the WHERE
Statement
The WHERE
statement can be used in both DATA steps and PROC steps, and it filters before the data is read.
✅ Use Case: More efficient for large datasets, especially when reading from external sources.
🧠 Key Difference: IF vs WHERE
Feature | IF Statement | WHERE Statement |
---|---|---|
Location | DATA step only | DATA & PROC steps |
Evaluation | After reading data | Before reading data |
Expression Type | Can use any SAS expression | Limited to dataset variables |
📊 Vertical Subsetting in SAS (Columns)
Vertical subsetting is used when you want to select or exclude variables (columns). This is done using the KEEP
and DROP
statements or options.
1. Using KEEP
The KEEP
statement or option specifies which variables to retain in the output dataset.
Or:
2. Using DROP
The DROP
statement or option specifies which variables to exclude from the output dataset.
Or:
🧠 Tip: Use KEEP and DROP as Options for Efficiency
Using keep=
or drop=
as dataset options (inside parentheses after set
) can enhance performance by reading only the necessary columns.
🔄 Combining IF, WHERE, KEEP, and DROP
You can combine these statements for efficient, focused data processing:
Or:
✅ Best Practices for Subsetting in SAS
- Use
WHERE
for faster filtering during data input. - Use
KEEP
/DROP
as dataset options to limit variables early. - Combine vertical and horizontal subsetting to optimize performance.
- Avoid unnecessary variables in your output dataset to reduce file size and memory usage.
🧾 Conclusion
Subsetting data in SAS using IF
, WHERE
, KEEP
, and DROP
is a fundamental skill for any data professional. By mastering both horizontal (row) and vertical (column) subsetting techniques, you can write cleaner, faster, and more efficient SAS programs.
Whether you're working with millions of records or just prepping data for analysis, these tools give you the control to get exactly the data you need.
0 Comments
If you have any doubt please comment or write us to - datahark12@gmail.com