When working with large datasets in SAS, it's often necessary to reduce the data to just what you need. This is where subsetting comes in. SAS allows you to subset data both vertically (columns) and horizontally (rows) using simple but powerful statements like IF, WHERE, KEEP, and DROP.
In this blog post, we'll explore how to perform vertical and horizontal subsetting in SAS, with practical examples to help you understand how and when to use each method.
🔍 What is Subsetting in SAS?
Subsetting means selecting a portion of your dataset. It can be of two types:
- Horizontal Subsetting: Selecting specific rows (observations).
- Vertical Subsetting: Selecting specific columns (variables).
By using subsetting, you can improve program efficiency and focus your analysis on relevant data.
📌 Horizontal Subsetting in SAS (Rows)
Horizontal subsetting is used when you want to filter rows based on conditions. SAS provides two main tools for this:
1. Using the IF Statement
The IF statement is placed inside a DATA step and processes after the data is read into the program.
✅ Use Case: When you need to subset after reading the dataset.
2. Using the WHERE Statement
The WHERE statement can be used in both DATA steps and PROC steps, and it filters before the data is read.
✅ Use Case: More efficient for large datasets, especially when reading from external sources.
🧠 Key Difference: IF vs WHERE
| Feature | IF Statement | WHERE Statement |
|---|---|---|
| Location | DATA step only | DATA & PROC steps |
| Evaluation | After reading data | Before reading data |
| Expression Type | Can use any SAS expression | Limited to dataset variables |
📊 Vertical Subsetting in SAS (Columns)
Vertical subsetting is used when you want to select or exclude variables (columns). This is done using the KEEP and DROP statements or options.
1. Using KEEP
The KEEP statement or option specifies which variables to retain in the output dataset.
Or:
2. Using DROP
The DROP statement or option specifies which variables to exclude from the output dataset.
Or:
🧠 Tip: Use KEEP and DROP as Options for Efficiency
Using keep= or drop= as dataset options (inside parentheses after set) can enhance performance by reading only the necessary columns.
🔄 Combining IF, WHERE, KEEP, and DROP
You can combine these statements for efficient, focused data processing:
Or:
✅ Best Practices for Subsetting in SAS
- Use
WHEREfor faster filtering during data input. - Use
KEEP/DROPas dataset options to limit variables early. - Combine vertical and horizontal subsetting to optimize performance.
- Avoid unnecessary variables in your output dataset to reduce file size and memory usage.
🧾 Conclusion
Subsetting data in SAS using IF, WHERE, KEEP, and DROP is a fundamental skill for any data professional. By mastering both horizontal (row) and vertical (column) subsetting techniques, you can write cleaner, faster, and more efficient SAS programs.
Whether you're working with millions of records or just prepping data for analysis, these tools give you the control to get exactly the data you need.
0 Comments
If you have any doubt please comment or write us to - datahark12@gmail.com