
# Introduction
In any machine learning project, feature selection can make or break your model. Selecting the optimal subset of features reduces noise, prevents overfitting, enhances interpretability, and often improves accuracy. With too many irrelevant or redundant variables, models become bloated and harder to train. With too few, they risk missing critical signals.
To tackle this challenge, we tested three popular feature selection techniques (a filter, a wrapper, and an embedded method) on a real dataset, looking for the best balance of performance, interpretability, and efficiency. In this article, we walk through the experiment and reveal which approach worked best for our data.
# Why Feature Selection Matters
When building machine learning models, especially on high-dimensional datasets, not all features contribute equally. A leaner, more informative set of inputs offers several advantages:
- Reduced overfitting – Eliminating irrelevant variables helps the model generalize better to unseen data.
- Faster training – Fewer features mean shorter training times and lower computational cost.
- Better interpretability – With a compact set of predictors, it is easier to explain what drives the model's decisions.
# The Dataset
For this experiment, we used the Diabetes dataset from scikit-learn. It contains 442 patient records with 10 baseline features such as body mass index (BMI), blood pressure, several serum measurements, and age. The target variable is a quantitative measure of disease progression one year after baseline.
Let’s load the dataset and prepare it:
```python
import pandas as pd
from sklearn.datasets import load_diabetes

# Load the dataset as a DataFrame
data = load_diabetes(as_frame=True)
df = data.frame

# Separate features and target
X = df.drop(columns=['target'])
y = df['target']

print(df.head())
```
Here, `X` contains the features and `y` contains the target. We now have everything ready to apply the different feature selection methods.
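As a quick optional check, you can confirm the dimensions match the description above (442 records, 10 features):

```python
# Sanity check: 442 patient records, 10 baseline features
print(X.shape)  # (442, 10)
print(y.shape)  # (442,)
```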
# Filter Method
Filter methods rank or eliminate features based on statistical properties rather than by training a model. They are simple, fast, and give a quick way to remove obvious redundancies.
For this dataset, we checked for highly correlated features and dropped any that exceeded a correlation threshold of 0.85.
```python
import numpy as np

# Absolute pairwise correlations, upper triangle only (each pair counted once)
corr = X.corr()
threshold = 0.85
upper = corr.abs().where(np.triu(np.ones(corr.shape), k=1).astype(bool))

# Drop any feature that is highly correlated with an earlier one
to_drop = [col for col in upper.columns if any(upper[col] > threshold)]
X_filter = X.drop(columns=to_drop)

print("Remaining features after filter:", X_filter.columns.tolist())
```
Output:

```
Remaining features after filter: ['age', 'sex', 'bmi', 'bp', 's1', 's3', 's4', 's5', 's6']
```
Only one redundant feature (s2) was removed, so the dataset retained 9 of its 10 predictors. In terms of pairwise correlation, the Diabetes dataset is relatively clean.
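If you want to see which correlation triggered the drop, you can query the upper-triangle table directly. This is a small optional check that assumes the `upper` and `to_drop` variables from the snippet above are still in scope:

```python
# For each dropped column, report its strongest correlation with an earlier feature
for col in to_drop:
    partner = upper[col].idxmax()
    print(f"{col} dropped: |corr| = {upper[col].max():.2f} with {partner}")
```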
# Wrapper Method
Wrapper methods evaluate subsets of features by actually training models and checking performance. One popular technique is Recursive Feature Elimination (RFE).
RFE starts with all features, fits a model, ranks them by importance, and recursively removes the least useful ones until the desired number of features remains.
```python
from sklearn.linear_model import LinearRegression
from sklearn.feature_selection import RFE

# Recursively eliminate features until only 5 remain
lr = LinearRegression()
rfe = RFE(lr, n_features_to_select=5)
rfe.fit(X, y)

selected_rfe = X.columns[rfe.support_]
print("Selected by RFE:", selected_rfe.tolist())
```
Output:

```
Selected by RFE: ['bmi', 'bp', 's1', 's2', 's5']
```
RFE selected 5 features out of 10. The trade-off is that this approach is more computationally expensive since it requires multiple rounds of model fitting.
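If you are curious about the order in which the other features were discarded, the fitted selector exposes a `ranking_` attribute: selected features get rank 1, and larger ranks were eliminated in earlier rounds. A quick look, assuming the `rfe` object from above is still available:

```python
# Rank 1 = kept; larger ranks were eliminated earlier in the recursion
ranking = pd.Series(rfe.ranking_, index=X.columns).sort_values()
print(ranking)
```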
# Embedded Method
Embedded methods integrate feature selection into the model training process. Lasso regression (L1 regularization) is a classic example: the penalty shrinks the coefficients, and the least useful ones are driven exactly to zero, which removes those features from the model.
```python
from sklearn.linear_model import LassoCV

# LassoCV tunes the regularization strength with 5-fold cross-validation
lasso = LassoCV(cv=5, random_state=42).fit(X, y)

coef = pd.Series(lasso.coef_, index=X.columns)
selected_lasso = coef[coef != 0].index
print("Selected by Lasso:", selected_lasso.tolist())
```
Output:

```
Selected by Lasso: ['age', 'sex', 'bmi', 'bp', 's1', 's2', 's4', 's5', 's6']
```
Lasso retained 9 features and shrank one coefficient (s3) exactly to zero because it contributed little predictive power. Unlike the filter method, this decision was driven by the penalized model fit rather than by correlations alone.
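To see how strongly each surviving feature contributes, and which coefficient ended up at zero, you can inspect the `coef` series built above along with the regularization strength that LassoCV chose:

```python
# Coefficients sorted by magnitude; an exact zero means the feature was dropped
print(coef.sort_values(key=abs, ascending=False))
print("Chosen alpha:", lasso.alpha_)
```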
# Results Comparison
To evaluate each approach, we trained a Linear Regression model on the selected feature sets. We used 5-fold cross-validation and measured performance using R² score and Mean Squared Error (MSE).
```python
from sklearn.model_selection import cross_val_score, KFold
from sklearn.linear_model import LinearRegression

# Helper evaluation function: mean R2 and mean MSE across 5 folds
def evaluate_model(X, y, model):
    cv = KFold(n_splits=5, shuffle=True, random_state=42)
    r2_scores = cross_val_score(model, X, y, cv=cv, scoring="r2")
    mse_scores = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_squared_error")
    return r2_scores.mean(), -mse_scores.mean()

# 1. Filter method results
lr = LinearRegression()
r2_filter, mse_filter = evaluate_model(X_filter, y, lr)

# 2. Wrapper (RFE) results
X_rfe = X[selected_rfe]
r2_rfe, mse_rfe = evaluate_model(X_rfe, y, lr)

# 3. Embedded (Lasso) results
X_lasso = X[selected_lasso]
r2_lasso, mse_lasso = evaluate_model(X_lasso, y, lr)

# Print results
print("=== Results Comparison ===")
print(f"Filter Method -> R2: {r2_filter:.4f}, MSE: {mse_filter:.2f}, Features: {X_filter.shape[1]}")
print(f"Wrapper (RFE) -> R2: {r2_rfe:.4f}, MSE: {mse_rfe:.2f}, Features: {X_rfe.shape[1]}")
print(f"Embedded (Lasso)-> R2: {r2_lasso:.4f}, MSE: {mse_lasso:.2f}, Features: {X_lasso.shape[1]}")
```
Output:

```
=== Results Comparison ===
Filter Method -> R2: 0.4776, MSE: 3021.77, Features: 9
Wrapper (RFE) -> R2: 0.4657, MSE: 3087.79, Features: 5
Embedded (Lasso)-> R2: 0.4818, MSE: 2996.21, Features: 9
```
The filter method removed only one redundant feature and gave a solid baseline. The wrapper (RFE) cut the feature set in half at a small cost in accuracy. The embedded method (Lasso) kept 9 features and delivered the best R² and the lowest MSE. Overall, Lasso offered the best balance of accuracy, efficiency, and interpretability.
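If you prefer a single side-by-side view, the metrics computed above can be collected into one table. This is just a convenience step that reuses the variables from the evaluation code:

```python
# Gather the cross-validated metrics into one DataFrame for easier comparison
results = pd.DataFrame({
    "Method": ["Filter (correlation)", "Wrapper (RFE)", "Embedded (Lasso)"],
    "Features": [X_filter.shape[1], X_rfe.shape[1], X_lasso.shape[1]],
    "R2": [r2_filter, r2_rfe, r2_lasso],
    "MSE": [mse_filter, mse_rfe, mse_lasso],
})
print(results.sort_values("R2", ascending=False))
```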
# Conclusion
Feature selection is not simply a preprocessing step but a strategic decision that shapes the overall success of a machine learning pipeline. Our experiment reinforced that while quick filters and more computationally demanding wrappers each have their place, embedded methods like Lasso often hit the sweet spot.
On the Diabetes dataset, Lasso regularization emerged as the clear winner. It helped us build a faster, more accurate, and more interpretable model without the heavy computation of wrapper methods or the oversimplification of filters.
For practitioners, the takeaway is this: don’t rely blindly on a single method. Start with quick filters to prune obvious redundancies, reach for wrappers when you can afford a more thorough, model-driven search, and always consider embedded methods like Lasso for a practical balance.
Jayita Gulati is a machine learning enthusiast and technical writer driven by her passion for building machine learning models. She holds a Master’s degree in Computer Science from the University of Liverpool.