
# Introduction
In any machine learning project, feature selection can make or break your model. Selecting the optimal subset of features reduces noise, prevents overfitting, enhances interpretability, and often improves accuracy. With too many irrelevant or redundant variables, models become bloated and harder to train. With too few, they risk missing critical signals.
To tackle this challenge, we tested three popular feature selection techniques (a filter, a wrapper, and an embedded method) on a real dataset, looking for the best balance of performance, interpretability, and efficiency. In this article, we walk through the experiment and reveal which approach worked best for our data.
# Why Feature Selection Matters
When building machine learning models, especially on high-dimensional datasets, not all features contribute equally. A leaner, more informative set of inputs offers several advantages:
- Reduced overfitting – Eliminating irrelevant variables helps the model generalize better to unseen data.
- Faster training – Fewer features mean shorter training times and lower computational cost.
- Better interpretability – With a compact set of predictors, it is easier to explain what drives the model's decisions.
# The Dataset
For this experiment, we used the Diabetes dataset from scikit-learn. It contains 442 patient records with 10 baseline features such as body mass index (BMI), blood pressure, several serum measurements, and age. The target variable is a quantitative measure of disease progression one year after baseline.
Let’s load the dataset and prepare it:
```python
import pandas as pd
from sklearn.datasets import load_diabetes

# Load the dataset as a DataFrame
data = load_diabetes(as_frame=True)
df = data.frame

# Separate features and target
X = df.drop(columns=['target'])
y = df['target']

print(df.head())
```
Here, `X` contains the features and `y` contains the target. We now have everything ready to apply the different feature selection methods.
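As a quick optional check, you can confirm the dimensions match the description above (442 records, 10 features):

```python
# Sanity check: 442 patient records, 10 baseline features
print(X.shape)  # (442, 10)
print(y.shape)  # (442,)
```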
# Filter Method
Filter methods rank or eliminate features based on statistical properties rather than by training a model. They are simple, fast, and give a quick way to remove obvious redundancies.
For this dataset, we checked for highly correlated features and dropped any that exceeded a correlation threshold of 0.85.
```python
import numpy as np

# Absolute pairwise correlations, upper triangle only (each pair counted once)
corr = X.corr()
threshold = 0.85
upper = corr.abs().where(np.triu(np.ones(corr.shape), k=1).astype(bool))

# Drop any feature that is highly correlated with an earlier one
to_drop = [col for col in upper.columns if any(upper[col] > threshold)]
X_filter = X.drop(columns=to_drop)

print("Remaining features after filter:", X_filter.columns.tolist())
```
Output:

```
Remaining features after filter: ['age', 'sex', 'bmi', 'bp', 's1', 's3', 's4', 's5', 's6']
```
Only one redundant feature (s2) was removed, so the dataset retained 9 of its 10 predictors. In terms of pairwise correlation, the Diabetes dataset is relatively clean.
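If you want to see which correlation triggered the drop, you can query the upper-triangle table directly. This is a small optional check that assumes the `upper` and `to_drop` variables from the snippet above are still in scope:

```python
# For each dropped column, report its strongest correlation with an earlier feature
for col in to_drop:
    partner = upper[col].idxmax()
    print(f"{col} dropped: |corr| = {upper[col].max():.2f} with {partner}")
```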
# Wrapper Method
Wrapper methods evaluate subsets of features by actually training models and checking performance. One popular technique is Recursive Feature Elimination (RFE).
RFE starts with all features, fits a model, ranks them by importance, and recursively removes the least useful ones until the desired number of features remains.
```python
from sklearn.linear_model import LinearRegression
from sklearn.feature_selection import RFE

# Recursively eliminate features until only 5 remain
lr = LinearRegression()
rfe = RFE(lr, n_features_to_select=5)
rfe.fit(X, y)

selected_rfe = X.columns[rfe.support_]
print("Selected by RFE:", selected_rfe.tolist())
```
Output:

```
Selected by RFE: ['bmi', 'bp', 's1', 's2', 's5']
```
RFE selected 5 features out of 10. The trade-off is that this approach is more computationally expensive since it requires multiple rounds of model fitting.
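If you are curious about the order in which the other features were discarded, the fitted selector exposes a `ranking_` attribute: selected features get rank 1, and larger ranks were eliminated in earlier rounds. A quick look, assuming the `rfe` object from above is still available:

```python
# Rank 1 = kept; larger ranks were eliminated earlier in the recursion
ranking = pd.Series(rfe.ranking_, index=X.columns).sort_values()
print(ranking)
```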
# Embedded Method
Embedded methods integrate feature selection into the model training process. Lasso regression (L1 regularization) is a classic example: the penalty shrinks the coefficients, and the least useful ones are driven exactly to zero, which removes those features from the model.
```python
from sklearn.linear_model import LassoCV

# LassoCV tunes the regularization strength with 5-fold cross-validation
lasso = LassoCV(cv=5, random_state=42).fit(X, y)

coef = pd.Series(lasso.coef_, index=X.columns)
selected_lasso = coef[coef != 0].index
print("Selected by Lasso:", selected_lasso.tolist())
```
Output:

```
Selected by Lasso: ['age', 'sex', 'bmi', 'bp', 's1', 's2', 's4', 's5', 's6']
```
Lasso retained 9 features and shrank one coefficient (s3) exactly to zero because it contributed little predictive power. Unlike the filter method, this decision was driven by the penalized model fit rather than by correlations alone.
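To see how strongly each surviving feature contributes, and which coefficient ended up at zero, you can inspect the `coef` series built above along with the regularization strength that LassoCV chose:

```python
# Coefficients sorted by magnitude; an exact zero means the feature was dropped
print(coef.sort_values(key=abs, ascending=False))
print("Chosen alpha:", lasso.alpha_)
```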
# Results Comparison
To evaluate each approach, we trained a Linear Regression model on the selected feature sets. We used 5-fold cross-validation and measured performance using R² score and Mean Squared Error (MSE).
```python
from sklearn.model_selection import cross_val_score, KFold
from sklearn.linear_model import LinearRegression

# Helper evaluation function: mean R2 and mean MSE across 5 folds
def evaluate_model(X, y, model):
    cv = KFold(n_splits=5, shuffle=True, random_state=42)
    r2_scores = cross_val_score(model, X, y, cv=cv, scoring="r2")
    mse_scores = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_squared_error")
    return r2_scores.mean(), -mse_scores.mean()

# 1. Filter method results
lr = LinearRegression()
r2_filter, mse_filter = evaluate_model(X_filter, y, lr)

# 2. Wrapper (RFE) results
X_rfe = X[selected_rfe]
r2_rfe, mse_rfe = evaluate_model(X_rfe, y, lr)

# 3. Embedded (Lasso) results
X_lasso = X[selected_lasso]
r2_lasso, mse_lasso = evaluate_model(X_lasso, y, lr)

# Print results
print("=== Results Comparison ===")
print(f"Filter Method -> R2: {r2_filter:.4f}, MSE: {mse_filter:.2f}, Features: {X_filter.shape[1]}")
print(f"Wrapper (RFE) -> R2: {r2_rfe:.4f}, MSE: {mse_rfe:.2f}, Features: {X_rfe.shape[1]}")
print(f"Embedded (Lasso)-> R2: {r2_lasso:.4f}, MSE: {mse_lasso:.2f}, Features: {X_lasso.shape[1]}")
```
Output:

```
=== Results Comparison ===
Filter Method -> R2: 0.4776, MSE: 3021.77, Features: 9
Wrapper (RFE) -> R2: 0.4657, MSE: 3087.79, Features: 5
Embedded (Lasso)-> R2: 0.4818, MSE: 2996.21, Features: 9
```
The filter method removed only one redundant feature and gave a solid baseline. The wrapper (RFE) cut the feature set in half at a small cost in accuracy. The embedded method (Lasso) kept 9 features and delivered the best R² and the lowest MSE. Overall, Lasso offered the best balance of accuracy, efficiency, and interpretability.
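If you prefer a single side-by-side view, the metrics computed above can be collected into one table. This is just a convenience step that reuses the variables from the evaluation code:

```python
# Gather the cross-validated metrics into one DataFrame for easier comparison
results = pd.DataFrame({
    "Method": ["Filter (correlation)", "Wrapper (RFE)", "Embedded (Lasso)"],
    "Features": [X_filter.shape[1], X_rfe.shape[1], X_lasso.shape[1]],
    "R2": [r2_filter, r2_rfe, r2_lasso],
    "MSE": [mse_filter, mse_rfe, mse_lasso],
})
print(results.sort_values("R2", ascending=False))
```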
# Conclusion
Feature selection is not simply a preprocessing step but a strategic decision that shapes the overall success of a machine learning pipeline. Our experiment reinforced that while quick filters and more computationally demanding wrappers each have their place, embedded methods like Lasso often hit the sweet spot.
On the Diabetes dataset, Lasso regularization emerged as the clear winner. It helped us build a faster, more accurate, and more interpretable model without the heavy computation of wrapper methods or the oversimplification of filters.
For practitioners, the takeaway is this: don’t rely blindly on a single method. Start with quick filters to prune obvious redundancies, reach for wrappers when you can afford a more thorough, model-driven search, and always consider embedded methods like Lasso for a practical balance.
Jayita Gulati is a machine learning enthusiast and technical writer driven by her passion for building machine learning models. She holds a Master’s degree in Computer Science from the University of Liverpool.