Logistic Regression: How to Find the Top Three Features with the Highest Weights



Introduction

In logistic regression, feature importance is usually read from the learned coefficients. The larger the absolute value of a coefficient, the more strongly that feature pushes the prediction toward one class or the other, assuming the features are on comparable scales.

What the Weights Mean

A logistic regression model learns one coefficient per feature for binary classification. A positive coefficient pushes the prediction toward the positive class as the feature increases. A negative coefficient pushes it the other way.
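To make the direction of the effect concrete, here is a minimal sketch that pushes a weighted sum through the logistic (sigmoid) function. The weights and intercept are hypothetical values chosen for illustration, not output from a fitted model.

```python
import numpy as np

def sigmoid(z):
    """Map a raw score to a probability between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical parameters: one positive and one negative coefficient
weights = np.array([1.5, -2.0])
intercept = 0.0

baseline = np.array([0.0, 0.0])      # both features at zero
raise_first = np.array([1.0, 0.0])   # increase the positively weighted feature
raise_second = np.array([0.0, 1.0])  # increase the negatively weighted feature

p_baseline = sigmoid(weights @ baseline + intercept)
p_first = sigmoid(weights @ raise_first + intercept)
p_second = sigmoid(weights @ raise_second + intercept)

print(p_baseline, p_first, p_second)
```

Raising the feature with the positive weight moves the predicted probability above the baseline, while raising the feature with the negative weight moves it below.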

When people ask for the "top three features," they usually mean one of two things:

the three largest positive coefficients
the three coefficients with the largest absolute values

For overall influence, absolute value is the better default because a large negative weight can matter just as much as a large positive one.
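The two readings can give different answers. The sketch below ranks a small set of coefficients both ways; the feature names and coefficient values are made up for illustration.

```python
import numpy as np

# Hypothetical feature names and coefficients, for illustration only
feature_names = np.array(["age", "income", "tenure", "visits", "balance"])
coefficients = np.array([0.4, -1.8, 1.2, 0.9, -0.1])

# Three largest positive coefficients (signed ranking)
top_positive = np.argsort(coefficients)[-3:][::-1]

# Three largest coefficients by absolute value (overall influence)
top_absolute = np.argsort(np.abs(coefficients))[-3:][::-1]

print("Largest positive:", feature_names[top_positive].tolist())
print("Largest absolute:", feature_names[top_absolute].tolist())
```

Here `income` has the strongest weight overall, but the signed ranking leaves it out because its coefficient is negative.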

A Runnable Example in `scikit-learn`

The example below trains a logistic regression model on a built-in dataset, scales the features, and prints the three coefficients with the largest absolute values.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Load data once and pull out the features, labels, and feature names
data = load_breast_cancer()
X, y = data.data, data.target
feature_names = data.feature_names

# Scale features so coefficient magnitudes are comparable
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Train model
model = LogisticRegression(max_iter=1000)
model.fit(X_scaled, y)

# Rank features by absolute coefficient value, largest first
coefficients = model.coef_[0]
top_indices = np.argsort(np.abs(coefficients))[-3:][::-1]

for idx in top_indices:
    print(feature_names[idx], coefficients[idx])
```

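The output has the following form. Treat the numbers as illustrative, because the exact features and values can vary with the scikit-learn version and solver settings:

```text
worst radius -1.0318
worst area -1.0041
mean concave points -0.9437
```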

The sign tells you the direction of the effect. The absolute value tells you the size of the effect in the fitted model.

Why Scaling Matters

Comparing raw coefficients only makes sense when the features are measured on similar scales. If one feature is in dollars and another is in fractions, the larger coefficient may simply reflect the units rather than true predictive influence.

That is why the example standardizes the input first. After scaling, coefficient magnitudes are far more meaningful for ranking.

If your data is already normalized in a principled way, you may not need an extra scaling step. But if you skip scaling blindly, feature ranking becomes much less trustworthy.
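The effect of units can be checked directly. The sketch below uses synthetic data (an assumption made for illustration, not the dataset from the example above) and re-expresses one feature in a unit 1000 times smaller. The raw coefficient on that feature changes by orders of magnitude, while the coefficients fitted after standardization stay essentially the same.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data: two features with known influence (illustrative assumption)
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
logits = 2.0 * X[:, 0] - 1.0 * X[:, 1]
y = (rng.random(500) < 1.0 / (1.0 + np.exp(-logits))).astype(int)

# Same information, but feature 0 is now measured in a unit 1000x smaller
X_units = X.copy()
X_units[:, 0] *= 1000.0

# Raw fits: the coefficient on feature 0 depends heavily on its unit
raw = LogisticRegression(max_iter=1000).fit(X, y).coef_[0]
raw_units = LogisticRegression(max_iter=1000).fit(X_units, y).coef_[0]

def fit_scaled(features):
    """Standardize, then fit, and return the coefficients."""
    pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    pipeline.fit(features, y)
    return pipeline.named_steps["logisticregression"].coef_[0]

scaled = fit_scaled(X)
scaled_units = fit_scaled(X_units)

print("raw:           ", raw, raw_units)
print("standardized:  ", scaled, scaled_units)
```

Wrapping the scaler and the model in one pipeline also keeps the scaling step attached to the model, so the same transformation is applied at prediction time.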

Binary vs Multiclass Models

For binary classification, `model.coef_[0]` gives one coefficient per feature. For multiclass logistic regression, `coef_` becomes a matrix of shape (n_classes, n_features), with one row per class. In that case you need to decide what "top features" means:

top features for a specific class
top features averaged across classes
top features by maximum absolute weight over all classes

The ranking method should match the question you are trying to answer.
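The three options above can be sketched with plain NumPy. The coefficient matrix below is hypothetical and stands in for a fitted `coef_` with three classes and five features. "Averaged across classes" is interpreted here as the mean of the absolute weights, which is one reasonable choice among several.

```python
import numpy as np

feature_names = np.array(["f0", "f1", "f2", "f3", "f4"])

# Hypothetical coef_ matrix: 3 classes (rows) x 5 features (columns)
coef = np.array([
    [ 0.2, -1.5,  0.1,  0.9, -0.3],
    [-1.3,  0.4,  0.1, -0.2,  1.1],
    [ 0.6,  1.1, -0.2, -0.7, -0.8],
])
abs_coef = np.abs(coef)

def top_k(scores, k=3):
    """Return the names of the k features with the highest scores."""
    order = np.argsort(scores)[-k:][::-1]
    return feature_names[order].tolist()

top_for_class_0 = top_k(abs_coef[0])        # one specific class
top_by_mean = top_k(abs_coef.mean(axis=0))  # mean absolute weight across classes
top_by_max = top_k(abs_coef.max(axis=0))    # largest absolute weight in any class

print(top_for_class_0, top_by_mean, top_by_max)
```

With these values the three definitions produce three different lists, which is why the choice needs to be made deliberately.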

Coefficients Are Useful, but Not Perfect

Coefficient ranking is a good first interpretation tool, but it is not the same thing as causal importance. Correlated features can split influence between themselves, which may make each individual coefficient look smaller than expected.
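An extreme case of correlation is an exact duplicate column. The sketch below, which uses synthetic data as an illustrative assumption, fits one model on two features and another where the first feature appears twice. With scikit-learn's default settings, the duplicated copies end up sharing the weight, so each one looks less important than the single original.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic data with two independent features (illustrative assumption)
rng = np.random.default_rng(42)
x1 = rng.normal(size=500)
x2 = rng.normal(size=500)
logits = 2.0 * x1 - 1.0 * x2
y = (rng.random(500) < 1.0 / (1.0 + np.exp(-logits))).astype(int)

X_single = np.column_stack([x1, x2])          # x1 appears once
X_duplicated = np.column_stack([x1, x1, x2])  # x1 appears twice

coef_single = LogisticRegression(max_iter=1000).fit(X_single, y).coef_[0]
coef_duplicated = LogisticRegression(max_iter=1000).fit(X_duplicated, y).coef_[0]

print("x1 alone:       ", coef_single[0])
print("x1 and its copy:", coef_duplicated[0], coef_duplicated[1])
```

Real features are rarely exact copies, but the same dilution happens to a lesser degree whenever features are strongly correlated.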

Regularization also matters. scikit-learn's `LogisticRegression` applies L2 regularization by default (`C=1.0`, where smaller values of `C` mean stronger regularization), which shrinks weights toward zero. That is often desirable for generalization, but it changes the raw magnitude of the coefficients.

If interpretability is the main goal, it is worth checking how stable your top-three list is across different train-test splits or regularization strengths.
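One way to run that check is sketched below: refit the model over several train-test splits and several values of `C`, then count how often each feature lands in the top three. The specific seeds, split size, and `C` grid are arbitrary choices for illustration.

```python
from collections import Counter

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X, y, names = data.data, data.target, data.feature_names

counts = Counter()
runs = 0
for seed in range(5):  # different train-test splits
    X_train, _, y_train, _ = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y
    )
    X_train = StandardScaler().fit_transform(X_train)
    for C in (0.1, 1.0, 10.0):  # different regularization strengths
        model = LogisticRegression(C=C, max_iter=5000).fit(X_train, y_train)
        top3 = np.argsort(np.abs(model.coef_[0]))[-3:]
        counts.update(names[top3].tolist())
        runs += 1

# Features that appear in most fits are the more trustworthy "top" features
for name, count in counts.most_common():
    print(f"{name}: in the top three for {count} of {runs} fits")
```

A feature that appears in nearly every fit is a safer pick than one that shows up only for a particular split or regularization strength.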

Common Pitfalls

The most common mistake is ranking coefficients on unscaled data. That makes the comparison unfair because coefficient size reflects feature units.

Another mistake is looking only at positive coefficients. Strong negative coefficients are just as important when measuring influence.

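For one-hot encoded categorical variables, be careful about interpretation. When one category is dropped as the reference, the coefficient on each remaining encoded column measures the effect relative to that omitted category, so a large value says how different that category is from the reference, not how important the variable is as a whole.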

Summary

For binary logistic regression, the learned coefficients live in `model.coef_[0]`.
Use absolute coefficient values to find the most influential features overall.
Scale numeric features before comparing coefficient magnitudes.
The sign of a coefficient shows direction, while the absolute value shows strength.
Correlation and regularization can change the ranking, so treat coefficient-based importance as an informed approximation.
