How to explain feature importance after one-hot encoding for a decision tree

Introduction

After one-hot encoding, a single categorical feature becomes several binary columns, so a decision tree reports importance for the expanded columns rather than for the original human-friendly feature. To explain the model properly, you usually need to map those binary columns back to their original feature group and interpret the grouped importance instead of reading each dummy variable in isolation.

Why One-Hot Encoding Changes the Explanation

Suppose the original feature is Color with values Red, Green, and Blue. After one-hot encoding, the tree sees separate columns such as:

  • 'Color_Red'
  • 'Color_Green'
  • 'Color_Blue'

The tree can split on any of those binary columns independently. As a result, feature_importances_ reports importance for those derived columns, not for Color as one conceptual variable.

That is why a raw importance table can feel misleading. The original feature has been decomposed.

Train a Small Example

Here is a simple scikit-learn pipeline with one-hot encoding and a decision tree:

python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

X = pd.DataFrame({
    "Color": ["Red", "Blue", "Green", "Red", "Blue", "Green"],
    "Size": [1, 2, 1, 3, 2, 3]
})
y = [1, 0, 1, 1, 0, 0]

preprocessor = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["Color"]),
    ("num", "passthrough", ["Size"])
])

model = Pipeline([
    ("prep", preprocessor),
    ("tree", DecisionTreeClassifier(random_state=42))
])

model.fit(X, y)

The tree importance values now belong to the transformed feature space, not the original dataframe columns.

Recover the Expanded Feature Names

To explain the importances, first get the transformed column names:

python
feature_names = model.named_steps["prep"].get_feature_names_out()
importances = model.named_steps["tree"].feature_importances_

importance_table = pd.DataFrame({
    "feature": feature_names,
    "importance": importances
})

print(importance_table)

This shows which one-hot columns were actually used by the tree, but it is still only half of the explanation.

Aggregate Back to the Original Feature

The next step is grouping related one-hot columns back under the original feature name.

python
def original_feature_name(transformed_name: str) -> str:
    # ColumnTransformer prefixes each output with "<transformer name>__".
    if transformed_name.startswith("cat__Color_"):
        return "Color"
    if transformed_name == "num__Size":
        return "Size"
    # Fall back to the transformed name for anything unrecognized.
    return transformed_name

importance_table["original_feature"] = importance_table["feature"].map(original_feature_name)

# Sum the importance of every encoded column within each original feature.
grouped = (
    importance_table
    .groupby("original_feature", as_index=False)["importance"]
    .sum()
    .sort_values("importance", ascending=False)
)

print(grouped)

Now you can say something meaningful such as "Color contributed 0.63 total importance across its encoded columns" (an illustrative figure; the value your tree reports will differ) instead of talking about isolated dummy variables as if they were unrelated features.

Explain the Result Carefully

When presenting the explanation, it helps to say two things:

  1. which individual encoded categories were important
  2. how much the original feature mattered overall

That keeps the explanation faithful to the model while still understandable to humans.

For example:

  • 'Color_Red' may have been the strongest single split signal
  • but the total importance of the original Color feature is the sum of all its encoded columns

This grouped view is usually what stakeholders expect when they ask about feature importance.
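A single table can carry both views at once. The sketch below uses made-up importance values, not output from a real model, purely to show the layout; the `group_total` and `share_of_group` column names are illustrative choices.

```python
import pandas as pd

# Illustrative numbers only, not produced by a fitted tree.
importance_table = pd.DataFrame({
    "feature": ["cat__Color_Blue", "cat__Color_Green", "cat__Color_Red", "num__Size"],
    "original_feature": ["Color", "Color", "Color", "Size"],
    "importance": [0.10, 0.05, 0.45, 0.40],
})

# Total importance of each original feature, repeated on every row of its group.
importance_table["group_total"] = (
    importance_table.groupby("original_feature")["importance"].transform("sum")
)

# Fraction of the group's importance carried by each encoded column.
importance_table["share_of_group"] = (
    importance_table["importance"] / importance_table["group_total"]
)

report = importance_table.sort_values(["group_total", "importance"], ascending=False)
print(report)
```

Read row by row, the table answers "which category mattered"; read through `group_total`, it answers "how much did the original feature matter".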

Remember the Limits of Tree Importances

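Decision-tree impurity importance has known limitations. It is computed from the training data and can be biased toward features that offer more candidate split points, such as continuous or high-cardinality features. One-hot expansion changes how that opportunity is distributed, because each dummy column offers only a single possible split.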

That is why permutation importance can be a useful second opinion. If interpretability matters a lot, compare the grouped impurity importance with grouped permutation importance on a validation set.
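One convenient property of a pipeline is that permuting the raw input columns gives grouped permutation importance directly: shuffling Color scrambles all of its dummy columns together. The sketch below uses scikit-learn's `permutation_importance`. It reuses the six-row toy data only to stay short, which is an assumption for illustration; in practice you would pass a held-out validation set.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.inspection import permutation_importance
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

X = pd.DataFrame({
    "Color": ["Red", "Blue", "Green", "Red", "Blue", "Green"],
    "Size": [1, 2, 1, 3, 2, 3],
})
y = [1, 0, 1, 1, 0, 0]

model = Pipeline([
    ("prep", ColumnTransformer([
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["Color"]),
        ("num", "passthrough", ["Size"]),
    ])),
    ("tree", DecisionTreeClassifier(random_state=42)),
])
model.fit(X, y)

# Assumption: X and y stand in for a validation set here.
# Each raw column is shuffled in turn and the drop in accuracy is recorded.
result = permutation_importance(model, X, y, n_repeats=20, random_state=0)

perm_table = pd.DataFrame({
    "original_feature": X.columns,
    "mean_accuracy_drop": result.importances_mean,
    "std": result.importances_std,
}).sort_values("mean_accuracy_drop", ascending=False)
print(perm_table)
```

If the ranking here disagrees sharply with the grouped impurity importance, that is a signal to look more closely before presenting either number.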

Common Pitfalls

One common mistake is treating each one-hot column as if it were a totally separate original feature. That makes categorical variables look fragmented and harder to explain than they really are.

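Another is summing importances incorrectly when transformed names are not mapped back cleanly. A loose prefix match on "Color_", for instance, would also catch the columns of a separate feature named Color_Family. Pipelines help here because get_feature_names_out() makes the transformed space explicit.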

Developers also sometimes present impurity-based importance as a causal statement. It is not. It only describes how much the tree relied on those features for splitting.

Finally, if the categorical variable has many rare categories, some one-hot columns may have zero importance simply because they never became useful splits in that tree.

Summary

  • One-hot encoding turns one categorical feature into several binary columns.
  • Decision-tree importance is reported on those transformed columns, not on the original feature name.
  • To explain the result clearly, aggregate related one-hot columns back to the original feature group.
  • Report both per-category importance and grouped original-feature importance when useful.
  • Treat tree importances as model-behavior summaries, not as causal claims.
