Decision Tree in MATLAB

Introduction

Decision trees in MATLAB are usually built with fitctree for classification or fitrtree for regression. They are useful because the fitted model is easy to inspect, needs no feature scaling (splits depend only on the ordering of values within each predictor), and gives you a strong baseline for tabular data before you move to more complex models.

Core Sections

Classification trees with fitctree

A classification tree learns rules that split the observations into increasingly pure groups, one feature threshold at a time. In MATLAB, each row of the feature matrix is an observation and the target vector contains the class labels.

matlab
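load fisheriris               % provides meas (150x4 numeric) and species (cell array of labels)

X = meas;
Y = species;

tree = fitctree(X, Y);        % fit a classification tree with default settings
view(tree, 'Mode', 'graph')   % open the fitted tree in a figure window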

This trains a classifier on the iris dataset and opens a graphical tree view. The splits are chosen automatically from the training data using the default splitting criterion, which for fitctree is Gini's diversity index.

To make predictions:

matlab
predictedLabels = predict(tree, X);
accuracy = mean(strcmp(predictedLabels, Y));
disp(accuracy)

That gives you a quick training-set sanity check, though a held-out set or cross-validation is better for real evaluation.
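A held-out evaluation can be sketched as follows. This is a minimal example, assuming a 70/30 stratified split created with cvpartition; the split ratio, the fixed seed, and the variable names (trainIdx, testIdx, testAccuracy) are illustrative choices, not something the workflow above prescribes.

```matlab
load fisheriris

rng(1);                                       % fix the seed so the split is repeatable
cv = cvpartition(species, 'HoldOut', 0.3);    % stratified 70/30 split (assumed ratio)
trainIdx = training(cv);                      % logical index of training rows
testIdx  = test(cv);                          % logical index of held-out rows

% Fit on the training rows only
tree = fitctree(meas(trainIdx, :), species(trainIdx));

% Score on rows the tree has never seen
pred = predict(tree, meas(testIdx, :));
testAccuracy = mean(strcmp(pred, species(testIdx)));
disp(testAccuracy)
```

The held-out accuracy is usually somewhat lower than the training accuracy, and the gap gives a rough sense of how much the tree is overfitting.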

Regression trees with fitrtree

If the target is numeric instead of categorical, use fitrtree.

matlab
load carsmall                 % provides Weight, Horsepower, MPG, and other variables

X = [Weight Horsepower];
Y = MPG;

% Keep only rows with no missing predictor or response values
validRows = all(~isnan(X), 2) & ~isnan(Y);
X = X(validRows, :);
Y = Y(validRows);

rtree = fitrtree(X, Y);
predictedMPG = predict(rtree, X);
rmse = sqrt(mean((predictedMPG - Y).^2));   % training-set RMSE
disp(rmse)

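A regression tree predicts continuous values by choosing splits that reduce the mean squared error of the target in each branch. Each leaf then predicts the mean response of the training observations that fall into it.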
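To make the splitting idea concrete, here is a simplified illustration of how one split on one feature could be scored. It is a hand-rolled sketch on made-up toy data, not MATLAB's internal implementation: it tries each midpoint between neighboring x values and keeps the threshold with the lowest total squared error around the two branch means.

```matlab
% Toy data (invented for illustration): y jumps between x = 3 and x = 4
x = [1 2 3 4 5 6]';
y = [1.0 1.2 0.9 3.1 3.0 2.9]';

% Candidate thresholds are midpoints between consecutive x values
thresholds = (x(1:end-1) + x(2:end)) / 2;
sse = zeros(size(thresholds));

for k = 1:numel(thresholds)
    left  = y(x <  thresholds(k));
    right = y(x >= thresholds(k));
    % Total squared error if each branch predicts its own mean
    sse(k) = sum((left - mean(left)).^2) + sum((right - mean(right)).^2);
end

[~, best] = min(sse);
bestThreshold = thresholds(best);
disp(bestThreshold)   % 3.5 for this toy data
```

A real tree repeats this search over every predictor and then recurses into each branch until a stopping rule is reached.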

Control tree growth

Decision trees can overfit badly if you let them grow without constraints. MATLAB lets you control complexity through parameters such as MaxNumSplits, MinLeafSize, and pruning settings.

matlab
tree = fitctree(X, Y, ...
    'MaxNumSplits', 10, ...
    'MinLeafSize', 5);

These settings limit how detailed the tree can become. Smaller leaves and more splits make the model more flexible, but also more likely to memorize noise.
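One way to see this trade-off is to sweep a single setting and compare cross-validated error. The sketch below assumes the iris data and an arbitrary grid of MinLeafSize values; the grid and the seed are illustrative, and on other data the best value will differ.

```matlab
load fisheriris

rng(1);                              % repeatable fold assignment
leafSizes = [1 5 10 20 40];          % assumed grid, chosen only for illustration
cvLoss = zeros(size(leafSizes));

for k = 1:numel(leafSizes)
    t = fitctree(meas, species, 'MinLeafSize', leafSizes(k));
    cvLoss(k) = kfoldLoss(crossval(t));   % 10-fold misclassification rate
end

disp(table(leafSizes', cvLoss', 'VariableNames', {'MinLeafSize', 'CVLoss'}))
```

Very small leaves tend to overfit and very large leaves tend to underfit, so the lowest cross-validated loss often sits somewhere in between.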

Evaluate with cross-validation

Training accuracy is not enough. MATLAB supports cross-validation directly on tree models.

matlab
load fisheriris

X = meas;
Y = species;

% 'CrossVal', 'on' returns a cross-validated (partitioned) model, not a single tree
cvTree = fitctree(X, Y, 'CrossVal', 'on');
loss = kfoldLoss(cvTree);
disp(loss)

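kfoldLoss reports an estimate of generalization error across validation folds. With the defaults used here, that means 10-fold cross-validation and the average misclassification rate on the held-out folds. For quick model comparison, this is much more useful than checking only the resubstitution error on the same data used for fitting.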
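The difference between the two error estimates can be checked directly. This is a minimal sketch assuming the iris data and 5 folds; the fold count and the variable names resubErr and cvErr are illustrative.

```matlab
load fisheriris

rng(1);                                  % repeatable fold assignment
tree = fitctree(meas, species);

resubErr = resubLoss(tree);              % error on the data used for fitting
cvTree   = crossval(tree, 'KFold', 5);   % 5-fold partitioned model (assumed fold count)
cvErr    = kfoldLoss(cvTree);            % error averaged over held-out folds

fprintf('Resubstitution error: %.4f\n', resubErr);
fprintf('5-fold CV error:      %.4f\n', cvErr);
```

The resubstitution error is typically the more optimistic of the two, because the tree is scored on observations it was fitted to.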

Inspect feature importance and structure

Trees are popular partly because they are interpretable. MATLAB exposes predictor importance estimates and visualization tools.

matlab
tree = fitctree(X, Y);
importance = predictorImportance(tree);
disp(importance)

The graph view and importance scores help answer two practical questions:

  • which variables drive the splits
  • whether the tree is too deep or too noisy to trust

Interpretability is not perfect, but it is far better than with many black-box models.
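Importance scores are easier to read when paired with predictor names. The sketch below assumes the iris data and supplies column labels by hand (SepalLength, SepalWidth, PetalLength, PetalWidth); those labels are an addition for readability, since meas itself carries no names.

```matlab
load fisheriris

% Hand-supplied labels for the four columns of meas (an assumption for display)
names = {'SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth'};
tree = fitctree(meas, species, 'PredictorNames', names);

imp = predictorImportance(tree);
[sortedImp, order] = sort(imp, 'descend');

% Print predictors from most to least important
for k = 1:numel(order)
    fprintf('%-12s %.4f\n', names{order(k)}, sortedImp(k));
end

% Print the split rules as text instead of opening a figure
view(tree, 'Mode', 'text')
```

The text view is handy in scripts and logs, where a figure window is not practical.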

When a single tree is enough

A single decision tree is often a good choice when:

  • you need a fast baseline
  • interpretability matters
  • the feature set is tabular and mixed-scale
  • you want simple if-then style rules

If performance plateaus, ensembles such as bagged trees or boosted trees often outperform a single tree. Still, a plain tree is a good place to start because it makes debugging data issues easier.
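A quick comparison against a bagged ensemble might look like the following. It is a sketch under stated assumptions: the iris data, 50 bagged trees via fitcensemble, and default 10-fold cross-validation for both models. On a dataset this small and clean the two may score about the same.

```matlab
load fisheriris

rng(1);                                   % repeatable bootstrap samples and folds
singleTree = fitctree(meas, species);
bagged = fitcensemble(meas, species, ...
    'Method', 'Bag', ...
    'NumLearningCycles', 50);             % 50 trees is an arbitrary illustrative choice

treeLoss = kfoldLoss(crossval(singleTree));
bagLoss  = kfoldLoss(crossval(bagged));

fprintf('Single tree CV error:  %.4f\n', treeLoss);
fprintf('Bagged trees CV error: %.4f\n', bagLoss);
```

If the ensemble is not clearly better under cross-validation, the single tree's interpretability may be worth keeping.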

Common Pitfalls

  • Evaluating the model only on the training data and mistaking that for real performance.
  • Letting the tree grow too deep and overfit noise in the dataset.
  • Ignoring missing values or invalid rows before fitting the model.
  • Expecting a single tree to outperform ensemble methods on harder tabular problems.
  • Treating predictor importance scores as proof of causality rather than as model-specific heuristics.
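For the missing-values pitfall, one possible cleanup step is to collect the variables in a table and drop incomplete rows before fitting. This sketch assumes the carsmall data and uses rmmissing; dropping rows is only one strategy, and the table name tbl is illustrative.

```matlab
load carsmall

tbl = table(Weight, Horsepower, MPG);
nBefore = height(tbl);

tbl = rmmissing(tbl);                 % drop any row containing a missing value
fprintf('Removed %d incomplete rows\n', nBefore - height(tbl));

rtree = fitrtree(tbl, 'MPG');         % 'MPG' is the response column in tbl
```

Reporting how many rows were removed makes it easier to notice when a cleanup step silently discards a large share of the data.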

Summary

  • Use fitctree for classification and fitrtree for regression in MATLAB.
  • Decision trees are easy to train, inspect, and use as a baseline for tabular data.
  • Control tree size with options such as MaxNumSplits and MinLeafSize.
  • Prefer cross-validation or a held-out test set over training-set accuracy.
  • Start with a single tree for interpretability, then move to ensembles if you need more predictive power.
