
How to Improve the Accuracy of a Decision Tree in MATLAB


Introduction

Improving the accuracy of a decision tree in MATLAB is usually more about training discipline than about one magic parameter. A tree can overfit, underfit, or learn unstable splits depending on feature quality, class balance, and hyperparameter choices. The most reliable approach is to evaluate with cross-validation, tune the tree deliberately, and only then decide whether a single tree is strong enough for the problem.

Start With a Reproducible Baseline

Before tuning, build a baseline with a train-test split so you can measure real change.

```matlab
rng(7);                                   % fix the seed so the split is reproducible

cv = cvpartition(y, 'HoldOut', 0.2);      % 80/20 split, stratified by the labels in y
Xtrain = X(training(cv), :);
ytrain = y(training(cv));
Xtest = X(test(cv), :);
ytest = y(test(cv));

tree = fitctree(Xtrain, ytrain);
pred = predict(tree, Xtest);
acc = mean(pred == ytest);                % numeric or categorical labels; use strcmp for cellstr labels

fprintf('Baseline accuracy: %.4f\n', acc);
```

Without a fixed baseline, tuning becomes guesswork. It is also easy to overfit your evaluation process rather than improve the model, for example by tuning until one particular split happens to look good.

Tune the Tree Complexity

Two parameters have the most immediate effect:

  • MaxNumSplits caps the number of splits (branch nodes), which limits how deep and complex the tree can become.
  • MinLeafSize sets the minimum number of observations that each leaf must contain.

Smaller leaves and more splits usually increase training accuracy but can reduce generalization.

```matlab
tree = fitctree(Xtrain, ytrain, ...
    'MinLeafSize', 5, ...     % example value; the default is 1
    'MaxNumSplits', 50);      % example value; the default is size(Xtrain, 1) - 1

pred = predict(tree, Xtest);
acc = mean(pred == ytest);
fprintf('Tuned accuracy: %.4f\n', acc);
```

If training accuracy is much higher than test or cross-validated accuracy, the tree is probably memorizing noise. In that case, increasing MinLeafSize or reducing MaxNumSplits usually helps more than making the tree bigger.
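One way to spot memorization is to compare resubstitution accuracy with cross-validated accuracy for several leaf sizes. A widening gap points to overfitting. The sketch below uses the fisheriris sample data as a stand-in for your training split. The candidate MinLeafSize values are illustrative assumptions, not recommendations.

```matlab
% Example data (fisheriris ships with Statistics and Machine Learning Toolbox);
% treat it as a stand-in for your training split only.
load fisheriris
Xtrain = meas;
ytrain = categorical(species);

rng(7);
cvp = cvpartition(ytrain, 'KFold', 5);      % same folds for every setting

leafSizes = [1 5 10 20 40];                 % illustrative candidate values
trainAcc = zeros(size(leafSizes));
cvAcc = zeros(size(leafSizes));

for k = 1:numel(leafSizes)
    t = fitctree(Xtrain, ytrain, 'MinLeafSize', leafSizes(k));
    trainAcc(k) = 1 - resubLoss(t);                             % accuracy on data the tree has seen
    cvAcc(k) = 1 - kfoldLoss(crossval(t, 'CVPartition', cvp)); % estimate for unseen data
    fprintf('MinLeafSize=%2d  train=%.4f  cv=%.4f  gap=%.4f\n', ...
        leafSizes(k), trainAcc(k), cvAcc(k), trainAcc(k) - cvAcc(k));
end
```

Reusing one cvpartition object means every setting is scored on identical folds. Differences between rows therefore reflect the parameter rather than a different random split.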

Use Cross-Validation Instead of One Split

A single holdout split can be noisy, especially with limited data. MATLAB makes cross-validation easy:

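```matlab
% Cross-validate on the training split only, so the test set stays untouched
cvTree = fitctree(Xtrain, ytrain, ...
    'MinLeafSize', 5, ...
    'MaxNumSplits', 50, ...
    'CrossVal', 'on', ...
    'KFold', 5);

cvLoss = kfoldLoss(cvTree);   % average misclassification rate over the held-out folds
fprintf('5-fold classification loss: %.4f\n', cvLoss);
fprintf('Estimated accuracy: %.4f\n', 1 - cvLoss);
```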

Use cross-validation to compare settings, then retrain the final model on the full training data with the best configuration.
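A minimal sketch of that workflow follows: a small grid search over MinLeafSize and MaxNumSplits scored with kfoldLoss on fixed folds of the training split. The best setting is then retrained and checked once on the held-out test set. The fisheriris data and the grid values are assumptions for illustration only.

```matlab
load fisheriris                            % example data; substitute your own X, y
X = meas;
y = categorical(species);

rng(7);
cv = cvpartition(y, 'HoldOut', 0.2);
Xtrain = X(training(cv), :);  ytrain = y(training(cv));
Xtest  = X(test(cv), :);      ytest  = y(test(cv));

cvp = cvpartition(ytrain, 'KFold', 5);     % fixed folds inside the training split
leafGrid  = [1 5 10 20];                   % illustrative candidate values
splitGrid = [5 10 25 50];

bestLoss = Inf;
bestLeaf = leafGrid(1);
bestSplits = splitGrid(1);
for leaf = leafGrid
    for splits = splitGrid
        cvTree = fitctree(Xtrain, ytrain, ...
            'MinLeafSize', leaf, 'MaxNumSplits', splits, 'CVPartition', cvp);
        lossVal = kfoldLoss(cvTree);
        if lossVal < bestLoss              % keep the setting with the lowest CV loss
            bestLoss = lossVal;
            bestLeaf = leaf;
            bestSplits = splits;
        end
    end
end

% Retrain on the whole training split with the chosen setting, then test once
finalTree = fitctree(Xtrain, ytrain, ...
    'MinLeafSize', bestLeaf, 'MaxNumSplits', bestSplits);
testAcc = mean(predict(finalTree, Xtest) == ytest);
fprintf('MinLeafSize=%d, MaxNumSplits=%d, CV accuracy=%.4f, test accuracy=%.4f\n', ...
    bestLeaf, bestSplits, 1 - bestLoss, testAcc);
```

The test set is touched only after the search is finished. This keeps testAcc an honest estimate rather than another quantity that was tuned.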

Improve the Inputs, Not Only the Model

Decision trees are sensitive to feature quality. Accuracy often improves more from better predictors than from more hyperparameter tuning.

Useful checks:

  • remove features with little signal
  • encode categories consistently
  • handle missing values explicitly
  • add domain-relevant derived features

Trees do not need feature scaling the way some models do, but they still benefit from cleaner and more informative inputs.
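Two of these checks map onto built-in tree functionality. predictorImportance ranks predictors, so weak ones can be candidates for removal. The 'Surrogate','on' option of fitctree lets the tree route observations whose split predictor is missing (NaN). The sketch below assumes the fisheriris data and deliberately blanks out a few values. That masking step is purely hypothetical and exists only to create missing data to demonstrate on.

```matlab
load fisheriris                  % example data; substitute your own training split
Xtrain = meas;
ytrain = categorical(species);

% Hypothetical: set 20 random entries to NaN to mimic missing values
rng(7);
Xmiss = Xtrain;
Xmiss(randperm(numel(Xmiss), 20)) = NaN;

% Surrogate splits give observations with a missing split predictor a fallback route
tree = fitctree(Xmiss, ytrain, 'Surrogate', 'on');

% Importance estimates derived from the risk reduction of splits on each predictor
imp = predictorImportance(tree);
names = {'SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth'};
[~, order] = sort(imp, 'descend');
for k = order
    fprintf('%-12s %.4f\n', names{k}, imp(k));
end
```

Treat low importance as a hint rather than proof. Correlated predictors can share credit, so confirm with cross-validation before dropping a feature.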

Watch for Class Imbalance

If one class dominates, accuracy can look high while the model still performs poorly on the minority class. In MATLAB, check a confusion matrix and not just one scalar accuracy value.

matlab
1tree = fitctree(Xtrain, ytrain);
2pred = predict(tree, Xtest);
3
4confusionchart(ytest, pred);

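If minority classes matter, reweight the classes before training (for example, with the 'Prior' or 'Cost' name-value arguments of fitctree) or rebalance the data by resampling. A tree that predicts the majority class too often is not improving in a useful way, even if its overall accuracy rises.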
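The sketch below reports per-class recall from confusionmat, alongside the chart. It compares a default tree with one trained using 'Prior','uniform', which makes every class count equally during training. Because fisheriris is balanced, the sketch creates an artificial imbalance by keeping only 10 virginica rows. That subsampling is a hypothetical illustration, not a real dataset property.

```matlab
load fisheriris
y = categorical(species);
% Hypothetical imbalance: keep only the first 10 virginica rows
keep = y ~= 'virginica' | cumsum(y == 'virginica') <= 10;
X = meas(keep, :);
y = y(keep);

rng(7);
cv = cvpartition(y, 'HoldOut', 0.3);
Xtrain = X(training(cv), :);  ytrain = y(training(cv));
Xtest  = X(test(cv), :);      ytest  = y(test(cv));

models = {fitctree(Xtrain, ytrain), ...
          fitctree(Xtrain, ytrain, 'Prior', 'uniform')};
labels = {'empirical prior', 'uniform prior'};

for m = 1:numel(models)
    pred = predict(models{m}, Xtest);
    [C, order] = confusionmat(ytest, pred);   % rows: true class, columns: predicted class
    recall = diag(C) ./ sum(C, 2);            % fraction of each true class recovered
    fprintf('%s:\n', labels{m});
    for k = 1:numel(order)
        fprintf('  %-12s recall = %.4f\n', char(order(k)), recall(k));
    end
end
```

With so few minority test rows, recall estimates are noisy. On real data, compare the two settings with cross-validation rather than a single small holdout.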

Pruning and Model Alternatives

A fully grown tree can overfit badly. MATLAB can prune a trained classification tree back to a smaller subtree with the prune method. In practice, you should also ask whether one tree is enough at all.
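A minimal pruning sketch, assuming the fisheriris data as a stand-in for your training split: grow a tree, let cvloss cross-validate every subtree in its pruning sequence, then prune to the chosen level. 'TreeSize','se' selects the smallest subtree whose loss is within one standard error of the minimum. 'min' would pick the minimum-loss subtree instead.

```matlab
load fisheriris                 % example data; substitute your own training split
Xtrain = meas;
ytrain = categorical(species);

rng(7);                         % cvloss partitions the data randomly
fullTree = fitctree(Xtrain, ytrain);   % 'Prune' defaults to 'on', so the pruning sequence is stored

% Cross-validate all subtrees and return the selected pruning level
[~, ~, ~, bestLevel] = cvloss(fullTree, 'Subtrees', 'all', 'TreeSize', 'se', 'KFold', 5);

prunedTree = prune(fullTree, 'Level', bestLevel);
fprintf('Leaves before: %d, after: %d (pruning level %d)\n', ...
    sum(~fullTree.IsBranchNode), sum(~prunedTree.IsBranchNode), bestLevel);
```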

If a single tree stalls at mediocre accuracy, ensemble methods such as bagging and boosting often generalize noticeably better:

```matlab
mdl = fitcensemble(Xtrain, ytrain, 'Method', 'Bag');
pred = predict(mdl, Xtest);
acc = mean(pred == ytest);
fprintf('Bagged ensemble accuracy: %.4f\n', acc);
```

That does not mean a decision tree failed. It means the problem may need more stability than one tree can provide.
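To decide whether an ensemble is worth it, you can score a single tree, a bagged ensemble, and a boosted ensemble on the same cross-validation folds. The sketch assumes the fisheriris data and uses 'AdaBoostM2' because that data has three classes. For a two-class problem, 'AdaBoostM1' or 'LogitBoost' would be the usual choices.

```matlab
load fisheriris                 % example data; substitute your own training split
Xtrain = meas;
ytrain = categorical(species);

rng(7);
cvp = cvpartition(ytrain, 'KFold', 5);      % identical folds for every model

treeCV  = fitctree(Xtrain, ytrain, 'CVPartition', cvp);
bagCV   = fitcensemble(Xtrain, ytrain, 'Method', 'Bag', 'CVPartition', cvp);
boostCV = fitcensemble(Xtrain, ytrain, 'Method', 'AdaBoostM2', 'CVPartition', cvp);

names = {'single tree', 'bagged trees', 'AdaBoostM2'};
accs = 1 - [kfoldLoss(treeCV), kfoldLoss(bagCV), kfoldLoss(boostCV)];
for k = 1:numel(names)
    fprintf('%-13s CV accuracy = %.4f\n', names{k}, accs(k));
end
```

If the ensembles barely beat the single tree, the simpler and more interpretable tree may still be the better choice.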

Common Pitfalls

  • Tuning on the test set instead of using cross-validation on training data.
  • Chasing training accuracy while ignoring overfitting.
  • Reporting only accuracy when classes are imbalanced.
  • Assuming more splits always improve the real model.
  • Ignoring weak features and trying to fix everything with hyperparameters.

Summary

  • Build a reproducible baseline before tuning.
  • Adjust MinLeafSize and MaxNumSplits to control overfitting.
  • Use cross-validation to compare settings reliably.
  • Improve feature quality and inspect class balance.
  • If one tree is not enough, evaluate ensemble methods in MATLAB.
