An example using the Python bindings for the SVM library LIBSVM

Introduction

LIBSVM is one of the most widely used SVM (Support Vector Machine) libraries, with implementations for classification, regression, and distribution estimation (one-class SVM). You can call it from Python either through the libsvm package and its svmutil module, or through scikit-learn, whose SVC class wraps LIBSVM internally. This article demonstrates both: the libsvm package used directly, and the more common scikit-learn wrapper.

Installation

bash
# Option 1: Install the libsvm Python package
pip install libsvm
# (the LIBSVM maintainers also publish their bindings on PyPI as libsvm-official)

# Option 2: scikit-learn (uses LIBSVM internally for SVC)
pip install scikit-learn

Using LIBSVM Directly with svmutil

python
from libsvm.svmutil import svm_train, svm_predict, svm_problem, svm_parameter

# Training data: labels and features
y_train = [1, 1, 1, -1, -1, -1]  # Labels
x_train = [
    {1: 1, 2: 1},    # Feature 1=1, Feature 2=1 → label 1
    {1: 1, 2: 0},    # Feature 1=1, Feature 2=0 → label 1
    {1: 0, 2: 1},    # Feature 1=0, Feature 2=1 → label 1
    {1: -1, 2: -1},  # → label -1
    {1: -1, 2: 0},   # → label -1
    {1: 0, 2: -1},   # → label -1
]

# Create problem and parameters
problem = svm_problem(y_train, x_train)
param = svm_parameter('-t 2 -c 1 -q')  # RBF kernel, C=1, quiet mode

# Train the model
model = svm_train(problem, param)

# Predict on new data
y_test = [1, -1]  # True labels (for accuracy calculation)
x_test = [{1: 0.5, 2: 0.5}, {1: -0.5, 2: -0.5}]

predicted_labels, accuracy, decision_values = svm_predict(y_test, x_test, model)
print(f"Predicted: {predicted_labels}")  # [1.0, -1.0]
print(f"Accuracy: {accuracy[0]:.1f}%")  # 100.0%
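
The dictionary features above mirror LIBSVM's sparse text format, in which each line reads `<label> <index>:<value> ...`. As a minimal sketch of that mapping, the helper below (`parse_libsvm_line` is a hypothetical name used for illustration, not part of the bindings) turns one such line into a label and a feature dictionary. The bindings ship their own reader, `svm_read_problem`, which is the better choice for real files.

```python
def parse_libsvm_line(line):
    """Parse one '<label> <index>:<value> ...' line into (label, features)."""
    tokens = line.split()
    label = float(tokens[0])
    features = {}
    for token in tokens[1:]:
        index, value = token.split(":")
        features[int(index)] = float(value)  # indices are kept as written (1-based)
    return label, features


label, features = parse_libsvm_line("1 1:0.5 3:2")
print(label, features)  # 1.0 {1: 0.5, 3: 2.0}
```

Feature 2 is absent from the line, so it is absent from the dictionary; sparse formats treat missing indices as zero.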

LIBSVM Parameter Options

python
from libsvm.svmutil import svm_parameter

# Common parameters passed as a string
param = svm_parameter('-t 0 -c 10')   # Linear kernel, C=10
param = svm_parameter('-t 1 -d 3')    # Polynomial kernel, degree=3
param = svm_parameter('-t 2 -g 0.5')  # RBF kernel, gamma=0.5
param = svm_parameter('-t 3')         # Sigmoid kernel

# Parameter reference:
# -t kernel_type: 0=linear, 1=polynomial, 2=RBF (default), 3=sigmoid, 4=precomputed
# -c cost: regularization parameter C (default 1)
# -g gamma: kernel coefficient for RBF/poly/sigmoid (default 1/num_features)
# -d degree: polynomial kernel degree (default 3)
# -q: quiet mode (suppress training output)
# -v n: n-fold cross-validation
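
Because the options travel as a single string, it can be convenient to assemble that string from named arguments. The following is a small sketch of such a helper; `build_options` and its keyword names are assumptions made for this example, and only the flags themselves (`-t`, `-c`, `-g`, `-d`, `-v`, `-q`) come from LIBSVM.

```python
def build_options(kernel="rbf", cost=1.0, gamma=None, degree=None,
                  folds=None, quiet=True):
    """Assemble a LIBSVM option string from named arguments (illustrative helper)."""
    kernel_codes = {"linear": 0, "polynomial": 1, "rbf": 2, "sigmoid": 3}
    parts = [f"-t {kernel_codes[kernel]}", f"-c {cost}"]
    if gamma is not None:
        parts.append(f"-g {gamma}")
    if degree is not None:
        parts.append(f"-d {degree}")
    if folds is not None:
        parts.append(f"-v {folds}")
    if quiet:
        parts.append("-q")
    return " ".join(parts)


print(build_options())                                       # -t 2 -c 1.0 -q
print(build_options(kernel="linear", cost=10, quiet=False))  # -t 0 -c 10
print(build_options(gamma=0.5, folds=5))                     # -t 2 -c 1.0 -g 0.5 -v 5 -q
```

The resulting string can be passed wherever the examples in this article pass a literal, for instance as the argument to `svm_parameter`.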

Using List-Based Features

python
from libsvm.svmutil import svm_train, svm_predict

# Features can be lists instead of dictionaries
y = [1, 1, -1, -1]
x = [[2, 1], [1, 2], [-1, -2], [-2, -1]]

model = svm_train(y, x, '-t 0 -c 1 -q')  # Linear SVM

# Predict
p_labels, p_acc, p_vals = svm_predict([1, -1], [[1, 1], [-1, -1]], model)
print(p_labels)  # [1.0, -1.0]
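
List and dictionary features describe the same data. The sketch below shows the correspondence, assuming (as the 1-based dictionary examples in this article do) that the first list element is feature 1. `dense_to_sparse` is a hypothetical helper written for this illustration, not a function of the bindings.

```python
def dense_to_sparse(row):
    """Map a dense feature list to a 1-based {index: value} dict, dropping zeros."""
    return {index: value for index, value in enumerate(row, start=1) if value != 0}


rows = [[2, 1], [0, 2], [-1, 0]]
print([dense_to_sparse(row) for row in rows])
# [{1: 2, 2: 1}, {2: 2}, {1: -1}]
```

Dropping zeros is what makes the dictionary form attractive for wide, mostly empty feature vectors such as bag-of-words counts.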

Cross-Validation

python
from libsvm.svmutil import svm_train

y = [1, 1, 1, -1, -1, -1, 1, -1, 1, -1]
x = [[1,1],[1,0],[0,1],[-1,-1],[-1,0],[0,-1],[2,1],[-2,-1],[1,2],[-1,-2]]

# 5-fold cross-validation: with -v, svm_train returns the accuracy, not a model
accuracy = svm_train(y, x, '-t 2 -c 1 -v 5 -q')
print(f"Cross-validation accuracy: {accuracy:.1f}%")
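
To make the `-v 5` option more concrete, here is a plain-Python sketch of how n-fold cross-validation partitions the samples: every sample is held out exactly once, and the model is trained on the remaining folds each time. This is a simplified illustration, not LIBSVM's implementation (which, as far as I know, also balances classes across folds); `k_fold_indices` is a hypothetical helper.

```python
import random


def k_fold_indices(n_samples, n_folds, seed=0):
    """Yield (train_indices, held_out_indices) pairs for simple k-fold splitting."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into n_folds groups of near-equal size
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        yield train, held_out


splits = list(k_fold_indices(n_samples=10, n_folds=5))
for train, held_out in splits:
    print(f"train on {len(train)} samples, validate on {len(held_out)}")
```

With 10 samples and 5 folds, each round trains on 8 samples and validates on 2, which is why cross-validated accuracy on very small datasets moves in coarse steps.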

Saving and Loading Models

python
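from libsvm.svmutil import svm_train, svm_predict, svm_save_model, svm_load_model

# Small example dataset (same values as the list-based example above)
y = [1, 1, -1, -1]
x = [[2, 1], [1, 2], [-1, -2], [-2, -1]]
y_test, x_test = [1, -1], [[1, 1], [-1, -1]]

# Train and save
model = svm_train(y, x, '-t 2 -c 1 -q')
svm_save_model('svm_model.model', model)

# Load and predict
loaded_model = svm_load_model('svm_model.model')
p_labels, p_acc, p_vals = svm_predict(y_test, x_test, loaded_model)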

Using scikit-learn's SVC

scikit-learn's SVC uses LIBSVM internally with a more Pythonic API:

python
from sklearn.svm import SVC
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load dataset
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train SVM (LIBSVM under the hood)
clf = SVC(kernel='rbf', C=1.0, gamma='scale')
clf.fit(X_train, y_train)

# Predict
y_pred = clf.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred):.2f}")  # Fraction of test samples classified correctly

# Inspect the fitted support vectors
print(f"Support vectors: {clf.n_support_}")    # Number per class
print(f"Total SVs: {len(clf.support_vectors_)}")

Grid Search for Best Parameters

python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Reuses X_train and y_train from the SVC example above.
# gamma has no effect on the linear kernel, so each kernel gets its own grid.
param_grid = [
    {'kernel': ['linear'], 'C': [0.1, 1, 10, 100]},
    {'kernel': ['rbf'], 'C': [0.1, 1, 10, 100], 'gamma': ['scale', 'auto', 0.01, 0.1]},
]

grid = GridSearchCV(SVC(), param_grid, cv=5, scoring='accuracy')
grid.fit(X_train, y_train)

print(f"Best params: {grid.best_params_}")
print(f"Best CV accuracy: {grid.best_score_:.3f}")
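
When the small grid above is not enough, a common approach is to search C and gamma on an exponential scale. The sketch below generates such a grid in plain Python; the specific ranges (C from 2^-5 to 2^15, gamma from 2^-15 to 2^3) follow the coarse grid suggested in the LIBSVM authors' practical guide, and they are a starting point rather than a requirement.

```python
# Exponentially spaced candidates: each step multiplies the value by 4
c_values = [2.0 ** exponent for exponent in range(-5, 16, 2)]      # 2^-5 ... 2^15
gamma_values = [2.0 ** exponent for exponent in range(-15, 4, 2)]  # 2^-15 ... 2^3

# Same dictionary shape that GridSearchCV accepts as param_grid
param_grid = {"kernel": ["rbf"], "C": c_values, "gamma": gamma_values}

print(len(c_values), len(gamma_values))    # 11 10
print(len(c_values) * len(gamma_values))   # 110 combinations per CV round
```

With 5-fold cross-validation this grid means 550 model fits, so it is usually worth running the coarse search first and then refining around the best region.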

Multi-Class Classification

LIBSVM handles multi-class problems automatically using one-vs-one:

python
from libsvm.svmutil import svm_train, svm_predict

# 3-class problem
y = [0, 0, 1, 1, 2, 2]
x = [[1,0], [0,1], [-1,0], [0,-1], [1,1], [-1,-1]]

model = svm_train(y, x, '-t 2 -c 1 -q')
p_labels, _, _ = svm_predict([0, 1, 2], [[0.5, 0.5], [-0.5, -0.5], [0, 0]], model)
print(p_labels)  # Predicted class labels
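
One-vs-one means that a binary classifier is trained for every pair of classes, k(k-1)/2 in total, and the class that wins the most pairwise contests is predicted. The following sketch illustrates only the voting step. The "classifier" here is a hypothetical nearest-centre rule standing in for trained binary SVMs, and the tie-breaking is a simplification.

```python
from collections import Counter
from itertools import combinations


def one_vs_one_predict(classes, pairwise_winner, sample):
    """Run every pairwise contest and return the class with the most votes."""
    votes = Counter(pairwise_winner(a, b, sample) for a, b in combinations(classes, 2))
    # On a tie, max() keeps the class that appears first in `classes`
    return max(classes, key=lambda c: votes[c])


# Hypothetical stand-in for trained binary SVMs: the nearer class centre wins
centres = {0: (1.0, 0.0), 1: (-1.0, 0.0), 2: (0.0, 1.0)}


def nearest_centre(a, b, sample):
    def squared_distance(c):
        return sum((s - m) ** 2 for s, m in zip(sample, centres[c]))
    return a if squared_distance(a) <= squared_distance(b) else b


classes = sorted(centres)
print(len(list(combinations(classes, 2))))                      # 3 pairwise contests
print(one_vs_one_predict(classes, nearest_centre, (0.9, 0.1)))  # 0
```

The number of pairwise models grows quadratically with the number of classes, which is worth keeping in mind for problems with many labels.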

Common Pitfalls

  • Not scaling features: SVMs are sensitive to feature scales because features with large ranges dominate the kernel's distance calculation. Normalize or standardize features before training, using sklearn.preprocessing.StandardScaler or LIBSVM's svm-scale command-line tool, and apply the scaling fitted on the training data to the test data as well.
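  • Using dictionary features incorrectly: LIBSVM's dictionary format uses 1-based feature indices ({1: val, 2: val}), not 0-based. Index 0 is reserved for the sample serial number when a precomputed kernel (-t 4) is used, so number ordinary features from 1 and use the same numbering in training and prediction data.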
  • Choosing the wrong kernel: Linear kernels often work well for high-dimensional, sparse data such as text classification. RBF kernels are a common choice when the number of features is small relative to the number of samples. Starting with RBF and tuning C and gamma via grid search is a reasonable default strategy.
  • Not setting -q for quiet mode: By default, LIBSVM prints optimization progress and a training summary to stdout. In notebooks or production logs this output adds up quickly, especially during cross-validation or grid search. Pass -q to suppress training messages.
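  • Ignoring gamma with RBF kernel: LIBSVM's default gamma (1/num_features) may not be optimal, and scikit-learn's SVC uses a different default (gamma='scale', which also divides by the feature variance). Too small a gamma makes the model underfit; too large a gamma makes it overfit. Tune gamma alongside C using cross-validation.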
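
The scaling and gamma pitfalls both follow from the RBF kernel formula, K(x, z) = exp(-gamma * ||x - z||^2). The short sketch below evaluates that formula in plain Python to show the effect; the sample values and the divide-by-1000 rescaling are made up for illustration.

```python
import math


def rbf_kernel(x, z, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


# Feature 1 lives in [0, 1]; feature 2 is in the thousands and dominates the distance
a, b = (0.2, 1000.0), (0.9, 1100.0)
print(rbf_kernel(a, b, gamma=0.5))  # 0.0: the samples look completely dissimilar

# After bringing feature 2 to a comparable range, both features contribute
a_scaled, b_scaled = (0.2, 1.0), (0.9, 1.1)
for gamma in (0.01, 0.5, 50):
    print(gamma, round(rbf_kernel(a_scaled, b_scaled, gamma), 4))
```

On the scaled data, a very small gamma pushes every kernel value toward 1 (all points look alike, which leads to underfitting), while a very large gamma pushes them toward 0 (every point is an island, which leads to overfitting).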

Summary

  • Install libsvm for direct LIBSVM Python bindings or use scikit-learn's SVC (recommended)
  • Direct API: svm_train(labels, features, '-t 2 -c 1 -q') with dictionary or list features
  • Key parameters: -t (kernel type), -c (regularization), -g (gamma), -d (degree)
  • Use -v n for n-fold cross-validation, svm_save_model/svm_load_model for persistence
  • scikit-learn's SVC provides a cleaner API with the same LIBSVM backend
  • Scale features and tune C and gamma with cross-validated grid search before judging a model
