An example using Python bindings for the SVM library LIBSVM

Introduction

LIBSVM is one of the most widely used Support Vector Machine (SVM) libraries, with implementations for classification, regression, and distribution estimation (one-class SVM). You can train and predict with it from Python either through the libsvm package, which exposes the svmutil module, or through scikit-learn, whose SVC class wraps LIBSVM internally. This article covers both: the direct svmutil interface first, then the more common scikit-learn wrapper.

Installation

bash

# Option 1: Install the libsvm Python package
# (the LIBSVM maintainers also publish their bindings on PyPI as libsvm-official)
pip install libsvm

# Option 2: scikit-learn (uses LIBSVM internally for SVC)
pip install scikit-learn

Using LIBSVM Directly with `svmutil`

python

from libsvm.svmutil import svm_train, svm_predict, svm_problem, svm_parameter

# Training data: labels and features
y_train = [1, 1, 1, -1, -1, -1]  # Labels
x_train = [
    {1: 1, 2: 1},    # Feature 1=1, Feature 2=1 → label 1
    {1: 1, 2: 0},    # Feature 1=1, Feature 2=0 → label 1
    {1: 0, 2: 1},    # Feature 1=0, Feature 2=1 → label 1
    {1: -1, 2: -1},  # → label -1
    {1: -1, 2: 0},   # → label -1
    {1: 0, 2: -1},   # → label -1
]

# Create problem and parameters
problem = svm_problem(y_train, x_train)
param = svm_parameter('-t 2 -c 1 -q')  # RBF kernel, C=1, quiet mode

# Train the model
model = svm_train(problem, param)

# Predict on new data
y_test = [1, -1]  # True labels (for accuracy calculation)
x_test = [{1: 0.5, 2: 0.5}, {1: -0.5, 2: -0.5}]

# svm_predict returns (labels, (accuracy, MSE, squared correlation), decision values)
predicted_labels, accuracy, decision_values = svm_predict(y_test, x_test, model)
print(f"Predicted: {predicted_labels}")  # [1.0, -1.0]
print(f"Accuracy: {accuracy[0]:.1f}%")  # 100.0%

LIBSVM Parameter Options

python

from libsvm.svmutil import svm_parameter

# Common parameters passed as a string
param = svm_parameter('-t 0 -c 10')   # Linear kernel, C=10
param = svm_parameter('-t 1 -d 3')    # Polynomial kernel, degree=3
param = svm_parameter('-t 2 -g 0.5')  # RBF kernel, gamma=0.5
param = svm_parameter('-t 3')         # Sigmoid kernel

# Parameter reference:
# -t kernel_type: 0=linear, 1=polynomial, 2=RBF (default), 3=sigmoid, 4=precomputed
# -c cost: regularization parameter (default 1)
# -g gamma: kernel coefficient for RBF/poly/sigmoid (default 1/num_features)
# -d degree: polynomial kernel degree (default 3)
# -q: quiet mode (suppress training output)
# -v n: n-fold cross-validation
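
The -t, -g, and -d options select and shape the kernel function. As a rough illustration, the four kernels can be written out in plain Python following the formulas given in the LIBSVM README. The function names below are illustrative only and are not part of the libsvm API; coef0 corresponds to LIBSVM's -r option (default 0).

```python
import math

def dot(u, v):
    """Inner product of two equal-length dense vectors."""
    return sum(a * b for a, b in zip(u, v))

def linear_kernel(u, v):
    # -t 0: u'v
    return dot(u, v)

def polynomial_kernel(u, v, gamma, coef0=0.0, degree=3):
    # -t 1: (gamma * u'v + coef0) ** degree
    return (gamma * dot(u, v) + coef0) ** degree

def rbf_kernel(u, v, gamma):
    # -t 2: exp(-gamma * ||u - v||^2)
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)

def sigmoid_kernel(u, v, gamma, coef0=0.0):
    # -t 3: tanh(gamma * u'v + coef0)
    return math.tanh(gamma * dot(u, v) + coef0)

u, v = [1.0, 0.0], [0.0, 1.0]
default_gamma = 1.0 / len(u)  # LIBSVM's default: 1/num_features

# A larger gamma makes the RBF similarity fall off faster with distance
for gamma in (0.1, default_gamma, 5.0):
    print(f"gamma={gamma}: rbf={rbf_kernel(u, v, gamma):.4f}")
```

Printing the RBF value for several gamma settings shows why gamma controls how local the model is: with a large gamma, only very close points count as similar.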

Using List-Based Features

python

from libsvm.svmutil import svm_train, svm_predict

# Features can be lists instead of dictionaries
y = [1, 1, -1, -1]
x = [[2, 1], [1, 2], [-1, -2], [-2, -1]]

model = svm_train(y, x, '-t 0 -c 1 -q')  # Linear SVM

# Predict
p_labels, p_acc, p_vals = svm_predict([1, -1], [[1, 1], [-1, -1]], model)
print(p_labels)  # [1.0, -1.0]
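
The list and dictionary forms describe the same vectors: position i in a list corresponds to feature index i + 1 in a dictionary, and zero-valued features can simply be left out of the dictionary. A minimal sketch of that mapping, using hypothetical helper names that are not part of svmutil:

```python
def dense_to_sparse(row):
    """Map a dense list to a 1-based {index: value} dict, dropping zeros."""
    return {i: v for i, v in enumerate(row, start=1) if v != 0}

def sparse_to_dense(features, num_features):
    """Expand a 1-based {index: value} dict back into a dense list."""
    return [features.get(i, 0) for i in range(1, num_features + 1)]

x_dense = [[2, 1], [1, 0], [0, -2]]
x_sparse = [dense_to_sparse(row) for row in x_dense]
print(x_sparse)  # [{1: 2, 2: 1}, {1: 1}, {2: -2}]

# Round trip back to the dense form
print([sparse_to_dense(f, 2) for f in x_sparse])  # [[2, 1], [1, 0], [0, -2]]
```

The dictionary form is mainly useful for sparse data, where most features are zero and storing them explicitly would waste memory.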

Cross-Validation

python

from libsvm.svmutil import svm_train

y = [1, 1, 1, -1, -1, -1, 1, -1, 1, -1]
x = [[1,1],[1,0],[0,1],[-1,-1],[-1,0],[0,-1],[2,1],[-2,-1],[1,2],[-1,-2]]

# 5-fold cross-validation: with -v, svm_train returns the accuracy, not a model
accuracy = svm_train(y, x, '-t 2 -c 1 -v 5 -q')
print(f"Cross-validation accuracy: {accuracy:.1f}%")
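
Because -v returns a single accuracy figure, it is a convenient score for parameter search. The sketch below only builds candidate option strings and does not call LIBSVM. The exponent ranges follow the exponentially growing sequences suggested in the LIBSVM practical guide (C from 2^-5 to 2^15, gamma from 2^-15 to 2^3), and the helper name cv_option_grid is made up for illustration.

```python
from itertools import product

def cv_option_grid(folds=5):
    """Yield (C, gamma, option string) for an RBF-kernel cross-validation grid."""
    c_values = [2.0 ** e for e in range(-5, 16, 2)]       # 2^-5, 2^-3, ..., 2^15
    gamma_values = [2.0 ** e for e in range(-15, 4, 2)]   # 2^-15, 2^-13, ..., 2^3
    for c, g in product(c_values, gamma_values):
        yield c, g, f"-t 2 -c {c} -g {g} -v {folds} -q"

grid = list(cv_option_grid())
print(len(grid))   # 110 candidate settings
print(grid[0][2])  # first option string in the grid
```

Each option string can be passed as the third argument of svm_train(y, x, options). Keep the (C, gamma) pair with the highest returned accuracy, then retrain once without -v to obtain an actual model.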

Saving and Loading Models

python

from libsvm.svmutil import svm_train, svm_predict, svm_save_model, svm_load_model

# Small training and test sets so the snippet runs on its own
y = [1, 1, -1, -1]
x = [[2, 1], [1, 2], [-1, -2], [-2, -1]]
y_test = [1, -1]
x_test = [[1, 1], [-1, -1]]

# Train and save
model = svm_train(y, x, '-t 2 -c 1 -q')
svm_save_model('svm_model.model', model)

# Load and predict
loaded_model = svm_load_model('svm_model.model')
p_labels, p_acc, p_vals = svm_predict(y_test, x_test, loaded_model)

scikit-learn Wrapper (Recommended for Most Users)

scikit-learn's SVC uses LIBSVM internally with a more Pythonic API:

python

from sklearn.svm import SVC
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load dataset
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train SVM (LIBSVM under the hood)
clf = SVC(kernel='rbf', C=1.0, gamma='scale')
clf.fit(X_train, y_train)

# Predict
y_pred = clf.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred):.2f}")

# Access LIBSVM internals
print(f"Support vectors: {clf.n_support_}")    # Number per class
print(f"Total SVs: {len(clf.support_vectors_)}")

Grid Search for Best Parameters

python

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Reuses X_train and y_train from the previous example
param_grid = {
    'C': [0.1, 1, 10, 100],
    'gamma': ['scale', 'auto', 0.01, 0.1],  # ignored when kernel='linear'
    'kernel': ['rbf', 'linear']
}

grid = GridSearchCV(SVC(), param_grid, cv=5, scoring='accuracy')
grid.fit(X_train, y_train)

print(f"Best params: {grid.best_params_}")
print(f"Best CV accuracy: {grid.best_score_:.3f}")
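
Since SVMs are sensitive to feature scale, one common variation is to put the scaler and the classifier in a single Pipeline, so that each cross-validation fold is scaled using only its own training portion. The following is a sketch of that idea rather than a tuned setup; the step names "scale" and "svc" and the grid values are arbitrary choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Scaling happens inside the pipeline, so it is refit on every CV training fold
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("svc", SVC(kernel="rbf")),
])

# Pipeline parameters are addressed as <step name>__<parameter name>
param_grid = {
    "svc__C": [0.1, 1, 10, 100],
    "svc__gamma": ["scale", 0.01, 0.1],
}

grid = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
grid.fit(X_train, y_train)

# Score the refitted best pipeline on data it has not seen
test_accuracy = grid.score(X_test, y_test)
print(grid.best_params_)
print(f"Held-out accuracy: {test_accuracy:.2f}")
```

Scoring on the held-out split gives a less optimistic estimate than best_score_, which was used to pick the parameters.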

Multi-Class Classification

LIBSVM handles multi-class problems automatically using one-vs-one:

python

from libsvm.svmutil import svm_train, svm_predict

# 3-class problem
y = [0, 0, 1, 1, 2, 2]
x = [[1,0], [0,1], [-1,0], [0,-1], [1,1], [-1,-1]]

model = svm_train(y, x, '-t 2 -c 1 -q')
p_labels, _, _ = svm_predict([0, 1, 2], [[0.5, 0.5], [-0.5, -0.5], [0, 0]], model)
print(p_labels)  # Predicted class labels
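
One-vs-one means training one binary classifier for every pair of classes, k(k-1)/2 in total for k classes, and predicting the class that wins the most pairwise contests. The sketch below illustrates only the bookkeeping. The pairwise winners are hypothetical values rather than output from LIBSVM, and breaking ties in favour of the smallest label is a choice made for this example.

```python
from collections import Counter
from itertools import combinations

def one_vs_one_pairs(labels):
    """List every unordered pair of distinct class labels."""
    return list(combinations(sorted(set(labels)), 2))

def vote(pairwise_winners):
    """Return the label with the most pairwise wins (ties go to the smallest label)."""
    counts = Counter(pairwise_winners.values())
    best = max(counts.values())
    return min(label for label, n in counts.items() if n == best)

pairs = one_vs_one_pairs([0, 0, 1, 1, 2, 2])
print(pairs)  # [(0, 1), (0, 2), (1, 2)]

# Hypothetical outcome of each binary classifier for a single test point
winners = {(0, 1): 1, (0, 2): 2, (1, 2): 1}
print(vote(winners))  # 1
```

The pair count grows quadratically with the number of classes: 3 classes need 3 binary models, while 10 classes need 45.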

Common Pitfalls

Not scaling features: SVM performance is sensitive to feature scales, because features with large ranges dominate the kernel computation. Normalize or standardize features before training, and apply the same transformation to the test data. Use sklearn.preprocessing.StandardScaler or the svm-scale command-line tool that ships with LIBSVM.
Using dictionary features incorrectly: LIBSVM's dictionary format uses 1-based feature indices ({1: val, 2: val}), not 0-based ones. Index 0 is reserved for the sample serial number in precomputed-kernel input, so do not use it as an ordinary feature index.
Choosing the wrong kernel: Linear kernels tend to work well when the number of features is large, as in text classification. RBF kernels are a common first choice for lower-dimensional data. Starting with RBF and tuning C and gamma via grid search is a reasonable default strategy.
Not setting -q for quiet mode: By default, LIBSVM prints optimization progress and a training summary to stdout, and svm_predict prints an accuracy line. In notebooks or scripts that train many models, this floods the output. Pass -q to svm_train, and as the options argument of svm_predict, to suppress these messages.
Ignoring gamma with the RBF kernel: LIBSVM's default gamma (1/num_features) may not be optimal. Too small a gamma makes the model underfit; too large a gamma makes it overfit. Tune gamma alongside C using cross-validation.
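
The scaling advice in the first pitfall can be sketched in plain Python. This example rescales each feature linearly to [-1, 1], which is the default range of the svm-scale tool, and reuses the ranges learned from the training data on the test data. The helper names fit_ranges and scale_rows are hypothetical, and mapping a constant feature to 0.0 is a choice made for this sketch.

```python
def fit_ranges(rows):
    """Record the (min, max) of every feature column in the training data."""
    columns = list(zip(*rows))
    return [(min(col), max(col)) for col in columns]

def scale_rows(rows, ranges, lower=-1.0, upper=1.0):
    """Linearly map each feature into [lower, upper] using precomputed ranges."""
    scaled = []
    for row in rows:
        new_row = []
        for value, (lo, hi) in zip(row, ranges):
            if hi == lo:
                new_row.append(0.0)  # constant feature: no spread to scale
            else:
                new_row.append(lower + (upper - lower) * (value - lo) / (hi - lo))
        scaled.append(new_row)
    return scaled

# Two features on very different scales
x_train = [[1000.0, 0.1], [2000.0, 0.3], [3000.0, 0.5]]
x_test = [[2500.0, 0.2]]

ranges = fit_ranges(x_train)           # learned from training data only
print(scale_rows(x_train, ranges))
print(scale_rows(x_test, ranges))      # test data reuses the training ranges
```

Computing the ranges separately on the test set would map the same raw value to different scaled values at training and prediction time, so the ranges should be stored and reused alongside the model.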

Summary

Install libsvm for direct LIBSVM Python bindings or use scikit-learn's SVC (recommended)
Direct call: svm_train(labels, features, '-t 2 -c 1 -q'), with features given as dictionaries or lists
Key parameters: -t (kernel type), -c (regularization), -g (gamma), -d (degree)
Use -v n for n-fold cross-validation, svm_save_model/svm_load_model for persistence
scikit-learn's SVC provides a cleaner API with the same LIBSVM backend
Always scale features and tune C/gamma via grid search for optimal results
