Introduction
LIBSVM is one of the most widely used SVM (Support Vector Machine) libraries, with implementations for classification, regression, and distribution estimation. You can use it from Python in two ways: directly through the libsvm package and its svmutil module, or indirectly through scikit-learn, whose SVC class wraps LIBSVM internally. This article covers both: first the direct svmutil interface, then the more common scikit-learn wrapper.
Installation
# Option 1: Install the LIBSVM Python bindings
# (the LIBSVM authors publish their own build on PyPI as libsvm-official)
pip install libsvm

# Option 2: scikit-learn (uses LIBSVM internally for SVC)
pip install scikit-learn
Using LIBSVM Directly with svmutil
from libsvm.svmutil import svm_train, svm_predict, svm_problem, svm_parameter

# Training data: labels and features
y_train = [1, 1, 1, -1, -1, -1]  # Labels
x_train = [
    {1: 1, 2: 1},    # Feature 1=1, Feature 2=1 → label 1
    {1: 1, 2: 0},    # Feature 1=1, Feature 2=0 → label 1
    {1: 0, 2: 1},    # Feature 1=0, Feature 2=1 → label 1
    {1: -1, 2: -1},  # → label -1
    {1: -1, 2: 0},   # → label -1
    {1: 0, 2: -1},   # → label -1
]

# Create problem and parameters
problem = svm_problem(y_train, x_train)
param = svm_parameter('-t 2 -c 1 -q')  # RBF kernel, C=1, quiet mode

# Train the model
model = svm_train(problem, param)

# Predict on new data
y_test = [1, -1]  # True labels (for accuracy calculation)
x_test = [{1: 0.5, 2: 0.5}, {1: -0.5, 2: -0.5}]

predicted_labels, accuracy, decision_values = svm_predict(y_test, x_test, model)
print(f"Predicted: {predicted_labels}")  # [1.0, -1.0]
print(f"Accuracy: {accuracy[0]:.1f}%")   # 100.0%
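The second value returned by svm_predict is a tuple, which is why the example indexes accuracy[0]. According to the LIBSVM Python README, it holds the classification accuracy, the mean squared error, and the squared correlation coefficient. The pure-Python sketch below recomputes the first two from a list of predictions, as a way to see what those numbers mean. The helper name summarize_predictions is hypothetical and not part of libsvm.

```python
def summarize_predictions(true_labels, predicted_labels):
    """Recompute accuracy (in percent) and mean squared error.

    Hypothetical helper for illustration; it is not part of libsvm.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("at least one label is required")

    n = len(true_labels)
    pairs = list(zip(true_labels, predicted_labels))
    correct = sum(1 for t, p in pairs if t == p)
    squared_error = sum((p - t) ** 2 for t, p in pairs)
    return 100.0 * correct / n, squared_error / n


# Three of four predictions match, and the one miss is off by 2.
acc, mse = summarize_predictions([1, -1, 1, -1], [1.0, -1.0, -1.0, -1.0])
print(f"Accuracy: {acc:.1f}%  MSE: {mse:.2f}")  # Accuracy: 75.0%  MSE: 1.00
```

The mean squared error is mostly useful for regression models; for classification, the first element is usually the one of interest.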
LIBSVM Parameter Options
from libsvm.svmutil import svm_parameter

# Common parameters passed as a string
param = svm_parameter('-t 0 -c 10')   # Linear kernel, C=10
param = svm_parameter('-t 1 -d 3')    # Polynomial kernel, degree=3
param = svm_parameter('-t 2 -g 0.5')  # RBF kernel, gamma=0.5
param = svm_parameter('-t 3')         # Sigmoid kernel

# Parameter reference:
# -t kernel_type: 0=linear, 1=polynomial, 2=RBF, 3=sigmoid
# -c cost: regularization parameter (default 1)
# -g gamma: kernel coefficient for RBF/poly/sigmoid (default 1/num_features)
# -d degree: polynomial kernel degree (default 3)
# -r coef0: constant term in polynomial/sigmoid kernels (default 0)
# -q: quiet mode (suppress training output)
# -v n: n-fold cross-validation
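To make the -t, -g, and -d options more concrete, the sketch below writes out the four kernel functions in plain Python, following the formulas given in LIBSVM's usage text. It is an illustration only: LIBSVM computes these in C on sparse vectors, and the function names here are hypothetical. The coef0 argument corresponds to LIBSVM's -r option.

```python
import math


def dot(u, v):
    """Inner product of two equal-length dense vectors."""
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(u, v):
    # -t 0: u'v
    return dot(u, v)


def polynomial_kernel(u, v, gamma, coef0=0.0, degree=3):
    # -t 1: (gamma * u'v + coef0) ^ degree
    return (gamma * dot(u, v) + coef0) ** degree


def rbf_kernel(u, v, gamma):
    # -t 2: exp(-gamma * |u - v|^2)
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)


def sigmoid_kernel(u, v, gamma, coef0=0.0):
    # -t 3: tanh(gamma * u'v + coef0)
    return math.tanh(gamma * dot(u, v) + coef0)


u, v = [1.0, 0.0], [0.0, 1.0]
gamma = 1.0 / len(u)  # LIBSVM's default: 1 / num_features

print(linear_kernel(u, v))
print(polynomial_kernel(u, v, gamma, coef0=1.0))
print(sigmoid_kernel(u, v, gamma))

# The RBF value shrinks toward 0 as gamma grows, so distinct points look
# less and less similar to each other.
for g in (0.01, 0.5, 100.0):
    print(g, rbf_kernel(u, v, g))
```

With a very large gamma, each training point is similar only to itself, which is the intuition behind overfitting at high gamma values.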
Using List-Based Features
from libsvm.svmutil import svm_train, svm_predict

# Features can be lists instead of dictionaries
y = [1, 1, -1, -1]
x = [[2, 1], [1, 2], [-1, -2], [-2, -1]]

model = svm_train(y, x, '-t 0 -c 1 -q')  # Linear SVM

# Predict
p_labels, p_acc, p_vals = svm_predict([1, -1], [[1, 1], [-1, -1]], model)
print(p_labels)  # [1.0, -1.0]
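The list and dictionary forms describe the same data: position 0 of a list corresponds to feature index 1 of a dictionary. The following sketch converts a dense list into the sparse dictionary form, which can be handy when most feature values are zero. The helper dense_to_libsvm_dict is a hypothetical name used for illustration and is not part of libsvm.

```python
def dense_to_libsvm_dict(values):
    """Convert a dense feature list into a sparse dict with 1-based keys.

    Hypothetical helper, not part of libsvm. Zero-valued features are
    dropped because the sparse format treats missing indices as 0.
    """
    return {index: value
            for index, value in enumerate(values, start=1)
            if value != 0}


x_dense = [[2, 1], [1, 0], [0, -1]]
x_sparse = [dense_to_libsvm_dict(row) for row in x_dense]
print(x_sparse)  # [{1: 2, 2: 1}, {1: 1}, {2: -1}]
```

Using enumerate with start=1 keeps the off-by-one conversion in a single place instead of scattering index arithmetic through the code.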
Cross-Validation
from libsvm.svmutil import svm_train

y = [1, 1, 1, -1, -1, -1, 1, -1, 1, -1]
x = [[1, 1], [1, 0], [0, 1], [-1, -1], [-1, 0], [0, -1], [2, 1], [-2, -1], [1, 2], [-1, -2]]

# 5-fold cross-validation: with -v, svm_train returns the accuracy
# instead of a model
accuracy = svm_train(y, x, '-t 2 -c 1 -v 5 -q')
print(f"Cross-validation accuracy: {accuracy:.1f}%")
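Because svm_train with -v returns a single score, it can drive a simple parameter search. The sketch below loops over a grid of C and gamma values, builds the option string for each pair, and keeps the best result. The grid_search helper and the fake_evaluate stand-in are hypothetical; the stand-in exists only so the example runs without libsvm installed, and its scores are made up rather than real cross-validation results.

```python
from itertools import product


def grid_search(evaluate, c_values, gamma_values, folds=5):
    """Return (best_score, best_options) over a C/gamma grid.

    `evaluate` is any callable that takes a LIBSVM option string and
    returns a cross-validation score where higher is better.
    Hypothetical helper, not part of libsvm.
    """
    best_score, best_options = None, None
    for c, gamma in product(c_values, gamma_values):
        options = f"-t 2 -c {c} -g {gamma} -v {folds} -q"
        score = evaluate(options)
        if best_score is None or score > best_score:
            best_score, best_options = score, options
    return best_score, best_options


# Stand-in evaluator with an artificial peak at C=10, gamma=0.1.
# With libsvm available, the real evaluator would be something like:
#     evaluate = lambda options: svm_train(y, x, options)
def fake_evaluate(options):
    parts = options.split()
    c = float(parts[parts.index("-c") + 1])
    gamma = float(parts[parts.index("-g") + 1])
    return 100.0 - abs(c - 10.0) - 10.0 * abs(gamma - 0.1)


score, options = grid_search(fake_evaluate, [0.1, 1, 10, 100], [0.01, 0.1, 1])
print(score, options)
```

Passing the evaluator in as a callable keeps the search loop independent of libsvm, so the same loop can be reused with a different scoring function.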
Saving and Loading Models
from libsvm.svmutil import svm_train, svm_predict, svm_save_model, svm_load_model

# Small example dataset so the snippet runs on its own
y = [1, 1, -1, -1]
x = [[2, 1], [1, 2], [-1, -2], [-2, -1]]
y_test = [1, -1]
x_test = [[1, 1], [-1, -1]]

# Train and save
model = svm_train(y, x, '-t 2 -c 1 -q')
svm_save_model('svm_model.model', model)

# Load and predict
loaded_model = svm_load_model('svm_model.model')
p_labels, p_acc, p_vals = svm_predict(y_test, x_test, loaded_model)
scikit-learn Wrapper (Recommended for Most Users)
scikit-learn's SVC uses LIBSVM internally with a more Pythonic API:
from sklearn.svm import SVC
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load dataset
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train SVM (LIBSVM under the hood)
clf = SVC(kernel='rbf', C=1.0, gamma='scale')
clf.fit(X_train, y_train)

# Predict
y_pred = clf.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred):.2f}")  # High on iris; exact value depends on the split

# Inspect the fitted support vectors
print(f"Support vectors: {clf.n_support_}")  # Number per class
print(f"Total SVs: {len(clf.support_vectors_)}")
Grid Search for Best Parameters
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Reuses X_train and y_train from the previous example.
# Note: gamma has no effect when kernel='linear'.
param_grid = {
    'C': [0.1, 1, 10, 100],
    'gamma': ['scale', 'auto', 0.01, 0.1],
    'kernel': ['rbf', 'linear']
}

grid = GridSearchCV(SVC(), param_grid, cv=5, scoring='accuracy')
grid.fit(X_train, y_train)

print(f"Best params: {grid.best_params_}")
print(f"Best CV accuracy: {grid.best_score_:.3f}")
Multi-Class Classification
LIBSVM handles multi-class problems automatically using one-vs-one:
from libsvm.svmutil import svm_train, svm_predict

# 3-class problem
y = [0, 0, 1, 1, 2, 2]
x = [[1, 0], [0, 1], [-1, 0], [0, -1], [1, 1], [-1, -1]]

model = svm_train(y, x, '-t 2 -c 1 -q')
p_labels, _, _ = svm_predict([0, 1, 2], [[0.5, 0.5], [-0.5, -0.5], [0, 0]], model)
print(p_labels)  # Predicted class labels
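One-vs-one means that a separate binary classifier is trained for every pair of classes, and the class that wins the most pairwise comparisons is predicted. The sketch below only illustrates the bookkeeping, listing the pairs and counting votes. It does not reproduce LIBSVM's internals, such as how ties are broken, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def one_vs_one_pairs(labels):
    """List every unordered pair of distinct class labels."""
    return list(combinations(sorted(set(labels)), 2))


def majority_vote(pairwise_winners):
    """Return the label that won the most pairwise comparisons."""
    return Counter(pairwise_winners).most_common(1)[0][0]


y = [0, 0, 1, 1, 2, 2]
pairs = one_vs_one_pairs(y)
print(pairs)       # [(0, 1), (0, 2), (1, 2)]
print(len(pairs))  # 3, i.e. k * (k - 1) / 2 for k = 3 classes

# Suppose the three binary classifiers picked these winners for one sample:
print(majority_vote([0, 2, 2]))  # 2
```

The number of pairwise classifiers grows quadratically with the number of classes, which is worth keeping in mind for problems with many labels.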
Common Pitfalls
Not scaling features: SVMs are sensitive to feature scale, because features with large numeric ranges dominate the kernel's distance computation. Normalize or standardize features before training, for example with sklearn.preprocessing.StandardScaler or LIBSVM's svm-scale command-line tool, and apply the scaling parameters learned from the training set to the test set.
Using dictionary features incorrectly: LIBSVM's dictionary format uses 1-based indexing ({1: val, 2: val}), not 0-based. Index 0 is reserved for the precomputed-kernel format, where it carries the sample's serial number. Keys that start at 0 may not raise an error, so the mistake is easy to miss.
Choosing the wrong kernel: Linear kernels tend to work well when the number of features is large, as in text classification. RBF kernels tend to work well on lower-dimensional data with non-linear class boundaries. Starting with RBF and tuning C and gamma via grid search is a reasonable default strategy.
Not setting -q for quiet mode: By default, LIBSVM prints optimizer progress and a training summary to stdout, and svm_predict prints the accuracy. In production code or notebooks this clutters the output. Pass -q in the options string to suppress these messages.
Ignoring gamma with RBF kernel: The default gamma (1/num_features) may not be optimal. A gamma that is too small makes the model underfit; one that is too large makes it overfit. Tune gamma alongside C using cross-validation.
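For the scaling pitfall above, the following is a minimal pure-Python sketch of min-max scaling to the range [-1, 1], which is the default target range of svm-scale. It is not svm-scale's implementation, and it uses min-max scaling rather than the standardization that StandardScaler performs. The function names are hypothetical, and mapping constant features to 0.0 is an assumption made for this example.

```python
def fit_ranges(rows):
    """Record the (min, max) of every feature column in the training rows."""
    columns = list(zip(*rows))
    return [(min(col), max(col)) for col in columns]


def scale_rows(rows, ranges, lower=-1.0, upper=1.0):
    """Linearly map each feature to [lower, upper] using the given ranges."""
    scaled = []
    for row in rows:
        new_row = []
        for value, (lo, hi) in zip(row, ranges):
            if hi == lo:
                # Assumption: a constant feature is mapped to 0.0
                new_row.append(0.0)
            else:
                new_row.append(lower + (upper - lower) * (value - lo) / (hi - lo))
        scaled.append(new_row)
    return scaled


# Two features on very different scales
x_train = [[1000.0, 0.1], [2000.0, 0.3], [3000.0, 0.2]]
x_test = [[2500.0, 0.2]]

ranges = fit_ranges(x_train)                 # learned from training data only
x_train_scaled = scale_rows(x_train, ranges)
x_test_scaled = scale_rows(x_test, ranges)   # reuse the training ranges

print(x_train_scaled)
print(x_test_scaled)
```

Reusing the training ranges for the test set matters: fitting new ranges on the test data would place the two sets in different coordinate systems.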
Summary
Install libsvm for direct LIBSVM Python bindings or use scikit-learn's SVC (recommended)
Direct usage: svm_train(labels, features, '-t 2 -c 1 -q') with dictionary or list features
Key parameters: -t (kernel type), -c (regularization), -g (gamma), -d (degree)
Use -v n for n-fold cross-validation, svm_save_model/svm_load_model for persistence
scikit-learn's SVC provides a cleaner API with the same LIBSVM backend
Always scale features and tune C/gamma via grid search for optimal results