
Learning Weka on the Command Line

Introduction

Weka (Waikato Environment for Knowledge Analysis) is a Java-based machine learning toolkit with a GUI and a powerful command-line interface. While the GUI is useful for exploration, the command line is essential for automation, scripting, reproducible experiments, and processing large datasets on servers without a display. Every Weka algorithm can be invoked from the command line with full parameter control.

Prerequisites

bash
# Check Java is installed (Weka requires Java 8+)
java -version

# Download Weka from https://www.cs.waikato.ac.nz/ml/weka/
# Extract to a directory, e.g., ~/weka-3-8-6/

# Set the classpath
export WEKAJAR=~/weka-3-8-6/weka.jar

# Test the installation
java -cp $WEKAJAR weka.core.Version
# 3.8.6

Loading and Inspecting Data

Weka uses ARFF (Attribute-Relation File Format) as its native format, but can read CSV files too:

bash
# View dataset summary
java -cp $WEKAJAR weka.core.Instances data/iris.arff

# Convert CSV to ARFF (the ARFF is written to standard output)
java -cp $WEKAJAR weka.core.converters.CSVLoader data.csv > data.arff

# Convert ARFF to CSV (CSVSaver reads the ARFF file directly, so no pipe is needed)
java -cp $WEKAJAR weka.core.converters.CSVSaver -i data.arff -o data.csv
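
When a whole directory of CSV files needs converting, the CSVLoader call can be wrapped in a loop. The following is a minimal sketch, not a Weka feature: the function name csv_to_arff_all is made up for this example, and it assumes WEKAJAR is set as in the Prerequisites section.

```shell
# csv_to_arff_all DIR: hypothetical helper (not part of Weka) that converts
# every DIR/*.csv into a DIR/*.arff file with the same base name.
# Assumes WEKAJAR holds the path to weka.jar.
csv_to_arff_all() {
  dir=$1
  for csv in "$dir"/*.csv; do
    [ -e "$csv" ] || continue          # the glob matched nothing
    arff="${csv%.csv}.arff"
    if java -cp "$WEKAJAR" weka.core.converters.CSVLoader "$csv" > "$arff"; then
      echo "converted: $csv -> $arff"
    else
      echo "failed: $csv" >&2
      rm -f "$arff"                    # do not leave a partial file behind
    fi
  done
}
```

Calling csv_to_arff_all data would then process every CSV file under data/ and report each conversion on its own line.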

ARFF File Format

 
@relation iris

@attribute sepallength numeric
@attribute sepalwidth numeric
@attribute petallength numeric
@attribute petalwidth numeric
@attribute class {Iris-setosa, Iris-versicolor, Iris-virginica}

@data
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
...
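
Since every data row must supply one value per declared attribute, a quick structural check can catch many malformed files before Weka sees them. The sketch below is an illustration only: arff_check is a hypothetical helper, and it assumes dense comma-separated rows with no quoted values containing commas and no sparse {...} rows.

```shell
# arff_check FILE: hypothetical helper, not part of Weka.
# Counts @attribute declarations and flags data rows with a different field count.
# Assumes dense rows, no commas inside quoted values, no sparse {...} rows.
arff_check() {
  awk '
    BEGIN { FS = ","; attrs = 0; in_data = 0; bad = 0 }
    /^[ \t]*%/ || /^[ \t]*$/ { next }                        # comments and blank lines
    !in_data && tolower($0) ~ /^[ \t]*@attribute/ { attrs++; next }
    !in_data && tolower($0) ~ /^[ \t]*@data/      { in_data = 1; next }
    in_data && NF != attrs {
      printf "line %d: expected %d fields, found %d\n", NR, attrs, NF
      bad++
    }
    END {
      if (!in_data) { print "no @data section found"; exit 1 }
      if (bad) { exit 1 }
      printf "OK: %d attributes\n", attrs
    }
  ' "$1"
}
```

The function returns a non-zero status when it finds a problem, so it can guard a longer script, for example arff_check data/iris.arff || exit 1.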

Running Classifiers

Decision Tree (J48)

bash
# Train and evaluate with 10-fold cross-validation
java -cp $WEKAJAR weka.classifiers.trees.J48 \
  -t data/iris.arff \
  -x 10

# Key options:
# -t  Training file
# -T  Separate test file
# -x  Number of cross-validation folds
# -o  Output statistics only, not the model
# -p 0  Output predictions (0 = no attributes, or a range such as 1-2 to print those attributes too)

Random Forest

bash
java -cp $WEKAJAR weka.classifiers.trees.RandomForest \
  -t data/iris.arff \
  -I 100 \
  -x 10

# -I  Number of trees (iterations)
# -K  Number of features to consider at each split (0 = auto)
# -depth  Maximum depth (0 = unlimited)

Naive Bayes

bash
java -cp $WEKAJAR weka.classifiers.bayes.NaiveBayes \
  -t data/iris.arff \
  -x 10

SVM (SMO)

bash
java -cp $WEKAJAR weka.classifiers.functions.SMO \
  -t data/iris.arff \
  -C 1.0 \
  -x 10

# -C  Complexity parameter (regularization)
# -K  Kernel: "weka.classifiers.functions.supportVector.PolyKernel" for polynomial

Train/Test Split

bash
# Use a separate test file
java -cp $WEKAJAR weka.classifiers.trees.J48 \
  -t train.arff \
  -T test.arff

# Percentage split (66% train, 34% test)
java -cp $WEKAJAR weka.classifiers.trees.J48 \
  -t data/iris.arff \
  -split-percentage 66
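
If separate train.arff and test.arff files do not exist yet, Weka's own instance filters can produce them. The sketch below is one possible approach under stated assumptions: make_split and the file names shuffled.arff, train.arff, and test.arff are invented for this example, and the options used (Randomize -S, RemovePercentage -P and -V) should be checked against your Weka version.

```shell
# make_split DATA PCT: hypothetical helper that writes train.arff (first PCT
# percent of the shuffled data) and test.arff (the rest) into the current
# directory. Assumes WEKAJAR holds the path to weka.jar.
make_split() {
  data=$1
  pct=$2
  filters=weka.filters.unsupervised.instance
  # Shuffle first, so a file sorted by class (such as iris) is not split class by class
  java -cp "$WEKAJAR" "${filters}.Randomize" -S 42 -i "$data" -o shuffled.arff || return 1
  # -P drops the first PCT percent of the instances; the remainder is the test set
  java -cp "$WEKAJAR" "${filters}.RemovePercentage" -P "$pct" -i shuffled.arff -o test.arff || return 1
  # -V inverts the selection, keeping only the first PCT percent as the training set
  java -cp "$WEKAJAR" "${filters}.RemovePercentage" -P "$pct" -V -i shuffled.arff -o train.arff || return 1
}
```

After make_split data/iris.arff 66, the two files can be passed to -t and -T. Note that this split is random but not stratified by class.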

Saving and Loading Models

bash
# Save trained model
java -cp $WEKAJAR weka.classifiers.trees.J48 \
  -t data/iris.arff \
  -d model.j48

# Load model and classify new data
java -cp $WEKAJAR weka.classifiers.trees.J48 \
  -l model.j48 \
  -T new_data.arff \
  -p 0

Data Preprocessing (Filters)

bash
# Normalize numeric attributes to [0, 1]
java -cp $WEKAJAR weka.filters.unsupervised.attribute.Normalize \
  -i data.arff -o normalized.arff

# Remove an attribute (e.g., column 1)
java -cp $WEKAJAR weka.filters.unsupervised.attribute.Remove \
  -R 1 \
  -i data.arff -o filtered.arff

# Discretize numeric attributes into 5 bins
java -cp $WEKAJAR weka.filters.unsupervised.attribute.Discretize \
  -B 5 \
  -i data.arff -o discretized.arff

# Resample (under/oversample for class imbalance)
# -B 1.0 biases the sample toward a uniform class distribution
# -c last tells the supervised filter which attribute is the class
java -cp $WEKAJAR weka.filters.supervised.instance.Resample \
  -B 1.0 \
  -c last \
  -i data.arff -o resampled.arff

# Combine a filter and a classifier using FilteredClassifier
java -cp $WEKAJAR weka.classifiers.meta.FilteredClassifier \
  -F "weka.filters.unsupervised.attribute.Normalize" \
  -W weka.classifiers.trees.J48 \
  -t data.arff -x 10

Clustering

bash
# K-Means clustering
# (-x is omitted: cross-validation applies only to density-based clusterers such as EM)
java -cp $WEKAJAR weka.clusterers.SimpleKMeans \
  -t data/iris.arff \
  -N 3

# -N  Number of clusters
# -c last  Ignore the class attribute while clustering and report a classes-to-clusters evaluation

# Expectation-Maximization
java -cp $WEKAJAR weka.clusterers.EM \
  -t data/iris.arff \
  -N -1

# -N -1  Auto-select number of clusters using cross-validation

Feature Selection

bash
# Evaluate attributes using InfoGain
java -cp $WEKAJAR weka.attributeSelection.InfoGainAttributeEval \
  -i data/iris.arff \
  -s "weka.attributeSelection.Ranker -T 0.01"

# Use CfsSubsetEval with BestFirst search
java -cp $WEKAJAR weka.attributeSelection.CfsSubsetEval \
  -i data/iris.arff \
  -s "weka.attributeSelection.BestFirst -D 1"

Scripting and Automation

bash
#!/bin/bash
# Run multiple classifiers and compare results

WEKAJAR=~/weka-3-8-6/weka.jar
DATA=data/iris.arff

classifiers=(
    "weka.classifiers.trees.J48"
    "weka.classifiers.trees.RandomForest -I 100"
    "weka.classifiers.bayes.NaiveBayes"
    "weka.classifiers.functions.SMO"
    "weka.classifiers.lazy.IBk -K 5"
)

for clf in "${classifiers[@]}"; do
    echo "=== $clf ==="
    # $clf stays unquoted so the class name and its options split into separate words.
    # -v suppresses the training-set statistics, leaving only the cross-validation line.
    java -cp "$WEKAJAR" $clf -t "$DATA" -x 10 -v 2>&1 | grep "Correctly Classified"
    echo
done
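
For side-by-side comparison it can help to reduce each run to a single number and emit CSV. The following sketch shows one way to do that. The function names extract_accuracy and compare_classifiers are invented here, and the parsing assumes the summary line ends in a count, a percentage, and a % sign, which should be verified against the output of your Weka version.

```shell
# extract_accuracy: hypothetical helper, reads Weka evaluation output on stdin
# and prints the percentage from the last "Correctly Classified Instances" line.
# Assumes that line ends with "<count> <percent> %".
extract_accuracy() {
  awk '/Correctly Classified Instances/ { pct = $(NF-1) }
       END { if (pct != "") print pct; else exit 1 }'
}

# compare_classifiers JAR DATA SCHEME...: hypothetical helper that prints one
# CSV row per scheme, using 10-fold cross-validation.
compare_classifiers() {
  jar=$1
  data=$2
  shift 2
  echo "classifier,accuracy_percent"
  for clf in "$@"; do
    # $clf is unquoted on purpose so "Class -opt value" splits into words;
    # -v asks Weka to skip the statistics for the training data
    acc=$(java -cp "$jar" $clf -t "$data" -x 10 -v 2>/dev/null | extract_accuracy) || acc="NA"
    echo "\"$clf\",$acc"
  done
}
```

A call such as compare_classifiers "$WEKAJAR" data/iris.arff "weka.classifiers.trees.J48" "weka.classifiers.lazy.IBk -K 5" > results.csv would write one row per classifier, with NA where no accuracy line was found.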

Common Pitfalls

  • Memory for large datasets: Weka loads the entire dataset into memory. For large files, increase the JVM heap: java -Xmx4g -cp $WEKAJAR .... Without this, you get OutOfMemoryError.
  • Class attribute position: Weka assumes the last attribute is the class by default. If your class column is not last, specify it with -c (e.g., -c 1 for the first column).
  • ARFF format errors: Missing quotes around nominal values with spaces, incorrect @attribute declarations, or mismatched data types cause cryptic parsing errors. Validate your ARFF file before running experiments.
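  • Classpath issues: Classes from packages installed via the Package Manager are not inside weka.jar, so naming them directly after java -cp $WEKAJAR fails with a class-not-found error. Either launch the scheme through weka.Run, which loads installed packages (java -cp $WEKAJAR weka.Run <scheme> ...), or add the package's own jar from ~/wekafiles/packages/<PackageName>/ to the classpath.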
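  • Reproducibility: Record random seeds explicitly rather than relying on defaults. For classifier evaluation, -s sets the seed used to shuffle the data for cross-validation or a percentage split (e.g., -s 42). Many randomized algorithms, such as RandomForest, take their own seed through a separate -S option.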
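
Some of these pitfalls can be handled in one place with a small wrapper around the java call. This is only a sketch: weka_run, WEKA_HEAP, and WEKA_DRY_RUN are names made up for this example, and the 4g default simply mirrors the heap size mentioned above.

```shell
# weka_run CLASS [OPTIONS...]: hypothetical wrapper, not part of Weka.
# WEKA_HEAP (default 4g) sets the JVM heap; WEKA_DRY_RUN=1 prints the command
# instead of running it. Requires WEKAJAR to hold the path to weka.jar.
weka_run() {
  if [ -z "${WEKAJAR:-}" ]; then
    echo "WEKAJAR is not set" >&2
    return 1
  fi
  heap=${WEKA_HEAP:-4g}
  if [ "${WEKA_DRY_RUN:-0}" = "1" ]; then
    echo "java -Xmx$heap -cp $WEKAJAR $*"
  else
    java "-Xmx$heap" -cp "$WEKAJAR" "$@"
  fi
}
```

With this in place, a run such as weka_run weka.classifiers.trees.J48 -t data.arff -c 1 -s 42 carries the heap setting, class index, and seed in a single line that can be logged alongside the results.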

Summary

  • Run any Weka classifier from the command line: java -cp weka.jar weka.classifiers.trees.J48 -t data.arff -x 10
  • Use -t for training file, -T for test file, -x for cross-validation folds, -d/-l to save/load models
  • Preprocess data with filters: weka.filters.unsupervised.attribute.Normalize, Remove, Discretize
  • Use FilteredClassifier to chain preprocessing and classification in one command
  • Increase memory with -Xmx4g for large datasets
  • Script multiple experiments in bash for systematic comparisons
