Hugging Face Transformers Trainer output not giving any predictions?
Introduction
When the Hugging Face Trainer.predict() method returns empty or unexpected predictions, the cause is usually a misconfigured dataset, a missing compute_metrics function, or an expectation of class labels where the model returns logits. The Trainer returns raw model outputs (logits) by default, so you must post-process them to get class predictions. Other common causes include passing an empty dataset, mismatched tokenization, or calling evaluate() when predict() was intended.
Basic Prediction Setup
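trainer.predict() returns a PredictionOutput with three fields: .predictions (raw logits), .label_ids (ground truth, or None when the dataset has no labels), and .metrics (runtime statistics, plus the loss when labels are present and any values returned by compute_metrics).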
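As a minimal sketch of inspecting that result, the helper below calls predict() and reports what each field holds. The names summarize_predictions, trainer, and test_dataset are illustrative: trainer is assumed to be an already configured transformers.Trainer and test_dataset a tokenized dataset.

```python
import numpy as np


def summarize_predictions(trainer, test_dataset):
    """Run trainer.predict() and describe each field of the result.

    `trainer` is assumed to be a configured transformers.Trainer;
    the helper name is illustrative, not part of the library.
    """
    output = trainer.predict(test_dataset)

    # Some models return a tuple of arrays; the logits conventionally come first.
    logits = output.predictions
    if isinstance(logits, tuple):
        logits = logits[0]

    return {
        # Typically (num_examples, num_classes) for sequence classification.
        "logits_shape": np.asarray(logits).shape,
        # label_ids is None when the dataset carries no labels.
        "has_labels": output.label_ids is not None,
        "metric_names": sorted(output.metrics or {}),
    }
```

Printing the returned dictionary is a quick way to confirm that predictions exist and have the expected shape before any post-processing.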
Fix 1: Convert Logits to Predictions
The most common "no predictions" issue is that developers expect class labels but get logit arrays. np.argmax() converts logits to class indices.
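The conversion can be sketched with NumPy alone. The function names and the example logits below are made up for illustration; in practice the array would come from output.predictions.

```python
import numpy as np


def logits_to_classes(logits):
    """Pick the highest-scoring class index for each example."""
    return np.argmax(np.asarray(logits), axis=-1)


def logits_to_probabilities(logits):
    """Apply a numerically stable softmax over the last axis."""
    logits = np.asarray(logits, dtype=np.float64)
    shifted = logits - logits.max(axis=-1, keepdims=True)  # avoid overflow in exp
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


# Hypothetical logits for three examples and two classes.
example_logits = np.array([[2.0, -1.0], [0.1, 0.3], [-0.5, -0.2]])
predicted = logits_to_classes(example_logits)  # array([0, 1, 1])
```

Softmax is only needed when you want confidence scores; it does not change which class argmax selects.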
Fix 2: Add compute_metrics
Without compute_metrics, output.metrics contains only the loss (when labels are present) and runtime statistics. Passing a compute_metrics function to the Trainer adds accuracy, F1, and any other values the function returns.
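One possible compute_metrics function is sketched below. It uses only NumPy so that it stays self-contained; the metric names accuracy and f1_macro are choices made for this example, and the macro F1 here averages over the classes present in the labels, which is a simplification of what a metrics library would do.

```python
import numpy as np


def compute_metrics(eval_pred):
    """Compute accuracy and macro F1 from (logits, labels)."""
    logits, labels = eval_pred
    if isinstance(logits, tuple):  # some models return extra outputs after the logits
        logits = logits[0]

    preds = np.argmax(logits, axis=-1)
    labels = np.asarray(labels)
    accuracy = float((preds == labels).mean())

    f1_scores = []
    for cls in np.unique(labels):
        tp = np.sum((preds == cls) & (labels == cls))
        fp = np.sum((preds == cls) & (labels != cls))
        fn = np.sum((preds != cls) & (labels == cls))
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)

    return {"accuracy": accuracy, "f1_macro": float(np.mean(f1_scores))}
```

The function is passed at construction time, as in Trainer(..., compute_metrics=compute_metrics). In the output of predict() the keys are prefixed with test_ by default, so accuracy appears as test_accuracy.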
Fix 3: Ensure Dataset Has Correct Columns
The Trainer expects specific columns. If input_ids is missing, the model receives no input and prediction fails; if attention_mask is missing, padded batches can produce wrong results. Always verify the dataset's columns after tokenization.
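A small check along these lines can catch the problem before prediction. The helper name missing_model_columns is hypothetical, and the sketch assumes a single-split dataset whose column_names attribute is a list (as with datasets.Dataset).

```python
def missing_model_columns(dataset, required=("input_ids", "attention_mask")):
    """Return the required columns that are absent from `dataset`.

    Assumes `dataset.column_names` is a list of column names.
    """
    present = set(dataset.column_names)
    return [name for name in required if name not in present]


def assert_tokenized(dataset):
    """Raise early with a readable message if tokenization columns are missing."""
    missing = missing_model_columns(dataset)
    if missing:
        raise ValueError(
            f"Dataset is missing {missing}; columns are {dataset.column_names}. "
            "Was the tokenizer applied with dataset.map(...)?"
        )
```

Calling assert_tokenized(tokenized_dataset) right before trainer.predict() turns a confusing downstream failure into an explicit one.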
Fix 4: Rename Label Column
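The model receives its targets through an argument named labels. The Trainer's default data collators also accept a column named label or label_ids, but a column with any other name (for example target or sentiment) is dropped by default before batching, so no loss or metrics are computed. Renaming the column to labels avoids the problem.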
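The rename can be sketched as follows. The helper name ensure_labels_column and the candidate names are assumptions made for this example; rename_column is the method provided by datasets.Dataset, which returns a new dataset rather than modifying the original.

```python
def ensure_labels_column(dataset, candidates=("label", "target", "sentiment")):
    """Return a dataset whose label column is named "labels".

    `candidates` lists hypothetical column names to look for, in order.
    """
    if "labels" in dataset.column_names:
        return dataset
    for name in candidates:
        if name in dataset.column_names:
            # rename_column returns a new dataset; keep the returned value.
            return dataset.rename_column(name, "labels")
    raise ValueError(f"No label column found; columns are {dataset.column_names}")
```

A common mistake is to call rename_column without assigning the result, which leaves the original column name in place.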
Fix 5: Handle Empty Predictions
If predictions are empty, the dataset is likely empty or incorrectly formatted. Always check the dataset size and column names first.
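A defensive wrapper, sketched under the assumption that the dataset supports len() (a streaming or iterable dataset would not), can surface these problems with a clear message. The name predict_or_explain is illustrative.

```python
import numpy as np


def predict_or_explain(trainer, dataset):
    """Call trainer.predict() and fail loudly on empty input or output."""
    if len(dataset) == 0:
        raise ValueError(
            "Dataset is empty; check any filter() calls and the split name."
        )

    output = trainer.predict(dataset)

    logits = output.predictions
    if isinstance(logits, tuple):  # logits conventionally come first
        logits = logits[0]

    n_predicted = np.asarray(logits).shape[0]
    if n_predicted != len(dataset):
        raise RuntimeError(
            f"Got {n_predicted} predictions for {len(dataset)} examples."
        )
    return output
```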
Fix 6: evaluate() vs predict()
evaluate() returns only a dictionary of metrics; it does not return per-example predictions. Use predict() when you need the actual model outputs for each input.
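The difference in return values can be seen side by side in a short sketch. The function name compare_evaluate_and_predict is illustrative, and trainer is assumed to be a configured transformers.Trainer.

```python
def compare_evaluate_and_predict(trainer, dataset):
    """Show what evaluate() and predict() each hand back."""
    # evaluate() returns a plain dict of metrics; keys are prefixed
    # with "eval_" by default. There are no per-example outputs here.
    metrics = trainer.evaluate(eval_dataset=dataset)

    # predict() returns an object with .predictions, .label_ids and .metrics;
    # its metric keys are prefixed with "test_" by default.
    output = trainer.predict(dataset)

    return metrics, output.predictions
```

If code written against evaluate() tries to read .predictions from the result, it fails because the result is a dictionary.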
Token Classification (NER) Predictions
For token classification, predictions have three dimensions. Apply argmax along the last axis, then map indices to label names.
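A sketch of that decoding step follows. It assumes logits of shape (batch, sequence_length, num_labels) and follows the common convention that positions labelled -100 (special tokens and padding) are skipped; the function name and the id2label mapping in the example are made up for illustration.

```python
import numpy as np


def decode_token_predictions(logits, label_ids, id2label, ignore_index=-100):
    """Map per-token logits to label names, skipping ignored positions."""
    pred_ids = np.argmax(np.asarray(logits), axis=-1)  # (batch, seq_len)
    label_ids = np.asarray(label_ids)

    decoded = []
    for pred_row, label_row in zip(pred_ids, label_ids):
        decoded.append(
            [
                id2label[int(p)]
                for p, gold in zip(pred_row, label_row)
                if gold != ignore_index  # drop special tokens and padding
            ]
        )
    return decoded


# Hypothetical example: one sentence, three tokens, two labels.
example_logits = np.array([[[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]])
example_labels = np.array([[-100, 0, 1]])
tags = decode_token_predictions(example_logits, example_labels, {0: "O", 1: "B-PER"})
# tags == [["O", "B-PER"]]
```

When the dataset has no labels, the attention mask or the tokenizer's word IDs would have to stand in for label_ids to decide which positions to keep.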
Common Pitfalls
- Expecting labels instead of logits: trainer.predict() returns raw model outputs, typically logits. Apply np.argmax() for classification or .squeeze() for regression to get usable predictions.
- Confusing evaluate() with predict(): evaluate() returns only metrics, not per-example predictions. Use predict() when you need the model's output for each input.
- Missing tokenization columns: If the dataset lacks input_ids and attention_mask, the model does not receive the input it needs. Verify columns with dataset.column_names after tokenization.
- Wrong label column name: The model expects its targets under the name labels. The default data collators also accept label, but a column named target or sentiment is not passed to the model, so the loss and metrics are skipped without an error. Rename such columns to labels.
- Not setting dataset format to torch: After tokenization, dataset.set_format("torch") converts columns to PyTorch tensors. The Trainer's default collators convert Python lists themselves, but a custom collator or DataLoader may fail to batch the data without this call.
Summary
- trainer.predict() returns raw logits; use np.argmax() to convert them to class predictions
- Add compute_metrics to the Trainer to get accuracy, F1, and other metrics in the output
- Use predict() for per-example predictions, not evaluate() (which only returns metrics)
- Ensure the dataset has input_ids, attention_mask, and labels columns
- Rename non-standard label columns to labels for Trainer compatibility
- Check dataset.column_names and len(dataset) when debugging empty predictions

