Sentiment Analysis Using TensorFlow

Introduction

Sentiment analysis is the task of predicting whether text expresses a positive, negative, or sometimes neutral opinion. In TensorFlow, a modern starting point is a text pipeline built with TextVectorization, an embedding layer, and a small classifier that can be trained end to end from raw strings.

A Minimal TensorFlow Sentiment Pipeline

The first step is to prepare labeled text. This example is deliberately tiny so the code stays quick to run:

python
import tensorflow as tf

# Six short reviews: three positive and three negative.
texts = [
    "this movie was fantastic",
    "absolutely loved the soundtrack",
    "the plot was boring and slow",
    "terrible acting and weak dialogue",
    "great product and fast delivery",
    "i want a refund this is awful",
]

# 1 = positive, 0 = negative, aligned with texts by position.
labels = [1, 1, 0, 0, 1, 0]

# Pair each text with its label and group the pairs into batches of two.
dataset = tf.data.Dataset.from_tensor_slices((texts, labels)).batch(2)

In a real project you would use a much larger dataset, but the structure is the same.

Vectorize the Text

Neural networks do not consume raw strings directly. TextVectorization turns text into integer token sequences.

python
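vectorizer = tf.keras.layers.TextVectorization(
    max_tokens=5000,             # upper bound on vocabulary size
    output_mode="int",           # emit integer token IDs
    output_sequence_length=20,   # pad or truncate every sentence to 20 tokens
)

# Build the vocabulary from the training text.
vectorizer.adapt(tf.data.Dataset.from_tensor_slices(texts).batch(2))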

This layer learns a vocabulary from the training text and maps each sentence to a fixed-length sequence of token IDs, padding shorter sentences with zeros and truncating longer ones to 20 tokens.
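To make the mapping concrete, the sketch below adapts a separate vectorizer on two sentences and inspects what it produces. The names demo_texts and demo_vectorizer and the shorter sequence length of 8 are illustrative assumptions, not part of the pipeline above.

```python
import tensorflow as tf

# Hypothetical two-sentence corpus, kept small so the output is easy to read.
demo_texts = ["this movie was fantastic", "the plot was boring and slow"]

demo_vectorizer = tf.keras.layers.TextVectorization(
    max_tokens=5000,
    output_mode="int",
    output_sequence_length=8,  # shorter than the article's 20, for readability
)
demo_vectorizer.adapt(tf.data.Dataset.from_tensor_slices(demo_texts).batch(2))

# Index 0 is reserved for padding and index 1 for out-of-vocabulary tokens.
vocab = demo_vectorizer.get_vocabulary()
print(vocab[:2])

# "unwatchable" was never seen during adapt, so it maps to the OOV index 1.
# The sentence has four tokens, so the remaining four positions are padded with 0.
token_ids = demo_vectorizer(tf.constant(["the movie was unwatchable"]))
print(token_ids.numpy())
```

Printing the vocabulary and a few vectorized sentences like this is a quick sanity check before training.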

Build the Model

A straightforward baseline model uses:

  • TextVectorization to tokenize
  • Embedding to learn word representations
  • GlobalAveragePooling1D to reduce the sequence to one vector
  • Dense layers for the final prediction

python
model = tf.keras.Sequential([
    # Accept one raw string per example.
    tf.keras.Input(shape=(1,), dtype=tf.string),
    # Reuse the adapted vectorizer so tokenization happens inside the model.
    vectorizer,
    # input_dim matches the vectorizer's max_tokens.
    tf.keras.layers.Embedding(input_dim=5000, output_dim=16),
    # Average the token embeddings into a single vector per sentence.
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(16, activation="relu"),
    # A single sigmoid unit outputs a score between 0 and 1.
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=["accuracy"],
)

For binary sentiment classification, a single sigmoid output is the simplest setup.

Train and Predict

Now fit the model on the dataset:

python
model.fit(dataset, epochs=10, verbose=0)

Then run predictions on new text:

python
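examples = tf.constant([
    "the app is clean and easy to use",
    "this update broke everything",
])

# predict returns one row per example, each holding a single sigmoid score.
scores = model.predict(examples, verbose=0)

# Tensor strings come back as bytes, so decode them before printing.
for text, score in zip(examples.numpy(), scores):
    print(text.decode("utf-8"), float(score[0]))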

Scores closer to 1.0 indicate positive sentiment, while values near 0.0 indicate negative sentiment.

When to Use a More Advanced Model

The simple embedding model is a good baseline, but it has limits. If you need better accuracy on larger datasets, you might move to:

  • bidirectional LSTM or GRU layers
  • pretrained embeddings
  • transformer-based encoders

Still, baseline models are valuable. They train quickly, are easy to debug, and often perform well enough for product feedback, review filtering, or lightweight moderation tasks.
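For reference, the first option in the list above might look like the following sketch. It assumes the inputs are already integer token IDs from a TextVectorization layer, and the names VOCAB_SIZE, SEQUENCE_LENGTH, and recurrent_model, as well as the layer sizes, are illustrative choices rather than tuned values.

```python
import tensorflow as tf

# Hypothetical sizes chosen to mirror the baseline; tune them for real data.
VOCAB_SIZE = 5000
SEQUENCE_LENGTH = 20

# Assumes inputs are integer token IDs produced by a TextVectorization layer.
recurrent_model = tf.keras.Sequential([
    tf.keras.Input(shape=(SEQUENCE_LENGTH,), dtype="int64"),
    # mask_zero=True tells downstream layers to ignore padding (token ID 0).
    tf.keras.layers.Embedding(VOCAB_SIZE, 16, mask_zero=True),
    # Read the sequence in both directions and concatenate the final states.
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(16)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

recurrent_model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=["accuracy"],
)
```

Unlike average pooling, the recurrent layer is sensitive to word order, which can help with phrases such as "not good", at the cost of slower training.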

Data Quality Matters More Than Fancy Layers

A sentiment model is only as good as its labels. If the dataset mixes sarcasm, domain-specific jargon, and inconsistent labels, changing the architecture alone will not fix the problem.

Before increasing complexity, check:

  • class balance
  • text cleaning rules
  • label quality
  • whether neutral examples should be their own class
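The first check in the list needs nothing more than the standard library. This is a minimal sketch with made-up labels; review_labels is a hypothetical name standing in for your own label column.

```python
from collections import Counter

# Hypothetical labels: 1 = positive, 0 = negative.
review_labels = [1, 1, 0, 0, 1, 0, 1, 1]

# Count each class and convert the counts to shares of the dataset.
counts = Counter(review_labels)
total = sum(counts.values())
shares = {label: count / total for label, count in counts.items()}

print(shares)  # {1: 0.625, 0: 0.375}
```

A heavily skewed split is a hint that accuracy alone will be misleading and that per-class metrics deserve a look.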

For three-way sentiment classification, change the final layer to a three-unit softmax output and use an appropriate categorical loss.
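One way that change could look is sketched below, assuming integer labels (0 = negative, 1 = neutral, 2 = positive, a hypothetical scheme) and inputs that are already token IDs. With integer labels, sparse_categorical_crossentropy is a fitting loss; one-hot labels would call for categorical_crossentropy instead.

```python
import tensorflow as tf

# Assumes inputs are integer token IDs of length 20 from a TextVectorization layer.
three_way_model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,), dtype="int64"),
    tf.keras.layers.Embedding(input_dim=5000, output_dim=16),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(16, activation="relu"),
    # Three units with softmax: one probability per class.
    tf.keras.layers.Dense(3, activation="softmax"),
])

three_way_model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",  # expects integer labels 0, 1, 2
    metrics=["accuracy"],
)

# The model is untrained, so the values are arbitrary, but each row sums to 1.
dummy_token_ids = tf.zeros((2, 20), dtype=tf.int64)
probabilities = three_way_model(dummy_token_ids).numpy()

# The predicted class is the index of the largest probability in each row.
predicted_classes = probabilities.argmax(axis=1)
```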

Common Pitfalls

The biggest mistake is fitting TextVectorization on all available text, including validation or test data. That leaks vocabulary information and makes evaluation less trustworthy. Adapt the vectorizer on training text only.
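A minimal sketch of the safe pattern follows. The split into train_texts and val_texts and the name train_vectorizer are assumptions made for illustration.

```python
import tensorflow as tf

# Hypothetical split: the validation text contains words absent from training.
train_texts = ["this movie was fantastic", "the plot was boring and slow"]
val_texts = ["surprisingly delightful"]

train_vectorizer = tf.keras.layers.TextVectorization(
    max_tokens=5000,
    output_mode="int",
    output_sequence_length=6,
)

# Adapt on the training split only; validation text never touches the vocabulary.
train_vectorizer.adapt(tf.data.Dataset.from_tensor_slices(train_texts).batch(2))

# Words seen only in validation map to the out-of-vocabulary index 1,
# which is what the model will also face on genuinely new text.
val_ids = train_vectorizer(tf.constant(val_texts))
print(val_ids.numpy())
```

Seeing unknown tokens at validation time is expected; it gives a more honest picture of how the model behaves on unseen input.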

Another issue is confusing model output with hard labels. A sigmoid output is a probability-like score, not automatically a class. Choose a threshold such as 0.5, and tune it if false positives and false negatives have different costs.
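Turning scores into labels is a one-line step, sketched here in plain Python. The example_scores values are invented stand-ins for flattened model output, and to_labels is a hypothetical helper.

```python
# Hypothetical scores standing in for flattened sigmoid outputs.
example_scores = [0.91, 0.48, 0.07, 0.63]


def to_labels(scores, threshold=0.5):
    """Map probability-like scores to hard labels (1 = positive, 0 = negative)."""
    return [1 if score >= threshold else 0 for score in scores]


default_labels = to_labels(example_scores)                # [1, 0, 0, 1]
strict_labels = to_labels(example_scores, threshold=0.7)  # [1, 0, 0, 0]

print(default_labels, strict_labels)
```

Raising the threshold trades false positives for false negatives, so pick the value on validation data rather than on the test set.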

Finally, do not expect toy datasets to generalize. The example above is good for understanding TensorFlow mechanics, not for production-grade sentiment analysis. Real models need substantially more data and careful evaluation.

Summary

  • TensorFlow sentiment analysis usually starts with text vectorization and a classifier.
  • TextVectorization plus Embedding is a strong baseline for binary sentiment tasks.
  • Train on labeled text and predict with raw strings.
  • Improve quality by fixing data issues before reaching for more complex models.
  • Treat output scores as probabilities or confidence signals, not as labels by themselves.
