Sentiment Analysis Using TensorFlow

Introduction

Sentiment analysis is the task of predicting whether text expresses a positive, negative, or sometimes neutral opinion. In TensorFlow, a modern starting point is a text pipeline built with TextVectorization, an embedding layer, and a small classifier that can be trained end to end from raw strings.

A Minimal TensorFlow Sentiment Pipeline

The first step is to prepare labeled text. This tiny example is deliberately small so the code stays runnable:

python

import tensorflow as tf

# Six short reviews: three positive, three negative.
texts = [
    "this movie was fantastic",
    "absolutely loved the soundtrack",
    "the plot was boring and slow",
    "terrible acting and weak dialogue",
    "great product and fast delivery",
    "i want a refund this is awful",
]

# 1 = positive, 0 = negative, aligned with texts by position.
labels = [1, 1, 0, 0, 1, 0]

# Pair each text with its label and group the pairs into batches of two.
dataset = tf.data.Dataset.from_tensor_slices((texts, labels)).batch(2)

In a real project you would use a much larger dataset, but the structure is the same.

Vectorize the Text

Neural networks do not consume raw strings directly. TextVectorization turns text into integer token sequences.

python

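vectorizer = tf.keras.layers.TextVectorization(
    max_tokens=5000,             # upper bound on vocabulary size
    output_mode="int",           # emit integer token IDs
    output_sequence_length=20,   # pad or truncate every sentence to 20 tokens
)

# Learn the vocabulary from the training text.
vectorizer.adapt(tf.data.Dataset.from_tensor_slices(texts).batch(2))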

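After adapt, the layer holds a vocabulary learned from the training text (capped at max_tokens entries) and maps each sentence to a sequence of exactly 20 token IDs, padding shorter inputs with zeros and truncating longer ones.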
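To make the mapping concrete, the following sketch adapts a vectorizer on two sentences and inspects what it produces. The two-sentence corpus and the word "unwatchable" are illustrative assumptions, not part of the example dataset above.

```python
import tensorflow as tf

# Illustrative two-sentence corpus (an assumption for this sketch).
texts = [
    "this movie was fantastic",
    "the plot was boring and slow",
]

vectorizer = tf.keras.layers.TextVectorization(
    max_tokens=5000,
    output_mode="int",
    output_sequence_length=20,
)
vectorizer.adapt(tf.data.Dataset.from_tensor_slices(texts).batch(2))

# In "int" mode, index 0 is reserved for padding ("") and
# index 1 for out-of-vocabulary tokens ("[UNK]").
vocab = vectorizer.get_vocabulary()
print(vocab[:2])

# "unwatchable" was never seen during adapt, so it maps to ID 1.
token_ids = vectorizer(tf.constant(["this movie was unwatchable"]))
print(token_ids.shape)          # (1, 20)
print(token_ids.numpy()[0][:6])
```

Positions after the fourth token are filled with 0, which is why downstream layers often treat 0 as padding.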

Build the Model

A straightforward baseline model uses:

TextVectorization to tokenize
Embedding to learn word representations
GlobalAveragePooling1D to reduce the sequence to one vector
Dense layers for the final prediction

python

model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,), dtype=tf.string),   # one raw string per example
    vectorizer,                                    # strings -> 20 token IDs
    tf.keras.layers.Embedding(input_dim=5000, output_dim=16),  # input_dim matches max_tokens
    tf.keras.layers.GlobalAveragePooling1D(),      # average over the sequence axis
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # score between 0 and 1
])

model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=["accuracy"],
)

For binary sentiment classification, a single sigmoid output is the simplest setup.

Train and Predict

Now fit the model on the dataset:

python

model.fit(dataset, epochs=10, verbose=0)

Then run predictions on new text:

python

examples = tf.constant([
    "the app is clean and easy to use",
    "this update broke everything",
])

# One score in [0, 1] per input string.
scores = model.predict(examples, verbose=0)

# examples.numpy() yields bytes, so decode before printing.
for text, score in zip(examples.numpy(), scores):
    print(text.decode("utf-8"), float(score[0]))

Scores closer to 1.0 indicate positive sentiment, while values near 0.0 indicate negative sentiment.

When to Use a More Advanced Model

The simple embedding model is a good baseline, but it has limits. If you need better accuracy on larger datasets, you might move to:

bidirectional LSTM or GRU layers
pretrained embeddings
transformer-based encoders
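As a rough illustration of the first option, here is a minimal sketch of a bidirectional LSTM classifier. It assumes the text has already been converted to token IDs (for example by TextVectorization), and the constants VOCAB_SIZE and SEQ_LEN as well as the layer sizes are illustrative choices, not tuned values.

```python
import tensorflow as tf

VOCAB_SIZE = 5000  # assumed: should match the vectorizer's max_tokens
SEQ_LEN = 20       # assumed: should match output_sequence_length

bilstm_model = tf.keras.Sequential([
    tf.keras.Input(shape=(SEQ_LEN,), dtype="int32"),
    # mask_zero=True tells the LSTM to skip padding positions (ID 0).
    tf.keras.layers.Embedding(input_dim=VOCAB_SIZE, output_dim=32, mask_zero=True),
    # Reads the sequence left-to-right and right-to-left, then concatenates.
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

bilstm_model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=["accuracy"],
)

# Two made-up, right-padded token sequences just to check the shapes.
dummy_ids = tf.constant([
    [5, 9, 2] + [0] * 17,
    [7, 3] + [0] * 18,
])
out = bilstm_model(dummy_ids)
print(out.shape)  # (2, 1)
```

Unlike average pooling, the recurrent layer is sensitive to word order, which can help with negation, at the cost of slower training.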

Still, baseline models are valuable. They train quickly, are easy to debug, and often perform well enough for product feedback, review filtering, or lightweight moderation tasks.

Data Quality Matters More Than Fancy Layers

A sentiment model is only as good as its labels. If the dataset mixes sarcasm, domain-specific jargon, and inconsistent labels, changing the architecture alone will not fix the problem.

Before increasing complexity, check:

class balance
text cleaning rules
label quality
whether neutral examples should be their own class
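The class balance check can be sketched in a few lines of plain Python. The label list below is hypothetical, and the "balanced" weighting formula is one common heuristic rather than the only reasonable choice.

```python
from collections import Counter

# Hypothetical labels: 1 = positive, 0 = negative.
labels = [1, 1, 0, 0, 1, 0, 1, 1]

counts = Counter(labels)
total = len(labels)
num_classes = len(counts)

# "Balanced" heuristic: weight = total / (num_classes * class_count),
# so rarer classes receive larger weights.
class_weight = {cls: total / (num_classes * n) for cls, n in counts.items()}

print(dict(counts))
print(class_weight)
```

A dictionary of this shape can be passed to Keras through the class_weight argument of model.fit if the imbalance turns out to matter.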

For three-way sentiment classification, change the final layer to a three-unit softmax output and switch the loss to sparse_categorical_crossentropy for integer labels or categorical_crossentropy for one-hot labels.
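A minimal sketch of the three-class variant follows. The sentences and the label encoding (0 = negative, 1 = neutral, 2 = positive) are assumptions made for illustration; with this little data the predictions themselves are not meaningful.

```python
import tensorflow as tf

# Hypothetical three-class data: 0 = negative, 1 = neutral, 2 = positive.
texts = [
    "this movie was fantastic",
    "the plot was boring and slow",
    "it was released last year",
    "great product and fast delivery",
    "terrible acting and weak dialogue",
    "the package arrived on tuesday",
]
labels = [2, 0, 1, 2, 0, 1]

vectorizer = tf.keras.layers.TextVectorization(
    max_tokens=5000,
    output_mode="int",
    output_sequence_length=20,
)
vectorizer.adapt(tf.data.Dataset.from_tensor_slices(texts).batch(2))

model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,), dtype=tf.string),
    vectorizer,
    tf.keras.layers.Embedding(input_dim=5000, output_dim=16),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(3, activation="softmax"),  # one probability per class
])

# Integer labels pair with the sparse variant of categorical cross-entropy.
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

dataset = tf.data.Dataset.from_tensor_slices((texts, labels)).batch(2)
model.fit(dataset, epochs=5, verbose=0)

probs = model.predict(tf.constant(["it was fine"]), verbose=0)
print(probs.shape)            # (1, 3)
print(probs.argmax(axis=-1))  # index of the most likely class
```

Each output row sums to 1, and argmax picks the predicted class.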

Common Pitfalls

A common mistake is adapting TextVectorization on all available text, including validation or test data. That leaks vocabulary information from the held-out sets and makes evaluation less trustworthy. Adapt the vectorizer on training text only.
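The sketch below shows one way to keep the vocabulary clean: split first, then adapt on the training portion only. The fixed 4/2 split is an assumption chosen to keep the example deterministic; a real project would normally use a shuffled or stratified split.

```python
import tensorflow as tf

texts = [
    "this movie was fantastic",
    "absolutely loved the soundtrack",
    "the plot was boring and slow",
    "terrible acting and weak dialogue",
    "great product and fast delivery",
    "i want a refund this is awful",
]
labels = [1, 1, 0, 0, 1, 0]

# Split BEFORE adapting (fixed split, for illustration only).
train_texts, val_texts = texts[:4], texts[4:]
train_labels, val_labels = labels[:4], labels[4:]

vectorizer = tf.keras.layers.TextVectorization(
    max_tokens=5000,
    output_mode="int",
    output_sequence_length=20,
)
# Only training text contributes to the vocabulary.
vectorizer.adapt(tf.data.Dataset.from_tensor_slices(train_texts).batch(2))

vocab = set(vectorizer.get_vocabulary())
print("delivery" in vocab)  # False: the word appears only in validation text

# Words unseen during adapt are mapped to the [UNK] ID (1).
val_ids = vectorizer(tf.constant(val_texts)).numpy()
print(val_ids[:, :6])
```

Seeing many 1s in the validation IDs is expected, and it reflects what the model will face on genuinely new text.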

Another issue is confusing model output with hard labels. A sigmoid output is a probability-like score, not automatically a class. Choose a threshold such as 0.5, and tune it if false positives and false negatives have different costs.
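Turning scores into labels is a one-line comparison. The scores below are made-up values standing in for sigmoid outputs, and to_labels is a hypothetical helper name.

```python
import numpy as np

# Made-up sigmoid outputs standing in for model.predict(...) results.
scores = np.array([0.91, 0.48, 0.12, 0.65])

def to_labels(scores, threshold=0.5):
    """Return 1 where the score meets the threshold, else 0."""
    return (scores >= threshold).astype(int)

print(to_labels(scores))                 # [1 0 0 1]
print(to_labels(scores, threshold=0.7))  # [1 0 0 0]
```

Raising the threshold trades false positives for false negatives, so the right value depends on which error is more costly in the application.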

Finally, do not expect toy datasets to generalize. The example above is good for understanding TensorFlow mechanics, not for production-grade sentiment analysis. Real models need substantially more data and careful evaluation.

Summary

TensorFlow sentiment analysis usually starts with text vectorization and a classifier.
TextVectorization plus Embedding is a strong baseline for binary sentiment tasks.
Train on labeled text and predict with raw strings.
Improve quality by fixing data issues before reaching for more complex models.
Treat output scores as probabilities or confidence signals, not as labels by themselves.