How to access BERT intermediate layer outputs in TF Hub Module?

Introduction

When you load a BERT encoder from TensorFlow Hub, you usually get a callable module that returns a dictionary of outputs. If you only need sentence classification, pooled_output is often enough. But if you want token-level embeddings or hidden states from each Transformer block, you need to know which output key exposes them and which outputs the published SavedModel actually exports.

The Three BERT Output Levels That Matter

Modern TF Hub BERT encoders commonly return a dictionary with keys such as:

  • 'pooled_output'
  • 'sequence_output'
  • 'encoder_outputs'

These correspond to different levels of representation.

pooled_output is a single vector per input sequence, with shape [batch_size, hidden_size]. It is useful for classification tasks.

sequence_output is the final contextual embedding for every token position. Its shape is typically [batch_size, seq_length, hidden_size].

encoder_outputs is the most interesting one for intermediate-layer access. It is a list-like structure containing the output of each Transformer block. The last element matches the final sequence_output.
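
A small helper can make the difference between the three keys concrete. The sketch below is illustrative only: summarize_outputs is a hypothetical name, and it assumes each value in the dictionary is either a tensor-like object with a .shape attribute or a list of such objects.

```python
def summarize_outputs(outputs):
    """Map each output key to its shape, or to a list of shapes for list-valued keys."""
    summary = {}
    for key, value in outputs.items():
        if isinstance(value, (list, tuple)):
            # List-valued key such as encoder_outputs: one tensor per Transformer block.
            summary[key] = [tuple(t.shape) for t in value]
        else:
            # Single-tensor key such as pooled_output or sequence_output.
            summary[key] = tuple(value.shape)
    return summary
```

Called on the dictionary returned by an encoder, this should report a single shape for pooled_output and sequence_output and a list of identical shapes for encoder_outputs, one per block.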

Accessing Intermediate Layers With hub.KerasLayer

For current TF Hub BERT models, use the preprocessing model and encoder model together.

python
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_text  # noqa: F401  (registers the custom ops the preprocessing model needs)

preprocess = hub.KerasLayer(
    "https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3"
)
encoder = hub.KerasLayer(
    "https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-4_H-512_A-8/1",
    trainable=False,
)

text = tf.constant(["TensorFlow Hub makes BERT reusable."])
encoder_inputs = preprocess(text)  # dict of input_word_ids, input_mask, input_type_ids
outputs = encoder(encoder_inputs)

print(outputs.keys())
print(outputs["pooled_output"].shape)       # [batch_size, hidden_size]
print(outputs["sequence_output"].shape)     # [batch_size, seq_length, hidden_size]
print(len(outputs["encoder_outputs"]))      # one entry per Transformer block
print(outputs["encoder_outputs"][0].shape)  # same shape as sequence_output

The important line is outputs["encoder_outputs"]. That is how you access the hidden states from all Transformer layers when the SavedModel exports them.

Choosing the Right Output

Different tasks call for different levels of output.

Use pooled_output when you need one embedding for the entire sequence.

Use sequence_output when you need the final contextual embedding of each token, for example in token labeling or custom pooling.
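
As one illustration of custom pooling, the sketch below averages token embeddings while ignoring padding positions. The function name masked_mean_pool and the demo tensors are assumptions made for this example; with a real model you would pass outputs["sequence_output"] and the input_mask entry produced by the preprocessing model.

```python
import tensorflow as tf

def masked_mean_pool(sequence_output, input_mask):
    """Average token embeddings over real tokens only, ignoring padding."""
    # input_mask has shape [batch, seq_len]: 1 for real tokens, 0 for padding.
    mask = tf.cast(input_mask, sequence_output.dtype)[:, :, tf.newaxis]
    summed = tf.reduce_sum(sequence_output * mask, axis=1)
    # Guard against division by zero if a row contains only padding.
    counts = tf.maximum(tf.reduce_sum(mask, axis=1), 1.0)
    return summed / counts

# Stand-in tensors so the sketch runs without downloading a model.
demo_sequence = tf.constant([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
demo_mask = tf.constant([[1, 1, 0]])  # third position is padding
demo_pooled = masked_mean_pool(demo_sequence, demo_mask)
```

Note that the mask still covers the special [CLS] and [SEP] tokens, so they contribute to the average unless you mask them out separately.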

Use encoder_outputs when you want to inspect or combine hidden states from specific BERT layers. This is common in probing, interpretability work, and experiments that concatenate or average selected hidden layers.

For example, to get the second Transformer block output:

python
second_block = outputs["encoder_outputs"][1]
print(second_block.shape)

To compare the last block with the final sequence output:

python
last_block = outputs["encoder_outputs"][-1]
final_output = outputs["sequence_output"]
print(tf.reduce_all(tf.equal(last_block, final_output)))

What If the Hub Layer Does Not Expose Internals

This is the main limitation of TF Hub packaging. A hub.KerasLayer wraps a SavedModel as one layer. You usually do not get a restorable internal Keras graph that you can traverse layer by layer with ordinary model.layers inspection.

So there are two separate notions of "intermediate output":

  • outputs intentionally exported by the Hub model, such as encoder_outputs
  • internal sublayers inside the original model architecture

The first is accessible through the returned output dictionary. The second is often not directly inspectable from the Hub wrapper.

If you need true internal-layer surgery, custom heads inserted between blocks, or per-sublayer inspection, you may be better off loading a TensorFlow Model Garden BERT encoder directly instead of the TF Hub wrapper.

A Practical Feature-Extraction Example

Suppose you want to average the top four layer outputs for each token instead of using only the final layer.

python
hidden_states = outputs["encoder_outputs"]
stacked = tf.stack(hidden_states[-4:], axis=0)
averaged = tf.reduce_mean(stacked, axis=0)
print(averaged.shape)

This produces a tensor with the same token-level shape as sequence_output, but the values now represent an average across multiple high-level layers.

That pattern is common in representation experiments because different BERT layers encode different kinds of information.
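
Averaging is only one way to combine layers. The sketch below shows two alternatives: concatenating the last few block outputs along the feature axis, and a weighted sum over all blocks. Both function names are hypothetical, the weights are fixed values chosen for illustration rather than learned parameters, and the demo tensors stand in for a real encoder_outputs list.

```python
import tensorflow as tf

def concat_last_layers(hidden_states, num_layers=4):
    """Concatenate the last `num_layers` block outputs along the feature axis."""
    return tf.concat(hidden_states[-num_layers:], axis=-1)

def weighted_layer_sum(hidden_states, weights):
    """Combine all block outputs using fixed, softmax-normalized weights."""
    stacked = tf.stack(hidden_states, axis=0)  # [layers, batch, seq, hidden]
    norm = tf.nn.softmax(tf.constant(weights, dtype=stacked.dtype))
    # Contract the layer axis: result has shape [batch, seq, hidden].
    return tf.tensordot(norm, stacked, axes=1)

# Stand-in hidden states: 4 blocks, batch of 2, 5 tokens, hidden size 8.
demo_states = [tf.fill([2, 5, 8], float(i)) for i in range(4)]
demo_concat = concat_last_layers(demo_states, num_layers=2)
demo_mixed = weighted_layer_sum(demo_states, weights=[0.0, 0.0, 0.0, 0.0])
```

Concatenation multiplies the feature dimension by the number of layers used, so any downstream layer must be sized accordingly. The weighted sum keeps the original hidden size, and with equal weights it reduces to a plain average.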

Legacy TF1 hub.Module Context

Older code samples may use the legacy TF1 hub.Module API rather than hub.KerasLayer. The exact calling pattern differs, but the conceptual rule is the same: check what the module exports. If the signature does not expose intermediate outputs, you cannot assume they are available just because the original model had them internally.

So if you are maintaining legacy code, inspect the module's documented outputs before trying to index hidden states.

Common Pitfalls

A common mistake is expecting sequence_output to be a list of all hidden layers. It is only the final token-level output.

Another issue is trying to inspect encoder.layers after loading TF Hub BERT as a single hub.KerasLayer. The internal Transformer blocks are usually not surfaced that way.

Be careful to pair the encoder with the correct preprocessing model. BERT input packing must match what the encoder expects.

Finally, do not assume every TF Hub text encoder exports encoder_outputs. Check the returned keys on the actual model you loaded.
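
One way to make that check explicit is a small guard around the lookup. This is a minimal sketch: get_block_output is a hypothetical helper, and it assumes only that the encoder returns a dictionary and that encoder_outputs, when present, is an indexable sequence.

```python
def get_block_output(outputs, block_index):
    """Return the output of one Transformer block, or raise a descriptive error."""
    if "encoder_outputs" not in outputs:
        raise KeyError(
            "This model does not export 'encoder_outputs'. "
            f"Available keys: {sorted(outputs.keys())}"
        )
    hidden_states = outputs["encoder_outputs"]
    num_blocks = len(hidden_states)
    # Accept negative indices the same way Python lists do.
    if not -num_blocks <= block_index < num_blocks:
        raise IndexError(
            f"block_index {block_index} is out of range for {num_blocks} blocks"
        )
    return hidden_states[block_index]
```

Failing early with the list of available keys is usually easier to debug than a bare KeyError raised deep inside downstream code.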

Summary

  • TF Hub BERT encoders typically return pooled_output, sequence_output, and sometimes encoder_outputs.
  • encoder_outputs is the key to intermediate Transformer-block outputs when it is exported.
  • sequence_output is only the final token-level representation.
  • hub.KerasLayer wraps the model as a single layer, so internal Keras sublayers are not usually inspectable.
  • If you need deeper architectural control, use a more direct BERT implementation instead of relying only on TF Hub packaging.
  • Always inspect the actual output keys of the loaded module before building downstream code.
