How to access BERT intermediate layer outputs in TF Hub Module?
Introduction
When you load a BERT encoder from TensorFlow Hub, you usually get a callable module that returns a dictionary of outputs. If you only need sentence classification, pooled_output is often enough. But if you want token-level embeddings or hidden states from each Transformer block, you need to know which output key exposes them and what TF Hub actually preserves.
The Three BERT Output Levels That Matter
Modern TF Hub BERT encoders commonly return a dictionary with keys such as:
- pooled_output
- sequence_output
- encoder_outputs
These correspond to different levels of representation.
pooled_output is a single vector per input sequence, with shape [batch_size, hidden_size]. It is useful for classification tasks.
sequence_output is the final contextual embedding for every token position. Its shape is typically [batch_size, seq_length, hidden_size].
encoder_outputs is the most interesting one for intermediate-layer access. It is a list-like structure containing the output of each Transformer block. The last element matches the final sequence_output.
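As a rough sketch of the shapes involved, the helper below lists what each key should hold for a given configuration. The function name and the example numbers (a batch of 2, 128 tokens, hidden size 768, and 12 blocks, as in a BERT-base encoder) are illustrative assumptions; a real model's shapes should be read from the returned tensors themselves.

```python
def expected_shapes(batch_size, seq_length, hidden_size, num_blocks):
    """Return the shapes described above for each output key (illustrative helper)."""
    token_level = [batch_size, seq_length, hidden_size]
    return {
        # One vector per input sequence.
        "pooled_output": [batch_size, hidden_size],
        # One vector per token position, taken from the final block.
        "sequence_output": token_level,
        # One token-level tensor per Transformer block.
        "encoder_outputs": [list(token_level) for _ in range(num_blocks)],
    }


# Assumed example values: BERT-base sizes with a batch of 2 and 128 tokens.
shapes = expected_shapes(batch_size=2, seq_length=128, hidden_size=768, num_blocks=12)
print(shapes["pooled_output"])         # [2, 768]
print(len(shapes["encoder_outputs"]))  # 12
print(shapes["encoder_outputs"][-1] == shapes["sequence_output"])  # True
```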
Accessing Intermediate Layers With hub.KerasLayer
For current TF Hub BERT models, use the preprocessing model and encoder model together.
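A minimal sketch of that pairing follows. The two tfhub.dev handles are an assumed example (the English uncased BERT-base encoder and its preprocessing model); substitute whichever matching pair you actually use. The function names are hypothetical, and the imports sit inside the function so that nothing is loaded or downloaded until it is called.

```python
# Assumed example handles; any matching preprocessing/encoder pair should work.
PREPROCESS_URL = "https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3"
ENCODER_URL = "https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/3"


def run_bert(sentences):
    """Preprocess raw strings and return the encoder's output dictionary."""
    # Imported here so that defining this function does not start TensorFlow
    # or download anything until it is actually called.
    import tensorflow as tf
    import tensorflow_hub as hub
    import tensorflow_text  # noqa: F401  registers ops used by the preprocessing model

    preprocess = hub.KerasLayer(PREPROCESS_URL)
    encoder = hub.KerasLayer(ENCODER_URL, trainable=False)

    # The preprocessing model turns strings into the packed inputs BERT expects.
    encoder_inputs = preprocess(tf.constant(sentences))
    outputs = encoder(encoder_inputs)
    return outputs


def hidden_states(outputs):
    """Return the per-block hidden states, if the model exports them."""
    return outputs["encoder_outputs"]
```

Calling run_bert(["TF Hub makes BERT easy to load."]) and passing the result to hidden_states should give one tensor per Transformer block, assuming the loaded encoder exports encoder_outputs.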
The key expression is outputs["encoder_outputs"], where outputs is the dictionary returned by the encoder. That is how you access the hidden states from all Transformer layers when the SavedModel exports them.
Choosing the Right Output
Different tasks call for different levels of output.
Use pooled_output when you need one embedding for the entire sequence.
Use sequence_output when you need the final contextual embedding of each token, for example in token labeling or custom pooling.
Use encoder_outputs when you want to inspect or combine hidden states from specific BERT layers. This is common in probing, interpretability work, and experiments that concatenate or average selected hidden layers.
For example, the second Transformer block's output is outputs["encoder_outputs"][1], since the list is zero-indexed.
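The sketch below wraps that lookup in a hypothetical helper that counts blocks from 1. It uses a plain dictionary of strings as a stand-in for the real encoder output, so it runs without loading a model.

```python
def block_output(outputs, block_number):
    """Return the output of a Transformer block, counting blocks from 1."""
    if block_number < 1:
        raise ValueError("block_number counts from 1")
    return outputs["encoder_outputs"][block_number - 1]


# Stand-in for the encoder's output dictionary (strings instead of tensors).
fake_outputs = {"encoder_outputs": ["block_1", "block_2", "block_3"]}

second_block = block_output(fake_outputs, 2)
print(second_block)  # block_2
```

With a real output dictionary the same call returns a tensor of shape [batch_size, seq_length, hidden_size].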
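To compare the last block with the final sequence output, check outputs["encoder_outputs"][-1] against outputs["sequence_output"]. The two should hold the same values.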
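One way to run that check is sketched below. To stay runnable without a model download it compares nested Python lists; a TensorFlow eager tensor is converted through its numpy() method first. The helper names and the tolerance are assumptions.

```python
def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two same-shaped nested lists."""
    if isinstance(a, (list, tuple)):
        return max(max_abs_diff(x, y) for x, y in zip(a, b))
    return abs(a - b)


def as_nested_list(value):
    """Convert an eager tensor (anything with .numpy()) to nested lists."""
    return value.numpy().tolist() if hasattr(value, "numpy") else value


def last_block_matches_final(outputs, tolerance=1e-6):
    """Check that the last encoder block equals sequence_output within tolerance."""
    last_block = as_nested_list(outputs["encoder_outputs"][-1])
    final = as_nested_list(outputs["sequence_output"])
    return max_abs_diff(last_block, final) <= tolerance


# Stand-in values shaped [batch=1, seq_length=2, hidden_size=2], two blocks.
fake_outputs = {
    "encoder_outputs": [
        [[[0.0, 0.1], [0.2, 0.3]]],
        [[[1.0, 1.1], [1.2, 1.3]]],
    ],
    "sequence_output": [[[1.0, 1.1], [1.2, 1.3]]],
}
print(last_block_matches_final(fake_outputs))  # True
```

When TensorFlow is already imported, the difference can also be computed directly on the tensors with tf.reduce_max(tf.abs(last_block - final)).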
What If the Hub Layer Does Not Expose Internals
This is the main limitation of TF Hub packaging. A hub.KerasLayer wraps a SavedModel as one layer. You usually do not get a restorable internal Keras graph that you can traverse layer by layer with ordinary model.layers inspection.
So there are two separate notions of "intermediate output":
- outputs intentionally exported by the Hub model, such as encoder_outputs
- internal sublayers inside the original model architecture
The first is accessible through the returned output dictionary. The second is often not directly inspectable from the Hub wrapper.
If you need true internal-layer surgery, custom heads inserted between blocks, or per-sublayer inspection, you may be better off loading a TensorFlow Model Garden BERT encoder directly instead of the TF Hub wrapper.
A Practical Feature-Extraction Example
Suppose you want to average the top four layer outputs for each token instead of using only the final layer.
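A minimal sketch of that averaging, assuming encoder_outputs is exported and has at least four entries. It relies only on + and /, which TensorFlow tensors support element-wise, so the demo can use plain floats as stand-ins for the per-layer tensors. The helper name is hypothetical.

```python
def average_top_layers(outputs, num_layers=4):
    """Element-wise mean of the last num_layers entries of encoder_outputs."""
    layers = outputs["encoder_outputs"]
    if len(layers) < num_layers:
        raise ValueError(f"need {num_layers} layers, model exports {len(layers)}")
    top = layers[-num_layers:]
    # With tensors this is equivalent to
    # tf.reduce_mean(tf.stack(top, axis=0), axis=0).
    return sum(top) / len(top)


# Stand-in: one float per layer instead of a [batch, seq, hidden] tensor.
fake_outputs = {"encoder_outputs": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]}
print(average_top_layers(fake_outputs))  # 4.5
```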
This produces a tensor with the same token-level shape as sequence_output, but the values now represent an average across multiple high-level layers.
That pattern is common in representation experiments because different BERT layers encode different kinds of information.
Legacy TF1 hub.Module Context
Older code samples may use the legacy TF1 hub.Module API rather than hub.KerasLayer. The exact calling pattern differs, but the conceptual rule is the same: check what the module exports. If the signature does not expose intermediate outputs, you cannot assume they are available just because the original model had them internally.
So if you are maintaining legacy code, inspect the module's documented outputs before trying to index hidden states.
Common Pitfalls
A common mistake is expecting sequence_output to be a list of all hidden layers. It is only the final token-level output.
Another issue is trying to inspect encoder.layers after loading TF Hub BERT as a single hub.KerasLayer. The internal Transformer blocks are usually not surfaced that way.
Be careful to pair the encoder with the correct preprocessing model. BERT input packing must match what the encoder expects.
Finally, do not assume every TF Hub text encoder exports encoder_outputs. Check the returned keys on the actual model you loaded.
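A small guard along these lines can make that check explicit. The helper name is hypothetical, and the dictionary below is a stand-in for what a loaded encoder returns.

```python
def require_output(outputs, key):
    """Return outputs[key], or fail with a message listing the exported keys."""
    if key not in outputs:
        available = ", ".join(sorted(outputs))
        raise KeyError(f"{key!r} is not exported; available keys: {available}")
    return outputs[key]


# Stand-in for an encoder that exports only the two final-layer outputs.
fake_outputs = {"pooled_output": "pooled", "sequence_output": "tokens"}

print(sorted(fake_outputs))  # ['pooled_output', 'sequence_output']
try:
    require_output(fake_outputs, "encoder_outputs")
except KeyError as error:
    # The message names the missing key and the keys that do exist.
    print(error)
```

Failing early with the list of available keys is usually easier to debug than a bare KeyError raised deep inside downstream code.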
Summary
- TF Hub BERT encoders typically return pooled_output, sequence_output, and sometimes encoder_outputs.
- encoder_outputs is the key to intermediate Transformer-block outputs when it is exported.
- sequence_output is only the final token-level representation.
- hub.KerasLayer wraps the model as a single layer, so internal Keras sublayers are not usually inspectable.
- If you need deeper architectural control, use a more direct BERT implementation instead of relying only on TF Hub packaging.
- Always inspect the actual output keys of the loaded module before building downstream code.

