Accessing the Filename from a File Queue in TensorFlow

Introduction

If you are using TensorFlow's old queue-based input pipeline, the filename usually comes back as part of the key returned by the reader. In modern TensorFlow, the cleaner answer is to use tf.data and carry the file path through the dataset directly.

Old Queue-Based Input Pipelines

Older TensorFlow code often looked like this:

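```python
import tensorflow as tf

# Queue-based readers only work in graph mode, so turn eager execution off.
tf.compat.v1.disable_eager_execution()

filename_queue = tf.compat.v1.train.string_input_producer(
    ["file1.txt", "file2.txt"],
    num_epochs=1,
    shuffle=False
)

reader = tf.compat.v1.TextLineReader()
# key identifies the record (including its source file); value is the line.
key, value = reader.read(filename_queue)
```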

Here, key is not just a record counter: it identifies the record and includes the name of the file it came from. With TextLineReader, the key has the form filename:line_number. So if the question is "how do I get the filename from the file queue," the practical answer is: inspect the key tensor returned by the reader.

Example in a Session

```python
import tensorflow as tf

# Sessions and queue runners require graph mode.
tf.compat.v1.disable_eager_execution()

filename_queue = tf.compat.v1.train.string_input_producer(
    ["file1.txt"],
    num_epochs=1,
    shuffle=False
)

reader = tf.compat.v1.TextLineReader()
key, value = reader.read(filename_queue)

with tf.compat.v1.Session() as sess:
    # num_epochs is tracked in a local variable, so local variables
    # must be initialized before the queue is used.
    sess.run([
        tf.compat.v1.local_variables_initializer(),
        tf.compat.v1.global_variables_initializer()
    ])

    coord = tf.compat.v1.train.Coordinator()
    threads = tf.compat.v1.train.start_queue_runners(sess=sess, coord=coord)

    try:
        while True:
            # Fetch key and value in one run call so they describe the same record.
            k, v = sess.run([key, value])
            print("key:", k.decode("utf-8"))
            print("value:", v.decode("utf-8"))
    except tf.errors.OutOfRangeError:
        # Raised once the queue is exhausted after the final epoch.
        pass
    finally:
        coord.request_stop()
        coord.join(threads)
```

With TextLineReader, each printed key has the form filename:line_number, for example file1.txt:2.
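If only the filename is needed, the key can be split after it has been fetched from the session. The helper below is a minimal sketch, assuming the filename:line_number layout that TextLineReader produces; the name split_reader_key is illustrative and not part of TensorFlow, and other readers may format their keys differently, so the function falls back to returning the whole key.

```python
def split_reader_key(key):
    """Split a TextLineReader-style key into (filename, line_number).

    Assumes the key looks like "filename:line_number". If it does not,
    the whole key is returned as the filename and the position is None.
    """
    text = key.decode("utf-8") if isinstance(key, bytes) else key
    # Split on the LAST colon so that paths containing colons
    # (for example "gs://bucket/file1.txt:7") keep their prefix intact.
    filename, sep, position = text.rpartition(":")
    if not sep or not position.isdigit():
        # Not the expected shape: treat the whole key as the filename.
        return text, None
    return filename, int(position)


print(split_reader_key(b"file1.txt:3"))  # ('file1.txt', 3)
```

Splitting on the last colon rather than the first is a deliberate choice: it keeps the helper tolerant of paths that contain colons themselves.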

Modern tf.data Approach

Queue runners are legacy TensorFlow. The modern replacement is tf.data, where the filename is naturally part of the dataset:

```python
import tensorflow as tf

files = tf.data.Dataset.list_files("*.txt", shuffle=False)

for path in files.take(3):
    print(path.numpy().decode("utf-8"))
```

If you want both path and file contents, map a function that keeps the path:

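```python
# `files` is the dataset of paths created in the previous snippet.
def load_file(path):
    # Read the whole file and return it together with its path.
    text = tf.io.read_file(path)
    return path, text

dataset = files.map(load_file)
```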
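For line-oriented files, which is what the old TextLineReader handled, the same idea can be applied per line. The following is a rough sketch rather than a definitive recipe: it assumes tf.data.TextLineDataset and flat_map are a suitable stand-in for the queue-based reader, and the helper names lines_with_path and build_line_dataset are made up for illustration.

```python
import tensorflow as tf


def lines_with_path(path):
    # Read one file line by line and pair every line with its source path.
    return tf.data.TextLineDataset(path).map(lambda line: (path, line))


def build_line_dataset(pattern):
    # List matching files in a deterministic order, then expand each file
    # into (path, line) records.
    files = tf.data.Dataset.list_files(pattern, shuffle=False)
    return files.flat_map(lines_with_path)
```

Iterating over build_line_dataset("*.txt") then yields (path, line) pairs of string tensors, so the path never has to be recovered from a key.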

Another advantage of tf.data is that the path remains an ordinary tensor in the pipeline. You do not have to reverse-engineer it from a queue-reader key, and you can pass it through batching, mapping, and debugging steps much more explicitly than in queue-runner code.

When to Keep the Filename

The filename is often useful for:

  • debugging bad examples
  • grouping records by source file
  • writing predictions back with source context
  • applying file-specific parsing logic

So it is often worth keeping the filename in the pipeline rather than discarding it immediately.

This is especially true when training data comes from many shards or mixed data sources. Once a bad example appears, the filename can be the fastest way to trace that example back to the upstream generator or raw file that produced it.

The same path information is useful when writing validation reports, counting examples per source file, or skipping known-bad shards temporarily during investigation.

This kind of provenance matters more as datasets grow, so it pays to keep the source filename attached all the way through processing.
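As one illustration of these uses, the sketch below counts examples per source file and ignores a set of known-bad shards. It assumes the pipeline yields (path, record) pairs, for instance from iterating a tf.data dataset as NumPy values, where string paths arrive as bytes; the function names and shard names are hypothetical.

```python
from collections import Counter


def _as_text(path):
    # Paths coming out of tf.data as NumPy values are bytes; accept str too.
    return path.decode("utf-8") if isinstance(path, bytes) else str(path)


def count_examples_per_file(pairs, skip_paths=()):
    """Count records per source path, ignoring any path in skip_paths."""
    skipped = {_as_text(p) for p in skip_paths}
    counts = Counter()
    for path, _record in pairs:
        name = _as_text(path)
        if name not in skipped:
            counts[name] += 1
    return counts


# Hypothetical (path, record) pairs standing in for pipeline output.
pairs = [
    (b"shard-0.txt", b"a"),
    (b"shard-0.txt", b"b"),
    (b"shard-1.txt", b"c"),
]
print(count_examples_per_file(pairs, skip_paths=["shard-1.txt"]))
# Counter({'shard-0.txt': 2})
```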

Common Pitfalls

  • Looking for the filename in the queue object instead of the reader output.
  • Forgetting that queue-runner code is legacy and more awkward than tf.data.
  • Parsing the old key format too rigidly instead of treating it as reader metadata.
  • Dropping the path early and then losing traceability during preprocessing.
  • Mixing eager-style TensorFlow with old session-and-queue code without the tf.compat.v1 wrappers and tf.compat.v1.disable_eager_execution().

Summary

  • In old queue-based TensorFlow, get the filename from the reader's key.
  • The key usually includes filename and position metadata.
  • In modern TensorFlow, prefer tf.data and carry the path explicitly.
  • Keeping filenames in the pipeline helps debugging and provenance.
  • If you are starting new code, use tf.data instead of queue runners.
