Running Multiple TensorFlow Sessions Concurrently

Introduction

Running multiple TensorFlow sessions concurrently is useful for parallel model training, hyperparameter search, and serving multiple models. In TensorFlow 1.x, each tf.Session can run in its own thread or process, typically with its own graph. In TensorFlow 2.x, eager execution is the default and tf.Session survives only as tf.compat.v1.Session, but you can still run multiple models concurrently using Python threading, multiprocessing, or tf.distribute strategies. GPU memory management is the main challenge: by default each TensorFlow process tries to reserve nearly all GPU memory, so a second process on the same GPU often fails with an out-of-memory error unless memory use is configured.

TensorFlow 1.x: Multiple Sessions

python
import threading

import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

# Session 1: simple computation
graph1 = tf.Graph()
with graph1.as_default():
    a = tf.constant(2.0)
    b = tf.constant(3.0)
    result1 = tf.multiply(a, b)

# Session 2: different computation
graph2 = tf.Graph()
with graph2.as_default():
    x = tf.constant(10.0)
    y = tf.constant(5.0)
    result2 = tf.add(x, y)

# Run concurrently with threads; each thread opens its own session
results = {}

def run_session(graph, op, name):
    with tf.Session(graph=graph) as sess:
        results[name] = sess.run(op)

t1 = threading.Thread(target=run_session, args=(graph1, result1, "multiply"))
t2 = threading.Thread(target=run_session, args=(graph2, result2, "add"))

t1.start()
t2.start()
t1.join()
t2.join()

# 'multiply' maps to 6.0 and 'add' to 15.0; key order follows thread completion
print(results)

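Giving each session its own tf.Graph keeps the computations fully isolated, which is the simplest arrangement. Sharing is also possible: several sessions can run the same graph, and Session.run is safe to call from multiple threads. What is not thread-safe is graph construction, so build each graph from a single thread (or guard construction with a lock) before the worker threads start.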

GPU Memory Configuration

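By default, TensorFlow reserves nearly all memory on every visible GPU for the first process that initializes it. To let several sessions or processes share a GPU, either enable memory growth or cap the memory each one may use. The options below are alternatives; choose one rather than applying them together: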

python
import tensorflow as tf

# Options 1 and 3 must run before TensorFlow initializes the GPU,
# i.e. before any op or model is created. Use one option, not all three.
gpus = tf.config.list_physical_devices('GPU')

# Option 1: Allow memory growth (allocate as needed)
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)

# Option 2: Limit memory per session (TF 1.x)
# import tensorflow.compat.v1 as tf1
# config = tf1.ConfigProto()
# config.gpu_options.per_process_gpu_memory_fraction = 0.4  # 40% of GPU memory
# session = tf1.Session(config=config)

# Option 3: Set memory limit (TF 2.x)
# if gpus:
#     tf.config.set_logical_device_configuration(
#         gpus[0],
#         [tf.config.LogicalDeviceConfiguration(memory_limit=2048)]  # 2048 MB = 2 GB
#     )
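
When several processes share one GPU, the cap given to each has to be chosen so that the caps together still fit on the card. Below is a minimal sketch of that arithmetic, assuming equal shares and some memory held back for CUDA itself. The helper names per_process_fraction and per_process_limit_mb and the default reserve values are illustrative assumptions, not TensorFlow APIs.

```python
def per_process_fraction(n_procs, headroom=0.1):
    """Equal share of GPU memory per process, leaving `headroom` unallocated.

    `headroom` is an assumed safety margin, not a TensorFlow setting.
    """
    if n_procs < 1:
        raise ValueError("n_procs must be >= 1")
    if not 0.0 <= headroom < 1.0:
        raise ValueError("headroom must be in [0, 1)")
    return (1.0 - headroom) / n_procs


def per_process_limit_mb(total_mb, n_procs, reserve_mb=512):
    """Equal memory limit in MB per process after holding back `reserve_mb` in total."""
    usable = total_mb - reserve_mb
    if n_procs < 1 or usable <= 0:
        raise ValueError("need n_procs >= 1 and total_mb > reserve_mb")
    return usable // n_procs


# Two processes on one GPU, keeping 10% free
print(per_process_fraction(2))
# Three processes on a 16 GB card, holding back 512 MB
print(per_process_limit_mb(16384, 3))
```

The fraction is the kind of value that goes into per_process_gpu_memory_fraction, and the MB figure into memory_limit. Real workloads rarely need equal shares, so treat these numbers as a starting point.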

TensorFlow 2.x: Concurrent Model Training

python
import threading

import numpy as np
import tensorflow as tf

def train_model(model_name, units, epochs):
    # Each thread builds, trains, and evaluates its own model
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(10,)),
        tf.keras.layers.Dense(units, activation='relu'),
        tf.keras.layers.Dense(1)
    ])
    model.compile(optimizer='adam', loss='mse')

    # Generate dummy data
    X = np.random.randn(1000, 10).astype('float32')
    y = np.random.randn(1000, 1).astype('float32')

    model.fit(X, y, epochs=epochs, verbose=0)
    loss = model.evaluate(X, y, verbose=0)
    print(f"{model_name}: loss = {loss:.4f}")

# Train two models concurrently
t1 = threading.Thread(target=train_model, args=("ModelA", 32, 10))
t2 = threading.Thread(target=train_model, args=("ModelB", 64, 10))

t1.start()
t2.start()
t1.join()
t2.join()
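
The threads above only print their results. If the losses are needed afterwards, one option is concurrent.futures.ThreadPoolExecutor from the standard library, which hands each worker's return value back to the caller. This is a minimal sketch: run_concurrently and train_and_score are hypothetical names, and the model simply mirrors the one used in this section.

```python
from concurrent.futures import ThreadPoolExecutor


def run_concurrently(fn, arg_tuples, max_workers=None):
    """Call fn(*args) for each tuple in worker threads; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(fn, *args) for args in arg_tuples]
        # result() blocks until the task finishes and re-raises any worker exception
        return [f.result() for f in futures]


def train_and_score(model_name, units, epochs):
    """Train a small Keras model on random data and return (name, loss)."""
    import numpy as np
    import tensorflow as tf  # imported here so run_concurrently has no TF dependency

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(10,)),
        tf.keras.layers.Dense(units, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")

    X = np.random.randn(1000, 10).astype("float32")
    y = np.random.randn(1000, 1).astype("float32")

    model.fit(X, y, epochs=epochs, verbose=0)
    return model_name, float(model.evaluate(X, y, verbose=0))


# Usage (requires TensorFlow):
# results = run_concurrently(train_and_score,
#                            [("ModelA", 32, 10), ("ModelB", 64, 10)])
```

Returning values through futures avoids a shared dictionary and surfaces exceptions raised inside a worker, which a bare threading.Thread would otherwise only print to stderr.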

Multiprocessing for True Parallelism

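Python's GIL lets only one thread execute Python bytecode at a time. TensorFlow releases the GIL while its kernels run, so threaded training does overlap, but Python-side work such as data preparation still runs one thread at a time. When that work dominates, or when each job needs an isolated runtime, use multiprocessing: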

python
import multiprocessing as mp

import numpy as np

def train_in_process(config):
    # Import inside the worker so each process initializes its own TensorFlow runtime
    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(10,)),
        tf.keras.layers.Dense(config['units'], activation='relu'),
        tf.keras.layers.Dense(1)
    ])
    model.compile(optimizer='adam', loss='mse')

    X = np.random.randn(1000, 10).astype('float32')
    y = np.random.randn(1000, 1).astype('float32')

    model.fit(X, y, epochs=config['epochs'], verbose=0)
    loss = model.evaluate(X, y, verbose=0)
    return f"{config['name']}: loss = {loss:.4f}"

if __name__ == '__main__':
    configs = [
        {'name': 'small', 'units': 16, 'epochs': 5},
        {'name': 'medium', 'units': 64, 'epochs': 10},
        {'name': 'large', 'units': 128, 'epochs': 15},
    ]

    # "spawn" starts every worker in a fresh interpreter; forking a parent
    # that has already initialized TensorFlow is not safe
    with mp.get_context('spawn').Pool(3) as pool:
        results = pool.map(train_in_process, configs)

    for r in results:
        print(r)

Each process has its own Python interpreter and TensorFlow runtime, avoiding GIL contention.
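
One side effect of separate runtimes is that each process sizes its CPU thread pools on its own, so several workers on one machine can end up with more threads than cores. Here is a minimal sketch of splitting cores between workers, assuming an even split is acceptable. threads_per_worker and limit_tf_threads are hypothetical helper names; the tf.config.threading setters are TensorFlow's own and must be called before the process runs any op.

```python
import os


def threads_per_worker(n_workers, n_cores=None):
    """Split the available cores evenly across workers, at least one each."""
    if n_workers < 1:
        raise ValueError("n_workers must be >= 1")
    cores = n_cores if n_cores is not None else (os.cpu_count() or 1)
    return max(1, cores // n_workers)


def limit_tf_threads(intra_op, inter_op=1):
    """Cap TensorFlow's CPU thread pools in the current process.

    Call this at the start of a worker, before any TensorFlow op runs.
    """
    import tensorflow as tf  # imported here so the helper above needs no TF

    # Threads used inside a single op (for example one large matmul)
    tf.config.threading.set_intra_op_parallelism_threads(intra_op)
    # Threads used to run independent ops side by side
    tf.config.threading.set_inter_op_parallelism_threads(inter_op)


# Three workers on an 8-core machine get 2 threads each
print(threads_per_worker(3, n_cores=8))
```

A worker such as train_in_process could call limit_tf_threads(threads_per_worker(3)) as its first statement. Whether this helps depends on the model and the machine, so measure before adopting it.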

Using CUDA_VISIBLE_DEVICES

Assign specific GPUs to each process:

python
import os
import multiprocessing as mp

def train_on_gpu(gpu_id, model_config):
    # Restrict this process to one GPU before TensorFlow is imported
    os.environ['CUDA_VISIBLE_DEVICES'] = str(gpu_id)
    import tensorflow as tf
    # This process now only sees the assigned GPU
    model = tf.keras.Sequential([...])  # layers elided; build them from model_config
    model.fit(...)                      # training arguments elided

if __name__ == '__main__':
    # config_a and config_b are the caller's model settings, defined elsewhere
    p1 = mp.Process(target=train_on_gpu, args=(0, config_a))
    p2 = mp.Process(target=train_on_gpu, args=(1, config_b))
    p1.start()
    p2.start()
    p1.join()
    p2.join()

Set CUDA_VISIBLE_DEVICES before importing TensorFlow in each process.
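
With more jobs than GPUs, the same idea extends to a round-robin assignment. The sketch below is one way to do it, assuming the jobs are independent. assign_gpus, worker, and launch are hypothetical names, and the worker only lists its visible devices instead of training a model.

```python
import multiprocessing as mp
import os


def assign_gpus(n_jobs, gpu_ids):
    """Round-robin jobs over the given GPU ids; returns one id per job."""
    if not gpu_ids:
        raise ValueError("gpu_ids must not be empty")
    return [gpu_ids[i % len(gpu_ids)] for i in range(n_jobs)]


def worker(gpu_id, job_name):
    # Set the variable before TensorFlow is imported in this process
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    import tensorflow as tf

    # Only the assigned card is visible here; TensorFlow reports it as GPU:0
    print(job_name, tf.config.list_physical_devices("GPU"))


def launch(job_names, gpu_ids):
    """Start one spawned process per job and return the exit codes."""
    ctx = mp.get_context("spawn")  # fresh interpreter for every child
    gpus = assign_gpus(len(job_names), gpu_ids)
    procs = [ctx.Process(target=worker, args=(gpu, name))
             for gpu, name in zip(gpus, job_names)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return [p.exitcode for p in procs]


# Usage (requires TensorFlow), under a main guard because of "spawn":
# if __name__ == "__main__":
#     print(launch(["a", "b", "c"], gpu_ids=[0, 1]))
```

When two jobs land on the same GPU, as "a" and "c" do in the usage comment, each process still needs memory growth or a memory cap to coexist.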

Common Pitfalls

  • Not configuring GPU memory: TensorFlow reserves nearly all GPU memory by default. Running two processes on the same GPU without set_memory_growth(gpu, True) or per_process_gpu_memory_fraction can make the second fail with CUDA_ERROR_OUT_OF_MEMORY.
  • Building a graph from several threads without synchronization: In TF 1.x, Session.run is thread-safe, but graph construction is not. Build each tf.Graph in a single thread, or give every thread its own tf.Graph and tf.Session, before running operations concurrently.
  • Setting CUDA_VISIBLE_DEVICES too late: The variable is read when CUDA is first initialized in the process, and changing it afterwards has no effect. Setting it before import tensorflow in each subprocess is the reliable way to get it in early enough.
  • Assuming threads run Python code in parallel: Python's GIL serializes Python bytecode. TensorFlow releases the GIL while its kernels execute, so threaded training does overlap, but Python-side work such as data preparation does not. Use multiprocessing when that work dominates or when full isolation is needed.
  • Ignoring TF 2.x eager execution: In TF 2.x, tf.Session is no longer in the top-level namespace, so code that calls tf.Session directly raises AttributeError. Use tf.compat.v1.Session with tf.compat.v1.disable_eager_execution() for legacy code, or restructure to use Keras model.fit() directly.

Summary

  • TF 1.x supports multiple concurrent sessions; giving each its own tf.Graph keeps them isolated
  • TF 2.x uses eager execution by default; run concurrent models with threading or multiprocessing using Keras APIs
  • Configure GPU memory growth with set_memory_growth(gpu, True), or cap memory per process, to prevent OOM errors when sharing a GPU
  • Use multiprocessing when Python-side work is the bottleneck or jobs need isolation, since the GIL serializes Python bytecode across threads
  • Assign GPUs to processes with CUDA_VISIBLE_DEVICES before importing TensorFlow
  • For distributed training across multiple GPUs, prefer tf.distribute.MirroredStrategy over manual session management
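
For reference, a MirroredStrategy setup can be as small as the sketch below. It assumes a single machine and the same toy model and random data used earlier. global_batch_size and train_mirrored are hypothetical names, and the per-replica batch size of 64 is an arbitrary choice.

```python
def global_batch_size(per_replica_batch, num_replicas):
    """Batch size to pass to fit() so each replica receives per_replica_batch samples."""
    return per_replica_batch * num_replicas


def train_mirrored(per_replica_batch=64, epochs=2):
    """Train one model replicated across all visible GPUs and return its loss."""
    import numpy as np
    import tensorflow as tf  # imported here so the helper above needs no TF

    # Uses every visible GPU; with no GPU it runs as a single replica on the CPU
    strategy = tf.distribute.MirroredStrategy()
    batch = global_batch_size(per_replica_batch, strategy.num_replicas_in_sync)

    # Variables must be created inside the strategy scope to be mirrored
    with strategy.scope():
        model = tf.keras.Sequential([
            tf.keras.Input(shape=(10,)),
            tf.keras.layers.Dense(32, activation="relu"),
            tf.keras.layers.Dense(1),
        ])
        model.compile(optimizer="adam", loss="mse")

    X = np.random.randn(1000, 10).astype("float32")
    y = np.random.randn(1000, 1).astype("float32")

    model.fit(X, y, batch_size=batch, epochs=epochs, verbose=0)
    return float(model.evaluate(X, y, verbose=0))


# Usage (requires TensorFlow):
# print(train_mirrored())
```

Unlike the threading and multiprocessing examples, this trains one model on several GPUs rather than several models side by side, so it suits a single large job better than a hyperparameter sweep.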
