Python
list
data structures
programming
memory management

Create a list with initial capacity in Python

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Introduction

Python lists do not have a direct "initial capacity" parameter like Java's ArrayList(capacity). To pre-allocate a list of known size, create it with placeholder values: [None] * n or [0] * n. This avoids repeated memory reallocations during append operations. For the best performance with numeric data, use array.array or NumPy arrays, which pre-allocate contiguous memory blocks. In practice, Python's over-allocation strategy makes append() amortized O(1), so pre-allocation is only necessary for very large lists or performance-critical code.

Pre-allocate with Placeholder Values

python
1# Create a list of 1,000,000 elements initialized to None
2data = [None] * 1_000_000
3
4# Create a list of zeros
5counts = [0] * 256  # One slot per byte value
6
7# Create with a specific default
8flags = [False] * 100
9
10# Then fill in actual values
11for i in range(1_000_000):
12    data[i] = compute(i)

[None] * n allocates a list with exactly n slots in one operation, avoiding incremental growth.

Pre-allocate vs Append Performance

python
1import time
2
3n = 10_000_000
4
5# Method 1: Append (incremental growth)
6start = time.perf_counter()
7lst1 = []
8for i in range(n):
9    lst1.append(i)
10t1 = time.perf_counter() - start
11
12# Method 2: Pre-allocate then assign
13start = time.perf_counter()
14lst2 = [None] * n
15for i in range(n):
16    lst2[i] = i
17t2 = time.perf_counter() - start
18
19# Method 3: List comprehension (fastest pure Python)
20start = time.perf_counter()
21lst3 = [i for i in range(n)]
22t3 = time.perf_counter() - start
23
24# Method 4: list(range()) (fastest overall)
25start = time.perf_counter()
26lst4 = list(range(n))
27t4 = time.perf_counter() - start
28
29print(f"Append:         {t1:.3f}s")
30print(f"Pre-allocate:   {t2:.3f}s")
31print(f"Comprehension:  {t3:.3f}s")
32print(f"list(range):    {t4:.3f}s")
33
34# Typical results:
35# Append:         0.85s
36# Pre-allocate:   0.65s
37# Comprehension:  0.45s
38# list(range):    0.25s

How Python Lists Grow Internally

python
1import sys
2
3# Python over-allocates to make append() amortized O(1)
4lst = []
5prev_size = 0
6for i in range(100):
7    lst.append(i)
8    size = sys.getsizeof(lst)
9    if size != prev_size:
10        print(f"len={len(lst):3d}, allocated bytes={size}, "
11              f"slots~{(size - 56) // 8}")
12        prev_size = size
13
14# Output (CPython 3.11, 64-bit):
15# len=  1, allocated bytes=88,  slots~4
16# len=  5, allocated bytes=120, slots~8
17# len=  9, allocated bytes=184, slots~16
18# len= 17, allocated bytes=248, slots~24
19# len= 26, allocated bytes=312, slots~32
20# ...

Python's growth pattern is roughly new_size = old_size + (old_size >> 3) + 6. This means the list grows by about 12.5% each time it runs out of space, with an amortized O(1) cost per append.

NumPy Arrays for Numeric Data

python
1import numpy as np
2
3# Pre-allocate with specific size and dtype
4arr = np.zeros(1_000_000, dtype=np.float64)    # All zeros
5arr = np.empty(1_000_000, dtype=np.float64)    # Uninitialized (fastest)
6arr = np.ones(1_000_000, dtype=np.int32)       # All ones
7arr = np.full(1_000_000, -1, dtype=np.int32)   # All -1
8
9# 2D arrays
10matrix = np.zeros((1000, 1000), dtype=np.float64)
11
12# NumPy uses contiguous memory — much more efficient than Python lists
13# for numeric computation

NumPy arrays use a fixed amount of memory per element (e.g., 8 bytes for float64). A Python list of floats uses about 28 bytes per float object plus 8 bytes per list pointer — roughly 4x more memory.

array.array for Typed Arrays

python
1from array import array
2
3# Typed array — more memory-efficient than list for numeric data
4# 'i' = signed int, 'd' = double, 'f' = float
5int_array = array('i', [0] * 1_000_000)
6float_array = array('d', [0.0] * 1_000_000)
7
8# Supports append like list
9a = array('i')
10for i in range(100):
11    a.append(i)
12
13# Memory comparison
14import sys
15lst = list(range(1000))
16arr = array('i', range(1000))
17print(f"list:  {sys.getsizeof(lst)} bytes")   # ~8856 bytes
18print(f"array: {sys.getsizeof(arr)} bytes")   # ~4064 bytes

bytearray for Byte Buffers

python
1# Pre-allocate a byte buffer
2buf = bytearray(4096)  # 4KB buffer initialized to zeros
3
4# Read into it
5with open("data.bin", "rb") as f:
6    n = f.readinto(buf)
7
8# Mutable — can modify in place
9buf[0] = 0xFF

When Pre-allocation Matters

python
1# Pre-allocation matters when:
2# 1. You know the exact size upfront
3# 2. The list is very large (millions of elements)
4# 3. You are in a tight loop where reallocation adds measurable overhead
5
6# Pre-allocation does NOT matter when:
7# 1. The list is small (under 10,000 elements)
8# 2. You are building the list from a comprehension or generator
9# 3. The bottleneck is elsewhere (I/O, network, computation)
10
11# Prefer list comprehensions over pre-allocate + assign
12# This is both faster and more Pythonic:
13result = [x ** 2 for x in range(n)]
14
15# Over this:
16result = [None] * n
17for i in range(n):
18    result[i] = i ** 2

Common Pitfalls

  • Using [[]] * n for a list of lists: [[]] * n creates n references to the same list object. Modifying one modifies all: a = [[]] * 3; a[0].append(1) gives [[1], [1], [1]]. Use [[] for _ in range(n)] to create independent lists.
  • Pre-allocating when a list comprehension would be simpler: result = [None] * n followed by a loop is more verbose and often slower than result = [f(i) for i in range(n)]. List comprehensions are optimized in CPython and are the idiomatic way to build lists.
  • Using list.append() in performance-critical NumPy code: Appending to a Python list and converting to NumPy at the end (np.array(lst)) is much slower than pre-allocating a NumPy array with np.empty(n) and filling it with index assignment. The list approach creates Python objects that NumPy must then unbox.
  • Confusing np.empty with np.zeros: np.empty allocates memory without initializing it — the array contains whatever was previously in that memory location. This is faster than np.zeros but dangerous if you forget to fill every element. Use np.zeros unless you are certain every element will be written before it is read.
  • Over-optimizing small lists: For lists under 10,000 elements, the difference between append and pre-allocation is microseconds. Python's over-allocation strategy handles small lists efficiently. Focus optimization efforts on the actual bottleneck, not on pre-allocating a list of 100 items.

Summary

  • Use [value] * n to pre-allocate a Python list with a known size
  • Use list comprehensions ([f(i) for i in range(n)]) for the most Pythonic and often fastest approach
  • Use numpy.zeros(n) or numpy.empty(n) for numeric arrays with fixed memory per element
  • Python lists grow by about 12.5% each time — append() is already amortized O(1)
  • Pre-allocation provides measurable speedup only for very large lists (millions of elements)

Course illustration
Course illustration

All Rights Reserved.