Why null-terminated strings? Or null-terminated vs. characters length storage
Introduction
Null-terminated strings (used in C) store characters followed by a '\0' byte to mark the end. Length-prefixed strings (used in Pascal, and in a pointer-plus-length form in Rust, Go, and most modern languages) store the character count alongside the data. C chose null termination for simplicity: it requires no extra metadata and works naturally with pointer arithmetic. Length-prefixed strings offer O(1) length lookups instead of O(n), can contain embedded null bytes, and make bounds checking possible because the length is always known.
Null-Terminated Strings (C-Style)
The null terminator '\0' (byte value 0) signals the end of the string. Functions like strlen, strcpy, and printf rely on this convention.
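As a minimal sketch of this convention, the C function below walks the bytes until it reaches the terminator, which is essentially what strlen has to do. The name manual_strlen is hypothetical and used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts the bytes before the first '\0', like strlen. */
size_t manual_strlen(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    while (s[n] != '\0') {   /* the only end marker is the zero byte */
        n++;
    }
    return n;
}
```

For example, char greeting[] = "Hello"; occupies 6 bytes ('H', 'e', 'l', 'l', 'o', '\0'), and manual_strlen(greeting) returns 5.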
Length-Prefixed Strings
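A length-prefixed string keeps the byte count next to the data, so reading the length never requires a scan. C has no built-in type for this, so the sketch below assumes a hypothetical struct named lp_string with hypothetical helpers lp_from_bytes, lp_length, and lp_free. It is one possible layout, not how any particular language implements its strings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical length-prefixed string: the count is stored with the data. */
typedef struct {
    size_t len;   /* number of bytes in data; no terminator is needed */
    char  *data;  /* heap-allocated bytes, may contain '\0' */
} lp_string;

/* Copies len bytes from src. Returns 0 on success, -1 if allocation fails. */
int lp_from_bytes(lp_string *out, const char *src, size_t len)
{
    assert(out != NULL && (src != NULL || len == 0));
    out->data = malloc(len > 0 ? len : 1);
    if (out->data == NULL) {
        return -1;
    }
    if (len > 0) {
        memcpy(out->data, src, len);
    }
    out->len = len;
    return 0;
}

/* O(1): the length is read from the struct, not computed by scanning. */
size_t lp_length(const lp_string *s)
{
    assert(s != NULL);
    return s->len;
}

/* Releases the buffer and resets the struct to an empty state. */
void lp_free(lp_string *s)
{
    assert(s != NULL);
    free(s->data);
    s->data = NULL;
    s->len = 0;
}
```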
Why C Chose Null Termination
In 1972 when C was created, memory was extremely limited. Storing a multi-byte length field for every string was considered wasteful, while a 1-byte length field would have capped strings at 255 characters. Null termination uses just 1 extra byte regardless of string length, pointer size, or integer size.
Comparison
| Feature | Null-Terminated | Length-Prefixed |
| --- | --- | --- |
| Length lookup | O(n): scan to \0 | O(1): read stored length |
| Embedded nulls | Not possible | Supported |
| Substring | Pointer offset for suffixes (easy); other substrings need a copy | New struct (copy metadata) |
| Buffer overflow risk | High (no bounds info) | Low (length is known) |
| Memory overhead | 1 byte per string | Typically 4-16 bytes per string |
| Concatenation | O(n+m): find end first | O(m): append at known offset |
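The concatenation row can be illustrated with a rough sketch. Both function names below (append_cstr and append_tracked) are hypothetical. The first must scan the destination to find its end before copying; the second already knows the end offset, so only the copy remains.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Null-terminated append: scan dst to find its end (O(n)), then copy (O(m)).
   cap is the total size of dst in bytes.
   Returns 0 on success, -1 if the result would not fit. */
int append_cstr(char *dst, size_t cap, const char *src)
{
    assert(dst != NULL && src != NULL);
    size_t n = strlen(dst);              /* O(n) scan for the terminator */
    size_t m = strlen(src);
    if (n >= cap || m >= cap - n) {      /* need n + m + 1 <= cap */
        return -1;
    }
    memcpy(dst + n, src, m + 1);         /* copies the terminator too */
    return 0;
}

/* Tracked-length append: *len is the known end offset, so only the
   O(m) copy is needed. No terminator is written or required.
   Returns 0 on success, -1 if the result would not fit. */
int append_tracked(char *buf, size_t *len, size_t cap,
                   const char *src, size_t m)
{
    assert(buf != NULL && len != NULL && src != NULL);
    if (*len > cap || m > cap - *len) {
        return -1;
    }
    memcpy(buf + *len, src, m);
    *len += m;
    return 0;
}
```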
The Buffer Overflow Problem
Because null-terminated strings carry no length information, functions like strcpy and strcat cannot check whether the destination buffer is large enough. This has been the source of countless security vulnerabilities.
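For instance, strcpy(dst, src) with a 4-byte dst and a 6-character src writes past the end of dst, which is undefined behavior. One common mitigation, sketched here under the assumption that the caller knows the destination size, is to pass that size explicitly and let snprintf enforce it. The helper name bounded_copy is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: copies src into dst without writing past
   dst_size bytes. dst is always null-terminated afterwards.
   Returns 1 if the whole string fit, 0 if it was truncated. */
int bounded_copy(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);
    /* snprintf returns the length it would have written without the limit */
    int needed = snprintf(dst, dst_size, "%s", src);
    return needed >= 0 && (size_t)needed < dst_size;
}
```

The caller still has to supply the correct size; the string itself cannot provide it.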
Embedded Null Bytes
Binary data and certain file formats contain null bytes. Null-terminated strings cannot represent this data faithfully, which is why languages that handle binary data use length-prefixed strings.
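A small sketch of the problem: given a buffer with a zero byte in the middle, any strlen-based reader stops early. The hypothetical helper bytes_hidden_by_terminator below reports how many bytes such a reader would never see, assuming the true length is supplied separately.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns how many bytes of a len-byte buffer a
   strlen-based reader would miss (the first '\0' and everything after it).
   Assumes buf contains at least one '\0' within len bytes, so that
   strlen stays in bounds. */
size_t bytes_hidden_by_terminator(const char *buf, size_t len)
{
    assert(buf != NULL && memchr(buf, '\0', len) != NULL);
    return len - strlen(buf);
}
```

With the 5-byte payload {'a', 'b', '\0', 'c', 'd'}, strlen reports 2 and the helper returns 3; code that tracks the length explicitly (for example with memcpy or fwrite) keeps all 5 bytes.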
How Modern Languages Handle Strings
Most modern languages use length-prefixed strings internally. C++ provides both through std::string, which stores its length explicitly and also maintains a null terminator at the end of the data for c_str() compatibility.
Common Pitfalls
- Forgetting the null terminator when allocating C strings: malloc(strlen(s)) allocates one byte too few. You need malloc(strlen(s) + 1) to include space for the '\0'. This off-by-one error is one of the most common C bugs.
- Assuming strlen is O(1): In C, strlen must scan the entire string to find the null terminator. Calling strlen in a loop condition (for (i = 0; i < strlen(s); i++)) can make the loop O(n^2). Store the length in a variable first.
- Using strncpy as a safe strcpy replacement: strncpy does not guarantee null termination. If the source is at least as long as the size limit, the result is not null-terminated. Use snprintf or, where it is available, strlcpy instead.
- Mixing null-terminated and length-prefixed strings in FFI: When calling C libraries from Python, Rust, or Go, you must convert length-prefixed strings to null-terminated format (and vice versa). Forgetting this causes garbage output or crashes at the language boundary.
- Assuming all string operations are O(1) in modern languages: While len() is O(1), operations like concatenation, comparison, and slicing that copies data are still O(n). Length-prefixed strings improve metadata access, not all string operations.
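The first two pitfalls can be sketched in C as follows. Both helper names (duplicate_string and count_char) are hypothetical; the point is the + 1 in the allocation and the single strlen call before the loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of s with room for the
   terminator, or NULL if allocation fails. The caller frees the result. */
char *duplicate_string(const char *s)
{
    assert(s != NULL);
    size_t len = strlen(s);
    char *copy = malloc(len + 1);    /* + 1 for the '\0' */
    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, s, len + 1);        /* copies the terminator too */
    return copy;
}

/* Counts occurrences of ch in s. strlen is called once, so the loop is O(n). */
size_t count_char(const char *s, char ch)
{
    assert(s != NULL);
    size_t len = strlen(s);          /* hoisted out of the loop condition */
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        if (s[i] == ch) {
            count++;
        }
    }
    return count;
}
```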
Summary
- C uses null-terminated strings for simplicity and minimal memory overhead — just 1 extra byte per string
- Length-prefixed strings provide O(1) length lookup, support embedded null bytes, and reduce buffer overflow risk
- Null termination was a reasonable choice in 1972 when memory was scarce, but it has caused decades of security vulnerabilities
- Most modern languages (Python, Java, Go, Rust) use length-prefixed strings
- C++ std::string provides both: length-prefixed storage with a null terminator for C compatibility
- When interfacing with C code (FFI), convert between the two formats explicitly

