
Why null-terminated strings? Null-terminated vs. length-prefixed storage

Introduction

Null-terminated strings (used in C) store the characters followed by a '\0' byte that marks the end. Length-prefixed strings store the character count ahead of the data, as in Pascal. Most modern languages, including Rust and Go, use the closely related approach of keeping the length in a separate field next to a pointer to the data, and this article treats both as length-prefixed. C chose null termination for simplicity: it needs no extra metadata and works naturally with pointer arithmetic. Length-prefixed strings offer O(1) length lookups instead of O(n), can contain embedded null bytes, and make bounds checking possible because the length is always known.

Null-Terminated Strings (C-Style)

c
#include <stdio.h>
#include <string.h>

// Implementation of strlen: walks until '\0' is found, so it is O(n)
size_t my_strlen(const char *s) {
    size_t len = 0;
    while (s[len] != '\0') {
        len++;
    }
    return len;
}

int main(void) {
    // Memory layout: ['H', 'e', 'l', 'l', 'o', '\0']
    char str[] = "Hello";

    // Finding the length requires scanning the entire string
    size_t len = strlen(str);  // O(n): walks until '\0' is found
    printf("%zu %zu\n", len, my_strlen(str));  // prints "5 5"

    // Pointer arithmetic works naturally
    for (const char *ptr = str; *ptr; ptr++) {
        putchar(*ptr);
    }
    putchar('\n');
    return 0;
}

The null terminator '\0' (byte value 0) signals the end of the string. Functions like strlen and strcpy, and printf with the %s format, rely on this convention.

Length-Prefixed Strings

c
#include <stddef.h>  // size_t

// Pascal-style: first byte stores the length
// Memory layout: [5, 'H', 'e', 'l', 'l', 'o']
// Max string length: 255 (single byte prefix)

// Modern approach: length stored as a separate field
struct String {
    size_t length;    // 8 bytes on 64-bit
    size_t capacity;  // allocated buffer size
    char *data;       // pointer to character data
};
rust
fn main() {
    // Rust String: length + capacity + pointer
    let s = String::from("Hello");
    println!("{}", s.len()); // O(1): length is stored

    // Rust &str: pointer + length (a "fat pointer")
    let slice: &str = "Hello";
    // No null terminator needed: length is part of the reference
    println!("{}", slice.len());
}
go
// Go string: pointer + length
s := "Hello"
fmt.Println(len(s))  // O(1) — length is stored in the string header
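
The Pascal-style layout described above can also be sketched in C. This is a minimal illustration, not a real library: the `PString` type and the `pstring_from_c` and `pstring_length` helpers are hypothetical names introduced here, and the fixed 256-byte buffer is an assumption made to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical Pascal-style buffer: byte 0 holds the length,
   bytes 1..255 hold the character data. */
typedef struct {
    unsigned char bytes[256];
} PString;

/* Fills out from a C string.
   Returns 0 on success, -1 if src is longer than 255 bytes. */
int pstring_from_c(PString *out, const char *src) {
    assert(out != NULL && src != NULL);
    size_t len = strlen(src);
    if (len > 255) return -1;          /* does not fit in a 1-byte prefix */
    out->bytes[0] = (unsigned char)len;
    memcpy(out->bytes + 1, src, len);  /* no terminator is stored */
    return 0;
}

/* O(1): the length is read from the prefix, not found by scanning. */
size_t pstring_length(const PString *p) {
    assert(p != NULL);
    return p->bytes[0];
}
```

The 255-byte ceiling is a direct consequence of the 1-byte prefix. A wider prefix raises the limit but adds overhead to every string.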

Why C Chose Null Termination

c
#include <stdio.h>

// 1. Minimal overhead: only 1 extra byte, no separate length field
char name[] = "Alice";  // 6 bytes total (5 chars + '\0')

// 2. Works with pointer arithmetic
void print_string(const char *s) {
    while (*s) putchar(*s++);  // simple pointer-based iteration
}

int main(void) {
    // 3. Suffixes are trivial: just point to a different position
    //    (a substring that ends early still needs a copy or an in-place '\0')
    const char *str = "Hello, World!";
    const char *world = str + 7;  // points to "World!", no copy needed
    print_string(world);
    putchar('\n');

    // 4. No maximum length restriction from the prefix size
    // Pascal strings with a 1-byte prefix are limited to 255 characters
    return 0;
}

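When C was created in the early 1970s, memory was extremely limited, so a per-string length field (even 1-2 bytes) was a real cost, and a small field would also have capped the string length. Dennis Ritchie later wrote that the terminator convention, which C inherited from B, was adopted partly to avoid the length limit imposed by an 8- or 9-bit count and partly because maintaining a count seemed less convenient than using a terminator. Null termination costs just 1 extra byte regardless of pointer or integer size.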

Comparison

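| Feature | Null-Terminated | Length-Prefixed |
| --- | --- | --- |
| Length lookup | O(n): scan to \0 | O(1): read stored length |
| Embedded nulls | Not possible | Supported |
| Substring | Pointer offset (suffixes only; other substrings need a copy) | New struct (copy metadata) |
| Buffer overflow risk | High (no bounds info) | Low (length is known) |
| Memory overhead | 1 byte per string | From 1 byte (Pascal-style prefix) up to 24 bytes (length, capacity, and pointer on 64-bit) |
| Concatenation | O(n+m): find end first | O(m): append at known offset (amortized, if spare capacity is kept) |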
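
The concatenation row can be made concrete with a small sketch. The `Str` struct and `str_append` function below are hypothetical names used only for illustration, and the doubling growth policy is an assumption, not something the comparison prescribes. The point is that the write position is `data + length`, so the existing contents are never scanned.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical length-tracked string; names are illustrative only. */
typedef struct {
    size_t length;    /* bytes currently in use */
    size_t capacity;  /* bytes allocated for data */
    char *data;       /* not null-terminated */
} Str;

/* Appends n bytes from src.
   Returns 0 on success, -1 if memory cannot be obtained. */
int str_append(Str *s, const char *src, size_t n) {
    assert(s != NULL && (src != NULL || n == 0));
    if (n > s->capacity - s->length) {
        /* Grow by doubling (an assumed policy) until n bytes fit. */
        size_t new_cap = s->capacity ? s->capacity : 16;
        while (new_cap - s->length < n) {
            if (new_cap > SIZE_MAX / 2) return -1;  /* doubling would overflow */
            new_cap *= 2;
        }
        char *p = realloc(s->data, new_cap);
        if (p == NULL) return -1;
        s->data = p;
        s->capacity = new_cap;
    }
    if (n > 0) memcpy(s->data + s->length, src, n);  /* write at known offset */
    s->length += n;
    return 0;
}
```

Starting from `Str s = {0, 0, NULL};`, two calls such as `str_append(&s, "Hello", 5)` and `str_append(&s, ", World!", 8)` leave `s.length` at 13. When `realloc` has to move the buffer, that particular append costs O(n+m), which is why the O(m) figure is best read as amortized.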

The Buffer Overflow Problem

c
#include <stdio.h>
#include <string.h>

int main(void) {
    // Null-terminated strings have no built-in bounds checking
    char buffer[10];
    // strcpy(buffer, "This string is way too long for the buffer");
    // ^ Buffer overflow (undefined behavior): writes past the end of buffer
    //   into adjacent memory. Commented out so the example is safe to run.

    // Safer alternative: strncpy (but it has its own issues)
    strncpy(buffer, "This is too long", sizeof(buffer) - 1);
    buffer[sizeof(buffer) - 1] = '\0';  // strncpy may not terminate, so do it manually

    // Alternatives: snprintf (C99) or strlcpy (BSD)
    snprintf(buffer, sizeof(buffer), "%s", "This is too long");
    // Truncates safely and always null-terminates when the size is nonzero
    puts(buffer);  // prints "This is t"
    return 0;
}

Because null-terminated strings carry no length information, functions like strcpy and strcat cannot check whether the destination buffer is large enough. This has been the source of countless security vulnerabilities.
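
One way to recover a bounds check is to pass the destination size explicitly and inspect the return value of `snprintf`, which reports how many characters the full string would have needed. The helper below is a minimal sketch under that assumption. `copy_checked` is a hypothetical name, not a standard function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: copies src into dst, which has room for dst_size bytes.
   Returns 0 if the whole string fit, -1 if it was truncated or on error.
   dst is always null-terminated on return. */
int copy_checked(char *dst, size_t dst_size, const char *src) {
    assert(dst != NULL && src != NULL && dst_size > 0);
    /* snprintf returns the length src needs, excluding the '\0'. */
    int needed = snprintf(dst, dst_size, "%s", src);
    if (needed < 0) return -1;                    /* output error */
    return ((size_t)needed < dst_size) ? 0 : -1;  /* >= dst_size means truncated */
}
```

The caller still has to supply the correct buffer size. The length information lives in the call site, not in the string, which is the weakness the paragraph above describes.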

Embedded Null Bytes

c
#include <stdio.h>
#include <string.h>

int main(void) {
    char data[] = "Hello\0World";   // C strings cannot contain '\0' within the data
    printf("%s\n", data);           // prints "Hello": stops at first '\0'
    printf("%zu\n", strlen(data));  // 5: only counts to first '\0'
}
python
# Python strings (length-prefixed) can contain null bytes
s = "Hello\x00World"
print(len(s))   # 11: null byte is counted
print(s)        # Hello<NUL>World (how the null byte is displayed varies by terminal)

Binary data and certain file formats contain null bytes. Null-terminated strings cannot represent this data faithfully, which is why languages that handle binary data use length-prefixed strings.
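
C code that needs to carry such data usually passes a pointer together with an explicit length. The sketch below assumes a hypothetical `ByteView` struct and two helper functions invented for this example. They treat '\0' like any other byte because they never look for a terminator.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical pointer + length view; names are illustrative only. */
typedef struct {
    const char *data;
    size_t length;
} ByteView;

/* Counts occurrences of byte b using the stored length, not a terminator. */
size_t view_count(ByteView v, char b) {
    assert(v.data != NULL || v.length == 0);
    size_t count = 0;
    for (size_t i = 0; i < v.length; i++) {
        if (v.data[i] == b) count++;
    }
    return count;
}

/* Compares two views byte for byte; embedded '\0' bytes are compared too. */
int view_equal(ByteView a, ByteView b) {
    return a.length == b.length &&
           (a.length == 0 || memcmp(a.data, b.data, a.length) == 0);
}
```

For the array `"Hello\0World"`, a view built with `sizeof raw - 1` has length 11, while `strlen` on the same bytes reports 5.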

How Modern Languages Handle Strings

python
# Python: length-prefixed Unicode string
s = "Hello"
len(s)  # O(1)

# The snippets below are in other languages and are left commented out.

# Java: length-prefixed char array (UTF-16)
# String s = "Hello";
# s.length();  // O(1)

# C++: std::string is length-prefixed but also null-terminated (for C compatibility)
# std::string s = "Hello";
# s.size();   // O(1)
# s.c_str();  // returns null-terminated const char*

# Rust: String/&str are length-prefixed, CString for C interop
# let s = String::from("Hello");
# s.len();  // O(1)
# let c = CString::new("Hello").unwrap();  // null-terminated for FFI

Most modern languages use length-prefixed strings internally. C++ std::string does both: it tracks its size and, since C++11, also keeps a null terminator after the character data, so c_str() can hand the buffer to C APIs without copying.
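
Languages that do not keep a terminator have to produce one before calling into C. The function below is a rough sketch of that conversion. `to_c_string` is a hypothetical name, and rejecting embedded null bytes is a design choice made here, similar in spirit to how Rust's `CString::new` reports an error for interior nulls.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc'ed, null-terminated copy of
   data[0..length), or NULL if the data contains an embedded '\0' or
   allocation fails. The caller must free() the result. */
char *to_c_string(const char *data, size_t length) {
    assert(data != NULL || length == 0);
    if (length > 0 && memchr(data, '\0', length) != NULL) {
        return NULL;  /* a C consumer would silently see a shorter string */
    }
    char *out = malloc(length + 1);  /* +1 for the terminator */
    if (out == NULL) return NULL;
    if (length > 0) memcpy(out, data, length);
    out[length] = '\0';
    return out;
}
```

The copy costs O(n) time and an allocation, which std::string avoids by keeping the terminator in place at all times.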

Common Pitfalls

  • Forgetting the null terminator when allocating C strings: malloc(strlen(s)) allocates one byte too few. You need malloc(strlen(s) + 1) to include space for the '\0'. This off-by-one error is one of the most common C bugs.
  • Assuming strlen is O(1): In C, strlen must scan the entire string to find the null terminator. Calling strlen in a loop condition (for (i = 0; i < strlen(s); i++)) makes the loop O(n^2). Store the length in a variable first.
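  • Using strncpy as a safe strcpy replacement: strncpy does not guarantee null termination. If the source holds n or more characters, where n is the size passed in, no terminator is written. Terminate the buffer manually, or use snprintf (or strlcpy where the platform provides it; it is not part of standard C).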
  • Mixing null-terminated and length-prefixed strings in FFI: When calling C libraries from Python, Rust, or Go, you must convert length-prefixed strings to null-terminated format (and vice versa). Forgetting this causes garbage output or crashes at the language boundary.
  • Assuming all string operations are O(1) in modern languages: While len() is O(1), operations like concatenation, comparison, and any slicing that copies (as in Python) are still O(n). Length-prefixed strings improve metadata access, not all string operations.
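
The first two pitfalls in the list above can be illustrated together. The helpers below are a minimal sketch with hypothetical names (`dup_string`, `count_char`). The first allocates `strlen(s) + 1` bytes, and the second calls `strlen` once before the loop instead of in the loop condition.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: heap-allocated copy of s, or NULL if allocation fails.
   The caller must free() the result. */
char *dup_string(const char *s) {
    assert(s != NULL);
    size_t len = strlen(s);
    char *copy = malloc(len + 1);  /* +1 leaves room for '\0' */
    if (copy == NULL) return NULL;
    memcpy(copy, s, len + 1);      /* copies the terminator too */
    return copy;
}

/* Counts occurrences of c. strlen runs once, so the whole loop is O(n). */
size_t count_char(const char *s, char c) {
    assert(s != NULL);
    size_t len = strlen(s);        /* hoisted out of the loop condition */
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        if (s[i] == c) count++;
    }
    return count;
}
```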

Summary

  • C uses null-terminated strings for simplicity and minimal memory overhead — just 1 extra byte per string
  • Length-prefixed strings provide O(1) length lookup, support embedded null bytes, and reduce buffer overflow risk
  • Null termination was a reasonable choice in 1972 when memory was scarce, but it has caused decades of security vulnerabilities
  • Most modern languages (Python, Java, Go, Rust) use length-prefixed strings
  • C++ std::string provides both — length-prefixed storage with a null terminator for C compatibility
  • When interfacing with C code (FFI), convert between the two formats explicitly
