How to implement strlen as fast as possible

Introduction

Implementing strlen efficiently is a classic low-level optimization topic in C. The straightforward byte-by-byte loop is simple and portable, while faster versions use word-at-a-time scanning and CPU-specific tricks. In practice, correctness and memory safety come first, then optimization based on measurement.

Baseline Portable Implementation

The baseline implementation scans one byte at a time until it finds the null terminator. This is easy to verify and works on any platform.

c
#include <stddef.h>

size_t my_strlen_basic(const char *s) {
    const char *p = s;
    while (*p) {
        p++;
    }
    return (size_t)(p - s);
}

This is a useful correctness reference even if you later add optimized variants.

Word-at-a-Time Technique

A common optimization reads machine words and detects whether any byte in the word is zero using bit operations.

c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns nonzero if any of the 8 bytes in x is 0x00. */
static inline int has_zero_byte(uint64_t x) {
    return ((x - 0x0101010101010101ULL) & ~x & 0x8080808080808080ULL) != 0;
}

size_t my_strlen_word(const char *s) {
    const char *p = s;

    /* Scan byte by byte until p is 8-byte aligned. */
    while (((uintptr_t)p & 7) != 0) {
        if (*p == '\0') return (size_t)(p - s);
        p++;
    }

    /* Scan 8 bytes per iteration. memcpy avoids a strict-aliasing violation;
       the load may still read bytes past the terminator. */
    for (;;) {
        uint64_t w;
        memcpy(&w, p, sizeof w);
        if (has_zero_byte(w)) break;
        p += sizeof w;
    }

    /* Find the exact terminator inside the final word. */
    while (*p) {
        p++;
    }

    return (size_t)(p - s);
}

On long strings this cuts the number of loop iterations roughly eightfold, because each iteration of the main loop checks 8 bytes instead of 1.
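To make the bit trick less opaque, here is a minimal sketch that applies the same test to hand-built words. The helper name `word_from_bytes` is an assumption introduced for illustration; it is not part of any library.

```c
#include <stdint.h>
#include <string.h>

/* Same test as in the word-at-a-time section: nonzero if any byte of x is 0x00.
   Subtracting 0x01 from each byte borrows into the high bit only when the byte
   was 0x00 (or already had its high bit set); "& ~x" discards the latter case. */
static int has_zero_byte(uint64_t x) {
    return ((x - 0x0101010101010101ULL) & ~x & 0x8080808080808080ULL) != 0;
}

/* Hypothetical helper: packs the first 8 bytes at `bytes` into a word.
   The result's byte order depends on endianness, which does not affect the test. */
static uint64_t word_from_bytes(const char *bytes) {
    uint64_t w;
    memcpy(&w, bytes, sizeof w);
    return w;
}
```

With these definitions, `has_zero_byte(word_from_bytes("abcdefgh"))` is 0 and `has_zero_byte(word_from_bytes("abc\0efgh"))` is 1. Bytes with the high bit set, such as 0x80 or 0xFF, do not trigger a false positive.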

Benchmark Before Optimizing

Micro-optimizations should be guided by timing data on your target platform. Compiler built-ins and libc implementations are often already highly tuned.

c
#include <stdio.h>
#include <string.h>
#include <time.h>

int main(void) {
    /* A volatile pointer stops the compiler from folding strlen at compile time. */
    const char *volatile text = "abcdefghijklmnopqrstuvwxyz0123456789";
    const int n = 10000000;
    volatile size_t sink = 0;  /* keeps the calls from being optimized away */

    clock_t t0 = clock();
    for (int i = 0; i < n; i++) {
        sink += strlen(text);
    }
    clock_t t1 = clock();

    printf("libc strlen ticks: %ld\n", (long)(t1 - t0));
    return 0;
}

If the standard library strlen is already fast enough for your use case, a custom implementation may not be worth its maintenance cost.
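To compare a custom routine with libc under identical conditions, a small timing helper can take the function under test as a parameter. This is a rough sketch, not a rigorous benchmark harness: the names `strlen_fn` and `time_strlen` are assumptions introduced here, and `clock()` measures CPU time at coarse resolution.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical function-pointer type matching the strlen signature. */
typedef size_t (*strlen_fn)(const char *);

/* Hypothetical helper: returns the CPU seconds spent calling fn(s) iters times. */
double time_strlen(strlen_fn fn, const char *s, int iters) {
    volatile size_t sink = 0;  /* forces every result to be computed */
    clock_t t0 = clock();
    for (int i = 0; i < iters; i++) {
        sink += fn(s);
    }
    clock_t t1 = clock();
    (void)sink;
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

A call such as `time_strlen(strlen, buf, 1000000)` followed by `time_strlen(my_strlen_word, buf, 1000000)` gives two comparable numbers. Fill `buf` at run time and try several lengths, since short and long strings often favor different implementations.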

Safety and Undefined Behavior Considerations

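Optimized implementations must still avoid undefined behavior. Unaligned word loads are slower on some architectures and fault on others, which is why the word-at-a-time loop first advances to an aligned address. An aligned load cannot straddle a page boundary, so it will not touch unmapped memory, but it can still read bytes past the terminator. ISO C does not define that read, and tools such as AddressSanitizer report it. Keep the fallback byte-scanning logic and the alignment checks explicit.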

When performance is critical, use profiling tools and test across architectures before adopting custom routines in production.
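The page-boundary argument behind aligned word reads can be checked with plain arithmetic. The sketch below assumes a page size that is a multiple of 8 (4096 is common but not universal); both function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if a load of `width` bytes starting at `addr` stays inside one
   page of `page_size` bytes, 0 if it straddles two pages. */
int load_stays_in_page(uintptr_t addr, size_t width, size_t page_size) {
    uintptr_t first_page = addr / page_size;
    uintptr_t last_page  = (addr + width - 1) / page_size;
    return first_page == last_page;
}

/* Counts the 8-byte-aligned addresses in [0, 2 * page_size) whose 8-byte load
   would straddle a page boundary. */
int count_straddling_aligned_loads(size_t page_size) {
    int count = 0;
    for (uintptr_t addr = 0; addr < 2 * page_size; addr += 8) {
        if (!load_stays_in_page(addr, 8, page_size)) {
            count++;
        }
    }
    return count;
}
```

For a 4096-byte page the count is 0, while an unaligned 8-byte load at address 4090 does cross into the next page. This only explains why aligned over-reads do not fault in practice; it does not make them defined behavior in ISO C.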

Consider Compiler and Platform Intrinsics

Compilers already recognize common string patterns and may replace calls with tuned built-ins. Before writing custom assembly-style logic, inspect generated code and benchmark with optimization flags enabled.

bash
gcc -O3 -march=native -S strlen_bench.c

In many environments, the standard library implementation uses vector instructions and outperforms naive custom code.
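As an illustration of what a vectorized scan looks like, here is a minimal SSE2 sketch. It is not how any particular libc implements strlen. The function name `my_strlen_sse2` is hypothetical, the `__SSE2__` macro and `__builtin_ctz` assume GCC or Clang, and the code falls back to the byte loop elsewhere. Like the word-at-a-time version, it may read up to 15 bytes past the terminator within an aligned 16-byte block.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

size_t my_strlen_sse2(const char *s) {
    const char *p = s;
#if defined(__SSE2__)
    /* Scan byte by byte until p is 16-byte aligned. */
    while (((uintptr_t)p & 15) != 0) {
        if (*p == '\0') return (size_t)(p - s);
        p++;
    }
    const __m128i zero = _mm_setzero_si128();
    for (;;) {
        /* Compare 16 bytes against zero; each matching byte sets one mask bit. */
        __m128i chunk = _mm_load_si128((const __m128i *)p);
        int mask = _mm_movemask_epi8(_mm_cmpeq_epi8(chunk, zero));
        if (mask != 0) {
            /* The lowest set bit is the offset of the first zero byte. */
            return (size_t)(p - s) + (size_t)__builtin_ctz((unsigned)mask);
        }
        p += 16;
    }
#else
    /* Portable fallback: plain byte scan. */
    while (*p) {
        p++;
    }
    return (size_t)(p - s);
#endif
}
```

Whether this beats the platform strlen is an empirical question; treat it as a starting point for measurement rather than a replacement.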

Test Correctness with Edge Cases

Any custom strlen variant must be validated on empty strings, aligned and unaligned pointers, very long strings, and randomized data.

c
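#include <assert.h>
#include <stddef.h>

/* Defined in the baseline section above. */
size_t my_strlen_basic(const char *s);

void run_tests(void) {
    assert(my_strlen_basic("") == 0);       /* empty string */
    assert(my_strlen_basic("a") == 1);
    assert(my_strlen_basic("hello") == 5);
    assert(my_strlen_basic("a\0b") == 1);   /* stops at the first terminator */
}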

For optimized versions, compare results against strlen across thousands of generated strings.
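The alignment and length cases listed above can be covered systematically. The sketch below tries every start offset from 0 to 15 with every length from 0 to 64; the names `strlen_fn` and `sweep_alignment_and_length` are assumptions made for this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical function-pointer type matching the strlen signature. */
typedef size_t (*strlen_fn)(const char *);

/* Returns the number of (offset, length) combinations for which fn reports
   the wrong length. A correct implementation returns 0. */
int sweep_alignment_and_length(strlen_fn fn) {
    enum { MAX_OFFSET = 16, MAX_LEN = 64, PAD = 16 };
    /* PAD keeps aligned 16-byte over-reads inside the buffer. */
    char buf[MAX_OFFSET + MAX_LEN + 1 + PAD];
    int mismatches = 0;

    for (size_t off = 0; off < MAX_OFFSET; off++) {
        for (size_t len = 0; len <= MAX_LEN; len++) {
            memset(buf, 'x', sizeof buf);
            buf[off + len] = '\0';        /* terminator under test */
            buf[sizeof buf - 1] = '\0';   /* safety net for buggy scans */
            if (fn(buf + off) != len) {
                mismatches++;
            }
        }
    }
    return mismatches;
}
```

Calling `sweep_alignment_and_length(my_strlen_word)` should return 0. Sweeping offsets matters because the alignment prologue takes a different number of iterations for each one.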

Keep Optimization Maintainable

Low-level bit tricks are hard for most teams to read, review, and maintain. If the gains are small, prefer standard library calls and focus optimization work where profiling shows a larger impact. Maintenance cost is part of any performance engineering decision.

Compare Against libc in Continuous Tests

If you keep a custom implementation, add automated checks that compare output with strlen over randomized inputs in CI. This catches correctness regressions early and reduces risk from future compiler or platform changes.
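A randomized check of this kind can be quite small. The following sketch uses a fixed seed so that failures are reproducible in CI; the function name `fuzz_against_libc` and the buffer sizes are assumptions chosen for illustration.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical function-pointer type matching the strlen signature. */
typedef size_t (*strlen_fn)(const char *);

/* Compares fn with libc strlen on `rounds` random strings.
   Returns 0 if every result matches, 1 otherwise (usable as an exit status). */
int fuzz_against_libc(strlen_fn fn, unsigned seed, int rounds) {
    enum { BUF_SIZE = 4096, PAD = 64 };
    static char buf[BUF_SIZE + PAD];

    srand(seed);
    for (int r = 0; r < rounds; r++) {
        /* Fill with random nonzero bytes so the only terminators are ours. */
        for (size_t i = 0; i < sizeof buf; i++) {
            buf[i] = (char)(1 + rand() % 255);
        }
        size_t start = (size_t)rand() % 64;               /* random alignment */
        size_t len   = (size_t)rand() % (BUF_SIZE - 64);  /* random length */
        buf[start + len] = '\0';
        buf[sizeof buf - 1] = '\0';                       /* safety net */

        if (fn(buf + start) != strlen(buf + start)) {
            return 1;
        }
    }
    return 0;
}
```

A CI job could run a tiny program that returns `fuzz_against_libc(my_strlen_word, 12345u, 1000)` from `main`, so any mismatch fails the build.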

Common Pitfalls

  • Optimizing before measuring real bottlenecks.
  • Assuming one implementation performs best across all CPUs.
  • Introducing undefined behavior through unsafe pointer reads.
  • Replacing standard library calls without strong evidence of benefit.

Summary

  • Start with a correct baseline strlen implementation.
  • Use word-at-a-time scanning only when profiling justifies it.
  • Benchmark against standard library performance on target hardware.
  • Preserve safety and portability while optimizing.
