bit counting
memory management
programming
computer science
data processing

Bit counting in a contiguous memory chunk

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Introduction

Bit counting over a contiguous memory chunk means counting how many 1 bits appear across a byte buffer. This comes up in bitmap processing, compression, networking, and low-level analytics, so the best implementation depends not only on the bit trick you use but also on how efficiently you scan memory.

Start With a Simple Byte-Wise Scan

The most direct solution is to walk the buffer byte by byte and add the population count of each byte to a running total.

Here is a portable C example using Brian Kernighan's bit-clearing trick for a single byte:

c
1#include <stddef.h>
2#include <stdint.h>
3#include <stdio.h>
4
5static int popcount_byte(uint8_t value) {
6    int count = 0;
7    while (value) {
8        value &= (uint8_t)(value - 1);
9        count++;
10    }
11    return count;
12}
13
14size_t count_bits(const uint8_t *buffer, size_t length) {
15    size_t total = 0;
16    for (size_t i = 0; i < length; i++) {
17        total += popcount_byte(buffer[i]);
18    }
19    return total;
20}
21
22int main(void) {
23    uint8_t data[] = {0x0F, 0x03, 0x80};
24    printf("%zu\n", count_bits(data, 3));
25    return 0;
26}

This is easy to reason about and correct on any platform that supports standard C integer types.

Use Built-In Popcount When the Compiler Provides It

Modern compilers often expose a popcount builtin that can map to efficient CPU instructions when available. That is usually better than hand-writing the inner loop.

c
1#include <stddef.h>
2#include <stdint.h>
3
4size_t count_bits_fast(const uint8_t *buffer, size_t length) {
5    size_t total = 0;
6
7    for (size_t i = 0; i < length; i++) {
8        total += __builtin_popcount((unsigned int)buffer[i]);
9    }
10
11    return total;
12}

The same idea can be extended to 32-bit or 64-bit chunks if your platform, alignment, and aliasing rules let you do so safely.

A Lookup Table Is Still Practical

Another classic solution is a 256-entry lookup table for every possible byte value.

c
1#include <stddef.h>
2#include <stdint.h>
3
4static uint8_t popcount_table[256];
5
6static void init_table(void) {
7    for (int i = 0; i < 256; i++) {
8        uint8_t value = (uint8_t)i;
9        uint8_t count = 0;
10        while (value) {
11            value &= (uint8_t)(value - 1);
12            count++;
13        }
14        popcount_table[i] = count;
15    }
16}
17
18size_t count_bits_table(const uint8_t *buffer, size_t length) {
19    size_t total = 0;
20    for (size_t i = 0; i < length; i++) {
21        total += popcount_table[buffer[i]];
22    }
23    return total;
24}

This trades a tiny amount of memory for a very small and predictable inner loop.

Memory Access Matters Too

A contiguous memory chunk is friendly to the CPU cache, which is good news. In large buffers, sequential scanning is usually the right baseline. The popcount algorithm is only part of the performance story. Cache behavior, alignment, and loop overhead also matter.

That is why a theoretically clever bit trick can lose in practice if it forces awkward loads or extra repacking steps. If your data is already a byte buffer, count it directly instead of first converting it into some other representation.

Wider Words Can Reduce Overhead

If the buffer is aligned and your platform allows it, processing 64-bit chunks can reduce loop overhead compared with byte-wise iteration. But the tradeoff is portability and complexity.

For many real programs, a good compromise is:

  • keep the buffer scan sequential
  • use compiler popcount support
  • benchmark before attempting wider-word reinterpretation

The memory walk is often more important than the specific micro-optimization.

Common Pitfalls

  • Optimizing the popcount math while ignoring the cost of reading the buffer itself.
  • Repacking or copying the buffer before counting bits and losing any speed advantage.
  • Assuming a hand-written trick will beat the compiler's builtin on modern CPUs.
  • Casting to wider integer types without considering alignment and portability.
  • Benchmarking tiny synthetic inputs and then drawing conclusions about large real buffers.

Summary

  • Bit counting over a contiguous memory chunk usually starts with a sequential scan and a running total.
  • A byte-wise implementation is portable and easy to verify.
  • Compiler builtins are often the best practical speedup when available.
  • Lookup tables remain a useful middle ground between portability and performance.
  • For large buffers, memory access patterns matter almost as much as the popcount operation itself.

Course illustration
Course illustration

All Rights Reserved.