file size conversion
human-readable format
data size readability
file management
data processing

Get a human-readable version of a file size

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Introduction

Raw byte counts are precise, but they are not pleasant to read. A value such as 7340032 is much easier for a user to understand as 7.0 MiB or 7.3 MB, depending on which unit system you want to use. The main engineering question is not just how to divide the number, but whether you want binary units or decimal units and how much rounding makes sense.

Binary Versus Decimal Units

There are two common conventions.

Binary units use powers of 1024 and are often labeled KiB, MiB, GiB, and so on. Decimal units use powers of 1000 and are labeled KB, MB, GB.

Both are valid, but they mean different things. Storage-device marketing often uses decimal units, while operating systems and low-level tooling frequently use binary interpretation.

That is why a formatter should make the base explicit instead of hard-coding one interpretation and hoping nobody notices.

A Simple Python Formatter

Here is a small function that formats a byte count using either binary or decimal units:

python
1def human_size(num_bytes, binary=True):
2    if num_bytes < 0:
3        raise ValueError("num_bytes must be non-negative")
4
5    base = 1024 if binary else 1000
6    units = ["B", "KiB", "MiB", "GiB", "TiB"] if binary else ["B", "KB", "MB", "GB", "TB"]
7
8    size = float(num_bytes)
9    unit_index = 0
10
11    while size >= base and unit_index < len(units) - 1:
12        size /= base
13        unit_index += 1
14
15    if unit_index == 0:
16        return f"{int(size)} {units[unit_index]}"
17
18    return f"{size:.1f} {units[unit_index]}"
19
20
21print(human_size(7340032))
22print(human_size(7340032, binary=False))

This prints values that are easier to scan in logs, file browsers, and reports.

Why the Loop-Based Approach Works Well

The repeated-division loop is simple and robust. It naturally stops at the largest unit that still keeps the value readable, and it avoids a long list of nested if checks.

It is also easy to extend. If your application handles petabytes, add more unit labels. If you want two decimal places for larger values, change the format string. The underlying structure stays the same.

Handling Edge Cases

A good formatter should decide what to do with:

  • '0 bytes'
  • very small values that should stay in B
  • extremely large numbers that exceed the last unit label
  • negative numbers, which usually signal a bug or invalid input

The example above returns 0 B naturally and raises an exception for negative input. That is a reasonable default because file sizes should not be negative.

Standard Libraries and Existing Helpers

Many ecosystems already have helpers for this job. Python packages such as humanize can format sizes, and some frameworks provide built-in utilities. If you only need a display string and do not care about custom unit policy, reusing a library is often better than maintaining your own helper.

Still, a custom function is useful when you need predictable output, no extra dependency, or a clear distinction between binary and decimal labels.

Rounding and UX Choices

A human-readable size is not just a math conversion. It is a presentation decision.

For example, 1536 bytes can appear as 1.5 KiB, 2 KB, or 1536 B depending on the audience and the level of precision you want. A file manager may prioritize readability, while a debugging tool may prioritize exactness.

That means the formatter should match the context. User-facing dashboards and logs often want rounded values. Accounting or low-level diagnostics may need the raw byte count alongside the formatted size.

Common Pitfalls

The biggest pitfall is mixing binary math with decimal labels. If you divide by 1024 but print MB, you are likely to confuse users and possibly yourself later.

Another mistake is rounding too early. If you round at each division step instead of at the end, larger units can drift noticeably from the true value.

Developers also forget the zero and negative cases. Those are easy to handle explicitly, and doing so prevents odd output such as -0.0 KB.

Finally, do not assume one unit system is universally correct. The right choice depends on the domain, so the formatter should make that choice visible rather than hidden.

Summary

  • Human-readable file sizes are about both math and presentation.
  • Decide explicitly whether you want binary units such as MiB or decimal units such as MB.
  • A loop that repeatedly divides by 1024 or 1000 is simple and reliable.
  • Handle zero, large values, and invalid negative input deliberately.
  • If the output is user-facing, choose rounding and labels that match the audience rather than only the raw number.

Course illustration
Course illustration