How to know the size of the string in bytes?

Introduction

A string's length in characters and its size in bytes are different things. In ASCII, each character is 1 byte. In UTF-8, a character takes 1 to 4 bytes depending on its code point. In UTF-16, a character takes 2 or 4 bytes. The byte size therefore depends on the encoding, and each language has its own way to measure it. Getting this right matters for database column sizing, network protocol buffers, file I/O, and API payload limits.
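
To make the "1 to 4 bytes depending on the code point" rule concrete, here is a minimal Python sketch that counts UTF-8 bytes by code point range instead of calling the encoder. The function name `utf8_byte_length` is a hypothetical helper introduced for illustration, and the sketch assumes the input contains no lone surrogates.

```python
def utf8_byte_length(text: str) -> int:
    """Count UTF-8 bytes by inspecting each code point, without encoding."""
    total = 0
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:         # U+0000..U+007F: ASCII
            total += 1
        elif cp < 0x800:      # U+0080..U+07FF: e.g. accented Latin letters
            total += 2
        elif cp < 0x10000:    # U+0800..U+FFFF: e.g. most CJK characters
            total += 3
        else:                 # U+10000..U+10FFFF: e.g. most emoji
            total += 4
    return total


for sample in ("A", "\u00e9", "\u4e16", "\U0001F44B", "Hello, \u4e16\u754c!"):
    # The manual count should agree with the encoder's own result.
    assert utf8_byte_length(sample) == len(sample.encode("utf-8"))
    print(repr(sample), utf8_byte_length(sample))
```

In practice the encoder-based count shown in the sections below is simpler; the manual version only illustrates where the per-character sizes come from.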

Python

python

text = "Hello, World!"

# Length in characters (code points)
print(len(text))  # 13

# Size in bytes (UTF-8)
print(len(text.encode('utf-8')))   # 13 (ASCII chars = 1 byte each)

# With multi-byte characters
text = "Hello, 世界!"
print(len(text))                    # 10 characters
print(len(text.encode('utf-8')))   # 14 bytes (Chinese chars = 3 bytes each in UTF-8)
print(len(text.encode('utf-16')))  # 22 bytes (includes 2-byte BOM)

# sys.getsizeof includes Python object overhead
import sys
print(sys.getsizeof(text))  # larger than the encoded size; exact value varies by Python version

Use len(s.encode('utf-8')) for the actual byte count of the string content.

JavaScript

javascript

// String length (UTF-16 code units)
const text = "Hello, 世界!";
console.log(text.length);  // 10

// Byte size in UTF-8
const encoder = new TextEncoder();
const bytes = encoder.encode(text);
console.log(bytes.length);  // 14

// Using Blob (browser)
const blob = new Blob([text]);
console.log(blob.size);  // 14

// Node.js
console.log(Buffer.byteLength(text, 'utf-8'));  // 14

// Emoji handling
const emoji = "Hello 👋";
console.log(emoji.length);                       // 8 (surrogate pair = 2 code units)
console.log(new TextEncoder().encode(emoji).length);  // 10 bytes in UTF-8

JavaScript's .length returns UTF-16 code units, not characters or bytes.
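
For comparison, the same three measurements can be reproduced outside JavaScript. The following Python sketch derives the UTF-16 code unit count from a BOM-free encoding; `utf16_code_units` is a hypothetical helper name, not a standard library function.

```python
def utf16_code_units(text: str) -> int:
    """Number of UTF-16 code units, i.e. what JavaScript's .length reports."""
    # 'utf-16-le' writes no BOM, so every 2 bytes is exactly one code unit.
    return len(text.encode("utf-16-le")) // 2


emoji = "Hello \U0001F44B"          # "Hello 👋"
print(len(emoji))                   # 7 code points
print(utf16_code_units(emoji))      # 8 code units (the emoji is a surrogate pair)
print(len(emoji.encode("utf-8")))   # 10 bytes
```

The three numbers differ for the same string, which is why a length check should always name the unit it is counting.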

Java

java

import java.nio.charset.StandardCharsets;

String text = "Hello, 世界!";

// Length in UTF-16 code units (equals the character count here)
System.out.println(text.length());  // 10

// Byte size in UTF-8 (StandardCharsets avoids the checked UnsupportedEncodingException)
byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
System.out.println(utf8Bytes.length);  // 14

// Byte size in other encodings
byte[] utf16Bytes = text.getBytes(StandardCharsets.UTF_16);
System.out.println(utf16Bytes.length);  // 22 (includes 2-byte BOM)

byte[] asciiBytes = text.getBytes(StandardCharsets.US_ASCII);
System.out.println(asciiBytes.length);  // 10 (non-ASCII replaced with '?')

C#

csharp

string text = "Hello, 世界!";

// Length in UTF-16 code units (equals the character count here)
Console.WriteLine(text.Length);  // 10

// Byte size in UTF-8
int utf8Size = System.Text.Encoding.UTF8.GetByteCount(text);
Console.WriteLine(utf8Size);  // 14

// Byte size in UTF-16 (C# internal representation, no BOM)
int utf16Size = System.Text.Encoding.Unicode.GetByteCount(text);
Console.WriteLine(utf16Size);  // 20

// Get the actual bytes
byte[] bytes = System.Text.Encoding.UTF8.GetBytes(text);
Console.WriteLine(bytes.Length);  // 14

Go

go

package main

import (
    "fmt"
    "unicode/utf8"
)

func main() {
    text := "Hello, 世界!"

    // Byte length (Go string literals are UTF-8 encoded)
    fmt.Println(len(text))  // 14

    // Character (rune) count
    fmt.Println(utf8.RuneCountInString(text))  // 10

    // len() on a Go string already gives bytes, not characters
    ascii := "Hello"
    fmt.Println(len(ascii))  // 5 bytes = 5 characters (all ASCII)
}

Go strings are read-only byte sequences, and string literals in Go source are UTF-8 encoded. len(s) returns bytes, not characters.

C / C++

c

#include <stdio.h>
#include <string.h>

int main(void) {
    // ASCII string
    const char *text = "Hello, World!";
    printf("Bytes: %zu\n", strlen(text));  // 13 (excluding null terminator)
    printf("With null: %zu\n", strlen(text) + 1);  // 14

    // UTF-8 string (assumes the source file and execution character set are UTF-8)
    const char *utf8 = "Hello, 世界!";
    printf("Bytes: %zu\n", strlen(utf8));  // 14
    // strlen counts bytes, not characters in UTF-8
    return 0;
}

cpp

#include <iostream>
#include <string>

int main() {
    std::string text = "Hello, 世界!";
    std::cout << text.size() << std::endl;   // 14 bytes
    std::cout << text.length() << std::endl; // 14 bytes (same as size())

    // For character count, use a UTF-8 library or count manually
}

Ruby

ruby

text = "Hello, 世界!"

puts text.length          # 10 (characters)
puts text.bytesize        # 14 (bytes in UTF-8)
puts text.encode('UTF-16').bytesize  # 22 (bytes in UTF-16, includes 2-byte BOM)

# Encoding info
puts text.encoding        # UTF-8

PHP

php

$text = "Hello, 世界!";

echo strlen($text), PHP_EOL;        // 14 (bytes)
echo mb_strlen($text), PHP_EOL;     // 10 (characters, requires mbstring extension)

// strlen() in PHP returns bytes, not characters
// mb_strlen() counts actual characters

Encoding Size Reference

Character	UTF-8	UTF-16	UTF-32
ASCII (A, 1, !)	1 byte	2 bytes	4 bytes
Accented Latin (é, ñ, ü)	2 bytes	2 bytes	4 bytes
CJK (中, 世, 界)	3 bytes	2 bytes	4 bytes
Emoji (👋, 🎉)	4 bytes	4 bytes	4 bytes
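
The table values can be checked directly. This is a small Python sketch that encodes one sample character per row; the `samples` dictionary and `encoded_sizes` helper are illustrative names, and the `-le` codecs are used so that no byte order mark is counted.

```python
samples = {
    "ASCII": "A",
    "Accented Latin": "\u00e9",   # é
    "CJK": "\u4e2d",              # 中
    "Emoji": "\U0001F44B",        # 👋
}


def encoded_sizes(ch: str) -> tuple:
    """Bytes needed for ch in UTF-8, UTF-16 and UTF-32, without a BOM."""
    # Plain 'utf-16' and 'utf-32' would prepend a BOM; the -le variants do not.
    return (
        len(ch.encode("utf-8")),
        len(ch.encode("utf-16-le")),
        len(ch.encode("utf-32-le")),
    )


for label, ch in samples.items():
    print(label, encoded_sizes(ch))
```

Each printed tuple should match the corresponding table row, in the order UTF-8, UTF-16, UTF-32.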

Database Considerations

sql

-- MySQL: VARCHAR(255) means 255 characters, not bytes
-- The bytes needed per character depend on the character set:
-- utf8mb3: up to 3 bytes per character
-- utf8mb4: up to 4 bytes per character (supports emoji)

CREATE TABLE users (
    name VARCHAR(100) CHARACTER SET utf8mb4
    -- Can store 100 characters, using up to 400 bytes
);

-- Check actual byte length in MySQL
SELECT LENGTH(name) AS bytes, CHAR_LENGTH(name) AS chars FROM users;
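
The worst-case arithmetic can also be estimated on the application side before a value reaches the database. The Python sketch below is an approximation under stated assumptions: the `MAX_BYTES_PER_CHAR` mapping and both function names are hypothetical, the result covers character data only (not any per-value storage overhead), and `measure` mirrors LENGTH() and CHAR_LENGTH() only for a utf8mb4 column.

```python
# Maximum bytes per character for a few MySQL character sets (illustrative subset).
MAX_BYTES_PER_CHAR = {"latin1": 1, "utf8mb3": 3, "utf8mb4": 4}


def max_varchar_data_bytes(n_chars: int, charset: str = "utf8mb4") -> int:
    """Worst-case bytes of character data for VARCHAR(n_chars) in the given charset."""
    return n_chars * MAX_BYTES_PER_CHAR[charset]


def measure(value: str) -> tuple:
    """(bytes, chars) for value, analogous to LENGTH() and CHAR_LENGTH() on utf8mb4."""
    return len(value.encode("utf-8")), len(value)


print(max_varchar_data_bytes(100))             # 400
print(max_varchar_data_bytes(255))             # 1020
print(max_varchar_data_bytes(255, "utf8mb3"))  # 765
print(measure("Hello, \u4e16\u754c!"))         # (14, 10)
```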

Common Pitfalls

Confusing length functions with byte size: Python's len(s) and Ruby's s.length return characters (code points). Go's len(s) and C's strlen(s) return bytes. Know what your language's default string length function measures.
Assuming 1 character = 1 byte: True only for ASCII. A single emoji like 👋 takes 4 bytes in UTF-8. Chinese characters take 3 bytes. Always use the encoding-specific byte count function.
JavaScript .length is not characters or bytes: It returns UTF-16 code units. Emoji and other characters outside the Basic Multilingual Plane use 2 code units (surrogate pairs), so .length can overcount characters and undercount UTF-8 bytes.
Database VARCHAR limits: MySQL's VARCHAR(n) is in characters, but the row has a byte limit. A VARCHAR(255) column with utf8mb4 can use up to 1020 bytes per value.
Null terminators in C: strlen() returns the byte count excluding the null terminator. The actual memory used is strlen(s) + 1.
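
The null terminator point can be observed from Python as well. This is a rough sketch using the standard `ctypes` module, whose `create_string_buffer` allocates one extra byte for the terminator when given a bytes object; `c_buffer_size` is a hypothetical helper, and the sketch assumes the string has no embedded NUL characters.

```python
import ctypes


def c_buffer_size(text: str) -> int:
    """Bytes a C char array needs to hold text as UTF-8, including the trailing NUL."""
    return len(text.encode("utf-8")) + 1


text = "Hello, World!"
encoded = text.encode("utf-8")

# ctypes makes the buffer one byte larger than the data so it is NUL-terminated.
buf = ctypes.create_string_buffer(encoded)

print(len(encoded))          # 13, what strlen() would report
print(ctypes.sizeof(buf))    # 14, the memory actually reserved
print(c_buffer_size(text))   # 14
```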

Summary

String length (characters) and byte size are different; always specify which you need
Use encoding-specific functions: Python len(s.encode('utf-8')), JS new TextEncoder().encode(s).length, Java s.getBytes(StandardCharsets.UTF_8).length
UTF-8 uses 1 to 4 bytes per character; UTF-16 uses 2 or 4 bytes; ASCII uses exactly 1 byte
Go's len(s) returns bytes; Python's len(s) and Ruby's s.length return characters
Always check your database's character set when sizing VARCHAR columns