What's the u prefix in a Python string?

Introduction

The u prefix marks a Unicode string literal. Its meaning depends on which Python generation you are looking at: in Python 2 it distinguished Unicode text from byte strings, while in modern Python 3 it is mostly a compatibility feature and behaves the same as a normal string literal.

What It Meant In Python 2

Python 2 had two common text-like types:

str, which was a sequence of bytes
unicode, which represented text characters

That is why these two literals were different in Python 2:

python

s1 = "hello"
s2 = u"hello"

Here:

s1 was a byte string
s2 was a Unicode string

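If you were handling non-ASCII text in Python 2, using u"..." was important: mixing byte strings with Unicode strings triggered an implicit ASCII decode, which raised UnicodeDecodeError as soon as a non-ASCII byte appeared.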

What Happens In Python 3

In Python 3, ordinary string literals are already Unicode:

python

s1 = "hello"
s2 = u"hello"

print(type(s1))
print(type(s2))
print(s1 == s2)

Output:

python

<class 'str'>
<class 'str'>
True

So in Python 3, u"hello" and "hello" produce the same str type.

Python 3.0 dropped the prefix, and Python 3.3 brought it back (PEP 414) for source compatibility with codebases that needed to run on both Python 2 and Python 3.
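One way to convince yourself that the prefix lives only in the source text is to look at how Python parses the literal. The sketch below assumes Python 3.8 or later, where ast.Constant nodes carry a kind attribute; literal_kind is a hypothetical helper name used only for this illustration.

```python
import ast


def literal_kind(source):
    """Parse a single string-literal expression and return (value, kind).

    Assumes Python 3.8+, where ast.Constant has a `kind` attribute that is
    'u' for u-prefixed string literals and None otherwise.
    """
    node = ast.parse(source, mode="eval").body
    return node.value, node.kind


plain = literal_kind('"hello"')
prefixed = literal_kind('u"hello"')

print(plain)     # ('hello', None)
print(prefixed)  # ('hello', 'u')

# The parsed values are identical; only the source-level marker differs.
print(plain[0] == prefixed[0], type(prefixed[0]).__name__)  # True str
```

The parser remembers that a u was written, but the resulting value is an ordinary str either way.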

Unicode Versus Bytes In Python 3

The real distinction in Python 3 is no longer u versus plain strings. It is text versus bytes:

python

text = "café"
raw = b"caf\xc3\xa9"

print(type(text))
print(type(raw))

In Python 3:

str is Unicode text
bytes is raw binary data

That is the difference that now matters when reading files, calling network APIs, or encoding and decoding data.
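As a rough illustration of how this shows up when reading files, the sketch below writes a short string to a temporary file and reads it back in text mode and in binary mode. The file name sample.txt and the choice of UTF-8 are assumptions made for the example.

```python
import os
import tempfile

text = "caf\u00e9"  # "café", written with an escape so the source stays ASCII

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")  # hypothetical file name

    # Text mode: Python encodes str to bytes on write...
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # ...and decodes bytes back to str on read.
    with open(path, "r", encoding="utf-8") as f:
        as_text = f.read()

    # Binary mode: no decoding happens, you get the raw bytes.
    with open(path, "rb") as f:
        as_bytes = f.read()

print(type(as_text).__name__, len(as_text))    # str 4
print(type(as_bytes).__name__, len(as_bytes))  # bytes 5
```

The same four characters occupy five bytes on disk because é takes two bytes in UTF-8, which is why lengths and indexes differ between the two types.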

Why You Still See `u"..."` Today

There are a few reasons modern code still contains the prefix:

the codebase used to support Python 2
the author wants compatibility with older shared modules
the string was copied from legacy examples

It is not wrong in Python 3. It is just usually unnecessary.

For example, this is perfectly valid:

python

message = u"hello"

But most new Python 3 code simply writes:

python

message = "hello"

When The Prefix Does Not Help

The u prefix does not solve encoding issues by itself. If bytes arrive from a file or socket, you still need to decode them correctly:

python

data = b"caf\xc3\xa9"
text = data.decode("utf-8")
print(text)

Likewise, when writing text to a byte-oriented destination, you still encode:

python

text = "café"
data = text.encode("utf-8")
print(data)

So if you are debugging a Unicode bug in Python 3, the answer is usually about str versus bytes, not about missing u prefixes.
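The sketch below reproduces a few typical failure modes under the assumption that the incoming bytes are UTF-8. It is meant to show what the symptoms look like, not to be a complete debugging recipe.

```python
data = b"caf\xc3\xa9"  # UTF-8 bytes, as they might arrive from a file or socket

# Mixing text and bytes fails the same way with or without the u prefix.
type_errors = 0
for prefix in ("value: ", u"value: "):
    try:
        prefix + data
    except TypeError:
        type_errors += 1

# Decoding with a codec that cannot represent the bytes raises an error.
try:
    data.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Decoding with the wrong but permissive codec silently produces mojibake.
wrong = data.decode("latin-1")
right = data.decode("utf-8")

print(type_errors)             # 2
print(ascii_ok)                # False
print(len(wrong), len(right))  # 5 4
```

In each case the fix is to decode with the correct codec at the boundary, after which the rest of the program can work with str only.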

Related String Prefixes

Python has several literal prefixes, and it helps not to confuse them:

r"..." for raw strings
b"..." for bytes
f"..." for formatted strings
u"..." for Unicode compatibility syntax

Some can be combined, such as rf"...", but u cannot be combined with any other prefix in Python 3 and is mainly historical in modern Python.
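The prefixes can be compared side by side in a short sketch. The variable names and sample values are illustrative only; the compile call at the end checks, on a Python 3 interpreter, that a combined ur prefix is rejected.

```python
name = "world"

raw = r"C:\new\table"         # backslashes are kept literally
data = b"abc"                 # bytes, not text
formatted = f"hello {name}"   # expression is interpolated
raw_formatted = rf"{name}\n"  # interpolated, but \n stays as backslash + n
legacy = u"hello"             # plain str in Python 3

print(raw)                    # C:\new\table
print(type(data).__name__)    # bytes
print(formatted)              # hello world
print(len(raw_formatted))     # 7
print(type(legacy).__name__)  # str

# u cannot be combined with r in Python 3: the literal does not compile.
try:
    compile('ur"x"', "<demo>", "eval")
    ur_allowed = True
except SyntaxError:
    ur_allowed = False

print(ur_allowed)             # False
```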

Common Pitfalls

Thinking u"..." creates a different runtime type from "..." in Python 3.
Confusing Unicode text with encoded bytes.
Trying to fix byte-decoding bugs by adding a u prefix.
Reading Python 2 examples without noticing that str behaved differently there.
Assuming the prefix is invalid in Python 3. It is accepted, just usually unnecessary.

Summary

In Python 2, u"..." created a Unicode string instead of a byte string.
In Python 3, u"..." and "..." both create ordinary str values.
The distinction that matters in modern Python is str versus bytes.
You still see u mainly for compatibility with older code.
In new Python 3 code, the prefix is optional and usually unnecessary.
