__str__ versus __unicode__

Introduction

The __str__ versus __unicode__ question is mostly a Python 2 compatibility topic. In modern Python 3 code, __unicode__ does not exist as part of the data model, and __str__ should return normal Unicode text. The confusion comes from old Python 2 code, where byte strings and Unicode strings were different types and object display methods had to account for both.

How It Worked In Python 2

Python 2 had two important text-like types:

  • str for byte strings
  • unicode for Unicode text

Because of that split, classes sometimes implemented both __str__ and __unicode__.

A legacy Python 2 style class might look like this:

python
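# Python 2 only: relies on the separate unicode type.
class Person(object):
    def __init__(self, name):
        self.name = name

    def __unicode__(self):
        # The real text logic lives here and returns a unicode object.
        return u"Person: {0}".format(self.name)

    def __str__(self):
        # In Python 2, str is a byte string, so encode the text.
        return self.__unicode__().encode("utf-8")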

In that world:

  • unicode(obj) called __unicode__
  • str(obj) expected a byte string from __str__

That design existed because Python 2's default str type held bytes, so Unicode text needed its own type and its own conversion hook.

How It Works In Python 3

Python 3 simplified the model. str is Unicode text, so __str__ should return a str object and that is usually all you need.

python
class Person:
    def __init__(self, name):
        self.name = name

    def __str__(self):
        return f"Person: {self.name}"


p = Person("Ana")
print(str(p))

There is no separate __unicode__ hook to implement in normal Python 3 code.
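To see this in practice, here is a minimal sketch. The Legacy class is hypothetical and exists only for illustration: it defines __unicode__ but no __str__, so Python 3 falls back to the default object representation.

```python
class Legacy:
    """Hypothetical class that still carries a Python 2 style __unicode__."""

    def __init__(self, name):
        self.name = name

    def __unicode__(self):
        # Python 3 never calls this during str(), print(), or formatting.
        return f"Legacy: {self.name}"


obj = Legacy("Ana")
text = str(obj)

# With no __str__ defined, str() falls back to the default object repr,
# which looks something like "<__main__.Legacy object at 0x...>".
print(text)
print("Legacy: Ana" in text)  # False: __unicode__ was ignored
```

The method is still an ordinary attribute, so obj.__unicode__() works if called by hand, but nothing in the language will call it for you.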

If you need a byte representation in Python 3, the relevant method is __bytes__, not __unicode__.

python
class Payload:
    def __init__(self, value):
        self.value = value

    def __str__(self):
        return self.value

    def __bytes__(self):
        return self.value.encode("utf-8")
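A short usage sketch may help here. It redefines the same Payload class so the snippet runs on its own, and the sample value "café" is just an illustrative choice to make the difference between characters and bytes visible.

```python
class Payload:
    def __init__(self, value):
        self.value = value

    def __str__(self):
        return self.value

    def __bytes__(self):
        return self.value.encode("utf-8")


p = Payload("café")

as_text = str(p)     # calls __str__, returns Unicode text
as_bytes = bytes(p)  # calls __bytes__, returns UTF-8 encoded bytes

print(as_text)                      # café
print(as_bytes)                     # b'caf\xc3\xa9'
print(len(as_text), len(as_bytes))  # 4 5
```

The text has four characters, while the UTF-8 form has five bytes because "é" encodes to two bytes.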

Where __repr__ Fits In

This topic is often easier to understand when you place __repr__ beside __str__.

  • __str__ is the readable, user-facing representation
  • __repr__ is the developer-facing representation used for debugging; when possible, it should look like an expression that could recreate the object

python
class Person:
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return f"Person(name={self.name!r})"

    def __str__(self):
        return f"Person: {self.name}"

In modern code, that pairing matters much more than __str__ versus __unicode__.
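The sketch below shows which method Python picks in a few common situations. It repeats the Person class so it can run on its own.

```python
class Person:
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return f"Person(name={self.name!r})"

    def __str__(self):
        return f"Person: {self.name}"


p = Person("Ana")

print(str(p))    # Person: Ana
print(repr(p))   # Person(name='Ana')
print(f"{p}")    # Person: Ana          (formatting uses __str__ by default)
print(f"{p!r}")  # Person(name='Ana')   (!r asks for __repr__)
print([p])       # [Person(name='Ana')] (containers show items with __repr__)
```

If a class defines only __repr__, str() falls back to it, which is one reason __repr__ is often the first of the two worth writing.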

Porting Old Code Safely

When migrating Python 2 code, the safest mental model is to remove the old byte-versus-Unicode split from your object methods. Keep text as Unicode internally and make __str__ return that text directly.

If legacy code used __unicode__ as the source of truth, you can usually fold that logic into __str__ during the port and delete the older method entirely. That tends to simplify the class and remove a whole category of encoding mistakes.

The only time you still need special handling is when code must emit raw bytes for a protocol or file format, and that belongs in __bytes__ or an explicit encoding step, not in __str__.
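As a rough sketch of what a ported class can look like: the former __unicode__ body moves into __str__, and encoding happens only where bytes are actually required. The write_record helper and the in-memory buffer are hypothetical names used for illustration.

```python
import io


class Person:
    """Python 3 port: the old __unicode__ logic now lives in __str__."""

    def __init__(self, name):
        self.name = name

    def __str__(self):
        return f"Person: {self.name}"


def write_record(stream, person):
    """Hypothetical helper: encode only at the byte boundary."""
    stream.write(str(person).encode("utf-8") + b"\n")


buffer = io.BytesIO()  # stands in for a binary file or socket
write_record(buffer, Person("Zoë"))
print(buffer.getvalue())  # b'Person: Zo\xc3\xab\n'
```

Keeping the encode call in one place makes the choice of encoding explicit and easy to change.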

What To Do In Real Code Today

If you are writing Python 3, implement __str__ when you want a readable string form. Ignore __unicode__ entirely unless you are maintaining Python 2 compatibility code, which is rare now.

If you are maintaining old Python 2 code, the common historical pattern was:

  • put the real text logic in __unicode__
  • have __str__ return encoded bytes derived from it

But that pattern should stay in legacy maintenance, not in new Python 3 code.

Common Pitfalls

The most common mistake is trying to implement __unicode__ in Python 3 and expecting it to be called. Python 3 does not look for it in str(), print(), or string formatting, so the method is silently ignored.

Another issue is returning bytes from __str__ in Python 3. __str__ must return a str; returning bytes makes str(obj) raise a TypeError.
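A small demonstration of that failure, using a hypothetical Broken class that carries over the Python 2 habit of encoding inside __str__:

```python
class Broken:
    """Hypothetical class whose __str__ wrongly returns bytes."""

    def __str__(self):
        return "data".encode("utf-8")  # bytes, not str


error_message = None
try:
    str(Broken())
except TypeError as exc:
    # The exact wording of the message may differ between Python versions.
    error_message = str(exc)

print(error_message)
```

The fix is to return the text itself from __str__ and move any encoding to __bytes__ or to the call site.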

Developers also sometimes use __str__ for debugging-heavy output when __repr__ would be more appropriate. The two methods serve different audiences.

Finally, when porting Python 2 code, test every place that used implicit string conversion. Old byte-oriented assumptions often break during migration.
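One concrete case worth testing for is bytes passed through str() or string formatting. The sketch below uses a made-up value, raw, standing in for data read from a socket or a binary file.

```python
raw = b"Ana"  # e.g. bytes read from a socket or a binary file

# Pitfall: str() on bytes does not decode; it returns the repr of the bytes.
wrong = str(raw)
print(wrong)  # b'Ana'

# Fix: decode explicitly with the encoding the data actually uses.
right = raw.decode("utf-8")
print(right)  # Ana

# The same difference shows up in string formatting.
print(f"Hello, {raw}")    # Hello, b'Ana'
print(f"Hello, {right}")  # Hello, Ana
```

Because no exception is raised, this kind of bug tends to surface as stray b'...' text in logs or output rather than as a crash.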

Summary

  • __unicode__ is mainly a Python 2 concept tied to the old unicode type.
  • In Python 3, implement __str__ for readable Unicode text output.
  • Use __bytes__ if you need a byte representation in modern code.
  • __repr__ is usually the more relevant companion to __str__ today.
  • Do not carry Python 2 string patterns into new Python 3 code unless you truly maintain legacy compatibility.
