How to get the MD5 sum of a string using Python?

Introduction

Getting the MD5 hash of a string in Python is simple with the standard library's hashlib module. The important detail is that hash functions operate on bytes, not on Python str objects. In practice, that means you encode the string, compute the hash, and then render the digest in hexadecimal form.

Basic MD5 Example

Python includes MD5 support without any third-party dependency.

python
import hashlib

text = "hello world"
digest = hashlib.md5(text.encode("utf-8")).hexdigest()

print(digest)

encode("utf-8") converts the string to bytes. hexdigest() returns the digest as the familiar 32-character lowercase hexadecimal string.

If you need the raw 16-byte value instead, use digest().

python
import hashlib

raw = hashlib.md5("hello world".encode("utf-8")).digest()
print(raw)
print(len(raw))
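The two forms carry the same information: the hex string is the raw bytes rendered as hexadecimal. A minimal sketch using only the standard library shows the round trip:

```python
import hashlib

h = hashlib.md5("hello world".encode("utf-8"))

raw = h.digest()          # 16 raw bytes
hex_form = h.hexdigest()  # 32 hexadecimal characters

# bytes.hex() renders raw bytes as lowercase hex, matching hexdigest().
print(raw.hex() == hex_form)

# bytes.fromhex() converts the hex string back to the raw bytes.
print(bytes.fromhex(hex_form) == raw)
```

Store whichever form the consuming system expects. Hex is easier to log and compare by eye, while the raw form is half the size.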

Why Encoding Matters

The same visible text can produce different hashes if the byte encoding differs. For modern Python code, utf-8 should usually be your default.

python
import hashlib

# "é" is one byte in latin-1 but two bytes in utf-8, so the digests differ.
text = "café"

utf8_hash = hashlib.md5(text.encode("utf-8")).hexdigest()
latin1_hash = hashlib.md5(text.encode("latin-1")).hexdigest()

print("utf-8 :", utf8_hash)
print("latin1:", latin1_hash)

If your Python result does not match another system, check the encoding before checking anything else.
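When checking the encoding, it can help to look at the bytes rather than the digests. The sketch below uses an illustrative string with one non-ASCII character (an assumption chosen for the demonstration) and prints what each encoding produces:

```python
import hashlib

text = "café"  # illustrative input with one non-ASCII character

utf8_bytes = text.encode("utf-8")      # b'caf\xc3\xa9' (5 bytes)
latin1_bytes = text.encode("latin-1")  # b'caf\xe9' (4 bytes)

print(utf8_bytes, len(utf8_bytes))
print(latin1_bytes, len(latin1_bytes))

# Different bytes in, different digest out.
same_digest = hashlib.md5(utf8_bytes).hexdigest() == hashlib.md5(latin1_bytes).hexdigest()
print("digests match:", same_digest)

# Pure ASCII text encodes identically in both, so those digests agree.
ascii_text = "cafe"
print(ascii_text.encode("utf-8") == ascii_text.encode("latin-1"))
```

Note that plain ASCII input hides the problem, which is why encoding mismatches often surface only once real-world text with accents or symbols arrives.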

Put the Logic Behind a Helper

For repeated use, wrap the operation in a function.

python
import hashlib


def md5_hex(text: str, encoding: str = "utf-8") -> str:
    """Return the MD5 hex digest of text, encoded with the given encoding."""
    return hashlib.md5(text.encode(encoding)).hexdigest()


print(md5_hex("hello world"))
print(md5_hex("hello world", "utf-8"))  # explicit encoding, same result

This keeps call sites clean and standardizes encoding rules across the codebase.
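If some callers already hold bytes, one possible variation is a helper that accepts either type. The name md5_hex_any is hypothetical, and the sketch assumes that text should be encoded while bytes are hashed untouched:

```python
import hashlib
from typing import Union


def md5_hex_any(data: Union[str, bytes], encoding: str = "utf-8") -> str:
    """Return the MD5 hex digest of text or bytes (hypothetical helper)."""
    if isinstance(data, str):
        # Only text needs encoding; bytes are hashed exactly as given.
        data = data.encode(encoding)
    return hashlib.md5(data).hexdigest()


print(md5_hex_any("hello world"))
print(md5_hex_any(b"hello world"))
```

Both calls print the same digest, because the text form is encoded to the same bytes before hashing.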

Verify Against a Known Hash

MD5 is deterministic, so it is easy to test with a known value.

python
import hashlib

expected = "5eb63bbbe01eeed093cb22bb8f5acdc3"
actual = hashlib.md5("hello world".encode("utf-8")).hexdigest()

print(actual)
print(actual == expected)

Checks like this are useful in tests or migration scripts where hash compatibility matters.
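For a test suite, the same idea can be extended to a small table of known values. The sketch below is one way to structure it. The names KNOWN_VECTORS and check_vectors are hypothetical. The first three pairs come from the RFC 1321 test suite, and the last is the value used above.

```python
import hashlib

# Known input/digest pairs: three from the RFC 1321 test suite,
# plus the "hello world" value from this article.
KNOWN_VECTORS = {
    "": "d41d8cd98f00b204e9800998ecf8427e",
    "a": "0cc175b9c0f1b6a831c399e269772661",
    "abc": "900150983cd24fb0d6963f7d28e17f72",
    "hello world": "5eb63bbbe01eeed093cb22bb8f5acdc3",
}


def check_vectors() -> list:
    """Return the inputs whose computed digest does not match the table."""
    return [
        text
        for text, expected in KNOWN_VECTORS.items()
        if hashlib.md5(text.encode("utf-8")).hexdigest() != expected
    ]


mismatches = check_vectors()
print("mismatches:", mismatches)
```

An empty list means every vector matched. Any entry that appears points at an encoding or normalization difference worth investigating.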

Incremental Hashing

Even though strings are usually hashed in one call, it helps to know that hashlib objects can be updated in parts.

python
import hashlib

h = hashlib.md5()
h.update("hello ".encode("utf-8"))
h.update("world".encode("utf-8"))

print(h.hexdigest())

This produces the same result as hashing "hello world" in one step. The pattern is useful when data arrives in chunks.
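As a sketch of the chunked case, the helper below reads a binary stream in fixed-size pieces. The name md5_of_stream and the 8192-byte chunk size are assumptions for illustration, and io.BytesIO stands in for a real file so the example is self-contained.

```python
import hashlib
import io


def md5_of_stream(stream, chunk_size: int = 8192) -> str:
    """Hash a binary stream chunk by chunk (hypothetical helper)."""
    h = hashlib.md5()
    # iter() with a sentinel keeps calling read() until it returns b"".
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()


data = b"hello world" * 10_000
stream = io.BytesIO(data)  # stands in for open(path, "rb")

chunked = md5_of_stream(stream)
one_shot = hashlib.md5(data).hexdigest()
print(chunked == one_shot)
```

Because only one chunk is held at a time, memory use stays flat regardless of the input size.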

When MD5 Is Acceptable

MD5 is still acceptable for some non-security tasks:

  • simple checksums
  • deterministic content identifiers in non-adversarial systems
  • compatibility with old protocols that already require MD5

MD5 is not suitable for:

  • password storage
  • digital signatures
  • security-sensitive integrity checks where collisions matter
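For the acceptable, non-security cases, Python 3.9 and later let you state that intent in code through the keyword-only usedforsecurity argument of the hashlib constructors. A minimal sketch, assuming Python 3.9+ and a hypothetical helper name content_id:

```python
import hashlib


def content_id(data: bytes) -> str:
    """Return a non-security content identifier (hypothetical helper)."""
    # usedforsecurity=False marks this as a non-security use of MD5,
    # which can matter in restricted environments that otherwise block it.
    return hashlib.md5(data, usedforsecurity=False).hexdigest()


print(content_id(b"hello world"))
```

The digest is unchanged by the flag. It only documents, and in restricted builds permits, the non-security use.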

If you need a secure general-purpose hash, use SHA-256 or stronger.

python
import hashlib

text = "hello world"
sha256_digest = hashlib.sha256(text.encode("utf-8")).hexdigest()

print(sha256_digest)

For passwords, use a dedicated password-hashing algorithm such as Argon2, bcrypt, or scrypt.
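Of those three, scrypt is available in the standard library as hashlib.scrypt, provided Python was built against a recent enough OpenSSL. The sketch below is illustrative only. The helper names and the cost parameters (n, r, p, dklen) are assumptions, not recommendations from this article, and should be tuned for your environment.

```python
import hashlib
import hmac
import os

# Illustrative cost parameters (assumptions); tune them for your hardware.
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1, "dklen": 32}


def hash_password(password: str) -> tuple:
    """Return (salt, derived_key) for a new password (hypothetical helper)."""
    salt = os.urandom(16)  # fresh random salt for every password
    key = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return salt, key


def verify_password(password: str, salt: bytes, expected_key: bytes) -> bool:
    """Recompute the key with the stored salt and compare in constant time."""
    key = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return hmac.compare_digest(key, expected_key)


salt, key = hash_password("correct horse")
print(verify_password("correct horse", salt, key))
print(verify_password("wrong guess", salt, key))
```

Unlike MD5, the derivation is deliberately slow and memory-hungry, and the per-password salt means identical passwords do not produce identical stored values.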

Normalization Before Hashing

Sometimes the business rule says text should be normalized before hashing, for example by trimming whitespace or lowercasing user input.

python
import hashlib


def normalized_md5(text: str) -> str:
    normalized = text.strip().lower()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


print(normalized_md5(" Example "))
print(normalized_md5("example"))

Only normalize when the rule is explicit. Do not silently alter bytes in protocol or security code unless the format requires it.

Common Pitfalls

One common mistake is passing a Python string directly to hashlib.md5 instead of bytes. In Python 3, that raises a TypeError.
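A short sketch of that failure and its fix (the exact error message varies between Python versions, so it is printed rather than quoted here):

```python
import hashlib

try:
    hashlib.md5("hello world")  # str passed where bytes are required
except TypeError as exc:
    print("TypeError:", exc)

# Encoding the text first resolves the error.
fixed = hashlib.md5("hello world".encode("utf-8")).hexdigest()
print(fixed)
```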

Another mistake is forgetting that encoding changes the result. Two programs that hash the same visible text can still disagree if their byte encodings differ.

Some developers also reach for MD5 to hash passwords because it is easy and built in. That is a security mistake: MD5 is fast to compute, which makes guessing passwords cheap for an attacker.

Finally, inconsistent input normalization, such as extra spaces or different newline styles, often causes mismatched digests and gets blamed on the hash function.
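When two digests disagree for text that looks identical, printing the exact bytes usually reveals the difference. A minimal debugging sketch, with two illustrative inputs that differ only in newline style:

```python
import hashlib

a = "line one\n"
b = "line one\r\n"  # same visible text, Windows-style newline

for label, text in (("a", a), ("b", b)):
    data = text.encode("utf-8")
    # repr() exposes invisible characters; hex() shows the exact bytes hashed.
    print(label, repr(text), data.hex(), hashlib.md5(data).hexdigest())

digests_match = (
    hashlib.md5(a.encode("utf-8")).hexdigest()
    == hashlib.md5(b.encode("utf-8")).hexdigest()
)
print("digests match:", digests_match)
```

The extra carriage return byte (0d) in the second input is enough to change the digest completely.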

Summary

  • Use hashlib.md5(text.encode("utf-8")).hexdigest() for the standard MD5 string.
  • Hash functions work on bytes, so encoding is part of the result.
  • Wrap hashing in a helper when you need consistent behavior across a project.
  • MD5 is acceptable for checksums and compatibility, not for password storage.
  • When hashes differ, compare the exact input bytes before debugging anything else.
