XML
Python
Pretty Print
Code Formatting
Programming

Pretty printing XML in Python

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Introduction

Pretty printing XML in Python improves readability, debugging speed, and code review quality, especially when payloads are large. The challenge is that different libraries format slightly differently and may change whitespace or declarations. This guide compares practical options and shows how to keep output clean without breaking semantics.

Core Topic Sections

Choose the right formatter for your use case

Python offers three common approaches:

  1. xml.dom.minidom from standard library.
  2. xml.etree.ElementTree with indent in modern Python.
  3. lxml.etree for advanced XML features and speed.

For simple tools, standard library is usually enough. For namespaces, schema workflows, or large documents, lxml is often better.

Pretty print with minidom

minidom is easy to use, but it may add extra blank lines if input already contains text nodes with whitespace.

python
1from xml.dom import minidom
2
3raw_xml = "<root><item id='1'>alpha</item><item id='2'>beta</item></root>"
4dom = minidom.parseString(raw_xml)
5pretty = dom.toprettyxml(indent="  ")
6
7print(pretty)

If output has too many blank lines, filter empty lines:

python
clean = "\n".join(line for line in pretty.splitlines() if line.strip())
print(clean)

This is acceptable for scripts and small documents.

Pretty print with ElementTree

From Python 3.9 onward, xml.etree.ElementTree.indent provides clean formatting without external dependencies.

python
1import xml.etree.ElementTree as ET
2
3raw_xml = "<catalog><book><title>Pragmatic XML</title></book></catalog>"
4root = ET.fromstring(raw_xml)
5ET.indent(root, space="  ")
6
7xml_text = ET.tostring(root, encoding="unicode")
8print(xml_text)

When writing to file with declaration:

python
tree = ET.ElementTree(root)
tree.write("catalog.xml", encoding="utf-8", xml_declaration=True)

This approach is stable and easy to maintain in standard Python environments.

Pretty print with lxml for advanced workflows

lxml supports robust parser options, namespace-heavy documents, and high-performance operations.

python
1from lxml import etree
2
3parser = etree.XMLParser(remove_blank_text=True)
4root = etree.fromstring(b"<root><a>1</a><b><c>2</c></b></root>", parser=parser)
5
6pretty = etree.tostring(root, pretty_print=True, xml_declaration=True, encoding="UTF-8")
7print(pretty.decode("utf-8"))

remove_blank_text=True helps normalize inconsistent whitespace from source files.

Preserve meaning while changing formatting

Pretty printing should change appearance, not data meaning. Be careful with:

  1. Text nodes where whitespace is significant.
  2. Mixed content XML where text and child nodes interleave.
  3. Attribute ordering expectations in external tools.

Before reformatting in pipelines, test a parse-serialize round trip and validate business logic consumers.

Encoding and declaration strategy

Many bugs come from encoding mismatch, not indentation itself. Use UTF-8 consistently and decide whether declaration is required.

Recommended file output rules:

  1. Always write with explicit encoding.
  2. Include declaration for files exchanged across systems.
  3. Keep line endings consistent in repositories.

Stable formatting conventions reduce noisy diffs and merge conflicts.

Build a reusable utility function

A small utility keeps behavior consistent across scripts.

python
1import xml.etree.ElementTree as ET
2
3def pretty_xml(xml_text: str) -> str:
4    root = ET.fromstring(xml_text)
5    ET.indent(root, space="  ")
6    return ET.tostring(root, encoding="unicode")
7
8if __name__ == "__main__":
9    sample = "<root><user><name>Ada</name></user></root>"
10    print(pretty_xml(sample))

Centralizing formatting policy makes future changes simple.

Test formatting in automation

If XML formatting is part of CI or release tasks:

  1. Add fixtures for representative XML shapes.
  2. Assert output remains parseable.
  3. Compare normalized structures rather than exact whitespace when appropriate.

This prevents accidental breakage when library versions change.

Common Pitfalls

  • Using toprettyxml directly and committing outputs with excessive blank lines.
  • Reformatting XML where whitespace is semantically significant.
  • Writing files without explicit encoding and causing downstream parse failures.
  • Mixing formatter libraries across tools and producing inconsistent diffs.
  • Comparing XML as raw strings instead of comparing parsed structure.

Summary

  • Python supports XML pretty printing through minidom, ElementTree, and lxml.
  • ElementTree.indent is a strong default for modern standard-library workflows.
  • lxml is best for advanced parsing and namespace-heavy documents.
  • Formatting should improve readability without changing XML meaning.
  • Use shared utility functions and tests for consistent, reliable output.

Course illustration
Course illustration

All Rights Reserved.