Converting XML to JSON using Python?

Introduction

Converting XML to JSON in Python is easy at the syntax level and tricky at the data-model level. XML and JSON do not map perfectly, so the real job is not just parsing the file, but deciding how to represent attributes, repeated elements, text nodes, and namespaces in a JSON shape that your application can actually use.

The Quick Path with xmltodict

For straightforward conversions, the third-party xmltodict package (pip install xmltodict) is the most convenient tool. It parses XML into nested Python dictionaries and lists, which you can then serialize with the standard json module.

```python
import json
import xmltodict

xml_text = """
<book id="42">
    <title>Example</title>
    <authors>
        <author>Ada</author>
        <author>Grace</author>
    </authors>
</book>
"""

data = xmltodict.parse(xml_text)
json_text = json.dumps(data, indent=2)
print(json_text)
```

This is the fastest way to get from XML text to JSON text, and it is usually good enough when the XML structure is regular and the consumer can tolerate library conventions.

How Attributes and Text Are Represented

XML has features that JSON does not have natively. xmltodict handles that by using conventions:

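  • attributes are stored under keys prefixed with @ (for example, id="42" becomes "@id": "42")
  • an element's text is stored under the key #text when the element also has attributes or child elements; a text-only element simply becomes a string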

That means the converted structure is not a universal standard. It is one library's mapping choice. If another system expects a different JSON shape, you may need a custom conversion step.
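As a sketch of such a custom step, the hypothetical function rename_xml_keys below post-processes an xmltodict-style dictionary: it drops the @ prefix from attribute keys and renames #text to a caller-chosen key. It assumes the default @ and #text conventions; xmltodict's parse also accepts attr_prefix and cdata_key arguments if you would rather change the convention at parse time. Stripping the prefix can collide with a child element of the same name, so the sketch raises an error in that case instead of silently overwriting data.

```python
def rename_xml_keys(node, attr_prefix="@", text_key="#text", new_text_key="value"):
    """Recursively rename xmltodict-style keys (hypothetical helper).

    Attribute keys lose their prefix ("@id" -> "id") and the text key is
    renamed ("#text" -> new_text_key). Lists are walked; scalars are returned as-is.
    """
    if isinstance(node, list):
        return [rename_xml_keys(item, attr_prefix, text_key, new_text_key) for item in node]
    if not isinstance(node, dict):
        return node

    result = {}
    for key, value in node.items():
        if key == text_key:
            new_key = new_text_key
        elif key.startswith(attr_prefix):
            new_key = key[len(attr_prefix):]
        else:
            new_key = key
        if new_key in result:
            # An attribute and a child element (or the text key) now share a name.
            raise ValueError(f"key collision after renaming: {new_key!r}")
        result[new_key] = rename_xml_keys(value, attr_prefix, text_key, new_text_key)
    return result


# The shape xmltodict produces by default for <price currency="USD">9.99</price>
parsed = {"price": {"@currency": "USD", "#text": "9.99"}}
print(rename_xml_keys(parsed))
# {'price': {'currency': 'USD', 'value': '9.99'}}
```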

Repeated Elements Become Lists

One of the most important edge cases is repeated XML tags. In JSON, repeated children usually become a list:

```python
import xmltodict

xml_text = """
<items>
    <item>one</item>
    <item>two</item>
</items>
"""

data = xmltodict.parse(xml_text)
print(data["items"]["item"])
```

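That prints the list ['one', 'two']. However, if the source XML sometimes has one item and sometimes many, the same key holds a plain string in one document and a list in another. That inconsistency is a common source of bugs, because code written for the list case breaks (or silently iterates over characters) when it receives a string.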
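One way to make the shape predictable is to normalize it after parsing. The hypothetical as_list helper below always returns a list, whether the converter produced a single value, a list, or nothing (None or a missing key). xmltodict's parse also accepts a force_list argument that makes chosen tags lists at parse time; the helper is a library-agnostic alternative that works on any dictionary output.

```python
def as_list(value):
    """Return value as a list: None -> [], list -> unchanged, anything else -> [value]."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


# Shapes a converter may produce for one item versus two items.
one = {"items": {"item": "one"}}
two = {"items": {"item": ["one", "two"]}}

for data in (one, two):
    items = as_list(data["items"].get("item"))
    print(items)
# ['one']
# ['one', 'two']
```

Callers then always iterate over a list, so the one-item and many-item documents follow the same code path.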

Custom Conversion with ElementTree

If you need full control over the output shape, parse the XML yourself and build the JSON structure explicitly. The standard library's xml.etree.ElementTree is enough for many cases.

```python
import json
import xml.etree.ElementTree as ET

def xml_to_dict(element):
    """Convert an element's children to a dict; leaf elements become their stripped text."""
    children = list(element)
    if not children:
        return element.text.strip() if element.text else ""

    result = {}
    for child in children:
        value = xml_to_dict(child)
        if child.tag in result:
            # Second occurrence of a tag: promote the existing value to a list.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result

xml_text = "<person><name>Ada</name><skill>math</skill><skill>logic</skill></person>"
root = ET.fromstring(xml_text)
# Wrap the result in the root tag so the top-level element name is not lost.
print(json.dumps({root.tag: xml_to_dict(root)}, indent=2))
```

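This takes more work, but because you write the mapping yourself, you decide how repeated tags, missing values, and attributes appear. The version above ignores attributes entirely and promotes a tag to a list only when it actually repeats; both are choices you can change.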
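As a sketch of making those choices explicit, the variant below stores attributes under @-prefixed keys (borrowing xmltodict's convention, purely as an assumption) and takes a hypothetical always_list set of tag names that should be lists even when they appear only once. Text goes under #text only when an element also has attributes or children; a bare leaf becomes its text. Mixed-content tails are still ignored.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(element, always_list=frozenset()):
    """Convert an Element to plain dicts/lists/strings (hypothetical sketch).

    - attributes become "@name" keys
    - tags in always_list are always lists, even with one occurrence
    - other tags are promoted to lists only when they repeat
    """
    node = {f"@{name}": value for name, value in element.attrib.items()}
    text = (element.text or "").strip()

    for child in element:
        value = element_to_dict(child, always_list)
        if child.tag in always_list:
            node.setdefault(child.tag, []).append(value)
        elif child.tag in node:
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value

    if not node:
        # Leaf without attributes: represent it by its text alone.
        return text
    if text:
        node["#text"] = text
    return node


xml_text = '<person id="7"><name>Ada</name><skill level="high">math</skill></person>'
root = ET.fromstring(xml_text)
data = {root.tag: element_to_dict(root, always_list={"skill"})}
print(json.dumps(data, indent=2))
```

Here skill comes out as a one-element list because it is named in always_list, so a consumer sees the same type whether a person has one skill or several.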

File-Based Conversion

Converting a file is just a small extension of the same pattern:

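```python
import json
import xmltodict

# Open in binary mode and pass the file object, so the XML parser can honor
# the document's own encoding declaration.
with open("input.xml", "rb") as f:
    data = xmltodict.parse(f)

with open("output.json", "w", encoding="utf-8") as f:
    # ensure_ascii=False writes non-ASCII characters as-is instead of \u escapes.
    json.dump(data, f, indent=2, ensure_ascii=False)
```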

The conversion itself is simple. The harder part is deciding whether the default mapping matches the schema expectations of the next system.
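One lightweight way to catch a mismatch before writing the file is to check the converted data against the shape the consumer expects. The sketch below uses a hypothetical find_shape_errors function and an ad hoc "expected" description (key paths mapped to Python types); a real project might use a proper schema language instead.

```python
def find_shape_errors(data, expected):
    """Return a list of mismatches between data and expected key paths/types.

    expected maps a tuple of keys (a path into nested dicts) to the type the
    value at that path must have, e.g. {("book", "authors", "author"): list}.
    """
    errors = []
    for path, expected_type in expected.items():
        node = data
        for key in path:
            if not isinstance(node, dict) or key not in node:
                errors.append(f"missing: {'/'.join(path)}")
                break
            node = node[key]
        else:
            # Loop finished without break: the path exists, so check its type.
            if not isinstance(node, expected_type):
                errors.append(
                    f"{'/'.join(path)}: expected {expected_type.__name__}, "
                    f"got {type(node).__name__}"
                )
    return errors


# xmltodict-style output for a book with a single author: "author" is a str, not a list.
data = {"book": {"@id": "42", "title": "Example", "authors": {"author": "Ada"}}}
expected = {("book", "title"): str, ("book", "authors", "author"): list}
print(find_shape_errors(data, expected))
# ['book/authors/author: expected list, got str']
```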

Namespaces and Mixed Content

XML namespaces can make tag names verbose, and mixed content can make the tree harder to flatten cleanly into JSON. If the source documents use namespaces heavily or mix text with nested elements inside the same node, expect to write custom handling instead of relying on a one-line conversion.

That is not a Python limitation. It is a consequence of XML being richer than plain JSON objects.
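With ElementTree, namespaced tags arrive as "{uri}localname", and mixed content is split between an element's text and each child's tail. The sketch below shows one possible policy, assuming local names do not collide across namespaces: strip the URI with a hypothetical local_name helper, and flatten mixed-content elements to their concatenated text with itertext. xmltodict has its own route for namespaces through the process_namespaces option of parse.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an ElementTree namespace: '{urn:example}item' -> 'item'."""
    return tag.rpartition("}")[2]


def has_mixed_content(element):
    """True if non-whitespace text sits alongside child elements."""
    if len(element) == 0:
        return False
    if element.text and element.text.strip():
        return True
    return any(child.tail and child.tail.strip() for child in element)


xml_text = """
<doc xmlns="urn:example">
  <item>plain</item>
  <note>Hello <b>world</b>!</note>
</doc>
"""
root = ET.fromstring(xml_text)

for child in root:
    name = local_name(child.tag)
    if has_mixed_content(child):
        # One possible policy: flatten mixed content to its concatenated text.
        print(name, "->", "".join(child.itertext()))
    else:
        print(name, "->", (child.text or "").strip())
# item -> plain
# note -> Hello world!
```

Flattening loses the markup inside the note; if the inline tags matter, you need a richer representation, such as a list of text and element fragments.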

Common Pitfalls

  • Assuming XML and JSON have a one-to-one structural mapping.
  • Forgetting that repeated elements may need consistent list handling.
  • Ignoring attributes and then losing important metadata during conversion.
  • Using a library default shape when the downstream API expects a different JSON contract.
  • Underestimating namespaces and mixed content in real-world XML feeds.

Summary

  • xmltodict is the easiest way to convert simple XML to JSON in Python.
  • XML attributes, repeated tags, and text nodes need explicit mapping choices.
  • Use ElementTree when you need a custom JSON shape.
  • File conversion is simple once the mapping rule is clear.
  • The hard part is not parsing XML, but choosing a JSON representation that stays consistent.
