lxml
Ubuntu
installation guide
Python
Linux tutorial

How to install lxml on Ubuntu

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Introduction

The lxml library is one of the fastest and most feature-rich XML and HTML processing libraries available for Python, built on top of the C libraries libxml2 and libxslt. Because it includes C extensions, installing it on Ubuntu is not always as simple as running pip install. Understanding why the installation can fail and what system-level dependencies are needed will save you from frustrating build errors.

Why pip install lxml Can Fail

When you run pip install lxml, pip downloads the source distribution and attempts to compile the C extensions on your machine. This compilation step requires C header files and development libraries that are not installed on a fresh Ubuntu system by default. Without them, you will see errors like fatal error: libxml/xmlversion.h: No such file or directory or error: command 'gcc' failed.

The fix is straightforward: install the system-level build dependencies before running pip.

Installing System Dependencies

Before installing lxml through pip, install the required development libraries and a C compiler:

bash
sudo apt update
sudo apt install -y build-essential python3-dev libxml2-dev libxslt1-dev zlib1g-dev

Here is what each package provides:

  • build-essential -- the GCC compiler and related build tools
  • python3-dev -- Python header files needed to compile C extensions
  • libxml2-dev -- development headers for the libxml2 XML parser
  • libxslt1-dev -- development headers for the libxslt XSLT processor
  • zlib1g-dev -- compression library headers used by libxml2

Once those are in place, pip can compile lxml without errors:

bash
pip install lxml

Installing lxml via pip in a Virtual Environment

Using a virtual environment is the recommended practice because it isolates your project dependencies from the system Python. The system-level C libraries still need to be installed globally, but the Python package itself lives inside the virtual environment:

bash
1# Create and activate a virtual environment
2python3 -m venv myenv
3source myenv/bin/activate
4
5# Install lxml inside the virtual environment
6pip install lxml

This avoids conflicts between projects that may need different versions of lxml.

Using apt install as an Alternative

If you do not need the latest version of lxml or want to avoid compilation entirely, Ubuntu ships a pre-built package:

bash
sudo apt install python3-lxml

This installs a version packaged by Ubuntu maintainers, so it may lag behind the latest PyPI release. It also installs into the system Python, not into a virtual environment. This approach works well for system scripts or quick prototyping where having the absolute latest version is not critical.

Verifying the Installation

After installation, confirm that lxml is working correctly:

python
1import lxml
2print(lxml.__version__)
3
4from lxml import etree
5root = etree.Element("root")
6child = etree.SubElement(root, "child")
7child.text = "Hello from lxml"
8print(etree.tostring(root, pretty_print=True).decode())

Expected output:

 
15.1.0
2<root>
3  <child>Hello from lxml</child>
4</root>

If the import succeeds and the version prints, your installation is working.

Troubleshooting Common Compilation Errors

If you still see build failures after installing the dependencies, try these steps:

bash
1# Make sure pip itself is up to date
2pip install --upgrade pip setuptools wheel
3
4# Force a rebuild from source (skips cached wheels)
5pip install lxml --no-binary lxml
6
7# Check that pkg-config can find libxml2
8pkg-config --cflags libxml-2.0

If pkg-config cannot find libxml-2.0, the development headers are missing or not properly registered. Reinstalling libxml2-dev usually resolves this.

On older Ubuntu versions (18.04 or earlier), you may also need to install python3-pip explicitly:

bash
sudo apt install python3-pip

Common Pitfalls

  • Running pip install lxml without system dependencies results in cryptic C compilation errors. Always install libxml2-dev and libxslt1-dev first.
  • Forgetting python3-dev means the Python header files are missing, causing the build to fail even when the XML libraries are present.
  • Mixing system lxml with virtualenv lxml can lead to import confusion. Choose one approach and stick with it for each project.
  • Using sudo pip install installs packages into the system Python and can break system tools. Use a virtual environment or apt install python3-lxml instead.
  • Assuming apt install python3-lxml gives the latest version can cause issues if your code depends on recent lxml features. Check the version with python3 -c "import lxml; print(lxml.__version__)".

Summary

  • Install system build dependencies (libxml2-dev, libxslt1-dev, python3-dev, zlib1g-dev) before running pip install lxml.
  • Use a virtual environment to keep project dependencies isolated while still relying on system-level C libraries.
  • Use sudo apt install python3-lxml as a quick alternative that avoids compilation entirely, at the cost of a potentially older version.
  • Verify the installation by importing lxml and running a basic XML operation.
  • Keep pip, setuptools, and wheel up to date to minimize build issues.

Course illustration
Course illustration

All Rights Reserved.