Directory-tree listing in Python

Introduction

Listing a directory tree sounds simple until you need readable output, recursive traversal, and safe handling of missing permissions or symbolic links. Python gives you solid tools for this in both os and pathlib, and the right choice depends on whether you want full control or a cleaner object-oriented API.

Recursive Listing with os.walk

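The most common solution is os.walk(). For every directory it visits, it yields a 3-tuple: the path of the current directory, a list of the subdirectory names in it, and a list of the file names in it. The names are bare names, not full paths, so join them with the current directory when you need a path. Because it walks recursively for you, it is usually the best starting point for command-line utilities and maintenance scripts.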

```python
from pathlib import Path
import os


def print_tree(start_dir: str) -> None:
    start_path = Path(start_dir).resolve()

    for root, dirs, files in os.walk(start_path):
        dirs.sort()  # sort in place so os.walk() descends in a stable order
        root_path = Path(root)
        level = len(root_path.relative_to(start_path).parts)

        # Print each directory when it is visited so that its contents
        # appear directly beneath it.
        if level == 0:
            print(start_path.name)
        else:
            print(f"{'    ' * (level - 1)}|-- {root_path.name}/")

        indent = "    " * level
        for file_name in sorted(files):
            print(f"{indent}|-- {file_name}")


if __name__ == "__main__":
    print_tree(".")
```

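This works well, but there is an important detail: os.walk() visits directories top-down by default, one directory per iteration, and it does not tell you how deep the current directory is. If you want a readable tree, you need to compute the indentation yourself from the path relative to the start directory, as shown above.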

Another useful feature is that you can modify dirs in place to control traversal. That lets you skip heavy or irrelevant folders such as .git, node_modules, or build directories.

```python
import os


def list_source_tree(start_dir: str) -> None:
    ignored = {"node_modules", ".git", "__pycache__"}

    for root, dirs, files in os.walk(start_dir):
        dirs[:] = [d for d in dirs if d not in ignored]

        print(f"\n{root}")
        for file_name in sorted(files):
            if file_name.endswith((".py", ".md")):
                print(f"  - {file_name}")


if __name__ == "__main__":
    list_source_tree(".")
```

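Filtering dirs[:] instead of reassigning dirs is the key detail. os.walk() reads that same list object to decide where to recurse next, so only an in-place change is visible to it. This pruning works in the default top-down mode; with topdown=False the subdirectories have already been visited by the time you see the list.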
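To make the difference concrete, here is a small sketch that runs the same walk twice over a temporary directory, once with slice assignment and once with plain reassignment. The helper name visited_dirs and the folder layout are made up for illustration.

```python
import os
import tempfile
from pathlib import Path


def visited_dirs(start_dir: str, prune_in_place: bool) -> list[str]:
    """Return the names of the directories os.walk() visits below start_dir."""
    ignored = {".git"}
    visited = []

    for root, dirs, files in os.walk(start_dir):
        if prune_in_place:
            # Slice assignment mutates the list object that os.walk() holds.
            dirs[:] = [d for d in dirs if d not in ignored]
        else:
            # Rebinding only changes the local name; os.walk() never sees it.
            dirs = [d for d in dirs if d not in ignored]
        visited.append(os.path.basename(root))

    return sorted(visited[1:])  # drop the start directory itself


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / ".git" / "objects").mkdir(parents=True)
        (Path(tmp) / "src").mkdir()

        print(visited_dirs(tmp, prune_in_place=True))   # ['src']
        print(visited_dirs(tmp, prune_in_place=False))  # ['.git', 'objects', 'src']
```

With reassignment the walk still descends into .git, even though the local variable no longer contains it.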

Using pathlib for Cleaner Code

If you prefer modern path handling, pathlib is easier to read. It does not replace every os.walk() use case, but it makes many tasks simpler, especially when you want Path objects instead of raw strings.

```python
from pathlib import Path


def print_python_files(start_dir: str) -> None:
    root = Path(start_dir)

    for path in sorted(root.rglob("*.py")):
        if path.is_file():
            print(path.relative_to(root))


if __name__ == "__main__":
    print_python_files(".")
```

rglob() is excellent when your real goal is "find every file matching a pattern" rather than "manually process directory state at each level." The output is often simpler, and path manipulation becomes much clearer because Path methods like relative_to(), name, and suffix are built in.

That said, rglob() is not a direct drop-in replacement for a visual tree printer. When you need separate access to directories and files at each level, os.walk() still gives better structure.
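If you would still rather stay in pathlib for a visual tree, one option is to recurse with Path.iterdir() instead of rglob(). The following is a minimal sketch under that assumption, not a replacement for os.walk(); the function name tree_lines is invented here, and the sketch does not handle permission errors.

```python
from pathlib import Path


def tree_lines(directory: Path, prefix: str = "") -> list[str]:
    """Return one line per entry below directory, children right after parents."""
    lines = []
    # Entries that are not regular files (mostly directories) sort first,
    # then files; each group is ordered by name.
    entries = sorted(directory.iterdir(), key=lambda p: (p.is_file(), p.name))

    for entry in entries:
        # Symbolic links are listed but never descended into.
        if entry.is_dir() and not entry.is_symlink():
            lines.append(f"{prefix}|-- {entry.name}/")
            lines.extend(tree_lines(entry, prefix + "    "))
        else:
            lines.append(f"{prefix}|-- {entry.name}")

    return lines


if __name__ == "__main__":
    root = Path(".").resolve()
    print(root.name)
    print("\n".join(tree_lines(root)))
```

Because each directory's children are appended immediately after the directory line, the nesting in the output follows the nesting on disk.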

Building a Useful Tree Printer

A production-friendly tree listing usually needs a few extras:

  • deterministic ordering with sorted()
  • exclusion of hidden or generated folders
  • graceful handling of permission errors
  • optional depth limits for very large trees

Here is a version with a depth limit:

```python
from pathlib import Path
import os


def print_tree(start_dir: str, max_depth: int = 2) -> None:
    start = Path(start_dir).resolve()

    for root, dirs, files in os.walk(start):
        root_path = Path(root)
        depth = len(root_path.relative_to(start).parts)

        if depth >= max_depth:
            dirs[:] = []  # stop descending below max_depth
        else:
            dirs.sort()  # descend in a stable order

        # Print each directory when it is visited so that its contents
        # appear directly beneath it.
        if depth == 0:
            print(start.name)
        else:
            print(f"{'    ' * (depth - 1)}|-- {root_path.name}/")

        indent = "    " * depth
        for name in sorted(files):
            print(f"{indent}|-- {name}")


if __name__ == "__main__":
    print_tree(".", max_depth=3)
```

Limiting depth is important in monorepos or large data directories where a full recursive printout becomes noisy and slow.

Common Pitfalls

One common mistake is assuming os.listdir() is recursive. It is not. It only lists one directory level, so you need additional logic if you expect nested output.
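A quick side-by-side makes the difference visible. This sketch compares os.listdir() with a walk over the same directory; the helper names one_level and all_levels are only illustrative.

```python
import os
from pathlib import Path


def one_level(start_dir: str) -> list[str]:
    """Names directly inside start_dir, which is all os.listdir() returns."""
    return sorted(os.listdir(start_dir))


def all_levels(start_dir: str) -> list[str]:
    """Every directory and file below start_dir, as paths relative to it."""
    found = []
    for root, dirs, files in os.walk(start_dir):
        for name in dirs + files:
            relative = os.path.relpath(os.path.join(root, name), start_dir)
            found.append(Path(relative).as_posix())
    return sorted(found)


if __name__ == "__main__":
    print(one_level("."))
    print(all_levels("."))
```

For a directory holding top.txt and pkg/inner.py, the first call reports only pkg and top.txt, while the second also reports pkg/inner.py.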

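Another problem is following symbolic links without realizing it. By default, os.walk() does not descend into symbolic links that point to directories, although it still lists them among the subdirectory names. Passing followlinks=True changes that, and the walk can then loop forever or revisit content if a link points back to a parent directory. If symbolic links matter in your project, inspect them explicitly with Path.is_symlink() and decide how to handle them.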
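As a rough sketch of that inspection step, the function below reports every symbolic link to a directory under a start path without following any of them. The name find_symlinked_dirs is hypothetical, and the walk relies on the default followlinks=False.

```python
import os
from pathlib import Path


def find_symlinked_dirs(start_dir: str) -> list[str]:
    """Return relative paths of directory symlinks found below start_dir."""
    links = []

    # followlinks defaults to False, so linked directories show up in dirs
    # but os.walk() does not descend into them.
    for root, dirs, files in os.walk(start_dir):
        for name in dirs:
            path = Path(root) / name
            if path.is_symlink():
                links.append(path.relative_to(start_dir).as_posix())

    return sorted(links)


if __name__ == "__main__":
    for link in find_symlinked_dirs("."):
        print(f"symlink: {link}")
```

Once the links are known, you can choose to skip them, print them with a marker, or resolve them and track visited targets yourself.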

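Permission errors are also easy to ignore during local testing. On a development machine, everything may work, but production or CI environments can contain restricted directories. Note that os.walk() ignores errors from listing a directory by default and simply skips it, so unreadable folders can vanish from the output without any warning; pass an onerror callback if you want to see them. For your own calls, such as Path.iterdir() or opening files, wrap the operation in try and except PermissionError when you need resilience.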
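Here is one way to surface those skipped directories, sketched with the onerror parameter of os.walk(). The function name walk_with_errors and the choice to collect paths in a list are assumptions for the example; a real script might log them instead.

```python
import os


def walk_with_errors(start_dir: str) -> tuple[list[str], list[str]]:
    """Return (directories visited, directories that could not be listed)."""
    failed = []

    def remember(error: OSError) -> None:
        # os.walk() calls this with the OSError raised while listing a
        # directory; error.filename holds the path that failed.
        failed.append(str(error.filename))

    visited = [root for root, dirs, files in os.walk(start_dir, onerror=remember)]
    return visited, failed


if __name__ == "__main__":
    visited, failed = walk_with_errors(".")
    print(f"visited {len(visited)} directories")
    for path in failed:
        print(f"could not list: {path}")
```

The callback receives any OSError from listing a directory, so a start directory that does not exist is reported the same way as one you lack permission to read.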

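Finally, be careful with output formatting. A frequent mistake is to print a directory's name while processing its parent and then print its contents only later, when os.walk() actually reaches it, which separates the two in the output. Printing each directory when it is visited, sorting the names, and computing indentation from the relative path keeps the tree predictable.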

Summary

  • Use os.walk() when you need structured recursive traversal with control over directories and files.
  • Use pathlib.Path.rglob() when you mainly want to find matching files with cleaner path handling.
  • Modify dirs[:] in os.walk() to skip folders or limit recursion.
  • Sort directory and file names so output stays stable across runs.
  • Plan for symbolic links, permission errors, and very deep trees before treating a simple script as production-ready.
