Design a File System

Topics Covered

Requirements and Use Cases

What the System Must Do

Core Entities

Why Composite Is the Right Pattern

Thinking Through Delete

What We Defer

File and Directory Modeling

The FileSystemNode Base Class

The File Class

The Directory Class

Path as a Value Object

Putting It Together

Design Decisions Worth Discussing

Permissions and Access Control

UNIX-Style Permission Model

Cascade Checks Along the Path

Protection Proxy Pattern

Default Permissions and Inheritance

Search and Traversal

The Visitor Pattern for Search

DFS vs BFS Traversal

Path Resolution with Dot Navigation

Combining Patterns

When to Use Which Traversal

Practice

Every operating system needs a way to organize data into a hierarchy of files and directories. Before writing any classes, you need to understand what a file system actually does and which responsibilities belong in an object-oriented design versus which belong in the operating system kernel.

Composite file system tree

What the System Must Do

A file system supports a small set of core operations:

  • Create files and directories at a given path
  • Delete files and directories (with recursive deletion for non-empty directories)
  • Move a file or directory from one location to another
  • Get size of a file (its own size) or a directory (the total size of everything inside it)
  • List contents of a directory
  • Resolve paths like /home/user/docs/report.txt into actual nodes in the tree

These operations reveal the central insight: files and directories are both "things in the file system" but behave differently. A file has content and a fixed size. A directory has no content of its own: it holds other files and directories. Yet both have a name, a parent, and a path. This is the classic setup for the Composite pattern.

Core Entities

From the use cases, you can extract the key classes:

EntityResponsibility
FileSystemNodeAbstract base for anything in the tree: name, parent, path
FileA leaf node with size and content
DirectoryA composite node that holds children
FileSystemThe root-level manager that provides the public API
PermissionAccess control metadata for each node
UserRepresents who is performing operations

Why Composite Is the Right Pattern

The Composite pattern lets you treat individual objects and groups of objects through the same interface. When client code calls getSize() on any node, it does not need to know whether that node is a file or a directory. A file returns its own size. A directory sums its children's sizes: and those children might be files or more directories. The recursion handles itself.

This uniform treatment eliminates type-checking conditionals throughout the codebase. Every operation that walks the tree, size calculation, path display, deletion, works the same way regardless of depth or structure.

Key Insight

File systems are one of the most natural examples of the Composite pattern. The part-whole hierarchy is inherent in the domain, not imposed by the design. When the domain itself is recursive, Composite is almost always the right choice.

Thinking Through Delete

Deletion is more nuanced than creation. Deleting a file is straightforward: remove it from its parent's children collection. But deleting a directory raises a question: what if it is not empty? You have two choices:

  • Refuse to delete non-empty directories (safe, requires the user to empty it first)
  • Recursive delete, remove all children first, then the directory itself

Most real file systems support both (e.g., rmdir fails on non-empty directories, while rm -rf recurses). In your design, recursive delete is another example of the Composite pattern at work: calling delete() on a directory triggers delete() on every child, which triggers delete() on their children, all the way down to the leaves. The recursion terminates when every leaf is removed.

What We Defer

An OOD interview does not expect you to model disk blocks, journaling, or memory-mapped I/O. Those are operating system internals. Focus on the object relationships and behaviors. You should also defer symbolic links initially: they introduce cycles into the tree, which complicates traversal and deletion. Mention them as an extension point, but do not let them derail the core design.

You should also set aside concurrent access for the initial design. In a real operating system, multiple processes access the file system simultaneously, requiring locks and transaction semantics. For OOD purposes, assume single-threaded access and mention concurrency as an extension point if time permits.

With the requirements clear, you can now define the class hierarchy. The goal is a clean separation between the abstract concept of "something in the file system" and the two concrete forms it takes: files that store data and directories that organize other nodes.

Permission cascade check

The FileSystemNode Base Class

Every node in the file system shares three properties: a name, a reference to its parent directory, and the ability to compute its absolute path. The base class captures this:

python
1from abc import ABC, abstractmethod
2from typing import Optional
3
4class FileSystemNode(ABC):
5    def __init__(self, name: str, parent: Optional['Directory'] = None):
6        self.name = name
7        self.parent = parent
8
9    def get_path(self) -> str:
10        if self.parent is None:
11            return "/" + self.name
12        return self.parent.get_path().rstrip("/") + "/" + self.name
13
14    @abstractmethod
15    def get_size(self) -> int:
16        pass
17
18    @abstractmethod
19    def is_directory(self) -> bool:
20        pass

The get_path() method walks up the parent chain: each node only knows its own name and its parent, but by chaining those links, you reconstruct the full path. This is the same principle behind linked list traversal.

The File Class

A file is a leaf node. It has no children. For an OOD design, you model its size as a stored value rather than simulating actual content storage:

python
1class File(FileSystemNode):
2    def __init__(self, name: str, size: int, parent: Optional['Directory'] = None):
3        super().__init__(name, parent)
4        self.size = size
5
6    def get_size(self) -> int:
7        return self.size
8
9    def is_directory(self) -> bool:
10        return False

The Directory Class

A directory is a composite node. It holds children and implements get_size() by summing them recursively:

python
1class Directory(FileSystemNode):
2    def __init__(self, name: str, parent: Optional['Directory'] = None):
3        super().__init__(name, parent)
4        self.children: dict[str, FileSystemNode] = {}
5
6    def add_child(self, node: FileSystemNode) -> None:
7        if node.name in self.children:
8            raise FileExistsError(f"'{node.name}' already exists")
9        node.parent = self
10        self.children[node.name] = node
11
12    def remove_child(self, name: str) -> FileSystemNode:
13        if name not in self.children:
14            raise FileNotFoundError(f"'{name}' not found")
15        node = self.children.pop(name)
16        node.parent = None
17        return node
18
19    def get_child(self, name: str) -> FileSystemNode:
20        if name not in self.children:
21            raise FileNotFoundError(f"'{name}' not found")
22        return self.children[name]
23
24    def list_children(self) -> list[str]:
25        return sorted(self.children.keys())
26
27    def get_size(self) -> int:
28        return sum(child.get_size() for child in self.children.values())
29
30    def is_directory(self) -> bool:
31        return True

Notice that children is a dictionary keyed by name, not a list. This gives O(1) lookup by name: essential because path resolution navigates one segment at a time, and each segment is a name lookup in the current directory.

Common Pitfall

A directory must never contain itself as a child, directly or indirectly. The add_child method should validate that the new child is not an ancestor of the current directory. Without this check, get_size() and get_path() would recurse infinitely.

Path as a Value Object

Paths like /home/user/docs/report.txt appear everywhere in a file system. Rather than passing raw strings around, a Path value object parses the string into segments and provides navigation utilities:

python
1class Path:
2    def __init__(self, path_string: str):
3        self.is_absolute = path_string.startswith("/")
4        parts = path_string.strip("/").split("/")
5        self.segments = [p for p in parts if p]
6
7    def resolve(self, root: Directory) -> FileSystemNode:
8        current: FileSystemNode = root
9        for segment in self.segments:
10            if not current.is_directory():
11                raise NotADirectoryError(f"'{current.name}' is not a directory")
12            current = current.get_child(segment)
13        return current

This separates parsing logic from tree traversal logic. The resolve method walks the tree one segment at a time, which makes it easy to add . and .. handling later.

Putting It Together

The FileSystem class ties everything into a usable API:

python
1class FileSystem:
2    def __init__(self):
3        self.root = Directory("")
4
5    def create_file(self, path: str, size: int) -> File:
6        p = Path(path)
7        parent = Path("/".join(p.segments[:-1])).resolve(self.root)
8        file = File(p.segments[-1], size)
9        parent.add_child(file)
10        return file
11
12    def create_directory(self, path: str) -> Directory:
13        p = Path(path)
14        parent = Path("/".join(p.segments[:-1])).resolve(self.root)
15        directory = Directory(p.segments[-1])
16        parent.add_child(directory)
17        return directory
18
19    def get_size(self, path: str) -> int:
20        node = Path(path).resolve(self.root)
21        return node.get_size()
22
23    def move(self, src: str, dest: str) -> None:
24        src_path = Path(src)
25        dest_path = Path(dest)
26        src_parent = Path("/".join(src_path.segments[:-1])).resolve(self.root)
27        node = src_parent.remove_child(src_path.segments[-1])
28        dest_dir = dest_path.resolve(self.root)
29        dest_dir.add_child(node)

The move operation shows why FileSystem exists as a coordinator: it removes from one directory and adds to another, a cross-cutting action that neither directory owns alone.

Design Decisions Worth Discussing

Several design choices in this model deserve explicit justification:

Parent references are bidirectional. Each node knows its parent, and each directory knows its children. This costs a bit of memory but makes get_path() possible without searching the entire tree. It also makes move straightforward: update the parent reference and the children dictionary in one step.

Names are unique within a directory. The dictionary enforces this implicitly, but add_child raises an explicit error. This matches real file systems: you cannot have two files with the same name in the same folder.

FileSystem owns path resolution. Rather than letting client code traverse the tree manually, all path-based operations go through FileSystem. This creates a single point for validation, logging, and future permission checks. If you later add caching of recently accessed paths, it happens in one place.

A file system without permissions is a shared notebook with no locks on any drawer. Anyone can read, modify, or delete anything. Real file systems enforce access control so that users can only perform operations they are authorized for. The design challenge is adding this enforcement cleanly, without scattering permission checks throughout every method.

DFS vs BFS file traversal

UNIX-Style Permission Model

The most widely used permission model assigns three categories of access to every node:

  • Owner, the user who created the file or directory
  • Group, a set of users who share access
  • Others, everyone else

Each category gets three permission bits: read, write, and execute. For files, read means viewing content, write means modifying it, and execute means running it as a program. For directories, read means listing contents, write means adding or removing children, and execute means traversing through the directory (entering it as part of a path).

python
1class Permission:
2    def __init__(self, owner: str, group: str):
3        self.owner = owner
4        self.group = group
5        self.owner_bits = 0b111  # rwx
6        self.group_bits = 0b101  # r-x
7        self.other_bits = 0b100  # r--
8
9    def can_read(self, user: str, user_groups: list[str]) -> bool:
10        if user == "root":
11            return True
12        if user == self.owner:
13            return bool(self.owner_bits & 0b100)
14        if self.group in user_groups:
15            return bool(self.group_bits & 0b100)
16        return bool(self.other_bits & 0b100)
17
18    def can_write(self, user: str, user_groups: list[str]) -> bool:
19        if user == "root":
20            return True
21        if user == self.owner:
22            return bool(self.owner_bits & 0b010)
23        if self.group in user_groups:
24            return bool(self.group_bits & 0b010)
25        return bool(self.other_bits & 0b010)

The root user bypasses all permission checks. This is a deliberate design decision: system administration requires unrestricted access, and encoding this at the permission level keeps the check logic centralized.

Cascade Checks Along the Path

Accessing /home/user/docs/report.txt requires more than just checking permissions on report.txt. You must verify that the current user has execute permission on every directory in the path: /, home, user, and docs. If any directory along the way denies execute access, the operation fails: even if the target file itself grants full access.

python
1def check_path_access(self, path: str, user: str,
2                      user_groups: list[str]) -> bool:
3    segments = path.strip("/").split("/")
4    current = self.root
5    for segment in segments[:-1]:
6        if not current.permission.can_execute(user, user_groups):
7            return False
8        current = current.get_child(segment)
9    # Check the final node for the specific operation
10    return True

This cascade check is why directories have an execute bit at all. It controls traversal, not execution. Without it, any user who knows a file's full path could access it regardless of the directory structure's intended protection.

Interview Tip

The Proxy pattern is a clean way to add permission checks. Wrap each FileSystemNode with a PermissionProxy that checks access before delegating to the real node. This keeps permission logic out of File and Directory entirely: they remain focused on their structural responsibilities.

Protection Proxy Pattern

Rather than adding permission checks inside File.get_size() or Directory.add_child(), you wrap nodes with a proxy that enforces access control:

python
1class PermissionProxy(FileSystemNode):
2    def __init__(self, node: FileSystemNode, user: str,
3                 user_groups: list[str]):
4        self._node = node
5        self._user = user
6        self._user_groups = user_groups
7
8    def get_size(self) -> int:
9        if not self._node.permission.can_read(
10            self._user, self._user_groups
11        ):
12            raise PermissionError("Read access denied")
13        return self._node.get_size()
14
15    def get_path(self) -> str:
16        return self._node.get_path()

The proxy implements the same interface as the real node. Client code cannot tell the difference: it calls get_size() and either gets the answer or gets a PermissionError. This is the essence of the Proxy pattern: same interface, added control.

The advantage is separation of concerns. File and Directory know nothing about permissions. Permission knows nothing about files. The proxy bridges them. If you later switch from UNIX permissions to access control lists (ACLs), only the Permission class and the proxy change.

Default Permissions and Inheritance

When a new file or directory is created, it needs initial permissions. Two common strategies exist:

Fixed defaults: every new node gets the same permission bits (e.g., rwxr-xr-x for directories, rw-r--r-- for files). This is simple but inflexible.

Inherited from parent: a new node copies its parent directory's permissions. This means files created inside a restricted directory are automatically restricted too. The parent acts as a template.

In practice, most systems use a combination: a system-wide default (called a umask in UNIX) that masks out certain bits, combined with the creating user's identity as the owner. For an OOD interview, mentioning that permissions should be initialized at creation time and that inheritance from the parent is a reasonable default demonstrates depth of thought without overcomplicating the model.

A file system is only useful if you can find things in it. Search and traversal are operations that walk the tree, but they differ in purpose: traversal visits every node in some order, while search walks the tree looking for nodes that match specific criteria. Both benefit from patterns that decouple the walking logic from the actions performed at each node.

Path resolution navigation

When you search a file system, you might look for files by name, by extension, by size threshold, or by modification date. Each criterion is a different operation, but all share the same traversal structure. The Visitor pattern separates the operation from the tree structure.

python
1from abc import ABC, abstractmethod
2
3class FileSystemVisitor(ABC):
4    @abstractmethod
5    def visit_file(self, file: 'File') -> None:
6        pass
7
8    @abstractmethod
9    def visit_directory(self, directory: 'Directory') -> None:
10        pass
11
12class SearchByExtensionVisitor(FileSystemVisitor):
13    def __init__(self, extension: str):
14        self.extension = extension
15        self.results: list[File] = []
16
17    def visit_file(self, file: File) -> None:
18        if file.name.endswith(self.extension):
19            self.results.append(file)
20
21    def visit_directory(self, directory: Directory) -> None:
22        pass  # Directories have no extension to match

Each node accepts the visitor:

python
1# Added to FileSystemNode
2def accept(self, visitor: FileSystemVisitor) -> None:
3    pass
4
5# In File
6def accept(self, visitor: FileSystemVisitor) -> None:
7    visitor.visit_file(self)
8
9# In Directory
10def accept(self, visitor: FileSystemVisitor) -> None:
11    visitor.visit_directory(self)
12    for child in self.children.values():
13        child.accept(visitor)

Adding a new search criterion means writing a new visitor class. The tree classes never change. This is the Open-Closed Principle in action: you extend behavior by adding new classes, not by modifying existing ones.

DFS vs BFS Traversal

Not all searches benefit from the same traversal order. Two fundamental strategies exist:

Depth-First Search (DFS) goes as deep as possible before backtracking. It follows one branch all the way to the leaves, then moves to the next branch. DFS uses less memory because it only needs to track the current path, not all nodes at the current level.

Breadth-First Search (BFS) visits all nodes at the current depth before going deeper. It explores level by level. BFS is useful when you expect the target to be near the root or when you want to find files at a specific depth.

python
1from collections import deque
2
3class DFSIterator:
4    def __init__(self, root: FileSystemNode):
5        self._stack = [root]
6
7    def __iter__(self):
8        return self
9
10    def __next__(self) -> FileSystemNode:
11        if not self._stack:
12            raise StopIteration
13        node = self._stack.pop()
14        if node.is_directory():
15            # Push children in reverse so leftmost is visited first
16            children = list(node.children.values())
17            self._stack.extend(reversed(children))
18        return node
19
20class BFSIterator:
21    def __init__(self, root: FileSystemNode):
22        self._queue = deque([root])
23
24    def __iter__(self):
25        return self
26
27    def __next__(self) -> FileSystemNode:
28        if not self._queue:
29            raise StopIteration
30        node = self._queue.popleft()
31        if node.is_directory():
32            self._queue.extend(node.children.values())
33        return node

The Strategy pattern lets you swap between these traversal algorithms at runtime. The search method accepts a traversal strategy, making it configurable without conditionals:

python
1def search(root: FileSystemNode, visitor: FileSystemVisitor,
2           traversal: str = "dfs") -> None:
3    if traversal == "dfs":
4        iterator = DFSIterator(root)
5    else:
6        iterator = BFSIterator(root)
7    for node in iterator:
8        node.accept(visitor)

Path Resolution with Dot Navigation

Real file systems support . (current directory) and .. (parent directory) in paths. An enhanced path resolver handles these:

python
1class PathResolver:
2    def resolve(self, path: str, root: Directory,
3                current: Directory) -> FileSystemNode:
4        if path.startswith("/"):
5            node = root
6            segments = path.strip("/").split("/")
7        else:
8            node = current
9            segments = path.split("/")
10
11        for segment in segments:
12            if segment == "" or segment == ".":
13                continue
14            elif segment == "..":
15                if node.parent is not None:
16                    node = node.parent
17            else:
18                if not node.is_directory():
19                    raise NotADirectoryError(
20                        f"'{node.name}' is not a directory"
21                    )
22                node = node.get_child(segment)
23        return node

This resolver supports both absolute paths (starting with /) and relative paths (starting from the current directory). The .. segment navigates up by following the parent reference: the same linked-list-style traversal used in get_path(), but in the opposite direction.

Combining Patterns

The real power of this design comes from combining patterns. A single search operation might use:

  • Visitor to define what you are looking for (search by extension, size threshold, etc.)
  • Iterator (DFS or BFS) to control the order you visit nodes
  • Strategy to select the traversal algorithm at runtime

For example, a "find all files larger than 1 MB" operation creates a SizeThresholdVisitor and pairs it with a DFSIterator. A "find the shallowest directory named 'config'" operation creates a NameMatchVisitor and pairs it with a BFSIterator because BFS finds the shallowest match first.

This composability means you do not need a dedicated method for every combination of criteria and traversal order. You build the operation from small, reusable pieces. Each piece is independently testable, and new combinations require no code changes to existing classes.

When to Use Which Traversal

Here is a quick decision guide:

ScenarioBest TraversalReason
Find a file you know is deeply nestedDFSReaches deep nodes quickly
Find the shallowest matchBFSVisits level by level
Calculate total tree sizeEitherBoth visit every node
List directory contents at depth NBFSNaturally tracks depth
Memory-constrained environmentDFSO(depth) vs O(width) space

Both DFS and BFS have the same time complexity: both visit every node exactly once, giving O(n). The difference is in space: DFS uses O(d) where d is tree depth, while BFS uses O(w) where w is the maximum width. In a file system with many files per directory but moderate depth, DFS is typically more memory-efficient.

Practice

Build a complete file system and a search engine that traverses it.

Loading problem...

Loading editor...

Loading problem...

Loading editor...