Interview Question Data structure for a large social network

Social Networks

Data Structures

Interview Questions

Algorithm Design

Technical Interviews

Interview Question Data structure for a large social network

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Start Practicing Learn More

Introduction

For an interview question about a large social network, the core idea is usually to model users and relationships as a graph. The right answer is not just "use a graph" though; you also need to explain which graph representation fits the expected queries, how it scales, and what tradeoffs appear when the network becomes too large for one machine.

Start with the Graph Model

A social network is naturally a graph:

each user is a node
each friendship or follow relationship is an edge

If the network represents mutual friendship, the graph is usually undirected. If it represents followers, the graph is directed.

For most interview settings, the best in-memory representation is an adjacency list backed by hash-based sets. That gives efficient traversal of a user's direct neighbors without wasting memory on a massive mostly-empty matrix.

An adjacency matrix is almost never the right answer for a truly large network because the memory cost grows with n squared, not with the actual number of relationships.

A Simple Adjacency-List Example

This Python example models an undirected friendship graph.

python

1from collections import defaultdict
2
3class SocialGraph:
4    def __init__(self):
5        self.friends = defaultdict(set)
6
7    def add_user(self, user_id: int) -> None:
8        self.friends[user_id]
9
10    def add_friendship(self, a: int, b: int) -> None:
11        self.friends[a].add(b)
12        self.friends[b].add(a)
13
14    def get_friends(self, user_id: int):
15        return self.friends[user_id]
16
17    def mutual_friends(self, a: int, b: int):
18        return self.friends[a] & self.friends[b]
19
20
21graph = SocialGraph()
22graph.add_friendship(1, 2)
23graph.add_friendship(1, 3)
24graph.add_friendship(2, 3)
25graph.add_friendship(2, 4)
26
27print(graph.get_friends(2))
28print(graph.mutual_friends(1, 2))

This structure is a good interview answer because it matches common operations:

list a user's friends
check whether two users are connected directly
find mutual friends for recommendations

Talk About Query Patterns, Not Only Storage

A strong answer explains why the structure supports the expected workload.

For example:

friend lookup is efficient because you can access one user's adjacency set directly
mutual-friend queries can use set intersection
breadth-first search can answer shortest-connection questions such as "degrees of separation"

That matters more than repeating generic graph vocabulary. Interviewers are usually testing whether you can connect the data structure to realistic product features.

What Changes at Real Scale

Once the graph is extremely large, the problem stops being only about the local data structure. Distribution becomes the real challenge.

At that point you may need:

sharding users across machines
replicating hot accounts or hot edges
caching popular adjacency lists
asynchronous pipelines for recommendation precomputation

A good interview answer often says something like: "I would still model the data as an adjacency list, but I would partition the graph by user ID or another strategy and accept that cross-shard graph traversal is more expensive."

That shows you understand the difference between algorithm design and systems design.

Directed Versus Undirected Networks

Do not assume every social graph is symmetric.

If the question sounds like Facebook-style friendship, undirected edges make sense.

If it sounds like Twitter-style following, the graph is directed and it can be helpful to keep both outgoing and incoming adjacency lists, depending on the queries you need to answer.

For example, "who do I follow" and "who follows me" are different lookups in a directed graph.

What a Good Interview Answer Sounds Like

A concise but strong answer is usually:

model the network as a graph
store it as an adjacency list for memory efficiency
use hash sets for fast edge checks and mutual-friend queries
use BFS for shortest path or separation queries
discuss sharding and caching once scale exceeds one machine

That is better than jumping immediately to a specific database product name. Database choice matters, but the interviewer usually wants the underlying model first.

Common Pitfalls

Saying "use an adjacency matrix" for a billion-user network is the classic bad answer because the memory cost is unrealistic.

Ignoring whether edges are directed or undirected also weakens the design.

Another mistake is focusing only on storage and forgetting the target queries. A data structure answer is only convincing when it matches actual operations.

Finally, if the question mentions large scale, do not stop at the single-machine representation. Show that you know distribution and caching become necessary.

Summary

a large social network is best modeled as a graph
an adjacency list is usually the right core representation for sparse real-world networks
hash-based neighbor sets support fast friend lookup and mutual-friend queries
BFS handles path and separation problems well on top of that graph
at very large scale, partitioning, caching, and replication matter as much as the local data structure itself