Design Azure Blob Storage
by cascade_infinity429
Microsoft loves storage system design questions, especially at the staff level.
I broke it into layers: API gateway, metadata service, storage engine, replication manager.
The metadata service maps blob names to physical locations. I used a partitioned B-tree index, with the partition key derived from the storage account and container name. Since every blob in a container lands in the same partition and the keys are kept in sorted order, listing the blobs in a container becomes a prefix scan.
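Here is a rough sketch of that metadata layout, not what Azure actually runs. A sorted list with `bisect` stands in for the B-tree, and the names `NUM_PARTITIONS`, `partition_for`, `MetadataPartition`, and `MetadataService` are all made up for illustration, as is the location string format.

```python
import bisect
import hashlib

NUM_PARTITIONS = 16  # assumed partition count, purely illustrative


def partition_for(account: str, container: str) -> int:
    """Derive the partition from account + container, so one container
    always lives in a single partition."""
    digest = hashlib.sha256(f"{account}/{container}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


class MetadataPartition:
    """Sorted keys stand in for a B-tree: ordered, so prefix scans work."""

    def __init__(self):
        self._keys = []        # sorted list of (account, container, blob_name)
        self._locations = {}   # key -> physical location string

    def put(self, key, location):
        if key not in self._locations:
            bisect.insort(self._keys, key)
        self._locations[key] = location

    def get(self, key):
        return self._locations.get(key)

    def scan(self, account, container, prefix):
        # Jump to the first key >= (account, container, prefix), then walk
        # forward until the keys stop matching.
        i = bisect.bisect_left(self._keys, (account, container, prefix))
        names = []
        while i < len(self._keys):
            a, c, name = self._keys[i]
            if (a, c) != (account, container) or not name.startswith(prefix):
                break
            names.append(name)
            i += 1
        return names


class MetadataService:
    def __init__(self):
        self._partitions = [MetadataPartition() for _ in range(NUM_PARTITIONS)]

    def _partition(self, account, container):
        return self._partitions[partition_for(account, container)]

    def put(self, account, container, blob, location):
        self._partition(account, container).put((account, container, blob), location)

    def get(self, account, container, blob):
        return self._partition(account, container).get((account, container, blob))

    def list_blobs(self, account, container, prefix=""):
        return self._partition(account, container).scan(account, container, prefix)
```

The point of hashing only the account and container (not the blob name) is that a listing never has to fan out across partitions. The trade-off is that a very large or very hot container becomes a hot partition, which is worth raising with the interviewer.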
For the storage engine, I discussed append-only, log-structured storage similar to how Azure actually does it. Small blobs (< 256 KB) are packed together into shared extent files, while large blobs are split into 4 MB chunks, each stored as its own extent.
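A minimal sketch of that placement rule, using the thresholds from the discussion. The function name `plan_placement`, the `PackedExtent` class, and the tuple layout are my own illustration, not a real Azure interface. Blobs between 256 KB and 4 MB are treated here as a single chunk, which is an assumption.

```python
SMALL_BLOB_LIMIT = 256 * 1024      # blobs below this are packed together
CHUNK_SIZE = 4 * 1024 * 1024       # large blobs are cut into 4 MB chunks


def plan_placement(size: int):
    """Return a list of (kind, offset_in_blob, length) pieces for a blob."""
    if size < SMALL_BLOB_LIMIT:
        return [("packed", 0, size)]
    return [
        ("chunk", offset, min(CHUNK_SIZE, size - offset))
        for offset in range(0, size, CHUNK_SIZE)
    ]


class PackedExtent:
    """Append-only extent holding many small blobs back to back."""

    def __init__(self):
        self._buf = bytearray()

    def append(self, data: bytes):
        # Data is never overwritten; the metadata service would store the
        # returned (offset, length) as the blob's physical location.
        offset = len(self._buf)
        self._buf += data
        return offset, len(data)

    def read(self, offset: int, length: int) -> bytes:
        return bytes(self._buf[offset:offset + length])
```

For example, a 10 MB blob comes out as two 4 MB chunks plus one 2 MB chunk, and a 1 KB blob is a single packed record.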
Replication was the meat of the discussion. I talked about synchronous replication within a storage stamp (3 copies) and asynchronous geo-replication across regions. The interviewer asked about consistency guarantees, and my answer was strong consistency for read-after-write within a region and eventual consistency for geo-replicated reads.
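To make the two consistency levels concrete, here is a toy in-memory model. `Replica`, `Stamp`, and `replicate_pending` are hypothetical names, and a real system would deal with failures, ordering, and quorum details that this skips entirely.

```python
from collections import deque


class Replica:
    def __init__(self):
        self.data = {}

    def write(self, key, value):
        self.data[key] = value

    def read(self, key):
        return self.data.get(key)


class Stamp:
    """Primary-region stamp: writes go to all local replicas synchronously,
    then are queued for asynchronous shipping to another region."""

    def __init__(self, replica_count=3):
        self.replicas = [Replica() for _ in range(replica_count)]
        self.geo_queue = deque()

    def put(self, key, value):
        # Synchronous part: the write is only acknowledged once every
        # in-stamp replica has it, so any local replica can serve the read.
        for replica in self.replicas:
            replica.write(key, value)
        # Asynchronous part: the remote region learns about it later.
        self.geo_queue.append((key, value))

    def get(self, key, replica_index=0):
        return self.replicas[replica_index].read(key)

    def replicate_pending(self, secondary: Replica) -> int:
        """Background job: drain queued writes to the secondary region."""
        applied = 0
        while self.geo_queue:
            key, value = self.geo_queue.popleft()
            secondary.write(key, value)
            applied += 1
        return applied
```

Right after `put`, every replica in the stamp returns the new value (read-after-write), while the secondary region still returns the old state until `replicate_pending` runs. That gap is the eventual-consistency window for geo-replicated reads.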
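Garbage collection for deleted blobs was a good topic. Because the storage is append-only, a delete cannot free space in place, so I used a tombstone approach with a background compaction process that rewrites extents without the dead records.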
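A small sketch of tombstones plus compaction on an append-only log. The `LogExtent` class is hypothetical, and using `None` as the tombstone marker is just a convenience for the example.

```python
TOMBSTONE = None  # a record whose data is None marks a deletion


class LogExtent:
    def __init__(self):
        self.records = []  # append-only list of (blob_name, data_or_tombstone)

    def put(self, name: str, data: bytes):
        self.records.append((name, data))

    def delete(self, name: str):
        # Deleting never rewrites old records; it appends a tombstone.
        self.records.append((name, TOMBSTONE))

    def get(self, name: str):
        # The newest record for a name wins, so scan from the tail.
        for record_name, data in reversed(self.records):
            if record_name == name:
                return data  # None if the newest record is a tombstone
        return None

    def compact(self) -> int:
        """Rewrite the log with only the latest live version of each blob.
        Returns how many records were reclaimed."""
        latest = {}
        for name, data in self.records:
            latest[name] = data
        live = [(name, data) for name, data in latest.items() if data is not TOMBSTONE]
        reclaimed = len(self.records) - len(live)
        self.records = live
        return reclaimed
```

In a real engine the compactor would write a new extent and then swap the metadata pointers over, rather than mutating the old one in place as this does.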
The follow-up about performance tiers (hot, cool, archive) was interesting. The archive tier moves data to cheaper storage with higher retrieval latency. We discussed the rehydration process and its SLA implications.
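One way to picture rehydration is as a small state machine: an archived blob cannot be read until a background job moves it back to an online tier. The sketch below is my own simplification, with hypothetical names (`TieredBlob`, `BlobOffline`, `complete_rehydration`), and it ignores timing, priority, and billing.

```python
from enum import Enum


class Tier(Enum):
    HOT = "hot"
    COOL = "cool"
    ARCHIVE = "archive"


class BlobOffline(Exception):
    """Raised when reading a blob that is still in the archive tier."""


class TieredBlob:
    def __init__(self, data: bytes, tier: Tier = Tier.HOT):
        self.data = data
        self.tier = tier
        self.rehydrating_to = None  # target tier while rehydration is pending

    def read(self) -> bytes:
        if self.tier is Tier.ARCHIVE:
            raise BlobOffline("blob is archived; rehydrate it first")
        return self.data

    def set_tier(self, target: Tier):
        if self.tier is Tier.ARCHIVE and target is not Tier.ARCHIVE:
            # Leaving archive is not instant: record the request and let a
            # background job finish it later.
            self.rehydrating_to = target
        else:
            self.tier = target
            self.rehydrating_to = None

    def complete_rehydration(self):
        """Called by the (assumed) background job once data is back online."""
        if self.rehydrating_to is not None:
            self.tier = self.rehydrating_to
            self.rehydrating_to = None
```

The SLA angle follows from the gap between `set_tier` and `complete_rehydration`: reads fail during that window, so callers need to poll or be notified rather than block.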
The interview was 75 minutes and we covered everything from API design to disk layout. Very thorough process.