Assumptions
Storage Required for Web Pages:
Assume each web page holds 100 KB of data on average and that 1 billion new pages are stored each month (the page count used below). Then
Storage = Number of Pages per Month * Average Page Size * 12 months
Storage = 1 billion * 100 KB * 12
Storage = 1,000,000,000 * 100,000 bytes * 12 = 1.2 * 10^15 bytes = 1.2 PB
So we would need 1.2 PB of storage for web pages for 1 year.
Storage Required for Media Files:
Assume each web page has 10 media files and each media file is 1 MB.
Storage = Number of Pages per Month * Media File Size * 10 files * 12 months
Storage = 1 billion * 1 MB * 10 * 12
Storage = 1,000,000,000 * 1,000,000 bytes * 10 * 12 = 1.2 * 10^17 bytes = 120 PB
So we would need 120 PB of storage for media files for 1 year.
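The two estimates above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the function and constant names are hypothetical, decimal (SI) units are assumed (1 KB = 10^3 bytes, 1 PB = 10^15 bytes), and the 1 billion figure is treated as pages per month because the formulas multiply by 12 months.

```python
# Decimal (SI) byte units, matching the arithmetic in the estimates above.
KB = 10**3
MB = 10**6
PB = 10**15

# Inputs taken from the estimates; the names are illustrative only.
PAGES_PER_MONTH = 1_000_000_000   # 1 billion pages, assumed to be a monthly figure
PAGE_SIZE = 100 * KB              # average page size
MEDIA_FILES_PER_PAGE = 10
MEDIA_FILE_SIZE = 1 * MB


def yearly_storage_bytes(pages_per_month, bytes_per_page, months=12):
    """Bytes accumulated over `months` at a constant monthly page volume."""
    return pages_per_month * bytes_per_page * months


page_storage = yearly_storage_bytes(PAGES_PER_MONTH, PAGE_SIZE)
media_storage = yearly_storage_bytes(
    PAGES_PER_MONTH, MEDIA_FILES_PER_PAGE * MEDIA_FILE_SIZE
)
total_storage = page_storage + media_storage

print(f"Web pages:   {page_storage / PB} PB")
print(f"Media files: {media_storage / PB} PB")
print(f"Total:       {total_storage / PB} PB")
```

Media files dominate the total by two orders of magnitude, so any change to the media assumptions (files per page or file size) moves the estimate far more than a change to the page size does.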
Below is a list of the APIs required for the system. It may not be exhaustive, but it provides a good starting point.
Defining the system data model early on will clarify how data flows among the different components of the system. You could also draw an ER diagram using the diagramming tool to enhance your design...
You should identify enough components to solve the actual problem from end to end. Also remember to draw a block diagram using the diagramming tool to augment your design...
Explain how the request flows from end to end in your high-level design. You could also draw a sequence diagram using the diagramming tool to enhance your explanation...
Dig deeper into 2-3 components and explain in detail how they work. For example, how well does each component scale? Is there a relevant algorithm or data structure you would like to use for a component? You could also draw a diagram using the diagramming tool to enhance your design...
Explain any trade-offs you have made and why you made certain tech choices...
Try to discuss as many failure scenarios/bottlenecks as possible.
What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?