User Request -> decide what we are searching for, what the download heuristics are, and the definition of done
User Response -> what do we return to the user: a PDF, JSON, etc.?
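A minimal sketch of how the request and response choices above could be captured in one object. All names here (CrawlRequest, ResponseFormat, media_types, max_items) are hypothetical, and treating "definition of done" as an item cap is an assumption; the notes do not fix a stopping rule.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class ResponseFormat(Enum):
    # Candidate answers to "what do we return to the user?"
    ZIP = "zip"
    PDF = "pdf"
    JSON = "json"


@dataclass
class CrawlRequest:
    query: str  # what we are searching for
    # Download heuristic (assumed): which kinds of content to keep.
    media_types: List[str] = field(default_factory=lambda: ["text"])
    # Definition of done (assumed): stop after this many items.
    max_items: int = 50
    response_format: ResponseFormat = ResponseFormat.ZIP

    def is_done(self, items_collected: int) -> bool:
        """Return True once the assumed item cap has been reached."""
        return items_collected >= self.max_items
```

Other definitions of done (a time budget, a minimum relevance score) would fit the same shape by swapping the `is_done` check.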
Bot:
Needs to be able to search the web
Grab text, images, video, audio, etc.
Download and store all of this information
Return everything in a single zip file
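The final step, returning everything in one zip file, might look like the sketch below. It assumes the downloaded content is already available as bytes keyed by an archive path; bundle_as_zip is a hypothetical helper name, and a real bot would likely stream from blob storage rather than hold everything in memory.

```python
import io
import zipfile
from typing import Dict


def bundle_as_zip(files: Dict[str, bytes]) -> bytes:
    """Pack downloaded files (archive path -> raw bytes) into one in-memory zip."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, payload in files.items():
            archive.writestr(name, payload)
    # The archive is finalized when the with-block exits, so the bytes are complete here.
    return buffer.getvalue()
```

Grouping paths by media type (for example `text/page.txt`, `image/logo.png`) keeps the archive easy to browse.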
Performance: results are not necessarily needed immediately
Availability: only a certain number of bots; requests queue for bot usage
Security: Authorization for use
Data Types: everything, mutable
Scale: each bot can scale horizontally
Data Storage: blob storage for the downloaded data
Database: NoSQL for bot crawled information
SQL: for users, authentication, etc
Service: the bot!
Queueing for bot usage
Load balancing: for multiple bots
API Gateway: for user authorization
Caching: probably not needed at scale, especially for requests; may be used for specific bots
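The "queue for bot usage" idea can be illustrated with a small in-process sketch: a fixed number of bot workers pull from a shared queue, and any extra requests wait their turn. This assumes threads and Python's standard queue module as a stand-in; a deployed system would more likely use an external message queue, and run_bot_pool and handle are hypothetical names.

```python
import queue
import threading


def run_bot_pool(requests, handle, num_bots=2):
    """Process (request_id, payload) pairs with a fixed number of bot workers."""
    pending = queue.Queue()
    results = {}
    lock = threading.Lock()

    def bot_worker():
        while True:
            item = pending.get()
            if item is None:  # sentinel: no more work for this bot
                return
            request_id, payload = item
            outcome = handle(payload)
            with lock:  # guard the shared results dict
                results[request_id] = outcome

    workers = [threading.Thread(target=bot_worker) for _ in range(num_bots)]
    for worker in workers:
        worker.start()
    for request_id, payload in requests:
        pending.put((request_id, payload))
    for _ in workers:
        pending.put(None)  # one sentinel per bot
    for worker in workers:
        worker.join()
    return results
```

Because results are not needed immediately, the queue can absorb bursts while the number of bots stays fixed; scaling horizontally then amounts to raising `num_bots` or adding more worker processes.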
Web Crawler Itself:
Given some user input: "give me a cheaper option for X"
Predefined list of URLs, websites, and search terms to check
For each individual website, look for specific search terms (semantic search)
Rate what is found on a specific website...
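The rating step could be sketched as follows. Note the swap: the notes call for semantic search, but this example uses plain case-insensitive keyword matching as a simple stand-in, and the scoring rule (fraction of search terms found) is an assumption. rate_page and rank_pages are hypothetical names, and the pages are assumed to be already downloaded as text.

```python
from typing import Dict, List, Tuple


def rate_page(text: str, search_terms: List[str]) -> float:
    """Score a page as the fraction of search terms that appear in its text."""
    if not search_terms:
        return 0.0
    lowered = text.lower()
    hits = sum(1 for term in search_terms if term.lower() in lowered)
    return hits / len(search_terms)


def rank_pages(pages: Dict[str, str], search_terms: List[str]) -> List[Tuple[str, float]]:
    """Return (url, score) pairs, best match first."""
    scored = [(url, rate_page(text, search_terms)) for url, text in pages.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Replacing `rate_page` with an embedding-based similarity score would move this closer to the semantic search the notes describe without changing `rank_pages`.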
Defining the system data model early on will clarify how data will flow among different components of the system. Also you could draw an ER diagram using the diagramming tool to enhance your design...
You should identify enough components that are needed to solve the actual problem from end to end. Also remember to draw a block diagram using the diagramming tool to augment your design. If you are unfamiliar with the tool, you can simply describe your design to the chat bot and ask it to generate a starter diagram for you to modify...
Explain how the request flows from end to end in your high level design. Also you could draw a sequence diagram using the diagramming tool to enhance your explanation...
Dig deeper into 2-3 components and explain in detail how they work. For example, how well does each component scale? Is there a relevant algorithm or data structure you would like to use for a component? Also you could draw a diagram using the diagramming tool to enhance your design...
Explain any trade-offs you have made and why you made certain tech choices...
Try to discuss as many failure scenarios/bottlenecks as possible.
What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?