/shorten is called with POST, with the long url sent in the request body, to compute the shortened version of a long url. The service first checks the cache to see if the computation already exists; if so it returns the cached value, and if not it saves the result in the db as: id, long url, short url, and then caches it. Base62 encoding or a hash function could be used to generate the short url.
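The /shorten flow above can be sketched roughly as below. This is only an in-memory illustration: plain dicts stand in for the cache and the db, and the names `Shortener`, `base62_encode` and `ALPHABET` are made up for the example. It assumes the base62 option, applied to an auto-incrementing id.

```python
import string

# 62 symbols: 0-9, a-z, A-Z (the ordering is an arbitrary choice for this sketch).
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def base62_encode(n: int) -> str:
    """Encode a non-negative integer as a base62 string."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


class Shortener:
    """In-memory stand-in for the shortener service (hypothetical)."""

    def __init__(self) -> None:
        self.cache: dict[str, str] = {}            # long url -> short url
        self.db: dict[int, tuple[str, str]] = {}   # id -> (long url, short url)
        self.next_id = 1

    def shorten(self, long_url: str) -> str:
        # 1. Check the cache first.
        cached = self.cache.get(long_url)
        if cached is not None:
            return cached
        # 2. Not cached: save (id, long url, short url) in the db.
        row_id = self.next_id
        self.next_id += 1
        short = base62_encode(row_id)
        self.db[row_id] = (long_url, short)
        # 3. Cache the result for the next call.
        self.cache[long_url] = short
        return short
```

Calling `shorten` twice with the same long url returns the same short value and writes to the db only once. Once cache entries can expire, a real service would probably also need to look the long url up in the db before inserting a new row.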
/redirect/{value} is called to fetch the long url from the read-only db, which is a copy of the actual database maintained through master-slave replication, and it then redirects the user to the retrieved long url.
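A minimal sketch of the /redirect/{value} lookup, assuming the read-only copy can be modelled as a dict from short value to long url. The `READ_REPLICA` contents and the `redirect` function are hypothetical. The 302 status is also an assumption, since the design does not say which redirect code to use.

```python
from typing import Optional

# Hypothetical contents of the read-only copy: short value -> long url.
READ_REPLICA = {"1": "https://example.com/a"}


def redirect(value: str, replica: dict[str, str] = READ_REPLICA) -> tuple[int, dict[str, str]]:
    """Return (status code, headers) for GET /redirect/{value}."""
    long_url: Optional[str] = replica.get(value)
    if long_url is None:
        # Unknown short value: nothing to redirect to.
        return 404, {}
    # Assumed choice: a temporary redirect carrying the long url in Location.
    return 302, {"Location": long_url}
```

A 301 would let browsers cache the redirect and skip the service on repeat visits, while a 302 keeps every hit going through the lookup service; which one fits depends on whether we want to count clicks.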
The client sends its requests to a load balancer, which is needed to spread the load, since as you mentioned there will be millions of requests a day.
After this, the load balancer forwards the request to one of the servers. Multiple servers are needed to handle a large number of requests, and they can even be distributed geographically in order to decrease latency as much as possible.
The request then reaches the API Gateway, which either declines the call because the rate limit was exceeded, or calls the write service (in this case called the shortener service) or the lookup service, which is read-only. We separated these 2 services to decouple writes and reads as much as possible. The shortener service then checks the cache, for example Redis, with cached entries invalidated every hour, to see if the request was already computed; if so it returns the cached result, and if not it saves it in the db and then caches it. To handle collisions when a hash function is used, if a collision does appear then we could append an extra predefined string to the original url and reapply the hash function.
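The gateway step could look roughly like this. The sketch assumes a fixed-window counter per client, which is just one of several possible rate limiting schemes, and the names `FixedWindowRateLimiter` and `handle` are invented for the example. The returned strings only mark which service the call would be routed to.

```python
import time


class FixedWindowRateLimiter:
    """Allow at most `limit` calls per client in each time window (assumed scheme)."""

    def __init__(self, limit: int, window_seconds: float, clock=time.monotonic) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the behaviour can be checked without sleeping
        self.counters: dict[str, tuple[int, int]] = {}  # client -> (window index, count)

    def allow(self, client_id: str) -> bool:
        window = int(self.clock() // self.window_seconds)
        seen_window, count = self.counters.get(client_id, (window, 0))
        if seen_window != window:
            count = 0  # a new window started, so the counter resets
        if count >= self.limit:
            self.counters[client_id] = (window, count)
            return False
        self.counters[client_id] = (window, count + 1)
        return True


def handle(limiter: FixedWindowRateLimiter, client_id: str, method: str, path: str) -> tuple[int, str]:
    """Decline the call or say which service it would be routed to."""
    if not limiter.allow(client_id):
        return 429, "rate limit exceeded"
    if method == "POST" and path == "/shorten":
        return 200, "shortener service"
    if method == "GET" and path.startswith("/redirect/"):
        return 200, "lookup service"
    return 404, "unknown route"
```

With `limit=2`, the third call from the same client inside one window gets a 429, while other clients are unaffected.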
The lookup service uses a copy of the master db, or several copies to reduce latency; it fetches the long url from there and then redirects the user to it.
For fetching the long urls based on the short ones, we could use a master-slave setup for the db: this increases read speed, and if needed we can always scale it up by adding more slaves. As a trade-off, write operations will be slower, and all the slaves will need to be kept in sync with the master db.
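To make the trade-off concrete, here is a toy model of the master-slave split. It assumes synchronous replication and round-robin reads, neither of which is fixed by the design, and `ReplicatedStore` is a made-up name with dicts standing in for the databases.

```python
import itertools
from typing import Optional


class ReplicatedStore:
    """Toy master-slave store: writes go to the master, reads go to the slaves."""

    def __init__(self, replica_count: int) -> None:
        if replica_count < 1:
            raise ValueError("at least one replica is needed for reads")
        self.master: dict[str, str] = {}
        self.replicas: list[dict[str, str]] = [{} for _ in range(replica_count)]
        self._next_replica = itertools.cycle(range(replica_count))

    def write(self, short_value: str, long_url: str) -> None:
        self.master[short_value] = long_url
        # Assumed synchronous replication: the write only finishes once every
        # slave has the row, which is where the slower writes come from.
        for replica in self.replicas:
            replica[short_value] = long_url

    def read(self, short_value: str) -> Optional[str]:
        # Spread reads across the slaves in round-robin order.
        replica = self.replicas[next(self._next_replica)]
        return replica.get(short_value)
```

With asynchronous replication the write would return sooner, but a read arriving right after it could hit a slave that does not have the row yet.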
For the short url generation we could use a hash function, and in case of a collision we could append a predefined string to the original long url and then hash again. Another option here would be base62 encoding. We could also decouple this step by using a message queue: the request is published there as an event, which is later consumed and responded to.
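The hash-and-retry idea can be sketched as follows. Everything specific here is an assumption made for illustration: SHA-256 as the hash function, a 7-character short value, the `SALT` string, and a dict in place of the db.

```python
import hashlib
import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
SALT = "#retry"   # the "predefined string"; this particular value is arbitrary
CODE_LENGTH = 7   # assumed length of the short value


def hash_code(text: str, length: int = CODE_LENGTH) -> str:
    """Derive a fixed-length base62 string from the SHA-256 digest of text."""
    n = int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest(), "big")
    chars = []
    for _ in range(length):
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(chars)


def shorten(long_url: str, store: dict[str, str], hasher=hash_code, max_attempts: int = 10) -> str:
    """Return a short value for long_url, re-hashing with SALT appended on collision."""
    candidate = long_url
    for _ in range(max_attempts):
        code = hasher(candidate)
        existing = store.get(code)
        if existing is None:
            store[code] = long_url  # free slot: claim it
            return code
        if existing == long_url:
            return code             # same url was shortened before
        candidate += SALT           # collision with another url: salt and retry
    raise RuntimeError("could not find a free short value")
```

Because the retry always appends the same string, shortening the same long url again walks through the same candidates and ends up on the same short value.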