Add scheme to URL if needed
Introduction
Applications often receive hostnames or partial URLs from users, config files, or databases. If the value is missing a scheme such as https, you may need to add one before making a request or rendering a clickable link, but you need to do it carefully so you do not corrupt values that already have a valid scheme.
What Counts as a Missing Scheme
A URL scheme is the part before the first colon in values like https://example.com, ftp://files.example.com, or mailto:user@example.com. The tricky part is that not every string without :// is necessarily missing a scheme, and not every string with a colon is an HTTP-style URL.
For example:
- example.com is missing a scheme.
- https://example.com already has one.
- mailto:user@example.com already has one, even though it has no //.
- //cdn.example.com/app.js is protocol-relative and should usually be handled separately.
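That means a simple if not url.startswith("http") check is often too weak: it misses every non-HTTP scheme, and it wrongly treats a bare hostname such as httpbin.org as if it already had a scheme.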
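Running the example values through urllib.parse.urlsplit shows how a parser sees them. The extra value httpbin.org is added here purely for illustration: it starts with "http" but has no scheme at all.

```python
from urllib.parse import urlsplit

examples = [
    "example.com",               # no scheme: the whole value is parsed as a path
    "https://example.com",       # scheme "https", netloc "example.com"
    "mailto:user@example.com",   # scheme "mailto", no "//" and therefore no netloc
    "//cdn.example.com/app.js",  # no scheme, but "//" makes cdn.example.com the netloc
    "httpbin.org",               # starts with "http", yet urlsplit finds no scheme
]

# Keep (scheme, netloc, path) for each value; SplitResult is a named tuple.
classified = {value: tuple(urlsplit(value))[:3] for value in examples}

for value, (scheme, netloc, path) in classified.items():
    print(f"{value:26} scheme={scheme!r:9} netloc={netloc!r:19} path={path!r}")
```

The protocol-relative value is the interesting one: it has an empty scheme like example.com, but unlike example.com its host already sits in the netloc, which is why it needs different treatment.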
A Safe Python Implementation
Python’s urllib.parse module is a good fit because its parser recognizes any scheme, not just HTTP-based ones.
A safe helper built on urllib.parse should do four things:
- trims whitespace
- rejects empty input
- preserves URLs that already have any scheme
- upgrades protocol-relative values into a full URL using a default scheme
That is a much safer baseline than checking for http:// and https:// only.
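Here is a minimal sketch of such a helper, assuming Python 3 and only the standard library. The name add_scheme_if_needed and its default_scheme parameter are illustrative, not a standard API. The host:port guard is an extra assumption: without it, urlsplit on current Python versions reports a value such as localhost:8080 as having the scheme "localhost".

```python
import re
from urllib.parse import urlsplit

# Assumption: a value like "localhost:8080" (digits right after the first colon)
# is a host:port pair, not a URL whose scheme is "localhost".
_HOST_PORT = re.compile(r"^[^:/?#]+:\d+(?:[/?#]|$)")


def add_scheme_if_needed(url: str, default_scheme: str = "https") -> str:
    """Return url unchanged if it already has a scheme, else prepend default_scheme."""
    url = url.strip()  # copied input often carries stray spaces or newlines
    if not url:
        raise ValueError("URL is empty")

    if url.startswith("//"):
        # Protocol-relative ("//host/path"): only the scheme and colon are missing.
        return f"{default_scheme}:{url}"

    if _HOST_PORT.match(url) or not urlsplit(url).scheme:
        # No scheme at all (or a bare host:port), so add the scheme and "//".
        return f"{default_scheme}://{url}"

    # Any existing scheme is preserved: https:, ftp:, mailto:, and so on.
    return url
```

Because the decision comes from urlsplit rather than a list of known prefixes, ftp://files.example.com and mailto:user@example.com pass through unchanged. The host:port guard is a judgment call: it would also prefix a value such as tel:5551234, so drop or narrow it if your inputs can contain URIs like that.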
Why String Prefix Checks Fail
A naive version checks only for the http:// and https:// prefixes and prepends https:// to everything else.
This breaks on non-HTTP schemes such as ftp: and mailto:. It also ignores protocol-relative URLs and can produce invalid results if the input has leading spaces.
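To make those failures concrete, here is a hypothetical naive_add_scheme helper of the kind just described, together with the values it produces for a few inputs.

```python
def naive_add_scheme(url: str) -> str:
    # Only the two HTTP prefixes are recognized; everything else gets "https://".
    if url.startswith(("http://", "https://")):
        return url
    return "https://" + url


# Each of these inputs is mangled by the prefix check.
print(naive_add_scheme("ftp://files.example.com"))  # https://ftp://files.example.com
print(naive_add_scheme("mailto:user@example.com"))  # https://mailto:user@example.com
print(naive_add_scheme("//example.com"))            # https:////example.com
print(naive_add_scheme(" https://example.com"))     # https:// https://example.com
```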
Prefix checks are acceptable in very constrained code where you control every input format, but for reusable utilities, parser-based checks are more reliable.
JavaScript Version
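If you need similar behavior in browser or Node.js code, you can use the built-in URL constructor, which throws a TypeError for input it cannot parse as an absolute URL, with a fallback that prepends a default scheme.
The scheme check itself can use a regular expression that matches a valid URI scheme at the start of the string: a letter followed by letters, digits, +, -, or . and then a colon. That avoids hard-coding only HTTP-based schemes.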
Decide on the Default Scheme Explicitly
Most modern applications should default to https rather than http. That is the right default for browser navigation, API endpoints, and most public web services.
Still, do not hide that choice. Make the default scheme a parameter or configuration value if the application may target internal services, local development servers, or non-HTTP protocols.
If the system only accepts web URLs, it can also be reasonable to validate the result after normalization and reject anything that does not use http or https.
Validation and Normalization Are Separate Concerns
Adding a missing scheme does not guarantee the final value is a valid or safe URL. Prepending https:// to not a host produces https://not a host, which a parser will split without complaint but which is still not a usable network target. In production code, normalize first, then validate.
For example, you might:
- add a default scheme
- parse the result
- verify that a hostname exists
- restrict allowed schemes to http and https
- reject private or local addresses if security rules require it
Keeping normalization separate from validation makes the utility easier to test and reason about.
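A minimal sketch of that validation step follows, assuming the input has already been normalized and that only public http and https URLs are acceptable. The function name validate_web_url, the allow_private flag, and the hostname rule (ASCII labels of letters, digits, and inner hyphens) are illustrative assumptions. It only inspects the literal host; a hostname that resolves to a private address through DNS needs a separate check at connection time.

```python
import ipaddress
import re
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}
# Assumption: one DNS label of ASCII letters, digits, and inner hyphens.
_LABEL = re.compile(r"^(?!-)[a-z0-9-]{1,63}(?<!-)$")


def validate_web_url(url: str, allow_private: bool = False) -> str:
    """Return url if it passes the checks below, otherwise raise ValueError."""
    parts = urlsplit(url)

    if parts.scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"scheme not allowed: {parts.scheme!r}")

    host = parts.hostname  # lowercased, without port, userinfo, or IPv6 brackets
    if not host:
        raise ValueError("missing hostname")
    _ = parts.port  # accessing .port raises ValueError for a bad or out-of-range port

    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        ip = None  # not an IP literal, so check it as a hostname instead

    if ip is None:
        labels = host.rstrip(".").split(".")
        if not all(_LABEL.match(label) for label in labels):
            raise ValueError(f"invalid hostname: {host!r}")
        if not allow_private and host.rstrip(".") == "localhost":
            raise ValueError("local hostname not allowed")
    elif not allow_private and (ip.is_private or ip.is_loopback or ip.is_link_local):
        raise ValueError(f"private or local address not allowed: {ip}")

    return url
```

In a full pipeline, the normalization step runs first and its output is passed to this function, so each step can be tested on its own.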
Common Pitfalls
The most common mistake is checking only for http:// and https://, which breaks existing schemes such as mailto: or ftp:.
Another issue is blindly prepending https:// to protocol-relative URLs like //example.com, producing malformed values such as https:////example.com.
Whitespace is also easy to overlook. A URL copied from user input may look valid but fail parsing until trimmed.
Finally, adding a scheme is not enough if the application also needs strict validation. Normalization and validation should be separate steps.
Summary
- Do not assume a missing http prefix is the only case that needs handling.
- Use a URL parser or a scheme-aware check instead of a simple string prefix test.
- Default to https unless your application has a clear reason not to.
- Treat protocol-relative URLs such as //example.com as a special case.
- Normalize first, then validate based on your application’s security and correctness rules.

