Add scheme to URL if needed

Introduction

Applications often receive hostnames or partial URLs from users, config files, or databases. If a value is missing a scheme such as https, you may need to add one before making a request or rendering a clickable link. Do it carefully, so that values which already carry a valid scheme are not corrupted.

What Counts as a Missing Scheme

A URL scheme is the part before the first colon in values like https://example.com, ftp://files.example.com, or mailto:user@example.com. The tricky part is that not every string without :// is missing a scheme, and not every string with a colon is an HTTP-style URL.

For example:

  • example.com is missing a scheme.
  • https://example.com already has one.
  • mailto:user@example.com already has one even though it has no //.
  • //cdn.example.com/app.js is protocol-relative and should usually be handled separately.

That means a simple if not url.startswith("http") check is often too weak.
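To see how a parser classifies these four cases, the sketch below runs Python's urllib.parse.urlsplit over them and reports the scheme and network location it finds. The helper name describe and the address user@example.com are illustrative assumptions, not part of any library.

```python
from urllib.parse import urlsplit


def describe(value: str) -> tuple[str, str]:
    """Return the (scheme, netloc) pair that urlsplit reports for value."""
    parts = urlsplit(value)
    return parts.scheme, parts.netloc


for sample in [
    "example.com",
    "https://example.com",
    "mailto:user@example.com",
    "//cdn.example.com/app.js",
]:
    print(sample, "->", describe(sample))
```

An empty scheme with a non-empty netloc is the signature of a protocol-relative value. An empty scheme with an empty netloc means the whole string was read as a path, which is what a bare hostname looks like to the parser.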

A Safe Python Implementation

Python’s urllib.parse module is a good fit because it splits off any syntactically valid scheme rather than recognizing only http and https.

```python
from urllib.parse import urlsplit


def add_scheme_if_needed(value: str, default_scheme: str = "https") -> str:
    value = value.strip()
    if not value:
        raise ValueError("URL cannot be empty")

    # Protocol-relative input ("//host/path") only lacks the scheme and colon.
    if value.startswith("//"):
        return f"{default_scheme}:{value}"

    # Any existing scheme (https, ftp, mailto, ...) is preserved as-is.
    parts = urlsplit(value)
    if parts.scheme:
        return value

    return f"{default_scheme}://{value}"


samples = [
    "example.com",               # -> https://example.com
    "https://example.com",       # unchanged
    "mailto:user@example.com",   # unchanged
    "//cdn.example.com/app.js",  # -> https://cdn.example.com/app.js
]

for sample in samples:
    print(add_scheme_if_needed(sample))
```

This function does four useful things:

  • trims whitespace
  • rejects empty input
  • preserves URLs that already have any scheme
  • upgrades protocol-relative values into a full URL using a default scheme

That is a much safer baseline than checking for http:// and https:// only.

Why String Prefix Checks Fail

A naive version often looks like this:

```python
def add_https(value: str) -> str:
    if value.startswith("http://") or value.startswith("https://"):
        return value
    return "https://" + value
```

This breaks on non-HTTP schemes such as ftp: and mailto:. It also ignores protocol-relative URLs and can produce invalid results if the input has leading spaces.

Prefix checks are acceptable in very constrained code where you control every input format, but for reusable utilities, parser-based checks are more reliable.
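To make the failure modes concrete, the sketch below feeds three awkward inputs to a prefix-only helper. The function is repeated so the example runs on its own, and the sample inputs (including user@example.com) are illustrative assumptions.

```python
def add_https(value: str) -> str:
    # Naive prefix check, repeated here so the example is self-contained.
    if value.startswith("http://") or value.startswith("https://"):
        return value
    return "https://" + value


# A non-HTTP scheme gets a second scheme glued on in front.
print(add_https("mailto:user@example.com"))   # https://mailto:user@example.com

# A protocol-relative value ends up with four slashes.
print(add_https("//cdn.example.com/app.js"))  # https:////cdn.example.com/app.js

# A leading space hides the existing scheme from startswith.
print(add_https(" https://example.com"))      # https:// https://example.com
```

None of these calls raises an error, which is what makes the bug easy to miss: the corrupted value only fails later, when something tries to use it.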

JavaScript Version

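If you need similar behavior in browser or Node.js code, a scheme-aware regular expression does the job without any dependencies. The built-in URL constructor is less suitable for this step because it throws on input that has no scheme, such as example.com.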

```javascript
function addSchemeIfNeeded(value, defaultScheme = "https") {
  const trimmed = value.trim();
  if (!trimmed) {
    throw new Error("URL cannot be empty");
  }

  // Protocol-relative input: only the scheme and colon are missing.
  if (trimmed.startsWith("//")) {
    return `${defaultScheme}:${trimmed}`;
  }

  // Any syntactically valid scheme (http:, mailto:, ftp:, ...) is left untouched.
  if (/^[a-zA-Z][a-zA-Z\d+.-]*:/.test(trimmed)) {
    return trimmed;
  }

  return `${defaultScheme}://${trimmed}`;
}

console.log(addSchemeIfNeeded("example.com")); // https://example.com
console.log(addSchemeIfNeeded("mailto:user@example.com")); // mailto:user@example.com
```

The regular expression checks whether the string begins with a syntactically valid URI scheme: a letter, followed by any mix of letters, digits, +, ., or -, and then a colon. That avoids hard-coding only HTTP-based schemes.
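One caveat is worth checking before relying on this pattern: input such as localhost:3000 also starts with letters followed by a colon, so the pattern reports a scheme and the value would be returned unchanged. The sketch below ports the pattern to Python and adds a host:port guard. The names SCHEME_RE, HOST_PORT_RE, and has_scheme are hypothetical, and the guard is only a heuristic: it reads anything shaped like name:digits as host and port, so it would also misread an opaque value such as tel:5551234. Adjust it to the inputs you actually expect.

```python
import re

# Same shape as the JavaScript check: a letter, then letters, digits,
# "+", ".", or "-", then a colon. re.ASCII keeps \d limited to 0-9.
SCHEME_RE = re.compile(r"^[a-zA-Z][a-zA-Z\d+.-]*:", re.ASCII)

# Heuristic (an assumption of this sketch): "name:digits" followed by the end
# of the string or by "/", "?", or "#" is treated as host:port, not scheme:rest.
HOST_PORT_RE = re.compile(r"^[^/?#:@]+:[0-9]+(?:[/?#]|$)")


def has_scheme(value: str) -> bool:
    """Return True if value starts with a scheme and is not a bare host:port."""
    if HOST_PORT_RE.match(value):
        return False
    return SCHEME_RE.match(value) is not None


for sample in ["https://example.com", "mailto:user@example.com",
               "localhost:3000", "example.com:8080/path", "example.com"]:
    print(sample, "->", has_scheme(sample))
```

If host:port input is not expected in your data, the simpler scheme-only check may be enough; the point is to decide deliberately rather than discover the ambiguity in production.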

Decide on the Default Scheme Explicitly

Most modern applications should default to https rather than http. That is the right default for browser navigation, API endpoints, and most public web services.

Still, do not hide that choice. Make the default scheme a parameter or configuration value if the application may target internal services, local development servers, or non-HTTP protocols.

If the system only accepts web URLs, it can also be reasonable to validate the result after normalization and reject anything that does not use http or https.

Validation and Normalization Are Separate Concerns

Adding a missing scheme does not guarantee that the final value is a valid or safe URL. For example, https://not a host has a scheme but is still not a usable network target. In production code, normalize first, then validate.

For example, you might:

  • add a default scheme
  • parse the result
  • verify that a hostname exists
  • restrict allowed schemes to http and https
  • reject private or local addresses if security rules require it

Keeping normalization separate from validation makes the utility easier to test and reason about.
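The steps listed above could be sketched roughly as follows, with normalization and validation kept as two separate functions. The names normalize, validate, and ALLOWED_SCHEMES are hypothetical, and the localhost and whitespace checks are deliberately crude. The private-address check covers IP literals only: a hostname that resolves to a private address would still pass, so stricter security rules need a check at connection time as well.

```python
import ipaddress
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}  # assumption: the system accepts web URLs only


def normalize(value: str, default_scheme: str = "https") -> str:
    """Add a default scheme when none is present (simplified normalization step)."""
    value = value.strip()
    if value.startswith("//"):
        return f"{default_scheme}:{value}"
    if urlsplit(value).scheme:
        return value
    return f"{default_scheme}://{value}"


def validate(url: str) -> str:
    """Return url unchanged, or raise ValueError if it breaks one of the rules."""
    parts = urlsplit(url)
    if parts.scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"scheme not allowed: {parts.scheme!r}")

    host = parts.hostname
    if not host or " " in host:
        raise ValueError("missing or malformed hostname")
    if host == "localhost":
        raise ValueError("local host not allowed")

    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        return url  # not an IP literal; DNS-level checks are out of scope here

    if address.is_private or address.is_loopback or address.is_link_local:
        raise ValueError("private or local address not allowed")
    return url


print(validate(normalize("example.com")))  # https://example.com
```

Because the two functions are independent, each can be unit-tested on its own, and the validation rules can be tightened later without touching the normalization logic.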

Common Pitfalls

The most common mistake is checking only for http:// and https://, which breaks existing schemes such as mailto: or ftp:.

Another issue is blindly prepending https:// to protocol-relative URLs like //example.com, which produces malformed values such as https:////example.com.

Whitespace is also easy to overlook. A URL copied from user input may look valid but fail parsing until trimmed.

Finally, adding a scheme is not enough if the application also needs strict validation. Normalization and validation should be separate steps.

Summary

  • Do not assume a missing http prefix is the only case that needs handling.
  • Use a URL parser or a scheme-aware check instead of a simple string prefix test.
  • Default to https unless your application has a clear reason not to.
  • Treat protocol-relative URLs such as //example.com as a special case.
  • Normalize first, then validate based on your application’s security and correctness rules.
