How to validate a URL in Python (malformed or not)
Introduction
Python provides multiple ways to validate URLs, from basic syntax checks to full schema validation. The built-in urllib.parse.urlparse() splits a URL into components, so you can check that the required parts (scheme and netloc) are present. For stricter validation, the validators library provides a one-line check. For production applications, combine syntax validation with regular expressions (the re module) or the validators package, and optionally check reachability with requests or urllib.
Basic Validation with urllib.parse
urlparse() splits the URL into scheme, netloc, path, params, query, and fragment. A valid URL needs at minimum a scheme (http, https, etc.) and a network location (netloc).
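The check described above can be sketched as follows. This is a minimal illustration rather than a definitive implementation; the function name has_scheme_and_netloc is hypothetical and chosen only for this example.

```python
from urllib.parse import urlparse


def has_scheme_and_netloc(url: str) -> bool:
    """Return True if the URL parses with a non-empty scheme and netloc."""
    try:
        parts = urlparse(url)
    except ValueError:
        # urlparse raises ValueError on some malformed input,
        # for example an unbalanced bracket in the host.
        return False
    return bool(parts.scheme) and bool(parts.netloc)


print(has_scheme_and_netloc("https://example.com/path?q=1"))  # True
print(has_scheme_and_netloc("not-a-url"))                     # False
print(has_scheme_and_netloc("//example.com/path"))            # False (no scheme)
```

Note that this accepts any scheme, so something like ftp://example.com passes as well.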
Stricter Validation with Scheme Whitelist
Always restrict the scheme to http and https when validating user-supplied URLs. This blocks scheme-based attacks such as javascript: or file: URLs.
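One way to express the whitelist is shown below. The names ALLOWED_SCHEMES and is_http_url are assumptions made for this sketch, not part of any library.

```python
from urllib.parse import urlparse

# Hypothetical whitelist for this example; extend it only if you really need to.
ALLOWED_SCHEMES = {"http", "https"}


def is_http_url(url: str) -> bool:
    """Return True only for http/https URLs that include a network location."""
    try:
        parts = urlparse(url)
    except ValueError:
        return False
    # urlparse lowercases the scheme, so "HTTPS://..." is handled too.
    return parts.scheme in ALLOWED_SCHEMES and bool(parts.netloc)


print(is_http_url("HTTPS://Example.com/"))    # True
print(is_http_url("javascript:alert(1)"))     # False
print(is_http_url("ftp://example.com/file"))  # False
```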
Using the validators Library
validators.url() performs more thorough checks than urlparse, including verifying the domain format and TLD presence.
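A hedged sketch of the library call follows. validators is a third-party package (pip install validators); the guarded import and the urlparse fallback are assumptions added here so the example still runs when the package is missing, and the fallback is weaker than the library check. The wrapper name is_valid_url is hypothetical.

```python
from urllib.parse import urlparse

try:
    import validators  # third-party package
except ImportError:
    validators = None  # keep the sketch runnable without the package


def is_valid_url(url: str) -> bool:
    """Validate with validators.url() when available, else a basic fallback."""
    if validators is not None:
        # validators.url() returns True on success and a falsy failure
        # object otherwise, so bool() normalises the result.
        return bool(validators.url(url))
    # Fallback (an assumption of this sketch): scheme and netloc check only.
    try:
        parts = urlparse(url)
    except ValueError:
        return False
    return parts.scheme in {"http", "https"} and bool(parts.netloc)


print(is_valid_url("https://example.com"))
print(is_valid_url("not-a-url"))
```

Wrapping the result in bool() matters because the failure value is an object, not False, so comparing it with "is False" would not work.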
Regex-Based Validation
Regex gives full control over what you accept but is harder to maintain and easy to get wrong. Prefer validators or urlparse for most use cases.
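To illustrate the trade-off, here is a deliberately narrow pattern. It is an assumption-laden sketch, not a complete URL grammar: it accepts only http or https, a dotted ASCII hostname, an optional port, and an optional path. It rejects localhost, IP addresses, and internationalized domains, which shows how easily a hand-written pattern becomes too strict. URL_PATTERN and matches_url_pattern are hypothetical names.

```python
import re

# Deliberately narrow: http(s), a dotted ASCII hostname, optional port and path.
URL_PATTERN = re.compile(
    r"https?://"                                             # scheme
    r"(?:[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?\.)+"  # labels, each ending in a dot
    r"[A-Za-z]{2,63}"                                        # alphabetic top-level domain
    r"(?::\d{1,5})?"                                         # optional port
    r"(?:[/?#]\S*)?"                                         # optional path, query, fragment
)


def matches_url_pattern(url: str) -> bool:
    """Return True if the whole string matches the narrow pattern above."""
    return URL_PATTERN.fullmatch(url) is not None


print(matches_url_pattern("https://sub.example.co.uk:8080/a/b?x=1#frag"))  # True
print(matches_url_pattern("http://localhost"))                             # False
print(matches_url_pattern("https://münchen.de"))                           # False
```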
Checking URL Reachability
Only check reachability when necessary — it adds latency and makes network requests. Syntax validation is sufficient for most form inputs.
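If a reachability check is needed, a standard-library sketch might look like the following. The function name is_reachable and the 5-second default timeout are assumptions for this example. The sketch contains no SSRF protections, so it is not suitable as-is for user-supplied URLs on a server.

```python
import http.client
import urllib.request
from urllib.parse import urlparse


def is_reachable(url: str, timeout: float = 5.0) -> bool:
    """Return True if a HEAD request to the URL gets a non-error response."""
    try:
        parts = urlparse(url)
    except ValueError:
        return False
    # Refuse anything that is not a plain http(s) URL before touching the network.
    if parts.scheme not in {"http", "https"} or not parts.netloc:
        return False
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError, http.client.HTTPException):
        # URLError and HTTPError (4xx/5xx) are OSError subclasses;
        # connection failures and timeouts also land here.
        return False
```

Some servers reject HEAD requests, so a False result here does not prove the URL is dead; retrying with GET is a common follow-up.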
Validating URL Components
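Beyond scheme and netloc, individual components of the parsed result can be checked one by one. The sketch below is one possible set of rules, assuming Python 3.9 or later; check_components and the policy of rejecting embedded credentials are assumptions for illustration, not requirements of any standard.

```python
from urllib.parse import urlparse


def check_components(url: str) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    try:
        parts = urlparse(url)
    except ValueError as exc:
        return [f"unparseable URL: {exc}"]

    problems = []
    if parts.scheme not in {"http", "https"}:
        problems.append("scheme must be http or https")
    if not parts.hostname:
        problems.append("missing hostname")
    if parts.username or parts.password:
        problems.append("embedded credentials are not allowed")
    try:
        parts.port  # raises ValueError for a non-numeric or out-of-range port
    except ValueError:
        problems.append("invalid port")
    return problems


print(check_components("https://example.com:8443/path"))  # []
print(check_components("https://example.com:99999/"))     # ['invalid port']
```

Returning a list of problems rather than a boolean makes it easier to show the user what to fix.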
Django and Pydantic Validators
Common Pitfalls
- Accepting urlparse results without checking scheme and netloc: urlparse("not-a-url") does not raise an error; it returns a result with an empty scheme and the input as the path. Always check that both scheme and netloc are non-empty.
- Not restricting the URL scheme: urlparse("javascript:alert(1)") parses without error and reports the scheme javascript. If you accept any scheme, you may allow XSS or file access attacks. Whitelist http and https for web URLs.
- Using regex that is too permissive or too strict: URL validation regex is notoriously hard to get right. It either rejects valid URLs (internationalized domains, unusual ports) or accepts invalid ones. Use a tested library instead of writing your own regex.
- Checking reachability for every URL: Making an HTTP request for each URL validation adds latency and can be exploited for SSRF (Server-Side Request Forgery). Only check reachability when explicitly needed, and never for user-supplied URLs in server-side code without SSRF protections.
- Not handling internationalized domain names (IDN): URLs like https://münchen.de are valid but use non-ASCII characters. urlparse handles them, but custom regex patterns may reject them. Use idna encoding or the validators library, which supports IDN.
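For the IDN pitfall, the hostname can be converted to its ASCII form with Python's built-in idna codec before applying ASCII-only checks. This is a minimal sketch; hostname_to_ascii is a hypothetical helper, and the built-in codec implements the older IDNA 2003 rules, so the third-party idna package may be preferable for strict modern handling.

```python
from urllib.parse import urlparse


def hostname_to_ascii(url: str) -> str:
    """Return the URL's hostname in ASCII (punycode) form."""
    host = urlparse(url).hostname
    if host is None:
        raise ValueError("URL has no hostname")
    # The built-in "idna" codec encodes each non-ASCII label as xn--...
    return host.encode("idna").decode("ascii")


print(hostname_to_ascii("https://münchen.de/stadtplan"))  # xn--mnchen-3ya.de
```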
Summary
- Use urlparse() with scheme and netloc checks for basic validation
- Restrict schemes to http and https for user-supplied URLs
- The validators library provides comprehensive one-line URL validation
- Django's URLValidator and Pydantic's HttpUrl integrate with their respective frameworks
- Only check URL reachability when explicitly needed; syntax validation is usually sufficient
- Avoid custom regex for URL validation; use tested libraries that handle edge cases

