How to get UTF-8 working in Java webapps?
Introduction
Getting UTF-8 working in a Java web application is not one setting. It is a chain of settings across request decoding, response headers, views, database connections, and sometimes even server connector configuration. UTF-8 problems usually happen because one link in that chain still uses a legacy default.
Start with Request and Response Encoding
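In a servlet-based app, request decoding and response encoding must both be set explicitly: request.setCharacterEncoding("UTF-8") controls how the request body is decoded, and a content type with a charset, such as response.setContentType("text/html; charset=UTF-8"), controls what the response sends and advertises.
The request encoding must be set before reading parameters or the request reader. Once getParameter has been called, the container has already decoded the parameters, and a later setCharacterEncoding call has no effect.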
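The ordering rule can be sketched without a container. The example below is a self-contained model, not the servlet API: ModelRequest is a hypothetical class that parses its form body lazily on the first getParameter call, and the ISO-8859-1 default is an assumption standing in for whatever legacy default a container applies.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class EncodingOrderDemo {
    public static void main(String[] args) {
        String body = "city=M%C3%BCnchen"; // "Munchen" with u-umlaut, percent-encoded as UTF-8

        ModelRequest early = new ModelRequest(body);
        early.setCharacterEncoding(StandardCharsets.UTF_8);
        System.out.println(early.getParameter("city").length()); // 7: decoded correctly

        ModelRequest late = new ModelRequest(body);
        late.getParameter("city");                         // parsed with the legacy default
        late.setCharacterEncoding(StandardCharsets.UTF_8); // too late, silently ignored
        System.out.println(late.getParameter("city").length()); // 8: u-umlaut became two characters
    }
}

/** Hypothetical stand-in for a servlet request; models only lazy parameter parsing. */
class ModelRequest {
    private final String rawBody;                           // form body as sent, still percent-encoded
    private Charset encoding = StandardCharsets.ISO_8859_1; // assumed legacy default
    private Map<String, String> params;                     // stays null until the first getParameter call

    ModelRequest(String rawBody) {
        this.rawBody = rawBody;
    }

    /** Takes effect only while the parameters are still unparsed. */
    void setCharacterEncoding(Charset charset) {
        if (params == null) {
            encoding = charset;
        }
    }

    /** Parses the whole body on first use, then serves values from the cache. */
    String getParameter(String name) {
        if (params == null) {
            params = new LinkedHashMap<>();
            for (String pair : rawBody.split("&")) {
                String[] kv = pair.split("=", 2);
                String value = kv.length > 1 ? kv[1] : "";
                params.put(URLDecoder.decode(kv[0], encoding),
                           URLDecoder.decode(value, encoding));
            }
        }
        return params.get(name);
    }
}
```

The second request shows the failure mode: nothing throws, the value is simply wrong, which is why this mistake tends to go unnoticed until non-ASCII input arrives.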
Use a Filter So You Do Not Repeat Yourself
In real applications, you usually want one encoding filter rather than per-servlet boilerplate.
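A minimal sketch of such a filter follows. It assumes the Jakarta Servlet API 5.0 or later on the classpath (older stacks use the javax.servlet package instead); the class name Utf8Filter and the "/*" mapping are illustrative choices, not something the article prescribes.

```java
import jakarta.servlet.Filter;
import jakarta.servlet.FilterChain;
import jakarta.servlet.ServletException;
import jakarta.servlet.ServletRequest;
import jakarta.servlet.ServletResponse;
import jakarta.servlet.annotation.WebFilter;
import java.io.IOException;

/** Hypothetical filter that applies UTF-8 to every request and response. */
@WebFilter("/*")
public class Utf8Filter implements Filter {

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        // Only fill in the encoding when the client did not declare one.
        if (request.getCharacterEncoding() == null) {
            request.setCharacterEncoding("UTF-8");
        }
        // Must happen before the response writer is obtained downstream.
        response.setCharacterEncoding("UTF-8");
        chain.doFilter(request, response);
    }
}
```

Because the filter runs before any servlet, it sets the request encoding ahead of the first getParameter call. Whether to respect a charset declared by the client or always force UTF-8 is a design choice; this sketch respects it.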
In Spring applications, the equivalent is often CharacterEncodingFilter, which centralizes the same idea.
Configure JSPs and Templates Too
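If your views are JSP-based, the page itself must declare UTF-8, typically with a page directive such as <%@ page contentType="text/html; charset=UTF-8" pageEncoding="UTF-8" %>. The pageEncoding attribute tells the container how to read the JSP source file, and contentType sets the charset the response advertises.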
Without that, the servlet layer may be correct while the rendered page still advertises or uses the wrong encoding.
The same principle applies to other template engines: the view layer must emit UTF-8 content, not merely receive UTF-8 input.
Ensure the Database Side Uses Unicode
UTF-8 does not stop at the web layer. If the database or JDBC connection is misconfigured, the app may accept UTF-8 correctly and still store or retrieve corrupted text.
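For MySQL, the schema and the connection both matter. Tables and columns should use the utf8mb4 character set, because MySQL's older utf8 (utf8mb3) character set stores at most three bytes per character and cannot hold emoji or other supplementary characters.
The JDBC URL may also need an explicit encoding parameter, such as characterEncoding=UTF-8 for MySQL Connector/J, depending on the driver and its version.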
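As a rough illustration of the connection side, the sketch below builds a MySQL-style JDBC URL with the characterEncoding parameter and checks whether a string contains characters that need a four-byte-capable column. The host, port 3306, and database name are placeholders, and the class and method names are hypothetical. Check your driver's documentation for the parameters your version actually honors.

```java
class MySqlUnicodeConfig {

    /** Builds a Connector/J style URL that requests UTF-8 on the connection. */
    static String jdbcUrl(String host, String database) {
        return "jdbc:mysql://" + host + ":3306/" + database + "?characterEncoding=UTF-8";
    }

    /** True when the text has a code point outside the Basic Multilingual Plane (4 bytes in UTF-8). */
    static boolean needsFourByteCharset(String text) {
        return text.codePoints().anyMatch(Character::isSupplementaryCodePoint);
    }

    public static void main(String[] args) {
        // With the driver on the classpath, this URL would be passed to
        // java.sql.DriverManager.getConnection(url, user, password).
        System.out.println(jdbcUrl("localhost", "appdb"));

        System.out.println(needsFourByteCharset("caf\u00E9"));    // false: e-acute is 2 bytes
        System.out.println(needsFourByteCharset("\uD83D\uDE00")); // true: emoji is 4 bytes
    }
}
```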
For other databases, the exact parameters differ, but the principle is the same: the database layer must also be Unicode-safe.
Server Connector Settings Can Matter
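For GET query strings and path parameters, the servlet container decodes the URL itself, and request.setCharacterEncoding does not necessarily apply there because it targets the request body. If the container decodes the URL with the wrong default before your app sees it, the problem is already upstream. In Tomcat, for example, the URIEncoding attribute on the Connector element controls this decoding, and older Tomcat versions defaulted to ISO-8859-1.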
That is why UTF-8 issues sometimes appear only on GET requests but not POST bodies. POST form bodies are often fixed by request encoding, while URL decoding may depend more on the server connector configuration.
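One way to confirm that a garbled GET parameter was mis-decoded upstream is to reverse the suspected mistake. The sketch below simulates a connector that decodes a UTF-8 query value as ISO-8859-1, then re-encodes and decodes the result as UTF-8. The class and method names are hypothetical, and the re-decoding step is meant as a diagnostic, not as a substitute for fixing the connector.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ConnectorMisdecodeDemo {

    /** Simulates the container decoding a percent-encoded query value. */
    static String containerDecode(String encoded, Charset connectorCharset) {
        return URLDecoder.decode(encoded, connectorCharset);
    }

    /** Undoes an ISO-8859-1 mis-decode of bytes that were really UTF-8. */
    static String recover(String misdecoded) {
        byte[] originalBytes = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String sent = "caf%C3%A9"; // "cafe" with e-acute, as a browser would send it

        String wrong = containerDecode(sent, StandardCharsets.ISO_8859_1);
        System.out.println(wrong.length());          // 5: e-acute became two characters
        System.out.println(recover(wrong).length()); // 4: original text restored
    }
}
```

If recover produces the expected text, the bytes arrived intact and only the decoding charset was wrong, which points at the connector rather than the browser or the application code.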
Test with Real Non-ASCII Data
Do not verify UTF-8 using only ASCII text. ASCII works under many encodings and can hide misconfiguration.
Use samples that include characters such as:
- accented Latin text
- Cyrillic
- Japanese or Chinese characters
- emoji if your full stack is supposed to support them
If those round-trip correctly from browser to app to database and back, your configuration is probably coherent.
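The reason ASCII hides problems can be shown in a few lines. This sketch encodes sample strings as UTF-8 and decodes them with the charset a misconfigured layer might use. The class name, method name, and sample strings are illustrative, and the samples are written as Unicode escapes so the source file itself stays ASCII.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.List;

class RoundTripCheck {

    /** True when UTF-8 bytes of the text decode back to the same text under readerCharset. */
    static boolean survives(String text, Charset readerCharset) {
        byte[] wire = text.getBytes(StandardCharsets.UTF_8);
        return new String(wire, readerCharset).equals(text);
    }

    public static void main(String[] args) {
        List<String> samples = List.of(
                "plain ascii",
                "caf\u00E9",                            // accented Latin
                "\u041F\u0440\u0438\u0432\u0435\u0442", // Cyrillic
                "\u65E5\u672C\u8A9E",                   // Japanese
                "\uD83D\uDE00");                        // emoji (surrogate pair)

        for (String sample : samples) {
            System.out.println(survives(sample, StandardCharsets.ISO_8859_1)
                    + " / " + survives(sample, StandardCharsets.UTF_8));
        }
    }
}
```

Only the ASCII sample survives the ISO-8859-1 reader, while every sample survives the UTF-8 reader. A test suite that uses only the first kind of string cannot tell the two configurations apart.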
Common Pitfalls
A common mistake is setting response UTF-8 and forgetting request UTF-8, or the other way around.
Another mistake is calling request.getParameter before setCharacterEncoding. By then the parameters have already been decoded, so the later call has no effect.
Developers also often fix the servlet layer but leave JSP, template, database, or connector settings unchanged.
Finally, testing only with ASCII can make a broken UTF-8 setup look correct when it is not.
Summary
- UTF-8 in Java web apps is an end-to-end configuration, not a single switch.
- Set request encoding before reading request parameters.
- Set response content type and charset explicitly.
- Use a filter to apply UTF-8 consistently across the application.
- Verify the view layer, server connector, and database are all Unicode-safe too.

