How to get UTF-8 working in Java webapps?

Introduction

Getting UTF-8 working in a Java web application is not one setting. It is a chain of settings across request decoding, response headers, views, database connections, and sometimes even server connector configuration. UTF-8 problems usually happen because one link in that chain still uses a legacy default.

Start with Request and Response Encoding

In a servlet-based app, request decoding and response encoding must both be explicit.

```java
import jakarta.servlet.ServletException;
import jakarta.servlet.annotation.WebServlet;
import jakarta.servlet.http.HttpServlet;
import jakarta.servlet.http.HttpServletRequest;
import jakarta.servlet.http.HttpServletResponse;
import java.io.IOException;

@WebServlet("/hello")
public class HelloServlet extends HttpServlet {
    @Override
    protected void doPost(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        // Must run before any getParameter()/getReader() call, otherwise it is ignored
        request.setCharacterEncoding("UTF-8");
        // Sets the Content-Type header and the writer's encoding; must run before getWriter()
        response.setContentType("text/html; charset=UTF-8");
        response.setCharacterEncoding("UTF-8"); // redundant with the charset above, but explicit

        String name = request.getParameter("name");
        // In production code, HTML-escape user input before writing it into a page
        response.getWriter().write("Hello, " + name);
    }
}
```

The request encoding must be set before reading parameters. Once getParameter (or getReader) has been called, the container has already decoded the body, and a later setCharacterEncoding call has no effect.
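To make the failure concrete, consider the field value "café", which a browser on a UTF-8 page percent-encodes as caf%C3%A9. The sketch below uses java.net.URLDecoder (its Charset overload needs Java 10 or later) as a stand-in for the container's form parser and decodes that value twice: once as ISO-8859-1, the servlet specification's traditional default when no request encoding is set, and once as UTF-8. The class and method names are illustrative only.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class FormDecodingDemo {
    // Illustrative stand-in for how a container decodes one form value with a given charset
    static String decodeFormValue(String raw, Charset charset) {
        return URLDecoder.decode(raw, charset);
    }

    public static void main(String[] args) {
        // "cafe" with an acute e (U+00E9), percent-encoded as UTF-8 bytes C3 A9
        String raw = "caf%C3%A9";

        // ISO-8859-1 turns each UTF-8 byte into its own character: "caf" + U+00C3 + U+00A9
        String wrong = decodeFormValue(raw, StandardCharsets.ISO_8859_1);
        // UTF-8 recombines the two bytes into the single character U+00E9
        String right = decodeFormValue(raw, StandardCharsets.UTF_8);

        System.out.println(wrong.length()); // 5
        System.out.println(right.length()); // 4
    }
}
```

The extra character in the ISO-8859-1 result is the classic "Ã©" mojibake that shows up when the request encoding is set too late or not at all.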

Use a Filter So You Do Not Repeat Yourself

In real applications, you usually want one encoding filter rather than per-servlet boilerplate.

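```java
import jakarta.servlet.Filter;
import jakarta.servlet.FilterChain;
import jakarta.servlet.ServletException;
import jakarta.servlet.ServletRequest;
import jakarta.servlet.ServletResponse;
import jakarta.servlet.annotation.WebFilter;
import java.io.IOException;

// Applies to every request; init() and destroy() have default implementations since Servlet 4.0
@WebFilter("/*")
public class Utf8Filter implements Filter {
    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        // Runs before the target servlet or JSP reads parameters (assuming no earlier filter has)
        request.setCharacterEncoding("UTF-8");
        // Response default; a later setContentType with a different charset can still override it
        response.setCharacterEncoding("UTF-8");
        chain.doFilter(request, response);
    }
}
```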

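In Spring applications, the usual equivalent is Spring's CharacterEncodingFilter, which centralizes the same idea. Spring Boot registers one automatically, and recent versions expose its settings through the server.servlet.encoding.* properties.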

Configure JSPs and Templates Too

If your views are JSP-based, the page itself must declare UTF-8.

```jsp
<%@ page contentType="text/html; charset=UTF-8" pageEncoding="UTF-8" %>
```

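The two attributes do different jobs: pageEncoding tells the JSP compiler how to read the .jsp source file, while the charset in contentType sets the encoding of the generated response and its Content-Type header. Without them, the servlet layer may be correct while the rendered page still advertises or uses the wrong encoding.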

The same principle applies to other template engines: the view layer must emit UTF-8 content, not merely receive UTF-8 input.
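At the byte level, "emit UTF-8" means the Writer that turns characters into bytes is created with an explicit charset rather than the JVM default, which only became UTF-8 everywhere in Java 18 (JEP 400). The sketch below pushes a tiny HTML fragment through an OutputStreamWriter with UTF-8 and with ISO-8859-1, then reads both byte arrays back as UTF-8, the way a browser would after seeing charset=UTF-8 in the header. The render method is a hypothetical stand-in for a template engine's output step.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ViewEncodingDemo {
    // Hypothetical render step: write view output through a Writer with an explicit charset
    static byte[] render(String html, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, charset)) {
            writer.write(html);
        } catch (IOException e) {
            // Not expected with an in-memory stream, but Writer declares IOException
            throw new UncheckedIOException(e);
        }
        return out.toByteArray(); // closing the writer flushed all encoded bytes
    }

    public static void main(String[] args) {
        String html = "<p>caf\u00E9</p>"; // "cafe" with an acute e

        byte[] utf8 = render(html, StandardCharsets.UTF_8);        // e-acute becomes 2 bytes
        byte[] latin1 = render(html, StandardCharsets.ISO_8859_1); // e-acute becomes 1 byte

        // Decoding as UTF-8, as a browser told "charset=UTF-8" would:
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(html)); // true
        // The lone ISO-8859-1 byte is invalid UTF-8 and becomes the replacement character
        System.out.println(new String(latin1, StandardCharsets.UTF_8).equals(html)); // false
    }
}
```

A page whose header says UTF-8 but whose bytes were written in another charset shows exactly this kind of replacement-character damage.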

Ensure the Database Side Uses Unicode

UTF-8 does not stop at the web layer. If the database or JDBC connection is misconfigured, the app may accept UTF-8 correctly and still store or retrieve corrupted text.

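For MySQL and MariaDB, both the schema and the connection matter. Use utf8mb4 rather than utf8: in these databases utf8 is an alias for the older utf8mb3 character set, which stores at most three bytes per character and cannot hold four-byte characters such as emoji.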

```sql
CREATE DATABASE appdb CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
```
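The reason for utf8mb4 comes down to byte counts: code points up to U+FFFF need at most three bytes in UTF-8, while supplementary characters such as emoji need four, which is more than the legacy three-byte utf8mb3 limit allows. The sketch below checks that with the JDK alone; the class and method names are illustrative, and the samples are written as escapes so the source file stays ASCII.

```java
import java.nio.charset.StandardCharsets;

class Utf8ByteLengths {
    // Number of bytes the string occupies when encoded as UTF-8
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // True if every code point is at most U+FFFF, i.e. at most 3 UTF-8 bytes (fits utf8mb3)
    static boolean fitsUtf8mb3(String s) {
        return s.codePoints().allMatch(cp -> cp <= 0xFFFF);
    }

    public static void main(String[] args) {
        String[] samples = {
                "A",            // ASCII: 1 byte
                "\u00E9",       // e with acute (U+00E9): 2 bytes
                "\u65E5",       // CJK "sun/day" (U+65E5): 3 bytes
                "\uD83D\uDE00"  // grinning face emoji (U+1F600): 4 bytes
        };
        for (String s : samples) {
            System.out.println(utf8Bytes(s) + " bytes, fits utf8mb3: " + fitsUtf8mb3(s));
        }
    }
}
```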

And the JDBC URL often needs explicit Unicode parameters depending on the driver and database.

```java
String url = "jdbc:mysql://localhost:3306/appdb?useUnicode=true&characterEncoding=UTF-8";
```

For other databases, the exact parameters differ, but the principle is the same: the database layer must also be Unicode-safe.

Server Connector Settings Can Matter

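For GET query strings and path parameters, the application server or servlet container also influences decoding, because it decodes the URL before your application sees it, and request.setCharacterEncoding normally affects only the request body. If the container decodes the URL with the wrong default, the problem is already upstream. In Tomcat, for example, this is the URIEncoding attribute of the HTTP Connector, which defaults to UTF-8 in current releases but to ISO-8859-1 in older ones such as Tomcat 7.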

That is why UTF-8 issues sometimes appear only on GET requests but not POST bodies. POST form bodies are often fixed by request encoding, while URL decoding may depend more on the server connector configuration.
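When the connector cannot be fixed right away, a commonly used stopgap relies on the fact that ISO-8859-1 maps every byte 0x00 to 0xFF to exactly one character. A value the container wrongly decoded as ISO-8859-1 can therefore be turned back into its original bytes and re-decoded as UTF-8. The sketch below applies that trick to a query-string value; repairLatin1Mojibake is a hypothetical helper name, and the trick only works if the upstream decoding was exactly ISO-8859-1, so correcting the connector configuration remains the better fix.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

class QueryStringRepair {
    // Hypothetical helper: undo an ISO-8859-1 decode of what were really UTF-8 bytes.
    // getBytes(ISO_8859_1) is lossless for such strings, so it recovers the original bytes.
    static String repairLatin1Mojibake(String misDecoded) {
        byte[] originalBytes = misDecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Japanese "Nihon" (U+65E5 U+672C), percent-encoded as UTF-8 in a query string
        String query = "%E6%97%A5%E6%9C%AC";

        // Simulates a connector configured for ISO-8859-1: one char per byte
        String asSeenByApp = URLDecoder.decode(query, StandardCharsets.ISO_8859_1);
        String repaired = repairLatin1Mojibake(asSeenByApp);

        System.out.println(asSeenByApp.length());             // 6
        System.out.println(repaired.equals("\u65E5\u672C")); // true
    }
}
```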

Test with Real Non-ASCII Data

Do not verify UTF-8 using only ASCII text. ASCII works under many encodings and can hide misconfiguration.

Use samples that include characters such as:

  • accented Latin text
  • Cyrillic
  • Japanese or Chinese characters
  • emoji if your full stack is supposed to support them

If those round-trip correctly from browser to app to database and back, your configuration is probably coherent.
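The reason ASCII hides bugs is that ASCII bytes mean the same thing in UTF-8 and in the common legacy encodings. The sketch below simulates a single misconfigured hop, text written as UTF-8 and read back as ISO-8859-1, over samples like those in the list; the class and method names are illustrative, and the samples are written as escapes so the source file stays ASCII.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.List;

class AsciiHidesBugs {
    // One hop of the chain: bytes are written as UTF-8 but read back with another charset
    static String roundTrip(String text, Charset readAs) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, readAs);
    }

    public static void main(String[] args) {
        List<String> samples = List.of(
                "hello",                                 // ASCII only
                "caf\u00E9",                             // accented Latin
                "\u041F\u0440\u0438\u0432\u0435\u0442",  // Cyrillic ("Privet")
                "\u65E5\u672C\u8A9E",                    // Japanese ("Nihongo")
                "\uD83D\uDE00");                         // emoji U+1F600
        for (String s : samples) {
            boolean survives = s.equals(roundTrip(s, StandardCharsets.ISO_8859_1));
            // Only the ASCII sample prints true
            System.out.println(s + " survives UTF-8 -> ISO-8859-1: " + survives);
        }
    }
}
```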

Common Pitfalls

A common mistake is setting response UTF-8 and forgetting request UTF-8, or the other way around.

Another mistake is calling request.getParameter before setCharacterEncoding; by then the body has already been decoded and the later call is ignored.

Developers also often fix the servlet layer but leave JSP, template, database, or connector settings unchanged.

Finally, testing only with ASCII can make a broken UTF-8 setup look correct when it is not.

Summary

  • UTF-8 in Java web apps is an end-to-end configuration, not a single switch.
  • Set request encoding before reading request parameters.
  • Set response content type and charset explicitly.
  • Use a filter to apply UTF-8 consistently across the application.
  • Verify the view layer, server connector, and database are all Unicode-safe too.
