Where to get UTF-8 string literal in Java?


Introduction

Java does not have a special "UTF-8 string literal" syntax. A Java string literal is a Unicode string in source code, while UTF-8 is a byte encoding used when text is stored or transmitted outside the String object. Once you separate the ideas of characters and bytes, this topic becomes much clearer.

A Java `String` Is Not "Stored as UTF-8"

When you write a string literal in Java source, you are writing characters.

```java
public class Utf8LiteralExample {
    public static void main(String[] args) {
        String text = "café";
        System.out.println(text);
    }
}
```

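The variable text is a Java String, which the language defines as a sequence of UTF-16 code units, not UTF-8 bytes. (Since Java 9 the JDK may store Latin-1-only strings with one byte per character, but that compact-strings optimization is an internal detail and is not UTF-8 either.) There is no literal syntax that hands you the string "as UTF-8"; UTF-8 only matters when you convert the string to bytes or read bytes into a string.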
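One way to see the difference is to count the same text three ways. The sketch below is illustrative and not from the original article; the class and method names are hypothetical. It writes the text with Unicode escapes so the result does not depend on how the source file is saved, and it uses U+1F600 (an emoji outside the Basic Multilingual Plane) to show a character that a String stores as two UTF-16 code units.

```java
import java.nio.charset.StandardCharsets;

public class CharsVsBytesExample {
    // Number of UTF-16 code units in the String (what length() reports).
    public static int utf16Units(String s) {
        return s.length();
    }

    // Number of Unicode code points in the String.
    public static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Number of bytes the same text occupies once encoded as UTF-8.
    public static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // Escapes keep this example independent of the source file encoding.
        String cafe = "caf\u00E9";
        String grin = "\uD83D\uDE00";

        System.out.println(utf16Units(cafe) + " " + codePoints(cafe) + " " + utf8Bytes(cafe)); // 4 4 5
        System.out.println(utf16Units(grin) + " " + codePoints(grin) + " " + utf8Bytes(grin)); // 2 1 4
    }
}
```

The String length and the UTF-8 byte count disagree as soon as the text leaves ASCII, which is why "the UTF-8 version of a String" only makes sense as a byte array.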

That is why questions such as "where do I get a UTF-8 string literal" usually point to a different underlying problem:

- the source file encoding is wrong
- bytes are being decoded with the wrong charset
- bytes are being encoded without specifying UTF-8 explicitly
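The second problem is easy to reproduce. This hypothetical helper encodes text as UTF-8 and then decodes the bytes as ISO-8859-1, a mismatch chosen only for illustration. Each of the two bytes of é becomes a separate character, so the result is cafÃ© instead of café. A UTF-8 source file read by a compiler that assumes a Latin-1 style encoding causes the same kind of damage inside a literal.

```java
import java.nio.charset.StandardCharsets;

public class WrongCharsetExample {
    // Encodes with UTF-8 but decodes with ISO-8859-1: a deliberate charset mismatch.
    public static String decodeWithWrongCharset(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "caf\u00E9";
        String garbled = decodeWithWrongCharset(original);

        System.out.println(garbled.length());         // 5: one char per UTF-8 byte
        System.out.println(garbled.equals(original)); // false
    }
}
```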

Use UTF-8 When Converting Between `String` and Bytes

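When text crosses an I/O boundary, specify the charset explicitly. Prefer StandardCharsets.UTF_8 over charset names such as "UTF-8": the constant cannot be misspelled, and the Charset overloads (for example getBytes(Charset)) do not throw the checked UnsupportedEncodingException that the String-name overloads declare.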

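```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8BytesExample {
    public static void main(String[] args) {
        String text = "café";

        // Encode characters to bytes. If the source file was compiled as UTF-8,
        // this prints [99, 97, 102, -61, -87]: the accented e takes two bytes
        // (0xC3 0xA9), shown as signed Java byte values.
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.toString(utf8));

        // Decode the same bytes with the same charset to get the original text back.
        String decoded = new String(utf8, StandardCharsets.UTF_8);
        System.out.println(decoded);
    }
}
```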

This is where UTF-8 actually matters: encoding characters into bytes, and decoding bytes back into characters.
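One detail worth knowing: new String(bytes, StandardCharsets.UTF_8) never fails. Invalid or truncated UTF-8 is silently replaced with U+FFFD, the Unicode replacement character. When you would rather reject bad input, a CharsetDecoder configured with CodingErrorAction.REPORT throws a CharacterCodingException instead. The sketch below contrasts the two; the class and method names are illustrative.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictUtf8Example {
    // Lenient: malformed sequences are silently replaced with U+FFFD.
    public static String decodeLenient(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Strict: malformed or unmappable input raises CharacterCodingException.
    public static String decodeStrict(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) {
        // 0xC3 starts a two-byte sequence, but the second byte is missing.
        byte[] truncated = {'c', 'a', 'f', (byte) 0xC3};

        System.out.println(decodeLenient(truncated).equals("caf\uFFFD")); // true

        try {
            decodeStrict(truncated);
        } catch (CharacterCodingException e) {
            System.out.println("Rejected malformed input: " + e);
        }
    }
}
```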

Make Sure the Source File Encoding Is Correct

If non-ASCII characters appear directly in the source file, the source file itself must be saved in an encoding the compiler reads correctly. In modern Java projects that usually means UTF-8 source files.

If the source file encoding is wrong, a literal such as "café" may already be corrupted before the program even runs.

Many IDEs and project templates already use UTF-8, but the rule still matters: if you type Unicode characters directly into source, save the file as UTF-8 and tell the compiler to read it as UTF-8, for example with javac -encoding UTF-8 or, in Maven, the project.build.sourceEncoding property.
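If you are unsure how the compiler is reading a file, a small check can tell you. This is a hypothetical sketch, not a standard tool: it compares a literal typed directly in the file with the same text written as a Unicode escape. The escape uses only ASCII characters, so it means the same thing under any ASCII-compatible source encoding, while the typed literal only matches it when the file was decoded as intended. The file itself must be saved as UTF-8 for the check to be meaningful.

```java
public class SourceEncodingCheck {
    // Typed directly: its value depends on how the compiler decoded this file.
    public static final String TYPED = "café";

    // Written as an escape: ASCII only, so it does not depend on the source encoding.
    public static final String ESCAPED = "caf\u00E9";

    public static boolean sourceReadAsIntended() {
        return TYPED.equals(ESCAPED);
    }

    public static void main(String[] args) {
        if (sourceReadAsIntended()) {
            System.out.println("Literal matches its escaped form.");
        } else {
            // A UTF-8 file read as a single-byte charset turns the accented e into two chars.
            System.out.println("Mismatch: typed literal has length " + TYPED.length());
        }
    }
}
```

Compiled with javac -encoding UTF-8, this should take the matching branch; compiling the same UTF-8 file with -encoding ISO-8859-1 should take the mismatch branch instead.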

Use Unicode Escapes When Necessary

If you cannot rely on the source file encoding, Java also supports Unicode escape sequences in source code.

```java
public class UnicodeEscapeExample {
    public static void main(String[] args) {
        String text = "caf\u00E9";
        System.out.println(text);
    }
}
```

The compiler translates \u00E9 into the character é before it parses the literal, so this creates exactly the same String value as "café". It is not more "UTF-8" than the direct literal; it is simply another way to write the same characters using only ASCII.

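Unicode escapes are useful when you cannot control how the source file is saved or compiled, or when a character would otherwise be invisible or easy to confuse (a non-breaking space, for example). They are harder to read than normal source text, so prefer readable UTF-8 source files unless you have a specific reason not to.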
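If you do need escapes, you do not have to look up code points by hand. The hypothetical helper below converts each non-ASCII char of a String into a Java Unicode escape. Because it works on UTF-16 chars, a character outside the Basic Multilingual Plane comes out as two escapes (a surrogate pair), which is what Java source expects. It is only a sketch: it does not escape quotes, backslashes or control characters, so it is not a general string-literal escaper. Applied to café it returns the text caf\u00E9, and the emoji U+1F600 becomes \uD83D\uDE00.

```java
import java.util.Locale;

public class UnicodeEscaper {
    // Replaces every non-ASCII UTF-16 char with a four-digit Java Unicode escape.
    // Supplementary characters are stored as two chars (a surrogate pair),
    // so they naturally come out as two escapes.
    public static String toJavaEscapes(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                out.append(c); // plain ASCII is kept as-is
            } else {
                out.append(String.format(Locale.ROOT, "\\u%04X", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toJavaEscapes("caf\u00E9"));
        System.out.println(toJavaEscapes("\uD83D\uDE00"));
    }
}
```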

File and Network I/O Are Where UTF-8 Matters Most

Here is a practical example using UTF-8 with file output and input.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf8FileExample {
    public static void main(String[] args) throws IOException {
        Path path = Path.of("message.txt");
        String message = "Hello, café";

        // Path.of, Files.writeString and Files.readString require Java 11 or later.
        // These two methods already default to UTF-8, but passing the charset
        // on both sides documents the intent and keeps the round trip explicit.
        Files.writeString(path, message, StandardCharsets.UTF_8);
        String loaded = Files.readString(path, StandardCharsets.UTF_8);

        System.out.println(loaded);
    }
}
```

The important part is that both writing and reading specify UTF-8 explicitly.
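Network code usually deals with InputStream and OutputStream rather than files. The same rule applies: wrap the byte stream in a Reader or Writer that names UTF-8 explicitly. In this sketch the in-memory ByteArrayOutputStream and ByteArrayInputStream stand in for a socket or HTTP body, and the helper names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

public class Utf8StreamExample {
    // Writes text to any byte stream as UTF-8. The caller owns and closes the stream.
    public static void writeUtf8(OutputStream out, String text) throws IOException {
        Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        writer.write(text);
        writer.flush();
    }

    // Reads all lines from a byte stream, decoding them as UTF-8.
    // Line separators are normalized to "\n" by the join.
    public static String readUtf8(InputStream in) throws IOException {
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        return reader.lines().collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) throws IOException {
        // In-memory streams stand in for a socket or HTTP connection here.
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        writeUtf8(wire, "Hello, caf\u00E9");

        byte[] bytes = wire.toByteArray();
        System.out.println(bytes.length); // 12: "Hello, caf" is 10 bytes, the accented e is 2

        String received = readUtf8(new ByteArrayInputStream(bytes));
        System.out.println(received.equals("Hello, caf\u00E9")); // true
    }
}
```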

Common Pitfalls

The most common mistake is thinking UTF-8 describes the in-memory Java String itself. UTF-8 describes how characters are encoded as bytes outside the String; it is not a special kind of Java literal.

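Another issue is relying on the platform default encoding. Calls such as getBytes() and new String(byte[]) without a charset, or FileReader and FileWriter without one, use the JVM default charset, which before JDK 18 depended on the operating system and locale. The same program can then produce different bytes on different machines.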
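To see what a given JVM would use when you omit the charset, print the default. This illustrative sketch contrasts getBytes() with no argument, which uses the JVM default charset, with getBytes(StandardCharsets.UTF_8). Since JDK 18 (JEP 400) the default charset is UTF-8 unless it is overridden, but on older JDKs it follows the operating system and locale, so only the explicit version is portable across both.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetExample {
    // Uses the JVM default charset, so the bytes can differ between machines or JDKs.
    public static byte[] platformDependent(String text) {
        return text.getBytes();
    }

    // Names the charset explicitly, so the bytes are the same everywhere.
    public static byte[] portable(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println("Default charset: " + Charset.defaultCharset());
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));

        String text = "caf\u00E9";
        System.out.println("Default-charset bytes: " + platformDependent(text).length);
        System.out.println("UTF-8 bytes:           " + portable(text).length); // always 5
    }
}
```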

Developers also sometimes type non-ASCII characters into a source file that is not actually saved as UTF-8. The resulting corruption looks like a runtime problem, but the real issue happened at compile time or even earlier in the editor.

Finally, do not convert text to bytes and back unnecessarily. If the data is already a String, keep it as characters until you reach an actual I/O boundary.
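Each extra conversion is another chance to pick the wrong charset, and some mistakes cannot be undone. In the illustrative sketch below, encoding to US-ASCII replaces the unmappable é with '?', so decoding afterwards gives caf? and the original character is gone. The same round trip through UTF-8 is harmless, but it does no useful work either.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class LossyRoundTripExample {
    // Encodes and immediately decodes with the same charset. Characters the charset
    // cannot represent are replaced with its replacement byte during encoding.
    public static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String text = "caf\u00E9";
        System.out.println(roundTrip(text, StandardCharsets.UTF_8).equals(text)); // true
        System.out.println(roundTrip(text, StandardCharsets.US_ASCII));           // caf?
    }
}
```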

Summary

- Java has no special UTF-8 string literal syntax.
- A Java String represents characters; UTF-8 is a byte encoding.
- Use StandardCharsets.UTF_8 when converting between strings and bytes.
- Save source files as UTF-8 if you include non-ASCII characters directly.
- Use Unicode escapes only when direct source text is not practical.
