CSV API for Java

Introduction

Java does not include a full-featured CSV API in the standard library, even though CSV is one of the most common data exchange formats. You can read lines with BufferedReader, but correct CSV handling requires more than splitting on commas.

Real CSV data includes quoted fields, embedded commas, escaped quotes, and sometimes different delimiters. In practice, the right answer is usually to use a dedicated library such as Apache Commons CSV or OpenCSV.

Why String.split(",") Is Not Enough

CSV looks simple until you hit a value such as "New York, NY" or a field with embedded quotes. A naive split breaks immediately because commas inside quoted text are part of the field, not separators.

For example, this row has three columns, not four:

text
1,"New York, NY","quoted ""value"""

A CSV library handles those cases correctly and keeps your parsing code focused on records rather than delimiter edge cases.
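To make the failure concrete, here is a minimal sketch that runs the sample row through a plain split. The class name NaiveSplitDemo and its helper are hypothetical names chosen for illustration; only the JDK is used.

```java
import java.util.Arrays;

// Illustrative only: shows how a naive split miscounts the columns in the sample row.
class NaiveSplitDemo {
    // The sample row from the text, written as a Java string literal.
    static final String ROW = "1,\"New York, NY\",\"quoted \"\"value\"\"\"";

    // Splits on every comma and ignores CSV quoting rules entirely.
    static String[] naiveSplit(String line) {
        return line.split(",");
    }

    public static void main(String[] args) {
        String[] parts = naiveSplit(ROW);
        // The row has three columns, but the split reports four pieces
        // because the comma inside "New York, NY" is treated as a separator.
        System.out.println(parts.length);
        System.out.println(Arrays.toString(parts));
    }
}
```

The city field ends up cut in two, and the surrounding quote characters and doubled quotes are left in the values as raw text.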

Apache Commons CSV as a Practical Default

Apache Commons CSV is a strong default choice because it is small, widely used, and supports different CSV dialects through CSVFormat.

Add the dependency with Maven:

xml
<dependency>
  <groupId>org.apache.commons</groupId>
  <artifactId>commons-csv</artifactId>
  <version>1.14.1</version>
</dependency>

Then parse a file with headers like this:

java
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Path;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

public class ReadCsvExample {
    public static void main(String[] args) throws Exception {
        // Treat the first record as the header, match header names
        // case-insensitively, and trim whitespace around values.
        CSVFormat format = CSVFormat.DEFAULT.builder()
            .setHeader()
            .setIgnoreHeaderCase(true)
            .setTrim(true)
            .get();

        // Files.newBufferedReader(Path) decodes the file as UTF-8.
        try (Reader reader = Files.newBufferedReader(Path.of("users.csv"));
             CSVParser parser = format.parse(reader)) {

            // Look up each value by header name rather than by position.
            for (CSVRecord record : parser) {
                String id = record.get("id");
                String name = record.get("name");
                String email = record.get("email");
                System.out.printf("%s %s %s%n", id, name, email);
            }
        }
    }
}

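Calling setHeader() with no arguments tells the parser to read the first record as the header row, so values can be looked up by column name, which is much safer than relying on numeric indexes. setIgnoreHeaderCase(true) makes those lookups case-insensitive, and setTrim(true) strips leading and trailing whitespace from values.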

Writing CSV Safely

A good CSV API should also write data correctly, including quoting fields when necessary. Apache Commons CSV provides CSVPrinter for that purpose.

java
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVPrinter;

public class WriteCsvExample {
    public static void main(String[] args) throws Exception {
        // Supplying header names makes the printer write them as the first row.
        CSVFormat format = CSVFormat.DEFAULT.builder()
            .setHeader("id", "name", "city")
            .get();

        // Files.newBufferedWriter(Path) encodes the output as UTF-8.
        try (Writer writer = Files.newBufferedWriter(Path.of("output.csv"));
             CSVPrinter printer = new CSVPrinter(writer, format)) {

            printer.printRecord("1", "Ada", "Toronto");
            // "New York, NY" contains a comma, so the printer quotes it.
            printer.printRecord("2", "Grace", "New York, NY");
        }
    }
}

The printer handles quoting rules for you, which is exactly the kind of detail you do not want to reinvent.
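To give a sense of what that involves, the sketch below applies only the basic RFC 4180 rule: wrap a field in double quotes if it contains a comma, a double quote, or a line break, and double any embedded quotes. QuoteSketch and quoteField are hypothetical names used for illustration, not part of Commons CSV, and a real printer covers more cases (custom delimiters, quote modes, null handling) than this does.

```java
// Hypothetical helper for illustration; in real code, let CSVPrinter do this.
class QuoteSketch {
    // Applies the basic RFC 4180 rule: quote a field that contains a comma,
    // a double quote, or a line break, and double any embedded quotes.
    static String quoteField(String value) {
        boolean needsQuotes = value.contains(",") || value.contains("\"")
                || value.contains("\n") || value.contains("\r");
        if (!needsQuotes) {
            return value; // plain fields are written unchanged
        }
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        System.out.println(quoteField("Ada"));
        System.out.println(quoteField("New York, NY"));
        System.out.println(quoteField("quoted \"value\""));
    }
}
```

Even this simplified rule has three trigger conditions and an escaping step, which is why hand-rolled writers that only join values with commas tend to produce files other tools cannot read back.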

Choosing a Library

Apache Commons CSV is not the only option. OpenCSV is also popular and includes bean mapping features that some projects prefer. The best choice depends on your needs:

  • Commons CSV for straightforward parsing and writing
  • OpenCSV if you want annotation-driven bean binding
  • Jackson CSV if you are already using Jackson heavily

The important point is that you should use a real CSV parser instead of ad hoc string handling.

Charset and Format Decisions

CSV itself is not one single format. Files differ in delimiter, quote character, line ending, header handling, and character encoding. A robust API should let you control those choices explicitly.

For example, some files are semicolon-delimited rather than comma-delimited. Others come from Excel exports with platform-specific line endings. Libraries such as Commons CSV let you start from predefined formats such as CSVFormat.EXCEL or CSVFormat.RFC4180, or build a custom one when needed, for example by calling setDelimiter(';') on the builder.

Always be explicit about charset as well. UTF-8 is common, but enterprise exports still arrive in other encodings, such as ISO-8859-1 or Windows-1252, often enough to matter.
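The effect of a wrong charset can be shown without any CSV library. In this minimal sketch, CharsetDemo and decode are hypothetical names, and the sample bytes stand in for a line from a file exported as ISO-8859-1.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative only: the same bytes decoded with the right and the wrong charset.
class CharsetDemo {
    // Decodes raw bytes with the charset the caller names explicitly.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // "cafe" with an accented e, followed by ",1", as an ISO-8859-1
        // export would store it: the accented e is the single byte 0xE9.
        byte[] exported = "caf\u00E9,1".getBytes(StandardCharsets.ISO_8859_1);

        // Decoding with the charset the file was written in restores the text.
        System.out.println(decode(exported, StandardCharsets.ISO_8859_1));

        // Decoding as UTF-8 cannot interpret the lone 0xE9 byte, so the
        // accented letter is replaced with the U+FFFD replacement character.
        System.out.println(decode(exported, StandardCharsets.UTF_8));
    }
}
```

The same principle applies when opening a file: passing the charset as the second argument to Files.newBufferedReader makes the assumption visible in the code instead of leaving it to a default.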

Common Pitfalls

The most common mistake is using String.split(",") and assuming the problem is solved. That approach fails as soon as quoted fields or embedded commas appear.

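Another issue is loading large files entirely into memory when streaming record by record would be enough. In Commons CSV, iterating over the CSVParser in a for-each loop reads records as they are needed, whereas calling getRecords() collects every record into a list first.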

Header handling also causes bugs. If the file includes a header row, configure the parser accordingly instead of treating the header as data or hard-coding column indexes.

Finally, do not ignore charset. A parser may be perfectly correct and still produce broken text if the input encoding is wrong.

Summary

  • Java’s standard library does not provide a complete CSV API for real-world CSV handling.
  • A dedicated library is the correct solution because CSV has quoting and escaping rules.
  • Apache Commons CSV is a practical default for reading and writing CSV safely.
  • Header-aware parsing is safer than relying on column indexes alone.
  • Correct CSV handling requires attention to delimiters, quoting rules, streaming, and charset.