CSV API for Java
Introduction
Java does not include a full-featured CSV API in the standard library, even though CSV is one of the most common data exchange formats. You can read lines with BufferedReader, but correct CSV handling requires more than splitting on commas.
Real CSV data includes quoted fields, embedded commas, escaped quotes, and sometimes different delimiters. In practice, the right answer is usually to use a dedicated library such as Apache Commons CSV or OpenCSV.
Why String.split(",") Is Not Enough
CSV looks simple until you hit a value such as "New York, NY" or a field with embedded quotes. A naive split breaks immediately because commas inside quoted text are part of the field, not separators.
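For example, a three-field row that contains the quoted value "New York, NY" has three columns, not four, yet splitting on every comma produces four pieces.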
A CSV library handles those cases correctly and keeps your parsing code focused on records rather than delimiter edge cases.
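The sketch below shows the failure concretely. It uses only the standard library. The class name and the sample row are hypothetical.

```java
public class NaiveSplitDemo {
    // Hypothetical sample row with three fields; the second field is quoted
    // because it contains a comma.
    static final String ROW = "1,\"New York, NY\",10001";

    // Treats every comma as a separator, including the one inside the quotes.
    static String[] naiveSplit(String line) {
        return line.split(",");
    }

    public static void main(String[] args) {
        String[] parts = naiveSplit(ROW);
        System.out.println(parts.length + " pieces"); // 4 pieces
        for (String part : parts) {
            System.out.println("[" + part + "]");
        }
        // Prints [1], ["New York], [ NY"], [10001]: the quoted field is torn in two
        // and the quote characters leak into the values.
    }
}
```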
Apache Commons CSV as a Practical Default
Apache Commons CSV is a strong default choice because it is small, widely used, and supports different CSV dialects through CSVFormat.
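Add the dependency with Maven, using groupId org.apache.commons and artifactId commons-csv. Pick the current release version from Maven Central.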
To parse a file that starts with a header row, configure the format before creating the parser. Calling setHeader() with no arguments tells the parser to read the first record as the header row. You can then look up columns by name, which is much safer than relying on numeric indexes.
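Here is a minimal sketch of header-aware parsing. It assumes Commons CSV 1.9 or later, where CSVFormat.builder() is available. The class name, the column names and the inline sample data are hypothetical. A StringReader keeps the example self-contained.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

public class HeaderParsingExample {
    // setHeader() with no arguments: column names are read from the first record,
    // and that record is not returned as data.
    static final CSVFormat WITH_HEADER = CSVFormat.DEFAULT.builder()
            .setHeader()
            .build();

    // Collects one named column from every data record.
    static List<String> readColumn(Reader in, String column) {
        List<String> values = new ArrayList<>();
        try (CSVParser parser = CSVParser.parse(in, WITH_HEADER)) {
            for (CSVRecord record : parser) {
                values.add(record.get(column)); // lookup by header name, not index
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return values;
    }

    public static void main(String[] args) {
        String csv = "id,city,zip\n1,\"New York, NY\",10001\n2,Boston,02108\n";
        // For a real file, pass Files.newBufferedReader(path, charset) instead.
        for (String city : readColumn(new StringReader(csv), "city")) {
            System.out.println(city); // New York, NY  then  Boston
        }
    }
}
```

Keeping values as strings also preserves details such as the leading zero in 02108. Conversion to numbers is left to the caller.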
Writing CSV Safely
A good CSV API should also write data correctly, including quoting fields when necessary. Apache Commons CSV provides CSVPrinter for that purpose.
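The printer applies quoting rules for you. With the default format, for example, it quotes values that contain the delimiter, a quote character, or a line break, and it doubles any embedded quotes. That is exactly the kind of detail you do not want to reinvent.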
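As a sketch of writing with CSVPrinter, the example below writes to a StringWriter so the result can be inspected. It assumes Commons CSV 1.9 or later. The class name, header and row values are hypothetical. When the format carries a header, CSVPrinter writes that header line first.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVPrinter;

public class WritingExample {
    // Writes a header plus two rows whose values need quoting.
    static String writeSample() {
        CSVFormat format = CSVFormat.DEFAULT.builder()
                .setHeader("id", "city", "note") // emitted by CSVPrinter before the rows
                .build();
        StringWriter out = new StringWriter();
        try (CSVPrinter printer = new CSVPrinter(out, format)) {
            printer.printRecord(1, "New York, NY", "plain"); // comma inside a value
            printer.printRecord(2, "Boston", "says \"hi\""); // quotes inside a value
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(writeSample());
    }
}
```

With CSVFormat.DEFAULT, the city is written as "New York, NY" and the note as "says ""hi""". Records end with CRLF.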
Choosing a Library
Apache Commons CSV is not the only option. OpenCSV is also popular and includes bean mapping features that some projects prefer. The best choice depends on your needs:
- Commons CSV for straightforward parsing and writing
- OpenCSV if you want annotation-driven bean binding
- Jackson CSV if you are already using Jackson heavily
The important point is that you should use a real CSV parser instead of ad hoc string handling.
Charset and Format Decisions
CSV itself is not one single format. Files differ in delimiter, quote character, line ending, header handling, and character encoding. A robust API should let you control those choices explicitly.
For example, some files are semicolon-delimited rather than comma-delimited. Others come from Excel exports with platform-specific line endings. Libraries such as Commons CSV let you start from predefined formats or build a custom one when needed.
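The sketch below builds a semicolon-delimited dialect from CSVFormat.DEFAULT. It assumes Commons CSV 1.9 or later. Predefined constants such as CSVFormat.RFC4180, CSVFormat.EXCEL and CSVFormat.TDF are other possible starting points. The class name and sample text are hypothetical.

In Commons CSV, the record separator setting applies to printing. The parser accepts \n, \r\n and \r line endings either way.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

public class DialectExample {
    // Custom dialect: DEFAULT with the delimiter changed to a semicolon.
    static final CSVFormat SEMICOLON = CSVFormat.DEFAULT.builder()
            .setDelimiter(';')
            .build();

    // Parses text and returns each record's fields as a list of strings.
    static List<List<String>> parse(String text, CSVFormat format) {
        List<List<String>> rows = new ArrayList<>();
        try (CSVParser parser = CSVParser.parse(text, format)) {
            for (CSVRecord record : parser) {
                List<String> fields = new ArrayList<>();
                record.forEach(fields::add); // CSVRecord is Iterable<String>
                rows.add(fields);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        String text = "name;city\r\nAda;\"Paris; France\"\r\n"; // CRLF line endings
        System.out.println(parse(text, SEMICOLON));         // [[name, city], [Ada, Paris; France]]
        System.out.println(parse(text, CSVFormat.DEFAULT)); // wrong dialect: one field per line
    }
}
```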
Always be explicit about charset as well. UTF-8 is common, but enterprise exports still arrive in other encodings often enough to matter.
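One way to make the charset explicit is to open the file with Files.newBufferedReader(path, charset) and hand that reader to the parser. The sketch below is built on that assumption. The class and method names are hypothetical. ISO-8859-1 stands in for a legacy export encoding, and a temporary file keeps the demo self-contained.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

public class ExplicitCharsetExample {
    // Parses a file with the charset its producer used, not the platform default.
    static List<String> firstColumn(Path file, Charset charset) {
        List<String> values = new ArrayList<>();
        try (Reader in = Files.newBufferedReader(file, charset);
             CSVParser parser = CSVFormat.DEFAULT.parse(in)) {
            for (CSVRecord record : parser) {
                values.add(record.get(0));
            }
        } catch (IOException e) {
            // Files.newBufferedReader reports malformed input as an IOException
            // (MalformedInputException) instead of silently substituting characters.
            throw new UncheckedIOException(e);
        }
        return values;
    }

    // Writes a one-line sample in the given charset, then reads it back with that charset.
    static List<String> roundTrip(Charset charset) {
        try {
            Path file = Files.createTempFile("export", ".csv");
            try {
                Files.write(file, List.of("Z\u00FCrich,1"), charset); // "Zürich,1"
                return firstColumn(file, charset);
            } finally {
                Files.delete(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(StandardCharsets.ISO_8859_1).get(0).equals("Z\u00FCrich")); // true
    }
}
```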
Common Pitfalls
The most common mistake is using String.split(",") and assuming the problem is solved. That approach fails as soon as quoted fields or embedded commas appear.
Another issue is loading large files entirely into memory when streaming record by record would be enough. CSV libraries usually let you iterate through records efficiently.
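The sketch below streams records one at a time. It assumes Commons CSV 1.9 or later. Iterating a CSVParser reads records from the underlying reader on demand, while getRecords() collects all remaining records into a List. The class name, column name and sample data are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

public class StreamingExample {
    // Sums a numeric column while reading one record at a time.
    static long sumColumn(Reader in, String column) {
        CSVFormat format = CSVFormat.DEFAULT.builder().setHeader().build();
        long total = 0;
        try (CSVParser parser = format.parse(in)) {
            // The for-each loop pulls records lazily; parser.getRecords()
            // would instead hold every remaining record in memory at once.
            for (CSVRecord record : parser) {
                total += Long.parseLong(record.get(column));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        // For a real file, pass Files.newBufferedReader(path, charset) instead.
        String csv = "sku,qty\nA-1,5\nB-2,7\n";
        System.out.println(sumColumn(new StringReader(csv), "qty")); // 12
    }
}
```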
Header handling also causes bugs. If the file includes a header row, configure the parser accordingly instead of treating the header as data or hard-coding column indexes.
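To show the header pitfall, the sketch below counts records for the same text under two configurations. It assumes Commons CSV 1.9 or later. The class name and data are hypothetical.

- With no header configuration, the header line comes back as an ordinary record.
- With explicit names plus setSkipHeaderRecord(true), the file's own header line is dropped.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;

public class HeaderConfigExample {
    static final String CSV = "id,city\n1,Boston\n2,Denver\n";

    // No header configuration: "id,city" is treated as data.
    static final CSVFormat NO_HEADER = CSVFormat.DEFAULT;

    // Explicit column names; skipHeaderRecord drops the header line in the file.
    static final CSVFormat NAMED = CSVFormat.DEFAULT.builder()
            .setHeader("id", "city")
            .setSkipHeaderRecord(true)
            .build();

    static int countRecords(String text, CSVFormat format) {
        try (CSVParser parser = CSVParser.parse(text, format)) {
            return parser.getRecords().size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countRecords(CSV, NO_HEADER)); // 3
        System.out.println(countRecords(CSV, NAMED));     // 2
    }
}
```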
Finally, do not ignore charset. A parser may be perfectly correct and still produce broken text if the input encoding is wrong.
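The sketch below shows how text breaks without any parser error. It uses only the standard library, and the class name is hypothetical. It decodes the UTF-8 bytes of "Zürich" as ISO-8859-1. new String(bytes, charset) never throws: malformed input is replaced, and ISO-8859-1 maps every byte to some character. The damage therefore appears only in the resulting text.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Decoding never fails here, so a wrong charset silently yields wrong text.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        byte[] utf8 = "Z\u00FCrich".getBytes(StandardCharsets.UTF_8); // "Zürich" as UTF-8 bytes
        String right = decode(utf8, StandardCharsets.UTF_8);           // "Zürich"
        String wrong = decode(utf8, StandardCharsets.ISO_8859_1);      // "ZÃ¼rich"
        System.out.println(right.equals("Z\u00FCrich"));       // true
        System.out.println(wrong.equals("Z\u00C3\u00BCrich")); // true
    }
}
```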
Summary
- Java’s standard library does not provide a complete CSV API for real-world CSV handling.
- A dedicated library is the correct solution because CSV has quoting and escaping rules.
- Apache Commons CSV is a practical default for reading and writing CSV safely.
- Header-aware parsing is safer than relying on column indexes alone.
- Correct CSV handling requires attention to delimiters, quoting rules, streaming, and charset.

