How to do a batch insert in MySQL
Introduction
Inserting rows one at a time into MySQL is straightforward but becomes a performance bottleneck when dealing with thousands or millions of records. Batch inserts combine multiple rows into a single statement or transaction, drastically reducing network round trips and query parsing overhead. This article covers multi-row INSERT syntax, LOAD DATA INFILE for bulk loading, prepared statements for application-level batching, and the configuration settings that affect batch insert performance.
Multi-Row INSERT Syntax
The most common batch insert technique in MySQL is the multi-row INSERT statement, which specifies multiple value tuples in a single query.
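A single multi-row INSERT can be 10 to 20 times faster than the equivalent individual INSERT statements because the client makes one network round trip and the server parses and optimizes one statement instead of many. The exact speedup depends on row size, the number of indexes, and network latency.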
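As a minimal sketch of the idea, the Python helper below builds a multi-row INSERT with one placeholder tuple per row. The table users and its columns name and email are hypothetical, and the identifiers are assumed to come from trusted code rather than user input.

```python
def build_multirow_insert(table, columns, row_count):
    """Return an INSERT statement with one placeholder tuple per row."""
    if row_count < 1:
        raise ValueError("row_count must be at least 1")
    column_list = ", ".join(f"`{name}`" for name in columns)
    one_row = "(" + ", ".join(["%s"] * len(columns)) + ")"
    all_rows = ", ".join([one_row] * row_count)
    return f"INSERT INTO `{table}` ({column_list}) VALUES {all_rows}"


def flatten(rows):
    """Flatten [(a, b), (c, d)] into [a, b, c, d] to match the placeholders."""
    return [value for row in rows for value in row]


# Hypothetical data for a hypothetical `users` table.
rows = [("Alice", "alice@example.com"), ("Bob", "bob@example.com")]
sql = build_multirow_insert("users", ["name", "email"], len(rows))
params = flatten(rows)

# With a real DB-API connection this would be sent in one round trip:
#   cursor.execute(sql, params)
#   connection.commit()
```

The values travel as parameters rather than being pasted into the SQL text, so the driver handles quoting and escaping.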
Using LOAD DATA INFILE
For the fastest possible bulk loading, MySQL provides LOAD DATA INFILE, which reads rows directly from a text file. Only the LOAD DATA statement itself is parsed as SQL; the rows are read as delimited text, which avoids per-statement parsing overhead. This method is typically about 20 times faster than INSERT statements.
The LOCAL keyword instructs MySQL to read the file from the client machine rather than the server filesystem. Note that local_infile must be enabled on both the server and client for this to work.
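The following Python sketch shows one way to prepare a client-side file and the matching LOAD DATA LOCAL INFILE statement. The table users, its columns, and the file name are hypothetical, and the field and line terminators are assumptions chosen to match what Python's csv module writes here.

```python
import csv
import os
import tempfile


def write_csv(rows, path):
    """Write rows as comma-separated lines terminated by a bare newline."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, lineterminator="\n")
        writer.writerows(rows)


def build_load_data(path, table, columns):
    """Build a LOAD DATA LOCAL INFILE statement for a file written by write_csv."""
    column_list = ", ".join(f"`{name}`" for name in columns)
    # Use forward slashes and escape single quotes so the path is a valid SQL string.
    safe_path = path.replace("\\", "/").replace("'", "\\'")
    return (
        f"LOAD DATA LOCAL INFILE '{safe_path}' "
        f"INTO TABLE `{table}` "
        "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' "
        "LINES TERMINATED BY '\\n' "
        f"({column_list})"
    )


# Hypothetical data and table.
rows = [(1, "Alice"), (2, "Bob")]
csv_path = os.path.join(tempfile.mkdtemp(), "users.csv")
write_csv(rows, csv_path)
statement = build_load_data(csv_path, "users", ["id", "name"])

# With mysql-connector-python, the connection would typically be opened with
# allow_local_infile=True (and the server needs local_infile enabled) before:
#   cursor.execute(statement)
#   connection.commit()
```

Values containing backslashes would need extra care, because LOAD DATA treats the backslash as an escape character by default.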
Transaction Wrapping for Performance
Wrapping batch inserts in an explicit transaction avoids the overhead of autocommit, where MySQL commits after every individual statement. This is especially beneficial when executing multiple INSERT statements in a loop.
For very large batches, commit every 5,000 to 10,000 rows to balance performance with memory usage and lock duration.
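A periodic commit can be sketched in Python as follows. The function name and the commit_every default are illustrative, and the connection is assumed to be any DB-API connection with autocommit turned off, so that a transaction is open until commit() is called.

```python
def insert_with_periodic_commit(connection, sql, rows, commit_every=5000):
    """Execute one INSERT per row, committing after every `commit_every` rows."""
    cursor = connection.cursor()
    pending = 0  # rows executed since the last commit
    total = 0
    try:
        for row in rows:
            cursor.execute(sql, row)
            pending += 1
            total += 1
            if pending >= commit_every:
                connection.commit()
                pending = 0
        if pending:
            # Commit the final partial group.
            connection.commit()
    except Exception:
        # Only the rows since the last commit are rolled back;
        # earlier groups are already permanent.
        connection.rollback()
        raise
    finally:
        cursor.close()
    return total
```

Because earlier groups stay committed after a failure, a restartable import would also need to record how far it got.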
Prepared Statements with Python
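Application-level batch inserts use parameterized queries, which keep the values separate from the SQL text. The executemany() method in Python's mysql-connector-python library handles batching automatically: for INSERT statements, it combines the parameter sets into a multi-row statement instead of sending one statement per row.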
For larger datasets, process records in chunks to avoid building an excessively large query string.
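One possible way to chunk the work is sketched below. The helper names and the chunk_size default are illustrative; the connection is assumed to be a DB-API connection such as the one returned by mysql.connector.connect().

```python
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items from any iterable."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def insert_in_chunks(connection, sql, rows, chunk_size=1000):
    """Call executemany() once per chunk and commit after each chunk."""
    cursor = connection.cursor()
    inserted = 0
    try:
        for chunk in chunked(rows, chunk_size):
            cursor.executemany(sql, chunk)
            connection.commit()
            inserted += len(chunk)
    except Exception:
        # Undo the chunk that failed; earlier chunks are already committed.
        connection.rollback()
        raise
    finally:
        cursor.close()
    return inserted


# Example call with a hypothetical table:
#   insert_in_chunks(connection,
#                    "INSERT INTO users (name, email) VALUES (%s, %s)",
#                    rows)
```

Because chunked() accepts any iterable, the rows can come from a generator that reads a file lazily, so the full dataset never has to sit in memory.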
Prepared Statements with Java
In Java, the JDBC addBatch() and executeBatch() methods provide batch insert capability. Adding rewriteBatchedStatements=true to the connection URL tells the MySQL Connector/J driver to combine individual inserts into multi-row statements.
Without rewriteBatchedStatements=true, the JDBC driver sends each statement individually even when using addBatch(), negating the performance benefit.
Performance Tuning Configuration
Several MySQL server variables affect batch insert performance. Adjusting them before a large import can make a significant difference.
The max_allowed_packet setting is particularly important because it limits the maximum size of a single SQL statement. If your multi-row INSERT exceeds this limit, the server rejects the query with an error.
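To stay under the limit, batches can be sized by estimated bytes rather than by row count. The sketch below is a rough heuristic, not an exact calculation: the per-value overhead, the statement_overhead allowance, and the safety_factor are all assumptions, and the limit itself would be read from the server, for example with SELECT @@max_allowed_packet.

```python
def estimate_row_bytes(row):
    """Rough size of one value tuple: UTF-8 text of each value plus
    two quote characters and a separator, plus the parentheses."""
    return sum(len(str(value).encode("utf-8")) + 3 for value in row) + 2


def split_by_packet_size(rows, max_allowed_packet,
                         statement_overhead=256, safety_factor=0.5):
    """Yield batches whose estimated size stays within a fraction of the limit."""
    budget = int(max_allowed_packet * safety_factor) - statement_overhead
    if budget <= 0:
        raise ValueError("max_allowed_packet is too small for this estimate")
    batch, batch_bytes = [], 0
    for row in rows:
        row_bytes = estimate_row_bytes(row)
        # Start a new batch when adding this row would exceed the budget.
        # A single oversized row still gets a batch of its own.
        if batch and batch_bytes + row_bytes > budget:
            yield batch
            batch, batch_bytes = [], 0
        batch.append(row)
        batch_bytes += row_bytes
    if batch:
        yield batch
```

The safety factor leaves headroom for escaping, which can make a value longer on the wire than its raw text.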
Common Pitfalls
- Exceeding max_allowed_packet: Very large multi-row INSERT statements fail if they exceed the server's max_allowed_packet setting. Split large batches into chunks of 1,000 to 5,000 rows.
- Forgetting to commit: When autocommit is disabled, failing to call COMMIT means no data is actually written. Always commit after batch operations, and use try-catch blocks to roll back on errors.
- Leaving foreign_key_checks disabled: Disabling foreign key checks for performance is fine during bulk loading, but forgetting to re-enable them afterward allows invalid data to enter the database.
- Not using rewriteBatchedStatements in Java: Without this JDBC connection parameter, addBatch() and executeBatch() do not actually combine statements, resulting in no performance improvement over individual inserts.
- Ignoring index overhead: Tables with many indexes slow down batch inserts significantly because each row triggers index updates. For large imports, consider dropping non-essential indexes, loading the data, and then recreating the indexes.
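One way to guard against leaving checks disabled is to restore them in a finally block. The Python context manager below is a sketch under stated assumptions: the name relaxed_checks is hypothetical, unique_checks is included alongside foreign_key_checks as an illustrative choice, and both are assumed to have been at their default value of 1 beforehand.

```python
from contextlib import contextmanager


@contextmanager
def relaxed_checks(cursor):
    """Disable session-level constraint checks and always re-enable them."""
    try:
        cursor.execute("SET foreign_key_checks = 0")
        cursor.execute("SET unique_checks = 0")
        yield cursor
    finally:
        # Runs on success and on error, so the session never stays relaxed.
        cursor.execute("SET unique_checks = 1")
        cursor.execute("SET foreign_key_checks = 1")


# Example use with a hypothetical connection and batch:
#   cursor = connection.cursor()
#   with relaxed_checks(cursor):
#       cursor.executemany(sql, rows)
#   connection.commit()
```

Re-enabling foreign_key_checks does not validate rows loaded while it was off, so the imported data still has to be known to be consistent.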
Summary
- Multi-row INSERT statements combine multiple value tuples into a single query and can be 10 to 20 times faster than individual statements.
- LOAD DATA INFILE is the fastest bulk loading method because it reads rows straight from a file instead of parsing them as SQL statements.
- Wrapping inserts in explicit transactions eliminates per-statement autocommit overhead.
- Use executemany() in Python and addBatch()/executeBatch() with rewriteBatchedStatements=true in Java for application-level batching.
- Tune max_allowed_packet and temporarily disable constraint checks during large imports for optimal throughput.

