MySQL
Database
Batch Insert
SQL Tutorial
Data Insertion

How to do a batch insert in MySQL

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Introduction

Inserting rows one at a time into MySQL is straightforward but becomes a performance bottleneck when dealing with thousands or millions of records. Batch inserts combine multiple rows into a single statement or transaction, drastically reducing network round trips and query parsing overhead. This article covers multi-row INSERT syntax, LOAD DATA INFILE for bulk loading, prepared statements for application-level batching, and the configuration settings that affect batch insert performance.

Multi-Row INSERT Syntax

The most common batch insert technique in MySQL is the multi-row INSERT statement, which specifies multiple value tuples in a single query.

sql
1-- Single-row insert (slow for large datasets)
2INSERT INTO users (name, email, age) VALUES ('Alice', '[email protected]', 30);
3INSERT INTO users (name, email, age) VALUES ('Bob', '[email protected]', 25);
4
5-- Multi-row batch insert (much faster)
6INSERT INTO users (name, email, age) VALUES
7  ('Alice', '[email protected]', 30),
8  ('Bob', '[email protected]', 25),
9  ('Carol', '[email protected]', 28),
10  ('Dave', '[email protected]', 35),
11  ('Eve', '[email protected]', 22);

A single multi-row INSERT can be 10 to 20 times faster than equivalent individual INSERT statements because the server parses and optimizes the query once and writes all rows in a single operation.

Using LOAD DATA INFILE

For the fastest possible bulk loading, MySQL provides LOAD DATA INFILE, which reads data directly from a file on disk. This method bypasses the SQL parser entirely and is typically 20 times faster than INSERT statements.

sql
1-- Load data from a CSV file
2LOAD DATA INFILE '/var/lib/mysql-files/users.csv'
3INTO TABLE users
4FIELDS TERMINATED BY ','
5ENCLOSED BY '"'
6LINES TERMINATED BY '\n'
7IGNORE 1 ROWS
8(name, email, age);
9
10-- Load from a client-side file
11LOAD DATA LOCAL INFILE '/home/user/data/users.csv'
12INTO TABLE users
13FIELDS TERMINATED BY ','
14ENCLOSED BY '"'
15LINES TERMINATED BY '\n'
16IGNORE 1 ROWS
17(name, email, age);

The LOCAL keyword instructs MySQL to read the file from the client machine rather than the server filesystem. Note that local_infile must be enabled on both the server and client for this to work.

Transaction Wrapping for Performance

Wrapping batch inserts in an explicit transaction avoids the overhead of autocommit, where MySQL commits after every individual statement. This is especially beneficial when executing multiple INSERT statements in a loop.

sql
1-- Without transaction: each INSERT auto-commits (slow)
2INSERT INTO orders (product_id, quantity) VALUES (1, 10);
3INSERT INTO orders (product_id, quantity) VALUES (2, 5);
4INSERT INTO orders (product_id, quantity) VALUES (3, 8);
5
6-- With transaction: single commit at the end (fast)
7START TRANSACTION;
8INSERT INTO orders (product_id, quantity) VALUES (1, 10);
9INSERT INTO orders (product_id, quantity) VALUES (2, 5);
10INSERT INTO orders (product_id, quantity) VALUES (3, 8);
11-- ... thousands more rows
12COMMIT;

For very large batches, commit every 5,000 to 10,000 rows to balance performance with memory usage and lock duration.

Prepared Statements with Python

Application-level batch inserts use prepared statements with parameterized queries. The executemany() method in Python's mysql-connector-python library handles batching automatically.

python
1import mysql.connector
2
3conn = mysql.connector.connect(
4    host='localhost',
5    user='root',
6    password='secret',
7    database='myapp'
8)
9cursor = conn.cursor()
10
11# Prepare data as a list of tuples
12users = [
13    ('Alice', '[email protected]', 30),
14    ('Bob', '[email protected]', 25),
15    ('Carol', '[email protected]', 28),
16    ('Dave', '[email protected]', 35),
17    ('Eve', '[email protected]', 22),
18]
19
20query = "INSERT INTO users (name, email, age) VALUES (%s, %s, %s)"
21cursor.executemany(query, users)
22conn.commit()
23
24print(f"Inserted {cursor.rowcount} rows")
25cursor.close()
26conn.close()

For larger datasets, process records in chunks to avoid building an excessively large query string.

python
1def batch_insert(cursor, conn, data, batch_size=5000):
2    query = "INSERT INTO users (name, email, age) VALUES (%s, %s, %s)"
3    for i in range(0, len(data), batch_size):
4        chunk = data[i:i + batch_size]
5        cursor.executemany(query, chunk)
6        conn.commit()
7        print(f"Inserted rows {i} to {i + len(chunk)}")

Prepared Statements with Java

In Java, the JDBC addBatch() and executeBatch() methods provide batch insert capability. Adding rewriteBatchedStatements=true to the connection URL tells the MySQL Connector/J driver to combine individual inserts into multi-row statements.

java
1import java.sql.*;
2
3public class BatchInsertExample {
4    public static void main(String[] args) throws Exception {
5        String url = "jdbc:mysql://localhost:3306/myapp"
6            + "?rewriteBatchedStatements=true";
7        Connection conn = DriverManager.getConnection(url, "root", "secret");
8
9        String sql = "INSERT INTO users (name, email, age) VALUES (?, ?, ?)";
10        PreparedStatement pstmt = conn.prepareStatement(sql);
11
12        conn.setAutoCommit(false);
13
14        String[][] users = {
15            {"Alice", "[email protected]", "30"},
16            {"Bob", "[email protected]", "25"},
17            {"Carol", "[email protected]", "28"},
18        };
19
20        for (String[] user : users) {
21            pstmt.setString(1, user[0]);
22            pstmt.setString(2, user[1]);
23            pstmt.setInt(3, Integer.parseInt(user[2]));
24            pstmt.addBatch();
25        }
26
27        pstmt.executeBatch();
28        conn.commit();
29
30        pstmt.close();
31        conn.close();
32    }
33}

Without rewriteBatchedStatements=true, the JDBC driver sends each statement individually even when using addBatch(), negating the performance benefit.

Performance Tuning Configuration

Several MySQL server variables affect batch insert performance. Adjusting them before a large import can make a significant difference.

sql
1-- Increase max packet size to allow larger multi-row INSERTs
2SET GLOBAL max_allowed_packet = 64 * 1024 * 1024;  -- 64 MB
3
4-- Increase bulk insert buffer for MyISAM tables
5SET GLOBAL bulk_insert_buffer_size = 256 * 1024 * 1024;  -- 256 MB
6
7-- Temporarily disable unique checks for faster loading
8SET unique_checks = 0;
9SET foreign_key_checks = 0;
10
11-- Run your batch inserts here
12
13-- Re-enable checks after loading
14SET unique_checks = 1;
15SET foreign_key_checks = 1;

The max_allowed_packet setting is particularly important because it limits the maximum size of a single SQL statement. If your multi-row INSERT exceeds this limit, the server rejects the query with an error.

Common Pitfalls

  • Exceeding max_allowed_packet: Very large multi-row INSERT statements fail if they exceed the server's max_allowed_packet setting. Split large batches into chunks of 1,000 to 5,000 rows.
  • Forgetting to commit: When autocommit is disabled, failing to call COMMIT means no data is actually written. Always commit after batch operations, and use try-catch blocks to roll back on errors.
  • Leaving foreign_key_checks disabled: Disabling foreign key checks for performance is fine during bulk loading, but forgetting to re-enable them afterward allows invalid data to enter the database.
  • Not using rewriteBatchedStatements in Java: Without this JDBC connection parameter, addBatch() and executeBatch() do not actually combine statements, resulting in no performance improvement over individual inserts.
  • Ignoring index overhead: Tables with many indexes slow down batch inserts significantly because each row triggers index updates. For large imports, consider dropping non-essential indexes, loading the data, and then recreating the indexes.

Summary

  • Multi-row INSERT statements combine multiple value tuples into a single query, providing 10 to 20 times faster insertion than individual statements.
  • LOAD DATA INFILE is the fastest bulk loading method, bypassing the SQL parser entirely.
  • Wrapping inserts in explicit transactions eliminates per-statement autocommit overhead.
  • Use executemany() in Python and addBatch()/executeBatch() with rewriteBatchedStatements=true in Java for application-level batching.
  • Tune max_allowed_packet and temporarily disable constraint checks during large imports for optimal throughput.

Course illustration
Course illustration

All Rights Reserved.