How can I insert many rows into a MySQL table and return the new IDs?
Introduction
Bulk inserts are the most efficient way to add many rows to MySQL, but getting the generated IDs back is less straightforward than in databases such as PostgreSQL, where INSERT ... RETURNING hands them back directly; MySQL has no RETURNING clause. In the common auto-increment case, the usual pattern is to insert all rows in one statement, capture the first generated ID on the same connection, and infer the range only when the insert semantics make that safe.
Use One Multi-Row Insert First
The fastest starting point is a single insert with many value tuples.
This is usually much faster than sending one insert per row because it reduces round trips and lets MySQL handle the batch as one statement.
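A minimal sketch of building such a statement in Python, assuming the %s placeholder style that mysql-connector-python uses: one placeholder tuple per row plus a flat parameter list. The users table and its name and email columns are hypothetical, and because table and column names are interpolated into the SQL text, they must come from trusted code rather than user input.

```python
from typing import Any, List, Sequence, Tuple


def build_multirow_insert(
    table: str, columns: Sequence[str], rows: Sequence[Sequence[Any]]
) -> Tuple[str, List[Any]]:
    """Build ONE INSERT with a VALUES tuple per row and a flat parameter list.

    Table and column names are interpolated directly (trusted identifiers only);
    row values go through %s placeholders so the driver escapes them.
    """
    if not rows:
        raise ValueError("rows must not be empty")
    width = len(columns)
    if any(len(row) != width for row in rows):
        raise ValueError("every row must have one value per column")
    one_tuple = "(" + ", ".join(["%s"] * width) + ")"
    sql = (
        f"INSERT INTO {table} ({', '.join(columns)}) VALUES "
        + ", ".join([one_tuple] * len(rows))
    )
    params = [value for row in rows for value in row]
    return sql, params


sql, params = build_multirow_insert(
    "users", ["name", "email"], [("Ada", "ada@example.com"), ("Linus", "linus@example.com")]
)
print(sql)
# INSERT INTO users (name, email) VALUES (%s, %s), (%s, %s)
```

The driver then sends this as a single statement, for example cursor.execute(sql, params).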
The Common Auto-Increment Pattern
For a plain AUTO_INCREMENT primary key, MySQL's LAST_INSERT_ID() returns the ID generated for the first row of a multi-row INSERT, and drivers typically expose the same value after the insert (in mysql-connector-python, as cursor.lastrowid). Combined with the number of inserted rows, you can derive the full ID range if the insert is simple and no conflicting semantics such as duplicate-key handling are involved.
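With mysql-connector-python, that means running the multi-row INSERT, reading cursor.rowcount, and then reading the first generated ID on the same connection, either from cursor.lastrowid or with SELECT LAST_INSERT_ID().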
This works only because the insert and the ID read happen on one connection, and it assumes the inserted rows received a contiguous auto-increment block.
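Here is a sketch of that pattern, assuming a DB-API connection from mysql-connector-python and a hypothetical users table whose id column is a plain AUTO_INCREMENT primary key with auto_increment_increment = 1. It captures the row count first, then asks the same connection for LAST_INSERT_ID(), and refuses to infer a range if the count does not match the input.

```python
from typing import Any, List, Sequence, Tuple


def insert_users_returning_ids(conn: Any, rows: Sequence[Tuple[str, str]]) -> List[int]:
    """Insert (name, email) rows into a hypothetical `users` table; return the new IDs.

    Assumes `conn` is a DB-API connection (e.g. from mysql.connector.connect()),
    users.id is a plain AUTO_INCREMENT key, and auto_increment_increment = 1.
    """
    if not rows:
        return []
    placeholders = ", ".join(["(%s, %s)"] * len(rows))
    sql = f"INSERT INTO users (name, email) VALUES {placeholders}"
    params = [value for row in rows for value in row]

    cursor = conn.cursor()
    try:
        cursor.execute(sql, params)
        inserted = cursor.rowcount  # capture before the next statement resets it
        # Same connection: for a multi-row INSERT, LAST_INSERT_ID() is the ID
        # generated for the FIRST row of the batch.
        cursor.execute("SELECT LAST_INSERT_ID()")
        first_id = cursor.fetchall()[0][0]
        if inserted != len(rows):
            raise RuntimeError(f"expected {len(rows)} inserted rows, got {inserted}")
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        cursor.close()
    # Contiguous-block assumption: first_id, first_id + 1, ..., first_id + inserted - 1
    return list(range(first_id, first_id + inserted))


# Usage (needs a running MySQL server; connection settings are placeholders):
# import mysql.connector
# conn = mysql.connector.connect(host="localhost", user="app", password="secret", database="demo")
# ids = insert_users_returning_ids(conn, [("Ada", "ada@example.com"), ("Linus", "linus@example.com")])
```

Reading LAST_INSERT_ID() explicitly instead of cursor.lastrowid is a design choice here: cursor.lastrowid should carry the same first ID, but the SQL call makes the same-connection dependency visible in the code.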
Why Connection Scope Matters
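The generated-ID information belongs to the session that performed the insert: MySQL maintains LAST_INSERT_ID() per connection. If you insert on one pooled connection and then read insert metadata from another, you get whatever that other session last generated (or 0 if it generated nothing), not the IDs of your rows.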
That is why the safe rules are:
- do the insert and ID read on the same connection,
- keep the logic in one transaction or one immediate code path,
- avoid unrelated inserts on that connection before you read the metadata.
If you break those rules, the derived ID list may be wrong even though the insert succeeded.
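With a connection pool, the simplest way to follow these rules is to check out one connection for the whole insert-and-read path. The sketch below assumes a pool object with the get_connection() method of mysql.connector.pooling.MySQLConnectionPool, whose pooled connections go back to the pool on close(); the helper names pinned_connection and insert_and_read_first_id are illustrative.

```python
from contextlib import contextmanager
from typing import Any, Iterator, Sequence, Tuple


@contextmanager
def pinned_connection(pool: Any) -> Iterator[Any]:
    """Check out ONE connection for the whole insert-and-read path.

    Assumes `pool` behaves like mysql.connector.pooling.MySQLConnectionPool:
    get_connection() hands out a connection and close() returns it to the pool.
    """
    conn = pool.get_connection()
    try:
        yield conn
    finally:
        conn.close()  # hand the connection back only after the ID has been read


def insert_and_read_first_id(pool: Any, sql: str, params: Sequence[Any]) -> Tuple[int, int]:
    """Run a multi-row INSERT and return (first_id, inserted_rows) from the same session."""
    with pinned_connection(pool) as conn:
        cursor = conn.cursor()
        try:
            cursor.execute(sql, params)
            inserted = cursor.rowcount  # capture before the next statement
            # Same session as the INSERT, so LAST_INSERT_ID() refers to it.
            cursor.execute("SELECT LAST_INSERT_ID()")
            first_id = cursor.fetchall()[0][0]
            conn.commit()
        except Exception:
            conn.rollback()
            raise
        finally:
            cursor.close()
    # Anti-pattern: calling pool.get_connection() again at this point and reading
    # LAST_INSERT_ID() there would report the other session's state, not this insert.
    return first_id, inserted
```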
When Range Inference Is Safe
The usual first_id + rowcount pattern is only reliable when all of these are true:
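- the table uses standard auto-increment behavior (auto_increment_increment is 1, so consecutive rows differ by exactly 1),
- the statement is a plain INSERT ... VALUES rather than INSERT ... SELECT or LOAD DATA, which InnoDB may number with gaps under the interleaved lock mode (the MySQL 8.0 default),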
- the rows were actually inserted,
- the driver reports row count in the expected way,
- no duplicate-key logic changed the outcome.
That last condition matters more than people expect.
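These conditions can be turned into explicit guards. The hypothetical infer_id_range helper below is a sketch: it takes the first ID (from LAST_INSERT_ID()) and the affected-row count, refuses to build a range when the count differs from the number of rows sent, and accepts an increment that should match the server's auto_increment_increment.

```python
from typing import List


def infer_id_range(first_id: int, inserted_rows: int, expected_rows: int,
                   increment: int = 1) -> List[int]:
    """Reconstruct IDs from LAST_INSERT_ID() and the affected-row count.

    Only valid for a plain multi-row INSERT ... VALUES with no IGNORE or
    ON DUPLICATE KEY UPDATE. `increment` should match auto_increment_increment.
    """
    if inserted_rows != expected_rows:
        # Rows were skipped or updated: a contiguous range would mislabel rows.
        raise ValueError(f"expected {expected_rows} inserted rows, got {inserted_rows}")
    if expected_rows and first_id <= 0:
        # LAST_INSERT_ID() is 0 if this connection has not generated any AUTO_INCREMENT value.
        raise ValueError("no auto-increment value was generated on this connection")
    return [first_id + i * increment for i in range(inserted_rows)]


print(infer_id_range(first_id=500, inserted_rows=3, expected_rows=3))
# [500, 501, 502]
```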
Duplicate-Key Logic Changes the Story
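If you use INSERT IGNORE or ON DUPLICATE KEY UPDATE, row counts and generated IDs can stop matching the input rows in non-obvious ways. Some rows may be skipped or updated rather than inserted: INSERT IGNORE counts only the rows it actually inserted, while ON DUPLICATE KEY UPDATE reports 1 affected row per new row and 2 per updated row. InnoDB can also consume auto-increment values for rows that end up not being inserted, leaving gaps.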
In those situations, inferring a clean ID range becomes risky. If you need exact row-to-ID mapping, safer options are:
- insert into a staging table first,
- include a deterministic external key in each row,
- query the inserted rows back by that key after commit,
- generate IDs in the application layer instead.
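The external-key option can be sketched as follows, assuming a hypothetical users table with an AUTO_INCREMENT id and a UNIQUE external_key column. INSERT IGNORE skips rows whose key already exists (it also downgrades some other errors to warnings), and a follow-up SELECT by key, run after the commit, maps every input key to its ID regardless of which rows were actually inserted.

```python
from typing import Any, Dict, Sequence, Tuple


def insert_and_map_by_key(conn: Any, rows: Sequence[Tuple[str, str]]) -> Dict[str, int]:
    """Insert (external_key, name) rows, then look IDs up by key instead of inferring them.

    Assumes a hypothetical `users` table with an AUTO_INCREMENT `id` and a UNIQUE
    `external_key`. Existing keys are skipped by INSERT IGNORE, so the mapping
    covers both newly inserted and pre-existing rows.
    """
    if not rows:
        return {}
    keys = [row[0] for row in rows]
    placeholders = ", ".join(["(%s, %s)"] * len(rows))
    key_list = ", ".join(["%s"] * len(keys))
    cursor = conn.cursor()
    try:
        cursor.execute(
            f"INSERT IGNORE INTO users (external_key, name) VALUES {placeholders}",
            [value for row in rows for value in row],
        )
        conn.commit()
        # Read back after commit: the key, not row order, ties each row to its ID.
        cursor.execute(
            f"SELECT external_key, id FROM users WHERE external_key IN ({key_list})", keys
        )
        mapping = {key: row_id for key, row_id in cursor.fetchall()}
    finally:
        cursor.close()
    missing = [key for key in keys if key not in mapping]
    if missing:
        raise LookupError(f"no row found for keys: {missing}")
    return mapping
```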
Chunking for Large Imports
For large imports, chunk the rows so each statement stays well under max_allowed_packet and each transaction stays manageable.
This keeps lock durations and memory use under better control while preserving the same connection-local ID capture pattern.
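A sketch of chunked inserts on a single connection, again using a hypothetical users(name, email) table. The chunk_size default of 1000 is an arbitrary placeholder, not a tuned recommendation. The first ID is read after each chunk, because separate statements are not guaranteed to receive adjacent ID blocks when other sessions insert in between.

```python
from itertools import islice
from typing import Any, Iterable, Iterator, List, Tuple


def chunked(rows: Iterable[Tuple[str, str]], size: int) -> Iterator[List[Tuple[str, str]]]:
    """Yield lists of at most `size` rows."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def insert_users_in_chunks(conn: Any, rows: Iterable[Tuple[str, str]],
                           chunk_size: int = 1000) -> List[int]:
    """Insert rows chunk by chunk on ONE connection, collecting IDs per chunk.

    Each chunk is its own statement and transaction, and its ID block is
    inferred from that chunk's own LAST_INSERT_ID(), not from one global range.
    """
    ids: List[int] = []
    cursor = conn.cursor()
    try:
        for batch in chunked(rows, chunk_size):
            placeholders = ", ".join(["(%s, %s)"] * len(batch))
            cursor.execute(
                f"INSERT INTO users (name, email) VALUES {placeholders}",
                [value for row in batch for value in row],
            )
            inserted = cursor.rowcount
            cursor.execute("SELECT LAST_INSERT_ID()")
            first_id = cursor.fetchall()[0][0]
            if inserted != len(batch):
                raise RuntimeError(f"chunk expected {len(batch)} rows, got {inserted}")
            conn.commit()
            ids.extend(range(first_id, first_id + inserted))
    except Exception:
        conn.rollback()  # undoes only the current, uncommitted chunk
        raise
    finally:
        cursor.close()
    return ids
```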
When Application-Generated IDs Are Better
If exact identity tracking matters more than numeric sequence convenience, application-generated IDs such as UUID or ULID can simplify the workflow. You know the row IDs before the insert, which eliminates the need to reconstruct them afterward.
That tradeoff is not always worth it for every table, but it is often worth considering for event data, imports, and distributed write paths.
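A minimal sketch using Python's standard uuid module (uuid4; ULID would need a third-party package and is not shown). The users table with an id BINARY(16) column and the with_app_ids helper are hypothetical. Each ID is fixed before the INSERT runs, so the row-to-ID mapping no longer depends on row counts or auto-increment ranges.

```python
import uuid
from typing import List, Sequence, Tuple


def with_app_ids(rows: Sequence[Tuple[str, str]]) -> Tuple[List[bytes], List[Tuple[bytes, str, str]]]:
    """Assign a UUID to every row BEFORE inserting it.

    Returns the IDs and the rows prefixed with their ID, ready for a multi-row
    INSERT into a hypothetical users(id BINARY(16), name, email) table.
    """
    ids = [uuid.uuid4().bytes for _ in rows]  # 16 raw bytes fit a BINARY(16) column
    return ids, [(row_id, *row) for row_id, row in zip(ids, rows)]


ids, rows_to_insert = with_app_ids([("Ada", "ada@example.com"), ("Linus", "linus@example.com")])
# ids[i] belongs to rows_to_insert[i] regardless of what the INSERT reports,
# so nothing has to be reconstructed afterwards.
placeholders = ", ".join(["(%s, %s, %s)"] * len(rows_to_insert))
sql = f"INSERT INTO users (id, name, email) VALUES {placeholders}"
params = [value for row in rows_to_insert for value in row]
```

One cost to weigh: uuid4 values are random, so as an InnoDB primary key they land at random positions in the clustered index, whereas time-ordered schemes such as ULID keep inserts closer to append order.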
Common Pitfalls
A common mistake is assuming generated ID ranges are always safe regardless of statement type. They are not. Once duplicate-key behavior enters the picture, naive range inference can be wrong.
Another issue is splitting insert and ID retrieval across different pooled connections. Generated-ID state is connection-local.
Teams also forget that bulk insert performance and exact row identity mapping are different requirements. Sometimes the fastest insert shape is not the cleanest way to recover every inserted ID.
Summary
- Use one multi-row insert for efficient bulk writes.
- Capture generated-ID metadata on the same connection that performed the insert.
- Inferring a contiguous ID range is reasonable only for plain auto-increment inserts.
- Be cautious with duplicate-key logic because it changes row and ID semantics.
- If exact identity mapping is critical, consider staging keys or application-generated IDs instead of inference.

