How to create a DataFrame of random integers with Pandas?

Introduction

The usual way to build a Pandas DataFrame filled with random integers is to generate a NumPy array first and then wrap it in pd.DataFrame. Pandas itself is excellent at labels and tabular operations, but NumPy is the right tool for fast random number generation. Once you separate those roles, the code becomes simple and predictable.

Basic Example With NumPy and Pandas

python

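import numpy as np
import pandas as pd

# Seeded generator so the output is the same on every run
rng = np.random.default_rng(42)
# 4 rows x 3 columns of integers from 0 to 9 (high is exclusive)
data = rng.integers(low=0, high=10, size=(4, 3))

df = pd.DataFrame(data, columns=["a", "b", "c"])
print(df)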

What this does:

default_rng(42) creates a reproducible random generator
integers generates random integers in the half-open interval [low, high)
size=(4, 3) creates four rows and three columns
pd.DataFrame(...) attaches tabular structure and column labels

This is the clean default for most use cases.

Choose the Range Carefully

The upper bound is exclusive. That means low=0, high=10 produces values from 0 through 9, not 10.

python

import numpy as np

rng = np.random.default_rng(1)
# high=6 is exclusive, so every value is between 1 and 5
print(rng.integers(low=1, high=6, size=10))

This matches Python's range() and slicing conventions, so it is worth remembering when you expect inclusive bounds. To include the upper bound, pass endpoint=True to integers.
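To see the two bound conventions side by side, here is a minimal sketch that draws the same range with and without the endpoint parameter of Generator.integers. The sample size of 1000 and the variable names are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Default behavior: high is exclusive, so values fall in 1..5
exclusive = rng.integers(low=1, high=6, size=1000)

# endpoint=True makes high inclusive, so values fall in 1..6 (like a die roll)
inclusive = rng.integers(low=1, high=6, size=1000, endpoint=True)

print(exclusive.min(), exclusive.max())
print(inclusive.min(), inclusive.max())
```

If you simulate something with a natural inclusive range, such as a six-sided die, endpoint=True tends to read more clearly than writing high=7.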

Add Index Labels and Custom Columns

You can make the DataFrame more realistic by assigning labels.

python

import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
data = rng.integers(100, 200, size=(3, 4))

df = pd.DataFrame(
    data,
    index=["row1", "row2", "row3"],
    columns=["q1", "q2", "q3", "q4"]
)

print(df)

This is useful for tests, demos, and synthetic datasets where the structure matters as much as the values.

Reproducibility Matters

If you want the same random integers each run, use a fixed seed. If you want different values every time, omit the explicit seed.

Reproducible:

python

rng = np.random.default_rng(123)

Non-reproducible:

python

rng = np.random.default_rng()

For notebooks, tests, and tutorials, fixed seeds are usually the right choice because they make examples stable.
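One way to check the effect of a fixed seed is to build the same DataFrame twice and compare the results. The sketch below assumes a small helper, make_frame, which is a hypothetical name introduced only for this example.

```python
import numpy as np
import pandas as pd


def make_frame(seed):
    # Hypothetical helper: builds a 3x2 frame from a freshly seeded generator
    rng = np.random.default_rng(seed)
    return pd.DataFrame(rng.integers(0, 100, size=(3, 2)), columns=["a", "b"])


first = make_frame(123)
second = make_frame(123)

# Two generators created with the same seed produce identical values
print(first.equals(second))
```

The same pattern works in a test fixture: pass the seed in as a parameter so each test can build its own generator instead of relying on shared state.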

Generate Specific Dtypes

By default, integers returns np.int64 values, which is fine for most uses, but you can request a different dtype when needed.

python

import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
# Request 16-bit integers instead of the default int64
data = rng.integers(0, 100, size=(2, 2), dtype=np.int16)

df = pd.DataFrame(data, columns=["x", "y"])
print(df.dtypes)

This matters when memory size or downstream type expectations are important.
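To make the memory difference concrete, the following sketch compares the column storage of two frames with the same shape but different dtypes. The shape (1000, 4) and the names wide and narrow are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
shape = (1000, 4)

# Default dtype is int64: 8 bytes per value
wide = pd.DataFrame(rng.integers(0, 100, size=shape))

# int16 uses 2 bytes per value
narrow = pd.DataFrame(rng.integers(0, 100, size=shape, dtype=np.int16))

# index=False counts only the column data, not the index
print(wide.memory_usage(index=False).sum())
print(narrow.memory_usage(index=False).sum())
```

Keep in mind that the requested bounds have to fit in the chosen dtype. Asking for values beyond what int16 can hold raises a ValueError rather than silently wrapping.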

A One-Liner for Quick Use

For short scripts or ad hoc notebook work, a one-liner is often enough:

python

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.default_rng(0).integers(0, 50, size=(5, 5)))
print(df)

That is concise, though slightly less readable than a step-by-step version if the dataset shape and labels are important.

Use Cases Beyond Testing

Random integer DataFrames are useful for more than toy examples:

load-testing a transformation pipeline
building quick reproducible demos
validating plotting or aggregation code
generating placeholder tabular input during development

Just remember that synthetic randomness does not guarantee realistic distributions.
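A small step toward more plausible placeholder data is to give each column its own range instead of drawing the whole table from one interval. The sketch below does that with a dictionary of columns. The column names and ranges are invented for illustration and do not come from any real dataset.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n_rows = 5

# Column names and ranges are made up for this example
df = pd.DataFrame({
    "age": rng.integers(18, 90, size=n_rows),                  # 18..89
    "items_in_cart": rng.integers(0, 20, size=n_rows),         # 0..19
    "rating": rng.integers(1, 5, size=n_rows, endpoint=True),  # 1..5 inclusive
})

print(df)
```

The values are still uniformly distributed within each range, so this only shapes the bounds. It does not reproduce the skew or correlations that real data usually has.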

Common Pitfalls

Expecting the upper bound passed to integers to be included.
Using the legacy global API (np.random.seed and np.random.randint) when a local generator from default_rng would be cleaner.
Forgetting a seed when reproducibility matters.
Generating the array first and then forgetting to add column names, making later code harder to read.
Assuming random test data is representative of production data quality.
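For the legacy-API pitfall in particular, it can help to see both styles next to each other. This sketch builds one frame with the global np.random functions and one with a local generator. The variable names legacy and modern are just labels for this comparison.

```python
import numpy as np
import pandas as pd

# Legacy style: seeds and draws from NumPy's shared global state
np.random.seed(0)
legacy = pd.DataFrame(np.random.randint(0, 10, size=(2, 3)), columns=["a", "b", "c"])

# Generator style: the state lives in a local object you can pass around
rng = np.random.default_rng(0)
modern = pd.DataFrame(rng.integers(0, 10, size=(2, 3)), columns=["a", "b", "c"])

print(legacy)
print(modern)
```

Both calls treat the upper bound as exclusive, but they use different underlying bit generators, so the same seed should not be expected to give the same numbers in the two styles.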

Summary

Generate random integers with NumPy, then wrap the result in a Pandas DataFrame.
np.random.default_rng().integers(...) is the modern NumPy API to prefer.
The high bound is exclusive.
Set a seed when you want reproducible output.
Add labels and dtypes intentionally so the DataFrame matches your real use case.