algorithms
computational geometry
nearest neighbor search
spatial data
proximity search

Algorithm for finding nearby points?

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Introduction

Finding nearby points is a classic spatial-query problem: given a query point and a distance threshold, return every point that lies within that radius. It shows up in mapping, games, clustering, collision detection, and recommendation systems. The right algorithm mostly depends on data size, dimensionality, and update frequency.

What "Nearby" Usually Means

There are two similar but different problems:

  1. Radius query: find all points within distance r of a query point.
  2. k-nearest neighbors: find the closest k points, even if the radius is not known in advance.

For points in 2D or 3D, Euclidean distance is common. For road networks or latitude/longitude data, another metric may be more appropriate.

For two points (x1, y1) and (x2, y2), the squared Euclidean distance is often enough:

(x2x1)2+(y2y1)2(x2 - x1)^2 + (y2 - y1)^2

Using the squared form avoids an unnecessary square root during comparisons.

Brute Force Is the Correct Baseline

The simplest algorithm checks every point. That is O(n) per query, which sounds slow, but for small datasets it is often the best choice because it has almost no setup cost.

python
1from math import sqrt
2
3
4def nearby_points(points, query, radius):
5    qx, qy = query
6    r2 = radius * radius
7    result = []
8
9    for px, py in points:
10        dx = px - qx
11        dy = py - qy
12        if dx * dx + dy * dy <= r2:
13            result.append((px, py))
14
15    return result
16
17
18points = [(0, 0), (1, 2), (3, 4), (10, 10)]
19print(nearby_points(points, (0, 1), 2.5))

This is the version to write first. It is easy to verify and gives you a benchmark before adding indexing complexity.

Uniform Grid or Spatial Hash

If points live in low-dimensional space and you run many radius queries, a grid is often the most practical upgrade. Divide space into square cells, assign each point to a cell, and only inspect the query cell plus nearby cells.

The key idea is simple: if the search radius is small relative to the full dataset, most cells can be ignored.

python
1from collections import defaultdict
2
3
4class GridIndex:
5    def __init__(self, cell_size):
6        self.cell_size = cell_size
7        self.cells = defaultdict(list)
8
9    def _cell(self, x, y):
10        return (int(x // self.cell_size), int(y // self.cell_size))
11
12    def insert(self, point):
13        x, y = point
14        self.cells[self._cell(x, y)].append(point)
15
16    def query_radius(self, query, radius):
17        qx, qy = query
18        r2 = radius * radius
19        cx, cy = self._cell(qx, qy)
20        reach = int(radius // self.cell_size) + 1
21        result = []
22
23        for ix in range(cx - reach, cx + reach + 1):
24            for iy in range(cy - reach, cy + reach + 1):
25                for px, py in self.cells[(ix, iy)]:
26                    dx = px - qx
27                    dy = py - qy
28                    if dx * dx + dy * dy <= r2:
29                        result.append((px, py))
30
31        return result
32
33
34index = GridIndex(cell_size=2.0)
35for point in [(0, 0), (1, 2), (3, 4), (10, 10)]:
36    index.insert(point)
37
38print(index.query_radius((0, 1), 2.5))

This works especially well for games, particle systems, and map layers where point density is reasonably uniform. It is also easy to update when points move.

KD-Trees and Similar Structures

A k-d tree recursively splits points by coordinate axis. During a query, entire branches can be skipped if their bounding region cannot intersect the search radius. That gives much better average performance than brute force for repeated queries on static or mostly static data.

In 2D or 3D, k-d trees are a strong default. As dimensionality grows, pruning becomes less effective and the tree starts behaving more like a linear scan. This is part of the broader curse of dimensionality.

If your points are rectangles or geographic bounding boxes rather than simple coordinates, R-trees are often a better fit. If you are working in a generic metric space, ball trees can be more flexible.

Which Algorithm Should You Choose?

ApproachBest whenTypical tradeoff
Brute forceSmall datasets or rare queriesNo preprocessing, but O(n) per query
Uniform gridLow-dimensional, roughly uniform dataFast and simple, but cell size matters
KD-treeMany queries on static 2D or 3D pointsGood pruning, weaker in high dimensions
R-treeSpatial objects with bounding boxesGreat for GIS-style indexing
Ball treeMetric spaces or moderately higher dimensionsMore flexible, more complex to tune

A practical rule is to start with brute force, move to a grid if updates are frequent, and move to a k-d tree if the dataset is static and queries are common. For R-trees or ball trees, an existing library is usually the right choice.

Common Pitfalls

  • Using square roots inside the hot loop: compare squared distances instead.
  • Choosing the wrong metric: Euclidean distance is not always correct for geographic coordinates or graph paths.
  • Ignoring data distribution: a grid performs poorly if almost all points land in a few crowded cells.
  • Overusing k-d trees in high dimensions: once dimensions get large, pruning quality drops and the expected speedup can disappear.
  • Skipping the baseline benchmark: many teams build an index before measuring whether O(n) is already fast enough.
  • Forgetting update costs: a structure that is fast to query may be expensive to rebuild after inserts or deletes.

Summary

  • Nearby-point search usually means a radius query, not necessarily k-nearest neighbors.
  • Brute force is the right first implementation because it is simple and correct.
  • Uniform grids are excellent for low-dimensional data with frequent updates.
  • KD-trees are a strong choice for repeated queries on mostly static 2D or 3D point sets.
  • R-trees and ball trees are better fits for more specialized spatial or metric-search workloads.
  • The best algorithm depends on query volume, dimensionality, update frequency, and data distribution.

Course illustration
Course illustration

All Rights Reserved.