algorithms
data structures
arrays
duplicates removal
coding efficiency

Algorithm efficient way to remove duplicate integers from an array

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Introduction

Removing duplicate integers from an array is a classic problem with several approaches at different time/space trade-offs. The most common solutions are using a hash set (O(n) time, O(n) space, preserves order), sorting then deduplicating (O(n log n) time, O(1) extra space), and the in-place two-pointer technique for sorted arrays. The right choice depends on whether order must be preserved, whether the array is already sorted, and whether extra memory is acceptable.

Hash Set Approach (Fastest)

Use a set to track seen values and build a new array of unique elements:

python
1def remove_duplicates_hashset(arr):
2    seen = set()
3    result = []
4    for x in arr:
5        if x not in seen:
6            seen.add(x)
7            result.append(x)
8    return result
9
10print(remove_duplicates_hashset([4, 2, 7, 2, 4, 1, 7]))
11# [4, 2, 7, 1]  — order preserved
  • Time: O(n) — one pass through the array
  • Space: O(n) — the set stores up to n elements
  • Preserves order: Yes

Sort + Two-Pointer (In-Place, Sorted)

If the array is already sorted (or can be sorted), remove duplicates in place:

python
1def remove_duplicates_sorted(arr):
2    if not arr:
3        return arr
4    arr.sort()
5    write = 1
6    for read in range(1, len(arr)):
7        if arr[read] != arr[write - 1]:
8            arr[write] = arr[read]
9            write += 1
10    del arr[write:]
11    return arr
12
13print(remove_duplicates_sorted([4, 2, 7, 2, 4, 1, 7]))
14# [1, 2, 4, 7]  — sorted, duplicates removed
  • Time: O(n log n) for sorting + O(n) for dedup = O(n log n)
  • Space: O(1) extra (in-place)
  • Preserves order: No (sorted order)

Language Built-Ins

python
1# Python: set (unordered)
2unique = list(set([4, 2, 7, 2, 4, 1, 7]))  # Order not guaranteed
3
4# Python: dict.fromkeys (preserves insertion order, Python 3.7+)
5unique = list(dict.fromkeys([4, 2, 7, 2, 4, 1, 7]))  # [4, 2, 7, 1]
java
1// Java: LinkedHashSet (preserves order)
2int[] arr = {4, 2, 7, 2, 4, 1, 7};
3Set<Integer> set = new LinkedHashSet<>();
4for (int x : arr) set.add(x);
5int[] result = set.stream().mapToInt(Integer::intValue).toArray();
6// [4, 2, 7, 1]
7
8// Java: Arrays.stream().distinct()
9int[] unique = Arrays.stream(arr).distinct().toArray();
10// [4, 2, 7, 1]
javascript
// JavaScript: Set
const arr = [4, 2, 7, 2, 4, 1, 7];
const unique = [...new Set(arr)];  // [4, 2, 7, 1]
csharp
// C#: LINQ Distinct
int[] arr = { 4, 2, 7, 2, 4, 1, 7 };
int[] unique = arr.Distinct().ToArray();  // [4, 2, 7, 1]

Two-Pointer for LeetCode-Style In-Place

The classic interview variant modifies a sorted array in place and returns the new length:

python
1def remove_duplicates_inplace(nums):
2    """LeetCode 26: Remove Duplicates from Sorted Array"""
3    if not nums:
4        return 0
5    k = 1  # write pointer
6    for i in range(1, len(nums)):
7        if nums[i] != nums[k - 1]:
8            nums[k] = nums[i]
9            k += 1
10    return k
11
12nums = [1, 1, 2, 2, 3, 4, 4]
13k = remove_duplicates_inplace(nums)
14print(nums[:k])  # [1, 2, 3, 4]

Bit Manipulation (Fixed Range)

When values are within a known range, use a bit array:

python
1def remove_duplicates_bitmap(arr, max_val=1000):
2    seen = bytearray((max_val + 8) // 8)
3    result = []
4    for x in arr:
5        byte_idx, bit_idx = x >> 3, x & 7
6        if not (seen[byte_idx] & (1 << bit_idx)):
7            seen[byte_idx] |= (1 << bit_idx)
8            result.append(x)
9    return result
10
11print(remove_duplicates_bitmap([4, 2, 7, 2, 4, 1, 7]))
12# [4, 2, 7, 1]

Uses only (max_val / 8) bytes instead of storing full integer objects in a set.

Comparison Table

MethodTimeSpacePreserves OrderSorted Input Required
Hash SetO(n)O(n)YesNo
Sort + Two-PointerO(n log n)O(1)No (sorted)No (sorts it)
Two-Pointer (pre-sorted)O(n)O(1)Yes (sorted)Yes
Bit ArrayO(n)O(max_val)YesNo
dict.fromkeys (Python)O(n)O(n)YesNo

Common Pitfalls

  • Using set() and expecting preserved order: set() in Python does not guarantee insertion order. Use dict.fromkeys() or iterate with a seen set and build the result list manually to preserve order.
  • Applying the two-pointer technique to an unsorted array: The two-pointer in-place approach only works on sorted arrays. On unsorted input it skips non-adjacent duplicates, producing incorrect results. Sort first or use a hash set.
  • Ignoring negative integers in bitmap approach: A bit array indexed by value does not handle negative integers directly. Either offset values (x + offset) or use a hash set for arrays containing negative numbers.
  • Modifying the array while iterating by index: Removing elements from an array during forward iteration shifts indices and causes skipped elements. Use a write pointer instead, or build a new array.
  • Choosing O(n log n) sort when O(n) hash set is available: The sort approach uses less memory (O(1) extra space), but if memory is not constrained and order matters, the hash set approach is faster and simpler. Choose based on the actual constraints.

Summary

  • Use a hash set for O(n) time with order preservation — the most common and practical approach
  • Use sort + two-pointer for O(1) extra space when order does not matter
  • Use the two-pointer in-place technique for already-sorted arrays (classic interview problem)
  • Language built-ins like set(), Distinct(), new Set() handle common cases concisely
  • Bit arrays work for non-negative integers in a known range with minimal memory
  • Always clarify whether order must be preserved and whether the input is sorted before choosing an approach

Course illustration
Course illustration

All Rights Reserved.