Interview Question Find Median From Mega Number Of Integers
Master System Design with Codemia
Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.
Introduction
The phrase "mega number of integers" can mean two different interview problems: either a running median over a stream, or an exact median of data so large it does not fit in memory. The best answer depends on which version the interviewer actually means, so the first step is to clarify the model before jumping into code.
Case 1: Streaming Median with Two Heaps
If numbers arrive one by one and you want to report the median at any time, the standard answer is two heaps:
- a max-heap for the lower half
- a min-heap for the upper half
The heaps are balanced so that:
- their sizes differ by at most one
- every value in the left heap is less than or equal to every value in the right heap
Then the median is:
- the top of the larger heap if the count is odd
- the average of both tops if the count is even
Insertion is O(log n), and median lookup is O(1).
Case 2: Exact Median of Data Too Large for Memory
If the dataset is massive and stored on disk, the interview may be testing whether you notice that the two-heap approach still requires storing everything.
Then the question changes. Possible exact strategies include:
- external sorting
- repeated partitioning with disk-based passes
- counting by value range if the integer range is small and known
The most general exact answer is external sort:
- sort chunks that fit in memory
- write sorted runs to disk
- merge the runs
- stop when you reach the middle position
This is slower than in-memory heaps, but it is scalable and exact.
When the Integer Range Is Small
If the integers are known to lie in a limited range, such as 0 to 1_000_000, a counting approach can be better than sorting.
You count frequencies by value:
This is effectively linear in the number of items plus the value range. It is a strong answer only when the range is truly bounded and not absurdly large.
What Interviewers Usually Want You to Notice
The best candidates usually say something like:
- "If this is a streaming median, I would use two heaps."
- "If the data does not fit in memory and I need the exact final median, I need an external-memory strategy."
- "If the value range is bounded, counting may beat general sorting."
That shows you are modeling the constraints instead of memorizing one trick.
Approximate Answers Are a Different Problem
Sometimes the interviewer will allow approximation for truly enormous data. Then quantile sketches and streaming summaries become relevant. But unless the interviewer explicitly relaxes the requirement, "median" usually means exact median.
That is why you should not jump to approximate methods without confirming the requirement first.
Space Complexity Matters More Than Usual
Interview problems about huge inputs are rarely about just time complexity. They are usually testing whether you think about:
- memory limits
- sequential disk access
- number of passes over the data
- whether exactness is required
So a correct but memory-hungry O(n log n) answer may still miss the real point of the question.
Common Pitfalls
- Giving the two-heaps answer without checking whether the full dataset can fit in memory.
- Assuming streaming median and external exact median are the same problem.
- Forgetting that bounded integer ranges can change the optimal solution completely.
- Offering approximate quantiles when the question asks for the exact median.
- Focusing only on time complexity and ignoring storage constraints.
Summary
- For a running median on a stream, the standard answer is two heaps.
- For an exact median of data larger than memory, use an external-memory strategy such as external sort.
- If the integer range is small, counting can be even better.
- The right solution depends on whether the problem is streaming, disk-based, bounded-range, exact, or approximate.
- In interviews, clarifying the constraints is usually part of the expected answer.

