Arrays and String Fundamentals

DSA Fundamentals

Arrays and String Fundamentals

Topics Covered

Array Traversal Patterns

Forward Traversal and When It Falls Short

Backward Traversal - Why Iterate from the End

The Two-Pass Pattern

When to Use Each Pattern

Prefix Sums

Building a Prefix Sum Array

O(1) Range Sum Queries

Application: Subarray Sum Equals K

Prefix Products - Product of Array Except Self

Running Averages

The Difference Array - The Inverse of Prefix Sums

Prefix Sums in Two Dimensions

When Prefix Sums Are Not Enough

Frequency Counting

Arrays vs Hash Maps - Choose the Right Counter

Anagram Detection with Frequency Counting

In-Place Frequency Marking - The Index-as-Value Trick

Finding Missing Numbers

Frequency Counting as a Building Block

String Manipulation Techniques

The O(n^2) Concatenation Trap

In-Place Array Modification - Overwrite from the End

Swap-Based Partitioning

Character-Level vs String-Level Operations

Putting It Together - Reverse Words In-Place

String Compression - A Common Interview Pattern

Key Takeaways

Practice

Arrays are the most fundamental data structure you will encounter, yet the way you traverse them determines whether your solution is elegant or clumsy. Before reaching for hash maps, trees, or graphs, ask yourself: can I solve this with a smarter traversal?

The answer is surprisingly often yes.

Forward Traversal and When It Falls Short

Forward traversal (index 0 to n-1) is the default. You read elements left to right, accumulate a result, and return it. This works for straightforward aggregation: summing elements, finding maximums, building output arrays.

But forward traversal breaks down when modifying the array in-place while iterating. Consider removing all zeroes from an array. If you iterate forward and delete elements, your indices shift - you either skip elements or need complex bookkeeping.

There is, however, a forward technique that handles in-place modification elegantly: the read-write pointer pattern. One pointer reads through the array while another writes only the elements you want to keep.

The write pointer is always at or behind the read pointer, so you never overwrite data you have not yet examined. This is O(n) time and O(1) space. The same pattern works for removing specific values, compacting sparse arrays, and filtering elements by any predicate.

Backward Traversal - Why Iterate from the End

Iterating from the end solves a different class of problem: when the output is longer than the input and you must work in the same buffer. When you remove or overwrite elements from back to front, earlier indices remain valid because you have already processed everything to the right.

A classic example is merging two sorted arrays where the first array has trailing space. Writing from the end ensures no unprocessed data is overwritten.

The reason this works is that writing from the back never overwrites data you still need - the write pointer is always at or ahead of both read pointers.

Interview Tip

When you see an in-place modification problem where the output array is the same as the input, ask yourself: would writing from the end avoid overwriting unprocessed data? This single question unlocks merge-sorted-array, move-zeroes, and remove-duplicates problems.

The Two-Pass Pattern

Many problems seem to require extra data structures but actually need two passes over the same array. The first pass collects metadata - counts, positions, cumulative values. The second pass uses that metadata to produce the final result.

Two-pass pattern

First pass collects data into a map scanning left to right. Second pass uses the map to produce the result.

Example: Product of Array Except Self. The brute force approach multiplies all elements except the current one - O(n^2). The two-pass approach is O(n):

Pass 1 (left to right): Build an array of prefix products. left[i] is the product of all elements before index i.
Pass 2 (right to left): Multiply each prefix product by the running suffix product (product of all elements after index i).

Why does this work without division? Each element in the result is the product of everything to its left (from pass 1) multiplied by everything to its right (from pass 2). The current element is never included in either product.

Step through the prefix and suffix product passes interactively:

Loading animator...

Example: First Missing Positive. Find the smallest positive integer not present in an unsorted array. The two-pass approach uses the array itself as a hash table:

Pass 1: Place each value v in position v-1 (if v is in range [1, n]). This is a cyclic sort - keep swapping until every in-range value is at its "home" index.
Pass 2: Scan left to right. The first index i where nums[i] != i + 1 means i+1 is missing.

The first pass is O(n) despite the nested while loop - each element moves to its home position at most once, so the total swaps across all iterations is at most n. The second pass is a simple scan. This pattern of "put things where they belong, then find what is missing" appears in many problems: find the duplicate, find all missing numbers, find the first missing positive.

When to Use Each Pattern

Pattern	Use When	Examples
Forward	Simple aggregation, no in-place modification	Sum, max, filter to new array
Backward	In-place modification where forward would overwrite needed data	Merge sorted arrays, fill from end
Two-pass	Result at position i depends on global information	Product except self, next greater element, first missing positive

The key insight is that traversal direction is a design choice, not a default. Choosing the right direction often eliminates the need for extra space or complex logic.

Prefix Sums

Imagine you are building a dashboard that shows the total revenue for any date range a user selects. The naive approach sums the daily revenue from day L to day R on every query - O(n) per query. If users make 10,000 queries per second across a dataset of 365 days, that is 3.65 million additions per second. Prefix sums reduce every query to a single subtraction - O(1) per query, regardless of the range size.

This is the core idea: precompute cumulative sums once, then answer range queries instantly.

Building a Prefix Sum Array

Given an array nums of length n, the prefix sum array prefix has length n+1, where prefix[i] is the sum of all elements from index 0 to i-1.

Prefix sum construction and range query

Prefix sums are built by accumulating values left to right. A range query is answered with a single subtraction.

Why length n+1? Because prefix[0] = 0 handles the edge case where the range starts at index 0. Without it, you would need a special case for left boundary queries.

O(1) Range Sum Queries

To get the sum of elements from index L to R (inclusive):

This works because prefix[R+1] contains the sum of elements 0 through R, and prefix[L] contains the sum of elements 0 through L-1. Subtracting cancels everything before L, leaving exactly the sum from L to R.

The O(n) preprocessing cost pays for itself after just two queries. For k queries on an array of size n, brute force is O(k * n) while prefix sums give O(n + k).

Application: Subarray Sum Equals K

One of the most powerful prefix sum applications is finding subarrays that sum to a target value K. The insight: if prefix[j] - prefix[i] = K, then the subarray from index i to j-1 sums to K.

Why does this work? At each position j, you compute the running prefix sum. If any earlier prefix sum equals current_sum - k, then the subarray between that earlier position and the current position sums to exactly K. The hash map tracks how many times each prefix sum has appeared, handling duplicates correctly.

Step through how a hash map finds a two-sum pair in a single pass:

Loading animator...

Interview Tip

The subarray-sum-equals-K pattern combines prefix sums with a hash map. Whenever you see a problem asking about contiguous subarrays with a target sum, this combination should be your first instinct. It reduces O(n^2) brute force to O(n).

Prefix Products - Product of Array Except Self

The prefix sum idea generalizes to any associative operation. For products, build a prefix product array where prefix[i] is the product of elements 0 through i-1. Combined with a suffix product, you can compute "product except self" in O(n) without division.

Running Averages

Prefix sums also give you O(1) running averages for any window. The average of elements from L to R is:

This is useful in time-series analysis, moving average calculations, and any context where you need the mean of a sliding window without recomputing the sum from scratch each time.

The Difference Array - The Inverse of Prefix Sums

A difference array is the inverse operation: given an array, build an array of differences between consecutive elements. More practically, difference arrays let you apply range updates in O(1). If you need to add a value v to every element from index L to R, instead of updating R-L+1 elements, you mark diff[L] += v and diff[R+1] -= v. After all updates, compute the prefix sum of the difference array to get the final values.

This is invaluable for problems involving multiple range increments - booking systems (add 1 to each time slot that is booked), sweep line algorithms, and batch processing scenarios where you need to apply thousands of range updates before reading the final state.

Prefix Sums in Two Dimensions

The same technique extends to 2D matrices. Given an m x n grid, a 2D prefix sum lets you compute the sum of any rectangular subregion in O(1). The preprocessing builds a matrix where prefix[i][j] stores the sum of all elements in the rectangle from (0,0) to (i-1, j-1).

The inclusion-exclusion formula handles overlap: add the two adjacent rectangles, subtract the double-counted corner. This pattern appears in image processing (summed area tables), game development (terrain queries), and any problem involving rectangular range sums.

When Prefix Sums Are Not Enough

Prefix sums handle static arrays well, but if the array changes frequently (element updates between queries), each update invalidates the prefix sum array and requires O(n) reconstruction. For dynamic arrays with both updates and range queries, you need a Fenwick tree (Binary Indexed Tree) or segment tree, which handle both operations in O(log n). Prefix sums remain the right choice when the array is built once and queried many times.

Frequency Counting

Frequency counting is the technique of tallying how often each element appears. It sounds trivial - and the basic version is - but the choice of data structure and the way you apply counting determines whether your solution runs in O(n) or O(n^2), and whether it uses O(1) or O(n) extra space.

Arrays vs Hash Maps - Choose the Right Counter

When the value range is bounded and small (like lowercase English letters, digits 0-9, or ASCII characters), use a fixed-size array as your counter. Array indexing is O(1) with no hashing overhead, no collision handling, and perfect cache locality.

When the value range is unbounded or sparse (arbitrary integers, strings, objects), use a hash map. The overhead of hashing is worth it because an array would need to be impossibly large.

Frequency counter

A hash map is populated character by character, incrementing the count for each character encountered.

The rule is simple: if you can index directly into a small array, do so. Otherwise, use a hash map.

Performance difference in practice. For a string of 1 million characters, an array-based counter completes in roughly 1ms on modern hardware. A hash-map-based counter takes 5-10ms due to hashing overhead, memory allocation for each bucket, and cache-unfriendly memory access patterns. The difference is small in absolute terms but becomes significant when counting is called millions of times (inside a sliding window, for example). When the value range is bounded, always prefer the array.

Anagram Detection with Frequency Counting

Two strings are anagrams if they contain exactly the same characters with the same frequencies. Sorting both strings and comparing is O(n log n). Frequency counting does it in O(n).

The trick is using a single counter array: increment for characters in s, decrement for characters in t. If all counts are zero at the end, the strings are anagrams. This avoids building two separate frequency maps and comparing them.

In-Place Frequency Marking - The Index-as-Value Trick

When the problem says "find the missing number" or "find duplicates" in an array where values are in the range [1, n], you can use the array itself as a frequency counter. The idea: use the sign of nums[i] to mark whether value i has been seen.

Why does this work? Each value v maps to index v-1. When you first encounter v, you negate nums[v-1]. If you encounter v again, nums[v-1] is already negative - indicating a duplicate. This technique uses O(1) extra space by encoding frequency information into the sign bits of the existing array.

Watch how the algorithm detects a duplicate element using a hash set:

Loading animator...

A word of caution: this technique modifies the input array. If the caller needs the original data intact, you must either restore the array afterward (by taking absolute values of all elements) or use a hash set instead. In interviews, always ask whether modifying the input is allowed before using this trick.

Interview Tip

The in-place marking trick only works when values are in range [1, n] for an array of size n. Before applying it, verify this constraint. If values can be 0, negative, or exceed n, you need a hash set instead.

Finding Missing Numbers

The same sign-marking technique finds missing numbers. After marking all present values, any index whose value is still positive corresponds to a missing number.

Frequency Counting as a Building Block

Frequency counting is rarely the entire solution - it is a building block. Count first, then use the counts:

Group anagrams: Count each string's character frequencies, use the frequency tuple as a hash map key to group strings.
Top K frequent elements: Count frequencies, then use a heap or quickselect to find the K highest counts.
First unique character: Count frequencies in pass 1, scan for the first character with count 1 in pass 2.
Minimum window substring: Use two frequency arrays - one for the target pattern, one for the current window - and compare counts to check if the window contains all target characters.

Group Anagrams in Detail. This problem illustrates how frequency counting becomes a hashing key. Given a list of strings, group all anagrams together. The insight: two strings are anagrams if and only if their sorted character sequences are identical - or equivalently, their frequency distributions are identical.

The sorted-key approach is O(n * k log k) where k is the max string length. The frequency-key approach is O(n * k) - faster when strings are long because building a 26-element frequency array is O(k) while sorting is O(k log k). Both use O(n * k) space for the hash map.

String Manipulation Techniques

Strings are arrays of characters, so every array technique applies. But strings have a performance trap that catches many programmers: concatenation in a loop creates a new string on every iteration, turning an O(n) algorithm into O(n^2).

Understanding why this happens - and how to avoid it - is the difference between code that passes all test cases and code that times out on large inputs.

The O(n^2) Concatenation Trap

In most languages, strings are immutable. When you write result += char, the runtime allocates a new string of length len(result) + 1, copies all existing characters, appends the new one, and discards the old string. For n characters, the total work is 1 + 2 + 3 + ... + n = O(n^2).

In Java, use StringBuilder. In C++, use std::string::reserve() or std::ostringstream. In JavaScript, collect parts in an array and join(). The principle is the same across languages: accumulate in a mutable buffer, convert to a string once at the end. Use the language tabs above to see the idiomatic approach for each language.

Why does StringBuilder achieve O(1) amortized appends? It maintains an internal character array that starts at a default capacity (16 in Java). When the array is full, it allocates a new array of double the size and copies the old data. The doubling strategy means each character is copied at most O(log n) times across all resizes, amortizing to O(1) per append. Pre-allocating with new StringBuilder(expectedLength) avoids all resizes entirely.

In-place modification from the back

Read and write pointers move right to left. Writing from the end avoids overwriting unprocessed data.

In-Place Array Modification - Overwrite from the End

When the output is longer than the input and you must work in-place, write from the end to avoid overwriting unprocessed data. The classic example is URL encoding: replacing each space with "%20" (1 character becomes 3).

Why from the end? Because the expanded string is longer than the original. Writing from the front would overwrite characters you have not processed yet. Writing from the end guarantees the write pointer never catches the read pointer.

Swap-Based Partitioning

Some string/array problems require rearranging elements into groups - all vowels before consonants, all lowercase before uppercase, all negatives before positives. Swap-based partitioning does this in O(n) time and O(1) space.

This is the same partitioning logic used in quicksort. The left pointer finds elements that belong on the right side, the right pointer finds elements that belong on the left side, and they swap. The pointers converge toward the middle, and when they meet, every element is on its correct side.

Three-way partitioning extends this to three groups. The Dutch National Flag problem (sort an array of 0s, 1s, and 2s) uses three pointers: low tracks the boundary of 0s, mid scans through the array, and high tracks the boundary of 2s. Elements are swapped into their correct region as mid advances. This is O(n) time with a single pass and O(1) space - no counting pass needed.

Character-Level vs String-Level Operations

A subtle distinction that affects both performance and correctness: operating on individual characters versus operating on whole strings.

Character-level operations iterate through characters one at a time. Use them when the problem involves individual character properties: counting, replacing, validating character types, or building new strings character by character.

String-level operations use built-in methods that process the entire string or substrings. Use them when the problem involves splitting, joining, reversing words, or pattern matching - tasks where built-in methods are both faster (implemented in C/low-level code) and clearer.

The character-level version of reverse-words requires splitting on spaces manually, reversing each word, then reversing the whole string. The string-level version does the same thing in one line. Know both approaches: interviews sometimes restrict you to character-level operations ("do not use split"), and sometimes the string-level approach is the expected answer.

Interview Tip

When an interview problem says 'do it in-place' for strings, they usually mean treat the string as a character array. In Python, convert to a list of characters first, modify in place, then join. In Java, use a char[] array. In C++, std::string is already mutable. The 'in-place' constraint is really about O(1) extra space, not the immutability of the string type.

Putting It Together - Reverse Words In-Place

Reversing the words in a string in-place combines three techniques: reversal, two-pointer traversal, and character-level manipulation.

Reverse the entire character array.
Reverse each individual word (delimited by spaces).
Clean up extra spaces if needed.

Step 1 puts the words in the right order but each word is backwards. Step 2 fixes each word individually. Total time: O(n) with two passes. Extra space: O(1) - just index variables.

String Compression - A Common Interview Pattern

String compression combines traversal, counting, and string building. The task: compress "aabcccccaaa" into "a2b1c5a3". Return the original if the compressed version is not shorter.

This problem tests three things at once: traversal (scanning consecutive characters), counting (tracking run lengths), and efficient string building (using a list instead of concatenation). The early-return optimization - checking if compression would actually shorten the string before building it - is a nice touch that avoids unnecessary work. You can compute the compressed length in a first pass without building the string, then decide whether to proceed.

Key Takeaways

String and array manipulation problems share a common structure: identify the right traversal pattern, choose the right data structure for intermediate results, and avoid hidden quadratic costs. The techniques in this section - buffer-based building, backward writing, swap partitioning, and the two-reversal trick - appear repeatedly across interview problems. Master the patterns and the specific problems become straightforward applications.

Practice

Loading problem...

Done

Course

DSA Fundamentals

Arrays and Hashing

Two Pointers and Sliding Window

Stacks and Queues

Linked Lists

Trees and Graphs

Dynamic Programming

Advanced Data Structures

Advanced Patterns