LeetCode Patterns - 29 Rolling Hash

Post by ailswan June. 02, 2026

中文 ↓

🎯 Rolling Hash

1️⃣ Core Framework

When discussing Rolling Hash, I frame it as:

  1. What rolling hash solves
  2. Hashing a string
  3. Polynomial rolling hash
  4. Prefix hash
  5. Substring hash in O(1)
  6. Sliding window hash
  7. Rabin-Karp string matching
  8. Duplicate substring problems
  9. Collision handling
  10. Senior-level explanation

2️⃣ What Problem Are We Solving?

Rolling hash is used to compare substrings efficiently.

It is commonly used for:

The core idea is:

Instead of comparing substrings character by character,
we convert substrings into numeric hash values.

If two substring hashes are different:

the substrings are definitely different

If two substring hashes are the same:

the substrings are probably equal

because hash collisions are possible.


👉 Interview Answer

Rolling hash is a technique for computing hash values of substrings efficiently.

After preprocessing prefix hashes, we can get the hash of any substring in O(1).

This is useful when the problem requires comparing many substrings, finding duplicate substrings, or doing Rabin-Karp style pattern matching.

Since hash collisions are possible, we may need double hashing or final string verification.


3️⃣ Why Not Compare Strings Directly?


Direct Comparison

If we compare two substrings directly:

s[i:i+k] == s[j:j+k]

This can take:

O(k)

time.


Problem

If we need to compare many substrings, this becomes expensive.

Example:

n substrings
each length k

Direct comparison can become:

O(n * k)

or worse.


Rolling Hash Improvement

Rolling hash allows substring comparison by hash:

hash(i, i+k) == hash(j, j+k)

After preprocessing:

O(1) per substring hash

👉 Interview Answer

Direct substring comparison can be expensive because comparing length-k substrings takes O(k).

Rolling hash preprocesses the string so that each substring hash can be computed in O(1).

This is useful when we need many substring comparisons.


4️⃣ Polynomial Rolling Hash


Basic Idea

Represent a string as a number.

For string:

"abc"

Using base B:

hash = a * B^2 + b * B^1 + c * B^0

If we map:

a = 1
b = 2
c = 3

Then:

hash("abc") = 1 * B^2 + 2 * B + 3

With Mod

To keep numbers small, use modulo:

hash = hash % MOD

Common large prime mod:

10^9 + 7
10^9 + 9

👉 Interview Answer

Polynomial rolling hash treats a string like a number in base B.

Each character contributes based on its position and a power of the base.

We usually take modulo by a large prime to keep values bounded.


5️⃣ Character Mapping


Common Mapping

For lowercase letters:

value = ord(ch) - ord("a") + 1

So:

a -> 1
b -> 2
c -> 3
...
z -> 26

Why Not Use 0 for ‘a’?

If:

a -> 0

then some strings may lose information at the front.

Example:

"a"
"aa"
"aaa"

could become problematic depending on hash formula.

Using 1-based values is safer.


General Characters

For arbitrary ASCII:

value = ord(ch)

👉 Interview Answer

I usually map characters to positive integers.

For lowercase letters, a maps to 1, b maps to 2, and so on.

Using positive values avoids some edge cases where leading zero-valued characters create ambiguous hashes.


6️⃣ Prefix Hash


Goal

We want to compute substring hash quickly.

Define:

prefix[i] = hash of s[0:i]

That means:

prefix[0] = hash of empty string
prefix[1] = hash of s[0]
prefix[2] = hash of s[0:2]

Formula

For each character:

prefix[i + 1] = prefix[i] * BASE + value(s[i])

Modulo:

prefix[i + 1] %= MOD

Code

def build_prefix_hash(s, base, mod):
    n = len(s)
    prefix = [0] * (n + 1)

    for i, ch in enumerate(s):
        value = ord(ch) - ord("a") + 1
        prefix[i + 1] = (prefix[i] * base + value) % mod

    return prefix

👉 Interview Answer

Prefix hash stores the hash of every prefix of the string.

The recurrence is: prefix[i + 1] = prefix[i] * base + value(s[i]).

With prefix hashes, we can derive substring hashes efficiently.


7️⃣ Powers of Base


Why Need Powers?

To remove the left prefix when computing substring hash, we need:

BASE^length

So we precompute powers:

power[i] = BASE^i % MOD

Code

def build_powers(n, base, mod):
    power = [1] * (n + 1)

    for i in range(1, n + 1):
        power[i] = (power[i - 1] * base) % mod

    return power

👉 Interview Answer

To compute a substring hash from prefix hashes, I need powers of the base.

Specifically, I need base^length to subtract the contribution of the prefix before the substring.

So I precompute powers up to n.


8️⃣ Substring Hash in O(1)


Formula

For substring:

s[l:r]

where r is exclusive.

The hash is:

hash = prefix[r] - prefix[l] * power[r - l]

With modulo:

hash = (prefix[r] - prefix[l] * power[r - l]) % MOD

Code

def get_hash(prefix, power, left, right, mod):
    return (prefix[right] - prefix[left] * power[right - left]) % mod

Why It Works

prefix[r] contains:

s[0:r]

prefix[l] * BASE^(r-l) aligns the hash of s[0:l] to remove it from the front.

The remaining part is:

s[l:r]

👉 Interview Answer

To get the hash of s[l:r], I subtract the hash of the prefix before l, shifted by the substring length.

The formula is: prefix[r] - prefix[l] * power[r - l].

Taking modulo gives an O(1) substring hash.


9️⃣ Rolling Window Hash


Sliding Window Problem

Suppose we want hashes of all substrings of length k.

Naively, we could compute each substring hash from scratch.

Rolling hash lets us update from one window to the next.


Window Example

From:

s[i:i+k]

to:

s[i+1:i+k+1]

We remove left character and add right character.


Formula Idea

new_hash = old_hash - left_char * BASE^(k-1)
new_hash = new_hash * BASE
new_hash = new_hash + right_char

With modulo at every step.


👉 Interview Answer

Rolling window hash updates a fixed-length substring hash as the window moves.

It removes the outgoing character, shifts the remaining hash by multiplying by the base, and adds the incoming character.

This allows all length-k window hashes to be computed efficiently.


🔟 Rolling Window Hash Code


Code

def rolling_window_hashes(s, k):
    if k > len(s):
        return []

    base = 26
    mod = 10**9 + 7

    def value(ch):
        return ord(ch) - ord("a") + 1

    hashes = []
    current = 0

    for i in range(k):
        current = (current * base + value(s[i])) % mod

    hashes.append(current)

    highest_power = pow(base, k - 1, mod)

    for i in range(k, len(s)):
        left_value = value(s[i - k])
        right_value = value(s[i])

        current = (current - left_value * highest_power) % mod
        current = (current * base + right_value) % mod

        hashes.append(current)

    return hashes

Note

Prefix hash is often simpler and less error-prone.

Rolling window update is useful when processing a stream.


👉 Interview Answer

Fixed-length rolling hash can update the hash in O(1) per window.

However, in many interview problems, prefix hash is cleaner because any substring hash can be queried directly.

Rolling update is especially useful for streaming or fixed-window problems.


1️⃣1️⃣ Rabin-Karp String Matching


Problem

Find whether pattern appears in text.


Idea

Compute:

pattern hash
hash of every length-m substring in text

If hashes match, then verify the substring to avoid collision.


Code

def rabin_karp(text, pattern):
    if pattern == "":
        return 0

    n = len(text)
    m = len(pattern)

    if m > n:
        return -1

    base = 256
    mod = 10**9 + 7

    def value(ch):
        return ord(ch)

    pattern_hash = 0
    window_hash = 0

    for i in range(m):
        pattern_hash = (pattern_hash * base + value(pattern[i])) % mod
        window_hash = (window_hash * base + value(text[i])) % mod

    highest_power = pow(base, m - 1, mod)

    for i in range(n - m + 1):
        if pattern_hash == window_hash:
            if text[i:i + m] == pattern:
                return i

        if i < n - m:
            left = value(text[i])
            right = value(text[i + m])

            window_hash = (window_hash - left * highest_power) % mod
            window_hash = (window_hash * base + right) % mod

    return -1

👉 Interview Answer

Rabin-Karp uses rolling hash for string matching.

It compares the pattern hash with each window hash in the text.

If the hash matches, we verify the actual substring because collisions are possible.

Average performance is efficient, but collision handling is important.


1️⃣2️⃣ Prefix Hash Template


Full Template

class RollingHash:
    def __init__(self, s, base=911382323, mod=10**9 + 7):
        self.s = s
        self.base = base
        self.mod = mod

        n = len(s)
        self.prefix = [0] * (n + 1)
        self.power = [1] * (n + 1)

        for i, ch in enumerate(s):
            value = ord(ch)
            self.prefix[i + 1] = (self.prefix[i] * base + value) % mod
            self.power[i + 1] = (self.power[i] * base) % mod

    def get_hash(self, left, right):
        return (
            self.prefix[right]
            - self.prefix[left] * self.power[right - left]
        ) % self.mod

Usage

rh = RollingHash("abcdef")

hash_abc = rh.get_hash(0, 3)
hash_bcd = rh.get_hash(1, 4)

👉 Interview Answer

I often wrap rolling hash logic in a class.

The class precomputes prefix hashes and base powers.

Then get_hash(left, right) returns the hash of any substring in O(1).


1️⃣3️⃣ Double Hashing


Why Double Hash?

One hash can collide.

Two different substrings may have the same hash.

To reduce collision probability, use two different mod values.


Code

class DoubleRollingHash:
    def __init__(self, s):
        self.base = 911382323
        self.mod1 = 10**9 + 7
        self.mod2 = 10**9 + 9

        n = len(s)

        self.prefix1 = [0] * (n + 1)
        self.prefix2 = [0] * (n + 1)
        self.power1 = [1] * (n + 1)
        self.power2 = [1] * (n + 1)

        for i, ch in enumerate(s):
            value = ord(ch)

            self.prefix1[i + 1] = (self.prefix1[i] * self.base + value) % self.mod1
            self.prefix2[i + 1] = (self.prefix2[i] * self.base + value) % self.mod2

            self.power1[i + 1] = (self.power1[i] * self.base) % self.mod1
            self.power2[i + 1] = (self.power2[i] * self.base) % self.mod2

    def get_hash(self, left, right):
        h1 = (
            self.prefix1[right]
            - self.prefix1[left] * self.power1[right - left]
        ) % self.mod1

        h2 = (
            self.prefix2[right]
            - self.prefix2[left] * self.power2[right - left]
        ) % self.mod2

        return (h1, h2)

👉 Interview Answer

Since hash collisions are possible, double hashing reduces the collision probability significantly.

It computes two independent hashes using two different mod values.

In practice, comparing a pair of hashes is much safer than comparing one hash.


1️⃣4️⃣ Repeated DNA Sequences


Problem

Find all 10-letter-long sequences that occur more than once in a DNA string.

Example:

s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"

Output:

["AAAAACCCCC", "CCCCCAAAAA"]

Rolling Hash Idea

For every window of length 10:

compute hash
store in seen set
if hash seen again -> repeated

Code

def find_repeated_dna_sequences(s):
    k = 10

    if len(s) < k:
        return []

    seen = set()
    repeated = set()

    for i in range(len(s) - k + 1):
        seq = s[i:i + k]

        if seq in seen:
            repeated.add(seq)
        else:
            seen.add(seq)

    return list(repeated)

Interview Note

For DNA, because only four characters exist:

A, C, G, T

we can also use bit encoding.

But rolling hash is a natural general string solution.


👉 Interview Answer

Repeated DNA Sequences is a fixed-window duplicate detection problem.

We can store every length-10 window in a set.

For a more general or memory-optimized version, rolling hash or bit encoding can be used.

If a window hash appears more than once, the sequence is repeated.


1️⃣5️⃣ Longest Duplicate Substring


Problem

Find the longest substring that appears at least twice.


Pattern

This is a classic combination:

binary search on length
+
rolling hash duplicate check

If there is a duplicate substring of length L, then there must also be duplicate substrings of smaller length.

So we can binary search the answer length.


Check Function

For a fixed length L:

compute hash of every substring of length L
if any hash repeats:
    duplicate exists

👉 Interview Answer

Longest Duplicate Substring can be solved with binary search plus rolling hash.

For a fixed length, rolling hash checks whether any duplicate substring of that length exists.

Since duplicate existence is monotonic by length, we can binary search the maximum valid length.


1️⃣6️⃣ Longest Duplicate Substring Code


Code

def longest_dup_substring(s):
    n = len(s)
    rh = DoubleRollingHash(s)

    def check(length):
        seen = {}

        for i in range(n - length + 1):
            h = rh.get_hash(i, i + length)

            if h in seen:
                return i

            seen[h] = i

        return -1

    left = 1
    right = n - 1

    best_start = -1
    best_len = 0

    while left <= right:
        mid = (left + right) // 2
        start = check(mid)

        if start != -1:
            best_start = start
            best_len = mid
            left = mid + 1
        else:
            right = mid - 1

    if best_start == -1:
        return ""

    return s[best_start:best_start + best_len]

Complexity

Time: O(n log n)
Space: O(n)

👉 Interview Answer

The solution binary searches the duplicate substring length.

For each length, rolling hash checks all substrings of that length in O(n).

Therefore, the total time complexity is O(n log n), and the space complexity is O(n).


1️⃣7️⃣ Substring Equality Queries


Problem

Given many queries:

Are s[l1:r1] and s[l2:r2] equal?

Direct Comparison

Each query may cost:

O(length)

Rolling Hash

After preprocessing:

hash(l1, r1) == hash(l2, r2)

Each query is:

O(1)

Code

def are_substrings_equal(s, queries):
    rh = DoubleRollingHash(s)
    result = []

    for l1, r1, l2, r2 in queries:
        if r1 - l1 != r2 - l2:
            result.append(False)
        else:
            result.append(rh.get_hash(l1, r1) == rh.get_hash(l2, r2))

    return result

👉 Interview Answer

Rolling hash is useful when many substring equality queries are required.

After O(n) preprocessing, each substring hash can be computed in O(1).

Then two substrings of the same length can be compared by their hash values.


1️⃣8️⃣ Palindrome Check With Rolling Hash


Problem

Answer many queries:

Is s[l:r] a palindrome?

Idea

Build hash for:

s
reversed(s)

A substring is palindrome if its forward hash equals corresponding reverse hash.


Index Mapping

For substring:

s[l:r]

The corresponding substring in reversed string is:

rev[n-r : n-l]

Code

def palindrome_queries(s, queries):
    n = len(s)

    forward = DoubleRollingHash(s)
    backward = DoubleRollingHash(s[::-1])

    result = []

    for l, r in queries:
        h1 = forward.get_hash(l, r)
        h2 = backward.get_hash(n - r, n - l)

        result.append(h1 == h2)

    return result

👉 Interview Answer

For many palindrome substring queries, I can build rolling hashes for the original string and the reversed string.

A substring is a palindrome if its forward hash equals the hash of the corresponding reversed substring.

This gives O(1) query time after preprocessing, with collision caveats.


1️⃣9️⃣ Rolling Hash vs KMP


KMP

Best for:

one exact pattern
prefix-suffix structure
deterministic O(n + m)

Rolling Hash

Best for:

many substring comparisons
duplicate substring detection
binary search on substring length
Rabin-Karp matching

Comparison

Pattern Best For
KMP Exact single-pattern matching
KMP Prefix-suffix logic
Rolling Hash Substring equality
Rolling Hash Duplicate substrings
Rolling Hash Binary search on length
Trie / Aho-Corasick Many patterns

👉 Interview Answer

KMP and rolling hash are both string matching tools, but they are useful in different situations.

KMP is deterministic and best for exact single-pattern matching.

Rolling hash is more flexible for comparing many substrings, detecting duplicate substrings, and combining with binary search.

Rolling hash must account for possible collisions.


2️⃣0️⃣ Rolling Hash vs Sliding Window


Sliding Window

Good when:

window condition is based on counts
frequency map
sum
distinct characters
anagram detection

Rolling Hash

Good when:

window content identity matters
need compare exact substrings
need detect duplicate windows
need hash window efficiently

Example

Anagram matching:

use sliding window frequency

Exact repeated substring:

use rolling hash

👉 Interview Answer

Sliding window is best when the condition depends on counts, frequency, sum, or distinct characters.

Rolling hash is best when the exact identity of the substring matters.

If I need to know whether two windows contain exactly the same sequence, rolling hash is a good fit.


2️⃣1️⃣ Collision Handling


What Is Collision?

A collision happens when:

two different strings have the same hash

Example:

hash(a) == hash(b)
but a != b

How to Handle

Common strategies:

double hashing
large random base
large prime mod
verify actual substring on match
use Python string comparison when collision risk matters

In Interviews

Mention:

Hash equality suggests equality,
but does not mathematically guarantee equality under modulo hashing.

👉 Interview Answer

Rolling hash can have collisions.

If two hashes differ, the strings are definitely different.

If two hashes are equal, the strings are probably equal but not guaranteed.

To reduce risk, I can use double hashing or verify the actual substring when hashes match.


2️⃣2️⃣ Choosing Base and Mod


Base

Common choices:

31
131
257
911382323

For lowercase letters, base should be larger than alphabet size.


Mod

Common large primes:

10^9 + 7
10^9 + 9

Practical Note

Avoid base and mod combinations where:

base >= mod

unless you understand the modular behavior.

A common safe choice:

base = 911382323
mod1 = 10**9 + 7
mod2 = 10**9 + 9

But base should be reduced modulo each mod automatically during multiplication.


👉 Interview Answer

The base should usually be larger than the alphabet size.

The mod should be a large prime.

In practice, I often use double hashing with two large prime mods.

The exact choice is less important than using a consistent formula and handling collisions.


2️⃣3️⃣ Common Edge Cases


Empty String

s = ""

Need to avoid indexing errors.


Pattern Longer Than Text

No match possible.


Substring Length Zero

Define carefully depending on problem.


Repeated Characters

Example:

"aaaaaaaa"

Collision-like logic errors often show up here.


Large Input

Need O(n) or O(n log n), not O(n^2 * k).


Negative Mod

In Python:

x % mod

is non-negative.

In some languages, negative modulo may need normalization.


👉 Interview Answer

Common rolling hash edge cases include empty strings, pattern longer than text, zero-length substrings, repeated characters, large inputs, and negative modulo behavior.

In languages like Java or C++, I make sure to normalize negative hash values after subtraction.


2️⃣4️⃣ Common Mistakes


Mistake 1: Forgetting to Multiply by Power

Wrong substring hash:

prefix[r] - prefix[l]

Correct:

prefix[r] - prefix[l] * power[r - l]

Mistake 2: Not Handling Negative Hash

After subtraction, normalize:

hash_value %= mod

Mistake 3: Assuming Equal Hash Means Equal String

Collision is possible.

Use double hash or verify substring.


Mistake 4: Off-by-One Indices

Be clear:

s[l:r] uses r exclusive

Mistake 5: Wrong Reverse Index Mapping

For palindrome query:

s[l:r] maps to rev[n-r:n-l]

Mistake 6: Base Too Small

If base is smaller than alphabet size, collisions may become more likely.


👉 Interview Answer

Common rolling hash mistakes include forgetting the power factor in substring hash, not normalizing negative values, assuming equal hashes guarantee equality, making off-by-one index mistakes, mapping reversed substring indices incorrectly, and choosing weak hash parameters.


2️⃣5️⃣ When Rolling Hash Applies


Good Fit

Use rolling hash when the problem asks for:

substring equality
find duplicate substring
repeated substring detection
string matching
Rabin-Karp
many substring comparisons
palindrome substring queries
binary search over substring length
fixed-window duplicate detection

Problem Signals

Look for words like:

substring
duplicate
repeated
match
pattern
window
same substring
longest duplicate
compare substrings
hash

👉 Interview Answer

I consider rolling hash when the problem requires many substring comparisons.

Common signals include duplicate substrings, repeated substrings, Rabin-Karp matching, substring equality queries, and binary search over substring length.

The key is whether exact substring identity needs to be checked efficiently.


2️⃣6️⃣ When Rolling Hash Does Not Apply


Not Good Fit

Rolling hash may not be ideal when:

need subsequence checking
need anagram matching
need edit distance
need regex matching
need dictionary prefix search
need guaranteed collision-free comparison

Better Alternatives

Problem Type Better Pattern
Subsequence check Two Pointers
Anagram matching Sliding Window
Single exact pattern KMP
Many dictionary words Trie / Aho-Corasick
Edit distance Dynamic Programming
Regex matching DP / Automaton
Palindrome expansion Two Pointers / Manacher

👉 Interview Answer

Rolling hash is not the best tool for every string problem.

For subsequences, two pointers are simpler.

For anagrams, sliding window frequency is better.

For single exact pattern matching, KMP may be more deterministic.

For prefix dictionary lookup, Trie is the natural choice.


2️⃣7️⃣ Standard Templates


Single Rolling Hash Template

class RollingHash:
    def __init__(self, s, base=911382323, mod=10**9 + 7):
        self.base = base
        self.mod = mod

        n = len(s)
        self.prefix = [0] * (n + 1)
        self.power = [1] * (n + 1)

        for i, ch in enumerate(s):
            value = ord(ch)
            self.prefix[i + 1] = (self.prefix[i] * base + value) % mod
            self.power[i + 1] = (self.power[i] * base) % mod

    def get_hash(self, left, right):
        return (
            self.prefix[right]
            - self.prefix[left] * self.power[right - left]
        ) % self.mod

Double Rolling Hash Template

class DoubleRollingHash:
    def __init__(self, s):
        self.base = 911382323
        self.mod1 = 10**9 + 7
        self.mod2 = 10**9 + 9

        n = len(s)

        self.prefix1 = [0] * (n + 1)
        self.prefix2 = [0] * (n + 1)
        self.power1 = [1] * (n + 1)
        self.power2 = [1] * (n + 1)

        for i, ch in enumerate(s):
            value = ord(ch)

            self.prefix1[i + 1] = (self.prefix1[i] * self.base + value) % self.mod1
            self.prefix2[i + 1] = (self.prefix2[i] * self.base + value) % self.mod2

            self.power1[i + 1] = (self.power1[i] * self.base) % self.mod1
            self.power2[i + 1] = (self.power2[i] * self.base) % self.mod2

    def get_hash(self, left, right):
        h1 = (
            self.prefix1[right]
            - self.prefix1[left] * self.power1[right - left]
        ) % self.mod1

        h2 = (
            self.prefix2[right]
            - self.prefix2[left] * self.power2[right - left]
        ) % self.mod2

        return (h1, h2)

Rabin-Karp Template

def rabin_karp(text, pattern):
    if pattern == "":
        return 0

    n = len(text)
    m = len(pattern)

    if m > n:
        return -1

    rh_text = DoubleRollingHash(text)
    rh_pattern = DoubleRollingHash(pattern)

    pattern_hash = rh_pattern.get_hash(0, m)

    for i in range(n - m + 1):
        if rh_text.get_hash(i, i + m) == pattern_hash:
            if text[i:i + m] == pattern:
                return i

    return -1

Duplicate Substring Check Template

def has_duplicate_of_length(s, length):
    rh = DoubleRollingHash(s)
    seen = set()

    for i in range(len(s) - length + 1):
        h = rh.get_hash(i, i + length)

        if h in seen:
            return True

        seen.add(h)

    return False

Palindrome Query Template

def is_palindrome_substring(s, left, right):
    n = len(s)

    forward = DoubleRollingHash(s)
    backward = DoubleRollingHash(s[::-1])

    return (
        forward.get_hash(left, right)
        ==
        backward.get_hash(n - right, n - left)
    )

🧠 Staff-Level Answer Final


👉 Interview Answer Full Version

Rolling hash is a technique for computing numeric fingerprints of substrings.

Instead of comparing substrings character by character, we compare their hash values.

The most common version is polynomial rolling hash, where a string is treated like a number in some base. Each character contributes to the hash according to its position.

To make substring hash queries efficient, I precompute prefix hashes and powers of the base. The prefix hash recurrence is: prefix[i + 1] = prefix[i] * base + value(s[i]).

Then the hash of substring s[l:r] can be computed as: prefix[r] - prefix[l] * power[r - l]. This gives O(1) substring hash after O(n) preprocessing.

Rolling hash is useful when the problem requires many substring comparisons, such as substring equality queries, repeated substring detection, Rabin-Karp string matching, palindrome queries, and longest duplicate substring.

Rabin-Karp uses rolling hash for pattern matching. It compares the pattern hash against every text window hash. When hashes match, we may verify the actual substring to avoid collision issues.

Longest Duplicate Substring is a classic example where rolling hash is combined with binary search. For a fixed length, we use rolling hash to check whether any duplicate substring exists. Since duplicate existence is monotonic by length, binary search finds the maximum valid length.

The main caveat is hash collision. Equal hashes do not mathematically guarantee equal strings under modulo hashing. To reduce collision risk, I can use double hashing, large prime mods, random base, or verify the actual substring when hashes match.

Rolling hash is not always the best string pattern. For exact single-pattern matching, KMP is deterministic. For anagrams, sliding window frequency is better. For prefix dictionary search, Trie is better. For edit distance, dynamic programming is needed.

At a senior level, I would explain: what the hash represents, how prefix hashes are built, why substring hash needs the base power adjustment, how collisions are handled, why double hashing helps, and when rolling hash is more appropriate than KMP, Trie, or sliding window.


⭐ Final Insight

Rolling Hash 的核心不是“把 string 变成数字”。

它的核心是:

用 prefix hash 和 power array, 在 O(1) 时间拿到任意 substring 的 fingerprint。

Senior-level 的重点是:

polynomial hash 公式是什么, substring hash 为什么要减掉 prefix[l] * power[length], 为什么要取 mod, collision 为什么存在, double hashing 如何降低风险, rolling hash 适合 many substring comparisons, 以及什么时候应该改用 KMP、sliding window、Trie 或 DP。

能讲清楚这些, Rolling Hash 就是一类处理 substring equality、duplicate substring、Rabin-Karp matching 和 string fingerprinting 的核心字符串模式。



中文部分


🎯 Rolling Hash


1️⃣ 核心框架

讨论 Rolling Hash 时,我通常从:

  1. Rolling hash 解决什么问题
  2. Hashing a string
  3. Polynomial rolling hash
  4. Prefix hash
  5. O(1) substring hash
  6. Sliding window hash
  7. Rabin-Karp string matching
  8. Duplicate substring problems
  9. Collision handling
  10. 高级工程师解释方式

2️⃣ 要解决什么问题?

Rolling hash 用来高效比较 substrings。

它常用于:

核心思想是:

不是逐个 character 比较 substrings,
而是把 substring 转换成 numeric hash value。

如果两个 substring hashes 不同:

substrings 一定不同

如果两个 substring hashes 相同:

substrings 很可能相同

因为 hash collision 可能存在。


👉 面试回答

Rolling hash 是一种高效计算 substring hash value 的技术。

经过 prefix hash 预处理后, 可以在 O(1) 时间得到任意 substring 的 hash。

它适合需要大量 substring comparison、 查找 duplicate substrings, 或者 Rabin-Karp pattern matching 的问题。

因为 hash collision 可能存在, 所以可能需要 double hashing 或最终的 string verification。


3️⃣ 为什么不直接比较 Strings?


Direct Comparison

如果直接比较两个 substrings:

s[i:i+k] == s[j:j+k]

可能需要:

O(k)

时间。


Problem

如果需要比较很多 substrings, 代价会很高。

Example:

n 个 substrings
每个长度 k

直接比较可能变成:

O(n * k)

甚至更差。


Rolling Hash Improvement

Rolling hash 可以通过 hash 比较:

hash(i, i+k) == hash(j, j+k)

预处理后:

O(1) per substring hash

👉 面试回答

直接比较 substrings 可能很贵, 因为比较长度为 k 的 substring 需要 O(k)。

Rolling hash 会预处理 string, 使每个 substring hash 可以在 O(1) 时间得到。

当我们需要大量 substring comparisons 时, 这个方法非常有用。


4️⃣ Polynomial Rolling Hash


Basic Idea

把 string 表示成一个数字。

对于 string:

"abc"

使用 base B

hash = a * B^2 + b * B^1 + c * B^0

如果映射为:

a = 1
b = 2
c = 3

那么:

hash("abc") = 1 * B^2 + 2 * B + 3

With Mod

为了让数字不会太大, 使用 modulo:

hash = hash % MOD

常见大质数 mod:

10^9 + 7
10^9 + 9

👉 面试回答

Polynomial rolling hash 会把 string 当成 base B 下的一个数字。

每个 character 根据它的位置和 base 的幂贡献 hash value。

通常我们会对一个 large prime 取 mod, 避免数字过大。


5️⃣ Character Mapping


Common Mapping

对于 lowercase letters:

value = ord(ch) - ord("a") + 1

所以:

a -> 1
b -> 2
c -> 3
...
z -> 26

Why Not Use 0 for ‘a’?

如果:

a -> 0

有些 string 的前导信息可能丢失。

Example:

"a"
"aa"
"aaa"

根据 hash 公式, 可能产生不好的边界情况。

使用 1-based values 更安全。


General Characters

对于任意 ASCII:

value = ord(ch)

👉 面试回答

我通常把 characters 映射成 positive integers。

对 lowercase letters, a 映射成 1, b 映射成 2, 以此类推。

使用正数可以避免前导 0 character 带来的歧义。


6️⃣ Prefix Hash


Goal

我们希望快速计算 substring hash。

定义:

prefix[i] = s[0:i] 的 hash

也就是说:

prefix[0] = empty string hash
prefix[1] = s[0] 的 hash
prefix[2] = s[0:2] 的 hash

Formula

对每个 character:

prefix[i + 1] = prefix[i] * BASE + value(s[i])

取 modulo:

prefix[i + 1] %= MOD

Code

def build_prefix_hash(s, base, mod):
    n = len(s)
    prefix = [0] * (n + 1)

    for i, ch in enumerate(s):
        value = ord(ch) - ord("a") + 1
        prefix[i + 1] = (prefix[i] * base + value) % mod

    return prefix

👉 面试回答

Prefix hash 存每个 prefix 的 hash。

递推公式是: prefix[i + 1] = prefix[i] * base + value(s[i])

有了 prefix hashes, 就可以高效推出 substring hash。


7️⃣ Powers of Base


Why Need Powers?

计算 substring hash 时, 为了移除左边的 prefix, 需要:

BASE^length

所以预处理:

power[i] = BASE^i % MOD

Code

def build_powers(n, base, mod):
    power = [1] * (n + 1)

    for i in range(1, n + 1):
        power[i] = (power[i - 1] * base) % mod

    return power

👉 面试回答

从 prefix hashes 计算 substring hash 时, 我需要 base 的 powers。

具体来说, 需要 base^length 来减掉 substring 之前的 prefix contribution。

所以我会预处理 0 到 n 的 powers。


8️⃣ O(1) Substring Hash


Formula

对于 substring:

s[l:r]

其中 r 是 exclusive。

Hash 是:

hash = prefix[r] - prefix[l] * power[r - l]

取 modulo:

hash = (prefix[r] - prefix[l] * power[r - l]) % MOD

Code

def get_hash(prefix, power, left, right, mod):
    return (prefix[right] - prefix[left] * power[right - left]) % mod

Why It Works

prefix[r] 包含:

s[0:r]

prefix[l] * BASE^(r-l) 会把 s[0:l] 的 hash 对齐, 从前面减掉。

剩下的就是:

s[l:r]

👉 面试回答

要得到 s[l:r] 的 hash, 我从 prefix[r] 中减掉 l 之前的 prefix hash, 并用 base power 对齐长度。

公式是: prefix[r] - prefix[l] * power[r - l]

取 modulo 后, 可以 O(1) 得到 substring hash。


9️⃣ Rolling Window Hash


Sliding Window Problem

假设我们要得到所有长度为 k 的 substring hashes。

Naively, 可以每个 substring 从头算 hash。

Rolling hash 可以从一个 window 更新到下一个 window。


Window Example

从:

s[i:i+k]

到:

s[i+1:i+k+1]

移除 left character, 加入 right character。


Formula Idea

new_hash = old_hash - left_char * BASE^(k-1)
new_hash = new_hash * BASE
new_hash = new_hash + right_char

每一步都取 modulo。


👉 面试回答

Rolling window hash 会在 window 移动时 O(1) 更新 hash。

它先移除 outgoing character, 然后把剩余部分乘 base 左移, 最后加入 incoming character。

这样可以高效计算所有 fixed-length window hashes。


🔟 Rolling Window Hash Code


Code

def rolling_window_hashes(s, k):
    if k > len(s):
        return []

    base = 26
    mod = 10**9 + 7

    def value(ch):
        return ord(ch) - ord("a") + 1

    hashes = []
    current = 0

    for i in range(k):
        current = (current * base + value(s[i])) % mod

    hashes.append(current)

    highest_power = pow(base, k - 1, mod)

    for i in range(k, len(s)):
        left_value = value(s[i - k])
        right_value = value(s[i])

        current = (current - left_value * highest_power) % mod
        current = (current * base + right_value) % mod

        hashes.append(current)

    return hashes

Note

Prefix hash 通常更简单且更不容易写错。

Rolling window update 更适合 stream processing。


👉 面试回答

Fixed-length rolling hash 可以每个 window O(1) 更新。

但在很多 interview problems 中, prefix hash 更清晰, 因为可以直接查询任意 substring。

Rolling update 特别适合 fixed-window 或 stream processing。


1️⃣1️⃣ Rabin-Karp String Matching


Problem

判断 pattern 是否出现在 text 中。


Idea

计算:

pattern hash
text 中每个 length-m substring 的 hash

如果 hashes match, 再验证 substring, 避免 collision。


Code

def rabin_karp(text, pattern):
    if pattern == "":
        return 0

    n = len(text)
    m = len(pattern)

    if m > n:
        return -1

    base = 256
    mod = 10**9 + 7

    def value(ch):
        return ord(ch)

    pattern_hash = 0
    window_hash = 0

    for i in range(m):
        pattern_hash = (pattern_hash * base + value(pattern[i])) % mod
        window_hash = (window_hash * base + value(text[i])) % mod

    highest_power = pow(base, m - 1, mod)

    for i in range(n - m + 1):
        if pattern_hash == window_hash:
            if text[i:i + m] == pattern:
                return i

        if i < n - m:
            left = value(text[i])
            right = value(text[i + m])

            window_hash = (window_hash - left * highest_power) % mod
            window_hash = (window_hash * base + right) % mod

    return -1

👉 面试回答

Rabin-Karp 使用 rolling hash 做 string matching。

它比较 pattern hash 和 text 中每个 window 的 hash。

如果 hash 相等, 因为 collision 可能存在, 我会再验证 actual substring。

平均性能很好, 但 collision handling 很重要。


1️⃣2️⃣ Prefix Hash Template


Full Template

class RollingHash:
    def __init__(self, s, base=911382323, mod=10**9 + 7):
        self.s = s
        self.base = base
        self.mod = mod

        n = len(s)
        self.prefix = [0] * (n + 1)
        self.power = [1] * (n + 1)

        for i, ch in enumerate(s):
            value = ord(ch)
            self.prefix[i + 1] = (self.prefix[i] * base + value) % mod
            self.power[i + 1] = (self.power[i] * base) % mod

    def get_hash(self, left, right):
        return (
            self.prefix[right]
            - self.prefix[left] * self.power[right - left]
        ) % self.mod

Usage

rh = RollingHash("abcdef")

hash_abc = rh.get_hash(0, 3)
hash_bcd = rh.get_hash(1, 4)

👉 面试回答

我经常把 rolling hash logic 包成一个 class。

这个 class 会预处理 prefix hashes 和 base powers。

然后 get_hash(left, right) 可以 O(1) 返回任意 substring 的 hash。


1️⃣3️⃣ Double Hashing


Why Double Hash?

一个 hash 可能 collision。

两个不同 substring 可能有相同 hash。

为了降低 collision probability, 可以使用两个不同 mod。


Code

class DoubleRollingHash:
    def __init__(self, s):
        self.base = 911382323
        self.mod1 = 10**9 + 7
        self.mod2 = 10**9 + 9

        n = len(s)

        self.prefix1 = [0] * (n + 1)
        self.prefix2 = [0] * (n + 1)
        self.power1 = [1] * (n + 1)
        self.power2 = [1] * (n + 1)

        for i, ch in enumerate(s):
            value = ord(ch)

            self.prefix1[i + 1] = (self.prefix1[i] * self.base + value) % self.mod1
            self.prefix2[i + 1] = (self.prefix2[i] * self.base + value) % self.mod2

            self.power1[i + 1] = (self.power1[i] * self.base) % self.mod1
            self.power2[i + 1] = (self.power2[i] * self.base) % self.mod2

    def get_hash(self, left, right):
        h1 = (
            self.prefix1[right]
            - self.prefix1[left] * self.power1[right - left]
        ) % self.mod1

        h2 = (
            self.prefix2[right]
            - self.prefix2[left] * self.power2[right - left]
        ) % self.mod2

        return (h1, h2)

👉 面试回答

因为 hash collision 可能发生, double hashing 可以显著降低 collision probability。

它用两个不同 mod 计算两个独立 hash。

实际使用中, 比较一对 hash 会比比较单个 hash 更安全。


1️⃣4️⃣ Repeated DNA Sequences


Problem

找出所有出现超过一次的长度为 10 的 DNA sequences。

Example:

s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"

Output:

["AAAAACCCCC", "CCCCCAAAAA"]

Rolling Hash Idea

对每个长度为 10 的 window:

compute hash
store in seen set
if hash seen again -> repeated

Code

def find_repeated_dna_sequences(s):
    k = 10

    if len(s) < k:
        return []

    seen = set()
    repeated = set()

    for i in range(len(s) - k + 1):
        seq = s[i:i + k]

        if seq in seen:
            repeated.add(seq)
        else:
            seen.add(seq)

    return list(repeated)

Interview Note

对于 DNA, 因为只有四个 characters:

A, C, G, T

也可以使用 bit encoding。

但 rolling hash 是更通用的 string solution。


👉 面试回答

Repeated DNA Sequences 是 fixed-window duplicate detection。

可以把每个 length-10 window 放到 set 中。

更通用或更节省内存的版本, 可以使用 rolling hash 或 bit encoding。

如果某个 window hash 出现超过一次, 就说明 sequence repeated。


1️⃣5️⃣ Longest Duplicate Substring


Problem

找出最长的重复 substring。


Pattern

这是经典组合:

binary search on length
+
rolling hash duplicate check

Why Binary Search?

如果存在长度为 L 的 duplicate substring, 那么也一定存在更短长度的 duplicate substring。

所以可以 binary search answer length。


Check Function

对固定长度 L

计算每个 length-L substring 的 hash
如果任何 hash 重复:
    duplicate exists

👉 面试回答

Longest Duplicate Substring 可以用 binary search 加 rolling hash。

对一个固定 length, rolling hash 检查是否存在 duplicate substring。

因为 duplicate existence 对长度具有 monotonic property, 所以可以 binary search 最大可行长度。


1️⃣6️⃣ Longest Duplicate Substring Code


Code

def longest_dup_substring(s):
    n = len(s)
    rh = DoubleRollingHash(s)

    def check(length):
        seen = {}

        for i in range(n - length + 1):
            h = rh.get_hash(i, i + length)

            if h in seen:
                return i

            seen[h] = i

        return -1

    left = 1
    right = n - 1

    best_start = -1
    best_len = 0

    while left <= right:
        mid = (left + right) // 2
        start = check(mid)

        if start != -1:
            best_start = start
            best_len = mid
            left = mid + 1
        else:
            right = mid - 1

    if best_start == -1:
        return ""

    return s[best_start:best_start + best_len]

Complexity

Time: O(n log n)
Space: O(n)

👉 面试回答

这个解法会 binary search duplicate substring 的长度。

对每个 length, rolling hash 用 O(n) 检查所有 substrings。

因此总时间复杂度是 O(n log n), 空间复杂度是 O(n)。


1️⃣7️⃣ Substring Equality Queries


Problem

给定大量 queries:

s[l1:r1] 和 s[l2:r2] 是否相同?

Direct Comparison

每个 query 可能需要:

O(length)

Rolling Hash

预处理后:

hash(l1, r1) == hash(l2, r2)

每个 query 是:

O(1)

Code

def are_substrings_equal(s, queries):
    rh = DoubleRollingHash(s)
    result = []

    for l1, r1, l2, r2 in queries:
        if r1 - l1 != r2 - l2:
            result.append(False)
        else:
            result.append(rh.get_hash(l1, r1) == rh.get_hash(l2, r2))

    return result

👉 面试回答

当需要大量 substring equality queries 时, rolling hash 很有用。

O(n) 预处理之后, 每个 substring hash 可以 O(1) 得到。

然后相同长度的两个 substrings 可以通过 hash values 比较。


1️⃣8️⃣ Palindrome Check With Rolling Hash


Problem

回答很多 queries:

s[l:r] 是否是 palindrome?

Idea

分别对下面两个 string 建 hash:

s
reversed(s)

如果 substring 的 forward hash 等于对应 reverse hash, 则它是 palindrome。


Index Mapping

对于 substring:

s[l:r]

在 reversed string 中对应:

rev[n-r : n-l]

Code

def palindrome_queries(s, queries):
    n = len(s)

    forward = DoubleRollingHash(s)
    backward = DoubleRollingHash(s[::-1])

    result = []

    for l, r in queries:
        h1 = forward.get_hash(l, r)
        h2 = backward.get_hash(n - r, n - l)

        result.append(h1 == h2)

    return result

👉 面试回答

对大量 palindrome substring queries, 我可以对原 string 和 reversed string 分别建立 rolling hash。

如果一个 substring 的 forward hash 等于 reversed string 中对应 substring 的 hash, 它就是 palindrome。

预处理后 query 是 O(1), 但仍然有 collision caveat。


1️⃣9️⃣ Rolling Hash vs KMP


KMP

适合:

one exact pattern
prefix-suffix structure
deterministic O(n + m)

Rolling Hash

适合:

many substring comparisons
duplicate substring detection
binary search on substring length
Rabin-Karp matching

Comparison

Pattern Best For
KMP Exact single-pattern matching
KMP Prefix-suffix logic
Rolling Hash Substring equality
Rolling Hash Duplicate substrings
Rolling Hash Binary search on length
Trie / Aho-Corasick Many patterns

👉 面试回答

KMP 和 rolling hash 都是 string matching tools, 但适用场景不同。

KMP 是 deterministic, 更适合 exact single-pattern matching。

Rolling hash 更灵活, 适合比较大量 substrings、 检测 duplicate substrings, 以及和 binary search 结合。

Rolling hash 需要考虑 collision。


2️⃣0️⃣ Rolling Hash vs Sliding Window


Sliding Window

适合条件基于:

window condition is based on counts
frequency map
sum
distinct characters
anagram detection

Rolling Hash

适合:

window content identity matters
need compare exact substrings
need detect duplicate windows
need hash window efficiently

Example

Anagram matching:

use sliding window frequency

Exact repeated substring:

use rolling hash

👉 面试回答

Sliding window 适合 condition 依赖 counts、 frequency、sum 或 distinct characters 的问题。

Rolling hash 适合 exact substring identity 重要的场景。

如果我要判断两个 windows 是否是完全相同的 sequence, rolling hash 是很好的选择。


2️⃣1️⃣ Collision Handling


What Is Collision?

Collision 表示:

两个不同 strings 有相同 hash

Example:

hash(a) == hash(b)
but a != b

How to Handle

常见策略:

double hashing
large random base
large prime mod
verify actual substring on match
use Python string comparison when collision risk matters

In Interviews

要说明:

Hash equality suggests equality,
but does not mathematically guarantee equality under modulo hashing.

👉 面试回答

Rolling hash 可能发生 collision。

如果两个 hashes 不同, strings 一定不同。

如果两个 hashes 相同, strings 很可能相同, 但不能绝对保证。

为了降低风险, 可以使用 double hashing, 或在 hash match 时验证 actual substring。


2️⃣2️⃣ Choosing Base and Mod


Base

常见选择:

31
131
257
911382323

对于 lowercase letters, base 应该大于 alphabet size。


Mod

常见 large primes:

10^9 + 7
10^9 + 9

Practical Note

避免 base 和 mod 组合不合理:

base >= mod

除非你清楚 modular behavior。

常见选择:

base = 911382323
mod1 = 10**9 + 7
mod2 = 10**9 + 9

乘法时会自动对每个 mod 取余。


👉 面试回答

Base 通常应该大于 alphabet size。

Mod 应该是 large prime。

实际使用中, 我通常会用 double hashing 和两个 large prime mods。

具体数值不是最核心的, 关键是公式一致, 并且要处理 collision。


2️⃣3️⃣ 常见 Edge Cases


Empty String

s = ""

需要避免 indexing error。


Pattern Longer Than Text

不可能匹配。


Substring Length Zero

根据题目定义清楚。


Repeated Characters

Example:

"aaaaaaaa"

很多 rolling hash 逻辑错误会在这里暴露。


Large Input

需要 O(n) 或 O(n log n), 不能 O(n^2 * k)。


Negative Mod

Python 中:

x % mod

结果是 non-negative。

但在一些语言中, negative modulo 需要手动 normalize。


👉 面试回答

Rolling hash 常见 edge cases 包括 empty strings、 pattern longer than text、 zero-length substrings、 repeated characters、 large input, 以及 negative modulo behavior。

在 Java 或 C++ 中, subtraction 之后要手动 normalize negative hash value。


2️⃣4️⃣ 常见错误


Mistake 1: 忘记乘 Power

错误 substring hash:

prefix[r] - prefix[l]

正确:

prefix[r] - prefix[l] * power[r - l]

Mistake 2: 没处理 Negative Hash

减法后 normalize:

hash_value %= mod

Mistake 3: 认为 Equal Hash 一定 Equal String

Collision 可能存在。

使用 double hash 或 verify substring。


Mistake 4: Off-by-One Indices

要明确:

s[l:r] uses r exclusive

Mistake 5: Reverse Index Mapping 错误

Palindrome query 中:

s[l:r] maps to rev[n-r:n-l]

Mistake 6: Base 太小

如果 base 小于 alphabet size, collision probability 可能上升。


👉 面试回答

Rolling hash 常见错误包括: substring hash 忘记 power factor、 没有 normalize negative value、 认为 equal hashes 一定代表 equal strings、 off-by-one index 错误、 reversed substring index mapping 错误, 以及选择 weak hash parameters。


2️⃣5️⃣ 什么时候 Rolling Hash 适用?


Good Fit

当题目问:

substring equality
find duplicate substring
repeated substring detection
string matching
Rabin-Karp
many substring comparisons
palindrome substring queries
binary search over substring length
fixed-window duplicate detection

Problem Signals

看到这些关键词想到 rolling hash:

substring
duplicate
repeated
match
pattern
window
same substring
longest duplicate
compare substrings
hash

👉 面试回答

当问题需要大量 substring comparisons 时, 我会考虑 rolling hash。

常见信号包括 duplicate substrings、 repeated substrings、Rabin-Karp matching、 substring equality queries, 以及 binary search over substring length。

关键是题目是否需要高效判断 exact substring identity。


2️⃣6️⃣ 什么时候 Rolling Hash 不适用?


Not Good Fit

Rolling hash 不一定适合:

need subsequence checking
need anagram matching
need edit distance
need regex matching
need dictionary prefix search
need guaranteed collision-free comparison

Better Alternatives

Problem Type Better Pattern
Subsequence check Two Pointers
Anagram matching Sliding Window
Single exact pattern KMP
Many dictionary words Trie / Aho-Corasick
Edit distance Dynamic Programming
Regex matching DP / Automaton
Palindrome expansion Two Pointers / Manacher

👉 面试回答

Rolling hash 不是所有 string problem 的最佳选择。

对 subsequence, two pointers 更简单。

对 anagram, sliding window frequency 更好。

对 single exact pattern matching, KMP 更 deterministic。

对 prefix dictionary lookup, Trie 是更自然的选择。


2️⃣7️⃣ 标准模板


Single Rolling Hash Template

class RollingHash:
    def __init__(self, s, base=911382323, mod=10**9 + 7):
        self.base = base
        self.mod = mod

        n = len(s)
        self.prefix = [0] * (n + 1)
        self.power = [1] * (n + 1)

        for i, ch in enumerate(s):
            value = ord(ch)
            self.prefix[i + 1] = (self.prefix[i] * base + value) % mod
            self.power[i + 1] = (self.power[i] * base) % mod

    def get_hash(self, left, right):
        return (
            self.prefix[right]
            - self.prefix[left] * self.power[right - left]
        ) % self.mod

Double Rolling Hash Template

class DoubleRollingHash:
    def __init__(self, s):
        self.base = 911382323
        self.mod1 = 10**9 + 7
        self.mod2 = 10**9 + 9

        n = len(s)

        self.prefix1 = [0] * (n + 1)
        self.prefix2 = [0] * (n + 1)
        self.power1 = [1] * (n + 1)
        self.power2 = [1] * (n + 1)

        for i, ch in enumerate(s):
            value = ord(ch)

            self.prefix1[i + 1] = (self.prefix1[i] * self.base + value) % self.mod1
            self.prefix2[i + 1] = (self.prefix2[i] * self.base + value) % self.mod2

            self.power1[i + 1] = (self.power1[i] * self.base) % self.mod1
            self.power2[i + 1] = (self.power2[i] * self.base) % self.mod2

    def get_hash(self, left, right):
        h1 = (
            self.prefix1[right]
            - self.prefix1[left] * self.power1[right - left]
        ) % self.mod1

        h2 = (
            self.prefix2[right]
            - self.prefix2[left] * self.power2[right - left]
        ) % self.mod2

        return (h1, h2)

Rabin-Karp Template

def rabin_karp(text, pattern):
    if pattern == "":
        return 0

    n = len(text)
    m = len(pattern)

    if m > n:
        return -1

    rh_text = DoubleRollingHash(text)
    rh_pattern = DoubleRollingHash(pattern)

    pattern_hash = rh_pattern.get_hash(0, m)

    for i in range(n - m + 1):
        if rh_text.get_hash(i, i + m) == pattern_hash:
            if text[i:i + m] == pattern:
                return i

    return -1

Duplicate Substring Check Template

def has_duplicate_of_length(s, length):
    rh = DoubleRollingHash(s)
    seen = set()

    for i in range(len(s) - length + 1):
        h = rh.get_hash(i, i + length)

        if h in seen:
            return True

        seen.add(h)

    return False

Palindrome Query Template

def is_palindrome_substring(s, left, right):
    n = len(s)

    forward = DoubleRollingHash(s)
    backward = DoubleRollingHash(s[::-1])

    return (
        forward.get_hash(left, right)
        ==
        backward.get_hash(n - right, n - left)
    )

🧠 Staff-Level Answer Final


👉 面试回答完整版本

Rolling hash 是一种给 substrings 计算 numeric fingerprints 的技术。

它不是逐个 character 比较 substrings, 而是比较它们的 hash values。

最常见的是 polynomial rolling hash, 也就是把 string 看成某个 base 下的数字。 每个 character 根据它的位置贡献 hash。

为了高效查询 substring hash, 我会预处理 prefix hashes 和 powers of base。 Prefix hash 的递推是: prefix[i + 1] = prefix[i] * base + value(s[i])

然后 substring s[l:r] 的 hash 可以用: prefix[r] - prefix[l] * power[r - l]。 这样 O(n) 预处理后, 任意 substring hash 可以 O(1) 得到。

Rolling hash 适合需要大量 substring comparison 的问题, 例如 substring equality queries、 repeated substring detection、 Rabin-Karp string matching、 palindrome queries、 和 longest duplicate substring。

Rabin-Karp 会用 rolling hash 做 pattern matching。 它比较 pattern hash 和每个 text window hash。 当 hashes match 时, 可以验证 actual substring, 避免 collision 问题。

Longest Duplicate Substring 是 rolling hash 结合 binary search 的经典例子。 对固定 length, 使用 rolling hash 检查是否有 duplicate substring。 因为 duplicate existence 对 length 具有 monotonic property, 所以可以 binary search 最大合法 length。

Rolling hash 最大的 caveat 是 hash collision。 Equal hashes 在 modulo hashing 下并不能数学上保证 strings 一定相等。 为了降低风险, 可以使用 double hashing、large prime mods、random base, 或在 hash match 时验证 actual substring。

Rolling hash 也不是所有 string problem 的最佳工具。 对 exact single-pattern matching, KMP 更 deterministic。 对 anagrams, sliding window frequency 更好。 对 dictionary prefix search, Trie 更好。 对 edit distance, 需要 dynamic programming。

高级工程师解释 rolling hash 时, 我会讲清楚: hash 表示什么, prefix hash 如何构建, 为什么 substring hash 需要 base power adjustment, collisions 如何处理, double hashing 为什么有用, 以及 rolling hash 什么时候比 KMP、Trie 或 sliding window 更合适。


⭐ Final Insight

Rolling Hash 的核心不是“把 string 变成数字”。

它的核心是:

用 prefix hash 和 power array, 在 O(1) 时间拿到任意 substring 的 fingerprint。

Senior-level 的重点是:

polynomial hash 公式是什么, substring hash 为什么要减掉 prefix[l] * power[length], 为什么要取 mod, collision 为什么存在, double hashing 如何降低风险, rolling hash 适合 many substring comparisons, 以及什么时候应该改用 KMP、sliding window、Trie 或 DP。

能讲清楚这些, Rolling Hash 就是一类处理 substring equality、duplicate substring、Rabin-Karp matching 和 string fingerprinting 的核心字符串模式。

Implement