LeetCode Patterns - 28 String Matching (KMP)

Post by ailswan June. 02, 2026

中文 ↓

🎯 String Matching (KMP)

1️⃣ Core Framework

When discussing String Matching (KMP), I frame it as:

  1. What string matching means
  2. Brute force string matching
  3. Why brute force is inefficient
  4. Prefix and suffix idea
  5. LPS array
  6. How KMP uses LPS
  7. Build LPS array
  8. KMP search process
  9. Common KMP problems
  10. Senior-level explanation

2️⃣ What Problem Are We Solving?

String matching asks:

Does pattern exist inside text?
If yes, where does it start?

It is commonly used for:

The core idea is:

KMP avoids rechecking characters
by using information from the pattern itself.

👉 Interview Answer

String matching is about finding whether a pattern appears inside a text.

The brute force approach compares the pattern at every possible starting position.

KMP improves this by precomputing the pattern’s prefix-suffix information.

When a mismatch happens, KMP does not move the text pointer backward. Instead, it uses the LPS array to decide how much of the pattern can still be reused.


3️⃣ Brute Force String Matching


Problem

Given:

text = "hello"
pattern = "ll"

Return:

2

because "ll" starts at index 2.


Brute Force Idea

Try every starting position in text.

For each position, compare pattern character by character.


Code

def str_str(text, pattern):
    if pattern == "":
        return 0

    n = len(text)
    m = len(pattern)

    for i in range(n - m + 1):
        j = 0

        while j < m and text[i + j] == pattern[j]:
            j += 1

        if j == m:
            return i

    return -1

Complexity

Time: O(n * m)
Space: O(1)

👉 Interview Answer

The brute force solution tries every possible starting index in the text.

At each index, it compares the pattern character by character.

In the worst case, this can take O(n * m) time, where n is the text length and m is the pattern length.


4️⃣ Why Brute Force Is Inefficient


Example

text    = "aaaaaaaaab"
pattern = "aaaab"

Brute force repeatedly compares many of the same characters.

At each starting position, it matches several as, then fails near the end.


Problem

Brute force forgets useful information.

If we already matched:

aaaa

and then failed, we should not restart completely.

There may be useful overlap inside the pattern.


KMP Improvement

KMP asks:

After mismatch,
what is the longest prefix of the pattern
that is also a suffix of what we already matched?

This lets us shift the pattern intelligently.


👉 Interview Answer

Brute force is inefficient because it rechecks characters that were already matched.

KMP avoids this by using prefix-suffix information from the pattern.

After a mismatch, KMP knows how much of the previous match can still be reused, so it avoids unnecessary comparisons.


5️⃣ Prefix and Suffix


Prefix

A prefix is a substring that starts at the beginning.

For:

"abab"

Prefixes are:

"a"
"ab"
"aba"
"abab"

Suffix

A suffix is a substring that ends at the end.

For:

"abab"

Suffixes are:

"b"
"ab"
"bab"
"abab"

Proper Prefix / Proper Suffix

In KMP, we use proper prefix and proper suffix.

That means the whole string itself is not allowed.

For:

"abab"

Longest proper prefix that is also suffix:

"ab"

Length:

2

👉 Interview Answer

KMP is based on prefix and suffix overlap.

For a substring of the pattern, we want to know the longest proper prefix that is also a suffix.

This tells us how much of the pattern can be reused after a mismatch.


6️⃣ LPS Array


What Is LPS?

LPS means:

Longest Proper Prefix which is also Suffix

For each index i, lps[i] means:

the length of the longest proper prefix of pattern[0:i+1]
that is also a suffix of pattern[0:i+1]

Example

pattern = "ababaca"

LPS:

index:   0 1 2 3 4 5 6
char:    a b a b a c a
lps:     0 0 1 2 3 0 1

Meaning

For substring:

"ababa"

Longest proper prefix that is also suffix is:

"aba"

So:

lps[4] = 3

👉 Interview Answer

The LPS array stores, for every prefix of the pattern, the length of the longest proper prefix that is also a suffix.

It is the key preprocessing step in KMP.

During search, when a mismatch happens, the LPS array tells us where to continue in the pattern.


7️⃣ Intuition Behind LPS


Pattern

pattern = "ababaca"

Suppose we matched:

ababa

Then mismatch happens.

The matched part is:

ababa

Its longest prefix also suffix is:

aba

So instead of restarting from pattern index 0, we can continue from pattern index 3.


Why?

Because the suffix already matched in text:

aba

can serve as the prefix of the next possible pattern match.


👉 Interview Answer

LPS tells us how much matched pattern can still be useful after a mismatch.

If a matched substring has a suffix equal to a prefix of the pattern, we do not need to compare that prefix again.

We can jump the pattern pointer to the LPS value.


8️⃣ Build LPS Array


Two Pointers

Use:

i = current index being computed
length = length of current prefix-suffix match

Start with:

i = 1
length = 0

Because:

lps[0] = 0

Logic

If:

pattern[i] == pattern[length]

Then we extend current prefix-suffix:

length += 1
lps[i] = length
i += 1

If mismatch:

length = lps[length - 1]

If length is already 0:

lps[i] = 0
i += 1

👉 Interview Answer

To build the LPS array, I scan the pattern with two pointers.

i is the current position, and length is the length of the current prefix-suffix match.

If characters match, I extend the match.

If they do not match, I fall back using the previous LPS value.


9️⃣ Build LPS Code


Code

def build_lps(pattern):
    m = len(pattern)
    lps = [0] * m

    length = 0
    i = 1

    while i < m:
        if pattern[i] == pattern[length]:
            length += 1
            lps[i] = length
            i += 1
        else:
            if length != 0:
                length = lps[length - 1]
            else:
                lps[i] = 0
                i += 1

    return lps

Complexity

Time: O(m)
Space: O(m)

Each pointer moves forward or falls back using precomputed values.


👉 Interview Answer

Building the LPS array takes O(m) time.

Even though the length pointer may fall back, it never causes repeated full rescans.

The LPS array uses O(m) space, where m is the pattern length.


🔟 KMP Search Process


Two Pointers

Use:

i = pointer in text
j = pointer in pattern

Match

If:

text[i] == pattern[j]

Move both:

i += 1
j += 1

Full Match

If:

j == len(pattern)

Pattern found at:

i - j

Mismatch

If mismatch and j != 0:

j = lps[j - 1]

If mismatch and j == 0:

i += 1

👉 Interview Answer

KMP uses two pointers: one for the text and one for the pattern.

On a match, both pointers move forward.

On a mismatch, the text pointer does not move backward.

Instead, the pattern pointer jumps using the LPS array.


1️⃣1️⃣ KMP Search Code


Code

def kmp_search(text, pattern):
    if pattern == "":
        return 0

    n = len(text)
    m = len(pattern)

    lps = build_lps(pattern)

    i = 0
    j = 0

    while i < n:
        if text[i] == pattern[j]:
            i += 1
            j += 1

            if j == m:
                return i - j
        else:
            if j != 0:
                j = lps[j - 1]
            else:
                i += 1

    return -1

Complexity

Time: O(n + m)
Space: O(m)

👉 Interview Answer

KMP search runs in O(n + m).

Building the LPS array takes O(m), and scanning the text takes O(n).

The key property is that the text pointer never moves backward.


1️⃣2️⃣ Implement strStr()


Problem

Return the first index where needle appears in haystack.

If not found, return -1.


Code

def str_str(haystack, needle):
    if needle == "":
        return 0

    lps = build_lps(needle)

    i = 0
    j = 0

    while i < len(haystack):
        if haystack[i] == needle[j]:
            i += 1
            j += 1

            if j == len(needle):
                return i - j
        else:
            if j > 0:
                j = lps[j - 1]
            else:
                i += 1

    return -1

👉 Interview Answer

Implement strStr can be solved with KMP.

I preprocess the needle into an LPS array.

Then I scan the haystack using two pointers.

On mismatch, I use LPS to move the pattern pointer instead of restarting from scratch.


1️⃣3️⃣ Find All Occurrences


Problem

Return all starting indices where pattern appears in text.


Key Difference

After finding a match, do not stop.

Instead, continue by resetting:

j = lps[j - 1]

This allows overlapping matches.


Code

def find_all_occurrences(text, pattern):
    if pattern == "":
        return []

    lps = build_lps(pattern)

    result = []
    i = 0
    j = 0

    while i < len(text):
        if text[i] == pattern[j]:
            i += 1
            j += 1

            if j == len(pattern):
                result.append(i - j)
                j = lps[j - 1]
        else:
            if j > 0:
                j = lps[j - 1]
            else:
                i += 1

    return result

Example

text = "aaaa"
pattern = "aa"

Output:

[0, 1, 2]

👉 Interview Answer

To find all occurrences, I continue the KMP search after a match.

After recording a match, I set the pattern pointer to lps[j - 1].

This allows overlapping matches to be found.


1️⃣4️⃣ Repeated Substring Pattern


Problem

Given a string, determine whether it can be built by repeating one of its substrings.

Example:

"abab" -> True
"aba"  -> False
"abcabcabc" -> True

LPS Insight

For string s, let:

longest_prefix_suffix = lps[-1]

If this value is greater than 0 and:

n % (n - longest_prefix_suffix) == 0

then the string is repeated.


Code

def repeated_substring_pattern(s):
    lps = build_lps(s)
    n = len(s)

    longest = lps[-1]
    unit_length = n - longest

    return longest > 0 and n % unit_length == 0

👉 Interview Answer

Repeated Substring Pattern can be solved using the LPS array.

If a string has a long prefix that is also a suffix, it may be composed of repeated units.

The repeated unit length is n - lps[-1].

If n is divisible by that length, then the string is repeated.


1️⃣5️⃣ Why Repeated Substring Works


Example

s = "abcabcabc"

LPS last value:

6

Because:

"abcabc"

is both prefix and suffix.

So unit length:

n - lps[-1] = 9 - 6 = 3

Unit:

"abc"

Since:

9 % 3 == 0

The string is repeated.


👉 Interview Answer

If a string is made by repeating a substring, then it has a large overlap between prefix and suffix.

The LPS last value captures this overlap.

The candidate repeated unit length is the total length minus that overlap.

If the total length is divisible by the unit length, the string can be formed by repeating that unit.


1️⃣6️⃣ Shortest Palindrome


Problem

Given a string s, add characters in front to make it a palindrome with minimum additions.

Example:

s = "aacecaaa"
answer = "aaacecaaa"

KMP Idea

Find the longest palindromic prefix.

Construct:

combined = s + "#" + reversed(s)

The last LPS value gives the length of the longest prefix of s that matches a suffix of reversed(s).

That is the longest palindromic prefix.


Code

def shortest_palindrome(s):
    rev = s[::-1]
    combined = s + "#" + rev

    lps = build_lps(combined)
    longest_pal_prefix = lps[-1]

    suffix = s[longest_pal_prefix:]

    return suffix[::-1] + s

👉 Interview Answer

Shortest Palindrome can be solved by finding the longest palindromic prefix.

I build s + "#" + reversed(s) and compute its LPS array.

The last LPS value gives the longest prefix of s that is also a palindrome.

Then I reverse the remaining suffix and add it to the front.


1️⃣7️⃣ KMP for Prefix-Suffix Problems


Pattern

KMP is not only for substring search.

It is also useful when the problem asks about:

longest prefix that is also suffix
repeated pattern
border of string
periodicity
palindromic prefix

Common Signal

If a problem asks:

prefix equals suffix

think about LPS.


👉 Interview Answer

KMP is useful beyond substring search.

The LPS array directly captures prefix-suffix structure.

So problems involving repeated patterns, string borders, periodicity, or longest prefix-suffix relationships can often be solved with KMP preprocessing.


1️⃣8️⃣ KMP vs Rabin-Karp


KMP

Uses:

prefix-suffix structure
deterministic matching
O(n + m)

Rabin-Karp

Uses:

rolling hash
good for multiple pattern or substring comparison
possible hash collision
average O(n + m)

Comparison

Pattern Best For
KMP Exact single-pattern matching
KMP Prefix-suffix problems
Rabin-Karp Rolling substring hash
Rabin-Karp Many substring comparisons
Trie / Aho-Corasick Many patterns

👉 Interview Answer

KMP is deterministic and uses prefix-suffix information.

Rabin-Karp uses rolling hash and is useful for substring hashing problems.

For exact single-pattern matching, KMP gives guaranteed O(n + m).

For many patterns, Trie-based approaches like Aho-Corasick may be more appropriate.


1️⃣9️⃣ KMP vs Trie


KMP

Best for:

one pattern
one text
exact substring search
prefix-suffix relationship

Trie

Best for:

many words
prefix search
dictionary matching
autocomplete
word search

Key Difference

KMP optimizes matching one pattern against a long text.

Trie organizes many strings by shared prefixes.


👉 Interview Answer

KMP and Trie solve different string problems.

KMP is best when matching one pattern against a text.

Trie is best when storing many words and answering prefix-based queries.

If the problem is about prefix search over a dictionary, Trie is usually better.

If it is about finding a pattern inside a text, KMP is a strong choice.


2️⃣0️⃣ KMP vs Two Pointers


Two Pointers

Good when:

characters are consumed once
simple comparison
palindrome check
subsequence check

KMP

Good when:

pattern may partially match then fail
need avoid rechecking
substring matching
prefix-suffix reuse

Example

Subsequence:

"abc" in "ahbgdc"

Use two pointers.

Substring:

"ll" in "hello"

Use string matching.


👉 Interview Answer

Two pointers are enough when each character can be consumed once without backtracking.

KMP is useful when a pattern can partially match and then fail, because it reuses matched prefix information.

For subsequence problems, two pointers are usually simpler.

For substring matching, KMP is more appropriate.


2️⃣1️⃣ Common Edge Cases


Empty Pattern

Usually:

return 0

for strStr.


Pattern Longer Than Text

Impossible to match.

Return:

-1

Pattern Equals Text

Return:

0

Repeated Characters

Example:

text = "aaaaaa"
pattern = "aaa"

KMP should handle overlapping matches.


No Match

Return:

-1

Single Character Pattern

LPS is:

[0]

👉 Interview Answer

Common KMP edge cases include empty pattern, pattern longer than text, pattern equal to text, repeated characters, overlapping matches, no match, and single-character patterns.

Empty pattern behavior depends on the problem, but for strStr it usually returns 0.


2️⃣2️⃣ Common Mistakes


Mistake 1: Confusing Prefix With Suffix

LPS is not any repeated substring.

It is:

longest proper prefix that is also suffix

Mistake 2: Moving Text Pointer Backward

KMP never moves the text pointer backward.

Only the pattern pointer moves using LPS.


Mistake 3: Wrong Fallback in LPS Build

On mismatch during LPS build:

length = lps[length - 1]

not:

length -= 1

Mistake 4: Forgetting j = lps[j - 1] After Match

Needed for finding all occurrences and overlapping matches.


Mistake 5: Off-by-One Return Index

When full match found:

return i - j

Mistake 6: Not Handling Empty Pattern

May cause index error.


👉 Interview Answer

Common KMP mistakes include misunderstanding LPS, moving the text pointer backward, using the wrong fallback while building LPS, forgetting to reset j after a full match, returning the wrong start index, and not handling empty patterns.


2️⃣3️⃣ When KMP Applies


Good Fit

Use KMP when the problem asks for:

find substring
first occurrence of pattern
all occurrences of pattern
exact pattern matching
longest prefix also suffix
repeated substring pattern
string periodicity
palindromic prefix

Problem Signals

Look for words like:

pattern
substring
needle
haystack
prefix
suffix
repeated
periodic
match
occurrence

👉 Interview Answer

I consider KMP when the problem involves exact substring matching or prefix-suffix structure.

Common signals include pattern search, first occurrence, all occurrences, repeated substring, longest prefix-suffix, and string periodicity.


2️⃣4️⃣ When KMP Does Not Apply


Not Good Fit

KMP may not be ideal when:

need edit distance
need approximate matching
need anagram matching
need many dictionary words
need subsequence checking
need regex matching

Better Alternatives

Problem Type Better Pattern
Subsequence check Two Pointers
Anagram matching Sliding Window
Many words prefix search Trie
Many pattern matching Aho-Corasick
Approximate matching Dynamic Programming
Regex matching DP / Automaton
Palindrome center expansion Two Pointers

👉 Interview Answer

KMP is for exact pattern matching and prefix-suffix reuse.

If the problem is about subsequences, sliding-window anagrams, dictionary prefix search, edit distance, or regex matching, other patterns are usually better.


2️⃣5️⃣ Standard Templates


Build LPS Template

def build_lps(pattern):
    lps = [0] * len(pattern)

    length = 0
    i = 1

    while i < len(pattern):
        if pattern[i] == pattern[length]:
            length += 1
            lps[i] = length
            i += 1
        else:
            if length > 0:
                length = lps[length - 1]
            else:
                lps[i] = 0
                i += 1

    return lps

KMP First Match Template

def kmp_search(text, pattern):
    if pattern == "":
        return 0

    lps = build_lps(pattern)

    i = 0
    j = 0

    while i < len(text):
        if text[i] == pattern[j]:
            i += 1
            j += 1

            if j == len(pattern):
                return i - j
        else:
            if j > 0:
                j = lps[j - 1]
            else:
                i += 1

    return -1

KMP All Matches Template

def kmp_all_matches(text, pattern):
    if pattern == "":
        return []

    lps = build_lps(pattern)

    result = []
    i = 0
    j = 0

    while i < len(text):
        if text[i] == pattern[j]:
            i += 1
            j += 1

            if j == len(pattern):
                result.append(i - j)
                j = lps[j - 1]
        else:
            if j > 0:
                j = lps[j - 1]
            else:
                i += 1

    return result

Repeated Substring Template

def repeated_substring_pattern(s):
    lps = build_lps(s)

    longest = lps[-1]
    unit_length = len(s) - longest

    return longest > 0 and len(s) % unit_length == 0

Shortest Palindrome Template

def shortest_palindrome(s):
    rev = s[::-1]
    combined = s + "#" + rev

    lps = build_lps(combined)
    longest = lps[-1]

    return s[longest:][::-1] + s

🧠 Staff-Level Answer Final


👉 Interview Answer Full Version

String matching is the problem of finding whether a pattern appears inside a text, and if so, where it appears.

The brute force approach tries every possible starting position in the text and compares the pattern character by character. In the worst case, this takes O(n * m) time.

KMP improves this by avoiding repeated comparisons. It preprocesses the pattern into an LPS array, where LPS means longest proper prefix that is also suffix.

For each position in the pattern, the LPS value tells us the length of the longest prefix of the pattern that is also a suffix of the pattern substring ending at that position.

During search, KMP uses two pointers: one pointer for the text and one pointer for the pattern. If characters match, both pointers move forward. If characters mismatch, the text pointer does not move backward. Instead, the pattern pointer jumps to lps[j - 1].

This is the key reason KMP runs in O(n + m): each text character is processed forward, and pattern fallback is controlled by the LPS array.

KMP is useful for exact substring search, first occurrence, all occurrences, and overlapping matches. It is also useful for prefix-suffix problems, such as repeated substring pattern, string periodicity, and shortest palindrome.

KMP is not always the best tool. For subsequence problems, two pointers are simpler. For anagram search, sliding window is better. For many dictionary words, Trie or Aho-Corasick may be better. For edit distance or approximate matching, dynamic programming is more appropriate.

At a senior level, I would explain: why brute force repeats work, what the LPS array represents, how LPS is built, how mismatch fallback works, why the text pointer never moves backward, how overlapping matches are handled, and when KMP should be replaced by a simpler or more specialized technique.


⭐ Final Insight

KMP 的核心不是“背代码”。

它的核心是理解:

pattern 自己内部有什么 prefix-suffix overlap, mismatch 之后哪些已经匹配的字符还能复用。

Senior-level 的重点是:

LPS 到底表示什么, 为什么是 proper prefix 和 suffix, mismatch 时为什么 j = lps[j - 1], text pointer 为什么不回退, 如何找 overlapping matches, repeated substring 为什么可以用 lps[-1], 以及什么时候 KMP 不如 sliding window、two pointers 或 Trie。

能讲清楚这些, KMP 就是一种处理 exact substring matching 和 prefix-suffix reuse 的核心字符串算法。



中文部分


🎯 String Matching (KMP)


1️⃣ 核心框架

讨论 String Matching (KMP) 时,我通常从:

  1. String matching 是什么
  2. Brute force string matching
  3. 为什么 brute force 低效
  4. Prefix 和 suffix 思想
  5. LPS array
  6. KMP 如何使用 LPS
  7. Build LPS array
  8. KMP search process
  9. 常见 KMP problems
  10. 高级工程师解释方式

2️⃣ 要解决什么问题?

String matching 问的是:

pattern 是否存在于 text 中?
如果存在,它从哪里开始?

它常用于:

核心思想是:

KMP 通过利用 pattern 自身的信息,
避免重复检查已经匹配过的 characters。

👉 面试回答

String matching 是在 text 中查找 pattern 是否出现的问题。

Brute force 会在每个可能的起点重新比较 pattern。

KMP 通过预处理 pattern 的 prefix-suffix 信息来优化。

当 mismatch 发生时, KMP 不会把 text pointer 往回移动。 它会使用 LPS array 判断 pattern pointer 应该跳到哪里。


3️⃣ Brute Force String Matching


Problem

给定:

text = "hello"
pattern = "ll"

返回:

2

因为 "ll" 从 index 2 开始。


Brute Force Idea

尝试 text 中每个 starting position。

对每个位置, 逐个比较 pattern characters。


Code

def str_str(text, pattern):
    if pattern == "":
        return 0

    n = len(text)
    m = len(pattern)

    for i in range(n - m + 1):
        j = 0

        while j < m and text[i + j] == pattern[j]:
            j += 1

        if j == m:
            return i

    return -1

Complexity

Time: O(n * m)
Space: O(1)

👉 面试回答

Brute force solution 会尝试 text 中每个可能的起点。

在每个起点, 它逐个 character 比较 pattern。

最坏情况下, time complexity 是 O(n * m), 其中 n 是 text length, m 是 pattern length。


4️⃣ 为什么 Brute Force 低效?


Example

text    = "aaaaaaaaab"
pattern = "aaaab"

Brute force 会反复比较很多相同的 characters。

每个 starting position 都会匹配多个 a, 然后在接近结尾处失败。


Problem

Brute force 忘掉了有用的信息。

如果我们已经匹配了:

aaaa

然后失败, 不应该完全从头开始。

Pattern 内部可能存在可以复用的 overlap。


KMP Improvement

KMP 问的是:

Mismatch 后,
已经匹配的部分中,
最长的 pattern prefix 同时也是 suffix 是多少?

这样可以智能移动 pattern。


👉 面试回答

Brute force 低效, 因为它会重新检查已经匹配过的 characters。

KMP 利用 pattern 的 prefix-suffix 信息避免重复比较。

当 mismatch 发生时, KMP 知道之前匹配的部分有多少还能继续复用, 所以可以减少无效比较。


5️⃣ Prefix and Suffix


Prefix

Prefix 是从开头开始的 substring。

对于:

"abab"

Prefixes 是:

"a"
"ab"
"aba"
"abab"

Suffix

Suffix 是以结尾结束的 substring。

对于:

"abab"

Suffixes 是:

"b"
"ab"
"bab"
"abab"

Proper Prefix / Proper Suffix

KMP 中使用 proper prefix 和 proper suffix。

意思是不能包含整个 string 本身。

对于:

"abab"

Longest proper prefix that is also suffix 是:

"ab"

长度是:

2

👉 面试回答

KMP 基于 prefix 和 suffix 的 overlap。

对 pattern 的某个 prefix substring, 我们想知道最长的 proper prefix, 同时也是 suffix。

这个信息告诉我们 mismatch 后有多少 pattern 可以复用。


6️⃣ LPS Array


What Is LPS?

LPS 表示:

Longest Proper Prefix which is also Suffix

对每个 index ilps[i] 表示:

pattern[0:i+1] 中,
最长的 proper prefix 同时也是 suffix 的长度

Example

pattern = "ababaca"

LPS:

index:   0 1 2 3 4 5 6
char:    a b a b a c a
lps:     0 0 1 2 3 0 1

Meaning

对于 substring:

"ababa"

Longest proper prefix that is also suffix 是:

"aba"

所以:

lps[4] = 3

👉 面试回答

LPS array 存的是 pattern 每个 prefix 中, 最长 proper prefix 同时也是 suffix 的长度。

它是 KMP 的核心预处理步骤。

Search 过程中发生 mismatch 时, LPS array 会告诉我们 pattern pointer 应该跳到哪里继续。


7️⃣ LPS 的直觉


Pattern

pattern = "ababaca"

假设已经匹配:

ababa

然后发生 mismatch。

已经匹配的部分是:

ababa

它的最长 prefix-suffix 是:

aba

所以不需要从 pattern index 0 重新开始。

可以从 pattern index 3 继续。


Why?

因为 text 中已经匹配的 suffix:

aba

可以作为下一次可能匹配的 pattern prefix。


👉 面试回答

LPS 告诉我们 mismatch 后, 已经匹配过的 pattern 中哪些部分仍然有用。

如果已匹配 substring 的 suffix 等于 pattern 的 prefix, 就不需要重新比较这个 prefix。

可以直接把 pattern pointer 跳到 LPS value。


8️⃣ Build LPS Array


Two Pointers

使用:

i = 当前正在计算的位置
length = 当前 prefix-suffix match 的长度

初始:

i = 1
length = 0

因为:

lps[0] = 0

Logic

如果:

pattern[i] == pattern[length]

说明可以扩展当前 prefix-suffix:

length += 1
lps[i] = length
i += 1

如果 mismatch:

length = lps[length - 1]

如果 length 已经是 0:

lps[i] = 0
i += 1

👉 面试回答

构建 LPS array 时, 我用两个 pointers 扫描 pattern。

i 表示当前正在计算的位置, length 表示当前 prefix-suffix match 的长度。

如果 characters match, 就扩展 match。

如果 mismatch, 就通过之前的 LPS value 回退 length。


9️⃣ Build LPS Code


Code

def build_lps(pattern):
    m = len(pattern)
    lps = [0] * m

    length = 0
    i = 1

    while i < m:
        if pattern[i] == pattern[length]:
            length += 1
            lps[i] = length
            i += 1
        else:
            if length != 0:
                length = lps[length - 1]
            else:
                lps[i] = 0
                i += 1

    return lps

Complexity

Time: O(m)
Space: O(m)

每个 pointer 要么前进, 要么通过已经计算好的 LPS 回退。


👉 面试回答

构建 LPS array 是 O(m) time。

虽然 length pointer 会 fallback, 但不会导致完整重新扫描。

LPS array 使用 O(m) space, m 是 pattern length。


🔟 KMP Search Process


Two Pointers

使用:

i = text pointer
j = pattern pointer

Match

如果:

text[i] == pattern[j]

两个 pointer 都前进:

i += 1
j += 1

Full Match

如果:

j == len(pattern)

Pattern 起始位置是:

i - j

Mismatch

如果 mismatch 且 j != 0

j = lps[j - 1]

如果 mismatch 且 j == 0

i += 1

👉 面试回答

KMP 使用两个 pointers: 一个指向 text, 一个指向 pattern。

如果 characters match, 两个 pointers 都向前走。

如果 mismatch, text pointer 不回退。

Pattern pointer 会根据 LPS array 跳转。


1️⃣1️⃣ KMP Search Code


Code

def kmp_search(text, pattern):
    if pattern == "":
        return 0

    n = len(text)
    m = len(pattern)

    lps = build_lps(pattern)

    i = 0
    j = 0

    while i < n:
        if text[i] == pattern[j]:
            i += 1
            j += 1

            if j == m:
                return i - j
        else:
            if j != 0:
                j = lps[j - 1]
            else:
                i += 1

    return -1

Complexity

Time: O(n + m)
Space: O(m)

👉 面试回答

KMP search 是 O(n + m)。

构建 LPS array 是 O(m), 扫描 text 是 O(n)。

关键是 text pointer 永远不会往回走。


1️⃣2️⃣ Implement strStr()


Problem

返回 needlehaystack 中第一次出现的 index。

如果没有出现, 返回 -1


Code

def str_str(haystack, needle):
    if needle == "":
        return 0

    lps = build_lps(needle)

    i = 0
    j = 0

    while i < len(haystack):
        if haystack[i] == needle[j]:
            i += 1
            j += 1

            if j == len(needle):
                return i - j
        else:
            if j > 0:
                j = lps[j - 1]
            else:
                i += 1

    return -1

👉 面试回答

Implement strStr 可以用 KMP。

我先把 needle 预处理成 LPS array。

然后用两个 pointers 扫描 haystack。

当 mismatch 发生时, 使用 LPS 移动 pattern pointer, 而不是完全重新开始。


1️⃣3️⃣ Find All Occurrences


Problem

返回 pattern 在 text 中出现的所有 starting indices。


Key Difference

找到一个 match 后不要停止。

而是继续:

j = lps[j - 1]

这样可以处理 overlapping matches。


Code

def find_all_occurrences(text, pattern):
    if pattern == "":
        return []

    lps = build_lps(pattern)

    result = []
    i = 0
    j = 0

    while i < len(text):
        if text[i] == pattern[j]:
            i += 1
            j += 1

            if j == len(pattern):
                result.append(i - j)
                j = lps[j - 1]
        else:
            if j > 0:
                j = lps[j - 1]
            else:
                i += 1

    return result

Example

text = "aaaa"
pattern = "aa"

Output:

[0, 1, 2]

👉 面试回答

要找到所有 occurrences, KMP 在找到一个 match 后继续搜索。

记录 match 后, 把 pattern pointer 设置成 lps[j - 1]

这样可以找到 overlapping matches。


1️⃣4️⃣ Repeated Substring Pattern


Problem

给定一个 string, 判断它是否可以由某个 substring 重复多次构成。

Example:

"abab" -> True
"aba"  -> False
"abcabcabc" -> True

LPS Insight

对 string s, 令:

longest_prefix_suffix = lps[-1]

如果这个值大于 0, 并且:

n % (n - longest_prefix_suffix) == 0

说明 string 是重复构成的。


Code

def repeated_substring_pattern(s):
    lps = build_lps(s)
    n = len(s)

    longest = lps[-1]
    unit_length = n - longest

    return longest > 0 and n % unit_length == 0

👉 面试回答

Repeated Substring Pattern 可以用 LPS array。

如果一个 string 有很长的 prefix 同时也是 suffix, 它可能由重复 unit 构成。

Repeated unit length 是 n - lps[-1]

如果 n 可以被这个 length 整除, 说明 string 可以由该 unit 重复构成。


1️⃣5️⃣ Why Repeated Substring Works


Example

s = "abcabcabc"

LPS last value:

6

因为:

"abcabc"

同时是 prefix 和 suffix。

所以 unit length 是:

n - lps[-1] = 9 - 6 = 3

Unit 是:

"abc"

因为:

9 % 3 == 0

所以 string 是 repeated pattern。


👉 面试回答

如果一个 string 是由 substring 重复构成的, 它的 prefix 和 suffix 会有很大的 overlap。

LPS 的最后一个值捕捉了这个 overlap。

候选 repeated unit length 是 total length 减去这个 overlap。

如果 total length 能被 unit length 整除, 说明可以由这个 unit 重复构成。


1️⃣6️⃣ Shortest Palindrome


Problem

给定 string s, 在前面添加最少 characters, 使它变成 palindrome。

Example:

s = "aacecaaa"
answer = "aaacecaaa"

KMP Idea

找到 longest palindromic prefix。

构造:

combined = s + "#" + reversed(s)

最后一个 LPS value 表示: s 的最长 prefix, 同时匹配 reversed(s) 的 suffix。

这就是 longest palindromic prefix。


Code

def shortest_palindrome(s):
    rev = s[::-1]
    combined = s + "#" + rev

    lps = build_lps(combined)
    longest_pal_prefix = lps[-1]

    suffix = s[longest_pal_prefix:]

    return suffix[::-1] + s

👉 面试回答

Shortest Palindrome 可以转化为找到 longest palindromic prefix。

我构造 s + "#" + reversed(s), 然后计算 LPS array。

最后一个 LPS value 就是 s 的最长 palindromic prefix 长度。

然后把剩余 suffix reverse 后加到前面。


1️⃣7️⃣ KMP for Prefix-Suffix Problems


Pattern

KMP 不只用于 substring search。

它也适合处理:

longest prefix that is also suffix
repeated pattern
border of string
periodicity
palindromic prefix

Common Signal

如果题目问:

prefix equals suffix

就可以考虑 LPS。


👉 面试回答

KMP 不只是 substring search。

LPS array 本身就表示 prefix-suffix structure。

所以涉及 repeated pattern、string border、 periodicity 或 longest prefix-suffix relationship 的问题, 经常可以用 KMP preprocessing 解决。


1️⃣8️⃣ KMP vs Rabin-Karp


KMP

使用:

prefix-suffix structure
deterministic matching
O(n + m)

Rabin-Karp

使用:

rolling hash
good for multiple pattern or substring comparison
possible hash collision
average O(n + m)

Comparison

Pattern Best For
KMP Exact single-pattern matching
KMP Prefix-suffix problems
Rabin-Karp Rolling substring hash
Rabin-Karp Many substring comparisons
Trie / Aho-Corasick Many patterns

👉 面试回答

KMP 是 deterministic algorithm, 使用 prefix-suffix information。

Rabin-Karp 使用 rolling hash, 适合 substring hashing 问题。

对 exact single-pattern matching, KMP 提供 guaranteed O(n + m)。

如果是 many patterns, Trie-based approaches,比如 Aho-Corasick, 可能更合适。


1️⃣9️⃣ KMP vs Trie


KMP

适合:

one pattern
one text
exact substring search
prefix-suffix relationship

Trie

适合:

many words
prefix search
dictionary matching
autocomplete
word search

Key Difference

KMP 优化的是一个 pattern 和一个长 text 的匹配。

Trie 组织的是很多 strings 的 shared prefixes。


👉 面试回答

KMP 和 Trie 解决的是不同 string problems。

KMP 适合把一个 pattern 放到 text 中匹配。

Trie 适合存很多 words, 并做 prefix-based queries。

如果问题是 dictionary prefix search, Trie 通常更好。

如果问题是在 text 中找 pattern, KMP 是很好的选择。


2️⃣0️⃣ KMP vs Two Pointers


Two Pointers

适合:

characters are consumed once
simple comparison
palindrome check
subsequence check

KMP

适合:

pattern may partially match then fail
need avoid rechecking
substring matching
prefix-suffix reuse

Example

Subsequence:

"abc" in "ahbgdc"

用 two pointers。

Substring:

"ll" in "hello"

用 string matching。


👉 面试回答

当每个 character 只需要被消费一次, 不需要复杂回退时, two pointers 通常足够。

KMP 适合 pattern 会部分匹配然后失败的情况, 因为它可以复用已经匹配的 prefix 信息。

对 subsequence problems, two pointers 更简单。

对 substring matching, KMP 更合适。


2️⃣1️⃣ 常见 Edge Cases


Empty Pattern

通常:

return 0

对于 strStr


Pattern Longer Than Text

不可能匹配。

返回:

-1

Pattern Equals Text

返回:

0

Repeated Characters

Example:

text = "aaaaaa"
pattern = "aaa"

KMP 应该能处理 overlapping matches。


No Match

返回:

-1

Single Character Pattern

LPS 是:

[0]

👉 面试回答

KMP 常见 edge cases 包括 empty pattern、 pattern longer than text、 pattern equal to text、 repeated characters、 overlapping matches、 no match, 以及 single-character pattern。

Empty pattern 的行为取决于题目, 但 strStr 中通常返回 0。


2️⃣2️⃣ 常见错误


Mistake 1: 混淆 Prefix 和 Suffix

LPS 不是任意 repeated substring。

它是:

longest proper prefix that is also suffix

Mistake 2: 移动 Text Pointer Backward

KMP 不会让 text pointer 后退。

只会通过 LPS 移动 pattern pointer。


Mistake 3: Build LPS 时 Fallback 错误

LPS build mismatch 时应该:

length = lps[length - 1]

不是:

length -= 1

Mistake 4: Match 后忘记 j = lps[j - 1]

找 all occurrences 和 overlapping matches 时需要。


Mistake 5: Return Index Off-by-One

找到完整 match 时:

return i - j

Mistake 6: 没处理 Empty Pattern

可能导致 index error。


👉 面试回答

KMP 常见错误包括: 误解 LPS、 让 text pointer 后退、 build LPS 时使用错误 fallback、 完整 match 后忘记 reset j、 返回 start index 时 off-by-one, 以及没有处理 empty pattern。


2️⃣3️⃣ 什么时候 KMP 适用?


Good Fit

当题目问:

find substring
first occurrence of pattern
all occurrences of pattern
exact pattern matching
longest prefix also suffix
repeated substring pattern
string periodicity
palindromic prefix

Problem Signals

看到这些关键词想到 KMP:

pattern
substring
needle
haystack
prefix
suffix
repeated
periodic
match
occurrence

👉 面试回答

当问题涉及 exact substring matching 或 prefix-suffix structure 时, 我会考虑 KMP。

常见信号包括 pattern search、first occurrence、 all occurrences、repeated substring、 longest prefix-suffix 和 string periodicity。


2️⃣4️⃣ 什么时候 KMP 不适用?


Not Good Fit

KMP 不一定适合:

need edit distance
need approximate matching
need anagram matching
need many dictionary words
need subsequence checking
need regex matching

Better Alternatives

Problem Type Better Pattern
Subsequence check Two Pointers
Anagram matching Sliding Window
Many words prefix search Trie
Many pattern matching Aho-Corasick
Approximate matching Dynamic Programming
Regex matching DP / Automaton
Palindrome center expansion Two Pointers

👉 面试回答

KMP 适合 exact pattern matching 和 prefix-suffix reuse。

如果问题是 subsequence、sliding-window anagram、 dictionary prefix search、edit distance, 或 regex matching, 通常有更合适的 pattern。


2️⃣5️⃣ 标准模板


Build LPS Template

def build_lps(pattern):
    lps = [0] * len(pattern)

    length = 0
    i = 1

    while i < len(pattern):
        if pattern[i] == pattern[length]:
            length += 1
            lps[i] = length
            i += 1
        else:
            if length > 0:
                length = lps[length - 1]
            else:
                lps[i] = 0
                i += 1

    return lps

KMP First Match Template

def kmp_search(text, pattern):
    if pattern == "":
        return 0

    lps = build_lps(pattern)

    i = 0
    j = 0

    while i < len(text):
        if text[i] == pattern[j]:
            i += 1
            j += 1

            if j == len(pattern):
                return i - j
        else:
            if j > 0:
                j = lps[j - 1]
            else:
                i += 1

    return -1

KMP All Matches Template

def kmp_all_matches(text, pattern):
    if pattern == "":
        return []

    lps = build_lps(pattern)

    result = []
    i = 0
    j = 0

    while i < len(text):
        if text[i] == pattern[j]:
            i += 1
            j += 1

            if j == len(pattern):
                result.append(i - j)
                j = lps[j - 1]
        else:
            if j > 0:
                j = lps[j - 1]
            else:
                i += 1

    return result

Repeated Substring Template

def repeated_substring_pattern(s):
    lps = build_lps(s)

    longest = lps[-1]
    unit_length = len(s) - longest

    return longest > 0 and len(s) % unit_length == 0

Shortest Palindrome Template

def shortest_palindrome(s):
    rev = s[::-1]
    combined = s + "#" + rev

    lps = build_lps(combined)
    longest = lps[-1]

    return s[longest:][::-1] + s

🧠 Staff-Level Answer Final


👉 面试回答完整版本

String matching 是在 text 中查找 pattern 是否出现, 以及出现在哪里的问题。

Brute force approach 会尝试 text 中每一个 possible starting position, 然后逐个 character 比较 pattern。 最坏情况下, 它是 O(n * m) time。

KMP 的优化点是避免重复比较。 它会把 pattern 预处理成 LPS array。 LPS 表示 longest proper prefix that is also suffix。

对 pattern 的每个位置, LPS value 表示 pattern 当前 prefix substring 中, 最长的 prefix 同时也是 suffix 的长度。

Search 时, KMP 使用两个 pointers: 一个指向 text, 一个指向 pattern。 如果 characters match, 两个 pointers 都前进。 如果 mismatch, text pointer 不会回退。 Pattern pointer 会跳到 lps[j - 1]

这是 KMP 能达到 O(n + m) 的关键: 每个 text character 都只向前处理, pattern fallback 由 LPS array 控制。

KMP 适合 exact substring search、 first occurrence、 all occurrences, 以及 overlapping matches。 它也适合 prefix-suffix problems, 比如 repeated substring pattern、 string periodicity 和 shortest palindrome。

KMP 不一定总是最好的工具。 对 subsequence problems, two pointers 更简单。 对 anagram search, sliding window 更好。 对 many dictionary words, Trie 或 Aho-Corasick 更合适。 对 edit distance 或 approximate matching, dynamic programming 更合适。

高级工程师解释 KMP 时, 我会讲清楚: brute force 为什么重复做功, LPS array 表示什么, LPS 如何构建, mismatch fallback 如何工作, 为什么 text pointer 不回退, overlapping matches 如何处理, 以及什么时候 KMP 应该被更简单或更专业的技术替代。


⭐ Final Insight

KMP 的核心不是“背代码”。

它的核心是理解:

pattern 自己内部有什么 prefix-suffix overlap, mismatch 之后哪些已经匹配的字符还能复用。

Senior-level 的重点是:

LPS 到底表示什么, 为什么是 proper prefix 和 suffix, mismatch 时为什么 j = lps[j - 1], text pointer 为什么不回退, 如何找 overlapping matches, repeated substring 为什么可以用 lps[-1], 以及什么时候 KMP 不如 sliding window、two pointers 或 Trie。

能讲清楚这些, KMP 就是一种处理 exact substring matching 和 prefix-suffix reuse 的核心字符串算法。

Implement