187. Repeated DNA Sequences
Bit Maniputation, Hash Table, String, Sliding Window, Hash Function, Rolling Hash ·Problem Statement
link: https://leetcode.com/problems/repeated-dna-sequences/ https://leetcode.cn/problems/repeated-dna-sequences/
The DNA sequence is composed of a series of nucleotides abbreviated as ‘A’, ‘C’, ‘G’, and ‘T’.
For example, “ACGAATTCCG” is a DNA sequence. When studying DNA, it is useful to identify repeated sequences within the DNA.
Given a string s that represents a DNA sequence, return all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule. You may return the answer in any order.
Example:
Input: s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"
Output: ["AAAAACCCCC","CCCCCAAAAA"]
Input: s = "AAAAAAAAAAAAA"
Output: ["AAAAAAAAAA"]
Solution Approach
Use a hash table to track the frequency of every 10-letter sequence in the DNA.
Algorithm
- Initialize a dictionary to keep a count of each 10-letter substring encountered in the DNA sequence.
- Loop through the string and for each 10-letter substring, update its count in the dictionary.
- Filter out and return the sequences that have a count greater than or equal to 2.
Implement
class Solution:
def findRepeatedDnaSequences(self, s: str) -> List[str]:
dic = defaultdict(int)
res = []
for i in range(len(s) - 9):
sub = s[i:i + 10]
dic[sub] += 1
print(dic)
for k, v in dic.items():
if v >= 2:
res.append(k)
return res