LeetCode 187. Repeated DNA Sequences

Post by ailswan Oct.25, 2023

187. Repeated DNA Sequences

Problem Statement

link: https://leetcode.com/problems/repeated-dna-sequences/ https://leetcode.cn/problems/repeated-dna-sequences/

The DNA sequence is composed of a series of nucleotides abbreviated as ‘A’, ‘C’, ‘G’, and ‘T’.

For example, “ACGAATTCCG” is a DNA sequence. When studying DNA, it is useful to identify repeated sequences within the DNA.

Given a string s that represents a DNA sequence, return all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule. You may return the answer in any order.

Example:

Input: s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT" Output: ["AAAAACCCCC","CCCCCAAAAA"]

Input: s = "AAAAAAAAAAAAA" Output: ["AAAAAAAAAA"]

Solution Approach

Use a hash table to track the frequency of every 10-letter sequence in the DNA.

Algorithm

  1. Initialize a dictionary to keep a count of each 10-letter substring encountered in the DNA sequence.
  2. Loop through the string and for each 10-letter substring, update its count in the dictionary.
  3. Filter out and return the sequences that have a count greater than or equal to 2.

Implement

    class Solution:
    def findRepeatedDnaSequences(self, s: str) -> List[str]:
        dic = defaultdict(int)
        res = []

        for i in range(len(s) - 9):
            sub = s[i:i + 10]
            dic[sub] += 1
        print(dic)
        for k, v in dic.items():
            if v >= 2:
                res.append(k)
        return res