Leetcode - Repeated DNA Sequences

My code:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

public class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        ArrayList<String> ret = new ArrayList<String>();
        if (s == null || s.length() <= 10)
            return ret;
        HashMap<Integer, Integer> map = new HashMap<Integer, Integer>();
        for (int i = 0; i <= s.length() - 10; i++) {
            String tmp = s.substring(i, i + 10);
            int key = toNum(tmp);
            if (!map.containsKey(key)) {
                map.put(key, 1);
            }
            else {
                int val = map.get(key);
                val++;
                map.put(key, val);
                if (val == 2)
                    ret.add(tmp);
            }
        }
        return ret;
    }
    
    private int toNum(String tmp) {
        int ret = 0;
        for (int i = 0; i < tmp.length(); i++) {
            char temp = tmp.charAt(i);
            int b = 0;
            switch (temp) {
            case 'A':
                b |= 0;
                break;
            case 'C':
                b |= 1;
                break;
            case 'G':
                b |= 2;
                break;
            case 'T':
                b |= 3;
                break;
            default:
                break;
            }
            ret |= b;
            ret = (ret << 2);
        }
        return ret;
    }
}

这道题木我也没做出来。状态不是很好。
然后看了解法,感觉实在是好暴力。。。
做法就是,
把一个字符串,每个10位都存放到hashmap中。
也就是不停地移动一个字母,然后把接下来的10位字符串存入hashmap并且记录次数。
然后如果key是string,内存会爆掉。
于是将string编码。
A: 1
C: 2
G: 3
T: 4
然后一个10位的字符串可以由一个20比特的数字表示,也就是一个int(32位)来表示。
然后就可以做了。
我参考了这个博客。
http://betterpoetrythancode.blogspot.com/2015/02/repeated-dna-sequences-leetcode-bit.html

**
总结:
hashmap, 编码 <- bit manipulation
**
Anyway, Good luck, Richardo!

My code:

public class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        List<String> ret = new ArrayList<String>();
        if (s == null || s.length() == 0) {
            return ret;
        }
        
        int end = 10;
        HashMap<Integer, Integer> map = new HashMap<Integer, Integer>();
        HashSet<String> repeat = new HashSet<String>();
        while (end <= s.length()) {
            String sub = s.substring(end - 10, end);
            int hashcode = hashCode(sub);
            if (!map.containsKey(hashcode)) {
                map.put(hashcode, 1);
            }
            else {
                map.put(hashcode, map.get(hashcode) + 1);
                if (!repeat.contains(sub)) {
                    repeat.add(sub);
                    ret.add(sub);
                }
            }
            end++;
        }
        
        return ret;
    }
    
    private int hashCode(String s) {
        int ret = 0;
        for (int i = 0; i < s.length(); i++) {
            char curr = s.charAt(i);
            int b = 0;
            switch (curr) {
                case 'A':
                    b |= 0;
                    break;
                case 'C':
                    b |= 1;
                    break;
                case 'G':
                    b |= 2;
                    break;
                case 'T':
                    b |= 3;
                    break;
                default:
                    break;
            }
            ret |= b;
            ret = (ret << 2);
        }
        return ret;
    }
}

reference:
https://discuss.leetcode.com/topic/8894/clean-java-solution-hashmap-bits-manipulation

一开始自己写出来了纯 HashMap的解法。很简单。

然后以为 Bit manipulation 是加速的,其实不是,是用来节省内存空间的,类似于一种 hashcode

因为 HashMap key is string, memory cost is big
each string has 10 characters.
each character in Java is 16 bit
one string should domain 160 bits

If we use Integer rather than String as key, the cost is 32 bits per key

So how to use Integer to represent a 10-character string?

We try to design some hash functions.

A = 0X00
B = 0X01
C = 0X10
D = 0X11

根据每个character,然后不停地异或,不停地移位,向左移两位,
得出这个integer。
其他的差不多了。为了避免重复,还需要多加一个set
以前的做法,没加,看来testcase有问题。

Anyway, Good luck, Richardo! — 09/22/2016

    原文作者:Richardo92
    原文地址: https://www.jianshu.com/p/f7d38129c04a#comments
    本文转自网络文章,转载此文章仅为分享知识,如有侵权,请联系博主进行删除。
点赞