How do I use a list of strings with frequencies in a Trie data structure?

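I am running some performance tests on various data structures. My list includes HashMap and Trie. I have finished the HashMap test, but I am not sure how to use a Trie for the following problem:

I have a text file containing 2 million English words with their frequencies, in this format: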

hello 100
world 5000
good 2000
bad 9000
...

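I read this file line by line and store it in a HashMap: the first token of each split line is the key and the second token is the value. That lets me measure insertion performance with the code below.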

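import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.HashMap;
import java.util.Map;

Map<String, String> wordTest = new HashMap<String, String>();

// try-with-resources closes the reader even if reading fails
try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(new FileInputStream(FILE_LOCATION)))) {
    String line = reader.readLine();
    while (line != null) {
        String[] splitString = line.split("\\s+");
        // store the word as the key and its frequency as the value
        wordTest.put(splitString[0].toLowerCase().trim(), splitString[1].trim());

        line = reader.readLine();
    }
} catch (IOException e) {
    e.printStackTrace();
}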

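Now, how would I implement a Trie so that I can load the same contents into it as into the HashMap, and then perform lookups by String? This is my first time using a Trie, so I am a little confused.

Update:

Below is my TrieImpl class: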

public class TrieImpl {

    //root node
    private TrieNode r;

    public TrieImpl() {
        r = new TrieNode();
    }

    public boolean has(String word) {
        return r.has(word);
    }

    public void insert(String word){
        r.insert(word);
    }

    public String toString() {
        return r.toString();
    }

    public static void main(String[] args) {

        TrieImpl t = new TrieImpl();

        System.out.println("Testing some strings");
        t.insert("HELLO"); // how do I pass string and its count
        t.insert("WORLD"); // how do I pass string and its count

    }
}

Below is my TrieNode class:

public class TrieNode {

    // make child nodes
    private TrieNode[] c;
    // flag for end of word
    private boolean flag = false;

    public TrieNode() {
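        // 27 slots: the code indexes with (letter - 64), so 'A'..'Z' map to
        // 1..26 and slot 0 stays unused; only uppercase words are supported
        c = new TrieNode[27];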
    }

    protected void insert(String word) {
        int val = word.charAt(0) - 64;

        // if the value of the child node at val is null, make a new node
        // there to represent the letter
        if (c[val] == null) {
            c[val] = new TrieNode();
        }

        // if word length > 1, then word is not finished being added.
        // otherwise, set the flag to true so we know a word ends there.
        if (word.length() > 1) {
            c[val].insert(word.substring(1));
        } else {
            c[val].flag = true;
        }
    }

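    public boolean has(String word) {
        int val = word.charAt(0) - 64;
        // no child for this letter means the word was never inserted
        if (c[val] == null) {
            return false;
        }
        // more letters remain: return the result of the recursive lookup
        if (word.length() > 1) {
            return c[val].has(word.substring(1));
        }
        // last letter: the word exists only if a word ends at this node
        return c[val].flag;
    }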

    public String toString() {
        return "";
    }
}

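Now, how would I extend it so that I can pass in a string together with its count, and then look that string up?

Best answer: You just need to add a frequency field to the TrieNode class.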

public class TrieNode {

    // make child nodes
    private TrieNode[] c;
    // flag for end of word
    private boolean flag = false;
    //stores frequency if flag is set
    private int frequency;

Now, in the insert method, store the frequency at the point where the flag is set, and change the method signature accordingly.

protected void insert(String word, int frequency) {
    int val = word.charAt(0) - 64;
    // if the value of the child node at val is null, make a new node
    ..........
    ..........
    if (word.length() > 1) {
        c[val].insert(word.substring(1), frequency);
    } else {
        c[val].flag = true;
        // a word ends here, so remember its frequency
        c[val].frequency = frequency;
    }
}

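Now create a new method to get the frequency. It can be similar to the has method: follow the branches to the end of the word and, when you find the flag set, return the frequency.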

public int getFreq(String word) {
    int val = word.charAt(0) - 64;
    // missing branch: the word is not in the trie
    if (c[val] == null) {
        return -1;
    }
    if (word.length() > 1) {
        return c[val].getFreq(word.substring(1));
    } else if (c[val].flag) {
        return c[val].frequency;
    }
    return -1;
}
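
Putting the two pieces together, the following is a minimal, self-contained sketch of a trie that stores a frequency at the node where each word ends. The class name FreqTrie, the nested Node class, and the iterative loops are assumptions made for illustration, not the code from the question. It indexes children with ch - 'A', so it assumes uppercase A to Z input only, and it returns -1 for words that are absent.

```java
// Hypothetical sketch: a trie over uppercase A-Z that keeps a frequency per word.
class FreqTrie {

    private static final class Node {
        final Node[] children = new Node[26]; // one slot per letter A-Z
        boolean endOfWord;                    // true if a word ends at this node
        int frequency;                        // meaningful only when endOfWord is true
    }

    private final Node root = new Node();

    // Inserts the word and records its frequency (overwrites an earlier value).
    public void insert(String word, int frequency) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            int idx = word.charAt(i) - 'A';
            if (idx < 0 || idx >= 26) {
                throw new IllegalArgumentException("only A-Z supported: " + word);
            }
            if (node.children[idx] == null) {
                node.children[idx] = new Node();
            }
            node = node.children[idx];
        }
        node.endOfWord = true;
        node.frequency = frequency;
    }

    // Returns the stored frequency, or -1 if the word was never inserted.
    public int getFreq(String word) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            int idx = word.charAt(i) - 'A';
            if (idx < 0 || idx >= 26 || node.children[idx] == null) {
                return -1;
            }
            node = node.children[idx];
        }
        return node.endOfWord ? node.frequency : -1;
    }

    public static void main(String[] args) {
        FreqTrie trie = new FreqTrie();
        trie.insert("HELLO", 100);
        trie.insert("WORLD", 5000);
        System.out.println(trie.getFreq("HELLO")); // prints 100
        System.out.println(trie.getFreq("HELL"));  // prints -1 (prefix only)
    }
}
```

The loops replace the recursion on word.substring(1), which avoids allocating a new String for every letter; that may matter when inserting millions of words in a performance test.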

---------------- Edit ----------------

First check that the string is present with the has method, then retrieve its frequency:

    public int getFreq(String word) {
        if (has(word))
            return getFreqHelper(word);
        else
            return -1; // this indicates the word is not present
    }

    private int getFreqHelper(String word) {
        int val = word.charAt(0) - 64;
        if (word.length() > 1) {
            // recurse into the helper; has() has already verified the path
            return c[val].getFreqHelper(word.substring(1));
        } else if (c[val].flag && word.length() == 1) {
            return c[val].frequency;
        } else
            return -1;
    }
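
To connect this back to the original file-loading question, here is a hedged sketch of reading "word frequency" lines into a trie in the same way the HashMap was filled. Everything named here (WordFreqLoader, load, the nested Node class) is hypothetical. Unlike the array-based node above, this variant keeps children in a HashMap keyed by character, so lowercase words work without index arithmetic, and it uses -1 as a "no word ends here" marker, which assumes frequencies are never negative. The sample reads from a StringReader; a FileReader on the real file could be passed instead.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: load "word frequency" lines into a trie.
class WordFreqLoader {

    static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        int frequency = -1; // -1 means no word ends at this node
    }

    final Node root = new Node();

    // Walks (and creates) one node per character, then stores the frequency.
    void insert(String word, int frequency) {
        Node node = root;
        for (char ch : word.toCharArray()) {
            node = node.children.computeIfAbsent(ch, k -> new Node());
        }
        node.frequency = frequency;
    }

    // Returns the stored frequency, or -1 if the word is absent.
    int getFreq(String word) {
        Node node = root;
        for (char ch : word.toCharArray()) {
            node = node.children.get(ch);
            if (node == null) {
                return -1;
            }
        }
        return node.frequency;
    }

    // Reads lines of the form "word frequency"; malformed lines are skipped.
    static WordFreqLoader load(Reader source) throws IOException {
        WordFreqLoader trie = new WordFreqLoader();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.trim().split("\\s+");
                if (parts.length < 2) {
                    continue;
                }
                try {
                    trie.insert(parts[0].toLowerCase(), Integer.parseInt(parts[1]));
                } catch (NumberFormatException e) {
                    // second field is not an integer: skip this line
                }
            }
        }
        return trie;
    }

    public static void main(String[] args) throws IOException {
        String sample = "hello 100\nworld 5000\ngood 2000\nbad 9000\n";
        WordFreqLoader trie = load(new StringReader(sample));
        System.out.println(trie.getFreq("world")); // prints 5000
        System.out.println(trie.getFreq("wor"));   // prints -1
    }
}
```

The map-based node trades some speed and memory per node for flexibility; for a benchmark restricted to plain English letters, the fixed-size array from the answer is likely the leaner choice.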