C++ Implementation of a Trie (Dictionary Tree)

March 16, 2019

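Problem description:

Trie

A trie, also known as a prefix tree or word-lookup tree, is a tree structure and a variant of the hash tree. It is typically used to count, sort, and store large numbers of strings (though it is not limited to strings), which is why search engines often use it for word-frequency statistics over text.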

For example, the keywords os, oh, old, char, and chat form the following trie:

          root
         /    \
        c      o
        |    / | \
        h   h  l  s
        |      |
        a      d
       / \
      r   t

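Properties of a trie:

① The root node holds no character; every other node holds exactly one character.

② The maximum number of branches per node equals the number of possible character values (26 for lowercase English letters).

③ The word that a node represents is the concatenation, in order, of the characters on the path from the root to that node.

④ The common prefix of two words is the shared part of the paths from the root to their two nodes.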
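The four properties can be illustrated with a minimal sketch. This is not the implementation presented later in this article; the names `Node`, `insert`, `collect`, and `withPrefix` are made up for illustration, and the code assumes C++14 and lowercase words only.

```cpp
#include <array>
#include <memory>
#include <string>
#include <vector>

// Hypothetical minimal node: one child slot per lowercase letter.
struct Node {
    std::array<std::unique_ptr<Node>, 26> next; // property 2: at most 26 branches
    bool isWord = false;                        // true if a word ends at this node
};

// Inserts a lowercase word; returns false if a character is outside 'a'..'z'.
bool insert(Node& root, const std::string& word) {
    for (char c : word)
        if (c < 'a' || c > 'z') return false;
    Node* p = &root; // property 1: the root itself holds no character
    for (char c : word) {
        auto& child = p->next[c - 'a'];
        if (!child) child = std::make_unique<Node>();
        p = child.get();
    }
    p->isWord = true;
    return true;
}

// Depth-first walk; `path` is the string spelled from the root to `p` (property 3).
void collect(const Node* p, std::string& path, std::vector<std::string>& out) {
    if (p->isWord) out.push_back(path);
    for (int i = 0; i < 26; ++i) {
        if (p->next[i]) {
            path.push_back(static_cast<char>('a' + i));
            collect(p->next[i].get(), path, out);
            path.pop_back();
        }
    }
}

// Returns every stored word that starts with `prefix`, in dictionary order.
// All matches live below the node reached by `prefix` (property 4).
std::vector<std::string> withPrefix(const Node& root, const std::string& prefix) {
    const Node* p = &root;
    for (char c : prefix) {
        if (c < 'a' || c > 'z' || !p->next[c - 'a']) return {};
        p = p->next[c - 'a'].get();
    }
    std::vector<std::string> out;
    std::string path = prefix;
    collect(p, path, out);
    return out;
}
```

After inserting os, oh, old, char, and chat, calling `withPrefix(root, "cha")` would return char and chat, because both words share the path root, c, h, a.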

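Applications:

① Word-frequency counting: a shared prefix is stored as a single path in the trie, so when many words share prefixes a trie can use less memory than a hash table that stores every word in full.

② Prefix matching: retrieve all strings that start with a given prefix. The naive approach compares the prefix against every word and takes O(n*N) time, where n is the prefix length and N is the number of words. With a trie, reaching the node for the prefix takes O(n) steps regardless of N, and the matches are exactly the words in the subtree below that node. Looking up a single word of length h takes O(h).

③ String sorting: to sort N strings in dictionary order, insert them into a trie and perform a pre-order traversal, visiting each node's children in alphabetical order.

④ Fast string lookup: the node structure can carry extra fields, such as the position at which a word first appears in an article. Example problem: given a list of N familiar words and an article written entirely in lowercase English, output every word that is not in the familiar-word list, in order of first appearance. Here we can build a trie from the familiar words first, then read the article and look each word up as we go; this approach is fairly efficient.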
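The example problem in ④ can be sketched as follows. This is only one possible approach under stated assumptions: the function name `newWords` and the `known`/`reported` flags are hypothetical, words in the article are taken to be maximal runs of lowercase letters, and familiar words containing other characters are skipped.

```cpp
#include <array>
#include <memory>
#include <string>
#include <vector>

struct Node {
    std::array<std::unique_ptr<Node>, 26> next;
    bool known = false;    // the word is in the familiar-word list
    bool reported = false; // the word has already been output as a new word
};

// True if every character of `w` is in 'a'..'z'.
bool allLower(const std::string& w) {
    for (char c : w)
        if (c < 'a' || c > 'z') return false;
    return true;
}

// Follows `w` from the root, creating missing nodes; `w` must be lowercase.
Node* descend(Node& root, const std::string& w) {
    Node* p = &root;
    for (char c : w) {
        auto& child = p->next[c - 'a'];
        if (!child) child = std::make_unique<Node>();
        p = child.get();
    }
    return p;
}

// Returns the words of `article` that are not in `familiar`,
// each once, in order of first appearance.
std::vector<std::string> newWords(const std::vector<std::string>& familiar,
                                  const std::string& article) {
    Node root;
    for (const auto& w : familiar)
        if (allLower(w)) descend(root, w)->known = true;

    std::vector<std::string> result;
    std::string word;
    auto flush = [&] {
        if (word.empty()) return;
        Node* p = descend(root, word);
        if (!p->known && !p->reported) {
            p->reported = true; // remember it so repeats are not output again
            result.push_back(word);
        }
        word.clear();
    };
    for (char c : article) {
        if (c >= 'a' && c <= 'z') word.push_back(c);
        else flush(); // any other character ends the current word
    }
    flush(); // the article may end without a separator
    return result;
}
```

Each word of the article costs time proportional to its own length, independent of how many familiar words there are.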

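Algorithm:

There are two strategies for storing words in the tree:

① Each node stores a single character.

② Each node stores the whole string spelled by the path from the root to that node.

The second strategy may need more space, but retrieval does not need an auxiliary stack to record the path, so it is faster. Trading space for time, this article uses the second strategy.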
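As a rough sketch of the second strategy (not the article's own classes), the node below keeps a copy of the word that ends at it, so listing the words needs no path buffer. The names `WordNode`, `insertWord`, and `listWords` are assumptions made for this example, and the copy is kept only at nodes where an inserted word ends.

```cpp
#include <array>
#include <memory>
#include <string>
#include <vector>

// Strategy 2 (sketch): a node where a word ends keeps its own copy of that word.
struct WordNode {
    std::array<std::unique_ptr<WordNode>, 26> next;
    std::string word; // full word ending here; empty if no word ends here
    int freq = 0;     // how many times that word was inserted
};

// Inserts a non-empty lowercase word; other input is ignored.
void insertWord(WordNode& root, const std::string& w) {
    if (w.empty()) return;
    for (char c : w)
        if (c < 'a' || c > 'z') return;
    WordNode* p = &root;
    for (char c : w) {
        auto& child = p->next[c - 'a'];
        if (!child) child = std::make_unique<WordNode>();
        p = child.get();
    }
    p->word = w; // the extra space paid by this strategy
    ++p->freq;
}

// Pre-order walk: every hit already carries its word, so no path stack is needed.
void listWords(const WordNode& n, std::vector<std::string>& out) {
    if (n.freq > 0) out.push_back(n.word);
    for (const auto& child : n.next)
        if (child) listWords(*child, out);
}
```

With the first strategy the same traversal would have to push and pop one character per edge to rebuild each word; here that bookkeeping is replaced by the stored copy.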

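Code implementation:

Test: build a trie from the word list words.txt, give each word an ID number (which can be thought of as the position at which the word first appears in the text), and retrieve the words that begin with 'fu'.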

//TrieTreeNode.h
#pragma once
#include<iostream>
using namespace std;


template<class T>
class TrieTreeNode
{
public:
	TrieTreeNode(int MaxBranch)//also used to construct the root node
	{
		MaxBranchNum = MaxBranch;
		ChildNodes = new TrieTreeNode<T>*[MaxBranchNum];
		for (int i = 0; i < MaxBranchNum; i++)
			ChildNodes[i] = NULL;
		word = NULL;
		Freq = 0;
		ID = -1;
	}
public:
	int MaxBranchNum;//maximum number of branches
	char* word;//pointer to the word string
	TrieTreeNode<T> **ChildNodes;
	int Freq;//word frequency
	int ID;//insertion order when building the trie; can record where the string first appears
};

//TrieTree.h
#pragma once
#include<iostream>
#include<cstring>
#include"TrieTreeNode.h"
using namespace std;


template<class T>
class TrieTree
{
	//Insert allocates memory for the word a node represents; Delete only resets Freq and keeps word;
	//Search decides by the value of Freq, not by whether word is NULL
public:
	TrieTree(const int size);
	~TrieTree(){ Destroy(root); };
	void Insert(const T* str);//insert the word str
	void Insert(const T* str, const int num);//insert the word str together with an ID number
	int Search(const T* str);//look up the word str and return its number of occurrences
	bool Delete(const T* str);//delete the word str
	void PrintALL();//print the words of all nodes in the trie
	void PrintPre(const T* str);//print the words that have str as a prefix
private:
	void Print(const TrieTreeNode<T>* p);
	void Destroy(TrieTreeNode<T>* p);//called by the destructor; frees the subtree rooted at p
private:
	TrieTreeNode<T>* root;
	int MaxBranchNum;//maximum number of branches
};


template<class T>
void TrieTree<T>::Destroy(TrieTreeNode<T>* p)
{
	if (!p)
		return;
	for (int i = 0; i < MaxBranchNum; i++)
		Destroy(p->ChildNodes[i]);
	delete[] p->ChildNodes;//free the child-pointer array allocated in the node constructor
	delete[] p->word;//frees the char array; delete[] on a null pointer is a no-op
	delete p;//free the node itself
}


template<class T>
bool TrieTree<T>::Delete(const T* str)
{
	TrieTreeNode<T>* p = root;
	if (!str)
		return false;
	for (int i = 0; str[i]; i++)
	{
		int index = str[i] - 'a';
		if (index < 0 || index >= MaxBranchNum)//character outside the supported range
			return false;
		if (p->ChildNodes[index])
			p = p->ChildNodes[index];
		else return false;
	}
	if (p->Freq == 0)//str is only a prefix, or was already deleted
		return false;
	p->Freq = 0;
	p->ID = -1;
	return true;
}


template<class T>
void TrieTree<T>::PrintPre(const T* str)
{
	TrieTreeNode<T>* p = root;
	if (!str)
		return;
	for (int i = 0; str[i]; i++)
	{
		int index = str[i] - 'a';
		if (index < 0 || index >= MaxBranchNum)//character outside the supported range
			return;
		if (p->ChildNodes[index])
			p = p->ChildNodes[index];
		else return;
	}
	cout << "Words with the prefix " << str << ":" << endl;
	Print(p);
}


template<class T>
int TrieTree<T>::Search(const T* str)
{
	TrieTreeNode<T>* p = root;
	if (!str)
		return -1;
	for (int i = 0; str[i]; i++)
	{
		int index = str[i] - 'a';
		if (index < 0 || index >= MaxBranchNum)//character outside the supported range
			return 0;
		if (p->ChildNodes[index])
			p = p->ChildNodes[index];
		else return 0;
	}
	return p->Freq;
}


template<class T>
TrieTree<T>::TrieTree(const int size)
{
	MaxBranchNum = size;
	root = new TrieTreeNode<T>(MaxBranchNum);//the root node stores no character
}


template<class T>
void TrieTree<T>::Insert(const T* str)
{
	if (!str)
		return;
	TrieTreeNode<T>* p = root;
	for (int i = 0; str[i]; i++)
	{
		if (str[i]<'a' || str[i]>'z')
		{
			cout << "Invalid format!" << endl;
			return;
		}
		int index = str[i] - 'a';//index of the branch to descend into
		if (!p->ChildNodes[index])
			p->ChildNodes[index] = new TrieTreeNode<T>(MaxBranchNum);
		p = p->ChildNodes[index];
	}
	if (!p->word)//the word has not appeared before
	{
		size_t len = strlen(str) + 1;//including the terminating '\0'
		p->word = new char[len];
		memcpy(p->word, str, len);//portable replacement for the MSVC-specific strcpy_s
	}
	p->Freq++;
}


template<class T>
void TrieTree<T>::Insert(const T* str, const int num)
{
	if (!str)
		return;
	TrieTreeNode<T>* p = root;
	for (int i = 0; str[i]; i++)
	{
		if (str[i]<'a' || str[i]>'z')
		{
			cout << "Invalid format!" << endl;
			return;
		}
		int index = str[i] - 'a';//index of the branch to descend into
		if (!p->ChildNodes[index])
			p->ChildNodes[index] = new TrieTreeNode<T>(MaxBranchNum);
		p = p->ChildNodes[index];
	}
	if (!p->word)//the word has not appeared before
	{
		size_t len = strlen(str) + 1;//including the terminating '\0'
		p->word = new char[len];
		memcpy(p->word, str, len);//portable replacement for the MSVC-specific strcpy_s
	}
	p->Freq++;
	if (num < p->ID || p->ID == -1)//keep the smallest num as the ID of this node's word
		p->ID = num;
}


template<class T>
void TrieTree<T>::PrintALL()
{
	Print(root);
}


template<class T>
void TrieTree<T>::Print(const TrieTreeNode<T>* p)
{
	if (p == NULL)
		return;
	if (p->Freq > 0)
	{
		cout << "Word:" << p->word << "\tFreq:" << p->Freq;
		if (p->ID >= 0)
			cout << "\t\tID:" << p->ID;
		cout << endl;
	}
	for (int i = 0; i < MaxBranchNum; i++)
	{
		if (p->ChildNodes[i])
		{
			Print(p->ChildNodes[i]);
		}
	}
}

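//main.cpp
#include<iostream>
#include<fstream>
#include<cstdlib>
#include"TrieTree.h"
using namespace std;

//Reads words.txt: on each line the digits form the ID and the lowercase letters form the word
void test(TrieTree<char>* t)
{
	const int BufSize = 50;
	char charbuffer[BufSize];
	char* cb = charbuffer;

	ifstream fin("d:\\words.txt");
	if (!fin){
		cout << "File open error!\n";
		return;
	}
	int c;//int rather than char, so that EOF can be told apart from valid characters
	int num = 0;
	while ((c = fin.get()) != EOF)
	{
		if (c >= '0'&&c <= '9')
			num = num * 10 + c - '0';
		if (c >= 'a'&&c <= 'z'&&cb - charbuffer < BufSize - 1)//leave room for the terminator
			*cb++ = (char)c;
		if (c == '\n')
		{
			*cb = '\0';
			if (cb != charbuffer)//skip lines that contain no word
				t->Insert(charbuffer, num);
			cb = charbuffer;
			num = 0;
		}
	}
	if (cb != charbuffer)//last line without a trailing newline
	{
		*cb = '\0';
		t->Insert(charbuffer, num);
	}
	fin.close();
}


int main()
{
	TrieTree<char>* t = new TrieTree<char>(26);
	const char* pre = "fu";
	/*const char* c1 = "fuck";
	const char* c2 = "class";
	const char* c3 = "name";
	const char* c4 = NULL;
	const char* c5 = "fucka";
	const char* c6 = "fuckaa";
	const char* c7 = "fuckaabc";
	t->Insert(c1);
	t->Delete(c1);
	t->Insert(c1);
	t->Insert(c2);
	t->Insert(c5);
	t->Insert(c6);
	t->Insert(c7);*/
	test(t);
	t->PrintALL();
	cout << endl;
	t->PrintPre(pre);
	//cout << t->Search(c1) << endl;
	delete t;
	system("pause");//Windows-only: keeps the console window open
	return 0;
}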
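Delete in the listing above is a lazy deletion: it only resets Freq and ID and leaves the nodes in place. For comparison, here is a sketch of a deletion that also frees nodes that no longer lead to any word. It is a separate, simplified structure rather than a patch to TrieTree; the names `PNode`, `addWord`, `frequency`, and `eraseWord` are hypothetical, and C++14 is assumed.

```cpp
#include <array>
#include <cstddef>
#include <memory>
#include <string>

struct PNode {
    std::array<std::unique_ptr<PNode>, 26> next;
    int freq = 0; // number of times the word ending here was inserted

    bool hasChild() const {
        for (const auto& c : next)
            if (c) return true;
        return false;
    }
};

// Inserts a lowercase word; other input is ignored.
void addWord(PNode& root, const std::string& w) {
    for (char c : w)
        if (c < 'a' || c > 'z') return;
    PNode* p = &root;
    for (char c : w) {
        auto& child = p->next[c - 'a'];
        if (!child) child = std::make_unique<PNode>();
        p = child.get();
    }
    ++p->freq;
}

// Returns how many times `w` was inserted (0 if absent).
int frequency(const PNode& root, const std::string& w) {
    const PNode* p = &root;
    for (char c : w) {
        if (c < 'a' || c > 'z' || !p->next[c - 'a']) return 0;
        p = p->next[c - 'a'].get();
    }
    return p->freq;
}

// Clears w[i..] below `n`. Sets `removed` if the word was stored, and
// returns true when `n` has become useless and may be freed by its parent.
bool eraseFrom(PNode& n, const std::string& w, std::size_t i, bool& removed) {
    if (i == w.size()) {
        if (n.freq == 0) return false; // only a prefix, nothing to delete
        n.freq = 0;
        removed = true;
    } else {
        char c = w[i];
        if (c < 'a' || c > 'z') return false;
        auto& child = n.next[c - 'a'];
        if (!child) return false;
        if (eraseFrom(*child, w, i + 1, removed)) child.reset(); // free the subtree
    }
    return removed && n.freq == 0 && !n.hasChild();
}

// Deletes `w`; returns true if it was present.
bool eraseWord(PNode& root, const std::string& w) {
    bool removed = false;
    eraseFrom(root, w, 0, removed);
    return removed;
}
```

Because the children are held by `std::unique_ptr`, resetting a child releases its whole subtree, and no hand-written Destroy is needed. The lazy approach in the article is simpler and makes re-inserting a deleted word cheap; pruning gives memory back when many words are removed.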

Search results: [screenshot of the program output; image not preserved]

    Original author: Trie树
    Original URL: https://blog.csdn.net/hgqqtql/article/details/42309049