String Interview Questions, Part 4: String Matching

May 23, 2024

Preface

******************************************************************
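All algorithms in this series were compiled successfully in the following environment.
[Build environment] Fedora 8, Linux 2.6.35.6-45.fc14.i686
[Processor] Intel(R) Core(TM)2 Quad CPU Q9400 @ 2.66GHz
[Memory] 2025272 kB
If you spot a problem or an oversight, or have a suggestion or a better algorithm, please let me know.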
*****************************************************************

Main Text

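String matching problems come up all the time in interviews. Today's post, though, is not about KMP or the other classic string matching algorithms: those are already explained in detail, with diagrams, on plenty of blogs and web pages, so I won't repeat them here. I have recently been working closely through the Microsoft 100 interview questions, and these are two of them.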

Problem 1: String Matching

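[Problem] Implement a fairly advanced character matching algorithm: given a very long string, find the strings that meet the requirement.
[Example] If the target string is 123, then 1******3***2 and 12*****3 should both be found.
[Analysis] As mentioned above, string matching problems are common, and there are many tricks for them. Using a hash is one. What does that mean? Since the strings are ASCII-encoded and each character fits in one byte, there are at most 256 possible values, so we can define an array hash_table[256]. What exactly the array is used for depends on the goal, and it is worth thinking through for each problem. So how do we design the algorithm for this one? Here it is in words:
Step 1: Traverse the src string and set the corresponding entry of hash_table to 1, marking that character as present in src.
Step 2: Traverse the dest string and look each character up in hash_table: if its entry is 1, the character exists in src.
Step 3: If every character of dest turns out to exist, the strings match. Return the result.
Based on the steps above, we write the following code: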

#include <iostream>
#include <cstring>

bool string_match( const char * const src, const char * const dest )
{
   int srcLen = strlen( src );
   int destLen = strlen( dest );
   // one flag per possible byte value.
   int hash_table[256] = { 0 };
   bool bMatch = true;
   for( int i = 0; i < srcLen; i++ )
   {
      // cast through unsigned char so the index is never negative.
      hash_table[ (unsigned char)src[i] ] = 1;
   }
   for( int i = 0; i < destLen; i++ )
   {
      // if src doesn't contain this character of dest,
      // set bMatch to false and stop looking.
      if( 0 == hash_table[ (unsigned char)dest[i] ] )
      {
         bMatch = false;
         break;
      }
   }
   return bMatch;
}

int main( int argc, char ** argv )
{
   char src[] = "1**2**3******";
   char dest[] = "123";
   bool isMatch = string_match( src, dest );
   if( isMatch )
   {
      std::cout << "match" << std::endl;
   }
   else
   {
      std::cout << "not match" << std::endl;
   }
   return 0;
}
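
The flag table above records only whether a character appears, not how often, so a dest such as "112" matches any src that contains a single 1 and a 2. If every occurrence in dest should be backed by its own occurrence in src, the flags can be replaced with counters. The following is a minimal sketch of that variant. The name string_match_counted and the reading of the problem as "match with multiplicity" are assumptions made for illustration; the original problem statement does not say which reading is intended.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical variant: returns true when src contains every character
// of dest at least as many times as dest does.
bool string_match_counted( const char * src, const char * dest )
{
   assert( src != nullptr && dest != nullptr );
   // counts[c] is how many unused copies of byte c remain in src.
   int counts[256] = { 0 };
   for( std::size_t i = 0; i < std::strlen( src ); i++ )
   {
      counts[ (unsigned char)src[i] ] ++;
   }
   for( std::size_t i = 0; i < std::strlen( dest ); i++ )
   {
      unsigned char c = (unsigned char)dest[i];
      if( counts[c] == 0 )
      {
         // dest needs one more copy of c than src can supply.
         return false;
      }
      counts[c] --;
   }
   return true;
}
```

Under these assumptions, "1**2**3******" still matches "123", while "1**2**3" no longer matches "112" because it holds only one 1.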

Problem 2: Shortest Matching Substring

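[Problem] Given a very long string str and a character set such as {a,b,c}, find the shortest substring of str that contains {a,b,c}. The solution must run in O(n).
[Example] If the character set is a,b,c and the string is abdcaabcx, the shortest substring is abc.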

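[Analysis] This is still a string matching problem, but this time we want the shortest substring. The trick is again the hash_table from above. How do we approach it? The word "shortest" should immediately suggest a pattern: keep a minimum value min, compute a value on each pass, and replace min whenever the new value is smaller. That is the method here. One thing differs from the previous problem, namely which string initializes hash_table: this time it is dest, not src. The algorithm in words:
Step 1: Traverse the dest string and set the corresponding entry of hash_table, marking that character as present in dest.
Step 2: Traverse the src string. If the current character is one that dest is looking for (its hash_table entry is set), increment sum. When sum equals the length of dest, one match has been found. Say the matching substring lies between front and rear: compute its length and compare it with the minimum length min. If it is smaller, it replaces min.
Step 3: Copy the substring between front and rear into the result and return it.
Based on the steps above, we write the following code. If the description is unclear, read it together with the code and it should make sense. The algorithm itself is simple.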

#include <iostream>
#include <cstring>

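/* result must have room for strlen( src ) + 1 bytes. */
void string_min_match( const char * const src,
   const char * const dest, char * result )
{
   /* srcLen is the length of src string. */
   int srcLen = strlen( src );
   /* destLen is the length of dest string. */
   int destLen = strlen( dest );
   /* front is the start of the shortest window found so far. */
   int front = 0;
   /* index is the start of the window being examined;
      the loop variable i below plays the role of rear. */
   int index = 0;
   /* every char is one byte, so 256 counters cover all values.
      hash_table[c] is how many more c the current window needs;
      it goes negative when the window holds surplus copies. */
   int hash_table[256] = { 0 };
   /* how many characters of dest the current window covers. */
   int sum = 0;
   /* the minimum length of a match; srcLen + 1 means none yet. */
   int min = srcLen + 1;
   // init hash_table array.
   for( int i = 0; i < destLen; i++ )
   {
      hash_table[ (unsigned char)dest[i] ] ++;
   }
   // extend the window to the right, one character at a time.
   for( int i = 0; i < srcLen; i++ )
   {
      unsigned char in = (unsigned char)src[i];
      if( hash_table[in] > 0 )
      {
         // this character was still needed.
         sum ++;
      }
      hash_table[in] --;
      // the window [index, i] covers dest: record it, then shrink it.
      while( destLen > 0 && sum == destLen )
      {
         if( i - index + 1 < min )
         {
            min = i - index + 1;
            front = index;
         }
         unsigned char out = (unsigned char)src[index];
         hash_table[out] ++;
         if( hash_table[out] > 0 )
         {
            // the window just lost a character it needs.
            sum --;
         }
         index ++;
      }
   }
   // copy the minimum string to result; it stays empty if no match.
   if( min > srcLen )
   {
      min = 0;
   }
   memcpy( result, &src[front], min );
   result[min] = '\0';
}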

int main( int argc, char ** argv )
{
   char src[] = "abdcaabcx";
   char dest[] = "abc";
   char result[10] = { 0 };
   string_min_match( src, dest, result );
   std::cout << result << std::endl;
}
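
A quick way to gain confidence in a window-based routine like string_min_match above is to compare its output with an exhaustive search on small inputs. The following is a minimal sketch of such a reference check. The helper name brute_min_window is hypothetical, it treats dest as a set of required characters (as in {a,b,c}), and it runs in O(n^2) rather than the O(n) the problem asks for, so it is meant for testing only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical reference helper: returns the shortest substring of src
// that contains every distinct character of dest, or "" if none exists.
std::string brute_min_window( const std::string & src, const std::string & dest )
{
   assert( !dest.empty() );
   // mark which byte values are required and count the distinct ones.
   bool required[256] = { false };
   int distinct = 0;
   for( unsigned char c : dest )
   {
      if( !required[c] )
      {
         required[c] = true;
         ++distinct;
      }
   }
   std::string best;
   for( std::size_t start = 0; start < src.size(); ++start )
   {
      bool seen[256] = { false };
      int covered = 0;
      for( std::size_t end = start; end < src.size(); ++end )
      {
         unsigned char c = (unsigned char)src[end];
         if( required[c] && !seen[c] )
         {
            seen[c] = true;
            ++covered;
         }
         if( covered == distinct )
         {
            std::size_t len = end - start + 1;
            if( best.empty() || len < best.size() )
            {
               best = src.substr( start, len );
            }
            // extending further from this start only gets longer.
            break;
         }
      }
   }
   return best;
}
```

For the sample input abdcaabcx with the set abc, this search returns abc. It returns an empty string when some required character never appears, for example for aab with the set abc.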

Summary:

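By writing up these algorithms myself, I hope first to push myself to study every day, second to practice my writing, and third to share this material with everyone. Microsoft alone has a great many questions, and the interview questions on the various sites come in every variety, yet nobody has organized them carefully. Both of today's problems are about string matching. I hope they are of some help.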

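Author: Alex
Source: http://blog.csdn.net/hellotime
Copyright of this article belongs to the author. Reposting is welcome, but unless the author agrees otherwise, this notice must be kept and a link to the original article must be placed prominently on the page; otherwise the author reserves the right to pursue legal liability.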