010 Regular Expression Matching[H]

April 19, 2019. Source: 澪同学

1 Problem Description

Implement regular expression matching with support for '.' and '*'.

Difficulty: Hard

2 Examples

'.' Matches any single character.
'*' Matches zero or more of the preceding element.

The matching should cover the entire input string (not partial).

The function prototype should be:
bool isMatch(const char *s, const char *p)

Some examples:
isMatch("aa","a") → false
isMatch("aa","aa") → true
isMatch("aaa","aa") → false
isMatch("aa", "a*") → true
isMatch("aa", ".*") → true
isMatch("ab", ".*") → true
isMatch("aab", "c*a*b") → true
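For sanity-checking the examples above, one option is to compare against the standard library's regex engine. This is only a reference oracle, not a solution to the problem. The name isMatchReference is made up for illustration, and the sketch assumes that s consists of lowercase letters and that p contains only lowercase letters, '.' and '*', with every '*' following a letter or '.' (otherwise std::regex may throw).

```cpp
#include <regex>
#include <string>

// Hypothetical reference oracle: delegates to std::regex (ECMAScript grammar).
// std::regex_match succeeds only if the pattern covers the ENTIRE string,
// which is the "not partial" requirement stated above.
bool isMatchReference(const std::string &s, const std::string &p) {
    return std::regex_match(s, std::regex(p));
}
```

Calling isMatchReference("aab", "c*a*b") follows the last example in the list: c* matches zero characters, a* matches "aa", and b matches "b".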

3 Problem Analysis

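The task is simple: implement the . and * operators of regular expression matching. Note that the pattern must match the entire string.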

4 Approach

1 Recursion

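(Let s[i:] denote the substring of s that starts at index i and runs to the end, i.e. s[i:] = s[i] s[i+1] ... s[|s|-1]; p[j:] is defined the same way.)

We consider the cases separately.

For ., since it can match any character, the process is no different from ordinary matching: it suffices that s[0] matches p[0] and that s[1:] matches p[1:], which we solve recursively.

For *, since it can match zero or more occurrences, we again split into cases. Suppose p[0] is some character c and p[1] is *. If c does not occur at the front of s at all (zero occurrences), it suffices that s matches p[2:]. If c occurs at least once, we strip its first occurrence and are back at the "zero or more" problem, so we only need to check recursively whether s[1:] matches p.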
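The case analysis above can be sketched as a plain recursion with no caching. The names matchFrom and isMatchPlain are made up for this illustration: matchFrom(s, i, p, j) answers whether the suffix of s starting at index i matches the suffix of p starting at index j. The sketch assumes a well-formed pattern in which every '*' follows a letter or '.'.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: does the suffix of s from index i match the suffix of p from index j?
bool matchFrom(const std::string &s, std::size_t i, const std::string &p, std::size_t j) {
    if (j == p.size())
        return i == s.size();  // an exhausted pattern matches only an exhausted string

    // Does the current character of s match the current pattern character?
    bool first = i < s.size() && (p[j] == '.' || p[j] == s[i]);

    if (j + 1 < p.size() && p[j + 1] == '*') {
        // "c*": either use zero occurrences (skip two pattern characters),
        // or consume one character of s and keep the same pattern position.
        return matchFrom(s, i, p, j + 2) || (first && matchFrom(s, i + 1, p, j));
    }
    // Ordinary character or '.': both sides advance by one.
    return first && matchFrom(s, i + 1, p, j + 1);
}

bool isMatchPlain(const std::string &s, const std::string &p) {
    return matchFrom(s, 0, p, 0);
}
```

Without caching, the same (i, j) pair can be revisited many times, so this version can take exponential time on patterns with many '*'.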

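We also cache intermediate results during the recursion (memoization). The code is as follows (the variable names are poorly chosen, but I did not bother to change them):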

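#include <algorithm>
#include <string>
#include <vector>
using namespace std;

class Solution {
public:
    // m[l][r] caches whether s[l:] matches p[r:]: -1 = not computed, 0 = false, 1 = true.
    vector<vector<int>> m;
    bool Match(string &s,int l, string &p, int r){
        if(m[l][r]!=-1)
            return m[l][r];
        bool ans;
        if(r==p.size())
            // Pattern exhausted: match only if the string is exhausted too.
            ans = (l==s.size());
        else{
            // Does the current character of s match the current pattern character?
            bool match = (l!=s.size() && (s[l] == p[r] || p[r] == '.'));
            if(p.size()>=2+r && p[r+1] == '*')
                // "x*": skip it (zero occurrences) or consume one character of s.
                ans = Match(s,l, p,r+2) || (match && Match(s,l+1, p,r));
            else
                ans = match && Match(s,l+1,p,r+1);
        }
        m[l][r] = ans;
        return ans;
    }
    bool isMatch(string s, string p) {
        // A square (k+1) x (k+1) table is large enough for every (l, r) pair.
        int k = max(s.size(), p.size());
        m.resize(k+1);
        for(auto &it : m)
            it.assign(k+1,-1);
        return Match(s,0,p,0);
    }
};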

The complexity is not analyzed here; it is discussed below.

2 Dynamic Programming

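Notice that every recursive call in the approach above only moves forward in s and p, so each subproblem depends only on subproblems with larger indices. This means we can write a fully equivalent iterative version that fills in a table, which is exactly dynamic programming.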

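Let the lengths of s and p be m and n respectively, and define the dp array as follows:

dp[i][j] indicates whether the first i characters of s match the first j characters of p; the final answer is dp[m][n].

Based on the analysis of the recursive algorithm above, and thinking in the reverse direction (prefixes instead of suffixes), the transition equation is easy to write down:

$$
\mathrm{dp}[i][j] =
\begin{cases}
\mathrm{dp}[i-1][j-1] \land (s_i = p_j), & p_j \neq \texttt{*} \\
\mathrm{dp}[i][j-2] \lor \bigl(\mathrm{dp}[i-1][j] \land (s_i = p_{j-1})\bigr), & p_j = \texttt{*}
\end{cases}
$$

Here s_i and p_j are the i-th and j-th characters counted from 1, dp[0][0] is true, and any term that needs s_i is false when i = 0.

Note that = in the recurrence must also cover the case where the pattern character is . and the string character is anything.

The code is as follows (taken from LeetCode). Note that in his dp[i][j] the first character of each string has index 1 rather than 0, so the i-th character of s is s[i - 1] in the code.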

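#include <string>
#include <vector>
using namespace std;

class Solution {
public:
    bool isMatch(string s, string p) {
        int m = s.length(), n = p.length();
        // dp[i][j]: do the first i characters of s match the first j characters of p?
        vector<vector<bool> > dp(m + 1, vector<bool> (n + 1, false));
        dp[0][0] = true;  // empty string matches empty pattern
        for (int i = 0; i <= m; i++)
            for (int j = 1; j <= n; j++)
                if (p[j - 1] != '*')
                    // Ordinary character or '.': both prefixes shrink by one.
                    dp[i][j] = i > 0 && dp[i - 1][j - 1] && (s[i - 1] == p[j - 1] || p[j - 1] == '.');
                else
                    // "x*": zero occurrences, or one more occurrence of x.
                    // Assumes a well-formed pattern, so '*' is never the first character (j >= 2).
                    dp[i][j] = dp[i][j - 2] || (i > 0 && (s[i - 1] == p[j - 2] || p[j - 2] == '.') && dp[i - 1][j]);
        return dp[m][n];
    }
};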

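As mentioned above, this DP is fully equivalent to the memoized recursion, so both algorithms have time complexity O(mn) and space complexity O(mn). (The recursive code as written allocates a square table of side max(m, n) + 1, which is slightly more than it needs.)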
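Since row i of the DP table only reads row i and row i - 1, the table can be shrunk to two rows, which brings the extra space down to O(n). The following is a minimal sketch of that idea, not part of the original solutions; the name isMatchRolling is made up, and a well-formed pattern is assumed (the j >= 2 checks only guard against a leading '*').

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical two-row variant of the prefix DP.
// prev = row i - 1, cur = row i of the full dp table.
bool isMatchRolling(const std::string &s, const std::string &p) {
    const int m = static_cast<int>(s.size());
    const int n = static_cast<int>(p.size());
    std::vector<char> prev(n + 1, 0), cur(n + 1, 0);
    for (int i = 0; i <= m; ++i) {
        cur[0] = (i == 0);  // an empty pattern matches only an empty prefix of s
        for (int j = 1; j <= n; ++j) {
            if (p[j - 1] != '*') {
                cur[j] = i > 0 && prev[j - 1] &&
                         (s[i - 1] == p[j - 1] || p[j - 1] == '.');
            } else {
                // Zero occurrences of the starred character.
                bool zero = j >= 2 && cur[j - 2];
                // One more occurrence: s[i - 1] is absorbed by "x*".
                bool more = j >= 2 && i > 0 && prev[j] &&
                            (s[i - 1] == p[j - 2] || p[j - 2] == '.');
                cur[j] = zero || more;
            }
        }
        std::swap(prev, cur);  // row i becomes the "previous" row
    }
    return prev[n];  // after the last swap, prev holds row m
}
```

Every entry of cur is overwritten before it is read within a row, so the stale values left over after the swap never leak into the result.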

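Personally, I find the recursive version closer to how people naturally think (functional programming?). Without the recursive version as a stepping stone, coming up with the dynamic programming solution directly may be somewhat difficult.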

5 Afterword

Shouldn't an automaton be the first thing that comes to mind when you see regular expressions?!

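Argh, this is so frustrating. I am still not good enough: I spent two hours writing an automaton and it still has problems. When I have time, I will definitely study how regular expression engines are implemented and write an automaton version that gets accepted. I suspect the problem is in how I construct the automaton.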

Of course, if you have written an automaton version, feel free to post it in the comments.
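As one possible take on the automaton idea, here is a sketch that simulates an NFA by tracking the set of reachable states. It is not the author's attempt and has not been submitted anywhere; the name isMatchNfa and the state numbering are assumptions made for illustration. The pattern is split into tokens (a character plus an optional star), state q means "the first q tokens have been consumed", and a well-formed pattern is assumed.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical NFA simulation for patterns made of letters, '.' and '*'.
bool isMatchNfa(const std::string &s, const std::string &p) {
    // Tokenize: sym[q] is the character of token q, starred[q] says whether "*" follows it.
    std::vector<char> sym, starred;
    for (std::size_t j = 0; j < p.size(); ++j) {
        bool star = j + 1 < p.size() && p[j + 1] == '*';
        sym.push_back(p[j]);
        starred.push_back(star);
        if (star) ++j;  // skip the '*' itself
    }
    const std::size_t k = sym.size();

    // Epsilon closure: a starred token may be skipped, so state q also reaches q + 1.
    // One left-to-right pass is enough because epsilon edges only go forward.
    auto closure = [&](std::vector<char> &st) {
        for (std::size_t q = 0; q < k; ++q)
            if (st[q] && starred[q]) st[q + 1] = 1;
    };

    std::vector<char> active(k + 1, 0), next(k + 1, 0);
    active[0] = 1;
    closure(active);
    for (char ch : s) {
        std::fill(next.begin(), next.end(), 0);
        for (std::size_t q = 0; q < k; ++q) {
            if (!active[q]) continue;
            if (sym[q] != '.' && sym[q] != ch) continue;
            // A starred token loops on itself; a plain token advances.
            if (starred[q]) next[q] = 1; else next[q + 1] = 1;
        }
        closure(next);
        active.swap(next);
    }
    return active[k] != 0;  // accept iff every token has been consumed
}
```

The simulation takes O(|s| * k) time and O(k) space for k tokens, which is the same order of work as the DP with only one row of state.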

    Original author: 澪同学
    Original URL: https://zhuanlan.zhihu.com/p/33566243