Regular Expressions | Greediness | JSON

May 22, 2019. Source: 姚屹晨

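I. Regular Expressions

1. Exact matching: write the characters literally.

① \d (digit): matches [0-9].
② \w (word character): matches [a-zA-Z0-9_]; note that the underscore is included.
③ \s (whitespace character): matches a space, tab, carriage return, newline, or form feed.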

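2. Limiting Repetition

To match a variable number of characters, a regular expression uses * for any number of repetitions, including zero (same as {0,}); + for at least one (same as {1,}); ? for zero or one (same as {0,1}); {n} for exactly n; and {n,m} for n to m.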
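
The quantifiers above can be tried out with test(). This is only a small sketch; the patterns and sample strings are made up for illustration and do not come from the original post.

```javascript
// Each pattern is anchored with ^ and $ so the whole string has to match.
const zeroOrMore = /^ab*$/;       // 'a' followed by any number of 'b', including none
const oneOrMore = /^ab+$/;        // 'a' followed by at least one 'b'
const optional = /^colou?r$/;     // the 'u' may appear zero or one time
const exactlyThree = /^\d{3}$/;   // exactly three digits
const threeToFive = /^\d{3,5}$/;  // three to five digits

console.log(zeroOrMore.test('a'));      // true
console.log(oneOrMore.test('a'));       // false
console.log(optional.test('color'));    // true
console.log(exactlyThree.test('1234')); // false
console.log(threeToFive.test('1234'));  // true
```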

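3. For more precise matching, use `[]` (square brackets) to specify a range:

[\w] matches a single digit, letter, or underscore.
[a-zA-Z\_\$][0-9a-zA-Z\_\$]* matches a string that starts with a letter, underscore, or $, followed by any number of digits, letters, underscores, or $ characters, which is the form of variable name JavaScript allows.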
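
As a rough sketch, the variable-name pattern can be anchored with ^ and $ and used to filter a list of candidates. The candidate names are invented for illustration, and the pattern only covers ASCII names; JavaScript itself also accepts many Unicode letters in identifiers.

```javascript
// The pattern from the text, anchored so the entire string must be a valid name.
const identifier = /^[a-zA-Z_$][0-9a-zA-Z_$]*$/;

// Hypothetical candidate names, chosen to show both passes and failures.
const candidates = ['_private', '$el', 'user1', '1user', 'my-var'];

// Keep only the names the pattern accepts.
const validNames = candidates.filter((name) => identifier.test(name));

console.log(validNames); // [ '_private', '$el', 'user1' ]
```

'1user' is rejected because the first character class does not allow digits, and 'my-var' because the hyphen is in neither class.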

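4. Others

A|B matches either A or B.

(J|j)ava(S|s)cript matches 'JavaScript', 'Javascript', 'javaScript', and 'javascript'.

^ marks the start of a line; ^\d means the line must start with a digit.
$ marks the end of a line; \d$ means the line must end with a digit.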
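
A short sketch of alternation and anchors in practice. The test strings ('JAVASCRIPT', '7up') are my own additions, not from the original post.

```javascript
// Alternation inside groups, anchored so nothing else may surround the word.
const jsName = /^(J|j)ava(S|s)cript$/;
const spellings = ['JavaScript', 'Javascript', 'javaScript', 'javascript', 'JAVASCRIPT'];
const accepted = spellings.filter((s) => jsName.test(s));
console.log(accepted); // [ 'JavaScript', 'Javascript', 'javaScript', 'javascript' ]

// ^ and $ on their own: check only the first or the last character.
const startsWithDigit = /^\d/;
const endsWithDigit = /\d$/;
console.log(startsWithDigit.test('7up')); // true
console.log(endsWithDigit.test('7up'));   // false
```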

5. Two ways to create a regular expression in JavaScript

// Literal syntax: /regular expression/
var re1 = /yyc\-007/;
re1;
>>>/yyc\-007/

// Constructor syntax: new RegExp('regular expression')
var re2 = new RegExp('yyc\-007');
re2;
>>>/yyc-007/

var re2 = new RegExp('yyc\\-007'); // note the escaping: inside a string literal, \\ produces a single \
re2;
>>>/yyc\-007/

6. To check whether a given string matches, use test()

var re = /^\d{3}\-\d{3,8}$/;
re.test('123-12345678');
>>>true

re.test('123-123456789');
>>>false

7. Splitting strings

// two spaces between a and b, three spaces between b and c
'a  b   c'.split(' ');
>>>(6) ["a", "", "b", "", "", "c"]

// using a regular expression
'a  b   c'.split(/\s+/);
>>>(3) ["a", "b", "c"]

More on split():
String.prototype.split()
Standard Objects | toString() | Date | split() | HTTP Messages

8. Groups

① What are they? The parts wrapped in parentheses () are the groups to extract.

[Figure: exec() example]
[Figure: exec() description]
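
Since the original screenshots are not available here, the following is a minimal sketch of extracting groups with exec(). The phone-number pattern and the input '010-12345' are illustrative choices, not taken from the lost figures.

```javascript
// Two groups: a 3-digit area code and a 3-to-8-digit local number.
const phone = /^(\d{3})-(\d{3,8})$/;

// On success exec() returns an array: index 0 is the whole match,
// and indexes 1, 2, ... hold the text captured by each group.
const phoneMatch = phone.exec('010-12345');
const [whole, areaCode, localNumber] = phoneMatch;
console.log(whole);       // 010-12345
console.log(areaCode);    // 010
console.log(localNumber); // 12345

// When the string does not match, exec() returns null.
const noMatch = phone.exec('010 12345');
console.log(noMatch); // null
```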

9. Greedy matching

① What is it? Matching as many characters as possible.

var re = /^(\d+)(0*)$/;
re.exec('10086');
>>>(3) ["10086", "10086", "", index: 0, input: "10086"]

② Regular expression matching is greedy by default.
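
The effect is easier to see on an input that actually ends in zeros. The sketch below uses '102300' (my own choice of input) and compares the greedy pattern with a lazy variant that adds ? after the +.

```javascript
// Greedy: \d+ grabs every digit, so the second group is left with nothing.
const greedyDigits = /^(\d+)(0*)$/;
// Lazy: \d+? takes as few digits as it can, so the trailing zeros fall to the second group.
const lazyDigits = /^(\d+?)(0*)$/;

// slice(1) keeps only the captured groups as a plain array.
const greedyGroups = greedyDigits.exec('102300').slice(1);
const lazyGroups = lazyDigits.exec('102300').slice(1);

console.log(greedyGroups); // [ '102300', '' ]
console.log(lazyGroups);   // [ '1023', '00' ]
```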

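10. Global search

① What is it? JavaScript regular expressions have several special flags; the most common is g, which means global matching.

② What other flags are there? i ignores case; m enables multiline matching.

③ How are flags written in a regular expression?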

var r1 = /test/g;
// equivalent to
var r2 = new RegExp('test','g');
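
The i and m flags work the same way. A small sketch, with sample strings of my own:

```javascript
// i: ignore case.
const caseInsensitive = /script/i;
console.log(caseInsensitive.test('JavaScript')); // true

// m: ^ and $ also match at the start and end of each line, not only of the whole string.
const twoLines = 'first line\nsecond line';
console.log(/^second/.test(twoLines));  // false
console.log(/^second/m.test(twoLines)); // true

// Flags can be combined, in either syntax.
const combined = new RegExp('^second', 'im');
console.log(combined.flags);                           // im
console.log(combined.test('First line\nSECOND line')); // true
```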

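④ What is global matching for? It lets you call exec() repeatedly to search the same string for successive matches; each call resumes from the regex's lastIndex.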

var s = 'JavaScript, VBScript, CoffeeScript and ECMAScript';
var re = /[a-zA-Z]+Script/g;
re.exec(s);
>>>["JavaScript", index: 0, input: "JavaScript, VBScript, CoffeeScript and ECMAScript"]

re.exec(s);
>>>["VBScript", index: 12, input: "JavaScript, VBScript, CoffeeScript and ECMAScript"]

re.exec(s);
>>>["CoffeeScript", index: 22, input: "JavaScript, VBScript, CoffeeScript and ECMAScript"]

re.exec(s);
>>>["ECMAScript", index: 39, input: "JavaScript, VBScript, CoffeeScript and ECMAScript"]

re.exec(s);
>>>null
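
In practice the repeated calls are usually written as a loop. A minimal sketch, reusing the same string and pattern under different variable names:

```javascript
const source = 'JavaScript, VBScript, CoffeeScript and ECMAScript';
const scriptName = /[a-zA-Z]+Script/g;

// With the g flag, each exec() call resumes from scriptName.lastIndex.
// The loop ends when exec() returns null, meaning no further match exists.
const found = [];
let hit;
while ((hit = scriptName.exec(source)) !== null) {
  found.push({ name: hit[0], index: hit.index });
}

console.log(found.map((f) => f.name));  // [ 'JavaScript', 'VBScript', 'CoffeeScript', 'ECMAScript' ]
console.log(found.map((f) => f.index)); // [ 0, 12, 22, 39 ]

// After the failed final call, lastIndex is reset to 0, so the regex can be reused.
console.log(scriptName.lastIndex); // 0
```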

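Extension: Regular-Expressions.info, repeat

I. Greediness

1. What is it?

This is a <em>first</em> test.

This is a string, and I want a regular expression that matches the HTML tags in it, namely <em> and </em>. With the pattern <.+>, however, the result looks like this:

[Figure: <.+> matches <em>first</em> as a single span]

But the result I want is this:

[Figure: the desired result, with <em> and </em> matched separately]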

2. Why the match goes wrong:

① The first token is <, which is a literal.

[Figure: the < token]

② The second token is . (dot).

The dot matches any character except newlines.

[Figure: the dot token]
[Figure: what the dot actually matches]

③ The third token is + (plus).

[Figure: effect of the plus]

The dot is repeated by the plus. The plus is greedy. Therefore, the engine will repeat the dot as many times as it can. The dot matches e, so the regex continues to try to match the dot with the next character. m is matched, and the dot is repeated once more. The next character is the >. You should see the problem by now. The dot matches the >, and the engine continues repeating the dot. The dot will match all remaining characters in the string. The dot fails when the engine has reached the void after the end of the string. Only at this point does the regex engine continue with the next token: >.

Because the dot is immediately followed by the plus, and the plus is greedy, the dot keeps repeating and consumes the rest of the string until it can match no more, which here means the end of the string (a newline would also stop it). Only then does the regex engine move on to the next token, >.

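④ The engine now tries the fourth token: >

At this point <.+ has matched <em>first</em> test. and the engine has reached the end of the string, so there is no > left to match.

[Figure: effect of the plus]

However, the regex engine has a good memory: it remembers the way back. The technical term is backtracking.

It will reduce the repetition of the plus by one, and then continue trying the remainder of the regex.

Now .+ matches em>first</em> test (note that the final period is gone). The next token in the regex is still >, but the next character in the string, which is also its last character, is ., so the match fails.
The engine keeps backtracking: .+ now matches em>first</em> tes, which still fails. This continues until .+ matches em>first</em, at which point the next token in the regex matches the next character in the string.
Result:

[Figure: <.+> matches <em>first</em> as a single span]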
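
The greedy result described above can be reproduced directly in JavaScript. A minimal sketch; the variable names are mine:

```javascript
const html = 'This is a <em>first</em> test.';

// Greedy: .+ first runs to the end of the string, then backtracks
// one character at a time until the final > can match.
const greedyTag = /<.+>/;
const greedyHit = greedyTag.exec(html);

console.log(greedyHit[0]);    // <em>first</em>
console.log(greedyHit.index); // 10
```

The match stops at the last > in the string, not the first, which is why both tags and the text between them come back as one result.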

3. So how do you stop the greedy match and get the content you actually want?

① The quickest fix is to switch the greedy match to "lazy" mode.

The quick fix to this problem is to make the plus lazy instead of greedy.

② How do you make it lazy?

You can do that by putting a question mark ? after the plus + in the regex. You can do the same with the star *, the curly braces {} and the question mark ? itself.

③ How lazy mode works:

Again, < matches the first < in the string. The next token is the dot, this time repeated by a lazy plus. This tells the regex engine to repeat the dot as few times as possible. The minimum is one. So the engine matches the dot with e. The requirement has been met, and the engine continues with > and m. This fails. Again, the engine will backtrack. But this time, the backtracking will force the lazy plus to expand rather than reduce its reach. So the match of .+ is expanded to em, and the engine tries again to continue with >. Now, > is matched successfully. The last token in the regex has been matched. The engine reports that <em> has been successfully matched.

First, the < in the regex matches the < in the string. The second token is the dot. Because the plus is now lazy, the engine repeats the dot as few times as possible; since the third token is +, the dot must match at least once. This time the dot matches e. Next, the last token of the regex, >, is compared with the next character in the string, m, and fails. The engine backtracks again, but this time it expands rather than reduces: .+ grows to em, the last token > now matches the next character >, and the match succeeds.

[Figure: lazy mode]
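
The lazy version can be checked the same way. In this sketch the g flag is added (my own addition) so that match() returns every tag rather than only the first:

```javascript
const sample = 'This is a <em>first</em> test.';

// Lazy: .+? expands one character at a time and stops at the first > it can reach.
const firstTag = sample.match(/<.+?>/)[0];
console.log(firstTag); // <em>

// With the g flag, match() returns all matches as a plain array of strings.
const lazyTags = sample.match(/<.+?>/g);
console.log(lazyTags); // [ '<em>', '</em>' ]
```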

II. JSON

1. JSON: JavaScript Object Notation.

2. Creator: Douglas Crockford

3. Data types in JSON

number
boolean
string
null
array
object

4. To keep parsing uniform, JSON requires strings to use double quotes `""`, and object keys must also use double quotes `""`.
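
A quick way to see the quoting rule is to feed JSON.parse() text that breaks it. The tryParse helper below is hypothetical, written only for this illustration, and simply reports whether parsing succeeded.

```javascript
// Hypothetical helper: wraps JSON.parse so a failure is reported instead of thrown.
function tryParse(text) {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (err) {
    return { ok: false, error: err.name };
  }
}

const valid = tryParse('{"name": "Gerg", "age": 21}'); // double quotes everywhere
const singleQuoted = tryParse("{'name': 'Gerg'}");     // single quotes are not valid JSON
const unquotedKey = tryParse('{name: "Gerg"}');        // keys must be quoted too

console.log(valid.ok, valid.value.name);         // true Gerg
console.log(singleQuoted.ok, singleQuoted.error); // false SyntaxError
console.log(unquotedKey.ok, unquotedKey.error);   // false SyntaxError
```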

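5. A JavaScript object made of these data types can be serialized into a JSON string and sent over the network to another computer. The receiver deserializes it back into a JavaScript object and can then use that object directly in JavaScript.

6. So how do you serialize?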

var Person = {
    name: 'Gerg',
    age: 21,
    sex: 'male',
    city: 'Shanghai'
};
JSON.stringify(Person);
>>>"{"name":"Gerg","age":21,"sex":"male","city":"Shanghai"}"

① That is hard to read, so let's add some indentation:

var Person = {
    name: 'Gerg',
    age: 21,
    sex: 'male',
    city: 'Shanghai'
};
JSON.stringify(Person,null,' ');
>>>
"{
 "name": "Gerg",
 "age": 21,
 "sex": "male",
 "city": "Shanghai"
}"

② If you only want the name and age, the second argument of JSON.stringify can do that: pass an array of the keys to keep.

var Person = {
    name: 'Gerg',
    age: 21,
    sex: 'male',
    city: 'Shanghai'
};
JSON.stringify(Person,['name','age'],' ');
>>>
"{
 "name": "Gerg",
 "age": 21
}"

③ You can also pass a function, in which case every key-value pair of the object is processed by the function first:

var Person = {
    name: 'Gerg',
    age: 21,
    sex: 'male',
    city: 'Shanghai'
};
function convert(key,value){
    if(typeof value === 'string'){
        return value.toUpperCase();
    }
    return value;
}
JSON.stringify(Person,convert,' ');
>>>
"{
 "name": "GERG",
 "age": 21,
 "sex": "MALE",
 "city": "SHANGHAI"
}"

④ For even finer control over how Person is serialized, define a toJSON() method on it that returns exactly the data JSON should serialize:

var Person = {
    name: 'Gerg',
    age: 21,
    sex: 'male',
    city: 'Shanghai',
    toJSON: function(){
        return {
            'Name': this.name,
            'Age': this.age
        };
    }
};
JSON.stringify(Person);
>>>"{"Name":"Gerg","Age":21}"

7. And how do you deserialize?

① A JSON-formatted string can be parsed directly into a JavaScript object with JSON.parse():

var Person = {
    name: 'Gerg',
    age: 21,
    sex: 'male',
    city: 'Shanghai',
};
var JSON1 = JSON.stringify(Person,['name','age']);
JSON1;
>>>"{"name":"Gerg","age":21}"

var JSON1ToObj1 = JSON.parse(JSON1);
JSON1ToObj1;
>>>{name: "Gerg", age: 21}

② JSON.parse() can also take a function that transforms the parsed properties.

var Person = {
    name: 'Gerg',
    age: 21,
    sex: 'male',
    city: 'Shanghai',
};
var JSON1 = JSON.stringify(Person,['name','age']);
JSON1;
>>>"{"name":"Gerg","age":21}"

var JSON1ToObj1 = JSON.parse(JSON1,function(key,value){
    if(key === 'name'){
        return 'Hello ' + value;
    }
    if(key === 'age'){
        return value*2;
    }
    return value;
});
JSON1ToObj1;
>>>{name: "Hello Gerg", age: 42}

    Original author: 姚屹晨
    Original URL: https://www.jianshu.com/p/671acf331c46