1、B*树索引
1.1 概述
B*树索引是Oracle默认的索引结构。我们使用CREATE INDEX语句创建索引时,创建的就是B*树索引。B*树索引的结构一个二
叉树,由根节点(root node)、分支部分(branch node)和叶子节点(leaf node)构成。
提示:B*树的B是单词Balanced的缩写,平衡之意。
● 根节点:包含指向分支节点的信息。
● 分支节点:包含指向下一级分支节点或指向叶子节点的信息。
● 叶子节点:包含索引列的值和对应数据记录行的ROWID的信息。另外叶子节点还存有指向相邻叶子节点的信息,叶子节点之间
的指向是双向的,这样叶子节点就组成成了一个双向链接(如下图所示)。又因为索引的建立是经过排序的,所以叶子节点
之间是按排序链接的。叶子节点都具体相同的深度。
下面通过一个图来描述B*树索引的结构图:
B*树索引拥有一个树形的结构,Oracle索引检索数据,先从根节点开始,然后经过分支节点,找到对应叶子点上的字段值,因为
叶子节点存有对应数据记录行的ROWID,这样就能通过ROWID找到表中数据记录了。
1.2 B*树索引特点
下面列出B*树索引的一些特点:
① B树索引建立是忽略字段的NULL值的。
因为B*树索引中忽略了NULL值,所以如果SQL的where子句中有‘字段 IS NULL’或‘字段 IS NOT NULL’的条件时,即便该字
建立了索引,Oracle执行SQL时也不会走索引查询的。
比如创建一个t_student表,并在name字段上创建一个唯一索引,然后查看下面两个语句的执行计划:
(1) SELECT * FROM t_student WHERE name IS NULL;
和
(2) SELECT * FROM t_student WHERE name = 'CHEN';
发现语句(1)没有走索引,而语句(2)是走索引的。具体操作如下
创建表和索引:
create table T_STUDENT
(
gid NUMBER(38) not null,
name VARCHAR2(100) not null,
age NUMBER(3),
sex VARCHAR2(1),
grade NUMBER(4),
description VARCHAR2(1000)
)
create unique index STUDENT_NAME_INDEX on T_STUDENT (NAME)
tablespace TEST_DATA
pctfree 10
initrans 2
maxtrans 255
storage
(
initial 64K
next 1M
minextents 1
maxextents unlimited
);
SELECT * FROM t_student WHERE name IS NULL;
查看执行计划:
SQL> EXPLAIN PLAN
2 FOR SELECT * FROM t_student WHERE name IS NULL;
Explained
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY());
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 2963365469
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 0 (0)| |
|* 1 | FILTER | | | | | |
| 2 | TABLE ACCESS FULL| T_STUDENT | 4 | 52 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(NULL IS NOT NULL)
14 rows selected
SQL> EXPLAIN PLAN
2 FOR SELECT * FROM t_student WHERE name = 'CHEN';
Explained
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY());
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 3397000786
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 1
| 1 | TABLE ACCESS BY INDEX ROWID| T_STUDENT | 1 | 13 | 1
|* 2 | INDEX UNIQUE SCAN | STUDENT_NAME_INDEX | 1 | | 0
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("NAME"='CHEN')
14 rows selected
② 无论对于大数据表还是小数据表,B*树索引的效率几乎是相同的,因为B*树索引索引叶子节点具有相同深度。
③ B*树索引对于大表可以提供更加明显更好的性能,因为索引的深度一般不会超过4(1个头节点,2级分支节点和1级叶子
节点)。事实上,由于头块几乎总是被加载到内存中,分支节点也通常被加载到内存中,因此检索索引时需要进行的物
理磁盘访问通常只有1到2次。
④ B*索引支持范围查询(包括<、>、between、like模糊查询)和精度查询(=)。因为每个叶子节点可以链接到相邻的前
后两个叶子节点,而且叶子节点链是有序的。
⑤ 检索B*树索引节点,比对是从首字母开始比对的,如果查询时条件时前模糊的(比如like ‘%chen’),将不走索引。
如下语句是不走索引的:
SQL> EXPLAIN PLAN
2 FOR SELECT * FROM t_student WHERE name like '%CHEN';
Explained
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY());
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 3237932320
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 3 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T_STUDENT | 1 | 13 | 3 (0)| 00:00:01 |
-------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("NAME" LIKE '%CHEN')
13 rows selected
⑥ Oracle会自动为主键和唯一键约束创建索引。
1.3 不走索引的情况
介绍这部分内容通过例子比较直观,例子都是基于表t_student进行的,并为表的name、age两个字段建立索引,先列出创建表
和索引的代码:
create table T_STUDENT
(
gid NUMBER(38) not null,
name VARCHAR2(100) not null,
age NUMBER(3),
sex VARCHAR2(1),
grade NUMBER(4),
description VARCHAR2(1000)
);
-- Create/Recreate indexes
create index STUDENT_AGE_INDEX on T_STUDENT (AGE)
tablespace TEST_DATA
pctfree 10
initrans 2
maxtrans 255
storage
(
initial 64K
next 1M
minextents 1
maxextents unlimited
);
create unique index STUDENT_NAME_INDEX on T_STUDENT (NAME)
tablespace TEST_DATA
pctfree 10
initrans 2
maxtrans 255
storage
(
initial 64K
next 1M
minextents 1
maxextents unlimited
);
-- Create/Recreate primary, unique and foreign key constraints
alter table T_STUDENT
add primary key (GID)
using index
tablespace TEST_DATA
pctfree 10
initrans 2
maxtrans 255
storage
(
initial 64K
next 1M
minextents 1
maxextents unlimited
);
下面列出一些不走索引的情况:
(1) 被索引的列存在隐式类型转换时
在select查询语句中,VARCHAR2类型和Number类型比较时,VARCHAR2类型将转换为Number类型;VARCHAR2类型和Date类型比较时,VARCHAR2类型将转换为Date类型。
比如:
SELECT * FROM t_student WHERE name = 111;
等同于
SELECT * FROM t_student WHERE to_number(name) = 111;
(2) 被索引的列使用了函数时(除非建立函数索引)
(3) 查询条件中使用了OR,而OR关系中的列没有全部建立索引
SQL> EXPLAIN PLAN
2 FOR SELECT * FROM t_student WHERE name = 'chen' OR age = 19;
Explained
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY());
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 3956840407
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 |
| 1 | TABLE ACCESS BY INDEX ROWID | T_STUDENT | 1 | 13 |
| 2 | BITMAP CONVERSION TO ROWIDS | | | |
| 3 | BITMAP OR | | | |
| 4 | BITMAP CONVERSION FROM ROWIDS| | | |
|* 5 | INDEX RANGE SCAN | STUDENT_AGE_INDEX | | |
| 6 | BITMAP CONVERSION FROM ROWIDS| | | |
|* 7 | INDEX RANGE SCAN | STUDENT_NAME_INDEX | | |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - access("AGE"=19)
7 - access("NAME"='chen')
20 rows selected
SQL> EXPLAIN PLAN
2 FOR SELECT * FROM t_student WHERE name = 'chen' OR sex = '1';
Explained
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY());
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 3237932320
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 3 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T_STUDENT | 1 | 13 | 3 (0)| 00:00:01 |
-------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("SEX"='1' OR "NAME"='chen')
13 rows selected
从执行计划可以看到:
语句:
SELECT * FROM t_student WHERE name = 'chen' OR age = 19;
OR关系中的列name和age都有被建了索引,所以语句执行时也是走索引的;
而语句:
SELECT * FROM t_student WHERE name = 'chen' OR sex = '1';
OR关系中的列sex是没有建立索引的,速印语句执行时不走索引。
(4) 被索引的列使用了not in或not exist时(使用in或exist时是可以走索引的)
SQL> EXPLAIN PLAN
2 FOR SELECT * FROM t_student WHERE name IN ('CHEN', 'LIANG');
Explained
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY());
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 1281816732
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 26 |
| 1 | INLIST ITERATOR | | | |
| 2 | TABLE ACCESS BY INDEX ROWID| T_STUDENT | 2 | 26 |
|* 3 | INDEX UNIQUE SCAN | STUDENT_NAME_INDEX | 2 | |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("NAME"='CHEN' OR "NAME"='LIANG')
15 rows selected
SQL> EXPLAIN PLAN
2 FOR SELECT * FROM t_student WHERE name NOT IN ('CHEN', 'LIANG');
Explained
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY());
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 3237932320
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 26 | 3 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T_STUDENT | 2 | 26 | 3 (0)| 00:00:01 |
-------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("NAME"<>'LIANG' AND "NAME"<>'CHEN')
13 rows selected
(5) 被索引的列使用like时,比较的值最前端使用了通配符,如like ‘%_’
(6) 被索引的列使用IS NULL或IS NOT NULL(位图索引可以解决这个问题,因为位图索引保存NULL信息)
(7) 被索引的列使用不等于号!=或<>
(8) 被索引的列的值基数太低(即重复值太多)时
比如性别字段sex只有‘男’和‘女’两个值,即便在sex创建索引,也不会走索引的。
(9) 如果查询出的结果占到总数据量的比例很大(一般在15%以上,不是绝对值)
1.4 索引的选择性
一个或一组列的选择性是判断在这些列上创建索引的有用性的度量标准。例如一个列的值基数比较高(或者说大部分数据是非
重复的),
我们就认为这个列上的索引是有选择性的。有选择性的索引比没有选择性的索引高效得多。
优化器会确定可供使用的各个索引的选择性,并自动使用选择性最好的索引。
1.5 组合索引
也叫复合索引,由多个列组成的索引,组合索引的优势是比单键索引更有选择性,因为组合索引比单键索引指向更少的记录。
1.5.1 组合索引什么时候起作用
组合索引并不要求定义索引的列都出现在where子句中才能起作用,只要组合索引的前导列出现在where子句中即可。
只要定义索引时第一个列出现在where子句,就是有前导列了。比如定义组合索引的列为(column1, column2, column3),
那么前导列有(column1)、(column1, column2)、(column1, column3)、(column1, column2, column3),而
(column2, column3)则不是前导列。
为表t_student创建两个索引:组合索引student_cons_index和单列索引student_name_index:
create index STUDENT_CONS_INDEX on T_STUDENT (NAME, SEX, AGE, GRADE)
tablespace TEST_DATA
pctfree 10
initrans 2
maxtrans 255
storage
(
initial 64K
next 1M
minextents 1
maxextents unlimited
);
create index STUDENT_NAME_INDEX on T_STUDENT (NAME)
tablespace TEST_DATA
pctfree 10
initrans 2
maxtrans 255
storage
(
initial 64K
next 1M
minextents 1
maxextents unlimited
);
下面是sql语句where子句有组合索引前导列和非前导列时,查询语句的执行计划:
SQL> explain plan
2 for select * from t_student where name='LILEI' and grade = 666; --(name, grade)是student_cons_index索引的前导列
Explained
SQL> select * from table(dbms_xplan.display());
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 2397198704
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 1
| 1 | TABLE ACCESS BY INDEX ROWID| T_STUDENT | 1 | 13 | 1
|* 2 | INDEX RANGE SCAN | STUDENT_CONS_INDEX | 1 | | 1
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("NAME"='LILEI' AND "GRADE"=666)
filter("GRADE"=666)
15 rows selected
SQL> explain plan
2 for select * from t_student where sex='1' and grade = 666; --(sex, grade)不是student_cons_index索引的前导列
Explained
SQL> select * from table(dbms_xplan.display());
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 3237932320
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 3 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T_STUDENT | 1 | 13 | 3 (0)| 00:00:01 |
-------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("SEX"='1' AND "GRADE"=666)
13 rows selected
1.5.2 组合索引和单列索引选择性比较
上面我们创建了两个索引,一个是name列的单列索引student_name_index,另一个是包含name列的组合索引
student_cons_index,如果执行如下语句:
select * from t_student where name = 'LILEI';
会选择哪个索引呢?
因为组合索引的选择性比单列索引的选择性要好,所以会走student_cons_index索引,执行计划如下:
SQL> explain plan
2 for select * from t_student where name = 'LILEI';
Explained
SQL> select * from table(dbms_xplan.display());
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 2397198704
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 2
| 1 | TABLE ACCESS BY INDEX ROWID| T_STUDENT | 1 | 13 | 2
|* 2 | INDEX RANGE SCAN | STUDENT_CONS_INDEX | 1 | | 1
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("NAME"='LILEI')
14 rows selected
1.6 函数索引
函数索引支持创建基于函数或表达式的索引。函数或表达式必须具备确定性(deterministic),这意味着不管环境、数据
库或
日期时间如何变化,给定的输入必须始终返回同样的输出。入下面这个函数索引就是无效的:
CREATE OR REPLACE FUNCTION days_left(p_eff_date DATE)
RETURN NUMBER
IS
BEGIN
RETURN ((p_eff_date - sysdate);
END;
/
CREATE INDEX cust_i_eff_days ON
customers ( days_left(cust_eff_to));