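Multi-column Indexes
We often hear the advice "put an index on every column in the WHERE clause." This advice is badly mistaken. In most cases, building separate single-column indexes on several columns does not improve MySQL query performance. MySQL 5.0 and later versions introduced a strategy called "index merge," which can, to a limited extent, use multiple single-column indexes on a table to locate the desired rows. But when the server combines multiple indexes, the algorithm's buffering, sorting, and merging operations usually consume a lot of CPU and memory, especially when some of the indexes are not very selective and large amounts of data must be scanned and merged.
In that situation, we need a multi-column index.
Example
Create a test database and table: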
CREATE DATABASE IF NOT EXISTS db_test default charset utf8 COLLATE utf8_general_ci;
use db_test;
CREATE TABLE payment (
id INT UNSIGNED NOT NULL AUTO_INCREMENT,
staff_id INT UNSIGNED NOT NULL,
customer_id INT UNSIGNED NOT NULL,
PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Use a stored procedure to insert 10 million rows of random data (you can set the table engine to MyISAM first and switch it to InnoDB afterwards):
DROP PROCEDURE IF EXISTS add_payment;
DELIMITER //
CREATE PROCEDURE add_payment(IN num INT)
BEGIN
  DECLARE rowid INT DEFAULT 0;
  -- Prepare the INSERT once, outside the loop, and reuse it for every row
  SET @exesql = 'INSERT INTO payment(staff_id, customer_id) VALUES (?, ?)';
  PREPARE stmt FROM @exesql;
  WHILE rowid < num DO
    -- staff_id falls in [1, 5000], customer_id in [1, 500000]
    SET @staff_id = 1 + FLOOR(5000 * RAND());
    SET @customer_id = 1 + FLOOR(500000 * RAND());
    EXECUTE stmt USING @staff_id, @customer_id;
    SET rowid = rowid + 1;
  END WHILE;
  -- Release the prepared statement when the loop is done
  DEALLOCATE PREPARE stmt;
END //
DELIMITER ;
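The procedure still has to be called, and the text above suggests loading into MyISAM first. A minimal sketch of that workflow, assuming the payment table created earlier is still empty; inserting 10 million rows one at a time can take a long while, so a smaller argument may be preferable for a first try:

```sql
-- Load into MyISAM first, as suggested above, then convert back afterwards
ALTER TABLE payment ENGINE=MyISAM;

-- Insert 10 million random rows (the row count used in this article)
CALL add_payment(10000000);

-- Switch back to InnoDB for the index experiments
ALTER TABLE payment ENGINE=InnoDB;

-- Sanity check on the number of rows loaded
SELECT COUNT(*) FROM payment;
```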
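Alternatively, you can download my test data directly (it was generated with the stored procedure above, but I adjusted the data afterwards):
Test data
Add two single-column indexes (this takes a while; running the statements one at a time is recommended):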
ALTER TABLE `payment` ADD INDEX idx_customer_id(`customer_id`);
ALTER TABLE `payment` ADD INDEX idx_staff_id(`staff_id`);
Run a query that can use the indexes on both columns:
select count(*) from payment where staff_id = 2205 AND customer_id = 93112;
Check the execution plan:
mysql> explain select count(*) from payment where staff_id = 2205 AND customer_id = 93112;
+----+-------------+---------+-------------+------------------------------+------------------------------+---------+------+-------+-------------------------------------------------------------------------+
| id | select_type | table   | type        | possible_keys                | key                          | key_len | ref  | rows  | Extra                                                                   |
+----+-------------+---------+-------------+------------------------------+------------------------------+---------+------+-------+-------------------------------------------------------------------------+
|  1 | SIMPLE      | payment | index_merge | idx_customer_id,idx_staff_id | idx_staff_id,idx_customer_id | 4,4     | NULL | 11711 | Using intersect(idx_staff_id,idx_customer_id); Using where; Using index |
+----+-------------+---------+-------------+------------------------------+------------------------------+---------+------+-------+-------------------------------------------------------------------------+
1 row in set (0.00 sec)
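The type is index_merge, and the Extra column shows Using intersect(idx_staff_id,idx_customer_id).
This is index merge: MySQL scans both indexes and then merges the two results (taking the intersection, the union, or a combination of both).
Query result: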
mysql> select count(*) from payment where staff_id = 2205 AND customer_id = 93112 ;
+----------+
| count(*) |
+----------+
|   178770 |
+----------+
1 row in set (0.12 sec)
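To see what the optimizer does without the intersection strategy, the optimizer_switch system variable can turn it off for the current session. This is only a sketch for comparing plans; the resulting plan and timings depend on your MySQL version and data, so no output is shown here.

```sql
-- Show the current optimizer flags (index_merge_intersection is on by default)
SELECT @@optimizer_switch;

-- Disable index merge intersection for this session only
SET SESSION optimizer_switch = 'index_merge_intersection=off';

-- Compare this plan with the index_merge plan shown above
EXPLAIN SELECT COUNT(*) FROM payment WHERE staff_id = 2205 AND customer_id = 93112;

-- Turn the strategy back on
SET SESSION optimizer_switch = 'index_merge_intersection=on';
```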
Now drop the indexes above and add a multi-column index:
ALTER TABLE payment DROP INDEX idx_customer_id;
ALTER TABLE payment DROP INDEX idx_staff_id;
ALTER TABLE `payment` ADD INDEX idx_customer_id_staff_id(`customer_id`, `staff_id`);
Note that column order matters a great deal in a multi-column index (customer_id is more selective, so it goes first).
Query:
mysql> select count(*) from payment where staff_id = 2205 AND customer_id = 93112;
+----------+
| count(*) |
+----------+
|   178770 |
+----------+
1 row in set (0.05 sec)
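The multi-column index speeds up the query: 0.05 sec versus 0.12 sec with index merge. (The data set here is still fairly small; the difference becomes more pronounced as the data grows.)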
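To confirm that the query really uses the new index, it helps to look at the execution plan again. A sketch, assuming idx_customer_id_staff_id is the only secondary index on the table; the exact output varies by MySQL version, so it is not reproduced here:

```sql
EXPLAIN SELECT COUNT(*) FROM payment WHERE staff_id = 2205 AND customer_id = 93112;
-- Things to check in the output:
--   key     should be idx_customer_id_staff_id
--   type    is typically ref rather than index_merge
--   key_len should be 8 (two 4-byte INT UNSIGNED NOT NULL columns),
--           which indicates that both index columns are used
```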
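Notes
The column order of a multi-column index is critical. There is a rule of thumb for choosing it: put the most selective column first (although this is not absolute). The rule of thumb considers overall cardinality and selectivity, not any specific query: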
mysql> select count(DISTINCT staff_id) / count(*) AS staff_id_selectivity, count(DISTINCT customer_id) / count(*) AS customer_id_selectivity, count(*) from payment\G
*************************** 1. row ***************************
   staff_id_selectivity: 0.0005
customer_id_selectivity: 0.0500
               count(*): 10000000
1 row in set (6.29 sec)
customer_id has the higher selectivity (0.0500 versus 0.0005), so it goes first in the index.
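Because the rule of thumb only looks at global selectivity, it can be worth checking how the specific values of a frequent query are distributed as well. A sketch of such a check, using the values from the example query above (in MySQL a comparison evaluates to 1 or 0, so SUM counts the matching rows; the alias names are made up for illustration):

```sql
SELECT SUM(staff_id = 2205)     AS staff_id_matches,
       SUM(customer_id = 93112) AS customer_id_matches,
       COUNT(*)                 AS total_rows
FROM payment;
```

The column with fewer matching rows is the more selective one for that particular query, which may or may not agree with the global figures.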
A multi-column index can only match a leftmost prefix of its columns. That is:
select * from payment where staff_id = 2205 AND customer_id = 93112 ;
select count(*) from payment where customer_id = 93112 ;
can use the index, but
select * from payment where staff_id = 2205 ;
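cannot use the index to look up rows, because staff_id alone is not a leftmost prefix of (customer_id, staff_id). At best the server scans the whole index; MySQL 8.0.13 and later may also apply a skip scan in some cases.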
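The leftmost-prefix rule can be checked with EXPLAIN. The sketch below assumes idx_customer_id_staff_id is the only secondary index on the table. The idx_staff_id index added at the end is one possible remedy if filtering on staff_id alone is common, not something the setup above requires.

```sql
-- Leftmost prefix (customer_id): expect key = idx_customer_id_staff_id, key_len = 4
EXPLAIN SELECT COUNT(*) FROM payment WHERE customer_id = 93112;

-- No leftmost prefix: expect no ref lookup, only a scan of the table or of the whole index
EXPLAIN SELECT * FROM payment WHERE staff_id = 2205;

-- Possible remedy: a separate index whose leftmost column is staff_id
ALTER TABLE payment ADD INDEX idx_staff_id (staff_id);
```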