mysql explain 详解

发表于 2020-02-05 更新于 2022-12-10 分类于 mysql 阅读次数： Waline：本文字数： 7.3k 阅读时长 ≈ 7 分钟

在使用MySQL时，如果发现查询语句耗时，会进行排查及调优，其中常用的一个方法是用explain查看sql执行计划。

(1) explain小试牛刀

通过一个简单的例子来了解explain的使用

CREATE TABLE `t2` (
  `id` int NOT NULL AUTO_INCREMENT,
  `a` int DEFAULT NULL,
  `b` int DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_a` (`a`),
  KEY `idx_b` (`b`)
) ENGINE=InnoDB ;

往表 t2 中插入 10 万行记录，取值按整数递增，即：(1,1,1)，(2,2,2)，(3,3,3) 直到 (100000,100000,100000)。

用存储过程来插入10万数据，sql如下：

delimiter ;;
create procedure idata()
begin
  declare i int;
  set i=1;
  while(i<=100000)do
    insert into t values(i, i, i);
    set i=i+1;
  end while;
end;;
delimiter ;

call idata(); /** 调用存储过程 */

假设需要执行下面这条语句，语句的执行结果最多只有1条，因为有limit 1，但是表t2有2个索引 idx_a idx_b ，MySQL会选用哪个？
如果是我们来分析，肯定是选idx_a，因为 a between 1 and 1000 只需要扫描idx_a索引1000个值，而 b between 50000 and 100000需要扫描idx_b索引50000个值

mysql> select * from t where (a between 1 and 1000)  and (b between 50000 and 100000) order by b limit 1;

mysql> explain select * from t2 where (a between 1 and 1000) and (b between 50000 and 100000) order by b limit 1;
+----+-------------+-------+------------+-------+---------------+------+---------+------+-------+----------+------------------------------------+
| id | select_type | table | partitions | type  | possible_keys | key  | key_len | ref  | rows  | filtered | Extra                              |
+----+-------------+-------+------------+-------+---------------+------+---------+------+-------+----------+------------------------------------+
|  1 | SIMPLE      | t2    | NULL       | range | a,b           | b    | 5       | NULL | 50128 |     1.00 | Using index condition; Using where |
+----+-------------+-------+------------+-------+---------------+------+---------+------+-------+----------+------------------------------------+
1 row in set, 1 warning (0.00 sec)

mysql>

结合慢查询日志

# Time: 2022-12-10T17:11:35.593173+08:00
# User@Host: admin[admin] @ localhost [127.0.0.1]  Id:    16
# Query_time: 0.105205  Lock_time: 0.000003 Rows_sent: 0  Rows_examined: 50001
SET timestamp=1670663495;
select * from t2 where (a between 1 and 1000) and (b between 50000 and 100000) order by b limit 1;

慢日志的Rows_examined：执行该SQL，实际扫描的行数，真实值
慢日志的Rows_sent：执行该SQL，实际返回的行数

之前优化器选择使用索引 b，是因为它认为使用索引 b 可以避免排序（b 本身是索引，已经是有序的了，如果选择索引 b 的话，不需要再做排序，只需要遍历），所以即使扫描行数多，也判定为代价更小。

mysql> explain select * from t2 where (a between 1 and 1000) and (b between 50000 and 100000) order by b,a limit 1;
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+----------------------------------------------------+
| id | select_type | table | partitions | type  | possible_keys | key  | key_len | ref  | rows | filtered | Extra                                              |
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+----------------------------------------------------+
|  1 | SIMPLE      | t2    | NULL       | range | a,b           | a    | 5       | NULL | 1000 |    50.00 | Using index condition; Using where; Using filesort |
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+----------------------------------------------------+
1 row in set, 1 warning (0.01 sec)

mysql>

order by b,a 这种写法，要求按照 b,a 排序，就意味着使用这两个索引都需要排序。因此，扫描行数成了影响决策的主要条件，于是此时优化器选了只需要扫描 1000 行的索引 a。

# Time: 2022-12-10T17:27:04.352721+08:00
# User@Host: admin[admin] @ localhost [127.0.0.1]  Id:    16
# Query_time: 0.020284  Lock_time: 0.000020 Rows_sent: 0  Rows_examined: 1000
SET timestamp=1670664424;
select * from t2 where (a between 1 and 1000) and (b between 50000 and 100000) order by b,a limit 1;

mysql>  explain
    ->  select count(*)
    ->  from table_c
    ->  where l_id not in (
    ->    select l_id from table_l
    ->  ) ;
+----+--------------------+---------------+------------+----------------+---------------+--------------+---------+------+--------+----------+------------------------------------+
| id | select_type        | table         | partitions | type           | possible_keys | key          | key_len | ref  | rows   | filtered | Extra                              |
+----+--------------------+---------------+------------+----------------+---------------+--------------+---------+------+--------+----------+------------------------------------+
|  1 | PRIMARY            | table_c       | NULL       | index          | NULL          | idx_l_id     | 51      | NULL | 197037 |   100.00 | Using where; Using index           |
|  2 | DEPENDENT SUBQUERY | table_l       | NULL       | index_subquery | idx_l_id      | idx_l_id     | 387     | func |      2 |   100.00 | Using index; Full scan on NULL key |
+----+--------------------+---------------+------------+----------------+---------------+--------------+---------+------+--------+----------+------------------------------------+
2 rows in set, 1 warning (0.00 sec)

mysql>

(2) explain参数详解

id	Columns	JSON Name	Meaning
1	id	select_id	每个select子句的标识id
2	select_type	None	select语句的类型
3	table	table_name	当前表名
4	partitions	partitions	匹配的分区
5	type	access_type	当前表内访问方式 join type
6	possible_keys	possible_keys	可能使用到的索引
7	key	key	经过优化器评估最终使用的索引
8	key_len	key_length	使用到的索引长度
9	ref	ref	引用到的上一个表的列
10	rows	rows	rows_examined，要得到最终记录索要扫描经过的记录数
11	filtered	filtered	按表条件过滤行的百分比
12	Extra	None	额外的信息说明

(2.1) id

SELECT识别符。这是SELECT查询序列号。这个不重要,查询序号即为sql语句执行的顺序。

(2.2) select_type

select_type语句类型有

select类型，它有以下几种:

id	select_type value	JSON name	Meaning
1	SIMPLE	None	简单的SELECT语句（不包括UNION操作或子查询操作）
2	PRIMARY	None	PRIMARY：查询中最外层的SELECT（如两表做UNION或者存在子查询的外层的表操作为PRIMARY，内层的操作为UNION）
3	UNION	None	UNION：UNION操作中，查询中处于内层的SELECT（内层的SELECT语句与外层的SELECT语句没有依赖关系）
4	DEPENDENT UNION	dependent(true)	DEPENDENT UNION：UNION操作中，查询中处于内层的SELECT（内层的SELECT语句与外层的SELECT语句有依赖关系）
5	UNIOIN RESULT	union_result	UNION RESULT：UNION操作的结果，id值通常为NULL
6	SUBQUERY	None	SUBQUERY：子查询中首个SELECT（如果有多个子查询存在）
7	DEPENDENT SUBQUERY	dependent(true)	DEPENDENT SUBQUERY：子查询中首个SELECT，但依赖于外层的表（如果有多个子查询存在）
8	DERIVED	None	DERIVED：被驱动的SELECT子查询（子查询位于FROM子句）
9	MATERIALIZED	materialized_form_subquery	MATERIALIZED：被物化的子查询
10	UNCACHEABLE SUBQUERY	cacheable(false)	UNCACHEABLE SUBQUERY：对于外层的主表，子查询不可被物化，每次都需要计算（耗时操作）
11	UNCACHEABLE UNION	cacheable(false)	UNCACHEABLE UNION：UNION操作中，内层的不可被物化的子查询（类似于UNCACHEABLE SUBQUERY）

(2.3) table

当前表名

(2.4) partitions

匹配的分区

(2.5) type

当前表内访问方式

性能由好到坏排序：

id	type value	Meaning
1	system	表中只有一行
2	const	单表中最多有一个匹配行，primary key 或者 unique index的检索
3	eq_ref	多表连接中被驱动表的连接列上有primary key或者unique index的检索
4	ref	与eq_ref类似，但不是使用primary key或者unique index，而是普通索引。也可以是单表上non-unique索引检索
5	fulltext	使用FULLTEXT索引执行连接
6	ref_or_null	与ref类似，区别在于条件中包含对NULL的查询
7	index_merge	索引合并优化，利用一个表里的N个索引查询,key_len表示这些索引键的和最长长度。
8	unique_subquery	in的后面是一个查询primary key\unique字段的子查询
9	index_subquery	in的后面是一个查询普通index字段的子查询
10	range	单表索引中的范围查询,使用索引查询出单个表中的一些行数据。ref列会变为null
11	index	等于ALL。它有两种情况：(1)覆盖索引 (2)用索引的顺序做一个全表扫描。
12	all	全表扫描

(2.6) possible_keys

提示使用哪个索引会在该表中找到行

(2.7) key

MYSQL使用的索引

(2.8) key_len

key_len 说明
key_len: 4 // INT NOT NULL
key_len: 5 // INT DEFAULT NULL
key_len: 30 // CHAR(30) NOT NULL
key_len: 32 // VARCHAR(30) NOT NULL
key_len: 92 // VARCHAR(30) NOT NULL CHARSET=utf8

key_len大小的计算规则:
a、一般地，key_len 等于索引列类型字节长度，例如int类型为4-bytes，bigint为8-bytes；
b、如果是字符串类型，还需要同时考虑字符集因素，例如：CHAR(30) UTF8则key_len至少是90-bytes；
c、若该列类型定义时允许NULL，其key_len还需要再加 1-bytes；
d、若该列类型为变长类型，例如 VARCHAR（TEXT\BLOB不允许整列创建索引，如果创建部分索引，也被视为动态列类型），其key_len还需要再加 2-bytes;

(2.9) ref

ref列显示使用哪个列或常数与key一起从表中选择行。

(2.10) rows

rows_examined，要得到最终记录索要扫描经过的记录数，这个数越小越好。

(2.11) filterrd

按表条件过滤行的百分比

(2.12) Extra

Extra是对执行计划的额外说明，包含重要信息。

例如：

id	type value	Meaning
1	const row not found	所要查询的表为空
2	Distinct	mysql正在查询distinct值，因此当它每查到一个distinct值之后就会停止当前组的搜索，去查询下一个值
3	Impossible WHERE	where条件总为false，表里没有满足条件的记录
4	Impossible WHERE noticed after reading const tables	在优化器评估了const表之后，发现where条件均不满足
5	no matching row in const table	当前join的表为const表，不能匹配
6	Not exists	优化器发现内表记录不可能满足where条件
7	Select tables optimized away	在没有group by子句时，对于MyISAM的select count(*)操作，或者当对于min(),max()的操作可以利用索引优化，优化器发现只会返回一行。
8	Using filesort	使用filesort来进行order by操作
9	Using index	覆盖索引
10	Using index for group-by	对于group by列或者distinct列，可以利用索引检索出数据，而不需要去表里查数据、分组、排序、去重等等
11	Using join buffer	之前的表连接在nested loop之后放进join buffer，再来和本表进行join。适用于本表的访问type为range，index或all
12	Using sort_union,using union,using intersect	index_merge的三种情况
13	Using temporary	使用了临时表来存储中间结果集，适用于group by，distinct，或order by列为不同表的列。
14	Using where	在存储引擎层检索出记录后，在server利用where条件进行过滤，并返回给客户端

(3) Explain妙用

查询表的大概数据总数。
EXPLAIN SELECT * FROM t1 ;
查询表t1的大概总数。

References

[1] EXPLAIN Statement
[2] explain-output
[3] MYSQL explain详解