sql-server – Highly disparate execution times when changing the comparison value in a query

January 18, 2024, 202 views

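I have a question about the execution time of a query that has me puzzled.

I know some workarounds that give better, acceptable execution times, but they still don't explain why the problem happens.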

Sample tables

We have two tables, related by a foreign key.

Table1

| Id | IdTable2 |
|:--:|:--------:|
|  1 |     4    |
|  2 |     7    |
|  3 |     8    |
|  4 |     6    |
|  5 |     4    |
|  6 |     1    |
|  7 |     1    |
|  8 |     6    |
|  9 |     7    |
| 10 |     1    |

Table2

| Id | ValueField |
|:--:|:----------:|
|  1 |      0     |
|  2 |      0     |
|  3 |      0     |
|  4 |      1     |
|  5 |      0     |
|  6 |      1     |
|  7 |      0     |
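
For anyone who wants to reproduce the simplified example, the two sample tables could be created roughly as follows. This is only a sketch: the column data types are assumptions, and the foreign key is deliberately left out because sample row 3 of Table1 points at Id 8, which does not appear in the Table2 sample.

```sql
-- Sketch of the simplified sample tables; data types are assumed, not taken from the post.
CREATE TABLE Table2 (
    Id         INT NOT NULL PRIMARY KEY,
    ValueField BIT NOT NULL
);

CREATE TABLE Table1 (
    Id       INT NOT NULL PRIMARY KEY,
    IdTable2 INT NOT NULL  -- logically references Table2.Id (constraint omitted, see note above)
);

INSERT INTO Table2 (Id, ValueField)
VALUES (1, 0), (2, 0), (3, 0), (4, 1), (5, 0), (6, 1), (7, 0);

INSERT INTO Table1 (Id, IdTable2)
VALUES (1, 4), (2, 7), (3, 8), (4, 6), (5, 4),
       (6, 1), (7, 1), (8, 6), (9, 7), (10, 1);
```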

Query

SELECT * FROM Table1 WHERE IdTable2 IN (SELECT Id FROM Table2 WHERE ValueField = ?);

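Where ? can be 0 or 1.

Real data counts

The tables above are just a simplified example. The actual row counts are:

> Table1: 60420 rows
> Table2: 62 rows
> Table2 with ValueField 0: 51 rows
> Table2 with ValueField 1: 11 rows
> Table1 with an IdTable2 whose ValueField is 0: 599 rows
> Table1 with an IdTable2 whose ValueField is 1: 59821 rows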
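
A distribution like the one above can be checked with a single grouped query. The sketch below uses the simplified table names from the example; the column aliases are made up for illustration.

```sql
-- Rows per ValueField: how many Table2 rows carry each value,
-- and how many Table1 rows reference them.
SELECT t2.ValueField,
       COUNT(DISTINCT t2.Id) AS Table2Rows,
       COUNT(t1.Id)          AS Table1Rows
FROM Table2 AS t2
LEFT JOIN Table1 AS t1
       ON t1.IdTable2 = t2.Id
GROUP BY t2.ValueField;
```

The skew is the interesting part: a handful of Table2 rows with ValueField 1 account for almost all of Table1.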

The problem

SELECT * FROM Table1 WHERE IdTable2 IN (SELECT Id FROM Table2 WHERE ValueField = 0);
-- Execution time LOW/INSTANT
SELECT * FROM Table1 WHERE IdTable2 IN (SELECT Id FROM Table2 WHERE ValueField = 1);
-- Execution time HIGH

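OK, at first I thought the subquery was the bottleneck, but if the subquery were the problem, the two values would not run in such different times. So I guessed the amount of data retrieved might be the problem, and tried this: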

SELECT * FROM Table1 WHERE IdTable2 IN (1,2,3,5,7); -- Equivalent of ValueField 0
-- Execution time LOW/INSTANT
SELECT * FROM Table1 WHERE IdTable2 IN (4,6); -- Equivalent of ValueField 1
-- Execution time LOW/INSTANT

Hmm... the amount of data retrieved isn't it either. Let's try something else:

SELECT * FROM Table1 WHERE IdTable2 IN (SELECT Id FROM Table2 WHERE ValueField = 0);
-- Execution time LOW/INSTANT
SELECT * FROM Table1 WHERE IdTable2 NOT IN (SELECT Id FROM Table2 WHERE ValueField = 0);
-- Execution time LOW/INSTANT

What happens if I invert it?

SELECT * FROM Table1 WHERE IdTable2 NOT IN (SELECT Id FROM Table2 WHERE ValueField = 1);
-- Execution time LOW/INSTANT
SELECT * FROM Table1 WHERE IdTable2 IN (SELECT Id FROM Table2 WHERE ValueField = 0);
-- Execution time LOW/INSTANT

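Hmm... this pretty much tells me the problem is neither the subquery nor the data. So why does comparing against ValueField = 1 with IN cause the problem, while no other variant reproduces the HIGH execution time?

Execution plans

For SQL IN ValueField 1: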

SELECT * FROM Incidencias WHERE EstadoWorkflow in (SELECT IdEstadoWorkflow FROM EstadosWorkflows WHERE Final = 1);

http://s000.tinyupload.com/index.php?file_id=19036217708532467879

For SQL IN ValueField 0:

SELECT * FROM Incidencias WHERE EstadoWorkflow in (SELECT IdEstadoWorkflow FROM EstadosWorkflows WHERE Final = 0);

http://s000.tinyupload.com/index.php?file_id=49593927895920014301

For SQL NOT IN ValueField 0:

SELECT * FROM Incidencias WHERE EstadoWorkflow not in (SELECT IdEstadoWorkflow FROM EstadosWorkflows WHERE Final = 0);

http://s000.tinyupload.com/index.php?file_id=03901091628843565847

For SQL NOT IN ValueField 1:

SELECT * FROM Incidencias WHERE EstadoWorkflow not in (SELECT IdEstadoWorkflow FROM EstadosWorkflows WHERE Final = 1);

http://s000.tinyupload.com/index.php?file_id=69996775965382534356

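The queries are the same as the ones I posted in the example, but with different names. Here is the mapping between the example names and the real ones.

> Table1: Incidencias
> Table2: EstadosWorkflows
> IdTable2: EstadoWorkflow
> Table2.Id: IdEstadoWorkflow
> ValueField: Final

And the reverse, for easier reading:

> Incidencias: Table1
> EstadosWorkflows: Table2
> EstadoWorkflow: IdTable2
> IdEstadoWorkflow: Table2.Id
> Final: ValueField

Real production queries

These queries show the same problem in their query plans, but with additional expensive operations (such as a huge EXISTS and several joins) the problem gets much worse.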
I really hope I haven't misled you with the simplified example.

Query with IN and value 0

SELECT distinct top 15 this_.IdIncidencia as y0_, this_.Fecha as y1_ 
FROM Incidencias this_ inner join Usuarios usuario1_ on this_.Usuario=usuario1_.IdUsuario inner join Usuarios_Perfiles perfiles5_ on usuario1_.IdUsuario=perfiles5_.Usuario and (perfiles5_.perfil in (select perfiles.idperfil from perfiles where perfiles.borrado = 0)) inner join Perfiles prf2_ on perfiles5_.Perfil=prf2_.IdPerfil 
WHERE 
this_.Instancia = 4 and 
this_.EstadoWorkflow in (SELECT this_0_.IdEstadoWorkflow as y0_ FROM EstadosWorkflows this_0_ WHERE this_0_.Final = 0) and 
exists (SELECT this_0_.IdPerfilPermiso as y0_ FROM Perfiles_Permisos this_0_ inner join Permisos prm1_ on this_0_.Permiso=prm1_.IdPermiso WHERE this_0_.IdPerfilPermiso in (206558, 206559, 209393, 209394) and (this_0_.PerfilAutorizado = prf2_.IdPerfil and this_0_.TipologiaAutorizada = this_.Tipologia and prm1_.Controlador = 'Incidencias' and prm1_.Accion = 'Index')) 
ORDER BY this_.Fecha desc

Execution time: 266 ms.
Execution plan: http://s000.tinyupload.com/index.php?file_id=36115325682943356233

Query with IN and value 1

SELECT distinct top 15 this_.IdIncidencia as y0_, this_.Fecha as y1_ 
FROM Incidencias this_ inner join Usuarios usuario1_ on this_.Usuario=usuario1_.IdUsuario inner join Usuarios_Perfiles perfiles5_ on usuario1_.IdUsuario=perfiles5_.Usuario and (perfiles5_.perfil in (select perfiles.idperfil from perfiles where perfiles.borrado = 0)) inner join Perfiles prf2_ on perfiles5_.Perfil=prf2_.IdPerfil 
WHERE 
this_.Instancia = 4 and 
this_.EstadoWorkflow in (SELECT this_0_.IdEstadoWorkflow as y0_ FROM EstadosWorkflows this_0_ WHERE this_0_.Final = 1) and 
exists (SELECT this_0_.IdPerfilPermiso as y0_ FROM Perfiles_Permisos this_0_ inner join Permisos prm1_ on this_0_.Permiso=prm1_.IdPermiso WHERE this_0_.IdPerfilPermiso in (206558, 206559, 209393, 209394) and (this_0_.PerfilAutorizado = prf2_.IdPerfil and this_0_.TipologiaAutorizada = this_.Tipologia and prm1_.Controlador = 'Incidencias' and prm1_.Accion = 'Index')) 
ORDER BY this_.Fecha desc

Execution time: 28506 ms.
Execution plan: http://s000.tinyupload.com/index.php?file_id=72827687005228029776

Query with NOT IN and value 0

SELECT distinct top 15 this_.IdIncidencia as y0_, this_.Fecha as y1_ 
FROM Incidencias this_ inner join Usuarios usuario1_ on this_.Usuario=usuario1_.IdUsuario inner join Usuarios_Perfiles perfiles5_ on usuario1_.IdUsuario=perfiles5_.Usuario and (perfiles5_.perfil in (select perfiles.idperfil from perfiles where perfiles.borrado = 0)) inner join Perfiles prf2_ on perfiles5_.Perfil=prf2_.IdPerfil 
WHERE 
this_.Instancia = 4 and 
this_.EstadoWorkflow not in (SELECT this_0_.IdEstadoWorkflow as y0_ FROM EstadosWorkflows this_0_ WHERE this_0_.Final = 0) and 
exists (SELECT this_0_.IdPerfilPermiso as y0_ FROM Perfiles_Permisos this_0_ inner join Permisos prm1_ on this_0_.Permiso=prm1_.IdPermiso WHERE this_0_.IdPerfilPermiso in (206558, 206559, 209393, 209394) and (this_0_.PerfilAutorizado = prf2_.IdPerfil and this_0_.TipologiaAutorizada = this_.Tipologia and prm1_.Controlador = 'Incidencias' and prm1_.Accion = 'Index')) 
ORDER BY this_.Fecha desc

Execution time: 498 ms.
Execution plan: http://s000.tinyupload.com/index.php?file_id=35554889075362686964

Query with NOT IN and value 1

SELECT distinct top 15 this_.IdIncidencia as y0_, this_.Fecha as y1_ 
FROM Incidencias this_ inner join Usuarios usuario1_ on this_.Usuario=usuario1_.IdUsuario inner join Usuarios_Perfiles perfiles5_ on usuario1_.IdUsuario=perfiles5_.Usuario and (perfiles5_.perfil in (select perfiles.idperfil from perfiles where perfiles.borrado = 0)) inner join Perfiles prf2_ on perfiles5_.Perfil=prf2_.IdPerfil 
WHERE 
this_.Instancia = 4 and 
this_.EstadoWorkflow not in (SELECT this_0_.IdEstadoWorkflow as y0_ FROM EstadosWorkflows this_0_ WHERE this_0_.Final = 1) and 
exists (SELECT this_0_.IdPerfilPermiso as y0_ FROM Perfiles_Permisos this_0_ inner join Permisos prm1_ on this_0_.Permiso=prm1_.IdPermiso WHERE this_0_.IdPerfilPermiso in (206558, 206559, 209393, 209394) and (this_0_.PerfilAutorizado = prf2_.IdPerfil and this_0_.TipologiaAutorizada = this_.Tipologia and prm1_.Controlador = 'Incidencias' and prm1_.Accion = 'Index')) 
ORDER BY this_.Fecha desc

Execution time: 386 ms.
Execution plan: http://s000.tinyupload.com/index.php?file_id=11500314236594795220

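Best answer

The problem is caused by the fact that, at optimization time, SQL Server cannot know the exact values the subquery inside the IN clause will return, so it cannot use the statistics to estimate the row count.

When you have the exact values in the IN clause, they can be compared against the statistics. SQL Server can then most likely estimate quite accurately how many rows there will be, and pick the best plan for the execution.

I haven't tried this myself, but you could try creating filtered statistics on the id, separately for value field 0 and 1, which might improve the situation.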
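
A minimal sketch of what such filtered statistics could look like, using the real table and column names from the question. The statistics names are made up for illustration, and whether the optimizer actually picks these up for the IN subquery is untested, as noted above.

```sql
-- Filtered statistics on the id column, one per value of Final.
-- The names st_EstadosWorkflows_Final0 / _Final1 are hypothetical.
CREATE STATISTICS st_EstadosWorkflows_Final0
ON EstadosWorkflows (IdEstadoWorkflow)
WHERE Final = 0
WITH FULLSCAN;

CREATE STATISTICS st_EstadosWorkflows_Final1
ON EstadosWorkflows (IdEstadoWorkflow)
WHERE Final = 1
WITH FULLSCAN;
```

Comparing estimated and actual row counts in the plan before and after creating them would show whether they make any difference.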

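Update

From the latest pictures it is clear that the estimate is off: the row count is estimated as 1, but after the nested loop it is actually 59851.

This wrong estimate seems to lead to a large number of table scans, because the scan was expected to execute only once.

Since this is a table scan rather than a clustered index scan, it looks like the table has no clustered index, nor any other index that could be used. Can you do something about that? I don't know the data volume, but an index on borrado, with idperfil as an included or regular key column, might help. The same thing happens in the plan for value 0, but since the row count there is only 605, running 605 table scans doesn't take that long. When you do almost 100 times as many, it starts to take time.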
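
The suggested index could be sketched like this. The index name is hypothetical, and the table and column names are taken from the subquery in the production query (perfiles.borrado, perfiles.idperfil).

```sql
-- Index on Borrado that also carries IdPerfil, so the subquery
-- "select idperfil from perfiles where borrado = 0" can be answered from the index alone.
CREATE NONCLUSTERED INDEX IX_Perfiles_Borrado
ON Perfiles (Borrado)
INCLUDE (IdPerfil);

-- Variant with IdPerfil as a regular key column instead of an included one:
-- CREATE NONCLUSTERED INDEX IX_Perfiles_Borrado_IdPerfil
-- ON Perfiles (Borrado, IdPerfil);
```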

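Looking at the NOT IN plan, the structure is completely different, most likely because the estimated row count is closer to the actual one, so SQL Server chooses this kind of plan.

So another solution could be to create a temporary table from Usuarios_Perfiles (with the perfiles restriction applied), which might help since it only has 1179 rows.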
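
A sketch of the temporary-table approach, assuming the slow IN query with Final = 1 is the one being rewritten. The temp table name and the shorter aliases are made up for illustration; the tables, columns, and literal values come from the production query in the question.

```sql
-- Materialize Usuarios_Perfiles restricted to non-deleted profiles.
SELECT up.Usuario, up.Perfil
INTO #UsuariosPerfilesActivos
FROM Usuarios_Perfiles AS up
WHERE up.Perfil IN (SELECT p.IdPerfil FROM Perfiles AS p WHERE p.Borrado = 0);

-- Same query as before, joined against the temp table instead.
SELECT DISTINCT TOP 15 i.IdIncidencia, i.Fecha
FROM Incidencias AS i
INNER JOIN Usuarios AS u ON i.Usuario = u.IdUsuario
INNER JOIN #UsuariosPerfilesActivos AS up ON u.IdUsuario = up.Usuario
INNER JOIN Perfiles AS prf ON up.Perfil = prf.IdPerfil
WHERE i.Instancia = 4
  AND i.EstadoWorkflow IN (SELECT ew.IdEstadoWorkflow
                           FROM EstadosWorkflows AS ew
                           WHERE ew.Final = 1)
  AND EXISTS (SELECT 1
              FROM Perfiles_Permisos AS pp
              INNER JOIN Permisos AS prm ON pp.Permiso = prm.IdPermiso
              WHERE pp.IdPerfilPermiso IN (206558, 206559, 209393, 209394)
                AND pp.PerfilAutorizado = prf.IdPerfil
                AND pp.TipologiaAutorizada = i.Tipologia
                AND prm.Controlador = 'Incidencias'
                AND prm.Accion = 'Index')
ORDER BY i.Fecha DESC;

DROP TABLE #UsuariosPerfilesActivos;
```

Because the temp table is populated before the main query is compiled, the optimizer sees its real row count rather than an estimate through the subquery.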

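Without the STATISTICS IO output it is not 100% certain where the time is being spent, but it looks very much like it is caused by the table scans.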
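
The missing output can be collected by switching on the session-level statistics before running the slow statement. The example below uses the simplified slow query from the question; the per-table logical reads and scan counts appear in the Messages tab.

```sql
-- Report I/O and timing for every statement in this session.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT *
FROM Incidencias
WHERE EstadoWorkflow IN (SELECT IdEstadoWorkflow
                         FROM EstadosWorkflows
                         WHERE Final = 1);

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

A very high scan count on a single table would support the table-scan explanation.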