ICode9

精准搜索请尝试: 精确搜索
首页 > 其他分享> 文章详细

Subquery using Exists 1 or Exists *

2020-05-27 14:51:59  阅读:287  来源: 互联网

标签:sqlservr exe Exists column bytes Subquery columns using SELECT


Subquery using Exists 1 or Exists *

回答1

No, SQL Server is smart and knows it is being used for an EXISTS, and returns NO DATA to the system.

Quoth Microsoft: http://technet.microsoft.com/en-us/library/ms189259.aspx?ppud=4

The select list of a subquery introduced by EXISTS almost always consists of an asterisk (*). There is no reason to list column names because you are just testing whether rows that meet the conditions specified in the subquery exist.

To check yourself, try running the following:

SELECT whatever
  FROM yourtable
 WHEREEXISTS(SELECT1/0FROM someothertable 
                WHERE a_valid_clause )

If it was actually doing something with the SELECT list, it would throw a div by zero error. It doesn't.

EDIT: Note, the SQL Standard actually talks about this.

ANSI SQL 1992 Standard, pg 191 http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt

3) Case:
a) If the <select list> "*" is simply contained in a <subquery> that is immediately contained in an <exists predicate>, then the <select list> is equivalent to a <value expression> that is an arbitrary <literal>.

 

回答2

The reason for this misconception is presumably because of the belief that it will end up reading all columns. It is easy to see that this is not the case.

CREATETABLE T
(
X INT PRIMARYKEY,
Y INT,
Z CHAR(8000))CREATENONCLUSTEREDINDEX NarrowIndex ON T(Y)IFEXISTS(SELECT*FROM T)PRINT'Y'

Gives plan

 

 

This shows that SQL Server was able to use the narrowest index available to check the result despite the fact that the index does not include all columns. The index access is under a semi join operator which means that it can stop scanning as soon as the first row is returned.

So it is clear the above belief is wrong.

However Conor Cunningham from the Query Optimiser team explains here that he typically uses SELECT 1 in this case as it can make a minor performance difference in the compilation of the query.

The QP will take and expand all *'s early in the pipeline and bind them to objects (in this case, the list of columns). It will then remove unneeded columns due to the nature of the query.

So for a simple EXISTS subquery like this:

SELECT col1 FROM MyTable WHERE EXISTS (SELECT * FROM Table2 WHERE MyTable.col1=Table2.col2) The * will be expanded to some potentially big column list and then it will be determined that the semantics of the EXISTS does not require any of those columns, so basically all of them can be removed.

"SELECT 1" will avoid having to examine any unneeded metadata for that table during query compilation.

However, at runtime the two forms of the query will be identical and will have identical runtimes.

I tested four possible ways of expressing this query on an empty table with various numbers of columns. SELECT 1 vs SELECT * vs SELECT Primary_Key vs SELECT Other_Not_Null_Column.

I ran the queries in a loop using OPTION (RECOMPILE) and measured the average number of executions per second. Results below

 

 

+-------------+----------+---------+---------+--------------+| Num of Cols |*|1|   PK    |NotNull col |+-------------+----------+---------+---------+--------------+|2|2043.5|2043.25|2073.5|2067.5||4|2038.75|2041.25|2067.5|2067.5||8|2015.75|2017|2059.75|2059||16|2005.75|2005.25|2025.25|2035.75||32|1963.25|1967.25|2001.25|1992.75||64|1903|1904|1936.25|1939.75||128|1778.75|1779.75|1799|1806.75||256|1530.75|1526.5|1542.75|1541.25||512|1195|1189.75|1203.75|1198.5||1024|694.75|697|699|699.25|+-------------+----------+---------+---------+--------------+| Total       |17169.25|17171|17408|17408|+-------------+----------+---------+---------+--------------+

As can be seen there is no consistent winner between SELECT 1 and SELECT * and the difference between the two approaches is negligible. The SELECT Not Null col and SELECT PK do appear slightly faster though.

All four of the queries degrade in performance as the number of columns in the table increases.

As the table is empty this relationship does seem only explicable by the amount of column metadata. For COUNT(1) it is easy to see that this gets rewritten to COUNT(*) at some point in the process from the below.

SET SHOWPLAN_TEXT ON;

GO

SELECT COUNT(1)FROM master..spt_values

Which gives the following plan

|--Compute Scalar(DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[Expr1004],0)))|--Stream Aggregate(DEFINE:([Expr1004]=Count(*)))|--Index Scan(OBJECT:([master].[dbo].[spt_values].[ix2_spt_values_nu_nc]))

Attaching a debugger to the SQL Server process and randomly breaking whilst executing the below

DECLARE@V int 

WHILE(1=1)SELECT@V=1WHEREEXISTS(SELECT1FROM##T)OPTION(RECOMPILE)

I found that in the cases where the table has 1,024 columns most of the time the call stack looks like something like the below indicating that it is indeed spending a large proportion of the time loading column metadata even when SELECT 1 is used (For the case where the table has 1 column randomly breaking didn't hit this bit of the call stack in 10 attempts)

sqlservr.exe!CMEDAccess::GetProxyBaseIntnl()-0x1e2c79 bytes  
sqlservr.exe!CMEDProxyRelation::GetColumn()+0x57 bytes   
sqlservr.exe!CAlgTableMetadata::LoadColumns()+0x256 bytes    
sqlservr.exe!CAlgTableMetadata::Bind()+0x15c bytes   
sqlservr.exe!CRelOp_Get::BindTree()+0x98 bytes   
sqlservr.exe!COptExpr::BindTree()+0x58 bytes 
sqlservr.exe!CRelOp_FromList::BindTree()+0x5c bytes  
sqlservr.exe!COptExpr::BindTree()+0x58 bytes 
sqlservr.exe!CRelOp_QuerySpec::BindTree()+0xbe bytes 
sqlservr.exe!COptExpr::BindTree()+0x58 bytes 
sqlservr.exe!CScaOp_Exists::BindScalarTree()+0x72 bytes  
... Lines omitted ...
msvcr80.dll!_threadstartex(void * ptd=0x0031d888)  Line 326+0x5 bytes C
kernel32.dll!_BaseThreadStart@8()+0x37 bytes 

This manual profiling attempt is backed up by the VS 2012 code profiler which shows a very different selection of functions consuming the compilation time for the two cases (Top 15 Functions 1024 columns vs Top 15 Functions 1 column).

Both the SELECT 1 and SELECT * versions wind up checking column permissions and fail if the user is not granted access to all columns in the table.

An example I cribbed from a conversation on the heap

CREATEUSER blat WITHOUT LOGIN;
GO
CREATETABLE dbo.T
(
X INT PRIMARYKEY,
Y INT,
Z CHAR(8000))
GO

GRANTSELECTON dbo.T TO blat;DENYSELECTON dbo.T(Z)TO blat;
GO
EXECUTEASUSER='blat';
GO

SELECT1WHEREEXISTS(SELECT1FROM   T);/*  ↑↑↑↑ 
Fails unexpectedly with 

The SELECT permission was denied on the column 'Z' of the 
           object 'T', database 'tempdb', schema 'dbo'.*/

GO
REVERT;DROPUSER blat
DROPTABLE T

So one might speculate that the minor apparent difference when using SELECT some_not_null_col is that it only winds up checking permissions on that specific column (though still loads the metadata for all). However this doesn't seem to fit with the facts as the percentage difference between the two approaches if anything gets smaller as the number of columns in the underlying table increases.

In any event I won't be rushing out and changing all my queries to this form as the difference is very minor and only apparent during query compilation. Removing the OPTION (RECOMPILE) so that subsequent executions can use a cached plan gave the following.

 

 

+-------------+-----------+------------+-----------+--------------+| Num of Cols |*|1|    PK     |NotNull col |+-------------+-----------+------------+-----------+--------------+|2|144933.25|145292|146029.25|143973.5||4|146084|146633.5|146018.75|146581.25||8|143145.25|144393.25|145723.5|144790.25||16|145191.75|145174|144755.5|146666.75||32|144624|145483.75|143531|145366.25||64|145459.25|146175.75|147174.25|146622.5||128|145625.75|143823.25|144132|144739.25||256|145380.75|147224|146203.25|147078.75||512|146045|145609.25|145149.25|144335.5||1024|148280|148076|145593.25|146534.75|+-------------+-----------+------------+-----------+--------------+| Total       |1454769|1457884.75|1454310|1456688.75|+-------------+-----------+------------+-----------+--------------+

The test script I used can be found here

 

 

No, SQL Server is smart and knows it is being used for an EXISTS, and returns NO DATA to the system.

Quoth Microsoft: http://technet.microsoft.com/en-us/library/ms189259.aspx?ppud=4

The select list of a subquery introduced by EXISTS almost always consists of an asterisk (*). There is no reason to list column names because you are just testing whether rows that meet the conditions specified in the subquery exist.

To check yourself, try running the following:

SELECT whatever
  FROM yourtable
 WHEREEXISTS(SELECT1/0FROM someothertable 
                WHERE a_valid_clause )

If it was actually doing something with the SELECT list, it would throw a div by zero error. It doesn't.

EDIT: Note, the SQL Standard actually talks about this.

ANSI SQL 1992 Standard, pg 191 http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt

3) Case:
a) If the <select list> "*" is simply contained in a <subquery> that is immediately contained in an <exists predicate>, then the <select list> is equivalent to a <value expression> that is an arbitrary <literal>.

标签:sqlservr,exe,Exists,column,bytes,Subquery,columns,using,SELECT
来源: https://www.cnblogs.com/chucklu/p/12973077.html

本站声明: 1. iCode9 技术分享网(下文简称本站)提供的所有内容,仅供技术学习、探讨和分享;
2. 关于本站的所有留言、评论、转载及引用,纯属内容发起人的个人观点,与本站观点和立场无关;
3. 关于本站的所有言论和文字,纯属内容发起人的个人观点,与本站观点和立场无关;
4. 本站文章均是网友提供,不完全保证技术分享内容的完整性、准确性、时效性、风险性和版权归属;如您发现该文章侵犯了您的权益,可联系我们第一时间进行删除;
5. 本站为非盈利性的个人网站,所有内容不会用来进行牟利,也不会利用任何形式的广告来间接获益,纯粹是为了广大技术爱好者提供技术内容和技术思想的分享性交流网站。

专注分享技术,共同学习,共同进步。侵权联系[81616952@qq.com]

Copyright (C)ICode9.com, All Rights Reserved.

ICode9版权所有