Asked  7 Months ago    Answers:  5   Viewed   46 times

I'm curious: which of the following would be more efficient?

I've always been a bit cautious about using IN because I believe SQL Server turns the result set into a big IF statement. For a large result set, this could result in poor performance. For small result sets, I'm not sure either is preferable. For large result sets, wouldn't EXISTS be more efficient?

WHERE EXISTS (SELECT * FROM Base WHERE bx.BoxID = Base.BoxID AND [Rank] = 2)

vs.

WHERE bx.BoxID IN (SELECT BoxID FROM Base WHERE [Rank] = 2)
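
To check that the two forms select the same rows, here is a minimal sketch that runs both against an in-memory SQLite database from Python. SQLite stands in for SQL Server, and the `Box` table and all sample data are made up for illustration. It only compares results and says nothing about how SQL Server would plan either query.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; tables and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Box  (BoxID INTEGER PRIMARY KEY);
    CREATE TABLE Base (BoxID INTEGER, [Rank] INTEGER);
    INSERT INTO Box  VALUES (1), (2), (3), (4);
    INSERT INTO Base VALUES (1, 2), (1, 2), (2, 1), (3, 2);
""")

exists_sql = """
    SELECT bx.BoxID FROM Box AS bx
    WHERE EXISTS (SELECT * FROM Base
                  WHERE bx.BoxID = Base.BoxID AND [Rank] = 2)
    ORDER BY bx.BoxID
"""
in_sql = """
    SELECT bx.BoxID FROM Box AS bx
    WHERE bx.BoxID IN (SELECT BoxID FROM Base WHERE [Rank] = 2)
    ORDER BY bx.BoxID
"""

exists_rows = [row[0] for row in conn.execute(exists_sql)]
in_rows = [row[0] for row in conn.execute(in_sql)]

# Both predicates keep a box once, even if Base has several matching rows.
print(exists_rows, in_rows)
```

With this data, both lists contain boxes 1 and 3, so any difference between the two forms is about execution cost, not about which rows come back.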

 Answers

57

EXISTS will be faster because once the engine has found a hit, it stops looking: the condition has already been proved true.

With IN, it will collect all the results from the sub-query before further processing.
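
The difference described above can be sketched in plain Python. This only mirrors the mental model in this answer (stop at the first hit versus collect everything first). The function names and sample rows are hypothetical, and a real query optimizer is free to execute either form differently.

```python
def exists_style(rows, key):
    """Stop at the first matching row; return (found, rows examined)."""
    examined = 0
    for row in rows:
        examined += 1
        if row == key:
            return True, examined
    return False, examined


def in_style(rows, key):
    """Collect every row first, then test membership."""
    collected = list(rows)
    return key in collected, len(collected)


subquery_rows = [7, 3, 9, 3, 5]  # hypothetical subquery output
exists_result = exists_style(subquery_rows, 3)
in_result = in_style(subquery_rows, 3)
print(exists_result, in_result)
```

Here the first function examines two rows before returning, while the second touches all five before it can answer.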

Tuesday, June 1, 2021
 
Niels
answered 7 Months ago
76

The answer will of course be "it depends", but based on testing at this end...

Assuming

  1. 1 million products
  2. product has a clustered index on product_id
  3. Most (if not all) products have corresponding information in the product_code table
  4. Ideal indexes present on product_code for both queries.

The PIVOT version ideally needs an index product_code(product_id, type) INCLUDE (code), whereas the JOIN version ideally needs an index product_code(type, product_id) INCLUDE (code).

If these are in place, giving the plans below, then the JOIN version is more efficient.

[Plans: execution plan image not reproduced here]

If type 1 and type 2 are the only types in the table, the PIVOT version has a slight edge in number of reads, because it doesn't have to seek into product_code twice. That advantage is more than outweighed by the additional overhead of the stream aggregate operator.

PIVOT

Table 'product_code'. Scan count 1, logical reads 10467
Table 'product'. Scan count 1, logical reads 4750
   CPU time = 3297 ms,  elapsed time = 3260 ms.

JOIN

Table 'product_code'. Scan count 2, logical reads 10471
Table 'product'. Scan count 1, logical reads 4750
   CPU time = 1906 ms,  elapsed time = 1866 ms.

If there are additional type records other than 1 and 2, the JOIN version will increase its advantage. It just does merge joins on the relevant sections of the (type, product_id) index, whereas the PIVOT plan uses (product_id, type) and so has to scan over the additional type rows that are intermingled with the type 1 and type 2 rows.
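
The two queries being compared are not shown in this answer, so the following is only an illustrative sketch of what the two shapes might look like. It assumes hypothetical `product(product_id)` and `product_code(product_id, type, code)` tables, runs on in-memory SQLite rather than SQL Server, and uses conditional aggregation as a stand-in for PIVOT, which SQLite lacks. It shows that both shapes return the same rows. It does not reproduce the timings above.

```python
import sqlite3

# Assumption: schema, data, and query shapes are hypothetical illustrations.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (product_id INTEGER PRIMARY KEY);
    CREATE TABLE product_code (product_id INTEGER, type INTEGER, code TEXT);
    INSERT INTO product VALUES (1), (2);
    INSERT INTO product_code VALUES
        (1, 1, 'A1'), (1, 2, 'B1'), (2, 1, 'A2'), (2, 3, 'C2');
""")

# Pivot-like shape: one pass over product_code, then an aggregate per product.
pivot_style = conn.execute("""
    SELECT p.product_id,
           MAX(CASE WHEN pc.type = 1 THEN pc.code END) AS code1,
           MAX(CASE WHEN pc.type = 2 THEN pc.code END) AS code2
    FROM product AS p
    LEFT JOIN product_code AS pc ON pc.product_id = p.product_id
    GROUP BY p.product_id
    ORDER BY p.product_id
""").fetchall()

# Join shape: product_code is visited once per wanted type.
join_style = conn.execute("""
    SELECT p.product_id, c1.code, c2.code
    FROM product AS p
    LEFT JOIN product_code AS c1
           ON c1.product_id = p.product_id AND c1.type = 1
    LEFT JOIN product_code AS c2
           ON c2.product_id = p.product_id AND c2.type = 2
    ORDER BY p.product_id
""").fetchall()

print(pivot_style == join_style)
```

The type 3 row for product 2 is the kind of extra row the pivot-like shape has to read through and discard, while the join shape never asks for it.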

Wednesday, June 9, 2021
 
Sagar
answered 6 Months ago
27

INT will be faster - here's why:

  • SQL Server organizes its data and indexes into 8 KB pages
  • if you have an index page with INT keys on it, you get roughly 2,000 INT entries
  • if you have NVARCHAR(128) and you use on average 20 characters, that's 40 bytes per entry, or roughly 200 entries per page

So for the same amount of index entries, the NVARCHAR(128) case would use ten times as many index pages.

Loading and searching those index pages will incur significantly more I/O operations.
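
The arithmetic behind those figures can be written out as a small Python sketch. It is a rough upper bound under simplifying assumptions: it divides the full 8 KB page by the key size and ignores the page header and per-row overhead, so real pages hold somewhat fewer entries. The helper name is made up for illustration.

```python
PAGE_BYTES = 8 * 1024  # 8 KB page


def entries_per_page(key_bytes, page_bytes=PAGE_BYTES):
    """Rough upper bound; ignores page header and per-row overhead."""
    return page_bytes // key_bytes


int_entries = entries_per_page(4)            # INT key: 4 bytes
nvarchar_entries = entries_per_page(20 * 2)  # 20 characters at 2 bytes each

# How many more pages the NVARCHAR index needs for the same number of entries.
pages_ratio = int_entries / nvarchar_entries
print(int_entries, nvarchar_entries, round(pages_ratio, 1))
```

This gives about 2,000 entries per page for INT, about 200 for the 40-byte keys, and a ratio of roughly ten, in line with the estimate above.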

So to make things short: if you can, always use INT.

Wednesday, August 4, 2021
 
fhonics
answered 4 Months ago
44

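Make sure you are searching on indexed columns, and that the query does not manipulate the data in those columns (for example with SUBSTRING), because wrapping a column in a function can prevent the index from being used for a seek.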
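
As a small illustration of that advice, the sketch below asks SQLite (standing in for SQL Server, which has its own plan output) how it would run two filters that match the same prefix. The `item` table, the index name, and the `plan` helper are hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; table and index are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (id INTEGER PRIMARY KEY, code TEXT);
    CREATE INDEX ix_item_code ON item (code);
""")


def plan(sql):
    """Return the 'detail' text of each EXPLAIN QUERY PLAN row."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# Column wrapped in a function: every entry has to be evaluated.
wrapped = plan("SELECT id FROM item WHERE substr(code, 1, 3) = 'ABC'")

# Bare column with a range: the index can be searched directly.
bare = plan("SELECT id FROM item WHERE code >= 'ABC' AND code < 'ABD'")

print(wrapped)
print(bare)
```

The first plan reports a scan, while the second reports a search using `ix_item_code`. That is the same distinction SQL Server draws between an index scan and an index seek.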

Sunday, September 19, 2021
 
helgoboss
answered 3 Months ago
25
  • Autogrows on the database? Check for messages in the SQL error logs.
  • Page splits due to inserted records? Check table fragmentation with DBCC SHOWCONTIG
  • Antivirus scans? Don't.
  • Out of date statistics? Don't rely on auto-update statistics on tables that change a lot.
  • Don't rule out a problem on the client end, or the networking between them.
  • Run Profiler with a filter on duration, capturing only events that last longer than 10 seconds, and look for patterns in parameters, clients, and time of day.
Saturday, November 13, 2021
 
St.Antario
answered 3 Weeks ago