Asked  7 Months ago    Answers:  5   Viewed   38 times

What is the main purpose of using CROSS APPLY?

I have read (vaguely, through posts on the Internet) that cross apply can be more efficient when selecting over large data sets if you are partitioning. (Paging comes to mind)

I also know that CROSS APPLY doesn't require a UDF as the right-table.

In most INNER JOIN queries (one-to-many relationships), I could rewrite them to use CROSS APPLY, but they always give me equivalent execution plans.

Can anyone give me a good example of when CROSS APPLY makes a difference in those cases where INNER JOIN will work as well?


Edit:

Here's a trivial example, where the execution plans are exactly the same. (Show me one where they differ and where cross apply is faster/more efficient)

create table Company (
    companyId int identity(1,1)
,   companyName varchar(100)
,   zipcode varchar(10) 
,   constraint PK_Company primary key (companyId)
)
GO

create table Person (
    personId int identity(1,1)
,   personName varchar(100)
,   companyId int
,   constraint FK_Person_CompanyId foreign key (companyId) references dbo.Company(companyId)
,   constraint PK_Person primary key (personId)
)
GO

insert Company
select 'ABC Company', '19808' union
select 'XYZ Company', '08534' union
select '123 Company', '10016'


insert Person
select 'Alan', 1 union
select 'Bobby', 1 union
select 'Chris', 1 union
select 'Xavier', 2 union
select 'Yoshi', 2 union
select 'Zambrano', 2 union
select 'Player 1', 3 union
select 'Player 2', 3 union
select 'Player 3', 3 


/* using CROSS APPLY */
select *
from Person p
cross apply (
    select *
    from Company c
    where p.companyid = c.companyId
) Czip

/* the equivalent query using INNER JOIN */
select *
from Person p
inner join Company c on p.companyid = c.companyId

 Answers

26

Can anyone give me a good example of when CROSS APPLY makes a difference in those cases where INNER JOIN will work as well?

See the article in my blog for detailed performance comparison:

  • INNER JOIN vs. CROSS APPLY

CROSS APPLY works better on things that have no simple JOIN condition.

This one selects 3 last records from t2 for each record from t1:

SELECT  t1.*, t2o.*
FROM    t1
CROSS APPLY
        (
        SELECT  TOP 3 *
        FROM    t2
        WHERE   t2.t1_id = t1.id
        ORDER BY
                t2.rank DESC
        ) t2o

It cannot be easily formulated with an INNER JOIN condition.

You could probably do something like that using CTE's and window function:

WITH    t2o AS
        (
        SELECT  t2.*, ROW_NUMBER() OVER (PARTITION BY t1_id ORDER BY rank) AS rn
        FROM    t2
        )
SELECT  t1.*, t2o.*
FROM    t1
INNER JOIN
        t2o
ON      t2o.t1_id = t1.id
        AND t2o.rn <= 3

, but this is less readable and probably less efficient.

Update:

Just checked.

master is a table of about 20,000,000 records with a PRIMARY KEY on id.

This query:

WITH    q AS
        (
        SELECT  *, ROW_NUMBER() OVER (ORDER BY id) AS rn
        FROM    master
        ),
        t AS 
        (
        SELECT  1 AS id
        UNION ALL
        SELECT  2
        )
SELECT  *
FROM    t
JOIN    q
ON      q.rn <= t.id

runs for almost 30 seconds, while this one:

WITH    t AS 
        (
        SELECT  1 AS id
        UNION ALL
        SELECT  2
        )
SELECT  *
FROM    t
CROSS APPLY
        (
        SELECT  TOP (t.id) m.*
        FROM    master m
        ORDER BY
                id
        ) q

is instant.

Tuesday, June 1, 2021
 
muncherelli
answered 7 Months ago
58

It depends heavily on the CMS's (module/theme) API, for instance, one Drupal modularity in my opinion is its greatest strength, although learning Drupal itself is not to be taken lightly either. I've seen a lot of commercial sites done in Drupal, most of them look successful, but it made me think what was the total cost of creating the modules, customizing it, etc.

Since you mention you are a newbie in this stuff, do take in account all the stuff to take care of when you are creating a website from the scracth:

  • Security (XSS prevention, sql injection, blah blah)
  • Authentication
  • A flexible theme system (unless you mix html with code... good luck, although there are some really nice template system available for PHP 5)
  • Database Layer (just use an ORM)
  • Learning JavaScript, then learn jQuery, MooTools, etc.
  • Administration panel
  • Adding stuff like content management

But most importantly, plan something before doing it. Starting out just because you feel like without planning what you want and how will you implement it creates uncertainty, just too many doubts...

So start with a CMS, even for a personal site. There are solutions like Joomla! Drupal, SimpleCMS, some django CMS are also out there. Learn the language of the CMS and start creating your own module as you see fit. Always read their documentation, search in forums before asking, or search here in stackoverflow. Really.. just ask here.. better than googling for a solution :P

Wednesday, March 31, 2021
 
Aamir
answered 9 Months ago
59

Generally speaking, IN and JOIN are different queries that can yield different results.

SELECT  a.*
FROM    a
JOIN    b
ON      a.col = b.col

is not the same as

SELECT  a.*
FROM    a
WHERE   col IN
        (
        SELECT  col
        FROM    b
        )

, unless b.col is unique.

However, this is the synonym for the first query:

SELECT  a.*
FROM    a
JOIN    (
        SELECT  DISTINCT col
        FROM    b
        )
ON      b.col = a.col

If the joining column is UNIQUE and marked as such, both these queries yield the same plan in SQL Server.

If it's not, then IN is faster than JOIN on DISTINCT.

See this article in my blog for performance details:

  • IN vs. JOIN vs. EXISTS
Tuesday, June 1, 2021
 
TMichel
answered 7 Months ago
70

There will be no difference as you can test yourself by inspecting the execution plans. If id is the clustered index, you should see an ordered clustered index scan; if it is not indexed, you'll still see either a table scan or a clustered index scan, but it won't be ordered in either case.

The TOP 1 approach can be useful if you want to pull along other values from the row, which is easier than pulling the max in a subquery and then joining. If you want other values from the row, you need to dictate how to deal with ties in both cases.

Having said that, there are some scenarios where the plan can be different, so it is important to test depending on whether the column is indexed and whether or not it is monotonically increasing. I created a simple table and inserted 50000 rows:

CREATE TABLE dbo.x
(
  a INT, b INT, c INT, d INT, 
  e DATETIME, f DATETIME, g DATETIME, h DATETIME
);
CREATE UNIQUE CLUSTERED INDEX a ON dbo.x(a);
CREATE INDEX b ON dbo.x(b)
CREATE INDEX e ON dbo.x(e);
CREATE INDEX f ON dbo.x(f);

INSERT dbo.x(a, b, c, d, e, f, g, h)
SELECT 
  n.rn, -- ints monotonically increasing
  n.a,  -- ints in random order
  n.rn, 
  n.a, 
  DATEADD(DAY, n.rn/100, '20100101'), -- dates monotonically increasing
  DATEADD(DAY, -n.a % 1000, '20120101'),     -- dates in random order
  DATEADD(DAY, n.rn/100, '20100101'),
  DATEADD(DAY, -n.a % 1000, '20120101')
FROM
(
  SELECT TOP (50000) 
     (ABS(s1.[object_id]) % 10000) + 1, 
     rn = ROW_NUMBER() OVER (ORDER BY s2.[object_id])
  FROM sys.all_objects AS s1 
  CROSS JOIN sys.all_objects AS s2
) AS n(a,rn);
GO

On my system this created values in a/c from 1 to 50000, b/d between 3 and 9994, e/g from 2010-01-01 through 2011-05-16, and f/h from 2009-04-28 through 2012-01-01.

First, let's compare the indexed monotonically increasing integer columns, a and c. a has a clustered index, c does not:

SELECT MAX(a) FROM dbo.x;
SELECT TOP (1) a FROM dbo.x ORDER BY a DESC;

SELECT MAX(c) FROM dbo.x;
SELECT TOP (1) c FROM dbo.x ORDER BY c DESC;

Results:

enter image description here

The big problem with the 4th query is that, unlike MAX, it requires a sort. Here is 3 compared to 4:

enter image description here

enter image description here

This will be a common problem across all of these query variations: a MAX against an unindexed column will be able to piggy-back on the clustered index scan and perform a stream aggregate, while TOP 1 needs to perform a sort which is going to be more expensive.

I did test and saw the exact same results across testing b+d, e+g, and f+h.

So it seems to me that, in addition to producing more standards-compliance code, there is a potential performance benefit to using MAX in favor of TOP 1 depending on the underlying table and indexes (which can change after you've put your code in production). So I would say that, without further information, MAX is preferable.

(And as I said before, TOP 1 might really be the behavior you're after, if you're pulling additional columns. You'll want to test MAX + JOIN methods as well if that's what you're after.)

Friday, July 30, 2021
 
francadaval
answered 4 Months ago
56

"NavigationWindow does not store an instance of a content object in navigation history. Instead, NavigationWindow creates a new instance of the content object each time it is navigated to by using navigation history. This behavior is designed to avoid excessive memory consumption when large numbers and large pieces of content are being navigated to. Consequently, the state of the content is not remembered from one navigation to the next. However, WPF provides several techniques by which you can store a piece of state for a piece of content in navigation history...."

http://msdn.microsoft.com/en-us/library/system.windows.navigation.navigationwindow.aspx

Monday, August 16, 2021
 
Tak
answered 4 Months ago
Tak
Only authorized users can answer the question. Please sign in first, or register a free account.
Not the answer you're looking for? Browse other questions tagged :
 
Share