CROSS APPLY produces outer join

In an answer to SQL counting distinct over partition, Erik Darling posted this code to work around the lack of COUNT(DISTINCT) OVER ():

SELECT *
FROM #MyTable AS mt
CROSS APPLY ( SELECT COUNT(DISTINCT mt2.Col_B) AS dc
 FROM #MyTable AS mt2
 WHERE mt2.Col_A = mt.Col_A
 -- GROUP BY mt2.Col_A 
 ) AS ca;

The query uses CROSS APPLY (not OUTER APPLY), so why is there an outer join in the execution plan instead of an inner join?

[Execution plan image: outer join]

Also, why does uncommenting the GROUP BY clause result in an inner join?

[Execution plan image: inner join]

I don't think the data is important, but here it is, copied from the data kevinwhat gave on the other question:

create table #MyTable (
Col_A varchar(5),
Col_B int
)

insert into #MyTable values ('A',1)
insert into #MyTable values ('A',1)
insert into #MyTable values ('A',2)
insert into #MyTable values ('A',2)
insert into #MyTable values ('A',2)
insert into #MyTable values ('A',3)

insert into #MyTable values ('B',4)
insert into #MyTable values ('B',4)
insert into #MyTable values ('B',5)

asked 8 hours ago by user182461 (edited 8 hours ago)


sql-server execution-plan cross-apply


2 Answers

Summary

When performing internal translations between apply and join, SQL Server uses the correct join type (inner or outer) and adds projections where necessary to honour all the semantics of the original query. The differences between the plans can all be explained by the different semantics of aggregates with and without a GROUP BY clause in SQL Server.

Details

SQL Server can produce an inner join plan for the example query; it just chooses not to for cost reasons. The cost of the outer join plan shown in the question is 0.02898 units on my laptop's SQL Server 2017 instance.

Purely for illustration, you can force an apply (correlated join) plan using the undocumented and unsupported trace flag 9114. In the real world, we would typically have an index supporting a seek on the inner side of the apply, which encourages SQL Server to choose this option:

SELECT *
FROM #MyTable AS mt
CROSS APPLY 
(
 SELECT COUNT_BIG(DISTINCT mt2.Col_B) AS dc
 FROM #MyTable AS mt2
 WHERE mt2.Col_A = mt.Col_A 
 --GROUP BY mt2.Col_A
) AS ca
OPTION (QUERYTRACEON 9114);

This produces an apply nested loops plan with a lazy index spool. The total estimated cost is 0.0463983, which is higher than the selected plan:

[Execution plan image: Index Spool apply plan]

The performance spool caches inner side results for repeated outer values. It can also be removed with a hint:

SELECT *
FROM #MyTable AS mt
CROSS APPLY 
(
 SELECT COUNT_BIG(DISTINCT mt2.Col_B) AS dc
 FROM #MyTable AS mt2
 WHERE mt2.Col_A = mt.Col_A 
 --GROUP BY mt2.Col_A
) AS ca
OPTION (QUERYTRACEON 9114, NO_PERFORMANCE_SPOOL);

The resulting plan fully scans the inner side table for each outer row, costing 0.109779 units:

[Execution plan image: apply plan without spool]

Note the join predicate is not evaluated at the join for an apply plan. This is the essential difference between an apply (correlated join parameter(s) evaluated on the inner branch) and a regular join (predicate(s) evaluated at the join operator).
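A sketch of the kind of index mentioned earlier, as a self-contained script. The table name #MyTableIndexed and the index name IX_Col_A_Col_B are hypothetical, and the rows are copied from the question. Whether the optimizer then picks the apply plan without the trace flag still depends on costing; with only nine rows it may well keep the join plan.

```sql
-- Hypothetical table and index names; same data as the question
DROP TABLE IF EXISTS #MyTableIndexed;
CREATE TABLE #MyTableIndexed (Col_A varchar(5), Col_B int);

INSERT #MyTableIndexed (Col_A, Col_B)
VALUES ('A', 1), ('A', 1), ('A', 2), ('A', 2), ('A', 2), ('A', 3),
       ('B', 4), ('B', 4), ('B', 5);

-- Leading key Col_A can support a seek on the correlated predicate;
-- Col_B as the second key makes the index covering and keeps
-- Col_B values ordered within each Col_A value
CREATE INDEX IX_Col_A_Col_B ON #MyTableIndexed (Col_A, Col_B);

-- No trace flag: the optimizer is free to choose apply or join on cost
SELECT *
FROM #MyTableIndexed AS mt
CROSS APPLY
(
    SELECT COUNT_BIG(DISTINCT mt2.Col_B) AS dc
    FROM #MyTableIndexed AS mt2
    WHERE mt2.Col_A = mt.Col_A
) AS ca;
```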

Outer Join

The outer join in the question arises from an optimizer transformation (ApplyHandler) from apply to join. SQL Server tries to rewrite applies as joins up front because it knows more plan space exploration tricks for joins than for applies. Once the apply has been transformed (if possible), the optimizer may consider rewriting it back to an apply later on, during cost-based optimization.

Scalar and Vector Aggregates

Without a GROUP BY clause the COUNT is a scalar aggregate. In SQL Server, this means the aggregate will always produce a row, even if it is given no rows to aggregate. The SQL Server scalar COUNT aggregate of no rows is zero.

When a GROUP BY clause is present, the COUNT aggregate is a vector aggregate. This produces no rows at all (not zero) when presented with an empty set.

The following toy queries show the difference. You can also read more about scalar and vector aggregates in my article Fun with Scalar and Vector Aggregates.

-- Produces zero
SELECT COUNT_BIG(*) FROM #MyTable AS MT WHERE 0 = 1;

-- Produces no rows
SELECT COUNT_BIG(*) FROM #MyTable AS MT WHERE 0 = 1 GROUP BY ();

db<>fiddle demo
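Other aggregates follow the same scalar and vector rules, but COUNT is the odd one out: scalar MAX and SUM over no rows return NULL rather than zero. A minimal sketch, using a hypothetical table variable @T:

```sql
DECLARE @T table (Col_A varchar(5) NULL, Col_B integer NULL);
INSERT @T (Col_A, Col_B) VALUES ('A', 1);

-- Scalar aggregates over an empty input: exactly one row,
-- with cnt = 0, mx = NULL, sm = NULL
SELECT cnt = COUNT_BIG(*), mx = MAX(T.Col_B), sm = SUM(T.Col_B)
FROM @T AS T
WHERE 0 = 1;

-- Vector aggregates over an empty input: no rows at all
SELECT T.Col_A, cnt = COUNT_BIG(*), mx = MAX(T.Col_B)
FROM @T AS T
WHERE 0 = 1
GROUP BY T.Col_A;
```

This hints at why zero is the awkward case for a join rewrite: an unmatched row in a left join already yields NULL, which matches what scalar MAX or SUM would return, but not what COUNT returns.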

Transforming apply to join

SQL Server is very careful to preserve all the semantics when translating from apply to join. To simplify, consider the following APPLY:

DECLARE @A table (A integer NULL, B integer NULL);
DECLARE @B table (A integer NULL, B integer NULL);

INSERT @A (A, B) VALUES (1, 1);
INSERT @B (A, B) VALUES (2, 2);

SELECT * FROM @A AS A
CROSS APPLY (SELECT c = COUNT_BIG(*) FROM @B AS B WHERE B.A = A.A) AS CA;

The correct result for column c there is zero, because the COUNT_BIG is a scalar aggregate. When translating this apply query to join form, SQL Server generates an alternative that would look similar to the following if expressed in T-SQL:

SELECT A.*, c = COALESCE(J1.c, 0)
FROM @A AS A
LEFT JOIN
(
 SELECT B.A, c = COUNT_BIG(*) 
 FROM @B AS B
 GROUP BY B.A
) AS J1
 ON J1.A = A.A;

db<>fiddle demo

To make this an uncorrelated join, SQL Server has to introduce a GROUP BY in the derived table (otherwise there could be no A column to join on). To preserve the semantics of the original, the join has to be an outer join so each row from table @A continues to produce a row in the output. The left join will produce a NULL for column c that needs to be further translated to zero by COALESCE.
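The same translation applied to the question's query looks roughly like the following hand-written T-SQL. It illustrates the semantics only and is not the optimizer's literal internal tree. The table variable @T and the extra row with a NULL Col_A are hypothetical additions: that row finds no match in the join, so it is a case where the outer join and COALESCE matter even for a self-join.

```sql
-- Same data as the question, plus one hypothetical row with a NULL Col_A
DECLARE @T table (Col_A varchar(5) NULL, Col_B integer NULL);
INSERT @T (Col_A, Col_B)
VALUES ('A', 1), ('A', 1), ('A', 2), ('A', 2), ('A', 2), ('A', 3),
       ('B', 4), ('B', 4), ('B', 5),
       (NULL, 6);

-- Apply form: the scalar aggregate returns one row per outer row
SELECT mt.Col_A, mt.Col_B, ca.dc
FROM @T AS mt
CROSS APPLY
(
    SELECT dc = COUNT_BIG(DISTINCT mt2.Col_B)
    FROM @T AS mt2
    WHERE mt2.Col_A = mt.Col_A
) AS ca;

-- Join form: aggregate per Col_A, outer join, then turn NULL into zero
SELECT mt.Col_A, mt.Col_B, dc = COALESCE(J1.dc, 0)
FROM @T AS mt
LEFT JOIN
(
    SELECT mt2.Col_A, dc = COUNT_BIG(DISTINCT mt2.Col_B)
    FROM @T AS mt2
    GROUP BY mt2.Col_A
) AS J1
    ON J1.Col_A = mt.Col_A;
```

Both queries should return dc = 3 for the six A rows, 2 for the three B rows, and 0 for the NULL row. With an inner join the NULL row would disappear, and without COALESCE it would show NULL instead of 0.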

With the `GROUP BY`

Continuing the simplified example, but adding a GROUP BY:

DECLARE @A table (A integer NULL, B integer NULL);
DECLARE @B table (A integer NULL, B integer NULL);

INSERT @A (A, B) VALUES (1, 1);
INSERT @B (A, B) VALUES (2, 2);

-- Original
SELECT * FROM @A AS A
CROSS APPLY 
(SELECT c = COUNT_BIG(*) FROM @B AS B WHERE B.A = A.A GROUP BY B.A) AS CA;

The COUNT_BIG is now a vector aggregate, so the correct result for an empty input set is no longer zero; it is no row at all. In other words, running the statements above produces no output.

These semantics are much easier to honour when translating from apply to join. Since CROSS APPLY rejects any outer row that generates no inner side rows, we can use an inner join with no extra expression projection (and SQL Server does the same):

-- Rewrite
SELECT A.*, J1.c 
FROM @A AS A
JOIN
(
 SELECT B.A, c = COUNT_BIG(*) 
 FROM @B AS B
 GROUP BY B.A
) AS J1
 ON J1.A = A.A;

db<>fiddle demo
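For contrast, a sketch of OUTER APPLY with the same vector aggregate, reusing the example tables. OUTER APPLY keeps the @A row and returns NULL for c (not zero), which a plain LEFT JOIN without COALESCE reproduces. This shows the logical equivalence only; it makes no claim about the plan SQL Server actually chooses.

```sql
DECLARE @A table (A integer NULL, B integer NULL);
DECLARE @B table (A integer NULL, B integer NULL);

INSERT @A (A, B) VALUES (1, 1);
INSERT @B (A, B) VALUES (2, 2);

-- OUTER APPLY with a vector aggregate:
-- the @A row is preserved and c is NULL, not zero
SELECT * FROM @A AS A
OUTER APPLY
(SELECT c = COUNT_BIG(*) FROM @B AS B WHERE B.A = A.A GROUP BY B.A) AS CA;

-- Equivalent hand-written join form: LEFT JOIN with no COALESCE
SELECT A.*, J1.c
FROM @A AS A
LEFT JOIN
(
    SELECT B.A, c = COUNT_BIG(*)
    FROM @B AS B
    GROUP BY B.A
) AS J1
    ON J1.A = A.A;
```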

Final note

The simplified examples use different tables with different contents to show the semantic differences more clearly. One could argue that the optimizer ought to be able to reason about a self-join in particular not generating any mismatched rows, but it does not contain that logic today. Accessing the same table multiple times in a query is not guaranteed to produce the same results in general anyway, depending on isolation level and concurrent activity.

The optimizer worries about these edge cases so you don't have to.

answered 2 hours ago by Paul White♦ (edited 2 hours ago)

CROSS APPLY is a logical operation on the data. When deciding how to retrieve that data, SQL Server chooses an appropriate physical operator to produce the result you want.

There is no physical operator called Apply; SQL Server implements it with an appropriate (and hopefully efficient) join operator.

You can find a list of the physical operators in the link below.

https://docs.microsoft.com/en-us/sql/relational-databases/showplan-logical-and-physical-operators-reference?view=sql-server-2017

The query optimizer creates a query plan as a tree consisting of logical operators. After the query optimizer creates the plan, the query optimizer chooses the most efficient physical operator for each logical operator. The query optimizer uses a cost-based approach to determine which physical operator will implement a logical operator.

Usually, a logical operation can be implemented by multiple physical operators. However, in rare cases, a physical operator can implement multiple logical operations as well.
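One way to see which physical operators were chosen for the question's query is SET SHOWPLAN_TEXT, sketched below. GO is a batch separator understood by SSMS and sqlcmd, not a T-SQL statement, and DROP TABLE IF EXISTS needs SQL Server 2016 or later. The script recreates the question's #MyTable, so run it in a scratch session.

```sql
-- Recreate the question's sample table
DROP TABLE IF EXISTS #MyTable;
CREATE TABLE #MyTable (Col_A varchar(5), Col_B int);

INSERT #MyTable (Col_A, Col_B)
VALUES ('A', 1), ('A', 1), ('A', 2), ('A', 2), ('A', 2), ('A', 3),
       ('B', 4), ('B', 4), ('B', 5);
GO

-- SET SHOWPLAN_TEXT must be the only statement in its batch
SET SHOWPLAN_TEXT ON;
GO

-- Not executed while SHOWPLAN_TEXT is ON: SQL Server returns the
-- estimated plan as rows of text, naming the physical operators
SELECT *
FROM #MyTable AS mt
CROSS APPLY
(
    SELECT COUNT(DISTINCT mt2.Col_B) AS dc
    FROM #MyTable AS mt2
    WHERE mt2.Col_A = mt.Col_A
) AS ca;
GO

SET SHOWPLAN_TEXT OFF;
GO
```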

Edit: It seems I misunderstood your question. SQL Server will normally choose the most appropriate operator. Your query doesn't need to return values for all combinations of rows from both tables (which is when a cross join would be used). It only needs to calculate the value you want for each row, and that is what happens here.

answered 8 hours ago by J. Maes