CROSS APPLY produces outer join
In answer to SQL counting distinct over partition, Erik Darling posted this code as a workaround for the lack of COUNT(DISTINCT) OVER ():
SELECT *
FROM #MyTable AS mt
CROSS APPLY ( SELECT COUNT(DISTINCT mt2.Col_B) AS dc
FROM #MyTable AS mt2
WHERE mt2.Col_A = mt.Col_A
-- GROUP BY mt2.Col_A
) AS ca;
The query uses CROSS APPLY (not OUTER APPLY), so why is there an outer join in the execution plan instead of an inner join?

Also, why does uncommenting the GROUP BY clause result in an inner join?

I don't think the data is important, but here is the data given by kevinwhat on the other question:
create table #MyTable (
Col_A varchar(5),
Col_B int
)
insert into #MyTable values ('A',1)
insert into #MyTable values ('A',1)
insert into #MyTable values ('A',2)
insert into #MyTable values ('A',2)
insert into #MyTable values ('A',2)
insert into #MyTable values ('A',3)
insert into #MyTable values ('B',4)
insert into #MyTable values ('B',4)
insert into #MyTable values ('B',5)
sql-server execution-plan cross-apply
edited 8 hours ago
asked 8 hours ago by user182461
2 Answers
Summary
SQL Server uses the correct join (inner or outer) and adds projections where necessary to honour all the semantics of the original query when performing internal translations between apply and join. The differences in the plans can all be explained by the different semantics of aggregates with and without a GROUP BY clause in SQL Server.
Details
SQL Server can produce an inner join plan for the example query; it just chooses not to for cost reasons. The cost of the outer join plan shown in the question is 0.02898 units on my laptop's SQL Server 2017 instance.
You can force an apply (correlated join) plan using undocumented and unsupported trace flag 9114 just for illustration. In the real world, we would typically have an index to support a seek on the inner side of the apply to encourage SQL Server to choose this option:
SELECT *
FROM #MyTable AS mt
CROSS APPLY
(
SELECT COUNT_BIG(DISTINCT mt2.Col_B) AS dc
FROM #MyTable AS mt2
WHERE mt2.Col_A = mt.Col_A
--GROUP BY mt2.Col_A
) AS ca
OPTION (QUERYTRACEON 9114);
This produces an apply nested loops plan with a lazy index spool. The total estimated cost is 0.0463983, which is higher than that of the selected plan.

The performance spool caches inner side results for repeated outer values. It can also be removed with a hint:
SELECT *
FROM #MyTable AS mt
CROSS APPLY
(
SELECT COUNT_BIG(DISTINCT mt2.Col_B) AS dc
FROM #MyTable AS mt2
WHERE mt2.Col_A = mt.Col_A
--GROUP BY mt2.Col_A
) AS ca
OPTION (QUERYTRACEON 9114, NO_PERFORMANCE_SPOOL);
The resulting plan fully scans the inner side table for each outer row, costing 0.109779 units.

Note the join predicate is not evaluated at the join for an apply plan. This is the essential difference between an apply (correlated join parameter(s) evaluated on the inner branch) and a regular join (predicate(s) evaluated at the join operator).
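To illustrate the supporting index mentioned above, here is a minimal sketch for the question's table. It recreates #MyTable so the script runs on its own (DROP TABLE IF EXISTS needs SQL Server 2016 or later), and the index name IX_MyTable_ColA_ColB is made up. Keying on Col_A then Col_B should let the inner side seek on the correlated column and read Col_B in order, but with only nine rows the optimizer may still cost the join plan as cheaper, so check the actual plan rather than assuming an apply will appear.

```sql
-- Recreate the question's sample data so this batch is self-contained.
DROP TABLE IF EXISTS #MyTable;
CREATE TABLE #MyTable (Col_A varchar(5) NULL, Col_B int NULL);
INSERT #MyTable (Col_A, Col_B)
VALUES ('A', 1), ('A', 1), ('A', 2), ('A', 2), ('A', 2), ('A', 3),
       ('B', 4), ('B', 4), ('B', 5);

-- Hypothetical supporting index: seek on the correlated column (Col_A),
-- with Col_B as the second key so distinct values can be read in order.
CREATE INDEX IX_MyTable_ColA_ColB ON #MyTable (Col_A, Col_B);

-- Same query as before, now with an index the inner side could seek into.
SELECT *
FROM #MyTable AS mt
CROSS APPLY
(
    SELECT COUNT_BIG(DISTINCT mt2.Col_B) AS dc
    FROM #MyTable AS mt2
    WHERE mt2.Col_A = mt.Col_A
) AS ca;
```

Whether the optimizer picks the apply shape here is a cost decision; the index only makes it a more attractive option.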
Outer Join
The outer join in the question arises from an optimizer transformation (ApplyHandler) from apply to join. SQL Server tries to rewrite applies as joins up front because it knows more plan space exploration tricks with joins than with apply. Once transformed (if possible), the optimizer may consider rewriting it back to an apply later on during cost-based optimization.
Scalar and Vector Aggregates
Without a GROUP BY clause the COUNT is a scalar aggregate. In SQL Server, this means the aggregate will always produce a row, even if it is given no rows to aggregate. The SQL Server scalar COUNT aggregate of no rows is zero.
When a GROUP BY clause is present, the COUNT aggregate is a vector aggregate. This produces no rows at all (not zero) when presented with an empty set.
The following toy queries show the difference. You can also read more about scalar and vector aggregates in my article Fun with Scalar and Vector Aggregates.
-- Produces zero
SELECT COUNT_BIG(*) FROM #MyTable AS MT WHERE 0 = 1;
-- Produces no rows
SELECT COUNT_BIG(*) FROM #MyTable AS MT WHERE 0 = 1 GROUP BY ();
db<>fiddle demo
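The same distinction applies to the COUNT(DISTINCT ...) in the question. The sketch below copies the question's sample data into a table variable (@MyTable, a name chosen here for illustration) and filters on a Col_A value ('Z') that does not exist, which mimics the inner side of the apply for an outer row with no matching inner rows.

```sql
DECLARE @MyTable table (Col_A varchar(5) NULL, Col_B int NULL);
INSERT @MyTable (Col_A, Col_B)
VALUES ('A', 1), ('A', 1), ('A', 2), ('A', 2), ('A', 2), ('A', 3),
       ('B', 4), ('B', 4), ('B', 5);

-- Scalar aggregate: no 'Z' rows exist, yet one row comes back with dc = 0.
SELECT COUNT_BIG(DISTINCT Col_B) AS dc
FROM @MyTable
WHERE Col_A = 'Z';

-- Vector aggregate: the same filter with GROUP BY returns no rows at all.
SELECT Col_A, COUNT_BIG(DISTINCT Col_B) AS dc
FROM @MyTable
WHERE Col_A = 'Z'
GROUP BY Col_A;
```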
Transforming apply to join
SQL Server is very careful to preserve all the semantics when translating from apply to join. To simplify, consider the following APPLY:
DECLARE @A table (A integer NULL, B integer NULL);
DECLARE @B table (A integer NULL, B integer NULL);
INSERT @A (A, B) VALUES (1, 1);
INSERT @B (A, B) VALUES (2, 2);
SELECT * FROM @A AS A
CROSS APPLY (SELECT c = COUNT_BIG(*) FROM @B AS B WHERE B.A = A.A) AS CA;
The correct result for column c there is zero, because the COUNT_BIG is a scalar aggregate. When translating this apply query to join form, SQL Server generates an alternative that would look similar to the following if expressed in T-SQL:
SELECT A.*, c = COALESCE(J1.c, 0)
FROM @A AS A
LEFT JOIN
(
SELECT B.A, c = COUNT_BIG(*)
FROM @B AS B
GROUP BY B.A
) AS J1
ON J1.A = A.A;
db<>fiddle demo
To make this an uncorrelated join, SQL Server has to introduce a GROUP BY in the derived table (otherwise there could be no A column to join on). To preserve the semantics of the original, the join has to be an outer join so each row from table @A continues to produce a row in the output. The left join will produce a NULL for column c that needs to be further translated to zero by COALESCE.
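Applying the same translation by hand to the question's query gives roughly the following. This is only a T-SQL approximation of the internal rewrite, not text SQL Server actually generates, and it uses a table variable copy of the sample data (@MyTable, an illustrative name) so the batch is self-contained.

```sql
DECLARE @MyTable table (Col_A varchar(5) NULL, Col_B int NULL);
INSERT @MyTable (Col_A, Col_B)
VALUES ('A', 1), ('A', 1), ('A', 2), ('A', 2), ('A', 2), ('A', 3),
       ('B', 4), ('B', 4), ('B', 5);

-- Join form of the original CROSS APPLY (no GROUP BY in the apply):
--   * GROUP BY added to the derived table so there is a Col_A to join on
--   * LEFT JOIN so every outer row still produces an output row
--   * COALESCE turns the outer join's NULL back into the scalar zero
SELECT mt.Col_A, mt.Col_B, dc = COALESCE(J1.dc, 0)
FROM @MyTable AS mt
LEFT JOIN
(
    SELECT mt2.Col_A, dc = COUNT_BIG(DISTINCT mt2.Col_B)
    FROM @MyTable AS mt2
    GROUP BY mt2.Col_A
) AS J1
    ON J1.Col_A = mt.Col_A;
```

On the sample data this returns the same nine rows as the original CROSS APPLY, with dc = 3 for every 'A' row and dc = 2 for every 'B' row. Every outer row finds a matching group here, so the COALESCE never actually fires; it is there to preserve the semantics for rows that would not match.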
With the GROUP BY
Continuing the simplified example, but adding a GROUP BY:
DECLARE @A table (A integer NULL, B integer NULL);
DECLARE @B table (A integer NULL, B integer NULL);
INSERT @A (A, B) VALUES (1, 1);
INSERT @B (A, B) VALUES (2, 2);
-- Original
SELECT * FROM @A AS A
CROSS APPLY
(SELECT c = COUNT_BIG(*) FROM @B AS B WHERE B.A = A.A GROUP BY B.A) AS CA;
The COUNT_BIG is now a vector aggregate, so the correct result for an empty input set is no longer zero; it is no row at all. In other words, running the statements above produces no output.
These semantics are much easier to honour when translating from apply to join. Since CROSS APPLY rejects any outer row that generates no inner side rows, we can use an inner join with no extra expression projection (and SQL Server does the same):
-- Rewrite
SELECT A.*, J1.c
FROM @A AS A
JOIN
(
SELECT B.A, c = COUNT_BIG(*)
FROM @B AS B
GROUP BY B.A
) AS J1
ON J1.A = A.A;
db<>fiddle demo
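The GROUP BY version of the question's query can be sketched the same way. One illustrative row, (NULL, 6), is added to the question's data purely to create an outer row with no matching inner rows, since a NULL Col_A never satisfies the equality predicate. The table variable name @MyTable is again chosen only for illustration.

```sql
DECLARE @MyTable table (Col_A varchar(5) NULL, Col_B int NULL);
INSERT @MyTable (Col_A, Col_B)
VALUES ('A', 1), ('A', 1), ('A', 2), ('A', 2), ('A', 2), ('A', 3),
       ('B', 4), ('B', 4), ('B', 5),
       (NULL, 6);  -- illustrative extra row, not part of the question's data

-- Original with GROUP BY uncommented: the inner COUNT is a vector aggregate,
-- so an outer row with no matching inner rows gets no inner row at all.
SELECT mt.Col_A, mt.Col_B, ca.dc
FROM @MyTable AS mt
CROSS APPLY
(
    SELECT COUNT_BIG(DISTINCT mt2.Col_B) AS dc
    FROM @MyTable AS mt2
    WHERE mt2.Col_A = mt.Col_A
    GROUP BY mt2.Col_A
) AS ca;

-- Inner join form: no COALESCE needed, because an unmatched outer row
-- produces no output row in either form.
SELECT mt.Col_A, mt.Col_B, J1.dc
FROM @MyTable AS mt
JOIN
(
    SELECT mt2.Col_A, dc = COUNT_BIG(DISTINCT mt2.Col_B)
    FROM @MyTable AS mt2
    GROUP BY mt2.Col_A
) AS J1
    ON J1.Col_A = mt.Col_A;
```

Both statements return nine rows, and the NULL row is dropped by both. Running the query without GROUP BY on the same data would return ten rows, with dc = 0 for the NULL row, which is exactly the case the outer join and COALESCE exist to handle.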
Final note
The simplified examples use different tables with different contents to show the semantic differences more clearly. One could argue that the optimizer ought to be able to reason that a self-join in particular cannot generate any mismatched rows, but it does not contain that logic today. In any case, accessing the same table multiple times in a query is not guaranteed to produce the same results in general, depending on isolation level and concurrent activity.
The optimizer worries about these edge cases so you don't have to.
CROSS APPLY is a logical operation on the data. When deciding how to retrieve that data, SQL Server chooses the appropriate physical operator to produce the result you want.
There is no physical apply operator, and SQL Server translates it into the appropriate (and hopefully efficient) join operator.
You can find a list of the physical operators at the link below.
https://docs.microsoft.com/en-us/sql/relational-databases/showplan-logical-and-physical-operators-reference?view=sql-server-2017
The query optimizer creates a query plan as a tree consisting of logical operators. After the query optimizer creates the plan, the query optimizer chooses the most efficient physical operator for each logical operator. The query optimizer uses a cost-based approach to determine which physical operator will implement a logical operator.
Usually, a logical operation can be implemented by multiple physical operators. However, in rare cases, a physical operator can implement multiple logical operations as well.
Edit: It seems I misunderstood your question. SQL Server will normally choose the most appropriate operator. Your query doesn't need to return values for all combinations of both tables, which is when a cross join would be used. Calculating the value you want for each row suffices, and that is what is done here.
add a comment |
Your Answer
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "182"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
user182461 is a new contributor. Be nice, and check out our Code of Conduct.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f239865%2fcross-apply-produces-outer-join%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
Summary
SQL Server uses the correct join (inner or outer) and adds projections where necessary to honour all the semantics of the original query when performing internal translations between apply and join. The differences in the plans can all be explained by the different semantics of aggregates with and without a group by clause in SQL Server.
Details
SQL Server can produce an inner join plan for the example query, it just chooses not to for cost reasons. The cost of the outer join plan shown in the question is 0.02898 units on my laptop's SQL Server 2017 instance.
You can force an apply (correlated join) plan using undocumented and unsupported trace flag 9114 just for illustration. In the real world, we would typically have an index to support a seek on the inner side of the apply to encourage SQL Server to choose this option:
SELECT *
FROM #MyTable AS mt
CROSS APPLY
(
SELECT COUNT_BIG(DISTINCT mt2.Col_B) AS dc
FROM #MyTable AS mt2
WHERE mt2.Col_A = mt.Col_A
--GROUP BY mt2.Col_A
) AS ca
OPTION (QUERYTRACEON 9114);
This produces an apply nested loops plan with a lazy index spool. The total estimated cost is 0.0463983, which is higher than the selected plan:

The performance spool caches inner side results for repeated outer values. It can also be removed with a hint:
SELECT *
FROM #MyTable AS mt
CROSS APPLY
(
SELECT COUNT_BIG(DISTINCT mt2.Col_B) AS dc
FROM #MyTable AS mt2
WHERE mt2.Col_A = mt.Col_A
--GROUP BY mt2.Col_A
) AS ca
OPTION (QUERYTRACEON 9114, NO_PERFORMANCE_SPOOL);
The resulting plan fully scans the inner side table for each outer row, costing 0.109779 units:

Note the join predicate is not evaluated at the join for an apply plan. This is the essential difference between an apply (correlated join parameter(s) evaluated on the inner branch) and a regular join (predicate(s) evaluated at the join operator).
Outer Join
The outer join in the question arises from an optimizer transformation (ApplyHandler) from apply to join. SQL Server tries to rewrite applies as joins up front because it knows more plan space exploration tricks with joins than apply. Once transformed (if possible) the optimizer may consider rewriting it back to an apply later on during cost-based optimization.
Scalar and Vector Aggregates
Without a GROUP BY clause the COUNT is a scalar aggregate. In SQL Server, this means the aggregate will always produce a row, even if it is given no rows to aggregate. The SQL Server scalar COUNT aggregate of no rows is zero.
When a GROUP BY clause is present, the COUNT aggregate is a vector aggregate. This produces no rows at all (not zero) when presented with an empty set.
The following toy queries show the difference. You can also read more about scalar and vector aggregates in my article Fun with Scalar and Vector Aggregates.
-- Produces zero
SELECT COUNT_BIG(*) FROM #MyTable AS MT WHERE 0 = 1;
-- Produces no rows
SELECT COUNT_BIG(*) FROM #MyTable AS MT WHERE 0 = 1 GROUP BY ();
db<>fiddle demo
Transforming apply to join
SQL Server is very careful to preserve all the semantics when translating from apply to join. To simplify, consider the following APPLY:
DECLARE @A table (A integer NULL, B integer NULL);
DECLARE @B table (A integer NULL, B integer NULL);
INSERT @A (A, B) VALUES (1, 1);
INSERT @B (A, B) VALUES (2, 2);
SELECT * FROM @A AS A
CROSS APPLY (SELECT c = COUNT_BIG(*) FROM @B AS B WHERE B.A = A.A) AS CA;
The correct result for column c there is zero, because the COUNT_BIG is a scalar aggregate. When translating this apply query to join form, SQL Server generates an alternative that would look similar to the following if expressed in T-SQL:
SELECT A.*, c = COALESCE(J1.c, 0)
FROM @A AS A
LEFT JOIN
(
SELECT B.A, c = COUNT_BIG(*)
FROM @B AS B
GROUP BY B.A
) AS J1
ON J1.A = A.A;
db<>fiddle demo
To make this an uncorrelated join, SQL Server has to introduce a GROUP BY in the derived table (otherwise there could be no A column to join on). To preserve the semantics of the original, the join has to be an outer join so each row from table @A continues to produce a row in the output. The left join will produce a NULL for column c that needs to be further translated to zero by COALESCE.
With the GROUP BY
Continuing the simplified example, but adding a GROUP BY:
DECLARE @A table (A integer NULL, B integer NULL);
DECLARE @B table (A integer NULL, B integer NULL);
INSERT @A (A, B) VALUES (1, 1);
INSERT @B (A, B) VALUES (2, 2);
-- Original
SELECT * FROM @A AS A
CROSS APPLY
(SELECT c = COUNT_BIG(*) FROM @B AS B WHERE B.A = A.A GROUP BY B.A) AS CA;
The COUNT_BIG is now a vector aggregate, so the correct result for an empty input set is no longer zero, it is no row at all. In other words, running the statements above produces no output.
These semantics are much easier to honour when translating from apply to join. Since CROSS APPLY rejects any outer row that generates no inner side rows, we can use an inner join with no extra expression projection (and SQL Server does the same):
-- Rewrite
SELECT A.*, J1.c
FROM @A AS A
JOIN
(
SELECT B.A, c = COUNT_BIG(*)
FROM @B AS B
GROUP BY B.A
) AS J1
ON J1.A = A.A;
db<>fiddle demo
Final note
The simplified examples use different tables with different contents to show the semantic differences more clearly. One could argue that the optimizer ought to be able to reason about a self-join in particular not generating any mismatched rows, but it does not contain that logic today. Accessing the same table multiple times in a query is not guaranteed to produce the same results in general anyway, depending on isolation level and concurrent activity.
The optimizer worries about these edge cases so you don't have to.
add a comment |
Summary
SQL Server uses the correct join (inner or outer) and adds projections where necessary to honour all the semantics of the original query when performing internal translations between apply and join. The differences in the plans can all be explained by the different semantics of aggregates with and without a group by clause in SQL Server.
Details
SQL Server can produce an inner join plan for the example query, it just chooses not to for cost reasons. The cost of the outer join plan shown in the question is 0.02898 units on my laptop's SQL Server 2017 instance.
You can force an apply (correlated join) plan using undocumented and unsupported trace flag 9114 just for illustration. In the real world, we would typically have an index to support a seek on the inner side of the apply to encourage SQL Server to choose this option:
SELECT *
FROM #MyTable AS mt
CROSS APPLY
(
SELECT COUNT_BIG(DISTINCT mt2.Col_B) AS dc
FROM #MyTable AS mt2
WHERE mt2.Col_A = mt.Col_A
--GROUP BY mt2.Col_A
) AS ca
OPTION (QUERYTRACEON 9114);
This produces an apply nested loops plan with a lazy index spool. The total estimated cost is 0.0463983, which is higher than the selected plan:

The performance spool caches inner side results for repeated outer values. It can also be removed with a hint:
SELECT *
FROM #MyTable AS mt
CROSS APPLY
(
SELECT COUNT_BIG(DISTINCT mt2.Col_B) AS dc
FROM #MyTable AS mt2
WHERE mt2.Col_A = mt.Col_A
--GROUP BY mt2.Col_A
) AS ca
OPTION (QUERYTRACEON 9114, NO_PERFORMANCE_SPOOL);
The resulting plan fully scans the inner side table for each outer row, costing 0.109779 units:

Note the join predicate is not evaluated at the join for an apply plan. This is the essential difference between an apply (correlated join parameter(s) evaluated on the inner branch) and a regular join (predicate(s) evaluated at the join operator).
Outer Join
The outer join in the question arises from an optimizer transformation (ApplyHandler) from apply to join. SQL Server tries to rewrite applies as joins up front because it knows more plan space exploration tricks with joins than apply. Once transformed (if possible) the optimizer may consider rewriting it back to an apply later on during cost-based optimization.
Scalar and Vector Aggregates
Without a GROUP BY clause the COUNT is a scalar aggregate. In SQL Server, this means the aggregate will always produce a row, even if it is given no rows to aggregate. The SQL Server scalar COUNT aggregate of no rows is zero.
When a GROUP BY clause is present, the COUNT aggregate is a vector aggregate. This produces no rows at all (not zero) when presented with an empty set.
The following toy queries show the difference. You can also read more about scalar and vector aggregates in my article Fun with Scalar and Vector Aggregates.
-- Produces zero
SELECT COUNT_BIG(*) FROM #MyTable AS MT WHERE 0 = 1;
-- Produces no rows
SELECT COUNT_BIG(*) FROM #MyTable AS MT WHERE 0 = 1 GROUP BY ();
db<>fiddle demo
Transforming apply to join
SQL Server is very careful to preserve all the semantics when translating from apply to join. To simplify, consider the following APPLY:
DECLARE @A table (A integer NULL, B integer NULL);
DECLARE @B table (A integer NULL, B integer NULL);
INSERT @A (A, B) VALUES (1, 1);
INSERT @B (A, B) VALUES (2, 2);
SELECT * FROM @A AS A
CROSS APPLY (SELECT c = COUNT_BIG(*) FROM @B AS B WHERE B.A = A.A) AS CA;
The correct result for column c there is zero, because the COUNT_BIG is a scalar aggregate. When translating this apply query to join form, SQL Server generates an alternative that would look similar to the following if expressed in T-SQL:
SELECT A.*, c = COALESCE(J1.c, 0)
FROM @A AS A
LEFT JOIN
(
SELECT B.A, c = COUNT_BIG(*)
FROM @B AS B
GROUP BY B.A
) AS J1
ON J1.A = A.A;
db<>fiddle demo
To make this an uncorrelated join, SQL Server has to introduce a GROUP BY in the derived table (otherwise there could be no A column to join on). To preserve the semantics of the original, the join has to be an outer join so each row from table @A continues to produce a row in the output. The left join will produce a NULL for column c that needs to be further translated to zero by COALESCE.
With the GROUP BY
Continuing the simplified example, but adding a GROUP BY:
DECLARE @A table (A integer NULL, B integer NULL);
DECLARE @B table (A integer NULL, B integer NULL);
INSERT @A (A, B) VALUES (1, 1);
INSERT @B (A, B) VALUES (2, 2);
-- Original
SELECT * FROM @A AS A
CROSS APPLY
(SELECT c = COUNT_BIG(*) FROM @B AS B WHERE B.A = A.A GROUP BY B.A) AS CA;
The COUNT_BIG is now a vector aggregate, so the correct result for an empty input set is no longer zero, it is no row at all. In other words, running the statements above produces no output.
These semantics are much easier to honour when translating from apply to join. Since CROSS APPLY rejects any outer row that generates no inner side rows, we can use an inner join with no extra expression projection (and SQL Server does the same):
-- Rewrite
SELECT A.*, J1.c
FROM @A AS A
JOIN
(
SELECT B.A, c = COUNT_BIG(*)
FROM @B AS B
GROUP BY B.A
) AS J1
ON J1.A = A.A;
db<>fiddle demo
Final note
The simplified examples use different tables with different contents to show the semantic differences more clearly. One could argue that the optimizer ought to be able to reason about a self-join in particular not generating any mismatched rows, but it does not contain that logic today. Accessing the same table multiple times in a query is not guaranteed to produce the same results in general anyway, depending on isolation level and concurrent activity.
The optimizer worries about these edge cases so you don't have to.
add a comment |
Summary
SQL Server uses the correct join (inner or outer) and adds projections where necessary to honour all the semantics of the original query when performing internal translations between apply and join. The differences in the plans can all be explained by the different semantics of aggregates with and without a group by clause in SQL Server.
Details
SQL Server can produce an inner join plan for the example query, it just chooses not to for cost reasons. The cost of the outer join plan shown in the question is 0.02898 units on my laptop's SQL Server 2017 instance.
You can force an apply (correlated join) plan using undocumented and unsupported trace flag 9114 just for illustration. In the real world, we would typically have an index to support a seek on the inner side of the apply to encourage SQL Server to choose this option:
SELECT *
FROM #MyTable AS mt
CROSS APPLY
(
SELECT COUNT_BIG(DISTINCT mt2.Col_B) AS dc
FROM #MyTable AS mt2
WHERE mt2.Col_A = mt.Col_A
--GROUP BY mt2.Col_A
) AS ca
OPTION (QUERYTRACEON 9114);
This produces an apply nested loops plan with a lazy index spool. The total estimated cost is 0.0463983, which is higher than the selected plan:

The performance spool caches inner side results for repeated outer values. It can also be removed with a hint:
SELECT *
FROM #MyTable AS mt
CROSS APPLY
(
SELECT COUNT_BIG(DISTINCT mt2.Col_B) AS dc
FROM #MyTable AS mt2
WHERE mt2.Col_A = mt.Col_A
--GROUP BY mt2.Col_A
) AS ca
OPTION (QUERYTRACEON 9114, NO_PERFORMANCE_SPOOL);
The resulting plan fully scans the inner side table for each outer row, costing 0.109779 units:

Note the join predicate is not evaluated at the join for an apply plan. This is the essential difference between an apply (correlated join parameter(s) evaluated on the inner branch) and a regular join (predicate(s) evaluated at the join operator).
Outer Join
Summary
SQL Server uses the correct join (inner or outer) and adds projections where necessary to honour all the semantics of the original query when performing internal translations between apply and join. The differences in the plans can all be explained by the different semantics of aggregates with and without a GROUP BY clause in SQL Server.
Details
SQL Server can produce an inner join plan for the example query; it just chooses not to for cost reasons. The cost of the outer join plan shown in the question is 0.02898 units on my laptop's SQL Server 2017 instance.
You can force an apply (correlated join) plan using undocumented and unsupported trace flag 9114 just for illustration. In the real world, we would typically have an index to support a seek on the inner side of the apply to encourage SQL Server to choose this option:
SELECT *
FROM #MyTable AS mt
CROSS APPLY
(
SELECT COUNT_BIG(DISTINCT mt2.Col_B) AS dc
FROM #MyTable AS mt2
WHERE mt2.Col_A = mt.Col_A
--GROUP BY mt2.Col_A
) AS ca
OPTION (QUERYTRACEON 9114);
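A minimal sketch of the supporting-index idea mentioned above. It assumes a hypothetical #Demo table standing in for the question's #MyTable (whose full definition is not shown here) and a hypothetical index name. The index leads on Col_A so the inner side of an apply could seek on the correlated predicate, with Col_B as the second key column so the distinct count can be computed from the index alone. With only a handful of rows the optimizer may still prefer the join plan, so check the actual execution plan rather than assuming an apply will be chosen.

```sql
-- Hypothetical stand-in for the question's #MyTable
DROP TABLE IF EXISTS #Demo;
CREATE TABLE #Demo (Col_A integer NULL, Col_B integer NULL);
INSERT #Demo (Col_A, Col_B) VALUES (1, 10), (1, 10), (1, 20), (2, 30);

-- Hypothetical index: Col_A first to support a seek on the inner side of an apply,
-- Col_B second so COUNT_BIG(DISTINCT Col_B) can be answered from the index
CREATE INDEX IX_Demo_ColA_ColB ON #Demo (Col_A, Col_B);

-- No trace flag: the choice between apply and join is left to the optimizer
SELECT mt.Col_A, mt.Col_B, ca.dc
FROM #Demo AS mt
CROSS APPLY
(
    SELECT COUNT_BIG(DISTINCT mt2.Col_B) AS dc
    FROM #Demo AS mt2
    WHERE mt2.Col_A = mt.Col_A
) AS ca;
-- dc is 2 for the three Col_A = 1 rows and 1 for the Col_A = 2 row
```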
This produces an apply nested loops plan with a lazy index spool. The total estimated cost is 0.0463983, which is higher than the 0.02898 units of the selected plan.
The performance spool caches inner side results for repeated outer values. It can also be removed with a hint:
SELECT *
FROM #MyTable AS mt
CROSS APPLY
(
SELECT COUNT_BIG(DISTINCT mt2.Col_B) AS dc
FROM #MyTable AS mt2
WHERE mt2.Col_A = mt.Col_A
--GROUP BY mt2.Col_A
) AS ca
OPTION (QUERYTRACEON 9114, NO_PERFORMANCE_SPOOL);
The resulting plan fully scans the inner side table for each outer row, costing 0.109779 units.
Note the join predicate is not evaluated at the join for an apply plan. This is the essential difference between an apply (correlated join parameter(s) evaluated on the inner branch) and a regular join (predicate(s) evaluated at the join operator).
Outer Join
The outer join in the question arises from an optimizer transformation (ApplyHandler) from apply to join. SQL Server tries to rewrite applies as joins up front because it knows more plan space exploration tricks with joins than apply. Once transformed (if possible) the optimizer may consider rewriting it back to an apply later on during cost-based optimization.
Scalar and Vector Aggregates
Without a GROUP BY clause the COUNT is a scalar aggregate. In SQL Server, this means the aggregate will always produce a row, even if it is given no rows to aggregate. The SQL Server scalar COUNT aggregate of no rows is zero.
When a GROUP BY clause is present, the COUNT aggregate is a vector aggregate. This produces no rows at all (not zero) when presented with an empty set.
The following toy queries show the difference. You can also read more about scalar and vector aggregates in my article Fun with Scalar and Vector Aggregates.
-- Produces zero
SELECT COUNT_BIG(*) FROM #MyTable AS MT WHERE 0 = 1;
-- Produces no rows
SELECT COUNT_BIG(*) FROM #MyTable AS MT WHERE 0 = 1 GROUP BY ();
db<>fiddle demo
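The toy queries above rely on the question's #MyTable. A self-contained variant, assuming a hypothetical empty table variable @Empty, captures each query's output in its own table variable so the row counts can be checked directly: the scalar aggregate yields one row containing zero, while the GROUP BY () vector aggregate yields no rows.

```sql
-- Hypothetical empty table standing in for "#MyTable WHERE 0 = 1"
DECLARE @Empty table (x integer NULL);
DECLARE @Scalar table (c bigint NOT NULL);
DECLARE @Vector table (c bigint NOT NULL);

-- Scalar aggregate (no GROUP BY): always produces exactly one row
INSERT @Scalar (c)
SELECT COUNT_BIG(*) FROM @Empty;

-- Vector aggregate (GROUP BY ()): produces no row for an empty input
INSERT @Vector (c)
SELECT COUNT_BIG(*) FROM @Empty GROUP BY ();

SELECT
    scalar_rows  = (SELECT COUNT_BIG(*) FROM @Scalar),  -- 1
    scalar_value = (SELECT MAX(c) FROM @Scalar),        -- 0
    vector_rows  = (SELECT COUNT_BIG(*) FROM @Vector);  -- 0
```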
Transforming apply to join
SQL Server is very careful to preserve all the semantics when translating from apply to join. To simplify, consider the following APPLY:
DECLARE @A table (A integer NULL, B integer NULL);
DECLARE @B table (A integer NULL, B integer NULL);
INSERT @A (A, B) VALUES (1, 1);
INSERT @B (A, B) VALUES (2, 2);
SELECT * FROM @A AS A
CROSS APPLY (SELECT c = COUNT_BIG(*) FROM @B AS B WHERE B.A = A.A) AS CA;
The correct result for column c there is zero, because the COUNT_BIG is a scalar aggregate. When translating this apply query to join form, SQL Server generates an alternative that would look similar to the following if expressed in T-SQL:
SELECT A.*, c = COALESCE(J1.c, 0)
FROM @A AS A
LEFT JOIN
(
SELECT B.A, c = COUNT_BIG(*)
FROM @B AS B
GROUP BY B.A
) AS J1
ON J1.A = A.A;
db<>fiddle demo
To make this an uncorrelated join, SQL Server has to introduce a GROUP BY in the derived table (otherwise there could be no A column to join on). To preserve the semantics of the original, the join has to be an outer join so each row from table @A continues to produce a row in the output. The left join will produce a NULL for column c that needs to be further translated to zero by COALESCE.
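To check the equivalence on slightly richer data than the single-row example, the sketch below adds hypothetical extra rows to @A and @B, stores the output of the original CROSS APPLY and of the LEFT JOIN plus COALESCE form, then compares them with EXCEPT in both directions. The rewrite is written by hand to mirror the T-SQL shown above; it is not a capture of SQL Server's internal plan.

```sql
DECLARE @A table (A integer NULL, B integer NULL);
DECLARE @B table (A integer NULL, B integer NULL);
-- Hypothetical rows: A = 1 has no match in @B, A = 2 has two matches
INSERT @A (A, B) VALUES (1, 1), (2, 2);
INSERT @B (A, B) VALUES (2, 2), (2, 3), (3, 3);

DECLARE @Apply table (A integer NULL, B integer NULL, c bigint NOT NULL);
DECLARE @Join table (A integer NULL, B integer NULL, c bigint NOT NULL);

-- Original: CROSS APPLY over a scalar aggregate (no GROUP BY)
INSERT @Apply (A, B, c)
SELECT A.A, A.B, CA.c
FROM @A AS A
CROSS APPLY (SELECT c = COUNT_BIG(*) FROM @B AS B WHERE B.A = A.A) AS CA;

-- Rewrite: outer join keeps every @A row; COALESCE turns the missing count into 0
INSERT @Join (A, B, c)
SELECT A.A, A.B, COALESCE(J1.c, 0)
FROM @A AS A
LEFT JOIN
(
    SELECT B.A, c = COUNT_BIG(*)
    FROM @B AS B
    GROUP BY B.A
) AS J1
    ON J1.A = A.A;

-- Both contain (1, 1, 0) and (2, 2, 2)
SELECT * FROM @Apply EXCEPT SELECT * FROM @Join;  -- no rows
SELECT * FROM @Join EXCEPT SELECT * FROM @Apply;  -- no rows
```

Dropping the COALESCE from the rewrite would leave c as NULL for the A = 1 row (and the NOT NULL column above would then reject the insert), which is why the projection is needed.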
With the GROUP BY
Continuing the simplified example, but adding a GROUP BY:
DECLARE @A table (A integer NULL, B integer NULL);
DECLARE @B table (A integer NULL, B integer NULL);
INSERT @A (A, B) VALUES (1, 1);
INSERT @B (A, B) VALUES (2, 2);
-- Original
SELECT * FROM @A AS A
CROSS APPLY
(SELECT c = COUNT_BIG(*) FROM @B AS B WHERE B.A = A.A GROUP BY B.A) AS CA;
The COUNT_BIG is now a vector aggregate, so the correct result for an empty input set is no longer zero; it is no row at all. In other words, running the statements above produces no output.
These semantics are much easier to honour when translating from apply to join. Since CROSS APPLY rejects any outer row that generates no inner side rows, we can use an inner join with no extra expression projection (and SQL Server does the same):
-- Rewrite
SELECT A.*, J1.c
FROM @A AS A
JOIN
(
SELECT B.A, c = COUNT_BIG(*)
FROM @B AS B
GROUP BY B.A
) AS J1
ON J1.A = A.A;
db<>fiddle demo
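The same kind of check for the GROUP BY case, again with hypothetical extra rows: the grouped CROSS APPLY and the plain inner join rewrite are captured and compared with EXCEPT. Both forms reject the @A row that has no match in @B, so no outer join or COALESCE is required. As before, the rewrite is hand-written to mirror the T-SQL above.

```sql
DECLARE @A table (A integer NULL, B integer NULL);
DECLARE @B table (A integer NULL, B integer NULL);
-- Hypothetical rows: A = 1 has no match in @B, A = 2 has two matches
INSERT @A (A, B) VALUES (1, 1), (2, 2);
INSERT @B (A, B) VALUES (2, 2), (2, 3), (3, 3);

DECLARE @Apply table (A integer NULL, B integer NULL, c bigint NOT NULL);
DECLARE @Join table (A integer NULL, B integer NULL, c bigint NOT NULL);

-- Original: CROSS APPLY over a vector aggregate (GROUP BY present)
INSERT @Apply (A, B, c)
SELECT A.A, A.B, CA.c
FROM @A AS A
CROSS APPLY
(SELECT c = COUNT_BIG(*) FROM @B AS B WHERE B.A = A.A GROUP BY B.A) AS CA;

-- Rewrite: plain inner join, no extra projection
INSERT @Join (A, B, c)
SELECT A.A, A.B, J1.c
FROM @A AS A
JOIN
(
    SELECT B.A, c = COUNT_BIG(*)
    FROM @B AS B
    GROUP BY B.A
) AS J1
    ON J1.A = A.A;

-- Both contain only (2, 2, 2); the A = 1 row is rejected by both forms
SELECT * FROM @Apply EXCEPT SELECT * FROM @Join;  -- no rows
SELECT * FROM @Join EXCEPT SELECT * FROM @Apply;  -- no rows
```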
Final note
The simplified examples use different tables with different contents to show the semantic differences more clearly. One could argue that the optimizer ought to be able to reason about a self-join in particular not generating any mismatched rows, but it does not contain that logic today. Accessing the same table multiple times in a query is not guaranteed to produce the same results in general anyway, depending on isolation level and concurrent activity.
The optimizer worries about these edge cases so you don't have to.
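One concrete self-join edge case: if the correlated column can be NULL, an outer row does not even match itself, because NULL = NULL is not true. The sketch below assumes a hypothetical table variable @T with a nullable Col_A (the question's actual table definition is not shown here) and runs the self-join with and without the GROUP BY. The two forms agree on the non-NULL rows but differ for the NULL row.

```sql
-- Hypothetical table with a nullable correlation column
DECLARE @T table (Col_A integer NULL, Col_B integer NULL);
INSERT @T (Col_A, Col_B) VALUES (1, 10), (1, 20), (NULL, 30);

DECLARE @Scalar table (Col_A integer NULL, Col_B integer NULL, dc bigint NOT NULL);
DECLARE @Vector table (Col_A integer NULL, Col_B integer NULL, dc bigint NOT NULL);

-- Scalar aggregate: every outer row survives; the NULL row gets dc = 0
INSERT @Scalar (Col_A, Col_B, dc)
SELECT mt.Col_A, mt.Col_B, ca.dc
FROM @T AS mt
CROSS APPLY
(SELECT COUNT_BIG(DISTINCT mt2.Col_B) AS dc FROM @T AS mt2 WHERE mt2.Col_A = mt.Col_A) AS ca;

-- Vector aggregate: the NULL row finds no inner rows, so CROSS APPLY drops it
INSERT @Vector (Col_A, Col_B, dc)
SELECT mt.Col_A, mt.Col_B, ca.dc
FROM @T AS mt
CROSS APPLY
(SELECT COUNT_BIG(DISTINCT mt2.Col_B) AS dc FROM @T AS mt2 WHERE mt2.Col_A = mt.Col_A GROUP BY mt2.Col_A) AS ca;

SELECT
    scalar_rows = (SELECT COUNT_BIG(*) FROM @Scalar),  -- 3
    vector_rows = (SELECT COUNT_BIG(*) FROM @Vector);  -- 2
```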
edited 2 hours ago
answered 2 hours ago
Paul White
CROSS APPLY is a logical operation on the data. When deciding how to get that data, SQL Server chooses the appropriate physical operator to get the data you want.
There is no physical apply operator, and SQL Server translates it into the appropriate and hopefully efficient join operator.
You can find a list of the physical operators in the link below.
https://docs.microsoft.com/en-us/sql/relational-databases/showplan-logical-and-physical-operators-reference?view=sql-server-2017
The query optimizer creates a query plan as a tree consisting of logical operators. After the query optimizer creates the plan, the query optimizer chooses the most efficient physical operator for each logical operator. The query optimizer uses a cost-based approach to determine which physical operator will implement a logical operator.
Usually, a logical operation can be implemented by multiple physical operators. However, in rare cases, a physical operator can implement multiple logical operations as well.
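To make the logical/physical distinction concrete, the sketch below (hypothetical table variables) runs the same logical inner join three times, using the LOOP JOIN, HASH JOIN and MERGE JOIN query hints to force a different physical join operator each time. The results are identical; only the execution plans differ. Without such hints, the choice is made by the cost-based optimizer as the quoted documentation describes.

```sql
DECLARE @A table (A integer NOT NULL);
DECLARE @B table (A integer NOT NULL, B integer NOT NULL);
INSERT @A (A) VALUES (1), (2), (3);
INSERT @B (A, B) VALUES (2, 20), (3, 30), (3, 31), (4, 40);

DECLARE @Loop table (A integer NOT NULL, B integer NOT NULL);
DECLARE @Hash table (A integer NOT NULL, B integer NOT NULL);
DECLARE @Merge table (A integer NOT NULL, B integer NOT NULL);

-- One logical operation (inner join) implemented by three physical operators
INSERT @Loop (A, B)
SELECT A.A, B.B FROM @A AS A JOIN @B AS B ON B.A = A.A
OPTION (LOOP JOIN);   -- Nested Loops

INSERT @Hash (A, B)
SELECT A.A, B.B FROM @A AS A JOIN @B AS B ON B.A = A.A
OPTION (HASH JOIN);   -- Hash Match

INSERT @Merge (A, B)
SELECT A.A, B.B FROM @A AS A JOIN @B AS B ON B.A = A.A
OPTION (MERGE JOIN);  -- Merge Join (sorts are added because the inputs are unordered)

-- Each table holds (2, 20), (3, 30), (3, 31)
SELECT * FROM @Loop EXCEPT SELECT * FROM @Hash;   -- no rows
SELECT * FROM @Loop EXCEPT SELECT * FROM @Merge;  -- no rows
```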
Edit: It seems I misunderstood your question. SQL Server will normally choose the most appropriate operator. Your query doesn't need to return values for every combination of rows from both tables, which is when a cross join would be used. Calculating the value you want for each row suffices, and that is what is done here.
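The difference between "all combinations" and "one value per row" can be sketched with hypothetical table variables: a CROSS JOIN returns every pairing of rows, while a CROSS APPLY over a scalar aggregate returns exactly one row per outer row.

```sql
DECLARE @A table (A integer NOT NULL);
DECLARE @B table (A integer NOT NULL);
INSERT @A (A) VALUES (1), (2), (3);
INSERT @B (A) VALUES (1), (1), (2), (4);

DECLARE @CrossJoinRows bigint, @CrossApplyRows bigint;

-- CROSS JOIN: every combination, 3 * 4 = 12 rows
SELECT @CrossJoinRows = COUNT_BIG(*)
FROM @A AS A
CROSS JOIN @B AS B;

-- CROSS APPLY with a scalar aggregate: one row per row of @A, 3 rows
SELECT @CrossApplyRows = COUNT_BIG(*)
FROM @A AS A
CROSS APPLY (SELECT c = COUNT_BIG(*) FROM @B AS B WHERE B.A = A.A) AS CA;

SELECT CrossJoinRows = @CrossJoinRows, CrossApplyRows = @CrossApplyRows;  -- 12, 3
```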
answered 8 hours ago
J. Maes