CROSS APPLY produces outer join


In an answer to SQL counting distinct over partition, Erik Darling posted this code to work around the lack of COUNT(DISTINCT) OVER ():



SELECT *
FROM #MyTable AS mt
CROSS APPLY ( SELECT COUNT(DISTINCT mt2.Col_B) AS dc
              FROM #MyTable AS mt2
              WHERE mt2.Col_A = mt.Col_A
              -- GROUP BY mt2.Col_A
            ) AS ca;


The query uses CROSS APPLY (not OUTER APPLY), so why is there an outer join in the execution plan instead of an inner join?






Also, why does uncommenting the GROUP BY clause result in an inner join?






I don't think the data is important, but here is the sample data given by kevinwhat on the other question:



create table #MyTable (
Col_A varchar(5),
Col_B int
)

insert into #MyTable values ('A',1)
insert into #MyTable values ('A',1)
insert into #MyTable values ('A',2)
insert into #MyTable values ('A',2)
insert into #MyTable values ('A',2)
insert into #MyTable values ('A',3)

insert into #MyTable values ('B',4)
insert into #MyTable values ('B',4)
insert into #MyTable values ('B',5)
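To make the goal of the workaround concrete, here is a minimal sketch that loads the same sample data into an in-memory SQLite database through Python's standard sqlite3 module. This is a stand-in under stated assumptions, not SQL Server: SQLite has no CROSS APPLY, so a correlated scalar subquery in the SELECT list computes the same per-row COUNT(DISTINCT Col_B) for each Col_A, and the table name my_table is hypothetical. Each 'A' row should get 3 and each 'B' row should get 2.

```python
import sqlite3

# Hypothetical stand-in for #MyTable, using the question's sample data.
ROWS = [("A", 1), ("A", 1), ("A", 2), ("A", 2), ("A", 2), ("A", 3),
        ("B", 4), ("B", 4), ("B", 5)]


def distinct_counts_per_row():
    """Return (col_a, col_b, dc) for every row, where dc is the number of
    distinct col_b values sharing that row's col_a."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (col_a TEXT, col_b INTEGER)")
    conn.executemany("INSERT INTO my_table VALUES (?, ?)", ROWS)
    # SQLite has no CROSS APPLY; a correlated scalar subquery plays its role.
    rows = conn.execute(
        """
        SELECT mt.col_a, mt.col_b,
               (SELECT COUNT(DISTINCT mt2.col_b)
                FROM my_table AS mt2
                WHERE mt2.col_a = mt.col_a) AS dc
        FROM my_table AS mt
        ORDER BY mt.rowid
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in distinct_counts_per_row():
        print(row)
```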









      sql-server execution-plan cross-apply






asked 8 hours ago by user182461 (edited 8 hours ago)






















          2 Answers
          Summary



SQL Server uses the correct join (inner or outer) and adds projections where necessary to honour all the semantics of the original query when performing internal translations between apply and join. The differences in the plans can all be explained by the different semantics of aggregates with and without a GROUP BY clause in SQL Server.




          Details



SQL Server can produce an inner join plan for the example query; it just chooses not to for cost reasons. The cost of the outer join plan shown in the question is 0.02898 units on my laptop's SQL Server 2017 instance.



For illustration only, you can force an apply (correlated join) plan using the undocumented and unsupported trace flag 9114. In the real world, we would typically have an index to support a seek on the inner side of the apply, which would encourage SQL Server to choose this option:



SELECT *
FROM #MyTable AS mt
CROSS APPLY
(
    SELECT COUNT_BIG(DISTINCT mt2.Col_B) AS dc
    FROM #MyTable AS mt2
    WHERE mt2.Col_A = mt.Col_A
    --GROUP BY mt2.Col_A
) AS ca
OPTION (QUERYTRACEON 9114);


This produces an apply nested loops plan with a lazy index spool. The total estimated cost is 0.0463983, which is higher than the cost of the selected plan:



[Image: Index Spool apply plan]



          The performance spool caches inner side results for repeated outer values. It can also be removed with a hint:



SELECT *
FROM #MyTable AS mt
CROSS APPLY
(
    SELECT COUNT_BIG(DISTINCT mt2.Col_B) AS dc
    FROM #MyTable AS mt2
    WHERE mt2.Col_A = mt.Col_A
    --GROUP BY mt2.Col_A
) AS ca
OPTION (QUERYTRACEON 9114, NO_PERFORMANCE_SPOOL);


          The resulting plan fully scans the inner side table for each outer row, costing 0.109779 units:



[Image: Apply plan without spool]



          Note the join predicate is not evaluated at the join for an apply plan. This is the essential difference between an apply (correlated join parameter(s) evaluated on the inner branch) and a regular join (predicate(s) evaluated at the join operator).
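The difference between a plain apply and a spool-assisted apply can be sketched in a few lines of Python. This is only a loose model under stated assumptions: MY_TABLE is a hypothetical in-memory copy of the question's data, inner_side plays the role of the inner branch (a full scan filtered on the correlated parameter), and a dictionary keyed by the correlated value stands in for the lazy index spool. It counts how often the inner side runs; it says nothing about how SQL Server actually costs these plans.

```python
from collections import Counter

# Hypothetical in-memory copy of #MyTable from the question.
MY_TABLE = [("A", 1), ("A", 1), ("A", 2), ("A", 2), ("A", 2), ("A", 3),
            ("B", 4), ("B", 4), ("B", 5)]


def inner_side(col_a, stats):
    """Inner branch: full scan filtered on the correlated parameter col_a,
    followed by a scalar COUNT(DISTINCT col_b) that always yields one row."""
    stats["inner_scans"] += 1
    matching = {col_b for a, col_b in MY_TABLE if a == col_a}
    return [(len(matching),)]


def cross_apply(outer_rows, use_spool):
    """Nested loops apply: each outer col_a is passed into the inner branch
    as a parameter, so no join predicate is tested at the join itself."""
    stats = Counter()
    spool = {}  # loose model of a lazy index spool: results per outer value
    result = []
    for col_a, col_b in outer_rows:
        if use_spool and col_a in spool:
            inner_rows = spool[col_a]          # replay cached inner result
        else:
            inner_rows = inner_side(col_a, stats)
            if use_spool:
                spool[col_a] = inner_rows      # populate spool on first use
        for inner in inner_rows:               # no inner rows -> no output row
            result.append((col_a, col_b) + inner)
    return result, stats["inner_scans"]


if __name__ == "__main__":
    plain, plain_scans = cross_apply(MY_TABLE, use_spool=False)
    spooled, spooled_scans = cross_apply(MY_TABLE, use_spool=True)
    print(plain_scans, spooled_scans, plain == spooled)
```

With the sample data, the unspooled version runs the inner scan once per outer row (9 times), while the spooled version runs it once per distinct Col_A value (twice), and both produce the same rows.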



          Outer Join



The outer join in the question arises from an optimizer transformation (ApplyHandler) from apply to join. SQL Server tries to rewrite applies as joins up front because it knows more plan space exploration tricks for joins than for applies. Once the apply has been transformed (where possible), the optimizer may consider rewriting it back to an apply later on during cost-based optimization.



          Scalar and Vector Aggregates



Without a GROUP BY clause, the COUNT is a scalar aggregate. In SQL Server, this means the aggregate always produces a row, even if it is given no rows to aggregate. The SQL Server scalar COUNT aggregate of no rows is zero.



          When a GROUP BY clause is present, the COUNT aggregate is a vector aggregate. This produces no rows at all (not zero) when presented with an empty set.



          The following toy queries show the difference. You can also read more about scalar and vector aggregates in my article Fun with Scalar and Vector Aggregates.



          -- Produces zero
          SELECT COUNT_BIG(*) FROM #MyTable AS MT WHERE 0 = 1;

          -- Produces no rows
          SELECT COUNT_BIG(*) FROM #MyTable AS MT WHERE 0 = 1 GROUP BY ();


          db<>fiddle demo
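The same scalar versus vector distinction can be observed outside SQL Server. The sketch below uses SQLite through Python's sqlite3 module as an assumed stand-in; SQLite does not accept GROUP BY (), so the vector aggregate case groups by a column instead, which still yields no groups for an empty input. The table name my_table is hypothetical.

```python
import sqlite3


def aggregate_outputs():
    """Return (scalar_rows, vector_rows) for a COUNT over an empty input."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (col_a TEXT, col_b INTEGER)")
    conn.execute("INSERT INTO my_table VALUES ('A', 1)")
    # Scalar aggregate (no GROUP BY): always one row; COUNT of nothing is 0.
    scalar_rows = conn.execute(
        "SELECT COUNT(*) FROM my_table WHERE 0 = 1").fetchall()
    # Vector aggregate (GROUP BY present): no groups, so no rows at all.
    vector_rows = conn.execute(
        "SELECT COUNT(*) FROM my_table WHERE 0 = 1 GROUP BY col_a").fetchall()
    conn.close()
    return scalar_rows, vector_rows


if __name__ == "__main__":
    print(aggregate_outputs())
```

Here aggregate_outputs() returns [(0,)] for the scalar query and an empty list for the grouped one.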



          Transforming apply to join



          SQL Server is very careful to preserve all the semantics when translating from apply to join. To simplify, consider the following APPLY:



          DECLARE @A table (A integer NULL, B integer NULL);
          DECLARE @B table (A integer NULL, B integer NULL);

          INSERT @A (A, B) VALUES (1, 1);
          INSERT @B (A, B) VALUES (2, 2);

          SELECT * FROM @A AS A
          CROSS APPLY (SELECT c = COUNT_BIG(*) FROM @B AS B WHERE B.A = A.A) AS CA;


The correct result for column c is zero, because the COUNT_BIG is a scalar aggregate. When translating this apply query to join form, SQL Server generates an alternative that would look similar to the following if expressed in T-SQL:



SELECT A.*, c = COALESCE(J1.c, 0)
FROM @A AS A
LEFT JOIN
(
    SELECT B.A, c = COUNT_BIG(*)
    FROM @B AS B
    GROUP BY B.A
) AS J1
    ON J1.A = A.A;


          db<>fiddle demo



          To make this an uncorrelated join, SQL Server has to introduce a GROUP BY in the derived table (otherwise there could be no A column to join on). To preserve the semantics of the original, the join has to be an outer join so each row from table @A continues to produce a row in the output. The left join will produce a NULL for column c that needs to be further translated to zero by COALESCE.
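To check that the LEFT JOIN plus COALESCE rewrite preserves the apply semantics while a plain inner join would not, here is a small sketch using SQLite via Python's sqlite3 module as an assumed stand-in. SQLite has no CROSS APPLY, so the hypothetical helper cross_apply emulates it by running the inner query once per outer row with the outer A value bound as a parameter, dropping outer rows that produce no inner rows. Tables a and b mirror @A and @B, and COUNT replaces COUNT_BIG.

```python
import sqlite3


def setup():
    """Create tables a and b mirroring @A and @B from the example."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE a (a INTEGER, b INTEGER)")
    conn.execute("CREATE TABLE b (a INTEGER, b INTEGER)")
    conn.execute("INSERT INTO a VALUES (1, 1)")
    conn.execute("INSERT INTO b VALUES (2, 2)")
    return conn


def cross_apply(conn, inner_sql):
    """Emulated CROSS APPLY: run inner_sql once per outer row with the outer
    a.a value bound to '?'; outer rows with no inner rows are dropped."""
    out = []
    for a_val, b_val in conn.execute("SELECT a, b FROM a").fetchall():
        for inner in conn.execute(inner_sql, (a_val,)).fetchall():
            out.append((a_val, b_val) + inner)
    return out


# Scalar aggregate on the inner side (no GROUP BY): always one row.
SCALAR_APPLY = "SELECT COUNT(*) FROM b WHERE b.a = ?"

# Grouped derived table, outer join, and COALESCE to turn the NULL produced
# for a non-matching outer row back into zero.
LEFT_JOIN_REWRITE = """
    SELECT a.a, a.b, COALESCE(j1.c, 0) AS c
    FROM a
    LEFT JOIN (SELECT b.a, COUNT(*) AS c FROM b GROUP BY b.a) AS j1
        ON j1.a = a.a
"""

# A naive inner join rewrite, which would lose the outer row.
INNER_JOIN_REWRITE = """
    SELECT a.a, a.b, j1.c
    FROM a
    JOIN (SELECT b.a, COUNT(*) AS c FROM b GROUP BY b.a) AS j1
        ON j1.a = a.a
"""

if __name__ == "__main__":
    conn = setup()
    print(cross_apply(conn, SCALAR_APPLY))
    print(conn.execute(LEFT_JOIN_REWRITE).fetchall())
    print(conn.execute(INNER_JOIN_REWRITE).fetchall())
```

The emulated apply and the outer join rewrite both return a single row with c = 0, while the inner join rewrite returns nothing.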



          With the GROUP BY



          Continuing the simplified example, but adding a GROUP BY:



          DECLARE @A table (A integer NULL, B integer NULL);
          DECLARE @B table (A integer NULL, B integer NULL);

          INSERT @A (A, B) VALUES (1, 1);
          INSERT @B (A, B) VALUES (2, 2);

          -- Original
          SELECT * FROM @A AS A
          CROSS APPLY
          (SELECT c = COUNT_BIG(*) FROM @B AS B WHERE B.A = A.A GROUP BY B.A) AS CA;



The COUNT_BIG is now a vector aggregate, so the correct result for an empty input set is no longer zero; it is no row at all. In other words, running the statements above produces no output.



          These semantics are much easier to honour when translating from apply to join. Since CROSS APPLY rejects any outer row that generates no inner side rows, we can use an inner join with no extra expression projection (and SQL Server does the same):



-- Rewrite
SELECT A.*, J1.c
FROM @A AS A
JOIN
(
    SELECT B.A, c = COUNT_BIG(*)
    FROM @B AS B
    GROUP BY B.A
) AS J1
    ON J1.A = A.A;


          db<>fiddle demo
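A similar sketch for the GROUP BY case, again using SQLite through Python's sqlite3 module as an assumed stand-in and emulating CROSS APPLY with a per-row loop (the function name vector_apply_vs_inner_join is hypothetical). With the original data both forms return no rows; adding matching rows to b shows that the inner join rewrite still agrees with the apply.

```python
import sqlite3


def vector_apply_vs_inner_join(b_rows):
    """Return (apply_rows, join_rows) for the GROUP BY version of the example,
    with table a holding (1, 1) and table b holding b_rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE a (a INTEGER, b INTEGER)")
    conn.execute("CREATE TABLE b (a INTEGER, b INTEGER)")
    conn.execute("INSERT INTO a VALUES (1, 1)")
    conn.executemany("INSERT INTO b VALUES (?, ?)", b_rows)

    # Emulated CROSS APPLY with a vector aggregate (GROUP BY) on the inner side:
    # an outer row whose inner query returns no rows produces no output.
    vector_inner = "SELECT COUNT(*) FROM b WHERE b.a = ? GROUP BY b.a"
    apply_rows = []
    for a_val, b_val in conn.execute("SELECT a, b FROM a").fetchall():
        for inner in conn.execute(vector_inner, (a_val,)).fetchall():
            apply_rows.append((a_val, b_val) + inner)

    # Inner join rewrite: no outer join and no COALESCE projection needed.
    join_rows = conn.execute(
        """
        SELECT a.a, a.b, j1.c
        FROM a
        JOIN (SELECT b.a, COUNT(*) AS c FROM b GROUP BY b.a) AS j1
            ON j1.a = a.a
        """
    ).fetchall()
    conn.close()
    return apply_rows, join_rows


if __name__ == "__main__":
    print(vector_apply_vs_inner_join([(2, 2)]))
    print(vector_apply_vs_inner_join([(1, 5), (1, 6), (2, 2)]))
```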



          Final note



The simplified examples use different tables with different contents to show the semantic differences more clearly. One could argue that the optimizer ought to be able to reason that a self-join, in particular, cannot generate any mismatched rows, but it does not contain that logic today. In any case, accessing the same table multiple times in a query is not guaranteed to produce the same results in general, depending on isolation level and concurrent activity.



          The optimizer worries about these edge cases so you don't have to.






CROSS APPLY is a logical operation on the data. When deciding how to retrieve that data, SQL Server chooses the appropriate physical operator to produce the result you want.



There is no physical apply operator; SQL Server translates it into an appropriate (and hopefully efficient) join operator.



            You can find a list of the physical operators in the link below.



            https://docs.microsoft.com/en-us/sql/relational-databases/showplan-logical-and-physical-operators-reference?view=sql-server-2017




            The query optimizer creates a query plan as a tree consisting of logical operators. After the query optimizer creates the plan, the query optimizer chooses the most efficient physical operator for each logical operator. The query optimizer uses a cost-based approach to determine which physical operator will implement a logical operator.



Usually, a logical operation can be implemented by multiple physical operators. However, in rare cases, a physical operator can implement multiple logical operations as well.




Edit: It seems I misunderstood your question. SQL Server will normally choose the most appropriate operator. Your query doesn't need to return values for all combinations of rows from both tables, which is when a cross join would be used. Calculating the value you want for each row suffices, and that is what is done here.






            share|improve this answer























              Your Answer








              StackExchange.ready(function()
              var channelOptions =
              tags: "".split(" "),
              id: "182"
              ;
              initTagRenderer("".split(" "), "".split(" "), channelOptions);

              StackExchange.using("externalEditor", function()
              // Have to fire editor after snippets, if snippets enabled
              if (StackExchange.settings.snippets.snippetsEnabled)
              StackExchange.using("snippets", function()
              createEditor();
              );

              else
              createEditor();

              );

              function createEditor()
              StackExchange.prepareEditor(
              heartbeatType: 'answer',
              autoActivateHeartbeat: false,
              convertImagesToLinks: false,
              noModals: true,
              showLowRepImageUploadWarning: true,
              reputationToPostImages: null,
              bindNavPrevention: true,
              postfix: "",
              imageUploader:
              brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
              contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
              allowUrls: true
              ,
              onDemand: true,
              discardSelector: ".discard-answer"
              ,immediatelyShowMarkdownHelp:true
              );



              );






              user182461 is a new contributor. Be nice, and check out our Code of Conduct.









              draft saved

              draft discarded


















              StackExchange.ready(
              function ()
              StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f239865%2fcross-apply-produces-outer-join%23new-answer', 'question_page');

              );

              Post as a guest















              Required, but never shown

























              2 Answers
              2






              active

              oldest

              votes








              2 Answers
              2






              active

              oldest

              votes









              active

              oldest

              votes






              active

              oldest

              votes









              5














              Summary



              SQL Server uses the correct join (inner or outer) and adds projections where necessary to honour all the semantics of the original query when performing internal translations between apply and join. The differences in the plans can all be explained by the different semantics of aggregates with and without a group by clause in SQL Server.




              Details



              SQL Server can produce an inner join plan for the example query, it just chooses not to for cost reasons. The cost of the outer join plan shown in the question is 0.02898 units on my laptop's SQL Server 2017 instance.



              You can force an apply (correlated join) plan using undocumented and unsupported trace flag 9114 just for illustration. In the real world, we would typically have an index to support a seek on the inner side of the apply to encourage SQL Server to choose this option:



              SELECT *
              FROM #MyTable AS mt
              CROSS APPLY
              (
              SELECT COUNT_BIG(DISTINCT mt2.Col_B) AS dc
              FROM #MyTable AS mt2
              WHERE mt2.Col_A = mt.Col_A
              --GROUP BY mt2.Col_A
              ) AS ca
              OPTION (QUERYTRACEON 9114);


              This produces an apply nested loops plan with a lazy index spool. The total estimated cost is 0.0463983, which is higher than the selected plan:



              Index Spool apply plan



              The performance spool caches inner side results for repeated outer values. It can also be removed with a hint:



              SELECT *
              FROM #MyTable AS mt
              CROSS APPLY
              (
              SELECT COUNT_BIG(DISTINCT mt2.Col_B) AS dc
              FROM #MyTable AS mt2
              WHERE mt2.Col_A = mt.Col_A
              --GROUP BY mt2.Col_A
              ) AS ca
              OPTION (QUERYTRACEON 9114, NO_PERFORMANCE_SPOOL);


              The resulting plan fully scans the inner side table for each outer row, costing 0.109779 units:



              Apply plan without spool



              Note the join predicate is not evaluated at the join for an apply plan. This is the essential difference between an apply (correlated join parameter(s) evaluated on the inner branch) and a regular join (predicate(s) evaluated at the join operator).



              Outer Join



              The outer join in the question arises from an optimizer transformation (ApplyHandler) from apply to join. SQL Server tries to rewrite applies as joins up front because it knows more plan space exploration tricks with joins than apply. Once transformed (if possible) the optimizer may consider rewriting it back to an apply later on during cost-based optimization.



              Scalar and Vector Aggregates



              Without a GROUP BY clause the COUNT is a scalar aggregate. In SQL Server, this means the aggregate will always produce a row, even if it is given no rows to aggregate. The SQL Server scalar COUNT aggregate of no rows is zero.



              When a GROUP BY clause is present, the COUNT aggregate is a vector aggregate. This produces no rows at all (not zero) when presented with an empty set.



              The following toy queries show the difference. You can also read more about scalar and vector aggregates in my article Fun with Scalar and Vector Aggregates.



              -- Produces zero
              SELECT COUNT_BIG(*) FROM #MyTable AS MT WHERE 0 = 1;

              -- Produces no rows
              SELECT COUNT_BIG(*) FROM #MyTable AS MT WHERE 0 = 1 GROUP BY ();


              db<>fiddle demo



              Transforming apply to join



              SQL Server is very careful to preserve all the semantics when translating from apply to join. To simplify, consider the following APPLY:



              DECLARE @A table (A integer NULL, B integer NULL);
              DECLARE @B table (A integer NULL, B integer NULL);

              INSERT @A (A, B) VALUES (1, 1);
              INSERT @B (A, B) VALUES (2, 2);

              SELECT * FROM @A AS A
              CROSS APPLY (SELECT c = COUNT_BIG(*) FROM @B AS B WHERE B.A = A.A) AS CA;


              The correct result for column c there is zero, because the COUNT_BIG is a scalar aggregate. When translating this apply query to join form, SQL Server generates an alternative that would look similar to the following if expressed in T-SQL:



              SELECT A.*, c = COALESCE(J1.c, 0)
              FROM @A AS A
              LEFT JOIN
              (
              SELECT B.A, c = COUNT_BIG(*)
              FROM @B AS B
              GROUP BY B.A
              ) AS J1
              ON J1.A = A.A;


              db<>fiddle demo



              To make this an uncorrelated join, SQL Server has to introduce a GROUP BY in the derived table (otherwise there could be no A column to join on). To preserve the semantics of the original, the join has to be an outer join so each row from table @A continues to produce a row in the output. The left join will produce a NULL for column c that needs to be further translated to zero by COALESCE.



              With the GROUP BY



              Continuing the simplified example, but adding a GROUP BY:



              DECLARE @A table (A integer NULL, B integer NULL);
              DECLARE @B table (A integer NULL, B integer NULL);

              INSERT @A (A, B) VALUES (1, 1);
              INSERT @B (A, B) VALUES (2, 2);

              -- Original
              SELECT * FROM @A AS A
              CROSS APPLY
              (SELECT c = COUNT_BIG(*) FROM @B AS B WHERE B.A = A.A GROUP BY B.A) AS CA;



              The COUNT_BIG is now a vector aggregate, so the correct result for an empty input set is no longer zero, it is no row at all. In other words, running the statements above produces no output.



              These semantics are much easier to honour when translating from apply to join. Since CROSS APPLY rejects any outer row that generates no inner side rows, we can use an inner join with no extra expression projection (and SQL Server does the same):



              -- Rewrite
              SELECT A.*, J1.c
              FROM @A AS A
              JOIN
              (
              SELECT B.A, c = COUNT_BIG(*)
              FROM @B AS B
              GROUP BY B.A
              ) AS J1
              ON J1.A = A.A;


              db<>fiddle demo



              Final note



              The simplified examples use different tables with different contents to show the semantic differences more clearly. One could argue that the optimizer ought to be able to reason about a self-join in particular not generating any mismatched rows, but it does not contain that logic today. Accessing the same table multiple times in a query is not guaranteed to produce the same results in general anyway, depending on isolation level and concurrent activity.



              The optimizer worries about these edge cases so you don't have to.






              share|improve this answer





























                5














                Summary



                SQL Server uses the correct join (inner or outer) and adds projections where necessary to honour all the semantics of the original query when performing internal translations between apply and join. The differences in the plans can all be explained by the different semantics of aggregates with and without a group by clause in SQL Server.




                Details



                SQL Server can produce an inner join plan for the example query, it just chooses not to for cost reasons. The cost of the outer join plan shown in the question is 0.02898 units on my laptop's SQL Server 2017 instance.



                You can force an apply (correlated join) plan using undocumented and unsupported trace flag 9114 just for illustration. In the real world, we would typically have an index to support a seek on the inner side of the apply to encourage SQL Server to choose this option:



                SELECT *
                FROM #MyTable AS mt
                CROSS APPLY
                (
                SELECT COUNT_BIG(DISTINCT mt2.Col_B) AS dc
                FROM #MyTable AS mt2
                WHERE mt2.Col_A = mt.Col_A
                --GROUP BY mt2.Col_A
                ) AS ca
                OPTION (QUERYTRACEON 9114);


                This produces an apply nested loops plan with a lazy index spool. The total estimated cost is 0.0463983, which is higher than the selected plan:



                Index Spool apply plan



                The performance spool caches inner side results for repeated outer values. It can also be removed with a hint:



                SELECT *
                FROM #MyTable AS mt
                CROSS APPLY
                (
                SELECT COUNT_BIG(DISTINCT mt2.Col_B) AS dc
                FROM #MyTable AS mt2
                WHERE mt2.Col_A = mt.Col_A
                --GROUP BY mt2.Col_A
                ) AS ca
                OPTION (QUERYTRACEON 9114, NO_PERFORMANCE_SPOOL);


                The resulting plan fully scans the inner side table for each outer row, costing 0.109779 units:



                Apply plan without spool



                Note the join predicate is not evaluated at the join for an apply plan. This is the essential difference between an apply (correlated join parameter(s) evaluated on the inner branch) and a regular join (predicate(s) evaluated at the join operator).



                Outer Join



                The outer join in the question arises from an optimizer transformation (ApplyHandler) from apply to join. SQL Server tries to rewrite applies as joins up front because it knows more plan space exploration tricks with joins than apply. Once transformed (if possible) the optimizer may consider rewriting it back to an apply later on during cost-based optimization.



                Scalar and Vector Aggregates



                Without a GROUP BY clause the COUNT is a scalar aggregate. In SQL Server, this means the aggregate will always produce a row, even if it is given no rows to aggregate. The SQL Server scalar COUNT aggregate of no rows is zero.



                When a GROUP BY clause is present, the COUNT aggregate is a vector aggregate. This produces no rows at all (not zero) when presented with an empty set.



                The following toy queries show the difference. You can also read more about scalar and vector aggregates in my article Fun with Scalar and Vector Aggregates.



                -- Produces zero
                SELECT COUNT_BIG(*) FROM #MyTable AS MT WHERE 0 = 1;

                -- Produces no rows
                SELECT COUNT_BIG(*) FROM #MyTable AS MT WHERE 0 = 1 GROUP BY ();


                db<>fiddle demo



                Transforming apply to join



                SQL Server is very careful to preserve all the semantics when translating from apply to join. To simplify, consider the following APPLY:



                DECLARE @A table (A integer NULL, B integer NULL);
                DECLARE @B table (A integer NULL, B integer NULL);

                INSERT @A (A, B) VALUES (1, 1);
                INSERT @B (A, B) VALUES (2, 2);

                SELECT * FROM @A AS A
                CROSS APPLY (SELECT c = COUNT_BIG(*) FROM @B AS B WHERE B.A = A.A) AS CA;


                The correct result for column c there is zero, because the COUNT_BIG is a scalar aggregate. When translating this apply query to join form, SQL Server generates an alternative that would look similar to the following if expressed in T-SQL:



                SELECT A.*, c = COALESCE(J1.c, 0)
                FROM @A AS A
                LEFT JOIN
                (
                    SELECT B.A, c = COUNT_BIG(*)
                    FROM @B AS B
                    GROUP BY B.A
                ) AS J1
                    ON J1.A = A.A;


                db<>fiddle demo



                To make this an uncorrelated join, SQL Server has to introduce a GROUP BY in the derived table (otherwise there could be no A column to join on). To preserve the semantics of the original, the join has to be an outer join so each row from table @A continues to produce a row in the output. The left join will produce a NULL for column c that needs to be further translated to zero by COALESCE.
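To see why the COALESCE projection is needed, the sketch below repeats the setup from the example above. It then puts the raw left join value next to the compensated one. Only c_fixed matches the zero produced by the original apply. The column names c_raw and c_fixed are illustrative only:

```sql
DECLARE @A table (A integer NULL, B integer NULL);
DECLARE @B table (A integer NULL, B integer NULL);

INSERT @A (A, B) VALUES (1, 1);
INSERT @B (A, B) VALUES (2, 2);

-- Original apply form: scalar aggregate, so c is 0 for A = 1
SELECT A.*, CA.c
FROM @A AS A
CROSS APPLY (SELECT c = COUNT_BIG(*) FROM @B AS B WHERE B.A = A.A) AS CA;

-- Join form: the unmatched outer row gets NULL from the left join,
-- which COALESCE turns back into the scalar aggregate's 0
SELECT A.*, c_raw = J1.c, c_fixed = COALESCE(J1.c, 0)
FROM @A AS A
LEFT JOIN
(
    SELECT B.A, c = COUNT_BIG(*)
    FROM @B AS B
    GROUP BY B.A
) AS J1
    ON J1.A = A.A;
```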



                With the GROUP BY



                Continuing the simplified example, but adding a GROUP BY:



                DECLARE @A table (A integer NULL, B integer NULL);
                DECLARE @B table (A integer NULL, B integer NULL);

                INSERT @A (A, B) VALUES (1, 1);
                INSERT @B (A, B) VALUES (2, 2);

                -- Original
                SELECT * FROM @A AS A
                CROSS APPLY
                (SELECT c = COUNT_BIG(*) FROM @B AS B WHERE B.A = A.A GROUP BY B.A) AS CA;



                The COUNT_BIG is now a vector aggregate, so the correct result for an empty input set is no longer zero; it is no row at all. In other words, running the statements above produces no output.



                These semantics are much easier to honour when translating from apply to join. Since CROSS APPLY rejects any outer row that generates no inner-side rows, we can use an inner join with no extra expression projection (and SQL Server does the same):



                -- Rewrite
                SELECT A.*, J1.c
                FROM @A AS A
                JOIN
                (
                    SELECT B.A, c = COUNT_BIG(*)
                    FROM @B AS B
                    GROUP BY B.A
                ) AS J1
                    ON J1.A = A.A;


                db<>fiddle demo
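A quick way to check that the inner join rewrite matches the original is to compare the two forms with EXCEPT. The sketch below adds hypothetical extra rows so that one outer row has no matches (A = 1) and one has two (A = 2). Both forms keep only the A = 2 row, so the EXCEPT should return nothing. Swapping the two operands checks the other direction:

```sql
DECLARE @A table (A integer NULL, B integer NULL);
DECLARE @B table (A integer NULL, B integer NULL);

-- Hypothetical data: A = 1 has no matches in @B, A = 2 has two
INSERT @A (A, B) VALUES (1, 1), (2, 2);
INSERT @B (A, B) VALUES (2, 2), (2, 3);

-- Original: vector aggregate inside CROSS APPLY
SELECT A.A, A.B, CA.c
FROM @A AS A
CROSS APPLY
(SELECT c = COUNT_BIG(*) FROM @B AS B WHERE B.A = A.A GROUP BY B.A) AS CA

EXCEPT

-- Rewrite: inner join to the grouped derived table
SELECT A.A, A.B, J1.c
FROM @A AS A
JOIN
(
    SELECT B.A, c = COUNT_BIG(*)
    FROM @B AS B
    GROUP BY B.A
) AS J1
    ON J1.A = A.A;
```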



                Final note



                The simplified examples use different tables with different contents to show the semantic differences more clearly. One could argue that the optimizer ought to be able to reason that a self-join in particular cannot generate mismatched rows, but it does not contain that logic today. In any case, accessing the same table multiple times in a query is not guaranteed to produce the same results in general, depending on isolation level and concurrent activity.
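Even with concurrency set aside, a self-join can leave outer rows unmatched when the join column allows NULLs. In the sketch below, @T is a hypothetical stand-in for #MyTable, with nullable columns assumed, and the apply mirrors the question's COUNT_BIG(DISTINCT ...). The row whose Col_A is NULL finds no partner, not even itself, because NULL = NULL is not true:

```sql
-- @T is a hypothetical stand-in for #MyTable; both columns are nullable
DECLARE @T table (Col_A integer NULL, Col_B integer NULL);

INSERT @T (Col_A, Col_B) VALUES (1, 10), (1, 20), (NULL, 30);

-- Self-referencing apply with a scalar aggregate (no GROUP BY).
-- For the NULL row, mt2.Col_A = mt.Col_A is never true, so the inner
-- side is empty and the scalar COUNT_BIG returns 0 for that row.
SELECT mt.Col_A, mt.Col_B, ca.dc
FROM @T AS mt
CROSS APPLY
(
    SELECT dc = COUNT_BIG(DISTINCT mt2.Col_B)
    FROM @T AS mt2
    WHERE mt2.Col_A = mt.Col_A
) AS ca;
```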



                The optimizer worries about these edge cases so you don't have to.






                  edited 2 hours ago

























                  answered 2 hours ago









                  Paul White

                      CROSS APPLY is a logical operation on the data. When deciding how to retrieve that data, SQL Server chooses the appropriate physical operators to produce the result you want.

                      There is no physical apply operator, and SQL Server translates it into an appropriate (and hopefully efficient) join operator.

                      You can find a list of the physical operators at the link below.



                      https://docs.microsoft.com/en-us/sql/relational-databases/showplan-logical-and-physical-operators-reference?view=sql-server-2017




                      The query optimizer creates a query plan as a tree consisting of logical operators. After the query optimizer creates the plan, the query optimizer chooses the most efficient physical operator for each logical operator. The query optimizer uses a cost-based approach to determine which physical operator will implement a logical operator.



                      Usually, a logical operation can be implemented by multiple physical operators. However, in rare cases, a physical operator can implement multiple logical operations as well.




                      Edit: It seems I misunderstood your question. SQL Server will normally choose the most appropriate operator. A cross join would be used when the query has to return values for all combinations of rows from both tables, which your query does not need. Calculating the value you want for each row suffices, which is what is done here.
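One way to see which logical and physical operators SQL Server chose for an apply is SET STATISTICS PROFILE ON. After the query's own results, it returns one row per plan operator, including PhysicalOp and LogicalOp columns. The sketch below uses a hypothetical temporary table #Demo shaped like the question's #MyTable:

```sql
-- #Demo is a hypothetical stand-in for #MyTable from the question
CREATE TABLE #Demo (Col_A integer NULL, Col_B integer NULL);

INSERT #Demo (Col_A, Col_B) VALUES (1, 10), (1, 20), (2, 30);

-- Returns the query results, then a result set describing each plan operator;
-- compare the PhysicalOp and LogicalOp columns in that second result set
SET STATISTICS PROFILE ON;

SELECT mt.Col_A, mt.Col_B, ca.dc
FROM #Demo AS mt
CROSS APPLY
(
    SELECT dc = COUNT_BIG(DISTINCT mt2.Col_B)
    FROM #Demo AS mt2
    WHERE mt2.Col_A = mt.Col_A
) AS ca;

SET STATISTICS PROFILE OFF;
```

The temporary table disappears when the session ends. Run DROP TABLE #Demo; before rerunning the script in the same session, or the CREATE TABLE will fail.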






                          answered 8 hours ago









                          J. Maes
