


Replace long GROUP BY list with a subquery


Tagged: sql-server, join, subquery

This is a repost of my question on Stack Overflow, where it was suggested that I ask it here instead.

I found an online article from 2005 in which the author claims that many devs use GROUP BY incorrectly and that you should replace it with a subquery instead.

I've tested this on one of my queries, where I need to sort the results of a search by the number of joined entries from another table (more common entries should appear first). My original, classic approach was to join both tables on a common ID, group by every field in the select list, and order the result by the count from the joined table.

Now, Jeff Smith from the linked blog claims that it is better to use a subselect that does all the grouping, and then join to that subselect. Comparing the execution plans of both approaches, SSMS attributes 52% of the estimated cost to the large GROUP BY version and 48% to the subselect version, so from a technical standpoint the subselect approach seems marginally faster. However, the "improved" SQL command appears to generate a more complicated execution plan (in terms of nodes).

What do you think? Can you give me some details on how to interpret the execution plans in this specific case, and which option is generally preferable?



SELECT
    a.ID,
    a.ID_AddressType,
    a.Name1,
    a.Name2,
    a.Street,
    a.Number,
    a.ZipCode,
    a.City,
    a.Country
FROM dbo.[Address] a
INNER JOIN CONTAINSTABLE(
    dbo.[Address],
    FullAddress,
    '"ZIE*"',
    5
) s ON a.ID = s.[KEY]
LEFT JOIN dbo.Haul h ON h.ID_DestinationAddress = a.ID
GROUP BY
    a.ID,
    a.ID_AddressType,
    a.Name1,
    a.Name2,
    a.Street,
    a.Number,
    a.ZipCode,
    a.City,
    a.Country,
    s.RANK
ORDER BY s.RANK DESC, COUNT(*) DESC;

SELECT
    a.ID,
    a.ID_AddressType,
    a.Name1,
    a.Name2,
    a.Street,
    a.Number,
    a.ZipCode,
    a.City,
    a.Country
FROM dbo.[Address] a
INNER JOIN CONTAINSTABLE(
    dbo.[Address],
    FullAddress,
    '"ZIE*"',
    5
) s ON a.ID = s.[KEY]
LEFT JOIN (
    SELECT ID_DestinationAddress, COUNT(*) AS Cnt
    FROM dbo.Haul
    GROUP BY ID_DestinationAddress
) h ON h.ID_DestinationAddress = a.ID
ORDER BY s.RANK DESC, h.Cnt DESC;


[screenshot: estimated execution plans for both queries]










asked Oct 14 at 6:43 by André Reichelt




1 Answer

answered Oct 14 at 7:28 by Randi Vertongen (edited Oct 14 at 14:17), score 7


















If you change the left join against dbo.Haul to a subquery, the distinct values of ID_DestinationAddress are calculated (Stream Aggregate) and their counts computed (Compute Scalar) directly after the data is read from the scan.



          This is what you are seeing in the execution plan:



[plan screenshot: Stream Aggregate and Compute Scalar directly after the scan of dbo.Haul]



With the GROUP BY method, on the other hand, the grouping only happens after the data has passed through the left join between dbo.Haul and dbo.[Address].



[plan screenshot: grouping placed after the left join]



How much better this is will depend on the ratio of unique values in dbo.Haul. Fewer unique values mean a better outcome for the second execution plan, since the left join then has to process fewer rows.
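
To get a rough feel for that ratio on your own data, a quick check along these lines can help (a minimal sketch reusing the table and column names from the question):

SELECT
    COUNT(*)                               AS TotalRows,
    COUNT(DISTINCT ID_DestinationAddress)  AS DistinctDestinations
FROM dbo.Haul;
-- The lower DistinctDestinations is relative to TotalRows, the more the
-- pre-aggregated subquery shrinks the input to the left join.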



The other benefit of the second query is that only the uniqueness of ID_DestinationAddress has to be calculated, not the uniqueness of all the select-list columns combined, as in the GROUP BY version.
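
A third way to express the same "aggregate only on the join key" idea is OUTER APPLY. This is only a sketch assuming the schema from the question, and the optimizer may choose to count per row instead of pre-aggregating, so the plan is worth checking here as well:

SELECT
    a.ID,
    a.ID_AddressType,
    a.Name1,
    a.Name2,
    a.Street,
    a.Number,
    a.ZipCode,
    a.City,
    a.Country
FROM dbo.[Address] a
INNER JOIN CONTAINSTABLE(
    dbo.[Address],
    FullAddress,
    '"ZIE*"',
    5
) s ON a.ID = s.[KEY]
OUTER APPLY (
    -- correlated count per address instead of a join to a grouped set
    SELECT COUNT(*) AS Cnt
    FROM dbo.Haul h
    WHERE h.ID_DestinationAddress = a.ID
) c
ORDER BY s.RANK DESC, c.Cnt DESC;

One behavioral difference: addresses without hauls get Cnt = 0 here rather than the NULL produced by the LEFT JOIN variants; both sort last under ORDER BY ... DESC.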



Again, you should test and validate the results for your query, dataset, and indexes. If you are not familiar with execution plans, one way to test is to run SET STATISTICS IO, TIME ON; before executing the queries, and then make the runtime stats more readable by pasting them into a tool such as statisticsparser.
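
For instance, a minimal test run could look like this (the inner query is just a stand-in; substitute either of the two full queries from the question):

SET STATISTICS IO, TIME ON;

-- Run the GROUP BY version and the subquery version back to back
-- against the same data.
SELECT ID_DestinationAddress, COUNT(*) AS Cnt
FROM dbo.Haul
GROUP BY ID_DestinationAddress;

SET STATISTICS IO, TIME OFF;
-- Paste the Messages tab output into statisticsparser to compare
-- logical reads and CPU/elapsed times side by side.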




          Testing



A small test to show what differences in the data can do for these queries.



          If the dbo.Haul table does not have many matches with the 5 records returned by the FULLTEXT index filtering, the difference is not so big:



[plan screenshot: GROUP BY query]



[plan screenshot: subquery query]



1,000 rows could get filtered earlier, but either way the execution time is around 15 ms for both queries on my machine.



          Now, if I change my data so these 5 records have many more matches with dbo.Haul on the left join:



The difference between the GROUP BY query and the subquery version now becomes much clearer. The GROUP BY query:

[plan screenshot: GROUP BY query on the larger dataset]

<QueryTimeStats CpuTime="1564" ElapsedTime="1566" />

And the subquery:



[plan screenshot: subquery on the larger dataset]



and the stats (taken from the actual execution plan XML):



          <QueryTimeStats CpuTime="680" ElapsedTime="690"/>

































• Thank you very much for all your valuable ideas and tips. The tool and the SET STATISTICS command are very helpful. I guess I have to wait a while until there is a significant amount of data in my database. At the moment there are so few entries that the execution time is below 1 ms. – André Reichelt, Oct 14 at 9:41

• @AndréReichelt No problem, glad it helped :). I'll try to replicate your query plan when I have the time. – Randi Vertongen, Oct 14 at 9:46

• @AndréReichelt Added a small extra example. – Randi Vertongen, Oct 14 at 10:05

• @AndréReichelt Nicely spotted, it got cut out for some reason; I re-added it, thanks! – Randi Vertongen, Oct 14 at 14:18

• @AndréReichelt That is one of the factors to take into account; how much aggregation you have on these IDs will also matter. It might turn out not to matter for your dataset, but I would prefer the second one nonetheless, just to get that stream aggregate earlier in the execution plan. Happy to help! – Randi Vertongen, Oct 14 at 16:02











