Index Uniqueness OverheadWhy is Clustered Index on Primary Key compulsory?Why index REBUILD does not reduce index fragmentatation with a page count > 1000?Unable to drop non-PK index because it is referenced in a foreign key constraintAdding index to large mysql tablesShould the index on an identity column be nonclustered?SQL Server equivalent of Oracle USING INDEX clauseShould I remove this clustered index?Ordering of table is based on Clustered index or non clustered primary key?Nonclustered index storage on clustered columnstoreClustered Index Maintenance vs. Nonclustered Index Maintenance
Can you perfectly wrap a cube with this blocky shape?
What are some symbols representing peasants/oppressed persons fighting back?
Draw a line nicely around notes
Is there any way for an Adventurers League, 5th level Wizard, to gain heavy armor proficiency?
What are "full piece" and "half piece" in chess?
How to change checkbox react correctly?
Doing research in academia and not liking competition
Video editor for YouTube
A scene of Jimmy diversity
What are the arguments for California’s nonpartisan blanket primaries?
Accidentally deleted python and yum is not working in centos7
Are the errors in this formulation of the simple linear regression model random variables?
Why do legislative committees exist?
Is there an English equivalent for "Les carottes sont cuites", while keeping the vegetable reference?
Can a polymorphed creature understand languages spoken under the effect of Tongues?
What do mathematicians mean when they say some conjecture can’t be proven using the current technology?
Why do candidates not quit if they no longer have a realistic chance to win in the 2020 US presidents election
Why did Spider-Man take a detour to Dorset?
What is this old "lemon-squeezer" shaped pan
Cisco 3750G - SSH Diffie Hellman Group 1 SHA1 Error
Do First Order blasters maintain a record of when they were fired?
What alternatives exist to at-will employment?
line break after the word "proof" in proof environment
If I stood next to a piece of metal heated to a million degrees, but in a perfect vacuum, would I feel hot?
Index Uniqueness Overhead
Why is Clustered Index on Primary Key compulsory?Why index REBUILD does not reduce index fragmentatation with a page count > 1000?Unable to drop non-PK index because it is referenced in a foreign key constraintAdding index to large mysql tablesShould the index on an identity column be nonclustered?SQL Server equivalent of Oracle USING INDEX clauseShould I remove this clustered index?Ordering of table is based on Clustered index or non clustered primary key?Nonclustered index storage on clustered columnstoreClustered Index Maintenance vs. Nonclustered Index Maintenance
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty margin-bottom:0;
I've been having an ongoing debate with various developers in my office on the cost of an index, and whether or not uniqueness is beneficial or costly (probably both). The crux of the issue is our competing resources.
Background
I have previously read a discussion that stated a Unique
index is no additional cost to maintain, since an Insert
operation implicitly checks for where it fits into the B-tree, and, if a duplicate is found in a non-unique index, appends a uniquifier to the end of the key, but otherwise inserts directly. In this sequence of events, a Unique
index has no additional cost.
My coworker combats this statement by saying that Unique
is enforced as a second operation after the seek to the new position in the B-tree, and thus is more costly to maintain than a non-unique index.
At worst, I have seen tables with an identity column (inherently unique) that is the clustering key of the table, but explicitly stated as non-unique. On the other side of worst is my obsession with uniqueness, and all indexes are created as unique, and when not possible to define an explicitly unique relation to an index, I append the PK of the table to the end of the index to ensure the uniqueness is guaranteed.
I'm frequently involved in code reviews for the dev team, and I need to be able to give general guidelines for them to follow. Yes, every index should be evaluated, but when you have five servers with thousands of tables each and as many as twenty indexes on a table, you need to be able to apply some simple rules to ensure a certain level of quality.
Question
Does uniqueness have an additional cost on the back-end of an Insert
compared to the cost of maintaining a non-unique index? Secondly, what is wrong with appending the Primary Key of a table to the end of an index to ensure uniqueness?
Example Table Definition
create table #test_index
(
id int not null identity(1, 1),
dt datetime not null default(current_timestamp),
val varchar(100) not null,
is_deleted bit not null default(0),
primary key nonclustered(id desc),
unique clustered(dt desc, id desc)
);
create index
[nonunique_nonclustered_example]
on #test_index
(is_deleted)
include
(val);
create unique index
[unique_nonclustered_example]
on #test_index
(is_deleted, dt desc, id desc)
include
(val);
Example
An example of why I would add the Unique
key to the end of an index is in one of our fact tables. There is a Primary Key
that is an Identity
column. However, the Clustered Index
is instead the partitioning scheme column, followed by three foreign key dimensions with no uniqueness. Select performance on this table is abysmal, and I frequently get better seek times using the Primary Key
with a key lookup rather than leveraging the Clustered Index
. Other tables that follow a similar design, but have the Primary Key
appended to the end have considerably better performance.
-- date_int is equivalent to convert(int, convert(varchar, current_timestamp, 112))
if not exists(select * from sys.partition_functions where [name] = N'pf_date_int')
create partition function
pf_date_int (int)
as range right for values
(19000101, 20180101, 20180401, 20180701, 20181001, 20190101, 20190401, 20190701);
go
if not exists(select * from sys.partition_schemes where [name] = N'ps_date_int')
create partition scheme
ps_date_int
as partition
pf_date_int all
to
([PRIMARY]);
go
if not exists(select * from sys.objects where [object_id] = OBJECT_ID(N'dbo.bad_fact_table'))
create table dbo.bad_fact_table
(
id int not null, -- Identity implemented elsewhere, and CDC populates
date_int int not null,
dt date not null,
group_id int not null,
group_entity_id int not null, -- member of group
fk_id int not null,
-- tons of other columns
primary key nonclustered(id, date_int),
index [ci_bad_fact_table] clustered (date_int, group_id, group_entity_id, fk_id)
)
on ps_date_int(date_int);
go
if not exists(select * from sys.objects where [object_id] = OBJECT_ID(N'dbo.better_fact_table'))
create table dbo.better_fact_table
(
id int not null, -- Identity implemented elsewhere, and CDC populates
date_int int not null,
dt date not null,
group_id int not null,
group_entity_id int not null, -- member of group
-- tons of other columns
primary key nonclustered(id, date_int),
index [ci_better_fact_table] clustered(date_int, group_id, group_entity_id, id)
)
on ps_date_int(date_int);
go
sql-server index-tuning database-internals unique-constraint
New contributor
add a comment |
I've been having an ongoing debate with various developers in my office on the cost of an index, and whether or not uniqueness is beneficial or costly (probably both). The crux of the issue is our competing resources.
Background
I have previously read a discussion that stated a Unique
index is no additional cost to maintain, since an Insert
operation implicitly checks for where it fits into the B-tree, and, if a duplicate is found in a non-unique index, appends a uniquifier to the end of the key, but otherwise inserts directly. In this sequence of events, a Unique
index has no additional cost.
My coworker combats this statement by saying that Unique
is enforced as a second operation after the seek to the new position in the B-tree, and thus is more costly to maintain than a non-unique index.
At worst, I have seen tables with an identity column (inherently unique) that is the clustering key of the table, but explicitly stated as non-unique. On the other side of worst is my obsession with uniqueness, and all indexes are created as unique, and when not possible to define an explicitly unique relation to an index, I append the PK of the table to the end of the index to ensure the uniqueness is guaranteed.
I'm frequently involved in code reviews for the dev team, and I need to be able to give general guidelines for them to follow. Yes, every index should be evaluated, but when you have five servers with thousands of tables each and as many as twenty indexes on a table, you need to be able to apply some simple rules to ensure a certain level of quality.
Question
Does uniqueness have an additional cost on the back-end of an Insert
compared to the cost of maintaining a non-unique index? Secondly, what is wrong with appending the Primary Key of a table to the end of an index to ensure uniqueness?
Example Table Definition
create table #test_index
(
id int not null identity(1, 1),
dt datetime not null default(current_timestamp),
val varchar(100) not null,
is_deleted bit not null default(0),
primary key nonclustered(id desc),
unique clustered(dt desc, id desc)
);
create index
[nonunique_nonclustered_example]
on #test_index
(is_deleted)
include
(val);
create unique index
[unique_nonclustered_example]
on #test_index
(is_deleted, dt desc, id desc)
include
(val);
Example
An example of why I would add the Unique
key to the end of an index is in one of our fact tables. There is a Primary Key
that is an Identity
column. However, the Clustered Index
is instead the partitioning scheme column, followed by three foreign key dimensions with no uniqueness. Select performance on this table is abysmal, and I frequently get better seek times using the Primary Key
with a key lookup rather than leveraging the Clustered Index
. Other tables that follow a similar design, but have the Primary Key
appended to the end have considerably better performance.
-- date_int is equivalent to convert(int, convert(varchar, current_timestamp, 112))
if not exists(select * from sys.partition_functions where [name] = N'pf_date_int')
create partition function
pf_date_int (int)
as range right for values
(19000101, 20180101, 20180401, 20180701, 20181001, 20190101, 20190401, 20190701);
go
if not exists(select * from sys.partition_schemes where [name] = N'ps_date_int')
create partition scheme
ps_date_int
as partition
pf_date_int all
to
([PRIMARY]);
go
if not exists(select * from sys.objects where [object_id] = OBJECT_ID(N'dbo.bad_fact_table'))
create table dbo.bad_fact_table
(
id int not null, -- Identity implemented elsewhere, and CDC populates
date_int int not null,
dt date not null,
group_id int not null,
group_entity_id int not null, -- member of group
fk_id int not null,
-- tons of other columns
primary key nonclustered(id, date_int),
index [ci_bad_fact_table] clustered (date_int, group_id, group_entity_id, fk_id)
)
on ps_date_int(date_int);
go
if not exists(select * from sys.objects where [object_id] = OBJECT_ID(N'dbo.better_fact_table'))
create table dbo.better_fact_table
(
id int not null, -- Identity implemented elsewhere, and CDC populates
date_int int not null,
dt date not null,
group_id int not null,
group_entity_id int not null, -- member of group
-- tons of other columns
primary key nonclustered(id, date_int),
index [ci_better_fact_table] clustered(date_int, group_id, group_entity_id, id)
)
on ps_date_int(date_int);
go
sql-server index-tuning database-internals unique-constraint
New contributor
add a comment |
I've been having an ongoing debate with various developers in my office on the cost of an index, and whether or not uniqueness is beneficial or costly (probably both). The crux of the issue is our competing resources.
Background
I have previously read a discussion that stated a Unique
index is no additional cost to maintain, since an Insert
operation implicitly checks for where it fits into the B-tree, and, if a duplicate is found in a non-unique index, appends a uniquifier to the end of the key, but otherwise inserts directly. In this sequence of events, a Unique
index has no additional cost.
My coworker combats this statement by saying that Unique
is enforced as a second operation after the seek to the new position in the B-tree, and thus is more costly to maintain than a non-unique index.
At worst, I have seen tables with an identity column (inherently unique) that is the clustering key of the table, but explicitly stated as non-unique. On the other side of worst is my obsession with uniqueness, and all indexes are created as unique, and when not possible to define an explicitly unique relation to an index, I append the PK of the table to the end of the index to ensure the uniqueness is guaranteed.
I'm frequently involved in code reviews for the dev team, and I need to be able to give general guidelines for them to follow. Yes, every index should be evaluated, but when you have five servers with thousands of tables each and as many as twenty indexes on a table, you need to be able to apply some simple rules to ensure a certain level of quality.
Question
Does uniqueness have an additional cost on the back-end of an Insert
compared to the cost of maintaining a non-unique index? Secondly, what is wrong with appending the Primary Key of a table to the end of an index to ensure uniqueness?
Example Table Definition
create table #test_index
(
id int not null identity(1, 1),
dt datetime not null default(current_timestamp),
val varchar(100) not null,
is_deleted bit not null default(0),
primary key nonclustered(id desc),
unique clustered(dt desc, id desc)
);
create index
[nonunique_nonclustered_example]
on #test_index
(is_deleted)
include
(val);
create unique index
[unique_nonclustered_example]
on #test_index
(is_deleted, dt desc, id desc)
include
(val);
Example
An example of why I would add the Unique
key to the end of an index is in one of our fact tables. There is a Primary Key
that is an Identity
column. However, the Clustered Index
is instead the partitioning scheme column, followed by three foreign key dimensions with no uniqueness. Select performance on this table is abysmal, and I frequently get better seek times using the Primary Key
with a key lookup rather than leveraging the Clustered Index
. Other tables that follow a similar design, but have the Primary Key
appended to the end have considerably better performance.
-- date_int is equivalent to convert(int, convert(varchar, current_timestamp, 112))
if not exists(select * from sys.partition_functions where [name] = N'pf_date_int')
create partition function
pf_date_int (int)
as range right for values
(19000101, 20180101, 20180401, 20180701, 20181001, 20190101, 20190401, 20190701);
go
if not exists(select * from sys.partition_schemes where [name] = N'ps_date_int')
create partition scheme
ps_date_int
as partition
pf_date_int all
to
([PRIMARY]);
go
if not exists(select * from sys.objects where [object_id] = OBJECT_ID(N'dbo.bad_fact_table'))
create table dbo.bad_fact_table
(
id int not null, -- Identity implemented elsewhere, and CDC populates
date_int int not null,
dt date not null,
group_id int not null,
group_entity_id int not null, -- member of group
fk_id int not null,
-- tons of other columns
primary key nonclustered(id, date_int),
index [ci_bad_fact_table] clustered (date_int, group_id, group_entity_id, fk_id)
)
on ps_date_int(date_int);
go
if not exists(select * from sys.objects where [object_id] = OBJECT_ID(N'dbo.better_fact_table'))
create table dbo.better_fact_table
(
id int not null, -- Identity implemented elsewhere, and CDC populates
date_int int not null,
dt date not null,
group_id int not null,
group_entity_id int not null, -- member of group
-- tons of other columns
primary key nonclustered(id, date_int),
index [ci_better_fact_table] clustered(date_int, group_id, group_entity_id, id)
)
on ps_date_int(date_int);
go
sql-server index-tuning database-internals unique-constraint
New contributor
I've been having an ongoing debate with various developers in my office on the cost of an index, and whether or not uniqueness is beneficial or costly (probably both). The crux of the issue is our competing resources.
Background
I have previously read a discussion that stated a Unique
index is no additional cost to maintain, since an Insert
operation implicitly checks for where it fits into the B-tree, and, if a duplicate is found in a non-unique index, appends a uniquifier to the end of the key, but otherwise inserts directly. In this sequence of events, a Unique
index has no additional cost.
My coworker combats this statement by saying that Unique
is enforced as a second operation after the seek to the new position in the B-tree, and thus is more costly to maintain than a non-unique index.
At worst, I have seen tables with an identity column (inherently unique) that is the clustering key of the table, but explicitly stated as non-unique. On the other side of worst is my obsession with uniqueness, and all indexes are created as unique, and when not possible to define an explicitly unique relation to an index, I append the PK of the table to the end of the index to ensure the uniqueness is guaranteed.
I'm frequently involved in code reviews for the dev team, and I need to be able to give general guidelines for them to follow. Yes, every index should be evaluated, but when you have five servers with thousands of tables each and as many as twenty indexes on a table, you need to be able to apply some simple rules to ensure a certain level of quality.
Question
Does uniqueness have an additional cost on the back-end of an Insert
compared to the cost of maintaining a non-unique index? Secondly, what is wrong with appending the Primary Key of a table to the end of an index to ensure uniqueness?
Example Table Definition
create table #test_index
(
id int not null identity(1, 1),
dt datetime not null default(current_timestamp),
val varchar(100) not null,
is_deleted bit not null default(0),
primary key nonclustered(id desc),
unique clustered(dt desc, id desc)
);
create index
[nonunique_nonclustered_example]
on #test_index
(is_deleted)
include
(val);
create unique index
[unique_nonclustered_example]
on #test_index
(is_deleted, dt desc, id desc)
include
(val);
Example
An example of why I would add the Unique
key to the end of an index is in one of our fact tables. There is a Primary Key
that is an Identity
column. However, the Clustered Index
is instead the partitioning scheme column, followed by three foreign key dimensions with no uniqueness. Select performance on this table is abysmal, and I frequently get better seek times using the Primary Key
with a key lookup rather than leveraging the Clustered Index
. Other tables that follow a similar design, but have the Primary Key
appended to the end have considerably better performance.
-- date_int is equivalent to convert(int, convert(varchar, current_timestamp, 112))
if not exists(select * from sys.partition_functions where [name] = N'pf_date_int')
create partition function
pf_date_int (int)
as range right for values
(19000101, 20180101, 20180401, 20180701, 20181001, 20190101, 20190401, 20190701);
go
if not exists(select * from sys.partition_schemes where [name] = N'ps_date_int')
create partition scheme
ps_date_int
as partition
pf_date_int all
to
([PRIMARY]);
go
if not exists(select * from sys.objects where [object_id] = OBJECT_ID(N'dbo.bad_fact_table'))
create table dbo.bad_fact_table
(
id int not null, -- Identity implemented elsewhere, and CDC populates
date_int int not null,
dt date not null,
group_id int not null,
group_entity_id int not null, -- member of group
fk_id int not null,
-- tons of other columns
primary key nonclustered(id, date_int),
index [ci_bad_fact_table] clustered (date_int, group_id, group_entity_id, fk_id)
)
on ps_date_int(date_int);
go
if not exists(select * from sys.objects where [object_id] = OBJECT_ID(N'dbo.better_fact_table'))
create table dbo.better_fact_table
(
id int not null, -- Identity implemented elsewhere, and CDC populates
date_int int not null,
dt date not null,
group_id int not null,
group_entity_id int not null, -- member of group
-- tons of other columns
primary key nonclustered(id, date_int),
index [ci_better_fact_table] clustered(date_int, group_id, group_entity_id, id)
)
on ps_date_int(date_int);
go
sql-server index-tuning database-internals unique-constraint
sql-server index-tuning database-internals unique-constraint
New contributor
New contributor
edited 5 hours ago
Paul White♦
57.6k15 gold badges303 silver badges475 bronze badges
57.6k15 gold badges303 silver badges475 bronze badges
New contributor
asked 8 hours ago
SolonotixSolonotix
434 bronze badges
434 bronze badges
New contributor
New contributor
add a comment |
add a comment |
4 Answers
4
active
oldest
votes
I'm frequently involved in code reviews for the dev team, and I need to be able to give general guidelines for them to follow.
The environment I'm currently involved in has 250 servers with 2500 databases. I've worked on systems with 30,000 databases. Guidelines for indexing should revolve around the naming convention, etc, not be "rules" for what columns to include in an index - every individual index should be engineered to be the correct index for that specific business rule or code touching the table.
Does uniqueness have an additional cost on the back-end of an
Insert
compared to the cost of maintaining a non-unique index? Secondly, what is wrong with appending the Primary Key of a table to the end of an index to ensure uniqueness?
Adding the primary key column to the end of a non-unique index to make it unique looks to me to be an anti-pattern. If business rules dictate the data should be unique, then add a unique constraint to the column; which will automatically create a unique index. If you're indexing a column for performance, why would you add a column to the index?
Even if your supposition that enforcing uniqueness doesn't add any extra overhead is correct (which it isn't for certain cases), what are you solving by needlessly complicating the index?
In the specific instance of adding the primary key to the end of your index key so that you can make the index definition include the UNIQUE
modifier, it actually makes zero difference to the physical index structure on disk. This is due to the nature of the structure of B-tree indexes keys, in that they always need to be unique.
As David Browne mentioned in a comment:
Since every nonclustered index is stored as unique index, there is no extra cost in inserting into a unique index. In fact the only extra cost would in failing to declare a candidate key as a unique index, which would cause the clustered index keys to be appended to the index keys.
Take the following minimally complete and verifiable example:
USE tempdb;
DROP TABLE IF EXISTS dbo.IndexTest;
CREATE TABLE dbo.IndexTest
(
id int NOT NULL
CONSTRAINT IndexTest_pk
PRIMARY KEY
CLUSTERED
IDENTITY(1,1)
, rowDate datetime NOT NULL
);
I'll add two indexes that are identical except for the addition of the primary key at the tail end of the second indexes key definition:
CREATE INDEX IndexTest_rowDate_ix01
ON dbo.IndexTest(rowDate);
CREATE UNIQUE INDEX IndexTest_rowDate_ix02
ON dbo.IndexTest(rowDate, id);
Next, we'll several rows to the table:
INSERT INTO dbo.IndexTest (rowDate)
VALUES (DATEADD(SECOND, 0, GETDATE()))
, (DATEADD(SECOND, 0, GETDATE()))
, (DATEADD(SECOND, 0, GETDATE()))
, (DATEADD(SECOND, 1, GETDATE()))
, (DATEADD(SECOND, 2, GETDATE()));
As you can see above, three rows contain the same value for the rowDate
column, and two rows contain unique values.
Next, we'll look at the physical page structures for each index, using the undocumented DBCC PAGE
command:
DECLARE @dbid int = DB_ID();
DECLARE @fileid int;
DECLARE @pageid int;
DECLARE @indexid int;
SELECT @fileid = ddpa.allocated_page_file_id
, @pageid = ddpa.allocated_page_page_id
FROM sys.indexes i
CROSS APPLY sys.dm_db_database_page_allocations(DB_ID(), i.object_id, i.index_id, NULL, 'LIMITED') ddpa
WHERE i.name = N'IndexTest_rowDate_ix01'
AND ddpa.is_allocated = 1
AND ddpa.is_iam_page = 0;
PRINT N'*************************************** IndexTest_rowDate_ix01 *****************************************';
DBCC TRACEON(3604);
DBCC PAGE (@dbid, @fileid, @pageid, 1);
DBCC TRACEON(3604);
PRINT N'*************************************** IndexTest_rowDate_ix01 *****************************************';
SELECT @fileid = ddpa.allocated_page_file_id
, @pageid = ddpa.allocated_page_page_id
FROM sys.indexes i
CROSS APPLY sys.dm_db_database_page_allocations(DB_ID(), i.object_id, i.index_id, NULL, 'LIMITED') ddpa
WHERE i.name = N'IndexTest_rowDate_ix02'
AND ddpa.is_allocated = 1
AND ddpa.is_iam_page = 0;
PRINT N'*************************************** IndexTest_rowDate_ix02 *****************************************';
DBCC TRACEON(3604);
DBCC PAGE (@dbid, @fileid, @pageid, 1);
DBCC TRACEON(3604);
PRINT N'*************************************** IndexTest_rowDate_ix02 *****************************************';
I've looked at the output using Beyond Compare, and except for obvious differences around the allocation page IDs, etc, the two index structures are identical.
You might take the above to mean that including the primary key in every index, and defining at as unique is A Good Thing™ since that's what happens under-the-covers anyway. I wouldn't make that assumption, and would suggest only defining an index as unique if in fact the natural data in the index is unique already.
There are several excellent resources in the Interwebz about this topic, including:
- Where Clustered Indexes Dare
- SQL Server and Binary Search
- Performance Benefits of Unique Indexes
FYI, the mere presence of an identity
column does not guarantee uniqueness. You need to define the column as a primary key or with a unique constraint to ensure the values stored in that column are in fact unique. The SET IDENTITY_INSERT schema.table ON;
statement will allow you to insert to non-unique values into a column defined as identity
.
1
@MaxVernon thanks for all the details. I'm sure plenty of other people have wondered about this kind of thing, and may not know how to research it, or even how to phrase it properly. The additional information links are also greatly appreciated as well.
– Solonotix
5 hours ago
add a comment |
Just an add-on to Max's excellent answer.
When it comes to creating a non unique clustered index, SQL Server creates something called a Uniquifier in the background anyways.
This Uniquifier could cause potential problems in the future if your platform has a lot of CRUD operations, since this Uniquifier is only 4 bytes big (a basic 32bit integer). So, if your system has a lot of CRUD operations it's possible you will use up all the available unique numbers and all of a sudden you will receive an error and it won't allow you to insert anymore data into your tables (because it will no longer have any unique values to assign to your newly inserted rows).
When this happens, you will receive this error:
The maximum system-generated unique value for a duplicate group was exceeded for index with partition ID (someID).
Dropping and re-creating the index may resolve this; otherwise, use another clustering key.
Error 666 (the above error) occurs when the uniquifier for a single set of non-unique keys consumes more than 2,147,483,647 rows. So, you'll need to have either ~2 billion rows for a single key value, or you'll need to have modified a single key value ~2 billion times to see this error. As such, it's not extremely likely you'll run into this limitation.
I had no idea that the hidden uniquifier could run out off key space, but I guess all things are limited in some case. Much like howCase
andIf
structures are limited to 10 levels, it makes sense that there is also a limit to resolving non-unique entities. By your statement, this sounds like it only applies to cases when the clustering key is non-unique. Is this a problem for aNonclustered Index
or if the clustering key isUnique
then there isn't a problem forNonclustered
indexes?
– Solonotix
5 hours ago
A Unique index is (as far as I know) limited by the size of the column type (so if it's a BIGINT type, you have 8bytes to work with). Also, according to microsoft's official documentation, there is a maximum of 900bytes allowed for a clustered index and 1700bytes for non clustered (since you can have more than one non-clustered index and only 1 clustered index per table).docs.microsoft.com/en-us/sql/sql-server/…
– Chessbrain
5 hours ago
1
@Solonotix - the uniquifier from the clustered index is used in the non-clustered indexes. If you run the code in my example without the primary key (create a clustered index instead), you can see the output is the same for both the non-unique and the unique indexes.
– Max Vernon
4 hours ago
For a non-clustered index on a heap, each row in a unique index must be truly unique.
– Max Vernon
4 hours ago
@DavidBrowne-Microsoft Thank you for the correction!
– Chessbrain
2 hours ago
add a comment |
I'm not going to weigh in on the question of whether an index should be unique or not, and whether there's more overhead in this approach or that. But a couple of things bothered me in your general design
- dt datetime not null default(current_timestamp). Datetime is an older form or this, and you may be able to achieve at least some space savings by using datetime2() and sysdatetime().
- create index [nonunique_nonclustered_example] on #test_index (is_deleted) include (val). This bothers me. Take a look at how the data is to be accessed (I'm betting there's more than
WHERE is_deleted = 0
) and look at using a filtered index. I would even consider using 2 filtered indexes, one forwhere is_deleted = 0
and the other forwhere is_deleted = 1
Fundamentally this looks more like a coding exercise designed to test a hypothesis rather than a real problem / solution, but those two patterns are definitely something I look for in code reviews.
add a comment |
You don't say how you are going to create your unique ID, All too many developers decide to use SELECT NEWID()
then of course because it is the one unquestionably unique field in the row, they use it in a primary key of an index. Sometimes as the single PK in a clustered index on a huge table, with no fill factor space.
GUID is a 16 byte binary SQL Server data type that is globally unique across tables, databases, and servers. The term GUID stands for Globally Unique Identifier and it is used interchangeably with UNIQUEIDENTIFIER. Source
Of course you do index maintenance at least weekly, probably using the solution by Ola Hallengren The GUID is unique but it will not fit nicely in the clustered index, so your index job spends hours sorting rows every week.
As Max says in their answer adding things add overhead it is simply impossible for that not to be case.
You should only add to your indexes and/or database where you have have justifiabel business reason for it. If you absolutely must add a GUID consider NEWSEQUENTIALID()
in many cases this can lower the overhead, but not eliminate it.
The tables I provided in the sample do not utilize theuniqueidentifier
data type, and that's because most of our tables do not use it either, in part because of all the known issues with it. I used it once for a project, and later came to realize why it was a poor choice, and have since moved to timestamps (of typedatetime
) in that particular use case, since the pruning process was using adatetime
field anyway.
– Solonotix
7 hours ago
add a comment |
Your Answer
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "182"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Solonotix is a new contributor. Be nice, and check out our Code of Conduct.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f242908%2findex-uniqueness-overhead%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
4 Answers
4
active
oldest
votes
4 Answers
4
active
oldest
votes
active
oldest
votes
active
oldest
votes
I'm frequently involved in code reviews for the dev team, and I need to be able to give general guidelines for them to follow.
The environment I'm currently involved in has 250 servers with 2500 databases. I've worked on systems with 30,000 databases. Guidelines for indexing should revolve around the naming convention, etc, not be "rules" for what columns to include in an index - every individual index should be engineered to be the correct index for that specific business rule or code touching the table.
Does uniqueness have an additional cost on the back-end of an
Insert
compared to the cost of maintaining a non-unique index? Secondly, what is wrong with appending the Primary Key of a table to the end of an index to ensure uniqueness?
Adding the primary key column to the end of a non-unique index to make it unique looks to me to be an anti-pattern. If business rules dictate the data should be unique, then add a unique constraint to the column; which will automatically create a unique index. If you're indexing a column for performance, why would you add a column to the index?
Even if your supposition that enforcing uniqueness doesn't add any extra overhead is correct (which it isn't for certain cases), what are you solving by needlessly complicating the index?
In the specific instance of adding the primary key to the end of your index key so that you can make the index definition include the UNIQUE
modifier, it actually makes zero difference to the physical index structure on disk. This is due to the nature of the structure of B-tree indexes keys, in that they always need to be unique.
As David Browne mentioned in a comment:
Since every nonclustered index is stored as unique index, there is no extra cost in inserting into a unique index. In fact the only extra cost would in failing to declare a candidate key as a unique index, which would cause the clustered index keys to be appended to the index keys.
Take the following minimally complete and verifiable example:
USE tempdb;
DROP TABLE IF EXISTS dbo.IndexTest;
CREATE TABLE dbo.IndexTest
(
id int NOT NULL
CONSTRAINT IndexTest_pk
PRIMARY KEY
CLUSTERED
IDENTITY(1,1)
, rowDate datetime NOT NULL
);
I'll add two indexes that are identical except for the addition of the primary key at the tail end of the second indexes key definition:
CREATE INDEX IndexTest_rowDate_ix01
ON dbo.IndexTest(rowDate);
CREATE UNIQUE INDEX IndexTest_rowDate_ix02
ON dbo.IndexTest(rowDate, id);
Next, we'll several rows to the table:
INSERT INTO dbo.IndexTest (rowDate)
VALUES (DATEADD(SECOND, 0, GETDATE()))
, (DATEADD(SECOND, 0, GETDATE()))
, (DATEADD(SECOND, 0, GETDATE()))
, (DATEADD(SECOND, 1, GETDATE()))
, (DATEADD(SECOND, 2, GETDATE()));
As you can see above, three rows contain the same value for the rowDate
column, and two rows contain unique values.
Next, we'll look at the physical page structures for each index, using the undocumented DBCC PAGE
command:
DECLARE @dbid int = DB_ID();
DECLARE @fileid int;
DECLARE @pageid int;
DECLARE @indexid int;
SELECT @fileid = ddpa.allocated_page_file_id
, @pageid = ddpa.allocated_page_page_id
FROM sys.indexes i
CROSS APPLY sys.dm_db_database_page_allocations(DB_ID(), i.object_id, i.index_id, NULL, 'LIMITED') ddpa
WHERE i.name = N'IndexTest_rowDate_ix01'
AND ddpa.is_allocated = 1
AND ddpa.is_iam_page = 0;
PRINT N'*************************************** IndexTest_rowDate_ix01 *****************************************';
DBCC TRACEON(3604);
DBCC PAGE (@dbid, @fileid, @pageid, 1);
DBCC TRACEON(3604);
PRINT N'*************************************** IndexTest_rowDate_ix01 *****************************************';
SELECT @fileid = ddpa.allocated_page_file_id
, @pageid = ddpa.allocated_page_page_id
FROM sys.indexes i
CROSS APPLY sys.dm_db_database_page_allocations(DB_ID(), i.object_id, i.index_id, NULL, 'LIMITED') ddpa
WHERE i.name = N'IndexTest_rowDate_ix02'
AND ddpa.is_allocated = 1
AND ddpa.is_iam_page = 0;
PRINT N'*************************************** IndexTest_rowDate_ix02 *****************************************';
DBCC TRACEON(3604);
DBCC PAGE (@dbid, @fileid, @pageid, 1);
DBCC TRACEON(3604);
PRINT N'*************************************** IndexTest_rowDate_ix02 *****************************************';
I've looked at the output using Beyond Compare, and except for obvious differences around the allocation page IDs, etc, the two index structures are identical.
You might take the above to mean that including the primary key in every index, and defining at as unique is A Good Thing™ since that's what happens under-the-covers anyway. I wouldn't make that assumption, and would suggest only defining an index as unique if in fact the natural data in the index is unique already.
There are several excellent resources in the Interwebz about this topic, including:
- Where Clustered Indexes Dare
- SQL Server and Binary Search
- Performance Benefits of Unique Indexes
FYI, the mere presence of an identity
column does not guarantee uniqueness. You need to define the column as a primary key or with a unique constraint to ensure the values stored in that column are in fact unique. The SET IDENTITY_INSERT schema.table ON;
statement will allow you to insert to non-unique values into a column defined as identity
.
1
@MaxVernon thanks for all the details. I'm sure plenty of other people have wondered about this kind of thing, and may not know how to research it, or even how to phrase it properly. The additional information links are also greatly appreciated as well.
– Solonotix
5 hours ago
add a comment |
I'm frequently involved in code reviews for the dev team, and I need to be able to give general guidelines for them to follow.
The environment I'm currently involved in has 250 servers with 2500 databases. I've worked on systems with 30,000 databases. Guidelines for indexing should revolve around the naming convention, etc, not be "rules" for what columns to include in an index - every individual index should be engineered to be the correct index for that specific business rule or code touching the table.
Does uniqueness have an additional cost on the back-end of an
Insert
compared to the cost of maintaining a non-unique index? Secondly, what is wrong with appending the Primary Key of a table to the end of an index to ensure uniqueness?
Adding the primary key column to the end of a non-unique index to make it unique looks to me to be an anti-pattern. If business rules dictate the data should be unique, then add a unique constraint to the column; which will automatically create a unique index. If you're indexing a column for performance, why would you add a column to the index?
Even if your supposition that enforcing uniqueness doesn't add any extra overhead is correct (which it isn't for certain cases), what are you solving by needlessly complicating the index?
In the specific instance of adding the primary key to the end of your index key so that you can make the index definition include the UNIQUE
modifier, it actually makes zero difference to the physical index structure on disk. This is due to the nature of the structure of B-tree indexes keys, in that they always need to be unique.
As David Browne mentioned in a comment:
Since every nonclustered index is stored as unique index, there is no extra cost in inserting into a unique index. In fact the only extra cost would in failing to declare a candidate key as a unique index, which would cause the clustered index keys to be appended to the index keys.
Take the following minimally complete and verifiable example:
USE tempdb;
DROP TABLE IF EXISTS dbo.IndexTest;
CREATE TABLE dbo.IndexTest
(
id int NOT NULL
CONSTRAINT IndexTest_pk
PRIMARY KEY
CLUSTERED
IDENTITY(1,1)
, rowDate datetime NOT NULL
);
I'll add two indexes that are identical except for the addition of the primary key at the tail end of the second indexes key definition:
CREATE INDEX IndexTest_rowDate_ix01
ON dbo.IndexTest(rowDate);
CREATE UNIQUE INDEX IndexTest_rowDate_ix02
ON dbo.IndexTest(rowDate, id);
Next, we'll several rows to the table:
INSERT INTO dbo.IndexTest (rowDate)
VALUES (DATEADD(SECOND, 0, GETDATE()))
, (DATEADD(SECOND, 0, GETDATE()))
, (DATEADD(SECOND, 0, GETDATE()))
, (DATEADD(SECOND, 1, GETDATE()))
, (DATEADD(SECOND, 2, GETDATE()));
As you can see above, three rows contain the same value for the rowDate
column, and two rows contain unique values.
Next, we'll look at the physical page structures for each index, using the undocumented DBCC PAGE
command:
DECLARE @dbid int = DB_ID();
DECLARE @fileid int;
DECLARE @pageid int;
DECLARE @indexid int;
SELECT @fileid = ddpa.allocated_page_file_id
, @pageid = ddpa.allocated_page_page_id
FROM sys.indexes i
CROSS APPLY sys.dm_db_database_page_allocations(DB_ID(), i.object_id, i.index_id, NULL, 'LIMITED') ddpa
WHERE i.name = N'IndexTest_rowDate_ix01'
AND ddpa.is_allocated = 1
AND ddpa.is_iam_page = 0;
PRINT N'*************************************** IndexTest_rowDate_ix01 *****************************************';
DBCC TRACEON(3604);
DBCC PAGE (@dbid, @fileid, @pageid, 1);
DBCC TRACEON(3604);
PRINT N'*************************************** IndexTest_rowDate_ix01 *****************************************';
SELECT @fileid = ddpa.allocated_page_file_id
, @pageid = ddpa.allocated_page_page_id
FROM sys.indexes i
CROSS APPLY sys.dm_db_database_page_allocations(DB_ID(), i.object_id, i.index_id, NULL, 'LIMITED') ddpa
WHERE i.name = N'IndexTest_rowDate_ix02'
AND ddpa.is_allocated = 1
AND ddpa.is_iam_page = 0;
PRINT N'*************************************** IndexTest_rowDate_ix02 *****************************************';
DBCC TRACEON(3604);
DBCC PAGE (@dbid, @fileid, @pageid, 1);
DBCC TRACEON(3604);
PRINT N'*************************************** IndexTest_rowDate_ix02 *****************************************';
I've looked at the output using Beyond Compare, and except for obvious differences around the allocation page IDs, etc, the two index structures are identical.
You might take the above to mean that including the primary key in every index, and defining at as unique is A Good Thing™ since that's what happens under-the-covers anyway. I wouldn't make that assumption, and would suggest only defining an index as unique if in fact the natural data in the index is unique already.
There are several excellent resources in the Interwebz about this topic, including:
- Where Clustered Indexes Dare
- SQL Server and Binary Search
- Performance Benefits of Unique Indexes
FYI, the mere presence of an identity
column does not guarantee uniqueness. You need to define the column as a primary key or with a unique constraint to ensure the values stored in that column are in fact unique. The SET IDENTITY_INSERT schema.table ON;
statement will allow you to insert to non-unique values into a column defined as identity
.
1
@MaxVernon thanks for all the details. I'm sure plenty of other people have wondered about this kind of thing, and may not know how to research it, or even how to phrase it properly. The additional information links are also greatly appreciated as well.
– Solonotix
5 hours ago
add a comment |
I'm frequently involved in code reviews for the dev team, and I need to be able to give general guidelines for them to follow.
The environment I'm currently involved in has 250 servers with 2500 databases. I've worked on systems with 30,000 databases. Guidelines for indexing should revolve around the naming convention, etc, not be "rules" for what columns to include in an index - every individual index should be engineered to be the correct index for that specific business rule or code touching the table.
Does uniqueness have an additional cost on the back-end of an
Insert
compared to the cost of maintaining a non-unique index? Secondly, what is wrong with appending the Primary Key of a table to the end of an index to ensure uniqueness?
Adding the primary key column to the end of a non-unique index to make it unique looks to me to be an anti-pattern. If business rules dictate the data should be unique, then add a unique constraint to the column; which will automatically create a unique index. If you're indexing a column for performance, why would you add a column to the index?
Even if your supposition that enforcing uniqueness doesn't add any extra overhead is correct (which it isn't for certain cases), what are you solving by needlessly complicating the index?
In the specific instance of adding the primary key to the end of your index key so that you can make the index definition include the UNIQUE
modifier, it actually makes zero difference to the physical index structure on disk. This is due to the nature of the structure of B-tree indexes keys, in that they always need to be unique.
As David Browne mentioned in a comment:
Since every nonclustered index is stored as unique index, there is no extra cost in inserting into a unique index. In fact the only extra cost would in failing to declare a candidate key as a unique index, which would cause the clustered index keys to be appended to the index keys.
Take the following minimally complete and verifiable example:
USE tempdb;
DROP TABLE IF EXISTS dbo.IndexTest;
CREATE TABLE dbo.IndexTest
(
id int NOT NULL
CONSTRAINT IndexTest_pk
PRIMARY KEY
CLUSTERED
IDENTITY(1,1)
, rowDate datetime NOT NULL
);
I'll add two indexes that are identical except for the addition of the primary key at the tail end of the second indexes key definition:
CREATE INDEX IndexTest_rowDate_ix01
ON dbo.IndexTest(rowDate);
CREATE UNIQUE INDEX IndexTest_rowDate_ix02
ON dbo.IndexTest(rowDate, id);
Next, we'll several rows to the table:
INSERT INTO dbo.IndexTest (rowDate)
VALUES (DATEADD(SECOND, 0, GETDATE()))
, (DATEADD(SECOND, 0, GETDATE()))
, (DATEADD(SECOND, 0, GETDATE()))
, (DATEADD(SECOND, 1, GETDATE()))
, (DATEADD(SECOND, 2, GETDATE()));
As you can see above, three rows contain the same value for the rowDate
column, and two rows contain unique values.
Next, we'll look at the physical page structures for each index, using the undocumented DBCC PAGE
command:
DECLARE @dbid int = DB_ID();
DECLARE @fileid int;
DECLARE @pageid int;
DECLARE @indexid int;
SELECT @fileid = ddpa.allocated_page_file_id
, @pageid = ddpa.allocated_page_page_id
FROM sys.indexes i
CROSS APPLY sys.dm_db_database_page_allocations(DB_ID(), i.object_id, i.index_id, NULL, 'LIMITED') ddpa
WHERE i.name = N'IndexTest_rowDate_ix01'
AND ddpa.is_allocated = 1
AND ddpa.is_iam_page = 0;
PRINT N'*************************************** IndexTest_rowDate_ix01 *****************************************';
DBCC TRACEON(3604);
DBCC PAGE (@dbid, @fileid, @pageid, 1);
DBCC TRACEON(3604);
PRINT N'*************************************** IndexTest_rowDate_ix01 *****************************************';
SELECT @fileid = ddpa.allocated_page_file_id
, @pageid = ddpa.allocated_page_page_id
FROM sys.indexes i
CROSS APPLY sys.dm_db_database_page_allocations(DB_ID(), i.object_id, i.index_id, NULL, 'LIMITED') ddpa
WHERE i.name = N'IndexTest_rowDate_ix02'
AND ddpa.is_allocated = 1
AND ddpa.is_iam_page = 0;
PRINT N'*************************************** IndexTest_rowDate_ix02 *****************************************';
DBCC TRACEON(3604);
DBCC PAGE (@dbid, @fileid, @pageid, 1);
DBCC TRACEON(3604);
PRINT N'*************************************** IndexTest_rowDate_ix02 *****************************************';
I've looked at the output using Beyond Compare, and except for obvious differences around the allocation page IDs, etc, the two index structures are identical.
You might take the above to mean that including the primary key in every index, and defining at as unique is A Good Thing™ since that's what happens under-the-covers anyway. I wouldn't make that assumption, and would suggest only defining an index as unique if in fact the natural data in the index is unique already.
There are several excellent resources in the Interwebz about this topic, including:
- Where Clustered Indexes Dare
- SQL Server and Binary Search
- Performance Benefits of Unique Indexes
FYI, the mere presence of an identity
column does not guarantee uniqueness. You need to define the column as a primary key or with a unique constraint to ensure the values stored in that column are in fact unique. The SET IDENTITY_INSERT schema.table ON;
statement will allow you to insert to non-unique values into a column defined as identity
.
I'm frequently involved in code reviews for the dev team, and I need to be able to give general guidelines for them to follow.
The environment I'm currently involved in has 250 servers with 2500 databases. I've worked on systems with 30,000 databases. Guidelines for indexing should revolve around the naming convention, etc, not be "rules" for what columns to include in an index - every individual index should be engineered to be the correct index for that specific business rule or code touching the table.
Does uniqueness have an additional cost on the back-end of an
Insert
compared to the cost of maintaining a non-unique index? Secondly, what is wrong with appending the Primary Key of a table to the end of an index to ensure uniqueness?
Adding the primary key column to the end of a non-unique index to make it unique looks to me to be an anti-pattern. If business rules dictate the data should be unique, then add a unique constraint to the column; which will automatically create a unique index. If you're indexing a column for performance, why would you add a column to the index?
Even if your supposition that enforcing uniqueness doesn't add any extra overhead is correct (which it isn't for certain cases), what are you solving by needlessly complicating the index?
In the specific instance of adding the primary key to the end of your index key so that you can make the index definition include the UNIQUE
modifier, it actually makes zero difference to the physical index structure on disk. This is due to the nature of the structure of B-tree indexes keys, in that they always need to be unique.
As David Browne mentioned in a comment:
Since every nonclustered index is stored as unique index, there is no extra cost in inserting into a unique index. In fact the only extra cost would in failing to declare a candidate key as a unique index, which would cause the clustered index keys to be appended to the index keys.
Take the following minimally complete and verifiable example:
USE tempdb;
DROP TABLE IF EXISTS dbo.IndexTest;
CREATE TABLE dbo.IndexTest
(
id int NOT NULL
CONSTRAINT IndexTest_pk
PRIMARY KEY
CLUSTERED
IDENTITY(1,1)
, rowDate datetime NOT NULL
);
I'll add two indexes that are identical except for the addition of the primary key at the tail end of the second indexes key definition:
CREATE INDEX IndexTest_rowDate_ix01
ON dbo.IndexTest(rowDate);
CREATE UNIQUE INDEX IndexTest_rowDate_ix02
ON dbo.IndexTest(rowDate, id);
Next, we'll several rows to the table:
INSERT INTO dbo.IndexTest (rowDate)
VALUES (DATEADD(SECOND, 0, GETDATE()))
, (DATEADD(SECOND, 0, GETDATE()))
, (DATEADD(SECOND, 0, GETDATE()))
, (DATEADD(SECOND, 1, GETDATE()))
, (DATEADD(SECOND, 2, GETDATE()));
As you can see above, three rows contain the same value for the rowDate
column, and two rows contain unique values.
Next, we'll look at the physical page structures for each index, using the undocumented DBCC PAGE
command:
DECLARE @dbid int = DB_ID();
DECLARE @fileid int;
DECLARE @pageid int;
DECLARE @indexid int;
SELECT @fileid = ddpa.allocated_page_file_id
, @pageid = ddpa.allocated_page_page_id
FROM sys.indexes i
CROSS APPLY sys.dm_db_database_page_allocations(DB_ID(), i.object_id, i.index_id, NULL, 'LIMITED') ddpa
WHERE i.name = N'IndexTest_rowDate_ix01'
AND ddpa.is_allocated = 1
AND ddpa.is_iam_page = 0;
PRINT N'*************************************** IndexTest_rowDate_ix01 *****************************************';
DBCC TRACEON(3604);
DBCC PAGE (@dbid, @fileid, @pageid, 1);
DBCC TRACEON(3604);
PRINT N'*************************************** IndexTest_rowDate_ix01 *****************************************';
SELECT @fileid = ddpa.allocated_page_file_id
, @pageid = ddpa.allocated_page_page_id
FROM sys.indexes i
CROSS APPLY sys.dm_db_database_page_allocations(DB_ID(), i.object_id, i.index_id, NULL, 'LIMITED') ddpa
WHERE i.name = N'IndexTest_rowDate_ix02'
AND ddpa.is_allocated = 1
AND ddpa.is_iam_page = 0;
PRINT N'*************************************** IndexTest_rowDate_ix02 *****************************************';
DBCC TRACEON(3604);
DBCC PAGE (@dbid, @fileid, @pageid, 1);
DBCC TRACEON(3604);
PRINT N'*************************************** IndexTest_rowDate_ix02 *****************************************';
I've looked at the output using Beyond Compare, and except for obvious differences around the allocation page IDs, etc, the two index structures are identical.
You might take the above to mean that including the primary key in every index, and defining at as unique is A Good Thing™ since that's what happens under-the-covers anyway. I wouldn't make that assumption, and would suggest only defining an index as unique if in fact the natural data in the index is unique already.
There are several excellent resources in the Interwebz about this topic, including:
- Where Clustered Indexes Dare
- SQL Server and Binary Search
- Performance Benefits of Unique Indexes
FYI, the mere presence of an identity
column does not guarantee uniqueness. You need to define the column as a primary key or with a unique constraint to ensure the values stored in that column are in fact unique. The SET IDENTITY_INSERT schema.table ON;
statement will allow you to insert to non-unique values into a column defined as identity
.
edited 5 hours ago
Paul White♦
57.6k15 gold badges303 silver badges475 bronze badges
57.6k15 gold badges303 silver badges475 bronze badges
answered 8 hours ago
Max VernonMax Vernon
54.4k13 gold badges118 silver badges244 bronze badges
54.4k13 gold badges118 silver badges244 bronze badges
1
@MaxVernon thanks for all the details. I'm sure plenty of other people have wondered about this kind of thing, and may not know how to research it, or even how to phrase it properly. The additional information links are also greatly appreciated as well.
– Solonotix
5 hours ago
add a comment |
1
@MaxVernon thanks for all the details. I'm sure plenty of other people have wondered about this kind of thing, and may not know how to research it, or even how to phrase it properly. The additional information links are also greatly appreciated as well.
– Solonotix
5 hours ago
1
1
@MaxVernon thanks for all the details. I'm sure plenty of other people have wondered about this kind of thing, and may not know how to research it, or even how to phrase it properly. The additional information links are also greatly appreciated as well.
– Solonotix
5 hours ago
@MaxVernon thanks for all the details. I'm sure plenty of other people have wondered about this kind of thing, and may not know how to research it, or even how to phrase it properly. The additional information links are also greatly appreciated as well.
– Solonotix
5 hours ago
add a comment |
Just an add-on to Max's excellent answer.
When it comes to creating a non unique clustered index, SQL Server creates something called a Uniquifier in the background anyways.
This Uniquifier could cause potential problems in the future if your platform has a lot of CRUD operations, since this Uniquifier is only 4 bytes big (a basic 32bit integer). So, if your system has a lot of CRUD operations it's possible you will use up all the available unique numbers and all of a sudden you will receive an error and it won't allow you to insert anymore data into your tables (because it will no longer have any unique values to assign to your newly inserted rows).
When this happens, you will receive this error:
The maximum system-generated unique value for a duplicate group was exceeded for index with partition ID (someID).
Dropping and re-creating the index may resolve this; otherwise, use another clustering key.
Error 666 (the above error) occurs when the uniquifier for a single set of non-unique keys consumes more than 2,147,483,647 rows. So, you'll need to have either ~2 billion rows for a single key value, or you'll need to have modified a single key value ~2 billion times to see this error. As such, it's not extremely likely you'll run into this limitation.
I had no idea that the hidden uniquifier could run out off key space, but I guess all things are limited in some case. Much like howCase
andIf
structures are limited to 10 levels, it makes sense that there is also a limit to resolving non-unique entities. By your statement, this sounds like it only applies to cases when the clustering key is non-unique. Is this a problem for aNonclustered Index
or if the clustering key isUnique
then there isn't a problem forNonclustered
indexes?
– Solonotix
5 hours ago
A Unique index is (as far as I know) limited by the size of the column type (so if it's a BIGINT type, you have 8bytes to work with). Also, according to microsoft's official documentation, there is a maximum of 900bytes allowed for a clustered index and 1700bytes for non clustered (since you can have more than one non-clustered index and only 1 clustered index per table).docs.microsoft.com/en-us/sql/sql-server/…
– Chessbrain
5 hours ago
1
@Solonotix - the uniquifier from the clustered index is used in the non-clustered indexes. If you run the code in my example without the primary key (create a clustered index instead), you can see the output is the same for both the non-unique and the unique indexes.
– Max Vernon
4 hours ago
For a non-clustered index on a heap, each row in a unique index must be truly unique.
– Max Vernon
4 hours ago
@DavidBrowne-Microsoft Thank you for the correction!
– Chessbrain
2 hours ago
add a comment |
Just an add-on to Max's excellent answer.
When it comes to creating a non unique clustered index, SQL Server creates something called a Uniquifier in the background anyways.
This Uniquifier could cause potential problems in the future if your platform has a lot of CRUD operations, since this Uniquifier is only 4 bytes big (a basic 32bit integer). So, if your system has a lot of CRUD operations it's possible you will use up all the available unique numbers and all of a sudden you will receive an error and it won't allow you to insert anymore data into your tables (because it will no longer have any unique values to assign to your newly inserted rows).
When this happens, you will receive this error:
The maximum system-generated unique value for a duplicate group was exceeded for index with partition ID (someID).
Dropping and re-creating the index may resolve this; otherwise, use another clustering key.
Error 666 (the above error) occurs when the uniquifier for a single set of non-unique keys consumes more than 2,147,483,647 rows. So, you'll need to have either ~2 billion rows for a single key value, or you'll need to have modified a single key value ~2 billion times to see this error. As such, it's not extremely likely you'll run into this limitation.
I had no idea that the hidden uniquifier could run out off key space, but I guess all things are limited in some case. Much like howCase
andIf
structures are limited to 10 levels, it makes sense that there is also a limit to resolving non-unique entities. By your statement, this sounds like it only applies to cases when the clustering key is non-unique. Is this a problem for aNonclustered Index
or if the clustering key isUnique
then there isn't a problem forNonclustered
indexes?
– Solonotix
5 hours ago
A Unique index is (as far as I know) limited by the size of the column type (so if it's a BIGINT type, you have 8bytes to work with). Also, according to microsoft's official documentation, there is a maximum of 900bytes allowed for a clustered index and 1700bytes for non clustered (since you can have more than one non-clustered index and only 1 clustered index per table).docs.microsoft.com/en-us/sql/sql-server/…
– Chessbrain
5 hours ago
1
@Solonotix - the uniquifier from the clustered index is used in the non-clustered indexes. If you run the code in my example without the primary key (create a clustered index instead), you can see the output is the same for both the non-unique and the unique indexes.
– Max Vernon
4 hours ago
For a non-clustered index on a heap, each row in a unique index must be truly unique.
– Max Vernon
4 hours ago
@DavidBrowne-Microsoft Thank you for the correction!
– Chessbrain
2 hours ago
add a comment |
Just an add-on to Max's excellent answer.
When it comes to creating a non unique clustered index, SQL Server creates something called a Uniquifier in the background anyways.
This Uniquifier could cause potential problems in the future if your platform has a lot of CRUD operations, since this Uniquifier is only 4 bytes big (a basic 32bit integer). So, if your system has a lot of CRUD operations it's possible you will use up all the available unique numbers and all of a sudden you will receive an error and it won't allow you to insert anymore data into your tables (because it will no longer have any unique values to assign to your newly inserted rows).
When this happens, you will receive this error:
The maximum system-generated unique value for a duplicate group was exceeded for index with partition ID (someID).
Dropping and re-creating the index may resolve this; otherwise, use another clustering key.
Error 666 (the above error) occurs when the uniquifier for a single set of non-unique keys consumes more than 2,147,483,647 rows. So, you'll need to have either ~2 billion rows for a single key value, or you'll need to have modified a single key value ~2 billion times to see this error. As such, it's not extremely likely you'll run into this limitation.
Just an add-on to Max's excellent answer.
When it comes to creating a non unique clustered index, SQL Server creates something called a Uniquifier in the background anyways.
This Uniquifier could cause potential problems in the future if your platform has a lot of CRUD operations, since this Uniquifier is only 4 bytes big (a basic 32bit integer). So, if your system has a lot of CRUD operations it's possible you will use up all the available unique numbers and all of a sudden you will receive an error and it won't allow you to insert anymore data into your tables (because it will no longer have any unique values to assign to your newly inserted rows).
When this happens, you will receive this error:
The maximum system-generated unique value for a duplicate group was exceeded for index with partition ID (someID).
Dropping and re-creating the index may resolve this; otherwise, use another clustering key.
Error 666 (the above error) occurs when the uniquifier for a single set of non-unique keys consumes more than 2,147,483,647 rows. So, you'll need to have either ~2 billion rows for a single key value, or you'll need to have modified a single key value ~2 billion times to see this error. As such, it's not extremely likely you'll run into this limitation.
edited 2 hours ago
answered 5 hours ago
ChessbrainChessbrain
2007 bronze badges
2007 bronze badges
I had no idea that the hidden uniquifier could run out off key space, but I guess all things are limited in some case. Much like howCase
andIf
structures are limited to 10 levels, it makes sense that there is also a limit to resolving non-unique entities. By your statement, this sounds like it only applies to cases when the clustering key is non-unique. Is this a problem for aNonclustered Index
or if the clustering key isUnique
then there isn't a problem forNonclustered
indexes?
– Solonotix
5 hours ago
A Unique index is (as far as I know) limited by the size of the column type (so if it's a BIGINT type, you have 8bytes to work with). Also, according to microsoft's official documentation, there is a maximum of 900bytes allowed for a clustered index and 1700bytes for non clustered (since you can have more than one non-clustered index and only 1 clustered index per table).docs.microsoft.com/en-us/sql/sql-server/…
– Chessbrain
5 hours ago
1
@Solonotix - the uniquifier from the clustered index is used in the non-clustered indexes. If you run the code in my example without the primary key (create a clustered index instead), you can see the output is the same for both the non-unique and the unique indexes.
– Max Vernon
4 hours ago
For a non-clustered index on a heap, each row in a unique index must be truly unique.
– Max Vernon
4 hours ago
@DavidBrowne-Microsoft Thank you for the correction!
– Chessbrain
2 hours ago
add a comment |
I had no idea that the hidden uniquifier could run out off key space, but I guess all things are limited in some case. Much like howCase
andIf
structures are limited to 10 levels, it makes sense that there is also a limit to resolving non-unique entities. By your statement, this sounds like it only applies to cases when the clustering key is non-unique. Is this a problem for aNonclustered Index
or if the clustering key isUnique
then there isn't a problem forNonclustered
indexes?
– Solonotix
5 hours ago
A Unique index is (as far as I know) limited by the size of the column type (so if it's a BIGINT type, you have 8bytes to work with). Also, according to microsoft's official documentation, there is a maximum of 900bytes allowed for a clustered index and 1700bytes for non clustered (since you can have more than one non-clustered index and only 1 clustered index per table).docs.microsoft.com/en-us/sql/sql-server/…
– Chessbrain
5 hours ago
1
@Solonotix - the uniquifier from the clustered index is used in the non-clustered indexes. If you run the code in my example without the primary key (create a clustered index instead), you can see the output is the same for both the non-unique and the unique indexes.
– Max Vernon
4 hours ago
For a non-clustered index on a heap, each row in a unique index must be truly unique.
– Max Vernon
4 hours ago
@DavidBrowne-Microsoft Thank you for the correction!
– Chessbrain
2 hours ago
I had no idea that the hidden uniquifier could run out off key space, but I guess all things are limited in some case. Much like how
Case
and If
structures are limited to 10 levels, it makes sense that there is also a limit to resolving non-unique entities. By your statement, this sounds like it only applies to cases when the clustering key is non-unique. Is this a problem for a Nonclustered Index
or if the clustering key is Unique
then there isn't a problem for Nonclustered
indexes?– Solonotix
5 hours ago
I had no idea that the hidden uniquifier could run out off key space, but I guess all things are limited in some case. Much like how
Case
and If
structures are limited to 10 levels, it makes sense that there is also a limit to resolving non-unique entities. By your statement, this sounds like it only applies to cases when the clustering key is non-unique. Is this a problem for a Nonclustered Index
or if the clustering key is Unique
then there isn't a problem for Nonclustered
indexes?– Solonotix
5 hours ago
A Unique index is (as far as I know) limited by the size of the column type (so if it's a BIGINT type, you have 8bytes to work with). Also, according to microsoft's official documentation, there is a maximum of 900bytes allowed for a clustered index and 1700bytes for non clustered (since you can have more than one non-clustered index and only 1 clustered index per table).docs.microsoft.com/en-us/sql/sql-server/…
– Chessbrain
5 hours ago
A Unique index is (as far as I know) limited by the size of the column type (so if it's a BIGINT type, you have 8bytes to work with). Also, according to microsoft's official documentation, there is a maximum of 900bytes allowed for a clustered index and 1700bytes for non clustered (since you can have more than one non-clustered index and only 1 clustered index per table).docs.microsoft.com/en-us/sql/sql-server/…
– Chessbrain
5 hours ago
1
1
@Solonotix - the uniquifier from the clustered index is used in the non-clustered indexes. If you run the code in my example without the primary key (create a clustered index instead), you can see the output is the same for both the non-unique and the unique indexes.
– Max Vernon
4 hours ago
@Solonotix - the uniquifier from the clustered index is used in the non-clustered indexes. If you run the code in my example without the primary key (create a clustered index instead), you can see the output is the same for both the non-unique and the unique indexes.
– Max Vernon
4 hours ago
For a non-clustered index on a heap, each row in a unique index must be truly unique.
– Max Vernon
4 hours ago
For a non-clustered index on a heap, each row in a unique index must be truly unique.
– Max Vernon
4 hours ago
@DavidBrowne-Microsoft Thank you for the correction!
– Chessbrain
2 hours ago
@DavidBrowne-Microsoft Thank you for the correction!
– Chessbrain
2 hours ago
add a comment |
I'm not going to weigh in on the question of whether an index should be unique or not, and whether there's more overhead in this approach or that. But a couple of things bothered me in your general design
- dt datetime not null default(current_timestamp). Datetime is an older form or this, and you may be able to achieve at least some space savings by using datetime2() and sysdatetime().
- create index [nonunique_nonclustered_example] on #test_index (is_deleted) include (val). This bothers me. Take a look at how the data is to be accessed (I'm betting there's more than
WHERE is_deleted = 0
) and look at using a filtered index. I would even consider using 2 filtered indexes, one forwhere is_deleted = 0
and the other forwhere is_deleted = 1
Fundamentally this looks more like a coding exercise designed to test a hypothesis rather than a real problem / solution, but those two patterns are definitely something I look for in code reviews.
add a comment |
I'm not going to weigh in on the question of whether an index should be unique or not, and whether there's more overhead in this approach or that. But a couple of things bothered me in your general design
- dt datetime not null default(current_timestamp). Datetime is an older form or this, and you may be able to achieve at least some space savings by using datetime2() and sysdatetime().
- create index [nonunique_nonclustered_example] on #test_index (is_deleted) include (val). This bothers me. Take a look at how the data is to be accessed (I'm betting there's more than
WHERE is_deleted = 0
) and look at using a filtered index. I would even consider using 2 filtered indexes, one forwhere is_deleted = 0
and the other forwhere is_deleted = 1
Fundamentally this looks more like a coding exercise designed to test a hypothesis rather than a real problem / solution, but those two patterns are definitely something I look for in code reviews.
add a comment |
I'm not going to weigh in on the question of whether an index should be unique or not, and whether there's more overhead in this approach or that. But a couple of things bothered me in your general design
- dt datetime not null default(current_timestamp). Datetime is an older form or this, and you may be able to achieve at least some space savings by using datetime2() and sysdatetime().
- create index [nonunique_nonclustered_example] on #test_index (is_deleted) include (val). This bothers me. Take a look at how the data is to be accessed (I'm betting there's more than
WHERE is_deleted = 0
) and look at using a filtered index. I would even consider using 2 filtered indexes, one forwhere is_deleted = 0
and the other forwhere is_deleted = 1
Fundamentally this looks more like a coding exercise designed to test a hypothesis rather than a real problem / solution, but those two patterns are definitely something I look for in code reviews.
I'm not going to weigh in on the question of whether an index should be unique or not, and whether there's more overhead in this approach or that. But a couple of things bothered me in your general design
- dt datetime not null default(current_timestamp). Datetime is an older form or this, and you may be able to achieve at least some space savings by using datetime2() and sysdatetime().
- create index [nonunique_nonclustered_example] on #test_index (is_deleted) include (val). This bothers me. Take a look at how the data is to be accessed (I'm betting there's more than
WHERE is_deleted = 0
) and look at using a filtered index. I would even consider using 2 filtered indexes, one forwhere is_deleted = 0
and the other forwhere is_deleted = 1
Fundamentally this looks more like a coding exercise designed to test a hypothesis rather than a real problem / solution, but those two patterns are definitely something I look for in code reviews.
answered 1 hour ago
TobyToby
1535 bronze badges
1535 bronze badges
add a comment |
add a comment |
You don't say how you are going to create your unique ID, All too many developers decide to use SELECT NEWID()
then of course because it is the one unquestionably unique field in the row, they use it in a primary key of an index. Sometimes as the single PK in a clustered index on a huge table, with no fill factor space.
GUID is a 16 byte binary SQL Server data type that is globally unique across tables, databases, and servers. The term GUID stands for Globally Unique Identifier and it is used interchangeably with UNIQUEIDENTIFIER. Source
Of course you do index maintenance at least weekly, probably using the solution by Ola Hallengren The GUID is unique but it will not fit nicely in the clustered index, so your index job spends hours sorting rows every week.
As Max says in their answer adding things add overhead it is simply impossible for that not to be case.
You should only add to your indexes and/or database where you have have justifiabel business reason for it. If you absolutely must add a GUID consider NEWSEQUENTIALID()
in many cases this can lower the overhead, but not eliminate it.
The tables I provided in the sample do not utilize theuniqueidentifier
data type, and that's because most of our tables do not use it either, in part because of all the known issues with it. I used it once for a project, and later came to realize why it was a poor choice, and have since moved to timestamps (of typedatetime
) in that particular use case, since the pruning process was using adatetime
field anyway.
– Solonotix
7 hours ago
add a comment |
You don't say how you are going to create your unique ID, All too many developers decide to use SELECT NEWID()
then of course because it is the one unquestionably unique field in the row, they use it in a primary key of an index. Sometimes as the single PK in a clustered index on a huge table, with no fill factor space.
GUID is a 16 byte binary SQL Server data type that is globally unique across tables, databases, and servers. The term GUID stands for Globally Unique Identifier and it is used interchangeably with UNIQUEIDENTIFIER. Source
Of course you do index maintenance at least weekly, probably using the solution by Ola Hallengren The GUID is unique but it will not fit nicely in the clustered index, so your index job spends hours sorting rows every week.
As Max says in their answer adding things add overhead it is simply impossible for that not to be case.
You should only add to your indexes and/or database where you have have justifiabel business reason for it. If you absolutely must add a GUID consider NEWSEQUENTIALID()
in many cases this can lower the overhead, but not eliminate it.
The tables I provided in the sample do not utilize theuniqueidentifier
data type, and that's because most of our tables do not use it either, in part because of all the known issues with it. I used it once for a project, and later came to realize why it was a poor choice, and have since moved to timestamps (of typedatetime
) in that particular use case, since the pruning process was using adatetime
field anyway.
– Solonotix
7 hours ago
add a comment |
You don't say how you are going to create your unique ID, All too many developers decide to use SELECT NEWID()
then of course because it is the one unquestionably unique field in the row, they use it in a primary key of an index. Sometimes as the single PK in a clustered index on a huge table, with no fill factor space.
GUID is a 16 byte binary SQL Server data type that is globally unique across tables, databases, and servers. The term GUID stands for Globally Unique Identifier and it is used interchangeably with UNIQUEIDENTIFIER. Source
Of course you do index maintenance at least weekly, probably using the solution by Ola Hallengren The GUID is unique but it will not fit nicely in the clustered index, so your index job spends hours sorting rows every week.
As Max says in their answer adding things add overhead it is simply impossible for that not to be case.
You should only add to your indexes and/or database where you have have justifiabel business reason for it. If you absolutely must add a GUID consider NEWSEQUENTIALID()
in many cases this can lower the overhead, but not eliminate it.
You don't say how you are going to create your unique ID, All too many developers decide to use SELECT NEWID()
then of course because it is the one unquestionably unique field in the row, they use it in a primary key of an index. Sometimes as the single PK in a clustered index on a huge table, with no fill factor space.
GUID is a 16 byte binary SQL Server data type that is globally unique across tables, databases, and servers. The term GUID stands for Globally Unique Identifier and it is used interchangeably with UNIQUEIDENTIFIER. Source
Of course you do index maintenance at least weekly, probably using the solution by Ola Hallengren The GUID is unique but it will not fit nicely in the clustered index, so your index job spends hours sorting rows every week.
As Max says in their answer adding things add overhead it is simply impossible for that not to be case.
You should only add to your indexes and/or database where you have have justifiabel business reason for it. If you absolutely must add a GUID consider NEWSEQUENTIALID()
in many cases this can lower the overhead, but not eliminate it.
answered 7 hours ago
James JenkinsJames Jenkins
2,4382 gold badges24 silver badges48 bronze badges
2,4382 gold badges24 silver badges48 bronze badges
The tables I provided in the sample do not utilize theuniqueidentifier
data type, and that's because most of our tables do not use it either, in part because of all the known issues with it. I used it once for a project, and later came to realize why it was a poor choice, and have since moved to timestamps (of typedatetime
) in that particular use case, since the pruning process was using adatetime
field anyway.
– Solonotix
7 hours ago
add a comment |
The tables I provided in the sample do not utilize theuniqueidentifier
data type, and that's because most of our tables do not use it either, in part because of all the known issues with it. I used it once for a project, and later came to realize why it was a poor choice, and have since moved to timestamps (of typedatetime
) in that particular use case, since the pruning process was using adatetime
field anyway.
– Solonotix
7 hours ago
The tables I provided in the sample do not utilize the
uniqueidentifier
data type, and that's because most of our tables do not use it either, in part because of all the known issues with it. I used it once for a project, and later came to realize why it was a poor choice, and have since moved to timestamps (of type datetime
) in that particular use case, since the pruning process was using a datetime
field anyway.– Solonotix
7 hours ago
The tables I provided in the sample do not utilize the
uniqueidentifier
data type, and that's because most of our tables do not use it either, in part because of all the known issues with it. I used it once for a project, and later came to realize why it was a poor choice, and have since moved to timestamps (of type datetime
) in that particular use case, since the pruning process was using a datetime
field anyway.– Solonotix
7 hours ago
add a comment |
Solonotix is a new contributor. Be nice, and check out our Code of Conduct.
Solonotix is a new contributor. Be nice, and check out our Code of Conduct.
Solonotix is a new contributor. Be nice, and check out our Code of Conduct.
Solonotix is a new contributor. Be nice, and check out our Code of Conduct.
Thanks for contributing an answer to Database Administrators Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f242908%2findex-uniqueness-overhead%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown