Monday, January 23, 2012

Non-Clustered Heaps

When I am investigating a slow-running query or looking over a database for the first time, one of the areas I tend to dive into is the Table and Index structure.  Having an understanding of how the Database is laid out at that level can help you understand what is going on at the Query and Execution Plan level.

Within that realm, I take a keen interest in any Heaps that I find.  How many are there? How are they being used?  Do they have any Non-Clustered Indexes on them?  Is the slow query issuing a lot of Reads or Writes against a Heap? etc.

I never cease to be amazed at how much I find...

A Table Taxonomy

When analyzing tables, I typically break them up into two groups: Heaps and Clustered Indexes.

Simply put, a Clustered Index helps to structure and organize the Table.  This helps you get better performance out of your database.  A full discussion of the benefits of a Clustered Index are outside the scope of this article.

A Heap is a Table that does not have a Clustered Index.  Heaps are unstructured and unordered chunks of data that are scattered around your database.  As you issue Writes against your Heaps, they get even more scattered and messy.  When you issues Reads against your Heaps, they tend to be inefficient since the Database has to look all over to find the data.  Generally speaking, heaps are bad.

Laundry Night

Think of this like your laundry.  When the dryer gets done, you have two basic options:

Option One

Time To Fold The Laundry

Grab everything as it is and shove it all into one drawer.  Socks, underwear, shirts, slacks; everything all together and not folded.  Not only it take you longer to get dressed in the morning, your clothes will be wrinkly, and your significant other will likely not appreciate it one bit.

Option Two

Sort the clothes into different stacks, fold them, and place them orderly into a few different drawers.  Now, your morning routine is much simplified.  It's easier for you to get dressed for work or that big date.  You'll look better, and the ladies (or guys) will love you for it.

Which one do you want?

Back to Databases

I'm about to paint with a broad stroke here so prepare yourself...

Generally speaking, I would say that 90% (or more) of tables need a Clustered Index.  Of those Clustered Indexes, 90% (or more) of them should be narrow, fixed-width, ever-increasing, static, and unique.  For most of those cases, 90% (or more), the best, simplest choice is an Identity column.

To restate this more simply, I would say that by default, any table that is created ought to have a Clustered Index, and that Index ought to be an Identity column.

Now don't get me wrong, there could be plenty of reasons for there being an exception to this rule.  Just have a reason and be prepared to discuss it.  Special cases are just that, special cases.  Remember, I'm speaking in general terms here...

Heaping Hierarchies

Similar to how I classify Tables, I have a method for categorizing Heaps.  It is either a Pure Heaps and Non-Clustered Heaps.

A Pure Heaps is a table with no indexes.  None, zero, nada, zip.

A Non-Clustered Heap is a Heap that has Non-Clustered Indexes on it.

Simple enough?

What's a Heap Good For

One of the common scenarios where I see Heaps being used are for Staging Tables. Perhaps you are loading in a Flat File or CSV and just need the data temporarily.  Then, another process will come along and suck the data out from that Heap into another Table with some Indexes.

Even then, you might be better off with a custom Clustered Index that is designed to help your Data Load.  Then, you might drop it and / or create a new Index that helps with your Data Migration.  And when you're done, Truncate or Drop the Table.

However, if you are keeping the data around long enough or manipulate it through large amounts of Reads or Writes, then I would argue that you are invalidating the very argument for making it a Heap in the first place.

In other words, if you are using this Table in a non-temporary manner to the point where you feel the need to create Non-Clustered Indexes on it, then what is your argument for not creating a Clustered Index on the table?

Why Is He Still Talking

OK, so now you're probably wondering what prompted this tirade?  Well, lemme tell ya...

Recently, we brought a new Software and Database package into my shop.  This is a BlackBox App from a Third-Party Vendor.  Since this was my first time to get my hands on a strange Database, I started going through some of my standard checks; starting at a higher level and then working my way down.

Since I've been burned by them so many times in the past, one of the first things I look for are Heaps.  How many, what percentage of the tables, etc.

Another check I commonly do is check for any table with more than five Indexes.  Now, five is not a hard limit.  But, if I see Tables with a high number of Indexes, then I might want to take a closer look at those tables.  Additionally, I like to check for any Tables that have Zero Indexes.

Well, what I found was an interesting intersection of indexing information.

When I analyzed the Heap count, I was a bit shocked.  What I found was that in a Database with over 5,000 Tables, over 90% of them were Heaps.

Percentage of Heaps

Only a relatively small number of the Tables, less than 200, had Zero Indexes.

What that means, by process of elimination, that almost all of the Heaps had Non-Clustered Indexes on them.  Holy Forwarded Record Batman!

Finding the Non-Clustered Heaps

In a previous post, I discussed how to obtain the overall Heap count and percentage, as well as the listing of the individual Heaps.

Now, I'll show you how to dig a little deeper and find all the Non-Clustered Heaps.

[sql]

-- Steven Ormrod, sqlavenger.com
-- 1/21/12
-- Non-Clustered Heaps

-- now get the NCL counts for those Heaps
select count(*) as 'NCL Count',
OBJECT_NAME(object_id) as 'Non-Clustered Heaps'
from sys.indexes
where type_desc = 'NONCLUSTERED'
and object_id in (

-- get the Heap IDs
select i.object_id
from sys.indexes i
join sys.tables t
on i.object_id = t.object_id
where i.type_desc = 'HEAP'

)

group by object_id
order by 'NCL Count' desc,
'Non-Clustered Heaps';
go

[/sql]

Which gives you a result similar to this:

Non-Clustered Heaps

In this case, there were 4522 Non-Clustered Heaps, or over 95%.  Wow.

Analyzing the Non-Clustered Heaps

Once you have those names, then you still need to do a little digging.  Likely, you'll want to examine the underlying Table structure and see the Index definitions.

To make life easier, I worked up a script that will generate the relevant sp_help and sp_helpindex commands.

[sql]

-- Steven Ormrod, sqlavenger.com
-- 1/21/12
-- generates the sp_helpindex and sp_help commands for Non-Clustered Heaps

-- now generate the sp_commands to help me analyze the tables and indexes
select bad.[NCL Count],
s.name + '.' + t.name as 'Non-Clustered Heap',
'exec sp_help ''' + s.name + '.' + t.name + '''' as 'sp_help command',
'exec sp_SQLskills_SQL2008_helpindex ''' + s.name + '.' + t.name + '''' as 'sp_helpindex command'

from sys.schemas s
join sys.tables t
on s.schema_id = t.schema_id
join (

-- now get the NCL counts for those Heaps
select object_id,
count(*) as 'NCL Count'
from sys.indexes
where type_desc = 'NONCLUSTERED'
and object_id in (

-- get the Heap IDs
select i.object_id
from sys.indexes i
join sys.tables t
on i.object_id = t.object_id
where i.type_desc = 'HEAP'

)

group by object_id

) as bad

on t.object_id = bad.object_id

order by bad.[NCL Count] desc,
'Non-Clustered Heap';
go

[/sql]

Which gives you a result similar to this:

Non-Clustered Heaps sp_commands

Regarding sp_helpindex, I actually use Kimberly Tripp's rewrite of this command.  It shows you tons more information than the default command.  It is very easy to install, and I highly recommend it.

Happy Heap Hunting!

No comments:

Post a Comment