Sunday, February 28, 2010

Do your Clustered Indexes on IDENTITYs Have the Correct FILLFACTOR?

The great thing about blogging is that it forces you to be as sure as you can that you know what you're talking about. That alone makes it worthwhile...

I started this FILLFACTOR post a few days ago with a completely different goal in mind. After I did my due diligence, however - perusing Books Online, seeing what's in my "dead trees" library, and reading my favorite gurus' blogs - I realized that the "tip" I was preparing to demonstrate was completely wrongheaded.

In the process, I learned a lot about setting FILLFACTOR values. One thing I learned was that clustered indexes on IDENTITY columns should almost always have their FILLFACTOR set to 100% (completely full - no room for expansion). The reasoning has to do with page splits, which are expensive, especially on a clustered index.

Since IDENTITY columns are monotonically increasing, INSERTs will always be applied to the end of the index, so an INSERT will never cause a page split. UPDATEing a record can cause page splits, but only if a variable-length column (an NVARCHAR, say) is updated with more data than it had before. Since the page is already 100% full, there's nowhere to put the extra data, and that causes a page split.

Can we avoid page splits like this by setting the FILLFACTOR to something less than 100% - that is, by leaving some empty space on each page? Sure, but, as always, there's a downside. Database "reads" typically outnumber "writes" by a factor of n to 1, right? Whether n is 5 or 50 depends on who you believe and on the unique characteristics of each database, and it certainly changes over time. It's a little vague, but in any case, the typical OLTP database does a lot more reads than writes. That's important, because setting the FILLFACTOR to less than 100% spreads the same rows over more pages, which can make reads slower: that's the downside.

To see why this is, imagine you have a table with 1000 rows, and 50 rows will fit on an 8 KB page. With a FILLFACTOR of 100%, every page is full (by definition), so this will require 1000 / 50 = 20 pages. So, a scan of this table will require, at most, 20 reads.

Now imagine you've set the FILLFACTOR to 80%, leaving each page 20% empty after you REBUILD the index. This leaves lots of room for growth, so we'll reduce the number of page splits. And that's a good thing.

But now, only 40 rows (80% of 50 rows) will fit on each page, so the table now requires 1000 / 40 = 25 pages. A scan of the table will now require, at most, 25 reads. You've just decreased read performance by, in the worst case, 1 - (20 / 25) = 20% (put another way, the scan needs 25% more page reads). Oops.
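
The page counts above can be checked with a quick query. This is only a rough sketch: like the example, it treats FILLFACTOR as a straight percentage of the rows that fit on a page, and the variable and column names are made up for illustration.

```sql
-- Illustrative numbers from the example: 1000 rows, 50 rows per full page.
DECLARE @total_rows         int;
DECLARE @rows_per_full_page int;
SET @total_rows         = 1000;
SET @rows_per_full_page = 50;

-- One output row per fill factor being compared.
SELECT ff.ff_percent,
       CEILING(@total_rows / (@rows_per_full_page * ff.ff_percent / 100.0)) AS pages_needed
  FROM (SELECT 100 AS ff_percent
        UNION ALL
        SELECT 80) AS ff;
```

With these inputs the query reports 20 pages at 100% and 25 pages at 80%.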

So, to be sure all the tables that are clustered on an IDENTITY column have a FILLFACTOR setting of 100%, I've written a script, Index_ClusteredOnIdentity_SetFillFactorTo100_All.sql. The script first finds the indexes to be rebuilt, using a cursor on this statement:

SELECT OBJECT_SCHEMA_NAME(i.object_id)   AS 'Schema Name',
       OBJECT_NAME(i.object_id)          AS 'Table Name',
       i.name                            AS 'Index Name'
  FROM sys.indexes                  i
  JOIN sys.index_columns            ic
    ON ic.object_id = i.object_id
   AND ic.index_id  = i.index_id 
  JOIN sys.columns                  c
    ON c.object_id  = ic.object_id
   AND c.column_id  = ic.column_id
 WHERE i.object_id                     > 100
   AND OBJECT_SCHEMA_NAME(i.object_id) != 'sys'
   AND i.is_disabled                   = 0 
   AND i.type_desc                     = 'CLUSTERED'
   AND c.is_identity                   = 1
   -- sys.indexes.fill_factor replaces the deprecated sysindexes.origfillfactor;
   -- 0 and 100 both mean "fill the pages completely"
   AND i.fill_factor                   NOT IN (0, 100)

This should be pretty self-explanatory. The system tables are joined in the usual way to get at the index's column(s). We restrict the result set to exclude system tables and objects in the "sys" schema, which can be returned in certain cases. We don't REBUILD disabled indexes, because that also enables them, a side effect you may not expect. We of course only want clustered indexes on IDENTITY columns. Finally, there's no point in rebuilding an index whose FILLFACTOR is already 100% (a stored value of 0 means the same thing).

As we iterate through the resultset, we build up a statement like:

ALTER INDEX PK_Product_ProductID
   ON Production.Product
   REBUILD WITH (ONLINE = OFF, PAD_INDEX = ON, FILLFACTOR = 100)
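
The iteration described above could look something like the following. This is a minimal sketch, not the actual script: the cursor and variable names are invented, the fill factor is read from sys.indexes.fill_factor (my choice for this sketch), and QUOTENAME is added as a precaution for names that contain spaces or brackets.

```sql
-- Sketch only; names such as index_cursor and @sql are hypothetical.
DECLARE @schema_name sysname, @table_name sysname, @index_name sysname;
DECLARE @sql nvarchar(max);

-- STATIC takes a snapshot of the list, so rebuilds don't disturb the iteration.
DECLARE index_cursor CURSOR LOCAL STATIC FOR
    SELECT OBJECT_SCHEMA_NAME(i.object_id), OBJECT_NAME(i.object_id), i.name
      FROM sys.indexes       i
      JOIN sys.index_columns ic ON ic.object_id = i.object_id AND ic.index_id  = i.index_id
      JOIN sys.columns       c  ON c.object_id  = ic.object_id AND c.column_id = ic.column_id
     WHERE i.object_id > 100
       AND OBJECT_SCHEMA_NAME(i.object_id) != 'sys'
       AND i.is_disabled = 0
       AND i.type_desc   = 'CLUSTERED'
       AND c.is_identity = 1
       AND i.fill_factor NOT IN (0, 100);

OPEN index_cursor;
FETCH NEXT FROM index_cursor INTO @schema_name, @table_name, @index_name;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- Build the ALTER INDEX statement for this index.
    SET @sql = N'ALTER INDEX ' + QUOTENAME(@index_name)
             + N' ON ' + QUOTENAME(@schema_name) + N'.' + QUOTENAME(@table_name)
             + N' REBUILD WITH (ONLINE = OFF, PAD_INDEX = ON, FILLFACTOR = 100)';

    PRINT @sql;                   -- show the statement being run
    EXEC sys.sp_executesql @sql;  -- comment this line out for a dry run

    FETCH NEXT FROM index_cursor INTO @schema_name, @table_name, @index_name;
END

CLOSE index_cursor;
DEALLOCATE index_cursor;
```

Commenting out the EXEC line turns the loop into a dry run that only prints the statements, which is a sensible first pass on a production database.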

Rebuilding a clustered index can be a time-intensive process, but at least, as Books Online says, "Rebuilding a clustered index does not rebuild associated nonclustered indexes unless the keyword ALL is specified" - and we're not specifying it.
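
After a rebuild, a query along these lines could confirm the stored fill factor and show how full the leaf pages really are. It's a sketch that assumes the Production.Product table from the example statement above; substitute your own table name.

```sql
-- Production.Product is just the example table used earlier in this post.
SELECT i.name                            AS index_name,
       i.fill_factor,                    -- 0 or 100 means pages are filled completely
       ps.page_count,
       ps.avg_page_space_used_in_percent,
       ps.avg_fragmentation_in_percent
  FROM sys.dm_db_index_physical_stats(
           DB_ID(),                           -- current database
           OBJECT_ID(N'Production.Product'),  -- table to inspect
           1,                                 -- index_id 1 = the clustered index
           NULL,                              -- all partitions
           'SAMPLED') ps                      -- LIMITED mode leaves the page-space column NULL
  JOIN sys.indexes i
    ON i.object_id = ps.object_id
   AND i.index_id  = ps.index_id
 WHERE ps.alloc_unit_type_desc = 'IN_ROW_DATA'
   AND ps.index_level          = 0;           -- leaf level only
```

Right after a rebuild to 100%, avg_page_space_used_in_percent should sit close to 100; watching it drift, alongside avg_fragmentation_in_percent, gives a rough picture of how much UPDATE-driven splitting the table sees.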
