Re: Index management

From: Brian Peasland <dba_at_remove_spam.peasland.com>
Date: Mon, 10 May 2004 21:50:09 GMT
Message-ID: <409FF911.C3AD82AC@remove_spam.peasland.com>

"Howard J. Rogers" wrote:
>
> Brian Peasland wrote:
>
> > On this last point, you and I agree, sort of.
>
> And it's a point Mike just made up. No-one has ever accused him of
> advocating rebuilding all indexes.
>
> > I don't say rebuild
> > indexes just because of some magic number. And I don't say to never
> > rebuild indexes. There are times to rebuild indexes and times not to.
> > But for me, those times are more dictated by the expected flow of data,
> > not by some magic number. If an index has a tendancy to get one-sided
> > (normally due to the use of a monotic sequence as the value to the
> > indexed column followed by deletions of "older" values) then rebuilding
> > periodically can help.
>
> See, this is where that paper you wrote is wrong, too, IIRC. If the
> index has got "lop-sided" because you deleted a lot of older values,
> then you now have a lot of empty leaf nodes on the left-hand edge of
> your index, and those empty blocks can now receive the next inserts from
> your monotonically incrementing sequence number. Which means that a
> rebuild is *not* warranted.

I would disagree with you here. The empty leaf nodes on the left-hand edge of the index cannot receive new inserts because those inserts will always be on the right hand side! I'll draw a crude diagram to illustrate my point. I have a table called EMP with an index on EMPID. The EMPID column is populated from my EMPID_SEQ sequence, so the next employee id will always be larger than previous employee id. Let's say I have an index that crudely looks like:

                                        
|                                       
                     ---------------------------------------
                     |                                     |
          --------------------                     ---------------
          |                   |                    |             |
  ----------------    ----------------    ----------------    ------

| | | | | | | | | | | | | | 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014

(Please excuse the very rough diagram of this simple B-tree structure. And I do hope that the formatting does go through for people.)

Anyway, The next employee id to this index will be 1015 which will be added to the right side of the B-tree. Since this key is increasing, it must be larger than all other previous index entries.

                                        
|                                       
                     ---------------------------------------
                     |                                     |
          --------------------                     ---------------
          |                   |                    |             |
  ----------------    ----------------    ----------------

| | | | | | | | | | | | | | |
1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015

Now as life goes one, people leave the company for various reasons. So let's assume that some people have left the company. The leaf nodes in the tree might look something like the following:

                                        
|                                       
                     ---------------------------------------
                     |                                     |
          --------------------                     ---------------
          |                   |                    |             |
  ----------------    ----------------    ----------------

| | | | | | | | | | | | | | |

          1003      1005                1009 1010 1011 1012 1013 1014

1015

Will the next employee id use one of those empty slots? It can't. The next employee will be numbered 1016. How can it reuse one of those empty slots?

Given enough time, you can have a tree structure that starts to look similar to the following:

                                        
|                                       
                     ---------------------------------------
                     |                                     |
          --------------------                     ---------------
          |                   |                    |             |
  ----------------    ----------------    ----------------

| | | | | | | | | | | | | | |

                                        1009 1010 1011 1012 1013 1014

1015

The entire left side of the tree is unnecessarily contributing to the height of the index.

This same concept gets even worse for something like invoices. The invoice numbers are generated by a sequence. All new invoices will get added to the right side of the tree. What makes this worse is that companies tend to remove the oldest invoices all at once. For instance, the company may decide to remove all of last year's invoices (or older) from the table. In that case, there won't be a sparse look to the tree like I indicated in my 3rd diagram.

Looking at these examples, how is it that those empty blocks can now receive the next inserts from your monotonically incrementing sequence number"?

> > And if I delete massive amounts of data and I
> > don't expect to add much more data in the future, then rebuilding can
> > help.
>
> Well, I'd suggest that if you'd done that to your table, you are going
> to get a much better performance improvement from re-compacting your
> *table* (alter table X move), because you are now suffering from High
> Water Mark Inflation Syndrome. Of course, after a table move, an index
> must be rebuilt for it to be usable. And we could discuss, forever I
> suspect, how to disentangle the effects of both operations on
> performance. But I think anyone who would carefully rebuild their index,
> but happily leave their table full of fresh air, has lost the plot.
>
> In which case, we are *still* not talking about just rolling out 'alter
> index X rebuild' commands.

I fully appreciate that! Yes, if you delete a very large amount of data from your table, then you might want to consider re-compacting your table as the HWM accounts for a large amount of empty space, thus affecting things like full table scans.

> >And sometimes rebuilding an index can help in one area and hurt in
> > others, but the one area that I am fixing is more important to me than
> > the other areas.
>
> That's getting rather vague (which I think is really your point: these
> things are vague and subtle and no one number is going to give you the
> green light or red). But I'll buy into it. There are always the odd
> exceptions.

Yes it is getting vague, and that was part of my point. Lots to think about.

> > But in none of these circumstances can I arrive at any
> > ratio which defines the need to rebuild or not. There is simply no query
> > you can apply the index and table metrics to arrive at these
> > conclusions.
>
> The starting point for any discussion you may have about the need to
> rebuild an index should be, I think, that Tom Kyte has rebuilt 7 of them
> in his entire professional career.
>
> And I don't mean he just got unlucky with the allocation of keyboard
> duties over time, but that he positively and actively decided to rebuild
> only 7.
>
> Your justification for rebuilding an index when taken in that light had
> better be pretty good, I think.

To me, as I've tried to say, the justification for rebuilding an index isn't as simple as coming up with a metric. There is more to it than that. Like many things in database tuning, a simple metric can't always give an accurate picture on whether or not you need to take Course_of_action_A or Course_of_action_B.

And to be honest with you, then only time I've rebuilt an index in the last two years or so was when it was marked UNUSABLE due to some other operation that I've performed, such as an ALTER TABLE MOVE command.

Cheers,
Brian

-- 
===================================================================

Brian Peasland
dba_at_remove_spam.peasland.com

Remove the "remove_spam." from the email address to email me.


"I can give it to you cheap, quick, and good. Now pick two out of
 the three"

Received on Mon May 10 2004 - 16:50:09 CDT