Introducing an Index Rating System
The Problem
Here are two indexes, the first on `(score, gamerName)` and the second on `(game, gamerName)`:
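For concreteness, here is a minimal sketch of how the two indexes might be declared in SQL; the `gamers` table name and the column types are assumptions made for illustration, not definitions taken from the diagrams.

```sql
-- Hypothetical table, assumed for illustration
CREATE TABLE gamers (
    gamerName TEXT,
    game      TEXT,
    score     INTEGER
);

-- The two indexes being compared
CREATE INDEX idx_score_gamerName ON gamers (score, gamerName);
CREATE INDEX idx_game_gamerName  ON gamers (game, gamerName);
```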
And here is a simple query:
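The exact query is shown in the diagram; for the discussion below, assume something of roughly this shape, filtering on `game` and returning `gamerName` and `score` (the filter value is made up):

```sql
SELECT gamerName, score
FROM gamers
WHERE game = 'Pinball';  -- hypothetical filter value
```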
How do we compare and describe the performance of these two indexes for this query?
First Attempt (Metrics)
At a glance, these two indexes look like they would perform exactly the same on this data in our development environment. Descriptively, they both:
- Scanned the single entry in the index.
- Fetched and returned the single matching record from the table.
- Contain 2 of the 3 columns used in the query. The first index is only missing `game`, while the second index is only missing `score`.
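These metrics describe what the database actually did when running the query. One conventional way to gather comparable numbers yourself is the database's query-plan tooling; a PostgreSQL-flavored sketch follows (the engine is an assumption; the article does not name one):

```sql
-- Reports which index (if any) was used, entries scanned, and records fetched
EXPLAIN (ANALYZE, BUFFERS)
SELECT gamerName, score
FROM gamers
WHERE game = 'Pinball';
```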
From this perspective the two indexes seem identical. Can they be used interchangeably for this query?
Inspection
We can find the answer to that question by adding some more data and comparing the two once again:
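The specific rows live in the playground; as a purely hypothetical sketch, a handful of records spread across different games and scores is enough to separate the two indexes:

```sql
-- Hypothetical additional rows (illustrative values only)
INSERT INTO gamers (gamerName, game, score) VALUES
    ('alice', 'Pinball',   120),
    ('bob',   'Pac-Man',   480),
    ('carol', 'Tetris',    900),
    ('dave',  'Asteroids',  75);
```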
Here, and across the site, we use IndeX-Ray™ to provide imaging from the database. This technology lets us see how the database actually uses an index to find data. Check out the interactive diagrams directly in the article, or open any of them in the playground to explore (and modify) them further.
Even with just 4 additional records, we can already see that these two indexes cannot be used interchangeably for the query. The first index neither limits the search nor covers the query, so every record still has to be fetched. That means the database had to do a full scan of the index plus a full scan of the table. For this query, the index is therefore worse than a full table scan and effectively useless.
What happened before is that we fell into the 'It Works in Dev' trap. This was compounded by an inability to succinctly and consistently describe the characteristics of the indexes for a given query. In particular, the ordering of columns in indexes is important. We need a quick and easy way to indicate how effective an index is for a query even when the execution metrics can be misleading.
Index Rating System Overview
Enter our index rating system. Rather than reinvent the wheel, we are using the tried-and-true method of star ratings.
The guiding question behind the rating system is:
What makes an index good?
We know that there are a number of things that indexes help with. These include:
- Reducing the amount of data searched when executing a query, driven primarily by column ordering in the index.
- Retrieving results in the order requested by the user.
- Returning results directly (avoiding accessing records altogether).
It makes sense to use these as the basis for the rating system, which is exactly what we have done. In short, the more effectively an index does these things (when applicable) for a query, the higher it will be rated for it.
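As a hedged illustration of all three points together (the query and index below are hypothetical, not taken from the playground): an index whose leading column matches the filter limits the search, whose next column matches the requested ordering returns rows pre-sorted, and which contains every referenced column avoids touching the table at all.

```sql
-- Hypothetical query: top scores for a single game
SELECT gamerName, score
FROM gamers
WHERE game = 'Pinball'
ORDER BY score DESC;

-- One index that addresses all three points for that query:
--   game      -> limits the search
--   score     -> provides the requested ordering
--   gamerName -> lets the index cover the query
CREATE INDEX idx_game_score_gamerName ON gamers (game, score DESC, gamerName);
```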
So how does this approach shape up when applied to the sample indexes above?
Example Application
As the playground link will show, the ratings are:
- `(score, gamerName)` earns 0 out of a possible 5 stars
- `(game, gamerName)` earns near full marks
Why?
As noted above, the first index is not helpful for this query at all. It adds a full index scan on top of (randomly) accessing the full table. The index could, however, be very helpful for a different query (such as one that finds the `gamerName`s of users with the top `score`s). The rating of this index for that query would be quite different.
The second index is very helpful for the current query. The only filtering criterion is on the `game` field, so this index allows only relevant data to be processed. The only thing that could make the index more useful for this query would be to have it cover the operation (by including `score` in its definition), though doing so would trade some cost elsewhere in the system for better performance here.
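A sketch of what that covering variant might look like, assuming PostgreSQL-style `INCLUDE` (appending `score` as a trailing key column would work as well):

```sql
-- Extends (game, gamerName) so the index also carries score,
-- letting the query be answered from the index alone
CREATE INDEX idx_game_gamerName_covering
    ON gamers (game, gamerName) INCLUDE (score);
```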
Limitations
Is the system perfect? Of course not.
Consider the following query:
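The query itself is in the playground; assume something of roughly this shape against a hypothetical `events` table, with a filter on each column:

```sql
-- Hypothetical table, query, and indexes for the selectivity example
SELECT *
FROM events
WHERE selectiveCol = 'rare-value'  -- matches very few rows
  AND booleanCol   = true;         -- matches many rows

CREATE INDEX idx_selective ON events (selectiveCol);
CREATE INDEX idx_boolean   ON events (booleanCol);
```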
The current system rates the two indexes `(selectiveCol)` and `(booleanCol)` the same. But if a filter on `selectiveCol` matches very few records whereas a filter on `booleanCol` matches many, then the performance of each would look as follows:
In this case, the two indexes would perform very differently and `(selectiveCol)` should be preferred. This is really just the data sampling problem from the beginning of this article in disguise. Apart from the suggestive column names we used here, there is no way for the system to determine how accurately that sample data reflects the actual data in the database, either now or in the future.
In other words, the system currently does not take the values of the sample data into consideration for the rating. That sample data is important though. It is reflected in, and indeed the driver of, the efficiency metrics that accompany an index. Those metrics will always be there and provide that additional perspective that ratings alone may miss.
There are certainly some other factors that may be worth including in the future. These could include constraints defined on the schema (such as a `unique` column) or the column's data type (a `timestamp` is probably more selective than a `date`, which may be more selective than a `boolean`, for example). We will post more about those factors as we roll out those changes.
Summary
Virtually the sole reason that indexes are used by databases is to speed up queries. Necessarily that means that nearly all discussions about indexes are focused on describing how 'good' or effective they are. While there is a lot of definitional nuance there, having a concise and consistent way to describe them is important.
We have introduced a simple and intuitive rating system to make reasoning and talking about indexes as easy as possible. Complemented by efficiency metrics and the innovative visuals powered by IndeX-Ray™, it makes comparing indexes for queries more straightforward than ever.
Need to go deeper?
- Use IndeX-Ray™ to get a free checkup of your index, no waiting required.
- Check out how indexes can return data directly.
- Subscribe to the newsletter to hear when new articles like this are published and to keep up to date with other exciting announcements.