Covered Queries
Indexes allow databases to efficiently find stored information. To do so they must contain an updated copy of the associated information from the records. Sometimes the information in the index alone is enough to respond to a query. This article takes a brief look at that additional optimization which is typically referred to as an Index Only Scan or a Covered Query.
One quick note before we begin. The technique described here is pretty powerful but also quite specific. While all indexes are designed for queries, covering indexes usually limit flexibility even more. There are often ample other performance improvements that should be explored prior to covering, and relying on covering to reach performance goals can be a gamble that may not pay off.
Setup
For this demonstration, consider a subset of our gaming schema populated with the following entries:
ID | gamerName | score |
---|---|---|
1 | Ice | 0 |
2 | Sky | 9 |
3 | Ray | 3 |
4 | Ice | 2 |
5 | Sky | 1 |
6 | Ice | 4 |
7 | Sky | 2 |
When a user is logged in the application will need to retrieve the scores by that user from the database. The filtering conditions could be expressed as the following SQL query:
As we know, an index can be used to execute queries like this efficiently. Let's create one now on the gamerName
column used by the query:
Here, and across the site, we use IndeX-Ray™ to provide imaging from the database. This technology allows us to see how the database actually uses an index to find data. Check out the interactive diagrams here in the article directly or open any of them in the playground directly to explore (and modify) them further.
The database did exactly as we were expecting, using the index in a targeted manner to read only the records that were of interest to us. Specifically it does a good job of removing unnecessary records from the table via the WHERE
clause:
This query and index are designed in a way that will allow the database to scale successfully over time. This is pretty typical of what we might see in a real application and does not necessarily need any further attention. But while we are here, is there anything that we can do to optimize it evenmore?
(Further) Narrowing the results
It turns out that there is more that we can do though by taking a look at the actual application requirements. Recall that we are just displaying the score
s produced by the logged in user. This draws our attention to the beginning of the query:
The *
here represents all columns in the table, but the application only needs the value from a single one! While the purpose of this article is not to discuss discuss the SELECT *
pattern, we certainly do not need the query to retrieve all of the columns. Based on this requirement, we can further narrow our results by removing extra information from the records that get retrieved by the application. Adjusting the SELECT
portion of the query slightly makes it:
Does this change execution at all?
Be sure to click the play button to see the database in action!
Not yet in terms of the query execution plan itself. There are indeed other benefits that come about from this change by itself, but determining what those are is an exercise left up to the reader (hint: some of them may be related to network traffic). That said, we are not done in terms of improving the query execution itself. Let's keep going!
When we look again at the final query, we can observe that there are only two columns, gamerName
and score
, are mentioned. The index already contains gamerName
. This suggests that the database is reading the entire record simply to retrieve the value of the one other column (score
).
Can we help the database out by giving it that additional data in the index? Let's update the index to include the other column (score
) and see what happens:
Ah now there is a change!
The database scans the same number of leaf nodes (3) using this index as it did with the previous one. The differenc is that now the index alone contains both the score
information needed for the WHERE
clause as well as the gamerName
information that should get returned to the client. This represents all of the information needed to process the query meaning that there is no reason for the database to retrieve and process the record itself. The database knows this and therefore removes that extra work from the query plan altogether. The result is this 'Index Only Scan' or 'Covered Query' seen above.
Summary
Creating an index that fully covers a query can be a powerful optimization. Doing so allows the database to skip processing records altogether which is a direct reduction in the amount of work that the database and its host server are doing. Among others, one benefit is that such queries process faster.
This technique can be opportunistically pursued for queries that need to run as quickly as possible or for query and index pairs that are already close to being covered already. It should not, however, be the goal of all or even most queries. Indexes can, and should, support multiple queries. But every additional column that is added to an index definition typically makes it less flexible for supporting other operations. Those larger indexes also create incrementally more overhead which adds up over time.
So use covered queries, but do so responsibly.
Need to go deeper?
- Use IndeX-Ray™ to get a free checkup of your index, no waiting required.
- Check out how indexes can return data directly.
- Subscribe to the newsletter to hear when new articles like this are published and to keep up to date with other exciting announcements.