Blog

Care, for your database

← Blog

EXPLAIN Quirks: PostgreSQL's Missing Index Usage Details

15 min read

Last updated 23 Jan 2025

The EXPLAIN command provides a window for us to peek through to see how a database executes a query. That knowledge is a fundamental and indispensable component when it comes to understanding and fixing slow queries.

The format and "language" of EXPLAIN output varies greatly by vendor. PostgreSQL's variant has some nice qualities, with the BUFFERS option being particularly insightful. As always though, there is plenty of room for improvement.

This article discusses two peculiar aspects of PostgreSQL's format that ends up hiding some really basic details about the query plan. Fair warning, once you become aware of this gap there's no unseeing it!

Update: There is an open proposal (5063) to address the very gaps described in this post!

The Problem

Consider a (denormalized) table that represents scores from playing various video games. You can see that schema and interact with the playground powering all of the demos in this article here.

If we want to find entries for a specific game ('Cyber Quest') that are above a certain score (1400), we could do so with the following SQL:

It is possible for the database to use either of the indexes when running the query:

  1. (score, game)
  2. (game, score)

We can confirm that claim by looking at the corresponding EXPLAIN outputs[1] from PostgreSQL, in brief:

Does this mean that these indexes work equally well for this query? IndeX-Ray quickly shows us the answer is absolutely not!

With the (score, game) index, the database has to scan through half of the entries in the index to find the record of interest:

When using (game, score) index, the database is able to find the record right away:

Here, and across the site, we use IndeX-Ray™ to provide imaging from the database. This technology allows us to see how the database actually uses an index to find data. Check out the interactive diagrams here in the article or open the playground to explore your own.

There are two aspects of the EXPLAIN format used by PostgreSQL that are problematic and keep this information hidden from us:

  1. The representation of indexed conditions
  2. The lack of metrics about the work performed by the index scan stage.

Let's take a closer look at each of these.

Index Boundaries vs Filtering

The first is problem with the EXPLAIN format is that the value of the Index Cond field is identical in the two examples (apart from the order in which they are reported):

This implies that the database is able to use the indexes in the same way for this query. Is this actually true? Spoiler: It turns out that this is terribly misleading.

An important fact to be aware of is that there are two different techniques that can be used to check if an indexed value matches a condition of the query (e.g. a WHERE predicate). Use the Index, Luke! describes this as follows (emphasis added):

  • access predicates: The access predicates express the start and stop conditions for the leaf node traversal.
  • filter predicates: Filter predicates are applied during the leaf node traversal only. They don’t contribute to the start and stop conditions and do not narrow the scanned range.

We won't get into the details in this article, but the first of these approaches is way better and is used by the database whenever possible.

Returning to our example, the EXPLAIN output itself reported identical Index Conds for the two indexes. This is at least misleading, if not downright wrong, because it completely hides the distinction between the two different ways that the index can be used to check a predicate condition! The IndeX-Ray diagram helps reveal that:

  • For (game, score), both are access predicates.
  • For (score, game), only score is an access predicate. The game predicate is being applied as a filter predicate!

PostgreSQL itself obviously knows about the distinction between the two techniques as it is actively applying them. Why is this crucial piece of information intentionally obscured in the EXPLAIN output?

Growing Awareness

Updated: We are very grateful to Michael from pgMustard who generously spent some time delving into this topic with me. This section has been rewritten based on that discussion, and it's much better as a result!

Much of our surprise and confusion on this topic stems from the official PostgreSQL documentation itself. Naturally that was the first place that we checked to learn more about how to properly interpret the EXPLAIN output. As far as we can tell at the time of writing, there is no mention about these two different behaviors in the documentation at all.

Thankfully other sources are more helpful in this regard. Our friends at pgMustard (which is our go-to spot for all things related to PostgreSQL query execution) state the following in their glossary entry for Index Cond:

The condition(s) used to find the locations of rows from the index.

Postgres uses the structured nature of the index to jump to the rows it's looking for. Different index types use different strategies.

Unfortunately, Postgres doesn’t yet distinguish between conditions used to access and conditions used to filter rows as part of the Index Cond. As such, you need to compare the index definition and query conditions to work it out, and can compare the COSTS or BUFFERS between query plans as a check.

Any conditions that could not be used to search the index will be added as a Filter, which later discards any rows necessary based on their values. This is naturally less efficient.

This is a succinct definition that is approachable, accurate, and actionable! It's quite an impressive combination really.

That page also links to additional resources in case the reader wants to learn more. One of those additional resources happens to be another page from Use the Index, Luke! which we had failed to discover when browsing it previously. Nested under Execution Plans is a section for PostgreSQL and a page specifically about Distinguishing Access and Filter-Predicates. That page gives an example of this problem and directly states:

PostgreSQL execution plans do not show index access and filter predicates separately—both show up as “Index Cond”. That means the execution plan must be compared to the index definition to differentiate access predicates from index filter predicates.

Note: The PostgreSQL explain plan does not provide enough information for finding index filter predicates.

Perhaps most importantly, there is an open proposal for PostgreSQL (ID: 5063) to address the very gaps described in this post! Put forth in June of 2024, the original message includes the following:

However, it's not easy to confirm whether multi-column indexes are being used efficiently because we need to compare the index definitions and query conditions individually.

...

I'm considering adding new information, "Index Bound Cond", which specifies what quals will be used for the boundary condition of the B-Tree index.

We'll be watching that one closely.

Now onto the second glaring omission in EXPLAIN output which is the actual work done by index scans!

Index Scan Work

One of the helpful options to EXPLAIN is ANALYZE. It does the following according to the documentation:

Carry out the command and show actual run times and other statistics.

We can use the same query from earlier to demonstrate a second quirk about PostgreSQL's EXPLAIN ANALYZE metrics. To do so we will first swap the (game, score) index for a (score) index. With that index the EXPLAIN ANALYZE output now contains (in part) the following:

As expected, the database used the index to only process rows that have the correct score value. Since the other column used in the query (game) is not part of the index, the database must filter the candidate rows to remove any that don't match. In our example that extra and inefficient work is surfaced via the Rows Removed by Filter entry.

IndeX-Ray reveals this same information, which is visually represented as the yellow records that were scanned but not returned to the client in the result set:

Having this information directly available, both in EXPLAIN ANALYZE and through IndeX-Ray, is great because it allows us to be aware of wasted work done while processing a query!

We saw in the previous section that this same filtering technique can be applied while scanning the index. Even though the distinction isn't made in the Index Cond representation, surely the filtering metrics are surfaced by EXPLAIN ANALYZE?

The IndeX-Ray diagram for the query using the (score, game) index does indeed show the filtering performed during the index scan via the leaf nodes that are yellow (and the related records not having to be scanned at all):

And the EXPLAIN ANALYZE output? In full we have:

There is absolutely no indication about this filtering activity!

In other words, the database could do an arbitrarily large amount of work during the index scan but not reveal any of it in the EXPLAIN ANALYZE output.

Final Thoughts

All databases have their own quirks when it come to the "language" they use in their EXPLAIN formats, and PostgreSQL is no different in that regard. The two items discussed here are also far from the only oddities in PostgreSQL (looking at you index-only scans that aren't).

The surprising thing here is that the missing details here represent core concepts about index efficiency. In turn, index efficiency is typically at the heart of query optimization and often a motivating reason for looking at EXPLAIN output in the first place. What's the motivation and justification to keep this information from showing up in EXPLAIN where it is seemingly needed the most?

As far as we can tell, there is really no way for an average user to discover any of this information. The biggest hint about there being a difference between the two indexes in our original example would be the fact that PostgreSQL will probably actively resist using the (score, game) index under most conditions. The database is usually correct in making that decision, but it is bewilderingly difficult to learn why this is without existing database indexing expertise.

We hope that future versions of PostgreSQL have improvements to its EXPLAIN output to address the two problems highlighted in this article.

These two problems are just a couple of fascinating things that IndeX-Ray has shown us about PostgreSQL. What will it help you discover about your database?

Interested in this and want more?

  • Use IndeX-Ray™ to get a free checkup of your index, no waiting required.
  • Hop into our Discord to say hi, tell us what you thought about this article, or ask your own question about databases.
  • Subscribe to our newsletter to hear when new articles like this are published and to keep up to date with other exciting announcements.

Stay updated

Appendix

Changelog

23 Jan 2025

Courtesy of information and pointers provided by Michael Christofides, the Uncommon Knowledge? section was rewritten as Growing Awareness. It includes the new definion of "Index Cond" provided by pgMustard, a link to a page about this problem on Use the Index, Luke!, and a link to a proposal to enhance PostgreSQL EXPLAIN output to address this gap.

Notes

1. PostgreSQL presently offers four format types for EXPLAIN output: TEXT, XML, JSON, or YAML. The default is TEXT, but we have used JSON in this article. We also only provided the relevant portions of the output (unless otherwise noted) for readability purposes.

Setup

The playground for this article can be interacted with here. The Edit tab contains the DML and DDL used to configure it.

Table description:

Column NameData TypeConstraintsDescription
playerVARCHARNOT NULLName of the player
gameVARCHARNOT NULLTitle of the video game
scoreINTNOT NULLScore achieved by the player in the game

Sample data:

PlayerGameScore
Shadow WolfCyber Quest890
Pixel KingDragon Fury4600
Pixel KingShadow Racer1550
Shadow WolfDragon Fury4500
Dragon SlayerShadow Racer1400
Dragon SlayerCyber Quest870
Shadow WolfShadow Racer1500
Dragon SlayerCyber Quest4400
Pixel KingCyber Quest920
Shadow WolfCyber Quest895