Saturday, June 23, 2018

Planning queries involving foreign PostgreSQL tables

Cost based optimization

A query may be executed in many different ways, modelled as plans in query optimizer, differing in resources required and/or execution time. A typical DBMS's query optimizer tries to find all the possible plans for executing a given query and chooses the fastest plan amongst those. But it's not possible to calculate the time required by a plan unless the query is executed. Thus an optimizer tries to associate an estimate of execution time with each possible plan and choose the one with the least estimated value. PostgreSQL is no different. It associates a cost with each possible plan. The cost is a rough estimation of the time required to execute the query. The plan with the lowest cost is chosen for execution. The time required to execute a query is sum of time required to perform various operations involved in the plan being executed e.g. time required to scan the tables in the query, time required to compute joins, etc. Thus a plan's cost is sum of the costs of operations involved in the plan. In order to efficiently and correctly estimate the cost of a plan, PostgreSQL maintains the statistics about sizes of tables, indexes, the values stores in various columns of tables and so on. In a DBMS, where data keeps changing, the statistics often gets stale and needs to be updated. PostgreSQL keeps the statistics up-to-date by frequently sampling the tables. This works reasonably well as long as the tables involved in the query are part of the DBMS.

But now a days, often, applications run queries which require data external to the DBMS. PostgreSQL supports querying external data through a Foreign Data Wrapper (FDW in short), a method based on SQL/MED standard. We will discuss the methods employed by the query optimizer to plan such queries and methods to maintain the statistics about the external data, esp. the data residing in other PostgreSQL server/s, in this post.

Foreign tables and statistics

PostgreSQL allows external data to be represented as "foreign tables". While PostgreSQL scans the local regular tables frequently to keep the statistics up-to-date, it can not do so in case of a "foreign table", since accessing external data itself might consume precious network bandwidth and might take longer than accessing local data. If the "foreign table" is not accessed frequently, it performed network I/O for no reason. Hence PostgreSQL does not sample external data or foreign tables frequently by itself. Instead a user is required to run ANALYZE command on a foreign table periodically. As part of this command, FDW brings samples of external data to PostgreSQL which, in turn, derives the requires statistics from it.

Costing queries involving foreign tables

When a query involves foreign tables, the PostgreSQL optimizer works with the corresponding FDWs to produce various plans for that query. For example, if the query has a join between two foreign tables which use the same FDW, PostgreSQL has two choices.
  1. Fetch the foreign table data from the foreign server (optionally applying any conditions at the foreign server) and perform join locally. In this case, FDW is responsible for costing the scans on the foreign server.
  2. If the FDW is capable of performing the join itself, then delegate the join to the FDW. In most cases, this means that the two foreign tables reside on the same foreign server which is capable of performing the join in a way that the result of join is same as PostgreSQL. In this case, FDW is responsible for computing the cost of the join.
Similar logic is applied for other SQL operations like grouping, sorting etc. An FDW may use the statistics collected by PostgreSQL, use PostgreSQL's costing methods or employ entirely different methods to collect statistics and/or compute costs. Each FDW may implement its own costing model. But it is expected to produce costs that are consistent with the rest of the optimizer. Next we will take example of postgres_fdw.

postgres_fdw costing model

postgres_fdw is used for accessing external data from an other PostgreSQL server. It uses two different modes for computing costs, governed by option "use_foreign_estimate". Read more about this option here.
  1. When 'use_remote_estimate' is true, postgres_fdw fetches the costs from the foreign server using EXPLAIN.
  2. When it's false, postgres_fdw computes the costs based on the statistics about external data available locally.

The second method works fine for a simple scan on a foreign table as long as the statistics about the foreign table is kept up-to-date. But it doesn't do justice to complex operations like join, grouping whose performance depends upon a number of factors like availability of suitable indexes, memory for hash table or sorting, which are not covered by the statistics.

Take, for example, the following query involving two foreign tables ft1and ft2, pointing to tables t1 and t2 on the foreign server and each having columns c1 to c8. The plan with "use_remote_estimate" is disabled looks like:
explain (analyze) select * from ft1 t1 join ft2 t2 on t1.c1 = t2.c1 and t1.c1 + t2.c1 <= 20;
                                                       QUERY PLAN                                                        
 Hash Join  (cost=247.94..301.01 rows=274 width=108) (actual time=8.289..13.270 rows=8 loops=1)
   Hash Cond: (t1.c1 = t2.c1)
   Join Filter: ((t1.c1 + t2.c1) <= 20)
   Rows Removed by Join Filter: 814
   ->  Foreign Scan on ft1 t1  (cost=100.00..137.66 rows=822 width=54) (actual time=1.223..5.918 rows=822 loops=1)
   ->  Hash  (cost=137.66..137.66 rows=822 width=54) (actual time=7.050..7.050 rows=822 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 83kB
         ->  Foreign Scan on ft2 t2  (cost=100.00..137.66 rows=822 width=54) (actual time=1.203..6.684 rows=822 loops=1)
 Planning Time: 0.307 ms
 Execution Time: 13.887 ms
(10 rows)

When we enable "use_remote_estimate" for both the tables, the plan changes to
explain (analyze) select * from ft1 t1 join ft2 t2 on t1.c1 = t2.c1 and t1.c1 + t2.c1 <= 20;
                                            QUERY PLAN                                            
 Foreign Scan  (cost=131.47..172.63 rows=274 width=108) (actual time=2.153..2.154 rows=8 loops=1)
   Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
 Planning Time: 3.755 ms
 Execution Time: 2.336 ms
(4 rows)

Observe the difference in execution and planning time. When "use_remote_estimate" is true, the planning time is ten times more than the planning time with "use_remote_estimate" off. But the execution time with "use_remote_estimate" is true is almost 6 times lesser compared to that when "use_remote_estimate" is false. The planning time has increased so much since it fires many EXPLAIN queries on the foreign server trying to cost possible plans for join. If you have turned auditing ON on the foreign server, you will find that for planning the above query, postgres_fdw has fired following EXPLAIN commands on the foreign server.

  • EXPLAIN SELECT c1, c2, c3, c4, c5, c6, c7, c8 FROM t1
  • EXPLAIN SELECT c1, c2, c3, c4, c5, c6, c7, c8 FROM t2
  • EXPLAIN SELECT c1, c2, c3, c4, c5, c6, c7, c8 FROM t1 ORDER BY c1 ASC NULLS LAST
  • EXPLAIN SELECT c1, c2, c3, c4, c5, c6, c7, c8 FROM t1 WHERE (((c1 + ((SELECT null::integer)::integer)) <= 20)) AND ((((SELECT null::integer)::integer) = c1))
  • EXPLAIN SELECT c1, c2, c3, c4, c5, c6, c7, c8 FROM t2 ORDER BY c1 ASC NULLS LAST
  • EXPLAIN SELECT c1, c2, c3, c4, c5, c6, c7, c8 FROM t2 WHERE (((((SELECT null::integer)::integer) + c1) <= 20)) AND ((((SELECT null::integer)::integer) = c1))
  • EXPLAIN SELECT r1.c1, r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8, r2.c1, r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8 FROM (t1 r1 INNER JOIN t2 r2 ON ((((r1.c1 + r2.c1) <= 20)) AND ((r1.c1 = r2.c1))))

Each such EXPLAIN is a network trip to the foreign server, which consumes network bandwidth and time. While turning "use_remote_estimate" ON, improved query performance, it has drastically increased query planning time. In some cases the planning and execution time together may turn out to be same with and without "use_remote_estimate" turned ON. That takes away all the benefit of join or grouping push-down, which isn't good. This may not sound that bad when only some of the queries involve foreign tables or when the data volumes are so high that the planning time, even including those network trips, are only tiny fraction of the total execution time. But it does matter a lot when we move towards an FDW based built-in sharding as described in a blog by Robert Haas and in the slides by my friends from Japan and by Bruce Momjian.

We could reduce the network trips, and thus the reduce planning time, if we turn "use_remote_estimate" OFF, but then the plan comes out to be poor as seen above. This happens because PostgreSQL tries to cost the plans without knowing the capabilities of the foreign server, the plans that the foreign server can "think of". What if we could combine best of the two approaches by making local PostgreSQL "think" like the foreign PostgreSQL server? That's possible at-least in theory.

Costing foreign operations locally

After all the foreign server in this case is a PostgreSQL with costing model, optimizer and executor same as the local PostgreSQL. If the local PostgreSQL knows values of all the parameters which affect query optimizer, the costs it could compute locally would be much closer to the cost it gets from EXPLAIN output. Here's a rough list of what those parameters are:
  1. Various GUCs that affect the query optimizers. There are three classes of GUCs: a. the GUCs that determine the costs of certain operations like random_page_cost and cpu_operator_cost. b. the GUCs that enable/disable certain planner strategies like enable_hashjoin. c. the GUCs that constrain resources that the query executor can use e.g. parallel workers, memory.
  2. Statistics about the tables involved in the query. We already have a method to gather statistics on the remote tables, but it requires scheduling ANALYZE commands manually. Further-more when a foreign table is ANALYZEd, postgres_fdw fetches sample data from the foreign server and derives statistics from it. That consumes network bandwidth and time. It can do better by fetching the "statistics" itself from the foreign server. After-all the statistics on the foreign server was collected by a method similar to the one used to derive statistics locally. It will also help whether or not to ANALYZE a foreign table automatically is left to the user to decide. A user may be happy to spend the network bandwidth but keep the statistics up-to-date. In that case, s/he will be happy to let that happen automatically like a regular table rather than set up a cron-job and maintain it.
  3. Metadata used by the query optimizer. Query optimizer uses the knowledge about the constraints on the table, indexes on the table to create and cost various plans. Right now PostgreSQL supports "declarative constraints" on the foreign table, i.e. constraints which are not enforced but used by the query optimizer. But a user needs to set those up itself. It would be better if postgres_fdw can setup them up itself by knowing the constraints on the foreign server. However, PostgreSQL has no knowledge of indexes available at the foreign server. It would be better if postgres_fdw support declarative indexes on the foreign table as well.

Empowered with this knowledge, the costs of delegated operations computed locally using the local costing model would be much closer to the actual costs computed at the foreign server. This would eliminate any need to have a "use_remote_estimate" option.

The tricky part is to keep this knowledge about the foreign server up-to-date. The values of GUCs and the metadata may change on the foreign server without local PostgreSQL knowing about it. But those changes are not that frequent and syncing those sufficiently frequently would suffice. Statistics about the foreign data, however, may change rapidly if the transaction rate on the foreign server is high. Keeping that fresh enough to choose the optimal plans would be a challenge. But it can be achieved if we allow the foreign server to periodically push the information to the local server and over a right wire-protocol.


  1. Nice write up, Ashutosh!

    It seems you need to swap where the last two links in the following sentence point to.

    "But it does matter a lot when we move towards an FDW based built-in sharding as described in a blog by Robert Haas and in the slides by my friends from Japan and by Bruce Momjian."

    1. Thanks Amit for pointing out the mistake. I have fixed the links now.