Walking with the Elephants: November 2016

PostgreSQL: Aggregate push-down in postgres_fdw

It’s exciting times in PostgreSQL for those who are using postgres_fdw or FDW in general. Users often complained that a simple count(*) on a foreign table was slow since PostgreSQL required to pull all the rows from the foreign server just to count them to produce count(*). “What a waste of resources”, they complained, “Why just it doesn’t get the count from the foreign server?”. The complaint was completely valid. It was easier said than done. But no more complaining! PostgreSQL 10 (the next version of the world’s largest open source database) will do exactly that. Towards the end of version 9.6, Tom Lane provided the basic support in core for FDWs to delegate aggregation, grouping operations to the foreign server. Jeevan Chalke modified postgres_fdw to use this support and also patiently catered to my complaints about the code. Robert Haas ultimately committed it. It’s there for you to try out!

Aggregate push-down

For the simplicity, in this post, we will assume that any table qualified by "remote" schema is a foreign table. Any unqualified table or table qualified by other schema is a regular table.

In PostgreSQL 9.6 if you had a query to count number of customer in each country, it would produce a plan like:

explain (verbose, costs off) select count(*) from remote.customer group by country_code;
QUERY PLAN
--------------------------------------------------------------
HashAggregate
Output: count(*), country_code
Group Key: customer.country_code
-> Foreign Scan on remote.customer
Output: name, country_code
Remote SQL: SELECT country_code FROM public.customer

Remote SQL is the SQL query sent to the foreign server to fetch required columns from the foreign server. We can see that it will fetch all the rows from the foreign server and then count them locally. That’s what users complained about. But with aggregate push-down support in postgres_fdw, now the query plan looks like:

explain (verbose, costs off) select count(*) from remote.customer group by country_code;
QUERY PLAN
----------------------------------------------------------------------------------------
Foreign Scan
Output: (count(*)), country_code
Relations: Aggregate on (remote.customer)
Remote SQL: SELECT count(*), country_code FROM public.customer GROUP BY country_code

Remote SQL in this plan shows us that count(*) gets computed on the foreign server and instead of all the rows, only a single number is fetched from the foreign server. Isn’t that wonderful?

Performance impact

What does that get us? In 9.6 above query run on 1M customer records across 100 countries, would take tons of time.

explain (analyze, costs off) select count(*) from remote.customer group by country_code;
QUERY PLAN
-----------------------------------------------------------------------------------
HashAggregate (actual time=3411.904..3411.929 rows=100 loops=1)
Group Key: country_code
-> Foreign Scan on customer (actual time=0.749..3067.483 rows=1000000 loops=1)
Planning time: 0.090 ms
Execution time: 3412.523 ms
(5 rows)

In version 10, it would take much lesser time

explain (analyze, costs off) select count(*) from remote.customer group by country_code;
QUERY PLAN
--------------------------------------------------------------
Foreign Scan (actual time=353.786..353.896 rows=100 loops=1)
Relations: Aggregate on (remote.customer)
Planning time: 0.104 ms
Execution time: 354.197 ms
(4 rows)

In 9.6 the foreign server spent time and bandwidth in pushing 1M rows on the network and then the local server spent time and bandwidth to pull those many rows off of network. But in version 10, the foreign server pushed only 100 rows saving time and bandwidth of both the servers, making users happy, since the aggregate is now 10 times faster than what it used to be!

Join push-down and aggregate push-down

Good news is, this works with join push-down feature added in 9.6. So, one could calculate the country-wise consumption of a particular product like:

explain (verbose, costs off)
select count(*) from remote.product p left join remote.orders o on (p.prod_id = o.prod_id) left join remote.customer c on (c.name = o.customer) group by p.prod_id, c.country_code;

QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Foreign Scan
Output: (count(*)), p.prod_id, c.country_code
Relations: Aggregate on (((remote.product p) LEFT JOIN (remote.orders o)) LEFT JOIN (remote.customer c))
Remote SQL: SELECT count(*), r1.prod_id, r4.country_code FROM ((public.product r1 LEFT JOIN public.orders r2 ON (((r1.prod_id = r2.prod_id)))) LEFT JOIN public.customer r4 ON (((r4.name = r2.customer)))) GROUP BY r1.prod_id, r4.country_code

It would compute the whole query on the foreign server and give us just the result.

I have taken example of count(*), since that’s what most users complained about. But all kinds of aggregates are pushed down as long as postgres_fdw feels confident that it will get the same results whether aggregation happens locally or on the foreign server. That’s kind of a safety precaution every push-down takes.

There’s still something missing

If you want the counts sorted by prod_id and country_code, the sorting still needs to happen after we
have fetched the aggregates. It’s quite possible that the remote server orders the results by same columns as group by clauses, while performing aggregation. It won’t require any extra CPU, if we asked the foreign server to give the result sorted by those columns. But right now, it doesn’t. Hopefully, someone would get time to add few lines of code to do that.

Wednesday, November 2, 2016

PostgreSQL: Aggregate push-down in postgres_fdw

Aggregate push-down

Performance impact

Join push-down and aggregate push-down

There’s still something missing