Tuesday, August 8, 2023

Partitioning as a query optimization strategy?

I discussed query optimization techniques applicable to queries involving partitioned tables at PGConf.India 2018 (video recording, slides). (My previous blog discusses these techniques in detail.) The takeaway from that presentation was that these techniques improved query performance if the tables were already partitioned, but partitioning wasn't good enough as a query optimization strategy by itself, even when partitionwise join, partitionwise aggregate and parallel query were all used together on small data sizes. Experiments then hinted that if the data was large enough, partitioning would become a query optimization strategy, but we didn't know how large was large enough. Establishing that would require beefy machines with larger resources, which were costly and took a long time to procure or get access to. On top of that, it took a long time to set up and finish the runs. At one point we stopped experimenting. Fast forward to today and things have changed drastically, thanks to the cloud!

EDB's BigAnimal comes to help

EnterpriseDB offers PostgreSQL-as-a-service in the form of a DBaaS platform called BigAnimal. It allows its users to deploy and run PostgreSQL in the cloud on a hardware configuration of their choice. It also provides free starter credit to try out the platform. I used BigAnimal to experiment with very large datasets. I ran the experiments on PostgreSQL 15 hosted on an m5.4xlarge instance (64 GB RAM, 16 vCPUs) with 1500 GB of storage. All of this without spending much time or money: I destroyed the instance as soon as my experiments were over.

Experiment

I wanted to see the impact of partitioning alone as a query optimization strategy. So instead of using the whole TPCH setup, I crafted a micro-benchmark with two queries involving two tables, li and ord, modeled after the lineitem and orders tables in the TPCH benchmark. When partitioned, the two tables have 1000 matching partitions each. The tables have the following schema:

\d+ li
                                            Table "public.li"
 Column |  Type   | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
--------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
 key1   | integer |           | not null |         | plain    |             |              |
 key2   | integer |           | not null |         | plain    |             |              |
 d      | date    |           |          |         | plain    |             |              |
 m      | money   |           |          |         | plain    |             |              |
 t1     | text    |           |          |         | extended |             |              |
 t2     | text    |           |          |         | extended |             |              |
Indexes:
    "li_pkey" PRIMARY KEY, btree (key1, key2)
Access method: heap

\d+ ord
                                           Table "public.ord"
 Column |  Type   | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
--------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
 key1   | integer |           | not null |         | plain    |             |              |
 d      | date    |           |          |         | plain    |             |              |
 m      | money   |           |          |         | plain    |             |              |
 t1     | text    |           |          |         | extended |             |              |
 t2     | text    |           |          |         | extended |             |              |
Indexes:
    "ord_pkey" PRIMARY KEY, btree (key1)
Access method: heap

When partitioned, the tables are partitioned by range on key1. Each row in ord has 3 matching rows in li, roughly imitating the data-size ratio between the corresponding tables in the TPCH benchmark.
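
For reference, a setup like this can be created with DDL along the following lines. This is only a minimal sketch, not the exact scripts I used: it assumes the largest configuration (1000 partitions with 10,000 ord keys per partition), the partition names ord_pN/li_pN are arbitrary, and the data-generating expressions are merely illustrative.

-- Partitioned variants of the two tables, range partitioned on key1.
-- (The unpartitioned variants are identical, minus PARTITION BY and the partitions.)
CREATE TABLE ord (
    key1 integer NOT NULL,
    d    date,
    m    money,
    t1   text,
    t2   text,
    PRIMARY KEY (key1)
) PARTITION BY RANGE (key1);

CREATE TABLE li (
    key1 integer NOT NULL,
    key2 integer NOT NULL,
    d    date,
    m    money,
    t1   text,
    t2   text,
    PRIMARY KEY (key1, key2)
) PARTITION BY RANGE (key1);

-- Matching partitions on both tables, 10,000 key1 values per partition.
DO $$
BEGIN
    FOR p IN 0..999 LOOP
        EXECUTE format('CREATE TABLE ord_p%s PARTITION OF ord FOR VALUES FROM (%s) TO (%s)',
                       p, p * 10000, (p + 1) * 10000);
        EXECUTE format('CREATE TABLE li_p%s PARTITION OF li FOR VALUES FROM (%s) TO (%s)',
                       p, p * 10000, (p + 1) * 10000);
    END LOOP;
END;
$$;

-- One ord row per key1 value, three matching li rows per ord row.
INSERT INTO ord
SELECT k, current_date + (k % 1000), (random() * 1000)::numeric::money,
       md5(k::text), md5((k + 1)::text)
FROM generate_series(0, 1000 * 10000 - 1) AS k;

INSERT INTO li
SELECT o.key1, g, o.d, o.m, o.t1, o.t2
FROM ord o, generate_series(1, 3) AS g;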

Query 1, which extracts the relevant parts of TPCH Q3 or Q4, looks like:

select count(*)
    from (select o.key1, sum(o.m) revenue, o.d
            from li l, ord o
            where l.key1 = o.key1 and
                o.d > current_date + 300 and
                l.d < current_date + 700
            group by o.key1, o.d
            order by revenue, o.d
    ) as t1

Query 2, which is a pure join between li and ord, looks like:

select o.key1, l.key1, o.d
    from li l, ord o
    where l.key1 = o.key1 and
        o.d > current_date + 300 and
        l.d < current_date + 700

The time required to execute these two queries was measured using EXPLAIN ANALYZE. We varied the number of rows per partition as well as the number of partitions.
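
Each configuration can be timed from a psql session along these lines (a sketch; enable_partitionwise_join and enable_partitionwise_aggregate are standard PostgreSQL settings and are off by default, so the "without PWJ" runs simply leave them off):

-- Enable partitionwise join and aggregation for this session.
SET enable_partitionwise_join = on;
SET enable_partitionwise_aggregate = on;

EXPLAIN ANALYZE
select count(*)
    from (select o.key1, sum(o.m) revenue, o.d
            from li l, ord o
            where l.key1 = o.key1 and
                o.d > current_date + 300 and
                l.d < current_date + 700
            group by o.key1, o.d
            order by revenue, o.d
    ) as t1;
-- The "Execution Time" reported at the end of the EXPLAIN ANALYZE output is the
-- number recorded, averaged over repeated runs.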

The execution times for the queries are given in the tables below.

Table 1: Average execution time (ms), 10K rows per partition

 No. of partitions |  Q1 unpart. |   Q1 no PWJ | Q1 with PWJ |  Q2 unpart. |   Q2 no PWJ | Q2 with PWJ
-------------------+-------------+-------------+-------------+-------------+-------------+-------------
                 5 |       83.05 |       93.29 |       53.68 |       48.83 |       60.55 |       50.85
                10 |      195.87 |      221.33 |       90.24 |      104.40 |      129.06 |      105.20
                50 |    1,183.25 |    1,487.00 |      432.07 |      584.31 |      723.90 |      527.97
               100 |    2,360.19 |    3,001.81 |      888.46 |    1,342.69 |    1,595.53 |    1,053.91
               500 |   11,968.68 |   15,220.69 |    4,350.62 |    6,903.91 |    8,082.09 |    5,381.46
              1000 |   33,772.31 |   31,090.57 |    8,847.61 |   16,461.44 |   17,646.42 |   10,875.05

("unpart." = unpartitioned tables; "no PWJ" = partitioned tables with partitionwise join and aggregate disabled; "with PWJ" = partitioned tables with partitionwise join and aggregate enabled.)

Table 2: Average execution time (ms), 100K rows per partition

 No. of partitions |  Q1 unpart. |   Q1 no PWJ | Q1 with PWJ |  Q2 unpart. |   Q2 no PWJ | Q2 with PWJ
-------------------+-------------+-------------+-------------+-------------+-------------+-------------
                 5 |    1,157.23 |    1,489.53 |      514.68 |      609.81 |      773.31 |      582.07
                10 |    2,326.40 |    2,990.32 |    1,041.11 |    1,375.69 |    1,597.55 |    1,152.33
                50 |   11,899.34 |   15,181.49 |    4,792.88 |    7,196.35 |    8,446.64 |    5,828.54
               100 |   24,139.10 |   30,660.87 |    9,594.33 |   14,277.53 |   16,753.36 |   11,512.35
               500 |  153,922.35 |  165,550.06 |   50,308.85 |   74,387.34 |   85,175.79 |   58,282.17
              1000 |  313,534.59 |  338,718.63 |  131,482.31 |  203,569.14 |  132,688.60 |  123,643.18


The same numbers are easier to understand as graphs. The graphs below show the average execution time of each query as the number of partitions varies. In each graph, the Y-axis shows the execution time on a logarithmic scale and the X-axis shows the number of partitions. The blue line shows query execution times when the tables are not partitioned. The red line shows query execution times when the tables are partitioned but partitionwise join and aggregation are not used (both enable_partitionwise_join and enable_partitionwise_aggregate turned OFF). The yellow line shows query execution times when the tables are partitioned and partitionwise join and aggregate are used.

Note that the Y-axis denoting the execution time uses a logarithmic scale, so a constant gap on that axis corresponds to an improvement by a multiple rather than by a fixed amount. For example, at 1000 partitions of 10K rows, Q1's execution time drops from about 33.8 seconds to about 8.8 seconds, almost 4 times, when the tables are partitioned and partitionwise join and aggregate are enabled.

Graph 1

Graph 2

Graph 3

Graph 4

Key takeaways

The graphs above make it clear that when data sizes are very large, partitioning can also be used as a query optimization technique, in addition to its other advantages. Here are some key points:

  1. When the total data size runs into millions of rows, partitioning can be considered as a query optimization strategy. The exact number of partitions and the average number of rows per partition do not make much difference: we see similar performance whether 5M rows are divided into 500 partitions or 50 partitions.
  2. The exact thresholds depend upon the properties of the data and the queries, e.g. the size of each row, the columns used in the query, and the operations the query performs.
  3. Since these optimization techniques depend heavily on the partition key, choosing the right partition key is very important.
  4. When tables are partitioned, queries perform better when partitionwise operations are used, irrespective of the data size. (A quick way to verify from the query plan that partitionwise operations are actually used is sketched below.)
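
Whether the planner actually applies these partitionwise operations is easy to check with a plain EXPLAIN. As a sketch, using the same standard settings mentioned earlier:

-- With both tables range partitioned identically on the join key (key1) and
-- partitionwise join enabled, the plan shows an Append whose children are joins
-- of the corresponding li and ord partitions, instead of a single join over two
-- Appends of all partitions.
SET enable_partitionwise_join = on;
SET enable_partitionwise_aggregate = on;

EXPLAIN (COSTS OFF)
select o.key1, l.key1, o.d
    from li l, ord o
    where l.key1 = o.key1 and
        o.d > current_date + 300 and
        l.d < current_date + 700;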

Each workload is different. The charts above provide some guidance, but experimenting with the size and number of partitions, as well as the partition key, is important to know whether partitioning will help optimize the queries in your application. Experimentation shouldn't be an issue anymore: EDB's BigAnimal platform allows its users to experiment quickly without requiring a large upfront investment.
