Walking with the Elephants: Advanced partition matching for partition-wise join

Earlier I had written a blog about partition-wise join in PostgreSQL. In that blog I had talked about an advanced partition matching technique which will allow partition-wise join to be used in more cases. In this blog we will discuss this technique in detail. I will suggest to read my blog on basic partition-wise join again to get familiar with the technique.

Basic partition matching technique allows a join between two partitioned tables to be performed using partition-wise join technique if the two partitioned tables had exactly same partitions (more precisely exactly matching partition bounds). For example consider two partitioned tables prt1 and prt2

\d+ prt1
... [output clipped]
Partition key: RANGE (a)
Partitions: prt1_p1 FOR VALUES FROM (0) TO (5000),
prt1_p2 FOR VALUES FROM (5000) TO (15000),
prt1_p3 FOR VALUES FROM (15000) TO (30000)

and

\d+ prt2

... [ output clipped ]

Partition key: RANGE (b)

Partitions: prt2_p1 FOR VALUES FROM (0) TO (5000),

prt2_p2 FOR VALUES FROM (5000) TO (15000),

prt2_p3 FOR VALUES FROM (15000) TO (30000)

A join between prt1 and prt2 is executed as join between matching partitions i.e. prt1_p1 joins prt2_p1, prt1_p2 joins prt2_p2 and prt1_p3 joins prt2_p3. This has many advantages as discussed in my previous blog.

But basic partition matching can not join two partition tables with different partitions (more precisely different partition bounds). For example, in the above case if prt1 an extra partition prt1_p4 FOR VALUES FROM (30000) TO (50000), a join between prt1 and prt2 would not use partition-wise join. Many applications use partitions to segregate actively used and stale data, a technique I discussed in my another blog. The stale data is eventually removed by dropping partitions. New partitions are created to accommodate fresh data. Let's say the partition scheme of two such tables is such that they usually have matching partitions. But when an active partition gets added to one of these tables or a stale one gets deleted, they will have mismatched partitions for a small duration. We don't want a join hitting the database during this small duration to perform bad since it can not use partition-wise join. Advanced partition matching algorithm helps here.

Advanced partition matching algorithm

Advanced partition matching is very much similar to the merge join algorithm. It takes the sorted partition bounds and finds matching partitions by comparing the bounds from both the tables in their sorted order. Any two partitions, one from either partitioned table, whose bounds match exactly or overlap are considered to be joining partners since they may contain rows that join. Continuing with the above example:

\d+ prt1

... [output clipped]

Partition key: RANGE (a)

Partitions: prt1_p1 FOR VALUES FROM (0) TO (5000),

prt1_p2 FOR VALUES FROM (5000) TO (15000),

prt1_p3 FOR VALUES FROM (15000) TO (30000)

and

\d+ prt2

... [ output clipped ]

Partition key: RANGE (b)

Partitions: prt2_p1 FOR VALUES FROM (0) TO (5000),

prt2_p2 FOR VALUES FROM (5000) TO (15000),

prt2_p3 FOR VALUES FROM (15000) TO (30000),

prt1_p4 FOR VALUES FROM (30000) TO (50000)

Similar to the basic partition matching algorithm this will join prt1_p1 and prt2_p1, prt1_p2 and prt2_p2, and prt1_p3 and prt2_p3. But unlike basic partition matching it will also know that prt1_p4 does not have any join partner in prt1. Thus if the join between prt1 and prt2 is INNER join or when prt2 is INNER relation of join, the join will contain only three joins leaving prt2_p4 aside. In PostgreSQL, a join where prt2 is OUTER relation, we won't be able to use partition-wise join even if We will come back to this again when we will discuss outer joins further.

This is simple right, but consider another example of listed partitioned tables

\d+ plt1_adv

Partition key: LIST (c)

Partitions: plt1_adv_p1 FOR VALUES IN ('0001', '0003'),

plt1_adv_p2 FOR VALUES IN ('0004', '0006'),

plt1_adv_p3 FOR VALUES IN ('0008', '0009')

and

\d+ plt2_adv

Partition key: LIST (c)

Partitions: plt2_adv_p1 FOR VALUES IN ('0002', '0003'),

plt2_adv_p2 FOR VALUES IN ('0004', '0006'),

plt2_adv_p3 FOR VALUES IN ('0007', '0009')

Observe that there are exactly three partitions in both the relations but lists corresponding plt1_adv_p2 match exactly that of plt2_adv_p2 but other two partitions do not have exactly matching lists. Advanced partition matching algorithm helps to determine that plt1_adv_p1 and plt2_adv_p1 have overlapping lists and their lists do not overlap with any other partition from the other relation. Similarly for plt1_adv_p3 and plt2_adv_p3. Thus it allows join between plt1_adv and plt2_adv to be executed as partition wise join by joining their matching partitions. The algorithm can find matching partitions in even more complex partition bound sets.

The problem with outer joins

Outer joins pose a particular problem in PostgreSQL world. Consider again the example of join between prt2 LEFT JOIN prt1. prt2_p4 does not have a joining partner in prt1 and yet the rows in that partition will be part of the join since it is an outer relation, albeit with the columns from prt1 all "null"ed. Usually in PostgreSQL when the INNER side is empty, it's represented by a "dummy" relation which emits no rows but still knows the schema of that relation. Without partition-wise join a "concrete" relation which has some presence in the original query turns dummy and thus planner has "something" to join the outer relation with. So PostgreSQL's planner doesn't have to do anything extra when such outer joins occur. But when there is no matching inner partition for an outer partition e.g. prt2_p4, there is "no entity" which can represent the "dummy" inner side of that outer join. PostgreSQL does not have a way right now to induce such "dummy" relations during planning right now. But that's not required. Ideally such a join with empty inner only requires schema of the inner relation and not an entire relation itself. Once we build support to execute such a join with a solid outer relation and schema of inner relation, we will be able to tackle partition-wise join where there are no matching partitions on inner side. Hopefully we will solve that problem some time soon.

When there is no matching partition on the outer side of the join, the inner partition does not contribute to the result of join and can be just ignored. So partition-wise joins where there are no matching partitions on the inner side are not a problem at all.

Multiple matching partitions

When the partitioned tables are such that multiple partitions from one side match one partition or more partitions on the other side, partition-wise join simply bails out since there is no way to induce an "Append" relation during planning time which represents two or more partitions together. Hopefully we will remove that limitation as well sometime.

Curious case of hash partitioned tables

It doesn't make much sense to use it for a hash partitioned table since usually partitions of two hash partitioned table using same modulo always match. When the modulo is different, the data from one a given partition of one table can find its join partners in all the partitions of the other, thus rendering partition-wise join ineffective.

Even with all these limitation, what we have today is a very useful solution which serves most of the practical cases. Needless to say that this feature works seemlessly with FDW join push down to adding to sharding capabilities that PostgreSQL already has!

Walking with the Elephants

Wednesday, January 26, 2022

Advanced partition matching for partition-wise join

Advanced partition matching algorithm

The problem with outer joins

Multiple matching partitions

Curious case of hash partitioned tables

No comments:

Post a Comment