Saturday, June 23, 2018

Upgrade your partitioning from inheritance to declarative

Before PostgreSQL 10, Postgres users partitioned their data using inheritance based partitioning. The method used constraints to define the partitions and rules or triggers to route the data to appropriate partition. A user had to write and maintain code for all that. PostgreSQL 10 introduced declarative partitioning, which is much easier to setup and requires almost no maintenance. PostgreSQL 11
is adding a number of partitioning related enhancements that work with declarative partitioning. Users who have implemented inheritance based partitioning would want to move to declarative partitioning (after upgrading to v11, of course) to benefit from those features. Here's how they can do so.

Example setup

You may have created a parent table and several child tables, one per partition, and triggers, rules and constraints as required. Here's an example setup similar to the one described in PostgreSQL documentation.

\d+ measurement
                                Table "inh_part.measurement"
  Column   |  Type   | Collation | Nullable | Default | Storage | Stats target | Description 
-----------+---------+-----------+----------+---------+---------+--------------+-------------
 city_id   | integer |           | not null |         | plain   |              | 
 logdate   | date    |           | not null |         | plain   |              | 
 peaktemp  | integer |           |          |         | plain   |              | 
 unitsales | integer |           |          |         | plain   |              | 
Child tables: measurement_y2006m02,
              measurement_y2006m03,
              measurement_y2006m04,
              measurement_y2006m05,
              measurement_y2006m06

-- here's how a child looks like
\d+ measurement_y2006m03
                            Table "inh_part.measurement_y2006m03"
  Column   |  Type   | Collation | Nullable | Default | Storage | Stats target | Description 
-----------+---------+-----------+----------+---------+---------+--------------+-------------
 city_id   | integer |           | not null |         | plain   |              | 
 logdate   | date    |           | not null |         | plain   |              | 
 peaktemp  | integer |           |          |         | plain   |              | 
 unitsales | integer |           |          |         | plain   |              | 
Indexes:
    "measurement_y2006m03_logdate" btree (logdate)
Check constraints:
    "measurement_y2006m03_logdate_check" CHECK (logdate >= '2006-03-01'::date AND logdate < '2006-04-01'::date)
Inherits: measurement

Moving to declarative partitioning

One could simply create a partitioned table and required number of partitions, then create indexes and other objects on this partitioned table except the constraints, rules and triggers used for inheritance partitioning, and then copy the data from the inheritance parent to the partitioned table using SELECT INTO. A user may optimize data movement by copying data from child-table to corresponding partition again using SELECT INTO. But PostgreSQL offers something better, an ability to ATTACH an existing table as a partition to a partitioned table. This method is faster compared to other methods since there is no data movement involved. In the experiment I run with few MBs of partitioned data, it was 2X faster. As the data grows data movement takes longer time, even if you move data from child-tables to partitions. The time to ATTACH child-tables as partitions however, doesn't increase with the size of data. Here are the steps

Step 0

Take a backup of inherited parent table and all the child-tables. A database level backup would be awesome! This is optional step but very important so that you can restore the data in case something goes wrong while performing the next steps.

Step 1

Start a transaction, so that everything gets rolled back in case of an error.
BEGIN TRANSACTION;

Step 2

Create the partitioned table, with the same definition as the inheritance parent. Annotate the CREATE TABLE command with PARTITION BY clause. You will need to specify the columns or expression to use as a partition key in PARTITION BY clause. But those should be apparent from the constraints on the inheritance children. For example, in the above setup, the constraints are all based on the column 'logdate', which is the intended partition key. But the partition key may not be so evident, if there's a spaghetti of constraints surrounding each child table. If the constraints, rules or tiggers are well documented, it should not be difficult to spot the partition key. If not, a deeper examination of these objects would reveal the partition key.

CREATE TABLE measurement_part (
    city_id         int not null,
    logdate         date not null,
    peaktemp        int,
    unitsales       int
) PARTITION BY RANGE (logdate);

Step 3

We need to add the child tables as partitions to the partitioned table using ALTER TABLE ... ATTACH. To do that, first we need to remove child-tables from inheritance hierarchy using NO INHERIT clause. For example,

ALTER TABLE measurement_y2006m02 NO INHERIT measurement; 

Step 4

Craft FOR VALUES clause from the constraints of a given child-table. This should be straight-forward, if the partition key has been correctly identified. Now, run ALTER TABLE ... ATTACH PARTITION command as below for each of the child-tables.

ALTER TABLE measurement_part ATTACH PARTITION measurement_y2006m02 FOR VALUES FROM ('2006-02-01') TO ('2006-03-01');

You may carry out steps 3 and 4 together for each child or perform step 3 for all children followed by step 4 for all children. Do not drop the constraints, on the child tables, which have been been converted into FOR VALUES clause before you are done with these steps. If you keep them while carrying out ATTACH step and also set client_min_messages to INFO, you will see messages like

INFO:  partition constraint for table "measurement_y2006m02" is implied by existing constraints

Usually when we attach a table as a partition, the table is scanned to check if it contains any rows which would not fit that partition (to be specific, would not fit that partition's bounds). This scan is avoided if the table has constraint/s that imply the partition bounds. By retaining the original constraints, we avoid the scan, saving significant I/O and CPU time.

Step 5

Starting PostgreSQL 11, users can create indexes on the partitioned table and the partitions automatically "inherit" those. The system is intelligent enough not to create index there's already one similar to the index on the partitioned table. In our example, all the child-tables already had the
required index. So, we just create an index on the partitioned table so that the optimizer knows about it.

CREATE INDEX measurement_logdate ON measurement(logdate);

Step 6

There may be views, constraints or other SQL objects on the parent inheritance table. PostgreSQL associates a table's OID with the objects created on it.  Since the partitioned table's OID is different from the inheritance parent, the old views or triggers still point to the inheritance parent even if the partitioned table is named same as the inheritance parent (albeit after renaming the inheritance parent itself). So, they won't work as they are and need to be recreated on the partitioned table.

It would actually help, if PostgreSQL had a command like ALTER TABLE ... PARTITION BY ... to convert a regular table into a partitioned table. But that's easier said than done. Hope we see somebody put significant effort in implementing that command.

Step 7

Drop the inheritance parent and all the objects created on the inheritance parent. DROP TABLE .. CASCADE might help here. These should be the same objects, except the partitioning constraints, recreated in step 6 on the partitioned table. This allows us to rename the partitioned table with the same name as the inheritance parent, so that the queries, procedures, functions point work on the partitioned table instead of inheritance parent.

DROP TABLE measurement CASCADE;
ALTER TABLE measurement_part RENAME TO measurement;

Step 8

Now drop the partitioning constraints present on the child-tables which are now partitions of the partitioned table and do not need those constraints. You may perform this step right after step 4, but delaying it might allow those constraints to be used in the later steps if necessary.

ALTER TABLE measurement_y2006m06 DROP CONSTRAINT  measurement_y2006m06_logdate_check;

Step 9

Run any sanity tests before we commit the transaction. For example, check the output of \d+ command on the partitioned table and individual partitions. Make sure that those tests don't throw any errors when everything is right, lest everything we did till now rolls back.

COMMIT TRANSACTION;

Your partitioned table is now ready.

No comments:

Post a Comment