Friday, November 29, 2024

The PostgreSQL operator labyrinth

While working on SQL/PGQ patch I wanted to find an equality operator for given left and right argument types to construct a condition to match an edge with its adjacent vertexes. It would look as simple as calling C function oper() with operator as "=" and required left and right data types. But soon it turned out to be a walk in PostgreSQL's operator labyrinth, which held my equality operator at the center instead of Minotaur.

First and foremost in PostgreSQL '=' does not necessarily mean and equality operator. It's simply a name of an operator used for comparing operands for equality. One could get swallowed by Sphinx for that. So oper() is useless. Equality operators are instead identified by strategies EqualStrategyNumbers like HTEqualStrategyNumber, BTEqualStrategyNumber, RTEqualStrategyNumber and so on. But there's no C function which would provide you an equality operator given the strategy number and data types of left and right operands. Suddenly I found myself trapped in the index labyrinth since BT, HT and RT are related to hash, b-tree and R-tree indexes. Now, all I was doing was begging to get out of the labyrinth rather than finding answer to my seemingly simple question. But this Wit-Sharpening Potion helped me to find my path out of the labyrinth and also answered my question.

The path is surprising simple Index -> Operator Class -> Operator Family -> Operator. Like Daedalus's labyrinth, it's unicursal but has a four course design instead of seven course. Like the An index needs operators to compare values of a column or an indexed expression. All values being indexes are of the same datatype. An operator class holds all the required comparison operators for that datatype. However, a a value being searched or compared to in that index may not necessarily have the same datatype. For example an index may be on an column of type int4 but it could still be used to search a value of type int2. PostgreSQL requires different operators for different pairs of operand data types as the semantics to compare values from same datatype may be different from those from different data types. That's where an operator family comes into picture. It holds operator classes, one for each data type in the "family" of data types e.g. integers. Each operator class would still contains operators comparing values of the same datatype. "Loose" operators in an operator family are used to compare values from different datatypes.

If you know an operator family, equality strategy and the data types of left and right operands, you can find the operator using get_opfamily_member(). But there's no ready function to get operator family given the data types of operands. Instead you have to taken a convoluted route, otherwise I would't call that simple path a labyrinth. From the two datatypes we choose one, usually the datatype of the values in a set being searched. Like datatype of primary key, which holds the set of values in which we search for a foreign key value. Find the comparison operators for that datatype using get_sort_group_operators(). Using sorting operator returned by that function, search for the operator family using get_ordering_op_properties(). Pass that operator family (and strategy) to get_opfamily_member() along with the datatypes of operands to reach the operator you want. Interestingly, get_sort_group_operators() calls lookup_type_cache() which saves the preferred operator family tree in type cache. But it's not exposed outside.

Hope this blog serves as a Cretan coin depicting PostgreSQL operator labyrinth.

Update on 3rd December 2024 - There's another passage from datatypes to equality operator. Use GetDefaultOpclass() to get the default operator class for the chosen datatype. From there get the operator family of that operator class using get_opclass_family(). Use get_opfamily_member() to get the desired operator. With this method, you can try both hash and btree methods since an equality operator is available in both the methods. In the earlier method you could get the operator family only if there existed an ordering which is not available in hash method. It doesn't look unicursal anymore and thus not a labyrinth but a maze!


No comments:

Post a Comment