Monday, July 21, 2025

Perks of speaking at PostgreSQL conferences

A couple of weeks back, I received my speaker's gift from POSETTE: An Event for Postgres.


The woolen elephant was claimed by my daughter as soon as the parcel was opened (she gave me this photo in exchange). My son has been collecting the activity books and stickers for some time now, so he claimed those two. My wife and I are eyeing the pair of socks. The most precious item, a handwritten personal thank-you note, remains with me!


PostgreSQL conferences are an important part of any PostgreSQL community member's life: developers, users, DBAs, anybody who is involved with PostgreSQL in any role. Plenty of them run throughout the year. Some are meetups which run for a few hours, some are PGDays which usually run for a day, and some are full-blown conferences which run for at least a couple of days and have auxiliary events surrounding the conference days. Each of them is unique because of its geography, language or content. While their schedules leave plenty of room for networking and interaction, they are centered around talks. The conferences usually do not pay their speakers, but they convey their appreciation through speaker's gifts. I thought I would post about the speaker's gifts I have received so far and which are still with me, the ones in the photo.


Leftmost is a bottle, made mostly of bamboo and thus environment-friendly, which I received from Hari Kiran, who runs https://lnkd.in/giwyMh96 and PGDay Hyderabad. Next to that is a black bottle I received from PGConf India last year. PGConf.Asia, when it was hosted in Tokyo, presented a steel glass with each speaker's name embossed on it. That is the oldest surviving gift, but the steel is still as strong and shiny as it was back then. I actually liked the box with the deep shiny blue satin lining that contained it; I still have the box too.

The frame is again from PGConf.Asia, from when it was hosted in China, albeit virtually. It accompanied me in my office through the COVID lockdown. Its eyes are peculiar and make it quite lively. I don't know what cultural significance the mask has in China, but some people visiting my office have saluted it, taking it to be some protector deity.

Next to it is a simple-looking bottle of candies from PGConf.dev 🐘 this year. They are different from the usual candies we get in India, or any that I have tasted till now. First, they have something written on them: every candy reminds you of pgconf.dev 2025. They seem to be small pieces cut from long bars made of strands of candy in different flavours, so when you put a candy in your mouth, those strands separate and leave several flavours on your tongue at the same time. A very enjoyable experience. That's why the bottle is only half empty; we are consuming them carefully.

While the conference organizers work hard to make each conference memorable, these gifts make them even more so.

Friday, July 11, 2025

Effect of hash table size on hash operations

While implementing an optimization for derived clause lookup, Amit Langote, David Rowley and I debated the initial size of the hash table which would hold the clauses. See some of the discussion around this email on pgsql-hackers.

The hash_create() API in PostgreSQL takes the initial size as an argument and allocates memory for that many hash entries upfront. If more entries are added, it expands that memory later. The point of argument was what the initial size should be for the hash table, introduced by that patch, which holds the derived clauses. During the discussion, David hypothesised that the size of the hash table affects the efficiency of hash table operations, depending upon how well the hash table fits within CPU cache lines. While I thought it was reasonable to assume so, I didn't expect the practical impact to be noticeable: beyond saving a few bytes, choosing the right hash table size wasn't going to have any visible effect, and if a derived clause lookup or insert became a bit slower, nobody would even notice. It was easy enough to address David's concern by using the number of derived clauses known at the time the hash table is created to decide its initial size, and the patch was committed.
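For reference, hash_create() takes the initial size as its second argument. The snippet below is only a minimal sketch of the idea of sizing the table from the number of derived clauses at creation time; DerivedClauseEnt and derived_clauses are hypothetical names used for illustration, and a fixed-size key is assumed here for simplicity, which need not match the committed patch.

/*
 * A minimal sketch, not the committed patch: create a local hash table
 * whose initial size is the number of derived clauses known right now.
 * DerivedClauseEnt and derived_clauses are hypothetical; hash_create()
 * and HASHCTL are the real dynahash API, and dynahash will expand the
 * table if more entries are added later.
 */
HASHCTL  info;
HTAB    *clause_hash;

info.keysize = sizeof(uint64);              /* assumed fixed-size key */
info.entrysize = sizeof(DerivedClauseEnt);  /* hypothetical entry type */

clause_hash = hash_create("derived clause hash",
                          list_length(derived_clauses), /* initial size */
                          &info,
                          HASH_ELEM | HASH_BLOBS);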

Within a few months, I faced the same problem again while working on resizing shared buffers without a server restart. The buffer manager maintains a buffer lookup table, in the form of a hash table, to map a page to the buffer holding it. When the number of configured buffers changes upon a server restart, the size of the buffer lookup table changes with it. Doing that in a running server would be significant work. To avoid it, we could create a buffer lookup table large enough to accommodate future buffer pool sizes: even if the buffer pool shrinks or expands, the size of the buffer lookup table would not change, and as long as the expansion stays within the lookup table's size limit, it could be done without a restart. The buffer lookup table isn't as large as the buffer pool itself, so wasting a bit of memory on it can be considered worth the flexibility it provides. However, David's concern about the hash table size came back again. This time, though, I decided to actually measure the impact.
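To illustrate the idea, the sketch below contrasts how the lookup table is sized today (roughly what StrategyInitialize() does) with the oversized variant being considered; max_available_buffers is a hypothetical upper bound on the buffer pool size, not an existing GUC.

/*
 * Today the buffer lookup table is sized to match the configured buffer
 * pool, roughly as StrategyInitialize() does:
 */
InitBufTable(NBuffers + NUM_BUFFER_PARTITIONS);

/*
 * Sketch of the idea discussed above: size the lookup table once for the
 * largest buffer pool we may ever expand to, so resizing the pool does
 * not require resizing the table. max_available_buffers is a
 * hypothetical upper bound, not an existing GUC.
 */
InitBufTable(max_available_buffers + NUM_BUFFER_PARTITIONS);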

Experiment

I am building an extension that can be used to measure the performance of various PostgreSQL APIs and data structures. I used the function sparse_hashes() in a branch of that extension for this experiment. The function takes the initial size of the hash table and the number of entries to operate on as inputs, and outputs the time taken to insert, search and delete that many entries in the hash table. Since I was interested in measuring the effect of hash table size on the buffer lookup table, I used the buffer lookup table's key and entry structures to create the hash table.

/* Create a hash table in which BufferTag maps to Buffer */
HASHCTL  info;
HTAB    *hashtab;

info.keysize = sizeof(BufferTag);
info.entrysize = sizeof(BufferLookupEnt);
info.num_partitions = NUM_BUFFER_PARTITIONS;

hashtab = hash_create("experimental buffer hash table", max_entries, &info,
                      HASH_ELEM | HASH_BLOBS);
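The measurement itself is straightforward. Below is a sketch of how the insert phase of such a function might be timed; it is an illustration of the approach, not the actual code of sparse_hashes(), and num_entries is a hypothetical parameter for the requested number of entries.

/*
 * Sketch of the insert-phase timing, assuming 'hashtab' was created as
 * above. The search and delete phases are timed analogously using
 * HASH_FIND and HASH_REMOVE.
 */
instr_time  start,
            duration;
double      insert_time_ms;
BufferTag   tag;
bool        found;

INSTR_TIME_SET_CURRENT(start);
for (uint32 i = 0; i < num_entries; i++)
{
    /* Fabricate a distinct buffer tag for every entry. */
    memset(&tag, 0, sizeof(tag));
    tag.blockNum = i;

    (void) hash_search(hashtab, &tag, HASH_ENTER, &found);
}
INSTR_TIME_SET_CURRENT(duration);
INSTR_TIME_SUBTRACT(duration, start);
insert_time_ms = INSTR_TIME_GET_MILLISEC(duration);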

I invoked this function with varying hash table sizes while keeping the number of elements at 16384 (the default buffer pool size), using the following query:


create table msmts_v as
       select run,
              hash_size, 
              (sparse_hashes(hash_size, 16 * 1024)).*
            from generate_series(16 * 1024, 16 * 1024 * 1024, 8 * 1024) hash_size, 
                 generate_series(1, 120) run;

Mind you, the query takes a lot of time to run. The following query is used to consolidate the results:

select hash_size/(16 * 1024) as "hash size as multiple of number of elements",
       round((log(hash_size/(16 * 1024))/log(64))::numeric, 1) as "logarithmic hash size steps",
       avg(insert_time) as insert,
       avg(search_time) as search,
       avg(delete_time) as delete
    from msmts_v
    group by hash_size
    order by hash_size;
With that, the plot of insert, search and delete timings for 16384 elements against the hash table size looks like this:

The plot clearly shows that hash table performance degrades as the hash table size increases, so David's hypothesis was right. However, it doesn't degrade with a small change in hash table size, which was what was being debated in the derived clauses thread. Even if the hash table size is hundreds of times larger than the actual number of elements in it, there is hardly any noticeable effect on performance. The next interesting thing to notice is that the degradation is step-wise. Looking at the purple, green and blue lines against the orange line, we can infer that the steps roughly correspond to multiples of the logarithm of the hash table size to the base 64, 64 bytes being roughly the cache line size. Again David seems to be correct in that the degradation is caused by cache line misses. Nonetheless, our notion that hash table operations are O(1) regardless of the hash table size is questionable!

Going back to the shared buffer lookup table, configuring a very large lookup table would certainly impact its performance, but whether that would be noticeable in TPS is the subject of another experiment and another post.