Friday, July 11, 2025

Efficiency of a sparse hash table

While implementing an optimization for derived clause lookup, Amit Langote, David Rowley, and I debated the initial size of the hash table which would hold the clauses. See some discussion around this email on pgsql-hackers.

The hash_create() API in PostgreSQL takes the initial size as an argument and allocates memory for that many hash entries upfront. If more entries are added, it expands that memory later. The point of contention was the initial size of the hash table, introduced by that patch, which holds the derived clauses. During the discussion, David hypothesised that the size of the hash table affects the efficiency of hash table operations, depending upon whether the hash table fits within CPU cache lines. While I thought it was reasonable to assume so, I doubted the practical impact would be noticeable: beyond saving a few bytes, choosing the right hash table size wasn't going to have any noticeable effect, and if a derived clause lookup or insert became a bit slower, nobody would even notice. It was easy enough to address David's concern by using the number of derived clauses at the time the hash table is created to decide its initial size. The patch was committed.
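
For the record, here is a minimal, hypothetical sketch of sizing such a hash table from the number of clauses known at creation time. Only hash_create() and the HASHCTL fields are the real API; DerivedClauseKey, DerivedClauseEntry and derived_clauses are made-up names for illustration, not what the committed patch uses.

HASHCTL     info;
HTAB       *derives_hash;

info.keysize = sizeof(DerivedClauseKey);        /* hypothetical key struct */
info.entrysize = sizeof(DerivedClauseEntry);    /* hypothetical entry struct */
info.hcxt = CurrentMemoryContext;

/* size the table by the clauses we already have, not by a fixed guess */
derives_hash = hash_create("derived clause hash table",
                           list_length(derived_clauses),
                           &info,
                           HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);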

Within a few months, I faced the same problem again while working on resizing shared buffers without a server restart. The buffer manager maintains a buffer lookup table, in the form of a hash table, to map a page to a buffer. When the number of configured buffers changes, which today requires a server restart, the size of the buffer lookup table changes with it. Resizing the table in a running server would be significant work. To avoid that, we could create a buffer lookup table large enough to accommodate future buffer pool sizes: even if the buffer pool shrinks or expands, the size of the buffer lookup table would not change, and as long as an expansion stays within the lookup table's size limit, it could be done without a restart. The buffer lookup table isn't as large as the buffer pool itself, so wasting a bit of memory on it can be considered worth the flexibility it provides. However, David's concern about the hash table size came back again. This time, though, I decided to actually measure the impact.
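
For reference, the buffer lookup table is created at startup by InitBufTable() in buf_table.c using ShmemInitHash(). The sketch below shows roughly what oversizing it would look like; max_nbuffers is a hypothetical upper bound on the buffer pool size, not an existing GUC, and the actual mechanism is still being worked out.

HASHCTL     info;

info.keysize = sizeof(BufferTag);
info.entrysize = sizeof(BufferLookupEnt);
info.num_partitions = NUM_BUFFER_PARTITIONS;

/* size for a future, larger buffer pool rather than the current NBuffers */
SharedBufHash = ShmemInitHash("Shared Buffer Lookup Table",
                              max_nbuffers + NUM_BUFFER_PARTITIONS,
                              max_nbuffers + NUM_BUFFER_PARTITIONS,
                              &info,
                              HASH_ELEM | HASH_BLOBS | HASH_PARTITION);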

Experiment

I am building an extension that can be used to measure the performance of various PostgreSQL APIs and data structures. I used the function sparse_hashes() in a branch of that extension for this experiment. The function takes the number of entries and the initial size of the hash table as inputs, and outputs the time taken to insert, search and delete the given number of entries. Since I was interested in measuring the effect of hash table size on the buffer lookup table, I used the buffer lookup table's key and entry structures to create the hash table.

/* Create a hash table where BufferTag maps to Buffer, mirroring the buffer lookup table */
HASHCTL     info;
HTAB       *hashtab;

info.keysize = sizeof(BufferTag);
info.entrysize = sizeof(BufferLookupEnt);
info.num_partitions = NUM_BUFFER_PARTITIONS;

hashtab = hash_create("experimental buffer hash table", max_entries, &info,
                      HASH_ELEM | HASH_BLOBS);
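
The function then inserts, searches and deletes the requested number of entries, timing each phase. Below is a simplified sketch of the insert phase, not the actual extension code; it assumes only hash_search() and the instr_time macros, while num_entries and insert_time stand in for the extension's own variables.

BufferTag   tag;
bool        found;
instr_time  start, duration;
double      insert_time;

memset(&tag, 0, sizeof(tag));

INSTR_TIME_SET_CURRENT(start);
for (uint32 i = 0; i < num_entries; i++)
{
    tag.blockNum = i;       /* make each key unique */
    hash_search(hashtab, &tag, HASH_ENTER, &found);
}
INSTR_TIME_SET_CURRENT(duration);
INSTR_TIME_SUBTRACT(duration, start);
insert_time = INSTR_TIME_GET_MILLISEC(duration);

/* the search and delete phases repeat the loop with HASH_FIND and HASH_REMOVE */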

I invoked this function with varying hash table sizes, keeping the number of elements at 16384 (the default buffer pool size), using the following query:


create table msmts_v as
       select run,
              hash_size, 
              (sparse_hashes(hash_size, 16 * 1024)).*
            from generate_series(16 * 1024, 16 * 1024 * 1024, 8 * 1024) hash_size, 
                 generate_series(1, 120) run;

Mind you, with roughly 2,000 hash table sizes and 120 runs each, the query takes a long time to run. The following query is used to consolidate the results:

select hash_size/(16 * 1024) "hash size as multiple of number of elements",
       round((log(hash_size/(16 * 1024))/log(64))::numeric, 1) as "logarithmic hash size steps",
       avg(insert_time) insert,
       avg(search_time) search,
       avg(delete_time) delete
    from msmts_v
    group by hash_size
    order by hash_size;
With that, the plot of insert, search and delete timings for 16384 elements, against the hash table size, looks like this:

The plot clearly shows that hash table performance degrades as the hash table size increases, so David's hypothesis was right. However, it doesn't degrade with a small change in hash table size, which is what was being debated in the derived clauses thread. Even if the hash table size is hundreds of times larger than the actual number of elements in it, there's hardly any noticeable effect on performance. The next interesting thing to notice is that the degradation is step-wise. Looking at the purple, green and blue lines against the orange line, we can infer that the steps roughly track the logarithm of the hash table size to the base 64, where 64 bytes is the typical cache line size. Again, David seems to be correct that the degradation is caused by cache misses. Nonetheless, the notion that hash table size doesn't matter because its operations are O(1) is questionable!

Going back to the shared buffer lookup table, configuring a very large lookup table would certainly impact its performance, but whether that will be noticeable in TPS is the subject of another experiment and another post.