akazuko

Notes for hashed and hierarchical timing wheels paper.

Feb 20, 2024

Tags: tech, timers

All the following notes have been prepared from the Paper: Hashed and Hierarchical Timing wheels: Data Structures for the efficient implementation of a timer facility.

Basic implementation would be to maintain a list of timers and scan it every time to fire eligible timers. But this is linear in cost so overhead grows with number of timers scheduled.

Dimensions to think of when implementing timer algorithms:

Cost: algorithms implemented by a processor that is interrupted each time a hardware clocks, carries a substantial overhead.
Precision: fine granularity of timers.
Scale: number of timers to service.

Different timer algorithms:

Linear

Every time unit, decrease the timer by 1 and fire the ones which have become 0.
Useful when number of timers being handled are low.
Also uses minimum metadata about a timer.
Specialised hardware to assist with performance?

Ordered timer list/queue

Unlike #1, here a ordered queue is used. Thus, start timer (which adds a timer to the queue) becomes a bit costlier since correct position is identified to insert the timer. Preferred to store the absolute expiry time to order the timers than the interval.
Per tick processing becomes cheaper than linear since iterating over the entire list can be avoided (except when all timers have expired). Worst case is still O(n).
With doubly linked list, some operations’ cost reduces.
Used in VMS and UNIX.
Extra space usage O(n) (previous and next pointers).
Priority queue to rescue? - Allows finding the closest timer to expire.

Tree based algorithms:

Priority queue implemented using tree structure - time complexity for insert and remove O(log(n)).

Timer wheels:

Time is broken down into cycles where each cycle is made up on N units of time.
If timer ptr is pointing to i bucket, then current time = current cycle * N + i
Timers exceeding current cycle + 1 * N are put in the overflow list. When timer reaches the last index, then timers from the overflow list eligible to be scheduled in the next cycle are distributed in each time bucket. Then the index is reset to 0 again.
Each bucket is a linked list of timers to be expired at that time instance.
Overflow list size can be large here so there is some cost associated with managing it.

What if we can guarantee that all timers are set for periods less than `MaxInterval`?

We can reduce the complexity for scheduling a timer, stopping a timer and per tick bookkeeping to O(1).

Since we know max timer interval is limited, we can use a circular buffer with dimensions [0, MaxInterval - 1].

In the previous algorithm, say N (number of units of time) = 1. Scheduling timer with interval j from current time i, we use index = i + j mod MaxInterval to add timer to the end of the list at this index.

Each tick, i = (i + 1) mod MaxInterval and check the array elements being pointed to in the corresponding bucket.

This allows scheduling a timer and each tick book keeping O(1). With timers list being double linked list, stopping a timer also becomes O(1).

Now, we have ensured all timers are inserted into a timers list and no overflow list is needed.

Time wheel turns here every tick and not like every MaxInterval units as mentioned in the initial timer wheel explanation.

As MaxInterval increases, memory requirements increase too. Paper suggest some extension to previous algorithm to mitigate this issue:

Extension: Hashing

Hashing to support arbitrary sized timer. Suggests using a table with size = power of 2.

How does this help? - When a timer value (say 32 bits) is divided by TableSize, then remainder i.e. lower order bits can be used to index in the table. Higher order bits are then added to the list pointed out by that index.

Example mentioned in the paper:

TableSize = 256, Timer = 32 bit, last 8 bits value (remainder) = 20, current time index = 10
timer index = 10 (current time index) + 20 (remainder) = 30
24 higher order bits are then inserted into a list that is pointed to by the 30th element.

Complexity analysis showed that choice of hashing algorithm from timer value to index here is not significant. Obtaining the remainder after dividing by a power of 2 is cheap (& operation) and therefore recommended.

Extension: Hierarchy

To represent timer values within 32-bit range, we don’t need 2^32 elements array. We can instead utilise the different granularity of time. Paper gives an example of using four arrays to support MaxInterval of upto 100 days:

A 100 element array in which each element represents a day.
A 24 element array in which each element represents an hour.
A 60 element array in which each element represents a minute.
A 60 element array in which each element represents a second. Therefore, it goes down from 100*24*60*60 = 8.84 million locations to 100+24+60+60 = 244 locations.

Apart from usual hashing to an index, there is one important thing to note. At every t units of tick which amounts to 1s, only 60s timer table timer ptr is updated. No other timer tables are touched. Instead a timer is used to initiate update of timer ptr of parent generation of timer table. This means a 60s timer is always present in seconds timer table, which upon expiry results in increment in timer ptr in minute timers table and so on.

Timer scheduling happens in reverse order. Timer, when scheduled, goes to highest generation of timers first. When the timer expires there, it is scheduled in the next generation and so on till it is at the last generation in the hierarchy as per the timer value. It finally expires there, invoking the user callback.

Complexity of operations:

Schedule timer - O(m) where m is the number of arrays in the hierarchy.
Stop/Cancel timer - O(1) assuming all lists are doubly linked.
Per tick book-keeping: Average latency of O(1) for large array sized.

Timing wheels