Intrusive Collections or: How I Learned to Stop Worrying and Love the @fieldParentPtr

Saturday, May 4, 2024

Sometimes you don't want to allocate nodes separately from their contents in a linked list. Enter zig's @fieldParentPtr builtin.

A normal implementation of a linked list has an outer Node structure which holds the actual item in the collection. This is easy to implement and works for any item type, but requires allocating the nodes separately which can lead to the overhead of managing those allocations (a memory pool can be effective here).

An intrusive collection gets rid of this by just putting the next pointer or other collection infrastructure inside the items. The downside is obvious: the item type has to be written with that in mind. The upside is also obvious: the consumer of the list already has allocated space for the next pointer because it is part of the item object.

Making this work in a reusable, general way is annoying however, as the collection type becomes tied to the layout of the item type.

Enter @fieldParentPtr.

For those who don't know zig already, @fieldParentPtr is a builtin (that is to say, it is implemented in the compiler rather than in user/standard library code, signified by the leading @). It takes two parameters: a pointer to a field, and a compile-time constant string containing the name of the field. It returns a pointer to the parent structure which the field is a member of, regardless of what the layout or offset happens to be. (As with most other builtins in zig, @fieldParentPtr uses the contextually implied result type of the expression to determine the outer struct type. I do not like this in the case of @fieldParentPtr, but it is made this way for consistency with other builtins. One can use the @as(Type, value) builtin to manually set the contextual result type if the compiler cannot infer it.)

To make use of @fieldParentPtr for an easy intrusive linked list, we encapsulate the next pointer in its own struct which will then be a field of the item type:

pub const Node = extern struct {
    next: ?*Node = null,
};

With that, here is a basic singly linked list implemented with @fieldParentPtr (credit goes to github user N00byEdge who released this implementation with slightly different naming as part of Florence under the CC0 license):

pub fn SinglyLinkedList(comptime T: type, comptime field_name: []const u8) type {
    return struct {
        head: ?*Node = null,
        tail: ?*Node = null,

        fn node_from_ref(ref: *T) *Node {
            return &@field(ref, field_name);
        }
        fn ref_from_node(node: *Node) *T {
            return @fieldParentPtr(field_name, node);
        }

        pub fn append(self: *@This(), node: *T) void {
            const hook = node_from_ref(node);
            hook.next = null;

            if (self.tail) |tail| {
                tail.next = hook;
                self.tail = hook;
            } else {
                std.debug.assert(self.head == null);
                self.head = hook;
                self.tail = hook;
            }
        }

        pub fn prepend(self: *@This(), node: *T) void {
            const hook = node_from_ref(node);
            hook.next = self.head;
            self.head = hook;
        }

        pub fn peek(self: *const @This()) ?*T {
            return if (self.head) |head| ref_from_node(head) else null;
        }

        pub fn dequeue(self: *@This()) ?*T {
            if (self.head) |head| {
                if (head.next) |next| {
                    self.head = next;
                } else {
                    self.head = null;
                    self.tail = null;
                }
                return ref_from_node(head);
            } else {
                return null;
            }
        }
    };
}

The relevant part to this discussion is:

        tail: ?*Node = null,

fn node_from_ref(ref: *T) *Node {     return &@field(ref, field_name);
}
fn ref_from_node(node: *Node) *T {
    return @fieldParentPtr(field_name, node);
}

pub fn append(self: *@This(), node: *T) void {

(@field is another builtin, taking an object or pointer to object and the name of a field and returning the value of that field. It's basically the . member access operator but with compile-time string member name)

With these two functions, the list can access the node of an object and from a pointer to a node (as is used in both the next pointer and the head/tail fields) obtain the object itself. The rest of the implementation here is a bog-standard simple singly linked list and not worth going over in excessive detail.

Bonus! A priority queue.

As a bonus, here's a singly-linked priority queue made with the same idea! The tail of the sublist of a given priority is stored, as is the overall list head. Inserting an item iterates backwards over the tails to find the lowst priority tail that is at least as high as the passed element (assuming the list is nonempty), and then inserts the item after that node. This ensures that the item is immediately after the last item with priority no lower than that of the item being inserted, and proceeds in O(p) where p is the number of priorities - insertion is O(1) in the number of items in the queue. After adding the node, the tail of the new item's priority is set to the new node in order to maintain this structure:

head     ts[1]    ts[2]             ts[4]    ts[0], ts[3], ts[5]...
  v        v        v                 v              v
+---+    +---+    +---+    +---+    +---+
| 1 | -> | 1 | -> | 2 | -> | 4 | -> | 4 |           nul
+---+    +---+    +---+    +---+    +---+
pub fn PriorityQueue(
comptime T: type,
comptime node_field_name: []const u8,
comptime prio_field_name: []const u8,
comptime P: type,
) type {
    const Tails = std.EnumArray(P, ?*Node);
    const Indexer = Tails.Indexer;
    return struct {
        head: ?*Node,
        tails: Tails = Tails.initFill(null),

        fn node_from_ref(ref: *T) *Node {
            return &@field(ref, node_field_name);
        }
        fn ref_from_node(node: *Node) *T {
            return @fieldParentPtr(node_field_name, node);
        }
        fn node_prio(node: *const Node) P {
            return @field(ref_from_node(node).*, prio_field_name);
        }

        pub fn add(self: *@This(), item: *T) void {
            const prio: P = @field(item.*, prio_field_name);
            const hook = node_from_ref(item);
            if (self.head) |head| {
                // iterate backwards to find the lowest tail with priority 
                // no lower than the new element, whose next will be set to 
                // the new element. if such a tail doesnt exist then this 
                // element gets prepended as its a higher prio than everything
                // in the list so far
                var i = Indexer.indexOf(prio) + 1;
                while (std.math.sub(usize, i, 1)) |idx| {
                    i = idx;
                    // if the tail of the ith priority exists
                    if (self.tails.values[idx]) |tail| {
                        // stick this node on the end of that tail
                        hook.next = tail.next;
                        tail.next = hook;
                        break;
                    }
                } else |_| {
                    hook.next = head;
                    self.head = hook;
                }
            } else {
                hook.next = null;
                self.head = hook;
            }
            self.tails.getPtr(prio).* = hook;
        }

        pub fn dequeue(self: *@This()) ?*T {
            if (self.head) |head| {
                const head_prio = node_prio(head);
                if (head == self.tails.get(head_prio)) {
                    self.tails.set(head_prio, null);
                }
                self.head = head.next;
                head.next = null;
                return ref_from_node(head);
            } else {
                return null;
            }
        }
    };
}