Memory Allocators


A memory allocator is a runtime utility that manages dynamic memory for a process by allocating and freeing blocks from the heap. It tracks used and free regions and requests more memory from the operating system when needed, aiming to balance performance and memory efficiency. It tries to keep fragmentation as minimum and have fast allocation while using reasonable memory for overheads.

My thought behind the project

The main goal I picked up this project is to understand the memory allocation on a lower level and understand how the kernel manages memory which I think will be helpful when I do the other projects that I have on my to-do list.

Kickstart

I start by trying to clear up the my understanding and get into further detail on how memory allocators work and what types of those exist. I was able to get the basic idea from this article. It explains the basic working of the allocator very well.

Now, I looked into what other approaches are used in real-world software, and I came across different types of them. Here is a short overview.

Memory Allocators and few of its types

ArchitectureImplementation DifficultyAllocation SpeedCoalescing SpeedFragmentationBest For
Implicit ListVery EasySlow O(N)Slow O(N)HighLearning / Tiny Embedded
Explicit ListMediumMedium O(Free)Fast O(1)LowGeneral Purpose / Systems Learning
SegregatedHardFast O(1)Fast O(1)LowProduction (glibc)
BuddyMediumFast O(1)Very Fast O(1)High (Internal)OS Page Management
SlabEasy (if fixed size)InstantInstantZeroKernel Objects / Games
  1. Implicit Free List (The “Naive” Approach) This is similar to the article mentioned above. The blocks form a list just by being adjacent in memory. To find a free block, you start at the beginning of the heap and jump from header to header (using the size field) until you find one with allocated == 0. It is the absolute simplest to implement.

  2. Segregated Free Lists (The “Bucket” Approach) This is the “Pro” version of the Explicit List we are building. Instead of one long list of free blocks, you have an array of lists (buckets) for different sizes.

    • Bucket 0: Blocks of 16-31 bytes.
    • Bucket 1: Blocks of 32-63 bytes.
    • Bucket N: Blocks > 4KB. etc
  3. The Buddy System (The “Binary” Approach) Used by the Linux Kernel (Page Allocator). Memory is always split into powers of two. If you need 14KB, we round up to 16KB.

    • Splitting: If we have a 64KB block, we split it into two 32KB “buddies.” One is used, the other is free.
    • Coalescing: This is the magic. Because blocks are powers of 2, we can find the address of a block’s “buddy” using a simple XOR bitwise operation. We don’t need boundary tags!
    • Tradeoff: Very fast coalescing, but suffers from Internal Fragmentation. (If you ask for 33KB, you get 64KB, wasting 31KB).
  4. Slab / Pool Allocators Used by the Linux Kernel (kmalloc) and high-performance game engines.

    Concept: Fragmentation happens because we mix different sizes (16 bytes next to 10MB). Slab allocators solve this by having separate “pools” for specific object types.

    We grab a huge chunk (4KB) and cut it into strictly 32-byte slots. There is no complex splitting or coalescing; just a bitmap or list of slots.

  5. Thread-Caching Allocators Examples: tcmalloc (Google), jemalloc (Facebook/FreeBSD).

    In multi-threaded apps, a single lock on the heap destroys performance.

    To solve this, every thread here gets its own tiny “Mini-Heap” (Thread Local Cache). Small allocations come from the thread’s local cache. Only when the local cache is empty does the thread talk to the central global heap (which uses locks).

Buddy and Slab Allocators lecture at University of Toronto

Conclusion

I am going to be starting from the basic one and slowly buliding up to the more complex ones.

Absolutely Naive Allocator
Home/Projects/Custom Memory Allocators/Absolutely Naive Allocator Naive Memory Allocator I implemented the basic allocator implementation mentioned in the article mentioned previously. This is a basic functional allocator. There are a few problems, I can directly figure that we would face. freeing is O(n). so as we allocate more and more memory, we will take longer to free the blocks. incase all blocks except the last one are freed and then we free the last one, programbreak is only moved till the end of this last block only. all the already allocated headers still exist. two consecutive free headers may be sufficient together but not alone, but they are checked seperately so none will be allocated and program break needs to be extended. malloc() implementation absolutely-naive-allocator/alloc.c view raw free() implementation absolutely-naive-allocator/alloc.c view raw Conclusion Now proceeding to an implicit free list allocator
Implicit Freelist Allocator
Home/Projects/Custom Memory Allocators/Implicit Freelist Allocator Why “Implicit”? We don’t store pointers to the next free block. The list is implied by the size in the header. To find a free block, we must visit every block (allocated or free) sequentially until we find one that fits. The Architecture: Block Format: Header + Payload + Footer (Boundary Tags). Traversal: Linear Scan (O(N)). Freeing: Very fast (just flip a bit). Coalescing: We will implement immediate coalescing using boundary tags. Key Characteristic: The search path includes allocated blocks. This makes malloc slow (O(N)) when the heap is full of allocated chunks.
Explicit Freelist Allocator
Home/Projects/Custom Memory Allocators/Explicit Freelist Allocator Please read the Implicit Freelist Allocator first. Why Upgrade to Explicit? The Implicit List Bottleneck: In the implicite freelist allocator, findfit loops through every single block in the heap (allocated or free) to find a free one. Time Complexity: _O(N) where N is the total number of heap blocks. So if we have 10GB of data allocated in tiny chunks, malloc will have to scan millions of headers just to find one free slot. This is unacceptably slow.
Buddy Allocator
Home/Projects/Custom Memory Allocators/Buddy Allocator Buddy Allocator Previously, I built an Explicit Free List Allocator. It is fast and efficient for general user-space applications (malloc). But when we look at the Operating System Kernel, the requirements change. The OS doesn’t just need any memory, it often needs physically contiguous memory to interface with hardware. Our Linked List allocator suffered from External Fragmentation, making it unsuitable for the kernel.
Slab Allocator
Home/Projects/Custom Memory Allocators/Slab Allocator Slab Allocator The Slab Allocator is the top layer of the kernel’s memory management stack. While the Buddy Allocator manages raw 4KB pages, the Slab Allocator acts as a “retailer”, carving those pages into small, fixed-size objects (like inodes, task structs, or integers) to eliminate internal fragmentation. The Problem with Buddy: Internal Fragmentation The Buddy Allocator has a Granularity problem. Lets say I need 32 bytes to store an integer. But the smallest unit the Buddy allocator it can give is 4KB (4096 bytes). As a result, we waste 4064 bytes (99.2%) of memory. This is called Internal Fragmentation.