KoizOS - Malloc/Free and Memory

Introduction

First post of 2024! After getting the FAT16 filesystem set up, the next step is move to dynamic memory, specifically implementing malloc/free commands. The allocator I'm building for my operating system is loosely inspired by the Slab Allocator in the Linux Kernel.

Building up to a good enough solution

Whenever an application needs to reserve memory, the simplest solution would be to allocate a page of memory, that is, 4096 bytes. This works fine, it's what I currently do with the FAT16 filesystem, but it's not the most efficient

Assume that an application requests the following sizes: 512 B, 4096B, 2000 B, 400 B, and 2000 B. Here, I'm simply giving the application an entire page (4096B) for each request.

In this scenario, over 50% of the space is wasted since we're allocating an entire page of memory for each malloc request! To get around this, we can break up each page into fixed smaller units such as 2048B, 1024B, and 512B units. Whenever a user requests an arbitrary size of memory, I'll instead look for the smallest free fixed unit it fits in. If there are no free units, I'll allocate a new page and break it up to get more free fixed units.

The algorithm will look like this for the first 400B entry:

Find the smallest fixed slab value that the malloc request can fit into (512 B)
If there are no free entries available for the fixed slab value (512 B), allocate a 4096 B page and divide it up into 8 free entries of the fixed slab value (512 B). Mark all entries in this page as free.
Mark 1 free entry as used and return a pointer to said entry.

Continuing to run the algorithm for the rest of the previous entries gives us a much nicer looking layout.

Our space utilization jumped from around ~50% to around ~75%! That's a pretty good increase.

Now, let's see what the code for the algorithm looks like:

// 1. Find the smallest fixed slab value that the malloc request can fit into
int current_size = PHYS_BLOCK_SIZE;
int i;
for(i = 0; i < SLAB_COUNT; i++)
{
    if(size <= current_size) {
        c_size = current_size;
        c_slab = &(slabs[i]);
    }
    current_size /= 2;
}


// 2. If there are no free entries available for the fixed slab value, allocate a page and divide it up into the fixed slab value. Mark all entries in this page as free.

/* if we already have free entries, don't need to allocate more */
if(slab->free_entries == 0)
{
    /* Divide up the page into entries */
    int additional_entries = PHYS_BLOCK_SIZE / slab_size;
    int new_max_entries = slab->max_entries + additional_entries;
    if(new_max_entries > SLAB_ENTRY_COUNT) {
        additional_entries = SLAB_ENTRY_COUNT - slab->max_entries;
        new_max_entries = SLAB_ENTRY_COUNT;
    }
    
    /* Allocate new page */
    uint32_t* c_page = pmem_alloc();
    uint32_t c_addr = (uint32_t)c_page;
    slab->pages[slab->current_page] = c_page;
    ++(slab->current_page);
    
    /* Split up physical memory block and add records. Mark all as free */
    for(i = 0; i < additional_entries; i++)
    {
        c_addr += slab_size;
        slab->entries[slab->max_entries].mem_free = 1;
        slab->entries[slab->max_entries].mem_addr = (uint32_t*)c_addr;
        ++(slab->max_entries);
        ++(slab->free_entries);
    }
}


// 3. Mark a free entry as used and return a pointer to said entry.
uint32_t c_ptr = c_slab->current_entry;
do {
    /* Found an entry! Mark as used and return address */
    if(c_slab->entries[c_ptr].mem_free) {
        c_slab->entries[c_ptr].mem_free = 0;
        c_slab->current_entry = c_ptr;
        --(c_slab->free_entries);
        return c_slab->entries[c_ptr].mem_addr;
    }
    /* Increment */
    c_ptr = (c_ptr + 1) % c_slab->max_entries;
} 
/* We looped all the way around, we don't actually have a few entry */
while(c_ptr != c_slab->current_entry);

As always, the full code is available on GitHub. Here's a link to the malloc.c file where this is specifically implemented.

What about free()?

The standard free() function takes a memory address as an argument by default. Admittedly, my free() function is pretty inefficient. I just search through all the slabs and their corresponding entries to find the address and mark the unit as free. Since I know the fixed slab size beforehand, I know exactly how much memory I need to release. My free function looks like so

void free(void* ptr)
{
    /* TODO: Optimize this */
    
    /* Run thru each slab until we find the address and free it */
    int current_size = PHYS_BLOCK_SIZE;
    int i;
    for(i = 0; i < SLAB_COUNT; i++)
    {
        if(free_ptr_slab(&(slabs[i]), ptr))
            return;

        current_size /= 2;
    }

    panic("failed to free address");
}

/*
 * Returns 1 if something was freed and 0 if nothing was freed
 */
static int free_ptr_slab(kmemslab_t* slab, void* ptr)
{
    int i;
    for(i = (slab->max_entries - 1); i >= 0; i--)
    {
        /* 
         * Free entry and set current pointer to the freed block so
         * it can be used by malloc() right away 
         */
        if(slab->entries[i].mem_addr == ptr) {
            slab->entries[i].mem_free = 1;
            slab->current_entry = i;
            ++(slab->free_entries);
            return 1;
        }
    }
    return 0;
}

Implementing commands to see memory usage

I'll be implementing more shell commands in the next post, but here's two basic ones to show the total memory usage as well as memory usage by my slab allocator.

Conclusions

There's plenty of many limitations to my approach, the most obvious being that I cannot request more than 4096 bytes of memory at a time. Second, performance may suffer as the entry array becomes more fragmented and nears full (It scales O(n)). Lastly, the free() function is also wildly inefficient as it scans through all the slabs to find the address.

Although the memory allocation stuff took the majority of this post, the interactive shell also took a bit of work too. I decided to make a separate post for the console stuff as to not clutter this post.

The next thing after this is making the operating system more interactive by implementing additional shell commands!

Introduction

Building up to a good enough solution

What about free()?

Implementing commands to see memory usage

Conclusions

Tags