KoizOS - Dropping to Userspace

Introduction

We're finally here, ready to make our way into user space! This stretch probably took me the longest when I was working on the project, simply because I was finishing up my last degree at the time, and I had to work through some issues that kept popping up. Note that I'm writing this years after I finished this project, and only coming back/finishing my drafts so this doesn't look incomplete on my blog.

With that being said, there are three things we need to take care of before we can finally drop to user space:

Programmable Interval Timer
Implementing the TSS
Adding System Calls

Programmable Interval Timer

The programmable interval timer (PIT) is a bit finicky, so much of my implementation is based on the corresponding OSDev wiki page. Of course, since I chose FASM, my assembly looks a bit different. This timer is extremely important because it enables preemption: without it, the kernel has no way of regaining control from a user process (with the exception of system calls, which I'll get to later). The majority of the code can be found in "kernel/drivers/irq/pit.asm"; a snippet is provided below:

format elf
use32

section '.text' executable

    ; public functions
    public pit_initialize
    public pit_interrupt_handler

    ; Standard Lib Functions
    extrn printf
    extrn panic

    ; Does things like install the timer 
    pit_initialize:
        ...
        ret

    ; Handles interrupts
    pit_interrupt_handler:
        push eax

        ; Increment the interrupt count
        add [pit_interrupt_count], 1
        
        ; Handle very-possible overflow situation.
        ; I simply fail-fast here
        jo .pit_overflow_interrupt_count

        ; Send EOI to the PIC
        mov al, 0x20
        out 0x20, al

        pop eax

        ret

    ; Handle interrupt count overflow
    .pit_overflow_interrupt_count:
        ccall panic, pit_panic_msg
        hlt

; Initialized data lives in a writeable data section.
; Strings end with a newline followed by the null terminator.
section '.data' writeable
    pit_success_msg     db "PIT Successfully initialized! Divisor: %x",0xA,0
    pit_panic_msg       db "Interrupt Count Overflow!",0xA,0
    pit_divisor         dw 0xFFFF
    pit_hz              dd 0x0
    pit_interrupt_count dd 0x0

The two main functions here are the initialize function and the interrupt handler itself. The interrupt handler is pretty rudimentary: it increments a tick counter, panics if that counter overflows, and sends an EOI (end of interrupt) signal to the PIC.
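Since the body of pit_initialize is elided above, here is a minimal C sketch of the arithmetic such a routine usually needs: turning a desired tick rate into a 16-bit reload value. It assumes the commonly documented PIT input clock of 1193182 Hz; the helper names pit_divisor_for_hz and pit_millihz_for_divisor are hypothetical and are not taken from KoizOS.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed PIT input clock in Hz (the commonly documented 1.193182 MHz). */
#define PIT_BASE_HZ 1193182u

/* With the largest 16-bit reload value the timer ticks about 18 times a second. */
static_assert(PIT_BASE_HZ / 0xFFFFu == 18, "slowest 16-bit rate is roughly 18 Hz");

/* Hypothetical helper: pick the reload value closest to the requested
 * tick rate, clamped to the range 1..0xFFFF. */
static uint16_t pit_divisor_for_hz(uint32_t hz)
{
    if (hz == 0)
        return 0xFFFF;                              /* slowest rate in this sketch */

    uint32_t divisor = (PIT_BASE_HZ + hz / 2) / hz; /* round to nearest */
    if (divisor < 1)
        divisor = 1;
    if (divisor > 0xFFFF)
        divisor = 0xFFFF;
    return (uint16_t)divisor;
}

/* Tick rate, in millihertz, produced by a given reload value.
 * The hardware treats a reload value of 0 as 65536. */
static uint32_t pit_millihz_for_divisor(uint16_t divisor)
{
    uint32_t d = divisor ? divisor : 65536u;
    return (uint32_t)(((uint64_t)PIT_BASE_HZ * 1000u) / d);
}
```

If the 0xFFFF default in pit_divisor is what actually gets programmed, the timer fires roughly 18.2 times per second, so the signed overflow that jo checks for would only trigger after years of uptime.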

Implementing the TSS

Next on the list is the Task State Segment, or TSS for short. As the name suggests, it stores CPU state information for a task such as CPU registers, stack pointers, and segment selectors. The full state list can be seen below:


struct tss_entry {
	uint32_t prevTss;
	uint32_t esp0;
	uint32_t ss0;
	uint32_t esp1;
	uint32_t ss1;
	uint32_t esp2;
	uint32_t ss2;
	uint32_t cr3;
	uint32_t eip;
	uint32_t eflags;
	uint32_t eax;
	uint32_t ecx;
	uint32_t edx;
	uint32_t ebx;
	uint32_t esp;
	uint32_t ebp;
	uint32_t esi;
	uint32_t edi;
	uint32_t es;
	uint32_t cs;
	uint32_t ss;
	uint32_t ds;
	uint32_t fs;
	uint32_t gs;
	uint32_t ldt;
	uint16_t trap;
	uint16_t iomap;
} __attribute__((packed));

typedef struct tss_entry tss_entry_t;

Since we use the TSS for software-based task switching rather than hardware-based switching (as most modern OSes do), we really only care about two fields: SS0 (the kernel stack segment selector) and ESP0 (the kernel stack pointer). The TSS descriptor is installed in the same GDT we mentioned a few posts ago. When an interrupt takes us from user mode back to kernel mode, the CPU reads SS0 and ESP0 from the current TSS, switches to that kernel stack, and pushes the old user state (SS, ESP, EFLAGS, CS, EIP) onto it. We then run the rest of the interrupt code in kernel mode. Once we're done handling the interrupt, we can safely restore the user state, drop back into user mode, and go on our way.
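To make the "only two fields matter" point concrete, here is a minimal sketch in C of initializing the structure and updating the kernel stack. The struct is copied from above; tss_init and tss_set_kernel_stack are hypothetical names (the real set_kernel_stack in KoizOS is not shown in this post), and the kernel data selector passed in by the caller is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same layout as the tss_entry struct shown above. */
struct tss_entry {
    uint32_t prevTss;
    uint32_t esp0, ss0, esp1, ss1, esp2, ss2;
    uint32_t cr3, eip, eflags;
    uint32_t eax, ecx, edx, ebx, esp, ebp, esi, edi;
    uint32_t es, cs, ss, ds, fs, gs;
    uint32_t ldt;
    uint16_t trap;
    uint16_t iomap;
} __attribute__((packed));

typedef struct tss_entry tss_entry_t;

/* A 32-bit TSS is 104 bytes, with ESP0 and SS0 right after the back link. */
static_assert(sizeof(tss_entry_t) == 104, "unexpected TSS size");
static_assert(offsetof(tss_entry_t, esp0) == 4, "ESP0 must sit at offset 4");
static_assert(offsetof(tss_entry_t, ss0) == 8, "SS0 must sit at offset 8");

/* Hypothetical: clear the TSS and record the kernel stack segment. */
static void tss_init(tss_entry_t *tss, uint16_t kernel_ss)
{
    memset(tss, 0, sizeof *tss);
    tss->ss0 = kernel_ss;
    /* Pointing the I/O map base at the end of the TSS means "no I/O bitmap". */
    tss->iomap = (uint16_t)sizeof *tss;
}

/* Hypothetical counterpart of set_kernel_stack: ESP0 is the stack the CPU
 * switches to when an interrupt arrives while running in Ring 3. */
static void tss_set_kernel_stack(tss_entry_t *tss, uint32_t stack_top)
{
    tss->esp0 = stack_top;
}
```

A caller would run tss_init once at boot, for example with 0x10 if that is the kernel data selector in your GDT, and then call tss_set_kernel_stack whenever the kernel stack for the current task changes.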

Adding System Calls

Besides using the PIT to force a switch from user mode to kernel mode, an application may need access to privileged resources that are only accessible in kernel space (e.g. files, I/O, memory). The process usually consists of: (1) the user program preparing arguments, (2) executing a special instruction that triggers an interrupt, (3) switching to kernel mode, (4) handling the interrupt, and (5) returning control to user mode. For this milestone, I didn't actually create any system calls, so I'm mostly handling steps 2-5.

Step 2 is the instruction itself, which triggers the switch to kernel mode (step 3). Classic Linux uses int 0x80, while I decided to use int 0x33:

section '.text' executable

    public common_interrupt_handler
    
    ;; common_interrupt_handler() - This is called for every interrupt
    ;; 
    ;; This function then delegates the interrupt to the appropriate handler
    common_interrupt_handler:

        ...

        ; Interrupt 0x33 is a system call!
        mov ecx, 0x33
        cmp [edi], ecx
        ;mov ebx, 0         ; Don't print
        je .call_systemcall_handler

Step 4 is handling the interrupt itself. In this case, all I do is call an internal kernel update method and continue.

    .call_systemcall_handler:
        push ebx
        ;ccall printf, systemcall_msg
        ccall kernel_update
        pop ebx
        jmp .resume

Step 5 is returning to user mode, which is handled by the .resume label below. The final exit from the interrupt is done by iret.

    .resume:
        cmp ebx, 1
        jne .interrupt_cleanup
        ccall printf, msg, [edi], [ecx]

    ; Clean up after interrupt
    .interrupt_cleanup:

        ...

        ; We need to use iret to return from instead of ret
        ; since we're in an interrupt
        iret
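For readers who find the control flow easier to follow in C, here is a rough sketch of the same idea: one common entry point that hands each interrupt to whatever is registered for its vector. Note that this uses a lookup table rather than the compare-and-jump chain in the assembly above, and every name in it (register_handler, dispatch_interrupt, syscall_handler, kernel_update_calls) is hypothetical. Only the 0x33 vector comes from the post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VECTOR_COUNT   256
#define SYSCALL_VECTOR 0x33   /* the system call vector chosen in this post */

typedef void (*interrupt_handler_t)(uint32_t vector);

/* One slot per interrupt vector; NULL means "nothing registered". */
static interrupt_handler_t handlers[VECTOR_COUNT];

static void register_handler(uint8_t vector, interrupt_handler_t handler)
{
    assert(handler != NULL);
    handlers[vector] = handler;
}

/* Called for every interrupt. Returns 1 if a handler ran, 0 otherwise. */
static int dispatch_interrupt(uint32_t vector)
{
    if (vector >= VECTOR_COUNT || handlers[vector] == NULL)
        return 0;
    handlers[vector](vector);
    return 1;
}

/* Stand-in for the system call path: the real handler calls kernel_update;
 * this sketch only counts how often it would have been called. */
static unsigned kernel_update_calls;

static void syscall_handler(uint32_t vector)
{
    (void)vector;
    kernel_update_calls++;
}
```

Registering syscall_handler for SYSCALL_VECTOR and then calling dispatch_interrupt(0x33) bumps the counter once, while a vector with nothing registered simply returns 0.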

Entering Usermode

We now have pretty much all we need to drop to user mode. The main section of code that executes this is below:

format ELF
use32

include '../../libc/ccall.inc'

section '.text' executable
    public _enter_usermode
    extrn set_kernel_stack

    _enter_usermode:

        ; Set user data segments
        mov ax, 0x23
        mov ds, ax
        mov es, ax
        mov fs, ax
        mov gs, ax

        ; Build up a frame for IRET
        push 0x23       ; SS, notice it uses the same selector as above
        push esp        ; ESP
        pushfd          ; EFLAGS
        pop eax         ; Get EFLAGS back into EAX. The only way to read all of EFLAGS is to pushfd then pop.
        or eax, 0x200   ; Set the IF flag.
        push eax        ; Push the new EFLAGS value back onto the stack.
        push 0x1b       ; CS, user mode code selector is 0x18.
                        ; With RPL 3 this is 0x1b
        lea eax, [a]    ; EIP, pushed last so IRET pops it first
        push eax

        ; Save the kernel stack
        mov eax, esp
        ccall set_kernel_stack, esp

        ; Drop to user mode!
        iretd
    a:
        ; This runs in user mode!
        add esp, 4
        int 0x33
        
    ; infinite loop in user mode
    .loop2:
        int 0x33
        jmp .loop2

        ret

There's quite a bit happening here, so we'll step through each thing:

First, we set the user data segments. The user data descriptor sits at offset 0x20 in the GDT and has a Descriptor Privilege Level (DPL) of 3. The Requested Privilege Level (RPL) in the selector is also 3, or 0x03. Taking the OR of both gives us 0x23. Remember that Ring 0 is the kernel level while Ring 3 is the user level in this case. Getting this right is vital: IRET raises a general protection fault if the SS in the frame points at a kernel segment (DPL=0) while we are returning to Ring 3.

Second, we build a frame in the layout IRET expects. These values are popped when IRET executes. set_kernel_stack() sets SS0/ESP0 in the TSS we talked about earlier, and we can finally see it in use here.

Third, we actually drop into user mode with iretd! The Current Privilege Level (CPL) becomes 3, and we start executing at the (a) label above, now in user mode.

Fourth is the label (a), which is our actual user code! Right now it just triggers an interrupt and then falls into an infinite loop that keeps issuing int 0x33.
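The selector arithmetic and the frame layout from the steps above can be sketched in C as follows. This is only an illustration: make_selector, build_user_frame, and struct iret_frame are hypothetical names, and the GDT indices 3 and 4 are inferred from the 0x18 and 0x20 offsets mentioned in this post.

```c
#include <assert.h>
#include <stdint.h>

#define EFLAGS_IF 0x200u   /* interrupt enable flag, bit 9 of EFLAGS */

/* A selector is (GDT index << 3) | (table indicator << 2) | RPL. */
static uint16_t make_selector(uint16_t gdt_index, unsigned use_ldt, unsigned rpl)
{
    return (uint16_t)((gdt_index << 3) | ((use_ldt & 1u) << 2) | (rpl & 3u));
}

/* What IRET pops on a return to Ring 3, lowest address first.
 * The assembly pushes these in the reverse order: SS, ESP, EFLAGS, CS, EIP. */
struct iret_frame {
    uint32_t eip;
    uint32_t cs;
    uint32_t eflags;
    uint32_t esp;
    uint32_t ss;
};

static_assert(sizeof(struct iret_frame) == 20, "five 32-bit slots");

/* Hypothetical helper: fill in a frame that lands in Ring 3 with
 * interrupts enabled. Indices 3 and 4 are the assumed user code and
 * user data descriptors. */
static struct iret_frame build_user_frame(uint32_t entry, uint32_t user_esp,
                                          uint32_t eflags)
{
    struct iret_frame frame;
    frame.eip    = entry;
    frame.cs     = make_selector(3, 0, 3);   /* 0x18 | 3 = 0x1b */
    frame.eflags = eflags | EFLAGS_IF;
    frame.esp    = user_esp;
    frame.ss     = make_selector(4, 0, 3);   /* 0x20 | 3 = 0x23 */
    return frame;
}
```

Setting IF in the saved EFLAGS matters for the demo: without it, user mode would start with interrupts disabled and the timer could never preempt the loop.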

Demo

So now comes the demo part! From the code in the last section, we can see all components we've created earlier being tested:

IRET will drop us to usermode, where we can call int 0x33 once to make sure our system calls are handled correctly. It should update the kernel and return back to user mode.
The user application runs an infinite loop. If our PIT weren't working, nothing would preempt the user process, and the kernel would never be able to regain control from a faulty process.
In both cases, the interrupt needs to use the TSS correctly for the kernel stack.
As an added bonus, keyboard interrupts should be handled correctly too.

Conclusion

We're almost to the end. All we have left is to build out actual user programs and add the ability to run more than one process.