KoizOS - Dropping to Userspace
Introduction
We're finally here, ready to make our way into user space! This stretch probably took me the longest when I was working on the project, simply because I was finishing up my last degree at the time, and I had to work through some issues that kept popping up. Note that I'm writing this years after I finished this project, and I'm only coming back to finish my drafts so this doesn't look incomplete on my blog.
With that being said, there are three things we need to take care of before we can finally drop to user space:
- Programmable Interval Timer
- Implementing the TSS
- Adding System Calls
Programmable Interval Timer
The programmable interval timer (PIT) is a bit finicky, so much of my implementation is based on the corresponding OSDev wiki page. Of course, since I chose FASM, my assembly is a bit different. This timer is extremely important because it enables preemption: without it, the kernel has no way of regaining control from user processes (with the exception of system calls, which I'll get to later). Most of the code can be found in "kernel/drivers/irq/pit.asm"; a snippet is provided below:
format elf
use32
section '.text' executable
; public functions
public pit_initialize
public pit_interrupt_handler
; Standard Lib Functions
extrn printf
extrn panic
; Does things like install the timer
pit_initialize:
...
ret
; Handles interrupts
pit_interrupt_handler:
push eax
; Increment the interrupt count
add [pit_interrupt_count], 1
; Handle very-possible overflow situation.
; I simply fail-fast here
jo .pit_overflow_interrupt_count
; Send EOI to the PIC
mov al, 0x20
out 0x20, al
pop eax
ret
; Handle interrupt count overflow
.pit_overflow_interrupt_count:
ccall panic, pit_panic_msg
hlt
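; Initialized data belongs in a writable .data section, not .bss.
; The newline comes before the terminating 0 so printf actually emits it.
section '.data' writeable
pit_success_msg db "PIT Successfully initialized! Divisor: %x",0xA,0
pit_panic_msg db "Interrupt Count Overflow!",0xA,0
pit_divisor dw 0xFFFF
pit_hz dd 0x0
pit_interrupt_count dd 0x0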
The two main functions here are the initialize function and the interrupt handler itself. The interrupt handler is pretty rudimentary: it increments a tick counter, panics if that counter overflows, and sends an EOI (end of interrupt) signal to the PIC.
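Since the body of pit_initialize is elided above, here is a rough C sketch of the arithmetic that usually sits behind a pit_divisor / pit_hz pair. It is not the KoizOS code: the helper names are hypothetical, and the only hardware fact it relies on is the PIT's input clock of roughly 1.193182 MHz on PC-compatible machines.

```c
#include <stdint.h>

/* Input clock of the PIT on PC-compatible hardware (about 1.193182 MHz). */
#define PIT_BASE_HZ 1193182u

/* Hypothetical helper: choose a 16-bit divisor for a requested tick rate.
 * The result is clamped to 1..0xFFFF to keep the sketch simple. */
uint16_t pit_divisor_for_hz(uint32_t hz)
{
    if (hz == 0)
        return 0xFFFF;               /* treat 0 as "slowest rate allowed here" */

    uint32_t divisor = PIT_BASE_HZ / hz;
    if (divisor < 1)
        divisor = 1;                 /* requested rate is above the base clock */
    if (divisor > 0xFFFF)
        divisor = 0xFFFF;            /* requested rate is too slow for 16 bits */
    return (uint16_t)divisor;
}

/* Hypothetical helper: tick rate produced by a divisor (integer part only). */
uint32_t pit_effective_hz(uint16_t divisor)
{
    return divisor ? PIT_BASE_HZ / divisor : 0;
}
```

With the default divisor of 0xFFFF from the snippet above, this works out to about 18 interrupts per second, which is why the interrupt counter takes a very long time to overflow in practice.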
Implementing the TSS
Next on the list is the Task State Segment, or TSS for short. As the name suggests, it stores CPU state information for a task such as CPU registers, stack pointers, and segment selectors. The full state list can be seen below:
struct tss_entry {
uint32_t prevTss;
uint32_t esp0;
uint32_t ss0;
uint32_t esp1;
uint32_t ss1;
uint32_t esp2;
uint32_t ss2;
uint32_t cr3;
uint32_t eip;
uint32_t eflags;
uint32_t eax;
uint32_t ecx;
uint32_t edx;
uint32_t ebx;
uint32_t esp;
uint32_t ebp;
uint32_t esi;
uint32_t edi;
uint32_t es;
uint32_t cs;
uint32_t ss;
uint32_t ds;
uint32_t fs;
uint32_t gs;
uint32_t ldt;
uint16_t trap;
uint16_t iomap;
} __attribute__((packed));
typedef struct tss_entry tss_entry_t;
Since we use the TSS for software-based task switching rather than hardware-based switching (as most modern OSes do), we really only care about two fields: SS0 (the kernel stack segment selector) and ESP0 (the kernel stack pointer). The TSS entry is installed in the same GDT we mentioned a few posts ago. When an interrupt takes us from user mode back to kernel mode, the CPU loads SS0 and ESP0 from the current TSS and pushes the old user state (SS, ESP, EFLAGS, CS, EIP) onto that kernel stack. We then run the rest of the interrupt code in kernel mode. Once we're done handling the interrupt, we can safely restore the user state, drop back into user mode, and go on our way.
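To make the "only two fields matter" point concrete, here is a minimal C sketch of how SS0 and ESP0 might be filled in. It is an illustration under assumptions rather than the KoizOS source: tss_init and the kernel_tss variable are hypothetical, the body of set_kernel_stack is a guess at what a function with that name does, and the kernel data selector passed in the usage note (0x10) is assumed, not taken from the post.

```c
#include <stdint.h>
#include <string.h>

/* Same layout as the struct above, written compactly. */
struct tss_entry {
    uint32_t prevTss, esp0, ss0, esp1, ss1, esp2, ss2;
    uint32_t cr3, eip, eflags;
    uint32_t eax, ecx, edx, ebx, esp, ebp, esi, edi;
    uint32_t es, cs, ss, ds, fs, gs, ldt;
    uint16_t trap, iomap;
} __attribute__((packed));
typedef struct tss_entry tss_entry_t;

/* A 32-bit TSS is 104 bytes; catch layout mistakes at compile time. */
_Static_assert(sizeof(tss_entry_t) == 104, "32-bit TSS must be 104 bytes");

/* Hypothetical single TSS shared by every task (software switching). */
tss_entry_t kernel_tss;

/* Hypothetical init: zero everything, then set the ring 0 stack. */
void tss_init(uint32_t kernel_ss, uint32_t kernel_esp)
{
    memset(&kernel_tss, 0, sizeof kernel_tss);
    kernel_tss.ss0 = kernel_ss;    /* stack segment loaded on entry to ring 0 */
    kernel_tss.esp0 = kernel_esp;  /* stack pointer loaded on entry to ring 0 */
    /* An I/O map base at or beyond the TSS limit means "no I/O bitmap". */
    kernel_tss.iomap = (uint16_t)sizeof kernel_tss;
}

/* Guess at set_kernel_stack: point ESP0 at the stack to use on the
 * next user-to-kernel transition. */
void set_kernel_stack(uint32_t esp)
{
    kernel_tss.esp0 = esp;
}
```

A kernel would call something like tss_init(0x10, stack_top) once during boot, install the descriptor in the GDT, load the task register, and then call set_kernel_stack whenever the kernel stack for the running task changes. In a freestanding kernel, memset would come from the kernel's own libc rather than the host's.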
Adding System Calls
Besides the PIT preempting a process and forcing a switch from user mode to kernel mode, an application may itself need access to privileged resources that are only accessible in kernel space (e.g. files, I/O, memory). A system call usually consists of: (1) the user program preparing arguments, (2) executing a special instruction that triggers an interrupt, (3) switching to kernel mode, (4) handling the interrupt, and (5) returning control to user mode. For this milestone, I didn't actually create any system calls, so I'm mostly handling steps 2-5.
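For a sense of where step 4 tends to go once real system calls exist, here is a small C sketch of a dispatch table. None of this is in KoizOS at this milestone: the call numbers, the two example calls, and the syscall_dispatch name are all made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical signature: every call takes two arguments, returns a status. */
typedef int32_t (*syscall_fn)(uint32_t arg0, uint32_t arg1);

/* Hypothetical call 0: does nothing, useful for testing the round trip. */
static int32_t sys_noop(uint32_t arg0, uint32_t arg1)
{
    (void)arg0;
    (void)arg1;
    return 0;
}

/* Hypothetical call 1: adds its arguments, standing in for real work. */
static int32_t sys_add(uint32_t arg0, uint32_t arg1)
{
    return (int32_t)(arg0 + arg1);
}

/* The call number (step 1, prepared by the user program) indexes this table. */
static const syscall_fn syscall_table[] = { sys_noop, sys_add };

/* Step 4: validate the number coming from user mode, then dispatch. */
int32_t syscall_dispatch(uint32_t number, uint32_t arg0, uint32_t arg1)
{
    size_t count = sizeof syscall_table / sizeof syscall_table[0];
    if (number >= count)
        return -1;                 /* unknown call: never trust user input */
    return syscall_table[number](arg0, arg1);
}
```

The bounds check matters because the call number comes straight from user mode, so an unchecked index would let a user program jump the kernel anywhere.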
Step 2 is the instruction itself, which switches to kernel mode (step 3). Classic Linux uses int 0x80, while I decided to use int 0x33:
section '.text' executable
public common_interrupt_handler
;; common_interrupt_handler() - This is called for every interrupt
;;
;; This function then delegates the interrupt to the appropriate handler
common_interrupt_handler:
...
; Interrupt 0x33 is a system call!
mov ecx, 0x33
cmp [edi], ecx
;mov ebx, 0 ; Don't print
je .call_systemcall_handler
Step 4 is handling the interrupt itself. In this case, I really only call an internal kernel update method and continue.
.call_systemcall_handler:
push ebx
;ccall printf, systemcall_msg
ccall kernel_update
pop ebx
jmp .resume
Step 5 is returning to user mode, which is handled by the .resume label below. The final exit out of the interrupt is done by iret.
.resume:
cmp ebx, 1
jne .interrupt_cleanup
ccall printf, msg, [edi], [ecx]
; Clean up after interrupt
.interrupt_cleanup:
...
; We need to use iret to return from instead of ret
; since we're in an interrupt
iret
Entering Usermode
We now have pretty much all we need to drop to user mode. The main section of code that executes this is below:
format ELF
use32
include '../../libc/ccall.inc'
section '.text'
public _enter_usermode
extrn set_kernel_stack
_enter_usermode:
; Set user data segments
mov ax, 0x23
mov ds, ax
mov es, ax
mov fs, ax
mov gs, ax
; Build up a frame for IRET
push 0x23 ; SS, notice it uses same selector as above
push esp ; ESP
pushfd ; EFLAGS
pop eax ; Get EFLAGS back into EAX. The only way to read EFLAGS is to pushf then pop.
or eax, 0x200 ; Set the IF flag.
push eax ; Push the new EFLAGS value back onto the stack.
push 0x1b ; CS, user mode code selector is 0x18.
; With RPL 3 this is 0x1b
lea eax, [a] ; EIP first
push eax
; Save the kernel stack
mov eax, esp
ccall set_kernel_stack, esp
; Drop to user mode!
iretd
a:
; This runs in user mode!
add esp, 4
int 0x33
; infinite loop in user mode
.loop2:
int 0x33
jmp .loop2
ret
There's quite a bit happening here, so we'll step through each thing:
First, we set the user data segments. The user data segment descriptor sits at offset 0x20 in the GDT and has a Descriptor Privilege Level (DPL) of 3. The Requested Privilege Level (RPL) is also 3, or 0x03. Taking the OR of both gives us 0x23. Remember that Ring 0 is the kernel level while Ring 3 is the user level. Getting this right is vital: if SS pointed at a kernel segment (DPL=0) after the switch, we would immediately hit a general protection fault.
Second, we build a frame in the layout IRET expects. These values are popped when IRET executes. set_kernel_stack() sets the SS0/ESP0 fields of the TSS we talked about earlier; we can finally see it in use here.
Third, we actually drop into user mode with iretd! The Current Privilege Level (CPL) becomes 3, and we start executing at the (a) label above, now in user mode.
Fourth is the label (a) itself, which is our actual user code! Right now it just triggers int 0x33 once and then falls into an infinite loop that keeps doing the same.
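The selector arithmetic and the frame layout from the steps above can be sketched in C as follows. This is only an illustration: make_selector, struct iret_frame, and build_user_frame are hypothetical names, and the GDT indices (3 for user code, 4 for user data) are inferred from the 0x18 and 0x20 offsets used in the assembly.

```c
#include <stdint.h>

#define EFLAGS_IF 0x200u   /* interrupt enable flag, same bit as "or eax, 0x200" */

/* A selector is the GDT index shifted left by 3, ORed with the RPL.
 * Bit 2 (table indicator) stays 0, meaning "look in the GDT". */
uint16_t make_selector(uint16_t gdt_index, uint16_t rpl)
{
    return (uint16_t)((gdt_index << 3) | (rpl & 0x3));
}

/* What IRET pops on a privilege change, lowest address first.
 * The assembly pushes these in the reverse order: SS, ESP, EFLAGS, CS, EIP. */
struct iret_frame {
    uint32_t eip;
    uint32_t cs;
    uint32_t eflags;
    uint32_t esp;
    uint32_t ss;
};

/* Hypothetical helper mirroring what _enter_usermode builds on the stack. */
struct iret_frame build_user_frame(uint32_t entry, uint32_t user_esp,
                                   uint32_t eflags)
{
    struct iret_frame frame;
    frame.eip = entry;                    /* where user code starts */
    frame.cs = make_selector(3, 3);       /* user code: 0x18 | 3 */
    frame.eflags = eflags | EFLAGS_IF;    /* interrupts on, so the PIT can preempt */
    frame.esp = user_esp;                 /* user stack pointer */
    frame.ss = make_selector(4, 3);       /* user data: 0x20 | 3 */
    return frame;
}
```

Setting IF in the saved EFLAGS is what lets the timer interrupt fire once we are in user mode; without it, the infinite loop at (a) could never be preempted.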
Demo
So now comes the demo part! From the code in the last section, we can see all components we've created earlier being tested:
- IRET will drop us to user mode, where we can call int 0x33 once to make sure our system calls are handled correctly. It should update the kernel and return to user mode.
- The user application runs an infinite loop. If our PIT weren't working, it wouldn't be able to preempt the user process, and the kernel would never be able to regain control from a faulty process.
- In both cases, the interrupt needs to use the TSS correctly for the kernel stack.
- As an added bonus, keyboard interrupts should be handled correctly too.
Conclusion
We're almost to the end. All we have left is to build out actual user programs and add the ability to run more than one process.