src/x86/bringup-1.s
bits 32

Machine bringup #1

Now we need to write the code to get the machine to call main(argc, argv), and that means explaining a little about the way an x86 machine boots up.

Note

This explanation does not include EFI, the new BIOS replacement. But note that EFI has a BIOS emulation mode, so this is still valid.

On power-on, the first thing the processor does is set the program counter to its reset vector (physical address 0xFFFFFFF0, just below 4GB) and start executing. It will find BIOS code there, which will perform a power-on self test (POST) and initialise a load of the peripherals.

The BIOS will eventually try to transfer control to user code - it does this by loading the first sector of whatever disk you specify. This is usually a hard disk, but could be a floppy drive or CD-ROM.

I’ll only talk about the hard and floppy disk cases here, as they’re both handled the same and are the most common. If you want to work out how bootloaders work with CD-ROMs, google for “El Torito”.

Anyway, the BIOS loads the first sector of the boot media into memory at address 0x7C00 and jumps to it - this sector is 512 bytes. Note also that it must end with two “signature” bytes (0x55, 0xAA), so the amount of available space for instructions is actually 510 bytes.
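
As a rough illustration of that layout, here is a small C sketch that checks whether a 512-byte sector ends with the boot signature. The byte values 0x55 and 0xAA are the conventional PC boot signature; the names has_boot_signature and BOOT_CODE_SIZE are made up for this example and are not part of the tutorial's code.

```c
#include <stdint.h>

#define SECTOR_SIZE    512
#define BOOT_CODE_SIZE (SECTOR_SIZE - 2)  /* 510 bytes left for instructions */

/* Returns 1 if the last two bytes of the sector are 0x55, 0xAA. */
int has_boot_signature(const uint8_t *sector) {
  return sector[BOOT_CODE_SIZE] == 0x55 &&
         sector[BOOT_CODE_SIZE + 1] == 0xAA;
}
```

A BIOS performs an equivalent check before it agrees to run a sector as a bootloader.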

For obvious reasons this is called a “stage 1” bootloader. It needs to, in 510 bytes of instruction space, work out how to load its companion “stage 2” bootloader, which is more featureful and can provide an interface etc - this is what you perceive as GRUB’s interface when you see it on Linux bootup.

The stage 1 and stage 2 bootloaders can make use of an API the BIOS provides to load and store sectors to media, write to the screen and other stuff. However, a bootloader that wants to use this API has some restrictions too - it must run in the legacy 16-bit mode of the x86, which can only address a maximum of 1MB of memory (2^16 << 4 = 2^20 bytes, if you’re interested).
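
To see where the 1MB figure comes from, here is a minimal sketch of 16-bit real-mode address formation: a 16-bit segment is shifted left by four bits and added to a 16-bit offset. The function name real_mode_linear is hypothetical, and the sketch ignores the small region just above 1MB that becomes reachable with the A20 line enabled.

```c
#include <stdint.h>

/* Real-mode address translation: physical = (segment << 4) + offset. */
uint32_t real_mode_linear(uint16_t segment, uint16_t offset) {
  return ((uint32_t)segment << 4) + offset;
}
```

Note that different segment:offset pairs can name the same byte: both 0x07C0:0x0000 and 0x0000:0x7C00 refer to physical address 0x7C00.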

Normally a bootloader will transition to 32-bit mode at this point and load a kernel (this is a simplified view but it’ll do for the moment).

There is a lot of legacy cruft that must be dealt with when writing a bootloader, so for this tutorial we are going to assume one already exists and use that.

Multiboot

There are a multitude of bootloaders, but for x86 there is a de facto standard way of interfacing between kernel and bootloader, called the [Multiboot specification](http://www.gnu.org/software/grub/manual/multiboot/multiboot.html).

Most of the actions in this specification are for the bootloader to perform, but we must do at least one thing, and that is expose a multiboot header somewhere within the first 8KB of our kernel image.

The multiboot header consists of a magic number and a set of flags, along with a checksum.

In return, the bootloader will:
  • Load an ELF image and start executing at the image’s entry point.
  • Leave the machine in a predictable state (with interrupts disabled).
  • Leave a pointer to a structure containing information about the environment (multiboot info struct) in the EBX register.

We’ll talk a bit more about the multiboot information struct later, but for now I should mention that there is only one flag in the multiboot header that we care about, and that has the value 1<<1 (i.e. 2, bit 1 set). This flag will instruct the bootloader to give the kernel a memory map of where all the RAM is in the system as part of the multiboot info struct.

Enough talk, let’s begin...

Firstly we need to inform the assembler (NASM in this case) that we are assembling for 32-bit mode. That is what the bits 32 directive at the very top of src/x86/bringup-1.s does.

src/x86/bringup-1.s
        ;; Flag to request memory map information from the bootloader.
MBOOT_MEM_INFO      equ 1<<1
        ;; Multiboot magic value
MBOOT_HEADER_MAGIC  equ 0x1BADB002
MBOOT_FLAGS         equ MBOOT_MEM_INFO
MBOOT_CHECKSUM      equ -(MBOOT_HEADER_MAGIC+MBOOT_FLAGS)

Then we can define the constants that make up the multiboot header. The equ lines are the NASM equivalent of C #defines.

The checksum field is there to ensure the magic and flags got read correctly, and is defined as the number that must be added to the magic number and flags in order to make the result zero (modulo 2^32). Another important role the checksum field serves is to guarantee that the bootloader has found an actual multiboot header and not some random bytes that just happen to look like one.
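
The checksum rule, and the way a bootloader might use it, can be sketched in C. This is only an illustration under assumptions: the helper names mboot_checksum and find_mboot_header are invented here, the scan steps in 4-byte increments because the specification requires the header to be 32-bit aligned, and the words are read in host byte order (which matches the image on a little-endian x86 host).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MBOOT_HEADER_MAGIC 0x1BADB002u
#define MBOOT_SEARCH_LIMIT 8192u   /* header must lie in the first 8KB */

/* The checksum is whatever makes magic + flags + checksum wrap to zero. */
uint32_t mboot_checksum(uint32_t magic, uint32_t flags) {
  return (uint32_t)0 - (magic + flags);
}

/* Returns the byte offset of the first valid header, or -1 if none. */
long find_mboot_header(const uint8_t *image, size_t len) {
  size_t limit = len < MBOOT_SEARCH_LIMIT ? len : MBOOT_SEARCH_LIMIT;

  for (size_t off = 0; off + 12 <= limit; off += 4) {
    uint32_t words[3];
    memcpy(words, image + off, sizeof words);  /* avoids unaligned reads */
    if (words[0] == MBOOT_HEADER_MAGIC &&
        (uint32_t)(words[0] + words[1] + words[2]) == 0)
      return (long)off;
  }
  return -1;
}
```

With the flags used in this tutorial (MBOOT_MEM_INFO only), mboot_checksum(0x1BADB002, 2) works out to 0xE4524FFC.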

src/x86/bringup-1.s
section .init
mboot:  dd      MBOOT_HEADER_MAGIC
        dd      MBOOT_FLAGS
        dd      MBOOT_CHECKSUM

Now let’s define the header itself: three 32-bit words (magic, flags and checksum), placed in the .init section.

src/x86/bringup-1.s
        ;; Kernel entry point from bootloader.
        ;; At this point EBX is a pointer to the multiboot struct.
global _start:function _start.end-_start
_start: mov     eax, pd         ; MAGIC START!
        mov     dword [eax], pt + 3 ; addrs 0x0..0x400000 = pt | WRITE | PRESENT
        mov     dword [eax+0xC00], pt + 3 ; addrs 0xC0000000..0xC0400000 = same

        ;; Loop through all 1024 pages in page table 'pt', setting them to be
        ;; identity mapped with full permissions.
        mov     edx, pt
        mov     ecx, 0          ; Loop induction variable: start at 0

.loop:  mov     eax, ecx        ; tmp = (%ecx << 12) | WRITE | PRESENT
        shl     eax, 12
        or      eax, 3
        mov     [edx+ecx*4], eax ; pt[ecx * sizeof(entry)] = tmp

        inc     ecx
        cmp     ecx, 1024       ; End at %ecx == 1024
        jnz     .loop

        mov     eax, pd+3       ; Load page directory | WRITE | PRESENT
        mov     cr3, eax        ; Store into cr3.
        mov     eax, cr0
        or      eax, 0x80000000 ; Set PG bit in cr0 to enable paging.
        mov     cr0, eax

        jmp     higherhalf
.end:

section .init.bss nobits
pd:     resb    0x1000
pt:     resb    0x1000          ; MAGIC END!

That’s it, that’s all that is required to create a multiboot compliant image.

Higher half kernels

You might notice that we put the header in a section called .init. What is .init? And why do we need it?

We are kind of jumping the gun here - I’d have preferred to keep this explanation until the chapter on virtual memory and paging. But there’s no way to move it later, so here goes.

Under normal operation, the addresses given to the CPU such as pointers or call addresses are virtual - they may be transformed before the hardware goes and accesses memory.

The address you as a programmer give to the CPU is a virtual address. The address the CPU gives in turn to the memory controller is a physical address. The CPU maintains a set of mappings (via several mechanisms) between the two.

This is used for multiple reasons - one reason is that the RAM in a system may not be contiguous from zero up to however-much-you-have - it may appear in clumps throughout the memory space and there may be holes in between.

The other major reason for this is protection - under normal operating system conditions every user process has its own virtual address space. Because of this, it cannot address and therefore cannot maliciously interact with memory belonging to another process. This is a hardware enforced isolation mechanism.

Why am I telling you this now and not later? Well, the de facto standard is to have the first N GB of address space available for user process use, and the higher 4-N GB reserved for the kernel. The usual value for N is either 2 or 3, by the way. It is 3GB in Linux by default, for a 32-bit x86 system. Obviously for a 64-bit system it will be much higher.

A kernel that resides in the higher regions of virtual memory is called a higher half kernel.

Because we really want to stick to de facto standards (it makes our life easier in the long run), it’s a good idea to move to the higher half early. There are multiple ways to do this, but what we’re going to do is enable paging, which is one of the mechanisms the x86 has to perform virtual -> physical mappings.

I’d really like to explain it all to you now, but I feel it really should wait until the chapter on paging for a full explanation. So for the moment, please think of it as magic and it’ll make sense later :)

Anyway, most of our code will be linked to run in the higher half, but we need some code to run in the lower half as the bootloader will leave us with paging disabled. That is what the .init section is for, and will be defined later in the linker script.

With that out of the way, let’s go ahead and define our first few bytes of code (the _start routine, between the MAGIC START and MAGIC END comments), which will essentially be magic for now.

All you need to know about that code is that it set up some mappings such that addresses 0xC0000000 .. 0xC0400000 virtual get mapped to 0x00000000 .. 0x00400000 physical. So we can basically just add 0xC0000000 (which is 3GB, by the way) to any physical address and get the virtual address.
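
If you would like a peek behind the magic, the following is a hosted C model of the tables that _start builds. It is a sketch, not kernel code: the names build_boot_tables and translate are invented for illustration, and a directory entry in this model holds only the flag bits, whereas on real hardware it also holds the physical address of the page table.

```c
#include <stdint.h>

#define PAGE_PRESENT 0x1u
#define PAGE_WRITE   0x2u
#define KERNEL_BASE  0xC0000000u

static uint32_t pd[1024];  /* page directory: each entry covers 4MB */
static uint32_t pt[1024];  /* the single page table: each entry covers 4KB */

/* Mirrors what _start does: identity-map 4MB in pt, then reference pt from
   directory slots 0 and 768. */
void build_boot_tables(void) {
  for (uint32_t i = 0; i < 1024; ++i)
    pt[i] = (i << 12) | PAGE_WRITE | PAGE_PRESENT;
  pd[0]                 = PAGE_WRITE | PAGE_PRESENT;
  pd[KERNEL_BASE >> 22] = PAGE_WRITE | PAGE_PRESENT;
}

/* Walks the model tables. Returns 1 and stores the physical address in
   *paddr if vaddr is mapped, otherwise returns 0. */
int translate(uint32_t vaddr, uint32_t *paddr) {
  uint32_t dir_index   = vaddr >> 22;
  uint32_t table_index = (vaddr >> 12) & 0x3FFu;

  if (!(pd[dir_index] & PAGE_PRESENT))
    return 0;
  if (!(pt[table_index] & PAGE_PRESENT))
    return 0;
  *paddr = (pt[table_index] & 0xFFFFF000u) | (vaddr & 0xFFFu);
  return 1;
}
```

The directory index for 0xC0000000 is 768, and 768 entries of 4 bytes each is 0xC00, which is where the [eax+0xC00] in the assembly comes from.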

It then jumped to a function called “higherhalf”, which we are about to define.

Now, when I explained about multiboot, you may not have noticed that setting up a valid stack (value in the ESP register) was not part of the contract between bootloader and kernel.

Therefore we need to set one up now before we can perform any CALL instructions. Remember also that the multiboot info struct was passed in the EBX register by the bootloader (and we deliberately didn’t clobber it in the above code).

src/x86/bringup-1.s
extern bringup

        ;; Note that we're now defining functions in the normal .text section,
        ;; which means we're linked in the higher half (based at 3GB).
section .text
global higherhalf:function higherhalf.end-higherhalf
higherhalf:
        mov     esp, stack      ; Ensure we have a valid stack.
        xor     ebp, ebp        ; Zero the frame pointer for backtraces.
        push    ebx             ; Pass multiboot struct as a parameter
        call    bringup         ; Call kernel bringup function.
        cli                     ; Kernel has finished, so disable interrupts ...
        hlt                     ; ... And halt the processor.
.end:

section .bss
align 8192
global stack_base
stack_base:
        resb    0x2000
stack:

Here we created a stack, zeroed the EBP register (remember that x xor x is always zero), and pushed the pointer to the multiboot info struct onto the stack as the first parameter to the function bringup. We’re about to define bringup, and it will be the first code we run that is written in C.

Once that function returns, we perform a cli/hlt in order to stop the processor entirely. This should ideally never happen, but it’s better than running off into undefined memory.

Now we get to C code, and can leave pure assembly behind. We have a stack set up, and we’re about to call a C function called bringup.

src/include/x86/multiboot.h
#define MBOOT_MEM      (1<<0)
#define MBOOT_BOOT_DEV (1<<1)
#define MBOOT_CMDLINE  (1<<2)
#define MBOOT_MODULES  (1<<3)
#define MBOOT_ELF_SYMS (1<<5)
#define MBOOT_MMAP     (1<<6)

typedef struct multiboot {
  uint32_t flags;

  uint32_t mem_lower;
  uint32_t mem_upper;

  uint32_t boot_device;

  uint32_t cmdline;

  uint32_t mods_count;
  uint32_t mods_addr;

  uint32_t num, size, addr, shndx;

  uint32_t mmap_length, mmap_addr;

} __attribute__((packed)) multiboot_t;

#define MBOOT_IS_MMAP_TYPE_RAM(x) (x == 1)

typedef struct multiboot_mmap_entry {
  uint32_t size;
  uint64_t base_addr;
  uint64_t length;
  uint32_t type;
} __attribute__((packed)) multiboot_mmap_entry_t;

typedef struct multiboot_module_entry {
  uint32_t mod_start;
  uint32_t mod_end;
  uint32_t string;
  uint32_t reserved;
} __attribute__((packed)) multiboot_module_entry_t;

#endif /* X86_MULTIBOOT_H */

Now we come on to getting information out of the bootloader, and that brings us on to defining the multiboot info struct.

You’ll recall that we passed a set of flags to the bootloader to tell it to do stuff for us (just pass a memory map, in our case) - well, there is a similar set of flags that the bootloader will pass to us to tell us exactly what it did. Here are the flag definitions and the info struct definition; the fields are as follows:

flags
Consists of a bitwise OR of the MBOOT_* constants, describing which parts of the structure are actually valid.
mem_lower, mem_upper
Valid if flags & MBOOT_MEM is nonzero. The amount of lower memory (starting at address 0) and upper memory (starting at 1MB), in kilobytes. If flags & MBOOT_MMAP is nonzero, there is better information in the mmap_* fields.
cmdline
Valid if flags & MBOOT_CMDLINE is nonzero. This contains the address of a NUL-terminated string containing the command line given to the bootloader. This can contain arguments for the kernel (and we will use it to build the argc/argv that we pass to main).
mods_count, mods_addr
Valid if flags & MBOOT_MODULES is nonzero - this points to a list of kernel modules loaded via the “module” command in GRUB.
num, size, addr, shndx
Valid if flags & MBOOT_ELF_SYMS is nonzero, these describe the location and size of the ELF symbol table that the bootloader has loaded. addr refers to the address of an array of num ELF section headers, each of which is size large, and of which the shndx-th is the section header that describes the .shstrtab section, which is required to identify other sections. We’ll come on to this a bit more in a later chapter, on debugging.
mmap_length, mmap_addr
Valid if flags & MBOOT_MMAP is nonzero. mmap_addr points to an array of structures that describe how the physical memory space is laid out - in particular this can tell you where exactly RAM (which has type ‘1’) is located, because it’s often not contiguous from zero, as you might have expected!
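
To make the memory map format more concrete, here is a hedged C sketch that walks a multiboot-style memory map and totals the RAM it describes. The function name total_ram_bytes is made up for this example. It relies on the specification's rule that each entry's size field gives the size of the rest of the entry, so the next entry starts size + 4 bytes further on.

```c
#include <stdint.h>
#include <string.h>

typedef struct mmap_entry {
  uint32_t size;       /* size of the rest of the entry, excluding this field */
  uint64_t base_addr;
  uint64_t length;
  uint32_t type;       /* 1 means usable RAM */
} __attribute__((packed)) mmap_entry_t;

/* Sums the length of every RAM entry in a multiboot-style memory map. */
uint64_t total_ram_bytes(const uint8_t *mmap, uint32_t mmap_length) {
  uint64_t total = 0;
  uint32_t off = 0;

  while (off + sizeof(mmap_entry_t) <= mmap_length) {
    mmap_entry_t e;
    memcpy(&e, mmap + off, sizeof e);   /* avoids unaligned reads */
    if (e.type == 1)
      total += e.length;
    /* Entries may be larger than our struct, so step by the stored size. */
    off += e.size + sizeof e.size;
  }
  return total;
}
```

Stepping by the stored size rather than by sizeof(mmap_entry_t) keeps the walk correct even if a bootloader emits larger entries than the 24-byte layout used here.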
src/x86/bringup-2.c
#include "string.h"
#include "types.h"
#include "x86/multiboot.h"

/* Give the early allocator 2KB to play with. */
#define EARLYALLOC_SZ 2048

extern int main(int argc, char **argv);

/* The global multiboot struct, which will have all its pointers pointing to
   memory that has been earlyalloc()d. */
multiboot_t mboot;

static uintptr_t earlyalloc(unsigned len) {
  static uint8_t buf[EARLYALLOC_SZ];
  static unsigned idx = 0;

  if (idx + len > EARLYALLOC_SZ)
    /* Return 0 (a null address) on failure. It's too early in the boot
       process to give out a diagnostic. */
    return 0;

  uint8_t *ptr = &buf[idx];
  idx += len;

  return (uintptr_t)ptr;
}

/* Helper function to split a string on space characters ' ', resulting
   in 'n' different strings. This is used to convert the kernel command line
   into a form suitable for passing to main(). */
static int tokenize(char tok, char *in, char **out, int maxout) {
  int n = 0;

  while(*in && n < maxout) {
    out[n++] = in;

    /* Spool until the next instance of 'tok', or end of string. */
    while (*in && *in != tok)
      ++in;
    /* If we exited because we saw a token, make it a NUL character
       and step over it.*/
    if (*in == tok)
      *in++ = '\0';
  }

  return n;
}

/* Entry point from assembly. */
void bringup(multiboot_t *_mboot) {
  /* Call all global constructors. */
  extern size_t __ctors_begin;
  extern size_t __ctors_end;
  for (size_t *i = &__ctors_begin; i < &__ctors_end; ++i) {
    ((void (*)(void)) *i)();
  }

  /* Copy the multiboot struct itself. */
  memcpy((uint8_t*)&mboot, (uint8_t*)_mboot, sizeof(multiboot_t));

  /* If the cmdline member is valid, copy it over. */
  if (mboot.flags & MBOOT_CMDLINE) {
    /* We are now operating from the higher half, so adjust the pointer to take
       this into account! */
    _mboot->cmdline += 0xC0000000;
    int len = strlen((char*)_mboot->cmdline) + 1;
    mboot.cmdline = earlyalloc(len);
    if (mboot.cmdline)
      memcpy((uint8_t*)mboot.cmdline, (uint8_t*)_mboot->cmdline, len);
  }

  if (mboot.flags & MBOOT_MODULES) {
    _mboot->mods_addr += 0xC0000000;
    int len = mboot.mods_count * sizeof(multiboot_module_entry_t);
    mboot.mods_addr = earlyalloc(len);
    if (mboot.mods_addr)
      memcpy((uint8_t*)mboot.mods_addr, (uint8_t*)_mboot->mods_addr, len);
  }

  if (mboot.flags & MBOOT_ELF_SYMS) {
    _mboot->addr += 0xC0000000;
    int len = mboot.num * mboot.size;
    mboot.addr = earlyalloc(len);
    if (mboot.addr)
      memcpy((uint8_t*)mboot.addr, (uint8_t*)_mboot->addr, len);
  }

  if (mboot.flags & MBOOT_MMAP) {
    _mboot->mmap_addr += 0xC0000000;
    mboot.mmap_addr = earlyalloc(mboot.mmap_length + 4);
    if (mboot.mmap_addr) {
      memcpy((uint8_t*)mboot.mmap_addr,
             (uint8_t*)_mboot->mmap_addr - 4, mboot.mmap_length+4);
      /* Skip the 4 extra leading bytes so that mboot.mmap_addr points at the
         position in our copy that corresponds to _mboot->mmap_addr. */
      mboot.mmap_addr += 4;
    }
  }

OK, one other small problem is that the multiboot spec doesn’t say where in memory the multiboot info struct needs to be placed by the bootloader, and in fact if we’re not careful we could end up overwriting it accidentally!

Because of this, we need to first copy the structure and all of the things it points to somewhere we know cannot be overwritten. We also need to adjust all the pointers in the structure to point to somewhere in the higher half rather than the lower half as they currently do - we do this simply by adding 3GB (0xC0000000) to every pointer value.

For this, we need a memory allocator, earlyalloc. This is a simple bump-pointer allocator, which merely increments a pointer by the size requested and can never free memory.
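
For readers who have not met one before, here is a minimal standalone sketch of a bump-pointer allocator. It is a simplified illustration rather than the tutorial's earlyalloc: the names bump_alloc, pool and POOL_SIZE are hypothetical, and the pool is deliberately tiny so the failure case is easy to see.

```c
#include <stddef.h>
#include <stdint.h>

#define POOL_SIZE 64

static uint8_t pool[POOL_SIZE];
static size_t  pool_used = 0;

/* Hands out the next len bytes of the pool, or NULL if they do not fit.
   There is deliberately no matching free function. */
void *bump_alloc(size_t len) {
  if (len > POOL_SIZE - pool_used)
    return NULL;
  void *ptr = &pool[pool_used];
  pool_used += len;
  return ptr;
}
```

Two consecutive 16-byte allocations come back exactly 16 bytes apart, and once the pool is exhausted every further request fails. That is acceptable during early boot, where nothing needs to be freed.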

src/x86/bringup-2.c
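  static char *argv[256];
  int argc = 0;
  /* Only tokenize if a command line was given and was copied successfully. */
  if ((mboot.flags & MBOOT_CMDLINE) && mboot.cmdline)
    argc = tokenize(' ', (char*)mboot.cmdline, argv, 256);

  (void)main(argc, argv);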
}

And then finally all we need to do is take the kernel command line and split it for passing to main() - to do this we use a helper function tokenize(), defined slightly earlier, to split the string on every space character.
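
As a quick illustration of what the split produces, the sketch below repeats the tokenizer logic so that it stands alone and wraps it in a helper. The wrapper name split_cmdline is invented for this example, and the command line used in the note afterwards is just a made-up sample.

```c
#include <stddef.h>

/* Same splitting logic as the tutorial's tokenize(), repeated here so the
   example stands alone. */
static int tokenize(char tok, char *in, char **out, int maxout) {
  int n = 0;
  while (*in && n < maxout) {
    out[n++] = in;
    while (*in && *in != tok)
      ++in;
    if (*in == tok)
      *in++ = '\0';
  }
  return n;
}

/* Splits a writable command line in place and returns the argument count. */
int split_cmdline(char *cmdline, char **args, int max_args) {
  return tokenize(' ', cmdline, args, max_args);
}
```

Given the string "kernel debug root=/dev/fd0" this yields three arguments. Note that the string is modified in place, and that two consecutive spaces produce an empty argument between them.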

src/include/x86/multiboot.h
#ifndef X86_MULTIBOOT_H
#define X86_MULTIBOOT_H
src/x86/link.ld
OUTPUT_FORMAT("elf32-i386")
OUTPUT_ARCH("i386")
ENTRY(_start)

You might think that we’re done already, but we have yet to get our kernel to build!

The next thing we have to do is tell the linker how to link our kernel. Normal application programs under Linux have a lot of magic done by the linker - we want to stop all that happening and also do some magic of our own. For this, we need a linker script (*.ld).

Note that the format of a linker script is not the nicest in the world, so I’m going to assume that you either don’t care about the syntax or can sort of pick it up as you go along. Firstly we have to inform the linker of our target, which is 32-bit x86.

src/x86/link.ld
SECTIONS
{
  .init 0x100000 :
  {
    PROVIDE(__start = .);
    *(.init)
  }
  .init.bss ALIGN(4096) :
  {
    *(.init.bss)
  }

Then we define how to map sections. A section is an ELF concept, and is a chunk of code or data. It can have a name, a physical location and a virtual location - that is, you can instruct the linker to create an ELF section that will be loaded at one address but linked as if it were at another address. We’ll need that functionality for the higher half part of our kernel.

Firstly we want to create the .init section, which will be linked and loaded at 1MB, and the special .init.bss section which contains some of the magic in the bringup process that has yet to be fully described.

src/x86/link.ld
  . += 0xC0000000;

Now that the lower half stuff has been defined, we add 3GB to the link address (.) so that everything else will be linked in the higher half.

src/x86/link.ld
  .text ALIGN(4096) : AT(ADDR(.text) - 0xC0000000)
  {
    *(.mboot)
    *(.text.unlikely .text.*_unlikely)
    *(.text.exit .text.exit.*)
    *(.text.startup .text.startup.*)
    *(.text.hot .text.hot.*)
    *(.text .stub .text.* .gnu.linkonce.t.*)
    /* .gnu.warning sections are handled specially by elf32.em.  */
    *(.gnu.warning)
  }

  .rodata ALIGN(4096) : AT(ADDR(.rodata) - 0xC0000000) {
    *(.rodata .rodata.* .gnu.linkonce.r.*)
  }
  .data ALIGN(4096) : AT(ADDR(.data) - 0xC0000000)
  {
    PROVIDE (__startup_begin = .);
    *(.startup)
    PROVIDE (__startup_end = .);
    PROVIDE (__shutdown_begin = .);
    *(.shutdown)
    PROVIDE (__shutdown_end = .);

    *(.data .data.* .gnu.linkonce.d.*)
    SORT(CONSTRUCTORS)

    PROVIDE (__ctors_begin = .);
    *(.ctors)
    PROVIDE (__ctors_end = .);

    /* Hack to silence warning when using GNU Gold as the linker. */
    *(.note.gnu.gold-version)
    *(.note.gnu.build-id)
  }

  .bss ALIGN(4096) : AT(ADDR(.bss) - 0xC0000000)
  {
   *(.dynbss)
   *(.bss .bss.* .gnu.linkonce.b.*)
   *(COMMON)
    PROVIDE(__end = . - 0xC0000000);
  }

  /* Get rid of all other sections. */
  /DISCARD/ : { *(.*) }

}

All other sections (.text, which contains the program code, .rodata which contains read-only data, and .data which contains mutable data) should be aligned on a page boundary, and loaded at the current link address (which will be >3GB) minus 3GB so that they are loaded to addresses in the lower half.
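
The arithmetic the linker performs for each of these sections can be sketched in C as follows. The function names page_align_up and load_address are hypothetical; they simply mirror the ALIGN(4096) and AT(ADDR(...) - 0xC0000000) expressions in the script.

```c
#include <stdint.h>

#define KERNEL_VIRT_BASE 0xC0000000u
#define PAGE_SIZE        4096u

/* Rounds addr up to the next page boundary, like ALIGN(4096). */
uint32_t page_align_up(uint32_t addr) {
  return (addr + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1);
}

/* Load (physical) address for a higher-half link address, like
   AT(ADDR(section) - 0xC0000000). */
uint32_t load_address(uint32_t link_addr) {
  return link_addr - KERNEL_VIRT_BASE;
}
```

For example, a section whose link address would otherwise start at 0xC0100123 is aligned up to 0xC0101000 and loaded at physical address 0x00101000.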

Now we are actually done! However, we need to compile the kernel with some special command line options:

-nostdlibinc -fno-builtin -DX86 -m32 -ffreestanding
-nostdlibinc
This stops the compiler from implicitly searching the C standard library’s include directories. Some tutorials recommend -nostdinc, but -nostdlibinc is better: it removes the problems surrounding including C library headers but still allows headers required by the C standard and defined by the compiler, notably <stdarg.h>, <stddef.h> and <stdint.h>, which allow us later to use size_t and uintptr_t.
-fno-builtin
This stops the compiler from inserting calls to memcpy() and memset() when it feels it should - it can’t do this because we don’t have a version of memcpy yet! :(
-DX86
Sets the preprocessor define #define X86, which we use in several places to detect the target architecture.
-m32
Forces compilation for 32-bit mode, which is required on a 64-bit system which will default to 64-bit.
-ffreestanding
Defines __STDC_HOSTED__ as 0 and adjusts a variety of other #defines that standard headers use to detect if you are on a system with an OS underneath. We’re not ;)

We require some options for the linker too:

-Tsrc/x86/link.ld -lgcc -n
-Tsrc/x86/link.ld
This tells the linker to use the linker script “src/x86/link.ld” that we just defined instead of its default.
-lgcc
Forces the linking in of libgcc, which contains definitions for 64-bit arithmetic and other libcalls that the compiler relies on.
-n
Works around a “feature” of GNU Gold. If you’re really interested, see the comment in the Makefile.
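
As an example of why -lgcc matters, consider the tiny function below. On 32-bit x86, GCC typically compiles a 64-bit division like this into a call to the libgcc helper __udivdi3 instead of inline instructions, so without libgcc the link would fail with an undefined symbol. The function name div64 is just for illustration.

```c
#include <stdint.h>

/* On 32-bit x86, GCC typically compiles this division into a call to the
   libgcc helper __udivdi3 rather than inline instructions. */
uint64_t div64(uint64_t numerator, uint64_t denominator) {
  return numerator / denominator;
}
```

Memory sizes from the multiboot memory map are 64-bit values, so this kind of arithmetic turns up sooner than you might expect.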

This should now compile and link the kernel. All that’s left is to squirt the built ELF image into a floppy disc image with GRUB preinstalled. This is all handled by the kernel’s build system, so you just need to type:

make TARGET=x86 -j3
qemu -fda build/kernel.img