Bootsector in assembly

This article explains how to write a very minimal OS... Well, can I even call it that? It's only a bootsector. It can also serve as a brief and simple overview of what assembly looks like, since the only code you will find here is x86 assembly. I have always found this interesting and I now wish to share it. A glossary is at the end of the article (Appendix 1); please refer to it for anything you don't get, and mail me if you don't understand something so I can improve the article.

Let's go!

Be aware that we develop in real mode: we can only access 640KB of memory and use 16-bit instructions, but this gives us access to all the BIOS functionality. Using this mode avoids some huge work (like memory management) and it's really enough for us.

What do we need?

We need some programming tools:
For our assembly code, we'll use NASM. It takes our assembly code and turns it into an executable binary. This software is available for many platforms, including *nix and Windows, and can be downloaded from the NASM website.

We'll also copy our bootsector onto the first sector of a disk (say a floppy or a virtual drive; I'll take a floppy for this example). This requires a tool: you can use "dd" if you're on *nix or partcopy if you're on Windows.

The assembly language

Creating an operating system requires you to know assembly. Let's see some basics. First thing to be aware of: in assembly, most values are written in hexadecimal, so don't be surprised to see numbers written this way: 0x0100. This means "100 in hexadecimal" (base 16), which equals 256 in decimal (base 10).
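
To make the notation concrete, here is a small sketch of the same value written in the different forms NASM accepts. I use "dw" here, which simply stores a 16-bit value in the output; all four lines should produce the same two bytes.

```nasm
[BITS 16]
  dw 0x0100        ; hexadecimal with the 0x prefix
  dw 0100h         ; hexadecimal with the h suffix
  dw 256           ; decimal
  dw 100000000b    ; binary (a 1 followed by eight 0s)
```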

In assembly, we use interrupts, quite like those hardware interrupts you may have already heard of in an IRQ conflict :P. Interrupts give us access to functions (like in C) provided by the BIOS or the operating system. In this article we'll use 2 BIOS interrupts, namely 0x10 and 0x13.
0x10 will be used to write messages on the screen, say, "Welcome to our operating system".
0x13 will be used to read other sectors of the disk in order to load our future kernel (and unless I someday decide to write a follow-up article on the topic, this will never happen).

In assembly, we use "registers", which are in fact "containers" allowing us to manipulate the values we need, quite like variables in C. The main ones we'll use here are named AX, BX, CX and DX, and they are all 16 bits wide.

AX, for example, can be divided in two parts, AH and AL, each 8 bits wide. You can use AH or AL alone, or AX to manipulate all 16 bits directly. If you put 0x1234 in AX, you'll have 0x12 in AH and 0x34 in AL.
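
Here is a short sketch of that relationship between AX, AH and AL. It uses "mov", which copies a value into a register; the comments track what AX holds after each line.

```nasm
[BITS 16]
  mov ax, 0x1234   ; AX = 0x1234, so AH = 0x12 and AL = 0x34
  mov al, 0xFF     ; only the low byte changes:  AX = 0x12FF
  mov ah, 0x00     ; only the high byte changes: AX = 0x00FF
```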

Let's now see some instructions. "mov" allows you to copy a value. For example, mov ax, 0x1234 tells the processor to "put 0x1234 in the AX register". The instruction "int" calls an interrupt; for example, "int 0x13" calls the 0x13 interrupt, which will read the registers as if they were function arguments and act accordingly.

The bootsector code

You can have a look at the entire code (Appendix 2) at the end of the article. I'll detail every part of it here and explain what it does.

[BITS 16]

Means we work in 16 bits (we are in real mode, right where the BIOS leaves us).

[ORG 0x7C00]

Tells the assembler the address at which the BIOS loads our code in memory: 0x7C00. The first instruction there is then executed; that's where we take over.

[SEGMENT .text]

Simply tells the assembler everything after this is code, not data.

jmp start

tells the processor to jump to the label start, skipping over the data that comes next. The label start is defined right after this line:

"bootmsg db 'It works !!', 10, 13, 0"

which means we create a string "'It works !!', 10, 13, 0" identified by "bootmsg" so we can reference it later on. bootmsg is actually only a human-readable way to express the memory location of the beginning of the message. db means "define bytes": each of those characters is a byte, and we just tell the assembler we define them here. Just so you know, "10, 13, 0" are ASCII codes: 10 (line feed) moves to the next line, 13 (carriage return) returns to the beginning of the line, and 0 marks the end of the string.
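
If the bare numbers feel cryptic, the same string can be written with named constants. This is only a sketch: LF, CR and bootmsg_len are names I made up for the example, and "equ" just gives a name to a value without emitting any byte.

```nasm
[BITS 16]
LF equ 10                     ; line feed: move to the next line
CR equ 13                     ; carriage return: back to the start of the line

bootmsg db 'It works !!', LF, CR, 0
bootmsg_len equ $ - bootmsg   ; size of the string in bytes, final 0 included
```

The bytes produced are the same as with the numeric version; only the source gets easier to read.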

"mov si, bootmsg"

means we store the address of the message in SI, another register, used to point to a memory address. We then call our function "message" (call message), which prints the string to the screen. This function is defined later in the code. Once the function is done, we get back to the place we called it from.

Now it's very important for your understanding to keep in mind that an interrupt is in fact a function we call with arguments. Those arguments are usually stored in registers. In this article, I use 2 BIOS interrupts, described here:
Interrupt 0x10, function 0x0E: used to print a character to the screen

AH = 0x0E,
AL = ascii code of the character,
BH = The page number to write to,
BL = colour.

Page number to write to... sounds cryptic, right? If you want to write to the screen, use 0 here (BH = 0).
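
As a small sketch of how those arguments are passed, this prints a single letter 'A'. The colour value 0x07 is only an assumption for the example (as far as I know, BL only matters in graphics modes).

```nasm
[BITS 16]
  mov ah, 0x0E     ; function 0x0E: print one character
  mov al, 'A'      ; ASCII code of the character to print
  mov bh, 0        ; page 0, the visible screen
  mov bl, 0x07     ; colour (assumed value for the example)
  int 0x10         ; call the BIOS video interrupt
```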


Interrupt 0x13, function 0: used to initialize a drive

AH = 0,
DL = Drive (0=A, 1=B, 0x80=C).


Interrupt 0x13, function 2: used to read one or more sectors from a disk

AH = 2,
AL = Number of sectors to read,
CH = Track/Cylinder,
CL = Number of the first sector to be read,
DH = Head number,
DL = Drive (0=A, 1=B, 0x80=C),
ES:BX = Memory location where the data is loaded.

Later in the code we find this:

xor ah, ah
xor dl, dl
int 0x13
jc reboot

This code initializes a drive.

xor is a logical operation. Explaining it in detail is out of the scope of this document, but a register xor-ed with itself always gives 0, so xor ah, ah has the same effect as mov ah, 0. It's a common idiom, traditionally considered faster to execute.

We set ah to 0 (calling function zero of interrupt 0x13) and dl to zero (the floppy drive). If an error happens, the instruction "jc reboot" tells the processor to jump to our reboot code. jc means "jump if carry": if int 0x13 fails, the carry flag is set, hence the jump.

Later in the code we find this:

  mov ax, 0x1000
  mov es, ax
  xor bx, bx

  mov ax, 0x0201  ;We read (0x02) one (0x01) sector.
  mov ch, 0       ;Cylinder 0
  mov cl, 2       ;Sector 2 (Starts at 1, not 0)
  mov dh, 0       ;Head 0 (or face 1 if a floppy)
  mov dl, 0       ;Floppy drive
  int 0x13

  jc reboot

Here, we use function 2 of interrupt 0x13 (see a little above). The data will be loaded at the memory address 0x1000:0x0000. To do so, ES:BX has to reflect this.
We can't load a value into ES directly; the processor has no instruction for that. So we set AX to 0x1000 and then copy the value of AX into ES (mov es, ax).
We set BX to 0 and we are set.
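
A word on what 0x1000:0x0000 means: in real mode, the processor computes the actual memory address as segment * 16 + offset. The sketch below spells this out; KERNEL_SEG, KERNEL_OFF and KERNEL_PHYS are names I invented for the illustration.

```nasm
[BITS 16]
KERNEL_SEG  equ 0x1000                        ; hypothetical name: segment part
KERNEL_OFF  equ 0x0000                        ; hypothetical name: offset part
KERNEL_PHYS equ KERNEL_SEG * 16 + KERNEL_OFF  ; = 0x10000, the real address

  mov ax, KERNEL_SEG   ; a segment register cannot take an immediate value,
  mov es, ax           ; so the value goes through AX first
  mov bx, KERNEL_OFF   ; ES:BX = 0x1000:0x0000
```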

AH is 2, so we read data; AL is 1, so 1 sector; and DL is 0, so it's from the floppy.
We now call interrupt 0x13 to actually perform the operation. Our kernel should now be loaded at 0x1000:0x0000. Again, we use "jc reboot" as explained above.
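
Rebooting on the first error is a bit harsh, since floppy reads can fail just because the motor is not up to speed yet. As an optional variation (not what the article's code does), here is a sketch that retries a few times before giving up. The retry count of 3 and the labels read_retry and read_ok are my own choices, and I assume the BIOS leaves DI untouched across int 0x13.

```nasm
[BITS 16]
[ORG 0x7C00]

  mov ax, 0x1000
  mov es, ax           ; ES:BX = 0x1000:0x0000, where the sector goes
  xor bx, bx
  mov di, 3            ; hypothetical choice: up to 3 attempts

read_retry:
  mov ax, 0x0201       ; AH = 2 (read), AL = 1 sector
  mov cx, 0x0002       ; CH = cylinder 0, CL = sector 2
  xor dx, dx           ; DH = head 0, DL = drive 0 (floppy)
  int 0x13
  jnc read_ok          ; carry clear: the read worked

  xor ah, ah           ; carry set: reset the drive (function 0)...
  xor dl, dl
  int 0x13
  dec di               ; ...and try again while attempts remain
  jnz read_retry
  jmp reboot

read_ok:
  jmp $                ; placeholder: the real code jumps to the kernel here

reboot:
  db 0xea              ; far jump to 0xFFFF:0x0000
  dw 0x0000
  dw 0xFFFF
```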

Of course, we assume here that our kernel fits in a single sector, meaning 512 bytes. Should our kernel grow, the number of sectors in AL has to be changed accordingly.
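
As a sketch of what that change could look like, here is the same read with the sector count pulled out into a constant. KERNEL_SECTORS and its value of 4 are assumptions for the example.

```nasm
[BITS 16]
KERNEL_SECTORS equ 4        ; hypothetical size: a 2048-byte kernel

  mov ax, 0x1000
  mov es, ax
  xor bx, bx                ; ES:BX = 0x1000:0x0000

  mov ah, 0x02              ; function 2: read sectors
  mov al, KERNEL_SECTORS    ; how many sectors to read
  mov ch, 0                 ; cylinder 0
  mov cl, 2                 ; start at sector 2
  mov dh, 0                 ; head 0
  mov dl, 0                 ; floppy drive
  int 0x13                  ; carry is set on error, as before
```

Keep in mind that a track on a 1.44MB floppy holds 18 sectors, so a read starting at sector 2 should probably stay under 17 sectors; a bigger kernel would need several reads.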

One could be wondering... "Hey, if when you create a program it becomes a file, why not simply call the filename?". I'm afraid this ain't that simple. Remember the operating system is not loaded, we _ARE_ the operating system here. This means we have no file system and so, no file at all. The hard drive is only a long line of bits...

db 0xea
dw 0, 0x1000
call reboot   ; Should never be executed.

If we get here, this means everything went fine so far. We didn't reboot, so both the disk initialization and the read of the kernel succeeded.

db 0xea is an opcode. It's an instruction, like "mov" or "int", but in binary (0xEA is the far jump). At some point, with friends, we used to write programs in pure binary, for fun, that's how it was :). This opcode tells the processor to jump to the location defined right after it by this line: "dw 0, 0x1000", meaning "jump to 0x1000:0x0000" (offset first, then segment), where we loaded the kernel. This is normally the end of our program: the kernel now runs and has taken over. If for some reason the jump didn't go through, we reboot. This should never happen.
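
If you'd rather not write opcodes by hand, NASM can encode the same far jump from its mnemonic. The sketch below shows both forms side by side; they should assemble to the same five bytes.

```nasm
[BITS 16]
  db 0xea              ; hand-written far jump: the opcode...
  dw 0, 0x1000         ; ...then offset 0x0000 and segment 0x1000

  jmp 0x1000:0x0000    ; same instruction, written with the mnemonic
```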

Our bootsector is done... well, not exactly. If you remember, we called some "functions" earlier: "message" to display a message on the screen and "reboot" to reboot the computer.

The bootsector code "functions"

message:       ; Prints DS:SI to the screen
  lodsb        ; Puts the next DS:SI character to al
  cmp al, 0    ; We compare al to 0.
  je done      ; If equal, we jump to "done"

  mov ah, 0xe  ; We call the function 0x0E of int 0x10
  int 0x10     ; to print the character stored in al.
  jmp message  ; loop to the beginning of the function
               ; to get to the next character.

done:          ; When done, we get here.
  ret          ; and return back.

This is a simple function once you get how it works, so stay focused:
Our message is a string made of bytes and ending with a byte of value 0. If you keep this in mind, the function is easy to understand.

"message" is our entry point.
"lodsb" takes the byte at DS:SI (remember we put the memory address of bootmsg in SI?) and puts it in al. It then increments SI by one, so the next time it's called, it loads the next byte of the string.
"cmp al, 0" Is al == 0?
"je done" If al is equal to 0, we jump to "done".

"mov ah, 0xe" We'll use function 0xE of interrupt 0x10 to print the character stored in al onto the screen.
"int 0x10" Actually calls the interrupt and prints on the screen.
"jmp message" We jump back to the beginning of the function to handle the next character.

"done:" Once we reach the end of the string, we get here.
"ret" We return to the place that called the message function in the first place.

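reboot:
  db 0xea
  dw 0x0000
  dw 0xFFFF  ; And the computer reboots...

"db 0xea" is the same far jump as before, and "dw 0x0000" "dw 0xFFFF" give its destination (offset first, then segment): 0xFFFF:0x0000! This is the address where the processor starts executing after a reset, so jumping there has the effect of rebooting the machine.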

times 510-($-$$) db 0
dw 0xAA55

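Standards indicate a boot sector has to end with two bytes: 0x55 and 0xAA. A sector has a size of 512 bytes. "times 510-($-$$) db 0" fills the rest of the program with zeros up to byte 510 ($ is the current position and $$ the start of the section, so $-$$ is the number of bytes emitted so far). "dw 0xAA55" then defines the next two bytes, 511 and 512. Since x86 stores the low byte of a word first, they end up being 0x55 and 0xAA.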
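
To see the padding and the signature in isolation, here is a sketch of about the smallest boot sector I can think of: it does nothing but loop forever. The label hang is my own name, and the signature is written byte by byte to make the order on disk obvious.

```nasm
[BITS 16]
[ORG 0x7C00]

hang:
  jmp hang                 ; endless loop, 2 bytes of code

  times 510-($-$$) db 0    ; pad with zeros up to byte 510
  db 0x55, 0xAA            ; same two bytes as "dw 0xAA55"
```

Assembled with "nasm -f bin", this should give a file of exactly 512 bytes.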

Done with the code!

Now assembling this:
Save the source as boot.asm and type the following:
nasm -f bin -o boot.bin boot.asm
We now have to put it on the floppy disk:
dd if=/the/right/path/boot.bin of=/dev/fd0
You can now reboot your machine with the floppy inserted and you should see the message. Programming the kernel would be the next step, but I haven't written that article. If many people are interested, I may make the effort.

Appendix 1:

BIOS:
Basic Input Output System, the first software to run when the
hardware starts. It detects basic hardware and allows access to it.
It's the program that will call our bootsector.

Boot:
Action of the BIOS that starts an operating system stored on a drive.

Assembler:
Software transforming source code into binary code understandable by the
processor.

Kernel:
Heart of the operating system. It controls the memory, video, drives,
etc.

Real mode:
Default mode of an x86 processor. The one we're in when leaving the
BIOS. The reachable memory is then only 640KB. With this mode, you
can access BIOS functionalities.

Protected mode:
With this mode, we can access all the computer memory (4GB max on
32-bit systems). You can't access BIOS functionalities but can have
programs running with 32-bit instructions.

Byte:
Group of eight bits.

Operating system:
Software allowing the use of the hardware resources of a computer. Also
called O.S. Examples: Linux, Windows, FreeBSD...

Sector:
A hard drive is divided into parts called sectors, each of 512 bytes.

Boot sector:
First sector of the hard drive, loaded and executed by the BIOS in order to
boot the O.S.

Appendix 2:

[BITS 16]     ;Set code generation to 16 bit mode
[ORG 0x7C00]  ;Set code start address to 7C00h

[SEGMENT .text]

jmp start

bootmsg db 'It works !!', 10, 13, 0

start:
  xor ax, ax      ;Added for safety: DS is not guaranteed to be 0
  mov ds, ax      ;on entry, and lodsb reads from DS:SI

  mov si, bootmsg
  call message

;Initialize the floppy drive

  xor ah, ah
  xor dl, dl
  int 0x13
  jc reboot

;Read the sector that holds
;the kernel

  mov ax, 0x1000
  mov es, ax
  xor bx, bx

  mov ax, 0x0201  ;Read one sector
  mov ch, 0       ;Cylinder 0
  mov cl, 2       ;Sector 2 (starts at 1, not 0)
  mov dh, 0       ;Head 0 (or face 1 if a floppy)
  mov dl, 0       ;Floppy drive
  int 0x13

  jc reboot

;Run the kernel

  db 0xea
  dw 0, 0x1000

  call reboot     ;Should never be executed

;Message handling

message:       ;Prints ds:si to the screen
  lodsb        ;Puts the next character of ds:si in al
  cmp al, 0    ;Tests if al == 0, if so we are done
  je done
  mov ah, 0xe  ;Calls the display function
  int 0x10
  jmp message

done:
  ret

reboot:
  db 0xea
  dw 0x0000
  dw 0xFFFF  ;And the system reboots....

;At the end of the sector...

  times 510-($-$$) db 0
  dw 0xAA55