Bootsector in assembly
This article explains how to write a very minimal OS... well, can I even call it that? It's only a bootsector. It can also serve as a brief and simple overview of what assembly looks like, since the only code you will find here is x86 assembly. I have always found this interesting and I now wish to share it. A glossary is at the end of the article (Appendix 1); please refer to it for anything you don't get, and mail me if you don't understand something so I can improve the article. Let's go!
Be aware that we develop in real mode: we can only address 1 MB of memory (of which about 640 KB is ordinary RAM) and we use 16-bit instructions, but this gives us access to all the BIOS functionality. Using this mode avoids some huge work (like memory management) and it's really enough for us.
What do we need?
We need some programming tools:
For our assembly code, we'll use NASM. It takes our assembly source and turns it into an executable binary. It is available for many platforms, including *nix and Windows, and can be downloaded from the NASM project website.
We'll also copy our bootsector onto the first sector of a disk (say a floppy or a virtual drive; I'll take a floppy for this example), which requires a tool: you can use "dd" if you're on *nix or partcopy if you're on Windows.
The assembly language
Creating an operating system requires you to know assembly. Let's see some basics. First thing to be aware of: in assembly, most things are written in hexadecimal, so don't be surprised to see numbers written this way: 0x0100. This means "100 in hexadecimal" (base 16), which equals 256 in decimal (base 10).
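To make the notation concrete, here is a small sketch of how the same constant can be written in NASM. It only illustrates number formats; the choice of AX as the destination is arbitrary.

```nasm
[BITS 16]
; Three ways of writing the same constant; each line assembles to the same bytes.
    mov ax, 0x0100      ; hexadecimal with the 0x prefix (the style used in this article)
    mov ax, 0100h       ; hexadecimal with an h suffix (must start with a digit)
    mov ax, 256         ; plain decimal
; 0x0100 = 1*16^2 + 0*16 + 0 = 256
```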
In assembly, we use interrupts, quite like those hardware interrupts you may have already heard of in an IRQ conflict :P. Interrupts give us access to functions (like in C) provided by the BIOS or the operating system. In this article we'll use 2 BIOS interrupts, namely 0x10 and 0x13.
0x10 will be used to write messages on the screen, say, "Welcome to our operating system".
0x13 will be used to read other sectors of the disk, where our future kernel will live (and unless I someday decide to write a follow-up article on the topic, that kernel will never exist).
In assembly, we use "registers", which are in fact "containers" holding the values we need to manipulate, quite like a variable in C. The main ones we'll use here are named AX, BX, CX and DX, and they are all 16 bits wide.
AX, for example, can be divided in two parts: AH (the high half) and AL (the low half), each 8 bits wide. You can use AH or AL alone, or AX to manipulate all 16 bits at once. If you put 0x1234 in AX, you'll have 0x12 in AH and 0x34 in AL.
Let's now see some instructions. "mov" copies a value. For example, mov ax, 0x1234 means "put 0x1234 in the AX register". The instruction "int" calls an interrupt; for example, "int 0x13" calls interrupt 0x13, which reads the registers as if they were function arguments and acts accordingly.
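As a quick illustration of how AX relates to its two halves, here is a short sketch using only mov. The values are arbitrary and the comments show what each register holds after the instruction.

```nasm
[BITS 16]
    mov ax, 0x1234      ; AX = 0x1234, so AH = 0x12 and AL = 0x34
    mov al, 0xFF        ; only the low byte changes: AX = 0x12FF
    mov ah, 0x00        ; only the high byte changes: AX = 0x00FF
    mov bx, ax          ; copy between registers: BX = 0x00FF, AX is unchanged
```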
The bootsector code
You can have a look at the entire code (Appendix 2) at the end of the article. I'll detail every part of it here and explain what it does.
    [BITS 16]
Means we work in 16 bits (we are in real mode, right where the BIOS leaves us).
    [ORG 0x7C00]
Defines the place where our code will be loaded in memory by the BIOS. The first instruction there is then executed; that's where we take over.
    [SEGMENT .text]
Simply tells the assembler that everything after this is code, not data.
    jmp start
Tells the processor to jump to the label start. This skips over the message data, which sits between the jump and the label and must not be executed as if it were code. The data is this line:
    bootmsg db 'It works !!', 10, 13, 0
which means we create a string "'It works !!', 10, 13, 0" identified by "bootmsg" so we can reference it later on. bootmsg is actually only a human-readable way to express the memory location of the beginning of the message. db means "define bytes". Each of those characters is a byte; we just tell the assembler we define them here. Just so you know, "10, 13, 0" are ASCII codes: 10 (line feed) moves to the next line, 13 (carriage return) returns to the beginning of the line, and 0 marks the end of the string.
    start:
    mov si, bootmsg
    call message
means we store the message address in SI, another register, used to point to a memory address, and we call our function "message", which will print it to the screen. This function is defined later in the code. After that function is done, we get back to the place we called it from.
It is very important to keep in mind that an interrupt is in fact a function we call with arguments. Those arguments are usually stored in registers. In this article, I use 2 BIOS interrupts, described here:

Interrupt 0x10, function 0x0E: prints a character to the screen
    AH = 0x0E
    AL = ASCII code of the character
    BH = page number to write to
    BL = colour
Page number to write to... sounds cryptic, right? If you want to write to the screen, use 0 here (BH = 0).

Interrupt 0x13, function 0: initializes a drive
    AH = 0
    DL = drive (0 = A, 1 = B, 0x80 = C)

Interrupt 0x13, function 2: reads one or more sectors from a disk
    AH = 2
    AL = number of sectors to read
    CH = track/cylinder
    CL = number of the first sector to be read
    DH = head number
    DL = drive (0 = A, 1 = B, 0x80 = C)
    ES:BX = memory address where the data is loaded
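To see "an interrupt is a function with arguments in registers" in action, here is a minimal sketch of a boot sector that prints a single character and then stops. The label hang is a name I made up for the illustration; the padding and signature at the end are explained later in the article.

```nasm
[BITS 16]
[ORG 0x7C00]
    mov ah, 0x0E        ; function 0x0E of int 0x10: print one character
    mov al, 'A'         ; the character to print
    xor bh, bh          ; page 0
    int 0x10            ; the BIOS reads AH, AL and BH, then prints 'A'
hang:
    jmp hang            ; nothing else to do: loop forever

times 510-($-$$) db 0   ; pad with zeros up to byte 510
dw 0xAA55               ; boot signature
```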
Later in the code we find this:
    xor ah, ah
    xor dl, dl
    int 0x13
    jc reboot
This code initializes a drive.
xor is a logical operation. Explaining it in detail is out of the scope of this document, but XORing a register with itself always gives zero, so xor ah, ah has the same effect as mov ah, 0. It's a traditional idiom for clearing a register, at least as short and fast as the mov.
We set ah to 0 (calling function zero of interrupt 0x13) and dl to zero (floppy drive). In case an error happens, the instruction "jc reboot" tells the processor to jump to our reboot code. jc means "jump if carry": if int 0x13 fails, the carry flag is set, hence the jump.
Later in the code we find this:
    mov ax, 0x1000
    mov es, ax
    xor bx, bx
    mov ax, 0x0201  ;We read (0x02) one (0x01) sector.
    mov ch, 0       ;Cylinder 0
    mov cl, 2       ;Sector 2 (Starts at 1, not 0)
    mov dh, 0       ;Head 0 (the first side of a floppy)
    mov dl, 0       ;Floppy drive
    int 0x13
    jc reboot
Here, we use function 2 of interrupt 0x13 (see a little above). The data will be loaded at 0x1000:0x0000, which is a memory address written as segment:offset. To do so, ES:BX has to reflect this.
We can't load a value into ES directly; the processor has no instruction for that. So we set AX to 0x1000 and copy the value of AX into ES (mov es, ax).
We set bx to 0 and we are set.
AH is 2, so we read data; AL is 1, so one sector; DL is 0, so it's from the floppy.
We now call interrupt 0x13 to actually perform the operation. Now, our kernel should be loaded at 0x1000:0x0000. Again, we use "jc reboot", explained above.
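In real mode, an address written segment:offset maps to the physical address segment * 16 + offset. The sketch below spells this out for 0x1000:0x0000 with a few constants; the names KERNEL_SEG, KERNEL_OFF and KERNEL_PHYS are made up for the illustration and do not appear in the article's code.

```nasm
; Hypothetical constant names, introduced only for this illustration.
KERNEL_SEG  equ 0x1000                          ; segment part, goes in ES
KERNEL_OFF  equ 0x0000                          ; offset part, goes in BX
KERNEL_PHYS equ KERNEL_SEG * 16 + KERNEL_OFF    ; = 0x10000, the physical address

[BITS 16]
    mov ax, KERNEL_SEG
    mov es, ax          ; segment registers are loaded through a general register
    mov bx, KERNEL_OFF  ; ES:BX = 0x1000:0x0000
```

So the kernel ends up 64 KB (0x10000 bytes) from the start of memory, well clear of the bootsector at 0x7C00.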
Of course, we assume here that our kernel fits in a single sector, meaning 512 bytes. Should our kernel grow, the sector count in AL must be increased.
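As a rough sketch of what "increasing the sector count" could look like, the read below takes its count from a constant. KERNEL_SECTORS and its value of 4 are assumptions made for the example. As far as I know, a 1.44 MB floppy has 18 sectors per track and a BIOS is not guaranteed to read across a track boundary in one call, so starting at sector 2 it is safest to stay at 17 sectors or fewer.

```nasm
KERNEL_SECTORS equ 4        ; hypothetical: a kernel of 4 * 512 = 2048 bytes

[BITS 16]
    mov ax, 0x1000
    mov es, ax
    xor bx, bx              ; ES:BX = 0x1000:0x0000
    mov ah, 0x02            ; function 2: read sectors
    mov al, KERNEL_SECTORS  ; how many sectors to read
    mov ch, 0               ; cylinder 0
    mov cl, 2               ; first sector to read (numbering starts at 1)
    mov dh, 0               ; head 0
    mov dl, 0               ; floppy drive
    int 0x13
    jc reboot               ; carry set means the read failed

reboot:
    jmp 0xFFFF:0x0000       ; same far jump as the article's reboot code
```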
One could be wondering... "Hey, if when you create a program it becomes a file, why not simply call the filename?". I'm afraid this ain't that simple. Remember the operating system is not loaded, we _ARE_ the operating system here. This means we have no file system and so, no file at all. The hard drive is only a long line of bits...
If we get here, this means everything so far went fine. We didn't reboot. This means the disk initialization went fine and the read operation of the kernel went fine.
    db 0xea
    dw 0, 0x1000
    call reboot     ; Should never be executed.
db 0xea is an opcode: an instruction, like "mov" or "int", but written directly as its binary value. At some point, with friends, we used to write programs in pure binary, for fun, that's how it was :). This opcode tells the processor to make a far jump to the location given right after it by "dw 0, 0x1000", meaning "jump to 0x1000:0x0000" (the offset comes first, then the segment), where we loaded the kernel. This is normally the end of our program: the kernel now runs and has taken over. If for some reason the jump didn't go through, we reboot. This should never happen.
Our bootsector is done... well, not exactly. If you remember, we called some "functions" earlier: "message" to display a message on the screen and "reboot" to reboot the computer.
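If you would rather not write the opcode by hand, NASM can produce the same bytes from an ordinary instruction. This is only an equivalent spelling, not a change in behaviour.

```nasm
[BITS 16]
    jmp 0x1000:0x0000   ; far jump: assembles to EA 00 00 00 10,
                        ; the same five bytes as db 0xea / dw 0, 0x1000
```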
The bootsector code "functions"
This is a simple function once you get how it works, keep focused:
message:            ; Prints DS:SI to the screen
    lodsb           ; Puts the next DS:SI character in al
    cmp al, 0       ; We compare al to 0.
    je done         ; If equal, we jump to "done"
    mov ah, 0xe     ; We call the function 0x0E of int 0x10
    int 0x10        ; to print the character stored in al.
    jmp message     ; Loop to the beginning of the function
                    ; to get to the next character.
done:               ; When done, we get here.
    ret             ; and return back.
Our message is a string made of bytes and ending with a byte of value 0. If you keep this in mind, it will be easy to understand this function.
"message" is our entry point.
"lodsb" takes the byte at SI (remember we put the memory address of our bootmsg in SI?) and puts it in al. It then increments SI by one, so the next time it's called it loads the next byte of the string.
"cmp al, 0" Is al == 0?
"je done" If al was equal to 0, we jump to "done".
"mov ah, 0xe" We'll use function 0xE of interrupt 0x10 to print the character stored in al onto the screen.
"int 0x10" Actually calls the interrupt and prints on the screen.
"jmp message" We loop back to the beginning of the function to handle the next character.
"done:" Once we reach the end of the string, we get here.
"ret" We return to the place that called the message function in the first place.
The "reboot" function is even shorter. "db 0xea" is the far jump opcode we have already seen; "dw 0x0000" is the offset and "dw 0xFFFF" the segment, so we jump to 0xFFFF:0x0000. That is the address where the processor starts executing after a reset, so this has the effect of rebooting the machine.
reboot:
    db 0xea
    dw 0x0000
    dw 0xFFFF       ; And the computer reboots...
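One caveat about "message": lodsb reads from DS:SI and moves SI forward only if the direction flag is clear, and function 0x0E also looks at BH. The article's code relies on the BIOS having left DS at 0, the direction flag cleared and BH at 0, which I believe is common but not guaranteed. Here is a more defensive sketch, under that assumption, which sets them explicitly; the hang label is made up for the example.

```nasm
[BITS 16]
[ORG 0x7C00]
    jmp start
bootmsg db 'It works !!', 13, 10, 0
start:
    xor ax, ax
    mov ds, ax          ; DS = 0, so DS:SI matches the addresses computed from ORG 0x7C00
    cld                 ; clear the direction flag: lodsb increments SI
    mov si, bootmsg
    call message
hang:
    jmp hang            ; stop here; this sketch loads no kernel

message:
    lodsb               ; AL = byte at DS:SI, then SI = SI + 1
    cmp al, 0
    je done
    mov ah, 0x0E
    xor bh, bh          ; page 0, set explicitly instead of trusting the BIOS
    int 0x10
    jmp message
done:
    ret

times 510-($-$$) db 0
dw 0xAA55
```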
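By convention, a boot sector has to end with the two bytes 0x55 and 0xAA. A sector has a size of 512 bytes. "times 510-($-$$) db 0" fills the rest of the program with zeros up to byte 510 ($ is the current position and $$ the start of the section, so $-$$ is the number of bytes emitted so far). "dw 0xAA55" then defines the last two bytes, 511 and 512; because x86 stores the low byte of a word first, they end up as 0x55 followed by 0xAA.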
    times 510-($-$$) db 0
    dw 0xAA55
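To make the padding arithmetic and the byte order visible, here is a tiny sketch of a sector whose only code is a two-byte infinite loop. The hang label is made up for the example, and the signature is written byte by byte instead of as a word.

```nasm
[BITS 16]
[ORG 0x7C00]
hang:
    jmp hang                ; a short jump to itself: 2 bytes of code (EB FE)

times 510-($-$$) db 0       ; $-$$ is 2 here, so this emits 510 - 2 = 508 zero bytes
db 0x55, 0xAA               ; same two bytes as dw 0xAA55: 0x55 at offset 510, 0xAA at offset 511
```

Assembled with -f bin, this should give a file of exactly 512 bytes.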
Done with the code!
Now let's assemble this:
Save the source as boot.asm and type the following:
    nasm -f bin -o boot.bin boot.asm
We now have to put it on the floppy disk.
    dd if=/the/right/path/boot.bin of=/dev/fd0
You can now reboot your machine with the floppy inserted and you should see the message. Programming the kernel would be the next step, but I haven't written that article. If many people are interested, I may make the effort.
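In the meantime, if you want to check that the jump to 0x1000:0x0000 really lands somewhere, something has to sit in sector 2. The following is a hypothetical placeholder, not the kernel this article talks about: it prints a message and halts. The file name kernel.asm, the labels and the message text are all made up for this sketch.

```nasm
; kernel.asm - hypothetical placeholder "kernel", only here to test the bootsector
[BITS 16]
[ORG 0x0000]            ; the bootsector jumps to 0x1000:0x0000, so offsets start at 0

kernel_start:
    mov ax, cs          ; CS is 0x1000 after the far jump
    mov ds, ax          ; make DS match so lodsb reads our own data
    cld                 ; lodsb must walk forward through the string
    mov si, kernelmsg

.print:
    lodsb               ; AL = next byte at DS:SI, SI = SI + 1
    cmp al, 0
    je .halt
    mov ah, 0x0E        ; BIOS function: print the character in AL
    xor bh, bh          ; page 0
    int 0x10
    jmp .print

.halt:
    hlt                 ; sleep until the next interrupt...
    jmp .halt           ; ...then go back to sleep

kernelmsg db 'Kernel loaded', 13, 10, 0

times 512-($-$$) db 0   ; pad to exactly one sector
```

Assemble it the same way as the bootsector (nasm -f bin -o kernel.bin kernel.asm) and write it to the second sector of the floppy, for example with dd if=kernel.bin of=/dev/fd0 bs=512 seek=1.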
Appendix 1: Glossary
BIOS: Basic Input Output System, the first software to run, right after the hardware starts. It detects basic hardware and allows access to it. It's the program that will call our bootsector.
Boot: Action of the BIOS that starts an operating system stored on a drive.
Compiler: Software transforming source code into binary code understandable by the computer.
Kernel: Heart of the operating system. It controls the memory, video, drives, programs...
Real mode: Default mode of an x86 processor. The one we're in when leaving the BIOS. The reachable memory is then only 1 MB, of which about 640 KB is ordinary RAM. In this mode, you can access BIOS functionality.
Protected mode: In this mode, we can access all the computer memory (4 GB max on 32-bit systems). You can't access BIOS functionality, but programs can run with 32-bit instructions.
Byte: Group of eight bits.
Operating system: Software that makes the hardware resources of a computer usable. Also called O.S. Examples: Linux, Windows, FreeBSD...
Sector: A hard drive is divided into parts called sectors, each of 512 bytes.
Boot sector: First sector of the drive, loaded and executed by the BIOS in order to boot the O.S.
Appendix 2: The complete code
[BITS 16]               ;Set code generation to 16 bit mode
[ORG 0x7C00]            ;Set code start address to 7C00h

[SEGMENT .text]

    jmp start

bootmsg db 'It works !!', 10, 13, 0

start:
;----------------
    mov si, bootmsg
    call message
;----------------
    xor ax, ax          ;AH = 0: initialize the drive
    xor dl, dl          ;DL = 0: floppy drive
    int 0x13
    jc reboot
;---------------------------------------
;Read the sector on which the kernel
;is stored
    mov ax, 0x1000
    mov es, ax
    xor bx, bx
    mov ax, 0x0201      ;Read one sector
    mov ch, 0           ;Cylinder 0
    mov cl, 2           ;Sector 2 (starts at 1, not 0)
    mov dh, 0           ;Head 0 (the first side of a floppy)
    mov dl, 0           ;Floppy drive
    int 0x13
    jc reboot
;-------------------
;Run the kernel
    db 0xea
    dw 0, 0x1000
    call reboot
;-------------------
;Message handling
message:                ;Prints ds:si to the screen
    lodsb               ;Puts one character of ds:si in al
    cmp al, 0           ;Tests whether al == 0; if so, we are done
    je done
    mov ah, 0xe         ;Calls the display function
    int 0x10
    jmp message
done:
    ret
;------
;REBOOT
reboot:
    db 0xea
    dw 0x0000
    dw 0xFFFF           ;And the system reboots...
;-------------------
;At the end of the sector...
    times 510-($-$$) db 0
    dw 0xAA55