
Programming in x86 Assembly on OSX

written on July 22, 2019

categories: engineering

tags: assembly, x86, macOS

Recently I've decided to learn x86 Assembly.

I know, it's not the most beautiful language to program in. I've already written an entire terminal-based game in MIPS assembly, and while it caused me a lot of headaches, it also taught me many important things about programming and optimizing code. By looking at how your code gets translated into actual machine code, you can better understand how to optimize it.

Besides, I'm studying embedded systems engineering; coding in assembly is a must-have skill, not because it's something we use every day, but because it can help us debug weird errors and make code run faster, especially on microcontrollers.

The Makefile

So here we go. First of all, I'm on a machine running OSX, so I'll need something that can assemble x86 Assembly code for my machine in particular. After some looking around, I found out that I need to use NASM to assemble the code, and then ld to link it. Let's set up a Makefile!

Makefile

TARGET = hello
ENTRY ?= _start

NASM = nasm
LD = ld
NFLAGS = -f macho -O0
LFLAGS = -e $(ENTRY) -lSystem -macosx_version_min 10.8


all: $(TARGET)

%: %.o
    $(LD) $(LFLAGS) $< -o $@

%.o: %.asm
    $(NASM) $(NFLAGS) $<

clean:
    rm -rf *.o $(TARGET)

.PHONY: all clean

Let's break this down: the first two lines set up TARGET, which is both the name of the source file (without its .asm extension) and the name of the output file, and ENTRY, which is the name of the program's entry point; in a normal C program that would be main, but here we can call it whatever we want. The ?= operator means that if this variable is not set by the user when make is called (for example with make ENTRY=_main), it will default to _start.

Lines 4-7 set up our assembler and linker; as I said before, I'm using nasm to assemble my sources, along with a couple of arguments. We need to tell it that the output format is macho, the 32-bit Mach-O object format (more on that here), and that we don't want it to optimize the output code (hence the -O0 flag). This last point is important, as it will allow us to keep more or less the same code before and after assembly, and will make debugging easier. As far as ld is concerned, first of all we need to give it our entry point (it will look for _main by default). I also add in a library called System which is, apparently, necessary (ld throws an error if I don't include it), and I give it a minimum OSX version to silence ld's warnings about that.

The rest of the Makefile is pretty straightforward. All it does is basically assemble and link the source in two steps, as follows:

$ nasm -f macho -O0 hello.asm
$ ld -e _start -lSystem -macosx_version_min 10.8 hello.o -o hello

Let's look at the source code!

The source code

By doing some more looking around (visit the sources at the end of this post to learn more about where I found all this info), I found out how to perform a basic syscall that prints a string to the screen. If you don't know what a syscall is, here's the 30-second version: whenever you're trying to do something that involves resources outside your program that the operating system manages (like printing to stdout, aka the terminal in this case), you need to call the kernel with a certain opcode (operation code, also known as the system call number) while also having the right values in the right registers. Registers are basically little memory banks that sit literally inside the CPU; their size is what makes your computer 32 or 64-bit.

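In the case of the write syscall, whose opcode is 0x4, the following registers need to contain the following values:

EAX: the syscall opcode (0x4 for sys_write)
EBX: the file descriptor to write to (0x1 for stdout)
ECX: a pointer to the string to write
EDX: the length of the string in bytes

These registers are special in the sense that they hold the values that get used during a syscall.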

Before I show you the assembly source code, there's another little thing we need to talk about (if you're not familiar with assembly code): the .data and .text sections.

Every assembly source is typically split into two sections: .data, that contains the actual variables in the code (if you do something like int i = 0; in C, it basically ends up in this section — but only if it's a global variable or a static string), and .text, which contains the actual program (that manipulates whatever's in the .data section). Now let's look at the code!

First off, we need to declare two variables, one for the string we want to print and another that will store the string's length:

hello.asm

section .data ; Beginning of our data section

msg db "Hello, World!", 0xa ; String followed by a newline (line feed)
len equ $ - msg ; String length in bytes

As you can see, the first line of this snippet declares the start of the .data section. We then declare a bytes variable called msg (hence db, for declare bytes), and add a newline character (0xa) at the end.

We then get the length of the msg string in bytes in a pretty clever way: the $ token evaluates to the current position in the assembled output, that is, the address of the start of the line it appears on. Because the len line comes right after msg, that position is the first byte past the end of the string, so by subtracting msg (the address of the start of the string) from $ we get the actual length of the string.
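To make the $ trick more concrete, here is a small sketch in the same NASM syntax. The second string (bye) and the names bye_len and total are hypothetical, added only for illustration; the byte counts in the comments come from counting the characters by hand.

```nasm
section .data

msg     db "Hello, World!", 0xa   ; 13 characters + 1 newline byte = 14 bytes
len     equ $ - msg               ; $ = first byte after msg, so len = 14

bye     db "Bye!", 0xa            ; hypothetical second string: 4 + 1 = 5 bytes
bye_len equ $ - bye               ; measured right after bye, so bye_len = 5

total   equ $ - msg               ; measured from msg down to here: 14 + 5 = 19
```

This is why each length is defined on the line right after its string: moving the len line below bye would silently turn it into 19, since equ lines emit no bytes and $ only tracks how far the data has advanced.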

Let's now set up the actual program code:

hello.asm

section .text ; Start of the code indicator
global _start ; Make the main function externally visible

_syscall: ; Declaring a kernel call function
    int 0x80 ; Make the system call
    ret ; Return to the caller

_start: ; Entry point for linker
    ; Write our string to standard output
    mov edx, len ; Move the message's length into EDX
    mov ecx, msg ; Move the message's pointer into ECX
    mov ebx, 0x1 ; Move the file descriptor (0x1 for stdout) into EBX
    mov eax, 0x4 ; Move the syscall opcode (0x4 for sys_write) into EAX
    call _syscall ; Call the kernel

    mov ebx, 0x0 ; Move the exit code (0 for successful exit) into EBX
    mov eax, 0x1 ; Move the syscall opcode (0x1 for sys_exit) into EAX
    call _syscall ; Call the kernel

As before, we start by declaring the section, then we make our entry point (_start) global, which means that it will be externally accessible — we need this in order to be able to execute our program.

Right after that, I'm setting up a _syscall label; this behaves kind of like a small function, in the sense that every time I call it, it will perform a syscall (that's what int 0x80 does) and then return to the normal flow of the program. It basically lets me replace int 0x80 with call _syscall, which makes a bit more sense to the programmer reading this code and requires a bit less brainwork to figure out what it does.

We then head right into _start which is, as we said, the entry point of our program (much like a main() function). We set up the registers according to the syscall's specifications, and we call the kernel.

The last bit of code makes sure the program exits properly, by making a syscall with the sys_exit opcode. The return value of the program is the value stored in EBX (that's why we put 0x0 in there before making the syscall, as 0 means that the program was executed successfully). Let's assemble and link it, and see if it works!

Something's wrong

$ make
nasm -f macho -O0 hello.asm
ld -e _start -lSystem -macosx_version_min 10.8 hello.o -o hello
$ ./hello
$

Hmm, that's weird... nothing's getting printed to the screen. Let's check the value that the program returned:

$ echo $?
81

That's even weirder; it was supposed to return 0. Let's examine the program with lldb, which is the equivalent of gdb for OSX:

$ lldb ./hello
(lldb) target create "./hello"
Current executable set to './hello' (i386).
(lldb) b start
Breakpoint 1: 2 locations.
(lldb) r
Process 23319 launched: '/Users/kokkonisd/code/x86nasm/hello' (i386)
Process 23319 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x00001fc9 hello`start
hello`start:
->  0x1fc9 <+0>:  movl   $0xe, %edx
    0x1fce <+5>:  movl   $0x2000, %ecx             ; imm = 0x2000
    0x1fd3 <+10>: movl   $0x1, %ebx
    0x1fd8 <+15>: movl   $0x4, %eax
Target 0: (hello) stopped.

This might look weird, but all I've done is set a breakpoint in start, which will make the debugger stop the program when it gets there; that will allow us to take a good look at the register values to see why we're not printing anything to the screen.

As you can see, there's a small arrow (->) pointing to the line that's going to be run next; here, it's the initialization of the EDX register. It's about to be set to 0xe, which is good, because 0xe in decimal is 14 (which is precisely the length of our string in bytes). Let's advance a little bit by hitting n, and stop right before the syscall:

(lldb) n

...

(lldb) n
Process 23319 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = instruction step over
    frame #0: 0x00001fdd hello`start + 20
hello`start:
->  0x1fdd <+20>: calll  0x1fc6                    ; syscall
    0x1fe2 <+25>: movl   $0x0, %ebx
    0x1fe7 <+30>: movl   $0x1, %eax
    0x1fec <+35>: calll  0x1fc6                    ; syscall
Target 0: (hello) stopped.

We've now stopped right before the first syscall, which means that all the registers should contain the right values. Let's print them:

(lldb) p $edx
(unsigned int) $0 = 14
(lldb) p (char *) $ecx
(char *) $1 = 0x00002000 "Hello, World!\n"
(lldb) p $ebx
(unsigned int) $2 = 1
(lldb) p $eax
(unsigned int) $3 = 4

As you can see, everything seems to be normal. All the registers are initialized with the correct values, so... why aren't we seeing anything in stdout?

Why it doesn't work

As it turns out, 32-bit system calls on MacOS follow the BSD convention and do not take their arguments from registers; only the system call number is passed in EAX. This means that our other register values get completely ignored by the kernel, and thus it doesn't print anything; this is also the reason why the return value has nothing to do with what we put in EBX. What I'd found out about how the sys_write syscall works was right, but only on Linux. So, how can we make it work on OSX?

The solution

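On OSX, the kernel reads the syscall arguments from the stack. It also expects one extra 4-byte slot on top of them, which here is the return address that call _syscall pushes. So let's push our arguments onto the stack: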

hello.asm

section .data ; Beginning of our data section

msg db "Hello, World!", 0xa ; String followed by a newline (line feed)
len equ $ - msg ; String length in bytes


section .text ; Start of the code indicator
global _start ; Make the main function externally visible

_syscall: ; Declaring a kernel call function
    int 0x80 ; Make the system call
    ret ; Return to the caller

_start: ; Entry point for linker
    ; Write our string to standard output

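    ; We need to push the syscall arguments onto the stack
    ; Once all three are pushed, the stack is laid out as follows:
    ; file descriptor (EBX on Linux) - on top
    ; message pointer (ECX on Linux) - right under the file descriptor
    ; message length (EDX on Linux) - right under the message pointer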

    push dword len ; Push message length onto stack (for EDX)
    push dword msg ; Push message to write onto stack (for ECX)
    push dword 1 ; Push file descriptor value (STDOUT) onto stack (for EBX)
    mov eax, 4 ; System call number (sys_write)
    call _syscall ; Call the kernel

    add esp, 12 ; Clear the stack

    push dword 0 ; Exit code
    mov eax, 0x1 ; System call number (sys_exit)
    call _syscall ; Make the system call

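I've only really changed lines 23-25, 29 and 31 (the three push instructions for sys_write, the add esp, 12 that cleans up after them, and the push of the exit code); the rest is the same code as before, plus some comments to better explain how it works. Let's test it out: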

$ make
nasm -f macho -O0 hello.asm
ld -e _start -lSystem -macosx_version_min 10.8 hello.o -o hello
$ ./hello
Hello, World!
$ echo $?
0

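As we can see, by pushing double words (32 bits, or 4 bytes each) to the stack, we now have a functioning program that performs both syscalls as intended. The three values pushed for sys_write take up 3 × 4 = 12 bytes, which is why add esp, 12 is enough to clear them off the stack afterwards.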
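As a sketch of what the kernel sees, the variant below makes the same two syscalls without the _syscall helper. It assumes the BSD-style 32-bit convention, where the kernel skips one 4-byte slot on top of the stack (normally the return address pushed by call) before reading the arguments; the sub esp, 4 lines stand in for that slot. It is meant to be built with the same Makefile, and is an illustration of the stack layout rather than a replacement for the listing above.

```nasm
section .data

msg db "Hello, World!", 0xa ; String followed by a newline
len equ $ - msg             ; String length in bytes

section .text
global _start

_start:
    push dword len          ; Third argument: number of bytes to write
    push dword msg          ; Second argument: pointer to the string
    push dword 1            ; First argument: file descriptor (stdout)
    sub esp, 4              ; Dummy slot where call would have left a return address
    mov eax, 4              ; System call number (sys_write)

    ; Stack right now, from the top:
    ;   [esp]      dummy slot (skipped by the kernel)
    ;   [esp + 4]  1   (file descriptor)
    ;   [esp + 8]  msg (pointer)
    ;   [esp + 12] len (length)
    int 0x80                ; Make the system call

    add esp, 16             ; 3 arguments * 4 bytes + 4 bytes for the dummy slot

    push dword 0            ; Exit code
    sub esp, 4              ; Dummy slot again
    mov eax, 1              ; System call number (sys_exit)
    int 0x80                ; Never returns, so no cleanup is needed
```

Seen this way, the _syscall helper in the original listing is doing two jobs: it gives int 0x80 a readable name, and its call instruction supplies the extra slot the kernel expects.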

Conclusion

The point of all this is pretty simple: when coding in Assembly, you're by definition getting closer to the hardware of the machine, so the code becomes more and more processor-dependent. Furthermore, the code also depends on the OS; as the nice people who commented on my Stack Overflow question pointed out, you can't even be sure that a syscall will have the same opcode on Linux and OSX. Even though they're both UNIX-based, their kernels differ and will thus need different code at some point.
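One way to keep both conventions in a single file is NASM's conditional assembly. The sketch below simply merges the two listings from this post. The LINUX macro name is made up for this example and would be defined on the command line (for instance nasm -f elf32 -DLINUX hello.asm), while the default branch is the OSX version. The Linux branch is assumed to be linked as a 32-bit ELF binary, which this post does not cover.

```nasm
section .data

msg db "Hello, World!", 0xa ; String followed by a newline
len equ $ - msg             ; String length in bytes

section .text
global _start

_syscall:                   ; Kernel call helper, same as in the listings above
    int 0x80
    ret

_start:
%ifdef LINUX                ; Hypothetical macro, set with -DLINUX
    ; Linux: arguments are passed in registers
    mov edx, len            ; Message length
    mov ecx, msg            ; Message pointer
    mov ebx, 1              ; File descriptor (stdout)
    mov eax, 4              ; System call number (sys_write)
    call _syscall

    mov ebx, 0              ; Exit code
    mov eax, 1              ; System call number (sys_exit)
    call _syscall
%else
    ; OSX: arguments are passed on the stack
    push dword len          ; Message length
    push dword msg          ; Message pointer
    push dword 1            ; File descriptor (stdout)
    mov eax, 4              ; System call number (sys_write)
    call _syscall
    add esp, 12             ; Clear the three arguments off the stack

    push dword 0            ; Exit code
    mov eax, 1              ; System call number (sys_exit)
    call _syscall
%endif
```

In this particular case the two syscall numbers happen to match on both systems, so only the argument passing differs; that is not something to rely on for other syscalls.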

Sources

Here are some sources that helped me put all of this together:

