Assembly language, the “lowest-level” programming language on any computer, has a similar reputation: difficult, mysterious, and beyond understanding.
addi a1, a0, 10
is a simple line of assembly: it describes a single instruction in text form. Assembly language is “just” a textual representation of the program’s machine code.
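To make "textual representation of machine code" concrete, here is a minimal sketch in C that packs the fields of the line above into the 32-bit word a RISC-V assembler would emit for it, assuming the standard RV32I I-type layout. The helper name encode_addi and the REG_ constants are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Register numbers behind the ABI names a0 and a1 (x10 and x11). */
enum { REG_A0 = 10, REG_A1 = 11 };

/*
 * Encode `addi rd, rs1, imm` as an RV32I I-type instruction word:
 *   imm[11:0] | rs1 | funct3 | rd | opcode
 * encode_addi is a hypothetical helper, not part of any real toolchain.
 */
static uint32_t encode_addi(unsigned rd, unsigned rs1, int32_t imm) {
    assert(rd < 32 && rs1 < 32);
    assert(imm >= -2048 && imm <= 2047);  /* 12-bit signed immediate */
    const uint32_t opcode = 0x13;         /* OP-IMM */
    const uint32_t funct3 = 0x0;          /* ADDI */
    return (((uint32_t)imm & 0xFFFu) << 20)
         | ((uint32_t)rs1 << 15)
         | (funct3 << 12)
         | ((uint32_t)rd << 7)
         | opcode;
}
```

Calling encode_addi(REG_A1, REG_A0, 10) yields 0x00A50593, which is the same instruction as the text form, just written as bits.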
On Architectures
There isn’t “an” assembly language. Every computer has a different instruction set architecture, or “ISA”. Each ISA has a corresponding assembly language that describes that ISA’s specific instructions, but they all generally have similar overall structure.
It’s very rare to write assembly by hand. Thanks to (relatively) modern languages like Rust, C++ and Go, and even things like Haskell and JavaScript, virtually no programmers need to write assembly anymore.
But that’s only because it’s the leading language written by computers themselves, namely compilers. A compiler’s job is, fundamentally, to write for you the assembly you would otherwise have had to write yourself. To better understand what a compiler is doing for you, you need to be able to read its output.
It’s worth looking at the C compilation model.
Diving in
clang -S square.c
This command would output:
.text
This tells the assembler to place all code that follows in the .text section, where executable data goes.
.file
This is just metadata that tools can use to figure out how the executable was built.
.globl
This asks the assembler to mark square_and_print as an externally linkable symbol. Other files that refer to square_and_print will be able to find it at link time.
square_and_print
This is a label, which gives this position in the executable a name that can be referenced. Labels are very similar to goto labels from C.
The contents under this label are described by the comments in the code above.
.L.str
This label gives our string constant a private name. By convention, .L labels are private names emitted by the compiler.
.asciz
Emits an ASCII string into .rodata with an extra null terminator at the end: that’s what the z stands for.
The Core Syntax
All assemblers are different, but the core syntax tends to be the same.
- Instructions, which consist of a mnemonic followed by some number of operands, such as addi sp, sp, -16 and call printf above. These are the text encoding of machine code.
- Labels, which consist of a symbol followed by a colon, like square_and_print: or .L.str:. These are used to let instruction operands refer to locations in the program.
- Directives, which vary wildly by assembler. GCC-style assembly like that above uses a .directive arg, arg syntax, as seen in .text, .globl and .asciz. They control the behavior of the assembler in various ways.
An assembler’s purpose is to read the .s file and serialize it as a binary .o file. It’s kind of like a compiler, but it does virtually no interesting work at all, beyond knowing how to encode instructions.
Directives control how this serialization occurs (such as moving around the output cursor); instructions are emitted as-is, and labels refer to locations in the object file.
Types of instructions
Available instructions tend to be motivated by providing one of three classes of functionality:
- A Turing-complete register machine execution environment. This is the source of assembly’s Turing tarpit nature: only the absolute minimum in terms of control flow and memory access is provided.
- Efficient silicon implementation of common operations on bit strings and integers, ranging from arithmetic to cryptographic algorithms.
- Building a secure operating system, hosting virtual machines, and actuating hardware external to the processor, like a monitor, a keyboard, or speakers.
Instructions can be broadly classified into four categories: arithmetic, memory, control flow, and “everything else”. In the last thirty years, the bar for general purpose architectures is usually “this is enough to implement a C runtime”.
Arithmetic instructions
These are the familiar operations: addition, subtraction, bitwise and, or, and nor, as well as unary not and negation.
Multiplication and division are somewhat rarer, because they are expensive to implement in silicon: smaller devices don’t have them. Division in particular is very complex to implement in silicon. Instruction sets also differ in their behavior around division by zero: some architectures will fault, similar to a memory error, while some, like RISC-V, produce a well-defined trap value.
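As a sketch of what "well-defined" means here, the following C functions model the results that RISC-V's M extension specifies for signed 32-bit div and rem, including the two cases that are undefined behavior in C itself (division by zero, and the overflow case of the most negative value divided by -1). The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Signed 32-bit division with the results RISC-V's `div` defines. */
static int32_t riscv_div(int32_t a, int32_t b) {
    if (b == 0) return -1;                            /* all bits set */
    if (a == INT32_MIN && b == -1) return INT32_MIN;  /* overflow case */
    return a / b;                                     /* truncates toward zero */
}

/* Signed 32-bit remainder with the results RISC-V's `rem` defines. */
static int32_t riscv_rem(int32_t a, int32_t b) {
    if (b == 0) return a;                             /* dividend comes back */
    if (a == INT32_MIN && b == -1) return 0;          /* overflow case */
    return a % b;                                     /* sign follows dividend */
}

/* Usage: no fault occurs, the results are simply defined values. */
static void riscv_div_demo(void) {
    assert(riscv_div(7, 0) == -1);
    assert(riscv_rem(7, 0) == 7);
    assert(riscv_div(INT32_MIN, -1) == INT32_MIN);
    assert(riscv_rem(INT32_MIN, -1) == 0);
}
```

Software that wants a fault on division by zero has to test the divisor itself, typically with a branch before the division.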
This category also includes copy instructions, which move the value of one register to another; a copy is kind of like a trivial arithmetic instruction.
Some architectures also offer more exotic arithmetic. This is just a sampler of what’s sometimes available:
- bit rotation
- byte reversal
- bit extraction
- carry-less multiplication. This is used to implement Galois/Counter mode encryption.
- fused instructions, like xnor and nand
- floating point instructions, usually implementing the IEEE 754 standard.
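For a feel of what a few of these do, here is a minimal sketch of software equivalents in C. On an architecture with the matching instruction, each function body would ideally collapse to a single instruction; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bit rotation: bits shifted out of the top re-enter at the bottom. */
static uint32_t rotl32(uint32_t x, unsigned n) {
    n &= 31;
    return n == 0 ? x : (x << n) | (x >> (32 - n));
}

/* Byte reversal: swap the byte order of a 32-bit word. */
static uint32_t bswap32(uint32_t x) {
    return (x >> 24) | ((x >> 8) & 0x0000FF00u)
         | ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* Bit extraction: pull out `len` bits starting at bit `lo`. */
static uint32_t bits(uint32_t x, unsigned lo, unsigned len) {
    assert(len >= 1 && len <= 32 && lo + len <= 32);
    uint32_t mask = len == 32 ? 0xFFFFFFFFu : ((1u << len) - 1u);
    return (x >> lo) & mask;
}

/*
 * Carry-less multiplication: like long multiplication, except the
 * partial products are combined with XOR instead of addition.
 */
static uint64_t clmul32(uint32_t a, uint32_t b) {
    uint64_t acc = 0;
    for (unsigned i = 0; i < 32; i++) {
        if ((b >> i) & 1u) acc ^= (uint64_t)a << i;
    }
    return acc;
}
```

For example, clmul32(3, 3) is 5 rather than 9, because the two middle partial products cancel under XOR instead of carrying.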
Memory instructions
Load instructions fetch memory from RAM into a register, while store instructions write it back.
These instructions frequently have an alignment constraint: the pointer value must (or, at least, should) be divisible by the number of bytes being loaded.
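The alignment rule is just a divisibility check on the address. A minimal sketch in C, with hypothetical helper names: is_aligned tests the constraint, and load_u32 shows the usual portable way to read four bytes from an address that may not satisfy it, leaving the compiler to pick suitable load instructions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* True if the address is divisible by the number of bytes being loaded. */
static bool is_aligned(const void *p, size_t size) {
    assert(size != 0);
    return (uintptr_t)p % size == 0;
}

/*
 * Load 4 bytes from any address. memcpy places no alignment
 * requirement on `p`, unlike dereferencing a uint32_t pointer.
 */
static uint32_t load_u32(const void *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```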
This category also includes instructions necessary for implementing atomics, such as lock cmpxchg on x86 and lr/sc on RISC-V. Atomics are fundamentally about changing the semantics of reading and writing from RAM, and thus require special processor support.
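From C, these instructions are reached through the C11 atomics in stdatomic.h. The sketch below builds an atomic add out of a compare-and-swap retry loop; a compiler will typically lower the compare-exchange to something like lock cmpxchg on x86 or an lr/sc loop on RISC-V, though the exact output depends on the target and compiler. The name fetch_add_cas is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Atomically add `delta` to *counter; returns the value seen before. */
static int fetch_add_cas(atomic_int *counter, int delta) {
    int seen = atomic_load(counter);
    /*
     * On failure, atomic_compare_exchange_weak writes the current
     * value into `seen`, so each retry works with fresh data.
     */
    while (!atomic_compare_exchange_weak(counter, &seen, seen + delta)) {
        /* another thread got in first (or a spurious failure): retry */
    }
    return seen;
}

/* Usage, single-threaded for simplicity. */
static void fetch_add_demo(void) {
    atomic_int counter;
    atomic_init(&counter, 40);
    assert(fetch_add_cas(&counter, 2) == 40);
    assert(atomic_load(&counter) == 42);
}
```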
Control flow instructions
Unconditional jumps implement goto: given some label, the j label instruction jumps directly to it.
Conditional jumps, often called branches, implement if. beq a0, a1, label will jump to label if a0 and a1 contain the same value. RISC-V provides branch instructions for all kinds of comparisons, like bne, blt, and bge.
Conditional and unconditional jumps can be used together to build loops, much like we could in C using if and goto.
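Here is what that looks like in C, as a minimal sketch: a loop written with nothing but if and goto, so that each control flow statement corresponds to one jump. The function name sum_to is hypothetical.

```c
#include <assert.h>

/* Sum the integers 1..n using only `if` and `goto`. */
static int sum_to(int n) {
    int i = 1;
    int total = 0;
loop:
    if (i > n) goto done;  /* conditional jump: plays the role of a branch */
    total += i;
    i += 1;
    goto loop;             /* unconditional jump: plays the role of `j loop` */
done:
    return total;
}

/* Usage. */
static void sum_to_demo(void) {
    assert(sum_to(10) == 55);
    assert(sum_to(0) == 0);
}
```

A compiler given an ordinary for or while loop produces essentially this shape: a label at the top, a branch that exits, and a jump back.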
Miscellaneous instructions
“Everything else” is, well… everything else.
No-op instructions do nothing: nop’s only purpose is to take up space in the instruction stream. No-op instructions can be used to pad space in the instruction stream, provide space for the linker to fix things up later, or implement nop sleds.
Instructions for poking processor state, like csrrw in RISC-V and wrmsr in x86, also belong in this category, as do “hinting” instructions like memory prefetches.
There are also instructions for special control flow: ecall is RISC-V’s “syscall” instruction, which “traps” to the kernel for it to do something; other architectures have similar instructions.
Breakpoint instructions and “fence” instructions belong here, too.
The calling convention
Functions are the core abstraction of all of programming. Assembly is no different: we have functions there, too!
Like in any language, functions are passed a list of arguments, perform some work, and return a value. For example, in C:
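As a minimal sketch of such a function and a call to it (the names scaled_sum and caller are hypothetical, chosen only for illustration):

```c
#include <assert.h>

/* Takes two arguments, performs some work, and returns a value. */
static int scaled_sum(int a, int b) {
    return (a + b) * 2;
}

/* A caller: passes the arguments, then uses the returned value. */
static int caller(void) {
    int result = scaled_sum(3, 4);  /* function call syntax */
    assert(result == 14);
    return result;
}
```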
Unfortunately, there isn’t anything like function call syntax in assembly. As with everything else, we need to do it instruction by instruction. All we do get in most architectures is a call instruction, which sets up a return address somewhere, and a ret instruction, which uses the return address to jump back to where the function was called.
We need some way to pass arguments, return a computed value, and maintain a call stack, so that each function’s return address is kept intact for its ret instruction to consume. We also need this to be universal: if I pull in a library, I should be able to call its functions.
This mechanism is called the calling convention of the platform’s ABI. It’s a convention, because all libraries must respect it in their exposed API for code to work correctly at runtime.
A function call in slow-mo
At the instruction level, function calls look something like this:
- Pre-call setup. The caller sets up the function call arguments by placing them in the appointed locations for arguments. These are usually either registers or locations on the stack.
  a. The caller also saves the caller-saved registers to the stack.
- Jump to the function. The caller executes a call instruction (or whatever the function call instruction might be called; virtually all architectures have one). This sets the program counter to the first instruction of the callee.
- Function prologue. The callee does some setup before executing its code.
  a. The callee reserves space on the stack in an architecture-dependent manner.
  b. The callee saves the callee-saved registers to this stack space.
- Function body. The actual code of the function runs now! This part of the function needs to make sure the return value winds up wherever the return slot for the function is.
- Function epilogue. The callee undoes whatever work it did in the prologue, such as restoring saved registers, and executes a ret (or equivalent) instruction to return.
- Post-call cleanup. The caller is now executing again; it can unspill any saved state that it needs immediately after the function call, and can retrieve the return value from the return slot. In some ABIs, such as C++’s on Linux, this is where the destructors of the arguments get run. (Rust, and C++ on Windows, have callee-destroyed arguments instead.)
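The steps above can be acted out in C on a toy machine. This is only a sketch under loud assumptions: the Machine struct, its three registers, and every function name are hypothetical and model no real ISA, and C's own call and return stand in for the call and ret instructions (so the return address is handled by the host, not by the model).

```c
#include <assert.h>
#include <stdint.h>

enum { STACK_WORDS = 64 };

/* Toy machine: three registers and a stack growing toward index 0. */
typedef struct {
    int32_t a0;  /* first argument, and also the return slot */
    int32_t t0;  /* caller-saved: a callee may clobber it */
    int32_t s0;  /* callee-saved: a callee must restore it */
    int32_t stack[STACK_WORDS];
    int sp;      /* index of the current top of the stack */
} Machine;

static void machine_init(Machine *m) {
    m->a0 = m->t0 = m->s0 = 0;
    m->sp = STACK_WORDS;  /* empty stack */
}

static void push(Machine *m, int32_t v) {
    assert(m->sp > 0);
    m->stack[--m->sp] = v;
}

static int32_t pop(Machine *m) {
    assert(m->sp < STACK_WORDS);
    return m->stack[m->sp++];
}

/* Callee: returns a0 * a0 in a0. */
static void square(Machine *m) {
    push(m, m->s0);         /* prologue: save callee-saved s0 */
    m->s0 = m->a0;          /* body: use s0 as scratch... */
    m->t0 = -1;             /* ...and clobber t0, which is allowed */
    m->a0 = m->s0 * m->s0;  /* body: result goes in the return slot */
    m->s0 = pop(m);         /* epilogue: restore s0 */
}

/* Caller: computes x * x + 100, keeping 100 in t0 across the call. */
static int32_t square_plus_100(Machine *m, int32_t x) {
    m->t0 = 100;
    push(m, m->t0);         /* pre-call: spill caller-saved t0 */
    m->a0 = x;              /* pre-call: place the argument */
    square(m);              /* jump to the function */
    m->t0 = pop(m);         /* post-call: unspill t0 */
    return m->a0 + m->t0;   /* post-call: read the return slot */
}
```

After square_plus_100 returns, the stack index and s0 are exactly what they were before the call, which is the illusion the convention exists to maintain.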
When people say that function calls have overhead, this is what they mean. Not only does the call instruction cause the processor to slam the brakes on its pipeline, causing all kinds of work to get thrown away, but state always needs to be delicately saved and restored across the function boundary to maintain the illusion of a call stack.
Caller-side
lui and addi do the work of actually putting that pointer into a0. The second argument, x, is passed in a1, copied from s0, where it landed from the earlier mul instruction.
Callee-side
Look at the comments in square_and_print about the prologue and epilogue.