LLVM IR Tutorial
LLVM IR notes
Basic constructs
Global variable
int variable = 21;
@variable = global i32 21
Globals are prefixed with the @ character. Functions, such as main, are global symbols too and use the same prefix. LLVM treats a global variable as a pointer to its storage, so you must explicitly read its value with the load instruction and explicitly write it with the store instruction.
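The explicit load and store can be mirrored in C by going through a pointer to the global. This is only a sketch: the helper name increment_variable is hypothetical, and the IR shown in the comments is roughly what an unoptimized compile would be expected to produce.

```c
/* Mirrors `@variable = global i32 21`. */
int variable = 21;

/* Hypothetical helper: reads and updates the global through its address,
   which is how LLVM IR always accesses a global. Returns the old value. */
int increment_variable(void) {
    int *address = &variable;  /* in IR, @variable already is this pointer */
    int value = *address;      /* %1 = load i32, i32* @variable            */
    *address = value + 1;      /* %2 = add i32 %1, 1                       */
                               /* store i32 %2, i32* @variable             */
    return value;
}
```

Calling increment_variable() once returns the initial value and leaves the incremented value in the global.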
Local variables
- Temporary variables/Registers: created by introducing a new symbol for the variable
- Stack-allocated local variables: created by allocating the variable on the stack
%reg = add i32 4, 2
%stack = alloca i32
Notice that alloca yields a pointer to the allocated type. As is generally the case in LLVM, you must explicitly use a load or store instruction to read or write the value, respectively.
Constants
- Constants that do not occupy allocated memory.
- Constants that do occupy allocated memory.
%1 = add i32 %0, 17 ; 17 is an inlined constant
@hello = internal constant [6 x i8] c"hello\00"
%struct = type { i32, i8 }
@struct_constant = internal constant %struct { i32 16, i8 4 }
A constant that occupies memory is really a global variable whose linkage can be restricted with private or internal so that it is invisible outside the current module.
Structures
struct Foo { int a; char *b; double c; };
%Foo = type {
i32, ; 0: a
i8*, ; 1: b
double ; 2: c
}
//---------------------------------------------------------------------------------------
Foo foo;
char **bptr = &foo.b;
%foo = alloca %Foo
; char **bptr = &foo.b (computed with the getelementptr, or GEP, instruction)
%1 = getelementptr %Foo, %Foo* %foo, i32 0, i32 1
//---------------------------------------------------------------------------------------
Foo bar[100];
bar[17].c = 0.0;
; Foo bar[100]
%bar = alloca %Foo, i32 100
; bar[17].c = 0.0
%2 = getelementptr %Foo, %Foo* %bar, i32 17, i32 2
store double 0.0, double* %2
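The address arithmetic behind these two GEP instructions can be checked from C. The sketch below assumes the C compiler lays out struct Foo the same way the target's data layout lays out %Foo. The helper names gep_offset_of_c and gep_matches_c_address are hypothetical. Note that GEP only computes an address and never touches memory.

```c
#include <stddef.h>

struct Foo { int a; char *b; double c; };

/* Byte offset that `getelementptr %Foo, %Foo* %base, i32 index, i32 2`
   adds to the base pointer: whole elements first, then the field. */
size_t gep_offset_of_c(size_t index) {
    return index * sizeof(struct Foo) + offsetof(struct Foo, c);
}

/* Returns 1 when &bar[index].c equals the manually computed address. */
int gep_matches_c_address(struct Foo *bar, size_t index) {
    char *manual = (char *)bar + gep_offset_of_c(index);
    return manual == (char *)&bar[index].c;
}
```

The first index (17 in the example above) steps over whole %Foo elements, and the second index selects a field inside the element that was reached.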
Casts
- Bitwise casts (type casts): A bitwise cast (bitcast) reinterprets a given bit pattern without changing any bits in the operand.
Foo *foo = (Foo *) malloc(sizeof(Foo));
%1 = call i8* @malloc(i32 4)
%foo = bitcast i8* %1 to %Foo*
- Zero-extending casts (unsigned upcasts): To upcast an unsigned value, as in the example below, use zext.
uint8 byte = 117;
uint32 word;
word = byte;
@byte = global i8 117
@word = global i32 0
%1 = load i8, i8* @byte
%2 = zext i8 %1 to i32
store i32 %2, i32* @word
- Sign-extending casts (signed upcasts): To upcast a signed value, replace the zext instruction with the sext instruction.
@char = global i8 -17
@int = global i32 0
%1 = load i8, i8* @char
%2 = sext i8 %1 to i32
store i32 %2, i32* @int
- Truncating casts (signed and unsigned downcasts): Both signed and unsigned integers use the same instruction, trunc, to reduce the size of the number in question.
@int = global i32 -1
@char = global i8 0
%1 = load i32, i32* @int
%2 = trunc i32 %1 to i8
store i8 %2, i8* @char
- Floating-point extending casts (float upcasts): Floating-point numbers can be extended using the fpext instruction.
@small = global float 1.25
@large = global double 0.0
%1 = load float, float* @small
%2 = fpext float %1 to double
store double %2, double* @large
- Floating-point truncating casts (float downcasts): Likewise, a floating-point number can be truncated to a smaller size using fptrunc.
@large = global double 1.25
@small = global float 0.0
%1 = load double, double* @large
%2 = fptrunc double %1 to float
store float %2, float* @small
- Pointer-to-integer casts: Pointers cannot be used as operands of integer arithmetic instructions such as add or and, which is sometimes needed when doing systems programming. LLVM supports casting pointer types to integer types using the ptrtoint instruction.
- Integer-to-pointer casts: The inttoptr instruction is used to cast an integer back to a pointer.
- Address-space casts (pointer casts): The addrspacecast instruction converts a pointer to a pointer in a different address space.
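The integer and pointer casts listed above can be observed from C. The sketch below is an illustration under assumptions: the function names are hypothetical, and the instruction named in each comment is what a typical front end such as clang would be expected to emit for that C cast. The result of casting between pointers and uintptr_t is implementation-defined in C.

```c
#include <stdint.h>

/* zext i8 -> i32: the upper 24 bits are filled with zeros. */
uint32_t zero_extend_byte(uint8_t byte) { return (uint32_t)byte; }

/* sext i8 -> i32: the upper 24 bits are copies of the sign bit. */
int32_t sign_extend_byte(int8_t byte) { return (int32_t)byte; }

/* trunc i32 -> i8: only the low 8 bits survive. */
uint8_t truncate_to_byte(uint32_t word) { return (uint8_t)word; }

/* Rounds a pointer down to a multiple of `alignment`, which must be a
   power of two. The bit masking needs an integer, hence the round trip. */
void *align_down(void *pointer, uintptr_t alignment) {
    uintptr_t bits = (uintptr_t)pointer;  /* ptrtoint            */
    bits &= ~(alignment - 1);             /* and on the integer  */
    return (void *)bits;                  /* inttoptr            */
}
```

The same 8-bit pattern 0xEF widens to 239 with zero_extend_byte and to -17 with sign_extend_byte, which is why the front end has to pick the instruction from the signedness of the source type.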
Function Definitions and Declarations
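- Simple definition and declaration (two independent examples; a single module cannot both define and declare @Bar with different signatures)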
define i32 @Bar() nounwind { ret i32 17 }
declare i32 @Bar(i32 %value)
- With variable number of parameters
declare i32 @printf(i8*, ...) nounwind
@.textstr = internal constant [20 x i8] c"Argument count: %d\0A\00"
define i32 @main(i32 %argc, i8** %argv) nounwind {
; printf("Argument count: %d\n", argc)
%1 = call i32 (i8*, ...) @printf(i8* getelementptr([20 x i8], [20 x i8]* @.textstr, i32 0, i32 0), i32 %argc)
ret i32 0
}
- Overloading: LLVM IR has no overloading, so the front end mangles function names to give each overload a unique symbol at the LLVM IR level
define i32 @_Z8functionii(i32 %a, i32 %b) #0 {
; [...]
ret i32 %5
}
define double @_Z8functionddd(double %a, double %b, double %x) #0 {
; [...]
ret double %8
}
For a detailed description of struct arguments in functions, please refer to this link.
Function Pointers
int (*Function)(char *buffer);
@Function = global i32(i8*)* null
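A global function pointer like this is typically assigned at run time and then called indirectly. The following is a minimal sketch of that usage. The names buffer_length and call_function are hypothetical and not part of the original notes.

```c
#include <stddef.h>
#include <string.h>

/* Mirrors `@Function = global i32(i8*)* null`. */
int (*Function)(char *buffer) = NULL;

/* Hypothetical callee with the matching signature. */
int buffer_length(char *buffer) { return (int)strlen(buffer); }

/* Calls through the global pointer; returns -1 while it is still null.
   In IR this is a load of the pointer from @Function followed by an
   indirect call through the loaded value. */
int call_function(char *buffer) {
    if (Function == NULL) {
        return -1;
    }
    return Function(buffer);
}
```

After the assignment Function = buffer_length, a call such as call_function(text) forwards to buffer_length.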
Unions
TODO: since it is not commonly used today
Control-Flow Constructs
if-then-else conversion
In LLVM IR, control flow is implemented by jumping between basic blocks, which contain instruction sequences that do not change control flow. Each basic block ends with a terminator instruction that changes the control flow. The most common branching instruction is br:
; Usage of the br instruction: conditional form, then unconditional form
br i1 %cond, label %iftrue, label %iffalse
br label %dest
Max function example
int max(int a, int b) {
if (a > b) {
return a;
} else {
return b;
}
}
Translated into LLVM IR, the function consists of four basic blocks:
define i32 @max(i32 %a, i32 %b) {
entry:
%retval = alloca i32, align 4
%0 = icmp sgt i32 %a, %b
br i1 %0, label %btrue, label %bfalse
btrue: ; preds = %entry
store i32 %a, i32* %retval, align 4
br label %end
bfalse: ; preds = %entry
store i32 %b, i32* %retval, align 4
br label %end
end: ; preds = %btrue, %bfalse
%1 = load i32, i32* %retval, align 4
ret i32 %1
}
PHI
The phi instruction is named after the φ function used in the theory of SSA. This function selects the right value depending on which path the control flow took. In LLVM you have to manually specify each incoming value together with the predecessor basic block it arrives from.
%retval = phi i32 [%a, %btrue], [%b, %bfalse]
Using the phi instruction, the max function above can be transformed to:
define i32 @max(i32 %a, i32 %b) {
entry:
%0 = icmp sgt i32 %a, %b
br i1 %0, label %btrue, label %bfalse
btrue: ; preds = %entry
br label %end
bfalse: ; preds = %entry
br label %end
end: ; preds = %btrue, %bfalse
%retval = phi i32 [%a, %btrue], [%b, %bfalse]
ret i32 %retval
}
Without optimization, the compiler back end will usually implement the phi instruction through the stack. However, with a little more optimization in the back end (e.g., llc -O1), we can get a more optimized version.
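The selection that phi performs can be imitated in C by recording which block control arrived from. This is only a model for intuition: a real phi has no run-time predecessor variable, and the names phi, predecessor and max_with_phi are hypothetical.

```c
/* Labels of the two predecessor blocks of `end`. */
enum block { BTRUE, BFALSE };

/* Models `phi i32 [%a, %btrue], [%b, %bfalse]`: the result depends only
   on which predecessor block control arrived from. */
int phi(enum block predecessor, int from_btrue, int from_bfalse) {
    return predecessor == BTRUE ? from_btrue : from_bfalse;
}

int max_with_phi(int a, int b) {
    enum block predecessor;
    if (a > b) goto btrue; else goto bfalse;  /* br i1 %0, label %btrue, label %bfalse */
btrue:
    predecessor = BTRUE;
    goto end;                                 /* br label %end */
bfalse:
    predecessor = BFALSE;
    goto end;                                 /* br label %end */
end:
    return phi(predecessor, a, b);            /* %retval = phi ... ; ret i32 %retval */
}
```

Each variable other than the bookkeeping predecessor is assigned exactly once, which is the property SSA form requires and the reason a phi is needed at the join point.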
Object Oriented Constructs
Classes
A class is nothing more than a structure with an associated set of functions that take an implicit first parameter, namely a pointer to the structure. Therefore, it is straightforward to map a class to LLVM IR:
#include <stddef.h>
class Foo
{
public:
Foo() { _length = 0; }
void SetLength(size_t value) { _length = value; }
private:
size_t _length;
};
First, transform this code into two separate pieces:
- The structure definition.
- The list of methods, including the constructor.
; The structure definition for class Foo.
%Foo = type { i32 } ; _length (size_t is assumed to be 32 bits wide here)
; The default constructor for class Foo.
define void @Foo_Create_Default(%Foo* %this) nounwind {
%1 = getelementptr %Foo, %Foo* %this, i32 0, i32 0
store i32 0, i32* %1
ret void
}
; The Foo::SetLength() method.
define void @Foo_SetLength(%Foo* %this, i32 %value) nounwind {
%1 = getelementptr %Foo, %Foo* %this, i32 0, i32 0
store i32 %value, i32* %1
ret void
}
Then we make sure that the constructor (Foo_Create_Default) is invoked whenever an instance of the structure is created.
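The same lowering can be written out in plain C, which makes the implicit this parameter visible. This is a sketch that mirrors the names used in the IR above. The parameter name self is an assumption chosen to stay valid in both C and C++.

```c
#include <stddef.h>

/* The structure definition for class Foo. */
struct Foo {
    size_t _length;
};

/* Foo::Foo(): the implicit `this` becomes an explicit first parameter. */
void Foo_Create_Default(struct Foo *self) {
    self->_length = 0;
}

/* Foo::SetLength(): again the object pointer comes first. */
void Foo_SetLength(struct Foo *self, size_t value) {
    self->_length = value;
}
```

Creating an instance then means reserving storage for struct Foo and calling Foo_Create_Default on its address before any other method is used.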
Virtual Methods
A virtual method is no more than a compiler-controlled function pointer. Each virtual method is recorded in the vtable, which is a structure of all the function pointers needed by a given class. Please refer to this link to learn more about object-oriented constructs.
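A vtable can be sketched in C as a structure of function pointers that every object points to. The Shape example below is hypothetical and not taken from the original notes. It only illustrates the general scheme and does not reproduce the exact layout a particular C++ ABI uses.

```c
struct Shape;  /* hypothetical class used only for this sketch */

/* The vtable: one function pointer per virtual method. */
struct ShapeVTable {
    int (*Area)(const struct Shape *self);
};

/* Every object starts with a pointer to the vtable of its class. */
struct Shape {
    const struct ShapeVTable *vtable;
    int width;
    int height;
};

int Rectangle_Area(const struct Shape *self) { return self->width * self->height; }
int Triangle_Area(const struct Shape *self) { return self->width * self->height / 2; }

/* One vtable instance per class, shared by all of its objects. */
const struct ShapeVTable rectangle_vtable = { Rectangle_Area };
const struct ShapeVTable triangle_vtable = { Triangle_Area };

/* A virtual call: load the vtable pointer, load the slot, call indirectly. */
int Shape_Area(const struct Shape *self) {
    return self->vtable->Area(self);
}
```

Two objects with identical fields but different vtable pointers return different results from Shape_Area, which is the dynamic dispatch a virtual method provides.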
Some important Instructions
getelementptr instruction (see also the other notes on the official website)
Syntax
<result> = getelementptr <ty>, <ty>* <ptrval>{, [inrange] <ty> <idx>}*
<result> = getelementptr inbounds <ty>, <ty>* <ptrval>{, [inrange] <ty> <idx>}*
<result> = getelementptr <ty>, <ptr vector> <ptrval>, [inrange] <vector index type> <idx>