The work of a compiler is split into several phases with well-defined interfaces. These phases operate in sequence, and the output of one phase is the input to the next.
A common division into phases is described below. In some compilers, the ordering of phases may differ slightly, some phases may be combined or split into several phases, or extra phases may be inserted between those mentioned below.
Lexical analysis
This phase reads and analyses the program text. The text is divided into tokens, each of which corresponds to a symbol in the programming language, e.g., a variable name, keyword or number.
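As a rough illustration of dividing text into tokens, the following is a minimal sketch of a lexer. The token classes, the keyword list and the function name `tokenize` are assumptions made for this example; they do not describe any particular language or compiler.

```python
import re

# Hypothetical token classes for a tiny illustrative language.
# KEYWORD is listed before NAME so that keywords are not read as names.
TOKEN_SPEC = [
    ("NUMBER",  r"\d+"),
    ("KEYWORD", r"\b(?:if|then|else|let|in)\b"),
    ("NAME",    r"[A-Za-z_]\w*"),
    ("SYMBOL",  r"[-+*/=()<]"),
    ("SKIP",    r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Divide `text` into a list of (kind, text) tokens, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

For instance, `tokenize("let x = 42 in x + 1")` yields a keyword, a name, a symbol, a number and so on, in the order they appear in the text.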
Syntax analysis
This phase takes the list of tokens produced by lexical analysis and arranges them in a tree structure (called the syntax tree) that reflects the structure of the program. This phase is often called parsing.
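One way to arrange tokens in a tree is recursive descent, sketched below for arithmetic expressions only. The token format (tuples of kind and text), the node tags `"num"` and `"var"`, and the function name `parse` are assumptions made for this example.

```python
def parse(tokens):
    """Build a syntax tree (nested tuples) from a list of (kind, text) tuples.

    Illustrative grammar:
        expr := term (('+' | '-') term)*
        term := atom (('*' | '/') atom)*
        atom := NUMBER | NAME | '(' expr ')'
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else ("EOF", "")

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def expr():
        tree = term()
        while peek() in (("SYMBOL", "+"), ("SYMBOL", "-")):
            op = advance()[1]
            tree = (op, tree, term())   # left-associative
        return tree

    def term():
        tree = atom()
        while peek() in (("SYMBOL", "*"), ("SYMBOL", "/")):
            op = advance()[1]
            tree = (op, tree, atom())
        return tree

    def atom():
        kind, text = advance()
        if kind == "NUMBER":
            return ("num", int(text))
        if kind == "NAME":
            return ("var", text)
        if (kind, text) == ("SYMBOL", "("):
            tree = expr()
            if advance() != ("SYMBOL", ")"):
                raise SyntaxError("expected ')'")
            return tree
        raise SyntaxError(f"unexpected token {text!r}")

    tree = expr()
    if peek()[0] != "EOF":
        raise SyntaxError(f"unexpected token {peek()[1]!r}")
    return tree
```

The nesting of the resulting tuples reflects operator precedence: for the tokens of `a + 2 * b`, the multiplication ends up as a subtree of the addition.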
Type checking
This phase analyses the syntax tree to determine whether the program violates certain consistency requirements, e.g., if a variable is used but not declared, or if it is used in a context that does not make sense given the type of the variable, such as trying to use a boolean value as a function pointer.
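The two kinds of violation mentioned here can be sketched as a recursive walk over the syntax tree. The tuple-based tree format, the two types `"int"` and `"bool"`, and the names `check` and `CheckError` are assumptions made for this example.

```python
class CheckError(Exception):
    """Raised when the program violates a consistency requirement."""

def check(tree, env):
    """Return the type ('int' or 'bool') of a syntax tree, or raise CheckError.

    `env` maps each declared variable name to its type.
    """
    tag = tree[0]
    if tag == "num":
        return "int"
    if tag == "bool":
        return "bool"
    if tag == "var":
        name = tree[1]
        if name not in env:
            raise CheckError(f"variable {name!r} is used but not declared")
        return env[name]
    if tag in ("+", "-", "*", "/"):
        # Arithmetic only makes sense on integers in this sketch.
        for operand in tree[1:]:
            if check(operand, env) != "int":
                raise CheckError(f"operand of {tag!r} must be an int")
        return "int"
    if tag == "<":
        for operand in tree[1:]:
            if check(operand, env) != "int":
                raise CheckError("operand of '<' must be an int")
        return "bool"
    raise CheckError(f"unknown node {tag!r}")
```

With `env = {"x": "int"}`, the tree for `x + 1` checks as `"int"`, while a tree that mentions an undeclared `y`, or that adds a boolean to a number, raises `CheckError`.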
Intermediate code generation
The program is translated to a simple machine-independent intermediate language.
Register allocation
The symbolic variable names used in the intermediate code are translated to numbers, each of which corresponds to a register in the target machine code.
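Intermediate code generation and register allocation can be sketched together as follows. The three-address instruction format `(dest, op, arg1, arg2)`, the temporary names `t0`, `t1`, ..., and both function names are assumptions made for this example. The allocator is deliberately naive: it gives every symbolic name its own register and gives up when it runs out, whereas practical allocators reuse registers and spill to memory.

```python
import itertools

def gen_intermediate(tree):
    """Flatten a tuple syntax tree into three-address instructions.

    Returns (code, place), where `code` is a list of (dest, op, arg1, arg2)
    tuples and `place` is the symbolic name that holds the final result.
    Assumes source variables are not themselves named t0, t1, ...
    """
    counter = itertools.count()
    code = []

    def walk(node):
        tag = node[0]
        if tag == "var":
            return node[1]                      # variables keep their own name
        dest_needed_for_const = tag == "num"
        if dest_needed_for_const:
            dest = f"t{next(counter)}"
            code.append((dest, "const", node[1], None))
            return dest
        left = walk(node[1])
        right = walk(node[2])
        dest = f"t{next(counter)}"
        code.append((dest, tag, left, right))
        return dest

    place = walk(tree)
    return code, place

def allocate_registers(code, num_registers=8):
    """Replace symbolic names by register numbers, first come, first served."""
    registers = {}

    def reg(name):
        if name not in registers:
            if len(registers) >= num_registers:
                raise RuntimeError("out of registers (spilling is not handled here)")
            registers[name] = len(registers)
        return registers[name]

    allocated = []
    for dest, op, a, b in code:
        if op == "const":
            allocated.append((reg(dest), op, a, None))   # `a` is a literal value
        else:
            ra, rb = reg(a), reg(b)
            allocated.append((reg(dest), op, ra, rb))
    return allocated, registers
```

For the tree of `a + 2 * b`, the first function produces three instructions on the names `t0`, `t1`, `t2`, `a` and `b`; the second rewrites the same instructions so that every name is a small integer.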
Machine code generation
The intermediate language is translated to assembly language (a textual representation of machine code) for a specific machine architecture.
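A minimal sketch of this translation is shown below. It assumes intermediate instructions of the hypothetical form `(dest, op, arg1, arg2)` in which all names have already been replaced by register numbers, and it emits RISC-V-style mnemonics (`li`, `add`, `sub`, `mul`, `div`) using the temporary registers `t0` to `t6`. Both the instruction format and the function name `emit_assembly` are assumptions made for this example.

```python
# Mapping from intermediate-code operators to RISC-V-style mnemonics.
MNEMONIC = {"+": "add", "-": "sub", "*": "mul", "/": "div"}

def reg_name(number):
    """Map a register number to a temporary register name t0..t6."""
    if not 0 <= number <= 6:
        raise ValueError(f"register number {number} is outside t0..t6")
    return f"t{number}"

def emit_assembly(code):
    """Translate register-allocated three-address code to assembly text."""
    lines = []
    for dest, op, a, b in code:
        if op == "const":
            # Load an immediate value into the destination register.
            lines.append(f"li {reg_name(dest)}, {a}")
        else:
            lines.append(f"{MNEMONIC[op]} {reg_name(dest)}, {reg_name(a)}, {reg_name(b)}")
    return "\n".join(lines)
```

Each intermediate instruction maps to exactly one line of assembly here; a real code generator often has to choose among several instruction sequences for the same operation.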
Assembly and linking
The assembly-language code is translated into a binary representation, and the addresses of variables, functions, etc., are determined.
The first three phases are collectively called the frontend of the compiler and the last three phases are collectively called the backend.
In this context, the middle part of the compiler consists only of the intermediate code generation, but it often also includes various optimisations and transformations on the intermediate code.
Each phase, through checking and transformation, establishes stronger invariants on the data it passes on to the next phase, so that each subsequent phase is easier to write than if it had to account for everything the preceding phases have already handled.
For example, the type checker can assume the absence of syntax errors, and code generation can assume the absence of type errors.
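The idea that each phase can rely on what earlier phases have established can be sketched with a toy pipeline. The three phases below are stand-ins invented for this example, not real compiler phases: the last one converts words to integers without any error handling, because the phase before it has already rejected anything that is not a number.

```python
def run_phases(source, phases):
    """Run `phases` in order, feeding each phase's output to the next."""
    value = source
    for phase in phases:
        value = phase(value)    # a phase returns its result or raises an error
    return value

def split_words(text):
    """Toy first phase: divide the text into words."""
    return text.split()

def check_numeric(words):
    """Toy checking phase: establishes the invariant 'every word is a number'."""
    for word in words:
        if not word.isdigit():
            raise ValueError(f"{word!r} is not a number")
    return words

def total(words):
    """Toy final phase: relies on the invariant, so it needs no error handling."""
    return sum(int(word) for word in words)
```

Calling `run_phases("1 2 3", [split_words, check_numeric, total])` runs the three phases in sequence; with the input `"1 two 3"`, the checking phase raises an error and the final phase is never reached.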
Assembly and linking are typically done by programs supplied by the machine or operating system vendor, and are hence not part of the compiler itself.