=====> COMPILATION PROCESS <=====
         |
         |---> Input is source file (.c)
         |
         V
+=================+
|                 |
| C Preprocessor  |
|                 |
+=================+
         |
         |---> Pure C file (cmd: cc -E <file.name>)
         |
         V
+=================+
|                 |
| Lexical Analyzer|
|                 |
+-----------------+
|                 |
| Syntax Analyzer |
|                 |
+-----------------+
|                 |
|Semantic Analyzer|
|                 |
+-----------------+
|                 |
| Pre-Optimization|
|                 |
+-----------------+
|                 |
| Code Generation |
|                 |
+-----------------+
|                 |
|Post-Optimization|
|                 |
+=================+
         |
         |---> Assembly code (cmd: cc -S <file.name>)
         |
         V
+=================+
|                 |
|    Assembler    |
|                 |
+=================+
         |
         |---> Object file (.o/.obj) (cmd: cc -c <file.name>)
         |
         V
+=================+
|     Linker      |
|       and       |
|     loader      |
+=================+
         |
         |---> Executable (.exe/a.out) (cmd: cc <file.name>)
         |
         V
Executable file (a.out)
C preprocessor:
C preprocessing is the first step in the compilation. It handles:
    #define directives
    #include directives
    Conditional compilation directives (#if, #ifdef, #ifndef, #else, #endif)
    Macro expansion
The purpose of this unit is to convert the C source file into a "pure C" file that no longer contains any preprocessor directives.
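C compilation:
There are six steps in this unit:
1) Lexical Analyzer:
It groups the characters of the source file into "TOKENS". A token is the smallest meaningful unit of the language: a keyword, identifier, constant, string literal, operator or punctuator. Whitespace (space, tab, newline) separates tokens but is not required between them; for example, a+b is three tokens. This unit is therefore also called the "TOKENIZER". It also discards comments (in C compilers this is usually already done during preprocessing) and creates symbol table entries for the identifiers it finds. Relocation table entries are not produced here; they are generated later, when the assembler emits the object file.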
2) Syntax Analyzer:
This unit checks the syntax of the code. For example:
    {
        int a;
        int b;
        int c;
        int d;
        d = a + b - c * ;
    }
The above code generates a parse error because the expression is incomplete: the '*' operator has no right-hand operand.
This unit checks this internally by building a parse tree as follows:
        =
       / \
      d   -
         / \
        /   \
       +     *
      / \   / \
     a   b c   ?
Therefore this unit is also called the PARSER.
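3) Semantic Analyzer:
This unit checks the meaning of the statements, for example that the types on both sides of an assignment are compatible. For example:
    {
        int i;
        int *p;
        p = i;
        -----
        -----
        -----
    }
The above code produces an "assignment of incompatible type" diagnostic (a warning or an error, depending on the compiler and its version), because an int is assigned to a pointer to int.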
4) Pre-Optimization:
There are two types of optimization:
    Pre-optimization (CPU independent)
    Post-optimization (CPU dependent)
This unit performs the CPU-independent optimizations, which include:
    I) Dead code elimination
    II) Common subexpression elimination
    III) Loop optimization
I) Dead code elimination:
For example:
    {
        int a = 10;
        if (a > 5) {
            /* ... */
        } else {
            /* ... */
        }
    }
Here, the compiler knows the value of 'a' at compile time, so it also knows that the if condition is always true. Hence it eliminates the else part of the code.
II) Common subexpression elimination:
For example:
    {
        int a, b, c;
        int x, y;
        /* ... */
        x = a + b;
        y = a + b + c;
        /* ... */
    }
can be optimized as follows, provided neither a nor b changes between the two statements:
    {
        int a, b, c;
        int x, y;
        /* ... */
        x = a + b;
        y = x + c; /* a + b is replaced by x */
        /* ... */
    }
III) Loop optimization:
For example:
    {
        int a, i;
        for (i = 0; i < 1000; i++) {
            /* ... */
            a = 10;
            /* ... */
        }
    }
In the above code, the assignment to 'a' gives the same result on every iteration. If nothing else in the loop modifies 'a', it can be moved out of the loop as follows:
    {
        int a, i;
        a = 10;
        for (i = 0; i < 1000; i++) {
            /* ... */
        }
    }
5) Code generation:
Here, the compiler generates the assembly code, allocating registers so that the more frequently used variables are kept in registers.
6) Post-Optimization:
Here the optimization is CPU dependent. For example, if a jump lands on another jump, the two are collapsed into one:
    -----
    jmp:<addr1>
    <addr1> jmp:<addr2>
    -----
    -----
The first jump is rewritten to target <addr2>, so control jumps to <addr2> directly.
The last phase is linking, which creates an executable or a library. When the executable is run, the libraries it requires are loaded.
Now, we will see how to carry out these steps with GCC.
GCC compiles a C/C++ program into an executable in 4 steps.
For example, gcc -o hello hello.c
is carried out as follows:
1. Pre-processing
Preprocessing via the GNU C Preprocessor (cpp.exe), which includes the headers (#include) and expands the macros (#define).
cpp hello.c > hello.i
The resultant intermediate file “hello.i” contains the expanded source code.
2. Compilation
The compiler compiles the pre-processed source code into assembly code for a specific processor.
gcc -S hello.i
The -S option tells gcc to produce assembly code instead of object code. The resultant assembly file is "hello.s".
3. Assembly
The assembler (as.exe) converts the assembly code into machine code in the object file "hello.o".
as -o hello.o hello.s
4. Linker
Finally, the linker (ld.exe) links the object code with the library code to produce an executable file "hello".
ld -o hello hello.o ...libraries...