Creating new worlds with GNU Compiler Collection.
This is the first article in a series about what the compilation process is and how it works, using the C language as an example.
- Creating new worlds with GNU Compiler Collection. (this article)
- The power of a book, or how to use static libraries in C.
- Dynamic libraries in C: creating something on what the others will rely.
***
A computer with its current architecture is designed to receive, process and store data in the form of binary units (bits), usually represented as 1 and 0.
However, we think and communicate in natural language, and to bridge this gap between humans and computers we have high-level programming languages. Some of them are interpreted, like PHP and Python (not quite true, since both first compile source code into bytecode that is then interpreted), while others are compiled, like C and Go. Today we will talk about compiled languages, compilers and the compilation process with the help of a simple command:
gcc main.c
The abbreviation gcc stands for GNU Compiler Collection, and the command above runs the GNU C compiler with the file name main.c as an argument. This starts the process of compiling an executable, and because we are curious, let's look at what's going on under the hood.
For this, let’s create a very simple C program:
gcc creates the final executable file in four phases:
1. Preprocessing (expands macros and includes the contents of header files)
2. Compilation (from source code to assembly)
3. Assembly (from assembly language to machine code)
4. Linking (resolves references to other object files and libraries and creates the executable)
To compile a program, we can simply type gcc main.c to generate an executable file named a.out. Everything that happens between running that command and getting the final a.out file is invisible to the programmer unless the -save-temps option is used (for example, gcc -save-temps main.c), which keeps the intermediate files of each phase instead of deleting them.
During the first stage, Preprocessing, macros are expanded and #include directives are replaced with the contents of the corresponding header files. With the -save-temps option, this output is saved as the main.i file, and even though our program is tiny, main.i is 843 lines long! Almost all of those lines come from the included header files.
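In the 2nd phase, Compilation, the preprocessed code from phase #1 is translated into assembly code for a specific CPU architecture, such as x86-64 (used by Intel and AMD processors) or ARM. The result of this phase is the main.s file (gcc -S main.c stops right after this phase).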
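In the 3rd phase, Assembly, gcc passes the assembly code to the assembler, which translates it into machine code and creates an object file, main.o in our case (gcc -c main.c stops right after this phase). The contents of main.o are binary rather than text, so printing it with a tool like cat only produces unreadable characters; we can think of it as a sequence of ones and zeros. Tools such as objdump -d main.o can still disassemble it back into readable instructions.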
During the 4th and last phase, Linking, the linker creates the final executable by resolving references in the object file to code defined elsewhere, such as functions from the C standard library. By default, the generated file is named a.out. However, if we run gcc main.c -o main instead, the executable is named main (-o sets the name of the output file). Because it is also a binary file, it is not practical to list it here, but instead we can execute it with ./main.
For those who really want to see gcc work its magic, there is a verbose option, gcc -v main.c, which shows the compilation process in greater detail, including the commands gcc runs for each phase.
That’s all for today, folks!