A C source file becomes an executable file through translation steps that preprocess source, generate lower-level code, assemble machine code into object files, and link those object files into a loadable program file.
Learning Question
How does a .c file become an executable file?
Earlier chapters focused on how C concepts map to instructions, registers, memory, pointers, arrays, and stack frames.
This chapter steps back and asks where those instructions come from.
A C source file is not directly executable by the CPU.
It must be transformed into a file format that contains machine code and metadata the operating system can load.
The first mental model is:
An executable file is a packaged result of translation and linking. It is not the source file, and it is not yet a running process.
The Simplified Path
A common simplified path is:

    C source code
    -> preprocessing
    -> compilation
    -> assembly
    -> object file
    -> linking
    -> executable file

Tools such as gcc often hide these steps behind one command:

    gcc add.c -o add

That command can preprocess, compile, assemble, and link.
The single command is convenient, but the conceptual stages are different.
Keeping them separate prevents confusion later.
Source Files
A C source file is text written by the programmer.
For example:

    #include <stdio.h>

    int add(int left, int right)
    {
        int result = left + right;
        return result;
    }

    int main(void)
    {
        int value = add(2, 3);
        printf("%d\n", value);
        return 0;
    }

This file contains C syntax, names, types, function definitions, and preprocessor directives such as:

    #include <stdio.h>

The source file is the starting representation.
It is written for humans and translation tools.
It is not the binary instruction form the CPU executes.
Preprocessing
Preprocessing handles C preprocessor directives before ordinary compilation.
Common preprocessing work includes:
- expanding #include directives
- expanding macros
- handling conditional compilation such as #if
- removing comments
For example, the directive #include <stdio.h> causes declarations from that header to become visible to the translation unit.
The preprocessor does not turn the program into machine code.
It produces a source-like result that the compiler proper can compile.
The boundary is:
Preprocessing prepares C text for compilation; it does not create the final executable.
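The directives listed above can be seen working together in a small sketch. The macro ARRAY_LEN, the VERBOSE switch, and the function names are hypothetical, chosen only for illustration. Running gcc -E on a file like this shows the text after the macros are expanded and the untaken conditional branch is dropped.

```c
#include <stddef.h>   /* the preprocessor pastes this header's text in here (provides size_t) */

/* Function-like macro: replaced textually before the compiler proper runs. */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Conditional compilation: only one branch survives preprocessing. */
#ifdef VERBOSE
#define LOG_LEVEL 2
#else
#define LOG_LEVEL 1
#endif

static const int samples[] = {2, 3, 5, 7};

/* After preprocessing, the return statement reads roughly:
   return (sizeof(samples) / sizeof((samples)[0])); */
size_t sample_count(void)
{
    return ARRAY_LEN(samples);
}

/* After preprocessing, LOG_LEVEL is just the literal 1 or 2. */
int log_level(void)
{
    return LOG_LEVEL;
}
```

Compiling with -DVERBOSE would select the other branch, so the same source text can yield different input for the compiler proper.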
Compilation
Compilation translates the preprocessed C program into a lower-level representation.
Depending on the toolchain and options, the compiler may emit assembly text as an intermediate output.
For example:

    gcc -S -O0 add.c -o add.s

produces assembly text for inspection.
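On an x86-64 target, that assembly (shown here in AT&T syntax) might include code like this for the add function:

    add:
        pushq   %rbp                # save the caller's frame pointer
        movq    %rsp, %rbp          # set up this function's frame
        movl    %edi, -20(%rbp)     # store left in its stack slot
        movl    %esi, -24(%rbp)     # store right in its stack slot
        movl    -20(%rbp), %edx     # load left
        movl    -24(%rbp), %eax     # load right
        addl    %edx, %eax          # left + right
        movl    %eax, -4(%rbp)      # store result
        movl    -4(%rbp), %eax      # load result as the return value
        popq    %rbp                # restore the caller's frame pointer
        ret

The exact output may vary by compiler, target, and options.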
The important point is that compilation is where C source-level constructs are lowered toward instruction-level form.
Assembly
Assembly, as a build step, turns assembly text into machine-code bytes inside an object file.
This can be confusing because the word “assembly” is used in two related ways:
| Meaning | Description |
|---|---|
| assembly language | human-readable instruction-level text |
| assembly step | the tool step that turns assembly text into machine code |
An assembler reads assembly text and produces object code.
For example, gcc can stop after producing an object file:

    gcc -c add.c -o add.o

This command does not produce the final executable.
It produces an object file.
Object Files
An object file is a compiled but not fully linked file.
It usually contains:
- machine code for compiled functions
- data for compiled objects
- symbol information
- relocation information
- section metadata
For example, add.o may contain machine code for add and main.
But it may still refer to external functions such as printf.
The object file can say, in effect:

    this code calls printf, but the final address will be resolved later

That unresolved relationship is one reason linking is needed.
The key boundary is:
An object file contains compiled pieces, but it is not necessarily a complete program file that can be loaded and run by itself.
Symbols
A symbol is a name used by object files and linkers to refer to code or data.
Examples include:
- function names
- global variable names
- external library function names
In the C source, printf is a function name.
In an object file, printf may appear as an unresolved external symbol.
The linker uses symbols to connect references to definitions.
This is different from local C variable names.
Many local variable names do not appear as ordinary linker symbols.
The symbol system is about connecting compiled pieces at the object-file and executable-file level.
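A short sketch may make the distinction concrete. The file and its names (call_count, scale, add_scaled) are hypothetical. The comments describe what a typical toolchain is likely to record, and a tool such as nm can be used to check the actual symbol table of the resulting object file.

```c
/* Hypothetical translation unit showing which names tend to become linker symbols. */

int call_count = 0;   /* external linkage: a global data symbol other object files can reference */

/* Internal linkage: at most a file-local symbol; other object files cannot link against it. */
static int scale(int value)
{
    return value * 2;
}

/* External linkage: a global function symbol. */
int add_scaled(int left, int right)
{
    int sum = left + right;   /* automatic local: a register or stack slot, not a linker symbol */
    call_count++;
    return scale(sum);
}
```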
Linking
Linking combines object files and libraries into an executable file.
The linker resolves references between compiled pieces.
For example:

    gcc add.o -o add

can link add.o with the needed startup code and libraries to produce an executable named add.
In a larger program, linking may combine multiple object files:

    main.o
    math.o
    io.o
    libraries
    -> executable

The linker decides how the compiled pieces fit together in the final executable file.
It also records information needed for loading and dynamic linking when the platform uses those mechanisms.
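As a sketch of what the linker connects, the block below shows the contents of three hypothetical files (math.h, main.c, math.c) one after another. They are concatenated here only so the example stays in one listing. In a real build each .c file would be compiled to its own object file. The name sum_of_pair is an assumption made for illustration.

```c
/* ---- math.h (hypothetical header shared by both source files) ---- */
int add(int left, int right);   /* declaration only: promises a definition exists somewhere */

/* ---- main.c (would compile to main.o, which refers to add without defining it) ---- */
int sum_of_pair(void)
{
    return add(2, 3);   /* the call site records a reference to the symbol add */
}

/* ---- math.c (would compile to math.o, which defines the symbol add) ---- */
int add(int left, int right)
{
    return left + right;
}
```

Compiled separately, main.o would carry an undefined reference to add, and math.o would supply the definition that the linker matches to it.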
Executable Files
An executable file is a file representation of a program that the operating system can load.
It contains machine code and metadata.
Depending on the platform, executable formats include:
- ELF on many Linux systems
- PE on Windows
- Mach-O on macOS
This collection does not need to fully teach executable file formats.
The important point is:
The executable file is the packaged form of the program before it becomes a running process.
It is more concrete than source code.
It is still not the same thing as a running program.
The next chapter explains that boundary.
Why the Executable Is Not the Running Program
The executable file exists on disk.
A running process exists after the operating system loads that executable and creates runtime state.
The executable file can contain:
- instruction bytes
- read-only data
- initialized data
- metadata about sections or segments
- information for dynamic linking
- entry-point information
A process additionally has:
- a virtual address space
- a stack
- heap state
- loaded shared libraries
- register state
- operating-system process metadata
So the executable is necessary for execution, but it is not execution itself.
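The difference shows up when a program inspects its own state at run time. In this sketch (all names are hypothetical), the initial value of initialized_global is stored in the executable file, while the stack variable and the heap allocation exist only once a process is running. The printed addresses are not specified by the source and commonly differ between runs or platforms.

```c
#include <stdio.h>
#include <stdlib.h>

int initialized_global = 42;   /* initial value is stored in the executable file's data */

/* Prints where a few objects live in the running process.
   Returns the sum of the global and the stack value, or -1 if allocation fails. */
int show_runtime_state(void)
{
    int on_stack = 7;                         /* created when the function is called */
    int *on_heap = malloc(sizeof *on_heap);   /* created by a run-time allocation */
    if (on_heap == NULL) {
        return -1;
    }
    *on_heap = initialized_global + on_stack;

    printf("global: %p\n", (void *)&initialized_global);
    printf("stack:  %p\n", (void *)&on_stack);
    printf("heap:   %p\n", (void *)on_heap);

    int result = *on_heap;
    free(on_heap);
    return result;
}
```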
Commands as Learning Boundaries
Different gcc commands expose different boundaries:
| Command | Stops After |
|---|---|
| gcc -E add.c -o add.i | preprocessing |
| gcc -S add.c -o add.s | assembly text generation |
| gcc -c add.c -o add.o | object file generation |
| gcc add.c -o add | executable generation |
These commands are useful because they make the hidden stages inspectable.
They also show why “the compiler makes an executable” is a convenient shortcut.
More precisely, the toolchain performs multiple steps, and gcc often drives those steps for the programmer.
What This Chapter Does Not Explain Yet
This chapter explains the path from source code to executable file.
It does not yet fully explain:
- full compiler internals
- optimization passes
- object-file format details
- relocation records in depth
- static versus dynamic linking in detail
- loader behavior
- process memory maps
- shared library loading
Those topics matter, but the first boundary is:
Source code is translated and linked into an executable file before the operating system can create a running process from it.
Core Mental Model
Keep these boundaries separate:
- C source code is the human-readable starting representation.
- Preprocessing prepares source text for compilation.
- Compilation lowers C toward instruction-level form.
- Assembly turns assembly text into machine-code bytes.
- Object files contain compiled pieces that may still need linking.
- Linking combines object files and libraries into an executable file.
- An executable file is a loadable program file, not a running process.
When reasoning about the build path, ask:
Am I looking at source text, assembly text, an object file, a linked executable, or a running process?
Final Summary
A C program becomes executable through multiple translation steps.
The source file is preprocessed and compiled, assembly text may be produced, machine code is placed into object files, and the linker combines compiled pieces into an executable file.
That executable file is the program’s loadable representation, but it does not become a running program until the operating system creates a process from it.