
Understanding the Program Building Process in C
Learn about the step-by-step process of building a C program, from source code to executable file. Discover how a C source program is transformed through preprocessing, compiling, assembling, and linking to generate the final executable file. Explore the creation of processes and the loading of programs in Unix/Linux environments.
Presentation Transcript
CS252: Systems Programming. Ninghui Li. Based on Slides by Prof. Gustavo Rodriguez-Rivera. Topic 2: Program Structure and Using GDB
What Happens From a C Source Program to Program Execution. Building a program, i.e., generating an executable file from source code: What are the steps? What does an executable file look like? Loading a program: Each time you execute a program, a process is created. In Unix/Linux, use the ps command to show the processes in the system.
Building a Program The programmer writes a program hello.c. The preprocessor expands #define, #include, #ifdef, and other preprocessor directives and generates a hello.i file. The compiler compiles hello.i, optimizes it, and generates an assembly instruction listing hello.s. The assembler (as) assembles hello.s and generates an object file hello.o. The compiler (cc or gcc) by default hides all these intermediate steps. You can use compiler options to run each step independently.
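To make the preprocessing step concrete, here is a minimal sketch. The macro names GREETING, SUFFIX, and VERBOSE and the functions greeting() and print_greeting() are invented for this illustration and do not come from the slides. Running gcc -E on a file like this should show the function body with the macros already replaced by their string literals.

```c
#include <stdio.h>

/* Hypothetical macros, chosen only for this illustration. */
#define GREETING "Hello"

#ifdef VERBOSE                      /* true only when built with -DVERBOSE */
#define SUFFIX " (verbose build)"
#else
#define SUFFIX ""
#endif

/* After preprocessing, the return statement reads: return "Hello" "";
   (or "Hello" " (verbose build)" when VERBOSE is defined).
   The compiler then joins the adjacent string literals. */
const char *greeting(void) {
    return GREETING SUFFIX;
}

/* Prints the greeting; call this from main() to try it out. */
void print_greeting(void) {
    printf("%s\n", greeting());
}
```

Compiling the same file with -DVERBOSE selects the other branch of the #ifdef, which is an easy way to see that the choice is made before the compiler proper ever runs.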
Original file hello.c
  #include <stdio.h>
  int main(void) {
      printf("Hello\n");
  }
After preprocessor
gcc -E hello.c > hello.i (-E stops the compiler after running the preprocessor)
hello.i:
  /* Expanded /usr/include/stdio.h */
  typedef void *__va_list;
  typedef struct __FILE __FILE;
  typedef int ssize_t;
  struct FILE { };
  extern int fprintf(FILE *, const char *, ...);
  extern int fscanf(FILE *, const char *, ...);
  extern int printf(const char *, ...);
  /* and more */
  int main(void) {
      printf("Hello\n");
  }
After compiler
gcc -S hello.c (-S stops the compiler after generating assembly code)
The resulting file is hello.s. The actual code depends on the system.
  LC0:
      .ascii "Hello\0"
      .text
      .globl _main
      .def _main; .scl 2; .type 32; .endef
  _main:
      pushl %ebp
      movl %esp, %ebp
      andl $-16, %esp
      subl $16, %esp
      call ___main
      movl $LC0, (%esp)
      call _puts
      leave
      ret
After compiling & assembling gcc -c hello.c generates hello.o. The main function already has a value in the object file hello.o. hello.o has undefined symbols, like the _puts function call, whose location is not known yet. The command nm lists the symbols in object files.
Output of nm hello.o
  0000000000000000 b .bss
  0000000000000000 d .data
  0000000000000000 t .text
                   U __main
  0000000000000000 T main
                   U puts
Slide annotations, top to bottom: uninitialized data; global and static vars; entry point of program; main function defined in code.
__main and puts are undefined in hello.o. They are provided by the libraries.
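As a rough guide to reading nm output, the sketch below annotates each definition with the symbol type that nm typically reports for it on a GCC/ELF toolchain when compiled without optimization. All names (counter, table, calls, bump, next_value) are invented for this example, and the exact letters can differ between toolchains and optimization levels.

```c
#include <stdio.h>

int counter = 5;        /* initialized global: typically 'D' (data)          */
int table[20];          /* uninitialized global: typically 'B' (bss);
                           some older toolchains report 'C' (common)         */
static int calls;       /* file-local and zero-initialized: typically 'b'    */

/* File-local function: typically 't' (local symbol in the text section). */
static int bump(void) {
    calls++;
    return counter + calls;
}

/* Externally visible function: typically 'T' (global symbol in text). */
int next_value(void) {
    table[0] = bump();
    /* printf is defined in the C library, not in this file, so the
       object file lists it as 'U' (undefined) until link time. */
    printf("value = %d\n", table[0]);
    return table[0];
}
```

Lowercase letters mark symbols local to the object file and uppercase letters mark global ones, which matches the lowercase b, d, t section symbols and the uppercase T main in the listing above.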
Building a program (continued) The linker puts together all object files as well as the object files in static libraries. The linker also takes the definitions in shared libraries and verifies that the symbols (functions and variables) needed by the program are completely satisfied. If there is a symbol that is not defined in either the executable or the shared libraries, the linker will give an error. Static libraries (.a files) are added to the executable. Shared libraries (.so files) are not added to the executable file.
Static and Shared Libraries Shared libraries are shared across different processes. There is only one instance of each shared library for the entire system. Static libraries are not shared. There is an instance of a static library for each process.
After linking gcc -o hello hello.c generates the hello executable. The hello.o object code is statically linked with libraries to include the code of library functions. In linking, static = compilation/building time. Sometimes not all function code is included; some code is stored in shared libraries and dynamically linked. In linking, dynamic = loading/execution time.
Building a Program [Diagram: Programmer -> Editor -> hello.c -> C Preprocessor -> hello.i -> Compiler (cc) and Optimizer -> hello.s -> Assembler (as) -> hello.o -> (static) Linker (ld) -> Executable File (hello). The linker also takes in other .o files and static libraries (.a files); they add to the size of the executable.]
What is a program? A program is a file in a special format that contains all the necessary information to load an application into memory and make it run. A program file includes: machine instructions; initialized data; a list of library dependencies; a list of memory sections that the program will use; a list of undefined values in the executable that will not be known until the program is loaded into memory.
Executable File Formats There are different executable file formats. ELF (Executable and Linkable Format): It is used in most UNIX systems (Solaris, Linux). You can use elfdump to see information in a binary file. COFF (Common Object File Format): It is used in Windows systems. a.out: Used in BSD (Berkeley Software Distribution) and early UNIX. It was very restrictive. It is not used anymore. Note: BSD UNIX and AT&T UNIX are the predecessors of the modern UNIX flavors like Solaris and Linux.
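One simple way to recognize an ELF file is by its first four bytes, which are 0x7f followed by the letters E, L, F. The sketch below checks for that signature; the function names has_elf_magic() and file_is_elf() are made up for this example, and it only inspects the signature, not the rest of the header.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if the stream starts with the ELF magic bytes
   0x7f 'E' 'L' 'F', 0 if it does not, and -1 on a short read. */
int has_elf_magic(FILE *f) {
    unsigned char magic[4];
    if (fread(magic, 1, sizeof magic, f) != sizeof magic)
        return -1;
    /* The literal is split so that 'E' is not parsed as a hex digit. */
    return memcmp(magic, "\x7f" "ELF", 4) == 0;
}

/* Convenience wrapper: opens path in binary mode and checks it.
   Returns -1 if the file cannot be opened. */
int file_is_elf(const char *path) {
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    int result = has_elf_magic(f);
    fclose(f);
    return result;
}
```

On a Linux system, calling file_is_elf() on the hello executable built earlier would be expected to return 1, while a shell script or a text file would give 0.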
Loading a Program After one types hello in a shell, the shell creates a new process and loads the file hello. The loader is a program that is used to run an executable file in a process. Before the program starts running, the loader allocates space for all the sections of the executable file (text, data, bss, etc.). It loads into memory the executable and the shared libraries (if not loaded yet).
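What the shell does when you type a command can be approximated with the POSIX calls fork(), execvp(), and waitpid(). This is a simplified sketch, not how any particular shell is implemented; the function name run_and_wait() is invented here.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs argv[0] (searched for in PATH) in a new process and waits for it.
   Returns the child's exit status, or -1 if the child could not be
   created or did not exit normally. */
int run_and_wait(char *const argv[]) {
    pid_t pid = fork();              /* create a new process */
    if (pid < 0)
        return -1;                   /* fork failed */
    if (pid == 0) {
        execvp(argv[0], argv);       /* load the program into the child */
        _exit(127);                  /* reached only if execvp failed */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Called with an argument vector such as { "./hello", NULL }, the child process is the one that gets replaced by the hello program, while the parent (the "shell" in this sketch) simply waits for it to finish.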
Loading a Program It also writes (resolves) any values in the executable to point to the functions/variables in the shared libraries (e.g., calls to printf in hello.c). Once the memory image is ready, the loader jumps to the _start entry point, which calls init() of all libraries and initializes static constructors. Then it calls main() and the program begins. _start also calls exit() when main() returns. The loader is also called the "runtime linker".
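The fact that code runs both before main() starts and after it returns can be observed from C. The sketch below relies on the GCC/Clang extension __attribute__((constructor)) and on the standard atexit() function; the names startup_ran, before_main(), after_main(), and report_startup() are invented for this illustration.

```c
#include <stdio.h>
#include <stdlib.h>

static int startup_ran = 0;

/* GCC/Clang extension: a function marked constructor is run by the
   startup code before main() is called. */
__attribute__((constructor))
static void before_main(void) {
    startup_ran = 1;
}

/* Runs when exit() is called, including the exit() that the startup
   code performs after main() returns. */
static void after_main(void) {
    puts("after_main: exit handlers are running");
}

/* Call this from main(): it reports whether the constructor has already
   run and registers after_main(). Returns 1 if the constructor ran. */
int report_startup(void) {
    printf("constructor ran before main: %s\n", startup_ran ? "yes" : "no");
    atexit(after_main);
    return startup_ran;
}
```

If report_startup() is the first statement in main(), it should print "yes", and the after_main message should appear only once main() has returned.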
Loading a Program [Diagram: the Executable File and the Shared libraries (.so, .dll) are read by the Loader (runtime linker, /usr/lib/ld.so.1), which produces the Executable in memory.]
Memory of a Process A 32-bit process sees memory as an array of bytes that goes from address 0 to 2^32-1 (0 to 4GB-1). [Diagram: a column of memory with address 0 at the bottom and 2^32-1 (4GB-1) at the top.]
Memory Sections The memory is organized into sections called "memory mappings". [Diagram, from the highest address (2^32-1) down to address 0: Stack, Shared Libs, Heap, Bss, Data, Text.]
Memory Sections Each section has different permissions: read/write/execute or a combination of them. Text: Instructions that the program runs. Data: Initialized global variables. Bss: Uninitialized global variables. They are initialized to zeroes. Heap: Memory returned when calling malloc/new. It grows upwards. Stack: It stores local variables and return addresses. It grows downwards.
Memory Sections Dynamic libraries: They are libraries shared with other processes. Each dynamic library has its own text, data, and bss. Each program has its own view of the memory that is independent of the others (virtual memory, mapped by the OS to physical memory). This view is called the Address Space of the program. If a process modifies a byte in its own address space, it will not modify the address space of another process.
Where things are located
Program hello.c
  #include <stdlib.h>

  int a = 5;      // Stored in data section
  int b[20];      // Stored in bss
  int main() {    // Stored in text
      int x;      // Stored in stack
      int *p = (int*) malloc(sizeof(int)); // p is on the stack; the int it points to is in the heap
      return 0;
  }
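To see these locations on a real system, one can print an address from each section. The sketch below reuses the variables a and b from the slide; the functions bss_is_zero() and print_layout() are invented for this example. The actual addresses vary from run to run and between systems, so no particular output is shown.

```c
#include <stdio.h>
#include <stdlib.h>

int a = 5;        /* initialized global: data section */
int b[20];        /* uninitialized global: bss */

/* Returns 1 if every element of b is still zero, as expected for
   bss variables that the program has not written to. */
int bss_is_zero(void) {
    for (int i = 0; i < 20; i++)
        if (b[i] != 0)
            return 0;
    return 1;
}

/* Prints one address from each section.
   Returns 0 on success, -1 if malloc fails. */
int print_layout(void) {
    int x = 0;                       /* local variable: stack */
    int *p = malloc(sizeof *p);      /* allocated int: heap */
    if (p == NULL)
        return -1;
    /* Converting a function pointer to void * is a widely supported
       extension (POSIX systems allow it), not strict ISO C. */
    printf("text  (print_layout) %p\n", (void *)print_layout);
    printf("data  (a)            %p\n", (void *)&a);
    printf("bss   (b)            %p\n", (void *)b);
    printf("heap  (*p)           %p\n", (void *)p);
    printf("stack (x)            %p\n", (void *)&x);
    free(p);
    return 0;
}
```

On a typical Linux system the stack address is much higher than the others, with the heap above bss and data, which is the ordering the Memory Sections slides describe.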
Memory Gaps Between each memory section there may be gaps that do not have any memory mapping. If the program tries to access a memory gap, the OS will send a SEGV signal that by default kills the program and dumps a core file. The core file contains the values of the global and local variables at the time of the SEGV. The core file can be used for post mortem debugging: gdb program-name core, then (gdb) where
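The effect of touching an unmapped address can be observed safely by letting a child process crash and inspecting how it ended. This is a sketch under the assumption of a typical Unix/Linux system where address 0 is not mapped; the function name signal_from_null_write() is invented here.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that writes through a null pointer, then reports how
   the child ended. Returns the terminating signal number, 0 if the
   child exited normally, or -1 on error. */
int signal_from_null_write(void) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* The volatile pointer variable keeps the compiler from
           reasoning about the null value at compile time. */
        int *volatile s2 = NULL;
        *s2 = 9;            /* address 0 is normally unmapped: SIGSEGV */
        _exit(0);           /* not reached on typical systems */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

On such a system the return value is expected to equal SIGSEGV (signal 11 on Linux), which is the same signal number gdb reports when it opens a core file from this kind of crash.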
What is GDB GDB is a debugger that helps you debug your program. The time you spend now learning gdb will save you days of debugging time. A debugger will make a good programmer a better programmer.
Compiling a program for gdb You need to compile with the -g option to be able to debug a program with gdb. The -g option adds debugging information to your program: gcc -g -o hello hello.c
Running a Program with gdb To run a program with gdb type gdb progname (gdb) Then set a breakpoint in the main function. (gdb) break main A breakpoint is a marker in your program that will make the program stop and return control back to gdb. Now run your program. (gdb) run If your program has arguments, you can pass them after run.
Stepping Through your Program Your program will start running and when it reaches main() it will stop. gdb> Now you have the following commands to run your program step by step: (gdb) step It will run the next line of code and stop. If it is a function call, it will enter into it (gdb) next It will run the next line of code and stop. If it is a function call, it will not enter the function and it will go through it. Example: (gdb) step (gdb) next
Setting breakpoints You can set breakpoints in a program in multiple ways: (gdb) break function Set a breakpoint in a function E.g. (gdb) break main (gdb) break line Set a break point at a line in the current file. E.g. (gdb) break 66 It will set a break point in line 66 of the current file. (gdb) break file:line It will set a break point at a line in a specific file. E.g. (gdb) break hello.c:78
Regaining the Control When you type (gdb) run the program will start running and it will stop at a break point. If the program is running without stopping, you can regain control again typing ctrl-c.
Where is your Program The command (gdb) where will print the current function being executed and the chain of functions that are calling that function. This is also called the backtrace. Example: (gdb) where #0 main () at test_mystring.c:22 (gdb)
Printing the Value of a Variable The command (gdb) print var Prints the value of a variable. E.g. (gdb) print i $1 = 5 (gdb) print s1 $1 = 0x10740 "Hello" (gdb) print stack[2] $1 = 56 (gdb) print stack $2 = {0, 0, 56, 0, 0, 0, 0, 0, 0, 0} (gdb)
Exiting gdb The command quit exits gdb. (gdb) quit The program is running. Exit anyway? (y or n) y
Debugging a Crashed Program This is also called postmortem debugging. It has nothing to do with CSI. When a program crashes, it writes a core file. bash-4.1$ ./hello Segmentation Fault (core dumped) bash-4.1$ The core is a file that contains a snapshot of the program at the time of the crash. That includes what function the program was running.
Debugging a Crashed Program To run gdb on a crashed program, type gdb program core. E.g. bash-4.1$ gdb hello core GNU gdb 6.6 Program terminated with signal 11, Segmentation fault. #0 0x000106cc in main () at hello.c:11 11 *s2 = 9; (gdb) Now you can type where to find out where the program crashed and the value of the variables at the time of the crash. (gdb) where #0 0x000106cc in main () at hello.c:11 (gdb) print s2 $1 = 0x0 (gdb) This tells you why your program crashed. Isn't that great?
Now Try gdb in Your Own Program Make sure that your program is compiled with the -g option. Remember: One hour you spend learning gdb will save you days of debugging. Faster development, less stress, better results.
Call Stack Also known as: execution stack, control stack, run-time stack, machine stack. Why do we need to use stacks in processes? To support function calls, and especially recursive function calls. What is stored on the stack? Function call parameters, local variables, return addresses, saved state information.
Stack Frame [Diagram, from high address to low address: Parameters, Return address, Saved Stack Frame Pointer, Local variables. The stack grows toward the low address, where SP points.]
Code Fragment for Printing Stack Frame (from prstack.c)
  int main(int argc, char*argv[]) {
      int n;
      int r;
      if (argc == 2) {
          n = atoi(argv[1]);
          r = fac(n, 0);
      } else if (argc == 3) {
          n = atoi(argv[2]);
          r = fac(n, 1);
      }
      return 0;
  }

  int fac(int a, int p) {
      char f[8] = "       ";
      int b = 0;
      // print stack frame
      gets(f); // buffer may overflow
      if (a == 1) { b = 1; }
      else { b = a * fac(a-1,p); }
      // print stack frame again
      return b;
  }
Code Fragment for Printing Stack Frame (from prstack.c)
  int fac(int a, int p) {
      char f[8] = "       ";
      int b = 0;
      printf("Address %p: argument int p: 0x%.8x\n", &p, p);
      printf("Address %p: argument int a: 0x%.8x\n", &a, a);
      printf("Address %p: return address : 0x%.8x\n", &a-1, *(&a-1));
      printf("Address %p: saved stack frame p: 0x%.8x\n", &a-2, *(&a-2));
      printf("Address %p: local var f[4-7] : 0x%.8x\n", (char *)(&f)+4, *((int *)(&f[4])));
      printf("Address %p: local var f[0-3] : 0x%.8x\n", &f, *((int *)f));
      printf("Address %p: local var int b: 0x%.8x\n", &b, b);
      printf("Address %p: gap : 0x%.8x\n", &b-1, *(&b-1));
  }
Printed Stack Frame
  Entering function call fac(a=2), code at 0x080484a5
  Address 0xff98942c: argument int p: 0x00000001
  Address 0xff989428: argument int a: 0x00000002
  Address 0xff989424: return address : 0x0804860e
  Address 0xff989420: saved stack frame p: 0xff989440
  Address 0xff98941c: local var f[4-7] : 0x00202020
  Address 0xff989418: local var f[0-3] : 0x20202020
  Address 0xff989414: local var int b: 0x00000000
  Address 0xff989410: gap : 0x00000000
  Entering function call fac(a=1), code at 0x080484a5
  Address 0xff98940c: argument int p: 0x00000001
  Address 0xff989408: argument int a: 0x00000001
  Address 0xff989404: return address : 0x0804860e
  Address 0xff989400: saved stack frame p: 0xff989420
  Address 0xff9893fc: local var f[4-7] : 0x00202020
  Address 0xff9893f8: local var f[0-3] : 0x20202020
  Address 0xff9893f4: local var int b: 0x00000000
  Address 0xff9893f0: gap : 0x00000000
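In the printed frames, the second call's entries sit 0x20 bytes below the first call's, and the saved frame pointer of the inner call (0xff989420) is the address where the outer call saved its own. A portable-ish way to watch frames stack up, without the pointer arithmetic used in prstack.c, is sketched below. It assumes GCC or Clang, since __builtin_frame_address is a compiler extension, and the function name fac_trace() is invented for this example.

```c
#include <stdio.h>

/* Recursive factorial that prints where each call's frame lives.
   __builtin_frame_address(0) is a GCC/Clang extension that returns
   the frame address of the current function. */
int fac_trace(int a) {
    printf("fac_trace(%d): frame at %p, parameter a at %p\n",
           a, __builtin_frame_address(0), (void *)&a);
    if (a <= 1)
        return 1;
    return a * fac_trace(a - 1);
}
```

When compiled without optimization on a typical x86 Linux system, each nested call is expected to print a lower frame address than its caller. With optimization enabled the compiler may turn the recursion into a loop, in which case the addresses would no longer change.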
Stack Frame with Overflowed Buffer
  Entering function call fac(a=1), code at 0x080484a5
  Address 0xffd5724c: argument int p: 0x00000001
  Address 0xffd57248: argument int a: 0x00000001
  Address 0xffd57244: return address : 0x0804860e
  Address 0xffd57240: saved stack frame p: 0xffd57260
  Address 0xffd5723c: local var f[4-7] : 0x00202020
  Address 0xffd57238: local var f[0-3] : 0x20202020
  Address 0xffd57234: local var int b: 0x00000000
  Address 0xffd57230: gap : 0x00000000
  123456789012345      (input: 15 bytes)
  Leaving function call fac(a=1), code at 0x80484a5
  Address 0xffd5724c: argument int p: 0x00000001
  Address 0xffd57248: argument int a: 0x00000001
  Address 0xffd57244: return address : 0x00353433
  Address 0xffd57240: saved stack frame p: 0x32313039
  Address 0xffd5723c: local var f[4-7] : 0x38373635
  Address 0xffd57238: local var f[0-3] : 0x34333231
  Address 0xffd57234: local var int b: 0x00000001
  Address 0xffd57230: gap : 0x00000001
  Segmentation fault (core dumped)
Overflowing f overwrites the saved sfp and the return address.
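The overflow happens because gets() has no way to know that f holds only 8 bytes. A bounded read with the standard fgets() avoids this. The sketch below is one possible replacement, not the fix used in the course material; the function name read_field() is invented for this example.

```c
#include <stdio.h>
#include <string.h>

/* Reads one line from in into buf, never writing more than size bytes.
   Strips the trailing newline if one was read. Returns the number of
   characters stored, or -1 on end of file or error. */
int read_field(FILE *in, char *buf, size_t size) {
    if (size == 0 || fgets(buf, (int)size, in) == NULL)
        return -1;
    size_t len = strcspn(buf, "\n");   /* index of the newline, or the length */
    buf[len] = '\0';
    return (int)len;
}
```

With the same 15-byte input and an 8-byte buffer, read_field() stores only the first 7 characters plus the terminating NUL, so the saved frame pointer and the return address stay intact. The remaining characters are left in the stream for the next read.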
What does a function do? fac
  0x080484a5 <+0>:   push %ebp                      ; save stack frame pointer (fp)
  0x080484a6 <+1>:   mov %esp,%ebp                  ; set current stack fp
  0x080484a8 <+3>:   sub $0x18,%esp                 ; allocate space for local var
  0x080484ab <+6>:   movl $0x20202020,-0x8(%ebp)    ; initialize f[0-3]
  0x080484b2 <+13>:  movl $0x202020,-0x4(%ebp)      ; initialize f[4-7]
  0x080484b9 <+20>:  movl $0x0,-0xc(%ebp)           ; initialize b
  0x080484c0 <+27>:  mov 0xc(%ebp),%eax             ; load value of p to eax
  0x080484c3 <+30>:  test %eax,%eax                 ; check if eax is 0
  0x080484c5 <+32>:  je 0x80485e8 <fac+323>         ; if so, skip printing frame
  ....
  0x080485e8 <+323>: mov 0x8(%ebp),%eax             ; load value of a to eax
  0x080485eb <+326>: cmp $0x1,%eax                  ; check if a==1
  0x080485ee <+329>: jne 0x80485f9 <fac+340>        ; if not, call fac
  0x080485f0 <+331>: movl $0x1,-0xc(%ebp)           ; otherwise, assigns 1 to b
  0x080485f7 <+338>: jmp 0x8048617 <fac+370>
  ....
  0x08048609 <+356>: call 0x80484a5 <fac>
  0x0804860e <+361>: mov 0x8(%ebp),%edx
  0x08048611 <+364>: imul %edx,%eax
GDB commands for examining stack frames
  backtrace (bt): print all frames
  frame (f): print brief current frame info
  info frame (info f): print detailed current frame info
See http://web.mit.edu/gnu/doc/html/gdb_8.html for more
What is Buffer Overflow? A buffer overflow, or buffer overrun, is an anomalous condition where a process attempts to store data beyond the boundaries of a fixed-length buffer. The result is that the extra data overwrites adjacent memory locations. The overwritten data may include other buffers, variables, and program flow data, and may result in erratic program behavior, a memory access exception, program termination (a crash), incorrect results, or, especially if deliberately caused by a malicious user, a possible breach of system security. Most common with C/C++ programs.
History Used in 1988's Morris Internet Worm. Aleph One's "Smashing The Stack For Fun And Profit" in Phrack Issue 49 in 1996 popularized stack buffer overflows. Still extremely common today.
What are buffer overflows?
Suppose a web server contains a function:
  void func(char *str) {
      char buf[128];
      strcpy(buf, str);
      do_something(buf);
  }
When the function is invoked, the stack looks like: buf | sfp | ret-addr | str
What if *str is 136 bytes long? After strcpy: *str | ret | str
Basic stack exploit Main problem: no range checking in strcpy(). Suppose *str is such that after strcpy the stack looks like: top of stack | *str | ret | Code for P. Program P: exec("/bin/sh") (exact shell code by Aleph One). When func() exits, the user will be given a shell! Note: the attack code runs in the stack.
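Since the root cause is the missing range check, one straightforward defense is to check the length before copying. The sketch below shows a checked variant of func(); the names copy_checked() and func_checked() are invented for this example, and the printf call merely stands in for the slide's do-something step.

```c
#include <stdio.h>
#include <string.h>

/* Copies src into dst only if it fits, including the terminating NUL.
   Returns 0 on success, or -1 if src is too long (dst is then left
   as an empty string). */
int copy_checked(char *dst, size_t dstsize, const char *src) {
    if (dstsize == 0)
        return -1;
    size_t len = strlen(src);
    if (len >= dstsize) {
        dst[0] = '\0';
        return -1;
    }
    memcpy(dst, src, len + 1);
    return 0;
}

/* Variant of the slide's func() that rejects oversized input
   instead of overwriting the saved frame pointer and return address. */
int func_checked(const char *str) {
    char buf[128];
    if (copy_checked(buf, sizeof buf, str) != 0)
        return -1;                  /* input would have overflowed buf */
    printf("accepted %zu bytes\n", strlen(buf));
    return 0;
}
```

With a 136-byte input, func_checked() returns -1 without touching anything beyond buf, whereas the original strcpy() would have written past the end of the buffer.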