UNIX/Linux & C Programming:
Debugging C in UNIX/Linux


Author: Kevin Brown


Compiling for Debugging

gcc -g


When gcc is run with the -g flag, the compiler adds debugging information to the binaries it creates. In the makefile, use -g when compiling the source code. There are four levels of debugging information. To compile explicitly without debugging information, use -g0; this negates -g and is generally used in makefiles. For minimal debugging information, use -g1, which includes information on functions and external variables as well as line-number tables, but nothing on local variables. By default, gcc uses -g2 when given -g, which adds information on local variables. For maximum debugging information, use -g3, which also includes macro definitions.

Example:

   $ gcc -g -c foo.c
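
As a minimal sketch of what the extra level buys you, the file below defines a macro. The names SCALE and scale_all are made up for illustration and are not part of the examples in this document. Compiled with -g3, gdb should be able to expand SCALE (for example with print SCALE or info macro SCALE) while execution is stopped inside scale_all; compiled with plain -g, the macro definition is not recorded.

```c
#include <stddef.h>

/* Hypothetical macro, used only to show what -g3 records. */
#define SCALE 3

/* Multiply every element of values[0..n-1] by SCALE, in place. */
void scale_all(int *values, size_t n)
{
    for (size_t i = 0; i < n; i++)
        values[i] *= SCALE;
}
```

The function itself is deliberately trivial; the point is only that SCALE exists in the source but not in the compiled code unless the macro information is kept.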


GDB: GNU Project Debugger

gdb is a command line debugger available for most UNIX and Windows systems. It can be used to debug programs written in Ada, C, C++, Objective-C, Pascal, and many other languages. Another powerful feature of gdb is that it can debug a process either locally or remotely.

To open gdb on executable foo:

   $ gdb foo

Common gdb commands:

  • run [args]
       runs the program with the given command line arguments

  • kill
       ends running process

  • quit
       exits gdb

  • continue
       resumes a paused process

  • list
       displays a few lines of code above and below current line of execution

  • next
       progresses to a new line of code locally, stepping over functions

  • step
       progresses to the next line of execution, stepping into functions

  • until
       progresses to a new line of code locally that has a line number greater than the last, stepping over functions

  • print <variable>
       displays value of given variable

  • set var <variable> = <value>
       stores the given value into the given variable

  • call <function name>
       calls the given function at the current line of execution

  • finish
       continues until the end of the current function, then prints the return value

  • help [command]
       displays information on the given command; if no command is given, displays help options
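
To try these commands, a small target with one function calling another is enough. The sketch below is an assumption for practice only; sum_to and demo are hypothetical names, not part of the sample program later in this document. With execution stopped on the first line of demo, next runs the call to sum_to as a single step, step enters sum_to, and finish returns to demo and prints the value returned. print first and set var n = 10 work on the locals, and call sum_to(5) evaluates the function without leaving the current line.

```c
/* Sum of the integers 1..n; a convenient target for step and finish. */
int sum_to(int n)
{
    int total = 0;
    for (int i = 1; i <= n; i++)
        total += i;
    return total;
}

/* Calls sum_to twice, so next and step behave differently on these lines. */
int demo(int n)
{
    int first = sum_to(n);
    int second = sum_to(n + 1);
    return first + second;
}
```

Compile it with -g together with a short main that calls demo, then set a breakpoint on demo before running.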

Breakpoint commands:
   gdb will pause the running process when it reaches a line with a breakpoint on it

  • break <line>
       adds a breakpoint at the given line number; for programs with multiple source files, use <file>:<line>

  • break <function name>
       adds a breakpoint at given function

  • tbreak <line>
       adds a temporary breakpoint at the given line number; the breakpoint is removed after the first time the process pauses at it

  • tbreak <function name>
       adds a temporary breakpoint at the given function; the breakpoint is removed after the first time the process pauses at it

  • info breakpoints
       displays a list of all breakpoints, use info watchpoints to display list of watchpoints.

  • disable <breakpoint/watchpoint number>
       gdb will not pause at given breakpoint

  • ignore <breakpoint/watchpoint number> <number of times to skip>
       gdb will skip over the given breakpoint a given number of times

Watchpoint commands:
   gdb will pause the running process when it accesses memory that has a watchpoint on it

  • watch <variable>
       places a write watchpoint at given variable

  • rwatch <variable>
       places a read watchpoint at given variable

  • awatch <variable>
       places a read and write watchpoint at given variable

  • Note: commands that control breakpoints will also control watchpoints
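
A rough sketch of a target for breakpoints and watchpoints follows. The names running_total, process_item and process_all are hypothetical. Assuming break process_item creates breakpoint number 1, ignore 1 3 makes gdb skip its first three hits; tbreak process_all pauses once at the start of the loop; and watch running_total pauses every time the total changes, showing the old and new values.

```c
#include <stddef.h>

/* Hypothetical running total; a natural target for a watchpoint. */
long running_total = 0;

/* Add one item to the running total. */
void process_item(int item)
{
    running_total += item;
}

/* Process every item in items[0..n-1] and return the final total. */
long process_all(const int *items, size_t n)
{
    running_total = 0;
    for (size_t i = 0; i < n; i++)
        process_item(items[i]);
    return running_total;
}
```

Because running_total is a global, the watchpoint stays valid across calls; a watchpoint on a local variable is removed by gdb when its function returns.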

Other advanced commands:

  • backtrace
       displays the current call stack of the process

  • frame <frame number>
       changes the current stack frame to given stack frame

  • info frame
       displays details of the current stack frame

  • info locals
       displays a list of local variables along with their values

  • info args
       displays a list of the arguments of current stack frame and their values

  • x/<format> <memory location>
       displays what is stored at given memory location, see help x for a list of format codes

  • info registers
       displays what is stored in the processor registers

  • core <core dump>
       loads given core dump for examination
       Note: the name of a core dump is generally core, which would make the call core core

  • disassemble <memory location>
       displays the assembly code at given memory location
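
The stack commands are easiest to see on a recursive function. The sketch below is only an illustration and factorial is not part of the sample program. With a breakpoint on factorial and a few uses of continue, backtrace should list one frame per pending call; frame <frame number> selects one of them, and info args and info locals then show n and rest for that frame.

```c
/* Recursive factorial; each pending call adds one frame to the stack. */
unsigned long factorial(unsigned int n)
{
    unsigned long rest;

    if (n <= 1)
        return 1;

    rest = factorial(n - 1);   /* a deeper frame is created here */
    return n * rest;
}
```

Note that rest has no meaningful value in a frame whose recursive call has not returned yet, which is itself a useful thing to observe with info locals.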

For full documentation of gdb run:

   $ info gdb 


Sample GDB Session

Here is a program to output statistics on a list of numbers

     1  #include <stdio.h>
     2  #include <stdlib.h>
     3  #include <math.h>
     4
     5  int cmpfunc (const void * a, const void * b){
     6     double diff = ( *(double*)a - *(double*)b );
     7     return (int) diff;
     8  }
     9
    10  double mean(double nums[], int count){
    11     double sum = 0;
    12     int i;
    13     
    14     for (i=0; i < count; i++)
    15     {
    16        sum += nums[i];
    17     }
    18     
    19     sum /= count;
    20     
    21     return sum;
    22  }
    23
    24  double median(double nums[], int count){
    25     int mid = count/2;
    26     
    27     return nums[mid];
    28  }
    29
    30  double stddev(double nums[], int count, double mean){
    31     int i;
    32     double variance = 0;
    33     for (i=0; i <count; i++)
    34        variance += nums[i]-mean * nums[i]-mean;
    35     variance /= count;
    36     return sqrt(variance);
    37  }
    38
    39  double max(double nums[]){
    40     return nums[0];
    41  }
    42
    43  double min(double nums[], int count){
    44    return nums[count];
    45  }
    46
    47  int main(int argc, char* argv[]){
    48     double avr;
    49     int i;
    50     int count = argc -1;
    51     double* nums = malloc(count);
    52     for (i = 1; i < count; i++)
    53        nums[i] = atof(argv[i]);
    54     
    55     qsort(nums, count, sizeof(double), cmpfunc);
    56     printf("Max: %.2lf\n", max(nums));
    57     printf("Min: %.2lf\n", min(nums,count));
    58     printf("Mean: %.2lf\n", (avr = mean(nums,count)));
    59     printf("Median: %.2lf\n", median(nums,count));
    60     printf("Standard Deviation : %.2lf\n", stddev(nums,count,avr));
    61     return 0;
    62  }

First run:
    $ ./a.out 1.1 1.2 1.5 1.3 5.5 2.7
    Segmentation fault
Run it in GDB:
    $ gdb ./a.out
    (gdb) run 1.1 1.2 1.5 1.3 5.5 2.7
    Starting program: /home/a.out 1.1 1.2 1.5 1.3 5.5 2.7

    Program received signal SIGSEGV, Segmentation fault.
    0x00007ffff774f43c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
The information given is not very helpful, so let's step through to see where it faults:
    (gdb) break main
    Breakpoint 1 at 0x400907: file stats.c, line 50.
    (gdb) run 1.1 1.2 1.5 1.3 5.5 2.7
    Starting program: /home/a.out 1.1 1.2 1.5 1.3 5.5 2.7

    Breakpoint 1, main (argc=7, argv=0x7fffffffe188) at stats.c:50
    50	   int count = argc -1;
    (gdb) next
    51	   double* nums = malloc(count);
    (gdb) next
    52	   for (i = 1; i < count; i++)
    (gdb) next
    53	      nums[i] = atof(argv[i]);
    (gdb) until

    Program received signal SIGSEGV, Segmentation fault.
    0x00007ffff774f43c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
It looks like the segmentation fault is in the loop, so there must be a memory violation.
Not enough memory is being allocated on line 51. Let's change that:
    51	   double* nums = malloc(sizeof(double)*count);
Now let's run it again to see if it works:
    $ ./a.out 1.1 1.2 1.5 1.3 5.5 2.7
    Max: 0.00
    Min: 0.00
    Mean: 1.77
    Median: 1.50
    Standard Deviation : -nan
There are still some problems. Let's see if the array is being filled correctly:
    (gdb) break 55
    Breakpoint 3 at 0x40097a: file stats.c, line 55.
    (gdb) run 1.1 1.2 1.5 1.3 5.5 2.7
    Starting program: /home/a.out 1.1 1.2 1.5 1.3 5.5 2.7

    Breakpoint 1, main (argc=7, argv=0x7fffffffe198) at stats.c:55
    55	   qsort(nums, count, sizeof(double), cmpfunc);
    (gdb) print nums[0]
    $11 = 0
    (gdb) print nums[1]
    $12 = 1.1000000000000001
It is not: nums[0] is never set, and the loop stops one argument short. Let's fix the loop and then run it:
    52	   for (i = 1; i <= count; i++)
    53	      nums[i-1] = atof(argv[i]);
	
    $ ./a.out 1.1 1.2 1.5 1.3 5.5 2.7
    Max: 1.10
    Min: 0.00
    Mean: 2.22
    Median: 1.30
    Standard Deviation : -nan
That fixed something, but it's still not right. Let's see if the sort function is causing problems:
    (gdb) run 1.1 1.2 1.5 1.3 5.5 2.7
    Starting program: /home/kevin/Desktop/a.out 1.1 1.2 1.5 1.3 5.5 2.7

    Breakpoint 1, main (argc=7, argv=0x7fffffffe198) at stats2.c:55
    55	   qsort(nums, count, sizeof(double), cmpfunc);
    (gdb) next
    56	   printf("Max: %.2lf\n", max(nums));
    (gdb) print count
    $1 = 6
    (gdb) print (double[6]) *nums
    $2 = {1.1000000000000001, 1.2, 1.5, 1.3, 2.7000000000000002, 5.5}
The sort doesn't work quite right, and it's sorting in the opposite order from what we want (max and min expect descending order).
There must be a problem with the compare function:
     5	int cmpfunc (const void * a, const void * b){
     6	   double diff = ( *(double*)b - *(double*)a );
     7	   return (int) ceil(diff);
     8	}
	 
    $ ./a.out 1.1 1.2 1.5 1.3 5.5 2.7
    Max: 5.50
    Min: 0.00
    Mean: 2.22
    Median: 1.30
    Standard Deviation : -nan
That looks a lot better. Now let's look at the standard deviation function to see what is wrong there:
    (gdb) break stddev
    Breakpoint 1 at 0x400863: file stats2.c, line 32.
    (gdb) run 1.1 1.2 1.5 1.3 5.5 2.7
    Starting program: /home/kevin/Desktop/a.out 1.1 1.2 1.5 1.3 5.5 2.7
    Max: 5.50
    Min: 0.00
    Mean: 2.22
    Median: 1.30

    Breakpoint 1, stddev (nums=0x603010, count=6, mean=2.2166666666666663)
        at stats2.c:32
    32	   double variance = 0;
    (gdb) next
    33	   for (i=0; i <count; i++)
    (gdb) next
    34	      variance += nums[i]-mean * nums[i]-mean;
    (gdb) next
    33	   for (i=0; i <count; i++)
    (gdb) print variance
    $1 = -8.9083333333333314
Variance should never be negative. It looks like the expression is missing parentheses:
    34	      variance += (nums[i]-mean) * (nums[i]-mean);

    $ ./a.out 1.1 1.2 1.5 1.3 5.5 2.7
    Max: 5.50
    Min: 0.00
    Mean: 2.22
    Median: 1.30
    Standard Deviation : 1.56
Looks good, except that there is no 0 in the data set. Let's go through one more time and see where it is coming from:
    (gdb) break 57
    Breakpoint 1 at 0x400a21: file stats2.c, line 57.
    (gdb) run 1.1 1.2 1.5 1.3 5.5 2.7
    Starting program: /home/kevin/Desktop/a.out 1.1 1.2 1.5 1.3 5.5 2.7
    Max: 5.50

    Breakpoint 1, main (argc=7, argv=0x7fffffffe198) at stats2.c:57
    57	   printf("Min: %.2lf\n", min(nums,count));
    (gdb) print (double[6]) *nums
    $1 = {5.5, 2.7000000000000002, 1.5, 1.3, 1.2, 1.1000000000000001}
    (gdb) step
    min (nums=0x603010, count=6) at stats2.c:44
    44	  return nums[count];
    (gdb) print nums[count]
    $2 = 0
    (gdb) print count
    $3 = 6
    (gdb) print nums[count-1]
    $4 = 1.1000000000000001
OK, now that we have found the source of the problem, let's fix it and hope everything works:
    44	  return nums[count-1];

    $ ./a.out 1.1 1.2 1.5 1.3 5.5 2.7
    Max: 5.50
    Min: 1.10
    Mean: 2.22
    Median: 1.30
    Standard Deviation : 1.56
And with that, it's fixed.
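
The compare function used in the session depends on rounding the difference of two doubles, which can treat close values as equal. As a hedged alternative, and not the fix used in the session above, the sketch below compares the values directly. It also shows one way to size and fill the array from the command line. The names cmp_desc and parse_numbers are made up for this example.

```c
#include <stdlib.h>

/* qsort comparator for doubles, descending order, with no rounding:
   negative when a belongs before b, positive when after, 0 when equal. */
int cmp_desc(const void *a, const void *b)
{
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x < y) - (x > y);
}

/* Convert argv[1..argc-1] into a newly allocated array of doubles.
   Stores the element count in *count. Returns NULL (with *count == 0)
   when there are no arguments or malloc fails. The caller frees it. */
double *parse_numbers(int argc, char *argv[], int *count)
{
    int n = argc - 1;
    double *nums;

    *count = 0;
    if (n <= 0)
        return NULL;

    nums = malloc(sizeof(double) * n);   /* bytes, not element count */
    if (nums == NULL)
        return NULL;

    for (int i = 1; i <= n; i++)
        nums[i - 1] = atof(argv[i]);     /* argv[1] goes to nums[0] */

    *count = n;
    return nums;
}
```

Sorting the result with qsort(nums, count, sizeof(double), cmp_desc) puts the largest value in nums[0] and the smallest in nums[count-1], which is the order max and min in the sample program assume.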


DDD: Data Display Debugger

ddd is a graphical interface for command line debuggers. By default, ddd uses gdb as its underlying debugger.
ddd can run all commands of its underlying debugger.

To open ddd on executable foo:

   $ ddd foo

To open ddd on executable foo with alternative debugger xdb:

   $ ddd --xdb foo

Using ddd remotely in Windows:

First, Xming must be installed on the system. With Xming running, open an SSH client such as PuTTY. Before connecting, go to Connection->SSH->X11 and check the box labeled "Enable X11 forwarding". The system should automatically forward to the display currently in use; if not, set "X display location" to "localhost:0". Proceed to connect as usual.

Using ddd remotely in UNIX:

First, an X server must be installed on the system. Most Linux desktop systems already include one. On macOS, XQuartz must be downloaded. With the X server running, use the command ssh -X <server> to connect with X display forwarding. The system should automatically forward to the display currently in use; if not, use the command export DISPLAY=:0.0 to set the DISPLAY environment variable.

Note: when running ddd remotely, there can be noticeable delays in display updates.

Using ddd:

To run a program in ddd with command line arguments, go to Program->Run and enter the arguments.

You can edit the source code in ddd through the edit function. This opens a vi window where the source code can be edited. After editing, ddd can run make with the make command to build the program. This requires a makefile in the same directory as the source code.

ddd can display variables in a grid area. To add a variable to the display area, right-click the variable and select display, or right-click the display area and select new display. To display a dynamically allocated array, create a new display and enter <array>[0]@<size>

ddd can also plot variables through gnuplot. The plot is updated each time the program runs. To plot a variable, highlight the variable and click the plot button at the top of the window.


Valgrind

valgrind is a command used to run diagnostics on a program. Like ddd, valgrind runs an underlying tool on your program, and it then gives a detailed report of that tool's findings.

To run valgrind on executable foo:

   $ valgrind ./foo

Common valgrind options:

  • --tool=<toolname>
       valgrind will run the given tool on the program; the default is memcheck.

  • --trace-children=<yes|no>
       valgrind will trace through processes that were created with exec. Note: this option is not needed to trace through forked processes, as valgrind does that by default.

  • --track-fds=<yes|no>
       valgrind will include in its report any open file descriptors and their statuses.

valgrind has many different tools, the most common of which are Memcheck, Massif, Helgrind and DRD.

Memcheck

By default, valgrind uses Memcheck to analyze memory use throughout the process and reports possible memory errors in the program, such as memory leaks. It will also tell you when the program uses uninitialized memory, reads or writes memory after it has been freed, or reads or writes outside the bounds of a block created with malloc.

Here is a program that uses poor memory management

    #include <stdio.h>
    #include <string.h>
    #include <stdlib.h>

    int findsize(int number, int base){
       int size = 1;
       while (number >= base){
          number /= base;
          size++;
       }
       return size;
    }

    int main(int argc, char **argv){
       FILE *file1, *outfile;

       char* fileName1 = strdup(argv[1]);
       char* number;

       int num, i, amount, size;

       if((file1 = fopen ( fileName1, "r" )) == NULL){
          fprintf (stderr, "%s: %s: No such file or directory\n", argv[0], fileName1);
          exit(2);
       }

       outfile = fopen ( "output", "w" );

       fscanf(file1,"%d",&amount);

       for (i = 0; i < amount; i++){
          fscanf(file1,"%d",&num);
          size = findsize(num, 10);
          number = malloc(size);
          sprintf(number, "%d", num);
          fprintf(outfile,"Number: %s\n",number);
          size = findsize(num, 8);
          number = malloc(size);
          sprintf(number, "%o", num);
          fprintf(outfile,"Octal: %s\n",number);
          size = findsize(num, 16);
          number = malloc(size);
          sprintf(number, "%x", num);
          fprintf(outfile,"Hex: %s\n",number);
       }
       return 0;
    }
This is a snippet of the output of Memcheck
==5013== HEAP SUMMARY:
==5013==     in use at exit: 1,258 bytes in 33 blocks
==5013==   total heap usage: 33 allocs, 0 frees, 1,258 bytes allocated
==5013==
==5013== LEAK SUMMARY:
==5013==    definitely lost: 122 bytes in 31 blocks
==5013==    indirectly lost: 0 bytes in 0 blocks
==5013==      possibly lost: 0 bytes in 0 blocks
==5013==    still reachable: 1,136 bytes in 2 blocks
==5013==         suppressed: 0 bytes in 0 blocks
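
The "definitely lost" blocks come from buffers that are allocated with malloc and never freed. The listing also asks for one byte too few per string, since sprintf writes a terminating '\0'. One possible repair is sketched below, under the assumption that each conversion is moved into a helper; format_number is a hypothetical name, and the caller is expected to free each string after writing it.

```c
#include <stdio.h>
#include <stdlib.h>

/* Number of digits needed to write number in the given base
   (assumes number >= 0 and base >= 2). */
int findsize(int number, int base)
{
    int size = 1;
    while (number >= base) {
        number /= base;
        size++;
    }
    return size;
}

/* Format a non-negative num in base 8, 10 or 16 into a new string.
   Returns NULL for a negative num, an unsupported base, or a failed
   allocation. The caller must free() the result. */
char *format_number(int num, int base)
{
    size_t len;
    char *text;

    if (num < 0 || (base != 8 && base != 10 && base != 16))
        return NULL;

    len = (size_t)findsize(num, base) + 1;   /* +1 for the '\0' */
    text = malloc(len);
    if (text == NULL)
        return NULL;

    switch (base) {
    case 8:  snprintf(text, len, "%o", (unsigned int)num); break;
    case 10: snprintf(text, len, "%d", num);               break;
    default: snprintf(text, len, "%x", (unsigned int)num); break;
    }
    return text;
}
```

In the loop, each call would be followed by fprintf and then free on the returned pointer. Freeing the strdup result and closing both files with fclose before returning would address the remaining blocks in the summary.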

Massif

When run with the --tool=massif option, valgrind uses Massif to analyze heap usage throughout the process. Massif records how much heap memory is in use over the course of the process's execution and writes the results to a file named massif.out.#####, where ##### is the process ID. To view the graph, use the command ms_print massif.out.#####
If the program runs quickly, change the time unit to bytes allocated and freed with the option --time-unit=B

Here is a program that uses lots of memory

    #include <stdio.h>
    #include <stdlib.h>

    typedef struct list_struc{
       int matrix[500][500];
       char *teams[500];
    } tournament;

    int main(int argc, char* argv[]){
       tournament *tier1 = malloc(sizeof(tournament));
       tournament *tier2 = malloc(sizeof(tournament));
       tournament *tier3 = malloc(sizeof(tournament));
       //do something with the tournaments
       free (tier2);
       tier2 = malloc(sizeof(tournament));
       tier2 = malloc(sizeof(tournament));
       tier2 = malloc(sizeof(tournament));
       free (tier1);
       free (tier2);
       free (tier3);

       return 0;
    }
This is the graph of heap usage
--------------------------------------------------------------------------------


    MB
4.787^                                                  #######
     |                                                  #
     |                                                  #
     |                                                  #
     |                                           :::::::#      :::::::
     |                                           :      #      :
     |                                           :      #      :
     |                                           :      #      :
     |                     @@@@@@@        ::::::::      #      :      :::::::
     |                     @              :      :      #      :      :
     |                     @              :      :      #      :      :
     |                     @              :      :      #      :      :
     |              :::::::@      :::::::::      :      #      :      :      :
     |              :      @      :       :      :      #      :      :      :
     |              :      @      :       :      :      #      :      :      :
     |              :      @      :       :      :      #      :      :      :
     |       ::::::::      @      :       :      :      #      :      :      :
     |       :      :      @      :       :      :      #      :      :      :
     |       :      :      @      :       :      :      #      :      :      :
     |       :      :      @      :       :      :      #      :      :      :
   0 +----------------------------------------------------------------------->MB
     0                                                                   9.575
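
The steps in the graph can be checked by hand. The sketch below replays the program's allocations as a list of events (+1 for each malloc, -1 for each free) and reports the largest number of blocks alive at once; peak_live_blocks and peak_bytes are hypothetical helpers, not part of Massif. Assuming a 4-byte int and 8-byte pointers, sizeof(tournament) is 1,004,000 bytes, so a peak of five live blocks is 5,020,000 bytes, or about 4.787 MB when a megabyte is counted as 1,048,576 bytes. The ten events together account for 10,040,000 bytes, about 9.575 MB, which is where the horizontal axis ends.

```c
#include <stddef.h>

/* Same layout as the tournament struct in the listing above. */
typedef struct list_struc {
    int matrix[500][500];
    char *teams[500];
} tournament;

/* events[i] is +1 for a malloc and -1 for a free, in program order.
   Returns the largest number of blocks that were live at one time. */
int peak_live_blocks(const int *events, size_t n)
{
    int live = 0;
    int peak = 0;

    for (size_t i = 0; i < n; i++) {
        live += events[i];
        if (live > peak)
            peak = live;
    }
    return peak;
}

/* Peak heap use in bytes, assuming every block is one tournament. */
size_t peak_bytes(const int *events, size_t n)
{
    return (size_t)peak_live_blocks(events, n) * sizeof(tournament);
}
```

For this program the event list is {+1, +1, +1, -1, +1, +1, +1, -1, -1, -1}. It ends with two blocks still live, because two of the three later tier2 allocations are overwritten without being freed.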

Helgrind

When run with the --tool=helgrind -v option, valgrind uses Helgrind to analyze the process and its threads. Helgrind's main use is to find memory that is accessed by multiple threads without synchronization; it reports any possible race conditions caused by improper or non-existent use of a mutex.

Here is a program that has race conditions

    #include <stdio.h>
    #include <pthread.h>

    int sum;

    void* child_fn ( void* arg ) {
       int i;
       for (i = 0; i < 1000; i++) {
         sum = sum + 1;
         if (sum % 100 != 0 && sum % 50 == 0)
           printf("%d\n",sum);
       }
       return NULL;
    }

    int main(int argc, char* argv[]){

       sum = 0;
       pthread_t child;
       pthread_create(&child, NULL, child_fn, NULL);
       int i;
       for (i = 0; i < 1000; i++) {
         sum++;
         if (sum % 100 == 0)
           printf("%d\n",sum);
       }
       pthread_join(child, NULL);

    }
This is a snippet of the output of Helgrind
==5057== Possible data race during read of size 4 at 0x601054 by thread #1
==5057== Locks held: none
==5057==    at 0x4007A8: main (in /home/FA_14_CPS444_03/debug/valgrindExamples/a.out)
==5057==
==5057== This conflicts with a previous write of size 4 by thread #2
==5057== Locks held: none
==5057==    at 0x4006EE: child_fn (in /home/FA_14_CPS444_03/debug/valgrindExamples/a.out)
==5057==    by 0x4C2BEE6: ??? (in /usr/lib64/valgrind/vgpreload_helgrind-amd64-linux.so)
==5057==    by 0x4E3C0DA: start_thread (in /lib64/libpthread-2.18.so)

DRD

When run with the --tool=drd -v option, valgrind uses DRD to analyze the process and its threads, much like Helgrind does. DRD is less powerful than Helgrind, as it does not detect improper synchronization; however, it uses significantly less memory while running.

This is a snippet of the output of DRD on the same program

==5060== Conflicting load by thread 2 at 0x00601054 size 4
==5060==    at 0x40071A: child_fn (in /home/FA_14_CPS444_03/debug/valgrindExamples/a.out)
==5060==    by 0x4C2CA3B: ??? (in /usr/lib64/valgrind/vgpreload_drd-amd64-linux.so)
==5060==    by 0x4E450DA: start_thread (in /lib64/libpthread-2.18.so)
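
Both tools point at the unsynchronized accesses to sum. One common way to remove the race, sketched here as an assumption rather than as the only fix, is to guard every update with a mutex. The names sum_lock, add_thousand and run_counters are made up for this example, and the printf calls from the original are left out to keep it short.

```c
#include <pthread.h>
#include <stddef.h>

static int sum = 0;
static pthread_mutex_t sum_lock = PTHREAD_MUTEX_INITIALIZER;

/* Add 1 to sum a thousand times, holding the mutex for each update. */
static void *add_thousand(void *arg)
{
    (void)arg;   /* unused */
    for (int i = 0; i < 1000; i++) {
        pthread_mutex_lock(&sum_lock);
        sum = sum + 1;
        pthread_mutex_unlock(&sum_lock);
    }
    return NULL;
}

/* Run the same loop in a child thread and in the calling thread.
   Returns the final sum, or -1 if the thread cannot be created. */
int run_counters(void)
{
    pthread_t child;

    sum = 0;   /* safe: the child thread does not exist yet */
    if (pthread_create(&child, NULL, add_thousand, NULL) != 0)
        return -1;

    add_thousand(NULL);
    pthread_join(child, NULL);
    return sum;   /* safe: the child has finished */
}
```

With the lock in place every increment is applied, so the final value is the full 2000 rather than whatever the interleaving happens to produce. On older systems the program may need to be built with gcc -pthread.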


Strace

strace is a command used to trace through the program and report on all system calls made throughout the process.

To run strace on executable foo:

   $ strace ./foo

Common strace options:

  • -c
       includes statistics on time, number of calls, and errors for each different system call made

  • -f
       continues the trace through child processes created with fork

  • -r
       adds a relative time-stamp to each call in the report, measured from the start of the previous system call

  • -t
       adds a time-stamp to each call in the report, based on time of day; to add microseconds to the time-stamp, use -tt

  • -T
       adds the time spent on each system call in the report

  • -a <column>
       aligns the values in the report to the specified column, the default being 40.

  • -e <option>
       modifies the behavior of strace based on given option
    • Common -e options:

    • trace=<what to trace>
         strace will only trace the system calls specified. Specific system calls can be listed in a comma-separated list, or a group of related system calls can be used.
      • Groups:

      • file
           system calls which take filename as an argument like open
      • process
           system calls which involve process management such as fork, wait and exec
      • network
           system calls related to network
      • signal
           system calls related to signals
      • ipc
           system calls related to inter process communication
      • desc
           system calls related to file descriptors

    • raw=<set of system calls>
         strace prints raw, undecoded arguments for the given system calls

    • signal=<set of signals>
         strace will only trace the given signals

    • read=<set of file descriptors>
         strace will return everything read from the file descriptors given
    • write=<set of file descriptors>
         strace will return everything written to the file descriptors given

  • -o <filename>
       writes output of strace to filename

  • -p <pid>
       attaches strace to a process already running with given pid

  • -S <how to sort>
       sorts the statistics printed by -c by the given criterion: time (default), calls, name, or nothing.

  • -E <name=value>
       runs strace with the addition of name=value in the environment variables

  • -E <name>
       runs strace with the removal of given environment variable

Here is a program with errors in system calls

    #include <string.h>
    #include <unistd.h>
    #include <fcntl.h>

    int main(int argc, char **argv)
    {
      int number;

      char* infileName = strdup(argv[1]);

      int infile = open(infileName, O_RDONLY);
      int outfile = open("out", O_WRONLY);

      fork();

      read(infile, &number, sizeof(int));
      write(outfile, &number, sizeof(int));

      close(infile);
      close(outfile);

      return 0;
    }
With strace, it is very easy to locate these errors
execve("./a.out", ["./a.out", "in"], [/* 75 vars */]) = 0
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
open("in", O_RDONLY)                    = 3
open("out", O_WRONLY)                   = -1 ENOENT (No such file or directory)
+++ exited with 0 +++
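
The trace shows open on "out" failing with ENOENT while the program carries on and exits with status 0. A sketch of the same copy with the return values checked is given below. It is an illustration under assumptions, not the author's corrected program: copy_int is a hypothetical helper, the fork is omitted, and O_CREAT with mode 0644 is one reasonable way to let open create the missing output file.

```c
#include <fcntl.h>
#include <unistd.h>

/* Copy one int from in_path to out_path.
   Returns 0 on success and -1 if any system call fails. */
int copy_int(const char *in_path, const char *out_path)
{
    int number;
    int status = -1;
    int infile, outfile;

    infile = open(in_path, O_RDONLY);
    if (infile < 0)
        return -1;

    /* O_CREAT makes open create the file when it does not exist. */
    outfile = open(out_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (outfile < 0) {
        close(infile);
        return -1;
    }

    /* Only report success if the whole int was read and written. */
    if (read(infile, &number, sizeof number) == (ssize_t)sizeof number &&
        write(outfile, &number, sizeof number) == (ssize_t)sizeof number)
        status = 0;

    close(infile);
    if (close(outfile) != 0)
        status = -1;
    return status;
}
```

A caller can turn the result into a non-zero exit status, so a failure shows up in the shell as well as in the strace output.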

