FIG: Library-Level Error Injection for Shared Libraries
Version 0.3 BETA

Developers:
  Pete Broadwell - pbwell@cs
  Naveen Sastry - nks@cs
  Jonathan Traupman - jont@cs

FIG (originally, Fault Injection in Glibc) is a UNIX tool to generate
and inject errors from shared libraries into applications. It can be
used to simulate failures from the underlying system environment when
run on glibc library routines. In this manner, it is able to test the
ability of applications to handle faults from library routines. In
addition, it includes a call logging tool, similar to strace, and an
error injection interface that uses a control file to describe error
sequences.

To install: see INSTALL

To run:

This distribution provides two ways of running FIG. The first, referred
to as the stand-alone version, sets LD_PRELOAD and other FIG-related
environment variables via the "setup" shell script, as described in the
INSTALL file. Once this script is run, any programs initiated from that
shell instance will use the instrumented FIG "stub" library functions
in place of the usual library functions, assuming that FIG stubs have
been generated for these functions. Logs of the actions of the FIG
stubs during the session will be written to the files "fig.log.[pid]",
where "pid" is the ID of the process making the calls. To turn off FIG
for any subsequent programs run from the session, use the "reset"
script, as described in the INSTALL file.

The second way of running FIG, described as the "wrapper" method, uses
the "fig" executable as a front-end. This wrapper runs the specified
program under the FIG environment, as above. The logging information
from the target program and any child processes it may spawn is written
to "fig.log". The general command to invoke the wrapper is

    ./fig [options] program [program options]

The wrapper also provides a number of other options; type './fig -h' to
see them.

How FIG works, in a nutshell:

FIG makes use of the LD_PRELOAD environment variable, which names a
library file that the dynamic linker consults before any other shared
library when it searches for a function implementation. The FIG
distribution includes files that facilitate the generation of
replacement, or "stub", library functions. These functions are
ultimately compiled into the FIG library (libfig.so). Enabling FIG sets
this library as the value of the LD_PRELOAD variable. A subsequent call
by an application to a shared library function for which a FIG stub
exists, such as malloc(), results in the dynamic linker using the stub
version of malloc() instead of the standard implementation.

The FIG stub versions of library functions generally contain code to
log the occurrence of a call to that function, consult the error
injection rules, and then either return an error code that indicates a
particular failure (e.g., not enough memory) or run the standard
implementation of the function and return normally.

A description of the FIG distribution files and how they are used:

control - This is the top-level control file, which directs the error
  patterns that FIG will inject. When 'make' is run to compile the FIG
  library, the control file is pre-processed, resulting in control.out.
  This is the version that the instrumentation stubs consult in order
  to build structures that describe the error injection rules they
  will follow.

  The entries in the control file are fairly self-explanatory; all C
  comments are included only for human readability. For each
  instrumented stub, it is necessary to include a reference to its
  function number (e.g., MALLOC_INDEX) and then one or more lines
  describing error types and frequencies. These are of the format

    callnumber [callno]
    return [error return code]
    errno [errno, if applicable]
    probability [prob, where 1.0 = 1/1, and 0.3 = 3/10]
      - or -
    interval [start of interval] to [end of interval, can use 'infinity']
    ...

  The call numbers represent a count of the number of invocations of
  the instrumented function since the program began.

func.desc - A tab-delimited file that gives specifics about the
  function stubs that are automatically generated for FIG when 'make'
  is run. Its format:

    column
    1      the name of the function that is being overridden.
           $(NAME)_INDEX (see func.h for auto-assigned index numbers)
           is used in the control file to refer to this function
    2      the secondary definition of the function that the library
           should call to run the standard implementation of this
           function
    3      the return type of the function
    4      space-separated list of argument types. This is the string
           which is passed into fprintf to write the arguments to the
           log
    5      \t
    6 ...  tab-separated pairs of argument types and names

  To find the secondary definition of a glibc function call, run
  'objdump -T' on your system's libc.so file, and grep through the
  output for likely secondary names, usually of the form
  __libc_[function name]. If no secondary definition exists for the
  function, you may have to use more inventive measures, like calling
  getLibraryFunction() (see below) or creating a copy of the
  instrumented file with a different name.

userfunc.h - The FIG library must be rebuilt after func.desc is
  modified, as described in the INSTALL file. At this time, not all
  functions can be auto-generated using func.desc (and the stubgen.awk
  script). Examples of manually-generated stub files in the
  distribution are malloc.c and execve.c. Entries for additional
  user-generated stubs must be added to the Makefile, as well as to
  userfunc.h.

util.c - Includes code for getLibraryFunction(), which uses dlsym() to
  bypass the first implementation of a function that the dynamic
  linker finds. In the case of FIG, this is usually the instrumented
  stub version, so getLibraryFunction() returns a pointer to the
  standard implementation of the function in question.
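
As an illustration of the lookup that util.c performs, the following is
a minimal sketch of an LD_PRELOAD-style stub for close(). It is not
FIG's code: get_library_function() is a hypothetical stand-in for
getLibraryFunction(), inject_next_close is a hypothetical switch that
takes the place of the control.out rules, and the use of RTLD_NEXT is
an assumption about how dlsym() is called.

```c
#define _GNU_SOURCE            /* exposes RTLD_NEXT in <dlfcn.h> on glibc */
#include <dlfcn.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical stand-in for getLibraryFunction(): ask the dynamic
 * linker for the next definition of 'name' after the current object,
 * which skips the stub and finds the standard implementation. */
static void *get_library_function(const char *name)
{
    return dlsym(RTLD_NEXT, name);
}

/* Hypothetical injection switch. FIG itself consults the rules built
 * from control.out instead of a single flag. */
static int inject_next_close = 0;

/* Stub that shadows close(): log the call, then either inject an
 * error or run the standard implementation. */
int close(int fd)
{
    static int (*real_close)(int);

    if (!real_close)
        real_close = (int (*)(int))get_library_function("close");

    fprintf(stderr, "close(%d)\n", fd);       /* call logging */

    if (inject_next_close) {                  /* injected failure */
        inject_next_close = 0;
        errno = EIO;
        return -1;
    }
    if (!real_close) {                        /* lookup failed */
        errno = ENOSYS;
        return -1;
    }
    return real_close(fd);                    /* normal path */
}
```

Built as a shared object (for example with 'gcc -shared -fPIC', plus
'-ldl' on older glibc versions) and named in LD_PRELOAD, a stub of this
shape would be found ahead of the libc close().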

execve.c - Some programs, like MySQL and some versions of Netscape,
  unset LD_PRELOAD in the environment that is passed to the main
  execution module of the program. This instrumented version of
  execve() catches all calls to the various versions of exec() and
  resets the FIG environment variables to their initial settings (as
  stored in /tmp/fig.conf) before allowing the call to exec() to
  complete normally.

launcher.c - Includes code to run the "fig" wrapper program. Among
  other things, the wrapper sets up a region of shared memory and then
  spawns a separate logging daemon process, whose job is to monitor
  the log messages that are written by instrumented stubs to the
  shared memory region and dump them to a file. The purpose of this
  design was to provide better logging performance than the in-line
  approach used by the stand-alone version of FIG. In reality, the
  wrapper provided only a marginal performance speedup. What mattered
  more than HOW we ran the logging was WHAT we logged: cutting the
  timing information out of the log increases performance, as does
  disabling logging of certain frequently-called functions, such as
  malloc().

log.c - Code to implement either the "wrapper" approach to logging or
  the "in-line" version, depending on which is compiled in.

prob.c - Implements the decision-making portion of the error injection
  process. In each process, it builds data structures that represent
  the entries in control.out, then consults them on each call to an
  instrumented function to decide whether an error should be injected
  and, if so, which type of error.

stubgen.awk - AWK script to auto-generate the FIG function stubs from
  the entries in func.desc.

Errata

- FIG still seems to have trouble logging and instrumenting calls made
  by daemon programs, especially those (like Apache) that spawn a
  child, which spawns another child, which becomes the daemon. For
  testing purposes, we can get around this in Apache by running it in
  single-process mode.
  Another multiprocessing program that cooperates less than optimally
  with FIG is MySQL. In general, the stand-alone version of FIG handles
  these programs better than the wrapper version.

- See the in-code comments for other idiosyncrasies of our
  implementation.

To do (ideas):

- Modify the stand-alone version of FIG to pop up a separate,
  FIG-enabled shell window when the setup script is run.

- Add support for time-based error injection triggers and other types
  of triggers.
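
Example sketch of control rule matching:

The control file entries and the decision step in prob.c can be
pictured with the short sketch below. It is only an illustration under
stated assumptions, not the code in prob.c: struct fig_rule and
fig_match_rule() are hypothetical names, the sketch assumes that
interval bounds are call numbers and that the first matching rule
wins, and the random draw is passed in by the caller so that the
behaviour is easy to check.

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical in-memory form of one control-file rule. */
struct fig_rule {
    unsigned long first_call;   /* callnumber, or start of interval    */
    unsigned long last_call;    /* same value, or ULONG_MAX for the
                                   'infinity' end of an interval       */
    long          ret;          /* error return code to hand back      */
    int           err;          /* errno to set, 0 if not applicable   */
    double        probability;  /* 1.0 = always, 0.3 = 3 calls in 10   */
};

/* Return the first rule that covers call number 'callno' and whose
 * probability test passes, or NULL when the standard implementation
 * should run. 'draw' is a uniform random value in [0, 1) supplied by
 * the caller. */
static const struct fig_rule *
fig_match_rule(const struct fig_rule *rules, size_t n,
               unsigned long callno, double draw)
{
    for (size_t i = 0; i < n; i++) {
        const struct fig_rule *r = &rules[i];

        if (callno >= r->first_call && callno <= r->last_call &&
            draw < r->probability)
            return r;
    }
    return NULL;
}
```

With a rule table such as { {3, 3, 0, ENOMEM, 1.0}, {10, ULONG_MAX, -1,
EIO, 0.3} }, the third call always matches the first rule, and calls
from the tenth onward match the second rule whenever the draw falls
below 0.3.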