C++ Exception Handling

The gory details of an implementation

Peter Edwards, Arista Networks.

C++ Exceptions



  • "Happy path" code kept together
  • Error-handling is separated
  • Cleanup with RAII happens implicitly
  • Should be cheap/free if exceptions are not thrown


  • setjmp/longjmp, as used by GCC where there's no other option
  • DWARF based, as used on supporting platforms, including Linux on x86 and amd64

Implementing Exceptions with setjmp/longjmp


Quick overview of semantics

  • call setjmp once, it can return twice
  • first returns 0
  • ... but stores machine state (register set) in the jmp_buf
  • longjmp causes a non-local transfer of control - restores registers from jmp_buf, including stack pointer and program counter
  • Control resumes at call site to setjmp, but now with the return value as specified by the longjmp call
  • Can use this to direct code flow to exception path from longjmp non-local transfer
  • Any stack storage used is recovered, but that's it - no destructors called


Dealing with cleanup: Maintain a runtime-stack of clean-ups and catch blocks

  • try sets up an exception handling "frame" with a jmp_buf embedded in it
  • Functions can push cleanup handlers onto the "current" frame
  • throwing an exception runs cleanups, and eventually does longjmp



As implemented in FreeBSD 4.11 (c 2004)


As implemented in FreeBSD 4.11 (c 2004)


Note the happy path overhead ->


As implemented in FreeBSD 4.11 (c 2004)


What's wrong with it?

Affects the happy path in two distinct ways:

  • For any function, like g, we need to keep track of what we need to clean up
  • For any catch block, we need to call setjmp, to get a copy of the machine state, and store it somewhere
  • a jmp_buf is 200 bytes on amd64, 156 on i386

Implementing Exceptions With DWARF

Within any function:
  • At a specific point in the code of a function, the set of things we need to clean up is known at compile time.
  • If we're in a try block, we know where the code for the associated catch block(s) are
  • Therefore, we can store the info at compile time in a table, keyed by program counter
  • This is the same data that any one function would have recorded with our SJLJ approach
  • Pretty much 0 cost for the happy path
  • May be quite expensive for the sad path - we'll look at what's involved
We store a table for each function in our code. Each table has entries containing:
  • The range of the instructions within the function that this entry pertains to
  • The address of the "landing pad", i.e., the code generated by the compiler to cleanup or handle an exception
  • "Action" information, that inform us about the type of the exception we can handle, or if we just need to run cleanup code.

The compiler associates the table with the function in the generated assembler as a "Language Specific Data Area" (LSDA), and also associates a "personality routine" with the function that can interpret that table (i.e., __gxx_personality_v0 - more later)

Compiler output foo.s Cost-free happy-path!
Source from foo.cc
"LSDA" table for g from foo.s

Stack Unwinding


Our table-based approach only tells us what to do for the local function. How do we find out where we are in g() when h() throws an exception?

ESPreturn-address back to g
ESPreturn address back to f
ESParguments to g
ESPlocal variables in f
ESPreturn address back to main
ESParguments to f
ESPmain's local variables
ESPreturn address back to C startup

Stack Unwinding: with frame pointers


Use a register to point to the current stack frame: the "frame pointer"

On function entry push frame pointer, copy stack pointer to frame pointer, so new frame pointer points to old frame pointer

i386: ENTER and LEAVE instructions; use ebp for FP.

ESPreturn-address back to g
EBPESPreturn address back to f
EBPESParguments to g
EBPESPlocal variables in f
ESPreturn address back to main
ESParguments to f
ESPmain's local variables
EBPESPreturn address back to C startup
EBPESPEBP somewhere down here

Output from modern 32-bit compiler for our sample


Stack Unwinding: with tables

  • As with SJLJ, we can build static tables, rather than doing work at runtime
  • Saves instructions, stack space, and a machine register
  • We have a table for each function, more or less
  • Each row of our table says how to calculate the content of a register back in the function that called us.
  • Includes two "special" registers/states
    • Call Frame Address - Takes place of frame pointer; Generally the stack pointer for the previous frame
    • Return Address Register (RAR) giving program counter in previous frame

Table Contents

Stack on entry to g:

Local stack space for g
ESP-> return address into f
CFA ("previous frame ESP") -> arguments into g

Addr R0 R1 R2 R3(rbx) ... CFA RAR
g - - - - ... ESP+8 *(CFA-8)
g+1 - - - *(CFA -16) ... ESP + 16 *(CFA - 8)
g+5 - - - *(CFA -16) ... ESP + 32 *(CFA - 8)
g+59 - - - *(CFA -16) ... ESP + 16 *(CFA - 8)
g+60 - - - *(CFA -16) ... ESP + 8 *(CFA - 8)
Getting the table in our Executable
  • Linux, and many other systems, use ELF to store binary images
  • ELF images contain "sections", describing program code and data that need to be loaded
  • Contain other sections that may or may not need to be loaded (linkage information, symbols, etc)
  • Debugging information is specified by another format called DWARF
  • The table above was originally specified for use by debuggers in the DWARF ".debug_frame" section
  • Modified slightly to ".eh_frame" for exception handling support (less info, with relative offsets)
Table representation
  • We don't represent the entire table row for each range of IP addresses - that'd be too big
  • .eh_frame contains Frame Description Entrys (FDEs) for each function in the object
  • A set of FDEs can share a Common Information Entry (CIE). This contains the "first row" for the FDEs, and some language specific data (the personality routine)
  • FDE implements a simple "virtual machine". From initial state in CIE, execute instructions that can:
    • update the disposition of a register in the table
    • update the disposition of the CFA
    • advance the program counter

Putting it all together

We now have enough information to handle an exception from throw to catch
  • From the CPUs program counter, locate the ELF image that provides the code
  • Within this ELF image, find the .eh_frame section
  • Find the FDE/function covering the current CPU program counter in .eh_frame
  • Initialize the function's state table from the CIE instructions, with virtual instruction pointer at start of function
  • Execute the FDE's instructions until the virtual instruction pointer reaches our CPU instruction pointer
  • This generates the state for the calling frame - including the program counter where control would have returned to it
  • Use this to find the LSDA information
  • Pass C++ runtime the LSDA and frame information to either cleanup, or catch the exception
  • We can repeat until we've found a handler for the exception

h calls __cxa_throw to actually raise the exception


__cxa_throw is part of the C++ runtime; stack unwinding is handled by _Unwind_RaiseException


First phase - Search - find the frame where execution will resume


Second phase - cleanup - handle the exception, cleaning up any stack resources.


finally, our personality routine: handle the C++ side


  • If you found this at all interesting, read Ian Lance Taylor's blogposts on Linkers and Loaders
  • I found this series of posts by Nicolas Brailovsky in the process of preparing these slides: It goes over very similar ground in a lot of detail
  • The _Unwind* interface was originally documented for Itanium here
  • The .eh_frame format is an extension of what's specified by int the DWARF
  • Section 5.4 of Technical Report on C++ Performance deals with the overhead of exceptions
  • These slides are at http://isainmdom.com/~peadar/eximpl