C++ Exception Handling

The gory details of an implementation

Peter Edwards, Arista Networks.
peadar@arista.com
http://isainmdom.com/~peadar/eximpl

C++ Exceptions

Why?

"Happy path" code kept together
Error-handling is separated
Cleanup with RAII happens implicitly
Should be cheap/free if exceptions are not thrown

How?

setjmp/longjmp, as used by GCC where there's no other option
DWARF based, as used on supporting platforms, including Linux on x86 and amd64

Implementing Exceptions with `setjmp/longjmp`

Quick overview of semantics

call setjmp once, it can return twice
first returns 0
... but stores machine state (register set) in the jmp_buf
longjmp causes a non-local transfer of control - restores registers from jmp_buf, including stack pointer and program counter
Control resumes at call site to setjmp, but now with the return value as specified by the longjmp call
Can use this to direct code flow to exception path from longjmp non-local transfer
Any stack storage used is recovered, but that's it - no destructors called

Dealing with cleanup: Maintain a runtime-stack of clean-ups and catch blocks

try sets up an exception handling "frame" with a jmp_buf embedded in it
Functions can push cleanup handlers onto the "current" frame
throwing an exception runs cleanups, and eventually does longjmp

As implemented in FreeBSD 4.11 (c 2004)

Note the happy path overhead ->

As implemented in FreeBSD 4.11 (c 2004)

What's wrong with it?

Affects the happy path in two distinct ways:

For any function, like g, we need to keep track of what we need to clean up
For any catch block, we need to call setjmp, to get a copy of the machine state, and store it somewhere
a jmp_buf is 200 bytes on amd64, 156 on i386

Implementing Exceptions With DWARF

Within any function:

At a specific point in the code of a function, the set of things we need to clean up is known at compile time.
If we're in a try block, we know where the code for the associated catch block(s) are
Therefore, we can store the info at compile time in a table, keyed by program counter
This is the same data that any one function would have recorded with our SJLJ approach
Pretty much 0 cost for the happy path
May be quite expensive for the sad path - we'll look at what's involved

We store a table for each function in our code. Each table has entries containing:

The range of the instructions within the function that this entry pertains to
The address of the "landing pad", i.e., the code generated by the compiler to cleanup or handle an exception
"Action" information, that inform us about the type of the exception we can handle, or if we just need to run cleanup code.

The compiler associates the table with the function in the generated assembler as a "Language Specific Data Area" (LSDA), and also associates a "personality routine" with the function that can interpret that table (i.e., __gxx_personality_v0 - more later)

Compiler output foo.s Cost-free happy-path!

Source from foo.cc

"LSDA" table for g from foo.s

Stack Unwinding

Our table-based approach only tells us what to do for the local function. How do we find out where we are in g() when h() throws an exception?


	...
ESP	return-address back to g
ESP	"there"
ESP	"here"
ESP	return address back to f
ESP	arguments to g
ESP	local variables in f
ESP	return address back to main
ESP	arguments to f
ESP	main's local variables
ESP	return address back to C startup
ESP	argc
ESP	argv

Stack Unwinding: with frame pointers

Use a register to point to the current stack frame: the "frame pointer"

On function entry push frame pointer, copy stack pointer to frame pointer, so new frame pointer points to old frame pointer

i386: ENTER and LEAVE instructions; use ebp for FP.

		...
	ESP	return-address back to g
	ESP	"there"
	ESP	"here"
EBP	ESP	old EBP
EBP	ESP	return address back to f
EBP	ESP	arguments to g
EBP	ESP	local variables in f
EBP	ESP	old EBP
	ESP	return address back to main
	ESP	arguments to f
	ESP	main's local variables
EBP	ESP	old EBP
EBP	ESP	return address back to C startup
EBP	ESP	argc
EBP	ESP	argv
EBP	ESP	EBP somewhere down here

Output from modern 32-bit compiler for our sample

Stack Unwinding: with tables

As with SJLJ, we can build static tables, rather than doing work at runtime
Saves instructions, stack space, and a machine register
We have a table for each function, more or less
Each row of our table says how to calculate the content of a register back in the function that called us.
Includes two "special" registers/states
- Call Frame Address - Takes place of frame pointer; Generally the stack pointer for the previous frame
- Return Address Register (RAR) giving program counter in previous frame

Table Contents

Stack on entry to g:

	Local stack space for `g`
ESP->	return address into `f`
CFA ("previous frame ESP") ->	arguments into `g`

Addr	R0	R1	R2	R3(rbx)	...	CFA	RAR
`g`	-	-	-	-	...	ESP+8	*(CFA-8)
`g+1`	-	-	-	*(CFA -16)	...	ESP + 16	*(CFA - 8)
`g+5`	-	-	-	*(CFA -16)	...	ESP + 32	*(CFA - 8)
...
`g+59`	-	-	-	*(CFA -16)	...	ESP + 16	*(CFA - 8)
`g+60`	-	-	-	*(CFA -16)	...	ESP + 8	*(CFA - 8)

Getting the table in our Executable

Linux, and many other systems, use ELF to store binary images
ELF images contain "sections", describing program code and data that need to be loaded
Contain other sections that may or may not need to be loaded (linkage information, symbols, etc)
Debugging information is specified by another format called DWARF
The table above was originally specified for use by debuggers in the DWARF ".debug_frame" section
Modified slightly to ".eh_frame" for exception handling support (less info, with relative offsets)

Table representation

We don't represent the entire table row for each range of IP addresses - that'd be too big
.eh_frame contains Frame Description Entrys (FDEs) for each function in the object
A set of FDEs can share a Common Information Entry (CIE). This contains the "first row" for the FDEs, and some language specific data (the personality routine)
FDE implements a simple "virtual machine". From initial state in CIE, execute instructions that can:
- update the disposition of a register in the table
- update the disposition of the CFA
- advance the program counter

readelf --debug-dump=frames foo.o

Putting it all together

We now have enough information to handle an exception from throw to catch

From the CPUs program counter, locate the ELF image that provides the code
Within this ELF image, find the .eh_frame section
Find the FDE/function covering the current CPU program counter in .eh_frame
Initialize the function's state table from the CIE instructions, with virtual instruction pointer at start of function
Execute the FDE's instructions until the virtual instruction pointer reaches our CPU instruction pointer
This generates the state for the calling frame - including the program counter where control would have returned to it
Use this to find the LSDA information
Pass C++ runtime the LSDA and frame information to either cleanup, or catch the exception
We can repeat until we've found a handler for the exception

h calls __cxa_throw to actually raise the exception

__cxa_throw is part of the C++ runtime; stack unwinding is handled by _Unwind_RaiseException

First phase - Search - find the frame where execution will resume

Second phase - cleanup - handle the exception, cleaning up any stack resources.

finally, our personality routine: handle the C++ side

Questions?

If you found this at all interesting, read Ian Lance Taylor's blogposts on Linkers and Loaders
I found this series of posts by Nicolas Brailovsky in the process of preparing these slides: It goes over very similar ground in a lot of detail
The _Unwind* interface was originally documented for Itanium here
The .eh_frame format is an extension of what's specified by int the DWARF
Section 5.4 of Technical Report on C++ Performance deals with the overhead of exceptions
These slides are at http://isainmdom.com/~peadar/eximpl